MulAddDst

Applicability

Product                                                     Supported/Unsupported
Atlas A3 training products / Atlas A3 inference products    √
Atlas A2 training products / Atlas A2 inference products    √
Atlas 200I/500 A2 inference products                        √
Atlas inference product's AI Core                           √
Atlas inference product's Vector Core                       x
Atlas training products                                     x

Function Usage

Multiplies src0 and src1 element by element, adds the product to dst, and stores the final result in dst. For each element index i, the formula is as follows:

dst[i] = src0[i] * src1[i] + dst[i]
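The element-wise behavior can be modeled on the host with a short scalar loop. The following is only a reference sketch in standard C++, not the Ascend C implementation: the name MulAddDstReference and the use of std::vector are assumptions made for illustration, and float is used for both the source and destination operands.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference model (not part of Ascend C):
// accumulates src0[i] * src1[i] into dst[i] for the first `count` elements.
inline void MulAddDstReference(std::vector<float>& dst,
                               const std::vector<float>& src0,
                               const std::vector<float>& src1,
                               std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src0[i] * src1[i] + dst[i];
    }
}
```

A model like this can be useful for checking kernel output on the host, bearing in mind that half-precision inputs introduce rounding differences.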

Prototype

  • Computation of the first n pieces of data of a tensor
    template <typename T, typename U>
    __aicore__ inline void MulAddDst(const LocalTensor<T>& dst, const LocalTensor<U>& src0, const LocalTensor<U>& src1, const int32_t& count)
    
  • High-dimensional tensor sharding computation
    • Bitwise mask mode
      template <typename T, typename U, bool isSetMask = true>
      __aicore__ inline void MulAddDst(const LocalTensor<T>& dst, const LocalTensor<U>& src0, const LocalTensor<U>& src1, const uint64_t mask[], const uint8_t repeatTime, const BinaryRepeatParams& repeatParams)
      
    • Contiguous mask mode
      template <typename T, typename U, bool isSetMask = true>
      __aicore__ inline void MulAddDst(const LocalTensor<T>& dst, const LocalTensor<U>& src0, const LocalTensor<U>& src1, uint64_t mask, const uint8_t repeatTime, const BinaryRepeatParams& repeatParams)
      

Parameters

Table 1 Parameters in the template

Parameter

Description

T

Data type of the destination operand. For details about the data type constraints of the destination and source operands, see Table 3.

For the Atlas A3 training products / Atlas A3 inference products, the supported data types are half and float.

For the Atlas A2 training products / Atlas A2 inference products, the supported data types are half and float.

For the Atlas 200I/500 A2 inference products, the supported data types are int16_t, uint16_t, half, int32_t, uint32_t, and float.

For the Atlas inference product's AI Core, the supported data types are half and float.

U

Data type of the source operand.

For the Atlas A3 training products / Atlas A3 inference products, the supported data types are half and float.

For the Atlas A2 training products / Atlas A2 inference products, the supported data types are half and float.

For the Atlas 200I/500 A2 inference products, the supported data types are int16_t, uint16_t, half, int32_t, uint32_t, and float.

For the Atlas inference product's AI Core, the supported data types are half and float.

isSetMask

Indicates whether to set mask inside the API.

  • true: sets mask inside the API.
  • false: sets mask outside the API. Developers need to use the SetVectorMask API to set the mask value. In this mode, the mask value in the input parameter of this API must be set to the placeholder MASK_PLACEHOLDER.
Table 2 Parameters

Parameter

Input/Output

Description

dst

Output

Destination operand.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 32-byte aligned.

src0, src1

Input

Source operands. The data type of the source operands can differ from that of the destination operand. For the supported combinations, see Table 3.

The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT.

The start address of the LocalTensor must be 32-byte aligned.

count

Input

Number of elements involved in the computation.

mask[]/mask

Input

The mask parameter is used to control the elements involved in computation in each iteration.

  • Bitwise mode: controls the elements that participate in computation by bit. If a bit is set to 1, the corresponding element participates in the computation. If a bit is set to 0, the corresponding element is masked in the computation.

    The mask is an array. Its length and the value range of its elements depend on the data type of the operand. When the operand is 16-bit, the array length is 2. In this case, mask[0] and mask[1] must be in the range of [0, 2^64 - 1] and cannot be 0 at the same time. When the operand is 32-bit, the array length is 1. In this case, mask[0] must be in the range of (0, 2^64 - 1]. When the operand is 64-bit, the array length is 1. In this case, mask[0] must be in the range of (0, 2^32 - 1].

    For example, if mask = [8, 0] and 8 = 0b1000, only the fourth element participates in computation.

  • Contiguous mode: indicates the number of contiguous elements that participate in computation. The value range is related to the operand data type. The maximum number of elements that can be processed in each repeat varies according to the data type. When the operand is 16-bit, mask ∈ [1, 128]. When the operand is 32-bit, mask ∈ [1, 64]. When the operand is 64-bit, mask ∈ [1, 32].
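The relationship between the two mask modes can be sketched in standard C++. The helper below is hypothetical (ContiguousToBitwise is not an Ascend C API) and assumes that bit i of mask[0] selects element i and bit i of mask[1] selects element 64 + i. Under that assumption, a contiguous mask of 64 corresponds to the bitwise mask { UINT64_MAX, 0 } used in the examples.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not an Ascend C API): builds the two-word bitwise mask
// that enables the first `n` elements of an iteration, for 0 <= n <= 128.
// Assumption: bit i of mask[0] selects element i, and bit i of mask[1]
// selects element 64 + i.
inline std::array<uint64_t, 2> ContiguousToBitwise(unsigned n)
{
    auto lowBits = [](unsigned k) -> uint64_t {
        // Shifting a 64-bit value by 64 is undefined, so handle k >= 64 explicitly.
        return k >= 64 ? UINT64_MAX : ((uint64_t{1} << k) - 1);
    };
    if (n <= 64) {
        return {lowBits(n), uint64_t{0}};
    }
    return {UINT64_MAX, lowBits(n - 64)};
}
```

Only the first word is meaningful for 32-bit operands, where at most 64 elements are processed per iteration.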

repeatTime

Input

Number of iterations. The Vector Unit reads 256 bytes of contiguous data for computation in each iteration, so processing the complete input takes multiple iterations. repeatTime specifies how many iterations are performed.

For details about this parameter, see High-dimensional Sharding APIs.

repeatParams

Input

Parameters that control the operand address strides, of the BinaryRepeatParams type. For each operand, they specify the address stride of the same data block between adjacent iterations and the address stride between data blocks within a single iteration.

For details about the address stride parameters between adjacent iterations, see repeatStride. For details about the address stride parameters of DataBlock in the same iteration, see dataBlockStride.
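The effect of the two kinds of stride can be illustrated with a small host-side address model. This is a sketch under stated assumptions, not the hardware specification: ElementOffset and kBlockBytes are hypothetical names, a data block is taken to be 32 bytes, and both strides are taken to be expressed in data blocks.

```cpp
#include <cstddef>

// Size of one data block in bytes (assumed to be 32 bytes for this sketch).
constexpr std::size_t kBlockBytes = 32;

// Hypothetical host-side model (not an Ascend C API): element offset, from the
// operand's start address, of the idx-th element handled in iteration `repeat`.
// blkStride and repStride are expressed in data blocks.
inline std::size_t ElementOffset(std::size_t repeat, std::size_t idx,
                                 std::size_t blkStride, std::size_t repStride,
                                 std::size_t elemBytes)
{
    const std::size_t elemsPerBlock = kBlockBytes / elemBytes;
    const std::size_t block = idx / elemsPerBlock;  // data block within the iteration
    const std::size_t lane = idx % elemsPerBlock;   // element within that data block
    return (repeat * repStride + block * blkStride) * elemsPerBlock + lane;
}
```

With 64 elements per iteration, a float operand with a block stride of 1 and a repeat stride of 8, and a half operand with a block stride of 1 and a repeat stride of 4, both start their second iteration at element offset 64, so consecutive iterations cover contiguous data.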

Table 3 Data type constraints:

src0 Data Type   src1 Data Type   dst Data Type   PAR (elements per iteration)   Availability

half             half             half            128                            Atlas A2 training products / Atlas A2 inference products
                                                                                 Atlas A3 training products / Atlas A3 inference products
                                                                                 Atlas inference product's AI Core
                                                                                 Atlas 200I/500 A2 inference products

float            float            float           64                             Atlas A2 training products / Atlas A2 inference products
                                                                                 Atlas A3 training products / Atlas A3 inference products
                                                                                 Atlas inference product's AI Core
                                                                                 Atlas 200I/500 A2 inference products

half             half             float           64                             Atlas A2 training products / Atlas A2 inference products
                                                                                 Atlas A3 training products / Atlas A3 inference products
                                                                                 Atlas inference product's AI Core

int16_t          int16_t          int16_t         128                            Atlas 200I/500 A2 inference products

uint16_t         uint16_t         uint16_t        128                            Atlas 200I/500 A2 inference products

int32_t          int32_t          int32_t         64                             Atlas 200I/500 A2 inference products

uint32_t         uint32_t         uint32_t        64                             Atlas 200I/500 A2 inference products
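The PAR values in Table 3 are consistent with dividing the 256 bytes processed per iteration by the size of the widest operand type. The sketch below encodes that observation. It is an inference from the table rather than a documented rule, and ElementsPerIteration is a hypothetical helper, not an Ascend C API.

```cpp
#include <cstddef>

// Hypothetical helper (not an Ascend C API): number of elements one iteration
// can process, assuming the 256-byte limit applies to the widest operand type.
constexpr std::size_t ElementsPerIteration(std::size_t srcBytes, std::size_t dstBytes)
{
    return 256 / (srcBytes > dstBytes ? srcBytes : dstBytes);
}
```

For example, half sources with a float destination give 256 / 4 = 64, which matches the third row of the table.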

Returns

None

Constraints

  • For details about the operand address alignment requirements, see General Address Alignment Restrictions.
  • For details about the restrictions on overlapping operand addresses, see General Address Overlap Restrictions. In particular, when the source operands are of the half type and the destination operand is of the float type, the source and destination operands cannot overlap completely because their element sizes differ. Address overlapping is therefore not supported in this case.

Examples

  • Example of high-dimensional sharding computation API - contiguous mask mode (the data type of the source operand is half and that of the destination operand is float)
    uint64_t mask = 64;
    // repeatTime = 4. 64 elements are computed in each iteration, and 256 elements are computed in total.
    // dstBlkStride, src0BlkStride, src1BlkStride = 1. Data is read and written contiguously within a single iteration.
    // dstRepStride = 8, src0RepStride, src1RepStride = 4. Data is read and written contiguously between adjacent iterations.
    AscendC::MulAddDst(dstLocal, src0Local, src1Local, mask, 4, { 1, 1, 1, 8, 4, 4 });
    
  • Example of high-dimensional sharding computation API - bitwise mask mode (the data type of the source operand is half and that of the destination operand is float)
    uint64_t mask[2] = { UINT64_MAX, 0 };
    // repeatTime = 4. 64 elements are computed in each iteration, and 256 elements are computed in total.
    // dstBlkStride, src0BlkStride, src1BlkStride = 1. Data is read and written contiguously within a single iteration.
    // dstRepStride = 8, src0RepStride, src1RepStride = 4. Data is read and written contiguously between adjacent iterations.
    AscendC::MulAddDst(dstLocal, src0Local, src1Local, mask, 4, { 1, 1, 1, 8, 4, 4 });
    
  • Example of computing the first n data elements of a tensor (the data type of the source operand is half and that of the destination operand is float)
    AscendC::MulAddDst(dstLocal, src0Local, src1Local, 256);
    
Result example:
Input (src0Local):
[-83.     58.2   -14.28  -43.12   20.72  -79.9    54.16   31.56   -1.464
  68.25  -28.31  -93.5    -4.2   -46.56  -22.23   78.5   -69.56  -37.03
 -53.12   58.28  -71.56  -34.44   85.94   96.3    66.06   99.94  -45.94
   8.75  -93.9    35.56   82.56  -70.8   -68.75  -35.4    95.3   -49.1
 -56.34   86.75   90.25   24.17   79.06  -49.66  -95.3    -6.965 -63.72
 -33.16  -15.56  -43.28   51.28   40.1    83.25   49.72   55.47  -53.7
  17.55  -36.06   63.     59.16  -66.8    -9.01   25.56   44.28   22.12
 -33.84  -31.9   -74.2    79.94  -34.94    1.119  18.45  -92.75  -83.25
  42.66  -77.6    33.28    0.709 -19.3    44.44   45.28  -33.4   -55.94
 -42.22  -37.72   39.4    87.25   23.19   34.16   51.3   -22.16   15.234
  59.    -20.45  -63.9    41.84  -14.63  -80.94   47.8   -36.84    8.47
 -60.66  -26.06  -42.78   30.5   -91.3    55.84  -85.44  -99.44   68.2
 -71.7    27.45  -11.48  -48.03   71.     71.5   -59.2    14.67   79.25
  32.7   -54.22    6.17  -69.94  -49.22   87.7   -61.53   36.25  -57.84
 -81.75  -24.84  -35.    -62.44  -47.22   19.95   21.16  -31.56   13.38
  72.4   -64.06  -89.75  -28.17   34.4   -68.06  -46.94   16.06   65.56
   3.16  -59.88  -32.97   30.69   89.5    16.66   25.05   -1.988   5.27
 -23.14  -26.89  -24.72    1.427 -14.46   81.9   -59.94   68.7   -83.2
 -75.44   88.6    27.62  -58.06  -36.1   -49.53   27.73   89.5   -51.5
  90.     67.94  -70.8    24.2   -75.8   -96.75  -22.66   33.03    6.293
 -87.5    36.56   36.06  -76.8     1.786  82.9    87.6   -63.94   -4.51
 -89.06  -56.06   75.2   -31.89   27.44   35.22  -27.19   37.53   96.94
 -83.25  -49.6    31.78  -50.25   65.2    69.9    63.03   53.    -70.1
 -57.22  -11.99  -23.14   44.28  -77.3    77.25   10.805  16.3   -96.6
 -94.9    34.1   -40.25  -99.7    -6.156  44.97   82.7    51.1   -53.28
  85.44  -80.94  -47.    -53.47  -35.22   76.75  -28.38   26.48  -67.06
  34.28  -54.6    21.52  -38.9    79.75   51.7   -39.44   48.56  -91.7
 -44.06   92.9    11.79    8.98   -5.074  12.375 -24.77  -27.31   76.2
  39.8    -5.46   25.17   47.   ]
Input (src1Local):
[-57.97    43.5      8.08    72.4    -81.44   -52.      69.1    -84.25
  31.12    34.34    74.75    83.56   -83.      80.1     42.84   -31.6
  88.56    47.34    18.89   -95.25    16.88   -85.75    76.75   -17.19
  23.39    92.56    22.81    77.94    38.62   -55.8     38.22   -88.6
 -99.4    -66.75    90.44    80.56    12.78   -12.6    -68.4      2.816
  27.45   -60.88    70.      61.78   -90.56   -99.25    38.25   -14.49
 -35.88    38.1     13.      29.22   -57.06   -44.7      6.535  -44.6
 -76.3     91.7     36.66    83.9     66.     -81.25   -50.06    68.
   2.705  -51.72    66.9     49.03    15.76     9.37    33.2     99.56
 -20.55    83.3    -57.1     37.06    68.94   -91.9    -46.06   -92.7
  64.4      8.164    8.98    10.76   -75.6     26.94    46.8     62.
   8.734  -69.25   -70.2    -59.      67.25    87.6     48.72    60.16
  19.39    48.62    21.64    25.06     1.013  -36.6    -46.28   -29.14
  67.44    56.7     32.03   -28.81   -94.44    49.6      0.583  -84.4
 -51.53   -43.      66.     -68.      77.44   -50.16   -90.4    -46.22
  90.25    88.      79.25   -40.84   -71.7    -27.03    19.53    85.44
  45.06    60.72    19.22   -28.95   -47.72    97.8    -51.6     31.42
  31.75   -21.84   -71.4     77.9     43.12    35.66   -50.84   -52.
 -48.84   -53.97   -59.56    31.2    -64.3    -10.47    86.25   -84.44
 -56.4    -63.03   -99.9     54.44    40.72    74.94     8.305   18.52
 -47.34   -74.06    79.1     92.44    84.94   -98.7    -41.06   -80.2
 -71.06    89.06    96.2    -19.83   -51.03   -92.      82.25   -75.75
  58.66    22.72   -89.06   -83.06   -73.5     18.75    -0.939  -96.4
  50.12   -73.9    -56.97    52.34   -95.56    11.02   -46.3    -52.2
  -8.46    80.56    77.     -51.72    38.8    -66.44   -69.     -30.33
 -53.3      5.406   74.8     52.25   -35.88    92.5     51.38    40.47
  43.94   -29.05    89.7    -74.5    -83.5     81.75   -56.6    -13.625
  86.9     -4.58   -67.5     -6.67   -59.53   -30.4    -91.75   -84.3
 -66.6    -28.61   -13.79   -70.75   -90.2    -47.94    59.56    84.2
   0.7085 -57.44   -24.94   -11.875  -90.4     54.22   -44.16   -36.34
 -31.64    72.1    -81.25    75.8     93.9    -28.28   -20.53    90.2
 -58.97   -95.7     59.22   -37.8     94.9    -86.7     36.16    26.47  ]
Input (dstLocal):
[-97.94773    -61.303955    32.56878    -87.50743    -78.92147
  59.20739     50.336506    49.039738   -76.2525       0.25441223
 -71.73807      6.481831   -55.5052     -51.057415    31.403702
  63.285076    98.1897      86.71727    -50.16466     88.94256
  72.111435     8.4164915   34.524082    73.14016      4.838548
  69.67902    -97.855736    90.358696     9.051491    37.595695
 -66.01661    -97.110634    82.84477     69.46122     25.561102
  47.926853   -10.202202    78.2545      31.339691    12.940468
 -31.499294    -3.351652    62.46355     45.0427     -86.02812
 -43.48385    -62.274956   -36.077827    51.81446     32.47797
  59.10228     68.18655      9.3604145  -76.47674    -50.29268
  94.496346    30.837933   -48.315712   -44.92399    -62.369625
  47.578724    84.84092    -66.64584     88.376434    95.05615
 -92.37309      3.0038757   85.21814     -6.688882    97.74142
  20.733965    -5.62451     69.6166     -64.435455    94.09325
 -63.13334     89.150345   -17.61865     32.776333    27.28345
  31.288876    -9.983517   -46.39662    -37.025536    47.853374
 -30.384796   -79.801544   -11.131944   -36.417023    84.25002
 -74.19904    -86.72338     -6.5878353   26.253004   -28.112898
 -64.88305    -40.56897    -65.849686    22.276798    -3.356709
 -78.41364    -67.26924    -10.346288   -43.172684    10.149812
 -22.575602   -28.780804   -64.24396    -14.579756   -30.369322
 -59.28742    -37.098255    31.078829    29.901808    50.531147
 -88.35735    -45.65366     -6.7495203    6.8026304   56.172153
  -0.8727364    9.618746    89.294815    75.4403      81.63827
 -61.722088   -72.85743      9.296161   -69.17855      2.3497865
  20.234892   -13.279363   -44.531677    55.188084   -45.736256
 -30.018398    27.09971     28.841034    35.764072    21.457811
 -15.206495    94.05271     79.9942     -36.39198     38.40136
   5.2365685  -11.435508    67.15551     87.03286      7.9285994
  78.32062     97.863335   -28.68556    -72.658554   -79.39075
 -82.65206     39.52689    -22.053177    30.602457   -26.158005
  49.83525    -72.24563    -97.10148     54.803936    65.070786
 -57.019573    35.972733     6.694148   -74.88097    -71.13884
 -84.549545   -26.875593    -3.2775877   -8.592472    -5.248627
 -22.2127      98.26377    -51.741936   -69.48398    -47.230175
  92.72371     18.192408   -39.66745     44.556633   -21.733562
  15.191482     5.9535656   41.23602     89.30139    -32.57541
 -47.595608   -50.371124   -87.899666    57.644466    38.85747
  47.65093     49.42874    -32.424126   -22.5012      78.78245
 -70.6598     -87.218544    50.347565    55.945244    -3.4658287
  17.902784   -30.977674    53.424767   -82.00753      2.9060571
  -1.010124   -94.316765    13.186674   -52.089214    58.975357
  48.281635    26.436571   -27.11565     89.21593    -10.962796
  49.347828    21.556795    78.163956    35.06028     10.803711
  53.231297   -44.78757     -0.6473386   26.717777    63.757347
  -4.90904     21.724916    37.443634   -89.250656    62.98874
  72.13095    -12.19138     84.16487     71.54008    -73.41178
 -97.612564    39.947853    -1.3887504   -5.6196795  -54.509125
 -28.877354    26.259935    42.28702    -38.848114   -76.46558
 -91.69401     71.27111     89.36143    -65.70425    -31.810083
  82.811226  ]
Output (dstLocal):
[ 4.71345850e+03  2.46985229e+03 -8.27969437e+01 -3.20867920e+03
 -1.76620471e+03  4.21270752e+03  3.79388721e+03 -2.61010083e+03
 -1.21815369e+02  2.34421533e+03 -2.18809741e+03 -7.80661182e+03
  2.93029968e+02 -3.78187769e+03 -9.21200317e+02 -2.41682422e+03
 -6.06243945e+03 -1.66648096e+03 -1.05372913e+03 -5.46234668e+03
 -1.13550574e+03  2.96143213e+03  6.63022705e+03 -1.58223096e+03
  1.55008167e+03  9.32014355e+03 -1.14580493e+03  7.72311829e+02
 -3.61687036e+03 -1.94723633e+03  3.08941895e+03  6.17864697e+03
  6.91487598e+03  2.43282837e+03  8.64538574e+03 -3.90718848e+03
 -7.30345764e+02 -1.01493103e+03 -6.13950391e+03  8.10182877e+01
  2.13901343e+03  3.01947266e+03 -6.60941162e+03 -3.85254059e+02
  5.68450098e+03  3.24727393e+03 -6.57540588e+02  5.91162170e+02
 -1.78790039e+03  1.55979932e+03  1.14135229e+03  1.52090625e+03
 -3.15582520e+03  2.32268335e+03  6.43788910e+01  1.70265845e+03
 -4.77684961e+03  5.37557275e+03 -2.49401978e+03 -8.17899902e+02
  1.73470374e+03 -3.51301074e+03 -1.17427869e+03 -2.21299854e+03
  8.74725342e+00  3.74451172e+03  5.34882422e+03 -1.62781116e+03
  1.09463263e+01  2.70595306e+02 -3.05740674e+03 -8.29420215e+03
 -8.06836060e+02 -6.53156836e+03 -1.80605811e+03 -3.68566055e+01
 -1.24112793e+03 -4.10031396e+03 -2.05299121e+03  3.12362524e+03
 -3.56968774e+03 -3.54660034e+02 -3.84981323e+02  3.86899506e+02
 -6.55042773e+03  5.94228516e+02  1.51913794e+03  3.17024316e+03
 -2.29938019e+02 -9.70730469e+02 -4.21526172e+03  1.12001099e+03
 -4.30428320e+03  3.69281152e+03 -7.41005249e+02 -4.93377930e+03
  8.86545288e+02 -1.85737708e+03  2.05545837e+02 -1.52355396e+03
 -1.04807014e+02  1.49825708e+03 -1.42192444e+03  2.61773071e+03
  3.77611279e+03 -4.86581396e+03 -3.21388818e+03 -2.02889624e+03
  6.75540869e+03  1.33113416e+03 -6.59783478e+01  4.01553857e+03
 -3.62763989e+03 -3.04459814e+03 -3.85584375e+03 -1.08604480e+03
  6.09126807e+03 -1.64623193e+03  4.90682227e+03 -2.29084198e+02
 -6.31273193e+03 -4.32163135e+03  7.03852930e+03  2.58860718e+03
 -2.51703369e+03  1.50186682e+03 -1.66953711e+03 -2.11329175e+03
 -1.64636609e+03 -3.78877710e+03 -8.87250488e+02 -5.90984680e+02
 -1.05408154e+03 -3.03201904e+03 -7.36205750e+02  2.24413989e+03
 -2.00688464e+03  1.98931763e+03  2.04653162e+03  2.70084448e+03
 -2.95040186e+03 -1.57956250e+03 -7.36683533e+02 -3.44564209e+03
 -1.15952522e+02  3.23661548e+03  1.95226562e+03  1.02470142e+03
 -5.66893604e+03 -1.66441513e+02  2.23861353e+03  2.65748840e+02
 -3.25920044e+02  1.38592395e+03  2.60631055e+03 -1.42827905e+03
  9.76226807e+01 -1.10571973e+03  7.10548767e+02 -1.13593823e+03
 -3.20208862e+03  6.08882861e+03 -6.06609375e+03  8.24707715e+03
  2.41146924e+03  5.67302344e+03  1.51807239e+03  3.97848120e+03
 -2.04575500e+03  7.89995508e+03 -5.03820557e+03 -1.81140686e+03
 -3.47021313e+03  6.50615771e+03  1.98545837e+03  5.72058398e+03
 -5.57672852e+03 -5.66463623e+02 -3.01132959e+03 -5.69939880e+02
  6.52397363e+03  7.03739258e+02 -7.35288696e+01  7.44736133e+03
  6.77963409e+01 -6.10719922e+03 -4.98593311e+03 -3.30549243e+03
  5.20452515e+02 -1.01435034e+03  2.54879883e+03 -3.97421875e+03
  1.81924927e+02  2.26807812e+03  2.75070117e+03  1.45375439e+03
  1.50611035e+03 -6.47270947e+03  5.72174902e+03  1.58286792e+03
 -1.76499768e+03 -3.58882599e+02  4.92718750e+03  3.70691406e+03
 -2.26471191e+03  4.92040283e+03 -3.63364966e+03 -2.26214648e+03
 -6.08914246e+02  6.75068909e+02  3.97046460e+03  5.66546436e+03
 -6.43718848e+03  8.31193970e+02 -8.63325928e+02  1.36479724e+03
 -8.21582910e+03 -1.83201096e+02  2.80609082e+03  6.54139771e+02
  4.15837097e+02 -1.34577429e+03 -7.50841406e+03 -4.27278174e+03
  3.56066699e+03 -2.39108228e+03  1.07126465e+03  3.32460278e+03
  4.84893066e+03  1.75205615e+03  4.56651270e+03 -2.36709546e+03
  5.62077103e+01  3.76265161e+03 -7.91899902e+02  7.20431763e+02
 -1.95666602e+03 -2.02528333e+03 -3.44992090e+03 -1.95192932e+03
  1.15021460e+03  3.54251807e+03  7.44822070e+03 -3.34610791e+03
  8.66413184e+03 -3.62286774e+02 -1.58040115e+02 -4.15344086e+02
 -7.68586426e+02  2.29329517e+03 -1.70910608e+03 -2.80956885e+03
  3.86657227e+03  4.07690765e+02  8.78310547e+02  1.32684253e+03]