MulAddDst
Applicability
| Product | Supported/Unsupported |
|---|---|
|  | √ |
|  | √ |
|  | √ |
|  | √ |
|  | x |
|  | x |
Function Usage
Multiplies src0 and src1 element-wise, adds each product to the corresponding element of dst, and stores the result in dst. The formula is as follows:

$$dst_i = src0_i \times src1_i + dst_i$$

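To make the element-wise semantics concrete, the following is a minimal host-side reference model of the count-based form, dst[i] = src0[i] * src1[i] + dst[i] for the first count elements. `MulAddDstRef` is a hypothetical helper, not part of Ascend C. It uses plain pointers instead of LocalTensor and assumes the source values are converted to the destination type T before the multiply.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side reference model of MulAddDst (not an Ascend C API).
// Computes dst[i] = src0[i] * src1[i] + dst[i] for 0 <= i < count.
// Assumption: sources are widened to the destination type T before multiplying.
template <typename T, typename U>
void MulAddDstRef(T* dst, const U* src0, const U* src1, int32_t count)
{
    assert(count >= 0);
    for (int32_t i = 0; i < count; ++i) {
        // Element-wise product, accumulated into the existing dst value.
        dst[i] = static_cast<T>(src0[i]) * static_cast<T>(src1[i]) + dst[i];
    }
}
```

For example, with float sources {1, 2, -3} and {4, 0.5, 2} and a double dst initialized to 10, the model produces {14, 11, 4}.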
Prototype
- Computation on the first n elements of a tensor

  ```cpp
  template <typename T, typename U>
  __aicore__ inline void MulAddDst(const LocalTensor<T>& dst, const LocalTensor<U>& src0, const LocalTensor<U>& src1, const int32_t& count)
  ```

- High-dimensional tensor sharding computation
  - Bitwise mask mode

    ```cpp
    template <typename T, typename U, bool isSetMask = true>
    __aicore__ inline void MulAddDst(const LocalTensor<T>& dst, const LocalTensor<U>& src0, const LocalTensor<U>& src1, const uint64_t mask[], const uint8_t repeatTime, const BinaryRepeatParams& repeatParams)
    ```

  - Contiguous mask mode

    ```cpp
    template <typename T, typename U, bool isSetMask = true>
    __aicore__ inline void MulAddDst(const LocalTensor<T>& dst, const LocalTensor<U>& src0, const LocalTensor<U>& src1, uint64_t mask, const uint8_t repeatTime, const BinaryRepeatParams& repeatParams)
    ```

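The two high-dimensional forms differ only in how mask is interpreted. The sketch below models which elements of a single iteration are enabled, under stated assumptions: in contiguous mode the first `mask` elements participate; in bitwise mode bit k of `mask[k / 64]` enables element k, so for 64-element iterations (32-bit destination, per the PAR column of the data type table) only `mask[0]` matters. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of mask interpretation within one iteration (not an Ascend C API).
// elemsPerRepeat is the PAR value: 128 for 16-bit iterations, 64 for 32-bit ones.

// Contiguous mode (assumption): the first `mask` elements of the iteration participate.
std::vector<bool> ContiguousMaskLanes(uint64_t mask, int elemsPerRepeat)
{
    assert(mask <= static_cast<uint64_t>(elemsPerRepeat));
    std::vector<bool> lanes(elemsPerRepeat, false);
    for (int k = 0; k < elemsPerRepeat; ++k) {
        lanes[k] = static_cast<uint64_t>(k) < mask;
    }
    return lanes;
}

// Bitwise mode (assumption): bit (k % 64) of mask[k / 64] enables element k.
std::vector<bool> BitwiseMaskLanes(const uint64_t mask[2], int elemsPerRepeat)
{
    assert(elemsPerRepeat == 64 || elemsPerRepeat == 128);
    std::vector<bool> lanes(elemsPerRepeat, false);
    for (int k = 0; k < elemsPerRepeat; ++k) {
        lanes[k] = ((mask[k / 64] >> (k % 64)) & 1u) != 0;
    }
    return lanes;
}
```

Under this model, `mask = 64` and `mask[2] = { UINT64_MAX, 0 }` select the same 64 elements, which is why the two half-to-float high-dimensional examples in this section compute the same result.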
Parameters
| Parameter | Description |
|---|---|
| T | Data type of the destination operand. For details about the data type constraints on the destination and source operands, see Table 3. |
| U | Data type of the source operands. |
| isSetMask | Indicates whether to set the mask inside the API. |

| Parameter | Input/Output | Description |
|---|---|---|
| dst | Output | Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| src0, src1 | Input | Source operands. Their data type can differ from that of the destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| count | Input | Number of elements involved in the computation. |
| mask[]/mask | Input | Controls which elements participate in the computation in each iteration. |
| repeatTime | Input | Number of iteration repeats. In each iteration the vector unit reads 256 bytes of contiguous data, so processing the complete input takes multiple iterations; repeatTime specifies how many. For details, see High-dimensional Sharding APIs. |
| repeatParams | Input | Parameters of type BinaryRepeatParams that control operand address strides: the stride of each operand for the same DataBlock between adjacent iterations, and the stride between DataBlocks within a single iteration. For details about the stride between adjacent iterations, see repeatStride; for details about the stride between DataBlocks in the same iteration, see dataBlockStride. |
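The following sketch models how repeatParams strides translate into byte offsets from an operand's start address. It assumes a DataBlock is 32 bytes and that both the block stride and the repeat stride are counted in DataBlocks; `ElementOffset` is a hypothetical helper, not an Ascend C API.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kBlockBytes = 32;  // assumed size of one DataBlock

// Hypothetical model: byte offset of element `elemInRepeat` of iteration `repeat`.
// blkStride: stride between DataBlocks within one iteration (in DataBlocks).
// repStride: stride between the same DataBlock of adjacent iterations (in DataBlocks).
uint32_t ElementOffset(uint32_t repeat, uint32_t elemInRepeat, uint32_t elemBytes,
                       uint32_t blkStride, uint32_t repStride)
{
    assert(elemBytes > 0 && kBlockBytes % elemBytes == 0);
    const uint32_t elemsPerBlock = kBlockBytes / elemBytes;
    const uint32_t block = elemInRepeat / elemsPerBlock;  // DataBlock index within the iteration
    const uint32_t lane = elemInRepeat % elemsPerBlock;   // element index within that DataBlock
    return repeat * repStride * kBlockBytes + block * blkStride * kBlockBytes + lane * elemBytes;
}
```

With the strides used in the half-to-float examples ({ 1, 1, 1, 8, 4, 4 }), the model places the 64 float results of consecutive iterations 256 bytes apart and the 64 half inputs 128 bytes apart, so the four iterations cover each operand contiguously.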

Table 3 Supported data type combinations

| src0 Data Type | src1 Data Type | dst Data Type | PAR (elements per iteration) | Availability |
|---|---|---|---|---|
| half | half | half | 128 | |
| float | float | float | 64 | |
| half | half | float | 64 | |
| int16_t | int16_t | int16_t | 128 | |
| uint16_t | uint16_t | uint16_t | 128 | |
| int32_t | int32_t | int32_t | 64 | |
| uint32_t | uint32_t | uint32_t | 64 | |

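Every PAR value in the data type table above equals 256 bytes (one iteration) divided by the size of the wider operand type, which gives a simple way to derive repeatTime for a given element count. The helpers below are a hypothetical sketch of that arithmetic, not Ascend C APIs.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kRepeatBytes = 256;  // bytes processed per iteration

// PAR: 256 bytes divided by the width of the wider operand type
// (the float destination in the half -> float case).
constexpr uint32_t ElemsPerRepeat(uint32_t srcBytes, uint32_t dstBytes)
{
    return kRepeatBytes / (srcBytes > dstBytes ? srcBytes : dstBytes);
}

// Hypothetical helper: number of iterations needed to cover `count` elements.
uint32_t RepeatTimesFor(uint32_t count, uint32_t srcBytes, uint32_t dstBytes)
{
    assert(srcBytes > 0 && dstBytes > 0);
    const uint32_t par = ElemsPerRepeat(srcBytes, dstBytes);
    assert(par > 0);
    const uint32_t repeats = (count + par - 1) / par;  // round up to whole iterations
    assert(repeats <= 255);  // repeatTime is a uint8_t in the prototypes
    return repeats;
}
```

For 256 elements with half sources and a float destination, this yields PAR = 64 and repeatTime = 4, matching the values used in the examples below.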
Returns
None
Constraints
- For details about the operand address alignment requirements, see General Address Alignment Restrictions.
- For details about the restrictions on overlapping operand addresses, see General Address Overlap Restrictions. In particular, when the source operands are of the half type and the destination operand is of the float type, even 100% overlap between the source and destination operands is not allowed, so their addresses must not overlap at all.
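A minimal host-side pre-check can cover two of the rules on this page: the 32-byte start-address alignment required in the parameter table and, for half sources with a float destination, the no-overlap rule above. The sketch assumes operand positions are expressed as byte offsets within one buffer. `OperandRange` and the functions are hypothetical, and the general alignment and overlap restrictions referenced above may impose further rules that this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of an operand: byte offset and byte length it touches.
struct OperandRange {
    uint64_t start;  // start offset of the LocalTensor
    uint64_t bytes;  // bytes read or written by the computation
};

bool IsAligned32(const OperandRange& r) { return r.start % 32 == 0; }

// True if the half-open ranges [start, start + bytes) intersect.
bool Overlaps(const OperandRange& a, const OperandRange& b)
{
    return a.start < b.start + b.bytes && b.start < a.start + a.bytes;
}

// half -> float case: all operands aligned, and no source overlaps dst.
bool MixedPrecisionOperandsOk(const OperandRange& dst, const OperandRange& src0,
                              const OperandRange& src1)
{
    return IsAligned32(dst) && IsAligned32(src0) && IsAligned32(src1) &&
           !Overlaps(dst, src0) && !Overlaps(dst, src1);
}
```

For 256 elements, a float dst spans 1024 bytes and each half source spans 512 bytes, so placing them back to back passes the check, while reusing dst's address for a source fails it.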
Examples
- High-dimensional sharding API, contiguous mask mode (source operands: half; destination operand: float)

  ```cpp
  uint64_t mask = 64;
  // repeatTime = 4: 64 elements are computed per iteration, 256 elements in total.
  // dstBlkStride, src0BlkStride, src1BlkStride = 1: data is read and written contiguously within each iteration.
  // dstRepStride = 8, src0RepStride, src1RepStride = 4: data is read and written contiguously across adjacent iterations.
  AscendC::MulAddDst(dstLocal, src0Local, src1Local, mask, 4, { 1, 1, 1, 8, 4, 4 });
  ```

- High-dimensional sharding API, bitwise mask mode (source operands: half; destination operand: float)

  ```cpp
  uint64_t mask[2] = { UINT64_MAX, 0 };
  // repeatTime = 4: 64 elements are computed per iteration, 256 elements in total.
  // dstBlkStride, src0BlkStride, src1BlkStride = 1: data is read and written contiguously within each iteration.
  // dstRepStride = 8, src0RepStride, src1RepStride = 4: data is read and written contiguously across adjacent iterations.
  AscendC::MulAddDst(dstLocal, src0Local, src1Local, mask, 4, { 1, 1, 1, 8, 4, 4 });
  ```

- Computing the first n elements of a tensor (source operands: half; destination operand: float)

  ```cpp
  AscendC::MulAddDst(dstLocal, src0Local, src1Local, 256);
  ```

Input (src0Local): [-83. 58.2 -14.28 -43.12 20.72 -79.9 54.16 31.56 -1.464 68.25 -28.31 -93.5 -4.2 -46.56 -22.23 78.5 -69.56 -37.03 -53.12 58.28 -71.56 -34.44 85.94 96.3 66.06 99.94 -45.94 8.75 -93.9 35.56 82.56 -70.8 -68.75 -35.4 95.3 -49.1 -56.34 86.75 90.25 24.17 79.06 -49.66 -95.3 -6.965 -63.72 -33.16 -15.56 -43.28 51.28 40.1 83.25 49.72 55.47 -53.7 17.55 -36.06 63. 59.16 -66.8 -9.01 25.56 44.28 22.12 -33.84 -31.9 -74.2 79.94 -34.94 1.119 18.45 -92.75 -83.25 42.66 -77.6 33.28 0.709 -19.3 44.44 45.28 -33.4 -55.94 -42.22 -37.72 39.4 87.25 23.19 34.16 51.3 -22.16 15.234 59. -20.45 -63.9 41.84 -14.63 -80.94 47.8 -36.84 8.47 -60.66 -26.06 -42.78 30.5 -91.3 55.84 -85.44 -99.44 68.2 -71.7 27.45 -11.48 -48.03 71. 71.5 -59.2 14.67 79.25 32.7 -54.22 6.17 -69.94 -49.22 87.7 -61.53 36.25 -57.84 -81.75 -24.84 -35. -62.44 -47.22 19.95 21.16 -31.56 13.38 72.4 -64.06 -89.75 -28.17 34.4 -68.06 -46.94 16.06 65.56 3.16 -59.88 -32.97 30.69 89.5 16.66 25.05 -1.988 5.27 -23.14 -26.89 -24.72 1.427 -14.46 81.9 -59.94 68.7 -83.2 -75.44 88.6 27.62 -58.06 -36.1 -49.53 27.73 89.5 -51.5 90. 67.94 -70.8 24.2 -75.8 -96.75 -22.66 33.03 6.293 -87.5 36.56 36.06 -76.8 1.786 82.9 87.6 -63.94 -4.51 -89.06 -56.06 75.2 -31.89 27.44 35.22 -27.19 37.53 96.94 -83.25 -49.6 31.78 -50.25 65.2 69.9 63.03 53. -70.1 -57.22 -11.99 -23.14 44.28 -77.3 77.25 10.805 16.3 -96.6 -94.9 34.1 -40.25 -99.7 -6.156 44.97 82.7 51.1 -53.28 85.44 -80.94 -47. -53.47 -35.22 76.75 -28.38 26.48 -67.06 34.28 -54.6 21.52 -38.9 79.75 51.7 -39.44 48.56 -91.7 -44.06 92.9 11.79 8.98 -5.074 12.375 -24.77 -27.31 76.2 39.8 -5.46 25.17 47. ] Input (src1Local): [-57.97 43.5 8.08 72.4 -81.44 -52. 69.1 -84.25 31.12 34.34 74.75 83.56 -83. 80.1 42.84 -31.6 88.56 47.34 18.89 -95.25 16.88 -85.75 76.75 -17.19 23.39 92.56 22.81 77.94 38.62 -55.8 38.22 -88.6 -99.4 -66.75 90.44 80.56 12.78 -12.6 -68.4 2.816 27.45 -60.88 70. 61.78 -90.56 -99.25 38.25 -14.49 -35.88 38.1 13. 29.22 -57.06 -44.7 6.535 -44.6 -76.3 91.7 36.66 83.9 66. -81.25 -50.06 68. 
2.705 -51.72 66.9 49.03 15.76 9.37 33.2 99.56 -20.55 83.3 -57.1 37.06 68.94 -91.9 -46.06 -92.7 64.4 8.164 8.98 10.76 -75.6 26.94 46.8 62. 8.734 -69.25 -70.2 -59. 67.25 87.6 48.72 60.16 19.39 48.62 21.64 25.06 1.013 -36.6 -46.28 -29.14 67.44 56.7 32.03 -28.81 -94.44 49.6 0.583 -84.4 -51.53 -43. 66. -68. 77.44 -50.16 -90.4 -46.22 90.25 88. 79.25 -40.84 -71.7 -27.03 19.53 85.44 45.06 60.72 19.22 -28.95 -47.72 97.8 -51.6 31.42 31.75 -21.84 -71.4 77.9 43.12 35.66 -50.84 -52. -48.84 -53.97 -59.56 31.2 -64.3 -10.47 86.25 -84.44 -56.4 -63.03 -99.9 54.44 40.72 74.94 8.305 18.52 -47.34 -74.06 79.1 92.44 84.94 -98.7 -41.06 -80.2 -71.06 89.06 96.2 -19.83 -51.03 -92. 82.25 -75.75 58.66 22.72 -89.06 -83.06 -73.5 18.75 -0.939 -96.4 50.12 -73.9 -56.97 52.34 -95.56 11.02 -46.3 -52.2 -8.46 80.56 77. -51.72 38.8 -66.44 -69. -30.33 -53.3 5.406 74.8 52.25 -35.88 92.5 51.38 40.47 43.94 -29.05 89.7 -74.5 -83.5 81.75 -56.6 -13.625 86.9 -4.58 -67.5 -6.67 -59.53 -30.4 -91.75 -84.3 -66.6 -28.61 -13.79 -70.75 -90.2 -47.94 59.56 84.2 0.7085 -57.44 -24.94 -11.875 -90.4 54.22 -44.16 -36.34 -31.64 72.1 -81.25 75.8 93.9 -28.28 -20.53 90.2 -58.97 -95.7 59.22 -37.8 94.9 -86.7 36.16 26.47 ] Input (dstLocal): [-97.94773 -61.303955 32.56878 -87.50743 -78.92147 59.20739 50.336506 49.039738 -76.2525 0.25441223 -71.73807 6.481831 -55.5052 -51.057415 31.403702 63.285076 98.1897 86.71727 -50.16466 88.94256 72.111435 8.4164915 34.524082 73.14016 4.838548 69.67902 -97.855736 90.358696 9.051491 37.595695 -66.01661 -97.110634 82.84477 69.46122 25.561102 47.926853 -10.202202 78.2545 31.339691 12.940468 -31.499294 -3.351652 62.46355 45.0427 -86.02812 -43.48385 -62.274956 -36.077827 51.81446 32.47797 59.10228 68.18655 9.3604145 -76.47674 -50.29268 94.496346 30.837933 -48.315712 -44.92399 -62.369625 47.578724 84.84092 -66.64584 88.376434 95.05615 -92.37309 3.0038757 85.21814 -6.688882 97.74142 20.733965 -5.62451 69.6166 -64.435455 94.09325 -63.13334 89.150345 -17.61865 32.776333 27.28345 31.288876 -9.983517 
-46.39662 -37.025536 47.853374 -30.384796 -79.801544 -11.131944 -36.417023 84.25002 -74.19904 -86.72338 -6.5878353 26.253004 -28.112898 -64.88305 -40.56897 -65.849686 22.276798 -3.356709 -78.41364 -67.26924 -10.346288 -43.172684 10.149812 -22.575602 -28.780804 -64.24396 -14.579756 -30.369322 -59.28742 -37.098255 31.078829 29.901808 50.531147 -88.35735 -45.65366 -6.7495203 6.8026304 56.172153 -0.8727364 9.618746 89.294815 75.4403 81.63827 -61.722088 -72.85743 9.296161 -69.17855 2.3497865 20.234892 -13.279363 -44.531677 55.188084 -45.736256 -30.018398 27.09971 28.841034 35.764072 21.457811 -15.206495 94.05271 79.9942 -36.39198 38.40136 5.2365685 -11.435508 67.15551 87.03286 7.9285994 78.32062 97.863335 -28.68556 -72.658554 -79.39075 -82.65206 39.52689 -22.053177 30.602457 -26.158005 49.83525 -72.24563 -97.10148 54.803936 65.070786 -57.019573 35.972733 6.694148 -74.88097 -71.13884 -84.549545 -26.875593 -3.2775877 -8.592472 -5.248627 -22.2127 98.26377 -51.741936 -69.48398 -47.230175 92.72371 18.192408 -39.66745 44.556633 -21.733562 15.191482 5.9535656 41.23602 89.30139 -32.57541 -47.595608 -50.371124 -87.899666 57.644466 38.85747 47.65093 49.42874 -32.424126 -22.5012 78.78245 -70.6598 -87.218544 50.347565 55.945244 -3.4658287 17.902784 -30.977674 53.424767 -82.00753 2.9060571 -1.010124 -94.316765 13.186674 -52.089214 58.975357 48.281635 26.436571 -27.11565 89.21593 -10.962796 49.347828 21.556795 78.163956 35.06028 10.803711 53.231297 -44.78757 -0.6473386 26.717777 63.757347 -4.90904 21.724916 37.443634 -89.250656 62.98874 72.13095 -12.19138 84.16487 71.54008 -73.41178 -97.612564 39.947853 -1.3887504 -5.6196795 -54.509125 -28.877354 26.259935 42.28702 -38.848114 -76.46558 -91.69401 71.27111 89.36143 -65.70425 -31.810083 82.811226 ] Output (dstLocal): [ 4.71345850e+03 2.46985229e+03 -8.27969437e+01 -3.20867920e+03 -1.76620471e+03 4.21270752e+03 3.79388721e+03 -2.61010083e+03 -1.21815369e+02 2.34421533e+03 -2.18809741e+03 -7.80661182e+03 2.93029968e+02 -3.78187769e+03 
-9.21200317e+02 -2.41682422e+03 -6.06243945e+03 -1.66648096e+03 -1.05372913e+03 -5.46234668e+03 -1.13550574e+03 2.96143213e+03 6.63022705e+03 -1.58223096e+03 1.55008167e+03 9.32014355e+03 -1.14580493e+03 7.72311829e+02 -3.61687036e+03 -1.94723633e+03 3.08941895e+03 6.17864697e+03 6.91487598e+03 2.43282837e+03 8.64538574e+03 -3.90718848e+03 -7.30345764e+02 -1.01493103e+03 -6.13950391e+03 8.10182877e+01 2.13901343e+03 3.01947266e+03 -6.60941162e+03 -3.85254059e+02 5.68450098e+03 3.24727393e+03 -6.57540588e+02 5.91162170e+02 -1.78790039e+03 1.55979932e+03 1.14135229e+03 1.52090625e+03 -3.15582520e+03 2.32268335e+03 6.43788910e+01 1.70265845e+03 -4.77684961e+03 5.37557275e+03 -2.49401978e+03 -8.17899902e+02 1.73470374e+03 -3.51301074e+03 -1.17427869e+03 -2.21299854e+03 8.74725342e+00 3.74451172e+03 5.34882422e+03 -1.62781116e+03 1.09463263e+01 2.70595306e+02 -3.05740674e+03 -8.29420215e+03 -8.06836060e+02 -6.53156836e+03 -1.80605811e+03 -3.68566055e+01 -1.24112793e+03 -4.10031396e+03 -2.05299121e+03 3.12362524e+03 -3.56968774e+03 -3.54660034e+02 -3.84981323e+02 3.86899506e+02 -6.55042773e+03 5.94228516e+02 1.51913794e+03 3.17024316e+03 -2.29938019e+02 -9.70730469e+02 -4.21526172e+03 1.12001099e+03 -4.30428320e+03 3.69281152e+03 -7.41005249e+02 -4.93377930e+03 8.86545288e+02 -1.85737708e+03 2.05545837e+02 -1.52355396e+03 -1.04807014e+02 1.49825708e+03 -1.42192444e+03 2.61773071e+03 3.77611279e+03 -4.86581396e+03 -3.21388818e+03 -2.02889624e+03 6.75540869e+03 1.33113416e+03 -6.59783478e+01 4.01553857e+03 -3.62763989e+03 -3.04459814e+03 -3.85584375e+03 -1.08604480e+03 6.09126807e+03 -1.64623193e+03 4.90682227e+03 -2.29084198e+02 -6.31273193e+03 -4.32163135e+03 7.03852930e+03 2.58860718e+03 -2.51703369e+03 1.50186682e+03 -1.66953711e+03 -2.11329175e+03 -1.64636609e+03 -3.78877710e+03 -8.87250488e+02 -5.90984680e+02 -1.05408154e+03 -3.03201904e+03 -7.36205750e+02 2.24413989e+03 -2.00688464e+03 1.98931763e+03 2.04653162e+03 2.70084448e+03 -2.95040186e+03 -1.57956250e+03 
-7.36683533e+02 -3.44564209e+03 -1.15952522e+02 3.23661548e+03 1.95226562e+03 1.02470142e+03 -5.66893604e+03 -1.66441513e+02 2.23861353e+03 2.65748840e+02 -3.25920044e+02 1.38592395e+03 2.60631055e+03 -1.42827905e+03 9.76226807e+01 -1.10571973e+03 7.10548767e+02 -1.13593823e+03 -3.20208862e+03 6.08882861e+03 -6.06609375e+03 8.24707715e+03 2.41146924e+03 5.67302344e+03 1.51807239e+03 3.97848120e+03 -2.04575500e+03 7.89995508e+03 -5.03820557e+03 -1.81140686e+03 -3.47021313e+03 6.50615771e+03 1.98545837e+03 5.72058398e+03 -5.57672852e+03 -5.66463623e+02 -3.01132959e+03 -5.69939880e+02 6.52397363e+03 7.03739258e+02 -7.35288696e+01 7.44736133e+03 6.77963409e+01 -6.10719922e+03 -4.98593311e+03 -3.30549243e+03 5.20452515e+02 -1.01435034e+03 2.54879883e+03 -3.97421875e+03 1.81924927e+02 2.26807812e+03 2.75070117e+03 1.45375439e+03 1.50611035e+03 -6.47270947e+03 5.72174902e+03 1.58286792e+03 -1.76499768e+03 -3.58882599e+02 4.92718750e+03 3.70691406e+03 -2.26471191e+03 4.92040283e+03 -3.63364966e+03 -2.26214648e+03 -6.08914246e+02 6.75068909e+02 3.97046460e+03 5.66546436e+03 -6.43718848e+03 8.31193970e+02 -8.63325928e+02 1.36479724e+03 -8.21582910e+03 -1.83201096e+02 2.80609082e+03 6.54139771e+02 4.15837097e+02 -1.34577429e+03 -7.50841406e+03 -4.27278174e+03 3.56066699e+03 -2.39108228e+03 1.07126465e+03 3.32460278e+03 4.84893066e+03 1.75205615e+03 4.56651270e+03 -2.36709546e+03 5.62077103e+01 3.76265161e+03 -7.91899902e+02 7.20431763e+02 -1.95666602e+03 -2.02528333e+03 -3.44992090e+03 -1.95192932e+03 1.15021460e+03 3.54251807e+03 7.44822070e+03 -3.34610791e+03 8.66413184e+03 -3.62286774e+02 -1.58040115e+02 -4.15344086e+02 -7.68586426e+02 2.29329517e+03 -1.70910608e+03 -2.80956885e+03 3.86657227e+03 4.07690765e+02 8.78310547e+02 1.32684253e+03]
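As a spot check of the sample data, the sketch below recomputes the first three output elements on the host. It assumes the printed inputs are half values (for example, -57.97 is stored as -57.96875), that they are widened to float before multiplying, and that the float product is added to the float dst. `RoundToHalf` and `MulAddDstElement` are hypothetical helpers; `RoundToHalf` handles only finite, normal-range values and ignores half subnormals and overflow.

```cpp
#include <cassert>
#include <cmath>

// Round a float to the nearest IEEE 754 half-precision value (11 significant bits).
// Simplified: assumes a finite value within the normal half range.
float RoundToHalf(float x)
{
    assert(std::isfinite(x));
    if (x == 0.0f) return x;
    int exp = 0;
    (void)std::frexp(x, &exp);                         // x = m * 2^exp, 0.5 <= |m| < 1
    const float quantum = std::ldexp(1.0f, exp - 11);  // spacing of half values near x
    return std::nearbyint(x / quantum) * quantum;      // round to nearest, ties to even
}

// One output element: half sources widened to float, multiplied, added to float dst.
float MulAddDstElement(float src0, float src1, float dst)
{
    return RoundToHalf(src0) * RoundToHalf(src1) + dst;
}
```

Under these assumptions the documented outputs (4713.4585, 2469.8523, -82.79694) are reproduced to within rounding error. Rounding the product to half before the addition would instead give about 4714.05 for the first element, so the sample output is consistent with the product being formed in float.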