The whole-reduce instructions come in three variants:

| Interface | Prototype |
|---|---|
| WholeReduceMax | |
| WholeReduceMin | |
| WholeReduceSum | |
| Parameter | Description |
|---|---|
| T | Data type of the operands. |
| isSetMask | Whether the mask is set inside the interface. |
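| Parameter | Input/Output | Meaning |
|---|---|---|
| dstLocal | Output | Destination operand. Type: LocalTensor; supported TPosition values: VECIN/VECCALC/VECOUT. Supported data types: Atlas training series products: half (for WholeReduceSum, half/float). Atlas inference series products (AI Core): half/float. Atlas A2 training series products / Atlas 800I A2 inference products: half/float. Atlas 200I/500 A2 inference products: half/float. |
| srcLocal | Input | Source operand. Type: LocalTensor; supported TPosition values: VECIN/VECCALC/VECOUT. The data type of the source operand must match that of the destination operand. Supported data types: Atlas training series products: half (for WholeReduceSum, half/float). Atlas inference series products (AI Core): half/float. Atlas A2 training series products / Atlas 800I A2 inference products: half/float. Atlas 200I/500 A2 inference products: half/float. |
| mask | Input | Controls which elements take part in the computation within each iteration. |
| repeatTimes | Input | Number of repeat iterations. Value range: [0, 255]. The vector unit reads 256 contiguous bytes per iteration, so reading and processing all of the input data takes multiple iterations (repeats); repeatTimes is the number of iterations. For details on this parameter, see the general description of the basic APIs. |
| dstRepStride | Input | Address stride of the same block of the destination operand between adjacent iterations, that is, how far the destination operand advances per iteration. For WholeReduceMax and WholeReduceMin, the unit is twice the byte size of the dstLocal data type. For example, when dstLocal is half the unit is 4 bytes, so each stride unit advances the destination operand by 4 bytes per iteration. Note that when order is ORDER_ONLY_VALUE, the unit is the byte size of the dstLocal data type rather than twice that size; when order is ORDER_ONLY_INDEX, the unit is fixed at the byte size of uint32_t. For WholeReduceSum, the unit is the byte size of the dstLocal data type. For example, when dstLocal is half the unit is 2 bytes, so each stride unit advances the destination operand by 2 bytes per iteration. Note: on Atlas training series products this parameter cannot be set to 0. |
| srcBlkStride | Input | Address stride between different blocks of the source operand within a single iteration. For details, see Block stride (the address stride between different datablocks within one iteration). |
| srcRepStride | Input | Address stride of the same block of the source operand between adjacent iterations, that is, the number of blocks the source operand skips per iteration. For details, see Repeat stride (the address stride of the same datablock between adjacent iterations). |
| order | Input | Specifies the relative position of index and value in dstLocal and which results are returned (supported only by WholeReduceMax/WholeReduceMin). The parameter is of type ReduceOrder and defaults to ORDER_VALUE_INDEX. The valid values are as follows: |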
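The dstRepStride unit rules above can be summarised in a small host-side sketch. This is plain C++ rather than Ascend C: the enums `ReduceKind` and `Order` and the helpers `DstRepStrideUnitBytes` and `DstByteOffset` are hypothetical names introduced only for illustration, and `elemBytes` is assumed to be the byte size of the dstLocal data type (2 for half, 4 for float).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local stand-ins for illustration only; these are NOT the Ascend C definitions.
enum class ReduceKind { Max, Min, Sum };
enum class Order { ValueIndex, IndexValue, OnlyValue, OnlyIndex };

// Byte size of one dstRepStride unit, following the rules in the parameter table.
// elemBytes: byte size of the dstLocal data type (assumed 2 for half, 4 for float).
std::size_t DstRepStrideUnitBytes(ReduceKind kind, Order order, std::size_t elemBytes) {
    assert(elemBytes > 0);
    if (kind == ReduceKind::Sum) return elemBytes;                 // order does not apply to sums
    if (order == Order::OnlyValue) return elemBytes;               // value only
    if (order == Order::OnlyIndex) return sizeof(std::uint32_t);   // index only, fixed uint32_t size
    return 2 * elemBytes;                                          // default: twice the element size
}

// Byte offset of the result written by a given iteration (0-based).
std::size_t DstByteOffset(std::size_t iteration, std::size_t dstRepStride, std::size_t unitBytes) {
    return iteration * dstRepStride * unitBytes;
}
```

For example, with half data, the default order, and dstRepStride set to 1, `DstByteOffset(3, 1, DstRepStrideUnitBytes(ReduceKind::Max, Order::ValueIndex, 2))` gives 12, so the fourth iteration writes 12 bytes after the first.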
None.
Atlas training series products
Atlas inference series products (AI Core)
Atlas A2 training series products / Atlas 800I A2 inference products
Atlas 200I/500 A2 inference products
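Suppose the source operand consists of 128 float16 values [data0, data1, data2, ..., data127], which a single repeat can process. The computation proceeds by pairwise addition.
Note that during pairwise addition, any intermediate result greater than 65504 is stored as 65504. For example, for the source operand [60000, 60000, -30000, 100], the first step computes 60000 + 60000, which overflows and yields 65504; the second step computes -30000 + 100 = -29900; the final step computes 65504 - 29900 = 35604.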
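The saturation behaviour described above can be reproduced on the host with a minimal sketch. This is plain C++, not Ascend C. `PairwiseSaturatingSum` and `SaturateToHalfRange` are hypothetical helper names, float stands in for half (so half rounding is not modelled), adjacent elements are assumed to be paired, and clamping at -65504 on the negative side is an assumption, since the text only describes the positive limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Largest finite float16 value, as quoted in the text.
const float kHalfMax = 65504.0f;

// Clamp an intermediate result to the float16 range.
// The lower bound is an assumption; the source only mentions the upper bound.
float SaturateToHalfRange(float x) {
    return std::max(-kHalfMax, std::min(kHalfMax, x));
}

// Sum the values by repeatedly adding adjacent pairs, saturating after every addition.
float PairwiseSaturatingSum(std::vector<float> values) {
    assert(!values.empty());
    while (values.size() > 1) {
        std::vector<float> next;
        next.reserve((values.size() + 1) / 2);
        for (std::size_t i = 0; i + 1 < values.size(); i += 2) {
            next.push_back(SaturateToHalfRange(values[i] + values[i + 1]));
        }
        if (values.size() % 2 == 1) {
            next.push_back(values.back());  // an unpaired last element is carried over
        }
        values = std::move(next);
    }
    return values[0];
}
```

Calling `PairwiseSaturatingSum({60000.0f, 60000.0f, -30000.0f, 100.0f})` returns 35604, matching the worked example, whereas the exact sum of the four inputs is 90100.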
```cpp
// dstLocal and srcLocal are both of type half. srcLocal holds 512 contiguous elements,
// and the results must also be stored contiguously. The tensor high-dimensional slicing
// interface is used with mask set to 128, the maximum, so that all elements take part
// in the computation.
// From this we get repeatTimes = 4, dstRepStride = 1, srcBlkStride = 1, srcRepStride = 8.

// Maximum value and index, stored as [value, index] (the default order):
WholeReduceMax<half>(dstLocal, srcLocal, 128, 4, 1, 1, 8);

// Maximum value and index, stored as [index, value]:
WholeReduceMax<half>(dstLocal, srcLocal, 128, 4, 1, 1, 8, ReduceOrder::ORDER_INDEX_VALUE);

// Sum:
WholeReduceSum<half>(dstLocal, srcLocal, 128, 4, 1, 1, 8);
```
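The values repeatTimes = 4 and srcRepStride = 8 used above follow from simple arithmetic, sketched here in plain host-side C++. `ReduceParams` and `DeriveContiguousParams` are hypothetical names, and the 32-byte datablock size is an assumption (it is consistent with 256 bytes per iteration and a stride of 8 blocks); the sketch covers only contiguous source data.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kBytesPerRepeat = 256;  // bytes read per iteration, from the repeatTimes description
const std::size_t kBytesPerBlock = 32;    // assumption: size of one datablock

struct ReduceParams {
    std::size_t repeatTimes;   // number of iterations
    std::size_t srcRepStride;  // blocks skipped by the source operand per iteration
};

// Derive the parameters for contiguous source data.
// elemCount: number of source elements; elemBytes: 2 for half, 4 for float;
// mask: number of elements processed per iteration.
ReduceParams DeriveContiguousParams(std::size_t elemCount, std::size_t elemBytes, std::size_t mask) {
    assert(mask > 0 && mask * elemBytes <= kBytesPerRepeat);
    assert(elemCount % mask == 0);                    // whole iterations only
    assert((mask * elemBytes) % kBytesPerBlock == 0); // whole blocks only
    ReduceParams p;
    p.repeatTimes = elemCount / mask;
    assert(p.repeatTimes <= 255);                     // documented range is [0, 255]
    // Contiguous data: each iteration starts where the previous one ended.
    p.srcRepStride = mask * elemBytes / kBytesPerBlock;
    return p;
}
```

For 512 half elements and a mask of 128, this yields repeatTimes = 4 and srcRepStride = 8, the values passed in the calls above.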
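```cpp
// dstLocal and srcLocal are both of type half. srcLocal holds 512 contiguous elements,
// and the results must also be stored contiguously. The tensor high-dimensional slicing
// interface is used with all 128 mask bits set, so that all elements take part in the
// computation.
// From this we get repeatTimes = 4, dstRepStride = 1, srcBlkStride = 1, srcRepStride = 8.

// Maximum value and index, stored as [value, index] (the default order):
WholeReduceMax<half>(dstLocal, srcLocal, { 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF }, 4, 1, 1, 8);
```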
```cpp
#include "kernel_operator.h"

namespace AscendC {
class KernelReduce {
public:
    __aicore__ inline KernelReduce() {}
    __aicore__ inline void Init(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
    {
        srcGlobal.SetGlobalBuffer((__gm__ half*)src);
        dstGlobal.SetGlobalBuffer((__gm__ half*)dstGm);
        // One iteration reduces `mask` elements: 1024 / 128 = 8 iterations.
        repeat = srcDataSize / mask;
        pipe.InitBuffer(inQueueSrc, 1, srcDataSize * sizeof(half));
        pipe.InitBuffer(outQueueDst, 1, dstDataSize * sizeof(half));
    }
    __aicore__ inline void Process()
    {
        CopyIn();
        Compute();
        CopyOut();
    }

private:
    // Copy the input from global memory into the input queue.
    __aicore__ inline void CopyIn()
    {
        LocalTensor<half> srcLocal = inQueueSrc.AllocTensor<half>();
        DataCopy(srcLocal, srcGlobal, srcDataSize);
        inQueueSrc.EnQue(srcLocal);
    }
    // Reduce each group of 128 elements to its maximum value and index.
    __aicore__ inline void Compute()
    {
        LocalTensor<half> srcLocal = inQueueSrc.DeQue<half>();
        LocalTensor<half> dstLocal = outQueueDst.AllocTensor<half>();
        WholeReduceMax<half>(dstLocal, srcLocal, mask, repeat, 1, 1, 8); // default order
        outQueueDst.EnQue<half>(dstLocal);
        inQueueSrc.FreeTensor(srcLocal);
    }
    // Copy the results from the output queue back to global memory.
    __aicore__ inline void CopyOut()
    {
        LocalTensor<half> dstLocal = outQueueDst.DeQue<half>();
        DataCopy(dstGlobal, dstLocal, dstDataSize);
        outQueueDst.FreeTensor(dstLocal);
    }

private:
    TPipe pipe;
    TQue<QuePosition::VECIN, 1> inQueueSrc;
    TQue<QuePosition::VECOUT, 1> outQueueDst;
    GlobalTensor<half> srcGlobal, dstGlobal;
    int srcDataSize = 1024;
    int dstDataSize = 16;
    int mask = 128;
    int repeat = 0;
};
} // namespace AscendC

extern "C" __global__ __aicore__ void reduce_kernel(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
{
    AscendC::KernelReduce op;
    op.Init(src, dstGm);
    op.Process();
}
```

Input data (src_gm):
[0.00787 0.8516 0.01558 0.152 0.887 0.2532 0.2272 0.1295 0.7207 0.628 0.5522 0.991 0.3164 0.961 0.526 0.5513 0.03973 0.3293 0.809 0.562 0.915 0.56 0.3464 0.3438 0.6094 0.1201 0.8384 0.848 0.004436 0.4263 0.01917 0.753 0.9126 0.2307 0.1066 0.644 0.8657 0.7085 0.7915 0.1707 0.3806 0.957 0.0483 0.858 0.10675 0.21 0.03345 0.55 0.3757 0.3281 0.927 0.09406 0.6445 0.985 0.405 0.09393 0.773 0.7227 0.03714 0.595 0.889 0.0948
0.4202 0.2747 0.5894 0.3022 0.894 0.675 0.6016 0.938 0.585 0.5244 0.8643 0.888 0.794 0.636 0.976 0.148 0.7427 0.1742 0.32 0.0649 0.2954 0.2018 0.833 0.0976 0.4048 0.2861 0.8765 0.722 0.998 0.03041 0.005512 0.9087 0.9873 0.1436 0.4812 0.1901 0.78 0.6934 0.2317 0.3782 0.8613 0.808 0.06885 0.3584 0.5684 0.541 0.5415 0.3096 0.5957 0.9043 0.7964 0.501 0.4324 0.7544 0.687 0.8447 0.526 0.548 0.926 0.9106 0.1616 0.183 0.6704 0.642 0.4783 0.1797 0.2078 0.59 0.4866 0.4683 0.649 0.7266 0.4976 0.8364 0.6245 0.07385 0.0786 0.586 0.7827 0.3298 0.9497 0.1617 0.4375 0.3572 0.2896 0.6465 0.1156 0.4905 0.2617 0.8267 0.2054 0.1415 0.2993 0.8374 0.754 0.942 0.6416 0.1222 0.1465 0.3335 0.3577 0.6484 0.614 0.5825 0.6807 0.9297 0.694 0.759 0.908 0.9126 0.4731 0.963 0.3271 0.724 0.4077 0.335 0.672 0.4219 0.1818 0.843 0.2708 0.0816 0.457 0.3481 0.67 0.6895 0.6924 0.191 0.2013 0.2484 0.8833 0.9146 0.4102 0.1063 0.6685 0.804 0.6606 0.2491 0.34 0.3281 0.823 0.603 0.521 0.6797 0.401 0.5 0.03683 0.04758 0.507 0.667 0.9014 0.263 0.2477 0.0179 0.8735 0.007023 0.545 0.758 0.3508 0.6333 0.9375 0.5903 0.2732 0.0847 0.489 0.196 0.5557 0.403 0.9204 0.3655 0.5083 0.7515 0.3347 0.6914 0.2185 0.2458 0.5537 0.3457 0.4878 0.869 0.908 0.0877 0.295 0.9 0.9307 0.05545 0.4639 0.4001 0.8433 0.4883 0.916 0.7026 0.5063 0.05164 0.936 0.844 0.2086 0.625 0.0197 0.4312 0.3677 0.983 0.625 0.004665 0.2479 0.3093 0.9214 0.003672 0.7915 0.921 0.331 0.01127 0.703 0.6416 0.4053 0.53 0.9688 0.10297 0.5547 0.07367 0.2305 0.02821 0.8115 0.4202 0.0561 0.0917 0.04828 0.536 0.0905 0.328 0.8413 0.3696 0.982 0.3733 0.436 0.753 0.1937 0.8706 0.991 0.273 0.763 0.418 0.4446 0.513 0.6724 0.1179 0.921 0.756 0.7144 0.6196 0.9634 0.562 0.3088 0.864 0.709 0.6797 0.2114 0.534 0.5225 0.1852 0.038 0.5454 0.8823 0.849 0.608 0.7734 0.7446 0.7236 0.1903 0.1031 0.497 0.57 0.172 0.1907 0.6333 0.641 0.681 0.2323 0.1007 0.4094 0.3655 0.4248 0.08044 0.1483 0.08716 0.354 0.128 0.3933 0.775 0.215 0.728 0.909 0.4204 0.618 0.2517 0.9106 0.3647 0.5977 
0.3445 0.315 0.488 0.99 0.9443 0.6196 0.9287 0.088 0.9946 0.796 0.7515 0.1912 0.4312 0.7974 0.735 0.01536 0.7456 0.643 0.484 0.218 0.9272 0.1703 0.1885 0.1982 0.754 0.902 0.848 0.05832 0.4138 0.6885 0.3853 0.3499 0.639 0.5786 0.6353 0.5664 0.02621 0.56 0.532 0.08246 0.733 0.1334 0.0728 0.7817 0.5273 0.126 0.179 0.7334 0.1565 0.457 0.4807 0.6987 0.5845 0.6206 0.902 0.9277 0.501 0.6763 0.3418 0.7925 0.07556 0.0929 0.9014 0.3145 0.04907 0.7188 0.958 0.7275 0.1963 0.1742 0.785 0.518 0.61 0.1112 0.481 0.10583 0.198 0.181 0.3271 0.2773 0.2391 0.5625 0.621 0.173 0.05936 0.5654 0.838 0.865 0.01523 0.6724 0.546 0.737 0.778 0.8613 0.7085 0.8213 0.08826 0.818 0.4866 0.159 0.4143 0.1007 0.7773 0.487 0.5225 0.8984 0.4907 0.525 0.4075 0.2632 0.2292 0.134 0.4622 0.65 0.294 0.607 0.2725 0.2603 0.9326 0.787 0.9478 0.941 0.3066 0.2944 0.3928 0.73 0.1797 0.2157 0.609 0.4216 0.8984 0.8477 0.863 0.2478 0.993 0.6274 0.724 0.03668 0.0991 0.5825 0.662 0.6904 0.7017 0.2379 0.514 0.1646 0.3245 0.03072 0.3232 0.907 0.9966 0.6396 0.2969 0.02539 0.66 0.764 0.7803 0.515 0.04074 0.2258 0.08887 0.1782 0.875 0.1517 0.2351 0.3848 0.5933 0.6875 0.1969 0.1283 0.06232 0.4348 0.168 0.6904 0.5464 0.12036 0.885 0.007717 0.5967 0.2856 0.628 0.62 0.854 0.4297 0.733 0.2274 0.9736 0.01622 0.456 0.4763 0.9707 0.874 0.8794 0.511 0.1628 0.03458 0.506 0.1464 0.3674 0.1532 0.786 0.3809 0.406 0.015434 0.901 0.951 0.3018 0.3584 0.5337 0.4983 0.85 0.833 0.7324 0.492 0.39 0.09845 0.8965 0.862 0.4033 0.181 0.2203 0.3738 0.2761 0.9653 0.3577 0.289 0.3167 0.91 0.2688 0.3972 0.585 0.2178 0.307 0.4966 0.513 0.5225 0.786 0.1888 0.9287 0.5093 0.1193 0.3987 0.799 0.9995 0.611 0.9897 0.7515 0.4478 0.3232 0.2426 0.3323 0.7134 0.77 0.7275 0.02043 0.3132 0.3555 0.03122 0.8623 0.4705 0.6357 0.3157 0.5063 0.1711 0.885 0.7554 0.815 0.0213 0.4346 0.049 0.905 0.525 0.921 0.02411 0.771 0.7227 0.1786 0.278 0.03387 0.7744 0.05875 0.8955 0.8374 0.715 0.3765 0.02075 0.675 0.9883 0.63 0.7017 0.299 0.92 0.1644 0.3977 0.487 0.818 0.636 
0.3452 0.6406 0.783 0.3728 0.1619 0.7725 0.4673 0.297 0.9375 0.083 0.0914 0.6704 0.08923 0.332 0.0973 0.507 0.201 0.1658 0.2358 0.8706 0.6846 0.6396 0.289 0.831 0.669 0.4683 0.2568 0.219 0.616 0.978 0.1564 0.925 0.4265 0.6055 0.7246 0.235 0.5376 0.03668 0.2441 0.7935 0.383 0.2996 0.3523 0.2544 0.6006 0.8896 0.757 0.7134 0.3196 0.3657 0.249 0.2429 0.921 0.877 0.728 0.8853 0.1635 0.546 0.9243 0.676 0.4749 0.3928 0.4187 0.612 0.3953 0.2372 0.4092 0.1523 0.1599 0.03108 0.1602 0.2474 0.3572 0.0643 0.9434 0.52 0.8574 0.959 0.7593 0.2318 0.5444 0.2222 0.3884 0.8066 0.4573 0.664 0.335 0.02025 0.1519 0.01386 0.989 0.852 0.695 0.01289 0.3433 0.2148 0.9404 0.6753 0.704 0.11163 0.675 0.5264 0.1514 0.5273 0.9785 0.2769 0.4846 0.2747 0.558 0.742 0.681 0.835 0.9546 0.941 0.588 0.785 0.2095 0.07294 0.4343 0.086 0.5825 0.513 0.6313 0.04236 0.4072 0.558 0.681 0.4805 0.492 0.625 0.7744 0.002626 0.662 0.9043 0.4766 0.6597 0.6934 0.3394 0.05453 0.9146 0.2222 0.7925 0.605 0.812 0.671 0.4329 0.2118 0.363 0.1444 0.0955 0.692 0.675 0.3 0.6846 0.535 0.9834 0.929 0.3582 0.964 0.3835 0.1466 0.801 0.954 0.2554 0.01357 0.6636 0.8325 0.6494 0.817 0.2268 0.00904 0.0487 0.08716 0.6753 0.3833 0.663 0.396 0.6685 0.983 0.0728 0.694 0.02364 0.137 0.1727 0.231 0.7896 0.8057 0.478 0.883 0.1785 0.5938 0.11456 0.6997 0.1945 0.02365 0.7236 0.8623 0.2178 0.1295 0.3867 0.7188 0.11475 0.6 0.419 0.2673 0.4404 0.0107 0.4304 0.1364 0.3708 0.1158 0.1714 0.3123 0.3403 0.7163 0.079 0.6245 0.719 0.558 0.4526 0.09924 0.512 0.2452 0.519 0.999 0.7207 0.5605 0.7217 0.653 0.1164 0.789 0.4724 0.2727 0.10315 0.9644 0.7573 0.06464 0.858 0.7847 0.958 0.618 0.9536 0.46 0.9766 0.4263 0.4363 0.4434 0.95 0.3032 0.4338 0.809 0.1642 0.0561 0.2668 0.1853 0.356 0.934 0.968 0.327 0.913 0.434 0.6616 0.00502 0.05066 0.5327 0.276 0.5176 0.0674 0.6143 0.8345 0.2976 0.315 0.6646 0.527 0.791 0.0299 0.4558 0.8354 0.3115 0.3735 0.3582 0.742 0.2637 0.8877 0.7603 0.4568 0.2045 0.4746 0.392 0.65 0.391 0.972 0.6973 0.2297 0.568 0.49 0.1895 0.547 
0.79 0.747 0.5205 0.313 0.3809 0.7817 0.32 0.1012 0.339 0.716 0.8955 0.8564 0.126 0.6597 0.228 0.1194 0.4775 0.173 0.0265 0.7456 0.859 0.4841 0.595 0.4553 0.1351 0.2246 0.3564 0.1832 0.8535 0.703 0.2423 0.04187 0.145 0.997 0.1919 0.571 0.8555 0.1578 0.2688 0.405 0.3909 0.1428 0.863 0.7295 0.3267 0.1294 0.5986 0.677 0.7065 0.8853 0.923 0.9385 0.935 0.1747 0.32 0.2292 0.2676 0.1161 0.4666 0.3826 0.2588 0.1863 0.7993 0.3984 0.2961 0.2952 0.3247 0.923 0.05746 ]

Output data (dst_gm):
[9.980e-01 5.364e-06 9.629e-01 2.682e-06 9.946e-01 6.676e-06 9.966e-01 7.510e-06 9.995e-01 5.424e-06 9.888e-01 6.378e-06 9.990e-01 6.735e-06 9.971e-01 5.484e-06]
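In the output above, every second entry looks like a tiny floating-point number. This appears to be the index, stored as an integer bit pattern and printed as half: a half subnormal with bit pattern n has the value n × 2^-24, so 5.364e-06 corresponds to index 90, which is where 0.998 sits among the first 128 inputs. The following host-side sketch recovers the index from a printed value. `IndexFromPrintedHalf` is a hypothetical helper name, and the approach only holds for indices below 1024, the subnormal range of half.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Recover an integer index from the value obtained by printing its bit pattern as half.
// A half subnormal with bit pattern n (n < 1024) equals n * 2^-24, so multiplying
// by 2^24 and rounding gives n back. Rounding absorbs the truncated printed digits.
std::uint32_t IndexFromPrintedHalf(double printed) {
    const double kTwoPow24 = 16777216.0;
    assert(printed >= 0.0 && printed * kTwoPow24 < 1024.0);  // subnormal range only
    return static_cast<std::uint32_t>(std::lround(printed * kTwoPow24));
}
```

Applied to the first two index entries above, `IndexFromPrintedHalf(5.364e-06)` gives 90 and `IndexFromPrintedHalf(2.682e-06)` gives 45. The indices are relative to the start of each iteration, so the second maximum is element 128 + 45 = 173 of the input. Inside a kernel, reinterpreting the stored bits as an unsigned integer type would avoid the floating-point round trip.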
```cpp
#include "kernel_operator.h"

namespace AscendC {
class KernelReduce {
public:
    __aicore__ inline KernelReduce() {}
    __aicore__ inline void Init(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
    {
        srcGlobal.SetGlobalBuffer((__gm__ half*)src);
        dstGlobal.SetGlobalBuffer((__gm__ half*)dstGm);
        // One iteration reduces `mask` elements: 1024 / 128 = 8 iterations.
        repeat = srcDataSize / mask;
        pipe.InitBuffer(inQueueSrc, 1, srcDataSize * sizeof(half));
        pipe.InitBuffer(outQueueDst, 1, dstDataSize * sizeof(half));
    }
    __aicore__ inline void Process()
    {
        CopyIn();
        Compute();
        CopyOut();
    }

private:
    // Copy the input from global memory into the input queue.
    __aicore__ inline void CopyIn()
    {
        LocalTensor<half> srcLocal = inQueueSrc.AllocTensor<half>();
        DataCopy(srcLocal, srcGlobal, srcDataSize);
        inQueueSrc.EnQue(srcLocal);
    }
    // Reduce each group of 128 elements to its minimum value and index.
    __aicore__ inline void Compute()
    {
        LocalTensor<half> srcLocal = inQueueSrc.DeQue<half>();
        LocalTensor<half> dstLocal = outQueueDst.AllocTensor<half>();
        WholeReduceMin<half>(dstLocal, srcLocal, mask, repeat, 1, 1, 8); // default order
        outQueueDst.EnQue<half>(dstLocal);
        inQueueSrc.FreeTensor(srcLocal);
    }
    // Copy the results from the output queue back to global memory.
    __aicore__ inline void CopyOut()
    {
        LocalTensor<half> dstLocal = outQueueDst.DeQue<half>();
        DataCopy(dstGlobal, dstLocal, dstDataSize);
        outQueueDst.FreeTensor(dstLocal);
    }

private:
    TPipe pipe;
    TQue<QuePosition::VECIN, 1> inQueueSrc;
    TQue<QuePosition::VECOUT, 1> outQueueDst;
    GlobalTensor<half> srcGlobal, dstGlobal;
    int srcDataSize = 1024;
    int dstDataSize = 16;
    int mask = 128;
    int repeat = 0;
};
} // namespace AscendC

extern "C" __global__ __aicore__ void reduce_kernel(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
{
    AscendC::KernelReduce op;
    op.Init(src, dstGm);
    op.Process();
}
```

Input data (src_gm):
[8.94 4.773 8.53 6.047 7.883 3.824 8.04 1.233 9.125 7.484 8.21 1.197 4.34 2.99 6.55 2.494 2.758 9.664 3.406 1.665 2.059 3.836 8.83 2.72 5.81 9.055 2.95 5.906 2.29 1.999 8.27 3.234 2.389 4.73 8.21 6.945 1.834 1.227 4.598 2.285 3.504 6.48 4.984 6.125 6.21 4.035 8.375 6.89 5.258 9.43 9.805 5.195 2.143 2.36 3.467 2.746 4.203 1.737 4.734 2.717 3.8 1.995 4.133 4.742 5.83 7.844 2.182 5.95 6.887 9.13 3.393 6.938 8.33 4.074 5.812
4.805 5.92 5.832 7.176 8.01 6.64 1.468 5.61 4.49 2.736 6.766 8.5 4.164 5.1 7.58 2.771 1.703 2.588 5.53 6.773 4.758 1.837 6.08 5.555 9.55 6.3 8.086 7.043 7.383 9.73 7.484 6.113 7.93 9.11 2.72 5.406 8.9 6.688 5.73 3.037 1.871 5.33 6.633 9.43 8.805 1.092 9.2 4.16 2.543 2.072 5.297 1.922 3.06 5.883 5.996 6.31 9.69 9.42 6.46 2.363 2.664 1.711 4.227 9.73 6.875 4.43 3.652 7.91 5.875 2.154 8.77 3.064 7.76 5.254 2.986 5.453 3.344 3.256 7.566 7.336 7.62 6.61 5.94 6.547 9.3 4.418 9.21 3.518 7.53 7.766 9.37 4.125 2.275 6.355 9.07 2.633 2.15 5.363 2.148 8.84 7.918 1.124 2.107 9.695 2.475 3.168 4.336 3.639 6.76 7.625 5.375 4.35 9.11 2.66 3.082 3.156 6.574 1.6875 5.285 5.984 3.71 7.324 8.7 1.902 6.883 3.38 2.812 5.52 4.355 7.883 2.424 2.033 1.163 3.502 9.7 4.53 4.086 1.8955 2.42 6.695 8.72 7.32 5.477 4.99 4.715 9.78 3.45 5.73 8.73 8.38 1.751 1.987 8.41 4.984 1.489 3.73 7.613 8.44 4.027 9.97 3.303 3.438 2.475 6.27 6.742 3.492 7.152 9.87 3.135 3.658 2.887 6.55 7.6 9.695 1.997 3.959 9.85 3.79 7.938 7.97 3.17 9.78 5.688 8.15 8.22 1.746 4.633 4.06 9.71 9.695 4. 3.314 7.56 8.56 3.45 8.52 5.39 6.332 6.883 1.269 1.232 3.148 3.582 8.33 1.179 1.37 5.297 4.66 7.285 1.086 2.473 3.51 7.28 4.13 8.37 2.441 5.73 7.496 5.31 8.76 2.38 7.348 7.453 2.664 2.328 9.93 1.119 8.766 6.395 5.965 5.99 4.6 2.154 1.278 4.074 3.883 6.617 6.05 2.447 2.256 8.63 6.348 7.816 1.547 1.743 8.94 9.414 9.49 9.625 8.21 1.641 1.308 5.79 3.178 6.17 4.094 4.812 6.434 1.946 5.64 7.957 7.75 1.073 8.33 7.105 4.39 5.98 7.53 6.05 1.823 2.086 5.5 6.71 8.33 8.29 3.584 7.684 5.766 2.354 6.78 1.824 9.97 8.51 6.58 6.43 6.21 6.4 4.367 4.406 2.604 4.33 1.739 8. 
1.828 9.14 6.32 9.2 3.469 8.586 9.01 3.854 9.49 4.133 6.266 5.08 2.426 7.574 1.077 3.453 3.975 9.58 8.7 8.48 8.82 8.92 3.809 7.355 7.758 9.336 6.734 2.578 9.23 7.406 9.28 2.688 9.82 4.816 1.821 4.99 4.26 3.223 7.277 4.25 8.3 9.734 4.65 6.535 1.145 7.367 3.615 7.36 8.33 7.58 9.336 5.17 6.52 9.41 9.98 2.766 9.42 6.85 2.258 5.3 6.85 1.848 5.83 4.863 6.875 2.215 5.13 5.836 8.01 4.56 7.89 5.273 7.51 6.938 9.42 2.69 2.434 9.586 9.375 4.48 3.656 1.709 6.43 7.363 2.744 6.316 1.648 8.62 9.61 3.787 2.877 9.09 3.76 1.255 9.84 2.592 1.932 5.68 1.545 5.27 5.758 2.615 1.832 4.492 4.258 8.64 1.39 1.534 4.465 4.832 5.62 2.893 3.928 3.438 3.84 2.105 5.355 4.402 9.54 8.98 5.723 5.91 4.97 3.984 5.707 8.82 7.71 1.297 3.387 7.04 2.494 3.83 6.375 7.28 2.805 3.244 4.97 3.736 5.363 9.64 3.41 6.297 9.83 5.832 3.182 1.314 9.02 5.95 6.215 5.043 7.984 5.75 6.29 4.297 3.11 9.11 2.44 9.42 6.27 3.5 7.652 7.043 7.36 3.336 5.938 7.88 8.414 9.445 3.121 8.57 6.848 8.375 4.395 3.344 7.2 7.188 3.502 1.3955 7.113 8.17 7.625 7.375 9.43 4.996 8.82 7.47 9.01 9.914 6.05 3.867 8.87 2.713 1.194 7.246 1.3 6.07 3.338 9.37 8.98 4.402 8.414 9.91 4.273 5.07 6.832 8.1 5.79 4.207 7.098 6.89 4.875 8.1 5.562 1.795 1.216 6.06 7.05 8.46 8.6 4.18 9.55 9.17 4.832 4.348 5.11 1.57 3.262 2.871 7.586 6.89 1.491 5.07 8.516 5.453 7.027 8.75 2.98 8.14 1.939 3.496 9.13 6.695 9.88 6.918 8.11 2.334 3.172 2.023 5.71 5.73 8.93 7.59 7.676 6.156 4.63 9.3 9.85 7.64 3.037 7.844 1.864 8.86 8.95 3.492 5.094 3.98 8.734 5.7 8.83 4.83 8.77 3.256 1.446 9.57 7.24 1.619 4.305 2.613 8.52 1.942 4.51 1.763 7.008 2.906 3.297 2.9 6. 
7.266 1.484 9.82 9.49 4.29 5.184 9.23 5.32 4.977 8.46 5.01 8.83 8.125 6.703 5.76 2.81 5.477 9.21 5.965 1.945 7.785 5.402 2.926 4.125 8.66 3.064 7.67 5.617 1.917 5.652 6.71 6.016 1.414 3.623 5.543 5.496 1.709 5.63 9.8 4.074 8.45 8.69 3.287 7.598 4.82 9.34 6.863 3.615 9.57 6.914 1.097 5.77 3.168 4.13 8.805 9.11 6.074 6.94 4.207 8.87 3.771 6.723 6.18 5.035 5.168 2.54 6.5 1.165 8.27 8.34 6.55 5.48 2.916 5.227 7.355 6.773 8.93 8.03 7.016 9.055 9.38 5.96 7.605 1.135 2.719 5.67 8.47 8.586 1.516 5.88 2.809 3.754 5.08 4.523 4.11 7.37 8.27 7.13 7.375 6.21 8.27 6.258 7.2 9.875 2.72 8.836 2.295 3.596 6.4 6.664 2.426 2.326 2.234 9.13 1.09 9.31 7.383 6.848 9.77 3.455 1.8955 6.52 7.934 3.096 2.916 4.414 7.7 6.53 7.883 5.312 3.621 4.26 2.764 7.105 2.695 8.88 3.555 8.23 2.025 3.723 1.196 9.31 6.984 5.156 7.996 7.68 2.73 5.074 5.566 6.027 8.49 2.867 8.15 2.607 4.12 8.26 2.084 5.19 2.662 2.92 6.574 9.516 4.066 3.162 4.785 6.754 1.17 3.25 9.29 6.49 1.221 7.5 7.5 7.176 7.355 4.605 7.17 3.082 4.1 4.17 7.3 2.621 5.188 7.848 9.62 6.586 4.727 8.49 2.406 5.637 2.627 2.666 1.433 4.594 4.88 4.914 3.025 8.05 9.22 9.14 7.965 9.93 5.695 1.479 4.594 3.604 7.51 7.13 7.61 4.164 8.8 3.176 4.48 5.414 4.88 2.848 7.9 5.734 2.412 6.234 6.13 2.422 7. 
6.46 5.28 2.537 9.26 5.508 4.15 6.965 9.984 2.588 1.44 9.27 9.48 1.508 4.164 4.6 4.78 2.553 7.42 8.19 2.09 9.17 6.39 5.117 4.316 2.928 1.542 6.156 5.367 7.465 3.67 2.71 8.56 1.676 9.74 1.035 4.35 7.5 9.06 5.242 3.38 9.02 9.74 3.441 2.215 7.453 6.547 8.77 1.679 7.656 1.884 9.86 7.883 2.838 7.453 2.102 4.016 6.887 7.74 7.04 8.195 5.957 5.348 6.99 5.723 3.357 7.945 6.863 5.895 8.24 1.139 4.688 7.727 5.473 8.38 7.953 1.94 7.387 4.152 9.664 5.984 3.938 1.157 9.37 7.023 9.26 7.47 6.973 2.006 2.646 7.94 8.695 4.49 7.99 3.072 7.39 9.15 1.879 8.97 8.125 4.613 1.028 2.877 9.15 2.771 9.11 2.422 2.613 5.12 1.508 5.746 2.5 3.857 7.28 8.836 3.615 6.316 2.506 7.938 2.576 5.2 1.335 7.88 3.838 8.8 5.723 9.836 6.35 3.557 5.08 2.344 2.633 5.46 8.39 1.893 8.164 5.836 1.698 1.498 9.33 3.895 4.137 6.684 7.793 2.14 9.055 3.16 ]

Output data (dst_gm):
[1.092e+00 7.153e-06 1.124e+00 2.861e-06 1.073e+00 4.828e-06 1.145e+00 1.669e-06 1.194e+00 2.861e-06 1.097e+00 3.874e-06 1.090e+00 5.960e-08 1.028e+00 4.888e-06]
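When checking kernel output such as the above, a host-side reference can help. The following is a minimal sketch in plain C++, not part of Ascend C: `ValueIndex` and `ReferenceWholeReduceMin` are hypothetical names, float stands in for half, the index is taken relative to the start of each iteration, and returning the first occurrence on ties is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ValueIndex {
    float value;        // minimum of one iteration
    std::size_t index;  // position of that minimum within the iteration
};

// For every group of `mask` consecutive elements, find the minimum and its position.
std::vector<ValueIndex> ReferenceWholeReduceMin(const std::vector<float>& src, std::size_t mask) {
    assert(mask > 0 && src.size() % mask == 0);
    std::vector<ValueIndex> out;
    for (std::size_t base = 0; base < src.size(); base += mask) {
        ValueIndex best{src[base], 0};
        for (std::size_t i = 1; i < mask; ++i) {
            // Strict comparison keeps the first occurrence on ties (an assumption).
            if (src[base + i] < best.value) {
                best = ValueIndex{src[base + i], i};
            }
        }
        out.push_back(best);
    }
    return out;
}
```

With the kernel's settings, the reference would be called with the 1024 input values and a mask of 128, giving 8 value/index pairs to compare against the 16 output entries. Swapping the comparison to `>` gives the corresponding reference for the maximum.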
```cpp
#include "kernel_operator.h"

namespace AscendC {
class KernelReduce {
public:
    __aicore__ inline KernelReduce() {}
    __aicore__ inline void Init(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
    {
        srcGlobal.SetGlobalBuffer((__gm__ half*)src);
        dstGlobal.SetGlobalBuffer((__gm__ half*)dstGm);
        // One iteration reduces `mask` elements: 2048 / 128 = 16 iterations.
        repeat = srcDataSize / mask;
        pipe.InitBuffer(inQueueSrc, 1, srcDataSize * sizeof(half));
        pipe.InitBuffer(outQueueDst, 1, dstDataSize * sizeof(half));
    }
    __aicore__ inline void Process()
    {
        CopyIn();
        Compute();
        CopyOut();
    }

private:
    // Copy the input from global memory into the input queue.
    __aicore__ inline void CopyIn()
    {
        LocalTensor<half> srcLocal = inQueueSrc.AllocTensor<half>();
        DataCopy(srcLocal, srcGlobal, srcDataSize);
        inQueueSrc.EnQue(srcLocal);
    }
    // Reduce each group of 128 elements to its sum.
    __aicore__ inline void Compute()
    {
        LocalTensor<half> srcLocal = inQueueSrc.DeQue<half>();
        LocalTensor<half> dstLocal = outQueueDst.AllocTensor<half>();
        WholeReduceSum<half>(dstLocal, srcLocal, mask, repeat, 1, 1, 8);
        outQueueDst.EnQue<half>(dstLocal);
        inQueueSrc.FreeTensor(srcLocal);
    }
    // Copy the results from the output queue back to global memory.
    __aicore__ inline void CopyOut()
    {
        LocalTensor<half> dstLocal = outQueueDst.DeQue<half>();
        DataCopy(dstGlobal, dstLocal, dstDataSize);
        outQueueDst.FreeTensor(dstLocal);
    }

private:
    TPipe pipe;
    TQue<QuePosition::VECIN, 1> inQueueSrc;
    TQue<QuePosition::VECOUT, 1> outQueueDst;
    GlobalTensor<half> srcGlobal, dstGlobal;
    int srcDataSize = 2048;
    int dstDataSize = 16;
    int mask = 128;
    int repeat = 0;
};
} // namespace AscendC

extern "C" __global__ __aicore__ void reduce_kernel(__gm__ uint8_t* src, __gm__ uint8_t* dstGm)
{
    AscendC::KernelReduce op;
    op.Init(src, dstGm);
    op.Process();
}
```

Input data (src_gm), 2048 elements, all equal to 1:
[1. 1. 1. ... 1. 1. 1.]

Output data (dst_gm):
[128. 128. 128. 128. 128. 128. 128. 128. 128. 128. 128. 128. 128. 128. 128. 128.]
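The result above can be cross-checked on the host: 2048 input elements reduced 128 at a time give 16 sums, which matches dstDataSize = 16. The sketch below is plain C++ with a hypothetical helper name, `ReferenceWholeReduceSum`. It uses a sequential float sum, so it does not model the pairwise order, half rounding, or saturation at 65504, and it is only suitable for inputs where those effects do not matter.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sum every group of `mask` consecutive elements, one result per iteration.
// Sequential float accumulation; half rounding and saturation are not modelled.
std::vector<float> ReferenceWholeReduceSum(const std::vector<float>& src, std::size_t mask) {
    assert(mask > 0 && src.size() % mask == 0);
    std::vector<float> out;
    out.reserve(src.size() / mask);
    for (std::size_t base = 0; base < src.size(); base += mask) {
        out.push_back(std::accumulate(src.begin() + base, src.begin() + base + mask, 0.0f));
    }
    return out;
}
```

Calling `ReferenceWholeReduceSum(std::vector<float>(2048, 1.0f), 128)` returns 16 entries, each equal to 128, in line with the kernel output.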