Brcb
Applicability
Product |
Supported/Unsupported |
|---|---|
√ |
|
√ |
|
x |
|
√ |
|
x |
|
x |
Function Usage
Extracts eight elements from a given input tensor each time and fills them in eight data blocks (32 bytes) in the result tensor. Each element corresponds to a data block.
Prototype
1 2 | template <typename T> __aicore__ inline void Brcb(const LocalTensor<T>& dst, const LocalTensor<T>& src0, const uint8_t repeatTime, const BrcbRepeatParams& repeatParams) |
Parameters
Parameter |
Description |
|---|---|
T |
Operand data type. For the For the For the |
Parameter |
Input/Output |
Meaning |
|---|---|---|
dst |
Output |
Destination operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
src0 |
Input |
Source operand. The type is LocalTensor, and the supported TPosition is VECIN, VECCALC, or VECOUT. The start address of the LocalTensor must be 32-byte aligned. Has the same data type as dst. Eight elements are read from src0 in each iteration. Therefore, the number of elements in src0 must be greater than or equal to 8 x repeatTime. |
repeatTime |
Input |
Number of instruction repeats. Eight data blocks are collected in each repeat. Data range: repeatTimes ∈ [0, 255] |
repeatParams |
Input |
Controls parameters related to instruction iteration. The type is BrcbRepeatParams. For details, see ${INSTALL_DIR}/include/ascendc/basic_api/interface/kernel_struct_brcb.h. Replace ${INSTALL_DIR} with the actual CANN component directory. dstBlkStride and dstRepStride can be set by users. For details about the parameters, see Table 3. |
Parameter |
Meaning |
|---|---|
dstBlkStride |
Address stride of the vector destination operand between different data blocks in a single repeat Note: When dstBlkStride is set to 0, the value 1 is used by default. |
dstRepStride |
Address stride of the vector destination operand for the same data block between adjacent repeats |
blockNumber |
Reserved. This parameter is reserved for future functions. You can use the default value. |
src0BlkStride |
|
src1BlkStride |
|
src0RepStride |
|
src1RepStride |
|
repeatStrideMode |
|
strideSizeMode |
Returns
None
Restrictions
- For details about the operand address alignment requirements, see General Address Alignment Restrictions.
- src0 and dst cannot be the same memory address.
- For the
Atlas Inference Series Product 's AI Core, reserve an 8 KB as the temporary data storage of the API.
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | #include "kernel_operator.h" class VbrcbCase { public: __aicore__ inline VbrcbCase() {} __aicore__ inline void Init(__gm__ uint8_t *x, __gm__ uint8_t *y) { x_gm.SetGlobalBuffer(reinterpret_cast<__gm__ uint16_t *>(x)); y_gm.SetGlobalBuffer(reinterpret_cast<__gm__ uint16_t *>(y)); tpipe.InitBuffer(vecIn, 1, 16 * sizeof(uint16_t)); tpipe.InitBuffer(vecOut, 1, 256 * sizeof(uint16_t)); } __aicore__ inline void Process() { CopyIn(); Compute(); CopyOut(); } __aicore__ inline void CopyIn() { auto x_buf = vecIn.AllocTensor<uint16_t>(); AscendC::DataCopy(x_buf, x_gm, 16); vecIn.EnQue(x_buf); } __aicore__ inline void Compute() { auto x_buf = vecIn.DeQue<uint16_t>(); auto y_buf = vecOut.AllocTensor<uint16_t>(); AscendC::Brcb(y_buf, x_buf, 2, {1,8}); vecOut.EnQue(y_buf); vecIn.FreeTensor(x_buf); } __aicore__ inline void CopyOut() { auto y_buf = vecOut.DeQue<uint16_t>(); AscendC::DataCopy(y_gm, y_buf, 256); vecOut.FreeTensor(y_buf); } private: AscendC::GlobalTensor<uint16_t> x_gm; AscendC::GlobalTensor<uint16_t> y_gm; AscendC::TPipe tpipe; AscendC::TQue<AscendC::TPosition::VECIN, 1> vecIn; AscendC::TQue<AscendC::TPosition::VECOUT, 1> vecOut; }; extern "C" __global__ __aicore__ void vbrcb_uint16_t_16(__gm__ uint8_t *x, __gm__ uint8_t *y) { VbrcbCase op; op.Init(x, y); op.Process(); } |
Input data x_gm: [1 2 3... 16] Output data y_gm: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ... 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16]