Dynamic Shape Scenario
The Operator Implementation section describes the kernel implementation of a vector operator whose shape and data type are fixed. In real operator development, this information can change at run time, which makes the scenarios more flexible and more complex. This section describes how the dynamic shape scenario differs from the static shape scenario.
The main difference is that the input shape is not known in advance in the dynamic shape scenario. Variables that depend on the input shape (such as the size of the block moved in each transfer) must be computed during tiling and then passed to the kernel, which uses these parameters for its subsequent computation.
- In the static shape example in Operator Implementation, TILE_NUM (number of data blocks computed on each core), BLOCK_LENGTH (total amount of data computed on each core), and TILE_LENGTH (size of each data block) are fixed values.
    constexpr int32_t TOTAL_LENGTH = 8 * 2048;                            // total length of data
    constexpr int32_t USE_CORE_NUM = 8;                                   // number of cores used
    constexpr int32_t BLOCK_LENGTH = TOTAL_LENGTH / USE_CORE_NUM;         // length computed on each core
    constexpr int32_t TILE_NUM = 8;                                       // split data into 8 tiles for each core
    constexpr int32_t BUFFER_NUM = 2;                                     // number of tensors in each queue
    constexpr int32_t TILE_LENGTH = BLOCK_LENGTH / TILE_NUM / BUFFER_NUM; // each tile is split into 2 parts for double buffering
- To adapt the preceding code to the dynamic shape scenario, add a tiling parameter to the kernel function definition, compute its values during tiling, and pass it to the kernel. The kernel then derives blockLength (total amount of data computed on each core), tileNum (number of data blocks computed on each core), tileLength (size of each data block), and other variables from the tiling parameter.
    __aicore__ inline void Init(GM_ADDR x, GM_ADDR y, GM_ADDR z, uint32_t totalLength, uint32_t tileNum)
    {
        ASSERT(GetBlockNum() != 0 && "block dim can not be zero!");
        this->blockLength = totalLength / GetBlockNum();
        this->tileNum = tileNum;
        ASSERT(tileNum != 0 && "tile num can not be zero!");
        this->tileLength = this->blockLength / tileNum / BUFFER_NUM;
        // ...
    }

    extern "C" __global__ __aicore__ void add_custom(GM_ADDR x, GM_ADDR y, GM_ADDR z, GM_ADDR workspace, AddCustomTilingData tiling)
    {
        KernelAdd op;
        op.Init(x, y, z, tiling.totalLength, tiling.tileNum);
        op.Process();
    }
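The arithmetic that Init performs can be checked off-device with a small standalone sketch. The code below is plain C++, not Ascend C. The AddCustomTilingData struct here is a hypothetical stand-in that carries only the two fields used above, CoreTiling and ComputeCoreTiling are illustrative names that do not appear in the original sample, and the blockNum argument plays the role of the value returned by GetBlockNum(). The sketch assumes totalLength divides evenly, because integer division silently drops any remainder.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the tiling structure passed to the kernel.
// Only the two fields used by Init in the sample are modeled here.
struct AddCustomTilingData {
    uint32_t totalLength;  // total number of elements to process
    uint32_t tileNum;      // number of tiles per core
};

constexpr uint32_t BUFFER_NUM = 2;  // double buffering, as in the sample

// Per-core values derived from the tiling data (illustrative type).
struct CoreTiling {
    uint32_t blockLength;  // elements handled by one core
    uint32_t tileLength;   // elements in one tile of one buffer
};

// Mirrors the arithmetic in Init. Returns false instead of asserting when
// blockNum or tileNum is zero, so callers can handle the error themselves.
inline bool ComputeCoreTiling(const AddCustomTilingData& tiling,
                              uint32_t blockNum,
                              CoreTiling& out)
{
    if (blockNum == 0 || tiling.tileNum == 0) {
        return false;
    }
    out.blockLength = tiling.totalLength / blockNum;
    out.tileLength = out.blockLength / tiling.tileNum / BUFFER_NUM;
    return true;
}

// Feeds the static shape constants through the dynamic computation and
// checks that the results match BLOCK_LENGTH and TILE_LENGTH.
inline void VerifyStaticShapeExample()
{
    const AddCustomTilingData tiling{8 * 2048, 8};
    CoreTiling core{};
    const bool ok = ComputeCoreTiling(tiling, 8, core);
    assert(ok);
    assert(core.blockLength == 2048);  // TOTAL_LENGTH / USE_CORE_NUM
    assert(core.tileLength == 128);    // BLOCK_LENGTH / TILE_NUM / BUFFER_NUM
    (void)ok;  // avoid an unused-variable warning when NDEBUG is defined
}
```

With the static shape values (totalLength = 8 * 2048, 8 cores, tileNum = 8), the dynamic computation yields the same blockLength and tileLength as the compile-time constants, which is a quick sanity check when converting an operator from static to dynamic shape.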