Using the Data Transfer API Efficiently
[Priority] High
[Description] When using the data movement API, configure the movement control parameters to move data contiguously or at a fixed stride instead of issuing one copy per iteration of a for loop. The two approaches differ greatly in efficiency. As shown in the following figure, each row is 16 KB, and the first 2 KB of each row needs to be moved. If a for loop traverses the rows, each call moves only 2 KB. If the DataCopyParams parameters (blockCount, blockLen, srcStride, and dstStride) are configured directly, all the required data, 32 KB in total (16 rows of 2 KB each), is moved in a single call (recommended). For details about the relationship between the amount of data moved and the actual bandwidth, see Transferring a Large Data Block at a Time.
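The parameter values for this scenario can be checked by hand. The following is a minimal sketch in plain C++ (no device API involved), assuming a data block is 32 bytes as the example comments in this section state. StridedLayout and MakeLayout are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: one data block is 32 bytes.
constexpr std::size_t kBlockBytes = 32;

// Hypothetical helper (not part of the API): a strided transfer described in data blocks.
struct StridedLayout {
    std::uint32_t blockCount;  // number of contiguous runs, one per row
    std::uint32_t blockLen;    // length of each run, in data blocks
    std::uint32_t srcStride;   // source gap skipped after each run, in data blocks
    std::size_t totalBytes;    // bytes moved by the whole transfer
};

// Derives the layout from the row size, the bytes wanted per row, and the row count.
constexpr StridedLayout MakeLayout(std::size_t rowBytes, std::size_t copyBytes, std::uint32_t rows) {
    return StridedLayout{
        rows,
        static_cast<std::uint32_t>(copyBytes / kBlockBytes),
        static_cast<std::uint32_t>((rowBytes - copyBytes) / kBlockBytes),
        copyBytes * rows};
}

// The scenario in this section: 16 rows of 16 KB, first 2 KB of each row wanted.
constexpr StridedLayout kLayout = MakeLayout(16 * 1024, 2 * 1024, 16);

static_assert(kLayout.blockCount == 16, "one run per row");
static_assert(kLayout.blockLen == 64, "2 KB is 64 data blocks");
static_assert(kLayout.srcStride == 448, "the 14 KB skipped per row is 448 data blocks");
static_assert(kLayout.totalBytes == 32 * 1024, "16 rows x 2 KB = 32 KB");
```

The for loop issues 16 transfers of 64 data blocks each, whereas the parameterized form describes the same 32 KB as one transfer.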
[Negative Example]
// The copy is strided: 2 KB is moved out of each 16 KB row in GM, and there are 16 rows in total.
LocalTensor<float> tensorIn;
GlobalTensor<float> tensorGM;
...
constexpr int32_t copyWidth = 2 * 1024 / sizeof(float);
constexpr int32_t imgWidth = 16 * 1024 / sizeof(float);
constexpr int32_t imgHeight = 16;
// Use a for loop. Only 2 KB is moved per call, and the call is repeated 16 times.
for (int i = 0; i < imgHeight; i++) {
    DataCopy(tensorIn[i * copyWidth], tensorGM[i * imgWidth], copyWidth);
}
[Positive Example]
LocalTensor<float> tensorIn;
GlobalTensor<float> tensorGM;
...
constexpr int32_t copyWidth = 2 * 1024 / sizeof(float);
constexpr int32_t imgWidth = 16 * 1024 / sizeof(float);
constexpr int32_t imgHeight = 16;
// Use the DataCopy API that takes DataCopyParams to move all the data in a single call.
DataCopyParams copyParams;
copyParams.blockCount = imgHeight;                  // One contiguous run per row.
copyParams.blockLen = copyWidth / 8;                // Unit: data block (32 bytes). Each data block holds eight floats.
copyParams.srcStride = (imgWidth - copyWidth) / 8;  // Gap between two consecutive source transfers. Unit: data block.
copyParams.dstStride = 0;                           // Contiguous write: no gap between two destination transfers. Unit: data block.
// Destination first, source second, as in the for-loop version: GM is the source here.
DataCopy(tensorIn, tensorGM, copyParams);
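To see that the single parameterized call covers the same elements as the row-by-row loop, the addressing can be modeled on the host. The following is a minimal sketch in plain C++, not the device API: CopyParamsModel, StridedCopyModel, RowLoopCopy, and ModelsAgree are hypothetical names, and the model assumes that the stride is the gap between the end of one run and the start of the next, which is what the (imgWidth - copyWidth) / 8 expression implies. The loop inside the model only imitates the addressing and says nothing about how the hardware performs the transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kFloatsPerBlock = 8;  // 32-byte data block / sizeof(float)

// Hypothetical stand-in for the parameter struct; field names mirror the example.
struct CopyParamsModel {
    std::uint16_t blockCount;
    std::uint16_t blockLen;
    std::uint16_t srcStride;
    std::uint16_t dstStride;
};

// Host-side model of the addressing of one strided transfer (assumed semantics).
inline void StridedCopyModel(float* dst, const float* src, const CopyParamsModel& p) {
    assert(dst != nullptr && src != nullptr);
    const std::size_t runFloats = p.blockLen * kFloatsPerBlock;
    const std::size_t srcStep = (p.blockLen + p.srcStride) * kFloatsPerBlock;
    const std::size_t dstStep = (p.blockLen + p.dstStride) * kFloatsPerBlock;
    for (std::size_t i = 0; i < p.blockCount; ++i) {
        std::memcpy(dst + i * dstStep, src + i * srcStep, runFloats * sizeof(float));
    }
}

// Reference: one contiguous copy per row, as in the for-loop version.
inline void RowLoopCopy(float* dst, const float* src, std::size_t copyWidth,
                        std::size_t imgWidth, std::size_t imgHeight) {
    for (std::size_t i = 0; i < imgHeight; ++i) {
        std::memcpy(dst + i * copyWidth, src + i * imgWidth, copyWidth * sizeof(float));
    }
}

// Runs both variants on the 16 x 16 KB layout and reports whether the outputs match.
inline bool ModelsAgree() {
    constexpr std::size_t copyWidth = 2 * 1024 / sizeof(float);
    constexpr std::size_t imgWidth = 16 * 1024 / sizeof(float);
    constexpr std::size_t imgHeight = 16;

    std::vector<float> gm(imgWidth * imgHeight);
    for (std::size_t i = 0; i < gm.size(); ++i) {
        gm[i] = static_cast<float>(i);  // distinct value per element
    }

    std::vector<float> viaLoop(copyWidth * imgHeight, -1.0f);
    std::vector<float> viaParams(copyWidth * imgHeight, -2.0f);

    RowLoopCopy(viaLoop.data(), gm.data(), copyWidth, imgWidth, imgHeight);

    const CopyParamsModel p{
        static_cast<std::uint16_t>(imgHeight),
        static_cast<std::uint16_t>(copyWidth / kFloatsPerBlock),
        static_cast<std::uint16_t>((imgWidth - copyWidth) / kFloatsPerBlock),
        0};
    StridedCopyModel(viaParams.data(), gm.data(), p);

    return viaLoop == viaParams;
}
```

Calling ModelsAgree() from a test or a small main function checks that both variants fill the destination identically under these assumptions.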