How to Use Mask Operation APIs
Mask is used to control the number of elements involved in vector computation. The following working modes and configuration methods are supported:
| Working Mode | Description |
|---|---|
| Normal mode | This is the default mode. Masking within a single iteration is supported. You need to configure the number of iterations and perform extra computation for the tail block. In normal mode, the mask parameter controls the number of elements involved in computation in a single iteration. |
| Counter mode | This is a simplified mode. You pass the amount of data to be computed directly, and the number of iterations is inferred automatically. You do not need to be concerned with the number of iterations or handle the unaligned tail block. However, masking within a single iteration is not supported. In counter mode, the mask parameter indicates the number of elements involved in the entire vector computation. |
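The bookkeeping difference between the two modes can be sketched in plain C++ on the host side. This is only an illustrative model, not Ascend C code: the names `NormalModePlan`, `PlanNormalMode`, and `InferredIterations` are hypothetical, and the sketch assumes the `half` data type with at most 128 elements per iteration.

```cpp
#include <cstdint>

// Hypothetical host-side model; none of these names belong to the Ascend C API.
// Assumption: one iteration covers at most 128 half elements.
constexpr uint32_t kElemsPerIter = 128;

struct NormalModePlan {
    uint32_t fullIterations;  // iterations that run with a full 128-element mask
    uint32_t tailElements;    // leftover elements that need a separate call with a smaller mask
};

// Normal mode: the caller derives the iteration count and the tail block.
constexpr NormalModePlan PlanNormalMode(uint32_t totalElements) {
    return NormalModePlan{ totalElements / kElemsPerIter, totalElements % kElemsPerIter };
}

// Counter mode: the caller passes totalElements as the mask. This function only
// models the iteration count that would then be inferred automatically.
constexpr uint32_t InferredIterations(uint32_t totalElements) {
    return (totalElements + kElemsPerIter - 1) / kElemsPerIter;
}

// Compile-time checks for 300 elements.
static_assert(PlanNormalMode(300).fullIterations == 2, "300 = 2 * 128 + 44");
static_assert(PlanNormalMode(300).tailElements == 44, "tail block of 44 elements");
static_assert(InferredIterations(300) == 3, "two full iterations plus one partial");
```

For 300 elements, the normal-mode caller has to issue two full iterations and then handle a 44-element tail, whereas in counter mode the value 300 is passed once as the mask.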
The mask operation APIs are used as follows.
The following examples cover typical scenarios:
- Scenario 1: normal mode + external API configuration + high-dimensional sharding compute API

      AscendC::LocalTensor<half> dstLocal;
      AscendC::LocalTensor<half> src0Local;
      AscendC::LocalTensor<half> src1Local;

      // 1. Set the normal mode.
      AscendC::SetMaskNorm();
      // 2. Set the mask.
      AscendC::SetVectorMask<half, AscendC::MaskMode::NORMAL>(0xffffffffffffffff, 0xffffffffffffffff); // Bitwise mode
      // SetVectorMask<half, MaskMode::NORMAL>(128); // Contiguous mode
      // 3. Call the vector computation APIs multiple times. Set the isSetMask template parameter to false and pass
      //    the placeholder MASK_PLACEHOLDER, which has no actual meaning, as the mask input parameter.
      // Set repeatTime and the block stride and repeat stride parameters based on the application scenario.
      // With dstBlkStride, src0BlkStride, src1BlkStride = 1, data is read and written contiguously within a single iteration.
      // With dstRepStride, src0RepStride, src1RepStride = 8, data is read and written contiguously between adjacent iterations.
      AscendC::Add<half, false>(dstLocal, src0Local, src1Local, AscendC::MASK_PLACEHOLDER, 1, { 2, 2, 2, 8, 8, 8 });
      AscendC::Sub<half, false>(src0Local, dstLocal, src1Local, AscendC::MASK_PLACEHOLDER, 1, { 2, 2, 2, 8, 8, 8 });
      AscendC::Mul<half, false>(src1Local, dstLocal, src0Local, AscendC::MASK_PLACEHOLDER, 1, { 2, 2, 2, 8, 8, 8 });
      // 4. Restore mask to the default value.
      AscendC::ResetMask();

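Scenario 1 shows two ways of selecting the same elements in normal mode: a bitwise mask made of two all-ones 64-bit values, and a contiguous mask of 128. A minimal host-side sketch of how the two forms relate for `half` is given below. `BitMask`, `LowBits`, and `ContiguousToBitMask` are hypothetical helpers, not Ascend C APIs, and the sketch assumes that the first argument of the bitwise form covers elements 64 to 127, the second covers elements 0 to 63, and bit i selects element i.

```cpp
#include <cstdint>

// Hypothetical helper type, not part of the Ascend C API.
// Assumption: 'high' covers elements 64..127 and 'low' covers elements 0..63.
struct BitMask {
    uint64_t high;
    uint64_t low;
};

// Returns a value with the lowest n bits set. Shifting a 64-bit value by 64 is
// undefined behavior, so n >= 64 is handled explicitly.
constexpr uint64_t LowBits(uint32_t n) {
    return n >= 64 ? UINT64_MAX : ((uint64_t{1} << n) - 1);
}

// Converts a contiguous element count (0..128 for half) into the bitwise form
// that selects the same leading elements.
constexpr BitMask ContiguousToBitMask(uint32_t count) {
    return count <= 64 ? BitMask{ 0, LowBits(count) }
                       : BitMask{ LowBits(count - 64), UINT64_MAX };
}

// A contiguous mask of 128 corresponds to two all-ones 64-bit values.
static_assert(ContiguousToBitMask(128).high == 0xffffffffffffffff, "upper 64 elements enabled");
static_assert(ContiguousToBitMask(128).low == 0xffffffffffffffff, "lower 64 elements enabled");
// A contiguous mask of 80 enables elements 0..79.
static_assert(ContiguousToBitMask(80).high == 0xffff, "elements 64..79");
static_assert(ContiguousToBitMask(80).low == 0xffffffffffffffff, "elements 0..63");
```

The bitwise form is only needed when the selected elements are not a leading contiguous run. Otherwise the contiguous form is shorter and harder to get wrong.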
- Scenario 2: counter mode + external API configuration + high-dimensional sharding compute API

      AscendC::LocalTensor<half> dstLocal;
      AscendC::LocalTensor<half> src0Local;
      AscendC::LocalTensor<half> src1Local;
      int32_t len = 128; // Number of elements involved in computation

      // 1. Set the counter mode.
      AscendC::SetMaskCount();
      // 2. Set the mask.
      AscendC::SetVectorMask<half, AscendC::MaskMode::COUNTER>(len);
      // 3. Call the vector computation APIs multiple times. Set the isSetMask template parameter to false and pass
      //    the placeholder MASK_PLACEHOLDER, which has no actual meaning, as the mask input parameter.
      // Set the block stride and repeat stride parameters based on the application scenario.
      // Set repeatTime to a fixed value (recommended: 1); this value does not take effect.
      AscendC::Add<half, false>(dstLocal, src0Local, src1Local, AscendC::MASK_PLACEHOLDER, 1, { 1, 1, 1, 8, 8, 8 });
      AscendC::Sub<half, false>(src0Local, dstLocal, src1Local, AscendC::MASK_PLACEHOLDER, 1, { 1, 1, 1, 8, 8, 8 });
      AscendC::Mul<half, false>(src1Local, dstLocal, src0Local, AscendC::MASK_PLACEHOLDER, 1, { 1, 1, 1, 8, 8, 8 });
      // 4. Restore the working mode.
      AscendC::SetMaskNorm();
      // 5. Restore mask to the default value.
      AscendC::ResetMask();

- Scenario 3: counter mode + external API configuration + API for computing the first n data elements

      AscendC::LocalTensor<half> dstLocal;
      AscendC::LocalTensor<half> src0Local;
      half num = 2;

      // 1. Set the mask.
      AscendC::SetVectorMask<half, AscendC::MaskMode::COUNTER>(128); // The number of elements involved in computation is 128.
      // 2. Call the API for computing the first n data elements. Set the isSetMask template parameter to false.
      //    You are advised to set calCount in the API input parameters to 1.
      AscendC::Adds<half, false>(dstLocal, src0Local, num, 1);
      AscendC::Muls<half, false>(dstLocal, src0Local, num, 1);
      // 3. Restore the working mode.
      AscendC::SetMaskNorm();
      // 4. Restore mask to the default value.
      AscendC::ResetMask();

- The API for computing the first n data elements sets the working mode to counter mode internally. Therefore, when you use this API together with counter mode, you do not need to call SetMaskCount manually.
- In every scenario where you use counter mode manually, call SetMaskNorm afterward to restore the working mode.
- If you call SetVectorMask to set the mask, call ResetMask afterward to restore the mask to its default value.
- Using the high-dimensional sharding compute API in counter mode is equivalent to using the API for computing the first n data elements with interval (strided) computation added: the dataBlockStride and repeatStride parameters are supported.
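To make the interval computation concrete, the sketch below models which element offset the i-th computed element touches for given dataBlockStride and repeatStride values. It is a host-side illustration under stated assumptions, not Ascend C code: `ElementOffset` is a hypothetical function, and the model assumes `half` data, 32-byte data blocks (16 elements), 8 data blocks per iteration, and both strides expressed in units of data blocks.

```cpp
#include <cstdint>

// Assumptions for half: a data block holds 16 elements and one iteration
// processes 8 data blocks (128 elements).
constexpr uint32_t kElemsPerBlock = 16;
constexpr uint32_t kBlocksPerIter = 8;
constexpr uint32_t kElemsPerIter = kElemsPerBlock * kBlocksPerIter;

// Hypothetical model: offset, in elements from the start of the tensor, of the
// i-th computed element. Both strides are counted in data blocks.
constexpr uint32_t ElementOffset(uint32_t i, uint32_t dataBlockStride, uint32_t repeatStride) {
    return (i / kElemsPerIter) * repeatStride * kElemsPerBlock                        // start of the iteration
         + ((i % kElemsPerIter) / kElemsPerBlock) * dataBlockStride * kElemsPerBlock  // start of the data block
         + (i % kElemsPerBlock);                                                      // position inside the block
}

// dataBlockStride = 1 and repeatStride = 8 give fully contiguous access.
static_assert(ElementOffset(0, 1, 8) == 0, "first element");
static_assert(ElementOffset(200, 1, 8) == 200, "contiguous across iterations");
// dataBlockStride = 2 skips every other data block within an iteration.
static_assert(ElementOffset(15, 2, 8) == 15, "last element of block 0");
static_assert(ElementOffset(16, 2, 8) == 32, "block 1 starts two blocks after block 0");
// repeatStride = 16 leaves an 8-block gap between adjacent iterations.
static_assert(ElementOffset(128, 1, 16) == 256, "second iteration starts at block 16");
```

Under this model, the stride values `{ 1, 1, 1, 8, 8, 8 }` used in Scenario 2 describe plain contiguous access for all three tensors, which matches the behavior of the API for computing the first n data elements.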