How to Use Mask Operation APIs

Mask is used to control the number of elements involved in vector computation. The following working modes and configuration methods are supported:

Table 1 Mask working modes

Working Mode: Normal mode

Description:
  • This is the default mode. It supports masking within a single iteration. You must configure the number of iterations yourself and perform extra computation for the tail block.
  • In normal mode, the mask parameter controls the number of elements involved in computation in a single iteration.
  • Call SetMaskNorm to set the normal mode.

Working Mode: Counter mode

Description:
  • This is a simplified mode. You pass the total amount of data to compute directly, and the number of iterations is inferred automatically. You do not need to manage the number of iterations or handle an unaligned tail block. However, masking within a single iteration is not supported.
  • In counter mode, the mask parameter indicates the number of elements involved in the entire vector computation.
  • Call SetMaskCount to set the counter mode.
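
To make the difference between the two working modes concrete, here is a minimal host-side sketch in plain C++ (it does not call any Ascend C API). It assumes that one iteration covers 256 bytes, that is, 128 half elements, which matches the contiguous mask value of 128 used for half in the examples of this document. NormalModePlan, PlanNormalMode, and PlanCounterMode are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one iteration of a vector instruction covers 256 bytes
// (128 half elements or 64 float elements).
constexpr uint32_t kBytesPerRepeat = 256;

// Hypothetical helper type: the work split that normal mode leaves to the caller.
struct NormalModePlan {
    uint32_t elemsPerRepeat;  // mask value for each full iteration
    uint32_t fullRepeats;     // number of iterations that use the full mask
    uint32_t tailElems;       // elements left for the extra tail call (0 if none)
};

// Normal mode: the caller derives the iteration count and the tail size.
inline NormalModePlan PlanNormalMode(uint32_t elemCount, uint32_t elemSizeBytes)
{
    assert(elemSizeBytes > 0 && elemSizeBytes <= kBytesPerRepeat);
    const uint32_t perRepeat = kBytesPerRepeat / elemSizeBytes;
    return NormalModePlan{perRepeat, elemCount / perRepeat, elemCount % perRepeat};
}

// Counter mode: the mask is simply the total element count; the number of
// iterations and the tail are inferred for you.
inline uint32_t PlanCounterMode(uint32_t elemCount)
{
    return elemCount;
}
```

Under these assumptions, 300 half elements in normal mode split into 2 full iterations with a mask of 128 plus a tail call with a mask of 44, whereas counter mode only needs the single value 300.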

Table 2 Mask configuration modes

Configuration Mode: API parameter pass (default)

Description: Pass the mask value through an input parameter of the vector computation API. The isSetMask template parameter (supported only by some vector computation APIs) selects whether the mask is taken from the API input parameters or configured through an external API. The default value is true, meaning that the mask is passed through the API. The mask corresponds to the mask/mask[] parameter of the high-dimensional sharding compute API, or to the calCount parameter of the API for computing the first n elements of a tensor.

Configuration Mode: External API configuration

Description: Call the SetVectorMask API to set the mask value. When the isSetMask template parameter of the vector computation API is set to false, the mask passed in the API input parameters (the mask/mask[] parameter of the high-dimensional sharding compute API, or the calCount parameter of the API for computing the first n elements of a tensor) does not take effect. This mode suits scenarios where the same mask is reused across multiple calls: the mask does not have to be set again in each vector computation API call, which offers some performance advantage.

The mask operation is used as follows.

Table 3 Methods of using the mask operation

Configuration Mode: API parameter pass

  Working Mode: Normal mode
    API for Computing the First n Data Elements: N/A
    High-dimensional Sharding Compute API: Set the isSetMask template parameter to true, pass the mask parameter through the API input parameters, and configure the dataBlockStride, repeatStride, and repeatTime parameters based on the application scenario.

  Working Mode: Counter mode
    API for Computing the First n Data Elements: Set the isSetMask template parameter to true and pass the mask parameter through the API input parameters.
    High-dimensional Sharding Compute API:
      • Set the isSetMask template parameter to true and pass the mask parameter through the API input parameters.
      • Set the dataBlockStride and repeatStride parameters based on the application scenario. repeatTime does not take effect in this mode; set it to a fixed value (1 is recommended).

Configuration Mode: External API configuration

  Working Mode: Normal mode
    API for Computing the First n Data Elements: N/A
    High-dimensional Sharding Compute API: Call SetVectorMask to set the mask, and then call the high-dimensional sharding compute API.
      • Set the isSetMask template parameter to false and pass the placeholder MASK_PLACEHOLDER, which has no actual meaning, as mask in the API input parameters.
      • Set the repeatTime, dataBlockStride, and repeatStride parameters based on the application scenario.

  Working Mode: Counter mode
    API for Computing the First n Data Elements: Call SetVectorMask to set the mask, and then call the API for computing the first n data elements. Set the isSetMask template parameter to false. You are advised to set calCount in the API input parameters to 1.
    High-dimensional Sharding Compute API: Call SetVectorMask to set the mask, and then call the high-dimensional sharding compute API.
      • Set the isSetMask template parameter to false and pass the placeholder MASK_PLACEHOLDER, which has no actual meaning, as mask in the API input parameters.
      • Set the dataBlockStride and repeatStride parameters based on the application scenario. repeatTime does not take effect in this mode; set it to a fixed value (1 is recommended).

The following examples cover typical scenarios:

  • Scenario 1: normal mode + external API configuration + high-dimensional sharding compute API
    AscendC::LocalTensor<half> dstLocal;
    AscendC::LocalTensor<half> src0Local;
    AscendC::LocalTensor<half> src1Local;
    
    // 1. Set the normal mode.
    AscendC::SetMaskNorm();
    // 2. Set the mask.
    AscendC::SetVectorMask<half, AscendC::MaskMode::NORMAL>(0xffffffffffffffff, 0xffffffffffffffff);  // Bitwise mode
    // AscendC::SetVectorMask<half, AscendC::MaskMode::NORMAL>(128);  // Contiguous mode
    
    // 3. Call the vector computation APIs as many times as needed. Set the isSetMask template parameter to false and pass the placeholder MASK_PLACEHOLDER, which has no actual meaning, as mask in the API input parameters.
    // Set the repeatTime, dataBlockStride, and repeatStride parameters based on the application scenario.
    // dstBlkStride, src0BlkStride, src1BlkStride = 1: data is read and written contiguously within a single iteration.
    // dstRepStride, src0RepStride, src1RepStride = 8: data is read and written contiguously between adjacent iterations.
    AscendC::Add<half, false>(dstLocal, src0Local, src1Local, AscendC::MASK_PLACEHOLDER, 1, { 1, 1, 1, 8, 8, 8 });
    AscendC::Sub<half, false>(src0Local, dstLocal, src1Local, AscendC::MASK_PLACEHOLDER, 1, { 1, 1, 1, 8, 8, 8 });
    AscendC::Mul<half, false>(src1Local, dstLocal, src0Local, AscendC::MASK_PLACEHOLDER, 1, { 1, 1, 1, 8, 8, 8 });
    // 4. Restore mask to the default value.
    AscendC::ResetMask();
    
  • Scenario 2: counter mode + external API configuration + high-dimensional sharding compute API
    AscendC::LocalTensor<half> dstLocal;
    AscendC::LocalTensor<half> src0Local;
    AscendC::LocalTensor<half> src1Local;
    int32_t len = 128;  // Number of elements involved in computation
    // 1. Set the counter mode.
    AscendC::SetMaskCount();
    // 2. Set the mask.
    AscendC::SetVectorMask<half, AscendC::MaskMode::COUNTER>(len);
    // 3. Call the vector computation APIs as many times as needed. Set the isSetMask template parameter to false and pass the placeholder MASK_PLACEHOLDER, which has no actual meaning, as mask in the API input parameters.
    // Set the dataBlockStride and repeatStride parameters based on the application scenario. repeatTime does not take effect in counter mode; set it to a fixed value (1 is recommended).
    AscendC::Add<half, false>(dstLocal, src0Local, src1Local, AscendC::MASK_PLACEHOLDER, 1, { 1, 1, 1, 8, 8, 8 });
    AscendC::Sub<half, false>(src0Local, dstLocal, src1Local, AscendC::MASK_PLACEHOLDER, 1, { 1, 1, 1, 8, 8, 8 });
    AscendC::Mul<half, false>(src1Local, dstLocal, src0Local, AscendC::MASK_PLACEHOLDER, 1, { 1, 1, 1, 8, 8, 8 });
    // 4. Restore the working mode.
    AscendC::SetMaskNorm();
    // 5. Restore mask to the default value.
    AscendC::ResetMask();
    
  • Scenario 3: counter mode + external API configuration + API for computing the first n data elements
    AscendC::LocalTensor<half> dstLocal;
    AscendC::LocalTensor<half> src0Local;
    half num = 2; 
    // 1. Set the mask.
    AscendC::SetVectorMask<half, AscendC::MaskMode::COUNTER>(128); // The number of elements involved in computation is 128.
    // 2. Call the API for computing the first n data elements. Set the isSetMask template parameter to false. You are advised to set calCount in the API input parameters to 1.
    AscendC::Adds<half, false>(dstLocal, src0Local, num, 1);
    AscendC::Muls<half, false>(dstLocal, src0Local, num, 1);
    // 3. Restore the working mode.
    AscendC::SetMaskNorm();
    // 4. Restore mask to the default value.
    AscendC::ResetMask();
    
  • The API for computing the first n data elements sets the working mode to counter mode internally. Therefore, when that API is used with the counter mode, you do not need to call SetMaskCount manually.
  • Whenever you set the counter mode manually, call SetMaskNorm afterward to restore the working mode.
  • If you call SetVectorMask to set the mask, call ResetMask afterward to restore the mask to its default value.
  • Using the high-dimensional sharding compute API in counter mode is equivalent to using the API for computing the first n data elements with interval (strided) computation added: the dataBlockStride and repeatStride parameters are supported.
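
Scenario 1 sets the normal-mode mask in bitwise mode with two 64-bit words and gives 128 as the equivalent contiguous-mode value. The following plain C++ sketch illustrates how the two forms relate for a 16-bit type such as half. It rests on assumptions that are not stated in this document: that the two arguments are ordered (high word, low word), that bit i of the low word selects element i, and that bit i of the high word selects element 64 + i. BitwiseMask, LowBits, and ContiguousToBitwise are hypothetical helpers, not part of any Ascend C API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair mirroring the two words passed in bitwise mode.
// Assumption: `high` covers elements 64..127 and `low` covers elements 0..63.
struct BitwiseMask {
    uint64_t high;
    uint64_t low;
};

// Returns a word with the lowest n bits set, for n in [0, 64].
inline uint64_t LowBits(uint32_t n)
{
    // Shifting a 64-bit value by 64 is undefined, so handle n == 64 separately.
    return n >= 64 ? UINT64_MAX : ((uint64_t{1} << n) - 1);
}

// Converts a contiguous element count (first `count` elements enabled)
// into the equivalent bitwise mask for a 16-bit element type.
inline BitwiseMask ContiguousToBitwise(uint32_t count)
{
    assert(count <= 128);  // at most 128 16-bit elements per iteration (assumed)
    const uint32_t lowCount = count < 64 ? count : 64;
    const uint32_t highCount = count - lowCount;
    return BitwiseMask{LowBits(highCount), LowBits(lowCount)};
}
```

With these assumptions, a contiguous count of 128 maps to both words being 0xffffffffffffffff, which is the pair used in Scenario 1, while a count of 70 enables all 64 low bits and the lowest 6 high bits.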