How to Enable Vector Cores During Vector Programming

For Atlas inference products, an independent hardware unit, the Vector Core, supplements the Vector Unit in the AI Core to alleviate the vector compute bottleneck. A Vector Core contains only two basic compute resources, a Vector Unit and a Scalar Unit, which process vector and scalar data, respectively. If Vector Cores are enabled during vector operator development, AI Cores and Vector Cores are started at the same time during operator execution and run the same kernel function code in parallel.

This section describes how to enable Vector Cores on Atlas inference products. Before reading it, you are advised to familiarize yourself with operator implementation, Kernel Launch Based on a Sample Project, and Project-based Operator Development, and to master the end-to-end operator development process based on AI Cores. This section focuses only on what differs when Vector Cores are enabled:

When developing the operator on the kernel side, use the macro KERNEL_TASK_TYPE_DEFAULT to enable Vector Cores. When the operator is executed, both AI Cores and Vector Cores are started. In this case, the AI Cores act as Vector Cores. The following code example shows how to enable Vector Cores:

extern "C" __global__ __aicore__ void add_custom(__gm__ uint8_t *x, __gm__ uint8_t *y, __gm__ uint8_t *z, __gm__ uint8_t *workspace, __gm__ uint8_t *tiling)
{
    GET_TILING_DATA(tilingData, tiling);
    if (workspace == nullptr) {
        return;
    }
    GM_ADDR usr = AscendC::GetUserWorkspace(workspace);
    KernelAdd op;
    op.Init(x, y, z, tilingData.blockDim, tilingData.totalLength, tilingData.tileNum);
    KERNEL_TASK_TYPE_DEFAULT(KERNEL_TYPE_MIX_VECTOR_CORE); // Enable Vector Cores.
    if (TILING_KEY_IS(1)) {
        op.Process1();
    } else if (TILING_KEY_IS(2)) {
        op.Process2();
    }
    // ...
}

In tiling development on the host, blockDim is the total number of AI Cores and Vector Cores to start. For example, if blockDim is set to 10 on the host, a total of 10 AI Cores and Vector Cores are started. To ensure that Vector Cores are started, set blockDim to a value greater than the number of AI Cores. Call the GetCoreNumAic API to obtain the number of AI Cores and the GetCoreNumVector API to obtain the number of Vector Cores. The following code snippets, one for a kernel launch project and one for a custom operator project, set blockDim to the total number of AI Cores and Vector Cores so that all of them are started.

Kernel launch project

// GetInstance() returns a pointer to the platform information object.
auto ascendcPlatform = platform_ascendc::PlatformAscendCManager::GetInstance();
auto totalCoreNum = ascendcPlatform->GetCoreNumAic();
// Replace ASCENDXXX with the actual version number.
if (ascendcPlatform->GetSocVersion() == platform_ascendc::SocVersion::ASCENDXXX) {
    // Add the Vector Cores so that blockDim exceeds the number of AI Cores.
    totalCoreNum = totalCoreNum + ascendcPlatform->GetCoreNumVector();
}
// ...
kernel_name<<<totalCoreNum, l2ctrl, stream>>>(argument list);

Custom operator project

// Example of a matching tiling function on the host:
ge::graphStatus TilingFunc(gert::TilingContext* context)
{
    // Enable Vector Cores: set blockDim to the number of AI Cores plus the number of Vector Cores.
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
    auto totalCoreNum = ascendcPlatform.GetCoreNumAic();
    // Replace ASCENDXXX with the actual version number.
    if (ascendcPlatform.GetSocVersion() == platform_ascendc::SocVersion::ASCENDXXX) {
        totalCoreNum = totalCoreNum + ascendcPlatform.GetCoreNumVector();
    }
    context->SetBlockDim(totalCoreNum);
    return ge::GRAPH_SUCCESS;
}
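
To illustrate what the larger blockDim means for the kernel, the following is a minimal host-only C++ sketch. It does not call any Ascend C API: `CoreCounts`, `TotalBlockDim`, `VectorCoresCanStart`, and `BlockLength` are hypothetical names, the core counts are made-up inputs standing in for the values returned by GetCoreNumAic and GetCoreNumVector, and the even split of the data across blocks is an assumption about how an operator such as add_custom might tile its input.

```cpp
#include <cstdint>

// Hypothetical stand-in for the values returned by GetCoreNumAic and GetCoreNumVector.
struct CoreCounts {
    uint32_t aic;     // number of AI Cores
    uint32_t vector;  // number of Vector Cores (0 if the SoC has none)
};

// blockDim that starts every AI Core and every Vector Core.
inline uint32_t TotalBlockDim(const CoreCounts &cores)
{
    return cores.aic + cores.vector;
}

// Vector Cores can be started only when blockDim exceeds the number of AI Cores.
inline bool VectorCoresCanStart(uint32_t blockDim, const CoreCounts &cores)
{
    return blockDim > cores.aic;
}

// Assumed even split: number of elements handled by each block.
// Remainder handling is omitted in this sketch.
inline uint32_t BlockLength(uint32_t totalLength, uint32_t blockDim)
{
    return blockDim == 0 ? 0 : totalLength / blockDim;
}
```

With illustrative counts of 8 AI Cores and 7 Vector Cores, `TotalBlockDim` gives 15, and under the even-split assumption a tensor of 15360 elements is divided into blocks of 1024 elements. A blockDim of 8 on the same device would leave the Vector Cores idle.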

To check whether an API supports the Vector Core of Atlas inference products, refer to the supported models listed for that API in Ascend C API.
After Vector Cores are enabled, inter-core synchronization instructions (such as IBSet, IBWait, and SyncAll) are not supported, because AI Cores and Vector Cores are executed separately and scheduled by different tasks.
When operator computation overflows (the input contains Inf/NaN or the computation result exceeds the representable range), the results of AI Cores and Vector Cores are inconsistent. AI Cores support only the saturation mode, while Vector Cores support only the Inf/NaN mode.
