Setting Kernel Type

Product Support

Product	Supported
Atlas A3 training products / Atlas A3 inference products	√
Atlas A2 training products / Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	x
Atlas inference product 's AI Core	√
Atlas inference product 's Vector Core	x
Atlas training products	x

Function

Customizes the kernel type. Only cores of this type are started during operator execution. This prevents unnecessary cores from being started and shortens the core startup overhead.

Prototype

Sets the global default kernel type, which applies to all tiling keys.
Currently, this function can be used in custom operator projects and kernel launch projects.
1

KERNEL_TASK_TYPE_DEFAULT(value)
Sets the kernel type corresponding to a specific tiling key.
Currently, this function can be used only in custom operator projects.
1

KERNEL_TASK_TYPE(key, value)

Parameters

Table 1 Parameters

Parameter

Input/Output

Description

key

Input

Key value of the tiling key. The value is a positive number, indicating a branch of a kernel function.

value

Input

Kernel type. For details about kernel types, see Table 2.

           
                enum KernelMetaType {
    KERNEL_TYPE_AIV_ONLY,
    KERNEL_TYPE_AIC_ONLY,
    KERNEL_TYPE_MIX_AIV_1_0,
    KERNEL_TYPE_MIX_AIC_1_0,
    KERNEL_TYPE_MIX_AIC_1_1,
    KERNEL_TYPE_MIX_AIC_1_2,
    KERNEL_TYPE_AICORE,
    KERNEL_TYPE_VECTORCORE,
    KERNEL_TYPE_MIX_AICORE,
    KERNEL_TYPE_MIX_VECTOR_CORE,
    KERNEL_TYPE_MAX
};

**Table 2** Kernel types
Parameter	Description
KERNEL_TYPE_AIV_ONLY	Only the Vector Cores on the AI Cores are started during operator execution. For example, if blockDim is set to 10 on the host, 10 Vector Cores are started. For the Atlas A2 training products / Atlas A2 inference products , this parameter is supported. For the Atlas A3 training products / Atlas A3 inference products , this parameter is supported. For the Atlas inference product 's AI Core, this parameter is not supported.
KERNEL_TYPE_AIC_ONLY	Only the Cube Cores on the AI Cores are started during operator execution. For example, if blockDim is set to 10 on the host, 10 Cube Cores are started. For the Atlas A2 training products / Atlas A2 inference products , this parameter is supported. For the Atlas A3 training products / Atlas A3 inference products , this parameter is supported. For the Atlas inference product 's AI Core, this parameter is not supported.
KERNEL_TYPE_MIX_AIV_1_0	In the AIC/AIV hybrid scenario, when commands related to multi-core control is used, set the kernel function type to MIX AIV:AIC 1:0 (with hardware synchronization). When the operator is executed, only the Vector Core on the AI Core is started. For example, if blockDim is set to 10 on the host, 10 Vector Cores are started. Hardware synchronization: When different Cores operate the same global memory block, SyncAll() can be called to synchronize the cores to avoid data dependency problems such as write-after-read, read-after-write, and write-after-write. Currently, multi-core synchronization is classified into hardware synchronization and software synchronization. Hardware synchronization uses the full-core synchronization instruction of the hardware to ensure multi-core synchronization. For the Atlas A2 training products / Atlas A2 inference products , this parameter is supported. For the Atlas A3 training products / Atlas A3 inference products , this parameter is supported. For the Atlas inference product 's AI Core, this parameter is not supported.
KERNEL_TYPE_MIX_AIC_1_0	In the AIC/AIV hybrid scenario, when commands related to multi-core control is used, set the kernel function type to MIX AIC:AIV 1:0 (with hardware synchronization). When the operator is executed, only the Cube Core on the AI Core is started. For example, if blockDim is set to 10 on the host, 10 Cube Cores are started. For the Atlas A2 training products / Atlas A2 inference products , this parameter is supported. For the Atlas A3 training products / Atlas A3 inference products , this parameter is supported. For the Atlas inference product 's AI Core, this parameter is not supported.
KERNEL_TYPE_MIX_AIC_1_1	In the AIC and AIV hybrid scenario, set the kernel function type to MIX AIC. AIV 1:1. When the operator is executed, the Cube and Vector Cores on the AI Core are started at the same time. For example, if blockDim is set to 10 on the host, 10 Cube Cores and 10 Vector Cores are started. For the Atlas A2 training products / Atlas A2 inference products , this parameter is supported. For the Atlas A3 training products / Atlas A3 inference products , this parameter is supported. For the Atlas inference product 's AI Core, this parameter is not supported.
KERNEL_TYPE_MIX_AIC_1_2	In the AIC and AIV hybrid scenario, set the kernel function type to MIX AIC:AIV 1:2. When the operator is executed, the Cube and Vector Cores on the AI Core are started at the same time. For example, if blockDim is set to 10 on the host, 10 Cube Cores and 20 Vector Cores are started. For the Atlas A2 training products / Atlas A2 inference products , this parameter is supported. For the Atlas A3 training products / Atlas A3 inference products , this parameter is supported. For the Atlas inference product 's AI Core, this parameter is not supported.
KERNEL_TYPE_AICORE	When the operator is executed, only the AI Core is started. For example, if blockDim is set to 5 on the host, 5 AI Cores are started. For the Atlas inference products , this parameter is supported. For the Atlas A2 training products / Atlas A2 inference products , this parameter is not supported. For the Atlas A3 training products / Atlas A3 inference products , this parameter is not supported.
KERNEL_TYPE_VECTORCORE	This parameter is reserved and is not supported in the current version.
KERNEL_TYPE_MIX_AICORE	This parameter is reserved and is not supported in the current version.
KERNEL_TYPE_MIX_VECTOR_CORE	Vector computation operators developed based on Ascend C can run on the Vector Core. This API is called to pass this parameter to enable the Vector Core. After the Vector Core is enabled, AI Cores and Vector Cores are started at the same time for parallel computation during operator execution. For example, if block_dim is set to 10 on the host, a total of 10 AI Cores and Vector Cores are started. Note that when the number of cores is set by using SetBlockDim, the number of cores must be greater than that of AI Cores. Otherwise, the Vector Core will not be started. For the Atlas inference products , this parameter is supported. For the Atlas A2 training products / Atlas A2 inference products , this parameter is not supported. For the Atlas A3 training products / Atlas A3 inference products , this parameter is not supported.

Restrictions

The priority of KERNEL_TASK_TYPE is higher than that of KERNEL_TASK_TYPE_DEFAULT. If both the global kernel type and the kernel type of a tiling key are set, the kernel type of the tiling key is subject to the setting of KERNEL_TASK_TYPE.
If the global default kernel type is not set and you set the kernel type for only several tiling keys, that is, the kernel type is not set for some tiling keys, an error will be reported during operator kernel compilation.
When setting a specific kernel task type, ensure that the operator implementation matches the kernel type. For example, if the kernel type is set to KERNEL_TYPE_MIX_AIC_1_2, the internal implementation of the operator should match the core ratio AIC:AIV of 1:2. If the kernel type is set to KERNEL_TYPE_AIC_ONLY, the internal implementation of the operator should be pure cube logic and should not contain vector logic. This setting also applies to other kernel types.
When the kernel type of a pure cube or pure vec operator is forcibly set to MIX, the workspace size cannot be set to 0. Instead, it must be set to a value greater than 0, for example, 16 or 32.
When using the Tiling template programming, you only need to set the kernel type by using ASCENDC_TPL_KERNEL_TYPE_SEL. This API does not take effect.

Example

Example 1: Enabling Vector Core

When an operator is developed on the kernel, you need to use this API to enable the Vector Core. When the operator is executed, the AI Core and Vector Core are started at the same time. In this case, the AI Core is used as the Vector Core. The following is an example:

          
           
             
             
               extern "C" __global__ __aicore__ void add_custom(__gm__ uint8_t *x, __gm__ uint8_t *y, __gm__ uint8_t *z, __gm__ uint8_t *workspace, __gm__ uint8_t *tiling)
{
    GET_TILING_DATA(tilingData, tiling);
    if (workspace == nullptr) {
        return;
    }
    KernelAdd op;
    op.Init(x, y, z, tilingData.blockDim, tilingData.totalLength, tilingData.tileNum);
    KERNEL_TASK_TYPE_DEFAULT(KERNEL_TYPE_MIX_VECTOR_CORE); // Enable Vector Core.
    if (TILING_KEY_IS(1)) {
        op.Process1();
    } else if (TILING_KEY_IS(2)) {
        op.Process2();
    }
    // ...
}

              

            

          
         

In the operator tiling development on the host, block_dim indicates the total number of AI Cores and Vector Cores. For example, if it is set to 10 on the host, a total of 10 AI Cores and Vector Cores are started. To ensure that Vector Cores can be started, set blockDim to a value greater than the number of AI Cores. You can call the GetCoreNumAic API to obtain the number of AI Cores and call the GetCoreNumVector API to obtain the number of Vector Cores. The following code snippet shows how to set block_dim. In this example, block_dim is set to the sum of AI Cores and Vector Cores, indicating that all AI Cores and Vector Cores are started.

          
               // Example of a matching tiling function on the host:
ge::graphStatus TilingFunc(gert::TilingContext* context)
{	
    // Enable Vector Core and set block_dim to the number of vector cores in AI Cores plus the number of vector cores in Vector Cores.
    auto ascendcPlatform = platform_ascendc::PlatformAscendC(platformInfo);
    auto totalCoreNum = ascendcPlatform.GetCoreNumAiv();
    // Replace ASCENDXXX with the actual version number.
    if (ascendcPlatform.GetSocVersion() == platform_ascendc::SocVersion::ASCENDXXX) {
       totalCoreNum = totalCoreNum + ascendcPlatform.GetCoreNumVector();
    }
    context->SetBlockDim(totalCoreNum);
}

Example 2: Set the kernel type corresponding to a specific tiling key. The following is pseudo-code and cannot be run directly.

        
         
           
           
             extern "C" __global__ __aicore__ void add_custom(__gm__ uint8_t *x, __gm__ uint8_t *y, __gm__ uint8_t *z, __gm__ uint8_t *workspace, __gm__ uint8_t *tiling)
{
    GET_TILING_DATA(tilingData, tiling);
    if (workspace == nullptr) {
        return;
    }
    KernelAdd op;
    op.Init(x, y, z, tilingData.blockDim, tilingData.totalLength, tilingData.tileNum);
    KERNEL_TASK_TYPE_DEFAULT(KERNEL_TYPE_AIV_ONLY); // Set the default kernel type to pure AIV.
    if (TILING_KEY_IS(1)) {
        KERNEL_TASK_TYPE(1, KERNEL_TYPE_MIX_AIV_1_0); // Set the kernel type to MIX AIV 1:0 when the tiling key is 1.
        op.Process1();
    } else if (TILING_KEY_IS(2)) {
        KERNEL_TASK_TYPE(2, KERNEL_TYPE_AIV_ONLY); // Set the kernel type to pure AIV when the tiling key is 2.
        op.Process2();
    }
    // ...
}
// Example of a matching tiling function on the host:
ge::graphStatus TilingFunc(gert::TilingContext* context)
{	
    // ...
    if (context->GetInputShape(0) > 10) {
        context->SetTilingKey(1);
    } else if (some condition) {
        context->SetTilingKey(2);
    }
}

            

          

        
       

Parent topic: Kernel Tiling