Setting the Kernel Type
Supported Products
|
Product |
Supported/Unsupported |
|---|---|
|
|
√ |
|
|
√ |
|
|
x |
|
|
√ |
|
|
x |
|
|
x |
Functions
Customizes the kernel type. Only cores of this type are started during operator execution. This prevents unnecessary cores from being started and shortens the core startup overhead.
Prototype
- Sets the global default kernel type, which applies to all tiling keys.
Currently, this function can be used in custom operator projects and kernel launch projects.
1KERNEL_TASK_TYPE_DEFAULT(value)
- Sets the kernel type corresponding to a specific tiling key.
Currently, this function can be used only in custom operator projects.
1KERNEL_TASK_TYPE(key, value)
Parameters
|
Parameter |
Input/Output |
Description |
||
|---|---|---|---|---|
|
key |
Input |
Key value of the tiling key. The value is a positive number, indicating a branch of a kernel function. |
||
|
value |
Input |
Kernel type. For details about kernel types, see Table 2.
|
|
Parameter |
Description |
|---|---|
|
KERNEL_TYPE_AIV_ONLY |
Only the Vector Cores on the AI Cores are started during operator execution. For example, if blockDim is set to 10 on the host, 10 Vector Cores are started. For the For the For the |
|
KERNEL_TYPE_AIC_ONLY |
Only the Cube Cores on the AI Cores are started during operator execution. For example, if blockDim is set to 10 on the host, 10 Cube Cores are started. For the For the For the |
|
KERNEL_TYPE_MIX_AIV_1_0 |
In the AIC/AIV hybrid scenario, when commands related to multi-core control is used, set the kernel function type to MIX AIV:AIC 1:0 (with hardware synchronization). When the operator is executed, only the Vector Core on the AI Core is started. For example, if blockDim is set to 10 on the host, 10 Vector Cores are started. Hardware synchronization: When different Cores operate the same global memory block, SyncAll() can be called to synchronize the cores to avoid data dependency problems such as write-after-read, read-after-write, and write-after-write. Currently, multi-core synchronization is classified into hardware synchronization and software synchronization. Hardware synchronization uses the full-core synchronization instruction of the hardware to ensure multi-core synchronization. For the For the For the |
|
KERNEL_TYPE_MIX_AIC_1_0 |
In the AIC/AIV hybrid scenario, when commands related to multi-core control is used, set the kernel function type to MIX AIC:AIV 1:0 (with hardware synchronization). When the operator is executed, only the Cube Core on the AI Core is started. For example, if blockDim is set to 10 on the host, 10 Cube Cores are started. For the For the For the |
|
KERNEL_TYPE_MIX_AIC_1_1 |
In the AIC and AIV hybrid scenario, set the kernel function type to MIX AIC. AIV 1:1. When the operator is executed, the Cube and Vector Cores on the AI Core are started at the same time. For example, if blockDim is set to 10 on the host, 10 Cube Cores and 10 Vector Cores are started. For the For the For the |
|
KERNEL_TYPE_MIX_AIC_1_2 |
In the AIC and AIV hybrid scenario, set the kernel function type to MIX AIC:AIV 1:2. When the operator is executed, the Cube and Vector Cores on the AI Core are started at the same time. For example, if blockDim is set to 10 on the host, 10 Cube Cores and 20 Vector Cores are started. For the For the For the |
|
KERNEL_TYPE_AICORE |
When the operator is executed, only the AI Core is started. For example, if blockDim is set to 5 on the host, 5 AI Cores are started. For the For the For the |
|
KERNEL_TYPE_VECTORCORE |
This parameter is reserved and is not supported in the current version. |
|
KERNEL_TYPE_MIX_AICORE |
This parameter is reserved and is not supported in the current version. |
|
KERNEL_TYPE_MIX_VECTOR_CORE |
Vector operators developed based on Ascend C can run on the Vector Core. This API is called to pass this parameter to enable the Vector Core. After Vector Cores are enabled, AI Cores and Vector Cores are started at the same time for parallel computing during operator execution. For example, if block_dim is set to 10 on the host, a total of 10 AI Cores and Vector Cores are started. Note that when the number of cores is set by using SetBlockDim, the number of cores must be greater than that of AI Cores. Otherwise, the Vector Core will not be started. For the For the For the |
Constraints
- The priority of KERNEL_TASK_TYPE is higher than that of KERNEL_TASK_TYPE_DEFAULT. If both the global kernel type and the kernel type of a tiling key are set, the kernel type of the tiling key is subject to the setting of KERNEL_TASK_TYPE.
- If the global default kernel type is not set and you set the kernel type for only several tiling keys, that is, the kernel type is not set for some tiling keys, an error will be reported during operator kernel compilation.
- When setting a specific kernel task type, ensure that the operator implementation matches the kernel type. For example, if the kernel type is set to KERNEL_TYPE_MIX_AIC_1_2, the internal implementation of the operator should match the core ratio AIC:AIV of 1:2. If the kernel type is set to KERNEL_TYPE_AIC_ONLY, the internal implementation of the operator should be pure cube logic and should not contain vector logic. This setting also applies to other kernel types.
- When the kernel type of a pure cube or pure vec operator is forcibly set to MIX, the workspace size cannot be set to 0. Instead, it must be set to a value greater than 0, for example, 16 or 32.
- When the Tiling template programming is used, you only need to set the kernel type by using ASCENDC_TPL_KERNEL_TYPE_SEL. This API is not required and does not take effect.
Examples
- Example 1: Enabling Vector Core
- After the operator kernel is developed, this API is called to enable the Vector Core. During operator execution, the AI Core and Vector Core are started at the same time. In this case, the AI Core is used as the Vector Core. The following is an example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
extern "C" __global__ __aicore__ void add_custom(__gm__ uint8_t *x, __gm__ uint8_t *y, __gm__ uint8_t *z, __gm__ uint8_t *workspace, __gm__ uint8_t *tiling) { GET_TILING_DATA(tilingData, tiling); if (workspace == nullptr) { return; } KernelAdd op; op.Init(x, y, z, tilingData.blockDim, tilingData.totalLength, tilingData.tileNum); KERNEL_TASK_TYPE_DEFAULT(KERNEL_TYPE_MIX_VECTOR_CORE); // Enable Vector Core. if (TILING_KEY_IS(1)) { op.Process1(); } else if (TILING_KEY_IS(2)) { op.Process2(); } // ... }
- In the operator tiling development on the host, block_dim indicates the total number of AI Cores and Vector Cores. For example, if it is set to 10 on the host, a total of 10 AI Cores and Vector Cores are started. To ensure that Vector Cores can be started, set blockDim to a value greater than the number of AI Cores. You can call the GetCoreNumAic API to obtain the number of AI Cores and call the GetCoreNumVector API to obtain the number of Vector Cores. The following code snippet shows how to set block_dim. In this example, block_dim is set to the sum of AI Cores and Vector Cores, indicating that all AI Cores and Vector Cores are started.
1 2 3 4 5 6 7 8 9 10 11 12
// Example of a matching tiling function on the host: ge::graphStatus TilingFunc(gert::TilingContext* context) { // Enable Vector Core and set block_dim to the number of vector cores in AI Cores plus the number of vector cores in Vector Cores. auto ascendcPlatform = platform_ascendc::PlatformAscendC(platformInfo); auto totalCoreNum = ascendcPlatform.GetCoreNumAiv(); // Replace ASCENDXXX with the actual version number. if (ascendcPlatform.GetSocVersion() == platform_ascendc::SocVersion::ASCENDXXX) { totalCoreNum = totalCoreNum + ascendcPlatform.GetCoreNumVector(); } context->SetBlockDim(totalCoreNum); }
- After the operator kernel is developed, this API is called to enable the Vector Core. During operator execution, the AI Core and Vector Core are started at the same time. In this case, the AI Core is used as the Vector Core. The following is an example:
- Example 2: Set the kernel type corresponding to a specific tiling key. The following is pseudo-code and cannot be run directly.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
extern "C" __global__ __aicore__ void add_custom(__gm__ uint8_t *x, __gm__ uint8_t *y, __gm__ uint8_t *z, __gm__ uint8_t *workspace, __gm__ uint8_t *tiling) { GET_TILING_DATA(tilingData, tiling); if (workspace == nullptr) { return; } KernelAdd op; op.Init(x, y, z, tilingData.blockDim, tilingData.totalLength, tilingData.tileNum); KERNEL_TASK_TYPE_DEFAULT(KERNEL_TYPE_AIV_ONLY); // Set the default kernel type to pure AIV. if (TILING_KEY_IS(1)) { KERNEL_TASK_TYPE(1, KERNEL_TYPE_MIX_AIV_1_0); // Set the kernel type to MIX AIV 1:0 when the tiling key is 1. op.Process1(); } else if (TILING_KEY_IS(2)) { KERNEL_TASK_TYPE(2, KERNEL_TYPE_AIV_ONLY); // Set the kernel type to pure AIV when the tiling key is 2. op.Process2(); } // ... } // Example of a matching tiling function on the host: ge::graphStatus TilingFunc(gert::TilingContext* context) { // ... if (context->GetInputShape(0) > 10) { context->SetTilingKey(1); } else if (some condition) { context->SetTilingKey(2); } }