ICPU_RUN_KF

Supported Products

| Product | Supported/Unsupported |
| --- | --- |
| Atlas A3 training products/Atlas A3 inference products | |
| Atlas A2 training products/Atlas A2 inference products | |
| Atlas 200I/500 A2 inference products | |
| Atlas inference product's AI Core | |
| Atlas inference product's Vector Core | x |
| Atlas training products | |

Functions

Serves as the entry point for CPU-side debugging. The macro invokes the operator kernel function on the CPU so that the kernel's behavior can be verified there.

Prototype

#define ICPU_RUN_KF(func, blkdim, ...)

Parameters

| Parameter | Input/Output | Description |
| --- | --- | --- |
| func | Input | Pointer to the operator kernel function. |
| blkdim | Input | Number of cores on which the operator runs (corenum). |
| ... | Input | All input and output arguments of the kernel function, listed in order. At most 32 arguments are supported; passing more than 32 causes a build error. |

Returns

None

Constraints

Except for func and blkdim, every argument must be a pointer to shared memory allocated by GmAlloc. The number and order of these arguments must match the parameters of the kernel function.
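As a mental model for the call shape and the argument-matching constraint, the following minimal sketch defines a hypothetical macro, MOCK_RUN_KF, that checks at compile time whether the arguments fit the kernel's parameter list and then runs a toy kernel once per core. It is not the implementation of ICPU_RUN_KF. The names MOCK_RUN_KF, mock::g_blockIdx, mock::CheckArgs, AddMock, and RunMockExample are all invented for illustration. Ordinary std::vector storage stands in for GmAlloc memory, and C++17 is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

namespace mock {
// Hypothetical stand-in for the per-core block index; the real CPU debug
// library tracks this internally.
inline int64_t g_blockIdx = 0;

// Rejects, at compile time, an argument list that the kernel cannot accept.
template <typename F, typename... Args>
void CheckArgs(F&&, Args&&...)
{
    static_assert(std::is_invocable_v<F, Args...>,
                  "arguments must match the kernel's parameters in number and order");
}
}  // namespace mock

// Hypothetical, simplified model of a (func, blkdim, ...) debug macro:
// validate the argument list, then run the kernel once per logical core.
#define MOCK_RUN_KF(func, blkdim, ...)                               \
    do {                                                             \
        mock::CheckArgs(func, __VA_ARGS__);                          \
        for (int64_t i = 0; i < static_cast<int64_t>(blkdim); ++i) { \
            mock::g_blockIdx = i;                                    \
            func(__VA_ARGS__);                                       \
        }                                                            \
    } while (0)

constexpr uint32_t kElemsPerBlock = 4;  // illustrative slice size per core

// Toy kernel: each core adds its own slice of x and y into z.
void AddMock(uint8_t* x, uint8_t* y, uint8_t* z)
{
    assert(x != nullptr && y != nullptr && z != nullptr);
    const int64_t base = mock::g_blockIdx * kElemsPerBlock;
    for (uint32_t i = 0; i < kElemsPerBlock; ++i) {
        z[base + i] = static_cast<uint8_t>(x[base + i] + y[base + i]);
    }
}

// Runs the toy kernel on blockDim cores and returns the output buffer.
std::vector<uint8_t> RunMockExample(uint32_t blockDim)
{
    const size_t n = static_cast<size_t>(blockDim) * kElemsPerBlock;
    std::vector<uint8_t> x(n, 1), y(n, 2), z(n, 0);
    uint8_t* px = x.data();
    uint8_t* py = y.data();
    uint8_t* pz = z.data();
    // Arguments follow the kernel's parameter order: x, y, z.
    MOCK_RUN_KF(AddMock, blockDim, px, py, pz);
    return z;
}
```

In this sketch, calling RunMockExample(3) fills all 12 output elements with 1 + 2. Dropping an argument from the MOCK_RUN_KF call, or passing a non-pointer, trips the static_assert at build time.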

Examples

The following code uses the add_custom operator as an example to show how to write the application that calls the operator when the operator kernel function is verified on the CPU. When implementing your own application, adapt the parts that depend on the kernel function, namely the kernel function name and its input and output parameters, and allocate memory, copy memory, and read/write files accordingly. The calls to the related APIs can be reused as they are.

  1. Include header files as required. The ASCENDC_CPU_DEBUG macro is used to distinguish the header files that need to be included on the CPU side from those on the NPU side.
    #include "data_utils.h"
    #ifndef ASCENDC_CPU_DEBUG
    #include "acl/acl.h"
    #else
    #include "tikicpulib.h"
    extern "C" __global__ __aicore__ void add_custom(GM_ADDR x, GM_ADDR y, GM_ADDR z); // Kernel Function Declaration
    #endif
    
  2. Verify execution on the CPU. The runtime verification of the operator kernel function on the CPU proceeds as follows.
    Figure 1 Runtime verification on the CPU
    int32_t main(int32_t argc, char* argv[])
    {
        uint32_t blockDim = 8;
        size_t inputByteSize = 8 * 2048 * sizeof(uint16_t);
        size_t outputByteSize = 8 * 2048 * sizeof(uint16_t);
    
        // Call GmAlloc to allocate shared memory and initialize data.
        uint8_t* x = (uint8_t*)AscendC::GmAlloc(inputByteSize);
        uint8_t* y = (uint8_t*)AscendC::GmAlloc(inputByteSize);
        uint8_t* z = (uint8_t*)AscendC::GmAlloc(outputByteSize);
    
        ReadFile("./input/input_x.bin", inputByteSize, x, inputByteSize);
        ReadFile("./input/input_y.bin", inputByteSize, y, inputByteSize);
        // Set the kernel mode to AIV for vector operators.
        AscendC::SetKernelMode(KernelMode::AIV_MODE);
        // Call the ICPU_RUN_KF debugging macro to call the kernel function on the CPU.
        ICPU_RUN_KF(add_custom, blockDim, x, y, z);
        // Write the output data.
        WriteFile("./output/output_z.bin", z, outputByteSize);
        // Call GmFree to release allocated resources.
        AscendC::GmFree((void *)x);
        AscendC::GmFree((void *)y);
        AscendC::GmFree((void *)z);
        return 0;
    }
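The buffer sizes in the example above come from the expression 8 * 2048 * sizeof(uint16_t). As a quick check of that arithmetic, the sketch below evaluates it at compile time. The even split of elements across the 8 cores is an assumption made for illustration, because the tiling is decided inside the kernel function and is not stated here. All constant names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint32_t kBlockDim = 8;               // blockDim from the example
constexpr size_t kTotalElems = 8 * 2048;        // element count from the example
constexpr size_t kElemSize = sizeof(uint16_t);  // 2-byte element type

// Bytes requested from GmAlloc for each of x, y, and z.
constexpr size_t kBufferBytes = kTotalElems * kElemSize;

// Assumption: the kernel divides the elements evenly across the cores.
constexpr size_t kElemsPerCore = kTotalElems / kBlockDim;
constexpr size_t kBytesPerCore = kElemsPerCore * kElemSize;

static_assert(kBufferBytes == 32768, "each buffer holds 32 KiB");
static_assert(kElemsPerCore == 2048, "2048 elements per core under an even split");
static_assert(kBytesPerCore == 4096, "4 KiB per core under an even split");
```

If the element type or the core count changes, the same expressions give the sizes to pass to GmAlloc, ReadFile, and WriteFile.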