ICPU_RUN_KF
Supported Products
| Product | Supported/Unsupported |
|---|---|
|  | √ |
|  | √ |
|  | √ |
|  | √ |
|  | x |
|  | √ |
Functions
Serves as the entry point for CPU-side debugging. When the CPU-side behavior of a kernel function is being verified, this macro invokes the operator's kernel function on the CPU.
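To make the idea of a CPU debugging entry concrete, the following is a minimal, self-contained sketch of calling a kernel-like function once per simulated core. It is not the implementation of ICPU_RUN_KF: SketchRunOnCores, g_sketchBlockIdx, kSketchSliceLen, and SketchAddKernel are hypothetical names, and running the cores one after another in a loop is an assumption made for simplicity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the index of the core that is currently executing.
static uint32_t g_sketchBlockIdx = 0;

// Hypothetical launcher (assumption: cores are simulated sequentially).
// Calls the kernel once per simulated core with the same argument list.
template <typename Func, typename... Args>
void SketchRunOnCores(Func func, uint32_t blkdim, Args... args)
{
    assert(blkdim > 0);  // at least one core is needed
    for (uint32_t idx = 0; idx < blkdim; ++idx) {
        g_sketchBlockIdx = idx;  // lets the kernel find out which core it is
        func(args...);
    }
}

// Number of elements handled by each simulated core in this toy example.
constexpr uint32_t kSketchSliceLen = 4;

// Toy kernel: each simulated core adds its own slice of x and y into z.
void SketchAddKernel(const int32_t* x, const int32_t* y, int32_t* z)
{
    const uint32_t base = g_sketchBlockIdx * kSketchSliceLen;
    for (uint32_t i = 0; i < kSketchSliceLen; ++i) {
        z[base + i] = x[base + i] + y[base + i];
    }
}
```

For example, calling SketchRunOnCores(SketchAddKernel, 2, x, y, z) on three 8-element int32_t arrays fills all of z, with each of the two simulated cores handling 4 elements.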
Prototype
```cpp
#define ICPU_RUN_KF(func, blkdim, ...)
```
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| func | Input | Pointer to the operator kernel function. |
| blkdim | Input | Number of cores that the operator runs on (corenum). |
| ... | Input | All input and output parameters of the kernel function, in order. A maximum of 32 parameters is supported; passing more than 32 causes a build error. |
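The way a variadic macro can pass its trailing arguments through to a kernel, and reject too many of them at build time, can be sketched as follows. This is an illustration under assumptions, not the library's code: SKETCH_RUN_KF, SketchLaunch, SketchFitsLimit, kSketchMaxArgs, and SketchSum are hypothetical names, blkdim is ignored here, and the real macro may enforce the 32-parameter limit by a different mechanism.

```cpp
#include <cassert>
#include <cstddef>

// Limit taken from the parameter table above.
constexpr std::size_t kSketchMaxArgs = 32;

// Returns true when an argument count fits within the limit.
constexpr bool SketchFitsLimit(std::size_t count)
{
    return count <= kSketchMaxArgs;
}

// Hypothetical launcher: forwards the variadic arguments to the kernel unchanged.
template <typename Ret, typename... Params, typename... Args>
void SketchLaunch(Ret (*func)(Params...), Args... args)
{
    // Evaluated at compile time, so exceeding the limit stops the build.
    static_assert(SketchFitsLimit(sizeof...(Args)),
                  "too many kernel arguments: at most 32 are allowed");
    assert(func != nullptr);
    func(args...);
}

// Same shape as the documented prototype; blkdim is accepted but unused in this sketch.
#define SKETCH_RUN_KF(func, blkdim, ...)          \
    do {                                          \
        (void)(blkdim);                           \
        SketchLaunch(func, __VA_ARGS__);          \
    } while (0)

// Toy kernel with two inputs and one output.
void SketchSum(const int* a, const int* b, int* out)
{
    *out = *a + *b;
}
```

A call such as SKETCH_RUN_KF(SketchSum, 1, &a, &b, &out) expands to SketchLaunch(SketchSum, &a, &b, &out), so the arguments reach the kernel in the order in which they were written.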
Returns
None
Constraints
Apart from func and blkdim, every argument must be a pointer to shared memory allocated by GmAlloc. The number and order of these arguments must match the parameter list of the kernel function.
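The requirement on the number and order of arguments can be illustrated with a compile-time check. The sketch below is hypothetical and is not part of the library: SketchArgsMatch, SketchCheckedCall, and SketchKernel are made-up names, and the check compares types only. It does not verify that a pointer came from GmAlloc. In the example in this document all three kernel parameters share the type GM_ADDR, so a type-based check of this kind would detect a wrong argument count but not two swapped buffers.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>
#include <type_traits>

// Primary template: anything that is not a plain function pointer never matches.
template <typename Func, typename... Args>
struct SketchArgsMatch : std::false_type {};

// Function pointers match only if the argument types equal the parameter
// types position by position (same count, same order).
template <typename Ret, typename... Params, typename... Args>
struct SketchArgsMatch<Ret (*)(Params...), Args...>
    : std::is_same<std::tuple<Params...>, std::tuple<Args...>> {};

// Hypothetical checked call: refuses to compile on a count or order mismatch.
template <typename Func, typename... Args>
void SketchCheckedCall(Func func, Args... args)
{
    static_assert(SketchArgsMatch<Func, Args...>::value,
                  "arguments must match the kernel parameters in number and order");
    assert(func != nullptr);
    func(args...);
}

// Toy kernel whose output pointer has a different type from its inputs,
// so that a wrong order is visible to the type check.
void SketchKernel(uint8_t* x, uint8_t* y, uint16_t* z)
{
    *z = static_cast<uint16_t>(*x + *y);
}
```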
Examples
The following code uses the add_custom operator as an example to show how to write the application that calls the operator when the kernel function is verified on the CPU. When you write your own application, adapt the parts that depend on the kernel function, such as its name and its input and output parameters, and allocate memory, copy memory, and read and write files accordingly. The calls to the related APIs can be reused as they are.
- Include header files as required. The ASCENDC_CPU_DEBUG macro is used to distinguish the header files that need to be included on the CPU side from those on the NPU side.
  ```cpp
  #include "data_utils.h"
  #ifndef ASCENDC_CPU_DEBUG
  #include "acl/acl.h"
  #else
  #include "tikicpulib.h"
  extern "C" __global__ __aicore__ void add_custom(GM_ADDR x, GM_ADDR y, GM_ADDR z); // Kernel function declaration
  #endif
  ```
- Verify the operator on the CPU. To verify the kernel function at run time on the CPU, perform the following steps.

  Figure 1 Runtime verification on the CPU

  ```cpp
  int32_t main(int32_t argc, char* argv[])
  {
      uint32_t blockDim = 8;
      size_t inputByteSize = 8 * 2048 * sizeof(uint16_t);
      size_t outputByteSize = 8 * 2048 * sizeof(uint16_t);
      // Call GmAlloc to allocate shared memory and initialize data.
      uint8_t* x = (uint8_t*)AscendC::GmAlloc(inputByteSize);
      uint8_t* y = (uint8_t*)AscendC::GmAlloc(inputByteSize);
      uint8_t* z = (uint8_t*)AscendC::GmAlloc(outputByteSize);

      ReadFile("./input/input_x.bin", inputByteSize, x, inputByteSize);
      ReadFile("./input/input_y.bin", inputByteSize, y, inputByteSize);
      // Set the kernel mode to AIV for vector operators.
      AscendC::SetKernelMode(KernelMode::AIV_MODE);
      // Call the ICPU_RUN_KF debugging macro to call the kernel function on the CPU.
      ICPU_RUN_KF(add_custom, blockDim, x, y, z);
      // Write the output data.
      WriteFile("./output/output_z.bin", z, outputByteSize);
      // Call GmFree to release allocated resources.
      AscendC::GmFree((void *)x);
      AscendC::GmFree((void *)y);
      AscendC::GmFree((void *)z);
      return 0;
  }
  ```
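The first step above relies on ASCENDC_CPU_DEBUG to choose between CPU-side and NPU-side code. That switch can be tried in isolation with the following minimal sketch, which has no Ascend C dependency. SketchBuildTarget is a hypothetical function, and defining the macro inside the file is an assumption made only to keep the sketch self-contained. A real project would more likely set it from the build configuration, for example with a -D compiler option.

```cpp
#include <cassert>
#include <string>

// Defined here only to keep the sketch self-contained (assumption: a real
// project would set this macro from the build configuration instead).
#ifndef ASCENDC_CPU_DEBUG
#define ASCENDC_CPU_DEBUG
#endif

#ifdef ASCENDC_CPU_DEBUG
// CPU debugging path: this is where the CPU-side header and the kernel
// function declaration would be pulled in.
static const char* SketchBuildTarget() { return "cpu"; }
#else
// NPU path: this is where the NPU-side runtime header would be included.
static const char* SketchBuildTarget() { return "npu"; }
#endif
```

With the macro defined, SketchBuildTarget() returns "cpu". Removing the definition at the top of the file and rebuilding selects the other branch.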