Concepts and Usage
Basic Concepts
The APIs in this section manipulate the operator binary, kernel function, kernel function parameter list, and parameters. The following figure shows the relationships between them.

- Operator binary: Building the operator source code produces the operator binary file (*.o). For built-in CANN operators, you can obtain the operator binary file from the operator binary package (package name: Ascend-cann-*-ops-*.run). For a custom operator, you can obtain the operator binary file after building the operator and releasing the binary file. For details about how to develop and build a custom operator, see the Ascend C Operator Development Guide.
- Kernel function: The entry function that implements an operator on the device. You write the device-side code using the C/C++ function syntax extensions, and perform data access and computation inside the kernel function to implement all functions of the operator.
API Call Sequence for Kernel Loading and Execution

The major steps are as follows:
- Call aclInit for initialization.
- Allocate runtime resources. Call aclrtSetDevice to specify the compute device and call aclrtCreateStream to create a stream.
For details, see Runtime Resource Allocation and Deallocation.
- Call aclrtBinaryLoadFromFile to load the operator binary file.
AI CPU operators can also be loaded from operator binary data in memory by calling aclrtBinaryLoadFromData. After the operator binary data is loaded, call aclrtRegisterCpuFunc to register the AI CPU operators.
- Call aclrtBinaryGetFunctionByEntry or aclrtBinaryGetFunction to obtain the kernel function handle.
- (Optional) Assemble the parameter list based on the kernel function handle. The operations are as follows:
- Initialize the parameter list.
Currently, the memory of the parameter list can be managed either by the system (by calling aclrtKernelArgsInit) or by the user (by calling aclrtKernelArgsInitByUserMem).
- Append parameters and update parameter values.
The kernel function parameter list contains parameters of different types, such as pointer, placeholder, and uint8_t parameters.
- Pointer parameter: Its value is a device memory address. Generally, the input and output of an operator are parameters of this type. You need to call the device memory allocation API (for example, aclrtMalloc) in advance to allocate memory and copy data to the device.
- Placeholder: A placeholder is also a pointer parameter, but you do not copy its data to the device yourself; the Runtime does this for you. When the parameter is appended, the Runtime does not yet fill in a device address. It fills in the actual address only during kernel launch, hence the name placeholder. For parameters that are neither inputs nor outputs of an operator, you can use placeholders to combine the host-to-device copies of small data (less than 2 KB recommended) into one copy during kernel launch, which reduces the number of copy operations and improves performance.

You can call different parameter appending APIs for different types of parameters.
- For a placeholder parameter, the associated memory must be placed after all parameters. Therefore, when appending a placeholder parameter, call aclrtKernelArgsAppendPlaceHolder to set a placeholder. After all parameters are appended, call aclrtKernelArgsGetPlaceHolderBuffer to obtain the memory address to which the placeholder points. You can manage the data in the memory based on the obtained memory address.
- For a non-placeholder parameter (such as a pointer parameter or a uint8_t parameter), call aclrtKernelArgsAppend to copy the user-defined parameter value to the parameter data area to which argsHandle points. To update the parameter value, call aclrtKernelArgsParaUpdate.
Note that the kernel function parameter list may contain multiple parameters, and parameters of different types may appear alternately. Therefore, you need to append parameters from left to right according to the parameter sequence in the parameter list. A maximum of 128 parameters can be appended.
- End the parameter appending and parameter value update.
After all parameters are appended, call aclrtKernelArgsFinalize to indicate that the parameter list is fully assembled. Parameter values can still be updated after aclrtKernelArgsFinalize is called, but you must then call aclrtKernelArgsFinalize again.
- Call the Launch Kernel API to start the compute task of the corresponding operator.
If the aclrtArgsHandle parameter list handle is used to assemble the input data of the kernel function, call aclrtLaunchKernelWithConfig to start the compute task of the corresponding operator. In this mode, you only need to append parameters to the parameter list in sequence. You do not need to manage how the parameters are laid out in memory or handle any internal parameters.
If the input data of the kernel function is stored in the host or device memory, call aclrtLaunchKernel, aclrtLaunchKernelV2, or aclrtLaunchKernelWithHostArgs to start the compute task of the corresponding operator.
- Call aclrtBinaryUnLoad to unload the operator binary file.
- Deallocate runtime resources. Call aclrtDestroyStream to destroy streams and call aclrtResetDevice to release resources on the device.
For details, see Runtime Resource Allocation and Deallocation.
- Call aclFinalize for deinitialization.
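
The parameter-list rules in the steps above (append from left to right, place placeholder data after all parameters, fill in placeholder addresses only at launch, and finalize again after an update) can be illustrated with a small host-only model. The sketch below does not call AscendCL. ArgsModel, its methods, and demo are hypothetical names, and the byte layout (no alignment, 64-bit offsets standing in for device addresses) is an assumption made for illustration, not the Runtime's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of a kernel parameter list.
// None of these names belong to AscendCL; they only mirror the rules in the text.
class ArgsModel {
public:
    static constexpr std::size_t kMaxParams = 128;  // limit stated in the text

    // Append a by-value parameter (pointer value, uint8_t, ...). Returns its index.
    std::size_t append(const void* value, std::size_t size) {
        return addSlot(value, size, false);
    }

    // Append a placeholder: an 8-byte slot whose address is resolved at launch.
    std::size_t appendPlaceholder() {
        const std::uint64_t unresolved = 0;
        return addSlot(&unresolved, sizeof unresolved, true);
    }

    // Store placeholder data in the tail region, which sits after all parameters.
    // After the first call, no further parameters can be appended.
    void setPlaceholderData(std::size_t index, const void* data, std::size_t size) {
        Slot& s = slots_.at(index);
        if (!s.isPlaceholder || s.hasData) throw std::logic_error("invalid placeholder");
        sealed_ = true;
        s.hasData = true;
        s.tailOffset = tail_.size();
        const auto* p = static_cast<const std::uint8_t*>(data);
        tail_.insert(tail_.end(), p, p + size);
        finalized_ = false;
    }

    // Overwrite the value of a non-placeholder parameter; requires finalize() again.
    void update(std::size_t index, const void* value, std::size_t size) {
        const Slot& s = slots_.at(index);
        if (s.isPlaceholder || size != s.size) throw std::logic_error("invalid update");
        std::memcpy(head_.data() + s.offset, value, size);
        finalized_ = false;
    }

    void finalize() { finalized_ = true; }
    bool finalized() const { return finalized_; }

    // Model of launch: resolve each placeholder to the offset of its data and
    // return one contiguous buffer (parameters first, placeholder data last).
    std::vector<std::uint8_t> launch() const {
        if (!finalized_) throw std::logic_error("finalize before launch");
        std::vector<std::uint8_t> flat = head_;
        for (const Slot& s : slots_) {
            if (!s.isPlaceholder) continue;
            const std::uint64_t resolved = head_.size() + s.tailOffset;
            std::memcpy(flat.data() + s.offset, &resolved, sizeof resolved);
        }
        flat.insert(flat.end(), tail_.begin(), tail_.end());
        assert(flat.size() == head_.size() + tail_.size());
        return flat;
    }

private:
    struct Slot {
        std::size_t offset;      // position of the slot in the parameter area
        std::size_t size;        // size of the slot in bytes
        bool isPlaceholder;
        bool hasData;
        std::size_t tailOffset;  // position of placeholder data in the tail
    };

    std::size_t addSlot(const void* value, std::size_t size, bool isPlaceholder) {
        if (sealed_) throw std::logic_error("append all parameters first");
        if (slots_.size() >= kMaxParams) throw std::length_error("too many parameters");
        slots_.push_back({head_.size(), size, isPlaceholder, false, 0});
        const auto* p = static_cast<const std::uint8_t*>(value);
        head_.insert(head_.end(), p, p + size);
        finalized_ = false;
        return slots_.size() - 1;
    }

    std::vector<Slot> slots_;
    std::vector<std::uint8_t> head_;  // parameter area
    std::vector<std::uint8_t> tail_;  // placeholder data area
    bool sealed_ = false;
    bool finalized_ = false;
};

// Hypothetical kernel signature: kernel(x, tiling, n), where x is a pointer,
// tiling is a placeholder, and n is a uint8_t.
std::vector<std::uint8_t> demo() {
    ArgsModel args;
    std::uint64_t x = 0x1000;  // stands in for a device address
    std::uint8_t n = 7;
    const std::uint32_t tiling[2] = {16, 32};
    args.append(&x, sizeof x);
    const std::size_t ph = args.appendPlaceholder();
    const std::size_t nIdx = args.append(&n, sizeof n);
    args.setPlaceholderData(ph, tiling, sizeof tiling);
    args.finalize();
    n = 9;
    args.update(nIdx, &n, sizeof n);  // clears the finalized state
    args.finalize();                  // so finalize again before launch
    return args.launch();
}
```

In this model, demo appends the three parameters in signature order, attaches the tiling data only after the last parameter, and finalizes a second time after the update. With the real APIs, the corresponding calls would be aclrtKernelArgsAppend, aclrtKernelArgsAppendPlaceHolder, aclrtKernelArgsGetPlaceHolderBuffer, aclrtKernelArgsParaUpdate, and aclrtKernelArgsFinalize; check their exact signatures in the API reference.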