Concepts and Usage
Concepts
This section covers APIs for managing operator binaries, kernel functions, and parameter configurations. The following figure shows the relationships between them.

- Operator binary: Building the operator source code produces an operator binary file (*.o). For a built-in CANN operator, you can obtain the operator binary file from the operator binary package (Ascend-cann-kernels-*.run). For a custom operator, you obtain the operator binary file by building the operator and releasing the binary file. For details about how to develop and build a custom operator, see Ascend C Operator Development Guide.
- Kernel function: The entry function that implements an operator on the device. You write the device-side code using the C/C++ function syntax extension, and perform data access and computation inside the kernel function to implement all functions of the operator.
API Call Sequence for Kernel Loading and Execution

The major steps are as follows:
- Call acl.init for initialization.
- Allocate runtime resources. Call acl.rt.set_device to specify the compute device and call acl.rt.create_stream to create a stream.
For details, see Runtime Resource Allocation and Deallocation.
- Call acl.rt.binary_load_from_file to load the operator binary file.
- Call acl.rt.binary_get_function to obtain the kernel function handle.
- Build the parameter list for the kernel function handle as follows:
- Initialize the parameter list.
Currently, the memory that holds the parameter list can be managed by the system (call acl.rt.kernel_args_init) or by the user (call acl.rt.kernel_args_init_by_user_mem).
- Append parameters and update parameter values.
The kernel function parameter list contains parameters of different types, such as pointer, placeholder, and uint8_t parameters.
- Pointer parameter: Its value is a device memory address. Generally, the input and output of an operator are parameters of this type. You need to call the device memory allocation API (for example, aclrtMalloc) in advance to allocate memory and copy data to the device.
- Placeholder: A placeholder is also a pointer parameter. The difference is that you do not copy its data to the device yourself; the Runtime does it for you. When a placeholder is appended, the Runtime does not fill in the actual device address. It fills in the address only at kernel launch, which is why the parameter is called a placeholder. For parameters that are neither inputs nor outputs of an operator, a placeholder lets the Runtime combine the host-to-device copies of small data (less than 2 KB recommended) into a single copy at kernel launch, which reduces the number of copy operations and improves performance.

You can call different parameter appending APIs for different types of parameters.
- For a placeholder parameter, the associated memory must be placed after all parameters. Therefore, when appending a placeholder parameter, call aclrtKernelArgsAppendPlaceHolder to set a placeholder. After all parameters are appended, call aclrtKernelArgsGetPlaceHolderBuffer to obtain the memory address to which the placeholder points. You can manage the data in the memory based on the obtained memory address.
- For a non-placeholder parameter (such as a pointer parameter or a uint8_t parameter), call aclrtKernelArgsAppend to copy the user-defined parameter value to the parameter data area to which argsHandle points. Call acl.rt.kernel_args_para_update to update parameter values if needed.
Note that the kernel function parameter list may contain multiple parameters, and parameters of different types may appear alternately. Therefore, you need to append parameters from left to right according to the parameter sequence in the parameter list. A maximum of 128 parameters can be appended.
- Finalize the parameter list.
After all parameters are appended, call acl.rt.kernel_args_finalize to indicate that the parameter list is assembled. Parameter values can still be updated after acl.rt.kernel_args_finalize is called, but you must then call acl.rt.kernel_args_finalize again before launching the kernel.
- Call acl.rt.launch_kernel_with_config to launch the kernel and start the compute task of the corresponding operator.
- Call acl.rt.binary_unload to unload the operator binary file.
- Deallocate runtime resources. Call acl.rt.destroy_stream to destroy the stream and call acl.rt.reset_device to release the resources on the device.
For details, see Runtime Resource Allocation and Deallocation.
- Call acl.finalize for deinitialization.
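
The rules for assembling the parameter list (append from left to right, at most 128 parameters, placeholder memory placed after all parameters, finalize again after an update) can be illustrated with a small host-side model. This is only a conceptual sketch: it does not call pyACL, the class KernelArgsModel and its methods are hypothetical names invented for illustration, and the 8-byte slot size and byte layout are assumptions rather than the Runtime's actual format.

```python
import struct

MAX_ARGS = 128  # limit stated in the text above
SLOT = 8        # assumed size of a placeholder's address slot in this toy model


class KernelArgsModel:
    """Toy model of a kernel parameter list (hypothetical, not the pyACL API)."""

    def __init__(self):
        self._args = []       # per-parameter bytes, or None for a placeholder slot
        self._buffers = {}    # parameter index -> bytearray backing a placeholder
        self.finalized = False

    def _next_index(self):
        if len(self._args) >= MAX_ARGS:
            raise ValueError(f"at most {MAX_ARGS} parameters can be appended")
        self.finalized = False  # any change requires finalizing again
        return len(self._args)

    def append(self, raw):
        """Append a non-placeholder parameter; its value is copied immediately."""
        index = self._next_index()
        self._args.append(bytes(raw))
        return index

    def append_placeholder(self, size):
        """Reserve a slot; the address is left unfilled until launch."""
        index = self._next_index()
        self._args.append(None)
        self._buffers[index] = bytearray(size)
        return index

    def placeholder_buffer(self, index):
        """Return the host memory that backs a placeholder."""
        return self._buffers[index]

    def update(self, index, raw):
        """Overwrite a non-placeholder value; finalize() must be called again."""
        if self._args[index] is None:
            raise TypeError("write placeholder data through placeholder_buffer()")
        if len(raw) != len(self._args[index]):
            raise ValueError("updated value must keep the original size")
        self._args[index] = bytes(raw)
        self.finalized = False

    def finalize(self):
        """Return (parameter area, placeholder data placed after all parameters)."""
        head = b"".join(a if a is not None else bytes(SLOT) for a in self._args)
        tail = b"".join(self._buffers[i] for i in sorted(self._buffers))
        self.finalized = True
        return head, tail


args = KernelArgsModel()
x = args.append(struct.pack("<Q", 0x1000))   # pointer parameter (made-up device address)
tiling = args.append_placeholder(4)          # small non-input/non-output data
n = args.append(struct.pack("<B", 3))        # uint8_t parameter
args.placeholder_buffer(tiling)[:] = struct.pack("<I", 256)
head, tail = args.finalize()

args.update(n, struct.pack("<B", 7))         # update after finalize ...
head, tail = args.finalize()                 # ... so finalize again
```

In this model the placeholder occupies an empty slot in the parameter area while its data sits in a separate tail, which mirrors the idea that the real address is resolved only at kernel launch.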
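
The sequence above releases resources in the reverse order of their acquisition. The sketch below shows one way to keep that pairing explicit with contextlib.ExitStack. It records step names in a list instead of calling pyACL, because the argument lists and return values of the real APIs are not given in this section; the names LIFECYCLE, WORK, and run are hypothetical.

```python
from contextlib import ExitStack

# (acquire, release) pairs, listed in the order the steps above acquire them.
LIFECYCLE = [
    ("acl.init", "acl.finalize"),
    ("acl.rt.set_device", "acl.rt.reset_device"),
    ("acl.rt.create_stream", "acl.rt.destroy_stream"),
    ("acl.rt.binary_load_from_file", "acl.rt.binary_unload"),
]

# Steps performed while every resource is held.
WORK = [
    "acl.rt.binary_get_function",
    "acl.rt.kernel_args_init",
    "append parameters",
    "acl.rt.kernel_args_finalize",
    "acl.rt.launch_kernel_with_config",
]


def run(log):
    """Record the call order; real code would invoke the APIs instead of logging."""
    with ExitStack() as stack:
        for acquire, release in LIFECYCLE:
            log.append(acquire)
            # Registered callbacks run in reverse order when the block exits,
            # including when an exception is raised part-way through.
            stack.callback(log.append, release)
        log.extend(WORK)
    return log


trace = run([])
for step in trace:
    print(step)
```

Registering each release step right after its acquire step means that a failure in a later step still unwinds whatever was already acquired.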