Single-Operator API Calling
Single-operator API calling means executing an operator directly through its C-language API. After creating an operator project, define the operator prototype within the project code framework, implement the operator on the kernel side and the tiling on the host side, build and deploy the operator with the project build script, and then call the single-operator APIs.
Principles
After the custom operator is compiled and deployed, the single-operator API is automatically generated and can be directly called in an application.
Generally, the single-operator API is defined as a two-phase API. See the following example:
aclnnStatus aclnnXxxGetWorkspaceSize(const aclTensor *src, ..., aclTensor *out, uint64_t *workspaceSize, aclOpExecutor **executor);
aclnnStatus aclnnXxx(void *workspace, uint64_t workspaceSize, aclOpExecutor *executor, aclrtStream stream);
aclnnXxxGetWorkspaceSize is the first-phase API, which computes the workspace size required by the call. After obtaining the workspace size, allocate memory on the device based on workspaceSize, and then call the second-phase API aclnnXxx to perform the computation. Xxx indicates the operator type passed during operator prototype registration.
The rules for generating input and output parameters of the aclnnXxxGetWorkspaceSize API are as follows:
- The suffix Optional is added to the name of an optional input. In the following example, x is optional.
aclnnStatus aclnnXxxGetWorkspaceSize(const aclTensor *xOptional, ..., aclTensor *out, uint64_t *workspaceSize, aclOpExecutor **executor);
- If an input and an output have the same name and are carried by the same tensor, the generated aclnn API retains only one argument for them, removes the const modifier of the input, and uses Ref as the suffix. In the following example, both the input and the output are defined as x, and xRef is used as both input and output.
aclnnStatus aclnnXxxGetWorkspaceSize(aclTensor *xRef, ..., uint64_t *workspaceSize, aclOpExecutor **executor);
- If there is only one output, the output parameter is named out. If there are multiple outputs, each output is suffixed with Out.
// There is only one output.
aclnnStatus aclnnXxxGetWorkspaceSize(const aclTensor *src, ..., aclTensor *out, uint64_t *workspaceSize, aclOpExecutor **executor);
// There are multiple outputs.
aclnnStatus aclnnXxxGetWorkspaceSize(const aclTensor *src, ..., aclTensor *yOut, aclTensor *y1Out, ..., uint64_t *workspaceSize, aclOpExecutor **executor);
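The naming rules above can be made concrete with a small script. The following Python sketch is not the generator that ships with the toolchain. It is a hypothetical helper (aclnn_params and get_workspace_size_decl are names invented for this illustration) that applies the three rules to lists of input and output names. How an in-place (Ref) tensor counts toward the single-output rule is not stated above, so the sketch assumes that only the remaining outputs are counted.

```python
def aclnn_params(inputs, outputs, optional=()):
    """Return the tensor parameters of aclnnXxxGetWorkspaceSize as C declarations.

    inputs, outputs: tensor names from the operator prototype, in order.
    optional: names of the inputs that are optional.
    Hypothetical helper, for illustration only.
    """
    in_place = [name for name in inputs if name in outputs]
    params = []
    for name in inputs:
        if name in in_place:
            # Shared input/output tensor: one non-const argument with suffix Ref.
            params.append(f"aclTensor *{name}Ref")
        elif name in optional:
            # Optional input: suffix Optional.
            params.append(f"const aclTensor *{name}Optional")
        else:
            params.append(f"const aclTensor *{name}")
    # Assumption: Ref tensors are not counted when deciding between out and xxxOut.
    remaining = [name for name in outputs if name not in in_place]
    if len(remaining) == 1:
        # A single output is named out.
        params.append("aclTensor *out")
    else:
        # Several outputs each get the suffix Out.
        params.extend(f"aclTensor *{name}Out" for name in remaining)
    return params


def get_workspace_size_decl(op_type, inputs, outputs, optional=()):
    """Assemble the full first-phase declaration for an operator type."""
    params = aclnn_params(inputs, outputs, optional)
    params += ["uint64_t *workspaceSize", "aclOpExecutor **executor"]
    return f"aclnnStatus aclnn{op_type}GetWorkspaceSize({', '.join(params)});"


print(get_workspace_size_decl("AddCustom", ["x", "y"], ["z"]))
```

For an operator of type AddCustom with inputs x and y and a single output z, the sketch yields two const input tensors followed by one tensor named out, which matches the three tensor arguments passed to aclnnAddCustomGetWorkspaceSize in the calling code later in this section.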
Pre-operations
- Create a custom operator project by referring to Operator Project Creation or create a simple custom operator project by referring to Simple Custom Operator Project.
- Prepare for the implementation on the kernel by referring to Operator Implementation on the Kernel, and prepare for the implementation on the host by referring to Tiling Implementation on the Host and Operator Prototype Definition.
- For custom operator projects, compile and deploy the operator by referring to Operator Project Build and Operator Package Deployment. During build and deployment, enable the binary build function of the operator as follows: Modify the build configuration item file CMakePresets.json in the operator project and set ENABLE_BINARY_PACKAGE to True. You can deploy the binary operator to the current environment for subsequent operator calling.
"ENABLE_BINARY_PACKAGE": {
    "type": "BOOL",
    "value": "True"
},
After the operator is compiled and deployed, the header file aclnn_xx.h and the dynamic library libcust_opapi.so used for single-operator calling are generated in the op_api directory under the operator package installation directory.
Take the default installation scenario as an example. The directory structure of the .h file and the dynamic library libcust_opapi.so used for single-operator calling is as follows:
├── opp    // Operator library directory
│   ├── vendors    // Directory of custom operators
│       ├── config.ini
│       └── vendor_name1    // Custom operators deployed by a vendor. vendor_name is configured when the custom operator installation package is built. If vendor_name is not configured, the default value customize is used.
│           ├── op_api
│           │   ├── include
│           │   │   └── aclnn_xx.h
│           │   └── lib
│           │       └── libcust_opapi.so
...
- For a simple custom operator development project, compile the operator by referring to Simple Custom Operator Project. After the build is complete, the aclnn_xx.h file and the dynamic library libcust_opapi.so used for single-operator calling are generated in the following paths, where CMAKE_INSTALL_PREFIX is the path configured in the CMake file for storing build products.
- Dynamic library path: ${CMAKE_INSTALL_PREFIX}/op_api/lib/libcust_opapi.so
- Header file path: ${CMAKE_INSTALL_PREFIX}/op_api/include
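Before building the calling program, it can help to confirm that both artifacts exist where the build expects them. The following Python sketch assumes only the layout described above (an op_api directory containing include and lib). The function names op_api_paths and missing_artifacts are hypothetical, and the directory passed in the example call is a placeholder, not a real installation path.

```python
from pathlib import Path


def op_api_paths(op_api_dir, header_name):
    """Return (header_path, library_path) under an op_api directory."""
    op_api_dir = Path(op_api_dir)
    header = op_api_dir / "include" / header_name
    library = op_api_dir / "lib" / "libcust_opapi.so"
    return header, library


def missing_artifacts(op_api_dir, header_name):
    """List the expected files that do not exist."""
    return [str(path) for path in op_api_paths(op_api_dir, header_name)
            if not path.is_file()]


# Placeholder directory: substitute <vendor directory>/op_api for an operator
# package, or ${CMAKE_INSTALL_PREFIX}/op_api for a simple operator project.
for path in missing_artifacts("path/to/op_api", "aclnn_add_custom.h"):
    print("missing:", path)
```

The header name aclnn_add_custom.h is the one included by the AddCustom calling code later in this section. For other operators, pass the corresponding aclnn_xx.h name.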
Verification Code Project Preparation
├── input    // Directory for storing the input data generated by the script
├── output    // Directory for storing the output data and truth value generated during operator execution
├── inc    // Header file directory
│   ├── common.h    // Common method class declaration file, used to read binary files
│   ├── operator_desc.h    // Operator description declaration file, including the operator input and output, operator type, input description, and output description
│   ├── op_runner.h    // Operator execution information declaration file, including the numbers and sizes of operator input and output
├── src
│   ├── CMakeLists.txt    // Build script
│   ├── common.cpp    // Common function file, used to read binary files
│   ├── main.cpp    // Entry for single-operator calling
│   ├── operator_desc.cpp    // File used to construct the input and output description of the operator
│   ├── op_runner.cpp    // Main process implementation file for single-operator calling
├── scripts
│   ├── verify_result.py    // Truth value comparison file
│   ├── gen_data.py    // Script file for generating the input data and truth value
│   ├── acl.json    // ACL configuration file
The following describes how to write the main.cpp, op_runner.cpp, and CMakeLists.txt files for single-operator calling.
Single-Operator Call Sequence
The calling process of the single-operator API is as follows.

This section uses the AddCustom operator as an example to describe how to write the code logic for operator calling. Other operators are called in a similar way. Modify the code based on your actual needs.
The following is a code snippet of key steps only, which is not ready to be built or run. After APIs are called, you need to add exception handling branches and record error logs and info logs.
In single-operator API execution mode, .cpp and .h files are automatically generated in the build_out/autogen directory of the build project. When writing the calling code of the single-operator, ensure that the automatically generated header file for single-operator API execution is included. The following is an example:
#include "aclnn_add_custom.h"
// 1. Initialize AscendCL.
aclError aclRet = aclInit("../scripts/acl.json");
// 2. Allocate runtime resources.
int deviceId = 0;
aclRet = aclrtSetDevice(deviceId);
// Create the stream on which the operator is executed.
aclrtStream stream = nullptr;
aclRet = aclrtCreateStream(&stream);
// Obtain the run mode of the software stack. Different run modes lead to different API call sequences (for example, whether data transfer is required).
aclrtRunMode runMode;
bool g_isDevice = false;
aclRet = aclrtGetRunMode(&runMode);
g_isDevice = (runMode == ACL_DEVICE);
// 3. Allocate memory to store the input and output of the operator.
// ......
// 4. Transmit data.
if (aclrtMemcpy(devInputs_[i], size, hostInputs_[i], size, kind) != ACL_SUCCESS) {
    return false;
}
// 5. Compute the workspace size and allocate memory.
uint64_t workspaceSize = 0;
aclOpExecutor *handle = nullptr;
auto ret = aclnnAddCustomGetWorkspaceSize(inputTensor_[0], inputTensor_[1], outputTensor_[0],
                                          &workspaceSize, &handle);
// ...
void *workspace = nullptr;
if (workspaceSize != 0) {
    if (aclrtMalloc(&workspace, workspaceSize, ACL_MEM_MALLOC_HUGE_FIRST) != ACL_SUCCESS) {
        ERROR_LOG("Malloc device memory failed");
        return false;
    }
}
// 6. Execute the single-operator.
// Store the status of the second-phase call so that the log reports the right error code.
ret = aclnnAddCustom(workspace, workspaceSize, handle, stream);
if (ret != ACL_SUCCESS) {
    (void)aclrtDestroyStream(stream);
    ERROR_LOG("Execute Operator failed. error code is %d", static_cast<int32_t>(ret));
    return false;
}
// 7. Synchronization.
aclrtSynchronizeStream(stream);
// 8. Process the output data after the operator is executed, for example, printing data to the screen or writing the data to a file. You can implement this function as required.
// ......
// 9. Release runtime resources.
aclRet = aclrtResetDevice(deviceId);
// ....
// 10. Deinitialize AscendCL.
aclRet = aclFinalize();
CMakeLists File
After the operator is compiled, the aclnn_xx.h file and dynamic library libcust_opapi.so for single-operator calling are generated. For details about the path, see Pre-operations.
When compiling the operator calling program, add the header file directory for single-operator calling to the search path include_directories of the header file so that the header file can be found. In addition, you need to link the cust_opapi dynamic library and add the directory where libcust_opapi.so is located to the search path link_directories of the library file.
- Add the header file directory for single-operator calling to the search path include_directories of the header file. The following example is for reference only. Set the parameters based on the actual directory location of the header file.
include_directories(
    ${INC_PATH}/runtime/include
    ${INC_PATH}/atc/include
    ../inc
    ${OP_API_PATH}/include
)
- Link to the cust_opapi link library.
target_link_libraries(execute_add_op
    ascendcl
    cust_opapi
    acl_op_compiler
    nnopbase
    stdc++
)
- Add the directory where libcust_opapi.so is located to the search path link_directories of the library file. The following example is for reference only. Set the parameters based on the actual directory of the library file.
link_directories(
    ${LIB_PATH}
    ${LIB_PATH1}
    ${OP_API_PATH}/lib
)
Test Data Generation
Run the following command in the sample operator project directory:
python3 scripts/gen_data.py
Two data files input_0.bin and input_1.bin with shape (8, 2048) and data type float16 are generated in the current project directory for verifying the AddCustom operator.
A code example is as follows:
import numpy as np
a = np.random.randint(100, size=(8, 2048,)).astype(np.float16)
b = np.random.randint(100, size=(8, 2048,)).astype(np.float16)
a.tofile('input_0.bin')
b.tofile('input_1.bin')
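The snippet above writes only the two inputs, whereas the verification step later compares the operator output against output/golden.bin. The following is a sketch of how the script might also produce that truth value. It assumes that AddCustom computes the element-wise sum of its two inputs and that the files go into the input and output directories of the project layout shown earlier. gen_golden_data is a hypothetical function name, not the name used by the actual gen_data.py.

```python
import os

import numpy as np


def gen_golden_data(root="."):
    """Write two float16 inputs and their element-wise sum (the assumed truth value)."""
    shape = (8, 2048)
    a = np.random.randint(100, size=shape).astype(np.float16)
    b = np.random.randint(100, size=shape).astype(np.float16)
    # Assumption: AddCustom computes z = x + y element-wise.
    golden = (a + b).astype(np.float16)

    os.makedirs(os.path.join(root, "input"), exist_ok=True)
    os.makedirs(os.path.join(root, "output"), exist_ok=True)
    a.tofile(os.path.join(root, "input", "input_0.bin"))
    b.tofile(os.path.join(root, "input", "input_1.bin"))
    golden.tofile(os.path.join(root, "output", "golden.bin"))
    return a, b, golden


if __name__ == "__main__":
    gen_golden_data()
```

Each file holds 8 x 2048 float16 values, so its size is 8 x 2048 x 2 = 32768 bytes, which gives a quick sanity check on the generated data.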
Build and Running
- In the development environment, set environment variables and configure the paths of the header files and library files on which the build of the AscendCL single-operator verification program depends. The following is an example of setting environment variables. ${INSTALL_DIR} indicates the CANN software installation directory, for example, $HOME/Ascend/ascend-toolkit/latest. {arch-os} indicates the architecture and OS of the operating environment. arch indicates the OS architecture, and os indicates the operating system, for example, x86_64-linux or aarch64-linux.
export DDK_PATH=${INSTALL_DIR}
export NPU_HOST_LIB=${INSTALL_DIR}/{arch-os}/lib64
- Build the sample project to generate an executable file for single-operator verification.
- Go to the directory of the sample project and run the following command in this directory to create a directory (for example, build) for storing the generated executable file.
mkdir -p build
- Go to the build directory and run the cmake command to generate the build files.
Command example:
cd build
cmake ../src
- Run the following command to generate an executable file:
make
The executable file execute_add_op is generated in the output directory of the project.
- Execute the single-operator.
- Copy execute_add_op in the output directory of the sample project in the development environment to any directory in the operating environment as the running user (for example, HwHiAiUser).
Note: If your development environment is the operating environment, skip this step.
- Run the execute_add_op file in the operating environment.
chmod +x execute_add_op
./execute_add_op
Check the command output:
[INFO] Set device[0] success
[INFO] Get RunMode[1] success
[INFO] Init resource success
[INFO] Set input success
[INFO] Copy input[0] success
[INFO] Copy input[1] success
[INFO] Create stream success
[INFO] Execute aclnnAddCustomGetWorkspaceSize success, workspace size 0
[INFO] Execute aclnnAddCustom success
[INFO] Synchronize stream success
[INFO] Copy output[0] success
[INFO] Write output success
[INFO] Run op success
[INFO] Reset Device success
[INFO] Destroy resource success
If Run op success is displayed, the execution is successful and the output_z.bin file is generated in the output directory.
- Compare the truth value files.
Switch to the root directory of the sample project and run the following command:
python3 scripts/verify_result.py output/output_z.bin output/golden.bin
Check the command output:
test pass
The verification result of the AddCustom operator is correct.
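The content of verify_result.py is not shown in this section. As a rough sketch of what such a comparison could look like, the following Python code reads both files as float16 and compares them element by element within a tolerance. The tolerance values, the mismatch messages, and the function name verify_result are assumptions made for this illustration. Only the "test pass" message is taken from the command output above.

```python
import sys

import numpy as np

# Assumed tolerances for float16 results; tune them for the operator under test.
RELATIVE_TOL = 1e-3
ABSOLUTE_TOL = 1e-3


def verify_result(output_path, golden_path):
    """Return True when every output element is close to the golden value."""
    output = np.fromfile(output_path, dtype=np.float16)
    golden = np.fromfile(golden_path, dtype=np.float16)
    if output.shape != golden.shape:
        print(f"size mismatch: {output.size} vs {golden.size} elements")
        return False
    # Compare in float32 to avoid float16 overflow in the tolerance arithmetic.
    close = np.isclose(output.astype(np.float32), golden.astype(np.float32),
                       rtol=RELATIVE_TOL, atol=ABSOLUTE_TOL)
    if not close.all():
        print(f"{np.count_nonzero(~close)} of {close.size} elements differ")
        return False
    print("test pass")
    return True


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(0 if verify_result(sys.argv[1], sys.argv[2]) else 1)
```

A tolerance-based comparison is used here rather than exact equality because float16 results computed on different hardware can differ in the last bits. Whether the shipped script does the same is not stated in this section.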