Single-Operator Call Sequence

This section describes the two modes of calling a single operator and the API call sequence of each mode.

If you need to execute a single operator during app development, see AscendCL API Call Sequence to understand the overall process, and then follow the procedure described in this section.

For details about the operators supported by the system, see CANN Operator Specifications in Operator Acceleration Library API Reference.

For operators that are not supported by the system, you need to develop custom operators by referring to Ascend C Operator Development Guide.

For TIK custom dynamic-shape operators, you need to register an operator selector first. For details, see Sample Code for Executing a Dynamic-Shape Operator (Operator Selector Registered).

Single-Operator Calling Modes

  • Single-operator API execution: The operator is executed through C-language APIs. The IR (Intermediate Representation) definition of the operator does not need to be provided; you can directly call the operator APIs described under Group Management. In this mode, each operator is provided as a pair of two-phase APIs, as follows:
    aclnnStatus aclXxxGetWorkspaceSize(const aclTensor *src, ..., aclTensor *out, ..., uint64_t *workspaceSize, aclOpExecutor **executor);
    aclnnStatus aclXxx(void *workspace, uint64_t workspaceSize, aclOpExecutor *executor, aclrtStream stream);
    

    Call the first API aclXxxGetWorkspaceSize to calculate the workspace memory required for the current API call. After obtaining the required workspaceSize, apply for NPU memory of that size, and then call the second API aclXxx to perform the calculation.
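    Based on the two-phase pattern above, a minimal launch sequence looks like the following sketch. aclnnAdd is used here purely as an example of an aclnnXxx operator; the tensors and scalar are assumed to have been created beforehand, most error handling is omitted, and the exact header path may differ across CANN versions.

    ```c
    #include "acl/acl.h"
    #include "aclnnop/aclnn_add.h"  /* assumed header path; varies by CANN version */

    /* Sketch of the two-phase call pattern with aclnnAdd as an example. */
    static aclnnStatus LaunchAdd(const aclTensor *self, const aclTensor *other,
                                 const aclScalar *alpha, aclTensor *out,
                                 aclrtStream stream)
    {
        uint64_t workspaceSize = 0;
        aclOpExecutor *executor = NULL;

        /* Phase 1: query the workspace size required by this call. */
        aclnnStatus ret = aclnnAddGetWorkspaceSize(self, other, alpha, out,
                                                   &workspaceSize, &executor);
        if (ret != ACL_SUCCESS) {
            return ret;
        }

        /* Apply for NPU memory for the workspace (the size may legitimately be 0). */
        void *workspace = NULL;
        if (workspaceSize > 0) {
            aclrtMalloc(&workspace, workspaceSize, ACL_MEM_MALLOC_HUGE_FIRST);
        }

        /* Phase 2: launch the calculation on the stream and wait for it. */
        ret = aclnnAdd(workspace, workspaceSize, executor, stream);
        aclrtSynchronizeStream(stream);

        if (workspace != NULL) {
            aclrtFree(workspace);
        }
        return ret;
    }
    ```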

    Currently, the following operators are called in single-operator API execution mode:

    • NN operator: Neural Network operator, a built-in basic operator of CANN. The API prefix is aclnnXxx. It covers the calculation types related to deep learning algorithms in frameworks such as TensorFlow, PyTorch, MindSpore, and ONNX, for example, Softmax, MatMul, and Convolution.
    • Fusion operator: built-in fusion operator of CANN. The API prefix is aclnnXxx. Multiple independent basic small operators (such as vector and cube operators) are fused into one large operator. The large operator is functionally equivalent to the combination of the small operators but outperforms them in terms of performance or memory usage. Common examples include Flash Attention and communication-computation fusion operators (MC2 operators for short).
    • DVPP operator: Digital Vision Pre-Processing operator. The API prefix is acldvppXxx. It provides high-performance preprocessing APIs such as video/image encoding and decoding, image cropping, and image resizing.
    • When you call the NN operator and fusion operator APIs, the operators already built into the operator binary package (Ascend-cann-kernels) are called directly; you do not need to build the operators again. For details about how to install the operator binary package, see CANN Software Installation Guide.
    • When you call a DVPP operator API, you do not need to build the corresponding operator.
  • Single-operator model execution: Operator execution is based on graph IR. First, compile the operator (for example, use the ATC tool to compile the single-operator description file defined by Ascend IR into an operator .om model file). Then, call an AscendCL API to load the operator model (for example, aclopSetModelDir). Finally, call an AscendCL API to execute the operator (for example, aclopExecuteV2).
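As a sketch of this mode, the following fragment loads compiled single-operator models and executes one of them. The directory name "op_models", the Add operator type, and the two-input/one-output descriptor arrays are illustrative assumptions; error handling and tensor-description setup are omitted.

```c
#include "acl/acl.h"
#include "acl/acl_op.h"

/* Sketch of single-operator model execution, assuming the .om files
 * generated by ATC are stored in "op_models" (a hypothetical directory). */
static aclError ExecuteAddModel(aclTensorDesc *inputDesc[], aclDataBuffer *inputs[],
                                aclTensorDesc *outputDesc[], aclDataBuffer *outputs[],
                                aclrtStream stream)
{
    /* Load all single-operator .om files in the directory
     * (aclopSetModelDir can be called only once per process). */
    aclError ret = aclopSetModelDir("op_models");
    if (ret != ACL_SUCCESS) {
        return ret;
    }

    /* Execute the Add operator: the matching .om model is looked up
     * based on the operator type and the tensor descriptions. */
    ret = aclopExecuteV2("Add",
                         2, inputDesc, inputs,    /* two inputs */
                         1, outputDesc, outputs,  /* one output */
                         NULL,                    /* no attributes for this example */
                         stream);
    if (ret != ACL_SUCCESS) {
        return ret;
    }
    return aclrtSynchronizeStream(stream);
}
```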

The following table describes the support for the two single-operator calling modes on Ascend AI Processor products.

| - | Single-Operator API Execution | Single-Operator Model Execution |
| --- | --- | --- |
| Atlas 200/300/500 Inference Product | x | |
| Atlas Training Series Product | √ (Partially supported) | |

API Call Sequence of Single-Operator API Execution

Figure 1 API call sequence of single-operator API execution

The key APIs are described as follows:
  1. Initialize AscendCL.

    Call aclInit to initialize AscendCL.

  2. Allocate runtime resources.

    Allocate runtime resources in sequence. For details, see Runtime Resource Allocation and Deallocation.

  3. Allocate and transfer data memory.
    1. Call aclrtMalloc to allocate device memory to store the input and output data of the operator.
    2. Call APIs such as aclCreateTensor and aclCreateIntArray to construct the input and output data of the operator, such as aclTensor and aclIntArray. For details about the APIs, see Group Management.

    To transfer data from the host to the device, call aclrtMemcpy (synchronous mode) or aclrtMemcpyAsync (asynchronous mode) to copy the memory.

  4. Calculate the workspace and execute the operator.
    1. Call aclxxXxxGetWorkspaceSize, passing the operator's inputs and outputs, to calculate the workspace size required for executing the operator.
    2. Call the aclrtMalloc API to allocate device memory based on the workspace size.
    3. Call aclxxXxx to perform the calculation and obtain the result.

    Single-operator execution involves two-phase API calls, that is, aclxxXxxGetWorkspaceSize and aclxxXxx. For details about the APIs and how to use them, see Group Management.

  5. Call aclrtSynchronizeStream to wait for the stream tasks to complete.
  6. Call aclrtFree to free the memory.

    Before freeing the device memory, call aclrtMemcpy (synchronous mode) or aclrtMemcpyAsync (asynchronous mode) to copy the result data from the device to the host.

  7. Deallocate runtime resources.
    1. Call APIs such as aclDestroyTensor and aclDestroyIntArray to destroy the input and output of the operator. For details about the APIs, see Group Management.
    2. After all data is released, deallocate runtime resources in sequence. For details, see Runtime Resource Allocation and Deallocation.
  8. Deinitialize AscendCL.

    Call aclFinalize to deinitialize AscendCL.
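The steps above can be combined into one minimal program skeleton. This is a sketch only, assuming aclnnAdd as the operator, device 0, contiguous 1 x 8 float tensors, and no error handling; real code should check every return value.

```c
#include <stdint.h>
#include "acl/acl.h"
#include "aclnnop/aclnn_add.h"  /* assumed header path; varies by CANN version */

int main(void)
{
    /* 1. Initialize AscendCL (NULL: no configuration file). */
    aclInit(NULL);

    /* 2. Allocate runtime resources. */
    aclrtSetDevice(0);
    aclrtStream stream;
    aclrtCreateStream(&stream);

    /* 3. Allocate device memory and construct the operator inputs/outputs. */
    int64_t shape[] = {1, 8};
    int64_t strides[] = {8, 1};           /* contiguous layout */
    size_t size = 8 * sizeof(float);
    float hostA[8] = {0}, hostB[8] = {0}, hostOut[8] = {0};
    void *devA, *devB, *devOut;
    aclrtMalloc(&devA, size, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMalloc(&devB, size, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMalloc(&devOut, size, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMemcpy(devA, size, hostA, size, ACL_MEMCPY_HOST_TO_DEVICE);
    aclrtMemcpy(devB, size, hostB, size, ACL_MEMCPY_HOST_TO_DEVICE);
    aclTensor *a = aclCreateTensor(shape, 2, ACL_FLOAT, strides, 0,
                                   ACL_FORMAT_ND, shape, 2, devA);
    aclTensor *b = aclCreateTensor(shape, 2, ACL_FLOAT, strides, 0,
                                   ACL_FORMAT_ND, shape, 2, devB);
    aclTensor *out = aclCreateTensor(shape, 2, ACL_FLOAT, strides, 0,
                                     ACL_FORMAT_ND, shape, 2, devOut);
    float alphaValue = 1.0f;
    aclScalar *alpha = aclCreateScalar(&alphaValue, ACL_FLOAT);

    /* 4. Calculate the workspace and execute the operator (two-phase call). */
    uint64_t wsSize = 0;
    aclOpExecutor *executor = NULL;
    aclnnAddGetWorkspaceSize(a, b, alpha, out, &wsSize, &executor);
    void *ws = NULL;
    if (wsSize > 0) {
        aclrtMalloc(&ws, wsSize, ACL_MEM_MALLOC_HUGE_FIRST);
    }
    aclnnAdd(ws, wsSize, executor, stream);

    /* 5. Wait for the stream tasks to complete. */
    aclrtSynchronizeStream(stream);

    /* 6. Copy the result back to the host, then free the device memory. */
    aclrtMemcpy(hostOut, size, devOut, size, ACL_MEMCPY_DEVICE_TO_HOST);
    if (ws != NULL) aclrtFree(ws);
    aclrtFree(devA); aclrtFree(devB); aclrtFree(devOut);

    /* 7. Destroy the operator inputs/outputs, then deallocate runtime resources. */
    aclDestroyScalar(alpha);
    aclDestroyTensor(a); aclDestroyTensor(b); aclDestroyTensor(out);
    aclrtDestroyStream(stream);
    aclrtResetDevice(0);

    /* 8. Deinitialize AscendCL. */
    aclFinalize();
    return 0;
}
```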

API Call Sequence of Single-Operator Model Execution

Figure 2 API call sequence of single-operator model execution

The key APIs are described as follows:
  1. Compile an operator.

    Operators can be compiled in either of the following modes:

    • After an operator is compiled, the operator data is saved in the .om file.

      In this mode, you need to use the ATC tool to compile the single-operator definition file (*.json) into an offline model (*.om file) adapted to the Ascend AI Processor. For details, see ATC Instructions.

      After the compilation, perform steps 2, 3, 4, 5, 6, and 7 in sequence.

    • After an operator is compiled, the operator data is saved in memory.

      In this mode, you need to call AscendCL APIs as required.

      • For operators that will be executed multiple times, you are advised to call aclopCompile to compile them once. After the compilation, perform steps 3, 4, 5, 6, and 7 in sequence.
      • For operators that will be compiled and executed the same number of times, you are advised to perform step 3 and then call aclopCompileAndExecute, which compiles and executes the operator in one call. After that, perform steps 6 and 7 in sequence.
  2. Load the operator model file.
    You can use either of the following methods:
    • Call aclopSetModelDir to set the directory of the single-operator .om model file.
    • Call aclopLoad to load the single-operator model data from memory. The memory is managed by the user. Here, "single-operator model data" refers to the data loaded into memory from the single-operator .om file.
  3. Call aclrtMalloc to allocate device memory to store the input and output data of the operator.

    To transfer data from the host to the device, call aclrtMemcpy (synchronous mode) or aclrtMemcpyAsync (asynchronous mode) to copy the memory.

  4. In the dynamic-shape scenario, if the output shape of an operator cannot be determined in advance, you need to infer or estimate it before executing the operator.

    Call the aclopInferShape, aclGetTensorDescNumDims, aclGetTensorDescDimV2, and aclGetTensorDescDimRange APIs to infer or estimate the output shape of the operator, and use the result as an input of the operator execution API aclopExecuteV2.

  5. Execute the operator.
    • Operators encapsulated as AscendCL APIs (for details, see CBLAS APIs), including the GEMM operator and the Cast operator, can be executed either in non-handle mode (by directly calling the corresponding operator API) or in handle mode (by creating a handle first and then executing with it).
    • Operators that are not encapsulated as AscendCL APIs can be executed either in non-handle mode (by calling aclopExecuteV2) or in handle mode (by calling aclopCreateHandle and then aclopExecWithHandle).

    If an operator is executed in non-handle mode, the system matches a model in memory against the operator description in every execution.

    When an operator is executed in handle mode, the system matches the operator description against the models in memory once and caches the result in the handle, so the operator and model do not need to be matched again on each execution. This makes handle mode more efficient when the same operator is executed multiple times. However, handle mode does not support dynamic-shape operators. After the handle is no longer needed, call aclopDestroyHandle to release it.

  6. Call aclrtSynchronizeStream to wait for the stream tasks to complete.
  7. Call aclrtFree to free the memory.

    Before freeing the device memory, call aclrtMemcpy (synchronous mode) or aclrtMemcpyAsync (asynchronous mode) to copy the result data from the device to the host.
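
The sequence above can be sketched as follows for a single Cast operator (float16 to float) executed in non-handle mode. The directory "op_models", the operator type, the shapes, and the absence of attributes are illustrative assumptions; initialization, error handling, and result copy-back are omitted.

```c
#include <stdint.h>
#include "acl/acl.h"
#include "acl/acl_op.h"

/* Sketch of single-operator model execution in non-handle mode,
 * assuming the Cast .om file generated by ATC is in "op_models". */
static void RunCast(aclrtStream stream)
{
    /* 2. Load the single-operator model files. */
    aclopSetModelDir("op_models");

    /* 3. Allocate device memory for the input and output. */
    int64_t shape[] = {1, 8};
    size_t inSize = 8 * sizeof(uint16_t);  /* float16 input */
    size_t outSize = 8 * sizeof(float);
    void *devIn = NULL, *devOut = NULL;
    aclrtMalloc(&devIn, inSize, ACL_MEM_MALLOC_HUGE_FIRST);
    aclrtMalloc(&devOut, outSize, ACL_MEM_MALLOC_HUGE_FIRST);

    /* Describe the operator's input and output. */
    aclTensorDesc *inDesc[] = {aclCreateTensorDesc(ACL_FLOAT16, 2, shape, ACL_FORMAT_ND)};
    aclTensorDesc *outDesc[] = {aclCreateTensorDesc(ACL_FLOAT, 2, shape, ACL_FORMAT_ND)};
    aclDataBuffer *inBuf[] = {aclCreateDataBuffer(devIn, inSize)};
    aclDataBuffer *outBuf[] = {aclCreateDataBuffer(devOut, outSize)};

    /* 5. Execute in non-handle mode: the matching model in memory is found
     * from these descriptions on every execution (attributes, if the operator
     * defines any, would be passed via an aclopAttr instead of NULL). */
    aclopExecuteV2("Cast", 1, inDesc, inBuf, 1, outDesc, outBuf, NULL, stream);

    /* 6. Wait for the stream tasks to complete. */
    aclrtSynchronizeStream(stream);

    /* 7. Free resources (copy the result to the host with aclrtMemcpy first). */
    aclDestroyDataBuffer(inBuf[0]);  aclDestroyDataBuffer(outBuf[0]);
    aclDestroyTensorDesc(inDesc[0]); aclDestroyTensorDesc(outDesc[0]);
    aclrtFree(devIn); aclrtFree(devOut);
}
```

In handle mode, the same descriptions would instead be passed to aclopCreateHandle once, each execution would call aclopExecWithHandle, and aclopDestroyHandle would release the handle afterwards.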