aclnn Development APIs
You can use the aclnn development APIs (also called Level-2 APIs) to call Ascend C custom operators or CANN built-in operators directly, without providing an IR (Intermediate Representation) definition.
This section describes the underlying framework capability APIs (also called nnopbase APIs) and the basic tensor operation APIs (also called Level-0 APIs) that the aclnn development APIs rely on to call these operators directly.
Basic Concepts
- Level-0 APIs
L0 APIs for short. They are host-side APIs used to call single-kernel APIs. They provide fine-grained APIs (single-kernel delivery), basic structures for operator API development (such as the tensor definition), and common basic capabilities (such as workspace reuse and engine scheduling). Upper-layer applications or L2 APIs can quickly assemble L0 APIs to implement high-performance computing.
The return type of an L0 API is a tensor type structure, such as aclTensor*, std::tuple<aclTensor*, aclTensor*>, or aclTensorList*. The last parameter is fixed as aclOpExecutor *executor; neither its type nor its name can be changed. The following is an example:
```cpp
aclTensor* AddNd(aclTensor *x1, aclTensor *x2, aclOpExecutor *executor);
```
L0 APIs are declared in the l0op namespace and are named ${op_type}${format}${dtype}, where ${op_type} is the operator name, ${format} is the input/output data format of the operator, and ${dtype} is the input/output data type of the operator. (For unconventional input/output data types, the data type mapping must be specified in the name.) The following are examples:
```cpp
l0op::AddNd              // Add operator; the inputs are computed in ND format.
l0op::MatMulNdFp162Fp32  // MatMul operator; inputs and outputs in ND format. "2" reads as "to": fp16 input, fp32 output.
l0op::MatMulNzFp162Fp16  // MatMul operator; inputs and outputs in NZ format. "2" reads as "to": fp16 input, fp16 output.
```
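The naming rule can be illustrated with a small helper that composes an L0 API name from its parts. `MakeL0ApiName` is a hypothetical function written only for this illustration; it is not part of nnopbase, and it assumes the `2` separator is used only when both an input and an output data type are given.

```cpp
#include <string>

// Hypothetical helper (not an nnopbase API) that composes an L0 API name
// following the ${op_type}${format}${dtype} convention.
// When both data types are given, they are joined by "2" ("to"),
// for example "Fp16" and "Fp32" become "Fp162Fp32".
std::string MakeL0ApiName(const std::string &opType, const std::string &format,
                          const std::string &inDtype = "",
                          const std::string &outDtype = "") {
    std::string name = "l0op::" + opType + format;
    if (!inDtype.empty() && !outDtype.empty()) {
        name += inDtype + "2" + outDtype;  // explicit input-to-output type mapping
    }
    return name;
}
```

For example, `MakeL0ApiName("MatMul", "Nd", "Fp16", "Fp32")` yields `l0op::MatMulNdFp162Fp32`, matching the second name in the list above.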
- Level-2 APIs
L2 APIs for short. They are higher-level host-side APIs that encapsulate L0 APIs: each L2 API calls one or more L0 APIs to implement more flexible functions. L2 APIs provide the single-operator calling mode and hide the internal implementation logic of operators, so you can call operators directly through them.
L2 APIs return aclnnStatus and generally come as a pair of two-phase APIs: one obtains the workspace size, and the other executes the operator.
```cpp
aclnnStatus aclnnXxxGetWorkspaceSize(const aclTensor *src, ..., aclTensor *out, ..., uint64_t *workspaceSize, aclOpExecutor **executor);
aclnnStatus aclnnXxx(void *workspace, uint64_t workspaceSize, aclOpExecutor *executor, aclrtStream stream);
```
- The last two parameters of aclnnXxxGetWorkspaceSize are fixed as (uint64_t *workspaceSize, aclOpExecutor **executor). Their names and types cannot be changed.
- The parameters of aclnnXxx are fixed as (void *workspace, uint64_t workspaceSize, aclOpExecutor *executor, aclrtStream stream).
aclnnXxxGetWorkspaceSize is the first-phase API, which calculates the workspace memory required for the API call. After obtaining the workspace size required for the current computation, allocate NPU memory of that size, and then call the second-phase API aclnnXxx to perform the computation. Xxx indicates the operator type, for example, Add.
- The workspace is the temporary memory, other than the input and output memory, that the API computation requires on the Ascend AI Processor.
- The second-phase API aclnnXxx(...) cannot be called repeatedly after a single call to aclnnXxxGetWorkspaceSize(...). The following call sequence throws an exception:
```cpp
aclnnXxxGetWorkspaceSize(...);
aclnnXxx(...);
aclnnXxx(...);  // Second call with the same executor: throws an exception.
```
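The two-phase contract can be sketched with mock functions. `MockAddGetWorkspaceSize`, `MockAdd`, and `MockExecutor` are hypothetical stand-ins that only mirror the shape of aclnnXxxGetWorkspaceSize and aclnnXxx; the real APIs take aclTensor and aclrtStream arguments and run on the NPU. The mock assumes an elementwise add that needs no workspace, and it reports a repeated second-phase call with an error code rather than an exception.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins that mirror the two-phase shape of
// aclnnXxxGetWorkspaceSize/aclnnXxx; they are NOT the real aclnn API.
using MockStatus = int;
constexpr MockStatus kMockSuccess = 0;
constexpr MockStatus kMockInvalidParam = 1;
constexpr MockStatus kMockExecutorUsed = 2;

// Records what the second phase must compute, like an executor created in phase 1.
struct MockExecutor {
    const std::vector<float> *x1;
    const std::vector<float> *x2;
    std::vector<float> *out;
    bool consumed = false;  // set after the second phase has run once
};

// Phase 1: validate arguments, report the workspace size, and create the executor.
MockStatus MockAddGetWorkspaceSize(const std::vector<float> &x1, const std::vector<float> &x2,
                                   std::vector<float> &out, std::uint64_t *workspaceSize,
                                   std::unique_ptr<MockExecutor> *executor) {
    if (x1.size() != x2.size() || out.size() != x1.size()) {
        return kMockInvalidParam;
    }
    *workspaceSize = 0;  // assumption: an elementwise add needs no scratch memory
    *executor = std::make_unique<MockExecutor>(MockExecutor{&x1, &x2, &out});
    return kMockSuccess;
}

// Phase 2: run the recorded computation once; a repeated call is rejected.
MockStatus MockAdd(void *workspace, std::uint64_t workspaceSize, MockExecutor *executor) {
    (void)workspace;      // unused in this sketch
    (void)workspaceSize;  // unused in this sketch
    if (executor == nullptr) {
        return kMockInvalidParam;
    }
    if (executor->consumed) {
        return kMockExecutorUsed;
    }
    for (std::size_t i = 0; i < executor->out->size(); ++i) {
        (*executor->out)[i] = (*executor->x1)[i] + (*executor->x2)[i];
    }
    executor->consumed = true;
    return kMockSuccess;
}
```

A caller follows the same order as with the real APIs: call the first-phase function, allocate a workspace of the returned size, and call the second-phase function exactly once per executor.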
- In-place Operator APIs
These APIs update data directly at the original memory address: during computation, the input and output share the same address, which minimizes redundant memory usage. Among aclnn APIs, in-place operator APIs are typically named aclnnInplaceXxxGetWorkspaceSize (first-phase API) and aclnnInplaceXxx (second-phase API).
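The difference between out-of-place and in-place operator APIs can be illustrated on the CPU. `AddOutOfPlace` and `AddInplace` are hypothetical functions, not aclnn APIs; they work on host vectors and only show that the in-place variant writes its result back into the input's memory instead of a separate output buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side illustration (not aclnn APIs).
// Both functions assume other.size() >= self.size().

// Out-of-place: out = self + other; a separate output buffer is allocated.
std::vector<float> AddOutOfPlace(const std::vector<float> &self, const std::vector<float> &other) {
    std::vector<float> out(self.size());
    for (std::size_t i = 0; i < self.size(); ++i) {
        out[i] = self[i] + other[i];
    }
    return out;
}

// In-place: self += other; the result reuses self's storage, so no output buffer is needed.
void AddInplace(std::vector<float> &self, const std::vector<float> &other) {
    for (std::size_t i = 0; i < self.size(); ++i) {
        self[i] += other[i];
    }
}
```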
API List
- Framework capability APIs: basic capability APIs for implementing aclnn APIs, such as executor (aclOpExecutor) processing and data type, format, and shape operations. For details, see Table 1. Common macros and classes are also provided; for details, see Table 2 and Table 3, respectively. The source code of these APIs is open source in the CANN/opbase repository, where you can explore further details.
- Basic tensor operation APIs: basic tensor operation APIs for implementing aclnn APIs, such as tensor data type conversion and reshaping. For details, see Table 4. The source code of these APIs is open source in the CANN/ops-math repository, where you can explore further details.
- Header file description: When using the preceding APIs, include the required header files as needed. The header files are stored in the ${INSTALL_DIR}/include directory, where ${INSTALL_DIR} is the CANN component installation directory. For example, if the installation is performed by the root user, the default path is /usr/local/Ascend/cann.
For details about how to use the APIs in this section, see the operator library component projects in the CANN open source community. The source code of some operators has been open-sourced, and you can use it as a basis for customizing the APIs described in this section.
Table 1 Framework capability APIs

| API Category | Description | Header File |
|---|---|---|
| | Describes the implementation class of the bfloat16 type on the CPU. | aclnn/opdev/bfloat16.h |
| | Describes basic aclnn data structures such as aclTensor and aclScalar. | aclnn/opdev/common_types.h |
| | Provides basic APIs related to data types, for example, the API for obtaining the size of a specified data type. | aclnn/opdev/data_type_utils.h |
| fast_vector | Describes the FastVector type, an efficient vector data structure implemented in aclnn. Note: The APIs defined in this header file are reserved and can be ignored. | aclnn/opdev/fast_vector.h |
| | Provides basic APIs related to data formats. | aclnn/opdev/format_utils.h |
| | Describes the implementation class of the float16 type on the CPU. | aclnn/opdev/fp16_t.h |
| | Describes the framework-provided capability of copying data from the host to the device. | aclnn/opdev/framework_op.h |
| make_op_executor | Provides the macro declarations for initializing aclOpExecutor. Note: The APIs defined in this header file are reserved and can be ignored. | aclnn/opdev/make_op_executor.h |
| | Describes Object, the base class of basic aclnn data structures such as aclTensor. It overloads the new and delete operators. | aclnn/opdev/object.h |
| | Describes the OpArgContext class and provides macro declarations such as OP_INPUT. | aclnn/opdev/op_arg_def.h |
| | Describes OpExecCache and related classes, which implement aclnn caching to improve runtime performance. | aclnn/opdev/op_cache.h |
| | Describes the aclnn cache container, which uses a least recently used (LRU) eviction mechanism. | aclnn/opdev/op_cache_container.h |
| | Provides configurations related to operator running, such as the deterministic computing switch. | aclnn/opdev/op_config.h |
| | Defines basic enumerations and constants, such as the precision mode OpImplMode. | aclnn/opdev/op_def.h |
| | Describes the DfxGuard class, which is used for API printing and profiling reporting. | aclnn/opdev/op_dfx.h |
| | Defines the aclnn error codes. | aclnn/opdev/op_errno.h |
| | Describes the aclOpExecutor class. | aclnn/opdev/op_executor.h |
| | Defines the log printing macros in aclnn. | aclnn/opdev/op_log.h |
| | Describes the PlatformInfo class, which stores SoC platform information. | aclnn/opdev/platform.h |
| | Describes the PoolAllocator class, which implements the CPU memory pool in aclnn. | aclnn/opdev/pool_allocator.h |
| | Provides basic shape operations, such as shape printing. | aclnn/opdev/shape_utils.h |
| | Describes the SmallVector class, an efficient vector data structure implemented in aclnn, mainly used when the data size is known to be small. | aclnn/opdev/small_vector.h |
| | Provides basic view operations, for example, determining whether an aclTensor is contiguous. | aclnn/opdev/tensor_view_utils.h |
| | Provides basic APIs related to data types, for example, checking whether a specified data type is an integer type. | aclnn/opdev/op_common/data_type_utils.h |
| | Provides the handling logic for assembling AI CPU computing tasks, for example, assembling the arguments of a computing task. | aclnn/opdev/aicpu/aicpu_args_handler.h |
| | Provides the handling logic for the extended arguments of AI CPU computing tasks, for example, the APIs for assembling and parsing extended arguments. | aclnn/opdev/aicpu/aicpu_ext_info_handle.h |
| | Provides the logic for setting and delivering AI CPU tasks, for example, specifying the AI CPU operator to be called and setting the operator inputs and outputs. | aclnn/opdev/aicpu/aicpu_task.h |
| | Describes the common APIs required by AI CPU tasks. | aclnn/opdev/aicpu/aicpu_uitls.h |
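The LRU eviction policy of the cache container can be illustrated with the classic list-plus-hash-map design. `LruCache` is a hypothetical, generic class written for this illustration; it does not reproduce the interface or internals of op_cache_container.h.

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Generic LRU cache sketch (not the op_cache_container API): a linked list keeps
// entries ordered from most to least recently used, and a hash map gives O(1)
// access to each entry's position in the list.
template <typename Key, typename Value>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the cached value and marks the entry as most recently used.
    std::optional<Value> Get(const Key &key) {
        auto it = index_.find(key);
        if (it == index_.end()) {
            return std::nullopt;
        }
        entries_.splice(entries_.begin(), entries_, it->second);  // move to front
        return it->second->second;
    }

    // Inserts or updates an entry; evicts the least recently used entry when full.
    void Put(const Key &key, Value value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            entries_.splice(entries_.begin(), entries_, it->second);
            return;
        }
        if (capacity_ == 0) {
            return;
        }
        if (entries_.size() == capacity_) {
            index_.erase(entries_.back().first);  // drop the least recently used entry
            entries_.pop_back();
        }
        entries_.emplace_front(key, std::move(value));
        index_[key] = entries_.begin();
    }

    std::size_t Size() const { return entries_.size(); }

private:
    using Entry = std::pair<Key, Value>;
    std::size_t capacity_;
    std::list<Entry> entries_;  // front = most recently used
    std::unordered_map<Key, typename std::list<Entry>::iterator> index_;
};
```

`std::list::splice` is used because it moves a node without invalidating the iterators stored in the map.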
Table 2 Macros

| Macro | Description | Header File |
|---|---|---|
| | Packs all input parameters of a host-side API for use in L2_DFX_PHASE_1. | aclnn/opdev/op_dfx.h |
| | Packs all output parameters of a host-side API for use in L2_DFX_PHASE_1. | aclnn/opdev/op_dfx.h |
| | Must be used in a host-side L0 API to print the API and its input parameters. | aclnn/opdev/op_dfx.h |
| | Must be called at the beginning of the first-phase API to print the API and its input parameters. | aclnn/opdev/op_dfx.h |
| | Must be called at the beginning of the second-phase API to print the API. | aclnn/opdev/op_dfx.h |
| | Must be used at the beginning of an L0 API to register the L0 operator. | aclnn/opdev/op_dfx.h |
| | Packs operator attribute parameters in ADD_TO_LAUNCHER_LIST_AICORE. | aclnn/opdev/op_arg_def.h |
| | Serves as a placeholder for an empty input or output in ADD_TO_LAUNCHER_LIST_AICORE. | aclnn/opdev/op_arg_def.h |
| | Packs the operator input aclTensor objects in ADD_TO_LAUNCHER_LIST_AICORE. | aclnn/opdev/op_arg_def.h |
| | Packs the operator running options in ADD_TO_LAUNCHER_LIST_AICORE, for example, whether to enable HF32. | aclnn/opdev/op_arg_def.h |
| | Packs the operator output aclTensor objects in ADD_TO_LAUNCHER_LIST_AICORE. | aclnn/opdev/op_arg_def.h |
| | Sets the aclTensor that stores the output shape of third-type operators (operators whose output shape depends on the computation result) in ADD_TO_LAUNCHER_LIST_AICORE. | aclnn/opdev/op_arg_def.h |
| | Packs the precision mode specified by the operator in ADD_TO_LAUNCHER_LIST_AICORE. | aclnn/opdev/op_arg_def.h |
| | Packs the workspace parameter explicitly specified by the operator in ADD_TO_LAUNCHER_LIST_AICORE. | aclnn/opdev/op_arg_def.h |
| | Creates a UniqueExecutor object, which is the factory class of aclOpExecutor. | aclnn/opdev/make_op_executor.h |
| | Runs the InferShape function of a specified operator to infer the output shape. | aclnn/opdev/make_op_executor.h |
| | Creates an execution task for an AI Core operator and adds it to the aclOpExecutor execution queue. The task is executed in the second phase. | aclnn/opdev/make_op_executor.h |
| | Packs the string-type attributes of an AI CPU operator as a vector of strings. | aclnn/opdev/aicpu/aicpu_task.h |
| | Creates an execution task for an AI CPU operator and adds it to the aclOpExecutor execution queue. The task is executed in the second phase. | aclnn/opdev/aicpu/aicpu_task.h |
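The launcher macros above queue tasks on the executor rather than running them immediately. That ordering can be pictured with a small task-queue model: work is recorded during the first phase and executed in order during the second. `MockOpExecutor`, `AddTask`, `Run`, and `QueueScaleInplace` are hypothetical names; the sketch models only the queueing behavior and does not reproduce the real macros or aclOpExecutor.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an executor's launch queue; it is not aclOpExecutor
// and does not use the real ADD_TO_LAUNCHER_LIST_* macros.
class MockOpExecutor {
public:
    // First phase: record a named task instead of running it immediately.
    void AddTask(std::string name, std::function<void()> task) {
        tasks_.emplace_back(std::move(name), std::move(task));
    }

    // Second phase: run the recorded tasks in insertion order, then clear the queue.
    // Returns the names of the tasks that ran.
    std::vector<std::string> Run() {
        std::vector<std::string> executed;
        for (auto &entry : tasks_) {
            entry.second();
            executed.push_back(entry.first);
        }
        tasks_.clear();
        return executed;
    }

    std::size_t PendingTasks() const { return tasks_.size(); }

private:
    std::vector<std::pair<std::string, std::function<void()>>> tasks_;
};

// Hypothetical L0-style function: it only queues a scaling task on the executor;
// the data in x is not modified until the executor runs.
void QueueScaleInplace(std::vector<float> *x, float factor, MockOpExecutor *executor) {
    executor->AddTask("Scale", [x, factor]() {
        for (float &v : *x) {
            v *= factor;
        }
    });
}
```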
Table 3 Classes and structs

| Class/Struct | Description | Header File |
|---|---|---|
| aclOpExecutor | Operator executor. It records the context of API execution on the host, such as the computational graph built while an L2 API executes, the launch subtasks of L0 operators, and the workspace address and size. Its member variables are private and can be ignored. For details about its member functions, see op_executor. | aclnn/opdev/op_executor.h |
| aclTensor | Tensor object, including the shape, data type, format, and address of the tensor. The data can be stored on the host or device. Its member variables are private and can be ignored. For details about its member functions, see common_types. | aclnn/opdev/common_types.h |
| aclScalar | Scalar object. The data is generally stored on the host. Its member variables are private and can be ignored. For details about its member functions, see common_types. | aclnn/opdev/common_types.h |
| aclTensorList | List object consisting of a group of aclTensor objects. Its member variables are private and can be ignored. For details about its member functions, see common_types. | aclnn/opdev/common_types.h |
| aclScalarList | List object consisting of a group of aclScalar objects. Its member variables are private and can be ignored. For details about its member functions, see common_types. | aclnn/opdev/common_types.h |
| aclBoolArray | Array object of the Boolean type. The data is generally stored on the host. Its member variables are private and can be ignored. For details about its member functions, see common_types. | aclnn/opdev/common_types.h |
| aclIntArray | Array object of the int64_t type. The data is generally stored on the host. Its member variables are private and can be ignored. For details about its member functions, see common_types. | aclnn/opdev/common_types.h |
| aclFloatArray | Array object of the fp32 type. The data is generally stored on the host. Its member variables are private and can be ignored. For details about its member functions, see common_types. | aclnn/opdev/common_types.h |
| aclFp16Array | Array object of the fp16 type. The data is generally stored on the host. Its member variables are private and can be ignored. For details about its member functions, see common_types. | aclnn/opdev/common_types.h |
| aclBf16Array | Array object of the bf16 type. The data is generally stored on the host. Its member variables are private and can be ignored. For details about its member functions, see common_types. | aclnn/opdev/common_types.h |
| SmallVector | Vector container backed by an internal memory pool. Its basic functions are the same as those of std::vector in the C++ standard library, but not every capacity expansion requires a memory allocation, which limits the performance impact of growth. Its member variables are private and can be ignored. For details about its member functions, see small_vector. | aclnn/opdev/small_vector.h |
| OpExecMode | Enumeration class for operator run modes. For details, see OpExecMode. | aclnn/opdev/op_def.h |
| OpImplMode | Enumeration class for operator precision modes. For details, see OpImplMode. | aclnn/opdev/op_def.h |
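One common way to avoid an allocation on every growth when the data size is known to be small is small-buffer optimization: the first N elements live in inline storage, and the container moves to the heap only on overflow. The following sketch shows that technique; `InlineSmallVector` is a hypothetical class, and the aclnn SmallVector, which the table above describes as backed by an internal memory pool, may be implemented differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Small-buffer-optimization sketch (not the aclnn SmallVector implementation):
// the first N elements are stored inline, so no heap allocation happens until
// the container grows beyond N elements. T must be default-constructible.
template <typename T, std::size_t N>
class InlineSmallVector {
public:
    void push_back(const T &value) {
        if (size_ < N) {
            inline_[size_] = value;  // still fits in the inline buffer
        } else {
            if (size_ == N) {
                // First overflow: copy the inline elements to the heap once.
                heap_.assign(inline_.begin(), inline_.end());
            }
            heap_.push_back(value);
        }
        ++size_;
    }

    const T &operator[](std::size_t i) const {
        assert(i < size_);
        return size_ <= N ? inline_[i] : heap_[i];
    }

    std::size_t size() const { return size_; }
    bool is_inline() const { return size_ <= N; }

private:
    std::array<T, N> inline_{};
    std::vector<T> heap_;
    std::size_t size_ = 0;
};
```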
Table 4 Basic tensor operation APIs

| API | Description | Header File |
|---|---|---|
| | Converts an input tensor to the specified data type. | aclnn_kernels/cast.h |
| | Converts a non-contiguous tensor into a contiguous tensor. | aclnn_kernels/contiguous.h |
| | Copies a contiguous tensor into a contiguous or non-contiguous tensor. | aclnn_kernels/contiguous.h |
| | Pads the dimensions of an input tensor as specified by paddings. The padding value is 0. | aclnn_kernels/pad.h |
| | Reshapes the input tensor x. | aclnn_kernels/reshape.h |
| | Obtains tensor slices. | aclnn_kernels/slice.h |
| | Transposes the shape of input tensor x according to the permutation of dimensions (perm) and outputs the result. | aclnn_kernels/transpose.h |
| | Converts the format of an input tensor to the specified dstPrimaryFormat. | aclnn_kernels/transdata.h |
| | Converts the format of an input tensor to the specified dstPrimaryFormat. This API is similar to TransData. | aclnn_kernels/transdata.h |
| | Reformats the input tensor x to the destination format without changing its dimensions. | aclnn_kernels/transdata.h |
| | Checks whether the input pointer is null. | aclnn_kernels/common/op_error_check.h |
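The idea behind checking contiguity and converting a non-contiguous tensor into a contiguous one can be sketched on the host. `ViewDesc`, `IsContiguous`, and `MakeContiguous` are hypothetical names, not aclTensor or aclnn APIs; the sketch assumes a view is described by a shape, per-dimension strides counted in elements, and an offset into a flat buffer, and that contiguous means row-major.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view descriptor (not aclTensor): a view over a flat buffer is
// described by its shape, per-dimension strides (in elements), and an offset.
struct ViewDesc {
    std::vector<std::int64_t> shape;
    std::vector<std::int64_t> strides;
    std::int64_t offset = 0;
};

// A view is contiguous when its strides equal the row-major strides of its shape.
// Dimensions of size 1 are skipped because their stride never affects addressing.
bool IsContiguous(const ViewDesc &v) {
    std::int64_t expected = 1;
    for (std::size_t d = v.shape.size(); d-- > 0;) {
        if (v.shape[d] != 1 && v.strides[d] != expected) {
            return false;
        }
        expected *= v.shape[d];
    }
    return true;
}

// Copies the elements addressed by a (possibly non-contiguous) view into a new
// row-major buffer, visiting indices in row-major order.
std::vector<float> MakeContiguous(const std::vector<float> &storage, const ViewDesc &v) {
    std::int64_t total = 1;
    for (std::int64_t s : v.shape) {
        total *= s;
    }
    std::vector<float> out;
    out.reserve(static_cast<std::size_t>(total));
    std::vector<std::int64_t> index(v.shape.size(), 0);
    for (std::int64_t n = 0; n < total; ++n) {
        std::int64_t pos = v.offset;
        for (std::size_t d = 0; d < index.size(); ++d) {
            pos += index[d] * v.strides[d];
        }
        out.push_back(storage[static_cast<std::size_t>(pos)]);
        // Advance the multi-dimensional index like an odometer (last dimension fastest).
        for (std::size_t d = index.size(); d-- > 0;) {
            if (++index[d] < v.shape[d]) {
                break;
            }
            index[d] = 0;
        }
    }
    return out;
}
```

For example, a transposed view of a 2 x 3 buffer has shape {3, 2} and strides {1, 3}; it is not contiguous, and copying it produces a new buffer in which the transposed elements are laid out row by row.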