Overview

Basic Concepts

  • Level-0 APIs

    Level-0 APIs (L0 APIs for short) are the basic kernel APIs. They directly call the kernel implementation tuned for the target SoC.

    The return type of an L0 API is a tensor type structure, such as aclTensor*, std::tuple<aclTensor*, aclTensor*>, or aclTensorList*. The last parameter is fixed at aclOpExecutor *executor; its type and name cannot be changed. The following is an example:

    aclTensor* AddNd(aclTensor *x1, aclTensor *x2, aclOpExecutor *executor)
    

    The L0 API namespace is l0op, and the API name follows the pattern ${op_type}${format}${dtype}, where ${op_type} is the operator name, ${format} is the operator input/output format, and ${dtype} is the operator input/output data type. The following are examples:

    l0op::AddNd                               // The input of the Add operator is computed in ND format.
    l0op::MatMulNdFp162Fp32                   // The input and output are computed in ND format. The input is fp16 and the output is fp32.
    
  • Level-2 APIs

    Level-2 APIs (L2 APIs for short) are higher-level APIs, also called host APIs. An L2 API can call multiple L0 APIs to implement more flexible functionality, and it corresponds to a framework API to facilitate framework adaptation and script migration.

    The return type of an L2 API is aclnnStatus. An L2 API is generally defined as a two-phase API pair:

    aclnnStatus aclnnXxxGetWorkspaceSize(const aclTensor *src, ..., aclTensor *out, ..., uint64_t *workspaceSize, aclOpExecutor **executor);
    aclnnStatus aclnnXxx(void *workspace, uint64_t workspaceSize, aclOpExecutor *executor, aclrtStream stream);
    
    • The last two parameters of aclnnXxxGetWorkspaceSize are fixed at (uint64_t *workspaceSize, aclOpExecutor **executor); their names and types cannot be changed.
    • The parameter list of aclnnXxx is fixed at (void *workspace, uint64_t workspaceSize, aclOpExecutor *executor, aclrtStream stream).

    aclnnXxxGetWorkspaceSize is the first-phase API. It computes the workspace size required by the API call. After obtaining workspaceSize, allocate device memory of at least that size, and then call the second-phase API aclnnXxx to perform the computation.

    • The workspace is the temporary memory required by the API computation on the Ascend AI Processor, in addition to the input and output memory.
    • The second-phase aclnnXxx(...) API cannot be called repeatedly with the same executor. In the following sequence, the second aclnnXxx call reports an error:
      aclnnXxxGetWorkspaceSize(...)
      aclnnXxx(...)
      aclnnXxx(...)
      
  • Broadcast relationship

    Broadcasting describes how an operator handles arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is "broadcast" across the larger array so that the two have compatible shapes. For more information, see the broadcasting documentation on the NumPy official website.
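    As an illustration of these rules (a generic sketch, not the CANN implementation), the broadcast result shape can be computed by aligning the two shapes from the rightmost dimension: each aligned pair of dimensions must be equal, or one of them must be 1; missing leading dimensions are treated as 1.

    ```cpp
    #include <algorithm>
    #include <cstdint>
    #include <stdexcept>
    #include <vector>

    // Compute the broadcast result shape of two shapes under NumPy-style rules:
    // align dimensions from the right; each pair must match or one must be 1.
    std::vector<int64_t> BroadcastShape(const std::vector<int64_t> &a,
                                        const std::vector<int64_t> &b) {
        size_t n = std::max(a.size(), b.size());
        std::vector<int64_t> out(n);
        for (size_t i = 0; i < n; ++i) {
            // Read from the right; a missing leading dimension counts as 1.
            int64_t da = i < a.size() ? a[a.size() - 1 - i] : 1;
            int64_t db = i < b.size() ? b[b.size() - 1 - i] : 1;
            if (da != db && da != 1 && db != 1) {
                throw std::invalid_argument("shapes are not broadcastable");
            }
            out[n - 1 - i] = std::max(da, db);
        }
        return out;
    }
    ```

    For example, shapes (8, 1, 6, 1) and (7, 1, 5) broadcast to (8, 7, 6, 5), while (3) and (4) are not broadcastable.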

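Returning to the two-phase aclnnXxxGetWorkspaceSize/aclnnXxx contract above, the calling discipline can be mimicked with stub types. Everything below (MockExecutor, mockGetWorkspaceSize, mockExecute, and the error code values) is illustrative and not part of the real CANN API; the sketch only shows that phase 1 sizes the workspace and creates a single-use executor, the caller allocates, and phase 2 consumes the executor exactly once.

```cpp
#include <cstdint>
#include <cstdlib>

// Illustrative stand-ins for the real CANN types (all names are hypothetical).
typedef int aclnnStatus;
static const aclnnStatus MOCK_OK = 0;
static const aclnnStatus MOCK_ERR_REPEATED_CALL = 1;

struct MockExecutor {
    bool consumed = false;  // a real executor also records the launch plan
};

// Phase 1: report the required workspace size and hand back an executor.
aclnnStatus mockGetWorkspaceSize(uint64_t *workspaceSize, MockExecutor **executor) {
    *workspaceSize = 1024;           // whatever scratch memory the kernel needs
    *executor = new MockExecutor();
    return MOCK_OK;
}

// Phase 2: run the computation once; reusing the executor is an error.
aclnnStatus mockExecute(void *workspace, uint64_t workspaceSize, MockExecutor *executor) {
    (void)workspace;
    (void)workspaceSize;
    if (executor == nullptr || executor->consumed) {
        return MOCK_ERR_REPEATED_CALL;
    }
    executor->consumed = true;
    return MOCK_OK;
}
```

A caller would invoke mockGetWorkspaceSize, allocate workspaceSize bytes (device memory in the real API), call mockExecute exactly once, and then release the workspace.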
Overall Description

This chapter describes the basic framework APIs, macros, classes, and basic kernel function APIs (Level-0 APIs) involved in single-operator API execution. For each item, it covers the function prototype, usage, parameters, constraints, and examples, helping you quickly customize neural network (NN) operators and fused operators (generally prefixed with aclnn) or modify built-in CANN operators to support various AI services.

When calling single-operator APIs, you need to include the dependent header files as required. The paths of the header files are as follows:

  • The header files of framework capability APIs are stored in the ${INSTALL_DIR}/include directory. For details, see Table 1. For details about the common classes and macro definitions provided by the framework, see Table 2 and Table 3.
  • The header files of basic kernel function APIs are stored in the ${INSTALL_DIR}/include directory. For details, see Table 4.

Replace ${INSTALL_DIR} with the actual CANN component directory. If the Ascend-CANN-Toolkit package is installed as the root user, the CANN component directory is /usr/local/Ascend/ascend-toolkit/latest.

Table 1 Framework capability APIs

Each entry lists the API category, its description, and its header file.

  • bfloat16: Describes the implementation classes of the bfloat16 type on the CPU. (Header file: aclnn/opdev/bfloat16.h)
  • common_types: Describes basic aclnn data structures such as aclTensor and aclScalar. (Header file: aclnn/opdev/common_type.h)
  • data_type_utils: Provides basic APIs related to data types, for example, the API for obtaining the size of a specified data type. (Header file: aclnn/opdev/data_type_utils.h)
  • (Reserved; can be ignored.) Describes the FastVector type, an efficient vector data structure implemented in aclnn. (Header file: aclnn/opdev/fast_vector.h)
  • format_utils: Provides basic APIs related to formats. (Header file: aclnn/opdev/format_utils.h)
  • fp16_t: Describes the implementation classes of the float16 type on the CPU. (Header file: aclnn/opdev/fp16_t.h)
  • framework_op: Describes the framework-provided capability of copying data from the host to the device. (Header file: aclnn/opdev/framework_op.h)
  • (Reserved; can be ignored.) Provides the macro declaration for initializing aclOpExecutor. (Header file: aclnn/opdev/make_op_executor.h)
  • object: Describes the base class Object of basic aclnn data structures such as aclTensor; it overloads and implements the new and delete methods. (Header file: aclnn/opdev/object.h)
  • op_arg_def: Describes the OpArgContext class and provides macro declarations such as OP_INPUT. (Header file: aclnn/opdev/op_arg_def.h)
  • op_cache: Describes OpExecCache and related classes, which implement the aclnn cache to improve runtime performance. (Header file: aclnn/opdev/op_cache.h)
  • op_cache_container: Describes the aclnn cache container with a least recently used (LRU) mechanism. (Header file: aclnn/opdev/op_cache_container.h)
  • op_config: Provides configurations related to operator running, such as the deterministic computing switch. (Header file: aclnn/opdev/op_config.h)
  • op_def: Defines basic enumerations and constants, such as the precision mode OpImplMode. (Header file: aclnn/opdev/op_def.h)
  • op_dfx: Describes the DfxGuard class, which is used for API printing and profiling reporting. (Header file: aclnn/opdev/op_dfx.h)
  • Defines the aclnn error codes. For details, see "NN Operator APIs > aclnn Return Codes" in Operator Acceleration Library API Reference. (Header file: aclnn/opdev/op_errno.h)
  • op_executor: Describes the aclOpExecutor class. (Header file: aclnn/opdev/op_executor.h)
  • op_log: Defines the log printing macros in aclnn. (Header file: aclnn/opdev/op_log.h)
  • platform: Describes the PlatformInfo class, which stores SoC platform information. (Header file: aclnn/opdev/platform.h)
  • pool_allocator: Describes the PoolAllocator class, which implements the CPU memory pool in aclnn. (Header file: aclnn/opdev/pool_allocator.h)
  • shape_utils: Provides basic shape-related operations, such as shape printing. (Header file: aclnn/opdev/shape_utils.h)
  • small_vector: Describes the SmallVector class, an efficient vector data structure implemented in aclnn, mainly used when the data size is known to be small. (Header file: aclnn/opdev/small_vector.h)
  • tensor_view_utils: Provides basic operations on the View class, for example, determining whether an aclTensor is contiguous. (Header file: aclnn/opdev/tensor_view_utils.h)
  • data_type_utils: Provides basic APIs related to data types, for example, checking whether a specified data type is an integer. (Header file: aclnn/opdev/op_common/data_type_utils.h)
  • aicpu_args_handler: Provides the handling logic for AI CPU computing-task arguments, for example, combining the arguments of a computing task. (Header file: aclnn/opdev/aicpu/aicpu_args_handler.h)
  • aicpu_ext_info_handle: Provides the handling logic for the extended arguments of AI CPU computing tasks, for example, APIs for combining and parsing extended arguments. (Header file: aclnn/opdev/aicpu/aicpu_ext_info_handle.h)
  • aicpu_uitls: Describes the common APIs required by AI CPU tasks. (Header file: aclnn/opdev/aicpu/aicpu_uitls.h)
  • aicpu_task: Provides the logic for setting up and delivering AI CPU tasks, for example, specifying the AI CPU operator to be called and setting its input and output. (Header file: aclnn/opdev/aicpu/aicpu_task.h)

Table 2 Common macros

Each entry lists the macro, its description, and its header file.

  • DFX_IN: Packs all API input parameters on the host in L2_DFX_PHASE_1. (Header file: aclnn/opdev/op_dfx.h)
  • DFX_OUT: Packs all API output parameters on the host in L2_DFX_PHASE_1. (Header file: aclnn/opdev/op_dfx.h)
  • L0_DFX: Must be used in an L0 API on the host to print the API and its input parameters. (Header file: aclnn/opdev/op_dfx.h)
  • L2_DFX_PHASE_1: Must be called at the beginning of the first-phase API to print the API and its input parameters. (Header file: aclnn/opdev/op_dfx.h)
  • L2_DFX_PHASE_2: Must be called at the beginning of the second-phase API for API printing. (Header file: aclnn/opdev/op_dfx.h)
  • OP_TYPE_REGISTER: Must be used at the beginning of an L0 API to register the L0 operator. (Header file: aclnn/opdev/op_dfx.h)
  • OP_ATTR: Packs operator attribute parameters in ADD_TO_LAUNCHER_LIST_AICORE. (Header file: aclnn/opdev/op_arg_def.h)
  • OP_EMPTY_ARG: Holds the place of an empty input or output in ADD_TO_LAUNCHER_LIST_AICORE. (Header file: aclnn/opdev/op_arg_def.h)
  • OP_INPUT: Packs the operator input aclTensor objects in ADD_TO_LAUNCHER_LIST_AICORE. (Header file: aclnn/opdev/op_arg_def.h)
  • OP_MODE: Packs operator running options, for example, whether to enable HF32, in ADD_TO_LAUNCHER_LIST_AICORE. (Header file: aclnn/opdev/op_arg_def.h)
  • OP_OUTPUT: Packs the operator output aclTensor objects in ADD_TO_LAUNCHER_LIST_AICORE. (Header file: aclnn/opdev/op_arg_def.h)
  • OP_OUTSHAPE: Sets the aclTensor that stores the output shape for the third type of operators in ADD_TO_LAUNCHER_LIST_AICORE. (Header file: aclnn/opdev/op_arg_def.h)
  • OP_OPTION: Packs the precision mode specified by the operator in ADD_TO_LAUNCHER_LIST_AICORE. (Header file: aclnn/opdev/op_arg_def.h)
  • OP_WORKSPACE: Packs the workspace parameter explicitly specified by the operator in ADD_TO_LAUNCHER_LIST_AICORE. (Header file: aclnn/opdev/op_arg_def.h)
  • CREATE_EXECUTOR: Creates a UniqueExecutor object, the factory class of aclOpExecutor. (Header file: aclnn/opdev/make_op_executor.h)
  • INFER_SHAPE: Runs the infershape function of a specified operator to infer the output shape. (Header file: aclnn/opdev/make_op_executor.h)
  • ADD_TO_LAUNCHER_LIST_AICORE: Creates an execution task for an AI Core operator and adds the task to the aclOpExecutor execution queue. Executed in the second phase. (Header file: aclnn/opdev/make_op_executor.h)
  • OP_ATTR_NAMES: Packs the string-type attributes of an AI CPU operator as a vector of strings. (Header file: aclnn/opdev/aicpu/aicpu_task.h)
  • ADD_TO_LAUNCHER_LIST_AICPU: Creates an execution task for an AI CPU operator and adds the task to the aclOpExecutor execution queue. Executed in the second phase. (Header file: aclnn/opdev/aicpu/aicpu_task.h)

Table 3 Common classes

Each entry lists the class, its description, and its header file. For every class below, the member variables are private and can be ignored; for details about the member functions, see the section named in the entry.

  • aclOpExecutor: The operator executor. It records the context of API execution on the host, such as the computational graph built during L2 API execution, the launch subtasks of L0 operators, and the workspace address and size. For member functions, see op_executor. (Header file: aclnn/opdev/op_executor.h)
  • aclTensor: A tensor object, including the shape, data type, format, and address of the tensor. The data can be stored on the host or device. For member functions, see common_types. (Header file: aclnn/opdev/common_type.h)
  • aclScalar: A scalar object. The data is generally stored on the host. For member functions, see common_types. (Header file: aclnn/opdev/common_type.h)
  • aclTensorList: A list object consisting of a group of aclTensor objects. For member functions, see common_types. (Header file: aclnn/opdev/common_type.h)
  • aclScalarList: A list object consisting of a group of aclScalar objects. For member functions, see common_types. (Header file: aclnn/opdev/common_type.h)
  • aclBoolArray: An array object of the Boolean type. The data is generally stored on the host. For member functions, see common_types. (Header file: aclnn/opdev/common_type.h)
  • aclIntArray: An array object of the int64_t type. The data is generally stored on the host. For member functions, see common_types. (Header file: aclnn/opdev/common_type.h)
  • aclFloatArray: An array object of the fp32 type. The data is generally stored on the host. For member functions, see common_types. (Header file: aclnn/opdev/common_type.h)
  • aclFp16Array: An array object of the fp16 type. The data is generally stored on the host. For member functions, see common_types. (Header file: aclnn/opdev/common_type.h)
  • aclBf16Array: An array object of the bf16 type. The data is generally stored on the host. For member functions, see common_types. (Header file: aclnn/opdev/common_type.h)
  • SmallVector: A vector container backed by an internal memory pool. Its basic functions are the same as those of the std::vector container in the C++ standard library, but not every capacity expansion requires a memory allocation, so performance is less affected. For member functions, see small_vector. (Header file: aclnn/opdev/small_vector.h) Commonly used aliases:

    • op::FVector: SmallVector whose inline storage capacity is 8.

      namespace op {
      template<typename T, size_t N = 8>
      using FVector = op::internal::SmallVector<T, N, op::internal::PoolAllocator<T>>;
      }

    • op::Strides: FVector whose storage capacity is 25, with int64_t elements. It stores stride information.

      namespace op {
      constexpr uint64_t MAX_DIM_NUM = 25;
      using Strides = FVector<int64_t, MAX_DIM_NUM>;
      }

    • op::ShapeVector: FVector whose storage capacity is 25, with int64_t elements. It stores shape information.

      namespace op {
      constexpr uint64_t MAX_DIM_NUM = 25;
      using ShapeVector = FVector<int64_t, MAX_DIM_NUM>;
      }

  • OpExecMode: Enumeration class for the operator run mode. For details, see OpExecMode. (Header file: aclnn/opdev/op_def.h)
  • OpImplMode: Enumeration class for the operator precision modes. For details, see OpImplMode. (Header file: aclnn/opdev/op_def.h)

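The idea behind SmallVector and FVector above (inline storage for the common small case, falling back to the heap only when the element count exceeds the template capacity N) can be sketched generically. This is an illustrative sketch, not the CANN implementation; InlineVector and its members are invented names, and copy/move support is omitted for brevity.

```cpp
#include <cstddef>

// Minimal small-vector sketch: elements live in an inline buffer until the
// count exceeds N, after which they spill to a heap allocation.
template <typename T, std::size_t N = 8>
class InlineVector {
public:
    InlineVector() = default;
    InlineVector(const InlineVector &) = delete;             // copying omitted
    InlineVector &operator=(const InlineVector &) = delete;  // for brevity
    ~InlineVector() {
        if (data_ != inline_buf_) delete[] data_;
    }
    void push_back(const T &value) {
        if (size_ == capacity_) grow();
        data_[size_++] = value;
    }
    T &operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return data_ != inline_buf_; }  // spilled past N?
private:
    void grow() {
        std::size_t new_cap = capacity_ * 2;
        T *heap = new T[new_cap];
        for (std::size_t i = 0; i < size_; ++i) heap[i] = data_[i];
        if (data_ != inline_buf_) delete[] data_;
        data_ = heap;
        capacity_ = new_cap;
    }
    T inline_buf_[N];        // storage for the common small case
    T *data_ = inline_buf_;  // points into inline_buf_ until a spill occurs
    std::size_t size_ = 0;
    std::size_t capacity_ = N;
};
```

op::FVector plays a similar role with N = 8, and op::Strides/op::ShapeVector with N = 25 (MAX_DIM_NUM), so typical shape and stride data never touch the heap.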
Table 4 Basic kernel function APIs

Each entry lists the API, its description, and its header file.

  • Cast: Converts an input tensor to the specified data type. (Header file: aclnn_kernels/cast.h)
  • Contiguous: Converts a non-contiguous tensor to a contiguous tensor. (Header file: aclnn_kernels/contiguous.h)
  • ViewCopy: Copies a contiguous tensor to a contiguous or non-contiguous tensor. (Header file: aclnn_kernels/contiguous.h)
  • Pad: Pads the dimensions of an input tensor according to paddings; the padding value is 0. (Header file: aclnn_kernels/pad.h)
  • Reshape: Reshapes the input tensor x. (Header file: aclnn_kernels/reshape.h)
  • Slice: Extracts tensor slices. (Header file: aclnn_kernels/slice.h)
  • Transpose: Permutes the dimensions of the input tensor x according to the dimension permutation perm and outputs the result. (Header file: aclnn_kernels/transpose.h)
  • TransData: Converts the format of an input tensor to the specified dstPrimaryFormat. (Header file: aclnn_kernels/transdata.h)
  • TransDataSpecial: Converts the format of an input tensor to the specified dstPrimaryFormat; similar to TransData. (Header file: aclnn_kernels/transdata.h)
  • ReFormat: Reformats the input tensor x to the destination format without changing the dimensions. (Header file: aclnn_kernels/transdata.h)
  • IsNullptr: Checks whether an input pointer is null. (Header file: aclnn_kernels/common/op_error_check.h)