General Description and Restrictions

Header Files to Be Included

Header file directory of Ascend C APIs:

  • Basic APIs: ${INSTALL_DIR}/include/ascendc/basic_api/interface
  • High-level APIs: (APIs in the following header file directories that are not described in the documentation are called indirectly by other APIs, so developers can ignore them.)
    • ${INSTALL_DIR}/include/ascendc/highlevel_api/lib
    • ${INSTALL_DIR}/include/tiling

Replace ${INSTALL_DIR} with the directory where the CANN component is installed. For example, if CANN is installed by the root user, the default path is /usr/local/Ascend/cann.

To simplify development, including the kernel_operator.h file gives access to both the basic and high-level Ascend C APIs. Unless otherwise specified, this is the only header file you need to include. If the documentation of a specific API gives different instructions, follow those instructions.

#include "kernel_operator.h"

Mapping Between Logical Locations and Physical Storage

The operands of Ascend C APIs are usually of the GlobalTensor or LocalTensor type. The storage location of tensor data is expressed as a logical location (TPosition), which hides the differences between hardware architectures. TPosition values include VECIN, VECOUT, VECCALC, A1, A2, B1, B2, CO1, and CO2. The following table describes the mapping between TPosition and physical memory.
Table 1 Mapping between TPosition and physical memory

| TPosition | Physical Memory |
|-----------|-----------------|
| GM        | Global Memory |
| VECIN     | Unified Buffer |
| VECCALC   | Unified Buffer |
| VECOUT    | Unified Buffer |
| A1        | L1 Buffer |
| A2        | L0A Buffer |
| B1        | L1 Buffer |
| B2        | L0B Buffer |
| C1        | Atlas training products: Unified Buffer |
|           | Atlas inference product's AI Core: Unified Buffer |
|           | Atlas A2 training products/Atlas A2 inference products: L1 Buffer |
|           | Atlas A3 training products/Atlas A3 inference products: L1 Buffer |
|           | Atlas 200I/500 A2 inference products: Unified Buffer |
| C2        | Atlas training products: L0C Buffer |
|           | Atlas inference product's AI Core: L0C Buffer |
|           | Atlas A2 training products/Atlas A2 inference products: BiasTable Buffer |
|           | Atlas A3 training products/Atlas A3 inference products: BiasTable Buffer |
|           | Atlas 200I/500 A2 inference products: BiasTable Buffer |
| CO1       | L0C Buffer |
| CO2       | Atlas training products: Unified Buffer |
|           | Atlas inference product's AI Core: Unified Buffer |
|           | Atlas A2 training products/Atlas A2 inference products: Global Memory |
|           | Atlas A3 training products/Atlas A3 inference products: Global Memory |
|           | Atlas 200I/500 A2 inference products: Global Memory |
| TSCM      | L1 Buffer |
| SPM       | Atlas training products: L1 Buffer |
|           | Atlas inference product's AI Core: L1 Buffer |
|           | Atlas A2 training products/Atlas A2 inference products: Global Memory |
|           | Atlas A3 training products/Atlas A3 inference products: Global Memory |
| C2PIPE2GM | Atlas A2 training products/Atlas A2 inference products: FixPipe Buffer |
|           | Atlas A3 training products/Atlas A3 inference products: FixPipe Buffer |
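
As one way to read Table 1, the following standalone C++ sketch encodes the mapping for Atlas A2 training products/Atlas A2 inference products only. The names LogicalPos, PhysMem, and PhysicalMemoryOnAtlasA2 are hypothetical and are not part of the Ascend C API; in real kernels the logical location is given by TPosition.

```cpp
// Hypothetical stand-ins for illustration only; not Ascend C types.
enum class LogicalPos {
    GM, VECIN, VECCALC, VECOUT, A1, A2, B1, B2,
    C1, C2, CO1, CO2, TSCM, SPM, C2PIPE2GM
};

enum class PhysMem {
    GlobalMemory, UnifiedBuffer, L1Buffer, L0ABuffer, L0BBuffer,
    L0CBuffer, BiasTableBuffer, FixPipeBuffer
};

// Mapping from Table 1, restricted to the rows for
// Atlas A2 training products/Atlas A2 inference products.
inline PhysMem PhysicalMemoryOnAtlasA2(LogicalPos pos) {
    switch (pos) {
        case LogicalPos::GM:        return PhysMem::GlobalMemory;
        case LogicalPos::VECIN:
        case LogicalPos::VECCALC:
        case LogicalPos::VECOUT:    return PhysMem::UnifiedBuffer;
        case LogicalPos::A1:
        case LogicalPos::B1:
        case LogicalPos::C1:
        case LogicalPos::TSCM:      return PhysMem::L1Buffer;
        case LogicalPos::A2:        return PhysMem::L0ABuffer;
        case LogicalPos::B2:        return PhysMem::L0BBuffer;
        case LogicalPos::C2:        return PhysMem::BiasTableBuffer;
        case LogicalPos::CO1:       return PhysMem::L0CBuffer;
        case LogicalPos::CO2:
        case LogicalPos::SPM:       return PhysMem::GlobalMemory;
        case LogicalPos::C2PIPE2GM: return PhysMem::FixPipeBuffer;
    }
    return PhysMem::GlobalMemory;  // unreachable for valid enum values
}
```

For other product families the C1, C2, CO2, and SPM rows differ, so a similar lookup would need one function (or one table) per family.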

General Address Alignment Restrictions

The storage units on the AI Core hold the source and destination operands of vector and matrix computations. Table 2 lists the alignment requirement of each storage unit. The start address of an operand in an Ascend C API must meet the alignment requirement of the storage unit in which the operand resides. If a specific API states its own start address alignment requirements for operands, the description of that API prevails.
Table 2 Alignment requirements for different memory units

| Memory Unit           | Alignment Requirement     |
|-----------------------|---------------------------|
| Global Memory         | No alignment requirement. |
| Unified Buffer        | 32-byte aligned.          |
| L1 Buffer             | 32-byte aligned.          |
| L0A Buffer/L0B Buffer | 512-byte aligned.         |
| L0C Buffer            | 64-byte aligned.          |
| BiasTable Buffer      | 64-byte aligned.          |
| FixPipe Buffer        | 64-byte aligned.          |
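
The alignment rules in Table 2 can be checked with simple integer arithmetic. The following is a minimal standalone C++ sketch, not Ascend C code: MemUnit, RequiredAlignment, IsAligned, and AlignUp are hypothetical helper names, addresses are treated as plain byte offsets, and "no alignment requirement" is modeled as an alignment of 1 byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum for illustration only; not an Ascend C type.
enum class MemUnit {
    GlobalMemory, UnifiedBuffer, L1Buffer, L0ABuffer, L0BBuffer,
    L0CBuffer, BiasTableBuffer, FixPipeBuffer
};

// Required start-address alignment in bytes, taken from Table 2.
// Global Memory has no requirement, which is modeled here as 1.
inline uint64_t RequiredAlignment(MemUnit unit) {
    switch (unit) {
        case MemUnit::GlobalMemory:    return 1;
        case MemUnit::UnifiedBuffer:
        case MemUnit::L1Buffer:        return 32;
        case MemUnit::L0ABuffer:
        case MemUnit::L0BBuffer:       return 512;
        case MemUnit::L0CBuffer:
        case MemUnit::BiasTableBuffer:
        case MemUnit::FixPipeBuffer:   return 64;
    }
    return 1;  // unreachable for valid enum values
}

// True if the byte offset satisfies the alignment of the given unit.
inline bool IsAligned(uint64_t addr, MemUnit unit) {
    return addr % RequiredAlignment(unit) == 0;
}

// Rounds a byte offset up to the next aligned offset for the given unit.
inline uint64_t AlignUp(uint64_t addr, MemUnit unit) {
    const uint64_t a = RequiredAlignment(unit);
    assert(addr <= UINT64_MAX - (a - 1));  // guard against overflow
    return (addr + a - 1) / a * a;
}
```

For example, under these assumptions AlignUp(100, MemUnit::UnifiedBuffer) yields 128, and an offset of 48 is not valid for the L0C Buffer because 48 is not a multiple of 64. Requirements stated in a specific API still take precedence over this general check.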

General Address Overlap Restrictions

When using the high-dimensional tensor splitting and computation APIs of the basic APIs, you can have the source and destination operands share one tensor (that is, overlap their addresses) to save address space. Note the following restrictions:

  • Single iteration: The source operand must completely overlap the destination operand. Partial overlapping is not supported.
  • Multiple iterations: The destination operand of a previous iteration cannot overlap the source operand of a subsequent iteration. For example, if the destination operand of the Nth iteration is the source operand of the (N+1)th iteration (as shown in the following figure), the Nth iteration may overwrite the source data before the (N+1)th iteration reads it, so the expected result cannot be obtained. As an exception, for some binary computing APIs (Add, Sub, Mul, Max, Min, AddRelu, and SubRelu), when the data type is half, int32_t, or float, the destination operand of a previous iteration can overlap the source operand of a subsequent iteration. This exception applies only when the destination operand overlaps the second source operand, and src1RepStride or dstRepStride must be 0.
Figure 1 Example of address overlapping (not supported)

The general address overlapping constraints described in this section apply to common scenarios. If there are special requirements in the API reference, the requirements in the API reference prevail.

If an API does not describe address overlapping constraints, address overlapping is considered unsupported for that API's high-dimensional tensor splitting and computation. Using overlapping addresses in this case may produce unexpected results.
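
The two general overlap rules can be expressed as range checks. The following standalone C++ sketch is an illustration under simplifying assumptions, not Ascend C code: Operand, Range, and the helper functions are hypothetical names, each iteration is modeled as one contiguous byte range, and the repeat stride is given in bytes. It does not model the exception for Add, Sub, Mul, Max, Min, AddRelu, and SubRelu.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one operand of a multi-iteration instruction.
struct Operand {
    uint64_t start;         // start address in bytes
    uint64_t bytesPerIter;  // bytes accessed by one iteration
    uint64_t repStride;     // address step between iterations, in bytes
};

struct Range { uint64_t begin; uint64_t end; };  // half-open [begin, end)

// Byte range accessed by the given iteration of an operand.
inline Range IterRange(const Operand& op, uint32_t iter) {
    assert(op.bytesPerIter > 0);
    const uint64_t b = op.start + iter * op.repStride;
    return {b, b + op.bytesPerIter};
}

inline bool Intersects(Range a, Range b) { return a.begin < b.end && b.begin < a.end; }
inline bool Identical(Range a, Range b) { return a.begin == b.begin && a.end == b.end; }

// Single-iteration rule: source and destination are either disjoint
// or overlap completely; partial overlap is rejected.
inline bool SingleIterationOk(const Operand& src, const Operand& dst, uint32_t iter) {
    const Range s = IterRange(src, iter);
    const Range d = IterRange(dst, iter);
    return !Intersects(s, d) || Identical(s, d);
}

// Multi-iteration rule: the destination of an earlier iteration must not
// intersect the source of any later iteration.
inline bool MultiIterationOk(const Operand& src, const Operand& dst, uint32_t repeatTimes) {
    for (uint32_t i = 0; i < repeatTimes; ++i) {
        for (uint32_t j = i + 1; j < repeatTimes; ++j) {
            if (Intersects(IterRange(dst, i), IterRange(src, j))) {
                return false;
            }
        }
    }
    return true;
}
```

Under this model, an in-place computation where source and destination use the same start address, size, and stride passes both checks, while a destination that starts exactly one iteration after the source (the case in Figure 1) fails the multi-iteration check.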