General Description and Restrictions

Header Files to Be Included

Header file directories of Ascend C APIs:

Basic APIs: ${INSTALL_DIR}/include/ascendc/basic_api/interface
High-level APIs (APIs in the following header file directories that are not described in the documentation are called indirectly, and developers can ignore them):
- ${INSTALL_DIR}/include/ascendc/highlevel_api/lib
- ${INSTALL_DIR}/include/tiling

Replace ${INSTALL_DIR} with the CANN component directory. For example, if the installation is performed by the root user, the default file storage path is /usr/local/Ascend/cann.

For your convenience, both basic and high-level Ascend C APIs can be called by including the kernel_operator.h file. Unless otherwise specified, including this header file meets the API calling requirements. If there are special notes in the API documentation, follow the specific instructions provided.

#include "kernel_operator.h"

Mapping Between Logical Locations and Physical Storage

Operands of Ascend C APIs are typically GlobalTensor and LocalTensor objects. The storage location of tensor data is expressed as a logical position (TPosition), which hides the differences between hardware architectures. TPosition values include VECIN, VECOUT, VECCALC, A1, A2, B1, B2, CO1, and CO2, among others. The mapping between logical positions and physical memory is shown in the following table.

**Table 1** Mapping between TPosition and physical memory
TPosition	Physical Memory
GM	Global Memory
VECIN	Unified Buffer
VECCALC	Unified Buffer
VECOUT	Unified Buffer
A1	L1 Buffer
A2	L0A Buffer
B1	L1 Buffer
B2	L0B Buffer
C1	Atlas training products: Unified Buffer. AI Core of Atlas inference products: Unified Buffer. Atlas A2 training products/Atlas A2 inference products: L1 Buffer. Atlas A3 training products/Atlas A3 inference products: L1 Buffer. Atlas 200I/500 A2 inference products: Unified Buffer.
C2	Atlas training products: L0C Buffer. AI Core of Atlas inference products: L0C Buffer. Atlas A2 training products/Atlas A2 inference products: BiasTable Buffer. Atlas A3 training products/Atlas A3 inference products: BiasTable Buffer. Atlas 200I/500 A2 inference products: BiasTable Buffer.
CO1	L0C Buffer
CO2	Atlas training products: Unified Buffer. AI Core of Atlas inference products: Unified Buffer. Atlas A2 training products/Atlas A2 inference products: Global Memory. Atlas A3 training products/Atlas A3 inference products: Global Memory. Atlas 200I/500 A2 inference products: Global Memory.
TSCM	L1 Buffer
SPM	Atlas training products: L1 Buffer. AI Core of Atlas inference products: L1 Buffer. Atlas A2 training products/Atlas A2 inference products: Global Memory. Atlas A3 training products/Atlas A3 inference products: Global Memory.
C2PIPE2GM	Atlas A2 training products/Atlas A2 inference products: FixPipe Buffer. Atlas A3 training products/Atlas A3 inference products: FixPipe Buffer.
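
Because several rows of Table 1 depend on the product, it can help to read the table as a function of two inputs. The following is a minimal host-side sketch of that lookup, not part of the Ascend C API: the `Product` and `Pos` enums and the `PhysicalMemory` function are hypothetical names introduced for illustration, and "unspecified" marks combinations for which the table gives no entry.

```cpp
#include <cassert>
#include <string>

// Hypothetical enums for illustration only; they are not Ascend C types.
enum class Product { AtlasTraining, AtlasInferenceAiCore, AtlasA2, AtlasA3, Atlas200I500A2 };
enum class Pos { GM, VECIN, VECCALC, VECOUT, A1, A2, B1, B2, C1, C2, CO1, CO2, TSCM, SPM, C2PIPE2GM };

// Returns the physical memory that Table 1 lists for a logical position on a
// product, or "unspecified" where the table has no entry for that product.
inline std::string PhysicalMemory(Pos pos, Product product) {
    const bool older = product == Product::AtlasTraining ||
                       product == Product::AtlasInferenceAiCore;
    const bool a2OrA3 = product == Product::AtlasA2 || product == Product::AtlasA3;
    switch (pos) {
        case Pos::GM:     return "Global Memory";
        case Pos::VECIN:
        case Pos::VECCALC:
        case Pos::VECOUT: return "Unified Buffer";
        case Pos::A1:
        case Pos::B1:
        case Pos::TSCM:   return "L1 Buffer";
        case Pos::A2:     return "L0A Buffer";
        case Pos::B2:     return "L0B Buffer";
        case Pos::CO1:    return "L0C Buffer";
        // Product-dependent rows of Table 1.
        case Pos::C1:     return a2OrA3 ? "L1 Buffer" : "Unified Buffer";
        case Pos::C2:     return older ? "L0C Buffer" : "BiasTable Buffer";
        case Pos::CO2:    return older ? "Unified Buffer" : "Global Memory";
        case Pos::SPM:
            if (older) return "L1 Buffer";
            return a2OrA3 ? "Global Memory" : "unspecified";
        case Pos::C2PIPE2GM:
            return a2OrA3 ? "FixPipe Buffer" : "unspecified";
    }
    return "unspecified";
}
```

For example, `PhysicalMemory(Pos::CO2, Product::AtlasTraining)` yields "Unified Buffer", whereas the same position on `Product::AtlasA2` yields "Global Memory".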

General Address Alignment Restrictions

The storage units on the AI Core hold the source and destination operands of vector and matrix computations. The start address of each operand passed to an Ascend C API must therefore meet the alignment requirement of the storage unit in which the operand resides, as listed in Table 2. If an API specifies its own start address alignment requirements for operands, the description in that API prevails.

**Table 2** Alignment requirements for different memory units
Memory Unit	Alignment Requirement
Global Memory	No alignment requirement.
Unified Buffer	32-byte aligned.
L1 Buffer	32-byte aligned.
L0A Buffer/L0B Buffer	512-byte aligned.
L0C Buffer	64-byte aligned.
BiasTable Buffer	64-byte aligned.
FixPipe Buffer	64-byte aligned.
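
The requirements in Table 2 can be checked before an operand offset is used. The following is a minimal host-side sketch, assuming the offset is a byte offset from the start of the storage unit. The `MemUnit` enum and the helper functions are hypothetical names introduced for illustration and are not part of the Ascend C API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum for illustration only; it is not an Ascend C type.
enum class MemUnit {
    GlobalMemory, UnifiedBuffer, L1Buffer, L0ABuffer,
    L0BBuffer, L0CBuffer, BiasTableBuffer, FixPipeBuffer
};

// Required start-address alignment in bytes, taken from Table 2.
// Returns 1 where the table states no alignment requirement.
inline std::uint64_t RequiredAlignment(MemUnit unit) {
    switch (unit) {
        case MemUnit::GlobalMemory:    return 1;
        case MemUnit::UnifiedBuffer:
        case MemUnit::L1Buffer:        return 32;
        case MemUnit::L0ABuffer:
        case MemUnit::L0BBuffer:       return 512;
        case MemUnit::L0CBuffer:
        case MemUnit::BiasTableBuffer:
        case MemUnit::FixPipeBuffer:   return 64;
    }
    return 1;
}

// True if byteOffset satisfies the alignment requirement of the unit.
inline bool IsStartAddressAligned(MemUnit unit, std::uint64_t byteOffset) {
    return byteOffset % RequiredAlignment(unit) == 0;
}

// Rounds byteOffset up to the next offset that satisfies the requirement.
inline std::uint64_t AlignUp(MemUnit unit, std::uint64_t byteOffset) {
    const std::uint64_t alignment = RequiredAlignment(unit);
    return (byteOffset + alignment - 1) / alignment * alignment;
}
```

For example, a byte offset of 40 in the Unified Buffer is not 32-byte aligned, and `AlignUp(MemUnit::UnifiedBuffer, 40)` gives 64. An API that documents stricter requirements still takes precedence over this general check.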

General Address Overlapping Restrictions

To save memory when using the high-dimensional tensor sharding computation APIs among the basic APIs, the source and destination operands can share one tensor (address overlapping). The following restrictions apply:

- In a single iteration, the source operand must completely overlap the destination operand. Partial overlapping is not supported.
- Across iterations, the destination operand of an earlier iteration must not overlap the source operand of a later iteration. For example, if the destination operand of the Nth iteration is the source operand of the (N+1)th iteration (as shown in the following figure), the Nth iteration may overwrite the value of the source operand of the (N+1)th iteration, producing an unexpected result. As an exception, for some two-operand compute APIs (Add, Sub, Mul, Max, Min, AddRelu, and SubRelu) with data type half, int32_t, or float, the destination operand of an earlier iteration can overlap the source operand of a later iteration. This exception applies only when the destination operand overlaps the second source operand, and src1RepStride or dstRepStride must be 0.

Figure 1 Example of address overlapping (not supported)

The general restrictions on address overlapping described in this section apply to common cases. If there are additional restrictions in the API reference, the restrictions in the specific API prevail.

If an API does not describe address overlapping restrictions, address overlapping is not supported for high-dimensional tensor sharding computation. In this case, the computation result may not meet expectations.
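
The two general overlap rules in this section can be checked with a simplified host-side model. The sketch below is illustrative and makes several assumptions that the text does not state: each iteration reads and writes one contiguous range of `bytesPerIter` bytes, and each operand advances by a fixed byte stride per iteration. The `Operand` struct and `OverlapIsAllowed` function are hypothetical names and are not part of the Ascend C API. The sketch deliberately ignores the exception for Add, Sub, Mul, Max, Min, AddRelu, and SubRelu.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one operand: start byte offset and the
// number of bytes the operand advances between consecutive iterations.
struct Operand {
    std::uint64_t base;
    std::uint64_t repStrideBytes;
};

// True if [a, a + len) and [b, b + len) share at least one byte.
inline bool RangesOverlap(std::uint64_t a, std::uint64_t b, std::uint64_t len) {
    return a < b + len && b < a + len;
}

// Checks the two general rules under the simplified model:
//   1. Within one iteration, src and dst overlap completely or not at all.
//   2. The dst of an earlier iteration does not overlap the src of a later one.
inline bool OverlapIsAllowed(Operand src, Operand dst,
                             std::uint64_t bytesPerIter, std::uint32_t repeatTimes) {
    assert(bytesPerIter > 0);
    for (std::uint32_t i = 0; i < repeatTimes; ++i) {
        const std::uint64_t srcI = src.base + i * src.repStrideBytes;
        const std::uint64_t dstI = dst.base + i * dst.repStrideBytes;
        // Rule 1: partial overlap inside a single iteration is rejected.
        if (RangesOverlap(srcI, dstI, bytesPerIter) && srcI != dstI) {
            return false;
        }
        // Rule 2: iteration i must not write data that iteration j > i reads.
        for (std::uint32_t j = i + 1; j < repeatTimes; ++j) {
            const std::uint64_t srcJ = src.base + j * src.repStrideBytes;
            if (RangesOverlap(dstI, srcJ, bytesPerIter)) {
                return false;
            }
        }
    }
    return true;
}
```

Under this model, a fully in-place computation such as `OverlapIsAllowed({0, 256}, {0, 256}, 256, 4)` passes, while the chained case of Figure 1, `OverlapIsAllowed({0, 256}, {256, 256}, 256, 2)`, is rejected because the first iteration writes the range that the second iteration reads.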