TPosition

When managing physical memory at different levels, Ascend C uses an abstract logical position (TPosition) to express each memory level. TPosition stands in for on-chip physical storage and hides the hardware architecture. The main TPosition types are VECIN, VECOUT, VECCALC, A1, A2, B1, B2, C1, C2, CO1, and CO2. VECIN, VECCALC, and VECOUT are used for vector programming; A1, A2, B1, B2, C1, C2, CO1, and CO2 are used for matrix programming. Refer to Programming Paradigm for the basic concepts of TPosition, and to Table 1 for the mapping between TPosition and physical storage.

TPosition is defined as follows:

enum class TPosition : uint8_t {
    GM,
    A1,
    A2,
    B1,
    B2,
    C1,
    C2,
    CO1,
    CO2,
    VECIN,
    VECOUT,
    VECCALC,
    LCM = VECCALC,
    SPM,
    SHM = SPM,
    TSCM,
    C2PIPE2GM,
    C2PIPE2LOCAL,
    MAX,
};

The enumerated values of TPosition are defined as follows.

Table 1 Enumerated values of TPosition

| Enumerated Value | Description |
|---|---|
| GM | Global memory, corresponding to the memory external to the AI Core. |
| VECIN | Used for vector computation. Storage location for input data that is moved into the Vector Unit. |
| VECOUT | Used for vector computation. Storage location for results that are moved out of the Vector Unit. |
| VECCALC | Used for vector or matrix computation. Storage location for temporary variables required during the computation. |
| A1 | Used for matrix computation. Stores the entire matrix A, similar to the L2 cache in a CPU's multi-level cache hierarchy. |
| B1 | Used for matrix computation. Stores the entire matrix B, similar to the L2 cache in a CPU's multi-level cache hierarchy. |
| C1 | Used for matrix computation. Stores the entire bias matrix, similar to the L2 cache in a CPU's multi-level cache hierarchy. |
| A2 | Used for matrix computation. Stores a smaller block split from matrix A, similar to the L1 cache in a CPU's multi-level cache hierarchy. |
| B2 | Used for matrix computation. Stores a smaller block split from matrix B, similar to the L1 cache in a CPU's multi-level cache hierarchy. |
| C2 | Used for matrix computation. Stores a smaller block split from the bias matrix, similar to the L1 cache in a CPU's multi-level cache hierarchy. |
| CO1 | Used for matrix computation. Stores a small block of the result matrix C. CO can be read as Cube Out. |
| CO2 | Used for matrix computation. Stores the entire result matrix C. CO can be read as Cube Out. |
| LCM | Local Cache Memory, an alias for the Unified Buffer. It is defined as LCM = VECCALC and has the same function as VECCALC. |
| SPM | Temporarily holds Unified Buffer data when the Unified Buffer is at risk of overflowing. |
| SHM | Alias of SPM. |
| TSCM | Temp Swap Cache Memory. Extra space that Matmul uses to temporarily swap data out. |
| C2PIPE2GM | Stores FixPipe quantization parameters. |
| C2PIPE2LOCAL | Reserved for future use. |