TPosition
To manage physical memory at different levels, an abstract logical position (TPosition) is used to represent each storage level. It stands in for the on-chip physical storage and hides the hardware architecture. The main TPosition types are VECIN, VECOUT, VECCALC, A1, A2, B1, B2, C1, C2, CO1, and CO2. VECIN, VECCALC, and VECOUT are used for vector programming; A1, A2, B1, B2, C1, C2, CO1, and CO2 are used for matrix programming. See Programming Paradigm for the basic concepts of TPosition, and Table 1 for the mapping between TPosition and physical storage.
TPosition is defined as follows:

    enum class TPosition : uint8_t {
        GM,
        A1,
        A2,
        B1,
        B2,
        C1,
        C2,
        CO1,
        CO2,
        VECIN,
        VECOUT,
        VECCALC,
        LCM = VECCALC,
        SPM,
        SHM = SPM,
        TSCM,
        C2PIPE2GM,
        C2PIPE2LOCAL,
        MAX,
    };

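The alias entries in the definition (LCM = VECCALC and SHM = SPM) do not introduce new values; they share the numeric value of the entry they name. The following is a minimal standalone sketch of that behavior. It assumes a local copy of the enumeration so that it compiles without the framework headers, and `ToIndex` is a hypothetical helper introduced only for illustration, not part of any real API.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the enumeration, reproduced so this sketch is self-contained.
// In real operator code, use the definition supplied by the framework headers.
enum class TPosition : std::uint8_t {
    GM,
    A1,
    A2,
    B1,
    B2,
    C1,
    C2,
    CO1,
    CO2,
    VECIN,
    VECOUT,
    VECCALC,
    LCM = VECCALC,  // alias: same value as VECCALC
    SPM,
    SHM = SPM,      // alias: same value as SPM
    TSCM,
    C2PIPE2GM,
    C2PIPE2LOCAL,
    MAX,
};

// Hypothetical helper (illustration only): numeric value of a position.
constexpr unsigned ToIndex(TPosition pos) {
    return static_cast<unsigned>(pos);
}

// Aliases compare equal to the entries they name.
static_assert(TPosition::LCM == TPosition::VECCALC, "LCM aliases VECCALC");
static_assert(TPosition::SHM == TPosition::SPM, "SHM aliases SPM");

// Enumerators count up from 0; aliases do not consume a new value.
static_assert(ToIndex(TPosition::GM) == 0, "GM is the first enumerator");
static_assert(ToIndex(TPosition::MAX) == 16, "16 distinct values precede MAX");
```

Because an alias and its target are the same value, a `switch` over TPosition can list only one of the two names in its `case` labels; listing both is a duplicate-case compile error.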
The enumerated values of TPosition are defined as follows.
| Enumerated Value | Description |
|---|---|
| GM | Global memory, corresponding to the external memory of the AI Core. |
| VECIN | Used for vector computation. Storage location for move-in data, used when data is moved in to the Vector Unit. |
| VECOUT | Used for vector computation. Storage location for move-out data, used when results are moved out of the Vector Unit. |
| VECCALC | Used for vector/matrix computation. This location is used when the computation requires temporary variables. |
| A1 | Used for matrix computation. Stores the entire matrix A, similar to the L2 cache in a CPU's multi-level cache. |
| B1 | Used for matrix computation. Stores the entire matrix B, similar to the L2 cache in a CPU's multi-level cache. |
| C1 | Used for matrix computation. Stores the entire bias matrix, similar to the L2 cache in a CPU's multi-level cache. |
| A2 | Used for matrix computation. Stores a smaller block split from matrix A, similar to the L1 cache in a CPU's multi-level cache. |
| B2 | Used for matrix computation. Stores a smaller block split from matrix B, similar to the L1 cache in a CPU's multi-level cache. |
| C2 | Used for matrix computation. Stores a smaller block split from the bias matrix, similar to the L1 cache in a CPU's multi-level cache. |
| CO1 | Used for matrix computation. Stores a small block of the result matrix C; can be thought of as Cube Out. |
| CO2 | Used for matrix computation. Stores the entire result matrix C; can be thought of as Cube Out. |
| LCM | Local cache memory. An alias of the unified buffer that serves the same function as VECCALC. |
| SPM | Used to temporarily store unified buffer data when the unified buffer may overflow. |
| SHM | Alias of SPM. |
| TSCM | Temp Swap Cache Memory, used to temporarily swap data to extra space for Matmul operations. |
| C2PIPE2GM | Used to store FixPipe quantization parameters. |
| C2PIPE2LOCAL | Reserved for future use. |
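The split between vector and matrix positions described in this section can be sketched as a small lookup. This is an illustrative sketch only: it uses a local copy of the enumeration so that it compiles without the framework headers, and `PositionKind` and `KindOf` are hypothetical names, not part of any real API. VECCALC is grouped under vector programming, following the introductory paragraph, even though its description also mentions matrix computation.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the enumeration, reproduced so this sketch is self-contained.
enum class TPosition : std::uint8_t {
    GM, A1, A2, B1, B2, C1, C2, CO1, CO2,
    VECIN, VECOUT, VECCALC,
    LCM = VECCALC,
    SPM,
    SHM = SPM,
    TSCM, C2PIPE2GM, C2PIPE2LOCAL,
    MAX,
};

// Hypothetical grouping (illustration only).
enum class PositionKind { Global, Vector, Matrix, Other };

// Hypothetical helper: classifies a position by the programming style it serves.
inline PositionKind KindOf(TPosition pos) {
    switch (pos) {
        case TPosition::GM:
            return PositionKind::Global;
        case TPosition::VECIN:
        case TPosition::VECOUT:
        case TPosition::VECCALC:  // also covers the alias LCM
            return PositionKind::Vector;
        case TPosition::A1:
        case TPosition::A2:
        case TPosition::B1:
        case TPosition::B2:
        case TPosition::C1:
        case TPosition::C2:
        case TPosition::CO1:
        case TPosition::CO2:
            return PositionKind::Matrix;
        default:
            // SPM/SHM, TSCM, C2PIPE2GM, C2PIPE2LOCAL, MAX
            return PositionKind::Other;
    }
}
```

For example, `KindOf(TPosition::LCM)` returns `PositionKind::Vector`, because LCM has the same value as VECCALC.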