Matmul Features
In addition to the basic computing capabilities described in Basic Knowledge and Operator Implementation, Matmul matrix programming provides processing capabilities and functions for different scenarios. The following tables list these scenarios and functions. For details, see the corresponding sections.
| Category | Feature Description | Overview |
|---|---|---|
| Function Implementation | Multi-core tiling (exactly divisible case) | In multi-core scenarios, matrix data can be tiled along the M, N, and K axes. This feature implements parallel matrix multiplication on multiple cores when M is exactly divisible by singleCoreM, N is exactly divisible by singleCoreN, and K is exactly divisible by singleCoreK. |
| Function Implementation | Multi-core tiling (tail block case) | In multi-core scenarios, matrix data can be tiled along the M, N, and K axes. This feature handles the case where M is not exactly divisible by singleCoreM, N is not exactly divisible by singleCoreN, or K is not exactly divisible by singleCoreK (that is, the tail block scenario). |
| Function Implementation | Computation without waiting for matrix multiplication (MIX scenario) | In the MIX scenario (which includes both matrix and vector computation), other computations can proceed without waiting for the matrix multiplication to complete. |
| Function Implementation | Customized data movement before and after matrix multiplication | Users can customize how the left matrix A and right matrix B are moved from the global memory to A1 and B1 respectively, and how the output matrix C is moved from CO1 to the global memory. |
| Function Implementation | Channel splitting of the matrix multiplication output (ChannelSplit) | When the output matrix C has the float data type and the NZ data format, it is stored in fractals of size 16 x 8. |
| Function Implementation | Matrix-vector multiplication (GEMV) | The matrix multiplication scenario where M = 1 and K > 1, that is, the left matrix A has the shape (1, K). |
| Function Implementation | Upper/lower triangular matrix multiplication | Computation for elements in the lower (or upper) triangular part of the matrix is skipped, and matrix multiplication is performed only for elements in the upper (or lower) triangular part. |
| Function Implementation | Matrix multiplication with TSCM input | Matrix multiplication is performed on a left matrix A or right matrix B whose logical memory location is TSCM. |
| Function Implementation | N-direction alignment of the matrix multiplication output (ND_ALIGN output) | The output matrix C is written in ND_ALIGN format: it is automatically padded in the N direction to 32-byte alignment before being output. |
| Function Implementation | Partial output of single matrix multiplication (partial output) | During matrix multiplication, the computation result is output directly, without being accumulated along the K direction of a single core. |
| Function Implementation | Independent running mechanism of AIC and AIV (dual-master mode) | In the MIX scenario (which includes both matrix and vector computation), the AIC and AIV cores run independently and are not driven by messages. |
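
The difference between the exactly divisible case and the tail block case can be illustrated with a small host-side sketch. This is plain Python, not Ascend C and not the Matmul API: the helper names `split_axis`, `matmul_reference`, and `matmul_tiled` are hypothetical, and only the names singleCoreM, singleCoreN, and singleCoreK come from the table above. It assumes each axis is cut into blocks of the single-core size, with a shorter last block when the axis length is not exactly divisible.

```python
def split_axis(total, single_core):
    """Cut [0, total) into (offset, length) blocks of at most single_core elements.

    The last block is shorter when total is not exactly divisible by
    single_core; this is the tail block.
    """
    blocks = []
    offset = 0
    while offset < total:
        length = min(single_core, total - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def matmul_reference(a, b):
    """Untiled matrix multiplication on nested lists, used as a reference."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def matmul_tiled(a, b, single_core_m, single_core_n, single_core_k):
    """Multiply block by block; each (M, N, K) block stands in for one core's work."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for m0, m_len in split_axis(m, single_core_m):
        for n0, n_len in split_axis(n, single_core_n):
            for k0, k_len in split_axis(k, single_core_k):
                # Partial products from each K block are accumulated into C.
                for i in range(m0, m0 + m_len):
                    for j in range(n0, n0 + n_len):
                        c[i][j] += sum(a[i][p] * b[p][j]
                                       for p in range(k0, k0 + k_len))
    return c
```

With an axis length of 8 and a single-core size of 4, `split_axis` yields two full blocks; with an axis length of 10 it yields two full blocks and a tail block of length 2. In both cases the tiled result should match the untiled reference.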

| Category | Feature Description | Overview |
|---|---|---|
| Function Implementation | Quantization/Dequantization of the matrix multiplication output | When the matrix multiplication result is moved from CO1 to the global memory, a quantization or dequantization operation is applied to the matrix elements. |
| Function Implementation | 4:2 sparse matrix multiplication (sparse matmul) | Matrix multiplication performed on the sparse left matrix A and the 4:2 densified right matrix B. |
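
As a rough illustration of what quantizing or dequantizing the output elements means, the sketch below applies a single scale factor to every element of a result matrix. It is plain Python rather than Ascend C, and it rests on several assumptions that the table does not state: a per-tensor scale, int8 as the quantized type, and Python's built-in `round` as the rounding mode. The function names `dequantize` and `quantize_to_int8` are hypothetical.

```python
def dequantize(c_int, scale):
    """Convert integer accumulator values to floats with one scale (assumed scheme)."""
    return [[value * scale for value in row] for row in c_int]


def quantize_to_int8(c_float, scale):
    """Scale, round, and clamp float values into the int8 range [-128, 127].

    Rounding uses Python's round(), which is an assumption made for this
    sketch, not a statement about hardware behavior.
    """
    quantized = []
    for row in c_float:
        quantized.append([max(-128, min(127, round(value * scale)))
                          for value in row])
    return quantized
```

Values that fall outside the int8 range after scaling are clamped rather than wrapped, which is the usual choice for saturating quantization.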

| Category | Feature Description | Overview |
|---|---|---|
| Function Implementation | Basic functions of Batch Matmul | Batch Matmul supports batch processing of Matmul. A single call to the IterateBatch API computes multiple C matrices, each of size singleCoreM x singleCoreN. |
| Function Implementation | Bias reuse across batches | The same bias matrix, which has no batch axis, is reused for the Matmul computation in each batch. |
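
The bias reuse described above can be sketched in plain Python. This is not the IterateBatch API: the helper names `matmul_bias` and `batch_matmul_shared_bias` are hypothetical, and the sketch assumes the bias is a row of N values added to every row of each C matrix.

```python
def matmul_bias(a, b, bias):
    """Compute C = A x B + bias, where bias has one value per output column."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) + bias[j]
             for j in range(n)]
            for i in range(m)]


def batch_matmul_shared_bias(a_batch, b_batch, bias):
    """Run one matmul per batch; the same bias (no batch axis) is reused each time."""
    return [matmul_bias(a, b, bias) for a, b in zip(a_batch, b_batch)]
```

For example, with two batches of shape (1, 2) x (2, 2) and a bias of `[10, 20]`, both output matrices receive the same bias values even though their A and B inputs differ.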