General Matrix-Vector Multiplication

Overview

General Matrix-Vector Multiplication (GEMV) is the Matmul case in which M is 1: the left matrix A has shape (1, K) and the right matrix B has shape (K, N). Matmul provides a GEMV mode for this case, which you enable by setting the data format of matrix A to VECTOR both in the tiling process and on the kernel. If GEMV mode is not enabled when M is 1, Matmul handles the M direction as a non-aligned scenario. Compared with that non-aligned processing, GEMV mode moves less data and performs better.
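As a point of reference, the result that GEMV mode produces can be written as a plain host-side loop. The following is a minimal sketch, not the Matmul implementation: the function name ReferenceGemv is hypothetical, B is assumed to be stored row-major with shape (K, N), and float is used instead of half because standard C++ has no half type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference routine, not part of the Matmul API.
// Computes c[n] = sum over k of a[k] * b[k * N + n],
// where a has K elements (shape (1, K)) and b is row-major (K, N).
std::vector<float> ReferenceGemv(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 std::size_t K, std::size_t N) {
    std::vector<float> c(N, 0.0f);
    for (std::size_t k = 0; k < K; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            c[n] += a[k] * b[k * N + n];
        }
    }
    return c;
}
```

A routine like this can serve as a golden reference when checking the output of an operator that runs in GEMV mode, within the tolerance expected for half inputs.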

The following example uses Matmul with M = 1, K = 256, and N = 32, where both the left and right matrices are of type half, to illustrate how the Matmul API processes data internally in GEMV mode.

GEMV
When matrix A is moved from A1 to A2, the 1 x 256 vector is treated as a 16 x 16 matrix, and a single LoadData call moves the whole 16 x 16 fractal matrix. Matrix B is moved and multiplied in the same way as in basic scenarios, as shown in the following figure.

Figure 1 Matrix multiplication in GEMV mode (M = 1)

Non-GEMV
When matrix A is moved from A1 to A2, the 1 x 256 vector is treated as non-aligned matrix data, and the M direction must be 32-byte aligned before the movement. LoadData is then called 16 times (K/16), moving one 16 x 16 fractal matrix per call. More data is moved as a result, so performance is worse than in GEMV mode, as illustrated in the following figure.

Figure 2 Matrix multiplication in non-GEMV mode (M = 1)
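
The difference in data movement between the two modes can be estimated with simple arithmetic. The sketch below counts the 16 x 16 fractals moved for matrix A in each mode. All helper names are hypothetical and not part of the Matmul API, and rounding up for values of K other than 256 is an assumption that goes beyond the example in this section.

```cpp
#include <cstddef>

// A fractal is a 16 x 16 block of elements.
constexpr std::size_t kFractalDim = 16;
constexpr std::size_t kFractalElems = kFractalDim * kFractalDim;  // 256 elements

constexpr std::size_t CeilDiv(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// GEMV mode: the K vector elements are packed into 16 x 16 fractals.
// (Assumption: partially filled fractals round up.)
constexpr std::size_t GemvFractals(std::size_t K) {
    return CeilDiv(K, kFractalElems);
}

// Non-GEMV mode: the single row occupies a full 16-row fractal,
// and one fractal is moved per 16 columns of K (K/16 moves).
constexpr std::size_t NonGemvFractals(std::size_t K) {
    return CeilDiv(K, kFractalDim);
}

// Bytes in one fractal for a given element size (2 bytes for half).
constexpr std::size_t FractalBytes(std::size_t elemSize) {
    return kFractalElems * elemSize;
}
```

For K = 256 with half data, GemvFractals(256) is 1 and NonGemvFractals(256) is 16, so the non-GEMV path moves 16 fractals of 512 bytes each (8192 bytes) where GEMV mode moves a single 512-byte fractal.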

Application Scenarios

GEMV mode applies when the left matrix A has the shape (1, K), that is, M = 1 and K > 1. In other words, the input matrix A consists of vector data.

Restrictions

GEMV can be enabled in Matmul computation only when M, taken from the original input shape of matrix A, is 1.

In the GEMV scenario, the left matrix A cannot be transposed.
In the GEMV scenario, the left matrix data in the global memory must be 16-byte aligned.
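
These restrictions can be checked before GEMV mode is selected. The following is a minimal sketch of such a guard. The function CanUseGemv and its parameters are hypothetical and not part of the Matmul API, and treating the 16-byte requirement as a check on the starting address of matrix A is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical guard, not part of the Matmul API.
// Returns true only if all three restrictions listed above hold:
//   - M of the original input shape of matrix A is 1
//   - matrix A is not transposed
//   - the address of matrix A is 16-byte aligned (assumed interpretation)
inline bool CanUseGemv(std::size_t m, bool isTransA, const void* aAddr) {
    constexpr std::uintptr_t kAlignBytes = 16;
    const bool aligned =
        reinterpret_cast<std::uintptr_t>(aAddr) % kAlignBytes == 0;
    return m == 1 && !isTransA && aligned;
}
```

When the guard returns false, the operator would fall back to the regular format for matrix A and take the non-GEMV path described earlier.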

Example

For a complete operator example, see the matmul_gemv operator sample.

Tiling implementation

Call the SetAType API to set the data format of matrix A to CubeFormat::VECTOR. Other tiling implementation details are the same as those in basic scenarios.

         
              auto ascendcPlatform = platform_ascendc::PlatformAscendC(context->GetPlatformInfo());
matmul_tiling::MatmulApiTiling tiling(ascendcPlatform);
// Call the API to set the format of matrix A to CubeFormat::VECTOR.
tiling.SetAType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::VECTOR, matmul_tiling::DataType::DT_FLOAT16);
tiling.SetBType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT16); 
tiling.SetCType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
tiling.SetBiasType(matmul_tiling::TPosition::GM, matmul_tiling::CubeFormat::ND, matmul_tiling::DataType::DT_FLOAT);
... // Other implementation details.
optiling::TCubeTiling tilingData;   
int ret = tiling.GetTiling(tilingData);

Kernel implementation

In the GEMV scenario, set the data format of the template parameter A_TYPE to CubeFormat::VECTOR when creating the Matmul object. This differs from basic scenarios.

         
              #include "lib/matmul_intf.h"

using A_TYPE = AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::VECTOR, half>; 
using B_TYPE = AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, half>;
using C_TYPE = AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float>; 
using BIAS_TYPE = AscendC::MatmulType<AscendC::TPosition::GM, CubeFormat::ND, float>; 
AscendC::Matmul<A_TYPE, B_TYPE, C_TYPE, BIAS_TYPE> mm;
