Ascend C provides a set of Matmul high-level APIs that let users quickly implement matrix multiplication.

Matmul computes the following formula: C = A * B + Bias.

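A and B are the source operands. A is the left matrix, with shape [M, K]; B is the right matrix, with shape [K, N].
C is the destination operand, the matrix that stores the multiplication result, with shape [M, N].
Bias is the bias added to the matrix product, with shape [N].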

The Matmul computation is illustrated in the following figure:

Figure 1 Matmul matrix multiplication diagram

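In the text below, the M axis is the vertical direction of matrix A; the K axis is the horizontal direction of matrix A, or equivalently the vertical direction of matrix B; and the N axis is the horizontal direction of matrix B.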
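
To make the formula and the axis names concrete, the following is a minimal host-side reference sketch in plain C++. It is not Ascend C code: `MatmulReference` is a hypothetical helper, the matrices are assumed to be stored row-major in flat vectors, and Bias (shape [N]) is assumed to be added to every row of the product.

```cpp
#include <cstddef>
#include <vector>

// Reference computation of C = A * B + Bias on row-major data.
// A: [M, K], B: [K, N], Bias: [N], C: [M, N].
// MatmulReference is a hypothetical helper for illustration, not an Ascend C API.
std::vector<float> MatmulReference(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   const std::vector<float>& bias,
                                   std::size_t M, std::size_t K, std::size_t N) {
    std::vector<float> c(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {          // M axis: vertical direction of A
        for (std::size_t n = 0; n < N; ++n) {      // N axis: horizontal direction of B
            float acc = bias[n];                   // Bias is broadcast along the M axis
            for (std::size_t k = 0; k < K; ++k) {  // K axis: horizontal in A, vertical in B
                acc += a[m * K + k] * b[k * N + n];
            }
            c[m * N + n] = acc;
        }
    }
    return c;
}
```

The K loop is the reduction axis: it walks across one row of A and down one column of B, which is why the two matrices must agree on K.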

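The steps for implementing a Matmul matrix multiplication are as follows:

1. Create a Matmul object.
2. Initialize the object.
3. Set the left matrix A, the right matrix B, and Bias.
4. Perform the matrix multiplication.
5. End the matrix multiplication.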

Create a Matmul object.

The following example creates a Matmul object:

```
typedef MatmulType<TPosition::GM, CubeFormat::ND, half> aType;
typedef MatmulType<TPosition::GM, CubeFormat::ND, half> bType;
typedef MatmulType<TPosition::GM, CubeFormat::ND, float> cType;
typedef MatmulType<TPosition::GM, CubeFormat::ND, float> biasType;
Matmul<aType, bType, cType, biasType> mm;
```

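When creating the object, you must pass the type information for A, B, C, and Bias. This type information is defined through MatmulType and consists of the logical memory position, the data format, and the data type.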

```
template <TPosition POSITION, CubeFormat FORMAT, typename TYPE>
struct MatmulType {
    constexpr static TPosition pos = POSITION;
    constexpr static CubeFormat format = FORMAT;
    using T = TYPE;
};
```

Table 1 MatmulType parameters

| Parameter | Description |
| --- | --- |
| POSITION | Logical memory position. A matrix: TPosition::GM, TPosition::VECCALC, or TPosition::TSCM. B matrix: TPosition::GM, TPosition::VECCALC, or TPosition::TSCM. Bias: TPosition::GM or TPosition::VECCALC. C matrix: TPosition::GM or TPosition::VECCALC. |
| FORMAT (CubeFormat) | Data format. A matrix: CubeFormat::ND or CubeFormat::NZ. B matrix: CubeFormat::ND or CubeFormat::NZ. Bias: CubeFormat::ND. C matrix: CubeFormat::ND, CubeFormat::NZ, or CubeFormat::ND_ALIGN. |
| TYPE | Data type. For Atlas A2 training series products: A matrix: half or float. B matrix: half or float. Bias: half or float. C matrix: half or float. Note: the A and B matrices must use the same data type. |
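
The constraints in Table 1 can be expressed as a compile-time check. The sketch below is an illustration only and does not use the Ascend C headers: the `TPosition` and `CubeFormat` enums and the `half` struct are stand-ins that list just the values named in Table 1, and `IsValidMatmulTypes` is a hypothetical helper, not part of the Matmul API.

```cpp
#include <type_traits>

// Stand-ins so the sketch compiles on a host compiler (assumptions, not the real definitions).
enum class TPosition { GM, VECCALC, TSCM };
enum class CubeFormat { ND, NZ, ND_ALIGN };
struct half {};  // placeholder for the 16-bit floating-point type

// Same shape as the MatmulType template shown above.
template <TPosition POSITION, CubeFormat FORMAT, typename TYPE>
struct MatmulType {
    constexpr static TPosition pos = POSITION;
    constexpr static CubeFormat format = FORMAT;
    using T = TYPE;
};

// Hypothetical helper encoding the rules of Table 1.
template <typename A, typename B, typename C, typename Bias>
constexpr bool IsValidMatmulTypes() {
    return std::is_same<typename A::T, typename B::T>::value  // A and B share a data type
        && A::format != CubeFormat::ND_ALIGN                  // A: ND or NZ only
        && B::format != CubeFormat::ND_ALIGN                  // B: ND or NZ only
        && Bias::format == CubeFormat::ND                     // Bias: ND only
        && Bias::pos != TPosition::TSCM                       // Bias: GM or VECCALC
        && C::pos != TPosition::TSCM;                         // C: GM or VECCALC
}

using aType = MatmulType<TPosition::GM, CubeFormat::ND, half>;
using bType = MatmulType<TPosition::GM, CubeFormat::ND, half>;
using cType = MatmulType<TPosition::GM, CubeFormat::ND, float>;
using biasType = MatmulType<TPosition::GM, CubeFormat::ND, float>;

static_assert(IsValidMatmulTypes<aType, bType, cType, biasType>(),
              "type combination violates Table 1");
```

Because MatmulType only carries compile-time constants and a type alias, a check of this kind costs nothing at run time; a mismatched pair such as a half A matrix with a float B matrix would fail the `static_assert`.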

Initialize the object.
```
mm.Init(&tiling); // Initialize
```

Set the left matrix A, the right matrix B, and Bias.

```
mm.SetTensorA(gm_a);    // Set the left matrix A
mm.SetTensorB(gm_b);    // Set the right matrix B
mm.SetBias(gm_bias);    // Set Bias
```

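Perform the matrix multiplication.
- Call Iterate to perform a single iteration, and wrap it in a while loop to compute all of the data on a single core. With Iterate, you control the number of iterations yourself and compute only as much data as you need, which makes this approach the more flexible one.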
```
while (mm.Iterate()) {    // Run one iteration; the loop ends when no data remains
    mm.GetTensorC(gm_c);  // Retrieve the result of this iteration
}
```
- Call IterateAll to compute all of the data on a single core. IterateAll requires no loop, so it is the simpler approach.
```
mm.IterateAll(gm_c);
```
End the matrix multiplication.
```
mm.End();
```
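
The difference between the two calling styles can be sketched with a host-side mock in plain C++. This is not the Ascend C implementation: `MockMatmul`, its `Init` signature, the row-major flat vectors, and the assumption that each `Iterate` call produces one `baseM` x `baseN` block of C in row-major block order are all illustrative choices made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Host-side mock of the call sequence Init -> SetTensorA/B -> SetBias ->
// Iterate + GetTensorC (or IterateAll) -> End. Illustrative only.
class MockMatmul {
public:
    void Init(std::size_t M, std::size_t N, std::size_t K,
              std::size_t baseM, std::size_t baseN) {
        M_ = M; N_ = N; K_ = K; baseM_ = baseM; baseN_ = baseN; next_ = 0;
    }
    void SetTensorA(const std::vector<float>& a) { a_ = &a; }
    void SetTensorB(const std::vector<float>& b) { b_ = &b; }
    void SetBias(const std::vector<float>& bias) { bias_ = &bias; }

    // Computes one output block per call; returns false once every block is done.
    bool Iterate() {
        const std::size_t blocksM = (M_ + baseM_ - 1) / baseM_;
        const std::size_t blocksN = (N_ + baseN_ - 1) / baseN_;
        if (next_ >= blocksM * blocksN) return false;
        m0_ = (next_ / blocksN) * baseM_;
        n0_ = (next_ % blocksN) * baseN_;
        m1_ = std::min(m0_ + baseM_, M_);  // clamp the last block to the matrix edge
        n1_ = std::min(n0_ + baseN_, N_);
        block_.assign((m1_ - m0_) * (n1_ - n0_), 0.0f);
        for (std::size_t m = m0_; m < m1_; ++m) {
            for (std::size_t n = n0_; n < n1_; ++n) {
                float acc = bias_ ? (*bias_)[n] : 0.0f;
                for (std::size_t k = 0; k < K_; ++k) {
                    acc += (*a_)[m * K_ + k] * (*b_)[k * N_ + n];
                }
                block_[(m - m0_) * (n1_ - n0_) + (n - n0_)] = acc;
            }
        }
        ++next_;
        return true;
    }

    // Copies the block produced by the latest Iterate() into the full C matrix.
    void GetTensorC(std::vector<float>& c) const {
        for (std::size_t m = m0_; m < m1_; ++m) {
            for (std::size_t n = n0_; n < n1_; ++n) {
                c[m * N_ + n] = block_[(m - m0_) * (n1_ - n0_) + (n - n0_)];
            }
        }
    }

    // Equivalent to running the Iterate/GetTensorC loop to completion.
    void IterateAll(std::vector<float>& c) {
        while (Iterate()) GetTensorC(c);
    }

    void End() { a_ = nullptr; b_ = nullptr; bias_ = nullptr; next_ = 0; }

private:
    const std::vector<float>* a_ = nullptr;
    const std::vector<float>* b_ = nullptr;
    const std::vector<float>* bias_ = nullptr;
    std::vector<float> block_;
    std::size_t M_ = 0, N_ = 0, K_ = 0, baseM_ = 1, baseN_ = 1;
    std::size_t next_ = 0;
    std::size_t m0_ = 0, m1_ = 0, n0_ = 0, n1_ = 0;
};
```

In this mock, the explicit loop gives the caller a point between blocks where extra work could be inserted or the loop could be stopped early, while `IterateAll` hides the loop entirely. Both styles produce the same C matrix.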

Usage notes