A single call to IterateAll computes the complete C matrix of size singleCoreM * singleCoreN. The iteration order can be adjusted through the tiling parameter iterateOrder.

template <bool sync = true>
__aicore__ inline void IterateAll(const GlobalTensor<DstT>& gm, uint8_t enAtomic = 0,
    bool enSequentialWrite = false, bool waitIterateAll = false){};

template <bool sync = true>
__aicore__ inline void IterateAll(const LocalTensor<DstT>& ubCmatrix, uint8_t enAtomic = 0){};

Table 1 Template parameters
Parameter	Description
sync	Selects synchronous or asynchronous mode: true for synchronous mode, false for asynchronous mode.

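Table 2 Interface parameters
Parameter	Input/Output	Description
gm	Input	Address of the C matrix in Global Memory. Supported data types: half/float on Atlas inference series products (Ascend 310P processor) AI Core; half/float/bfloat16_t on Atlas A2 training series products.
ubCmatrix	Input	Address of the C matrix in Local Memory. Supported data types: half/float on Atlas inference series products (Ascend 310P processor) AI Core; half/float/bfloat16_t on Atlas A2 training series products.
enAtomic	Input	Whether to enable an Atomic operation. Default: 0. Values: 0: Atomic operations disabled; 1: AtomicAdd (accumulate) enabled; 2: AtomicMax (maximum) enabled; 3: AtomicMin (minimum) enabled. On Atlas A2 training series products, Atomic operations can be enabled only when the output location is GM.
enSequentialWrite	Input	Whether to enable sequential write mode to GM. With sequential write, results are written as [baseM, baseN]; with non-sequential write, results are written to the corresponding position within [singleCoreM, singleCoreN]. Default: false (non-sequential write mode).
waitIterateAll	Input	Used only in asynchronous scenarios; indicates whether to wait for IterateAll to finish. This parameter is currently reserved.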
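To make the effect of enAtomic and enSequentialWrite more concrete, the following is a minimal host-side C++ sketch of how one base block might land in the C buffer. It is a plain model for illustration, not Ascend C device code: `TileShape`, `ApplyAtomic`, `DstOffset`, and `WriteBlock` are hypothetical names, the C matrix is assumed to be row-major, singleCoreM and singleCoreN are assumed to be multiples of baseM and baseN, and in sequential mode the blocks are assumed to be stored back to back in row-by-row block order (on hardware the block order follows iterateOrder).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model; none of these names belong to the Ascend C API.
struct TileShape {
    std::size_t singleCoreM, singleCoreN;  // per-core output shape
    std::size_t baseM, baseN;              // shape of one base block
};

// Combine the value already in the destination with a new result,
// following the enAtomic values listed in Table 2.
float ApplyAtomic(std::uint8_t enAtomic, float oldVal, float newVal) {
    switch (enAtomic) {
        case 1: return oldVal + newVal;           // AtomicAdd
        case 2: return std::max(oldVal, newVal);  // AtomicMax
        case 3: return std::min(oldVal, newVal);  // AtomicMin
        default: return newVal;                   // 0: plain overwrite
    }
}

// Destination offset of element (r, c) inside base block (bi, bj).
// Assumption: row-major layout, blocks numbered row by row.
std::size_t DstOffset(const TileShape& s, bool enSequentialWrite,
                      std::size_t bi, std::size_t bj,
                      std::size_t r, std::size_t c) {
    assert(s.singleCoreM % s.baseM == 0 && s.singleCoreN % s.baseN == 0);
    if (enSequentialWrite) {
        // Each block is stored as one contiguous baseM x baseN tile.
        std::size_t blocksPerRow = s.singleCoreN / s.baseN;
        std::size_t k = bi * blocksPerRow + bj;
        return k * s.baseM * s.baseN + r * s.baseN + c;
    }
    // Each element goes to its position in the singleCoreM x singleCoreN matrix.
    return (bi * s.baseM + r) * s.singleCoreN + (bj * s.baseN + c);
}

// Write one baseM x baseN block (row-major) into dst.
void WriteBlock(std::vector<float>& dst, const TileShape& s,
                const std::vector<float>& block,
                std::size_t bi, std::size_t bj,
                std::uint8_t enAtomic, bool enSequentialWrite) {
    for (std::size_t r = 0; r < s.baseM; ++r) {
        for (std::size_t c = 0; c < s.baseN; ++c) {
            std::size_t off = DstOffset(s, enSequentialWrite, bi, bj, r, c);
            dst[off] = ApplyAtomic(enAtomic, dst[off], block[r * s.baseN + c]);
        }
    }
}
```

In this model, the same block element maps to different offsets depending on enSequentialWrite, and enAtomic only changes how the new value is merged with what the destination already holds.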

None

Atlas A2 training series products

Atlas inference series products (Ascend 310P processor) AI Core

The address space passed in for the C matrix must be no smaller than singleCoreM * singleCoreN.
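The size constraint can be checked on the host before launching the kernel. The sketch below assumes the bound is counted in elements of the output type; `RequiredCElems` and `CBufferLargeEnough` are hypothetical helper names for illustration and are not part of the Ascend C API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: minimum number of C elements one core writes
// in a single IterateAll call (assumption: the bound is in elements).
std::size_t RequiredCElems(std::size_t singleCoreM, std::size_t singleCoreN) {
    return singleCoreM * singleCoreN;
}

// Hypothetical helper: true if a buffer holding capacityElems elements
// satisfies the constraint stated above.
bool CBufferLargeEnough(std::size_t capacityElems,
                        std::size_t singleCoreM, std::size_t singleCoreN) {
    return capacityElems >= RequiredCElems(singleCoreM, singleCoreN);
}
```

For example, with singleCoreM = 128 and singleCoreN = 256, a buffer of 32768 elements passes the check and a buffer of 32767 elements does not.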