aclnnFusedLinearCrossEntropyLossGrad

Product support


Function description

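Interface function: This operator is part of the cross-entropy loss module for vocabulary-parallel scenarios, which addresses the memory footprint and compute efficiency problems of very large vocabularies. This part implements the gradient computation: it computes the gradients of the leaf nodes input and weight. It takes as input the relevant outputs of the preceding operators in the module, together with the associated global communication results.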
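The formulas in this section use target_mask and masked_target without defining them. As a hedged illustration only, the sketch below shows how such per-rank values are commonly derived in Megatron-style vocabulary-parallel cross entropy. The helper name `local_target_info` and the arguments `vocab_start` and `vocab_end` are hypothetical and are not part of this interface.

```python
import numpy as np

def local_target_info(target, vocab_start, vocab_end):
    """Map global target ids onto one vocabulary shard (illustrative assumption).

    target: int array of shape (BT,) holding global vocabulary ids.
    [vocab_start, vocab_end) is the id range assumed to be owned by this rank.
    """
    # True where the target token is NOT stored on this rank.
    target_mask = (target < vocab_start) | (target >= vocab_end)
    # Shift to shard-local indices; out-of-shard entries are set to 0.
    masked_target = np.where(target_mask, 0, target - vocab_start)
    return target_mask, masked_target

target = np.array([1, 5, 9, 6])
mask, local = local_target_info(target, vocab_start=4, vocab_end=8)
print(mask.tolist())   # [True, False, True, False]
print(local.tolist())  # [0, 1, 0, 2]
```

With this convention, 1 - target_mask is 1 only for rows whose target token lives on the local shard, which is why the formulas subtract it at column masked_target.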
Formulas:

High-performance mode (softmaxOptional is not nullptr):

\text{softmax} \in \mathbb{R}^{BT \times V}

\text{arange\_1d} = [0, 1, \dots, BT-1] \in \mathbb{N}^{BT}

\text{softmax\_update} = \mathbf{1} - \text{target\_mask}.view(-1) \in \mathbb{R}^{BT}

\text{softmax}[\text{arange\_1d}, \text{masked\_target}] \leftarrow \text{softmax}[\text{arange\_1d}, \text{masked\_target}] - \text{softmax\_update}

\text{softmax} \leftarrow \text{softmax} \odot \text{grad}.unsqueeze(-1) \in \mathbb{R}^{BT \times V}

\text{grad\_input} = \text{softmax} \cdot \text{weight} \in \mathbb{R}^{BT \times H}

\text{grad\_weight} = \text{softmax}^T \cdot \text{input} \in \mathbb{R}^{V \times H}
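The high-performance path can be sketched as a NumPy reference, which may help when checking operator output on small shapes. This is a minimal sketch under stated assumptions, not the operator's implementation: the function name `grad_from_softmax` is hypothetical, input is taken as (BT, H), weight as (V, H), and grad as (BT,). With those shapes, the product that yields a (BT, H) result is the scaled softmax times weight.

```python
import numpy as np

def grad_from_softmax(grad, inp, weight, target_mask, masked_target, softmax):
    """NumPy reference for the path where a precomputed softmax is supplied.

    Assumed shapes: grad (BT,), inp (BT, H), weight (V, H),
    target_mask (BT,) bool, masked_target (BT,) int, softmax (BT, V).
    """
    g = np.array(softmax, dtype=np.float64)      # copy, caller's array is untouched
    arange_1d = np.arange(g.shape[0])
    softmax_update = 1.0 - target_mask.reshape(-1).astype(np.float64)
    # Subtract 1 at the target column for rows whose target is on this shard.
    g[arange_1d, masked_target] -= softmax_update
    g *= grad[:, None]                           # scale each row by the upstream grad
    grad_input = g @ weight                      # (BT, V) @ (V, H) -> (BT, H)
    grad_weight = g.T @ inp                      # (V, BT) @ (BT, H) -> (V, H)
    return grad_input, grad_weight

rng = np.random.default_rng(0)
BT, H, V = 3, 2, 4
x = rng.standard_normal((BT, H))
w = rng.standard_normal((V, H))
logits = x @ w.T
e = np.exp(logits - logits.max(axis=1, keepdims=True))
sm = e / e.sum(axis=1, keepdims=True)
gi, gw = grad_from_softmax(np.ones(BT), x, w, np.zeros(BT, dtype=bool),
                           np.array([0, 3, 1]), sm)
print(gi.shape, gw.shape)  # (3, 2) (4, 2)
```

The copy on entry is a choice made for the sketch; the formulas above describe an in-place update of softmax.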

Otherwise (softmaxOptional is nullptr), the softmax is recomputed from input and weight:

\text{vocab\_parallel\_logits} = \text{input} \cdot \text{weight}^T \quad \in \mathbb{R}^{BT \times V}

\text{logits\_sub} = \text{vocab\_parallel\_logits} - \text{logits\_max}.unsqueeze(-1) \quad \in \mathbb{R}^{BT \times V}

\text{exp\_logits} = \exp(\text{logits\_sub}) \quad \in \mathbb{R}^{BT \times V}

\text{exp\_logits} \gets \frac{\text{exp\_logits}}{\text{sum\_exp\_logits}.unsqueeze(-1)} \quad \in \mathbb{R}^{BT \times V}

\text{grad\_logits} = \text{exp\_logits} \quad \in \mathbb{R}^{BT \times V}

\text{grad\_2d} = \text{grad\_logits}.view(-1, \text{partition\_vocab\_size}) \quad \in \mathbb{R}^{BT \times V}

\text{arange\_1d} = [0, 1, \dots, BT-1] \quad \in \mathbb{N}^{BT}

\text{softmax\_update} = 1 - \text{target\_mask}.view(-1) \quad \in \mathbb{R}^{BT}

\text{grad\_2d}[\text{arange\_1d}, \text{masked\_target\_1d}] \gets \text{grad\_2d}[\text{arange\_1d}, \text{masked\_target\_1d}] - \text{softmax\_update}

\text{grad\_logits} \gets \text{grad\_logits} \odot \text{grad}.unsqueeze(-1) \quad \in \mathbb{R}^{BT \times V}

\text{grad\_input} = \text{grad\_logits} \cdot \text{weight} \quad \in \mathbb{R}^{BT \times H}

\text{grad\_weight} = \text{grad\_logits}^T \cdot \text{input} \quad \in \mathbb{R}^{V \times H}
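The recomputation path above can likewise be sketched in NumPy. The sketch assumes that logits_max and sum_exp_logits are the globally reduced per-row maximum and per-row sum of exponentials; the function name `grad_recompute`, the two-shard split, and the summation that stands in for cross-rank communication are illustrative assumptions rather than part of this interface.

```python
import numpy as np

def grad_recompute(grad, inp, weight, target_mask, masked_target,
                   logits_max, sum_exp_logits):
    """NumPy reference for the path that rebuilds softmax from inp and weight.

    Assumed shapes: grad, logits_max, sum_exp_logits (BT,); inp (BT, H);
    weight (V, H) for the local shard; target_mask (BT,) bool; masked_target (BT,) int.
    """
    logits = inp @ weight.T                                  # (BT, V)
    exp_logits = np.exp(logits - logits_max[:, None])
    grad_logits = exp_logits / sum_exp_logits[:, None]       # local softmax slice
    rows = np.arange(grad_logits.shape[0])
    grad_logits[rows, masked_target] -= 1.0 - target_mask.astype(np.float64)
    grad_logits *= grad[:, None]
    return grad_logits @ weight, grad_logits.T @ inp

rng = np.random.default_rng(0)
BT, H, V = 4, 3, 6
x = rng.standard_normal((BT, H))
w = rng.standard_normal((V, H))
target = np.array([0, 5, 2, 3])
up = rng.standard_normal(BT)
logits = x @ w.T
gmax = logits.max(axis=1)                            # stands in for the reduced max
gsum = np.exp(logits - gmax[:, None]).sum(axis=1)    # stands in for the reduced sum
grad_input = np.zeros((BT, H))
for start, end in [(0, 3), (3, 6)]:                  # two hypothetical shards
    mask = (target < start) | (target >= end)
    local = np.where(mask, 0, target - start)
    gi, gw = grad_recompute(up, x, w[start:end], mask, local, gmax, gsum)
    grad_input += gi                                 # sum over shards
print(grad_input.shape)  # (4, 3)
```

In this toy setup, summing the per-shard grad_input values reproduces the single-device gradient, while each shard's grad_weight corresponds to its own slice of weight.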

Function prototype

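Each operator exposes a two-phase interface: first call aclnnFusedLinearCrossEntropyLossGradGetWorkspaceSize to obtain the workspace size required for the computation and an executor that encapsulates the operator's computation flow, then call aclnnFusedLinearCrossEntropyLossGrad to perform the computation.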


aclnnFusedLinearCrossEntropyLossGradGetWorkspaceSize

Parameters
Return value

aclnnStatus: the returned status code. For details, see the aclnn return codes reference.

The first-phase interface validates the input arguments and reports an error when validation fails.

aclnnFusedLinearCrossEntropyLossGrad

Parameters
Return value

aclnnStatus: the returned status code. For details, see the aclnn return codes reference.

Constraints

Determinism:
- aclnnFusedLinearCrossEntropyLossGrad uses a deterministic implementation by default.

Usage example

The sample code is for reference only. For the build and run procedure, refer to the guide on compiling and running samples.
