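aclnnMoeInitRoutingV3 - Transformer Interfaces - Operator Interfaces (aclnn) - Operator Library Interfaces - API - CANN Community Edition 9.0.0-beta.2 Development Documentation - Ascend Community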

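Function: performs the MoE routing computation, applying routing to the result of the preceding computation, and supports non-quantized, static quantization, and dynamic quantization modes. This interface changes some functionality relative to the V2 interface; choose the interface that fits your scenario.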
Formulas:

1. Sort the input expertIdx to obtain the sorted result sortedExpertIdx and the corresponding row indices sortedRowIdx:
$sortedExpertIdx, sortedRowIdx = keyValueSort(expertIdx, rowIdx)$
2. Map positions using sortedRowIdx to obtain expandedRowIdxOut:
- When rowIdxType is 1, the scatter index is output:
  $expandedRowIdxOut[i] = sortedRowIdx[i]$
- When rowIdxType is 0, the gather index is output:
  $expandedRowIdxOut[sortedRowIdx[i]] = i$
3. Compute a histogram over the experts in sortedExpertIdx to obtain expertTokensCountOrCumsumOutOptional:
$expertTokensCountOrCumsumOutOptional[i] = Histogram(sortedExpertIdx)$
4. If quantMode is not -1, compute the quantized result:
- Static quantization:
  $quantResult = round((x \times scaleOptional) + offsetOptional)$
- Dynamic quantization:
  - If scale is not provided:
    $dynamicQuantScaleOutOptional = row\_max(abs(x)) / 127$
    $quantResult = round(x / dynamicQuantScaleOutOptional)$
  - If scale is provided:
    $dynamicQuantScaleOutOptional = row\_max(abs(x \times scaleOptional)) / 127$
    $quantResult = round(x / dynamicQuantScaleOutOptional)$
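5. If the active expert range covers all experts, tokens are moved by the scatter index; otherwise they are moved by the gather index. When dropPadMode is 1, the number of tokens each expert processes is aligned to expertCapacity: tokens beyond expertCapacity are dropped, and shortfalls are padded with zeros. This yields expandedXOut:
- Non-quantized case
  - Moving by the scatter index:
    $expandedXOut[i] = x[scatterRowIdx[i] // K]$
  - Moving by the gather index:
    $expandedXOut[gatherRowIdx[i]] = x[i // K]$
- Quantized case
  - Moving by the scatter index:
    $expandedXOut[i] = quantResult[scatterRowIdx[i] // K]$
  - Moving by the gather index:
    $expandedXOut[gatherRowIdx[i]] = quantResult[i // K]$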
6. The number of valid elements in expandedRowIdxOut, availableIdxNum, is the number of elements of expertIdx that fall within activeExpertRangeOptional:
$availableIdxNum = |\{x \in expertIdx \mid expert\_start \le x < expert\_end\}|$
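
Steps 1 to 3 and step 6 can be sketched in plain Python to make the index bookkeeping concrete. This is a minimal illustration, not the operator's implementation. The function names are hypothetical. It assumes that expertIdx is a [NUM_ROWS, K] nested list flattened in row-major order, that rowIdx is 0..NUM_ROWS*K-1, and that ties keep their original order (a stable sort). Only the count form of the histogram is shown.

```python
from collections import Counter

def sort_and_index(expert_idx, row_idx_type):
    """Steps 1-2: key-value sort, then build expandedRowIdxOut."""
    # Assumption: row-major flatten, with rowIdx = 0 .. NUM_ROWS*K - 1.
    flat = [e for row in expert_idx for e in row]
    # Python's sort is stable; tie order in the real operator is an assumption.
    sorted_row_idx = sorted(range(len(flat)), key=lambda r: flat[r])
    sorted_expert_idx = [flat[r] for r in sorted_row_idx]
    if row_idx_type == 1:
        # Scatter index: out[i] = sortedRowIdx[i]
        expanded_row_idx = list(sorted_row_idx)
    else:
        # Gather index: out[sortedRowIdx[i]] = i
        expanded_row_idx = [0] * len(flat)
        for i, r in enumerate(sorted_row_idx):
            expanded_row_idx[r] = i
    return sorted_expert_idx, sorted_row_idx, expanded_row_idx

def expert_histogram(sorted_expert_idx, num_experts):
    """Step 3: number of routed entries per expert (count form only)."""
    counts = Counter(sorted_expert_idx)
    return [counts[e] for e in range(num_experts)]

def available_idx_num(expert_idx, expert_start, expert_end):
    """Step 6: entries of expertIdx inside [expert_start, expert_end)."""
    return sum(1 for row in expert_idx for e in row
               if expert_start <= e < expert_end)

# 3 tokens, K = 2 experts per token, 4 experts in total.
expert_idx = [[2, 0], [1, 2], [0, 2]]
sorted_e, sorted_r, scatter = sort_and_index(expert_idx, row_idx_type=1)
_, _, gather = sort_and_index(expert_idx, row_idx_type=0)

print(sorted_e)                               # [0, 0, 1, 2, 2, 2]
print(scatter)                                # [1, 4, 2, 0, 3, 5]
print(gather)                                 # [3, 0, 2, 4, 1, 5]
print(expert_histogram(sorted_e, 4))          # [2, 1, 3, 0]
print(available_idx_num(expert_idx, 1, 3))    # 4
```

The scatter and gather lists are inverse permutations of each other: `gather[scatter[i]] == i` for every position `i`.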

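The quantization formulas in step 4 and the token movement in step 5 can be illustrated with a small pure-Python sketch. It follows the formulas exactly as written and makes several assumptions. The helper names are hypothetical. The static scale and offset are treated as scalars, and the dynamic scale as a per-column list. Python's `round` (ties to even) stands in for the operator's unspecified rounding mode. The dropPadMode alignment is not modeled, and the index lists are hard-coded for illustration.

```python
def static_quant(row, scale, offset):
    """Static quantization: round(x * scaleOptional + offsetOptional)."""
    return [round(v * scale + offset) for v in row]

def dynamic_quant(row, scale=None):
    """Dynamic quantization; returns (quantResult, dynamicQuantScaleOutOptional).

    Assumes the row contains at least one nonzero value.
    """
    scaled = row if scale is None else [v * s for v, s in zip(row, scale)]
    dyn_scale = max(abs(v) for v in scaled) / 127   # row_max(abs(.)) / 127
    # As written in the formulas, the unscaled x is divided by the dynamic scale.
    return [round(v / dyn_scale) for v in row], dyn_scale

def move_by_scatter(src, scatter_row_idx, k):
    """expandedXOut[i] = src[scatterRowIdx[i] // K]"""
    return [src[r // k] for r in scatter_row_idx]

def move_by_gather(src, gather_row_idx, k):
    """expandedXOut[gatherRowIdx[i]] = src[i // K]"""
    out = [None] * len(gather_row_idx)
    for i, dst in enumerate(gather_row_idx):
        out[dst] = src[i // k]
    return out

row = [1.0, -3.0]
static_result = static_quant(row, scale=2.0, offset=1.0)
dyn_result, dyn_scale = dynamic_quant(row)
dyn_result_scaled, dyn_scale_scaled = dynamic_quant(row, scale=[5.0, 1.0])

# Index lists for 3 tokens with K = 2 (illustrative values).
tokens = ["t0", "t1", "t2"]
scatter_idx = [1, 4, 2, 0, 3, 5]
gather_idx = [3, 0, 2, 4, 1, 5]
by_scatter = move_by_scatter(tokens, scatter_idx, k=2)
by_gather = move_by_gather(tokens, gather_idx, k=2)
```

Because the two index lists here are inverse permutations, both movement functions produce the same expanded token order. In the quantized case, `src` would hold the quantized rows instead of the raw tokens.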

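Each operator uses a two-phase interface: first call aclnnMoeInitRoutingV3GetWorkspaceSize, which takes the input parameters and returns the required workspace size together with an executor that encapsulates the operator's computation flow, then call aclnnMoeInitRoutingV3 to run the computation.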

Parameters:

Return value:

aclnnStatus: the returned status code. For details, see the aclnn return code documentation.

The first-phase interface validates the input parameters and reports an error in the following scenarios:
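Differences in support across products:
- quantMode support:
  - Atlas A2 training series products/Atlas A2 inference series products and Atlas A3 training series products/Atlas A3 inference series products: -1, 0, and 1 are supported.
  - Atlas 350 accelerator card: -1, 1, 2, and 3 are supported.
- On the Atlas 350 accelerator card, the following parameters are restricted:
  - activeNum must equal NUM_ROWS*K.
  - expertCapacity is validated but otherwise unused (that is, the number of tokens each expert can process is not limited).
  - dropPadMode must be 0.
  - expertTokensNumType must be 1 or 2.
  - expertTokensNumFlag must be true.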
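
The product-specific restrictions above can be expressed as a small pre-flight check. The sketch below is a hypothetical helper, not part of the aclnn API. The product keys are shorthand labels chosen for this example, and the function encodes only the rules listed in this section.

```python
# Supported quantMode values per product, as listed above.
# The dictionary keys are shorthand labels chosen for this example.
SUPPORTED_QUANT_MODES = {
    "Atlas A2": {-1, 0, 1},
    "Atlas A3": {-1, 0, 1},
    "Atlas 350": {-1, 1, 2, 3},
}

def check_params(product, quant_mode, active_num, num_rows, k,
                 drop_pad_mode, expert_tokens_num_type, expert_tokens_num_flag):
    """Return a list of violated rules (empty if none are violated)."""
    errors = []
    if quant_mode not in SUPPORTED_QUANT_MODES[product]:
        errors.append(f"quantMode={quant_mode} is not supported on {product}")
    # Only the Atlas 350 restrictions are listed in this section.
    if product == "Atlas 350":
        if active_num != num_rows * k:
            errors.append("activeNum must equal NUM_ROWS * K")
        if drop_pad_mode != 0:
            errors.append("dropPadMode must be 0")
        if expert_tokens_num_type not in (1, 2):
            errors.append("expertTokensNumType must be 1 or 2")
        if not expert_tokens_num_flag:
            errors.append("expertTokensNumFlag must be true")
    return errors

# quantMode 0 is accepted on Atlas A2 but rejected on Atlas 350.
print(check_params("Atlas 350", 0, 8, 4, 2, 0, 1, True))
print(check_params("Atlas 350", 2, 8, 4, 2, 0, 1, True))   # []
```

Returning every violation at once, rather than stopping at the first, makes it easier to correct a configuration in a single pass.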

Parameters:

Return value:

Returns an aclnnStatus status code. For details, see the aclnn return code documentation.

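Deterministic computation:
- aclnnMoeInitRoutingV3 is deterministic by default.

On the following products, the operator supports three performance templates. Each template has its own additional admission conditions; if they are not met, the general template is used:
- Products that support performance templates:
  - Atlas A2 training series products/Atlas A2 inference series products
  - Atlas A3 training series products/Atlas A3 inference series products
- Admission conditions for the performance templates: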

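The sample code is for reference only. For the build and run procedure, see the documentation on compiling and running samples.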
