aclnnTfScatterAdd

Product Support

Product	Supported
Atlas A3 training series products/Atlas A3 inference series products	√
Atlas A2 training series products/Atlas 800I A2 inference products/A200I A2 Box heterogeneous components	√
Atlas 200I/500 A2 inference products	×
Atlas inference series products	×
Atlas training series products	×

Function Description

Operator function: the counterpart of tf.scatter_add. It adds the values in tensor updates to the slices of tensor varRef selected by the index tensor indices. If more than one updates value targets the same slice of varRef, all of those values are accumulated on that slice. The rule is: $varRef[indices[i,...,j],...] = varRef[indices[i,...,j],...] + updates[i,...,j,...]$
Examples:

When indices is one-dimensional: input tensor $varRef = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \end{bmatrix}$, index tensor $indices = \begin{bmatrix} 0 & 2 & 4 \end{bmatrix}$, source tensor $updates = \begin{bmatrix} 10 & 20 & 30 \end{bmatrix}$, output tensor $varRef = \begin{bmatrix} 11 & 2 & 23 & 4 & 35 \end{bmatrix}$

When indices has two or more dimensions: input tensor $varRef = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$, index tensor $indices = \begin{bmatrix} 0 & 1 \\ 2 & 0 \end{bmatrix}$, source tensor $updates = \begin{bmatrix} \begin{bmatrix} 10 & 20 \\ 30 & 40 \end{bmatrix}, \begin{bmatrix} 50 & 60 \\ 70 & 80 \end{bmatrix} \end{bmatrix}$, output tensor $varRef = \begin{bmatrix} 81 & 102 \\ 33 & 44 \\ 55 & 66 \end{bmatrix}$. Index 0 appears twice in indices, so row 0 of varRef accumulates both $updates[0][0] = [10, 20]$ and $updates[1][1] = [70, 80]$.
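The accumulation rule can be checked on the host with a small reference routine. The following is a minimal sketch, not part of the aclnn API: the function name `ScatterAddReference` and the flattened-buffer layout are assumptions made for illustration. It treats varRef as `rows` slices of `sliceSize` elements each (where `sliceSize` is the product of varRef.shape[1:]), and treats indices and updates as flattened row-major buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference for the scatter-add rule (illustration only).
// var:       flattened varRef, viewed as [rows, sliceSize]
// sliceSize: product of varRef.shape[1:] (1 when varRef is one-dimensional)
// indices:   flattened indices tensor, any rank
// updates:   flattened updates tensor, shape indices.shape + varRef.shape[1:]
void ScatterAddReference(std::vector<float>& var, std::size_t sliceSize,
                         const std::vector<std::int64_t>& indices,
                         const std::vector<float>& updates) {
    assert(sliceSize > 0 && var.size() % sliceSize == 0);
    assert(updates.size() == indices.size() * sliceSize);
    const std::size_t rows = var.size() / sliceSize;
    for (std::size_t n = 0; n < indices.size(); ++n) {
        // Out-of-range indices are not supported, so they are rejected up front here.
        assert(indices[n] >= 0 && static_cast<std::size_t>(indices[n]) < rows);
        const std::size_t row = static_cast<std::size_t>(indices[n]);
        // Add the n-th update slice onto the selected slice of var.
        for (std::size_t k = 0; k < sliceSize; ++k) {
            var[row * sliceSize + k] += updates[n * sliceSize + k];
        }
    }
}
```

Calling it with `sliceSize = 1` on the one-dimensional example, or with `sliceSize = 2`, indices `{0, 1, 2, 0}` and updates `{10, 20, 30, 40, 50, 60, 70, 80}` on the two-dimensional example, reproduces the outputs listed above. Repeated indices simply hit the same slice more than once, which is the accumulation behavior the rule describes.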

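Function Prototype

Each operator is exposed as a two-phase interface. First call aclnnTfScatterAddGetWorkspaceSize to obtain the workspace size required for the computation and an executor that encapsulates the operator's computation flow, then call aclnnTfScatterAdd to run the computation.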

aclnnStatus aclnnTfScatterAddGetWorkspaceSize(aclTensor *varRef, const aclTensor *indices, const aclTensor *updates, uint64_t *workspaceSize, aclOpExecutor **executor)
aclnnStatus aclnnTfScatterAdd(void *workspace, uint64_t workspaceSize, aclOpExecutor *executor, aclrtStream stream)

aclnnTfScatterAddGetWorkspaceSize

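Parameters
- varRef(aclTensor *, computation input and output): the input varRef in the formula, an aclTensor on the Device side. Non-contiguous tensors are supported, the data format supports ND, and 1 to 8 dimensions are supported. The data type must match that of updates. Supported data types: FLOAT32, FLOAT16, BFLOAT16, INT32, INT8, UINT8.
- indices(aclTensor *, computation input): the input indices in the formula, an aclTensor on the Device side. Non-contiguous tensors are supported, the data format supports ND, and 1 to 8 dimensions are supported. Index values in indices must not be out of bounds. Supported data types: INT32, INT64.
- updates(aclTensor *, computation input): the input updates in the formula, an aclTensor on the Device side. Non-contiguous tensors are supported, the data format supports ND, and 1 to 8 dimensions are supported. The data type must match that of varRef. Supported data types: FLOAT32, FLOAT16, BFLOAT16, INT32, INT8, UINT8.
- workspaceSize(uint64_t *, output): returns the size of the workspace that must be allocated on the Device side.
- executor(aclOpExecutor **, output): returns the op executor, which contains the operator's computation flow.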
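A caller may want to validate tensor shapes and index values on the host before invoking the interface. The sketch below is one way to do that; it combines the 1 to 8 dimension limit from the parameter list with the shape and index-range rules stated in the Constraints section. All names here (`Shape`, `RankSupported`, `ExpectedUpdatesShape`, `ShapesValid`, `IndicesInRange`) are hypothetical helpers, not aclnn functions.

```cpp
#include <cstdint>
#include <vector>

using Shape = std::vector<std::int64_t>;

// The parameter list allows 1 to 8 dimensions for each tensor.
bool RankSupported(const Shape& shape) {
    return shape.size() >= 1 && shape.size() <= 8;
}

// Expected updates shape: indices.shape followed by varRef.shape[1:].
Shape ExpectedUpdatesShape(const Shape& varShape, const Shape& indicesShape) {
    Shape expected(indicesShape);
    if (!varShape.empty()) {
        expected.insert(expected.end(), varShape.begin() + 1, varShape.end());
    }
    return expected;
}

// True when all three shapes satisfy the rank limit and the updates shape rule.
bool ShapesValid(const Shape& varShape, const Shape& indicesShape,
                 const Shape& updatesShape) {
    if (!RankSupported(varShape) || !RankSupported(indicesShape) ||
        !RankSupported(updatesShape)) {
        return false;
    }
    return updatesShape == ExpectedUpdatesShape(varShape, indicesShape);
}

// True when every index value lies in [0, dim0 - 1], where dim0 = varRef.shape[0].
bool IndicesInRange(const std::vector<std::int64_t>& indexValues, std::int64_t dim0) {
    for (std::int64_t value : indexValues) {
        if (value < 0 || value >= dim0) {
            return false;
        }
    }
    return true;
}
```

For varRef.shape = [2, 3, 4] and indices.shape = [4, 5, 6], `ExpectedUpdatesShape` yields [4, 5, 6, 3, 4], and `IndicesInRange` accepts only the values 0 and 1. Because updates itself is limited to 8 dimensions, the rank of indices plus the rank of varRef minus one must not exceed 8 under this reading.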
Return Value

Returns an aclnnStatus status code. For details, see aclnn return codes.

aclnnTfScatterAdd

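Parameters
- workspace(void *, input): address of the workspace memory allocated on the Device side.
- workspaceSize(uint64_t, input): size of the workspace allocated on the Device side, obtained from the first-phase interface aclnnTfScatterAddGetWorkspaceSize.
- executor(aclOpExecutor *, input): the op executor, which contains the operator's computation flow.
- stream(aclrtStream, input): the Stream on which the task is executed.
Return Value

Returns an aclnnStatus status code. For details, see aclnn return codes.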

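Constraints

Values in indices must lie in the range [0, varRef.shape[0] - 1]. For example, if varRef.shape = [2, 3, 4], values in indices must lie in [0, 1]. Out-of-bounds indices are not supported; if an index is out of bounds, varRef is not updated.
updates.shape = indices.shape + varRef.shape[1:], where + denotes shape concatenation. For example, if varRef.shape = [2, 3, 4] and indices.shape = [4, 5, 6], then updates.shape = [4, 5, 6, 3, 4].
Deterministic computation applies to this operator and is supported. By default, values are accumulated in a nondeterministic order; with deterministic computation enabled, they are accumulated sequentially.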
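Accumulation order matters because floating-point addition is not associative, so adding the same updates to one slice in a different order can produce a different result. The sketch below illustrates the effect on the host in single precision; it is not the device implementation, and `SumInOrder` is a hypothetical helper introduced only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Adds values[order[0]], values[order[1]], ... into a float accumulator,
// in exactly the order given.
float SumInOrder(const std::vector<float>& values,
                 const std::vector<std::size_t>& order) {
    float acc = 0.0f;
    for (std::size_t i : order) {
        acc += values[i];
    }
    return acc;
}
```

With values `{1e8f, -1e8f, 1.0f}` and IEEE 754 single-precision arithmetic, the order `{0, 1, 2}` gives 1.0f, while the order `{0, 2, 1}` gives 0.0f, because 1e8f + 1.0f rounds back to 1e8f. Integer data types do not show this rounding effect.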

Example

The sample code is for reference only. For the build and run procedure, see the guide to compiling and running samples.