aclnnRopeWithSinCosCacheV2 (CANN Community Edition 9.0.0-beta.2 development documentation, Ascend Community)


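Function: to improve performance in inference networks, sin and cos are passed in through a cache and the rotary position embedding (RoPE) is computed from them. Compared with the earlier interface, this interface adds the cacheMode parameter, which selects how cos and sin are concatenated:
- cacheMode=0: identical to the earlier implementation; cos and sin are concatenated section by section (segmented).
- cacheMode=1: cos and sin are concatenated in interleaved fashion.
Formulas: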

1. mrope mode: positions has shape [m, numTokens], where m is the number of elements in mropeSection (3 or 4 are supported):
$cosSin[i] = cosSinCache[positions[i]]$
$cos, sin = cosSin.chunk(2, dim=-1)$
(1) cacheMode = 0:
- mropeSection has 3 elements:
  $cos0 = cos[0, :, :mropeSection[0]]$
  $cos1 = cos[1, :, mropeSection[0]:(mropeSection[0] + mropeSection[1])]$
  $cos2 = cos[2, :, (mropeSection[0] + mropeSection[1]):(mropeSection[0] + mropeSection[1] + mropeSection[2])]$
  $cos = torch.cat((cos0, cos1, cos2), dim=-1)$
  $sin0 = sin[0, :, :mropeSection[0]]$
  $sin1 = sin[1, :, mropeSection[0]:(mropeSection[0] + mropeSection[1])]$
  $sin2 = sin[2, :, (mropeSection[0] + mropeSection[1]):(mropeSection[0] + mropeSection[1] + mropeSection[2])]$
  $sin = torch.cat((sin0, sin1, sin2), dim=-1)$
  $queryRot = query[..., :rotaryDim]$
  $queryPass = query[..., rotaryDim:]$
- mropeSection has 4 elements:
  $cos = torch.cat([m[i]\ for\ i, m\ in\ enumerate(cos.split(mropeSection, dim=-1))], dim=-1)$
  $sin = torch.cat([m[i]\ for\ i, m\ in\ enumerate(sin.split(mropeSection, dim=-1))], dim=-1)$
  $queryRot = query[..., :rotaryDim]$
  $queryPass = query[..., rotaryDim:]$
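The segmented assembly for cacheMode = 0 can be sketched in plain Python as follows. This is a minimal illustration rather than the operator's implementation: the function name `assemble_segmented` and the nested-list layout `[m][numTokens][rotaryDim / 2]` are assumptions made for the example, and the toy section widths are not among the values the operator supports. The same loop covers both the 3-element and the 4-element case, because in both of them section i takes its columns from plane i.

```python
def assemble_segmented(cos, mrope_section):
    """Build per-token cos (or sin) rows for cacheMode = 0.

    cos: nested list shaped [m][num_tokens][half_dim]; plane i was looked up
         with positions[i] (hypothetical layout chosen for this sketch).
    mrope_section: list of m section widths whose sum is half_dim.
    """
    if len(cos) != len(mrope_section):
        raise ValueError("need one plane per mropeSection element")
    num_tokens = len(cos[0])
    out = []
    for t in range(num_tokens):
        row = []
        start = 0
        for i, width in enumerate(mrope_section):
            # Section i copies columns [start, start + width) from plane i.
            row.extend(cos[i][t][start:start + width])
            start += width
        out.append(row)
    return out


# Toy data: every value in plane p equals p, so the output shows
# which plane each column came from.
planes = [[[float(p)] * 6 for _ in range(2)] for p in range(3)]
print(assemble_segmented(planes, [1, 2, 3])[0])
# [0.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```

The sin rows are assembled the same way by passing the sin planes instead of the cos planes.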
(2) cacheMode = 1:
$cosTmp = cos$
$cos[..., 1:mropeSection[1] * 3:3] = cosTmp[1, ..., 1:mropeSection[1] * 3:3]$
$cos[..., 2:mropeSection[1] * 3:3] = cosTmp[2, ..., 2:mropeSection[1] * 3:3]$
$sinTmp = sin$
$sin[..., 1:mropeSection[1] * 3:3] = sinTmp[1, ..., 1:mropeSection[1] * 3:3]$
$sin[..., 2:mropeSection[1] * 3:3] = sinTmp[2, ..., 2:mropeSection[1] * 3:3]$
$queryRot = query[..., :rotaryDim]$
$queryPass = query[..., rotaryDim:]$
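A plain-Python sketch of the interleaved assembly for cacheMode = 1 is given below. It rests on one assumption that the formulas do not state explicitly: the result row starts as a copy of plane 0 (the first row of positions), and planes 1 and 2 then overwrite every third column using the slices `1:mropeSection[1] * 3:3` and `2:mropeSection[1] * 3:3`. The function name `assemble_interleaved`, the nested-list layout, and the toy section widths are hypothetical.

```python
def assemble_interleaved(cos, mrope_section):
    """Build per-token cos (or sin) rows for cacheMode = 1.

    cos: nested list shaped [3][num_tokens][half_dim].
    Assumption of this sketch: the row starts from plane 0.
    """
    num_tokens = len(cos[0])
    half_dim = len(cos[0][0])
    # Slice stop from the formula, clamped the way Python slicing clamps it.
    stop = min(mrope_section[1] * 3, half_dim)
    out = []
    for t in range(num_tokens):
        row = list(cos[0][t])
        for plane in (1, 2):
            # Columns plane, plane + 3, plane + 6, ... come from that plane.
            for col in range(plane, stop, 3):
                row[col] = cos[plane][t][col]
        out.append(row)
    return out


# Toy data: every value in plane p equals p.
planes = [[[float(p)] * 8 for _ in range(1)] for p in range(3)]
print(assemble_interleaved(planes, [4, 2, 2])[0])
# [0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 0.0]
```

In the toy output the three planes alternate column by column up to the slice stop, after which the remaining columns keep the plane 0 values.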
With cos and sin assembled as above, the rotation is applied in one of two computation modes:
(1) rotate_half (GPT-NeoX style) computation mode:
$x1, x2 = torch.chunk(queryRot, 2, dim=-1)$
$o1[i] = x1[i] * cos[i] - x2[i] * sin[i]$
$o2[i] = x2[i] * cos[i] + x1[i] * sin[i]$
$queryRot = torch.cat((o1, o2), dim=-1)$
$query = torch.cat((queryRot, queryPass), dim=-1)$
(2) rotate_interleaved (GPT-J style) computation mode:
$x1 = queryRot[..., ::2]$
$x2 = queryRot[..., 1::2]$
$o1[i] = x1[i] * cos[i] - x2[i] * sin[i]$
$o2[i] = x2[i] * cos[i] + x1[i] * sin[i]$
$queryRot = torch.stack((o1, o2), dim=-1).flatten(-2)$
$query = torch.cat((queryRot, queryPass), dim=-1)$
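The two computation modes differ only in which elements of queryRot form a rotation pair. The sketch below shows this on a single vector using plain Python lists; the function names are hypothetical and the code illustrates the formulas, not the operator kernel.

```python
def rotate_half(x, cos, sin):
    """GPT-NeoX style: element i is paired with element i + len(x) // 2."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    o1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    o2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return o1 + o2  # concatenate the two halves


def rotate_interleaved(x, cos, sin):
    """GPT-J style: element 2i is paired with element 2i + 1."""
    x1, x2 = x[0::2], x[1::2]
    out = []
    for a, b, c, s in zip(x1, x2, cos, sin):
        out.append(a * c - b * s)  # o1[i] lands at index 2i
        out.append(b * c + a * s)  # o2[i] lands at index 2i + 1
    return out


# Rotate every pair by 90 degrees: cos = 0, sin = 1.
x = [1.0, 2.0, 3.0, 4.0]
cos, sin = [0.0, 0.0], [1.0, 1.0]
print(rotate_half(x, cos, sin))         # [-3.0, -4.0, 1.0, 2.0]
print(rotate_interleaved(x, cos, sin))  # [-2.0, 1.0, -4.0, 3.0]
```

With rotate_half the pairs are (1, 3) and (2, 4); with rotate_interleaved they are (1, 2) and (3, 4), which is why the same cos and sin values give different outputs.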
2. rope mode: positions has shape [numTokens]:
$cosSin[i] = cosSinCache[positions[i]]$
$cos, sin = cosSin.chunk(2, dim=-1)$
$queryRot = query[..., :rotaryDim]$
$queryPass = query[..., rotaryDim:]$
(1) rotate_half (GPT-NeoX style) computation mode:
$x1, x2 = torch.chunk(queryRot, 2, dim=-1)$
$o1[i] = x1[i] * cos[i] - x2[i] * sin[i]$
$o2[i] = x2[i] * cos[i] + x1[i] * sin[i]$
$queryRot = torch.cat((o1, o2), dim=-1)$
$query = torch.cat((queryRot, queryPass), dim=-1)$
(2) rotate_interleaved (GPT-J style) computation mode:
$x1 = queryRot[..., ::2]$
$x2 = queryRot[..., 1::2]$
$o1[i] = x1[i] * cos[i] - x2[i] * sin[i]$
$o2[i] = x2[i] * cos[i] + x1[i] * sin[i]$
$queryRot = torch.stack((o1, o2), dim=-1).flatten(-2)$
$query = torch.cat((queryRot, queryPass), dim=-1)$
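The rope-mode flow (cache lookup, split into cos and sin, split of query into a rotated part and a pass-through part) can be sketched end to end as below. Several things here are assumptions made for the illustration: the helper `build_cos_sin_cache` and its frequency base of 10000 are hypothetical and only provide data in the `[cos | sin]` row layout implied by `chunk(2, dim=-1)`; the query is a list of per-token rows for a single head; and only the rotate_half mode is shown.

```python
import math


def build_cos_sin_cache(max_seq_len, rotary_dim, base=10000.0):
    """Hypothetical cache builder: row p holds rotary_dim/2 cos values
    followed by rotary_dim/2 sin values for position p."""
    half = rotary_dim // 2
    inv_freq = [base ** (-2.0 * j / rotary_dim) for j in range(half)]
    cache = []
    for p in range(max_seq_len):
        angles = [p * f for f in inv_freq]
        cache.append([math.cos(a) for a in angles] + [math.sin(a) for a in angles])
    return cache


def rope_neox(query, positions, cache, rotary_dim):
    """Apply rotate_half RoPE to each token row (single head in this sketch)."""
    half = rotary_dim // 2
    out = []
    for row, pos in zip(query, positions):
        cos_sin = cache[pos]                       # cosSin[i] = cosSinCache[positions[i]]
        cos, sin = cos_sin[:half], cos_sin[half:]  # chunk(2, dim=-1)
        rot, passthrough = row[:rotary_dim], row[rotary_dim:]
        x1, x2 = rot[:half], rot[half:]
        o1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
        o2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
        out.append(o1 + o2 + passthrough)          # rotated part, then untouched tail
    return out


cache = build_cos_sin_cache(max_seq_len=8, rotary_dim=4)
query = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]  # head size 6, rotaryDim 4
rotated = rope_neox(query, [3], cache, rotary_dim=4)
print(rotated[0][4:])  # [5.0, 6.0]: elements beyond rotaryDim pass through unchanged
```

Because each pair is rotated by an angle, the norm of the first rotaryDim elements is preserved, and position 0 (cos = 1, sin = 0) leaves the row unchanged.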


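Each operator is invoked through a two-phase interface: first call aclnnRopeWithSinCosCacheV2GetWorkspaceSize to obtain the workspace size required for the computation and an executor that encapsulates the operator's computation flow, then call aclnnRopeWithSinCosCacheV2 to run the computation.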

aclnnRopeWithSinCosCacheV2GetWorkspaceSize

Return value:

aclnnStatus: the returned status code.

The first-phase interface validates the input parameters and reports an error when validation fails.

aclnnRopeWithSinCosCacheV2

Return value:

aclnnStatus: the returned status code.


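Deterministic computation:
- aclnnRopeWithSinCosCacheV2 uses a deterministic implementation by default.
queryIn, keyIn, and cosSinCache support only 2-dimensional shapes.
queryIn, keyIn, and cosSinCache must have the same data type.
headSize: must be a multiple of 32 when the data type is BFLOAT16 or FLOAT16, and a multiple of 16 when the data type is FLOAT32.
rotaryDim: must always be less than or equal to headSize; must be a multiple of 32 when the data type is BFLOAT16 or FLOAT16, and a multiple of 16 when the data type is FLOAT32; in mrope mode, the sum of all elements of mropeSection must equal half of rotaryDim.
Values in the input tensor positions must be less than maxSeqLen, the size of dimension 0 of cosSinCache.
In mrope mode, mropeSection currently supports only the values [16, 24, 24], [24, 20, 20], [8, 12, 12], and [16, 16, 16, 16].
In mrope mode, cacheMode supports only 0 and 1; when mropeSection is [16, 16, 16, 16], only 0 is supported.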
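The numeric constraints above can be checked on the host before the operator is called. The following is a minimal sketch, not part of the aclnn API: the function `check_constraints`, its arguments, and the use of strings for data type names are all hypothetical. It does not cover the 2-dimensional shape rule or the rule that the three inputs share one data type, and it takes positions as a flat iterable of integers.

```python
SUPPORTED_MROPE_SECTIONS = ([16, 24, 24], [24, 20, 20], [8, 12, 12], [16, 16, 16, 16])


def check_constraints(dtype, head_size, rotary_dim, positions, max_seq_len,
                      mrope_section=None, cache_mode=0):
    """Return a list of violated constraints (empty when every check passes)."""
    # Required alignment depends on the element type.
    if dtype in ("BFLOAT16", "FLOAT16"):
        multiple = 32
    elif dtype == "FLOAT32":
        multiple = 16
    else:
        return ["data type not covered by this sketch: " + str(dtype)]

    errors = []
    if head_size % multiple != 0:
        errors.append("headSize must be a multiple of %d" % multiple)
    if rotary_dim % multiple != 0:
        errors.append("rotaryDim must be a multiple of %d" % multiple)
    if rotary_dim > head_size:
        errors.append("rotaryDim must not exceed headSize")
    if any(p >= max_seq_len for p in positions):
        errors.append("positions values must be less than maxSeqLen")

    if mrope_section is not None:  # mrope mode only
        section = list(mrope_section)
        if section not in SUPPORTED_MROPE_SECTIONS:
            errors.append("unsupported mropeSection")
        if 2 * sum(section) != rotary_dim:
            errors.append("sum(mropeSection) must equal rotaryDim / 2")
        if cache_mode not in (0, 1):
            errors.append("cacheMode must be 0 or 1")
        elif section == [16, 16, 16, 16] and cache_mode != 0:
            errors.append("mropeSection [16, 16, 16, 16] requires cacheMode 0")
    return errors


print(check_constraints("FLOAT16", 128, 128, [0, 5, 1023], 1024, [16, 24, 24], 1))
# []
print(check_constraints("FLOAT32", 128, 128, [0], 1024, [16, 16, 16, 16], 1))
# ['mropeSection [16, 16, 16, 16] requires cacheMode 0']
```

Returning every violation at once, rather than stopping at the first, makes it easier to correct a configuration in one pass.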
