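  • Function: To improve performance, inference networks pass the sin and cos inputs in through a cache and then perform the rotary position embedding (RoPE) computation. Compared with the earlier interface, this interface adds a cacheMode parameter that specifies how cos and sin are concatenated.

    • cacheMode=0: same as the earlier implementation; cos and sin are concatenated in segments.
    • cacheMode=1: cos and sin are concatenated in interleaved fashion.
  • Formulas:

    1. mrope mode: positions has shape [m, numTokens], where m is the number of elements in mropeSection (3 or 4 is supported):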

    cosSin[i] = cosSinCache[positions[i]]
    cos, sin = cosSin.chunk(2, dim=-1)
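The lookup above can be sketched in plain Python lists. This is a minimal illustration, not the operator's implementation: `build_cos_sin_cache` is a hypothetical helper, and its frequency formula (base 10000) is the conventional RoPE choice, assumed here only to produce a cache. The one property taken from the formulas is the row layout: the first half of each cache row holds cos values and the second half holds sin values, so that `chunk(2, dim=-1)` separates them.

```python
import math

def build_cos_sin_cache(max_seq_len, rotary_dim, base=10000.0):
    # Hypothetical helper (assumption): row p holds rotary_dim/2 cos values
    # followed by rotary_dim/2 sin values for position p.
    half = rotary_dim // 2
    inv_freq = [base ** (-2.0 * j / rotary_dim) for j in range(half)]
    return [
        [math.cos(p * f) for f in inv_freq] + [math.sin(p * f) for f in inv_freq]
        for p in range(max_seq_len)
    ]

def lookup_cos_sin(cache, positions):
    # positions: m rows of numTokens indices (mrope mode).
    # Returns cos and sin, each shaped [m][numTokens][rotary_dim / 2].
    cos, sin = [], []
    for row in positions:
        cos_row, sin_row = [], []
        for p in row:
            entry = cache[p]                 # cosSin[i] = cosSinCache[positions[i]]
            half = len(entry) // 2
            cos_row.append(entry[:half])     # first chunk of the last dimension
            sin_row.append(entry[half:])     # second chunk of the last dimension
        cos.append(cos_row)
        sin.append(sin_row)
    return cos, sin

cache = build_cos_sin_cache(max_seq_len=16, rotary_dim=8)
positions = [[0, 1], [0, 5], [0, 9]]  # m = 3 position rows, numTokens = 2
cos, sin = lookup_cos_sin(cache, positions)
print(len(cos), len(cos[0]), len(cos[0][0]))  # 3 2 4
```

In rope mode the same lookup applies with a single row of positions.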

    (1) cacheMode is 0:

    • mropeSection has 3 elements:

      cos0 = cos[0, :, :mropeSection[0]]
      cos1 = cos[1, :, mropeSection[0]:(mropeSection[0] + mropeSection[1])]
      cos2 = cos[2, :, (mropeSection[0] + mropeSection[1]):(mropeSection[0] + mropeSection[1] + mropeSection[2])]
      cos = torch.cat((cos0, cos1, cos2), dim=-1)
      sin0 = sin[0, :, :mropeSection[0]]
      sin1 = sin[1, :, mropeSection[0]:(mropeSection[0] + mropeSection[1])]
      sin2 = sin[2, :, (mropeSection[0] + mropeSection[1]):(mropeSection[0] + mropeSection[1] + mropeSection[2])]
      sin = torch.cat((sin0, sin1, sin2), dim=-1)
      queryRot = query[..., :rotaryDim]
      queryPass = query[..., rotaryDim:]
    • mropeSection has 4 elements:

      cos = torch.cat([m[i] for i, m in enumerate(cos.split(mropeSection, dim=-1))], dim=-1)
      sin = torch.cat([m[i] for i, m in enumerate(sin.split(mropeSection, dim=-1))], dim=-1)
      queryRot = query[..., :rotaryDim]
      queryPass = query[..., rotaryDim:]
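The 3-element and 4-element formulas for cacheMode 0 describe the same operation: segment i of the last dimension is taken from position row i. A minimal pure-Python sketch of that segmented merge follows. The name `merge_sections` and the toy section sizes are illustrative assumptions; the sizes are chosen so that the origin of each value is easy to see and are not supported configurations.

```python
def merge_sections(x, mrope_section):
    # x: [m][numTokens][half]; returns [numTokens][half].
    # Segment i of the last dimension comes from position row i, mirroring
    # torch.cat([m[i] for i, m in enumerate(x.split(mrope_section, dim=-1))], dim=-1).
    merged = []
    for t in range(len(x[0])):
        row, start = [], 0
        for i, width in enumerate(mrope_section):
            row.extend(x[i][t][start:start + width])
            start += width
        merged.append(row)
    return merged

# Toy input: value 100 * row + column shows which position row each entry came from.
mrope_section = [2, 1, 1]   # illustrative only, not a supported configuration
half = sum(mrope_section)
cos = [[[100 * r + c for c in range(half)]] for r in range(3)]  # numTokens = 1
print(merge_sections(cos, mrope_section))  # [[0, 1, 102, 203]]
```

The same function handles a 4-element mropeSection, since it only iterates over the sections it is given.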

    (2) cacheMode is 1:

    cosTmp = cos
    cos[..., 1:mropeSection[1] * 3:3] = cosTmp[1, ..., 1:mropeSection[1] * 3:3]
    cos[..., 2:mropeSection[1] * 3:3] = cosTmp[2, ..., 2:mropeSection[1] * 3:3]
    sinTmp = sin
    sin[..., 1:mropeSection[1] * 3:3] = sinTmp[1, ..., 1:mropeSection[1] * 3:3]
    sin[..., 2:mropeSection[1] * 3:3] = sinTmp[2, ..., 2:mropeSection[1] * 3:3]
    queryRot = query[..., :rotaryDim]
    queryPass = query[..., rotaryDim:]
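One way to read the cacheMode 1 formulas is as a strided overwrite: columns 1, 4, 7, ... are taken from position row 1 and columns 2, 5, 8, ... from position row 2. The sketch below follows that reading in pure Python. It assumes that the result starts as a copy of position row 0, which the formulas do not state explicitly, and it uses mropeSection[1] as the bound for both strides, exactly as written above. `merge_interleaved` and the toy sizes are illustrative names and values, not part of the interface.

```python
def merge_interleaved(x, mrope_section):
    # x: [3][numTokens][half]; returns [numTokens][half].
    # Assumption: start from position row 0, then overwrite every third
    # column starting at 1 from row 1, and starting at 2 from row 2.
    end = mrope_section[1] * 3
    merged = []
    for t in range(len(x[0])):
        row = list(x[0][t])
        row[1:end:3] = x[1][t][1:end:3]
        row[2:end:3] = x[2][t][2:end:3]
        merged.append(row)
    return merged

# Toy input: value 100 * row + column shows which position row each entry came from.
mrope_section = [2, 2, 2]   # illustrative only, not a supported configuration
half = sum(mrope_section)
cos = [[[100 * r + c for c in range(half)]] for r in range(3)]  # numTokens = 1
print(merge_interleaved(cos, mrope_section))  # [[0, 101, 202, 3, 104, 205]]
```

Compared with the segmented merge of cacheMode 0, the three position rows alternate column by column instead of occupying contiguous blocks.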

    (1) rotate_half (GPT-NeoX style) computation mode:

    x1, x2 = torch.chunk(queryRot, 2, dim=-1)
    o1[i] = x1[i] * cos[i] - x2[i] * sin[i]
    o2[i] = x2[i] * cos[i] + x1[i] * sin[i]
    queryRot = torch.cat((o1, o2), dim=-1)
    query = torch.cat((queryRot, queryPass), dim=-1)

    (2) rotate_interleaved (GPT-J style) computation mode:

    x1 = queryRot[..., ::2]
    x2 = queryRot[..., 1::2]
    o1[i] = x1[i] * cos[i] - x2[i] * sin[i]
    o2[i] = x2[i] * cos[i] + x1[i] * sin[i]
    queryRot = torch.stack((o1, o2), dim=-1)
    query = torch.cat((queryRot, queryPass), dim=-1)
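The two computation modes differ only in how elements of queryRot are paired. A small pure-Python sketch on a single 4-element vector makes the difference visible. One assumption is made for the interleaved mode: the stacked (o1, o2) pairs are flattened back into a vector of length rotaryDim, a step the formulas leave implicit.

```python
def rotate_half(x, cos, sin):
    # GPT-NeoX style: element i is paired with element i + half.
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    o1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    o2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return o1 + o2

def rotate_interleaved(x, cos, sin):
    # GPT-J style: element 2i is paired with element 2i + 1.
    x1, x2 = x[::2], x[1::2]
    out = []
    for a, b, c, s in zip(x1, x2, cos, sin):
        out.extend([a * c - b * s, b * c + a * s])  # flattened (o1, o2) pair
    return out

query_rot = [1.0, 2.0, 3.0, 4.0]
cos = [0.0, 1.0]   # pair 0 rotates by 90 degrees, pair 1 is left unchanged
sin = [1.0, 0.0]
print(rotate_half(query_rot, cos, sin))         # [-3.0, 2.0, 1.0, 4.0]
print(rotate_interleaved(query_rot, cos, sin))  # [-2.0, 1.0, 3.0, 4.0]
```

With rotate_half, the rotated pair is (1.0, 3.0); with rotate_interleaved, it is (1.0, 2.0).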

    2. rope mode: positions has shape [numTokens]:

    cosSin[i] = cosSinCache[positions[i]]
    cos, sin = cosSin.chunk(2, dim=-1)
    queryRot = query[..., :rotaryDim]
    queryPass = query[..., rotaryDim:]

    (1) rotate_half (GPT-NeoX style) computation mode:

    x1, x2 = torch.chunk(queryRot, 2, dim=-1)
    o1[i] = x1[i] * cos[i] - x2[i] * sin[i]
    o2[i] = x2[i] * cos[i] + x1[i] * sin[i]
    queryRot = torch.cat((o1, o2), dim=-1)
    query = torch.cat((queryRot, queryPass), dim=-1)

    (2) rotate_interleaved (GPT-J style) computation mode:

    x1 = queryRot[..., ::2]
    x2 = queryRot[..., 1::2]
    o1[i] = x1[i] * cos[i] - x2[i] * sin[i]
    o2[i] = x2[i] * cos[i] + x1[i] * sin[i]
    queryRot = torch.stack((o1, o2), dim=-1)
    query = torch.cat((queryRot, queryPass), dim=-1)
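Putting the rope-mode steps together, the following pure-Python sketch runs the lookup, the queryRot/queryPass split, and the rotate_half rotation for a 2-D query. Two things are assumptions rather than facts from this page: the function name `rope_neox`, and the layout of each query row as numHeads consecutive blocks of headSize elements, which is how `query[..., :rotaryDim]` is interpreted here. The tiny sizes are for readability and do not satisfy the operator's alignment constraints.

```python
def rope_neox(query, positions, cos_sin_cache, num_heads, head_size, rotary_dim):
    # query: [numTokens][num_heads * head_size]; returns the same shape.
    half = rotary_dim // 2
    out = []
    for token, pos in zip(query, positions):
        entry = cos_sin_cache[pos]                  # cosSin[i] = cosSinCache[positions[i]]
        cos, sin = entry[:half], entry[half:]       # chunk(2, dim=-1)
        new_token = []
        for h in range(num_heads):
            head = token[h * head_size:(h + 1) * head_size]
            rot, passthrough = head[:rotary_dim], head[rotary_dim:]
            x1, x2 = rot[:half], rot[half:]
            o1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
            o2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
            new_token.extend(o1 + o2 + passthrough)  # cat((queryRot, queryPass))
        out.append(new_token)
    return out

# Hand-written cache with two positions: position 0 is the identity rotation,
# position 1 rotates the single pair by 90 degrees (cos = 0, sin = 1).
cos_sin_cache = [[1.0, 0.0], [0.0, 1.0]]
query = [[1.0, 2.0, 9.0, 3.0, 4.0, 9.0],   # token at position 0
         [1.0, 2.0, 9.0, 3.0, 4.0, 9.0]]   # token at position 1
result = rope_neox(query, [0, 1], cos_sin_cache, num_heads=2, head_size=3, rotary_dim=2)
print(result[0])  # [1.0, 2.0, 9.0, 3.0, 4.0, 9.0]
print(result[1])  # [-2.0, 1.0, 9.0, -4.0, 3.0, 9.0]
```

The trailing 9.0 in each head is the queryPass part: it lies beyond rotaryDim and passes through unchanged.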

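Each operator uses a two-phase interface: first call aclnnRopeWithSinCosCacheV2GetWorkspaceSize to obtain the workspace size required for the computation and an executor that encapsulates the operator's computation flow, then call aclnnRopeWithSinCosCacheV2 to run the computation.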

  • Parameters:

  • Return value:

    aclnnStatus: the returned status code.

    The first-phase interface validates the input parameters and reports an error in the following scenarios:
  • Parameters:

  • Return value:

    aclnnStatus: the returned status code.

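  • Deterministic computation:
    • aclnnRopeWithSinCosCacheV2 uses a deterministic implementation by default.
  • queryIn, keyIn, and cosSinCache support only 2-D shape inputs.
  • queryIn, keyIn, and cosSinCache must have the same data type.
  • headSize: must be a multiple of 32 when the data type is BFLOAT16 or FLOAT16, and a multiple of 16 when the data type is FLOAT32.
  • rotaryDim: must always be less than or equal to headSize; must be a multiple of 32 when the data type is BFLOAT16 or FLOAT16, and a multiple of 16 when the data type is FLOAT32; in mrope mode, the sum of all elements of mropeSection must equal half of rotaryDim.
  • The values in the input tensor positions must be less than maxSeqLen, the size of dimension 0 of cosSinCache.
  • In mrope mode, mropeSection currently supports only the values [16, 24, 24], [24, 20, 20], [8, 12, 12], and [16, 16, 16, 16].
  • In mrope mode, cacheMode supports only 0 and 1; when mropeSection is [16, 16, 16, 16], only 0 is supported.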
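The constraints above can be checked on the host before calling the operator. The sketch below is a hypothetical pre-flight helper in Python, not part of the aclnn API: the function name, the string dtype labels, and the error messages are all assumptions. It covers the alignment, rotaryDim, positions, mropeSection, and cacheMode rules, and leaves out the 2-D shape and matching data type checks.

```python
ALIGNMENT = {"BFLOAT16": 32, "FLOAT16": 32, "FLOAT32": 16}
SUPPORTED_MROPE_SECTIONS = ([16, 24, 24], [24, 20, 20], [8, 12, 12], [16, 16, 16, 16])

def check_rope_args(dtype, head_size, rotary_dim, max_seq_len, positions,
                    mrope_section=None, cache_mode=0):
    """Return a list of violated constraints (empty when all checks pass).

    positions is a flat list of indices; mrope_section=None means rope mode.
    """
    errors = []
    align = ALIGNMENT[dtype]
    if head_size % align:
        errors.append(f"headSize must be a multiple of {align} for {dtype}")
    if rotary_dim % align:
        errors.append(f"rotaryDim must be a multiple of {align} for {dtype}")
    if rotary_dim > head_size:
        errors.append("rotaryDim must be <= headSize")
    if any(p >= max_seq_len for p in positions):
        errors.append("positions must be < maxSeqLen")
    if mrope_section is not None:
        section = list(mrope_section)
        if section not in SUPPORTED_MROPE_SECTIONS:
            errors.append("unsupported mropeSection")
        if sum(section) * 2 != rotary_dim:
            errors.append("sum(mropeSection) must equal rotaryDim / 2")
        if cache_mode not in (0, 1):
            errors.append("cacheMode must be 0 or 1")
        elif section == [16, 16, 16, 16] and cache_mode != 0:
            errors.append("mropeSection [16, 16, 16, 16] requires cacheMode 0")
    return errors

print(check_rope_args("FLOAT16", 128, 128, 4096, [0, 4095], [16, 24, 24], 1))  # []
print(check_rope_args("FLOAT16", 128, 128, 4096, [0], [16, 16, 16, 16], 1))
```

The second call reports a single violation, because the 4-element section is combined with cacheMode 1.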

The sample code is provided for reference only.
