SetAippFunctions

Description

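Sets the image preprocessing (AIPP, AI Core pre-process) parameters. This interface is used together with LoadImageToLocal. Once the parameters are set, a call to LoadImageToLocal performs the preprocessing while it moves the data. The supported operations are data padding, channel swap, single-line read, data type conversion, channel padding, and color space conversion. When calling SetAippFunctions, pass the source image matrix in Global Memory and the format of the source image.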

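Data padding: pads the image in the H and W directions. The following modes are available:
- Mode 0: constant padding. Every position in the padding region is filled with a constant, and a separate constant can be set for each channel. This mode supports only left and right padding, not top and bottom padding.
  Figure 1 Constant padding mode
- Mode 1: row/column padding. Every position in the padding region is filled with the value of the nearest source-image pixel in the same row or column.
  Figure 2 Row/column padding mode
- Mode 2: block padding. A data block with the padding's width and height is copied from the source image into the padding region.
  Figure 3 Block padding mode
- Mode 3: mirrored block padding. The mirror image of a data block with the padding's width and height is copied from the source image into the padding region.
  Figure 4 Mirrored block padding mode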
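The first two padding modes can be stated precisely, so here is a minimal host-side C++ sketch of them on a single-channel image. It is a CPU reference for intuition only, not the AscendC API: `Image`, `padConstant`, and `padReplicate` are hypothetical names. Modes 2 and 3 are not modeled, because which source block they copy is defined only in the figures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-channel image, row-major: img[row][col].
using Image = std::vector<std::vector<uint8_t>>;

// Mode 0 (constant padding): only left/right padding, every padded pixel = value.
Image padConstant(const Image& src, std::size_t left, std::size_t right, uint8_t value) {
    Image dst;
    dst.reserve(src.size());
    for (const auto& row : src) {
        std::vector<uint8_t> out(left, value);          // left padding
        out.insert(out.end(), row.begin(), row.end());  // source pixels
        out.insert(out.end(), right, value);            // right padding
        dst.push_back(out);
    }
    return dst;
}

// Mode 1 (row/column padding): each padded pixel copies the nearest source pixel,
// i.e. the source coordinates are clamped to the image border.
Image padReplicate(const Image& src, int top, int bottom, int left, int right) {
    assert(!src.empty() && !src[0].empty());
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    Image dst(h + top + bottom, std::vector<uint8_t>(w + left + right));
    for (int y = 0; y < h + top + bottom; ++y) {
        const int sy = std::min(std::max(y - top, 0), h - 1);
        for (int x = 0; x < w + left + right; ++x) {
            const int sx = std::min(std::max(x - left, 0), w - 1);
            dst[y][x] = src[sy][sx];
        }
    }
    return dst;
}
```

For the 2 x 2 image `{{1, 2}, {3, 4}}`, constant padding of 1 on the left and 2 on the right with value 9 gives rows `{9, 1, 2, 9, 9}` and `{9, 3, 4, 9, 9}`. Row/column padding of 1 on the top, left, and right gives `{1, 1, 2, 2}`, `{1, 1, 2, 2}`, `{3, 3, 4, 4}`.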

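Channel swap: swaps image channels.
For RGB888, the R and B channels can be swapped.

For YUV420SP, the U and V channels can be swapped.

For XRGB8888, the X channel can be moved to the back (XRGB->RGBX), and the R and B channels can be swapped.

Example: for XRGB data, the chip discards the fourth channel by default and keeps X, R, and G. To keep the R, G, and B data of an XRGB input, move the X channel to the back so that the input becomes RGBX.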
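The XRGB example above can be traced per pixel with a small host-side C++ model. It is a sketch, not the hardware implementation: `moveXToBack`, `keepFirstThree`, and `swapRB` are hypothetical helpers that mirror the effect of `isSwapAX`, of the default "keep the first three channels" behavior, and of `isSwapRB` on an RGB888 pixel.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

using Pixel3 = std::array<uint8_t, 3>;
using Pixel4 = std::array<uint8_t, 4>;

// Effect of isSwapAX: move the X channel from the front to the back (XRGB -> RGBX).
Pixel4 moveXToBack(const Pixel4& xrgb) {
    return {{xrgb[1], xrgb[2], xrgb[3], xrgb[0]}};
}

// Model of the default behavior described above: only the first three channels are kept.
Pixel3 keepFirstThree(const Pixel4& p) {
    return {{p[0], p[1], p[2]}};
}

// Effect of isSwapRB on a three-channel RGB888 pixel: exchange channel 0 (R) and channel 2 (B).
Pixel3 swapRB(Pixel3 p) {
    std::swap(p[0], p[2]);
    return p;
}
```

With X=0, R=10, G=20, B=30, keeping the first three channels of the raw XRGB pixel yields `{0, 10, 20}` and loses B. Moving X to the back first yields `{10, 20, 30}`, which is why the text recommends XRGB->RGBX.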

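Single-line read: reads only one row from the source image.

When single-line read is enabled, the destination image height passed to the data movement interface is ignored, for example loadImageToLocalParams.verSize of LoadImageToLocal.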

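Data type conversion: converts the pixel data type. uint8 can be converted to int8 or fp16. In the formulas below, k is the channel index, so mean, min, and var are set per channel.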

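// Example 1: uint8 -> int8 conversion with zero-mean centering. Set each channel's mean to the
// average of all data in that channel (min and var have no effect and need not be set).
uint8 -> int8:  output[i][j][k] = input[i][j][k] - mean[k]
// Example 2: uint8 -> fp16 conversion with normalization. Set each channel's mean to the average of
// all data in that channel, min to the minimum of that channel after zero-mean centering, and var
// to the reciprocal of (maximum - minimum) of that channel.
uint8 -> fp16:  output[i][j][k] = (input[i][j][k] - mean[k] - min[k]) * var[k]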

The converted data type is set by template parameter U. When U is uint8, data type conversion does not take effect.

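When calling the data movement interface, the data type of the destination tensor must match the output data type of this interface, for example the data type of the dstLocal parameter of LoadImageToLocal.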
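The two conversion formulas, and the way example 2 derives mean, min, and var from a channel's data, can be checked numerically with the host-side C++ sketch below. It is a reference model under stated assumptions, not the AscendC implementation: `float` stands in for half, the mean is truncated to an integer because the mean field is uint8, and the behavior when `input - mean` falls outside the int8 range is not modeled. `ChannelParams`, `zeroMean`, `normalize`, and `deriveParams` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Per-channel parameters named after the formula terms; float stands in for half.
struct ChannelParams {
    uint8_t mean;
    float min;
    float var;
};

// Example 1 (uint8 -> int8): output = input - mean.
// Assumes the difference fits in int8; hardware behavior on overflow is not modeled.
int8_t zeroMean(uint8_t in, uint8_t mean) {
    return static_cast<int8_t>(static_cast<int>(in) - static_cast<int>(mean));
}

// Example 2 (uint8 -> fp16): output = (input - mean - min) * var.
float normalize(uint8_t in, const ChannelParams& p) {
    return (static_cast<float>(in) - static_cast<float>(p.mean) - p.min) * p.var;
}

// Derive mean/min/var for one channel as described in example 2.
ChannelParams deriveParams(const std::vector<uint8_t>& channel) {
    assert(!channel.empty());
    uint64_t sum = 0;
    for (uint8_t v : channel) sum += v;
    // The mean field is uint8, so the channel average is truncated to an integer.
    const uint8_t mean = static_cast<uint8_t>(sum / channel.size());
    const auto mm = std::minmax_element(channel.begin(), channel.end());
    const float lo = static_cast<float>(*mm.first) - mean;   // min after zero-mean centering
    const float hi = static_cast<float>(*mm.second) - mean;  // max after zero-mean centering
    // var = 1 / (max - min); fall back to the documented default 1.0 for a constant channel.
    const float var = hi > lo ? 1.0f / (hi - lo) : 1.0f;
    return {mean, lo, var};
}
```

For a channel with values 10, 20, 30, this gives mean = 20, min = -10, and var = 1/20, so example 2 maps 10, 20, and 30 to 0, 0.5, and 1. In other words, example 2 scales each channel into [0, 1].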

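Channel padding: pads the image along the channel dimension. The default is mode 0.
Mode 0: pads the channels to 32 bytes, that is, to 32 channels when the output data type is uint8/int8 and to 16 channels when it is fp16.

Mode 1: pads the channels to 4 channels.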
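Because channel padding fixes the number of channels per output pixel, it also determines how large the destination buffer must be. The host-side C++ sketch below computes both. `paddedChannels` and `dstBytes` are hypothetical helpers, `elemBytes` stands for `sizeof(U)` (1 for uint8/int8, 2 for fp16), and the sketch assumes that the destination width and height are already known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Output channels per pixel after channel padding, for the two documented modes.
uint32_t paddedChannels(uint32_t cPaddingMode, std::size_t elemBytes) {
    assert(cPaddingMode <= 1);
    assert(elemBytes == 1 || elemBytes == 2);
    if (cPaddingMode == 1) {
        return 4;                                     // mode 1: pad to 4 channels
    }
    return static_cast<uint32_t>(32 / elemBytes);     // mode 0: pad to 32 bytes
}

// Bytes needed for a destination tile of w x h pixels.
std::size_t dstBytes(uint32_t w, uint32_t h, uint32_t cPaddingMode, std::size_t elemBytes) {
    return static_cast<std::size_t>(w) * h * paddedChannels(cPaddingMode, elemBytes) * elemBytes;
}
```

In mode 0 every output pixel occupies 32 bytes regardless of U, so a 32 x 32 int8 tile needs 32 * 32 * 32 = 32768 bytes. Mode 1 with fp16 output needs 32 * 32 * 4 * 2 = 8192 bytes for the same tile.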

Color space conversion: converts RGB to YUV, or YUV to RGB.
$\text{[math]}$

$\text{[math]}$

Prototype

Input image format YUV400, RGB888, or XRGB8888

template<typename T, typename U>
void SetAippFunctions(const GlobalTensor<T>& src0, AippInputFormat format, AippParams<U> config)

Input image format YUV420 Semi-Planar

template<typename T, typename U>
void SetAippFunctions(const GlobalTensor<T>& src0, const GlobalTensor<T>& src1, AippInputFormat format, AippParams<U> config)

Parameters

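Table 1 Template parameters
Parameter	Description
T	Input data type. Must match the data type set in format.
U	Output data type. The same data type must be configured in the data movement interface, for example the data type of the dstLocal parameter of LoadImageToLocal. If data type conversion is not enabled, U must be the same as the input type; if it is enabled, U must be the desired converted type.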

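Table 2 Parameters
Parameter	Input/Output	Description
src0	Input	Source image matrix in Global Memory. For a YUV420SP source image, the matrix of the Y plane in Global Memory.
src1	Input	For a YUV420SP source image, the matrix of the UV plane in Global Memory. Ignored for other formats.
format	Input	Format of the source image. AippInputFormat is an enum with the following values: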

AippInputFormat::YUV420SP_U8  // YUV420 Semi-Planar, data type uint8_t
AippInputFormat::XRGB8888_U8  // XRGB8888, data type uint8_t
AippInputFormat::RGB888_U8    // RGB888, data type uint8_t
AippInputFormat::YUV400_U8    // YUV400, data type uint8_t

config	Input	Image preprocessing parameters, of type AippParams. The struct is defined as follows:

template <typename U>
struct AippParams {
    AippPaddingParams<U> paddingParams;
    AippSwapParams swapParams;
    AippSingleLineParams singleLineParams;
    AippDataTypeConvParams dtcParams;
    AippChannelPaddingParams<U> cPaddingParams;
    AippColorSpaceConvParams cscParams;
};

The sub-structures of AippParams are defined as follows:

Data padding parameters. See Table 3.

template <typename U>
struct AippPaddingParams {
    uint32_t paddingMode;
    U paddingValueCh0;
    U paddingValueCh1;
    U paddingValueCh2;
};

Channel swap parameters. See Table 4.

struct AippSwapParams {
    bool isSwapRB;
    bool isSwapUV;
    bool isSwapAX;
};

Single-line read parameters. See Table 5.

struct AippSingleLineParams {
    bool isSingleLineCopy;
};

Data type conversion parameters. See Table 6.

struct AippDataTypeConvParams {
    uint8_t meanValueCh0;
    uint8_t meanValueCh1;
    uint8_t meanValueCh2;
    half minValueCh0;
    half minValueCh1;
    half minValueCh2;
    half varValueCh0;
    half varValueCh1;
    half varValueCh2;
};

Channel padding parameters. See Table 7.

template <typename U>
struct AippChannelPaddingParams {
    uint32_t cPaddingMode;
    U cPaddingValue;
};

Color space conversion parameters. See Table 8.

struct AippColorSpaceConvParams {
    bool isEnableCsc;
    int16_t cscMatrixR0C0;
    int16_t cscMatrixR0C1;
    int16_t cscMatrixR0C2;
    int16_t cscMatrixR1C0;
    int16_t cscMatrixR1C1;
    int16_t cscMatrixR1C2;
    int16_t cscMatrixR2C0;
    int16_t cscMatrixR2C1;
    int16_t cscMatrixR2C2;
    uint8_t cscBiasIn0;
    uint8_t cscBiasIn1;
    uint8_t cscBiasIn2;
    uint8_t cscBiasOut0;
    uint8_t cscBiasOut1;
    uint8_t cscBiasOut2;
};

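Table 3 AippPaddingParams fields
Parameter	Input/Output	Description
paddingMode	Input	Padding mode, in the range [0, 3]. Default: 0. 0: constant padding (left and right padding only). 1: row/column padding. 2: block padding. 3: mirrored block padding.
paddingValueCh0	Input	Value filled into channel 0 of the padding region. Used only in constant padding mode. Data type: U. Default: 0.
paddingValueCh1	Input	Value filled into channel 1 of the padding region. Used only in constant padding mode. Data type: U. Default: 0.
paddingValueCh2	Input	Value filled into channel 2 of the padding region. Used only in constant padding mode. Data type: U. Default: 0.
paddingValueCh3	Input	Value filled into channel 3 of the padding region. Used only in constant padding mode. Data type: U. Default: 0.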

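Table 4 AippSwapParams fields
Parameter	Input/Output	Description
isSwapRB	Input	For RGB888 and XRGB8888, whether to swap the R and B channels. Default: false.
isSwapUV	Input	For YUV420SP, whether to swap the U and V channels. Default: false.
isSwapAX	Input	For XRGB8888, whether to move the X channel to the back, that is, XRGB->RGBX. Default: false.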

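Table 5 AippSingleLineParams fields
Parameter	Input/Output	Description
isSingleLineCopy	Input	Whether to enable single-line read. When enabled, only one row is read from the source image. Default: false.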

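Table 6 AippDataTypeConvParams fields
Parameter	Input/Output	Description
dtcMeanCh0	Input	mean in the formula for channel 0. Data type: uint8. Default: 0.
dtcMeanCh1	Input	mean in the formula for channel 1. Data type: uint8. Default: 0.
dtcMeanCh2	Input	mean in the formula for channel 2. Data type: uint8. Default: 0.
dtcMinCh0	Input	min in the formula for channel 0. Data type: half. Default: 0. Not configurable on Atlas 200/500 A2 inference products.
dtcMinCh1	Input	min in the formula for channel 1. Data type: half. Default: 0. Not configurable on Atlas 200/500 A2 inference products.
dtcMinCh2	Input	min in the formula for channel 2. Data type: half. Default: 0. Not configurable on Atlas 200/500 A2 inference products.
dtcVarCh0	Input	var in the formula for channel 0. Data type: half. Default: 1.0.
dtcVarCh1	Input	var in the formula for channel 1. Data type: half. Default: 1.0.
dtcVarCh2	Input	var in the formula for channel 2. Data type: half. Default: 1.0.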

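Table 7 AippChannelPaddingParams fields
Parameter	Input/Output	Description
cPaddingMode	Input	Channel padding mode, in the range [0, 1]. Default: 0. 0: pad to 32 bytes, that is, to 32 channels when the output data type U is uint8/int8 and to 16 channels when it is half. 1: pad to 4 channels.
cPaddingValue	Input	Value used for channel padding. Data type: U. Default: 0.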

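Table 8 AippColorSpaceConvParams fields
Parameter	Input/Output	Description
isEnableCsc	Input	Whether to enable color space conversion. Default: false.
cscMatrixR0C0	Input	Color space conversion matrix element cscMatrix[0][0].
cscMatrixR0C1	Input	Color space conversion matrix element cscMatrix[0][1].
cscMatrixR0C2	Input	Color space conversion matrix element cscMatrix[0][2].
cscMatrixR1C0	Input	Color space conversion matrix element cscMatrix[1][0].
cscMatrixR1C1	Input	Color space conversion matrix element cscMatrix[1][1].
cscMatrixR1C2	Input	Color space conversion matrix element cscMatrix[1][2].
cscMatrixR2C0	Input	Color space conversion matrix element cscMatrix[2][0].
cscMatrixR2C1	Input	Color space conversion matrix element cscMatrix[2][1].
cscMatrixR2C2	Input	Color space conversion matrix element cscMatrix[2][2].
cscBiasIn0	Input	RGB-to-YUV bias cscBiasIn[0]. Ignored for YUV-to-RGB conversion.
cscBiasIn1	Input	RGB-to-YUV bias cscBiasIn[1]. Ignored for YUV-to-RGB conversion.
cscBiasIn2	Input	RGB-to-YUV bias cscBiasIn[2]. Ignored for YUV-to-RGB conversion.
cscBiasOut0	Input	YUV-to-RGB bias cscBiasOut[0]. Ignored for RGB-to-YUV conversion.
cscBiasOut1	Input	YUV-to-RGB bias cscBiasOut[1]. Ignored for RGB-to-YUV conversion.
cscBiasOut2	Input	YUV-to-RGB bias cscBiasOut[2]. Ignored for RGB-to-YUV conversion.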

Supported Models

Atlas inference series products (AI Core)

Atlas A2 training series products / Atlas 800I A2 inference products

Atlas 200/500 A2 inference products

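Precautions

The Global Memory addresses of src0 and src1 must meet the following alignment requirements:

Image format	src0	src1
YUV420SP	Must be 2-byte aligned	Must be 2-byte aligned
XRGB8888	Must be 4-byte aligned	-
RGB888	No requirement	-
YUV400	No requirement	-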
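The alignment table can be turned into a host-side pre-check run before launching a kernel. This is a minimal C++ sketch, not part of the AscendC API: `Format` is a hypothetical local mirror of the four `AippInputFormat` values, and `addressesAligned` is a hypothetical helper that takes raw byte addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical local mirror of the four AippInputFormat values (not the AscendC enum).
enum class Format { YUV420SP_U8, XRGB8888_U8, RGB888_U8, YUV400_U8 };

// Returns true if the Global Memory byte addresses meet the alignment table above.
// src1 is checked only for YUV420SP, the only format that uses it.
bool addressesAligned(Format f, std::uintptr_t src0, std::uintptr_t src1) {
    switch (f) {
        case Format::YUV420SP_U8:
            return src0 % 2 == 0 && src1 % 2 == 0;  // both planes 2-byte aligned
        case Format::XRGB8888_U8:
            return src0 % 4 == 0;                   // 4-byte aligned
        case Format::RGB888_U8:
        case Format::YUV400_U8:
            return true;                            // no requirement
    }
    return false;
}
```

For YUV420SP, the UV plane usually starts right after the Y plane at offset width * height. If the base address is 2-byte aligned and width * height is even, src1 is aligned as well.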

Returns

None

Example

This example runs on Atlas inference series products (AI Core) and uses a YUV420SP input image.

#include "kernel_operator.h"

class KernelLoadImage {
public:
    __aicore__ inline KernelLoadImage()
    {
        // Size of the Y plane of the YUV420SP image
        gmSrc0Size = srcHorizSize * srcVertSize;
        // Size of the interleaved UV plane of the YUV420SP image
        gmSrc1Size = (srcHorizSize / 2) * (srcVertSize / 2) * 2;
        // Output size: each destination pixel carries cSize channels after channel padding
        dstSize = dstHorizSize * dstVertSize * cSize;
    }
    __aicore__ inline void Init(__gm__ uint8_t *fmGm, __gm__ uint8_t *dstGm)
    {
        fmGlobal.SetGlobalBuffer((__gm__ uint8_t *)fmGm);
        dstGlobal.SetGlobalBuffer((__gm__ int8_t *)dstGm);
        // inQueueA1 receives the preprocessed output of LoadImageToLocal, and CopyToUB reads
        // dstSize elements from it, so it is sized by dstSize rather than by the source image size
        pipe.InitBuffer(inQueueA1, 1, dstSize * sizeof(int8_t));
        pipe.InitBuffer(outQueueUB, 1, dstSize * sizeof(int8_t));
    }
    __aicore__ inline void Process()
    {
        CopyIn();    // GM -> A1, with AIPP preprocessing
        CopyToUB();  // A1 -> UB
        CopyOut();   // UB -> GM
    }
private:
    __aicore__ inline void CopyIn()
    {
        AscendC::LocalTensor<int8_t> featureMapA1 = inQueueA1.AllocTensor<int8_t>();
        // AIPP configuration: output type U = int8_t, channel padding mode cPadMode
        AscendC::AippParams<int8_t> aippConfig;
        aippConfig.cPaddingParams.cPaddingMode = cPadMode;
        aippConfig.cPaddingParams.cPaddingValue = cPaddingValue;
        // fmGlobal is the whole input image; src1 receives the start address of the UV plane
        AscendC::SetAippFunctions(fmGlobal, fmGlobal[gmSrc0Size], inputFormat, aippConfig);
        // Move the image into A1, applying the preprocessing configured above
        AscendC::LoadImageToLocal(featureMapA1, { horizSize, vertSize, horizStartPos, vertStartPos, srcHorizSize, topPadSize, botPadSize, leftPadSize, rightPadSize });
        inQueueA1.EnQue(featureMapA1);
    }
    __aicore__ inline void CopyToUB()
    {
        AscendC::LocalTensor<int8_t> featureMapA1 = inQueueA1.DeQue<int8_t>();
        AscendC::LocalTensor<int8_t> featureMapUB = outQueueUB.AllocTensor<int8_t>();
        AscendC::DataCopy(featureMapUB, featureMapA1, dstSize);
        // Wait for the MTE1 copy to finish before the MTE3 copy-out reads the UB data
        event_t eventIdMTE1ToMTE3 = static_cast<event_t>(GetTPipePtr()->FetchEventID(AscendC::HardEvent::MTE1_MTE3));
        AscendC::SetFlag<AscendC::HardEvent::MTE1_MTE3>(eventIdMTE1ToMTE3);
        AscendC::WaitFlag<AscendC::HardEvent::MTE1_MTE3>(eventIdMTE1ToMTE3);
        outQueueUB.EnQue<int8_t>(featureMapUB);
        inQueueA1.FreeTensor(featureMapA1);
    }
    __aicore__ inline void CopyOut()
    {
        AscendC::LocalTensor<int8_t> featureMapUB = outQueueUB.DeQue<int8_t>();
        AscendC::DataCopy(dstGlobal, featureMapUB, dstSize);
        outQueueUB.FreeTensor(featureMapUB);
    }
private:
    AscendC::TPipe pipe;
    AscendC::TQue<AscendC::QuePosition::A1, 1> inQueueA1;
    AscendC::TQue<AscendC::QuePosition::VECOUT, 1> outQueueUB;

    AscendC::GlobalTensor<uint8_t> fmGlobal;
    AscendC::GlobalTensor<int8_t> dstGlobal;

    uint16_t horizSize = 32, vertSize = 32, horizStartPos = 0, vertStartPos = 0, srcHorizSize = 32, srcVertSize = 32, leftPadSize = 0, rightPadSize = 0;
    uint32_t dstHorizSize = 32, dstVertSize = 32, cSize = 32;
    uint8_t topPadSize = 0, botPadSize = 0;
    uint32_t gmSrc0Size = 0, gmSrc1Size = 0, dstSize = 0;
    AscendC::AippInputFormat inputFormat = AscendC::AippInputFormat::YUV420SP_U8;
    // Channel padding mode 0: pad to 32 bytes, i.e. 32 int8 channels (matches cSize)
    uint32_t cPadMode = 0;
    int8_t cPaddingValue = 0;
};

extern "C" __global__ __aicore__ void load_image_simple_kernel(__gm__ uint8_t *fmGm, __gm__ uint8_t *dstGm)
{
    KernelLoadImage op;
    op.Init(fmGm, dstGm);
    op.Process();
}
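The sizes the example kernel derives in its constructor can be reproduced on the host. The C++ sketch below is a worked check of that arithmetic. `ExampleSizes` and `exampleSizes` are hypothetical names, and every element is assumed to be one byte (uint8 input, int8 output), as in the example.

```cpp
#include <cassert>
#include <cstdint>

// Buffer sizes in bytes for the example kernel (all elements are 1-byte uint8/int8).
struct ExampleSizes {
    uint32_t ySize;    // Y plane: one byte per pixel (gmSrc0Size)
    uint32_t uvSize;   // interleaved UV plane: one U and one V byte per 2x2 pixel block (gmSrc1Size)
    uint32_t dstSize;  // output: cSize padded channels per destination pixel (dstSize)
};

ExampleSizes exampleSizes(uint32_t srcW, uint32_t srcH, uint32_t dstW, uint32_t dstH, uint32_t cSize) {
    assert(srcW % 2 == 0 && srcH % 2 == 0);  // YUV420SP subsamples by 2 in both directions
    return {srcW * srcH, (srcW / 2) * (srcH / 2) * 2, dstW * dstH * cSize};
}
```

With the example's 32 x 32 settings and cSize = 32, the source image occupies 1024 + 512 = 1536 bytes in Global Memory, and `fmGlobal[gmSrc0Size]` points at byte offset 1024, where the UV plane starts. The preprocessed output is 32768 bytes, so any local buffer that receives the output of LoadImageToLocal has to be sized from dstSize rather than from the source image size.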
