Basics About Media Data Processing

This section describes the functions, API call sequences, and sample code for image/video data processing.

Typical Functions

CANN provides AIPP- and DVPP-based image/video data processing modes. This section focuses on DVPP-based image/video data processing.

The two processing modes are described below.

Artificial Intelligence Pre-Processing (AIPP)

AIPP implements preprocessing functions on the AI Core, including image resizing, cropping, and padding, color space conversion (CSC), and mean subtraction and factor multiplication (for changing pixel values).

Both static AIPP and dynamic AIPP modes are supported, but the two modes are mutually exclusive.
  • Static AIPP: If you use this mode and specify the AIPP parameters when converting a model, the AIPP attribute values are saved in the offline model (.om file) when the model is generated, and the same fixed AIPP configuration is used for every model inference.

    In static AIPP mode, batches share the same set of AIPP parameters. The AIPP parameters are set when the ATC tool is used for model conversion. For details about the ATC tool, see ATC Instructions.

  • Dynamic AIPP: During model conversion, set the AIPP mode to dynamic. Different sets of dynamic AIPP parameters can then be passed in as required, so that each model inference can use a different set of parameters.

    In dynamic AIPP mode, multiple batches can use different AIPP parameters. The AIPP parameter values used by each batch are set by calling pyACL APIs (a minimal sketch follows this list). For details, see Dynamic AIPP Model Inference.
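
For reference, the following is a minimal sketch of setting dynamic AIPP parameters with pyACL. It assumes the pyACL names acl.mdl.create_aipp, acl.mdl.set_aipp_src_image_size, acl.mdl.set_aipp_input_format, acl.mdl.set_input_aipp, and acl.mdl.destroy_aipp (mirroring the C AIPP APIs), and that model_id, dataset, and the index of the dynamic-AIPP input are already available from model loading; check the exact signatures in Dynamic AIPP Model Inference.

    import acl

    def apply_dynamic_aipp(model_id, dataset, aipp_input_index, batch_size=1):
        # Create a dynamic AIPP parameter set for the given batch size
        # (assumed pyACL counterpart of aclmdlCreateAIPP).
        aipp_set = acl.mdl.create_aipp(batch_size)

        # Width and height of the source image fed to AIPP.
        ret = acl.mdl.set_aipp_src_image_size(aipp_set, 416, 416)
        assert ret == 0

        # Input format of the image, for example YUV420SP output by DVPP.
        ret = acl.mdl.set_aipp_input_format(aipp_set, "YUV420SP_U8")
        assert ret == 0

        # Bind the parameter set to the dynamic-AIPP input of the loaded model.
        ret = acl.mdl.set_input_aipp(model_id, dataset, aipp_input_index, aipp_set)
        assert ret == 0

        # ... run inference here, e.g. acl.mdl.execute(model_id, dataset, output) ...

        # Release the parameter set after inference.
        ret = acl.mdl.destroy_aipp(aipp_set)
        assert ret == 0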

Digital Vision Pre-Processing (DVPP)

DVPP is an image/video processing unit built into the Ascend AI Processor. Through the pyACL media data processing APIs, it provides hardware acceleration for the following functions (a minimal channel setup sketch follows the note below):

  • Vision Preprocessing Core (VPC): Processes YUV and RGB images, including resizing, cropping, and CSC.
  • JPEG Decoder (JPEGD): Decodes images from JPEG to YUV.
  • JPEG Encoder (JPEGE): Encodes images from YUV to JPEG.
  • Video Decoder (VDEC): Decodes video streams from H.264/H.265 to YUV/RGB.
  • Video Encoder (VENC): Encodes video streams from YUV420SP to H.264/H.265.
  • PNG Decoder (PNGD): Decodes images from PNG to RGB.
NOTE:

AIPP and DVPP can be used separately or together. In combined applications, DVPP is used first to decode, crop, and resize images or videos. However, due to DVPP hardware restrictions, the image format and resolution output by DVPP may not meet the model requirements, so AIPP is then required to perform further CSC, cropping, and padding.

For example, on the Atlas 200/300/500 Inference Product and Atlas Training Series Product, DVPP video decoding supports only YUV output. If the model requires RGB images, AIPP color space conversion is required.
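
Whichever DVPP function is used, tasks are issued on a DVPP channel created through the pyACL media APIs. The following is a minimal sketch of the channel lifecycle; it assumes the device, context, and stream have already been set up.

    import acl

    # Create a channel description and the DVPP channel itself.
    dvpp_channel_desc = acl.media.dvpp_create_channel_desc()
    ret = acl.media.dvpp_create_channel(dvpp_channel_desc)
    assert ret == 0

    # ... issue JPEGD/VPC/JPEGE tasks on this channel, for example
    # acl.media.dvpp_jpeg_decode_async(...) or acl.media.dvpp_vpc_resize_async(...),
    # and then synchronize the stream with acl.rt.synchronize_stream(stream) ...

    # Destroy the channel and its description once processing is complete.
    ret = acl.media.dvpp_destroy_channel(dvpp_channel_desc)
    assert ret == 0
    ret = acl.media.dvpp_destroy_channel_desc(dvpp_channel_desc)
    assert ret == 0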

Typical Scenarios

The resolution and format of the source image or video can be converted to meet the model requirements. The following are typical scenarios.

  • Video decoding and resizing

    The input video is in H.264/H.265 encoding format with a resolution of 1920 x 1080, but the YOLOv3 model for object detection requires an RGB or YUV input image with a resolution of 416 x 416. In this case, you can process the video as follows.

    Figure 1 Video decoding and resizing
  • Image decoding, resizing, and format conversion

    The input image is in JPEG encoding format with a resolution of 1280 x 720, but the ResNet-50 model for image classification requires an RGB input image with a resolution of 224 x 224. In this case, you can process the image as follows (a minimal pyACL sketch of this pipeline is given after this list).

    Figure 2 Image decoding, resizing, and format conversion
  • Image cropping, resizing, and format conversion

    The input image is in YUV420SP format with a resolution of 1280 x 720, but the ResNet-50 model for image classification requires an RGB input image with a resolution of 224 x 224. In this case, you can process the image as follows.

    Figure 3 Image cropping, resizing, and format conversion
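
As a concrete illustration of the second scenario (JPEG decoding followed by resizing), the following sketch decodes a JPEG image with JPEGD and resizes the result to 224 x 224 with VPC. It is a minimal outline rather than a complete sample: it assumes that a stream and a DVPP channel (dvpp_channel_desc) have already been created as shown above, it aligns the YUV420SP strides to 128/16 (which satisfies typical JPEGD and VPC restrictions, but check the alignment rules of your product form), and the helper names align, create_yuv_pic_desc, and decode_and_resize are introduced here for illustration only. acl.util.bytes_to_ptr is used to obtain a host pointer; older pyACL versions may provide acl.util.numpy_to_ptr instead.

    import acl

    PIXEL_FORMAT_YUV_SEMIPLANAR_420 = 1   # NV12, value of the acldvppPixelFormat enum
    ACL_MEMCPY_HOST_TO_DEVICE = 1

    def align(value, base):
        # Round value up to a multiple of base.
        return (value + base - 1) // base * base

    def create_yuv_pic_desc(width, height):
        # Allocate DVPP memory for a YUV420SP picture and describe it.
        w_stride, h_stride = align(width, 128), align(height, 16)
        size = w_stride * h_stride * 3 // 2
        buf, ret = acl.media.dvpp_malloc(size)
        assert ret == 0
        desc = acl.media.dvpp_create_pic_desc()
        acl.media.dvpp_set_pic_desc_data(desc, buf)
        acl.media.dvpp_set_pic_desc_size(desc, size)
        acl.media.dvpp_set_pic_desc_format(desc, PIXEL_FORMAT_YUV_SEMIPLANAR_420)
        acl.media.dvpp_set_pic_desc_width(desc, width)
        acl.media.dvpp_set_pic_desc_height(desc, height)
        acl.media.dvpp_set_pic_desc_width_stride(desc, w_stride)
        acl.media.dvpp_set_pic_desc_height_stride(desc, h_stride)
        return desc, buf, size

    def decode_and_resize(dvpp_channel_desc, stream, jpeg_path, src_w=1280, src_h=720):
        # 1. Copy the encoded JPEG data from the host into DVPP memory.
        jpeg_bytes = open(jpeg_path, "rb").read()
        in_size = len(jpeg_bytes)
        in_buf, ret = acl.media.dvpp_malloc(in_size)
        assert ret == 0
        ret = acl.rt.memcpy(in_buf, in_size, acl.util.bytes_to_ptr(jpeg_bytes),
                            in_size, ACL_MEMCPY_HOST_TO_DEVICE)
        assert ret == 0

        # 2. JPEGD: decode the JPEG image to YUV420SP at the source resolution.
        dec_desc, dec_buf, _ = create_yuv_pic_desc(src_w, src_h)
        ret = acl.media.dvpp_jpeg_decode_async(dvpp_channel_desc, in_buf, in_size,
                                               dec_desc, stream)
        assert ret == 0

        # 3. VPC: resize the decoded picture to the model input size (224 x 224).
        out_desc, out_buf, out_size = create_yuv_pic_desc(224, 224)
        resize_config = acl.media.dvpp_create_resize_config()
        ret = acl.media.dvpp_vpc_resize_async(dvpp_channel_desc, dec_desc,
                                              out_desc, resize_config, stream)
        assert ret == 0

        # 4. Wait for both asynchronous tasks to complete.
        ret = acl.rt.synchronize_stream(stream)
        assert ret == 0

        # out_buf now holds a 224 x 224 YUV420SP image; conversion to RGB can be
        # left to AIPP during model inference. Release resize_config, the picture
        # descriptions, and the DVPP buffers when they are no longer needed.
        return out_buf, out_size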

Development Workflow of Media Data Processing

Figure 4 Development workflow
  1. Set up the environment.

    For details, see App Development Environment Setup.

  2. Create code directories.

    Before developing an app, you must create directories to store code files, scripts, test images, and model files.

    If model inference is involved in the app, you need to build a model: an offline model adapted to the Ascend AI Processor (*.om file) is mandatory. For details, see Model Building.

  3. Develop an app.

    If model inference is involved in the app, write code by referring to Model Inference and Additional Features. A minimal sketch of the pyACL runtime resource setup that every app needs is given after this list.

  4. Run the app. For details, see App Debugging.
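
Regardless of which media data processing functions the app uses, step 3 starts from the same pyACL runtime skeleton: initialize pyACL, specify the device, and create a context and a stream before issuing DVPP tasks, then release these resources in reverse order before the app exits. A minimal sketch, assuming device 0:

    import acl

    DEVICE_ID = 0

    # 1. Initialize pyACL and the runtime resources.
    ret = acl.init()
    assert ret == 0
    ret = acl.rt.set_device(DEVICE_ID)
    assert ret == 0
    context, ret = acl.rt.create_context(DEVICE_ID)
    assert ret == 0
    stream, ret = acl.rt.create_stream()
    assert ret == 0

    # 2. Media data processing: create a DVPP channel and issue
    #    JPEGD/VPC/VDEC tasks on the stream (see the sketches above).

    # 3. Release resources in the reverse order of creation.
    ret = acl.rt.destroy_stream(stream)
    assert ret == 0
    ret = acl.rt.destroy_context(context)
    assert ret == 0
    ret = acl.rt.reset_device(DEVICE_ID)
    assert ret == 0
    ret = acl.finalize()
    assert ret == 0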