Basics About Media Data Processing
This section describes the functions, API call sequences, and sample code for image/video data processing.
Typical Functions
CANN provides AIPP- and DVPP-based image/video data processing modes. This section focuses on DVPP-based image/video data processing.
| Processing Mode | Description |
|---|---|
| Artificial Intelligence Pre-Processing (AIPP) | AIPP implements functions on the AI Core, including image resizing (such as cropping and padding), color space conversion (CSC), mean subtraction, and factor multiplication (for pixel changing). Static AIPP and dynamic AIPP modes are supported. However, the two modes are mutually exclusive. |
| Digital Vision Pre-Processing (DVPP) | DVPP is a built-in image processing unit of the Ascend AI Processor. It provides powerful hardware acceleration capabilities for media processing through the pyACL media data processing APIs. It provides the following functions: |

NOTE:
AIPP and DVPP can be used separately or together. In combined applications, DVPP is used first to decode, crop, and resize images or videos. However, due to DVPP hardware restrictions, the image format and resolution after DVPP may not meet the model requirements. Therefore, AIPP is required to further perform CSC, image cropping, and border making. For example, for
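The following minimal sketch illustrates why the resolution after DVPP may differ from what a model expects. It rounds a requested output size up to alignment boundaries and reports whether AIPP would still need to crop the result. The alignment values (16 for width, 2 for height), the 300 x 300 model size, and all function names are assumptions made for illustration; the real constraints depend on the Ascend AI Processor version and the DVPP function in use.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the nearest multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def dvpp_output_shape(width: int, height: int,
                      width_align: int = 16, height_align: int = 2):
    """Return the (width, height) of an output buffer after alignment.

    width_align and height_align are assumed values, not taken from this guide.
    """
    return align_up(width, width_align), align_up(height, height_align)


def aipp_crop_for_model(model_w: int, model_h: int,
                        width_align: int = 16, height_align: int = 2):
    """Describe the crop AIPP would apply to reach the model input size."""
    out_w, out_h = dvpp_output_shape(model_w, model_h, width_align, height_align)
    return {
        "dvpp_output": (out_w, out_h),    # aligned size produced before AIPP
        "crop_size": (model_w, model_h),  # size the model actually expects
        "crop_needed": (out_w, out_h) != (model_w, model_h),
    }


if __name__ == "__main__":
    # Hypothetical model input of 300 x 300: the width is not a multiple of 16.
    print(aipp_crop_for_model(300, 300))
    # 416 x 416 is already aligned under these assumed values.
    print(aipp_crop_for_model(416, 416))
```

Under these assumed alignments, a 300 x 300 request yields a 304 x 300 buffer, so a crop step is still needed, whereas 416 x 416 and 224 x 224 need none.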
Typical Scenarios
The resolution and format of the source image or video can be processed to meet the model requirements. The following are typical scenarios.
- Video decoding and resizing
The input video is encoded in H.264 or H.265 at a resolution of 1920 x 1080, but the YOLOv3 object detection model requires an RGB or YUV input image with a resolution of 416 x 416. In this case, you can process the video as shown in Figure 1.
Figure 1 Video decoding and resizing
- Image decoding, resizing, and format conversion
The input image is JPEG-encoded at a resolution of 1280 x 720, but the ResNet-50 image classification model requires an RGB input image with a resolution of 224 x 224. In this case, you can process the image as shown in Figure 2.
Figure 2 Image decoding, resizing, and format conversion
- Image cropping, resizing, and format conversion
The input image is in YUV420SP format at a resolution of 1280 x 720, but the ResNet-50 image classification model requires an RGB input image with a resolution of 224 x 224. In this case, you can process the image as shown in Figure 3.
Figure 3 Image cropping, resizing, and format conversion
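The arithmetic behind the three scenarios above can be sketched in plain Python. This sketch does not call pyACL. It only shows how large the YUV420SP buffers are, how a centered crop region could be chosen, and what a YUV-to-RGB conversion computes per pixel. The helper names are hypothetical, the even-offset rule follows from the 2 x 2 chroma subsampling of YUV420SP, and the full-range BT.601 coefficients are an assumption, because the CSC matrix actually used is set in the AIPP configuration.

```python
def yuv420sp_size(width: int, height: int) -> int:
    """Bytes in a YUV420SP image: a full-resolution Y plane plus an
    interleaved UV plane with half as many bytes."""
    return width * height * 3 // 2


def center_crop_roi(src_w: int, src_h: int, crop_w: int, crop_h: int):
    """Return (left, right, top, bottom) of a centered crop, inclusive.

    Offsets are rounded down to even values so the crop stays aligned
    with the 2 x 2 chroma blocks of YUV420SP.
    """
    if crop_w > src_w or crop_h > src_h:
        raise ValueError("crop region is larger than the source image")
    left = (src_w - crop_w) // 2 // 2 * 2
    top = (src_h - crop_h) // 2 // 2 * 2
    return left, left + crop_w - 1, top, top + crop_h - 1


def yuv_to_rgb(y: int, u: int, v: int):
    """Convert one full-range BT.601 YUV pixel to RGB (assumed coefficients)."""
    def clamp(x: float) -> int:
        return max(0, min(255, int(round(x))))

    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp(r), clamp(g), clamp(b)


if __name__ == "__main__":
    # Scenario 1: decoded 1920 x 1080 frame and resized 416 x 416 frame.
    print(yuv420sp_size(1920, 1080), yuv420sp_size(416, 416))
    # Scenario 3: square region taken from the middle of a 1280 x 720 image.
    print(center_crop_roi(1280, 720, 720, 720))
    # Scenarios 2 and 3: a mid-gray pixel stays mid-gray after conversion.
    print(yuv_to_rgb(128, 128, 128))
```

Cropping a centered 720 x 720 square before resizing to 224 x 224 keeps the aspect ratio of the content, whereas resizing 1280 x 720 directly would stretch it. Which is preferable depends on how the model was trained.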
Development Workflow of Media Data Processing
- Set up the environment.
For details, see App Development Environment Setup.
- Create code directories.
Before developing an app, you must create directories to store code files, scripts, test images, and model files.
If the app performs model inference, you also need to build a model: an offline model adapted to the Ascend AI Processor (*.om file) is mandatory for inference. For details, see Model Building.
- Develop an app.
If model inference is involved in the app, write code by referring to Model Inference and Additional Features.
- Run the app. For details, see App Debugging.
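The directory-creation step can be sketched with the Python standard library. The layout below (src, scripts, data, model) is a hypothetical example, because this guide does not prescribe directory names. The check for *.om files reflects only the requirement that an offline model is present when the app performs inference.

```python
from pathlib import Path

# Hypothetical layout: code files, scripts, test images, and model files.
APP_DIRS = ("src", "scripts", "data", "model")


def create_app_dirs(root):
    """Create the app directories under root and return their paths."""
    root = Path(root)
    created = []
    for name in APP_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        created.append(path)
    return created


def find_offline_models(root):
    """Return the *.om files in the (assumed) model directory, sorted by name."""
    return sorted((Path(root) / "model").glob("*.om"))


if __name__ == "__main__":
    for path in create_app_dirs("my_app"):
        print("ready:", path)
    if not find_offline_models("my_app"):
        print("no *.om file found; build a model before running inference")
```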