Basics About Media Data Processing
This section describes the functions, API calling sequences, and sample code for image, video, and audio data processing.
Typical Functions

The following table describes the functions. For details about the media data processing functions supported by each product model, see Function Support of Different Versions. The current AIPP versions support all the functions.
| Function | Sub-Function Module | Definition |
|---|---|---|
| Obtain video data. | Image signal processing (ISP) system control | The system control function is used to register the 3A algorithm, register the sensor driver, initialize, run, and exit the ISP firmware, and configure the ISP attributes. |
| | MIPI RX ioctl command words | MIPI RX is a collection unit that supports multiple differential video input interfaces. It receives data from the MIPI, LVDS, sub-LVDS, and HiSPI interfaces through the combo PHY. By configuring different function modes, MIPI RX supports data transmission at multiple speeds and resolutions, as well as multiple external input devices. |
| | Video Input (VI) | The VI module captures video images, performs operations such as cropping, stabilization, color optimization, brightness optimization, and noise removal, and outputs YUV or RAW images. |
| Display video data. | Video Output (VO) | The VO module receives images processed by the VPSS module, controls their playing, and outputs them to peripheral video devices based on the configured output protocol (currently, only HDMI is supported). It can work with the two-dimensional engine (TDE) and HiSilicon Framebuffer (HiFB) modules to draw graphics and manage graphics layers in hardware. |
| | High Definition Multimedia Interface (HDMI) | HDMI is a fully digital video/audio interface for transmitting uncompressed audio and video signals. |
| | Two-Dimensional Engine (TDE) | The TDE is a two-dimensional graphics acceleration engine. It uses hardware to provide fast graphics drawing functions, such as quick copy, quick color filling, and pattern filling, for the on-screen display (OSD) and graphical user interface (GUI). (Currently, only alpha blending is supported.) |
| | HiSilicon Framebuffer (HiFB) | The HiFB manages overlaid graphics layers. In addition to the basic Linux framebuffer functions, it provides extended functions such as modifying the display start position of a graphics layer and inter-layer alpha blending. |
| Manage regions. | Region | Overlaid OSDs and color blocks on a video are called regions. The Region module manages region resources in a unified manner. It is used to display specific information (such as the channel ID and PTS) on a video, or to fill color blocks into a video for covering. Currently, this function must be used together with the VPSS module. |
| Process image/video data. | Video Process Sub-System (VPSS) | The VPSS module preprocesses input images in a unified manner (for example, denoising, deinterlacing, and cropping) and then processes each channel separately (for example, scaling and bordering). |
| | Artificial Intelligence Pre-Processing (AIPP) | AIPP runs on the AI Core and provides image resizing (such as cropping and padding), color space conversion (CSC), mean subtraction, and factor multiplication (for pixel changing). AIPP supports static and dynamic modes, which are mutually exclusive. |
| | Digital Vision Pre-Processing (DVPP) | DVPP is an embedded image processing unit of the Ascend AI Processor. Through AscendCL APIs, it provides powerful hardware acceleration for media processing, such as image/video decoding, encoding, cropping, and resizing. NOTE: AIPP and DVPP can be used separately or together. When they are combined, DVPP first decodes, crops, and resizes the images or videos. Due to DVPP hardware restrictions, however, the image format and resolution output by DVPP may not meet the model requirements, so AIPP is then used to further perform CSC, image cropping, and padding. |
| Obtain and output audio data. | Audio Input (AI) | The AI module captures audio data. |
| | Audio Output (AO) | The AO module plays the audio decoded by the ADEC module. |
| Encode and decode audio data. | Audio Encoder (AENC) | The AENC module encodes the audio captured by the AI module and outputs audio streams. |
| | Audio Decoder (ADEC) | The ADEC module decodes G.711a, G.711u, and other audio streams for playback through the AO module. |
Typical Scenarios
The resolution and format of a source image or video can be converted to meet the model requirements. The following are examples of typical scenarios.
- Video decoding and resizing
The input video is in H.264 encoding format with a resolution of 1920 x 1080, but the YOLOv3 model for object detection requires an RGB or YUV input image with a resolution of 416 x 416. In this case, you can process the video as follows.
Figure 2 Video decoding and resizing
- Image decoding, resizing, and format conversion
The input image is in JPEG encoding format with a resolution of 1280 x 720, but the ResNet-50 model for image classification requires an RGB input image with a resolution of 224 x 224. In this case, you can process the image as follows.
Figure 3 Image decoding, resizing, and format conversion
- Image cropping, resizing, and format conversion
The input image is in YUV420SP format with a resolution of 1280 x 720, but the ResNet-50 model for image classification requires an RGB input image with a resolution of 224 x 224. In this case, you can process the image as follows.
Figure 4 Image cropping, resizing, and format conversion
Development Workflow of Media Data Processing

- Set up the environment.
For details, see Development and Operating Environment Setup.
- Create code directories.
Create directories to store code files, build scripts, test images, and model files.
The following is an example:
├── App name
│   ├── model                // Model files
│   │   ├── xxx.json
│   ├── data
│   │   ├── xxxxxx           // Test data
│   ├── inc                  // Header files that declare functions
│   │   ├── xxx.h
│   ├── out                  // Output files
│   ├── src
│   │   ├── xxx.json         // Configuration files for system initialization
│   │   ├── CMakeLists.txt   // Build scripts
│   │   ├── xxx.cpp          // Implementation files
- (Optional) Build a model.
If model inference is involved in the app, an offline model adapted to the Ascend AI Processor (a *.om file) is required. For details, see Building a Model.
- Develop an app.
For details about the required header files and library files, see Dependent Header Files and Library Files.
If model inference is involved in an app, write code by referring to Inference with Single-Batch and Static-Shape Inputs and Additional Features.
- Build and run the app. For details, see App Build and Run.