Overview

API Differences Between Versions

The V1 and V2 media data processing APIs described in this document provide the same functions, such as video encoding and decoding, image encoding and decoding, and image processing. However, the two sets of APIs must not be mixed.
  • V2 provides more functions than V1. For example:
    • JPEGE: The V2 APIs support advanced parameter configuration, such as Huffman table configuration.
    • VENC: The V2 APIs support finer-grained configuration of bit rate control parameters and effect tuning, such as the QP of I-/P-frames and macroblock-level bit rate control.
    • VDEC: The V2 APIs support finer-grained memory control, such as setting the input stream buffer.
  • The V2 APIs are recommended because they will continue to evolve in later versions.
  • The V1 APIs are retained for backward compatibility but will be deprecated in later versions.

Typical Functions of Image/Video/Audio Data Processing

Figure 1 Image/Video data processing

The following table describes the functions. For details about the media data processing functions supported by each product model, see Function Support. The current version supports all of these functions.

Function: Obtain video data

Image signal processing (ISP) system control

The system control function is used to register the 3A algorithm, register the sensor driver, initialize the ISP firmware, run the ISP firmware, exit the ISP firmware, and configure the ISP attributes.

MIPI RX ioctl command words

MIPI RX is a collection unit that supports multiple differential video input interfaces. It receives data from the MIPI, LVDS, sub-LVDS, and HiSPI interfaces through the combo PHY. MIPI RX supports data transmission at multiple speeds and resolutions by configuring different function modes and supports multiple external input devices.

Video Input (VI)

The VI module captures video images, performs operations such as cropping, stabilization, color optimization, brightness optimization, and noise removal on the images, and outputs YUV or RAW images.

Function: Display video data

Video Output (VO)

The VO module receives the images that have been processed by VPSS, controls the playing of the images, and outputs the images to peripheral video devices based on the configured output protocol (currently, only HDMI is supported).

The VO module can work with the two-dimensional engine (TDE) module and HiSilicon Framebuffer (HiFB) module to draw graphics and manage graphics layers by leveraging hardware.

High Definition Multimedia Interface (HDMI)

HDMI is a fully digital video/audio interface for transmitting uncompressed audio and video signals.

Two-Dimensional Engine (TDE)

The TDE is a two-dimensional graphics acceleration engine. It uses hardware to provide fast graphics drawing functions for the On Screen Display (OSD) and Graphics User Interface (GUI). The functions include quick copy, quick color filling, and pattern filling. (Currently, only alpha blending is supported.)

HiSilicon Framebuffer (HiFB)

The HiFB is used to manage overlaid graphics layers. It not only provides the basic functions of Linux framebuffer, but also provides extended functions such as modifying the display start position of a graphics layer and inter-layer alpha.

Function: Manage regions

Region

The overlaid OSD and color blocks on a video are called regions. The Region module is used to manage the region resources in a unified manner. It is used to display specific information (such as the channel ID and PTS) on the video or fill color blocks in the video for covering. Currently, this function must be used together with VPSS.

Function: Process image/video data

Video Process Sub-System (VPSS)

The VPSS module performs unified preprocessing on input images, such as denoising, deinterlacing, and cropping, and then processes each channel separately, for example, scaling and adding borders.

Artificial Intelligence Pre-Processing (AIPP)

AIPP implements preprocessing functions on the AI Core, including image resizing (such as cropping and padding), CSC, mean subtraction, and factor multiplication (for changing pixel values).

AIPP supports a static mode and a dynamic mode. The two modes are mutually exclusive.
  • Static AIPP: If you use this mode and specify the AIPP parameters when converting a model, the AIPP attribute values are saved in the offline model (*.om file) after the model is generated. Fixed AIPP configurations are used in each model inference.

    If the static AIPP mode is used, multiple batches share the same AIPP parameters. The AIPP parameter values are set when the ATC tool is used for model conversion. For details about the ATC tool, see ATC Instructions.

  • Dynamic AIPP: If you use this mode when converting a model, you can set dynamic AIPP parameters before running the model for inference. Then, different AIPP parameters are used in model execution.

    If the dynamic AIPP mode is used, multiple batches can use different AIPP parameters. The AIPP parameter values used by each batch are set by calling AscendCL APIs. For details, see Dynamic AIPP Model Inference.
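As an illustration of the static mode, the AIPP parameters are typically written to a configuration file that is passed to the ATC tool at model conversion time. The sketch below uses the common aipp_op configuration format; the field values (input format, image size, mean values) are purely illustrative, and the exact set of supported fields depends on your ATC version, so refer to ATC Instructions for the authoritative list:

```text
aipp_op {
    aipp_mode        : static
    input_format     : YUV420SP_U8
    src_image_size_w : 416
    src_image_size_h : 416
    csc_switch       : true
    mean_chn_0       : 104
    mean_chn_1       : 117
    mean_chn_2       : 123
}
```

After conversion (for example, atc --insert_op_conf=aipp.cfg ...), the generated *.om file carries these attribute values, and the same AIPP configuration is applied to every inference.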

Digital Vision Pre-Processing (DVPP)

DVPP is an embedded image processing unit of the Ascend AI Processor. It provides powerful hardware acceleration capabilities for media processing through AscendCL APIs. It delivers the following functions:

  • Vision Preprocessing Core (VPC): processes YUV and RGB images, including resizing, cropping, pyramid, and CSC.
  • JPEG Decoder (JPEGD): decodes images from JPEG to YUV.
  • JPEG Encoder (JPEGE): encodes images from YUV to JPEG.
  • Video Decoder (VDEC): decodes video streams from H.264/H.265 to YUV/RGB.
  • Video Encoder (VENC): encodes video streams from YUV420SP to H.264/H.265.
  • PNG Decoder (PNGD): decodes images from PNG to RGB.
NOTE:

AIPP and DVPP can be used separately or together. When they are combined, DVPP first decodes, crops, and resizes the images or video frames. Due to DVPP hardware restrictions, however, the image format and resolution output by DVPP may not meet the model's requirements, in which case AIPP is used to further perform color space conversion (CSC), cropping, and padding.

For example, in the Atlas 200/300/500 Inference Product and Atlas Training Series Product, DVPP video decoding supports only the output of YUV images. If the model requires RGB images, AIPP is required to perform CSC.
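Because of the DVPP hardware restrictions mentioned above, output buffers are sized from aligned strides rather than the nominal resolution. The Python sketch below shows the arithmetic for a YUV420SP buffer; the 128-pixel width and 16-row height alignment used as defaults here is typical for JPEG decoding on some products, but the actual alignment values are product-specific, so check the specifications for your model:

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the nearest multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def yuv420sp_buffer_size(width: int, height: int,
                         w_align: int = 128, h_align: int = 16) -> int:
    """Bytes needed for an aligned YUV420SP (NV12/NV21) image:
    a full-resolution Y plane plus a half-size interleaved UV plane."""
    stride_w = align_up(width, w_align)
    stride_h = align_up(height, h_align)
    return stride_w * stride_h * 3 // 2

# A 1920x1080 JPEG decoded to YUV420SP: width 1920 is already a
# multiple of 128, height is padded to 1088 rows.
print(yuv420sp_buffer_size(1920, 1080))  # 1920 * 1088 * 3 // 2 = 3133440
```

The buffer allocated for the decoder output must use this aligned size, not width x height x 3/2 of the nominal resolution.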

Function: Obtain and output audio data

Audio Input (AI)

The AI module captures audio data.

Audio Output (AO)

The AO module plays the audio decoded by the ADEC module.

Function: Encode and decode audio data

Audio Encoder (AENC)

The AENC module encodes the audio obtained by the AI module and outputs audio streams.

Audio Decoder (ADEC)

The ADEC module decodes G.711a, G.711u, and other audio streams; the decoded audio is played through the AO module.

Function Support

The following table describes the functions of media data processing V1 supported by each product model.

The meanings of the identifiers are as follows:
  • √: Supported
  • x: Not supported

Model                                  VPC   JPEGD  JPEGE  PNGD  VDEC  VENC
Atlas 200/300/500 Inference Product    x     x      x      x     x     x
Atlas Training Series Product          x     x      x      x     x     x

Restrictions

When using the APIs described in this chapter, pay attention to the following points:

  • About memory allocation and deallocation:
    1. Before using some media data processing functions, such as VPC, JPEGD, and JPEGE, you need to allocate memory to store the input or output data. These functions impose stricter requirements on the memory used for input and output data, so you need to call the dedicated memory allocation APIs; for details, see the restrictions in the sections that describe each function. If multiple functions are cascaded and need to reuse the same memory segment, allocate the memory based on the largest size required.
    2. The memory allocated in step 1 can be shared between media data processing and other tasks. For example, the output of media data processing can be used directly as the input of model inference, which reuses the memory and reduces memory copies.
    3. Because the address space that media data processing can access is limited, you are advised to call the APIs under Memory Management (such as aclrtMalloc and aclrtMallocHost) to allocate memory for other purposes (such as model loading), so that sufficient memory remains available for media data processing.
  • About channel requirements

    Before using each media data processing function, you must call the corresponding APIs to create channels. See the channel creation and destruction APIs in VPC, VDEC/JPEGD, VENC/JPEGE, and PNGD for the API descriptions and the maximum number of channels.

    Channel creation and destruction involve resource allocation and release, and repeatedly creating and destroying channels degrades service performance. Therefore, manage channels based on your actual scenario. For example, to process images with VPC continuously, create the VPC channels once, call all the VPC functions, and only then destroy the channels.

    Creating too many channels increases the CPU and memory usage of the device. For the supported number of channels, see the performance specifications in the corresponding function sections.

  • For the structs and enumerations described in this chapter, reserved fields must be manually set to 0 to avoid incompatibility with future versions.

    Reserved enumeration values carry the _BUTT suffix, for example, HI_COMPRESS_MODE_BUTT.
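The memory-reuse and channel-management guidance above can be sketched as a pattern. In the Python sketch below, create_channel, process, and destroy_channel are hypothetical stand-ins for the product's real channel APIs; the point is the shape of the code: size one shared buffer for the most demanding cascaded stage, create the channel once, reuse both across all frames, and release them only after the whole batch is processed:

```python
# Hypothetical stand-ins for the real channel APIs. A real application
# would call the product's channel creation/processing/destruction
# functions here instead.
def create_channel():
    return {"open": True}

def process(channel, frame, buffer):
    assert channel["open"]
    return len(frame)  # placeholder for the real processing result

def destroy_channel(channel):
    channel["open"] = False

def process_frames(frames, stage_buffer_sizes):
    """Run many frames through one channel, reusing one buffer.

    stage_buffer_sizes lists the output sizes (in bytes) of the
    cascaded functions that will share the same memory segment.
    """
    # Allocate once, sized for the largest cascaded stage.
    buffer = bytearray(max(stage_buffer_sizes))

    channel = create_channel()   # create the channel once, not per frame
    try:
        return [process(channel, frame, buffer) for frame in frames]
    finally:
        destroy_channel(channel)  # destroy only after all frames are done
```

Creating and destroying the channel inside the per-frame loop would repeat the resource allocation and release on every frame, which is exactly the performance cost the guideline above warns about.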