Overview

API Differences Between Versions

The V1 and V2 media data processing APIs described in this document share common functions, such as video encoding and decoding, image encoding and decoding, and image processing. However, the two sets of APIs must not be mixed.
  • V2 provides more functions than V1. For example:
    • JPEGE: V2 APIs support advanced parameter configuration, such as Huffman table configuration.
    • VENC: V2 APIs support finer-grained configuration of bit rate control parameters and effect tuning, such as I-/P-frame QPs and macroblock-level bit rate control.
    • VDEC: V2 APIs support finer-grained memory control, such as setting the input stream buffer.
    • Video data obtaining (ISP system control, MIPI command, and VI function): supported only by V2 APIs.
    • VPSS video processing: supported only by V2 APIs.
    • Audio functions, including recording, playback, and volume adjustment: supported only by V2 APIs.
    • Video data display (VO function and HDMI peripheral): supported only by V2 APIs.
  • V2 APIs are recommended because API functions and services will continue to evolve on V2 in later versions.
  • V1 APIs are retained for backward compatibility but will be deprecated in later versions.

Typical Functions

Figure 1 Image/Video data processing

The following table describes these functions. For details about which media data processing functions each product model supports in each version, see Function Support. AIPP is supported by all versions.

Function

Sub-Function Module

Definition

Obtain video data.

Image signal processing (ISP) system control

The system control function registers the 3A algorithm and the sensor driver; initializes, runs, and exits the ISP firmware; and configures ISP attributes.

MIPI RX ioctl command words

MIPI RX is a capture unit that supports multiple differential video input interfaces. It receives data from MIPI, LVDS, sub-LVDS, and HiSPI interfaces through the combo PHY. By configuring different function modes, MIPI RX supports data transmission at multiple speeds and resolutions and works with a variety of external input devices.

Video Input (VI)

The VI module captures video images, performs operations such as cropping, color optimization, brightness optimization, and noise removal on the images, and outputs YUV or RAW images.

Display video data.

Video Output (VO)

The VO module receives images processed by VPSS, controls their playback, and outputs them to peripheral video devices according to the configured output protocol. Currently, only HDMI is supported.

The VO module can work with the two-dimensional engine (TDE) and HiSilicon Framebuffer (HiFB) modules to draw graphics and manage graphics layers with hardware acceleration.

High Definition Multimedia Interface (HDMI)

HDMI is a fully digital video/audio interface for transmitting uncompressed audio and video signals.

Two-Dimensional Engine (TDE)

The TDE is a two-dimensional graphics acceleration engine. It uses hardware to provide fast graphics drawing for the On Screen Display (OSD) and Graphical User Interface (GUI), such as quick copy, quick color filling, and pattern filling. Currently, only alpha blending is supported.

HiSilicon Framebuffer (HiFB)

The HiFB manages overlaid graphics layers. In addition to the basic functions of the Linux framebuffer, it provides extended functions such as changing the display start position of a graphics layer and inter-layer alpha.

Manage regions.

Region

OSDs and color blocks overlaid on a video are called regions. The Region module manages region resources in a unified manner. It displays specific information (such as the channel ID and PTS) on the video or fills color blocks to cover areas of the video. Currently, this function must be used together with VPSS.

Process image/video data.

Video Process Sub-System (VPSS)

The VPSS module first preprocesses input images in a unified manner (for example, denoising, deinterlacing, and cropping) and then processes each channel separately (for example, scaling and bordering).

Image/Video data processing

Artificial Intelligence Pre-Processing (AIPP)

AIPP runs on the AI Core and performs image resizing (such as cropping and padding), color space conversion (CSC), mean subtraction, and factor multiplication (to change pixel values).

AIPP supports static and dynamic modes, which are mutually exclusive.
  • Static AIPP: If you select this mode and specify AIPP parameters during model conversion, the AIPP attribute values are saved in the generated offline model (*.om file), and the same fixed AIPP configuration is used in every inference.

    In static AIPP mode, all batches share the same AIPP parameters, whose values are set when the ATC tool converts the model. For details about the ATC tool, see ATC Instructions.

  • Dynamic AIPP: If you select this mode during model conversion, you can set AIPP parameters before each inference, so different AIPP parameters can be used across model executions.

    In dynamic AIPP mode, different batches can use different AIPP parameters. The AIPP parameters for each batch are set through acl API calls. For details, see Dynamic AIPP Model Inference.

Digital Vision Pre-Processing (DVPP)

DVPP is an embedded image processing unit of the Ascend AI Processor. Through the media data processing APIs, it provides hardware acceleration for media processing, including the following functions:

  • Vision Preprocessing Core (VPC): processes YUV and RGB images, including resizing, cropping, pyramid, and CSC.
  • JPEG Decoder (JPEGD): decodes images from JPEG to YUV.
  • JPEG Encoder (JPEGE): encodes images from YUV to JPEG.
  • Video Decoder (VDEC): decodes video streams from H.264/H.265 to YUV/RGB.
  • Video Encoder (VENC): encodes video streams from YUV420SP to H.264/H.265.
  • PNG Decoder (PNGD): decodes images from PNG to RGB.

AIPP and DVPP can be used separately or together. When they are combined, DVPP first decodes, crops, and resizes the images or videos. However, due to DVPP hardware restrictions, the image format and resolution output by DVPP may not meet the model's input requirements, so AIPP is then used to perform color space conversion (CSC), image cropping, and padding.

For example, on Atlas 200/300/500 inference products and Atlas training products, DVPP video decoding outputs only YUV images. If the model requires RGB images, AIPP must perform CSC.

Obtain and output audio data.

Audio Input (AI)

The AI module captures audio data.

Audio Output (AO)

The AO module plays the audio decoded by the ADEC module.

Encode and decode audio data.

Audio Encoder (AENC)

The AENC module encodes the audio obtained by the AI module and outputs audio streams.

Audio Decoder (ADEC)

The ADEC module decodes G.711a, G.711u, and other audio streams, and the decoded audio is played through the AO module.
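To make the combined DVPP and AIPP flow concrete, the following C code is a CPU reference for the kind of work AIPP does in hardware when DVPP decoding outputs YUV420SP but the model needs normalized RGB input. It converts an NV12 image to RGB and then applies per-channel mean subtraction and factor multiplication. It is a minimal sketch, not the AIPP implementation, and it rests on several assumptions: the NV12 layout (U before V, tightly packed rows with no stride padding), full-range BT.601 (JFIF) conversion coefficients (the actual AIPP CSC matrix is configurable), the normalization form out = (in - mean - min) * var_reci, and the hypothetical norm_params struct, whose field names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel parameters; NOT the real AIPP configuration struct. */
typedef struct {
    float mean[3];     /* value subtracted from each channel */
    float min[3];      /* additional per-channel offset */
    float var_reci[3]; /* per-channel multiplication factor */
} norm_params;

/* Round and clamp a float to the 0..255 range of an 8-bit sample. */
static uint8_t clamp_u8(float v) {
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/* Full-range BT.601 (JFIF) YUV -> RGB for one pixel (assumed coefficients). */
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v, uint8_t rgb[3]) {
    float d = (float)u - 128.0f;
    float e = (float)v - 128.0f;
    rgb[0] = clamp_u8((float)y + 1.402f * e);
    rgb[1] = clamp_u8((float)y - 0.344136f * d - 0.714136f * e);
    rgb[2] = clamp_u8((float)y + 1.772f * d);
}

/* Mean subtraction and factor multiplication: out = (in - mean - min) * var_reci. */
static void normalize_rgb(const uint8_t rgb[3], const norm_params *p, float out[3]) {
    for (int c = 0; c < 3; ++c)
        out[c] = ((float)rgb[c] - p->mean[c] - p->min[c]) * p->var_reci[c];
}

/* Convert a width x height NV12 image into planar, normalized float RGB.
 * `out` must hold 3 * width * height floats (R plane, then G, then B). */
static void nv12_to_normalized_rgb(const uint8_t *nv12, int width, int height,
                                   const norm_params *p, float *out) {
    /* NV12 chroma is subsampled 2x2, so both dimensions must be even. */
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const size_t plane = (size_t)width * (size_t)height;
    const uint8_t *y_plane = nv12;
    const uint8_t *uv_plane = nv12 + plane; /* interleaved U,V at half resolution */
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            size_t idx = (size_t)row * (size_t)width + (size_t)col;
            /* Each 2x2 block of luma samples shares one U,V pair. */
            size_t uv = (size_t)(row / 2) * (size_t)width + (size_t)(col / 2) * 2;
            uint8_t rgb[3];
            float n[3];
            yuv_to_rgb(y_plane[idx], uv_plane[uv], uv_plane[uv + 1], rgb);
            normalize_rgb(rgb, p, n);
            for (int c = 0; c < 3; ++c)
                out[(size_t)c * plane + idx] = n[c];
        }
    }
}
```

A reference like this can help check preprocessed model inputs on the host. On the device, the same kinds of parameters are supplied through static AIPP at ATC conversion time or through dynamic AIPP acl API calls, rather than through code like this.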

Function Support

The following table lists the media data processing V2 functions supported by each product model.

The identifiers have the following meanings:
  • √: Supported
  • x: Not supported

Model

VPC

JPEGD

JPEGE

PNGD

VDEC

VENC

Video Data Obtaining

VPSS Video Processing

Audio Functions (Recording/Playing/Volume Adjustment)

Video Data Display

Region Management

Atlas training products

x

x

x

x

x

x

x

x

x

x

x

Atlas inference products

x

x

x

x

x

Atlas 200I/500 A2 inference products

Atlas A2 training products / Atlas A2 inference products

x

x

x

x

x

x

Atlas A3 training products / Atlas A3 inference products

x

x

x

x

x

x

Restrictions

When using the APIs described in this chapter, note the following:

  • About memory allocation and deallocation:
    1. Before using some media data processing functions, such as VPC, JPEGD, and JPEGE, you need to allocate memory for the input and output data. Media data processing has stricter requirements on this memory, so you must call the dedicated memory allocation APIs. For details, see the restrictions in the section for each function. If multiple functions are cascaded and reuse the same memory segment, allocate the largest size required among them.
    2. The memory allocated in 1 can also be used for tasks other than media data processing. For example, the output of media data processing can be used directly as the input of model inference, enabling memory reuse and reducing memory copies.
    3. The address space accessible to media data processing is limited. To ensure sufficient memory for media data processing, you are advised to allocate memory for other functions (for example, model loading) by calling other memory allocation APIs, such as aclrtMalloc.
  • About channel requirements:

    Before using any media data processing function, you must call APIs to create the corresponding channel. For the channel creation and destruction APIs and the maximum number of channels, see Vision Preprocessing Core (VPC), Video Decoder (VDEC) and JPEG Decoder (JPEGD), Video Encoding (VENC) and JPEG Encoder (JPEGE), and PNG Decoder (PNGD).

    Channel creation and destruction involve resource allocation and release, so creating and destroying channels repeatedly degrades service performance. Manage channels based on your actual scenario. For example, to process VPC images continuously, create the VPC channel once, call all the required VPC functions, and then destroy the channel.

    Too many channels increase CPU and memory usage on the device. For details about the number of channels, see the performance specifications in the corresponding function sections.

  • Reserved fields in the structs and enumerations described in this chapter must be manually set to 0 to avoid incompatibility with future versions.

    Reserved fields in the structs and enumerations carry a _BUTT suffix, for example, HI_COMPRESS_MODE_BUTT.
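The memory, channel, and reserved-field rules above can be illustrated with a small, self-contained C sketch. Every type and function in it (chn_attr, channel_create, process_frames, and so on) is hypothetical and only stands in for the real media data processing APIs, whose names and signatures are given in the individual function sections. It shows three habits: sizing one reusable buffer to the largest requirement among cascaded stages, zeroing an attribute struct with memset so that its reserved fields are 0, and creating a channel once for a continuous job instead of once per frame. The YUV420SP size formula (width * height * 3 / 2) ignores any width and height alignment that the dedicated allocation APIs may require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical channel attributes; real structs differ. */
typedef struct {
    uint32_t width;
    uint32_t height;
    uint32_t reserved[4]; /* must be 0 for forward compatibility */
} chn_attr;

/* Hypothetical channel handle. */
typedef struct {
    int created;
    int frames_processed;
} channel;

static int g_create_calls = 0; /* counts how many channels were created */

/* Hypothetical create: rejects attributes whose reserved fields are not 0. */
static int channel_create(channel *ch, const chn_attr *attr) {
    for (size_t i = 0; i < sizeof attr->reserved / sizeof attr->reserved[0]; ++i)
        if (attr->reserved[i] != 0) return -1;
    ch->created = 1;
    ch->frames_processed = 0;
    ++g_create_calls;
    return 0;
}

/* Hypothetical per-frame processing call on an existing channel. */
static int channel_process(channel *ch) {
    if (!ch->created) return -1;
    ++ch->frames_processed;
    return 0;
}

static void channel_destroy(channel *ch) { ch->created = 0; }

/* YUV420SP (for example, NV12) frame size without alignment padding. */
static size_t yuv420sp_size(uint32_t w, uint32_t h) {
    assert(w % 2 == 0 && h % 2 == 0); /* 2x2 chroma subsampling */
    return (size_t)w * h * 3 / 2;
}

/* Size for one buffer reused by cascaded stages: the largest requirement. */
static size_t cascade_buffer_size(const size_t *sizes, size_t n) {
    size_t max = 0;
    for (size_t i = 0; i < n; ++i)
        if (sizes[i] > max) max = sizes[i];
    return max;
}

/* Create the channel once, process every frame, then destroy it once. */
static int process_frames(int frame_count, uint32_t w, uint32_t h, int *processed) {
    chn_attr attr;
    memset(&attr, 0, sizeof attr); /* zeroes all fields, including reserved ones */
    attr.width = w;
    attr.height = h;
    channel ch;
    if (channel_create(&ch, &attr) != 0) return -1;
    for (int i = 0; i < frame_count; ++i) {
        if (channel_process(&ch) != 0) {
            channel_destroy(&ch);
            return -1;
        }
    }
    *processed = ch.frames_processed;
    channel_destroy(&ch);
    return 0;
}
```

In real code, the buffer would come from the dedicated media memory allocation API, and the frame loop would call the actual processing API of the function in use between channel creation and destruction.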