Overview

API Differences Between Versions

In this document, the V1 and V2 media data processing APIs both implement functions such as image cropping, image resizing, and format conversion. However, the two API sets must not be mixed.
  • V2 has more functions than V1. For example:
    • JPEGE: The APIs in the V2 version support advanced parameter configuration, such as Huffman table configuration.
    • VENC: The APIs in the V2 version support more refined configuration of bit rate control parameters and effect tuning, such as the QP of I-/P-frames and macroblock bit rate control.
    • VDEC: The APIs in the V2 version support more refined memory control, such as the setting of the input stream buffer.
  • The V2 APIs are recommended: they guarantee continuous evolution of API functions and services in later versions.
  • The V1 APIs are retained for backward compatibility but will be deprecated in later versions.

Typical Functions

CANN provides AIPP- and DVPP-based image/video data processing modes. This section focuses on DVPP-based image/video data processing.


Artificial Intelligence Pre-Processing (AIPP)

AIPP implements functions on the AI Core, including image resizing (such as cropping and padding), color space conversion (CSC), mean subtraction, and multiplication by a factor (to transform pixel values).

Static AIPP and dynamic AIPP modes are supported. However, the two modes are mutually exclusive.
  • Static AIPP: If you use this mode and specify the AIPP parameters when converting a model, the AIPP attribute values are saved in the offline model (.om file) after the model is generated. Fixed AIPP configurations are used in each model inference.

    In static AIPP mode, batches share the same set of AIPP parameters. The AIPP parameters are set when the ATC tool is used for model conversion. For details about the ATC tool, see ATC Instructions.

  • Dynamic AIPP: During model conversion, set the AIPP mode to dynamic; then, at inference time, supply different sets of dynamic AIPP parameters as required. In this way, different parameter sets can be used across model inferences.

    If the dynamic AIPP mode is used, multiple batches can use different AIPP parameters. The AIPP parameter values used by each batch are set by calling pyACL APIs. For details, see Dynamic AIPP Model Inference.
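As a concrete illustration of the static AIPP mode, the sketch below shows what an ATC insert-op configuration file might contain for a hypothetical 224 x 224 YUV420SP input with YUV-to-RGB conversion and mean subtraction. The field names follow the ATC AIPP configuration format, but every value here is an illustrative assumption, not a recommendation; check the ATC Instructions for the fields and value ranges supported by your product:

```
aipp_op {
    aipp_mode        : static
    input_format     : YUV420SP_U8
    src_image_size_w : 224
    src_image_size_h : 224

    # Enable YUV -> RGB color space conversion (coefficients illustrative)
    csc_switch   : true
    matrix_r0c0  : 256
    matrix_r0c1  : 0
    matrix_r0c2  : 359
    matrix_r1c0  : 256
    matrix_r1c1  : -88
    matrix_r1c2  : -183
    matrix_r2c0  : 256
    matrix_r2c1  : 454
    matrix_r2c2  : 0
    input_bias_0 : 0
    input_bias_1 : 128
    input_bias_2 : 128

    # Per-channel mean subtraction (values illustrative)
    mean_chn_0 : 124
    mean_chn_1 : 117
    mean_chn_2 : 104
}
```

In static mode this file is passed to ATC at conversion time, and the resulting .om model applies the same AIPP processing to every inference.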

Digital Vision Pre-Processing (DVPP)

DVPP is a built-in image processing unit of the Ascend AI Processor. Through the pyACL media data processing APIs, it provides hardware-accelerated media processing with the following functions:

  • Vision Preprocessing Core (VPC): Processes YUV and RGB images, including resizing, cropping, and CSC.
  • JPEG Decoder (JPEGD): Decodes images from JPEG to YUV.
  • JPEG Encoder (JPEGE): Encodes images from YUV to JPEG.
  • Video Decoder (VDEC): Decodes video streams from H.264/H.265 to YUV/RGB.
  • Video Encoder (VENC): Encodes video streams from YUV420SP to H.264/H.265.
  • PNG Decoder (PNGD): Decodes images from PNG to RGB.
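Several of the functions above produce or consume YUV420SP (NV12/NV21) frames, and DVPP expects their buffers at aligned strides. The helper below sketches the usual size computation; the 16-pixel width and 2-line height alignments are common DVPP constraints but are assumptions here, so check the specifications of your product's VPC/VDEC:

```python
def align_up(value, alignment):
    # Round value up to the nearest multiple of alignment.
    return (value + alignment - 1) // alignment * alignment

def yuv420sp_buffer_size(width, height, w_align=16, h_align=2):
    # YUV420SP (NV12/NV21) stores a full-resolution Y plane plus a
    # half-resolution interleaved UV plane: 1.5 bytes per pixel overall.
    # DVPP typically requires aligned strides; the 16/2 defaults are
    # illustrative assumptions, not guaranteed for every product.
    stride_w = align_up(width, w_align)
    stride_h = align_up(height, h_align)
    return stride_w * stride_h * 3 // 2

size = yuv420sp_buffer_size(1920, 1080)
```

For a 1080p frame both dimensions are already aligned, so this yields 1920 x 1080 x 3 / 2 = 3110400 bytes; an unaligned width such as 1900 would first be padded up to 1904.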
NOTE:

AIPP and DVPP can be used separately or together. In combined applications, DVPP first decodes, crops, and resizes the images or videos. However, due to DVPP hardware restrictions, the image format and resolution output by DVPP may not meet the model requirements, in which case AIPP is needed to further perform color space conversion (CSC), image cropping, and padding.

For example, on the Atlas 200/300/500 Inference Product and Atlas Training Series Product, DVPP video decoding outputs only YUV images. If the model requires RGB images, AIPP color space conversion is required.

Function Support

The following table describes the functions supported by media data processing V2.

Product                             | VPC | JPEGD | JPEGE | PNGD | VDEC | VENC
------------------------------------|-----|-------|-------|------|------|-----
Atlas 200/300/500 Inference Product | x   | x     | x     | x    | x    | x
Atlas Training Series Product       | x   | x     | x     | x    | x    | x

Restrictions

When using the APIs described in this chapter, pay attention to the following points:

  • About memory allocation and deallocation
    1. When memory is needed to store the input or output data of the VPC, JPEGD, and JPEGE functions, call acl.himpi.dvpp_malloc to allocate it and acl.himpi.dvpp_free to free it.
    2. The memory allocated in step 1 can be used both for media data processing and for other tasks. For example, the output of media data processing can serve directly as the input of model inference, enabling memory reuse and reducing memory copies.
    3. Because the address space that media data processing can access is limited, you are advised to call acl.rt.malloc, acl.rt.malloc_host, or acl.rt.malloc_cached (described in section "Memory Management") to allocate memory for other purposes (for example, model loading), so that sufficient memory remains available for media data processing.
  • About channel requirements

    Before implementing each function of media data processing, you must call APIs to create corresponding channels. See the channel creation and destruction APIs in VPC, VDEC/JPEGD APIs, VENC/JPEGE APIs, and PNGD APIs to learn about the API descriptions and the maximum number of channels.

    Channel creation and destruction involve resource allocation and release, so creating and destroying channels repeatedly degrades service performance. You are therefore advised to manage channels based on your actual scenario. For example, to process VPC images continuously, create the VPC channels once, call all the required VPC functions, and only then destroy the channels.

    Creating too many channels increases the CPU and memory usage of the device. For details about channel limits, see the performance specifications in the corresponding function sections.