JPEGD+VPC+Model Inference (Atlas 200/300/500 Inference Product)
When the image decoding, image cropping/resizing, and model inference functions are cascaded, inference accuracy may be compromised by incorrect API calls or configuration. This section provides suggestions for this case.
Symptom
When the JPEG Decoder (JPEGD), Vision Preprocessing Core (VPC), and model inference functions are cascaded, network inference accuracy may deteriorate because of deviations between JPEGD and VPC, or between VPC and model inference, caused by width/height alignment and output image format configuration issues.
Accuracy Optimization Suggestions
When the JPEGD, VPC, and model inference functions are cascaded, the possible issues and countermeasures are as follows:
- During JPEGD+VPC cascading:
The width stride and height stride of the JPEGD output image must be multiples of 128 and 16, respectively, so the output may contain invalid padding. Before calling acldvppVpcResizeAsync, call acldvppSetPicDescWidth and acldvppSetPicDescHeight to set the original width and height of the input image. Based on these values, VPC automatically crops and resizes the image, eliminating the impact of the invalid data on image accuracy.
To improve model inference accuracy, you are advised to write the code logic by referring to the positive examples shown in Figure 1 and Figure 2.
In the negative example in Figure 1, the aligned width and height are sent directly to VPC after JPEGD decoding. As a result, the input image for model inference contains invalid data, which may affect the accuracy.
For details about the API call sequences of JPEGD and VPC, see JPEGD API Call Sequence and VPC API Call Sequence.
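The JPEGD stride rules above can be sketched as simple alignment arithmetic. The following is an illustrative Python sketch, not AscendCL code; the function names are made up, and only the 128/16 alignment values come from the constraint described above.

```python
def align_up(value, alignment):
    """Round value up to the nearest multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def jpegd_output_strides(width, height):
    """JPEGD output buffer: width stride aligned to 128, height stride to 16."""
    return align_up(width, 128), align_up(height, 16)

# Example: a 1000 x 562 JPEG is decoded into a 1024 x 576 buffer,
# so 24 padding columns and 14 padding rows hold invalid data.
w_stride, h_stride = jpegd_output_strides(1000, 562)
print(w_stride, h_stride)  # 1024 576

# The picture description passed to VPC must therefore carry the ORIGINAL
# 1000 x 562 (via acldvppSetPicDescWidth/Height in the real C API) so that
# VPC crops away the padding instead of resizing it into the model input.
```

The key point is that the buffer dimensions (strides) and the logical image dimensions are different values and must be set separately.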
- During the use of VPC:
- The image cropping and resizing functions of VPC can be implemented by calling acldvppVpcCropAsync or acldvppVpcBatchCropAsync. The width stride and height stride of the output image must be multiples of 16 and 2, respectively. Otherwise, an error is returned.
You can call acldvppVpcCropAsync for image cropping and acldvppVpcResizeAsync for image resizing. acldvppVpcCropAsync also applies to the scenario where image cropping and resizing are cascaded, and provides better performance in that case.
- The image cropping, resizing, and pasting functions of VPC can be implemented by calling acldvppVpcCropAndPasteAsync or acldvppVpcBatchCropAndPasteAsync. The width stride and height stride of the output image must be multiples of 16 and 2, respectively. Otherwise, an error is returned.
- In the VPC resizing+pasting cascading scenario, if the width of the resized image is not a multiple of 16, VPC pads the image to meet the alignment requirement (see the negative example in Figure 2). To prevent the invalid padding from affecting inference accuracy, resize the image so that its width and height are multiples of 16 and 2, respectively, as in the positive example in Figure 2. For example, if VPC resizes an image with the aspect ratio preserved and the output resolution is 238 x 416, the width is not a multiple of 16; resize the image to 240 x 416 instead.
- For VPC pasting, the left offset of the paste ROI relative to the output canvas must be a multiple of 16. For details, see the positive example in Figure 2. When resizing with the aspect ratio preserved on an object detection network, if the left offset of the paste ROI relative to the output canvas (d3) is rounded to a multiple of 16, the paste ROI may no longer be centered in the output canvas. In this case, note the following relationship: the distance between the BBox and the left border of the paste ROI (d1) equals the distance between the BBox and the left border of the output canvas (d2) minus the left offset (d3), that is, d1 = d2 - d3.
For example, as shown in Figure 2, the output canvas is 416 x 416 and the paste ROI is 240 x 416, so the centered left offset would be (416 - 240)/2 = 88. However, 88 is not a multiple of 16 and must be rounded up to 96. In this case, d1 should be calculated as d2 - 96, not d2 - 88.
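Putting the VPC numbers above together, the resize and paste-offset corrections can be sketched as follows. This is an illustrative Python sketch only: the 416 x 416 canvas, the 238 x 416 resize result, and the 16 x 2 alignment rule come from the text; the helper names are hypothetical.

```python
def align_up(value, alignment):
    """Round value up to the nearest multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

# Figure 2 values from the text: aspect-preserving resize into a 416 x 416
# canvas yields 238 x 416, whose width violates the 16-alignment rule.
canvas_w = 416
paste_w = align_up(238, 16)   # resize to 240 instead of letting VPC pad
paste_h = align_up(416, 2)    # 416 is already a multiple of 2

# The centered left offset would be (416 - 240) / 2 = 88, but the paste
# ROI's left offset must itself be a multiple of 16, so round up to 96.
ideal_offset = (canvas_w - paste_w) // 2
d3 = align_up(ideal_offset, 16)

def bbox_left_in_roi(d2):
    """Map a detection's left edge from canvas coordinates (d2) back to
    paste-ROI coordinates (d1 = d2 - d3): subtract 96, not the ideal 88."""
    return d2 - d3
```

A post-processing step that subtracts the ideal offset (88) instead of the aligned one (96) shifts every bounding box by 8 pixels, which is the kind of silent accuracy loss this section warns about.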
- During model inference:
If hardware-based AIPP is required to perform color space conversion (CSC), the source image format in the AIPP CSC configuration must be the same as the VPC output format. For example, if the VPC output format is YUV420SP but the CSC is configured as YVU420SP to RGB888, the U and V components are read in the wrong order and model inference accuracy drops.
For the API call sequence of model inference, see Model Inference.
For details about the CSC configuration, see ATC Instructions.
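To see why a mismatched CSC source format hurts accuracy: YUV420SP and YVU420SP differ only in the interleave order of the U and V components, so a wrong configuration feeds every pixel's chroma into the wrong matrix inputs. The sketch below demonstrates the effect with the standard studio-range BT.601 conversion; the actual AIPP coefficients are set in the CSC configuration and may differ.

```python
def yuv_to_rgb_bt601(y, u, v):
    """Convert one studio-range BT.601 YUV sample to RGB888."""
    c, d, e = y - 16, u - 128, v - 128
    def clamp(x):
        return max(0, min(255, round(x)))
    r = clamp(1.164 * c + 1.596 * e)
    g = clamp(1.164 * c - 0.392 * d - 0.813 * e)
    b = clamp(1.164 * c + 2.017 * d)
    return r, g, b

# A saturated red pixel in YUV.
y, u, v = 81, 90, 240
correct = yuv_to_rgb_bt601(y, u, v)   # R channel dominates (red)
swapped = yuv_to_rgb_bt601(y, v, u)   # U/V swapped: B dominates (blue)
print(correct, swapped)
```

Every red pixel in the input becomes blue (and vice versa), so the model sees systematically wrong colors even though the image still "looks valid" structurally, which is why the symptom is an accuracy drop rather than an error.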


