Development Using APIs (Python)

The samples described in this section apply to the Atlas inference product and the Atlas 200I/500 A2 inference product.

Sample Overview

The following uses the Atlas inference product as an example to demonstrate how to develop an image classification application with the Vision SDK Python APIs. Figure 1 shows the inference process of the image classification model. A ResNet-50 model from the Caffe framework is used as the example.

Figure 1 Workflow of the image classification model

Preparations

Install and deploy Vision SDK, and then run the quick start sample.

Table 1 Required dependencies

| Dependency | Version | Link |
|---|---|---|
| System dependency | Ubuntu or CentOS | For details, see Supported Hardware and Operating Systems. |
| CANN development kit | 8.1.RC1 | Click here to download CANN. |
| npu-driver | Ascend HDK 25.0.RC1 | Click here. In the Select Resource area on the left, filter the required software packages, confirm the version information, and download the software packages. For details, see Driver and Firmware Installation and Upgrade Guides of each hardware product. |
| npu-firmware | Ascend HDK 25.0.RC1 | Same as npu-driver. |
| NumPy | 1.25.2 | pip3 install numpy==1.25.2 |
| opencv-python | 4.9.0.80 | pip3 install opencv-python==4.9.0.80 |
| Python | 3.9.2 | You are advised to build and install Python from the source package. For details about the installation procedure, see Installing Python. |

Obtain the sample code.
Click here to obtain the sample code package.
Log in to the development environment where Vision SDK is installed, and upload the sample code package.

Decompress the sample code package and go to the decompressed directory.

    unzip resnet50_sdk_python_sample.zip
    cd resnet50_sdk_python_sample

The following is an example of the directory structure of the sample code:

    |-- resnet50_sdk_python_sample
    |   |-- main.py
    |   |-- README.md
    |   |-- run.sh          # Script for running the program. Before running it, you are advised to run dos2unix run.sh to convert the script to Unix line endings.
    |   |-- data
    |   |   |-- test.jpg    # Test image
    |   |-- model
    |   |   |-- resnet50.caffemodel
    |   |   |-- resnet50.prototxt
    |   |-- utils
    |   |   |-- resnet50.cfg
    |   |   |-- resnet50_clsidx_to_labels.names
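The dos2unix step recommended for run.sh converts Windows (CRLF) line endings to Unix (LF) line endings. If the tool is not available, the same normalization can be sketched in Python. The helper names `to_unix_newlines` and `convert_file` are hypothetical and are not part of the sample package.

```python
from pathlib import Path


def to_unix_newlines(data: bytes) -> bytes:
    """Replace Windows (CRLF) line endings with Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite the file in place if it contains CRLF; return True if it changed."""
    target = Path(path)
    original = target.read_bytes()
    converted = to_unix_newlines(original)
    if converted != original:
        target.write_bytes(converted)
        return True
    return False


# Demonstration on an in-memory script body instead of a real file.
print(to_unix_newlines(b"#!/bin/bash\r\necho run\r\n"))
```

Calling `convert_file("run.sh")` from the sample directory would apply the conversion to the script itself.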

Prepare image data for inference.
You can use the test.jpg image provided in the sample or your own image. If you use your own image, rename it to test.jpg.

Figure 2 test.jpg

Code Parsing

The key steps and code of this sample are described below. The snippets are excerpts and cannot be directly copied for compilation or running. For the complete sample code, see the sample file.

In the main.py file, import the third-party libraries required by the sample and the Vision SDK modules required for model inference.

    import numpy as np  # Third-party library for computing on multi-dimensional arrays.
    import cv2  # Third-party library used to preprocess and postprocess images.

    from mindx.sdk import Tensor  # Tensor data structure in Vision SDK
    from mindx.sdk import base  # Vision SDK inference API
    from mindx.sdk.base import post  # post.Resnet50PostProcess is a ResNet50 postprocessing API.

The main process of the program is as follows:

    if __name__ == "__main__":
        base.mx_init()    # Initialize the Vision SDK resource.
        process()         # Main logic of the program
        base.mx_deinit()  # Deinitialize the Vision SDK resource.
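In the main process above, base.mx_deinit() is not reached if process() raises an exception. A minimal sketch of a more defensive pattern wraps the work in try/finally. The helper name `run_with_sdk` is hypothetical, and stand-in functions replace base.mx_init, process, and base.mx_deinit so that the snippet runs without the SDK installed.

```python
from typing import Callable, List


def run_with_sdk(init: Callable[[], None],
                 work: Callable[[], None],
                 deinit: Callable[[], None]) -> None:
    """Run work() between init() and deinit(); deinit() runs even if work() raises."""
    init()
    try:
        work()
    finally:
        deinit()


# Stand-ins that only record the call order (hypothetical, for illustration).
calls: List[str] = []


def failing_work() -> None:
    calls.append("process")
    raise RuntimeError("inference failed")


try:
    run_with_sdk(lambda: calls.append("init"), failing_work,
                 lambda: calls.append("deinit"))
except RuntimeError:
    pass

print(calls)
```

In main.py, the equivalent call would be `run_with_sdk(base.mx_init, process, base.mx_deinit)`.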

Configure model-related variables, such as the paths of the image, model, configuration file, and label file.

    '''Configure model-related variables.'''
    pic_path = 'data/test.jpg'  # Single image
    model_path = "model/resnet50.om"  # Model path
    device_id = 0  # Specify the device for calculation.
    config_path = 'utils/resnet50.cfg'  # Postprocessing configuration file
    label_path = 'utils/resnet50_clsidx_to_labels.names'  # Class label file
    img_size = 256  # Side length of the image after resizing
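Because every later step depends on these paths, it can help to check them before initializing the SDK. The following is a minimal sketch using only the standard library. The helper name `find_missing` is hypothetical, and the paths are the ones configured above, relative to the sample directory.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not point to an existing regular file."""
    return [p for p in paths if not Path(p).is_file()]


required = [
    "data/test.jpg",
    "model/resnet50.om",
    "utils/resnet50.cfg",
    "utils/resnet50_clsidx_to_labels.names",
]
missing = find_missing(required)
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All input files found.")
```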

Preprocess the input data. Use OpenCV to read the image as a three-dimensional array, reverse the channel order, resize and center-crop the image, subtract the per-channel offsets, and convert the result into the data format (tensor type) required for inference.

    '''Preprocessing'''
    img_bgr = cv2.imread(pic_path)  # Read the image as a (height, width, channels) array in BGR order.
    img_rgb = img_bgr[:, :, ::-1]  # Reverse the channel order (BGR to RGB).
    img = cv2.resize(img_rgb, (img_size, img_size))  # Resize the image to the target size.
    hw_off = (img_size - 224) // 2  # Offset of the 224 x 224 center crop.
    crop_img = img[hw_off:img_size - hw_off, hw_off:img_size - hw_off, :]  # Keep the middle area of the image.
    img = crop_img.astype("float32")  # Convert the image to the float32 data type.
    img[:, :, 0] -= 104  # Subtract the per-channel offsets (104, 117, and 123) expected by the Caffe model.
    img[:, :, 1] -= 117
    img[:, :, 2] -= 123
    img = np.expand_dims(img, axis=0)  # Add a batch dimension to match the model input.
    img = img.transpose([0, 3, 1, 2])  # Convert (batch,height,width,channels) to (batch,channels,height,width).
    img = np.ascontiguousarray(img)  # Make the array contiguous in memory.
    img = Tensor(img)  # Convert the NumPy array to the Tensor class.
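To make the shape changes in the preprocessing step easier to follow, the crop, offset subtraction, and layout conversion can be reproduced with NumPy alone. This is a sketch under stated assumptions: a synthetic array stands in for the resized image, the helper name `center_crop_to_nchw` is hypothetical, and the OpenCV and Tensor calls from the sample are left out.

```python
import numpy as np

IMG_SIZE = 256   # Size after resizing, as in the sample
CROP_SIZE = 224  # Model input size, as in the sample


def center_crop_to_nchw(img: np.ndarray, crop: int = CROP_SIZE) -> np.ndarray:
    """Center-crop a square HWC image, subtract channel offsets, return NCHW float32."""
    size = img.shape[0]
    off = (size - crop) // 2
    out = img[off:off + crop, off:off + crop, :].astype("float32")
    out -= np.array([104, 117, 123], dtype="float32")  # Per-channel offsets from the sample
    out = out[np.newaxis].transpose(0, 3, 1, 2)  # Add batch axis, then HWC to CHW
    return np.ascontiguousarray(out)


# Synthetic stand-in for a resized image (assumption: every pixel is 128).
fake = np.full((IMG_SIZE, IMG_SIZE, 3), 128, dtype=np.uint8)
batch = center_crop_to_nchw(fake)
print(batch.shape, batch.dtype)
print(batch[0, :, 0, 0])
```

The offset is (256 - 224) // 2 = 16, so the crop keeps rows and columns 16 to 239, which matches the slicing used in the sample.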

Call the model.infer() API to perform model inference and obtain the model output.

    '''Model inference'''
    model = base.model(modelPath=model_path, deviceId=device_id)  # Initialize the base.model class.
    output = model.infer([img])[0]  # Perform inference. The input is a List[base.Tensor] and the return value is a List[base.Tensor]; the first output is used.

Postprocess the model output. The postprocessing module provided by Vision SDK can be used to obtain the prediction class and its confidence score, and display them on the source image.

    '''Postprocessing'''
    postprocessor = post.Resnet50PostProcess(config_path=config_path, label_path=label_path)  # Obtain the postprocessing object.
    pred = postprocessor.process([output])[0][0]  # Use the Vision SDK API to perform postprocessing. pred: <ClassInfo classId=... confidence=... className=...>
    confidence = pred.confidence  # Obtain the class confidence score.
    className = pred.className  # Obtain the class name.
    print('{}: {}'.format(className, confidence))  # Print the result.

    '''Save the inference image.'''
    img_res = cv2.putText(img_bgr, f'{className}: {confidence:.2f}', (20, 20), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)  # Add the predicted class and confidence score to the image.
    cv2.imwrite('result.png', img_res)
    print('save infer result success')
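Conceptually, obtaining the prediction class and confidence score amounts to picking the highest-scoring class and looking up its label. The sketch below illustrates that idea with the standard library only. It is not the implementation of post.Resnet50PostProcess, the scores and labels are made up, and whether the SDK applies softmax depends on its configuration, which is not described here.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top1(scores: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return (label, score) for the highest score."""
    idx = max(range(len(scores)), key=scores.__getitem__)
    return labels[idx], scores[idx]


# Made-up scores and labels, for illustration only.
labels = ["tabby cat", "standard poodle", "goldfish"]
probs = softmax([1.0, 5.0, 0.5])
name, conf = top1(probs, labels)
print(f"{name}: {conf:.2f}")
```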

Inference Running

Configure environment variables. (The default CANN installation path /usr/local/Ascend/cann and the Vision SDK installation path /usr/local/Ascend/mxVision-{version} are used as examples.)

    source /usr/local/Ascend/cann/set_env.sh
    source /usr/local/Ascend/mxVision-{version}/set_env.sh

Perform inference.

    bash run.sh

If the following information is displayed, the run was successful:

    Standard Poodle: 0.98583984375
    save infer result success

After inference is complete, a result.png file is generated in the current directory. As shown in Figure 3, the image displays the predicted class label and its confidence score.

Figure 3 result.png
