Development Using APIs (Python)
The samples described in this section apply to the Atlas inference product and
Sample Overview
The following uses an Atlas inference product as an example to demonstrate how to develop an image classification application with the Vision SDK Python APIs. Figure 1 shows the inference process of the image classification model. The example uses a ResNet-50 model from the Caffe framework.
Preparations
- Install and deploy Vision SDK, and then perform the quick start sample.
Table 1 Required dependencies

| Dependency | Version | Link |
|---|---|---|
| OS | For details, see Supported Hardware and Operating Systems. | - |
| System dependency | - | |
| CANN development kit | 8.1.RC1 | |
| npu-driver | Ascend HDK 25.0.RC1 | Click here. In the Select Resource area on the left, filter the required software packages, confirm the version information, and download the software packages. For details, see Driver and Firmware Installation and Upgrade Guides of each hardware product. |
| npu-firmware | Ascend HDK 25.0.RC1 | Same as npu-driver. |
| NumPy | 1.25.2 | pip3 install numpy==1.25.2 |
| opencv-python | 4.9.0.80 | pip3 install opencv-python==4.9.0.80 |
| Python | 3.9.2 | You are advised to obtain the source package for compilation and installation. For details about the installation procedure, see Installing Python. |
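The Python package versions in Table 1 can be checked from a script before the sample is run. The following is a minimal sketch that uses only the standard library; the names `REQUIRED` and `check_versions` are illustrative and are not part of Vision SDK.

```python
from importlib import metadata

# Versions copied from Table 1; the dictionary name is only illustrative.
REQUIRED = {"numpy": "1.25.2", "opencv-python": "4.9.0.80"}


def check_versions(required, lookup=metadata.version):
    """Return {package: (required, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in required.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # The package is not installed at all.
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(REQUIRED).items():
        print(f"{name}: required {wanted}, found {found}")
```

An empty result means that both packages are installed at the listed versions. The `lookup` parameter exists only so that the function can be exercised without the packages being installed.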
- Obtain the sample code.
- Log in to the development environment where Vision SDK is installed, and upload the sample code package.
- Decompress the sample code package and go to the decompressed directory.
    unzip resnet50_sdk_python_sample.zip
    cd resnet50_sdk_python_sample
The following is an example of the directory structure of the sample code:
    |-- resnet50_sdk_python_sample
    |   |-- main.py
    |   |-- README.md
    |   |-- run.sh                              # Script for running the program. Before running it, you are advised to convert its line endings by running the dos2unix run.sh command.
    |   |-- data
    |   |   |-- test.jpg                        # Test image
    |   |-- model
    |   |   |-- resnet50.caffemodel
    |   |   |-- resnet50.prototxt
    |   |-- utils
    |   |   |-- resnet50.cfg
    |   |   |-- resnet50_clsidx_to_labels.names
- Prepare image data for inference.
Use the test.jpg image provided with the sample, or use your own image renamed to test.jpg.
Figure 2 test.jpg
Code Parsing
The key steps and code of this sample are described below. The snippets are excerpts and cannot be copied and run directly. For the complete sample code, see the sample file.
- Import the third-party libraries and the Vision SDK modules required for model inference in the main.py file.

    import numpy as np  # Multi-dimensional array operations.
    import cv2  # Third-party library used for image preprocessing and postprocessing.
    from mindx.sdk import Tensor  # Tensor data structure in Vision SDK.
    from mindx.sdk import base  # Vision SDK inference API.
    from mindx.sdk.base import post  # post.Resnet50PostProcess is the ResNet-50 postprocessing API.
The main flow of the program is as follows:

    if __name__ == "__main__":
        base.mx_init()  # Initialize Vision SDK resources.
        process()  # Main logic of the program.
        base.mx_deinit()  # Deinitialize Vision SDK resources.
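In the main flow above, an exception raised inside process() would skip the deinitialization call. If deinitialization should also run on failure, a try/finally wrapper is one option. The following is a minimal sketch; `run_with_sdk` is a hypothetical helper, not a Vision SDK API, and it takes the init and deinit functions as arguments so that it can run without the SDK installed.

```python
def run_with_sdk(init, deinit, work):
    """Call init(), then work(), and always call deinit() afterwards.

    Hypothetical helper: in main.py, init and deinit would be
    base.mx_init and base.mx_deinit, and work would be process.
    """
    init()
    try:
        return work()
    finally:
        deinit()  # Runs even if work() raises.


if __name__ == "__main__":
    calls = []
    result = run_with_sdk(
        lambda: calls.append("init"),
        lambda: calls.append("deinit"),
        lambda: "done",
    )
    print(result, calls)  # done ['init', 'deinit']
```

In the sample, the equivalent call would be run_with_sdk(base.mx_init, base.mx_deinit, process).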
- Configure model-related variables, such as paths of the image, model, configuration file, and label.
    '''Configure model-related variables.'''
    pic_path = 'data/test.jpg'  # Single image
    model_path = "model/resnet50.om"  # Model path
    device_id = 0  # Specify the device for calculation.
    config_path = 'utils/resnet50.cfg'  # Postprocessing configuration file
    label_path = 'utils/resnet50_clsidx_to_labels.names'  # Class label file
    img_size = 256  # Side length to which the image is resized before cropping.
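Because every later step depends on these paths, it can help to confirm that the files exist before inference starts. The following is a minimal sketch that uses only the standard library; `missing_files` is a hypothetical helper, not a Vision SDK API.

```python
from pathlib import Path


def missing_files(paths, root="."):
    """Return the entries of paths that are not existing files under root."""
    base_dir = Path(root)
    return [p for p in paths if not (base_dir / p).is_file()]


if __name__ == "__main__":
    # Paths copied from the configuration above.
    required = [
        "data/test.jpg",
        "model/resnet50.om",
        "utils/resnet50.cfg",
        "utils/resnet50_clsidx_to_labels.names",
    ]
    for path in missing_files(required):
        print(f"missing: {path}")
```

The script prints nothing when all four files are present in the sample directory.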
- Preprocess the input data. Use OpenCV to read the image as a three-dimensional array. Convert the color space, resize and center-crop the image, subtract the per-channel constants, and convert the result to the data format (tensor type) required for inference.

    '''Preprocessing'''
    img_bgr = cv2.imread(pic_path)
    img_rgb = img_bgr[:, :, ::-1]  # Reverse the channel order (BGR to RGB).
    img = cv2.resize(img_rgb, (img_size, img_size))  # Resize the image to the target size.
    hw_off = (img_size - 224) // 2  # Margin for cropping the central 224 x 224 area.
    crop_img = img[hw_off:img_size - hw_off, hw_off:img_size - hw_off, :]
    img = crop_img.astype("float32")  # Convert the image to the float32 data type.
    img[:, :, 0] -= 104  # Subtract the per-channel constants 104, 117, and 123 required by the Caffe model.
    img[:, :, 1] -= 117
    img[:, :, 2] -= 123
    img = np.expand_dims(img, axis=0)  # Add a leading batch dimension to match the model input.
    img = img.transpose([0, 3, 1, 2])  # Convert (batch, height, width, channels) to (batch, channels, height, width).
    img = np.ascontiguousarray(img)  # Make the array contiguous in memory.
    img = Tensor(img)  # Convert the NumPy array to the Tensor class.
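The array operations in this step can be tried without the SDK or an image file. The following sketch assumes a synthetic 256 x 256 array in place of the resized image and skips the OpenCV and Tensor calls; it only shows how the crop, the per-channel subtraction, and the layout change affect the shape and values.

```python
import numpy as np

img_size, crop_size = 256, 224

# Synthetic stand-in for the resized image (height, width, channels).
img = np.full((img_size, img_size, 3), 128, dtype=np.uint8)

# Center crop: drop the same margin on each side.
off = (img_size - crop_size) // 2
crop = img[off:img_size - off, off:img_size - off, :].astype("float32")

# Subtract one constant per channel; broadcasting applies it to every pixel.
crop -= np.array([104, 117, 123], dtype=np.float32)

# Add a batch axis, reorder NHWC to NCHW, and make the memory contiguous.
batch = np.ascontiguousarray(np.expand_dims(crop, axis=0).transpose(0, 3, 1, 2))

print(batch.shape, batch.dtype)  # (1, 3, 224, 224) float32
```

The single broadcast subtraction is equivalent to the three per-channel statements in the sample.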
- Call the model.infer() API to perform model inference and obtain the model output result.
    '''Model inference'''
    model = base.model(modelPath=model_path, deviceId=device_id)  # Initialize the base.model class.
    output = model.infer([img])[0]  # Perform inference. The input is a List[base.Tensor], and the inference output is returned as a List[base.Tensor].
- Postprocess the model output. The postprocessing module provided by Vision SDK can be used to obtain the prediction class and its confidence score, and display them on the source image.
    '''Postprocessing'''
    postprocessor = post.Resnet50PostProcess(config_path=config_path, label_path=label_path)  # Obtain the postprocessing object.
    pred = postprocessor.process([output])[0][0]  # Use the Vision SDK API to perform postprocessing. pred: <ClassInfo classId=... confidence=... className=...>
    confidence = pred.confidence  # Obtain the class confidence score.
    className = pred.className  # Obtain the class name.
    print('{}: {}'.format(className, confidence))  # Print the result.

    '''Save the inference image.'''
    img_res = cv2.putText(img_bgr, f'{className}: {confidence:.2f}', (20, 20), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)  # Add the predicted class and confidence score to the image.
    cv2.imwrite('result.png', img_res)
    print('save infer result success')
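Conceptually, classification postprocessing reduces a vector of class scores to one class name and a confidence value. The following sketch illustrates that idea in plain Python. It is not the implementation of post.Resnet50PostProcess: whether the SDK applies softmax depends on its configuration file, which is not shown here, and the scores and labels below are hypothetical.

```python
import math


def top1(scores, labels):
    """Return (label, softmax probability) of the highest-scoring class."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # Subtract the max for numerical stability.
    total = sum(exps)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], exps[best] / total


if __name__ == "__main__":
    # Hypothetical scores and labels, for illustration only.
    label, prob = top1([1.0, 3.0, 0.5], ["cat", "poodle", "car"])
    print(f"{label}: {prob:.2f}")
```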
Inference Running
- Configure environment variables. (The default CANN installation path /usr/local/Ascend/cann and the Vision SDK installation path /usr/local/Ascend/mxVision-{version} are used as examples.)
    source /usr/local/Ascend/cann/set_env.sh
    source /usr/local/Ascend/mxVision-{version}/set_env.sh
- Perform inference.
    bash run.sh

If the following information is displayed, the run is successful:

    Standard Poodle: 0.98583984375
    save infer result success
After the inference is complete, a result.png file is generated in the current folder. The image result, as shown in Figure 3, displays the class label and confidence score of the image.
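If the printed result needs to be consumed by another script, the first output line can be split into the class name and the confidence score. The following is a minimal sketch; `parse_result_line` is a hypothetical helper, and it assumes the "className: confidence" format shown in the output above.

```python
def parse_result_line(line):
    """Split 'className: confidence' into (str, float)."""
    # rpartition splits at the last ': ', so class names containing ': ' survive.
    name, _, score = line.rpartition(": ")
    return name, float(score)


print(parse_result_line("Standard Poodle: 0.98583984375"))
# ('Standard Poodle', 0.98583984375)
```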

