Development Using APIs (C++)

The samples described in this section apply to the Atlas inference product and the Atlas 200I/500 A2 inference product.

Sample Overview

The following uses the Atlas inference product as an example to demonstrate how to develop an image object detection application with the Vision SDK C++ APIs. Figure 1 shows the inference process of an image object detection model. A YOLOv3 model trained with the TensorFlow framework is used as an example.

Figure 1 Workflow of the image object detection model

Preparations

  1. Install and deploy Vision SDK, and then run the quick start sample.
    Table 1 Required dependencies

    Dependency | Version | Link
    ---------- | ------- | ----
    OS | For details, see Supported Hardware and Operating Systems. | -
    System dependency | - | Ubuntu or CentOS
    CANN development kit | 8.1.RC1 | Click here to download CANN.
    npu-driver | Ascend HDK 25.0.RC1 | Click here. In the Select Resource area on the left, filter the required software packages, confirm the version information, and download the software packages. For details, see Driver and Firmware Installation and Upgrade Guides of each hardware product.
    npu-firmware | Ascend HDK 25.0.RC1 | Same as npu-driver.
    NumPy | 1.25.2 | pip3 install numpy==1.25.2
    
  2. Obtain the sample code.

    Click here to obtain the sample code package.

  3. Log in to the development environment where Vision SDK is installed, and upload the sample code package.
  4. Decompress the sample code package and go to the decompressed directory.
    unzip YoloV3Infer.zip
    cd YoloV3Infer
    

    The following is an example of the directory structure of the sample code.

    YoloV3Infer
    ├── model
    │ ├── yolov3.names                    # YOLOv3 postprocessing label file
    │ ├── yolov3_tf_bs1_fp16.cfg          # YOLOv3 postprocessing configuration file
    │ └── aipp_yolov3_416_416.aippconfig  # YOLOv3.om model AIPP conversion file
    ├── main.cpp                  # Main program file
    ├── CMakeLists.txt           
    ├── run.sh               # Script for running the program. Before running it, format the script with the dos2unix tool (dos2unix run.sh).
    ├── README.md                 
    ├── test.jpg                  # Test image prepared by the user
    
  5. Prepare the yolov3_tf.pb model for inference by referring to section "Preparing a Model" in README.md (located in the directory decompressed in step 4).
  6. Prepare image data for inference.

    Use your own image for the test and rename it test.jpg. The following image is used for demonstration.

    Figure 2 test.jpg

If commands such as CMake are unavailable on the openEuler system, see System Commands Yum and Cmake Are Unavailable to resolve the issue.

Code Parsing

The key steps and code of this sample are described below. The snippets are excerpts and cannot be copied directly for compilation or running. For the complete sample code, see the sample file.

  1. Initialize resources and configure model-related variables, such as paths of the model, configuration file, and label.
    // Initialize resources and variables.
    const uint32_t YOLOV3_RESIZE = 416; // Image resizing size
    
    std::string yolov3ModelPath = "./model/yolov3_tf_bs1_fp16.om"; // Model path (OM model file automatically generated after the run.sh script is executed. The model file is stored in the ./model directory)
    std::string yolov3ConfigPath = "./model/yolov3_tf_bs1_fp16.cfg"; // Postprocessing configuration file path
    std::string yolov3LabelPath = "./model/yolov3.names"; // Postprocessing label file path
    
    v2Param.deviceId = 0; // Configuration
    v2Param.labelPath = yolov3LabelPath;
    v2Param.configPath = yolov3ConfigPath;
    v2Param.modelPath = yolov3ModelPath;
    APP_ERROR ret = MxBase::MxInit();
    
  2. Preprocess the input data. With resources initialized by MxInit in the previous step, construct the ImageProcessor object, decode the image to obtain an Image object, resize the image, and convert it into the data format (tensor type) required for inference.
    // Preprocessing
    // Construct the image processing class.
    MxBase::ImageProcessor imageProcessor(deviceId); 
    
    // Construct the decoded image class.
    MxBase::Image decodedImage;
    // Perform decoding based on the image path.
    ret = imageProcessor.Decode(imgPath, decodedImage, MxBase::ImageFormat::YUV_SP_420);
    
    MxBase::Image resizeImage;
    // Resizing size
    MxBase::Size resizeConfig(YOLOV3_RESIZE, YOLOV3_RESIZE);
    // Perform resizing.
    ret = imageProcessor.Resize(decodedImage, resizeConfig, resizeImage, MxBase::Interpolation::HUAWEI_HIGH_ORDER_FILTER);
    
    std::string path = "./resized_yolov3_416.jpg";
    // Encode the resized image and output it to the specified path.
    ret = imageProcessor.Encode(resizeImage, path);
    
    // Convert the Image object to a Tensor.
    MxBase::Tensor tensorImg = resizeImage.ConvertToTensor();
    // Set the ID of the device where Tensor is located.
    ret = tensorImg.ToDevice(deviceId);
    
  3. Construct the model object, pass in the tensor built during preprocessing, call the Infer API, and obtain the model output yoloV3Outputs.
    // Model inference
    // Construct the model class.
    MxBase::Model yoloV3(modelPath, deviceId);
    
    // Construct the batch tensor as the input parameter of the Infer API.
    std::vector<MxBase::Tensor> yoloV3Inputs = {tensorImg};
    // Perform model inference.
    std::vector<MxBase::Tensor> yoloV3Outputs = yoloV3.Infer(yoloV3Inputs);
    
  4. Postprocess the model output. The postprocessing module provided by Vision SDK (or developed by yourself) can be used to obtain the bounding box and object class, and display them on the source image through OpenCV.
    // Postprocessing
    // Postprocessing source image information
    MxBase::ImageInfo imageInfo;
    imageInfo.oriImagePath = argv[1];
    imageInfo.oriImage = decodedImage;
    // Execute the postprocessing function.
    ret = YoloV3PostProcess(imageInfo, v2Param.configPath, v2Param.labelPath, yoloV3Outputs);
    
    // Main logic of the postprocessing function of YoloV3PostProcess
    // Create postprocessing configuration information.
    std::map<std::string, std::string> postConfig;
    postConfig.insert(std::pair<std::string, std::string>("postProcessConfigPath", yoloV3ConfigPath));
    postConfig.insert(std::pair<std::string, std::string>("labelPath", yoloV3LabelPath));
    
    // Initialize the postprocessing class.
    MxBase::Yolov3PostProcess yolov3PostProcess;
    APP_ERROR ret = yolov3PostProcess.Init(postConfig);
    
    // Postprocessing
    std::vector<MxBase::TensorBase> tensors;
    // Construct object detection information based on the model inference result. The information is required for postprocessing implemented by Vision SDK.
    // If the postprocessing function is user-defined, construct it based on the actual situation.
    std::vector<std::vector<MxBase::ObjectInfo>> objectInfos;
    auto shape = yoloV3Outputs[0].GetShape();
    MxBase::ResizedImageInfo imgInfo;
    // Image width prior to resizing
    imgInfo.widthOriginal = imageInfo.oriImage.GetOriginalSize().width;
    // Image height prior to resizing
    imgInfo.heightOriginal = imageInfo.oriImage.GetOriginalSize().height;
    // Image width after resizing.
    imgInfo.widthResize = YOLOV3_RESIZE;
    // Image height after resizing.
    imgInfo.heightResize = YOLOV3_RESIZE;
    // Image resizing mode.
    imgInfo.resizeType = MxBase::RESIZER_STRETCHING;
    std::vector<MxBase::ResizedImageInfo> imageInfoVec = {};
    imageInfoVec.push_back(imgInfo);
    // Perform postprocessing.
    ret = yolov3PostProcess.Process(tensors, objectInfos, imageInfoVec);
    // Use OpenCV to visualize the bounding box.
    cv::putText(imgBgr, objectInfos[i][j].className, cv::Point(x0 + 10, y0 + 10), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 255, 0), 4, 8);
    cv::rectangle(imgBgr, cv::Rect(x0, y0, x1 - x0, y1 - y0), cv::Scalar(0, 255, 0), 4);
    // Perform model postprocessing deinitialization.
    ret = yolov3PostProcess.DeInit();
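
The postprocessing step passes both the original and the resized image dimensions, together with the stretch resizing mode. The following minimal sketch illustrates why both sizes are needed: a box detected on the 416 x 416 input must be scaled back to the original image before it is drawn. The Box and StretchInfo types and the MapToOriginal function are hypothetical stand-ins written for this illustration. They are not Vision SDK APIs, and the SDK's own postprocessing may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a detection box (not a Vision SDK type).
struct Box {
    float x0;
    float y0;
    float x1;
    float y1;
};

// Hypothetical stand-in carrying the same four sizes as the resize information above.
struct StretchInfo {
    uint32_t widthOriginal;
    uint32_t heightOriginal;
    uint32_t widthResize;
    uint32_t heightResize;
};

// Clamp a value to the range [0, upper].
static float ClampTo(float value, float upper)
{
    return std::min(std::max(value, 0.0f), upper);
}

// Map a box from resized-image coordinates to original-image coordinates,
// assuming a plain stretch: each axis is scaled independently.
Box MapToOriginal(const Box& box, const StretchInfo& info)
{
    assert(info.widthResize > 0 && info.heightResize > 0);
    const float scaleX = static_cast<float>(info.widthOriginal) / static_cast<float>(info.widthResize);
    const float scaleY = static_cast<float>(info.heightOriginal) / static_cast<float>(info.heightResize);
    const float maxX = static_cast<float>(info.widthOriginal);
    const float maxY = static_cast<float>(info.heightOriginal);

    Box mapped;
    mapped.x0 = ClampTo(box.x0 * scaleX, maxX);
    mapped.y0 = ClampTo(box.y0 * scaleY, maxY);
    mapped.x1 = ClampTo(box.x1 * scaleX, maxX);
    mapped.y1 = ClampTo(box.y1 * scaleY, maxY);
    return mapped;
}
```

Because a stretch does not preserve the aspect ratio, the horizontal and vertical scale factors generally differ. A resize mode that keeps the aspect ratio would also need to account for padding offsets.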
    
  5. Perform deinitialization and destroy resources.
    // Deinitialization
    ret = MxBase::MxDeInit();
    if (ret != APP_ERR_OK) {
        LogError << "MxDeInit failed, ret=" << ret << ".";
        return ret;
    }
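
Every step above returns early when ret indicates a failure, so deinitialization can be skipped on an error path. One common C++ way to guarantee cleanup is a scope guard, sketched below. This is a generic illustration and not part of the sample: ScopeExit, FakeInit, FakeDeInit, and RunPipeline are hypothetical names, and the Fake functions merely stand in for an initialization and deinitialization pair.

```cpp
#include <functional>
#include <utility>

// Runs the stored cleanup callable when the guard leaves scope.
class ScopeExit {
public:
    explicit ScopeExit(std::function<void()> cleanup) : cleanup_(std::move(cleanup)) {}
    ~ScopeExit()
    {
        if (cleanup_) {
            cleanup_();
        }
    }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    std::function<void()> cleanup_;
};

// Hypothetical stand-ins for an init/deinit pair; they only count live initializations.
static int g_liveInits = 0;
int FakeInit()
{
    ++g_liveInits;
    return 0;
}
int FakeDeInit()
{
    --g_liveInits;
    return 0;
}

// Returns 0 on success, -1 if initialization fails, and -2 if a simulated middle step fails.
int RunPipeline(bool failMidway)
{
    if (FakeInit() != 0) {
        return -1;
    }
    // From here on, FakeDeInit runs on every return path.
    ScopeExit guard([] { FakeDeInit(); });
    if (failMidway) {
        return -2;
    }
    return 0;
}
```

The trade-off is that the guard discards the return code of the cleanup call. If that code must be checked and logged, as in the snippet above, an explicit call on the success path is still required.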
    

Inference Running

  1. Configure environment variables. (The default CANN installation path /usr/local/Ascend/cann and the Vision SDK installation path /usr/local/Ascend/mxVision-{version} are used as examples.)
    source /usr/local/Ascend/cann/set_env.sh
    source /usr/local/Ascend/mxVision-{version}/set_env.sh
  2. Perform inference. Before running the inference script, modify the MX_SDK_HOME variable in CMakeLists.txt based on the Vision SDK installation path.
    bash run.sh
    

    If the following information is displayed, the run is successful:

    yoloV3Outputs len=3
    ******YoloV3PostProcess******
    Size of objectInfos is 1
    objectInfo-0 ,Size:1
    *****objectInfo-0:0
    x0 is 410.738
    y0 is 27.4772
    x1 is 948.388
    y1 is 645.941
    confidence is 0.758505
    classId is 16
    className is dog
    ******YoloV3PostProcess end******
    

    After inference is complete, a result.jpg file is generated in the current folder. As shown in Figure 3, the image displays the bounding box and class of the detected object.

    Figure 3 result.jpg
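
In the sample output, classId 16 is reported with className dog, which is consistent with a zero-based index into the label file. The following minimal sketch shows how such a lookup could work. It assumes that the label file holds one class name per line and that lines starting with # are comments. This is an assumption about the file format, not a description of how Vision SDK parses yolov3.names. LoadLabels, LoadLabelsFromText, and ClassName are hypothetical helper names.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one class name per line, skipping blank lines and lines starting with '#'.
std::vector<std::string> LoadLabels(std::istream& in)
{
    std::vector<std::string> labels;
    std::string line;
    while (std::getline(in, line)) {
        // Strip a trailing carriage return left by Windows line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        if (line.empty() || line[0] == '#') {
            continue;
        }
        labels.push_back(line);
    }
    return labels;
}

// Convenience wrapper that parses label text held in memory.
std::vector<std::string> LoadLabelsFromText(const std::string& text)
{
    std::istringstream in(text);
    return LoadLabels(in);
}

// Return the class name for classId, or "unknown" when the ID is out of range.
std::string ClassName(const std::vector<std::string>& labels, int classId)
{
    if (classId < 0 || static_cast<std::size_t>(classId) >= labels.size()) {
        return "unknown";
    }
    return labels[static_cast<std::size_t>(classId)];
}
```

To read an actual file, pass a std::ifstream opened on the label path to LoadLabels. Returning "unknown" for an out-of-range ID keeps a mismatched label file from causing an out-of-bounds access.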