Running a Model
This section describes the data to be prepared before model execution, the model execution APIs, and the resources to be released after model execution, based on the API call sequence and sample code.
Principles
If network-wide model inference is involved, ensure that your app contains the code logic for model execution. For details about the API call sequence, see AscendCL API Call Sequence. This section describes the API call sequence for executing a model on the entire network. For details about loading and executing a single operator, see Single-Operator Calling.
- Before executing a model after it is loaded, prepare the input and output data structures, and upload the input data to the buffer corresponding to the model's input data structure.
- After model execution is complete, free the buffers and destroy the data structures (including the input data, and data of the aclmdlDesc, aclmdlDataset, and aclDataBuffer types) in a timely manner to avoid memory exceptions. A model may have multiple inputs and outputs. The memory address and size of each input/output are described by data of the aclDataBuffer type. For each input/output, call the aclrtFree API to free the memory, and then call the aclDestroyDataBuffer API to destroy the corresponding aclDataBuffer data.
Model Execution
The key APIs are described as follows:
- Call aclmdlCreateDesc to create data for describing the model.
- Call aclmdlGetDesc to obtain the model description using the model ID returned in Loading a Model.
- Prepare the input and output data structures for model execution. For details, see Preparing Input/Output Data Structure for Model Execution.
To configure dynamic batch size, dynamic image size, dynamic AIPP, or dynamic dimensions (ND format only) for your model input, see Model Inference with Dynamic-Shape Inputs and Dynamic AIPP Model Inference.
- Run model inference.
In static batch size (greater than 1) scenarios, the input data is fed to the model for inference only when it reaches the given batch size. Design your own processing logic for any remainder data that falls short of a full batch, for example by padding it up to the batch size.
Currently, synchronous model inference and asynchronous model inference are supported.
- Synchronous inference
Call aclmdlExecute to execute synchronous inference. The call returns after inference is complete.
- Asynchronous inference
Call aclmdlExecuteAsync to execute asynchronous inference.
Call aclrtSynchronizeStream to wait for the stream tasks to complete.
For details about asynchronous inference, see Asynchronous Model Inference.
- Obtain the results of model inference for subsequent use.
- For synchronous inference, obtain the output data of model inference directly.
- For asynchronous inference with callback, obtain the model inference result from the callback function for subsequent use.
- Free the buffer.
- Destroy data of specific types.
After model inference is complete, call aclDestroyDataBuffer and aclmdlDestroyDataset in sequence to free up the input and output data of the model. If the model takes multiple inputs or outputs, a separate call to aclDestroyDataBuffer is needed for each input or output.
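For the static batch size scenario mentioned above, the remainder-handling logic can be sketched in plain C++, independent of AscendCL. This is one common strategy (pad the final partial batch with dummy samples and discard their results); the function name and return convention are illustrative, not part of the AscendCL API.

```cpp
#include <cstddef>
#include <vector>

// Split sampleCount samples into batches of batchSize. The final batch may be
// partial; the caller pads it with dummy samples so every batch fed to the
// model is full. Returns, for each batch, how many of its entries are real.
std::vector<std::size_t> PlanBatches(std::size_t sampleCount, std::size_t batchSize) {
    std::vector<std::size_t> realPerBatch;
    for (std::size_t done = 0; done < sampleCount; done += batchSize) {
        std::size_t remaining = sampleCount - done;
        realPerBatch.push_back(remaining < batchSize ? remaining : batchSize);
    }
    return realPerBatch;
}
```

With 10 samples and a static batch size of 4, this plans three batches carrying 4, 4, and 2 real samples; inference results for the 2 padded entries in the last batch are simply ignored.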
Preparing Input/Output Data Structure for Model Execution
AscendCL provides the following data types to describe a model, its inputs and outputs, and data buffers; they serve as the input parameters of the model execution call.
- Use data of the aclmdlDesc type to describe the basic information of your model (such as the input/output count, and the name, data type, format, and shape of each input/output).
After a model is successfully loaded, call aclmdlGetDesc to obtain the model description based on the model ID. Then, you can obtain the input/output count, and buffer size, shape, format, and data type of each input/output from the model description by using the operation APIs under aclmdlDesc.
- Use data of the aclmdlDataset type to describe the input/output data of your model. Note that a model might have more than one input and more than one output.
Call the operation APIs under aclmdlDataset to add aclDataBuffers and obtain the number of aclDataBuffers.
- Use data of the aclDataBuffer type to describe the buffer address and buffer size of each input/output.
Call the operation APIs under aclDataBuffer to obtain the buffer address and buffer size of each input/output.
Figure 2 Relationship between aclmdlDataset and aclDataBuffer
After learning about these data types, you can use their operation APIs to prepare the input and output data structures of the model, as shown in the following figure.
The key points are described as follows:
- When a model has multiple inputs and outputs, you can call aclmdlGetNumInputs and aclmdlGetNumOutputs to obtain the actual numbers of inputs and outputs.
- You can call aclmdlGetInputSizeByIndex or aclmdlGetOutputSizeByIndex to obtain the buffer size required by each model input or output.
If the model allows dynamic batch size, dynamic image size, or dynamic dimensions (ND format only), the input tensor supports a range of shape profiles and its shape is not determined until the model is executed. In this case, you are advised to call aclmdlGetInputSizeByIndex to obtain the required buffer size, which is the size required by the maximum profile, to ensure that the buffer is sufficient.
- When a model has multiple inputs or outputs, to ensure that aclDataBuffers are added to aclmdlDataset in order, you are advised to call aclmdlGetInputNameByIndex or aclmdlGetOutputNameByIndex to obtain the input or output names, so that you can add each aclDataBuffer to aclmdlDataset based on the corresponding input or output index.
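The max-profile sizing rule above can be illustrated with plain C++. This is a sketch only: the shape profiles and element size below are hypothetical, and on a real model aclmdlGetInputSizeByIndex already returns this maximum for you.

```cpp
#include <cstddef>
#include <vector>

// Given the candidate shape profiles of one dynamic input, compute the buffer
// size (in bytes) of the largest profile. Allocating this size guarantees the
// buffer is sufficient for whichever shape is chosen at execution time.
std::size_t MaxProfileSize(const std::vector<std::vector<std::size_t>> &profiles,
                           std::size_t elementSize) {
    std::size_t maxBytes = 0;
    for (const auto &shape : profiles) {
        std::size_t bytes = elementSize;  // e.g. sizeof(float) for FLOAT inputs
        for (std::size_t dim : shape) {
            bytes *= dim;  // multiply out the dimensions of this profile
        }
        if (bytes > maxBytes) {
            maxBytes = bytes;
        }
    }
    return maxBytes;
}
```

For example, with the hypothetical batch profiles {1, 3, 224, 224} and {8, 3, 224, 224} and 4-byte float elements, the function returns the size of the larger profile (8 * 3 * 224 * 224 * 4 bytes).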
Sample Code
This sample processes the outputs of the image classification model by printing the class indexes of the top 5 confidence values of each image. You can customize your own output processing logic.
You can view the complete code in Image Classification with ResNet-50 (Synchronous Inference).
In real code, add exception handling branches after each API call and record error logs and info logs. The following is a code snippet of the key steps only; it is not ready to be built or run.
// 1. Obtain the model description based on the model ID.
// modelDesc_ is of the aclmdlDesc type.
modelDesc_ = aclmdlCreateDesc();
ret = aclmdlGetDesc(modelDesc_, modelId_);

// 2. Prepare the input data structure for model inference.
// (1) Allocate the input buffer.
size_t modelInputSize;
void *modelInputBuffer = nullptr;
// The model in this sample has only one input, which is indexed 0. If the model
// has multiple inputs, call aclmdlGetNumInputs to obtain the actual input count.
modelInputSize = aclmdlGetInputSizeByIndex(modelDesc_, 0);
aclRet = aclrtMalloc(&modelInputBuffer, modelInputSize, ACL_MEM_MALLOC_HUGE_FIRST);

// (2) Prepare the input data structure of the model.
// Create data of the aclmdlDataset type to describe the inputs of model inference.
// input_ is of the aclmdlDataset type.
input_ = aclmdlCreateDataset();
aclDataBuffer *inputData = aclCreateDataBuffer(modelInputBuffer, modelInputSize);
ret = aclmdlAddDatasetBuffer(input_, inputData);

// 3. Prepare the output data structure for model inference.
// (1) Create data of the aclmdlDataset type to describe the outputs of model inference.
// output_ is of the aclmdlDataset type.
output_ = aclmdlCreateDataset();

// (2) Obtain the number of model outputs.
size_t outputSize = aclmdlGetNumOutputs(modelDesc_);

// (3) Allocate a buffer for each output in a loop and add each output to the aclmdlDataset.
for (size_t i = 0; i < outputSize; ++i) {
    size_t buffer_size = aclmdlGetOutputSizeByIndex(modelDesc_, i);
    void *outputBuffer = nullptr;
    aclError ret = aclrtMalloc(&outputBuffer, buffer_size, ACL_MEM_MALLOC_HUGE_FIRST);
    aclDataBuffer *outputData = aclCreateDataBuffer(outputBuffer, buffer_size);
    ret = aclmdlAddDatasetBuffer(output_, outputData);
}

// 4. Run the model.
string testFile[] = {
    "../data/dog1_1024_683.bin",
    "../data/dog2_1024_683.bin"
};
for (size_t index = 0; index < sizeof(testFile) / sizeof(testFile[0]); ++index) {
    // 4.1 Define the ReadBinFile function, which uses std::ifstream from the
    // C++ standard library to read the image file and obtain its buffer size
    // (inputBuffSize) and buffer address (inputBuff).
    void *inputBuff = nullptr;
    uint32_t inputBuffSize = 0;
    auto ret = Utils::ReadBinFile(testFile[index], inputBuff, inputBuffSize);

    // 4.2 Prepare the input data for model inference.
    // Call aclrtGetRunMode to obtain the run mode of the software stack before
    // allocating runtime resources. If the run mode is ACL_DEVICE, g_isDevice
    // is true, which indicates that the app runs on the device and the data is
    // already there, so no host-to-device transfer is required. Otherwise, the
    // memory copy API must be called to transfer the data to the device.
    if (!g_isDevice) {
        // The app runs on the host: copy the data from the host to the device.
        // modelInputBuffer and modelInputSize are the buffer address and size of
        // the model input, allocated when the input data structure was prepared.
        aclError aclRet = aclrtMemcpy(modelInputBuffer, modelInputSize, inputBuff,
                                      inputBuffSize, ACL_MEMCPY_HOST_TO_DEVICE);
        (void)aclrtFreeHost(inputBuff);
    } else {
        // The app runs on the device: the data is already in device memory.
        aclError aclRet = aclrtMemcpy(modelInputBuffer, modelInputSize, inputBuff,
                                      inputBuffSize, ACL_MEMCPY_DEVICE_TO_DEVICE);
        (void)aclrtFree(inputBuff);
    }

    // 4.3 Run model inference.
    // modelId_ is the model ID returned when the model was loaded successfully.
    // input_ and output_ are the input and output data of model inference,
    // defined when the input and output data structures were prepared.
    aclError execRet = aclmdlExecute(modelId_, input_, output_);

    // Process the model inference output and print the class indexes
    // corresponding to the top 5 confidence values.
    // output_ is the output of model execution.
    for (size_t i = 0; i < aclmdlGetDatasetNumBuffers(output_); ++i) {
        // Obtain the buffer address and buffer size of each output.
        aclDataBuffer *dataBuffer = aclmdlGetDatasetBuffer(output_, i);
        void *data = aclGetDataBufferAddr(dataBuffer);
        size_t len = aclGetDataBufferSizeV2(dataBuffer);

        // Cast the buffered data to the float type.
        float *outData = reinterpret_cast<float *>(data);

        // Print the class indexes of the top 5 confidence values.
        map<float, int, greater<float> > resultMap;
        for (size_t j = 0; j < len / sizeof(float); ++j) {
            resultMap[*outData] = j;
            outData++;
        }
        int cnt = 0;
        for (auto it = resultMap.begin(); it != resultMap.end(); ++it) {
            // Print the top 5 only.
            if (++cnt > 5) {
                break;
            }
            INFO_LOG("top %d: index[%d] value[%lf]", cnt, it->second, it->first);
        }
    }
}

// 5. Destroy the input and output data structures for model inference.
// Destroy the input data structures and free the buffer.
for (size_t i = 0; i < aclmdlGetDatasetNumBuffers(input_); ++i) {
    aclDataBuffer *dataBuffer = aclmdlGetDatasetBuffer(input_, i);
    (void)aclDestroyDataBuffer(dataBuffer);
}
(void)aclmdlDestroyDataset(input_);
input_ = nullptr;
aclrtFree(modelInputBuffer);

// Destroy the output data structures and free the buffers.
for (size_t i = 0; i < aclmdlGetDatasetNumBuffers(output_); ++i) {
    aclDataBuffer *dataBuffer = aclmdlGetDatasetBuffer(output_, i);
    void *data = aclGetDataBufferAddr(dataBuffer);
    (void)aclrtFree(data);
    (void)aclDestroyDataBuffer(dataBuffer);
}
(void)aclmdlDestroyDataset(output_);
output_ = nullptr;
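One caveat in the top-5 logic above: the sample keys a std::map by confidence value, so if two classes share the same confidence, the later index overwrites the earlier one and a class is silently lost. A duplicate-safe variant sorts (value, index) pairs with std::partial_sort instead; this is a standalone sketch using only the C++ standard library, not part of the AscendCL sample.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Return the indexes of the k largest values in outData, highest value first.
// Equal values are all kept (unlike a map keyed by value), ordered by lower
// index first.
std::vector<int> TopKIndexes(const float *outData, std::size_t count, std::size_t k) {
    std::vector<std::pair<float, int>> scored;
    scored.reserve(count);
    for (std::size_t j = 0; j < count; ++j) {
        scored.emplace_back(outData[j], static_cast<int>(j));
    }
    if (k > scored.size()) {
        k = scored.size();
    }
    // Order only the first k entries: descending value, ascending index on ties.
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                      [](const std::pair<float, int> &a, const std::pair<float, int> &b) {
                          return a.first > b.first ||
                                 (a.first == b.first && a.second < b.second);
                      });
    std::vector<int> indexes;
    for (std::size_t j = 0; j < k; ++j) {
        indexes.push_back(scored[j].second);
    }
    return indexes;
}
```

For the confidences {0.1, 0.9, 0.3, 0.9, 0.05}, this returns indexes 1, 3, 2 for the top 3, keeping both classes with confidence 0.9, whereas the map-based version would keep only one of them.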