Queue-based Model Inference

This section describes how to load a model based on queues, prepare model input data, and obtain model inference output data.

Principles

  • Call aclmdlLoadFromFileWithQ or aclmdlLoadFromMemWithQ to load a model in queue mode.
  • Call acltdtEnqueueData to pass the input data of a model to a queue. AscendCL performs inference based on the input data in the queue, with no need for you to call the model execution API.
  • Call the acltdtDequeueData API to wait until the model inference is complete, and then obtain the result data from the output buffer.

In a multithreaded application, if the model has multiple inputs, all enqueuing calls for the input data (acltdtEnqueueData) must run in the same thread. Likewise, if the model has multiple outputs, all dequeuing calls for the output data (acltdtDequeueData) must run in the same thread. Otherwise, the inputs (or outputs) of one inference cannot be kept aligned with each other.

Sample Code

This section focuses on the code logic of queue-based model inference. For details about how to initialize and deinitialize AscendCL, see Initializing AscendCL. For details about how to allocate and deallocate runtime resources, see Runtime Resource Allocation and Deallocation.

After each API call, you need to add an exception handling branch and record error logs and info logs. The following code snippet shows only the key steps and is not ready to be built or run as-is.
#include "acl/acl.h"
// ......
//1. Initialize AscendCL.

//2. Allocate runtime resources.

//Obtain the run mode of the AI software stack. The API calls for memory allocation and memory copy are different in different run modes.
extern bool g_isDevice;
aclrtRunMode runMode;
aclError ret = aclrtGetRunMode(&runMode);
g_isDevice = (runMode == ACL_DEVICE);

//3. Load and execute the model.
//The two dots (..) indicate a path relative to the directory of the executable file.
//For example, if the executable file is stored in the out directory, the two dots (..) point to the parent directory of the out directory.
const char* omModelPath = "../model/resnet50.om";

//3.1 Create an input queue for the model. If the model has multiple inputs, create multiple input queues. The following uses one input as an example.
acltdtQueueAttr *attr = acltdtCreateQueueAttr();
int32_t inputNum = 1;
uint32_t *inputQueueList = new (std::nothrow) uint32_t[inputNum];
uint32_t inputQid = 0;

for (int n = 0; n < inputNum; n++) {
    ret = acltdtCreateQueue(attr, &inputQid);
    inputQueueList[n] = inputQid;
}

//3.2 Create an output queue for the model. If the model has multiple outputs, create multiple output queues. The following uses one output as an example.
int32_t outputNum = 1;
uint32_t *outputQueueList = new (std::nothrow) uint32_t[outputNum];
uint32_t outputQid = 0;

for (int n = 0; n < outputNum; n++) {
    ret = acltdtCreateQueue(attr, &outputQid);
    outputQueueList[n] = outputQid;
}

//3.3 Load a model.
uint32_t modelId;
ret = aclmdlLoadFromFileWithQ(omModelPath, &modelId,
                              inputQueueList, inputNum, outputQueueList, outputNum);

//3.4 Obtain the model description based on the model ID.
aclmdlDesc *modelDesc = aclmdlCreateDesc();
ret = aclmdlGetDesc(modelDesc, modelId);

//3.5 Obtain the input buffer size of the model. If the model has multiple inputs, obtain the buffer size of each input. The following uses one input as an example.
size_t inputSize = aclmdlGetInputSizeByIndex(modelDesc, 0);

//3.6 Load the test image data, perform inference, and postprocess the inference result data.
string testFile[] = {
        "../data/dog1_1024_683.bin",
        "../data/dog2_1024_683.bin"
};

for (size_t index = 0; index < sizeof(testFile) / sizeof(testFile[0]); ++index) {
        uint32_t devBufferSize;
        void *picDevBuffer = nullptr;
        //Customize the ReadBinFile function, allocate buffer based on the run mode of the AI software stack, and call the function in the C++ standard library that reads the image data into the buffer.
        ret = Utils::ReadBinFile(testFile[index], picDevBuffer, devBufferSize);

        //Pass the model input data to the queue and run model inference. The value -1 indicates that the program is blocked until the input data is completely enqueued.
        ret = acltdtEnqueueData(inputQid, picDevBuffer, devBufferSize, nullptr, 0, -1, 0);
        //Obtain the size of each output.
        size_t dataSize = aclmdlGetOutputSizeByIndex(modelDesc, 0);
        void *data = nullptr;
        size_t retDataSize = 0;
        //Allocate a buffer for the model output data.
        if (!g_isDevice) {
            aclError aclRet = aclrtMallocHost(&data, dataSize);
        } else {
            aclError aclRet = aclrtMalloc(&data, dataSize, ACL_MEM_MALLOC_HUGE_FIRST);
        }
        //Wait until the inference execution of the model is complete and obtain the result data from the output buffer. The value -1 indicates that the program is blocked until the inference output data is enqueued.
        ret = acltdtDequeueData(outputQid, data, dataSize, &retDataSize, nullptr, 0, -1);
        //Cast the data in the output buffer to the float type.
        float *outData = reinterpret_cast<float *>(data);

        //Print the class indexes of the top 5 confidence values.
        map<float, int, greater<float>> resultMap;
        for (size_t j = 0; j < retDataSize / sizeof(float); ++j) {
            resultMap[*outData] = static_cast<int>(j);
            outData++;
        }
        int cnt = 0;
        for (auto it = resultMap.begin(); it != resultMap.end(); ++it) {
            //Print the top 5.
            if (++cnt > 5) {
                break;
            }
            INFO_LOG("top %d: index[%d] value[%lf]", cnt, it->second, it->first);
        }
        if (!g_isDevice) {
            aclrtFreeHost(picDevBuffer);
            aclrtFreeHost(data);
        } else {
            aclrtFree(picDevBuffer);
            aclrtFree(data);
        }
}

//4. Unload the model and free the model inference resources.
aclmdlUnload(modelId);
aclmdlDestroyDesc(modelDesc);
acltdtDestroyQueue(inputQid);
acltdtDestroyQueue(outputQid);
acltdtDestroyQueueAttr(attr);
delete[] inputQueueList;
delete[] outputQueueList;

//5. Deallocate runtime resources.

//6. Deinitialize AscendCL.

// ......