Multi-batch Model Inference
A workflow similar to that described in Model Inference can be followed for inference with larger batch sizes.
However, a few differences are worth noting in multi-batch inference:
- If the batch size is greater than 1, set the batch size through the input_shape parameter of the ATC tool when converting the model. For details, see ATC Instructions.
- Before inference, you need to add code that accumulates input data up to the batch size (for example, batch size = 8) and allocates device memory to store the batch data as the model inference input. The remainder data that does not fill a full batch is fed directly to the model for inference.
After each API call, add an exception handling branch and record error logs and info logs. In the following example, the batch size is set to 8. The code snippet shows key steps only and is not ready to be built or run.
uint32_t batchSize = 8;
uint32_t deviceNum = 1;
uint32_t deviceId = 0;
// Obtain the size of the first input of the model.
uint32_t modelInputSize = aclmdlGetInputSizeByIndex(modelDesc, 0);
// Obtain the input size per batch.
uint32_t singleBuffSize = modelInputSize / batchSize;
// Define a variable to accumulate the input data to the batch size (8).
uint32_t cnt = 0;
// Define a variable to describe the offset for loading each file to memory.
uint32_t pos = 0;
void* p_batchDst = NULL;
void* p_imgBuf = NULL;
std::vector<std::string> inferFile_vec;
for (size_t i = 0; i < files.size(); ++i) {
    // Allocate device memory every eight input files (batch size = 8).
    if (cnt % batchSize == 0) {
        pos = 0;
        inferFile_vec.clear();
        // Allocate device memory.
        aclrtMalloc(&p_batchDst, modelInputSize, ACL_MEM_MALLOC_HUGE_FIRST);
    }
    // TODO: Load a file from a directory and calculate the file size fileSize.
    // Allocate host memory to store the file data based on the file size.
    aclrtMallocHost(&p_imgBuf, fileSize);
    // Transfer the data to the device memory.
    aclrtMemcpy((uint8_t *)p_batchDst + pos, fileSize, p_imgBuf, fileSize, ACL_MEMCPY_HOST_TO_DEVICE);
    pos += fileSize;
    // Free unused host memory in a timely manner.
    aclrtFreeHost(p_imgBuf);
    // Save the ith file to the vector and increase the value of cnt by 1.
    inferFile_vec.push_back(files[i]);
    cnt++;
    // Send the input data (batchSize = 8) for model inference.
    if (cnt % batchSize == 0) {
        // TODO: Create data of type aclmdlDataset and aclDataBuffer to describe the input and output data of the model.
        // TODO: Call aclmdlExecute to start model inference.
        // TODO: Call aclrtFree to free device memory after the inference is complete.
    }
}
// Feed the remainder data that is less than the batch size to the model for inference.
if (cnt % batchSize != 0) {
    // TODO: Create data of type aclmdlDataset and aclDataBuffer to describe the input and output data of the model.
    // TODO: Call aclmdlExecute to start model inference.
    // TODO: Call aclrtFree to free device memory after the inference is complete.
}
Parent topic: Model Inference