Loading a Model

This section describes how to load a model to prepare for model execution.

API Call Sequence

If your app runs inference on an entire network model, it must contain the code logic for model loading. For the overall API call sequence, see API Call Sequence. This section describes the API call sequence for loading a whole-network model. For details about loading and executing a single operator, see Single-Operator Call Sequence.

Two sets of acl model loading APIs are provided. Choose one based on your programming habits and application scenario.

Figure 1: One loading API covers all loading modes (such as loading from a file or from memory), and you select the mode by setting configuration parameters. This approach applies to more scenarios, but it requires several APIs used together: one to create the configuration object, one to set attribute values in the object, and one to load the model.
Figure 2: Each loading mode (such as loading from a file or from memory) has its own API. Each call is simpler, but you need to remember which API corresponds to which mode.
Figure 1 Model loading workflow (setting parameters in the model loading API)

Figure 2 Model loading workflow (using different model loading APIs)

The key APIs are described as follows:

Before loading a model, build an offline model adapted to the Ascend AI Processor (.om file). For details, see Building a Model.
If the memory is managed by the user, call aclmdlQuerySize to query the sizes of the workspace and weight memory required for model execution, so that you allocate only as much memory as is needed.
If the shape of the input data is not fixed, aclmdlQuerySize cannot be used to query the memory sizes, so you cannot manage the memory yourself during model loading. In this case, call aclmdlLoadFromFile so that the system manages the memory.

If you use graph build APIs to build your own network, and the model data is stored in the memory without generating the offline OM model file, the memory size cannot be queried using the aclmdlQuerySize API. For details about graph APIs, see Graph Mode Development Guide.
A model can be loaded using the following APIs. A model ID is returned after the model is successfully loaded.
- aclmdlSetConfigOpt and aclmdlLoadWithConfig: This approach takes more steps. The caller sets attributes in the configuration object passed to the load call to specify how the model is loaded and who manages the memory.
- With the following APIs, the caller picks the API according to whether the model is loaded from a file or from memory and whether the memory is managed by the system or the user:
  - aclmdlLoadFromFile: loads offline model data from a file. The memory is managed by the system.
  - aclmdlLoadFromMem: loads offline model data from memory. The memory is managed by the system.
  - aclmdlLoadFromFileWithMem: loads offline model data from a file. The memory (including the workspace for storing temporary data at model runtime and weight memory for storing the weight data of the model) is managed by the user.
  - aclmdlLoadFromMemWithMem: loads offline model data from memory. The memory (including workspace and weight memory) is managed by the user.

Sample Code

After the model is loaded successfully, the ID of the model is returned, which will be used in Running a Model.

The following describes how to load a model from a file with the memory managed by the user. You can view the complete code in Image Classification with ResNet-50 (Synchronous Inference).

After each API call, add an exception handling branch and print logs at the error and information levels. The following code snippet shows the key steps only and is not ready to be built or run.

      
// 1. Initialize variables.
// The two dots (..) indicate a path relative to the directory of the executable file.
// For example, if the executable file is stored in the out directory, the two dots (..) point to the parent directory of the out directory.
const char* omModelPath = "../model/resnet50.om";
// ......

// 2. Obtain the weight memory size and workspace size required for model execution.
aclError ret = aclmdlQuerySize(omModelPath, &modelMemSize_, &modelWeightSize_);

// 3. Allocate workspace on the device for model execution.
ret = aclrtMalloc(&modelMemPtr_, modelMemSize_, ACL_MEM_MALLOC_HUGE_FIRST);

// 4. Allocate weight memory on the device for model execution.
ret = aclrtMalloc(&modelWeightPtr_, modelWeightSize_, ACL_MEM_MALLOC_HUGE_FIRST);

// 5. Load your offline model. The memory (including the weight memory and workspace) is managed by the user.
// After the model is loaded successfully, the model ID is returned in modelId_.
ret = aclmdlLoadFromFileWithMem(omModelPath, &modelId_, modelMemPtr_, modelMemSize_, modelWeightPtr_, modelWeightSize_);

// ......
