Loading a Model
This section describes how to load a model to prepare for model execution.
API Call Sequence
If your app performs inference over an entire network, ensure that it contains the code logic for model loading. This section describes the API call sequence for loading a network-wide model; for the overall sequence, see AscendCL API Call Sequence. For details about loading and executing a single operator, see Single-Operator Call Sequence.
AscendCL provides two sets of model loading APIs for you to choose from based on your programming habits and application scenarios.
- Figure 1: A single set of loading APIs covers all loading modes (such as loading from a file or from the memory); the mode is selected by setting configuration parameters. This approach covers a wider range of scenarios, but several APIs must be used together: one to create the configuration object, one to set attribute values in the object, and one to load the model.
- Figure 2: A dedicated API is selected for each loading mode (such as loading from a file or from the memory). Each call is simpler, but you need to remember the loading API for each mode.
The key APIs are described as follows:
- Before loading a model, build an offline model adapted to the Ascend AI Processor (.om file). For details, see Building a Model.
- If the memory is managed by the user, use aclmdlQuerySize to query the sizes of the workspace and weight memory required for model execution to avoid memory waste.
If the shape of the model input data is not fixed, the aclmdlQuerySize API cannot determine the required memory sizes, so the memory cannot be managed by the user during model loading. In this case, select a model loading API whose memory is managed by the system (for example, aclmdlLoadFromFile).
If the Ascend Graph APIs are called to build your own network and the model data is kept in memory without generating an .om offline model file, the memory sizes cannot be queried using the aclmdlQuerySize API either. For details about the Ascend Graph APIs, see Ascend Graph Developer Guide.
- A model can be loaded using the following APIs. A model ID is returned after the model is successfully loaded.
- aclmdlSetConfigOpt and aclmdlLoadWithConfig are more involved: the caller sets attributes in the configuration object passed to aclmdlLoadWithConfig to decide how the model is loaded and who manages the memory.
- When the following APIs are used, the caller determines whether the model is loaded from a file or from memory, and whether the memory is managed by the system or by the user:
- aclmdlLoadFromFile: loads offline model data from a file. The memory for running the model is managed by the system.
- aclmdlLoadFromMem: loads offline model data from memory. The memory for running the model is managed by the system.
- aclmdlLoadFromFileWithMem: loads offline model data from a file. The memory (including the workspace for storing temporary data at model runtime and the weight memory for storing the weight data of the model) is managed by the user.
- aclmdlLoadFromMemWithMem: loads offline model data from memory. The memory (including the workspace and weight memory) is managed by the user.
Sample Code
After the model is loaded successfully, the ID of the model is returned, which will be used in Running a Model.
The following example loads a model from a file, with the memory managed by the user. You can view the complete code in Image Classification with ResNet-50 (Synchronous Inference).
In your own code, add exception handling branches after the API calls and print logs at the error and info levels. The following snippet shows the key steps only and cannot be built or run as is.
// 1. Initialize variables.
// The two dots (..) indicate a path relative to the directory of the executable file.
// For example, if the executable file is stored in the out directory, the two dots (..) point to the parent directory of the out directory.
const char* omModelPath = "../model/resnet50.om";
// ......
// 2. Obtain the workspace size and weight memory size required for model execution.
aclError ret = aclmdlQuerySize(omModelPath, &modelMemSize_, &modelWeightSize_);
// 3. Allocate workspace memory on the device for model execution.
ret = aclrtMalloc(&modelMemPtr_, modelMemSize_, ACL_MEM_MALLOC_HUGE_FIRST);
// 4. Allocate weight memory on the device for model execution.
ret = aclrtMalloc(&modelWeightPtr_, modelWeightSize_, ACL_MEM_MALLOC_HUGE_FIRST);
// 5. Load the offline model. The memory (including the workspace and weight memory) is managed by the user.
// After the model is loaded successfully, the model ID is returned.
ret = aclmdlLoadFromFileWithMem(omModelPath, &modelId_, modelMemPtr_, modelMemSize_, modelWeightPtr_, modelWeightSize_);
// ......

