Weight Update (System-managed Memory)
API Call Sequence
In the weight update scenario, after the model is built, call the following APIs in sequence to update weights dynamically during model execution:
- Compile and save the model based on graph build APIs. The model contains multiple graphs, such as the inference graph, variable initialization graph, and variable update graph.
In this section, aclgrphBundleBuildModel is called to build the model, and aclgrphBundleSaveModel is called to save the model. For details about the APIs, see "aclgrphBundleBuildModel" and "aclgrphBundleSaveModel".
Weight initialization is optional. Decide whether to include the weight initialization graph based on your service scenario. Omitting it reduces the device memory required to load the model.
- Call aclmdlBundleLoadFromFile or aclmdlBundleLoadFromMem to load the model.
- Call aclmdlBundleGetModelId to obtain the IDs of the three graphs.
- Call the model execution API (for example, aclmdlExecute) to execute the weight initialization graph based on the weight initialization graph ID.
- To update weights, first call aclmdlSetDatasetTensorDesc to set the tensor descriptions of the inputs of the weight update graph.
- Call the model execution API (for example, aclmdlExecute) to execute the weight update graph based on the weight update graph ID.
- Call the model execution API (for example, aclmdlExecute) to execute the inference graph based on the inference graph ID.
- After the inference is complete, call aclmdlBundleUnload to unload the model.
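The ordering of the steps above can be sketched in plain C++. This is only an illustration of the control flow, not AscendCL code: BundleSteps, RunBundle, and the four callbacks are hypothetical names introduced here, and each callback is assumed to wrap the corresponding real call (for example, aclmdlExecute with the matching graph ID) and to return 0 on success.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the real API calls. Each callback is assumed to
// return 0 on success and a nonzero error code on failure.
struct BundleSteps {
    std::function<int()> runInit;    // would execute the weight initialization graph
    std::function<int()> runUpdate;  // would set tensor descriptions, then execute the weight update graph
    std::function<int()> runInfer;   // would execute the inference graph
    std::function<int()> unload;     // would unload the bundle model
};

// Runs the steps in the documented order and returns the first error seen.
// hasInitGraph: the bundle was built with a weight initialization graph.
// needUpdate:   weights must be refreshed before inference.
int RunBundle(const BundleSteps &s, bool hasInitGraph, bool needUpdate) {
    assert(s.runInfer && s.unload);  // these two steps are always required

    int ret = 0;
    if (hasInitGraph) {
        ret = s.runInit();
    }
    if (ret == 0 && needUpdate) {
        ret = s.runUpdate();
    }
    if (ret == 0) {
        ret = s.runInfer();
    }
    // Unload even if an earlier step failed, so the model is not leaked.
    const int unloadRet = s.unload();
    return ret != 0 ? ret : unloadRet;
}
```

Unloading on every path is a design choice of this sketch. It keeps the final step of the sequence from being skipped when an earlier graph execution fails.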
Sample Code
This section focuses on the code logic of model inference. For details about how to perform initialization and deinitialization and how to allocate and deallocate runtime resources, see Initialization and Deinitialization.
After each API call, add exception handling branches and print logs at the error and information levels. The following snippet shows only the key steps and cannot be built or run as is.
// 1. Initialize resources.
aclInit(nullptr);
aclrtSetDevice(0);

// 2. Load the model built in Ascend Graph mode. The model contains the inference graph,
//    weight initialization graph, and weight update graph. The model file bundle.om is used as an example.
uint32_t bundle_id = 0;
aclmdlBundleLoadFromFile("./bundle.om", &bundle_id);

// 3. Obtain the number of executable graphs based on the bundle ID.
size_t modelNum = 0;
aclmdlBundleGetModelNum(bundle_id, &modelNum);
// Assume that modelNum is 3. The three graphs passed to aclgrphBundleBuildModel have fixed indexes: 0, 1, and 2.
uint32_t infer_id = 0;
aclmdlBundleGetModelId(bundle_id, 0, &infer_id);
uint32_t init_id = 0;
aclmdlBundleGetModelId(bundle_id, 1, &init_id);
uint32_t update_id = 0;
aclmdlBundleGetModelId(bundle_id, 2, &update_id);

// If the weights do not need to be updated, execute only the weight initialization graph and the inference graph.
// 4. Execute the weight initialization graph. For details about how to prepare the model input and output,
//    see the sample code in other sections about inference features under "Model Inference".
aclmdlExecute(init_id, init_mdl_input, init_mdl_output);
// 5. Execute the inference graph. The model input and output are prepared in the same way.
aclmdlExecute(infer_id, infer_mdl_input, infer_mdl_output);

// If a weight needs to be updated, update it before executing the inference graph.
// 6. Prepare the inputs of the weight update graph.
//    For a weight that does not need to be updated (the 0th weight in this example), pass an empty tensor
//    as the shape. The device memory of that input must still be valid.
size_t no_need_refresh_index = 0;
std::vector<int64_t> empty_dims{0};  // A dimension of 0 means that the tensor is empty.
auto empty_desc = aclCreateTensorDesc(ACL_FLOAT, empty_dims.size(), empty_dims.data(), ACL_FORMAT_ND);
aclmdlSetDatasetTensorDesc(update_mdl_input, empty_desc, no_need_refresh_index);

// 7. For a weight that needs to be updated (the weight at index 1 in this example), set its actual shape.
size_t need_refresh_index = 1;
std::vector<int64_t> new_dims{1, 3, 224, 224};
auto new_desc = aclCreateTensorDesc(ACL_FLOAT, new_dims.size(), new_dims.data(), ACL_FORMAT_ND);
aclmdlSetDatasetTensorDesc(update_mdl_input, new_desc, need_refresh_index);

// 8. Execute the weight update graph. For details about how to prepare the model input and output,
//    see the sample code in other sections about inference features under "Model Inference".
aclmdlExecute(update_id, update_mdl_input, update_mdl_output);

// 9. Execute the inference graph with the updated weights.
aclmdlExecute(infer_id, infer_mdl_input, infer_mdl_output);

// 10. Unload the bundle model.
aclmdlBundleUnload(bundle_id);

// 11. Release resources.
aclrtResetDevice(0);
aclFinalize();
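The snippet above omits return-value checks. One possible way to add the exception handling branches and the error-level and information-level logs is a small checking macro, sketched below. CHECK_RET, FakeApiCall, and RunSteps are hypothetical names introduced for illustration. The sketch assumes that the wrapped call returns an integer status in which 0 means success, and that the enclosing function returns int.

```cpp
#include <cstdio>

// Assumption: the wrapped expression yields an int-compatible status code
// where 0 means success. On failure the macro logs an error and returns the
// code from the enclosing function. On success it logs an info message.
#define CHECK_RET(expr)                                                  \
    do {                                                                 \
        const int ret_ = static_cast<int>(expr);                         \
        if (ret_ != 0) {                                                 \
            std::fprintf(stderr, "[ERROR] %s failed, ret=%d (%s:%d)\n",  \
                         #expr, ret_, __FILE__, __LINE__);               \
            return ret_;                                                 \
        }                                                                \
        std::fprintf(stdout, "[INFO] %s succeeded\n", #expr);            \
    } while (0)

// Hypothetical stand-in for a real API call. It simply returns the status it is given.
inline int FakeApiCall(int status) { return status; }

// Runs two calls in order and stops at the first failure.
int RunSteps(int firstStatus, int secondStatus) {
    CHECK_RET(FakeApiCall(firstStatus));
    CHECK_RET(FakeApiCall(secondStatus));
    return 0;
}
```

In real code, each FakeApiCall(...) would be replaced by an actual call such as aclmdlExecute(...). Because the macro returns early, resources acquired before the failing call still need to be released on the error path.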