Dynamic Shape Input (Setting the Shape Range)

This section describes the key APIs, API call sequence, and sample code for setting the shape range in the dynamic shape input scenario.

This feature is not supported by the Atlas 200/300/500 Inference Product.

API Call Sequence

If the input shape of the model is dynamic, call aclmdlSetDatasetTensorDesc to set the tensor description (mainly the shape information) before the model is executed. After the model is executed, call aclmdlGetDatasetTensorDesc to obtain the tensor description of the dynamic output, and then call the operation APIs of the aclTensorDesc type to obtain the memory size occupied by the output tensor data, the tensor format, and the tensor dimensions.

The procedure is described as follows:

  1. Build a model.

    In the model inference scenario, when the ATC tool is used to convert a model whose inputs have dynamic shapes, the --input_shape parameter is used to set the input shape range. For details about the parameter, see --input_shape in ATC Instructions.

  2. Load the model.

    For details about the model loading workflow, see Loading a Model. After the model is successfully loaded, the model ID is returned.

  3. Create data of the aclmdlDataset type to describe the input and output of model execution.

    For details about the call sequence, see Preparing Input/Output Data Structure for Model Execution.

    Notes:

    • If the size obtained by calling aclmdlGetInputSizeByIndex is 0, the input shape is dynamic. Estimate a sufficiently large input buffer based on the actual situation.
    • If the size obtained by calling aclmdlGetOutputSizeByIndex is 0, the output shape is dynamic. You can either estimate a sufficiently large output buffer based on the actual situation, or let the system allocate the output buffer for the corresponding index. For details, see the description of the aclmdlGetOutputSizeByIndex interface.
  4. After a model is successfully loaded and before the model is executed, call aclmdlSetDatasetTensorDesc to set the tensor description (mainly the shape information) of the dynamic shape input.

    When calling aclCreateTensorDesc to create a tensor description, set the shape information, including the number of dimensions and the number of elements in each dimension, which must be within the range of the input shape set during model building. For details about model building, see Building a Model.

  5. (Optional) Create an Allocator descriptor and register an Allocator.
    Note: Currently, an external Allocator can manage memory only in the dynamic-shape model inference scenario. The Allocator registration API must be used together with the aclmdlExecuteAsync API and called before it.
    1. Call aclrtAllocatorCreateDesc to create an Allocator descriptor.
    2. Call aclrtAllocatorSetObjToDesc, aclrtAllocatorSetAllocFuncToDesc, aclrtAllocatorSetGetAddrFromBlockFuncToDesc, and aclrtAllocatorSetFreeFuncToDesc (and, if needed, aclrtAllocatorSetAllocAdviseFuncToDesc) to set the Allocator object and callback functions.
    3. Call aclrtAllocatorRegister to register the Allocator and bind the Allocator to the stream. The same stream must be used during model execution.
    4. After the Allocator is registered, you can call aclrtAllocatorDestroyDesc to destroy the Allocator descriptor.
  6. Execute the model.
    • Call aclmdlExecute in synchronous inference scenarios.
    • Call aclmdlExecuteAsync in asynchronous inference scenarios. Then, call aclrtSynchronizeStream to check that the asynchronous task is complete.

      In asynchronous inference scenarios, only aclmdlExecuteAsync supports the Allocator registered by the user, that is, the Allocator registered in step 5.

  7. Obtain the model execution result.

    Call aclmdlGetDatasetTensorDesc to obtain the tensor description of the dynamic shape output, and then use the operation APIs of the aclTensorDesc type to obtain its attributes. For example, obtain the size occupied by the tensor data, and then read data of that size from memory.

  8. (Optional) If an Allocator was registered in step 5, deregister and destroy it.

    The Allocator registered by the user is bound to a stream. To release or destroy the Allocator, call aclrtAllocatorUnregister to deregister it before releasing the stream, then release the stream resources and destroy the Allocator.
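The buffer-size estimation mentioned in the notes of step 3 can be sketched as follows. This is a minimal illustration only, assuming the caller knows the maximum shape of the range set at build time; the helper name MaxShapeBufferSize is hypothetical and is not an AscendCL API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

//Hypothetical helper: when aclmdlGetInputSizeByIndex or aclmdlGetOutputSizeByIndex
//returns 0 (dynamic shape), estimate an upper-bound buffer size from the maximum
//shape of the range configured at model build time.
size_t MaxShapeBufferSize(const std::vector<int64_t> &maxShape, size_t elementSize)
{
    size_t total = elementSize;
    for (int64_t dim : maxShape) {
        total *= static_cast<size_t>(dim);
    }
    return total;
}

//Example: for a range whose largest shape is (8, 3, 224, 224) with float data,
//the upper bound is 8 * 3 * 224 * 224 * sizeof(float) bytes.
```

A buffer allocated with this upper bound can then be wrapped in the aclmdlDataset as usual; after execution, only the actual size reported by aclGetTensorDescSize needs to be read back.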

Sample Code for Synchronous Inference Scenarios

This section focuses on the code logic of model inference. For details about how to initialize and deinitialize AscendCL, see Initializing AscendCL. For details about how to allocate and deallocate runtime resources, see Runtime Resource Allocation and Deallocation.

When calling the APIs, add exception handling branches and record error and info logs as required. The following code snippet covers only the key steps and is not ready to be built or run.

//1. Initialize AscendCL.

//2. Allocate runtime resources.

//In this sample, assume that the first input of the model is dynamic, with an index of 0, and that the first output of the model is also dynamic, with an index of 0.

//3. Load the model. After the model is successfully loaded, set the tensor description for dynamic input, mainly the shape information.
// ......

//4. Prepare the model description modelDesc_, and the model inputs input_ and model outputs output_.
//Pay attention to the following:
//If the size obtained by using aclmdlGetInputSizeByIndex is 0, the input shape is dynamic. A larger input buffer can be estimated based on the actual situation.
//If the size obtained by using aclmdlGetOutputSizeByIndex is 0, the output shape is dynamic. A larger output buffer can be estimated based on the actual situation.
// ......

//5. Customize a function to set the description of the dynamic input tensor.
void SetTensorDesc()
{
    // ......
    //Create the tensor description.
    //The shape must be the same as that of the specified input data.
    int64_t shapes[] = {1, 3, 224, 224};
    aclTensorDesc *inputDesc = aclCreateTensorDesc(ACL_FLOAT, 4, shapes, ACL_FORMAT_NCHW);
    //Set the description of the dynamic input tensor whose index is 0.
    aclError ret = aclmdlSetDatasetTensorDesc(input_, inputDesc, 0);
    // ......
}
//6. Customize a function, execute the model, and obtain the description of the dynamic output tensor.
void ModelExecute()
{
    aclError ret;
    //Call the custom function to set the description of the dynamic input tensor.
    SetTensorDesc();
    //Execute the model.
    ret = aclmdlExecute(modelId, input_, output_);
    //Obtain the description of the dynamic output tensor whose index is 0.
    aclTensorDesc *outputDesc = aclmdlGetDatasetTensorDesc(output_, 0);
    //Use the operation APIs of the aclTensorDesc type to obtain the attributes of outputDesc.
    //The following uses size (the size occupied by the tensor data) as an example, then reads data of that size from memory.
    string outputFileName = ss.str();    //ss is a stringstream that holds the output file name (definition omitted).
    FILE *outputFile = fopen(outputFileName.c_str(), "wb");
    size_t outputDesc_size = aclGetTensorDescSize(outputDesc);
    aclDataBuffer *dataBuffer = aclmdlGetDatasetBuffer(output_, 0);
    void *data = aclGetDataBufferAddr(dataBuffer);
    void *outHostData = nullptr;
    //Call aclrtGetRunMode to obtain the run mode of the software stack and determine whether data needs to be copied, based on the run mode.
    aclrtRunMode runMode;
    ret = aclrtGetRunMode(&runMode);
    if (runMode == ACL_HOST) {
        //The app runs on the host: copy the output from the device. Because the buffer allocated for the dynamic shape may be larger than the output, copy only the actual data size outputDesc_size.
        ret = aclrtMallocHost(&outHostData, outputDesc_size);
        ret = aclrtMemcpy(outHostData, outputDesc_size, data, outputDesc_size, ACL_MEMCPY_DEVICE_TO_HOST);
        fwrite(outHostData, sizeof(char), outputDesc_size, outputFile);
        ret = aclrtFreeHost(outHostData);
    } else {
        //The app runs on the device: write the model output data to the result file directly.
        fwrite(data, sizeof(char), outputDesc_size, outputFile);
    }
    fclose(outputFile);
    // ......
}

//7. Process the model inference result.

//8. Deallocate runtime resources.

//9. Deinitialize AscendCL.

// ......

Sample Code for Asynchronous Inference Scenarios

This section focuses on the code logic of model inference. For details about how to initialize and deinitialize AscendCL, see Initializing AscendCL. For details about how to allocate and deallocate runtime resources, see Runtime Resource Allocation and Deallocation.

In the example scenario, a user registers an Allocator and calls the asynchronous API aclmdlExecuteAsync to perform inference.

When calling the APIs, add exception handling branches and record error and info logs as required. The following code snippet covers only the key steps and is not ready to be built or run.

//1. Initialize AscendCL.

//2. Allocate runtime resources.

//The sample code is for reference only. In actual application scenarios, verify the return value.
//In this sample, assume that the first input of the model is dynamic, with an index of 0, and that the first output of the model is also dynamic, with an index of 0.

//3. Load the model. After the model is successfully loaded, set the tensor description for dynamic input, mainly the shape information.
// ...

//4. Prepare the model description modelDesc_, and the model inputs input_ and model outputs output_.
//Pay attention to the following:
//If the size obtained by using aclmdlGetInputSizeByIndex is 0, the input shape is dynamic. A larger input buffer can be estimated based on the actual situation.
//If the size obtained by using aclmdlGetOutputSizeByIndex is 0, the output shape is dynamic. A larger output buffer can be estimated based on the actual situation.
// ......

//5. Customize a function to set the description of the dynamic input tensor.
void SetTensorDesc()
{
    // ......
    //Create the tensor description.
    //The shape must be the same as that of the specified input data.
    int64_t shapes[] = {1, 3, 224, 224};
    aclTensorDesc *inputDesc = aclCreateTensorDesc(ACL_FLOAT, 4, shapes, ACL_FORMAT_NCHW);
    //Set the description of the dynamic input tensor whose index is 0.
    aclError ret = aclmdlSetDatasetTensorDesc(input_, inputDesc, 0);
    // ......
}

//6. Create the Allocator descriptor and register the Allocator.
//Assume that the allocator is the Allocator object that the user wants to register.
void RegisterCustomAllocator(aclrtAllocator allocator, aclrtStream stream) {
    //6.1 Create AllocatorDesc.
    aclrtAllocatorDesc allocatorDesc = aclrtAllocatorCreateDesc();
    //6.2 Initialize AllocatorDesc and set callback functions related to Allocator memory allocation and deallocation. CustomMallocFunc, CustomFreeFunc, CustomMallocAdviseFunc, and CustomGetBlockAddrFunc are callback functions defined in the C style and are transferred to AllocatorDesc as function pointers.
    aclrtAllocatorSetObjToDesc(allocatorDesc, allocator);
    aclrtAllocatorSetAllocFuncToDesc(allocatorDesc, CustomMallocFunc);
    aclrtAllocatorSetFreeFuncToDesc(allocatorDesc, CustomFreeFunc);
    aclrtAllocatorSetAllocAdviseFuncToDesc(allocatorDesc, CustomMallocAdviseFunc);
    aclrtAllocatorSetGetAddrFromBlockFuncToDesc(allocatorDesc, CustomGetBlockAddrFunc);
    //Register the Allocator and bind it to the stream. The API creates the Allocator based on AllocatorDesc.
    aclrtAllocatorRegister(stream, allocatorDesc);
    //After the Allocator descriptor is used, it can be destroyed.
    aclrtAllocatorDestroyDesc(allocatorDesc);
}


//7. Customize a function, execute the model, and obtain the description of the dynamic output tensor.
void ModelExecute(aclrtStream stream)
{
    aclError ret;
    //Call the custom function to set the description of the dynamic input tensor.
    SetTensorDesc();
    //Execute the model. Only the asynchronous API supports the user-registered Allocator, and the execution stream must be the same as the stream bound to the registered Allocator.
    ret = aclmdlExecuteAsync(modelId, input_, output_, stream);
    //If the asynchronous API call succeeds, the task has been delivered. Before obtaining the result, call the synchronization API to ensure that the task has finished executing.
    aclrtSynchronizeStream(stream);
    //Obtain the description of the dynamic output tensor whose index is 0.
    aclTensorDesc *outputDesc = aclmdlGetDatasetTensorDesc(output_, 0);
    //Use the operation APIs of the aclTensorDesc type to obtain the attributes of outputDesc.
    //The following uses size (the size occupied by the tensor data) as an example, then reads data of that size from memory.
    string outputFileName = ss.str();    //ss is a stringstream that holds the output file name (definition omitted).
    FILE *outputFile = fopen(outputFileName.c_str(), "wb");
    size_t outputDesc_size = aclGetTensorDescSize(outputDesc);
    aclDataBuffer *dataBuffer = aclmdlGetDatasetBuffer(output_, 0);
    void *data = aclGetDataBufferAddr(dataBuffer);
    void *outHostData = nullptr;
    //Call aclrtGetRunMode to obtain the run mode of the software stack and determine whether data needs to be copied, based on the run mode.
    aclrtRunMode runMode;
    ret = aclrtGetRunMode(&runMode);
    if (runMode == ACL_HOST) {
        //The app runs on the host: copy the output from the device. Because the buffer allocated for the dynamic shape may be larger than the output, copy only the actual data size outputDesc_size.
        ret = aclrtMallocHost(&outHostData, outputDesc_size);
        ret = aclrtMemcpy(outHostData, outputDesc_size, data, outputDesc_size, ACL_MEMCPY_DEVICE_TO_HOST);
        fwrite(outHostData, sizeof(char), outputDesc_size, outputFile);
        ret = aclrtFreeHost(outHostData);
    } else {
        //The app runs on the device: write the model output data to the result file directly.
        fwrite(data, sizeof(char), outputDesc_size, outputFile);
    }
    fclose(outputFile);
    // ......
}

//8. Process the model inference result.
// ...

//9. Deregister and destroy the registered Allocator.
void UnregisterCustomAllocator(aclrtStream stream) {
    aclrtAllocatorUnregister(stream);
    //Destroy the customized Allocator.
}

//10. Deallocate runtime resources.

//11. Deinitialize AscendCL.

// ......