Dynamic Shape Input (Setting the Shape Range)

Atlas 200/300/500 Inference Product does not support this feature.

API Call Sequence

If the input shape of a model is dynamic, call acl.mdl.set_dataset_tensor_desc to set the tensor description (mainly the shape information) before executing the model. After execution, call acl.mdl.get_dataset_tensor_desc to obtain the tensor description of the model's dynamic output, and then call the aclTensorDesc operation APIs to obtain the memory size occupied by the output tensor data, the tensor format, and the tensor dimensions.

The procedure is described as follows:

  1. Build a model.

    In the model inference scenario, when using the ATC tool to convert a model with dynamic-shape input, set the input shape range through the input_shape parameter.

    For details about ATC parameters, see ATC Instructions.

  2. Load a model.

    For details about the model loading workflow, see Loading a Model. After the model is successfully loaded, the model ID is returned.

  3. Create data of the aclmdlDataset type to describe the input and output of model execution.

    For details about the call sequence, see Preparing Input/Output Data Structure for Model Execution.

    Notes:

    • If the size obtained by calling acl.mdl.get_input_size_by_index is 0, the input shape is dynamic. In this case, estimate the required size based on the actual situation and allocate a sufficiently large input buffer.
    • If the size obtained by calling acl.mdl.get_output_size_by_index is 0, the output shape is dynamic. In this case, estimate the required size based on the actual situation and allocate a sufficiently large output buffer.
  4. After a model is successfully loaded and before the model is executed, call acl.mdl.set_dataset_tensor_desc to set the tensor description (mainly the shape information) of the dynamic shape input.

    When calling acl.create_tensor_desc to create a tensor description, set the shape information, including the number of dimensions and the number of elements in each dimension, which must be within the range of the input shape set during model building. For details about model building, see Model Building.

  5. (Optional) Create an Allocator descriptor and register an Allocator.
    Note: Currently, external Allocators can manage memory only in the dynamic-shape model inference scenario. The API for registering an Allocator must be used together with the acl.mdl.execute_async API and must be called before acl.mdl.execute_async is called.
    1. Call acl.rt.allocator_create_desc to create an Allocator descriptor.
    2. Call acl.rt.allocator_set_obj_to_desc, acl.rt.allocator_set_alloc_func_to_desc, acl.rt.allocator_set_get_addr_from_block_func_to_desc, and acl.rt.allocator_set_free_func_to_desc to set the Allocator object and callback function.
    3. Call acl.rt.allocator_register to register the Allocator and bind the Allocator to the stream. The same stream must be used for model execution.
    4. After an Allocator is registered, you can call the acl.rt.allocator_destroy_desc API to destroy the Allocator descriptor.
  6. Execute the model.

    For example, call acl.mdl.execute (synchronous) to execute the model.

  7. Obtain the model execution result.

    Call acl.mdl.get_dataset_tensor_desc to obtain the tensor description of the dynamic shape output, and then use the aclTensorDesc operation APIs to obtain its attributes. The following uses size (the memory size occupied by the tensor data) as an example; after obtaining it, read the data of the corresponding size from memory.

  8. (Optional) If an Allocator was registered in step 5, deregister and destroy the registered Allocator.

    The Allocator registered by a user is bound to a stream. If the Allocator needs to be released or destroyed, call acl.rt.allocator_unregister to deregister it before releasing the stream. Then release the stream resources and destroy the Allocator.
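The sizing note in step 3 and the range check in step 4 can be sketched with plain helpers. This is an illustrative sketch only: the shape range [1~8, 3, 224, 224] and the 4-byte (float32) element size below are assumptions for the example, not values taken from any particular model.

```python
# Sketch of the buffer-sizing and shape-range logic behind steps 3 and 4.
# The shape range and element size here are illustrative assumptions.

def estimate_buffer_size(max_shape, elem_size):
    """Upper-bound buffer size in bytes for a dynamic input/output,
    computed from the largest shape allowed by the model's shape range."""
    size = elem_size
    for dim in max_shape:
        size *= dim
    return size

def shape_in_range(shape, shape_range):
    """Check that a concrete shape lies within the declared range.
    shape_range is a list of (min, max) pairs, one per dimension;
    max == -1 means the dimension is unbounded."""
    if len(shape) != len(shape_range):
        return False
    for dim, (lo, hi) in zip(shape, shape_range):
        if dim < lo or (hi != -1 and dim > hi):
            return False
    return True

# Example: the batch dimension ranges from 1 to 8, other dimensions are fixed.
declared_range = [(1, 8), (3, 3), (224, 224), (224, 224)]

# Allocate for the worst case (batch 8, float32 elements).
max_bytes = estimate_buffer_size([8, 3, 224, 224], 4)

# Validate the shape before passing it to acl.mdl.set_dataset_tensor_desc.
assert shape_in_range([1, 3, 224, 224], declared_range)
```

The same check can be reused for every dynamic input before step 4, so an out-of-range shape fails fast in user code instead of inside model execution.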

Sample Code

After calling an API, add an exception handling branch and record error and warning logs. The following is a code snippet of the key steps only; it is not ready to be built or run as-is.
# In this sample, assume that the first input of the model is dynamic with an index of 0, and that the first output of the model is also dynamic with an index of 0.
# 1. Load the model. After the model is successfully loaded, set the tensor description for dynamic input, mainly the shape information.
# ......

# 2. Prepare the model description modelDesc_, and the model inputs input_ and model outputs output_.
# Pay attention to the following:
# If the size obtained by calling acl.mdl.get_input_size_by_index is 0, the input shape is dynamic. Estimate the required size and allocate a sufficiently large input buffer.
# If the size obtained by calling acl.mdl.get_output_size_by_index is 0, the output shape is dynamic. Estimate the required size and allocate a sufficiently large output buffer.
# ......

# 3. Customize a function to set the description of the dynamic input tensor.
def set_tensor_desc():
    # ......
    # Create a tensor description. You do not need to set the data type and format. pyACL directly obtains them from the model. The default values are used here.
    # The shape must be the same as that of the specified input data.
    shapes = [1, 3, 224, 224]
    inputDesc = acl.create_tensor_desc(0, shapes, 0)

    # Set the description of the dynamic input tensor whose index is 0.
    ret = acl.mdl.set_dataset_tensor_desc(input_, inputDesc, 0)
    # ......

# 4. Create an Allocator descriptor and register an Allocator.
# Assume that allocator is the Allocator object that the user wants to register.
def register_custom_allocator(allocator, stream):
    # 4.1 Create AllocatorDesc.
    allocatorDesc = acl.rt.allocator_create_desc()
    # 4.2 Initialize AllocatorDesc and set the callback functions for Allocator memory allocation and deallocation.
    acl.rt.allocator_set_obj_to_desc(allocatorDesc, allocator)
    acl.rt.allocator_set_alloc_func_to_desc(allocatorDesc, CustomMallocFunc)
    acl.rt.allocator_set_free_func_to_desc(allocatorDesc, CustomFreeFunc)
    acl.rt.allocator_set_alloc_advise_func_to_desc(allocatorDesc, CustomMallocAdviseFunc)
    acl.rt.allocator_set_get_addr_from_block_func_to_desc(allocatorDesc, CustomGetBlockAddrFunc)
    # 4.3 Register the Allocator and bind it to the stream used for model execution.
    acl.rt.allocator_register(stream, allocatorDesc)
    # 4.4 After the Allocator is registered, the Allocator descriptor can be destroyed.
    acl.rt.allocator_destroy_desc(allocatorDesc)

def model_execute():
    # ......
    # Call a custom API to set the description of the dynamic input tensor.
    set_tensor_desc()

    # Execute the model.
    ret = acl.mdl.execute(modelId, input_, output_)

    # Obtain the description of the dynamic output tensor whose index is 0.
    outputDesc = acl.mdl.get_dataset_tensor_desc(output_, 0)

    # Use the aclTensorDesc operation APIs to obtain the attributes of outputDesc. The following uses size (the memory size occupied by the tensor data) as an example; the data of the corresponding size is then read from memory.
    outputFileName = "output_0.bin"  # example output file name

    outputDesc_size = acl.get_tensor_desc_size(outputDesc)
    dataBuffer = acl.mdl.get_dataset_buffer(output_, 0)
    data = acl.get_data_buffer_addr(dataBuffer)

    # Call acl.rt.get_run_mode to obtain the run mode of the software stack and determine, based on the run mode, whether the data needs to be copied from the device to the host.
    runMode, ret = acl.rt.get_run_mode()
    if runMode == ACL_HOST:
        outHostData, ret = acl.rt.malloc_host(outputDesc_size)

        # Because the memory allocated for the dynamic shape may be larger than the output, copy only the actual data size outputDesc_size.
        ret = acl.rt.memcpy(outHostData, outputDesc_size, data, outputDesc_size, ACL_MEMCPY_DEVICE_TO_HOST)
        # outHostData is a pointer; convert it to bytes before writing it to a file.
        with open(outputFileName, "wb") as f:
            f.write(acl.util.ptr_to_bytes(outHostData, outputDesc_size))
        ret = acl.rt.free_host(outHostData)
    else:
        with open(outputFileName, "wb") as f:
            f.write(acl.util.ptr_to_bytes(data, outputDesc_size))
    # ......

# 5. Process the model inference result.
# ......
# 6. Deregister and destroy the registered Allocator.
def unregister_custom_allocator(stream):
    # Deregister the Allocator bound to the stream before releasing the stream, then destroy the custom Allocator.
    acl.rt.allocator_unregister(stream)