Dynamic BatchSize
This section describes how to enable the dynamic BatchSize feature at model build time.
Overview
BatchSize is the number of images processed per batch during model inference. With a static batch size, the batch size is fixed by the value of the N dimension in the input shape. The dynamic BatchSize feature supports scenarios where the batch size is not known until runtime. For example, to process two, four, or eight images per inference batch, set the dynamic batch size profiles to 2,4,8. Memory is allocated based on the maximum preset batch size.
Procedure
- In the Data operator definition, set the dynamic dimension in the shape to -1.
```cpp
auto shape_data = vector<int64_t>({ -1,1,28,28 });
TensorDesc desc_data(ge::Shape(shape_data), FORMAT_ND, DT_FLOAT);
auto data = op::Data("data");
data.update_input_desc_data(desc_data);
data.update_output_desc_out(desc_data);
```
- At model build time, set INPUT_SHAPE and INPUT_FORMAT in the options argument passed to the aclgrphBuildModel call, and specify the supported batch size profiles by setting DYNAMIC_BATCH_SIZE (a complete build-call sketch follows this procedure).
- INPUT_FORMAT must be consistent with the format of each Data operator, and only NCHW and NHWC are supported. Otherwise, the model build fails.
- INPUT_SHAPE is optional. If it is not set, the shapes of the corresponding Data nodes are used by default. Otherwise, the passed value is used and the shapes of the corresponding Data nodes are updated to match it.
```cpp
void PrepareOptions(std::map<AscendString, AscendString>& options) {
    options.insert({
        {ge::ir_option::INPUT_FORMAT, "NCHW"},
        {ge::ir_option::INPUT_SHAPE, "data:-1,1,28,28"},  // -1 in INPUT_SHAPE indicates dynamic batch.
        {ge::ir_option::DYNAMIC_BATCH_SIZE, "2,4,8"}      // Set the batch size profiles.
    });
}
```
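For reference, a minimal build-call sketch follows. It assumes the graph already contains the Data operator defined above and reuses the PrepareOptions helper; the header path, error handling, and output file name are illustrative rather than prescribed.

```cpp
#include <map>
#include "ge_ir_build.h"  // Header path may vary with the CANN version.

// Sketch: build an offline model with dynamic batch size profiles.
// Note: ge::aclgrphBuildInitialize must have been called beforehand,
// per the standard GE IR build flow.
bool BuildDynamicBatchModel(ge::Graph &graph) {
    std::map<AscendString, AscendString> options;
    PrepareOptions(options);  // Fills INPUT_FORMAT, INPUT_SHAPE, and DYNAMIC_BATCH_SIZE as shown above.

    ge::ModelBufferData model;
    if (ge::aclgrphBuildModel(graph, options, model) != ge::GRAPH_SUCCESS) {
        return false;  // Build fails if, for example, the profiles or format are invalid.
    }
    // Save the generated offline model for later inference with AscendCL.
    return ge::aclgrphSaveModel("dynamic_batch_model", model) == ge::GRAPH_SUCCESS;
}
```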
Precautions
- This function is mutually exclusive with dynamic image size and dynamic dimensions.
- Excessively large batch sizes or too many batch size profiles may cause model build failures.
- If the number of images to be processed per batch is not fixed, you can set this option so that the batch size can be selected at runtime. For example, to run inference on two, four, or eight images per batch, set this option to 2,4,8. As noted in the overview, memory is allocated based on the maximum preset batch size.
- If this option is used to set dynamic batch sizes at model build time, perform the following operations before calling the model execution APIs in your inference application (see the runtime sketch after this list):
- Use the aclmdlSetDynamicBatchSize API provided by AscendCL to set the runtime BatchSize.
- If aclmdlSetDynamicBatchSize is not called, the maximum value among the batch size profiles is used by default during model execution.
- If you have set dynamic batch sizes as well as dynamic AIPP (by setting INSERT_OP_FILE):
In your inference code, call the aclmdlSetInputAIPP API provided by AscendCL to set the dynamic AIPP parameters, ensuring that batchSize is set to the maximum batch size. For details about the API, see aclmdlSetInputAIPP. A sketch follows this list.
- The offline model generated with this option supports the dynamic batch size feature. Its structure might differ from that of a model generated without this option, so the two can show different inference performance.
- If you have set very large batch sizes or many batch size profiles, you are advised to run the swapoff -a command to disable the use of swap space as memory, which prevents the operating environment from slowing down due to swapping.
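Regarding the runtime batch size selection described above, the following is a hedged sketch. It assumes modelId, modelDesc, and inputDataset were obtained through the usual AscendCL model loading flow; the chosen profile value of 4 and the error-handling style are illustrative.

```cpp
// Sketch: select a runtime batch size from the preset profiles (2, 4, 8).
// Assumes modelId/modelDesc/inputDataset come from the standard AscendCL setup.
size_t index = 0;
// The build step adds a reserved input for the dynamic batch; look up its index by name.
aclError ret = aclmdlGetInputIndexByName(modelDesc, ACL_DYNAMIC_TENSOR_NAME, &index);
if (ret != ACL_SUCCESS) { /* handle error */ }

// Choose one of the profiles configured via DYNAMIC_BATCH_SIZE, e.g. 4 images this batch.
ret = aclmdlSetDynamicBatchSize(modelId, inputDataset, index, 4);
if (ret != ACL_SUCCESS) { /* handle error */ }

// The model can then be executed as usual, e.g. with
// aclmdlExecute(modelId, inputDataset, outputDataset);
```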
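For the combined dynamic batch size and dynamic AIPP case, the following is a sketch only: aippIndex (the index of the model input bound to dynamic AIPP), the YUV420SP input format, and the 224x224 source image size are assumptions for illustration. The key point from the precaution above is that the batchSize passed when creating the AIPP parameter set is the maximum preset batch size (8 for the 2,4,8 profiles).

```cpp
// Sketch: apply dynamic AIPP together with dynamic batch size.
// The batchSize passed to aclmdlCreateAIPP must be the maximum profile (8 here).
aclmdlAIPP *aippParam = aclmdlCreateAIPP(8);
aclError ret = aclmdlSetAIPPInputFormat(aippParam, ACL_YUV420SP_U8);  // Example input format.
ret = aclmdlSetAIPPSrcImageSize(aippParam, 224, 224);                 // Example source image size.
// aippIndex: assumed index of the model input with dynamic AIPP attached.
ret = aclmdlSetInputAIPP(modelId, inputDataset, aippIndex, aippParam);
if (ret != ACL_SUCCESS) { /* handle error */ }
aclmdlDestroyAIPP(aippParam);
```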