Dynamic batch_size

This section describes how to enable the dynamic batch_size feature at model build time.

Overview

batch_size is the number of images processed per batch during model inference. In static batch_size scenarios, the batch_size is determined by the value of N in the shape. In dynamic batch_size scenarios, the batch_size can be set dynamically. For example, to process two, four, or eight images per inference batch, you can set the dynamic batch size profiles to 2,4,8. The memory will be allocated based on the maximum preset batch size.
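As a rough illustration of what sizing memory for the maximum preset batch size means, the sketch below parses a profile string such as 2,4,8 and computes the size of an input buffer for the largest profile. ParseBatchProfiles and MaxInputBytes are hypothetical helper names used only for this example. They are not part of any Ascend API, and the actual allocation is performed by the framework.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not an Ascend API): split a profile string such as
// "2,4,8" into a list of batch sizes.
std::vector<int64_t> ParseBatchProfiles(const std::string& profiles) {
    std::vector<int64_t> result;
    std::stringstream stream(profiles);
    std::string token;
    while (std::getline(stream, token, ',')) {
        const int64_t value = std::stoll(token);  // throws on non-numeric text
        if (value <= 0) {
            throw std::invalid_argument("batch size profiles must be positive");
        }
        result.push_back(value);
    }
    if (result.empty()) {
        throw std::invalid_argument("no batch size profiles given");
    }
    return result;
}

// Hypothetical helper: bytes needed for one input tensor when the buffer is
// sized for the largest profile. per_image_elems is C*H*W for a single image.
std::size_t MaxInputBytes(const std::vector<int64_t>& profiles,
                          std::size_t per_image_elems,
                          std::size_t elem_size) {
    const int64_t max_batch = *std::max_element(profiles.begin(), profiles.end());
    return static_cast<std::size_t>(max_batch) * per_image_elems * elem_size;
}
```

For the profiles 2,4,8 and a 1 x 28 x 28 float input, MaxInputBytes(ParseBatchProfiles("2,4,8"), 1 * 28 * 28, sizeof(float)) sizes the buffer for a batch of eight images.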

Procedure

  1. In the Data operator definition, set the dynamic dimension in the shape to -1.
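    // -1 marks the batch dimension (N) as dynamic; C, H, and W stay fixed.
    auto shape_data = std::vector<int64_t>({-1, 1, 28, 28});
    TensorDesc desc_data(ge::Shape(shape_data), FORMAT_NCHW, DT_FLOAT);
    auto data = op::Data("data");
    // Apply the same descriptor to the input and the output of the Data operator.
    data.update_input_desc_data(desc_data);
    data.update_output_desc_out(desc_data);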
    
  2. At model build time, set INPUT_SHAPE and INPUT_FORMAT in the options argument passed to the aclgrphBuildModel call, and specify the batch size profiles by setting DYNAMIC_BATCH_SIZE.
    • INPUT_FORMAT must be consistent with the format of each Data operator, and only NCHW and NHWC are supported. Otherwise, the model build fails.
    • INPUT_SHAPE must be set.
    void PrepareOptions(std::map<AscendString, AscendString>& options) {
        options.insert({
            {ge::ir_option::INPUT_FORMAT, "NCHW"},
            {ge::ir_option::INPUT_SHAPE, "data:-1,1,28,28"}, // -1 in INPUT_SHAPE indicates dynamic batch.
            {ge::ir_option::DYNAMIC_BATCH_SIZE, "2,4,8"}     // Set the batch size profiles.
        });
    }
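The constraints in step 2 can be checked before the build call. The following is a minimal sketch, assuming a single input entry of the form name:d0,d1,d2,d3 in INPUT_SHAPE. IsSupportedDynamicBatchFormat and HasDynamicBatchDim are hypothetical helper names, not Ascend APIs, and the build itself remains the authority on whether the options are valid.

```cpp
#include <cstddef>
#include <string>

// Hypothetical pre-check (not an Ascend API): only NCHW and NHWC are
// supported for INPUT_FORMAT when dynamic batch_size is used.
bool IsSupportedDynamicBatchFormat(const std::string& format) {
    return format == "NCHW" || format == "NHWC";
}

// Hypothetical pre-check: returns true if the first dimension after the
// colon is -1. N is the first dimension in both NCHW and NHWC.
// Assumption: input_shape holds exactly one entry, e.g. "data:-1,1,28,28".
bool HasDynamicBatchDim(const std::string& input_shape) {
    const std::size_t colon = input_shape.find(':');
    if (colon == std::string::npos) {
        return false;  // no "name:" prefix, so the entry is malformed
    }
    const std::size_t comma = input_shape.find(',', colon + 1);
    const std::size_t length =
        (comma == std::string::npos) ? std::string::npos : comma - colon - 1;
    return input_shape.substr(colon + 1, length) == "-1";
}
```

With the values from PrepareOptions, both checks pass: the format is NCHW and the first dimension of data:-1,1,28,28 is -1.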
    

Precautions

  • This feature is mutually exclusive with the dynamic image size and dynamic dimension features.
  • Excessively large batch sizes or too many batch size profiles can cause the model build to fail.
  • If the number of images processed per batch is not fixed, set DYNAMIC_BATCH_SIZE to the batch sizes you need. For example, to run inference on two, four, or eight images per batch, set this option to 2,4,8. Memory will be allocated based on the runtime batch size.
  • If this option is used to set the dynamic batch_size during model build, note the following before calling the model execution APIs in your inference application:
    • Call the aclmdlSetDynamicBatchSize API to select the batch_size profile to use.
    • If aclmdlSetDynamicBatchSize is not called, the largest batch_size profile is used by default during model execution.

    For details about the API, see "aclmdlSetDynamicBatchSize".

  • If you have set a dynamic batch_size as well as dynamic AIPP (by setting INSERT_OP_FILE):

    In your inference code, call the aclmdlSetInputAIPP API to set the dynamic AIPP parameters. Ensure that the batch_size is set to the largest batch size profile. For details about the API, see "aclmdlSetInputAIPP".

  • An offline model generated with this option includes the dynamic batch_size feature. Its structure may differ from that of a model generated without the option, so inference performance may also differ.
  • If you have set excessively large batch sizes or too many batch size profiles, you are advised to run the swapoff -a command in the operating environment before inference. This disables the use of swap space as memory and keeps the operating environment from slowing down.
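Because the batch size set at run time has to correspond to one of the profiles chosen at build time, an application usually needs to map the number of pending images to a profile before calling aclmdlSetDynamicBatchSize. The sketch below shows one possible policy: pick the smallest profile that can hold the pending images. SelectBatchProfile is a hypothetical helper, not an Ascend API, and padding the unused slots of the batch is an assumption of this example rather than something the toolkit prescribes.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Ascend API): return the smallest profile that
// is greater than or equal to image_count, or -1 if image_count exceeds
// every profile and the images must be split across several batches.
int64_t SelectBatchProfile(const std::vector<int64_t>& profiles,
                           int64_t image_count) {
    int64_t best = -1;
    for (const int64_t profile : profiles) {
        if (profile >= image_count && (best == -1 || profile < best)) {
            best = profile;
        }
    }
    return best;
}
```

With the profiles 2,4,8, three pending images map to profile 4. The returned value is what the application would then pass as the batch size to aclmdlSetDynamicBatchSize, and the caller is responsible for filling or ignoring the one unused image slot.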