Tuning Procedure

Prerequisites

  • IR graph construction has been completed. For details about the development process, see the Ascend Graph Developer Guide.
  • The .air file for IR graph construction has been obtained. The obtaining procedure is as follows:
    1. Open the developed .cpp graph file. After graph construction and before the aclgrphBuildInitialize call, add the following code to generate the corresponding .air file. /path/to/graph.air is the path and name of the .air file; change it as required. For details about the SaveToFile API, see SaveToFile.
      graph.SaveToFile("/path/to/graph.air");
    2. Save the preceding .cpp file and run it again to obtain the corresponding graph.air file.

      After the tuning is complete, comment out graph.SaveToFile("/path/to/graph.air"); to prevent redundant .air files from being generated.

    Ascend Intermediate Representation (AIR), similar to ONNX, is an open file format defined by Huawei for machine learning; it is better adapted to Ascend AI Processors.

Procedure

  • If there is only one AOE process, ensure that the following conditions are met. If multiple AOE processes run concurrently, scale these requirements accordingly.
    • Available disk space in the home directory of the user who performs tuning: ≥ 20 GB
    • Available memory: ≥ 32 GB. Note: if the model contains operators with large shapes, more memory may be required.
    • Recommended number of host CPUs during operator tuning when --model_path is not specified: ≥ TE_PARALLEL_COMPILER + TUNING_PARALLEL_NUM + 1 + min(Number of CPU cores/2, 8) + 50. When --model_path is specified: ≥ TE_PARALLEL_COMPILER + TUNING_PARALLEL_NUM + 1 + min(Number of CPU cores/2, 8) + 58. For details about TE_PARALLEL_COMPILER and TUNING_PARALLEL_NUM, see Table 1.
    • During subgraph tuning, the recommended number of host CPUs is ≥ 2 x TUNING_PARALLEL_NUM + TE_PARALLEL_COMPILER + 1. For details about TE_PARALLEL_COMPILER and TUNING_PARALLEL_NUM, see Table 1.
    • Number of device cores ≥ Maximum number of cores used by all operators in the model
    • Device memory: related to the model and model memory overcommitment.
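The sizing formulas above can be sketched as a small script. The values of TE_PARALLEL_COMPILER, TUNING_PARALLEL_NUM, and the core count below are assumed example values, not defaults taken from this document; substitute your own configuration.

```shell
#!/bin/sh
# Estimate recommended host CPU counts from the formulas above.
# Assumed example values -- replace with your actual configuration:
TE_PARALLEL_COMPILER=8   # operator build parallelism (see Table 1)
TUNING_PARALLEL_NUM=8    # tuning parallelism (see Table 1)
CPU_CORES=16             # number of CPU cores on the host

HALF=$((CPU_CORES / 2))
if [ "$HALF" -lt 8 ]; then MIN_TERM=$HALF; else MIN_TERM=8; fi   # min(cores/2, 8)

OP_TUNING=$((TE_PARALLEL_COMPILER + TUNING_PARALLEL_NUM + 1 + MIN_TERM + 50))
OP_TUNING_MODEL_PATH=$((TE_PARALLEL_COMPILER + TUNING_PARALLEL_NUM + 1 + MIN_TERM + 58))
SUBGRAPH_TUNING=$((2 * TUNING_PARALLEL_NUM + TE_PARALLEL_COMPILER + 1))

echo "operator tuning (no --model_path):   >= $OP_TUNING host CPUs"
echo "operator tuning (with --model_path): >= $OP_TUNING_MODEL_PATH host CPUs"
echo "subgraph tuning:                     >= $SUBGRAPH_TUNING host CPUs"
```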
  • Before tuning, disable the profiling function to avoid affecting the tuning result. For details about how to disable it, see the Performance Tuning Tool User Guide.
  • AOE does not allow different users to use the same device for tuning at the same time.
  • The AOE tuning engine also provides other functions controlled by environment variables. For details, see Environment Variable Configuration.
  • You are advised to perform subgraph tuning first and then operator tuning. Subgraph tuning determines the graph partition mode; once it completes, the operators are partitioned into their final shapes, and operator tuning can then be performed on those shapes. If operator tuning is performed first, the tuned operator shapes are not the final shapes produced by partitioning, which does not match the actual application scenario.
  • Run the AOE to tune subgraphs.

    Command example:

    aoe --framework=1 --model=./xxxx.air --job_type=1
  • Run the AOE to reload and tune subgraphs.

    If the subgraph tuning process is interrupted and you want to resume from the previous phase, use this mode.

    aoe --framework=1 --model=./xxxx.air --job_type=1 --reload

    This command must be run in the directory where the previous tuning command was executed, because reload tuning requires the intermediate files of that run, which are stored in the aoe_workspace directory under the path where the previous command was executed.
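The directory requirement can be made explicit with a small helper that emits the reload command only when an aoe_workspace directory from a previous run is present. The helper name is illustrative; the command strings mirror the examples above.

```shell
#!/bin/sh
# aoe_resume_cmd: print the subgraph tuning command for a directory,
# adding --reload only if that directory holds an aoe_workspace left
# over from a previous (interrupted) tuning run.
aoe_resume_cmd() {
    workdir=$1
    if [ -d "$workdir/aoe_workspace" ]; then
        # Intermediate files from the last run exist: resume tuning.
        echo "aoe --framework=1 --model=./xxxx.air --job_type=1 --reload"
    else
        # No previous state here: a fresh tuning run is needed.
        echo "aoe --framework=1 --model=./xxxx.air --job_type=1"
    fi
}
```

Run the printed command from the directory where the previous tuning command was executed, e.g. `$(aoe_resume_cmd .)`.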

  • Run the AOE to tune operators.

    Command example:

    aoe --framework=1 --model=./xxxx.air --job_type=2

    For more AOE parameters, see AOE Command-Line Options.
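Putting the recommended order together, a driver script can run subgraph tuning first and operator tuning second. This is a sketch that assembles the two commands from the examples above; they are printed rather than executed so the sequence can be reviewed before running it for real.

```shell
#!/bin/sh
# Two-phase tuning in the recommended order: subgraph tuning first
# (job_type=1), then operator tuning (job_type=2) on the final shapes.
MODEL=./xxxx.air   # placeholder model path from the examples above

SUBGRAPH_CMD="aoe --framework=1 --model=$MODEL --job_type=1"
OPERATOR_CMD="aoe --framework=1 --model=$MODEL --job_type=2"

# Printed rather than executed; run the commands directly for real tuning.
echo "step 1 (subgraph tuning): $SUBGRAPH_CMD"
echo "step 2 (operator tuning): $OPERATOR_CMD"
```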