Tuning Procedure

This section provides tuning command examples in offline inference scenarios.

  • If there is only one AOE process, ensure that the following conditions are met. If multiple AOE processes run concurrently, scale the resource requirements accordingly.
    • Available disk space in the home directory of the user who performs tuning: ≥ 20 GB
    • Available memory: ≥ 32 GB Note: If operators with large shapes exist in the model, more memory may be required.
    • Recommended number of host CPUs during operator tuning when --model_path is not specified: ≥ TE_PARALLEL_COMPILER + TUNING_PARALLEL_NUM + 1 + min(Number of CPU cores/2, 8) + 50; when --model_path is specified: ≥ TE_PARALLEL_COMPILER + TUNING_PARALLEL_NUM + 1 + min(Number of CPU cores/2, 8) + 58. For details about TE_PARALLEL_COMPILER and TUNING_PARALLEL_NUM, see Table 1.
    • Recommended number of host CPUs during subgraph tuning: ≥ 2 x TUNING_PARALLEL_NUM + TE_PARALLEL_COMPILER + 1. For details about TE_PARALLEL_COMPILER and TUNING_PARALLEL_NUM, see Table 1.
    • Number of device cores ≥ Maximum number of cores used by all operators in the model
    • Device memory: depends on the model and on the model memory overcommitment configuration.
  • Before tuning, disable the profiling function so that it does not affect the tuning result. For details about how to disable the profiling function, see the Performance Tuning Tool User Guide.
  • AOE does not allow different users to use the same device for tuning at the same time.
  • The AOE tuning engine also provides other functions controlled by environment variables. For details, see Environment Variable Configuration.
  • You are advised to perform subgraph tuning before operator tuning. Subgraph tuning determines the graph partition mode; after it completes, operators are partitioned into their final shapes, and operator tuning can then be performed on those final shapes. If operator tuning is performed first, the shapes of the tuned operators are not the final shapes produced by operator partitioning and therefore do not match the actual application scenario.
  • Run the AOE tuning engine to tune subgraphs.

    Command example:

    aoe --framework=0 --model=./resnet18.prototxt --weight=./resnet18.caffemodel --job_type=1
  • Run the AOE tuning engine to reload and tune subgraphs.

    If the current subgraph tuning process is interrupted and you want to resume tuning from the previous phase, use this mode.

    Command example:

    aoe --framework=0 --model=./resnet18.prototxt --weight=./resnet18.caffemodel --job_type=1 --reload

    Run this command in the same directory where the previous tuning command was executed, because reload tuning requires the intermediate files of the previous run. These files are stored in the aoe_workspace directory under the path where the previous tuning command was executed.

  • Run the AOE tuning engine to tune operators.

    Command example:

    aoe --framework=0 --model=./resnet18.prototxt --weight=./resnet18.caffemodel --job_type=2
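As a rough sketch, the host-CPU recommendation above can be evaluated in a small shell script before starting the recommended subgraph-then-operator sequence. The default values assigned to TE_PARALLEL_COMPILER and TUNING_PARALLEL_NUM below are placeholders for illustration, not documented defaults; take the real values from Table 1.

```shell
# Placeholder defaults -- replace with the values from Table 1.
TE_PARALLEL_COMPILER=${TE_PARALLEL_COMPILER:-8}
TUNING_PARALLEL_NUM=${TUNING_PARALLEL_NUM:-4}

cores=$(getconf _NPROCESSORS_ONLN)        # CPU cores on this host
half=$((cores / 2))
[ "$half" -gt 8 ] && half=8               # min(Number of CPU cores/2, 8)

# Operator tuning without --model_path: ... + 50
needed=$((TE_PARALLEL_COMPILER + TUNING_PARALLEL_NUM + 1 + half + 50))
echo "Recommended host CPUs (operator tuning, no --model_path): >= $needed"

# Recommended order: subgraph tuning (job_type=1) first, then operator
# tuning (job_type=2) on the final operator shapes:
#   aoe --framework=0 --model=./resnet18.prototxt --weight=./resnet18.caffemodel --job_type=1
#   aoe --framework=0 --model=./resnet18.prototxt --weight=./resnet18.caffemodel --job_type=2
```

The aoe invocations are left as comments so the check can run on a host without the toolchain installed.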

For more AOE parameters, see AOE Command-Line Options.