Operator Tuning
This section provides guidance on tuning operators in PyTorch-based training scenarios, including tuning precautions, operator graph dump, environment variable configuration, and tuning commands.
Tuning Precautions
- Ensure that the training script runs successfully on the Ascend AI Processor and that its function and accuracy meet expectations.
- You are advised not to bind the training process to a specific CPU; use the default CPU scheduling policy. Otherwise, the tuning effect may be affected.
- To improve tuning efficiency, keep the number of training steps as small as possible. Generally, one step completes a full graph execution. Ensure that all operators in the graph are traversed during tuning.
- Currently, only static operators are supported. Dynamic operators are not supported.
- Only single-device scripts can be used to dump graphs.
- AOE does not allow different users to use the same device for tuning at the same time.
- If there is only one AOE process, ensure that the following conditions are met. If there are multiple AOE processes, scale these requirements accordingly.
- Available disk space in the home directory of the user who performs tuning: ≥ 20 GB
- Available memory: ≥ 32 GB
- Recommended number of host CPUs during operator tuning: ≥ TE_PARALLEL_COMPILER + TUNING_PARALLEL_NUM + 1 + min(number of CPU cores / 2, 8) + 58. For details about TE_PARALLEL_COMPILER and TUNING_PARALLEL_NUM, see Table 1.
- Number of device cores ≥ Maximum number of cores used by all operators in the model
- Device memory: depends on the model and on model memory overcommitment.
- Before tuning, disable the profiling function to avoid affecting the tuning result. For details about how to disable the profiling function, see the Performance Tuning Tool User Guide.
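As a quick sanity check, the host-CPU recommendation above can be computed with a short script (a minimal sketch; the function name and the example values for TE_PARALLEL_COMPILER, TUNING_PARALLEL_NUM, and the core count are illustrative):

```python
def recommended_host_cpus(te_parallel_compiler, tuning_parallel_num, cpu_cores):
    # Recommended host CPUs >= TE_PARALLEL_COMPILER + TUNING_PARALLEL_NUM
    #                          + 1 + min(cpu_cores / 2, 8) + 58
    return te_parallel_compiler + tuning_parallel_num + 1 + min(cpu_cores // 2, 8) + 58

# Example: TE_PARALLEL_COMPILER=8, TUNING_PARALLEL_NUM=4 on a 32-core host.
print(recommended_host_cpus(8, 4, 32))  # 8 + 4 + 1 + 8 + 58 = 79
```

If several AOE processes share the host, multiply the per-process terms accordingly, as noted above.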
Dumping the Operator Graph
Method 1: Dump the operator graph by calling aclGenGraphAndDumpForOp.
# Confirm the PyTorch framework version at the top of the model script.
import torch
if torch.__version__ >= "1.8":
    import torch_npu
else:
    import torch.npu

def train_model():
    # For version 1.8 or later, enable the AOE dump interface as follows.
    # dump_path is the path for storing the operator subgraph; set it as required.
    torch_npu.npu.set_aoe(dump_path)
    train_model_one_step()  # Model training process; generally only one step is required.

# dump_path indicates the path for saving the dumped operator graph. It is mandatory and
# cannot be empty. If the configured path does not exist, the system creates it.
# Multi-level directories are supported.
# Lines 427-437 of the reference script (see the link below).
model.train()
optimizer.zero_grad()
end = time.time()
torch_npu.npu.set_aoe(dump_path)  # Enable the interface.
# Graph mode
if args.graph_mode:
    print("args.graph_mode")
    torch.npu.enable_graph_mode()
for i, (images, target) in enumerate(train_loader):
    if i > 0:  # Only one step is required.
        exit()
    if i > 100:
        pass
    # Measure the data loading time.
    data_time.update(time.time() - end)
    if args.gpu is not None:
        images = images.cuda(args.gpu, non_blocking=True)
Reference link: https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/PyTorch/built-in/cv/classification/ResNet50_for_PyTorch/pytorch_resnet50_apex.py
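The single-step gating shown above (exit after step 0 so that AOE traverses the whole graph exactly once) can be sketched framework-independently; run_one_step and step_fn are illustrative names, not part of the AOE or torch_npu API:

```python
def run_one_step(train_loader, step_fn):
    # Execute exactly one training step: enough for AOE to traverse
    # every operator in the graph once, then stop.
    for i, batch in enumerate(train_loader):
        if i > 0:  # Only one step is required for graph dumping.
            break
        step_fn(batch)

processed = []
run_one_step([["batch0"], ["batch1"], ["batch2"]], processed.append)
print(processed)  # [['batch0']]
```

In the real script, step_fn corresponds to the forward/backward pass and optimizer update of one iteration.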
Configuring Environment Variables
- Basic environment variables of the CANN software
The CANN portfolio provides a process-level environment variable setting script that sets the required variables automatically. The following commands are examples that use the default installation paths for the root and non-root users; replace them with the actual installation paths.
# Install Toolkit as the root user.
. /usr/local/Ascend/ascend-toolkit/set_env.sh
# Install Toolkit as a non-root user.
. ${HOME}/Ascend/ascend-toolkit/set_env.sh
- AOE depends on Python. Taking Python 3.7.5 as an example, run the following commands as the running user to configure the related environment variables:
# Set the Python 3.7.5 library path.
export LD_LIBRARY_PATH=/usr/local/python3.7.5/lib:$LD_LIBRARY_PATH
# If multiple Python 3 versions exist in the user environment, use Python 3.7.5.
export PATH=/usr/local/python3.7.5/bin:$PATH
Replace the Python 3.7.5 installation path as required. You can also write the preceding commands to the ~/.bashrc file and run the source ~/.bashrc command to make them take effect immediately.
- Before tuning, you can configure other optional environment variables by referring to the following example. For details, see Table 1.
export ASCEND_DEVICE_ID=0
export TUNE_BANK_PATH=/home/HwHiAiUser/custom_tune_bank
export TE_PARALLEL_COMPILER=8
export REPEAT_TUNE=False
You can write the commands for configuring environment variables to the custom script for future use.
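For example, the variables can be collected in a small setup script and sourced before each tuning run (aoe_env.sh is an illustrative file name, and the values are the same examples shown above; adjust them to your environment):

```shell
# Write the optional AOE variables to a reusable script, then source it.
cat > aoe_env.sh <<'EOF'
export ASCEND_DEVICE_ID=0
export TUNE_BANK_PATH=/home/HwHiAiUser/custom_tune_bank
export TE_PARALLEL_COMPILER=8
export REPEAT_TUNE=False
EOF
. ./aoe_env.sh
echo "TE_PARALLEL_COMPILER=$TE_PARALLEL_COMPILER"
```

Sourcing the file (with `.` or `source`) rather than executing it keeps the variables in the current shell session.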
Performing Tuning
Use AOE to tune the operators in the dumped graph. In the following example, --job_type=2 selects operator tuning, and dump_path must be replaced with the directory that contains the dumped operator graph:
aoe --job_type=2 --model_path=dump_path
For more AOE parameters, see AOE Command-Line Options.
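A thin wrapper can guard the tuning command against common mistakes such as a missing dump directory or an unsourced CANN environment (a hedged sketch; the script structure and the default DUMP_PATH value are illustrative, not part of the AOE tooling):

```shell
# Verify prerequisites, then launch operator tuning (job_type=2).
DUMP_PATH="${1:-./aoe_dump}"
mkdir -p "$DUMP_PATH"   # In a real run this directory is produced by the dump step.
if command -v aoe >/dev/null 2>&1; then
    aoe --job_type=2 --model_path="$DUMP_PATH"
else
    echo "aoe not found in PATH; source set_env.sh before tuning"
fi
```

Run it as the same user who performed the dump, since the tuning result bank is written under that user's home directory.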