Operator Tuning
- Dump operator information.
Before tuning operators, obtain the operator information files (.json files) of the model. These files contain the shape, dtype, and format of each operator. Currently, two methods are available for dumping the operator .json files. Choose one as required.
- When programming with the PyTorch Python APIs, you can dump the operator .json files by calling the Ascend PyTorch Profiler API. For details, see "Other Collection Methods > Profile Data Collection with PyTorch Framework APIs > Ascend PyTorch Profiler APIs" in the Performance Tuning Tool User Guide.
- Use the Ascend PyTorch Profiler API to enable profile data collection during PyTorch-based training.
Before training, enable operator information collection in the extended option experimental_config by setting record_op_args to True.
- View the result file of profile data collected during PyTorch-based training.
After training is complete, the dumped operator .json files are stored in the {worker_name}_{timestamp}_ascend_pt_op_args/{pid} directory by default.
- When using C++ APIs of AscendCL for programming, you can dump the operator information file to a specified directory by calling aclopStartDumpArgs and aclopStopDumpArgs.
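Because the profiler writes one {pid} directory per process, it can help to list them before choosing the path to pass to op_compiler -p. The helper below is a minimal sketch: the function name is hypothetical, and it assumes the default {worker_name}_{timestamp}_ascend_pt_op_args/{pid} layout located directly under the directory given as its argument.

```shell
#!/bin/sh
# Hypothetical helper: list the {pid} subdirectories of every
# *_ascend_pt_op_args directory directly under the given root (default: .).
# Assumes the default layout {worker_name}_{timestamp}_ascend_pt_op_args/{pid}.
list_op_args_dirs() {
  root=${1:-.}
  # Depth 2 below root is the {pid} level; -path keeps only op_args trees.
  find "$root" -mindepth 2 -maxdepth 2 -type d -path '*_ascend_pt_op_args/*' | sort
}
```

For example, list_op_args_dirs ./profiling_output prints one {pid} directory per line (./profiling_output is a placeholder path).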
- Compile the static kernel package.
From any directory, run one of the following commands as the running user (for example, HwHiAiUser) to compile the operators:
- Command example of static compilation (default mode):
op_compiler --op_params_dir=<dump_dir> --soc_version=<soc_version> --log=info --job=128 --output=<output_dir>
Or, using short options:
op_compiler -p <dump_dir> -v <soc_version> -l info -j 128 -o <output_dir>
- Command example of compilation (tuning mode):
op_compiler --op_params_dir=<dump_dir> --soc_version=<soc_version> --log=info --job=128 --compile_mode=tune --output=<output_dir>
Or, using short options:
op_compiler -p <dump_dir> -v <soc_version> -l info -j 128 -m tune -o <output_dir>
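The default and tuning-mode commands differ only in -m tune. The wrapper below is a hedged sketch that builds the short-option command shown above and either prints it or runs it. The function name and the DRY_RUN variable are hypothetical and are not part of op_compiler; -l info and -j 128 are hard-coded to match the examples.

```shell
#!/bin/sh
# Hypothetical wrapper around the op_compiler commands shown above.
# Usage: run_op_compiler <dump_dir> <soc_version> <output> [compile|tune]
# Set DRY_RUN to a non-empty value to print the command instead of running it.
run_op_compiler() {
  dump_dir=$1 soc_version=$2 output=$3 mode=${4:-compile}
  # Rebuild the positional parameters as the op_compiler argument list.
  set -- -p "$dump_dir" -v "$soc_version" -l info -j 128
  if [ "$mode" = "tune" ]; then
    set -- "$@" -m tune   # tuning mode; compile is the default and needs no -m
  fi
  set -- "$@" -o "$output"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "op_compiler $*"
  else
    op_compiler "$@"
  fi
}
```

For example, DRY_RUN=1 run_op_compiler ./dump Ascendxxxyy ./out.run tune only prints the tuning-mode command, which makes it easy to review before running it for real.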
The key parameters are described as follows. Set them as required; for details, see "Command-Line Options" in the Operator Compilation Tool User Guide.
- --op_params_dir: -p for short. This mandatory parameter specifies the absolute or relative path of the directory that stores the operator information files exported by the dump tool.
- --soc_version: -v for short. This parameter is required for operator compilation and specifies the Ascend AI Processor version.
To query the value of soc_version of the current device:
- Run the npu-smi info command on the server where the Ascend AI Processor is installed to obtain the Chip Name. The value of soc_version is Ascend followed by the Chip Name. For example, if Chip Name is xxxyy, set soc_version to Ascendxxxyy.
- --log: -l for short. This parameter is optional and specifies the log level during operator compilation. By default, this parameter is set to null, indicating that no log is generated. The other options are debug, info, warning, and error.
- debug: outputs debug, info, warning, error, and event logs.
- info: outputs info, warning, error, and event logs.
- warning: outputs warning, error, and event logs.
- error: outputs error and event logs.
- --job: -j for short. This parameter is optional and specifies the number of working processes during compilation. Default value: 2 x (maximum number of physical CPU cores) - 1. Minimum value: 1.
- --compile_mode: -m for short. This parameter is optional and specifies the compilation mode. The options are compile (default) and tune.
- --output: -o for short. This parameter is optional and specifies the absolute or relative path of the output installation package, including the directory and package name, for example, xxx/xxx/*.run.
If no directory is specified, the installation package is generated in the current directory where the command is executed. If no package name is specified, the default name static_kernel_${datatime}.run is used.
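The soc_version rule and the default --job value described above are simple to compute in a script. A minimal sketch follows, with hypothetical function names. It assumes the Chip Name has already been read from the npu-smi info output, and the physical core count is supplied by the caller because how op_compiler itself determines that number is not described here.

```shell
#!/bin/sh
# soc_version is "Ascend" followed by the Chip Name shown by npu-smi info.
soc_version_from_chip_name() {
  printf 'Ascend%s\n' "$1"
}

# Default --job value: 2 x (maximum number of physical CPU cores) - 1,
# never below the minimum of 1. The core count is passed in as $1.
default_job_count() {
  n=$((2 * $1 - 1))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}
```

For example, soc_version_from_chip_name xxxyy prints Ascendxxxyy, and default_job_count 8 prints 15.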
You can use --count and -p together to count the number of operator information .json files in the directory specified by -p.
The following is an example:
op_compiler -p <dump_dir> --count
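As a rough cross-check on a machine without op_compiler, the .json files in a dump directory can also be counted with standard tools. This is only an approximation under an assumption: it counts every *.json file recursively, which may differ from exactly what --count reports. The function name is hypothetical.

```shell
#!/bin/sh
# Count *.json files anywhere under the given directory (recursive).
# Approximation of "op_compiler -p <dump_dir> --count"; the tool's own rules may differ.
count_op_json_files() {
  find "$1" -type f -name '*.json' | wc -l | tr -d ' '
}
```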
Operator information can be dumped only for operators with dynamic shapes. After the static kernel package is installed, information about the operators it covers is no longer dumped. Therefore, if the network is adjusted, you can check whether the static kernel package still matches the network by comparing the number of dumped .json files before and after the adjustment.
That is, perform the dump before and after the network adjustment, and use the --count option to count the .json files generated each time. If the count after the adjustment is greater than the count before it, some operators in the adjusted network no longer match the static kernel package. In this case, choose either of the following methods:
- Uninstall the static kernel package, perform the dump operation again, and compile and install the new static kernel package.
- Keep using the current static kernel package. In this case, unmatched operators go through the dynamic process, and no performance improvement is achieved for them.
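The decision rule above reduces to comparing two counts. A minimal sketch, assuming both counts were obtained with --count; the function name and messages are hypothetical. A count that did not grow is only weak evidence, because an adjustment could remove some unmatched operators while adding others.

```shell
#!/bin/sh
# Compare the number of dumped .json files before ($1) and after ($2) a network
# adjustment. Returns 1 if the count grew, that is, the adjusted network has
# operators that no longer match the installed static kernel package.
check_static_kernel_match() {
  before=$1 after=$2
  if [ "$after" -gt "$before" ]; then
    echo "Count grew from $before to $after: rebuild the package or accept the dynamic process for unmatched operators."
    return 1
  fi
  echo "Count did not grow ($before -> $after): this check found no newly unmatched operators."
  return 0
}
```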
- Install the static kernel package.
Go to the directory where the static_kernel_${datatime}.run package is stored and run the package as the running user:
./static_kernel_${datatime}.run
The following information indicates that the installation is successful:
Verifying archive integrity... 100% SHA256 checksums are OK. All good.
Uncompressing STATIC KERNEL RUN PACKAGE 100%
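When installation is scripted, the success message can be checked automatically. The sketch below uses a hypothetical function name; it runs the package from the current directory and treats the run as successful only if the package exits with status 0 and its output contains the "SHA256 checksums are OK. All good." text shown above. The exact wording of the other output lines is not relied on.

```shell
#!/bin/sh
# Hypothetical installer wrapper: run ./<package>.run from the current directory,
# echo its output, and check for the integrity message shown above.
install_static_kernel() {
  pkg=$1
  out=$("./$pkg" 2>&1)
  status=$?
  printf '%s\n' "$out"
  [ "$status" -eq 0 ] || return 1
  case $out in
    *"SHA256 checksums are OK. All good."*) return 0 ;;
    *) return 1 ;;
  esac
}
```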
Currently, the installation directory cannot be specified. The .run package is installed in the ${INSTALL_DIR}/opp/static_kernel directory.
The following is an example of the directory structure after the .run package is installed:
|-- ${INSTALL_DIR}/opp/static_kernel
    |-- ai_core
        |-- config
        |   |-- ascendxxxx
        |       |-- binary_info_config.json      # Indexes of all static kernel packages.
        |-- config.ini                           # Configuration file that records the installation sequence.
        |-- static_kernel_230808110316           # Static kernel files with the timestamp 230808110316.
        |   |-- ascendxxxx
        |   |   |-- Add                          # Directory for storing operator binary files.
        |   |   |   |-- Add_float16_NCL_xxxx_d0.json
        |   |   |   |-- Add_float16_NCL_xxxx_d1.json
        |   |   |   |-- Add_float16_NCL_xxxx_d0.o
        |   |   |   |-- Add_float16_NCL_xxxx_d1.o
        |   |   |-- xxxx
        |   |   |   |-- xxx.json
        |   |   |   |-- xxx.o
        |   |   |-- ......
        |   |-- config                           # Index of a single static kernel package.
        |   |   |-- ascendxxxx
        |   |       |-- binary_info_config.json
        |   |-- scripts                          # Common scripts.
        |   |   |-- ......
        |   |-- uninstall.sh                     # Script for uninstalling a single package.
        |-- static_kernel_xxxx                   # Static kernel files with different timestamps.
        |-- uninstall.sh                         # Script for uninstalling all packages.
        |-- version.info                         # Version information.
Multiple kernel packages can be installed. If two packages contain the same operator kernel, the kernel package installed later is used.
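To see which packages are currently installed, the static_kernel_* folders under ai_core can be listed. This is a sketch assuming the directory layout described above; the function name is hypothetical. The folders are named by timestamp (presumably the time the package was built), so the listing follows that order, which is not necessarily the installation order that decides which kernel is used; the installation sequence is recorded in config.ini.

```shell
#!/bin/sh
# List installed static kernel packages under <INSTALL_DIR>/opp/static_kernel/ai_core.
# Glob results are sorted by name, that is, by the timestamp in the folder name.
list_static_kernel_packages() {
  base="$1/opp/static_kernel/ai_core"
  for d in "$base"/static_kernel_*; do
    if [ -d "$d" ]; then
      basename "$d"
    fi
  done
}
```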
- (Optional) Uninstall a single kernel package or all kernel packages.
- Uninstalling a single package
Go to the directory where the static_kernel_${datatime}.run package is installed and run uninstall.sh as the running user.
cd ${INSTALL_DIR}/opp/static_kernel/ai_core/static_kernel_${datatime}
./uninstall.sh
The static_kernel_${datatime} folder is deleted from ai_core.
- Uninstalling all packages
Go to the ${INSTALL_DIR}/opp/static_kernel/ai_core directory and run uninstall.sh as the running user.
cd ${INSTALL_DIR}/opp/static_kernel/ai_core/
./uninstall.sh
All content in the ai_core directory is deleted, and all installed kernel packages are uninstalled.
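A guarded version of the single-package uninstall step is sketched below; the function name is hypothetical. It takes ${INSTALL_DIR} and the timestamp part of the folder name, refuses to proceed if that folder has no executable uninstall.sh, and otherwise runs the script from inside the folder, as in the single-package steps above.

```shell
#!/bin/sh
# Hypothetical helper: uninstall one static kernel package.
# $1: INSTALL_DIR, $2: timestamp part of static_kernel_<timestamp>.
uninstall_static_kernel() {
  dir="$1/opp/static_kernel/ai_core/static_kernel_$2"
  if [ ! -x "$dir/uninstall.sh" ]; then
    echo "uninstall.sh not found in $dir" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$dir" && ./uninstall.sh)
}
```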