Compiling an Operator to Generate the Kernel Package

  1. Dump the operator information file. Two methods are currently available for dumping the operator .json file. Choose one as required.
    • When programming with the PyTorch Python APIs, you can dump the operator .json file by calling the Ascend PyTorch Profiler API. For details, see "Other Collection Methods > Profile Data Collection with PyTorch Framework APIs > Ascend PyTorch Profiler APIs" in Performance Tuning Tool User Guide.
      1. Use the Ascend PyTorch Profiler API to enable profile data collection during PyTorch-based training.

        Before training, enable operator information statistics by setting record_op_args to True in the extended option experimental_config.

      2. View the result file of profile data collected during PyTorch-based training.

        After the training is complete, the dumped operator .json file is stored in the {worker_name}_{timestamp}_ascend_pt_op_args/{pid} directory by default.

    • When using the AscendCL C++ APIs for programming, you can use the aclopStartDumpArgs and aclopStopDumpArgs APIs to dump the operator statistics file to a specified directory.
  2. In any directory, run one of the following commands as the running user (for example, HwHiAiUser) to compile the operator. Each example is shown in two equivalent forms, first with long options and then with short options. Run only one of them.

    Command example of compilation (default mode):

    op_compiler --op_params_dir=<dump_dir> --soc_version=<soc_version> --log=info --job=128 --output=<output_dir>
    op_compiler -p <dump_dir> -v <soc_version> -l info -j 128 -o <output_dir>

    Command example of compilation (tuning mode):

    op_compiler --op_params_dir=<dump_dir> --soc_version=<soc_version> --log=info --job=128 --compile_mode=tune --output=<output_dir>
    op_compiler -p <dump_dir> -v <soc_version> -l info -j 128 -m tune -o <output_dir>
    • --op_params_dir: -p for short. This option is required, and specifies the absolute or relative path where the operator information file exported by the dump tool is stored.
    • --soc_version: -v for short. This option is required for operator compilation, and specifies the model of the AI processor to compile for.

      If you do not know the soc_version of the current device, run the npu-smi info command on the server where the NPU driver package is installed and prefix the obtained value of Name with Ascend. For example, if the value of Name is xxxyy, the actual soc_version is Ascendxxxyy.

    • --log: -l for short. This option is optional and specifies the log level during operator compilation. It can be set to debug, info, warning, error, or null (default).
    • --job: -j for short. This option is optional and specifies the number of working processes during compilation. The minimum value is 1 and the default value is 16.
    • --compile_mode: -m for short. This option is optional. Set it to tune to enable the tuning mode, in which tuning is performed together with compilation. If this option is omitted, the default compilation process is executed.
    • --output: -o for short. This option is optional, and specifies the absolute or relative path and name of the output installation package, for example, xxx/xxx/xxx.run. If no path is specified, the installation package is generated in the current path. If no package name is specified, the default name static_kernel_${datatime}.run is used.

    For details about all options supported by the tool, see Command-Line Options.

    If you see information similar to the following, the compilation is successful.

    generate run package static_kernel_${datatime}.run success
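
    The option descriptions above can be tied together in a small shell sketch. It only prints a default-mode command line instead of running it, so it can be tried on a machine without op_compiler. The function names soc_version_from_name and print_compile_cmd are hypothetical helpers introduced for illustration, not part of the tool. Parsing the output of npu-smi info is deliberately left out because this section does not describe its format. Pass the Name value by hand.

```shell
# Hypothetical helpers (not part of op_compiler); they only build strings.

# Prefix "Ascend" to the Name value reported by `npu-smi info`,
# as described for --soc_version.
soc_version_from_name() {
    printf 'Ascend%s\n' "$1"
}

# Print (do not run) a default-mode compile command in short-option form.
# Arguments: <dump_dir> <npu_name> <output_path> [jobs]
# When [jobs] is omitted, 16 is used, matching the documented default of --job.
print_compile_cmd() {
    printf 'op_compiler -p %s -v %s -l info -j %s -o %s\n' \
        "$1" "$(soc_version_from_name "$2")" "${4:-16}" "$3"
}
```

    For example, print_compile_cmd ./dump xxxyy ./out/kernels.run 128 prints a command that uses -v Ascendxxxyy and -j 128. Review the printed line before running it yourself.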
    

You can use --count and -p together to count the number of operator information .json files in the directory specified by -p.

An example is as follows.

op_compiler -p <dump_dir> --count

Operator information can be dumped only for dynamic shapes. After the static kernel package is installed, the information of the corresponding operators is no longer dumped. If the network is adjusted, you can check whether the static kernel package still matches the current network by comparing the number of dumped .json files before and after the adjustment.

That is, perform the dump operation before and after the network adjustment and then use the --count option to count the number of .json files generated after the dump operation. If the count after the adjustment is greater than that before the adjustment, some operators in the static kernel package no longer match the current network. In this case, you can choose either of the following methods:

  • Uninstall the static kernel package, perform the dump operation again, and compile and install the new static kernel package.
  • Continue to use the current static kernel package. In this case, a dynamic process is started for unmatched operators, so those operators gain no performance improvement.
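
The before-and-after comparison described above can be sketched in shell as follows. This is a minimal illustration, not the tool's own logic: count_json and check_static_kernel_match are hypothetical helper names, and the sketch counts files with find as a stand-in for op_compiler --count, whose exact output format is not shown in this section. It also assumes that .json files in subdirectories of the dump directory should be included.

```shell
# Hypothetical helpers; `find` stands in for `op_compiler -p <dump_dir> --count`.

# Count .json files under a dump directory (subdirectories included, by assumption).
count_json() {
    find "$1" -type f -name '*.json' | wc -l | tr -d '[:space:]'
}

# Compare the dump taken before a network adjustment with the one taken after it.
# Arguments: <dump_dir_before> <dump_dir_after>
# Returns 1 when the count has grown, which the text above treats as a sign that
# some operators in the static kernel package no longer match the network.
check_static_kernel_match() {
    _before=$(count_json "$1")
    _after=$(count_json "$2")
    if [ "$_after" -gt "$_before" ]; then
        echo "count increased ($_before -> $_after): package may no longer match"
        return 1
    fi
    echo "count did not increase ($_before -> $_after)"
}
```

A non-zero return is only a prompt to choose between the two options listed above. The sketch does not uninstall or rebuild anything.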