Compiling an Operator to Generate the Kernel Package

From any directory, run one of the following commands as the running user (for example, HwHiAiUser) to compile the operator.

To compile the debug build from the dumped operator information file, use --op_params_dir. The following commands are equivalent examples in long-option and short-option form:

op_compiler --op_params_dir=<dump_dir> --op_debug_config=gm_debug.cfg --soc_version=<soc_version> --log=info --job=128 --output=<output_dir>
op_compiler -p <dump_dir> -d gm_debug.cfg -v <soc_version> -l info -j 128 -o <output_dir>

To compile the debug build from a kernel name, use --kernel_name. The following commands are equivalent examples in long-option and short-option form:

op_compiler --kernel_name=<kernel_name> --op_debug_config=gm_debug.cfg --soc_version=<soc_version> --log=info --job=128 --output=<output_dir>
op_compiler -k <kernel_name> -d gm_debug.cfg -v <soc_version> -l info -j 128 -o <output_dir>
  • Either --op_params_dir or --kernel_name can be used. If the kernel name of the operator can be obtained from the error log, you are advised to use --kernel_name.
    • --op_params_dir: -p for short. This option specifies the absolute or relative path where the operator information file exported by the dump tool is stored. Currently, there are two methods to dump the operator .json file. Choose one as required.
      • When using Python APIs of PyTorch for programming, you can dump the operator .json file by calling the Ascend PyTorch Profiler API. For details, see "Other Collection Methods > Profile Data Collection with PyTorch Framework APIs > Ascend PyTorch Profiler APIs" in Performance Tuning Tool User Guide.
        1. Use the Ascend PyTorch Profiler API to enable profile data collection during PyTorch-based training.

          Before training, enable the operator information statistics function in the extended option experimental_config, that is, set record_op_args to True.

        2. View the result file of profile data collected during PyTorch-based training.

          After the training is complete, the dumped operator .json file is stored in the {worker_name}_{timestamp}_ascend_pt_op_args/{pid}_debug directory by default.

      • When using the AscendCL C++ APIs for programming, you can use the aclopStartDumpArgs and aclopStopDumpArgs APIs to dump the operator statistics file to a specified directory.
    • --kernel_name: -k for short. This option specifies the kernel name of the operator to compile. Obtain the kernel name from the error log reported during operator execution.
  • --op_debug_config: -d for short. This option specifies the path and name of the debugging configuration file. The path can be an absolute path or a relative path. Example:
    --op_debug_config=$HOME/module/gm_debug.cfg

    The following is an example of the configuration file content. For details about the configuration method, see --op_debug_config.

    op_debug_config=ccec_g,oom
  • --soc_version: -v for short. This option is required for operator compilation and specifies the model of the AI processor to compile for.

    If the soc_version of the current device cannot be determined, run the npu-smi info command on the server where the NPU driver package is installed, and prefix the value shown for Name with Ascend. For example, if the value of Name is xxxyy, the actual soc_version is Ascendxxxyy.

  • --log: -l for short. This option is optional and specifies the log level during operator compilation. It can be set to debug, info, warning, error, or null (default).
  • --job: -j for short. This option is optional and specifies the number of worker processes used during compilation. The minimum value is 1 and the default value is 16.
  • --output: -o for short. This option is optional, and specifies the absolute or relative path and name of the output installation package, for example, xxx/xxx/xxx.run. If no path is specified, the installation package is generated in the current path. If no package name is specified, the default name debug_kernel_${datatime}.run is used.
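
The two preparation steps described in the option list (creating the debugging configuration file and deriving soc_version from the Name value) can be sketched in shell as follows. This is a minimal illustration, not part of the tool: to_soc_version and write_debug_cfg are hypothetical helper names, and the Name value is passed in by hand because the layout of the npu-smi info output is not described here.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not shipped with op_compiler.

# Derive soc_version by prefixing the Name value from `npu-smi info` with "Ascend".
# $1: Name value, for example xxxyy
to_soc_version() {
    printf 'Ascend%s\n' "$1"
}

# Write the example debugging configuration from this section to a file.
# $1: path of the configuration file to create, for example gm_debug.cfg
write_debug_cfg() {
    printf 'op_debug_config=ccec_g,oom\n' > "$1"
}

# Example: a Name value of xxxyy yields Ascendxxxyy.
to_soc_version "xxxyy"

# Example (not executed here): write_debug_cfg "$HOME/module/gm_debug.cfg"
```

The resulting string is what would be passed to --soc_version, and the file path is what would be passed to --op_debug_config.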

For details about all options supported by the tool, see Command-Line Options.
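
Because the operator is identified either by kernel name or by dump directory, a small wrapper can make that choice explicit. The sketch below only prints the command line (a dry run) and does not invoke op_compiler. build_op_compiler_cmd is a hypothetical helper, my_kernel and ./op_dump are placeholder values, and the fixed options (-d gm_debug.cfg -l info -j 128) are copied from the command examples in this section.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints an op_compiler command line, does not run it.
# Usage: build_op_compiler_cmd <mode> <value> <soc_version> [output]
#   mode   "kernel" -> pass -k <kernel_name>; "dump" -> pass -p <dump_dir>
#   output optional value for -o; omitted if empty
# Note: values containing spaces are not handled in this sketch.
build_op_compiler_cmd() {
    mode=$1
    value=$2
    soc=$3
    output=${4:-}

    case "$mode" in
        kernel) input="-k $value" ;;
        dump)   input="-p $value" ;;
        *)
            echo "mode must be 'kernel' or 'dump'" >&2
            return 1
            ;;
    esac

    cmd="op_compiler $input -d gm_debug.cfg -v $soc -l info -j 128"
    if [ -n "$output" ]; then
        cmd="$cmd -o $output"
    fi
    printf '%s\n' "$cmd"
}

# Placeholder values for illustration.
build_op_compiler_cmd kernel my_kernel Ascendxxxyy
build_op_compiler_cmd dump ./op_dump Ascendxxxyy ./out
```

Printing the command first makes it easy to review the options before running the compilation as the running user.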

If information similar to the following is displayed, the compilation is successful.

generate run package debug_kernel_${datatime}.run success
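
In a script, the success message can be checked automatically. The following sketch assumes that the op_compiler output has been captured to a file (for example, by redirecting it when running the command); whether the message is written to standard output or to a log is not stated here, so that part is an assumption. compile_succeeded is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the captured output contains the
# "generate run package <name>.run success" message shown above.
# $1: file containing the captured op_compiler output
compile_succeeded() {
    grep -Eq 'generate run package .*\.run success' "$1"
}

# Example usage (not executed here; compile.log is a placeholder file name):
#   op_compiler -k <kernel_name> -d gm_debug.cfg -v <soc_version> > compile.log 2>&1
#   if compile_succeeded compile.log; then echo "kernel package generated"; fi
```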