Data Preparation
Inference Scenario
Include the op_debug_level and debug_dir options in the ATC command. The process is as follows:
- Open the Model Converter dialog box.
- Expand Advanced Options.
- Turn on the Additional Arguments switch, then add op_debug_level (for example, --op_debug_level=2) and debug_dir. debug_dir is the debug output path and must be under the project directory; manually create the compile_path_infer folder in that path. See Figure 1.
In Windows environments, a model cannot be converted with the Additional Arguments switch on and op_debug_level and debug_dir added. Convert the model directly with the Additional Arguments switch off instead.
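The folder creation and argument text from the steps above can be prepared with a short script. The following is a minimal sketch, not part of the tool: `build_additional_arguments` is a hypothetical helper, and it assumes that debug_dir points at the compile_path_infer folder itself. Only op_debug_level, debug_dir, and compile_path_infer come from the text above.

```python
from pathlib import Path


def build_additional_arguments(project_dir, op_debug_level=2):
    """Create compile_path_infer under project_dir and return the argument text.

    Hypothetical helper for illustration. The returned string is what would be
    entered in the Additional Arguments field.
    Assumption: debug_dir is the compile_path_infer folder itself.
    """
    if op_debug_level not in (0, 1, 2):
        raise ValueError("op_debug_level must be 0, 1, or 2")
    debug_dir = Path(project_dir) / "compile_path_infer"
    # The folder has to exist before conversion, so create it up front.
    debug_dir.mkdir(parents=True, exist_ok=True)
    return f"--op_debug_level={op_debug_level} --debug_dir={debug_dir}"
```

Calling `build_additional_arguments("/path/to/project")` with your own project directory returns the two options as one string and leaves the folder in place for the conversion run.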
Training Scenario
- If an AI Core error occurs during training, configure the op_debug_level and enable_exception_dump parameters as follows:
  - In Estimator mode, set op_debug_level and enable_exception_dump as follows.

        import tensorflow as tf
        from npu_bridge.estimator.npu.npu_config import NPURunConfig
        from npu_bridge.estimator.npu.npu_config import DumpConfig

        session_config = tf.ConfigProto()
        config = NPURunConfig(
            op_debug_level=2,  # Enable operator debug.
            session_config=session_config,
            # Dump the inputs and outputs of the error operator to the script
            # execution directory. Dynamic-shape operators cannot be dumped.
            enable_exception_dump=1
        )

  - In sess.run mode, set op_debug_level and enable_exception_dump as follows.

        import tensorflow as tf
        from npu_bridge.estimator import npu_ops
        from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

        config = tf.ConfigProto()
        custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
        custom_op.name = "NpuOptimizer"
        custom_op.parameter_map["use_off_line"].b = True
        # Dump the inputs and outputs of the error operator to the script
        # execution directory. Dynamic-shape operators cannot be dumped.
        custom_op.parameter_map["enable_exception_dump"].i = 1
        custom_op.parameter_map["op_debug_level"].i = 2  # Enable operator debug.
        config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # Disable remapping.

        with tf.Session(config=config) as sess:
            print(sess.run(cost))
- After retraining is complete, an instruction mapping file and an error operator dump file are generated in the training execution directory.
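If several training scripts use sess.run mode, the two debug parameters can be set in one place. The following is a minimal sketch under stated assumptions: `apply_debug_options` is a hypothetical helper, not part of npu_bridge, and it expects the custom_op object returned by `config.graph_options.rewrite_options.custom_optimizers.add()`. The range check reflects the op_debug_level values 0, 1, and 2 documented on this page.

```python
def apply_debug_options(custom_op, op_debug_level=2, enable_exception_dump=1):
    """Set op_debug_level and enable_exception_dump on an NpuOptimizer custom_op.

    Hypothetical helper for illustration; custom_op only needs a parameter_map
    whose entries expose an integer field named i.
    """
    if op_debug_level not in (0, 1, 2):
        raise ValueError("op_debug_level must be 0, 1, or 2")
    custom_op.parameter_map["op_debug_level"].i = op_debug_level
    custom_op.parameter_map["enable_exception_dump"].i = enable_exception_dump
    return custom_op
```

In a training script, the call `apply_debug_options(custom_op)` would replace the two `parameter_map` assignments for these options; the other settings (name, use_off_line, remapping) stay as they are.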
Value Range of op_debug_level
| Value | Description |
|---|---|
| 0 (default) | Disables operator debug. |
| 1 | Enables operator debug and generates a TBE instruction mapping file. An operator CCE file (*.cce), a Python-CCE mapping file (*_loc.json), and operator .o and .json files are generated in the kernel_meta folder in the training script execution directory. You can locate the AI Core error by using the line numbers in the CCE code and TBE code of the error operator. |
| 2 | Enables operator debug and generates a TBE instruction mapping file. An operator CCE file (*.cce), a Python-CCE mapping file (*_loc.json), and operator .o and .json files are generated in the kernel_meta folder in the training script execution directory. In addition, build optimization is disabled by enabling the CCE compiler option -O0-g. You can locate the AI Core error by using the line numbers in the CCE code and TBE code of the error operator. |
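To check which of the files described above were actually generated, the kernel_meta folder can be scanned by file pattern. The following is a minimal sketch: `collect_debug_files` and the `PATTERNS` mapping are hypothetical names introduced for illustration, and the patterns are taken only from the file types listed in the table.

```python
from pathlib import Path

# File patterns for the artifacts named in the table (illustrative grouping).
PATTERNS = {
    "cce": "*.cce",
    "mapping": "*_loc.json",
    "object": "*.o",
    "json": "*.json",
}


def collect_debug_files(exec_dir):
    """Group the files in exec_dir/kernel_meta by the kinds listed in PATTERNS."""
    kernel_meta = Path(exec_dir) / "kernel_meta"
    found = {}
    for kind, pattern in PATTERNS.items():
        found[kind] = sorted(p.name for p in kernel_meta.glob(pattern))
    # *_loc.json also matches *.json, so keep the mapping files out of "json".
    found["json"] = [n for n in found["json"] if not n.endswith("_loc.json")]
    return found
```

An empty "cce" or "mapping" list after a run with op_debug_level set to 1 or 2 suggests that the option did not take effect or that the script ran from a different execution directory.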
