NPU Data Dump
- Modify the training script and insert the dump configuration.
- Training configuration example in session.run mode:
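    import tensorflow as tf
    import precision_tool.tf_config as npu_tf_config

    # config is the existing session configuration of the training script
    config = npu_tf_config.session_dump_config(config, action='dump')
    sess = tf.Session(config=config)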
- Training configuration example in Estimator mode:
    import precision_tool.tf_config as npu_tf_config

    dump_config = npu_tf_config.estimator_dump_config(action='dump')
    npu_config = NPURunConfig(dump_config=dump_config)
- In session.run mode, the dump configuration and Rec SDK TensorFlow model saving cannot be used at the same time.
- During multi-device training, you only need to add the dump configuration to the training of one device. Otherwise, data conflicts will occur when multiple devices save data at the same time.
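One way to follow the multi-device rule is to gate the dump configuration on the identity of the current training process. The following is a minimal sketch, not part of precision_tool: the helper name should_enable_dump is hypothetical, and it assumes that each training process can be identified through the RANK_ID environment variable. Adapt the variable name to your launch script.

```python
import os


def should_enable_dump(env=None, dump_rank="0"):
    """Return True only for the single process chosen to write dump data.

    Assumption: the launcher exports RANK_ID for every training process.
    If RANK_ID is missing (single-device run), dumping stays enabled.
    """
    env = os.environ if env is None else env
    return env.get("RANK_ID", dump_rank) == dump_rank


# Hypothetical usage inside the training script:
# if should_enable_dump():
#     config = npu_tf_config.session_dump_config(config, action='dump')
```

With this gate, only the process whose RANK_ID equals dump_rank inserts the dump configuration, so no two devices write dump data at the same time.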
- Perform training.
Set the maximum number of training steps to 1 and run the training. The dump data files are generated under precision_data/npu/debug_0/, in a path of the form precision_data/npu/debug_0/dump/{time}/{deviceid}/{model_name}/{model_id}/{data_index}. The following is an example of the resulting directory:
precision_data/npu/debug_0/dump/20240125153144/0/ge_default_20240125153322_41/6/0/
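To make the path layout concrete, the following sketch splits a dump directory into the components of the format above. DumpLocation and parse_dump_dir are hypothetical helpers, not part of precision_tool. The sketch assumes that the last five path segments are always time, deviceid, model_name, model_id, and data_index, and that everything before them is the dump path.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class DumpLocation(NamedTuple):
    dump_path: str
    time: str
    deviceid: str
    model_name: str
    model_id: str
    data_index: str


def parse_dump_dir(path: str) -> DumpLocation:
    """Split a dump directory into its path keys (assumed layout)."""
    parts = PurePosixPath(path).parts  # a trailing slash is ignored
    if len(parts) < 6:
        raise ValueError(f"not a dump data directory: {path!r}")
    *prefix, time, deviceid, model_name, model_id, data_index = parts
    return DumpLocation(
        str(PurePosixPath(*prefix)), time, deviceid,
        model_name, model_id, data_index,
    )


loc = parse_dump_dir(
    "precision_data/npu/debug_0/dump/20240125153144/0/"
    "ge_default_20240125153322_41/6/0/"
)
# The time segment uses the YYYYMMDDHHMMSS format.
flushed_at = datetime.strptime(loc.time, "%Y%m%d%H%M%S")
print(loc.model_name, loc.model_id, loc.data_index, flushed_at)
```

A named tuple keeps the keys readable when you compare dump directories from several runs or devices.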
Table 1 Dump data file path format

| Path Key | Description | Remarks |
|---|---|---|
| dump_path | Dump data path. If a relative path is configured, the corresponding full path is used. | - |
| time | Time when the dump data file is flushed to drives. | Format: YYYYMMDDHHMMSS |
| deviceid | Device ID. | - |
| model_name | Subgraph name. | The model_name layer may contain multiple folders. Obtain the dump data from the directory that corresponds to the computational graph name. If the value of model_name contains periods (.), slashes (/), backslashes (\), or spaces, these characters are replaced with underscores (_). |
| model_id | Subgraph ID. | - |
| data_index | Index of the dumped iteration. | If dump_step is specified, the value of data_index is the same as that of dump_step. If dump_step is not specified, the value of data_index starts from 0 and increases by 1 each time an iteration is dumped. |
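The remarks for model_name and data_index can be illustrated with a short sketch. Both functions are hypothetical helpers that only mirror the rules stated in the table. They are not the tool's implementation, and the sketch assumes that dump_step names a single iteration as an integer.

```python
import re
from typing import Optional


def sanitize_model_name(name: str) -> str:
    """Replace periods, slashes, backslashes, and spaces with underscores."""
    return re.sub(r"[./\\ ]", "_", name)


def data_index_for(dump_step: Optional[int], dumped_so_far: int) -> int:
    """Return the data_index folder number for the iteration being dumped.

    dump_step:     the configured step, or None if dump_step is not specified.
    dumped_so_far: how many iterations have already been dumped.
    """
    if dump_step is not None:
        return dump_step      # same value as dump_step
    return dumped_so_far      # 0, 1, 2, ... one per dumped iteration


print(sanitize_model_name("ge.default/graph 1"))
print(data_index_for(None, 0), data_index_for(None, 1), data_index_for(5, 0))
```

Applying the same substitution to a graph name helps you locate the matching model_name folder when a graph name contains characters that cannot appear in the directory name.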