Preparing NPU-side Dump Data and Computational Graph Files

Precautions

Before dumping data, complete the development, compilation, and execution of the training or online inference network so that a working project is available.
The data dump process in this section is for reference only. For details, see TensorFlow 1.15 Model Porting Guide.
Dump data is generated in every iteration, and a larger training dataset produces more dump data per iteration. You are advised to limit the number of iterations to one. In foundation model training scenarios, dumping a large amount of data typically takes a long time. One solution is to use dump_data to enable the operator statistics function, use the statistics to identify potentially abnormal operators, and then dump only those operators.
In multi-device environments, the processes for different devices start at different times, so a data dump produces multiple timestamped directories.
When the command is executed in a container, the generated data is stored in the container.
If the training/online inference network contains random factors, remove them before dumping.
Ensure that your code matches the code used for training/online inference on the GPUs in terms of the network structure, operators, optimizer, and parameter initialization policy. Otherwise, the comparison is meaningless.
Performing training and evaluation within the same script is not recommended. Doing so will generate two sets of dump data, which can easily lead to confusion during analysis.
Currently, only the AICPU, AI Core, and HCCL operators support data dump.
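
One common way to reduce random factors is to fix the random seeds before the graph is built. The following is a minimal sketch, not part of the NPU toolchain: fix_random_seeds is a hypothetical helper, and it assumes that seeding Python, NumPy, and TensorFlow covers the randomness in your script. It only seeds libraries that the script has already imported, so call it after your imports. Other sources of randomness, such as dropout, data shuffling, and random data augmentation, may still need to be disabled separately.

```python
import random
import sys


def fix_random_seeds(seed=0):
    """Seed Python's generator and any already-imported NumPy/TensorFlow."""
    random.seed(seed)  # Python's built-in generator

    # Only touch libraries the script has already imported.
    numpy_module = sys.modules.get("numpy")
    if numpy_module is not None:
        numpy_module.random.seed(seed)  # NumPy global generator

    tf_module = sys.modules.get("tensorflow")
    if tf_module is not None:
        tf_module.compat.v1.set_random_seed(seed)  # graph-level TensorFlow seed


# Two runs with the same seed draw the same value.
fix_random_seeds(0)
first = random.random()
fix_random_seeds(0)
second = random.random()
print(first == second)
```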

Dump Parameter Configuration

Modify the training/online inference script to enable the dump function by adding the dump-related lines shown in the following examples at the corresponding positions in the script.

In Estimator mode, collect dump data using dump_config in NPURunConfig. Before NPURunConfig is created, instantiate a DumpConfig class for dump configuration, including the dump path, iterations to dump, and the dump mode (operator inputs or outputs).

from npu_bridge.npu_init import *

# dump_path: dump path. Create the specified path in advance in the training/online inference environment (either in a container or on the host). The running user configured during installation must have the read and write permissions on this path.
# enable_dump: whether to enable data dump.
# dump_step: iterations to dump.
# dump_mode: dump mode, selected from input, output, and all.
dump_config = DumpConfig(enable_dump=True, dump_path="/home/output", dump_step="0|5|10", dump_mode="all")

# session_config is assumed to be created earlier in the script.
config = NPURunConfig(
  dump_config=dump_config,
  session_config=session_config
  )
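
In the example above, dump_step is a string of iteration indexes separated by vertical bars, and dump_mode accepts input, output, or all. The following minimal sketch builds and checks these two values before they are passed to DumpConfig. The helper names build_dump_step and check_dump_mode are hypothetical and are not part of npu_bridge.

```python
VALID_DUMP_MODES = ("input", "output", "all")


def build_dump_step(iterations):
    """Join iteration indexes into a string such as "0|5|10"."""
    steps = list(iterations)
    for step in steps:
        if not isinstance(step, int) or step < 0:
            raise ValueError("iteration indexes must be non-negative integers")
    # Remove duplicates and sort so the string is stable.
    return "|".join(str(step) for step in sorted(set(steps)))


def check_dump_mode(dump_mode):
    """Return dump_mode unchanged if it is one of the documented values."""
    if dump_mode not in VALID_DUMP_MODES:
        raise ValueError("dump_mode must be one of " + ", ".join(VALID_DUMP_MODES))
    return dump_mode


print(build_dump_step([10, 0, 5]))  # prints 0|5|10
print(check_dump_mode("all"))       # prints all
```

The resulting strings can then be passed as the dump_step and dump_mode arguments of DumpConfig.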

In sess.run mode, set the dump parameters through the session configuration items enable_dump, dump_path, dump_step, and dump_mode. The example also sets dump_layer to a space-separated list of node names.

import tensorflow as tf
from npu_bridge.npu_init import *
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()

custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True

# Dump settings.
custom_op.parameter_map["enable_dump"].b = True
custom_op.parameter_map["dump_path"].s = tf.compat.as_bytes("/home/output")
custom_op.parameter_map["dump_step"].s = tf.compat.as_bytes("0|5|10")
custom_op.parameter_map["dump_mode"].s = tf.compat.as_bytes("all")
custom_op.parameter_map["dump_layer"].s = tf.compat.as_bytes("nodename1 nodename2 nodename3")
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF

# cost is a tensor defined elsewhere in the script.
with tf.Session(config=config) as sess:
  print(sess.run(cost))
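
In both modes, the dump path must exist before the script runs, and the running user needs read and write permissions on it. The following is a minimal sketch of such a pre-flight check that uses only the Python standard library. prepare_dump_dir is a hypothetical helper, and the check applies to the user running this code, which is assumed to be the same user that runs the training script.

```python
import os
import tempfile


def prepare_dump_dir(dump_path):
    """Create dump_path if needed and verify read/write access to it."""
    os.makedirs(dump_path, exist_ok=True)
    if not os.access(dump_path, os.R_OK | os.W_OK):
        raise PermissionError("no read/write permission on " + dump_path)
    return os.path.abspath(dump_path)


# Demonstration in a temporary directory instead of /home/output.
with tempfile.TemporaryDirectory() as tmp:
    dump_dir = prepare_dump_dir(os.path.join(tmp, "output"))
    print(os.path.isdir(dump_dir))
```

In a real script, the returned path would be the value passed as dump_path.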

Operator overflow/underflow may occur during TensorFlow model training/online inference. In this case, do not directly perform accuracy comparison; otherwise, the comparison result will be inaccurate. For details about how to enable collection of overflow/underflow data, see Overflow/Underflow Operator Data Collection and Analysis.

Obtaining Dump Data and Computational Graph Files

Run the training/online inference script to generate the dump data file and computational graph file.

After dump data collection is enabled, a dump file of the computational graph (basic dump without data such as weights; only the graph optimized and compiled by the GE is dumped) is automatically generated in the current execution directory during script execution. This computational graph file is used to search for dump data files in the follow-up accuracy analysis. You can also use the environment variable DUMP_GRAPH_PATH to specify the path for storing the dump graph file. The following is an example:

export DUMP_GRAPH_PATH=/home/dumpgraph
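
When the training script is launched from Python rather than from a shell, the same variable can be set in the environment of the child process. The following is a minimal sketch. launch_env is a hypothetical helper, and train.py in the usage note is a placeholder for your own script.

```python
import os


def launch_env(dump_graph_path, base_env=None):
    """Return a copy of the environment with DUMP_GRAPH_PATH set."""
    env = dict(os.environ if base_env is None else base_env)
    env["DUMP_GRAPH_PATH"] = dump_graph_path
    return env


env = launch_env("/home/dumpgraph")
print(env["DUMP_GRAPH_PATH"])  # prints /home/dumpgraph
```

The returned dictionary can be passed to a launcher such as subprocess.run([sys.executable, "train.py"], env=env).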

The dump data file is generated in the directory specified by {dump_path}, that is, the {dump_path}/{time}/{device_id}/{model_name}/{model_id}/{data_index} directory. For example, if {dump_path} is set to /home/output, the dump data file is stored in the /home/output/20200808163566/0/ge_default_20200808163719_121/11/0 directory.

**Table 1** Path format of a dump file

| Path Key | Description | Remarks |
| --- | --- | --- |
| dump_path | Path for storing the dump data. (If a relative path is set, the corresponding absolute path applies.) | - |
| time | Dump time. | Format: YYYYMMDDHHMMSS |
| device_id | Device ID. | - |
| model_name | Subgraph name. | If the *model_name* directory contains more than one folder, dump data in the folder with the same name as the computational graph is used. Periods (.), forward slashes (/), backslashes (\), and spaces in model_name are replaced with underscores (_). |
| model_id | Subgraph ID. | - |
| data_index | Iterations to dump. | If dump_step is specified, *data_index* equals dump_step. If it is not specified, *data_index* starts at 0 and is incremented by 1 with each dump. |
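
The path layout in Table 1 can be decomposed programmatically, which helps when collecting dump directories from several devices. The following sketch assumes POSIX-style paths and exactly the five components listed in the table. split_dump_dir and sanitize_model_name are hypothetical helper names.

```python
from pathlib import PurePosixPath

PATH_KEYS = ("time", "device_id", "model_name", "model_id", "data_index")


def split_dump_dir(dump_path, dump_dir):
    """Map the path components below dump_path to the keys in Table 1."""
    parts = PurePosixPath(dump_dir).relative_to(PurePosixPath(dump_path)).parts
    if len(parts) != len(PATH_KEYS):
        raise ValueError("unexpected dump directory layout: " + dump_dir)
    return dict(zip(PATH_KEYS, parts))


def sanitize_model_name(name):
    """Replace periods, slashes, backslashes, and spaces with underscores."""
    for char in "./\\ ":
        name = name.replace(char, "_")
    return name


info = split_dump_dir(
    "/home/output",
    "/home/output/20200808163566/0/ge_default_20200808163719_121/11/0",
)
print(info["model_name"])  # prints ge_default_20200808163719_121
print(info["data_index"])  # prints 0
```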

Select a computational graph file.
- Method 1:
  After the training script is executed, more than one GE graph file may be generated in the training script directory. To select the right computational graph file, save the TensorFlow model as a .pb file and view the .pb model. Choose the name of any compute operator as the search keyword, and search for the keyword in the generated graph files. The graph that contains a match is the desired computational graph file, whose name is indicated by the name field under graph.
- Method 2:
  Search for the keyword Iterator in all dump files whose names end with _Build.txt. Record the name of the computational graph file, which will be used in accuracy analysis.
```
grep Iterator *_Build.txt
```
  For example, if the command output reports a match in ge_proto_00292_Build.txt, that file is the required computational graph file.
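
Both methods come down to searching the generated graph files for a keyword. The following is a minimal Python sketch of that search, assuming the graph files are plain text. find_graph_files is a hypothetical helper, and the file names in the demonstration are made up. For Method 2, the keyword is Iterator and the pattern is *_Build.txt. For Method 1, pass an operator name taken from the .pb model and a pattern that matches your generated graph files.

```python
import glob
import os
import tempfile


def find_graph_files(directory, keyword, pattern="*_Build.txt"):
    """Return the sorted names of files matching pattern that contain keyword."""
    matches = []
    search = os.path.join(glob.escape(directory), pattern)
    for path in sorted(glob.glob(search)):
        with open(path, "r", errors="ignore") as handle:
            if any(keyword in line for line in handle):
                matches.append(os.path.basename(path))
    return matches


# Demonstration with two made-up graph files in a temporary directory.
samples = {
    "ge_proto_00001_Build.txt": 'op { name: "Const" }\n',
    "ge_proto_00002_Build.txt": 'op { name: "IteratorV2" }\n',
}
with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write(text)
    found = find_graph_files(tmp, "Iterator")
print(found)  # prints ['ge_proto_00002_Build.txt']
```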

Select the dump data file.

Open the computational graph file selected in the previous step and record the value of the name field in the first graph. In the following example, record the value "ge_default_20240613143502_1".

graph {
  name: "ge_default_20240613143502_1"
  op {
    name: "atomic_addr_clean0_71"
    type: "AtomicAddrClean"
    attr {
      key: "_fe_imply_type"
      value {
        i: 6
      }
    }
  }
}

Go to the timestamp-named directory that stores the dump files, and find the folder whose name is the recorded value, for example, ge_default_20240613143502_1. The files in that folder are the required dump data files.
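
Recording the graph name and locating the matching folder can also be scripted. The following sketch assumes that the computational graph file is plain text in the form shown above, with the name field as the first entry of the first graph block. first_graph_name and find_model_dir are hypothetical helpers.

```python
import os
import re

# Matches the name field that directly follows the opening of a graph block.
GRAPH_NAME_RE = re.compile(r'graph\s*\{\s*name:\s*"([^"]+)"')


def first_graph_name(graph_text):
    """Return the name field of the first graph block, or None if absent."""
    match = GRAPH_NAME_RE.search(graph_text)
    return match.group(1) if match else None


def find_model_dir(timestamp_dir, graph_name):
    """Return the folders under timestamp_dir whose name equals graph_name."""
    return sorted(
        os.path.join(root, name)
        for root, dirs, _ in os.walk(timestamp_dir)
        for name in dirs
        if name == graph_name
    )


sample = 'graph {\n  name: "ge_default_20240613143502_1"\n  op {\n  }\n}\n'
print(first_graph_name(sample))  # prints ge_default_20240613143502_1
```

Passing the timestamp-named directory and the recorded name to find_model_dir returns the candidate folders, one per device if several devices dumped the same graph.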

The dump data file is named in the format of {op_type}.{op_name}.{task_id}.{stream_id}.{timestamp}.

For the following products, the file name may be in one of the other formats listed below:

- Atlas A2 training products/Atlas A2 inference products
- Atlas A3 training products/Atlas A3 inference products

Other formats:
- {op_type}.{op_name_lxsliceN}.({stream_id}.){task_id}.{timestamp}.{task_type}.{context_id}.{thread_id}.{device_id}
- {op_type}.{op_name}.({stream_id}.){task_id}.{timestamp}.{task_type}.{context_id}.{thread_id}.{device_id}

Notes:
- A dot (.), slash (/), backslash (\), or space in op_type or op_name is converted to an underscore (_) in the dump file name.
- If the length of a file name exceeds the OS file name length limit (generally 255 characters), the dump file is renamed to a string of random digits. For details about the mapping, see the mapping.csv file in the same directory.
- The following operators do not generate dump data:
  - Operators that are not delivered to the device for execution, as determined before graph execution, such as conditional operators (if/while/for/case), data operators (Data/RefData/Const), and data flow operators (StackPush/StackPop/Concat/Split).
  - Operators that GE marks during graph optimization so that they are not delivered to the device for execution. The _no_task attribute of these operators is true in the dump graph.
  - Operators in the graph that are ultimately not executed.
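
Because dots inside op_type and op_name are converted to underscores, a file name in the basic format splits cleanly on the dot. The following sketch handles only the basic five-field format {op_type}.{op_name}.{task_id}.{stream_id}.{timestamp}. The formats for Atlas A2 and Atlas A3 products contain an optional stream_id field and are not covered. parse_dump_file_name is a hypothetical helper, and the file name in the example is made up for illustration.

```python
from collections import namedtuple

DumpFileName = namedtuple(
    "DumpFileName", ["op_type", "op_name", "task_id", "stream_id", "timestamp"]
)


def parse_dump_file_name(file_name):
    """Split a basic-format dump file name into its five fields."""
    fields = file_name.split(".")
    if len(fields) != 5:
        # Renamed files (random digits) and other formats end up here.
        raise ValueError("not in the basic five-field format: " + file_name)
    return DumpFileName(*fields)


parsed = parse_dump_file_name("Conv2D.conv1_Conv2D.12.3.1718260502123456")
print(parsed.op_type)  # prints Conv2D
print(parsed.op_name)  # prints conv1_Conv2D
```

Files renamed to random digits raise ValueError here and need to be resolved through mapping.csv first.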
