Preparing Dump Data of an Offline Model
Precautions
- Before dumping data, build and run the application project of the model to ensure that the project is normal.
- Dump data is generated during inference, so its volume grows with the number of inference iterations. You are advised to run inference only once while dumping data. In foundation model training scenarios, dumping a large amount of data typically takes a long time. One solution is to use the dump_data option to enable the operator statistics function, use the statistics to identify potentially abnormal operators, and then dump only those operators.
- In Docker scenarios, dump is not supported in containers.
- The aclInit() and aclmdlSetDump() APIs are provided to dump data.
Dump Data Generation
Perform the following steps to dump data of the offline model:
- Open the code file of the inference application project that calls aclInit(), locate the aclInit() or aclmdlSetDump() call, and obtain the path of the acl.json file passed to it.
If aclInit() or aclmdlSetDump() is called with an empty path, pass it the path of the acl.json file created in the next step. The acl.json path is relative to the path of the binary file generated during project build.
- Modify the acl.json file in that directory (if the file does not exist, create it in the out directory after the project is built) and add the dump configuration in the following format. The following is an example of a model dump configuration:
{
    "dump":{
        "dump_list":[
            {
                "model_name":"ResNet-101"
            },
            {
                "model_name":"ResNet-50",
                "layer":[
                    "conv1conv1_relu",
                    "res2a_branch2ares2a_branch2a_relu",
                    "res2a_branch1",
                    "pool1"
                ]
            }
        ],
        "dump_path":"$HOME/output",
        "dump_mode":"output",
        "dump_op_switch":"off",
        "dump_data":"tensor"
    }
}
The following is an example dump configuration for the single-operator model execution mode in the single-operator dump scenario:
{
    "dump":{
        "dump_path":"output",
        "dump_list":[],
        "dump_op_switch":"on",
        "dump_data":"tensor"
    }
}
The following is an example dump configuration for the single-operator API execution mode in the single-operator dump scenario:
{
    "dump":{
        "dump_path":"output",
        "dump_list":[],
        "dump_data":"tensor"
    }
}
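If the acl.json file has to be created from scratch, it can also be generated by a script rather than written by hand. The following is a minimal Python sketch that mirrors the model dump example above. The helper names build_dump_config and write_acl_json are hypothetical and are not part of AscendCL, and the out/acl.json target in the usage note is only an assumed location.

```python
import json
from pathlib import Path


def build_dump_config(dump_path, models=None, dump_mode="output",
                      dump_op_switch="off", dump_data="tensor"):
    """Return a dict shaped like the model dump example in this section.

    models maps a model name to an optional list of layer names. When no
    layers are given, the "layer" key is omitted, as in the ResNet-101 entry.
    (Hypothetical helper, not an AscendCL API.)
    """
    dump_list = []
    for name, layers in (models or {}).items():
        entry = {"model_name": name}
        if layers:
            entry["layer"] = list(layers)
        dump_list.append(entry)
    return {
        "dump": {
            "dump_list": dump_list,
            "dump_path": dump_path,
            "dump_mode": dump_mode,
            "dump_op_switch": dump_op_switch,
            "dump_data": dump_data,
        }
    }


def write_acl_json(config, target):
    """Write the configuration to target, creating parent directories."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return target
```

For example, write_acl_json(build_dump_config("$HOME/output", {"ResNet-101": None, "ResNet-50": ["pool1"]}), "out/acl.json") writes a file with the same keys as the model dump example. The single-operator examples use fewer keys, so adapt the dict accordingly for those modes.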
- Run the application to generate dump data files. The path and format of the generated dump data files are described as follows.
Dump file path: {dump_path}/{time}/{deviceid}/{model_name}/{model_id}/{data_index}/{dump file}
For a single-operator model, the dump path is {dump_path}/{time}/{deviceid}/{dump file}.
Table 2 Path format of a dump file

| Path Key | Description | Note |
|---|---|---|
| dump_path | Dump path configured in the acl.json file. | - |
| time | Dump time. | Formatted as YYYYMMDDHHMMSS. |
| deviceid | Device ID. | - |
| model_name | Model name. | Periods (.), forward slashes (/), backslashes (\), and spaces in model_name are replaced with underscores (_). |
| model_id | Model ID. | - |
| data_index | Execution sequence number of each task, indexed starting at 0. This value is increased by 1 every dump. | - |
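When locating dump output from a script, the path format and the model_name replacement rule can be expressed as follows. This is a minimal Python sketch based only on the path format and table in this section. The function names sanitize_name and model_dump_dir are hypothetical, and the values in the usage note are made up for illustration.

```python
import re
from pathlib import PurePosixPath

# Characters that the table says are replaced with underscores:
# period, forward slash, backslash, and space.
_REPLACED = re.compile(r"[./\\ ]")


def sanitize_name(name):
    """Apply the underscore replacement rule described for model_name."""
    return _REPLACED.sub("_", name)


def model_dump_dir(dump_path, time, device_id, model_name, model_id, data_index):
    """Compose {dump_path}/{time}/{deviceid}/{model_name}/{model_id}/{data_index}.

    time is expected as a YYYYMMDDHHMMSS string. (Hypothetical helper.)
    """
    return PurePosixPath(
        dump_path,
        time,
        str(device_id),
        sanitize_name(model_name),
        str(model_id),
        str(data_index),
    )
```

For example, model_dump_dir("/home/user/output", "20240101120000", 0, "ResNet-50", 1, 0) yields /home/user/output/20240101120000/0/ResNet-50/1/0. In practice the time directory is not known in advance, so a script would usually list the subdirectories of dump_path instead of constructing the time value.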
The dump data file is named in the format of {op_type}.{op_name}.{task_id}.{stream_id}.{timestamp}.
- A dot (.), slash (/), backslash (\), or space in op_type and op_name in the dump file will be converted to an underscore (_).
- If the length of a file name exceeds the OS file name length limit (generally 255 characters), the dump file is renamed to a string of random digits. For the mapping between the original and new file names, see the mapping.csv file in the same directory.
- During graph execution, the following operators do not generate dump data:
    - Operators that are not delivered to the device for execution before graph execution, such as conditional operators (if/while/for/case), data operators (Data/RefData/Const), and data flow operators (StackPush/StackPop/Concat/Split).
    - Operators that GE marks during graph optimization so that they are not delivered to the device for execution. The _no_task attribute of these operators is true in the dump graph.
    - Operators in the graph that are ultimately not executed.
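The dump file naming format {op_type}.{op_name}.{task_id}.{stream_id}.{timestamp} described above can be split back into its fields when post-processing a dump directory. The following Python sketch relies on the rule that dots inside op_type and op_name are replaced with underscores, so a regular dump file name contains exactly four dots. It assumes that task_id, stream_id, and timestamp are decimal integers, which this section does not state explicitly. The names DumpFileName and parse_dump_file_name are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Five dot-separated fields; the last three are assumed to be decimal integers.
_DUMP_NAME = re.compile(r"([^.]+)\.([^.]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)")


class DumpFileName(NamedTuple):
    op_type: str
    op_name: str
    task_id: int
    stream_id: int
    timestamp: int


def parse_dump_file_name(file_name: str) -> Optional[DumpFileName]:
    """Split a dump file name into its fields.

    Returns None when the name does not match the format, for example a
    file that was renamed to random digits because the name was too long.
    """
    match = _DUMP_NAME.fullmatch(file_name)
    if match is None:
        return None
    op_type, op_name, task_id, stream_id, timestamp = match.groups()
    return DumpFileName(op_type, op_name, int(task_id), int(stream_id), int(timestamp))
```

A file for which the function returns None may be one of the renamed files, in which case its original name has to be looked up in mapping.csv. The layout of mapping.csv is not described here, so the sketch does not attempt to read it.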