aclmdlSetDump

Description

Sets dump parameters.

aclmdlInitDump, aclmdlSetDump, and aclmdlFinalizeDump work together to record dump data to files. These APIs can be called multiple times in a single process to obtain dump data for different dump configurations. The call sequences under Restrictions illustrate typical scenarios.

Restrictions

  • The dump configuration applies only to models that are loaded after this API is called. It does not take effect on a model loaded before the call unless you reload that model afterwards.

    For example, in the following API call sequence, the dump configuration is valid only for model 2:

    aclmdlInitDump --> model 1 loading --> aclmdlSetDump --> model 2 loading --> aclmdlFinalizeDump

  • If this API is called repeatedly to set dump configuration for the same model, the most recent configuration is applied.

    For example, in the following API call sequence, the dump configuration of the later call overwrites that of the previous call.

    aclmdlInitDump --> aclmdlSetDump --> aclmdlSetDump --> model 1 loading --> aclmdlFinalizeDump
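The first call sequence above can be sketched in C as follows. This is a minimal illustration, not a complete application: it assumes aclInit and device setup have already been done by the caller, it uses aclmdlLoadFromFile for the "model loading" step (the source does not say which loading API is used), and the helper name load_with_dump, the file names, and the USE_REAL_ASCENDCL build flag are hypothetical. The stand-in stubs exist only so that the sketch compiles without the Ascend toolkit.

```c
#include <stdint.h>
#include <stddef.h>

#ifdef USE_REAL_ASCENDCL               /* hypothetical build flag */
#include "acl/acl.h"                   /* real AscendCL declarations */
#else
/* Stand-in stubs (assumption): they mimic the prototypes but do no real work. */
typedef int aclError;
static aclError aclmdlInitDump(void) { return 0; }
static aclError aclmdlSetDump(const char *dumpCfgPath) { return dumpCfgPath ? 0 : -1; }
static aclError aclmdlLoadFromFile(const char *modelPath, uint32_t *modelId)
{
    static uint32_t nextId = 1;
    if (modelPath == NULL || modelId == NULL) return -1;
    *modelId = nextId++;
    return 0;
}
static aclError aclmdlFinalizeDump(void) { return 0; }
#endif

/* Loads two models around a single aclmdlSetDump call.
 * Per the restriction above, only the model loaded after aclmdlSetDump
 * (model 2) is covered by the dump configuration.
 * Returns 0 if every step reported success (0), otherwise -1. */
int load_with_dump(const char *cfgPath,
                   const char *model1Path, const char *model2Path,
                   uint32_t *id1, uint32_t *id2)
{
    int ok;

    if (aclmdlInitDump() != 0) return -1;

    ok = aclmdlLoadFromFile(model1Path, id1) == 0   /* before SetDump: not covered */
      && aclmdlSetDump(cfgPath) == 0
      && aclmdlLoadFromFile(model2Path, id2) == 0;  /* after SetDump: covered */

    /* Model execution would go here. */

    if (aclmdlFinalizeDump() != 0) ok = 0;          /* always close the dump session */
    return ok ? 0 : -1;
}
```

To have the configuration cover model 1 as well, the model would need to be loaded (or reloaded) after the aclmdlSetDump call.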

Prototype

aclError aclmdlSetDump(const char *dumpCfgPath)

Parameters

Parameter      Input/Output    Description
dumpCfgPath    Input           Pointer to the configuration file path, including the file name. The configuration file is in JSON format.

Currently, the following dump information can be configured:
  • Model dump configuration (used to export the input and output data of operators at each layer in the model) and single-operator dump configuration (used to export the input and output data of an operator). The exported data is compared with that of a specified model or operator to locate accuracy issues. For details about the configuration example, description, and restrictions, see Configuration File Example (Model Dump and Single-Operator Dump). Dump configurations are disabled by default.
  • Exception operator dump configuration (used to export the input and output data, workspace information, and Tiling information of the exception operator). The exported data is used to analyze AI Core errors. For details about the configuration example, see Configuration File Example (Exception Operator Dump Configuration). Dump configurations are disabled by default.
  • Overflow/Underflow operator dump configuration (used to export the input and output data of the overflow/underflow operator in the model). The exported data is used to analyze overflow/underflow causes and locate model accuracy issues. For details about the configuration example, description, and restrictions, see Configuration File Example (Overflow/Underflow Operator Dump Configuration). Dump configurations are disabled by default.
  • Configuration for operator dump watch mode (used to enable the observation mode for the output data of a specified operator). Enable this mode if, after locating operators with accuracy issues and ruling out calculation errors in those operators, you suspect that their memory is overwritten by other operators. For details about the configuration example and restrictions, see Configuration File Example (Configuration for Operator Dump Watch Mode). The dump watch mode is disabled by default.

Configuration File Example (Model Dump and Single-Operator Dump)

Model dump configuration example:

{
    "dump":{
        "dump_list":[
            {
                "model_name":"ResNet-101"
            },
            {
                "model_name":"ResNet-50",
                "layer":[
                    "conv1conv1_relu",
                    "res2a_branch2ares2a_branch2a_relu",
                    "res2a_branch1",
                    "pool1"
                ]
            }
        ],
        "dump_path":"$HOME/output",
        "dump_mode":"output",
        "dump_op_switch":"off",
        "dump_data":"tensor"
    }
}

The following is an example of dump configuration of the single-operator model execution mode in the single-operator dump scenario:

{
    "dump":{
        "dump_path":"output",
        "dump_list":[],
        "dump_op_switch":"on",
        "dump_data":"tensor"
    }
}

The following is an example of dump configuration of the single-operator API execution mode in the single-operator dump scenario:

{
    "dump":{
        "dump_path":"output",
        "dump_list":[], 
        "dump_data":"tensor"
    }
}

Table 1 Format of the acl.json file

Parameter

Description

dump_list

(Required) List of network-wide models for data dump.

Create model dump configuration information. If multiple models need to be dumped, separate them with commas (,).

In the single-operator calling scenario (including single-operator model execution and single-operator API execution), dump_list is empty.

model_name

Model name. The value of model_name of each model must be unique.

  • To load a model from a file, enter the model file name without the name extension. You can also set this parameter to the value of the outermost name field in the .json file after ATC-based model conversion.
  • To load a model from memory, set this parameter to the value of the name field in the .json file after ATC-based model conversion.

layer

It is advised to dump certain operators only. Otherwise, excessive data may induce timeouts if the I/O performance is poor. This field can be used to specify the name of the operator to be dumped. The name can be the name of the operator after ATC model conversion or the name of the original operator before conversion.

  • Enter one operator name per line and separate the names with commas (,).
  • model_name is optional when layer is set. If model_name is omitted, the specified operators are dumped for all models. If model_name is set, the specified operators are dumped only for that model.
  • If the input of a specified operator comes from a data operator, the data operator information is also dumped. To dump a data operator, specify the downstream nodes of that data operator.
  • To dump all operators of a model, omit the layer field.

dump_path

(Required) Directory for storing dump data files in the operating environment. The directory must be created in advance and the running user configured during installation must have the read and write permissions on the directory.

The path can be either absolute or relative.
  • An absolute path starts with a slash (/), for example, $HOME/output.
  • A relative path starts with a directory name, for example, output.

dump_mode

Dump mode.

  • input: dumps operator inputs only.
  • output (default): dumps operator outputs only.
  • all: dumps both operator inputs and outputs.

    Note: Some operators, such as the collective communication operators HcomAllGather and HcomAllReduce, modify their input data during execution. Therefore, when this parameter is set to all, the system dumps the operator input before the operator is executed and dumps the operator output after it is executed. As a result, the input and output data of the same operator is flushed to drives separately, and multiple dump files are generated. After parsing the dump files, you can determine whether the data is an input or output based on the file content.

dump_level

Dump data level. The options are as follows:

  • op: dumps data at the operator level.
  • kernel: dumps data at the kernel level.
  • all (default): dumps both op and kernel level data.

The default value generates a large number of dump files, for example, dump files whose names start with aclnn. If dump performance matters or memory resources are limited, set this parameter to op to improve dump performance and reduce the number of dump files.

NOTE:

An operator is a representation of operation logic (for example, addition, subtraction, multiplication, and division operations). The kernel is the implementation of the operation logic for computing and needs a specific computing device to complete computing.

dump_op_switch

Dump data switch of the single-operator model execution mode in the single-operator dump scenario.

  • on: enables dump for the single-operator model.
  • off (default): disables dump for the single-operator model.

dump_step

Iterations to dump. This parameter is not required in the inference scenario.

If this parameter is not configured, dump data will be generated for all iterations by default, which may result in a large amount of data. You are advised to specify iterations as required.

Separate multiple iterations using vertical bars (|), for example, 0|5|10. You can also use hyphens (-) to specify the iteration range, for example, 0|3-5|10.

Configuration example:

{
    "dump":{
        "dump_list":[
            ......
        ],
        "dump_path":"$HOME/output",
        "dump_mode":"output",
        "dump_op_switch":"off",
        "dump_step":"0|3-5|10"
    }
}

NOTE:

In the training scenario, both the dump_step parameter in acl.json and the ge.exec.dumpStep parameter of the GEInitialize API specify the iterations whose dump data is to be collected. If both are configured, the one configured last is used. For details about the GEInitialize API, see "GEInitialize" in the Ascend Graph Developer Guide.

dump_data

Type of the operator dump content. The options are as follows:

  • tensor (default): dumps operator data.
  • stats: dumps operator statistics. The result file is in .csv format and contains the operator name, input/output data type, maximum value, and minimum value.

Dumping a large amount of data typically requires a significant amount of time. One solution is to first dump operator statistics, use the statistics to identify potentially abnormal operators, and then proceed to dump the data of the identified operators.

In the model dump scenario, the information of operator input or output or both can be collected based on the configuration of dump_mode.
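The dump_step syntax described in Table 1 (vertical bars between entries, hyphens for ranges) can be illustrated with a small C helper that checks whether a given iteration is selected by a string such as 0|3-5|10. This is only a sketch of one plausible reading of the syntax. The function name dump_step_selects is hypothetical, ranges are assumed to be inclusive, and the parser actually used by the runtime may accept or reject inputs differently.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns 1 if iteration `iter` is selected by `spec`, 0 if it is not,
 * and -1 if `spec` is malformed.
 * A NULL spec models "dump_step not configured", which selects all iterations.
 * Assumption: ranges such as 3-5 are inclusive on both ends. */
int dump_step_selects(const char *spec, long iter)
{
    const char *p = spec;
    int hit = 0;

    if (spec == NULL) return 1;                 /* not configured: every iteration */

    for (;;) {
        char *end;
        long lo, hi;

        if (!isdigit((unsigned char)*p)) return -1;
        lo = strtol(p, &end, 10);
        hi = lo;
        p = end;

        if (*p == '-') {                        /* range entry: lo-hi */
            p++;
            if (!isdigit((unsigned char)*p)) return -1;
            hi = strtol(p, &end, 10);
            p = end;
        }
        if (hi < lo) return -1;
        if (iter >= lo && iter <= hi) hit = 1;

        if (*p == '\0') return hit;             /* end of the specification */
        if (*p != '|') return -1;               /* entries must be separated by '|' */
        p++;
    }
}
```

With the specification 0|3-5|10 from the configuration example, iterations 0, 3, 4, 5, and 10 are selected under these assumptions, and iteration 6 is not.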

Configuration File Example (Exception Operator Dump Configuration)

If dump_scene is set to lite_exception or lite_exception_with_shape, exception operator dump is enabled. You can use the ASCEND_WORK_PATH environment variable to configure the dump path. If the variable is not set, the dump path is the current execution directory of the application.

Note:

  • The exception operator dump configuration cannot be enabled together with the model dump configuration or single-operator dump configuration. If both are enabled, the model dump or single-operator dump does not take effect.
  • lite_exception_with_shape is a trial feature and may be changed in later versions. It cannot be used in commercial products. The difference between lite_exception_with_shape and lite_exception is that in the graph scenario, lite_exception_with_shape can be used to export a dump file with the operator shape information.

The following gives a configuration example:

{
    "dump":{
        "dump_scene":"lite_exception"
    }
}

The collected dump files cannot be viewed using a text tool. To view the dump file content, convert the dump file to a NumPy file and then view the NumPy file using Python. For details about the conversion procedure, see "Viewing Dump Files" in Accuracy Debugging Tool Guide.
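The dump path rule for exception operator dump (ASCEND_WORK_PATH if configured, otherwise the current execution directory) can be mirrored in an application that needs to know where to look for the files. The following C sketch is an illustration only: the function name exception_dump_base_dir is hypothetical, an empty variable is treated as unset by assumption, and any subdirectory layout below the returned directory is not described in this section.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>    /* getcwd (POSIX) */

/* Writes the expected base directory for exception operator dump files
 * into `buf`. Returns 0 on success and -1 on failure (NULL buffer,
 * buffer too small, or getcwd error).
 * Assumption: an empty ASCEND_WORK_PATH is treated like an unset one. */
int exception_dump_base_dir(char *buf, size_t size)
{
    const char *work = getenv("ASCEND_WORK_PATH");

    if (buf == NULL || size == 0) return -1;

    if (work != NULL && work[0] != '\0') {
        int n = snprintf(buf, size, "%s", work);
        return (n >= 0 && (size_t)n < size) ? 0 : -1;
    }

    /* Fall back to the current execution directory of the application. */
    return getcwd(buf, size) != NULL ? 0 : -1;
}
```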

Configuration File Example (Overflow/Underflow Operator Dump Configuration)

Note the following restrictions on the overflow/underflow operator dump configuration:
  • If dump_debug is set to on, the overflow/underflow operator configuration is enabled. If dump_debug is not set or set to off, the overflow/underflow operator configuration is disabled.
  • If this configuration is enabled, dump_path, which is the directory for storing exported data files, must be set.
    The path can be either absolute or relative.
    • An absolute path starts with a slash (/), for example, /home.
    • A relative path starts with a directory name, for example, output.
  • The overflow/underflow operator configuration cannot be enabled together with the model dump configuration or single-operator dump configuration. If both are enabled, an error is returned.
  • Only overflow/underflow data of AI Core operators can be collected.

The following gives a configuration example:

{
    "dump":{
        "dump_path":"output",
        "dump_debug":"on"
    }
}

For details about how to parse the exported data file, see Overflow/Underflow Operator Data Collection and Analysis in Accuracy Debugging Tool Guide.

Configuration File Example (Configuration for Operator Dump Watch Mode)

Set dump_scene to watcher to enable the operator dump watch mode. The configuration description and restrictions are as follows:

  • If the operator dump watch mode is enabled, the overflow/underflow operator dump (by configuring the dump_debug parameter) or the single-operator model dump (by configuring the dump_op_switch parameter) cannot be enabled. Otherwise, an error will be reported. This mode does not take effect in the single-operator API dump scenario.
  • In dump_list, the layer parameter is used to configure the names of the operators that may overwrite the memory of other operators, and the watcher_nodes parameter is used to configure the names of the operators with accuracy issues possibly due to output memory being overwritten by other operators.
    • If layer is not specified, the output of the operators configured for watcher_nodes is dumped after all operators that support dump in the model are executed.
    • If an operator configured for layer or watcher_nodes is not in the static graph or a static subgraph, the configuration does not take effect.
    • If the same operator name is configured for both layer and watcher_nodes, or an operator configured for layer is a collective communication operator (the operator type starts with Hcom, for example, HcomAllReduce), no dump file will be exported.
    • For a fusion operator, the name configured for watcher_nodes must be the name of the operator after fusion. If the name of an operator before fusion is configured, no dump file will be exported.
    • Currently, model_name cannot be configured in dump_list.
  • If the operator dump watch mode is enabled, dump_path, which is the path for storing the exported dump file, must be configured.
    The path can be either absolute or relative.
    • An absolute path starts with a slash (/), for example, /home.
    • A relative path starts with a directory name, for example, output.
  • dump_mode is used to specify the data of the operators configured for watcher_nodes to be exported. Currently, only output can be configured.
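The restrictions listed above can be checked before the configuration file is handed to aclmdlSetDump. The following C sketch encodes them against an already-parsed configuration. The WatcherCfg structure and the function watcher_cfg_problem are hypothetical, JSON parsing is left out, and the messages are illustrative rather than the errors the runtime reports.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of a parsed "dump" object with dump_scene = watcher. */
typedef struct {
    const char *dump_path;     /* NULL if the key is absent */
    const char *dump_mode;     /* NULL if the key is absent */
    int dump_debug_on;         /* nonzero if dump_debug is "on" */
    int dump_op_switch_on;     /* nonzero if dump_op_switch is "on" */
    int has_model_name;        /* nonzero if any dump_list entry sets model_name */
} WatcherCfg;

/* Returns NULL if the configuration satisfies the watch mode restrictions,
 * otherwise a short description of the first violated restriction. */
const char *watcher_cfg_problem(const WatcherCfg *cfg)
{
    if (cfg == NULL)
        return "no configuration";
    if (cfg->dump_debug_on)
        return "dump_debug cannot be enabled in watch mode";
    if (cfg->dump_op_switch_on)
        return "dump_op_switch cannot be enabled in watch mode";
    if (cfg->has_model_name)
        return "model_name cannot be configured in dump_list";
    if (cfg->dump_path == NULL || cfg->dump_path[0] == '\0')
        return "dump_path must be configured";
    /* An absent dump_mode is accepted here because output is the documented default. */
    if (cfg->dump_mode != NULL && strcmp(cfg->dump_mode, "output") != 0)
        return "dump_mode supports only output";
    return NULL;
}
```

Checks that depend on the model itself, such as whether an operator belongs to a static graph or is a collective communication operator, are outside the scope of this sketch.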

The following is an example of the configuration file content. With this configuration, the output of operators C and D is dumped after operators A and B are executed. Four dump files are generated so that you can check whether operator A or B overwrites the output memory of operator C or D: opType.A_To_C.* (operator C), opType.A_To_D.* (operator D), opType.B_To_C.* (operator C), and opType.B_To_D.* (operator D).

{
    "dump":{
        "dump_list":[
            {
                "layer":["A", "B"],
                "watcher_nodes":["C", "D"]
            }
        ],
        "dump_path":"/home/",
        "dump_mode":"output",
        "dump_scene":"watcher"
    }
}
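The four file names in this example follow from pairing every layer entry with every watcher_nodes entry. The C sketch below enumerates the middle part of the name (for example, A_To_C) for a given pair index. The function name watch_pair_name is hypothetical, the ordering is assumed to match the list in the example, and the opType prefix and the trailing part of the real file names are not generated here.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes "<layer>_To_<watcher>" for pair index k into `buf`.
 * Pairs are numbered layer by layer: with layers {A, B} and watchers {C, D},
 * k = 0..3 gives A_To_C, A_To_D, B_To_C, B_To_D.
 * Returns 0 on success and -1 if k is out of range or `buf` is too small. */
int watch_pair_name(const char *const *layers, size_t nLayers,
                    const char *const *watchers, size_t nWatchers,
                    size_t k, char *buf, size_t size)
{
    int n;

    if (buf == NULL || size == 0) return -1;
    if (nLayers == 0 || nWatchers == 0) return -1;
    if (k >= nLayers * nWatchers) return -1;

    n = snprintf(buf, size, "%s_To_%s",
                 layers[k / nWatchers], watchers[k % nWatchers]);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Under these assumptions, two layer entries and two watcher_nodes entries give the four pairs listed in the example.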

The collected dump files cannot be viewed using a text tool. To view the dump file content, convert the dump file to a NumPy file and then view the NumPy file using Python. For details about the conversion procedure, see "Viewing Dump Files" in Accuracy Debugging Tool Guide.

Returns

The value 0 indicates success, and other values indicate failure. For details, see aclError.

Related APIs

AscendCL also provides the aclInit API. During AscendCL initialization, the dump configuration is passed as a JSON configuration file to dump the app data at run time. In this mode, aclInit can be called only once in a process. To change the dump configuration, modify the JSON configuration file.