aclmdlSetDump
Applicability
| Product | Supported |
|---|---|
| | √ |
| | √ |
| | √ |
| | √ |
| | √ |
Description
Sets dump parameters.
- To execute two different models, you need to set dump configurations differently. The API call sequence is as follows: aclInit --> aclmdlInitDump --> aclmdlSetDump --> model 1 loading --> model 1 execution --> aclmdlFinalizeDump --> model 1 unloading --> aclmdlInitDump --> aclmdlSetDump --> model 2 loading --> model 2 execution --> aclmdlFinalizeDump --> model 2 unloading --> execution of other tasks --> aclFinalize
- To execute the same model twice, you only need to perform the dump operation for the first execution. The API call sequence is as follows: aclInit --> aclmdlInitDump --> aclmdlSetDump --> model loading --> model execution --> aclmdlFinalizeDump --> model unloading --> model loading --> model execution --> execution of other tasks --> aclFinalize
Prototype
aclError aclmdlSetDump(const char *dumpCfgPath)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| dumpCfgPath | Input | Pointer to the configuration file path, including the file name. The configuration file is in JSON format. Currently, the following dump information can be configured: (If the operator input or output contains sensitive user information, information leakage may occur.) |
Examples of Model Dump Configuration and Single-Operator Dump Configuration
After model dump or single-operator dump is configured, the exported data can be compared with that of a specified model or operator to locate accuracy issues. For details about the comparison method, see Accuracy Debugging Tool Guide.
Example of model dump configuration:
{
"dump":{
"dump_list":[
{
"model_name":"ResNet-101"
},
{
"model_name":"ResNet-50",
"layer":[
"conv1conv1_relu",
"res2a_branch2ares2a_branch2a_relu",
"res2a_branch1",
"pool1"
]
}
],
"dump_path":"$HOME/output",
"dump_mode":"output",
"dump_op_switch":"off",
"dump_data":"tensor"
}
}
Example of single-operator dump configuration:
{
"dump":{
"dump_path":"output",
"dump_list":[],
"dump_op_switch":"on",
"dump_data":"tensor"
}
}
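The model dump example above can also be generated programmatically. The following is a minimal Python sketch, not part of the API: the helper names `build_model_dump_config` and `write_config` and the file name `acl_dump.json` are hypothetical, and the field values simply mirror the example. The resulting file path is what would be passed as dumpCfgPath.

```python
import json
import os
import tempfile


def build_model_dump_config(models, dump_path, dump_mode="output", dump_data="tensor"):
    """Build a model dump configuration shaped like the example above.

    `models` maps a model name to a list of layer names. When the list is
    empty or None, the entry carries only "model_name", as the ResNet-101
    entry does in the example.
    """
    dump_list = []
    for name, layers in models.items():
        entry = {"model_name": name}
        if layers:
            entry["layer"] = list(layers)
        dump_list.append(entry)
    return {
        "dump": {
            "dump_list": dump_list,
            "dump_path": dump_path,
            "dump_mode": dump_mode,
            "dump_op_switch": "off",  # model dump, not single-operator dump
            "dump_data": dump_data,
        }
    }


def write_config(config, path):
    """Write the configuration as JSON and return the file path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return path


config = build_model_dump_config(
    {"ResNet-101": None, "ResNet-50": ["res2a_branch1", "pool1"]},
    dump_path="$HOME/output",
)
# Hypothetical file name; any path that includes the file name works.
cfg_path = write_config(config, os.path.join(tempfile.mkdtemp(), "acl_dump.json"))
```

Generating the file from a dictionary avoids hand-editing mistakes such as trailing commas, which would make the JSON invalid.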
Configuration File Example (Exception Operator Dump Configuration)
You can enable the dump function for exception operators by setting dump_scene. The following is an example of the configuration file, indicating that lightweight exception dump is enabled:
{
"dump":{
"dump_path":"output",
"dump_scene":"aic_err_brief_dump"
}
}
The details are as follows:
- dump_scene can be set to:
- aic_err_brief_dump: lightweight exception dump, which is used to export the input, output, and workspace data of faulty AI Core operators.
- aic_err_norm_dump: common exception dump, which is used to export the shape, data type, format, and attribute information in addition to the lightweight exception dump.
- aic_err_detail_dump: exports the internal storage, register, and call stack information of AI Core in addition to the lightweight exception dump.
When configuring this parameter, pay attention to the following:
- This option is supported only by the following models and requires the driver of 25.0.RC1 or later:
Atlas A2 training products/Atlas A2 inference products
Atlas A3 training products/Atlas A3 inference products
You can download the driver installation package of Ascend HDK 25.0.RC1 or later from the Firmware and Drivers page, and install or upgrade the driver by referring to the document of the corresponding version.
- If the parameter is set to aic_err_detail_dump, this API must be called before the aclrtSetDevice call. In addition, aclmdlFinalizeDump cannot be used to deinitialize the dump.
- During dump file export, the AI Core where the faulty operator is located is suspended, which may affect the normal execution of other service processes on the device. After the dump files are exported, the AI Core is automatically restored.
- After the dump files are exported, the user service processes on the host are forcibly exited. The error reported during the forcible exit is not used as the input for AI Core problem analysis.
- If multiple user service processes on the host are specified with the same device and are configured with aic_err_detail_dump, the processes that are executed first export the dump files based on aic_err_detail_dump, and the processes that are executed later export the dump files based on aic_err_brief_dump.
- lite_exception: lightweight exception dump. This value is provided for compatibility with earlier versions and is equivalent to aic_err_brief_dump.
- dump_path is an optional parameter, indicating the path for storing exported dump files.
The priority of the dump file storage path is as follows: NPU_COLLECT_PATH environment variable > ASCEND_WORK_PATH environment variable > dump_path in the configuration file > current execution directory of the application.
For details about the environment variable, see Environment Variables.
- To view the content of an exported dump file, convert the dump file to a NumPy file and then view the NumPy file using Python. For details about the conversion procedure, see "Viewing Dump Files" in Accuracy Debugging Tool Guide.
If dump_scene is set to aic_err_detail_dump, you can use msDebug to view the content of an exported dump file. For details, see Operator Development Tool User Guide.
- The exception operator dump configuration cannot be enabled together with the model dump configuration or single-operator dump configuration.
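The storage path priority described above can be expressed as a short lookup. This is an illustrative Python sketch only: the function name `resolve_dump_root` is hypothetical, treating an empty environment variable as unset is an assumption, and the sketch returns only the base location, not any subdirectory layout that the runtime may create beneath it.

```python
import os


def resolve_dump_root(config, environ=None, cwd=None):
    """Return the base location for exception dump files.

    Priority, as documented: NPU_COLLECT_PATH > ASCEND_WORK_PATH >
    dump_path in the configuration file > current execution directory.
    """
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    for var in ("NPU_COLLECT_PATH", "ASCEND_WORK_PATH"):
        value = environ.get(var)
        if value:  # assumption: an empty value counts as unset
            return value
    dump_path = config.get("dump", {}).get("dump_path")
    if dump_path:
        return dump_path
    return cwd


cfg = {"dump": {"dump_path": "output", "dump_scene": "aic_err_brief_dump"}}
from_env = resolve_dump_root(cfg, {"ASCEND_WORK_PATH": "/work"}, "/app")
from_cfg = resolve_dump_root(cfg, {}, "/app")
from_cwd = resolve_dump_root({"dump": {"dump_scene": "aic_err_brief_dump"}}, {}, "/app")
```

Passing `environ` and `cwd` explicitly keeps the lookup easy to check without changing the real process environment.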
Example of Overflow/Underflow Operator Dump Configuration
{
"dump":{
"dump_path":"output",
"dump_debug":"on"
}
}
- If dump_debug is not set or set to off, the overflow/underflow operator configuration is disabled.
- If the overflow/underflow operator configuration is enabled, dump_path must be set to specify the path for storing exported dump files.
After obtaining the exported data files, parse the files by referring to "Overflow/Underflow Operator Data Collection and Analysis" in Accuracy Debugging Tool Guide.
dump_path can be either absolute or relative.
- An absolute path starts with a slash (/), for example, /home.
- A relative path starts with a directory name, for example, output.
- The overflow/underflow operator configuration cannot be enabled if the model dump configuration or single-operator dump configuration is enabled. Otherwise, an error is returned.
- Only overflow/underflow data of AI Core operators can be collected.
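The rules above can be checked before the configuration file is passed to this API. The following Python sketch is an illustration under stated assumptions: the function name `check_overflow_dump` is hypothetical, and detecting model dump or single-operator dump by the presence of dump_list or by dump_op_switch being on is a heuristic of this sketch, not a rule stated in this document.

```python
def check_overflow_dump(config):
    """Return a list of problems found in an overflow/underflow dump config."""
    dump = config.get("dump", {})
    if dump.get("dump_debug", "off") != "on":
        return []  # overflow/underflow dump is disabled, nothing to check
    problems = []
    if not dump.get("dump_path"):
        problems.append("dump_path must be set when dump_debug is on")
    # Assumption: dump_list or dump_op_switch=on signals model dump or
    # single-operator dump, which cannot be combined with dump_debug.
    if "dump_list" in dump or dump.get("dump_op_switch") == "on":
        problems.append(
            "dump_debug cannot be combined with model dump or single-operator dump"
        )
    return problems


ok = check_overflow_dump({"dump": {"dump_path": "output", "dump_debug": "on"}})
missing_path = check_overflow_dump({"dump": {"dump_debug": "on"}})
conflict = check_overflow_dump(
    {"dump": {"dump_path": "output", "dump_debug": "on",
              "dump_list": [], "dump_op_switch": "on"}}
)
```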
Dump Watch Configuration for Operators
Set dump_scene to watcher to enable dump watch for operators. Below is an example of the content in the configuration file. The configuration effect is as follows: (1) After operators A and B are executed, the output of operators C and D is dumped; (2) After operators C and D are executed, the output of operators C and D is also dumped. The dump files of operators C and D in (1) will be compared with those in (2) to check whether operator A or B overwrites the output memory of operator C or D.
{
"dump":{
"dump_list":[
{
"layer":["A", "B"],
"watcher_nodes":["C", "D"]
}
],
"dump_path":"/home/",
"dump_mode":"output",
"dump_scene":"watcher"
}
}
The details are as follows:
- If dump watch is enabled for operators, the overflow/underflow operator dump (by setting dump_debug) and single-operator model dump (by setting dump_op_switch) cannot be enabled. Otherwise, an error will be reported. Dump watch cannot be applied in the single-operator API dump scenario.
- In dump_list, layer lists the operators that may overwrite the memory of other operators, and watcher_nodes lists the operators whose output memory may be overwritten by other operators. If the output of an operator is overwritten, the operator accuracy may decrease.
- If layer is not specified, the output of operators configured with watcher_nodes will be dumped after all operators that support dump in the model are executed.
- If any operator in layer and watcher_nodes is not in a static graph or static subgraph, the configuration does not take effect.
- If an operator is in both layer and watcher_nodes or an operator in layer is a collective communication operator (the operator type starts with Hcom, for example, HcomAllReduce), only the dump files of operators in watcher_nodes will be exported.
- For a fused operator, use its name after fusion when you add it to watcher_nodes. Otherwise, dump files cannot be exported.
- Currently, model_name cannot be configured in dump_list.
- If the dump watch configuration is enabled for operators, dump_path must be set to specify the path for storing exported dump files.
The exported dump files cannot be viewed using a text tool. To view the content of a dump file, convert the dump file to a NumPy file and then view the NumPy file using Python. For details about the conversion procedure, see "Viewing Dump Files" in Accuracy Debugging Tool Guide.
dump_path can be either absolute or relative.
- An absolute path starts with a slash (/), for example, /home.
- A relative path starts with a directory name, for example, output.
- dump_mode is used to control what data of operators in watcher_nodes can be exported. Currently, the value can only be output.
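As a rough illustration of the constraints listed above, the following Python sketch assembles a dump watch configuration. The function name `build_watcher_config` is hypothetical, and requiring a non-empty watcher_nodes list is an assumption of this sketch. The function always sets dump_mode to output, never emits model_name, and omits layer when no operators are given.

```python
def build_watcher_config(watcher_nodes, dump_path, layer=None):
    """Build a dump watch configuration shaped like the example above."""
    if not watcher_nodes:
        # Assumption: a watch configuration without watched operators is an error.
        raise ValueError("watcher_nodes must name at least one operator")
    if not dump_path:
        raise ValueError("dump_path must be set when dump watch is enabled")
    entry = {}
    if layer:
        entry["layer"] = list(layer)
    entry["watcher_nodes"] = list(watcher_nodes)
    return {
        "dump": {
            "dump_list": [entry],      # model_name is not supported here
            "dump_path": dump_path,
            "dump_mode": "output",     # the only value currently supported
            "dump_scene": "watcher",
        }
    }


watch_cfg = build_watcher_config(["C", "D"], "/home/", layer=["A", "B"])
watch_all = build_watcher_config(["C", "D"], "/home/")
```

Because the builder emits neither dump_debug nor dump_op_switch, the result stays clear of the combinations that this section says are reported as errors.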
Returns
Returns 0 on success; any other value indicates failure. For details, see aclError.
Restrictions
- The configured dump information is valid only when the model is loaded after the dump function is enabled by calling this API. The dump configuration does not take effect on a model loaded before this API call unless you reload the model after this API call.
For example, in the following API call sequence, the dump configuration is valid only for model 2:
aclmdlInitDump --> model 1 loading --> aclmdlSetDump --> model 2 loading --> aclmdlFinalizeDump
- If this API is called repeatedly to set the dump configuration for the same model, the most recent configuration is applied.
For example, in the following API call sequence, the dump configuration of the later call overwrites that of the previous call.
aclmdlInitDump --> aclmdlSetDump --> aclmdlSetDump --> model 1 loading --> aclmdlFinalizeDump
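The ordering rule above can be modeled with a small simulation. This Python sketch does not call the real API: the function name `models_with_dump` and the `load:<name>` notation are hypothetical, and treating an aclmdlSetDump call outside an aclmdlInitDump/aclmdlFinalizeDump pair as having no effect is an assumption of the sketch. It reports which model loads fall under an active dump configuration.

```python
def models_with_dump(calls):
    """Return the models loaded while a dump configuration is in effect.

    `calls` is a list of step names. API calls use their own names;
    a model load is written as "load:<model name>". Other steps are ignored.
    """
    dump_active = False  # between aclmdlInitDump and aclmdlFinalizeDump
    config_set = False   # aclmdlSetDump has been called since the last init
    dumped = []
    for call in calls:
        if call == "aclmdlInitDump":
            dump_active, config_set = True, False
        elif call == "aclmdlSetDump":
            # Assumption: the call only counts while dump is initialized.
            config_set = dump_active
        elif call == "aclmdlFinalizeDump":
            dump_active = config_set = False
        elif call.startswith("load:") and config_set:
            dumped.append(call[len("load:"):])
    return dumped


late_set = models_with_dump(
    ["aclmdlInitDump", "load:model 1", "aclmdlSetDump",
     "load:model 2", "aclmdlFinalizeDump"]
)
same_model_twice = models_with_dump(
    ["aclInit", "aclmdlInitDump", "aclmdlSetDump", "load:model",
     "aclmdlFinalizeDump", "load:model", "aclFinalize"]
)
```

In the first sequence only model 2 is covered, matching the first restriction. In the second, only the first load of the model is covered because the second load happens after aclmdlFinalizeDump.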
See Also
Currently, the aclInit API is also provided. During initialization, the dump configuration is passed as a JSON configuration file to dump the app data at runtime. In this mode, aclInit can be called only once in a process. To change the dump configuration, modify the JSON configuration file.