Function: set_dump

Applicability

Product

Supported (√/x)

Atlas A3 training products / Atlas A3 inference products

Atlas A2 training products / Atlas A2 inference products

Atlas training products

Atlas inference products

Atlas 200I/500 A2 inference products

Function Usage

Sets dump parameters.

Prototype

  • C Prototype
    aclError aclmdlSetDump(const char *dumpCfgPath)
  • Python Function
    ret = acl.mdl.set_dump(dump_cfg_path)

Parameter Description

dump_cfg_path (Str): path of the dump configuration file, including the file name.

The following dump information can currently be configured. Note that if operator inputs or outputs contain sensitive user information, dumping them may leak that information.

  • Model dump configuration (exports the input and output data of the operators at each layer of a model) and single-operator dump configuration (exports the input and output data of an operator). The exported data can be compared with that of a specified model or operator to locate accuracy issues. For configuration examples, descriptions, and restrictions, see Examples of Model Dump Configuration and Single-Operator Dump Configuration. These dump configurations are disabled by default.
  • Dump configuration for exception operators (exports the input and output data, workspace information, and tiling information of an exception operator). The exported data is used to analyze AI Core errors. For a configuration example, see Example of Dump Configuration for Exception Operators. This dump configuration is disabled by default.
  • Overflow/Underflow operator dump configuration (exports the input and output data of overflow/underflow operators in a model). The exported data is used to analyze the causes of overflow/underflow and locate model accuracy issues. For configuration examples, descriptions, and restrictions, see Example of Overflow/Underflow Operator Dump Configuration. This dump configuration is disabled by default.
  • Operator dump watch configuration (enables observation of the output data of specified operators). If you have located accuracy issues in certain operators and ruled out errors in their own computation, but suspect that their output memory is being overwritten by other operators, you can enable dump watch mode. For a configuration example and restrictions, see Dump Watch Configuration for Operators. Dump watch mode is disabled by default.
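As a minimal sketch (not taken from the official examples), the file that dump_cfg_path points to can be generated with Python's standard json module. The dump_list, model_name, dump_path, and dump_mode fields are ones mentioned on this page; the model name "my_model", the "output" directory, and the helper write_dump_config are hypothetical placeholders. See Examples of Model Dump Configuration and Single-Operator Dump Configuration for the authoritative schema.

```python
import json
import os
import tempfile


def write_dump_config(dump_cfg: dict, directory: str, file_name: str = "acl_dump.json") -> str:
    """Serialize a dump configuration to JSON and return the full file path.

    The returned path (directory plus file name) is the kind of value
    acl.mdl.set_dump expects as dump_cfg_path.
    """
    path = os.path.join(directory, file_name)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dump_cfg, f, indent=4)
    return path


# Hypothetical model dump configuration: "my_model" and "output" are placeholders.
model_dump_cfg = {
    "dump": {
        "dump_list": [{"model_name": "my_model"}],
        "dump_path": "output",
        "dump_mode": "output",
    }
}

cfg_dir = tempfile.mkdtemp()
dump_cfg_path = write_dump_config(model_dump_cfg, cfg_dir)
print(dump_cfg_path)
```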

Return Value Description

ret (Int): error code. 0 indicates success; any other value indicates failure.
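A minimal sketch of checking the return value follows. The names AclError, check_ret, and set_dump_checked are hypothetical; the acl module is passed in as a parameter (normally the result of import acl) so that the logic can be exercised without an Ascend environment.

```python
class AclError(RuntimeError):
    """Raised when an ACL call returns a non-zero error code."""


def check_ret(api_name: str, ret: int) -> None:
    # 0 means success; any other value is treated as failure.
    if ret != 0:
        raise AclError(f"{api_name} failed, error code: {ret}")


def set_dump_checked(acl_module, dump_cfg_path: str) -> None:
    """Call acl.mdl.set_dump and raise AclError if it reports failure.

    acl_module is the imported pyACL module; it is a parameter only so the
    helper can be tested with a stand-in object.
    """
    ret = acl_module.mdl.set_dump(dump_cfg_path)
    check_ret("acl.mdl.set_dump", ret)
```

On a real system this would be called as set_dump_checked(acl, dump_cfg_path) after import acl, with dump_cfg_path pointing at your configuration file.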

Restrictions

  • The dump configuration takes effect only for models loaded after this API is called to enable dump. Models loaded before the call are not affected unless they are reloaded after the call.

    For example, in the following API calling sequence, the dump configuration is valid only for model 2.

    acl.mdl.init_dump --> model 1 loading --> acl.mdl.set_dump --> model 2 loading --> acl.mdl.finalize_dump
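The following sketch enforces the ordering shown above: dump is initialized and configured before the model is loaded. It assumes the pyACL calls acl.mdl.init_dump(), acl.mdl.set_dump(path), and acl.mdl.load_from_file(path), the last of which returns a model ID and an error code. load_model_with_dump is a hypothetical helper name, and calling acl.mdl.finalize_dump() is left to the caller.

```python
def load_model_with_dump(acl_module, dump_cfg_path: str, model_path: str) -> int:
    """Initialize and configure dump, then load the model so the config applies to it.

    acl_module is the pyACL module (normally `import acl`), passed in so the
    call order can be tested without an Ascend device.
    """
    ret = acl_module.mdl.init_dump()
    if ret != 0:
        raise RuntimeError(f"acl.mdl.init_dump failed, error code: {ret}")

    ret = acl_module.mdl.set_dump(dump_cfg_path)
    if ret != 0:
        raise RuntimeError(f"acl.mdl.set_dump failed, error code: {ret}")

    # Only models loaded after set_dump pick up the dump configuration.
    model_id, ret = acl_module.mdl.load_from_file(model_path)
    if ret != 0:
        raise RuntimeError(f"acl.mdl.load_from_file failed, error code: {ret}")
    return model_id
```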

  • If this API is called repeatedly to set the dump configuration for the same model, the most recent configuration is applied.

    For example, in the following API call sequence, the configuration set by the second acl.mdl.set_dump call overwrites that of the first:

    acl.mdl.init_dump --> acl.mdl.set_dump --> acl.mdl.set_dump --> model 1 loading --> acl.mdl.finalize_dump

Reference

Alternatively, the dump configuration can be passed to the acl.init API as a JSON configuration file during initialization, so that application data is dumped at run time. In this mode, acl.init can be called only once per process, so to modify the dump configuration you must edit the JSON file.
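A minimal sketch of the acl.init mode, assuming that acl.init takes the configuration file path as its argument and that the file uses the same {"dump": {...}} layout as the file passed to acl.mdl.set_dump. init_with_dump is a hypothetical helper, and the acl module is passed in so the code can be tried without an Ascend device.

```python
import json


def init_with_dump(acl_module, dump_cfg: dict, cfg_path: str) -> None:
    """Write dump_cfg to cfg_path as JSON, then initialize ACL with that file.

    acl.init can be called only once per process; to change the dump
    configuration in this mode, edit the JSON file at cfg_path.
    """
    with open(cfg_path, "w", encoding="utf-8") as f:
        json.dump(dump_cfg, f, indent=4)

    ret = acl_module.init(cfg_path)
    if ret != 0:
        raise RuntimeError(f"acl.init failed, error code: {ret}")
```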

Dump Watch Configuration for Operators

To enable dump watch for operators, set dump_scene to watcher. The following is an example configuration file. With this configuration: (1) after operators A and B are executed, the outputs of operators C and D are dumped; (2) after operators C and D themselves are executed, their outputs are dumped again. Comparing the dump files of C and D from (1) with those from (2) shows whether operator A or B has overwritten the output memory of operator C or D.

{
    "dump":{
        "dump_list":[
            {
                "layer":["A", "B"],
                "watcher_nodes":["C", "D"]
            }
        ],
        "dump_path":"/home/",
        "dump_mode":"output",
        "dump_scene":"watcher"
    }
}
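The example configuration above can also be produced programmatically, which is convenient when the operator names come from a script. This is a sketch; make_watcher_config is a hypothetical helper, and the field names and values are exactly those in the example.

```python
import json


def make_watcher_config(layer, watcher_nodes, dump_path):
    """Build a dump watch configuration in the same shape as the example above.

    layer: names of operators that may overwrite other operators' memory.
    watcher_nodes: names of operators whose output memory is being watched.
    """
    entry = {"watcher_nodes": list(watcher_nodes)}
    if layer:
        # Omitted entirely when no layer operators are given.
        entry["layer"] = list(layer)
    return {
        "dump": {
            "dump_list": [entry],
            "dump_path": dump_path,
            "dump_mode": "output",  # only "output" is supported for dump watch
            "dump_scene": "watcher",
        }
    }


print(json.dumps(make_watcher_config(["A", "B"], ["C", "D"], "/home/"), indent=4))
```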

The details are as follows:

  • When operator dump watch mode is enabled, neither overflow/underflow operator dump (configured with the dump_debug parameter) nor single-operator model dump (configured with the dump_op_switch parameter) can be enabled; otherwise, an error is reported. Dump watch is also not supported in the single-operator API dump scenario.
  • In dump_list, the layer parameter specifies the names of operators that may overwrite the memory of other operators, and the watcher_nodes parameter specifies the names of operators whose accuracy issues may be caused by their output memory being overwritten by other operators.
    • If layer is unspecified, the output of the operators configured for watcher_nodes is dumped after all operators that support dump in the model are executed.
    • If an operator configured in layer or watcher_nodes is not in the static graph or a static subgraph, the configuration does not take effect.
    • If the same operator name is configured in both layer and watcher_nodes, or an operator configured in layer is a collective communication operator (one whose type starts with Hcom, for example, HcomAllReduce), only the dump file of the operator configured in watcher_nodes is exported.
    • For a fusion operator, the name configured in watcher_nodes must be the operator name after fusion. If a pre-fusion operator name is configured, no dump file is exported.
    • Currently, model_name cannot be configured in dump_list.
  • When operator dump watch mode is enabled, dump_path (the path for storing exported dump files) must be configured.

    The exported dump files cannot be viewed using a text tool. To view the content of a dump file, convert the dump file to a NumPy file and then view the NumPy file using Python. For details about the conversion procedure, see "Viewing Dump Files" in Accuracy Debugging Tool Guide.

    dump_path can be either absolute or relative.
    • An absolute path starts with a slash (/), for example, /home.
    • A relative path starts with a directory name, for example, output.
  • dump_mode specifies which data of the operators configured in watcher_nodes is exported. Currently, only output is supported.
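Several of the restrictions above can be checked before the file is passed to acl.mdl.set_dump. The sketch below (validate_watcher_config is a hypothetical helper) covers only those listed restrictions; it is not a full schema validator, and it conservatively flags dump_debug and dump_op_switch whenever those keys are present, even if they would be set to a disabled value.

```python
def validate_watcher_config(cfg: dict) -> list:
    """Return a list of violated dump watch restrictions (empty if none found)."""
    problems = []
    dump = cfg.get("dump", {})

    if dump.get("dump_scene") != "watcher":
        problems.append('dump_scene must be "watcher"')

    # Conservative: flag these keys whenever present, whatever their value.
    for key in ("dump_debug", "dump_op_switch"):
        if key in dump:
            problems.append(f"{key} cannot be used together with dump watch")

    if not dump.get("dump_path"):
        problems.append("dump_path must be configured")

    if "dump_mode" in dump and dump["dump_mode"] != "output":
        problems.append('dump_mode must be "output"')

    for i, entry in enumerate(dump.get("dump_list", [])):
        if "model_name" in entry:
            problems.append(f"dump_list[{i}]: model_name cannot be configured")

    return problems
```

Restrictions that depend on the model itself, such as whether an operator is in the static graph, cannot be checked from the configuration file alone.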