DumpConfig Constructor
Description
Constructor of the DumpConfig class, which is used to configure the dump function.
Prototype
```python
def __init__(self, enable_dump=False, dump_path=None, dump_step=None,
             dump_mode="output", enable_dump_debug=False,
             dump_debug_mode="all", dump_data="tensor", dump_layer=None)
```
Options
| Option | Input/Output | Description |
|---|---|---|
| enable_dump | Input | Data dump enable. Defaults to `False`. |
| dump_path | Input | Dump path. Required when `enable_dump` or `enable_dump_debug` is set to `True`. Create the specified path in advance in the environment (either in a container or on the host) where training is performed; the running user configured during installation must have read and write permissions on this path. The path can be absolute, or relative to the directory where the training script is executed. |
| dump_step | Input | Iterations to dump. Separate multiple iterations with vertical bars, for example `0\|5\|10`. You can also use hyphens to specify an iteration range, for example `0\|3-5\|10`. If this option is not set, dump data of all iterations is collected. |
| dump_mode | Input | Dump mode, specifying whether to dump the inputs or outputs of an operator. Defaults to `"output"`. NOTE: If this option is set to `all`, the input data of some operators, such as the collective communication operators `HcomAllGather` and `HcomAllReduce`, is modified during execution. The system therefore dumps the operator input before execution and the operator output after execution, so the input and output data of the same operator are flushed to disk separately and multiple dump files are generated. After parsing the dump files, you can determine whether the data is an input or an output from the file content. |
| enable_dump_debug | Input | Overflow/underflow data collection enable. Defaults to `False`. |
| dump_debug_mode | Input | Overflow/underflow detection mode. Defaults to `"all"`. |
| dump_data | Input | Type of operator content to dump. Defaults to `"tensor"`. In large-scale training scenarios, dumping a large amount of data takes a long time. You can first dump the statistics of all operators, identify potentially abnormal operators based on the statistics, and then dump the input or output data of only those operators. |
| dump_layer | Input | Names of the operators to dump, separated by spaces. If this option is not set, all operators are dumped. If an input of a specified operator involves a Data operator, the Data operator's information is also dumped. |
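The `dump_step` grammar described above (vertical bars separate individual iterations, hyphens denote ranges) can be illustrated with a small helper. `expand_dump_step` is a hypothetical sketch for illustration only, not part of npu_bridge:

```python
def expand_dump_step(dump_step):
    """Expand a dump_step string such as "0|3-5|10" into a sorted list
    of iteration numbers. Illustrative helper, not an npu_bridge API."""
    steps = set()
    for part in dump_step.split("|"):
        if "-" in part:
            # A hyphenated part denotes an inclusive range, e.g. "3-5".
            lo, hi = part.split("-")
            steps.update(range(int(lo), int(hi) + 1))
        else:
            steps.add(int(part))
    return sorted(steps)

print(expand_dump_step("0|3-5|10"))  # → [0, 3, 4, 5, 10]
```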
Returns
An object of the DumpConfig class, to be passed as the dump_config argument of NPURunConfig.
Restrictions
enable_dump and enable_dump_debug are mutually exclusive.
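The two restrictions documented here (mutual exclusion, and dump_path being required whenever either flag is on) can be expressed as a pre-flight check. `validate_dump_options` is a hypothetical helper sketched for illustration, not part of npu_bridge:

```python
def validate_dump_options(enable_dump=False, enable_dump_debug=False,
                          dump_path=None):
    """Mirror the documented DumpConfig restrictions before building the
    config. Illustrative sketch only, not an npu_bridge API."""
    if enable_dump and enable_dump_debug:
        # Tensor dump and overflow/underflow collection cannot be combined.
        raise ValueError(
            "enable_dump and enable_dump_debug are mutually exclusive")
    if (enable_dump or enable_dump_debug) and not dump_path:
        # dump_path is required whenever either dump feature is enabled.
        raise ValueError(
            "dump_path is required when enable_dump or "
            "enable_dump_debug is True")
```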
Examples
```python
from npu_bridge.npu_init import *
...
dump_config = DumpConfig(enable_dump=True, dump_path="/home/HwHiAiUser/output", dump_step="0|5|10", dump_mode="all")
session_config = tf.ConfigProto(allow_soft_placement=True)
config = NPURunConfig(dump_config=dump_config, session_config=session_config)
```
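For the overflow/underflow collection path, a parallel configuration sketch follows. It keeps `dump_debug_mode` at its documented default of `"all"` and is a non-runnable fragment (it requires the npu_bridge runtime and NPU hardware, so it is not verified here):

```python
from npu_bridge.npu_init import *
...
# enable_dump is left False: enable_dump and enable_dump_debug are
# mutually exclusive. dump_path must still exist with read/write
# permissions for the running user.
dump_config = DumpConfig(enable_dump_debug=True,
                         dump_path="/home/HwHiAiUser/output",
                         dump_debug_mode="all")
config = NPURunConfig(dump_config=dump_config)
```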