NPU Data Dump

  1. Modify the training script and insert the dump configuration.
    • Training configuration example in session.run mode:
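      import tensorflow as tf
      import precision_tool.tf_config as npu_tf_config
      # config is the session configuration (tf.ConfigProto) that the script already uses.
      config = npu_tf_config.session_dump_config(config, action='dump')
      sess = tf.Session(config=config)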
    • Training configuration example in Estimator mode:
      import precision_tool.tf_config as npu_tf_config
      from npu_bridge.estimator.npu.npu_config import NPURunConfig
      dump_config = npu_tf_config.estimator_dump_config(action='dump')
      npu_config = NPURunConfig(dump_config=dump_config)
    • In session.run mode, the dump configuration and Rec SDK TensorFlow model saving cannot be used at the same time.
    • During multi-device training, add the dump configuration to the training process of only one device. If multiple devices save dump data at the same time, data conflicts occur.
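
The single-device rule above can be enforced in the script itself. The following is a minimal sketch, not part of precision_tool: the helper name should_enable_dump is hypothetical, and it assumes that the launcher exports each process's rank in an environment variable named RANK_ID. Substitute whatever variable your launch script actually sets.

```python
import os


def should_enable_dump(dump_rank="0", env=None):
    """Return True only for the one process chosen to dump data.

    Hypothetical helper. RANK_ID is an assumed variable name; a process
    without it is treated as rank 0 (single-device training).
    """
    env = os.environ if env is None else env
    return env.get("RANK_ID", "0") == str(dump_rank)


if should_enable_dump():
    # Only this process would apply the dump configuration, for example:
    # config = npu_tf_config.session_dump_config(config, action='dump')
    print("dump configuration enabled for this process")
```

All other processes keep their original configuration, so only one device writes to the dump directory.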
  2. Perform training.

    Set the maximum number of training steps to 1 and run the training. The dump data files are generated under precision_data/npu/debug_0/, in a directory of the form precision_data/npu/debug_0/dump/{time}/{deviceid}/{model_name}/{model_id}/{data_index}. The following is an example of the directory structure:

    precision_data/npu/debug_0/dump/20240125153144/0/ge_default_20240125153322_41/6/0/

    Table 1 Dump data file path format

    | Path Key   | Description | Remarks |
    |------------|-------------|---------|
    | dump_path  | Dump data path. If a relative path is set, this is the resolved full path. | - |
    | time       | Time when the dump data file is flushed to drives. | Format: YYYYMMDDHHMMSS |
    | deviceid   | Device ID. | - |
    | model_name | Subgraph name. | The model_name level may contain multiple folders. Obtain the dump data from the folder that corresponds to the computational graph name. Periods (.), slashes (/), backslashes (\), and spaces in model_name are replaced with underscores (_). |
    | model_id   | Subgraph ID. | - |
    | data_index | Index of the dumped iteration. | If dump_step is specified, data_index is the same as dump_step. If dump_step is not specified, data_index starts from 0 and increases by 1 each time an iteration is dumped. |
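
To make the path format in Table 1 concrete, the following sketch splits a dump directory into its path keys and applies the model_name character replacement. It is an illustration only: the function names parse_dump_dir and sanitize_model_name are hypothetical and are not provided by precision_tool, and the sketch assumes POSIX-style paths with a numeric data_index.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath


def sanitize_model_name(name):
    """Replace periods, slashes, backslashes, and spaces with underscores."""
    return re.sub(r"[./\\ ]", "_", name)


def parse_dump_dir(path):
    """Split a dump directory into the path keys listed in Table 1."""
    parts = PurePosixPath(path).parts
    if len(parts) < 6:
        raise ValueError(f"not a dump data directory: {path}")
    # The last five components are fixed; everything before them is dump_path.
    time_str, device_id, model_name, model_id, data_index = parts[-5:]
    return {
        "dump_path": str(PurePosixPath(*parts[:-5])),
        "time": datetime.strptime(time_str, "%Y%m%d%H%M%S"),
        "deviceid": device_id,
        "model_name": model_name,
        "model_id": model_id,
        "data_index": int(data_index),  # assumed to be a plain integer
    }


info = parse_dump_dir(
    "precision_data/npu/debug_0/dump/20240125153144/0/ge_default_20240125153322_41/6/0/"
)
print(info["dump_path"], info["time"], info["model_name"], info["data_index"])
```

A helper like this is convenient when a run produces several model_name folders and you need to pick the one for a given computational graph.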