How Do I Fix Abnormal Names of Generated .npy Files in Batches?

Symptom

When dump data is generated for a TensorFlow model, the names of some .npy files may be truncated because of problems with tfdbg or the operating environment. The affected file names then do not meet the naming requirements, and dump file conversion may fail.
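
To find affected files before conversion, you can list the file names that do not fit the expected pattern. The following is a minimal sketch, not part of the documented procedure. It assumes that a valid name has the form {op_name}.{output_index}.{timestamp}.npy, which is the form produced by the renaming steps in this section. The function name find_bad_npy_names is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the names of regular files in a directory that do
# not look like {op_name}.{output_index}.{timestamp}.npy (assumed pattern),
# where the last two fields before ".npy" must be numeric.
find_bad_npy_names() {
  local dir="$1" f name
  local re='^.+\.[0-9]+\.[0-9]+\.npy$'
  for f in "$dir"/*; do
    [ -f "$f" ] || continue          # skip subdirectories and an unmatched glob
    name=${f##*/}                    # strip the directory part
    [[ $name =~ $re ]] || printf '%s\n' "$name"
  done
}
```

For example, find_bad_npy_names npy_dir prints one suspicious name per line and prints nothing when every name fits the assumed pattern.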

Possible Cause

tfdbg or the operating environment is faulty.

Solution

Perform the following steps to regenerate the .npy files and rename them in batches:

  • The script names and paths in this document are used as examples. Replace them as required.
  • After batch processing, if an operator's dump file exists but its comparison result is NaN, check whether the {op_name} field in the dump file name matches the TensorFlow operator name. If it does not, manually change {op_name} to the TensorFlow operator name, replacing each slash (/) with an underscore (_).
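
The manual fix in the second note can be scripted. The following is a minimal sketch that assumes dump file names end with .{output_index}.{timestamp}.npy and that only the {op_name} part in front of that suffix needs to change. The functions to_dump_op_name and fix_npy_op_name are hypothetical, and you supply the correct TensorFlow operator name yourself.

```shell
#!/bin/bash
# Hypothetical helper: turn a TensorFlow operator name into the {op_name}
# field of a dump file name by replacing every "/" with "_".
to_dump_op_name() {
  printf '%s\n' "${1//\//_}"
}

# Hypothetical helper: rename one dump file so that its {op_name} field matches
# the given TensorFlow operator name. Assumes the file name ends with
# .{output_index}.{timestamp}.npy; that suffix is kept unchanged.
fix_npy_op_name() {
  local file="$1" tf_op="$2"
  local dir base suffix
  local re='(\.[0-9]+\.[0-9]+\.npy)$'
  dir=$(dirname -- "$file")
  base=$(basename -- "$file")
  if [[ ! $base =~ $re ]]; then
    echo "skipped (unexpected name): $file" >&2
    return 1
  fi
  suffix=${BASH_REMATCH[1]}          # e.g. ".0.1700000000000000.npy"
  mv -- "$file" "$dir/$(to_dump_op_name "$tf_op")$suffix"
}
```

For example, fix_npy_op_name npy_dir/dense_MatMu.0.1700000000000000.npy dense/MatMul renames the file to npy_dir/dense_MatMul.0.1700000000000000.npy. The numeric suffix is matched with a regular expression rather than by splitting on periods, because TensorFlow operator names may themselves contain periods.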
  1. Execute the TensorFlow project with tfdbg enabled.

    At the tfdbg command line, enter run to run the script.

  2. Run lt > tensor_name to save the list of all tensors to a file named tensor_name.
  3. Create an executable script, for example, pt_cmd.sh, that reads the tensor_name file and generates one tfdbg pt command per tensor. Each command saves the tensor to a numbered .npy file.

    The script content is as follows.

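    #!/bin/bash
    # Usage: bash pt_cmd.sh <lt_output_file> <pt_command_file>
    timestamp=$(( $(date +%s%N) / 1000 ))   # current time in microseconds
    index=1
    while read -r line
    do
      # The 4th whitespace-separated field of an lt row is the tensor name.
      tensor_index=$(echo "$line" | awk '{print $4}')
      [ -n "$tensor_index" ] || continue     # skip lines without a 4th field
      echo "pt ${tensor_index} -n 0 -w $((index++)).${timestamp}.npy" >> "$2"
    done < "$1"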

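    Grant execute permission on pt_cmd.sh and run it. The first argument is the file saved by lt, and the second is the file to which the pt commands are written. The script appends to the second file, so delete any existing tensor_name.txt before running it again.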

    chmod +x pt_cmd.sh
    bash pt_cmd.sh tensor_name tensor_name.txt
  4. Return to the tfdbg command line, run the script, and then paste and execute the pt commands from the tensor_name.txt file generated in the previous step to save all tensors as .npy files.
  5. Move the generated .npy files to a new folder, for example, npy_dir.
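  6. Create an executable script, for example, index_to_tensorname.sh, and run it to rename the .npy files in batches. The script replaces each numbered file name with one derived from the tensor name: slashes (/) become underscores (_) and colons (:) become periods (.).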

    The script content is as follows.

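    #!/bin/bash
    # Usage: bash index_to_tensorname.sh <pt_command_file> <npy_dir>
    # Example rename: dense/MatMul:0 -> dense_MatMul.0.{timestamp}.npy
    timestamp=$(( $(date +%s%N) / 1000 ))   # current time in microseconds
    while read -r line
    do
      tensor_index=$(echo "$line" | awk '{print $2}')   # tensor name after "pt"
      real_file=$(echo "$line" | awk '{print $6}')      # file name after "-w"
      changed1_tensor_index=${tensor_index//\//_}        # "/" -> "_"
      changed2_tensor_index=${changed1_tensor_index//:/.}  # ":" -> "."
      new_file="$2/${changed2_tensor_index}.${timestamp}.npy"
      echo "$2/$real_file" "$new_file"
      if [ -r "$2/$real_file" ]
      then
        mv "$2/$real_file" "$new_file"
      fi
    done < "$1"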

    Grant execute permission on index_to_tensorname.sh and run it. The first argument is the pt command file generated in step 3, and the second is the folder that holds the .npy files.

    chmod +x index_to_tensorname.sh
    bash index_to_tensorname.sh tensor_name.txt npy_dir
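
Before renaming a large folder, you may want to preview the result. The following is a minimal sketch with a hypothetical function name, preview_renames. It assumes each line of tensor_name.txt has the form written by pt_cmd.sh (pt <tensor> -n 0 -w <file>). It prints each old and new path without moving anything and warns when two tensor names map to the same target, which can happen because both "/" and "_" end up as "_".

```shell
#!/bin/bash
# Hypothetical dry run for index_to_tensorname.sh: print "old -> new" for each
# pt command line without moving any file, and warn on stderr when two tensor
# names map to the same target name.
# Usage: preview_renames tensor_name.txt npy_dir
preview_renames() {
  local cmd_file="$1" dir="$2"
  local tensor file target
  local -A seen=()   # target name -> first tensor name that produced it
  # Assumed fields of each line: pt <tensor> -n 0 -w <file>
  while read -r _ tensor _ _ _ file _ || [ -n "$tensor" ]; do
    [ -n "$tensor" ] && [ -n "$file" ] || continue
    target=${tensor//\//_}   # "/" -> "_"
    target=${target//:/.}    # ":" -> "."
    echo "$dir/$file -> $dir/$target.<timestamp>.npy"
    if [ -n "${seen[$target]}" ]; then
      echo "WARNING: $tensor and ${seen[$target]} both map to $target" >&2
    else
      seen[$target]=$tensor
    fi
  done < "$cmd_file"
}
```

For example, preview_renames tensor_name.txt npy_dir lists the planned renames. A warning means that index_to_tensorname.sh would move a second file onto the first file's new name and overwrite it, because both moves use the same timestamp.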