Preparing .npy Data of a TensorFlow Model

This version cannot generate .npy files for a TensorFlow model. You need to set up a TensorFlow environment and prepare the .npy files in advance. This section provides only an example of preparing TensorFlow .npy files, for reference.

To use a binary dump file for comparison, convert the .npy file to a dump file. For details, see "How Do I Convert an .npy File into a Dump File?".

Generating .npy files of a TensorFlow model requires a complete, executable, standard TensorFlow application project. You can use the TensorFlow debugger (tfdbg) to generate the .npy files. The major steps are as follows:

Modify the script of the TensorFlow application project to enable debugging by adding the following code:

Estimator mode:

         
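# Import the TensorFlow debugger (tfdbg).
from tensorflow.python import debug as tf_debug
# Append the tfdbg CLI hook to the existing training hooks.
training_hooks = [train_helper.PrefillStagingAreaHook(), tf_debug.LocalCLIDebugHook()]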

Add the tfdbg hook, as shown in Figure 1.

Figure 1 Estimator mode

Session.run mode:

         
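# Import the TensorFlow debugger (tfdbg).
from tensorflow.python import debug as tf_debug
# Wrap the existing session so that each sess.run() call opens the tfdbg CLI.
sess = tf_debug.LocalCLIDebugWrapperSession(sess, ui_type="readline")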

Set the tfdbg wrapper before run, as shown in Figure 2.

Figure 2 Session.run mode

Run the inference script.
In the interactive debugger command line, enter run to run the script.
Collect .npy files.
After the script is executed, you can run the lt command to query the stored tensors, run the pt command to view the tensor content, and save it as a file in NumPy format.

tfdbg dumps only one tensor at a time. To dump all tensors, perform the following steps:
1. Run lt > tensor_name to temporarily store all tensor names in a file.
2. Exit the tfdbg command line, enter the Linux command line, and run the following command to generate commands to run in tfdbg:
timestamp=$(($(date +%s%N)/1000)) ; cat tensor_name | awk '{print "pt",$4,$4}' | awk '{gsub("/", "_", $3);gsub(":", ".", $3);print($1,$2,"-n 0 -w "$3".""'$timestamp'"".npy")}' > tensor_name_cmd.txt
- The generated tfdbg commands are stored in the tensor_name_cmd.txt file. The .npy file names that they specify meet the naming rules for accuracy comparison. In the command, tensor_name is the name of the file that stores all tensor names, and timestamp is a 16-digit number.
- You can also run the command in a new window without exiting the tfdbg command line.
3. Go back to the tfdbg command line, run the script, and run the commands generated in the previous step to save all .npy files.
  By default, .npy files are stored using numpy.save(). Slashes (/) and colons (:) are replaced by underscores (_).
  
  If the command cannot be pasted on the CLI, run the mouse off command in the tfdbg command line to disable the mouse mode before pasting again.
4. Check that the names of the generated .npy files comply with the naming rules, as shown in Figure 3.
  - Names of the .npy files are in {op_name}.{output_index}.{timestamp}.npy format, where op_name may contain only characters that match the regular expression character class [A-Za-z0-9_-], output_index is a number, and timestamp is a 16-digit number.
  - If the name of an .npy file exceeds 255 characters because of a long operator name, comparison of this operator is not supported.
  - The names of some .npy files may not meet the naming requirements because of tfdbg or the operating environment. You can manually rename these files based on the naming rules. If a large number of .npy files do not meet the requirements, generate the .npy files again. For details, see Handling Exceptions in the Generated .npy File Names in Batches.
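The naming rules above can be checked with a short script before comparison. The following is a minimal sketch, not part of the toolchain: the names NPY_NAME_PATTERN, MAX_NAME_LENGTH, and check_npy_name are hypothetical, and the pattern assumes that output_index is a non-negative integer and that the 255-character limit applies to the whole file name.

```python
import re

# Assumed pattern for {op_name}.{output_index}.{timestamp}.npy:
# op_name uses only A-Z, a-z, 0-9, "_" and "-", output_index is a
# non-negative integer, and timestamp has exactly 16 digits.
NPY_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+\.[0-9]+\.[0-9]{16}\.npy")

# Assumed to apply to the complete file name, including the extension.
MAX_NAME_LENGTH = 255


def check_npy_name(file_name):
    """Return a list of problems found in file_name (empty if none)."""
    problems = []
    if len(file_name) > MAX_NAME_LENGTH:
        problems.append("name exceeds 255 characters")
    if NPY_NAME_PATTERN.fullmatch(file_name) is None:
        problems.append("name does not match the naming format")
    return problems


if __name__ == "__main__":
    import os

    # Report every .npy file in the current directory that breaks a rule.
    for name in sorted(os.listdir(".")):
        if name.endswith(".npy"):
            for problem in check_npy_name(name):
                print(name + ": " + problem)
```

Running the script in the directory that holds the .npy files prints one line per problem and nothing if every name complies.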
Figure 3 Viewing the .npy files
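The command-generation step (step 2 above) can also be sketched in Python, which may be easier to adapt than the awk one-liner. This is an illustrative sketch only: the function names npy_file_name, pt_command, and build_commands are hypothetical, and the tensor names are assumed to have already been extracted from the lt output.

```python
import time


def npy_file_name(tensor_name, timestamp):
    """Build the .npy file name for one tensor.

    Mirrors the awk substitutions: "/" becomes "_" and ":" becomes ".".
    """
    safe_name = tensor_name.replace("/", "_").replace(":", ".")
    return "{}.{}.npy".format(safe_name, timestamp)


def pt_command(tensor_name, timestamp):
    """Build the tfdbg pt command that writes one tensor to an .npy file."""
    return "pt {} -n 0 -w {}".format(
        tensor_name, npy_file_name(tensor_name, timestamp)
    )


def build_commands(tensor_names, timestamp=None):
    """Build one pt command per tensor name, sharing a single timestamp."""
    if timestamp is None:
        # Microseconds since the epoch, which is a 16-digit number.
        timestamp = time.time_ns() // 1000
    return [pt_command(name, timestamp) for name in tensor_names]


if __name__ == "__main__":
    # Hypothetical tensor names, used only for illustration.
    for command in build_commands(["conv1/Relu:0", "fc/BiasAdd:0"]):
        print(command)
```

Each printed line can be pasted into the tfdbg command line in the same way as the lines in tensor_name_cmd.txt.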
