PyTorch Profiling APIs
Prerequisites
Prepare a model trained with PyTorch 1.8.1 or 1.11.0 and a matching dataset, and port the model to the Ascend AI Processor. For details, see "Porting Adaptation" in the PyTorch Training Model Porting and Tuning Guide.
Profile Data Collection
Wrap the loss calculation and optimization steps of the original training code with the profiling API.
```python
# Use the profiling API adapted to Ascend PyTorch. You are advised to profile only one step.
with torch.autograd.profiler.profile(use_npu=True) as prof:
    out = model(input_tensor)
    loss = loss_func(out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Print the profiling result.
print(prof)

# Export the chrome_trace file to a specified path.
output_path = '/home/HwHiAiUser/profile_data.json'
prof.export_chrome_trace(output_path)
```
To ensure data accuracy, run more than 10 training steps before profiling: the data collected after step 10 is more stable and representative.
After the collection is complete, the profile_data.json file is generated. For details, see Profile Data Viewing.
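As a quick sanity check before opening the file in Chrome, the exported JSON can be inspected with Python's standard json module. This sketch assumes only the Chrome trace format itself, in which events are stored either as a top-level JSON array or under a "traceEvents" key; the tiny synthetic file below stands in for a real export:

```python
import json
import tempfile

def load_trace_events(path):
    """Return the list of trace events from a chrome_trace file.

    The Chrome trace format stores events either as a top-level JSON
    array or under the "traceEvents" key of a JSON object.
    """
    with open(path) as f:
        data = json.load(f)
    return data if isinstance(data, list) else data.get("traceEvents", [])

# Demo with a synthetic trace; a real file comes from prof.export_chrome_trace().
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"traceEvents": [{"name": "aten::add", "ph": "X", "dur": 42}]}, f)
    path = f.name

events = load_trace_events(path)
print(len(events))  # 1
```

If the file fails to load here, it will not open in chrome://tracing either, so this catches truncated or interrupted exports early.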
Profile Data Viewing
Enter chrome://tracing in the address bar of Google Chrome, drag the profile_data.json file into the blank area, and use the keyboard shortcuts (w: zoom in; s: zoom out; a: move left; d: move right) to navigate the timeline. See Figure 1.
To analyze the profile data, perform the following steps:
1. Click the selection button marked 1 in the figure.
2. Select the timeline data marked 2 (the data you want to analyze).
3. Click the button marked 3 in the figure. The detailed data is shown in the area marked 4.
4. In the area marked 4, sort the Self Time column in descending order, find the top N time-consuming operators, and analyze them for performance problems in the model.
For details about how to tune the performance of a PyTorch model, see "Performance Tuning" in the PyTorch Training Model Porting and Tuning Guide.
More Functions
- Obtain the shape information of the input tensor of an operator.
```python
# Add the record_shapes parameter to obtain the shape information of the input tensors.
with torch.autograd.profiler.profile(use_npu=True, record_shapes=True) as prof:
    # Add the model calculation process here.
    ...
print(prof)
```
The Input Shape information of each operator is added to the printed result.
- Obtain the memory information of the NPU in use.
```python
# Add the profile_memory parameter to obtain the memory usage of the operators.
with torch.autograd.profiler.profile(use_npu=True, profile_memory=True) as prof:
    # Add the model calculation process here.
    ...
print(prof)
```
The CPU Mem, Self CPU Mem, NPU Mem, and Self NPU Mem information of each operator is added to the printed result.
This function is supported only by PyTorch 1.8.1 and later versions.
- Obtain simple operator performance information.
This function prints only the operator information at the bottom layer of each operator stack, simplifying the analysis result.
```python
# Add the use_npu_simple parameter to obtain the simple operator information.
with torch.autograd.profiler.profile(use_npu=True, use_npu_simple=True) as prof:
    # Add the model calculation process here.
    ...

# Export the chrome_trace file to a specified path.
output_path = '/home/HwHiAiUser/profile_data.json'
prof.export_chrome_trace(output_path)
```
Open the chrome_trace result file in Google Chrome to view the simple operator performance information.
