Performance Overhead Caused by the PyTorch Profiler

Symptom

During model training, the steps at which the PyTorch profiler starts and stops take noticeably longer than the other steps.

Possible Causes

The profiler's initialization and warm-up phases, which run before data collection begins, add noticeable overhead to the steps in which they occur.

If the collected data is parsed automatically when profiling ends, the parsing phase adds significant overhead to the final profiled step.

Solution

When you use profile data to estimate step time, exclude the steps that include profiler setup at the start and parsing at the end, and measure only the steady-state steps in between. This gives a more accurate step time estimate.
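The trimming described above can be sketched in plain Python. This is a minimal illustration, not part of the profiler API: the helper name steady_state_step_time, its skip_head and skip_tail parameters, and the sample durations are all hypothetical, and the number of steps to drop at each end depends on how the profiler is configured in your training script.

```python
from statistics import mean


def steady_state_step_time(step_times, skip_head=1, skip_tail=1):
    """Average step time after dropping the first and last profiled steps.

    step_times: per-step durations in seconds, in execution order.
    skip_head:  number of leading steps affected by profiler start-up (assumed).
    skip_tail:  number of trailing steps affected by parsing (assumed).
    """
    if skip_head < 0 or skip_tail < 0:
        raise ValueError("skip_head and skip_tail must be non-negative")
    end = len(step_times) - skip_tail
    steady = step_times[skip_head:end]
    if not steady:
        raise ValueError("no steady-state steps left after trimming")
    return mean(steady)


# Made-up durations: the first step includes profiler start-up,
# the last step includes automatic parsing.
step_times = [2.5, 0.5, 0.5, 0.5, 3.0]
print("all steps:   ", mean(step_times))
print("steady state:", steady_state_step_time(step_times))
```

If warm-up spans more than one step in your setup, raise skip_head accordingly so that none of the affected steps enter the average.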