Collecting Profile Data Using Ascend PyTorch Profiler
Ascend PyTorch Profiler is a set of profiling APIs provided by the Ascend PyTorch framework for use during LLM training. It collects raw profile data from the framework, CANN, and the device, and then parses and analyzes that data.
For details about Ascend PyTorch Profiler, see Other Profiling Methods > Using PyTorch APIs for Data Profiling and Parsing > Ascend PyTorch Profiler APIs in Profiling Instructions.
Add the following sample code to the training script (for example, train_*.py) to configure profiling parameters, and then start the training.
    import torch
    import torch_npu

    # Experimental configuration for profile data collection and analysis.
    experimental_config = torch_npu.profiler._ExperimentalConfig(
        export_type=torch_npu.profiler.ExportType.Text,
        profiler_level=torch_npu.profiler.ProfilerLevel.Level1,
        msprof_tx=False,
        aic_metrics=torch_npu.profiler.AiCMetrics.PipeUtilization,
        l2_cache=False,
        op_attr=False,
        data_simplification=False,
        record_op_args=False,
        gc_detect_threshold=None
    )

    # Number of LLM training iterations to run.
    steps = 7

    with torch_npu.profiler.profile(
        activities=[
            torch_npu.profiler.ProfilerActivity.CPU,
            torch_npu.profiler.ProfilerActivity.NPU
        ],
        schedule=torch_npu.profiler.schedule(wait=0, warmup=0, active=2, repeat=2, skip_first=1),
        on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./result"),
        record_shapes=False,
        profile_memory=True,
        with_stack=False,
        with_modules=False,
        with_flops=False,
        experimental_config=experimental_config) as prof:
        for step in range(steps):
            # Model training. train_one_step, train_loader, model, optimizer,
            # and criterion come from your own training script.
            train_one_step(step, steps, train_loader, model, optimizer, criterion)
            # Tell the profiler that one iteration has finished so that the
            # schedule advances and data is collected and parsed as configured.
            prof.step()
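The schedule arguments in the sample determine which of the 7 iterations are actually profiled. The following is a minimal sketch in plain Python that mimics the documented semantics of torch.profiler.schedule, on the assumption that torch_npu.profiler.schedule follows the same rules. The function name profiled_steps and the phase labels are hypothetical and are not part of any Ascend API.

```python
def profiled_steps(total_steps, wait, warmup, active, repeat=0, skip_first=0):
    """Map each step index to a phase, assuming torch.profiler.schedule-style rules.

    Hypothetical helper for illustration only; it does not call torch_npu.
    """
    if active <= 0:
        raise ValueError("active must be a positive integer")
    cycle = wait + warmup + active
    phases = {}
    for step in range(total_steps):
        if step < skip_first:
            phases[step] = "skip"      # ignored before the first cycle starts
            continue
        offset = step - skip_first
        if repeat > 0 and offset // cycle >= repeat:
            phases[step] = "idle"      # all requested cycles are finished
            continue
        pos = offset % cycle
        if pos < wait:
            phases[step] = "wait"
        elif pos < wait + warmup:
            phases[step] = "warmup"
        else:
            phases[step] = "active"    # data is collected during this step
    return phases


# Same values as the sample above.
phases = profiled_steps(7, wait=0, warmup=0, active=2, repeat=2, skip_first=1)
active_steps = [s for s, p in phases.items() if p == "active"]
print(active_steps)
```

Under these assumptions, step 0 is skipped, steps 1 to 4 are collected as two cycles of two steps each, and steps 5 and 6 run without profiling. If steps were smaller than skip_first + (wait + warmup + active) * repeat, the last cycle would not complete.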
The file structure generated by the profiling is as follows:
    └── localhost.localdomain_139247_20240628101435_ascend_pt
        ├── profiler_info.json
        ├── profiler_metadata.json
        ├── ASCEND_PROFILER_OUTPUT
        │   ├── communication.json
        │   ├── communication_matrix.json
        │   ├── kernel_details.csv
        │   ├── memory_record.csv
        │   ├── npu_module_mem.csv
        │   ├── operator_details.csv
        │   ├── operator_memory.csv
        │   ├── step_trace_time.csv
        │   ├── op_statistic.csv
        │   ├── api_statistic.csv
        │   └── trace_view.json
        ├── FRAMEWORK
        └── PROF_000001_20230628101435646_FKFLNPEPPRRCFCBA
            ├── analyze
            ├── device_*
            ├── host
            ├── mindstudio_profiler_log
            └── mindstudio_profiler_output
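To get a quick overview of what a run produced, the parsed results can be inspected with a short script. The sketch below uses only the Python standard library and relies solely on the layout shown above: a directory whose name ends in _ascend_pt, containing ASCEND_PROFILER_OUTPUT. The helper names find_profiler_outputs and summarize_csv are hypothetical, the ./result path is taken from the sample code, and UTF-8 encoding of the CSV files is an assumption. No column names are assumed; the script only reports what it finds.

```python
import csv
from pathlib import Path


def find_profiler_outputs(result_dir):
    """Return every ASCEND_PROFILER_OUTPUT directory found under result_dir."""
    root = Path(result_dir)
    if not root.is_dir():
        return []
    return sorted(
        p / "ASCEND_PROFILER_OUTPUT"
        for p in root.glob("*_ascend_pt")
        if (p / "ASCEND_PROFILER_OUTPUT").is_dir()
    )


def summarize_csv(csv_path):
    """Return (column_names, data_row_count) for one CSV file."""
    # Assumption: the file is UTF-8 encoded and its first row is the header.
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = sum(1 for _ in reader)
    return header, rows


if __name__ == "__main__":
    # "./result" is the path passed to tensorboard_trace_handler in the sample.
    for out_dir in find_profiler_outputs("./result"):
        print(out_dir)
        for csv_file in sorted(out_dir.glob("*.csv")):
            header, rows = summarize_csv(csv_file)
            print(f"  {csv_file.name}: {rows} rows, {len(header)} columns")
```

A listing like this is only a sanity check that collection and parsing completed. For actual analysis, open the files with the tools described in Profiling Instructions.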