Profile Data Collection
Prerequisites
Complete Model Development and Migration so that the GPU and NPU environments can properly execute training jobs.
Before collecting profile data, delete the accuracy collection API calls from the training script (main.py), because accuracy data collection and profile data collection cannot be performed at the same time.
Collection
- Add the Ascend PyTorch Profiler APIs to the training script (main.py) in the GPU and NPU environments.
```python
import torch_npu
from torch_npu.contrib import transfer_to_npu
...
experimental_config = torch_npu.profiler._ExperimentalConfig(
    export_type=torch_npu.profiler.ExportType.Text,
    profiler_level=torch_npu.profiler.ProfilerLevel.Level1,
    msprof_tx=False,
    aic_metrics=torch_npu.profiler.AiCMetrics.AiCoreNone,
    l2_cache=False,
    op_attr=False,
    data_simplification=False,
    record_op_args=False,
    gc_detect_threshold=None)
with torch_npu.profiler.profile(
        activities=[
            torch_npu.profiler.ProfilerActivity.CPU,
            torch_npu.profiler.ProfilerActivity.NPU
        ],
        schedule=torch_npu.profiler.schedule(wait=0, warmup=0, active=1, repeat=1, skip_first=1),
        on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./profiling_data"),
        record_shapes=False,
        profile_memory=False,
        with_stack=False,
        with_modules=False,
        with_flops=False,
        experimental_config=experimental_config) as prof:
    for i, (images, target) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        # move data to the same device as model
        images = images.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)

        # compute output
        output = model(images)
        loss = criterion(output, target)

        # measure accuracy and record loss
        acc1, acc5 = accuracy(output, target, topk=(1, 5))
        losses.update(loss.item(), images.size(0))
        top1.update(acc1[0], images.size(0))
        top5.update(acc5[0], images.size(0))

        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        prof.step()
...
```
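The schedule argument above controls which training steps are actually recorded. As an illustration only (not the real implementation inside torch_npu.profiler.schedule), the following pure-Python sketch models the wait/warmup/active/repeat/skip_first semantics, using the same parameter values as the example:

```python
# Simplified model of the profiler schedule semantics. This is an
# illustration of how wait/warmup/active/repeat/skip_first interact,
# not the actual torch_npu.profiler.schedule implementation.
def make_schedule(wait, warmup, active, repeat, skip_first):
    cycle = wait + warmup + active

    def action(step):
        if step < skip_first:
            return "SKIP"              # skip_first steps are ignored entirely
        step -= skip_first
        if repeat and step >= cycle * repeat:
            return "NONE"              # all requested cycles are finished
        pos = step % cycle
        if pos < wait:
            return "WAIT"              # idle phase of the cycle
        if pos < wait + warmup:
            return "WARMUP"            # profiler warms up, data discarded
        return "RECORD"                # data for this step is collected

    return action

# Same values as the example: skip step 0, record step 1, then stop.
sched = make_schedule(wait=0, warmup=0, active=1, repeat=1, skip_first=1)
print([sched(i) for i in range(4)])  # → ['SKIP', 'RECORD', 'NONE', 'NONE']
```

With these values, only one training step is recorded, which keeps the collected data small.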
- For details about the APIs in the example, see the Ascend PyTorch Profiler APIs section of the Performance Tuning Tool User Guide.
- Profile data occupies disk space; if the disk space is used up, the server may become unavailable. The space required by profile data is closely related to the model parameters, collection configuration, and number of collection iterations. Ensure that the directory where profile data is flushed has sufficient free space.
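As a rough safeguard, you can check the free space of the flush directory before starting a run. The sketch below uses Python's standard shutil.disk_usage; the "./profiling_data" path mirrors the tensorboard_trace_handler argument in the collection example and is an assumption about your setup:

```python
import os
import shutil

# Check free disk space in the directory that will receive profile data.
# "./profiling_data" mirrors the tensorboard_trace_handler path used in
# the collection example; adjust it to your own flush directory.
flush_dir = "./profiling_data"
target = flush_dir if os.path.isdir(flush_dir) else "."
usage = shutil.disk_usage(target)
print(f"free space: {usage.free / 1024**3:.1f} GiB")
```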
- Run the training command. The tool collects the profile data during model training.
```shell
python main.py -a resnet50 -b 32 --gpu 1 --dummy
```
- View the result files of the profile data collected during PyTorch-based training.
After the training is complete, the collection result directory of the Ascend PyTorch Profiler API is generated in the directory specified by the torch_npu.profiler.tensorboard_trace_handler API.
```
└── localhost-247.localdomain_2201189_20241114070751139_ascend_pt
    ├── ASCEND_PROFILER_OUTPUT
    │   ├── api_statistic.csv
    │   ├── kernel_details.csv
    │   ├── operator_details.csv
    │   ├── op_statistic.csv
    │   ├── step_trace_time.csv
    │   └── trace_view.json
    ├── FRAMEWORK
    ...
    ├── PROF_000001_20241114151021952_PGRJNNCFAIJQMERA
    │   ├── device_1
    │   │   ├── data
    ...
    │   ├── host
    │   │   ├── data
    ...
    │   ├── mindstudio_profiler_log
    ...
    │   └── mindstudio_profiler_output
    │       ├── api_statistic_20241114151110.csv
    │       ├── msprof_20241114151108.json
    │       ├── op_statistic_20241114151110.csv
    │       ├── op_summary_20241114151110.csv
    │       ├── prof_rule_1_20241114151110.json
    │       ├── README.txt
    │       └── task_time_20241114151110.csv
    └── profiler_info.json
```

You are advised to use MindStudio Insight to analyze the profile data collected by the Ascend PyTorch Profiler APIs in a visualized manner. You can also use the msprof-analyze tool of mstt to assist in analysis. For details, see Using MindStudio Insight to Display Profile Data and Using msprof-analyze to Analyze Profile Data.
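Besides the visual tools, you can inspect trace_view.json programmatically. The sketch below assumes the file follows the Chrome trace event format that such viewers consume (a "traceEvents" list with "ph", "name", and "dur" fields); verify these field names against your actual output before relying on the numbers:

```python
import json

# Minimal sketch: total duration per event name from a Chrome-trace-format
# file such as trace_view.json. Field names ("traceEvents", "ph", "name",
# "dur") are assumptions based on the Chrome trace event format; check
# them against your actual file.
def summarize(path):
    with open(path) as f:
        data = json.load(f)
    # The top level may be a dict with a "traceEvents" key or a bare list.
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals = {}
    for ev in events:
        # "X" marks complete events that carry a duration in microseconds.
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev["name"]] = totals.get(ev["name"], 0) + ev["dur"]
    return totals
```

For example, `summarize("./profiling_data/.../trace_view.json")` would return a dict mapping each operator name to its accumulated duration, which can help spot the most time-consuming kernels before opening the full trace in a viewer.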