Profile Data Collection
Prerequisites
Complete Model Development and Migration so that the GPU and NPU environments can properly execute training jobs.
Before collecting profile data, delete the accuracy collection API calls from the training script (main.py), because accuracy data collection and profile data collection cannot be performed at the same time.
Collection
- Add the Ascend PyTorch Profiler APIs to the training script (main.py) in the GPU and NPU environments.
```python
import torch_npu
from torch_npu.contrib import transfer_to_npu
...
experimental_config = torch_npu.profiler._ExperimentalConfig(
    export_type=torch_npu.profiler.ExportType.Text,
    profiler_level=torch_npu.profiler.ProfilerLevel.Level1,
    msprof_tx=False,
    aic_metrics=torch_npu.profiler.AiCMetrics.AiCoreNone,
    l2_cache=False,
    op_attr=False,
    data_simplification=False,
    record_op_args=False,
    gc_detect_threshold=None)
with torch_npu.profiler.profile(
        activities=[
            torch_npu.profiler.ProfilerActivity.CPU,
            torch_npu.profiler.ProfilerActivity.NPU
        ],
        schedule=torch_npu.profiler.schedule(wait=0, warmup=0, active=1, repeat=1, skip_first=1),
        on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./profiling_data"),
        record_shapes=False,
        profile_memory=False,
        with_stack=False,
        with_modules=False,
        with_flops=False,
        experimental_config=experimental_config) as prof:
    for i, (images, target) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        # move data to the same device as model
        images = images.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)

        # compute output
        output = model(images)
        loss = criterion(output, target)

        # measure accuracy and record loss
        acc1, acc5 = accuracy(output, target, topk=(1, 5))
        losses.update(loss.item(), images.size(0))
        top1.update(acc1[0], images.size(0))
        top5.update(acc5[0], images.size(0))

        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        prof.step()
...
```
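The schedule argument above controls which training steps are actually recorded. As an illustration only (not the real implementation inside torch_npu.profiler.schedule), the following pure-Python sketch models the wait/warmup/active/repeat/skip_first semantics, using the same parameter values as the example:

```python
# Simplified model of the profiler schedule semantics. This is an
# illustration of how wait/warmup/active/repeat/skip_first interact,
# not the actual torch_npu.profiler.schedule implementation.
def make_schedule(wait, warmup, active, repeat, skip_first):
    cycle = wait + warmup + active

    def action(step):
        if step < skip_first:
            return "SKIP"              # skip_first steps are ignored entirely
        step -= skip_first
        if repeat and step >= cycle * repeat:
            return "NONE"              # all requested cycles are finished
        pos = step % cycle
        if pos < wait:
            return "WAIT"              # idle phase of the cycle
        if pos < wait + warmup:
            return "WARMUP"            # profiler warms up, data discarded
        return "RECORD"                # data for this step is collected

    return action

# Same values as the example: skip step 0, record step 1, then stop.
sched = make_schedule(wait=0, warmup=0, active=1, repeat=1, skip_first=1)
print([sched(i) for i in range(4)])  # → ['SKIP', 'RECORD', 'NONE', 'NONE']
```

With these values, only one training step is recorded, which keeps the collected data small.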
- For details about the APIs in the example, see the Ascend PyTorch Profiler APIs section of the Performance Tuning Tool User Guide.
- Profile data occupies disk space; if the disk space is used up, the server may become unavailable. The space required by profile data is closely related to the model parameters, collection configuration, and number of collection iterations. Ensure that the directory where profile data is flushed has sufficient free space.
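As a rough safeguard, you can check the free space of the flush directory before starting a run. The sketch below uses Python's standard shutil.disk_usage; the "./profiling_data" path mirrors the tensorboard_trace_handler argument in the collection example and is an assumption about your setup:

```python
import os
import shutil

# Check free disk space in the directory that will receive profile data.
# "./profiling_data" mirrors the tensorboard_trace_handler path used in
# the collection example; adjust it to your own flush directory.
flush_dir = "./profiling_data"
target = flush_dir if os.path.isdir(flush_dir) else "."
usage = shutil.disk_usage(target)
print(f"free space: {usage.free / 1024**3:.1f} GiB")
```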
- Run the training command. The tool collects the profile data during model training.
```shell
python main.py -a resnet50 -b 32 --gpu 1 --dummy
```
- View the result files of the profile data collected during PyTorch-based training.
After the training is complete, the collection result directory of the Ascend PyTorch Profiler API is generated in the directory specified by the torch_npu.profiler.tensorboard_trace_handler API.
```
└── localhost-247.localdomain_2201189_20241114070751139_ascend_pt
    ├── ASCEND_PROFILER_OUTPUT
    │   ├── api_statistic.csv
    │   ├── kernel_details.csv
    │   ├── operator_details.csv
    │   ├── op_statistic.csv
    │   ├── step_trace_time.csv
    │   └── trace_view.json
    ├── FRAMEWORK
    ...
    ├── PROF_000001_20241114151021952_PGRJNNCFAIJQMERA
    │   ├── device_1
    │   │   ├── data
    ...
    │   ├── host
    │   │   ├── data
    ...
    │   ├── mindstudio_profiler_log
    ...
    │   └── mindstudio_profiler_output
    │       ├── api_statistic_20241114151110.csv
    │       ├── msprof_20241114151108.json
    │       ├── op_statistic_20241114151110.csv
    │       ├── op_summary_20241114151110.csv
    │       ├── prof_rule_1_20241114151110.json
    │       ├── README.txt
    │       └── task_time_20241114151110.csv
    └── profiler_info.json
```

You are advised to use MindStudio Insight to analyze the profile data collected by the Ascend PyTorch Profiler APIs in a visualized manner. You can also use the msprof-analyze tool of mstt to assist in analysis. For details, see Using MindStudio Insight to Display Profile Data and Using msprof-analyze to Analyze Profile Data.
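Besides the visual tools, you can inspect trace_view.json programmatically. The sketch below assumes the file follows the Chrome trace event format that such viewers consume (a "traceEvents" list with "ph", "name", and "dur" fields); verify these field names against your actual output before relying on the numbers:

```python
import json

# Minimal sketch: total duration per event name from a Chrome-trace-format
# file such as trace_view.json. Field names ("traceEvents", "ph", "name",
# "dur") are assumptions based on the Chrome trace event format; check
# them against your actual file.
def summarize(path):
    with open(path) as f:
        data = json.load(f)
    # The top level may be a dict with a "traceEvents" key or a bare list.
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals = {}
    for ev in events:
        # "X" marks complete events that carry a duration in microseconds.
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev["name"]] = totals.get(ev["name"], 0) + ev["dur"]
    return totals
```

For example, `summarize("./profiling_data/.../trace_view.json")` would return a dict mapping each operator name to its accumulated duration, which can help spot the most time-consuming kernels before opening the full trace in a viewer.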