Collecting Profile Data Using Ascend PyTorch Profiler

Ascend PyTorch Profiler is a set of performance profiling APIs for LLM training in the Ascend PyTorch framework. It collects raw profile data from the framework, CANN, and the device, and analyzes that data. To collect profile data, add the torch_npu.profiler configuration and calls to the training script (for example, the train_*.py file), and then start the training.

For details about Ascend PyTorch Profiler, see the Performance Tuning Tool User Guide.

The sample code is as follows:

import torch
import torch_npu

# Experimental configuration for profile data collection and analysis.
experimental_config = torch_npu.profiler._ExperimentalConfig(
    export_type=torch_npu.profiler.ExportType.Text,
    profiler_level=torch_npu.profiler.ProfilerLevel.Level0,
    msprof_tx=False,
    aic_metrics=torch_npu.profiler.AiCMetrics.AiCoreNone,
    l2_cache=False,
    op_attr=False,
    data_simplification=False,
    record_op_args=False,
    gc_detect_threshold=None,
)

# Number of LLM training iterations.
steps = 7

with torch_npu.profiler.profile(
    activities=[
        torch_npu.profiler.ProfilerActivity.CPU,
        torch_npu.profiler.ProfilerActivity.NPU,
    ],
    schedule=torch_npu.profiler.schedule(wait=0, warmup=0, active=2, repeat=2, skip_first=1),
    on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./result"),
    record_shapes=False,
    profile_memory=False,
    with_stack=False,
    with_modules=False,
    with_flops=False,
    experimental_config=experimental_config,
) as prof:
    for step in range(steps):
        # Model training. train_one_step, train_loader, model, optimizer, and
        # criterion are defined in the training script.
        train_one_step(step, steps, train_loader, model, optimizer, criterion)
        # Call step() after each iteration so that the profiler advances its schedule.
        prof.step()
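
The schedule arguments decide which of the seven iterations are actually profiled. The following is a minimal sketch that reproduces the step arithmetic in plain Python, assuming torch_npu.profiler.schedule follows the same semantics as the upstream torch.profiler.schedule (skip skip_first steps, then cycle through wait, warmup, and active steps, repeat times). The function phase_for_step and its phase labels are hypothetical names used only for illustration; they are not part of torch_npu.

```python
def phase_for_step(step, wait=0, warmup=0, active=2, repeat=2, skip_first=1):
    """Return the assumed schedule phase for a 0-based training step."""
    if step < skip_first:
        return "skip"                      # ignored before the first cycle starts
    step -= skip_first
    cycle = wait + warmup + active         # length of one profiling cycle
    if repeat > 0 and step // cycle >= repeat:
        return "idle"                      # all cycles are finished
    pos = step % cycle                     # position inside the current cycle
    if pos < wait:
        return "wait"
    if pos < wait + warmup:
        return "warmup"
    return "active"                        # profile data is recorded


steps = 7
phases = [phase_for_step(s) for s in range(steps)]
for s, phase in enumerate(phases):
    print(f"step {s}: {phase}")
```

Under these assumptions, step 0 is skipped, steps 1 to 4 are recorded as two cycles of two active steps each, and steps 5 and 6 are not profiled.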

Profiling generates the following file structure:

└── localhost.localdomain_139247_20240628101435_ascend_pt
    ├── profiler_info.json
    ├── profiler_metadata.json
    ├── ASCEND_PROFILER_OUTPUT
    │   ├── communication.json
    │   ├── communication_matrix.json
    │   ├── data_preprocess.csv
    │   ├── kernel_details.csv
    │   ├── l2_cache.csv
    │   ├── memory_record.csv
    │   ├── npu_module_mem.csv
    │   ├── operator_details.csv
    │   ├── operator_memory.csv
    │   ├── step_trace_time.csv
    │   ├── op_statistic.csv
    │   ├── api_statistic.csv
    │   └── trace_view.json
    ├── FRAMEWORK
    └── PROF_000001_20230628101435646_FKFLNPEPPRRCFCBA
        ├── analyze
        ├── device_*
        ├── host
        ├── mindstudio_profiler_log
        └── mindstudio_profiler_output
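
After a run, it can be handy to check which result directories and parsed files were produced. The following is a minimal sketch that uses only the Python standard library and relies on the layout shown above: result directories end with _ascend_pt and contain an ASCEND_PROFILER_OUTPUT subdirectory. The helper names find_profiler_runs and parsed_files are hypothetical, and ./result is the output path from the sample code.

```python
from pathlib import Path


def find_profiler_runs(result_dir):
    """Return the *_ascend_pt directories directly under result_dir, sorted by name."""
    root = Path(result_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir() if p.is_dir() and p.name.endswith("_ascend_pt")
    )


def parsed_files(run_dir):
    """Return the names of the files in ASCEND_PROFILER_OUTPUT, sorted by name."""
    out_dir = Path(run_dir) / "ASCEND_PROFILER_OUTPUT"
    if not out_dir.is_dir():
        return []
    return sorted(f.name for f in out_dir.iterdir() if f.is_file())


if __name__ == "__main__":
    for run in find_profiler_runs("./result"):
        print(run.name)
        for name in parsed_files(run):
            print("    " + name)
```

Which files appear depends on the profiler configuration and on the workload, so a file that is missing from the listing does not necessarily indicate a failed run.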