
Collecting Performance Data with Ascend PyTorch Profiler

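Ascend PyTorch Profiler is a set of APIs for collecting performance data while training large models with the Ascend PyTorch framework. It collects raw performance data from the framework side, the CANN side and the device side, and then parses it. To collect performance data, add the torch_npu.profiler collection and parsing configuration and parameters to the training script (for example, a train_*.py file), then start training.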

For a detailed description of Ascend PyTorch Profiler, see the Performance Tuning Tool User Guide.

Sample code:

import torch
import torch_npu

# Preliminary configuration for Profiler data collection and parsing
experimental_config = torch_npu.profiler._ExperimentalConfig(
    export_type=torch_npu.profiler.ExportType.Text,
    profiler_level=torch_npu.profiler.ProfilerLevel.Level1,
    msprof_tx=False,
    aic_metrics=torch_npu.profiler.AiCMetrics.PipeUtilization,
    l2_cache=False,
    op_attr=False,
    data_simplification=False,
    record_op_args=False,
    gc_detect_threshold=None
)
# Number of training steps to run
steps = 7
with torch_npu.profiler.profile(
        activities=[
            torch_npu.profiler.ProfilerActivity.CPU,
            torch_npu.profiler.ProfilerActivity.NPU
        ],
        schedule=torch_npu.profiler.schedule(wait=0, warmup=0, active=2, repeat=2, skip_first=1),
        on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./result"),
        record_shapes=False,
        profile_memory=True,
        with_stack=False,
        with_modules=False,
        with_flops=False,
        experimental_config=experimental_config) as prof:
    for step in range(steps):
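        # Train for one step (train_one_step, train_loader, model, optimizer and
        # criterion are user-defined elsewhere in the training script)
        train_one_step(step, steps, train_loader, model, optimizer, criterion)
        # Call step() so the profiler advances its schedule and collects/parses data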
        prof.step()
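To see which iterations the schedule above actually profiles, the following pure-Python sketch reproduces the step-to-action logic of the upstream PyTorch `torch.profiler.schedule`. It assumes that `torch_npu.profiler.schedule` follows the same semantics; `Action`, `schedule_action` and `plan` are hypothetical helpers for illustration, not part of torch_npu.

```python
from enum import Enum


class Action(Enum):
    # Names mirror torch.profiler.ProfilerAction (assumed to match torch_npu)
    NONE = "NONE"
    WARMUP = "WARMUP"
    RECORD = "RECORD"
    RECORD_AND_SAVE = "RECORD_AND_SAVE"


def schedule_action(step, wait, warmup, active, repeat=0, skip_first=0):
    """Return the profiler action for a 0-based training step.

    Assumption: same cycle logic as upstream torch.profiler.schedule.
    """
    if active < 1:
        raise ValueError("active must be at least 1")
    if step < skip_first:
        return Action.NONE
    step -= skip_first
    cycle = wait + warmup + active
    # repeat == 0 means cycles continue until training ends
    if repeat > 0 and step // cycle >= repeat:
        return Action.NONE
    pos = step % cycle
    if pos < wait:
        return Action.NONE
    if pos < wait + warmup:
        return Action.WARMUP
    # The last active step of each cycle triggers on_trace_ready
    return Action.RECORD_AND_SAVE if pos == cycle - 1 else Action.RECORD


def plan(steps, **kwargs):
    """List the action for every step in range(steps)."""
    return [schedule_action(s, **kwargs) for s in range(steps)]


if __name__ == "__main__":
    # Same arguments as the sample: wait=0, warmup=0, active=2, repeat=2, skip_first=1
    for s, a in enumerate(plan(7, wait=0, warmup=0, active=2, repeat=2, skip_first=1)):
        print(s, a.name)
```

Under that assumption, the sample skips step 0, records steps 1-2 and 3-4, and hands a trace to `on_trace_ready` at the end of each active window (steps 2 and 4). At least skip_first + repeat × (wait + warmup + active) = 5 training steps are therefore needed; with `steps = 7`, steps 5 and 6 run without profiling.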

The collected data is written with the following directory structure:

└── localhost.localdomain_139247_20240628101435_ascend_pt
    ├── profiler_info.json
    ├── profiler_metadata.json
    ├── ASCEND_PROFILER_OUTPUT
    │   ├── communication.json
    │   ├── communication_matrix.json
    │   ├── kernel_details.csv
    │   ├── memory_record.csv
    │   ├── npu_module_mem.csv
    │   ├── operator_details.csv
    │   ├── operator_memory.csv
    │   ├── step_trace_time.csv
    │   ├── op_statistic.csv
    │   ├── api_statistic.csv
    │   └── trace_view.json
    ├── FRAMEWORK
    └── PROF_000001_20230628101435646_FKFLNPEPPRRCFCBA
        ├── analyze
        ├── device_*
        ├── host
        ├── mindstudio_profiler_log
        └── mindstudio_profiler_output
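After collection, the `*_ascend_pt` run directories written under `./result` can be checked with a short script. The sketch below is a hypothetical helper, not a torch_npu API: it assumes each run directory name follows the `<hostname>_<pid>_<YYYYMMDDhhmmss>_ascend_pt` pattern suggested by the example above, and `EXPECTED` is an illustrative selection from `ASCEND_PROFILER_OUTPUT`, since the files actually produced may vary with the profiler configuration.

```python
import re
from datetime import datetime
from pathlib import Path

# Illustrative selection of files from ASCEND_PROFILER_OUTPUT (assumption:
# the real set depends on the profiler configuration)
EXPECTED = [
    "kernel_details.csv",
    "op_statistic.csv",
    "step_trace_time.csv",
    "trace_view.json",
]

# Assumed naming pattern: <hostname>_<pid>_<YYYYMMDDhhmmss>_ascend_pt
_NAME_RE = re.compile(r"^(?P<host>.+)_(?P<pid>\d+)_(?P<ts>\d{14})_ascend_pt$")


def parse_run_dir(name):
    """Split a *_ascend_pt directory name into (hostname, pid, start time)."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not an *_ascend_pt directory name: {name!r}")
    ts = datetime.strptime(m["ts"], "%Y%m%d%H%M%S")
    return m["host"], int(m["pid"]), ts


def summarize(result_dir, expected=EXPECTED):
    """Map each *_ascend_pt run under result_dir to the expected files it lacks."""
    report = {}
    for run in sorted(Path(result_dir).glob("*_ascend_pt")):
        out = run / "ASCEND_PROFILER_OUTPUT"
        report[run.name] = [f for f in expected if not (out / f).is_file()]
    return report
```

For example, `summarize("./result")` returns one entry per run directory, and an empty list means every file in the chosen selection was found.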