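Log Description
The Ascend DMI tool records logs when it executes command-line operations. Logs are stored in the following paths:
- root user: /var/log/ascend-dmi
- Non-root user: ~/var/log/ascend-dmi
When a log file exceeds 10 MB, it is rotated to a file named after the log file with the suffix .XX.gz, where XX is a natural number that starts at 1 and increments. At most 10 rotated files are kept. When this limit is exceeded, the rotated log with the earliest rotation date is deleted to keep the number of log files at the maximum.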

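- If the Ascend DMI tool fails to obtain the device type, logs are rotated in the default path described above.
- When the device type is Atlas 500 A2 AI edge station, a log file is rotated to a file with the suffix .XX.gz (XX is a natural number that starts at 1 and increments) once it exceeds 1 MB. At most 10 rotated files are kept, and when this limit is exceeded, the rotated log with the earliest rotation date is deleted. Rotated logs are stored in /home/log/ascend-dmi.
- Debug logs are rotated only to the /var/log/ascend-dmi directory, and rotation takes place when the file size reaches 10 MB. On an Atlas 500 A2 AI edge station, save debug logs promptly so that they are not lost on restart.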
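The paths and limits above can be captured in a small helper, for example when a script has to collect Ascend DMI logs. The following is a minimal Python sketch and not part of the Ascend DMI tool: the function names and the `is_edge_station` flag are hypothetical, the name pattern assumes the `.XX.gz` suffix described above, and file modification time is used as a stand-in for the rotation date.

```python
import re
from pathlib import Path

MAX_ARCHIVES = 10  # maximum number of rotated files, per the description above
ARCHIVE_NAME = re.compile(r".+\.\d+\.gz$")  # assumed pattern: <log file>.XX.gz


def rotated_log_dir(is_root: bool, is_edge_station: bool = False) -> Path:
    """Return the directory where rotated (non-debug) logs are expected."""
    if is_edge_station:  # Atlas 500 A2 AI edge station
        return Path("/home/log/ascend-dmi")
    if is_root:
        return Path("/var/log/ascend-dmi")
    return Path.home() / "var/log/ascend-dmi"


def rotated_archives(log_dir: Path) -> list:
    """List rotated archives in log_dir, oldest first by modification time."""
    archives = [
        p for p in log_dir.iterdir()
        if p.is_file() and ARCHIVE_NAME.match(p.name)
    ]
    return sorted(archives, key=lambda p: p.stat().st_mtime)


def archives_over_limit(log_dir: Path) -> list:
    """Return the oldest archives beyond MAX_ARCHIVES (empty if within limit)."""
    archives = rotated_archives(log_dir)
    excess = len(archives) - MAX_ARCHIVES
    return archives[:excess] if excess > 0 else []
```

A collection script could call `rotated_archives(rotated_log_dir(is_root=True))` and copy the returned files elsewhere before they age out.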
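Log Backup
When the device type is Atlas 500 A2 AI edge station, the toolbox log files in the original log path are cleared after the driver restarts. However, the driver saves them in the compressed file reboot_back_up_XX.tar.gz under "/home/log/kbox_last_logs/". Decompress this file to view the log files from before the restart.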
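Decompressing a backup can be scripted with the Python standard library. The sketch below is illustrative only: `find_backups` and `extract_backup` are hypothetical helper names, and the glob pattern assumes that XX is the only variable part of the file name.

```python
import tarfile
from pathlib import Path

BACKUP_DIR = Path("/home/log/kbox_last_logs")  # path from the description above


def find_backups(backup_dir: Path = BACKUP_DIR) -> list:
    """Return reboot_back_up_XX.tar.gz archives in backup_dir, sorted by name."""
    return sorted(backup_dir.glob("reboot_back_up_*.tar.gz"))


def extract_backup(archive: Path, dest: Path) -> list:
    """Extract one backup archive into dest and return its member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Use the "data" extraction filter where this Python version offers it,
        # so that members with unsafe paths are rejected.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return names
```

Extracting into a separate directory, rather than back into the log path, avoids mixing pre-restart logs with logs written after the restart.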
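Saving Data to Disk
When bandwidth diagnosis, compute power diagnosis, or NIC diagnosis is run with JSON as the output format, the results are saved to disk. The saved data shows the bandwidth or compute power value for each Device. The result file is stored in the following paths:
- root user: /var/log/ascend_check/result.txt
- Non-root user: ~/var/log/ascend_check/result.txt
For example, each of the following commands generates a result file: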
ascend-dmi -dg -i bandwidth -fmt json
ascend-dmi -dg -i aiflops -fmt json
ascend-dmi -dg -fmt json
ascend-dmi -dg -i nic -fmt json
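When these commands are launched from a script, it can help to build the argument list and the expected result path in one place. This is a minimal sketch using only the flags and paths shown above; `diag_command` and `result_file` are hypothetical helper names, not part of Ascend DMI.

```python
from pathlib import Path
from typing import List, Optional


def diag_command(item: Optional[str] = None) -> List[str]:
    """Build an ascend-dmi diagnosis command with JSON output.

    item is one of the values shown above ("bandwidth", "aiflops", "nic");
    None omits the -i option, as in `ascend-dmi -dg -fmt json`.
    """
    cmd = ["ascend-dmi", "-dg"]
    if item is not None:
        cmd += ["-i", item]
    return cmd + ["-fmt", "json"]


def result_file(is_root: bool) -> Path:
    """Return the result file path for the root or a non-root user."""
    base = Path("/") if is_root else Path.home()
    return base / "var/log/ascend_check/result.txt"
```

On a machine where the tool is installed, the list returned by `diag_command("bandwidth")` can be passed to `subprocess.run(..., check=True)`, after which `result_file(...)` points to the saved data.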

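- On an Atlas 200T A2 Box16 heterogeneous subrack in a virtual machine scenario, bandwidth diagnosis does not run the P2P test between the two 8P groups because of the characteristics of the data transmission channel.
- On Atlas A2 training series products and Atlas 800I A2 inference products, bandwidth and compute power diagnosis returns the following output: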
{ "device_0": { "aiflops": "287.95", "d2d bandwidth": "743.41", "d2d write bandwidth": "740.86", "d2h bandwidth": "28.07", "h2d bandwidth": "25.12", "p2p bidirectional bandwidth": "X", "p2p bidirectional write bandwidth": "X", "p2p unidirectional bandwidth": "X", "p2p unidirectional write bandwidth": "X" } }
- On an Atlas 900 A2 PoD cluster basic unit, NIC diagnosis returns the following output:
{ "device_0": { "nic roce read bandwidth": "device_7: 22.716700, device_1: 22.716524, device_6: 22.716612", "nic roce send bandwidth": "device_6: 22.739834, device_1: 22.739473, device_7: 22.739336", "nic roce write bandwidth": "device_1: 22.717470, device_7: 22.717920, device_6: 22.716806" }, "device_1": { "nic roce read bandwidth": "device_0: 22.716396, device_6: 22.716591, device_7: 22.716866", "nic roce send bandwidth": "device_0: 22.739386, device_7: 22.740028, device_6: 22.739374", "nic roce write bandwidth": "device_0: 22.716515, device_6: 22.716797, device_7: 22.716660" }, "device_2": { "nic roce read bandwidth": "device_4: 22.716534, device_5: 22.716562, device_3: 22.716787", "nic roce send bandwidth": "device_4: 22.739746, device_5: 22.739492, device_3: 22.739464", "nic roce write bandwidth": "device_3: 22.718027, device_4: 22.716728, device_5: 22.716581" }, "device_3": { "nic roce read bandwidth": "device_2: 22.716768, device_5: 22.716759, device_4: 22.716738", "nic roce send bandwidth": "device_2: 22.739170, device_5: 22.739248, device_4: 22.739483", "nic roce write bandwidth": "device_2: 22.716377, device_5: 22.716700, device_4: 22.717323" }, "device_4": { "nic roce read bandwidth": "device_2: 22.716816, device_3: 22.716747, device_5: 22.716280", "nic roce send bandwidth": "device_2: 22.739374, device_3: 22.739355, device_5: 22.739552", "nic roce write bandwidth": "device_2: 22.716934, device_5: 22.716484, device_3: 22.717091" }, "device_5": { "nic roce read bandwidth": "device_4: 22.717598, device_3: 22.717157, device_2: 22.717579", "nic roce send bandwidth": "device_4: 22.739483, device_3: 22.739492, device_2: 22.739336", "nic roce write bandwidth": "device_4: 22.716825, device_2: 22.713037, device_3: 22.716856" }, "device_6": { "nic roce read bandwidth": "device_0: 22.716681, device_7: 22.716719, device_1: 22.716681", "nic roce send bandwidth": "device_0: 22.739630, device_1: 22.739414, device_7: 22.739374", "nic roce write bandwidth": "device_0: 22.716446, device_7: 22.718134, device_1: 22.717091" }, "device_7": { "nic roce read bandwidth": "device_6: 22.717169, device_1: 22.716700, device_0: 22.717842", "nic roce send bandwidth": "device_6: 22.739590, device_1: 22.739199, device_0: 22.739590", "nic roce write bandwidth": "device_6: 22.716631, device_1: 22.716846, device_0: 22.716806" } }
- On Atlas A3 training series products, bandwidth diagnosis returns the following output:
{ "device_all": { "d2h bandwidth": "356.64", "h2d bandwidth": "297.58" }, "device_0": { "d2d bandwidth": "1516.66", "d2d write bandwidth": "1484.89", "p2p bidirectional bandwidth": "X, 366.68, 270.14, 269.81, 270.05, 269.96, 269.74, 269.78, 270.09, 270.03, 269.88, 269.82, 269.93, 269.97, 269.80, 269.80", "p2p bidirectional write bandwidth": "X, 343.01, 250.02, 247.81, 248.13, 245.36, 246.59, 247.84, 246.04, 246.09, 248.19, 246.14, 245.33, 246.54, 248.46, 246.87", "p2p unidirectional bandwidth": "X, 202.95, 164.73, 164.72, 164.78, 164.74, 164.73, 164.72, 164.77, 164.74, 164.75, 164.71, 164.75, 164.77, 164.75, 164.72", "p2p unidirectional write bandwidth": "X, 191.71, 137.20, 137.41, 137.26, 137.21, 137.46, 137.49, 136.72, 137.18, 137.45, 137.40, 137.00, 137.11, 137.46, 137.34" }, "device_1": { "d2d bandwidth": "1528.46", "d2d write bandwidth": "1470.62", "p2p bidirectional bandwidth": "368.80, X, 269.88, 269.87, 269.99, 270.09, 269.81, 269.74, 270.06, 270.02, 269.96, 270.03, 270.13, 269.98, 269.94, 269.92", "p2p bidirectional write bandwidth": "340.45, X, 246.08, 247.17, 246.87, 245.46, 245.34, 248.06, 244.03, 243.79, 247.23, 243.96, 243.78, 246.07, 247.84, 247.25", "p2p unidirectional bandwidth": "202.96, X, 164.73, 164.74, 164.74, 164.76, 164.73, 164.74, 164.77, 164.77, 164.73, 164.77, 164.76, 164.78, 164.76, 164.73", "p2p unidirectional write bandwidth": "191.68, X, 137.08, 137.58, 137.36, 136.93, 137.32, 137.66, 136.75, 136.97, 137.38, 137.24, 136.73, 137.26, 137.54, 137.35" } }
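Parameter | Description |
---|---|
Value | The bandwidth or compute power value for the specific Device. Bandwidth diagnosis values are in GB/s; compute power diagnosis values are in TFLOPS. |
X/NA | This value is not supported and cannot be displayed. |
FAIL | The diagnosis failed. |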
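The sample outputs store every value as a string in one of three shapes: a single entry, a comma-separated list with one entry per peer, or `device_N: value` pairs. The following Python sketch shows one way to turn such output into numbers. It is an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be valid JSON shaped like the samples, numeric entries become `float` (GB/s or TFLOPS), `X` and `NA` become `None`, and `FAIL` is kept as the string `"FAIL"`.

```python
import json
from typing import Dict, Optional, Union

Scalar = Union[float, str, None]


def parse_scalar(text: str) -> Scalar:
    """Convert one entry: number -> float, X/NA -> None, FAIL -> "FAIL"."""
    text = text.strip()
    if text in ("X", "NA"):
        return None  # value not supported
    if text == "FAIL":
        return "FAIL"  # diagnosis failed for this entry
    return float(text)


def parse_value(raw: str):
    """Parse a scalar, a comma-separated list, or 'device_N: value' pairs."""
    parts = [p.strip() for p in raw.split(",")]
    if all(":" in p for p in parts):
        pairs = (p.split(":", 1) for p in parts)
        return {peer.strip(): parse_scalar(val) for peer, val in pairs}
    if len(parts) > 1:
        return [parse_scalar(p) for p in parts]
    return parse_scalar(raw)


def parse_report(text: str) -> Dict[str, dict]:
    """Parse a full diagnosis report into {device: {metric: parsed value}}."""
    report = json.loads(text)
    return {
        device: {name: parse_value(raw) for name, raw in metrics.items()}
        for device, metrics in report.items()
    }
```

Keeping `None` and `"FAIL"` distinct lets a caller tell an unsupported measurement apart from a failed one before computing averages or thresholds.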