Log Description

The Ascend DMI tool records logs whenever it runs a command-line operation. The logs are stored in the following paths:

  • root user: /var/log/ascend-dmi
  • Non-root user: ~/var/log/ascend-dmi
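
As a minimal sketch of the path rule above, the following helper returns the log directory for a given user. The function name `ascend_dmi_log_dir` and its parameters are hypothetical and are not part of the Ascend DMI tool; the sketch only encodes the two documented paths.

```python
import os
from pathlib import Path


def ascend_dmi_log_dir(is_root=None, home=None):
    """Return the documented Ascend DMI log directory for a user.

    Hypothetical helper: it mirrors the two paths listed above and
    does not query the tool itself.
    """
    if is_root is None:
        # os.geteuid() is available on Unix-like systems only.
        is_root = os.geteuid() == 0
    if is_root:
        return Path("/var/log/ascend-dmi")
    # Non-root users get the same relative path under their home directory.
    base = Path(home) if home is not None else Path.home()
    return base / "var" / "log" / "ascend-dmi"
```

Calling `ascend_dmi_log_dir()` with no arguments resolves the directory for the current user; passing `is_root` and `home` explicitly is convenient in scripts that collect logs on behalf of another account.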

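When a log file exceeds 10 MB, it is rotated into a file named after the log file with the suffix .XX.gz (XX is a natural number that increments from 1). At most 10 rotated files are kept. When this limit is exceeded, the rotated file with the earliest rotation date is deleted so that the number of log files stays within the maximum.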

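  1. If the Ascend DMI tool fails to obtain the device type, logs are rotated into the default path described above.
  2. If the device type is Atlas 500 A2 AI edge station, a log file is rotated once it exceeds 1 MB, into a file named after the log file with the suffix .XX.gz (XX is a natural number that increments from 1). At most 10 rotated files are kept. When this limit is exceeded, the rotated file with the earliest rotation date is deleted so that the number of log files stays within the maximum. The rotated logs are stored in /home/log/ascend-dmi.
  3. Debug logs are rotated only into the /var/log/ascend-dmi directory, and rotation takes place when the file size reaches 10 MB. On the Atlas 500 A2 AI edge station, save debug logs promptly so that they are not lost on reboot.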
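
The rotation rules above can be summarized in a small lookup, sketched below under stated assumptions. All names (`RotationPolicy`, `rotation_policy`, `archives_to_delete`) and the device-type string `"Atlas 500 A2"` are hypothetical. "MB" is assumed to mean 2**20 bytes, and the default directory shown is the root-user path. This sketch illustrates the documented policy and is not the tool's actual implementation.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: "MB" is interpreted as 2**20 bytes


@dataclass(frozen=True)
class RotationPolicy:
    max_bytes: int         # size threshold that triggers rotation
    archive_dir: str       # directory that receives the .XX.gz files
    max_archives: int = 10


# Default policy (root-user path); also used when the device type is unknown.
DEFAULT_POLICY = RotationPolicy(10 * MB, "/var/log/ascend-dmi")
# Policy documented for the Atlas 500 A2 AI edge station.
ATLAS_500_A2_POLICY = RotationPolicy(1 * MB, "/home/log/ascend-dmi")


def rotation_policy(device_type, debug=False):
    """Pick the policy; device_type=None models a failed device-type lookup."""
    # Debug logs always use the default directory and the 10 MB threshold.
    if debug or device_type != "Atlas 500 A2":
        return DEFAULT_POLICY
    return ATLAS_500_A2_POLICY


def archives_to_delete(archives, max_archives=10):
    """Given (name, rotation_time) pairs, return the names to drop, oldest first."""
    ordered = sorted(archives, key=lambda item: item[1])
    excess = len(ordered) - max_archives
    return [name for name, _ in ordered[:max(excess, 0)]]
```

Sorting by rotation time rather than by the XX index keeps the sketch independent of how the tool assigns indices, which the text above does not specify beyond "increments from 1".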

Log Backup

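When the device type is Atlas 500 A2 AI edge station, the toolbox log files in the original log path are cleared after the driver restarts. The driver, however, saves them in the compressed file reboot_back_up_XX.tar.gz under "/home/log/kbox_last_logs/". Decompress this file to view the log files from before the restart.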
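
A minimal sketch for inspecting these backups with Python's standard `tarfile` module follows. The helper names `list_reboot_backups` and `read_backup_file` are hypothetical, and the internal layout of the archives is not documented here, so the code only lists and reads whatever regular files it finds.

```python
import tarfile
from pathlib import Path


def list_reboot_backups(backup_dir="/home/log/kbox_last_logs"):
    """Map each reboot_back_up_*.tar.gz archive to the regular files it contains."""
    contents = {}
    for archive in sorted(Path(backup_dir).glob("reboot_back_up_*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            contents[archive.name] = [m.name for m in tar.getmembers() if m.isfile()]
    return contents


def read_backup_file(archive_path, member_name):
    """Return the text of one file inside an archive without unpacking it to disk."""
    with tarfile.open(archive_path, "r:gz") as tar:
        extracted = tar.extractfile(member_name)
        if extracted is None:
            raise ValueError(f"{member_name} is not a regular file")
        return extracted.read().decode("utf-8", errors="replace")
```

Reading members in place avoids writing archive contents into the live log directory; extracting to a scratch directory works equally well when the files need to be browsed with other tools.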

Saving Results to Disk

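When you run the BandWidth bandwidth diagnosis or the Aiflops compute-power diagnosis with JSON as the output format, the results are also written to disk, showing the bandwidth or compute-power value of each Device. The result file is stored in the following paths:

  • root user: /var/log/ascend_check/result.txt
  • Non-root user: ~/var/log/ascend_check/result.txt

For example, each of the following commands generates the result file: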

ascend-dmi -dg -i bandwidth -fmt json
ascend-dmi -dg -i aiflops -fmt json
ascend-dmi -dg -fmt json
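  • On the Atlas 200T A2 Box16 heterogeneous subrack in virtual machine scenarios, the BandWidth diagnosis does not run the P2P test between the two 8p groups, because of the special nature of the data transmission channel.
  • On Atlas A2 training series products and Atlas 800I A2 inference products, running the bandwidth and compute-power diagnoses produces output like the following: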
    {
        "device_0": {
            "aiflops": "287.95",
            "d2d bandwidth": "743.41",
            "d2d write bandwidth": "740.86",
            "d2h bandwidth": "28.07",
            "h2d bandwidth": "25.12",
            "p2p bidirectional bandwidth": "X",
            "p2p bidirectional write bandwidth": "X",
            "p2p unidirectional bandwidth": "X",
            "p2p unidirectional write bandwidth": "X"
        }
    }
  • On Atlas A3 training series products, running the bandwidth diagnosis produces output like the following:
    {
        "device_all": {
            "d2h bandwidth": "356.64",
            "h2d bandwidth": "297.58"
        },
        "device_0": {
            "d2d bandwidth": "1516.66",
            "d2d write bandwidth": "1484.89",
            "p2p bidirectional bandwidth": "X, 366.68, 270.14, 269.81, 270.05, 269.96, 269.74, 269.78, 270.09, 270.03, 269.88, 269.82, 269.93, 269.97, 269.80, 269.80",
            "p2p bidirectional write bandwidth": "X, 343.01, 250.02, 247.81, 248.13, 245.36, 246.59, 247.84, 246.04, 246.09, 248.19, 246.14, 245.33, 246.54, 248.46, 246.87",
            "p2p unidirectional bandwidth": "X, 202.95, 164.73, 164.72, 164.78, 164.74, 164.73, 164.72, 164.77, 164.74, 164.75, 164.71, 164.75, 164.77, 164.75, 164.72",
            "p2p unidirectional write bandwidth": "X, 191.71, 137.20, 137.41, 137.26, 137.21, 137.46, 137.49, 136.72, 137.18, 137.45, 137.40, 137.00, 137.11, 137.46, 137.34"
        },
        "device_1": {
            "d2d bandwidth": "1528.46",
            "d2d write bandwidth": "1470.62",
            "p2p bidirectional bandwidth": "368.80, X, 269.88, 269.87, 269.99, 270.09, 269.81, 269.74, 270.06, 270.02, 269.96, 270.03, 270.13, 269.98, 269.94, 269.92",
            "p2p bidirectional write bandwidth": "340.45, X, 246.08, 247.17, 246.87, 245.46, 245.34, 248.06, 244.03, 243.79, 247.23, 243.96, 243.78, 246.07, 247.84, 247.25",
            "p2p unidirectional bandwidth": "202.96, X, 164.73, 164.74, 164.74, 164.76, 164.73, 164.74, 164.77, 164.77, 164.73, 164.77, 164.76, 164.78, 164.76, 164.73",
            "p2p unidirectional write bandwidth": "191.68, X, 137.08, 137.58, 137.36, 136.93, 137.32, 137.66, 136.75, 136.97, 137.38, 137.24, 136.73, 137.26, 137.54, 137.35"
        }
    }
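Table 1 Description of displayed results

Parameter        Description
Numeric value    Bandwidth or compute-power value of the specific Device. Bandwidth diagnosis results are in GB/s and compute-power diagnosis results are in TFLOPS.
X/NA             Displaying this value is not supported.
FAIL             The execution failed.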
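
The JSON output shown above stores every value as a string, and P2P entries may hold a comma-separated list. The following is a minimal sketch for turning that text into numbers. The function names `parse_value` and `parse_results` are hypothetical. It assumes the input is JSON text with the same layout as the examples above; whether result.txt uses exactly this layout is an assumption. Markers such as X, NA, and FAIL are kept as strings so that they remain distinguishable.

```python
import json


def parse_value(text):
    """Convert one reported value to float; keep markers (X, NA, FAIL) as strings."""
    token = text.strip()
    try:
        return float(token)
    except ValueError:
        return token


def parse_results(json_text):
    """Return {device: {metric: value}} from diagnosis output in JSON form.

    A metric with a single entry becomes a scalar; a comma-separated
    entry (as in the P2P rows) becomes a list in the original order.
    """
    parsed = {}
    for device, metrics in json.loads(json_text).items():
        parsed[device] = {}
        for name, raw in metrics.items():
            values = [parse_value(part) for part in raw.split(",")]
            parsed[device][name] = values[0] if len(values) == 1 else values
    return parsed
```

For instance, `parse_results(text)["device_0"]["d2d bandwidth"]` yields a float in GB/s. In the P2P lists, the position of the X marker appears to match the device's own index, so reading each list position as the peer device index is a plausible but unconfirmed interpretation.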