昇腾社区首页
中文
注册

如何通过Device日志查看资源占用过高的信息

有些业务程序运行异常的情况可能由于Device的内存被耗尽、CPU占用率过高等问题引起,因此在定位问题时,我们可以通过日志中的一些关键信息排查这部分问题。

  1. 在Host侧服务器上,通过msnpureport工具导出Device日志。
    msnpureport工具命令示例如下,其中/usr/local/Ascend是驱动包的默认安装路径,请根据实际情况替换。
    /usr/local/Ascend/driver/tools/msnpureport -f

    Device侧系统进程产生的运行日志默认在/var/log/npu/report/*/slog/device-os-id/run/device-os/目录下,其中,*表示时间戳,device-os-id中的id表示Device ID

  2. 在Device日志中,通过SYSMONITOR模块的日志查看资源占用情况。
    • 查看内存占用率

      memory usage alarm:出现了内存占用率超过阈值上限(90%)的告警。

      memory usage stat:一个周期(一小时)结束时,如果周期内有告警事件,打印该周期的统计结果。

      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:16:[INFO] SYSMONITOR(2150,log-daemon):1970-01-01-08:00:09.728.115 [sys_monitor_frame.c:60][tid:2168]>>> system resource monitor start, period: 10000ms
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:151:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:03.421.505 [sys_memory_monitor.c:233][tid:2168]>>> memory usage alarm: 93.7%
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:165:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.581.257 [sys_memory_monitor.c:225][tid:2168]>>> PID VSZ %VSZ COMMAND
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:166:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.422 [sys_memory_monitor.c:225][tid:2168]>>> 2112 950m 2.1 /usr/bin/mdc/base-plat/aosservice/iammgr
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:167:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.519 [sys_memory_monitor.c:225][tid:2168]>>> 2096 969m 2.1 /var/slogd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:168:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.529 [sys_memory_monitor.c:225][tid:2168]>>> 2142 734m 1.6 /var/resource_mgr
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:169:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.537 [sys_memory_monitor.c:225][tid:2168]>>> 2088 726m 1.6 /usr/bin/mdc/base-plat/process-manager/process-manager
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:170:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.544 [sys_memory_monitor.c:225][tid:2168]>>> 2150 622m 1.4 /var/log-daemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:171:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.550 [sys_memory_monitor.c:225][tid:2168]>>> 2241 522m 1.1 /var/tsdaemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:172:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.558 [sys_memory_monitor.c:225][tid:2168]>>> 2195 510m 1.1 /usr/bin/mdc/base-plat/process-manager/proc_launcher
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:173:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.564 [sys_memory_monitor.c:225][tid:2168]>>> 2155 486m 1.0 /var/dmp_daemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:174:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.570 [sys_memory_monitor.c:225][tid:2168]>>> 2222 284m 0.6 /var/hdcd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:175:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.576 [sys_memory_monitor.c:225][tid:2168]>>> 2211 284m 0.6 /var/hdcd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:176:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-17:17:33.503.564 [sys_memory_monitor.c:251][tid:2168]>>> memory usage stat: minUsage= 2.3%, maxUsage=93.7%, avgUsage= 2.6%, alarmNum=1, resumeNum=1, duration=10000ms
    • 查看CPU占用率

      cpu usage alarm:出现了cpu占用率超过阈值上限(90%)的告警。

      cpu usage stat:一个周期(一小时)结束时,如果周期内有告警事件,打印该周期的统计结果。

      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:16:[INFO] SYSMONITOR(2150,log-daemon):1970-01-01-08:00:09.728.115 [sys_monitor_frame.c:60][tid:2168]>>> system resource monitor start, period: 10000ms
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:151:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:03.421.505 [sys_cpu_monitor.c:176][tid:2168]>>> cpu usage alarm: 95.2%
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:165:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.581.257 [sys_cpu_monitor.c:168][tid:2168]>>> PID CPU %CPU COMMAND
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:166:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.422 [sys_cpu_monitor.c:168][tid:2168]>>> 2112 0 12.1 /usr/bin/mdc/base-plat/aosservice/iammgr
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:167:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.519 [sys_cpu_monitor.c:168][tid:2168]>>> 2096 0 2.1 /var/slogd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:168:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.529 [sys_cpu_monitor.c:168][tid:2168]>>> 2142 0 1.6 /var/resource_mgr
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:169:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.537 [sys_cpu_monitor.c:168][tid:2168]>>> 2088 0 1.6 /usr/bin/mdc/base-plat/process-manager/process-manager
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:170:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.544 [sys_cpu_monitor.c:168][tid:2168]>>> 2150 0 1.4 /var/log-daemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:171:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.550 [sys_cpu_monitor.c:168][tid:2168]>>> 2241 0 1.1 /var/tsdaemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:172:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.558 [sys_cpu_monitor.c:168][tid:2168]>>> 2195 0 1.1 /usr/bin/mdc/base-plat/process-manager/proc_launcher
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:173:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.564 [sys_cpu_monitor.c:168][tid:2168]>>> 2155 0 1.0 /var/dmp_daemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:174:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.570 [sys_cpu_monitor.c:168][tid:2168]>>> 2222 0 0.6 /var/hdcd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:175:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.576 [sys_cpu_monitor.c:168][tid:2168]>>> 2211 0 0.6 /var/hdcd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:176:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-17:17:33.503.564 [sys_cpu_monitor.c:184][tid:2168]>>> cpu usage stat: minUsage= 1.8%, maxUsage=95.2%, avgUsage= 3.6%, alarmNum=1, resumeNum=1, duration=10000ms