How Do I View Information About High Resource Usage in Device Logs?

Service program exceptions can be caused by resource problems such as exhausted device memory, high CPU usage, or the number of file handles or processes reaching the upper limit. You can locate these problems using key information in the logs.

  1. On the host server, run the msnpureport tool in a directory on which you have the read, write, and execute permissions (for example, /var/log/npu/report) to export device logs.
    The following is an example command. /usr/local/Ascend is the default installation path of the driver package; replace it with the actual path.
    /usr/local/Ascend/driver/tools/msnpureport -f

    By default, run logs generated by system processes on the device are stored in /var/log/npu/report/*/slog/dev-os-id/run/device-os/, where * indicates the timestamp and id in dev-os-id indicates the device ID.
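
After the export, the SYSMONITOR entries can be collected from the run logs with a short script. The following is a minimal Python sketch and is not part of the msnpureport tool. The function name collect_sysmonitor_lines and the glob pattern are illustrative assumptions based on the default path described above, with * standing in for the timestamp and device directories.

```python
from pathlib import Path


def collect_sysmonitor_lines(report_root):
    """Return (file, line_number, text) for every SYSMONITOR line found.

    Assumes the layout <report_root>/<timestamp>/slog/<device dir>/run/device-os/*.log.
    Adjust the pattern if your export directory is organized differently.
    """
    pattern = "*/slog/*/run/device-os/*.log"
    hits = []
    for log_file in sorted(Path(report_root).glob(pattern)):
        # Device logs may contain bytes that are not valid UTF-8, so replace them.
        with open(log_file, encoding="utf-8", errors="replace") as handle:
            for number, text in enumerate(handle, start=1):
                if "SYSMONITOR(" in text:
                    hits.append((str(log_file), number, text.rstrip("\n")))
    return hits
```

For example, collect_sysmonitor_lines("/var/log/npu/report") would scan every export under the example directory used in this step.
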

  2. In the device logs, check the resource usage (memory usage, CPU usage, number of file handles, and number of zombie processes) in the entries logged by the SYSMONITOR module.
    • Check the memory usage.

      memory usage alarm: An alarm generated when the memory usage exceeds the upper threshold (90%).

      memory usage stat: Statistics printed at the end of each statistics period (one hour) if an alarm occurred during that period.

      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:16:[INFO] SYSMONITOR(2150,log-daemon):1970-01-01-08:00:09.728.115 [sys_monitor_frame.c:60][tid:2168]>>> system resource monitor start, period: 10000ms
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:151:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:03.421.505 [sys_memory_monitor.c:233][tid:2168]>>> memory usage alarm: 93.7%
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:165:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.581.257 [sys_memory_monitor.c:225][tid:2168]>>> PID VSZ %VSZ COMMAND
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:166:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.422 [sys_memory_monitor.c:225][tid:2168]>>> 2112 950m 2.1 /usr/bin/mdc/base-plat/aosservice/iammgr
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:167:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.519 [sys_memory_monitor.c:225][tid:2168]>>> 2096 969m 2.1 /var/slogd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:168:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.529 [sys_memory_monitor.c:225][tid:2168]>>> 2142 734m 1.6 /var/resource_mgr
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:169:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.537 [sys_memory_monitor.c:225][tid:2168]>>> 2088 726m 1.6 /usr/bin/mdc/base-plat/process-manager/process-manager
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:170:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.544 [sys_memory_monitor.c:225][tid:2168]>>> 2150 622m 1.4 /var/log-daemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:171:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.550 [sys_memory_monitor.c:225][tid:2168]>>> 2241 522m 1.1 /var/tsdaemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:172:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.558 [sys_memory_monitor.c:225][tid:2168]>>> 2195 510m 1.1 /usr/bin/mdc/base-plat/process-manager/proc_launcher
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:173:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.564 [sys_memory_monitor.c:225][tid:2168]>>> 2155 486m 1.0 /var/dmp_daemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:174:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.570 [sys_memory_monitor.c:225][tid:2168]>>> 2222 284m 0.6 /var/hdcd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:175:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.576 [sys_memory_monitor.c:225][tid:2168]>>> 2211 284m 0.6 /var/hdcd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:176:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-17:17:33.503.564 [sys_memory_monitor.c:251][tid:2168]>>> memory usage stat: minUsage= 2.3%, maxUsage=93.7%, avgUsage= 2.6%, alarmNum=1, resumeNum=1, duration=10000ms
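
The memory alarm entries above can be extracted programmatically. The following Python sketch assumes the entry layout seen in the sample lines (level, module, process ID, process name, timestamp, source location, thread ID, message). The regular expressions and the function name memory_alarms are illustrative and not an official log format definition.

```python
import re

# Assumed layout of a SYSMONITOR entry, inferred from the sample lines above.
ENTRY = re.compile(
    r"\[(?P<level>[A-Z]+)\] SYSMONITOR\((?P<pid>\d+),(?P<proc>[^)]+)\):"
    r"(?P<time>\S+) \[(?P<src>[^\]:]+):(?P<line>\d+)\]\[tid:(?P<tid>\d+)\]>>> (?P<msg>.*)"
)
MEM_ALARM = re.compile(r"memory usage alarm: (?P<pct>\d+(?:\.\d+)?)%")


def memory_alarms(lines):
    """Yield (timestamp, usage_percent) for each memory usage alarm entry."""
    for text in lines:
        entry = ENTRY.search(text)
        if entry is None:
            continue  # not a SYSMONITOR entry in the assumed layout
        alarm = MEM_ALARM.search(entry.group("msg"))
        if alarm:
            yield entry.group("time"), float(alarm.group("pct"))
```

The timestamp of each alarm can then be compared with the time at which the service exception occurred.
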
    • Check the CPU usage.

      cpu usage alarm: An alarm generated when the CPU usage exceeds the upper threshold (90%).

      cpu usage stat: Statistics printed at the end of each statistics period (one hour) if an alarm occurred during that period.

      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:16:[INFO] SYSMONITOR(2150,log-daemon):1970-01-01-08:00:09.728.115 [sys_monitor_frame.c:60][tid:2168]>>> system resource monitor start, period: 10000ms
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:151:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:03.421.505 [sys_cpu_monitor.c:176][tid:2168]>>> cpu usage alarm: 95.2%
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:165:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.581.257 [sys_cpu_monitor.c:168][tid:2168]>>> PID CPU %CPU COMMAND
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:166:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.422 [sys_cpu_monitor.c:168][tid:2168]>>> 2112 0 12.1 /usr/bin/mdc/base-plat/aosservice/iammgr
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:167:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.519 [sys_cpu_monitor.c:168][tid:2168]>>> 2096 0 2.1 /var/slogd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:168:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.529 [sys_cpu_monitor.c:168][tid:2168]>>> 2142 0 1.6 /var/resource_mgr
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:169:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.537 [sys_cpu_monitor.c:168][tid:2168]>>> 2088 0 1.6 /usr/bin/mdc/base-plat/process-manager/process-manager
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:170:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.544 [sys_cpu_monitor.c:168][tid:2168]>>> 2150 0 1.4 /var/log-daemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:171:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.550 [sys_cpu_monitor.c:168][tid:2168]>>> 2241 0 1.1 /var/tsdaemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:172:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.558 [sys_cpu_monitor.c:168][tid:2168]>>> 2195 0 1.1 /usr/bin/mdc/base-plat/process-manager/proc_launcher
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:173:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.564 [sys_cpu_monitor.c:168][tid:2168]>>> 2155 0 1.0 /var/dmp_daemon
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:174:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.570 [sys_cpu_monitor.c:168][tid:2168]>>> 2222 0 0.6 /var/hdcd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:175:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.576 [sys_cpu_monitor.c:168][tid:2168]>>> 2211 0 0.6 /var/hdcd
      slog/dev-os-0/run/device-os/device-os_19700101080010358.log:176:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-17:17:33.503.564 [sys_cpu_monitor.c:184][tid:2168]>>> cpu usage stat: minUsage= 1.8%, maxUsage=95.2%, avgUsage= 3.6%, alarmNum=1, resumeNum=1, duration=10000ms
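
To see which processes contributed most to a CPU alarm, the per-process rows that follow the "PID CPU %CPU COMMAND" header can be ranked by the %CPU column. The Python sketch below assumes that these rows are logged from sys_cpu_monitor.c, as in the sample above. The function name top_cpu_processes and the limit parameter are hypothetical.

```python
import re

# Matches the message part of a process row, for example ">>> 2112 0 12.1 /usr/bin/...".
# The column order (PID, CPU, %CPU, COMMAND) is taken from the header row in the log.
ROW = re.compile(
    r">>> (?P<pid>\d+)\s+(?P<cpu>\d+)\s+(?P<pct>\d+(?:\.\d+)?)\s+(?P<cmd>\S.*)$"
)


def top_cpu_processes(lines, limit=3):
    """Return up to `limit` (percent_cpu, pid, command) tuples, highest %CPU first."""
    rows = []
    for text in lines:
        # Assumption: CPU rows are identified by their source file name.
        if "sys_cpu_monitor.c" not in text:
            continue
        match = ROW.search(text.rstrip())
        if match:
            rows.append(
                (float(match.group("pct")), int(match.group("pid")), match.group("cmd"))
            )
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]
```

Header, alarm, and stat entries do not match the row pattern, so only the per-process rows are ranked.
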
    • Check the number of file handles.

      fd usage alarm: An alarm generated when the file handle usage exceeds the upper threshold (90%).

      fd usage stat: Statistics printed at the end of each statistics period (one hour) if an alarm occurred during that period.

      [INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:17:43.820.774 [sys_fd_monitor.c:156][tid:2187]>>> fd total: 4454775, used: 4445866, fd usage alarm: 99.8%
      [INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:17:43.820.865 [sys_fd_monitor.c:126][tid:2187]>>> sysmonitor fd process top three
      [INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:17:43.933.536 [sys_fd_monitor.c:147][tid:2187]>>> pid: 12170 , fd used: 4445532
      [INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:17:43.933.641 [sys_fd_monitor.c:147][tid:2187]>>> pid: 2271 , fd used: 28
      [INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:17:43.933.650 [sys_fd_monitor.c:147][tid:2187]>>> pid: 2116 , fd used: 26
      [INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:27:43.856.787 [sys_fd_monitor.c:162][tid:2187]>>> fd usage stat: minUsage= 0.0%, maxUsage= 99.8%, avgUsage= 87.3%, alarmNum=1, resumeNum=1, duration=80000ms
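
In the sample above, the reported usage follows from the counts: 4445866 / 4454775 is approximately 99.8%, and PID 12170 alone holds almost all of the used handles. The Python sketch below recomputes these figures from an fd alarm entry and the "top three" entries that follow it. The patterns are inferred from the sample lines, and the function name summarize_fd_alarm is hypothetical.

```python
import re

FD_ALARM = re.compile(
    r"fd total: (?P<total>\d+), used: (?P<used>\d+), fd usage alarm: \d+(?:\.\d+)?%"
)
FD_PROC = re.compile(r"pid: (?P<pid>\d+)\s*, fd used: (?P<used>\d+)")


def summarize_fd_alarm(lines):
    """Return the total/used counts and each listed process's share of used handles."""
    summary = {"total": None, "used": None, "usage_pct": None, "processes": []}
    for text in lines:
        alarm = FD_ALARM.search(text)
        if alarm:
            summary["total"] = int(alarm.group("total"))
            summary["used"] = int(alarm.group("used"))
            # Recompute the percentage from the raw counts.
            summary["usage_pct"] = 100.0 * summary["used"] / summary["total"]
            continue
        proc = FD_PROC.search(text)
        if proc and summary["used"]:
            used = int(proc.group("used"))
            share = 100.0 * used / summary["used"]
            summary["processes"].append((int(proc.group("pid")), used, share))
    return summary
```

A process whose share is close to 100% is the natural first candidate to check for a file handle leak.
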
    • Check the number of zombie processes.

      zombie process count alarm: An alarm generated when the number of zombie processes exceeds the upper threshold (5).

      zombie process count stat: Statistics printed at the end of each statistics period (one hour) if an alarm occurred during that period.

      [INFO] SYSMONITOR(2166,log-daemon):2024-05-15-00:01:24.019.292 [sys_zp_monitor.c:119][tid:2195]>>> zombie process count alarm: 98
      [INFO] SYSMONITOR(2166,log-daemon):2024-05-15-00:01:24.019.356 [sys_zp_monitor.c:134][tid:2195]>>> zombie process count stat: minCount=0, maxCount=98, avgCount=98, alarmNum=1, resumeNum=0, duration=120000ms
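
The four stat entries shown in this step share a key=value layout, so one parser can cover all of them. The Python sketch below is illustrative only. The function name parse_stat and the derived field unresolved are hypothetical, and treating alarmNum minus resumeNum as the number of alarms that have not cleared is an assumption that this document does not confirm.

```python
import re

STAT = re.compile(
    r">>> (?P<kind>memory usage|cpu usage|fd usage|zombie process count) stat: (?P<body>.*)$"
)
# Matches pairs such as "minUsage= 2.3%", "alarmNum=1" and "duration=10000ms".
PAIR = re.compile(r"(\w+)=\s*(\d+(?:\.\d+)?)")


def parse_stat(text):
    """Parse one stat entry into a dict, or return None if the line is not a stat entry."""
    match = STAT.search(text.rstrip())
    if match is None:
        return None
    values = {key: float(number) for key, number in PAIR.findall(match.group("body"))}
    values["kind"] = match.group("kind")
    # Assumption: resumeNum counts alarms that were cleared again.
    values["unresolved"] = int(values.get("alarmNum", 0) - values.get("resumeNum", 0))
    return values
```

Applied to the zombie process sample above, unresolved is 1 (alarmNum=1, resumeNum=0), whereas the memory, CPU, and file handle samples each give 0.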