How Do I View Information About High Resource Usage in Device Logs?
Some service program exceptions can be caused by device memory exhaustion, high CPU usage, or the number of file handles or processes reaching its upper limit. You can locate these problems from key information in the logs.
- On the host server, run the msnpureport tool to export device logs. Run it from a directory on which you have read, write, and execute permissions (for example, /var/log/npu/report). The following is an example command. /usr/local/Ascend is the default installation path of the driver package; replace it with the actual path.
/usr/local/Ascend/driver/tools/msnpureport -f
By default, run logs generated by system processes on the device are stored in /var/log/npu/report/*/slog/device-os-id/run/device-os/, where * indicates the timestamp, and id in device-os-id indicates the device ID.
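Once the logs are exported, the SYSMONITOR entries can be collected with a short script. The following is a minimal sketch, not part of the msnpureport tool: the function name iter_sysmonitor_lines is hypothetical, and it assumes the export layout described above (run logs in a directory named device-os, with a .log extension as in the sample file names).

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_sysmonitor_lines(report_root: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, text) for each SYSMONITOR line under report_root."""
    for log_file in sorted(Path(report_root).rglob("*.log")):
        # Assumption: only logs stored in a "device-os" directory are run logs
        # of system processes, as in the default path described above.
        if log_file.parent.name != "device-os":
            continue
        with log_file.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if "SYSMONITOR" in line:
                    yield log_file, number, line.rstrip("\n")
```

For example, iterating over iter_sysmonitor_lines("/var/log/npu/report") and printing each tuple as file:line:text gives output in the same shape as the log excerpts in this section.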
- In the exported device logs, check the SYSMONITOR module entries for resource usage: memory usage, CPU usage, number of file handles, and number of zombie processes.
- Check the memory usage.
memory usage alarm: An alarm is generated when the memory usage exceeds the upper threshold (90%).
memory usage stat: At the end of each statistics period (one hour), the statistics for that period are printed if an alarm occurred during the period.
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:16:[INFO] SYSMONITOR(2150,log-daemon):1970-01-01-08:00:09.728.115 [sys_monitor_frame.c:60][tid:2168]>>> system resource monitor start, period: 10000ms
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:151:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:03.421.505 [sys_memory_monitor.c:233][tid:2168]>>> memory usage alarm: 93.7%
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:165:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.581.257 [sys_memory_monitor.c:225][tid:2168]>>> PID VSZ %VSZ COMMAND
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:166:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.422 [sys_memory_monitor.c:225][tid:2168]>>> 2112 950m 2.1 /usr/bin/mdc/base-plat/aosservice/iammgr
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:167:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.519 [sys_memory_monitor.c:225][tid:2168]>>> 2096 969m 2.1 /var/slogd
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:168:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.529 [sys_memory_monitor.c:225][tid:2168]>>> 2142 734m 1.6 /var/resource_mgr
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:169:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.537 [sys_memory_monitor.c:225][tid:2168]>>> 2088 726m 1.6 /usr/bin/mdc/base-plat/process-manager/process-manager
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:170:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.544 [sys_memory_monitor.c:225][tid:2168]>>> 2150 622m 1.4 /var/log-daemon
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:171:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.550 [sys_memory_monitor.c:225][tid:2168]>>> 2241 522m 1.1 /var/tsdaemon
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:172:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.558 [sys_memory_monitor.c:225][tid:2168]>>> 2195 510m 1.1 /usr/bin/mdc/base-plat/process-manager/proc_launcher
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:173:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.564 [sys_memory_monitor.c:225][tid:2168]>>> 2155 486m 1.0 /var/dmp_daemon
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:174:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.570 [sys_memory_monitor.c:225][tid:2168]>>> 2222 284m 0.6 /var/hdcd
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:175:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.576 [sys_memory_monitor.c:225][tid:2168]>>> 2211 284m 0.6 /var/hdcd
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:176:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-17:17:33.503.564 [sys_memory_monitor.c:251][tid:2168]>>> memory usage stat: minUsage= 2.3%, maxUsage=93.7%, avgUsage= 2.6%, alarmNum=1, resumeNum=1, duration=10000ms
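The per-process rows that follow a memory usage alarm can be ranked to see which processes hold the most memory. The following is a rough sketch that assumes the row layout seen in the sample above (PID, VSZ, %VSZ, COMMAND after the >>> marker) and that memory rows can be recognized by the sys_memory_monitor.c source tag. The names ProcRow and parse_memory_rows are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple

# Row layout taken from the sample log: ">>> PID VSZ %VSZ COMMAND".
ROW = re.compile(r">>>\s*(\d+)\s+(\S+)\s+(\d+(?:\.\d+)?)\s+(\S+)\s*$")


class ProcRow(NamedTuple):
    pid: int
    vsz: str            # kept as text (for example "950m"); unit format is not documented here
    vsz_percent: float
    command: str


def parse_memory_rows(lines: Iterable[str]) -> List[ProcRow]:
    """Return per-process memory rows, largest %VSZ first."""
    rows = []
    for line in lines:
        # Assumption: memory rows carry the sys_memory_monitor.c tag, as in the sample.
        if "sys_memory_monitor.c" not in line:
            continue
        match = ROW.search(line)
        if match:  # header, alarm, and stat lines do not start with a PID, so they are skipped
            pid, vsz, percent, command = match.groups()
            rows.append(ProcRow(int(pid), vsz, float(percent), command))
    return sorted(rows, key=lambda row: row.vsz_percent, reverse=True)
```

In the sample above, no single process accounts for more than 2.1% in the %VSZ column, so the listing alone does not identify the cause of the 93.7% alarm; check it against the other resource logs.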
- Check the CPU usage.
cpu usage alarm: An alarm is generated when the CPU usage exceeds the upper threshold (90%).
cpu usage stat: At the end of each statistics period (one hour), the statistics for that period are printed if an alarm occurred during the period.
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:16:[INFO] SYSMONITOR(2150,log-daemon):1970-01-01-08:00:09.728.115 [sys_monitor_frame.c:60][tid:2168]>>> system resource monitor start, period: 10000ms
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:151:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:03.421.505 [sys_cpu_monitor.c:176][tid:2168]>>> cpu usage alarm: 95.2%
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:165:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.581.257 [sys_cpu_monitor.c:168][tid:2168]>>> PID CPU %CPU COMMAND
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:166:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.422 [sys_cpu_monitor.c:168][tid:2168]>>> 2112 0 12.1 /usr/bin/mdc/base-plat/aosservice/iammgr
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:167:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.519 [sys_cpu_monitor.c:168][tid:2168]>>> 2096 0 2.1 /var/slogd
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:168:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.529 [sys_cpu_monitor.c:168][tid:2168]>>> 2142 0 1.6 /var/resource_mgr
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:169:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.537 [sys_cpu_monitor.c:168][tid:2168]>>> 2088 0 1.6 /usr/bin/mdc/base-plat/process-manager/process-manager
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:170:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.544 [sys_cpu_monitor.c:168][tid:2168]>>> 2150 0 1.4 /var/log-daemon
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:171:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.550 [sys_cpu_monitor.c:168][tid:2168]>>> 2241 0 1.1 /var/tsdaemon
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:172:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.558 [sys_cpu_monitor.c:168][tid:2168]>>> 2195 0 1.1 /usr/bin/mdc/base-plat/process-manager/proc_launcher
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:173:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.564 [sys_cpu_monitor.c:168][tid:2168]>>> 2155 0 1.0 /var/dmp_daemon
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:174:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.570 [sys_cpu_monitor.c:168][tid:2168]>>> 2222 0 0.6 /var/hdcd
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:175:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-16:22:07.741.576 [sys_cpu_monitor.c:168][tid:2168]>>> 2211 0 0.6 /var/hdcd
slog/dev-os-0/run/device-os/device-os_19700101080010358.log:176:[INFO] SYSMONITOR(2150,log-daemon):2024-04-26-17:17:33.503.564 [sys_cpu_monitor.c:184][tid:2168]>>> cpu usage stat: minUsage= 1.8%, maxUsage=95.2%, avgUsage= 3.6%, alarmNum=1, resumeNum=1, duration=10000ms
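The stat lines share a key=value layout, so one small parser can turn any of them into numbers for comparison across periods. The following is a sketch under that assumption; parse_stat_line is a hypothetical name, and units (% and ms) are dropped rather than interpreted.

```python
import re
from typing import Dict, Optional, Union

# Matches the text after ">>>" up to " stat:", then the remaining key=value fields.
STAT = re.compile(r">>>\s*(?P<name>.+?) stat:\s*(?P<fields>.*)$")
# A field looks like "maxUsage=95.2%" or "minUsage= 1.8%" (optional space after "=").
FIELD = re.compile(r"(\w+)=\s*(\d+(?:\.\d+)?)")


def parse_stat_line(line: str) -> Optional[Dict[str, Union[str, float]]]:
    """Parse a SYSMONITOR '... stat:' line; return None for any other line."""
    match = STAT.search(line)
    if match is None:
        return None
    result: Dict[str, Union[str, float]] = {"resource": match.group("name")}
    for key, value in FIELD.findall(match.group("fields")):
        result[key] = float(value)  # unit suffixes such as "%" and "ms" are ignored
    return result
```

Applied to the cpu usage stat line above, this yields the resource name "cpu usage" together with minUsage, maxUsage, avgUsage, alarmNum, resumeNum, and duration as floats.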
- Check the number of file handles.
fd usage alarm: An alarm is generated when the file handle usage exceeds the upper threshold (90%).
fd usage stat: At the end of each statistics period (one hour), the statistics for that period are printed if an alarm occurred during the period.
[INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:17:43.820.774 [sys_fd_monitor.c:156][tid:2187]>>> fd total: 4454775, used: 4445866, fd usage alarm: 99.8%
[INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:17:43.820.865 [sys_fd_monitor.c:126][tid:2187]>>> sysmonitor fd process top three
[INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:17:43.933.536 [sys_fd_monitor.c:147][tid:2187]>>> pid: 12170 , fd used: 4445532
[INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:17:43.933.641 [sys_fd_monitor.c:147][tid:2187]>>> pid: 2271 , fd used: 28
[INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:17:43.933.650 [sys_fd_monitor.c:147][tid:2187]>>> pid: 2116 , fd used: 26
[INFO] SYSMONITOR(2170,log-daemon):2024-05-15-03:27:43.856.787 [sys_fd_monitor.c:162][tid:2187]>>> fd usage stat: minUsage= 0.0%, maxUsage= 99.8%, avgUsage= 87.3%, alarmNum=1, resumeNum=1, duration=80000ms
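The fd alarm and its top-three listing can be combined to identify the process that holds most of the file handles. The following is a minimal sketch based only on the line formats in the sample above; summarize_fd_alarm is a hypothetical name.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Formats taken from the sample log lines.
TOTAL = re.compile(r"fd total:\s*(\d+),\s*used:\s*(\d+),\s*fd usage alarm:")
HOLDER = re.compile(r"pid:\s*(\d+)\s*,\s*fd used:\s*(\d+)")


def summarize_fd_alarm(lines: Iterable[str]) -> Dict[str, object]:
    """Collect the fd totals and the per-process fd counts, largest holder first."""
    total: Optional[int] = None
    used: Optional[int] = None
    holders: List[Tuple[int, int]] = []  # (pid, fd used)
    for line in lines:
        match = TOTAL.search(line)
        if match:
            total, used = int(match.group(1)), int(match.group(2))
            continue
        match = HOLDER.search(line)
        if match:
            holders.append((int(match.group(1)), int(match.group(2))))
    holders.sort(key=lambda item: item[1], reverse=True)
    return {"total": total, "used": used, "holders": holders}
```

In the sample, used divided by total (4445866 / 4454775) reproduces the reported 99.8%, and process 12170 alone holds 4445532 of the used handles, which makes it the one to inspect for a handle leak.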
- Check the number of zombie processes.
zombie process count alarm: An alarm is generated when the number of zombie processes exceeds the upper threshold (5).
zombie process count stat: At the end of each statistics period (one hour), the statistics for that period are printed if an alarm occurred during the period.
[INFO] SYSMONITOR(2166,log-daemon):2024-05-15-00:01:24.019.292 [sys_zp_monitor.c:119][tid:2195]>>> zombie process count alarm: 98
[INFO] SYSMONITOR(2166,log-daemon):2024-05-15-00:01:24.019.356 [sys_zp_monitor.c:134][tid:2195]>>> zombie process count stat: minCount=0, maxCount=98, avgCount=98, alarmNum=1, resumeNum=0, duration=120000ms
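To correlate a zombie process alarm with service exceptions, it helps to extract when the alarm fired and how far the count exceeded the threshold. The following is a sketch only: parse_zombie_alarm is a hypothetical name, the threshold of 5 is the value stated above, and reading the two trailing three-digit groups of the timestamp as milliseconds and microseconds is an assumption about the log format.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

ZOMBIE_THRESHOLD = 5  # upper threshold stated in this section

ALARM = re.compile(
    r"(\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2})\.(\d{3})\.(\d{3})"
    r".*zombie process count alarm:\s*(\d+)"
)


def parse_zombie_alarm(line: str) -> Optional[Tuple[datetime, int, int]]:
    """Return (timestamp, count, count above threshold), or None if not an alarm line."""
    match = ALARM.search(line)
    if match is None:
        return None
    seconds, millis, micros, count = match.groups()
    # Assumption: ".019.292" means 19 ms and 292 us, so it is joined as microseconds.
    stamp = datetime.strptime(f"{seconds}.{millis}{micros}", "%Y-%m-%d-%H:%M:%S.%f")
    return stamp, int(count), int(count) - ZOMBIE_THRESHOLD
```

The returned timestamp can then be compared with the time of the service exception to judge whether the two are related.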