Host Logs
File Description
| Log Name | Naming Constraint | Storage Path |
|---|---|---|
| Host OS logs | messages-*? | Collection directory |
| Host kernel message logs | dmesg | Collection directory |
| Host system monitoring logs | sysmonitor.log | Collection directory |
| Host kernel message logs when the system breaks down | vmcore-dmesg.txt | Collection directory |
Collecting Host OS Logs
- Go to the log storage directory and open the messages file.
cd /var/log && vi messages
- Using the start and end times of the training or inference job, locate the relevant log entries. Then create a messages file in the collection directory and copy those entries into it.
cd Collection_directory/ && vi messages
Paste the log entries into the file. An example is as follows:
Aug 13 03:19:24 # A training or inference job starts.
...
Aug 13 04:14:39 # A training or inference job ends.
Enter :wq to save the file and exit vi. The actual log content varies from system to system.
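Instead of copying entries by hand in vi, the job's time window can be extracted with a script. The following is a minimal sketch, not part of the documented procedure: `extract_messages_window` is a hypothetical helper name, and it assumes the traditional syslog timestamp prefix shown above (`Mon DD HH:MM:SS`) and a window that does not cross a year boundary, because these lines carry no year.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the documented procedure):
# print the lines of a syslog-style messages file whose timestamp lies
# within [START, END], inclusive. START/END use the same prefix format
# as the log lines, for example "Aug 13 03:19:24".
# Assumption: the window does not cross a year boundary (no year in lines).
extract_messages_window() {
  local file="$1" start="$2" end="$3"
  awk -v start="$start" -v end="$end" '
    # Turn "Mon DD HH:MM:SS" into a fixed-width key such as "081303:19:24",
    # so that plain string comparison orders timestamps correctly.
    function key(mon, day, hms,    m) {
      m = index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)
      return sprintf("%02d%02d%s", (m + 2) / 3, day, hms)
    }
    BEGIN {
      split(start, s, " ")
      split(end, e, " ")
      lo = key(s[1], s[2], s[3])
      hi = key(e[1], e[2], e[3])
    }
    # Compute the key of every line, then print those inside the window.
    { k = key($1, $2, $3) }
    k >= lo && k <= hi
  ' "$file"
}
```

For example, `extract_messages_window /var/log/messages "Aug 13 03:19:24" "Aug 13 04:14:39" > Collection_directory/messages`. If the window spans rotated files matching `messages-*`, run the helper on each of them.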
Collecting Host Kernel Message Logs
Run the following command to collect the most recent dmesg output (at most the last 100,000 lines) and save it to the collection directory.
dmesg -T | tail -n 100000 > Collection_directory/dmesg
A log example is as follows:
[Fri Aug 30 16:42:49 2024] Log printing
…
[Fri Aug 30 16:42:49 2024] Log printing
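The documented command can be wrapped in a small function so that the destination path is always quoted; an unquoted path such as `Collection directory` would be split at the space and break the redirect. This is a hedged sketch: `collect_dmesg` and its parameters are hypothetical, and on systems where the `kernel.dmesg_restrict` sysctl is set to 1, `dmesg` must be run as root.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented command. DEST_DIR and
# MAX_LINES are illustrative parameters; MAX_LINES defaults to the
# documented 100,000-line cap.
collect_dmesg() {
  local dest="$1" max="${2:-100000}"
  mkdir -p "$dest" || return 1
  # -T prints human-readable timestamps; tail keeps only the newest lines.
  dmesg -T | tail -n "$max" > "$dest/dmesg"
}
```

For example, `collect_dmesg Collection_directory` produces the same file as the documented command.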
Collecting Host System Monitoring Logs
Copy the sysmonitor.log file to the collection directory.
cp -r /var/log/sysmonitor.log Collection_directory/
A log example is as follows:
2024-08-27T19:54:48.242959+00:00|info|sysmonitor[xxxxx]: Log printing
…
2024-08-27T19:54:48.343493+00:00|info|sysmonitor[xxxxx]: Log printing
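The step above copies the whole sysmonitor.log. If only the job window is of interest, a minimal sketch like the following can filter it. It assumes the `timestamp|level|source: message` layout shown in the example and a single UTC offset for all entries, so ISO 8601 timestamps compare correctly as strings; `filter_sysmonitor_window` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print sysmonitor.log entries whose timestamp,
# truncated to whole seconds, lies within [START, END], inclusive.
# START/END use the form YYYY-MM-DDTHH:MM:SS.
# Assumption: fields are separated by "|" and all entries share one UTC offset.
filter_sysmonitor_window() {
  local file="$1" start="$2" end="$3"
  awk -F'|' -v start="$start" -v end="$end" '
    { t = substr($1, 1, 19) }   # e.g. 2024-08-27T19:54:48
    t >= start && t <= end
  ' "$file"
}
```

For example, `filter_sysmonitor_window /var/log/sysmonitor.log 2024-08-27T19:00:00 2024-08-27T20:00:00 > Collection_directory/sysmonitor.log`.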
Collecting Host Kernel Message Logs when the System Breaks Down
These logs are the kernel message files saved when the system crashes (typically written by kdump under /var/crash/). Perform the following step to capture them:
Copy the crash directory, which contains the vmcore-dmesg.txt file, to the collection directory.
cp -r /var/crash/ Collection_directory/
A log example is as follows:
[292.448078] Log printing
…
[292.448080] Log printing
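Copying all of /var/crash/ also copies the vmcore dump files, which can be large. The following minimal sketch copies only the vmcore-dmesg.txt files. It assumes kdump stores each crash in its own subdirectory of the crash directory, and it prefixes each copy with that subdirectory name so that logs from several crashes do not overwrite one another. `collect_vmcore_dmesg` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every vmcore-dmesg.txt found under CRASH_DIR
# into DEST_DIR, renamed to <crash-subdirectory>-vmcore-dmesg.txt.
collect_vmcore_dmesg() {
  local crash_dir="$1" dest="$2"
  mkdir -p "$dest" || return 1
  find "$crash_dir" -type f -name 'vmcore-dmesg.txt' |
    while IFS= read -r f; do
      # Name of the per-crash directory that holds this file.
      sub="$(basename "$(dirname "$f")")"
      cp "$f" "$dest/${sub}-vmcore-dmesg.txt"
    done
}
```

For example, `collect_vmcore_dmesg /var/crash Collection_directory`.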
Collecting dmidecode Logs on the Host
The dmidecode output on the host contains hardware information read from the system's DMI (SMBIOS) table.
Run the following command as the root user to collect it:
dmidecode > Collection_directory/dmidecode.txt