What Do I Do If Log Flush Fails?

Ascend EP Standard Form

If device-side system logs fail to be exported through the msnpureport tool:

Perform the following steps:

  1. Run the msnpureport command again and use the information it prints to locate and resolve the fault.

    If the fault persists, go to 2.

  2. Run the following command on the host to check whether the disk that holds the log storage path (the path from which the msnpureport tool is run) is full:
    df -h

    If the disk is full, free up space and export the logs again. If the fault persists, collect the logs and contact technical support.
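
The disk check in step 2 can be narrowed to the file system that actually holds the log storage path. The following is a minimal sketch, not part of the msnpureport tool: the function names usage_percent and is_full are hypothetical, the 95% threshold is an assumed value, and the script assumes that the path is passed as the first argument (defaulting to the current directory).

```shell
#!/bin/sh
# usage_percent DIR: print the Use% value (digits only) of DIR's file system.
usage_percent() {
    # -P forces POSIX single-line output so the field positions are stable.
    df -P "$1" 2>/dev/null | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# is_full PERCENT [THRESHOLD]: return 0 when PERCENT >= THRESHOLD.
# The default threshold of 95 is an assumption, not a documented limit.
is_full() {
    [ "$1" -ge "${2:-95}" ] 2>/dev/null
}

# Assumption: msnpureport is run from the directory given as $1 (default: .).
log_dir="${1:-.}"
pct=$(usage_percent "$log_dir")
if is_full "$pct"; then
    echo "File system for $log_dir is ${pct}% used; free up space and retry."
else
    echo "File system for $log_dir is ${pct:-unknown}% used."
fi
```

Passing a directory to df limits the report to one file system, which is easier to read than the full df -h listing on hosts with many mount points.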

If the flush of app logs fails:

Check whether the app logs (including the plog logs in the $HOME/ascend/log/ directory and the device-id logs) are flushed properly. If not, perform the following steps:

  1. Check whether the host-side system log file contains any error messages.
    AArch64 architecture:
    cat /var/log/messages

    x86_64 architecture:
    cat /var/log/syslog

    If the fault persists, go to 2.

  2. Run the following command on the host to check whether the space of the log flush directory ($HOME/ascend/log/) is sufficient:
    df -h

    If the fault persists, go to 3.

  3. On the host, use the msnpureport tool to export device-side system logs and check whether there are any error logs.

    For details, see msnpureport Instructions.

  4. If only the device-id logs fail to be flushed, view the error information in the plog logs to locate the faulty process.
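
Steps 1 and 4 above both come down to searching log files for error entries. The following is a minimal sketch under stated assumptions: the function names pick_syslog, count_errors, and list_error_files are hypothetical, and the search strings ("error" in any case for the system log, "ERROR" for plog files) are assumed markers rather than a documented format.

```shell
#!/bin/sh
# pick_syslog FILE...: print the first readable file among the candidates.
pick_syslog() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# count_errors FILE: print the number of lines that contain "error"
# in any letter case.
count_errors() {
    grep -ci 'error' "$1"
}

# list_error_files DIR: print the files under DIR that contain "ERROR".
# Assumption: error entries in plog files carry the string "ERROR".
list_error_files() {
    grep -rl 'ERROR' "$1" 2>/dev/null
}

# Step 1: use whichever system log file exists on this host.
if log=$(pick_syslog /var/log/messages /var/log/syslog); then
    echo "$log: $(count_errors "$log") line(s) mention an error"
else
    echo "No readable system log found; root privileges may be required."
fi

# Step 4: list plog files that contain error entries.
plog_dir="${HOME:-}/ascend/log"
if [ -d "$plog_dir" ]; then
    list_error_files "$plog_dir" || echo "No ERROR entries under $plog_dir"
fi
```

Probing for the file instead of branching on the architecture keeps the sketch usable on hosts where the system log location differs from the two cases listed in step 1.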

Ascend RC Form

If the flush of app logs fails:

Perform the following steps:

  1. Check whether the dynamic library on which the app process depends is correct:
    ldd xxx

    Replace xxx with the path to the executable binary of the app process.

  2. Check whether the space of the log flush directory /var/log/npu/slog is sufficient:
    df -h
  3. Check whether the slogd process exists.
    ps -elf | grep slogd

    If information about the slogd process is output, the slogd process exists.

    For the Atlas 200/300/500 Inference Product, if the slogd process does not exist, perform the following steps to restart the slogd process:

    1. Switch to a common user (for example, the HwHiAiUser user):
      su HwHiAiUser
    2. Manually start the slogd process.
      nohup /var/slogd > /dev/null 2>&1 &
    3. Check whether the slogd process has been started.
      ps -elf | grep slogd
  4. If the fault persists and application logs are still not flushed to drives, rectify the fault by referring to Restarting Log Processes.
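
For step 1 of the list above, the ldd output can be long, and the lines of interest are those where a library could not be resolved. The following is a minimal sketch, assuming that ldd marks such libraries with "=> not found" (the glibc behavior). The function name missing_libs is hypothetical, and the binary path is assumed to be passed as the first argument.

```shell
#!/bin/sh
# missing_libs: read ldd output on stdin and print the names of the
# libraries that could not be resolved.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Assumption: the path to the app binary is passed as $1.
bin="${1:-}"
if [ -n "$bin" ] && [ -x "$bin" ]; then
    missing=$(ldd "$bin" 2>/dev/null | missing_libs)
    if [ -n "$missing" ]; then
        echo "Unresolved libraries for $bin:"
        echo "$missing"
    else
        echo "No unresolved libraries reported for $bin."
    fi
fi
```

An empty result only shows that every library was found. It does not show that the resolved path is the intended one, so the full ldd output is still worth reviewing when the wrong library version is suspected.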

If the flush of system logs fails:

Perform the following steps:

  1. Check whether the slogd and sklogd processes exist:
    ps -elf | grep log

    If information about both processes is output, the log processes exist.

    For the Atlas 200/300/500 Inference Product, if either process does not exist, perform the following steps to start the missing log process:

    1. Switch to a common user (for example, the HwHiAiUser user):
      su HwHiAiUser
    2. Run the following commands to manually start related log processes:
      • Start the slogd process.
        nohup /var/slogd > /dev/null 2>&1 &
      • Start the sklogd process.
        nohup /var/sklogd > /dev/null 2>&1 &
    3. Check whether the log processes are started.
      ps -elf | grep log
  2. Check whether the space of the log flush directory /var/log/npu/slog is sufficient:
    df -h
  3. If the fault persists and system logs are still not flushed to drives, rectify the fault by referring to Restarting Log Processes.
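
The process check in step 1 uses grep log, which also matches the grep command itself and any unrelated process whose command line contains "log". The following is a minimal sketch of a stricter check that matches process names exactly. The function names has_process and is_running are hypothetical, and the sketch assumes that the ps command on the device supports the POSIX -e and -o options.

```shell
#!/bin/sh
# has_process NAME: read command names (one per line) on stdin and
# return 0 if NAME matches a whole line.
has_process() {
    grep -qx "$1"
}

# is_running NAME: return 0 if a process named exactly NAME exists.
is_running() {
    # "comm=" prints only the command name, without a header line.
    ps -e -o comm= 2>/dev/null | has_process "$1"
}

for proc in slogd sklogd; do
    if is_running "$proc"; then
        echo "$proc: running"
    else
        echo "$proc: not running"
    fi
done
```

Because the match covers the whole name, a running sklogd process is not mistaken for slogd.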