Process Is Killed and Terminated Abnormally During the AICORE Stress Test

Symptom

A process is killed and terminated abnormally during the AICORE stress test.

[root@l*****]# ascend-dmi --dg -i aicore -s -q
Stress test is being performed,please wait.
Killed

Cause Analysis

The memory used by the process exceeds the upper limit, so the kernel's OOM killer terminates the process.

Check the system logs. Depending on the distribution, /var/log/messages or /var/log/syslog contains oom-killer records. These records show the cgroup of the killed process and its memory limit.
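The log check can be sketched as a small shell helper. This is a minimal sketch, not part of the ascend-dmi tooling: the function name find_oom_records is hypothetical, and the search patterns assume the typical kernel wording ("invoked oom-killer", "Out of memory", "Killed process"), which may differ between kernel versions.

```shell
#!/bin/sh
# Print OOM killer records from each readable log file given as an argument.
# find_oom_records is a hypothetical helper name used for illustration only.
find_oom_records() {
    for log in "$@"; do
        # The log location varies by distribution, so skip missing files.
        [ -r "$log" ] || continue
        # Assumed kernel wording; matching is case-insensitive.
        grep -i -E 'oom-killer|out of memory|killed process' "$log" || true
    done
}

# Search both common log locations; reading them may require root.
find_oom_records /var/log/messages /var/log/syslog
```

If neither file exists on the system, the kernel ring buffer (dmesg) may still hold the same records.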

Solution

  1. Reserve sufficient memory before running commands to prevent abnormal process interruption. You can run the free -h command to query the available memory of the current system.

  2. If the available memory is sufficient, you are advised to raise the memory limit of the cgroup in which the process runs. You can run the following command to query the current limit when cgroup v1 is used. If cgroup v2 is used, the file is memory.max, located directly under /sys/fs/cgroup/${cgroup where the process is running}/.
    cat /sys/fs/cgroup/memory/${cgroup where the process is running}/memory.limit_in_bytes
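The two checks in the steps above can be combined into one script. This is a rough sketch under stated assumptions: the helper names mem_available_mib and cgroup_limit_file are hypothetical, cgroups are assumed to be mounted at /sys/fs/cgroup, and the stress test is assumed to be started from the same shell, so that it inherits the shell's cgroup.

```shell
#!/bin/sh
# Print MemAvailable in MiB from a meminfo-format file (default: /proc/meminfo).
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Print the path of the memory limit file for the cgroup described by a
# /proc/<pid>/cgroup-format file (default: the cgroup of the current shell).
cgroup_limit_file() {
    cgroup_file="${1:-/proc/self/cgroup}"
    [ -r "$cgroup_file" ] || return 1
    # cgroup v1: a line such as "9:memory:/mygroup" names the memory controller.
    v1_path=$(awk -F: '{ n = split($2, ctrl, ",")
        for (i = 1; i <= n; i++) if (ctrl[i] == "memory") { print $3; exit } }' "$cgroup_file")
    if [ -n "$v1_path" ]; then
        echo "/sys/fs/cgroup/memory${v1_path}/memory.limit_in_bytes"
        return 0
    fi
    # cgroup v2: a single line of the form "0::/mygroup".
    v2_path=$(awk -F: '$1 == "0" && $2 == "" { print $3; exit }' "$cgroup_file")
    echo "/sys/fs/cgroup${v2_path}/memory.max"
}

# Report the available memory, then the limit that applies to this shell.
if avail=$(mem_available_mib) && [ -n "$avail" ]; then
    echo "Available memory: ${avail} MiB"
fi
if limit_file=$(cgroup_limit_file) && [ -r "$limit_file" ]; then
    echo "Limit ($limit_file): $(cat "$limit_file")"
else
    echo "No readable cgroup memory limit file found"
fi
```

To inspect a process that is already running, pass /proc/<pid>/cgroup as the argument to cgroup_limit_file. A value of max in memory.max, or a very large number in memory.limit_in_bytes, means that no limit is set. How the limit should be raised depends on what manages the cgroup (for example, systemd or a container runtime), so change it through that manager where possible.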