"A Software or Internal Error Occurs. Contact Huawei technical support." Is Displayed

Symptom

Ascend DMI fails to perform AICORE diagnosis, and the message "A software or internal error occurs. Contact Huawei technical support" is displayed.

Figure 1 Error message

Possible Causes

  • The driver and firmware versions are earlier than 23.0.0.
  • The MCU version is earlier than 23.0.0.
  • The kernel package is not installed. (For CANN 8.5.0 and later versions, the ops package is not installed.)

Solution

  1. Run the ascend-dmi -c command to check whether the driver and firmware versions are 23.0.0 or later. If either version is earlier than 23.0.0, upgrade the driver and firmware.

  2. Run the npu-smi upgrade -b mcu -i $i command to check whether the MCU version is 23.0.0 or later, where $i is the device ID. If the version is earlier than 23.0.0, upgrade the MCU.
    [root@****]# npu-smi upgrade -b mcu -i 0
          Version                    :23.3.6

  3. Run the find /usr/local/Ascend/ -name kernel command to check whether the kernel package is installed. If it is not installed, install it. (For CANN 8.5.0 and later versions, install the ops package.)

    Generally, it is installed in the tbe directory. The following is an example of the kernel path:

    /usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/kernel/
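The checks in steps 2 and 3 can be scripted. The following is a minimal sketch, not an official tool. It assumes that the MCU query prints a line in the "Version : x.y.z" form shown above, that GNU sort with the -V option is available, and that device ID 0 is the device to check. The function names (extract_version, version_ge, has_kernel_dir) are hypothetical helpers introduced for illustration, and the -type d filter is an addition to the find command in step 3.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical and not part of Ascend DMI or npu-smi.

MIN_VERSION="23.0.0"

# Extract the value from a line such as "Version : 23.3.6".
# Assumes the output format shown in step 2.
extract_version() {
    printf '%s\n' "$1" \
        | sed -n 's/^[[:space:]]*Version[[:space:]]*:[[:space:]]*\([0-9][0-9.]*\).*$/\1/p' \
        | head -n 1
}

# Return 0 if version $1 is equal to or later than version $2.
# Relies on GNU sort -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Return 0 if a directory named "kernel" exists under the given root.
has_kernel_dir() {
    [ -n "$(find "$1" -type d -name kernel 2>/dev/null | head -n 1)" ]
}

# Run the checks only on a host where npu-smi is installed.
if command -v npu-smi >/dev/null 2>&1; then
    mcu_version="$(extract_version "$(npu-smi upgrade -b mcu -i 0)")"
    if [ -z "$mcu_version" ]; then
        echo "Could not parse the MCU version; check the command output manually."
    elif version_ge "$mcu_version" "$MIN_VERSION"; then
        echo "MCU version $mcu_version is $MIN_VERSION or later."
    else
        echo "MCU version $mcu_version is earlier than $MIN_VERSION."
    fi

    if has_kernel_dir /usr/local/Ascend/; then
        echo "A kernel directory was found under /usr/local/Ascend/."
    else
        echo "No kernel directory was found under /usr/local/Ascend/."
    fi
fi
```

The version comparison uses a version sort instead of a plain string comparison so that multi-digit components are ordered numerically. The output format of ascend-dmi -c is not shown in this section, so the sketch does not attempt to parse it.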