Restrictions

  • Ensure that Ascend DMI is independently deployed on each server. If it is deployed in a shared directory for multiple servers to access or is used during Ascend software installation, upgrade, or uninstallation, unexpected problems may occur, such as execution failure or performance not meeting expectations.
  • Ascend DMI does not support concurrent multi-process execution on a single device for performance benchmarking. It is recommended that the P2P stress test be performed only when the NPU is idle and no service is running. Concurrent processes may produce inaccurate metrics or unpredictable execution failures. For example, CCAE or NPU Exporter calls the DCMI to monitor the environment status, which occupies communication link bandwidth and leads to inaccurate results. In addition, Ascend DMI cannot be used while the CANN performance analysis tool is collecting performance data.
  • For security purposes, ensure that the fuser, lscpu, hccn_tool, systemd-detect-virt, dmidecode, hostname, mv, cp, and gzip commands are secure and available before running Ascend DMI. It is recommended that fuser be pre-installed in the environment so that Ascend DMI can monitor NPU processes.
  • For the container scenario, if commands fail to be executed for stream test, one-click diagnosis, one-click on-chip memory stress test, AICORE diagnosis, AICORE stress testing, AICPU stress test, high-risk address stress test of on-chip memory, PRBS stream diagnosis, eye diagram diagnosis, eye diagram test, and NPU environment restoration, rectify the fault by referring to AICORE Command Fails to Be Executed in the Container, with Error Code 46 in Plog.
  • Ascend DMI does not support non-standard products such as mixed cards. Running it on such products may cause function failures or poor performance.
  • To prevent frequent log output from affecting the test result, ensure that log levels on the host and device are set to ERROR before the test. The method is as follows:
    1. Check the log level.
      • Host: Run the echo $ASCEND_GLOBAL_LOG_LEVEL command. If the query result is empty or invalid, the log level is ERROR (corresponding to the value 3). Then run the export ASCEND_GLOBAL_EVENT_ENABLE=0 command to disable plog EVENT logging.
      • Device: Check the global log level, module log level, and whether EVENT-level logging is enabled by referring to msnpureport Tool Usage.
    2. If the log level is not ERROR, set the log level on the host and device by referring to "Setting the Log Level" in CANN Log Reference.
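
The command availability and host log level checks above can be scripted before a run. The following is a minimal sketch, not part of Ascend DMI. The function names are hypothetical. It assumes that the required commands are resolved through PATH, that the host log level is held in ASCEND_GLOBAL_LOG_LEVEL, and that its valid values are the integers 0 to 4, with 3 meaning ERROR.

```shell
#!/bin/sh
# Hypothetical preflight helpers; not part of Ascend DMI.

# Print every command from the argument list that cannot be found in PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Succeed when the given host log level means ERROR.
# Assumption: 0-4 are the valid values and 3 is ERROR. An empty or
# invalid value falls back to ERROR, as described in the text.
host_log_is_error() {
    case "${1-}" in
        0|1|2|4) return 1 ;;
        *) return 0 ;;
    esac
}

missing=$(missing_commands fuser lscpu hccn_tool systemd-detect-virt \
    dmidecode hostname mv cp gzip)
# $missing is left unquoted on purpose: one output line per missing command.
[ -z "$missing" ] || printf 'missing command: %s\n' $missing

if host_log_is_error "${ASCEND_GLOBAL_LOG_LEVEL-}"; then
    echo "host log level is ERROR"
else
    echo "host log level is not ERROR; set it before the test"
fi

# Disable plog EVENT output for this shell session.
export ASCEND_GLOBAL_EVENT_ENABLE=0
```

The sketch only reports whether a command is present. It does not verify that the command is secure, and it does not cover the device-side log level.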

Function Constraints

The constraints on each test are listed below. Each function name is followed by its restrictions.

Software-hardware version compatibility test

  • Only users in the root group can query the firmware version during the software-hardware version compatibility test.
  • For the Atlas 800 training server (model 9000), Atlas 800 training server (model 9010), and Atlas 200/300/500 inference product, the MCU version cannot be queried during the software-hardware version compatibility test.

Bandwidth test

  • The D2D bandwidth test result is the total amount of data read and written divided by the elapsed time. As in actual training or inference, the D2D transfer benefits from internal optimizations such as caching and prefetching. Therefore, the calculated bandwidth may exceed the nominal bandwidth.
  • On the Atlas 300I Duo inference card, the H2D and D2H bandwidth of the secondary chip is lower than that of the primary chip. This is expected and results from the transfer characteristics.
  • On the Atlas 200I SoC A1 core board, H2D and D2H test data is copied directly by the CPU because of the architecture. The test result therefore differs from that of other product models, which is expected.
  • For the Atlas 200T A2 Box16/Atlas 200I A2 Box16 heterogeneous subrack in virtual machine scenarios, P2P testing between two 8-NPU groups shows low bandwidth. This is expected and results from the characteristics of the data transmission channels.
  • For the Atlas A3 training product and Atlas A3 inference product in physical machine, container, or virtual machine scenarios, the measured bandwidth is low in the first test after a restart. This is expected.
  • For the best bandwidth test result, perform the test on a bare metal server. The result also depends on how well hardware resources are reused during data transfer. For example, when the number of copies (-et) or the size of the transmitted data (-s) is small, reuse is low and the measured bandwidth may be poor.
  • To ensure the accuracy of the bandwidth test, you are advised not to perform the test while a training or inference service is being deployed or is running. CCAE or NPU Exporter calls the DCMI to monitor the environment status, which occupies some bandwidth and makes the bandwidth test result inaccurate.
  • During the bandwidth test and SuperPoD P2P bandwidth test, bidirectional task streams are delivered at different times, which naturally causes slight fluctuations in the bidirectional P2P bandwidth results.
  • The calculation method of the P2P bandwidth test depends on the NPU working mode. If the P2P bandwidth test result differs greatly from the nominal bandwidth, you are advised to use the SMP mode. To set it, log in to the iBMC and run the following command. The value 1 indicates the SMP mode, and the value 0 indicates the AMP mode.

    ipmcset -d npuworkmode -v 1

  • If the amount of data to be transferred and the number of data copies specified by -s and -et are small, the optimal performance may not be obtained. To obtain the optimal performance, you are advised to set -s to 512 MB and -et to a value greater than 10.
  • It is recommended that the bandwidth test be performed on a physical machine. The test result on a container or virtual machine may be inaccurate.
  • On the Atlas A3 training product and Atlas A3 inference product, non-root users cannot perform the P2P bandwidth test with driver 25.0.RC1 or earlier versions.
  • In container scenarios, non-root users cannot perform the P2P and D2D bandwidth tests on the Atlas A3 training product and Atlas A3 inference product with driver versions earlier than 25.3.RC1.
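
The relationship between -s, -et, and the reported bandwidth can be sketched as follows. This is an illustration only. The helper name is hypothetical, the input values are made up, and the sketch assumes that the total data moved equals the transfer size multiplied by the number of copies and that the result is reported in decimal GB/s. The internal accounting of Ascend DMI may differ.

```shell
#!/bin/sh
# Hypothetical helper; not part of Ascend DMI.

# bandwidth_gbps SIZE_BYTES COPIES ELAPSED_SECONDS
# Prints (size * copies) / elapsed time, in decimal GB/s.
bandwidth_gbps() {
    awk -v size="$1" -v copies="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", size * copies / secs / 1e9 }'
}

# Made-up run: 512 MB per copy (536870912 bytes), 20 copies, 0.25 s elapsed.
bandwidth_gbps 536870912 20 0.25   # prints 42.95
```

With a small size or few copies, fixed per-transfer overhead takes up a larger share of the elapsed time, which is one way to read the recommendation to set -s to 512 MB and -et to a value greater than 10.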

SuperPoD P2P bandwidth test

  • To ensure that the test can be performed properly, do not perform the test on three or more SuperPoDs at the same time. Otherwise, the P2P bandwidth test may fail.
  • During the bandwidth test and SuperPoD P2P bandwidth test, bidirectional task streams are delivered at different times, which naturally causes slight fluctuations in the bidirectional P2P bandwidth results.
  • Currently, only IPv4 addresses are supported for the SuperPoD P2P bandwidth test.
  • The values of the -s and -et parameters specified on the two nodes where the SuperPoD P2P bandwidth test is performed must be the same.
  • If the amount of data to be transferred and the number of data copies specified by -s and -et are small, the optimal performance may not be obtained. To obtain the optimal performance, you are advised to set -s to 512 MB and -et to a value greater than 10.
  • In the container scenario, you are advised to use a shared directory and specify the --pid=host parameter when starting the container so that process IDs in the container match those on the physical machine.
  • Before performing the P2P bandwidth test, ensure that the NPU type of the two SuperPoDs to be tested is consistent. Assume that the SuperPoD P2P bandwidth test is conducted on device A and device B. If the NPU type of device A is Atlas 900 A3 SuperPoD, the NPU type of device B must also be Atlas 900 A3 SuperPoD.
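
Two of the constraints above, IPv4-only addressing and identical -s and -et values on both nodes, can be checked before starting the test. The following is a minimal sketch under stated assumptions: the function names, the peer address, and the parameter values are hypothetical, and the check only validates dotted-quad syntax, not reachability.

```shell
#!/bin/sh
# Hypothetical helpers; not part of Ascend DMI.

# Succeed only for dotted-quad IPv4 addresses with each octet in 0-255.
is_ipv4() {
    printf '%s\n' "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++)
              if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }'
}

# same_transfer_params LOCAL_S LOCAL_ET PEER_S PEER_ET
# Succeed when both nodes use identical -s and -et values.
same_transfer_params() {
    [ "$1" = "$3" ] && [ "$2" = "$4" ]
}

peer="192.168.1.10"   # made-up peer node address
if is_ipv4 "$peer"; then
    echo "$peer: IPv4 address, accepted"
else
    echo "$peer: not an IPv4 address, rejected"
fi

if same_transfer_params 536870912 20 536870912 20; then
    echo "-s and -et match on both nodes"
fi
```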

Power consumption test

  • Power consumption is closely related to the MCU. Before conducting a power consumption test, ensure the MCU is upgraded to a compatible version. Otherwise, AICORE usage might not reach 100%, and voltage adjustments might be abnormal.
  • Power consumption data is sampled periodically, with an interval between two samples. There is therefore a low probability that the actual power consumption is not captured, in which case the displayed value is low.
  • The power consumption test takes time to start and to exit. As a result, the first and last command outputs may deviate from the others, which is normal.
  • Because each operation takes time, the number of printing times in the power consumption test may differ from the theoretical value. For example, if the running time of the power consumption tool is 60s and the interval for refreshing the print information is 5s, the theoretical number of printing times is 12. However, the actual number of printing times may be less than 12.
  • INT8 mode uses integer operations and requires fewer computing units, so it consumes less power than FP16 floating-point operations. In addition, a performance threshold is preset for hardware devices. In FP16 mode, the threshold is easily reached, after which protection mechanisms such as active frequency reduction and voltage adjustment are triggered to prevent the power consumption of the hardware device from exceeding the threshold for a long time. In INT8 mode, the power consumption is relatively low. If the threshold is not reached, the power consumption of different hardware devices may differ significantly.
  • Atlas A3 training product and Atlas A3 inference product contain multiple NPUs, so their test results display the power consumption of the entire NPU instead of the device-level power consumption. In addition, an error is reported only when the power consumption of two devices is abnormal.
  • The power consumption test of the Atlas 300I Duo inference card includes AICORE/AICPU/chipMemory operator execution and DVPP. To perform a stress test on the DVPP module, you need to generate an image file in the /var/log/ascend_check directory (~/var/log/ascend_check for a non-root user). At least 1 GB memory space must be reserved in advance.
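
The theoretical number of printing times mentioned above is the running time divided by the refresh interval. The helper below is a hypothetical illustration of that arithmetic, not an Ascend DMI function.

```shell
#!/bin/sh
# theoretical_prints RUN_SECONDS INTERVAL_SECONDS
# Integer division gives the upper bound on the number of printed samples.
theoretical_prints() {
    echo $(( $1 / $2 ))
}

theoretical_prints 60 5   # prints 12
```

Because each collection and print takes time of its own, the observed count can fall below this figure.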

Eye diagram test

  • If a CDR loopback has been configured on an NPU, cancel the loopback before performing the eye diagram test.
  • If the eye diagram test is performed on the Atlas A3 training product and Atlas A3 inference product, non-root users of driver versions earlier than 25.3.RC1 are not supported in container scenarios.
  • For the Atlas 900 A2 PoD cluster basic unit, Atlas 900 A2 PoDc cluster basic unit, or Atlas 800T A2 training server, non-root users of driver versions earlier than 25.3.RC1 are not allowed to query the HCCS signal quality in container scenarios.
  • The HCCS signal quality on the CPU is diagnosed on the Atlas A3 training product and Atlas A3 inference product equipped with Kunpeng 920 processors. Only the root user of a physical machine is supported.
  • The full eye diagram test is supported only by the root user of the A200T A3 Box8 SuperPoD Server when the communication port type is PCIe.

Stream test (one-click/custom traffic generation)

  • This test is a high-risk operation and may cause the network port to go down. Therefore, you need to perform this test separately.
  • NPU and CDR adaptation is automatically disabled during the PRBS stream test. However, running the stream test command multiple times can repeatedly enable and disable adaptation. If the adaptation process is not completed in time, the number of bit errors can occasionally reach 67092480, which is normal.
  • If a CDR loopback is used for traffic generation, cancel the CDR loopback after the traffic is generated.

Eye diagram diagnosis

  • For the Atlas A3 training product, Atlas A3 inference product, Atlas 900 A2 PoD cluster basic unit, Atlas 900 A2 PoDc cluster basic unit, Atlas 800T A2 training server, or A200I A2 Box heterogeneous component, non-root users of driver versions earlier than 25.3.RC1 are not supported in container scenarios.
  • The HCCS signal quality on the CPU is diagnosed on the Atlas A3 training product and Atlas A3 inference product equipped with Kunpeng 920 processors. Only the root user of a physical machine is supported.

Bandwidth diagnosis

  • For the Atlas 200T A2 Box16/Atlas 200I A2 Box16 heterogeneous subrack in the virtual machine scenario, due to the particularity of data transmission channels, the bandwidth test is not performed between two 8-NPU groups.
  • If the bandwidth diagnosis is performed on the Atlas A3 training product and Atlas A3 inference product, non-root users of driver versions earlier than 25.3.RC1 are not supported in container scenarios.

NIC diagnosis

  • Before enabling NIC diagnosis, ensure that the parameter plane networks of all devices in the environment are connected.

PRBS stream diagnosis

  • NPU and CDR adaptation is automatically disabled during the PRBS stream test. However, running the stream test command multiple times can repeatedly enable and disable adaptation. If the adaptation process is not completed in time, the number of bit errors can occasionally reach 67092480, which is normal.
  • If a CDR loopback is used for traffic generation, cancel the CDR loopback after the traffic is generated.

AICORE stress test

  • The AICORE stress test occupies about 20 GB to 40 GB of host memory. Before running the command, reserve sufficient memory to prevent the process from being interrupted.
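
Available host memory can be checked before starting the test. The following is a minimal sketch for Linux hosts, not part of Ascend DMI. The function name is hypothetical, the 40 threshold is taken from the upper end of the range above, and the sketch treats GB and GiB as interchangeable for this rough check.

```shell
#!/bin/sh
# Hypothetical helper; not part of Ascend DMI.

# Print MemAvailable in whole GiB from a meminfo-format file
# (default: /proc/meminfo, which reports the value in kB).
mem_available_gib() {
    awk '$1 == "MemAvailable:" { print int($2 / 1024 / 1024) }' \
        "${1:-/proc/meminfo}" 2>/dev/null
}

required_gib=40   # upper end of the range quoted above
avail=$(mem_available_gib)
if [ "${avail:-0}" -ge "$required_gib" ]; then
    echo "available host memory: ${avail} GiB, enough for the stress test"
else
    echo "available host memory: ${avail:-unknown} GiB, below ${required_gib} GiB"
fi
```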

AICORE diagnosis

  • After the AICORE diagnosis is complete, check whether the AIC and bus voltages are normal. If they are abnormal, run the ascend-dmi -r command to restore the NPU environment. For details, see NPU Environment Restoration.

Power consumption stress test

  • To ensure the correctness and accuracy of the test result, perform the power consumption stress test separately.
  • Power consumption is closely related to the MCU. Before conducting a power consumption test, ensure the MCU is upgraded to a compatible version. Otherwise, AICORE usage might not reach 100%, and voltage adjustments could be abnormal.
  • The power consumption stress test cannot be performed in environments with high temperatures or heat dissipation issues. Otherwise, hardware disconnection (that is, the NPU cannot be detected by the npu-smi info command) and other hardware faults may occur.
  • The power consumption stress test cannot be used to test the heat dissipation of the hardware at different temperatures. Otherwise, hardware disconnection (that is, the NPU cannot be detected by the npu-smi info command) and other hardware faults may occur.

One-click on-chip memory stress test

  • The processor may be reset during the stress test. You need to perform the stress test as the root user; otherwise, the reset will fail.
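
A script that wraps the stress test can confirm the user before starting. The following is a minimal sketch. The function name is hypothetical, and the check only compares the effective UID with 0.

```shell
#!/bin/sh
# Hypothetical helper; not part of Ascend DMI.

# Succeed when the given UID (default: the current user's UID) is 0.
is_root_uid() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

if is_root_uid; then
    echo "running as root: the processor reset can proceed"
else
    echo "not running as root: the processor reset would fail" >&2
fi
```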

P2P stress test

  • You are not advised to use this function when NPUs are isolated.
  • Atlas 800I A2 inference server (32 GB PCIe) does not support the P2P stress test.

NPU environment restoration

  • For the Atlas A2 training products and Atlas A2 inference products, this operation is supported in physical machine and container scenarios.

DSA stress test

  • This test is supported only by the Atlas A2 training products, Atlas A2 inference products, Atlas A3 training product, and Atlas A3 inference product.