Common Parameters

The following table describes the parameters that are common to Ascend DMI commands.

Level-1 Parameter

Level-2 Parameter

Mandatory

Description

Supported Function

[-dg, --dg, --diagnosis]

[-dg, --dg, --diagnosis]

Yes

Performs a fault diagnosis of the entire NPU.

When several level-2 parameters, such as -i and -r, are appended to ascend-dmi --dg, they can appear in any order. The order does not affect the command output.

Fault diagnosis:

  • One-click diagnosis
  • CANN-driver compatibility diagnosis
  • Chip diagnosis
  • Network health diagnosis
  • On-chip memory diagnosis
  • Driver health diagnosis
  • Eye diagram diagnosis
  • Bandwidth diagnosis
  • NIC diagnosis
  • PRBS stream diagnosis
  • Computing power diagnosis
  • AICORE stress test
  • AICORE diagnosis
  • AICPU stress test
  • Power consumption stress test
  • One-click on-chip memory stress test
  • On-chip memory stress test
  • On-chip memory high-risk address stress test
  • P2P stress test
  • DSA stress test

[-se, --scene, --se]

No

Specifies a diagnosis scenario. Currently, the following scenarios are supported:

  • healthCheck
  • performanceCheck
  • stressTest

-

[-i, --items]

No

Specifies the diagnosis check item.

  • You can specify one or more items among driver, cann, device, network, bandwidth, aiflops, hbm/chipMemory, signalQuality, and random. Separate the items with commas (,).
  • If this parameter is not specified, all check items except AICORE, PRBS, EDP, TDP, AICPU, and NIC are diagnosed by default.
  • If -i specifies aicpu, no other check item can be specified with it.

Fault diagnosis:

  • CANN-driver compatibility diagnosis
  • Chip diagnosis
  • Network health diagnosis
  • On-chip memory diagnosis
  • Driver health diagnosis
  • Eye diagram diagnosis
  • Bandwidth diagnosis
  • NIC diagnosis
  • PRBS stream diagnosis
  • Computing power diagnosis
  • AICORE stress test
  • AICORE diagnosis
  • AICPU stress test
  • Power consumption stress test
  • One-click on-chip memory stress test
  • On-chip memory stress test
  • On-chip memory high-risk address stress test
  • P2P stress test
  • DSA stress test
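
The -i rules in this row can be sketched as a small validation helper. This is only an illustrative sketch: the function name format_items is hypothetical and not part of Ascend DMI, and it assumes item names are compared exactly as written (lowercase aicpu). It does not check whether an item name is one that the tool accepts.

```python
def format_items(items):
    """Return the comma-separated value to pass to -i, or raise ValueError.

    Hypothetical helper for illustration; not part of Ascend DMI.
    """
    # Drop surrounding whitespace and ignore empty entries.
    cleaned = [item.strip() for item in items if item.strip()]
    if not cleaned:
        raise ValueError("at least one check item is required")
    # Rule from the table: aicpu cannot be combined with other items.
    if "aicpu" in cleaned and len(cleaned) > 1:
        raise ValueError("aicpu cannot be combined with other check items")
    # Items are separated with commas.
    return ",".join(cleaned)


print(format_items(["driver", "cann", "network"]))  # driver,cann,network
```

Validating the list before launching the command avoids starting a diagnosis that would be rejected because aicpu was mixed with other items.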

[-r, --result]

No

Specifies the path for saving fault diagnosis results and information collection results, for example, /test. The specified path must meet security requirements and cannot contain the wildcard (*).

  • If you specify a path, an ascend_check folder is created in that path to hold the result file. A path specified by the root user is created in the root directory, and a path specified by a non-root user is created in the $HOME directory. If no path is specified, the result file is saved in the default path: /var/log/ascend_check for the root user, or $HOME/var/log/ascend_check for a non-root user.
  • For security, you can set the permission on the ascend_check directory to 700 so that only its owner can access the saved results.

Fault diagnosis:

  • One-click diagnosis
  • CANN-driver compatibility diagnosis
  • Chip diagnosis
  • Network health diagnosis
  • On-chip memory diagnosis
  • Driver health diagnosis
  • Eye diagram diagnosis
  • Bandwidth diagnosis
  • NIC diagnosis
  • PRBS stream diagnosis
  • Computing power diagnosis
  • AICORE stress test
  • AICORE diagnosis
  • AICPU stress test
  • Power consumption stress test
  • One-click on-chip memory stress test
  • On-chip memory stress test
  • On-chip memory high-risk address stress test
  • P2P stress test
  • DSA stress test
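
The default result locations and the 700 permission described for -r can be sketched as follows. The helper names default_result_dir and restrict_result_dir are hypothetical and are not provided by Ascend DMI. The sketch covers only the case in which -r is omitted; how a user-specified path is placed under the root directory or $HOME is not modeled here.

```python
import os
from pathlib import PurePosixPath


def default_result_dir(is_root, home):
    """Return the default result directory used when -r is not given.

    Hypothetical helper; the two paths are taken from the table.
    """
    if is_root:
        return PurePosixPath("/var/log/ascend_check")
    return PurePosixPath(home) / "var" / "log" / "ascend_check"


def restrict_result_dir(path):
    """Set the directory mode to 700 so that only the owner has access."""
    os.chmod(path, 0o700)


print(default_result_dir(True, "/root"))         # /var/log/ascend_check
print(default_result_dir(False, "/home/alice"))  # /home/alice/var/log/ascend_check
```

On a real system, restrict_result_dir would be called on the ascend_check directory after the first diagnosis run has created it.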

[-s, --stress]

No

Performs a stress test.

This parameter is mandatory for the functions listed in the Supported Function column.

Fault diagnosis:

  • AICORE stress test
  • AICPU stress test
  • Power consumption stress test
  • One-click on-chip memory stress test
  • On-chip memory stress test
  • On-chip memory high-risk address stress test
  • P2P stress test

[-p, --path]

-

No

Specifies the installation path.

  • If you do not use the default installation path when installing the target package, this parameter must be set to the actual installation path.
  • The specified path must meet security requirements and cannot contain the wildcard (*).
  • If this parameter is not specified and the package is installed by the root user, the default path /usr/local/Ascend is used.
  • If the check items specified by [-i, --items] do not contain cann, leave this parameter unspecified.

Information query:

  • Software-hardware compatibility test

Fault diagnosis:

  • One-click diagnosis
  • CANN-driver compatibility diagnosis

[-d, --device]

-

No

Specifies the ID of the device to be diagnosed. The device ID is the logic ID of the Ascend NPU.

  • You can specify one or more device IDs. Separate device IDs with commas (,).
  • If no device ID is specified, the bandwidth test in H2D, D2H, or D2D mode returns the result of device 0 only. This applies to all products except Atlas A3 training products and Atlas A3 inference products. In other scenarios, the diagnosis results of all devices are returned.
  • In this document, input and output device IDs are logic IDs. To obtain the logic ID, run the npu-smi info -m command and check the Chip Logic ID value in the command output. The NPU ID is the physical chip ID.

Performance test:

  • Bandwidth test
  • SuperPoD P2P bandwidth test
  • Computing power test
  • Eye diagram test
  • Stream test (one-click/custom traffic generation)

Fault diagnosis:

  • Chip diagnosis
  • Network health diagnosis
  • On-chip memory diagnosis
  • Eye diagram diagnosis
  • Bandwidth diagnosis
  • NIC diagnosis
  • PRBS stream diagnosis
  • Computing power diagnosis
  • AICORE stress test
  • AICORE diagnosis
  • AICPU stress test
  • One-click on-chip memory stress test
  • On-chip memory stress test
  • On-chip memory high-risk address stress test
  • P2P stress test
  • DSA stress test

NPU environment restoration

[-fmt, --fmt, --format]

-

No

Specifies the output format. The value can be normal or json.

  • If this parameter is not specified, the default value normal is used.
  • If the json format is specified by [-fmt, --fmt, --format], the fault diagnosis result is saved in the ascend_check/environment_check_before.txt file. If the json format is not specified, the fault diagnosis result is not saved.
  • If a diagnosis item fails, the returned JSON result is as shown in "JSON Example Returned When Diagnosis Fails".

All

[-q, --quiet]

-

No

  • If this parameter is specified, no confirmation prompt is displayed and the operation is allowed by default. This parameter must be used together with one of the following items specified by -i: bandwidth, aiflops, hbm, aicore, prbs, tdp, edp, aicpu, nic, or random.
  • If this parameter is not specified, you need to enter Y or N (y or n) to confirm whether to perform the test.

Performance test:

  • Bandwidth test
  • SuperPoD P2P bandwidth test
  • Power consumption test
  • Computing power test
  • Stream test (one-click/custom traffic generation)

Fault diagnosis:

  • One-click diagnosis
  • Bandwidth diagnosis
  • NIC diagnosis
  • PRBS stream diagnosis
  • Computing power diagnosis
  • AICORE stress test
  • AICORE diagnosis
  • AICPU stress test
  • Power consumption stress test
  • One-click on-chip memory stress test
  • On-chip memory stress test
  • On-chip memory high-risk address stress test
  • P2P stress test
  • DSA stress test

NPU environment restoration

[-h, --help]

-

No

Displays the help information of a specified function of Ascend DMI.

All
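
As a rough illustration of how the parameters in the table combine, the following sketch assembles an argument list for ascend-dmi --dg. The function build_diagnosis_argv is hypothetical. It assumes that each value follows its flag as a separate argument and that -q takes no value, neither of which is stated in the table. It applies only the table's constraints on the wildcard character and the output format, and it does not run the tool.

```python
def build_diagnosis_argv(items=None, result_dir=None, devices=None,
                         fmt=None, quiet=False):
    """Assemble an argument list for `ascend-dmi --dg`.

    Hypothetical helper for illustration; it only builds the list.
    """
    argv = ["ascend-dmi", "--dg"]
    if items:
        # Check items are separated with commas.
        argv += ["-i", ",".join(items)]
    if result_dir is not None:
        # The table forbids the wildcard character (*) in paths.
        if "*" in result_dir:
            raise ValueError("result path must not contain '*'")
        argv += ["-r", result_dir]
    if devices:
        # Device IDs are separated with commas.
        argv += ["-d", ",".join(str(d) for d in devices)]
    if fmt is not None:
        # The output format can be normal or json.
        if fmt not in ("normal", "json"):
            raise ValueError("format must be 'normal' or 'json'")
        argv += ["-fmt", fmt]
    if quiet:
        argv.append("-q")
    return argv


argv = build_diagnosis_argv(items=["driver", "cann"], result_dir="/test", fmt="json")
print(" ".join(argv))  # ascend-dmi --dg -i driver,cann -r /test -fmt json
```

The helper emits the options in a fixed order for readability. According to the table, the order of level-2 parameters such as -i and -r does not affect the command output.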