昇腾社区首页
中文
注册

检查安装结果

通过MindCluster Ascend Deployer工具安装完成后,可以通过执行如下操作检查各组件的版本及能否正常工作。

<target>可选范围可通过执行bash install.sh --help查看或参考install参数说明
1
bash install.sh --test=<target>

全量检查

检查已安装的全量组件及版本信息。
1
bash install.sh   --test=all                                //测试已安装的全量组件及版本信息

回显信息如下所示。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
task info:do dl test on master
[===================================================]

{'51.38.68.75': {'driver': ['OK', '24.1.RC3'], 'firmware': ['OK', '7.5.0.1.129'], 'toolbox': ['OK', '6.0.0'], 'mindspore': ['OK', '2.4.10'], 'pytorch': ['OK', '2.4.0.post2'], 'tensorflow': ['OK', '2.6.5'], 'tfplugin': ['not installed', ''], 'nnae': ['OK', '8.0.0'], 'nnrt': ['OK', '8.0.0'], 'toolkit': ['OK', '8.0.0'], 'fault-diag': ['OK', '6.0.0']}}

ubuntu: {'ascend-operator': ['OK', 'v6.0.0'], 'clusterd': ['OK', 'v6.0.0'], 'resilience-controller': ['OK', 'v6.0.0'], 'hccl-controller': ['not installed', ''], 'volcano': ['OK', 'v1.7.0'], 'ascend-device-plugin': ['OK', 'v6.0.0'], 'noded': ['OK', 'v6.0.0'], 'npu-exporter': ['OK', 'v6.0.0'], 'ascend-docker-runtime': ['OK', 'v6.0.0']}

npu-clusters
------------+--------------+----------------
npu         | driver       | firmware
------------+--------------+----------------
51.38.68.75 | OK, 24.1.RC3 | OK, 7.5.0.1.129
------------+--------------+----------------

cann-clusters
------------+-------------+---------------+---------------+---------------+--------------
cann        | toolbox     | tfplugin      | nnae          | nnrt          | toolkit      
------------+-------------+---------------+---------------+---------------+--------------
51.38.68.75 | OK, 6.0.0   | not installed | OK, 8.0.0     | OK, 8.0.0     | OK, 8.0.0    
------------+-------------+---------------+---------------+---------------+--------------

pypkg-clusters
------------+---------------+------------+-------------------+--------------
pypkg       | mindspore     | tensorflow | pytorch           | fault-diag
------------+---------------+------------+-------------------+--------------
51.38.68.75 | OK, 2.4.10    | OK, 2.6.5  | OK, 2.4.0.post2   | OK, 6.0.0
------------+---------------+------------+-------------------+--------------


dl-clusters(master-node)
------------+-----------------+--------------+-----------------+------------+----------------------
master-node | ascend-operator | clusterd     | hccl-controller | volcano    | resilience-controller
------------+-----------------+--------------+-----------------+------------+----------------------
ubuntu      | OK, v6.0.0      | OK, v6.0.0   | not installed   | OK, v1.7.0 | OK, v6.0.0   
------------+-----------------+--------------+-----------------+------------+----------------------

dl-clusters(worker-node)
------------+----------------------+-----------------------+--------------+-------------
worker-node | ascend-device-plugin | ascend-docker-runtime | noded        | npu-exporter
------------+----------------------+-----------------------+--------------+-------------
ubuntu      | OK, v6.0.0           | OK, v6.0.0            | OK, v6.0.0   | OK, v6.0.0
------------+----------------------+-----------------------+--------------+-------------
 => localhost
ascend deployer processed success
run cmd: --test=all successfully

检查NPU驱动与固件、CANN、MindCluster 故障诊断、MindCluster 性能测试

可执行如下命令查询已安装的NPU驱动固件、CANN、MindCluster 故障诊断、MindCluster 性能测试信息。

1
bash install.sh --test=toolbox                                     //测试toolbox是否正常

检查AI框架

可执行如下命令查询已安装的AI框架信息。
1
bash install.sh   --test=mindspore                 //测试已安装的mindspore及版本信息

检查MindCluster 集群调度组件

方式一:执行如下命令检查已安装的MindCluster 集群调度组件及版本信息。示例如下。

1
bash install.sh   --test=clusterd 

方式二:安装MindCluster 集群调度后,可进入/ascend_deployer/report目录查看报告文件“report.json”“report.csv”查看安装结果及节点状态。

后续操作

配置环境变量