昇腾社区首页
中文
注册
开发者
下载

检查安装结果

通过MindCluster Ascend Deployer工具安装完成后,可以通过执行如下操作检查各组件的版本及能否正常工作。

<target>可选范围可通过执行bash install.sh --help查看或参考install参数说明
bash install.sh --test=<target>

全量检查

检查已安装的全量组件及版本信息。
bash install.sh   --test=all                                //测试已安装的全量组件及版本信息

回显信息如下所示。

task info:Generate the test report
[===================================================]
 
{'xx.xx.xx.191': {'driver': ['OK', '25.3.0'], 'firmware': ['OK', '7.8.0.6.236'], 'mcu': {'npu_id_2': '25.5.3'}, 'toolbox': ['OK', '7.2.RC1'], 'mindspore': ['not installed', ''], 'pytorch': ['not installed', ''], 'tensorflow': ['not installed', ''], 'tfplugin': ['not installed', ''], 'nnae': ['OK', '8.3.RC1'], 'nnrt': ['OK', '8.3.RC1'], 'toolkit': ['OK', '8.3.RC1'], 'fault-diag': ['OK', '7.2.RC1'], 'mindie_image': ['not installed', '']}}

worker-191: xx.xx.xx.191: {'ascend-operator': ['OK', 'v7.2.RC1'], 'clusterd': ['OK', 'v7.2.RC1'], 'resilience-controller': ['not installed', ''], 'hccl-controller': ['not installed', ''], 'volcano': ['OK', 'v1.7.0'], 'ascend-device-plugin': ['OK', 'v7.2.RC1'], 'noded': ['OK', 'v7.2.RC1'], 'npu-exporter': ['OK', 'v7.2.RC1'], 'ascend-docker-runtime': ['OK', 'v7.2.RC1']}
 
npu-clusters
-------------+------------+----------------
npu          | driver     | firmware
-------------+------------+----------------
xx.xx.xx.191 | OK, 25.3.0 | OK, 7.8.0.6.236
-------------+------------+----------------
 
mcu-version
-------------+----------+---------
ip           | npu_id_1 | npu_id_2
-------------+----------+---------
xx.xx.xx.191 |          | 25.5.3
-------------+----------+---------
 
cann-clusters
-------------+-------------+---------------+-------------+-------------+-------------+--------------
cann         | toolbox     | tfplugin      | nnae        | nnrt        | toolkit     | mindie_image
-------------+-------------+---------------+-------------+-------------+-------------+--------------
xx.xx.xx.191 | OK, 7.2.RC1 | not installed | OK, 8.3.RC1 | OK, 8.3.RC1 | OK, 8.3.RC1 | not installed
-------------+-------------+---------------+-------------+-------------+-------------+--------------
 
pypkg-clusters
-------------+---------------+---------------+---------------+------------
pypkg        | mindspore     | tensorflow    | pytorch       | fault-diag
-------------+---------------+---------------+---------------+------------
xx.xx.xx.191 | not installed | not installed | not installed | OK, 7.2.RC1
-------------+---------------+---------------+---------------+------------
 
 
dl-clusters(master-node)
-------------------------+-----------------+--------------+-----------------+------------+----------------------
master-node              | ascend-operator | clusterd     | hccl-controller | volcano    | resilience-controller
-------------------------+-----------------+--------------+-----------------+------------+----------------------
worker-191: xx.xx.xx.191 | OK, v7.2.RC1    | OK, v7.2.RC1 | not installed   | OK, v1.7.0 | not installed
-------------------------+-----------------+--------------+-----------------+------------+----------------------
 
dl-clusters(worker-node)
-------------------------+----------------------+--------------+-------------
worker-node              | ascend-device-plugin | noded        | npu-exporter
-------------------------+----------------------+--------------+-------------
worker-191: xx.xx.xx.191 | OK, v7.2.RC1         | OK, v7.2.RC1 | OK, v7.2.RC1
-------------------------+----------------------+--------------+-------------
 
dl-clusters(worker-node-physical)
-------------------------+----------------------
worker-node              | ascend-docker-runtime
-------------------------+----------------------
worker-191: xx.xx.xx.191 | OK, v7.2.RC1
-------------------------+----------------------
 => localhost
ascend deployer processed success
run cmd: --test=all successfully

检查NPU驱动与固件、CANN、MindCluster故障诊断、MindCluster性能测试、MindIE容器

可执行如下命令查询已安装的NPU驱动固件、CANN、MindCluster故障诊断、MindCluster性能测试和MindIE容器信息。

bash install.sh --test=toolbox                                     //测试toolbox是否正常

检查AI框架

可执行如下命令查询已安装的AI框架信息。
bash install.sh   --test=mindspore                 //测试已安装的mindspore及版本信息。

检查MindCluster集群调度组件

方式一:执行如下命令检查已安装的MindCluster集群调度组件及版本信息。示例如下。

bash install.sh   --test=clusterd 

方式二:安装MindCluster集群调度后,可进入/ascend_deployer/report目录通过报告文件“report.json”“report.csv”查看安装结果及节点状态。

检查MindIE容器版本信息

可执行如下命令检查MindIE容器的版本信息。

bash install.sh   --test=mindie_image

后续操作

配置环境变量