昇腾社区首页
中文
注册
开发者
下载

检查升级结果

通过MindCluster Ascend Deployer工具升级完成后,可以执行如下操作检查各组件的版本及能否正常工作。

<target>可选范围可通过执行bash install.sh --help查看或参考install参数说明
bash install.sh --test=<target>

全量检查

检查已安装的全量组件及版本信息。
bash install.sh   --test=all                                //测试已安装的全量组件及版本信息。

回显信息如下所示。

task info:Generate the test report
[===================================================]
 
{'xx.xx.xx.xx': {'driver': ['OK', '25.5.0'], 'firmware': ['OK', '7.8.0.5.208'], 'mcu': {'npu_id_1': '24.15.19', 'npu_id_2': '24.15.19', 'npu_id_3': '24.15.19', 'npu_id_4': '24.15.30', 'npu_id_5': '24.15.19', 'npu_id_6': '24.15.19', 'npu_id_7': '24.15.19'}, 'toolbox': ['OK', '7.3.0'], 'mindspore': ['not installed', ''], 'pytorch': ['not installed', ''], 'tensorflow': ['not installed', ''], 'nnae': ['OK', '8.5.0'], 'nnrt': ['OK', '8.5.0'], 'toolkit': ['OK', '8.5.0'], 'fault-diag': ['not installed', ''], 'mindie_image': ['not installed', '']}}
 
localhost.localdomain: xx.xx.xx.xx: {'ascend-operator': ['OK', 'v7.3.0'], 'clusterd': ['OK', 'v7.3.0'], 'resilience-controller': ['not installed', ''], 'volcano': ['OK', 'v1.9.0'], 'ascend-device-plugin': ['OK', 'v7.3.0'], 'noded': ['not installed', ''], 'npu-exporter': ['not installed', ''], 'ascend-docker-runtime': ['not installed', '']}
 
 
npu-clusters
------------+------------+----------------
npu         | driver     | firmware
------------+------------+----------------
xx.xx.xx.xx | OK, 25.5.0 | OK, 7.8.0.5.208
------------+------------+----------------
 
mcu-version
------------+----------+----------+----------+----------+----------+----------+---------
ip          | npu_id_1 | npu_id_2 | npu_id_3 | npu_id_4 | npu_id_5 | npu_id_6 | npu_id_7
------------+----------+----------+----------+----------+----------+----------+---------
xx.xx.xx.xx | 24.15.19 | 24.15.19 | 24.15.19 | 24.15.30 | 24.15.19 | 24.15.19 | 24.15.19
------------+----------+----------+----------+----------+----------+----------+---------
 
cann-clusters
------------+-----------+-----------+-----------+-----------+--------------
cann        | toolbox   | nnae      | nnrt      | toolkit   | mindie_image
------------+-----------+-----------+-----------+-----------+--------------
xx.xx.xx.xx | OK, 7.3.0 | OK, 8.5.0 | OK, 8.5.0 | OK, 8.5.0 | not installed
------------+-----------+-----------+-----------+-----------+--------------
 
pypkg-clusters
------------+---------------+---------------+---------------+--------------
pypkg       | mindspore     | tensorflow    | pytorch       | fault-diag
------------+---------------+---------------+---------------+--------------
xx.xx.xx.xx | not installed | not installed | not installed | not installed
------------+---------------+---------------+---------------+--------------
 
 
dl-clusters(master-node)
-----------------------------------+-----------------+----------+------------+----------------------
master-node                        | ascend-operator | clusterd | volcano    | resilience-controller
-----------------------------------+-----------------+----------+------------+----------------------
localhost.localdomain: xx.xx.xx.xx | OK, v7.3.0      | OK, v7.3.0    | OK, v1.9.0 | not installed
-----------------------------------+-----------------+----------+------------+----------------------
 
dl-clusters(worker-node)
-----------------------------------+----------------------+---------------+--------------
worker-node                        | ascend-device-plugin | noded         | npu-exporter
-----------------------------------+----------------------+---------------+--------------
localhost.localdomain: xx.xx.xx.xx | OK, v7.3.0           | not installed | not installed
-----------------------------------+----------------------+---------------+--------------
 
dl-clusters(worker-node-physical)
-----------------------------------+----------------------
worker-node                        | ascend-docker-runtime
-----------------------------------+----------------------
localhost.localdomain: xx.xx.xx.xx | not installed
-----------------------------------+----------------------
 => localhost
ascend deployer processed success
run cmd: --test=all successfully

检查NPU固件与驱动、CANN软件、MindCluster组件(性能测试、故障诊断)

可执行如下命令查询已安装软件及版本信息。

bash install.sh --test=toolbox                                     //测试toolbox是否正常

检查MindCluster集群调度

方式一:执行如下命令检查已安装的MindCluster集群调度及版本信息。示例如下。

bash install.sh --test=clusterd 

方式二:安装MindCluster集群调度后,可进入/ascend_deployer/report目录查看报告文件“report.json”“report.csv”查看安装结果及节点状态。

检查MCU固件

可执行如下命令查询MCU固件信息。

bash install.sh --test=mcu