检查安装结果
通过MindCluster Ascend Deployer工具安装完成后,可以通过执行如下操作检查各组件的版本及能否正常工作。
<target>可选范围可通过执行bash install.sh --help查看或参考install参数说明。
bash install.sh --test=<target>
全量检查
检查已安装的全量组件及版本信息。
bash install.sh --test=all //测试已安装的全量组件及版本信息
回显信息如下所示。
task info:Generate the test report [===================================================] {'xx.xx.xx.xx': {'driver': ['OK', '25.5.0'], 'firmware': ['OK', '7.8.0.5.208'], 'mcu': {'npu_id_1': '24.15.19', 'npu_id_2': '24.15.19', 'npu_id_3': '24.15.19', 'npu_id_4': '24.15.30', 'npu_id_5': '24.15.19', 'npu_id_6': '24.15.19', 'npu_id_7': '24.15.19'}, 'toolbox': ['OK', '7.3.0'], 'mindspore': ['not installed', ''], 'pytorch': ['not installed', ''], 'tensorflow': ['not installed', ''], 'nnae': ['OK', '8.5.0'], 'nnrt': ['OK', '8.5.0'], 'toolkit': ['OK', '8.5.0'], 'fault-diag': ['not installed', ''], 'mindie_image': ['not installed', '']}} localhost.localdomain: xx.xx.xx.xx: {'ascend-operator': ['OK', 'v7.3.0'], 'clusterd': ['OK', 'v7.3.0'], 'resilience-controller': ['not installed', ''], 'volcano': ['OK', 'v1.9.0'], 'ascend-device-plugin': ['OK', 'v7.3.0'], 'noded': ['not installed', ''], 'npu-exporter': ['not installed', ''], 'ascend-docker-runtime': ['not installed', '']} npu-clusters ------------+------------+---------------- npu | driver | firmware ------------+------------+---------------- xx.xx.xx.xx | OK, 25.5.0 | OK, 7.8.0.5.208 ------------+------------+---------------- mcu-version ------------+----------+----------+----------+----------+----------+----------+--------- ip | npu_id_1 | npu_id_2 | npu_id_3 | npu_id_4 | npu_id_5 | npu_id_6 | npu_id_7 ------------+----------+----------+----------+----------+----------+----------+--------- xx.xx.xx.xx | 24.15.19 | 24.15.19 | 24.15.19 | 24.15.30 | 24.15.19 | 24.15.19 | 24.15.19 ------------+----------+----------+----------+----------+----------+----------+--------- cann-clusters ------------+-----------+-----------+-----------+-----------+-------------- cann | toolbox | nnae | nnrt | toolkit | mindie_image ------------+-----------+-----------+-----------+-----------+-------------- xx.xx.xx.xx | OK, 7.3.0 | OK, 8.5.0 | OK, 8.5.0 | OK, 8.5.0 | not installed ------------+-----------+-----------+-----------+-----------+-------------- pypkg-clusters ------------+---------------+---------------+---------------+-------------- pypkg | mindspore | tensorflow | pytorch | fault-diag ------------+---------------+---------------+---------------+-------------- xx.xx.xx.xx | not installed | not installed | not installed | not installed ------------+---------------+---------------+---------------+-------------- dl-clusters(master-node) -----------------------------------+-----------------+----------+------------+---------------------- master-node | ascend-operator | clusterd | volcano | resilience-controller -----------------------------------+-----------------+----------+------------+---------------------- localhost.localdomain: xx.xx.xx.xx | OK, v7.3.0 | OK, v7.3.0 | OK, v1.9.0 | not installed -----------------------------------+-----------------+----------+------------+---------------------- dl-clusters(worker-node) -----------------------------------+----------------------+---------------+-------------- worker-node | ascend-device-plugin | noded | npu-exporter -----------------------------------+----------------------+---------------+-------------- localhost.localdomain: xx.xx.xx.xx | OK, v7.3.0 | not installed | not installed -----------------------------------+----------------------+---------------+-------------- dl-clusters(worker-node-physical) -----------------------------------+---------------------- worker-node | ascend-docker-runtime -----------------------------------+---------------------- localhost.localdomain: xx.xx.xx.xx | not installed -----------------------------------+---------------------- => localhost ascend deployer processed success run cmd: --test=all successfully
检查NPU驱动与固件、CANN、MindCluster故障诊断、MindCluster性能测试、MindIE容器
可执行如下命令查询已安装的NPU驱动固件、CANN、MindCluster故障诊断、MindCluster性能测试和MindIE容器信息。
bash install.sh --test=toolbox //测试toolbox是否正常
检查AI框架
可执行如下命令查询已安装的AI框架信息。
bash install.sh --test=mindspore //测试已安装的mindspore及版本信息。
检查MindCluster集群调度组件
方式一:执行如下命令检查已安装的MindCluster集群调度组件及版本信息。示例如下。
bash install.sh --test=clusterd
方式二:安装MindCluster集群调度后,可进入/ascend_deployer/report目录通过报告文件“report.json”或“report.csv”查看安装结果及节点状态。
检查MindIE容器版本信息
可执行如下命令检查MindIE容器的版本信息。
bash install.sh --test=mindie_image
后续操作
父主题: 安装昇腾软件