检查安装结果
通过MindCluster Ascend Deployer工具安装完成后,可以通过执行如下操作检查各组件的版本及能否正常工作。
<target>可选范围可通过执行bash install.sh --help查看或参考install参数说明。
1 | bash install.sh --test=<target> |
全量检查
检查已安装的全量组件及版本信息。
1 | bash install.sh --test=all //测试已安装的全量组件及版本信息 |
回显信息如下所示。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | task info:do dl test on master [===================================================] {'51.38.68.75': {'driver': ['OK', '24.1.RC3'], 'firmware': ['OK', '7.5.0.1.129'], 'toolbox': ['OK', '6.0.0'], 'mindspore': ['OK', '2.4.10'], 'pytorch': ['OK', '2.4.0.post2'], 'tensorflow': ['OK', '2.6.5'], 'tfplugin': ['not installed', ''], 'nnae': ['OK', '8.0.0'], 'nnrt': ['OK', '8.0.0'], 'toolkit': ['OK', '8.0.0'], 'fault-diag': ['OK', '6.0.0']}} ubuntu: {'ascend-operator': ['OK', 'v6.0.0'], 'clusterd': ['OK', 'v6.0.0'], 'resilience-controller': ['OK', 'v6.0.0'], 'hccl-controller': ['not installed', ''], 'volcano': ['OK', 'v1.7.0'], 'ascend-device-plugin': ['OK', 'v6.0.0'], 'noded': ['OK', 'v6.0.0'], 'npu-exporter': ['OK', 'v6.0.0'], 'ascend-docker-runtime': ['OK', 'v6.0.0']} npu-clusters ------------+--------------+---------------- npu | driver | firmware ------------+--------------+---------------- 51.38.68.75 | OK, 24.1.RC3 | OK, 7.5.0.1.129 ------------+--------------+---------------- cann-clusters ------------+-------------+---------------+---------------+---------------+-------------- cann | toolbox | tfplugin | nnae | nnrt | toolkit ------------+-------------+---------------+---------------+---------------+-------------- 51.38.68.75 | OK, 6.0.0 | not installed | OK, 8.0.0 | OK, 8.0.0 | OK, 8.0.0 ------------+-------------+---------------+---------------+---------------+-------------- pypkg-clusters ------------+---------------+------------+-------------------+-------------- pypkg | mindspore | tensorflow | pytorch | fault-diag ------------+---------------+------------+-------------------+-------------- 51.38.68.75 | OK, 2.4.10 | OK, 2.6.5 | OK, 2.4.0.post2 | OK, 6.0.0 ------------+---------------+------------+-------------------+-------------- dl-clusters(master-node) ------------+-----------------+--------------+-----------------+------------+---------------------- master-node | ascend-operator | clusterd | hccl-controller | volcano | resilience-controller ------------+-----------------+--------------+-----------------+------------+---------------------- ubuntu | OK, v6.0.0 | OK, v6.0.0 | not installed | OK, v1.7.0 | OK, v6.0.0 ------------+-----------------+--------------+-----------------+------------+---------------------- dl-clusters(worker-node) ------------+----------------------+-----------------------+--------------+------------- worker-node | ascend-device-plugin | ascend-docker-runtime | noded | npu-exporter ------------+----------------------+-----------------------+--------------+------------- ubuntu | OK, v6.0.0 | OK, v6.0.0 | OK, v6.0.0 | OK, v6.0.0 ------------+----------------------+-----------------------+--------------+------------- => localhost ascend deployer processed success run cmd: --test=all successfully |
检查NPU驱动与固件、CANN、MindCluster 故障诊断、MindCluster 性能测试
可执行如下命令查询已安装的NPU驱动固件、CANN、MindCluster 故障诊断、MindCluster 性能测试信息。
1 | bash install.sh --test=toolbox //测试toolbox是否正常 |
检查AI框架
可执行如下命令查询已安装的AI框架信息。
1 | bash install.sh --test=mindspore //测试已安装的mindspore及版本信息。 |
检查MindCluster 集群调度组件
方式一:执行如下命令检查已安装的MindCluster 集群调度组件及版本信息。示例如下。
1 | bash install.sh --test=clusterd |
方式二:安装MindCluster 集群调度后,可进入/ascend_deployer/report目录查看报告文件“report.json”或“report.csv”查看安装结果及节点状态。
后续操作
父主题: 安装昇腾软件