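After installing the MindX DL components, you can verify the installation by checking the K8s nodes, the K8s Pods, and the status of Ascend Docker Runtime, the NPUs, and HCCN. After installing MEF Center, you can verify the installation by checking the K8s Pods and the process information.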
NAME       STATUS   ROLES    AGE   VERSION
master     Ready    master   60s   v1.19.16
worker-1   Ready    worker   60s   v1.19.16
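The node check can also be scripted rather than read by eye. The following is a minimal sketch, assuming the listing above is produced by `kubectl get nodes` (the command itself does not appear in this section); the helper name `not_ready_nodes` is hypothetical and is not part of MindX DL.

```shell
# Hypothetical helper: reads `kubectl get nodes` output on stdin and prints
# the name of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    # NR > 1 skips the header row; $1 is NAME, $2 is STATUS.
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage (on a node where kubectl is configured):
#   kubectl get nodes | not_ready_nodes
```

Empty output means every listed node reports `Ready`. Because the comparison is exact, a node shown as `Ready,SchedulingDisabled` is also printed, which may or may not be what you want.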
NAMESPACE        NAME                                      READY   STATUS    RESTARTS   AGE
kube-system      ascend-device-plugin-daemonset-910-lq     1/1     Running   0          21h
kube-system      calico-kube-controllers-68c855c64-4fn2k   1/1     Running   1          21h
kube-system      calico-node-4zfjp                         1/1     Running   0          21h
kube-system      calico-node-jsdws                         1/1     Running   0          21h
kube-system      coredns-f9fd979d6-84xd2                   1/1     Running   0          21h
kube-system      coredns-f9fd979d6-8fld7                   1/1     Running   0          21h
kube-system      etcd-ubuntu-1                             1/1     Running   0          21h
kube-system      kube-apiserver-ubuntu-1                   1/1     Running   0          21h
kube-system      kube-controller-manager-ubuntu-1          1/1     Running   8          21h
kube-system      kube-proxy-6zr9j                          1/1     Running   0          21h
kube-system      kube-proxy-w9lw9                          1/1     Running   0          21h
kube-system      kube-scheduler-ubuntu-1                   1/1     Running   6          21h
mindx-dl         hccl-controller-8ff6fd684-9pgxm           1/1     Running   0          19h
mindx-dl         noded-c2h7r                               1/1     Running   0          19h
npu-exporter     npu-exporter-7kt25                        1/1     Running   0          19h
volcano-system   volcano-controllers-56cbbb9c6-9trf7       1/1     Running   0          19h
volcano-system   volcano-scheduler-66f75bf89f-94jkx        1/1     Running   0          19h
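With this many Pods, a small filter makes problems easier to spot. The sketch below assumes the listing comes from `kubectl get pods --all-namespaces` (the command is not shown in this section) and treats a Pod as healthy only when its STATUS is `Running` and both numbers in the READY column match. The helper name `unhealthy_pods` is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on
# stdin and prints "namespace/name" for each Pod that is not Running or whose
# READY column (ready/total containers) is not fully satisfied.
unhealthy_pods() {
    awk 'NR > 1 {
        # $1 NAMESPACE, $2 NAME, $3 READY (for example 1/1), $4 STATUS
        n = split($3, ready, "/")
        if ($4 != "Running" || n != 2 || ready[1] != ready[2])
            print $1 "/" $2
    }'
}

# Usage (on a node where kubectl is configured):
#   kubectl get pods --all-namespaces | unhealthy_pods
```

Empty output means every listed Pod passed both checks. Pods of finished Jobs, which show `Completed`, are reported as well, so adjust the condition if your cluster runs such Pods.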
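On the master node, you can also confirm the installation result by viewing the cluster status with the k8s_status_report tool, as described in the cluster status report.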
NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-68c855c64-4fn2k   1/1     Running   1          21h
kube-system   calico-node-4zfjp                         1/1     Running   0          21h
kube-system   calico-node-jsdws                         1/1     Running   0          21h
kube-system   coredns-f9fd979d6-84xd2                   1/1     Running   0          21h
kube-system   coredns-f9fd979d6-8fld7                   1/1     Running   0          21h
kube-system   etcd-ubuntu-1                             1/1     Running   0          21h
kube-system   kube-apiserver-ubuntu-1                   1/1     Running   0          21h
kube-system   kube-controller-manager-ubuntu-1          1/1     Running   8          21h
kube-system   kube-proxy-6zr9j                          1/1     Running   0          21h
kube-system   kube-proxy-w9lw9                          1/1     Running   0          21h
kube-system   kube-scheduler-ubuntu-1                   1/1     Running   6          21h
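After installing the MindX DL components, run docker info 2>/dev/null | grep Runtime to check whether Ascend Docker Runtime has taken effect. If "ascend" appears in the output, the runtime is in effect. Example output: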
Runtimes: ascend runc
Default Runtime: ascend
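For unattended checks, the same rule ("ascend" appears on a Runtime line) can be turned into an exit status. This is a minimal sketch; the helper name `ascend_runtime_listed` is hypothetical, and it only inspects text, so it says nothing about whether containers can actually reach the NPUs.

```shell
# Hypothetical helper: reads `docker info` output on stdin and succeeds
# (exit status 0) if any line containing "Runtime" also contains "ascend".
ascend_runtime_listed() {
    grep 'Runtime' | grep -q 'ascend'
}

# Usage:
#   if docker info 2>/dev/null | ascend_runtime_listed; then
#       echo "Ascend Docker Runtime is in effect"
#   else
#       echo "Ascend Docker Runtime not found" >&2
#   fi
```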
cd /root/offline-deploy
bash scripts/machine_report.sh