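After the cluster scheduling component services are deployed, run kubectl get pods --all-namespaces -o wide to check the status of each component. A Pod is found to be stuck in the ContainerCreating state. The following uses HCCL-Controller as an example.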
root@ubuntu:/home# kubectl get pods --all-namespaces -o wide
NAMESPACE   NAME                               READY   STATUS              RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
default     hccl-controller-6bc9bccc4c-n6c7w   0/1     ContainerCreating   0          10m   <none>   ubuntu   <none>           <none>
...
Run the following command to view the Pod details:
kubectl describe pod -n <namespace> <pod-name>
For example:
kubectl describe pod -n default hccl-controller-6bc9bccc4c-n6c7w
The following information is displayed:
...
QoS Class:       Guaranteed
Node-Selectors:  masterselector=dls-master-node
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age               From               Message
  ----     ------       ----              ----               -------
  Normal   Scheduled    16s               default-scheduler  Successfully assigned default/hccl-controller-6bc9bccc4c-n6c7w to ubuntu
  Warning  FailedMount  8s (x5 over 15s)  kubelet, ubuntu    MountVolume.SetUp failed for volume "device-hcclcontroller" : hostPath type check failed: /var/log/mindx-dl/hccl-controller is not a directory
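The FailedMount event shows that the log directory of the corresponding service (/var/log/mindx-dl/hccl-controller in this example) does not exist on the node.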
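When the event list is long, the missing directory can be picked out of the FailedMount message mechanically. The following is a minimal sketch, assuming the message wording matches the output shown above; missing_hostpath_dirs is a hypothetical helper name, not part of kubectl.

```shell
#!/bin/sh
# Hypothetical helper (not a kubectl feature): read `kubectl describe pod`
# output on stdin and print each directory that a hostPath type check
# reported as missing. Assumes the message wording shown in the example.
missing_hostpath_dirs() {
    # Keep only the path between the fixed prefix and suffix of the message,
    # then drop duplicates produced by repeated events.
    sed -n 's/.*hostPath type check failed: \(.*\) is not a directory.*/\1/p' | sort -u
}
```

A possible invocation is kubectl describe pod -n default hccl-controller-6bc9bccc4c-n6c7w | missing_hostpath_dirs, which for the events above would print the path named in the FailedMount message.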
For the detailed procedure, see the section "Creating Log Directories".
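As a rough illustration of the fix, the sketch below creates a directory only when it is missing. It is not the documented procedure: ensure_log_dir is a hypothetical helper, and the owner and permissions that the component requires are deliberately not set here and should be taken from the referenced section.

```shell
#!/bin/sh
# Hypothetical helper: create the log directory if it does not exist yet.
# Ownership and permissions are NOT handled here; follow the
# "Creating Log Directories" section for the required values.
ensure_log_dir() {
    dir=$1
    if [ -d "$dir" ]; then
        echo "exists: $dir"
    else
        # -p also creates missing parent directories such as /var/log/mindx-dl
        mkdir -p "$dir" && echo "created: $dir"
    fi
}
```

Because a hostPath volume refers to the file system of the node, the helper would be run on the node the Pod was scheduled to (ubuntu in this example), for instance as ensure_log_dir /var/log/mindx-dl/hccl-controller, by a user allowed to write under /var/log.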