Cluster Scheduling Component Pods Are in the ContainerCreating State
Symptom
After the cluster scheduling components are deployed, running the kubectl get pods --all-namespaces -o wide command shows that the component pods remain in the ContainerCreating state. The following uses Ascend Device Plugin as an example.
root@ubuntu:/home# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
kube-system   ascend-device-plugin-daemonset-910-fpjw2   0/1     ContainerCreating   0          10m   <none>   ubuntu   <none>           <none>
...
Run the following command to view pod details:
kubectl describe pod -n <namespace> <pod-name>
Example:
kubectl describe pod -n kube-system ascend-device-plugin-daemonset-910-fpjw2
The following information is displayed:
...
QoS Class:       Guaranteed
Node-Selectors:  masterselector=dls-master-node
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age               From               Message
  ----     ------       ----              ----               -------
  Normal   Scheduled    16s               default-scheduler  Successfully assigned kube-system/ascend-device-plugin-daemonset-910-fpjw2 to ubuntu
  Warning  FailedMount  8s (x5 over 15s)  kubelet, ubuntu    MountVolume.SetUp failed for volume "device-ascenddeviceplugin" : hostPath type check failed: /var/log/mindx-dl/ascend-device-plugin is not a directory
Cause Analysis
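The log directory of the component does not exist on the node. The pod mounts this directory as a hostPath volume, and the hostPath type check fails because the path (/var/log/mindx-dl/ascend-device-plugin in this example) is not a directory. As a result, the volume cannot be mounted and the container is never created.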
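To confirm the cause on the node named in the event, the path can be inspected directly. The following is a minimal sketch; the function name check_host_path is hypothetical, and the path in the usage comment is taken from the FailedMount message above.

```shell
#!/bin/sh
# Sketch: report whether a hostPath target is usable as a directory.
# check_host_path is a hypothetical helper, not part of any component.
check_host_path() {
    path="$1"
    if [ -d "$path" ]; then
        echo "directory"          # hostPath type check should pass
    elif [ -e "$path" ]; then
        echo "not-a-directory"    # something else occupies the path
    else
        echo "missing"            # the directory was never created
    fi
}

# Usage (run on the node where the pod is scheduled):
#   check_host_path /var/log/mindx-dl/ascend-device-plugin
```

A result of "missing" or "not-a-directory" matches the failure shown in the event.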
Solution
- On the node where the pod is scheduled (ubuntu in this example), create the log directory of the component and set its permissions and owner.
For details, see Creating a Log Directory.
- Manually uninstall the component and deploy it again.
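The directory creation step could look roughly like the sketch below. The helper name ensure_log_dir is hypothetical, and the mode and owner in the usage comment (750, root:root) are assumptions for illustration only; use the values given in Creating a Log Directory.

```shell
#!/bin/sh
# Sketch: create a log directory and set its mode and, optionally, its owner.
# ensure_log_dir is a hypothetical helper. Mode and owner are supplied by the
# caller; the values in the usage comment below are assumed, not authoritative.
ensure_log_dir() {
    dir="$1"
    mode="$2"
    owner="$3"    # optional, for example user:group; changing it needs root

    # Refuse to continue if the path exists but is not a directory.
    if [ -e "$dir" ] && [ ! -d "$dir" ]; then
        echo "error: $dir exists but is not a directory" >&2
        return 1
    fi

    # mkdir -p also creates missing parents (with default, umask-based modes);
    # chmod then sets the requested mode on the final directory only.
    mkdir -p "$dir" || return 1
    chmod "$mode" "$dir" || return 1

    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi
}

# Usage (as root, on the node where the pod is scheduled; values assumed):
#   ensure_log_dir /var/log/mindx-dl/ascend-device-plugin 750 root:root
```

After the directory exists, uninstall and redeploy the component as described in the step above.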