Cluster Scheduling Component Pod Is in the ContainerCreating Status
Symptom
After the cluster scheduling component service is deployed, running the kubectl get pods --all-namespaces -o wide command to check the status of each component shows that a pod remains in the ContainerCreating status. The following uses HCCL-Controller as an example.
root@ubuntu:/home# kubectl get pods --all-namespaces -o wide
NAMESPACE   NAME                               READY   STATUS              RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
default     hccl-controller-6bc9bccc4c-n6c7w   0/1     ContainerCreating   0          10m   <none>   ubuntu   <none>           <none>
...
Run the following command to view the pod details:
kubectl describe pod -n namespace podname
Example:
kubectl describe pod -n default hccl-controller-6bc9bccc4c-n6c7w
The following information is displayed:
...
QoS Class: Guaranteed
Node-Selectors: masterselector=dls-master-node
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16s default-scheduler Successfully assigned default/hccl-controller-6bc9bccc4c-n6c7w to ubuntu
Warning FailedMount 8s (x5 over 15s) kubelet, ubuntu MountVolume.SetUp failed for volume "device-hcclcontroller" : hostPath type check failed: /var/log/mindx-dl/hccl-controller is not a directory
Causes
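The log directory of the corresponding service does not exist on the node, so the hostPath volume that mounts it fails the type check reported in the FailedMount event.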
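To confirm this cause, the path named in the FailedMount event can be checked on the node where the pod was scheduled (ubuntu in this example). The following is a minimal sketch; check_log_dir is a hypothetical helper written for illustration and is not part of the product tooling.

```shell
# check_log_dir PATH
# Hypothetical helper: reports whether PATH exists and is a directory.
check_log_dir() {
    if [ -d "$1" ]; then
        echo "ok: $1 is a directory"
    else
        echo "missing: $1 is not a directory"
        return 1
    fi
}

# Path taken from the FailedMount event in the example above.
# "|| true" keeps a missing directory from aborting a calling script.
check_log_dir /var/log/mindx-dl/hccl-controller || true
```

If the check reports that the path is missing, the failure matches the cause described here.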
Solution
- Create the corresponding log directory and set the permission and owner for the directory.
For details, see Creating a Log Directory.
- Manually uninstall the service and deploy it again.
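The directory creation step might be sketched as follows. This is an illustration under assumptions, not the documented procedure: create_log_dir is a hypothetical helper, and the required permission and owner values are not stated in this topic, so they are left as arguments to be filled in from Creating a Log Directory.

```shell
# create_log_dir DIR [MODE] [OWNER]
# Hypothetical helper: creates DIR (with parents) and optionally applies
# a permission mode and an owner. MODE and OWNER must come from the
# "Creating a Log Directory" topic; no values are assumed here.
create_log_dir() {
    cld_dir=$1
    cld_mode=${2:-}
    cld_owner=${3:-}

    mkdir -p "$cld_dir" || return 1
    if [ -n "$cld_mode" ]; then
        chmod "$cld_mode" "$cld_dir" || return 1
    fi
    if [ -n "$cld_owner" ]; then
        chown "$cld_owner" "$cld_dir" || return 1
    fi
    # Succeed only if the directory is now present
    [ -d "$cld_dir" ]
}

# Example call on the affected node (run with sufficient privileges):
# create_log_dir /var/log/mindx-dl/hccl-controller <mode> <owner>
```

Once the directory exists, the service can be removed and deployed again, for example with kubectl delete -f followed by kubectl apply -f on the YAML file that was used for the original deployment. The file name depends on the installation and is not given in this topic.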