Cluster Scheduling Component Pods Are in the ContainerCreating State

Symptom

After the cluster scheduling components are deployed, running kubectl get pods --all-namespaces -o wide to check the status of each component shows that the component pods remain in the ContainerCreating state. The following uses Ascend Device Plugin as an example.

root@ubuntu:/home# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
kube-system   ascend-device-plugin-daemonset-910-fpjw2   0/1     ContainerCreating   0          10m   <none>   ubuntu   <none>           <none>
...
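
When many components are deployed, it can help to pick out only the pods stuck in this state. The following Python sketch parses the text printed by kubectl get pods --all-namespaces and returns the namespace and name of each pod whose STATUS column reads ContainerCreating. It assumes the default column order (NAMESPACE, NAME, READY, STATUS, ...) with whitespace-separated fields. pods_in_state is a hypothetical helper, and the Running row in the sample is made up for illustration.

```python
from typing import List, Tuple


def pods_in_state(get_pods_output: str,
                  state: str = "ContainerCreating") -> List[Tuple[str, str]]:
    """Return (namespace, name) for each pod whose STATUS column equals `state`.

    Assumes the column order printed by `kubectl get pods --all-namespaces`:
    NAMESPACE, NAME, READY, STATUS, ...
    """
    matches = []
    for line in get_pods_output.splitlines():
        fields = line.split()
        # Skip the header row and lines too short to be a pod row
        # (blank lines, "..." truncation markers).
        if len(fields) < 4 or fields[0] == "NAMESPACE":
            continue
        namespace, name, _ready, status = fields[:4]
        if status == state:
            matches.append((namespace, name))
    return matches


# Illustrative input: the first data row is from the output above,
# the second row is hypothetical.
sample = """\
NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE
kube-system   ascend-device-plugin-daemonset-910-fpjw2   0/1     ContainerCreating   0          10m
default       example-web-0                              1/1     Running             0          1d
"""
print(pods_in_state(sample))
# → [('kube-system', 'ascend-device-plugin-daemonset-910-fpjw2')]
```

Parsing the -o wide text keeps the sketch close to the command shown above; a JSON-based variant (-o json) would be more robust against column changes.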

Run the following command to view pod details:

kubectl describe pod -n <namespace> <pod_name>

Example:

kubectl describe pod -n kube-system ascend-device-plugin-daemonset-910-fpjw2

The following information is displayed:

...
QoS Class:       Guaranteed
Node-Selectors:  masterselector=dls-master-node
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age               From                 Message
  ----     ------       ----              ----                 -------
  Normal   Scheduled    16s               default-scheduler    Successfully assigned kube-system/ascend-device-plugin-daemonset-910-fpjw2 to ubuntu
  Warning  FailedMount  8s (x5 over 15s)  kubelet, ubuntu      MountVolume.SetUp failed for volume "device-ascenddeviceplugin" : hostPath type check failed: /var/log/mindx-dl/ascend-device-plugin is not a directory
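
The cause appears in the Events section. As an illustration, the Python sketch below extracts the volume name and host path from FailedMount messages of the form shown above, which yields the list of directories to look for on the node. The regular expression is modeled only on this one message, and the helper name failed_hostpath_dirs is hypothetical. Other Kubernetes versions may phrase the event differently.

```python
import re
from typing import List, Tuple

# Modeled on the FailedMount event shown above; other kubelet versions may
# word the message differently, so treat this pattern as an assumption.
HOSTPATH_NOT_DIR = re.compile(
    r'MountVolume\.SetUp failed for volume "(?P<volume>[^"]+)" : '
    r'hostPath type check failed: (?P<path>\S+) is not a directory'
)


def failed_hostpath_dirs(describe_output: str) -> List[Tuple[str, str]]:
    """Return unique (volume, host path) pairs from hostPath directory-check failures."""
    found: List[Tuple[str, str]] = []
    for match in HOSTPATH_NOT_DIR.finditer(describe_output):
        pair = (match.group("volume"), match.group("path"))
        if pair not in found:  # skip duplicates, e.g. from several pods' output
            found.append(pair)
    return found


event = ('Warning  FailedMount  8s (x5 over 15s)  kubelet, ubuntu      '
         'MountVolume.SetUp failed for volume "device-ascenddeviceplugin" : '
         'hostPath type check failed: /var/log/mindx-dl/ascend-device-plugin is not a directory')
print(failed_hostpath_dirs(event))
# → [('device-ascenddeviceplugin', '/var/log/mindx-dl/ascend-device-plugin')]
```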

Cause Analysis

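The log directory that the component mounts from the host does not exist on the node. In this example, the pod mounts /var/log/mindx-dl/ascend-device-plugin as the hostPath volume device-ascenddeviceplugin. Because that directory is missing, the hostPath type check fails, kubelet cannot set up the volume, and the container is never created.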
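
To confirm this cause on the node where the pod was scheduled, you can check each reported path directly. The sketch below uses a hypothetical helper, classify_hostpaths, to separate a missing path from one that exists but is not a directory (for example, a regular file). As far as we can tell, the kubelet message above does not distinguish these two cases. The sketch assumes it runs on that node with permission to stat the paths.

```python
import os
from typing import Dict, Iterable


def classify_hostpaths(paths: Iterable[str]) -> Dict[str, str]:
    """Classify each path on this host as 'ok', 'missing', or 'not-a-directory'."""
    status: Dict[str, str] = {}
    for path in paths:
        if os.path.isdir(path):          # follows symlinks that point to directories
            status[path] = "ok"
        elif os.path.lexists(path):      # a regular file, broken symlink, etc.
            status[path] = "not-a-directory"
        else:
            status[path] = "missing"
    return status


# Path taken from the FailedMount event above.
print(classify_hostpaths(["/var/log/mindx-dl/ascend-device-plugin"]))
```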

Solution

  1. On the node where the pod is scheduled, create the component's log directory and set its permissions and owner.

    For details, see Creating a Log Directory.

  2. Manually uninstall the component and deploy it again.
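
The two steps above can be sketched in Python as follows. create_log_dir creates the directory and applies a mode and owner. The default mode 0o750 and any user or group you pass are placeholders, so take the real values from Creating a Log Directory. redeploy_commands and redeploy assume the component was deployed from a YAML manifest with kubectl apply -f or kubectl create -f, so that kubectl delete -f on the same file uninstalls it. All three helper names are hypothetical.

```python
import grp
import os
import pwd
import subprocess
from typing import List, Optional


def create_log_dir(path: str, mode: int = 0o750,
                   user: Optional[str] = None, group: Optional[str] = None) -> None:
    """Create `path` with its parents, then set its mode and, optionally, its owner.

    The default mode and the user/group are placeholders; use the values given
    in "Creating a Log Directory" for the component.
    """
    # Raises FileExistsError if `path` exists but is not a directory.
    os.makedirs(path, exist_ok=True)
    # makedirs is subject to the process umask, so apply the mode explicitly.
    os.chmod(path, mode)
    if user is not None or group is not None:
        uid = pwd.getpwnam(user).pw_uid if user is not None else -1
        gid = grp.getgrnam(group).gr_gid if group is not None else -1
        os.chown(path, uid, gid)  # -1 keeps the current owner or group


def redeploy_commands(manifest: str) -> List[List[str]]:
    """kubectl commands that remove and then re-create the objects in `manifest`."""
    return [["kubectl", "delete", "-f", manifest],
            ["kubectl", "apply", "-f", manifest]]


def redeploy(manifest: str) -> None:
    """Run the delete/apply pair, stopping at the first command that fails."""
    for cmd in redeploy_commands(manifest):
        subprocess.run(cmd, check=True)
```

For example, call create_log_dir("/var/log/mindx-dl/ascend-device-plugin") as root on the node that hosts the pod, then call redeploy() on a host where kubectl is configured, passing the path of the manifest used for the original deployment. This section does not name that manifest, so it has to be supplied.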