Cluster Scheduling Component Pod Is in the ContainerCreating Status

Symptom

After the cluster scheduling component service is deployed, running the kubectl get pods --all-namespaces -o wide command to check the status of each component shows that a pod remains in the ContainerCreating status. The following uses HCCL-Controller as an example.

root@ubuntu:/home# kubectl get pods --all-namespaces -o wide
NAMESPACE        NAME                                       READY   STATUS              RESTARTS   AGE     IP              NODE           NOMINATED NODE   READINESS GATES
default          hccl-controller-6bc9bccc4c-n6c7w           0/1     ContainerCreating   0          10m     <none>          ubuntu         <none>           <none>
...

Run the following command to view the pod details:

kubectl describe pod -n <namespace> <podname>

Example:

kubectl describe pod -n default hccl-controller-6bc9bccc4c-n6c7w

The following information is displayed:

...
QoS Class:       Guaranteed
Node-Selectors:  masterselector=dls-master-node
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age               From                 Message
  ----     ------       ----              ----                 -------
  Normal   Scheduled    16s               default-scheduler    Successfully assigned default/hccl-controller-6bc9bccc4c-n6c7w to ubuntu
  Warning  FailedMount  8s (x5 over 15s)  kubelet, ubuntu      MountVolume.SetUp failed for volume "device-hcclcontroller" : hostPath type check failed: /var/log/mindx-dl/hccl-controller is not a directory

Causes

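The log directory of the corresponding service (/var/log/mindx-dl/hccl-controller in this example) does not exist on the node where the pod is scheduled. The hostPath type check for the volume therefore fails, and the container cannot be created.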
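
To confirm this cause, the path named in the FailedMount event can be checked on the node where the pod was scheduled (ubuntu in the example). The following is a minimal sketch; check_log_dir is a hypothetical helper name introduced here for illustration and is not part of the product.

```shell
#!/bin/sh
# check_log_dir: hypothetical helper (not part of the product).
# Reports whether the given path exists and is a directory.
# Return codes: 0 = directory, 1 = exists but is not a directory, 2 = missing.
check_log_dir() {
    dir="$1"
    if [ -d "$dir" ]; then
        echo "OK: $dir is a directory"
        return 0
    elif [ -e "$dir" ]; then
        echo "ERROR: $dir exists but is not a directory"
        return 1
    else
        echo "ERROR: $dir does not exist"
        return 2
    fi
}
```

For the example above, the call would be check_log_dir /var/log/mindx-dl/hccl-controller. A non-zero return code matches the "is not a directory" message in the event.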

Solution

  1. On the node where the pod is scheduled, create the log directory of the service and set the permissions and owner of the directory.

    For details, see Creating a Log Directory.

  2. Manually uninstall the service and deploy it again.
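
Step 1 can be sketched as a small shell helper. This is an illustration only: create_log_dir is a hypothetical name, and the mode and owner it receives are assumptions that must be replaced with the values given in Creating a Log Directory.

```shell
#!/bin/sh
# create_log_dir: hypothetical helper (not part of the product).
# Arguments:
#   $1  directory to create
#   $2  permission mode; take the value from "Creating a Log Directory"
#   $3  optional owner in user:group form; take the value from the same section
create_log_dir() {
    dir="$1"
    mode="$2"
    owner="${3:-}"
    # Create the directory and any missing parent directories.
    mkdir -p "$dir" || return 1
    # Apply the requested permission mode.
    chmod "$mode" "$dir" || return 1
    # Change the owner only when one was supplied (usually requires root).
    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi
}
```

For the example in this section, the directory argument would be /var/log/mindx-dl/hccl-controller. After the directory exists with the documented permissions and owner, uninstall and redeploy the service as described in step 2.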