Pod Remains in the Terminating State After a vcjob Is Manually Deleted

Symptom

After a vcjob is deleted by running kubectl delete -f xxx.yaml, its pod remains in the Terminating state.

Cause Analysis

A long graceful exit timeout is set for the pod. As a result, after the job is deleted, the pod of the job stays in the Terminating state for a long time.
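
To confirm whether the graceful exit timeout is the cause, the configured value can be read from the pod specification. The following is a minimal sketch, assuming that kubectl is configured for the cluster. The function name pod_grace_period and its pod and namespace arguments are illustrative and are not part of the original procedure.

```shell
# Print the graceful termination timeout (in seconds) configured for a pod.
# Usage: pod_grace_period POD_NAME [NAMESPACE]
# pod_grace_period is a hypothetical helper name; NAMESPACE defaults to "default".
pod_grace_period() {
    pod=$1
    ns=${2:-default}
    kubectl get pod "$pod" -n "$ns" \
        -o jsonpath='{.spec.terminationGracePeriodSeconds}'
}
```

Kubernetes uses 30 seconds when no value is set. A much larger value means that the pod can stay in the Terminating state for that long after the job is deleted.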

Solution

Method 1: Unmounting the NFS mounting paths of the pod

  1. Run the following command to check the NFS mounting paths of the pod:
    mount|grep NFS_share_IP_address
    Figure 1 Queried results.

    As shown in the figure, xxx.xxx.xxx.xxx:/data/k8s/run and xxx.xxx.xxx.xxx:/data/k8s/dls_data/public/dataset/resnet50 are the NFS mounting paths of the pod.

  2. Run the following command to unmount each NFS mounting path of the pod:
    umount -f NFS_mounting_path
  3. Run the following command to check whether the NFS mounting paths of the pod have been unmounted:
    mount|grep NFS_share_IP_address
    • If yes, no further action is required.
    • If no, go to Method 2.
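
The three steps of Method 1 can be combined as in the following sketch. It is not the documented procedure itself: the helper names nfs_mount_points and unmount_nfs_share are hypothetical, the mount output is assumed to use the Linux format "SERVER:/export on /mount/point type nfs (...)", and the sketch unmounts by local mount point rather than by the SERVER:/export path.

```shell
# List the local mount points of every file system exported by one NFS server.
# Reads `mount` output on stdin and prints one mount point per line.
# Assumes paths without spaces. nfs_mount_points is a hypothetical helper name.
nfs_mount_points() {
    # Matching on "IP:" avoids confusing 192.0.2.10 with 192.0.2.100.
    awk -v prefix="$1:" 'index($1, prefix) == 1 && $2 == "on" { print $3 }'
}

# Force-unmount everything found above, then print what is still mounted.
# Usage: unmount_nfs_share NFS_share_IP_address
unmount_nfs_share() {
    mount | nfs_mount_points "$1" | while read -r mount_point; do
        umount -f "$mount_point"
    done
    mount | nfs_mount_points "$1"   # empty output: all paths are unmounted
}
```

If unmount_nfs_share still prints mount points after it finishes, the paths could not be unmounted and Method 2 applies.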

Method 2: Deleting the Docker process to which the pod belongs

  1. Run the following command to check the Docker process to which the pod belongs:
    docker ps |grep pod_name
  2. Run the following command to check the files occupied by the Docker process:
    ll /var/lib/docker/containers |grep Docker_process_ID

    The following is an example of the command result:

    root@ubuntu:/data/k8s/run# ll /var/lib/docker/containers |grep 95aeeafe2db8
    drwx------ 4 root root 4096 Jun 24 16:00 95aeeafe2db898065094dd34dbfbeca04734d5248316aa802d43a36b4d8b99df/
  3. Run the following command to delete the files occupied by the Docker process:
    rm -rf /var/lib/docker/containers/95aeeafe2db898065094dd34dbfbeca04734d5248316aa802d43a36b4d8b99df/
  4. Run the following command to query the ID (PID) of the process that still occupies the files:
    lsof |grep 95aeeafe2db8
    Figure 2 Queried results.
  5. Run the following command to stop the process:
    kill -9 PID
  6. Run the following command to check whether the process has been stopped:
    ps -ef | grep PID
    • If yes, go to 7.
    • If no, repeat 4 and 5 to query and stop the process again, and then go to 7.
  7. Run the following command to delete the Docker container to which the pod belongs:
    docker rm 95aeeafe2db8

    After the deletion, wait about 1 minute and then view the pod information again.
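
Because step 3 of Method 2 runs rm -rf on a path built from a container ID, a mistyped ID or directory name can remove the wrong data or nothing at all. The following sketch, which is an addition and not part of the original procedure, resolves a short ID to exactly one directory before anything is deleted. The helper name container_dir is hypothetical, and /var/lib/docker/containers is the directory used in the steps above.

```shell
# Resolve a short container ID to the single matching directory under BASE.
# Usage: container_dir BASE ID_PREFIX
# Prints the directory and returns 0 only when exactly one directory matches.
# container_dir is a hypothetical helper name.
container_dir() {
    base=$1
    prefix=$2
    if [ -z "$prefix" ]; then
        echo "empty container ID prefix" >&2
        return 1
    fi
    count=0
    match=
    for d in "$base"/"$prefix"*/; do
        [ -d "$d" ] || continue        # skip the unexpanded pattern
        count=$((count + 1))
        match=${d%/}
    done
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one match for '$prefix', found $count" >&2
        return 1
    fi
    printf '%s\n' "$match"
}
```

For example, dir=$(container_dir /var/lib/docker/containers 95aeeafe2db8) && rm -rf "$dir" deletes the directory only when the short ID is unambiguous.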