Pod Status Cannot Be Queried When Volcano v1.7.0 Is Used
Symptom
When Volcano v1.7.0 is used and the environment does not have sufficient resources, the pod status cannot be queried by running the kubectl get pod --all-namespaces -o wide command.
Cause Analysis
When Volcano v1.7.0 is used and resources are insufficient, the pod is not created, so its status cannot be queried.
Solution
- Run the following command to query all pod group information and find the pod group corresponding to the job:
kubectl get pg -A
Command output:
NAMESPACE   NAME                                                  STATUS    MINMEMBER   RUNNINGS   AGE
vcjob       mindx-xxx-16-p-4bf232e4-bd48-438d-9089-02bfef354fce   Inqueue   1                      5m32s
vcjob       mindx-xxx-2-p-8bf7f0f6-8a7e-4621-a0d0-cafa56785914    Pending   1                      5m15s
- If the value of STATUS is Inqueue, the pod has been created, and its status can be queried.
- If the value of STATUS is Pending, the pod fails to be created. In this case, proceed to the next step to locate the fault.
- Run the following command to query details about the pod group:
kubectl describe pg -n <namespace> <podgroup-name>
Replace <namespace> and <podgroup-name> with the actual namespace and pod group name.
Example:
kubectl describe pg -n vcjob mindx-xxx-2-p-8bf7f0f6-8a7e-4621-a0d0-cafa56785914
Information similar to the following is displayed, indicating that the queue resource quota is insufficient.
Name:         mindx-xxx-2-p-8bf7f0f6-8a7e-4621-a0d0-cafa56785914
Namespace:    vcjob
Labels:       fault-scheduling=force
              ring-controller.atlas=ascend-{xxx}b
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"batch.volcano.sh/v1alpha1","kind":"Job","metadata":{"annotations":{},"labels":{"fault-scheduling":"force","ring-controller....
API Version:  scheduling.volcano.sh/v1beta1
Kind:         PodGroup
Metadata:
  Creation Timestamp:  2023-07-05T09:00:02Z
  Generation:          7
  Owner References:
    API Version:           batch.volcano.sh/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Job
    Name:                  mindx-xxx-2-p
    UID:                   8bf7f0f6-8a7e-4621-a0d0-cafa56785914
  Resource Version:        17544644
  Self Link:               /apis/scheduling.volcano.sh/v1beta1/namespaces/vcjob/podgroups/mindx-xxx-2-p-8bf7f0f6-8a7e-4621-a0d0-cafa56785914
  UID:                     277cc974-5eec-455f-a860-25d7d19e8335
Spec:
  Min Member:  1
  Min Resources:
    count/pods:                     1
    huawei.com/Ascend910:           2
    Pods:                           1
    requests.huawei.com/Ascend910:  2
  Min Task Member:
    Default - Test:  1
  Queue:             default
Status:
  Conditions:
    Last Transition Time:  2023-07-05T09:05:46Z
    Message:               1/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable
    Reason:                NotEnoughResources
    Status:                True
    Transition ID:         33585c5e-d3ad-4bc4-be0c-c09bea59520e
    Type:                  Unschedulable
  Phase:                   Pending
Events:
  Type     Reason         Age                     From     Message
  ----     ------         ----                    ----     -------
  Warning  Unschedulable  6m22s (x12 over 6m34s)  volcano  0/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable
  Normal   Unschedulable  93s (x280 over 6m34s)   volcano  queue resource quota insufficient    # Insufficient queue resource quota
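The two steps above can be combined into a small shell sketch. The function names pending_podgroups, quota_insufficient, and report_pending_podgroups are hypothetical and are not part of kubectl or Volcano. The sketch assumes that kubectl get pg -A prints STATUS as the third column, as in the sample output above, and that the event message reads exactly "queue resource quota insufficient"; adjust both if your Volcano version prints something different.

```shell
# Hypothetical helpers, not part of kubectl or Volcano.

# Read `kubectl get pg -A` output on stdin and print "<namespace> <name>"
# for each pod group whose third column (STATUS) is Pending.
pending_podgroups() {
    awk '$3 == "Pending" { print $1, $2 }'
}

# Read `kubectl describe pg` output on stdin; succeed (exit status 0) if it
# contains the event message shown in the sample output.
quota_insufficient() {
    grep -q 'queue resource quota insufficient'
}

# Describe every Pending pod group and report whether the queue quota
# message is present. Requires a configured kubectl.
report_pending_podgroups() {
    kubectl get pg -A | pending_podgroups | while read -r ns name; do
        # </dev/null keeps kubectl from consuming the loop's input.
        if kubectl describe pg -n "$ns" "$name" </dev/null | quota_insufficient; then
            echo "$ns/$name: queue resource quota insufficient"
        else
            echo "$ns/$name: Pending for another reason; check kubectl describe pg"
        fi
    done
}
```

After loading these functions into a shell on a node where kubectl is configured, run report_pending_podgroups to get one summary line per Pending pod group. The two filter functions read from standard input only, so they can also be tried against saved command output.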