Pod Status Cannot Be Queried When Volcano v1.7.0 Is Used
2024/10/15
Issue Information
Source | Product Category | Keywords |
---|---|---|
Official | Cluster scheduling | Volcano v1.7.0, Pod status query failure |
Symptom
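When Volcano v1.7.0 is used and the current environment has insufficient resources, the status of the job's Pods cannot be obtained with the kubectl get pod --all-namespaces -o wide command.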
Cause Analysis
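With Volcano v1.7.0, when resources are insufficient, the job's Pods are not created, so there is no Pod whose status can be queried. The job's PodGroup does exist, however, and records why scheduling is blocked, so the PodGroup is where to look.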
Solution
- Run the following command to list all PodGroups and find the PodGroup that corresponds to the job.
kubectl get pg -A
Example output:
NAMESPACE   NAME                                                  STATUS    MINMEMBER   RUNNINGS   AGE
vcjob       mindx-fjq-16-p-4bf232e4-bd48-438d-9089-02bfef354fce   Inqueue   1                      5m32s
vcjob       mindx-fjq-2-p-8bf7f0f6-8a7e-4621-a0d0-cafa56785914    Pending   1                      5m15s
- Run the following command to query the details of that PodGroup.
kubectl describe pg -n <namespace> <podgroup-name>
Replace <namespace> and <podgroup-name> with the actual namespace and PodGroup name.
Example command:
kubectl describe pg -n vcjob mindx-fjq-2-p-8bf7f0f6-8a7e-4621-a0d0-cafa56785914
Example output (this example shows that the queue resource quota is insufficient):
Name:         mindx-fjq-2-p-8bf7f0f6-8a7e-4621-a0d0-cafa56785914
Namespace:    vcjob
Labels:       fault-scheduling=force
              ring-controller.atlas=ascend*****
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"batch.volcano.sh/v1alpha1","kind":"Job","metadata":{"annotations":{},"labels":{"fault-scheduling":"force","ring-controller....
API Version:  scheduling.volcano.sh/v1beta1
Kind:         PodGroup
Metadata:
  Creation Timestamp:  2023-07-05T09:00:02Z
  Generation:          7
  Owner References:
    API Version:           batch.volcano.sh/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Job
    Name:                  mindx-fjq-2-p
    UID:                   8bf7f0f6-8a7e-4621-a0d0-cafa56785914
  Resource Version:  17544644
  Self Link:         /apis/scheduling.volcano.sh/v1beta1/namespaces/vcjob/podgroups/mindx-fjq-2-p-8bf7f0f6-8a7e-4621-a0d0-cafa56785914
  UID:               277cc974-5eec-455f-a860-25d7d19e8335
Spec:
  Min Member:  1
  Min Resources:
    count/pods:                     1
    huawei.com/Ascend***:           2
    Pods:                           1
    requests.huawei.com/Ascend***:  2
  Min Task Member:
    Default - Test:  1
  Queue:             default
Status:
  Conditions:
    Last Transition Time:  2023-07-05T09:05:46Z
    Message:               1/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable
    Reason:                NotEnoughResources
    Status:                True
    Transition ID:         33585c5e-d3ad-4bc4-be0c-c09bea59520e
    Type:                  Unschedulable
  Phase:  Pending
Events:
  Type     Reason         Age                     From     Message
  ----     ------         ----                    ----     -------
  Warning  Unschedulable  6m22s (x12 over 6m34s)  volcano  0/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable
  Normal   Unschedulable  93s (x280 over 6m34s)   volcano  queue resource quota insufficient   # insufficient queue resource quota
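The two steps above can also be scripted. The following Python sketch is an illustration only, not part of the official procedure. It assumes kubectl is installed and configured for the cluster, reads all PodGroups with kubectl get pg -A -o json, selects the PodGroup whose ownerReferences name the given Volcano Job (in the example output, PodGroup mindx-fjq-2-p-8bf7f0f6-... is owned by Job mindx-fjq-2-p), and prints its phase and any active Unschedulable condition. The script name pg_check.py and the helper functions are hypothetical.

```python
"""Hypothetical helper (pg_check.py): find a Volcano job's PodGroup and
report why it is not scheduled.

Assumptions: kubectl is on PATH and configured for the cluster; the
PodGroup's ownerReferences contain the owning Volcano Job, as in the
example `kubectl describe pg` output.
"""
import json
import subprocess
import sys
from typing import List, Optional


def list_podgroups() -> List[dict]:
    """Return all PodGroups in all namespaces as parsed JSON objects."""
    out = subprocess.run(
        ["kubectl", "get", "pg", "-A", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["items"]


def find_podgroup(items: List[dict], namespace: str, job_name: str) -> Optional[dict]:
    """Return the PodGroup in `namespace` owned by the Job named `job_name`."""
    for pg in items:
        meta = pg.get("metadata", {})
        if meta.get("namespace") != namespace:
            continue
        for ref in meta.get("ownerReferences", []):
            if ref.get("kind") == "Job" and ref.get("name") == job_name:
                return pg
    return None


def summarize(pg: dict) -> dict:
    """Extract the phase and the (reason, message) of active Unschedulable conditions."""
    status = pg.get("status", {})
    summary = {
        "name": pg.get("metadata", {}).get("name", ""),
        "phase": status.get("phase", ""),
        "unschedulable": [],
    }
    for cond in status.get("conditions", []):
        if cond.get("type") == "Unschedulable" and cond.get("status") == "True":
            summary["unschedulable"].append((cond.get("reason", ""), cond.get("message", "")))
    return summary


def main(namespace: str, job_name: str) -> int:
    pg = find_podgroup(list_podgroups(), namespace, job_name)
    if pg is None:
        print(f"no PodGroup found for job {namespace}/{job_name}", file=sys.stderr)
        return 1
    info = summarize(pg)
    print(f"PodGroup {info['name']}: phase={info['phase']}")
    for reason, message in info["unschedulable"]:
        print(f"  Unschedulable ({reason}): {message}")
    return 0


if __name__ == "__main__":
    # Query the cluster only when both arguments are supplied.
    if len(sys.argv) == 3:
        sys.exit(main(sys.argv[1], sys.argv[2]))
    print("usage: pg_check.py <namespace> <job-name>", file=sys.stderr)
```

Usage, for the example job: python3 pg_check.py vcjob mindx-fjq-2-p. The script reads only the PodGroup object itself; event messages such as "queue resource quota insufficient" are stored as Kubernetes events rather than in the PodGroup, so they still need to be checked with kubectl describe pg as described above.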