Volcano

  1. Run the following command to check the two Volcano Pods in the K8s cluster. Both Pods must show a STATUS of Running and a READY value of 1/1.

    kubectl get pods -n volcano-system -o wide

    Example output:

    root@ubuntu:/usr/local/bin# kubectl get pods -n volcano-system -o wide
    NAME                                   READY   STATUS    RESTARTS   AGE    IP               NODE         NOMINATED NODE   READINESS GATES
    volcano-controllers-758b6d8bdd-b7g89   1/1     Running   2          166m   192.168.102.69   ubuntu       <none>           <none>
    volcano-scheduler-86775f88f-w649w      1/1     Running   2          166m   192.168.102.91   ubuntu       <none>           <none>
    ...

  2. Log in to the node where the Volcano Pods are running, and use the following commands to view the Volcano component logs.

    • View the volcano-controllers log.
      cat /var/log/mindx-dl/volcano-controller/volcano-controller.log

      If output similar to the following is printed, the component is running normally.

      Log file created at: 2022/10/14 11:22:32
      Running on machine: volcano-controllers-758b6d8bdd-wc49r
      Binary: Built with gc go1.17.8-htrunk4 for linux/arm64
      Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
      I1014 11:22:32.070656       1 garbagecollector.go:91] Starting garbage collector
      I1014 11:22:32.072772       1 queue_controller.go:171] Starting queue controller.
      I1014 11:22:32.652887       1 queue_controller.go:238] Begin execute SyncQueue action for queue default, current status
      I1014 11:22:32.653026       1 queue_controller_action.go:36] Begin to sync queue default.
      I1014 11:22:32.756216       1 queue_controller_action.go:82] End sync queue default.
      I1014 11:22:32.756254       1 queue_controller.go:220] Finished syncing queue default (103.399375ms).
      I1014 11:22:32.972001       1 pg_controller.go:109] PodgroupController is running ......
      I1014 11:22:32.972396       1 job_controller.go:252] JobController is running ......
      I1014 11:22:32.972423       1 job_controller.go:256] worker 1 start ......
      I1014 11:22:32.972426       1 job_controller.go:256] worker 0 start ......
      I1014 11:22:32.972426       1 job_controller.go:256] worker 2 start ......
      ...
    • View the volcano-scheduler log.
      cat /var/log/mindx-dl/volcano-scheduler/volcano-scheduler.log

      If output similar to the following is printed, the component is running normally.

      Log file created at: 2022/10/14 11:22:32
      Running on machine: volcano-scheduler-86775f88f-6dtqf
      Binary: Built with gc go1.17.8-htrunk4 for linux/arm64
      Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
      ...
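
The two checks above can also be scripted. The following is a minimal sketch, not part of Volcano or MindX DL: the function names volcano_pods_ready and log_contains_all are hypothetical helpers introduced here for illustration. The first reads the output of the kubectl command from step 1 on standard input and succeeds only if both Pods are listed with READY 1/1 and STATUS Running. The second succeeds only if a log file contains every message passed to it.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with Volcano or MindX DL.

# Reads `kubectl get pods -n volcano-system` output on stdin.
# Exit status 0 only if a volcano-controllers row and a volcano-scheduler row
# are both present and every volcano-* row shows READY 1/1 and STATUS Running.
volcano_pods_ready() {
    awk '
        $1 ~ /^volcano-controllers-/ { ctrl = 1 }
        $1 ~ /^volcano-scheduler-/   { sched = 1 }
        $1 ~ /^volcano-/ && ($2 != "1/1" || $3 != "Running") { bad = 1 }
        END { exit (ctrl && sched && !bad) ? 0 : 1 }
    '
}

# Usage: log_contains_all LOG_FILE MESSAGE...
# Exit status 0 only if LOG_FILE is readable and contains every MESSAGE
# as a fixed string (no regular expression matching).
log_contains_all() {
    log_file=$1
    shift
    [ -r "$log_file" ] || return 1
    for needle in "$@"; do
        grep -qF -- "$needle" "$log_file" || return 1
    done
    return 0
}
```

Assuming the functions have been loaded into the current shell, they might be used as follows. The messages are taken from the sample volcano-controllers log above; which messages are worth checking is a judgment call.

    kubectl get pods -n volcano-system -o wide | volcano_pods_ready && echo "Volcano Pods are ready"
    log_contains_all /var/log/mindx-dl/volcano-controller/volcano-controller.log "Starting queue controller" "JobController is running" && echo "volcano-controllers log looks normal"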