HCCL-Controller

  1. Run the following command to check the HCCL-Controller Pod in the K8s cluster. The Pod's STATUS must be Running and its READY value must be 1/1.

    kubectl get pods -n mindx-dl -o wide

    Example output:

    root@ubuntu:/usr/local/bin# kubectl get pods -n mindx-dl -o wide
    NAME                                     READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
    hccl-controller-7d5fdf7944-ln79f         1/1     Running   0          6m52s   192.168.102.67   ubuntu       <none>           <none>
    ...

  2. Run the following command to view the HCCL-Controller logs in the K8s cluster.

    kubectl logs -n mindx-dl {name of the HCCL-Controller Pod}

    Output similar to the following indicates that the component is running normally.

    root@ubuntu:~# kubectl logs -n mindx-dl hccl-controller-7f98b8c655-lxvq8 
    [INFO]     2022/10/24 18:07:50.733962 1       hwlog@v0.0.3/api.go:91    hccl-controller.log's logger init success
    [INFO]     2022/10/24 18:07:50.762904 1       hccl-controller/main.go:68    hccl controller starting and the version is v5.0.RC1_linux-x86_64
    [WARN]     2022/10/24 18:07:50.764127 1       K8stool@v0.0.3/self_K8s_client.go:153    Neither --kubeconfig nor --master was specified.Using the inClusterConfig.  This might not work.
    [INFO]     2022/10/24 18:07:50.768273 1       controller/controller.go:40    Creating event broadcaster
    [INFO]     2022/10/24 18:07:50.769178 1       agent/businessagent.go:81    start informer factory
    [INFO]     2022/10/24 18:07:50.769236 1       agent/businessagent.go:83    waiting for informer caches to sync
    [INFO]     2022/10/24 18:07:50.869912 1       agent/businessagent.go:105    Starting workers
    [INFO]     2022/10/24 18:07:50.870017 1       agent/businessagent.go:109    Started workers
    ...
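
The two checks above can also be scripted. The following is a minimal sketch, not part of the product: the helper names check_hccl_pod and check_hccl_log are hypothetical, and treating the "Started workers" line as the sign of a healthy start is an assumption based on the sample log above. Both helpers read text from standard input, so they can be tried on saved output without access to a cluster.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of
#   kubectl get pods -n mindx-dl -o wide
# on stdin. Succeeds only if at least one hccl-controller Pod is listed
# and every such Pod shows READY 1/1 and STATUS Running.
check_hccl_pod() {
    awk '
        $1 ~ /^hccl-controller-/ {
            found = 1
            if ($2 != "1/1" || $3 != "Running") bad = 1
        }
        END {
            if (found && !bad) exit 0
            exit 1
        }
    '
}

# Hypothetical helper: reads HCCL-Controller log text on stdin.
# Assumption: a "Started workers" line means startup completed,
# as in the sample log shown in step 2.
check_hccl_log() {
    grep -q 'Started workers'
}

# Example usage against a live cluster (commented out so that the
# sketch itself does not require kubectl):
#   kubectl get pods -n mindx-dl -o wide | check_hccl_pod && echo "Pod OK"
#   kubectl logs -n mindx-dl {name of the HCCL-Controller Pod} | check_hccl_log && echo "Log OK"
```

Each helper returns exit status 0 on success and a non-zero status otherwise, so the results can be chained with && or used in an if statement.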