Resilience-Controller

  1. Run the following command to check the Resilience-Controller Pod in the K8s cluster. The Pod's STATUS must be Running and its READY value must be 1/1.

    kubectl get pods -n mindx-dl -o wide

    Example output:

    root@ubuntu:/usr/local/bin# kubectl get pods -n mindx-dl -o wide
    NAME                                           READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
    resilience-controller-76f4476bb5-fs986         1/1     Running   0          6m52s   192.168.102.67   ubuntu       <none>           <none>
    ...

  2. Run the following command to view the Resilience-Controller logs in the K8s cluster.

    kubectl logs -n mindx-dl {Resilience-Controller Pod name}

    If output similar to the following appears, the component is running normally.

    root@ubuntu:~# kubectl logs -n mindx-dl resilience-controller-76f4476bb5-fs986 
    [INFO]     2022/11/17 17:18:46.697010 1       hwlog@v0.0.0/api.go:96    run.log's logger init success
    [INFO]     2022/11/17 17:18:46.697139 1       cmd/main.go:57    resilience-controller starting and the version is v5.0.RC1_linux-x86_64
    [INFO]     2022/11/17 17:18:47.227913 1       K8stool@v0.0.0/self_K8s_client.go:116    start to decrypt cfg
    [INFO]     2022/11/17 17:18:47.297559 1       K8stool@v0.0.0/self_K8s_client.go:125    Config loaded from file: ****tc/mindx-dl/resilience-controller/.config/config6
    [INFO]     2022/11/17 17:18:47.300066 1       elastic/controller.go:45    Setting up elastic event handlers
    [INFO]     2022/11/17 17:18:47.300179 1       elastic/controller.go:63    Starting elastic controller, waiting for informer caches to sync
    [INFO]     2022/11/17 17:18:47.401246 1       cmd/main.go:80    elastic controller started
    ...
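
The two checks above can also be scripted. The following is a minimal sketch, not part of MindX DL: the helper names `pod_is_healthy` and `log_shows_started` are hypothetical, and treating the `elastic controller started` line from the sample log as the success marker is an assumption. Both helpers only read text on standard input, so they can be tried on saved output before being used against a live cluster.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not shipped with MindX DL.

# pod_is_healthy PREFIX
# Reads `kubectl get pods` output on stdin. Succeeds only if at least one Pod
# whose name starts with PREFIX is listed and every such Pod shows
# READY 1/1 and STATUS Running.
pod_is_healthy() {
    awk -v prefix="$1" '
        index($1, prefix) == 1 {
            found = 1
            if ($2 != "1/1" || $3 != "Running") bad = 1
        }
        END { exit ((found && !bad) ? 0 : 1) }
    '
}

# log_shows_started
# Reads `kubectl logs` output on stdin. Succeeds if the startup line seen in
# the sample log is present (assumed here to be a sufficient marker).
log_shows_started() {
    grep -q "elastic controller started"
}

# Usage against a live cluster (requires kubectl access to the mindx-dl namespace):
#   kubectl get pods -n mindx-dl -o wide | pod_is_healthy resilience-controller \
#       && echo "pod OK"
#   pod=$(kubectl get pods -n mindx-dl -o name | grep resilience-controller | head -n 1)
#   kubectl logs -n mindx-dl "$pod" | log_shows_started && echo "log OK"
```

Matching on the name prefix avoids hard-coding the generated Pod suffix (such as `76f4476bb5-fs986`), which changes each time the Pod is recreated.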