Working with Prometheus

This section describes how to install and deploy Prometheus and how to view resource monitoring data in Prometheus. For details about the metrics, see Prometheus Metrics API.

Directly Interconnecting with Prometheus

  1. Go to the mindcluster-deploy repository, switch to the branch that matches your version as listed in mindcluster-deploy Version Description, and obtain the prometheus.yaml file from the samples/utils/prometheus/base directory.
  2. Run the following command on the management node to obtain the image:
    docker pull prom/prometheus:v2.10.0
    • Before obtaining the image, ensure that you can access the Internet.
    • If you do not use the prometheus.yaml file provided by the cluster scheduling components, add the app: prometheus field to your own YAML file at the same position as in the provided file. Otherwise, the NPU Exporter connection may time out.
  3. As required, modify the default scrape configuration for NPU Exporter metrics in prometheus.yaml. In the following excerpt, the kubernetes-npu-exporter job is that configuration:
    ...
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      namespace: kube-system
    data:
      prometheus.yml: |
        global:
          scrape_interval:     15s
          evaluation_interval: 15s
        scrape_configs:
    ...
        - job_name: 'kubernetes-npu-exporter'
          kubernetes_sd_configs:
          - role: pod
          scheme: http
          relabel_configs:
          - action: keep
            source_labels: [__meta_kubernetes_namespace]
            regex: npu-exporter
          - source_labels: [__meta_kubernetes_pod_node_name]
            target_label: job
            replacement: ${1}
    ...
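The effect of the two relabel rules in the kubernetes-npu-exporter job can be illustrated with a small simulation. This is a simplified sketch, not Prometheus's own implementation: the function name relabel is hypothetical, and only the two rules shown in the excerpt are modeled (a keep rule on the pod namespace, then a default replace rule that copies the node name into the job label).

```python
import re


def relabel(labels):
    """Apply the two relabel rules from the kubernetes-npu-exporter job.

    Returns the rewritten label dict, or None if the target is dropped.
    Hypothetical helper for illustration only.
    """
    # Rule 1 (action: keep): keep the target only if the pod namespace
    # matches the regex in full; Prometheus anchors relabel regexes.
    namespace = labels.get("__meta_kubernetes_namespace", "")
    if re.fullmatch(r"npu-exporter", namespace) is None:
        return None

    # Rule 2 (default action replace, default regex (.*)): write the
    # first capture group (${1}) of the node name into the job label.
    node_name = labels.get("__meta_kubernetes_pod_node_name", "")
    match = re.fullmatch(r"(.*)", node_name, flags=re.DOTALL)
    result = dict(labels)
    if match.group(1):
        result["job"] = match.group(1)
    else:
        # An empty replacement removes the target label.
        result.pop("job", None)
    return result


# Example target discovered with role: pod (node name taken from this document).
pod = {
    "__meta_kubernetes_namespace": "npu-exporter",
    "__meta_kubernetes_pod_node_name": "node133",
    "job": "kubernetes-npu-exporter",
}
rewritten = relabel(pod)
```

With these rules, pods outside the npu-exporter namespace are not scraped by this job, and each scraped series carries the node name as its job label.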
  4. Run the following command to add a label to the management node:
    kubectl label nodes <Host name of the management node> masterselector=dls-master-node --overwrite=true
  5. Upload prometheus.yaml to any directory on the management node used in step 2.
  6. Run the following command in the directory where the prometheus.yaml file is stored to start the Prometheus service:
    kubectl apply -f prometheus.yaml

    If the following information is displayed, the installation is successful:

    [root@centos check_env]# kubectl apply -f prometheus.yaml 
    clusterrole.rbac.authorization.k8s.io/prometheus created
    serviceaccount/prometheus created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus created
    service/prometheus created
    deployment.apps/prometheus created
    configmap/prometheus-config created
    
  7. Run the following command to check whether Prometheus is started successfully:
    kubectl get pods --all-namespaces | grep prometheus

    The following is a startup example. If Running is displayed, Prometheus is started successfully.

    kube-system      prometheus-58c69548b4-rhxsc                1/1     Running            0          6d14h
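If the check needs to be scripted, the command output can be parsed instead of read by eye. The following is a minimal sketch that assumes the default column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...). The function name prometheus_pods_ready is hypothetical.

```python
def prometheus_pods_ready(kubectl_output):
    """Return True if every prometheus pod line is Running and fully ready.

    Expects the text printed by `kubectl get pods --all-namespaces`.
    Returns False when no prometheus pod is listed at all.
    """
    found = False
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and pods that are not Prometheus.
        if len(fields) < 4 or "prometheus" not in fields[1]:
            continue
        found = True
        ready, total = fields[2].split("/")
        if fields[3] != "Running" or ready != total:
            return False
    return found


# Sample line copied from the startup example above.
sample = "kube-system      prometheus-58c69548b4-rhxsc                1/1     Running            0          6d14h"
ok = prometheus_pods_ready(sample)
```

In practice the input text could come from subprocess.run(["kubectl", "get", "pods", "--all-namespaces"], capture_output=True, text=True).stdout.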
    
  8. Log in to Prometheus and view the monitoring data.
    1. Open the browser.
    2. Enter http://<IP address of the management node>:<port number> in the browser address box and press Enter.

      The port number is the value of the nodePort field in the prometheus.yaml file, which is 30003 by default.

    3. Choose NPU-related labels to view corresponding data.
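The same data can also be reached without a browser through the Prometheus HTTP API. The sketch below lists metric names using the /api/v1/label/__name__/values endpoint and keeps those that look NPU-related. Two things are assumptions, not taken from this document: that NPU Exporter metric names contain the string "npu", and the helper names themselves. The default port 30003 is the nodePort default mentioned above.

```python
import json
import urllib.request


def metric_names_url(host, port=30003):
    """Build the URL of the Prometheus endpoint that lists metric names."""
    return f"http://{host}:{port}/api/v1/label/__name__/values"


def filter_npu_names(payload):
    """Pick metric names containing 'npu' from an API response dict.

    Assumption: NPU Exporter metric names contain the substring 'npu'.
    """
    if payload.get("status") != "success":
        return []
    return sorted(n for n in payload.get("data", []) if "npu" in n.lower())


def list_npu_metrics(host, port=30003, timeout=5):
    """Fetch all metric names from Prometheus and keep the NPU-related ones."""
    url = metric_names_url(host, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return filter_npu_names(json.load(resp))
```

Calling list_npu_metrics with the IP address of the management node returns the candidate metric names, which can then be entered in the Prometheus expression box.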

Interconnecting with Prometheus via Prometheus Operator

  1. Run the following command to obtain the source code of Prometheus Operator:
    git clone https://github.com/prometheus-operator/kube-prometheus.git
    • Check out the Prometheus Operator source code branch that matches your Kubernetes version according to the compatibility list in the official documentation.
    • If Prometheus Operator and Prometheus have been installed, proceed to Step 4.
  2. Install Prometheus Operator.
    1. Run the following command to install Prometheus Operator:
      kubectl create -f manifests/setup/
      If the following information is displayed, Prometheus Operator is successfully installed:
      namespace/monitoring created
      ...
      deployment.apps/prometheus-operator created
      service/prometheus-operator created
      serviceaccount/prometheus-operator created
      
    2. Run the following command to check whether Prometheus Operator is started successfully:
      kubectl get pod -A -o wide | grep prometheus-operator
      The following is a startup example. If Running is displayed, Prometheus Operator is started successfully.
      monitoring     prometheus-operator-7649c7454f-wp84n       2/2     Running   0          58s   192.168.xx.xx   node133   <none>           <none>
      
  3. Install Prometheus.
    1. Go to the mindcluster-deploy repository, switch to the branch that matches your version as listed in mindcluster-deploy Version Description, and obtain the prometheus.yaml file from the samples/utils/prometheus/base directory.
    2. Upload the prometheus.yaml file obtained in substep 1 to any directory in the environment.
    3. In the directory where prometheus.yaml is stored, run the following command to install Prometheus:
      kubectl apply -f prometheus.yaml

      If the following information is displayed, the installation is successful:

      service/prometheus created
      prometheus.monitoring.coreos.com/prometheus created
      serviceaccount/prometheus-service-account created
      clusterrole.rbac.authorization.k8s.io/prometheus-cluster-role created
      clusterrolebinding.rbac.authorization.k8s.io/prometheus-cluster-role-binding created
      
    4. Run the following command to check whether Prometheus is started successfully:
      kubectl get pods --all-namespaces | grep prometheus

      Command output:

      kube-system    prometheus-prometheus-0                    2/2     Running   1          3m47s   192.168.xx.xx   node133   <none>           <none>
      monitoring     prometheus-operator-7649c7454f-wp84n       2/2     Running   0          5m52s   192.168.xx.xx   node133   <none>           <none>
      
  4. Interconnect NPU Exporter with Prometheus via Prometheus Operator.
    1. Obtain npu-exporter-svc.yaml and servicemonitor.yaml.

      If Prometheus has already been installed, the following label in the servicemonitor.yaml file must match the matchLabels configured under serviceMonitorSelector in Prometheus.

      ...
        labels:                               
          serviceMonitorSelector: prometheus
      ...

      You can run the following command to query matchLabels:

      kubectl describe pod <pod-name>
    2. (Optional) Modify the NPU Exporter label as required.
      1. In the npu-exporter-svc.yaml file, modify the label as required.
        apiVersion: v1
        kind: Service
        metadata:
          namespace: npu-exporter  # The namespace is npu-exporter.
          name: npu-exporter             
          labels:                        
            app: npu-exporter-svc   # Label of NPU Exporter service
        spec:
          type: ClusterIP
          ports:
          - port: 8082             # Service port number of NPU Exporter
            targetPort: 8082      
        ...
      2. In the servicemonitor.yaml file, modify the NPU Exporter label as required. It must be the same as the label in the npu-exporter-svc.yaml file.
        ...
        spec:
          endpoints:
          - interval: 10s
            targetPort: 8082                                 # Service port number of NPU Exporter
            path: /metrics
          namespaceSelector:
            matchNames:
            - npu-exporter                                   # The namespace is npu-exporter.
          selector:
            matchLabels:                                     
              app: npu-exporter-svc                          # Label of the NPU Exporter service
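Because the label and namespace must agree between the two files, a quick consistency check before applying them can catch a mismatch early. The following sketch works on the manifests as plain dictionaries that mirror the excerpts above (how they are loaded from YAML is left open). The helper names selector_matches and check_pairing are hypothetical.

```python
def selector_matches(match_labels, service_labels):
    """Return True if every matchLabels pair is present on the Service."""
    return all(service_labels.get(k) == v for k, v in match_labels.items())


def check_pairing(service, monitor):
    """List mismatches between a Service and a ServiceMonitor (as dicts)."""
    problems = []
    svc_meta = service["metadata"]
    spec = monitor["spec"]
    if not selector_matches(spec["selector"]["matchLabels"], svc_meta.get("labels", {})):
        problems.append("selector.matchLabels does not match the Service labels")
    if svc_meta["namespace"] not in spec["namespaceSelector"]["matchNames"]:
        problems.append("Service namespace is not listed in namespaceSelector.matchNames")
    return problems


# Dictionaries mirroring npu-exporter-svc.yaml and servicemonitor.yaml above.
service = {
    "metadata": {
        "namespace": "npu-exporter",
        "name": "npu-exporter",
        "labels": {"app": "npu-exporter-svc"},
    },
}
monitor = {
    "spec": {
        "namespaceSelector": {"matchNames": ["npu-exporter"]},
        "selector": {"matchLabels": {"app": "npu-exporter-svc"}},
    },
}
problems = check_pairing(service, monitor)
```

An empty list means the ServiceMonitor selects the Service; any entry describes which of the two conditions failed.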
    3. Run the following commands in sequence to interconnect NPU Exporter with Prometheus using Prometheus Operator:
      kubectl apply -f servicemonitor.yaml
      kubectl apply -f npu-exporter-svc.yaml
    4. Run the following command to check whether NPU Exporter is successfully interconnected with Prometheus Operator:
      kubectl get svc -A | grep npu-exporter
      If the following information is displayed, NPU Exporter is successfully interconnected with Prometheus Operator.
      npu-exporter   npu-exporter          ClusterIP   10.98.xx.xx     <none>        8082/TCP                       31s
    5. Run the following command to check whether Prometheus Operator is successfully interconnected with Prometheus:
      kubectl get servicemonitor -A | grep npu-exporter
      If the following information is displayed, Prometheus Operator is successfully interconnected with Prometheus.
      kube-system   npu-exporter   55s
  5. Log in to Prometheus and view the monitoring data.
    1. Open the browser.
    2. Enter http://<IP address of the management node>:<port number> in the browser address box and press Enter.

      The port number is the value of the nodePort field in the prometheus.yaml file, which is 30003 by default.

    3. Choose NPU-related labels to view corresponding data.
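To confirm from a script that Prometheus is actually scraping NPU Exporter, the /api/v1/targets endpoint of the Prometheus HTTP API can be inspected. The sketch below filters the active targets by namespace. It assumes that the scraped targets carry a namespace label with the value npu-exporter, which is typical for scrape configurations generated from a ServiceMonitor but is not stated in this document. The helper names are hypothetical.

```python
import json
import urllib.request


def npu_exporter_targets(payload, namespace="npu-exporter"):
    """Return (scrapeUrl, health) pairs for active targets in a namespace.

    Assumption: each target has a 'namespace' label set by Prometheus.
    """
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl"), t.get("health"))
        for t in targets
        if t.get("labels", {}).get("namespace") == namespace
    ]


def fetch_targets(host, port=30003, timeout=5):
    """Download the target list from the Prometheus HTTP API."""
    url = f"http://{host}:{port}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

A typical call would be npu_exporter_targets(fetch_targets(host)), where host is the IP address of the management node. A health value of "up" for each returned target indicates that the scrape succeeds.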