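This section describes how to install and deploy the Prometheus software and how to use Prometheus to view resource monitoring data. For a description of the data, see the Prometheus Metrics interface section.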
docker pull prom/prometheus:v2.10.0
...
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
    ...
    - job_name: 'kubernetes-npu-exporter'
      kubernetes_sd_configs:
      - role: pod
      scheme: http
      relabel_configs:
      - action: keep
        source_labels: [__meta_kubernetes_namespace]
        regex: npu-exporter
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: job
        replacement: ${1}
...
kubectl label nodes <management-node-hostname> masterselector=dls-master-node --overwrite=true
kubectl apply -f prometheus.yaml
The following output indicates that the installation succeeded.
[root@centos check_env]# kubectl apply -f prometheus.yaml
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
deployment.apps/prometheus created
configmap/prometheus-config created
kubectl get pods --all-namespaces | grep prometheus
Example output is shown below. A status of Running indicates that Prometheus started successfully.
kube-system prometheus-58c69548b4-rhxsc 1/1 Running 0 6d14h
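The status check above can also be scripted. The following is a minimal sketch rather than part of the documented procedure: the function name `all_prometheus_running` is hypothetical, and it assumes the default column order of `kubectl get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, ...), so that STATUS is the fourth field.

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on stdin.
# Succeeds only if at least one pod whose name contains "prometheus" is listed
# and every such pod reports the Running status.
# Assumption: STATUS is the fourth whitespace-separated column.
all_prometheus_running() {
  awk '
    $2 ~ /prometheus/ {
      found = 1
      if ($4 != "Running") bad = 1
    }
    END {
      if (found && !bad) exit 0
      exit 1
    }
  '
}

# Usage (requires access to the cluster):
#   kubectl get pods --all-namespaces | all_prometheus_running \
#     && echo "Prometheus is running"
```

The function only inspects text, so it can be tried against saved command output before being pointed at a live cluster.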
In the prometheus.yaml file, find the nodePort field. Its value is the port number of the Prometheus service, which is 30003 by default.
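The port lookup can be done from the command line instead of opening the file. This is a minimal sketch under stated assumptions: the helper name `get_node_port` is hypothetical, the manifest is assumed to contain a line of the form `nodePort: <number>`, and only the first match is reported.

```shell
# Hypothetical helper: prints the value of the first nodePort field found in
# the manifest passed as the first argument.
# Assumption: the field appears as "nodePort: <number>", optionally preceded
# by indentation or a YAML list dash.
get_node_port() {
  sed -n 's/^[[:space:]-]*nodePort:[[:space:]]*\([0-9]\{1,\}\).*/\1/p' "$1" | head -n 1
}

# Usage:
#   port=$(get_node_port prometheus.yaml)
#   echo "Prometheus service port: ${port}"
```

If the file defines several NodePort services, the first match may not be the Prometheus one, so the result should be checked against the file.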
git clone https://github.com/prometheus-operator/kube-prometheus.git
kubectl create -f manifests/setup/
namespace/monitoring created
...
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
kubectl get pod -A -o wide|grep prometheus-operator
monitoring prometheus-operator-7649c7454f-wp84n 2/2 Running 0 58s 192.168.xx.xx node133 <none> <none>
kubectl apply -f prometheus.yaml
The following output indicates that the installation succeeded.
service/prometheus created
prometheus.monitoring.coreos.com/prometheus created
serviceaccount/prometheus-service-account created
clusterrole.rbac.authorization.k8s.io/prometheus-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-cluster-role-binding created
kubectl get pods --all-namespaces | grep prometheus
Example output:
kube-system prometheus-prometheus-0 2/2 Running 1 3m47s 192.168.xx.xx node133 <none> <none>
monitoring prometheus-operator-7649c7454f-wp84n 2/2 Running 0 5m52s 192.168.xx.xx node133 <none> <none>
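If Prometheus is already installed, make sure that the following field in servicemonitor.yaml matches the matchLabels label configured under serviceMonitorSelector in the deployed Prometheus.
...
  labels:
    serviceMonitorSelector: prometheus
...
To query the matchLabels label, run the following command.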
kubectl describe pod <pod-name>
apiVersion: v1
kind: Service
metadata:
  namespace: npu-exporter    # The namespace is npu-exporter
  name: npu-exporter
  labels:
    app: npu-exporter-svc    # Label of the NPU Exporter service
spec:
  type: ClusterIP
  ports:
  - port: 8082               # Service port of NPU Exporter
    targetPort: 8082
...
...
spec:
  endpoints:
  - interval: 10s
    targetPort: 8082         # Service port of NPU Exporter
    path: /metrics
  namespaceSelector:
    matchNames:
    - npu-exporter           # The namespace is npu-exporter
  selector:
    matchLabels:
      app: npu-exporter-svc  # Label of the NPU Exporter service
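A ServiceMonitor selects a Service by label, so the `app` label in the two manifests has to be identical. The check below is a minimal sketch, not part of the documented procedure: the helper names `first_app_label` and `labels_match` are hypothetical, and each file is assumed to contain exactly one relevant `app:` line, as in the fragments above.

```shell
# Hypothetical helper: prints the value of the first "app:" label in a manifest,
# ignoring any trailing "# comment".
first_app_label() {
  sed -n 's/^[[:space:]]*app:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeeds when both manifests carry the same non-empty
# "app" label value.
labels_match() {
  svc_label=$(first_app_label "$1")
  sm_label=$(first_app_label "$2")
  [ -n "$svc_label" ] && [ "$svc_label" = "$sm_label" ]
}

# Usage (file names as used in this section):
#   labels_match npu-exporter-svc.yaml servicemonitor.yaml \
#     || echo "app label differs between Service and ServiceMonitor"
```

This is a plain text comparison, not a YAML parser, so it is only meant as a quick sanity check before the manifests are applied.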
kubectl apply -f servicemonitor.yaml
kubectl apply -f npu-exporter-svc.yaml
kubectl get svc -A|grep npu-exporter
npu-exporter npu-exporter ClusterIP 10.98.xx.xx <none> 8082/TCP 31s
kubectl get servicemonitor -A|grep npu-exporter
kube-system npu-exporter 55s
In the prometheus.yaml file, find the nodePort field. Its value is the port number of the Prometheus service, which is 30003 by default.
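Once the port is known, Prometheus can be queried over its HTTP API as well as through the web page. The following is a minimal sketch under stated assumptions: the helper name `prometheus_query_endpoint` is hypothetical, the service is assumed to be reachable over plain HTTP on a node IP at the nodePort, and `up` is used as the sample query because it is a built-in Prometheus metric.

```shell
# Hypothetical helper: prints the instant-query endpoint of the Prometheus
# HTTP API for a given host and port.
# Assumption: Prometheus is exposed over plain HTTP at <host>:<port>.
prometheus_query_endpoint() {
  printf 'http://%s:%s/api/v1/query\n' "$1" "$2"
}

# Usage (replace <node-ip> with the IP address of a cluster node):
#   curl -sG "$(prometheus_query_endpoint <node-ip> 30003)" \
#     --data-urlencode 'query=up'
```

Passing the query through `--data-urlencode` avoids having to escape PromQL expressions by hand.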