NPU-Exporter Prometheus Metrics API
Function Description
Provides the Metrics API that Prometheus calls to collect NPU-Exporter data.
For details about how to integrate Prometheus, see Deploying Prometheus. After Prometheus is started, it can automatically connect to NPU-Exporter.
URL
GET https://ip:port/metrics
For security purposes, NPU-Exporter listens on a container-level port (8082 by default). The IP address in the request URL is the IP address of the Kubernetes container. If the Kubernetes network plugin is Calico, a network policy is set to allow access from applications whose label is app=prometheus.
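As a minimal sketch of the request described above, the following Python snippet builds the endpoint URL and issues the GET call using only the standard library. The helper names `metrics_url` and `fetch_metrics`, the timeout, and the example IP address are assumptions for illustration; only the `/metrics` path and the default port 8082 come from this page.

```python
import urllib.request

DEFAULT_PORT = 8082  # default container-level port stated on this page


def metrics_url(host: str, port: int = DEFAULT_PORT, scheme: str = "https") -> str:
    """Build the metrics endpoint URL for a container IP (hypothetical helper)."""
    return f"{scheme}://{host}:{port}/metrics"


def fetch_metrics(host: str, port: int = DEFAULT_PORT, timeout: float = 5.0) -> str:
    """Send GET /metrics and return the response body as text.

    Assumes the server certificate is trusted by the local Python
    installation; otherwise urlopen raises an error.
    """
    with urllib.request.urlopen(metrics_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_metrics("10.0.0.12")` would request `https://10.0.0.12:8082/metrics`, where `10.0.0.12` is a placeholder for the Kubernetes container IP address. The call only succeeds from a client that the network policy permits.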
Request Parameters
None
Response Description
The data is returned in the Prometheus-specific format. A sample response is shown below, followed by a table of the related metrics. Metrics provided by Prometheus itself are not described here.
...
# HELP machine_npu_nums Amount of npu installed on the machine.
# TYPE machine_npu_nums gauge
machine_npu_nums 8
# HELP npu_chip_info_error_code the npu error code
# TYPE npu_chip_info_error_code gauge
npu_chip_info_error_code{id="0"} 0 1613993498553
npu_chip_info_error_code{id="1"} 0 1613993498588
npu_chip_info_error_code{id="2"} 0 1613993498615
npu_chip_info_error_code{id="3"} 0 1613993498645
npu_chip_info_error_code{id="4"} 0 1613993498676
npu_chip_info_error_code{id="5"} 0 1613993498685
npu_chip_info_error_code{id="6"} 0 1613993498715
npu_chip_info_error_code{id="7"} 0 1613993498742
# HELP npu_chip_info_hbm_total_memory the npu hbm total memory
# TYPE npu_chip_info_hbm_total_memory gauge
npu_chip_info_hbm_total_memory{id="0"} 32255 1613993498553
npu_chip_info_hbm_total_memory{id="1"} 32255 1613993498588
npu_chip_info_hbm_total_memory{id="2"} 32255 1613993498615
...
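The sample lines above follow a simple shape: a metric name, optional labels in braces, a value, and an optional timestamp in milliseconds. The sketch below parses that shape in Python. It is a simplified reader for output like the sample, not a full parser for the Prometheus format (for instance, it does not handle escaped quotes or braces inside label values), and the names `Sample` and `parse_metrics` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# name, optional {labels}, value, optional millisecond timestamp
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


class Sample(NamedTuple):
    name: str
    labels: dict
    value: float
    timestamp_ms: Optional[int]


def parse_metrics(text: str) -> list:
    """Return the sample lines of a metrics response as Sample tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line, e.g. an elision marker
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        ts = match.group("ts")
        samples.append(
            Sample(
                name=match.group("name"),
                labels=labels,
                value=float(match.group("value")),
                timestamp_ms=int(ts) if ts else None,
            )
        )
    return samples
```

Applied to the sample, the line `npu_chip_info_error_code{id="0"} 0 1613993498553` yields a `Sample` with name `npu_chip_info_error_code`, labels `{"id": "0"}`, value `0.0`, and timestamp `1613993498553`.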
| Metric | Description | Unit |
|---|---|---|
| machine_npu_nums | Number of Ascend AI Processors | - |
| npu_chip_info_error_code | Error code of an Ascend AI Processor | - |
| npu_chip_info_name | Name and ID of an Ascend AI Processor | - |
| npu_chip_info_health_status | Health status of an Ascend AI Processor | |
| npu_chip_info_power | Power consumption of an Ascend AI Processor. For 910 and 310, this parameter refers to processor power consumption. For 310P, it refers to board card power consumption. | W |
| npu_chip_info_temperature | Temperature of an Ascend AI Processor | °C |
| npu_chip_info_used_memory | Used memory of an Ascend AI Processor | MB |
| npu_chip_info_total_memory | Total memory of an Ascend AI Processor | MB |
| npu_chip_info_hbm_used_memory | Used HBM memory dedicated for the Ascend AI Processor | MB |
| npu_chip_info_hbm_total_memory | Total HBM memory dedicated for the Ascend AI Processor | MB |
| npu_chip_info_utilization | AI Core usage of an Ascend AI Processor | % |
| npu_chip_info_voltage | Voltage of an Ascend AI Processor | V |
| npu_exporter_version_info | NPU-Exporter version information | - |
| npu_container_info | NPU container information. The output contains the following fields: | - |
| container_npu_total_memory | Total memory size of the NPU with container information. Only the entire card is supported. The container information contains the following fields: | MB |
| container_npu_used_memory | Used memory of the NPU with container information. Only the entire card is supported. The container information contains the following fields: | MB |
| container_npu_utilization | NPU usage with container information. Only the entire card is supported. The container information contains the following fields: | % |
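Because npu_chip_info_hbm_used_memory and npu_chip_info_hbm_total_memory are both reported in MB, a per-chip HBM usage percentage can be derived from them. The sketch below shows the arithmetic; the function names and the dictionary layout (chip `id` label mapped to a value in MB) are assumptions for illustration, and the used-memory figure in the usage note is made up.

```python
def hbm_usage_percent(used_mb: float, total_mb: float) -> float:
    """Return HBM usage as a percentage of total HBM memory."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return 100.0 * used_mb / total_mb


def per_chip_hbm_usage(used_by_id: dict, total_by_id: dict) -> dict:
    """Compute usage for every chip id present in both inputs.

    Chips whose total is missing or not positive are left out.
    """
    return {
        chip_id: hbm_usage_percent(used_mb, total_by_id[chip_id])
        for chip_id, used_mb in used_by_id.items()
        if total_by_id.get(chip_id, 0) > 0
    }
```

With the total of 32255 MB shown in the sample response and a hypothetical used value of 16127.5 MB, `hbm_usage_percent(16127.5, 32255)` gives 50.0.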
Status Code
| Status Code | Description |
|---|---|
| 200 | Normal |
| 307 | Temporary redirection |
| 500 | Internal server error |
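One way a client might act on these status codes is sketched below. The `ScrapeOutcome` names and the decision to retry after a 500 response are assumptions for illustration, not behavior defined by NPU-Exporter.

```python
from enum import Enum


class ScrapeOutcome(Enum):
    OK = "ok"                            # 200: parse the response body
    FOLLOW_REDIRECT = "follow_redirect"  # 307: repeat the GET at the Location URL
    RETRY_LATER = "retry_later"          # 500: server-side failure, try again later
    UNEXPECTED = "unexpected"            # any code not listed in the table


def classify_status(code: int) -> ScrapeOutcome:
    """Map an HTTP status code from the metrics endpoint to a client action."""
    if code == 200:
        return ScrapeOutcome.OK
    if code == 307:
        return ScrapeOutcome.FOLLOW_REDIRECT
    if code == 500:
        return ScrapeOutcome.RETRY_LATER
    return ScrapeOutcome.UNEXPECTED
```

Codes outside the table map to `UNEXPECTED` so that a caller can log them instead of silently treating them as success.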