Telegraf Data Description

After you run Telegraf, it displays the monitored data of the Ascend AI processors. The sample output below is for reference only. For details about each data item, see the tables in this section or refer to Data Description.
...
Ascend910-0,host=xxx npu_chip_link_speed=104857600000i,npu_chip_roce_rx_cnp_pkt_num=0i,npu_chip_roce_unexpected_ack_num=0i,npu_chip_optical_vcc=3245.1,npu_chip_optical_rx_power_1=0.8585,npu_chip_info_hbm_used_memory=0i,npu_chip_mac_rx_pause_num=0i,npu_chip_roce_tx_all_pkt_num=0i,npu_chip_roce_tx_cnp_pkt_num=0i,npu_chip_info_temperature=46,npu_chip_mac_rx_bad_pkt_num=0i,npu_chip_roce_tx_err_pkt_num=0i,npu_chip_optical_rx_power_3=0.8466,npu_chip_optical_rx_power_0=0.7933,npu_chip_info_network_status=0i,npu_chip_mac_rx_pfc_pkt_num=0i,npu_chip_mac_tx_bad_pkt_num=0i,npu_chip_roce_rx_all_pkt_num=0i,npu_chip_mac_rx_bad_oct_num=0i,npu_chip_optical_tx_power_1=0.9162,npu_chip_info_utilization=0,npu_chip_info_power=73.9000015258789,npu_chip_info_link_status=1i,npu_chip_info_bandwidth_rx=0,npu_chip_mac_tx_pfc_pkt_num=0i,npu_chip_roce_rx_err_pkt_num=0i,npu_chip_roce_verification_err_num=0i,npu_chip_optical_state=1i,npu_chip_info_bandwidth_tx=0,npu_chip_mac_tx_bad_oct_num=0i,npu_chip_roce_out_of_order_num=0i,npu_chip_roce_qp_status_err_num=0i,npu_chip_optical_rx_power_2=0.855,npu_chip_optical_tx_power_0=0.9095,npu_chip_info_hbm_utilization=0,npu_chip_link_up_num=2i,npu_chip_info_health_status=1i,npu_chip_mac_tx_pause_num=0i,npu_chip_roce_new_pkt_rty_num=0i,npu_chip_optical_temp=53,npu_chip_optical_tx_power_2=1.0342,npu_chip_optical_tx_power_3=0.9715,npu_chip_info_process_info_num=0i 1694772754612200641
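The sample output appears to follow the InfluxDB line protocol that Telegraf emits by default: a measurement name with optional tags, a comma-separated field set, and a nanosecond timestamp. The following Python sketch shows one way to split such a record for inspection. It is illustrative only and is not part of NPU Exporter. The helper names parse_value and parse_line are hypothetical, the sample is a shortened subset of the fields shown above, and the parser assumes numeric field values only (an i suffix marks an integer) with no escaped spaces or commas.

```python
def parse_value(raw):
    """Convert a numeric line-protocol field value to int or float."""
    if raw.endswith("i"):  # an "i" suffix marks an integer field
        return int(raw[:-1])
    return float(raw)


def parse_line(line):
    """Split one record into measurement, tags, fields, and timestamp."""
    parts = line.split()  # tolerates repeated spaces between sections
    head, field_part = parts[0], parts[1]
    timestamp = int(parts[2]) if len(parts) > 2 else None
    measurement, *tag_items = head.split(",")
    tags = dict(item.split("=", 1) for item in tag_items)
    fields = {}
    for item in field_part.split(","):
        key, raw = item.split("=", 1)
        fields[key] = parse_value(raw)
    return measurement, tags, fields, timestamp


# Shortened sample: a few fields taken from the output above.
sample = (
    "Ascend910-0,host=xxx "
    "npu_chip_info_link_status=1i,npu_chip_info_temperature=46,"
    "npu_chip_info_power=73.9000015258789,npu_chip_link_up_num=2i "
    "1694772754612200641"
)
measurement, tags, fields, timestamp = parse_line(sample)
print(measurement, tags["host"], fields["npu_chip_info_link_status"])
```

A production consumer would normally rely on an existing line-protocol parser, which also handles string values, booleans, and escaping.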

This API can be used to query both the default and custom metrics groups. For how to customize a metrics group, see Custom Metric Development. The default metrics group contains the parts described below. The collection and reporting of each metrics group are governed by its corresponding configuration parameter: when the parameter is enabled, the group is collected and reported; when it is disabled, the group is neither collected nor reported.

  • NPU Exporter obtains data information by calling the underlying HDK APIs. For more details, see HDK APIs to Be Called.
  • NPU Exporter does not report a data record if the device is not supported or if the HDK API call fails when that record is queried.

Network Information

Table 4 Network information

| Type | Data Name | Data Description | Value/Unit | Supported Product |
|---|---|---|---|---|
| Network | npu_chip_info_bandwidth_rx | Real-time RX rate of the network port of the Ascend AI processor | Unit: MB/s | Atlas training products; Atlas A2 training products; Atlas A3 training products; Atlas 800I A2 inference server; A200I A2 Box heterogeneous component |
| Network | npu_chip_info_bandwidth_tx | Real-time TX rate of the network port of the Ascend AI processor | Unit: MB/s | Same as above |
| Network | npu_chip_info_link_status | Network port link status of the Ascend AI processor | 0 or 1 (1: UP; 0: DOWN) | Same as above |
| Network | npu_chip_link_speed | Default network port rate of the Ascend AI processor | Unit: MB/s | Same as above |
| Network | npu_chip_link_up_num | Number of times the network port of the Ascend AI processor has come up | Integer | Same as above |
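As an illustration of how the network fields in Table 4 might be read by a consumer, the following Python sketch maps the link status code to UP or DOWN and attaches the documented MB/s unit to the RX and TX rates. The function name describe_network and the dictionary layout are hypothetical and not part of NPU Exporter. Because NPU Exporter omits data that the device does not support or that fails to be queried, the sketch treats every field as optional.

```python
# Status codes as documented in Table 4: 1 means UP, 0 means DOWN.
LINK_STATUS = {1: "UP", 0: "DOWN"}


def describe_network(fields):
    """Return a readable summary of the network fields present in one record."""
    summary = {}
    if "npu_chip_info_link_status" in fields:
        code = fields["npu_chip_info_link_status"]
        summary["link"] = LINK_STATUS.get(code, f"unknown ({code})")
    for name in ("npu_chip_info_bandwidth_rx", "npu_chip_info_bandwidth_tx"):
        if name in fields:  # skip rates that were not reported
            summary[name] = f"{fields[name]} MB/s"
    if "npu_chip_link_up_num" in fields:
        summary["link_up_count"] = fields["npu_chip_link_up_num"]
    return summary


# Example record using values from the sample output in this section.
record = {
    "npu_chip_info_link_status": 1,
    "npu_chip_info_bandwidth_rx": 0,
    "npu_chip_info_bandwidth_tx": 0,
    "npu_chip_link_up_num": 2,
}
summary = describe_network(record)
print(summary)
```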

HDK APIs to Be Called

NPU Exporter obtains data information by calling the underlying HDK APIs. For details about HDK APIs called to obtain the above data, see HDK APIs Called by NPU Exporter. To search for the required HDK APIs, perform the following steps:

  1. Visit Ascend Computing Documentation and select a product name to go to its document page. For example, if you use the Atlas 800I A2 inference server, click Atlas 800I A2.
  2. In the navigation pane on the left, click Developer Documents.
  3. In the search box on the home page of the document, search for an API name or keyword to view the API information.