global-ranktable Description
ClusterD watches the MS Controller and MS Coordinator job pods, as well as changes to the ConfigMap corresponding to each hccl.json file, and generates global-ranktable in real time. Some fields in global-ranktable are consistent with those in the hccl.json file. For details about hccl.json, see hccl.json File Description.
- Example of global-ranktable for the Atlas A2 training product:

      {
        "version": "1.0",
        "status": "completed",
        "server_group_list": [
          {
            "group_id": "2",
            "deploy_server": "0",
            "server_count": "1",
            "server_list": [
              {
                "device": [
                  {
                    "device_id": "x",
                    "device_ip": "xx.xx.xx.xx",
                    "device_logical_id": "x",
                    "rank_id": "x"
                  }
                ],
                "server_id": "xx.xx.xx.xx",
                "server_ip": "xx.xx.xx.xx"
              }
            ]
          }
        ]
      }

- Example of global-ranktable for the Atlas A3 training product:

      {
        "version": "1.2",
        "status": "completed",
        "server_group_list": [
          {
            "group_id": "2",
            "deploy_server": "1",
            "server_count": "1",
            "server_list": [
              {
                "device": [
                  {
                    "device_id": "0",
                    "device_ip": "xx.xx.xx.xx",
                    "super_device_id": "xxxxx",
                    "device_logical_id": "0",
                    "rank_id": "0"
                  }
                ],
                "server_id": "xx.xx.xx.xx",
                "server_ip": "xx.xx.xx.xx"
              }
            ],
            "super_pod_list": [
              {
                "super_pod_id": "0",
                "server_list": [
                  {
                    "server_id": "xx.xx.xx.xx"
                  }
                ]
              }
            ]
          }
        ]
      }

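
As an illustration of how the `super_pod_list` section relates to `server_list`, the following minimal Python sketch maps each `server_id` to the `super_pod_id` that lists it. The function name `super_pod_by_server` and all IDs and addresses in the sample are hypothetical placeholders, not values from a real cluster; the sketch only assumes the Atlas A3 layout shown above.

```python
import json

# Hypothetical sample in the Atlas A3 layout; IDs and addresses are made up.
SAMPLE_A3 = """
{
  "version": "1.2",
  "status": "completed",
  "server_group_list": [
    {
      "group_id": "2",
      "deploy_server": "1",
      "server_count": "1",
      "server_list": [
        {
          "device": [
            {
              "device_id": "0",
              "device_ip": "192.0.2.10",
              "super_device_id": "100",
              "device_logical_id": "0",
              "rank_id": "0"
            }
          ],
          "server_id": "192.0.2.1",
          "server_ip": "192.0.2.1"
        }
      ],
      "super_pod_list": [
        {"super_pod_id": "0", "server_list": [{"server_id": "192.0.2.1"}]}
      ]
    }
  ]
}
"""


def super_pod_by_server(ranktable: dict) -> dict:
    """Map each server_id to the super_pod_id whose server_list contains it."""
    mapping = {}
    for group in ranktable.get("server_group_list", []):
        # super_pod_list does not appear in the A2 example, so default to empty.
        for pod in group.get("super_pod_list", []):
            for server in pod.get("server_list", []):
                mapping[server["server_id"]] = pod["super_pod_id"]
    return mapping


if __name__ == "__main__":
    print(super_pod_by_server(json.loads(SAMPLE_A3)))
```

For a table without `super_pod_list`, such as the Atlas A2 example, the function returns an empty mapping rather than raising an error.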
| Field | Description |
|---|---|
| version | Version |
| status | Status |
| server_group_list | List of server groups |
| group_id | Group ID |
| server_count | Number of servers |
| server_list | Server list |
| server_id | AI server ID, which is globally unique. |
| server_ip | Pod IP |
| device_id | NPU device ID |
| device_ip | NPU device IP |
| super_device_id | Unique ID of the NPU within a SuperPoD |
| rank_id | Training rank ID of the NPU |
| device_logical_id | Logical NPU ID |
| super_pod_list | SuperPoD list |
| super_pod_id | Logical SuperPoD ID |
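
To show how the fields above nest, here is a minimal Python sketch that walks a parsed global-ranktable and lists every device with its group, server, and rank. The helper names `iter_devices` and `mismatched_groups` are hypothetical, and the sample values are invented. The check that `server_count` equals the length of `server_list` is an assumption drawn from the field description ("Number of servers"), not documented behavior.

```python
# Hypothetical sample in the Atlas A2 layout; IDs and addresses are made up.
SAMPLE_A2 = {
    "version": "1.0",
    "status": "completed",
    "server_group_list": [
        {
            "group_id": "2",
            "deploy_server": "0",
            "server_count": "1",
            "server_list": [
                {
                    "device": [
                        {"device_id": "1", "device_ip": "192.0.2.11",
                         "device_logical_id": "1", "rank_id": "1"},
                        {"device_id": "0", "device_ip": "192.0.2.10",
                         "device_logical_id": "0", "rank_id": "0"},
                    ],
                    "server_id": "192.0.2.1",
                    "server_ip": "192.0.2.1",
                }
            ],
        }
    ],
}


def iter_devices(ranktable):
    """Yield one flat record per NPU, following group -> server -> device."""
    for group in ranktable.get("server_group_list", []):
        for server in group.get("server_list", []):
            for dev in server.get("device", []):
                yield {
                    "group_id": group["group_id"],
                    "server_id": server["server_id"],
                    "device_id": dev["device_id"],
                    "rank_id": dev["rank_id"],
                    # Appears only in the A3 example; None otherwise.
                    "super_device_id": dev.get("super_device_id"),
                }


def mismatched_groups(ranktable):
    """Return group_ids whose server_count differs from len(server_list).

    Assumption: server_count is meant to equal the number of server_list
    entries in the same group. Values are strings, so convert before comparing.
    """
    return [
        group["group_id"]
        for group in ranktable.get("server_group_list", [])
        if int(group["server_count"]) != len(group.get("server_list", []))
    ]


if __name__ == "__main__":
    # rank_id is a string in the table, so sort numerically.
    for record in sorted(iter_devices(SAMPLE_A2), key=lambda r: int(r["rank_id"])):
        print(record)
    print("groups with mismatched server_count:", mismatched_groups(SAMPLE_A2))
```

All values in global-ranktable are JSON strings, so numeric fields such as `rank_id` and `server_count` need explicit conversion before sorting or comparison.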