SubscribeRankTable
Description
Receives RankTable subscription requests from the client. The server allocates a message queue to each job and monitors whether the queue contains messages to be transmitted. If it does, the server sends those messages to the client through the gRPC stream.
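The queue-per-job behavior described above can be pictured with a minimal Python sketch. It is not the server's actual implementation: the `JobQueues` class, its method names, and the use of `None` as an end-of-stream marker are all assumptions made for illustration, and the generator stands in for writing to the gRPC stream.

```python
import queue
import threading
from typing import Dict, Iterator, Optional


class JobQueues:
    """Hypothetical registry that holds one message queue per job ID."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queues: Dict[str, queue.Queue] = {}

    def queue_for(self, job_id: str) -> queue.Queue:
        # Allocate the queue on first use so that each job gets exactly one.
        with self._lock:
            if job_id not in self._queues:
                self._queues[job_id] = queue.Queue()
            return self._queues[job_id]

    def publish(self, job_id: str, rank_table: Optional[str]) -> None:
        # None marks the end of the stream (an assumption of this sketch).
        self.queue_for(job_id).put(rank_table)

    def stream(self, job_id: str) -> Iterator[dict]:
        # Block until a message is queued, then hand it to the caller.
        # Yielding here stands in for sending on the gRPC stream.
        job_queue = self.queue_for(job_id)
        while True:
            item = job_queue.get()
            if item is None:
                return
            yield {"jobId": job_id, "rankTable": item}


queues = JobQueues()
queues.publish("job-1", '{"status": "completed"}')
queues.publish("job-1", None)
messages = list(queues.stream("job-1"))
```

In this sketch a blocking `get()` plays the role of "monitoring" the queue; a real server may poll or use a different notification mechanism.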
Prototype
rpc SubscribeRankTable(ClientInfo) returns (stream RankTableStream) {}
Input Parameters
| Parameter | Type (Defined by Protobuf) | Description |
|---|---|---|
| ClientInfo | message ClientInfo{ string jobId = 1; string role = 2; } | ClientInfo.jobId: job ID. ClientInfo.role: client role. |
Return Value
| Return Value | Type (Defined by Protobuf) | Description |
|---|---|---|
| stream | gRPC stream | This API returns a gRPC stream. (The data structure of the return value depends on the programming language selected by the client.) The client can call the stream's Receive method (the actual name is determined by the client's programming language) to receive data pushed by the server. |
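As an illustration of receiving from the stream, the following Python sketch iterates over the messages returned by SubscribeRankTable. In Python gRPC clients, a server-streaming call returns an iterator of response messages, so iteration takes the place of a Receive method. The `ClientInfo` and `RankTableStream` dataclasses and the `FakeStub` class are stand-ins written for this example so that it runs without generated code; a real client would use the classes generated from the .proto file. The role value is a placeholder, not a documented value.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class ClientInfo:
    """Stand-in for the generated ClientInfo message."""
    jobId: str
    role: str


@dataclass
class RankTableStream:
    """Stand-in for the generated RankTableStream message."""
    jobId: str
    rankTable: str


class FakeStub:
    """Hypothetical stub; a real one is generated from the .proto file."""

    def SubscribeRankTable(self, request: ClientInfo) -> Iterator[RankTableStream]:
        # A server-streaming RPC is exposed to the client as an iterator.
        yield RankTableStream(request.jobId, '{"version": "1.0", "status": "completed"}')
        yield RankTableStream(request.jobId, '{"version": "1.2", "status": "completed"}')


def receive_rank_tables(stub, job_id: str, role: str) -> List[str]:
    """Subscribe for one job and collect every pushed rankTable string."""
    received = []
    for message in stub.SubscribeRankTable(ClientInfo(jobId=job_id, role=role)):
        received.append(message.rankTable)
    return received


# "example-role" is a placeholder value chosen for this sketch.
tables = receive_rank_tables(FakeStub(), "job-1", "example-role")
```

Against a real server the loop would block between pushes and end only when the stream is closed, so a long-running client would typically process each message inside the loop instead of collecting them in a list.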
Data Sending Description
| Return Value | Type (Defined by Protobuf) | Description |
|---|---|---|
| RankTableStream | message RankTableStream{ string jobId = 1; string rankTable = 2; } | RankTableStream.jobId: job ID. RankTableStream.rankTable: RankTable information. For details about the fields, see Table 1. |
global-ranktable Description
ClusterD generates global-ranktable and returns it in the rankTable field of the response message. Some fields in global-ranktable are consistent with those in the hccl.json file. For details about hccl.json, see hccl.json File Description.
- Example of global-ranktable of the
    {
      "version": "1.0",
      "status": "completed",
      "server_group_list": [
        {
          "group_id": "2",
          "deploy_server": "0",
          "server_count": "1",
          "server_list": [
            {
              "device": [
                {
                  "device_id": "x",
                  "device_ip": "xx.xx.xx.xx",
                  "device_logical_id": "x",
                  "rank_id": "x"
                }
              ],
              "server_id": "xx.xx.xx.xx",
              "server_ip": "xx.xx.xx.xx"
            }
          ]
        }
      ]
    }
- Example of global-ranktable of the Atlas A3 training product
    {
      "version": "1.2",
      "status": "completed",
      "server_group_list": [
        {
          "group_id": "2",
          "deploy_server": "1",
          "server_count": "1",
          "server_list": [
            {
              "device": [
                {
                  "device_id": "0",
                  "device_ip": "xx.xx.xx.xx",
                  "super_device_id": "xxxxx",
                  "device_logical_id": "0",
                  "rank_id": "0"
                }
              ],
              "server_id": "xx.xx.xx.xx",
              "server_ip": "xx.xx.xx.xx"
            }
          ],
          "super_pod_list": [
            {
              "super_pod_id": "0",
              "server_list": [
                {
                  "server_id": "xx.xx.xx.xx"
                }
              ]
            }
          ]
        }
      ]
    }
Table 1 global-ranktable fields
| Field | Description |
|---|---|
| version | Version |
| status | Status |
| server_group_list | List of server groups |
| group_id | Group ID |
| server_count | Number of servers |
| server_list | Server list |
| server_id | AI server ID, which is globally unique. |
| server_ip | Pod IP |
| device_id | NPU device ID |
| device_ip | NPU device IP |
| super_device_id | Unique ID of the NPU in a SuperPoD |
| rank_id | Training rank ID of the NPU |
| device_logical_id | Logical NPU ID |
| super_pod_list | SuperPoD list |
| super_pod_id | Logical SuperPoD ID |
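Because rankTable arrives as a string, a client usually has to parse it before using the fields listed above. The following Python sketch shows one way to do that with the standard `json` module, using the Atlas A3 example as input. The function names `flatten_devices` and `super_pod_members` are hypothetical, and the sketch assumes that super_device_id and super_pod_list may be absent, as in the first example.

```python
import json
from typing import Dict, List, Optional

# Sample input copied from the Atlas A3 example (placeholder values kept).
EXAMPLE = """
{ "version": "1.2", "status": "completed", "server_group_list": [
  { "group_id": "2", "deploy_server": "1", "server_count": "1",
    "server_list": [
      { "device": [ { "device_id": "0", "device_ip": "xx.xx.xx.xx",
                      "super_device_id": "xxxxx", "device_logical_id": "0",
                      "rank_id": "0" } ],
        "server_id": "xx.xx.xx.xx", "server_ip": "xx.xx.xx.xx" } ],
    "super_pod_list": [
      { "super_pod_id": "0",
        "server_list": [ { "server_id": "xx.xx.xx.xx" } ] } ] } ] }
"""


def flatten_devices(rank_table_json: str) -> List[Dict[str, Optional[str]]]:
    """Return one record per NPU, walking groups, servers, then devices."""
    table = json.loads(rank_table_json)
    rows = []
    for group in table.get("server_group_list", []):
        for server in group.get("server_list", []):
            for device in server.get("device", []):
                rows.append({
                    "group_id": group["group_id"],
                    "server_id": server["server_id"],
                    "rank_id": device["rank_id"],
                    "device_ip": device["device_ip"],
                    # Assumed optional: missing in the first example.
                    "super_device_id": device.get("super_device_id"),
                })
    return rows


def super_pod_members(rank_table_json: str) -> Dict[str, List[str]]:
    """Map each super_pod_id to the server_id values listed under it."""
    table = json.loads(rank_table_json)
    members: Dict[str, List[str]] = {}
    for group in table.get("server_group_list", []):
        for pod in group.get("super_pod_list", []):
            members[pod["super_pod_id"]] = [
                server["server_id"] for server in pod.get("server_list", [])
            ]
    return members


rows = flatten_devices(EXAMPLE)
pods = super_pod_members(EXAMPLE)
```

Note that every value in the examples, including rank_id and server_count, is a JSON string, so numeric use requires an explicit conversion such as `int(row["rank_id"])`.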