SubscribeRankTable

Description

Handles RankTable subscription requests from clients. The server allocates one message queue per job and monitors that queue for pending messages. When the queue contains messages, the server sends them to the client over the gRPC stream.
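The per-job queue behavior described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the ClusterD implementation: the class name RankTableBroker, its methods, and the use of None as an end-of-stream sentinel are all assumptions made for this example.

```python
import queue
import threading
from typing import Dict, Iterator, Optional


class RankTableBroker:
    """Hypothetical helper: keeps one message queue per job ID."""

    def __init__(self) -> None:
        self._queues: Dict[str, "queue.Queue[Optional[str]]"] = {}
        self._lock = threading.Lock()

    def queue_for(self, job_id: str) -> "queue.Queue[Optional[str]]":
        # Allocate the queue on first use so that each job has exactly one.
        with self._lock:
            return self._queues.setdefault(job_id, queue.Queue())

    def publish(self, job_id: str, rank_table: Optional[str]) -> None:
        # None is a sentinel (an assumption of this sketch) that ends the stream.
        self.queue_for(job_id).put(rank_table)

    def subscribe(self, job_id: str) -> Iterator[dict]:
        # Block until a message is queued, then hand it to the stream.
        q = self.queue_for(job_id)
        while True:
            rank_table = q.get()
            if rank_table is None:
                return
            yield {"jobId": job_id, "rankTable": rank_table}
```

In a real gRPC servicer, a generator such as subscribe would typically be the body of the streaming handler, yielding RankTableStream messages instead of plain dictionaries.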

Prototype

rpc SubscribeRankTable(ClientInfo) returns (stream RankTableStream) {}

Input Parameters

Parameter: ClientInfo

Type (Defined by Protobuf):

message ClientInfo {
    string jobId = 1;
    string role = 2;
}

Description:

  • ClientInfo.jobId: job ID
  • ClientInfo.role: client role

Return Value

Return Value: stream

Type (Defined by Protobuf): gRPC stream

Description:

This API returns a gRPC stream. The concrete type of the returned stream object depends on the programming language used by the client.

The client calls the stream's receive method (the actual method name depends on the client's programming language) to receive data pushed by the server.
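As an illustration of the receive loop on the client side, the sketch below assumes a Python client, where a server-streaming gRPC call returns an iterator of response messages. The function name consume_rank_tables and the on_update callback are hypothetical; the fields jobId and rankTable are those of the RankTableStream message named in the prototype.

```python
from typing import Any, Callable, Iterable


def consume_rank_tables(stream: Iterable[Any],
                        on_update: Callable[[str, str], None]) -> int:
    """Read messages from a server stream until the server closes it.

    `stream` is whatever the generated stub returns for SubscribeRankTable;
    in Python gRPC that is an iterator of response messages.
    Returns the number of messages received.
    """
    received = 0
    for message in stream:
        # Pass the job ID and the RankTable payload to the caller's handler.
        on_update(message.jobId, message.rankTable)
        received += 1
    return received
```

With stubs generated from the .proto file, the stream argument would come from calling SubscribeRankTable on the stub with a ClientInfo message. The stub and module names depend on how the code is generated, so they are omitted here.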

Data Sending Description

Return Value: RankTableStream

Type (Defined by Protobuf):

message RankTableStream {
    string jobId = 1;
    string rankTable = 2;
}

Description:

  • RankTableStream.jobId: job ID
  • RankTableStream.rankTable: RankTable information. For details about the fields, see Table 1.

global-ranktable Description

ClusterD generates global-ranktable and returns it as the response message in the rankTable field. Some fields in global-ranktable are consistent with those in the hccl.json file. For details about hccl.json, see hccl.json File Description.

  • Example of global-ranktable of the
    {
        "version": "1.0",
        "status": "completed",
        "server_group_list": [
            {
                "group_id": "2",
                "deploy_server": "0",
                "server_count": "1",
                "server_list": [
                    {
                        "device": [
                            {
                                "device_id": "x",
                                "device_ip": "xx.xx.xx.xx",
                                "device_logical_id": "x",
                                "rank_id": "x"
                            }
                        ],
                        "server_id": "xx.xx.xx.xx",
                        "server_ip": "xx.xx.xx.xx"
                    }
                ]
            }
        ]
    }
  • Example of global-ranktable of the Atlas A3 training product
    {
        "version": "1.2",
        "status": "completed",
        "server_group_list": [
            {
                "group_id": "2",
                "deploy_server": "1",
                "server_count": "1",
                "server_list": [
                    {
                        "device": [
                            {
                                "device_id": "0",
                                "device_ip": "xx.xx.xx.xx",
                                "super_device_id": "xxxxx",
                                "device_logical_id": "0",
                                "rank_id": "0"
                            }
                        ],
                        "server_id": "xx.xx.xx.xx",
                        "server_ip": "xx.xx.xx.xx"
                    }
                ],
                "super_pod_list": [
                    {
                        "super_pod_id": "0",
                        "server_list": [
                            {
                                "server_id": "xx.xx.xx.xx"
                            }
                        ]
                    }
                ]
            }
        ]
    }
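The Atlas A3 example adds a super_pod_list to each server group. As a rough sketch of how a client might read it, the function below groups server IDs by SuperPoD. The function name servers_by_super_pod is hypothetical, and the code assumes the payload has the same shape as the examples above.

```python
import json
from typing import Dict, List


def servers_by_super_pod(rank_table_json: str) -> Dict[str, List[str]]:
    """Map each super_pod_id to the server_id values listed under it."""
    table = json.loads(rank_table_json)
    result: Dict[str, List[str]] = {}
    for group in table.get("server_group_list", []):
        # super_pod_list appears only in the Atlas A3 example, so default to empty.
        for pod in group.get("super_pod_list", []):
            ids = [server["server_id"] for server in pod.get("server_list", [])]
            result.setdefault(pod["super_pod_id"], []).extend(ids)
    return result
```

For a payload without super_pod_list, such as the first example, the function returns an empty dictionary.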
Table 1 Fields in global-ranktable

Field                Description
version              Version
status               Status
server_group_list    List of server groups
group_id             Group ID
server_count         Number of servers
server_list          Server list
server_id            AI server ID, which is globally unique
server_ip            Pod IP
device_id            NPU device ID
device_ip            NPU device IP
super_device_id      Unique ID of the NPU within a SuperPoD of the Atlas A3 training product
rank_id              Training rank ID of the NPU
device_logical_id    Logical NPU ID
super_pod_list       SuperPoD list
super_pod_id         Logical SuperPoD ID
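Because rankTable arrives as a JSON string, a client has to decode it before using the fields in Table 1. The following sketch walks server_group_list, server_list, and device to build a mapping from rank_id to device_ip. The function name rank_to_device_ip is hypothetical, and the code assumes that values are strings, as in the examples.

```python
import json
from typing import Dict


def rank_to_device_ip(rank_table_json: str) -> Dict[str, str]:
    """Map each rank_id to the device_ip of its NPU."""
    table = json.loads(rank_table_json)
    mapping: Dict[str, str] = {}
    for group in table.get("server_group_list", []):
        for server in group.get("server_list", []):
            for device in server.get("device", []):
                # rank_id and device_ip are strings in the example payloads.
                mapping[device["rank_id"]] = device["device_ip"]
    return mapping
```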