SubscribeFaultMsgSignal
Description
Receives fault subscription requests from the client. The server allocates a message queue to each job and monitors that queue for messages to be transmitted. When the queue contains messages, the server transmits them to the client through the gRPC stream.
- Before calling this API, call Register.
- After subscribing to the fault information of a general computing job, the client can receive only NodeD faults and Kubernetes node status exceptions.
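The per-job queue behavior described above can be pictured with a small in-memory model. This is only an illustrative sketch, not the server's actual implementation: the `FaultBroker` class, its method names, and the idle-poll stop condition are all hypothetical and exist only to make the example terminate.

```python
import queue
from collections import defaultdict
from typing import Dict, Iterator


class FaultBroker:
    """Toy model (hypothetical): one message queue per job ID."""

    def __init__(self) -> None:
        # A queue is created lazily the first time a job ID is seen.
        self._queues: Dict[str, queue.Queue] = defaultdict(queue.Queue)

    def publish(self, job_id: str, message: dict) -> None:
        # Enqueue a message destined for subscribers of this job.
        self._queues[job_id].put(message)

    def subscribe(
        self, job_id: str, poll_timeout: float = 0.1, max_idle_polls: int = 3
    ) -> Iterator[dict]:
        # Poll the job's queue and yield each message as it becomes available.
        # Assumption: stop after max_idle_polls consecutive empty polls so the
        # example terminates; a real stream would stay open.
        job_queue = self._queues[job_id]
        idle = 0
        while idle < max_idle_polls:
            try:
                message = job_queue.get(timeout=poll_timeout)
            except queue.Empty:
                idle += 1
                continue
            idle = 0
            yield message


broker = FaultBroker()
broker.publish("job-1", {"jobId": "job-1", "signalType": "fault"})
broker.publish("job-2", {"jobId": "job-2", "signalType": "normal"})
received = list(broker.subscribe("job-1", poll_timeout=0.01, max_idle_polls=1))
```

Only the message published for `job-1` reaches the `job-1` subscriber; the `job-2` message stays in its own queue.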
Prototype
rpc SubscribeFaultMsgSignal(ClientInfo) returns (stream FaultMsgSignal){}
Input Parameters
| Parameter | Type (Defined by Protobuf) | Description |
|---|---|---|
| ClientInfo | message ClientInfo{ string jobId = 1; string role = 2; } | ClientInfo.jobId: job ID<br>ClientInfo.role: client role<br>NOTE: |
Return Value
| Return Value | Type (Defined by Protobuf) | Description |
|---|---|---|
| stream | gRPC stream | This API returns a gRPC stream. (The data structure of the return value is based on the programming language selected by the client.) The client can call the stream's Receive method (the actual name is determined by the client's programming language) to receive data pushed by the server. |
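As an illustration of receiving pushed data, the sketch below assumes a Python client, where a server-streaming call returns an iterator and iterating it plays the role of the Receive method. The helper `consume_fault_stream` and its callbacks are hypothetical, and the stub call shown in the comment uses placeholder names because the generated module names are not given here. A list of stand-in objects replaces the real stream so the example runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable


def consume_fault_stream(
    stream: Iterable[Any],
    on_fault: Callable[[Any], None],
    on_normal: Callable[[Any], None],
) -> int:
    """Dispatch each pushed message by signalType; return how many were handled."""
    handled = 0
    for signal in stream:  # each iteration receives one message from the server
        if signal.signalType == "fault":
            on_fault(signal)
        elif signal.signalType == "normal":
            on_normal(signal)
        else:
            continue  # assumption: unknown signal types are ignored
        handled += 1
    return handled


# With real generated stubs the stream would come from something like
# (names are placeholders, not confirmed by this document):
#   stream = stub.SubscribeFaultMsgSignal(ClientInfo(jobId="<job id>", role="<client role>"))
fake_stream = [
    SimpleNamespace(uuid="1", jobId="job-1", signalType="fault"),
    SimpleNamespace(uuid="2", jobId="job-1", signalType="normal"),
    SimpleNamespace(uuid="3", jobId="job-1", signalType="other"),
]
faults, recoveries = [], []
handled = consume_fault_stream(fake_stream, faults.append, recoveries.append)
```

Keeping the dispatch logic independent of the transport makes it easy to exercise without a running server.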
Data to Be Sent
| Parameter | Type (Defined by Protobuf) | Description |
|---|---|---|
| FaultMsgSignal | message FaultMsgSignal{ string uuid = 1; string jobId = 2; string signalType = 3; repeated NodeFaultInfo nodeFaultInfo = 4; }<br>message NodeFaultInfo{ string nodeName = 1; string nodeIP = 2; string nodeSN = 3; string faultLevel = 4; repeated DeviceFaultInfo faultDevice = 5; }<br>message DeviceFaultInfo{ string deviceId = 1; string deviceType = 2; repeated string faultCodes = 3; string faultLevel = 4; repeated string faultType = 5; repeated string faultReason = 6; repeated SwitchFaultInfo switchFaultInfos = 7; repeated string faultLevels = 8; }<br>message SwitchFaultInfo{ string faultCode = 1; string switchChipId = 2; string switchPortId = 3; string faultTime = 4; string faultLevel = 5; } | FaultMsgSignal.uuid: message ID<br>FaultMsgSignal.jobId: job ID<br>FaultMsgSignal.signalType: message type. fault indicates that a fault has occurred, and normal indicates that no fault has occurred or that a fault has been rectified.<br>FaultMsgSignal.nodeFaultInfo: node fault information<br>NodeFaultInfo.nodeName: name of the faulty node<br>NodeFaultInfo.nodeIP: node IP address<br>NodeFaultInfo.nodeSN: node SN<br>NodeFaultInfo.faultLevel: fault level, which can be Healthy, SubHealthy, or UnHealthy. The value is the most severe level among the DeviceFaultInfo.faultLevel values of the node's devices.<br>NodeFaultInfo.faultDevice: device fault information<br>DeviceFaultInfo.deviceId: device ID. When a bus device fault or Kubernetes status exception occurs on the node, the value of deviceId is -1.<br>DeviceFaultInfo.deviceType: device type, which can be Node, NPU, Storage, CPU, or Network.<br>DeviceFaultInfo.faultCodes: fault code list<br>DeviceFaultInfo.faultLevel: fault level, which can be Healthy, SubHealthy, or UnHealthy, in order of increasing severity.<br>DeviceFaultInfo.faultType: (reserved) fault subsystem type<br>DeviceFaultInfo.faultReason: (reserved) fault cause<br>DeviceFaultInfo.switchFaultInfos: UnifiedBus fault information<br>DeviceFaultInfo.faultLevels: fault level list<br>SwitchFaultInfo.faultCode: UnifiedBus fault code<br>SwitchFaultInfo.switchChipId: ID of the faulty UnifiedBus chip<br>SwitchFaultInfo.switchPortId: ID of the faulty UnifiedBus port<br>SwitchFaultInfo.faultTime: time when the UnifiedBus fault occurred<br>SwitchFaultInfo.faultLevel: UnifiedBus fault level |
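The relationship between NodeFaultInfo.faultLevel and the DeviceFaultInfo.faultLevel values can be sketched as follows. The dataclass mirrors only a few DeviceFaultInfo fields, the `SEVERITY` ranking and `most_severe_level` helper are hypothetical names, and returning Healthy for a node with no faulty devices is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Severity ranking taken from the documented order Healthy < SubHealthy < UnHealthy.
SEVERITY = {"Healthy": 0, "SubHealthy": 1, "UnHealthy": 2}


@dataclass
class DeviceFaultInfo:
    """Simplified stand-in for the protobuf message (subset of fields)."""
    deviceId: str
    deviceType: str
    faultLevel: str
    faultCodes: List[str] = field(default_factory=list)


def most_severe_level(devices: Iterable[DeviceFaultInfo]) -> str:
    # Assumption: with no device entries the node counts as Healthy.
    level = "Healthy"
    for device in devices:
        if SEVERITY[device.faultLevel] > SEVERITY[level]:
            level = device.faultLevel
    return level


devices = [
    DeviceFaultInfo(deviceId="0", deviceType="NPU", faultLevel="SubHealthy"),
    # deviceId is -1 for a bus device fault or Kubernetes status exception on the node.
    DeviceFaultInfo(deviceId="-1", deviceType="Node", faultLevel="UnHealthy"),
    DeviceFaultInfo(deviceId="1", deviceType="NPU", faultLevel="Healthy"),
]
node_level = most_severe_level(devices)
```

A client could use the same ranking to cross-check the node-level value it receives against the per-device levels in the same message.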