GetFaultMsgSignal
Description
This fault query interface receives client requests to query fault information for the cluster or for a specific job.
This interface processes at most 10 requests per second. Requests beyond that rate are placed in a waiting queue. Once the number of waiting requests exceeds 50, new requests are rejected.
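The queuing behavior above can be modeled as follows. This is a minimal Python sketch of the documented limits, not the server's implementation. `AdmissionModel`, `submit`, and `next_second` are hypothetical names, and the one-second window is treated as a discrete tick. Two details the text leaves open are assumptions here: the queue holds up to 50 waiting requests (the 51st is rejected), and queued requests are served first-in, first-out at the same 10-per-second rate.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List

# Limits taken from the interface description.
MAX_PER_SECOND = 10  # requests processed per one-second window
MAX_WAITING = 50     # assumption: up to 50 requests may wait; the 51st is rejected


@dataclass
class AdmissionModel:
    """Hypothetical model of the documented limits, not the server's actual code."""
    served_this_second: int = 0
    waiting: Deque[str] = field(default_factory=deque)

    def submit(self, request_id: str) -> str:
        """Classify an incoming request as 'served', 'queued', or 'rejected'."""
        if self.served_this_second < MAX_PER_SECOND:
            self.served_this_second += 1
            return "served"
        if len(self.waiting) < MAX_WAITING:
            self.waiting.append(request_id)
            return "queued"
        return "rejected"

    def next_second(self) -> List[str]:
        """Open a new one-second window and serve waiting requests first (assumption: FIFO)."""
        self.served_this_second = 0
        drained: List[str] = []
        while self.waiting and self.served_this_second < MAX_PER_SECOND:
            drained.append(self.waiting.popleft())
            self.served_this_second += 1
        return drained


model = AdmissionModel()
burst = [model.submit(f"req-{i}") for i in range(70)]
print(burst.count("served"), burst.count("queued"), burst.count("rejected"))  # 10 50 10
print(model.next_second())  # serves req-10 through req-19
```

Under these assumptions, a burst of 70 requests in one window yields 10 served, 50 queued, and 10 rejected. Because the next window drains the queue before accepting new work, waiting requests keep their arrival order.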
Prototype
`rpc GetFaultMsgSignal(ClientInfo) returns (FaultQueryResult) {}`
Input Parameters
| Parameter | Type (Defined by Protobuf) | Description |
|---|---|---|
| ClientInfo | `message ClientInfo { string jobId = 1; string role = 2; }` | ClientInfo.jobId: job ID. If jobId is empty, fault information for the whole cluster is returned. If jobId is not empty, it must contain 8 to 128 characters and must not include Chinese characters.<br>ClientInfo.role: client role.<br>NOTE: |
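The jobId rules can be checked on the client before a request is sent. The following Python sketch is hypothetical: `validate_job_id` is not part of the interface. It assumes that "Chinese character" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that length is counted in Unicode code points; the server's own check may differ.

```python
import re

# Assumption: "Chinese character" is approximated by the CJK Unified Ideographs
# block U+4E00-U+9FFF; the server-side check may cover a different range.
_CHINESE = re.compile(r"[\u4e00-\u9fff]")


def validate_job_id(job_id: str) -> str:
    """Return the query scope for job_id, or raise ValueError if it breaks the documented rules."""
    if job_id == "":
        return "cluster"  # empty jobId: cluster-wide fault information
    if not 8 <= len(job_id) <= 128:
        raise ValueError(f"jobId must contain 8 to 128 characters, got {len(job_id)}")
    if _CHINESE.search(job_id):
        raise ValueError("jobId must not contain Chinese characters")
    return "job"


print(validate_job_id(""))          # cluster
print(validate_job_id("job-1234"))  # job
```

Rejecting an invalid jobId locally avoids spending one of the 10 per-second request slots on a query the server would refuse anyway.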
Return Value
| Return Value | Type (Defined by Protobuf) | Description |
|---|---|---|
| FaultQueryResult | `message FaultQueryResult { int32 code = 1; string info = 2; FaultMsgSignal faultSignal = 3; }` | code: return code of the query.<br>info: description of the query result.<br>faultSignal: fault information structure.<br>FaultMsgSignal.uuid: message ID.<br>FaultMsgSignal.jobId: job ID. The value -1 indicates the cluster.<br>FaultMsgSignal.signalType: message type. fault indicates that a fault has occurred; normal indicates that no fault has occurred or that a fault has been rectified.<br>FaultMsgSignal.nodeFaultInfo: node fault information.<br>NodeFaultInfo.nodeName: name of the faulty node.<br>NodeFaultInfo.nodeIP: node IP address.<br>NodeFaultInfo.nodeSN: node serial number (SN).<br>NodeFaultInfo.faultLevel: fault level, which can be Healthy, SubHealthy, or UnHealthy. Its value is the most severe DeviceFaultInfo.faultLevel among the node's devices.<br>NodeFaultInfo.faultDevice: device fault information.<br>DeviceFaultInfo.deviceId: device ID.<br>DeviceFaultInfo.deviceType: device type, one of Node, NPU, Storage, CPU, or Network.<br>DeviceFaultInfo.faultCodes: list of fault codes.<br>DeviceFaultInfo.faultLevel: fault level, one of Healthy, SubHealthy, or UnHealthy, listed in increasing order of severity.<br>DeviceFaultInfo.faultType: (reserved) fault subsystem type.<br>DeviceFaultInfo.faultReason: (reserved) fault cause.<br>DeviceFaultInfo.switchFaultInfos: list of UnifiedBus fault information.<br>DeviceFaultInfo.faultLevels: list of fault levels.<br>SwitchFaultInfo.faultCode: UnifiedBus fault code.<br>SwitchFaultInfo.switchChipId: ID of the faulty UnifiedBus chip.<br>SwitchFaultInfo.switchPortId: ID of the faulty UnifiedBus port.<br>SwitchFaultInfo.faultTime: time when the UnifiedBus fault occurred.<br>SwitchFaultInfo.faultLevel: UnifiedBus fault level. |
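The rule that a node's fault level equals the most severe level among its devices can be illustrated with plain Python dataclasses that mirror a subset of the documented fields. These classes are hypothetical stand-ins, not the protobuf-generated types. The field types, treating faultDevice as a repeated field, and reporting Healthy for a node with no listed devices are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List

# Severity order stated in the field description: Healthy < SubHealthy < UnHealthy.
SEVERITY = {"Healthy": 0, "SubHealthy": 1, "UnHealthy": 2}


@dataclass
class DeviceFaultInfo:
    # Hypothetical mirror of a subset of the documented fields; types are assumptions.
    deviceId: str
    deviceType: str  # Node, NPU, Storage, CPU, or Network
    faultLevel: str  # Healthy, SubHealthy, or UnHealthy
    faultCodes: List[str] = field(default_factory=list)


@dataclass
class NodeFaultInfo:
    nodeName: str
    nodeIP: str
    nodeSN: str
    faultDevice: List[DeviceFaultInfo] = field(default_factory=list)  # assumption: repeated field

    @property
    def faultLevel(self) -> str:
        """Most severe DeviceFaultInfo.faultLevel; Healthy when no devices are listed (assumption)."""
        if not self.faultDevice:
            return "Healthy"
        return max((d.faultLevel for d in self.faultDevice), key=SEVERITY.__getitem__)


node = NodeFaultInfo("node-1", "192.0.2.10", "SN001", [
    DeviceFaultInfo("0", "NPU", "SubHealthy", ["code-a"]),  # placeholder fault codes
    DeviceFaultInfo("1", "NPU", "UnHealthy", ["code-b"]),
])
print(node.faultLevel)  # UnHealthy
```

Mapping the level names to integers keeps the comparison explicit rather than relying on string ordering, which would not match the documented severity order.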