gRPC
Description
Receives public faults sent by a gRPC client and processes them so that they can be handled by the resumable training process.
- If a parameter value in a gRPC request falls outside its defined value range, ClusterD discards the fault information.
- Faults injected through the ConfigMap or gRPC interface are limited to a total of 50,000 across all nodes. If this threshold is exceeded, ClusterD discards any newly injected fault information.
- To clear a public fault, send the recover event of that fault to ClusterD through the gRPC interface.
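The rules above can be pictured with a small in-memory model. This is only an illustrative sketch and not ClusterD's implementation: the `FaultStore` class, its `handle` method, the returned strings, and the `"occur"`/`"recover"` status values are assumptions made for the example. Only the 50,000 cap, the discarding of new faults beyond it, and clearing by recover event come from the text.

```python
MAX_FAULTS = 50_000  # cap stated above, counted across all nodes


class FaultStore:
    """Toy model of a receiver that tracks active public faults (hypothetical)."""

    def __init__(self, max_faults=MAX_FAULTS):
        self.max_faults = max_faults
        self.active = {}  # faultId -> fault event (plain dict)

    def handle(self, fault):
        """Apply one fault event and report what happened to it."""
        fault_id = fault["faultId"]
        # "recover" is an assumed status string; real values are defined elsewhere.
        if fault["assertion"] == "recover":
            # A recover event clears the matching active fault, if any.
            if self.active.pop(fault_id, None) is not None:
                return "cleared"
            return "ignored"
        is_new = fault_id not in self.active
        if is_new and len(self.active) >= self.max_faults:
            # The cap is reached: newly injected fault information is dropped.
            return "discarded"
        self.active[fault_id] = fault
        return "stored"
```

With a cap of 2, a third distinct fault is discarded until a recover event for one of the first two frees a slot.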
Prototype
rpc SendPublicFault(PublicFaultRequest) returns (RespStatus){}
Input Parameters
| Parameter | Type (Defined by Protobuf) | Description |
|---|---|---|
| PublicFaultRequest | `message PublicFaultRequest{ string id = 1; int64 timestamp = 2; string version = 3; string resource = 4; repeated Fault faults = 5; }`<br>`message Fault{ string faultId = 1; string faultType = 2; string faultCode = 3; int64 faultTime = 4; string assertion = 5; map<string, string> faultLocation = 6; repeated PubFaultInfo influence = 7; string description = 8; }`<br>`message PubFaultInfo{ string nodeName = 1; string nodeSN = 2; repeated int32 deviceIds = 3; }` | PublicFaultRequest.id: unique ID of the message<br>PublicFaultRequest.timestamp: time when the message is sent<br>PublicFaultRequest.version: message version<br>PublicFaultRequest.resource: fault sender<br>PublicFaultRequest.faults: fault content<br>Fault.faultId: fault instance ID<br>Fault.faultType: fault type<br>Fault.faultCode: fault code<br>Fault.faultTime: fault occurrence time<br>Fault.assertion: fault status<br>Fault.faultLocation: fault location information<br>Fault.influence: fault impact scope<br>Fault.description: fault description<br>PubFaultInfo.nodeName: node name<br>PubFaultInfo.nodeSN: node SN<br>PubFaultInfo.deviceIds: physical processor IDs<br>For more details, see ConfigMap. |
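To make the message layout concrete, the following sketch mirrors the three Protobuf messages as Python dataclasses and assembles one request. It is a minimal illustration, not generated stub code: the `build_request` helper, the default version string, the millisecond timestamp unit, and every sample value (fault type, fault code, status string, node names) are placeholders assumed for the example. Valid values and ranges are defined in the ConfigMap description.

```python
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class PubFaultInfo:
    nodeName: str
    nodeSN: str
    deviceIds: List[int] = field(default_factory=list)  # physical processor IDs


@dataclass
class Fault:
    faultId: str
    faultType: str
    faultCode: str
    faultTime: int
    assertion: str  # fault status
    faultLocation: Dict[str, str] = field(default_factory=dict)
    influence: List[PubFaultInfo] = field(default_factory=list)
    description: str = ""


@dataclass
class PublicFaultRequest:
    id: str
    timestamp: int
    version: str
    resource: str
    faults: List[Fault] = field(default_factory=list)


def build_request(resource, faults, version="1.0"):
    """Wrap faults in a request with a fresh message ID and send time."""
    return PublicFaultRequest(
        id=str(uuid.uuid4()),               # unique ID of the message
        timestamp=int(time.time() * 1000),  # milliseconds is an assumption
        version=version,                    # placeholder version string
        resource=resource,                  # fault sender
        faults=list(faults),
    )


# Placeholder values throughout; none are taken from the specification.
example_fault = Fault(
    faultId="fault-0001",
    faultType="ExampleType",
    faultCode="000000001",
    faultTime=int(time.time() * 1000),
    assertion="occur",
    influence=[PubFaultInfo(nodeName="node-1", nodeSN="SN0001", deviceIds=[0, 1])],
    description="example fault",
)
example_payload = asdict(build_request("example-sender", [example_fault]))
```

`asdict` flattens the nested dataclasses into plain dictionaries and lists, which makes it easy to inspect the payload before copying the same field names into the message classes generated from the `.proto` file.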
Return Value
| Return Value | Type (Defined by Protobuf) | Description |
|---|---|---|
| RespStatus | `message RespStatus{ int32 code = 1; string info = 2; }` | RespStatus.code: return code<br>RespStatus.info: return information |