ReportProcessFault

Description

Receives the global rank information of the faulty processors reported by the client.

Prototype

rpc ReportProcessFault(ProcessFaultRequest) returns (Status){}

Input Parameters

Parameter

Type (Defined by Protobuf)

Description

ProcessFaultRequest

message ProcessFaultRequest{
    string jobId = 1;
    repeated FaultRank faultRankIds = 2;
}

ProcessFaultRequest.jobId: job ID

ProcessFaultRequest.faultRankIds: list of global rank IDs of the faulty processors. Each FaultRank is a key-value pair of fault information consisting of rankId (global rank ID) and faultType (fault type).

  • faultType = 0: on-chip memory fault
  • faultType = 1: other faults
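The request layout described above can be sketched in Python with plain dataclasses. This is only an illustration of the message shape, not the generated gRPC stub code. The field types of FaultRank (a string rankId and an integer faultType), the FaultType enum, the build_request helper, and the sample job ID are all assumptions made for the example; the source names the fields and the two faultType values but does not give the FaultRank definition.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class FaultType(IntEnum):
    # Values as listed in the parameter description.
    ON_CHIP_MEMORY = 0
    OTHER = 1


@dataclass
class FaultRank:
    # Assumed field types; the source gives only the field names.
    rankId: str
    faultType: int


@dataclass
class ProcessFaultRequest:
    jobId: str
    faultRankIds: List[FaultRank] = field(default_factory=list)


def build_request(job_id, faults):
    """Hypothetical helper: build a request from (rank, fault type) pairs."""
    ranks = [FaultRank(rankId=str(rank), faultType=int(ftype))
             for rank, ftype in faults]
    return ProcessFaultRequest(jobId=job_id, faultRankIds=ranks)


# Example: rank 3 has an on-chip memory fault, rank 7 has another fault.
request = build_request(
    "job-123",
    [(3, FaultType.ON_CHIP_MEMORY), (7, FaultType.OTHER)],
)
```

In a real client, the same values would be set on the message classes generated from the service's .proto file rather than on these stand-in dataclasses.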

Return Value

Return Value

Type (Defined by Protobuf)

Description

Status

message Status{
    int32 code = 1;
    string info = 2;
}

Status.code: return code

  • 0: The fault recovery process is normal.
  • Other values: The fault recovery process is abnormal and rescheduling is triggered.

Status.info: return information
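A caller might interpret the returned Status along the following lines. This is a minimal sketch under stated assumptions: the Status dataclass is a stand-in for the generated Protobuf class, and recovery_is_normal and describe are hypothetical helper names introduced for the example. Only the meaning of code (0 is normal, anything else is abnormal and triggers rescheduling) comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Status:
    # Stand-in for the Protobuf Status message.
    code: int
    info: str


def recovery_is_normal(status):
    """Return True only when code is 0, as documented."""
    return status.code == 0


def describe(status):
    """Hypothetical helper: produce a log-friendly summary of a Status."""
    if recovery_is_normal(status):
        return "recovery normal"
    # Any non-zero code means abnormal recovery and rescheduling.
    return (f"recovery abnormal (code {status.code}), "
            f"rescheduling triggered: {status.info}")
```

Treating every non-zero code the same way mirrors the description, which does not distinguish between individual non-zero values.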