report_process_fault(fault_ranks: dict) -> int
The client reports a service plane fault of the task process. When detecting a fault, the client reports the information about the rank where the service plane fault occurs to the server.
Parameter |
Type |
Description |
|---|---|---|
fault_ranks |
Dictionary |
Rank of the faulty process |
Type |
Description |
|---|---|
Integer |
Returns 0 if the operation is successful; returns a non-0 value if the operation fails. For details, see Return Codes. |
Parent topic: Elastic Agent (APIs Related to Resumable Training)