tft_notify_controller_stop_train
Function
MindCluster calls this API to instruct MindIO TFT to proactively stop training and to pass MindIO TFT the information about the faulty NPU.
Format
mindio_ttp.controller_ttp.tft_notify_controller_stop_train(fault_ranks: dict, stop_type: str = "stop", timeout: int = None)
Parameters
| Parameter | Mandatory/Optional | Description | Value |
|---|---|---|---|
| fault_ranks | Mandatory | Information about the faulty NPU. | <int key, int errorType> dictionary: |
| stop_type | Optional | Mode of stopping training. | The value is a character string and can be either of the following: |
| timeout | Optional | Timeout interval for MindCluster to issue a notification after training is paused. | 0 or a positive integer |
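
The parameter constraints listed above can be checked on the caller side before the API is invoked. The following is a minimal sketch, not part of MindIO TFT: the helper name `check_stop_train_args` is hypothetical, the key `3` and error type `0` are placeholder values, and `stop_type` is only checked for its type because the set of allowed strings is not listed here.

```python
from typing import Dict, Optional


def check_stop_train_args(
    fault_ranks: Dict[int, int],
    stop_type: str = "stop",
    timeout: Optional[int] = None,
) -> None:
    """Raise if the arguments violate the documented parameter constraints."""
    if not isinstance(fault_ranks, dict):
        raise TypeError("fault_ranks must be a dict")
    for key, error_type in fault_ranks.items():
        if not isinstance(key, int) or not isinstance(error_type, int):
            raise TypeError("fault_ranks must map int keys to int error types")
    if not isinstance(stop_type, str):
        raise TypeError("stop_type must be a string")
    # timeout may be omitted (None); otherwise it must be 0 or a positive integer.
    if timeout is not None and (not isinstance(timeout, int) or timeout < 0):
        raise ValueError("timeout must be 0 or a positive integer")


# Placeholder values: key 3 and error type 0 are illustrative only.
check_stop_train_args({3: 0}, stop_type="stop", timeout=30)
```

Validating early keeps malformed fault information from reaching the controller, at the cost of duplicating checks the API may already perform.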
Return Value
- 0: API call succeeded.
- 1: API call failed.