tft_register_mindx_callback
Function
MindCluster calls this API to register a callback function for the repair process with MindIO TFT.
Format
mindio_ttp.controller_ttp.tft_register_mindx_callback(action: str, func: Callable)
Parameters
| Parameter | Mandatory/Optional | Description | Value |
|---|---|---|---|
| action | Mandatory | Name of the action for which the callback function is registered. | The value is of the string type. The following action names are supported: |
| func | Mandatory | Function to be registered. | The callback function cannot be empty. For details about the input parameters of the callback function, see Table 1 to Table 4. |


Table 1

| Parameter | Mandatory/Optional | Description | Value |
|---|---|---|---|
| error_rank_dict | - | Information about the faulty NPU. | <int key, int errorType> dictionary: |
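
A callback that receives only `error_rank_dict` might organize the entries by error type before acting on them. The sketch below assumes the dictionary arrives as a plain Python `dict` of integer keys to integer `errorType` values, as the table describes; the function names and the sample values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_error_type(error_rank_dict: Dict[int, int]) -> Dict[int, List[int]]:
    """Group the dictionary keys by their errorType value."""
    grouped = defaultdict(list)
    for key, error_type in sorted(error_rank_dict.items()):
        grouped[error_type].append(key)
    return dict(grouped)


def on_fault_ranks(error_rank_dict: Dict[int, int]) -> None:
    # Hypothetical callback body: print one line per error type.
    for error_type, keys in group_by_error_type(error_rank_dict).items():
        print(f"errorType {error_type}: keys {keys}")


# Sample input with made-up values.
on_fault_ranks({3: 1, 0: 0, 7: 1})
```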

Table 2

| Parameter | Mandatory/Optional | Description | Value |
|---|---|---|---|
| code | - | Action execution result. | |
| msg | - | Message indicating whether the training stops. | String |
| error_rank_dict | - | Information about the faulty NPU. | <int key, int errorType> dictionary: |

Table 3

| Parameter | Mandatory/Optional | Description | Value |
|---|---|---|---|
| error_rank_dict | - | Information about the faulty NPU. | <int key, int errorType> dictionary: |
| strategy_list | - | List of repair policies supported by MindIO TFT based on the current available replica information. | The value is of the list type. The supported repair policies (string) are as follows: |
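
One way a callback could consume `strategy_list` is to match it against a locally configured preference order. This is only a sketch of that idea: the policy strings `policy_a`, `policy_b`, and `policy_c` are placeholders and not the documented policy names, and whether the caller should select a policy inside the callback at all is an assumption.

```python
from typing import Dict, List, Optional

# Placeholder names only; substitute the documented policy strings.
PREFERRED = ["policy_a", "policy_b"]


def pick_strategy(strategy_list: List[str],
                  preferred: List[str] = PREFERRED) -> Optional[str]:
    """Return the first preferred policy that MindIO TFT reports as supported."""
    for name in preferred:
        if name in strategy_list:
            return name
    return None


def on_strategies(error_rank_dict: Dict[int, int],
                  strategy_list: List[str]) -> None:
    # Hypothetical callback taking the two documented inputs.
    choice = pick_strategy(strategy_list)
    print(f"{len(error_rank_dict)} faulty entries; candidate policy: {choice}")


# Sample input with made-up values.
on_strategies({1: 0}, ["policy_b", "policy_c"])
```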

Table 4

| Parameter | Mandatory/Optional | Description | Value |
|---|---|---|---|
| code | - | Action execution result. | |
| msg | - | Message indicating repair success or failure. | String |
| error_rank_dict | - | Information about the faulty NPU. | <int key, int errorType> dictionary: |
| curr_strategy | - | Current repair policy. | The value is of the string type. For details about the value, see strategy_list in Table 3. |

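
A callback with the four inputs listed above could simply record each report for later inspection. The sketch below assumes the arguments are passed in the order the table lists them; `RepairReport`, `reports`, and `on_repair_result` are hypothetical names, the sample values are made up, and the type of `code` is left open because its values are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RepairReport:
    code: object                      # action execution result; type not assumed
    msg: str                          # repair success or failure message
    error_rank_dict: Dict[int, int]   # key -> errorType
    curr_strategy: str                # repair policy currently in use


reports: List[RepairReport] = []


def on_repair_result(code, msg, error_rank_dict, curr_strategy) -> None:
    # Copy the dictionary so later changes by the caller do not alter the record.
    reports.append(RepairReport(code, msg, dict(error_rank_dict), curr_strategy))


# Sample invocation with made-up values.
on_repair_result(0, "sample message", {2: 1}, "sample_strategy")
```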
Return Value
- 0: API call succeeded.
- 1: API call failed.