tft_notify_controller_on_global_rank

Function

MindCluster calls this API to notify MindIO TFT of global information about faulty NPUs.

Format

mindio_ttp.controller_ttp.tft_notify_controller_on_global_rank(fault_ranks: dict, time: int = 1)

Parameters

| Parameter | Mandatory/Optional | Description | Value |
|---|---|---|---|
| fault_ranks | Mandatory | Information about the faulty NPUs. | Dictionary that maps an int key to an int errorType. key: rank ID of the faulty NPU. errorType: fault type (0: UCE; 1: non-UCE fault). |
| time | Optional | Maximum time for interacting with the MindCluster repair policy, which is determined based on the environment variable. | Integer in the range [1, 3600], in seconds. Default: 1. |

Return Value

  • 0: API call succeeded.
  • 1: API call failed.
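
Example

The following is a minimal sketch of how a caller might assemble fault_ranks and interpret the return value. It is not taken from MindCluster or MindIO TFT: build_fault_ranks and notify_global_faults are hypothetical helper names, and the notification function is passed in as a parameter so that the sketch runs without mindio_ttp installed. Only the argument shapes, the [1, 3600] range, and the 0/1 return codes come from the description above.

```python
from typing import Callable, Dict, Iterable

# errorType values listed under Parameters.
UCE = 0
NON_UCE = 1


def build_fault_ranks(uce_ranks: Iterable[int],
                      non_uce_ranks: Iterable[int]) -> Dict[int, int]:
    """Hypothetical helper: map each faulty rank ID to its errorType."""
    fault_ranks = {int(rank): NON_UCE for rank in non_uce_ranks}
    # Assumption of this sketch: a rank listed in both groups is reported as UCE.
    fault_ranks.update({int(rank): UCE for rank in uce_ranks})
    return fault_ranks


def notify_global_faults(notify_fn: Callable[[Dict[int, int], int], int],
                         fault_ranks: Dict[int, int],
                         time: int = 1) -> bool:
    """Hypothetical wrapper: validate arguments, call notify_fn, return True on 0."""
    if not 1 <= time <= 3600:
        raise ValueError("time must be in the range [1, 3600] seconds")
    for rank, error_type in fault_ranks.items():
        if error_type not in (UCE, NON_UCE):
            raise ValueError(f"rank {rank}: unknown errorType {error_type}")
    # Return value 0 means the API call succeeded; 1 means it failed.
    return notify_fn(fault_ranks, time) == 0
```

In an environment where mindio_ttp is available, notify_fn would be mindio_ttp.controller_ttp.tft_notify_controller_on_global_rank, for example notify_global_faults(tft_notify_controller_on_global_rank, build_fault_ranks([3], [7]), time=30). Validating the arguments before the call is a design choice of this sketch, not a documented requirement of the API.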