tft_can_do_uce_repair

Function

MindSpore calls this API to determine whether the optimizer data has been corrupted. The decision is based on the time at which the L2 Cache triggered the UCE and on the timestamps of the optimizer update. The API returns whether the data is recoverable.

This API judges optimizer data corruption solely by checking whether the time ranges intersect. It does not inspect memory addresses.

Format

mindio_ttp.framework_ttp.tft_can_do_uce_repair(hbm_error_time: int, start_time: int = None, end_time: int = None)

Parameters

| Parameter | Mandatory/Optional | Description | Type |
| --- | --- | --- | --- |
| hbm_error_time | Mandatory | Time when the L2 Cache triggers a UCE. | int |
| start_time | Optional | Time obtained from the device before the optimizer is updated locally. | int |
| end_time | Optional | Time obtained from the device after the optimizer is updated locally. | int |

Return Value

Boolean value indicating whether fast recovery from the UCE can be performed, as determined by the time intersection check.
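
Example

The following is a minimal sketch of a time-intersection check of the kind described in the Function section. It is not the MindIO implementation. The function name can_repair_sketch is hypothetical. The closed update window [start_time, end_time] and the handling of missing timestamps are assumptions made for illustration only, and all three values are assumed to come from the same clock in the same unit.

```python
from typing import Optional


def can_repair_sketch(hbm_error_time: int,
                      start_time: Optional[int] = None,
                      end_time: Optional[int] = None) -> bool:
    """Hypothetical stand-in for the time-intersection check (not the MindIO code)."""
    # Assumption: without both timestamps no update window can be formed,
    # so this sketch conservatively reports the data as not repairable.
    if start_time is None or end_time is None:
        return False
    # Assumption: a UCE inside the closed window [start_time, end_time]
    # may have corrupted the optimizer data.
    error_during_update = start_time <= hbm_error_time <= end_time
    return not error_during_update


# UCE reported while the optimizer update was in progress.
print(can_repair_sketch(hbm_error_time=150, start_time=100, end_time=200))  # False
# UCE reported after the optimizer update had finished.
print(can_repair_sketch(hbm_error_time=250, start_time=100, end_time=200))  # True
```

The sketch compares timestamps only, which mirrors the point that the check works on time ranges and not on memory addresses. The real API may treat window boundaries and omitted arguments differently.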