tft_register_save_ckpt_handler

Function

Registers the framework's dump callback function, which saves the dying gasp checkpoint.

For MindSpeed-LLM, MindIO TFT already provides an adapted callback function. For other frameworks, you are responsible for ensuring the security of the callback function.

Format

mindio_ttp.framework_ttp.tft_register_save_ckpt_handler(func: Callable, ctx = None)

Parameters

| Parameter | Mandatory/Optional | Description | Value |
|---|---|---|---|
| func | Mandatory | Function that saves the dying gasp checkpoint. | The callback function cannot be empty. For details about the input parameters of the callback function, see Table 1. The callback function has no return value. If the execution fails, an exception is thrown. |
| ctx | Optional | Callback function context. | This parameter is left empty (None) by default. |
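A minimal registration sketch, assuming the callback is invoked with the four parameters of Table 1 in the listed order. The callback name `save_dying_gasp_ckpt`, the contents of `callback_ctx`, and the fallback stub used when `mindio_ttp` is not installed are illustrative assumptions, not part of MindIO TFT.

```python
from typing import Any, Callable

try:
    # Import path taken from the Format section of this page.
    from mindio_ttp.framework_ttp import tft_register_save_ckpt_handler
except ImportError:
    # Hypothetical stand-in so the sketch runs without MindIO TFT installed.
    _registered = {}

    def tft_register_save_ckpt_handler(func: Callable, ctx: Any = None) -> None:
        if func is None:
            raise ValueError("func must not be empty")
        _registered["func"], _registered["ctx"] = func, ctx


def save_dying_gasp_ckpt(step: int, save_info: list, args: Any, ctx: Any) -> None:
    """Hypothetical callback; parameter order is assumed to follow Table 1."""
    # A real callback would write optimizer state here and raise on failure.
    ctx["last_saved_step"] = step


callback_ctx = {"last_saved_step": None}  # illustrative context object
tft_register_save_ckpt_handler(save_dying_gasp_ckpt, ctx=callback_ctx)
```

The context object is passed once at registration time, so it is a convenient place to keep state (paths, handles, counters) that the callback needs later.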

Table 1 Parameters of the callback function

| Parameter | Mandatory/Optional | Description | Value |
|---|---|---|---|
| step | - | Step for dumping optimizer data. | Positive integer |
| save_info | - | List describing which ranks save the dying gasp checkpoint for each participating optimizer. Each element is a dictionary, and the elements are ordered by optimizer type: ATTENTION (0), then MOE (1). | `[{"type": int, "ranks": list}]`, where `"type"` is the optimizer type and `"ranks"` is the rank list generated when that optimizer saves the dying gasp checkpoint. |
| args | - | Parameter set by tft_set_step_args. | Determined by the registration party. |
| ctx | - | Callback function context. | Determined by the registration party. |
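The `save_info` structure can be unpacked inside a callback roughly as follows. This is a sketch under stated assumptions: the helper names `ranks_by_optimizer` and `optimizers_for_rank` are hypothetical, and the rank values in the sample input are made up for illustration.

```python
ATTENTION = 0  # optimizer type values as listed for save_info
MOE = 1


def ranks_by_optimizer(save_info: list) -> dict:
    """Map each optimizer type to the ranks that save its checkpoint."""
    return {entry["type"]: list(entry["ranks"]) for entry in save_info}


def optimizers_for_rank(save_info: list, rank: int) -> list:
    """Return the optimizer types that the given rank is listed for."""
    return [entry["type"] for entry in save_info if rank in entry["ranks"]]


# Illustrative input only; real rank lists come from the framework.
example_save_info = [
    {"type": ATTENTION, "ranks": [0, 1]},
    {"type": MOE, "ranks": [1, 3]},
]

print(ranks_by_optimizer(example_save_info))      # {0: [0, 1], 1: [1, 3]}
print(optimizers_for_rank(example_save_info, 1))  # [0, 1]
```

A callback could use the second helper with its own rank to decide which optimizer states it has to write, and skip saving entirely when the result is empty.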

Return Value

No return value. If an error occurs, an error log is recorded and an exception is thrown.
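Because registration signals failure by throwing an exception, callers may want to guard the call. The sketch below shows one way to do that. The helper `register_or_report`, the stand-in `fake_register`, and the `ValueError` it raises are assumptions made for illustration; this page does not state which exception type the real function throws, so the helper catches `Exception`.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("ckpt_registration")


def register_or_report(register_fn: Callable, func: Optional[Callable], ctx: Any = None) -> bool:
    """Call a registration function and report failure instead of crashing."""
    try:
        register_fn(func, ctx)
    except Exception as exc:  # exception type is not specified on this page
        logger.error("save-checkpoint handler registration failed: %s", exc)
        return False
    return True


def fake_register(func: Optional[Callable], ctx: Any = None) -> None:
    """Hypothetical stand-in for tft_register_save_ckpt_handler."""
    if func is None:
        raise ValueError("callback function cannot be empty")


def noop_callback(step, save_info, args, ctx):
    """Placeholder callback used only for this sketch."""


ok = register_or_report(fake_register, noop_callback)  # registration succeeds
failed = register_or_report(fake_register, None)       # empty callback is rejected
```

In a real training script, `fake_register` would be replaced by `tft_register_save_ckpt_handler`, and a `False` result would typically abort startup rather than continue without a dying gasp handler.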