tft_register_save_ckpt_handler
Function
Registers the dump callback function in the framework.
For MindSpeed-LLM, the callback function has been adapted by MindIO TFT. For other frameworks, you need to ensure the security of the callback function.
Format
mindio_ttp.framework_ttp.tft_register_save_ckpt_handler(func: Callable, ctx = None)
Parameters
| Parameter | Mandatory/Optional | Description | Value |
|---|---|---|---|
| func | Mandatory | Function that saves the dying gasp checkpoint. | The callback function cannot be empty. For details about the input parameters of the callback function, see Table 1. The callback function has no return value. If the execution fails, an exception is thrown. |
| ctx | Optional | Callback function context. | This parameter is left empty by default. |

Table 1 Callback function parameters

| Parameter | Mandatory/Optional | Description | Value |
|---|---|---|---|
| step | - | Step for dumping optimizer data. | Positive integer |
| save_info | - | Rank lists generated when different optimizers participate in saving the dying gasp checkpoint. Each element is a dictionary. The dictionaries are ordered by optimizer type: ATTENTION (0), then MOE (1). | `[{"type": int, "ranks": list}]`, where `type` is the optimizer type and `ranks` is the rank list generated when that optimizer saves the dying gasp checkpoint. |
| args | - | Parameter set by tft_set_step_args. | Determined by the registration party. |
| ctx | - | Callback function context. | Determined by the registration party. |

Return Value
No return value. If an error occurs, an error log is recorded and an exception is thrown.
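
Example

The following is a minimal sketch of a callback and its registration, not the MindSpeed-LLM adaptation. It assumes that the callback receives its arguments positionally in the order listed in Table 1 (step, save_info, args, ctx). The names save_dying_gasp_ckpt, register_handler, and OPTIMIZER_TYPE_NAMES, as well as the use of a dictionary with a "saved" list as ctx, are hypothetical and only serve the illustration. The actual checkpoint writing is replaced by a placeholder.

```python
from typing import Any, Dict, List

# Hypothetical labels for the optimizer type codes described for save_info.
OPTIMIZER_TYPE_NAMES = {0: "ATTENTION", 1: "MOE"}


def save_dying_gasp_ckpt(step: int, save_info: List[Dict[str, Any]],
                         args: Any, ctx: Any) -> None:
    """Hypothetical callback; argument order is assumed from the parameter table."""
    if not isinstance(step, int) or step <= 0:
        # The callback has no return value, so failures are reported by raising.
        raise ValueError(f"step must be a positive integer, got {step!r}")
    for entry in save_info:
        name = OPTIMIZER_TYPE_NAMES.get(entry["type"], "UNKNOWN")
        # Placeholder: a real callback would write the optimizer state for
        # entry["ranks"] here. This sketch only records what would be saved.
        ctx["saved"].append((step, name, list(entry["ranks"])))


def register_handler(ctx: Any) -> bool:
    """Register the callback if mindio_ttp is importable; return True on success."""
    try:
        from mindio_ttp.framework_ttp import tft_register_save_ckpt_handler
    except ImportError:
        return False
    tft_register_save_ckpt_handler(save_dying_gasp_ckpt, ctx)
    return True
```

Calling register_handler({"saved": []}) during training setup would pass the dictionary as ctx, so the same object is expected to reach the callback when it is invoked. Raising an exception on invalid input follows the requirement that the callback reports failure by throwing rather than by returning a value.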