mindx_elastic.terminating_message.ExceptionCheckpoint(prefix='CKP', directory=None, config=None, partial_save=False, replicas=1)

Function Description

A fixed action executed in every training epoch or iteration. It captures the INT and TERM signals (SIGINT and SIGTERM) and saves a "dying gasp" checkpoint, that is, a final checkpoint written before the training process exits.

Parameters:

  • prefix (str): prefix of the checkpoint file name. Default: 'CKP'.
  • directory (str): path of the folder in which the checkpoint file is stored. Default: None, in which case the file is saved in the current directory.
  • config (CheckpointConfig): checkpoint policy configuration. Default: None.
  • partial_save (bool): whether to enable partial saving. Default: False.
  • replicas (int): number of partially saved copies. The value ranges from 1 to 5. Default: 1.
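
Example

The following is a minimal sketch of the general technique described above (catch INT/TERM, then save a checkpoint at the next safe point in the training loop). It uses only the Python standard library and is not the mindx_elastic implementation. The class DyingGaspSaver, its methods install and step_end, the pickle format, and the file name pattern are all hypothetical and chosen for illustration only. The prefix and directory arguments mirror the parameters documented above.

```python
import os
import pickle
import signal


class DyingGaspSaver:
    """Hypothetical stand-in for illustration; NOT the mindx_elastic class."""

    def __init__(self, prefix="CKP", directory=None):
        self.prefix = prefix
        # Fall back to the current directory when none is given.
        self.directory = directory or os.getcwd()
        self.received = None  # number of the signal caught, if any

    def install(self):
        # Register the handler for INT and TERM (must run in the main thread).
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self._on_signal)

    def _on_signal(self, signum, frame):
        # Only record the signal here; saving happens in step_end,
        # where the training state is known to be consistent.
        self.received = signum

    def step_end(self, step, state):
        """Call once per iteration. Returns the saved path, or None."""
        if self.received is None:
            return None
        os.makedirs(self.directory, exist_ok=True)
        # Assumed naming scheme: <prefix>-<step>.pkl
        path = os.path.join(self.directory, f"{self.prefix}-{step}.pkl")
        with open(path, "wb") as f:
            pickle.dump(state, f)
        return path
```

In a training loop, one would call install() once before training and step_end(step, state) after every iteration, then exit the loop when step_end returns a path. Deferring the save to step_end rather than writing inside the signal handler keeps the handler short and avoids saving a half-updated state.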