ReportState

Function

Enumerates the training states reported by the decorator.

Format

mindio_ttp.framework_ttp.ReportState

Parameters

Parameter

Mandatory/Optional

Description

Value

ReportState

Mandatory

Type of the reported training state.

  • RS_NORMAL: normal state
  • RS_UCE: uncorrectable error (UCE)
  • RS_UCE_CORRUPTED: multi-bit ECC error in the on-chip memory
  • RS_HCCL_FAILED: HCCL recomputation failure
  • RS_UNKNOWN: other errors
  • RS_INIT_FINISH: exception thrown by the newly started ARF node after the training process is initialized in the MindSpore framework
  • RS_PREREPAIR_FINISH: exception thrown by the newly started ARF node
  • RS_STEP_FINISH: exception thrown when the step-level pause in the subhealth hot switchover is complete
  • RS_NORMAL.value: ttp_c2python_api.ReportState_RS_NORMAL
  • RS_UCE.value: ttp_c2python_api.ReportState_RS_UCE
  • RS_UCE_CORRUPTED.value: ttp_c2python_api.ReportState_RS_UCE_CORRUPTED
  • RS_HCCL_FAILED.value: ttp_c2python_api.ReportState_RS_HCCL_FAILED
  • RS_UNKNOWN.value: ttp_c2python_api.ReportState_RS_UNKNOWN
  • RS_INIT_FINISH.value: ttp_c2python_api.ReportState_RS_INIT_FINISH
  • RS_PREREPAIR_FINISH.value: ttp_c2python_api.ReportState_RS_PREREPAIR_FINISH
  • RS_STEP_FINISH.value: ttp_c2python_api.ReportState_RS_STEP_FINISH

Return Value

None