Configuration File Description
Resumable training ignores collateral faults caused by special faults in associated scenarios. ClusterD obtains special faults and processes them based on the associated fault policy configured in the relationFaultCustomization.json and faultDuration.json files.
relationFaultCustomization.json and faultDuration.json are system configuration files. Do not modify them unless otherwise required.
| Parameter | Description | Value |
|---|---|---|
| TriggerFault | Collateral fault code. Currently, fault codes configured in faultCode.json and SwitchFaultCode.json are supported. | String |
| RelationFaults | List of faults to be associated, which can be one or more fault codes. Currently, fault codes configured in faultCode.json and SwitchFaultCode.json are supported. | String list |
| FaultStrategy | Processing policy of a job when the associated fault is successfully matched. | String |

Note: When a fault configured by RelationFaults occurs, ClusterD adds the fault to the fault code queue to be processed. If the fault corresponding to TriggerFault occurs within the interval configured by TimeOutInterval, a job is processed based on the configured FaultStrategy. If the interval exceeds the value of TimeOutInterval, the interconnect device fault is processed using the SubHealth policy. If a processor fault or parameter plane network fault occurs, the fault is ignored.
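The matching behavior described in the note can be sketched as follows. This is a minimal illustration, not ClusterD's implementation. The `RelationRule` class, the `decide` function, the `"Ignore"` label, the `"Separate"` strategy value, and the fault codes in the usage lines are all hypothetical. The sketch also assumes that the interval is measured from the moment the RelationFaults fault is queued, that the boundary is inclusive, and that the "ignored" outcome for processor and parameter plane network faults applies once the interval has passed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RelationRule:
    """Hypothetical in-memory form of one associated-fault rule."""
    trigger_fault: str           # TriggerFault
    relation_faults: List[str]   # RelationFaults
    fault_strategy: str          # FaultStrategy


def decide(rule: RelationRule,
           timeout_s: int,
           queued_code: str,
           queued_at: float,
           trigger_at: Optional[float],
           is_interconnect_fault: bool) -> str:
    """Return the policy applied to a queued RelationFaults fault.

    queued_at / trigger_at are timestamps in seconds; trigger_at is None
    when the TriggerFault fault never occurred.
    """
    if queued_code not in rule.relation_faults:
        raise ValueError(f"{queued_code} is not listed in RelationFaults")

    # Assumption: the TriggerFault must occur no later than TimeOutInterval
    # seconds after the RelationFaults fault was queued.
    matched = (trigger_at is not None
               and 0 <= trigger_at - queued_at <= timeout_s)
    if matched:
        return rule.fault_strategy

    # Interval exceeded: interconnect device faults fall back to SubHealth;
    # processor and parameter plane network faults are ignored.
    return "SubHealth" if is_interconnect_fault else "Ignore"


# Usage with placeholder codes and a placeholder strategy name.
rule = RelationRule("TRIGGER_CODE", ["RELATED_CODE"], "Separate")
print(decide(rule, 60, "RELATED_CODE", 100.0, 130.0, True))
print(decide(rule, 60, "RELATED_CODE", 100.0, None, False))
```

Keeping the decision in one pure function makes the timeout boundary easy to test without a running cluster.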
| Parameter | Description | Value |
|---|---|---|
| FaultCode | Fault code. Currently, fault codes configured in faultCode.json and SwitchFaultCode.json are supported. | String |
| FaultType | Fault type: | String |
| TimeOutInterval | Maximum association time of a fault code, in seconds. | Integer |
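As a rough aid for reviewing entries that use the three parameters above, the following sketch checks each field against the value type listed in the table. The file layout shown here (a top-level JSON array of objects) is an assumption, since the source does not show the actual structure of the file. The sample values `EXAMPLE_CODE` and `EXAMPLE_TYPE` are placeholders, and `validate_entries` is a hypothetical helper, not part of ClusterD.

```python
import json

# Assumed layout: a JSON array of objects keyed by the documented parameters.
SAMPLE = """
[
  {"FaultCode": "EXAMPLE_CODE", "FaultType": "EXAMPLE_TYPE", "TimeOutInterval": 60}
]
"""

# Value types taken from the parameter table.
EXPECTED_TYPES = {"FaultCode": str, "FaultType": str, "TimeOutInterval": int}


def validate_entries(entries):
    """Return a list of human-readable type errors (empty if all entries pass)."""
    errors = []
    for index, entry in enumerate(entries):
        for key, expected in EXPECTED_TYPES.items():
            value = entry.get(key)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                errors.append(f"entry {index}: {key} must be {expected.__name__}")
    return errors


problems = validate_entries(json.loads(SAMPLE))
print(problems if problems else "all entries match the documented types")
```

Because these are system configuration files, a check like this is better suited to verifying an unavoidable change than to encouraging routine edits.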