Constraints
- Currently, this feature is supported only on Kubernetes clusters that use the cluster scheduling components.
- This feature depends on the Volcano and Ascend Device Plugin components. When the rescheduling policy is enabled, an exception in Ascend Device Plugin also triggers fault rescheduling.
- Jobs of the Deployment type must carry the fault rescheduling label "fault-scheduling: grace". The value "fault-scheduling: force" is not supported.
- Table 1 lists the supported Ascend hardware.
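
The label requirement in the constraints above can be checked before a job is submitted. The following is a minimal sketch, not part of the product: the function name `check_fault_scheduling_label`, the sample manifest, and the assumption that the label sits under `metadata.labels` of the Deployment are all illustrative. Confirm the exact label location against the job YAML examples for your version.

```python
# Hypothetical pre-submission check for the fault rescheduling label.
# Assumption: the label is read from metadata.labels of the Deployment manifest.

LABEL_KEY = "fault-scheduling"
SUPPORTED_POLICIES = {"grace"}  # "force" is not supported for Deployment jobs


def check_fault_scheduling_label(manifest: dict) -> str:
    """Return the rescheduling policy if the manifest meets the constraint.

    Raises ValueError if the manifest is not a Deployment, the label is
    missing, or the label value is not supported.
    """
    if manifest.get("kind") != "Deployment":
        raise ValueError("this check only applies to jobs of the Deployment type")

    labels = (manifest.get("metadata") or {}).get("labels") or {}
    policy = labels.get(LABEL_KEY)
    if policy is None:
        raise ValueError(f'missing label "{LABEL_KEY}"')
    if policy not in SUPPORTED_POLICIES:
        raise ValueError(f'unsupported value "{LABEL_KEY}: {policy}"')
    return policy


# Illustrative manifest fragment; the name is a placeholder.
example_deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        "name": "example-inference-job",
        "labels": {LABEL_KEY: "grace"},
    },
}

if __name__ == "__main__":
    print(check_fault_scheduling_label(example_deployment))
```

Running the check against a manifest labeled "fault-scheduling: force" raises `ValueError`, which mirrors the constraint that only "grace" is accepted for Deployment jobs.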