Rescheduling Upon Inference Card Faults
Function Highlights
If an inference NPU managed by the cluster scheduling components becomes faulty, the components isolate the faulty resource and automatically reschedule the affected inference job.
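The isolate-then-reschedule behavior can be pictured with a small conceptual sketch. It is not the implementation used by Volcano, ClusterD, or any other component listed on this page; the `Npu` class, the `reschedule_on_fault` function, and the card and job names are all hypothetical and exist only to illustrate the idea under simplified assumptions (one job per card, no priorities, no node topology).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Npu:
    """Hypothetical model of one inference card (not a real API)."""
    name: str
    healthy: bool = True
    isolated: bool = False
    job: Optional[str] = None


def reschedule_on_fault(npus: List[Npu]) -> Dict[str, str]:
    """Isolate faulty cards and move their jobs to idle healthy cards.

    Returns a mapping of job name -> new card name for each job that moved.
    Jobs for which no idle healthy card exists are simply left unplaced.
    """
    pending: List[str] = []

    # Step 1: isolate every faulty card and evict the job running on it.
    for npu in npus:
        if not npu.healthy and not npu.isolated:
            npu.isolated = True
            if npu.job is not None:
                pending.append(npu.job)
                npu.job = None

    # Step 2: place evicted jobs on healthy, non-isolated, idle cards.
    free = [n for n in npus if n.healthy and not n.isolated and n.job is None]
    moved: Dict[str, str] = {}
    for job, target in zip(pending, free):
        target.job = job
        moved[job] = target.name
    return moved


# Illustrative usage: npu-0 is faulty, so its job moves to the idle npu-1.
cards = [Npu("npu-0", healthy=False, job="infer-a"), Npu("npu-1")]
moved = reschedule_on_fault(cards)
```

In this sketch the faulty card stays marked as isolated after the call, so later scheduling passes would skip it until it is repaired and the mark is cleared by some other (unmodeled) step.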
Required Components
- Ascend Device Plugin
- Ascend Docker Runtime
- Ascend Operator
- Volcano
- ClusterD
- NodeD
Instructions
- Refer to Installation and Deployment for component installation.
- Refer to Rescheduling Upon Inference Card Faults for feature usage.