Container Recovery
Function Highlights
In the scenario where Kubernetes is not deployed, if the training or inference process is abnormal, you can configure the container recovery function to recover containers.
- Fault detection: Container Manager is used to detect job faults.
- Fault rectification: After a fault occurs, the faulty device can be automatically recovered without manual intervention.
- Container recovery: When a fault occurs, the container is stopped. After the fault is rectified, the container is started again.
Required Component
Container Manager
Instructions
- Refer to Installation and Deployment for component installation.
- Refer to Appliance Feature Guide for feature usage.
Parent topic: Feature Description