Kubernetes 1.25.10 and Later Versions Do Not Support vNPU Restoration
Symptom
With Kubernetes 1.25.10 or later, the cluster scheduling components cannot restore vNPUs after a server restart.
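To decide whether a cluster falls in the affected range, the version must be compared numerically, because a plain string comparison would rank 1.25.9 above 1.25.10. The following is a minimal sketch of such a check. The helper names parse_k8s_version and is_affected are hypothetical, and the sketch assumes that "later" means simple version ordering; this document does not say whether early patch releases of newer minor versions behave the same way.

```python
import re

# First Kubernetes version for which this document reports the behavior.
FIRST_AFFECTED = (1, 25, 10)


def parse_k8s_version(version: str) -> tuple:
    """Parse a string such as 'v1.25.10' into (1, 25, 10).

    Any suffix after the patch number (for example a build tag) is ignored.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version is 1.25.10 or later (assumed plain ordering)."""
    return parse_k8s_version(version) >= FIRST_AFFECTED
```

The version string itself can be read from the cluster, for example from the output of kubectl version, and passed to is_affected.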
Cause Analysis
After a server restarts, Kubernetes versions earlier than 1.25.10 check only whether the requested resource type exists when a job pod starts; they do not check whether the resource type is empty. Kubernetes 1.25.10 and later versions also check whether the resource type is empty. If a job pod starts before the management pod of the device plugin, no resources are available yet, so the job pod is rescheduled.
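The difference between the two checks can be illustrated with a simplified model. This is only a sketch of the behavior described above, not the actual Kubernetes code: the function names, the resource name example.com/vnpu, and the use of a dictionary with a count of 0 to represent an "empty" resource type are all assumptions made for illustration.

```python
def admit_before_1_25_10(requested: dict, allocatable: dict) -> bool:
    """Older behavior as described above: only the presence of the resource type is checked."""
    return all(name in allocatable for name in requested)


def admit_from_1_25_10(requested: dict, allocatable: dict) -> bool:
    """Newer behavior as described above: the resource type must exist and must not be empty."""
    return all(allocatable.get(name, 0) > 0 for name in requested)


# Hypothetical vNPU resource name, used only for illustration.
VNPU = "example.com/vnpu"

# State right after a reboot: the resource type is still registered on the node,
# but the device plugin management pod has not reported any devices yet.
after_reboot = {VNPU: 0}

# State once the device plugin management pod is running again.
plugin_ready = {VNPU: 2}

# A job pod that requests one vNPU.
job_request = {VNPU: 1}
```

In this model, admit_before_1_25_10(job_request, after_reboot) accepts the job pod while the resource type is still empty, whereas admit_from_1_25_10(job_request, after_reboot) rejects it, which corresponds to the rescheduling described above. Both functions accept the pod once plugin_ready applies.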
Solution
This is a known issue in the Kubernetes community and has not been resolved. If you can modify the source code, you can work around the problem manually. Otherwise, do not use an affected Kubernetes version.