Kubernetes 1.25.10 and Later Versions Do Not Support vNPU Restoration

Symptom

On Kubernetes 1.25.10 and later, the cluster scheduling components cannot restore vNPUs after a server restart.

Cause Analysis

After a server restarts, Kubernetes versions earlier than 1.25.10 check only whether the resource type exists when a job pod starts; they do not check whether the resource type is empty. Kubernetes 1.25.10 and later also check whether the resource type is empty. If a job pod starts before the management pod of the device plugin, no resources are available yet, so the job pod is rescheduled.
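
The difference between the two checks can be illustrated with a small model. This is only a sketch of the behavior described above, not Kubernetes source code: the resource name example.com/vnpu, the function names, and the dictionary that stands in for the node's reported resources are all hypothetical.

```python
from typing import Dict

# Hypothetical resource name, used only for illustration.
VNPU_RESOURCE = "example.com/vnpu"


def admits_before_1_25_10(node_resources: Dict[str, int], resource: str) -> bool:
    """Model of the older check: only the presence of the resource type matters."""
    return resource in node_resources


def admits_from_1_25_10(node_resources: Dict[str, int], resource: str) -> bool:
    """Model of the newer check: the resource type must exist and be non-empty."""
    return node_resources.get(resource, 0) > 0


# Assumed state right after a restart: the resource type is still known, but the
# management pod of the device plugin has not reported any devices yet.
after_restart = {VNPU_RESOURCE: 0}
# Assumed state once the management pod of the device plugin is running again.
plugin_ready = {VNPU_RESOURCE: 2}

for label, state in (("after restart", after_restart), ("plugin ready", plugin_ready)):
    old = admits_before_1_25_10(state, VNPU_RESOURCE)
    new = admits_from_1_25_10(state, VNPU_RESOURCE)
    print(f"{label}: older check={old}, newer check={new}")
```

In this model, a job pod that starts in the "after restart" state passes the older check but fails the newer one, which corresponds to the rescheduling described above.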

Solution

This is a known issue in the Kubernetes community and has not been resolved. If you can modify the source code, you can handle the problem manually. If you cannot modify the source code, do not use an affected Kubernetes version.