Volcano Works Abnormally, and "Failed to get plugin" Is Displayed in the Log

Symptom

The volcano-scheduler-xxxx pod is in the Running state, but scheduling does not work as expected. The volcano-scheduler log contains the following error:

E1026 10:55:44.995088       1 framework.go:38] Failed to get plugin volcano-npu_v{version}_linux-aarch64
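
When the log is long, it can help to extract the plugin name that volcano-scheduler failed to load. The following is a minimal sketch, assuming the log has already been saved as text and that the error lines follow the format shown above. The function name failed_plugins is illustrative only and is not part of Volcano.

```python
import re

# Matches the error shown above and captures the plugin name that follows it.
FAILED_PLUGIN = re.compile(r"Failed to get plugin (\S+)")


def failed_plugins(log_text):
    """Return the distinct plugin names reported as not found, in first-seen order."""
    seen = []
    for match in FAILED_PLUGIN.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen
```

The returned name is the one to compare against the plugin name in the YAML file and the .so file in the image.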

Cause Analysis

The Volcano startup YAML file specifies the name of the scheduling plugin to be used:

...
# Source: volcano/templates/scheduler.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
      - name: volcano-npu_v{version}_linux-aarch64  # Name of the scheduling plugin
    - plugins:
      - name: drf
      - name: predicates
...

During image creation, the .so file of the scheduling plugin in the current directory is copied into the container image for volcano-scheduler to use:

FROM alpine:latest

COPY vc-scheduler /vc-scheduler
COPY volcano-npu_*.so plugins/
...

If the name of the scheduling plugin file copied into the container differs from the plugin name configured in the YAML file, volcano-scheduler cannot find the plugin and reports the "Failed to get plugin" error.
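
The mismatch can be checked before rebuilding the image. The following is a minimal sketch, not an official tool. It assumes that a custom plugin name in volcano-scheduler.conf starts with volcano-npu and that the matching file is expected to be named "<plugin name>.so". The function names and the prefix parameter are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches plugin entries such as "- name: gang" and ignores trailing comments.
PLUGIN_LINE = re.compile(r"^\s*-\s*name:\s*([^\s#]+)")


def configured_plugins(conf_text):
    """Return the plugin names listed in the scheduler configuration text."""
    names = []
    for line in conf_text.splitlines():
        match = PLUGIN_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_custom_plugins(conf_text, so_paths, prefix="volcano-npu"):
    """Return configured custom plugins that have no same-named .so file.

    Assumption: a custom plugin is any configured name starting with `prefix`,
    and its file is expected to be named "<plugin name>.so".
    """
    available = {
        PurePosixPath(path).stem
        for path in so_paths
        if PurePosixPath(path).suffix == ".so"
    }
    return [
        name
        for name in configured_plugins(conf_text)
        if name.startswith(prefix) and name not in available
    ]
```

An empty result means every configured custom plugin has a matching .so file. Any name that is returned points to the plugin entry or the file that needs to be corrected.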

Solution

  1. Rebuild the volcano-scheduler image using a YAML file and a scheduling plugin .so file whose plugin names match.
  2. Uninstall Volcano, and then reinstall it.