Volcano Component Works Abnormally, and "Failed to get plugin volcano-npu_xxx_linux-aarch64" Is Displayed in the Log

Symptom

The Volcano pod volcano-scheduler-xxxx is in the Running state, but scheduling does not work properly. The volcano-scheduler log contains the following error:

E1026 10:55:44.995088       1 framework.go:38] Failed to get plugin volcano-npu_v3.0.RC2_linux-aarch64
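When checking many log lines, for example output saved from `kubectl logs -n volcano-system volcano-scheduler-xxxx`, it can help to pull out the missing plug-in name automatically. The following is a minimal sketch. The function name `missing_plugin_name` is hypothetical, and the regular expression assumes the log text has exactly the "Failed to get plugin <name>" form shown above.

```python
import re
from typing import Optional

# Captures the plug-in name that follows "Failed to get plugin" in a
# volcano-scheduler log line (assumes the message format shown above).
_FAILED_PLUGIN_RE = re.compile(r"Failed to get plugin (\S+)")


def missing_plugin_name(log_line: str) -> Optional[str]:
    """Return the plug-in name reported as missing, or None if the line
    is not a "Failed to get plugin" error."""
    match = _FAILED_PLUGIN_RE.search(log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = ("E1026 10:55:44.995088       1 framework.go:38] "
            "Failed to get plugin volcano-npu_v3.0.RC2_linux-aarch64")
    print(missing_plugin_name(line))  # volcano-npu_v3.0.RC2_linux-aarch64
```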

Causes

The Volcano startup YAML file specifies, in the volcano-scheduler-configmap ConfigMap, the name of the scheduling plug-in that volcano-scheduler loads.

...
# Source: volcano/templates/scheduler.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
      - name: volcano-npu_v3.0.RC2_linux-aarch64  # Name of the scheduling plug-in; must match the .so file name
    - plugins:
      - name: drf
      - name: predicates
...
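To see which plug-in names the configuration actually requests, the `- name:` entries can be listed from the volcano-scheduler.conf text. This is a line-based sketch that assumes the `- name: <plugin>` layout shown above; it is not a general YAML parser (quoted names, for example, are not handled), and the function name `configured_plugins` is hypothetical.

```python
import re
from typing import List

# Matches list entries such as "- name: gang" and captures the name,
# stopping before whitespace or a trailing "# comment".
_PLUGIN_ENTRY_RE = re.compile(r"^\s*-\s*name:\s*([^\s#]+)")


def configured_plugins(conf_text: str) -> List[str]:
    """Return plug-in names from "- name: <plugin>" entries, in file order."""
    names = []
    for line in conf_text.splitlines():
        match = _PLUGIN_ENTRY_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    conf = "\n".join([
        'actions: "enqueue, allocate, backfill"',
        "tiers:",
        "- plugins:",
        "  - name: priority",
        "  - name: gang",
        "  - name: conformance",
        "  - name: volcano-npu_v3.0.RC2_linux-aarch64  # Name of the scheduling plug-in",
        "- plugins:",
        "  - name: drf",
        "  - name: predicates",
    ])
    print(configured_plugins(conf))
```

Lines such as `- plugins:` and `name: volcano-scheduler-configmap` (no leading `-`) are not matched, so the sketch also works on the whole ConfigMap manifest above.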

When the volcano-scheduler image is built, the Dockerfile copies the scheduling plug-in .so file from the current (build) directory into the image, where volcano-scheduler loads it.

FROM alpine:latest

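COPY vc-scheduler /vc-scheduler
# Copy every scheduling plug-in .so file in the build directory into the plugins/ directory of the image.
COPY volcano-npu_*.so plugins/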
...
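Before building, you can list which plug-in names the build directory would provide. This sketch assumes, consistently with the error above, that a plug-in is registered under its .so file name without the `.so` extension; treat that as an assumption, not a documented Volcano guarantee. `fnmatchcase` approximates the Dockerfile's `volcano-npu_*.so` wildcard, and the names `PLUGIN_GLOB` and `plugin_names_from_files` are hypothetical.

```python
import fnmatch
import os
from typing import Iterable, List

# Same wildcard as "COPY volcano-npu_*.so plugins/" in the Dockerfile above.
PLUGIN_GLOB = "volcano-npu_*.so"


def plugin_names_from_files(filenames: Iterable[str]) -> List[str]:
    """Return the plug-in names the matching .so files would provide,
    assuming each name is the file name minus the ".so" extension."""
    names = []
    for path in filenames:
        base = os.path.basename(path)
        # Case-sensitive match, like the COPY wildcard on Linux.
        if fnmatch.fnmatchcase(base, PLUGIN_GLOB):
            names.append(os.path.splitext(base)[0])
    return sorted(names)


if __name__ == "__main__":
    # Scan the current directory, i.e. the Docker build context.
    print(plugin_names_from_files(os.listdir(".")))
```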

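volcano-scheduler looks up the scheduling plug-in by the name configured in the YAML file. If the name of the .so plug-in file copied into the image does not match that name (for example, the YAML file names volcano-npu_v3.0.RC2_linux-aarch64 but the image contains a .so file for a different version), the plug-in cannot be found and the "Failed to get plugin" error occurs.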
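The mismatch described above can be checked as a simple set comparison: every configured name must be either a built-in plug-in or provided by a .so file. This is a minimal sketch; the function name `unresolved_plugins` is hypothetical, the built-in list is passed in by the caller (the example uses only the built-in names from the YAML above, not a complete list), and the RC1 file name is a hypothetical mismatched plug-in.

```python
from typing import Iterable, List


def unresolved_plugins(configured: Iterable[str],
                       available: Iterable[str],
                       builtin: Iterable[str]) -> List[str]:
    """Return configured plug-in names that are neither built in nor
    provided by a .so file in the image, in configuration order."""
    known = set(available) | set(builtin)
    return [name for name in configured if name not in known]


if __name__ == "__main__":
    configured = ["priority", "gang", "conformance",
                  "volcano-npu_v3.0.RC2_linux-aarch64", "drf", "predicates"]
    builtin = ["priority", "gang", "conformance", "drf", "predicates"]
    # Hypothetical: the image was built with a .so file for another version.
    available = ["volcano-npu_v3.0.RC1_linux-aarch64"]
    print(unresolved_plugins(configured, available, builtin))
    # ['volcano-npu_v3.0.RC2_linux-aarch64']
```

An empty result means every configured name resolves, which is the condition the Solution steps restore.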

Solution

  1. Use a YAML file and a scheduling plug-in .so file whose plug-in names match, and rebuild the volcano-scheduler image with that .so file.
  2. Uninstall Volcano, and then reinstall it using the matching YAML file.