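The MindCluster Volcano component provided by cluster scheduling adds NPU scheduling capabilities on top of open-source Volcano. These capabilities are delivered by integrating the Ascend-volcano-plugin plugin that cluster scheduling provides for developers. The open-source Volcano framework has a plugin mechanism that lets users register scheduling plugins to implement different scheduling policies.
Ascend-volcano-plugin currently supports only open-source Volcano 1.4.0 and 1.7.0, and it does not modify the open-source Volcano framework.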
cd $GOPATH/src/volcano.sh/
git clone -b release-1.7 https://github.com/volcano-sh/volcano.git
cd $GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/build
chmod +x build.sh
./build.sh v1.7.0
The compiled binaries and dynamic link libraries are generated in the "$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/output" directory.
Table 1 lists the files produced by the build.
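As a quick check after the build, the following minimal sketch lists the plugin libraries found in a directory. It assumes the libraries follow the volcano-npu_*.so naming that the Dockerfile in this guide copies; the function name list_npu_plugins is hypothetical and is not part of the plugin's tooling.

```shell
#!/bin/sh
# Hypothetical helper: print the NPU plugin libraries found in a directory.
# Assumption: the libraries are named volcano-npu_*.so.
# Returns 0 if at least one library is found, 1 otherwise.
list_npu_plugins() {
    dir="$1"
    found=1
    for f in "$dir"/volcano-npu_*.so; do
        # When nothing matches, the glob stays literal; skip that case.
        [ -f "$f" ] || continue
        basename "$f"
        found=0
    done
    return "$found"
}

# Example (path taken from the build step above):
# list_npu_plugins "$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/output" \
#     || echo "no plugin library found"
```

A non-zero return value suggests that the build did not produce a library, or that the output directory differs from the one assumed here.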
docker build --no-cache -t volcanosh/vc-scheduler:v1.7.0 ./ -f ./Dockerfile-scheduler
kubectl apply -f volcano-v{version}.yaml
namespace/volcano-system created
namespace/volcano-monitoring created
configmap/volcano-scheduler-configmap created
serviceaccount/volcano-scheduler created
clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
deployment.apps/volcano-scheduler created
service/volcano-scheduler-service created
serviceaccount/volcano-controllers created
clusterrole.rbac.authorization.k8s.io/volcano-controllers created
clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
deployment.apps/volcano-controllers created
customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
FROM golang:1.19.1 AS builder
WORKDIR /go/src/volcano.sh/
ADD . volcano
RUN cd volcano && make vc-scheduler

FROM alpine:latest
COPY --from=builder /go/src/volcano.sh/volcano/_output/bin/vc-scheduler /vc-scheduler
# Added: copy the NPU plugin libraries into the plugin directory
COPY volcano-npu_*.so plugins/
ENTRYPOINT ["/vc-scheduler"]
cd $GOPATH/src/volcano.sh/volcano
docker build --no-cache -t volcanosh/vc-scheduler:v1.7.0 ./ -f installer/dockerfile/scheduler/Dockerfile
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enablePreemptable: false
      - name: conformance
      - name: volcano-npu_v6.0.RC2_linux-x86_64    # Custom scheduling plugin added to the ConfigMap; keep its version matched with the other components
    - plugins:
      - name: overcommit
      - name: drf
        enablePreemptable: false
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
    configurations:    # Add the following fields; they are MindCluster Volcano configuration fields
    - name: init-params
      arguments: {"grace-over-time":"900","presetVirtualDevice":"true","nslb-version":"1.0","shared-tor-num":"2","useClusterInfoManager":"false"}
...
kind: Deployment
apiVersion: apps/v1
metadata:
  name: volcano-scheduler
  namespace: volcano-system
  labels:
    app: volcano-scheduler
spec:
  ...
  template:
    ...
        - name: volcano-scheduler
          image: volcanosh/vc-scheduler:v1.7.0
          args:
            - --logtostderr
            - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
            - --enable-healthz=true
            - --enable-metrics=true
            - --plugins-dir=plugins    # Load the custom plugin through the volcano-scheduler startup arguments
            - -v=3
            - 2>&1
---
# Source: volcano/templates/scheduler.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-scheduler
rules:
  ...
  - apiGroups: ["nodeinfo.volcano.sh"]
    resources: ["numatopologies"]
    verbs: ["get", "list", "watch", "delete"]
  - apiGroups: [""]    # Added: get permission on services
    resources: ["services"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "create", "delete", "update", "list", "watch"]    # Added: list and watch permissions on ConfigMaps
  - apiGroups: ["apps"]
    resources: ["daemonsets", "replicasets", "statefulsets"]
    verbs: ["list", "watch", "get"]
  ...
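The plugin entry in the ConfigMap and the libraries in the directory passed with --plugins-dir need to agree. The following is a minimal sketch of a local consistency check, run before building the image. It assumes that the plugin name in volcano-scheduler.conf equals the library file name without the .so suffix, and that the configuration has been saved to a local file. The function names npu_plugin_name and check_npu_plugin are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first "- name: volcano-npu_..." value found
# in a scheduler configuration file, dropping any trailing comment.
npu_plugin_name() {
    sed -n 's/^[[:space:]]*-[[:space:]]*name:[[:space:]]*\(volcano-npu_[^[:space:]#]*\).*$/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed only if the configured plugin name has a
# matching library in the plugin directory.
# Assumption: plugin name == library file name without ".so".
check_npu_plugin() {
    conf="$1"
    plugins_dir="$2"
    name=$(npu_plugin_name "$conf")
    [ -n "$name" ] && [ -f "$plugins_dir/$name.so" ]
}

# Example (both paths are placeholders):
# check_npu_plugin ./volcano-scheduler.conf ./plugins || echo "plugin name and library do not match"
```

A failure here usually points to a version mismatch between the plugin name in the ConfigMap and the library that was built.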
kubectl apply -f installer/volcano-development.yaml
namespace/volcano-system created
namespace/volcano-monitoring created
serviceaccount/volcano-admission created
configmap/volcano-admission-configmap created
clusterrole.rbac.authorization.k8s.io/volcano-admission created
clusterrolebinding.rbac.authorization.k8s.io/volcano-admission-role created
service/volcano-admission-service created
deployment.apps/volcano-admission created
job.batch/volcano-admission-init created
customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
serviceaccount/volcano-controllers created
clusterrole.rbac.authorization.k8s.io/volcano-controllers created
clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
deployment.apps/volcano-controllers created
serviceaccount/volcano-scheduler created
configmap/volcano-scheduler-configmap created
clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
service/volcano-scheduler-service created
deployment.apps/volcano-scheduler created
customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-pods-mutate created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-queues-mutate created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-podgroups-mutate created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-jobs-mutate created
validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-jobs-validate created
validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-pods-validate created
validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-queues-validate created
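With this many resources, a failed line is easy to miss in the output. The following minimal sketch reads kubectl apply output on standard input and prints every non-empty line that does not end in created, configured, or unchanged. The function name unexpected_apply_lines is hypothetical, and the list of accepted endings is an assumption that covers ordinary client-side apply.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl apply` output on stdin and print any
# non-empty line that does not end in created, configured, or unchanged.
# Returns 1 if such a line is found, 0 otherwise.
unexpected_apply_lines() {
    if grep -v '^[[:space:]]*$' | grep -E -v ' (created|configured|unchanged)$'; then
        return 1
    fi
    return 0
}

# Example: stderr is merged so that error messages are inspected as well.
# kubectl apply -f installer/volcano-development.yaml 2>&1 | unexpected_apply_lines \
#     || echo "review the lines printed above"
```

This only inspects the text that kubectl prints. It does not confirm that the scheduler Pod has started or that the plugin has loaded.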