(Optional) Integrating the Ascend Plugin to Extend Open-Source Volcano
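The Volcano component provided by cluster scheduling adds NPU scheduling functions on top of open-source Volcano. You can obtain these functions by integrating Ascend-volcano-plugin, a plugin that cluster scheduling provides for developers. The open-source Volcano framework offers a plugin mechanism through which users register scheduling plugins to implement different scheduling policies.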

Ascend-volcano-plugin currently supports only open-source Volcano 1.7.0 and 1.9.0, and it does not modify the open-source Volcano framework.
Procedure
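- Run the following commands in sequence to pull the official open-source code of Volcano v1.7.0 (or v1.9.0) into the "$GOPATH/src/volcano.sh/" directory. For v1.9.0, clone the corresponding release branch instead of release-1.7.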
cd $GOPATH/src/volcano.sh/
git clone -b release-1.7 https://github.com/volcano-sh/volcano.git
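Since Ascend-volcano-plugin supports only Volcano 1.7.0 and 1.9.0, a small guard can derive the clone branch from the target version and reject anything else. This is a minimal sketch: the helper name `release_branch_for` is hypothetical, and the `release-1.9` branch for v1.9.0 is an assumption based on the `release-1.7` naming used in the clone command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a supported Volcano version to the upstream
# release branch passed to `git clone -b`.
# Assumption: branches follow the release-<major>.<minor> naming of release-1.7.
release_branch_for() {
  local version="$1"
  case "$version" in
    v1.7.0) echo "release-1.7" ;;
    v1.9.0) echo "release-1.9" ;;
    *)
      # Ascend-volcano-plugin supports only Volcano 1.7.0 and 1.9.0.
      echo "unsupported Volcano version: $version" >&2
      return 1
      ;;
  esac
}

# Usage (not executed here):
#   cd "$GOPATH/src/volcano.sh/" &&
#   git clone -b "$(release_branch_for v1.7.0)" https://github.com/volcano-sh/volcano.git
```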
- Rename the obtained ascend-for-volcano source code directory to ascend-volcano-plugin and upload it to the plugin path of the official open-source Volcano code ("$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/").
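The rename-and-upload step can be sketched as a single copy into the plugin path. The helper `install_plugin_source` and its arguments are hypothetical; it assumes the ascend-for-volcano source has already been transferred to the machine that holds the Volcano source tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the ascend-for-volcano source tree into the
# Volcano plugin directory under the name ascend-volcano-plugin.
# $1 = ascend-for-volcano source directory
# $2 = Volcano source root (e.g. $GOPATH/src/volcano.sh/volcano)
install_plugin_source() {
  local src="$1" volcano_root="$2"
  local plugins_dir="$volcano_root/pkg/scheduler/plugins"
  local dest="$plugins_dir/ascend-volcano-plugin"

  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi
  if [ ! -d "$plugins_dir" ]; then
    echo "not a Volcano source tree (missing $plugins_dir)" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "destination already exists: $dest" >&2
    return 1
  fi
  # Copying into a destination that does not exist yet creates it with the
  # contents of src, so the rename happens in the same step.
  cp -r "$src" "$dest"
  echo "$dest"
}
```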
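- Run the following commands in sequence to compile the open-source Volcano binary and the Huawei NPU scheduling plugin .so file. Pass build.sh the argument that matches the open-source code version, for example v1.7.0 or v1.9.0.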
cd $GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/build
chmod +x build.sh
./build.sh v1.7.0
The compiled binary and dynamic link library (.so) files are placed in the "$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/output" directory.
Table 1 lists the compiled files.
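Before moving on, it can help to wrap the build and confirm that exactly one plugin library was produced. A minimal sketch follows; `build_npu_plugin` and `find_npu_so` are hypothetical helpers, and the `volcano-npu_*.so` file-name pattern is an assumption taken from the Dockerfile COPY instruction used in the second startup method (Table 1 remains the authoritative file list).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around build/build.sh: accept only the two supported
# Volcano versions, then run the script from the plugin's build directory.
# $1 = plugin directory (.../plugins/ascend-volcano-plugin), $2 = v1.7.0 or v1.9.0
build_npu_plugin() {
  local plugin_dir="$1" version="$2"
  case "$version" in
    v1.7.0|v1.9.0) ;;
    *)
      echo "build.sh expects v1.7.0 or v1.9.0, got: $version" >&2
      return 1
      ;;
  esac
  # Subshell: the caller's working directory is left unchanged.
  (
    cd "$plugin_dir/build" || exit 1
    chmod +x build.sh
    ./build.sh "$version"
  )
}

# Hypothetical check: print the single volcano-npu_*.so in the output
# directory; fail on zero or several matches (e.g. a stale earlier build).
# $1 = output directory
find_npu_so() {
  local output_dir="$1" f
  local matches=()
  for f in "$output_dir"/volcano-npu_*.so; do
    if [ -e "$f" ]; then
      matches+=("$f")
    fi
  done
  if [ "${#matches[@]}" -ne 1 ]; then
    echo "expected one volcano-npu_*.so in $output_dir, found ${#matches[@]}" >&2
    return 1
  fi
  echo "${matches[0]}"
}

# Usage (paths as in the steps above):
#   plugin="$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin"
#   build_npu_plugin "$plugin" v1.7.0 && find_npu_so "$plugin/output"
```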
- Start the volcano-scheduler component in one of the following two ways.
- Method 1: Start volcano-scheduler with the startup YAML provided by the cluster scheduling component.
- Run the following command to build the Volcano image. Use the image tag that matches the open-source code version, for example v1.7.0 or v1.9.0.
docker build --no-cache -t volcanosh/vc-scheduler:v1.7.0 ./ -f ./Dockerfile-scheduler
- Run the following command to start volcano-scheduler.
kubectl apply -f volcano-v{version}.yaml
Example output:
namespace/volcano-system created
namespace/volcano-monitoring created
configmap/volcano-scheduler-configmap created
serviceaccount/volcano-scheduler created
clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
deployment.apps/volcano-scheduler created
service/volcano-scheduler-service created
serviceaccount/volcano-controllers created
clusterrole.rbac.authorization.k8s.io/volcano-controllers created
clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
deployment.apps/volcano-controllers created
customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
- Method 2: Start volcano-scheduler with the startup YAML of open-source Volcano.
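- Copy the volcano-npu_{version}.so file compiled in step 3 to the open-source Volcano directory "$GOPATH/src/volcano.sh/volcano", then add the plugin COPY instruction shown in the following Dockerfile to the open-source Volcano scheduler Dockerfile ("$GOPATH/src/volcano.sh/volcano/installer/dockerfile/scheduler/Dockerfile").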
FROM golang:1.19.1 AS builder
WORKDIR /go/src/volcano.sh/
ADD . volcano
RUN cd volcano && make vc-scheduler

FROM alpine:latest
COPY --from=builder /go/src/volcano.sh/volcano/_output/bin/vc-scheduler /vc-scheduler
# Added: copy the NPU scheduling plugin into the image
COPY volcano-npu_*.so plugins/
ENTRYPOINT ["/vc-scheduler"]
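Because a Dockerfile `COPY` whose source pattern matches no file makes the image build fail, a quick pre-build check can confirm both the added instruction and the copied library. A minimal sketch; `check_plugin_build_context` is hypothetical and only looks for the exact `COPY volcano-npu_*.so plugins/` line used in this procedure.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check for the scheduler image.
# $1 = build context (Volcano source root), $2 = Dockerfile path
check_plugin_build_context() {
  local context_dir="$1" dockerfile="$2"
  # The Dockerfile must contain the plugin COPY instruction.
  if ! grep -Eq '^[[:space:]]*COPY[[:space:]]+volcano-npu_\*\.so[[:space:]]+plugins/' "$dockerfile"; then
    echo "no 'COPY volcano-npu_*.so plugins/' line in $dockerfile" >&2
    return 1
  fi
  # The build context must hold at least one file matching the COPY pattern.
  if ! compgen -G "$context_dir/volcano-npu_*.so" > /dev/null; then
    echo "no volcano-npu_*.so in build context $context_dir" >&2
    return 1
  fi
}

# Usage:
#   cd "$GOPATH/src/volcano.sh/volcano" &&
#   check_plugin_build_context . installer/dockerfile/scheduler/Dockerfile
```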
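- Run the following commands in sequence to build the Volcano image. Use the image tag that matches the open-source code version, for example v1.7.0 or v1.9.0.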
cd $GOPATH/src/volcano.sh/volcano
docker build --no-cache -t volcanosh/vc-scheduler:v1.7.0 ./ -f installer/dockerfile/scheduler/Dockerfile
- Modify volcano-development.yaml, located at "$GOPATH/src/volcano.sh/volcano/installer/volcano-development.yaml". The additions are marked with comments.
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enablePreemptable: false
      - name: conformance
      - name: volcano-npu_v6.0.RC3_linux-x86_64    # Added custom scheduling plugin; keep the component versions matched
    - plugins:
      - name: overcommit
      - name: drf
        enablePreemptable: false
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
    configurations:    # Added fields (configurations is a Volcano configuration field)
    - name: init-params
      arguments: {"grace-over-time":"900","presetVirtualDevice":"true","nslb-version":"1.0","shared-tor-num":"2","useClusterInfoManager":"false","super-pod-size":"48","reserve-nodes":"2"}
...
kind: Deployment
apiVersion: apps/v1
metadata:
  name: volcano-scheduler
  namespace: volcano-system
  labels:
    app: volcano-scheduler
spec:
  ...
  template:
    ...
        - name: volcano-scheduler
          image: volcanosh/vc-scheduler:v1.7.0
          args:
            - --logtostderr
            - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
            - --enable-healthz=true
            - --enable-metrics=true
            - --plugins-dir=plugins    # Added: load custom plugins in the volcano-scheduler startup command
            - -v=3
            - 2>&1
---
# Source: volcano/templates/scheduler.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: volcano-scheduler
rules:
  ...
  - apiGroups: ["nodeinfo.volcano.sh"]
    resources: ["numatopologies"]
    verbs: ["get", "list", "watch", "delete"]
  - apiGroups: [""]    # Added: get permission on services
    resources: ["services"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "create", "delete", "update", "list", "watch"]    # Added: list and watch permissions on ConfigMaps
  - apiGroups: ["apps"]
    resources: ["daemonsets", "replicasets", "statefulsets"]
    verbs: ["list", "watch", "get"]
  ...
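The plugin `name` in the ConfigMap (volcano-npu_v6.0.RC3_linux-x86_64 in the example) has to correspond to the .so file that volcano-scheduler loads from `--plugins-dir`. As far as we understand open-source Volcano's custom-plugin loader, it registers each .so under its file name with the extension stripped; assuming that, the name can be derived from the compiled file instead of typed by hand. `plugin_name_from_so` is a hypothetical helper, and the example file name is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the ConfigMap plugin name from the .so file name.
# Assumption: volcano-scheduler registers a custom plugin under its file name
# without the .so extension.
plugin_name_from_so() {
  local so_path="$1" base
  base="$(basename "$so_path")"
  case "$base" in
    *.so) echo "${base%.so}" ;;
    *)
      echo "not a .so file: $so_path" >&2
      return 1
      ;;
  esac
}

# Usage: print the tier entry for the compiled plugin (illustrative path).
#   echo "- name: $(plugin_name_from_so output/volcano-npu_v6.0.RC3_linux-x86_64.so)"
```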
- Run the following command to start volcano-scheduler.
kubectl apply -f installer/volcano-development.yaml
Example output:
namespace/volcano-system created
namespace/volcano-monitoring created
serviceaccount/volcano-admission created
configmap/volcano-admission-configmap created
clusterrole.rbac.authorization.k8s.io/volcano-admission created
clusterrolebinding.rbac.authorization.k8s.io/volcano-admission-role created
service/volcano-admission-service created
deployment.apps/volcano-admission created
job.batch/volcano-admission-init created
customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
serviceaccount/volcano-controllers created
clusterrole.rbac.authorization.k8s.io/volcano-controllers created
clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
deployment.apps/volcano-controllers created
serviceaccount/volcano-scheduler created
configmap/volcano-scheduler-configmap created
clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
service/volcano-scheduler-service created
deployment.apps/volcano-scheduler created
customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-pods-mutate created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-queues-mutate created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-podgroups-mutate created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-jobs-mutate created
validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-jobs-validate created
validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-pods-validate created
validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-queues-validate created
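For either startup method, the `kubectl apply` output can be checked for the objects volcano-scheduler needs. This is a hedged sketch: `check_apply_output` is hypothetical, the resource names are copied from the example outputs in this procedure, and it only recognizes a first-time install, because re-applying reports `configured` or `unchanged` instead of `created`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl apply` output on stdin and confirm that
# the objects volcano-scheduler depends on were reported as created.
check_apply_output() {
  local output r missing=0
  output="$(cat)"
  local required=(
    "configmap/volcano-scheduler-configmap"
    "clusterrole.rbac.authorization.k8s.io/volcano-scheduler"
    "clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role"
    "deployment.apps/volcano-scheduler"
  )
  for r in "${required[@]}"; do
    # -F: fixed string; -x: the whole line must equal "<resource> created".
    if ! grep -Fxq "$r created" <<< "$output"; then
      echo "missing: $r created" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   kubectl apply -f installer/volcano-development.yaml | check_apply_output
```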