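(Optional) Integrating the Ascend Plugin to Extend Open-Source Volcano
The MindCluster Volcano component provided by the cluster scheduling components adds NPU scheduling capabilities on top of open-source Volcano. These capabilities are delivered by integrating the Ascend-volcano-plugin plugin that the cluster scheduling components provide to developers. The open-source Volcano framework has a plugin mechanism that lets users register scheduling plugins to implement different scheduling policies.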

Ascend-volcano-plugin currently supports only open-source Volcano v1.4.0 and v1.7.0, and it makes no changes to the open-source Volcano framework itself.
Procedure
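- Run the following commands in order to pull the official open-source Volcano v1.4.0 (or v1.7.0) code into the "$GOPATH/src/volcano.sh/" directory. The example uses the release-1.4 branch; for v1.7.0, use the corresponding release branch.
    cd $GOPATH/src/volcano.sh/
    git clone -b release-1.4 https://github.com/volcano-sh/volcano.git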
- Rename the obtained ascend-for-volcano source code to ascend-volcano-plugin, and upload it to the plugin path of the official open-source Volcano code ("$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/").
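- Run the following commands in order to build the open-source Volcano binaries and the Huawei NPU scheduling plugin .so file. Pass build.sh the argument that matches the open-source code version, for example v1.4.0 or v1.7.0.
    cd $GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/build
    chmod +x build.sh
    ./build.sh v1.4.0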
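The compiled binaries and dynamic library files are placed in the "$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/output" directory.
For the list of compiled files, see Table 1 (files in the output directory).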
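The clone and build steps above depend on the same version choice, so it can help to derive both from one value. The following is a minimal sketch, not part of the product: the helper names volcano_branch_for and print_build_plan are hypothetical, and the mapping of v1.7.0 to a release-1.7 branch is an assumption modeled on the release-1.4 example above. The script only prints the commands; it does not run them.

```shell
# Sketch only: map a supported Volcano version to a release branch and
# print the clone/build commands from the steps above. Nothing is executed.

# Assumption: v1.7.0 is built from release-1.7, mirroring release-1.4 for v1.4.0.
volcano_branch_for() {
  case "$1" in
    v1.4.0) echo "release-1.4" ;;
    v1.7.0) echo "release-1.7" ;;
    *) echo "unsupported Volcano version: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) the commands for the given version.
print_build_plan() {
  plan_version="$1"
  plan_branch="$(volcano_branch_for "$plan_version")" || return 1
  echo "cd \$GOPATH/src/volcano.sh/"
  echo "git clone -b ${plan_branch} https://github.com/volcano-sh/volcano.git"
  echo "cd \$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/build"
  echo "chmod +x build.sh"
  echo "./build.sh ${plan_version}"
}

print_build_plan v1.7.0
```

Rejecting any version other than v1.4.0 and v1.7.0 reflects the support statement at the top of this section.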
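- Start the volcano-scheduler component in one of the following two ways.
- Method 1: Use the startup YAML provided by the cluster scheduling components to start the volcano-scheduler component.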
- Run the following command to build the MindCluster Volcano image. Choose the image tag that matches the open-source code version, for example v1.4.0 or v1.7.0.
    docker build --no-cache -t volcanosh/vc-scheduler:v1.4.0 ./ -f ./Dockerfile-scheduler
- Run the following command to start the volcano-scheduler component.
    kubectl apply -f volcano-v{version}.yaml
Sample startup output:
    namespace/volcano-system created
    namespace/volcano-monitoring created
    configmap/volcano-scheduler-configmap created
    serviceaccount/volcano-scheduler created
    clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
    clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
    deployment.apps/volcano-scheduler created
    service/volcano-scheduler-service created
    serviceaccount/volcano-controllers created
    clusterrole.rbac.authorization.k8s.io/volcano-controllers created
    clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
    deployment.apps/volcano-controllers created
    customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
    customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
    customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
    customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
    customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
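When the apply step runs from a script, the output above can be checked mechanically instead of by eye. The following is a minimal sketch under stated assumptions: scheduler_applied is a hypothetical helper, and it relies only on the standard kubectl apply result words (created, configured, unchanged). It reads the output from stdin, so the demonstration below uses a short excerpt of the sample output rather than a live cluster.

```shell
# Sketch only: succeed if `kubectl apply` output on stdin reports the
# volcano-scheduler Deployment as created, configured, or unchanged.
scheduler_applied() {
  grep -Eq '^deployment\.apps/volcano-scheduler (created|configured|unchanged)$'
}

# Demonstration on an excerpt of the sample output (no cluster needed).
scheduler_applied <<'EOF' && echo "volcano-scheduler deployment applied"
namespace/volcano-system created
deployment.apps/volcano-scheduler created
service/volcano-scheduler-service created
EOF
```

Against a real cluster, the same helper could be fed by piping the output of the kubectl apply command from the previous step into it.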
- Method 2: Use the startup YAML of open-source Volcano to start the volcano-scheduler component.
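- Copy the volcano-npu-{version}.so file compiled in step 3 to the "$GOPATH/src/volcano.sh/volcano" directory of open-source Volcano. Then edit the open-source Volcano Dockerfile (path: "$GOPATH/src/volcano.sh/volcano/installer/dockerfile/scheduler/Dockerfile") and add the COPY command for the plugin file, marked with the "Added" comment below. The comment sits on its own line because a Dockerfile does not support trailing comments on an instruction.
    FROM golang:1.19.1 AS builder
    WORKDIR /go/src/volcano.sh/
    ADD . volcano
    RUN cd volcano && make vc-scheduler

    FROM alpine:latest
    COPY --from=builder /go/src/volcano.sh/volcano/_output/bin/vc-scheduler /vc-scheduler
    # Added: copy the NPU scheduling plugin into the image
    COPY volcano-npu_*.so plugins/
    ENTRYPOINT ["/vc-scheduler"]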
- Run the following commands in order to build the MindCluster Volcano image. Choose the image tag that matches the open-source code version, for example v1.4.0 or v1.7.0.
    cd $GOPATH/src/volcano.sh/volcano
    docker build --no-cache -t volcanosh/vc-scheduler:v1.7.0 ./ -f installer/dockerfile/scheduler/Dockerfile
- Modify volcano-development.yaml, located at "$GOPATH/src/volcano.sh/volcano/installer/volcano-development.yaml". Add the custom plugin entry and the MindCluster Volcano configuration fields to the ConfigMap, and add the plugin directory argument to the volcano-scheduler Deployment, as marked by the comments.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: volcano-scheduler-configmap
      namespace: volcano-system
    data:
      volcano-scheduler.conf: |
        actions: "enqueue, allocate, backfill"
        tiers:
        - plugins:
          - name: priority
          - name: gang
            enablePreemptable: false
          - name: conformance
          - name: volcano-npu_v5.0.1.1_linux-x86_64    # Added: custom scheduling plugin; keep its version consistent with the other components
        - plugins:
          - name: overcommit
          - name: drf
            enablePreemptable: false
          - name: predicates
          - name: proportion
          - name: nodeorder
          - name: binpack
        configurations:    # Added: the fields below are MindCluster Volcano configuration fields
        - name: selector
          arguments: {"host-arch":"huawei-arm|huawei-x86", "accelerator":"huawei-Ascend910|nvidia-tesla-v100|nvidia-tesla-p40", "accelerator-type":"card|module|half|module-{xxx}b-16|module-{xxx}b-8|card-{xxx}-2|card-{xxx}b-infer","servertype":"soc"}
        - name: init-params
          arguments: {"grace-over-time":"900","presetVirtualDevice":"true"}
    ...
    kind: Deployment
    apiVersion: apps/v1
    metadata:
      name: volcano-scheduler
      namespace: volcano-system
      labels:
        app: volcano-scheduler
    spec:
      ...
      template:
        ...
            - name: volcano-scheduler
              image: volcanosh/vc-scheduler:v1.7.0
              args:
                - --logtostderr
                - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
                - --enable-healthz=true
                - --enable-metrics=true
                - --plugins-dir=plugins    # Added: load custom plugins when volcano-scheduler starts
                - -v=3
                - 2>&1
    ...
- Run the following command to start the volcano-scheduler component.
    kubectl apply -f installer/volcano-development.yaml
Sample command output:
    namespace/volcano-system created
    namespace/volcano-monitoring created
    serviceaccount/volcano-admission created
    configmap/volcano-admission-configmap created
    clusterrole.rbac.authorization.k8s.io/volcano-admission created
    clusterrolebinding.rbac.authorization.k8s.io/volcano-admission-role created
    service/volcano-admission-service created
    deployment.apps/volcano-admission created
    job.batch/volcano-admission-init created
    customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
    customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
    serviceaccount/volcano-controllers created
    clusterrole.rbac.authorization.k8s.io/volcano-controllers created
    clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
    deployment.apps/volcano-controllers created
    serviceaccount/volcano-scheduler created
    configmap/volcano-scheduler-configmap created
    clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
    clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
    service/volcano-scheduler-service created
    deployment.apps/volcano-scheduler created
    customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
    customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
    customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
    mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-pods-mutate created
    mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-queues-mutate created
    mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-podgroups-mutate created
    mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-jobs-mutate created
    validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-jobs-validate created
    validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-pods-validate created
    validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-queues-validate created
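In the ConfigMap for Method 2, the plugin entry (volcano-npu_v5.0.1.1_linux-x86_64) has the form of a .so file name without its extension. The following minimal sketch derives the entry from a file path on that basis. It assumes that Volcano names a custom plugin after the file found in the directory given by --plugins-dir; plugin_name_from_so is a hypothetical helper, and the path is illustrative.

```shell
# Sketch only: derive the ConfigMap plugin name from a plugin .so path.
# Assumption: the plugin name is the file name with the .so suffix removed.
plugin_name_from_so() {
  basename "$1" .so
}

# Illustrative path; use the actual file from the build output directory.
plugin_name_from_so plugins/volcano-npu_v5.0.1.1_linux-x86_64.so
```

Comparing this value with the name in volcano-scheduler.conf before building the image may catch a version mismatch between the plugin file and the ConfigMap entry.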