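The MindCluster Volcano component provided by the cluster scheduling components builds on open-source Volcano and adds NPU scheduling capabilities. These capabilities are delivered by integrating Ascend-volcano-plugin, the plugin that the cluster scheduling components provide for developers. The open-source Volcano framework offers a plugin mechanism through which users register scheduling plugins to implement different scheduling policies.
Ascend-volcano-plugin currently supports only open-source Volcano 1.4.0 and 1.7.0, and it does not modify the open-source Volcano framework.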
cd $GOPATH/src/volcano.sh/
git clone -b release-1.4 https://github.com/volcano-sh/volcano.git
cd $GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/build
chmod +x build.sh
./build.sh v1.4.0
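The compiled binaries and dynamic link libraries are placed in the "$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/output" directory.
For the list of compiled files, see Table 1, Files in the output directory.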
docker build --no-cache -t volcanosh/vc-scheduler:v1.4.0 ./ -f ./Dockerfile-scheduler
kubectl apply -f volcano-v{version}.yaml
namespace/volcano-system created
namespace/volcano-monitoring created
configmap/volcano-scheduler-configmap created
serviceaccount/volcano-scheduler created
clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
deployment.apps/volcano-scheduler created
service/volcano-scheduler-service created
serviceaccount/volcano-controllers created
clusterrole.rbac.authorization.k8s.io/volcano-controllers created
clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
deployment.apps/volcano-controllers created
customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
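The output above can also be checked mechanically rather than by eye. The following is a minimal sketch, not part of the product: the helper name `check_apply_output` is hypothetical, and it assumes that every healthy line of `kubectl apply` output ends in `created`, `configured`, or `unchanged`. Any other non-empty line (for example an error message) is printed and makes the helper return a non-zero status.

```shell
#!/bin/sh
# check_apply_output (hypothetical helper): reads `kubectl apply` output on
# stdin, prints every non-empty line whose last word is not "created",
# "configured" or "unchanged", and returns 1 if such a line was seen.
check_apply_output() {
    awk '
        NF == 0 { next }                      # skip blank lines
        $NF != "created" && $NF != "configured" && $NF != "unchanged" {
            print                             # show the suspicious line
            bad = 1
        }
        END { exit bad + 0 }                  # 0 when every line looked fine
    '
}

# Example use (the manifest name is the one used in this section):
#   kubectl apply -f volcano-v{version}.yaml 2>&1 | check_apply_output
```

Piping `2>&1` into the helper means error messages written to standard error are checked as well.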
FROM golang:1.19.1 AS builder
WORKDIR /go/src/volcano.sh/
ADD . volcano
RUN cd volcano && make vc-scheduler

FROM alpine:latest
COPY --from=builder /go/src/volcano.sh/volcano/_output/bin/vc-scheduler /vc-scheduler
# Added: copy the compiled volcano-npu plugin libraries into the plugins directory
COPY volcano-npu_*.so plugins/
ENTRYPOINT ["/vc-scheduler"]
cd $GOPATH/src/volcano.sh/volcano
docker build --no-cache -t volcanosh/vc-scheduler:v1.7.0 ./ -f installer/dockerfile/scheduler/Dockerfile
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enablePreemptable: false
      - name: conformance
      - name: volcano-npu_v5.0.0.2_linux-x86_64    # Custom scheduling plugin added to the ConfigMap; keep the component versions matched
    - plugins:
      - name: overcommit
      - name: drf
        enablePreemptable: false
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
    configurations:    # Add the fields below; they are MindCluster Volcano configuration fields
      - name: selector
        arguments: {"host-arch":"huawei-arm|huawei-x86",
                    "accelerator":"huawei-Ascend910|nvidia-tesla-v100|nvidia-tesla-p40",
                    "accelerator-type":"card|module|half|module-{xxx}b-16|module-{xxx}b-8|card-{xxx}-2|card-{xxx}b-infer","servertype":"soc"}
      - name: init-params
        arguments: {"grace-over-time":"900","presetVirtualDevice":"true"}
...
kind: Deployment
apiVersion: apps/v1
metadata:
  name: volcano-scheduler
  namespace: volcano-system
  labels:
    app: volcano-scheduler
spec:
  ...
  template:
    ...
        - name: volcano-scheduler
          image: volcanosh/vc-scheduler:v1.7.0
          args:
            - --logtostderr
            - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
            - --enable-healthz=true
            - --enable-metrics=true
            - --plugins-dir=plugins    # Load the custom plugins in the volcano-scheduler startup command
            - -v=3
            - 2>&1
...
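The plugin entry in the ConfigMap carries both a version and an architecture, so it is easy for it to drift away from the library that was actually built. The sketch below shows one way to compare the two before deploying. It assumes, as the example above suggests, that the plugin name equals the .so file name without its extension. The helper names `plugin_name_from_so` and `check_plugin_in_conf` are hypothetical, as is the idea of saving the scheduler configuration to a local file first.

```shell
#!/bin/sh
# plugin_name_from_so SO_PATH
# Prints the plugin name assumed for a library: the file name without the
# directory part and without the ".so" extension.
plugin_name_from_so() {
    basename "$1" .so
}

# check_plugin_in_conf CONF_FILE SO_PATH
# Succeeds only if CONF_FILE has a "- name: <plugin>" entry whose name is
# exactly the one derived from SO_PATH. A trailing comment on the line is
# ignored because only the first three fields are compared.
check_plugin_in_conf() {
    conf_file=$1
    plugin=$(plugin_name_from_so "$2")
    awk -v p="$plugin" '
        $1 == "-" && $2 == "name:" && $3 == p { found = 1 }
        END { exit !found }
    ' "$conf_file"
}

# Example use (paths are illustrative):
#   check_plugin_in_conf volcano-scheduler.conf \
#       output/volcano-npu_v5.0.0.2_linux-x86_64.so || echo "plugin name mismatch"
```

An exact field comparison is used instead of a substring search so that a name that merely starts with the expected string does not pass.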
kubectl apply -f installer/volcano-development.yaml
namespace/volcano-system created
namespace/volcano-monitoring created
serviceaccount/volcano-admission created
configmap/volcano-admission-configmap created
clusterrole.rbac.authorization.k8s.io/volcano-admission created
clusterrolebinding.rbac.authorization.k8s.io/volcano-admission-role created
service/volcano-admission-service created
deployment.apps/volcano-admission created
job.batch/volcano-admission-init created
customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
serviceaccount/volcano-controllers created
clusterrole.rbac.authorization.k8s.io/volcano-controllers created
clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
deployment.apps/volcano-controllers created
serviceaccount/volcano-scheduler created
configmap/volcano-scheduler-configmap created
clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
service/volcano-scheduler-service created
deployment.apps/volcano-scheduler created
customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-pods-mutate created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-queues-mutate created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-podgroups-mutate created
mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-jobs-mutate created
validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-jobs-validate created
validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-pods-validate created
validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-queues-validate created
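A "created" line only means the API server accepted the object, not that its pods are running. As a follow-up check, the sketch below waits for the three Deployments named in the output above to finish rolling out. The helper name `wait_for_volcano` is hypothetical, and it assumes that all three Deployments live in the volcano-system namespace (the section shows this only for volcano-scheduler) and that a 120-second timeout per Deployment is acceptable.

```shell
#!/bin/sh
# wait_for_volcano [NAMESPACE] [TIMEOUT]
# Waits for each Volcano Deployment to roll out; stops at the first failure.
# NAMESPACE defaults to volcano-system, TIMEOUT to 120s (both assumptions).
wait_for_volcano() {
    ns=${1:-volcano-system}
    timeout=${2:-120s}
    for deploy in volcano-admission volcano-controllers volcano-scheduler; do
        kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout" || return 1
    done
}

# Example use:
#   wait_for_volcano && echo "Volcano components are ready"
```

If the scheduler does not become ready, `kubectl logs` on the volcano-scheduler pod is the usual next place to look, for example to see whether the custom plugin was loaded.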