
(Optional) Integrating the Ascend Plugin to Extend Open Source Volcano

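The Volcano component provided by cluster scheduling adds NPU scheduling features on top of open source Volcano. These features are delivered by Ascend-volcano-plugin, a plugin that cluster scheduling provides to developers. The open source Volcano framework has a plugin mechanism that lets users register scheduling plugins to implement different scheduling policies.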

Ascend-volcano-plugin currently supports only open source Volcano 1.7.0 and 1.9.0, and it makes no changes to the open source Volcano framework itself.

Procedure

  1. Run the following commands in order to clone the official open source Volcano code, v1.7.0 (or v1.9.0), into the "$GOPATH/src/volcano.sh/" directory.
    cd $GOPATH/src/volcano.sh/ 
    git clone -b release-1.7 https://github.com/volcano-sh/volcano.git
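The clone command above hard-codes the branch for v1.7.0. A minimal sketch of choosing the branch from the version is shown here. The helper name volcano_branch_for is hypothetical, and the release-1.9 branch name is an assumption made by analogy with release-1.7; verify it against the upstream repository before relying on it.

```shell
#!/bin/sh
# Map a supported Volcano version to an upstream release branch.
# Assumption: v1.9.0 is on a branch named release-1.9, by analogy with
# the release-1.7 branch used for v1.7.0 in the step above.
volcano_branch_for() {
    case "$1" in
        v1.7.0) echo "release-1.7" ;;
        v1.9.0) echo "release-1.9" ;;
        *)
            # Only these two versions are supported by Ascend-volcano-plugin.
            echo "unsupported Volcano version: $1 (expected v1.7.0 or v1.9.0)" >&2
            return 1
            ;;
    esac
}

# Usage (needs network access, so it is not executed here):
#   branch=$(volcano_branch_for v1.9.0) &&
#   git clone -b "$branch" https://github.com/volcano-sh/volcano.git
```

Rejecting any other version early avoids building the plugin against a Volcano release it does not support.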
  2. Rename the ascend-for-volcano source code you obtained to ascend-volcano-plugin, and upload it to the plugin path of the open source Volcano code ("$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/").
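  3. Run the following commands in order to build the open source Volcano binaries and the Huawei NPU scheduling plugin .so file. Pass build.sh the argument that matches your open source code version, for example v1.7.0 or v1.9.0.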
    cd $GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/build
    chmod +x build.sh
    ./build.sh v1.7.0

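    The compiled binaries and the dynamic library are written to the "$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/output" directory.

    Table 1 lists the files produced by the build.

    Table 1 Files in the output directory

    File name                   Description
    volcano-npu-{version}.so    Huawei NPU scheduling plugin dynamic library
    Dockerfile-scheduler        Build file for the volcano-scheduler image
    Dockerfile-controller       Build file for the volcano-controller image
    volcano-v{version}.yaml     Volcano startup configuration file
    vc-scheduler                volcano-scheduler component binary
    vc-controller-manager       volcano-controller component binary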
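Before building images it can help to confirm that the build produced every file listed in Table 1. The following is a minimal sketch of such a check. The function name check_volcano_output is hypothetical, and the glob patterns for the versioned files are assumptions that deliberately match loosely, because the exact file names depend on the plugin and Volcano versions.

```shell
#!/bin/sh
# Report which Table 1 artifacts are missing from an output directory.
# Prints one line per missing artifact; returns non-zero if any is missing.
check_volcano_output() {
    dir=$1
    missing=0
    for pattern in 'volcano-npu*.so' 'volcano-v*.yaml' \
                   Dockerfile-scheduler Dockerfile-controller \
                   vc-scheduler vc-controller-manager; do
        found=0
        # $pattern is left unquoted on purpose so the shell expands it as a glob.
        for f in "$dir"/$pattern; do
            if [ -e "$f" ]; then
                found=1
                break
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "missing: $pattern"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_volcano_output "$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/output"
```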

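  4. Start the volcano-scheduler component in one of the following two ways.
    • Start volcano-scheduler with the startup YAML provided by the cluster scheduling component.
      1. Run the following command to build the Volcano image. Choose the image tag that matches your open source code version, for example v1.7.0 or v1.9.0.
        docker build --no-cache -t volcanosh/vc-scheduler:v1.7.0 ./ -f ./Dockerfile-scheduler
      2. Run the following command to start the volcano-scheduler component.
        kubectl apply -f volcano-v{version}.yaml
        Example output: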
        namespace/volcano-system created
        namespace/volcano-monitoring created
        configmap/volcano-scheduler-configmap created
        serviceaccount/volcano-scheduler created
        clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
        clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
        deployment.apps/volcano-scheduler created
        service/volcano-scheduler-service created
        serviceaccount/volcano-controllers created
        clusterrole.rbac.authorization.k8s.io/volcano-controllers created
        clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
        deployment.apps/volcano-controllers created
        customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
        customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
        customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
        customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
        customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
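    • Start volcano-scheduler with the startup YAML from open source Volcano.
      1. Copy the volcano-npu-{version}.so file built in step 3 to the open source Volcano directory "$GOPATH/src/volcano.sh/volcano". Then add the COPY line for the plugin library to the open source Volcano Dockerfile (at "$GOPATH/src/volcano.sh/volcano/installer/dockerfile/scheduler/Dockerfile"), as in the following Dockerfile.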
        FROM golang:1.19.1 AS builder
        WORKDIR /go/src/volcano.sh/
        ADD . volcano
        RUN cd volcano && make vc-scheduler
        FROM alpine:latest
        COPY --from=builder /go/src/volcano.sh/volcano/_output/bin/vc-scheduler /vc-scheduler
        # Added: copy the NPU scheduling plugin library into the image
        COPY volcano-npu_*.so plugins/
        ENTRYPOINT ["/vc-scheduler"]
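      2. Run the following commands in order to build the Volcano image. Choose the image tag that matches your open source code version, for example v1.7.0 or v1.9.0.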
        cd $GOPATH/src/volcano.sh/volcano
        docker build --no-cache -t volcanosh/vc-scheduler:v1.7.0 ./ -f installer/dockerfile/scheduler/Dockerfile
      3. Modify volcano-development.yaml, located at "$GOPATH/src/volcano.sh/volcano/installer/volcano-development.yaml", as follows.
        apiVersion: v1
        kind: ConfigMap
        metadata: 
          name: volcano-scheduler-configmap 
          namespace: volcano-system
        data:
           volcano-scheduler.conf: |
             actions: "enqueue, allocate, backfill"
             tiers:
             - plugins:
               - name: priority
               - name: gang
                 enablePreemptable: false
               - name: conformance
               - name: volcano-npu_v6.0.RC3_linux-x86_64    # Added: custom scheduling plugin in the ConfigMap; keep its version matched with the other components
             - plugins:
               - name: overcommit
               - name: drf
                 enablePreemptable: false
               - name: predicates
               - name: proportion
               - name: nodeorder
               - name: binpack
             configurations:           # Added: the fields below are Volcano configuration fields
               - name: init-params
                 arguments: {"grace-over-time":"900","presetVirtualDevice":"true","nslb-version":"1.0","shared-tor-num":"2","useClusterInfoManager":"false","super-pod-size": "48","reserve-nodes": "2"}
        ...
        kind: Deployment
        apiVersion: apps/v1
        metadata:
          name: volcano-scheduler
          namespace: volcano-system
          labels:
            app: volcano-scheduler
        spec:
          ...
          template:
        ...
                - name: volcano-scheduler
                  image: volcanosh/vc-scheduler:v1.7.0
                  args:
                    - --logtostderr
                    - --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
                    - --enable-healthz=true
                    - --enable-metrics=true
                    - --plugins-dir=plugins       # Added: load custom plugins when volcano-scheduler starts
                    - -v=3
                    - 2>&1
        ---
        # Source: volcano/templates/scheduler.yaml
        kind: ClusterRole
        apiVersion: rbac.authorization.k8s.io/v1
        metadata:
          name: volcano-scheduler
        rules:
        ...
          - apiGroups: ["nodeinfo.volcano.sh"]
            resources: ["numatopologies"]
            verbs: ["get", "list", "watch", "delete"]
          - apiGroups: [""]                          # Added: get permission on services
            resources: ["services"]
            verbs: ["get"]
          - apiGroups: [""]
            resources: ["configmaps"]
            verbs: ["get", "create", "delete", "update","list","watch"]    # Added: list and watch permissions on ConfigMaps
          - apiGroups: ["apps"]
            resources: ["daemonsets", "replicasets", "statefulsets"]
            verbs: ["list", "watch", "get"]
        ...
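The plugin entry in the ConfigMap above (volcano-npu_v6.0.RC3_linux-x86_64) looks like the plugin library's file name without its .so extension. The sketch below derives the entry from a file path under that assumption. The function name plugin_name_from_so is hypothetical, and the naming rule itself is an assumption that this document does not confirm, so check it against your Volcano version.

```shell
#!/bin/sh
# Derive the plugin name for volcano-scheduler.conf from a .so path.
# Assumption: the name is the file's base name with ".so" removed.
plugin_name_from_so() {
    base=${1##*/}    # strip any leading directory components
    case "$base" in
        *.so) echo "${base%.so}" ;;
        *)
            echo "not a .so file: $1" >&2
            return 1
            ;;
    esac
}

# Usage:
#   plugin_name_from_so plugins/volcano-npu_v6.0.RC3_linux-x86_64.so
```

Deriving the name from the file that is actually copied into the image is one way to keep the ConfigMap entry and the plugin library version in step.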
      4. Run the following command to start the volcano-scheduler component.
        kubectl apply -f installer/volcano-development.yaml
        Example output:
        namespace/volcano-system created
        namespace/volcano-monitoring created
        serviceaccount/volcano-admission created
        configmap/volcano-admission-configmap created
        clusterrole.rbac.authorization.k8s.io/volcano-admission created
        clusterrolebinding.rbac.authorization.k8s.io/volcano-admission-role created
        service/volcano-admission-service created
        deployment.apps/volcano-admission created
        job.batch/volcano-admission-init created
        customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
        customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
        serviceaccount/volcano-controllers created
        clusterrole.rbac.authorization.k8s.io/volcano-controllers created
        clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
        deployment.apps/volcano-controllers created
        serviceaccount/volcano-scheduler created
        configmap/volcano-scheduler-configmap created
        clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
        clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
        service/volcano-scheduler-service created
        deployment.apps/volcano-scheduler created
        customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
        customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
        customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
        mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-pods-mutate created
        mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-queues-mutate created
        mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-podgroups-mutate created
        mutatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-jobs-mutate created
        validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-jobs-validate created
        validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-pods-validate created
        validatingwebhookconfiguration.admissionregistration.k8s.io/volcano-admission-service-queues-validate created
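
With either startup method, "created" in the kubectl apply output only means the objects were accepted, not that the scheduler pod is up. A minimal sketch of a follow-up check is shown here. The function name scheduler_is_running is hypothetical, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...) and that the pod name begins with the Deployment name volcano-scheduler.

```shell
#!/bin/sh
# Succeeds if stdin (the output of `kubectl get pods --no-headers`) contains
# a pod whose name starts with volcano-scheduler and whose STATUS column
# (third field) is Running.
scheduler_is_running() {
    awk '$1 ~ /^volcano-scheduler/ && $3 == "Running" { ok = 1 }
         END { exit (ok ? 0 : 1) }'
}

# Usage (requires access to the cluster, so it is not executed here):
#   kubectl get pods -n volcano-system --no-headers | scheduler_is_running \
#       && echo "volcano-scheduler is running"
```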