Installing Volcano

  • Install Volcano or another scheduler on the management node if you need any of the following functions: full NPU scheduling, static vNPU scheduling, dynamic vNPU scheduling, resumable training, elastic training, recovery from inference card faults, or rescheduling upon inference card faults.
  • If Volcano is used for job scheduling, do not use Docker or containerd directly to create containers, mount NPUs to them, and run jobs in them. Otherwise, Volcano may encounter scheduling problems.
  • If you need only the containerization and resource monitoring functions, Volcano is not required. In this case, skip this section.

    This section describes how to install the Volcano components vc-scheduler and vc-controller-manager. If you need other open-source Volcano components, install them yourself and ensure that they are secure.

    • In this document, Volcano refers to the Volcano provided with the cluster scheduling components. If you need another scheduler based on open-source Volcano, see (Optional) Integrating the Ascend Plugin to Extend Open-Source Volcano to integrate Ascend-volcano-plugin and enable NPU scheduling.
    • NodeD 6.0.RC1 and later is incompatible with Volcano versions earlier than 6.0.RC1. If you use NodeD 6.0.RC1 or later, use Volcano 6.0.RC1 or later.
    • When Volcano 6.0.RC2 or later is used as the scheduler, ClusterD is required by default. If ClusterD is not installed, you must modify the startup parameters of Volcano; otherwise, Volcano cannot schedule jobs.

Procedure

  1. Log in to the Kubernetes management node as the root user and check whether the Volcano image and version number are correct.
    docker images | grep volcanosh
    Command output:
    volcanosh/vc-controller-manager      v1.7.0              84c73128cc55        3 days ago          44.5MB
    volcanosh/vc-scheduler               v1.7.0              e90c114c75b1        3 days ago          188MB
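    If you need to check several nodes, or you want to script the installation, you can automate the comparison in step 1. The following is a minimal POSIX shell sketch. The function name check_volcano_images is hypothetical and not part of MindCluster. The sketch assumes the default docker images column layout (REPOSITORY, TAG, IMAGE ID, ...) and the image names shown above.

```shell
# Hypothetical helper (not part of MindCluster): reads `docker images` output on
# stdin and succeeds only if both Volcano images exist with the expected tag ($1).
check_volcano_images() {
    awk -v tag="$1" '
        $1 == "volcanosh/vc-controller-manager" && $2 == tag { ctrl = 1 }
        $1 == "volcanosh/vc-scheduler" && $2 == tag { sched = 1 }
        END {
            # Report each missing image, then set the exit status.
            if (!ctrl) print "missing: volcanosh/vc-controller-manager:" tag
            if (!sched) print "missing: volcanosh/vc-scheduler:" tag
            if (ctrl && sched) exit 0
            exit 1
        }'
}
```

    Usage: docker images | check_volcano_images v1.7.0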
    
  2. Copy the YAML file from the directory where the Volcano package was extracted to any directory on the Kubernetes management node.
  3. Skip this step if you do not need to modify the component startup parameters. Otherwise, modify the Volcano startup parameters in the corresponding startup YAML file based on your requirements. For details about common startup parameters, see Table 4 and Table 5.
  4. Configure log dump for Volcano.
    During installation, the Volcano log directory is mounted to the host drive at /var/log/mindx-dl. By default, Volcano clears a log file when its size reaches 1.8 GB (see --log_file_max_size in Table 4 and Table 5). To prevent the drive from filling up, configure log dump for Volcano. For details about the configuration items, see Table 1. You can also choose a more frequent dump policy to reduce the risk of log loss.
    1. In the /etc/logrotate.d directory on the management node, run the following command to create a log dump configuration file:
      vi /etc/logrotate.d/file_name

      Example:

      vi /etc/logrotate.d/volcano
      Add the following content to the file and run the :wq command to save the file:
      /var/log/mindx-dl/volcano-*/*.log {
          daily
          rotate 8
          size 50M
          compress
          dateext
          missingok
          notifempty
          copytruncate
          create 0640 hwMindX hwMindX
          sharedscripts
          postrotate
              chmod 640 /var/log/mindx-dl/volcano-*/*.log
              chmod 440 /var/log/mindx-dl/volcano-*/*.log-*
          endscript
      }
    2. Run the following commands in sequence to set the configuration file permission to 640 and owner to root:
      chmod 640 /etc/logrotate.d/file_name
      chown root /etc/logrotate.d/file_name
      Example:
      chmod 640 /etc/logrotate.d/volcano
      chown root /etc/logrotate.d/volcano
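    If you prefer not to edit the file interactively with vi, you can write the same policy and permissions from a script. The following is a minimal sketch. The function install_volcano_logrotate is hypothetical. The target directory is a parameter so that the function can be tried outside /etc. chown is skipped when the function does not run as root.

```shell
# Hypothetical helper: writes the Volcano logrotate policy from sub-step 1 to
# DIR/volcano (DIR defaults to /etc/logrotate.d) and applies sub-step 2.
install_volcano_logrotate() {
    dir="${1:-/etc/logrotate.d}"
    conf="$dir/volcano"
    cat > "$conf" <<'EOF'
/var/log/mindx-dl/volcano-*/*.log {
    daily
    rotate 8
    size 50M
    compress
    dateext
    missingok
    notifempty
    copytruncate
    create 0640 hwMindX hwMindX
    sharedscripts
    postrotate
        chmod 640 /var/log/mindx-dl/volcano-*/*.log
        chmod 440 /var/log/mindx-dl/volcano-*/*.log-*
    endscript
}
EOF
    chmod 640 "$conf"
    # Changing the owner requires root; skip it for unprivileged trial runs.
    if [ "$(id -u)" -eq 0 ]; then
        chown root "$conf"
    fi
}
```

    After the file is in place, you can run logrotate -d /etc/logrotate.d/volcano to do a debug (dry) run. This parses the configuration without rotating any logs.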
    Table 1 Configuration items of Volcano log dump files

    Configuration Item

    Description

    Possible Value

    daily

    Log dump frequency

    • daily: Performs the dump check once a day.
    • weekly: Performs the dump check once a week.
    • monthly: Performs the dump check once a month.
    • yearly: Performs the dump check once a year.

    rotate x

    Number of times that log files are dumped before they are deleted

    x indicates the number of backups.

    Example:

    • rotate 0: no backup
    • rotate 8: eight backups

    size xx

    A log file is dumped only when its size reaches the value of this parameter.

    The size unit can be specified as follows:

    • byte (default value)
    • K
    • M

    For example, size 50M indicates that a log file is dumped when its size reaches 50 MB.

    NOTE:

    logrotate checks log file sizes only when it runs, at the configured dump frequency. A dump is triggered only if a log file is larger than the size value at the time of a check.

    Therefore, a log file may grow beyond the size limit before it is dumped.

    compress

    Whether to compress dumped logs using gzip

    • compress: Use gzip for compression.
    • nocompress: Do not use gzip for compression.

    notifempty

    Whether to dump empty files

    • ifempty: Dump empty files.
    • notifempty: Do not dump empty files.
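    The size rule in Table 1 can be illustrated with a short sketch that converts a size value to bytes and reports whether a file of a given size would be dumped at the next check. The helper names are hypothetical. The sketch assumes that K and M are binary multiples (1024 and 1024 x 1024 bytes), as in logrotate, and it follows the table's wording that the file size must exceed the limit.

```shell
# Convert a Table 1 size value (bytes by default, or a K/M suffix) to bytes.
size_to_bytes() {
    case "$1" in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *M)    echo $(( ${1%?} * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# would_dump FILE_SIZE_IN_BYTES SIZE_SETTING -> prints "dump" or "keep"
would_dump() {
    limit=$(size_to_bytes "$2")
    if [ "$1" -gt "$limit" ]; then echo dump; else echo keep; fi
}
```

    For example, size 50M corresponds to 52,428,800 bytes. A 60,000,000-byte log file would be dumped at the next check, but a file that crosses the limit between two checks keeps growing until logrotate runs again.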
  5. (Optional) In volcano-v{version}.yaml, configure the CPU and memory resources for Volcano. For the recommended values, see the volcano-controller and volcano-scheduler tables in the official open-source Volcano documentation.
    ...
    kind: Deployment
    ...
      labels:
        app: volcano-scheduler
    spec:
      replicas: 1
    ...
        spec:
    ...
              imagePullPolicy: "IfNotPresent"
              resources:
                requests:
                  memory: 4Gi
                  cpu: 5500m
                limits:
                  memory: 8Gi
                  cpu: 5500m
    ...
    kind: Deployment
    ...
      labels:
        app: volcano-controller
    spec:
    ...
        spec:
    ...
              resources:
                requests:
                  memory: 3Gi
                  cpu: 2000m
                limits:
                  memory: 3Gi
                  cpu: 2000m
    ...
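    When you change these values, keep each request less than or equal to the corresponding limit, because Kubernetes rejects a container whose request exceeds its limit. The following sketch checks this for the quantity formats used in this section (m for millicores, integer cores, Mi, and Gi). The helper names are hypothetical, and other Kubernetes quantity suffixes and fractional values are not handled.

```shell
# Convert a CPU quantity such as 5500m or 2 to millicores (integers only).
cpu_to_millicores() {
    case "$1" in
        *m) echo "${1%m}" ;;
        *)  echo $(( $1 * 1000 )) ;;
    esac
}

# Convert a memory quantity such as 4Gi or 10000Mi to MiB.
mem_to_mib() {
    case "$1" in
        *Gi) echo $(( ${1%Gi} * 1024 )) ;;
        *Mi) echo "${1%Mi}" ;;
        *)   echo "unsupported quantity: $1" >&2; return 1 ;;
    esac
}

# check_resources REQ_CPU LIMIT_CPU REQ_MEM LIMIT_MEM -> prints "ok" or the problem
check_resources() {
    if [ "$(cpu_to_millicores "$1")" -gt "$(cpu_to_millicores "$2")" ]; then
        echo "cpu request $1 exceeds limit $2"; return 1
    fi
    if [ "$(mem_to_mib "$3")" -gt "$(mem_to_mib "$4")" ]; then
        echo "memory request $3 exceeds limit $4"; return 1
    fi
    echo ok
}
```

    For example, check_resources 5500m 5500m 4Gi 8Gi prints ok for the volcano-scheduler values above.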
  6. (Optional) Optimize the scheduling time by configuring the plugins used by Volcano in volcano-v{version}.yaml. For details, see "advanced Volcano configuration parameters" and "supported plugins" in the official open-source Volcano documentation.
    ...
    data:
      volcano-scheduler.conf: |
        actions: "enqueue, allocate, backfill"
        tiers:
        - plugins:
          - name: priority
            enableNodeOrder: false
          - name: gang
            enableNodeOrder: false
          - name: conformance
            enableNodeOrder: false
          - name: volcano-npu_v7.3.0_linux-aarch64   # v7.3.0 indicates the MindCluster version. The number varies depending on the actual version.
        - plugins:
          - name: drf
            enableNodeOrder: false
          - name: predicates
            enableNodeOrder: false
            arguments:
              predicate.GPUSharingEnable: false
              predicate.GPUNumberEnable: false
          - name: proportion
            enableNodeOrder: false
          - name: nodeorder
          - name: binpack
            enableNodeOrder: false
    ...
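    As the comment in the configuration notes, the volcano-npu plugin name contains the MindCluster version. The following sketch prints the version embedded in a scheduler configuration read from stdin, so that you can compare it with the version you installed. The function name is hypothetical, and the sketch assumes the name format volcano-npu_v<version>_linux-<arch> shown above.

```shell
# Hypothetical helper: prints the version embedded in the volcano-npu plugin name,
# e.g. "- name: volcano-npu_v7.3.0_linux-aarch64" -> "v7.3.0".
npu_plugin_version() {
    sed -n 's/.*name:[[:space:]]*volcano-npu_\(v[^_]*\)_linux-.*/\1/p'
}
```

    On a running cluster, you could feed it the deployed configuration, for example: kubectl get configmap volcano-scheduler-configmap -n volcano-system -o jsonpath='{.data.volcano-scheduler\.conf}' | npu_plugin_version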
  7. (Optional) In volcano-v{version}.yaml, enable the Volcano health check interface and Prometheus information collection interface.
    ...
    kind: Deployment
    metadata:
      name: volcano-scheduler
      namespace: volcano-system
      labels:
        app: volcano-scheduler
    spec:
      ...
      template:
    ...
            - name: volcano-scheduler
              image: volcanosh/vc-scheduler:v1.7.0
              args: [ ...
                  ...
                  --enable-healthz=true   # To ensure that the Volcano health check interface can be accessed, the value of this parameter must be true.
                  --enable-metrics=true # To ensure that the Prometheus information collection interface can be accessed, the value of this parameter must be true.
                  ...
    ...
    Table 2 Open interfaces of the cluster scheduling Volcano

    Access Mode                                       Protocol   Method   Description                                   Component
    http://podIP:11251/healthz                        HTTP       GET      Health check interface                        volcano-controller
    http://podIP:11251/healthz                        HTTP       GET      Health check interface                        volcano-scheduler
    http://volcano-scheduler-serviceIP:8080/metrics   HTTP       GET      Prometheus information collection interface   volcano-scheduler
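    To check the interfaces in Table 2 after Volcano starts, you can build the URLs and probe them with curl. The following is a minimal sketch. The helper names are hypothetical. The example commands in the comments assume that the scheduler pods carry the label app=volcano-scheduler and that the node running them can reach the pod network.

```shell
# Build the Table 2 URLs from a pod IP or the scheduler service IP.
healthz_url() { echo "http://$1:11251/healthz"; }
metrics_url() { echo "http://$1:8080/metrics"; }

# probe URL -> prints the HTTP status code (curl prints 000 if there is no response).
probe() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Example:
#   ip=$(kubectl get pod -n volcano-system -l app=volcano-scheduler \
#          -o jsonpath='{.items[0].status.podIP}')
#   probe "$(healthz_url "$ip")"
#   svc=$(kubectl get svc volcano-scheduler-service -n volcano-system \
#          -o jsonpath='{.spec.clusterIP}')
#   probe "$(metrics_url "$svc")"
```

    A healthy interface is expected to return status 200. A value of 000 usually means that the address is unreachable or that the interface was not enabled with --enable-healthz or --enable-metrics.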

  8. (Optional) In volcano-v{version}.yaml, configure the init-params arguments that the cluster scheduling components provide for Volcano. These arguments include the pod deletion mode used during rescheduling, the virtualization mode, switch affinity scheduling, and self-maintenance of available processor status. Table 3 describes these parameters.
    ...
    data:
      volcano-scheduler.conf: |
    ...
        configurations:
          - name: init-params
            arguments: {"grace-over-time":"900","presetVirtualDevice":"true","nslb-version":"1.0","shared-tor-num":"2","useClusterInfoManager":"false","self-maintain-available-card":"true","super-pod-size": "48","reserve-nodes": "2","forceEnqueue":"true"}
    ...
    Table 3 Parameters

    Parameter

    Default Value

    Description

    grace-over-time

    900

    Maximum time, in seconds, allowed for deleting a pod in graceful deletion mode during rescheduling. The value ranges from 2 to 3600. In graceful deletion mode, the system waits for Volcano to complete the operations related to pod deletion. If the pod has not been deleted when this time elapses (900 seconds by default), it is forcibly deleted.

    presetVirtualDevice

    true

    Virtualization mode.

    • true: static virtualization
    • false: dynamic virtualization

    nslb-version

    1.0

    Switch affinity scheduling version. The value can be 1.0 or 2.0.

    NOTE:
    • Switch affinity scheduling 1.0 supports Atlas training products and Atlas A2 training products, and the PyTorch and MindSpore frameworks.
    • Switch affinity scheduling 2.0 supports Atlas A2 training products and the PyTorch framework.

    shared-tor-num

    2

    Maximum number of shared switches that can be used by a single task in switch affinity scheduling 2.0. The value can be 1 or 2. This parameter takes effect only when nslb-version is set to 2.0.

    For details about switch affinity scheduling (1.0 or 2.0), see Node-based Affinity.

    useClusterInfoManager

    true

    Method of obtaining cluster information by Volcano. The options are as follows:

    • true: read ConfigMap reported by ClusterD.
    • false: read ConfigMap reported by Ascend Device Plugin and NodeD respectively.
    NOTE:

    By default, the ConfigMap reported by ClusterD is used. Later versions will no longer support reading the ConfigMaps reported by Ascend Device Plugin and NodeD.

    self-maintain-available-card

    true

    Whether Volcano self-maintains the available processor status. The options are as follows:

    • true: Volcano self-maintains the available processor status.
    • false: Volcano obtains the available processor status based on the ConfigMap reported by ClusterD or Ascend Device Plugin.

    super-pod-size

    48

    Number of nodes in an Atlas 900 A3 SuperPoD.

    reserve-nodes

    2

    Number of reserved nodes in an Atlas 900 A3 SuperPoD.

    NOTE:

    If the value of reserve-nodes is greater than that of super-pod-size, reserve-nodes is reset as follows:

    • If the value of super-pod-size is greater than 2, the value of reserve-nodes is reset to 2 by default.
    • If the value of super-pod-size is less than or equal to 2, the value of reserve-nodes is reset to 0 by default.

    forceEnqueue

    true

    Whether a job is forcibly added to the to-be-scheduled queue when cluster NPU resources are sufficient. The options are as follows:

    • true: If Volcano enables Enqueue and the cluster NPU resources meet the job requirements, the job is forcibly added to the to-be-scheduled queue, regardless of whether other resources are sufficient. If the job stays in the to-be-scheduled queue for a long time, resources are pre-occupied. As a result, other jobs may fail to be added to the queue.
    • Other values: If cluster NPU resources are insufficient, the job is rejected from entering the to-be-scheduled queue. If NPU resources meet the job requirements, all plugins determine whether the job enters the to-be-scheduled queue.

    For details about this parameter, see Volcano Actions.
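    The value ranges and the reserve-nodes reset rule in Table 3 can be sketched as a pre-check before you edit init-params. The following helpers are hypothetical and only mirror the documented rules. They are not the Volcano implementation.

```shell
# normalize_reserve_nodes SUPER_POD_SIZE RESERVE_NODES -> effective reserve-nodes
normalize_reserve_nodes() {
    size="$1"; reserve="$2"
    if [ "$reserve" -gt "$size" ]; then
        # Documented reset: 2 if super-pod-size > 2, otherwise 0.
        if [ "$size" -gt 2 ]; then reserve=2; else reserve=0; fi
    fi
    echo "$reserve"
}

# check_init_params GRACE_OVER_TIME NSLB_VERSION SHARED_TOR_NUM -> "ok" or the first problem
check_init_params() {
    if [ "$1" -lt 2 ] || [ "$1" -gt 3600 ]; then
        echo "grace-over-time must be 2-3600 seconds"; return 1
    fi
    case "$2" in
        1.0|2.0) ;;
        *) echo "nslb-version must be 1.0 or 2.0"; return 1 ;;
    esac
    # shared-tor-num takes effect only with nslb-version 2.0, but must still be 1 or 2.
    case "$3" in
        1|2) ;;
        *) echo "shared-tor-num must be 1 or 2"; return 1 ;;
    esac
    echo ok
}
```

    With the example values in this step, check_init_params 900 1.0 2 prints ok and normalize_reserve_nodes 48 2 keeps reserve-nodes at 2.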

  9. (Optional) Optimize the scheduling time. For a single vcjob or acjob, Volcano can reduce the time required to schedule 4,000 or 5,000 pods onto 4,000 or 5,000 nodes to about 5 minutes. To use this function, modify volcano-v{version}.yaml as follows.
    • To meet the reference time of about 5 minutes, ensure that the CPU frequency is at least 2.60 GHz and the APIServer latency is less than 80 ms.
    • If the native nodeAffinity and podAntiAffinity fields of Kubernetes are not used for scheduling, you can disable the nodeorder plugin to further reduce the scheduling time.
    data:
      volcano-scheduler.conf: |
    
    ...
          - name: proportion
            enableNodeOrder: false
          - name: nodeorder
            enableNodeOrder: false # (Optional) Disable the nodeorder plugin when nodeAffinity and podAntiAffinity are not used for scheduling.
    ...
          containers:
            - name: volcano-scheduler
              image: volcanosh/vc-scheduler:v1.7.0
              command: ["/bin/ash"]
              # Add the GOMEMLIMIT=15000000000 and GOGC=off fields before /vc-scheduler.
              args: ["-c", "umask 027; GOMEMLIMIT=15000000000 GOGC=off /vc-scheduler
                      --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
                      --plugins-dir=plugins
                      --logtostderr=false
                      --log_dir=/var/log/mindx-dl/volcano-scheduler
                      --log_file=/var/log/mindx-dl/volcano-scheduler/volcano-scheduler.log
                      -v=2 2>&1"]
              imagePullPolicy: "IfNotPresent"
              resources:
                requests:
                  memory: 10000Mi   # Change 4Gi to 10000Mi.
                  cpu: 5500m
                limits:
                  memory: 15000Mi   # Change 8Gi to 15000Mi.
                  cpu: 5500m
    ...
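    GOMEMLIMIT sets a soft memory limit for the Go runtime in bytes. GOGC=off disables the ratio-based garbage collection trigger, so the collector runs mainly when the heap approaches GOMEMLIMIT. This works as intended only if GOMEMLIMIT stays below the container memory limit. With the values in this step, 15000Mi is 15000 x 1,048,576 = 15,728,640,000 bytes. This leaves 728,640,000 bytes (about 695 MiB) above GOMEMLIMIT=15000000000 for non-heap memory. The following hypothetical helper reproduces this arithmetic.

```shell
# Hypothetical helper, not part of Volcano.
# $1: GOMEMLIMIT in bytes; $2: container memory limit in MiB (the "Mi" value).
# Prints the headroom in bytes; a negative value means GOMEMLIMIT exceeds the limit.
gomemlimit_headroom() {
    limit_bytes=$(( $2 * 1024 * 1024 ))
    echo $(( limit_bytes - $1 ))
}
```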
  10. On the management node, go to the directory where the YAML file is stored and run the following command to start Volcano:
    kubectl apply -f volcano-v{version}.yaml

    Example output:

    namespace/volcano-system created
    namespace/volcano-monitoring created
    configmap/volcano-scheduler-configmap created
    serviceaccount/volcano-scheduler created
    clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
    clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
    deployment.apps/volcano-scheduler created
    service/volcano-scheduler-service created
    serviceaccount/volcano-controllers created
    clusterrole.rbac.authorization.k8s.io/volcano-controllers created
    clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
    deployment.apps/volcano-controllers created
    customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
    customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
    customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
    customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
    customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
  11. Query the component status.
    kubectl get pod -n volcano-system
    If the STATUS of every pod is Running, the components have started successfully.
    NAME                                   READY   STATUS    RESTARTS   AGE
    volcano-controllers-5cf8d788d5-qdpzq   1/1     Running   0          1m
    volcano-scheduler-6cffd555c9-45k7c     1/1     Running   0          1m
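    To check the result from a script instead of reading it, you can parse the same output. The following sketch is hypothetical. It assumes the default kubectl get pod columns (NAME, READY, STATUS, ...) and succeeds only if every pod is Running with all of its containers ready.

```shell
# Hypothetical helper: reads `kubectl get pod -n volcano-system` output on stdin.
all_volcano_pods_running() {
    awk 'NR > 1 {
            n++
            split($2, r, "/")   # READY column, e.g. "1/1"
            if ($3 != "Running" || r[1] != r[2]) { print "not ready: " $1; bad = 1 }
         }
         END { if (n == 0 || bad) exit 1; exit 0 }'
}
```

    Usage: kubectl get pod -n volcano-system | all_volcano_pods_running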
    

Parameters

Table 4 volcano-scheduler startup parameters

Parameter

Type

Default Value

Description

--log_dir

String

None

Log directory. The default value in the component startup YAML file is /var/log/mindx-dl/volcano-scheduler.

--log_file

String

None

Log file name. The default value in the component startup YAML file is /var/log/mindx-dl/volcano-scheduler/volcano-scheduler.log.

NOTE:

Dumped files are named in the format volcano-scheduler.log-<dump date>.gz, for example, volcano-scheduler.log-20230926.gz.

--scheduler-conf

String

/volcano.scheduler/volcano-scheduler.conf

Absolute path of the configuration file of the scheduling component.

--logtostderr

Bool

false

Whether to write logs to standard error instead of to log files.

  • true: yes.
  • false: no.

-v

Integer

2

Log output level.

  • 1: error
  • 2: warning
  • 3: info
  • 4: debug

--plugins-dir

String

plugins

Directory from which scheduler plugins are loaded.

--version

Bool

false

Whether to query the volcano-scheduler binary version.

  • true: queries the version.
  • false: does not query the version.

--log_file_max_size

Integer

1800

Maximum size of a log file, in MB.

NOTE:

When the size of a log file exceeds the threshold, the log content is cleared.

--leader-elect

Bool

false

Whether to enable leader election when multiple replicas are started, so that only the elected leader replica performs scheduling.

--percentage-nodes-to-find

Integer

100

Percentage of the total number of nodes in the cluster that the scheduler examines for available nodes during job scheduling.
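Lowering --percentage-nodes-to-find reduces the number of nodes that the scheduler examines for each job in large clusters. The following sketch shows this arithmetic. It assumes, as an illustration modeled on the open-source scheduler's behavior, that a minimum of 100 nodes is always examined and that all nodes are examined when the cluster is small or the percentage is 100 or more. Values of 0 or less are not modeled. The helper name and MIN_NODES are hypothetical.

```shell
# Assumed floor on the number of nodes examined (illustrative value).
MIN_NODES=100

# nodes_to_find TOTAL_NODES PERCENTAGE -> number of nodes examined per scheduling pass
nodes_to_find() {
    total="$1"; pct="$2"
    if [ "$total" -le "$MIN_NODES" ] || [ "$pct" -ge 100 ]; then
        echo "$total"; return
    fi
    n=$(( total * pct / 100 ))
    if [ "$n" -lt "$MIN_NODES" ]; then n=$MIN_NODES; fi
    echo "$n"
}
```

Under these assumptions, a 5,000-node cluster with the default of 100 examines all 5,000 nodes, while a setting of 20 examines 1,000 nodes.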

Table 5 volcano-controller startup parameters

Parameter

Type

Default Value

Description

--log_dir

String

None

Log directory. The default value in the component startup YAML file is /var/log/mindx-dl/volcano-controller.

--log_file

String

None

Log file name. The default value in the component startup YAML file is /var/log/mindx-dl/volcano-controller/volcano-controller.log.

NOTE:

Dumped files are named in the format volcano-controller.log-<dump date>.gz, for example, volcano-controller.log-20230926.gz.

--logtostderr

Bool

false

Whether to write logs to standard error instead of to log files.

  • true: yes.
  • false: no.

-v

Integer

4

Log output level.

  • 1: error
  • 2: warning
  • 3: info
  • 4: debug

--version

Bool

false

Whether to query the volcano-controller binary version.

  • true: queries the version.
  • false: does not query the version.

--log_file_max_size

Integer

1800

Maximum size of a log file, in MB.

NOTE:

When the size of a log file exceeds the threshold, the log content is cleared.

Volcano is open-source software. Only common startup parameters are listed here. For details about other parameters, see the open-source Volcano documentation.