Installing Volcano
- Volcano or another scheduler must be installed on the management node if you need any of the following functions: full NPU scheduling, static vNPU scheduling, dynamic vNPU scheduling, resumable training, elastic training, recovery of inference card faults, or rescheduling upon inference card faults.
- If Volcano is used for job scheduling, it is not advised to create or mount NPUs in a container using Docker or Containerd and run jobs in the container. Otherwise, Volcano may encounter scheduling problems.
- If you need only containerization and resource monitoring functions, you do not need to install Volcano. In this case, skip this section.
This section describes how to install the Volcano components (vc-scheduler and vc-controller-manager). If you need other open-source Volcano components, install them yourself and ensure their security.
- In this document, Volcano refers to the Volcano included in the cluster scheduling components. If you need other open-source Volcano-based schedulers, refer to (Optional) Integrating the Ascend Plugin to Extend Open-Source Volcano to integrate the Ascend-volcano-plugin plugin and enable NPU scheduling.
- NodeD 6.0.RC1 and later versions are incompatible with earlier versions of Volcano. If you use NodeD 6.0.RC1 or later, you must also use Volcano 6.0.RC1 or later.
- When Volcano 6.0.RC2 or later is used as the scheduler, ClusterD must be installed. If ClusterD is not installed, you must modify the startup parameters of Volcano. Otherwise, Volcano cannot schedule jobs.
Procedure
- Log in to the Kubernetes management node as the root user and check whether the Volcano image and version number are correct.
docker images | grep volcanosh
Command output:
volcanosh/vc-controller-manager   v1.7.0   84c73128cc55   3 days ago   44.5MB
volcanosh/vc-scheduler            v1.7.0   e90c114c75b1   3 days ago   188MB
- If correct, proceed to Step 2.
- If not correct, create the image by referring to Preparing an Image.
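The image check in this step can also be scripted. The following is a minimal sketch and is not part of the product: the function name missing_volcano_images is hypothetical, and the tag v1.7.0 is taken from the example output above, so replace it with the tag of your own package.

```shell
# Hypothetical helper: reads `docker images` output on stdin and prints
# each Volcano image that is not present with the expected tag.
missing_volcano_images() {
    awk -v tag="$1" '
        $1 == "volcanosh/vc-scheduler" && $2 == tag          { sched = 1 }
        $1 == "volcanosh/vc-controller-manager" && $2 == tag { ctrl = 1 }
        END {
            if (!sched) print "volcanosh/vc-scheduler:" tag
            if (!ctrl)  print "volcanosh/vc-controller-manager:" tag
        }'
}

# Usage (assumes Docker is the container runtime, as in this step):
#   docker images | missing_volcano_images v1.7.0
```

If the function prints nothing, both images are available with the expected tag.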
- Copy the YAML file in the directory where the Volcano package is decompressed to any directory on the Kubernetes management node.
- Skip this step if you do not need to modify the component startup parameters. Otherwise, modify the Volcano startup parameters in the corresponding startup YAML file based on your requirements. For details about common startup parameters, see Table 4 and Table 5.
- Configure log dump for Volcano. During the installation, Volcano logs are mounted to the drive space (/var/log/mindx-dl). By default, Volcano clears log files when the size of daily logs reaches 1.8 GB. To prevent the drive space from being used up, configure log dump for Volcano. For details, see Table 1. Alternatively, select a more frequent log dump policy to prevent log loss.
- In the /etc/logrotate.d directory on the management node, run the following command to create a log dump configuration file:
vi /etc/logrotate.d/file_name
Example:
vi /etc/logrotate.d/volcano
Add the following content to the file and run the :wq command to save the file:
/var/log/mindx-dl/volcano-*/*.log{
    daily
    rotate 8
    size 50M
    compress
    dateext
    missingok
    notifempty
    copytruncate
    create 0640 hwMindX hwMindX
    sharedscripts
    postrotate
        chmod 640 /var/log/mindx-dl/volcano-*/*.log
        chmod 440 /var/log/mindx-dl/volcano-*/*.log-*
    endscript
}
- Run the following commands in sequence to set the configuration file permission to 640 and owner to root:
chmod 640 /etc/logrotate.d/file_name
chown root /etc/logrotate.d/file_name
Example:
chmod 640 /etc/logrotate.d/volcano
chown root /etc/logrotate.d/volcano
Table 1 Configuration items of Volcano log dump files

| Configuration Item | Description | Possible Value |
|---|---|---|
| daily | Log dump frequency | daily: Performs the dump check once a day. weekly: Performs the dump check once a week. monthly: Performs the dump check once a month. yearly: Performs the dump check once a year. |
| rotate x | Number of times that log files are dumped before they are deleted | x indicates the number of backups. Example: rotate 0 (no backup), rotate 8 (eight backups). |
| size xx | A log file is dumped only when its size reaches the value of this parameter. | The size unit can be byte (default value), K, or M. For example, size 50M indicates that a log file is dumped when its size reaches 50 MB. NOTE: logrotate periodically checks the sizes of log files based on the configured dump frequency. Dump is triggered only when the size of a log file exceeds the value of size. This means that logrotate does not dump a log file as soon as it reaches its size limit. |
| compress | Whether to compress dumped logs using gzip | compress: Use gzip for compression. nocompress: Do not use gzip for compression. |
| notifempty | Whether to dump empty files | ifempty: Dump empty files. notifempty: Do not dump empty files. |
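The interaction between the dump frequency and the size item is easy to misread, so the following minimal sketch mimics the size test that is made at each periodic check. The function names to_bytes and would_dump are hypothetical, and the "larger than" comparison is an assumption based on the NOTE in Table 1.

```shell
# Convert a logrotate-style size (plain bytes, or a K or M suffix) to bytes.
to_bytes() {
    case "$1" in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

# Succeeds if a log of FILE_BYTES ($1) would be dumped at the next periodic
# check for the given size setting ($2).
# Assumption: the file must be larger than the configured size.
would_dump() {
    [ "$1" -gt "$(to_bytes "$2")" ]
}

# A 60 MB log found at the daily check is dumped. A 40 MB log is left in
# place and keeps growing until the following check.
would_dump $((60 * 1024 * 1024)) 50M && echo "60 MB: dumped"
would_dump $((40 * 1024 * 1024)) 50M || echo "40 MB: kept until the next check"
```

Because the test runs only at check time, a log can grow well past 50 MB between two daily checks, which is why a more frequent dump policy reduces the risk of reaching the 1.8 GB clearing threshold.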
- (Optional) In volcano-v{version}.yaml, configure the CPU and memory required by Volcano. For the recommended CPU and memory values, see the recommended values in the volcano-controller and volcano-scheduler tables in official documentation of open-source Volcano.
    ...
    kind: Deployment
    ...
      labels:
        app: volcano-scheduler
    spec:
      replicas: 1
    ...
        spec:
    ...
              imagePullPolicy: "IfNotPresent"
              resources:
                requests:
                  memory: 4Gi
                  cpu: 5500m
                limits:
                  memory: 8Gi
                  cpu: 5500m
    ...
    kind: Deployment
    ...
      labels:
        app: volcano-controller
    spec:
    ...
        spec:
    ...
              resources:
                requests:
                  memory: 3Gi
                  cpu: 2000m
                limits:
                  memory: 3Gi
                  cpu: 2000m
    ...
- (Optional) Optimize the scheduling time. The plugin used by Volcano can be configured in volcano-v{version}.yaml. For details, see "advanced Volcano configuration parameters" and "supported plugins" in official documentation of open-source Volcano.
    ...
    data:
      volcano-scheduler.conf: |
        actions: "enqueue, allocate, backfill"
        tiers:
        - plugins:
          - name: priority
            enableNodeOrder: false
          - name: gang
            enableNodeOrder: false
          - name: conformance
            enableNodeOrder: false
          - name: volcano-npu_v7.3.0_linux-aarch64    # v7.3.0 indicates the MindCluster version. The number varies depending on the actual version.
        - plugins:
          - name: drf
            enableNodeOrder: false
          - name: predicates
            enableNodeOrder: false
            arguments:
              predicate.GPUSharingEnable: false
              predicate.GPUNumberEnable: false
          - name: proportion
            enableNodeOrder: false
          - name: nodeorder
          - name: binpack
            enableNodeOrder: false
    ...
- (Optional) In volcano-v{version}.yaml, enable the Volcano health check interface and Prometheus information collection interface.
    ...
    kind: Deployment
    metadata:
      name: volcano-scheduler
      namespace: volcano-system
      labels:
        app: volcano-scheduler
    spec:
    ...
      template:
    ...
            - name: volcano-scheduler
              image: volcanosh/vc-scheduler:v1.7.0
              args: [
    ...
    ...
                --enable-healthz=true    # To ensure that the Volcano health check interface can be accessed, the value of this parameter must be true.
                --enable-metrics=true    # To ensure that the Prometheus information collection interface can be accessed, the value of this parameter must be true.
    ...
    ...

Table 2 Open interfaces of the cluster scheduling Volcano

| Access Mode | Protocol | Method | Description | Component |
|---|---|---|---|---|
| http://podIP:11251/healthz | http | Get | Health check interface | volcano-controller |
| http://podIP:11251/healthz | http | Get | Health check interface | volcano-scheduler |
| http://volcano-scheduler-serviceIP:8080/metrics | http | Get | Prometheus information collection interface | volcano-scheduler |
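Once the two interfaces are enabled, they can be probed from a host that can reach the pod and Service networks. The following is a minimal sketch: the helper names healthz_url and metrics_url are hypothetical, POD_IP and SERVICE_IP are placeholders, and the lookup commands assume that the components run in the volcano-system namespace.

```shell
# Hypothetical helpers that build the URLs listed in Table 2.
healthz_url() { printf 'http://%s:11251/healthz\n' "$1"; }
metrics_url() { printf 'http://%s:8080/metrics\n' "$1"; }

# Usage (POD_IP and SERVICE_IP are placeholders). Look up the addresses with,
# for example:
#   kubectl get pod -n volcano-system -o wide
#   kubectl get svc -n volcano-system
# and then query the interfaces:
#   curl -fsS "$(healthz_url "$POD_IP")"
#   curl -fsS "$(metrics_url "$SERVICE_IP")" | head
```

With -f, curl exits with a non-zero status when the server returns an HTTP error, so the commands can be used in simple monitoring scripts.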
- (Optional) In volcano-v{version}.yaml, configure pod deletion mode, virtualization mode, switch affinity scheduling, and self-maintenance of available processor status provided by cluster scheduling components for Volcano during rescheduling.
    ...
    data:
      volcano-scheduler.conf: |
    ...
        configurations:
          - name: init-params
            arguments: {"grace-over-time":"900","presetVirtualDevice":"true","nslb-version":"1.0","shared-tor-num":"2","useClusterInfoManager":"false","self-maintain-available-card":"true","super-pod-size": "48","reserve-nodes": "2","forceEnqueue":"true"}
    ...

Table 3 Parameters

| Parameter | Default Value | Description |
|---|---|---|
| grace-over-time | 900 | Maximum time, in seconds, allowed for deleting a pod in graceful deletion mode during rescheduling. The value ranges from 2 to 3600. In graceful deletion mode, the system waits for Volcano to perform the operations related to pod deletion during rescheduling. If the pod is still not deleted when this time elapses (900 seconds by default), it is forcibly deleted. |
| presetVirtualDevice | true | Virtualization mode. true: static virtualization. false: dynamic virtualization. |
| nslb-version | 1.0 | Switch affinity scheduling version. The value can be 1.0 or 2.0. NOTE: Switch affinity scheduling 1.0 supports the Atlas training product and Atlas A2 training product as well as PyTorch and MindSpore. Switch affinity scheduling 2.0 supports the Atlas A2 training product and PyTorch. |
| shared-tor-num | 2 | Maximum number of shared switches that can be used by a single task in switch affinity scheduling 2.0. The value can be 1 or 2. This parameter takes effect only when nslb-version is set to 2.0. For details about switch affinity scheduling (1.0 or 2.0), see Node-based Affinity. |
| useClusterInfoManager | true | Method of obtaining cluster information by Volcano. true: Read the ConfigMap reported by ClusterD. false: Read the ConfigMaps reported by Ascend Device Plugin and NodeD respectively. NOTE: By default, the ConfigMap reported by ClusterD is used. In later versions, the ConfigMaps reported by Ascend Device Plugin and NodeD cannot be read. |
| self-maintain-available-card | true | Whether Volcano self-maintains the available processor status. true: Volcano self-maintains the available processor status. false: Volcano obtains the available processor status based on the ConfigMap reported by ClusterD or Ascend Device Plugin. |
| super-pod-size | 48 | Number of nodes in an Atlas 900 A3 SuperPoD. |
| reserve-nodes | 2 | Number of reserved nodes in an Atlas 900 A3 SuperPoD. NOTE: If the value of reserve-nodes is greater than that of super-pod-size, reserve-nodes is reset as follows: if super-pod-size is greater than 2, reserve-nodes is reset to 2; if super-pod-size is less than or equal to 2, reserve-nodes is reset to 0. |
| forceEnqueue | true | Whether a job is forcibly added to the to-be-scheduled queue when cluster NPU resources are sufficient. true: If Volcano enables Enqueue and the cluster NPU resources meet the job requirements, the job is forcibly added to the to-be-scheduled queue, regardless of whether other resources are sufficient. If the job stays in the to-be-scheduled queue for a long time, resources are pre-occupied. As a result, other jobs may fail to be added to the queue. Other values: If cluster NPU resources are insufficient, the job is rejected from entering the to-be-scheduled queue. If NPU resources meet the job requirements, all plugins determine whether the job enters the to-be-scheduled queue. |
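Two of the rules in Table 3 are simple enough to check before the YAML file is applied. The following minimal sketch restates the documented range of grace-over-time and the reset rule for reserve-nodes. The function names are hypothetical, and the sketch does not read the YAML file or call Volcano.

```shell
# Succeeds if grace-over-time ($1) is within the documented range of
# 2 to 3600 seconds.
valid_grace_over_time() {
    [ "$1" -ge 2 ] && [ "$1" -le 3600 ]
}

# Prints the reserve-nodes value that takes effect for the configured
# reserve-nodes ($1) and super-pod-size ($2), following the NOTE in Table 3.
effective_reserve_nodes() {
    reserve=$1
    size=$2
    if [ "$reserve" -gt "$size" ]; then
        if [ "$size" -gt 2 ]; then
            reserve=2
        else
            reserve=0
        fi
    fi
    echo "$reserve"
}

effective_reserve_nodes 2 48    # prints 2 (unchanged)
effective_reserve_nodes 50 48   # prints 2 (reset)
effective_reserve_nodes 3 2     # prints 0 (reset)
```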
- For details about how to configure open-source Volcano, see official documentation of open-source Volcano.
- Kubernetes allows nodeAffinity to conduct node affinity scheduling. For details about this field, see Kubernetes documentation. Volcano can also use this field. For details, see Scheduling.
- (Optional) Optimize the scheduling time. In a single vcjob or acjob, Volcano can reduce the time of scheduling 4,000 or 5,000 pods to 4,000 or 5,000 nodes to about 5 minutes. To use this function, modify volcano-v{version}.yaml as follows.
- To meet the reference time of about 5 minutes, ensure that the CPU frequency is at least 2.60 GHz and the APIServer latency is less than 80 ms.
- If the native nodeAffinity and podAntiAffinity fields of Kubernetes are not used for scheduling, you can disable the nodeorder plugin to further reduce the scheduling time.
    data:
      volcano-scheduler.conf: |
    ...
          - name: proportion
            enableNodeOrder: false
          - name: nodeorder
            enableNodeOrder: false    # (Optional) Disable the nodeorder plugin when nodeAffinity and podAntiAffinity are not used for scheduling.
    ...
          containers:
            - name: volcano-scheduler
              image: volcanosh/vc-scheduler:v1.7.0
              command: ["/bin/ash"]
              # Add the GOMEMLIMIT=15000000000 and GOGC=off fields before /vc-scheduler.
              args: ["-c", "umask 027; GOMEMLIMIT=15000000000 GOGC=off /vc-scheduler
                      --scheduler-conf=/volcano.scheduler/volcano-scheduler.conf
                      --plugins-dir=plugins
                      --logtostderr=false
                      --log_dir=/var/log/mindx-dl/volcano-scheduler
                      --log_file=/var/log/mindx-dl/volcano-scheduler/volcano-scheduler.log
                      -v=2 2>&1"]
              imagePullPolicy: "IfNotPresent"
              resources:
                requests:
                  memory: 10000Mi    # Change 4 GiB to 10000 MiB.
                  cpu: 5500m
                limits:
                  memory: 15000Mi    # Change 8 GiB to 15000 MiB.
                  cpu: 5500m
    ...
- Run the following command in the directory where the YAML file of the management node is stored to start Volcano.
kubectl apply -f volcano-v{version}.yaml
Startup example:
namespace/volcano-system created
namespace/volcano-monitoring created
configmap/volcano-scheduler-configmap created
serviceaccount/volcano-scheduler created
clusterrole.rbac.authorization.k8s.io/volcano-scheduler created
clusterrolebinding.rbac.authorization.k8s.io/volcano-scheduler-role created
deployment.apps/volcano-scheduler created
service/volcano-scheduler-service created
serviceaccount/volcano-controllers created
clusterrole.rbac.authorization.k8s.io/volcano-controllers created
clusterrolebinding.rbac.authorization.k8s.io/volcano-controllers-role created
deployment.apps/volcano-controllers created
customresourcedefinition.apiextensions.k8s.io/jobs.batch.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/commands.bus.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/podgroups.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/queues.scheduling.volcano.sh created
customresourcedefinition.apiextensions.k8s.io/numatopologies.nodeinfo.volcano.sh created
- Query the component status.
kubectl get pod -n volcano-system
If Running is displayed in the command output, the component is started successfully.
NAME                                   READY   STATUS    RESTARTS   AGE
volcano-controllers-5cf8d788d5-qdpzq   1/1     Running   0          1m
volcano-scheduler-6cffd555c9-45k7c     1/1     Running   0          1m
- If the pod status of Volcano is CrashLoopBackOff, rectify the fault by referring to After Volcano Is Manually Installed, the Pod Status Is CrashLoopBackOff.
- If volcano-scheduler-6cffd555c9-45k7c is in the Running state but the scheduling is abnormal, rectify the fault by referring to Volcano Works Abnormally, and "Failed to get plugin" Is Displayed in the Log.
- After the component is installed, if the pod status of the component is not Running, refer to Component pods Are Not in the Running State.
- After the component is installed, if the pod status of the component is ContainerCreating, refer to Cluster Scheduling Component Pods Are in the ContainerCreating State.
- If the component fails to be started, refer to Cluster Scheduling Components Fail to Start and "get sem errno =13" Is Displayed in Logs.
- If the component is started successfully, but the corresponding pod cannot be found, refer to YAML File for Starting a Component Is Successfully Executed, But the pod Corresponding to the Component Is Not Displayed.
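The status query above can be reduced to a quick filter that lists only the pods needing attention. This is a minimal sketch: the function name not_running_pods is hypothetical, and it assumes the default column layout of kubectl get pod, in which STATUS is the third column.

```shell
# Hypothetical helper: reads `kubectl get pod` output on stdin and prints
# the name and status of every pod whose STATUS column is not Running.
not_running_pods() {
    awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Usage:
#   kubectl get pod -n volcano-system | not_running_pods
```

If the function prints nothing, every listed pod is in the Running state. Otherwise, use the printed status to choose the matching troubleshooting reference above.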
Parameters
Table 4 Startup parameters of volcano-scheduler

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| --log-dir | String | None | Log directory. The default value in the component startup YAML file is /var/log/mindx-dl/volcano-scheduler. |
| --log-file | String | None | Log file name. The default value in the component startup YAML file is /var/log/mindx-dl/volcano-scheduler/volcano-scheduler.log. NOTE: Dumped files are named in the format of "volcano-scheduler.log-dump triggering time.gz", for example, volcano-scheduler.log-20230926.gz. |
| --scheduler-conf | String | /volcano.scheduler/volcano-scheduler.conf | Absolute path of the configuration file of the scheduling component. |
| --logtostderr | Bool | false | Whether to print logs to the standard output. |
| -v | Integer | 2 | Log output level. |
| --plugins-dir | String | plugins | Path for loading the scheduler plugin. |
| --version | Bool | false | Whether to query the volcano-scheduler binary version. |
| --log_file_max_size | Integer | 1800 | Maximum size of a log file, in MB. NOTE: When the size of a log file exceeds the threshold, the log content is cleared. |
| --leader-elect | Bool | false | Whether to enable leader election, which selects the primary instance when multiple replicas are started. |
| --percentage-nodes-to-find | Integer | 100 | Percentage of the total number of nodes in a cluster that are selected as available nodes during job scheduling. |
Table 5 Startup parameters of volcano-controller

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| --log-dir | String | None | Log directory. The default value in the component startup YAML file is /var/log/mindx-dl/volcano-controller. |
| --log-file | String | None | Log file name. The default value in the component startup YAML file is /var/log/mindx-dl/volcano-controller/volcano-controller.log. NOTE: Dumped files are named in the format of "volcano-controller.log-dump triggering time.gz", for example, volcano-controller.log-20230926.gz. |
| --logtostderr | Bool | false | Whether to print logs to the standard output. |
| -v | Integer | 4 | Log output level. |
| --version | Bool | false | Whether to query the volcano-controller binary version. |
| --log_file_max_size | Integer | 1800 | Maximum size of a log file, in MB. NOTE: When the size of a log file exceeds the threshold, the log content is cleared. |
Volcano is open-source software. Only common startup parameters are listed here. For details about other parameters, see the description of the open-source software.