Affinity Scheduling Interconnection
To decouple the scheduling layer from the job resource type, the Ascend-for-volcano scheduling plugin supports pod-level scheduling policies. You can configure scheduling parameters in metadata.labels or metadata.annotations of a pod without depending on a PodGroup. Supported workload types include acjob, vcjob, Job, Deployment, and StatefulSet.
Function Description
You can add specific labels or annotations to the pod template of a Kubernetes resource to control the core scheduling behavior of Volcano, including but not limited to the following:
- Ascend AI processor-based affinity scheduling
- Switch affinity scheduling
- Affinity scheduling of logical SuperPoDs
- Rescheduling upon faults
Prerequisite
Ensure that the Kubernetes cluster has been correctly deployed, Volcano has been configured, and Ascend-for-volcano has been enabled.
Example of Scheduling Policy Configuration
Take StatefulSet as an example. All labels and annotations related to scheduling must be configured under StatefulSet.spec.template.metadata to ensure that the scheduler can correctly read the labels and annotations from the pod instance.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mindx-dls-test # The value of this parameter must be consistent with the name of the ConfigMap.
  labels:
    app: mindspore
    ring-controller.atlas: ascend-910
spec:
  replicas: 16 # The value of replicas is 1 in a single-node scenario and N in an N-node scenario. The number of NPUs in the requests field is 8 in an N-node scenario.
  podManagementPolicy: Parallel # Both OrderedReady and Parallel modes are supported. OrderedReady supports only intra-node affinity scheduling, and huawei.com/schedule_minAvailable can only be set to 1. Parallel supports intra-node and inter-node affinity scheduling.
  serviceName: service-headliness
  selector:
    matchLabels:
      app: mindspore
  template:
    metadata:
      labels:
        app: mindspore
        ring-controller.atlas: ascend-910
        fault-scheduling: force # Rescheduling upon faults
        pod-rescheduling: "on" # Pod-level rescheduling
        fault-retry-times: "85" # Number of rescheduling attempts when a service plane fault occurs
        tor-affinity: large-model-schema # Switch affinity scheduling
        deploy-name: mindx-dls-test # This label must be added to generate the RankTable. The value must be the same as the task name.
      annotations:
        sp-block: "128" # Affinity scheduling of logical SuperPoDs
        huawei.com/recover_policy_path: pod # Pod-level rescheduling
        huawei.com/schedule_minAvailable: "16" # Minimum number of replicas for job scheduling. It is recommended that the value be the same as the number of job replicas.
    spec:
      schedulerName: volcano # Use the Volcano scheduler to schedule jobs.
      nodeSelector:
        host-arch: huawei-arm # Configure the label based on the actual job.
      containers:
      - image: ubuntu:18.04 # Training framework image, which can be modified.
        name: mindspore
        resources:
          requests:
            huawei.com/Ascend910: 16 # Number of required NPUs. The maximum value is 16. You can add lines below to configure resources such as memory and CPU.
          limits:
            huawei.com/Ascend910: 16 # The value must be consistent with that in requests.
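The comment on podManagementPolicy describes a second valid configuration. As a minimal sketch, assuming every other field stays as in the StatefulSet example, the ordered mode would change only the two fields below. The fragment is illustrative and is not taken from the plugin's reference material.

```yaml
# Fragment only: fields that differ when the ordered mode replaces Parallel.
spec:
  podManagementPolicy: OrderedReady   # Kubernetes creates pods one at a time in this mode; only intra-node affinity scheduling is supported.
  template:
    metadata:
      annotations:
        huawei.com/schedule_minAvailable: "1"   # Can only be set to 1 in this mode, per the comment in the example.
```

A likely reason for the restriction, stated here as an inference rather than documented behavior: in the ordered mode the StatefulSet controller waits for each pod to become ready before creating the next one, so a minimum-available count greater than 1 could never be met.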
- If a PodGroup is created, the scheduling configuration in the PodGroup spec overwrites the labels and annotations of the pods it generates.
- For resources that can generate PodGroups, you can configure the corresponding scheduling policy in PodGroups to implement affinity scheduling.
- For details about the common labels and annotations, see PodGroup or Pod.
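The same placement rule can be illustrated with another supported workload type. The following is a minimal sketch of a Deployment, assuming it accepts the same label and annotation keys as the StatefulSet example. The name mindx-dls-deploy, the replica count, and the NPU count are hypothetical, and the keys that actually apply should be checked against the PodGroup or Pod reference.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mindx-dls-deploy                # Hypothetical name.
spec:
  replicas: 2                           # Hypothetical: a two-node job.
  selector:
    matchLabels:
      app: mindspore
  template:
    metadata:
      labels:                           # Scheduling labels belong on the pod template so that each pod carries them.
        app: mindspore
        ring-controller.atlas: ascend-910
        fault-scheduling: force         # Assumed to behave as in the StatefulSet example.
      annotations:
        huawei.com/schedule_minAvailable: "2"   # Assumed to follow the recommendation to match replicas.
    spec:
      schedulerName: volcano            # Use the Volcano scheduler to schedule the pods.
      nodeSelector:
        host-arch: huawei-arm           # Configure the label based on the actual job.
      containers:
      - image: ubuntu:18.04             # Placeholder image, as in the StatefulSet example.
        name: mindspore
        resources:
          requests:
            huawei.com/Ascend910: 8     # Hypothetical count; adjust to the node type.
          limits:
            huawei.com/Ascend910: 8     # Must be consistent with requests.
```

Labels placed only under the top-level metadata of the Deployment describe the Deployment object itself and are not copied to its pods, which is why the sketch sets them under spec.template.metadata.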