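Affinity Scheduling Integration Guide
To decouple the scheduling layer from task resource types, the Ascend-for-volcano scheduling plugin now supports Pod-level scheduling policy configuration. Users can set scheduling parameters directly in the Pod's metadata.labels or metadata.annotations without relying on a PodGroup. This applies to Pods created by acjob, vcjob, Job, Deployment, StatefulSet, and similar resource types.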
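Feature Overview
By adding specific labels or annotations to the Pod template of a K8s resource, you can control Volcano's core scheduling behaviors, including but not limited to:
- Ascend AI processor affinity scheduling
- Switch affinity scheduling
- Logical supernode affinity scheduling
- Fault rescheduling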
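Prerequisites
Make sure the Volcano scheduler is correctly deployed and configured in the Kubernetes cluster, and that the Ascend-for-volcano scheduling plugin is enabled.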
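Scheduling Policy Configuration Example
Taking a StatefulSet as an example, all scheduling-related labels and annotations must be configured under StatefulSet.spec.template.metadata so that the scheduler can read them correctly from each Pod instance.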
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mindx-dls-test          # The value of this parameter must be consistent with the name of ConfigMap.
  labels:
    app: mindspore
    ring-controller.atlas: ascend-910
spec:
  replicas: 16                  # The value of replicas is 1 in a single-node scenario and N in an N-node scenario. The number of NPUs in the requests field is 8 in an N-node scenario.
  # Supports OrderedReady and Parallel.
  # OrderedReady: only intra-node affinity scheduling is supported, and huawei.com/schedule_minAvailable must be 1.
  # Parallel: both intra-node and inter-node affinity scheduling are supported.
  podManagementPolicy: Parallel
  serviceName: service-headliness
  selector:
    matchLabels:
      app: mindspore
  template:
    metadata:
      labels:
        app: mindspore
        ring-controller.atlas: ascend-910
        fault-scheduling: force             # Switch for fault rescheduling
        pod-rescheduling: "on"              # Switch for Pod-level rescheduling
        fault-retry-times: "85"             # Number of rescheduling retries for service-plane faults
        tor-affinity: large-model-schema    # Switch for switch affinity scheduling
        deploy-name: mindx-dls-test         # Required for rankTable generation; the value must match the task name
      annotations:
        sp-block: "128"                           # Switch for logical supernode affinity scheduling
        huawei.com/recover_policy_path: pod       # Switch that keeps Pod-level rescheduling from escalating to Job level
        huawei.com/schedule_minAvailable: "16"    # Minimum number of replicas for scheduling the task; keeping it equal to the task's replica count is recommended
    spec:
      schedulerName: volcano    # Use the Volcano scheduler to schedule jobs.
      nodeSelector:
        host-arch: huawei-arm   # Configure the label based on the actual job.
      containers:
      - image: ubuntu:18.04     # Training framework image, which can be modified.
        name: mindspore
        resources:
          requests:
            huawei.com/Ascend910: 16    # Number of required NPUs. The maximum value is 16. You can add lines below to configure resources such as memory and CPU.
          limits:
            huawei.com/Ascend910: 16    # The value must be consistent with that in requests.
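
The same placement rule should carry over to the other supported workload types. As a minimal sketch for a Deployment, the fragment below reuses only label and annotation keys that appear in the StatefulSet example and puts them under spec.template.metadata. The resource name mindx-dls-deploy and the replica count of 2 are hypothetical. It is also an assumption that these keys behave identically for a Deployment, so check them against your Ascend-for-volcano version before use.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mindx-dls-deploy        # Hypothetical name, used for illustration only
  labels:
    app: mindspore
spec:
  replicas: 2                   # Hypothetical replica count
  selector:
    matchLabels:
      app: mindspore
  template:
    metadata:
      labels:                   # Scheduling labels go on the Pod template, not on the Deployment itself
        app: mindspore
        ring-controller.atlas: ascend-910
        fault-scheduling: force
        deploy-name: mindx-dls-deploy             # Kept equal to the task name, as in the StatefulSet example
      annotations:
        huawei.com/schedule_minAvailable: "2"     # Kept equal to replicas, following the recommendation above
    spec:
      schedulerName: volcano
      nodeSelector:
        host-arch: huawei-arm   # Adjust to the actual node labels
      containers:
      - image: ubuntu:18.04     # Placeholder image, as in the StatefulSet example
        name: mindspore
        resources:
          requests:
            huawei.com/Ascend910: 16
          limits:
            huawei.com/Ascend910: 16
```

Labels set only on the top-level metadata of the Deployment are not copied to its Pods, which is why the scheduling keys sit on the Pod template here.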
