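Scheduling Plugin Configuration
If overcommitted scheduling is not required (that is, tasks are scheduled only up to the resources actually available), make sure that the scheduling plugins of MindCluster Volcano do not include the overcommit plugin. Modify the configuration as follows.
In the MindCluster Volcano deployment file "volcano-v{version}.yaml", if the entry "- name: overcommit" exists, delete it.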
...
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
      - name: volcano-npu_v5.0.0.2_linux-aarch64    # v5.0.0.2 is the MindX DL version number; this string differs from version to version
    - plugins:
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
    configurations:
      - name: selector
        arguments: {"host-arch":"huawei-arm|huawei-x86",
                    "accelerator":"huawei-Ascend910|nvidia-tesla-v100|nvidia-tesla-p40",
                    "accelerator-type":"card|module|half|module-{xxx}b-16|module-{xxx}b-8|card-{xxx}b-2|card-{xxx}b-infer","servertype":"soc"}
      - name: init-params
        arguments: {"grace-over-time":"900","presetVirtualDevice":"true"}
...
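The check-and-delete step can also be scripted. The following is a minimal Python sketch, not part of MindCluster: the function name strip_overcommit and the sample text are hypothetical, and it assumes the plugin appears as a single line of the form "- name: overcommit" with no nested arguments underneath it. Review the result before applying it to a real deployment file.

```python
import re

# Matches one plugin list entry such as "      - name: overcommit",
# optionally followed by a trailing comment. Assumption: the entry
# occupies exactly one line and has no nested "arguments" block.
OVERCOMMIT_ENTRY = re.compile(r"^\s*-\s*name:\s*overcommit\s*(#.*)?$")


def strip_overcommit(yaml_text):
    """Return (text without overcommit entries, number of entries removed)."""
    kept = []
    removed = 0
    for line in yaml_text.splitlines(keepends=True):
        if OVERCOMMIT_ENTRY.match(line):
            removed += 1
        else:
            kept.append(line)
    return "".join(kept), removed


# Hypothetical excerpt of a scheduler configuration, for illustration only.
sample = """\
tiers:
- plugins:
  - name: priority
  - name: gang
  - name: overcommit
  - name: conformance
"""

cleaned, removed = strip_overcommit(sample)
print(removed)                  # 1
print("overcommit" in cleaned)  # False
```

The sketch filters lines as plain text instead of parsing the YAML, so the rest of the file, including comments and indentation, is left unchanged.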
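When the overcommit plugin is present, tasks are admitted against 1.2 times the actual resources. This can improve job throughput and job bandwidth, but it also allows tasks that exceed the real capacity to enter the scheduling queue, after which tasks that could otherwise be scheduled may fail to obtain resources.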
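The effect of the 1.2 factor can be illustrated with a small Python sketch. This is a simplified model for illustration, not Volcano's implementation: the function admitted_jobs, the job names, and the capacity of 8 devices are all hypothetical, and only the factor 1.2 comes from the description above.

```python
def admitted_jobs(capacity, requests, factor=1.0):
    """Admit jobs in submission order while the admitted total
    stays within capacity * factor (simplified admission model)."""
    limit = capacity * factor
    admitted = []
    total = 0
    for name, amount in requests:
        if total + amount <= limit:
            admitted.append(name)
            total += amount
    return admitted


# Hypothetical jobs: (name, number of devices requested).
jobs = [("job-a", 4), ("job-b", 4), ("job-c", 1)]

# Without overcommit, only jobs that fit in the 8 devices are admitted.
print(admitted_jobs(8, jobs))              # ['job-a', 'job-b']

# With a factor of 1.2 the limit becomes 9.6, so job-c is also admitted
# even though only 8 devices actually exist.
print(admitted_jobs(8, jobs, factor=1.2))  # ['job-a', 'job-b', 'job-c']
```

In this model job-c enters the queue under the 1.2 factor although no device is left for it, which corresponds to the overload situation described above.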