Customizing Labels

To support hybrid deployments that mix NPUs, x86 and Arm platforms, and standard cards and modules, you need to configure labels on each worker node so that the cluster scheduling components can schedule jobs to worker nodes of different forms.

You can configure labels to specify the nodes where a job is to run. Label configuration involves the Job, volcano-scheduler, and Node. The three labels must match; that is, the label configured for the Job must also exist in volcano-scheduler and on a Node. Figure 1 shows the relationship between the three labels.
Figure 1 Process of customizing a label
  • NPU jobs must configure the host-arch nodeSelector label. Its value is huawei-arm or huawei-x86 by default and cannot be modified.
  • If a label is configured for a job, it must match a label configured in volcano-scheduler. If no match is found, the job status is set to Pending and the reason is recorded. If a match is found, the process goes to the next step.
  • If the job label is in the volcano-scheduler label list, volcano-scheduler selects a node carrying the same label. If no such node exists, the job status is set to Pending and the reason is recorded. If a matching node exists, scheduling proceeds according to the other rules.
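The matching chain above means the same key/value pair must appear in three places. The following is a minimal sketch; the node name ubuntu and the accelerator values are illustrative:

```yaml
# 1. Job: the training job requests a node via nodeSelector
nodeSelector:
  accelerator: nvidia-tesla-v100

# 2. volcano-scheduler: the same key/value must appear in the selector arguments
configurations:
  - name: selector
    arguments: {"accelerator": "nvidia-tesla-v100|nvidia-tesla-p40"}

# 3. Node: the same label must be attached to a worker node, for example:
#      kubectl label nodes ubuntu accelerator=nvidia-tesla-v100
```

If any of the three links is missing, the job stays in the Pending state as described above.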

Customizing a volcano-scheduler Label

In the Volcano deployment file volcano-v*.yaml, configure the name: selector entry under configurations and add the label definitions to arguments.

...
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: volcano-npu-v3.0.0_linux-aarch64  # v3.0.0 indicates the MindX DL version. The number varies depending on the version.
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
    configurations:
      - name: selector
        arguments: {"host-arch":"huawei-arm|huawei-x86",
        "accelerator":"huawei-Ascend910|nvidia-tesla-v100|nvidia-tesla-p40",
        "accelerator-type":"card|module|half"}
...
  • The configuration uses the map format, and only English characters are supported. If a label has multiple values, separate them with vertical bars (|).
  • If the Ascend device plugin value is Ascend910, "host-arch":"huawei-arm|huawei-x86" in arguments is the default configuration and cannot be modified. Add other labels as required.
  • The host-arch value huawei-arm|huawei-x86 cannot be configured or modified and takes effect only for NPU jobs.
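For example, to add a custom label key, extend the map in arguments while leaving the default host-arch entry unchanged. The deploy-env key and its values below are hypothetical:

```yaml
configurations:
  - name: selector
    arguments: {"host-arch":"huawei-arm|huawei-x86",
    "accelerator":"huawei-Ascend910|nvidia-tesla-v100|nvidia-tesla-p40",
    "accelerator-type":"card|module|half",
    "deploy-env":"prod|test"}   # hypothetical custom label with two values
```

A job that sets deploy-env: prod in its nodeSelector would then be scheduled only to nodes labeled deploy-env=prod.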

Customizing a Job Label

You can add custom labels to the YAML file of a training job as required. To download a complete YAML file, visit the Gitee code repository of each cluster scheduling component and see its samples directory. NPU jobs must contain the host-arch nodeSelector label with the value huawei-arm or huawei-x86. Jobs of other types are not restricted.

The related configuration of the YAML file is as follows:

...
spec:
  containers:
  ...
  nodeSelector:
    accelerator: nvidia-tesla-v100
  volumes:
...
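For an NPU job, the mandatory host-arch label goes in the same place. A minimal sketch, in which the accelerator-type entry is an optional illustration:

```yaml
spec:
  containers:
  ...
  nodeSelector:
    host-arch: huawei-arm    # mandatory for NPU jobs; use huawei-x86 on x86 nodes
    accelerator-type: card   # optional; must match the scheduler and node labels
  volumes:
...
```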

Customizing a Node Label

Node label operations must be performed on the master node where Kubernetes is installed.

  • Creating a Node Label

    kubectl label nodes {HostName} {label_key}={label_value}

    Parameter description:

    • {HostName}: name of the node to be labeled.
    • {label_key} and {label_value}: must match the labels configured in the Job and in volcano-scheduler.

    See the following example:

    kubectl label nodes ubuntu accelerator=nvidia-tesla-p40

  • Modifying a Node Label

    kubectl label nodes {HostName} {label_key}={label_value} --overwrite=true

    See the following example:

    kubectl label nodes ubuntu accelerator=nvidia-tesla-p40 --overwrite=true

  • Deleting a Node Label

    kubectl label nodes {HostName} {label_key}-

    See the following example:

    kubectl label nodes ubuntu accelerator-
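
After creating, modifying, or deleting labels, you can verify the result with standard kubectl commands. The node name ubuntu is illustrative:

```shell
# List all nodes together with their labels
kubectl get nodes --show-labels

# Show only the accelerator label as an extra column
kubectl get nodes -L accelerator

# Inspect the labels of a single node in detail
kubectl describe node ubuntu
```

Confirm that the label key and value shown on the node exactly match the entries in the job nodeSelector and the volcano-scheduler selector arguments before submitting jobs.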