Static Virtualization

Constraints

  • Only single-container tasks that use a single vNPU are supported. Creating copies (replicas) of a task is not supported.
  • Do not uninstall Volcano while a job is running.
  • Each pod can request only one vNPU.
  • Creating or destroying a vNPU in a static virtualization scenario requires restarting Ascend Device Plugin.

Table 1 Mapping between virtual instance templates and virtual device types

NPU Type                                    Virtual Instance Template  vNPU Type                 Virtual Device Name (vNPU ID 100 and Physical Processor ID 0 Used as Examples)
------------------------------------------  -------------------------  ------------------------  ------------------------------
Atlas training product (30 or 32 AI Cores)  vir02                      Ascend910-2c              Ascend910-2c-100-0
                                            vir04                      Ascend910-4c              Ascend910-4c-100-0
                                            vir08                      Ascend910-8c              Ascend910-8c-100-0
                                            vir16                      Ascend910-16c             Ascend910-16c-100-0
Atlas inference product (8 AI Cores)        vir01                      Ascend310P-1c             Ascend310P-1c-100-0
                                            vir02                      Ascend310P-2c             Ascend310P-2c-100-0
                                            vir04                      Ascend310P-4c             Ascend310P-4c-100-0
                                            vir02_1c                   Ascend310P-2c.1cpu        Ascend310P-2c.1cpu-100-0
                                            vir04_3c                   Ascend310P-4c.3cpu        Ascend310P-4c.3cpu-100-0
                                            vir04_3c_ndvpp             Ascend310P-4c.3cpu.ndvpp  Ascend310P-4c.3cpu.ndvpp-100-0
                                            vir04_4c_dvpp              Ascend310P-4c.4cpu.dvpp   Ascend310P-4c.4cpu.dvpp-100-0

Prerequisites

  1. Obtain the Ascend-docker-runtime_{version}_linux-{arch}.run package and install the container engine plugin.
  2. Install the components by referring to section Installation and Deployment.

    Modify the Volcano and Ascend Device Plugin parameters related to virtual instances according to the following requirements, and use the corresponding YAML files for installation and deployment.

    • Affinity scenario: Volcano is required.
    • Non-affinity scenario: Volcano is not required. The node reports only the number of devices to Kubernetes.
    1. Ascend Device Plugin parameter modification and startup:

      Parameters for starting a virtual instance:

      Table 2 Ascend Device Plugin startup parameters

      Parameter             Type  Default Value  Description
      --------------------  ----  -------------  ------------------------------
      -volcanoType          Bool  false          Whether to use Volcano for scheduling. For dynamic virtualization scenarios, set this parameter to true.
      -presetVirtualDevice  Bool  true           Whether to enable static virtualization. Only Atlas training product and Atlas inference product support this function, and the value can only be set to true.
                                                 If dynamic virtualization is used, set this parameter to false. Currently, dynamic virtualization of Atlas inference product is supported, and Volcano needs to be enabled, that is, -volcanoType needs to be set to true.

      Using YAML files for startup:
      • Atlas inference product in the Kubernetes cluster (Ascend Device Plugin works independently without Volcano.)
        kubectl apply -f device-plugin-310P-v{version}.yaml
      • Atlas training product in the Kubernetes cluster (Ascend Device Plugin works independently without Volcano and Ascend Operator.)
        kubectl apply -f device-plugin-910-v{version}.yaml
      • Atlas inference product in the Kubernetes cluster (Volcano used; NPU virtualization supported; dynamic virtualization disabled in YAML by default)
        kubectl apply -f device-plugin-310P-volcano-v{version}.yaml
      • Atlas training product in the Kubernetes cluster (Volcano and Ascend Operator used together; NPU virtualization supported; dynamic virtualization disabled in YAML by default)
        kubectl apply -f device-plugin-volcano-v{version}.yaml

      If the Kubernetes cluster uses multiple types of Ascend AI processors, run the corresponding command for each type.

    2. Volcano parameter modification and startup:

      In the Volcano deployment file volcano-v{version}.yaml, presetVirtualDevice must be set to true.

      ...
      data:
        volcano-scheduler.conf: |
          actions: "enqueue, allocate, backfill"
          tiers:
          - plugins:
            - name: priority
            - name: gang
            - name: conformance
            - name: volcano-npu-v7.3.0_linux-aarch64    # 7.3.0 is the MindCluster version. The plugin name varies depending on the version.
          - plugins:
            - name: drf
            - name: predicates
            - name: proportion
            - name: nodeorder
            - name: binpack
          configurations:
           ...
            - name: init-params
              arguments: {"grace-over-time":"900","presetVirtualDevice":"true"}  
      ...
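
As a cross-check of the two steps above, the following fragments sketch how the static virtualization settings might line up when Volcano is used: the Ascend Device Plugin flags from Table 2 on one side and the Volcano init-params argument on the other. Only the flag names and values come from this section. The container name, the device-plugin binary name, and the surrounding layout are assumptions for illustration, so compare them against the YAML files shipped with your version instead of copying them.

```yaml
# Fragment 1 (assumed layout): Ascend Device Plugin container arguments.
# Other startup flags are omitted here.
containers:
- name: device-plugin                 # hypothetical container name
  command: ["device-plugin"]          # assumed binary name
  args:
  - "-volcanoType=true"               # Volcano handles scheduling (affinity scenario)
  - "-presetVirtualDevice=true"       # static virtualization
---
# Fragment 2: matching argument in volcano-scheduler.conf (volcano-v{version}.yaml).
configurations:
- name: init-params
  arguments: {"grace-over-time":"900","presetVirtualDevice":"true"}
```

In the non-affinity scenario, where Volcano is not deployed, only the first fragment applies, and -volcanoType would stay at its default value of false.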

Instructions

  • When creating the YAML file for a training job, modify the following configuration (Atlas training product used as an example).
    Under resources, change the resource name in requests and limits to huawei.com/Ascend910-Y, where Y depends on the vNPU type (for example, 2c for the Ascend910-2c vNPU type). For details, see the vNPU Type column in Table 1.
    ...
              resources:
                requests:
                  huawei.com/Ascend910-Y: 1          #  A maximum of one vNPU can be requested.
                limits:
                  huawei.com/Ascend910-Y: 1          #  Must be the same as the value in requests.
    ...
  • When creating the YAML file for an inference job, modify the following configuration (Atlas inference product used as an example).
    Under resources, change the resource name in requests and limits to huawei.com/Ascend310P-Y, where Y depends on the vNPU type (for example, 1c for the Ascend310P-1c vNPU type). For details, see the vNPU Type column in Table 1.
    ...
              resources:
                requests:
                  huawei.com/Ascend310P-Y: 1          #  A maximum of one vNPU can be requested.
                limits:
                  huawei.com/Ascend310P-Y: 1          #  Must be the same as the value in requests.
    ...
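
For reference, the two fragments above might look as follows once Y is filled in. This is a minimal sketch, not a complete job definition: it assumes the vNPU types Ascend910-2c and Ascend310P-1c from Table 1, and the pod names, container names, and images are hypothetical placeholders.

```yaml
# Training pod requesting one Ascend910-2c vNPU (template vir02 in Table 1).
apiVersion: v1
kind: Pod
metadata:
  name: vnpu-training-demo             # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: training                     # hypothetical name
    image: example/training:latest     # hypothetical image
    resources:
      requests:
        huawei.com/Ascend910-2c: 1     # at most one vNPU per pod
      limits:
        huawei.com/Ascend910-2c: 1     # must match requests
---
# Inference pod requesting one Ascend310P-1c vNPU (template vir01 in Table 1).
apiVersion: v1
kind: Pod
metadata:
  name: vnpu-inference-demo            # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: inference                    # hypothetical name
    image: example/inference:latest    # hypothetical image
    resources:
      requests:
        huawei.com/Ascend310P-1c: 1    # at most one vNPU per pod
      limits:
        huawei.com/Ascend310P-1c: 1    # must match requests
```

Jobs scheduled by Volcano (affinity scenario) need additional scheduler-related fields, such as schedulerName, which are not shown here.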