Dynamic Virtualization

Before using dynamic virtualization, read Table 1.

Usage Notes

Table 1 Scenario description

General instructions
  • The allocated processor information is displayed in the pod annotations. For details, see huawei.com/npu-core and huawei.com/AscendReal in Pod Annotations.
  • Only tasks that use the same virtualization template can be delivered at the same time.
  • During dynamic vNPU allocation, MindCluster scheduling preferentially allocates from the physical NPUs with the least remaining computing power.
  • Each pod can request only one NPU.

Scenarios supported by the feature
  • Multiple replicas are supported, but each pod must use vNPUs.
  • Native Kubernetes mechanisms, such as affinity, are supported.
  • Rescheduling is supported when processor faults or node faults occur. For details, see Recovery of Inference Card Faults and Rescheduling Upon Inference Card Faults.

Scenarios not supported by the feature
  • Different processors cannot be used in the same job.
  • Volcano cannot be uninstalled while jobs are running.
  • In Kubernetes scenarios, vNPUs are created and destroyed automatically. Do not mix these operations with the vNPU operations used in Docker scenarios.
  • Do not configure CPU resources on nodes where dynamic virtualization is to be used.

Atlas inference product (8 AI Cores)
  • When vNPUs are used, a job can request 1, 2, or 4 AI Cores. When physical NPUs are used, the number of requested AI Cores must be 8 or a multiple of 8.
  • Containers are started by the root user. To run an inference job as a common user, see Failure to Run an Inference Service Container as a Common User in Dynamic Virtualization Mode.
  • Dynamic vNPU creation and destruction are supported only on Atlas inference product and must be used with Volcano.
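The AI Core rules above can be checked before a job is submitted. The following Python sketch is illustrative only. `classify_core_request` and `pick_npu` are hypothetical helpers, not MindCluster APIs. `pick_npu` models the "least remaining computing power first" rule as a best-fit choice under two assumptions: remaining computing power is measured in free AI Cores, and AI CPUs and templates are ignored. The real scheduler also considers both.

```python
from typing import Dict, Optional

VNPU_CORE_COUNTS = (1, 2, 4)  # AI Cores a vNPU request may use (Table 1)
CORES_PER_NPU = 8             # AI Cores on one Atlas inference product NPU


def classify_core_request(cores: int) -> str:
    """Return "vnpu" or "physical" for a huawei.com/npu-core value; raise ValueError otherwise."""
    if cores in VNPU_CORE_COUNTS:
        return "vnpu"
    if cores > 0 and cores % CORES_PER_NPU == 0:
        return "physical"
    raise ValueError(
        f"{cores} AI Cores: use 1, 2, or 4 for vNPUs, or a multiple of 8 for physical NPUs"
    )


def pick_npu(free_cores: Dict[int, int], cores: int) -> Optional[int]:
    """Hypothetical best-fit choice: among physical NPUs with at least `cores`
    free AI Cores, return the ID of the one with the fewest free AI Cores
    (ties go to the lower NPU ID). Return None if no NPU fits."""
    fits = [(free, npu_id) for npu_id, free in free_cores.items() if free >= cores]
    return min(fits)[1] if fits else None


print(classify_core_request(2))         # vnpu
print(classify_core_request(16))        # physical
print(pick_npu({0: 8, 1: 3, 2: 2}, 2))  # 2
```

Best-fit keeps fully free NPUs available for larger requests. This is one plausible reading of the rule, not a description of the scheduler's internals.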

Table 2 Mapping between virtual instance templates and virtual device types

NPU type: Atlas inference product (8 AI Cores). In the Virtual Device Name column, vNPU ID 100 and physical processor ID 0 are used as examples.

Virtual Instance Template   vNPU Type                   Virtual Device Name
vir01                       Ascend310P-1c               Ascend310P-1c-100-0
vir02                       Ascend310P-2c               Ascend310P-2c-100-0
vir04                       Ascend310P-4c               Ascend310P-4c-100-0
vir02_1c                    Ascend310P-2c.1cpu          Ascend310P-2c.1cpu-100-0
vir04_3c                    Ascend310P-4c.3cpu          Ascend310P-4c.3cpu-100-0
vir04_3c_ndvpp              Ascend310P-4c.3cpu.ndvpp    Ascend310P-4c.3cpu.ndvpp-100-0
vir04_4c_dvpp               Ascend310P-4c.4cpu.dvpp     Ascend310P-4c.4cpu.dvpp-100-0
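The example device names in Table 2 appear to follow one pattern: the vNPU type, the vNPU ID, and the physical processor ID joined by hyphens. The following is a minimal Python sketch under that assumption. `virtual_device_name` is a hypothetical helper, and the mapping is copied from Table 2.

```python
# Virtual instance template -> vNPU type, copied from Table 2
# (Atlas inference product, 8 AI Cores).
TEMPLATE_TO_VNPU_TYPE = {
    "vir01": "Ascend310P-1c",
    "vir02": "Ascend310P-2c",
    "vir04": "Ascend310P-4c",
    "vir02_1c": "Ascend310P-2c.1cpu",
    "vir04_3c": "Ascend310P-4c.3cpu",
    "vir04_3c_ndvpp": "Ascend310P-4c.3cpu.ndvpp",
    "vir04_4c_dvpp": "Ascend310P-4c.4cpu.dvpp",
}


def virtual_device_name(template: str, vnpu_id: int, physical_id: int) -> str:
    """Hypothetical helper: build a name such as Ascend310P-1c-100-0
    (assumed pattern: <vNPU type>-<vNPU ID>-<physical processor ID>)."""
    vnpu_type = TEMPLATE_TO_VNPU_TYPE[template]  # KeyError for unknown templates
    return f"{vnpu_type}-{vnpu_id}-{physical_id}"


print(virtual_device_name("vir04_3c_ndvpp", 100, 0))  # Ascend310P-4c.3cpu.ndvpp-100-0
```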

Prerequisite

  1. Obtain the Ascend-docker-runtime_{version}_linux-{arch}.run package and install the container engine plugin.
  2. Install the components as described in Installation and Deployment.

    The virtual-instance parameters of Volcano and Ascend Device Plugin must be modified. Modify them as described below, and then install and deploy the components using the corresponding YAML files.

    1. Modify and start Ascend Device Plugin.

      The following startup parameters are related to virtual instances:

      Table 3 Ascend Device Plugin startup parameters

      -volcanoType (type: Bool; default value: false)
      Whether to use Volcano for scheduling. For dynamic virtualization, set this parameter to true.

      -presetVirtualDevice (type: Bool; default value: true)
      Whether to enable static virtualization. Static virtualization is supported only on Atlas training product and Atlas inference product, and requires this parameter to be set to true.
      To use dynamic virtualization, set this parameter to false. Dynamic virtualization is currently supported only on Atlas inference product and requires Volcano to be enabled at the same time (-volcanoType=true).

      Starting with a YAML file:

      For Atlas inference product in a Kubernetes cluster, set presetVirtualDevice to false in the device-plugin-310P-volcano-v{version} YAML file. This file works with Volcano to support NPU virtualization, and dynamic virtualization is disabled in it by default.

      ...
      args: [ "device-plugin  -useAscendDocker=true -volcanoType=true -presetVirtualDevice=false
                 -logFile=/var/log/mindx-dl/devicePlugin/devicePlugin.log -logLevel=0" ]
      ...
    2. Modify and start Volcano.

      In the Volcano deployment file volcano-v{version}.yaml, set presetVirtualDevice to false.

      ...
      data:
        volcano-scheduler.conf: |
          actions: "enqueue, allocate, backfill"
          tiers:
          - plugins:
            - name: priority
            - name: gang
            - name: conformance
            - name: volcano-npu-v{version}_linux-aarch64   
          - plugins:
            - name: drf
            - name: predicates
            - name: proportion
            - name: nodeorder
            - name: binpack
          configurations:
            ...
            - name: init-params
              arguments: {"grace-over-time":"900","presetVirtualDevice":"false"}  # Enables dynamic virtualization. Set presetVirtualDevice to false.
      ...
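Before deploying, you may want to confirm that both components are configured consistently for dynamic virtualization. The following Python sketch uses hypothetical helpers that are not part of MindCluster. It checks two things:

  • The Ascend Device Plugin args string contains -volcanoType=true and -presetVirtualDevice=false.
  • The Volcano init-params arguments contain "presetVirtualDevice":"false".

It recognizes only flags written as -name=value, as in the example above. A bare boolean flag such as -volcanoType would not be detected.

```python
import json
import shlex
from typing import Dict


def parse_plugin_flags(args: str) -> Dict[str, str]:
    """Collect -name=value flags from an Ascend Device Plugin args string."""
    flags = {}
    for token in shlex.split(args):
        if token.startswith("-") and "=" in token:
            name, value = token.lstrip("-").split("=", 1)
            flags[name] = value
    return flags


def dynamic_vnpu_ready(plugin_args: str, init_params: str) -> bool:
    """True if both Ascend Device Plugin and Volcano are set up for dynamic virtualization."""
    flags = parse_plugin_flags(plugin_args)
    params = json.loads(init_params)  # the init-params flow mapping is also valid JSON
    return (
        flags.get("volcanoType") == "true"
        and flags.get("presetVirtualDevice") == "false"
        and params.get("presetVirtualDevice") == "false"
    )


# Values copied from the two YAML snippets above.
plugin_args = ("device-plugin  -useAscendDocker=true -volcanoType=true -presetVirtualDevice=false "
               "-logFile=/var/log/mindx-dl/devicePlugin/devicePlugin.log -logLevel=0")
init_params = '{"grace-over-time":"900","presetVirtualDevice":"false"}'
print(dynamic_vnpu_ready(plugin_args, init_params))  # True
```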

Instructions

When creating the YAML file for an inference job, modify the following configuration. Atlas inference product is used as an example.

To allocate AI Cores, set the resource type in both requests and limits under resources to huawei.com/npu-core. The following uses a Deployment as an example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-with-volcano
  labels:
    app: tf
  namespace: vnpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf
  template:
    metadata:
      labels:
        app: tf
        ring-controller.atlas: ascend-310P  # See Table 4.
        fault-scheduling: "grace"           # Label used for rescheduling.
        vnpu-dvpp: "yes"                    # See Table 4.
        vnpu-level: "low"                   # See Table 4.
    spec:
      schedulerName: volcano  # MindCluster Volcano is required.
      nodeSelector:
        host-arch: huawei-arm
      containers:
        - image: ubuntu:22.04   # Example image
          imagePullPolicy: IfNotPresent
          name: tf
          command:
          - "/bin/bash"
          - "-c"
          args: ["Your own run script"]     # Replace with your inference command.
          resources:
            requests:
              huawei.com/npu-core: 1        # 1 AI Core: the vir01 template is used to dynamically virtualize NPUs.
            limits:
              huawei.com/npu-core: 1        # Must be the same as the value in requests.
...
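If job manifests are generated by a program rather than written by hand, the same settings can be assembled in code. The following Python sketch rests on several assumptions:

  • `vnpu_deployment` is a hypothetical helper.
  • The image is the example image from the YAML above.
  • Omitting the vnpu-dvpp or vnpu-level label is assumed to be equivalent to using its default value from Table 4.

The sketch keeps requests and limits identical, as required. Because kubectl accepts JSON manifests, the printed output can be piped to `kubectl apply -f -`.

```python
import json
from typing import Optional


def vnpu_deployment(name: str, namespace: str, cores: int,
                    dvpp: Optional[str] = None, level: Optional[str] = None) -> dict:
    """Hypothetical helper: build a Deployment that requests `cores` AI Cores
    through huawei.com/npu-core. The vnpu-dvpp / vnpu-level labels are written
    only when given (omission assumed to mean the Table 4 defaults)."""
    pod_labels = {"app": name, "ring-controller.atlas": "ascend-310P"}
    if dvpp is not None:
        pod_labels["vnpu-dvpp"] = dvpp
    if level is not None:
        pod_labels["vnpu-level"] = level
    npu = {"huawei.com/npu-core": cores}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": {"app": name}},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": pod_labels},
                "spec": {
                    "schedulerName": "volcano",  # MindCluster Volcano is required
                    "containers": [{
                        "name": name,
                        "image": "ubuntu:22.04",  # example image
                        # requests and limits carry the same huawei.com/npu-core value
                        "resources": {"requests": dict(npu), "limits": dict(npu)},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(vnpu_deployment("vnpu-demo", "vnpu", 4, dvpp="yes", level="high"), indent=2))
```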
Table 4 Virtual instance labels in the YAML file

vnpu-level
  • low: Low configuration. This is the default value. The virtual instance template with the lowest configuration is selected.
  • high: Performance first. If cluster resources are sufficient, the virtual instance template with the highest configuration is selected. If most cluster resources are in use (for example, most physical NPUs are occupied and only a few AI Cores remain on each), a lower-configuration template with the same number of AI Cores is used instead. For details, see Table 5.

vnpu-dvpp
  • yes: The pod uses DVPP.
  • no: The pod does not use DVPP.
  • null: This is the default value. Whether the pod uses DVPP is not considered.

ring-controller.atlas
  • ascend-310P: Indicates that Atlas inference product is used.

Table 5 lists the template selected for each combination of vnpu-level and vnpu-dvpp.
  • Degrade in the table means that when enough AI Cores are available but other resources (such as AI CPUs) are insufficient, another template with the same number of AI Cores but different other resources is selected. For example, if only one processor is left and it has two AI Cores and one AI CPU available, the vir02 template is degraded to vir02_1c.
  • The values in the Template column correspond to the virtual instance templates of Atlas inference product listed in Virtual Instance Templates.
  • In the vnpu-level column, other indicates any value other than low and high.
  • If entire processors are used (8 AI Cores or a multiple of 8), vnpu-dvpp and vnpu-level can be set to any value.
Table 5 DVPP and levels

Product model: Atlas inference product (8 AI Cores)

Requested AI Cores     vnpu-dvpp   vnpu-level   Degrade (Y/N)   Template
1                      null        Any value    -               vir01
2                      null        low/other    -               vir02_1c
2                      null        high         No              vir02
2                      null        high         Yes             vir02_1c
4                      yes         low/other    -               vir04_4c_dvpp
4                      no          low/other    -               vir04_3c_ndvpp
4                      null        low/other    -               vir04_3c
4                      yes         high         -               vir04_4c_dvpp
4                      no          high         -               vir04_3c_ndvpp
4                      null        high         No              vir04
4                      null        high         Yes             vir04_3c
8 or a multiple of 8   Any value   Any value    -               -

Notes:

For Atlas inference product (8 AI Cores), when entire physical NPUs are allocated, the number of requested AI Cores must be 8 or a multiple of 8.

For vNPUs, the value of vnpu-dvpp must match the value listed in the preceding table. Otherwise, the job cannot be delivered.
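The selection rules in Table 5 can be expressed as a lookup table, including the degrade rule and the vnpu-dvpp restriction in the notes. The following Python sketch is not the Volcano plugin's implementation. `select_template` is a hypothetical helper. Its `degrade` argument stands in for the scheduler's own decision that the preferred template's other resources (such as AI CPUs) are insufficient.

```python
from typing import Optional

# (requested AI Cores, vnpu-dvpp) ->
#     (template for low/other, template for high, template for high when degraded)
# Rows copied from Table 5; combinations not listed cannot be delivered.
TABLE_5 = {
    (1, "null"): ("vir01", "vir01", "vir01"),
    (2, "null"): ("vir02_1c", "vir02", "vir02_1c"),
    (4, "yes"): ("vir04_4c_dvpp", "vir04_4c_dvpp", "vir04_4c_dvpp"),
    (4, "no"): ("vir04_3c_ndvpp", "vir04_3c_ndvpp", "vir04_3c_ndvpp"),
    (4, "null"): ("vir04_3c", "vir04", "vir04_3c"),
}


def select_template(cores: int, dvpp: str = "null", level: str = "low",
                    degrade: bool = False) -> Optional[str]:
    """Return the Table 5 template, or None when whole NPUs are requested."""
    if cores > 0 and cores % 8 == 0:
        return None  # entire processors: vnpu-dvpp and vnpu-level are ignored
    try:
        low, high, high_degraded = TABLE_5[(cores, dvpp)]
    except KeyError:
        raise ValueError(f"no template for {cores} AI Cores with vnpu-dvpp={dvpp}; "
                         "the job cannot be delivered") from None
    if level != "high":  # "low" and any other value use the low/other column
        return low
    return high_degraded if degrade else high


print(select_template(2, level="high"))                # vir02
print(select_template(2, level="high", degrade=True))  # vir02_1c
print(select_template(4, dvpp="no"))                   # vir04_3c_ndvpp
```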