Dynamic Virtualization

Before using dynamic virtualization, read Table 1.

Usage Notes

**Table 1** Scenario description
Scenario	Description
General instructions	Information about the allocated processor is recorded in the pod annotations. For details, see huawei.com/npu-core and huawei.com/AscendReal in Pod Annotations.
	Only tasks using the same virtualization template can be delivered at the same time.
	During dynamic vNPU allocation, MindCluster scheduling preferentially occupies the physical NPUs with the least remaining computing power.
	Only one NPU can be requested by each pod.
Scenarios supported by the feature	Multiple replicas are supported, but each pod must use vNPUs.
	Kubernetes mechanisms such as affinity are supported.
	Rescheduling is supported in the case of processor faults or node faults. For details, see Recovery of Inference Card Faults and Rescheduling Upon Inference Card Faults.
Scenarios not supported by the feature	Different processors cannot be used in the same job.
	Volcano cannot be uninstalled while jobs are running.
	In Kubernetes scenarios, vNPUs are created and destroyed automatically. Do not mix these operations with those used in Docker scenarios.
	CPU resources must not be configured on nodes where dynamic virtualization is to be used.
Atlas inference product (8 AI Cores)	When vNPUs are used, the number of AI Cores that can be requested by a job is 1, 2, or 4. When the physical NPU is used, that number must be 8 or a multiple of 8.
	The container is started by the root user. If you need to run an inference job as a common user, refer to Failure to Run an Inference Service Container as a Common User in Dynamic Virtualization Mode.
	Dynamic vNPU creation and destruction are supported only on Atlas inference product and must be used together with Volcano.
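The core-count rule and the allocation preference in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not MindCluster's scheduler code: the helper names are hypothetical, and the sketch assumes that "remaining computing power" can be approximated by the number of free AI Cores on each physical NPU.

```python
from typing import Dict, Optional

CORES_PER_NPU = 8             # Atlas inference product (8 AI Cores)
VNPU_CORE_COUNTS = (1, 2, 4)  # valid per-job requests when vNPUs are used


def classify_core_request(cores: int) -> str:
    """Return "vnpu" or "physical" for a huawei.com/npu-core request.

    Raises ValueError for counts that Table 1 does not allow."""
    if cores in VNPU_CORE_COUNTS:
        return "vnpu"
    if cores > 0 and cores % CORES_PER_NPU == 0:
        return "physical"
    raise ValueError(
        f"{cores} AI Cores: request 1, 2, or 4 for a vNPU, "
        f"or a multiple of {CORES_PER_NPU} for whole NPUs"
    )


def pick_physical_npu(free_cores: Dict[int, int], needed: int) -> Optional[int]:
    """Pick the NPU with the fewest free AI Cores that can still fit the request.

    free_cores maps a physical NPU ID to its free AI Cores (hypothetical input).
    Ties are broken by the lower NPU ID. Returns None if no NPU fits."""
    candidates = [(free, npu_id) for npu_id, free in free_cores.items() if free >= needed]
    if not candidates:
        return None
    return min(candidates)[1]


print(classify_core_request(4))                  # vnpu
print(pick_physical_npu({0: 8, 1: 3, 2: 2}, 2))  # 2
```

Preferring the busiest NPU that still fits is a best-fit policy: it tends to leave lightly used NPUs untouched for later requests.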

**Table 2** Mapping between virtual instance templates and virtual device types
NPU Type	Virtual Instance Templates	vNPU Type	Virtual Device Name (vNPU ID 100 and Physical Processor ID 0 Are Used as Examples)
Atlas inference product (8 AI Cores)	vir01	Ascend310P-1c	Ascend310P-1c-100-0
	vir02	Ascend310P-2c	Ascend310P-2c-100-0
	vir04	Ascend310P-4c	Ascend310P-4c-100-0
	vir02_1c	Ascend310P-2c.1cpu	Ascend310P-2c.1cpu-100-0
	vir04_3c	Ascend310P-4c.3cpu	Ascend310P-4c.3cpu-100-0
	vir04_3c_ndvpp	Ascend310P-4c.3cpu.ndvpp	Ascend310P-4c.3cpu.ndvpp-100-0
	vir04_4c_dvpp	Ascend310P-4c.4cpu.dvpp	Ascend310P-4c.4cpu.dvpp-100-0
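In every row of Table 2, the virtual device name is the vNPU type followed by the vNPU ID and the physical processor ID, joined by hyphens. The sketch below encodes the table as a lookup; the naming pattern is inferred from the example names only, and the helper is hypothetical.

```python
# Template -> vNPU type for Atlas inference product (8 AI Cores), from Table 2.
TEMPLATE_TO_VNPU_TYPE = {
    "vir01": "Ascend310P-1c",
    "vir02": "Ascend310P-2c",
    "vir04": "Ascend310P-4c",
    "vir02_1c": "Ascend310P-2c.1cpu",
    "vir04_3c": "Ascend310P-4c.3cpu",
    "vir04_3c_ndvpp": "Ascend310P-4c.3cpu.ndvpp",
    "vir04_4c_dvpp": "Ascend310P-4c.4cpu.dvpp",
}


def virtual_device_name(template: str, vnpu_id: int, phys_id: int) -> str:
    """Build <vNPU type>-<vNPU ID>-<physical processor ID>.

    The pattern is inferred from the Table 2 examples (an assumption)."""
    return f"{TEMPLATE_TO_VNPU_TYPE[template]}-{vnpu_id}-{phys_id}"


print(virtual_device_name("vir04_3c_ndvpp", 100, 0))  # Ascend310P-4c.3cpu.ndvpp-100-0
```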

Prerequisite

You need to obtain the Ascend-docker-runtime_{version}_linux-{arch}.run package and install the container engine plugin.

Install components by referring to section Installation and Deployment.

The Volcano and Ascend Device Plugin parameters related to virtual instances must be modified. Modify them as described below, and then use the corresponding YAML files for installation and deployment.

Ascend Device Plugin parameter modification and startup:

Parameters for starting a virtual instance:

**Table 3** Ascend Device Plugin startup parameters
Parameter	Type	Default Value	Description
-volcanoType	Bool	false	Whether to use Volcano for scheduling. For dynamic virtualization scenarios, set this parameter to true.
-presetVirtualDevice	Bool	true	Whether to enable static virtualization. Static virtualization is supported only on Atlas training product and Atlas inference product, and for static virtualization the value must be true. To use dynamic virtualization, set this parameter to false. Currently, dynamic virtualization is supported only on Atlas inference product and requires Volcano to be enabled at the same time.

Using YAML files for startup:

For Atlas inference product in a Kubernetes cluster, change the value of presetVirtualDevice to false in device-plugin-310P-volcano-v{version}. This file works with Volcano to support NPU virtualization; dynamic virtualization is disabled in it by default.

...
args: [ "device-plugin  -useAscendDocker=true -volcanoType=true -presetVirtualDevice=false
           -logFile=/var/log/mindx-dl/devicePlugin/devicePlugin.log -logLevel=0" ]
...
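Before deploying, it can help to confirm that the edited args carry the two values Table 3 requires for dynamic virtualization (-volcanoType=true and -presetVirtualDevice=false). The following Python sketch parses the args string as it reads after YAML joins the two lines; parse_flags and dynamic_virtualization_enabled are hypothetical helpers, and treating a bare -flag as true is an assumption of this sketch.

```python
import shlex
from typing import Dict

# The args string from the snippet above, as one line (YAML folds the
# line break inside the double-quoted string into a single space).
ARGS = ("device-plugin  -useAscendDocker=true -volcanoType=true -presetVirtualDevice=false "
        "-logFile=/var/log/mindx-dl/devicePlugin/devicePlugin.log -logLevel=0")


def parse_flags(cmdline: str) -> Dict[str, str]:
    """Collect -name=value tokens after the binary name into a dict."""
    flags = {}
    for token in shlex.split(cmdline)[1:]:
        name, sep, value = token.lstrip("-").partition("=")
        flags[name] = value if sep else "true"  # assumption: bare -flag means true
    return flags


def dynamic_virtualization_enabled(flags: Dict[str, str]) -> bool:
    # Defaults from Table 3: volcanoType=false, presetVirtualDevice=true.
    return (flags.get("volcanoType", "false") == "true"
            and flags.get("presetVirtualDevice", "true") == "false")


print(dynamic_virtualization_enabled(parse_flags(ARGS)))  # True
```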

Volcano parameter modification and startup:

In the Volcano deployment file volcano-v{version}.yaml, set presetVirtualDevice to false.

...
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
      - name: volcano-npu-v{version}_linux-aarch64   
    - plugins:
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
    configurations:
     ...
      - name: init-params
        arguments: {"grace-over-time":"900","presetVirtualDevice":"false"}  # Enables dynamic virtualization. Set presetVirtualDevice to false.
...
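The init-params arguments value is a flow mapping that is also valid JSON, so it can be checked with Python's standard json module. A minimal sketch: the helper name is hypothetical, and treating a missing presetVirtualDevice key as static virtualization mirrors the Ascend Device Plugin default (true) in Table 3 rather than documented Volcano behavior.

```python
import json

# The arguments value of init-params in volcano-v{version}.yaml, copied as text
# (without the trailing YAML comment).
INIT_PARAMS = '{"grace-over-time":"900","presetVirtualDevice":"false"}'


def volcano_dynamic_vnpu_enabled(arguments: str) -> bool:
    """Return True if init-params sets presetVirtualDevice to "false".

    The values are strings, so the comparison is against "false", not False.
    A missing key is treated as "true" (an assumption, see the lead-in)."""
    params = json.loads(arguments)
    return params.get("presetVirtualDevice", "true") == "false"


print(volcano_dynamic_vnpu_enabled(INIT_PARAMS))  # True
```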

Instructions

When creating the YAML file for an inference job, modify the following configuration. Atlas inference product is used as an example.

To request AI Cores, set the resource type under both requests and limits in resources to huawei.com/npu-core. The following uses a Deployment as an example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-with-volcano
  labels:
    app: tf
  namespace: vnpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf
  template:
    metadata:
      labels:
        app: tf
        ring-controller.atlas: ascend-310P  # See Table 4.
        fault-scheduling: "grace"           # Label used for rescheduling.
        vnpu-dvpp: "yes"                    # See Table 4.
        vnpu-level: "low"                   # See Table 4.
    spec:
      schedulerName: volcano  # MindCluster Volcano is required.
      nodeSelector:
        host-arch: huawei-arm
      containers:
        - image: ubuntu:22.04   # Example image
          imagePullPolicy: IfNotPresent
          name: tf
          command:
          - "/bin/bash"
          - "-c"
          args: ["Customer's own running script"]
          resources:
            requests:
              huawei.com/npu-core: 1        # Use the vir01 template to dynamically virtualize NPUs.
            limits:
              huawei.com/npu-core: 1        # The value is the same as that in requests.
 ....
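If job manifests are generated by a script, the fields that matter for dynamic virtualization are the Table 4 labels, the volcano scheduler name, and identical huawei.com/npu-core values in requests and limits. The helper below is a hypothetical sketch that builds just those fields as a Python dict; it assumes that leaving vnpu-dvpp unset corresponds to the null value in Table 4.

```python
import json
from typing import Optional


def vnpu_pod_fragment(cores: int, dvpp: Optional[str] = None, level: str = "low") -> dict:
    """Hypothetical helper: the Deployment fields that drive vNPU allocation."""
    labels = {
        "ring-controller.atlas": "ascend-310P",  # Atlas inference product (Table 4)
        "vnpu-level": level,                     # "low" (default) or "high"
    }
    if dvpp is not None:                         # assumption: unset label == null
        labels["vnpu-dvpp"] = dvpp
    amount = {"huawei.com/npu-core": cores}
    return {
        "labels": labels,
        "schedulerName": "volcano",              # MindCluster Volcano is required
        "resources": {"requests": dict(amount), "limits": dict(amount)},
    }


print(json.dumps(vnpu_pod_fragment(1), indent=2))
```

The result is not a complete manifest: the labels belong under spec.template.metadata.labels, schedulerName under spec.template.spec, and resources under the container, as in the Deployment example.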

**Table 4** Virtual instance labels in the YAML file
Key	Value	Description
vnpu-level	low	Low configuration (default). The virtual instance template with the lowest configuration is selected.
vnpu-level	high	Performance first. If cluster resources are sufficient, the virtual instance template with the highest configuration is selected. If most cluster resources are in use (for example, most physical NPUs are occupied and only a few AI Cores remain on each), a lower-configuration template with the same number of AI Cores is used instead. For details, see Table 5.
vnpu-dvpp	yes	This pod uses DVPP.
	no	This pod does not use DVPP.
	null	Default value. No DVPP requirement is specified for this pod.
ring-controller.atlas	ascend-310P	Indicates that Atlas inference product is used.

Table 5 lists the template selected for each combination of vnpu-level and vnpu-dvpp.

Degrade in the table means that when enough AI Cores are available but other resources (such as AI CPUs) are insufficient, another template with the same number of AI Cores but different other resources is selected. For example, if only one processor remains and it has two AI Cores and one AI CPU free, the vir02 template is degraded to vir02_1c.
The values in the Template column correspond to the virtual instance templates of Atlas inference product listed in Virtual Instance Templates.
In the vnpu-level column, Other indicates any value except low and high.
If whole processors are used (8 AI Cores or a multiple of 8), vnpu-dvpp and vnpu-level can be set to any value.

**Table 5** DVPP and levels
Product Model	Number of Requested AI Cores	vnpu-dvpp	vnpu-level	Degrade (Y/N)	Template
Atlas inference product (8 AI Cores)	1	null	Any value	-	vir01
	2	null	low/other	-	vir02_1c
	2	null	high	No	vir02
	2	null	high	Yes	vir02_1c
	4	yes	low/other	-	vir04_4c_dvpp
	4	no	low/other	-	vir04_3c_ndvpp
	4	null	low/other	-	vir04_3c
	4	yes	high	-	vir04_4c_dvpp
	4	no	high	-	vir04_3c_ndvpp
	4	null	high	No	vir04
	4	null	high	Yes	vir04_3c
	8 or a multiple of 8	Any value	Any value	-	-
Note: For Atlas inference product (8 AI Cores), when whole physical NPUs are requested, the number of AI Cores must be 8 or a multiple of 8.

For vNPUs, the value of vnpu-dvpp must match the value listed in the preceding table; otherwise, jobs cannot be delivered.
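The selection rules in Table 5 can be expressed as a small decision function. This is a sketch for reasoning about which template a request will get, not the Volcano plugin's implementation: the function name is hypothetical, None stands for an unset (null) vnpu-dvpp label, and the full_template_fits flag stands in for the Degrade column (enough AI Cores are free, but other resources such as AI CPUs are not).

```python
from typing import Optional


def select_template(cores: int, dvpp: Optional[str], level: str,
                    full_template_fits: bool = True) -> Optional[str]:
    """Return the Table 5 template for a request (hypothetical helper).

    dvpp: "yes", "no", or None (label not set, i.e. null).
    level: "low", "high", or any other value ("other").
    Returns None for whole NPUs and raises ValueError for combinations
    that the table does not allow."""
    if cores > 0 and cores % 8 == 0:
        return None  # whole physical NPUs: labels are ignored, no template
    high = level == "high"
    if cores == 1 and dvpp is None:
        return "vir01"
    if cores == 2 and dvpp is None:
        return "vir02" if high and full_template_fits else "vir02_1c"
    if cores == 4:
        if dvpp == "yes":
            return "vir04_4c_dvpp"
        if dvpp == "no":
            return "vir04_3c_ndvpp"
        if dvpp is None:
            return "vir04" if high and full_template_fits else "vir04_3c"
    raise ValueError(f"unsupported request: cores={cores}, vnpu-dvpp={dvpp}")
```

For example, select_template(2, None, "high", full_template_fits=False) returns "vir02_1c", which matches the degrade case of a processor left with two AI Cores and one AI CPU.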
