Dynamic Virtualization
Before using dynamic virtualization, read Table 1.
Table 1 Instructions

| Scenario | Description |
|---|---|
| General instructions | The allocated processor information is displayed in the pod annotation. For details about the pod annotation, see huawei.com/npu-core and huawei.com/AscendReal in Pod Annotations. |
| | Only tasks that use the same virtualization template can be delivered at the same time. |
| | During dynamic vNPU allocation, MindCluster scheduling preferentially occupies the physical NPUs with the least remaining computing power. |
| | Each pod can request only one NPU. |
| Scenarios supported by the feature | Multiple replicas are supported, but each pod must use vNPUs. |
| | Kubernetes mechanisms, such as affinity, are supported. |
| | Rescheduling is supported in the case of processor faults or node faults. For details, see Recovery of Inference Card Faults and Rescheduling Upon Inference Card Faults. |
| Scenarios not supported by the feature | Different processors cannot be used in the same job. |
| | Volcano cannot be uninstalled while a job is running. |
| | vNPUs are automatically created and destroyed in Kubernetes scenarios. Do not mix these operations with those used in Docker scenarios. |
| | The CPU resources must not be configured on the node where dynamic virtualization is to be used. |
| Atlas inference product (8 AI Cores) | When vNPUs are used, a job can request 1, 2, or 4 AI Cores. When the physical NPU is used, the number of requested AI Cores must be 8 or a multiple of 8. |
| | The container is started by the root user. To run an inference job as a common user, see Failure to Run an Inference Service Container as a Common User in Dynamic Virtualization Mode. |
| | Dynamic vNPU creation and destruction are valid only on Atlas inference product and must be used with Volcano. |
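
The AI Core rule for Atlas inference product (8 AI Cores) in Table 1 can be checked before a job is submitted. The following is a minimal sketch, not part of MindCluster; the function name `classify_core_request` and its return values are hypothetical and used only for illustration.

```python
def classify_core_request(cores: int, ai_cores_per_npu: int = 8) -> str:
    """Classify a huawei.com/npu-core request according to Table 1.

    Returns "vnpu" for 1, 2, or 4 AI Cores and "physical" for 8 or a
    multiple of 8. Any other value is rejected. The function name and the
    return strings are illustrative assumptions.
    """
    if cores in (1, 2, 4):
        return "vnpu"
    if cores > 0 and cores % ai_cores_per_npu == 0:
        return "physical"
    raise ValueError(
        f"{cores} AI Cores cannot be requested: use 1, 2, or 4 for vNPUs, "
        f"or a multiple of {ai_cores_per_npu} for physical NPUs"
    )


if __name__ == "__main__":
    for value in (1, 4, 8, 16):
        print(value, classify_core_request(value))
```

A request such as 3 or 6 AI Cores raises `ValueError` in this sketch, which mirrors the restriction in the table.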

Table 2 Virtual Instance Templates

| NPU Type | Virtual Instance Template | vNPU Type | Virtual Device Name (vNPU ID 100 and Physical Processor ID 0 Are Used as Examples) |
|---|---|---|---|
| Atlas inference product (8 AI Cores) | vir01 | Ascend310P-1c | Ascend310P-1c-100-0 |
| | vir02 | Ascend310P-2c | Ascend310P-2c-100-0 |
| | vir04 | Ascend310P-4c | Ascend310P-4c-100-0 |
| | vir02_1c | Ascend310P-2c.1cpu | Ascend310P-2c.1cpu-100-0 |
| | vir04_3c | Ascend310P-4c.3cpu | Ascend310P-4c.3cpu-100-0 |
| | vir04_3c_ndvpp | Ascend310P-4c.3cpu.ndvpp | Ascend310P-4c.3cpu.ndvpp-100-0 |
| | vir04_4c_dvpp | Ascend310P-4c.4cpu.dvpp | Ascend310P-4c.4cpu.dvpp-100-0 |
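
The example device names in the table appear to follow the pattern vNPU type, vNPU ID, physical processor ID, joined by hyphens. The sketch below reproduces that pattern. The pattern is inferred from the example names only, and the helper `virtual_device_name` is hypothetical, not a MindCluster API.

```python
# Template-to-vNPU-type mapping copied from the virtual instance template table.
VNPU_TYPES = {
    "vir01": "Ascend310P-1c",
    "vir02": "Ascend310P-2c",
    "vir04": "Ascend310P-4c",
    "vir02_1c": "Ascend310P-2c.1cpu",
    "vir04_3c": "Ascend310P-4c.3cpu",
    "vir04_3c_ndvpp": "Ascend310P-4c.3cpu.ndvpp",
    "vir04_4c_dvpp": "Ascend310P-4c.4cpu.dvpp",
}


def virtual_device_name(template: str, vnpu_id: int, physical_id: int) -> str:
    """Build a device name as <vNPU type>-<vNPU ID>-<physical processor ID>.

    The naming pattern is an assumption inferred from the table examples.
    """
    return f"{VNPU_TYPES[template]}-{vnpu_id}-{physical_id}"


if __name__ == "__main__":
    print(virtual_device_name("vir02_1c", 100, 0))
```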
Prerequisites
- Obtain the Ascend-docker-runtime_{version}_linux-{arch}.run package and install the container engine plugin.
- Install components by referring to section Installation and Deployment.
Parameters of Volcano and Ascend Device Plugin involved in virtual instances need to be modified. Modify the parameters based on the following requirements and use the corresponding YAML files for installation and deployment.
- Ascend Device Plugin parameter modification and startup.
Parameters for starting a virtual instance:

Table 3 Ascend Device Plugin startup parameters

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| -volcanoType | Bool | false | Whether to use Volcano for scheduling. For dynamic virtualization scenarios, set this parameter to true. |
| -presetVirtualDevice | Bool | true | Whether to enable static virtualization. Static virtualization is supported only by Atlas training product and Atlas inference product, and requires the value true. For dynamic virtualization, set this parameter to false. Dynamic virtualization is currently supported for Atlas inference product and requires Volcano to be enabled at the same time. |

Using YAML files for startup:
For Atlas inference product in a Kubernetes cluster, change the value of presetVirtualDevice to false in device-plugin-310P-volcano-v{version}. This file works with Volcano to support NPU virtualization, and dynamic virtualization is disabled in it by default.

    ...
    args: [ "device-plugin -useAscendDocker=true -volcanoType=true -presetVirtualDevice=false -logFile=/var/log/mindx-dl/devicePlugin/devicePlugin.log -logLevel=0" ]
    ...

- Volcano parameter modification and startup:
In the Volcano deployment file volcano-v{version}.yaml, set presetVirtualDevice to false.

    ...
    data:
      volcano-scheduler.conf: |
        actions: "enqueue, allocate, backfill"
        tiers:
        - plugins:
          - name: priority
          - name: gang
          - name: conformance
          - name: volcano-npu-v{version}_linux-aarch64
        - plugins:
          - name: drf
          - name: predicates
          - name: proportion
          - name: nodeorder
          - name: binpack
        configurations:
          ...
          - name: init-params
            arguments: {"grace-over-time":"900","presetVirtualDevice":"false"}  # Enables dynamic virtualization. Set presetVirtualDevice to false.
    ...
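
Both components must agree on the setting: Ascend Device Plugin needs -volcanoType=true and -presetVirtualDevice=false, and the Volcano init-params need "presetVirtualDevice":"false". The following sketch checks the two strings against each other using only the Python standard library. The helper names are hypothetical, and the sketch assumes that you have already extracted the args string and the init-params arguments from the two YAML files.

```python
import json
import shlex


def parse_plugin_flags(command: str) -> dict:
    """Parse '-key=value' tokens from the device-plugin command string."""
    flags = {}
    for token in shlex.split(command)[1:]:  # skip the binary name
        key, _, value = token.lstrip("-").partition("=")
        flags[key] = value
    return flags


def dynamic_virtualization_enabled(plugin_command: str, init_params: str) -> bool:
    """Return True only if both components are set for dynamic virtualization.

    A missing flag is treated as not enabled, which matches the documented
    defaults (volcanoType=false, presetVirtualDevice=true).
    """
    flags = parse_plugin_flags(plugin_command)
    params = json.loads(init_params)
    return (
        flags.get("volcanoType") == "true"
        and flags.get("presetVirtualDevice") == "false"
        and params.get("presetVirtualDevice") == "false"
    )


if __name__ == "__main__":
    command = (
        "device-plugin -useAscendDocker=true -volcanoType=true "
        "-presetVirtualDevice=false "
        "-logFile=/var/log/mindx-dl/devicePlugin/devicePlugin.log -logLevel=0"
    )
    arguments = '{"grace-over-time":"900","presetVirtualDevice":"false"}'
    print(dynamic_virtualization_enabled(command, arguments))
```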
Instructions
When you create the YAML file for an inference job, configure it as follows (Atlas inference product is used as an example).

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deploy-with-volcano
      labels:
        app: tf
      namespace: vnpu
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: tf
      template:
        metadata:
          labels:
            app: tf
            ring-controller.atlas: ascend-310P   # See Table 4.
            fault-scheduling: "grace"            # Label used for rescheduling.
            vnpu-dvpp: "yes"                     # See Table 4.
            vnpu-level: "low"                    # See Table 4.
        spec:
          schedulerName: volcano                 # MindCluster Volcano is required.
          nodeSelector:
            host-arch: huawei-arm
          containers:
          - image: ubuntu:22.04                  # Example image
            imagePullPolicy: IfNotPresent
            name: tf
            command:
            - "/bin/bash"
            - "-c"
            args: ["Customer's own running script"]
            resources:
              requests:
                huawei.com/npu-core: 1           # Use the vir01 template to dynamically virtualize NPUs.
              limits:
                huawei.com/npu-core: 1           # The value is the same as that in requests.
    ....
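
The constraints noted in the comments of the YAML file (Volcano as the scheduler, identical huawei.com/npu-core values in requests and limits) can be checked on the pod template before delivery. The following is a minimal sketch that operates on the template as a plain Python dictionary. The function `check_pod_template` is hypothetical, and treating a missing ring-controller.atlas label as a problem is an assumption based on the example, not a documented rule.

```python
NPU_CORE = "huawei.com/npu-core"


def check_pod_template(template: dict) -> list:
    """Return a list of problems found in a pod template (empty if none)."""
    problems = []
    spec = template.get("spec", {})
    labels = template.get("metadata", {}).get("labels", {})

    if spec.get("schedulerName") != "volcano":
        problems.append("schedulerName must be volcano")
    # Assumption: the label is expected on every pod that uses vNPUs.
    if labels.get("ring-controller.atlas") != "ascend-310P":
        problems.append("label ring-controller.atlas should be ascend-310P")

    for container in spec.get("containers", []):
        resources = container.get("resources", {})
        requested = resources.get("requests", {}).get(NPU_CORE)
        limit = resources.get("limits", {}).get(NPU_CORE)
        if requested is None or requested != limit:
            name = container.get("name", "<unnamed>")
            problems.append(f"{name}: {NPU_CORE} requests and limits must be equal")
    return problems


if __name__ == "__main__":
    template = {
        "metadata": {"labels": {"app": "tf", "ring-controller.atlas": "ascend-310P"}},
        "spec": {
            "schedulerName": "volcano",
            "containers": [
                {
                    "name": "tf",
                    "resources": {"requests": {NPU_CORE: 1}, "limits": {NPU_CORE: 1}},
                }
            ],
        },
    }
    print(check_pod_template(template))
```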

Table 4 Pod labels for dynamic virtualization

| Key | Value | Description |
|---|---|---|
| vnpu-level | low | Low configuration. This is the default value. The virtual instance template with the minimum configuration is selected. |
| | high | Performance takes priority. If cluster resources are sufficient, the virtual instance template with the highest configuration is selected. If most cluster resources are in use, for example, most physical NPUs are occupied and only a few AI Cores are left on each physical NPU, a template with the same number of AI Cores but a lower configuration is used. For details, see Table 5. |
| vnpu-dvpp | yes | This pod uses DVPP. |
| | no | This pod does not use DVPP. |
| | null | This is the default value. Whether DVPP is used does not matter. |
| ring-controller.atlas | ascend-310P | Indicates that Atlas inference product is used. |

- Degrade in Table 5 indicates that when the number of AI Cores meets the requirement but other resources (such as AI CPUs) are insufficient, another template with the same number of AI Cores but different other resources is selected. For example, if the only remaining processor has two AI Cores and one AI CPU left, the vir02 template is degraded to vir02_1c.
- The values listed under Template correspond to those listed under Virtual Instance Template for Atlas inference product in Virtual Instance Templates.
- In the vnpu-level column of the table, other indicates any value except low and high.
- If an entire processor (8 cores or a multiple of 8 cores) is used, vnpu-dvpp and vnpu-level can be set to any value.

Table 5 Virtual instance template selection

| Product Model | Number of Requested AI Cores | vnpu-dvpp | vnpu-level | Degrade (Y/N) | Template |
|---|---|---|---|---|---|
| Atlas inference product (8 AI Cores) | 1 | null | Any value | - | vir01 |
| | 2 | null | low/other | - | vir02_1c |
| | 2 | null | high | No | vir02 |
| | 2 | null | high | Yes | vir02_1c |
| | 4 | yes | low/other | - | vir04_4c_dvpp |
| | 4 | no | low/other | - | vir04_3c_ndvpp |
| | 4 | null | low/other | - | vir04_3c |
| | 4 | yes | high | - | vir04_4c_dvpp |
| | 4 | no | high | - | vir04_3c_ndvpp |
| | 4 | null | high | No | vir04 |
| | 4 | null | high | Yes | vir04_3c |
| | 8 or a multiple of 8 | Any value | Any value | - | - |

Note: When entire processors of Atlas inference product (8 AI Cores) are requested, the number of AI Cores to be allocated must be 8 or a multiple of 8.

In the preceding table, for vNPUs, the value of vnpu-dvpp must match the value listed in the table. Otherwise, jobs cannot be delivered.
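
The selection rules in the preceding table can be written as a small lookup function. The following sketch only restates the table for illustration; it is not the Volcano plugin's implementation. The function `select_template`, its parameters, and the use of a `degraded` flag to stand for the Degrade column are hypothetical.

```python
from typing import Optional


def select_template(cores: int, dvpp: str = "null", level: str = "low",
                    degraded: bool = False) -> Optional[str]:
    """Return the virtual instance template for a request, per the table.

    dvpp is the vnpu-dvpp label ("yes", "no", or "null"); level is the
    vnpu-level label, where anything other than "high" counts as low/other.
    Returns None when entire processors are requested (no template applies).
    Raises ValueError for combinations that the table does not list.
    """
    high = level == "high"
    if cores in (1, 2):
        if dvpp != "null":
            raise ValueError("vnpu-dvpp must be null for 1 or 2 AI Cores")
        if cores == 1:
            return "vir01"
        return "vir02" if high and not degraded else "vir02_1c"
    if cores == 4:
        if dvpp == "yes":
            return "vir04_4c_dvpp"
        if dvpp == "no":
            return "vir04_3c_ndvpp"
        if dvpp == "null":
            return "vir04" if high and not degraded else "vir04_3c"
        raise ValueError(f"unsupported vnpu-dvpp value: {dvpp}")
    if cores > 0 and cores % 8 == 0:
        return None  # Entire processors: labels can be any value.
    raise ValueError(f"unsupported number of AI Cores: {cores}")


if __name__ == "__main__":
    print(select_template(2, level="high"))
    print(select_template(2, level="high", degraded=True))
    print(select_template(4, dvpp="no"))
```

In this sketch, a vnpu-dvpp value that the table does not list for the requested number of AI Cores raises `ValueError`, which corresponds to a job that cannot be delivered.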