Static Virtualization
Constraints
- Only tasks with a single container that uses a single vNPU are supported. Multiple replicas are not supported.
- Volcano cannot be uninstalled while a job is running.
- Number of NPUs that can be requested by each pod: 1
- In static virtualization scenarios, Ascend Device Plugin must be restarted whenever a vNPU is created or destroyed.
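You can check the per-pod limits above before submitting a job. The following Python sketch is a hypothetical helper and is not part of MindCluster. It assumes that the pod manifest has already been loaded into plain dicts (for example, with a YAML parser). It treats any resource named huawei.com/Ascend910-<type> or huawei.com/Ascend310P-<type> as a vNPU. It checks only the container and vNPU counts. The replica, Volcano, and restart constraints are operational and are not covered.

```python
# Hypothetical pre-submission check for the static vNPU constraints (not a MindCluster API).
# Assumption: the pod spec is already parsed into dicts that follow the Kubernetes Pod schema.
VNPU_PREFIXES = ("huawei.com/Ascend910-", "huawei.com/Ascend310P-")


def vnpu_requests(container: dict) -> int:
    """Sum the vNPU quantities requested by one container (whole-NPU resources are ignored)."""
    requests = container.get("resources", {}).get("requests", {})
    return sum(int(v) for k, v in requests.items() if k.startswith(VNPU_PREFIXES))


def check_static_vnpu_pod(pod: dict) -> list[str]:
    """Return the constraint violations found in a pod spec (empty list if none)."""
    containers = pod.get("spec", {}).get("containers", [])
    users = [c for c in containers if vnpu_requests(c) > 0]
    errors = []
    if len(users) > 1:
        errors.append("only one container per task may use a vNPU")
    total = sum(vnpu_requests(c) for c in users)
    if total > 1:
        errors.append(f"pod requests {total} vNPUs; at most 1 is allowed")
    return errors
```

A pod whose single container requests huawei.com/Ascend910-4c: 1 passes with an empty list. Two containers that each request one vNPU produce both errors.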
Table 1 Virtual instance templates and vNPU types

| NPU Type | Virtual Instance Template | vNPU Type | Virtual Device Name (Example: vNPU ID 100, Physical Processor ID 0) |
|---|---|---|---|
| Atlas training product (30 or 32 AI Cores) | vir02 | Ascend910-2c | Ascend910-2c-100-0 |
| | vir04 | Ascend910-4c | Ascend910-4c-100-0 |
| | vir08 | Ascend910-8c | Ascend910-8c-100-0 |
| | vir16 | Ascend910-16c | Ascend910-16c-100-0 |
| Atlas inference product (8 AI Cores) | vir01 | Ascend310P-1c | Ascend310P-1c-100-0 |
| | vir02 | Ascend310P-2c | Ascend310P-2c-100-0 |
| | vir04 | Ascend310P-4c | Ascend310P-4c-100-0 |
| | vir02_1c | Ascend310P-2c.1cpu | Ascend310P-2c.1cpu-100-0 |
| | vir04_3c | Ascend310P-4c.3cpu | Ascend310P-4c.3cpu-100-0 |
| | vir04_3c_ndvpp | Ascend310P-4c.3cpu.ndvpp | Ascend310P-4c.3cpu.ndvpp-100-0 |
| | vir04_4c_dvpp | Ascend310P-4c.4cpu.dvpp | Ascend310P-4c.4cpu.dvpp-100-0 |
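The virtual device names in Table 1 follow a visible pattern: the vNPU type, then the vNPU ID, then the physical processor ID, joined by hyphens. The sketch below encodes that pattern as inferred from the table examples. It is not taken from an Ascend API. The dictionary keys and function names are illustrative assumptions.

```python
# Template -> vNPU type mapping, copied from Table 1. Product keys are illustrative labels.
TEMPLATE_TO_VNPU_TYPE = {
    "Atlas training product": {
        "vir02": "Ascend910-2c",
        "vir04": "Ascend910-4c",
        "vir08": "Ascend910-8c",
        "vir16": "Ascend910-16c",
    },
    "Atlas inference product": {
        "vir01": "Ascend310P-1c",
        "vir02": "Ascend310P-2c",
        "vir04": "Ascend310P-4c",
        "vir02_1c": "Ascend310P-2c.1cpu",
        "vir04_3c": "Ascend310P-4c.3cpu",
        "vir04_3c_ndvpp": "Ascend310P-4c.3cpu.ndvpp",
        "vir04_4c_dvpp": "Ascend310P-4c.4cpu.dvpp",
    },
}


def virtual_device_name(product: str, template: str, vnpu_id: int, phys_id: int) -> str:
    """Build <vNPU type>-<vNPU ID>-<physical processor ID>, e.g. Ascend910-2c-100-0."""
    return f"{TEMPLATE_TO_VNPU_TYPE[product][template]}-{vnpu_id}-{phys_id}"


def parse_virtual_device_name(name: str) -> tuple[str, int, int]:
    """Split a name back into (vNPU type, vNPU ID, physical processor ID).

    rsplit keeps hyphens inside the vNPU type (e.g. Ascend310P-2c.1cpu) intact.
    """
    vnpu_type, vnpu_id, phys_id = name.rsplit("-", 2)
    return vnpu_type, int(vnpu_id), int(phys_id)
```

Splitting from the right works because the vNPU type itself contains a hyphen, while the two trailing IDs never do.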
Prerequisite
- Obtain the Ascend-docker-runtime_{version}_linux-{arch}.run package and install the container engine plugin.
- Install the components as described in section "Installation and Deployment".
The Volcano and Ascend Device Plugin parameters related to virtual instances must be modified. Modify them as described below, then install and deploy the components using the corresponding YAML files.
- Affinity scenario: Volcano is required.
- Non-affinity scenario: Volcano is not required. Ascend Device Plugin only reports the number of devices on the node to Kubernetes.
- Ascend Device Plugin parameter modification and startup:
Parameters for starting a virtual instance:
Table 2 Ascend Device Plugin startup parameters

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| -volcanoType | Bool | false | Whether to use Volcano for scheduling. For dynamic virtualization scenarios, set this parameter to true. |
| -presetVirtualDevice | Bool | true | Whether to enable static virtualization. Static virtualization is supported only on Atlas training product and Atlas inference product, and in that case the value must be true. For dynamic virtualization, set this parameter to false. Dynamic virtualization is currently supported on Atlas inference product and requires Volcano, that is, -volcanoType must be set to true. |
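The dependencies in Table 2 can be read as a small set of rules. The following hypothetical check transcribes them. Static virtualization (presetVirtualDevice=true) applies to the two listed product families. Dynamic virtualization (presetVirtualDevice=false) applies only to Atlas inference product, and only with volcanoType=true, as the table text implies. The function name and product labels are assumptions for illustration.

```python
# Hypothetical consistency check for the Ascend Device Plugin flags in Table 2.
STATIC_PRODUCTS = ("Atlas training product", "Atlas inference product")


def check_device_plugin_flags(product: str, preset_virtual_device: bool, volcano_type: bool) -> list[str]:
    """Return the rule violations for one product and flag combination (empty if consistent)."""
    errors = []
    if preset_virtual_device:
        # Static virtualization: only the product families listed in the table.
        if product not in STATIC_PRODUCTS:
            errors.append("static virtualization is only supported on Atlas training/inference products")
    else:
        # Dynamic virtualization: Atlas inference product, and Volcano must be enabled.
        if product != "Atlas inference product":
            errors.append("dynamic virtualization is currently supported only on Atlas inference product")
        if not volcano_type:
            errors.append("dynamic virtualization requires -volcanoType=true")
    return errors
```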
To start Ascend Device Plugin from a YAML file, run the command that matches the cluster:
- Atlas inference product in the Kubernetes cluster (Ascend Device Plugin works independently, without Volcano):
  kubectl apply -f device-plugin-310P-v{version}.yaml
- Atlas training product in the Kubernetes cluster (Ascend Device Plugin works independently, without Volcano and Ascend Operator):
  kubectl apply -f device-plugin-910-v{version}.yaml
- Atlas inference product in the Kubernetes cluster (Volcano used; NPU virtualization supported; dynamic virtualization disabled in the YAML file by default):
  kubectl apply -f device-plugin-310P-volcano-v{version}.yaml
- Atlas training product in the Kubernetes cluster (Volcano and Ascend Operator used together; NPU virtualization supported; dynamic virtualization disabled in the YAML file by default):
  kubectl apply -f device-plugin-volcano-v{version}.yaml
If the Kubernetes cluster uses multiple types of Ascend AI processors, run the corresponding command for each type.
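Which of the four YAML files to use depends on the processor type and on whether Volcano is used. Volcano is required in the affinity scenario and not required in the non-affinity scenario. The following minimal sketch assumes the file names listed above and leaves {version} for you to supply. apply_commands is a hypothetical helper that only builds the command strings. It does not call kubectl.

```python
# File names copied from the list above; {version} is filled in by the caller.
DEVICE_PLUGIN_YAML = {
    ("Atlas inference product", False): "device-plugin-310P-v{version}.yaml",
    ("Atlas training product", False): "device-plugin-910-v{version}.yaml",
    ("Atlas inference product", True): "device-plugin-310P-volcano-v{version}.yaml",
    ("Atlas training product", True): "device-plugin-volcano-v{version}.yaml",
}


def apply_commands(products: set[str], use_volcano: bool, version: str) -> list[str]:
    """Build one kubectl apply command per Ascend processor type present in the cluster."""
    return [
        "kubectl apply -f " + DEVICE_PLUGIN_YAML[(p, use_volcano)].format(version=version)
        for p in sorted(products)  # sorted for a deterministic order
    ]
```

For a mixed cluster in the non-affinity scenario, apply_commands({"Atlas training product", "Atlas inference product"}, False, v) yields the 310P command followed by the 910 command.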
- Volcano parameter modification and startup:
In the Volcano deployment file volcano-v{version}.yaml, presetVirtualDevice must be set to true.
...
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
      - name: volcano-npu-v7.3.0_linux-aarch64   # 7.3.0 indicates the MindCluster version. The name varies depending on the version.
    - plugins:
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
    configurations:
      ...
      - name: init-params
        arguments: {"grace-over-time":"900","presetVirtualDevice":"true"}
...
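In the example above, the init-params arguments value is a one-line map that is also valid JSON. The following sketch assumes your file keeps that form. It sets presetVirtualDevice to "true" and preserves the other keys and their order. It is an illustrative helper, not a MindCluster tool, and it does not parse the surrounding YAML.

```python
import json


def enable_preset_virtual_device(arguments: str) -> str:
    """Return the init-params arguments map with presetVirtualDevice set to "true"."""
    params = json.loads(arguments)  # assumes the map is written as JSON, as in the example
    params["presetVirtualDevice"] = "true"  # the example uses string values, not booleans
    # Compact separators reproduce the one-line style used in volcano-v{version}.yaml.
    return json.dumps(params, separators=(",", ":"))
```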
Instructions
- When creating the YAML file for a training job (Atlas training product used as an example), modify the following configuration. Under resources, change the resource type in requests and limits to huawei.com/Ascend910-Y, where Y depends on the vNPU type (for example, 2c, 4c, 8c, or 16c). For details, see the vNPU Type column in Table 1.
...
resources:
  requests:
    huawei.com/Ascend910-Y: 1   # A maximum of 1 vNPU can be requested.
  limits:
    huawei.com/Ascend910-Y: 1   # Must be the same as the requested number.
...
- When creating the YAML file for an inference job (Atlas inference product used as an example), modify the following configuration. Under resources, change the resource type in requests and limits to huawei.com/Ascend310P-Y, where Y depends on the vNPU type (for example, 1c, 2c.1cpu, or 4c.3cpu.ndvpp). For details, see the vNPU Type column in Table 1.
...
resources:
  requests:
    huawei.com/Ascend310P-Y: 1   # A maximum of 1 vNPU can be requested.
  limits:
    huawei.com/Ascend310P-Y: 1   # Must be the same as the requested number.
...
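The rules for the resources stanza can be expressed as a check on a single container's resources map. The resource must be a vNPU type from Table 1, requests and limits must be identical, and at most one vNPU can be requested. The following is a hypothetical sketch that assumes the stanza is already parsed into a Python dict. The set of valid Y suffixes is copied from the vNPU Type column of Table 1.

```python
# Valid Y suffixes per resource prefix, taken from the vNPU Type column of Table 1.
VALID_Y = {
    "huawei.com/Ascend910-": {"2c", "4c", "8c", "16c"},
    "huawei.com/Ascend310P-": {"1c", "2c", "4c", "2c.1cpu", "4c.3cpu", "4c.3cpu.ndvpp", "4c.4cpu.dvpp"},
}


def check_vnpu_resources(resources: dict) -> list[str]:
    """Check one container's resources map against the vNPU rules (hypothetical helper)."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    errors = []
    for name in sorted(set(requests) | set(limits)):
        prefix = next((p for p in VALID_Y if name.startswith(p)), None)
        if prefix is None:
            continue  # not a vNPU resource, e.g. cpu, memory, or a whole NPU
        if name[len(prefix):] not in VALID_Y[prefix]:
            errors.append(f"{name}: unknown vNPU type")
        if requests.get(name) != limits.get(name):
            errors.append(f"{name}: requests and limits differ")
        elif int(requests[name]) > 1:
            errors.append(f"{name}: at most 1 vNPU can be requested")
    return errors
```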