Virtualization Rules

Virtual Instance Templates

Table 1 lists the processor models and hardware resources.

Table 1 Specifications

| Processor Model | Number of AI Cores | Memory | Number of AI CPUs | Total VPC Cores | Total VDEC Cores | Total JPEGD Cores | Total PNGD Cores | Total VENC Cores | Total JPEGE Cores |
|---|---|---|---|---|---|---|---|---|---|
| Atlas training product (30 AI Cores) | 30 | 32 GB | 14 | 16 | 16 | 16 | 24 | N/A | 8 |
| Atlas training product (32 AI Cores) | 32 | 32 GB | 14 | 16 | 16 | 16 | 24 | N/A | 8 |
| Atlas inference product (8 AI Cores) | 8 | 24 GB | 7 | 12 | 12 | 16 | N/A | 3 | 8 |
An Ascend AI Processor consists of hardware resources such as AI Cores, AI CPUs, DVPP modules, and memory. Their functions are as follows:

  • AI Cores are mainly used for matrix multiplication and suitable for convolution models.
  • AI CPUs execute CPU operators, including control operators and scalar, vector, and other general-purpose computations.
  • The virtual instance function (creating vNPUs for a specified processor) enables SR-IOV and converts data CPUs into AI CPUs. As a result, the number of AI CPUs in NPU information changes.
  • The digital vision preprocessing (DVPP) module preprocesses videos and images in various formats (for example, by decoding and scaling them), and encodes and outputs the processed videos and images. It consists of the VPC, VDEC, JPEGD, PNGD, VENC, and JPEGE modules.
    • VPC: vision preprocessing core that provides capabilities such as image scaling, color space conversion (CSC), bit number reduction, storage format conversion, and block segmentation and conversion.
    • VDEC: video decoder that decodes videos in specific formats.
    • JPEGD: JPEG image decoder that decodes JPEG images.
    • PNGD: PNG image decoder that decodes PNG images.
    • VENC: video encoder that encodes videos into specific formats.
    • JPEGE: JPEG image encoder that encodes images and outputs them in JPEG format.

Table 2 Virtual instance templates

| Product Model | Virtual Instance Template |
|---|---|
| Atlas training product (30 or 32 AI Cores) | vir02, vir04, vir08, and vir16 |
| Atlas inference product (8 AI Cores) | vir01, vir02, vir04, vir02_1c, vir04_3c, vir04_3c_ndvpp, and vir04_4c_dvpp |

Description (applies to the templates of both product models):

  • The number following vir indicates the number of AI Cores.
  • The number before c indicates the number of AI CPUs.
  • dvpp indicates that all digital vision preprocessing (DVPP) modules (VPC, VDEC, JPEGD, PNGD, VENC, and JPEGE) are included during virtualization.
  • ndvpp indicates that DVPP hardware resources are excluded during virtualization.
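
The naming rules above can be illustrated with a small parser. This is a minimal sketch, not part of any Ascend tool: the names parse_template_name and TemplateName are hypothetical, and the regular expression covers only the name shapes listed in Table 2. When a name has no "c" field or no dvpp/ndvpp suffix, the sketch returns None for that field rather than guessing a default.

```python
import re
from typing import NamedTuple, Optional


class TemplateName(NamedTuple):
    """Fields encoded in a template name (hypothetical helper type)."""
    ai_cores: int
    ai_cpus: Optional[int]  # None when the name carries no "<n>c" field
    dvpp: Optional[str]     # "dvpp", "ndvpp", or None when not specified


# vir<cores>[_<cpus>c][_dvpp|_ndvpp], matching the shapes listed in Table 2.
_TEMPLATE_RE = re.compile(r"^vir(\d+)(?:_(\d+)c)?(?:_(dvpp|ndvpp))?$")


def parse_template_name(name: str) -> TemplateName:
    """Split a template name into AI Cores, AI CPUs, and the DVPP flag."""
    match = _TEMPLATE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized template name: {name!r}")
    cores, cpus, dvpp = match.groups()
    return TemplateName(int(cores), int(cpus) if cpus else None, dvpp)
```

For example, parse_template_name("vir04_3c_ndvpp") yields 4 AI Cores, 3 AI CPUs, and the ndvpp flag. The actual resources of each template are the ones listed in Table 3, not values derived from the name.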
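
Table 3 Resources of each virtual instance template

| Processor Model | Virtual Instance Template | Number of AI Cores | Memory | Number of AI CPUs | Total VPC Cores | Total VDEC Cores | Total JPEGD Cores | Total PNGD Cores | Total VENC Cores | Total JPEGE Cores |
|---|---|---|---|---|---|---|---|---|---|---|
| Atlas training product (30 or 32 AI Cores) | vir16 | 16 | 16 GB | 7 | 8 | 8 | 8 | 12 | N/A | 4 |
| Atlas training product (30 or 32 AI Cores) | vir08 | 8 | 8 GB | 3 | 4 | 4 | 4 | 6 | N/A | 2 |
| Atlas training product (30 or 32 AI Cores) | vir04 | 4 | 4 GB | 1 | 2 | 2 | 2 | 3 | N/A | 1 |
| Atlas training product (30 or 32 AI Cores) | vir02 | 2 | 2 GB | 1 | 1 | 1 | 1 | 1 | N/A | 0 |
| Atlas inference product (8 AI Cores) | vir04 | 4 | 12 GB | 4 | 6 | 6 | 8 | N/A | 2 | 4 |
| Atlas inference product (8 AI Cores) | vir04_3c | 4 | 12 GB | 3 | 6 | 6 | 8 | N/A | 1 | 4 |
| Atlas inference product (8 AI Cores) | vir02 | 2 | 6 GB | 2 | 3 | 3 | 4 | N/A | 1 | 2 |
| Atlas inference product (8 AI Cores) | vir02_1c | 2 | 6 GB | 1 | 3 | 3 | 4 | N/A | 0 | 2 |
| Atlas inference product (8 AI Cores) | vir01 | 1 | 3 GB | 1 | 1 | 1 | 2 | N/A | 0 | 1 |
| Atlas inference product (8 AI Cores) | vir04_3c_ndvpp | 4 | 12 GB | 3 | 0 | 0 | 0 | N/A | 0 | 0 |
| Atlas inference product (8 AI Cores) | vir04_4c_dvpp | 4 | 12 GB | 4 | 12 | 12 | 16 | N/A | 3 | 8 |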

Virtual Instance Specifications

Figure 1 shows the combinations of virtual instances supported by the Atlas inference product. One Ascend AI Processor can be virtualized into a maximum of seven instances. You can virtualize NPU hardware resources based on these combinations.

The Atlas inference product has seven AI CPUs, so vNPU resources cannot be allocated evenly during virtualization (see Figure 1). Evaluate the resources required by the inference applications on the server. For example, the resources are insufficient for eight vir01 instances, but they are sufficient for six vir01 instances plus one vir02_1c instance, or for seven vir01 instances (which leaves one AI Core unused). Before using vNPUs for your inference applications, run vNPU allocation tests to find the optimal allocation policy.
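
The arithmetic behind this example can be sketched as a simple budget check. The sketch below is illustrative only: check_combination is a hypothetical helper, the per-template AI Core and AI CPU counts are copied from Table 3, and the totals come from Table 1. It checks only AI Cores, AI CPUs, and the seven-instance limit; it does not model memory, DVPP resources, or vGroups, so Figure 1 remains the authoritative list of supported combinations.

```python
# (AI Cores, AI CPUs) per template for the Atlas inference product (Table 3).
INFERENCE_TEMPLATES = {
    "vir01": (1, 1),
    "vir02": (2, 2),
    "vir02_1c": (2, 1),
    "vir04": (4, 4),
    "vir04_3c": (4, 3),
    "vir04_3c_ndvpp": (4, 3),
    "vir04_4c_dvpp": (4, 4),
}
TOTAL_AI_CORES = 8  # Table 1
TOTAL_AI_CPUS = 7   # Table 1
MAX_INSTANCES = 7   # one processor yields at most seven instances


def check_combination(templates):
    """Return (fits, spare_ai_cores, spare_ai_cpus) for a list of templates.

    Negative spare values show by how much a budget is exceeded.
    """
    cores = sum(INFERENCE_TEMPLATES[t][0] for t in templates)
    cpus = sum(INFERENCE_TEMPLATES[t][1] for t in templates)
    fits = (
        cores <= TOTAL_AI_CORES
        and cpus <= TOTAL_AI_CPUS
        and len(templates) <= MAX_INSTANCES
    )
    return fits, TOTAL_AI_CORES - cores, TOTAL_AI_CPUS - cpus
```

With these numbers, eight vir01 instances fail the check because they need eight AI CPUs, six vir01 instances plus one vir02_1c use exactly eight AI Cores and seven AI CPUs, and seven vir01 instances fit with one AI Core left over.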

If you want to obtain the inference performance data of typical models running on vNPUs, contact Huawei technical support.

Figure 1 Virtual instances supported by Atlas inference product

The virtual instance combinations supported by the Atlas training product are not listed individually here. Under the virtualization mechanism, an NPU cannot be split further once the AI Cores of the virtual instances created on it add up to the total number of AI Cores on the NPU. For example, an Ascend AI Processor with 30 AI Cores supports only one vir16 instance; the remaining 14 AI Cores can be allocated to one vir08, one vir04, and one vir02 instance. An Ascend AI Processor with 32 AI Cores supports two vir16 instances. An Ascend AI Processor with 30 AI Cores supports a maximum of 15 virtual instances, and one with 32 AI Cores supports a maximum of 16.
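
The AI Core accounting in this paragraph can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that AI Cores are the only limit being checked, as in the paragraph above; the helper names remaining_ai_cores and max_instances are hypothetical, and the per-template core counts are taken from Table 3.

```python
# AI Cores per template for the Atlas training product (Table 3).
TRAINING_TEMPLATE_CORES = {"vir02": 2, "vir04": 4, "vir08": 8, "vir16": 16}


def remaining_ai_cores(total_cores, templates):
    """Return the AI Cores left after creating the given instances.

    Raises ValueError when the instances need more AI Cores than exist.
    """
    used = sum(TRAINING_TEMPLATE_CORES[t] for t in templates)
    if used > total_cores:
        raise ValueError(f"{used} AI Cores requested, only {total_cores} available")
    return total_cores - used


def max_instances(total_cores):
    """Upper bound on instances: fill the NPU with the smallest template."""
    return total_cores // min(TRAINING_TEMPLATE_CORES.values())
```

On a 30-core processor, one vir16 leaves 14 AI Cores, which one vir08, one vir04, and one vir02 consume exactly, while a second vir16 raises an error. Filling the processor with vir02 instances gives the limits of 15 (30 AI Cores) and 16 (32 AI Cores) instances.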

Virtualization Modes

There are two virtualization modes: hardware virtualization and software virtualization.
  • Hardware virtualization splits an NPU into multiple vNPUs. Each vNPU has its own hardware (AI Cores, AI CPUs, and memory), isolated from that of other vNPUs. Once a vNPU is assigned to an AI task, the task runs on this exclusive hardware without affecting other tasks.
  • Software virtualization means that a vNPU is created as a virtual instance. The NPU hardware resources are in a pool. When a virtual instance is allocated to an AI task, it invokes corresponding hardware resources from the NPU resource pool.

The Atlas training product supports only software virtualization. For the Atlas inference product, the vir04, vir04_3c, vir02, vir02_1c, vir04_3c_ndvpp, and vir04_4c_dvpp templates use hardware virtualization, and the vir01 template uses software virtualization.

Virtual instances of the Atlas inference product also involve the concept of a vGroup.

  • A vGroup is a virtual resource group created when an NPU is virtualized based on a specified virtual instance template. Each vGroup contains several AI Cores, AI CPUs, on-chip memory, and DVPP resources.
  • If you use the vir04, vir04_3c, vir02, vir02_1c, vir04_3c_ndvpp, or vir04_4c_dvpp template, the system creates a vGroup that contains the AI Cores and other hardware resources that match the virtual instance template. The vGroup then provides the resources for the vNPU. Figure 2 shows the mapping between virtual instance template combinations and vGroups.
  • The Atlas inference product supports a maximum of four vGroups. A vGroup must contain at least two AI Cores. If you use the vir01 template (one vir01 or two vir01s), the vGroup allocated by the NPU also contains two AI Cores. vNPUs use vGroup resources in time-division multiplexing mode. For example, if two vNPUs are created from two vir01 templates, the vNPUs take turns using the vGroup resources serially (for example, vNPU1 uses the resources for 1 ms, and then vNPU2 uses them for 1 ms).
Figure 2 Mapping between vGroups and virtual instance templates
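
The vGroup rules above suggest a rough way to estimate how many vGroups a combination occupies. The sketch below rests on two assumptions that are inferred from the text rather than stated in it: each vNPU created from a hardware-virtualization template gets its own vGroup, and vir01 vNPUs are packed two per vGroup. The helper vgroups_needed is hypothetical, and Figure 2 remains the authoritative mapping.

```python
import math

MAX_VGROUPS = 4  # limit for the Atlas inference product


def vgroups_needed(templates):
    """Estimate the number of vGroups used by a list of inference templates.

    Assumption (not confirmed by the source): one vGroup per vNPU created
    from a hardware-virtualization template, and up to two vir01 vNPUs
    sharing one two-core vGroup by time-division multiplexing.
    """
    vir01_count = sum(1 for t in templates if t == "vir01")
    dedicated = len(templates) - vir01_count
    return dedicated + math.ceil(vir01_count / 2)


def within_vgroup_limit(templates):
    """Check the estimated vGroup count against the four-vGroup limit."""
    return vgroups_needed(templates) <= MAX_VGROUPS
```

Under these assumptions, six vir01 instances plus one vir02_1c instance occupy four vGroups, which is consistent with the limit of four.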