Affinity Scheduling Description
Affinity scheduling maximizes the computing power of Ascend AI processors by reducing resource fragments and network congestion.
- Resource fragments
After a job is deployed, the remaining Ascend AI processors should stay concentrated in units with better network connectivity (such as a node, a SuperPoD, or the nodes under a single switch). This prevents later jobs from failing to schedule because the free processors are scattered, even though the total number of free Ascend AI processors is sufficient.
- Network congestion
Ascend AI processors can be interconnected in several modes. The mode depends on the networking of each product and determines the available network bandwidth. Selecting a scheduling policy that matches the interconnection mode of the Ascend AI processors reduces network congestion.
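The resource-fragment problem described above can be illustrated with a small sketch. It compares two hypothetical clusters that have the same total number of free processors but distribute them differently. The node names, the free-processor counts, and the helper `fits_on_single_node` are assumptions made for this illustration and are not part of any Ascend scheduler API.

```python
from typing import Dict


def fits_on_single_node(free_per_node: Dict[str, int], requested: int) -> bool:
    """Return True if at least one node alone has enough free processors."""
    return any(free >= requested for free in free_per_node.values())


# Hypothetical clusters: both have 8 free processors in total.
fragmented = {"node-a": 2, "node-b": 2, "node-c": 2, "node-d": 2}
compact = {"node-a": 8, "node-b": 0, "node-c": 0, "node-d": 0}

requested = 8
for name, cluster in (("fragmented", fragmented), ("compact", compact)):
    total = sum(cluster.values())
    print(name, "total free:", total,
          "fits on one node:", fits_on_single_node(cluster, requested))
```

Both clusters report eight free processors, but only the compact one can host an eight-processor job on a single node.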
Ascend AI Processor-based Affinity Scheduling
Hardware products use three processor interconnection modes. In order of scheduling priority, a job is first scheduled to Ascend AI processors on the same inference card or training card, then to Ascend AI processors interconnected through HCCS, and finally to Ascend AI processors interconnected through PCIe.
Huawei Cache Coherence System (HCCS) is the hardware form of Huawei Collective Communication Library (HCCL) that facilitates high-performance communication between servers in deep learning training scenarios.
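The priority order above (same card, then HCCS, then PCIe) can be sketched as a simple ranking. The `Link` enum, the `pick_preferred` helper, and the placement names are hypothetical and only illustrate the ordering. They do not reflect how the actual scheduler represents placements.

```python
from enum import IntEnum
from typing import List, Tuple


class Link(IntEnum):
    """Interconnection modes, ordered by scheduling preference (lower is better)."""
    SAME_CARD = 0  # processors on the same inference or training card
    HCCS = 1       # processors interconnected through HCCS
    PCIE = 2       # processors interconnected through PCIe


def pick_preferred(candidates: List[Tuple[str, Link]]) -> str:
    """Return the name of the candidate placement with the most preferred link."""
    name, _ = min(candidates, key=lambda candidate: candidate[1])
    return name


# Hypothetical candidate placements for one job.
candidates = [
    ("placement-pcie", Link.PCIE),
    ("placement-hccs", Link.HCCS),
    ("placement-card", Link.SAME_CARD),
]
print(pick_preferred(candidates))
```

Using an ordered enum keeps the preference in one place, so a product that supports only some of the modes can pass in just the candidates it has.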

Different hardware products may use one or more of the three interconnection modes. The following table describes scheduling policies in detail.
| Product | Ascend AI Processor Interconnection Mode | Method to Reduce Network Congestion | Method to Reduce Resource Fragments |
|---|---|---|---|
| Atlas training product | Four Ascend AI processors are interconnected through HCCS, and Ascend AI processors in HCCS rings are interconnected through PCIe. | Allocate the job with four or fewer Ascend AI processors to one HCCS ring. | If the network statuses of two resources are the same, use the one with fewer resource fragments generated after scheduling. |
| Atlas 200T A2 Box16 heterogeneous subrack; Atlas 200I A2 Box16 heterogeneous subrack | Eight Ascend AI processors are interconnected through HCCS, and Ascend AI processors in HCCS rings are interconnected through PCIe. | | If the network statuses of two resources are the same, use the one with fewer resource fragments generated after scheduling. |
| Atlas 900 A3 SuperPoD; A200T A3 Box8 SuperPoD Server; Atlas 800I A3 SuperPoD Server; Atlas 800T A3 SuperPoD Server | Two Ascend AI processors form eight HiAM modules through SIO, and each HiAM module is interconnected through HCCS. | If the number of Ascend AI processors is an even number, they must be scheduled to one HiAM module. | - |
| Atlas 800 inference server (model 3000) (with Atlas 300I inference cards) | Each inference card has four interconnected Ascend AI processors, but inference cards are not interconnected. | If the number of allocated Ascend AI processors is less than 4 and scheduling is performed by inference card, the job must be scheduled to one inference card. | If the network statuses of two resources are the same, use the one with fewer resource fragments generated after scheduling. |
| Atlas 800 inference server (model 3000) (with Atlas 300I Duo inference cards) | Two Ascend AI processors in each inference card are interconnected through HCCS, and inference cards are interconnected through PCIe. | In distributed inference scheduling, a job must be scheduled to the entire Atlas 300I Duo inference card. If the number of Ascend AI processors required by a job is an odd number, the job is preferentially scheduled to the Atlas 300I Duo inference card whose number of remaining Ascend AI processors is 1. | If the network statuses of two resources are the same, use the one with fewer resource fragments generated after scheduling. |
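The policy for the Atlas training product in the table above (keep a job of four or fewer processors inside one HCCS ring, and prefer the option that leaves fewer fragments) can be sketched as a best-fit choice. This is a minimal illustration under stated assumptions: the ring names and the `pick_ring` helper are hypothetical, and "fewer resource fragments" is approximated here as "fewest free processors left in the chosen ring", which may differ from the metric the real scheduler uses.

```python
from typing import Dict, Optional

RING_SIZE = 4  # processors per HCCS ring on the Atlas training product


def pick_ring(free_per_ring: Dict[str, int], requested: int) -> Optional[str]:
    """Pick one HCCS ring for a job that needs `requested` processors.

    Returns None if the job is larger than a ring or no single ring has room.
    Among rings that fit, the one with the fewest leftover free processors is
    chosen (an assumed stand-in for "fewer resource fragments").
    """
    if not 1 <= requested <= RING_SIZE:
        return None
    fitting = {ring: free for ring, free in free_per_ring.items() if free >= requested}
    if not fitting:
        return None
    # Ties are broken by ring name so the result is deterministic.
    return min(fitting, key=lambda ring: (fitting[ring] - requested, ring))


# Hypothetical node state: ring-0 is empty, ring-1 already hosts a 2-processor job.
free = {"ring-0": 4, "ring-1": 2}
print(pick_ring(free, 2))
print(pick_ring(free, 3))
print(pick_ring(free, 5))
```

With this state, a two-processor job lands in the half-used ring, which keeps the empty ring whole for a later four-processor job.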
Node-based Affinity Scheduling
Nodes are connected through the RoCE network or interconnect device + RoCE network. The interconnect device network is preferentially used during job scheduling. The RoCE network uses the
- Products using RoCE connections: Atlas 800T A2 training server, Atlas 800I A2 inference server, A200I A2 Box heterogeneous component, Atlas 200T A2 Box16 heterogeneous subrack, Atlas 200I A2 Box16 heterogeneous subrack, Atlas 800 training server (model 9000), and Atlas 800 training server (model 9010)
- Products that use single-layer RoCE connections: Atlas 800I A2 inference server and A200I A2 Box heterogeneous component
- Products using UnifiedBus + RoCE connections: Atlas 900 A3 SuperPoD

| Interconnection Mode | Ascend AI Processor Interconnection Mode | Scheduling Type | Method to Reduce Network Congestion | Method to Reduce Networking Costs | Method to Reduce Resource Fragments |
|---|---|---|---|---|---|
| RoCE dual-layer interconnection | Global two-layer interconnection through | Switch affinity scheduling 1.0 | | - | If the network statuses of two resources are the same, use the one with fewer resource fragments generated after scheduling. |
| RoCE dual-layer interconnection | Global two-layer interconnection through | Switch affinity scheduling 2.0 | | - | |
| Single-layer RoCE connection | Single-layer connection through leaf | Single-layer switch affinity scheduling | - | Single-layer networking can meet the requirements of parameter plane interconnection, greatly reducing networking costs. | |
| RoCE + interconnect device | Global interconnection through | Affinity scheduling of logical SuperPoDs | A network affinity unit with a high network communication requirement can be obtained based on the job splitting policy. Ensure that each network affinity unit is distributed in the interconnect device network. | - | |
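The logical SuperPoD policy above asks that each network affinity unit stay inside the interconnect device network. One way to picture this is as a packing problem: every unit must fit entirely within one SuperPoD. The sketch below is a greedy heuristic under stated assumptions, not the scheduler's actual algorithm. The unit sizes (counted in nodes), the SuperPoD names, and the `place_units` helper are all hypothetical.

```python
from typing import Dict, Optional


def place_units(unit_sizes: Dict[str, int],
                free_nodes: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Assign every affinity unit to exactly one SuperPoD, or return None.

    Greedy heuristic: largest unit first, each into the SuperPoD with the
    least free capacity that still fits. It can miss a valid packing.
    """
    remaining = dict(free_nodes)  # copy so the caller's state is untouched
    placement: Dict[str, str] = {}
    for unit, size in sorted(unit_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        fitting = [pod for pod, free in remaining.items() if free >= size]
        if not fitting:
            return None  # this unit would have to span SuperPoDs
        pod = min(fitting, key=lambda p: (remaining[p], p))
        remaining[pod] -= size
        placement[unit] = pod
    return placement


# Hypothetical job split into three affinity units, sizes in nodes.
units = {"unit-0": 4, "unit-1": 4, "unit-2": 2}
pods = {"superpod-a": 6, "superpod-b": 4}
print(place_units(units, pods))
```

Returning `None` instead of splitting a unit reflects the requirement that a unit is never spread across the slower RoCE layer. A real scheduler would likely queue the job or try another split at that point.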