Resource Allocation Constraints

Resource Allocation Constraints of Atlas Training Products

Based on the service model design, a training job must meet the following requirements:

  • The number of Ascend AI Processors allocated to a training job cannot be greater than the total number of Ascend AI Processors on a node.
  • If the number of Ascend AI Processors allocated to a training job is less than or equal to 4, all of the required Ascend AI Processors must be scheduled on the same HCCS.
  • If the number of Ascend AI Processors allocated to a training job is 8, all Ascend AI Processors on a node must be allocated to this job.
  • If Ascend AI Processors allocated to a training job are virtual devices (vNPUs), only one Ascend AI Processor can be allocated.
  • Resource allocation must also comply with the other constraints of open source Volcano.
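
The requirements above can be expressed as a small set of checks. The following is a minimal sketch, not the scheduler's actual implementation: the function names, the constants, and the representation of a node as a pair of free-processor counts (one per HCCS) are all hypothetical. It assumes a node with 8 Ascend AI Processors split across two HCCS groups of 4, as implied by the list above and by Table 1. Request sizes of 5 to 7 are not covered by the requirements, so the sketch treats them as not schedulable; that is an assumption.

```python
# Assumptions (not from the product code): 8 processors per node,
# arranged as two HCCS groups of 4 processors each.
PROCESSORS_PER_NODE = 8
PROCESSORS_PER_HCCS = 4


def validate_request(requested, is_vnpu=False):
    """Return a list of violated requirements; an empty list means none."""
    errors = []
    if requested < 1:
        errors.append("a job must request at least one processor")
    if requested > PROCESSORS_PER_NODE:
        errors.append("request exceeds the number of processors on a node")
    if is_vnpu and requested != 1:
        errors.append("a vNPU job can be allocated only one processor")
    return errors


def fits_on_node(requested, free_per_hccs):
    """Check whether a request fits on a node.

    free_per_hccs is a hypothetical pair such as (3, 4): the number of
    free processors on each of the node's two HCCS groups.
    """
    if requested <= PROCESSORS_PER_HCCS:
        # All requested processors must come from a single HCCS.
        return any(free >= requested for free in free_per_hccs)
    if requested == PROCESSORS_PER_NODE:
        # The job needs every processor on the node.
        return all(free == PROCESSORS_PER_HCCS for free in free_per_hccs)
    # Sizes 5 to 7 are not covered by the stated requirements;
    # this sketch treats them as not schedulable (assumption).
    return False
```

For example, `fits_on_node(4, (3, 3))` returns `False` even though six processors are free on the node, because neither HCCS alone can supply four.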

Scenario Description

Table 1 lists the scenarios based on the affinity policies and service model design.

Table 1 Affinity policy scenarios

Number of Ascend AI Processors Allocated to a Job | A                 | B              | C           | D
1                                                 | 1–[0, 1, 2, 3, 4] | 3–[0, 2, 3, 4] | 2–[0, 2, 4] | 4–[0, 4]
2                                                 | 2–[0, 1, 2, 3, 4] | 4–[0, 1, 3, 4] | 3–[0, 1]    | -
4                                                 | 4–[0, 1, 2, 3, 4] | -              | -           | -
8                                                 | 8                 | -              | -           | -

  • Groups A to D are the four HCCS scenarios that meet the selection requirements of Ascend AI Processors, listed in descending order of priority. That is, B, C, or D is selected only when A does not meet the requirements.
  • Each cell shows the remaining Ascend AI Processors of the node when HCCS affinity requirements are met: the value to the left of the dash is the number of remaining Ascend AI Processors on the HCCS that meets the requirements, and the list to the right gives the possible numbers of remaining Ascend AI Processors on the other HCCS. For example, for group A with one Ascend AI Processor allocated, the selected HCCS has one remaining processor and the other HCCS may have 0, 1, 2, 3, or 4 remaining Ascend AI Processors.
  • If the number of Ascend AI Processors allocated to a job is greater than or equal to 8, all the Ascend AI Processors are placed in group A and all of them are occupied.
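
The priority rule in Table 1 can be illustrated with a short lookup. This is a sketch under stated assumptions, not the scheduler's code: `PRIORITY`, `GROUPS`, and `classify_node` are hypothetical names, and a node is again modeled as a pair of free-processor counts, one per HCCS. The sketch uses only the left-hand values of each cell, in the order A to D, and does not check the right-hand lists.

```python
# Left-hand values from Table 1 for each request size,
# in priority order A, B, C, D (hypothetical encoding).
PRIORITY = {
    1: [1, 3, 2, 4],
    2: [2, 4, 3],
    4: [4],
}
GROUPS = "ABCD"


def classify_node(requested, free_per_hccs):
    """Return (group, hccs_index) for the preferred HCCS, or None.

    free_per_hccs is a pair such as (3, 2): the number of free
    processors on each of the node's two HCCS groups.
    """
    if requested == 8:
        # A whole-node job belongs to group A and needs both HCCS fully free.
        return ("A", None) if tuple(free_per_hccs) == (4, 4) else None

    order = PRIORITY.get(requested)
    if order is None:
        return None  # request size not listed in Table 1

    best = None  # (rank, hccs_index) of the best candidate so far
    for index, free in enumerate(free_per_hccs):
        if free in order:
            rank = order.index(free)
            if best is None or rank < best[0]:
                best = (rank, index)

    if best is None:
        return None
    return GROUPS[best[0]], best[1]
```

For example, `classify_node(1, (3, 2))` returns `("B", 0)`: no HCCS has exactly one free processor (group A), so the HCCS with three free processors (group B) is preferred over the one with two (group C). A caller comparing several nodes could then pick the node whose group letter sorts first.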