Affinity Scheduling Policies

Table 1 describes the affinity characteristics and resource utilization rules that apply to the Ascend AI Processors of Atlas training products.

**Table 1** Affinity policies of Atlas training products
Priority	Policy Name	Details
1	HCCS affinity scheduling	Ascend AI Processors are selected from a single HCCS to improve communication performance. If one Ascend AI Processor needs to be allocated, it is selected from a single HCCS; a node with one available Ascend AI Processor is preferred, followed by three, then two, and lastly four. If two Ascend AI Processors need to be allocated, they are selected from a single HCCS; a node with two available Ascend AI Processors is preferred, followed by four, and lastly three. If four Ascend AI Processors need to be allocated, they are selected from a single HCCS; a node with four available Ascend AI Processors is preferred. If eight Ascend AI Processors need to be allocated, all eight Ascend AI Processors of one node are selected.
2	Full priority scheduling	Nodes on which some Ascend AI Processors are already allocated are scheduled preferentially to reduce fragmentation. If one Ascend AI Processor needs to be allocated, choose a node whose resource capacity is eight and whose number of available Ascend AI Processors in the HCCS is one (ideally), three, two, or four, in that order of preference. If two Ascend AI Processors need to be allocated, choose a node whose resource capacity is eight and whose number of available Ascend AI Processors in the HCCS is two (ideally), four, or three, in that order of preference. If four Ascend AI Processors need to be allocated, choose a node whose resource capacity is eight and whose number of available Ascend AI Processors is four. If the number of Ascend AI Processors to be allocated is a multiple of eight, select nodes whose resource capacity is eight and on which no Ascend AI Processor is in use. NOTE: When a distributed job is delivered, the job might not fill a node as the full priority scheduling policy requires. Symptom: For example, in a cluster with two Atlas 800 training servers (model 9000), if a three-processor job, a four-processor job, and a one-processor job are delivered at the same time, the three-processor and four-processor jobs are scheduled to the same node, but the one-processor job is scheduled to the other node even though the first node still has one available Ascend AI Processor. Cause analysis: After Volcano schedules a job, Ascend Device Plugin reports the topology of the scheduled Ascend AI Processors to mindx-dl-deviceinfo-${node_name} only after a delay. As a result, Volcano cannot verify the number of Ascend AI Processors on the node in time, and the job is scheduled to another node.
3	Even number priority scheduling	An HCCS that meets policies 1 and 2 is selected first. Among the HCCSs that qualify, one whose number of remaining Ascend AI Processors is even is then preferred.
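
The preference orders in Table 1 can be illustrated with a short Python sketch. This is not the Volcano scheduler's code. It assumes that each node exposes eight Ascend AI Processors as two HCCS groups of four, and it reads full priority scheduling loosely as "among equally ranked nodes, prefer the one with fewer free processors". The names `PREFERENCE`, `node_rank`, and `pick_node` are hypothetical and exist only for this example. Even number priority scheduling is not modeled.

```python
from typing import Dict, Optional, Tuple

# Preferred number of free processors in one HCCS, best first,
# keyed by the number of processors requested (taken from Table 1).
PREFERENCE = {
    1: [1, 3, 2, 4],
    2: [2, 4, 3],
    4: [4],
}

def node_rank(requested: int, free_per_hccs: Tuple[int, ...]) -> Optional[int]:
    """Return the HCCS affinity rank of a node (0 is best), or None if it cannot serve the request."""
    if requested == 8:
        # An eight-processor request takes the whole node (assumed: 8 processors per node).
        return 0 if sum(free_per_hccs) == 8 else None
    order = PREFERENCE.get(requested)
    if order is None:
        return None
    # Rank every HCCS that can hold the request and keep the best one.
    ranks = [order.index(free) for free in free_per_hccs if free in order]
    return min(ranks) if ranks else None

def pick_node(requested: int, nodes: Dict[str, Tuple[int, ...]]) -> Optional[str]:
    """Pick the best node: lowest affinity rank, then fewest free processors, then name."""
    candidates = []
    for name, free_per_hccs in nodes.items():
        rank = node_rank(requested, free_per_hccs)
        if rank is not None:
            # Assumption: fewer free processors approximates "already allocated" nodes.
            candidates.append((rank, sum(free_per_hccs), name))
    return min(candidates)[2] if candidates else None

# Hypothetical cluster: free processors per HCCS group on each node.
nodes = {"node-a": (4, 4), "node-b": (1, 4), "node-c": (3, 2)}
for requested in (1, 2, 4, 8):
    print(requested, pick_node(requested, nodes))
```

With this sample cluster, a one-processor request lands on the node whose HCCS has exactly one free processor, and a four-processor request prefers the partially used node over the empty one, which matches the fragment-reducing intent of policy 2.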
