(Optional) Configuring Instance-Level Affinity Scheduling
For the Atlas 800I A3 SuperPoD Server, MindCluster cluster scheduling components support job-level affinity scheduling for MindIE Motor inference jobs. That is, the components schedule MindIE Server instances to the same physical SuperPoD whenever possible, which makes full use of the HCCS network and speeds up network communication between instances.
For details about the affinity scheduling rules of logical SuperPoDs, see UnifiedBus Interconnect Device Network Description.
Figure 1 UnifiedBus interconnect device network


Configuring Instance-Level Affinity Scheduling
After the image is prepared, if MindIE Motor inference jobs require instance-level affinity scheduling policies, perform all of the following operations when you reach Preparing a Job YAML File.
- Specify the sp-block field in the job YAML file. The value of sp-block must be the same as the number of processors required by the job to ensure that the entire job can be scheduled to a physical SuperPoD.
- Ensure that there are reserved nodes in a physical SuperPoD for MindIE Server instance scheduling.
- If sp-fit is set to idlest, MindIE Server instances are preferentially scheduled to a physical SuperPoD that is more idle.
- If podAffinity is set, MindIE Server instances are preferentially scheduled to a physical SuperPoD that hosts more affinity pods.
YAML example:
apiVersion: mindxdl.gitee.com/v1
kind: AscendJob
metadata:
  name: mindie-server-0
  namespace: mindie
  labels:
    framework: pytorch
    app: mindie-ms-server # Role of MindIE Motor in the AscendJob, which cannot be changed.
    jobID: mindie-ms-test # Unique ID of the MindIE Motor job in the cluster. Change the ID as required.
    ring-controller.atlas: ascend-910b
    fault-scheduling: force
  annotations:
    sp-block: "16" # The cluster scheduling components divide logical SuperPoDs from physical SuperPoDs based on the splitting policy for affinity scheduling of training jobs.
    sp-fit: "idlest" # SuperPoD scheduling policy. For details, see YAML Parameters.
spec:
  schedulerName: volcano # Scheduler selected when Ascend Operator enables gang scheduling.
  runPolicy:
    schedulingPolicy: # This field takes effect only when Ascend Operator enables gang scheduling and Volcano is used as the scheduler.
      minAvailable: 2 # Total number of running job replicas
      queue: default
  successPolicy: AllWorkers
  replicaSpecs:
    Master:
      restartPolicy: Never
      template:
        metadata:
          labels:
            ring-controller.atlas: ascend-910b
        spec:
          affinity:
            podAffinity: # Scheduling to the physical SuperPoD with more affinity pods
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100 # Cannot be changed
                  podAffinityTerm:
                    labelSelector:
                      matchLabels:
                        jobID: mindie-ms-test # Label required by affinity pods
                    topologyKey: kubernetes.io/hostname
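The sp-block rule stated in the operations list (its value must equal the number of processors required by the job) can be checked before the job is submitted. The following is a minimal Python sketch of such a check, not part of MindCluster: the function name check_sp_block and the required_processors argument are hypothetical, and the job is assumed to be already loaded into a dictionary with the same layout as the YAML example. How the required processor count is obtained is left to the caller.

```python
def check_sp_block(job: dict, required_processors: int) -> list:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: `job` mirrors the AscendJob YAML as a dictionary,
    and `required_processors` is supplied by the caller.
    """
    problems = []
    annotations = job.get("metadata", {}).get("annotations", {})

    sp_block = annotations.get("sp-block")
    if sp_block is None:
        problems.append("sp-block annotation is missing")
        return problems

    try:
        value = int(sp_block)
    except (TypeError, ValueError):
        problems.append(f"sp-block {sp_block!r} is not an integer")
        return problems

    # The documented rule: sp-block must equal the processors the job needs.
    if value != required_processors:
        problems.append(
            f"sp-block is {value} but the job requires {required_processors} processors"
        )
    return problems


# Annotations copied from the YAML example; 16 is the assumed processor count.
job = {"metadata": {"annotations": {"sp-block": "16", "sp-fit": "idlest"}}}
print(check_sp_block(job, 16))
```

An empty list means the annotation is consistent with the processor count passed in. The sketch deliberately does not validate sp-fit or podAffinity, because their accepted values are defined in YAML Parameters rather than here.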