(Optional) Switch Affinity Scheduling of Volcano
Volcano supports switch affinity scheduling. To use this feature, upload the mapping between switches and server nodes so that Volcano can read it.
Currently, switch affinity scheduling is supported only for training and inference jobs that use entire NPUs. Static and dynamic vNPU scheduling are not supported.
Procedure
- Prepare the network design LLD document of the deployment environment and upload it to any directory (for example, /home/tor-affinity) on the Kubernetes management node.
The LLD file name must be lld.xlsx.
- Obtain the LLD document parsing script.
Go to the mindcluster-deploy repository and switch to the branch that matches your version, as listed in the mindcluster-deploy Version Description. Download the lld_to_cm.py file from the samples/utils directory and upload it to the directory on the management node used in Step 1.
- Start the lld_to_cm.py script.
python ./lld_to_cm.py --num 32
- Use the --num (or -n) option to specify the number of nodes under a switch. If this option is not specified, the default value 4 is used.
- Use the --level (or -l) option to specify the switch networking type. If this option is not specified, the default value double_layer is used.
- single_layer: single-layer switch networking
- double_layer: double-layer switch networking
- This script requires the openpyxl module. If the module is missing in the installation environment, run the pip install openpyxl command to install it.
- Check whether a ConfigMap is successfully created.
kubectl get cm -n kube-system basic-tor-node-cm
If the following information is displayed, the creation is successful:
NAME                DATA   AGE
basic-tor-node-cm   1      8s
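The last two steps of the procedure can be sketched as a small Python helper. This is only an illustration under stated assumptions, not part of lld_to_cm.py: the function names build_lld_command, configmap_listed, and check_configmap are hypothetical, and the option names, defaults, and ConfigMap name are taken from the steps above.

```python
import subprocess
import sys

# Networking types listed in the procedure above.
VALID_LEVELS = ("single_layer", "double_layer")


def build_lld_command(num=4, level="double_layer", script="./lld_to_cm.py"):
    """Build the argument list for the script (hypothetical helper).

    The defaults mirror the documented ones: 4 nodes per switch, double_layer.
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    if num < 1:
        raise ValueError("num must be a positive integer")
    return [sys.executable, script, "--num", str(num), "--level", level]


def configmap_listed(kubectl_output, name="basic-tor-node-cm"):
    """Return True if a row for `name` follows the header of `kubectl get cm` output."""
    for line in kubectl_output.splitlines()[1:]:  # skip the NAME/DATA/AGE header
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def check_configmap(name="basic-tor-node-cm", namespace="kube-system"):
    """Run the documented kubectl command and check that the ConfigMap is listed."""
    result = subprocess.run(
        ["kubectl", "get", "cm", "-n", namespace, name],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and configmap_listed(result.stdout, name)
```

For example, subprocess.run(build_lld_command(num=32), check=True) corresponds to the command shown in the procedure, and check_configmap() can be called afterwards on a node where kubectl is configured.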
Configuring Affinity Scheduling for Switches
To configure switch affinity scheduling, set the tor-affinity parameter in the job YAML file. The following table describes the parameter.
| Parameter | Value | Description |
|---|---|---|
| (.kind=="AscendJob").metadata.labels.tor-affinity | | The default value is null, indicating that switch affinity scheduling is not used. You need to set this parameter based on the job type. NOTE: |
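As a minimal sketch of where this label sits in a job definition, the following Python helper sets metadata.labels.tor-affinity on a manifest that has already been loaded into a dictionary. The function name with_tor_affinity is hypothetical, and the permitted values are not listed in this section, so the caller must supply a value that is valid for the job type.

```python
import copy


def with_tor_affinity(job, value):
    """Return a copy of an AscendJob manifest with the tor-affinity label set.

    Hypothetical helper: `value` is passed through unchanged and is not
    validated, because the permitted values are not listed in this section.
    """
    if job.get("kind") != "AscendJob":
        raise ValueError("the tor-affinity label path applies to kind AscendJob")
    updated = copy.deepcopy(job)  # leave the caller's manifest untouched
    labels = updated.setdefault("metadata", {}).setdefault("labels", {})
    labels["tor-affinity"] = value
    return updated
```

Leaving the label out keeps the default (null), which means switch affinity scheduling is not used for the job.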