YAML Selection
The cluster scheduling components provide a set of YAML examples. Select the example that matches the component, processor type, and job type you use, and modify it to suit your actual requirements before using it.
Resource Information Configuration Using Environment Variables
- If Atlas A2 training products are used in the current environment, refer to Table 1 to obtain the corresponding YAML example. Then, modify and adapt the YAML files of the Atlas 800T A2 training server, Atlas 200T A2 Box16 heterogeneous subrack, and A200T A3 Box8 SuperPoD Server based on the parameter description provided in Table 1.
- If Atlas training products are used in the current environment, refer to Table 2 to obtain the corresponding YAML example. Then, modify the YAML files of the servers (with Atlas 300T training cards) based on the YAML file of the Atlas 800 training server and the parameter description provided in Table 1.
- If Atlas A3 training products are used in the current environment, refer to Table 3 to obtain the corresponding YAML example.
Resource Information Configuration Using Configuration Files
- If Atlas A2 training products are used in the current environment, refer to Table 4 to obtain the corresponding YAML example. Then, modify the YAML files of the Atlas 800T A2 training server, Atlas 200T A2 Box16 heterogeneous subrack, and A200T A3 Box8 SuperPoD Server based on the parameter description provided in Table 2.
- If Atlas training products are used in the current environment, refer to Table 5 to obtain the corresponding YAML example.
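
| Job Type | Hardware Model | Training Framework | YAML File Name | Description | How to Obtain |
|---|---|---|---|---|---|
| VolcanoJob | Atlas 900 A2 PoD cluster basic unit | TensorFlow | a800_tensorflow_vcjob.yaml | A single-server 16-processor job is presented in the example file by default. | |
| VolcanoJob | Atlas 900 A2 PoD cluster basic unit | PyTorch | a800_pytorch_vcjob.yaml | A single-server 16-processor job is presented in the example file by default. | |
| VolcanoJob | Atlas 900 A2 PoD cluster basic unit | MindSpore | a800_mindspore_vcjob.yaml | A single-server 16-processor job is presented in the example file by default. | |
| Deployment | Atlas 900 A2 PoD cluster basic unit | TensorFlow | a800_tensorflow_deployment.yaml | A single-server 16-processor job is presented in the example file by default. | |
| Deployment | Atlas 900 A2 PoD cluster basic unit | PyTorch | a800_pytorch_deployment.yaml | A single-server 16-processor job is presented in the example file by default. | |
| Deployment | Atlas 900 A2 PoD cluster basic unit | MindSpore | a800_mindspore_deployment.yaml | A single-server 16-processor job is presented in the example file by default. | |
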
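
| Job Type | Hardware Model | Training Framework | YAML File Name | Description | How to Obtain |
|---|---|---|---|---|---|
| VolcanoJob | Atlas 800 training server | TensorFlow | a800_tensorflow_vcjob.yaml | A single-server eight-processor job is presented in the example file by default. | |
| VolcanoJob | Atlas 800 training server | PyTorch | a800_pytorch_vcjob.yaml | A single-server eight-processor job is presented in the example file by default. | |
| VolcanoJob | Atlas 800 training server | MindSpore | a800_mindspore_vcjob.yaml | A single-server eight-processor job is presented in the example file by default. | |
| VolcanoJob | Server (with Atlas 300T training cards) | TensorFlow | a300t_tensorflow_vcjob.yaml | A single-server single-processor job is presented in the example file by default. | |
| VolcanoJob | Server (with Atlas 300T training cards) | PyTorch | a300t_pytorch_vcjob.yaml | A single-server single-processor job is presented in the example file by default. | |
| VolcanoJob | Server (with Atlas 300T training cards) | MindSpore | a300t_mindspore_vcjob.yaml | A single-server single-processor job is presented in the example file by default. | |
| Deployment | Atlas 800 training server | TensorFlow | a800_tensorflow_deployment.yaml | A single-server eight-processor job is presented in the example file by default. | |
| Deployment | Atlas 800 training server | PyTorch | a800_pytorch_deployment.yaml | A single-server eight-processor job is presented in the example file by default. | |
| Deployment | Atlas 800 training server | MindSpore | a800_mindspore_deployment.yaml | A single-server eight-processor job is presented in the example file by default. | |
| Deployment | Server (with Atlas 300T training cards) | TensorFlow | a300t_tensorflow_deployment.yaml | A single-server single-processor job is presented in the example file by default. | |
| Deployment | Server (with Atlas 300T training cards) | PyTorch | a300t_pytorch_deployment.yaml | A single-server eight-processor job is presented in the example file by default. | |
| Deployment | Server (with Atlas 300T training cards) | MindSpore | a300t_mindspore_deployment.yaml | A single-server single-processor job is presented in the example file by default. | |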
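
The file names listed in the tables above appear to follow a `<prefix>_<framework>_<job type>.yaml` pattern. The following Python sketch looks up the example file name for a given combination. It is only an illustration: the function `example_yaml_name` and the three lookup dictionaries are hypothetical names introduced here, not part of the cluster scheduling components, and the mappings cover only the rows shown in the tables above.

```python
# Hypothetical helper (not part of the cluster scheduling components).
# Assumption: the naming pattern <prefix>_<framework>_<suffix>.yaml is inferred
# from the file names in the tables above and covers only those rows.

JOB_SUFFIX = {
    "VolcanoJob": "vcjob",
    "Deployment": "deployment",
}

FRAMEWORK_TOKEN = {
    "TensorFlow": "tensorflow",
    "PyTorch": "pytorch",
    "MindSpore": "mindspore",
}

HARDWARE_PREFIX = {
    "Atlas 900 A2 PoD cluster basic unit": "a800",
    "Atlas 800 training server": "a800",
    "Server (with Atlas 300T training cards)": "a300t",
}


def example_yaml_name(job_type: str, hardware: str, framework: str) -> str:
    """Return the example YAML file name for one table row.

    Raises KeyError if any of the three values is not listed in the tables.
    """
    prefix = HARDWARE_PREFIX[hardware]
    token = FRAMEWORK_TOKEN[framework]
    suffix = JOB_SUFFIX[job_type]
    return f"{prefix}_{token}_{suffix}.yaml"


if __name__ == "__main__":
    # Look up the PyTorch VolcanoJob example for the Atlas 800 training server.
    print(example_yaml_name("VolcanoJob", "Atlas 800 training server", "PyTorch"))
```

The helper only returns a file name. The selected file still needs to be obtained and adapted according to the parameter descriptions before it is used.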