Preparation of Job YAML Files
The cluster scheduling components provide YAML examples. Select the example that matches your functionality, model type, job type, and fault handling mode, then modify it to suit your actual requirements before using it.
| Job Type | Hardware Model | Training Framework | Model | YAML File Name | How to Obtain | Description |
|---|---|---|---|---|---|---|
| AscendJob | | PyTorch | Qwen3 | pytorch_multinodes_acjob_910b.yaml | | A two-server eight-processor job is presented in the example file by default. |
| AscendJob | | MindSpore | Qwen3 | ms_multinodes_acjob_superpod.yaml | | A two-server 16-processor job is presented in the example file by default. |
| AscendJob | Atlas 900 A3 SuperPoD | verl | Qwen3-30B | verl-resche.yaml | | A two-server 16-processor job is presented in the example file by default. |
Resumable training currently provides no example YAML file for the Atlas 900 A3 SuperPoD. Instead, you can add the annotations field after labels, at the same level, in an example YAML file. Example:

    ...
    labels:
      ...
    annotations:
      sp-block: "32"    # Number of processors on a logical SuperPoD. For details about the sp-block field, see YAML Parameters.
    ...
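To make the placement of the annotation concrete, the following minimal Python sketch sets `sp-block` on a job manifest that has already been parsed into a dictionary. It is an illustration only: the helper name `add_sp_block`, the job name, and the label shown are hypothetical, and the sketch assumes the manifest follows the usual Kubernetes layout in which `labels` and `annotations` both sit under `metadata`. Loading and saving the YAML file itself is left out and would need a YAML parser of your choice.

```python
import copy


def add_sp_block(manifest: dict, sp_block: int = 32) -> dict:
    """Return a copy of `manifest` with metadata.annotations["sp-block"] set.

    Hypothetical helper for illustration; not part of the cluster
    scheduling components.
    """
    patched = copy.deepcopy(manifest)
    # Tolerate a missing or empty metadata/annotations section.
    metadata = patched.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    # The example stores the value as a quoted string ("32").
    annotations["sp-block"] = str(sp_block)
    metadata["annotations"] = annotations
    patched["metadata"] = metadata
    return patched


# Hypothetical manifest fragment; all other fields are omitted.
job = {
    "kind": "AscendJob",
    "metadata": {
        "name": "example-job",                 # assumed name
        "labels": {"example-label": "value"},  # assumed label
    },
}

patched = add_sp_block(job)
print(patched["metadata"])
```

The helper works on a copy so that the original example manifest stays unchanged, and it leaves the existing labels as they are.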