Preparation of Job YAML Files
- If you do not use Ascend Docker Runtime, Ascend Device Plugin only helps you mount devices in the /dev directory. For other directories (such as /usr), you need to modify the YAML file and mount the corresponding driver directories and files. The mount path in the container must be the same as the host path.
- Atlas 200I SoC A1 core boards do not support Ascend Docker Runtime. For these boards, you do not need to modify the YAML file.
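The mount rule above (same path in the container as on the host) can be sketched as the following pod spec fragment. This is a minimal illustration only: the driver directory /usr/local/Ascend/driver, the volume name, and the container name are assumptions and do not come from this guide. Replace them with the directories and files your driver installation actually uses.

```yaml
# Sketch only: paths and names below are assumptions, not values from this guide.
containers:
- name: infer-container                    # hypothetical container name
  volumeMounts:
  - name: ascend-driver                    # hypothetical volume name
    mountPath: /usr/local/Ascend/driver    # must be identical to the host path
volumes:
- name: ascend-driver
  hostPath:
    path: /usr/local/Ascend/driver         # assumed driver directory on the host
```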
Procedure
- Download the corresponding YAML file.
Table 1 YAML files of different hardware models

| Job Type | Hardware Model | YAML File Path | How to Obtain |
|---|---|---|---|
| Deployment job scheduled by Volcano | Atlas 200I SoC A1 core board | infer-deploy-310p-1usoc.yaml | |
| Deployment job scheduled by Volcano | Inference nodes of other types | infer-deploy.yaml | |
| Volcano Job | Atlas 800I A2 inference server / A200I A2 Box heterogeneous component / Atlas 800I A3 SuperPoD Server | infer-vcjob-910.yaml | |
| Ascend Job | Inference server (equipped with Atlas 300I Duo inference cards) | pytorch_acjob_infer_310p_with_ranktable.yaml | |
| Ascend Job | Atlas 800I A2 inference server / A200I A2 Box heterogeneous component / Atlas 800I A3 SuperPoD Server | pytorch_multinodes_acjob_infer_{xxx}b_with_ranktable.yaml | |

For Volcano Jobs, use the example YAML file as a starting point and modify it to match your job.
- In addition to the basic YAML configuration for full NPU scheduling or dynamic vNPU scheduling, add the fields marked "# Add this field." to enable the rescheduling function. The infer-deploy.yaml file for full NPU scheduling is used as an example.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: resnetinfer1-1-deploy
      labels:
        app: infers
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: infers
      template:
        metadata:
          labels:
            ...
            fault-scheduling: grace              # Add this field.
            ring-controller.atlas: ascend-310    # Add this field.
        spec:
          schedulerName: volcano
          nodeSelector:
            host-arch: huawei-arm    # Select the os arch. If the os arch is x86, change it to huawei-x86.
          ...

Table 2 fault-scheduling description

| Parameter | Value | Description |
|---|---|---|
| fault-scheduling | grace | Job rescheduling enabled. Gracefully delete the original pod during the rescheduling. |
| fault-scheduling | force | Forcible deletion mode enabled for a job to forcibly delete the original pod during the process. |
| ring-controller.atlas | Inference server (equipped with Atlas 300I inference cards): ascend-310; Atlas inference product: ascend-310P; Atlas 800I A2 inference server/A200I A2 Box heterogeneous component/Atlas 800I A3 SuperPoD Server: ascend-{xxx}b | Indicates the processor type used by the job. |

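As a further illustration of Table 2, the fragment below sketches the same labels with the other documented values: forcible deletion (force) and the ascend-310P processor type, on an x86 host. It is a hedged variant of the example above, not an additional official sample. Whether this combination suits your cluster is an assumption you need to verify.

```yaml
# Sketch only: a variant of the pod template above using other values from Table 2.
template:
  metadata:
    labels:
      app: infers
      fault-scheduling: force               # forcibly delete the original pod when rescheduling
      ring-controller.atlas: ascend-310P    # processor type for an Atlas inference product
  spec:
    schedulerName: volcano
    nodeSelector:
      host-arch: huawei-x86                 # x86 host instead of huawei-arm
```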
- Mount the weight file.
    ...
        ports:                              # Collective communication port for distributed training
        - containerPort: 2222
          name: ascendjob-port
        resources:
          limits:
            huawei.com/Ascend310P: 1        # Number of allocated processors
          requests:
            huawei.com/Ascend310P: 1        # The value must be the same as that of limits.
        volumeMounts:
        ...
        # Mount path of the weight file
        - name: weights
          mountPath: /path-to-weights
    ...
      volumes:
      ...
      # Mount path of the weight file
      - name: weights
        hostPath:
          path: /path-to-weights            # Shared storage or local storage path. Change it as required.
    ...
- /path-to-weights is the path to the model weights, which you must prepare yourself. For download instructions, refer to the $ATB_SPEED_HOME_PATH/examples/models/llama3/README.md file in the MindIE image.
- The default value of ATB_SPEED_HOME_PATH is /usr/local/Ascend/atb-models. It is set by the set_env.sh script in the source model repository, so you do not need to configure it yourself.
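The example above uses a hostPath volume. If the weights live on shared storage instead, one possible sketch is a standard Kubernetes nfs volume, shown below. This guide does not prescribe NFS: the volume type is an assumption, and the server address is a hypothetical placeholder. The volumeMounts entry stays the same as in the example.

```yaml
# Sketch only: NFS is an assumed shared-storage option, not one named by this guide.
volumes:
- name: weights
  nfs:
    server: 192.0.2.10        # hypothetical NFS server address
    path: /path-to-weights    # exported directory that holds the weights
    readOnly: true            # the weights are mounted read-only in this sketch
```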
- Modify the container startup command (the command field) in the example YAML file as shown below. If the command field does not exist, add it.
    ...
          containers:
          - image: ubuntu-infer:v1
            ...
            command: ["/bin/bash", "-c", "cd $ATB_SPEED_HOME_PATH; python examples/run_pa.py --model_path /path-to-weights"]
            resources:
              requests:
                ...
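The same startup command can also be written as a YAML block sequence, which may be easier to read and edit when the shell line grows. The fragment below is an equivalent sketch of the command field above and introduces no new options.

```yaml
# Sketch only: block-sequence form of the command shown above.
command:
- /bin/bash
- -c
- cd $ATB_SPEED_HOME_PATH; python examples/run_pa.py --model_path /path-to-weights
```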