Preparation of Job YAML Files

If you do not use Ascend Docker Runtime, Ascend Device Plugin mounts only the devices in the /dev directory. For other directories (such as /usr), you must modify the YAML file to mount the corresponding driver directories and files. The mount path in the container must be the same as the host path.
Ascend Docker Runtime is not supported by Atlas 200I SoC A1 core boards, so you do not need to modify the YAML file.
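
For the case without Ascend Docker Runtime, the driver mount described above could look like the following sketch. The volume name ascend-driver, the container name, and the path /usr/local/Ascend/driver are assumptions for illustration only; use the directories where the driver is actually installed on your nodes. The point being shown is that mountPath in the container and path on the host are identical.

```yaml
# Sketch only: volume name, container name, and driver path are assumptions.
spec:
  template:
    spec:
      containers:
      - name: infer                             # Hypothetical container name
        image: ubuntu-infer:v1
        volumeMounts:
        - name: ascend-driver
          mountPath: /usr/local/Ascend/driver   # Must be the same as the host path below
          readOnly: true
      volumes:
      - name: ascend-driver
        hostPath:
          path: /usr/local/Ascend/driver        # Assumed driver directory on the host
```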

Procedure

Download the corresponding YAML file.

**Table 1** YAML files of different hardware models
Job Type	Hardware Model	YAML File Path	How to Obtain
Deployment job scheduled by Volcano	Atlas 200I SoC A1 core board	infer-deploy-310p-1usoc.yaml	Click here.
Deployment job scheduled by Volcano	Inference nodes of other types	infer-deploy.yaml	Click here.
Volcano Job	Atlas 800I A2 inference server/A200I A2 Box heterogeneous component/Atlas 800I A3 SuperPoD Server	infer-vcjob-910.yaml	Click here.
Ascend Job	Inference server (equipped with Atlas 300I Duo inference cards)	pytorch_acjob_infer_310p_with_ranktable.yaml	Click here.
Ascend Job	Atlas 800I A2 inference server/A200I A2 Box heterogeneous component/Atlas 800I A3 SuperPoD Server	pytorch_multinodes_acjob_infer_{xxx}b_with_ranktable.yaml	Click here.

For Volcano Jobs, you need to modify the corresponding YAML file based on the example YAML file.

In addition to the basic YAML configuration for full NPU scheduling or dynamic vNPU scheduling, add the fields marked with "# Add this field." in the following example to enable the rescheduling function. The infer-deploy.yaml file for full NPU scheduling is used as an example.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: resnetinfer1-1-deploy
  labels:
    app: infers
spec:
  replicas: 1
  selector:
    matchLabels:
      app: infers
  template:
    metadata:
      labels:
...
        fault-scheduling: grace             # Add this field.
        ring-controller.atlas: ascend-310   # Add this field.
    spec:
      schedulerName: volcano
      nodeSelector:
        host-arch: huawei-arm           # Host OS architecture. For x86 hosts, change this to huawei-x86.
...

**Table 2** fault-scheduling description
Parameter	Value	Description
fault-scheduling	grace	Enables job rescheduling. The original pod is deleted gracefully during rescheduling.
fault-scheduling	force	Enables forcible deletion mode for the job. The original pod is forcibly deleted during rescheduling.
ring-controller.atlas	Inference server (equipped with Atlas 300I inference cards): ascend-310; Atlas inference product: ascend-310P; Atlas 800I A2 inference server/A200I A2 Box heterogeneous component/Atlas 800I A3 SuperPoD Server: ascend-{xxx}b	Processor type used by the job.
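
As an illustration of the values in Table 2, the pod template labels for a job that runs on an Atlas inference product and uses forcible deletion might be set as in the following sketch. It only combines values listed in the table with the app: infers label from the example above; it is not taken from a shipped YAML file.

```yaml
  template:
    metadata:
      labels:
        app: infers
        fault-scheduling: force              # Forcibly delete the original pod during rescheduling.
        ring-controller.atlas: ascend-310P   # Processor type for an Atlas inference product (Table 2).
```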

Mount the weight file.

...
              ports:     # Collective communication port for distributed training
                - containerPort: 2222      
                  name: ascendjob-port      
              resources:
                limits:
                  huawei.com/Ascend310P: 1   # Number of allocated processors
                requests:
                  huawei.com/Ascend310P: 1   # The value must be the same as that of limits.
              volumeMounts:
...
                # Mount path of the weight file in the container
                - name: weights
                  mountPath: /path-to-weights
...
          volumes:
...
            # Host path of the weight file
            - name: weights
              hostPath:
                path: /path-to-weights  # Shared storage or local storage path. Change it as required.
...

/path-to-weights is the path to the model weights, which you must prepare yourself. You can download the MindIE image by referring to the $ATB_SPEED_HOME_PATH/examples/models/llama3/README.md file.
The default value of ATB_SPEED_HOME_PATH is /usr/local/Ascend/atb-models. It is set by the set_env.sh script in the source model repository, so you do not need to set it manually.

Modify the container startup command in the example YAML file, as shown in the command field of the following example. If the command field does not exist, add it.

...
      containers:
      - image: ubuntu-infer:v1
...
        command: ["/bin/bash", "-c", "cd $ATB_SPEED_HOME_PATH; python examples/run_pa.py --model_path /path-to-weights"]
        resources:
          requests:
...
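
If the startup command grows longer, the same invocation can be split across the command and args fields, which Kubernetes concatenates when it starts the container. The following is a sketch of an equivalent form, not the layout used in the shipped example files. It uses && instead of ; so that the script does not run if the cd step fails, and it assumes, as the original command does, that ATB_SPEED_HOME_PATH is already set inside the container.

```yaml
      containers:
      - image: ubuntu-infer:v1
        command: ["/bin/bash", "-c"]
        args:
        - |
          # Assumes ATB_SPEED_HOME_PATH is already set in the container environment.
          cd "$ATB_SPEED_HOME_PATH" &&
          python examples/run_pa.py --model_path /path-to-weights
```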
