Preparation of Job YAML Files

If you do not use Ascend Docker Runtime, Ascend Device Plugin mounts only the devices in the /dev directory. For other directories (such as /usr), you must modify the YAML file to mount the required driver directories and files. The mount path in the container must be the same as the host path.
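The mount rule above can be sketched as follows. This is a minimal illustration rather than content from the example YAML files: the container name, the volume name ascend-driver, and the driver directory /usr/local/Ascend/driver are assumptions, so replace them with the directories and files that your driver installation actually requires.

```yaml
# Hypothetical fragment: names and paths are illustrative assumptions.
spec:
  template:
    spec:
      containers:
      - name: infer                            # hypothetical container name
        image: ubuntu-infer:v1
        volumeMounts:
        - name: ascend-driver                  # hypothetical volume name
          mountPath: /usr/local/Ascend/driver  # same path as on the host
          readOnly: true
      volumes:
      - name: ascend-driver
        hostPath:
          path: /usr/local/Ascend/driver       # assumed host driver directory
```

Keeping mountPath and hostPath.path identical is the point of the sketch; the remaining fields only provide context.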

Ascend Docker Runtime is not supported by Atlas 200I SoC A1 core boards, so you do not need to modify the YAML file.

Procedure

Download the corresponding YAML file.

**Table 1** YAML files of different hardware models
Job Type	Hardware Model	YAML File Name	How to Obtain
Deployment job scheduled by Volcano	Atlas 200I SoC A1 core board	infer-deploy-310p-1usoc.yaml	Click here.
Deployment job scheduled by Volcano	Inference nodes of other types	infer-deploy.yaml	Click here.
Volcano Job	Atlas 800I A2 inference server, A200I A2 Box heterogeneous component, Atlas 800I A3 SuperPoD Server	infer-vcjob-910.yaml	Click here.
Ascend Job	Inference server (equipped with Atlas 300I Duo inference cards)	pytorch_acjob_infer_310p_with_ranktable.yaml	Click here.
Ascend Job	Atlas 800I A2 inference server, A200I A2 Box heterogeneous component, Atlas 800I A3 SuperPoD Server	pytorch_multinodes_acjob_infer_{xxx}b_with_ranktable.yaml	Click here.

Upload the YAML file to any directory on the management node and modify the YAML file by referring to Table 2.

**Table 2** Parameters in the YAML file
Parameter	Value	Description
image	-	Inference image name. Change it based on your actual requirements. (It is the name of the image created in the image preparation section.)
replicas	Integer	Number of job replicas. Generally, the value is 1.
requests	Full NPU scheduling: (1) Inference server (equipped with Atlas 300I inference cards): huawei.com/Ascend310: number of processors. (2) Atlas inference product in non-mixed insertion mode: huawei.com/Ascend310P: number of processors. (3) Atlas inference product in mixed insertion mode: huawei.com/Ascend310P-V: number of processors; huawei.com/Ascend310P-VPro: number of processors; huawei.com/Ascend310P-IPro: number of processors. (4) Atlas 800I A2 inference server, A200I A2 Box heterogeneous component, or Atlas 800I A3 SuperPoD Server: huawei.com/Ascend910: number of processors. Static vNPU scheduling: The value is 1, and only the vNPUs of one NPU can be used. Atlas inference product in non-mixed insertion mode: huawei.com/Ascend310P-Y: 1, for example, huawei.com/Ascend310P-4c.3cpu: 1.	Type and number of requested NPUs or vNPUs. Only one type can be requested. Change them as required. The processor name and quantity in requests must be the same as those in limits. NOTE: Only Atlas inference products in non-mixed insertion mode support static vNPU scheduling. Inference servers (equipped with Atlas 300I inference cards) and Atlas inference products in mixed insertion mode do not support static vNPU scheduling. For details about the value of Y, see the vNPU type column of the corresponding product in the table of mapping between virtual instance templates and virtual device types in Static Virtualization. For example, for the vNPU type Ascend310P-4c.3cpu, the value of Y is 4c.3cpu (the Ascend310P prefix is excluded).
limits		Type and number of requested NPUs or vNPUs. Only one type can be requested. Change them as required. The processor name and quantity in limits must be the same as those in requests.
(Optional) host-arch	ARM environment: huawei-arm x86_64 environment: huawei-x86	Architecture of the node where an inference job is executed. Set this parameter as required. The Atlas 200I SoC A1 core board supports only huawei-arm.
huawei.com/recover_policy_path	pod: Only pod-level rescheduling is supported. Rescheduling at the job level is not supported.	Job rescheduling policy.
huawei.com/schedule_minAvailable	Integer	Minimum number of replicas that can be scheduled by a job.
huawei.com/schedule_policy	See Table 3 for its configurations.	AI processor layout that the job requires. Volcano selects a suitable scheduling policy based on this field. If this parameter is not set, the scheduling policy is selected based on accelerator-type. NOTE: This field can be used only on Atlas A2 inference products and Atlas A3 inference products.
servertype	soc	Server type. To schedule jobs to the Atlas 200I SoC A1 core board, add this parameter and mount the directories by referring to the infer-deploy-310p-1usoc.yaml file. This parameter is not required for other types of nodes.
metadata.annotations['huawei.com/AscendXXX']	XXX indicates the processor model. The value can be 910, 310, or 310P. The value must be the same as the actual processor type in the environment.	Ascend Docker Runtime obtains the value of this parameter and mounts NPUs of the corresponding type to a container. NOTE: This parameter is supported only by Volcano with full NPU scheduling enabled. If you use static vNPU scheduling, dynamic vNPU scheduling, or another scheduler, delete the fields of this parameter from the example YAML file.
The following parameters can be used only by the inference server (equipped with Atlas 300I inference cards):
npu-310-strategy	card: scheduling by inference card. The number of Ascend AI Processors in a request cannot exceed 4, and Ascend AI Processors must be within one Atlas 300I inference card. chip: scheduling by Ascend AI Processor. The number of requested processors cannot exceed the maximum value supported by a single node.	-
schedulerName	volcano	Before switching the scheduler, release all previously scheduled jobs.
The following parameters can be used only by the inference server (equipped with Atlas 300I Duo inference cards):
duo	true: Use the Atlas 300I Duo inference card. false: Do not use the Atlas 300I Duo inference card.	Inference card type.
npu-310-strategy	card: scheduling by inference card. The number of requested Ascend AI processors cannot exceed 2, and the processors must be on the same Atlas 300I Duo inference card. chip: scheduling by Ascend AI processor. The number of requested Ascend AI processors cannot exceed the maximum value supported by a single node.	-
distributed	true: Distributed inference is used. When chip is specified, the job must be scheduled to the entire Atlas 300I Duo inference card. If the number of Ascend AI processors required by the job is odd, the job is preferentially scheduled to the Atlas 300I Duo inference card with one remaining Ascend AI processor. false: Non-distributed inference is used. When chip is specified, the number of requested Ascend AI processors cannot exceed the maximum processor number on a single node. NOTE: The scheduling policy in card mode remains unchanged regardless of whether distributed inference is used. When distributed is set to true, only single-server multi-processor is supported. When distributed is set to false, only multi-server multi-processor is supported. If distributed is set to true, Deployment jobs are not supported.	Whether to use distributed inference.
The following parameters can be used only by the Atlas 800I A2 inference server, A200I A2 Box heterogeneous component, and Atlas 800I A3 SuperPoD Server:
nodeSelector	module-{xxx}b-8	Type of the node where a training job runs.
The following parameters can be used only by acjob:
ring-controller.atlas	Atlas 800I A2 inference server and A200I A2 Box heterogeneous component: ascend-{xxx}b. Inference server (equipped with Atlas 300I Duo inference cards): ascend-310P.	Processor type.
schedulerName	The default value is volcano. Set this parameter based on your actual requirements.	Scheduler selected when Ascend Operator enables gang scheduling.
minAvailable	The default value is the total number of job replicas.	Total number of job replicas when Ascend Operator enables gang scheduling and Volcano is used as the scheduler.
queue	The default value is default. Set this parameter based on your actual requirements.	Queue to which a job belongs when Ascend Operator enables gang scheduling and Volcano is used as the scheduler.
(Optional) successPolicy	null (default; used if this parameter is not set) or AllWorkers.	Prerequisite for a successful job. The null value indicates that the entire job is considered successful as soon as one pod succeeds. The AllWorkers value indicates that all pods must succeed for the job to be considered successful.
container.name	ascend	The container name must be ascend.
(Optional) ports	If you do not set corresponding parameters, the system fills in the following values by default: name: ascendjob-port containerPort: 2222	Collective communication port for distributed training. The value of name can only be ascendjob-port. You can set containerPort as required. If containerPort is not set, the default port 2222 is used.
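As an illustration of the requests and limits rows in Table 2, the following fragment sketches a full NPU request on an Atlas inference product in mixed insertion mode. It is a sketch under the assumption that the node exposes the huawei.com/Ascend310P-V resource; use whichever single resource type your node actually provides.

```yaml
# Sketch only: request exactly one resource type, with identical
# name and quantity in requests and limits.
resources:
  requests:
    huawei.com/Ascend310P-V: 1     # number of processors
  limits:
    huawei.com/Ascend310P-V: 1     # must match requests
```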

**Table 3** huawei.com/schedule_policy configuration description
Configuration	Description
chip4-node8	One node has eight processors, and four processors form an interconnection ring, for example, the processor layout of the Atlas 800 training server (model 9000) or Atlas 800 training server (model 9010).
chip1-node2	One node has two processors. For example, one Atlas 300T training card can be equipped with only one processor, and one node can be equipped with a maximum of two Atlas 300T training cards.
chip4-node4	One node has four processors, and four processors form an interconnection ring, for example, the processor layout of the Atlas 800 training server (model 9000) or Atlas 800 training server (model 9010).
chip8-node8	One node has eight processors, and eight processors form one interconnection ring, for example, the processor layout of the Atlas 800T A2 training server.
chip8-node16	One node has 16 processors, and eight processors form one interconnection ring, for example, the processor layout of the Atlas 200T A2 Box16 heterogeneous subrack.
chip2-node16	One node has 16 processors, and two processors form one interconnection ring, for example, the processor layout of the Atlas 800T A3 SuperPoD Server.
chip2-node16-sp	One node has 16 processors, two processors form one interconnection ring, and multiple servers form a SuperPoD, for example, the processor layout of the Atlas 900 A3 SuperPoD.

Select a YAML example as required and modify the file as follows.

**Table 4** Operation examples
Feature	Operation Reference
Full NPU scheduling	Creating a Single-Processor Job on an Inference Server (with Atlas 300I Inference Cards)
	Creating a Distributed Job on an Inference Server (with Atlas 300I Duo Inference Cards)
	Creating a Single-Processor Job on Atlas Inference Products (Non-Atlas 200I SoC A1 Core Board/Atlas 300I Duo Inference Card)
	Single-Processor Job on the Atlas 200I SoC A1 Core Board
	Creating a Single-Processor Job on the Atlas 800I A2 Inference Server
Static vNPU scheduling	Creating a Single-Processor Job on Atlas Inference Products (Non-Atlas 200I SoC A1 Core Board)

Refer to this configuration when using the full NPU scheduling feature. The following uses infer-deploy.yaml as an example to describe how to create a single-processor inference job on an inference server (with Atlas 300I inference cards) and enable the scheduling policy.

apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    metadata: 
      labels:
         app: infers
         host-arch: huawei-arm
         npu-310-strategy: card     # Scheduling by inference card
...
    spec:
      schedulerName: volcano        # The scheduler must be Volcano.
      nodeSelector:
         host-arch: huawei-arm    # (Optional) Set it as required.
...
      containers:
      - image: ubuntu-infer:v1
...
        env:
        - name: ASCEND_VISIBLE_DEVICES                       # This field is used by Ascend Docker Runtime.
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['huawei.com/Ascend310']               # Must match the resource name in resources.requests.
        resources:
          requests:
            huawei.com/Ascend310: 1                   # Number of allocated processors
          limits:
            huawei.com/Ascend310: 1
...

Refer to this configuration when using the full NPU scheduling feature. The following uses pytorch_acjob_infer_310p_with_ranktable.yaml as an example to describe how to create a distributed inference job on an inference server (with Atlas 300I Duo inference cards) and enable the scheduling policy.

apiVersion: mindxdl.gitee.com/v1
kind: AscendJob
metadata:
  name: default-infer-test
  labels:
...
    app: infers
    npu-310-strategy: chip      # Scheduling by Ascend AI processors
    distributed: "true"         # Distributed inference
    duo: "true"             # Use Atlas 300I Duo inference card.
    ring-controller.atlas: ascend-310P    # Processor type used by a job
    framework: pytorch       # Framework type

spec:
  schedulerName: volcano     # This field is valid when the startup parameter enableGangScheduling of Ascend Operator is set to true.
  runPolicy:
    schedulingPolicy:    
      minAvailable: 2  # Total number of job replicas
      queue: default      # Queue to which a job belongs
  successPolicy: AllWorkers # Prerequisites for a successful job
  replicaSpecs:
    Master:
      replicas: 1     # Number of job replicas
...
        spec:
          nodeSelector:
            servertype: Ascend310P
          containers:
            - name: ascend                    # The value must be ascend and cannot be changed.
              image: ubuntu:22.04          # Change the image name as required.
...
                - name: ASCEND_VISIBLE_DEVICES
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.annotations['huawei.com/Ascend310P']       # Mount a processor of the corresponding type to the container.
...
              ports:                  # Collective communication port for distributed training
                - containerPort: 2222     
                  name: ascendjob-port    
              resources:
                limits:
                  huawei.com/Ascend310P: 1   # Number of allocated processors
                requests:
                  huawei.com/Ascend310P: 1  # The value must be the same as that of limits.
              volumeMounts:
...
                - name: ranktable                  
                  mountPath: /user/serverid/devindex/config
...
          volumes:
...
            - name: ranktable
              hostPath:
                path: /user/mindx-dl/ranktable/default.default-infer-test  
...
    Worker:
...
        spec:
          containers:
            - name: ascend     # The value must be ascend and cannot be changed.
              image: ubuntu:22.04     # Change the image name as required.
              env:
...
                - name: ASCEND_VISIBLE_DEVICES
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.annotations['huawei.com/Ascend310P']      # Mount a processor of the corresponding type to the container.
...
              ports:     # Collective communication port for distributed training
                - containerPort: 2222      
                  name: ascendjob-port      
              resources:
                limits:
                  huawei.com/Ascend310P: 1   # Number of allocated processors
                requests:
                  huawei.com/Ascend310P: 1   # The value must be the same as that of limits.
              volumeMounts:
...
                # Optional. To have Ascend Operator generate RankTable files for the PyTorch and MindSpore frameworks, add the following ranktable fields to set the path for storing the hccl.json file in the container.
                - name: ranktable                  
                  mountPath: /user/serverid/devindex/config
...
          volumes:
...
            # Optional. To have Ascend Operator generate a RankTable file for the PyTorch framework, add the following ranktable fields to set the path for storing the hccl.json file.
            - name: ranktable
              hostPath:
                path: /user/mindx-dl/ranktable/default.default-infer-test  # Shared storage or local storage path. Change it as required.
...

Refer to this configuration when using the full NPU scheduling feature. The following uses infer-deploy.yaml as an example to describe how to create a single-processor inference job in non-mixed insertion mode on an Atlas inference product (excluding the Atlas 200I SoC A1 core board and Atlas 300I Duo inference card).

apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    metadata: 
      labels:
         app: infers
...
    spec:
      affinity:        # The job is not scheduled to the Atlas 200I SoC A1 core board.
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: servertype
                    operator: NotIn
                    values:
                      - soc
      schedulerName: volcano 
      nodeSelector:
        host-arch: huawei-arm 
...
      containers:
      - image: ubuntu-infer:v1
...
        env:
        - name: ASCEND_VISIBLE_DEVICES                       # This field is used by Ascend Docker Runtime.
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['huawei.com/Ascend310P']               # Mount a processor of the corresponding type to the container.
...
        resources:
          requests:
            huawei.com/Ascend310P: 1     # Number of allocated processors
          limits:
            huawei.com/Ascend310P: 1
...

The directories and files to be mounted on Atlas 200I SoC A1 core board nodes differ from those on other types of nodes. If a job requires Atlas inference products and the cluster contains Atlas 200I SoC A1 core board nodes that you do not want the job scheduled to, add the affinity field to the example YAML file to avoid inference failures. This field prevents jobs from being scheduled to nodes with the servertype=soc label.

Refer to this configuration when using the full NPU scheduling feature. The following uses infer-deploy-310p-1usoc.yaml as an example to describe how to create a single-processor inference job on the Atlas 200I SoC A1 core board (non-mixed insertion mode).

apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    metadata: 
      labels:
         app: infers
...
    spec:
      schedulerName: volcano 
      nodeSelector:
        host-arch: huawei-arm
        servertype: soc      # The job is scheduled only to the Atlas 200I SoC A1 core board.
...
      containers:
      - image: ubuntu-infer:v1
...
        env:
        - name: ASCEND_VISIBLE_DEVICES                       # This field is required by Ascend Docker Runtime.
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['huawei.com/Ascend310P']               # Mount a processor of the corresponding type to the container.
...
        resources:
          requests:
            huawei.com/Ascend310P: 1     # Number of allocated processors
          limits:
            huawei.com/Ascend310P: 1
...

Refer to this configuration when using the full NPU scheduling feature. The following uses infer-vcjob-910.yaml as an example to describe how to create a single-processor inference job on the Atlas 800I A2 inference server.

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mindx-infer-test
  namespace: vcjob                      # Select a proper namespace as required.
  labels:
    ring-controller.atlas: ascend-{xxx}b
    fault-scheduling: "force"
spec:
...
    template:
      metadata:
        labels:
          app: infer
          ring-controller.atlas: ascend-{xxx}b
      spec:
        containers:
          - image: infer_image:latest             # Name of the inference image. Input the actual image name.
...
            env:
            - name: ASCEND_VISIBLE_DEVICES                       # This field is required by Ascend Docker Runtime.
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['huawei.com/Ascend910']               # Must match the resource name in resources.requests.
            resources:
              requests:
                huawei.com/Ascend910: 1          # Number of required processors
              limits:
                huawei.com/Ascend910: 1          # Be the same as the value of requests.
            volumeMounts:
              - name: localtime                  # The container time must be the same as the host time.
                mountPath: /etc/localtime
        nodeSelector:
          host-arch: huawei-arm                  # Set this parameter as required.
          accelerator-type: module-{xxx}b-8      # Atlas 800I A2 inference server
        volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
        restartPolicy: OnFailure

Refer to this configuration when using static vNPU scheduling. The following uses infer-deploy.yaml as an example to describe how to create an inference job using vNPUs on the Atlas inference product (non-Atlas 200I SoC A1 core board).

apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    metadata: 
      labels:
         app: infers
...
    spec:
      schedulerName: volcano 
      nodeSelector:
        host-arch: huawei-arm 
...
      containers:
      - image: ubuntu-infer:v1
...
        # ASCEND_VISIBLE_DEVICES is not supported by static vNPU scheduling. Delete the env fields from the next line through the line marked "Deletion ends here".
        env:
        - name: ASCEND_VISIBLE_DEVICES
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['huawei.com/Ascend310P']    # Deletion ends here.
        resources:
          requests:
            huawei.com/Ascend310P-2c: 1     # The number must be 1 for vNPU scheduling.
          limits:
            huawei.com/Ascend310P-2c: 1       # The value must be the same as that of requests.
...
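For orientation, the container section might look roughly like the following once the ASCEND_VISIBLE_DEVICES fields have been deleted. This is a sketch, not an excerpt from infer-deploy.yaml: it assumes the vNPU type Ascend310P-4c.3cpu used as the example in Table 2, and the elided parts of the file are omitted.

```yaml
      containers:
      - image: ubuntu-infer:v1
        # No ASCEND_VISIBLE_DEVICES entry: not supported by static vNPU scheduling.
        resources:
          requests:
            huawei.com/Ascend310P-4c.3cpu: 1   # vNPU type is an example; the number must be 1
          limits:
            huawei.com/Ascend310P-4c.3cpu: 1   # must be the same as requests
```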

Mount the weight file.

...
              ports:     # Collective communication port for distributed training
                - containerPort: 2222      
                  name: ascendjob-port      
              resources:
                limits:
                  huawei.com/Ascend310P: 1   # Number of allocated processors
                requests:
                  huawei.com/Ascend310P: 1   # The value must be the same as that of limits.
              volumeMounts:
...
                  # Mount path of the weight file
                - name: weights                  
                  mountPath: /path-to-weights
...
          volumes:
...
            # Mount path of the weight file
            - name: weights
              hostPath:
                path: /path-to-weights  # Shared storage or local storage path. Change it as required.
...

/path-to-weights is the path to the model weights, which you must prepare yourself. You can download the MindIE image by referring to the $ATB_SPEED_HOME_PATH/examples/models/llama3/README.md file.
The default value of ATB_SPEED_HOME_PATH is /usr/local/Ascend/atb-models. It is already configured by the set_env.sh script in the source model repository, so you do not need to set it yourself.

Modify the container startup command in the example YAML file, as shown by the command field in the following example. If the command field does not exist, add it.

...
      containers:
      - image: ubuntu-infer:v1
...
        command: ["/bin/bash", "-c", "cd $ATB_SPEED_HOME_PATH; python examples/run_pa.py --model_path /path-to-weights"]
        resources:
          requests:
...
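If the one-line form becomes hard to read, the same startup command can be written with a YAML block scalar. This is only an alternative layout of the command shown above, assuming the same image and paths; the shell runs the two lines in sequence, just as it does the two commands separated by a semicolon.

```yaml
      containers:
      - image: ubuntu-infer:v1
        command:
          - /bin/bash
          - -c
          - |
            cd $ATB_SPEED_HOME_PATH
            python examples/run_pa.py --model_path /path-to-weights
```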
