Preparation of Job YAML Files

If you do not use Ascend Docker Runtime, Ascend Device Plugin mounts only the devices in the /dev directory. For other directories (such as /usr), you must modify the YAML file to mount the required driver directories and files yourself. The mount path in the container must be the same as the host path.
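As a minimal sketch of such a mount, the fragment below maps a driver directory into the container at the same path by using a hostPath volume. The container name infer, the volume name ascend-driver, and the path /usr/local/Ascend/driver are assumptions for illustration only; mount the directories and files that your driver installation actually requires.

```yaml
# Pod template fragment (sketch). Names and paths are illustrative assumptions.
spec:
  containers:
  - name: infer                             # hypothetical container name
    image: ubuntu-infer:v1
    volumeMounts:
    - name: ascend-driver
      mountPath: /usr/local/Ascend/driver   # same as the host path below
      readOnly: true
  volumes:
  - name: ascend-driver
    hostPath:
      path: /usr/local/Ascend/driver        # host path; must match mountPath
```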

Ascend Docker Runtime is not supported by Atlas 200I SoC A1 core boards, so you do not need to modify the YAML file.

Procedure

  1. Download the corresponding YAML file.
    Table 1 YAML files of different hardware models

    | Job Type | Hardware Model | YAML File Name | How to Obtain |
    |---|---|---|---|
    | Deployment job scheduled by Volcano | Atlas 200I SoC A1 core board | infer-deploy-310p-1usoc.yaml | Click here. |
    | Deployment job scheduled by Volcano | Inference nodes of other types | infer-deploy.yaml | Click here. |
    | Volcano Job | Atlas 800I A2 inference server; A200I A2 Box heterogeneous component; Atlas 800I A3 SuperPoD Server | infer-vcjob-910.yaml | Click here. |
    | Ascend Job | Inference server (equipped with Atlas 300I Duo inference cards) | pytorch_acjob_infer_310p_with_ranktable.yaml | Click here. |
    | Ascend Job | Atlas 800I A2 inference server; A200I A2 Box heterogeneous component; Atlas 800I A3 SuperPoD Server | pytorch_multinodes_acjob_infer_{xxx}b_with_ranktable.yaml | Click here. |

  2. Upload the YAML file to any directory on the management node and modify the YAML file by referring to Table 2.
    Table 2 Parameters in the YAML file

    Parameter

    Value

    Description

    image

    -

    Inference image name. Change it based on your actual requirements. (It is the name of the image created in the image preparation section.)

    replicas

    Integer

    Number of job replicas. Generally, the value is 1.

    requests

    Full NPU scheduling

    • Inference server (equipped with Atlas 300I inference cards)

      huawei.com/Ascend310: number of processors

    • Atlas inference product in non-mixed insertion mode:

      huawei.com/Ascend310P: number of processors

    • Atlas inference product in mixed insertion mode:
      • huawei.com/Ascend310P-V: number of processors
      • huawei.com/Ascend310P-VPro: number of processors
      • huawei.com/Ascend310P-IPro: number of processors
    • Atlas 800I A2 inference server/A200I A2 Box heterogeneous component/Atlas 800I A3 SuperPoD Server: huawei.com/Ascend910: number of processors

    Static vNPU scheduling: The value is 1. Only the vNPUs of one NPU can be used.

    Atlas inference product in non-mixed insertion mode: huawei.com/Ascend310P-Y: 1

    For example, huawei.com/Ascend310P-4c.3cpu: 1.

    Type and number of requested NPUs or vNPUs. Only one type can be requested. Change them as required. For requests and limits, the processor name and quantity must be the same.

    NOTE:
    • Only Atlas inference products in non-mixed insertion mode support static vNPU scheduling.
    • Inference servers (equipped with Atlas 300I inference cards) and Atlas inference products in mixed insertion mode do not support static vNPU scheduling.
    • For details about the value of Y, see the vNPU type column of the corresponding product in the table of mapping between virtual instance templates and virtual device types in Static Virtualization.

      For example, if the vNPU type is Ascend310P-4c.3cpu, the value of Y is 4c.3cpu (the Ascend310P- prefix is excluded).

    limits

    Type and number of requested NPUs or vNPUs. Only one type can be requested. Change them as required.

    The processor name and quantity in limits must be the same as those in requests.

    (Optional) host-arch

    ARM environment: huawei-arm

    x86_64 environment: huawei-x86

    Architecture of the node where an inference job is executed. Set this parameter as required. The Atlas 200I SoC A1 core board supports only huawei-arm.

    huawei.com/recover_policy_path

    pod: Only pod-level rescheduling is supported. Rescheduling at the job level is not supported.

    Job rescheduling policy.

    huawei.com/schedule_minAvailable

    Integer

    Minimum number of replicas that can be scheduled by a job.

    huawei.com/schedule_policy

    See Table 3 for its configurations.

    AI processor layout that the job is to be scheduled on. Volcano selects a proper scheduling policy based on this field. If this parameter is not set, the scheduling policy is selected based on accelerator-type.

    NOTE:

    This field can be used only on the Atlas A2 inference products and Atlas A3 inference product.

    servertype

    soc

    Server type.

    • To schedule jobs to the Atlas 200I SoC A1 core board, add this parameter and mount the directories by referring to the infer-deploy-310p-1usoc.yaml file.
    • This parameter is not required for other types of nodes.

    metadata.annotations['huawei.com/AscendXXX']

    XXX indicates the processor model. The value can be 910, 310, or 310P. The value must be the same as the actual processor type in the environment.

    Ascend Docker Runtime obtains the value of this parameter and mounts NPUs of the corresponding type to a container.

    NOTE:

    This parameter is supported only by Volcano with full NPU scheduling enabled. If you use static vNPU scheduling, dynamic vNPU scheduling, or another scheduler, delete the fields of this parameter from the example YAML file.

    The following parameters can be used only by the inference server (equipped with Atlas 300I inference cards):

    npu-310-strategy

    • card: scheduling by inference card. The number of Ascend AI Processors in a request cannot exceed 4, and Ascend AI Processors must be within one Atlas 300I inference card.
    • chip: scheduling by Ascend AI Processor. The number of requested processors cannot exceed the maximum value supported by a single node.

    -

    schedulerName

    volcano

    Before switching the scheduler, release all previously scheduled jobs.

    The following parameters can be used only by the inference server (equipped with Atlas 300I Duo inference cards):

    duo

    • true: Use the Atlas 300I Duo inference card.
    • false: Do not use the Atlas 300I Duo inference card.

    Inference card type.

    npu-310-strategy

    • card: scheduling by inference card. The number of requested Ascend AI processors cannot exceed 2, and the processors must be on the same Atlas 300I Duo inference card.
    • chip: scheduling by Ascend AI processor. The number of Ascend AI processors requested cannot exceed the maximum value supported by a single node.

    -

    distributed

    • true: Distributed inference is used. When chip is specified, the job must be scheduled to the entire Atlas 300I Duo inference card. If the number of Ascend AI processors required by the job is an odd number, the job is preferentially scheduled to the Atlas 300I Duo inference card with one remaining Ascend AI processor.
    • false: Non-distributed inference is used. When chip is specified, the number of requested Ascend AI processors cannot exceed the maximum processor number on a single node.
      NOTE:
      • The scheduling policy in card mode remains unchanged regardless of whether distributed inference is used.
      • When distributed is set to true, only single-server multi-processor is supported. When distributed is set to false, only multi-server multi-processor is supported.
      • If distributed is set to true, Deployment jobs are not supported.

    Whether to use distributed inference.

    The following parameters can be used only by the Atlas 800I A2 inference server, A200I A2 Box heterogeneous component, and Atlas 800I A3 SuperPoD Server:

    nodeSelector

    module-{xxx}b-8

    Type of the node where a training job runs.

    The following parameters can be used only by acjob:

    ring-controller.atlas

    • Atlas 800I A2 inference server and A200I A2 Box heterogeneous component: ascend-{xxx}b
    • Inference server (equipped with Atlas 300I Duo inference cards): ascend-310P

    Processor type.

    schedulerName

    The default value is volcano. Set this parameter based on your actual requirements.

    Scheduler selected when Ascend Operator enables gang scheduling.

    minAvailable

    The default value is the total number of job replicas.

    Total number of job replicas when Ascend Operator enables gang scheduling and Volcano is used as the scheduler.

    queue

    The default value is default. Set this parameter based on your actual requirements.

    Queue to which a job belongs when Ascend Operator enables gang scheduling and Volcano is used as the scheduler.

    (Optional) successPolicy

    • null: default value, used if this parameter is not set.
    • AllWorkers

    Condition for a job to be considered successful. The null value indicates that the entire job is considered successful as long as one pod succeeds. The AllWorkers value indicates that all pods must succeed for the job to be considered successful.

    container.name

    ascend

    The container name must be ascend.

    (Optional) ports

    If you do not set corresponding parameters, the system fills in the following values by default:

    • name: ascendjob-port
    • containerPort: 2222

    Collective communication port for distributed training. The value of name can only be ascendjob-port. You can set containerPort as required. If containerPort is not set, the default port 2222 is used.

    Table 3 huawei.com/schedule_policy configuration description

    | Configuration | Description |
    |---|---|
    | chip4-node8 | One node has eight processors, and four processors form one interconnection ring, for example, the processor layout of the Atlas 800 training server (model 9000) or Atlas 800 training server (model 9010). |
    | chip1-node2 | One node has two processors. For example, one Atlas 300T training card can be equipped with only one processor, and one node can be equipped with a maximum of two Atlas 300T training cards. |
    | chip4-node4 | One node has four processors, and four processors form one interconnection ring, for example, the processor layout of the Atlas 800 training server (model 9000) or Atlas 800 training server (model 9010). |
    | chip8-node8 | One node has eight processors, and eight processors form one interconnection ring, for example, the processor layout of the Atlas 800T A2 training server. |
    | chip8-node16 | One node has 16 processors, and eight processors form one interconnection ring, for example, the processor layout of the Atlas 200T A2 Box16 heterogeneous subrack. |
    | chip2-node16 | One node has 16 processors, and two processors form one interconnection ring, for example, the processor layout of the Atlas 800T A3 SuperPoD Server. |
    | chip2-node16-sp | One node has 16 processors, two processors form one interconnection ring, and multiple servers form a SuperPoD, for example, the processor layout of the Atlas 900 A3 SuperPoD. |
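To illustrate the requests and limits rows of Table 2, the following sketch requests one full NPU on an Atlas inference product in mixed insertion mode. The resource name huawei.com/Ascend310P-V is one of the three names listed in Table 2; choosing it here, and the quantity of 1, are assumptions for illustration.

```yaml
# Container fragment (sketch): exactly one resource type,
# with the same name and quantity in requests and limits.
resources:
  requests:
    huawei.com/Ascend310P-V: 1   # assumed processor type; use the one in your environment
  limits:
    huawei.com/Ascend310P-V: 1   # must match requests
```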

  3. Select a YAML example as required and modify the file as follows.
    • Refer to this configuration when using the full NPU scheduling feature. The following uses infer-deploy.yaml as an example to describe how to create a single-processor inference job on an inference server (equipped with Atlas 300I inference cards) and enable the scheduling policy.
      apiVersion: apps/v1
      kind: Deployment
      ...
      spec:
        template:
          metadata: 
            labels:
               app: infers
               host-arch: huawei-arm
               npu-310-strategy: card     # Scheduling by inference card
      ...
          spec:
            schedulerName: volcano        # The scheduler must be Volcano.
            nodeSelector:
               host-arch: huawei-arm    # (Optional) Set it as required.
      ...
            containers:
            - image: ubuntu-infer:v1
      ...
              env:
              - name: ASCEND_VISIBLE_DEVICES                       # This field is used by Ascend Docker Runtime.
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.annotations['huawei.com/Ascend310']               # Must match the resource name in resources.requests.
              resources:
                requests:
                  huawei.com/Ascend310: 1                   # Number of allocated processors
                limits:
                  huawei.com/Ascend310: 1
      ...
    • Refer to this configuration when using the full NPU scheduling feature. The following uses pytorch_acjob_infer_310p_with_ranktable.yaml as an example to describe how to create a distributed inference job on an inference server (with Atlas 300I Duo inference cards) and enable the scheduling policy.
      apiVersion: mindxdl.gitee.com/v1
      kind: AscendJob
      metadata:
        name: default-infer-test
        labels:
      ...
          app: infers
          npu-310-strategy: chip      # Scheduling by Ascend AI processors
          distributed: "true"         # Distributed inference
          duo: "true"             # Use Atlas 300I Duo inference card.
          ring-controller.atlas: ascend-310P    # Processor type used by a job
          framework: pytorch       # Framework type
      
      spec:
        schedulerName: volcano     # This field is valid when the startup parameter enableGangScheduling of Ascend Operator is set to true.
        runPolicy:
          schedulingPolicy:    
            minAvailable: 2  # Total number of job replicas
            queue: default      # Queue to which a job belongs
        successPolicy: AllWorkers # Prerequisites for a successful job
        replicaSpecs:
          Master:
            replicas: 1     # Number of job replicas
      ...
              spec:
                nodeSelector:
                  servertype: Ascend310P
                containers:
                  - name: ascend                    # The value must be ascend and cannot be changed.
                    image: ubuntu:22.04          # Change the image name as required.
      ...
                      - name: ASCEND_VISIBLE_DEVICES
                        valueFrom:
                          fieldRef:
                            fieldPath: metadata.annotations['huawei.com/Ascend310P']       # Mount a processor of the corresponding type to the container.
      ...
                    ports:                  # Collective communication port for distributed training
                      - containerPort: 2222     
                        name: ascendjob-port    
                    resources:
                      limits:
                        huawei.com/Ascend310P: 1   # Number of allocated processors
                      requests:
                        huawei.com/Ascend310P: 1  # The value must be the same as that of limits.
                    volumeMounts:
      ...
                      - name: ranktable                  
                        mountPath: /user/serverid/devindex/config
      ...
                volumes:
      ...
                  - name: ranktable
                    hostPath:
                      path: /user/mindx-dl/ranktable/default.default-infer-test  
      ...
          Worker:
      ...
              spec:
                containers:
                  - name: ascend     # The value must be ascend and cannot be changed.
                    image: ubuntu:22.04     # Change the image name as required.
                    env:
      ...
                      - name: ASCEND_VISIBLE_DEVICES
                        valueFrom:
                          fieldRef:
                            fieldPath: metadata.annotations['huawei.com/Ascend310P']      # Mount a processor of the corresponding type to the container.
      ...
                    ports:     # Collective communication port for distributed training
                      - containerPort: 2222      
                        name: ascendjob-port      
                    resources:
                      limits:
                        huawei.com/Ascend310P: 1   # Number of allocated processors
                      requests:
                        huawei.com/Ascend310P: 1   # The value must be the same as that of limits.
                    volumeMounts:
      ...
                      # Optional. Generate RankTable files for the PyTorch and MindSpore frameworks through Ascend Operator. Add the following fields to set the path for storing the hccl.json file in the container.
                      - name: ranktable                  
                        mountPath: /user/serverid/devindex/config
      ...
                volumes:
      ...
                  # Optional. Generate a RankTable file for the PyTorch framework through Ascend Operator. Add the following fields to set the path for storing the hccl.json file.
                  - name: ranktable
                    hostPath:
                      path: /user/mindx-dl/ranktable/default.default-infer-test  # Shared storage or local storage path. Change it as required.
      ...
    • Refer to this configuration when using the full NPU scheduling feature. The following uses infer-deploy.yaml as an example to describe how to create a single-processor inference job in non-mixed insertion mode on Atlas inference products (excluding the Atlas 200I SoC A1 core board and Atlas 300I Duo inference card).
      apiVersion: apps/v1
      kind: Deployment
      ...
      spec:
        template:
          metadata: 
            labels:
               app: infers
      ...
          spec:
            affinity:        # The job is not scheduled to the Atlas 200I SoC A1 core board.
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: servertype
                          operator: NotIn
                          values:
                            - soc
            schedulerName: volcano 
            nodeSelector:
              host-arch: huawei-arm 
      ...
            containers:
            - image: ubuntu-infer:v1
      ...
              env:
              - name: ASCEND_VISIBLE_DEVICES                       # This field is used by Ascend Docker Runtime.
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.annotations['huawei.com/Ascend310P']               # Mount a processor of the corresponding type to the container.
      ...
              resources:
                requests:
                  huawei.com/Ascend310P: 1     # Number of allocated processors
                limits:
                  huawei.com/Ascend310P: 1
      ...

      The directories and files to be mounted on Atlas 200I SoC A1 core board nodes differ from those on other types of nodes. If a job requires Atlas inference products and the cluster contains Atlas 200I SoC A1 core board nodes that the job must not be scheduled to, add the affinity field to the example YAML file to avoid inference failures. This field prevents jobs from being scheduled to nodes with the servertype=soc label.

    • Refer to this configuration when using the full NPU scheduling feature. The following uses infer-deploy-310p-1usoc.yaml as an example to describe how to create a single-processor inference job on the Atlas 200I SoC A1 core board (non-mixed insertion mode).
      apiVersion: apps/v1
      kind: Deployment
      ...
      spec:
        template:
          metadata: 
            labels:
               app: infers
      ...
          spec:
            schedulerName: volcano 
            nodeSelector:
              host-arch: huawei-arm
              servertype: soc      # The job is scheduled only to the Atlas 200I SoC A1 core board.
      ...
            containers:
            - image: ubuntu-infer:v1
      ...
              env:
              - name: ASCEND_VISIBLE_DEVICES                       # This field is required by Ascend Docker Runtime.
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.annotations['huawei.com/Ascend310P']               # Mount a processor of the corresponding type to the container.
      ...
              resources:
                requests:
                  huawei.com/Ascend310P: 1     # Number of allocated processors
                limits:
                  huawei.com/Ascend310P: 1
      ...
    • Refer to this configuration when using the full NPU scheduling feature. The following uses infer-vcjob-910.yaml as an example to describe how to create a single-processor inference job on the Atlas 800I A2 inference server.
      apiVersion: batch.volcano.sh/v1alpha1
      kind: Job
      metadata:
        name: mindx-infer-test
        namespace: vcjob                      # Select a proper namespace as required.
        labels:
          ring-controller.atlas: ascend-{xxx}b
          fault-scheduling: "force"
      spec:
      ...
          template:
            metadata:
              labels:
                app: infer
                ring-controller.atlas: ascend-{xxx}b
            spec:
              containers:
                - image: infer_image:latest             # Name of the inference image. Input the actual image name.
      ...
                  env:
                  - name: ASCEND_VISIBLE_DEVICES                       # This field is required by Ascend Docker Runtime.
                    valueFrom:
                      fieldRef:
                        fieldPath: metadata.annotations['huawei.com/Ascend910']               # Must match the resource name in resources.requests.
                  resources:
                    requests:
                      huawei.com/Ascend910: 1          # Number of required processors
                    limits:
                      huawei.com/Ascend910: 1          # Be the same as the value of requests.
                  volumeMounts:
                    - name: localtime                  # The container time must be the same as the host time.
                      mountPath: /etc/localtime
              nodeSelector:
                host-arch: huawei-arm                  # Set this parameter as required.
                accelerator-type: module-{xxx}b-8      # Atlas 800I A2 inference server
              volumes:
              - name: localtime
                hostPath:
                  path: /etc/localtime
              restartPolicy: OnFailure
    • Refer to this configuration when using static vNPU scheduling. The following uses infer-deploy.yaml as an example to describe how to create an inference job using vNPUs on Atlas inference products (excluding the Atlas 200I SoC A1 core board).
      apiVersion: apps/v1
      kind: Deployment
      ...
      spec:
        template:
          metadata: 
            labels:
               app: infers
      ...
          spec:
            schedulerName: volcano 
            nodeSelector:
              host-arch: huawei-arm 
      ...
            containers:
            - image: ubuntu-infer:v1
      ...
      # ASCEND_VISIBLE_DEVICES is not supported by static vNPU scheduling. Delete the following env fields, up to the "Deletion ends here" comment:
              env:
              - name: ASCEND_VISIBLE_DEVICES
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.annotations['huawei.com/Ascend310P']    # Deletion ends here.
              resources:
                requests:
                  huawei.com/Ascend310P-2c: 1     # The number must be 1 for vNPU scheduling.
                limits:
                  huawei.com/Ascend310P-2c: 1       # The value must be the same as that of requests.
      ...
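For reference, after the env fields are deleted, the container section of a static vNPU job might look like the following sketch. It assumes the vNPU type Ascend310P-4c.3cpu from the Table 2 example instead of Ascend310P-2c; the types available on your node depend on the virtual instance template in use.

```yaml
# Sketch of the container section for static vNPU scheduling.
# The ASCEND_VISIBLE_DEVICES env entry is intentionally absent; other fields are omitted.
containers:
- image: ubuntu-infer:v1
  resources:
    requests:
      huawei.com/Ascend310P-4c.3cpu: 1   # must be 1 for static vNPU scheduling
    limits:
      huawei.com/Ascend310P-4c.3cpu: 1   # same name and quantity as requests
```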
  4. Mount the weight file.
    ...
                  ports:     # Collective communication port for distributed training
                    - containerPort: 2222      
                      name: ascendjob-port      
                  resources:
                    limits:
                      huawei.com/Ascend310P: 1   # Number of allocated processors
                    requests:
                      huawei.com/Ascend310P: 1   # The value must be the same as that of limits.
                  volumeMounts:
    ...
                      # Mount path of the weight file
                    - name: weights                  
                      mountPath: /path-to-weights
    ...
              volumes:
    ...
                # Mount path of the weight file
                - name: weights
                  hostPath:
                    path: /path-to-weights  # Shared storage or local storage path. Change it as required.
    ...
    • /path-to-weights is the path to the model weights, which you must prepare yourself. For download instructions, see the $ATB_SPEED_HOME_PATH/examples/models/llama3/README.md file in the MindIE image.
    • The default value of ATB_SPEED_HOME_PATH is /usr/local/Ascend/atb-models, which has been configured in the set_env.sh script in the source model repository. You do not need to configure it by yourself.
  5. Modify the container startup command in the example YAML file as shown in the command field below. If the command field does not exist, add it.
    ...
          containers:
          - image: ubuntu-infer:v1
    ...
            command: ["/bin/bash", "-c", "cd $ATB_SPEED_HOME_PATH; python examples/run_pa.py --model_path /path-to-weights"]
            resources:
              requests:
    ...
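Taken together, steps 4 and 5 could yield a pod template along the lines of the following sketch for a Deployment-style job. The container name infer is hypothetical, the Ascend310P resource is only one of the processor types covered above, and /path-to-weights remains a placeholder to be replaced with your actual weight directory.

```yaml
# Sketch combining the weight mount (step 4) and the startup command (step 5).
spec:
  containers:
  - name: infer                        # hypothetical container name
    image: ubuntu-infer:v1
    command: ["/bin/bash", "-c", "cd $ATB_SPEED_HOME_PATH; python examples/run_pa.py --model_path /path-to-weights"]
    resources:
      requests:
        huawei.com/Ascend310P: 1       # assumed processor type
      limits:
        huawei.com/Ascend310P: 1       # must match requests
    volumeMounts:
    - name: weights
      mountPath: /path-to-weights      # path passed to --model_path
  volumes:
  - name: weights
    hostPath:
      path: /path-to-weights           # placeholder; shared or local storage path
```

Keeping mountPath identical to the value passed to --model_path avoids a mismatch between where the weights are mounted and where the startup command looks for them.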