Preparation of Job YAML Files
If you do not use Ascend Docker Runtime, Ascend Device Plugin only helps you mount devices in the /dev directory. For other directories (such as /usr), you need to modify the YAML file and mount the corresponding driver directories and files. The mount path in the container must be the same as the host path.
Ascend Docker Runtime is not supported by Atlas 200I SoC A1 core boards, so you do not need to modify the YAML file.
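The rule that a driver directory's mount path in the container must match its host path can be checked mechanically before a job is submitted. The following is a minimal sketch, assuming the pod spec has already been loaded from the YAML file into a Python dictionary. The function name and the example paths are illustrative assumptions, not part of the product. The check flags every hostPath volume, so volumes that legitimately use different paths (for example, a ranktable mount) need to be ignored by the caller.

```python
from typing import Dict, List, Tuple


def mismatched_host_mounts(pod_spec: Dict) -> List[Tuple[str, str, str]]:
    """Return (volume name, host path, container mount path) for every
    hostPath volume whose container mount path differs from the host path."""
    host_paths = {
        vol["name"]: vol["hostPath"]["path"]
        for vol in pod_spec.get("volumes", [])
        if "hostPath" in vol
    }
    mismatches = []
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            host_path = host_paths.get(mount["name"])
            if host_path is not None and host_path != mount["mountPath"]:
                mismatches.append((mount["name"], host_path, mount["mountPath"]))
    return mismatches


# Hypothetical pod spec: the driver path is an illustrative assumption.
example_spec = {
    "containers": [{
        "name": "infer",
        "volumeMounts": [
            {"name": "driver", "mountPath": "/usr/local/Ascend/driver"},
            {"name": "localtime", "mountPath": "/etc/time"},  # deliberately wrong
        ],
    }],
    "volumes": [
        {"name": "driver", "hostPath": {"path": "/usr/local/Ascend/driver"}},
        {"name": "localtime", "hostPath": {"path": "/etc/localtime"}},
    ],
}

print(mismatched_host_mounts(example_spec))
```

An empty list means every hostPath volume is mounted at the same path inside the container.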
Procedure
- Download the corresponding YAML file.
Table 1 YAML files of different hardware models

| Job Type | Hardware Model | YAML File Name | How to Obtain |
|---|---|---|---|
| Deployment job scheduled by Volcano | Atlas 200I SoC A1 core board | infer-deploy-310p-1usoc.yaml | |
| Deployment job scheduled by Volcano | Inference nodes of other types | infer-deploy.yaml | |
| Volcano Job | Atlas 800I A2 inference server, A200I A2 Box heterogeneous component, Atlas 800I A3 SuperPoD Server | infer-vcjob-910.yaml | |
| Ascend Job | Inference server (equipped with Atlas 300I Duo inference cards) | pytorch_acjob_infer_310p_with_ranktable.yaml | |
| Ascend Job | Atlas 800I A2 inference server, A200I A2 Box heterogeneous component, Atlas 800I A3 SuperPoD Server | pytorch_multinodes_acjob_infer_{xxx}b_with_ranktable.yaml | |
- Upload the YAML file to any directory on the management node and modify the YAML file by referring to Table 2.
Table 2 Parameters in the YAML file

| Parameter | Value | Description |
|---|---|---|
| image | - | Inference image name. Change it based on your actual requirements. (It is the name of the image created in the image preparation section.) |
| replicas | Integer | Number of job replicas. Generally, the value is 1. |
| requests | Full NPU scheduling: (1) Inference server (equipped with Atlas 300I inference cards): huawei.com/Ascend310: number of processors. (2) Atlas inference product in non-mixed insertion mode: huawei.com/Ascend310P: number of processors. (3) Atlas inference product in mixed insertion mode: huawei.com/Ascend310P-V: number of processors, huawei.com/Ascend310P-VPro: number of processors, or huawei.com/Ascend310P-IPro: number of processors. (4) Atlas 800I A2 inference server, A200I A2 Box heterogeneous component, or Atlas 800I A3 SuperPoD Server: huawei.com/Ascend910: number of processors. Static vNPU scheduling: The value is 1, and only the vNPUs of one NPU can be used. Atlas inference product in non-mixed insertion mode: huawei.com/Ascend310P-Y: 1, for example, huawei.com/Ascend310P-4c.3cpu: 1. | Type and number of requested NPUs or vNPUs. Only one type can be requested. Change them as required. For requests and limits, the processor name and quantity must be the same. NOTE: Only Atlas inference products in non-mixed insertion mode support static vNPU scheduling. Inference servers (equipped with Atlas 300I inference cards) and Atlas inference products in mixed insertion mode do not support static vNPU scheduling. For details about the value of Y, see the vNPU type column of the corresponding product in the table of mapping between virtual instance templates and virtual device types in Static Virtualization. Taking the vNPU type Ascend310P-4c.3cpu as an example, the value of Y is 4c.3cpu, excluding Ascend310P. |
| limits | Same as requests. | Type and number of requested NPUs or vNPUs. Only one type can be requested. Change them as required. The processor name and quantity in limits must be the same as those in requests. |
| (Optional) host-arch | ARM environment: huawei-arm. x86_64 environment: huawei-x86. | Architecture of the node where an inference job is executed. Set this parameter as required. The Atlas 200I SoC A1 core board supports only huawei-arm. |
| huawei.com/recover_policy_path | pod: Only pod-level rescheduling is supported. Rescheduling at the job level is not supported. | Job rescheduling policy. |
| huawei.com/schedule_minAvailable | Integer | Minimum number of replicas that can be scheduled by a job. |
| huawei.com/schedule_policy | See Table 3 for its configurations. | Job's AI processor layout to be scheduled. Volcano selects a proper scheduling policy based on this field. If this parameter is not set, the scheduling policy is selected based on accelerator-type. NOTE: This field can be used only on Atlas A2 inference products and Atlas A3 inference products. |
| servertype | soc | Server type. To schedule jobs to the Atlas 200I SoC A1 core board, add this parameter and mount the directory by referring to the infer-310p-1usoc.yaml file. This parameter is not required for other types of nodes. |
| metadata.annotations['huawei.com/AscendXXX'] | XXX indicates the processor model. The value can be 910, 310, or 310P. The value must be the same as the actual processor type in the environment. | Ascend Docker Runtime obtains the value of this parameter and mounts NPUs of the corresponding type to a container. NOTE: This parameter is supported only by Volcano with full NPU scheduling enabled. If you use static vNPU scheduling, dynamic vNPU scheduling, or another scheduler, delete the fields of this parameter from the example YAML file. |
| The following parameters can be used only by the inference server (equipped with Atlas 300I inference cards): | | |
| npu-310-strategy | card: scheduling by inference card. The number of Ascend AI Processors in a request cannot exceed 4, and the processors must be within one Atlas 300I inference card. chip: scheduling by Ascend AI Processor. The number of requested processors cannot exceed the maximum value supported by a single node. | - |
| schedulerName | volcano | To switch the scheduler, release all the previously scheduled jobs. |
| The following parameters can be used only by the inference server (equipped with Atlas 300I Duo inference cards): | | |
| duo | true: Use the Atlas 300I Duo inference card. false: Do not use the Atlas 300I Duo inference card. | Inference card type. |
| npu-310-strategy | card: scheduling by inference card. The number of requested Ascend AI processors cannot exceed 2, and the processors must be on the same Atlas 300I Duo inference card. chip: scheduling by Ascend AI processor. The number of requested Ascend AI processors cannot exceed the maximum value supported by a single node. | - |
| distributed | true: Distributed inference is used. When chip is specified, the job must be scheduled to the entire Atlas 300I Duo inference card. If the number of Ascend AI processors required by the job is an odd number, the job is preferentially scheduled to the Atlas 300I Duo inference card with one remaining Ascend AI processor. false: Non-distributed inference is used. When chip is specified, the number of requested Ascend AI processors cannot exceed the maximum processor number on a single node. NOTE: The scheduling policy in card mode remains unchanged regardless of whether distributed inference is used. When distributed is set to true, only single-server multi-processor is supported. When distributed is set to false, only multi-server multi-processor is supported. If distributed is set to true, Deployment jobs are not supported. | Whether to use distributed inference. |
| The following parameters can be used only by the Atlas 800I A2 inference server, A200I A2 Box heterogeneous component, and Atlas 800I A3 SuperPoD Server: | | |
| nodeSelector | module-{xxx}b-8 | Type of the node where a training job runs. |
| The following parameters can be used only by acjob: | | |
| ring-controller.atlas | Atlas 800I A2 inference server and A200I A2 Box heterogeneous component: ascend-{xxx}b. Inference server (equipped with Atlas 300I Duo inference cards): ascend-310P. | Processor type. |
| schedulerName | The default value is volcano. Set this parameter based on your actual requirements. | Scheduler selected when Ascend Operator enables gang scheduling. |
| minAvailable | The default value is the total number of job replicas. | Total number of job replicas when Ascend Operator enables gang scheduling and Volcano is used as the scheduler. |
| queue | The default value is default. Set this parameter based on your actual requirements. | Queue to which a job belongs when Ascend Operator enables gang scheduling and Volcano is used as the scheduler. |
| (Optional) successPolicy | null (default, used when this parameter is not set) or AllWorkers. | Prerequisite for a successful job. The null value indicates that the entire job is considered successful as soon as one pod succeeds. The AllWorkers value indicates that all pods must succeed for the job to be considered successful. |
| container.name | ascend | The container name must be ascend. |
| (Optional) ports | If you do not set the corresponding parameters, the system fills in the following values by default: name: ascendjob-port, containerPort: 2222. | Collective communication port for distributed training. The value of name can only be ascendjob-port. You can set containerPort as required. If containerPort is not set, the default port 2222 is used. |
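Several of the constraints on requests and limits in Table 2 lend themselves to a quick pre-submission check. The following is a minimal sketch, assuming the container's resources section has been loaded into a Python dictionary. The function names are hypothetical. The test for a vNPU name (a suffix after Ascend310P- that starts with a digit, such as 4c.3cpu) is an assumption drawn from the examples in this section, not an official rule.

```python
from typing import Dict, List, Optional

NPU_PREFIX = "huawei.com/Ascend"
VNPU_PREFIX = "huawei.com/Ascend310P-"


def vnpu_template(resource_name: str) -> Optional[str]:
    """Return Y for a huawei.com/Ascend310P-Y resource name, or None.

    Assumption: vNPU types start with a core count (for example 4c.3cpu),
    while mixed-insertion names (V, VPro, IPro) start with a letter.
    """
    if not resource_name.startswith(VNPU_PREFIX):
        return None
    suffix = resource_name[len(VNPU_PREFIX):]
    return suffix if suffix[:1].isdigit() else None


def check_npu_resources(resources: Dict) -> List[str]:
    """Return human-readable problems found in a container's resources."""
    requests = {k: v for k, v in resources.get("requests", {}).items()
                if k.startswith(NPU_PREFIX)}
    limits = {k: v for k, v in resources.get("limits", {}).items()
              if k.startswith(NPU_PREFIX)}
    errors = []
    if requests != limits:
        errors.append("requests and limits must use the same processor name and quantity")
    if len(requests) > 1:
        errors.append("only one NPU or vNPU type can be requested")
    for name, count in requests.items():
        if vnpu_template(name) is not None and count != 1:
            errors.append(f"{name}: static vNPU scheduling requires a quantity of 1")
    return errors


ok = {"requests": {"huawei.com/Ascend310P-4c.3cpu": 1},
      "limits": {"huawei.com/Ascend310P-4c.3cpu": 1}}
bad = {"requests": {"huawei.com/Ascend910": 2},
       "limits": {"huawei.com/Ascend910": 1}}

print(vnpu_template("huawei.com/Ascend310P-4c.3cpu"))
print(check_npu_resources(ok))
print(check_npu_resources(bad))
```

The check covers only the rules stated in Table 2. It does not know how many processors a node actually provides.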
Table 3 huawei.com/schedule_policy configuration description

| Configuration | Description |
|---|---|
| chip4-node8 | One node has eight processors, and four processors form an interconnection ring, for example, the processor layout of the Atlas 800 training server (model 9000) or Atlas 800 training server (model 9010). |
| chip1-node2 | One node has two processors. For example, one Atlas 300T training card can be equipped with only one processor, and one node can be equipped with a maximum of two Atlas 300T training cards. |
| chip4-node4 | One node has four processors, and four processors form an interconnection ring, for example, the processor layout of the Atlas 800 training server (model 9000) or Atlas 800 training server (model 9010). |
| chip8-node8 | One node has eight processors, and eight processors form one interconnection ring, for example, the processor layout of the Atlas 800T A2 training server. |
| chip8-node16 | One node has 16 processors, and eight processors form one interconnection ring, for example, the processor layout of the Atlas 200T A2 Box16 heterogeneous subrack. |
| chip2-node16 | One node has 16 processors, and two processors form one interconnection ring, for example, the processor layout of the Atlas 800T A3 SuperPoD Server. |
| chip2-node16-sp | One node has 16 processors, two processors form one interconnection ring, and multiple servers form a SuperPoD, for example, the processor layout of the Atlas 900 A3 SuperPoD. |
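The values in Table 3 appear to follow a chipN-nodeM naming pattern, where N is the number of processors per interconnection ring and M is the number of processors per node, with an optional -sp suffix for SuperPoD layouts. The following is a minimal sketch that decodes a value under that assumption. The pattern is inferred from the table rather than documented, and the function and type names are hypothetical. For chip1-node2 the table does not mention a ring, so a ring size of 1 should be read as "no multi-processor ring".

```python
import re
from typing import NamedTuple

# Values listed in Table 3; anything else is rejected.
KNOWN_POLICIES = {
    "chip4-node8", "chip1-node2", "chip4-node4", "chip8-node8",
    "chip8-node16", "chip2-node16", "chip2-node16-sp",
}

_PATTERN = re.compile(r"chip(\d+)-node(\d+)(-sp)?")


class ProcessorLayout(NamedTuple):
    ring_size: int            # processors per interconnection ring
    processors_per_node: int  # processors on one node
    superpod: bool            # True when multiple servers form a SuperPoD


def parse_schedule_policy(value: str) -> ProcessorLayout:
    """Decode a huawei.com/schedule_policy value listed in Table 3."""
    if value not in KNOWN_POLICIES:
        raise ValueError(f"unknown schedule policy: {value!r}")
    match = _PATTERN.fullmatch(value)
    return ProcessorLayout(
        ring_size=int(match.group(1)),
        processors_per_node=int(match.group(2)),
        superpod=match.group(3) is not None,
    )


print(parse_schedule_policy("chip8-node16"))
print(parse_schedule_policy("chip2-node16-sp"))
```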
- Select a YAML example as required and modify the file as follows.
Table 4 Operation examples

| Feature | Operation Reference |
|---|---|
| Full NPU scheduling | Creating a Single-Processor Job on an Inference Server (with Atlas 300I Inference Cards) |
| Full NPU scheduling | Creating a Distributed Job on an Inference Server (with Atlas 300I Duo Inference Cards) |
| Full NPU scheduling | Creating a Single-Processor Job on the Atlas 800I A2 Inference Server |
| Static vNPU scheduling | Creating a Single-Processor Job on Atlas Inference Products (Non-Atlas 200I SoC A1 Core Board) |
- Refer to this configuration when using the full NPU scheduling feature. The following uses infer-deploy.yaml as an example. Create a single-processor inference job on inference servers with Atlas 300I inference cards and enable the scheduling policy.
    apiVersion: apps/v1
    kind: Deployment
    ...
    spec:
      template:
        metadata:
          labels:
            app: infers
            host-arch: huawei-arm
            npu-310-strategy: card    # Scheduling by inference card
        ...
        spec:
          schedulerName: volcano    # The scheduler must be Volcano.
          nodeSelector:
            host-arch: huawei-arm    # (Optional) Set it as required.
          ...
          containers:
          - image: ubuntu-infer:v1
            ...
            env:
            - name: ASCEND_VISIBLE_DEVICES    # This field is used by Ascend Docker Runtime.
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['huawei.com/Ascend310']    # Be the same as resources.requests.
            resources:
              requests:
                huawei.com/Ascend310: 1    # Number of allocated processors
              limits:
                huawei.com/Ascend310: 1
    ...
- Refer to this configuration when using the full NPU scheduling feature. The following uses pytorch_acjob_infer_310p_with_ranktable.yaml as an example to describe how to create a distributed inference job on an inference server (with Atlas 300I Duo inference cards) and enable the scheduling policy.
    apiVersion: mindxdl.gitee.com/v1
    kind: AscendJob
    metadata:
      name: default-infer-test
      labels:
        ...
        app: infers
        npu-310-strategy: chip    # Scheduling by Ascend AI processors
        distributed: "true"    # Distributed inference
        duo: "true"    # Use Atlas 300I Duo inference card.
        ring-controller.atlas: ascend-310P    # Processor type used by a job
        framework: pytorch    # Framework type
    spec:
      schedulerName: volcano    # This field is valid when the startup parameter enableGangScheduling of Ascend Operator is set to true.
      runPolicy:
        schedulingPolicy:
          minAvailable: 2    # Total number of job replicas
          queue: default    # Queue to which a job belongs
      successPolicy: AllWorkers    # Prerequisites for a successful job
      replicaSpecs:
        Master:
          replicas: 1    # Number of job replicas
          ...
            spec:
              nodeSelector:
                servertype: Ascend310P
              containers:
              - name: ascend    # The value must be ascend and cannot be changed.
                image: ubuntu:22.04    # Change the image name as required.
                ...
                - name: ASCEND_VISIBLE_DEVICES
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.annotations['huawei.com/Ascend310P']    # Mount a processor of the corresponding type to the container.
                ...
                ports:    # Collective communication port for distributed training
                - containerPort: 2222
                  name: ascendjob-port
                resources:
                  limits:
                    huawei.com/Ascend310P: 1    # Number of allocated processors
                  requests:
                    huawei.com/Ascend310P: 1    # The value must be the same as that of limits.
                volumeMounts:
                ...
                - name: ranktable
                  mountPath: /user/serverid/devindex/config
                ...
              volumes:
              ...
              - name: ranktable
                hostPath:
                  path: /user/mindx-dl/ranktable/default.default-infer-test
              ...
        Worker:
          ...
            spec:
              containers:
              - name: ascend    # The value must be ascend and cannot be changed.
                image: ubuntu:22.04    # Change the image name as required.
                env:
                ...
                - name: ASCEND_VISIBLE_DEVICES
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.annotations['huawei.com/Ascend310P']    # Mount a processor of the corresponding type to the container.
                ...
                ports:    # Collective communication port for distributed training
                - containerPort: 2222
                  name: ascendjob-port
                resources:
                  limits:
                    huawei.com/Ascend310P: 1    # Number of allocated processors
                  requests:
                    huawei.com/Ascend310P: 1    # The value must be the same as that of limits.
                volumeMounts:
                ...
                # Optional. Generate RankTable files for PyTorch and MindSpore frameworks through Ascend Operator. Add the following fields to set the path for storing the hccl.json file in the container.
                - name: ranktable
                  mountPath: /user/serverid/devindex/config
                ...
              volumes:
              ...
              # Optional. Generate a RankTable file for the PyTorch framework through Ascend Operator. Add the following fields to set the path for storing the hccl.json file.
              - name: ranktable
                hostPath:
                  path: /user/mindx-dl/ranktable/default.default-infer-test    # Shared storage or local storage path. Change it as required.
    ...
- Refer to this configuration when using the full NPU scheduling feature. The following uses infer-deploy.yaml as an example to describe how to create a single-processor inference job in non-mixed insertion mode on Atlas inference products (excluding the Atlas 200I SoC A1 core board and Atlas 300I Duo inference card).
    apiVersion: apps/v1
    kind: Deployment
    ...
    spec:
      template:
        metadata:
          labels:
            app: infers
        ...
        spec:
          affinity:    # The job is not scheduled to the Atlas 200I SoC A1 core board.
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: servertype
                    operator: NotIn
                    values:
                    - soc
          schedulerName: volcano
          nodeSelector:
            host-arch: huawei-arm
          ...
          containers:
          - image: ubuntu-infer:v1
            ...
            env:
            - name: ASCEND_VISIBLE_DEVICES    # This field is used by Ascend Docker Runtime.
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['huawei.com/Ascend310P']    # Mount a processor of the corresponding type to the container.
            ...
            resources:
              requests:
                huawei.com/Ascend310P: 1    # Number of allocated processors
              limits:
                huawei.com/Ascend310P: 1
    ...
The directories and files to be mounted on Atlas 200I SoC A1 core board nodes differ from those for other node types. To avoid inference failures, if a job requires Atlas inference products and the cluster contains Atlas 200I SoC A1 core board nodes that you do not want the job scheduled to, add the affinity field to the example YAML file. This prevents jobs from being scheduled to nodes with the servertype=soc label.
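To see which nodes the affinity rule admits, it helps to recall how Kubernetes evaluates the NotIn operator: a node matches when it does not carry the label at all, or when the label's value is not in the listed set. The following is a minimal sketch of that evaluation for a single match expression. The function name and the node names are hypothetical.

```python
from typing import Dict, List


def matches_not_in(node_labels: Dict[str, str], key: str, values: List[str]) -> bool:
    """Evaluate one NotIn match expression against a node's labels.

    A node matches when the label is absent or its value is not listed.
    """
    return node_labels.get(key) not in values


# Hypothetical nodes and labels for illustration only.
nodes = {
    "node-soc": {"host-arch": "huawei-arm", "servertype": "soc"},
    "node-310p": {"host-arch": "huawei-arm", "servertype": "Ascend310P"},
    "node-unlabeled": {"host-arch": "huawei-arm"},
}

schedulable = [
    name for name, labels in nodes.items()
    if matches_not_in(labels, "servertype", ["soc"])
]
print(schedulable)
```

Only the node labeled servertype=soc is excluded. Nodes without a servertype label remain eligible, so the nodeSelector and resource requests still decide the final placement.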
- Refer to this configuration when using the full NPU scheduling feature. The following uses infer-deploy-310p-1usoc.yaml as an example to describe how to create a single-processor inference job on the Atlas 200I SoC A1 core board (non-mixed insertion mode).
    apiVersion: apps/v1
    kind: Deployment
    ...
    spec:
      template:
        metadata:
          labels:
            app: infers
        ...
        spec:
          schedulerName: volcano
          nodeSelector:
            host-arch: huawei-arm
            servertype: soc    # The job is scheduled only to the Atlas 200I SoC A1 core board.
          ...
          containers:
          - image: ubuntu-infer:v1
            ...
            env:
            - name: ASCEND_VISIBLE_DEVICES    # This field is required by Ascend Docker Runtime.
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['huawei.com/Ascend310P']    # Mount a processor of the corresponding type to the container.
            ...
            resources:
              requests:
                huawei.com/Ascend310P: 1    # Number of allocated processors
              limits:
                huawei.com/Ascend310P: 1
    ...
- Refer to this configuration when using the full NPU scheduling feature. The following uses infer-vcjob-910.yaml as an example to describe how to create a single-processor inference job on the Atlas 800I A2 inference server.
    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    metadata:
      name: mindx-infer-test
      namespace: vcjob    # Select a proper namespace as required.
      labels:
        ring-controller.atlas: ascend-{xxx}b
        fault-scheduling: "force"
    spec:
      ...
        template:
          metadata:
            labels:
              app: infer
              ring-controller.atlas: ascend-{xxx}b
          spec:
            containers:
            - image: infer_image:latest    # Name of the inference image. Input the actual image name.
              ...
              env:
              - name: ASCEND_VISIBLE_DEVICES    # This field is required by Ascend Docker Runtime.
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.annotations['huawei.com/Ascend910']    # Be the same as resources.requests.
              resources:
                requests:
                  huawei.com/Ascend910: 1    # Number of required processors
                limits:
                  huawei.com/Ascend910: 1    # Be the same as the value of requests.
              volumeMounts:
              - name: localtime    # The container time must be the same as the host time.
                mountPath: /etc/localtime
            nodeSelector:
              host-arch: huawei-arm    # Set this parameter as required.
              accelerator-type: module-{xxx}b-8    # Atlas 800I A2 inference server
            volumes:
            - name: localtime
              hostPath:
                path: /etc/localtime
            restartPolicy: OnFailure
- Refer to this configuration when using static vNPU scheduling. The following uses infer-deploy.yaml as an example to describe how to create an inference job using vNPUs on Atlas inference products (non-Atlas 200I SoC A1 core board).
    apiVersion: apps/v1
    kind: Deployment
    ...
    spec:
      template:
        metadata:
          labels:
            app: infers
        ...
        spec:
          schedulerName: volcano
          nodeSelector:
            host-arch: huawei-arm
          ...
          containers:
          - image: ubuntu-infer:v1
            ...
            # ASCEND_VISIBLE_DEVICES is not supported by static vNPU scheduling. Delete the following fields:
            env:
            - name: ASCEND_VISIBLE_DEVICES
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['huawei.com/Ascend310P']
            # Deletion ends here.
            resources:
              requests:
                huawei.com/Ascend310P-2c: 1    # The number must be 1 for vNPU scheduling.
              limits:
                huawei.com/Ascend310P-2c: 1    # The value must be the same as that of requests.
    ...
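When a full NPU scheduling example is adapted for static vNPU scheduling, the ASCEND_VISIBLE_DEVICES entry has to be removed from the container's env list. The following is a minimal sketch of that edit, assuming the container definition has been loaded into a Python dictionary. The function name is hypothetical, and the sketch leaves every other environment variable untouched.

```python
import copy
from typing import Dict


def drop_visible_devices_env(container: Dict) -> Dict:
    """Return a copy of the container without the ASCEND_VISIBLE_DEVICES env entry.

    The env key is removed entirely when no other variables remain.
    """
    result = copy.deepcopy(container)
    remaining = [
        entry for entry in result.get("env", [])
        if entry.get("name") != "ASCEND_VISIBLE_DEVICES"
    ]
    if remaining:
        result["env"] = remaining
    else:
        result.pop("env", None)
    return result


container = {
    "image": "ubuntu-infer:v1",
    "env": [{
        "name": "ASCEND_VISIBLE_DEVICES",
        "valueFrom": {"fieldRef": {
            "fieldPath": "metadata.annotations['huawei.com/Ascend310P']"}},
    }],
    "resources": {
        "requests": {"huawei.com/Ascend310P-2c": 1},
        "limits": {"huawei.com/Ascend310P-2c": 1},
    },
}

print(drop_visible_devices_env(container))
```

Working on a copy keeps the original full NPU definition available for comparison.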
- Mount the weight file.
    ...
            ports:    # Collective communication port for distributed training
            - containerPort: 2222
              name: ascendjob-port
            resources:
              limits:
                huawei.com/Ascend310P: 1    # Number of allocated processors
              requests:
                huawei.com/Ascend310P: 1    # The value must be the same as that of limits.
            volumeMounts:
            ...
            # Mount path of the weight file
            - name: weights
              mountPath: /path-to-weights
            ...
          volumes:
          ...
          # Mount path of the weight file
          - name: weights
            hostPath:
              path: /path-to-weights    # Shared storage or local storage path. Change it as required.
    ...
- /path-to-weights indicates the path to the model weights, which you need to prepare yourself. You can download the MindIE image by referring to the $ATB_SPEED_HOME_PATH/examples/models/llama3/README.md file.
- The default value of ATB_SPEED_HOME_PATH is /usr/local/Ascend/atb-models, which has been configured in the set_env.sh script in the source model repository. You do not need to configure it by yourself.
- Modify the container startup command in the example YAML file as shown below. If the command field does not exist, add it.
    ...
          containers:
          - image: ubuntu-infer:v1
            ...
            command: ["/bin/bash", "-c", "cd $ATB_SPEED_HOME_PATH; python examples/run_pa.py --model_path /path-to-weights"]
            resources:
              requests:
    ...
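If the weights path varies between environments, the command value can be generated rather than edited by hand. The following is a minimal sketch that builds the same three-element command list shown above for a given weights path. The function name is hypothetical, and shell-quoting the path is an added precaution for paths containing spaces rather than something the example YAML requires.

```python
import json
import shlex
from typing import List


def build_startup_command(weights_path: str) -> List[str]:
    """Build the container command list for the given weights path.

    $ATB_SPEED_HOME_PATH is left unexpanded so that the shell inside the
    container resolves it at startup.
    """
    script = (
        "cd $ATB_SPEED_HOME_PATH; "
        f"python examples/run_pa.py --model_path {shlex.quote(weights_path)}"
    )
    return ["/bin/bash", "-c", script]


# A JSON array is also a valid YAML flow sequence, so the printed value can be
# pasted after "command:" in the YAML file.
print(json.dumps(build_startup_command("/path-to-weights")))
```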