Creating a Job YAML File
Procedure
- Download the YAML file from the Gitee code repository of the cluster scheduling component.
Table 1 YAML files of different job types and hardware models

| Job Type | Hardware Model | YAML File Path |
| --- | --- | --- |
| Jobs in Kubernetes or other scheduler scenarios | Atlas 200I Soc A1 core board | samples/inference/infer-310p-1usoc.yaml |
| Jobs in Kubernetes or other scheduler scenarios | Inference nodes of other types | samples/inference/infer.yaml |
| Deployment jobs scheduled by Volcano | Atlas 200I Soc A1 core board | samples/inference/infer-deploy-310p-1usoc.yaml |
| Deployment jobs scheduled by Volcano | Inference nodes of other types | samples/inference/infer-deploy.yaml |
- Upload the YAML file to any directory on the master node and modify the file content as required.
Table 2 Parameters in the YAML file

| Parameter | Value | Description |
| --- | --- | --- |
| image | - | Inference image name. Set this parameter as required. |
| replicas | Integer | Number of job replicas. Generally, the value is 1. |
| requests, limits | Ascend 310 environment: huawei.com/Ascend310: number_of_processors. Ascend 310P environment: huawei.com/Ascend310P: number_of_processors. Example: huawei.com/Ascend310: 1 | Type and number of requested NPUs. Change them as required. The processor name and quantity must be the same in requests and limits. |
| host-arch | ARM environment: huawei-arm. x86 environment: huawei-x86 | Architecture of the node where the inference job is executed. Set this parameter as required. The Atlas 200I Soc A1 core board node supports only huawei-arm. |
| servertype | soc | Server type. To schedule jobs to the Atlas 200I Soc A1 core board node, add this parameter and mount the directory by referring to the infer-310p-1usoc.yaml file. This parameter is not required for other types of nodes. |
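As an illustration of how the parameters in Table 2 fit together, the fragment below sketches the relevant fields for a job that runs on an x86 node with Ascend 310 processors. It is not taken from the sample files: the job name and container name are hypothetical, the image name is borrowed from the example in this section, and the fields marked as omitted should be taken from samples/inference/infer.yaml.

```yaml
# Sketch only: one Ascend 310 processor on an x86 node.
apiVersion: batch/v1
kind: Job
metadata:
  name: infer-310-example          # hypothetical job name
spec:
  template:
    spec:
      nodeSelector:
        host-arch: huawei-x86      # x86 value from Table 2
        # servertype is omitted: it applies only to Atlas 200I Soc A1 core board nodes
      containers:
      - name: infer-example        # hypothetical container name
        image: ubuntu-infer:v1     # image name reused from the example in this section
        resources:
          requests:
            huawei.com/Ascend310: 1   # processor name and quantity here ...
          limits:
            huawei.com/Ascend310: 1   # ... must match the limits entry exactly
        # remaining container fields omitted; copy them from the sample file
      # remaining pod fields (restart policy, volumes, and so on) omitted
```

To request more processors, change the number in both requests and limits so that the two entries stay identical.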
The following uses infer-310p-1usoc.yaml as an example to describe how to set parameters.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: resnetinfer1-1-1usoc
    spec:
    ...
          nodeSelector:
            host-arch: huawei-arm
            servertype: soc
          containers:
          - image: ubuntu-infer:v1
    ...
            resources:
              requests:
                huawei.com/Ascend310P: 1
              limits:
                huawei.com/Ascend310P: 1
    ...

- The directories and files mounted to an Atlas 200I Soc A1 core board node differ from those mounted to other types of nodes. If a job requires the Ascend 310P processor and the cluster contains Atlas 200I Soc A1 core board nodes, but you do not want the job scheduled to nodes of this type, add the affinity field to the example YAML file so that the job is not scheduled to nodes with the servertype=soc label. Otherwise, inference may fail.
- The following gives an example of infer.yaml, which is the YAML file for a job.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: resnetinfer1-1
    spec:
      template:
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: servertype
                    operator: NotIn
                    values:
                    - soc
          nodeSelector:
            host-arch: huawei-arm
    ...

- The following gives an example of infer-deployment.yaml, which is the YAML file for a Deployment job.
    apiVersion: apps/v1
    kind: Deployment
    ...
    spec:
      template:
        metadata:
          labels:
            app: infers
    ...
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: servertype
                    operator: NotIn
                    values:
                    - soc
          schedulerName: volcano
          nodeSelector:
            host-arch: huawei-arm
    ...