Creating a Job YAML File

Procedure

  1. Download the YAML file from the Gitee code repository of the cluster scheduling component.
    Table 1 YAML files of different job types and hardware models

    | Job Type | Hardware Model | YAML File Path |
    |---|---|---|
    | Jobs in Kubernetes or other scheduler scenarios | Atlas 200I Soc A1 core board | samples/inference/infer-310p-1usoc.yaml |
    | Jobs in Kubernetes or other scheduler scenarios | Inference nodes of other types | samples/inference/infer.yaml |
    | Deployment jobs scheduled by Volcano | Atlas 200I Soc A1 core board | samples/inference/infer-deploy-310p-1usoc.yaml |
    | Deployment jobs scheduled by Volcano | Inference nodes of other types | samples/inference/infer-deploy.yaml |

  2. Upload the YAML file to any directory on the master node and modify the file content as required.
    Table 2 Parameters in the YAML file

    | Parameter | Value | Description |
    |---|---|---|
    | image | - | Inference image name. Set this parameter as required. |
    | replicas | Integer | Number of job replicas. Generally, the value is 1. |
    | requests, limits | Ascend 310 environment: huawei.com/Ascend310: number_of_processors; Ascend 310P environment: huawei.com/Ascend310P: number_of_processors. Example: huawei.com/Ascend310: 1 | Type and number of requested NPUs. Change them as required. The processor name and quantity must be the same in requests and limits. |
    | host-arch | ARM environment: huawei-arm; x86 environment: huawei-x86 | Architecture of the node where the inference job is executed. Set this parameter as required. Atlas 200I Soc A1 core board nodes support only huawei-arm. |
    | servertype | soc | Server type. To schedule jobs to an Atlas 200I Soc A1 core board node, add this parameter and mount the directories by referring to the infer-310p-1usoc.yaml file. This parameter is not required for other types of nodes. |

    The following uses infer-310p-1usoc.yaml as an example to describe how to set parameters.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: resnetinfer1-1-1usoc
    spec:
    ...
          nodeSelector:
            host-arch: huawei-arm
            servertype: soc
          containers:
          - image: ubuntu-infer:v1
    ...
            resources:
              requests:
                huawei.com/Ascend310P: 1
              limits:
                huawei.com/Ascend310P: 1
    ...
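    For comparison, the fragment below sketches how the same fields from Table 2 might be set for a job on an x86 node that requests one Ascend 310 processor. It is an illustration assembled from the parameter descriptions, not a copy of infer.yaml: the image name is reused from the example above as a placeholder, the fields sit under spec.template.spec, and everything else in the sample file is assumed to stay unchanged.

    ```yaml
    # Sketch only: fields under spec.template.spec, values taken from Table 2.
    nodeSelector:
      host-arch: huawei-x86            # x86 node; no servertype label is needed here
    containers:
    - image: ubuntu-infer:v1           # placeholder image name, replace with your own
      resources:
        requests:
          huawei.com/Ascend310: 1      # processor type and quantity
        limits:
          huawei.com/Ascend310: 1      # must match requests exactly
    ```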
  3. The directories and files mounted to an Atlas 200I Soc A1 core board node differ from those mounted to other types of nodes. If a job requires the Ascend 310P processor and the cluster contains Atlas 200I Soc A1 core board nodes, but you do not want the job scheduled to them, add the affinity field to the example YAML file so that the job is not scheduled to nodes with the servertype=soc label. Otherwise, inference may fail.
    • Example for infer.yaml, the YAML file of a Job:
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: resnetinfer1-1
      spec:
        template:
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: servertype
                          operator: NotIn
                          values:
                            - soc
            nodeSelector:
              host-arch: huawei-arm 
      ...
    • Example for infer-deploy.yaml, the YAML file of a Deployment job:
      apiVersion: apps/v1
      kind: Deployment
      ...
      spec:
        template:
          metadata:
            labels:
              app: infers
      ...
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: servertype
                          operator: NotIn
                          values:
                            - soc
            schedulerName: volcano 
            nodeSelector:
              host-arch: huawei-arm 
      ...
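    Putting steps 2 and 3 together, a Job that requests one Ascend 310P processor but must stay off servertype=soc nodes could combine the scheduling fields roughly as sketched below. The job name and container name are hypothetical, the image name is a placeholder, and the command and volume mounts of the real infer.yaml are left out, so this shows the scheduling-related fields only.

    ```yaml
    # Minimal sketch, not the shipped infer.yaml: command and mounts are omitted.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: infer-310p-example              # hypothetical job name
    spec:
      template:
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: servertype
                        operator: NotIn     # also matches nodes without a servertype label
                        values:
                          - soc
          nodeSelector:
            host-arch: huawei-arm
          containers:
            - name: infer                   # hypothetical container name
              image: ubuntu-infer:v1        # placeholder image name
              resources:
                requests:
                  huawei.com/Ascend310P: 1
                limits:
                  huawei.com/Ascend310P: 1  # same type and quantity as requests
          restartPolicy: Never              # a Job accepts only Never or OnFailure
    ```

    Because NotIn also selects nodes that carry no servertype label, ordinary Ascend 310P nodes remain eligible without any extra labeling.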