Preparation of Job YAML Files

If you do not use Ascend Docker Runtime, Ascend Device Plugin mounts only the devices in the /dev directory. For other directories (such as /usr), you must modify the YAML file to mount the required driver directories and files. The mount path in the container must be the same as the host path.

Ascend Docker Runtime is not supported by Atlas 200I SoC A1 core boards, so you do not need to modify the YAML file.
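
The rule that the mount path in the container must be the same as the host path can be checked before a job is submitted. The following is a minimal sketch, not part of the product tooling: it assumes the pod spec has already been parsed into a Python dictionary (for example, from the job YAML file), and the function name, the volume names, and the example paths are hypothetical.

```python
def mismatched_host_mounts(pod_spec):
    """Return (container, volume, host_path, mount_path) for every hostPath
    volume whose mount path in the container differs from the host path."""
    # Map each hostPath volume name to its path on the host.
    host_paths = {
        volume["name"]: volume["hostPath"]["path"]
        for volume in pod_spec.get("volumes", [])
        if "hostPath" in volume
    }
    problems = []
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            host = host_paths.get(mount["name"])
            if host is None:
                continue  # Not a hostPath volume; the rule does not apply.
            # Ignore a trailing slash when comparing the two paths.
            if host.rstrip("/") != mount["mountPath"].rstrip("/"):
                problems.append(
                    (container.get("name", "?"), mount["name"], host, mount["mountPath"])
                )
    return problems


# Hypothetical pod spec: the paths below are examples only.
pod_spec = {
    "containers": [{
        "name": "infer",
        "volumeMounts": [
            {"name": "driver", "mountPath": "/usr/local/Ascend/driver"},
            {"name": "tools", "mountPath": "/opt/tools"},
        ],
    }],
    "volumes": [
        {"name": "driver", "hostPath": {"path": "/usr/local/Ascend/driver"}},
        {"name": "tools", "hostPath": {"path": "/usr/local/tools"}},
    ],
}

for problem in mismatched_host_mounts(pod_spec):
    print("mount path differs from host path:", problem)
```

In this example only the tools volume is reported, because it is mounted at /opt/tools while the host path is /usr/local/tools.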

Procedure

  1. Obtain the corresponding YAML file.
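    Table 1 YAML description

    | Job Type   | Hardware Model          | YAML File Name            | How to Obtain |
    |------------|-------------------------|---------------------------|---------------|
    | Deployment | Atlas inference product | infer-deploy-dynamic.yaml | Click here.   |
    | VolcanoJob | Atlas inference product | infer-vcjob-dynamic.yaml  | Click here.   |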

  2. Upload the YAML file to any directory on the management node and modify the file content as required.

    The following uses infer-deploy-dynamic.yaml as an example to describe how to allocate one AI Core on the Atlas inference product.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: resnetinfer1-1-deploy
      labels:
        app: infers
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: infers
      template:
        metadata:
          labels:
            app: infers
            fault-scheduling: "grace"           # Label used for rescheduling.
            # For details about the following three parameters, see Table 2.
            ring-controller.atlas: ascend-310P
            vnpu-dvpp: "null"
            vnpu-level: "low"
        spec:
          schedulerName: volcano              # Use MindCluster Volcano as the scheduler.
          nodeSelector:
            host-arch: huawei-arm
          containers:
            - image: ubuntu-infer:v1   # Example image
    ...
    
              resources:
                requests:
                  huawei.com/npu-core: 1        # Use the static virtualization template vir01 to dynamically virtualize NPUs.
                limits:
                  huawei.com/npu-core: 1        # The value must be the same as that in requests.
    Table 2 Parameters in the infer-deploy-dynamic.yaml file

    | Parameter             | Value       | Description |
    |-----------------------|-------------|-------------|
    | vnpu-level            | low         | Low configuration. This is the default value. The virtualization template with the minimum configuration is selected. |
    | vnpu-level            | high        | Performance first. If there are enough cluster resources, the virtualization template with the maximum configuration is selected. If most of the cluster resources are in use (for example, most physical NPUs are occupied and only a small number of AI Cores are left on each physical NPU), the minimum-specification template with the same number of AI Cores is used. For details, see "Virtualization Template" in Virtualization Rules. |
    | vnpu-dvpp             | yes         | The pod uses DVPP. |
    | vnpu-dvpp             | no          | The pod does not use DVPP. |
    | vnpu-dvpp             | null        | Default value. DVPP usage is ignored. |
    | ring-controller.atlas | ascend-310P | Flag indicating that the job runs on the Atlas inference product. |

    Once vnpu-level and vnpu-dvpp are set, the vNPU template is selected according to Table 3.

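    Table 3 DVPP and levels

    | Number of Requested AI Cores | vnpu-dvpp | vnpu-level | Degrade (Y/N) | Template       |
    |------------------------------|-----------|------------|---------------|----------------|
    | 1                            | null      | Any value  | -             | vir01          |
    | 2                            | null      | Low/Other  | -             | vir02_1c       |
    | 2                            | null      | High       | No            | vir02          |
    | 2                            | null      | High       | Yes           | vir02_1c       |
    | 4                            | yes       | Low/Other  | -             | vir04_4c_dvpp  |
    | 4                            | no        | Low/Other  | -             | vir04_3c_ndvpp |
    | 4                            | null      | Low/Other  | -             | vir04_3c       |
    | 4                            | yes       | High       | -             | vir04_4c_dvpp  |
    | 4                            | no        | High       | -             | vir04_3c_ndvpp |
    | 4                            | null      | High       | No            | vir04          |
    | 4                            | null      | High       | Yes           | vir04_3c       |
    | 8 or a multiple of 8         | Any value | Any value  | -             | -              |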

    If the number of requested AI Cores is 8 or a multiple of 8, the entire NPU is used.
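
The selection rules in Table 3 can be written as a small lookup function, which may help when checking which template a given request should map to. This is only an illustrative sketch of the table, not the scheduler's implementation: the function name and the degraded parameter (standing in for the "Degrade (Y/N)" column) are hypothetical, and combinations that Table 3 does not list raise an error instead of guessing.

```python
def select_template(cores, dvpp="null", level="low", degraded=False):
    """Return the vNPU template name that Table 3 lists for a request.

    Returns None when the whole NPU is used (8 AI Cores or a multiple of 8).
    Raises ValueError for combinations that Table 3 does not list.
    """
    if cores >= 8 and cores % 8 == 0:
        return None  # The entire NPU is used, so no template applies.

    # Any vnpu-level value other than "high" is treated as Low/Other.
    high = level == "high"

    if dvpp == "null":
        if cores == 1:
            return "vir01"
        if cores == 2:
            return "vir02" if high and not degraded else "vir02_1c"
        if cores == 4:
            return "vir04" if high and not degraded else "vir04_3c"
    elif cores == 4 and dvpp == "yes":
        return "vir04_4c_dvpp"
    elif cores == 4 and dvpp == "no":
        return "vir04_3c_ndvpp"

    raise ValueError(f"Table 3 has no row for cores={cores}, vnpu-dvpp={dvpp!r}")


print(select_template(1))                               # vir01
print(select_template(2, level="high"))                 # vir02
print(select_template(2, level="high", degraded=True))  # vir02_1c
print(select_template(4, dvpp="yes"))                   # vir04_4c_dvpp
```

The example YAML above (one AI Core, vnpu-dvpp "null", vnpu-level "low") corresponds to the first call and therefore to the vir01 template.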

  3. Mount the weight file.
    ...
                  ports:     # Collective communication port for distributed training
                    - containerPort: 2222      
                      name: ascendjob-port      
                  resources:
                    limits:
                      huawei.com/Ascend310P: 1   # Number of allocated processors
                    requests:
                      huawei.com/Ascend310P: 1   # The value must be the same as that of limits.
                  volumeMounts:
    ...
                      # Mount path of the weight file
                    - name: weights                  
                      mountPath: /path-to-weights
    ...
              volumes:
    ...
                # Host path of the weight file
                - name: weights
                  hostPath:
                    path: /path-to-weights  # Shared storage or local storage path. Change it as required.
    ...
    • /path-to-weights is the path to the model weights, which you must prepare yourself. You can download the MindIE image by referring to the $ATB_SPEED_HOME_PATH/examples/models/llama3/README.md file.
    • The default value of ATB_SPEED_HOME_PATH is /usr/local/Ascend/atb-models. It is set by the set_env.sh script in the source model repository, so you do not need to configure it yourself.
  4. Modify the container startup command in the example YAML file, as shown by the command field in the following example. If the command field does not exist, add it.
    ...
          containers:
          - image: ubuntu-infer:v1
    ...
            command: ["/bin/bash", "-c", "cd $ATB_SPEED_HOME_PATH; python examples/run_pa.py --model_path /path-to-weights"]
            resources:
              requests:
    ...
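
Two rules from this procedure lend themselves to a quick check before the file is applied: each container needs a command field, and for the huawei.com resources the value in limits must be the same as that in requests. The following is a minimal sketch under the assumption that one container entry from the YAML file has been parsed into a Python dictionary; the function name and the example dictionaries are hypothetical.

```python
def check_container(container):
    """Return a list of problems found in one container entry of the job YAML."""
    problems = []

    # Step 4: the startup command must be present.
    if not container.get("command"):
        problems.append("command field is missing")

    # Step 2: requests and limits must match for huawei.com resources.
    resources = container.get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    for name in sorted(set(requests) | set(limits)):
        if name.startswith("huawei.com/") and requests.get(name) != limits.get(name):
            problems.append(
                f"{name}: requests={requests.get(name)} limits={limits.get(name)}"
            )
    return problems


# Container that follows the example in this procedure.
good = {
    "image": "ubuntu-infer:v1",
    "command": ["/bin/bash", "-c",
                "cd $ATB_SPEED_HOME_PATH; python examples/run_pa.py --model_path /path-to-weights"],
    "resources": {
        "requests": {"huawei.com/npu-core": 1},
        "limits": {"huawei.com/npu-core": 1},
    },
}

# Container with no command and mismatched values.
bad = {
    "image": "ubuntu-infer:v1",
    "resources": {
        "requests": {"huawei.com/npu-core": 1},
        "limits": {"huawei.com/npu-core": 2},
    },
}

print(check_container(good))
print(check_container(bad))
```

The first call reports no problems; the second reports both the missing command field and the mismatch between requests and limits.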