Preparation of Job YAML Files
If you do not use Ascend Docker Runtime, Ascend Device Plugin mounts only the devices under the /dev directory. For other directories, such as /usr, modify the YAML file to mount the required driver directories and files yourself. The mount path in the container must be the same as the path on the host.
Ascend Docker Runtime is not supported by Atlas 200I SoC A1 core boards, so you do not need to modify the YAML file.
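The manual-mount rule above can be illustrated with a short Python sketch. For each directory, it adds a pod-level hostPath volume and a container volumeMount that use the same path on both sides. The directories in DRIVER_PATHS and the helper names are hypothetical placeholders, not a list taken from this guide. Check which driver directories and files your workload actually needs.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical example paths: NOT an authoritative list of required driver directories.
DRIVER_PATHS: List[str] = [
    "/usr/local/Ascend/driver",
    "/usr/local/dcmi",
]


def driver_mounts(paths: List[str]) -> Tuple[List[Dict], List[Dict]]:
    """Build matching volumes and volumeMounts; each mountPath equals its hostPath."""
    volumes: List[Dict] = []
    mounts: List[Dict] = []
    for i, path in enumerate(paths):
        name = f"driver-path-{i}"  # volume names must be unique within a pod
        volumes.append({"name": name, "hostPath": {"path": path}})
        mounts.append({"name": name, "mountPath": path})  # same path as on the host
    return volumes, mounts


def add_driver_mounts(pod_spec: Dict, paths: List[str]) -> Dict:
    """Append the volumes to the pod spec and the mounts to every container."""
    volumes, mounts = driver_mounts(paths)
    pod_spec.setdefault("volumes", []).extend(volumes)
    for container in pod_spec.get("containers", []):
        container.setdefault("volumeMounts", []).extend(mounts)
    return pod_spec


if __name__ == "__main__":
    spec = {"containers": [{"name": "infer", "image": "ubuntu-infer:v1"}]}
    print(json.dumps(add_driver_mounts(spec, DRIVER_PATHS), indent=2))
```

Generating both lists from the same path keeps the host and container paths from drifting apart when directories are added later.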
Procedure
- Obtain the corresponding YAML file.
- Upload the YAML file to any directory on the management node and modify the file content as required.
The following uses infer-deploy-dynamic.yaml as an example to describe how to allocate one AI Core on the Atlas inference product.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resnetinfer1-1-deploy
  labels:
    app: infers
spec:
  replicas: 1
  selector:
    matchLabels:
      app: infers
  template:
    metadata:
      labels:
        app: infers
        fault-scheduling: "grace"      # Label used for rescheduling.
        # For details about the following parameters, see Table 2.
        ring-controller.atlas: ascend-310P
        vnpu-dvpp: "null"
        vnpu-level: "low"
    spec:
      schedulerName: volcano           # Use MindCluster Volcano as the scheduler.
      nodeSelector:
        host-arch: huawei-arm
      containers:
        - image: ubuntu-infer:v1       # Example image
          ...
          resources:
            requests:
              huawei.com/npu-core: 1   # Use the static virtualization template vir01 to dynamically virtualize NPUs.
            limits:
              huawei.com/npu-core: 1   # The value must be the same as that in requests.
```

Table 2 Parameters in the infer-deploy-dynamic.yaml file

| Parameter | Value | Description |
|---|---|---|
| vnpu-level | low | Low configuration (default). The virtualization template with the minimum configuration is selected. |
| vnpu-level | high | Performance first. If cluster resources are sufficient, the virtualization template with the maximum configuration is selected. If most cluster resources are in use (for example, most physical NPUs are occupied and only a few AI Cores are left on each one), the minimum-spec template with the same number of AI Cores is selected instead. For details, see "Virtualization Template" in Virtualization Rules. |
| vnpu-dvpp | yes | The pod uses DVPP. |
| vnpu-dvpp | no | The pod does not use DVPP. |
| vnpu-dvpp | null | Default value. DVPP usage is ignored. |
| ring-controller.atlas | ascend-310P | Indicates that the job runs on the Atlas inference product. |
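You can check the label values in Table 2, and the rule that requests and limits must match, before submitting a manifest. The following Python sketch works on a manifest that has already been loaded into a dict, for example with a YAML parser of your choice. The function name check_vnpu_labels is hypothetical. The example quotes vnpu-dvpp as "null" because an unquoted null is read by a YAML parser as a null value rather than as the string. This check reports that case.

```python
from typing import Dict, List, Set

# Allowed label values as listed in Table 2; a missing label falls back to its default.
ALLOWED_LABEL_VALUES: Dict[str, Set[str]] = {
    "vnpu-level": {"low", "high"},
    "vnpu-dvpp": {"yes", "no", "null"},
    "ring-controller.atlas": {"ascend-310P"},
}
NPU_CORE = "huawei.com/npu-core"


def check_vnpu_labels(deployment: Dict) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []
    template = deployment["spec"]["template"]
    labels = template.get("metadata", {}).get("labels", {})
    for key, allowed in ALLOWED_LABEL_VALUES.items():
        # A YAML null (unquoted null) arrives here as None and is reported.
        if key in labels and labels[key] not in allowed:
            problems.append(f"label {key}={labels[key]!r} is not one of {sorted(allowed)}")
    for c in template.get("spec", {}).get("containers", []):
        res = c.get("resources", {})
        req = res.get("requests", {}).get(NPU_CORE)
        lim = res.get("limits", {}).get(NPU_CORE)
        if req != lim:  # requests and limits must carry the same AI Core count
            problems.append(f"{c.get('name', '<unnamed>')}: {NPU_CORE} requests={req} limits={lim}")
    return problems
```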
When vnpu-level and vnpu-dvpp take effect, the vNPU template is selected according to Table 3.
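Table 3 DVPP and levels

| Number of Requested AI Cores | vnpu-dvpp | vnpu-level | Degrade (Y/N) | Template |
|---|---|---|---|---|
| 1 | null | Any value | - | vir01 |
| 2 | null | Low/Other | - | vir02_1c |
| 2 | null | High | No | vir02 |
| 2 | null | High | Yes | vir02_1c |
| 4 | yes | Low/Other | - | vir04_4c_dvpp |
| 4 | no | Low/Other | - | vir04_3c_ndvpp |
| 4 | null | Low/Other | - | vir04_3c |
| 4 | yes | High | - | vir04_4c_dvpp |
| 4 | no | High | - | vir04_3c_ndvpp |
| 4 | null | High | No | vir04 |
| 4 | null | High | Yes | vir04_3c |
| 8 or a multiple of 8 | Any value | Any value | - | - |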
If the number of requested AI Cores is 8 or a multiple of 8, whole NPUs are allocated and no vNPU template is used.
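Table 3 can be read as a lookup function. The Python sketch below encodes its rows. The degrade argument is a hypothetical stand-in for the "Degrade (Y/N)" column, meaning whether the high-level template is replaced by the minimum-spec one because cluster resources are short. The function name is illustrative and is not a MindCluster API. Any vnpu-level value other than "high" is treated as Low/Other, and combinations that Table 3 does not list raise ValueError.

```python
from typing import Optional


def select_vnpu_template(cores: int, dvpp: str = "null", level: str = "low",
                         degrade: bool = False) -> Optional[str]:
    """Return the Table 3 template name, or None when whole NPUs are used.

    degrade is a hypothetical flag for the "Degrade (Y/N)" column.
    """
    high = level.lower() == "high"  # every other value falls into "Low/Other"
    if cores > 0 and cores % 8 == 0:
        return None  # 8 or a multiple of 8: whole NPUs, no vNPU template
    if cores == 1 and dvpp == "null":
        return "vir01"  # any vnpu-level
    if cores == 2 and dvpp == "null":
        return "vir02" if high and not degrade else "vir02_1c"
    if cores == 4:
        if dvpp == "yes":
            return "vir04_4c_dvpp"  # same for Low/Other and High
        if dvpp == "no":
            return "vir04_3c_ndvpp"  # same for Low/Other and High
        if dvpp == "null":
            return "vir04" if high and not degrade else "vir04_3c"
    raise ValueError(
        f"combination not listed in Table 3: cores={cores}, dvpp={dvpp!r}, level={level!r}"
    )
```

For example, select_vnpu_template(4, level="high") returns "vir04", and the same call with degrade=True returns "vir04_3c".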
- Mount the weight file.
```yaml
...
          ports:                           # Collective communication port for distributed training
            - containerPort: 2222
              name: ascendjob-port
          resources:
            limits:
              huawei.com/Ascend310P: 1     # Number of allocated processors
            requests:
              huawei.com/Ascend310P: 1     # The value must be the same as that of limits.
          volumeMounts:
            ...
            # Mount path of the weight file
            - name: weights
              mountPath: /path-to-weights
...
      volumes:
        ...
        # Mount path of the weight file
        - name: weights
          hostPath:
            path: /path-to-weights         # Shared storage or local storage path. Change it as required.
...
```
- /path-to-weights is the path of the model weights, which you need to prepare yourself. For how to download the weights, see the $ATB_SPEED_HOME_PATH/examples/models/llama3/README.md file in the MindIE image.
- ATB_SPEED_HOME_PATH defaults to /usr/local/Ascend/atb-models and is already set by the set_env.sh script in the model source repository, so you do not need to set it yourself.
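Before applying the weight-file changes, a quick consistency check can catch common mistakes: a volumeMount without a matching pod-level volume, a weights volume that is never mounted, or NPU requests that differ from limits. This Python sketch assumes the pod spec (the spec section of the pod template) has been loaded into a dict. The function name is hypothetical, and the default volume name weights simply follows the example.

```python
from typing import Dict, List


def check_weights_and_npus(pod_spec: Dict, volume_name: str = "weights") -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []
    volume_names = {v["name"] for v in pod_spec.get("volumes", [])}
    if volume_name not in volume_names:
        problems.append(f"pod has no volume named {volume_name!r}")
    for c in pod_spec.get("containers", []):
        cname = c.get("name", "<unnamed>")
        mounts = c.get("volumeMounts", [])
        # Every volumeMount must refer to a volume declared at pod level.
        for m in mounts:
            if m["name"] not in volume_names:
                problems.append(f"{cname}: mount {m['name']!r} has no matching volume")
        if not any(m["name"] == volume_name for m in mounts):
            problems.append(f"{cname}: volume {volume_name!r} is not mounted")
        # For huawei.com/* resources, requests must equal limits.
        res = c.get("resources", {})
        limits, requests = res.get("limits", {}), res.get("requests", {})
        for key in sorted(set(limits) | set(requests)):
            if key.startswith("huawei.com/") and limits.get(key) != requests.get(key):
                problems.append(f"{cname}: {key} limits={limits.get(key)} requests={requests.get(key)}")
    return problems
```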
- Modify the container startup command (the command field) in the example YAML file as follows. If the command field does not exist, add it.
```yaml
...
      containers:
        - image: ubuntu-infer:v1
          ...
          command: ["/bin/bash", "-c", "cd $ATB_SPEED_HOME_PATH; python examples/run_pa.py --model_path /path-to-weights"]
          resources:
            requests:
              ...
```
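The startup command passes --model_path /path-to-weights, which is a path inside the container, so it must fall under a mounted volume. The following Python sketch sets the command field, adding it if absent, and checks that the --model_path value is covered by one of the container's volumeMounts. It assumes the /bin/bash -c form used in this example and only recognizes the space-separated --model_path VALUE spelling. The helper names are hypothetical.

```python
import shlex
from typing import Dict, List, Optional

# Startup command copied from the example manifest.
DEFAULT_COMMAND: List[str] = [
    "/bin/bash", "-c",
    "cd $ATB_SPEED_HOME_PATH; python examples/run_pa.py --model_path /path-to-weights",
]


def ensure_command(container: Dict, command: List[str] = DEFAULT_COMMAND) -> Dict:
    """Set the command field, creating it if the container has none."""
    container["command"] = list(command)  # copy so the default list is never mutated
    return container


def model_path_from_command(command: List[str]) -> Optional[str]:
    """Extract the --model_path value; shell variables are not expanded."""
    if len(command) >= 3 and command[1] == "-c":
        tokens = shlex.split(command[2])  # split the script passed to bash -c
    else:
        tokens = list(command)
    for i, tok in enumerate(tokens[:-1]):
        if tok == "--model_path":
            return tokens[i + 1]
    return None


def model_path_is_mounted(container: Dict) -> bool:
    """True if --model_path equals, or lies under, some volumeMount's mountPath."""
    path = model_path_from_command(container.get("command", []))
    if path is None:
        return False
    for m in container.get("volumeMounts", []):
        mount = m["mountPath"].rstrip("/")
        if path == mount or path.startswith(mount + "/"):
            return True
    return False
```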