NPU Inference Job

NPU inference jobs in "Typical Scenarios" are classified into the following types, depending on whether Volcano is used as the scheduler:

Basic Process of NPU Inference Jobs Using Volcano as the Scheduler

Create an inference job of the Deployment type.
  • Deployment resource example
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: infer
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: infers
      template:
        metadata:
          labels:
            app: infers
        spec:
          schedulerName: volcano
          nodeSelector:
            host-arch: huawei-arm
          containers:
          - image: infer:latest
            imagePullPolicy: IfNotPresent
            name: infer
            command: ["xxxx"]   # replace with the container startup command
            resources:
              requests:
                huawei.com/Ascend310: 1
              limits:
                huawei.com/Ascend310: 1
            volumeMounts:
              - name: ascend-driver
                mountPath: /usr/local/Ascend/driver
          volumes:
            - name: ascend-driver
              hostPath:
                path: /usr/local/Ascend/driver
    • Generally, the value of replicas is 1.
    • The schedulerName field must be set to volcano.
    • By default, nodeSelector supports only the key-value pairs configured in the YAML file used to start Volcano, and the host-arch label must be used. For details about how to add a user-defined selector, see Volcano Scheduling Configuration.
    • Change the NPU resource name and quantity in requests and limits as needed. View the node details in the Kubernetes cluster to determine which NPU resource types the nodes provide, such as physical chips (Ascend310/Ascend310P) or virtual NPU instances.
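The node resources mentioned above can be inspected with standard kubectl commands. In this sketch, the node name npu-node-1 is a placeholder for a node in your cluster, and the commands assume a configured kubectl context:

```shell
# List the nodes, then inspect one to see which NPU resource types it exposes.
kubectl get nodes
kubectl describe node npu-node-1 | grep -A 5 "Allocatable"
# Look for entries such as huawei.com/Ascend310 under Capacity and Allocatable.
```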
    • Currently, only one container in a pod can use NPUs.
    • Mount driver-related directories as required:
      • When the startup parameter useAscendDocker of the Ascend Device Plugin is set to true and the Ascend Docker Runtime has been installed and takes effect, the driver-related directories installed in /usr/local/Ascend are automatically mounted, and no manual mounting is required.
      • When the startup parameter useAscendDocker of the Ascend Device Plugin is set to false, the driver-related directories are not mounted automatically, and you must mount them manually, as shown in the volumes and volumeMounts fields in the example.
    • You need to mount the model code paths and add other required settings, such as environment variables.
    • You need to set the container startup command, which corresponds to the command field in the YAML file.
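The last two points can be sketched as an extension of the container spec above. This is a sketch only: the /data/models host path, the MODEL_PATH variable, and the start command are placeholders, not values defined by the product.

```yaml
containers:
- name: infer
  image: infer:latest
  command: ["/bin/bash", "-c", "python3 /app/infer.py"]   # container startup command (placeholder)
  env:
  - name: MODEL_PATH                  # example environment variable (placeholder)
    value: /models/resnet50
  volumeMounts:
  - name: model-code                  # mount the model code path
    mountPath: /models
  - name: ascend-driver
    mountPath: /usr/local/Ascend/driver
volumes:
- name: model-code
  hostPath:
    path: /data/models                # placeholder host path for model code
- name: ascend-driver
  hostPath:
    path: /usr/local/Ascend/driver
```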

Basic Process of NPU Inference Jobs Not Using Volcano as the Scheduler

Use another resource type, such as Job, to create an inference job. For details about how to create Job resources, see the official Kubernetes examples.
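A minimal sketch of such a Job, assuming the same placeholder image and startup command as the Deployment example; because no schedulerName is set, the default Kubernetes scheduler is used:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: infer-job
spec:
  template:
    spec:
      # No schedulerName field: the default Kubernetes scheduler is used.
      restartPolicy: Never
      containers:
      - name: infer
        image: infer:latest            # placeholder image
        command: ["/bin/bash", "-c", "python3 /app/infer.py"]  # placeholder command
        resources:
          requests:
            huawei.com/Ascend310: 1    # adjust the NPU type and count as needed
          limits:
            huawei.com/Ascend310: 1
        volumeMounts:
        - name: ascend-driver
          mountPath: /usr/local/Ascend/driver
      volumes:
      - name: ascend-driver
        hostPath:
          path: /usr/local/Ascend/driver
```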