NPU Inference Job

NPU inference jobs in "Typical Scenarios" are classified into the following types, depending on whether Volcano is used as the scheduler:

Basic Process of NPU Inference Jobs Using Volcano as the Scheduler

Create an inference job of the Deployment type.
  • Deployment resource example
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: infer
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: infers
      template:
        metadata:
          labels:
            app: infers
        spec:
          schedulerName: volcano
          nodeSelector:
            host-arch: huawei-arm
          containers:
          - image: infer:latest
            imagePullPolicy: IfNotPresent
            name: infer
            command: ["xxxx"]   # replace with the container startup command
            resources:
              requests:
                huawei.com/Ascend310: 1
              limits:
                huawei.com/Ascend310: 1
            volumeMounts:
              - name: ascend-driver
                mountPath: /usr/local/Ascend/driver
          volumes:
            - name: ascend-driver
              hostPath:
                path: /usr/local/Ascend/driver
    • Generally, the value of replicas is 1.
    • The schedulerName field must be set to volcano.
    • By default, nodeSelector supports only the key-value pairs configured in the YAML file used to start Volcano, and the host-arch label must be used. For details about how to add a user-defined selector, see Volcano Scheduling Configuration.
    • Change the NPU resource name and quantity in requests and limits as needed. View the node details in the Kubernetes cluster to determine which NPU resource types the nodes provide, such as physical chips (Ascend310/Ascend310P) or virtual NPU instances.
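The node resources mentioned above can be inspected with standard kubectl commands. In this sketch, the node name npu-node-1 is a placeholder for a node in your cluster, and the commands assume a configured kubectl context:

```shell
# List the nodes, then inspect one to see which NPU resource types it exposes.
kubectl get nodes
kubectl describe node npu-node-1 | grep -A 5 "Allocatable"
# Look for entries such as huawei.com/Ascend310 under Capacity and Allocatable.
```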
    • Currently, only one container in a pod can use NPUs.
    • Mount driver-related directories as required:
      • When the startup parameter useAscendDocker of the Ascend Device Plugin is set to true and the Ascend Docker Runtime has been installed and takes effect, the driver-related directories installed in /usr/local/Ascend are automatically mounted, and no manual mounting is required.
      • When the startup parameter useAscendDocker of the Ascend Device Plugin is set to false, the driver-related directories are not mounted automatically, and you must mount them manually, as shown in the volumes and volumeMounts fields in the example.
    • You need to mount the model code paths and add other required settings, such as environment variables.
    • You need to set the container startup command, which corresponds to the command field in the YAML file.
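The last two points can be sketched as an extension of the container spec above. This is a sketch only: the /data/models host path, the MODEL_PATH variable, and the start command are placeholders, not values defined by the product.

```yaml
containers:
- name: infer
  image: infer:latest
  command: ["/bin/bash", "-c", "python3 /app/infer.py"]   # container startup command (placeholder)
  env:
  - name: MODEL_PATH                  # example environment variable (placeholder)
    value: /models/resnet50
  volumeMounts:
  - name: model-code                  # mount the model code path
    mountPath: /models
  - name: ascend-driver
    mountPath: /usr/local/Ascend/driver
volumes:
- name: model-code
  hostPath:
    path: /data/models                # placeholder host path for model code
- name: ascend-driver
  hostPath:
    path: /usr/local/Ascend/driver
```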

Basic Process of NPU Inference Jobs Not Using Volcano as the Scheduler

Use another resource type, such as Job, to create an inference job. For details about how to create Job resources, see the official Kubernetes examples.
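A minimal sketch of such a Job, assuming the same placeholder image and startup command as the Deployment example; because no schedulerName is set, the default Kubernetes scheduler is used:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: infer-job
spec:
  template:
    spec:
      # No schedulerName field: the default Kubernetes scheduler is used.
      restartPolicy: Never
      containers:
      - name: infer
        image: infer:latest            # placeholder image
        command: ["/bin/bash", "-c", "python3 /app/infer.py"]  # placeholder command
        resources:
          requests:
            huawei.com/Ascend310: 1    # adjust the NPU type and count as needed
          limits:
            huawei.com/Ascend310: 1
        volumeMounts:
        - name: ascend-driver
          mountPath: /usr/local/Ascend/driver
      volumes:
      - name: ascend-driver
        hostPath:
          path: /usr/local/Ascend/driver
```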