Inference Server (with Atlas 300I Inference Cards)

The inference server (with Atlas 300I inference cards) supports affinity scheduling. An Atlas 800 inference server (model 3000) holds a maximum of eight Atlas 300I inference cards, and each Atlas 300I inference card has four Ascend AI Processors, so a fully populated server provides up to 32 Ascend AI Processors. When delivering a job YAML to this type of server, use npu-310-strategy to specify the scheduling policy. Affinity scheduling takes effect only when scheduling by inference card (card) is specified.

Values of npu-310-strategy:

  • card: scheduling by inference card. A request can include at most four Ascend AI Processors, and all of them must be on the same Atlas 300I inference card.
  • chip: scheduling by Ascend AI Processor. A request cannot include more Ascend AI Processors than the maximum number on a single node.
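
The two limits above can be sketched as a small check. This is an illustrative Python sketch only, not the scheduler's actual implementation: the names validate_request and pick_card are hypothetical, the default node size assumes a fully populated server (8 cards x 4 processors), and the first-fit card choice is an assumption made to keep the example short.

```python
from typing import List, Optional

CHIPS_PER_CARD = 4       # from the text: four Ascend AI Processors per Atlas 300I card
MAX_CARDS_PER_NODE = 8   # from the text: up to eight cards per Atlas 800 (model 3000)


def validate_request(strategy: str, requested: int,
                     node_chips: int = CHIPS_PER_CARD * MAX_CARDS_PER_NODE) -> None:
    """Raise ValueError if a request breaks the limit for the given strategy.

    Hypothetical helper; node_chips defaults to a fully populated node (assumption).
    """
    if requested < 1:
        raise ValueError("at least one Ascend AI Processor must be requested")
    if strategy == "card":
        # card: the request must fit inside one inference card
        if requested > CHIPS_PER_CARD:
            raise ValueError(f"card strategy allows at most {CHIPS_PER_CARD} processors")
    elif strategy == "chip":
        # chip: the request is bounded only by the node total
        if requested > node_chips:
            raise ValueError(f"chip strategy allows at most {node_chips} processors")
    else:
        raise ValueError(f"unknown npu-310-strategy value: {strategy!r}")


def pick_card(free_per_card: List[int], requested: int) -> Optional[int]:
    """Return the index of the first card with enough free processors, else None.

    First-fit is an assumption for illustration, not documented scheduler behavior.
    """
    for index, free in enumerate(free_per_card):
        if free >= requested:
            return index
    return None
```

For example, validate_request("card", 5) raises ValueError because five processors cannot fit on one card, while validate_request("chip", 5) passes. With free_per_card = [1, 2, 4], a request for 3 processors under the card strategy can only be placed on the card at index 2.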