Inference Server (with Atlas 300I Duo Inference Cards)

The inference server (with Atlas 300I Duo inference cards) supports affinity scheduling. An Atlas 800 inference server (model 3000) holds a maximum of four Atlas 300I Duo inference cards, and each card has two Ascend AI processors, so a fully populated server provides up to eight Ascend AI processors. When you submit a job YAML file to such a server, use duo to select the Atlas 300I Duo inference card, npu-310-strategy to set the scheduling mode, and distributed to set the scheduling policy. For details about the parameters, see Table 1.

Table 1 Parameter description

Parameter

Default Value

Value Description

duo

false

  • true: Use Atlas 300I Duo inference cards.
  • false: Do not use Atlas 300I Duo inference cards.

npu-310-strategy

chip

  • card: Schedule by inference card. A job can request at most 2 Ascend AI processors, and the allocated processors are on the same Atlas 300I Duo inference card.
  • chip: Schedule by Ascend AI processor. The number of Ascend AI processors requested cannot exceed the maximum number supported by a single node.

distributed

false

  • true: Distributed inference scheduling policy. When chip is specified, the job is scheduled onto entire Atlas 300I Duo inference cards. If the job requires an odd number of Ascend AI processors, it is preferentially scheduled to an Atlas 300I Duo inference card that has one Ascend AI processor remaining.
  • false: Non-distributed inference scheduling policy. When chip is specified, the number of requested Ascend AI processors cannot exceed the maximum number of processors on a single node.
    NOTE:

    The scheduling policy in card mode remains unchanged regardless of whether distributed inference is used.
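The limits in Table 1 can be sketched as a small pre-submission check. This is an illustration only, not part of the scheduler: the function names check_request and preferred_card_for_odd_processor are hypothetical, the per-node limit assumes a fully populated Atlas 800 inference server (model 3000), and the fallback used when no card has exactly one free processor is an assumption, not documented behavior.

```python
# Limits stated in the text for a fully populated Atlas 800 (model 3000).
MAX_CARDS_PER_NODE = 4
PROCESSORS_PER_CARD = 2
MAX_PROCESSORS_PER_NODE = MAX_CARDS_PER_NODE * PROCESSORS_PER_CARD


def check_request(requested, strategy="chip"):
    """Return None if the request fits the limits in Table 1, else an error string."""
    if requested < 1:
        # Sanity check added for this sketch; not taken from Table 1.
        return "at least one Ascend AI processor must be requested"
    if strategy == "card":
        if requested > PROCESSORS_PER_CARD:
            return "card mode allows at most 2 processors per request"
    elif strategy == "chip":
        if requested > MAX_PROCESSORS_PER_NODE:
            return "chip mode cannot exceed the processors on a single node"
    else:
        return f"unknown npu-310-strategy value: {strategy!r}"
    return None


def preferred_card_for_odd_processor(free_per_card):
    """Pick a card for the leftover processor of an odd-sized distributed job.

    Prefers a card with exactly one free processor, as Table 1 describes.
    The fallback to the first card with any free processor is an assumption.
    """
    for index, free in enumerate(free_per_card):
        if free == 1:
            return index
    for index, free in enumerate(free_per_card):
        if free > 0:
            return index
    return None
```

For example, check_request(3, "card") returns an error string because card mode is limited to the two processors on one card, while check_request(8, "chip") returns None on a server with four cards.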