Constraints

  • Currently, this feature is supported only by Kubernetes clusters that use cluster scheduling components.
  • This feature depends on the Volcano and Asend Device Plugin components. When the rescheduling policy is enabled, an exception of Ascend Device Plugin also triggers rescheduling upon a fault.
  • The fault rescheduling label "fault-scheduling: grace" needs to be added to jobs of the deployment type. "fault-scheduling: force" is not supported.
  • Table 1 lists the supported Ascend hardware.
    Table 1 Supported Ascend hardware

    Server

    Model

    Inference server

    Atlas 800 inference server (model 3000 and 3010) and inference server with the Atlas 300I, Atlas 300I Pro, Atlas 300I Duo, Atlas 300V Pro, or Atlas 300V inference cards