Process Description

If the RoCE network is not configured:

In non-SuperPoD scheduling scenarios, single-node inference instances can still be scheduled, but KV cache transfer between inference instances may not work correctly. As a result, inference jobs cannot run properly.
In SuperPoD scheduling scenarios, if there is only one logical SuperPoD for inference instances, the inference instances can still be scheduled, but KV cache transfer between them may not work correctly. As a result, inference jobs cannot run properly.

MindIE Motor consists of MindIE Management Service (MS) and MindIE Server. MindIE MS consists of MS Controller and MS Coordinator. MindIE Server instances are divided into prefill instances and decode instances. MS Controller and MS Coordinator do not need NPU resources, while MindIE Server does.

MindCluster cluster scheduling components allow MS Controller, MS Coordinator, and MindIE Server to run in separate pods. When MindCluster cluster scheduling components are used to deploy MindIE Motor jobs, each instance of MS Controller, MS Coordinator, and MindIE Server is deployed as an AscendJob. For example, if an inference job contains two prefill instances and one decode instance, five AscendJobs need to be deployed: one for MS Controller, one for MS Coordinator, two for the prefill instances, and one for the decode instance.
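The AscendJob count in the example above can be checked with a short calculation. The following Python sketch is illustrative only: the function name and the dictionary keys are hypothetical and are not MindCluster or MindIE identifiers. It assumes one MS Controller and one MS Coordinator per inference job, which is what the five-job example implies.

```python
def plan_ascend_jobs(prefill_instances: int, decode_instances: int) -> dict:
    """Return the number of AscendJobs per component for one inference job.

    Assumption (inferred from the example in the text): each inference job
    has exactly one MS Controller and one MS Coordinator, and every
    component instance is deployed as its own AscendJob.
    """
    if prefill_instances < 0 or decode_instances < 0:
        raise ValueError("instance counts must be non-negative")
    return {
        "ms_controller": 1,                    # hypothetical label; no NPU needed
        "ms_coordinator": 1,                   # hypothetical label; no NPU needed
        "server_prefill": prefill_instances,   # MindIE Server, needs NPU
        "server_decode": decode_instances,     # MindIE Server, needs NPU
    }


plan = plan_ascend_jobs(prefill_instances=2, decode_instances=1)
total_jobs = sum(plan.values())
npu_jobs = plan["server_prefill"] + plan["server_decode"]
print(f"AscendJobs to deploy: {total_jobs} ({npu_jobs} of them need NPU resources)")
```

With two prefill instances and one decode instance, the sketch yields five AscendJobs, three of which (the MindIE Server instances) need NPU resources.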

For details about prefill-decode disaggregation deployment, see "Cluster Service Deployment" > "Prefill-Decode Disaggregation" in the MindIE Motor Development Guide.

Procedure

The following figure shows the procedure for using MindCluster cluster scheduling components to deploy MindIE Motor inference jobs through the CLI.

Figure 1 Procedure
