Scenario

Single-node service deployment deploys an instance in non-distributed mode: a complete, independent Server inference service instance runs on one compute node. Depending on the available device resources, multiple Server instances can be deployed on a single compute node or across multiple compute nodes.

Coordinator as the External Service Entry

Inference request: All requests are sent to Coordinator through the scheduling entry of a user-deployed third-party platform (such as the scheduling entry of Kubernetes or MA). Coordinator then distributes the requests among the Server instances according to its supported load scheduling algorithms. For details, see Deploying a Service Using kubectl. When this mode is used to deploy a single-node (non-distributed) service, see Serving APIs for the supported interfaces.

Figure 1 Coordinator as the external service entry

The following table lists the scheduling algorithms supported in the single-node deployment scenario.

| Scheduling Algorithm | Description | Deployment Suggestions |
| --- | --- | --- |
| cache_affinity | Cache affinity scheduling algorithm used only in the OpenAI multi-round session scenario. | Recommended in the OpenAI multi-round session scenario. |
| round_robin | Round-robin scheduling algorithm used in non-OpenAI multi-round session scenarios. | Used by default when a non-OpenAI multi-round session interface is used. You do not need to configure it. |
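The difference between the two scheduling algorithms can be illustrated with a minimal Python sketch. This is not the Coordinator implementation: the class names, the `pick` method, the instance labels, and the choice of hashing a session ID to select an instance are all assumptions made for illustration. The only ideas taken from the descriptions above are that round-robin rotates through the Server instances in turn, and that cache affinity keeps a multi-round session on the instance that is likely to hold its cached context.

```python
import hashlib
from itertools import cycle


class RoundRobinScheduler:
    """Hypothetical sketch: hand out Server instances in a fixed rotation."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one Server instance is required")
        self._cycle = cycle(list(instances))

    def pick(self):
        # Each call returns the next instance, wrapping around at the end.
        return next(self._cycle)


class CacheAffinityScheduler:
    """Hypothetical sketch: route every round of a session to one instance.

    Assumption: affinity is approximated by hashing a session ID. The real
    cache_affinity algorithm may use a different criterion.
    """

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one Server instance is required")
        self._instances = list(instances)

    def pick(self, session_id):
        # A stable hash keeps the mapping identical across calls and processes.
        digest = hashlib.sha256(session_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._instances)
        return self._instances[index]


if __name__ == "__main__":
    servers = ["server-0", "server-1", "server-2"]  # placeholder labels

    rr = RoundRobinScheduler(servers)
    print([rr.pick() for _ in range(4)])
    # ['server-0', 'server-1', 'server-2', 'server-0']

    ca = CacheAffinityScheduler(servers)
    first_round = ca.pick("session-42")
    second_round = ca.pick("session-42")
    print(first_round == second_round)  # True: same session, same instance
```

In this sketch, round-robin spreads load evenly but ignores which instance served the previous round of a conversation, whereas the affinity variant trades some balance for a better chance of reusing cached context in later rounds.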

To maintain service stability, strictly control the permissions of custom pods so that high-privilege pods cannot modify internal parameters of MindIE, which may cause exceptions.