External Shared Experts

  • External shared experts: Shared experts are deployed on an independent NPU so that they are separated from routed and redundant experts. During load balancing, only routed experts are included in the computation.

    Computation process: dispatch > simultaneous computing of shared and routed experts > combine

  • Built-in shared experts: Shared experts and routed/redundant experts are deployed on a single NPU. During load balancing, only routed experts are included in the computation.

    Computation process: matmul for shared experts > dispatch > routed experts > combine > merging of shared and routed expert results

  • Mixed deployment of shared experts: Shared experts are treated as routed experts and take part in load balancing.

    Computation process: dispatch > simultaneous computing of shared and routed experts > combine
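
The step order of the built-in flow can be illustrated with a toy, single-process sketch. Everything in it is an assumption made for illustration: the scalar "experts", the top-1 routing without gate weights, the function names, and the final step being a plain sum of shared and routed outputs. In a real deployment, dispatch and combine are cross-device communication steps, not list operations.

```python
# Toy sketch of the built-in shared expert flow (hypothetical names throughout).
# Real experts are feed-forward networks; scalar functions stand in for them here.

def shared_expert(x):
    return 2.0 * x

ROUTED_EXPERTS = [lambda x: x + 1.0, lambda x: 10.0 * x]


def dispatch(tokens, assignment):
    """Group (token index, token) pairs by the routed expert each token is sent to."""
    buckets = {}
    for index, (token, expert_id) in enumerate(zip(tokens, assignment)):
        buckets.setdefault(expert_id, []).append((index, token))
    return buckets


def combine(expert_outputs, num_tokens):
    """Scatter per-expert outputs back into the original token order."""
    combined = [0.0] * num_tokens
    for outputs in expert_outputs.values():
        for index, value in outputs:
            combined[index] = value
    return combined


def built_in_flow(tokens, assignment):
    # 1. matmul for shared experts (computed before dispatch)
    shared_out = [shared_expert(t) for t in tokens]
    # 2. dispatch tokens to their routed experts
    buckets = dispatch(tokens, assignment)
    # 3. routed expert computation
    routed_out = {
        expert_id: [(i, ROUTED_EXPERTS[expert_id](t)) for i, t in items]
        for expert_id, items in buckets.items()
    }
    # 4. combine routed results back into token order
    combined = combine(routed_out, len(tokens))
    # 5. merge shared and routed results (assumed here to be a sum)
    return [s + r for s, r in zip(shared_out, combined)]
```

In the external and mixed flows, step 1 would instead run alongside step 3, between dispatch and combine.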

Constraints

  • Only DeepSeek V3/R1 is supported.
  • External shared experts can be configured on their own, without load balancing, only on the 144-device Atlas 800I A3 SuperPoD Server. Enabling load balancing in this scenario improves performance.
  • Mixed deployment of shared experts and built-in shared experts both require load balancing to be enabled and ep_level to be set to 2.
  • External shared experts are supported by the Atlas 800I A3 SuperPoD Server only. Mixed deployment of shared experts is supported by the Atlas 800I A2 inference server only. Built-in shared experts are supported by the Atlas 800I A2 inference server and Atlas 800I A3 SuperPoD Server.
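
The hardware and load balancing constraints above can be written down as a small pre-flight check. This is a minimal sketch of one reading of those rules, not part of MindIE: the function name, the mode labels ("external", "mixed", "built_in"), and the hardware labels ("A2", "A3") are all hypothetical.

```python
# Hypothetical pre-flight check; only the rules themselves come from the constraints above.
SUPPORTED_HARDWARE = {
    "external": {"A3"},        # Atlas 800I A3 SuperPoD Server only
    "mixed": {"A2"},           # Atlas 800I A2 inference server only
    "built_in": {"A2", "A3"},  # both
}


def check_deployment(mode, hardware, load_balancing, ep_level, device_count=None):
    """Return a list of constraint violations; an empty list means none were found."""
    if mode not in SUPPORTED_HARDWARE:
        return [f"unknown deployment mode: {mode!r}"]
    problems = []
    if hardware not in SUPPORTED_HARDWARE[mode]:
        problems.append(f"{mode} shared experts are not supported on {hardware}")
    if mode in ("mixed", "built_in"):
        if not load_balancing:
            problems.append(f"{mode} shared experts require load balancing")
        if ep_level != 2:
            problems.append(f"{mode} shared experts require ep_level 2")
    # Assumed reading: external shared experts may run without load balancing
    # only in the 144-device configuration.
    if mode == "external" and not load_balancing and device_count != 144:
        problems.append(
            "external shared experts without load balancing require the 144-device configuration"
        )
    return problems
```

For example, `check_deployment("mixed", "A3", False, 1)` reports three problems: unsupported hardware, missing load balancing, and a wrong ep_level.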

Usage Example

  • (Recommended) Enable expert load balancing.
    1. Generate expert deployment tables by referring to Generating Redundant Expert Deployment Tables.
    2. Modify the following parameters in the configuration file.
      "models": {
        "deepseekv2": {
          "ep_level": 2,
          "eplb": {
            "level": 1,
            "expert_map_file": "xxxx.json"
          }
        }
      }
      
  • Use external shared experts on their own, with expert load balancing disabled (144-device Atlas 800I A3 SuperPoD Server only).

    Modify the following parameters in the configuration file.

    "models": {
      "deepseekv2": {
        "ep_level": 2,
        "num_dangling_shared_experts": 32
      }
    }
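
If the configuration is edited by script rather than by hand, the two variants in this section can be applied to an already loaded configuration dictionary, as in the following sketch. The helper names are hypothetical. The functions assume they receive the object that directly contains the "models" key; where that object sits inside the full configuration file is not specified here. The default of 32 is simply the value from the example above.

```python
import copy


def enable_eplb(config, expert_map_file, model_key="deepseekv2"):
    """Return a copy of config with expert load balancing enabled as in the example."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    model = updated.setdefault("models", {}).setdefault(model_key, {})
    model["ep_level"] = 2
    model["eplb"] = {"level": 1, "expert_map_file": expert_map_file}
    return updated


def enable_external_shared_experts(config, num_dangling=32, model_key="deepseekv2"):
    """Return a copy of config set up for external shared experts without load balancing."""
    updated = copy.deepcopy(config)
    model = updated.setdefault("models", {}).setdefault(model_key, {})
    model["ep_level"] = 2
    model["num_dangling_shared_experts"] = num_dangling
    return updated
```

Both helpers preserve any other keys already present under the model entry, so they can be applied to an existing configuration, which can then be written back with `json.dump`.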
    

Running Inference

  1. Set serving parameters. For details about the path of the serving config.json file, see the software package file list in "MindIE Configuration" > "Server Configuration" > "Single-Node Inference" in MindIE Installation Guide. For details about parameter settings, see Usage Example.
  2. Start the service. For details, see "Quick Start" > "Service Startup" in MindIE Motor Development Guide.