Generating Redundant Expert Deployment Tables

After the hotspot information is collected, each NPU generates a .csv file that contains a matrix of size num_moe_layer × (number of experts per NPU). Each entry in the matrix is the number of tokens processed by the corresponding expert in that layer. A new matrix is appended to the collection file every eight tokens.
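Before passing the collected files to elb, it can help to confirm that a file has the shape described above. The following is a minimal sketch, not part of msIT. It assumes the file is comma-separated with no header row, that each row corresponds to one MoE layer, and that the appended matrices follow one another directly. The function name check_hotspot_csv is hypothetical.

```shell
# Sanity-check one collected hotspot file.
# Assumed layout (not confirmed by the msIT documentation): comma-separated
# integers, no header, one row per MoE layer, matrices appended back to back.
check_hotspot_csv() {
    local csv_file="$1" num_moe_layer="$2"
    awk -F',' -v layers="$num_moe_layer" '
        NF == 0 { next }                 # skip blank lines
        { rows++ }
        rows == 1 { cols = NF }          # first row fixes the experts-per-NPU count
        NF != cols {
            printf "row %d has %d columns, expected %d\n", rows, NF, cols
            bad = 1
        }
        END {
            if (rows == 0 || rows % layers != 0) {
                printf "%d rows is not a multiple of %d layers\n", rows, layers
                bad = 1
            }
            if (bad) exit 1
            printf "snapshots=%d experts_per_npu=%d\n", rows / layers, cols
        }
    ' "$csv_file"
}
```

Call it with the path of one .csv file and the model's number of MoE layers. A non-zero exit status means the file does not match the assumed layout.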

Use the elb component of the msIT tool to generate redundant expert deployment tables based on the collected expert hotspot information.

  1. Install the elb component.
    # 1. Clone the msIT repository.
    git clone https://gitcode.com/Ascend/msit.git
    cd msit/msit
    
    # 2. Install msIT.
    pip install .
     
    # 3. Run the msit install command to install the required elb component.
    msit install elb
     
    # 4. Run the msit check command to check whether the installation is successful.
    msit check all
  2. The installation is successful if msit-elb reports OK, as in the following output. Components that have not been installed report "not install yet."
    2025-07-16 15:08:58,383 - 36266 - msit_llm_logger - INFO - msit-surgeon
    2025-07-16 15:08:58,395 - 36266 - msit_llm_logger - INFO -   not install yet.
    2025-07-16 15:08:58,395 - 36266 - msit_llm_logger - INFO - msit-analyze
    2025-07-16 15:08:58,407 - 36266 - msit_llm_logger - INFO -   not install yet.
    2025-07-16 15:08:58,407 - 36266 - msit_llm_logger - INFO - msit-convert
    2025-07-16 15:08:58,419 - 36266 - msit_llm_logger - INFO -   not install yet.
    2025-07-16 15:08:58,419 - 36266 - msit_llm_logger - INFO - msit-profile
    2025-07-16 15:08:58,431 - 36266 - msit_llm_logger - INFO -   not install yet.
    2025-07-16 15:08:58,431 - 36266 - msit_llm_logger - INFO - msit-tensor-view
    2025-07-16 15:08:58,443 - 36266 - msit_llm_logger - INFO -   not install yet.
    2025-07-16 15:08:58,443 - 36266 - msit_llm_logger - INFO - msit-benchmark
    2025-07-16 15:08:58,454 - 36266 - msit_llm_logger - INFO -   not install yet.
    2025-07-16 15:08:58,454 - 36266 - msit_llm_logger - INFO - msit-compare
    2025-07-16 15:08:58,465 - 36266 - msit_llm_logger - INFO -   not install yet.
    2025-07-16 15:08:58,465 - 36266 - msit_llm_logger - INFO - msit-opcheck
    2025-07-16 15:08:58,476 - 36266 - msit_llm_logger - INFO -   not install yet.
    2025-07-16 15:08:58,476 - 36266 - msit_llm_logger - INFO - msit-graph
    2025-07-16 15:08:58,488 - 36266 - msit_llm_logger - INFO -   not install yet.
    2025-07-16 15:08:58,488 - 36266 - msit_llm_logger - INFO - msit-elb
    2025-07-16 15:08:58,632 - 36266 - msit_llm_logger - INFO -   OK
  3. Use the elb component to generate redundant expert deployment tables. For details, see the guide on affinity-based expert determination for load balancing. A typical command for an 8-server, 64-device configuration is as follows:
    msit elb -icp input_dir_path -o output_file_path -nre 0 -nd 8 -nn 64 -al 5 -dt a2

    msIT provides two types of load balancing algorithms: compute-communication load balancing (C2LB) and the speculative-moe interface algorithms. Currently, the speculative-moe level 2 hybrid algorithm (-al 5) delivers the best performance on the Atlas 800I A2 inference server, and the speculative-moe level 2 algorithm (-al 3) delivers the best performance on the Atlas 800I A3 SuperPoD server.
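The installation check in step 2 can also be scripted instead of read by eye. The sketch below is an illustration only. It relies on the log layout shown in the sample output of step 2, where the status line directly follows the component name, and the helper name elb_is_installed is hypothetical.

```shell
# Reads the output of `msit check all` on stdin and succeeds only if the
# line that follows "msit-elb" ends with OK.
# Assumption: the log layout matches the sample output in step 2.
elb_is_installed() {
    awk '
        /msit-elb[[:space:]]*$/ { seen = 1; next }   # component name line
        seen { ok = ($NF == "OK"); exit }            # status line right after it
        END { exit(ok ? 0 : 1) }
    '
}
```

A possible invocation is `msit check all 2>&1 | elb_is_installed && echo "elb is ready"`. The `2>&1` redirection is there because the logger may write to standard error.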

  • In the prefill-decode disaggregation scenario, redundant expert deployment tables can be generated separately for the prefill phase and the decode phase.
  • In the prefill-decode overlap scenario, only the decode phase needs redundant expert deployment tables to improve performance.
  • If an out-of-memory (OOM) error occurs while you collect expert hotspot information in long-sequence scenarios, reduce the sequence length.
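For the prefill-decode disaggregation scenario, generating one table per phase amounts to running msit elb once per set of collected hotspot files. The sketch below only prints the commands so that they can be reviewed first. It reuses the flags and values of the 8-server example in step 3. The directory names hotspot_<phase> and deploy_<phase> and the function name print_elb_cmds are hypothetical placeholders. If the prefill and decode instances use different device counts, the -nd and -nn values would need adjusting.

```shell
# Prints (does not run) one msit elb command per phase.
# Flags and values are copied from the 8-server example in step 3.
# Input and output paths are hypothetical placeholders.
print_elb_cmds() {
    local algorithm="$1"; shift
    local phase
    for phase in "$@"; do
        printf 'msit elb -icp %s -o %s -nre 0 -nd 8 -nn 64 -al %s -dt a2\n' \
            "hotspot_${phase}" "deploy_${phase}" "$algorithm"
    done
}
```

For example, `print_elb_cmds 5 prefill decode` prints two commands, one for each phase, and `print_elb_cmds 5 decode` covers the case where only the decode phase needs a table.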