Generating Redundant Expert Deployment Tables
After the hotspot information is collected, each NPU generates a .csv file that contains a matrix of size num_moe_layer × the number of experts per NPU. Each entry in the matrix is the number of tokens processed by the corresponding expert in that layer. A new matrix is appended to the collection file every eight tokens.
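As a rough illustration of how such a collection file could be inspected, the following Python sketch sums all appended matrices into one per-layer total. It rests on several assumptions that this section does not confirm: the file has no header row, values are comma-separated integers, every appended block has exactly num_moe_layer rows, and each block holds incremental (not cumulative) counts. The function name sum_hotspot_matrices and the sample data are hypothetical.

```python
import csv
import io


def sum_hotspot_matrices(csv_text, num_moe_layer):
    """Sum all appended (num_moe_layer x experts_per_npu) blocks into one matrix.

    Assumed layout (not confirmed by the documentation): no header,
    comma-separated integers, blocks stacked one after another.
    """
    rows = [
        [int(cell) for cell in row]
        for row in csv.reader(io.StringIO(csv_text))
        if row  # skip blank lines
    ]
    if not rows or len(rows) % num_moe_layer != 0:
        raise ValueError("row count is not a multiple of num_moe_layer")

    experts_per_npu = len(rows[0])
    totals = [[0] * experts_per_npu for _ in range(num_moe_layer)]
    for index, row in enumerate(rows):
        if len(row) != experts_per_npu:
            raise ValueError("inconsistent row width")
        layer = index % num_moe_layer  # position of this row inside its block
        for expert, count in enumerate(row):
            totals[layer][expert] += count
    return totals


# Two appended snapshots for a toy setup: 2 MoE layers, 3 experts per NPU.
sample = "5,0,3\n1,4,3\n2,2,4\n0,8,0\n"
totals = sum_hotspot_matrices(sample, num_moe_layer=2)
print(totals)  # [[7, 2, 7], [1, 12, 3]]
```

In this toy data, the second expert in the second layer receives the most tokens (12), which is the kind of hotspot that a redundant expert deployment is meant to relieve.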
Use the elb component of the msIT tool to generate redundant expert deployment tables based on the collected expert hotspot information.
- The following describes how to install the elb component.
# 1. git clone
git clone https://gitcode.com/Ascend/msit.git
cd msit/msit
# 2. Install msIT.
pip install .
# 3. Run the msit install command to install the required elb component.
msit install elb
# 4. Run the msit check command to check whether the installation is successful.
msit check all
- The installation is successful if the msit-elb entry reports OK, as in the following output. Components that have not been installed report "not install yet."
2025-07-16 15:08:58,383 - 36266 - msit_llm_logger - INFO - msit-surgeon
2025-07-16 15:08:58,395 - 36266 - msit_llm_logger - INFO - not install yet.
2025-07-16 15:08:58,395 - 36266 - msit_llm_logger - INFO - msit-analyze
2025-07-16 15:08:58,407 - 36266 - msit_llm_logger - INFO - not install yet.
2025-07-16 15:08:58,407 - 36266 - msit_llm_logger - INFO - msit-convert
2025-07-16 15:08:58,419 - 36266 - msit_llm_logger - INFO - not install yet.
2025-07-16 15:08:58,419 - 36266 - msit_llm_logger - INFO - msit-profile
2025-07-16 15:08:58,431 - 36266 - msit_llm_logger - INFO - not install yet.
2025-07-16 15:08:58,431 - 36266 - msit_llm_logger - INFO - msit-tensor-view
2025-07-16 15:08:58,443 - 36266 - msit_llm_logger - INFO - not install yet.
2025-07-16 15:08:58,443 - 36266 - msit_llm_logger - INFO - msit-benchmark
2025-07-16 15:08:58,454 - 36266 - msit_llm_logger - INFO - not install yet.
2025-07-16 15:08:58,454 - 36266 - msit_llm_logger - INFO - msit-compare
2025-07-16 15:08:58,465 - 36266 - msit_llm_logger - INFO - not install yet.
2025-07-16 15:08:58,465 - 36266 - msit_llm_logger - INFO - msit-opcheck
2025-07-16 15:08:58,476 - 36266 - msit_llm_logger - INFO - not install yet.
2025-07-16 15:08:58,476 - 36266 - msit_llm_logger - INFO - msit-graph
2025-07-16 15:08:58,488 - 36266 - msit_llm_logger - INFO - not install yet.
2025-07-16 15:08:58,488 - 36266 - msit_llm_logger - INFO - msit-elb
2025-07-16 15:08:58,632 - 36266 - msit_llm_logger - INFO - OK
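If the check needs to be automated, output of this shape can be scanned for the msit-elb status. The Python sketch below is only an illustration: it assumes one log record per line, that each component name (a message starting with "msit-") is followed by its status on the next record, and that the " - INFO - " separator is stable. The helper name parse_msit_check is hypothetical and is not part of msIT.

```python
def parse_msit_check(log_text):
    """Map each component name to the status reported in the next record.

    Assumes the record layout of the sample output shown in this section.
    """
    messages = [
        line.rsplit(" - INFO - ", 1)[1].strip()
        for line in log_text.splitlines()
        if " - INFO - " in line
    ]
    statuses = {}
    current = None
    for message in messages:
        if message.startswith("msit-"):
            current = message  # a component name; its status comes next
        elif current is not None:
            statuses[current] = message
            current = None
    return statuses


sample_log = """\
2025-07-16 15:08:58,476 - 36266 - msit_llm_logger - INFO - msit-graph
2025-07-16 15:08:58,488 - 36266 - msit_llm_logger - INFO - not install yet.
2025-07-16 15:08:58,488 - 36266 - msit_llm_logger - INFO - msit-elb
2025-07-16 15:08:58,632 - 36266 - msit_llm_logger - INFO - OK
"""
statuses = parse_msit_check(sample_log)
elb_ready = statuses.get("msit-elb") == "OK"
print(statuses, elb_ready)
```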
- Use the elb component to generate redundant expert deployment tables. For details, see the guide on affinity-based expert determination for load balancing. The typical 8-server 64-device configuration is as follows:
msit elb -icp input_dir_path -o output_file_path -nre 0 -nd 8 -nn 64 -al 5 -dt a2
msIT provides two types of load balancing algorithms: compute-communication load balancing (C2LB) and the speculative-moe interface algorithms. Currently, the speculative-moe level 2 hybrid algorithm (AL 5) performs best on the Atlas 800I A2 inference server, and the speculative-moe level 2 algorithm (AL 3) performs best on the Atlas 800I A3 SuperPoD server.
- In the prefill-decode disaggregation scenario, separate redundant expert deployment tables can be generated for the prefill phase and the decode phase.
- In the prefill-decode overlap scenario, a redundant expert deployment table is needed only for the decode phase to improve performance.
- If an out-of-memory (OOM) error occurs while you collect expert hotspot information in long-sequence scenarios, you are advised to reduce the sequence length.
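For the prefill-decode disaggregation scenario, the elb command can be assembled once per phase. The Python sketch below only builds the argument lists and reuses the flag values from the 8-server 64-device example above. The input and output paths, the helper name build_elb_command, and the assumption that the algorithm number is the value passed to -al are illustrative and not confirmed by this section; see the msIT elb guide for the meaning of each flag.

```python
import shlex


def build_elb_command(input_dir, output_path, algorithm=5, device_type="a2"):
    """Return the argv list for one msit elb run.

    Flag values mirror the 8-server 64-device example; only the paths,
    the -al value, and the -dt value are parameterized here.
    """
    return [
        "msit", "elb",
        "-icp", input_dir,
        "-o", output_path,
        "-nre", "0",
        "-nd", "8",
        "-nn", "64",
        "-al", str(algorithm),
        "-dt", device_type,
    ]


# Hypothetical directories holding the hotspot data collected per phase.
phases = {
    "prefill": ("hotspot/prefill", "out/prefill"),
    "decode": ("hotspot/decode", "out/decode"),
}
commands = {
    phase: build_elb_command(src, dst) for phase, (src, dst) in phases.items()
}
for phase, argv in commands.items():
    print(phase, "->", shlex.join(argv))
```

Each argument list can be passed to subprocess.run(argv, check=True) once msIT and the elb component are installed. In the prefill-decode overlap scenario, only the decode entry would be needed.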