Network-Wide Profiling and Comparison
After identifying the token with significant variations in logits accuracy, dump the network-wide accuracy data for both the benchmark model and the ATB model. Then, use the compare tool to perform an accuracy comparison and locate the issue.
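As a rough illustration of what a tensor-level accuracy comparison measures, the sketch below scores a benchmark tensor against the tensor from the model under test. This section does not list the metrics that msit llm compare reports, so cosine similarity and maximum absolute error are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(golden: Sequence[float], actual: Sequence[float]) -> float:
    """Cosine of the angle between two flattened tensors (1.0 = same direction)."""
    if len(golden) != len(actual):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(g * a for g, a in zip(golden, actual))
    norm_g = math.sqrt(sum(g * g for g in golden))
    norm_a = math.sqrt(sum(a * a for a in actual))
    if norm_g == 0.0 or norm_a == 0.0:
        # Degenerate case: two all-zero tensors count as identical.
        return 1.0 if norm_g == norm_a else 0.0
    return dot / (norm_g * norm_a)


def max_abs_error(golden: Sequence[float], actual: Sequence[float]) -> float:
    """Largest element-wise absolute difference between the two tensors."""
    if len(golden) != len(actual):
        raise ValueError("tensors must have the same number of elements")
    return max((abs(g - a) for g, a in zip(golden, actual)), default=0.0)
```

A layer whose output drops well below a cosine similarity of 1.0, or shows a large absolute error, while earlier layers match is a natural first candidate when locating the issue.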
For Model A, dump the network-wide accuracy data for the third token.
- Collect the network-wide data of the benchmark model. Sample code:
    import torch
    from msit_llm import DumpConfig, register_hook
    from transformers import AutoTokenizer, AutoModelForCausalLM

    # Enable deterministic computation before initializing the inference model
    from msit_llm import seed_all
    seed_all(seed=2345)

    # Configure dump parameters:
    # token_range=list([3]) indicates that the network-wide data of the third token will be collected
    # dump_path="/data/golden_dump_all_path" is the path to save the dumped data; replace with actual path
    dump_config = DumpConfig(token_range=list([3]), dump_path="/data/golden_dump_all_path")

    # Initialize the inference model
    model_weight_path = "/data/model_path"  # model_weight_path is the model A weight path; replace with actual path
    tokenizer = AutoTokenizer.from_pretrained(model_weight_path)
    model = AutoModelForCausalLM.from_pretrained(model_weight_path).cuda()
    register_hook(model, dump_config)  # model is the model instance for which intermediate tensors will be dumped. Add the code after model initialization

    with torch.no_grad():
        # Inference process code
- Collect the network-wide data of the ATB model.
The following is a code example. ATB_DUMP_ALL_PATH indicates the path for saving the dump data.
    msit llm dump --exec "bash run.sh" -er 3,3 -o {ATB_DUMP_ALL_PATH} -seed 2345
- Use the msit llm compare tool to compare the network-wide accuracy data.
The following is an example. GOLDEN_DUMP_ALL_PATH indicates the path for saving the dump data of the benchmark model, and COMPARE_PATH indicates the path for saving the comparison results.
    msit llm compare -gp {GOLDEN_DUMP_ALL_PATH}/msit_dump_{TIMESTAMP}/torch_tensors/cuda{device_id}_{PID}/ -mp {ATB_DUMP_ALL_PATH}/msit_dump_{TIMESTAMP}/tensors/{device_id}_{PID} -o {COMPARE_PATH}
- After the comparison is complete, the comparison result file msit_cmp_report_{TIMESTAMP} is generated and saved in {COMPARE_PATH}.
- After confirming the operator name, use the msit llm opcheck tool to perform a pre-check on the operator's accuracy, determining whether the accuracy of the ATB operator meets the requirements.
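For the comparison step above, the {TIMESTAMP}, {device_id}, and {PID} placeholders have to be filled in from whatever the two dump runs produced. The sketch below shows one way to resolve them automatically and assemble the msit llm compare argument list. It assumes the directory layout shown in the compare command, picks the most recently modified match when several dumps exist, and uses hypothetical helper names (latest_match, build_compare_command) that are not part of msit.

```python
import glob
import os
from typing import List


def latest_match(pattern: str) -> str:
    """Return the most recently modified directory matching a glob pattern."""
    candidates = [p for p in glob.glob(pattern) if os.path.isdir(p)]
    if not candidates:
        raise FileNotFoundError(f"no directory matches {pattern!r}")
    return max(candidates, key=os.path.getmtime)


def build_compare_command(golden_dump_all_path: str,
                          atb_dump_all_path: str,
                          compare_path: str) -> List[str]:
    """Assemble the `msit llm compare` arguments from the two dump roots."""
    # Benchmark side: {GOLDEN_DUMP_ALL_PATH}/msit_dump_{TIMESTAMP}/torch_tensors/cuda{device_id}_{PID}
    golden_dir = latest_match(
        os.path.join(golden_dump_all_path, "msit_dump_*", "torch_tensors", "cuda*_*")
    )
    # ATB side: {ATB_DUMP_ALL_PATH}/msit_dump_{TIMESTAMP}/tensors/{device_id}_{PID}
    atb_dir = latest_match(
        os.path.join(atb_dump_all_path, "msit_dump_*", "tensors", "*_*")
    )
    return ["msit", "llm", "compare",
            "-gp", golden_dir, "-mp", atb_dir, "-o", compare_path]
```

The returned list can be passed to subprocess.run(cmd, check=True) or printed and run by hand. If a dump root holds several msit_dump_* directories, check that the selected pair comes from the runs you intend to compare.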

