Network-Wide Profiling and Comparison

After identifying the token whose logits deviate significantly in accuracy, dump the network-wide accuracy data for that token from both the benchmark model and the ATB model. Then use the msit llm compare tool to compare the two dumps and locate the issue.

For model A, dump the network-wide accuracy data for the third token.

Collect the network-wide data of the benchmark model.

Sample code:

```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from msit_llm import DumpConfig, register_hook, seed_all

# Enable deterministic computation before initializing the inference model
seed_all(seed=2345)

# Configure dump parameters:
# token_range=[3] collects the network-wide data of the third token
# dump_path is the directory where the dumped data is saved; replace with the actual path
dump_config = DumpConfig(token_range=[3], dump_path="/data/golden_dump_all_path")

# Initialize the inference model
model_weight_path = "/data/model_path"  # weight path of model A; replace with the actual path
tokenizer = AutoTokenizer.from_pretrained(model_weight_path)
model = AutoModelForCausalLM.from_pretrained(model_weight_path).cuda()

# Register the dump hook after model initialization;
# model is the model instance whose intermediate tensors will be dumped
register_hook(model, dump_config)

with torch.no_grad():
    # Inference process code
    pass  # placeholder; replace with the actual inference call
```

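Collect the network-wide data of the ATB model.
The following is a code example. ATB_DUMP_ALL_PATH indicates the path for saving the dump data. The -er 3,3 range and the -seed 2345 value correspond to the token_range and seed used for the benchmark model, so that both dumps cover the same token under the same random seed.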
```
msit llm dump --exec "bash run.sh" -er 3,3 -o {ATB_DUMP_ALL_PATH} -seed 2345
```
Use the msit llm compare tool to compare the network-wide accuracy data.
The following is an example. GOLDEN_DUMP_ALL_PATH indicates the path for saving the dump data of the benchmark model, and COMPARE_PATH indicates the path for saving the comparison results.
```
msit llm compare -gp {GOLDEN_DUMP_ALL_PATH}/msit_dump_{TIMESTAMP}/torch_tensors/cuda{device_id}_{PID}/ -mp {ATB_DUMP_ALL_PATH}/msit_dump_{TIMESTAMP}/tensors/{device_id}_{PID} -o {COMPARE_PATH}
```
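The {TIMESTAMP}, {device_id}, and {PID} parts of the paths above are only known after the dumps have run. As a minimal sketch, assuming each dump root contains directories laid out as in the command above and that the timestamp in msit_dump_{TIMESTAMP} sorts lexicographically, the command line could be assembled as follows. The helper names newest_match and build_compare_command are hypothetical and not part of msit.

```python
import shlex
from pathlib import Path


def newest_match(root, pattern):
    """Return the lexicographically last directory under root that matches pattern, or None."""
    matches = sorted(p for p in Path(root).glob(pattern) if p.is_dir())
    return matches[-1] if matches else None


def build_compare_command(golden_root, atb_root, compare_path):
    """Assemble the msit llm compare command line from the two dump roots."""
    # Glob patterns mirror the directory layout shown in the command above.
    golden = newest_match(golden_root, "msit_dump_*/torch_tensors/cuda*_*")
    atb = newest_match(atb_root, "msit_dump_*/tensors/*_*")
    if golden is None or atb is None:
        raise FileNotFoundError("dump directory not found; check the dump paths")
    return shlex.join(
        ["msit", "llm", "compare", "-gp", str(golden), "-mp", str(atb), "-o", str(compare_path)]
    )
```

Printing the result of build_compare_command with the two dump roots and COMPARE_PATH gives a command that can be reviewed before it is run. If several dumps exist under one root, the sketch picks the last one by name, which may not be the one you want.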
After the comparison is complete, the comparison result file msit_cmp_report_{TIMESTAMP} is generated and saved in COMPARE_PATH.
Open the accuracy comparison result of model A and find the first tensor that does not meet the accuracy requirements, as shown in Figure 1.

Figure 1 First tensor that does not meet the accuracy requirements
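For a long report, the first failing tensor can also be located with a script. The following is a minimal sketch, assuming the report is a CSV file with one row per compared tensor in network order. The metric column name cosine_similarity and the 0.999 threshold are assumptions for illustration; use the column names and accuracy requirements that apply to your report. The sample rows are made-up data.

```python
import csv
import io
import math

COSINE_COLUMN = "cosine_similarity"  # assumed column name; check the report header
COSINE_THRESHOLD = 0.999             # assumed accuracy requirement


def first_failing_row(report_file, column=COSINE_COLUMN, threshold=COSINE_THRESHOLD):
    """Return (row_number, row) for the first row whose metric is NaN or below threshold."""
    reader = csv.DictReader(report_file)
    for row_number, row in enumerate(reader, start=1):
        try:
            metric = float(row.get(column) or "")
        except ValueError:
            continue  # no numeric value in this row, so nothing to judge
        if math.isnan(metric) or metric < threshold:
            return row_number, row
    return None


# Made-up sample data standing in for a real report file.
SAMPLE_REPORT = """my_data_path,cosine_similarity
/dump/0_Embedding/after/outtensor0.bin,1.000000
/dump/1_Layer/0_LinearOperation/after/outtensor0.bin,0.912345
/dump/1_Layer/1_Softmax/after/outtensor0.bin,0.850000
"""

hit = first_failing_row(io.StringIO(SAMPLE_REPORT))
if hit is not None:
    print(hit[0], hit[1]["my_data_path"])
```

For a real report, pass a file opened with open(report_path, newline="") instead of the in-memory sample.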

The my_data_path column of that tensor shows that the operator that introduced the issue is LinearOperation, as shown in Figure 2.

Figure 2 Name of the operator that introduces the issue
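The operator name can also be read from the my_data_path value programmatically. The following is a minimal sketch, assuming the dump path contains a directory component of the form {index}_{Name}Operation. That layout and the example path are assumptions for illustration, and operator_from_path is a hypothetical helper, not part of msit.

```python
import re
from pathlib import PurePosixPath


def operator_from_path(my_data_path):
    """Return the innermost path component named like '<index>_<Name>Operation', or None."""
    for part in reversed(PurePosixPath(my_data_path).parts):
        match = re.fullmatch(r"\d+_(\w+Operation)", part)
        if match:
            return match.group(1)
    return None


# Hypothetical path; real dump paths may be laid out differently.
example_path = "/atb_dump/tensors/0_1234/3/2_Decoder_layer/5_LinearOperation/after/outtensor0.bin"
print(operator_from_path(example_path))
```

Searching from the innermost component outward returns the operator closest to the tensor file when operations are nested.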

After confirming the operator name, use the msit llm opcheck tool to pre-check the operator's accuracy and determine whether the ATB operator meets the accuracy requirements.
