Network-Wide Profiling and Comparison

After identifying the token whose logits deviate significantly, dump the network-wide accuracy data of that token for both the benchmark model and the ATB model. Then use the compare tool to compare the two dumps and locate where the deviation is introduced.
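The token to dump is the first one whose logits diverge between the two models. As a minimal sketch of that preceding step, assuming the per-token logits of both models are already available as plain Python lists: the function names, the cosine-similarity metric, and the 0.999 threshold below are illustrative assumptions and are not part of msit.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length logit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat two all-zero vectors as identical, otherwise as fully different.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)

def first_divergent_token(golden_logits, atb_logits, threshold=0.999):
    """Return the index of the first token whose similarity is below threshold, or None."""
    for index, (golden, atb) in enumerate(zip(golden_logits, atb_logits)):
        if cosine_similarity(golden, atb) < threshold:
            return index
    return None

# Toy data: one logit vector per generated token (hypothetical values).
golden = [[1.0, 2.0, 3.0], [0.5, 0.1, 0.9], [2.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
atb = [[1.0, 2.0, 3.0], [0.5, 0.1, 0.9], [2.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
divergent_index = first_divergent_token(golden, atb)
print(divergent_index)
```

The returned index is the value to pass as the token to dump in the steps that follow. A stricter or looser threshold may be appropriate depending on the data type used for inference.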

For Model A, dump the network-wide accuracy data for the third token.

  1. Profile the network-wide data of the benchmark model.
    Sample code:
    import torch
    from msit_llm import DumpConfig, register_hook, seed_all
    from transformers import AutoTokenizer, AutoModelForCausalLM

    # Before initializing the inference model, enable deterministic computation.
    seed_all(seed=2345)

    # Configure dump parameters:
    # token_range=[3] indicates that the network-wide data of the third token is collected.
    dump_config = DumpConfig(token_range=[3], dump_path="dump data storage path")

    # Initialize the inference model.
    model_weight_path = "Model A's weight path"
    tokenizer = AutoTokenizer.from_pretrained(model_weight_path)
    model = AutoModelForCausalLM.from_pretrained(model_weight_path).cuda()

    # Register the hook after model initialization; model is the instance whose intermediate tensors are dumped.
    register_hook(model, dump_config)

    with torch.no_grad():
        # Inference process code goes here.
        pass
    
  2. Collect the network-wide data of the ATB model.

    Command example:

    msit llm dump --exec "bash run.sh" -er 3,3 -o "dump data saving path" -seed 2345
  3. Use the compare tool to compare the network-wide accuracy data.

    Example:

    msit llm compare -gp {GOLDEN_DUMP_DIR}/msit_dump_{TIMESTAMP}/torch_tensors/cuda{device_id}_{PID}/ -mp {ATB_DUMP_DIR}/msit_dump_{TIMESTAMP}/tensors/{device_id}_{PID} -o "Comparison result saving path"
  4. After the comparison is complete, the result file msit_cmp_report_{TIMESTAMP} is generated in the comparison result saving path.

    Open the accuracy comparison result of Model A, find the first tensor that does not meet the accuracy requirements, and check its my_data_path column. In this example, the operator that introduces the issue is LinearOperation, as shown in Figure 1.

    Figure 1 Network-wide accuracy comparison result of Model A
  5. After confirming the operator name, use the msit llm opcheck tool to pre-check the operator and verify whether the accuracy of the ATB operator meets the requirements.
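
The report lookup in step 4 can also be scripted. The following is a minimal sketch, assuming the report is a CSV file with a my_data_path column (named in step 4) and a cosine_similarity column. The similarity column name, the 0.99 threshold, and the sample rows are illustrative assumptions, so check them against the header of your own report before relying on the result.

```python
import csv
import io

def first_failing_row(report_file, metric="cosine_similarity", threshold=0.99):
    """Return the first report row whose metric is below threshold, or None."""
    for row in csv.DictReader(report_file):
        try:
            value = float(row[metric])
        except (KeyError, ValueError, TypeError):
            continue  # Skip rows without a numeric metric value.
        if value < threshold:
            return row
    return None

# Hypothetical report content standing in for a msit_cmp_report_{TIMESTAMP} file.
sample_report = io.StringIO(
    "my_data_path,cosine_similarity\n"
    "0_Decoder/0_RmsNormOperation/outtensor0.bin,0.999998\n"
    "0_Decoder/1_LinearOperation/outtensor0.bin,0.912345\n"
    "0_Decoder/2_AttentionOperation/outtensor0.bin,0.850000\n"
)
failing = first_failing_row(sample_report)
print(failing["my_data_path"] if failing else "all tensors meet the threshold")
```

To run it on a real report, open the file with open(path, newline="") and pass the file object to first_failing_row. Only the first failing row is returned because later tensors usually inherit the error introduced by the first one.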