Accuracy Debugging
The Large Language Model Debug Tool provides functions such as dumping and comparing foundation model inference data, allowing developers to quickly locate accuracy issues and their root causes during inference and development, improving development efficiency.
The dump tool dumps the intermediate data generated during acceleration library model inference. The dumped data is used for accuracy comparison.
The compare tool provides the one-click accuracy comparison function to quickly compare the accuracy of the entire network in inference scenarios.
Prerequisites
- The model has been quantized by referring to the Model Quantization section.
- A floating-point model has been prepared. For details, see 1 in the model quantization section.
- The Large Language Model Debug Tool has been installed. For details, see Inference Accuracy Tuning Tool for Foundation Models.
Model Dump
- Set the environment variables of the CANN, acceleration library, and model repository.
```shell
# Configure the CANN environment. By default, the CANN is installed in the /usr/local directory.
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Configure the ATB environment.
source /usr/local/Ascend/nnal/atb/set_env.sh
# Configure environment variables for the model repository.
source /usr/local/Ascend/llm_model/set_env.sh
```
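As a quick sanity check (a sketch, not part of the tool), you can confirm that sourcing the scripts above took effect by printing ATB_SPEED_HOME_PATH, which should point at the model repository (/usr/local/Ascend/atb-models by default):

```shell
# Sanity check: ATB_SPEED_HOME_PATH is set by the model repository's
# set_env.sh; if it is empty, re-run the source commands above.
if [ -n "${ATB_SPEED_HOME_PATH}" ]; then
  echo "ATB_SPEED_HOME_PATH=${ATB_SPEED_HOME_PATH}"
else
  echo "ATB_SPEED_HOME_PATH is not set; re-run the source commands above"
fi
```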
- Dump a model.
- Dump the quantized model.
- Check whether the quantized model can be inferred.
```shell
bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh ${save_directory} ${max_output_length}
```
The default value of ATB_SPEED_HOME_PATH is /usr/local/Ascend/atb-models, which is configured by the set_env.sh script of the model repository. max_output_length indicates the maximum number of output tokens in the dialog test.
If the command output contains the following information, the quantized model can be used for inference:
```
Question[0]: What's deep learning?
Answer[0]: Deep learning is a subset of machine learning that uses artificial neural networks to analyze data. It's called
Generate[0] token num: (0, 20)
```
- Run the following command to dump the quantized model and save the result to a user-defined path. Table 1 describes the options. The following uses the second token as an example. For the meanings of more options, see Acceleration Library Model Data Dump.
```shell
msit llm dump --exec "bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh ${save_directory} ${max_output_length}" --type model tensor -er 2,2 -o ${quant_dump_path}
```

Table 1 Options in the dump command

| Option | Description | Sample |
| --- | --- | --- |
| --exec | Specifies the program execution command that contains the ATB. Redirection characters are not supported. To redirect the output, write the command to a shell script and then start the shell script. | --exec "bash run.sh patches/models" |
| --type | Specifies the dump type. The default value is ['tensor', 'model']. The options are as follows: model: model topology information (default; when the dump type is model, layers are dumped together with the model); layer: topology information in the operation dimension; tensor: tensor data (default). | --type layer tensor |
| -er, --execute-range | Specifies the token round range of the dump. Both bounds of each range are inclusive, and multiple ranges are supported. The default value is 0. Ensure that the total length of multiple ranges does not exceed 500 characters. | -er 2,2; -er 3,5,7,7 indicates the ranges [3,5] and [7,7], that is, the third, fourth, fifth, and seventh tokens. |
| -o, --output | Specifies the output directory of the dump data. The default value is ./. | -o /home/projects/output |
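The -er semantics (comma-separated pairs, each pair a closed range of token rounds) can be sketched in shell, expanding the -er 3,5,7,7 example into the rounds it covers:

```shell
# Sketch: expand an -er spec into the token rounds it covers.
# "3,5,7,7" means the closed ranges [3,5] and [7,7], i.e. rounds 3, 4, 5, 7.
spec="3,5,7,7"
rounds=""
set -- $(echo "$spec" | tr ',' ' ')   # split the spec into bounds
while [ "$#" -ge 2 ]; do
  rounds="$rounds $(seq "$1" "$2" | tr '\n' ' ')"   # enumerate one closed range
  shift 2
done
rounds=$(echo $rounds)   # unquoted on purpose: squeeze extra whitespace
echo "token rounds: $rounds"
```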
- After the quantized model is dumped, the data dump directory structure is as follows:

```
{quant_dump_path}/              # Data storage path
└── msit_dump_{timestamp}/      # Data dump timestamp directory
    ├── layer/                  # Network structure subdirectory
    ├── model/                  # Model information directory
    └── tensors/                # Tensor subdirectory
```
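To confirm that tensor data was actually written, a quick check (a sketch; ${quant_dump_path} is the output path passed to msit llm dump above) is to count the files under the tensors/ subdirectory of the newest msit_dump_* directory:

```shell
# Count dumped tensor files under the most recent msit_dump_* directory.
dump_root="${quant_dump_path:-.}"
latest=$(ls -d "$dump_root"/msit_dump_* 2>/dev/null | sort | tail -n 1)
if [ -n "$latest" ]; then
  echo "tensor files: $(find "$latest/tensors" -type f 2>/dev/null | wc -l)"
else
  echo "no msit_dump_* directory found under $dump_root"
fi
```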
- Dump a floating-point model.
- Check whether the floating-point model can be inferred.
```shell
bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh --model_path ${model_path} ${max_output_length}
```
If the command output contains the following information, the floating-point model can be used for inference:
```
Question[0]: What's deep learning?
Answer[0]: Deep learning is a subset of machine learning that uses artificial neural networks to analyze data. It's called
Generate[0] token num: (0, 20)
```
- Dump the floating-point model and save the result to a user-defined path. Table 1 describes the options. The following uses the second token as an example.
```shell
msit llm dump --exec "bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh --model_path ${model_path} ${max_output_length}" --type model tensor -er 2,2 -o ${float_dump_path}
```

- After the floating-point model is dumped, the msit_dump_{timestamp} folder is generated in the ${float_dump_path} directory. The data dump directory structure is as follows:

```
{float_dump_path}/              # Data storage path
└── msit_dump_{timestamp}/      # Data dump timestamp directory
    ├── layer/                  # Network structure subdirectory
    ├── model/                  # Model information directory
    └── tensors/                # Tensor subdirectory
```
Accuracy Comparison
- Compare the accuracy of the dump result file of the quantized model with that of the floating-point model.
```shell
msit llm compare -gp {float_dump_path}/msit_dump_{timestamp}/tensors/{device_id}_{process_id}/2/ -mp {quant_dump_path}/msit_dump_{timestamp}/tensors/{device_id}_{process_id}/2/ -o ${compare_result_dir}
```

Table 2 describes the options in the command.

Table 2 Options in the compare command

| Option | Description |
| --- | --- |
| -gp | Specifies the path of the benchmark data, that is, the directory where the dump data of the floating-point model is stored. |
| -mp | Specifies the path of the data to be compared, that is, the directory where the dump data of the quantized model is stored. |
| -o | Specifies the path for saving the comparison result. |
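Before running the comparison, a simple pre-flight check (a sketch; the two variables are the dump output paths used in the steps above) can confirm that both dump directories exist and are non-empty:

```shell
# Verify both dump directories before launching msit llm compare.
for d in "${float_dump_path}" "${quant_dump_path}"; do
  if [ -d "$d" ] && [ -n "$(ls -A "$d" 2>/dev/null)" ]; then
    echo "ok: $d"
  else
    echo "missing or empty: $d"
  fi
done
```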
- Verify that the command output matches the following example. For details about the fields in the comparison result file, see Accuracy Comparison Result Options for further analysis.
```
msit_llm_logger - INFO - golden_layer_type: Prefill_layer
msit_llm_logger - INFO - my_layer_type: Prefill_layer
msit_llm_logger - INFO - golden_layer_type: Decoder_layer
msit_llm_logger - INFO - my_layer_type: Decoder_layer
msit_llm_logger - INFO - Saved comparing results: ./msit_cmp_report_{timestamp}.csv
```
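For triaging the generated CSV report, one approach is to keep only rows whose similarity metric falls below a threshold. The column layout below (with cosine similarity in column 4) is a made-up example for illustration only; check the header of your actual msit_cmp_report_{timestamp}.csv before adapting the awk filter:

```shell
# Build a tiny example CSV (hypothetical layout) and keep the header plus
# any row whose column-4 similarity is below 0.99.
cat > /tmp/demo_report.csv <<'EOF'
token_id,golden_op,my_op,cosine_similarity
2,input_norm_0,input_norm_0,0.9999
2,qkv_linear_0,qkv_linear_0,0.9123
EOF
awk -F',' 'NR == 1 || $4 < 0.99' /tmp/demo_report.csv
```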