Accuracy Debugging
The Large Language Model Debug Tool provides functions such as dumping and comparing foundation model inference data, allowing developers to quickly locate accuracy issues and their root causes during inference and development, improving development efficiency.
The dump tool dumps the intermediate data generated during acceleration library model inference. The dumped data is used for accuracy comparison.
The compare tool provides the one-click accuracy comparison function to quickly compare the accuracy of the entire network in inference scenarios.
Prerequisites
- The model has been quantized by referring to the Model Quantization section.
- A floating-point model has been prepared. For details, see 1 in the model quantization section.
- The Large Language Model Debug Tool has been installed. For details, see Inference Accuracy Tuning Tool for Foundation Models.
Model Dump
- Set the environment variables of the CANN, acceleration library, and model repository.
```shell
# Configure the CANN environment. By default, the CANN is installed in the /usr/local directory.
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Configure the ATB environment.
source /usr/local/Ascend/nnal/atb/set_env.sh
# Configure environment variables for the model repository.
source /usr/local/Ascend/llm_model/set_env.sh
```
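As a quick sanity check (a sketch, not part of the tool), you can confirm that sourcing the scripts above took effect by printing ATB_SPEED_HOME_PATH, which should point at the model repository (/usr/local/Ascend/atb-models by default):

```shell
# Sanity check: ATB_SPEED_HOME_PATH is set by the model repository's
# set_env.sh; if it is empty, re-run the source commands above.
if [ -n "${ATB_SPEED_HOME_PATH}" ]; then
  echo "ATB_SPEED_HOME_PATH=${ATB_SPEED_HOME_PATH}"
else
  echo "ATB_SPEED_HOME_PATH is not set; re-run the source commands above"
fi
```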
- Dump a model.
- Dump the quantized model.
- Check whether the quantized model can be inferred.
```shell
bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh ${save_directory} ${max_output_length}
```
The default value of ATB_SPEED_HOME_PATH is /usr/local/Ascend/atb-models, which is configured by the set_env.sh script of the model repository. max_output_length indicates the maximum number of output tokens in the dialog test.
If the command output contains the following information, the quantized model can be used for inference:
```
Question[0]: What's deep learning?
Answer[0]: Deep learning is a subset of machine learning that uses artificial neural networks to analyze data. It's called
Generate[0] token num: (0, 20)
```
- Run the following command to dump the quantized model and save the result to a user-defined path. Table 1 describes the options. The following uses the second token as an example. For the meanings of more options, see Acceleration Library Model Data Dump.
```shell
msit llm dump --exec "bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh ${save_directory} ${max_output_length}" --type model tensor -er 2,2 -o ${quant_dump_path}
```

Table 1 Options in the dump command

| Option | Description | Sample |
| --- | --- | --- |
| --exec | Specifies the program execution command that contains the ATB. Redirection characters are not supported. To redirect the output, write the command to a shell script and then start the shell script. | --exec "bash run.sh patches/models" |
| --type | Specifies the dump type. The default value is ['tensor', 'model']. The options are as follows: model: model topology information (default; when the dump type is model, layers are dumped together with the model); layer: topology information in the operation dimension; tensor: tensor data (default). | --type layer tensor |
| -er, --execute-range | Specifies the token round range of the dump. Both bounds of each range are inclusive, and multiple ranges are supported. The default value is 0. Ensure that the total length of multiple ranges does not exceed 500 characters. | -er 2,2; -er 3,5,7,7 indicates the ranges [3,5] and [7,7], that is, the third, fourth, fifth, and seventh tokens. |
| -o, --output | Specifies the output directory of the dump data. The default value is ./. | -o /home/projects/output |
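The -er semantics (comma-separated pairs, each pair a closed range of token rounds) can be sketched in shell, expanding the -er 3,5,7,7 example into the rounds it covers:

```shell
# Sketch: expand an -er spec into the token rounds it covers.
# "3,5,7,7" means the closed ranges [3,5] and [7,7], i.e. rounds 3, 4, 5, 7.
spec="3,5,7,7"
rounds=""
set -- $(echo "$spec" | tr ',' ' ')   # split the spec into bounds
while [ "$#" -ge 2 ]; do
  rounds="$rounds $(seq "$1" "$2" | tr '\n' ' ')"   # enumerate one closed range
  shift 2
done
rounds=$(echo $rounds)   # unquoted on purpose: squeeze extra whitespace
echo "token rounds: $rounds"
```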
- After the quantized model is dumped, the data dump directory structure is as follows:

```
{quant_dump_path}/              # Data storage path
└── msit_dump_{timestamp}/      # Data dump timestamp directory
    ├── layer/                  # Network structure subdirectory
    ├── model/                  # Model information directory
    └── tensors/                # Tensor subdirectory
```
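To confirm that tensor data was actually written, a quick check (a sketch; ${quant_dump_path} is the output path passed to msit llm dump above) is to count the files under the tensors/ subdirectory of the newest msit_dump_* directory:

```shell
# Count dumped tensor files under the most recent msit_dump_* directory.
dump_root="${quant_dump_path:-.}"
latest=$(ls -d "$dump_root"/msit_dump_* 2>/dev/null | sort | tail -n 1)
if [ -n "$latest" ]; then
  echo "tensor files: $(find "$latest/tensors" -type f 2>/dev/null | wc -l)"
else
  echo "no msit_dump_* directory found under $dump_root"
fi
```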
- Dump a floating-point model.
- Check whether the floating-point model can be inferred.
```shell
bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh --model_path ${model_path} ${max_output_length}
```
If the command output contains the following information, the floating-point model can be used for inference:
```
Question[0]: What's deep learning?
Answer[0]: Deep learning is a subset of machine learning that uses artificial neural networks to analyze data. It's called
Generate[0] token num: (0, 20)
```
- Dump the floating-point model and save the result to a user-defined path. Table 1 describes the options. The following uses the second token as an example.
```shell
msit llm dump --exec "bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh --model_path ${model_path} ${max_output_length}" --type model tensor -er 2,2 -o ${float_dump_path}
```

- After the floating-point model is dumped, the msit_dump_{timestamp} folder is generated in the ${float_dump_path} directory. The data dump directory structure is as follows:

```
{float_dump_path}/              # Data storage path
└── msit_dump_{timestamp}/      # Data dump timestamp directory
    ├── layer/                  # Network structure subdirectory
    ├── model/                  # Model information directory
    └── tensors/                # Tensor subdirectory
```
Accuracy Comparison
- Compare the accuracy of the dump result file of the quantized model with that of the floating-point model.
```shell
msit llm compare -gp {float_dump_path}/msit_dump_{timestamp}/tensors/{device_id}_{process_id}/2/ -mp {quant_dump_path}/msit_dump_{timestamp}/tensors/{device_id}_{process_id}/2/ -o ${compare_result_dir}
```

Table 2 describes the options in the command.

Table 2 Options in the compare command

| Option | Description |
| --- | --- |
| -gp | Specifies the path of the benchmark data, that is, the directory where the dump data of the floating-point model is stored. |
| -mp | Specifies the path of the data to be compared, that is, the directory where the dump data of the quantized model is stored. |
| -o | Specifies the path for saving the comparison result. |
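Before running the comparison, a simple pre-flight check (a sketch; the two variables are the dump output paths used in the steps above) can confirm that both dump directories exist and are non-empty:

```shell
# Verify both dump directories before launching msit llm compare.
for d in "${float_dump_path}" "${quant_dump_path}"; do
  if [ -d "$d" ] && [ -n "$(ls -A "$d" 2>/dev/null)" ]; then
    echo "ok: $d"
  else
    echo "missing or empty: $d"
  fi
done
```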
- Verify that the command output matches the following example. For details about the fields in the comparison result file, see Accuracy Comparison Result Options for further analysis.
```
msit_llm_logger - INFO - golden_layer_type: Prefill_layer
msit_llm_logger - INFO - my_layer_type: Prefill_layer
msit_llm_logger - INFO - golden_layer_type: Decoder_layer
msit_llm_logger - INFO - my_layer_type: Decoder_layer
msit_llm_logger - INFO - Saved comparing results: ./msit_cmp_report_{timestamp}.csv
```
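For triaging the generated CSV report, one approach is to keep only rows whose similarity metric falls below a threshold. The column layout below (with cosine similarity in column 4) is a made-up example for illustration only; check the header of your actual msit_cmp_report_{timestamp}.csv before adapting the awk filter:

```shell
# Build a tiny example CSV (hypothetical layout) and keep the header plus
# any row whose column-4 similarity is below 0.99.
cat > /tmp/demo_report.csv <<'EOF'
token_id,golden_op,my_op,cosine_similarity
2,input_norm_0,input_norm_0,0.9999
2,qkv_linear_0,qkv_linear_0,0.9123
EOF
awk -F',' 'NR == 1 || $4 < 0.99' /tmp/demo_report.csv
```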