Accuracy Debugging

Prerequisites

  • The model has been quantized as described in the Model Quantization section.
  • A floating-point model has been prepared. For details, see step 1 in the Model Quantization section.

Dump the Quantized Model

  1. Run the following command to check whether the quantized model can be used for inference:
    bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh ${save_directory} ${max_output_length}

    The parameters in the command are described as follows:

    • ATB_SPEED_HOME_PATH: defaults to /usr/local/Ascend/atb-models. The variable is set when you source the set_env.sh script in the model repository.
    • save_directory: the directory in which the quantized model is saved.
    • max_output_length: the maximum number of output tokens in the dialogue test.
    If the command output contains the following information, the quantized model can be used for inference:
    Question[0]: What's deep learning?
    Answer[0]:  Deep learning is a subset of machine learning that uses artificial neural networks to analyze data. It's called
    Generate[0] token num: (0, 20)
    
  2. Run the following command to dump the quantized model data to a specified output path. Table 1 describes the options used in the command; for the full list of options, see Acceleration Library Model Data Dump. The example dumps only token 2 (-er 2,2).
    msit llm dump --exec "bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh ${save_directory} ${max_output_length}" --type model tensor -er 2,2 -o ${quant_dump_path}
    Table 1 Options in the dump command

    --exec
      Description: Specifies the command that runs the program containing ATB. Redirection characters are not supported. To redirect the output, write the command into a shell script and pass that script to --exec.
      Sample: --exec "bash run.sh patches/models"

    --type
      Description: Specifies the dump type. The default value is ['tensor', 'model']. Valid values:
        • model: model topology information (dumped by default). When model is dumped, layer information is dumped together with the model topology information.
        • layer: topology information at the operation level.
        • tensor: tensor data (dumped by default).
      Sample: --type layer tensor

    -er, --execute-range
      Description: Specifies the range of token numbers to dump. Each interval is given as a start,end pair and includes both ends. Multiple intervals are supported. The default is token 0. The combined length of the interval values must not exceed 500 characters.
      Sample: -er 2,2 (token 2 only)
              -er 3,5,7,7 (intervals [3,5] and [7,7], that is, tokens 3, 4, 5, and 7)

    -o, --output
      Description: Specifies the output directory for the dumped data. The default value is ./ (the current directory).
      Sample: -o /home/projects/output
  3. After the quantized model is dumped, the data dump directory structure is as follows:
    {quant_dump_path}/                 # Data storage path
    └── msit_dump_{timestamp}/         # Data dump timestamp directory
        ├── layer/                     # Network structure subdirectory
        ├── model/                     # Model information directory
        └── tensors/                   # Tensor subdirectory
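The interval rule for -er in Table 1 can be checked before starting a long dump. The following bash function is a hypothetical helper, not part of msit. It assumes that the option value is a flat comma-separated list of start,end pairs, as in the samples, and prints the token numbers that the value selects.

```shell
# Hypothetical helper, not part of msit. Requires bash.
# expand_er VALUE: print the token numbers selected by an -er/--execute-range
# VALUE such as "3,5,7,7". VALUE is read as consecutive start,end pairs,
# and both ends of each interval are included.
expand_er() {
  local -a v
  local -a out=()
  local i t
  IFS=',' read -r -a v <<< "$1"          # split "3,5,7,7" into (3 5 7 7)
  if (( ${#v[@]} % 2 != 0 )); then
    echo "expand_er: expected start,end pairs, got '$1'" >&2
    return 1
  fi
  for (( i = 0; i < ${#v[@]}; i += 2 )); do
    for (( t = v[i]; t <= v[i+1]; t++ )); do   # inclusive interval
      out+=("$t")
    done
  done
  echo "${out[*]}"                       # space-separated token numbers
}

expand_er "2,2"       # prints: 2
expand_er "3,5,7,7"   # prints: 3 4 5 7
```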

Dump the Floating-Point Model

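  1. Run the following command to check whether the floating-point model can be used for inference. ${model_path} is the path of the floating-point model prepared in Prerequisites; max_output_length has the same meaning as for the quantized model.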
    bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh --model_path ${model_path} ${max_output_length}
  2. If the command output contains the following information, the floating-point model can be used for inference:
    Question[0]: What's deep learning?
    Answer[0]:  Deep learning is a subset of machine learning that uses artificial neural networks to analyze data. It's called
    Generate[0] token num: (0, 20)
    
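  3. Run the following command to dump the floating-point model data to a specified output path. Table 1 describes the options used in the command. As for the quantized model, the example dumps only token 2 (-er 2,2); use the same token range for both models so that the results can be compared.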
    msit llm dump --exec "bash ${ATB_SPEED_HOME_PATH}/examples/models/llama3/run_pa.sh --model_path ${model_path} ${max_output_length}" --type model tensor -er 2,2 -o ${float_dump_path}
  4. After the floating-point model is dumped, a msit_dump_{timestamp} directory is generated in the {float_dump_path} directory. The data dump directory structure is as follows:
    {float_dump_path}/                 # Data storage path
    └── msit_dump_{timestamp}/         # Data dump timestamp directory
        ├── layer/                     # Network structure subdirectory
        ├── model/                     # Model information directory
        └── tensors/                   # Tensor subdirectory
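Each dump run creates its own msit_dump_{timestamp} directory, so after several runs it can help to look up the newest one and its per-token tensor directories with a script. The bash sketch below uses hypothetical helper names (latest_dump_dir, token_dirs) that are not part of msit. It assumes that the newest dump is the directory with the latest modification time and that tensor data follows the tensors/{device_id}_{process_id}/{token} layout used in the msit llm compare paths.

```shell
# Hypothetical helpers, not part of msit. Requires bash.

# latest_dump_dir DUMP_ROOT: print the most recently modified
# msit_dump_* directory under DUMP_ROOT, e.g. ${float_dump_path}.
latest_dump_dir() {
  local newest
  # -d lists the directories themselves; -t sorts newest first.
  newest=$(ls -1td -- "$1"/msit_dump_*/ 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "latest_dump_dir: no msit_dump_* directory under $1" >&2
    return 1
  fi
  echo "${newest%/}"   # drop the trailing slash added by the glob
}

# token_dirs DUMP_DIR TOKEN: print each tensors/{device_id}_{process_id}/TOKEN
# directory inside one msit_dump_{timestamp} directory.
token_dirs() {
  local d
  for d in "$1"/tensors/*/"$2"; do
    if [ -d "$d" ]; then   # skip the literal pattern when nothing matches
      echo "$d"
    fi
  done
}
```

For example, latest_dump_dir "${float_dump_path}" prints the dump directory of the most recent floating-point run, and passing that directory and 2 to token_dirs lists the tensor directories for token 2.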

Accuracy Comparison

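  1. Run the following command to compare the data dumped from the quantized model against the data dumped from the floating-point model, which serves as the benchmark. Table 2 describes the options in the command. Replace {timestamp}, {device_id}, and {process_id} with the actual directory names from each dump, because they can differ between the two dumps. The trailing 2 is the token number dumped in the previous steps.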
    msit llm compare -gp ${float_dump_path}/msit_dump_{timestamp}/tensors/{device_id}_{process_id}/2/ \
        -mp ${quant_dump_path}/msit_dump_{timestamp}/tensors/{device_id}_{process_id}/2/ \
        -o ${compare_result_dir}
    Table 2 Options in the compare command

    -gp
      Description: Specifies the path of the benchmark data, that is, the directory containing the data dumped from the floating-point model.

    -mp
      Description: Specifies the path of the data to be compared, that is, the directory containing the data dumped from the quantized model.

    -o
      Description: Specifies the path for saving the comparison result.

  2. The comparison command prints output similar to the following. For details about the parameters in the comparison result file (msit_cmp_report_{timestamp}.csv), see Accuracy Comparison Result Parameters.
    msit_llm_logger - INFO - golden_layer_type: Prefill_layer
    msit_llm_logger - INFO - my_layer_type: Prefill_layer
    msit_llm_logger - INFO - golden_layer_type: Decoder_layer
    msit_llm_logger - INFO - my_layer_type: Decoder_layer
    msit_llm_logger - INFO - Saved comparing results: ./msit_cmp_report_{timestamp}.csv
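When the dumps contain several {device_id}_{process_id} directories, the -gp and -mp paths have to be paired by hand. The following bash sketch prints one msit llm compare command per matching device for a given token, using only the -gp, -mp, and -o options from Table 2. The function name print_compare_cmds is hypothetical, and pairing by device ID is an assumption: the process ID part of the directory name is not expected to match between two separate runs.

```shell
# Hypothetical helper, not part of msit. Requires bash.
# print_compare_cmds FLOAT_DUMP_DIR QUANT_DUMP_DIR TOKEN OUT_DIR
#   FLOAT_DUMP_DIR, QUANT_DUMP_DIR: msit_dump_{timestamp} directories of the
#   floating-point and quantized runs. Tensor subdirectories are named
#   {device_id}_{process_id}; they are paired by device ID only (assumption).
print_compare_cmds() {
  local float_dir=$1 quant_dir=$2 token=$3 out=$4
  local g m name dev
  for g in "$float_dir"/tensors/*/"$token"; do
    [ -d "$g" ] || continue
    name=$(basename "$(dirname "$g")")   # e.g. 0_12345
    dev=${name%%_*}                      # device ID: text before the first "_"
    for m in "$quant_dir"/tensors/"$dev"_*/"$token"; do
      [ -d "$m" ] || continue
      # Print (do not run) the command so it can be reviewed first.
      printf 'msit llm compare -gp %q -mp %q -o %q\n' "$g/" "$m/" "$out"
    done
  done
}
```

For example, print_compare_cmds "${float_dump_path}/msit_dump_{timestamp}" "${quant_dump_path}/msit_dump_{timestamp}" 2 "${compare_result_dir}", with each {timestamp} replaced by the actual directory name, prints the commands for token 2. Printing instead of executing keeps the pairing visible, so a wrong match can be caught before msit runs.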