Overview
The MindStudio inference toolchain is a one-stop inference development toolset for locating model problems quickly and improving model inference performance.
This document uses the Llama-3.1-8B-Instruct model as an example to show how to use the tools in the large language model (LLM) inference toolchain for model quantization, inference data dump, automatic accuracy comparison, and performance tuning.
Instructions
Table 1 describes the functions of each tool during LLM inference.
| Tool | Function Description |
|---|---|
| Model quantization tool: msModelSlim | It provides model compression capability. It reduces model memory footprint and computational requirements by lowering the numerical precision of model weights and activations. It typically converts high-bit floating-point numbers to low-bit fixed-point numbers, directly reducing the size of model weights. The input of the model quantization tool is a model that runs properly together with its data, and the output is the usable quantized weights and quantization factors. |
| Data dump tool: msit llm dump | It dumps the intermediate data generated during inference of an acceleration library model. The dumped data is used for accuracy comparison. |
| Accuracy comparison tool: msit llm compare | It provides a one-click accuracy comparison function, enabling rapid whole-network accuracy comparison in inference scenarios. |
| Performance tuning: msProf | It collects and analyzes key performance metrics at each execution stage of AI tasks running on Ascend AI Processors. |
| Performance tuning: msServiceProfiler | It provides end-to-end performance profiling. It clearly displays the performance of framework scheduling and model inference, helping users quickly locate performance bottlenecks (and determine whether a problem is caused by the framework or the model) and effectively improve service performance. |
| Performance tuning: MindStudio Insight | It visualizes the profile data collected by the performance profiling tools. It helps quickly locate hardware and software performance bottlenecks, making AI task performance analysis more efficient. |
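
To make the quantization and accuracy comparison descriptions in Table 1 more concrete, the following is a minimal, tool-independent sketch in plain Python. It does not use the msModelSlim or msit APIs. The function names, the symmetric per-tensor INT8 scheme, and the toy weight values are assumptions chosen for illustration only. The sketch converts floating-point values to 8-bit integers plus a quantization factor (scale), restores them, and then measures how far the restored values drift from the originals, which is the kind of check an accuracy comparison performs on dumped tensors.

```python
import math


def quantize_int8(values):
    """Symmetric per-tensor quantization: float list -> (int8 codes, scale)."""
    max_abs = max(abs(v) for v in values)
    # The scale (quantization factor) maps the largest magnitude to 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate floating-point values."""
    return [c * scale for c in codes]


def cosine_similarity(a, b):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Two all-zero vectors are treated as identical.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def max_abs_error(a, b):
    """Largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy "golden" floating-point values (illustrative, not real model weights).
weights = [0.12, -0.5, 0.33, 1.27, -0.08]

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

print("int8 codes:", codes)
print("quantization factor (scale):", scale)
print("cosine similarity:", cosine_similarity(weights, restored))
print("max absolute error:", max_abs_error(weights, restored))
```

With this scheme, the rounding error of each value is bounded by half the scale, so a cosine similarity close to 1 is expected for well-behaved inputs. Real quantization tools typically add calibration data, per-channel factors, and outlier handling on top of this basic idea.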
Environment Setup
- Deploy the development environment. For details, see "Installing MindIE" > "Method 1: Using an Image" in MindIE Installation Guide.
- Install the msit tool package. For details, see the msit Tool Installation document. Source code installation is recommended.
- Install the msModelSlim software. For details about how to download and install the software package, see msModelSlim.
- Install the LLM debug tool. For details, see Large Language Model Debug Tool.
- Install the CANN Toolkit package and ops operator package of the required version, and configure CANN environment variables. For details, see CANN Software Installation Guide.
- Install the MindStudio Insight tool. For details about how to select a proper environment for installation, see "Installation and Uninstallation" in MindStudio Insight User Guide.
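
After completing the steps above, a quick sanity check can confirm that the tools are reachable from the current shell. The sketch below is a hypothetical helper, not part of any MindStudio package. The command names `msit` and `msprof` and the environment variable `ASCEND_TOOLKIT_HOME` are assumptions about a typical installation, so adjust them to match your environment.

```python
import os
import shutil


def check_environment(commands, env_vars):
    """Return (missing_commands, missing_env_vars) for the current process."""
    # shutil.which returns None when a command is not found on PATH.
    missing_cmds = [c for c in commands if shutil.which(c) is None]
    # Treat unset and empty environment variables as missing.
    missing_vars = [v for v in env_vars if not os.environ.get(v)]
    return missing_cmds, missing_vars


# Assumed names for illustration; change them to match your installation.
commands = ["msit", "msprof"]
env_vars = ["ASCEND_TOOLKIT_HOME"]

missing_cmds, missing_vars = check_environment(commands, env_vars)

if missing_cmds:
    print("Commands not found on PATH:", ", ".join(missing_cmds))
if missing_vars:
    print("Environment variables not set:", ", ".join(missing_vars))
if not missing_cmds and not missing_vars:
    print("All checked commands and environment variables are available.")
```

If a command is reported as missing, revisit the corresponding installation step. If the environment variable is missing, the CANN environment variables have most likely not been configured in the current shell.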