Overview

The MindStudio inference toolchain is a one-stop set of inference development tools for locating model problems quickly and improving model inference performance.

This document uses the Llama-3.1-8B-Instruct model as an example to describe how to use the tools in the large language model (LLM) inference toolchain, covering model quantization, inference data dump, automatic accuracy comparison, and performance tuning.

Instructions

Table 1 describes the functions of each tool during LLM inference.

**Table 1** Inference tool functions
Tool	Function Description
Model quantization tool: msModelSlim	Provides model compression. It reduces the memory footprint and computational requirements of a model by lowering the numerical precision of its weights and activations, typically by converting high-bit floating-point numbers to low-bit fixed-point numbers, which directly reduces the size of the model weights. The input is a model and data that run properly; the output is usable quantized weights and quantization factors.
Data dump tool: msit llm dump	Dumps the intermediate data generated during inference of acceleration library models. The dumped data is used for accuracy comparison.
Accuracy comparison tool: msit llm compare	Provides one-click accuracy comparison, enabling rapid whole-network accuracy comparison in inference scenarios.
Performance tuning: msProf	Collects and analyzes key performance metrics at each execution stage of AI tasks running on Ascend AI Processors.
Performance tuning: msServiceProfiler	Provides end-to-end performance profiling. It shows the performance of framework scheduling and of model inference separately, helping users determine whether a bottleneck lies in the framework or in the model and improve service performance.
Performance tuning: MindStudio Insight	Visualizes the profile data collected by the performance profiling tools, helping users quickly locate hardware and software performance bottlenecks and analyze AI task performance more efficiently.
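
To make the "quantization weight and quantization factor" output of the quantization row in Table 1 more concrete, the following is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is an illustration of the general idea only, not msModelSlim's algorithm or API; the function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to the int8 range.

    Illustrative sketch only; not the msModelSlim implementation.
    Returns the quantized integers and the scale (quantization factor).
    """
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floating-point values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

print(q_weights)  # low-bit integer weights
print(scale)      # quantization factor needed to interpret them
print(restored)   # approximation of the original weights
```

Each restored value differs from the original by at most about half the scale, which is the precision given up in exchange for storing one byte per weight instead of two or four.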

Environment Setup

Deploy the development environment. For details, see "Installing MindIE" > "Method 1: Using an Image" in MindIE Installation Guide.
Install the msit tool package. For details, see the msit Tool Installation document. Source code installation is recommended.
Install the msModelSlim software. For details about how to download and install the software package, see msModelSlim.
Install the LLM debug tool. For details, see Large Language Model Debug Tool.
Install the CANN Toolkit package and ops operator package of the required version, and configure CANN environment variables. For details, see CANN Software Installation Guide.
Install the MindStudio Insight tool. For details about how to select a proper environment for installation, see "Installation and Uninstallation" in MindStudio Insight User Guide.
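
After completing the steps above, a quick check that the installed command-line tools can be found on `PATH` may save time later. The sketch below uses only the Python standard library. The helper `missing_commands` is hypothetical, and the command list is an assumption: only `msit` is taken from this document, so adjust the list to match the tools installed in your environment.

```python
import shutil


def missing_commands(commands):
    """Return the commands from `commands` that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


# Assumption: `msit` is the entry point used by `msit llm dump` and
# `msit llm compare`; add other tool commands as needed.
required = ["msit"]

missing = missing_commands(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required commands were found.")
```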
