Serving Tools

This section describes the application scenarios and troubleshooting methods of serving tools. For details about how to locate specific faults, see Cases for Tuning the Serving Performance.

  1. General tuning: The pre-check tool (msprechecker) checks the system, environment variables, and configuration files to identify potential problems that may affect the serving performance.
  2. Targeted tuning: Optimize serving scheduling by adjusting configuration items to match the current requirements (such as the input and output lengths and the number of concurrent requests). The following tools are available:
    1. Expert advice tool for serving tuning (msservice_advisor): This tool quickly improves serving performance, but is not suited to fine-grained tuning.
    2. Automatic serving tuning tool (modelevalstate): This tool improves serving performance and reaches 95% of the optimal performance achieved by manual tuning. However, it takes a long time, because the parameter space must be searched continuously to approach the optimal solution.
  3. In-depth analysis: If the expected result is still not achieved, use the serving tuning tool (msServiceProfiler) for in-depth analysis. This tool is intended for users who are familiar with the entire serving workflow.
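
The escalation order described above can be sketched as a small decision helper. This is only an illustration of the workflow: the function name `recommend_tool` and its boolean parameters are hypothetical and are not part of any of the tools. Only the returned tool names come from this section.

```python
from typing import Optional


def recommend_tool(precheck_done: bool,
                   target_met: bool,
                   targeted_tuning_done: bool,
                   long_search_acceptable: bool) -> Optional[str]:
    """Return the next tool to try, or None if no further tuning is needed.

    Hypothetical helper that mirrors the three steps in the list above.
    """
    # Step 1: general tuning always starts with the pre-check tool.
    if not precheck_done:
        return "msprechecker"
    # Stop as soon as the performance target is reached.
    if target_met:
        return None
    # Step 2: targeted tuning. Pick the quick advisor or the slower,
    # more thorough automatic search depending on the time available.
    if not targeted_tuning_done:
        return "modelevalstate" if long_search_acceptable else "msservice_advisor"
    # Step 3: the target is still not met, so move to in-depth analysis.
    return "msServiceProfiler"
```

For example, `recommend_tool(True, False, False, False)` returns `"msservice_advisor"`, which reflects the case where the pre-check has passed and a quick improvement is wanted.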

Table 1 Introduction to serving tools

Tool

Description

Inference pre-check tool (msprechecker)

Supports full-process detection before, during, and after inference.

  • Before inference: The one-click pre-check function is provided to check for issues that may cause service deployment failures or performance deterioration, such as environment variables, system kernels, and configuration files.
  • During inference: All environment-related data can be saved to disk.
  • After inference: The saved files are compared to identify differences and to help reproduce the baseline environment.

Expert advice tool for serving tuning (msservice_advisor)

Provides tuning suggestions for key metrics such as Time To First Token (TTFT) and throughput. The suggestions are based on the output of the Benchmark tool, the config.json configuration of the MindIE service, and a theoretical analysis of the performance upper limit.

Automatic serving tuning tool (modelevalstate)

Provides automatic parameter tuning for MindIE and vLLM services. Uses advanced search algorithms to efficiently find the optimal solution in the parameter space. The tool is lightweight, so it can be deployed quickly and conveniently while still producing accurate search results.

Serving tuning tool (msServiceProfiler)

Provides performance data collection APIs for the inference service, together with capabilities to parse and break down the collected data. This tool is designed for serving tuning. It collects the start and end time points of key processes, identifies and records information (such as key function calls, key events, and serving scheduling), and collects operator information to quickly locate performance problems. For more information, see "msServiceProfiler" in Profiling Instructions.
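
To make the idea of collecting the start and end time points of key processes more concrete, the following is a minimal, generic sketch in Python. It is not the msServiceProfiler API: the `SpanRecorder` class and its methods are hypothetical names used only to illustrate how recorded time points can be broken down by process.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SpanRecorder:
    """Hypothetical helper, not part of msServiceProfiler."""

    def __init__(self):
        # Each entry is a (name, start, end) tuple, in completion order.
        self.spans = []

    @contextmanager
    def span(self, name):
        """Record the start and end time points of the wrapped block."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the end point even if the wrapped block raises.
            self.spans.append((name, start, time.perf_counter()))

    def breakdown(self):
        """Return the total elapsed seconds per span name."""
        totals = defaultdict(float)
        for name, start, end in self.spans:
            totals[name] += end - start
        return dict(totals)
```

A caller would wrap each stage of interest, for example `with recorder.span("prefill"): ...`, and then inspect `recorder.breakdown()` to see which stage dominates the elapsed time.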