Performance/Accuracy Test Tool

MindIE currently supports the AISBench tool for accuracy and performance tests. For details, see AISBench. Table 1 lists the supported features, and Table 2 defines the performance test result indicators.

**Table 1** Features
Feature	AISBench
Inference mode	Supports streaming inference and text inference in client mode. For details, see here.
Inference engine	Supports inference engines including MindIE, vLLM, SGLang, TGI, and Triton. For details, see here.
Dataset	Supports 39 open-source datasets and synthetic random datasets. For details, see here.
Sending mode	Supports uniform and Poisson distributions. For details, see here.
Accuracy test	Supported. For details, see here.
Performance test	Supported. For details, see here.
Token inference	Supported. For details, see here.
Multi LoRA inference	Supported. For details, see here.
Function call test	Supported. For details, see here.
Multi-turn dialogue test	Supported. For details, see here.
Stable-state test	Supported. For details, see here.
Pressure test	Supported. For details, see here.
Multi-task test	Supported. For details, see here.
Visualized process	Supported. For details, see here.
Resumable test	Supported. For details, see here.
Custom dataset	Supported. For details, see here.
Plug-in extension	Supported. For details, see here.
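
The two sending modes in Table 1 differ in how requests are spaced in time. The following is a minimal sketch of that difference, not the AISBench implementation. It assumes that "uniform" means a constant gap of 1/rate between requests and that "Poisson" means exponentially distributed gaps with the same mean. The function name `send_times` and its parameters are hypothetical and are not AISBench configuration options.

```python
import random
from typing import List


def send_times(num_requests: int, request_rate: float,
               mode: str = "uniform", seed: int = 0) -> List[float]:
    """Return send timestamps (seconds from test start) for each request.

    Hypothetical helper for illustration only.
    request_rate is the target number of requests per second.
    """
    if request_rate <= 0:
        raise ValueError("request_rate must be positive")
    rng = random.Random(seed)      # seeded so that a schedule is reproducible
    mean_gap = 1.0 / request_rate
    times: List[float] = []
    now = 0.0
    for _ in range(num_requests):
        times.append(now)
        if mode == "uniform":
            # Assumption: evenly spaced requests with a constant gap.
            gap = mean_gap
        elif mode == "poisson":
            # Poisson arrivals have exponentially distributed gaps.
            gap = rng.expovariate(request_rate)
        else:
            raise ValueError(f"unknown mode: {mode}")
        now += gap
    return times


if __name__ == "__main__":
    print(send_times(5, 4.0, "uniform"))
    print(send_times(5, 4.0, "poisson"))
```

Both schedules target the same average request rate. The Poisson schedule produces bursts and idle gaps, which usually stresses queuing in the serving engine more than evenly spaced requests do.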

**Table 2** Performance test result indicators
Indicator	Definition
TTFT	Time To First Token: latency of the first token. NOTE: This indicator cannot be measured in the beam search scenario.
ITL	Inter-Token Latency: latency between consecutive chunks.
TPOT	Time Per Output Token: average latency per token in the decode phase, which is calculated as follows: (E2EL - TTFT)/(OutputTokens - 1). NOTE: This indicator cannot be measured in the beam search scenario.
E2EL	End-to-End Latency: total latency of a request, from sending the request to receiving the complete response.
InputTokens	Number of input tokens of a request
OutputTokens	Number of generated tokens of a request
PrefillTokenThroughput	Prefill throughput of a request, which is calculated as follows: InputTokens/TTFT
OutputTokenThroughput	Output throughput of a request, which is calculated as follows: OutputTokens/E2EL
Benchmark Duration	End-to-end duration of a performance test
Total Requests	Total number of sent requests
Failed Requests	Total number of failed requests
Successful Requests	Total number of successful requests
Concurrency	Average number of concurrent connections, which is calculated as follows: sum(E2EL)/Benchmark Duration
Max Concurrency	Configured number of concurrent connections
Request Throughput	Number of successful requests completed per unit of time, which is calculated as follows: Successful Requests/Benchmark Duration
Total Input Tokens	Total number of input tokens of all requests
Total generated tokens	Total number of output tokens of all requests
Input Token Throughput	Input token processing rate over the whole test, which is calculated as follows: Total Input Tokens/Benchmark Duration
Output Token Throughput	Output token generation rate over the whole test, which is calculated as follows: Total generated tokens/Benchmark Duration
Total Token Throughput	Combined input and output token rate over the whole test, which is calculated as follows: (Total Input Tokens + Total generated tokens)/Benchmark Duration
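
The derived indicators in Table 2 can be recomputed from raw per-request measurements, which is useful for cross-checking a report. The following is a minimal sketch of the TPOT, per-request throughput, Concurrency, and token throughput formulas. It is not AISBench code: the `RequestRecord` class and both function names are hypothetical, all times are assumed to be in seconds, and the record list is assumed to contain only successful requests.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RequestRecord:
    """Hypothetical per-request measurements (times in seconds)."""
    ttft: float          # TTFT
    e2el: float          # E2EL
    input_tokens: int    # InputTokens
    output_tokens: int   # OutputTokens


def per_request_metrics(r: RequestRecord) -> Dict[str, Optional[float]]:
    """Per-request indicators, following the Table 2 formulas."""
    # TPOT divides by (OutputTokens - 1), so it is undefined for one token.
    tpot = None
    if r.output_tokens > 1:
        tpot = (r.e2el - r.ttft) / (r.output_tokens - 1)
    return {
        "TPOT": tpot,
        "PrefillTokenThroughput": r.input_tokens / r.ttft,
        "OutputTokenThroughput": r.output_tokens / r.e2el,
    }


def aggregate_metrics(records: List[RequestRecord],
                      benchmark_duration: float) -> Dict[str, float]:
    """Test-level indicators, following the Table 2 formulas."""
    total_in = sum(r.input_tokens for r in records)
    total_out = sum(r.output_tokens for r in records)
    return {
        "Concurrency": sum(r.e2el for r in records) / benchmark_duration,
        "Total Input Tokens": total_in,
        "Total generated tokens": total_out,
        "Input Token Throughput": total_in / benchmark_duration,
        "Output Token Throughput": total_out / benchmark_duration,
        "Total Token Throughput": (total_in + total_out) / benchmark_duration,
    }


if __name__ == "__main__":
    demo = [
        RequestRecord(ttft=0.5, e2el=2.5, input_tokens=100, output_tokens=21),
        RequestRecord(ttft=0.25, e2el=1.5, input_tokens=50, output_tokens=11),
    ]
    for record in demo:
        print(per_request_metrics(record))
    print(aggregate_metrics(demo, benchmark_duration=4.0))
```

Note that Concurrency is an average derived from measured latencies, so it can be lower than Max Concurrency when the server finishes requests faster than the client sends them.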
