Performance/Accuracy Test Tool

Currently, MindIE supports the AISBench tool for accuracy and performance tests. For details, see AISBench. Table 1 lists the supported features, and Table 2 lists the performance test indicators.

Table 1 Features

| Feature | AISBench |
| --- | --- |
| Inference mode | Supports streaming inference and text inference in client mode. |
| Inference engine | Supports inference engines including MindIE, vLLM, SGLang, TGI, and Triton. |
| Dataset | Supports 39 open-source datasets and synthetic random datasets. |
| Sending mode | Supports uniform and Poisson distributions for request sending. |
| Accuracy test | Supported. |
| Performance test | Supported. |
| Token inference | Supported. |
| Multi LoRA inference | Supported. |
| Function call test | Supported. |
| Multi-turn dialogue test | Supported. |
| Stable-state test | Supported. |
| Pressure test | Supported. |
| Multi-task test | Supported. |
| Visualized process | Supported. |
| Resumable test | Supported. |
| Custom dataset | Supported. |
| Plug-in extension | Supported. |

For details about each feature, see the AISBench documentation.

Table 2 Performance test result indicators

| Indicator (AISBench) | Definition |
| --- | --- |
| TTFT | Time to First Token: latency of the first token. NOTE: This indicator cannot be measured in the beam search scenario. |
| ITL | Inter-Token Latency: latency between consecutive output chunks. |
| TPOT | Time Per Output Token: average latency between decode tokens, calculated as (E2EL - TTFT)/(OutputTokens - 1). NOTE: This indicator cannot be measured in the beam search scenario. |
| E2EL | End-to-End Latency: latency of a request from sending to completion. |
| InputTokens | Number of input tokens of a request. |
| OutputTokens | Number of generated tokens of a request. |
| PrefillTokenThroughput | Prefill throughput of a request, calculated as InputTokens/TTFT. |
| OutputTokenThroughput | Output throughput of a request, calculated as OutputTokens/E2EL. |
| Benchmark Duration | End-to-end duration of a performance test. |
| Total Requests | Total number of sent requests. |
| Failed Requests | Total number of failed requests. |
| Successful Requests | Total number of successful requests. |
| Concurrency | Average number of concurrent connections, calculated as sum(E2EL)/Benchmark Duration. |
| Max Concurrency | Configured maximum number of concurrent connections. |
| Request Throughput | Request throughput, calculated as Successful Requests/Benchmark Duration. |
| Total Input Tokens | Total number of input tokens of all requests. |
| Total generated tokens | Total number of output tokens of all requests. |
| Input Token Throughput | Input token processing rate over the whole test, calculated as Total Input Tokens/Benchmark Duration. |
| Output Token Throughput | Output token generation rate over the whole test, calculated as Total generated tokens/Benchmark Duration. |
| Total Token Throughput | Combined input and output token rate over the whole test, calculated as (Total Input Tokens + Total generated tokens)/Benchmark Duration. |
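
To make the formulas in Table 2 concrete, the following minimal Python sketch derives several of the calculated indicators from raw per-request measurements. It is an illustration only and not AISBench code: the RequestRecord class, the function names, and the sample values are hypothetical, and all times are assumed to be in seconds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RequestRecord:
    """Raw measurements of one successful request (hypothetical structure)."""
    ttft: float          # time to first token, in seconds (assumed unit)
    e2el: float          # end-to-end latency, in seconds (assumed unit)
    input_tokens: int    # InputTokens
    output_tokens: int   # OutputTokens


def per_request_indicators(r: RequestRecord) -> Dict[str, Optional[float]]:
    """Apply the per-request formulas from Table 2."""
    # TPOT divides by (OutputTokens - 1), so it is left undefined
    # when a request produced fewer than two tokens.
    tpot = None
    if r.output_tokens > 1:
        tpot = (r.e2el - r.ttft) / (r.output_tokens - 1)
    return {
        "TPOT": tpot,
        "PrefillTokenThroughput": r.input_tokens / r.ttft,
        "OutputTokenThroughput": r.output_tokens / r.e2el,
    }


def aggregate_indicators(records: List[RequestRecord],
                         benchmark_duration: float) -> Dict[str, float]:
    """Apply the whole-test formulas from Table 2."""
    total_in = sum(r.input_tokens for r in records)
    total_out = sum(r.output_tokens for r in records)
    return {
        "Concurrency": sum(r.e2el for r in records) / benchmark_duration,
        "Input Token Throughput": total_in / benchmark_duration,
        "Output Token Throughput": total_out / benchmark_duration,
        "Total Token Throughput": (total_in + total_out) / benchmark_duration,
    }


if __name__ == "__main__":
    # Made-up sample values, chosen only to exercise the formulas.
    sample = [
        RequestRecord(ttft=0.5, e2el=2.5, input_tokens=100, output_tokens=21),
        RequestRecord(ttft=0.25, e2el=1.5, input_tokens=50, output_tokens=11),
    ]
    for rec in sample:
        print(per_request_indicators(rec))
    print(aggregate_indicators(sample, benchmark_duration=4.0))
```

The sketch returns None for TPOT when a request generates only one token, because the denominator OutputTokens - 1 would be zero. How AISBench itself reports that case is not stated here.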