Performance/Accuracy Test Tool
Currently, MindIE supports the AISBench tool for accuracy and performance tests. For details, see AISBench. Table 1 and Table 2 list the supported functions and features as well as the performance test indicators.
| AISBench Indicator | Definition |
|---|---|
| TTFT | Time To First Token: latency of the first token. NOTE: This indicator cannot be measured in the beam search scenario. |
| ITL | Inter-Token Latency: latency between consecutive chunks. |
| TPOT | Time Per Output Token: latency between decode tokens, calculated as (E2EL - TTFT)/(OutputTokens - 1). NOTE: This indicator cannot be measured in the beam search scenario. |
| E2EL | End To End Latency: end-to-end latency of a request. |
| InputTokens | Number of input tokens of a request. |
| OutputTokens | Number of generated tokens of a request. |
| PrefillTokenThroughput | Prefill throughput of a request, calculated as InputTokens/TTFT. |
| OutputTokenThroughput | Output throughput of a request, calculated as OutputTokens/E2EL. |
| Benchmark Duration | End-to-end duration of a performance test. |
| Total Requests | Total number of requests sent. |
| Failed Requests | Total number of failed requests. |
| Successful Requests | Total number of successful requests. |
| Concurrency | Average number of concurrent connections, calculated as sum(E2EL)/Benchmark Duration. |
| Max Concurrency | Configured number of concurrent connections. |
| Request Throughput | Request throughput, calculated as Successful Requests/Benchmark Duration. |
| Total Input Tokens | Total number of input tokens of all requests. |
| Total generated tokens | Total number of output tokens of all requests. |
| Input Token Throughput | Input tokens processed per unit time over the test, calculated as Total Input Tokens/Benchmark Duration. |
| Output Token Throughput | Output tokens generated per unit time over the test, calculated as Total generated tokens/Benchmark Duration. |
| Total Token Throughput | Input and output tokens processed per unit time over the test, calculated as (Total Input Tokens + Total generated tokens)/Benchmark Duration. |
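
To make the formulas above concrete, the following is a minimal Python sketch that derives a subset of the indicators from raw per-request measurements. It is not AISBench code: the `RequestRecord` class and the `per_request_metrics` and `summarize` functions are hypothetical names introduced for illustration. The sketch assumes that all times are in seconds and that every record is a completed request.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RequestRecord:
    """Hypothetical per-request measurements (not an AISBench type)."""
    ttft: float         # Time To First Token, in seconds (assumed unit)
    e2el: float         # End To End Latency, in seconds (assumed unit)
    input_tokens: int   # InputTokens
    output_tokens: int  # OutputTokens


def per_request_metrics(r: RequestRecord) -> Dict[str, Optional[float]]:
    """Per-request indicators derived from the formulas in the table."""
    # TPOT divides by (OutputTokens - 1), so it is undefined for a
    # single-token response; this sketch reports None in that case.
    tpot = None
    if r.output_tokens > 1:
        tpot = (r.e2el - r.ttft) / (r.output_tokens - 1)
    return {
        "TPOT": tpot,
        "PrefillTokenThroughput": r.input_tokens / r.ttft,
        "OutputTokenThroughput": r.output_tokens / r.e2el,
    }


def summarize(records: List[RequestRecord], benchmark_duration: float) -> Dict[str, float]:
    """Test-level indicators; assumes every record is a completed request."""
    total_in = sum(r.input_tokens for r in records)
    total_out = sum(r.output_tokens for r in records)
    return {
        "Concurrency": sum(r.e2el for r in records) / benchmark_duration,
        "Total Input Tokens": total_in,
        "Total generated tokens": total_out,
        "Input Token Throughput": total_in / benchmark_duration,
        "Output Token Throughput": total_out / benchmark_duration,
        "Total Token Throughput": (total_in + total_out) / benchmark_duration,
    }


# Illustrative values only; these are not measured results.
records = [
    RequestRecord(ttft=0.5, e2el=2.5, input_tokens=100, output_tokens=5),
    RequestRecord(ttft=0.25, e2el=1.5, input_tokens=50, output_tokens=6),
]
first = per_request_metrics(records[0])
summary = summarize(records, benchmark_duration=4.0)
print(first)
print(summary)
```

With these illustrative inputs, the first request has a TPOT of (2.5 - 0.5)/(5 - 1) = 0.5 seconds. The average Concurrency is (2.5 + 1.5)/4.0 = 1.0, even though the two requests may have overlapped in time.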