Visualized Results
The generated Grafana dashboard contains the following visualized charts:
| Chart | Description |
|---|---|
| Batch Size by Batch ID | Line chart showing the number of requests in each scheduled batch, distinguished by prefill and decode phases over time. |
| Request Status | Line chart showing the number of requests in different states over time. |
| Kvcache usage percent | Line chart showing the KV cache usage of all requests over time. |
| first_token_latency | Line chart showing the time to first token (TTFT) of all requests over time, including the average TTFT, as well as the 99th, 90th, and 50th percentile values. |
| prefill_generate_speed_latency | Line chart showing the average token latency of all requests in the prefill phase over time, including the average token latency, as well as the 99th, 90th, and 50th percentile values. |
| decode_generate_speed_latency | Line chart showing the average token latency of all requests in the decode phase over time, including the average token latency, as well as the 99th, 90th, and 50th percentile values. |
| request_latency | Line chart showing the end-to-end latency of all requests over time, including the average end-to-end latency, as well as the 99th, 90th, and 50th percentile values. |

Batch Size by Batch ID
Line chart showing the number of requests in each scheduled batch.
x-axis: represents the chronological batch index, beginning with 0.
y-axis: represents the batch size, distinguished by prefill and decode batches.
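
As a rough illustration of what this chart plots, the sketch below counts requests per batch and splits the result into one series per phase. The record layout `(batch_id, phase)` and the function name `batch_sizes` are assumptions made for this example, not the format the dashboard actually reads.

```python
from collections import Counter

# Hypothetical input: one (batch_id, phase) record per request in a scheduled batch.
records = [
    (0, "prefill"), (0, "prefill"),
    (1, "decode"), (1, "decode"), (1, "decode"),
    (2, "prefill"),
]

def batch_sizes(records):
    """Return {phase: [(batch_id, size), ...]} with each series ordered by batch_id."""
    counts = Counter(records)  # (batch_id, phase) -> number of requests
    series = {}
    for (batch_id, phase), size in sorted(counts.items()):
        series.setdefault(phase, []).append((batch_id, size))
    return series

series = batch_sizes(records)
print(series)
```

Each series corresponds to one line on the chart: the batch index on the x-axis and the request count on the y-axis.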

Request Status
Line chart showing the number of requests in different states over time.
x-axis: represents the timeline of the inference serving.
y-axis: represents the total number of requests in this state at the current time.
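
The per-state counts can be pictured as a running tally over state-change events. The following is a minimal sketch under assumed inputs: the event tuple `(timestamp, request_id, new_state)` and the state names `waiting`, `running`, and `finished` are hypothetical and are not taken from the dashboard.

```python
from collections import Counter

# Hypothetical state-change events: (timestamp, request_id, new_state).
events = [
    (0.0, "r1", "waiting"),
    (0.5, "r2", "waiting"),
    (1.0, "r1", "running"),
    (2.0, "r2", "running"),
    (3.0, "r1", "finished"),
]

def status_counts(events):
    """Return [(timestamp, {state: count})], one snapshot after each event."""
    current = {}    # request_id -> most recent state
    snapshots = []
    for ts, request_id, state in sorted(events):
        current[request_id] = state
        snapshots.append((ts, dict(Counter(current.values()))))
    return snapshots

snapshots = status_counts(events)
for ts, counts in snapshots:
    print(ts, counts)
```

Plotting one line per state from these snapshots gives the number of requests in that state at each point on the timeline.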

Kvcache usage percent
Line chart showing the KV cache usage of all requests over time.
x-axis: represents the timeline of the inference serving.
y-axis: represents the KV cache usage of all requests, as a percentage.

first_token_latency
Line chart showing the time to first token (TTFT) of all requests over time.
x-axis: represents the timeline of the inference serving.
y-axis: represents the average TTFT, as well as the 99th, 90th, and 50th percentile values, in μs.
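
To make the average and percentile lines concrete, here is a small sketch that summarizes a set of latency samples in μs. It uses the nearest-rank percentile definition, which is an assumption; the dashboard may use a different interpolation method. The helper names `percentile` and `summarize` and the sample values are illustrative only.

```python
import math

def percentile(sorted_values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]

def summarize(latencies_us):
    """Return the average plus the 99th, 90th, and 50th percentiles of the samples."""
    values = sorted(latencies_us)
    return {
        "avg": sum(values) / len(values),
        "p99": percentile(values, 99),
        "p90": percentile(values, 90),
        "p50": percentile(values, 50),
    }

# Made-up TTFT samples in microseconds, with one slow outlier.
ttft_us = [1200, 1500, 1100, 9000, 1300, 1400, 1250, 1350, 1450, 1600]
summary = summarize(ttft_us)
print(summary)
```

With these samples the single outlier pulls the average well above the 50th percentile, which is why the chart shows percentiles alongside the average.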

prefill_generate_speed_latency
Line chart showing the average token latency of all requests in the prefill phase over time.
x-axis: represents the timeline of the inference serving.
y-axis: represents the average token latency in the prefill phase, as well as the 99th, 90th, and 50th percentile values, in tokens/s.

decode_generate_speed_latency
Line chart showing the average token latency of all requests in the decode phase over time.
x-axis: represents the timeline of the inference serving.
y-axis: represents the average token latency in the decode phase, as well as the 99th, 90th, and 50th percentile values, in tokens/s.

request_latency
Line chart showing the end-to-end latency of all requests over time.
x-axis: represents the timeline of the inference serving.
y-axis: represents the average end-to-end latency, as well as the 99th, 90th, and 50th percentile values, in μs.
