Visualized Results

The generated Grafana dashboard contains the following visualized charts:

**Table 1** Visualized charts
| Chart | Description |
| --- | --- |
| Batch Size by Batch ID | Line chart showing the number of requests in each scheduled batch, distinguished by prefill and decode phases over time. |
| Request Status | Line chart showing the number of requests in different states over time. |
| Kvcache usage percent | Line chart showing the KV cache usage of all requests over time. |
| first_token_latency | Line chart showing the time to first token (TTFT) of all requests over time, including the average TTFT, as well as the 99th, 90th, and 50th percentile values. |
| prefill_generate_speed_latency | Line chart showing the average token latency of all requests in the prefill phase over time, including the average token latency, as well as the 99th, 90th, and 50th percentile values. |
| decode_generate_speed_latency | Line chart showing the average token latency of all requests in the decode phase over time, including the average token latency, as well as the 99th, 90th, and 50th percentile values. |
| request_latency | Line chart showing the end-to-end latency of all requests over time, including the average end-to-end latency, as well as the 99th, 90th, and 50th percentile values. |

Batch Size by Batch ID

Line chart showing the number of requests in each scheduled batch.

x-axis: represents the chronological batch index, beginning with 0.

y-axis: represents the batch size, distinguished by prefill and decode batches.

Figure 1 Batch Size by Batch ID
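
The series plotted in this chart can be pictured with a minimal sketch. The record layout (batch index, phase, request IDs) and the function name batch_size_series are assumptions made for illustration only; they are not the format the profiling data or the dashboard actually uses.

```python
from collections import defaultdict

# Hypothetical batch records: (batch_index, phase, request_ids).
# The real profiling data layout may differ.
batches = [
    (0, "prefill", ["r1", "r2"]),
    (1, "decode", ["r1", "r2"]),
    (2, "prefill", ["r3"]),
    (3, "decode", ["r1", "r2", "r3"]),
]


def batch_size_series(records):
    """Return {phase: [(batch_index, batch_size), ...]} ordered by batch index."""
    series = defaultdict(list)
    for index, phase, request_ids in sorted(records, key=lambda r: r[0]):
        # The batch size is the number of requests scheduled in that batch.
        series[phase].append((index, len(request_ids)))
    return dict(series)


series = batch_size_series(batches)
```

Each phase yields one line: the x value is the batch index and the y value is the batch size.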

Request Status

Line chart showing the number of requests in different states over time.

x-axis: represents the timeline of the inference serving.

y-axis: represents the total number of requests in each state at a given time.

Figure 2 Request Status
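
As a rough illustration of how per-state counts over time could be derived, the sketch below replays a list of state-change events. The event layout and the state names (waiting, running, finished) are hypothetical; this topic does not list the actual states tracked by the dashboard.

```python
from collections import Counter

# Hypothetical state-change events: (timestamp_us, request_id, new_state).
# State names are placeholders, not the dashboard's actual states.
events = [
    (0, "r1", "waiting"),
    (5, "r2", "waiting"),
    (10, "r1", "running"),
    (20, "r2", "running"),
    (30, "r1", "finished"),
]


def status_counts(state_events):
    """Return [(timestamp, {state: count}), ...] with one entry per event."""
    current = {}  # request_id -> latest known state
    timeline = []
    for timestamp, request_id, state in sorted(state_events):
        current[request_id] = state
        # Count how many requests are in each state right after this event.
        timeline.append((timestamp, dict(Counter(current.values()))))
    return timeline


timeline = status_counts(events)
```

Each state then becomes one line in the chart, with the count at each timestamp as its y value.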

Kvcache usage percent

Line chart showing the KV cache usage of all requests over time.

x-axis: represents the timeline of the inference serving.

y-axis: represents the KV cache usage of all requests, as a percentage.

Figure 3 Kvcache usage percent

first_token_latency

Line chart showing the time to first token (TTFT) of all requests over time.

x-axis: represents the timeline of the inference serving.

y-axis: represents the average TTFT, as well as the 99th, 90th, and 50th percentile values, in μs.

Figure 4 first_token_latency
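
The average and percentile lines used by this chart and the other latency charts can be illustrated with a small sketch. It uses the nearest-rank percentile definition, which is an assumption: the dashboard or its data source may interpolate differently. The sample values and the names percentile and summarize are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(values):
    """Return the average plus the 50th, 90th, and 99th percentile values."""
    return {
        "avg": sum(values) / len(values),
        "p50": percentile(values, 50),
        "p90": percentile(values, 90),
        "p99": percentile(values, 99),
    }


# Hypothetical TTFT samples, in microseconds, for one time window.
ttft_us = [1000, 1200, 1100, 900, 5000, 1300, 1050, 950, 1150, 1250]
summary = summarize(ttft_us)
```

In this sample, one slow request (5000 μs) raises the average to 1490 μs while the 50th percentile stays at 1100 μs, which is why the percentile lines are worth reading alongside the average.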

prefill_generate_speed_latency

Line chart showing the average token latency of all requests in the prefill phase over time.

x-axis: represents the timeline of the inference serving.

y-axis: represents the average token latency in the prefill phase, as well as the 99th, 90th, and 50th percentile values. The unit is tokens/s.

Figure 5 prefill_generate_speed_latency

decode_generate_speed_latency

Line chart showing the average token latency of all requests in the decode phase over time.

x-axis: represents the timeline of the inference serving.

y-axis: represents the average token latency in the decode phase, as well as the 99th, 90th, and 50th percentile values. The unit is tokens/s.

Figure 6 decode_generate_speed_latency

request_latency

Line chart showing the end-to-end latency of all requests over time.

x-axis: represents the timeline of the inference serving.

y-axis: represents the average end-to-end latency, as well as the 99th, 90th, and 50th percentile values, in μs.

Figure 7 request_latency
