Instructions
For details about how to use the Timeline tab page in service-oriented tuning scenarios, see Instructions in System Optimization.
Slice Detail
Servitization View
On the System View tab page, when you select , the Rank ID selection box and serving data are displayed. You can select the card to be viewed from the Rank ID selection box.
The serving data includes kvcache_usage, batch_info, request_data, and forward_info, as shown in Figure 2.
Select a serving data type. The details are displayed in the right pane. For details about the fields, see Table 2. You can click the icon next to a field name to search for the required information.
| Serving Data | Field | Description |
|---|---|---|
| kvcache_usage | rid | Request ID. |
| kvcache_usage | name | Method that changes the device memory usage. |
| kvcache_usage | real_start_time_ms | Time when the device memory usage changes, in milliseconds. |
| kvcache_usage | device_kvcache_left | Number of remaining blocks in device memory. |
| kvcache_usage | kvcache_usage_rate | KV cache usage rate. |
| batch_info | name | Batch grouping or execution. batchFrameworkProcessing refers to batch grouping, while modelExec refers to batch execution. |
| batch_info | res_list | List of grouped batches. |
| batch_info | start_time_ms | Start time of batch grouping or batch execution, in milliseconds. |
| batch_info | end_time_ms | End time of batch grouping or batch execution, in milliseconds. |
| batch_info | batch_size | Number of requests in a batch. |
| batch_info | batch_type | Request status (prefill or decode) in a batch. |
| batch_info | during_time_ms | Execution time, in milliseconds. |
| batch_info | dp*_rid | ID of the request contained in the DP domain. The asterisk (*) indicates the DP domain ID, and the value range is [0, n-1]. |
| batch_info | dp*_size | Batch size of the DP domain. The asterisk (*) indicates the DP domain ID, and the value range is [0, n-1]. |
| batch_info | dp*_forward_ms | Longest forward execution time in the DP domain, in milliseconds. The asterisk (*) indicates the DP domain ID, and the value range is [0, n-1]. |
| request_data | http_rid | HTTP request ID. |
| request_data | start_time_ms | Request arrival time, in milliseconds. |
| request_data | recv_token_size | Input token length of a request. |
| request_data | reply_token_size | Output token length of a request. |
| request_data | execution_time_ms | End-to-end request duration, in milliseconds. |
| request_data | queue_wait_time_ms | Total time a request spends in the queue throughout the inference process, including both waiting and pending periods, in milliseconds. |
| request_data | first_token_latency | Time to first token (TTFT), in milliseconds. |
| forward_info | name | Forward event mark, which indicates the forward execution process of the model. |
| forward_info | relative_start_time(ms) | Start time of the forward relative to the first forward on each device, in milliseconds. |
| forward_info | start_time(ms) | Forward start time, in milliseconds. |
| forward_info | end_time(ms) | Forward end time, in milliseconds. |
| forward_info | during_time(ms) | Forward execution time, in milliseconds. |
| forward_info | bubble_time(ms) | Bubble time between forwards, in milliseconds. |
| forward_info | batch_size | Number of requests processed by the forward. |
| forward_info | batch_type | Request status in the forward. |
| forward_info | forward_iter | Step ID of the forward on different cards. |
| forward_info | dp_rank | Forward DP information. The values in this column are the same for the same DP domain. |
| forward_info | prof_id | Card ID. The values in this column are the same for the same card. |
| forward_info | hostname | Host name. The values in this column are the same for the same host. |
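
The timing fields of forward_info relate to one another in a way that a short sketch can make concrete. The Python example below is illustrative only: the sample rows are made up, the `derive_timing` helper is hypothetical, and the formulas are assumptions inferred from the field descriptions (execution time as end minus start, bubble time as the gap between the end of one forward and the start of the next on the same card, relative start time measured from the first forward on that card). The tool's actual calculation may differ.

```python
from collections import defaultdict

# Hypothetical forward_info rows; all values are made up for illustration.
forwards = [
    {"prof_id": 0, "forward_iter": 0, "start_time(ms)": 100.0, "end_time(ms)": 130.0},
    {"prof_id": 0, "forward_iter": 1, "start_time(ms)": 135.0, "end_time(ms)": 160.0},
    {"prof_id": 0, "forward_iter": 2, "start_time(ms)": 172.0, "end_time(ms)": 200.0},
    {"prof_id": 1, "forward_iter": 0, "start_time(ms)": 101.0, "end_time(ms)": 131.0},
    {"prof_id": 1, "forward_iter": 1, "start_time(ms)": 131.0, "end_time(ms)": 158.0},
]


def derive_timing(rows):
    """Derive per-card timing columns from start and end times (assumed formulas)."""
    # Group forwards by card, since bubbles are measured within one card.
    by_card = defaultdict(list)
    for row in rows:
        by_card[row["prof_id"]].append(row)

    derived = []
    for prof_id, card_rows in by_card.items():
        card_rows.sort(key=lambda r: r["start_time(ms)"])
        first_start = card_rows[0]["start_time(ms)"]
        prev_end = None
        for row in card_rows:
            start, end = row["start_time(ms)"], row["end_time(ms)"]
            derived.append({
                "prof_id": prof_id,
                "forward_iter": row["forward_iter"],
                "relative_start_time(ms)": start - first_start,
                "during_time(ms)": end - start,
                # Assumption: the first forward on a card has no preceding bubble.
                "bubble_time(ms)": 0.0 if prev_end is None else start - prev_end,
            })
            prev_end = end
    return derived


timing = derive_timing(forwards)
for entry in timing:
    print(entry)
```

Under these assumptions, a large bubble_time(ms) value points to idle time on a card between two forwards, which is usually the first place to look when throughput is lower than expected.
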
Generating Line Charts by Blocks
The duration and bubble line charts of blocks are available in the serving tuning scenario, facilitating fault analysis.
On the Timeline tab page, right-click a block in any unit and choose Generate Duration Line Chart By Block or Generate Bubble Line Chart By Block from the shortcut menu. The Curve tab page is displayed, showing the curve (average duration and duration) and data details of the unit where the block is located, as shown in Figure 3.
If you spot an anomaly in the curve, zoom in on that area and click the anomalous point. Check the related information in the data details table below the curve. To locate the corresponding block, right-click the data row and choose Find in Timeline from the shortcut menu. The Timeline page is displayed, as shown in Figure 4.
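
For data exported from the details table, the same kind of check can be approximated offline. The following Python sketch is a minimal illustration, not the tool's method: the `flag_anomalies` helper, the sample durations, and the threshold of twice the average are all assumptions chosen for the example.

```python
from statistics import mean


def flag_anomalies(durations_ms, factor=2.0):
    """Return (index, duration) pairs whose duration exceeds factor * average.

    The factor is a hypothetical threshold; tune it to the workload.
    """
    if not durations_ms:
        return []
    avg = mean(durations_ms)
    return [(i, d) for i, d in enumerate(durations_ms) if d > factor * avg]


# Made-up block durations in milliseconds, with one obvious spike.
durations = [10.0, 11.0, 9.0, 10.0, 40.0, 10.0]
anomalies = flag_anomalies(durations)
print(anomalies)
```

Comparing each point against the average mirrors the two curves shown on the Curve tab page (average duration and duration). A fixed multiple of the average is a crude criterion, so treat flagged rows as candidates to inspect in the timeline rather than as confirmed faults.
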



