Instructions

For details about how to use the Timeline tab page in service-oriented tuning scenarios, see Instructions in System Optimization.

Slice Detail

When you select a key phase block, the details about the key phase are displayed on the Slice Detail tab page. If res_list exists in Slice Detail, click any row in the rid list. The request details of the corresponding RID are displayed in the right area of Slice Detail, as shown in Figure 1. For details about the fields, see Table 1.
Figure 1 Slice Detail
Table 1 Slice Detail fields

| Field | Description |
| --- | --- |
| Title | Name. |
| Start | Start time. |
| Start(Raw Timestamp) | Original start time of data collection. |
| Wall Duration | Total duration. |
| Args | Key phase parameters. |

Servitization View

On the System View tab page, when you select Servitization View, the Rank ID selection box and serving data are displayed. You can select the card to be viewed from the Rank ID selection box.

The serving data includes kvcache_usage, batch_info, request_data, and forward_info, as shown in Figure 2.

Select a serving data type. The details are displayed in the right pane. For details about the fields, see Table 2. You can click the icon next to a field name to search for the required information.

Figure 2 Servitization View
Table 2 Servitization View fields
  

| Serving data type | Field | Description |
| --- | --- | --- |
| kvcache_usage | rid | Request ID. |
| kvcache_usage | name | Method that changes the device memory usage. |
| kvcache_usage | real_start_time_ms | Time when the device memory usage changes, in milliseconds. |
| kvcache_usage | device_kvcache_left | Number of blocks remaining in device memory. |
| kvcache_usage | kvcache_usage_rate | KV cache usage rate. |
| batch_info | name | Batch grouping or execution. batchFrameworkProcessing refers to batch grouping, while modelExec refers to batch execution. |
| batch_info | res_list | List of grouped batches. |
| batch_info | start_time_ms | Start time of batch grouping or batch execution, in milliseconds. |
| batch_info | end_time_ms | End time of batch grouping or batch execution, in milliseconds. |
| batch_info | batch_size | Number of requests in a batch. |
| batch_info | batch_type | Request status (prefill or decode) in a batch. |
| batch_info | during_time_ms | Execution time, in milliseconds. |
| batch_info | dp*_rid | ID of the request contained in the DP domain. The asterisk (*) indicates the DP domain ID, and the value range is [0, n-1], where n is the number of DP domains. |
| batch_info | dp*_size | Batch size of the DP domain. The asterisk (*) indicates the DP domain ID, and the value range is [0, n-1], where n is the number of DP domains. |
| batch_info | dp*_forward_ms | Longest forward execution time in the DP domain, in milliseconds. The asterisk (*) indicates the DP domain ID, and the value range is [0, n-1], where n is the number of DP domains. |
| request_data | http_rid | HTTP request ID. |
| request_data | start_time_ms | Request arrival time, in milliseconds. |
| request_data | recv_token_size | Input token length of a request. |
| request_data | reply_token_size | Output token length of a request. |
| request_data | execution_time_ms | End-to-end request duration, in milliseconds. |
| request_data | queue_wait_time_ms | Total time a request spends in the queue throughout the inference process, including both waiting and pending periods, in milliseconds. |
| request_data | first_token_latency | Time to first token (TTFT), in milliseconds. |
| forward_info | name | Forward event mark, which indicates the forward execution process of the model. |
| forward_info | relative_start_time(ms) | Time between this forward and the first forward on the same device, in milliseconds. |
| forward_info | start_time(ms) | Forward start time, in milliseconds. |
| forward_info | end_time(ms) | Forward end time, in milliseconds. |
| forward_info | during_time(ms) | Forward execution time, in milliseconds. |
| forward_info | bubble_time(ms) | Bubble time between forwards, in milliseconds. |
| forward_info | batch_size | Number of requests processed by the forward. |
| forward_info | batch_type | Request status in the forward. |
| forward_info | forward_iter | Step ID of the forward on different cards. |
| forward_info | dp_rank | Forward DP information. The values in this column are the same for the same DP domain. |
| forward_info | prof_id | Card ID. The values in this column are the same for the same card. |
| forward_info | hostname | Host name. The values in this column are the same for the same host. |
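
The relationship between the forward_info timing fields can be illustrated with a short Python sketch. It rests on several assumptions that are not stated in this guide: the rows are available as dictionaries keyed by the Table 2 field names, forwards are grouped per card by prof_id, and bubble time is taken to be the idle gap between the end of one forward and the start of the next forward on the same card. The function name bubble_times and the sample rows are hypothetical, and the tool may compute bubble_time(ms) differently.

```python
from collections import defaultdict


def bubble_times(forwards):
    """Return {prof_id: [gap_ms, ...]} for consecutive forwards on each card.

    Assumption: the bubble is the idle time between the end of one forward
    and the start of the next forward on the same card (same prof_id).
    """
    by_card = defaultdict(list)
    for row in forwards:
        by_card[row["prof_id"]].append(row)

    gaps = {}
    for card, rows in by_card.items():
        # Order the forwards of one card by their start time.
        rows.sort(key=lambda r: r["start_time(ms)"])
        gaps[card] = [
            # Clamp at zero so overlapping records never yield a negative gap.
            max(0.0, nxt["start_time(ms)"] - cur["end_time(ms)"])
            for cur, nxt in zip(rows, rows[1:])
        ]
    return gaps


# Hypothetical sample rows for illustration only, not real profiling output.
sample = [
    {"prof_id": 0, "start_time(ms)": 0.0, "end_time(ms)": 12.0},
    {"prof_id": 0, "start_time(ms)": 15.0, "end_time(ms)": 27.0},
    {"prof_id": 1, "start_time(ms)": 1.0, "end_time(ms)": 14.0},
    {"prof_id": 0, "start_time(ms)": 27.5, "end_time(ms)": 40.0},
]

print(bubble_times(sample))  # {0: [3.0, 0.5], 1: []}
```

Grouping by prof_id before computing gaps keeps forwards from different cards from being compared with each other, which would otherwise produce meaningless bubbles.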

Generating Line Charts by Blocks

The duration and bubble line charts of blocks are available in the serving tuning scenario, facilitating fault analysis.

On the Timeline tab page, right-click a block in any unit and choose Generate Duration Line Chart By Block or Generate Bubble Line Chart By Block from the shortcut menu. The Curve tab page is displayed, showing the curves (duration and average duration) and the data details of the unit where the block is located, as shown in Figure 3.

Figure 3 Generating a curve by block

If you spot an anomaly in the curve, zoom into that area and click on the anomaly. Check the related information in the data details table below the curve. Right-click the data row and choose Find in Timeline from the shortcut menu. The Timeline page is displayed, as shown in Figure 4.

Figure 4 Find in Timeline
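
The idea of comparing each block's duration with the average duration can be sketched as follows. This is an illustration only: this guide does not define what counts as an anomaly, so the threshold (a block is flagged when its duration exceeds a multiple of the average), the function name flag_slow_blocks, the factor parameter, and the sample values are all assumptions.

```python
from statistics import mean


def flag_slow_blocks(durations_ms, factor=2.0):
    """Return (average_ms, indices of blocks slower than factor * average).

    Assumption: an "anomaly" is a block whose duration exceeds the average
    duration by the given factor; the tool itself defines no such rule.
    """
    if not durations_ms:
        return 0.0, []
    avg = mean(durations_ms)
    slow = [i for i, d in enumerate(durations_ms) if d > factor * avg]
    return avg, slow


# Hypothetical block durations in milliseconds, for illustration only.
sample_durations = [10.0, 11.0, 9.0, 10.0, 60.0]
avg, slow = flag_slow_blocks(sample_durations)
print(avg, slow)  # 20.0 [4]
```

The flagged indices correspond to rows that would be worth locating with Find in Timeline. Because a single large outlier raises the average, a median-based threshold may be more robust for noisy data.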