GUI Description

Function

When you tune a serving workload, MindStudio Insight displays end-to-end request execution in the timeline view, showing how long each request spends in each key phase and the status of the request. By analyzing the timeline, you can identify service performance bottlenecks and adjust the tuning policy to address the symptoms you observe.

GUI Display

The Timeline tab page consists of the toolbar (area 1), graphical display (area 2), and data pane (area 3), as shown in Figure 1.

Figure 1 Timeline tab page

Area 1: toolbar, which contains common shortcut buttons. From left to right, the buttons are Marker List, Filter (by card or unit), Search, Flow Events, Reset (restores the page), Timeline Zoom Out, and Timeline Zoom In.

Area 2: graphical display. The left side lists the profile data collected from the service in two levels: the first level is the process, and the second level is the key phase information of the request. Table 1 describes the units. The right side displays the timeline view row by row, showing the execution sequence and duration of each key phase.

**Table 1** Unit information
Unit	Description
CPU Usage	Average CPU usage. This unit is displayed only when the host_system_usage_freq data collection function is enabled.
Memory Usage	System memory usage on the host. This unit is displayed only when the host_system_usage_freq data collection function is enabled.
NPU Usage	NPU memory usage. This unit is displayed only when the npu_memory_usage_freq data collection function is enabled.
KVCache	Usage of remaining KV cache over time.
BatchSchedule	Time consumed for batch grouping, in nanoseconds.
WAITING	Time when a request is in the WAITING state.
PENDING	Time when a request is in the PENDING state.
RUNNING	Time when a request is in the RUNNING state.
RUNNING2	Time when a request is in the RUNNING2 state.
SWAPPED	Time when a batch is in the SWAPPED state.
RECOMPUTE	Time when a request is in the RECOMPUTE state.
SUSPENDED	Time when a batch is in the SUSPENDED state.
END	Time when a request is in the END state.
END_PRE	Time when a request is in the END_PRE state.
STOP	Time when a batch is in the STOP state.
PREFILL_HOLD	Time when a batch is in the PREFILL_HOLD state.
http	HTTP request lifetime data, covering the receipt, encoding, and decoding of the request.
batchFrameworkProcessing	Batch data, including the batch creation time, current batch type (prefill or decode), request RID, and steps.
preprocessBatch	Time consumed for parameter injection to batches during IBIS data distribution, in nanoseconds.
SerializeExecuteMessage	Time consumed for serialization during IBIS data distribution, in nanoseconds.
setInferBuffer	Time consumed for buffer setting during IBIS data distribution, in nanoseconds.
grpcWriteToSlave	Time consumed for gRPC write during IBIS data distribution, in nanoseconds.
deserializeExecuteRequestsForInfer	Time consumed for deserialization during IBIS data distribution, in nanoseconds.
convertTensorBatchToBackend	Time consumed for request conversion during IBIS data distribution, in nanoseconds.
getInputMetadata	Time consumed for metadata obtaining during IBIS data distribution, in nanoseconds.
beforemodelExec	Processing time before model execution, in nanoseconds.
modelExec	Model execution data, in nanoseconds, including the execution time, current batch type (prefill or decode), request RID, and steps.
instanceExecute	Model instance execution time, in nanoseconds.
Queue	Time when the request is enqueued.
PDcommunication	PD disaggregation communication time, in nanoseconds. This unit exists only in the PD disaggregation scenario.
forward	Forward propagation time of model inference, in nanoseconds.
operatorExecute	Python-side model API execution time, in nanoseconds.
processPythonExecResult	Time consumed for response conversion, serialization, and writing to the shared memory during data receiving, in nanoseconds.
deserializeExecuteResponse	Time consumed for deserialization during data receiving, in nanoseconds.
saveoutAndContinueBatching	Time consumed for parsing responses as outputs during data receiving, in nanoseconds.
continueBatching	Time consumed for enqueuing requests during data receiving, in nanoseconds.
sendExecuteMessage	Time consumed for sending execution information, in nanoseconds.
postprocess	Postprocessing time of model inference, in nanoseconds.
preprocess	Preprocessing time of model inference, in nanoseconds.
processBroadcastMessage	Time consumed for broadcasting communication information, in nanoseconds.
sample	Sampling time, in nanoseconds.
PullKVCache	KV cache transfer time between PD nodes, in nanoseconds. This unit exists only in the PD disaggregation scenario.
CANN	Operator execution time, in nanoseconds. This unit is displayed only when the acl_task_time data collection function is enabled.
dpBatch	DP domain information corresponding to each request during model inference.
RequestState	Request status changes during model inference.
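
Several units in Table 1 appear only when a data collection function is enabled or in the PD disaggregation scenario. The following is a minimal sketch of a checklist helper for explaining why a unit is absent from the timeline. The function `missing_units` and its arguments are hypothetical and are not part of MindStudio Insight; only the unit names and conditions come from Table 1.

```python
# Conditions copied from Table 1. The helper itself is a hypothetical
# illustration, not a MindStudio Insight API.
OPTIONAL_UNITS = {
    "CPU Usage": "host_system_usage_freq",
    "Memory Usage": "host_system_usage_freq",
    "NPU Usage": "npu_memory_usage_freq",
    "CANN": "acl_task_time",
}
PD_ONLY_UNITS = ("PDcommunication", "PullKVCache")


def missing_units(enabled_functions, pd_disaggregation=False):
    """Return the conditional units that Table 1 says would not be displayed."""
    missing = sorted(
        unit
        for unit, function in OPTIONAL_UNITS.items()
        if function not in enabled_functions
    )
    if not pd_disaggregation:
        missing.extend(PD_ONLY_UNITS)
    return missing
```

For example, calling `missing_units({"acl_task_time"})` reports the host and NPU usage units plus the two PD-only units, which indicates which collection functions to enable before collecting data again.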

Area 3: data pane, which displays statistics or instruction details. Slice Detail shows the details of a single key phase. Slice List lists the key phases of a unit that fall within the selected area.
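
The idea of listing the key phases inside a selected area can be reproduced offline on your own data. The sketch below assumes each slice is a hypothetical `(name, start_ns, end_ns)` tuple and treats any slice that overlaps the selected range as selected. Both the data layout and the overlap rule are assumptions for illustration and do not describe how the tool itself selects slices.

```python
def slices_in_range(slices, range_start_ns, range_end_ns):
    """Return the slices that overlap the selected range, longest first.

    Each slice is a (name, start_ns, end_ns) tuple. This layout is an
    assumption made for illustration only.
    """
    selected = [
        s for s in slices
        if s[1] < range_end_ns and s[2] > range_start_ns
    ]
    # Sort by duration so that the most expensive key phases come first.
    return sorted(selected, key=lambda s: s[2] - s[1], reverse=True)
```

Sorting by duration places the key phases that dominate the selected range at the top of the list.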

You can check the duration and interval at each level in the timeline view to determine whether performance problems exist in the corresponding key phase.
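
To make the duration and interval check concrete, the following minimal sketch computes the duration of each key phase and the idle interval between consecutive phases on one row. The `Slice` type and its fields are hypothetical and do not correspond to any MindStudio Insight export format. Timestamps are assumed to be in nanoseconds, matching the unit used in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    name: str      # key phase name, for example "modelExec"
    start_ns: int  # start timestamp in nanoseconds
    end_ns: int    # end timestamp in nanoseconds

    @property
    def duration_ns(self) -> int:
        return self.end_ns - self.start_ns


def gaps_ns(slices):
    """Return the idle time between consecutive slices, ordered by start time."""
    ordered = sorted(slices, key=lambda s: s.start_ns)
    # Overlapping slices yield a gap of 0 rather than a negative value.
    return [
        max(0, nxt.start_ns - cur.end_ns)
        for cur, nxt in zip(ordered, ordered[1:])
    ]
```

A long duration points to a slow key phase, while a large interval between phases suggests that time is being lost between them, for example while waiting on scheduling or communication.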
