GUI Description
Function
During serving tuning, MindStudio Insight displays the end-to-end execution of each request in the timeline view, showing how long the request spends in each key phase and the status of the request. By analyzing the timeline, you can quickly identify service performance bottlenecks and adjust the tuning policy based on the symptoms you observe.
GUI Display
- Area 1: toolbar, which contains common shortcut keys. From left to right, the shortcut keys are Marker List, Filter (card or unit), Search, Flow Events, Reset (page restoration), Timeline Zoom Out, and Timeline Zoom In.
- Area 2: graphical display. The profiling data collected from the service is displayed on the left. The first level is the process, and the second level is the key phase information of the request. Table 1 shows the unit information. The timeline view is displayed on the right, row by row, showing the execution sequence and duration of each key phase.
Table 1 Unit information

| Unit | Description |
| --- | --- |
| CPU Usage | Average CPU usage. This unit is displayed only when the host_system_usage_freq data collection function is enabled. |
| Memory Usage | System memory usage on the host. This unit is displayed only when the host_system_usage_freq data collection function is enabled. |
| NPU Usage | NPU memory usage. This unit is displayed only when the npu_memory_usage_freq data collection function is enabled. |
| KVCache | Usage of remaining KV cache over time. |
| BatchSchedule | Group batch time, in nanoseconds. |
| WAITING | Time when a request is in the WAITING state. |
| PENDING | Time when a request is in the PENDING state. |
| RUNNING | Time when a request is in the RUNNING state. |
| RUNNING2 | Time when a request is in the RUNNING2 state. |
| SWAPPED | Time when a batch is in the SWAPPED state. |
| RECOMPUTE | Time when a request is in the RECOMPUTE state. |
| SUSPENDED | Time when a batch is in the SUSPENDED state. |
| END | Time when a request is in the END state. |
| END_PRE | Time when a request is in the END_PRE state. |
| STOP | Time when a batch is in the STOP state. |
| PREFILL_HOLD | Time when a batch is in the PREFILL_HOLD state. |
| http | HTTP request lifetime data, covering the receipt, encoding, and decoding of the request. |
| batchFrameworkProcessing | Batch data, including the batch creation time, current batch type (prefill or decode), request RID, and steps. |
| preprocessBatch | Time consumed for parameter injection to batches during IBIS data distribution, in nanoseconds. |
| SerializeExecuteMessage | Time consumed for serialization during IBIS data distribution, in nanoseconds. |
| setInferBuffer | Time consumed for buffer setting during IBIS data distribution, in nanoseconds. |
| grpcWriteToSlave | Time consumed for gRPC write during IBIS data distribution, in nanoseconds. |
| deserializeExecuteRequestsForInfer | Time consumed for deserialization during IBIS data distribution, in nanoseconds. |
| convertTensorBatchToBackend | Time consumed for request conversion during IBIS data distribution, in nanoseconds. |
| getInputMetadata | Time consumed for metadata obtaining during IBIS data distribution, in nanoseconds. |
| beforemodelExec | Processing time before model execution, in nanoseconds. |
| modelExec | Model execution data, in nanoseconds, including the execution time, current batch type (prefill or decode), request RID, and steps. |
| instanceExecute | Model instance execution time, in nanoseconds. |
| Queue | Time when the request is enqueued. |
| PDcommunication | PD disaggregation communication time, in nanoseconds. This unit exists only in the PD disaggregation scenario. |
| forward | Forward propagation time of model inference, in nanoseconds. |
| operatorExecute | Python-side model API execution time, in nanoseconds. |
| processPythonExecResult | Time consumed for response conversion, serialization, and writing to the shared memory during data receiving, in nanoseconds. |
| deserializeExecuteResponse | Time consumed for deserialization during data receiving, in nanoseconds. |
| saveoutAndContinueBatching | Time consumed for parsing responses as outputs during data receiving, in nanoseconds. |
| continueBatching | Time consumed for enqueuing requests during data receiving, in nanoseconds. |
| sendExecuteMessage | Time consumed for sending execution information, in nanoseconds. |
| postprocess | Postprocessing time of model inference, in nanoseconds. |
| preprocess | Preprocessing time of model inference, in nanoseconds. |
| processBroadcastMessage | Time consumed for broadcasting communication information, in nanoseconds. |
| sample | Sampling time, in nanoseconds. |
| PullKVCache | KV cache transfer time between PD nodes, in nanoseconds. This unit exists only in the PD disaggregation scenario. |
| CANN | Operator execution time, in nanoseconds. This unit is displayed only when the acl_task_time data collection function is enabled. |
| dpBatch | DP domain information corresponding to each request during model inference. |
| RequestState | Request status changes during model inference. |
- Area 3: data pane, which displays statistics or instruction details. If you select Slice Detail, the details of a single key phase are displayed. If you select Slice List, the list of key phases within the selected area of the unit is displayed.
You can check the duration and interval at each level in the timeline view to determine whether performance problems exist in the corresponding key phase.
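The duration and interval check can also be reasoned about outside the GUI. The following is a minimal Python sketch, not part of MindStudio Insight: the `Slice` structure, the function names, and the sample values are all hypothetical and only illustrate the arithmetic. For each slice in a unit, it computes the duration and the idle gap since the previous slice ended, converting nanoseconds to milliseconds for readability.

```python
from typing import NamedTuple


class Slice(NamedTuple):
    """One bar in a timeline unit (hypothetical structure, not a tool export format)."""
    name: str
    start_ns: int
    end_ns: int


def durations_and_gaps(slices: list[Slice]) -> list[tuple[str, int, int]]:
    """Return (name, duration_ns, gap_before_ns) for slices ordered by start time.

    gap_before_ns is the idle time between the latest end seen so far and the
    start of this slice. It is 0 for the first slice and for overlapping slices.
    """
    ordered = sorted(slices, key=lambda s: s.start_ns)
    rows = []
    prev_end = None
    for s in ordered:
        gap = 0 if prev_end is None else max(0, s.start_ns - prev_end)
        rows.append((s.name, s.end_ns - s.start_ns, gap))
        prev_end = s.end_ns if prev_end is None else max(prev_end, s.end_ns)
    return rows


def ns_to_ms(ns: int) -> float:
    """Convert nanoseconds to milliseconds."""
    return ns / 1_000_000


# Made-up sample values for illustration only.
sample = [
    Slice("preprocess", 0, 200_000),
    Slice("forward", 250_000, 5_250_000),
    Slice("sample", 5_300_000, 5_400_000),
    Slice("postprocess", 9_400_000, 9_500_000),
]

for name, duration_ns, gap_ns in durations_and_gaps(sample):
    print(f"{name:<12} duration={ns_to_ms(duration_ns):.3f} ms  "
          f"gap_before={ns_to_ms(gap_ns):.3f} ms")
```

In this made-up data, the long gap before `postprocess` is the kind of interval that would stand out in the timeline view and prompt a closer look at the phase that precedes it.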
