Parsing Results

The parsing results are saved in the path specified by --output-path.

**Table 1** Mapping between domains and parsing results
Parsing Result	Domain
profiler.db	"BatchSchedule; ModelExecute; Request; KVCache"
chrome_tracing.json	No mandatory restriction. To view flow events between requests, you must profile the Request domain.
batch.csv	"BatchSchedule; ModelExecute"
kvcache.csv	"KVCache"
request.csv	"Request"
forward.csv	"BatchSchedule; ModelExecute"
pd_split_communication.csv	"Communication"
pd_split_kvcache.csv	"KVCache"
coordinator.csv	"Coordinator"
{host_name}_eplb_{i}_summed_hot_map_by_expert.png	"eplb_observe"
{host_name}_eplb_{i}_summed_hot_map_by_rank.png	"eplb_observe"
{host_name}_eplb_{i}_summed_hot_map_by_model_expert.png	"eplb_observe"

Table 1 does not list the parsing results produced by the acl_prof_task_time_level, aclDataTypeConfig, and aclprofAicoreMetrics parameters. For details about those results, see Profiling Description and op_summary (Operator Details); the actual results may vary. The op_statistic_*.csv and op_summary_*.csv files are written to the PROF_XXX directory under the path specified by --output-path. The profile data collected using the three parameters is saved in the PROF_XXX/mindstudio_profiler_output directory under the path specified by prof_dir.

The files are as follows:

profiler.db

SQLite database file used to generate line charts.

It contains the following tables:

**Table 2** profiler.db
Table Name	Description
batch	Displays batch table data on MindStudio Insight.
decode_gen_speed	Generates line charts showing average token latency at different time points in the decode phase.
first_token_latency	Generates line charts showing the time to first token (TTFT) of the serving framework.
kvcache	Generates line charts showing the KV cache memory usage during serving.
prefill_gen_speed	Generates line charts showing average token latency at different time points in the prefill phase.
req_latency	Generates line charts showing the end-to-end request latency of the serving framework.
request_status	Generates line charts showing the request status of the profile data at different time points.
request	Displays request table data on MindStudio Insight.
batch_exec	Displays the mapping between batches and model execution.
batch_req	Displays the mapping between batches and requests.
data_table	Displays table data on MindStudio Insight.
counter	Displays counter data in the trace view.
flow	Displays flow data in the trace view.
process	Displays secondary lane data in the trace view.
thread	Displays tertiary lane data in the trace view.
slice	Displays slice data in the trace view.
pd_split_kvcache	Displays KV cache table data of the decode node on MindStudio Insight, exclusive to prefill-decode (PD) disaggregation scenarios.
pd_split_communication	Displays communication table data between prefill and decode nodes on MindStudio Insight, exclusive to PD disaggregation scenarios.
ep_balance	Records load imbalance analysis results for the GroupedMatmul operator, profiled via MSPTI during DeepSeek MoE inference serving.
moe_analysis	Records fast/slow rank analysis results for the MoeDistributeCombine and MoeDistributeDispatch operators, profiled via MSPTI during DeepSeek MoE inference serving.
data_link	Enables drill-down on rid in the trace view to show the request input length during the forward pass.

This file is intended for visualizing data in Grafana, so the columns of each table are not described here.
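To check which of the tables in Table 2 are present in a given profiler.db, the file can be opened with any SQLite client. The following is a minimal sketch using Python's standard sqlite3 module. The helper name count_rows is hypothetical, and no column names are assumed because they are not documented here.

```python
import sqlite3


def count_rows(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    # Table names are read from sqlite_master itself, so quoting them is enough.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }
```

For example, passing sqlite3.connect("profiler.db") (with the path adjusted to your --output-path) returns a dictionary of table names and row counts. An empty table usually means that the corresponding domain was not profiled.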

chrome_tracing.json

Records trace data of inference serving requests. You can visualize this data using various tools. Refer to Data Visualization for more information.
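Before loading the file into a viewer, it can be useful to check what it contains. The sketch below assumes that chrome_tracing.json follows the Chrome Trace Event Format, as its name suggests. That format allows either a bare JSON array of events or an object with a traceEvents key, so both are handled. The function names are hypothetical.

```python
import json
from collections import Counter


def extract_events(trace):
    """Return the event list from either form of the Trace Event Format."""
    if isinstance(trace, dict):
        return trace.get("traceEvents", [])
    return trace


def load_events(path):
    """Load trace events from a JSON trace file."""
    with open(path, encoding="utf-8") as f:
        return extract_events(json.load(f))


def count_phases(events):
    """Count events by their "ph" (phase) field."""
    return dict(Counter(event.get("ph", "?") for event in events))
```

In the Trace Event Format, flow events use the phases "s", "t", and "f". If count_phases(load_events("chrome_tracing.json")) reports none of them, the Request domain was probably not profiled.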

batch.csv

Records detailed batch-level data for inference serving.

**Table 3** batch.csv
Field	Description
name	Batch grouping or execution. batchFrameworkProcessing refers to batch grouping, while modelExec refers to batch execution.
res_list	List of grouped batches.
start_time(ms)	Start time of batch grouping or execution, in milliseconds.
end_time(ms)	End time of batch grouping or execution, in milliseconds.
batch_size	Number of requests in a batch.
batch_type	Request status (prefill or decode) in a batch.
during_time(ms)	Execution time, in milliseconds.
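The fields in Table 3 are enough to compare batch grouping time with batch execution time. The following is a minimal sketch, assuming that batch.csv is comma-delimited and that its header row uses the field names exactly as listed in Table 3. The function names are hypothetical.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def mean_duration(rows):
    """Mean during_time(ms) for each (name, batch_type) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = (row["name"], row["batch_type"])
        sums[key] += float(row["during_time(ms)"])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Calling mean_duration(load_rows("batch.csv")) returns one average per combination, for example one for modelExec in the decode phase.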

kvcache.csv

Records device memory usage during inference.

**Table 4** kvcache.csv
Field	Description
domain	KV cache event mark.
rid	Request ID.
timestamp(ms)	Time when the device memory usage changes, in milliseconds.
name	Operation that changed the device memory usage.
device_kvcache_left	Number of blocks remaining in the device memory.
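A common question for this file is how close the service came to exhausting the KV cache. The sketch below finds the row with the fewest remaining blocks. It assumes that rows is an iterable of dictionaries keyed by the field names in Table 4 (for example, from csv.DictReader); the function name is hypothetical.

```python
def lowest_free_blocks(rows):
    """Return the row with the smallest device_kvcache_left, or None if empty."""
    return min(
        rows,
        key=lambda row: float(row["device_kvcache_left"]),
        default=None,
    )
```

The rid and timestamp(ms) fields of the returned row identify the request and the moment at which memory pressure was highest.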

request.csv

Records detailed request-level data for inference serving.

**Table 5** request.csv
Field	Description
http_rid	HTTP request ID.
start_time(ms)	Request arrival time, in milliseconds.
recv_token_size	Input token length of a request.
reply_token_size	Output token length of a request.
execution_time(ms)	End-to-end request duration, in milliseconds.
queue_wait_time(ms)	Total time a request spends queued over the entire inference process, including time in the waiting and pending states, in milliseconds.
first_token_latency(ms)	TTFT, in milliseconds.
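Latency fields such as first_token_latency(ms) are usually summarized as percentiles rather than averages. The following is a minimal sketch that uses the nearest-rank method (one of several percentile definitions). It assumes that rows is a non-empty iterable of dictionaries keyed by the field names in Table 5; the function names are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, for p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def ttft_summary(rows):
    """p50, p90, and p99 of first_token_latency(ms) across all requests."""
    ttft = [float(row["first_token_latency(ms)"]) for row in rows]
    return {f"p{p}": percentile(ttft, p) for p in (50, 90, 99)}
```

The same percentile helper can be applied to execution_time(ms) or queue_wait_time(ms).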

forward.csv

Records detailed execution data for the model forward pass during inference serving.

**Table 6** forward.csv
Field	Description
name	Forward event mark, which indicates a forward pass of the model.
relative_start_time(ms)	Time elapsed since the first forward pass on each device, in milliseconds.
start_time(ms)	Forward pass start time, in milliseconds.
end_time(ms)	Forward pass end time, in milliseconds.
during_time(ms)	Forward pass execution time, in milliseconds.
bubble_time(ms)	Bubble time between consecutive forward passes, in milliseconds.
batch_size	Number of requests per forward pass.
batch_type	Request status in the forward pass.
forward_iter	Step ID of the forward pass across ranks.
dp_rank	DP information of the forward pass. The values for the same DP domain are the same.
prof_id	Rank ID. The values for the same rank are the same.
hostname	Host name. The values for the same device are the same.
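One way to use during_time(ms) and bubble_time(ms) together is to estimate how much of each rank's time is lost to bubbles. The ratio below, bubble / (bubble + execution), is an illustrative metric and is not defined by the tool. The sketch assumes that rows is an iterable of dictionaries keyed by the field names in Table 6 and treats an empty bubble_time(ms) cell as 0.

```python
from collections import defaultdict


def bubble_ratio(rows):
    """Share of time spent in bubbles for each prof_id (rank)."""
    bubble = defaultdict(float)
    busy = defaultdict(float)
    for row in rows:
        rank = row["prof_id"]
        # Assumption: a missing bubble value means there was no preceding gap.
        bubble[rank] += float(row["bubble_time(ms)"] or 0)
        busy[rank] += float(row["during_time(ms)"])
    return {
        rank: bubble[rank] / (bubble[rank] + busy[rank])
        for rank in bubble
        if bubble[rank] + busy[rank] > 0
    }
```

A rank with a noticeably higher ratio than its peers spends more time waiting between forward passes.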

pd_split_communication.csv

Records communication data in PD disaggregation scenarios. PD disaggregation works in cluster scenarios with multiple nodes and ranks. It requires using the shared configuration file during profiling (see Profiling).

For details about PD disaggregation and related concepts, see "Cluster Service Deployment" > "Deploying the Prefill-Decode Disaggregation Service" in MindIE Motor Development Guide.

**Table 7** pd_split_communication.csv
Field	Description
rid	Request ID.
http_req_time(ms)	Request arrival time, in milliseconds.
send_request_time(ms)	Time when the prefill node starts to send a request to the decode node, in milliseconds.
send_request_succ_time(ms)	Time when the request is successfully sent, in milliseconds.
prefill_res_time(ms)	Time when prefill completes, in milliseconds.
request_end_time(ms)	Time when the request execution ends, in milliseconds.
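The timestamps in Table 7 can be turned into durations by subtraction. The sketch below derives only two intervals whose endpoints are unambiguous from the field descriptions: the time taken to send the request to the decode node, and the overall time from arrival to completion. It assumes that row is a dictionary keyed by the field names in Table 7; the function name and output keys are hypothetical.

```python
def request_intervals(row):
    """Derive two durations (ms) from one pd_split_communication.csv row."""
    send_start = float(row["send_request_time(ms)"])
    send_done = float(row["send_request_succ_time(ms)"])
    arrived = float(row["http_req_time(ms)"])
    ended = float(row["request_end_time(ms)"])
    return {
        "rid": row["rid"],
        "send_ms": send_done - send_start,
        "total_ms": ended - arrived,
    }
```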

pd_split_kvcache.csv

Records the KV cache transfer between prefill and decode nodes during inference based on PD disaggregation. PD disaggregation works in cluster scenarios with multiple nodes and ranks. It requires using the shared configuration file during profiling (see Profiling).

For details about PD disaggregation and related concepts, see "Cluster Service Deployment" > "Deploying the Prefill-Decode Disaggregation Service" in MindIE Motor Development Guide.

**Table 8** pd_split_kvcache.csv
Field	Description
domain	PullKVCache event mark.
rank	Device ID.
rid	Request ID.
block_tables	block_tables information.
seq_len	Request length.
during_time(ms)	Time taken to transfer the KV cache from the prefill node to the decode node, in milliseconds.
start_datetime(ms)	Time when the KV cache transfer from the prefill node to the decode node starts, displayed as a date and time with millisecond precision.
end_datetime(ms)	Time when the KV cache transfer from the prefill node to the decode node ends, displayed as a date and time with millisecond precision.
start_time(ms)	Time when the KV cache transfer from the prefill node to the decode node starts, displayed as a timestamp, in milliseconds.
end_time(ms)	Time when the KV cache transfer from the prefill node to the decode node ends, displayed as a timestamp, in milliseconds.

coordinator.csv

Records changes in the number of requests distributed to each node during inference based on PD disaggregation. PD disaggregation works in cluster scenarios with multiple nodes and ranks. It requires using the shared configuration file during profiling (see Profiling).

For details about PD disaggregation and related concepts, see "Cluster Service Deployment" > "Deploying the Prefill-Decode Disaggregation Service" in MindIE Motor Development Guide.

**Table 9** coordinator.csv
Field	Description
time	Time when the number of requests changes.
address	Address of the node to which requests are distributed, in the format IP address:Port number.
node_type	Node type (prefill or decode).
add_count	Number of added requests on the current node.
end_count	Number of ended requests on the current node.
running_count	Number of running requests on the current node.

ep_balance.csv

Records load imbalance analysis results for the GroupedMatmul operator, profiled via MSPTI during DeepSeek MoE inference serving.

Whenever ep_balance profile data is available, executing the parsing command will automatically generate a heatmap in the output directory. See Figure 1. In this heatmap, the x-axis represents the process ID for each device, while the y-axis represents the decoder layer of the model. Brighter pixels indicate longer duration. Greater color variation across rows indicates more pronounced load imbalance.

**Table 10** ep_balance.csv
Field	Description
<Process ID> (row header)	Process ID of each device at runtime.
<Decoder Layer> (column value)	Decoder layer index of the model running on each device.

Figure 1 ep_balance.png

moe_analysis.csv

Records fast/slow rank analysis results for the MoeDistributeCombine and MoeDistributeDispatch operators, profiled via MSPTI during DeepSeek MoE inference serving.

Whenever the moe_analysis profile data is available, executing the parsing command will automatically generate a box plot in the output directory. See Figure 2. The x-axis represents the process ID for each device, while the y-axis represents the total execution duration. The plot displays the mean and the 2.5th/97.5th percentiles of the total execution duration. Greater disparity between ranks (wider percentile intervals) indicates more pronounced fast/slow rank issues.

**Table 11** moe_analysis.csv
Field	Description
Dataset	Process ID of the corresponding device.
Mean	Mean total duration of the MoeDistributeCombine and MoeDistributeDispatch operators on this device.
CI Lower	2.5th percentile of the total duration for the MoeDistributeCombine and MoeDistributeDispatch operators on this device.
CI Upper	97.5th percentile of the total duration for the MoeDistributeCombine and MoeDistributeDispatch operators on this device.

Figure 2 moe_analysis.png
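The same comparison that the box plot supports visually can be made from the CSV file. The following is a minimal sketch that reports the fastest and slowest device by the Mean field and the ratio between them. It assumes that rows is a non-empty iterable of dictionaries keyed by the field names in Table 11 and that every Mean is positive; the function name is hypothetical.

```python
def rank_spread(rows):
    """Return (fastest, slowest, slowest_mean / fastest_mean) by Mean duration."""
    means = {row["Dataset"]: float(row["Mean"]) for row in rows}
    fastest = min(means, key=means.get)
    slowest = max(means, key=means.get)
    return fastest, slowest, means[slowest] / means[fastest]
```

A ratio close to 1 suggests that the ranks are evenly paced. A larger ratio points to the slowest process ID as a starting point for investigation.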

request_status.csv

Records the request status at each moment during inference serving (number of requests in the waiting, running, or swapped state). This data can be used to generate line charts that visualize request status trends over time.

**Table 12** request_status.csv
Field	Description
hostuid	Node ID.
pid	Process ID.
timestamp(ms)	Timestamp, in milliseconds.
relative_timestamp(ms)	Relative timestamp, in milliseconds.
waiting	Number of requests in the waiting state.
running	Number of requests in the running state.
swapped	Number of requests in the swapped state.
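Besides plotting the trend, the file can be scanned for the moment at which a given state peaked. The sketch below assumes that rows is a non-empty iterable of dictionaries keyed by the field names in Table 12, already filtered to a single hostuid and pid if several processes are present. The function name is hypothetical.

```python
def peak_state(rows, state="waiting"):
    """Return (timestamp_ms, count) at the highest value of a state column."""
    top = max(rows, key=lambda row: float(row[state]))
    return float(top["timestamp(ms)"]), int(float(top[state]))
```

For example, peak_state(rows, "swapped") shows when the largest number of requests was in the swapped state.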

{host_name}_eplb_{i}_summed_hot_map_by_expert.png

This is an expert hotspot heatmap. In Figure 3, pixel brightness reflects hotspot intensity (see the colorbar on the right); that is, brighter pixels signify higher heat.

host_name indicates the name of the device where data is located.
i indicates the number of load balancing table updates during the serving profiling period when dynamic load balancing is enabled on MindIE. If dynamic load balancing is disabled, i is 0.

Figure 3 Heatmap

The x-axis represents the expert ID, while the y-axis represents the MoE layer of the model.

In the model instance, Rank_ID is sorted in ascending order, with experts indexed sequentially within each rank. For example, in a configuration with 16 ranks and 17 experts per rank, Rank_2 (the 3rd rank) holds expert IDs 34 to 50 when IDs start at 0, so expert ID 42 corresponds to expert_8 (the 9th expert) on Rank_2.
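The layout described above amounts to an integer division of the expert ID by the number of experts per rank. The following is a minimal sketch with a hypothetical helper. Whether global expert IDs start at 0 or 1 is not stated here, so the starting value is exposed as the first_id parameter (an assumption).

```python
def locate_expert(expert_id, experts_per_rank, first_id=0):
    """Map a global expert ID to (rank index, zero-based index within the rank)."""
    rank, local = divmod(expert_id - first_id, experts_per_rank)
    return rank, local
```

With 17 experts per rank, locate_expert(42, 17) returns (2, 8) for zero-based IDs, whereas locate_expert(42, 17, first_id=1) returns (2, 7).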

{host_name}_eplb_{i}_summed_hot_map_by_rank.png

This is an expert hotspot heatmap. In Figure 4, pixel brightness reflects hotspot intensity (see the colorbar on the right); that is, brighter pixels signify higher heat.

host_name indicates the name of the device where the expert is located.
i indicates the number of load balancing table updates during the serving profiling period when dynamic load balancing is enabled on MindIE. If dynamic load balancing is disabled, i is 0.

Figure 4 Heatmap

The x-axis represents the rank ID, while the y-axis represents the MoE layer of the model.

{host_name}_eplb_{i}_summed_hot_map_by_model_expert.png

This is an expert hotspot heatmap. In Figure 5, pixel brightness reflects hotspot intensity (see the colorbar on the right); that is, brighter pixels signify higher heat.

host_name indicates the name of the device where the expert is located.
i indicates the number of load balancing table updates during the serving profiling period when dynamic load balancing is enabled on MindIE. If dynamic load balancing is disabled, i is 0.
This heatmap is generated only when the dynamic load balancing feature of MindIE is enabled.

Figure 5 Heatmap

The x-axis represents the expert ID, with shared experts positioned at the end of the sequence. The y-axis represents the MoE layer of the model.
