Cluster Iteration Analysis
Cluster Iteration Analysis summarizes iteration performance data for the training cluster scenario. It consists of a summary page and a detailed data page for each iteration.
MindStudio does not support data collection in the cluster scenario. You can use Merge Reports to import the parent directory of PROF_XXX to display the collected profile data.
Summary Page
When you access Cluster Analysis for the first time, the summary page is displayed, with a maximum of 10 groups of data shown in a bar chart.
The summary page is divided into areas 1 to 4. For details about the fields, see Table 1, Table 2, Table 3, and Table 4.
- When Type is set to Iteration ID, Step Trace (iteration trace data) and Collective Communication (collective communication data) are displayed. When Type is set to Rank ID, only Step Trace is displayed.
- Clicking a bar in the Step Trace area on the summary page opens the detailed iteration data page for the corresponding iteration ID or rank ID.
- The horizontal and vertical axes of the bar chart are described as follows:
  - When Type is set to Iteration ID, the horizontal axis lists the iteration traces of all cluster nodes, sorted by total duration in descending order from left to right by default (collective communication data is sorted by communication time in descending order). If you click a column name in the table in area 2 or 4, the bar chart is re-sorted by the values in that column. The vertical axis shows durations.
  - When Type is set to Rank ID, the horizontal axis lists all iteration traces of the current cluster node, sorted by total duration in descending order from left to right by default. If you click a column name in the table in area 2, the bar chart is re-sorted by the values in that column. The vertical axis shows iteration durations.
| Field | Description |
|---|---|
| Type | Data display mode: Iteration ID or Rank ID. |
| Iteration ID | Iteration ID, used for querying the iteration data of all devices in a specified iteration. |
| Rank ID | Rank ID, used for querying all iteration data of a specified node. |
| Model ID | Model ID, used for querying the iteration data of a specified model in a specified iteration or on a specified node. |
| Apply | Data export button. After you select an iteration ID/rank ID and a model ID, and click this button, the Cluster Iteration Analysis report of the corresponding node is exported. |
| Step Trace | Iteration trace data. |
| Bar Chart | Displays the iteration duration data as a bar chart. If this option is selected, FP to BP time, Iteration Refresh, and Iteration Interval are displayed as side-by-side bars. |
| Stack Chart | Displays the iteration duration data as a stack chart. If this option is selected, FP to BP time, Iteration Refresh, and Iteration Interval are stacked in a single bar. |
| Top | Top N value. The N data records with the longest iteration durations are displayed. The value ranges from 1 to 200. The default value is 10. |
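The default ordering and the Top setting described above can be sketched in a few lines of Python. This is only an illustration, not MindStudio code: the function `top_n_records`, the record keys, and the sample values are hypothetical. Only the descending sort, the 1 to 200 range, and the default of 10 come from the text.

```python
from typing import Dict, List

# Limits taken from the Top field description; the names are illustrative.
TOP_N_MIN, TOP_N_MAX, TOP_N_DEFAULT = 1, 200, 10


def top_n_records(records: List[Dict[str, float]],
                  sort_key: str = "total_time_us",
                  top_n: int = TOP_N_DEFAULT) -> List[Dict[str, float]]:
    """Return the top_n records with the largest sort_key values, in descending order."""
    if not TOP_N_MIN <= top_n <= TOP_N_MAX:
        raise ValueError(f"top_n must be between {TOP_N_MIN} and {TOP_N_MAX}")
    # Descending sort mirrors the default left-to-right order of the bar chart.
    ordered = sorted(records, key=lambda r: r[sort_key], reverse=True)
    return ordered[:top_n]


# Hypothetical per-rank records for one iteration (values are made up).
sample = [
    {"rank_id": 0, "total_time_us": 1200.0},
    {"rank_id": 1, "total_time_us": 1500.0},
    {"rank_id": 2, "total_time_us": 900.0},
]
top_two = top_n_records(sample, top_n=2)
print([r["rank_id"] for r in top_two])
```

Passing a different `sort_key` corresponds to clicking another column name in the table to re-sort the chart.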
| Field | Description |
|---|---|
| Iteration ID | Iteration ID. |
| Rank ID | Rank ID. |
| FP to BP time(us) | FP/BP elapsed time (= BP End – FP Start). The unit is μs. |
| Iteration Refresh(us) | Iteration refresh hangover time (= Iteration End – BP End). The unit is μs. |
| Iteration Interval(us) | Iteration interval. The unit is μs. |
| Total Time(us) | Total iteration duration. The unit is μs. |
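As a worked example of the two formulas in the table, the following Python sketch derives the Step Trace fields from raw timestamps. The class `StepTrace` and its attribute names are hypothetical, and the timestamps are made up. The table does not define how Total Time is composed, so treating it as the sum of the three charted components is an assumption, as is taking the iteration interval as a given input.

```python
from dataclasses import dataclass


@dataclass
class StepTrace:
    """One hypothetical Step Trace row; every value is in microseconds."""
    fp_start_us: float
    bp_end_us: float
    iteration_end_us: float
    iteration_interval_us: float  # taken as given, not derived here

    @property
    def fp_to_bp_us(self) -> float:
        # FP to BP time = BP End - FP Start
        return self.bp_end_us - self.fp_start_us

    @property
    def iteration_refresh_us(self) -> float:
        # Iteration Refresh = Iteration End - BP End
        return self.iteration_end_us - self.bp_end_us

    @property
    def total_time_us(self) -> float:
        # Assumption: total is the sum of the three components shown in the chart.
        return (self.fp_to_bp_us
                + self.iteration_refresh_us
                + self.iteration_interval_us)


step = StepTrace(fp_start_us=100.0, bp_end_us=850.0,
                 iteration_end_us=900.0, iteration_interval_us=50.0)
print(step.fp_to_bp_us, step.iteration_refresh_us, step.total_time_us)
```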
Detailed Iteration Data Page
Area 1:
For details about timeline data, see Timeline View.
Area 2:
Operator Statistics: operator statistics.
| Field | Description |
|---|---|
| Model Name | Model name. It may be left empty if no related data is collected. |
| OP Type | Operator type. |
| Core Type | Core type. |
| Count | Number of calls to an operator. |
| Total Time(us) | Total time taken by all calls to an operator (μs). |
| Min Time(us) | Minimum time required for calling an operator (μs). |
| Avg Time(us) | Average time required for calling an operator (μs). |
| Max Time(us) | Maximum time required for calling an operator (μs). |
| Total Time Ratio(%) | Percentage of the model's total duration taken by the calls to an operator. |
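To make the relationship between the columns concrete, here is a minimal Python sketch that aggregates per-call durations into rows with the same fields. It assumes that rows are grouped by OP Type and Core Type and that the ratio is taken against the sum of all call durations. The source states neither, and the function name, operator names, and core type labels are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def operator_statistics(calls: List[Tuple[str, str, float]]) -> List[Dict[str, object]]:
    """Aggregate (op_type, core_type, duration_us) calls into statistics rows."""
    grouped = defaultdict(list)
    for op_type, core_type, duration_us in calls:
        grouped[(op_type, core_type)].append(duration_us)

    # Assumption: the model's total duration is the sum of all call durations.
    model_total = sum(duration for _, _, duration in calls)

    rows = []
    for (op_type, core_type), durations in grouped.items():
        total = sum(durations)
        rows.append({
            "op_type": op_type,
            "core_type": core_type,
            "count": len(durations),
            "total_time_us": total,
            "min_time_us": min(durations),
            "avg_time_us": total / len(durations),
            "max_time_us": max(durations),
            "total_time_ratio": 100.0 * total / model_total if model_total else 0.0,
        })
    # Longest-running operator types first.
    rows.sort(key=lambda row: row["total_time_us"], reverse=True)
    return rows


# Made-up calls; the operator and core type names are placeholders.
stats = operator_statistics([
    ("MatMul", "AI_CORE", 30.0),
    ("MatMul", "AI_CORE", 50.0),
    ("Cast", "AI_CPU", 20.0),
])
for row in stats:
    print(row["op_type"], row["count"], row["total_time_us"], row["total_time_ratio"])
```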
Area 3:
Computing Workload: operator computing workload.
The pie chart is not associated with the table on the right. It is drawn based on the proportion of each operator type in the OP Type column in the table. This pie chart is displayed only when Profiling collects data in task-based mode. The fields displayed are related to the AI Core collection type. For details about the fields, see AI Core Metrics View.

