Cluster Iteration Analysis

Cluster Iteration Analysis summarizes the iteration performance analysis data in the training cluster scenario, including the information on the summary page and detailed data of each iteration.

MindStudio IDE does not support data collection in cluster scenarios. You can use Import Result to import the parent directory of PROF_XXX to display the collected profile data.

Summary Page

When you access Cluster Analysis for the first time, the summary page is displayed. A bar chart on this page can include a maximum of 10 groups of data.

The summary page is divided into areas 1 and 2. For details about the fields, see Table 1 and Table 2.

When Type is set to Iteration ID, Step Trace is displayed together with Data Parallelism Statistics, Model Parallelism Statistics, or Pipeline Parallelism Statistics, as shown in Figure 1, Figure 2, or Figure 3, respectively.

When Type is set to Rank ID, only Step Trace is displayed, as shown in Figure 4.

Figure 1 Data Parallelism Statistics

Figure 2 Model Parallelism Statistics

Figure 3 Pipeline Parallelism Statistics

Figure 4 Rank ID

If you click a bar in the Step Trace area on the summary page, the detailed iteration data page for the corresponding iteration ID or rank ID is displayed.
The horizontal and vertical axes of the bar chart are described as follows:
- When Type is set to Iteration ID, the horizontal axis lists the iteration traces of all cluster nodes, sorted by total duration in descending order from left to right by default, and the parallelism statistics, sorted by computation time in descending order. If you click a column name in a table in the right pane, the bar chart is re-sorted by the values in that column. The vertical axis shows durations.
- When Type is set to Rank ID, the horizontal axis lists all iteration traces of the current cluster node, sorted by total duration in descending order from left to right by default. If you click a column name in a table in the right pane, the bar chart is re-sorted by the values in that column. The vertical axis shows iteration durations.
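
The default ordering described above can be sketched in a few lines of Python. This is only an illustration of "sort in descending order, then keep the top N"; the record layout and the names `top_n_by`, `total_time_us`, and `computation_time_us` are hypothetical and are not part of MindStudio or its exported data format.

```python
from typing import Dict, List


def top_n_by(records: List[Dict[str, float]],
             column: str = "total_time_us",
             n: int = 10) -> List[Dict[str, float]]:
    """Return the n records with the largest value in `column`, largest first.

    The 1..200 bound mirrors the documented range of the Top field.
    """
    if not 1 <= n <= 200:
        raise ValueError("n must be between 1 and 200")
    return sorted(records, key=lambda r: r[column], reverse=True)[:n]


# Hypothetical per-rank records for one iteration (values in microseconds).
records = [
    {"rank_id": 0, "total_time_us": 1200.0, "computation_time_us": 900.0},
    {"rank_id": 1, "total_time_us": 1500.0, "computation_time_us": 700.0},
    {"rank_id": 2, "total_time_us": 1100.0, "computation_time_us": 950.0},
]

# Default order: total duration, descending, limited to the top 2.
by_total = [r["rank_id"] for r in top_n_by(records, n=2)]

# Clicking a column name corresponds to re-sorting by that column.
by_compute = [r["rank_id"] for r in top_n_by(records, "computation_time_us")]
```

Sorting a copy with `sorted` rather than sorting in place keeps the original record order available when the user switches columns.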

**Table 1** Fields in area 1
Field	Description
Type	Data display mode. Iteration ID: when you set Type to Iteration ID and click Apply, the bar chart in the lower part displays the iteration data of all cluster nodes in the current iteration. Rank ID: when you set Type to Rank ID and click Apply, the bar chart in the lower part displays all iteration data of the current node.
Iteration ID	Iteration ID, used for querying the iteration data of all devices in a specified iteration.
Rank ID	Rank ID, used for querying all iteration data of a specified node.
Model ID	Model ID, used for querying the iteration data of a specified model in a specified iteration or on a specified node.
Apply	Button for applying the selection. After you select an iteration ID/rank ID and a model ID and click this button, the Cluster Iteration Analysis data for that selection is displayed.
Step Trace (iteration trace data)
Bar Chart	Displays the iteration duration data as a bar chart. If this option is selected, FP to BP Time, Iteration Refresh, and Iteration Interval are displayed side by side in the bar chart.
Stack Chart	Displays the iteration duration data as a stack chart. If this option is selected, FP to BP Time, Iteration Refresh, and Iteration Interval are stacked in a single bar.
Top	Top N value. Set it to display the top N data records with the longest iteration durations. The value ranges from 1 to 200. The default value is 10.
FP to BP Time(us)	FP/BP elapsed time (= BP End – FP Start). The unit is μs.
Iteration Refresh(us)	Iteration refresh hangover time (= Iteration End – BP End). The unit is μs.
Iteration Interval(us)	Iteration interval. The unit is μs.
Total Time(us)	Total iteration duration. The unit is μs.
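
The two formulas in Table 1 can be written out as a small worked example. This is a minimal sketch: the `StepTimestamps` class and its field names are hypothetical, and only the two fields for which the table gives a formula are computed.

```python
from dataclasses import dataclass


@dataclass
class StepTimestamps:
    """Hypothetical timestamps of one iteration, in microseconds."""
    fp_start: float       # start of forward propagation
    bp_end: float         # end of backward propagation
    iteration_end: float  # end of the iteration


def fp_to_bp_time(ts: StepTimestamps) -> float:
    """FP to BP Time = BP End - FP Start."""
    return ts.bp_end - ts.fp_start


def iteration_refresh(ts: StepTimestamps) -> float:
    """Iteration Refresh = Iteration End - BP End."""
    return ts.iteration_end - ts.bp_end


step = StepTimestamps(fp_start=100.0, bp_end=850.0, iteration_end=900.0)
print(fp_to_bp_time(step))      # 750.0
print(iteration_refresh(step))  # 50.0
```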

**Table 2** Fields in area 2
Field	Description
Rank ID	Rank ID.
Top	Top N value. Set it to display the top N data records with the longest collective communication durations. The value ranges from 1 to 200. The default value is 10.
Data Parallelism Statistics (data parallelism mode)
Computation Time(us)	Computation time. The unit is μs. It is the total operator execution time, which is used to determine whether slow cards exist.
Pure Communication Time(us)	Pure communication time, during which only communication operators are executed and compute operators are not executed. The unit is μs.
Communication Time(us)	Communication time. The unit is μs.
Communication Interval(us)	Communication interval. The unit is μs.
Model Parallelism Statistics (model parallelism mode)
Computation Time(us)	Computation time. The unit is μs. It is the total operator execution time, which is used to determine whether slow cards exist.
Pure Communication Time(us)	Pure communication time, during which only communication operators are executed and compute operators are not executed. The unit is μs.
Pipeline Parallelism Statistics (pipeline parallelism mode)
Computation Time(us)	Computation time. The unit is μs. It is the total operator execution time, which is used to determine whether slow cards exist.
Pure Communication Time (Only Receive Op Included)(us)	Pure communication time (including only the Receive operator), during which only the point-to-point (Receive) communication operator is executed and compute operators are not executed. The unit is μs.
Pure Communication Time (Receive Op Not Included)(us)	Pure communication time (excluding the Receive operator), during which only communication operators except Receive are executed and compute operators are not executed. The unit is μs.
Stage Time(us)	Stage time, that is, duration of each stage. The unit is μs. You can view the data to find the stage that takes the longest time.
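
To make the definition of pure communication time concrete, the following sketch computes the time during which communication operators run while no compute operator runs, given two lists of time intervals. It illustrates the definition in Table 2 only. How MindStudio actually derives the value is not described here, and the interval representation and all function names are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_us, end_us)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> float:
    """Total time covered by the intervals, counting overlaps once."""
    return sum(end - start for start, end in merge(intervals))


def overlap_length(a: List[Interval], b: List[Interval]) -> float:
    """Total time covered by both interval sets at once."""
    a, b = merge(a), merge(b)
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def pure_communication_time(comm: List[Interval],
                            compute: List[Interval]) -> float:
    """Communication time that is not overlapped by any computation."""
    return total_length(comm) - overlap_length(comm, compute)


comm = [(0.0, 40.0), (100.0, 160.0)]      # communication operators
compute = [(30.0, 120.0), (150.0, 155.0)]  # compute operators
print(pure_communication_time(comm, compute))  # 65.0
```

In this example, communication runs alone during 0 to 30, 120 to 150, and 155 to 160, which adds up to 65 μs of the 100 μs total communication time.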

Detailed Iteration Data Page

When you click a bar chart in the Step Trace area on the summary page, the window for detailed profile data of the specified iteration ID/rank ID is displayed, including area 1 (Timeline) and area 2 (Bottleneck/Operator Statistics/Computing Workload). See Figure 5.

Figure 5 Page for detailed iteration data

Figure 6 Operator Statistics

Figure 7 Computing Workload

Area 1:

For details about timeline data, see Timeline View.

Area 2:

Bottleneck: bottlenecks and optimization suggestions.
Bottlenecks are classified into six types: Computation, Memory, Operator Schedule, Operator Processing, Operator Metrics, and Operator Parallelism. Each type contains several sub-issues. You can click see more to view related operator information, and click see more again to view full operator information on the right.

Operator Statistics: operator statistics, as shown in Figure 6.

The pie chart on the left is associated with the data in the table on the right. When you click a column header, the pie chart displays the proportion of each data item based on the actual data in the column. For details about the fields, see Table 3.

**Table 3** Fields in **Operator Statistics**
Field	Description
Model Name	Model name. It may be left empty if no related data is collected.
OP Type	Operator type.
Core Type	Core type.
Count	Number of calls to an operator.
Total Time(us)	Total time taken by all calls to an operator (μs).
Min Time(us)	Minimum time taken by a single call to an operator (μs).
Avg Time(us)	Average time taken by a call to an operator (μs).
Max Time(us)	Maximum time taken by a single call to an operator (μs).
Total Time Ratio(%)	Percentage of duration of the operator calls in the model.
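
The relationship between the fields in Table 3 can be sketched as an aggregation over individual operator calls. This is an illustration only. The input format (a list of `(op_type, duration_us)` pairs) and the function name `operator_statistics` are hypothetical, and the ratio is assumed to be taken over the sum of all call durations in the input.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def operator_statistics(calls: List[Tuple[str, float]]) -> List[Dict[str, object]]:
    """Aggregate per-call durations (in microseconds) by operator type."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for op_type, duration_us in calls:
        durations[op_type].append(duration_us)

    grand_total = sum(duration_us for _, duration_us in calls)
    rows: List[Dict[str, object]] = []
    for op_type, values in durations.items():
        total = sum(values)
        rows.append({
            "op_type": op_type,
            "count": len(values),
            "total_time_us": total,
            "min_time_us": min(values),
            "avg_time_us": total / len(values),
            "max_time_us": max(values),
            # Share of this operator type in the overall duration.
            "total_time_ratio_pct": 100.0 * total / grand_total if grand_total else 0.0,
        })
    # Longest-running operator types first.
    rows.sort(key=lambda row: row["total_time_us"], reverse=True)
    return rows


calls = [("MatMul", 40.0), ("MatMul", 60.0),
         ("Add", 10.0), ("Add", 30.0),
         ("Cast", 60.0)]
for row in operator_statistics(calls):
    print(row)
```

The `total_time_ratio_pct` values of all rows add up to 100, which is what a pie chart over that column would display.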

Computing Workload: operator computation workload, as shown in Figure 7.

The pie chart is not associated with the table on the right. It is drawn based on the proportion of each operator type in the OP Type column in the table. This pie chart is displayed only when profile data is collected in Task-based mode. For details about the fields, see Table 4.

**Table 4** Fields in **Computing Workload**
Field	Description
FLOPs(M)	Number of floating-point operations, in millions (M). It measures the amount of computation, not the computing speed.
FLOPS(G/s)	Peak number of floating-point operations per second, in G/s. It indicates the peak computing speed.
FLOPS AVG(bytes)	Average number of floating-point operations per second. It indicates the average computing speed.

The fields displayed in the table on the right vary with the AI Core collection type. For details about the fields, see AI Core Metrics View.
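
As a unit check for the workload fields, the sketch below converts an operation count and a duration into a throughput. It assumes that FLOPs(M) is a count of floating-point operations in millions and that the duration is given in μs. It shows only the general relation "throughput = operations / time" and does not describe how MindStudio computes its FLOPS columns. The function name is hypothetical.

```python
def flops_giga_per_second(flops_m: float, duration_us: float) -> float:
    """Convert an operation count (millions) and a duration (us) to G/s.

    (flops_m * 1e6 operations) / (duration_us * 1e-6 s) / 1e9
    simplifies to flops_m / duration_us * 1e3.
    """
    if duration_us <= 0:
        raise ValueError("duration_us must be positive")
    return flops_m / duration_us * 1e3


# 2000 million operations executed in 1000 us (1 ms).
print(flops_giga_per_second(2000.0, 1000.0))  # 2000.0
```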
