Data Preparation

Data Preparation displays the performance analysis of the data preparation process.

  • This function applies only to the cluster training scenario.
  • MindStudio does not support data collection in the cluster scenario. Instead, use Merge Reports to import the parent directory of PROF_XXX and display the collected profile data.

The data preparation process can be divided into three phases: running the data processing pipeline, sending training data to the device, and reading training data on the device.

MindStudio Profiling analyzes performance bottlenecks for two of these phases, sending training data to the device and reading training data on the device, by identifying iteration gaps. As shown in Figure 1, the view includes Data Queues (the data queue chart, an important basis for analysis) and Data Acquisition (the time consumption chart of the data acquisition operator).
  • The vertical axis of Data Queues indicates the queue length when the device reads training data. If the queue length is 0, training waits until data arrives in the queue before the next iteration can proceed, so the iteration may have a performance bottleneck. If the queue length is greater than 0, training can read data quickly, and data preparation is not the bottleneck of that iteration. If the curve in the chart fluctuates, reads from the queue are being delayed during training, and a performance bottleneck may exist.
  • The vertical axis of Data Acquisition indicates the time that the data acquisition operator takes to fetch data from the host to the queue. If this time is long, a performance bottleneck may exist.
Figure 1 Data Preparation
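
The queue-length rule described above can be sketched in a few lines of Python. This is only an illustration: it assumes the per-iteration queue lengths have already been read off the Data Queues chart into a plain list. The function name and the sample values are hypothetical and are not part of MindStudio.

```python
from typing import List


def flag_empty_queue_iterations(queue_lengths: List[int]) -> List[int]:
    """Return the 1-based iteration numbers whose data queue length was 0.

    A length of 0 means training had to wait for data, so those
    iterations are candidates for a data preparation bottleneck.
    """
    return [
        iteration
        for iteration, length in enumerate(queue_lengths, start=1)
        if length == 0
    ]


# Hypothetical queue lengths for six iterations (not real profile data).
sample_queue_lengths = [4, 3, 0, 0, 2, 5]
print(flag_empty_queue_iterations(sample_queue_lengths))
```

Iterations that are not flagged had data waiting in the queue, so by the rule above data preparation is unlikely to be their bottleneck.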

Table 1 describes the fields in the figure.

Table 1 Field description

Field | Description
Rank ID | Node ID in the cluster scenario.
Apply | Data export button. Select a rank ID and click this button to export the Data Preparation data of that rank.
Data Queues | Data queue chart.
Proportion of empty queues: */* | Number of empty queues/Total number of queues, summarized from the horizontal and vertical coordinates of the data queue chart.
Iteration | Iteration.
Data Acquisition | Time consumption chart of the data acquisition operator.
Average duration: *ms | Average duration, in milliseconds, summarized from the horizontal and vertical coordinates of the time consumption chart of the data acquisition operator.
Time(ms) | Time consumed by the data acquisition operator, in milliseconds.
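
The two summary values in Table 1 can be reproduced by hand as a sanity check. The sketch below assumes that each sampled point in the data queue chart counts as one queue and that the per-sample durations are available in milliseconds. Both function names and the sample values are hypothetical, and the exact way MindStudio aggregates the charts may differ.

```python
from typing import List, Tuple


def empty_queue_proportion(queue_lengths: List[int]) -> Tuple[int, int]:
    """Return (number of empty queues, total number of queues)."""
    empty = sum(1 for length in queue_lengths if length == 0)
    return empty, len(queue_lengths)


def average_duration_ms(durations_ms: List[float]) -> float:
    """Return the mean data acquisition time in milliseconds."""
    if not durations_ms:
        raise ValueError("at least one duration is required")
    return sum(durations_ms) / len(durations_ms)


# Hypothetical samples (not real profile data).
empty, total = empty_queue_proportion([4, 3, 0, 0, 2, 5])
print(f"Proportion of empty queues: {empty}/{total}")
print(f"Average duration: {average_duration_ms([10.0, 20.0, 30.0])}ms")
```

Returning the numerator and denominator separately mirrors the */* form used in the Proportion of empty queues field.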