Data Preparation
Data Preparation displays the analysis results for data preparation performance.
- This function applies only to the cluster training scenario.
- MindStudio IDE does not support data collection in cluster scenarios. You can use Import Result to import the parent directory of PROF_XXX to display the collected profile data.
The Data Preparation process can be divided into three phases: data processing pipeline, training data sending to the device, and training data reading on the device. The MindStudio performance analysis tool can analyze performance bottlenecks in the last two phases (training data sending to the device and training data reading on the device) by identifying iteration intervals.
Figure 1 shows the data offloading mode, which contains Host Queues, Data Queues, Host Data Transmission, and Data Acquisition (data acquisition operator time consumption).
Figure 2 shows the non-data offloading mode, which contains only Host Queues.
- The vertical coordinate of Data Queues indicates the queue length when the device reads training data. If the data queue length is 0, training waits until data arrives in the queue before it can start the next iteration. In this case, the iteration may have a performance bottleneck. If the data queue length is greater than 0, data can be read quickly during training, and data preparation is not the bottleneck of the iteration. If the curve in the chart fluctuates, reads from the queue are delayed at some points during training, and a performance bottleneck may exist.
- The vertical coordinate of Host Queues is the number of cached data records in the current queue. If the number of cached data records in the queue is 0 in most cases, the data processing pipeline may have a performance bottleneck. If this number is greater than 0, the process of sending data to the device after obtaining data may have a performance bottleneck.
- The vertical coordinate of Host Data Transmission is the time required for obtaining and pushing data on the host. If the time is long, a performance bottleneck may exist.
- The vertical coordinate of Data Acquisition is the time required for obtaining data on the device. If the time is long, a performance bottleneck may exist.
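
The interpretation rules above can be sketched as a small heuristic. This is only an illustration and not part of MindStudio: the function names, the input lists (one queue sample per chart point), and the `mostly_empty` threshold of 0.5 used to approximate "in most cases" are all assumptions.

```python
def empty_ratio(samples):
    """Fraction of samples in which the queue was empty (length 0)."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(1 for s in samples if s == 0) / len(samples)


def diagnose(host_queue_sizes, data_queue_lengths=None, mostly_empty=0.5):
    """Return hints following the chart interpretation rules.

    host_queue_sizes:   cached record counts read from the Host Queues chart.
    data_queue_lengths: queue lengths read from the Data Queues chart, or None
                        in the non-data offloading mode (Host Queues only).
    mostly_empty:       hypothetical threshold standing in for "in most cases".
    """
    hints = []
    if data_queue_lengths is not None:
        if empty_ratio(data_queue_lengths) == 0:
            # Data was always available when the device read from the queue.
            hints.append("data queue never empty: data preparation is not the bottleneck")
            return hints
        hints.append("data queue empty at times: training may be waiting for data")
    if empty_ratio(host_queue_sizes) > mostly_empty:
        hints.append("host queue mostly empty: the data processing pipeline may be the bottleneck")
    else:
        hints.append("host queue mostly non-empty: sending data to the device may be the bottleneck")
    return hints


if __name__ == "__main__":
    # Data offloading mode: both charts are available.
    for hint in diagnose([0, 0, 0, 1], [0, 1, 0, 0]):
        print(hint)
```

Long Host Data Transmission or Data Acquisition durations are not covered by this sketch, because the text above gives no numeric threshold for "long".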
Table 1 describes the fields in the figures.
| Field | Description |
|---|---|
| Rank ID | Rank ID in the cluster scenario. |
| Apply | Data export button. When you select a rank ID and click this button, the Data Preparation data of the rank is exported. |
| Queues Analysis | |
| Host Queues | Host queue chart. |
| Data Queues | Data queue chart. |
| Proportion of Empty Queues: */* | Proportion of empty queues: Number of empty queues/Total number of queues. It is the summary of the horizontal and vertical coordinates of the host and data queue charts. |
| Iteration | Iteration. |
| Queue size | Number of queues. |
| Consumption Analysis | |
| Host Data Transmission | Host data sending chart. |
| Average Duration: *ms | Average duration, in milliseconds. It is the mean value obtained after the horizontal and vertical coordinates of the host data sending chart are summarized. |
| Average Data Acquisition Duration: *ms | Average data acquisition duration, in milliseconds. |
| Average Data Sending Duration: *ms | Average data sending duration, in milliseconds. |
| Total duration | Total duration. It is the summary of the horizontal and vertical coordinates of the host data sending chart. |
| Data acquisition duration | Data acquisition duration. |
| Data sending duration | Data sending duration. |
| Data Acquisition | Time consumption of the data acquisition operator. |
| Average Duration: *ms | Average duration, in milliseconds. It is the mean value obtained after the horizontal and vertical coordinates of the time consumption chart of the data acquisition operator are summarized. |
| Iteration | Iteration. |
| Time(ms) | Time required. |
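
As a rough illustration of how the summary fields in Table 1 relate to the chart data, the following sketch formats the `Proportion of Empty Queues: */*` and `Average Duration: *ms` labels from per-iteration samples. The function names and input lists are hypothetical, and the sketch assumes one chart point per iteration. The exact calculation used by MindStudio is not documented here.

```python
def format_empty_queue_proportion(queue_sizes):
    """Build the 'Proportion of Empty Queues' label.

    Counts the chart points whose queue size is 0 and divides by the
    total number of points, as described in Table 1.
    """
    empty = sum(1 for size in queue_sizes if size == 0)
    return f"Proportion of Empty Queues: {empty}/{len(queue_sizes)}"


def format_average_duration(durations_ms):
    """Build the 'Average Duration' label from per-iteration durations in ms."""
    if not durations_ms:
        raise ValueError("durations_ms must not be empty")
    average = sum(durations_ms) / len(durations_ms)
    return f"Average Duration: {average:.2f}ms"


if __name__ == "__main__":
    print(format_empty_queue_proportion([0, 3, 0, 5]))
    print(format_average_duration([1.0, 2.0, 3.0]))
```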

