step_trace (Iteration Trace Information)
Timeline information for iteration trace data is provided in the step_trace_*.json file, and summary statistics are provided in the step_trace_*.csv file, so that time-consuming iterations can be identified.
Availability
This profile data file does not exist in single-operator scenarios (such as the PyTorch scenario).
step_trace_*.json File
Iteration trace data is stored in step_trace_*.json. You can identify the iteration that takes the longest time based on the iteration duration.
The file content is formatted as follows:
Iteration trace data records the software status of a training job and of the Ascend AI Software Stack, and can be used to analyze training performance. If the default two-segment gradient segmentation policy is applied, the iteration trace points fp_start, bp_end, Reduce Start, and Reduce Duration (us) of a training job are printed to describe the job execution status in an iteration.
In offline inference scenarios, FP (the start point of the forward propagation operator in iteration traces) and BP (the end point of the backward propagation operator in iteration traces) are not collected. In the collection result, FP Start and BP End are displayed as N/A, and no timeline exists.

As shown in the preceding figure, to determine the gradient segmentation policy, calculate the difference between bp_end and allreduce1_end as follows: (BP End – Reduce End)/freq. (Based on the obtained iteration traces, the first batch of HCCS time is used for the calculation.)
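The calculation above can be sketched as follows. All numbers and the timestamp frequency `freq` are hypothetical placeholders; in a real trace, bp_end and the Reduce end point would come from the step_trace_*.json fields described below:

```python
# Sketch of the gradient-segmentation check described above.
# All values are hypothetical placeholders, not from a real trace.
bp_end = 1_000_500          # BP End timestamp (raw ticks, assumed)
reduce_start = 1_000_100    # Reduce Start timestamp (raw ticks, assumed)
reduce_duration = 350       # Reduce Duration (raw ticks, assumed)
freq = 100.0                # timestamp frequency (ticks per us, assumed)

# Reduce End is the start of the first Reduce segment plus its duration.
reduce_end = reduce_start + reduce_duration
gap_us = (bp_end - reduce_end) / freq  # (BP End - Reduce End)/freq
print(f"gap between BP end and first Reduce end: {gap_us:.2f} us")
```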
| Field | Description |
|---|---|
| Title | API name of a component. |
| Start | Start point on the timeline, automatically aligned with the Chrome trace timeline (ms). |
| Wall Duration | Time taken by the calls to an API (ms). |
| Iteration ID | Iteration ID for graph-based statistics collection. The iteration ID increases by 1 each time a graph is executed. When a script is compiled into multiple graphs, the iteration ID differs from the step ID at the script layer. |
| FP Start | FP start time (ns). |
| Iteration End | End time of each iteration (ns). |
| Iteration Time(ns) | Iteration duration (ns). |
| BP End | BP end time (ns). |
| FP_BP Time | FP/BP elapsed time (= BP End – FP Start) (ns). |
| Iteration Refresh | Iteration refresh lag (= Iteration End – BP End) (ns). |
| Data_aug Bound | Data augmentation overhead (= Current FP Start – Previous Iteration End). The value for iteration 0 is N/A because there is no previous Iteration End. |
| Reduce | Collective communication elapsed time (may span groups of iterations). ph:B indicates the start time, and ph:E indicates the end time. If only one device is used, no Reduce data is output. |
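As a sketch of how the fields above might be consumed to find the slowest iteration, the snippet below scans Chrome-trace-style events for the `Iteration Time(ns)` field. The event layout (a list of dicts with `name` and `args` keys, as loaded by `json.load` from a step_trace_*.json file) is an assumption for illustration, not a documented schema:

```python
# Hypothetical Chrome-trace-style events, standing in for the parsed content
# of a step_trace_*.json file (e.g. via json.load); the schema is an assumption.
events = [
    {"name": "Iteration Time(ns)", "args": {"Iteration ID": 1, "Iteration Time(ns)": 950_000}},
    {"name": "Iteration Time(ns)", "args": {"Iteration ID": 2, "Iteration Time(ns)": 1_200_000}},
    {"name": "Iteration Time(ns)", "args": {"Iteration ID": 3, "Iteration Time(ns)": 980_000}},
]

def slowest_iteration(trace_events):
    """Return (iteration_id, duration_ns) of the longest iteration."""
    iters = [e["args"] for e in trace_events if e.get("name") == "Iteration Time(ns)"]
    worst = max(iters, key=lambda a: a["Iteration Time(ns)"])
    return worst["Iteration ID"], worst["Iteration Time(ns)"]

iter_id, dur = slowest_iteration(events)
print(f"slowest iteration: {iter_id} ({dur} ns)")
```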
Data Read Time Analysis
You can use the GetNext time segments to determine whether the interval between the end of the previous iteration and the start of the current iteration is too large due to slow data reading. See Figure 2.
Only the TensorFlow framework supports this function.
| Field | Description |
|---|---|
| GetNext Start | Start time of data reading (ns). |
| GetNext End | End time of data reading (ns). |
| GetNext Time(ns) | Time required for data reading (ns). |
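The fields above can be combined to estimate how much of the inter-iteration gap is spent reading data. The timestamps below are hypothetical placeholders, not values from a real trace:

```python
# Hypothetical GetNext timestamps (ns) for one iteration, following the
# fields described above; real values come from step_trace_*.json.
getnext_start = 2_000_000
getnext_end = 2_450_000
prev_iteration_end = 1_900_000
fp_start = 2_500_000

getnext_time = getnext_end - getnext_start       # GetNext Time(ns)
inter_iter_gap = fp_start - prev_iteration_end   # gap the data read must fit into

# If data reading dominates the gap before FP starts, the input
# pipeline is the likely bottleneck.
read_share = getnext_time / inter_iter_gap
print(f"GetNext time: {getnext_time} ns ({read_share:.0%} of the inter-iteration gap)")
```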
step_trace_*.csv File
The file content is formatted as follows. Conclusions drawn from the step_trace_*.json file can be confirmed against the information in the step_trace_*.csv file.
| Field | Description |
|---|---|
| Device_id | Device ID. |
| Iteration ID | Iteration ID for graph-based statistics collection. The iteration ID increases by 1 each time a graph is executed. When a script is compiled into multiple graphs, the iteration ID differs from the step ID at the script layer. |
| FP Start (μs) | FP start time (μs). |
| BP End (μs) | BP end time (μs). |
| Iteration End (μs) | End time of each iteration (μs). |
| Iteration Time (μs) | Iteration duration (μs). |
| FP to BP Time (μs) | FP/BP elapsed time (= BP End – FP Start) (μs). |
| Iteration Refresh (μs) | Iteration refresh lag (= Iteration End – BP End) (μs). |
| Data Aug Bound (μs) | Data augmentation overhead (= Current FP Start – Previous Iteration End) (μs). The value for iteration 0 is N/A because there is no previous Iteration End. |
| Model ID | Graph ID in the model for a round of iteration. |
| Reduce Start (μs) | Start time of collective communication (μs). |
| Reduce Duration (μs) | Total duration of collective communication (μs). With the default segmentation policy, the communication is divided into two segments; Reduce Start indicates the start time, and Reduce Duration indicates the duration from start to end. If only one device is used, Reduce profile data is not collected. |
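A minimal sketch of cross-checking the CSV fields, assuming the column names match the table above (with "us" in place of "μs"); the two-row sample is fabricated for illustration:

```python
import csv
import io

# Fabricated two-iteration sample standing in for a step_trace_*.csv file;
# column names follow the table above, values are hypothetical.
sample = """Device_id,Iteration ID,FP Start (us),BP End (us),Iteration End (us)
0,1,100.0,900.0,950.0
0,2,1000.0,1850.0,1900.0
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for prev, cur in zip([None] + rows, rows):
    fp_bp = float(cur["BP End (us)"]) - float(cur["FP Start (us)"])         # FP to BP Time
    refresh = float(cur["Iteration End (us)"]) - float(cur["BP End (us)"])  # Iteration Refresh
    # Data Aug Bound is N/A for the first iteration (no previous Iteration End).
    aug = (float(cur["FP Start (us)"]) - float(prev["Iteration End (us)"])) if prev else None
    print(cur["Iteration ID"], fp_bp, refresh, aug)
```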
