msprof (Timeline Report)

Availability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Timeline report: msprof*.json.

The following figure shows a sample msprof*.json file opened in chrome://tracing.

Figure 1 Timeline summary display

As shown in Figure 1, the timeline summary data is displayed in the following areas:

  • Area 1: application-layer data, including the time consumed by the upper-layer application. This data is collected only in msproftx or PyTorch scenarios.
  • Area 2: data at the CANN layer, including the time consumption data of components (such as AscendCL and Runtime) and nodes (operators).
  • Area 3: bottom-layer NPU data, including the time consumption data and iteration trace data of each task stream under Ascend Hardware, HCCL and Overlap Analysis communication data, and other Ascend AI Processor system data.
  • Area 4: details about the operator or API selected in the timeline (displayed when you click it in the timeline).
  • Data of the timeline report is described in detail in Profile Data File References.
  • The data shown in each area of the above figure depends on the collection scenario. Area 1 is generated only when data is collected in msproftx or PyTorch scenarios, and HCCL and Overlap Analysis communication data is collected only in multi-device, multi-node, or cluster communication scenarios. Refer to the data actually collected.
  • The msprof*.json file displays data within iterations. Data outside iterations is not displayed.
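Because chrome://tracing loads files in the Chrome Trace Event Format, a msprof*.json file can also be inspected programmatically. The following is a minimal Python sketch, assuming the file is either a bare JSON array of events or an object with a traceEvents array, and that each timeline group is labeled by a process_name metadata event ("ph": "M"). The function name events_per_process is hypothetical, and which process names correspond to Areas 1 to 3 depends on the data actually collected.

```python
import json
from collections import Counter


def events_per_process(trace_text):
    """Count non-metadata events per process (timeline group) in trace JSON text.

    Assumption: the text follows the Chrome Trace Event Format, either as a
    bare array of events or as an object with a "traceEvents" array.
    """
    data = json.loads(trace_text)
    events = data["traceEvents"] if isinstance(data, dict) else data

    # Metadata events named "process_name" map a pid to its display name.
    names = {
        ev["pid"]: ev["args"]["name"]
        for ev in events
        if ev.get("ph") == "M" and ev.get("name") == "process_name"
    }

    counts = Counter()
    for ev in events:
        if ev.get("ph") == "M":
            continue  # skip metadata; count only timeline events
        # Fall back to the raw pid when no process_name metadata exists.
        counts[names.get(ev.get("pid"), str(ev.get("pid")))] += 1
    return dict(counts)
```

To run it on a real file, pass the file contents, for example events_per_process(open(path).read()), where path is the location of your msprof*.json file.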

Operator Delivery Direction Check

When viewing a .json file in chrome://tracing, enable the options under Flow events to display the delivery and execution mappings between application-layer operators and NPU operators as connection lines. See Figure 2.

The mappings include:

  • async_npu: delivery and execution mapping from application-layer operators to NPU operators on Ascend Hardware.
  • MsTx: delivery and execution mapping from training/inference process dotting tasks to NPU dotting operators on Ascend Hardware. This mapping is generated when the aclprofMarkEx API is called for dotting.
  • async_task_queue: mapping from enqueuing to dequeuing at the application layer.
  • HostToDevice: delivery and execution mapping from CANN-layer nodes (operators) to NPU operators and to HCCL communication operators on Ascend Hardware (host to device).
  • fwdbwd: mapping from forward APIs to backward APIs.
  • Because the Ascend AI Processor frequency measured by software deviates from the actual frequency, and because of the time synchronization error between the host and device, some lower-layer operators may be misaligned on the timeline and therefore not connected by lines.
  • Whether mappings between layers are displayed depends on whether the data is collected in a specific scenario.
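To check which of these mapping types are present in a collected file, the flow events can be counted per type. The following is a minimal Python sketch, assuming the mapping type (async_npu, HostToDevice, and so on) is stored in the cat field of the flow start ("ph": "s") and finish ("ph": "f") events, and that one flow is identified by its (cat, id) pair. The function name flow_summary is hypothetical.

```python
from collections import defaultdict


def flow_summary(events):
    """Count complete and incomplete flows per mapping type.

    Assumptions: the mapping type is in "cat", and a single flow is
    identified by its (cat, id) pair.
    """
    phases = defaultdict(set)  # (cat, id) -> flow phases seen for that flow
    for ev in events:
        if ev.get("ph") in ("s", "t", "f"):
            phases[(ev.get("cat"), ev.get("id"))].add(ev["ph"])

    summary = {}
    for (cat, _flow_id), seen in phases.items():
        counts = summary.setdefault(cat, {"linked": 0, "unlinked": 0})
        # A flow is "linked" only when both its start and finish were recorded.
        counts["linked" if {"s", "f"} <= seen else "unlinked"] += 1
    return summary
```

A mapping type that is missing from the result was most likely not collected in that scenario, which matches the note above that displayed mappings depend on the collection scenario.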
Figure 2 Operator mappings

You can click the operator or API at either end of a connection line to view its delivery direction. See Figure 3.

Figure 3 Operator information

View the inbound and outbound directions of an operator or API in the Event(s) column. View the information at both ends of a mapping in the Link column.
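The Link column resolves each end of a mapping to an operator or API slice. The following Python sketch roughly approximates this by taking, for each flow endpoint, the innermost complete event ("ph": "X") on the same pid and tid whose time range contains the endpoint timestamp. This is a simplification of how the trace viewer binds flow arrows (it ignores binding to the next slice when "bp": "e" is absent), and the (cat, id) flow key and the function names enclosing_slice and link_ends are hypothetical. An endpoint that falls inside no slice returns None, which is one plausible way the misalignment described in the notes above shows up.

```python
def enclosing_slice(events, pid, tid, ts):
    """Return the name of the innermost "X" event on pid/tid whose span contains ts."""
    ts = float(ts)  # float() accepts both numeric and string timestamps
    best, best_dur = None, None
    for ev in events:
        if ev.get("ph") != "X" or ev.get("pid") != pid or ev.get("tid") != tid:
            continue
        start = float(ev["ts"])
        dur = float(ev.get("dur", 0))
        # Prefer the shortest enclosing slice, i.e. the most deeply nested one.
        if start <= ts <= start + dur and (best_dur is None or dur < best_dur):
            best, best_dur = ev, dur
    return best["name"] if best is not None else None


def link_ends(events, cat, flow_id):
    """Resolve one flow to (source slice name, target slice name); None if unresolved."""
    ends = {}
    for ev in events:
        if ev.get("ph") in ("s", "f") and ev.get("cat") == cat and ev.get("id") == flow_id:
            ends[ev["ph"]] = enclosing_slice(events, ev["pid"], ev["tid"], ev["ts"])
    return ends.get("s"), ends.get("f")
```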