communication_statistic (Collective Communication Operator Statistics)
Timeline information for collective communication operators is displayed at the Communication layer in the msprof_*.json file, and summary statistics are written to the communication_statistic_*.csv file. The msprof_*.json file also displays Overlap Analysis data, which shows how the computation and communication pipelines overlap.
HCCL operator data can only be collected and parsed in scenarios where inter-rank communications exist, such as the multi-rank, multi-server, and cluster scenarios.
Availability
Atlas 200/500 A2 Inference Product
Atlas Inference Series Product
Atlas Training Series Product
Atlas A2 Training Series Product/Atlas 800I A2 Inference Product
Atlas A3 Training Series Product
Data at the Communication Layer in msprof_*.json
The following figure shows data at the Communication layer in the msprof_*.json file.


In multi-rank, multi-node, or cluster scenarios, ranks communicate with each other to form communicators. The Communication layer shows the time consumed by communication operators, arranged by communicator. You can use this data to find the communication operator that takes the longest time.
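Finding the longest communication operator can also be scripted. The following is a minimal sketch, not an official tool. It assumes that msprof_*.json follows the Chrome trace event format (complete events with "ph": "X" and a "dur" value, plus "process_name" metadata events) and that the layer is reported under the process name "Communication". The helper names and the sample operator names are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def load_events(path: str) -> List[Dict[str, Any]]:
    """Load trace events; the file may be a bare list or wrap it in "traceEvents"."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return data["traceEvents"] if isinstance(data, dict) else data


def longest_communication_op(
    events: List[Dict[str, Any]], layer_name: str = "Communication"
) -> Optional[Dict[str, Any]]:
    """Return the longest complete ("X") event in the given layer, or None."""
    # Assumption: metadata events (ph == "M", name == "process_name")
    # map a pid to a layer name such as "Communication".
    layer_pids = {
        ev["pid"]
        for ev in events
        if ev.get("ph") == "M"
        and ev.get("name") == "process_name"
        and ev.get("args", {}).get("name") == layer_name
    }
    # Complete events carry their duration in the "dur" field.
    candidates = [
        ev for ev in events
        if ev.get("ph") == "X" and ev.get("pid") in layer_pids
    ]
    return max(candidates, key=lambda ev: float(ev.get("dur", 0)), default=None)


# Hypothetical sample data; a real run would call load_events() on a msprof file.
sample_events = [
    {"ph": "M", "name": "process_name", "pid": 1, "args": {"name": "Communication"}},
    {"ph": "M", "name": "process_name", "pid": 2, "args": {"name": "Ascend Hardware"}},
    {"ph": "X", "name": "hcom_allReduce__001", "pid": 1, "tid": 0, "ts": 100.0, "dur": 850.5},
    {"ph": "X", "name": "hcom_broadcast__002", "pid": 1, "tid": 0, "ts": 1200.0, "dur": 120.0},
    {"ph": "X", "name": "MatMul", "pid": 2, "tid": 3, "ts": 90.0, "dur": 5000.0},
]

slowest = longest_communication_op(sample_events)
print(slowest["name"], slowest["dur"])
```

The layer filter matters: without it, a long computation operator (MatMul in the sample) would be reported instead of a communication operator.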
| Field | Description |
|---|---|
| Public information | |
| Group * Communication (communicator name, determined by the reported name) | Communication operators in a communicator. A rank may belong to different communicators, and a group identifies the behavior of the current rank in the current communicator. |
| Plane ID | Network plane ID. When multiple transmit and receive communication links are scheduled and executed in parallel, each plane is a concurrent communication dimension. |
| Title | API name of a component. |
| Start | Start point on the timeline, which is automatically aligned with that in chrome trace (ms). |
| Wall Duration | Time taken by the calls to an API (ms). |
| Self Time | Execution time of the current instruction (ms). |
| Communication large operator information | |
| connection_id | ID of the connection between a CANN API and an NPU operator when the former is delivered to the latter. |
| model id | Model ID. |
| data_type | Data type. |
| alg_type | Algorithm type in each phase of communication operators, which can be MESH, RING, NB, HD, NHR, PIPELINE, PAIRWISE, or STAR. |
| count | Data transmission count. |
| relay | Whether link relay occurs for the communication operator: yes (link relay occurs) or no (no link relay occurs). Applicable products: Atlas A2 Training Series Product/Atlas 800I A2 Inference Product (only no is displayed, with no specific meaning); Atlas A3 Training Series Product. |
| retry | Whether execution retry occurs for the communication operator: yes (execution retry occurs) or no (no execution retry occurs). Applicable products: Atlas A2 Training Series Product/Atlas 800I A2 Inference Product; Atlas A3 Training Series Product. |
| Communication small operator information | |
| notify id | Unique notify ID. The notify ID is valid only for notify tasks and RDMA send tasks used to transmit the notify record signals. For other task types, the notify ID is invalid and is displayed as 18446744073709551615. |
| duration estimated(us) | Estimated task duration, in μs. |
| stream id | Stream ID. |
| task id | Task ID. |
| task type | Task type. |
| src rank | Source rank. |
| dst rank | Destination rank. The value 4294967295 indicates a local on-chip operation. |
| transport type | Transmission type, including LOCAL, SDMA, and RDMA. |
| size(Byte) | Data volume, in bytes. This field is invalid when the task type is notify and is set to 0. |
| data type | Data type. |
| link type | Link type, which can be HCCS, PCIe, or RoCE. |
| bandwidth(GB/s) | Bandwidth, in GB/s. |
| model id | Model ID. |
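The two placeholder values in the small operator fields are the maximum unsigned 64-bit and 32-bit integers, so a script that post-processes these fields may want to treat them as markers rather than real IDs. The sketch below shows one way to do that. The function name, the `is_local` key, and the sample dictionaries are hypothetical, and it assumes the fields arrive as a dictionary keyed by the field names listed above.

```python
from typing import Any, Dict

INVALID_NOTIFY_ID = 18446744073709551615  # 2**64 - 1: notify ID is not valid for this task
LOCAL_DST_RANK = 4294967295               # 2**32 - 1: local on-chip operation


def normalize_task_args(args: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the task fields with placeholder values made explicit."""
    out = dict(args)
    # int() accepts both numbers and numeric strings, in case values are serialized as text.
    notify_id = int(args["notify id"])
    dst_rank = int(args["dst rank"])
    out["notify id"] = None if notify_id == INVALID_NOTIFY_ID else notify_id
    out["is_local"] = dst_rank == LOCAL_DST_RANK
    out["dst rank"] = None if dst_rank == LOCAL_DST_RANK else dst_rank
    return out


# Hypothetical samples for illustration only.
local_task = normalize_task_args(
    {"notify id": "18446744073709551615", "src rank": 0,
     "dst rank": 4294967295, "transport type": "LOCAL"}
)
remote_task = normalize_task_args(
    {"notify id": 12, "src rank": 0, "dst rank": 3, "transport type": "RDMA"}
)
print(local_task["notify id"], local_task["is_local"])
print(remote_task["notify id"], remote_task["dst rank"])
```

Replacing the placeholders with `None` keeps them from being mistaken for real IDs or ranks when the data is aggregated or sorted.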
Pipeline Overlapping Analysis of Computation and Communication
Overlap Analysis in msprof_*.json contains the pipeline overlapping analysis data for computation and communication. Its collection is controlled by --task-time and --hccl. See Figure 3.
Computation and communication sometimes run in parallel. You can check the pipeline overlapping time (the time during which computation and communication run in parallel) to evaluate computation and communication efficiency.
| Field | Description |
|---|---|
| Communication | Communication time. This field is not displayed in the single-rank scenario because no communication is involved. |
| Communication(Not Overlapped) | Communication time that is not overlapped. This field is not displayed in the single-rank scenario because no communication is involved. |
| Computing | Computation time. |
| Free | Interval. |
| Start | Time when the current API starts to be called (ms). |
| Wall Duration | Time taken by the calls to an API (ms). |
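From these fields, the overlapped communication time can be derived as Communication minus Communication(Not Overlapped). The following sketch computes that value and the resulting overlap ratio. It assumes that the events passed in come only from the Overlap Analysis layer, that their names match the field names above, and that they follow the Chrome trace event format. The function name and the sample durations are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def overlap_summary(events: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Sum durations per category and derive the overlapped communication time."""
    totals: Dict[str, float] = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X":  # only complete events carry a duration
            totals[ev["name"]] += float(ev.get("dur", 0.0))

    comm = totals["Communication"]
    not_overlapped = totals["Communication(Not Overlapped)"]
    overlapped = comm - not_overlapped
    return {
        "communication": comm,
        "not_overlapped": not_overlapped,
        "overlapped": overlapped,
        # Share of communication time hidden behind computation.
        "overlap_ratio": overlapped / comm if comm else 0.0,
    }


# Hypothetical durations for illustration only.
sample = [
    {"ph": "X", "name": "Computing", "dur": 1000.0},
    {"ph": "X", "name": "Communication", "dur": 250.0},
    {"ph": "X", "name": "Communication", "dur": 150.0},
    {"ph": "X", "name": "Communication(Not Overlapped)", "dur": 100.0},
    {"ph": "X", "name": "Free", "dur": 50.0},
]

summary = overlap_summary(sample)
print(summary)
```

A ratio close to 1 means most communication is hidden behind computation. A ratio close to 0 means communication is mostly exposed and is a likelier optimization target.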
communication_statistic_*.csv File
The file content is formatted as follows.

The communication_statistic_*.csv file contains collective communication operator statistics. It shows the time consumed by each operator type and each type's share of the total collective communication time, which helps you determine whether an operator can be optimized.
| Field | Description |
|---|---|
| Device_id | Device ID. |
| OP Type | HCCL operator type. |
| Count | Number of times that HCCL operators are executed. |
| Total Time(us) | Total execution time of HCCL operators, in μs. |
| Min Time(us) | Minimum execution time of HCCL operators, in μs. |
| Avg Time(us) | Average execution time of HCCL operators, in μs. |
| Max Time(us) | Maximum execution time of HCCL operators, in μs. |
| Ratio(%) | Ratio of execution time of HCCL operators to the overall collective communication time. |
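To rank operator types by their share of communication time, the CSV file can be read with a standard CSV parser. The sketch below uses the column names from the table above and recomputes the average and ratio from Total Time(us) and Count as a cross-check. The sample rows, the operator type names, and the function name are hypothetical, and the sample covers a single device only.

```python
import csv
import io
from typing import Dict, List, Union

# Hypothetical sample content; a real file would be opened with open(path, newline="").
sample_csv = """Device_id,OP Type,Count,Total Time(us),Min Time(us),Avg Time(us),Max Time(us),Ratio(%)
0,hcom_allGather_,20,2000.0,50.0,100.0,200.0,25.0
0,hcom_allReduce_,10,5000.0,300.0,500.0,900.0,62.5
0,hcom_broadcast_,5,1000.0,150.0,200.0,260.0,12.5
"""


def rank_by_ratio(csv_text: str) -> List[Dict[str, Union[str, float]]]:
    """Return operator types sorted by their share of total communication time."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = sum(float(r["Total Time(us)"]) for r in rows)
    ranked = []
    for r in rows:
        op_total = float(r["Total Time(us)"])
        ranked.append({
            "op_type": r["OP Type"],
            # Recomputed values, to compare against Avg Time(us) and Ratio(%).
            "avg_us": op_total / int(r["Count"]),
            "ratio_pct": 100.0 * op_total / total if total else 0.0,
        })
    ranked.sort(key=lambda item: item["ratio_pct"], reverse=True)
    return ranked


ranked = rank_by_ratio(sample_csv)
for item in ranked:
    print(item["op_type"], item["avg_us"], item["ratio_pct"])
```

The operator type at the top of the ranking accounts for the largest share of collective communication time and is usually the first one to examine.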
