hccl_statistic (HCCL Operator Statistics)
The msprof_*.json file displays the timeline of HCCL operators at the HCCL level, and the hccl_statistic_*.csv file summarizes the HCCL operator statistics. The msprof_*.json file also contains the Overlap Analysis, which shows how the computation and communication pipelines overlap.
HCCL operator data can only be collected and parsed in scenarios where inter-device communications exist, such as the multi-device, multi-server, and cluster scenarios.
Availability
Data at HCCL Level in msprof_*.json
The following figure shows data at the HCCL level in the msprof_*.json file.
In multi-device, multi-server, or cluster scenarios, devices communicate with each other and form communicators. The HCCL level records the time consumption of communication operators, arranged by communicator. In this file, you can find the communication operator that takes the longest time.
| Field | Description |
|---|---|
| Public information | |
| Group * Communication | Communication operators in a communicator. A device (rank) may exist in different communicators, and a group identifies the behavior of the current device in the current communicator. |
| Plane ID | Network plane ID. In terms of parallel scheduling and execution of multiple transmit and receive communication links, each plane is a concurrent communication dimension. |
| Title | API name of a component. |
| Start | Start point on the timeline, which is automatically aligned with that in chrome trace (ms). |
| Wall Duration | Time taken by the calls to an API (ms). |
| Self Time | Execution time of the current instruction (ms). |
| Communication large operator information | |
| connection_id | ID of the connection between a CANN API and an NPU operator when the former is delivered to the latter. |
| model id | Model ID. |
| data_type | Data type. |
| alg_type | Algorithm type in each phase of communication operators, which can be MESH, RING, NB, HD, NHR, PIPELINE, PAIRWISE, or STAR. |
| count | Data transmission count. |
| Communication small operator information | |
| notify id | The unique notify ID. |
| duration estimated(us) | Estimated task duration (μs). |
| stream id | Stream ID. |
| task id | Task ID. |
| task type | Task type. |
| src rank | Source rank. |
| dst rank | Destination rank. |
| transport type | Transmission type, including LOCAL, SDMA, and RDMA. |
| size(Byte) | Data volume (Byte). |
| data type | Data type. |
| link type | Link type, including HCCS, PCIe, and RoCE. |
| bandwidth(GB/s) | Bandwidth (GB/s). |
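As a rough illustration of finding the communication operator that takes the longest time, the following Python sketch ranks events by duration. It assumes that msprof_*.json follows the Chrome trace event layout (a JSON array of events with `name`, `ph`, `ts`, `dur`, and `args` keys). The sample events, the operator names, and the `longest_events` helper are hypothetical and are not taken from a real profiling result.

```python
import json


def longest_events(events, top_n=3):
    """Return the top_n complete events ("ph" == "X") with the largest duration."""
    complete = [e for e in events if e.get("ph") == "X" and "dur" in e]
    return sorted(complete, key=lambda e: e["dur"], reverse=True)[:top_n]


# Hypothetical sample in Chrome trace event format. Real msprof_*.json
# content may use different names, keys, and units.
sample = json.loads("""
[
  {"name": "hcom_allReduce_", "ph": "X", "ts": 100.0, "dur": 250.0,
   "pid": 1, "tid": 0, "args": {"alg_type": "RING"}},
  {"name": "hcom_broadcast_", "ph": "X", "ts": 400.0, "dur": 80.0,
   "pid": 1, "tid": 0, "args": {"alg_type": "MESH"}},
  {"name": "process_name", "ph": "M", "pid": 1, "args": {"name": "HCCL"}}
]
""")

top = longest_events(sample, top_n=1)
print(top[0]["name"], top[0]["dur"])
```

To apply the same idea to a real file, replace the inline sample with `json.load` on the file and adjust the keys to the layout you actually observe.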
Pipeline Overlapping Analysis of Computation and Communication
The Overlap Analysis in msprof_*.json contains the pipeline overlapping analysis data of computation and communication. Its collection is controlled by --task-time and --hccl. See Figure 3.
Computation and communication sometimes run in parallel. You can check the pipeline overlapping time (the time during which computation and communication run in parallel) to evaluate the computation and communication efficiency.
| Field | Description |
|---|---|
| Communication | Communication time. This field is not displayed in the single-device scenario because no communication is involved. |
| Communication(Not Overlapped) | Communication time that is not overlapped. This field is not displayed in the single-device scenario because no communication is involved. |
| Computing | Computation time. |
| Free | Interval. |
| Start | Time when the current API starts to be called (ms). |
| Wall Duration | Time taken by the calls to an API (ms). |
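The relationship between these fields can be sketched with simple interval arithmetic. The example below is a minimal illustration, not the tool's implementation. It assumes that Communication(Not Overlapped) is the communication time minus the time that coincides with computation, and that Free is the time in the observed window when neither is running. All function names and the sample intervals are hypothetical.

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total(intervals):
    """Sum of the lengths of disjoint intervals."""
    return sum(end - start for start, end in intervals)


def overlap(a, b):
    """Total intersection length of two sorted, disjoint interval lists."""
    i = j = 0
    acc = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            acc += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return acc


def overlap_summary(computing, communication, window):
    """Derive the overlap fields under the assumptions stated above."""
    comp = merge(computing)
    comm = merge(communication)
    busy = total(merge(comp + comm))
    return {
        "Computing": total(comp),
        "Communication": total(comm),
        "Communication(Not Overlapped)": total(comm) - overlap(comp, comm),
        "Free": (window[1] - window[0]) - busy,
    }


# Hypothetical timeline in arbitrary time units.
summary = overlap_summary(
    computing=[(0, 40), (60, 100)],
    communication=[(30, 70)],
    window=(0, 120),
)
print(summary)
```

In this made-up timeline, half of the communication time is hidden behind computation, so only the remaining half contributes to the non-overlapped value.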
hccl_statistic_*.csv File
The file content is formatted as follows.
hccl_statistic_*.csv contains the HCCL operator statistics. It shows the time consumed by each HCCL operator type and the ratio of that time to the overall collective communication time, which helps you determine whether an operator can be optimized.
| Field | Description |
|---|---|
| Device_id | Device ID. |
| OP Type | HCCL operator type. |
| Count | Number of times that HCCL operators are executed. |
| Total Time(us) | Total execution time of HCCL operators (μs). |
| Min Time(us) | Minimum execution time of HCCL operators (μs). |
| Avg Time(us) | Average execution time of HCCL operators (μs). |
| Max Time(us) | Maximum execution time of HCCL operators (μs). |
| Ratio(%) | Ratio of execution time of HCCL operators to the overall collective communication time. |
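The following Python sketch shows one way to read such a file and check how the columns relate to each other. It assumes that the CSV header uses the field names from the table above, that Avg Time(us) equals Total Time(us) divided by Count, and that Ratio(%) is each operator type's share of the summed total time. The actual denominator used by the tool may differ. The sample rows, operator names, and helper functions are hypothetical.

```python
import csv
import io

# Hypothetical sample. Column names follow the field table; values are made up.
SAMPLE = """Device_id,OP Type,Count,Total Time(us),Min Time(us),Avg Time(us),Max Time(us),Ratio(%)
0,hcom_allReduce_,4,800.0,150.0,200.0,260.0,80.0
0,hcom_broadcast_,2,200.0,90.0,100.0,110.0,20.0
"""

FLOAT_COLUMNS = (
    "Total Time(us)", "Min Time(us)", "Avg Time(us)", "Max Time(us)", "Ratio(%)",
)


def load_stats(text):
    """Parse the CSV text and convert the numeric columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        for key in FLOAT_COLUMNS:
            row[key] = float(row[key])
        row["Count"] = int(row["Count"])
    return rows


def recompute(rows):
    """Recompute the average and ratio from Count and Total Time(us)."""
    grand_total = sum(r["Total Time(us)"] for r in rows)
    return [
        {
            "OP Type": r["OP Type"],
            "avg_us": r["Total Time(us)"] / r["Count"],
            "ratio_pct": 100.0 * r["Total Time(us)"] / grand_total,
        }
        for r in rows
    ]


rows = load_stats(SAMPLE)
derived = recompute(rows)
# The operator type with the largest share is the first optimization candidate.
slowest = max(rows, key=lambda r: r["Ratio(%)"])
print(slowest["OP Type"], derived)
```

For a real result, pass the contents of the hccl_statistic_*.csv file to `load_stats` instead of the inline sample, and group the rows by Device_id first if the file covers several devices.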
