hccl_statistic (HCCL Operator Statistics)

The msprof_*.json file displays the timeline of HCCL operators at the HCCL level and the pipeline overlapping data of computation and communication at the Overlap Analysis level. The hccl_statistic_*.csv file summarizes the HCCL operator statistics.

HCCL operator data can be collected and parsed only in scenarios that involve inter-device communication, such as multi-device, multi-server, and cluster scenarios.

Availability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Data at HCCL Level in msprof_*.json

The following figure shows data at the HCCL level in the msprof_*.json file.

Figure 1 Communication large operator information
Figure 2 Communication small operator information

In multi-device, multi-server, or cluster scenarios, devices communicate with each other through communicators. The HCCL level records the time taken by communication operators, grouped by communicator, so you can use it to find the communication operator that takes the longest time.
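Because msprof_*.json uses the Chrome trace event format, the longest communication operator can also be found with a short script. The following is a minimal sketch under stated assumptions, not part of the msprof tooling: it assumes the HCCL level appears as a process whose "process_name" metadata event is named "HCCL", and that operators are complete ("ph": "X") events carrying a "dur" field. The function names longest_hccl_event and load_events are hypothetical.

```python
import json
from typing import Any, Optional


def load_events(path: str) -> list[dict[str, Any]]:
    """Load trace events from a Chrome-trace-style JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Chrome trace files are either a bare event list or {"traceEvents": [...]}.
    return data["traceEvents"] if isinstance(data, dict) else data


def longest_hccl_event(
    events: list[dict[str, Any]], process_name: str = "HCCL"
) -> Optional[dict[str, Any]]:
    """Return the longest complete event under the process named `process_name`.

    Assumption (hypothetical): the HCCL level is exposed as a process whose
    "process_name" metadata event has args.name == process_name.
    """
    # Metadata events ("ph": "M") map a pid to its display name.
    pids = {
        e["pid"]
        for e in events
        if e.get("ph") == "M"
        and e.get("name") == "process_name"
        and e.get("args", {}).get("name") == process_name
    }
    # Complete events ("ph": "X") carry their duration in "dur".
    candidates = [e for e in events if e.get("ph") == "X" and e.get("pid") in pids]
    return max(candidates, key=lambda e: e.get("dur", 0), default=None)
```

To use it, pass the events loaded from your own msprof_*.json file, for example longest_hccl_event(load_events(path)). If the HCCL level is named differently in your trace, adjust the process_name argument.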

Table 1 Field description

Field

Description

Public information

Group * Communication

Communication operators in a communicator. A device (rank) may belong to multiple communicators; each group shows the activity of the current device in one communicator.

Plane ID

Network plane ID. When multiple transmit and receive communication links are scheduled and executed in parallel, each plane represents one concurrent communication dimension.

Title

API name of a component.

Start

Start time on the timeline (ms), automatically aligned with the Chrome trace timeline.

Wall Duration

Time taken by the API call (ms).

Self Time

Execution time of the current instruction (ms).

Communication large operator information

connection_id

ID of the connection that links a CANN API to the NPU operator it delivers.

model id

Model ID.

data_type

Data type.

alg_type

Algorithm type in each phase of communication operators, which can be MESH, RING, NB, HD, NHR, PIPELINE, PAIRWISE, or STAR.

count

Number of data elements transmitted.

Communication small operator information

notify id

The unique notify ID.

duration estimated(us)

Estimated task duration (μs).

stream id

Stream ID.

task id

Task ID.

task type

Task type.

src rank

Source rank.

dst rank

Destination rank.

transport type

Transmission type, including LOCAL, SDMA, and RDMA.

size(Byte)

Data volume (Byte).

data type

Data type.

link type

Link type, including HCCS, PCIe, and RoCE.

bandwidth(GB/s)

Bandwidth (GB/s).
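The bandwidth(GB/s), size(Byte), and task duration fields are related by simple unit arithmetic. The sketch below shows that arithmetic under two assumptions that this document does not confirm: bandwidth is the transferred size divided by the task's execution time in microseconds, and 1 GB = 10^9 bytes. The function name bandwidth_gb_per_s is hypothetical.

```python
def bandwidth_gb_per_s(size_bytes: int, duration_us: float) -> float:
    """Effective bandwidth in GB/s for one transfer.

    Assumptions (hypothetical): bandwidth = size / duration, and 1 GB = 10**9 bytes.
    """
    if duration_us <= 0:
        raise ValueError("duration must be positive")
    # Bytes per microsecond equals 10**6 bytes per second;
    # dividing by 10**3 converts that to 10**9 bytes (GB) per second.
    return size_bytes / duration_us / 1e3
```

For example, 1,000,000 bytes transferred in 100 μs gives bandwidth_gb_per_s(1_000_000, 100), which is 10.0 GB/s. Comparing such a value against the nominal rate of the link type (HCCS, PCIe, or RoCE) is one way to spot under-utilized links.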

Pipeline Overlapping Analysis of Computation and Communication

The Overlap Analysis level in msprof_*.json contains the pipeline overlapping analysis data of computation and communication. Its collection is controlled by the --task-time and --hccl options. See Figure 3.

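Computation and communication can run in parallel. The pipeline overlapping time (the time during which computation and communication run concurrently) helps you evaluate how efficiently computation and communication are pipelined: the more communication is hidden behind computation, the less it delays the overall run.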
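The overlapping time described above is the length of the intersection between the computation intervals and the communication intervals on one device's timeline. The following is a minimal sketch of that idea, not the profiler's own implementation; it assumes each activity is given as hypothetical (start, end) pairs on a shared time axis, and the names merge and overlap_time are illustrative.

```python
# Hypothetical type alias: an interval is (start, end) on one timeline.
Interval = tuple[float, float]


def merge(intervals: list[Interval]) -> list[Interval]:
    """Sort intervals and merge overlapping ones into disjoint intervals."""
    merged: list[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_time(compute: list[Interval], comm: list[Interval]) -> float:
    """Total time during which computation and communication both run."""
    a, b = merge(compute), merge(comm)
    i = j = 0
    total = 0.0
    # Two-pointer sweep over two sorted lists of disjoint intervals.
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total
```

Merging first matters because computation tasks on different streams can themselves overlap, and counting them twice would overstate the overlapping time.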

Figure 3 Effect of pipeline overlapping of computation and communication

Table 2 Field description

Field

Description

Communication

Communication time. This field is not displayed in the single-device scenario because no communication is involved.

Communication(Not Overlapped)

Communication time that is not overlapped. This field is not displayed in the single-device scenario because no communication is involved.

Computing

Computation time.

Free

Idle interval, during which neither computation nor communication is running.

Start

Time when the current API starts to be called (ms).

Wall Duration

Time taken by the API call (ms).
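The Table 2 fields can be related to each other with interval arithmetic. The sketch below assumes, without confirmation from this document, that Communication(Not Overlapped) is the communication time not covered by any computation and that Free is the part of an observation window where neither runs. Under that assumption, non-overlapped communication equals the length of the union of both activities minus the computation time. All names are hypothetical, and every interval is assumed to lie inside the window.

```python
def union_length(intervals: list[tuple[float, float]]) -> float:
    """Length of the union of (start, end) intervals."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # A gap: close the current run and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def overlap_breakdown(
    compute: list[tuple[float, float]],
    comm: list[tuple[float, float]],
    window_start: float,
    window_end: float,
) -> dict[str, float]:
    """Hypothetical breakdown mirroring the Table 2 field names."""
    busy = union_length(compute + comm)  # time when anything runs
    computing = union_length(compute)
    return {
        "Computing": computing,
        "Communication": union_length(comm),
        # Communication not hidden behind computation.
        "Communication(Not Overlapped)": busy - computing,
        # Assumes all intervals lie inside [window_start, window_end].
        "Free": (window_end - window_start) - busy,
    }
```

For computation at (0, 10) and (20, 30), communication at (5, 25), and a window of (0, 40), this yields 20 for Computing and Communication, 10 for Communication(Not Overlapped), and 10 for Free.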

hccl_statistic_*.csv File

The file content is formatted as follows.

Figure 4 hccl_statistic_*.csv

hccl_statistic_*.csv contains HCCL operator statistics. From it, you can learn the time taken by each HCCL operator type and its share of the total collective communication time, and then decide whether the operator is worth optimizing.
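To find optimization candidates quickly, the operator types can be ranked by their Ratio(%) column. The following is a minimal sketch that assumes the CSV header row uses exactly the field names listed in Table 3 (Device_id, OP Type, Ratio(%)); the function name rank_by_ratio is hypothetical.

```python
import csv
from typing import Iterable


def rank_by_ratio(lines: Iterable[str]) -> list[tuple[str, str, float]]:
    """Return (Device_id, OP Type, Ratio) tuples, highest ratio first.

    Assumes the header names match Table 3 exactly (hypothetical).
    """
    # DictReader accepts any iterable of text lines, e.g. an open file.
    reader = csv.DictReader(lines)
    ranked = [
        (row["Device_id"], row["OP Type"], float(row["Ratio(%)"]))
        for row in reader
    ]
    return sorted(ranked, key=lambda item: item[2], reverse=True)
```

Typical use is with open(path, newline="", encoding="utf-8") as f: rank_by_ratio(f), where path points to your own hccl_statistic_*.csv file. The operator types at the top of the list are the ones whose optimization would save the most collective communication time.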

Table 3 Field description

Field

Description

Device_id

Device ID.

OP Type

HCCL operator type.

Count

Number of times HCCL operators of this type are executed.

Total Time(us)

Total execution time of HCCL operators of this type (μs).

Min Time(us)

Minimum execution time of HCCL operators of this type (μs).

Avg Time(us)

Average execution time of HCCL operators of this type (μs).

Max Time(us)

Maximum execution time of HCCL operators of this type (μs).

Ratio(%)

Ratio of the total execution time of HCCL operators of this type to the overall collective communication time (%).
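The Table 3 columns are ordinary per-type aggregates. The sketch below derives them from raw per-execution durations to show how the fields relate; it is not the profiler's implementation. It assumes, as a hypothesis, that the Ratio(%) denominator is the summed duration of all HCCL operator types. The input format, a list of (op_type, duration_us) pairs, and the function name hccl_statistics are hypothetical.

```python
from collections import defaultdict


def hccl_statistics(records: list[tuple[str, float]]) -> list[dict]:
    """Aggregate (op_type, duration_us) pairs into Table 3 style rows."""
    by_type: dict[str, list[float]] = defaultdict(list)
    for op_type, duration in records:
        by_type[op_type].append(duration)
    # Assumed denominator for Ratio(%): total time of all HCCL operators.
    grand_total = sum(sum(d) for d in by_type.values())
    stats = []
    for op_type, durations in by_type.items():
        total = sum(durations)
        stats.append({
            "OP Type": op_type,
            "Count": len(durations),
            "Total Time(us)": total,
            "Min Time(us)": min(durations),
            "Avg Time(us)": total / len(durations),
            "Max Time(us)": max(durations),
            "Ratio(%)": 100.0 * total / grand_total if grand_total else 0.0,
        })
    # Largest contributors first, as an optimization priority list.
    stats.sort(key=lambda s: s["Total Time(us)"], reverse=True)
    return stats
```

A large gap between Min Time(us) and Max Time(us) for one type, with a high Ratio(%), usually points to a few slow executions worth inspecting on the msprof_*.json timeline.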