Hierarchical Communication Principles
This section uses the ReduceScatter, AllGather, and AllReduce operators as examples to describe how hierarchical communication works.
ReduceScatter
As shown in Figure 1, the ReduceScatter operator requires rank i to obtain the i-th block of the reduction result. To keep the data blocks exchanged between servers contiguous, the ReduceScatter operation is performed between servers first and then within each server.
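The two phases above can be sketched as a small plain-Python simulation. This is only an illustration under assumptions, not the library's implementation: the function name hierarchical_reduce_scatter, the rank numbering (global rank = server index * ranks per server + local index), sum as the reduction, and one number per block are all hypothetical choices made for the example.

```python
def hierarchical_reduce_scatter(inputs, num_servers, ranks_per_server):
    """Simulate a two-level ReduceScatter (sum).

    inputs[r][b] is the value rank r contributes to block b.
    Assumption: global rank r = s * ranks_per_server + p.
    Returns a list where entry i is the reduced block i held by rank i.
    """
    S, P = num_servers, ranks_per_server
    n = S * P
    assert len(inputs) == n and all(len(row) == n for row in inputs)

    # Phase 1 (between servers): ranks sharing local index p form a group.
    # The data is cut into S contiguous chunks of P blocks each, and
    # rank (s, p) receives chunk s reduced over its inter-server group.
    partial = {}
    for s in range(S):
        for p in range(P):
            partial[(s, p)] = [
                sum(inputs[t * P + p][b] for t in range(S))
                for b in range(s * P, (s + 1) * P)
            ]

    # Phase 2 (within a server): the P ranks of server s reduce-scatter
    # chunk s, so local rank p ends up with global block s * P + p.
    outputs = [None] * n
    for s in range(S):
        for p in range(P):
            outputs[s * P + p] = sum(partial[(s, q)][p] for q in range(P))
    return outputs


# Example: 2 servers with 2 ranks each; compare with a flat reduction.
example = [[10 * r + b for b in range(4)] for r in range(4)]
flat = [sum(example[r][b] for r in range(4)) for b in range(4)]
assert hierarchical_reduce_scatter(example, 2, 2) == flat
```

In this sketch, each inter-server transfer in phase 1 covers one contiguous chunk of P blocks, which is the continuity the ordering is meant to preserve.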
AllGather
As shown in Figure 2, the AllGather operator requires the input data of rank i to occupy the i-th position of the result. To keep the data blocks exchanged between servers contiguous, the AllGather operation is performed within each server first and then between servers.
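The effect of the phase order can be illustrated with a short plain-Python simulation. It is a sketch under assumptions rather than the library's implementation: the function names hierarchical_all_gather and inter_first_all_gather are hypothetical, global rank is assumed to be server index * ranks per server + local index, and each phase simply concatenates what it receives without any reordering.

```python
def hierarchical_all_gather(inputs, num_servers, ranks_per_server):
    """Simulate AllGather done within each server first, then between servers.

    inputs[r] is the block contributed by rank r.
    Returns one gathered list per rank.
    """
    S, P = num_servers, ranks_per_server
    assert len(inputs) == S * P

    # Phase 1 (within a server): every rank of server s collects the P
    # local blocks, which form the contiguous chunk s of the final result.
    intra = {
        (s, p): [inputs[s * P + q] for q in range(P)]
        for s in range(S) for p in range(P)
    }

    # Phase 2 (between servers): ranks sharing local index p exchange
    # whole chunks and concatenate them in server order.
    outputs = []
    for s in range(S):
        for p in range(P):
            gathered = []
            for t in range(S):
                gathered.extend(intra[(t, p)])
            outputs.append(gathered)
    return outputs


def inter_first_all_gather(inputs, num_servers, ranks_per_server):
    """Contrast case: the reversed order with plain concatenation."""
    S, P = num_servers, ranks_per_server
    # Between servers first: rank (s, p) holds blocks p, P + p, 2P + p, ...
    inter = {
        (s, p): [inputs[t * P + p] for t in range(S)]
        for s in range(S) for p in range(P)
    }
    outputs = []
    for s in range(S):
        for p in range(P):
            gathered = []
            for q in range(P):
                gathered.extend(inter[(s, q)])
            outputs.append(gathered)
    return outputs


blocks = ["a", "b", "c", "d"]  # 2 servers with 2 ranks each
assert hierarchical_all_gather(blocks, 2, 2)[0] == ["a", "b", "c", "d"]
assert inter_first_all_gather(blocks, 2, 2)[0] == ["a", "c", "b", "d"]
```

With the intra-server phase first, each rank already holds a contiguous chunk before anything crosses servers. In the reversed order the blocks arrive strided and would need an extra rearrangement step.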
AllReduce
As shown in Figure 3, the output of the AllReduce operator is the complete reduction result on every rank. Therefore, although the operation is still divided into two phases, ReduceScatter and AllGather, the semantics of ReduceScatter and AllGather do not need to be strictly followed, and the steps that move the most data can be placed within a server, where bandwidth is higher. That is, the ReduceScatter operation is performed within each server first, the partial results are then reduced between servers, and finally the AllGather operation is performed within each server.
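A minimal plain-Python simulation of this three-step flow is given below. It is a sketch under stated assumptions, not the library's implementation: the function name hierarchical_all_reduce is hypothetical, the reduction is a sum, global rank is assumed to be server index * ranks per server + local index, the buffer length is assumed to divide evenly by the number of ranks per server, and the middle step is modeled as an AllReduce among the ranks that share a local index.

```python
def hierarchical_all_reduce(inputs, num_servers, ranks_per_server):
    """Simulate a three-step AllReduce (sum) over S servers with P ranks each.

    inputs[r] is the full buffer of rank r; all buffers have equal length.
    Returns one fully reduced buffer per rank.
    """
    S, P = num_servers, ranks_per_server
    length = len(inputs[0])
    assert len(inputs) == S * P
    assert length % P == 0  # assumption: buffer splits evenly into P shards
    shard = length // P

    # Step 1 (within a server): ReduceScatter. Local rank p holds shard p
    # summed over the P ranks of its own server.
    local = {}
    for s in range(S):
        for p in range(P):
            local[(s, p)] = [
                sum(inputs[s * P + q][i] for q in range(P))
                for i in range(p * shard, (p + 1) * shard)
            ]

    # Step 2 (between servers): ranks sharing local index p reduce their
    # shards, so only 1/P of the buffer per rank crosses servers.
    reduced = {
        p: [sum(local[(t, p)][i] for t in range(S)) for i in range(shard)]
        for p in range(P)
    }

    # Step 3 (within a server): AllGather the P reduced shards.
    full = []
    for p in range(P):
        full.extend(reduced[p])
    return [list(full) for _ in range(S * P)]


# Example: 2 servers with 2 ranks each, 4 elements per rank.
buffers = [[r + i for i in range(4)] for r in range(4)]
expected = [sum(buf[i] for buf in buffers) for i in range(4)]
assert hierarchical_all_reduce(buffers, 2, 2) == [expected] * 4
```

In this sketch the full-size buffer only moves inside a server in steps 1 and 3, while each rank sends a shard of 1/P of the buffer across servers in step 2.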


