Computing and Communication Bandwidth Contention
Operators such as MatMul and FA are memory-intensive and tend to be MTE bound. When they run in parallel with communication operators, the computing and communication operators compete for the same memory bandwidth, as shown in Figure 1. As a result, the communication bandwidth falls below the empirical value (about 1 to 2 times lower, which is a moderate drop rather than a collapse), as shown in Figure 2.
Solution: If running computing and communication in parallel causes severe bandwidth contention, compare the performance data of the parallel and non-parallel runs, evaluate whether the cost of the contention outweighs the benefit of parallelism, and select the mode that performs better.
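The comparison above can be sketched in a few lines of Python. This is a minimal illustration, not part of any profiling tool: the `StepTiming` fields, `choose_mode`, and `bandwidth_drop_ratio` are hypothetical names, and the sketch assumes that the per-step times and bandwidth figures have already been read from the profiling data of a parallel run and a non-parallel run.

```python
from dataclasses import dataclass


@dataclass
class StepTiming:
    """Measured times for one step, in milliseconds (hypothetical fields)."""
    compute_ms: float     # computing operator time in the non-parallel run
    comm_ms: float        # communication operator time in the non-parallel run
    overlapped_ms: float  # end-to-end time when both run in parallel


def choose_mode(t: StepTiming) -> str:
    """Return the mode with the shorter end-to-end time."""
    # Non-parallel execution runs the two phases back to back.
    serial_ms = t.compute_ms + t.comm_ms
    # Parallel execution pays off only if contention has not erased the overlap gain.
    return "parallel" if t.overlapped_ms < serial_ms else "serial"


def bandwidth_drop_ratio(empirical_gbps: float, measured_gbps: float) -> float:
    """Ratio of the empirical bandwidth to the bandwidth measured under contention."""
    if measured_gbps <= 0:
        raise ValueError("measured bandwidth must be positive")
    return empirical_gbps / measured_gbps
```

For example, `choose_mode(StepTiming(compute_ms=10, comm_ms=5, overlapped_ms=12))` selects the parallel mode because 12 ms is less than the 15 ms serial total, whereas an overlapped time of 16 ms would select the serial mode. In practice the timings should be averaged over several steps before deciding.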

