Using 512-Byte Alignment for the GM Address

[Priority] High

[Description] Due to internal design restrictions of the AI Processor, data moved from the GM to the local memory achieves the highest bandwidth efficiency when the GM address is 512-byte aligned. The following figures compare the single-core bandwidth efficiency with 512-byte-aligned and 32-byte-aligned GM addresses. For the same amount of data, at the point where the gap is largest, the bandwidth efficiency with 32-byte alignment reaches only 70% of that with 512-byte alignment.

  • This performance optimization method applies only to the Atlas A2 training products/Atlas A2 inference products.
  • The test data depends on the processor model. Slight jitter may occur in actual tests, so measured bandwidth values may not exactly match the test data shown below.
Figure 1 Comparison between the actual bandwidths of 512-byte alignment and 32-byte alignment in the GM->UB direction
Figure 2 Comparison between the actual bandwidths of 512-byte alignment and 32-byte alignment in the UB->GM direction
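
One way to apply this guideline is to choose each core's GM start offset as a multiple of 512 bytes when the workload is split. The following is a minimal host-side sketch in plain C++ and does not use any Ascend C API. The names kGmAlignBytes, AlignUp, IsAligned, and PlanCoreOffsets are hypothetical helpers introduced for illustration, and the sketch assumes that the GM base address of the tensor is itself 512-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Alignment recommended above for GM addresses, in bytes.
constexpr std::size_t kGmAlignBytes = 512;

// Round `value` up to the next multiple of `align`.
// `align` must be a power of two (512 is).
constexpr std::size_t AlignUp(std::size_t value,
                              std::size_t align = kGmAlignBytes) {
    return (value + align - 1) & ~(align - 1);
}

// Return true if `offset` is a multiple of `align` (power of two).
constexpr bool IsAligned(std::size_t offset,
                         std::size_t align = kGmAlignBytes) {
    return (offset & (align - 1)) == 0;
}

// Hypothetical tiling helper: split `totalBytes` across at most `coreCount`
// cores so that every core starts at a 512-byte-aligned offset.
// Assumption: the GM base address itself is 512-byte aligned, so an aligned
// offset yields an aligned address.
std::vector<std::size_t> PlanCoreOffsets(std::size_t totalBytes,
                                         std::size_t coreCount) {
    assert(coreCount > 0);
    // Even split (rounded up), then padded to a multiple of 512 bytes.
    const std::size_t perCore =
        AlignUp((totalBytes + coreCount - 1) / coreCount);
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < coreCount; ++i) {
        const std::size_t offset = i * perCore;
        if (offset >= totalBytes) {
            break;  // Remaining cores have no data to process.
        }
        offsets.push_back(offset);
    }
    return offsets;
}
```

For example, PlanCoreOffsets(100000, 8) rounds the even split of 12500 bytes up to a stride of 12800 bytes, so the cores start at offsets 0, 12800, 25600, and so on. The last core then handles a shorter tail, which trades a slightly uneven load for aligned start addresses on every core.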