HCCL_BUFFSIZE

Description

Sets the size, in MB, of the shared data buffer used by each communicator. The value must be an integer greater than or equal to 1. The default value is 200.
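The value constraint can be checked before a job is launched. The following is a minimal sketch of such a check. The function name is_valid_buffsize is hypothetical and not part of HCCL, and how HCCL itself reacts to an invalid value is not covered here.

```shell
# Hypothetical helper (not an HCCL tool): succeeds only if the argument
# is an integer greater than or equal to 1.
is_valid_buffsize() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "$1" -ge 1 ]
}

# Fall back to the documented default of 200 when the variable is unset.
if is_valid_buffsize "${HCCL_BUFFSIZE:-200}"; then
    echo "HCCL_BUFFSIZE is valid"
else
    echo "HCCL_BUFFSIZE must be an integer >= 1" >&2
fi
```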

In collective communication, each communicator occupies a buffer whose size is set by HCCL_BUFFSIZE. If the cluster has many communicators, total buffer usage grows and may leave insufficient memory for model data. In this case, decrease the value of this environment variable to reduce the buffer space that each communicator occupies. Conversely, if the model data is small but the volume of communication data is large, increase the value to give each communicator more buffer space and improve communication efficiency.

The recommended value for LLMs is as follows:

(MicrobatchSize × SequenceLength × HiddenSize × sizeof(DataType)) / (1024 × 1024), rounded up to an integer.
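As a worked illustration of the formula above, the following sketch computes the value in the shell. The model parameters (micro-batch size 4, sequence length 4096, hidden size 8192, and a 2-byte data type such as FP16) are assumptions chosen for the example, not recommended settings.

```shell
# Hypothetical example parameters; replace them with the values of your model.
micro_batch_size=4
sequence_length=4096
hidden_size=8192
dtype_bytes=2   # sizeof(DataType), for example 2 bytes for FP16

bytes=$(( micro_batch_size * sequence_length * hidden_size * dtype_bytes ))
mb=$(( 1024 * 1024 ))

# Integer ceiling division: round up to a whole number of MB.
buffsize=$(( (bytes + mb - 1) / mb ))

export HCCL_BUFFSIZE="$buffsize"
echo "HCCL_BUFFSIZE=$HCCL_BUFFSIZE"
```

With these example values, the data size is 268435456 bytes, so the result is 256.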

This environment variable is used in the following scenarios:

Dynamic shape network scenario
Scenario where developers call the HCCL C APIs to integrate with a framework

Notes:

The memory requested through this environment variable is used exclusively by HCCL and cannot be shared with other services.
Each communicator occupies 2 × HCCL_BUFFSIZE of memory, which serves as its send and receive buffers.
Resources are managed per communicator. Each communicator exclusively holds its own 2 × HCCL_BUFFSIZE of memory so that operators running concurrently in different communicators do not affect each other.
For collective communication operators, performance may deteriorate when the data size exceeds the value of HCCL_BUFFSIZE. It is recommended that the value of HCCL_BUFFSIZE be greater than the data size.
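The per-communicator cost described in these notes can be turned into a rough estimate of total HCCL buffer memory. The sketch below assumes a process that creates 8 communicators and uses the default value of 200. Both numbers are illustrative assumptions, and the estimate covers only the buffers described here.

```shell
# Illustrative values; adjust them to match your own job.
buffsize_mb=200        # value of HCCL_BUFFSIZE (documented default)
num_communicators=8    # assumed number of communicators created by the process

# Each communicator occupies 2 x HCCL_BUFFSIZE (send and receive buffers).
total_mb=$(( 2 * buffsize_mb * num_communicators ))
echo "Estimated HCCL buffer memory: ${total_mb} MB"
```

With these values, the estimate is 3200 MB. Halving HCCL_BUFFSIZE halves the estimate.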

Example

export HCCL_BUFFSIZE=200

Restrictions

If you call the HCCL C APIs to initialize a communicator with specific configurations and set the shared data buffer size through the hcclBufferSize parameter of HcclCommConfig, the communicator-level configuration takes precedence over this environment variable.

Applicability

Atlas A3 training products / Atlas A3 inference products

Atlas A2 training products / Atlas A2 inference products (For Atlas A2 training products / Atlas A2 inference products, only the Atlas 800T A2 training server, Atlas 900 A2 PoD cluster basic unit, and Atlas 200T A2 Box16 heterogeneous subrack are supported.)

Atlas training products

Atlas inference products (For the Atlas inference products, only the Atlas 300I Duo inference card is supported.)
