HCCL_CONNECT_TIMEOUT

Description

Sets how long a device waits for socket connections to other devices to be established in distributed training or inference scenarios.

The value must be an integer from 120 to 7200, in seconds. The default value is 120.

Devices may progress through collective communication initialization at different speeds. The timeout interval set by this environment variable keeps socket establishment between devices in step by bounding how long a device waits for its peers.

Example

export HCCL_CONNECT_TIMEOUT=200
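
Because the value must be an integer within the documented range, a launch script may want to check it before exporting. The following is a minimal sketch, not part of HCCL: the function name set_hccl_connect_timeout is hypothetical, and the range check simply mirrors the limits stated in the description above.

```shell
# Hypothetical helper (not part of HCCL): validate a timeout before exporting it.
set_hccl_connect_timeout() {
    value="$1"

    # Reject empty input and anything containing a non-digit character.
    case "$value" in
        ''|*[!0-9]*)
            echo "HCCL_CONNECT_TIMEOUT must be an integer, got: '$value'" >&2
            return 1
            ;;
    esac

    # Enforce the documented range of 120 to 7200 seconds.
    if [ "$value" -lt 120 ] || [ "$value" -gt 7200 ]; then
        echo "HCCL_CONNECT_TIMEOUT must be between 120 and 7200, got: $value" >&2
        return 1
    fi

    export HCCL_CONNECT_TIMEOUT="$value"
}

# Example: allow up to 10 minutes for socket establishment.
set_hccl_connect_timeout 600
```

On a rejected value the function returns a non-zero status and leaves any previously exported setting unchanged, so the caller can decide whether to abort the launch or continue with the existing value.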

Restrictions

None

Applicability

Atlas Training Series Product