HCCL_DIAGNOSE_ENABLE
Description
Specifies whether to cache detailed information about certain tasks during collective communication. If a task fails, detailed logs can then be printed to help locate the fault.
The following options are supported:
- 1: enables the function.
- 0: disables the function.
The default value is 0.
Note that enabling this function affects performance.
Example
export HCCL_DIAGNOSE_ENABLE=1
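Because the function affects performance, you may want it on only while reproducing a failure. The following shell sketch shows two ways to scope the variable: for a single command, or for a session that is cleaned up afterwards. The `check_hccl_diagnose` function and the `sh -c` stand-in for the launch command are hypothetical placeholders, not part of HCCL. The check only encodes the documented values 0 and 1.

```shell
#!/bin/sh
# Sketch: enable HCCL task diagnostics only while reproducing a failure,
# since caching task details affects performance.

# Hypothetical helper: only 0 (default) and 1 are documented values.
check_hccl_diagnose() {
  case "${HCCL_DIAGNOSE_ENABLE:-0}" in
    0|1) return 0 ;;
    *) echo "unsupported HCCL_DIAGNOSE_ENABLE: $HCCL_DIAGNOSE_ENABLE" >&2; return 1 ;;
  esac
}

# Option 1: set the variable for one command only; the current shell is unchanged.
# 'sh -c ...' is a placeholder for your real launch command.
env HCCL_DIAGNOSE_ENABLE=1 sh -c 'echo "job sees HCCL_DIAGNOSE_ENABLE=$HCCL_DIAGNOSE_ENABLE"'

# Option 2: export for the whole session, then unset when debugging is done.
export HCCL_DIAGNOSE_ENABLE=1
check_hccl_diagnose && echo "session value: $HCCL_DIAGNOSE_ENABLE"
unset HCCL_DIAGNOSE_ENABLE   # unset falls back to the default (0)
```

Scoping the variable to a single command avoids carrying the performance cost into later jobs started from the same shell.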
Restrictions
Information about at most the 2000 most recent operators can be saved.
Applicability