HcclGetCommAsyncError
Description
When the communication link of a device network port in the cluster is unstable, or when network congestion occurs, "error cqe" is printed in the device log. This error is referred to as "RDMA ERROR CQE".
In the current version, this API can only be used to check whether the "RDMA ERROR CQE" error exists in the communicator.
This API is synchronous: the call does not return until the result is available.
Prototype
HcclResult HcclGetCommAsyncError(HcclComm comm, HcclResult *asyncError)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| comm | Input | Communicator for which you want to check whether error information exists. For details about the definition of the HcclComm type, see HcclComm. |
| asyncError | Output | |
Returns
For details, see the HcclResult type. In the current version, only the HCCL_E_REMOTE error type is returned.
Constraints
- This API can be called only after a communicator is created.
- This API cannot be called after the communicator is destroyed.
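
The following is a minimal sketch of how a caller might wrap this API, not a definitive implementation. So that it compiles without the HCCL SDK, it declares stand-in types and a stub `HcclGetCommAsyncError`; in a real program these would come from the HCCL header instead. The numeric enum values, the variable `g_fake_async_error`, and the helper `comm_has_rdma_error_cqe` are hypothetical names introduced for illustration. The sketch also assumes that the function's return value reports whether the query itself succeeded, while the detected error is delivered through `asyncError`; check this against the HcclResult documentation for your version.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-ins so the sketch compiles without the HCCL SDK. In a real program,
 * delete this part and include the HCCL header instead. The numeric values
 * are placeholders, not the library's actual enum values. */
typedef void *HcclComm;
typedef enum {
    HCCL_SUCCESS = 0,
    HCCL_E_REMOTE = 1 /* placeholder value */
} HcclResult;

/* Hypothetical knob that lets the stub simulate an RDMA ERROR CQE. */
HcclResult g_fake_async_error = HCCL_SUCCESS;

/* Stub with the documented prototype; it only copies the simulated error
 * into *asyncError and reports that the query succeeded. */
HcclResult HcclGetCommAsyncError(HcclComm comm, HcclResult *asyncError)
{
    (void)comm;
    *asyncError = g_fake_async_error;
    return HCCL_SUCCESS;
}

/* Hypothetical helper: returns 1 if an RDMA ERROR CQE was reported on the
 * communicator, 0 if none was reported, and -1 if the query itself failed.
 * Call it only between communicator creation and destruction. */
int comm_has_rdma_error_cqe(HcclComm comm)
{
    HcclResult asyncError = HCCL_SUCCESS;
    HcclResult ret = HcclGetCommAsyncError(comm, &asyncError);

    if (ret != HCCL_SUCCESS) {
        fprintf(stderr, "HcclGetCommAsyncError failed: %d\n", (int)ret);
        return -1;
    }
    return (asyncError == HCCL_E_REMOTE) ? 1 : 0;
}
```

Because the call is synchronous, a wrapper like this can be invoked from a monitoring loop or after a collective operation times out, as long as the communicator has been created and not yet destroyed.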