Result Description
After distributed training is complete, refer to this section to check the execution result and locate faults.
Training Result Check
- Check your script execution result.
The printed output varies with the training script. If information similar to the following is displayed on each device for distributed training, the training is complete.

GE dumps graph files when you enable DUMP_GE_GRAPH.
export DUMP_GE_GRAPH=2
From the dumped graphs, you can find the HcomBroadcast and HcomAllReduce operators, indicating that the HCCL operators used for communication between NPUs are properly inserted.
Figure 1 Dumped graphs from GE
- If your script fails to execute, analyze and locate the fault in the same way you do in single-device training.
You can spot the fault by checking the host log file plog_*.log in $HOME/ascend/log/run/plog where $HOME is the root directory of the host user.
If the script runs successfully on a single device but fails on multiple devices, the failure is most likely related to collective communication, as shown in Figure 2. For details, see section "FAQs" in Collective Communication User Guide.
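The operator check described in the first item above can be automated. The following is a minimal sketch, not an official tool: it assumes that the dumped graphs are readable text files in a directory you pass in and that they match the glob pattern `*.txt`. Both the function name `find_hccl_ops` and the default pattern are illustrative assumptions; adjust them to the files GE actually produces in your environment.

```python
from pathlib import Path

# Operator names taken from the description above.
HCCL_OPS = ("HcomBroadcast", "HcomAllReduce")


def find_hccl_ops(dump_dir, pattern="*.txt", ops=HCCL_OPS):
    """Return {operator name: [names of dump files that mention it]}.

    dump_dir and pattern are assumptions: point them at the directory
    and file naming used by the graph dumps in your environment.
    """
    found = {op: [] for op in ops}
    for path in sorted(Path(dump_dir).glob(pattern)):
        if not path.is_file():
            continue
        # Dumps can be large; ignore undecodable bytes instead of failing.
        text = path.read_text(errors="ignore")
        for op in ops:
            if op in text:
                found[op].append(path.name)
    return found
```

Calling `find_hccl_ops(".")` in the directory that holds the dumps returns a dictionary. An empty list for either operator suggests that the corresponding HCCL operator was not inserted into the graph.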
Troubleshooting
If the script execution fails, analyze and locate the fault based on the following logs:
- Path of the run logs generated when the app is running on the host: $HOME/ascend/log/run/plog/plog-pid_*.log.
- Path of the run logs generated when the app is running on the device: $HOME/ascend/log/run/device-id/device-pid_*.log.
$HOME indicates the root directory of the user on the host.
You can identify the error module and determine the cause by using ERROR-level logs.

| Module Name | Error | Solution |
|---|---|---|
| System error | Environment and version mismatch | Check the version mapping and system installation. |
| GE | GE graph compilation or verification error | Specific error causes are provided for verification errors. You only need to modify the network script as prompted. |
| Runtime | Initialization or graph execution failure due to an environment exception | If initialization fails, check the environment configuration and whether the environment is occupied by other processes. |
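To see quickly which module reports the most ERROR-level entries, you can scan the host and device run logs listed above. The following is a rough sketch under a stated assumption: each error line contains the marker `[ERROR]` immediately followed by the module name (for example `[ERROR] GE(...)`). That line layout is an assumption about the log format, not something this section specifies, and the function name `count_errors_by_module` is hypothetical. Adjust the regular expression if your logs differ.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed line shape: "[ERROR] MODULE(...): message".
ERROR_LINE = re.compile(r"\[ERROR\]\s*([A-Za-z_]+)")


def count_errors_by_module(log_dir=None, pattern="*.log"):
    """Count ERROR-level lines per module in all logs under log_dir."""
    if log_dir is None:
        # Default follows the paths given above: $HOME/ascend/log/run
        log_dir = Path.home() / "ascend" / "log" / "run"
    counts = Counter()
    # rglob walks subdirectories, so plog/ and device-id/ are both covered.
    for path in Path(log_dir).rglob(pattern):
        if not path.is_file():
            continue
        with path.open(errors="ignore") as fh:
            for line in fh:
                match = ERROR_LINE.search(line)
                if match:
                    counts[match.group(1)] += 1
    return counts
```

For example, `count_errors_by_module().most_common()` lists the modules in descending order of error count, which you can then look up in the table above.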

