Result Description
After distributed training is complete, you can check the execution result and locate faults by referring to this section.
Training Result Check
- Check your script execution result.
The printed output depends on the training script. If information similar to the following is displayed on each device used for distributed training, the training is complete.

When the environment variable DUMP_GE_GRAPH is enabled, GE dump graph files are generated.
export DUMP_GE_GRAPH=2
If the HcomBroadcast and HcomAllReduce operators appear in the dumped graph files, the HCCL operators for inter-NPU communication have been inserted correctly.
Figure 1 Dumped graphs from GE
- If your script fails to execute, analyze and locate the fault in the same way you do in single-device training.
You can locate the fault by checking the host log files plog-pid_*.log in $HOME/ascend/log/run/plog, where $HOME is the home directory of the user on the host.
If execution succeeds on a single device but fails on multiple devices, the issue is typically related to collective communication, as shown in Figure 2. For details, see section "FAQs" in the HCCL User Guide.
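The dumped-graph check described in the first item above can be scripted. The following is a minimal sketch, not part of the Ascend toolchain: it assumes the dumped graph files are text files collected in one directory. The function name `find_hccl_ops` and the default `pattern` are illustrative assumptions. It reports which files mention each HCCL operator.

```python
from pathlib import Path

# Operator names taken from the text above.
HCCL_OPS = ("HcomBroadcast", "HcomAllReduce")


def find_hccl_ops(dump_dir, ops=HCCL_OPS, pattern="*"):
    """Return {operator name: [names of files that mention it]}.

    dump_dir is the directory holding the dumped graph files.
    pattern is a glob; narrow it if the directory holds other files.
    """
    hits = {op: [] for op in ops}
    for path in sorted(Path(dump_dir).glob(pattern)):
        if not path.is_file():
            continue
        # Assumption: dump files are text; undecodable bytes are ignored.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for op in ops:
            if op in text:
                hits[op].append(path.name)
    return hits
```

For example, `find_hccl_ops(".")` scans the current directory. An empty list for either operator suggests that the corresponding HCCL operator was not inserted.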
Troubleshooting
If the script execution fails, analyze and locate the fault based on the following logs:
- Run logs generated on the host while the application is running: $HOME/ascend/log/run/plog/plog-pid_*.log
- Run logs generated on the device while the application is running: $HOME/ascend/log/run/device-id/device-pid_*.log
$HOME is the home directory of the user on the host.
You can identify the error module and determine the cause by using ERROR-level logs.
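To get a quick overview of which module reports errors, the ERROR-level lines can be counted per module. The sketch below is illustrative only: it assumes each ERROR-level line begins with `[ERROR]` followed by the module name, which may not match your log format. The function name `count_errors_by_module` is hypothetical, and the glob patterns are derived from the log paths listed above.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed line shape: "[ERROR] MODULE(...)...". Adjust if your logs differ.
ERROR_LINE = re.compile(r"^\[ERROR\]\s*(\w+)")


def count_errors_by_module(log_root):
    """Count ERROR-level lines per module in host and device run logs.

    log_root is the log directory, for example $HOME/ascend/log.
    """
    run_dir = Path(log_root) / "run"
    # "pid" and "id" in the documented paths are placeholders, hence the globs.
    log_files = sorted(run_dir.glob("plog/plog-*.log")) + sorted(
        run_dir.glob("device-*/device-*.log")
    )
    counts = Counter()
    for path in log_files:
        with path.open(encoding="utf-8", errors="ignore") as fh:
            for line in fh:
                match = ERROR_LINE.match(line)
                if match:
                    counts[match.group(1)] += 1
    return counts
```

A call such as `count_errors_by_module(Path.home() / "ascend" / "log")` returns a `Counter` keyed by module name. The module with the most entries is a reasonable place to start reading the logs.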

| Module Name | Error | Solution |
|---|---|---|
| System error | Environment and version mismatch | Check the version mapping and system installation. |
| GE | GE graph build or verification error | Specific error causes are provided for verification errors. You only need to modify the network script as prompted. |
| Runtime | Initialization or graph execution failure due to an environment exception | If initialization fails, check the environment configuration and whether the environment is occupied by other processes. |

