Performance Analysis in Cluster Training Scenarios
Scenario
A cluster consists of multiple nodes, which are managed in a unified manner on the management page. Each node runs an independent system. In cluster scenarios, the tool collects profile data on each node, generates a PROF_XXX directory on each node, pre-parses the data, and aggregates all PROF_XXX directories to OBS. You then need to manually copy all PROF_XXX directories aggregated in OBS to an environment where cluster data can be displayed and analyzed.
Currently, the following tool supports cluster data display and analysis: MindStudio Insight.
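The manual copy step can be scripted once the aggregated data has been downloaded locally. The following is a minimal sketch, not part of the tool: it assumes a hypothetical local layout of download_root/<node_name>/.../PROF_XXX, and the function name gather_prof_dirs and both directory arguments are illustrative. The destination directory is assumed to be outside download_root.

```python
import shutil
from pathlib import Path


def gather_prof_dirs(download_root, analysis_dir):
    """Copy every top-level PROF_* directory into analysis_dir/<node_name>/.

    Assumed (hypothetical) layout: download_root/<node_name>/.../PROF_XXX.
    Returns the list of destination paths that were written.
    """
    download_root = Path(download_root)
    analysis_dir = Path(analysis_dir)
    copied = []
    for prof in sorted(download_root.rglob("PROF_*")):
        if not prof.is_dir():
            continue
        rel = prof.relative_to(download_root)
        # Skip directories that are nested inside another PROF_* directory.
        if any(part.startswith("PROF_") for part in rel.parts[:-1]):
            continue
        # Use the first path component as the node name when one exists.
        node_name = rel.parts[0] if len(rel.parts) > 1 else "unknown_node"
        target = analysis_dir / node_name / prof.name
        shutil.copytree(prof, target, dirs_exist_ok=True)  # Python 3.8+
        copied.append(target)
    return copied
```

Keeping one subdirectory per node avoids name clashes if two nodes happen to produce PROF_XXX directories with the same name.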
Profile Data Collection Process
The following figure shows the overall process of profile data collection.

Environment Setup
Restrictions
In cluster scenarios, profile data of a maximum of 128 nodes can be collected. If eight devices are configured for each node, profile data of a maximum of 1024 devices can be collected.
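The device limit follows directly from the node limit: 128 nodes × 8 devices per node = 1024 devices. As a rough sketch, a pre-flight check could verify a planned cluster size against the node limit before collection starts. The names MAX_NODES and total_devices are illustrative and not part of any tool.

```python
MAX_NODES = 128  # node limit stated in the restriction above


def total_devices(num_nodes, devices_per_node):
    """Return the number of devices to be profiled, enforcing the node limit."""
    if num_nodes < 1 or devices_per_node < 1:
        raise ValueError("num_nodes and devices_per_node must be positive")
    if num_nodes > MAX_NODES:
        raise ValueError(
            f"{num_nodes} nodes exceeds the limit of {MAX_NODES} nodes"
        )
    return num_nodes * devices_per_node


if __name__ == "__main__":
    print(total_devices(128, 8))
```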
Profile Data Collection
After the environment is set up, you can collect profile data in the cluster scenario as follows:
- Use Profile Data Collecting and Parsing to collect profile data.
- Use the Ascend PyTorch Profiler API to collect PyTorch profile data.
  - Set up a distributed training environment and prepare the distributed training script to be used after porting. For details, see "Model Porting" in PyTorch Training Model Porting and Tuning Guide.
  - Refer to Getting Started with Performance Analysis in PyTorch Training Scenarios to modify the training script and start distributed training for data collection.
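As a rough illustration of the training-script change, the sketch below wraps a training loop with the Ascend PyTorch Profiler. It assumes torch_npu is installed on each node and that the process rank is exposed through the RANK environment variable (as torchrun does). The helpers rank_output_dir and profile_training, the output path, and the schedule values are illustrative choices, not values prescribed by the guides referenced above.

```python
import os


def rank_output_dir(base_dir, rank=None):
    """Return a per-rank output directory such as <base_dir>/rank_3."""
    if rank is None:
        # Assumption: RANK is set by the launcher; default to 0 otherwise.
        rank = int(os.environ.get("RANK", "0"))
    return os.path.join(base_dir, f"rank_{rank}")


def profile_training(train_one_step, num_steps, base_dir="./profiling_result"):
    """Run num_steps training steps under the Ascend PyTorch Profiler."""
    # Imported lazily so this module still loads on machines without an NPU.
    import torch_npu

    out_dir = rank_output_dir(base_dir)
    with torch_npu.profiler.profile(
        activities=[
            torch_npu.profiler.ProfilerActivity.CPU,
            torch_npu.profiler.ProfilerActivity.NPU,
        ],
        # Skip 1 step, warm up for 1 step, then record 2 steps, once.
        schedule=torch_npu.profiler.schedule(
            wait=1, warmup=1, active=2, repeat=1
        ),
        on_trace_ready=torch_npu.profiler.tensorboard_trace_handler(out_dir),
    ) as prof:
        for step in range(num_steps):
            train_one_step(step)  # caller-supplied function for one iteration
            prof.step()  # tell the profiler that one training step finished
```

With the schedule shown, num_steps should be at least 4 so that the recording window completes. Each rank calls profile_training from its own process, so every node writes its own profile data.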