Introduction

Overview

MindStudio Insight is a visual performance tuning tool for Ascend AI developers. It supports system, operator, serving, and memory tuning, helping developers quickly optimize performance in training, inference, and operator development scenarios.

MindStudio Insight provides multiple tuning analysis methods, displays actual software and hardware runtime data, and analyzes performance bottlenecks from multiple dimensions. It supports visualized performance analysis for clusters of hundreds or thousands of cards and beyond, enabling developers to complete performance tuning within days.

Advantages

  • On the Timeline tab page, MindStudio Insight lets developers view cluster profile data, displayed per card. It automatically traverses the input path for .db files, or for all trace_view.json files (PyTorch and MindSpore scenarios) and msprof*.json files (TensorFlow and offline inference scenarios), so developers do not need to merge files manually.
  • Backed by a database, MindStudio Insight supports large-scale profile data processing, including analysis of 20 GB of cluster profile data, and performance tuning in foundation model scenarios.
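
The automatic traversal described in the first item can be pictured with a minimal sketch. This is not the tool's implementation: the function name find_profile_files is hypothetical, and the assumption that files are matched by name anywhere under the input path is ours. Only the three file name patterns come from the text above.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# File name patterns taken from the text above. How MindStudio Insight
# actually matches and orders files is an assumption of this sketch.
PATTERNS = ("*.db", "trace_view.json", "msprof*.json")


def find_profile_files(root):
    """Recursively collect files under root whose names match a pattern.

    Hypothetical helper for illustration; not part of MindStudio Insight.
    """
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and any(fnmatchcase(path.name, p) for p in PATTERNS):
            matches.append(path)
    return matches
```

Calling find_profile_files on a collection directory would return every matching file in its subdirectories, which is the kind of per-card file gathering that otherwise requires merging by hand.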

Scenarios

  • System tuning: MindStudio Insight provides the timeline view, memory usage, operator duration, and communication bottleneck analysis to help developers quickly locate model performance bottlenecks.
    Table 1 Functions

    | Page | Description | Remarks |
    | --- | --- | --- |
    | Timeline | Displays the running status of the entire online inference and training process in the timeline view based on the scheduling process, and provides functions such as cluster timeline display and system view details viewing. | - |
    | Memory | Provides a visualized display of memory information during collection. Displays the operator memory trend in an operator memory curve. | - |
    | Operator | Provides operator duration statistics and analysis. | - |
    | Summary | Displays the computing and communication operator duration analysis, and displays the analysis results in a bar chart, curve, and data pane. | PyTorch or MindSpore cluster scenarios are supported. |
    | Communication | Displays the network link performance across the cluster and the communication performance of all nodes. By analyzing the overlapped duration between cluster communication and computation, slow hosts or nodes in the cluster training can be identified. | PyTorch or MindSpore cluster scenarios are supported. |
    | RL (Reinforcement Learning) | Provides a visualized display of the pipeline diagram in each phase of the reinforcement learning process. | - |

  • Operator tuning: MindStudio Insight provides instruction pipeline, operator source code, and operator running load analysis views that visualize the key performance metrics of operators running on Ascend AI Processors. These views help developers quickly locate software and hardware performance bottlenecks of operators and analyze operator performance more efficiently.
    Table 2 Functions

    | Page | Description | Remarks |
    | --- | --- | --- |
    | Timeline | Displays the running status of instructions on the Ascend AI Processor in a timeline view, displays the overall running status based on the scheduling process, and allows users to view instruction details and search for instructions. | - |
    | Source | Displays the operator instruction heatmap, and allows developers to view the mapping between the operator source code and instruction sets as well as the time consumption. | BIN files collected by msProf are supported. |
    | Details | Displays the basic operator information, compute workload analysis, and memory workload analysis, as well as the analysis results in charts and data panes. | BIN files collected by msProf are supported. |
    | Cache | Displays the L2 cache access of kernel functions in user programs, helping users optimize the cache hit rate. | BIN files collected by msProf are supported. |

  • Serving tuning: MindStudio Insight displays end-to-end request execution in the timeline view, showing how long each request spends in each key phase and its current status. This helps users quickly identify service performance bottlenecks and adjust the tuning policy.
    Table 3 Functions

    | Page | Description | Remarks |
    | --- | --- | --- |
    | Timeline | Displays the end-to-end request execution status in a timeline view, helping users intuitively view the duration of the request in each key phase and the current request status. | JSON files of trace data of inference service requests are supported. |
    | Curve | Displays the end-to-end performance of the inference service process in a curve and data details table. | profiler.db files are supported. |

  • Memory tuning: MindStudio Insight displays detailed device memory allocation graphically and annotates memory allocation and usage details with the Python call stack and custom trace tags, helping developers locate and tune memory issues.
    Table 4 Functions

    | Page | Description | Remarks |
    | --- | --- | --- |
    | Leaks | The call stack graph, line & block graph, and memory disassembly graph are used to display the memory status, allowing developers to efficiently analyze and locate memory issues. | DB memory result files collected by msLeaks are supported. |

Constraints

MindStudio Insight supports the import and display of profile data files in multiple formats. For details about the file specifications and restrictions, see Table 5.
Table 5 File specifications

| File Type | Suggestion | Specification Restriction |
| --- | --- | --- |
| .json | Keep files under 1 GB each and 20 GB in total. | Single file limit: 10 GB |
| .bin | Keep files under 500 MB each. | Single file limit: 10 GB |
| .db | System tuning: Keep files under 1 GB each. Serving tuning: Keep files under 1 GB each. | System tuning: Single file limit: 20 GB. Serving tuning: Single file limit: 10 GB. |
| .csv | CSV files are stored as plain text. Keep files under 500 MB each. | Single file limit: 2 GB |
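
The per-file sizes in Table 5 can be checked before import with a small sketch like the one below. It is not part of MindStudio Insight: the function name classify_size and its return labels are hypothetical, treating 1 GB as 1024^3 bytes is an assumption, and so is treating a file exactly at a threshold as acceptable. The 20 GB total suggested for .json files is not covered.

```python
MB = 1024 ** 2  # assumption: binary units; the source does not say
GB = 1024 ** 3

# (suggested size, single file limit) in bytes, copied from Table 5.
LIMITS = {
    ".json": (1 * GB, 10 * GB),
    ".bin": (500 * MB, 10 * GB),
    ".csv": (500 * MB, 2 * GB),
}
# .db values depend on the tuning scenario.
DB_LIMITS = {
    "system": (1 * GB, 20 * GB),
    "serving": (1 * GB, 10 * GB),
}


def classify_size(suffix, size_bytes, scenario="system"):
    """Classify one file size against Table 5 (hypothetical helper).

    Returns "ok", "above_suggestion", "over_limit", or "unknown".
    """
    suffix = suffix.lower()
    if suffix == ".db":
        limits = DB_LIMITS.get(scenario)
    else:
        limits = LIMITS.get(suffix)
    if limits is None:
        return "unknown"
    suggested, hard = limits
    if size_bytes > hard:
        return "over_limit"
    if size_bytes > suggested:
        return "above_suggestion"
    return "ok"
```

For example, a 15 GB .db file is classified as over the limit for serving tuning but only above the suggestion for system tuning, which mirrors the two .db rows in the table.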