Troubleshooting

Troubleshooting Process

Figure 1 shows how to locate performance issues in the traditional model inference scenario.

Figure 1 Troubleshooting process
  1. Determine the exception scope.
  2. Collect profile data. You can use the performance tuning tool to collect and parse profile data.
  3. Import the parsed profile data into MindStudio Insight for analysis. Common issues in traditional model inference usually involve computation and scheduling.
    1. If task delivery is slow, analyze the CPU usage while the model is running and determine whether key concurrent events (CPU preemption and IOWait) are the root cause.
    2. If task delivery is not slow, operator execution may be slow. In this case, examine the computation and check whether high-impact operators can be optimized or merged.
  4. Analyze the data and make adjustments based on the identified issue and root cause.
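
The branching in step 3 can be sketched as a small helper. This is a minimal illustration, not part of the performance tuning tool or MindStudio Insight: the function name triage, its parameters, and the returned labels are hypothetical. As a rough first signal for slow delivery, it assumes the comparison of free time and computing time from the Overlap Analysis view described under Troubleshooting Procedure.

```python
# Hypothetical helper: picks the branch of the troubleshooting process to try first.
# The inputs are durations read manually from the Overlap Analysis view.

NEXT_STEPS = {
    "scheduling": "Analyze CPU usage; check for CPU preemption and IOWait.",
    "computing": "Examine operators; optimize or merge high-impact ones.",
}


def triage(free_time_us: float, computing_time_us: float) -> str:
    """Return 'scheduling' if free time exceeds computing time, else 'computing'."""
    if free_time_us < 0 or computing_time_us < 0:
        raise ValueError("durations must be non-negative")
    if free_time_us > computing_time_us:
        # The device waits for tasks longer than it computes: suspect slow delivery.
        return "scheduling"
    # Otherwise operator execution is the more likely bottleneck.
    return "computing"


branch = triage(free_time_us=600.0, computing_time_us=400.0)
print(branch, "->", NEXT_STEPS[branch])
```

The result is only a starting point. A device with little free time can still have scheduling problems in specific phases, so confirm the choice on the timeline.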

Troubleshooting Procedure

  1. Install the required software packages to facilitate subsequent data collection and analysis.
  2. Collect data.

    Use the msprof command of the performance tuning tool to collect profile data for a single-device inference program. The following is an example:

    msprof --output=save_path python3 main.py

    For details about how to use the command, see Profiling Instructions. After the collection and parsing are complete, the following results are generated in the directory specified by the --output parameter.

    └── PROF_XXX
          ├── device_x
          ├── host
          ├── mindstudio_profiler_log
          │     └── xx.log
          └── mindstudio_profiler_output
                ├── xx.json
                └── xx_*.csv
  3. Use MindStudio Insight to analyze data.

    Import the parsed profile data into MindStudio Insight for analysis. For details, see MindStudio Insight User Guide.

    You can analyze performance problems by checking the following information:

    • Time proportions in the data overview

      For inference on a single server with a single device, choose Timeline > System View > Overlap Analysis to view the percentages of computing time and free time, as shown in Figure 2.

      Figure 2 Overlap analysis

      Free indicates the time during which the Ascend device is idle, with no computing or communication tasks. Computing indicates the time during which the Ascend device performs computing.

      • If the free time is greater than the computing time, there may be scheduling issues.
      • If many operators have long computing times, there may be computing issues.
    • Scheduling issues

      On the Timeline page, enable the HostToDevice connection. The connection shows the delivery and execution relationship between operators at the CANN layer and operators at the Ascend Hardware layer. A HostToDevice connection can be tilted or vertical, as shown in Figure 3. A tilted connection indicates that tasks are scheduled properly and the Ascend device is fully loaded. A vertical connection indicates inefficient task delivery: the Ascend device is underloaded and sits free while waiting for tasks. In that case, you can tune performance by, for example, increasing the batch size, binding cores, or replacing operators with fused operators.

      Figure 3 HostToDevice connections
    • Computing issues

      You can view each operator's share of the total time on the Operator page, as shown in Figure 4.

      Sort the operators by time percentage, identify the most time-consuming ones, and check whether inefficient code introduces unnecessary operations. Modify the code as needed to remove them. If an operator's execution time remains long, contact the operator's developers for deeper analysis.

      Figure 4 Viewing the operator proportion
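
The sorting step for computing issues can also be reproduced outside the GUI. The following is a minimal sketch that ranks operators by their share of the total duration. It assumes the operator data is available as CSV rows. The column names "Op Name" and "Task Duration(us)", the function name rank_operators, and the sample rows are hypothetical placeholders for illustration, so adjust them to match the headers of the CSV files actually generated in mindstudio_profiler_output.

```python
import csv
import io
from collections import defaultdict


def rank_operators(rows, name_col="Op Name", duration_col="Task Duration(us)"):
    """Return (name, total_duration, percent_of_total) tuples, slowest first.

    The default column names are assumptions; pass the real headers if they differ.
    """
    totals = defaultdict(float)
    for row in rows:
        # Sum the durations of repeated invocations of the same operator.
        totals[row[name_col]] += float(row[duration_col])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, dur, 100.0 * dur / grand_total) for name, dur in ranked]


# Made-up sample data standing in for a parsed profile CSV file.
SAMPLE_CSV = """Op Name,Task Duration(us)
MatMul_1,300.0
Conv2D_3,500.0
MatMul_1,100.0
Relu_2,100.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ranking = rank_operators(rows)
for name, duration, share in ranking:
    print(f"{name}: {duration:.1f} us ({share:.1f}%)")
```

To use real data, replace io.StringIO(SAMPLE_CSV) with an opened CSV file from the output directory. The operators at the top of the ranking are the first candidates for optimization or fusion.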