Analyzing Performance Data Files

  1. Import the collected performance data files into MindStudio Insight for analysis.
  2. Analyze the ratio of Free (idle) time. For the Atlas 200I A2 Inference Acceleration Module, the Free ratio is generally below 10%. In Figure 1, the Free ratio exceeds 30%, well above that level. In this case, further check whether the OS of the Atlas 200I A2 Inference Acceleration Module is running other services that occupy resources and cause waiting.
    Figure 1 Analyzing the ratio of Free

  3. Analyze the operators running on the AI CPU. In Figure 2, GridSampler2D runs on the AI CPU. Locate this operator and contact its owner to determine whether it can be optimized to run on the AI Core, or to analyze it further.
    Figure 2 Analyzing the operators running on the AI CPU
  4. Analyze the time-consuming operators running on the AI Core. In Figure 3, the Conv2D operator takes most of the time. Locate this operator and contact its owner to check whether it can be optimized.
    Figure 3 Analyzing the time-consuming operators running on the AI Core
  5. Analyze the op_summary file.

    You are advised to sort the operators by task duration in descending order and focus on the operators that take the longest, because they are the main performance bottlenecks. If neither vec_ratio nor mac_ratio exceeds 0.8, the compute units are not fully utilized and the operator can be further optimized. If an aic_mtex_ratio value (x = 1, 2, or 3) is high, data movement takes a long time; in this case, consider combining (fusing) the operators before and after the movement to reduce data movement. Table 1 describes the parameters.

    Table 1 Parameters

    Parameter           Description
    aic_mte1_time(us)   Time (in microseconds) spent executing MTE1 instructions (L1-to-L0A/L0B movement), excluding the movement wait time.
    aic_mte1_ratio      Ratio of the cycles spent executing MTE1 instructions (L1-to-L0A/L0B movement) to the total cycles.
    aic_mte2_time(us)   Time (in microseconds) spent executing MTE2 instructions (GM-to-AI Core movement).
    aic_mte2_ratio      Ratio of the cycles spent executing MTE2 instructions (GM-to-AI Core movement) to the total cycles.
    aic_mte3_time(us)   Time (in microseconds) spent executing MTE3 instructions (AI Core-to-GM movement).
    aic_mte3_ratio      Ratio of the cycles spent executing MTE3 instructions (AI Core-to-GM movement) to the total cycles.
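
The Free-ratio check in step 2 is simple arithmetic; a minimal Python sketch follows. It assumes you read the Free time and the total time from the MindStudio Insight view yourself (this guide does not define a file format for them), and the function names are hypothetical. The 10% limit is the typical value stated in step 2.

```python
# Minimal sketch of the step 2 check. Assumption: free_us and total_us are
# read manually from MindStudio Insight; the function names are hypothetical.

EXPECTED_FREE_RATIO = 0.10  # "generally below 10%" for the Atlas 200I A2 module


def free_ratio(free_us: float, total_us: float) -> float:
    """Return Free (idle) time as a fraction of the total time."""
    if total_us <= 0:
        raise ValueError("total_us must be positive")
    return free_us / total_us


def free_ratio_too_high(free_us: float, total_us: float,
                        limit: float = EXPECTED_FREE_RATIO) -> bool:
    """True if the Free ratio exceeds the limit, i.e. other OS services should be checked."""
    return free_ratio(free_us, total_us) > limit


if __name__ == "__main__":
    # Hypothetical numbers: 3.2 ms free out of 10 ms total is a 32% Free ratio.
    r = free_ratio(3200.0, 10000.0)
    print(f"Free ratio: {r:.0%}, too high: {free_ratio_too_high(3200.0, 10000.0)}")
```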
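
Steps 3 and 4 can also be done offline against the op_summary export. The sketch below is a hedged illustration that assumes op_summary is a CSV file with "Op Name", "Task Type", and "Task Duration(us)" columns, and that Task Type uses the values AI_CPU and AI_CORE; check these names against your own file. The helper functions and sample rows are hypothetical.

```python
# Sketch: list AI CPU operators and the slowest AI Core operators from op_summary.
# Assumed (not confirmed by this guide): CSV columns "Op Name", "Task Type",
# "Task Duration(us)" and Task Type values "AI_CPU" / "AI_CORE".
import csv
import io
from typing import Dict, Iterable, List, Tuple

NAME_COL = "Op Name"
TYPE_COL = "Task Type"
DURATION_COL = "Task Duration(us)"


def load_rows(text: str) -> List[Dict[str, str]]:
    """Parse op_summary CSV text into a list of row dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def ai_cpu_operators(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Names of operators that ran on the AI CPU (candidates for moving to the AI Core)."""
    return [r[NAME_COL] for r in rows if r[TYPE_COL] == "AI_CPU"]


def top_ai_core_operators(rows: Iterable[Dict[str, str]],
                          n: int = 5) -> List[Tuple[str, float]]:
    """The n AI Core operators with the longest task duration, longest first."""
    core = [(r[NAME_COL], float(r[DURATION_COL]))
            for r in rows if r[TYPE_COL] == "AI_CORE"]
    return sorted(core, key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    sample = (
        "Op Name,Task Type,Task Duration(us)\n"
        "GridSampler2D_1,AI_CPU,850.0\n"
        "Conv2D_1,AI_CORE,1200.5\n"
        "Add_1,AI_CORE,12.0\n"
    )
    rows = load_rows(sample)
    print("AI CPU operators:", ai_cpu_operators(rows))
    print("Top AI Core operators:", top_ai_core_operators(rows, n=2))
```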
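
The triage rule in step 5 can be sketched as below. Assumptions: op_summary is a CSV file with "Op Name" and "Task Duration(us)" columns; the compute and movement ratio columns are matched by suffix (mac_ratio, vec_ratio, mteN_ratio) rather than by exact name; the 0.8 limit comes from step 5, while the 0.5 "high MTE" threshold is a hypothetical placeholder, because this guide gives no number for it. The sample column aiv_vec_ratio and all sample values are hypothetical.

```python
# Sketch of the step 5 triage: sort by task duration, flag underused compute
# units and heavy data movement. Column names and MTE_HIGH are assumptions.
import csv
import io
import re
from typing import List, Tuple

NAME_COL = "Op Name"                  # assumed column name
DURATION_COL = "Task Duration(us)"    # assumed column name
COMPUTE_LIMIT = 0.8                   # from this guide: <= 0.8 leaves room to optimize
MTE_HIGH = 0.5                        # hypothetical "high MTE" threshold; tune as needed
MTE_RE = re.compile(r"mte\d_ratio$")  # matches e.g. aic_mte1_ratio, aic_mte2_ratio


def _num(value) -> float:
    """Parse a numeric cell; empty or non-numeric cells (e.g. 'N/A') count as 0.0."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def triage(text: str) -> List[Tuple[str, float, List[str]]]:
    """Return (operator, duration_us, notes), sorted by task duration, longest first."""
    rows = list(csv.DictReader(io.StringIO(text)))
    rows.sort(key=lambda r: _num(r.get(DURATION_COL)), reverse=True)
    report = []
    for r in rows:
        # Compute-unit utilization: every column ending in mac_ratio or vec_ratio.
        compute = [_num(v) for k, v in r.items()
                   if k and k.endswith(("mac_ratio", "vec_ratio"))]
        # Data-movement ratios at or above the (hypothetical) MTE_HIGH threshold.
        high_mte = sorted(k for k, v in r.items()
                          if k and MTE_RE.search(k) and _num(v) >= MTE_HIGH)
        notes = []
        if compute and max(compute) <= COMPUTE_LIMIT:
            notes.append("vec/mac ratio <= 0.8: can be further optimized")
        if high_mte:
            notes.append("high data movement in " + ", ".join(high_mte))
        report.append((r.get(NAME_COL) or "?", _num(r.get(DURATION_COL)), notes))
    return report


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    sample = (
        "Op Name,Task Duration(us),aic_mac_ratio,aiv_vec_ratio,aic_mte2_ratio\n"
        "Add_1,12.0,0.0,0.9,0.1\n"
        "Conv2D_1,1200.5,0.45,0.0,0.62\n"
    )
    for name, dur, notes in triage(sample):
        print(f"{name:<10} {dur:>8.1f} us  {'; '.join(notes) or 'ok'}")
```

Matching ratio columns by suffix keeps the sketch usable if the prefix (for example aic_ or aiv_) differs in your export; adjust the constants if your file uses other names.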