Optimizing Operators Using Instruction Pipeline Chart

This section shows how to use the instruction pipeline charts of the msProf tool to analyze operator bottlenecks and optimize the performance of vector operators.

Procedure

  1. Collect operator simulation profile data as described in msprof op simulator, then import the generated visualize_data.bin file into MindStudio Insight. For details, see "Importing Profile Data" in Ascend C Operator Development APIs.
  2. View the operator instruction pipeline chart.

    The chart shows that the MTE2 pipeline executes no transfer instructions while VADD computation is in progress. Because data transfer and computation do not overlap, the MTE2 pipeline is the performance bottleneck of the operator, and its transfer efficiency needs to be improved.

  3. Enable the double buffer mechanism of Ascend C operators to improve the MTE2 transfer efficiency.
    For example, in the kernel function of the sample operator, change the second parameter (BUFFER_NUM) passed to InitBuffer of TPipe from 1 to 2 to enable double buffer. For details about how to use InitBuffer, see InitBuffer.
    constexpr int32_t BUFFER_NUM = 2;        // tensor num for each queue
    ...
    pipe.InitBuffer(inQueueY, BUFFER_NUM, 1024 * sizeof(half));
    ...
    
  4. Repeat Steps 1 and 2 to view the optimized instruction pipeline chart.

    The MTE2 pipeline now executes transfer instructions while VADD computation is in progress, so data transfer overlaps with computation and is more efficient.
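
The effect of double buffer on the pipeline can be illustrated with a small host-side timing model. The sketch below is not Ascend C code and does not call any msProf or Ascend C API. SimulatePipeline and its parameters are hypothetical names introduced only for this illustration, and the model assumes fixed per-tile copy and compute times and ignores the copy-out stage. It models one transfer engine (standing in for MTE2) and one compute unit (standing in for VADD), where a tile can be copied in only when a buffer is free.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy timing model (hypothetical, not an Ascend C API).
// Returns the time at which the last tile finishes computing.
int64_t SimulatePipeline(int tileCount, int64_t copyTime, int64_t computeTime, int bufferNum)
{
    assert(tileCount >= 1 && bufferNum >= 1);
    std::vector<int64_t> copyEnd(tileCount, 0);
    std::vector<int64_t> computeEnd(tileCount, 0);
    for (int i = 0; i < tileCount; ++i) {
        // The transfer engine handles one tile at a time.
        int64_t copyStart = (i > 0) ? copyEnd[i - 1] : 0;
        // A buffer can be reused only after the tile that last occupied it has been computed.
        if (i >= bufferNum) {
            copyStart = std::max(copyStart, computeEnd[i - bufferNum]);
        }
        copyEnd[i] = copyStart + copyTime;
        // Compute starts once the tile has arrived and the compute unit is free.
        int64_t computeStart = copyEnd[i];
        if (i > 0) {
            computeStart = std::max(computeStart, computeEnd[i - 1]);
        }
        computeEnd[i] = computeStart + computeTime;
    }
    return computeEnd[tileCount - 1];
}
```

For example, with 4 tiles, a copy time of 2, and a compute time of 3 (arbitrary units), SimulatePipeline(4, 2, 3, 1) returns 20 because every copy waits for the previous computation to finish, whereas SimulatePipeline(4, 2, 3, 2) returns 14 because the copy of the next tile proceeds during the current computation. The same overlap is what the optimized instruction pipeline chart shows.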