Tutorial
Required Knowledge
This document guides you in developing high-performance operators on the Ascend AI Processor using the Ascend C framework. Before reading this document, you should:
- Be proficient in using the C++ programming language.
- Understand computer architecture.
- Know the hardware architecture of Ascend AI Processors.
- Have completed the Ascend C documentation and courses.
- Be able to set up the Ascend C development and debugging environments.
- Be able to independently develop Ascend C operators.
- Be proficient in using the profiling tool to obtain profile data.
Problems Solved
In this document, "operators" refers to operators developed with Ascend C. After developing an operator, you can use this document to further optimize its performance.
This document describes the operator running data flow, the debugging and optimization roadmap for Ascend C programming, experience-based performance optimization methods, and specific performance optimization cases.
Optimizing operator performance is an iterative process that cycles through the following steps until the performance goal is achieved. These four steps are also the basis for understanding the best practices.

The main content of this document consists of:
- Heterogeneous Computing: describes the operator deployment on hardware and the running data flow. It aims to help you understand the processes that may affect the operator execution performance in the hardware architecture.
- Normal Functions: describes common scenarios that affect operator functionality. It aims to enable you to quickly solve functional problems and optimize performance.
- Performance Analysis: describes operator profiling. It aims to enable you to analyze profile data and identify the performance optimization direction. For details, see the Performance Tuning Tool User Guide.
- Performance Optimization: describes performance optimization methods. It aims to enable you to optimize operator performance based on the bottlenecks you identify. The optimization suggestions cover transfer optimization, memory optimization, API usage optimization, instruction optimization, pipeline optimization, and tiling optimization. Some suggestions span several of these categories and are described in the most closely related section. The suggestions are prioritized by performance impact and scope of applicability: higher-priority suggestions bring performance benefits to most Ascend C operators, while lower-priority suggestions apply only to specific situations. You do not need to be familiar with every optimization method. Instead, adopt the methods that match the bottlenecks found through analysis, and build up an understanding of the overall optimization strategy gradually.
- Best Practices: describes practical cases of operator performance optimization. It aims to help you better understand the preceding content, so that you can refer to the optimization methods and ideas to optimize the operator performance.