Introduction

Overview

The rapid development of AI, especially since GPT models became widely available in 2023, has created substantial economic value. Transformer models such as GPT have diversified, driving significant demand for AI devices.

Ascend Transformer Boost (ATB) is an efficient and reliable acceleration library for Transformer model training and inference on Huawei Ascend AI Processors.

The ATB applies optimization policies at the algorithm, hardware, and software levels to speed up training and inference of Transformer models and to reduce energy consumption and cost. Specifically, the ATB accelerates Transformer models by optimizing the implementation of core operators, such as matrix multiplication, and of attention mechanisms. The ATB also makes full use of the hardware features of the Ascend AI Processor, such as computing power, storage bandwidth, and memory bandwidth, and uses techniques such as hardware acceleration and data reuse to further improve performance and efficiency.

Currently, the ATB provides basic high-performance operators at the bottom layer, together with an operator combination technology (graph operators). At the upper layer, it supports integration with multiple model frameworks, such as PyTorch, MindSpore, and Paddle.
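To make the link between matrix multiplication and attention concrete, the following is a minimal reference sketch of scaled dot-product attention in plain Python. It only illustrates the computation that an acceleration library would optimize. It is not the ATB implementation, and the function names (`matmul`, `softmax_rows`, `attention`) are chosen here for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix multiplication: (n x k) times (k x m) gives (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V, where d is the width of Q."""
    d = len(q[0])
    scores = matmul(q, transpose(k))  # first matrix multiplication
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)
    return matmul(weights, v)         # second matrix multiplication


# One query attending over two identical keys: the weights are uniform,
# so the output is the mean of the two value rows.
out = attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
```

Both heavy steps are matrix multiplications, which is why optimizing that operator matters so much for Transformer workloads.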

In summary, the ATB contains highly optimized modules for a range of Transformer models and provides strong support for model training and inference across application scenarios.

Software Architecture

Figure 1 Software architecture

The ATB mainly provides the following functions:

  • Provides basic native operators (Operations) so that you can use corresponding operators to complete desired computations as required.
  • Provides a plugin operator (PluginOperation) mechanism so that you can customize operators as required.
  • Provides a graph operator (GraphOperation) mechanism so that you can design graph operators for specific models by combining the native operators provided by the ATB with your own custom operators to complete the desired computations.
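The relationship between the three mechanisms can be sketched conceptually as follows. This is a plain-Python model under stated assumptions, not the ATB API: the class `GraphOperation` below and the functions `add`, `mul`, and `clamp_to_unit` are hypothetical stand-ins for a native operator set, a user-defined plugin operator, and a graph that wires them together.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical names throughout: this models the concepts only and does not
# reflect the real ATB interfaces.
Tensor = List[float]
Node = Tuple[Callable[..., Tensor], Sequence[str], str]


# Stand-ins for native operators that a library would ship.
def add(a: Tensor, b: Tensor) -> Tensor:
    return [x + y for x, y in zip(a, b)]


def mul(a: Tensor, b: Tensor) -> Tensor:
    return [x * y for x, y in zip(a, b)]


# Stand-in for a plugin operator: user-defined, same calling convention.
def clamp_to_unit(a: Tensor) -> Tensor:
    return [min(1.0, max(0.0, x)) for x in a]


class GraphOperation:
    """Runs nodes in order; each node is (operator, input names, output name)."""

    def __init__(self, inputs: Sequence[str], output: str, nodes: Sequence[Node]):
        self.inputs = list(inputs)
        self.output = output
        self.nodes = list(nodes)

    def __call__(self, *args: Tensor) -> Tensor:
        if len(args) != len(self.inputs):
            raise ValueError(f"expected {len(self.inputs)} inputs, got {len(args)}")
        tensors = dict(zip(self.inputs, args))
        for op, in_names, out_name in self.nodes:
            # Intermediate tensors live only inside the graph.
            tensors[out_name] = op(*(tensors[name] for name in in_names))
        return tensors[self.output]


# A graph operator combining two native operators and one plugin operator.
scale_shift_clamp = GraphOperation(
    inputs=["x", "scale", "shift"],
    output="y",
    nodes=[
        (mul, ["x", "scale"], "scaled"),
        (add, ["scaled", "shift"], "shifted"),
        (clamp_to_unit, ["shifted"], "y"),
    ],
)

result = scale_shift_clamp([0.5, 2.0, -1.0], [2.0, 2.0, 2.0], [-0.5, 0.0, 0.5])
```

In this sketch a graph operator is called the same way as a single operator, so it could itself be used as a node in a larger graph.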