TBE Introduction

Tensor Boost Engine (TBE) executes operators on the AI Core of the Ascend AI Processor. TBE enables custom development of neural network (NN) operators based on TVM.

As deep learning becomes a ubiquitous and indispensable technology, new frameworks and hardware backends continue to emerge. Most existing NN models are difficult to migrate cleanly to a different hardware platform, and often see little performance gain when they are. TVM addresses this problem: it is an open deep learning compiler stack that compiles models from different frameworks to CPUs, GPUs, or specialized accelerators, and supports multiple languages through a unified intermediate representation (IR) and optimized scheduling. For details about TVM, visit https://tvm.apache.org/.

Figure 1 shows the logical architecture of TBE.

TBE supports flexible operator development modes. You can select an appropriate mode based on your level of hardware proficiency, and leverage the optimization and code generation capabilities of TBE to generate high-performance operators that run on the Ascend AI Processor.

Figure 1 Logical architecture of TBE in the software stack
  • Front-end framework: includes MindSpore and third-party open-source frameworks such as TensorFlow (Google's open-source machine learning framework) and Caffe (Convolutional Architecture for Fast Feature Embedding).
  • Graph Compiler: provides unified IR APIs for different machine learning frameworks based on the Ascend AI Software Stack, connecting to upper-layer network model frameworks such as TensorFlow, Caffe, and PyTorch. Graph Compiler provides functions such as graph (network model topology) preparation, partitioning, optimization, compilation, loading, execution, and management. It compiles the input IR graph into a model that can run on the Ascend AI Processor.
  • TBE: provides the tools required for developing custom operators. Based on the IR definition, TBE supplies the operator information that Graph Compiler needs for graph inference, and exposes details such as operator invocation through the operator information library. The binary programs generated by TBE can be executed on the Ascend AI Processor.

TBE Architecture

As shown in Figure 2, TBE consists of the following modules: operator logic description module, Schedule, IR, Pass (build optimization module), and CodeGen (code generation module).

Figure 2 TBE Architecture
  • Operator logic description: provides external compute APIs for operator programming.
  • Schedule: describes shape-oriented tiling policies for operator execution on the Ascend AI Processor using TVM scheduling primitives.
  • IR: provides the TVM-based IR, along with functions such as IR transformation and abstract syntax tree (AST) maintenance.
  • Pass: applies a range of build optimizations to the generated IR, including double buffering, pipeline synchronization, memory allocation management, instruction mapping, tiling for the Cube Unit, and more.
  • CodeGen: generates a temporary C-style code file, which the compiler can use to produce the operator implementation file, or which can be loaded and executed directly by a network model.
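The division of labor between the operator logic description and the Schedule follows the compute/schedule separation that TBE inherits from TVM: the compute description states *what* to calculate, and the schedule decides *how* the loops are organized (for example, tiled to fit on-chip buffers). The sketch below illustrates this idea in plain Python; it is a conceptual analogy, not the TBE or TVM API, and the names `compute` and `run_tiled` are hypothetical.

```python
# Conceptual sketch of compute/schedule separation (NOT the TBE/TVM API).
# `compute` plays the role of the operator logic description; `run_tiled`
# stands in for a Schedule that applies a shape-oriented tiling policy.

def compute(a, b):
    """Operator logic description: elementwise add, declared as a rule
    over an index rather than as explicit loops."""
    return lambda i: a[i] + b[i]

def run_tiled(rule, n, tile=4):
    """Schedule: evaluate the rule over [0, n), split into tiles of
    `tile` elements, the way a tiling policy partitions work so each
    chunk fits the accelerator's local memory."""
    out = [0] * n
    for start in range(0, n, tile):                    # outer loop over tiles
        for i in range(start, min(start + tile, n)):   # inner loop within a tile
            out[i] = rule(i)
    return out

a = list(range(10))
b = [10] * 10
c = run_tiled(compute(a, b), len(a))   # same result as an untiled loop
```

Because the rule is independent of the loop structure, the same compute description can be paired with different schedules (different tile sizes, loop orders, or buffering strategies) without being rewritten, which is what lets TBE's Pass and CodeGen stages optimize and lower it for the AI Core.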