Introduction
This section describes the functional architecture of the ATC tool and the component interaction process during model conversion.
ATC Overview
ATC is a model conversion tool built on the heterogeneous compute architecture CANN (Compute Architecture for Neural Networks). It converts network models from open-source frameworks, or single-operator description files (JSON) defined by the Ascend IR, into OM offline models supported by the Ascend AI Processor. Figure 1 shows the ATC architecture.
During model conversion, ATC performs operator scheduling optimization, weight data rearrangement, memory usage optimization, and deep learning model tuning, so that the converted model runs with higher performance and efficiency on the Ascend AI Processor.
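For reference, the conversion described above is driven from the atc command line. The sketch below assembles such an invocation in Python; the flag names (--model, --framework, --output, --soc_version) and the numeric framework IDs follow commonly documented ATC options, but treat them as assumptions and confirm with `atc --help` for your CANN version.

```python
import shlex

# Framework IDs as commonly documented for ATC; verify for your CANN version.
FRAMEWORKS = {"caffe": 0, "mindspore": 1, "tensorflow": 3, "onnx": 5}

def build_atc_command(model, framework, output, soc_version):
    """Assemble an atc invocation that converts a source model to an OM file."""
    return " ".join(shlex.quote(arg) for arg in [
        "atc",
        f"--model={model}",                       # path to the source model
        f"--framework={FRAMEWORKS[framework]}",   # source framework ID
        f"--output={output}",                     # OM output path (no .om suffix)
        f"--soc_version={soc_version}",           # target chip, e.g. Ascend310
    ])

print(build_atc_command("resnet50.pb", "tensorflow", "resnet50", "Ascend310"))
```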
Note that:
- For a model developed in an open-source framework:
- The Parser parses a network model from an open-source framework into an intermediate representation (IR) graph.
- After graph preparation, partitioning, optimization, and build, the IR graph is converted into an offline model adapted to the Ascend AI Processor. The graph here refers to the network topology of the model.
- Upload the converted offline model to the board and load the model file by using the AscendCL API for inference. For details, see Model Inference in CANN AscendCL Application Software Development Guide (C&C++).
To view the parameters of an open-source model, or of an offline model converted from one, you can use ATC to convert the model into a .json file.
- For the single-operator description file scenario:
Use ATC to build and convert the Ascend IR–defined single-operator description file (.json) into a single-operator offline model adapted to the Ascend AI Processor. Upload the offline model to the board and load the single-operator model file by using the AscendCL API to verify the single-operator functionality. For details, see Single-Operator Calling in CANN AscendCL Application Software Development Guide (C&C++).
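As a hedged illustration of this scenario, the snippet below writes a minimal single-operator description file for an Add operator. The JSON keys (op, input_desc, output_desc, format, shape, type) follow the commonly documented single-operator format, but verify them against your CANN version's documentation before use.

```python
import json

# Minimal Ascend IR single-operator description (key names assumed from
# commonly documented examples; confirm against your CANN version).
add_op = [{
    "op": "Add",
    "input_desc": [
        {"format": "ND", "shape": [8, 16], "type": "float16"},
        {"format": "ND", "shape": [8, 16], "type": "float16"},
    ],
    "output_desc": [
        {"format": "ND", "shape": [8, 16], "type": "float16"},
    ],
}]

with open("add_op.json", "w") as f:
    json.dump(add_op, f, indent=2)

# The file can then be built into a single-operator offline model with, e.g.:
#   atc --singleop=add_op.json --output=op_models --soc_version=Ascend310
```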
Interaction in Model Conversion
The following uses the conversion of an open-source model into an OM offline model as an example to describe module interaction during model conversion.
Operators in a network model are classified by compute unit into Tensor Boost Engine (TBE) operators, which run on AI Cores, and AI CPU operators, which run on AI CPUs. Model conversion for both operator types involves graph preparation, partitioning, optimization, and build, but because the compute units differ, the internal modules involved in each flow are different. For details, see the following figures. For details about operator types and basic concepts, see the TBE&AI CPU Operator Developer Guide.
- Interaction in model conversion using a TBE operator
Figure 2 Interaction in model conversion using a TBE operator
- The Parser is called to parse the source model into the CANN format.
- Graph preparation: In this phase, the source graph is optimized and shape inference is performed (including inferring the output shape and data type of each operator).
Graph Engine (GE) provides a unified IR API, based on the Ascend AI Software Stack, for popular machine learning frameworks such as TensorFlow. GE implements the preparation, partitioning, optimization, build, loading, execution, and management of graphs. During source graph optimization, GE sends a graph optimization request, together with the graph, to Fusion Engine (FE). FE fuses operators according to the fusion patterns and selects the highest-priority operator implementations. Finally, the optimized graph is returned to GE.
- Graph partitioning: GE partitions the graph into subgraphs.
- Graph optimization: GE sends the subgraphs to FE. FE inserts transformation operators into the subgraphs, prebuilds the TBE operators based on the data flow of the subgraphs, performs Unified Buffer (UB) fusion on the operators in the subgraphs according to the fusion patterns, finds the operator implementation files based on the operator information library, builds them into operator kernel files (.o and .json), and returns the optimized subgraphs to GE.
GE merges the optimized subgraphs into a graph and further optimizes the graph.
- Graph build: GE builds the graph (including memory and stream resource allocation) and sends a genTask request to FE. FE then returns the task information of the operators to GE. After the graph is built, an OM offline model file adapted to the Ascend AI Processor is generated.
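The pattern-based operator fusion performed by FE can be illustrated with a toy sketch. Everything here (the pattern table, the flat operator list, the fused-node naming) is invented for illustration; FE's actual fusion rules and graph structures are internal to CANN.

```python
# Toy sketch of pattern-based fusion: scan a linear operator sequence and
# merge runs matching a known pattern into a single fused node.
FUSION_PATTERNS = [("Conv2D", "BiasAdd", "Relu")]  # illustrative pattern only

def fuse(ops):
    fused, i = [], 0
    while i < len(ops):
        for pattern in FUSION_PATTERNS:
            if tuple(ops[i:i + len(pattern)]) == pattern:
                fused.append("Fused" + "".join(pattern))  # replace the run
                i += len(pattern)
                break
        else:
            fused.append(ops[i])  # no pattern matched; keep the operator
            i += 1
    return fused

print(fuse(["Conv2D", "BiasAdd", "Relu", "MaxPool"]))
# → ['FusedConv2DBiasAddRelu', 'MaxPool']
```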
- Interaction in model conversion using an AI CPU operator
Figure 3 Interaction in model conversion using an AI CPU operator
- The Parser is called to parse the source model into the CANN format.
- Graph preparation: Basic parameter verification and shape inference (including setting operator output shape and data type) are performed in this phase.
In addition, GE delivers the entire graph to AI CPU Engine. AI CPU Engine reads the operator information library, looks up an appropriate format for the operator, and returns the format to GE.
- Graph partitioning: GE partitions the graph into subgraphs.
- Graph optimization: GE sends the subgraphs to AI CPU Engine. AI CPU Engine optimizes the subgraphs and returns the optimized subgraphs to GE.
GE merges the optimized subgraphs into a graph and further optimizes the graph.
- Graph build: GE builds the graph (including memory and stream allocation) and sends a genTask request to AI CPU Engine. AI CPU Engine then returns the task information of the operators to GE. After the graph is built, an OM offline model file adapted to the Ascend AI Processor is generated.
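The graph partitioning step shared by both flows can also be sketched in miniature: split a topologically ordered operator sequence into subgraphs whose operators run on the same compute engine. The engine assignment table below is invented for illustration; in CANN the assignment comes from the operator information libraries.

```python
from itertools import groupby

# Illustrative engine assignment (real assignments come from the operator
# information libraries, not a hard-coded table).
ENGINE = {"Conv2D": "AI Core", "Relu": "AI Core", "TopK": "AI CPU"}

def partition(ops):
    """Group consecutive operators that share an engine into one subgraph."""
    return [(engine, list(group))
            for engine, group in groupby(ops, key=ENGINE.get)]

print(partition(["Conv2D", "Relu", "TopK", "Conv2D"]))
# → [('AI Core', ['Conv2D', 'Relu']), ('AI CPU', ['TopK']), ('AI Core', ['Conv2D'])]
```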
