Introduction

This section describes the architecture of the ATC tool and the terms, acronyms, and abbreviations that you may encounter when using the tool.

ATC Overview

ATC is a model conversion tool built upon the heterogeneous compute architecture CANN. It is designed to convert models of open-source frameworks or Ascend IR–defined single-operator description files (JSON) into OM offline models supported by the Ascend AI Processor. Figure 1 shows the ATC architecture.

During model conversion, ATC implements operator scheduling optimization, weight data rearrangement, and memory optimization, as well as deep learning model tuning, to achieve higher performance and efficiency of model execution on the Ascend AI Processor.

Figure 1 ATC architecture

Note that:

  • For a model developed in an open-source framework:
    1. The Parser parses a network model from an open-source framework into an intermediate representation (IR) graph.
    2. After graph preparation, partitioning, optimization, and build, the IR is converted into an offline model adapted to the Ascend AI Processor. The graph here refers to the model network topology.
    3. Upload the offline model to the board and load the model file using the acl API for inference. For details, see "Model Management" in Application Development Guide (C&C++).
  • For the single-operator description file scenario:

    Use ATC to build and convert the Ascend IR–defined single-operator description file (.json) into a single-operator offline model adapted to the Ascend AI Processor. Upload the offline model to the board and load the single-operator model file using the acl API to verify the single-operator functionality. For details, see "Single-Operator Model Execution" in Application Development Guide (C&C++).

    For details about the configuration of the single-operator description file, see Generating a Single-Operator Model.

Basic Concepts

Table 1 Concepts

Concept

Description

GE

The graph engine (GE) is the control center for building and running computational graphs. Its key functions are graph optimization, graph build management, and graph run control. GE supports various AI frameworks through unified graph development APIs, which allow computational graphs from different AI frameworks to be converted into Ascend graphs. When optimizing the original graph, GE optimizes the graph as a whole.

YUV420SP

YUV420SP is a lossy image color encoding format. It comes in two forms, YUV420SP_UV and YUV420SP_VU, which differ in the order of the interleaved chroma components.

Format

The format is the physical layout of data and defines the dimensions for data interpretation, such as 1D, 2D, 3D, 4D, and 5D.

NCHW and NHWC

In deep-learning frameworks, n-dimensional data is stored by using an n-dimensional array. For example, a feature map of a convolutional neural network (CNN) is stored using a 4D array, including:

  • N: batch size, for example, the number of images.
  • H: height of the feature map, that is, the number of pixels in the vertical direction.
  • W: width of the feature map, that is, the number of pixels in the horizontal direction.
  • C: channels. For example, an RGB image has 3 channels.

Memory is linear, so the four dimensions must be flattened in a fixed order, called the layout. Different deep-learning frameworks store feature maps in different layouts. For example, Caffe uses the layout [Batch, Channels, Height, Width], that is, NCHW, and TensorFlow uses the layout [Batch, Height, Width, Channels], that is, NHWC.

Figure 2 uses an RGB image as an example. In NCHW, C is the outermost of the per-image dimensions, so the pixels of each channel are stored contiguously, one channel after another: RRRRRRGGGGGGBBBBBB. In NHWC, C is the innermost dimension, so the channel values of each pixel are interleaved: RGBRGBRGBRGBRGBRGB.

Figure 2 NCHW and NHWC
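
The two memory orders described above can be reproduced with a small NumPy sketch. The 2 × 3 image size and the variable names are illustrative assumptions, chosen only so that the flattened strings match the 6-pixel example in the text.

```python
import numpy as np

# Hypothetical 2x3 RGB image (N=1, C=3, H=2, W=3); every element is
# labelled with the name of the channel it belongs to.
n, c, h, w = 1, 3, 2, 3
nchw = np.empty((n, c, h, w), dtype="<U1")
for ch, name in enumerate("RGB"):
    nchw[:, ch, :, :] = name

# NHWC holds the same data with the channel axis moved innermost.
nhwc = nchw.transpose(0, 2, 3, 1)

# Flattening in row-major order shows how each layout sits in linear memory.
nchw_flat = "".join(nchw.ravel())
nhwc_flat = "".join(nhwc.ravel())

print(nchw_flat)  # channel planes one after another
print(nhwc_flat)  # channel values interleaved per pixel
```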

NC1HWC0

To improve data access efficiency of General Matrix Multiply (GEMM) data blocks, the tensor data on Ascend AI Processor is stored in NC1HWC0, a 5D format.

C0 is closely related to the microarchitecture. It indicates the data volume processed by a Cube Unit in one dimension. A Cube Unit processes 32 bytes × 32 bytes data, and the data volume in one dimension is 32 bytes. For example, if the data type is float16 (2 bytes), C0 = 32/2 = 16; if the data type is float32 (4 bytes), C0 = 32/4 = 8.

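C1 = (C + C0 - 1)/C0, where the division is integer division: when it is not exact, the result is rounded down. This is equivalent to rounding C/C0 up, so C1 is the number of C0-sized blocks needed to hold all C channels.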
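
As a quick check of the arithmetic above, the following sketch computes C0 and C1. The function and constant names are hypothetical; the 32-byte figure is the one stated in the text.

```python
CUBE_DIM_BYTES = 32  # data volume per Cube Unit dimension, as stated above


def c0_for(itemsize_bytes: int) -> int:
    """C0 = 32 bytes divided by the size of one element."""
    return CUBE_DIM_BYTES // itemsize_bytes


def c1_for(c: int, c0: int) -> int:
    """C1 = (C + C0 - 1) / C0 with integer (floor) division."""
    return (c + c0 - 1) // c0


print(c0_for(2), c0_for(4))  # float16 and float32
print(c1_for(3, 16), c1_for(32, 16), c1_for(33, 16))
```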

Steps of NHWC/NCHW -> NC1HWC0 conversion: Tile data into C1 pieces of NHWC0/NC0HW along the C dimension, and arrange them in the memory into NC1HWC0, as shown in the following figure.

Figure 3 NC1HWC0
  • Formula for NHWC -> NC1HWC0 conversion:
    Tensor.reshape([N, H, W, C1, C0]).transpose([0, 3, 1, 2, 4])
  • Formula for NCHW -> NC1HWC0 conversion:
    Tensor.reshape([N, C1, C0, H, W]).transpose([0, 1, 3, 4, 2])
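
The two formulas can be tried out in NumPy as follows. This is a minimal sketch, not the ATC implementation: the function names are hypothetical, and zero-padding C up to C1 × C0 before the reshape is an assumption made so that the reshape is valid when C is not a multiple of C0.

```python
import numpy as np


def nchw_to_nc1hwc0(x, c0=16):
    n, c, h, w = x.shape
    c1 = (c + c0 - 1) // c0
    # Assumption: pad the channel axis with zeros up to C1 * C0.
    x = np.pad(x, [(0, 0), (0, c1 * c0 - c), (0, 0), (0, 0)])
    return x.reshape(n, c1, c0, h, w).transpose(0, 1, 3, 4, 2)


def nhwc_to_nc1hwc0(x, c0=16):
    n, h, w, c = x.shape
    c1 = (c + c0 - 1) // c0
    # Assumption: pad the channel axis with zeros up to C1 * C0.
    x = np.pad(x, [(0, 0), (0, 0), (0, 0), (0, c1 * c0 - c)])
    return x.reshape(n, h, w, c1, c0).transpose(0, 3, 1, 2, 4)


# Example: 20 channels with C0 = 16 gives C1 = 2.
x_nchw = np.arange(2 * 20 * 3 * 4).reshape(2, 20, 3, 4)
x_nhwc = x_nchw.transpose(0, 2, 3, 1)
from_nchw = nchw_to_nc1hwc0(x_nchw)
from_nhwc = nhwc_to_nc1hwc0(x_nhwc)
print(from_nchw.shape)
```

Both functions should produce the same 5D tensor for the same underlying data, with channel c landing at index (c // C0, c % C0) of the C1 and C0 axes.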

FRACTAL_Z

FRACTAL_Z is a format to define convolution weights, which is converted from the Filter Matrix. It is transferred to the Cube Unit in 4D format of "C1HW,N1,N0,C0".

The data is tiled into two layers, as shown in the following figure.

The data of the first layer, related to the cube size, is contiguously stored in column-major order (n format). The data of the second layer, related to the matrix size, is contiguously stored in row-major order (Z format).

For example, HWCN = (2, 2, 32, 32) can be reshaped into FRACTAL_Z (C1HW, N1, N0, C0) = (8, 2, 16, 16).

HWCN-to-FRACTAL_Z conversion:

Tensor.padding([ [0,0], [0,0], [0,(C0-C%C0)%C0], [0,(N0-N%N0)%N0] ]).reshape([H, W, C1, C0, N1, N0]).transpose([2, 0, 1, 4, 5, 3]).reshape([C1*H*W, N1, N0, C0])

NCHW-to-FRACTAL_Z conversion:

Tensor.padding([ [0,(N0-N%N0)%N0], [0,(C0-C%C0)%C0], [0,0], [0,0] ]).reshape([N1, N0, C1, C0, H, W]).transpose([2, 4, 5, 0, 1, 3]).reshape([C1*H*W, N1, N0, C0])
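
The two FRACTAL_Z formulas can be checked against the HWCN = (2, 2, 32, 32) example with a NumPy sketch. The function names are hypothetical, and C0 = N0 = 16 is assumed to match that example; this illustrates the formulas only and is not the code ATC runs.

```python
import numpy as np


def hwcn_to_fractal_z(x, c0=16, n0=16):
    h, w, c, n = x.shape
    pad_c = (c0 - c % c0) % c0
    pad_n = (n0 - n % n0) % n0
    x = np.pad(x, [(0, 0), (0, 0), (0, pad_c), (0, pad_n)])
    c1, n1 = (c + pad_c) // c0, (n + pad_n) // n0
    x = x.reshape(h, w, c1, c0, n1, n0).transpose(2, 0, 1, 4, 5, 3)
    return x.reshape(c1 * h * w, n1, n0, c0)


def nchw_to_fractal_z(x, c0=16, n0=16):
    n, c, h, w = x.shape
    pad_n = (n0 - n % n0) % n0
    pad_c = (c0 - c % c0) % c0
    x = np.pad(x, [(0, pad_n), (0, pad_c), (0, 0), (0, 0)])
    n1, c1 = (n + pad_n) // n0, (c + pad_c) // c0
    x = x.reshape(n1, n0, c1, c0, h, w).transpose(2, 4, 5, 0, 1, 3)
    return x.reshape(c1 * h * w, n1, n0, c0)


# The example from the text: HWCN = (2, 2, 32, 32).
hwcn = np.arange(2 * 2 * 32 * 32).reshape(2, 2, 32, 32)
fz = hwcn_to_fractal_z(hwcn)
# The same weights viewed as NCHW give the same FRACTAL_Z tensor.
fz_from_nchw = nchw_to_fractal_z(hwcn.transpose(3, 2, 0, 1))
print(fz.shape)
```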

FRACTAL_NZ

FRACTAL_NZ is a fractal format for storing data such as feature maps. For example, the Cube Unit outputs matrices in NW1H1H0W0 format. A matrix is divided into (H1 × W1) fractals in column-major order, which looks like an N-shaped layout. Each fractal consists of (H0 × W0) elements in row-major order, resembling a Z-shaped layout. Therefore the NW1H1H0W0 format is referred to as the Nz format. (H0 × W0) indicates the size of a fractal, as shown in the following figure.

ND-to-FRACTAL_NZ conversion:

(..., N, H, W )->pad->(..., N, H1*H0, W1*W0)->reshape->(..., N, H1, H0, W1, W0)->transpose->(..., N, W1, H1, H0, W0)
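
The pad, reshape, and transpose chain above can be sketched in NumPy as follows. The function name is hypothetical, and the 16 × 16 fractal size (H0 = W0 = 16) and zero-padding are assumptions for illustration; the text does not fix these values.

```python
import numpy as np


def nd_to_fractal_nz(x, h0=16, w0=16):
    *batch, h, w = x.shape
    h1, w1 = -(-h // h0), -(-w // w0)  # ceiling division
    # pad: (..., H, W) -> (..., H1*H0, W1*W0)
    pad = [(0, 0)] * len(batch) + [(0, h1 * h0 - h), (0, w1 * w0 - w)]
    x = np.pad(x, pad)
    # reshape: -> (..., H1, H0, W1, W0)
    x = x.reshape(*batch, h1, h0, w1, w0)
    # transpose: -> (..., W1, H1, H0, W0)
    k = len(batch)
    axes = list(range(k)) + [k + 2, k, k + 1, k + 3]
    return x.transpose(axes)


nd = np.arange(3 * 20 * 40).reshape(3, 20, 40)
nz = nd_to_fractal_nz(nd)
print(nz.shape)
```

Element (h, w) of each matrix ends up at index (w // W0, h // H0, h % H0, w % W0) of the last four axes.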

Repository

A repository stores the tiling policies after the operator performance is verified on the board so that the policies can be directly used during operator build.