Introduction to DataFlow

Intended Audience

This document explains how to use the DataFlow APIs to build, modify, compile, and execute computational graphs. After reading it, you will:

  • Understand how to build a FlowGraph using FlowOperator.
  • Understand how to compile and execute a FlowGraph.
  • Understand the user-defined function (UDF) development process and the related APIs.

This document is intended for readers who are familiar with the basic architecture and features of CANN, can develop programs in C/C++, and have a basic understanding of machine learning and deep learning.

DataFlow Overview

DataFlow organizes one or more compute ProcessPoints into a complete compute flow. The flow is data-driven: ProcessPoints are connected by data queues. The key difference between ProcessPoints and operators is that ProcessPoints execute asynchronously. A DataFlow is carried by a FlowGraph, and ProcessPoints are carried by FlowNodes. The following figure shows the relationships between DataFlow APIs.

Figure 1 Relationships between DataFlow APIs
  • FlowGraph: DataFlow graph, which consists of the input node FlowData and compute node FlowNode.
  • FlowOperator: base class of the nodes in FlowGraph. It has two derived classes: FlowData and FlowNode.
  • FlowData: input node of FlowGraph.
  • FlowNode: compute node of FlowGraph. The following two types are supported:
    • FunctionPp: compute ProcessPoint of functions, which implements user-defined functions through UDF.
    • GraphPp: compute ProcessPoint of graphs, which implements the compute logic of users through IR graph construction.

DataFlow allows users to write customized processing functions through FunctionPp and GraphPp and execute the functions in FlowModel mode through DataFlow graph construction.
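
The relationships listed above can be approximated in code. The following is a minimal C++ sketch that uses stand-in types in a hypothetical `sketch` namespace. It does not include or reproduce the CANN DataFlow headers: the `ProcessPoint` base class and every constructor and method shown (`Name`, `Kind`, `AddPp`, `AddInput`, `AddNode`, and so on) are assumptions made for illustration only, and the real classes may differ.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical stand-ins, NOT the CANN DataFlow classes

// Base class of every node in the graph.
class FlowOperator {
 public:
  explicit FlowOperator(std::string name) : name_(std::move(name)) {}
  virtual ~FlowOperator() = default;
  const std::string& Name() const { return name_; }

 private:
  std::string name_;
};

// Input node: feeds data into the graph.
class FlowData : public FlowOperator {
 public:
  using FlowOperator::FlowOperator;
};

// Compute logic carried by a FlowNode (assumed common base for this sketch).
class ProcessPoint {
 public:
  virtual ~ProcessPoint() = default;
  virtual std::string Kind() const = 0;
};

// ProcessPoint whose logic is a user-defined function (UDF).
class FunctionPp : public ProcessPoint {
 public:
  std::string Kind() const override { return "FunctionPp"; }
};

// ProcessPoint whose logic is an IR graph.
class GraphPp : public ProcessPoint {
 public:
  std::string Kind() const override { return "GraphPp"; }
};

// Compute node: carries ProcessPoints. Allowing several per node is an
// assumption of this sketch; the text above does not state a limit.
class FlowNode : public FlowOperator {
 public:
  using FlowOperator::FlowOperator;
  void AddPp(std::shared_ptr<ProcessPoint> pp) { pps_.push_back(std::move(pp)); }
  const std::vector<std::shared_ptr<ProcessPoint>>& Pps() const { return pps_; }

 private:
  std::vector<std::shared_ptr<ProcessPoint>> pps_;
};

// Graph made of input nodes (FlowData) and compute nodes (FlowNode).
class FlowGraph {
 public:
  void AddInput(std::shared_ptr<FlowData> data) { inputs_.push_back(std::move(data)); }
  void AddNode(std::shared_ptr<FlowNode> node) { nodes_.push_back(std::move(node)); }
  std::size_t InputCount() const { return inputs_.size(); }
  std::size_t NodeCount() const { return nodes_.size(); }

 private:
  std::vector<std::shared_ptr<FlowData>> inputs_;
  std::vector<std::shared_ptr<FlowNode>> nodes_;
};

}  // namespace sketch
```

With these stand-ins, a graph with one input and one compute node would be assembled by creating a `sketch::FlowData`, creating a `sketch::FlowNode`, attaching a `sketch::FunctionPp` or `sketch::GraphPp` to the node with `AddPp`, and registering both with a `sketch::FlowGraph`. The point of the sketch is the shape of the hierarchy, not the exact calls.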

The table below describes the differences between DataFlow and IR graph construction.

Table 1 Differences between DataFlow and IR graph construction

Dimension: Data flow processing mode

  IR graph construction:
    • The graph allows only one input with one output.
    • Operators in the graph use synchronous data flows, so they execute serially and synchronously.

  DataFlow:
    • DataFlow allows one input with one or multiple outputs, or multiple inputs with one output, which makes it more flexible.
    • ProcessPoints in DataFlow use asynchronous data flows, so they execute in parallel and asynchronously, fully utilizing resources and improving throughput.

Dimension: User-defined function development mode

  IR graph construction:
    • A user-defined function is implemented by developing custom operators.
    • Operator development involves the prototype definition, code implementation, information library definition, and adaptation of the operator. The many development deliverables make this approach less user-friendly.

  DataFlow:
    • A user-defined function can be implemented by developing either a UDF or custom operators.
    • To develop a UDF, you only need to define the user function and construct the graph. The fewer development deliverables make this approach more user-friendly.

Dimension: Buffer allocation mode

  IR graph construction:
    • The input buffer and output buffer of the operator are already allocated.

  DataFlow:
    • The output buffer of the UDF is user-defined and must be allocated by the user.
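
The asynchronous, queue-driven execution that Table 1 attributes to DataFlow can be illustrated with standard C++ threads. The following is a minimal sketch, not DataFlow code: `DataQueue`, `StartStage`, and `RunTwoStagePipeline` are hypothetical names, and each stage merely plays the role of a ProcessPoint that wakes up when data arrives on its input queue.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue that connects two stages (hypothetical helper).
template <typename T>
class DataQueue {
 public:
  void Push(T value) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      items_.push(std::move(value));
    }
    cv_.notify_one();
  }

  // Marks end of stream and wakes any blocked consumer.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until an item is available. Returns false once the queue is
  // closed and fully drained.
  bool Pop(T& out) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !items_.empty() || closed_; });
    if (items_.empty()) return false;
    out = std::move(items_.front());
    items_.pop();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<T> items_;
  bool closed_ = false;
};

// Runs one stage on its own thread. The stage is data-driven: it processes
// items as they arrive on `in`, forwards results to `out`, and propagates
// end of stream when `in` is closed.
inline std::thread StartStage(DataQueue<int>& in, DataQueue<int>& out,
                              std::function<int(int)> fn) {
  return std::thread([&in, &out, fn] {
    int item = 0;
    while (in.Pop(item)) out.Push(fn(item));
    out.Close();
  });
}

// Feeds `inputs` through two chained stages and collects the results.
// Order is preserved because each queue has one producer and one consumer.
inline std::vector<int> RunTwoStagePipeline(const std::vector<int>& inputs) {
  DataQueue<int> q_in, q_mid, q_out;
  std::thread stage1 = StartStage(q_in, q_mid, [](int x) { return x + 1; });
  std::thread stage2 = StartStage(q_mid, q_out, [](int x) { return x * 2; });

  for (int x : inputs) q_in.Push(x);
  q_in.Close();

  std::vector<int> results;
  int item = 0;
  while (q_out.Pop(item)) results.push_back(item);

  stage1.join();
  stage2.join();
  return results;
}
```

Calling `RunTwoStagePipeline({1, 2, 3})` pushes three items through an add-one stage and a times-two stage. While the second stage is doubling the first item, the first stage can already work on the next one, which is the kind of overlap that serial, synchronous execution does not provide.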

Development Workflow

Restrictions

The model cannot modify the input data on either the host or the device.
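
One way to respect this restriction in user code is to treat inputs as read-only and write results to a separately allocated output buffer, which also matches the UDF buffer allocation mode in Table 1. The following is a minimal sketch in plain C++: `ScaleUdf` is a hypothetical function, and it does not use the actual UDF interfaces or buffer types of DataFlow.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical UDF-style function, not a CANN API.
// The input is taken by const reference, so the compiler rejects any
// attempt to modify it. The output buffer is allocated by the function.
inline std::vector<float> ScaleUdf(const std::vector<float>& input, float factor) {
  std::vector<float> output(input.size());  // user-side output allocation
  for (std::size_t i = 0; i < input.size(); ++i) {
    output[i] = input[i] * factor;
  }
  return output;
}
```

For example, `ScaleUdf(in, 2.0f)` returns a new buffer holding each element of `in` doubled and leaves `in` unchanged.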