Introduction to DSL

To simplify custom operator development, Tensor Boost Engine (TBE) provides a set of compute APIs that developers combine to express the compute logic of an operator. Together, these APIs form a domain-specific language (DSL). Operators developed with the DSL can use the Auto Schedule mechanism provided by TBE to complete scheduling automatically, which spares developers the most complex part of operator development: writing the schedule by hand.

DSL Functional Framework

Figure 1 shows the functional framework of an operator developed based on the TBE DSL.

Figure 1 Functional framework of the TBE DSL
  1. You can use the DSL API provided by TBE to describe the compute logic, which specifies the computation method and procedure of the operator.
  2. After the compute logic is developed, you can call the Auto Schedule API to start automatic scheduling. Behind the scenes, TBE selects a scheduling template that matches the compute type, then tiles the data and arranges the data flow so that the operator runs efficiently on hardware.
  3. After auto scheduling is complete, a TVM-style IR is generated.
  4. The Pass module applies a range of build optimizations to the generated IR, including double buffering, pipeline synchronization, memory allocation management, instruction mapping, and tiling to fit the Cube Unit.
  5. After the IR has passed through the Pass module, the CodeGen module generates a temporary C-style code file. The compiler uses this file to generate the operator implementation file, which a network model can load and execute directly.

A code example is provided as follows:

    # Create placeholders for the two input tensors.
    data_x = tvm.placeholder(shape_x, name="data_1", dtype=input_data_type)
    data_y = tvm.placeholder(shape_y, name="data_2", dtype=input_data_type)
    # Call the compute API to implement data_x + data_y.
    res = tbe.dsl.vadd(data_x, data_y)
    # Call the auto_schedule API to perform automatic scheduling.
    with tvm.target.cce():
        schedule = tbe.dsl.auto_schedule(res)
    # Configure the build parameters and build the operator.
    config = {"name": kernel_name,
              "tensor_list": (data_x, data_y, res)}
    tbe.dsl.build(schedule, config)
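
The five stages shown in Figure 1 can be pictured as a chain of functions, each consuming the output of the previous stage. The sketch below is a toy model in plain Python, not TBE code: every function name, the tile parameter, and the text format of the "IR" are assumptions made only for illustration.

```python
# Toy model of the five stages. Every name here is hypothetical;
# none of these functions are TBE APIs.

def describe_compute():
    # Stage 1: compute logic as a list of (output, op, inputs) statements.
    return [("res", "add", ("data_x", "data_y"))]

def toy_auto_schedule(compute, tile=256):
    # Stage 2: attach a scheduling decision (here, only a tile size).
    return {"compute": compute, "tile": tile}

def lower_to_ir(schedule):
    # Stage 3: emit one loop-style line of textual IR per statement.
    tile = schedule["tile"]
    ir = []
    for out, op, (a, b) in schedule["compute"]:
        ir.append(f"for i in tiles({tile}): {out}[i] = {op}({a}[i], {b}[i])")
    return ir

def run_passes(ir):
    # Stage 4: each pass rewrites the IR; this one only appends a marker.
    return [line + "  // double_buffer" for line in ir]

def codegen(ir, kernel_name):
    # Stage 5: wrap the optimized IR in a C-style function body.
    body = "\n".join("    " + line for line in ir)
    return f"void {kernel_name}() {{\n{body}\n}}"

source = codegen(
    run_passes(lower_to_ir(toy_auto_schedule(describe_compute()))),
    "add_kernel",
)
print(source)
```

Each stage sees only what the previous stage produced. In this toy model, that is the property that lets the compute description stay free of scheduling details.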

DSL Compute APIs

The compute APIs provided by the TBE DSL primarily cover vector operations. They include math compute APIs, neural network (NN) compute APIs, reduce compute APIs, convolution compute APIs, and matrix compute APIs.

For details about the APIs, see TBE DSL API.

Auto Schedule

After the compute logic of an operator is implemented, the following must be determined before the logic can run on hardware:
  • The sequence in which computing instructions are executed in hardware
  • The data storage method in the hardware memory

Scheduling settles both points. It reorders and restructures the compute process to improve compute efficiency and to ensure that the hardware memory allocated during computation does not exceed the upper limit.
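
To make the memory constraint concrete, here is a minimal hand-written tiling loop in plain Python. It is only a sketch of the idea and not how TBE schedules operators: the buffer capacity `buffer_elems` and the assumption that two input tiles and one output tile share that buffer are both hypothetical.

```python
# Illustrative only: a hand-written tiling loop, not TBE's scheduler.

def tiled_add(x, y, buffer_elems=12):
    """Add two equal-length lists tile by tile and report peak buffer use."""
    assert len(x) == len(y)
    # Assumption: two input tiles and one output tile occupy the buffer
    # at the same time, so each tile gets one third of the capacity.
    tile = buffer_elems // 3
    if tile < 1:
        raise ValueError("buffer too small for one element per tensor")
    out, peak = [], 0
    for start in range(0, len(x), tile):
        # "Move in" one tile of each input.
        x_tile = x[start:start + tile]
        y_tile = y[start:start + tile]
        # Compute on the tile only.
        res_tile = [a + b for a, b in zip(x_tile, y_tile)]
        peak = max(peak, len(x_tile) + len(y_tile) + len(res_tile))
        # "Move out" the result tile.
        out.extend(res_tile)
    return out, peak

result, peak_elems = tiled_add(list(range(10)), [1] * 10, buffer_elems=12)
```

The loop order fixes the sequence in which work is issued, and the tile size fixes how much data is resident at once. These are the two decisions listed above.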

TBE DSL provides the Auto Schedule mechanism so that no manual scheduling is needed. After expressing the compute logic of an operator as a combination of DSL APIs, you can call the Auto Schedule API directly to perform automatic scheduling, including data tiling and data flow arrangement. Auto Schedule is the default schedule tuning mechanism at the lower layer of TBE, so developers have only minimal control over scheduling during operator development.

The following example uses the DSL to develop an operator that computes the exponential of x, sums the result along axis 0, and then takes the reciprocal.

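    # Input tensor of shape (512, 1024) and type float16.
    x = tvm.placeholder((512, 1024), "float16")
    # Element-wise exponential.
    exp_x = tbe.dsl.vexp(x)
    # Sum-reduce along axis 0.
    reduce_exp_x = tbe.dsl.sum(exp_x, axis=0)
    # Element-wise reciprocal.
    res = tbe.dsl.vrec(reduce_exp_x)

    with tvm.target.cce():
        sch = tbe.dsl.auto_schedule(res)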
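
As a host-side cross-check of what this operator computes, the same three steps can be written in plain Python. This is a reference sketch only: the function name `reference` is hypothetical, it works in double precision rather than float16, and it assumes the reduced axis is dropped from the output shape.

```python
import math

def reference(x):
    """x is a list of rows; returns 1 / sum_over_rows(exp(x)) per column."""
    cols = len(x[0])
    # Element-wise exponential (the vexp step).
    exp_x = [[math.exp(v) for v in row] for row in x]
    # Sum along axis 0: collapse the rows, one value per column (the sum step).
    reduced = [sum(row[c] for row in exp_x) for c in range(cols)]
    # Element-wise reciprocal (the vrec step).
    return [1.0 / v for v in reduced]

out = reference([[0.0, 1.0], [0.0, 1.0]])
```

For the (512, 1024) input in the example, this reference would return 1024 values, one per column.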

Calling the tbe.dsl.auto_schedule API triggers the automatic scheduling of TBE. Figure 2 shows the automatic scheduling process.

Figure 2 Auto Schedule workflow
  1. The compute syntax tree is passed in when the Auto Schedule API is called. During the build, every compute statement in TBE is annotated with a tag_scope tag.

    The tag_scope tag is added as follows:

    with tvm.tag_scope(op):
        tmp = tvm.compute(shape, lambda_func, name=name)

    In Figure 3, the compute syntax tree on the left is the abstract syntax tree (AST). During the build, a tag_scope tag is attached to each compute statement.

    Figure 3 Example of mapping between compute statements and tag_scope
  2. The pattern of each compute statement is identified from its tag_scope tag. The pattern types supported by TBE include elewise, reduce, segment, concat, conv, depthwise, and pooling2d. TBE partitions the AST according to these patterns. The basic rule is that elewise can be connected to any pattern, whereas reduce, segment, and concat must not appear in the same AST subgraph.
  3. After the AST subgraph partition is complete, TBE creates and initializes a schedule object.
  4. During the scheduling process, the system finds the boundary of the AST subgraph and then selects a proper schedule template for each subgraph according to the pattern. The scheduling process includes data flow management, data tiling, and instruction mapping.
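
The partition rule in step 2 can be sketched for the simplest case, a linear chain of compute statements. The code below is a hypothetical illustration, not TBE's implementation. It encodes only the one rule stated above, treats every other pattern as freely connectable, and assumes that repeated statements of the same exclusive pattern may share a subgraph.

```python
# Hypothetical sketch of pattern-based partitioning; not TBE's implementation.
EXCLUSIVE = {"reduce", "segment", "concat"}

def partition(ops):
    """Split a topologically ordered chain of (name, pattern) into subgraphs."""
    subgraphs, current, seen = [], [], set()
    for name, pattern in ops:
        # Start a new subgraph only when this statement would bring a second,
        # different exclusive pattern into the current subgraph.
        if pattern in EXCLUSIVE and (seen - {pattern}):
            subgraphs.append(current)
            current, seen = [], set()
        current.append(name)
        if pattern in EXCLUSIVE:
            seen.add(pattern)
    if current:
        subgraphs.append(current)
    return subgraphs

# The exp -> sum -> reciprocal example, followed by a made-up concat statement.
chain = [("exp_x", "elewise"), ("reduce_exp_x", "reduce"),
         ("res", "elewise"), ("joined", "concat")]
parts = partition(chain)
```

Under these assumptions, the three statements of the earlier example stay in a single subgraph because elewise may attach to reduce. The made-up concat statement is placed in a subgraph of its own.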