Operator Implementation

This section describes how to implement an operator based on the tiling policy. For the complete implementation code of the BatchNorm operator, see Sample Usage.

Code Structure

The code structure varies with the tiling policy.

import os
import warnings

from tbe import tik
from tbe.common.platform import set_current_compile_soc_info

# UB size for tiling mode 1, in bytes
TILING_1_UB_SIZE = 112 * 1024

# Maximum allowed global memory (GM) size, in bytes
TOTAL_GM_SIZE = 512 * 1024 * 1024  # GM_MAX_SIZE

# batch for N
MAX_BATCH = 1

# channel for C
MAX_CHANNEL = 1024

# width for W
MAX_WIDTH = 512

# height for H
MAX_HEIGHT = 1024

class BatchNorm(object):
    # Initialization function
    def __init__(self, input0, gamma0, beta0, output0, kernel_name="BatchNorm"):
        """
        Implement the initialization function.
        """

    # Implement the compute logic and build the operator.
    def batchnorm_compute(self):
        # The compute logic varies with the tiling policy.
        self.batchnorm_compute_tiling_c()

        # Build the operator.
        self.tik_instance.BuildCCE(kernel_name=self.kernel_name,
                                   inputs=[self.input_gm,
                                           self.gamma_gm,
                                           self.beta_gm],
                                   outputs=[self.output_gm],
                                   flowtable=[self.input_n, self.input_c,
                                              self.input_h, self.input_w,
                                              self.inputtype, self.output_n,
                                              self.output_c, self.output_h,
                                              self.output_w, self.outputtype,
                                              self.gamma_c, self.gammatype,
                                              self.beta_c, self.betatype,
                                              self.param1, self.param2,
                                              self.param3, self.param4,
                                              self.param5, self.param6,
                                              self.param7, self.param8,
                                              self.param9, self.param10],
                                   config={"double_buffer_non_reuse": True,
                                           "out_of_bound_sync_check": True})
        return self.tik_instance

# Operator definition function
def batch_norm(input_x, gamma, beta, output, kernel_name="BatchNorm"):
    obj = BatchNorm(input_x, gamma, beta, output, kernel_name)
    obj.batchnorm_compute()

Pay attention to the following points:

  • The code structure varies slightly with the tiling policy. The main difference lies in the compute logic.
  • In TIK mode, pass the tiling arguments to the BuildCCE API through the flowtable parameter at operator build time. The flowtable arguments are computed in the operator selector, which is equivalent to adding an address space to the output container for buffering the tiling arguments. Ensure that the flowtable length plus the number of input arguments does not exceed 64. The flowtable arguments are of the TIK InputScalar type and are not computed in TIK. Moving the iterative scalar computation to the operator selector, which is scheduled on the host CPU, reduces the scalar compute workload on the AI Cores, which benefits pipeline parallelism.
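
The limit of 64 on the flowtable length plus the input count can be checked on the host before the operator is built. The following is a minimal sketch only: check_flowtable_budget and MAX_KERNEL_ARGS are hypothetical names introduced here for illustration and are not part of the TIK API. The counts in the example (3 inputs, 24 flowtable scalars) are taken from the BuildCCE call shown above.

```python
# Hypothetical helper, not part of the TIK API: it checks the argument
# budget described above before BuildCCE is called.
MAX_KERNEL_ARGS = 64  # limit from the text: len(flowtable) + len(inputs) <= 64


def check_flowtable_budget(num_inputs, num_flowtable, limit=MAX_KERNEL_ARGS):
    """Return the number of unused argument slots.

    Raises ValueError if the inputs and flowtable scalars together
    exceed the limit.
    """
    total = num_inputs + num_flowtable
    if total > limit:
        raise ValueError(
            "inputs (%d) + flowtable (%d) = %d exceeds the limit of %d"
            % (num_inputs, num_flowtable, total, limit))
    return limit - total


# The BuildCCE call above passes 3 inputs and 24 flowtable scalars.
remaining = check_flowtable_budget(num_inputs=3, num_flowtable=24)
print(remaining)  # 37
```

A check of this kind could run in the operator definition function before the operator object is created, so that an oversized flowtable is reported as a clear error on the host.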