BuildCCE

Description

Generates the TIK description language code defined for the target machine and builds it into a binary that is executable on the Ascend AI Processor, together with the corresponding configuration file.

Prototype

BuildCCE(kernel_name, inputs, outputs, output_files_path=None, enable_l2=False, config=None, flowtable=None, evaluates=None, extend_params=None)

Parameters

Table 1 Parameter description

Parameter

Input/Output

Description

kernel_name

Input

  • A string.
  • Specifies the names of the generated binary file and CCE kernel function.
  • Format: The value consists of digits, letters, and underscores (_) and cannot start with a digit.
  • Example:

    If the string test is passed, the generated binary file is named test.o, and the generated CCE kernel function is named test__kernel0.
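    The naming rule can be checked before calling BuildCCE. The following is a minimal sketch in plain Python. The helpers is_valid_kernel_name and expected_names are hypothetical and are not part of TIK, and the sketch assumes that "letters" means ASCII letters.

```python
import re

# Hypothetical helpers (not part of TIK) that mirror the documented naming rule.
# Assumption: "letters" means ASCII letters only.
_KERNEL_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_kernel_name(name):
    """Return True if name has only letters, digits, and underscores
    and does not start with a digit."""
    return isinstance(name, str) and _KERNEL_NAME_RE.fullmatch(name) is not None


def expected_names(kernel_name):
    """Return (binary file name, CCE kernel function name), following the
    documented example for the string test."""
    return kernel_name + ".o", kernel_name + "__kernel0"


print(is_valid_kernel_name("simple_add"))
print(is_valid_kernel_name("2nd_kernel"))
print(expected_names("test"))
```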

inputs

Input

  • A list or tuple of Tensors or InputScalars. Must be in the scope_gm scope.
  • Specifies the operator inputs in the generated CCE kernel function. The list or tuple length is up to 64.

outputs

Input

  • A list or tuple of Tensors. Must be in the scope_gm scope.
  • Specifies the operator outputs in the generated CCE kernel function. The list or tuple length is up to 64.
  • When outputs=[], a length-1 array holding the single value 0, that is, [[0]], is returned.

output_files_path

Input

A string specifying the build output path.

  • Defaults to None, indicating that the outputs are generated to ./kernel_meta in the working directory.
  • If an alternative path is specified, the outputs are generated to ./kernel_meta in the specified path. The path can be either absolute or relative.
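The two path rules above can be sketched in plain Python to predict where the build outputs land. The helper resolve_kernel_meta_dir is hypothetical and only mirrors the description. It does not call TIK and does not account for the config key kernel_meta_parent_dir.

```python
import os


def resolve_kernel_meta_dir(output_files_path=None):
    """Hypothetical helper: predict the build output folder.

    None means ./kernel_meta in the working directory; any other value
    means a kernel_meta folder inside the given path.
    """
    base = "." if output_files_path is None else output_files_path
    return os.path.join(base, "kernel_meta")


print(resolve_kernel_meta_dir())
print(resolve_kernel_meta_dir("build/out"))
```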

enable_l2

Input

A bool specifying whether L2 is set as the external memory. Defaults to False.

This parameter does not take effect.

config

Input

A dictionary of key-value pairs, used to configure the operator build properties. key is of type string.

Format:

config = {"key":value}

Example:

config = {"tbe_debug_level":2,"enable_const_fold":True,"double_buffer_non_reuse":True}

The following keys are supported:

  • kernel_meta_parent_dir: parent path of the kernel_meta folder that stores debugging files during operator build. The debugging files include the operator binary files (.o), operator description files (.json), and .cce files. This parameter is equivalent to output_files_path. If both are configured, output_files_path takes priority.

    The value is a path string relative to the path where the compile command is run.

    Defaults to a dot (.), indicating that the debugging files are stored in the ./kernel_meta folder in the path where the compile command is run.

  • tbe_debug_level: debug level, selected from:
    • 0 (default): disables debug. Only the operator binary file (.o) and operator description file (.json) are generated in the kernel_meta folder.
    • 1: enables operator debug. TBE instruction mapping files, including the operator CCE file (.cce), Python-CCE mapping file (*_loc.json), and operator .o and .json files, are generated in the kernel_meta directory. Enabling operator debug might compromise the operator performance.
    • 2: enables operator debug. TBE instruction mapping files, including the operator CCE file (.cce), Python-CCE mapping file (*_loc.json), and operator .o and .json files, are generated in the kernel_meta directory. Build optimization is disabled and CCE compiler debug is enabled (by setting -O0 -g). Enabling operator debug might compromise the operator performance.
      NOTICE:

      The --op_debug_level argument in the ATC command line (if any) takes precedence over this parameter.

  • enable_const_fold: enables or disables constant folding, either True or False. Constant folding evaluates constant expressions at build time, which simplifies the generated code and can improve execution performance. Defaults to False. See the following example:

    a = 1 defines a TIK Scalar and assigns it the value 1. If the Scalar has already been defined, only the value 1 is assigned. a = ? means that the value is unknown until the code is executed. vec_add(a) indicates an instruction call to which the argument a is passed.

    Before folding:

    a = 1
    b = a + 1
    vec_add(b)
    c = ?
    b = c + 1 + a
    vec_add(b)

    After folding:

    a = 1
    b = 1 + 1 = 2
    vec_add(2)
    c = ?
    b = c + 1 + 1 = c + 2
    vec_add(b)
  • double_buffer_non_reuse: If set to True, the ping and pong variables in double_buffer are not reused. If set to False, they are reused. Defaults to False. This configuration is reserved and may change in future releases.
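
The enable_const_fold example above can be mimicked in plain Python to show what folding does to a sum. This is only an illustrative sketch: fold_sum is a hypothetical function, not the TIK implementation, and it handles sums of integer literals and named values only.

```python
def fold_sum(terms, known):
    """Fold a sum of terms at "build time".

    terms: list of int literals or variable names (str).
    known: dict mapping variable names to values known at build time.
    Returns (symbolic_names, constant): the names that stay unknown until
    run time, and the single constant that all known terms fold into.
    """
    symbolic = []
    constant = 0
    for term in terms:
        if isinstance(term, int):
            constant += term           # literal: fold directly
        elif term in known:
            constant += known[term]    # known Scalar value: fold it
        else:
            symbolic.append(term)      # unknown until run time: keep the name
    return symbolic, constant


known = {"a": 1}                       # a = 1
print(fold_sum(["a", 1], known))       # b = a + 1
print(fold_sum(["c", 1, "a"], known))  # b = c + 1 + a, with c = ?
```

The first call folds completely to the constant 2, matching vec_add(2). The second keeps c symbolic and folds the rest into 2, matching b = c + 2.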

flowtable

Input

A list or tuple of Tensors or InputScalars. Must be in the scope_gm scope.

A flowtable of tiling parameters (computed by the operator selector in the dynamic-shape scenario). The combined length of flowtable and inputs must not exceed 64.

evaluates

Input

Debug parameters, used to assign a value to a defined global Scalar variable at build time.

A dictionary of key-value pairs, with Scalar variable names as the keys. The values are of type Python int or float.

Format:

evaluates = {key : value}

Example:

a = tik_instance.Scalar(dtype="float16")
a.set_as(1)
tik_instance.BuildCCE(..., evaluates = {a : 2})

extend_params

Input

A dictionary of key-value pairs for extended parameters.

key: The data type is string.

value: The data type is key-specific.

Example:

extend_params = {"param1": value1, 
                 "param2": value2}

For details, see Table 2.

Table 2 extend_params description

Extended Parameter

Description

build_multi_kernels

Builds a kernel for each tiling policy. Kernels are named according to the index of each tiling policy. The kernels are encapsulated in one .o file. The runtime framework automatically calls the corresponding functions based on the tiling policy.

Example:

tik_instance.BuildCCE(...,
 extend_params={"build_multi_kernels":{
  "tiling_key":[Scalar1, Scalar2],
  "tiling_key_value":
       [[Scalar1_val_1,Scalar2_val_1],
       [Scalar1_val_2,Scalar2_val_2]]}})

In the preceding code:

tiling_key is the keyword of the tiling policy:

  • Must be a list or tuple of Scalar variables.
  • Supported data types: int8, int16, int32, int64, float16, float32

tiling_key_value is the value corresponding to the keyword of the tiling policy:

  • Must be a 2D list or tuple.
  • Supported data types: Python int and float. Each value must be representable by the data type of the corresponding Scalar.
  • The length of the second dimension of tiling_key_value must equal the length of tiling_key. That is, each row of tiling_key_value supplies one value for every Scalar in tiling_key.
  • The length of the first dimension of tiling_key_value determines the number of kernel functions in a .o file.
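
The shape rules for tiling_key and tiling_key_value can be checked ahead of the build. The sketch below is plain Python. count_kernels is a hypothetical helper, strings stand in for the Scalar variables, and the check on whether each value fits the Scalar data type is left out.

```python
def count_kernels(tiling_key, tiling_key_value):
    """Hypothetical check of the documented shape rules.

    Returns the number of kernel functions that would be built, which is
    the length of the first dimension of tiling_key_value.
    """
    if not isinstance(tiling_key, (list, tuple)):
        raise TypeError("tiling_key must be a list or tuple")
    if not isinstance(tiling_key_value, (list, tuple)):
        raise TypeError("tiling_key_value must be a 2D list or tuple")
    for row in tiling_key_value:
        if not isinstance(row, (list, tuple)):
            raise TypeError("tiling_key_value must be a 2D list or tuple")
        if len(row) != len(tiling_key):
            raise ValueError("each row needs one value per tiling_key entry")
        for value in row:
            if not isinstance(value, (int, float)):
                raise TypeError("values must be Python int or float")
    return len(tiling_key_value)


# Strings stand in for Scalar variables in this illustration.
print(count_kernels(["Scalar1", "Scalar2"], [[1, 2], [3, 4], [5, 6]]))
```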

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

  • inputs and outputs must not share any tensor. Otherwise, a TIK error is reported.
  • All tensors in scope_gm must be in inputs or outputs (except tensors in the workspace or tensors with init_value assigned). Otherwise, a compilation error is reported.
  • When there is no output (outputs=[]), BuildCCE uses a length-1 array holding the value 0, that is, [[0]] is returned.
  • In inputs, Tensors must be placed before InputScalars.
  • flowtable allows only one Tensor, which must be placed before InputScalars.
  • evaluates only changes the initial value of a Scalar. The dictionary length is up to 16.
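
Several of these restrictions, together with the length limits from Table 1, can be expressed as a pre-build check. The sketch below is plain Python and does not call TIK. FakeTensor, FakeInputScalar, and check_build_args are hypothetical stand-ins used only for illustration, and the sketch reads "only one Tensor" as "at most one".

```python
class FakeTensor:
    """Stand-in for a TIK Tensor in scope_gm (illustration only)."""
    def __init__(self, name):
        self.name = name


class FakeInputScalar:
    """Stand-in for a TIK InputScalar (illustration only)."""
    def __init__(self, name):
        self.name = name


def _tensors_first(args):
    """Return True if no Tensor appears after an InputScalar."""
    seen_scalar = False
    for arg in args:
        if isinstance(arg, FakeInputScalar):
            seen_scalar = True
        elif seen_scalar:
            return False
    return True


def check_build_args(inputs, outputs, flowtable=(), evaluates=None):
    """Return a list of violated restrictions (empty if none were found)."""
    problems = []
    if len(inputs) > 64 or len(outputs) > 64:
        problems.append("inputs and outputs each hold at most 64 items")
    if len(flowtable) + len(inputs) > 64:
        problems.append("flowtable and inputs together hold at most 64 items")
    output_ids = {id(t) for t in outputs}
    if any(id(t) in output_ids for t in inputs):
        problems.append("a tensor appears in both inputs and outputs")
    if not _tensors_first(inputs):
        problems.append("Tensors must precede InputScalars in inputs")
    n_flow_tensors = sum(isinstance(a, FakeTensor) for a in flowtable)
    if n_flow_tensors > 1 or not _tensors_first(flowtable):
        problems.append("flowtable allows one Tensor, placed before InputScalars")
    if evaluates is not None and len(evaluates) > 16:
        problems.append("evaluates holds at most 16 entries")
    return problems


a, b = FakeTensor("a"), FakeTensor("b")
s = FakeInputScalar("s")
print(check_build_args([a, s], [b]))
print(check_build_args([s, a], [a]))
```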

Returns

None

Example

from tbe import tik

tik_instance = tik.Tik()
# Declare the operator inputs and output in global memory (scope_gm).
data_A = tik_instance.Tensor("float16", (128,), name="data_A", scope=tik.scope_gm)
data_B = tik_instance.Tensor("float16", (128,), name="data_B", scope=tik.scope_gm)
data_C = tik_instance.Tensor("float16", (128,), name="data_C", scope=tik.scope_gm)
# Build the kernel. With output_files_path unset, the outputs go to ./kernel_meta.
tik_instance.BuildCCE(kernel_name="simple_add", inputs=[data_A, data_B], outputs=[data_C])