autotune

Function

Traverses the search space, runs each parameter combination, and reports the running time of every combination together with the optimal one.
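The traverse, measure, and select loop described above can be sketched in plain Python. This is a minimal illustration only, not the mskpp implementation: `sketch_search`, the `measure` callback, and the `fake_timings` values are hypothetical names and stand-in data introduced here so the sketch runs without a device.

```python
from typing import Callable, Dict, List, Tuple


def sketch_search(configs: List[Dict], measure: Callable[[Dict], float]) -> Tuple[Dict, float]:
    """Hypothetical helper: time every combination and return the fastest one."""
    if not configs:
        raise ValueError("configs must contain at least one combination")
    results = []
    for index, config in enumerate(configs):
        duration = measure(config)  # caller-supplied timing function (assumption)
        print(f"No.{index}: {duration:.3f} us  {config}")
        results.append((config, duration))
    # The optimal combination is the one with the smallest measured duration.
    best_config, best_duration = min(results, key=lambda item: item[1])
    print(f"Best: {best_duration:.3f} us  {best_config}")
    return best_config, best_duration


# Stand-in timings (made-up numbers) so the sketch runs without hardware.
fake_timings = {"MatmulShape<64, 64, 64>": 42.0, "MatmulShape<64, 64, 128>": 37.5}
space = [{"L1TileShape": shape} for shape in fake_timings]
best, best_us = sketch_search(space, lambda cfg: fake_timings[cfg["L1TileShape"]])
```

Unlike this sketch, autotune itself is applied as a decorator and returns None; the sketch returns the best combination only so that the selection step is visible.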

Function Prototype

def autotune(configs: List[Dict], warmup: int = 1000, repeat: int = 1, device_ids = [0]):

Parameters

| Parameter | Input/Output | Description |
|---|---|---|
| configs | Input | Search space definition. Data type: list[dict]. This parameter is required. |
| warmup | Input | Warm-up time before performance collection. A longer warm-up typically yields more stable operator performance. Unit: µs. This parameter is optional. The default value is 1000. The value is an integer ranging from 1 to 100000. |
| repeat | Input | Number of times the operator is run for measurement. The average running time over all runs is used as the operator duration. This parameter is optional. The default value is 1. The value is an integer ranging from 1 to 10000. |
| device_ids | Input | Device ID list. Currently, only single-device mode is supported. If multiple device IDs are specified, only the first one takes effect. This parameter is optional. The default value is [0]. |
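The parameter semantics above (a warm-up period in microseconds, an average over `repeat` runs, and only the first device ID taking effect) can be sketched as follows. This is an assumption-laden illustration, not mskpp code: `check_ranges`, `average_duration_us`, and `effective_device` are hypothetical helpers, host-side `time.perf_counter` stands in for device timing, and raising `ValueError` on out-of-range values is an assumption about behavior that the reference does not specify.

```python
import time
from typing import Callable, Sequence


def effective_device(device_ids: Sequence[int]) -> int:
    """Only the first listed device ID takes effect (single-device mode)."""
    if not device_ids:
        raise ValueError("device_ids must not be empty")  # assumed handling
    return device_ids[0]


def check_ranges(warmup: int, repeat: int) -> None:
    """Reject values outside the documented ranges (error type is an assumption)."""
    if not (isinstance(warmup, int) and 1 <= warmup <= 100000):
        raise ValueError("warmup must be an integer from 1 to 100000")
    if not (isinstance(repeat, int) and 1 <= repeat <= 10000):
        raise ValueError("repeat must be an integer from 1 to 10000")


def average_duration_us(fn: Callable[[], None], warmup: int, repeat: int) -> float:
    """Run fn untimed for about `warmup` microseconds, then average `repeat` timed runs."""
    check_ranges(warmup, repeat)
    deadline = time.perf_counter() + warmup / 1e6
    while time.perf_counter() < deadline:
        fn()  # warm-up runs are discarded
    total = 0.0
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / repeat * 1e6  # seconds to microseconds


avg_us = average_duration_us(lambda: sum(range(1000)), warmup=500, repeat=10)
device = effective_device([3, 5])  # only device 3 would be used
```

Averaging over several runs smooths out run-to-run jitter, which is why a larger `repeat` tends to give a steadier figure at the cost of a longer tuning session.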

Return Value

None.

Example

import mskpp

# Each dict is one parameter combination in the search space.
@mskpp.autotune(configs=[
    {'L1TileShape': 'MatmulShape<64, 64, 64>', 'L0TileShape': 'MatmulShape<128, 256, 64>'},
    {'L1TileShape': 'MatmulShape<64, 64, 128>', 'L0TileShape': 'MatmulShape<128, 256, 64>'},
    {'L1TileShape': 'MatmulShape<64, 128, 128>', 'L0TileShape': 'MatmulShape<128, 256, 64>'},
    {'L1TileShape': 'MatmulShape<64, 128, 128>', 'L0TileShape': 'MatmulShape<64, 256, 64>'},
    {'L1TileShape': 'MatmulShape<128, 128, 128>', 'L0TileShape': 'MatmulShape<128, 256, 64>'},
], warmup=500, repeat=10, device_ids=[0])
def basic_matmul(problem_shape, a, layout_a, b, layout_b, c, layout_c):
    # get_kernel() is assumed to be defined elsewhere and to return the launchable kernel.
    kernel = get_kernel()
    blockdim = 20
    return kernel[blockdim](problem_shape, a, layout_a, b, layout_b, c, layout_c)
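The example lists each combination by hand. When the search space is the Cartesian product of a few candidate values, the `configs` list could instead be generated, as in the sketch below. The candidate lists `l1_tiles` and `l0_tiles` are illustrative choices taken from the example; whether every generated combination is valid for a given kernel is not something this sketch checks.

```python
from itertools import product

# Candidate values for each tunable parameter (illustrative subset of the example).
l1_tiles = ["MatmulShape<64, 64, 64>", "MatmulShape<64, 128, 128>"]
l0_tiles = ["MatmulShape<128, 256, 64>", "MatmulShape<64, 256, 64>"]

# One dict per combination, in the list[dict] form that configs expects.
configs = [
    {"L1TileShape": l1, "L0TileShape": l0}
    for l1, l0 in product(l1_tiles, l0_tiles)
]
```

The resulting list can then be passed as `configs=configs` in place of the literal list. Because the number of combinations grows multiplicatively, it may be worth pruning combinations known to be invalid before tuning.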