Synchronization Instruction Analysis
To prevent memory overwrites, the system automatically inserts synchronization instructions at operator build time. Redundant synchronization instructions, however, degrade operator performance. TIK supports hierarchical synchronization, so you can disable automatic insertion for a specific code snippet at build time, provided that no memory overwrite can occur within that snippet.
To disable automatic insertion of synchronization instructions at build time, wrap the relevant code block in a new scope and set disable_sync to True:
with tik_instance.new_stmt_scope(disable_sync=True):
new_stmt_scope opens a new scope. If there is no data dependency among the API calls in this scope, you can set disable_sync to True to disable synchronization instruction insertion. The system then inserts no synchronization instructions for this scope at operator build time, which avoids the associated overhead.
# Assumes the TBE package layout; older releases use `from te import tik`.
from tbe import tik

tik_instance = tik.Tik()
dtype = "float16"
shape = (2, 128)
src_gm = tik_instance.Tensor(dtype, shape, name="src_gm", scope=tik.scope_gm)
src_ub = tik_instance.Tensor(dtype, shape, name="src_ub", scope=tik.scope_ubuf)
dst_gm = tik_instance.Tensor(dtype, shape, name="dst_gm", scope=tik.scope_gm)
dst_ub = tik_instance.Tensor(dtype, shape, name="dst_ub", scope=tik.scope_ubuf)
a = tik_instance.InputScalar(dtype="int32", name="a")
b = tik_instance.InputScalar(dtype="int32", name="b")
tik_instance.data_move(src_ub, src_gm, 0, 1, 16, 0, 0)
# If disable_sync is False and the data_move offsets are Scalars, the system cannot
# determine whether the addresses of the two data_move operations overlap, so it
# automatically inserts a synchronization instruction.
# If the values of a and b are known (for example, a = 0 and b = 1), the two
# data_move operations have no data dependency. You can then set disable_sync to True
# to skip the synchronization between the two moves and improve performance.
with tik_instance.new_stmt_scope(disable_sync=True):
    tik_instance.data_move(dst_ub[a*128:], src_ub[a*128:], 0, 1, 8, 0, 0)
    tik_instance.data_move(dst_ub[b*128:], src_ub[b*128:], 0, 1, 8, 0, 0)
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 16, 0, 0)
tik_instance.BuildCCE(kernel_name="sample", inputs=[src_gm, a, b], outputs=[dst_gm])
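The comments in the example rest on a simple interval argument: the two moves are independent when the element ranges they touch do not intersect. The sketch below checks that condition in plain Python for concrete values of a and b. It is independent of TIK: the helper names (moved_range, ranges_overlap, safe_to_disable_sync) are hypothetical, the 32-byte burst unit is an assumption stated in the code, and nothing here reproduces how the compiler actually analyzes dependencies.

```python
# Illustrative only: plain Python, not part of the TIK API.
ROW_ELEMS = 128    # float16 elements per row of the (2, 128) tensors in the example
BURST_BYTES = 32   # assumption: one data_move burst unit covers 32 bytes
ELEM_BYTES = 2     # size of one float16 element


def moved_range(row, bursts=8):
    """Return the half-open element range [start, stop) touched by one move."""
    start = row * ROW_ELEMS
    length = bursts * BURST_BYTES // ELEM_BYTES
    return start, start + length


def ranges_overlap(r1, r2):
    """True if two half-open ranges share at least one element."""
    return r1[0] < r2[1] and r2[0] < r1[1]


def safe_to_disable_sync(a, b):
    """True if the moves at row offsets a and b touch disjoint element ranges."""
    return not ranges_overlap(moved_range(a), moved_range(b))
```

With a = 0 and b = 1 the ranges are [0, 128) and [128, 256), so the check passes. With a equal to b, both moves touch the same range and automatic synchronization should stay enabled.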