new_stmt_scope

Description

Opens a new statement scope, equivalent to a pair of curly braces in C. The disable_sync parameter specifies whether automatic insertion of synchronization instructions is disabled in the current scope.

When API calls have a data dependency, for example when one call must finish before the next one starts, a synchronization instruction must be inserted between them. When disable_sync is set to False, the system automatically inserts synchronization instructions as needed. If you are sure that the API calls in the scope have no data dependency, you can set disable_sync to True to disable automatic insertion of synchronization instructions, which improves operator performance.
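The rule above can be pictured with a small pure-Python model. This is only a sketch under stated assumptions: `Op`, `depends`, and `schedule` are hypothetical names, all buffers are treated as one flat address space, and the real TIK dependency analysis is not documented here and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for one API call (not a TIK class)."""
    name: str
    reads: tuple   # half-open range (start, end) that the call reads
    writes: tuple  # half-open range (start, end) that the call writes


def overlaps(r1, r2):
    """Return True if two half-open ranges share at least one element."""
    return r1[0] < r2[1] and r2[0] < r1[1]


def depends(prev, nxt):
    """True if nxt must wait for prev: read-after-write, write-after-read,
    or write-after-write on overlapping ranges."""
    return (overlaps(prev.writes, nxt.reads)
            or overlaps(prev.reads, nxt.writes)
            or overlaps(prev.writes, nxt.writes))


def schedule(ops, disable_sync=False):
    """Return the instruction stream, with a "sync" marker placed between
    neighbouring calls that depend on each other (unless disabled)."""
    stream = []
    for i, op in enumerate(ops):
        if i > 0 and not disable_sync and depends(ops[i - 1], op):
            stream.append("sync")
        stream.append(op.name)
    return stream


# "use" reads what "load" wrote, so the two calls are dependent.
load = Op("load", reads=(0, 128), writes=(256, 384))
use = Op("use", reads=(256, 384), writes=(512, 640))
# "other" touches unrelated ranges, so it is independent of "load".
other = Op("other", reads=(1000, 1100), writes=(2000, 2100))

print(schedule([load, use]))
print(schedule([load, other]))
print(schedule([load, use], disable_sync=True))  # unsafe for this pair
```

In this model, disabling synchronization only changes the result for dependent calls, which is exactly the case in which it must not be used.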

Prototype

new_stmt_scope(disable_sync=False)

Parameters

Table 1 Parameter description

Parameter     Input/Output  Description

disable_sync  Input         Specifies whether to disable automatic insertion of synchronization instructions in the current scope.
                            True: automatic insertion of synchronization instructions is disabled. In this case, you must guarantee that there is no data dependency between the API calls.
                            False: the system automatically determines whether synchronization instructions are needed and inserts them when there is a data dependency between API calls.
                            Defaults to False.

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

  • A tensor defined in new_stmt_scope cannot be accessed outside that scope.
  • If automatic insertion of synchronization instructions is disabled, high-level API calls are not supported within this scope. A high-level API is a combination of multiple low-level APIs with mutual dependencies, so calling it with automatic synchronization disabled produces unexpected results.

    The high-level APIs include: vec_expm1_high_preci, vec_ln_high_preci, vec_rec_high_preci, vec_rsqrt_high_preci, conv2d, fixpipe, matmul, vec_reduce_max, vec_reduce_min, vec_reduce_add, printf

  • If disable_sync is set to True for new_stmt_scope, new_stmt_scope calls cannot be nested inside that scope.
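The three restrictions above can be illustrated with a toy model written in plain Python. It is a sketch only: `ScopeModel` and its methods are hypothetical and do not reflect how TIK implements or reports these rules. The API names are the ones listed above.

```python
from contextlib import contextmanager

# High-level APIs named in the restrictions above.
HIGH_LEVEL_APIS = frozenset({
    "vec_expm1_high_preci", "vec_ln_high_preci", "vec_rec_high_preci",
    "vec_rsqrt_high_preci", "conv2d", "fixpipe", "matmul",
    "vec_reduce_max", "vec_reduce_min", "vec_reduce_add", "printf",
})


class ScopeModel:
    """Hypothetical model of the scope rules; not the TIK implementation."""

    def __init__(self):
        self._frames = []  # one dict per currently open scope
        self.live = set()  # names of tensors that are still accessible

    def _sync_disabled(self):
        return any(frame["disable_sync"] for frame in self._frames)

    @contextmanager
    def new_stmt_scope(self, disable_sync=False):
        # Restriction 3: no nesting inside a disable_sync=True scope.
        if self._sync_disabled():
            raise RuntimeError("nested scope inside a disable_sync=True scope")
        frame = {"disable_sync": disable_sync, "tensors": set()}
        self._frames.append(frame)
        try:
            yield self
        finally:
            # Restriction 1: tensors defined in the scope die with it.
            self._frames.pop()
            self.live -= frame["tensors"]

    def tensor(self, name):
        """Register a tensor in the innermost open scope, if any."""
        self.live.add(name)
        if self._frames:
            self._frames[-1]["tensors"].add(name)
        return name

    def call(self, api_name):
        """Restriction 2: reject high-level APIs when sync is disabled."""
        if api_name in HIGH_LEVEL_APIS and self._sync_disabled():
            raise RuntimeError(api_name + " needs automatic synchronization")
        return api_name


model = ScopeModel()
with model.new_stmt_scope():
    model.tensor("tmp_ub")
    print("tmp_ub" in model.live)   # accessible inside the scope
print("tmp_ub" in model.live)       # no longer accessible outside
```

The model raises an error for violations to make them visible. How TIK itself reacts is not specified beyond "not supported" and "not allowed".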

Returns

A TikWithScope object.

Example

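# Import path assumed for recent toolkit versions; older releases may use "from te import tik".
from tbe import tik

tik_instance = tik.Tik()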
dtype = "float16"
shape = (2, 128)
src_gm = tik_instance.Tensor(dtype, shape, name="src_gm", scope=tik.scope_gm)
src_ub = tik_instance.Tensor(dtype, shape, name="src_ub", scope=tik.scope_ubuf)
dst_gm = tik_instance.Tensor(dtype, shape, name="dst_gm", scope=tik.scope_gm)
dst_ub = tik_instance.Tensor(dtype, shape, name="dst_ub", scope=tik.scope_ubuf)
a = tik_instance.InputScalar(dtype="int32", name="a")
b = tik_instance.InputScalar(dtype="int32", name="b")
tik_instance.data_move(src_ub, src_gm, 0, 1, 16, 0, 0)
# With disable_sync=False, the offsets of the data_move operations are Scalars, so the system cannot determine whether the addresses of the two operations overlap. It therefore inserts a synchronization instruction automatically.
# If the values of a and b are known (for example, a = 0 and b = 1), there is no data dependency between the two data_move operations. In that case, set disable_sync=True to skip the synchronization between the two move instructions and improve performance.
with tik_instance.new_stmt_scope(disable_sync=True):
    tik_instance.data_move(dst_ub[a*128:], src_ub[a*128:], 0, 1, 8, 0, 0)
    tik_instance.data_move(dst_ub[b*128:], src_ub[b*128:], 0, 1, 8, 0, 0)
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 16, 0, 0)
tik_instance.BuildCCE(kernel_name="sample", inputs=[src_gm, a, b], outputs=[dst_gm])
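Whether the two data_move calls in the scoped block are independent for given values of a and b can be checked by hand. The sketch below does that arithmetic in plain Python. It assumes that a*128 and b*128 are flat element offsets, that each burst unit is 32 bytes, and that float16 takes 2 bytes. The helper names are hypothetical.

```python
BLOCK_BYTES = 32  # assumed size of one data_move burst unit
FP16_BYTES = 2    # size of one float16 element


def moved_range(offset_elems, burst_blocks, elem_bytes=FP16_BYTES):
    """Half-open element range covered by one move starting at offset_elems."""
    count = burst_blocks * BLOCK_BYTES // elem_bytes
    return (offset_elems, offset_elems + count)


def ranges_overlap(r1, r2):
    """Return True if two half-open ranges share at least one element."""
    return r1[0] < r2[1] and r2[0] < r1[1]


def safe_to_disable_sync(a, b, row_elems=128, burst_blocks=8):
    """True if the moves at offsets a*row_elems and b*row_elems are disjoint."""
    first = moved_range(a * row_elems, burst_blocks)
    second = moved_range(b * row_elems, burst_blocks)
    return not ranges_overlap(first, second)


print(moved_range(0, 8))           # elements covered when a = 0
print(safe_to_disable_sync(0, 1))  # disjoint rows
print(safe_to_disable_sync(1, 1))  # same row touched twice
```

With a = 0 and b = 1, each move covers 8 × 32 / 2 = 128 elements, so the ranges are [0, 128) and [128, 256) and do not overlap. If a and b can be equal at run time, the moves touch the same range and disable_sync should stay False.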