new_stmt_scope
Description
Creates a new scope (equivalent to a pair of curly braces in C). The disable_sync parameter specifies whether automatic insertion of synchronization instructions is disabled in the current scope.
When there is a data dependency between API calls (for example, one API call must finish before the next one starts), a synchronization instruction must be inserted between them. When disable_sync is set to False, the system automatically inserts synchronization instructions as needed. If you are sure that there is no data dependency between the API calls in the scope, you can set disable_sync to True to disable auto insertion of synchronization instructions, which improves operator performance.
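The decision described above can be pictured with a toy model. The sketch below is plain Python, not TIK code, and it does not claim to reflect how the TIK compiler is implemented; `Access`, `overlaps`, and `needs_sync` are hypothetical names introduced only for illustration. It assumes every call reads one address range and writes one address range in a single flat address space, and it treats a range that is unknown at compile time as a possible overlap.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[int, int]  # half-open interval [start, end)


@dataclass
class Access:
    """Hypothetical record of the memory one API call touches."""
    reads: Optional[Range]   # None means the range is unknown at compile time
    writes: Optional[Range]  # None means the range is unknown at compile time


def overlaps(x: Optional[Range], y: Optional[Range]) -> bool:
    """Return True if the ranges overlap, or if either range is unknown."""
    if x is None or y is None:
        return True  # conservative choice: an unknown range might overlap
    return x[0] < y[1] and y[0] < x[1]


def needs_sync(first: Access, second: Access) -> bool:
    """Return True if `second` may depend on `first` and must wait for it."""
    return (
        overlaps(first.writes, second.reads)      # read after write
        or overlaps(first.writes, second.writes)  # write after write
        or overlaps(first.reads, second.writes)   # write after read
    )
```

In this model, setting disable_sync to True corresponds to asserting that `needs_sync` would return False for every pair of calls in the scope, including pairs whose ranges the system cannot resolve on its own.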
Prototype
new_stmt_scope(disable_sync=False)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| disable_sync | Input | Specifies whether to disable auto insertion of synchronization instructions in the current scope. If set to True, auto insertion is disabled, and you must guarantee that there is no data dependency between the API calls in the scope. If set to False, the system automatically inserts a synchronization instruction wherever there is a data dependency between API calls. Defaults to False. |
Applicability
Restrictions
- A tensor defined in new_stmt_scope cannot be accessed outside that scope.
- If auto insertion of synchronization instructions is disabled, high-level API calls are not supported within this scope. A high-level API is a combination of multiple low-level APIs that depend on one another, so calling one with auto insertion of synchronization instructions disabled leads to unexpected results.
The high-level APIs include: vec_expm1_high_preci, vec_ln_high_preci, vec_rec_high_preci, vec_rsqrt_high_preci, conv2d, fixpipe, matmul, vec_reduce_max, vec_reduce_min, vec_reduce_add, printf
- If disable_sync is set to True for new_stmt_scope, a nested call to new_stmt_scope in the scope is not allowed.
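The last two restrictions can be modeled with a small context manager. This is a sketch in plain Python under stated assumptions, not the TIK implementation: `ToyInstance` and its `call` method are hypothetical, and the use of `RuntimeError` is an assumption, since the error that real TIK reports is not documented here. The tensor visibility restriction is not modeled.

```python
from contextlib import contextmanager

# High-level APIs named in the restriction list above.
HIGH_LEVEL_APIS = {
    "vec_expm1_high_preci", "vec_ln_high_preci", "vec_rec_high_preci",
    "vec_rsqrt_high_preci", "conv2d", "fixpipe", "matmul",
    "vec_reduce_max", "vec_reduce_min", "vec_reduce_add", "printf",
}


class ToyInstance:
    """Hypothetical stand-in for a TIK instance; models only the scope rules."""

    def __init__(self):
        self._sync_disabled = False

    @contextmanager
    def new_stmt_scope(self, disable_sync=False):
        # Rule: no nested scope inside a scope opened with disable_sync=True.
        if self._sync_disabled:
            raise RuntimeError("nested new_stmt_scope inside a disable_sync=True scope")
        previous = self._sync_disabled
        self._sync_disabled = disable_sync
        try:
            yield self
        finally:
            # Restore the outer setting when the scope closes.
            self._sync_disabled = previous

    def call(self, api_name):
        # Rule: high-level APIs are rejected while auto sync is disabled.
        if self._sync_disabled and api_name in HIGH_LEVEL_APIS:
            raise RuntimeError(api_name + " is a high-level API; auto sync is disabled")
        return api_name
```

Note that the model still allows a disable_sync=True scope to be opened inside a default scope; only nesting inside the disabled scope is rejected, which is how the restriction is worded.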
Returns
A TikWithScope object.
Example
from tbe import tik  # assumed import path; some older releases use "from te import tik"
tik_instance = tik.Tik()
dtype = "float16"
shape = (2, 128)
src_gm = tik_instance.Tensor(dtype, shape, name="src_gm", scope=tik.scope_gm)
src_ub = tik_instance.Tensor(dtype, shape, name="src_ub", scope=tik.scope_ubuf)
dst_gm = tik_instance.Tensor(dtype, shape, name="dst_gm", scope=tik.scope_gm)
dst_ub = tik_instance.Tensor(dtype, shape, name="dst_ub", scope=tik.scope_ubuf)
a = tik_instance.InputScalar(dtype="int32", name="a")
b = tik_instance.InputScalar(dtype="int32", name="b")
tik_instance.data_move(src_ub, src_gm, 0, 1, 16, 0, 0)
# If disable_sync is set to False and the data_move offsets are Scalars, the
# system cannot determine whether the addresses of the two data_move operations
# overlap, so it automatically inserts a synchronization instruction.
# If the values of a and b are known (for example, a = 0 and b = 1), there is no
# data dependency between the two data_move operations. In this case, you can set
# disable_sync to True to skip the synchronization between the two movement
# instructions and improve performance.
with tik_instance.new_stmt_scope(disable_sync=True):
    tik_instance.data_move(dst_ub[a*128:], src_ub[a*128:], 0, 1, 8, 0, 0)
    tik_instance.data_move(dst_ub[b*128:], src_ub[b*128:], 0, 1, 8, 0, 0)
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 16, 0, 0)
tik_instance.BuildCCE(kernel_name="sample", inputs=[src_gm, a, b], outputs=[dst_gm])
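The claim that a = 0 and b = 1 produce no data dependency can be checked with plain arithmetic. The sketch below is ordinary Python, not TIK code. It assumes the data_move burst length is counted in 32-byte blocks and that a float16 element occupies 2 bytes; `moved_range` and `ranges_overlap` are hypothetical helpers introduced for illustration.

```python
BLOCK_BYTES = 32  # assumed unit of the data_move burst length
FP16_BYTES = 2    # size of one float16 element


def moved_range(index, burst=8, row_elems=128):
    """Half-open element range [start, end) touched by one in-scope data_move."""
    start = index * row_elems
    return start, start + burst * BLOCK_BYTES // FP16_BYTES


def ranges_overlap(x, y):
    """Return True if two half-open ranges share at least one element."""
    return x[0] < y[1] and y[0] < x[1]


range_a = moved_range(0)  # elements touched when a = 0
range_b = moved_range(1)  # elements touched when b = 1
independent = not ranges_overlap(range_a, range_b)
```

Under these assumptions each in-scope move covers exactly one 128-element row, so the two moves touch disjoint rows when a and b differ. If a and b were equal, both moves would target the same row and disabling synchronization would no longer be safe.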