vec_rsqrt

Description

Computes, element-wise, the reciprocal of the square root of src and writes the result to dst:

$$dst_i = \frac{1}{\sqrt{src_i}}$$
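As a host-side sanity check, the element-wise semantics can be sketched in plain Python. This is a minimal reference under the assumption that every input is positive; it is not part of the TIK API, and the function name rsqrt_reference is hypothetical. Because the device result for non-positive input is not defined, the sketch raises an error instead of guessing.

```python
import math

def rsqrt_reference(src):
    """Hypothetical host-side reference: dst[i] = 1 / sqrt(src[i]).

    Assumes all elements are positive; raises for anything else because
    the device behavior for such input is not specified.
    """
    dst = []
    for x in src:
        if x <= 0:
            raise ValueError(f"non-positive input {x} has no defined result")
        dst.append(1.0 / math.sqrt(x))
    return dst

# Same input as the example in this section: 1, 2, ..., 128.
expected = rsqrt_reference(range(1, 129))
print([round(v, 5) for v in expected[:4]], round(expected[-1], 5))
# prints [1.0, 0.70711, 0.57735, 0.5] 0.08839
```

These exact values are handy when comparing against the approximate float16 output shown in the Result example.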

Prototype

vec_rsqrt(mask, dst, src, repeat_times, dst_rep_stride, src_rep_stride)
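To build intuition for how mask, repeat_times and the repeat strides select elements, the following is a rough Python model. It is a sketch under stated assumptions, not the hardware definition: a block is 32 bytes, one repeat spans 8 contiguous blocks, mask is an integer in continuous mode (it selects the first mask elements of each repeat), and a repeat stride is the distance between repeat start addresses in blocks. The function name selected_indices is hypothetical.

```python
def selected_indices(mask, repeat_times, rep_stride, dtype_bytes):
    """Hypothetical model of the element offsets one operand touches.

    Assumptions (not taken from the TIK API itself):
      * a block is 32 bytes and a repeat spans 8 contiguous blocks;
      * mask is an int in continuous mode: first `mask` elements per repeat;
      * rep_stride is the distance between repeat starts, in blocks.
    """
    elems_per_block = 32 // dtype_bytes
    elems_per_repeat = 8 * elems_per_block
    if not 1 <= mask <= elems_per_repeat:
        raise ValueError(f"mask must be in [1, {elems_per_repeat}]")
    offsets = []
    for r in range(repeat_times):
        start = r * rep_stride * elems_per_block
        offsets.extend(range(start, start + mask))
    return offsets

# The example in this section: float16 (2 bytes), mask=128, 1 repeat, stride 8.
idx = selected_indices(128, 1, 8, 2)
print(len(idx), idx[0], idx[-1])  # prints 128 0 127

# Two float32 repeats with stride 8 cover 128 contiguous elements.
idx32 = selected_indices(64, 2, 8, 4)
print(len(idx32), idx32[63], idx32[64])  # prints 128 63 64
```

Under this model, a repeat stride larger than 8 leaves gaps between repeats, while a stride of 8 processes data back to back.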

Parameters

For details, see Parameters.

dst and src must have the same data type.

Atlas 200/300/500 Inference Product: tensors of type float16 or float32

Atlas Training Series Product: tensors of type float16 or float32

Returns

None

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

  • If any element of src is not positive (that is, less than or equal to 0), the result for that element is undefined.
  • For the Atlas 200/300/500 Inference Product and the Atlas Training Series Product, the result of this API does not meet the dual-0.1% error limit (both the error ratio and the relative error within 0.1%) for float16 input, and does not meet the dual-0.01% error limit for float32 input. If high accuracy is required, use the vec_rsqrt_high_preci API instead.
  • For other restrictions, see Restrictions.
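The dual error limit can be checked on the host once device output is available. The sketch below assumes one plausible reading of "dual": each element's relative error |a - e| / |e| is compared with the tolerance, and the fraction of elements exceeding it (the error ratio) must itself stay within the tolerance. This definition and the function name dual_limit_check are assumptions, not taken from the product documentation. The five sample points are the ones listed in the float16 Result example in this section; five points are far too few to judge an error ratio, so this only illustrates the arithmetic.

```python
import math

def dual_limit_check(actual, expected, tol):
    """Hypothetical dual-limit check under an assumed definition:
    per-element relative error is compared with `tol`, and the fraction of
    elements exceeding `tol` (the error ratio) must also be <= `tol`.
    Returns (passes, worst relative error, error ratio).
    """
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    rel_errors = [abs(a - e) / abs(e) for a, e in zip(actual, expected)]
    failing = sum(1 for r in rel_errors if r > tol)
    error_ratio = failing / len(rel_errors)
    return error_ratio <= tol, max(rel_errors), error_ratio

# Points from the float16 Result example (inputs 1, 2, 3, 4 and 128).
inputs = [1, 2, 3, 4, 128]
device_out = [0.998, 0.705, 0.576, 0.499, 0.08813]
reference = [1.0 / math.sqrt(x) for x in inputs]

ok, worst, ratio = dual_limit_check(device_out, reference, 0.001)
print(ok, round(worst, 4), ratio)  # prints False 0.003 1.0
```

Every listed point deviates by roughly 0.2% to 0.3%, which is consistent with the restriction that float16 results do not meet the dual-0.1% limit.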

Example

This example uses a small amount of data that can be moved in a single transfer, to illustrate what the API does. For more complex examples with larger data volumes, see Example.

from tbe import tik
tik_instance = tik.Tik()
# Define the tensors.
src_gm = tik_instance.Tensor("float16", (128,), name="src_gm", scope=tik.scope_gm)
src_ub = tik_instance.Tensor("float16", (128,), name="src_ub", scope=tik.scope_ubuf)
dst_ub = tik_instance.Tensor("float16", (128,), name="dst_ub", scope=tik.scope_ubuf)
dst_gm = tik_instance.Tensor("float16", (128,), name="dst_gm", scope=tik.scope_gm)
# Move the user data from the Global Memory to the Unified Buffer.
tik_instance.data_move(src_ub, src_gm, 0, 1, 8, 0, 0)
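# Compute dst_ub = 1/sqrt(src_ub): mask=128 covers all 128 float16 elements in one repeat;
# repeat_times=1; dst and src repeat strides are 8 blocks.
tik_instance.vec_rsqrt(128, dst_ub, src_ub, 1, 8, 8)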
# Move the compute result from the Unified Buffer to the destination Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 8, 0, 0)
tik_instance.BuildCCE("test_vec_rsqrt", [src_gm], [dst_gm])
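The numeric arguments in the example follow from the data size. The helper below is a hypothetical sketch (it does not call TIK) showing how they might be derived, assuming a 32-byte burst and block unit, 8 blocks per vector repeat, and an element count that fills whole repeats. For 128 float16 elements it reproduces the values used above: burst length 8 for data_move, and mask 128, repeat_times 1 and repeat stride 8 for vec_rsqrt.

```python
BLOCK_BYTES = 32        # assumed unit of data_move bursts and vector blocks
BLOCKS_PER_REPEAT = 8   # assumed: one vector repeat spans 8 blocks (256 bytes)

def example_params(num_elems, dtype_bytes):
    """Hypothetical helper deriving the example's argument values.

    Returns the data_move burst length (in 32-byte blocks) for one
    contiguous transfer, plus a full-mask vec_rsqrt configuration.
    Assumes 32-byte-aligned data that fills whole repeats.
    """
    total_bytes = num_elems * dtype_bytes
    if total_bytes % BLOCK_BYTES:
        raise ValueError("sketch assumes 32-byte-aligned data")
    elems_per_repeat = BLOCKS_PER_REPEAT * BLOCK_BYTES // dtype_bytes
    if num_elems % elems_per_repeat:
        raise ValueError("sketch assumes the data fills whole repeats")
    return {
        "burst": total_bytes // BLOCK_BYTES,
        "mask": elems_per_repeat,
        "repeat_times": num_elems // elems_per_repeat,
        "rep_stride": BLOCKS_PER_REPEAT,
    }

print(example_params(128, 2))  # float16, as in the example
# prints {'burst': 8, 'mask': 128, 'repeat_times': 1, 'rep_stride': 8}
print(example_params(128, 4))  # float32 variant
# prints {'burst': 16, 'mask': 64, 'repeat_times': 2, 'rep_stride': 8}
```

Switching the tensors to float32 under these assumptions would therefore need a burst length of 16 and two repeats with mask 64.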

Result example:

Input:
[1, 2, 3, 4, ......, 128]

Output:
[0.998, 0.705, 0.576, 0.499, ......, 0.08813]