vec_expm1_high_preci

Description

Computes e^x - 1 element-wise, where e is the natural base: $\text{dst}_i = e^{\text{src}_i} - 1$. This API has higher precision than vec_exp.

Prototype

vec_expm1_high_preci(mask, dst, src, work_tensor, repeat_times, dst_rep_stride, src_rep_stride)

Parameters

For details, see Parameters. The following describes only the dst, src, and work_tensor parameters.

dst, src, and work_tensor are Tensors of type float16.

If the source operand tensor has an offset, pass it in one of the following formats:
tensor[offset1:offset2]: starts at offset1 and ends at offset2.
tensor[offset1:]: starts at offset1.
tensor[offset] is not allowed. It passes only a single element, which cannot be sliced, so a runtime error is reported.
If the source operand tensor does not have an offset, pass the tensor directly.

work_tensor:

work_tensor is a user-defined temporary buffer space for storing the intermediate result. The space is limited to scope_ubuf and is used for internal computation only.

work_tensor space calculation:

Calculate the minimum buffer space, in float16 elements, required for src computation based on repeat_times and src_rep_stride: src_extent_size = (repeat_times - 1) * src_rep_stride * 16 + 128. If 0 < src_rep_stride <= 8, use 8 as src_rep_stride in this formula; otherwise, use its original value.

Round up the minimum space required for src computation to a multiple of 32 bytes (16 float16 elements): wk_size_unit = (src_extent_size + 15)//16 * 16
Calculate the size of work_tensor, in float16 elements: work_tensor = 11 * wk_size_unit

Example of work_tensor space calculation:

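If repeat_times = 1 and src_rep_stride = 8, then src_extent_size = (1 - 1) * 8 * 16 + 128 = 128, wk_size_unit = 128, and work_tensor = 128 * 11 = 1408.
If repeat_times = 2 and src_rep_stride = 4, then src_rep_stride is treated as 8, so src_extent_size = (2 - 1) * 8 * 16 + 128 = 256, wk_size_unit = 256, and work_tensor = 256 * 11 = 2816.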
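
The calculation can be sketched as a small host-side helper. The function name work_tensor_size is hypothetical and is not part of the TIK API; it only restates the three steps above in plain Python so that the buffer size can be checked before the tensor is declared.

```python
def work_tensor_size(repeat_times, src_rep_stride):
    """Return the minimum work_tensor size in float16 elements.

    Hypothetical helper (not a TIK API) that follows the documented steps.
    """
    # A stride in the range (0, 8] is treated as 8; other values are kept.
    stride = 8 if 0 < src_rep_stride <= 8 else src_rep_stride
    # Minimum space needed for the src computation.
    src_extent_size = (repeat_times - 1) * stride * 16 + 128
    # Round up to a multiple of 16 float16 elements (32 bytes).
    wk_size_unit = (src_extent_size + 15) // 16 * 16
    # work_tensor needs 11 such units.
    return 11 * wk_size_unit


print(work_tensor_size(1, 8))  # 1408, that is 128 * 11
print(work_tensor_size(2, 4))  # 2816, that is 256 * 11
```

The returned value can be used as the shape of a float16 tensor declared in scope_ubuf.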

Restrictions

dst, src, and work_tensor must be declared in scope_ubuf.
The space of the dst, src, and work_tensor tensors must not overlap.
The final compute result must be within the representable range of float16. Otherwise, inf or a saturated result is yielded.
The compute result of e^x – 1 using this API has higher accuracy than using the vec_exp API.
For other restrictions, see Restrictions.
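
Why a dedicated e^x - 1 computation can be more accurate than an exponential followed by a subtraction is illustrated below. This is a host-side sketch in double precision that uses only the Python standard library. It does not run on the device, and it makes no claim about how vec_expm1_high_preci or vec_exp is implemented internally. For small x, e^x is very close to 1, so subtracting 1 afterwards cancels most of the significant digits.

```python
import math

x = 1e-10

# Naive approach: compute e^x first, then subtract 1 (loses digits to cancellation).
naive = math.exp(x) - 1.0
# Dedicated approach: compute e^x - 1 directly.
accurate = math.expm1(x)

# Second-order Taylor expansion, accurate enough as a reference for such a small x.
reference = x + x * x / 2.0

naive_error = abs(naive - reference) / reference
accurate_error = abs(accurate - reference) / reference

print("relative error of exp(x) - 1:", naive_error)
print("relative error of expm1(x):  ", accurate_error)
```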

Returns

None

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Example

This example applies to a small amount of data that can be moved at a time, helping you understand the API functions. For more complex samples with a large amount of data, see Example.

from tbe import tik
tik_instance = tik.Tik()
# Source and destination tensors of 128 float16 elements each, in Global Memory and in the Unified Buffer.
src_gm = tik_instance.Tensor("float16", (128,), name="src_gm", scope=tik.scope_gm)
src_ub = tik_instance.Tensor("float16", (128,), name="src_ub", scope=tik.scope_ubuf)
dst_gm = tik_instance.Tensor("float16", (128,), name="dst_gm", scope=tik.scope_gm)
dst_ub = tik_instance.Tensor("float16", (128,), name="dst_ub", scope=tik.scope_ubuf)
# The required work_tensor space is ((1 - 1) * 8 * 16 + 128) * 11 = 128 * 11 float16 elements.
work_tensor_ub = tik_instance.Tensor("float16", (128*11,), name="work_tensor_ub", scope=tik.scope_ubuf)
# Move the 128 input elements (one burst of eight 32-byte blocks) from Global Memory to the Unified Buffer.
tik_instance.data_move(src_ub, src_gm, 0, 1, 8, 0, 0)
# mask = 128, repeat_times = 1, dst_rep_stride = 8, src_rep_stride = 8.
tik_instance.vec_expm1_high_preci(128, dst_ub, src_ub, work_tensor_ub, 1, 8, 8)
# Move the result from the Unified Buffer back to Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 8, 0, 0)
tik_instance.BuildCCE(kernel_name="expm1", inputs=[src_gm], outputs=[dst_gm])

Result example:

Input:
[0, 1, 2, 3, ......]

Output:
[0.0, 1.719, 6.391, 19.08, ......]
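
The output above can be cross-checked on the host with a reference computation. The following is a minimal sketch that uses only the Python standard library. The helper names to_float16 and expm1_reference are hypothetical, and rounding through the struct "e" format is assumed to be a reasonable stand-in for float16 storage. The device result may differ from this reference in the last digit.

```python
import math
import struct


def to_float16(value):
    """Round a Python float to the nearest float16 value (hypothetical helper)."""
    return struct.unpack("e", struct.pack("e", float(value)))[0]


def expm1_reference(values):
    """Compute e^x - 1 for each input and round the result to float16."""
    return [to_float16(math.expm1(to_float16(v))) for v in values]


reference = expm1_reference([0, 1, 2, 3])
print(reference)
```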
