vec_relu

Description

Performs ReLU element-wise: dst[i] = max(0, src[i])

ReLU stands for rectified linear unit and is one of the most widely used activation functions in artificial neural networks.
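As a host-side reference for the operation described above, the following minimal NumPy sketch computes the same element-wise result. It does not use TIK and does not run on the device; the helper name relu_reference is hypothetical and only illustrates the math.

```python
import numpy as np

def relu_reference(src):
    """Element-wise max(0, x) that keeps the input dtype (hypothetical helper, not a TIK API)."""
    src = np.asarray(src)
    zero = np.zeros((), dtype=src.dtype)  # scalar zero in the same dtype as src
    return np.maximum(src, zero)

# Negative inputs map to 0; non-negative inputs pass through unchanged.
sample = np.array([-2.0, -0.5, 0.0, 0.5, 2.0], dtype=np.float16)
result = relu_reference(sample)
```

A reference like this can be useful for comparing against the contents of dst after the operator runs.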

Prototype

vec_relu(mask, dst, src, repeat_times, dst_rep_stride, src_rep_stride)

Parameters

For details, see Parameters.

dst has the same data type as src.

Atlas 200/300/500 Inference Product: Tensors of type float16

Atlas Training Series Product: Tensors of type float16

Returns

None

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

For details, see Restrictions.

Example

This example processes a small amount of data that can be moved in a single transfer, to help you understand how the API works. For more complex samples that handle larger amounts of data, see Example.

from tbe import tik
tik_instance = tik.Tik()
src_gm = tik_instance.Tensor("float16", (128,), name="src_gm", scope=tik.scope_gm)
dst_gm = tik_instance.Tensor("float16", (128,), name="dst_gm", scope=tik.scope_gm)
src_ub = tik_instance.Tensor("float16", (128,), name="src_ub", scope=tik.scope_ubuf)
dst_ub = tik_instance.Tensor("float16", (128,), name="dst_ub", scope=tik.scope_ubuf)
# Move the user input from the Global Memory to the Unified Buffer.
tik_instance.data_move(src_ub, src_gm, 0, 1, 8, 0, 0)
# Apply ReLU to all 128 float16 elements in a single repeat (mask=128, repeat_times=1).
tik_instance.vec_relu(128, dst_ub, src_ub, 1, 8, 8)
# Move the compute result from the Unified Buffer to the destination Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 8, 0, 0)

tik_instance.BuildCCE(kernel_name="vec_relu", inputs=[src_gm], outputs=[dst_gm])
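To make the arguments of the vec_relu call above more concrete, here is a rough host-side NumPy model of how mask, repeat_times, and the two repeat strides might select elements. It is a sketch under stated assumptions, not the device behavior: it assumes that strides are counted in 32-byte blocks, that an integer mask selects the first mask elements of each repeat, and that src and dst do not overlap. The function emulate_vec_relu is hypothetical and not part of TIK; consult the Parameters and Restrictions sections for the authoritative rules.

```python
import numpy as np

BLOCK_BYTES = 32  # assumption: one block is 32 bytes

def emulate_vec_relu(mask, dst, src, repeat_times, dst_rep_stride, src_rep_stride):
    """Hypothetical NumPy model of the call; mask is treated as a plain element count."""
    elems_per_block = BLOCK_BYTES // src.dtype.itemsize  # 16 for float16
    for rep in range(repeat_times):
        # Assumed: each repeat starts rep * stride blocks after the first one.
        s = rep * src_rep_stride * elems_per_block
        d = rep * dst_rep_stride * elems_per_block
        dst[d:d + mask] = np.maximum(src[s:s + mask], 0)
    return dst

# Same arguments as the example: one repeat covering 128 float16 elements.
src = np.linspace(-4.0, 4.0, 128).astype(np.float16)
dst = np.zeros(128, dtype=np.float16)
emulate_vec_relu(128, dst, src, 1, 8, 8)

# Two repeats with a stride of 8 blocks process 256 contiguous elements.
src2 = np.linspace(-8.0, 8.0, 256).astype(np.float16)
dst2 = np.zeros(256, dtype=np.float16)
emulate_vec_relu(128, dst2, src2, 2, 8, 8)
```

Under these assumptions, a stride of 8 blocks (8 x 16 float16 elements) makes consecutive repeats adjacent in memory, which is why the example uses 8 for both strides.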

Result example:

Input (src_gm):
[ 6.938   -8.86    -0.2263   6.77     2.924    1.759    0.4253  -5.23
 -1.892   -3.049    4.      -9.49    -0.8145   1.974    7.793    2.13
 -3.799    1.292   -0.311   -6.883   -3.29     6.445    7.65     6.76
  8.96    -6.84     3.111   -6.984    7.773   -7.605   -1.563   -5.6
 -2.938    6.785   -1.157    2.373   -3.924   -1.134   -5.523    7.082
  0.5425   9.33     3.734   -7.004   -3.535   -6.35     2.137   -6.42
 -3.076    4.93    -8.234   -7.156   -9.96    -2.623   -2.625   -8.516
  0.88    -3.312   -9.23    -4.734   -0.834    1.154   -0.2268   6.79
  0.559   -4.3     -0.2212   0.02264 -2.775    3.691    8.13    -5.555
  8.766    0.1989  -4.473   -7.99    -5.81    -2.379   -8.64     9.85
  6.867    3.43    -5.176    8.89     5.55     4.586   -8.45     0.3813
  2.875    4.027   -8.96    -9.49    -3.764    4.688   -0.723    8.24
  4.67     4.016    5.266    9.47    -3.033    9.53     2.674    0.2131
  6.836    0.3386   9.95     4.73     5.87    -3.758   -9.45     2.574
 -8.914    9.49     7.42    -7.453    8.19     3.479   -0.0785   0.1791
 -7.098   -9.5      7.41     3.854   -7.57    -6.91     1.971    1.778  ]

Output (dst_gm):
[6.938   0.      0.      6.77    2.924   1.759   0.4253  0.      0.
 0.      4.      0.      0.      1.974   7.793   2.13    0.      1.292
 0.      0.      0.      6.445   7.65    6.76    8.96    0.      3.111
 0.      7.773   0.      0.      0.      0.      6.785   0.      2.373
 0.      0.      0.      7.082   0.5425  9.33    3.734   0.      0.
 0.      2.137   0.      0.      4.93    0.      0.      0.      0.
 0.      0.      0.88    0.      0.      0.      0.      1.154   0.
 6.79    0.559   0.      0.      0.02264 0.      3.691   8.13    0.
 8.766   0.1989  0.      0.      0.      0.      0.      9.85    6.867
 3.43    0.      8.89    5.55    4.586   0.      0.3813  2.875   4.027
 0.      0.      0.      4.688   0.      8.24    4.67    4.016   5.266
 9.47    0.      9.53    2.674   0.2131  6.836   0.3386  9.95    4.73
 5.87    0.      0.      2.574   0.      9.49    7.42    0.      8.19
 3.479   0.      0.1791  0.      0.      7.41    3.854   0.      0.
 1.971   1.778  ]