vec_cmpv_xx

Description

Compares two tensors element-wise and writes each truth value to the corresponding bit of dst. Multiple comparison modes are supported.
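Because each comparison result occupies one bit of dst, host code usually has to unpack the words before it can use them. The following is a minimal host-side sketch, not part of the TIK API: `unpack_mask` is a hypothetical helper, and the bit order (element i of a word maps to bit i, lowest bit first) is an assumption inferred from the result of Example 1, where a match at element index 1 yields the dst value 2.

```python
def unpack_mask(words, bits_per_word=16):
    """Expand packed comparison words into booleans, lowest bit first.

    Assumption: element i of each group maps to bit i of the word,
    as suggested by the result shown in Example 1.
    """
    flags = []
    for word in words:
        for bit in range(bits_per_word):
            flags.append(bool((word >> bit) & 1))
    return flags


# A uint16 word with value 2 has only bit 1 set, so only element 1 compared true.
flags = unpack_mask([2, 0])
print(flags.index(True))
```

For a dst tensor of another type, pass the matching width, for example `bits_per_word=32` for uint32.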

Prototype

vec_cmpv_xx (dst, src0, src1, repeat_times, src0_rep_stride, src1_rep_stride)

Pipe: Vector

Parameters

Table 1 Parameter description

Parameter

Input/Output

Description

instruction

Input

Instruction name, selected from:
  • vec_cmpv_lt: indicates that src0 is less than src1.
  • vec_cmpv_gt: indicates that src0 is greater than src1.
  • vec_cmpv_ge: indicates that src0 is greater than or equal to src1.
  • vec_cmpv_eq: indicates that src0 is equal to src1.
  • vec_cmpv_ne: indicates that src0 is not equal to src1.
  • vec_cmpv_le: indicates that src0 is less than or equal to src1.

dst

Output

Start element of the destination Tensor operand. Must be one of the following data types: uint64, uint32, uint16, uint8.

The scope of the tensor is the Unified Buffer.

src0

Input

Start element of the source Tensor operand 0.

The scope of the tensor is the Unified Buffer.

Atlas 200/300/500 Inference Product: Tensor of type float16

Atlas Training Series Product: Tensor of type float16/float32

src1

Input

Start element of the source Tensor operand 1.

The scope of the tensor is the Unified Buffer.

Has the same data type as src0.

repeat_times

Input

Repeat times (or iterations).

  • When repeat_times = 1, the addresses of the source and destination operands can overlap.
  • When repeat_times > 1, the addresses of the source and destination operands must not overlap.

src0_rep_stride

Input

Repeat stride of source operand 0, that is, the distance between corresponding blocks of successive iterations, in the unit of blocks.

src1_rep_stride

Input

Repeat stride of source operand 1, that is, the distance between corresponding blocks of successive iterations, in the unit of blocks.
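The six instruction names listed in Table 1 correspond to the ordinary comparison operators. As a hedged illustration for checking results on the host, the sketch below maps each name to a function from Python's standard `operator` module. `CMP_MODES` and `compare_elementwise` are hypothetical names introduced here, not TIK APIs, and the sketch ignores float16 rounding.

```python
import operator

# Hypothetical lookup table: instruction name -> host-side comparison.
CMP_MODES = {
    "vec_cmpv_lt": operator.lt,  # src0 <  src1
    "vec_cmpv_gt": operator.gt,  # src0 >  src1
    "vec_cmpv_ge": operator.ge,  # src0 >= src1
    "vec_cmpv_eq": operator.eq,  # src0 == src1
    "vec_cmpv_ne": operator.ne,  # src0 != src1
    "vec_cmpv_le": operator.le,  # src0 <= src1
}


def compare_elementwise(mode, src0, src1):
    """Return one boolean per element pair for the given instruction name."""
    op = CMP_MODES[mode]
    return [op(a, b) for a, b in zip(src0, src1)]


print(compare_elementwise("vec_cmpv_ge", [1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```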

Returns

None

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

  • The mask parameter is unavailable.
  • dst is written contiguously. For example, if the source operands are of type float16 and the destination operand is of type uint16, each iteration produces 128 result bits, so dst advances by eight elements between adjacent iterations. If the source operands are of type float32 and the destination operand is of type uint16, each iteration produces 64 result bits, so dst advances by four elements.
  • src0_rep_stride and src1_rep_stride are in the unit of blocks. Each must be a Scalar of type int16/int32/int64/uint16/uint32/uint64, an immediate of type int, or an Expr of type int16/int32/int64/uint16/uint32/uint64.
  • To save memory space, you can define a tensor that is reused by the source and destination operands (that is, their addresses overlap). The general instruction restrictions are as follows.
    • For a single iteration (repeat_times = 1), the source operand must completely overlap the destination operand.
    • For multiple iterations (repeat_times > 1), address overlapping is not allowed if there is a dependency between the source operand and the destination operand, that is, if the destination operand of the Nth iteration is the source operand of the (N+1)th iteration.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.
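The dst step sizes quoted in the restrictions above (eight uint16 elements for float16 sources, four for float32 sources) can be reproduced with a little arithmetic. The sketch below is a hypothetical host-side helper, not a TIK API. It assumes that each iteration reads 256 bytes (8 blocks of 32 bytes) from each source operand, which is consistent with those figures but is not stated on this page.

```python
# Element sizes in bytes for the data types mentioned on this page.
DTYPE_BYTES = {"float16": 2, "float32": 4,
               "uint8": 1, "uint16": 2, "uint32": 4, "uint64": 8}

# Assumption: one iteration covers 8 blocks x 32 bytes of each source operand.
REPEAT_BYTES = 256


def dst_elems_per_repeat(src_dtype, dst_dtype):
    """Number of dst elements written by one iteration (one bit per source element)."""
    result_bits = REPEAT_BYTES // DTYPE_BYTES[src_dtype]
    return result_bits // (DTYPE_BYTES[dst_dtype] * 8)


def dst_elems_total(src_dtype, dst_dtype, repeat_times):
    """Minimum number of dst elements needed for repeat_times iterations."""
    return repeat_times * dst_elems_per_repeat(src_dtype, dst_dtype)


print(dst_elems_per_repeat("float16", "uint16"))  # 8
print(dst_elems_per_repeat("float32", "uint16"))  # 4
print(dst_elems_total("float16", "uint16", 2))    # 16
```

The last value matches the dst shape of (16,) used in Example 2, where two iterations compare float16 data into a uint16 tensor.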

Example

  • Example 1
from tbe import tik
tik_instance = tik.Tik()
src0_gm = tik_instance.Tensor("float16", (128,), name="src0_gm", scope=tik.scope_gm)
src1_gm = tik_instance.Tensor("float16", (128,), name="src1_gm", scope=tik.scope_gm)
src0_ub = tik_instance.Tensor("float16", (128,), name="src0_ub", scope=tik.scope_ubuf)
src1_ub = tik_instance.Tensor("float16", (128,), name="src1_ub", scope=tik.scope_ubuf)
dst_gm = tik_instance.Tensor("uint16", (16,), name="dst_gm", scope=tik.scope_gm)
dst_ub = tik_instance.Tensor("uint16", (16,), name="dst_ub", scope=tik.scope_ubuf)
# Copy the user input to the source Unified Buffer.
tik_instance.data_move(src0_ub, src0_gm, 0, 1, 8, 0, 0)
tik_instance.data_move(src1_ub, src1_gm, 0, 1, 8, 0, 0)
# Initialize dst_ub to all 5s.
tik_instance.vector_dup(16, dst_ub, 5, 1, 1, 1)
tik_instance.vec_cmpv_eq(dst_ub, src0_ub, src1_ub, 1, 8, 8)
# Copy the compute result to the destination Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 1, 0, 0)

tik_instance.BuildCCE(kernel_name="vec_cmpv_eq", inputs=[src0_gm, src1_gm], outputs=[dst_gm])

Result example:

Input (float16):
  src0_gm = {1,2,3,...,128}
  src1_gm = {2,2,2,...,2}
Output (uint16):
  dst_gm = {2,0,0,0,0,0,0,0,5,5,5,5,5,5,5,5}
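The result of Example 1 can be cross-checked without hardware. The following pure-Python model is a sketch under stated assumptions, not the device implementation: `cmpv_eq_single_repeat` is a hypothetical function, it packs element i into bit i of each 16-bit word (lowest bit first), and it only touches the words that the single iteration covers, so the remaining words keep their initial value of 5.

```python
def cmpv_eq_single_repeat(src0, src1, dst, word_bits=16):
    """Model one vec_cmpv_eq iteration over 128 float16 values.

    Assumption: results are packed lowest bit first, and only the
    words covered by the iteration are overwritten.
    """
    out = list(dst)
    for i, (a, b) in enumerate(zip(src0, src1)):
        word, bit = divmod(i, word_bits)
        if bit == 0:
            out[word] = 0          # start of a new word: clear the old content
        if a == b:
            out[word] |= 1 << bit  # set the bit for a matching element
    return out


src0 = [float(v) for v in range(1, 129)]  # {1, 2, 3, ..., 128}
src1 = [2.0] * 128                        # {2, 2, 2, ..., 2}
dst = [5] * 16                            # dst_ub initialized to all 5s

result = cmpv_eq_single_repeat(src0, src1, dst)
print(result)  # [2, 0, 0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 5]
```

Only the element with value 2 (index 1) matches, so bit 1 of the first word is set, giving 2. The 128 comparisons fill eight uint16 words, and the last eight words are left untouched.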
  • Example 2
"""
Compare two source tensors of 256 elements each using vec_cmpv_gt. In the first iteration, the src0 data equals the src1 data it is compared with. In the second iteration, the src0 data is greater than the src1 data it is compared with.
"""
from tbe import tik
tik_instance = tik.Tik()
dtype_size = {
    "int8": 1,
    "uint8": 1,
    "int16": 2,
    "uint16": 2,
    "float16": 2,
    "int32": 4,
    "uint32": 4,
    "float32": 4,
    "int64": 8,
}
src_shape = (2, 128)
dst_shape = (16, )
src_dtype = "float16"
dst_dtype = "uint16"
elements = 2 * 128

# Number of iterations, which is 2 in the current example. You can adjust the number of iterations as required.
repeat_times = 2
# Repeat strides of the source operands: the distance between the start of one iteration and the start of the next, in units of 32-byte blocks.
# src0 advances by eight blocks and src1 by seven blocks per iteration, so in the second iteration every src0 element is greater than the src1 element it is compared with.
src0_rep_stride = 8
src1_rep_stride = 7
src0_gm = tik_instance.Tensor(src_dtype, src_shape, name="src0_gm", scope=tik.scope_gm)
src1_gm = tik_instance.Tensor(src_dtype, src_shape, name="src1_gm", scope=tik.scope_gm)
dst_gm = tik_instance.Tensor(dst_dtype, dst_shape, name="dst_gm", scope=tik.scope_gm)
src0_ub = tik_instance.Tensor(src_dtype, src_shape, name="src0_ub", scope=tik.scope_ubuf)
src1_ub = tik_instance.Tensor(src_dtype, src_shape, name="src1_ub", scope=tik.scope_ubuf)
dst_ub = tik_instance.Tensor(dst_dtype, dst_shape, name="dst_ub", scope=tik.scope_ubuf)
# Number of moved segments.
nburst = 1
# Length of each moved segment, in units of 32 bytes.
burst = elements * dtype_size[src_dtype] // 32 // nburst
# Stride between the previous burst tail and the next burst header, in units of 32 bytes.
dst_stride, src_stride = 0, 0
# Copy the user input to the source Unified Buffer.
tik_instance.data_move(src0_ub, src0_gm, 0, nburst, burst, src_stride, dst_stride)
tik_instance.data_move(src1_ub, src1_gm, 0, nburst, burst, src_stride, dst_stride)
tik_instance.vec_cmpv_gt(dst_ub, src0_ub, src1_ub, repeat_times, src0_rep_stride, src1_rep_stride)
# Copy the compute result to the destination Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, nburst, 1, src_stride, dst_stride)
tik_instance.BuildCCE(kernel_name="vec_cmpv_gt", inputs=[src0_gm, src1_gm], outputs=[dst_gm])


Result example:
Input (src0_gm):
[[  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.
   14.  15.  16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.
   28.  29.  30.  31.  32.  33.  34.  35.  36.  37.  38.  39.  40.  41.
   42.  43.  44.  45.  46.  47.  48.  49.  50.  51.  52.  53.  54.  55.
   56.  57.  58.  59.  60.  61.  62.  63.  64.  65.  66.  67.  68.  69.
   70.  71.  72.  73.  74.  75.  76.  77.  78.  79.  80.  81.  82.  83.
   84.  85.  86.  87.  88.  89.  90.  91.  92.  93.  94.  95.  96.  97.
   98.  99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. 110. 111.
  112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125.
  126. 127.]
 [128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139. 140. 141.
  142. 143. 144. 145. 146. 147. 148. 149. 150. 151. 152. 153. 154. 155.
  156. 157. 158. 159. 160. 161. 162. 163. 164. 165. 166. 167. 168. 169.
  170. 171. 172. 173. 174. 175. 176. 177. 178. 179. 180. 181. 182. 183.
  184. 185. 186. 187. 188. 189. 190. 191. 192. 193. 194. 195. 196. 197.
  198. 199. 200. 201. 202. 203. 204. 205. 206. 207. 208. 209. 210. 211.
  212. 213. 214. 215. 216. 217. 218. 219. 220. 221. 222. 223. 224. 225.
  226. 227. 228. 229. 230. 231. 232. 233. 234. 235. 236. 237. 238. 239.
  240. 241. 242. 243. 244. 245. 246. 247. 248. 249. 250. 251. 252. 253.
  254. 255.]]
Input (src1_gm):
[[  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.
   14.  15.  16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.
   28.  29.  30.  31.  32.  33.  34.  35.  36.  37.  38.  39.  40.  41.
   42.  43.  44.  45.  46.  47.  48.  49.  50.  51.  52.  53.  54.  55.
   56.  57.  58.  59.  60.  61.  62.  63.  64.  65.  66.  67.  68.  69.
   70.  71.  72.  73.  74.  75.  76.  77.  78.  79.  80.  81.  82.  83.
   84.  85.  86.  87.  88.  89.  90.  91.  92.  93.  94.  95.  96.  97.
   98.  99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. 110. 111.
  112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125.
  126. 127.]
 [128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139. 140. 141.
  142. 143. 144. 145. 146. 147. 148. 149. 150. 151. 152. 153. 154. 155.
  156. 157. 158. 159. 160. 161. 162. 163. 164. 165. 166. 167. 168. 169.
  170. 171. 172. 173. 174. 175. 176. 177. 178. 179. 180. 181. 182. 183.
  184. 185. 186. 187. 188. 189. 190. 191. 192. 193. 194. 195. 196. 197.
  198. 199. 200. 201. 202. 203. 204. 205. 206. 207. 208. 209. 210. 211.
  212. 213. 214. 215. 216. 217. 218. 219. 220. 221. 222. 223. 224. 225.
  226. 227. 228. 229. 230. 231. 232. 233. 234. 235. 236. 237. 238. 239.
  240. 241. 242. 243. 244. 245. 246. 247. 248. 249. 250. 251. 252. 253.
  254. 255.]]
Output (dst_gm):
[    0     0     0     0     0     0     0     0 65535 65535 65535 65535
 65535 65535 65535 65535]
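To see why the second half of dst_gm is all ones, it helps to model how the repeat strides select the data of each iteration. The sketch below is a pure-Python approximation, not the device implementation: `cmpv_gt` is a hypothetical function, it assumes float16 sources (16 elements per 32-byte block, 128 elements per iteration) and packs results lowest bit first into uint16 words.

```python
BLOCK_ELEMS = 16    # float16 elements per 32-byte block
REPEAT_ELEMS = 128  # float16 elements compared per iteration (8 blocks)


def cmpv_gt(src0, src1, repeat_times, src0_rep_stride, src1_rep_stride, word_bits=16):
    """Model vec_cmpv_gt on flat float16 data, returning packed uint16 words."""
    dst = []
    for rep in range(repeat_times):
        # Each iteration starts rep * stride blocks into its source operand.
        base0 = rep * src0_rep_stride * BLOCK_ELEMS
        base1 = rep * src1_rep_stride * BLOCK_ELEMS
        for w in range(REPEAT_ELEMS // word_bits):
            word = 0
            for bit in range(word_bits):
                i = w * word_bits + bit
                if src0[base0 + i] > src1[base1 + i]:
                    word |= 1 << bit
            dst.append(word)
    return dst


src0 = [float(v) for v in range(256)]
src1 = [float(v) for v in range(256)]

result = cmpv_gt(src0, src1, repeat_times=2, src0_rep_stride=8, src1_rep_stride=7)
print(result)  # eight 0s followed by eight 65535s
```

In the second iteration src0 starts at element 128 while src1 starts at element 112, so every src0 value exceeds its counterpart by 16 and all 128 result bits are set.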