vec_cmpv_xx
Description
Compares two tensors element-wise and writes each truth value to the corresponding bit of dst. Multiple comparison modes are supported.
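The bit packing can be illustrated in plain Python, without the TIK runtime. This is only a sketch: `pack_compare` is a hypothetical helper, not part of TIK, and it assumes that element i of the comparison maps to bit i % 16 of uint16 word i // 16, least significant bit first. That assumption is consistent with the result shown in Example 1.

```python
import operator

def pack_compare(src0, src1, op=operator.eq, dst_bits=16):
    """Hypothetical model: pack element-wise comparison results into dst words."""
    assert len(src0) == len(src1)
    words = [0] * ((len(src0) + dst_bits - 1) // dst_bits)
    for i, (a, b) in enumerate(zip(src0, src1)):
        if op(a, b):
            # Assumed LSB-first: element i sets bit (i % dst_bits) of word (i // dst_bits).
            words[i // dst_bits] |= 1 << (i % dst_bits)
    return words

# 16 elements, equal only at index 1, so only bit 1 of the single word is set.
print(pack_compare(list(range(1, 17)), [2] * 16))  # [2]
```

Passing a different operator (for example `operator.gt`) models the other comparison modes in the same way.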
Prototype
vec_cmpv_xx (dst, src0, src1, repeat_times, src0_rep_stride, src1_rep_stride)
Pipe: Vector
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| instruction | Input | Instruction name, selected from: |
| dst | Output | Start element of the destination Tensor operand. Must be one of the following data types: uint64, uint32, uint16, uint8. The scope of the tensor is the Unified Buffer. |
| src0 | Input | Start element of the source Tensor operand 0. The scope of the tensor is the Unified Buffer. |
| src1 | Input | Start element of the source Tensor operand 1. The scope of the tensor is the Unified Buffer. Has the same data type as src0. |
| repeat_times | Input | Repeat times (or iterations). |
| src0_rep_stride | Input | Repeat stride size for source operand 0 between the corresponding blocks of successive iterations. |
| src1_rep_stride | Input | Repeat stride size for source operand 1 between the corresponding blocks of successive iterations. |
Returns
None
Applicability
Restrictions
- The mask parameter is unavailable.
- dst is written contiguously across iterations. For example, if the source operand is of type float16 and the destination operand is of type uint16, each iteration advances dst by eight elements. If the source operand is of type float32 and the destination operand is of type uint16, each iteration advances dst by four elements.
- src0_rep_stride and src1_rep_stride are specified in the unit of blocks. Must be a Scalar of type int16/int32/int64/uint16/uint32/uint64, an immediate of type int, or an Expr of type int16/int32/int64/uint16/uint32/uint64.
- To save memory space, you can define a tensor reused by the source and destination operands (which means they have overlapped addresses). The general instruction restrictions are as follows.
  - In the event of a single iteration repeat (repeat_times = 1), the source operand must completely overlap the destination operand.
  - In the event of multiple iteration repeats (repeat_times > 1), if there is a dependency between the source operand and the destination operand, that is, the destination operand of the Nth iteration is the source operand of the (N+1)th iteration, address overlapping is not allowed.
- For details about the alignment requirements of the operand address offset, see General Restrictions.
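The dst offsets quoted in the restrictions (eight uint16 elements for float16 sources, four for float32) follow from simple arithmetic. The sketch below assumes that each iteration processes 256 bytes of source data (eight 32-byte blocks), which matches the 128 float16 elements per iteration used in the examples. `dst_elements_per_iteration` is a hypothetical helper, not a TIK API.

```python
BYTES_PER_ITERATION = 256  # assumption: 8 blocks x 32 bytes per iteration

DTYPE_SIZE = {"float16": 2, "float32": 4, "uint8": 1, "uint16": 2, "uint32": 4, "uint64": 8}

def dst_elements_per_iteration(src_dtype, dst_dtype):
    """Number of dst elements filled by one iteration (one result bit per source element)."""
    src_elements = BYTES_PER_ITERATION // DTYPE_SIZE[src_dtype]
    dst_bits = DTYPE_SIZE[dst_dtype] * 8
    return src_elements // dst_bits

print(dst_elements_per_iteration("float16", "uint16"))  # 8
print(dst_elements_per_iteration("float32", "uint16"))  # 4
```

Multiplying the result by repeat_times gives the minimum number of dst elements the destination tensor needs under the same assumption.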
Example
- Example 1
from tbe import tik
tik_instance = tik.Tik()
src0_gm = tik_instance.Tensor("float16", (128,), name="src0_gm", scope=tik.scope_gm)
src1_gm = tik_instance.Tensor("float16", (128,), name="src1_gm", scope=tik.scope_gm)
src0_ub = tik_instance.Tensor("float16", (128,), name="src0_ub", scope=tik.scope_ubuf)
src1_ub = tik_instance.Tensor("float16", (128,), name="src1_ub", scope=tik.scope_ubuf)
dst_gm = tik_instance.Tensor("uint16", (16,), name="dst_gm", scope=tik.scope_gm)
dst_ub = tik_instance.Tensor("uint16", (16,), name="dst_ub", scope=tik.scope_ubuf)
# Copy the user input to the source Unified Buffer.
tik_instance.data_move(src0_ub, src0_gm, 0, 1, 8, 0, 0)
tik_instance.data_move(src1_ub, src1_gm, 0, 1, 8, 0, 0)
# Initialize dst_ub to all 5s.
tik_instance.vector_dup(16, dst_ub, 5, 1, 1, 1)
tik_instance.vec_cmpv_eq(dst_ub, src0_ub, src1_ub, 1, 8, 8)
# Copy the compute result to the destination Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, 1, 1, 0, 0)
tik_instance.BuildCCE(kernel_name="vec_cmpv_eq", inputs=[src0_gm, src1_gm], outputs=[dst_gm])
Result example:
Input (float16):
src0_gm = {1,2,3,...,128}
src1_gm = {2,2,2,...,2}
Output:
dst_gm = {2,0,0,0,0,0,0,0,5,5,5,5,5,5,5,5}
- Example 2
"""
Process the two groups of 256 source operands using vec_cmpv_gt. The first half of src0 data is the same as src1 data, and the second half of src0 data is greater than src1.
"""
from tbe import tik
tik_instance = tik.Tik()
dtype_size = {
    "int8": 1,
    "uint8": 1,
    "int16": 2,
    "uint16": 2,
    "float16": 2,
    "int32": 4,
    "uint32": 4,
    "float32": 4,
    "int64": 8,
}
src_shape = (2, 128)
dst_shape = (16, )
src_dtype = "float16"
dst_dtype = "uint16"
elements = 2 * 128
# Number of iterations, which is 2 in the current example. You can adjust the number of iterations as required.
repeat_times = 2
# Iteration stride between the start of one iteration and the start of the next for each source operand, in units of 32-byte blocks. src0 advances by eight blocks per iteration and src1 by seven, so in the second iteration every src0 element is greater than the corresponding src1 element.
src0_rep_stride = 8
src1_rep_stride = 7
src0_gm = tik_instance.Tensor(src_dtype, src_shape, name="src0_gm", scope=tik.scope_gm)
src1_gm = tik_instance.Tensor(src_dtype, src_shape, name="src1_gm", scope=tik.scope_gm)
dst_gm = tik_instance.Tensor(dst_dtype, dst_shape, name="dst_gm", scope=tik.scope_gm)
src0_ub = tik_instance.Tensor(src_dtype, src_shape, name="src0_ub", scope=tik.scope_ubuf)
src1_ub = tik_instance.Tensor(src_dtype, src_shape, name="src1_ub", scope=tik.scope_ubuf)
dst_ub = tik_instance.Tensor(dst_dtype, dst_shape, name="dst_ub", scope=tik.scope_ubuf)
# Number of moved segments.
nburst = 1
# Length of each moved segment, in units of 32 bytes.
burst = elements * dtype_size[src_dtype] // 32 // nburst
# Stride between the tail of one burst and the head of the next, in units of 32 bytes.
dst_stride, src_stride = 0, 0
# Copy the user input to the source Unified Buffer.
tik_instance.data_move(src0_ub, src0_gm, 0, nburst, burst, src_stride, dst_stride)
tik_instance.data_move(src1_ub, src1_gm, 0, nburst, burst, src_stride, dst_stride)
tik_instance.vec_cmpv_gt(dst_ub, src0_ub, src1_ub, repeat_times, src0_rep_stride, src1_rep_stride)
# Copy the compute result to the destination Global Memory.
tik_instance.data_move(dst_gm, dst_ub, 0, nburst, 1, src_stride, dst_stride)
tik_instance.BuildCCE(kernel_name="vec_cmpv_gt", inputs=[src0_gm, src1_gm], outputs=[dst_gm])
Result example:
Input (src0_gm):
[[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13.
14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27.
28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41.
42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55.
56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69.
70. 71. 72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83.
84. 85. 86. 87. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97.
98. 99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. 110. 111.
112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125.
126. 127.]
[128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139. 140. 141.
142. 143. 144. 145. 146. 147. 148. 149. 150. 151. 152. 153. 154. 155.
156. 157. 158. 159. 160. 161. 162. 163. 164. 165. 166. 167. 168. 169.
170. 171. 172. 173. 174. 175. 176. 177. 178. 179. 180. 181. 182. 183.
184. 185. 186. 187. 188. 189. 190. 191. 192. 193. 194. 195. 196. 197.
198. 199. 200. 201. 202. 203. 204. 205. 206. 207. 208. 209. 210. 211.
212. 213. 214. 215. 216. 217. 218. 219. 220. 221. 222. 223. 224. 225.
226. 227. 228. 229. 230. 231. 232. 233. 234. 235. 236. 237. 238. 239.
240. 241. 242. 243. 244. 245. 246. 247. 248. 249. 250. 251. 252. 253.
254. 255.]]
Input (src1_gm):
[[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13.
14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27.
28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41.
42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55.
56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69.
70. 71. 72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83.
84. 85. 86. 87. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97.
98. 99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. 110. 111.
112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125.
126. 127.]
[128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139. 140. 141.
142. 143. 144. 145. 146. 147. 148. 149. 150. 151. 152. 153. 154. 155.
156. 157. 158. 159. 160. 161. 162. 163. 164. 165. 166. 167. 168. 169.
170. 171. 172. 173. 174. 175. 176. 177. 178. 179. 180. 181. 182. 183.
184. 185. 186. 187. 188. 189. 190. 191. 192. 193. 194. 195. 196. 197.
198. 199. 200. 201. 202. 203. 204. 205. 206. 207. 208. 209. 210. 211.
212. 213. 214. 215. 216. 217. 218. 219. 220. 221. 222. 223. 224. 225.
226. 227. 228. 229. 230. 231. 232. 233. 234. 235. 236. 237. 238. 239.
240. 241. 242. 243. 244. 245. 246. 247. 248. 249. 250. 251. 252. 253.
254. 255.]]
Output (dst_gm):
[ 0 0 0 0 0 0 0 0 65535 65535 65535 65535
65535 65535 65535 65535]
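The output of Example 2 can be reproduced on the host with a small model of the repeat-stride addressing. This is a sketch under stated assumptions, not the TIK implementation: `simulate_vec_cmpv` is a hypothetical function, each iteration is assumed to read eight 32-byte blocks per source, and result bits are assumed to be packed least significant bit first into contiguous dst words.

```python
import operator

BLOCK_BYTES = 32          # one block, as stated in the example comments
BLOCKS_PER_ITERATION = 8  # assumption: each iteration reads 8 blocks (256 bytes)

def simulate_vec_cmpv(src0, src1, repeat_times, src0_rep_stride, src1_rep_stride,
                      op=operator.gt, elem_bytes=2, dst_bits=16):
    """Hypothetical host-side model of vec_cmpv_xx, for illustration only."""
    per_block = BLOCK_BYTES // elem_bytes
    per_iter = BLOCKS_PER_ITERATION * per_block
    dst = []
    for it in range(repeat_times):
        # Each source starts rep_stride blocks after the start of its previous iteration.
        start0 = it * src0_rep_stride * per_block
        start1 = it * src1_rep_stride * per_block
        for w in range(per_iter // dst_bits):
            word = 0
            for bit in range(dst_bits):
                k = w * dst_bits + bit
                if op(src0[start0 + k], src1[start1 + k]):
                    word |= 1 << bit
            dst.append(word)  # dst is written contiguously across iterations
    return dst

data = [float(v) for v in range(256)]
result = simulate_vec_cmpv(data, data, repeat_times=2,
                           src0_rep_stride=8, src1_rep_stride=7)
print(result)
```

With a stride of seven blocks, the second iteration of src1 starts at element 112 while src0 starts at element 128, so every comparison in that iteration is true and the last eight words are 65535.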