vec_trans
Description
Transposes a 16 x 16 2D matrix block, repeating the operation repeat_times times. Each iteration operates on 256 elements stored at contiguous addresses (one 16 x 16 block of 2-byte elements, that is, 512 bytes). The addresses used by different iterations do not need to be contiguous: the address interval between adjacent iterations is specified by dst_rep_stride and src_rep_stride.
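The addressing described above can be modeled on the host with NumPy. This is only a sketch for reasoning about the data layout, not TIK code: the function name vec_trans_ref and the use of flat 1-D arrays are assumptions made for illustration, and the stride is converted from 512-byte units to elements assuming 2-byte data types.

```python
import numpy as np

BLOCK_ELEMS = 256  # one 16 x 16 block of 2-byte elements = 512 bytes


def vec_trans_ref(dst, src, repeat_times, dst_rep_stride, src_rep_stride):
    """Hypothetical host-side model of vec_trans on flat 1-D buffers."""
    for i in range(repeat_times):
        # Strides are given in units of 512 bytes, i.e. 256 two-byte elements.
        s = i * src_rep_stride * BLOCK_ELEMS
        d = i * dst_rep_stride * BLOCK_ELEMS
        block = src[s:s + BLOCK_ELEMS].reshape(16, 16)
        # Transpose the block and write it back as 256 contiguous elements.
        dst[d:d + BLOCK_ELEMS] = block.T.ravel()
    return dst


src = np.arange(512, dtype=np.float16)
dst = np.zeros(768, dtype=np.float16)
vec_trans_ref(dst, src, repeat_times=2, dst_rep_stride=2, src_rep_stride=1)
print(dst[:4], dst[512:516])
```

With dst_rep_stride = 2, the second iteration writes 2 x 256 elements after the start of dst, so the 256 elements in between are left untouched.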
Prototype
vec_trans(dst, src, repeat_times, dst_rep_stride, src_rep_stride)
Parameters
Parameter | Input/Output | Description |
dst | Output | A tensor for the destination operand. Must be one of the following data types: int16, uint16, float16. The scope of the tensor is the Unified Buffer. |
src | Input | A tensor for the source operand. Must be one of the following data types: int16, uint16, float16. The scope of the tensor is the Unified Buffer. |
repeat_times | Input | Number of repeats. Must be a Scalar of type int/uint, an immediate of type int, or an Expr of type int/uint. Must be in the range of [1, 4095]. |
dst_rep_stride | Input | dst address space between adjacent iterations (unit: 512 bytes). Must be a Scalar of type int/uint, an immediate of type int, or an Expr of type int/uint. Must be in the range of [0, 4095]. |
src_rep_stride | Input | src address space between adjacent iterations (unit: 512 bytes). Must be a Scalar of type int/uint, an immediate of type int, or an Expr of type int/uint. Must be in the range of [0, 4095]. |
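The ranges in the table can be checked on the host before the instruction is issued. The helpers below are hypothetical (check_vec_trans_args and min_elems are not part of TIK, which performs its own checks); they are a minimal sketch that restates the table and derives how many elements an operand must hold for a given stride, assuming 2-byte elements and immediate (plain int) arguments.

```python
VALID_DTYPES = {"int16", "uint16", "float16"}
STRIDE_UNIT_BYTES = 512  # unit of dst_rep_stride / src_rep_stride
ELEM_BYTES = 2           # all supported data types are 2 bytes wide
BLOCK_ELEMS = 256        # one 16 x 16 block


def check_vec_trans_args(dtype, repeat_times, dst_rep_stride, src_rep_stride):
    """Hypothetical pre-check mirroring the parameter table (immediates only)."""
    if dtype not in VALID_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype}")
    if not 1 <= repeat_times <= 4095:
        raise ValueError("repeat_times must be in [1, 4095]")
    for name, stride in (("dst_rep_stride", dst_rep_stride),
                         ("src_rep_stride", src_rep_stride)):
        if not 0 <= stride <= 4095:
            raise ValueError(f"{name} must be in [0, 4095]")


def min_elems(repeat_times, rep_stride):
    """Smallest element count an operand needs so the last block still fits."""
    last_start = (repeat_times - 1) * rep_stride * STRIDE_UNIT_BYTES // ELEM_BYTES
    return last_start + BLOCK_ELEMS


check_vec_trans_args("float16", 2, 2, 1)
print(min_elems(2, 2), min_elems(2, 1))
```

For repeat_times = 2, a stride of 2 requires 768 elements and a stride of 1 requires 512 elements.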
Applicability
Restrictions
- To save memory space, you can define a tensor shared by the source and destination operands (by address overlapping). In this case, the general instruction restriction applies: the source operand must completely overlap the destination operand.
- For details about the alignment requirements of the operand address offset, see General Restrictions.
Returns
None
Example
- Example 1
  from tbe import tik

  tik_instance = tik.Tik()
  src_gm = tik_instance.Tensor("float16", (1,16,16), name="src_gm", scope=tik.scope_gm)
  src_ub = tik_instance.Tensor("float16", (1,16,16), name="src_ub", scope=tik.scope_ubuf)
  dst_gm = tik_instance.Tensor("float16", (1,16,16), name="dst_gm", scope=tik.scope_gm)
  dst_ub = tik_instance.Tensor("float16", (1,16,16), name="dst_ub", scope=tik.scope_ubuf)
  # Copy the user input to the source Unified Buffer.
  tik_instance.data_move(src_ub, src_gm, 0, 1, 16, 0, 0)
  tik_instance.vec_trans(dst_ub, src_ub, 1, 1, 1)
  # Copy the compute result to the destination Global Memory.
  tik_instance.data_move(dst_gm, dst_ub, 0, 1, 16, 0, 0)
  tik_instance.BuildCCE(kernel_name="vec_trans", inputs=[src_gm], outputs=[dst_gm])
Result example:
  Input: src_gm=[1,2,3,4,...,256]
  Output: dst_gm=[1,17,33,49,...,256]
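The Example 1 result can be cross-checked on the host with NumPy. This is a sketch that only models the expected values (a plain 16 x 16 transpose); it does not run on the device and uses no TIK calls.

```python
import numpy as np

# Same values as src_gm in Example 1: 1, 2, ..., 256.
src = np.arange(1, 257, dtype=np.float16)

# A single iteration transposes one 16 x 16 block.
dst = src.reshape(16, 16).T.ravel()

print(dst[:4], dst[-1])
```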
- Example 2
  from tbe import tik

  tik_instance = tik.Tik()
  shape = (3,16,16)
  dtype = "float16"
  # Number of iterations
  repeat_time = 2
  # Address stride between two adjacent iterations, in units of 512 bytes.
  # In this example, the source data is read contiguously, while the start
  # addresses of the first and second iterations in the destination data are
  # 2 x 512 bytes apart.
  dst_rep_stride = 2
  src_rep_stride = 1
  src_gm = tik_instance.Tensor(dtype, shape, name="src_gm", scope=tik.scope_gm)
  src_ub = tik_instance.Tensor(dtype, shape, name="src_ub", scope=tik.scope_ubuf)
  dst_gm = tik_instance.Tensor(dtype, shape, name="dst_gm", scope=tik.scope_gm)
  dst_ub = tik_instance.Tensor(dtype, shape, name="dst_ub", scope=tik.scope_ubuf)
  # Copy the user input to the source Unified Buffer. For details about data_move, see the corresponding section.
  tik_instance.data_move(src_ub, src_gm, 0, 1, 48, 0, 0)
  # To facilitate observation, initialize dst_ub to 0. For details about vec_dup, see the corresponding section.
  tik_instance.vec_dup(128, dst_ub, 0, 6, 8)
  tik_instance.vec_trans(dst_ub, src_ub, repeat_time, dst_rep_stride, src_rep_stride)
  # Copy the compute result to the destination Global Memory. For details about data_move, see the corresponding section.
  tik_instance.data_move(dst_gm, dst_ub, 0, 1, 48, 0, 0)
  tik_instance.BuildCCE(kernel_name="vec_trans", inputs=[src_gm], outputs=[dst_gm])
Result example:
  Input:
  [[[  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.  15.]
    [ 16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.  28.  29.  30.  31.]
    [ 32.  33.  34.  35.  36.  37.  38.  39.  40.  41.  42.  43.  44.  45.  46.  47.]
    [ 48.  49.  50.  51.  52.  53.  54.  55.  56.  57.  58.  59.  60.  61.  62.  63.]
    [ 64.  65.  66.  67.  68.  69.  70.  71.  72.  73.  74.  75.  76.  77.  78.  79.]
    [ 80.  81.  82.  83.  84.  85.  86.  87.  88.  89.  90.  91.  92.  93.  94.  95.]
    [ 96.  97.  98.  99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. 110. 111.]
    [112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125. 126. 127.]
    [128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139. 140. 141. 142. 143.]
    [144. 145. 146. 147. 148. 149. 150. 151. 152. 153. 154. 155. 156. 157. 158. 159.]
    [160. 161. 162. 163. 164. 165. 166. 167. 168. 169. 170. 171. 172. 173. 174. 175.]
    [176. 177. 178. 179. 180. 181. 182. 183. 184. 185. 186. 187. 188. 189. 190. 191.]
    [192. 193. 194. 195. 196. 197. 198. 199. 200. 201. 202. 203. 204. 205. 206. 207.]
    [208. 209. 210. 211. 212. 213. 214. 215. 216. 217. 218. 219. 220. 221. 222. 223.]
    [224. 225. 226. 227. 228. 229. 230. 231. 232. 233. 234. 235. 236. 237. 238. 239.]
    [240. 241. 242. 243. 244. 245. 246. 247. 248. 249. 250. 251. 252. 253. 254. 255.]]

   [[256. 257. 258. 259. 260. 261. 262. 263. 264. 265. 266. 267. 268. 269. 270. 271.]
    [272. 273. 274. 275. 276. 277. 278. 279. 280. 281. 282. 283. 284. 285. 286. 287.]
    [288. 289. 290. 291. 292. 293. 294. 295. 296. 297. 298. 299. 300. 301. 302. 303.]
    [304. 305. 306. 307. 308. 309. 310. 311. 312. 313. 314. 315. 316. 317. 318. 319.]
    [320. 321. 322. 323. 324. 325. 326. 327. 328. 329. 330. 331. 332. 333. 334. 335.]
    [336. 337. 338. 339. 340. 341. 342. 343. 344. 345. 346. 347. 348. 349. 350. 351.]
    [352. 353. 354. 355. 356. 357. 358. 359. 360. 361. 362. 363. 364. 365. 366. 367.]
    [368. 369. 370. 371. 372. 373. 374. 375. 376. 377. 378. 379. 380. 381. 382. 383.]
    [384. 385. 386. 387. 388. 389. 390. 391. 392. 393. 394. 395. 396. 397. 398. 399.]
    [400. 401. 402. 403. 404. 405. 406. 407. 408. 409. 410. 411. 412. 413. 414. 415.]
    [416. 417. 418. 419. 420. 421. 422. 423. 424. 425. 426. 427. 428. 429. 430. 431.]
    [432. 433. 434. 435. 436. 437. 438. 439. 440. 441. 442. 443. 444. 445. 446. 447.]
    [448. 449. 450. 451. 452. 453. 454. 455. 456. 457. 458. 459. 460. 461. 462. 463.]
    [464. 465. 466. 467. 468. 469. 470. 471. 472. 473. 474. 475. 476. 477. 478. 479.]
    [480. 481. 482. 483. 484. 485. 486. 487. 488. 489. 490. 491. 492. 493. 494. 495.]
    [496. 497. 498. 499. 500. 501. 502. 503. 504. 505. 506. 507. 508. 509. 510. 511.]]

   [[512. 513. 514. 515. 516. 517. 518. 519. 520. 521. 522. 523. 524. 525. 526. 527.]
    [528. 529. 530. 531. 532. 533. 534. 535. 536. 537. 538. 539. 540. 541. 542. 543.]
    [544. 545. 546. 547. 548. 549. 550. 551. 552. 553. 554. 555. 556. 557. 558. 559.]
    [560. 561. 562. 563. 564. 565. 566. 567. 568. 569. 570. 571. 572. 573. 574. 575.]
    [576. 577. 578. 579. 580. 581. 582. 583. 584. 585. 586. 587. 588. 589. 590. 591.]
    [592. 593. 594. 595. 596. 597. 598. 599. 600. 601. 602. 603. 604. 605. 606. 607.]
    [608. 609. 610. 611. 612. 613. 614. 615. 616. 617. 618. 619. 620. 621. 622. 623.]
    [624. 625. 626. 627. 628. 629. 630. 631. 632. 633. 634. 635. 636. 637. 638. 639.]
    [640. 641. 642. 643. 644. 645. 646. 647. 648. 649. 650. 651. 652. 653. 654. 655.]
    [656. 657. 658. 659. 660. 661. 662. 663. 664. 665. 666. 667. 668. 669. 670. 671.]
    [672. 673. 674. 675. 676. 677. 678. 679. 680. 681. 682. 683. 684. 685. 686. 687.]
    [688. 689. 690. 691. 692. 693. 694. 695. 696. 697. 698. 699. 700. 701. 702. 703.]
    [704. 705. 706. 707. 708. 709. 710. 711. 712. 713. 714. 715. 716. 717. 718. 719.]
    [720. 721. 722. 723. 724. 725. 726. 727. 728. 729. 730. 731. 732. 733. 734. 735.]
    [736. 737. 738. 739. 740. 741. 742. 743. 744. 745. 746. 747. 748. 749. 750. 751.]
    [752. 753. 754. 755. 756. 757. 758. 759. 760. 761. 762. 763. 764. 765. 766. 767.]]]
  Output:
  [[[  0.  16.  32.  48.  64.  80.  96. 112. 128. 144. 160. 176. 192. 208. 224. 240.]
    [  1.  17.  33.  49.  65.  81.  97. 113. 129. 145. 161. 177. 193. 209. 225. 241.]
    [  2.  18.  34.  50.  66.  82.  98. 114. 130. 146. 162. 178. 194. 210. 226. 242.]
    [  3.  19.  35.  51.  67.  83.  99. 115. 131. 147. 163. 179. 195. 211. 227. 243.]
    [  4.  20.  36.  52.  68.  84. 100. 116. 132. 148. 164. 180. 196. 212. 228. 244.]
    [  5.  21.  37.  53.  69.  85. 101. 117. 133. 149. 165. 181. 197. 213. 229. 245.]
    [  6.  22.  38.  54.  70.  86. 102. 118. 134. 150. 166. 182. 198. 214. 230. 246.]
    [  7.  23.  39.  55.  71.  87. 103. 119. 135. 151. 167. 183. 199. 215. 231. 247.]
    [  8.  24.  40.  56.  72.  88. 104. 120. 136. 152. 168. 184. 200. 216. 232. 248.]
    [  9.  25.  41.  57.  73.  89. 105. 121. 137. 153. 169. 185. 201. 217. 233. 249.]
    [ 10.  26.  42.  58.  74.  90. 106. 122. 138. 154. 170. 186. 202. 218. 234. 250.]
    [ 11.  27.  43.  59.  75.  91. 107. 123. 139. 155. 171. 187. 203. 219. 235. 251.]
    [ 12.  28.  44.  60.  76.  92. 108. 124. 140. 156. 172. 188. 204. 220. 236. 252.]
    [ 13.  29.  45.  61.  77.  93. 109. 125. 141. 157. 173. 189. 205. 221. 237. 253.]
    [ 14.  30.  46.  62.  78.  94. 110. 126. 142. 158. 174. 190. 206. 222. 238. 254.]
    [ 15.  31.  47.  63.  79.  95. 111. 127. 143. 159. 175. 191. 207. 223. 239. 255.]]

   [[  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
    [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.]]

   [[256. 272. 288. 304. 320. 336. 352. 368. 384. 400. 416. 432. 448. 464. 480. 496.]
    [257. 273. 289. 305. 321. 337. 353. 369. 385. 401. 417. 433. 449. 465. 481. 497.]
    [258. 274. 290. 306. 322. 338. 354. 370. 386. 402. 418. 434. 450. 466. 482. 498.]
    [259. 275. 291. 307. 323. 339. 355. 371. 387. 403. 419. 435. 451. 467. 483. 499.]
    [260. 276. 292. 308. 324. 340. 356. 372. 388. 404. 420. 436. 452. 468. 484. 500.]
    [261. 277. 293. 309. 325. 341. 357. 373. 389. 405. 421. 437. 453. 469. 485. 501.]
    [262. 278. 294. 310. 326. 342. 358. 374. 390. 406. 422. 438. 454. 470. 486. 502.]
    [263. 279. 295. 311. 327. 343. 359. 375. 391. 407. 423. 439. 455. 471. 487. 503.]
    [264. 280. 296. 312. 328. 344. 360. 376. 392. 408. 424. 440. 456. 472. 488. 504.]
    [265. 281. 297. 313. 329. 345. 361. 377. 393. 409. 425. 441. 457. 473. 489. 505.]
    [266. 282. 298. 314. 330. 346. 362. 378. 394. 410. 426. 442. 458. 474. 490. 506.]
    [267. 283. 299. 315. 331. 347. 363. 379. 395. 411. 427. 443. 459. 475. 491. 507.]
    [268. 284. 300. 316. 332. 348. 364. 380. 396. 412. 428. 444. 460. 476. 492. 508.]
    [269. 285. 301. 317. 333. 349. 365. 381. 397. 413. 429. 445. 461. 477. 493. 509.]
    [270. 286. 302. 318. 334. 350. 366. 382. 398. 414. 430. 446. 462. 478. 494. 510.]
    [271. 287. 303. 319. 335. 351. 367. 383. 399. 415. 431. 447. 463. 479. 495. 511.]]]
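The Example 2 result follows directly from the stride arithmetic, which can be sketched on the host with NumPy. This is a model of the expected values only, not device code; indexing whole 16 x 16 blocks by i * stride is a simplification that works here because one stride unit (512 bytes) equals exactly one float16 block.

```python
import numpy as np

repeat_time = 2
dst_rep_stride = 2
src_rep_stride = 1

# Same values as the Example 2 input: 0, 1, ..., 767 in three 16 x 16 blocks.
src = np.arange(768, dtype=np.float16).reshape(3, 16, 16)
# dst starts as all zeros, matching the vec_dup initialization in the example.
dst = np.zeros((3, 16, 16), dtype=np.float16)

for i in range(repeat_time):
    # Iteration i reads source block i * src_rep_stride and writes the
    # transposed block to destination block i * dst_rep_stride.
    dst[i * dst_rep_stride] = src[i * src_rep_stride].T

print(dst[0, 0, :4], dst[2, 0, :4])
```

Destination block 1 is skipped by the stride of 2 and therefore keeps its initial zeros, and source block 2 (512 to 767) is never read because only two iterations run.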