vec_trans

Description

Transposes a 16 x 16 2D matrix, repeated repeat_times times. Each iteration operates on 256 elements (one 16 x 16 matrix, 512 bytes) stored at contiguous addresses. The blocks processed by different iterations do not need to be contiguous: the address stride between adjacent iterations is specified by dst_rep_stride and src_rep_stride.
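The behavior described above can be sketched as a small reference model in plain Python. This is an illustration only: vec_trans_model is a hypothetical helper, not part of TIK, and flat Python lists stand in for Unified Buffer tensors. It assumes 2-byte elements, so one stride unit of 512 bytes corresponds to 256 elements.

```python
# Reference model only: plain Python lists stand in for Unified Buffer tensors.
BLOCK_ELEMS = 16 * 16  # one 16 x 16 matrix; 512 bytes for 2-byte data types


def vec_trans_model(dst, src, repeat_times, dst_rep_stride, src_rep_stride):
    """Emulate vec_trans on flat lists (hypothetical helper, not part of TIK)."""
    for i in range(repeat_times):
        # Strides are given in units of 512 bytes, i.e. 256 two-byte elements.
        src_base = i * src_rep_stride * BLOCK_ELEMS
        dst_base = i * dst_rep_stride * BLOCK_ELEMS
        # Snapshot the source block first so src and dst may be the same list.
        block = src[src_base:src_base + BLOCK_ELEMS]
        for row in range(16):
            for col in range(16):
                dst[dst_base + col * 16 + row] = block[row * 16 + col]
    return dst


src = list(range(1, 257))  # a single 16 x 16 matrix holding 1..256
dst = vec_trans_model([0] * 256, src, 1, 1, 1)
print(dst[:4], dst[-1])    # [1, 17, 33, 49] 256
```

The model covers only the data layout. It does not model alignment, scope, or any other hardware constraint.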

Prototype

vec_trans(dst, src, repeat_times, dst_rep_stride, src_rep_stride)

Parameters

| Parameter | Input/Output | Description |
|---|---|---|
| dst | Output | A tensor for the destination operand. Must be one of the following data types: int16, uint16, float16. The scope of the tensor is the Unified Buffer. |
| src | Input | A tensor for the source operand. Must be one of the following data types: int16, uint16, float16. The scope of the tensor is the Unified Buffer. |
| repeat_times | Input | Number of iterations. Must be a Scalar of type int/uint, an immediate of type int, or an Expr of type int/uint. Must be in the range of [1, 4095]. |
| dst_rep_stride | Input | Address stride of dst between the start addresses of adjacent iterations (unit: 512 bytes). Must be a Scalar of type int/uint, an immediate of type int, or an Expr of type int/uint. Must be in the range of [0, 4095]. |
| src_rep_stride | Input | Address stride of src between the start addresses of adjacent iterations (unit: 512 bytes). Must be a Scalar of type int/uint, an immediate of type int, or an Expr of type int/uint. Must be in the range of [0, 4095]. |
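The documented ranges and the 512-byte stride unit can be checked on the host before the instruction is issued. The sketch below is an illustration: check_vec_trans_args and iteration_byte_offsets are hypothetical helpers, not TIK APIs, and they handle plain Python integers (immediates) only, not Scalar or Expr arguments.

```python
SUPPORTED_DTYPES = {"int16", "uint16", "float16"}
STRIDE_UNIT_BYTES = 512


def check_vec_trans_args(dtype, repeat_times, dst_rep_stride, src_rep_stride):
    """Raise ValueError if an argument is outside the documented range."""
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype}")
    if not 1 <= repeat_times <= 4095:
        raise ValueError("repeat_times must be in [1, 4095]")
    for name, value in (("dst_rep_stride", dst_rep_stride),
                        ("src_rep_stride", src_rep_stride)):
        if not 0 <= value <= 4095:
            raise ValueError(f"{name} must be in [0, 4095]")


def iteration_byte_offsets(repeat_times, rep_stride):
    """Start offset, in bytes, of the block touched by each iteration."""
    return [i * rep_stride * STRIDE_UNIT_BYTES for i in range(repeat_times)]


check_vec_trans_args("float16", 2, 2, 1)
print(iteration_byte_offsets(2, 2))  # [0, 1024]
print(iteration_byte_offsets(2, 1))  # [0, 512]
```

A stride of 0 is within range and makes every iteration address the same 512-byte block.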

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

  • To save memory space, you can define a tensor shared by the source and destination operands (by address overlapping). In that case, the general instruction restriction applies: the source operand must completely overlap the destination operand.
  • For details about the alignment requirements of the operand address offset, see General Restrictions.
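One way to reason about the overlap restriction is to compare the byte ranges that each operand touches. The sketch below rests on an assumption: it treats "complete overlap" as both operands touching exactly the same set of 512-byte ranges, which is an interpretation and not a definition taken from General Restrictions. The helpers touched_ranges and classify_overlap are hypothetical and not part of TIK.

```python
BLOCK_BYTES = 512  # one 16 x 16 matrix of 2-byte elements


def touched_ranges(start_byte, repeat_times, rep_stride):
    """Byte ranges (start, end) accessed by each iteration of one operand."""
    return [(start_byte + i * rep_stride * BLOCK_BYTES,
             start_byte + i * rep_stride * BLOCK_BYTES + BLOCK_BYTES)
            for i in range(repeat_times)]


def classify_overlap(dst_ranges, src_ranges):
    """Return 'disjoint', 'complete', or 'partial' for two lists of ranges."""
    intersects = any(d0 < s1 and s0 < d1
                     for d0, d1 in dst_ranges for s0, s1 in src_ranges)
    if not intersects:
        return "disjoint"
    # Assumed meaning of "complete": both operands touch identical ranges.
    if sorted(set(dst_ranges)) == sorted(set(src_ranges)):
        return "complete"
    return "partial"


print(classify_overlap(touched_ranges(0, 2, 1), touched_ranges(0, 2, 1)))    # complete
print(classify_overlap(touched_ranges(0, 1, 1), touched_ranges(256, 1, 1)))  # partial
print(classify_overlap(touched_ranges(0, 1, 1), touched_ranges(512, 1, 1)))  # disjoint
```

Under this reading, only the "partial" case would violate the restriction. General Restrictions remains the authoritative statement.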

Returns

None

Example

  • Example 1
    from tbe import tik
    tik_instance = tik.Tik()
    src_gm = tik_instance.Tensor("float16", (1,16,16), name="src_gm", scope=tik.scope_gm)
    src_ub = tik_instance.Tensor("float16", (1,16,16), name="src_ub", scope=tik.scope_ubuf)
    dst_gm = tik_instance.Tensor("float16", (1,16,16), name="dst_gm", scope=tik.scope_gm)
    dst_ub = tik_instance.Tensor("float16", (1,16,16), name="dst_ub", scope=tik.scope_ubuf)
    # Copy the user input to the source Unified Buffer.
    tik_instance.data_move(src_ub, src_gm, 0, 1, 16, 0, 0)
    tik_instance.vec_trans(dst_ub, src_ub, 1, 1, 1)
    # Copy the compute result to the destination Global Memory.
    tik_instance.data_move(dst_gm, dst_ub, 0, 1, 16, 0, 0)
    
    tik_instance.BuildCCE(kernel_name="vec_trans", inputs=[src_gm], outputs=[dst_gm])
    

    Result example:

    Description:
    Input:
    src_gm=[1,2,3,4,...,256]
    Output:
    dst_gm=[1,17,33,49,...,256]
  • Example 2
    from tbe import tik
    tik_instance = tik.Tik()
    
    shape = (3,16,16)
    dtype = "float16"
    # Number of iterations
    repeat_time = 2
    # Address stride between adjacent iterations, in units of 512 bytes.
    # In this example, the source data is read contiguously, while the start addresses
    # of the first and second iterations in the destination are 2 x 512 bytes apart.
    dst_rep_stride = 2
    src_rep_stride = 1
    
    src_gm = tik_instance.Tensor(dtype, shape, name="src_gm", scope=tik.scope_gm)
    src_ub = tik_instance.Tensor(dtype, shape, name="src_ub", scope=tik.scope_ubuf)
    dst_gm = tik_instance.Tensor(dtype, shape, name="dst_gm", scope=tik.scope_gm)
    dst_ub = tik_instance.Tensor(dtype, shape, name="dst_ub", scope=tik.scope_ubuf)
    # Copy the user input to the source Unified Buffer. For details about data_move, see the corresponding section.
    tik_instance.data_move(src_ub, src_gm, 0, 1, 48, 0, 0)
    # To facilitate observation, initialize dst_ub to 0. For details about vec_dup, see the corresponding section.
    tik_instance.vec_dup(128, dst_ub, 0, 6, 8)
    tik_instance.vec_trans(dst_ub, src_ub, repeat_time, dst_rep_stride, src_rep_stride)
    # Copy the compute result to the destination Global Memory. For details about data_move, see the corresponding section.
    tik_instance.data_move(dst_gm, dst_ub, 0, 1, 48, 0, 0)
    
    tik_instance.BuildCCE(kernel_name="vec_trans", inputs=[src_gm], outputs=[dst_gm])

    Result example:

    Description:
    Input:
    [[[  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.
        14.  15.]
      [ 16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.  28.  29.
        30.  31.]
      [ 32.  33.  34.  35.  36.  37.  38.  39.  40.  41.  42.  43.  44.  45.
        46.  47.]
      [ 48.  49.  50.  51.  52.  53.  54.  55.  56.  57.  58.  59.  60.  61.
        62.  63.]
      [ 64.  65.  66.  67.  68.  69.  70.  71.  72.  73.  74.  75.  76.  77.
        78.  79.]
      [ 80.  81.  82.  83.  84.  85.  86.  87.  88.  89.  90.  91.  92.  93.
        94.  95.]
      [ 96.  97.  98.  99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109.
       110. 111.]
      [112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125.
       126. 127.]
      [128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139. 140. 141.
       142. 143.]
      [144. 145. 146. 147. 148. 149. 150. 151. 152. 153. 154. 155. 156. 157.
       158. 159.]
      [160. 161. 162. 163. 164. 165. 166. 167. 168. 169. 170. 171. 172. 173.
       174. 175.]
      [176. 177. 178. 179. 180. 181. 182. 183. 184. 185. 186. 187. 188. 189.
       190. 191.]
      [192. 193. 194. 195. 196. 197. 198. 199. 200. 201. 202. 203. 204. 205.
       206. 207.]
      [208. 209. 210. 211. 212. 213. 214. 215. 216. 217. 218. 219. 220. 221.
       222. 223.]
      [224. 225. 226. 227. 228. 229. 230. 231. 232. 233. 234. 235. 236. 237.
       238. 239.]
      [240. 241. 242. 243. 244. 245. 246. 247. 248. 249. 250. 251. 252. 253.
       254. 255.]]
    
     [[256. 257. 258. 259. 260. 261. 262. 263. 264. 265. 266. 267. 268. 269.
       270. 271.]
      [272. 273. 274. 275. 276. 277. 278. 279. 280. 281. 282. 283. 284. 285.
       286. 287.]
      [288. 289. 290. 291. 292. 293. 294. 295. 296. 297. 298. 299. 300. 301.
       302. 303.]
      [304. 305. 306. 307. 308. 309. 310. 311. 312. 313. 314. 315. 316. 317.
       318. 319.]
      [320. 321. 322. 323. 324. 325. 326. 327. 328. 329. 330. 331. 332. 333.
       334. 335.]
      [336. 337. 338. 339. 340. 341. 342. 343. 344. 345. 346. 347. 348. 349.
       350. 351.]
      [352. 353. 354. 355. 356. 357. 358. 359. 360. 361. 362. 363. 364. 365.
       366. 367.]
      [368. 369. 370. 371. 372. 373. 374. 375. 376. 377. 378. 379. 380. 381.
       382. 383.]
      [384. 385. 386. 387. 388. 389. 390. 391. 392. 393. 394. 395. 396. 397.
       398. 399.]
      [400. 401. 402. 403. 404. 405. 406. 407. 408. 409. 410. 411. 412. 413.
       414. 415.]
      [416. 417. 418. 419. 420. 421. 422. 423. 424. 425. 426. 427. 428. 429.
       430. 431.]
      [432. 433. 434. 435. 436. 437. 438. 439. 440. 441. 442. 443. 444. 445.
       446. 447.]
      [448. 449. 450. 451. 452. 453. 454. 455. 456. 457. 458. 459. 460. 461.
       462. 463.]
      [464. 465. 466. 467. 468. 469. 470. 471. 472. 473. 474. 475. 476. 477.
       478. 479.]
      [480. 481. 482. 483. 484. 485. 486. 487. 488. 489. 490. 491. 492. 493.
       494. 495.]
      [496. 497. 498. 499. 500. 501. 502. 503. 504. 505. 506. 507. 508. 509.
       510. 511.]]
    
     [[512. 513. 514. 515. 516. 517. 518. 519. 520. 521. 522. 523. 524. 525.
       526. 527.]
      [528. 529. 530. 531. 532. 533. 534. 535. 536. 537. 538. 539. 540. 541.
       542. 543.]
      [544. 545. 546. 547. 548. 549. 550. 551. 552. 553. 554. 555. 556. 557.
       558. 559.]
      [560. 561. 562. 563. 564. 565. 566. 567. 568. 569. 570. 571. 572. 573.
       574. 575.]
      [576. 577. 578. 579. 580. 581. 582. 583. 584. 585. 586. 587. 588. 589.
       590. 591.]
      [592. 593. 594. 595. 596. 597. 598. 599. 600. 601. 602. 603. 604. 605.
       606. 607.]
      [608. 609. 610. 611. 612. 613. 614. 615. 616. 617. 618. 619. 620. 621.
       622. 623.]
      [624. 625. 626. 627. 628. 629. 630. 631. 632. 633. 634. 635. 636. 637.
       638. 639.]
      [640. 641. 642. 643. 644. 645. 646. 647. 648. 649. 650. 651. 652. 653.
       654. 655.]
      [656. 657. 658. 659. 660. 661. 662. 663. 664. 665. 666. 667. 668. 669.
       670. 671.]
      [672. 673. 674. 675. 676. 677. 678. 679. 680. 681. 682. 683. 684. 685.
       686. 687.]
      [688. 689. 690. 691. 692. 693. 694. 695. 696. 697. 698. 699. 700. 701.
       702. 703.]
      [704. 705. 706. 707. 708. 709. 710. 711. 712. 713. 714. 715. 716. 717.
       718. 719.]
      [720. 721. 722. 723. 724. 725. 726. 727. 728. 729. 730. 731. 732. 733.
       734. 735.]
      [736. 737. 738. 739. 740. 741. 742. 743. 744. 745. 746. 747. 748. 749.
       750. 751.]
      [752. 753. 754. 755. 756. 757. 758. 759. 760. 761. 762. 763. 764. 765.
       766. 767.]]]
    Output:
    [[[  0.  16.  32.  48.  64.  80.  96. 112. 128. 144. 160. 176. 192. 208.
       224. 240.]
      [  1.  17.  33.  49.  65.  81.  97. 113. 129. 145. 161. 177. 193. 209.
       225. 241.]
      [  2.  18.  34.  50.  66.  82.  98. 114. 130. 146. 162. 178. 194. 210.
       226. 242.]
      [  3.  19.  35.  51.  67.  83.  99. 115. 131. 147. 163. 179. 195. 211.
       227. 243.]
      [  4.  20.  36.  52.  68.  84. 100. 116. 132. 148. 164. 180. 196. 212.
       228. 244.]
      [  5.  21.  37.  53.  69.  85. 101. 117. 133. 149. 165. 181. 197. 213.
       229. 245.]
      [  6.  22.  38.  54.  70.  86. 102. 118. 134. 150. 166. 182. 198. 214.
       230. 246.]
      [  7.  23.  39.  55.  71.  87. 103. 119. 135. 151. 167. 183. 199. 215.
       231. 247.]
      [  8.  24.  40.  56.  72.  88. 104. 120. 136. 152. 168. 184. 200. 216.
       232. 248.]
      [  9.  25.  41.  57.  73.  89. 105. 121. 137. 153. 169. 185. 201. 217.
       233. 249.]
      [ 10.  26.  42.  58.  74.  90. 106. 122. 138. 154. 170. 186. 202. 218.
       234. 250.]
      [ 11.  27.  43.  59.  75.  91. 107. 123. 139. 155. 171. 187. 203. 219.
       235. 251.]
      [ 12.  28.  44.  60.  76.  92. 108. 124. 140. 156. 172. 188. 204. 220.
       236. 252.]
      [ 13.  29.  45.  61.  77.  93. 109. 125. 141. 157. 173. 189. 205. 221.
       237. 253.]
      [ 14.  30.  46.  62.  78.  94. 110. 126. 142. 158. 174. 190. 206. 222.
       238. 254.]
      [ 15.  31.  47.  63.  79.  95. 111. 127. 143. 159. 175. 191. 207. 223.
       239. 255.]]
    
     [[  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]
      [  0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
         0.   0.]]
    
     [[256. 272. 288. 304. 320. 336. 352. 368. 384. 400. 416. 432. 448. 464.
       480. 496.]
      [257. 273. 289. 305. 321. 337. 353. 369. 385. 401. 417. 433. 449. 465.
       481. 497.]
      [258. 274. 290. 306. 322. 338. 354. 370. 386. 402. 418. 434. 450. 466.
       482. 498.]
      [259. 275. 291. 307. 323. 339. 355. 371. 387. 403. 419. 435. 451. 467.
       483. 499.]
      [260. 276. 292. 308. 324. 340. 356. 372. 388. 404. 420. 436. 452. 468.
       484. 500.]
      [261. 277. 293. 309. 325. 341. 357. 373. 389. 405. 421. 437. 453. 469.
       485. 501.]
      [262. 278. 294. 310. 326. 342. 358. 374. 390. 406. 422. 438. 454. 470.
       486. 502.]
      [263. 279. 295. 311. 327. 343. 359. 375. 391. 407. 423. 439. 455. 471.
       487. 503.]
      [264. 280. 296. 312. 328. 344. 360. 376. 392. 408. 424. 440. 456. 472.
       488. 504.]
      [265. 281. 297. 313. 329. 345. 361. 377. 393. 409. 425. 441. 457. 473.
       489. 505.]
      [266. 282. 298. 314. 330. 346. 362. 378. 394. 410. 426. 442. 458. 474.
       490. 506.]
      [267. 283. 299. 315. 331. 347. 363. 379. 395. 411. 427. 443. 459. 475.
       491. 507.]
      [268. 284. 300. 316. 332. 348. 364. 380. 396. 412. 428. 444. 460. 476.
       492. 508.]
      [269. 285. 301. 317. 333. 349. 365. 381. 397. 413. 429. 445. 461. 477.
       493. 509.]
      [270. 286. 302. 318. 334. 350. 366. 382. 398. 414. 430. 446. 462. 478.
       494. 510.]
      [271. 287. 303. 319. 335. 351. 367. 383. 399. 415. 431. 447. 463. 479.
       495. 511.]]]
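The Example 2 output can be cross-checked with a closed-form expression for each destination element. This is a sketch: expected_dst is a hypothetical helper, and it assumes that src_gm holds the values 0 to 767 as shown in the input listing and that dst_ub was zeroed by vec_dup beforehand.

```python
REPEAT_TIMES, DST_REP_STRIDE, SRC_REP_STRIDE = 2, 2, 1
BLOCK_ELEMS = 256  # 16 x 16 elements, 512 bytes of float16


def expected_dst(block, row, col):
    """Expected dst[block][row][col] for Example 2 with src = 0..767."""
    for i in range(REPEAT_TIMES):
        if block == i * DST_REP_STRIDE:
            # Transposed read: dst (row, col) comes from src (col, row).
            return float(i * SRC_REP_STRIDE * BLOCK_ELEMS + col * 16 + row)
    return 0.0  # block never written, keeps the vec_dup initial value


print([expected_dst(0, 1, c) for c in range(4)])  # [1.0, 17.0, 33.0, 49.0]
print(expected_dst(1, 0, 0))                      # 0.0
print([expected_dst(2, 0, c) for c in range(4)])  # [256.0, 272.0, 288.0, 304.0]
```

With dst_rep_stride = 2, the two iterations write destination blocks 0 and 2, so block 1 stays at zero. With src_rep_stride = 1 and two iterations, only source blocks 0 and 1 are read, and the third source block (512 to 767) does not appear in the output.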