======================================
Syntax of AMDGPU Instruction Modifiers
======================================

.. contents::
   :local:

Conventions
===========

The following notation is used throughout this document:

    =================== =============================================================
    Notation            Description
    =================== =============================================================
    {0..N}              Any integer value in the range from 0 to N (inclusive).
    <x>                 Syntax and meaning of *x* are explained elsewhere.
    =================== =============================================================

.. _amdgpu_syn_modifiers:

Modifiers
=========

DS Modifiers
------------

.. _amdgpu_synid_ds_offset8:

offset8
~~~~~~~

Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.

Used with DS instructions which have 2 addresses.

    =================== =====================================================
    Syntax              Description
    =================== =====================================================
    offset:{0..0xFF}    Specifies an unsigned 8-bit offset as a positive
                        :ref:`integer number <amdgpu_synid_integer_number>`.
    =================== =====================================================

Examples:

.. parsed-literal::

  offset:255
  offset:0xff

.. _amdgpu_synid_ds_offset16:

offset16
~~~~~~~~

Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.

Used with DS instructions which have 1 address.

    ==================== ======================================================
    Syntax               Description
    ==================== ======================================================
    offset:{0..0xFFFF}   Specifies an unsigned 16-bit offset as a positive
                         :ref:`integer number <amdgpu_synid_integer_number>`.
    ==================== ======================================================

Examples:

.. parsed-literal::

  offset:65535
  offset:0xffff

.. _amdgpu_synid_sw_offset16:

swizzle pattern
~~~~~~~~~~~~~~~

This is a special modifier which may be used with the *ds_swizzle_b32* instruction only.
It specifies a swizzle pattern in numeric or symbolic form. The default value is 0.

See AMD documentation for more information.

    ======================================================= ===========================================================
    Syntax                                                  Description
    ======================================================= ===========================================================
    offset:{0..0xFFFF}                                      Specifies a 16-bit swizzle pattern.
    offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3})   Specifies a quad permute mode pattern.

                                                            Each number is a lane *id*.
    offset:swizzle(BITMASK_PERM, "<mask>")                  Specifies a bitmask permute mode pattern.

                                                            The pattern converts a 5-bit lane *id* to another
                                                            lane *id* with which the lane interacts.

                                                            *mask* is a 5 character sequence which
                                                            specifies how to transform the bits of the
                                                            lane *id*.

                                                            The following characters are allowed:

                                                            * "0" - set bit to 0.

                                                            * "1" - set bit to 1.

                                                            * "p" - preserve bit.

                                                            * "i" - invert bit.

    offset:swizzle(BROADCAST,{2..32},{0..N})                Specifies a broadcast mode.

                                                            Broadcasts the value of any particular lane to
                                                            all lanes in its group.

                                                            The first numeric parameter is a group
                                                            size and must be equal to 2, 4, 8, 16 or 32.

                                                            The second numeric parameter is an index of the
                                                            lane being broadcast.

                                                            The index must be less than the group size.
    offset:swizzle(SWAP,{1..16})                            Specifies a swap mode.

                                                            Swaps the neighboring groups of
                                                            1, 2, 4, 8 or 16 lanes.
    offset:swizzle(REVERSE,{2..32})                         Specifies a reverse mode.

                                                            Reverses the lanes for groups of 2, 4, 8, 16 or 32 lanes.
    ======================================================= ===========================================================

Numeric parameters may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  offset:255
  offset:0xffff
  offset:swizzle(QUAD_PERM, 0, 1, 2, 3)
  offset:swizzle(BITMASK_PERM, "01pi0")
  offset:swizzle(BROADCAST, 2, 0)
  offset:swizzle(SWAP, 8)
  offset:swizzle(REVERSE, 30 + 2)

.. _amdgpu_synid_gds:

gds
~~~

Specifies whether to use GDS or LDS memory (LDS is the default).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    gds                                      Use GDS memory.
    ======================================== ================================================


EXP Modifiers
-------------

.. _amdgpu_synid_done:

done
~~~~

Specifies if this is the last export from the shader to the target. By default, an
*exp* instruction does not finish an export sequence.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    done                                     Indicates the last export operation.
    ======================================== ================================================

.. _amdgpu_synid_compr:

compr
~~~~~

Indicates if the data are compressed (data are not compressed by default).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    compr                                    Data are compressed.
    ======================================== ================================================

.. _amdgpu_synid_vm:

vm
~~

Specifies the valid mask flag state (off by default).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    vm                                       Set valid mask flag.
    ======================================== ================================================

FLAT Modifiers
--------------

.. _amdgpu_synid_flat_offset12:

offset12
~~~~~~~~

Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.

Cannot be used with *global/scratch* opcodes. GFX9 only.

    ================= ======================================================
    Syntax            Description
    ================= ======================================================
    offset:{0..4095}  Specifies a 12-bit unsigned offset as a positive
                      :ref:`integer number <amdgpu_synid_integer_number>`.
    ================= ======================================================

Examples:

.. parsed-literal::

  offset:4095
  offset:0xff

.. _amdgpu_synid_flat_offset13s:

offset13s
~~~~~~~~~

Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.

Can be used with *global/scratch* opcodes only. GFX9 only.

    ============================ =======================================================
    Syntax                       Description
    ============================ =======================================================
    offset:{-4096..4095}         Specifies a 13-bit signed offset as an
                                 :ref:`integer number <amdgpu_synid_integer_number>`.
    ============================ =======================================================

Examples:

.. parsed-literal::

  offset:-4000
  offset:0x10

.. _amdgpu_synid_flat_offset12s:

offset12s
~~~~~~~~~

Specifies an immediate signed 12-bit offset, in bytes. The default value is 0.

Can be used with *global/scratch* opcodes only.

GFX10 only.

    ============================ =======================================================
    Syntax                       Description
    ============================ =======================================================
    offset:{-2048..2047}         Specifies a 12-bit signed offset as an
                                 :ref:`integer number <amdgpu_synid_integer_number>`.
    ============================ =======================================================

Examples:

.. parsed-literal::

  offset:-2000
  offset:0x10

.. _amdgpu_synid_flat_offset11:

offset11
~~~~~~~~

Specifies an immediate unsigned 11-bit offset, in bytes. The default value is 0.

Cannot be used with *global/scratch* opcodes.

GFX10 only.

    ================= ======================================================
    Syntax            Description
    ================= ======================================================
    offset:{0..2047}  Specifies an 11-bit unsigned offset as a positive
                      :ref:`integer number <amdgpu_synid_integer_number>`.
    ================= ======================================================

Examples:

.. parsed-literal::

  offset:2047
  offset:0xff

dlc
~~~

See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

lds
~~~

See a description :ref:`here<amdgpu_synid_lds>`. GFX10 only.

slc
~~~

See a description :ref:`here<amdgpu_synid_slc>`.

tfe
~~~

See a description :ref:`here<amdgpu_synid_tfe>`.

nv
~~

See a description :ref:`here<amdgpu_synid_nv>`.

MIMG Modifiers
--------------

.. _amdgpu_synid_dmask:

dmask
~~~~~

Specifies which channels (image components) are used by the operation. By default, no channels
are used.

    =============== =====================================================
    Syntax          Description
    =============== =====================================================
    dmask:{0..15}   Specifies image channels as a positive
                    :ref:`integer number <amdgpu_synid_integer_number>`.

                    Each bit corresponds to one of 4 image
                    components (RGBA).

                    If the specified bit value
                    is 0, the component is not used, value 1 means
                    that the component is used.
    =============== =====================================================

This modifier has some limitations depending on instruction kind:

    =================================================== ========================
    Instruction Kind                                    Valid dmask Values
    =================================================== ========================
    32-bit atomic *cmpswap*                             0x3
    32-bit atomic instructions except for *cmpswap*     0x1
    64-bit atomic *cmpswap*                             0xF
    64-bit atomic instructions except for *cmpswap*     0x3
    *gather4*                                           0x1, 0x2, 0x4, 0x8
    Other instructions                                  any value
    =================================================== ========================

Examples:

.. parsed-literal::

  dmask:0xf
  dmask:0b1111
  dmask:3

.. _amdgpu_synid_unorm:

unorm
~~~~~

Specifies whether the address is normalized or not (the address is normalized by default).

    ======================== ========================================
    Syntax                   Description
    ======================== ========================================
    unorm                    Force the address to be unnormalized.
    ======================== ========================================

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

slc
~~~

See a description :ref:`here<amdgpu_synid_slc>`.

.. _amdgpu_synid_r128:

r128
~~~~

Specifies texture resource size. The default size is 256 bits.

GFX7, GFX8 and GFX10 only.

    =================== ================================================
    Syntax              Description
    =================== ================================================
    r128                Specifies a 128-bit texture resource size.
    =================== ================================================

.. WARNING:: Using this modifier should decrease *rsrc* operand size from 8 to 4 dwords, but the assembler does not currently support this feature.

tfe
~~~

See a description :ref:`here<amdgpu_synid_tfe>`.

.. _amdgpu_synid_lwe:

lwe
~~~

Specifies LOD warning status (LOD warning is disabled by default).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    lwe                                      Enables LOD warning.
    ======================================== ================================================

.. _amdgpu_synid_da:

da
~~

Specifies if an array index must be sent to TA. By default, array index is not sent.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    da                                       Send an array-index to TA.
    ======================================== ================================================

.. _amdgpu_synid_d16:

d16
~~~

Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    d16                                      Enables 16-bit data mode.

                                             On loads, convert data in memory to 16-bit
                                             format before storing it in VGPRs.

                                             For stores, convert 16-bit data in VGPRs to
                                             32 bits before going to memory.

                                             Note that GFX8.0 does not support data packing.
                                             Each 16-bit data element occupies 1 VGPR.

                                             GFX8.1, GFX9 and GFX10 support data packing.
                                             Each pair of 16-bit data elements
                                             occupies 1 VGPR.
    ======================================== ================================================

.. _amdgpu_synid_a16:

a16
~~~

Specifies size of image address components: 16 or 32 bits (32 bits by default).
GFX9 and GFX10 only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    a16                                      Enables 16-bit image address components.
    ======================================== ================================================

.. _amdgpu_synid_dim:

dim
~~~

Specifies surface dimension. This is a mandatory modifier. There is no default value.

GFX10 only.

    =============================== =========================================================
    Syntax                          Description
    =============================== =========================================================
    dim:1D                          One-dimensional image.
    dim:2D                          Two-dimensional image.
    dim:3D                          Three-dimensional image.
    dim:CUBE                        Cubemap array.
    dim:1D_ARRAY                    One-dimensional image array.
    dim:2D_ARRAY                    Two-dimensional image array.
    dim:2D_MSAA                     Two-dimensional multi-sample anti-aliasing image.
    dim:2D_MSAA_ARRAY               Two-dimensional multi-sample anti-aliasing image array.
    =============================== =========================================================

The following table defines an alternative syntax which is supported
for compatibility with SP3 assembler:

    =============================== =========================================================
    Syntax                          Description
    =============================== =========================================================
    dim:SQ_RSRC_IMG_1D              One-dimensional image.
    dim:SQ_RSRC_IMG_2D              Two-dimensional image.
    dim:SQ_RSRC_IMG_3D              Three-dimensional image.
    dim:SQ_RSRC_IMG_CUBE            Cubemap array.
    dim:SQ_RSRC_IMG_1D_ARRAY        One-dimensional image array.
    dim:SQ_RSRC_IMG_2D_ARRAY        Two-dimensional image array.
    dim:SQ_RSRC_IMG_2D_MSAA         Two-dimensional multi-sample anti-aliasing image.
    dim:SQ_RSRC_IMG_2D_MSAA_ARRAY   Two-dimensional multi-sample anti-aliasing image array.
    =============================== =========================================================

dlc
~~~

See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.

Miscellaneous Modifiers
-----------------------

.. _amdgpu_synid_dlc:

dlc
~~~

Controls device level cache policy for memory operations. Used for synchronization.
When specified, forces operation to bypass device level cache making the operation device
level coherent. By default, instructions use device level cache.

GFX10 only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    dlc                                      Bypass device level cache.
    ======================================== ================================================

.. _amdgpu_synid_glc:

glc
~~~

This modifier has a different meaning for loads, stores, and atomic operations.
The default value is off (0).

See AMD documentation for details.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    glc                                      Set glc bit to 1.
    ======================================== ================================================

.. _amdgpu_synid_lds:

lds
~~~

Specifies where to store the result: VGPRs or LDS (VGPRs by default).

    ======================================== ===========================
    Syntax                                   Description
    ======================================== ===========================
    lds                                      Store result in LDS.
    ======================================== ===========================

.. _amdgpu_synid_nv:

nv
~~

Specifies whether the instruction operates on non-volatile memory. By default, memory is volatile.

GFX9 only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    nv                                       Indicates that instruction operates on
                                             non-volatile memory.
    ======================================== ================================================

.. _amdgpu_synid_slc:

slc
~~~

Specifies cache policy. The default value is off (0).

See AMD documentation for details.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    slc                                      Set slc bit to 1.
    ======================================== ================================================

.. _amdgpu_synid_tfe:

tfe
~~~

Controls access to partially resident textures. The default value is off (0).

See AMD documentation for details.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    tfe                                      Set tfe bit to 1.
    ======================================== ================================================

MUBUF/MTBUF Modifiers
---------------------

.. _amdgpu_synid_idxen:

idxen
~~~~~

Specifies whether address components include an index. By default, no components are used.

Can be used together with :ref:`offen<amdgpu_synid_offen>`.

Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    idxen                                    Address components include an index.
    ======================================== ================================================

.. _amdgpu_synid_offen:

offen
~~~~~

Specifies whether address components include an offset. By default, no components are used.

Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.

Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    offen                                    Address components include an offset.
    ======================================== ================================================

.. _amdgpu_synid_addr64:

addr64
~~~~~~

Specifies whether a 64-bit address is used. By default, no address is used.

GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and
:ref:`idxen<amdgpu_synid_idxen>` modifiers.
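
The compatibility rules stated for *idxen*, *offen* and *addr64* can be
summarized with a small check. The Python sketch below is illustrative only:
the function name ``check_mubuf_addressing`` is hypothetical and is not part of
the assembler, and it encodes nothing beyond the rules given in this section.

.. code-block:: python

    def check_mubuf_addressing(modifiers):
        """Return True if the addressing modifiers may be combined.

        Encodes only the rules stated in this section: idxen and offen
        may be used together, while addr64 excludes both of them.
        """
        mods = set(modifiers)
        if "addr64" in mods and mods & {"idxen", "offen"}:
            return False
        return True


    print(check_mubuf_addressing(["idxen", "offen"]))   # True
    print(check_mubuf_addressing(["addr64", "offen"]))  # False

Target-specific restrictions (for example, *addr64* being GFX7 only) are
deliberately left out of this sketch.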

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    addr64                                   A 64-bit address is used.
    ======================================== ================================================

.. _amdgpu_synid_buf_offset12:

offset12
~~~~~~~~

Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.

    =============================== ======================================================
    Syntax                          Description
    =============================== ======================================================
    offset:{0..0xFFF}               Specifies a 12-bit unsigned offset as a positive
                                    :ref:`integer number <amdgpu_synid_integer_number>`.
    =============================== ======================================================

Examples:

.. parsed-literal::

  offset:0
  offset:0x10

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

slc
~~~

See a description :ref:`here<amdgpu_synid_slc>`.

lds
~~~

See a description :ref:`here<amdgpu_synid_lds>`.

dlc
~~~

See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.

tfe
~~~

See a description :ref:`here<amdgpu_synid_tfe>`.

.. _amdgpu_synid_dfmt:

dfmt
~~~~

TBD

.. _amdgpu_synid_nfmt:

nfmt
~~~~

TBD

SMRD/SMEM Modifiers
-------------------

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

nv
~~

See a description :ref:`here<amdgpu_synid_nv>`. GFX9 only.

dlc
~~~

See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.

VINTRP Modifiers
----------------

.. _amdgpu_synid_high:

high
~~~~

Specifies which half of the LDS word to use. Low half of LDS word is used by default.
GFX9 and GFX10 only.

    ======================================== ================================
    Syntax                                   Description
    ======================================== ================================
    high                                     Use high half of LDS word.
    ======================================== ================================

DPP8 Modifiers
--------------

GFX10 only.

.. _amdgpu_synid_dpp8_sel:

dpp8_sel
~~~~~~~~

Selects which lane to pull data from, within a group of 8 lanes. This is a mandatory modifier.
There is no default value.

GFX10 only.

The *dpp8_sel* modifier must specify exactly 8 values, each ranging from 0 to 7.
The first value selects the lane to read from to supply data to lane 0.
The second value does the same for lane 1, and so on.

    =============================================================== ===========================
    Syntax                                                          Description
    =============================================================== ===========================
    dpp8:[{0..7},{0..7},{0..7},{0..7},{0..7},{0..7},{0..7},{0..7}]  Select lanes to read from.
    =============================================================== ===========================

Examples:

.. parsed-literal::

  dpp8:[7,6,5,4,3,2,1,0]
  dpp8:[0,1,0,1,0,1,0,1]

.. _amdgpu_synid_fi8:

fi
~~

Controls interaction with inactive lanes for *dpp8* instructions. The default value is zero.

Note. *Inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.

GFX10 only.

    ==================================== =====================================================
    Syntax                               Description
    ==================================== =====================================================
    fi:0                                 Fetch zero when accessing data from inactive lanes.
    fi:1                                 Fetch pre-existing values from inactive lanes.
    ==================================== =====================================================

DPP/DPP16 Modifiers
-------------------

GFX8, GFX9 and GFX10 only.

.. _amdgpu_synid_dpp_ctrl:

dpp_ctrl
~~~~~~~~

Specifies how data are shared between threads. This is a mandatory modifier.
There is no default value.

GFX8 and GFX9 only. Use :ref:`dpp16_ctrl<amdgpu_synid_dpp16_ctrl>` for GFX10.

Note. The lanes of a wavefront are organized in four *rows* and four *banks*.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
    row_mirror                               Mirror threads within row.
    row_half_mirror                          Mirror threads within 1/2 row (8 threads).
    row_bcast:15                             Broadcast 15th thread of each row to next row.
    row_bcast:31                             Broadcast thread 31 to rows 2 and 3.
    wave_shl:1                               Wavefront left shift by 1 thread.
    wave_rol:1                               Wavefront left rotate by 1 thread.
    wave_shr:1                               Wavefront right shift by 1 thread.
    wave_ror:1                               Wavefront right rotate by 1 thread.
    row_shl:{1..15}                          Row shift left by 1-15 threads.
    row_shr:{1..15}                          Row shift right by 1-15 threads.
    row_ror:{1..15}                          Row rotate right by 1-15 threads.
    ======================================== ================================================

Note: Numeric parameters may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  quad_perm:[0, 1, 2, 3]
  row_shl:3

.. _amdgpu_synid_dpp16_ctrl:

dpp16_ctrl
~~~~~~~~~~

Specifies how data are shared between threads. This is a mandatory modifier.
There is no default value.

GFX10 only. Use :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` for GFX8 and GFX9.

Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
(There are only two rows in *wave32* mode.)

    ======================================== ====================================================
    Syntax                                   Description
    ======================================== ====================================================
    quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
    row_mirror                               Mirror threads within row.
    row_half_mirror                          Mirror threads within 1/2 row (8 threads).
    row_share:{0..15}                        Share the value from the specified lane with other
                                             lanes in the row.
    row_xmask:{0..15}                        Fetch from XOR(current lane id, specified lane id).
    row_shl:{1..15}                          Row shift left by 1-15 threads.
    row_shr:{1..15}                          Row shift right by 1-15 threads.
    row_ror:{1..15}                          Row rotate right by 1-15 threads.
    ======================================== ====================================================

Note: Numeric parameters may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  quad_perm:[0, 1, 2, 3]
  row_shl:3

.. _amdgpu_synid_row_mask:

row_mask
~~~~~~~~

Controls which rows are enabled for data sharing. By default, all rows are enabled.

Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
(There are only two rows in *wave32* mode.)

    ======================================== =====================================================
    Syntax                                   Description
    ======================================== =====================================================
    row_mask:{0..15}                         Specifies a *row mask* as a positive
                                             :ref:`integer number <amdgpu_synid_integer_number>`.

                                             Each of 4 bits in the mask controls one
                                             row (0 - disabled, 1 - enabled).

                                             In *wave32* mode the values should be limited to
                                             {0..7}.
    ======================================== =====================================================

Examples:

.. parsed-literal::

  row_mask:0xf
  row_mask:0b1010
  row_mask:0b1111

.. _amdgpu_synid_bank_mask:

bank_mask
~~~~~~~~~

Controls which banks are enabled for data sharing. By default, all banks are enabled.

Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
(There are only two rows in *wave32* mode.)

    ======================================== =======================================================
    Syntax                                   Description
    ======================================== =======================================================
    bank_mask:{0..15}                        Specifies a *bank mask* as a positive
                                             :ref:`integer number <amdgpu_synid_integer_number>`.

                                             Each of 4 bits in the mask controls one
                                             bank (0 - disabled, 1 - enabled).
    ======================================== =======================================================

Examples:

.. parsed-literal::

  bank_mask:0x3
  bank_mask:0b0011
  bank_mask:0b1111

.. _amdgpu_synid_bound_ctrl:

bound_ctrl
~~~~~~~~~~

Controls data sharing when accessing an invalid lane. By default, data sharing with
invalid lanes is disabled.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    bound_ctrl:0                             Enables data sharing with invalid lanes.

                                             Accessing data from an invalid lane will
                                             return zero.
    ======================================== ================================================

.. _amdgpu_synid_fi16:

fi
~~

Controls interaction with *inactive* lanes for *dpp16* instructions. The default value is zero.

Note. *Inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.

GFX10 only.

    ======================================== ==================================================
    Syntax                                   Description
    ======================================== ==================================================
    fi:0                                     Interaction with inactive lanes is controlled by
                                             :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`.

    fi:1                                     Fetch pre-existing values from inactive lanes.
    ======================================== ==================================================

SDWA Modifiers
--------------

GFX8, GFX9 and GFX10 only.

clamp
~~~~~

See a description :ref:`here<amdgpu_synid_clamp>`.

omod
~~~~

See a description :ref:`here<amdgpu_synid_omod>`.

GFX9 and GFX10 only.

.. _amdgpu_synid_dst_sel:

dst_sel
~~~~~~~

Selects which bits in the destination are affected. By default, all bits are affected.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    dst_sel:DWORD                            Use bits 31:0.
    dst_sel:BYTE_0                           Use bits 7:0.
    dst_sel:BYTE_1                           Use bits 15:8.
    dst_sel:BYTE_2                           Use bits 23:16.
    dst_sel:BYTE_3                           Use bits 31:24.
    dst_sel:WORD_0                           Use bits 15:0.
    dst_sel:WORD_1                           Use bits 31:16.
    ======================================== ================================================


.. _amdgpu_synid_dst_unused:

dst_unused
~~~~~~~~~~

Controls what to do with the bits in the destination which are not selected
by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
By default, unused bits are preserved.
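
As an illustration of the default behavior, the following Python sketch models
how a result could be written into the bits selected by *dst_sel* while the
remaining destination bits are preserved. It is a simplified model under that
assumption, not a description of the hardware implementation; the names
``DST_SEL_BITS`` and ``write_preserve`` are hypothetical.

.. code-block:: python

    # (high bit, low bit) for each dst_sel value, taken from the dst_sel table.
    DST_SEL_BITS = {
        "DWORD": (31, 0),
        "BYTE_0": (7, 0),
        "BYTE_1": (15, 8),
        "BYTE_2": (23, 16),
        "BYTE_3": (31, 24),
        "WORD_0": (15, 0),
        "WORD_1": (31, 16),
    }


    def write_preserve(old, result, dst_sel="DWORD"):
        """Place the low bits of result into the selected field of old,
        leaving all other bits of old unchanged (assumed model)."""
        high, low = DST_SEL_BITS[dst_sel]
        width = high - low + 1
        field_mask = ((1 << width) - 1) << low
        kept = old & ~field_mask & 0xFFFFFFFF
        return kept | ((result << low) & field_mask)


    print(hex(write_preserve(0xAABBCCDD, 0x11, "BYTE_1")))  # 0xaabb11dd
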

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    dst_unused:UNUSED_PAD                    Pad with zeros.
    dst_unused:UNUSED_SEXT                   Sign-extend upper bits, zero lower bits.
    dst_unused:UNUSED_PRESERVE               Preserve bits.
    ======================================== ================================================

.. _amdgpu_synid_src0_sel:

src0_sel
~~~~~~~~

Controls which bits in the src0 are used. By default, all bits are used.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    src0_sel:DWORD                           Use bits 31:0.
    src0_sel:BYTE_0                          Use bits 7:0.
    src0_sel:BYTE_1                          Use bits 15:8.
    src0_sel:BYTE_2                          Use bits 23:16.
    src0_sel:BYTE_3                          Use bits 31:24.
    src0_sel:WORD_0                          Use bits 15:0.
    src0_sel:WORD_1                          Use bits 31:16.
    ======================================== ================================================

.. _amdgpu_synid_src1_sel:

src1_sel
~~~~~~~~

Controls which bits in the src1 are used. By default, all bits are used.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    src1_sel:DWORD                           Use bits 31:0.
    src1_sel:BYTE_0                          Use bits 7:0.
    src1_sel:BYTE_1                          Use bits 15:8.
    src1_sel:BYTE_2                          Use bits 23:16.
    src1_sel:BYTE_3                          Use bits 31:24.
    src1_sel:WORD_0                          Use bits 15:0.
    src1_sel:WORD_1                          Use bits 31:16.
    ======================================== ================================================

.. _amdgpu_synid_sdwa_operand_modifiers:

SDWA Operand Modifiers
----------------------

Operand modifiers are not used separately. They are applied to source operands.

GFX8, GFX9 and GFX10 only.

abs
~~~

See a description :ref:`here<amdgpu_synid_abs>`.

neg
~~~

See a description :ref:`here<amdgpu_synid_neg>`.

.. _amdgpu_synid_sext:

sext
~~~~

Sign-extends value of a (sub-dword) operand to fill all 32 bits.
Has no effect for 32-bit operands.

Valid for integer operands only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    sext(<operand>)                          Sign-extend operand value.
    ======================================== ================================================

Examples:

.. parsed-literal::

    sext(v4)
    sext(v255)

VOP3 Modifiers
--------------

.. _amdgpu_synid_vop3_op_sel:

op_sel
~~~~~~

Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
By default, low bits are used for all operands.

The number of values specified with the op_sel modifier must match the number of instruction
operands (both source and destination). First value controls src0, second value controls src1
and so on, except that the last value controls destination.
The value 0 selects the low bits, while 1 selects the high bits.

Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
by op_sel must be 0.

GFX9 and GFX10 only.
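
The mapping of *op_sel* values to operands described above can be sketched in
Python as follows. This is an illustration only and not part of the assembler:
the function name ``vop3_op_sel_map`` and its ``num_sources`` parameter are
hypothetical.

.. code-block:: python

    def vop3_op_sel_map(op_sel, num_sources):
        """Map VOP3 op_sel values to the operand halves they select.

        Hypothetical helper: one value per source operand, followed by
        one final value that controls the destination.
        """
        if len(op_sel) != num_sources + 1:
            raise ValueError("op_sel needs one value per source plus one for dst")
        if any(value not in (0, 1) for value in op_sel):
            raise ValueError("each op_sel value must be 0 or 1")
        # src0, src1, ... come first; the last value is the destination.
        names = [f"src{i}" for i in range(num_sources)] + ["dst"]
        return {
            name: "high [31:16]" if value else "low [15:0]"
            for name, value in zip(names, op_sel)
        }

    print(vop3_op_sel_map([0, 1, 1], 2))

Here the call models ``op_sel:[0,1,1]`` on an instruction with 2 source operands:
src0 uses its low bits, while src1 and the destination use their high bits.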

    ======================================== ============================================================
    Syntax                                   Description
    ======================================== ============================================================
    op_sel:[{0..1},{0..1}]                   Select operand bits for instructions with 1 source operand.
    op_sel:[{0..1},{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
    op_sel:[{0..1},{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
    ======================================== ============================================================

Examples:

.. parsed-literal::

    op_sel:[0,0]
    op_sel:[0,1]

.. _amdgpu_synid_clamp:

clamp
~~~~~

The meaning of clamp depends on the instruction.

For *v_cmp* instructions, the clamp modifier makes the compare signal
when a floating point exception occurs. By default, signaling is disabled.
Not supported by GFX7.

For integer operations, the clamp modifier indicates that the result must be clamped
to the range between the smallest and largest representable values. By default, there
is no clamping. Integer clamping is not supported by GFX7.

For floating point operations, the clamp modifier indicates that the result must be clamped
to the range [0.0, 1.0]. By default, there is no clamping.

Note. Clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    clamp                                    Enables clamping (or signaling).
    ======================================== ================================================

.. _amdgpu_synid_omod:

omod
~~~~

Specifies whether an output modifier must be applied to the result.
By default, no output modifiers are applied.

Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).

Output modifiers are valid for f32 and f64 floating point results only.
They must not be used with f16.

Note. *v_cvt_f16_f32* is an exception. This instruction produces an f16 result
but accepts output modifiers.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    mul:2                                    Multiply the result by 2.
    mul:4                                    Multiply the result by 4.
    div:2                                    Multiply the result by 0.5.
    ======================================== ================================================

.. _amdgpu_synid_vop3_operand_modifiers:

VOP3 Operand Modifiers
----------------------

Operand modifiers are not used separately. They are applied to source operands.

.. _amdgpu_synid_abs:

abs
~~~

Computes absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any).
Valid for floating point operands only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    abs(<operand>)                           Get absolute value of operand.
    \|<operand>|                             The same as above.
    ======================================== ================================================

Examples:

.. parsed-literal::

    abs(v36)
    \|v36|

.. _amdgpu_synid_neg:

neg
~~~

Computes negative value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any).
Valid for floating point operands only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    neg(<operand>)                           Get negative value of operand.
    -<operand>                               The same as above.
    ======================================== ================================================

Examples:

.. parsed-literal::

    neg(v[0])
    -v4

VOP3P Modifiers
---------------

This section describes modifiers of *regular* VOP3P instructions.

*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16*
instructions use these modifiers :ref:`in a special manner<amdgpu_synid_mad_mix>`.

GFX9 and GFX10 only.

.. _amdgpu_synid_op_sel:

op_sel
~~~~~~

Selects the low [15:0] or high [31:16] operand bits as input to the operation
which results in the lower-half of the destination.
By default, low bits are used for all operands.

The number of values specified by the *op_sel* modifier must match the number of source
operands. First value controls src0, second value controls src1 and so on.

The value 0 selects the low bits, while 1 selects the high bits.

    ======================================== ============================================================
    Syntax                                   Description
    ======================================== ============================================================
    op_sel:[{0..1}]                          Select operand bits for instructions with 1 source operand.
    op_sel:[{0..1},{0..1}]                   Select operand bits for instructions with 2 source operands.
    op_sel:[{0..1},{0..1},{0..1}]            Select operand bits for instructions with 3 source operands.
    ======================================== ============================================================

Examples:

.. parsed-literal::

    op_sel:[0,0]
    op_sel:[0,1,0]

.. _amdgpu_synid_op_sel_hi:

op_sel_hi
~~~~~~~~~

Selects the low [15:0] or high [31:16] operand bits as input to the operation
which results in the upper-half of the destination.
By default, high bits are used for all operands.

The number of values specified by the *op_sel_hi* modifier must match the number of source
operands. First value controls src0, second value controls src1 and so on.

The value 0 selects the low bits, while 1 selects the high bits.

    ======================================== ============================================================
    Syntax                                   Description
    ======================================== ============================================================
    op_sel_hi:[{0..1}]                       Select operand bits for instructions with 1 source operand.
    op_sel_hi:[{0..1},{0..1}]                Select operand bits for instructions with 2 source operands.
    op_sel_hi:[{0..1},{0..1},{0..1}]         Select operand bits for instructions with 3 source operands.
    ======================================== ============================================================

Examples:

.. parsed-literal::

    op_sel_hi:[0,0]
    op_sel_hi:[0,0,1]

.. _amdgpu_synid_neg_lo:

neg_lo
~~~~~~

Specifies whether to change sign of operand values selected by
:ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
as input to the operation which results in the lower-half of the destination.

The number of values specified by this modifier must match the number of source
operands. First value controls src0, second value controls src1 and so on.

The value 0 indicates that the corresponding operand value is used unmodified,
the value 1 indicates that negative value of the operand must be used.

By default, operand values are used unmodified.

This modifier is valid for floating point operands only.

    ======================================== ==================================================================
    Syntax                                   Description
    ======================================== ==================================================================
    neg_lo:[{0..1}]                          Select affected operands for instructions with 1 source operand.
    neg_lo:[{0..1},{0..1}]                   Select affected operands for instructions with 2 source operands.
    neg_lo:[{0..1},{0..1},{0..1}]            Select affected operands for instructions with 3 source operands.
    ======================================== ==================================================================

Examples:

.. parsed-literal::

    neg_lo:[0]
    neg_lo:[0,1]

.. _amdgpu_synid_neg_hi:

neg_hi
~~~~~~

Specifies whether to change sign of operand values selected by
:ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
as input to the operation which results in the upper-half of the destination.

The number of values specified by this modifier must match the number of source
operands. First value controls src0, second value controls src1 and so on.

The value 0 indicates that the corresponding operand value is used unmodified,
the value 1 indicates that negative value of the operand must be used.

By default, operand values are used unmodified.

This modifier is valid for floating point operands only.

    ======================================== ==================================================================
    Syntax                                   Description
    ======================================== ==================================================================
    neg_hi:[{0..1}]                          Select affected operands for instructions with 1 source operand.
    neg_hi:[{0..1},{0..1}]                   Select affected operands for instructions with 2 source operands.
    neg_hi:[{0..1},{0..1},{0..1}]            Select affected operands for instructions with 3 source operands.
    ======================================== ==================================================================

Examples:

.. parsed-literal::

    neg_hi:[1,0]
    neg_hi:[0,1,1]

clamp
~~~~~

See a description :ref:`here<amdgpu_synid_clamp>`.

.. _amdgpu_synid_mad_mix:

VOP3P V_MAD_MIX Modifiers
-------------------------

*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16* instructions
use *op_sel* and *op_sel_hi* modifiers
in a manner different from *regular* VOP3P instructions.

See a description below.

GFX9 and GFX10 only.

.. _amdgpu_synid_mad_mix_op_sel:

m_op_sel
~~~~~~~~

This modifier has meaning only for 16-bit source operands as indicated by
:ref:`m_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
It selects either the low [15:0] or high [31:16] operand bits
as input to the operation.

The number of values specified by the *op_sel* modifier must match the number of source
operands. First value controls src0, second value controls src1 and so on.

The value 0 indicates the low bits, the value 1 indicates the high 16 bits.

By default, low bits are used for all operands.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    op_sel:[{0..1},{0..1},{0..1}]            Select location of each 16-bit source operand.
    ======================================== ================================================

Examples:

.. parsed-literal::

    op_sel:[0,1]

.. _amdgpu_synid_mad_mix_op_sel_hi:

m_op_sel_hi
~~~~~~~~~~~

Selects the size of source operands: either 32 bits or 16 bits.
By default, 32 bits are used for all source operands.

The number of values specified by the *op_sel_hi* modifier must match the number of source
operands. First value controls src0, second value controls src1 and so on.

The value 0 indicates 32 bits, the value 1 indicates 16 bits.

The location of 16 bits in the operand may be specified by
:ref:`m_op_sel<amdgpu_synid_mad_mix_op_sel>`.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    op_sel_hi:[{0..1},{0..1},{0..1}]         Select size of each source operand.
    ======================================== ================================================

Examples:

.. parsed-literal::

    op_sel_hi:[1,1,1]

abs
~~~

See a description :ref:`here<amdgpu_synid_abs>`.

neg
~~~

See a description :ref:`here<amdgpu_synid_neg>`.

clamp
~~~~~

See a description :ref:`here<amdgpu_synid_clamp>`.
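
How *op_sel* and *op_sel_hi* combine for the *v_mad_mix* family can be sketched in
Python as follows. This is an illustration of the rules in this section only, not the
hardware or assembler implementation; the function names and the register values
are hypothetical.

.. code-block:: python

    def mad_mix_source_bits(reg32, op_sel_bit, op_sel_hi_bit):
        """Return (width, raw_bits) read from one 32-bit source register."""
        if op_sel_hi_bit == 0:
            # op_sel_hi = 0: the operand is 32 bits wide; op_sel has no meaning.
            return 32, reg32 & 0xFFFFFFFF
        # op_sel_hi = 1: the operand is 16 bits wide; op_sel picks its location.
        shift = 16 if op_sel_bit else 0
        return 16, (reg32 >> shift) & 0xFFFF


    def mad_mix_sources(regs, op_sel=(0, 0, 0), op_sel_hi=(0, 0, 0)):
        """Apply the modifiers to all three source operands.

        The defaults mirror the documented ones: low bits, 32-bit operands.
        """
        if not (len(regs) == len(op_sel) == len(op_sel_hi) == 3):
            raise ValueError("expected one value per source operand (3)")
        return [
            mad_mix_source_bits(reg, sel, sel_hi)
            for reg, sel, sel_hi in zip(regs, op_sel, op_sel_hi)
        ]


    # Made-up register contents, used only to show which bits are picked.
    regs = [0x3C004000, 0x3F800000, 0xBC003800]
    for width, bits in mad_mix_sources(regs, op_sel=(0, 0, 1), op_sel_hi=(1, 0, 1)):
        print(width, hex(bits))

With ``op_sel:[0,0,1] op_sel_hi:[1,0,1]`` the sketch reads the low 16 bits of src0,
all 32 bits of src1, and the high 16 bits of src2.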