======================================
Syntax of AMDGPU Instruction Modifiers
======================================

.. contents::
   :local:

Conventions
===========

The following notation is used throughout this document:

    =================== =============================================================
    Notation            Description
    =================== =============================================================
    {0..N}              Any integer value in the range from 0 to N (inclusive).
    <x>                 Syntax and meaning of *x* is explained elsewhere.
    =================== =============================================================

.. _amdgpu_syn_modifiers:

Modifiers
=========

DS Modifiers
------------

.. _amdgpu_synid_ds_offset80:

offset0
~~~~~~~

Specifies the first 8-bit offset, in bytes. The default value is 0.

Used with DS instructions that expect two addresses.

    =================== ====================================================================
    Syntax              Description
    =================== ====================================================================
    offset0:{0..0xFF}   Specifies an unsigned 8-bit offset as a positive
                        :ref:`integer number <amdgpu_synid_integer_number>`
                        or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    =================== ====================================================================

Examples:

.. parsed-literal::

  offset0:0xff
  offset0:2-x
  offset0:-x-y

.. _amdgpu_synid_ds_offset81:

offset1
~~~~~~~

Specifies the second 8-bit offset, in bytes. The default value is 0.

Used with DS instructions that expect two addresses.


    =================== ====================================================================
    Syntax              Description
    =================== ====================================================================
    offset1:{0..0xFF}   Specifies an unsigned 8-bit offset as a positive
                        :ref:`integer number <amdgpu_synid_integer_number>`
                        or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    =================== ====================================================================

Examples:

.. parsed-literal::

  offset1:0xff
  offset1:2-x
  offset1:-x-y

.. _amdgpu_synid_ds_offset16:

offset
~~~~~~

Specifies a 16-bit offset, in bytes. The default value is 0.

Used with DS instructions that expect a single address.

    ==================== ====================================================================
    Syntax               Description
    ==================== ====================================================================
    offset:{0..0xFFFF}   Specifies an unsigned 16-bit offset as a positive
                         :ref:`integer number <amdgpu_synid_integer_number>`
                         or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    ==================== ====================================================================

Examples:

.. parsed-literal::

  offset:65535
  offset:0xffff
  offset:-x-y

.. _amdgpu_synid_sw_offset16:

swizzle pattern
~~~~~~~~~~~~~~~

This is a special modifier which may be used with the *ds_swizzle_b32* instruction only.
It specifies a swizzle pattern in numeric or symbolic form. The default value is 0.

See AMD documentation for more information.

    ======================================================= ===========================================================
    Syntax                                                  Description
    ======================================================= ===========================================================
    offset:{0..0xFFFF}                                      Specifies a 16-bit swizzle pattern.
    offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3})   Specifies a quad permute mode pattern.

                                                            Each number is a lane *id*.
    offset:swizzle(BITMASK_PERM, "<mask>")                  Specifies a bitmask permute mode pattern.

                                                            The pattern converts a 5-bit lane *id* to another
                                                            lane *id* with which the lane interacts.

                                                            *mask* is a 5 character sequence which
                                                            specifies how to transform the bits of the
                                                            lane *id*.

                                                            The following characters are allowed:

                                                            * "0" - set bit to 0.

                                                            * "1" - set bit to 1.

                                                            * "p" - preserve bit.

                                                            * "i" - invert bit.

    offset:swizzle(BROADCAST,{2..32},{0..N})                Specifies a broadcast mode.

                                                            Broadcasts the value of any particular lane to
                                                            all lanes in its group.

                                                            The first numeric parameter is a group
                                                            size and must be equal to 2, 4, 8, 16 or 32.

                                                            The second numeric parameter is an index of the
                                                            lane to broadcast.

                                                            The index must be less than the group size.
    offset:swizzle(SWAP,{1..16})                            Specifies a swap mode.

                                                            Swaps the neighboring groups of
                                                            1, 2, 4, 8 or 16 lanes.
    offset:swizzle(REVERSE,{2..32})                         Specifies a reverse mode.

                                                            Reverses the lanes for groups of 2, 4, 8, 16 or 32 lanes.
    ======================================================= ===========================================================

Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  offset:255
  offset:0xffff
  offset:swizzle(QUAD_PERM, 0, 1, 2, 3)
  offset:swizzle(BITMASK_PERM, "01pi0")
  offset:swizzle(BROADCAST, 2, 0)
  offset:swizzle(SWAP, 8)
  offset:swizzle(REVERSE, 30 + 2)

.. _amdgpu_synid_gds:

gds
~~~

Specifies whether to use GDS or LDS memory (LDS is the default).


    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    gds                                      Use GDS memory.
    ======================================== ================================================


EXP Modifiers
-------------

.. _amdgpu_synid_done:

done
~~~~

Specifies if this is the last export from the shader to the target. By default,
the *exp* instruction does not finish an export sequence.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    done                                     Indicates the last export operation.
    ======================================== ================================================

.. _amdgpu_synid_compr:

compr
~~~~~

Indicates if the data are compressed (data are not compressed by default).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    compr                                    Data are compressed.
    ======================================== ================================================

.. _amdgpu_synid_vm:

vm
~~

Specifies the state of the valid mask flag (off by default).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    vm                                       Set valid mask flag.
    ======================================== ================================================

FLAT Modifiers
--------------

.. _amdgpu_synid_flat_offset12:

offset12
~~~~~~~~

Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.

Cannot be used with *global/scratch* opcodes. GFX9 only.

    ================= ====================================================================
    Syntax            Description
    ================= ====================================================================
    offset:{0..4095}  Specifies a 12-bit unsigned offset as a positive
                      :ref:`integer number <amdgpu_synid_integer_number>`
                      or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    ================= ====================================================================

Examples:

.. parsed-literal::

  offset:4095
  offset:x-0xff

.. _amdgpu_synid_flat_offset13s:

offset13s
~~~~~~~~~

Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.

Can be used with *global/scratch* opcodes only. GFX9 only.

    ===================== ====================================================================
    Syntax                Description
    ===================== ====================================================================
    offset:{-4096..4095}  Specifies a 13-bit signed offset as an
                          :ref:`integer number <amdgpu_synid_integer_number>`
                          or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    ===================== ====================================================================

Examples:

.. parsed-literal::

  offset:-4000
  offset:0x10
  offset:-x

.. _amdgpu_synid_flat_offset12s:

offset12s
~~~~~~~~~

Specifies an immediate signed 12-bit offset, in bytes. The default value is 0.

Can be used with *global/scratch* opcodes only.

GFX10 only.


    ===================== ====================================================================
    Syntax                Description
    ===================== ====================================================================
    offset:{-2048..2047}  Specifies a 12-bit signed offset as an
                          :ref:`integer number <amdgpu_synid_integer_number>`
                          or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    ===================== ====================================================================

Examples:

.. parsed-literal::

  offset:-2000
  offset:0x10
  offset:-x+y

.. _amdgpu_synid_flat_offset11:

offset11
~~~~~~~~

Specifies an immediate unsigned 11-bit offset, in bytes. The default value is 0.

Cannot be used with *global/scratch* opcodes.

GFX10 only.

    ================= ====================================================================
    Syntax            Description
    ================= ====================================================================
    offset:{0..2047}  Specifies an 11-bit unsigned offset as a positive
                      :ref:`integer number <amdgpu_synid_integer_number>`
                      or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    ================= ====================================================================

Examples:

.. parsed-literal::

  offset:2047
  offset:x+0xff

dlc
~~~

See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

lds
~~~

See a description :ref:`here<amdgpu_synid_lds>`. GFX10 only.

slc
~~~

See a description :ref:`here<amdgpu_synid_slc>`.

tfe
~~~

See a description :ref:`here<amdgpu_synid_tfe>`.

nv
~~

See a description :ref:`here<amdgpu_synid_nv>`.

MIMG Modifiers
--------------

.. _amdgpu_synid_dmask:

dmask
~~~~~

Specifies which channels (image components) are used by the operation. By default, no channels
are used.

    =============== ====================================================================
    Syntax          Description
    =============== ====================================================================
    dmask:{0..15}   Specifies image channels as a positive
                    :ref:`integer number <amdgpu_synid_integer_number>`
                    or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.

                    Each bit corresponds to one of 4 image components (RGBA).

                    A bit value of 0 means that the component is not used;
                    a bit value of 1 means that the component is used.
    =============== ====================================================================

The set of valid values depends on the kind of instruction:

    =================================================== ========================
    Instruction Kind                                    Valid dmask Values
    =================================================== ========================
    32-bit atomic *cmpswap*                             0x3
    32-bit atomic instructions except for *cmpswap*     0x1
    64-bit atomic *cmpswap*                             0xF
    64-bit atomic instructions except for *cmpswap*     0x3
    *gather4*                                           0x1, 0x2, 0x4, 0x8
    Other instructions                                  any value
    =================================================== ========================

Examples:

.. parsed-literal::

  dmask:0xf
  dmask:0b1111
  dmask:x|y|z

.. _amdgpu_synid_unorm:

unorm
~~~~~

Specifies whether the address is normalized or not (the address is normalized by default).

    ======================== ========================================
    Syntax                   Description
    ======================== ========================================
    unorm                    Force the address to be unnormalized.
    ======================== ========================================

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

slc
~~~

See a description :ref:`here<amdgpu_synid_slc>`.

.. _amdgpu_synid_r128:

r128
~~~~

Specifies texture resource size. The default size is 256 bits.

GFX7, GFX8 and GFX10 only.

    =================== ================================================
    Syntax              Description
    =================== ================================================
    r128                Specifies a 128-bit texture resource size.
    =================== ================================================

.. WARNING:: Using this modifier should decrease *rsrc* operand size from 8 to 4 dwords, but the assembler does not currently support this feature.

tfe
~~~

See a description :ref:`here<amdgpu_synid_tfe>`.

.. _amdgpu_synid_lwe:

lwe
~~~

Specifies LOD warning status (LOD warning is disabled by default).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    lwe                                      Enables LOD warning.
    ======================================== ================================================

.. _amdgpu_synid_da:

da
~~

Specifies if an array index must be sent to TA. By default, the array index is not sent.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    da                                       Send an array index to TA.
    ======================================== ================================================

.. _amdgpu_synid_d16:

d16
~~~

Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.


    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    d16                                      Enables 16-bit data mode.

                                             On loads, convert data in memory to 16-bit
                                             format before storing it in VGPRs.

                                             For stores, convert 16-bit data in VGPRs to
                                             32 bits before going to memory.

                                             Note that GFX8.0 does not support data packing.
                                             Each 16-bit data element occupies 1 VGPR.

                                             GFX8.1, GFX9 and GFX10 support data packing.
                                             Each pair of 16-bit data elements
                                             occupies 1 VGPR.
    ======================================== ================================================

.. _amdgpu_synid_a16:

a16
~~~

Specifies size of image address components: 16 or 32 bits (32 bits by default).
GFX9 and GFX10 only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    a16                                      Enables 16-bit image address components.
    ======================================== ================================================

.. _amdgpu_synid_dim:

dim
~~~

Specifies surface dimension. This is a mandatory modifier. There is no default value.

GFX10 only.

    =============================== =========================================================
    Syntax                          Description
    =============================== =========================================================
    dim:1D                          One-dimensional image.
    dim:2D                          Two-dimensional image.
    dim:3D                          Three-dimensional image.
    dim:CUBE                        Cubemap array.
    dim:1D_ARRAY                    One-dimensional image array.
    dim:2D_ARRAY                    Two-dimensional image array.
    dim:2D_MSAA                     Two-dimensional multi-sample anti-aliasing image.
    dim:2D_MSAA_ARRAY               Two-dimensional multi-sample anti-aliasing image array.
    =============================== =========================================================

The following table defines an alternative syntax which is supported
for compatibility with the SP3 assembler:

    =============================== =========================================================
    Syntax                          Description
    =============================== =========================================================
    dim:SQ_RSRC_IMG_1D              One-dimensional image.
    dim:SQ_RSRC_IMG_2D              Two-dimensional image.
    dim:SQ_RSRC_IMG_3D              Three-dimensional image.
    dim:SQ_RSRC_IMG_CUBE            Cubemap array.
    dim:SQ_RSRC_IMG_1D_ARRAY        One-dimensional image array.
    dim:SQ_RSRC_IMG_2D_ARRAY        Two-dimensional image array.
    dim:SQ_RSRC_IMG_2D_MSAA         Two-dimensional multi-sample anti-aliasing image.
    dim:SQ_RSRC_IMG_2D_MSAA_ARRAY   Two-dimensional multi-sample anti-aliasing image array.
    =============================== =========================================================

dlc
~~~

See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.

Miscellaneous Modifiers
-----------------------

.. _amdgpu_synid_dlc:

dlc
~~~

Controls device level cache policy for memory operations. Used for synchronization.
When specified, forces the operation to bypass the device level cache, making the operation
device level coherent. By default, instructions use the device level cache.

GFX10 only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    dlc                                      Bypass device level cache.
    ======================================== ================================================

.. _amdgpu_synid_glc:

glc
~~~

This modifier has a different meaning for loads, stores, and atomic operations.
The default value is off (0).

See AMD documentation for details.


    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    glc                                      Set glc bit to 1.
    ======================================== ================================================

.. _amdgpu_synid_lds:

lds
~~~

Specifies where to store the result: VGPRs or LDS (VGPRs by default).

    ======================================== ===========================
    Syntax                                   Description
    ======================================== ===========================
    lds                                      Store result in LDS.
    ======================================== ===========================

.. _amdgpu_synid_nv:

nv
~~

Specifies whether the instruction operates on non-volatile memory. By default, memory is volatile.

GFX9 only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    nv                                       Indicates that instruction operates on
                                             non-volatile memory.
    ======================================== ================================================

.. _amdgpu_synid_slc:

slc
~~~

Specifies cache policy. The default value is off (0).

See AMD documentation for details.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    slc                                      Set slc bit to 1.
    ======================================== ================================================

.. _amdgpu_synid_tfe:

tfe
~~~

Controls access to partially resident textures. The default value is off (0).

See AMD documentation for details.


    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    tfe                                      Set tfe bit to 1.
    ======================================== ================================================

MUBUF/MTBUF Modifiers
---------------------

.. _amdgpu_synid_idxen:

idxen
~~~~~

Specifies whether address components include an index. By default, no components are used.

Can be used together with :ref:`offen<amdgpu_synid_offen>`.

Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    idxen                                    Address components include an index.
    ======================================== ================================================

.. _amdgpu_synid_offen:

offen
~~~~~

Specifies whether address components include an offset. By default, no components are used.

Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.

Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    offen                                    Address components include an offset.
    ======================================== ================================================

.. _amdgpu_synid_addr64:

addr64
~~~~~~

Specifies whether a 64-bit address is used. By default, no address is used.

GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and
:ref:`idxen<amdgpu_synid_idxen>` modifiers.


    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    addr64                                   A 64-bit address is used.
    ======================================== ================================================

.. _amdgpu_synid_buf_offset12:

offset12
~~~~~~~~

Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.

    ================== ====================================================================
    Syntax             Description
    ================== ====================================================================
    offset:{0..0xFFF}  Specifies a 12-bit unsigned offset as a positive
                       :ref:`integer number <amdgpu_synid_integer_number>`
                       or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    ================== ====================================================================

Examples:

.. parsed-literal::

  offset:x+y
  offset:0x10

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

slc
~~~

See a description :ref:`here<amdgpu_synid_slc>`.

lds
~~~

See a description :ref:`here<amdgpu_synid_lds>`.

dlc
~~~

See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.

tfe
~~~

See a description :ref:`here<amdgpu_synid_tfe>`.

.. _amdgpu_synid_fmt:

fmt
~~~

Specifies data and numeric formats used by the operation.
The default numeric format is BUF_NUM_FORMAT_UNORM.
The default data format is BUF_DATA_FORMAT_8.

769 770 ========================================= =============================================================== 771 Syntax Description 772 ========================================= =============================================================== 773 format:{0..127} Use format specified as either an 774 :ref:`integer number<amdgpu_synid_integer_number>` or an 775 :ref:`absolute expression<amdgpu_synid_absolute_expression>`. 776 format:[<data format>] Use the specified data format and 777 default numeric format. 778 format:[<numeric format>] Use the specified numeric format and 779 default data format. 780 format:[<data format>, <numeric format>] Use the specified data and numeric formats. 781 format:[<numeric format>, <data format>] Use the specified data and numeric formats. 782 ========================================= =============================================================== 783 784.. _amdgpu_synid_format_data: 785 786Supported data formats are defined in the following table: 787 788 ========================================= =============================== 789 Syntax Note 790 ========================================= =============================== 791 BUF_DATA_FORMAT_INVALID 792 BUF_DATA_FORMAT_8 Default value. 793 BUF_DATA_FORMAT_16 794 BUF_DATA_FORMAT_8_8 795 BUF_DATA_FORMAT_32 796 BUF_DATA_FORMAT_16_16 797 BUF_DATA_FORMAT_10_11_11 798 BUF_DATA_FORMAT_11_11_10 799 BUF_DATA_FORMAT_10_10_10_2 800 BUF_DATA_FORMAT_2_10_10_10 801 BUF_DATA_FORMAT_8_8_8_8 802 BUF_DATA_FORMAT_32_32 803 BUF_DATA_FORMAT_16_16_16_16 804 BUF_DATA_FORMAT_32_32_32 805 BUF_DATA_FORMAT_32_32_32_32 806 BUF_DATA_FORMAT_RESERVED_15 807 ========================================= =============================== 808 809.. 
_amdgpu_synid_format_num: 810 811Supported numeric formats are defined below: 812 813 ========================================= =============================== 814 Syntax Note 815 ========================================= =============================== 816 BUF_NUM_FORMAT_UNORM Default value. 817 BUF_NUM_FORMAT_SNORM 818 BUF_NUM_FORMAT_USCALED 819 BUF_NUM_FORMAT_SSCALED 820 BUF_NUM_FORMAT_UINT 821 BUF_NUM_FORMAT_SINT 822 BUF_NUM_FORMAT_SNORM_OGL GFX7 only. 823 BUF_NUM_FORMAT_RESERVED_6 GFX8 and GFX9 only. 824 BUF_NUM_FORMAT_FLOAT 825 ========================================= =============================== 826 827Examples: 828 829.. parsed-literal:: 830 831 format:0 832 format:127 833 format:[BUF_DATA_FORMAT_16] 834 format:[BUF_DATA_FORMAT_16,BUF_NUM_FORMAT_SSCALED] 835 format:[BUF_NUM_FORMAT_FLOAT] 836 837.. _amdgpu_synid_ufmt: 838 839ufmt 840~~~~ 841 842Specifies a unified format used by the operation. 843The default format is BUF_FMT_8_UNORM. 844GFX10 only. 845 846 ========================================= =============================================================== 847 Syntax Description 848 ========================================= =============================================================== 849 format:{0..127} Use unified format specified as either an 850 :ref:`integer number<amdgpu_synid_integer_number>` or an 851 :ref:`absolute expression<amdgpu_synid_absolute_expression>`. 852 Note that unified format numbers are not compatible with 853 format numbers used for pre-GFX10 ISA. 854 format:[<unified format>] Use the specified unified format. 855 ========================================= =============================================================== 856 857Unified format is a replacement for :ref:`data<amdgpu_synid_format_data>` 858and :ref:`numeric<amdgpu_synid_format_num>` formats. 
For compatibility with older ISA,
:ref:`syntax with data and numeric formats<amdgpu_synid_fmt>` is still accepted
provided that the combination of formats can be mapped to a unified format.

Supported unified formats and equivalent combinations of data and numeric formats
are defined below:

    ============================== ============================== =============================
    Syntax                         Equivalent Data Format         Equivalent Numeric Format
    ============================== ============================== =============================
    BUF_FMT_INVALID                BUF_DATA_FORMAT_INVALID        BUF_NUM_FORMAT_UNORM

    BUF_FMT_8_UNORM                BUF_DATA_FORMAT_8              BUF_NUM_FORMAT_UNORM
    BUF_FMT_8_SNORM                BUF_DATA_FORMAT_8              BUF_NUM_FORMAT_SNORM
    BUF_FMT_8_USCALED              BUF_DATA_FORMAT_8              BUF_NUM_FORMAT_USCALED
    BUF_FMT_8_SSCALED              BUF_DATA_FORMAT_8              BUF_NUM_FORMAT_SSCALED
    BUF_FMT_8_UINT                 BUF_DATA_FORMAT_8              BUF_NUM_FORMAT_UINT
    BUF_FMT_8_SINT                 BUF_DATA_FORMAT_8              BUF_NUM_FORMAT_SINT

    BUF_FMT_16_UNORM               BUF_DATA_FORMAT_16             BUF_NUM_FORMAT_UNORM
    BUF_FMT_16_SNORM               BUF_DATA_FORMAT_16             BUF_NUM_FORMAT_SNORM
    BUF_FMT_16_USCALED             BUF_DATA_FORMAT_16             BUF_NUM_FORMAT_USCALED
    BUF_FMT_16_SSCALED             BUF_DATA_FORMAT_16             BUF_NUM_FORMAT_SSCALED
    BUF_FMT_16_UINT                BUF_DATA_FORMAT_16             BUF_NUM_FORMAT_UINT
    BUF_FMT_16_SINT                BUF_DATA_FORMAT_16             BUF_NUM_FORMAT_SINT
    BUF_FMT_16_FLOAT               BUF_DATA_FORMAT_16             BUF_NUM_FORMAT_FLOAT

    BUF_FMT_8_8_UNORM              BUF_DATA_FORMAT_8_8            BUF_NUM_FORMAT_UNORM
    BUF_FMT_8_8_SNORM              BUF_DATA_FORMAT_8_8            BUF_NUM_FORMAT_SNORM
    BUF_FMT_8_8_USCALED            BUF_DATA_FORMAT_8_8            BUF_NUM_FORMAT_USCALED
    BUF_FMT_8_8_SSCALED            BUF_DATA_FORMAT_8_8            BUF_NUM_FORMAT_SSCALED
    BUF_FMT_8_8_UINT               BUF_DATA_FORMAT_8_8            BUF_NUM_FORMAT_UINT
    BUF_FMT_8_8_SINT               BUF_DATA_FORMAT_8_8            BUF_NUM_FORMAT_SINT

    BUF_FMT_32_UINT                BUF_DATA_FORMAT_32             BUF_NUM_FORMAT_UINT
    BUF_FMT_32_SINT                BUF_DATA_FORMAT_32             BUF_NUM_FORMAT_SINT
    BUF_FMT_32_FLOAT               BUF_DATA_FORMAT_32             BUF_NUM_FORMAT_FLOAT

    BUF_FMT_16_16_UNORM            BUF_DATA_FORMAT_16_16          BUF_NUM_FORMAT_UNORM
    BUF_FMT_16_16_SNORM            BUF_DATA_FORMAT_16_16          BUF_NUM_FORMAT_SNORM
    BUF_FMT_16_16_USCALED          BUF_DATA_FORMAT_16_16          BUF_NUM_FORMAT_USCALED
    BUF_FMT_16_16_SSCALED          BUF_DATA_FORMAT_16_16          BUF_NUM_FORMAT_SSCALED
    BUF_FMT_16_16_UINT             BUF_DATA_FORMAT_16_16          BUF_NUM_FORMAT_UINT
    BUF_FMT_16_16_SINT             BUF_DATA_FORMAT_16_16          BUF_NUM_FORMAT_SINT
    BUF_FMT_16_16_FLOAT            BUF_DATA_FORMAT_16_16          BUF_NUM_FORMAT_FLOAT

    BUF_FMT_10_11_11_UNORM         BUF_DATA_FORMAT_10_11_11       BUF_NUM_FORMAT_UNORM
    BUF_FMT_10_11_11_SNORM         BUF_DATA_FORMAT_10_11_11       BUF_NUM_FORMAT_SNORM
    BUF_FMT_10_11_11_USCALED       BUF_DATA_FORMAT_10_11_11       BUF_NUM_FORMAT_USCALED
    BUF_FMT_10_11_11_SSCALED       BUF_DATA_FORMAT_10_11_11       BUF_NUM_FORMAT_SSCALED
    BUF_FMT_10_11_11_UINT          BUF_DATA_FORMAT_10_11_11       BUF_NUM_FORMAT_UINT
    BUF_FMT_10_11_11_SINT          BUF_DATA_FORMAT_10_11_11       BUF_NUM_FORMAT_SINT
    BUF_FMT_10_11_11_FLOAT         BUF_DATA_FORMAT_10_11_11       BUF_NUM_FORMAT_FLOAT

    BUF_FMT_11_11_10_UNORM         BUF_DATA_FORMAT_11_11_10       BUF_NUM_FORMAT_UNORM
    BUF_FMT_11_11_10_SNORM         BUF_DATA_FORMAT_11_11_10       BUF_NUM_FORMAT_SNORM
    BUF_FMT_11_11_10_USCALED       BUF_DATA_FORMAT_11_11_10       BUF_NUM_FORMAT_USCALED
    BUF_FMT_11_11_10_SSCALED       BUF_DATA_FORMAT_11_11_10       BUF_NUM_FORMAT_SSCALED
    BUF_FMT_11_11_10_UINT          BUF_DATA_FORMAT_11_11_10       BUF_NUM_FORMAT_UINT
    BUF_FMT_11_11_10_SINT          BUF_DATA_FORMAT_11_11_10       BUF_NUM_FORMAT_SINT
    BUF_FMT_11_11_10_FLOAT         BUF_DATA_FORMAT_11_11_10       BUF_NUM_FORMAT_FLOAT

    BUF_FMT_10_10_10_2_UNORM       BUF_DATA_FORMAT_10_10_10_2     BUF_NUM_FORMAT_UNORM
    BUF_FMT_10_10_10_2_SNORM       BUF_DATA_FORMAT_10_10_10_2     BUF_NUM_FORMAT_SNORM
    BUF_FMT_10_10_10_2_USCALED     BUF_DATA_FORMAT_10_10_10_2     BUF_NUM_FORMAT_USCALED
    BUF_FMT_10_10_10_2_SSCALED     BUF_DATA_FORMAT_10_10_10_2     BUF_NUM_FORMAT_SSCALED
    BUF_FMT_10_10_10_2_UINT        BUF_DATA_FORMAT_10_10_10_2     BUF_NUM_FORMAT_UINT
    BUF_FMT_10_10_10_2_SINT        BUF_DATA_FORMAT_10_10_10_2     BUF_NUM_FORMAT_SINT

    BUF_FMT_2_10_10_10_UNORM       BUF_DATA_FORMAT_2_10_10_10     BUF_NUM_FORMAT_UNORM
    BUF_FMT_2_10_10_10_SNORM       BUF_DATA_FORMAT_2_10_10_10     BUF_NUM_FORMAT_SNORM
    BUF_FMT_2_10_10_10_USCALED     BUF_DATA_FORMAT_2_10_10_10     BUF_NUM_FORMAT_USCALED
    BUF_FMT_2_10_10_10_SSCALED     BUF_DATA_FORMAT_2_10_10_10     BUF_NUM_FORMAT_SSCALED
    BUF_FMT_2_10_10_10_UINT        BUF_DATA_FORMAT_2_10_10_10     BUF_NUM_FORMAT_UINT
    BUF_FMT_2_10_10_10_SINT        BUF_DATA_FORMAT_2_10_10_10     BUF_NUM_FORMAT_SINT

    BUF_FMT_8_8_8_8_UNORM          BUF_DATA_FORMAT_8_8_8_8        BUF_NUM_FORMAT_UNORM
    BUF_FMT_8_8_8_8_SNORM          BUF_DATA_FORMAT_8_8_8_8        BUF_NUM_FORMAT_SNORM
    BUF_FMT_8_8_8_8_USCALED        BUF_DATA_FORMAT_8_8_8_8        BUF_NUM_FORMAT_USCALED
    BUF_FMT_8_8_8_8_SSCALED        BUF_DATA_FORMAT_8_8_8_8        BUF_NUM_FORMAT_SSCALED
    BUF_FMT_8_8_8_8_UINT           BUF_DATA_FORMAT_8_8_8_8        BUF_NUM_FORMAT_UINT
    BUF_FMT_8_8_8_8_SINT           BUF_DATA_FORMAT_8_8_8_8        BUF_NUM_FORMAT_SINT

    BUF_FMT_32_32_UINT             BUF_DATA_FORMAT_32_32          BUF_NUM_FORMAT_UINT
    BUF_FMT_32_32_SINT             BUF_DATA_FORMAT_32_32          BUF_NUM_FORMAT_SINT
    BUF_FMT_32_32_FLOAT            BUF_DATA_FORMAT_32_32          BUF_NUM_FORMAT_FLOAT

    BUF_FMT_16_16_16_16_UNORM      BUF_DATA_FORMAT_16_16_16_16    BUF_NUM_FORMAT_UNORM
    BUF_FMT_16_16_16_16_SNORM      BUF_DATA_FORMAT_16_16_16_16    BUF_NUM_FORMAT_SNORM
    BUF_FMT_16_16_16_16_USCALED    BUF_DATA_FORMAT_16_16_16_16    BUF_NUM_FORMAT_USCALED
    BUF_FMT_16_16_16_16_SSCALED    BUF_DATA_FORMAT_16_16_16_16    BUF_NUM_FORMAT_SSCALED
    BUF_FMT_16_16_16_16_UINT       BUF_DATA_FORMAT_16_16_16_16    BUF_NUM_FORMAT_UINT
    BUF_FMT_16_16_16_16_SINT       BUF_DATA_FORMAT_16_16_16_16    BUF_NUM_FORMAT_SINT
    BUF_FMT_16_16_16_16_FLOAT      BUF_DATA_FORMAT_16_16_16_16    BUF_NUM_FORMAT_FLOAT

    BUF_FMT_32_32_32_UINT          BUF_DATA_FORMAT_32_32_32       BUF_NUM_FORMAT_UINT
    BUF_FMT_32_32_32_SINT          BUF_DATA_FORMAT_32_32_32       BUF_NUM_FORMAT_SINT
    BUF_FMT_32_32_32_FLOAT         BUF_DATA_FORMAT_32_32_32       BUF_NUM_FORMAT_FLOAT
    BUF_FMT_32_32_32_32_UINT       BUF_DATA_FORMAT_32_32_32_32    BUF_NUM_FORMAT_UINT
    BUF_FMT_32_32_32_32_SINT       BUF_DATA_FORMAT_32_32_32_32    BUF_NUM_FORMAT_SINT
    BUF_FMT_32_32_32_32_FLOAT      BUF_DATA_FORMAT_32_32_32_32    BUF_NUM_FORMAT_FLOAT
    ============================== ============================== =============================

Examples:

.. parsed-literal::

  format:0
  format:[BUF_FMT_32_UINT]

SMRD/SMEM Modifiers
-------------------

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

nv
~~

See a description :ref:`here<amdgpu_synid_nv>`. GFX9 only.

dlc
~~~

See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.

VINTRP Modifiers
----------------

.. _amdgpu_synid_high:

high
~~~~

Specifies which half of the LDS word to use. The low half of the LDS word is used by default.
GFX9 and GFX10 only.

    ======================================== ================================
    Syntax                                   Description
    ======================================== ================================
    high                                     Use high half of LDS word.
    ======================================== ================================

DPP8 Modifiers
--------------

GFX10 only.

.. _amdgpu_synid_dpp8_sel:

dpp8_sel
~~~~~~~~

Selects which lanes to pull data from, within a group of 8 lanes. This is a mandatory modifier.
There is no default value.

GFX10 only.

The *dpp8_sel* modifier must specify exactly 8 values.
The first value selects which lane to read from to supply data into lane 0,
the second value controls lane 1, and so on.

Each value may be specified as either
an :ref:`integer number<amdgpu_synid_integer_number>` or
an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
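
As an illustration only, the lane selection performed by *dpp8_sel* can be modeled in Python. The helper name ``apply_dpp8`` and the list-based model of a wavefront are assumptions made for this sketch; they are not part of the assembler or of any AMD tool, and the handling of inactive lanes is ignored.

```python
def apply_dpp8(lanes, sel):
    """Model dpp8 lane selection on a list of per-lane values.

    lanes: value held by each lane (length must be a multiple of 8).
    sel:   the 8 selectors of the dpp8 modifier, each in 0..7.
    """
    if len(sel) != 8 or any(not 0 <= s <= 7 for s in sel):
        raise ValueError("dpp8 requires exactly 8 selectors in 0..7")
    if len(lanes) % 8 != 0:
        raise ValueError("lane count must be a multiple of 8")
    out = []
    for base in range(0, len(lanes), 8):
        group = lanes[base:base + 8]
        # Lane i of the group reads from lane sel[i] of the same group.
        out.extend(group[s] for s in sel)
    return out


# Selectors 7,6,5,4,3,2,1,0 reverse the values within each group of 8 lanes.
reversed_groups = apply_dpp8(list(range(16)), [7, 6, 5, 4, 3, 2, 1, 0])
```

The model works on groups of 8 lanes independently, which is why a single list of 8 selectors is enough for a whole wavefront.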

    =============================================================== ===========================
    Syntax                                                          Description
    =============================================================== ===========================
    dpp8:[{0..7},{0..7},{0..7},{0..7},{0..7},{0..7},{0..7},{0..7}]  Select lanes to read from.
    =============================================================== ===========================

Examples:

.. parsed-literal::

  dpp8:[7,6,5,4,3,2,1,0]
  dpp8:[0,1,0,1,0,1,0,1]

.. _amdgpu_synid_fi8:

fi
~~

Controls interaction with inactive lanes for *dpp8* instructions. The default value is zero.

Note: *inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.

GFX10 only.

    ==================================== =====================================================
    Syntax                               Description
    ==================================== =====================================================
    fi:0                                 Fetch zero when accessing data from inactive lanes.
    fi:1                                 Fetch pre-existing values from inactive lanes.
    ==================================== =====================================================

Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

DPP Modifiers
-------------

GFX8, GFX9 and GFX10 only.

.. _amdgpu_synid_dpp_ctrl:

dpp_ctrl
~~~~~~~~

Specifies how data are shared between threads. This is a mandatory modifier.
There is no default value.

GFX8 and GFX9 only. Use :ref:`dpp16_ctrl<amdgpu_synid_dpp16_ctrl>` for GFX10.

Note: the lanes of a wavefront are organized in four *rows* and four *banks*.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
    row_mirror                               Mirror threads within row.
    row_half_mirror                          Mirror threads within 1/2 row (8 threads).
    row_bcast:15                             Broadcast 15th thread of each row to next row.
    row_bcast:31                             Broadcast thread 31 to rows 2 and 3.
    wave_shl:1                               Wavefront left shift by 1 thread.
    wave_rol:1                               Wavefront left rotate by 1 thread.
    wave_shr:1                               Wavefront right shift by 1 thread.
    wave_ror:1                               Wavefront right rotate by 1 thread.
    row_shl:{1..15}                          Row shift left by 1-15 threads.
    row_shr:{1..15}                          Row shift right by 1-15 threads.
    row_ror:{1..15}                          Row rotate right by 1-15 threads.
    ======================================== ================================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  quad_perm:[0, 1, 2, 3]
  row_shl:3

.. _amdgpu_synid_dpp16_ctrl:

dpp16_ctrl
~~~~~~~~~~

Specifies how data are shared between threads. This is a mandatory modifier.
There is no default value.

GFX10 only. Use :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` for GFX8 and GFX9.

Note: the lanes of a wavefront are organized in four *rows* and four *banks*.
(There are only two rows in *wave32* mode.)

    ======================================== ====================================================
    Syntax                                   Description
    ======================================== ====================================================
    quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
    row_mirror                               Mirror threads within row.
    row_half_mirror                          Mirror threads within 1/2 row (8 threads).
    row_share:{0..15}                        Share the value from the specified lane with other
                                             lanes in the row.
    row_xmask:{0..15}                        Fetch from XOR(current lane id, specified lane id).
    row_shl:{1..15}                          Row shift left by 1-15 threads.
    row_shr:{1..15}                          Row shift right by 1-15 threads.
    row_ror:{1..15}                          Row rotate right by 1-15 threads.
    ======================================== ====================================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  quad_perm:[0, 1, 2, 3]
  row_shl:3

.. _amdgpu_synid_dpp32_ctrl:

dpp32_ctrl
~~~~~~~~~~

Specifies how data are shared between threads. This is a mandatory modifier.
There is no default value.

May be used only with GFX90A 32-bit instructions.

Note: the lanes of a wavefront are organized in four *rows* and four *banks*.

    ======================================== ==================================================
    Syntax                                   Description
    ======================================== ==================================================
    quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
    row_mirror                               Mirror threads within row.
    row_half_mirror                          Mirror threads within 1/2 row (8 threads).
    row_bcast:15                             Broadcast 15th thread of each row to next row.
    row_bcast:31                             Broadcast thread 31 to rows 2 and 3.
    wave_shl:1                               Wavefront left shift by 1 thread.
    wave_rol:1                               Wavefront left rotate by 1 thread.
    wave_shr:1                               Wavefront right shift by 1 thread.
    wave_ror:1                               Wavefront right rotate by 1 thread.
    row_shl:{1..15}                          Row shift left by 1-15 threads.
    row_shr:{1..15}                          Row shift right by 1-15 threads.
    row_ror:{1..15}                          Row rotate right by 1-15 threads.
    row_newbcast:{1..15}                     Broadcast a thread within a row to the whole row.
    ======================================== ==================================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  quad_perm:[0, 1, 2, 3]
  row_shl:3


.. _amdgpu_synid_dpp64_ctrl:

dpp64_ctrl
~~~~~~~~~~

Specifies how data are shared between threads. This is a mandatory modifier.
There is no default value.

May be used only with GFX90A 64-bit instructions.

Note: the lanes of a wavefront are organized in four *rows* and four *banks*.

    ======================================== ==================================================
    Syntax                                   Description
    ======================================== ==================================================
    row_newbcast:{1..15}                     Broadcast a thread within a row to the whole row.
    ======================================== ==================================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  row_newbcast:3


.. _amdgpu_synid_row_mask:

row_mask
~~~~~~~~

Controls which rows are enabled for data sharing. By default, all rows are enabled.

Note: the lanes of a wavefront are organized in four *rows* and four *banks*.
(There are only two rows in *wave32* mode.)
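
The row and bank organization can be sketched in Python as follows. This is a rough model under stated assumptions, not a description of the hardware: a row is taken to be 16 lanes (consistent with "1/2 row (8 threads)" above), a bank is assumed to be a group of 4 consecutive lanes within a row (this document does not state the bank size), and bit *i* of a mask is assumed to enable row or bank *i*. The function name ``enabled_lanes`` is hypothetical.

```python
ROW_SIZE = 16   # lanes per row, consistent with "1/2 row (8 threads)"
BANK_SIZE = 4   # lanes per bank: an assumption, not stated in this document


def enabled_lanes(row_mask=0xF, bank_mask=0xF, wave_size=64):
    """Return the lanes enabled by both masks (bit i enables row/bank i)."""
    lanes = []
    for lane in range(wave_size):
        row = lane // ROW_SIZE
        bank = (lane % ROW_SIZE) // BANK_SIZE
        if (row_mask >> row) & 1 and (bank_mask >> bank) & 1:
            lanes.append(lane)
    return lanes


# Row mask 0b1010 with all banks enabled keeps rows 1 and 3 only.
odd_rows = enabled_lanes(row_mask=0b1010)
```

With both masks left at their defaults every lane is enabled, which matches the default behavior described for *row_mask* and *bank_mask*.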

    ================= ====================================================================
    Syntax            Description
    ================= ====================================================================
    row_mask:{0..15}  Specifies a *row mask* as a positive
                      :ref:`integer number <amdgpu_synid_integer_number>`
                      or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.

                      Each of 4 bits in the mask controls one row
                      (0 - disabled, 1 - enabled).

                      In *wave32* mode the values should be limited to 0..7.
    ================= ====================================================================

Examples:

.. parsed-literal::

  row_mask:0xf
  row_mask:0b1010
  row_mask:x|y

.. _amdgpu_synid_bank_mask:

bank_mask
~~~~~~~~~

Controls which banks are enabled for data sharing. By default, all banks are enabled.

Note: the lanes of a wavefront are organized in four *rows* and four *banks*.
(There are only two rows in *wave32* mode.)

    ================== ====================================================================
    Syntax             Description
    ================== ====================================================================
    bank_mask:{0..15}  Specifies a *bank mask* as a positive
                       :ref:`integer number <amdgpu_synid_integer_number>`
                       or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.

                       Each of 4 bits in the mask controls one bank
                       (0 - disabled, 1 - enabled).
    ================== ====================================================================

Examples:

.. parsed-literal::

  bank_mask:0x3
  bank_mask:0b0011
  bank_mask:x&y

.. _amdgpu_synid_bound_ctrl:

bound_ctrl
~~~~~~~~~~

Controls data sharing when accessing an invalid lane. By default, data sharing with
invalid lanes is disabled.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    bound_ctrl:1                             Enables data sharing with invalid lanes.

                                             Accessing data from an invalid lane will
                                             return zero.
    ======================================== ================================================

.. _amdgpu_synid_fi16:

fi
~~

Controls interaction with *inactive* lanes for *dpp16* instructions. The default value is zero.

Note: *inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.

GFX10 only.

    ======================================== ==================================================
    Syntax                                   Description
    ======================================== ==================================================
    fi:0                                     Interaction with inactive lanes is controlled by
                                             :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`.

    fi:1                                     Fetch pre-existing values from inactive lanes.
    ======================================== ==================================================

Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

SDWA Modifiers
--------------

GFX8, GFX9 and GFX10 only.

clamp
~~~~~

See a description :ref:`here<amdgpu_synid_clamp>`.

omod
~~~~

See a description :ref:`here<amdgpu_synid_omod>`.

GFX9 and GFX10 only.

.. _amdgpu_synid_dst_sel:

dst_sel
~~~~~~~

Selects which bits in the destination are affected. By default, all bits are affected.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    dst_sel:DWORD                            Use bits 31:0.
    dst_sel:BYTE_0                           Use bits 7:0.
    dst_sel:BYTE_1                           Use bits 15:8.
    dst_sel:BYTE_2                           Use bits 23:16.
    dst_sel:BYTE_3                           Use bits 31:24.
    dst_sel:WORD_0                           Use bits 15:0.
    dst_sel:WORD_1                           Use bits 31:16.
    ======================================== ================================================

.. _amdgpu_synid_dst_unused:

dst_unused
~~~~~~~~~~

Controls what to do with the bits in the destination which are not selected
by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
By default, unused bits are preserved.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    dst_unused:UNUSED_PAD                    Pad with zeros.
    dst_unused:UNUSED_SEXT                   Sign-extend upper bits, zero lower bits.
    dst_unused:UNUSED_PRESERVE               Preserve bits.
    ======================================== ================================================

.. _amdgpu_synid_src0_sel:

src0_sel
~~~~~~~~

Controls which bits of src0 are used. By default, all bits are used.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    src0_sel:DWORD                           Use bits 31:0.
    src0_sel:BYTE_0                          Use bits 7:0.
    src0_sel:BYTE_1                          Use bits 15:8.
    src0_sel:BYTE_2                          Use bits 23:16.
    src0_sel:BYTE_3                          Use bits 31:24.
    src0_sel:WORD_0                          Use bits 15:0.
    src0_sel:WORD_1                          Use bits 31:16.
    ======================================== ================================================

.. _amdgpu_synid_src1_sel:

src1_sel
~~~~~~~~

Controls which bits of src1 are used. By default, all bits are used.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    src1_sel:DWORD                           Use bits 31:0.
    src1_sel:BYTE_0                          Use bits 7:0.
    src1_sel:BYTE_1                          Use bits 15:8.
    src1_sel:BYTE_2                          Use bits 23:16.
    src1_sel:BYTE_3                          Use bits 31:24.
    src1_sel:WORD_0                          Use bits 15:0.
    src1_sel:WORD_1                          Use bits 31:16.
    ======================================== ================================================

.. _amdgpu_synid_sdwa_operand_modifiers:

SDWA Operand Modifiers
----------------------

Operand modifiers are not used separately. They are applied to source operands.

GFX8, GFX9 and GFX10 only.

abs
~~~

See a description :ref:`here<amdgpu_synid_abs>`.

neg
~~~

See a description :ref:`here<amdgpu_synid_neg>`.

.. _amdgpu_synid_sext:

sext
~~~~

Sign-extends the value of a sub-dword operand to fill all 32 bits.
Has no effect on 32-bit operands.

Valid for integer operands only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    sext(<operand>)                          Sign-extend operand value.
    ======================================== ================================================

Examples:

.. parsed-literal::

  sext(v4)
  sext(v255)

VOP3 Modifiers
--------------

.. _amdgpu_synid_vop3_op_sel:

op_sel
~~~~~~

Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
By default, low bits are used for all operands.

The number of values specified with the op_sel modifier must match the number of instruction
operands (both source and destination). The first value controls src0, the second value controls src1
and so on, except that the last value controls the destination.
The value 0 selects the low bits, while 1 selects the high bits.

Note: op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
by op_sel must be 0.

GFX9 and GFX10 only.

    ======================================== ============================================================
    Syntax                                   Description
    ======================================== ============================================================
    op_sel:[{0..1},{0..1}]                   Select operand bits for instructions with 1 source operand.
    op_sel:[{0..1},{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
    op_sel:[{0..1},{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
    ======================================== ============================================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  op_sel:[0,0]
  op_sel:[0,1]

.. _amdgpu_synid_dpp_op_sel:

dpp_op_sel
~~~~~~~~~~

Special version of *op_sel* used for *permlane* opcodes to specify
dpp-like mode bits: :ref:`fi<amdgpu_synid_fi16>` and
:ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`.

GFX10 only.

    ======================================== ============================================================
    Syntax                                   Description
    ======================================== ============================================================
    op_sel:[{0..1},{0..1}]                   First bit specifies :ref:`fi<amdgpu_synid_fi16>`, second
                                             bit specifies :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`.
    ======================================== ============================================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  op_sel:[0,0]

.. _amdgpu_synid_clamp:

clamp
~~~~~

The meaning of clamp depends on the instruction.

For *v_cmp* instructions, clamp modifier indicates that the compare signals
if a floating point exception occurs. By default, signaling is disabled.
Not supported by GFX7.

For integer operations, clamp modifier indicates that the result must be clamped
to the largest and smallest representable value. By default, there is no clamping.
Integer clamping is not supported by GFX7.

For floating point operations, clamp modifier indicates that the result must be clamped
to the range [0.0, 1.0]. By default, there is no clamping.

Note: clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    clamp                                    Enables clamping (or signaling).
    ======================================== ================================================

.. _amdgpu_synid_omod:

omod
~~~~

Specifies if an output modifier must be applied to the result.
By default, no output modifiers are applied.

Note: output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).

Output modifiers are valid for f32 and f64 floating point results only.
They must not be used with f16.

Note: *v_cvt_f16_f32* is an exception. This instruction produces f16 result
but accepts output modifiers.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    mul:2                                    Multiply the result by 2.
    mul:4                                    Multiply the result by 4.
    div:2                                    Multiply the result by 0.5.
    ======================================== ================================================

Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  mul:2
  mul:x      // x must be equal to 2 or 4

.. _amdgpu_synid_vop3_operand_modifiers:

VOP3 Operand Modifiers
----------------------

Operand modifiers are not used separately. They are applied to source operands.

.. _amdgpu_synid_abs:

abs
~~~

Computes the absolute value of its operand. Must be applied before :ref:`neg<amdgpu_synid_neg>`
(if any). Valid for floating point operands only.

    ======================================== ====================================================
    Syntax                                   Description
    ======================================== ====================================================
    abs(<operand>)                           Get the absolute value of a floating-point operand.
    \|<operand>|                             The same as above (an SP3 syntax).
    ======================================== ====================================================

Note: avoid using SP3 syntax with operands specified as expressions because the trailing '|'
may be misinterpreted. Such operands should be enclosed in additional parentheses as shown
in the examples below.

Examples:

.. parsed-literal::

  abs(v36)
  \|v36|
  abs(x|y)     // ok
  \|(x|y)|     // additional parentheses are required

.. _amdgpu_synid_neg:

neg
~~~

Computes the negative value of its operand. Must be applied after :ref:`abs<amdgpu_synid_abs>`
(if any). Valid for floating point operands only.

    ================== ====================================================
    Syntax             Description
    ================== ====================================================
    neg(<operand>)     Get the negative value of a floating-point operand.
                       The operand may include an optional
                       :ref:`abs<amdgpu_synid_abs>` modifier.
    -<operand>         The same as above (an SP3 syntax).
    ================== ====================================================

Note: SP3 syntax is supported with limitations because of a potential ambiguity.
Currently it is allowed in the following cases:

* Before a register.
* Before an :ref:`abs<amdgpu_synid_abs>` modifier.
* Before an SP3 :ref:`abs<amdgpu_synid_abs>` modifier.

In all other cases "-" is handled as a part of an expression that follows the sign.

Examples:

.. parsed-literal::

  // Operands with negate modifiers
  neg(v[0])
  neg(1.0)
  neg(abs(v0))
  -v5
  -abs(v5)
  -\|v5|

  // Operands without negate modifiers
  -1
  -x+y

VOP3P Modifiers
---------------

This section describes modifiers of *regular* VOP3P instructions.

*v_mad_mix\** and *v_fma_mix\**
instructions use these modifiers :ref:`in a special manner<amdgpu_synid_mad_mix>`.

GFX9 and GFX10 only.

.. _amdgpu_synid_op_sel:

op_sel
~~~~~~

Selects the low [15:0] or high [31:16] operand bits as input to the operation
which results in the lower-half of the destination.
By default, low bits are used for all operands.

The number of values specified by the *op_sel* modifier must match the number of source
operands. First value controls src0, second value controls src1 and so on.

The value 0 selects the low bits, while 1 selects the high bits.

    ================================= =============================================================
    Syntax                            Description
    ================================= =============================================================
    op_sel:[{0..1}]                   Select operand bits for instructions with 1 source operand.
    op_sel:[{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
    op_sel:[{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
    ================================= =============================================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  op_sel:[0,0]
  op_sel:[0,1,0]

.. _amdgpu_synid_op_sel_hi:

op_sel_hi
~~~~~~~~~

Selects the low [15:0] or high [31:16] operand bits as input to the operation
which results in the upper-half of the destination.
By default, high bits are used for all operands.

The number of values specified by the *op_sel_hi* modifier must match the number of source
operands. First value controls src0, second value controls src1 and so on.
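
To make the interplay of *op_sel* and *op_sel_hi* concrete, the following Python sketch models a hypothetical two-source packed 16-bit unsigned add. The function names ``half`` and ``pk_add_u16`` are illustrative assumptions, the arithmetic is plain wrap-around addition, and a selector value of 0 is taken to mean the low bits and 1 the high bits; the only thing the sketch is meant to show is which operand halves feed which half of the destination.

```python
def half(value, hi):
    """Return bits [31:16] of a 32-bit value if hi is 1, else bits [15:0]."""
    return (value >> 16) & 0xFFFF if hi else value & 0xFFFF


def pk_add_u16(src0, src1, op_sel=(0, 0), op_sel_hi=(1, 1)):
    """Model a packed 16-bit unsigned add with wrap-around (no clamp)."""
    # op_sel picks the inputs for the lower half of the destination.
    lo = (half(src0, op_sel[0]) + half(src1, op_sel[1])) & 0xFFFF
    # op_sel_hi picks the inputs for the upper half of the destination.
    hi = (half(src0, op_sel_hi[0]) + half(src1, op_sel_hi[1])) & 0xFFFF
    return (hi << 16) | lo


# Defaults: low halves feed the low result, high halves feed the high result.
default_result = pk_add_u16(0x00020001, 0x00040003)
# Swapped selectors: high halves feed the low result and vice versa.
swapped_result = pk_add_u16(0x00020001, 0x00040003, op_sel=(1, 1), op_sel_hi=(0, 0))
```

The default arguments mirror the defaults stated for the two modifiers: low bits for *op_sel* and high bits for *op_sel_hi*.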

The value 0 selects the low bits, while 1 selects the high bits.

    =================================== =============================================================
    Syntax                              Description
    =================================== =============================================================
    op_sel_hi:[{0..1}]                  Select operand bits for instructions with 1 source operand.
    op_sel_hi:[{0..1},{0..1}]           Select operand bits for instructions with 2 source operands.
    op_sel_hi:[{0..1},{0..1},{0..1}]    Select operand bits for instructions with 3 source operands.
    =================================== =============================================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

  op_sel_hi:[0,0]
  op_sel_hi:[0,0,1]

.. _amdgpu_synid_neg_lo:

neg_lo
~~~~~~

Specifies whether to change sign of operand values selected by
:ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
as input to the operation which results in the lower-half of the destination.

The number of values specified by this modifier must match the number of source
operands. First value controls src0, second value controls src1 and so on.

The value 0 indicates that the corresponding operand value is used unmodified,
the value 1 indicates that negative value of the operand must be used.

By default, operand values are used unmodified.

This modifier is valid for floating point operands only.

    ================================ ==================================================================
    Syntax                           Description
    ================================ ==================================================================
    neg_lo:[{0..1}]                  Select affected operands for instructions with 1 source operand.
    neg_lo:[{0..1},{0..1}]           Select affected operands for instructions with 2 source operands.
    neg_lo:[{0..1},{0..1},{0..1}]    Select affected operands for instructions with 3 source operands.
    ================================ ==================================================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

    neg_lo:[0]
    neg_lo:[0,1]

.. _amdgpu_synid_neg_hi:

neg_hi
~~~~~~

Specifies whether to change the sign of operand values selected by
:ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
as input to the operation which results in the upper-half of the destination.

The number of values specified by this modifier must match the number of source
operands. The first value controls src0, the second value controls src1, and so on.

The value 0 indicates that the corresponding operand value is used unmodified;
the value 1 indicates that the negated value of the operand is used.

By default, operand values are used unmodified.

This modifier is valid for floating point operands only.

    =============================== ==================================================================
    Syntax                          Description
    =============================== ==================================================================
    neg_hi:[{0..1}]                 Select affected operands for instructions with 1 source operand.
    neg_hi:[{0..1},{0..1}]          Select affected operands for instructions with 2 source operands.
    neg_hi:[{0..1},{0..1},{0..1}]   Select affected operands for instructions with 3 source operands.
    =============================== ==================================================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

    neg_hi:[1,0]
    neg_hi:[0,1,1]

clamp
~~~~~

See a description :ref:`here<amdgpu_synid_clamp>`.

.. _amdgpu_synid_mad_mix:

VOP3P MAD_MIX/FMA_MIX Modifiers
-------------------------------

*v_mad_mix\** and *v_fma_mix\**
instructions use *op_sel* and *op_sel_hi* modifiers
differently from *regular* VOP3P instructions, as described below.

GFX9 and GFX10 only.

.. _amdgpu_synid_mad_mix_op_sel:

m_op_sel
~~~~~~~~

This modifier is meaningful only for source operands marked as 16-bit by
:ref:`m_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
It selects either the low [15:0] or the high [31:16] operand bits
as input to the operation.

The number of values specified by the *op_sel* modifier must match the number of source
operands. The first value controls src0, the second value controls src1, and so on.

The value 0 selects the low 16 bits; the value 1 selects the high 16 bits.

By default, low bits are used for all operands.

    =============================== ================================================
    Syntax                          Description
    =============================== ================================================
    op_sel:[{0..1},{0..1},{0..1}]   Select location of each 16-bit source operand.
    =============================== ================================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

    op_sel:[0,0,1]

.. _amdgpu_synid_mad_mix_op_sel_hi:

m_op_sel_hi
~~~~~~~~~~~

Selects the size of source operands: either 32 bits or 16 bits.
By default, 32 bits are used for all source operands.

The number of values specified by the *op_sel_hi* modifier must match the number of source
operands. The first value controls src0, the second value controls src1, and so on.

The value 0 indicates 32 bits; the value 1 indicates 16 bits.

The location of the 16 bits within the operand may be specified by
:ref:`m_op_sel<amdgpu_synid_mad_mix_op_sel>`.

    ======================================== ====================================
    Syntax                                   Description
    ======================================== ====================================
    op_sel_hi:[{0..1},{0..1},{0..1}]         Select size of each source operand.
    ======================================== ====================================

Note: numeric values may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

    op_sel_hi:[1,1,1]

abs
~~~

See a description :ref:`here<amdgpu_synid_abs>`.

neg
~~~

See a description :ref:`here<amdgpu_synid_neg>`.

clamp
~~~~~

See a description :ref:`here<amdgpu_synid_clamp>`.

VOP3P MFMA Modifiers
--------------------

These modifiers may only be used with GFX908 and GFX90A.

.. _amdgpu_synid_cbsz:

cbsz
~~~~

Specifies a broadcast mode.


    =============================== ==================================================================
    Syntax                          Description
    =============================== ==================================================================
    cbsz:{0..7}                     A broadcast mode.
    =============================== ==================================================================

Note: the numeric value may be specified as either
an :ref:`integer number<amdgpu_synid_integer_number>` or
an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.

.. _amdgpu_synid_abid:

abid
~~~~

Specifies matrix A group select.

    =============================== ==================================================================
    Syntax                          Description
    =============================== ==================================================================
    abid:{0..15}                    Matrix A group select id.
    =============================== ==================================================================

Note: the numeric value may be specified as either
an :ref:`integer number<amdgpu_synid_integer_number>` or
an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.

.. _amdgpu_synid_blgp:

blgp
~~~~

Specifies matrix B lane group pattern.

    =============================== ==================================================================
    Syntax                          Description
    =============================== ==================================================================
    blgp:{0..7}                     Matrix B lane group pattern.
    =============================== ==================================================================

Note: the numeric value may be specified as either
an :ref:`integer number<amdgpu_synid_integer_number>` or
an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.

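
The MFMA modifiers *cbsz*, *abid* and *blgp* are typically written together at
the end of an instruction. The lines below are a minimal sketch of that usage,
assuming each value is written as a bare integer after the colon. The opcode,
the register operands and the chosen values are illustrative only and are not
taken from this document.

.. parsed-literal::

    cbsz:1
    abid:1
    blgp:3
    v_mfma_f32_32x32x1f32 a[0:31], v0, v1, a[0:31] cbsz:1 abid:1 blgp:3
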