======================================
Syntax of AMDGPU Instruction Modifiers
======================================

.. contents::
   :local:

Conventions
===========

The following notation is used throughout this document:

    =================== =============================================================
    Notation            Description
    =================== =============================================================
    {0..N}              Any integer value in the range from 0 to N (inclusive).
    <x>                 Syntax and meaning of *x* is explained elsewhere.
    =================== =============================================================

.. _amdgpu_syn_modifiers:

Modifiers
=========

DS Modifiers
------------

.. _amdgpu_synid_ds_offset8:

offset8
~~~~~~~

Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.

Used with DS instructions which have 2 addresses.

    =================== =====================================================
    Syntax              Description
    =================== =====================================================
    offset:{0..0xFF}    Specifies an unsigned 8-bit offset as a positive
                        :ref:`integer number <amdgpu_synid_integer_number>`.
    =================== =====================================================

Examples:

.. parsed-literal::

    offset:255
    offset:0xff

.. _amdgpu_synid_ds_offset16:

offset16
~~~~~~~~

Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.

Used with DS instructions which have 1 address.

    ==================== ======================================================
    Syntax               Description
    ==================== ======================================================
    offset:{0..0xFFFF}   Specifies an unsigned 16-bit offset as a positive
                         :ref:`integer number <amdgpu_synid_integer_number>`.
    ==================== ======================================================

Examples:

.. parsed-literal::

    offset:65535
    offset:0xffff

.. _amdgpu_synid_sw_offset16:

pattern
~~~~~~~

This is a special modifier which may be used with the *ds_swizzle_b32* instruction only.
It specifies a swizzle pattern in numeric or symbolic form. The default value is 0.

See AMD documentation for more information.

    ======================================================= ===========================================================
    Syntax                                                  Description
    ======================================================= ===========================================================
    offset:{0..0xFFFF}                                      Specifies a 16-bit swizzle pattern.
    offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3})   Specifies a quad permute mode pattern.

                                                            Each number is a lane *id*.
    offset:swizzle(BITMASK_PERM, "<mask>")                  Specifies a bitmask permute mode pattern.

                                                            The pattern converts a 5-bit lane *id* to another
                                                            lane *id* with which the lane interacts.

                                                            *mask* is a 5 character sequence which
                                                            specifies how to transform the bits of the
                                                            lane *id*.

                                                            The following characters are allowed:

                                                            * "0" - set bit to 0.

                                                            * "1" - set bit to 1.

                                                            * "p" - preserve bit.

                                                            * "i" - invert bit.

    offset:swizzle(BROADCAST,{2..32},{0..N})                Specifies a broadcast mode.

                                                            Broadcasts the value of any particular lane to
                                                            all lanes in its group.

                                                            The first numeric parameter is a group
                                                            size and must be equal to 2, 4, 8, 16 or 32.

                                                            The second numeric parameter is an index of the
                                                            lane being broadcast.

                                                            The index must be less than the group size.
    offset:swizzle(SWAP,{1..16})                            Specifies a swap mode.

                                                            Swaps the neighboring groups of
                                                            1, 2, 4, 8 or 16 lanes.
    offset:swizzle(REVERSE,{2..32})                         Specifies a reverse mode.

                                                            Reverses the lanes for groups of 2, 4, 8, 16 or 32 lanes.
    ======================================================= ===========================================================

Numeric parameters may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

    offset:255
    offset:0xffff
    offset:swizzle(QUAD_PERM, 0, 1, 2 ,3)
    offset:swizzle(BITMASK_PERM, "01pi0")
    offset:swizzle(BROADCAST, 2, 0)
    offset:swizzle(SWAP, 8)
    offset:swizzle(REVERSE, 30 + 2)

.. _amdgpu_synid_gds:

gds
~~~

Specifies whether to use GDS or LDS memory (LDS is the default).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    gds                                      Use GDS memory.
    ======================================== ================================================


EXP Modifiers
-------------

.. _amdgpu_synid_done:

done
~~~~

Specifies if this is the last export from the shader to the target. By default, the current
instruction does not finish an export sequence.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    done                                     Indicates the last export operation.
    ======================================== ================================================

.. _amdgpu_synid_compr:

compr
~~~~~

Indicates if the data are compressed (data are not compressed by default).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    compr                                    Data are compressed.
    ======================================== ================================================

.. _amdgpu_synid_vm:

vm
~~

Specifies valid mask flag state (off by default).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    vm                                       Set valid mask flag.
    ======================================== ================================================

FLAT Modifiers
--------------

.. _amdgpu_synid_flat_offset12:

offset12
~~~~~~~~

Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.

Cannot be used with *global/scratch* opcodes. GFX9 only.

    ================= ======================================================
    Syntax            Description
    ================= ======================================================
    offset:{0..4095}  Specifies a 12-bit unsigned offset as a positive
                      :ref:`integer number <amdgpu_synid_integer_number>`.
    ================= ======================================================

Examples:

.. parsed-literal::

    offset:4095
    offset:0xff

.. _amdgpu_synid_flat_offset13s:

offset13s
~~~~~~~~~

Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.

Can be used with *global/scratch* opcodes only. GFX9 only.

    ============================ =======================================================
    Syntax                       Description
    ============================ =======================================================
    offset:{-4096..4095}         Specifies a 13-bit signed offset as an
                                 :ref:`integer number <amdgpu_synid_integer_number>`.
    ============================ =======================================================

Examples:

.. parsed-literal::

    offset:-4000
    offset:0x10

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

slc
~~~

See a description :ref:`here<amdgpu_synid_slc>`.

tfe
~~~

See a description :ref:`here<amdgpu_synid_tfe>`.

nv
~~

See a description :ref:`here<amdgpu_synid_nv>`.

MIMG Modifiers
--------------

.. _amdgpu_synid_dmask:

dmask
~~~~~

Specifies which channels (image components) are used by the operation. By default, no channels
are used.

    =============== =====================================================
    Syntax          Description
    =============== =====================================================
    dmask:{0..15}   Specifies image channels as a positive
                    :ref:`integer number <amdgpu_synid_integer_number>`.

                    Each bit corresponds to one of 4 image
                    components (RGBA).

                    If a bit is 0, the corresponding component is
                    not used; if it is 1, the component is used.
    =============== =====================================================

This modifier has some limitations depending on instruction kind:

    =================================================== ========================
    Instruction Kind                                    Valid dmask Values
    =================================================== ========================
    32-bit atomic *cmpswap*                             0x3
    32-bit atomic instructions except for *cmpswap*     0x1
    64-bit atomic *cmpswap*                             0xF
    64-bit atomic instructions except for *cmpswap*     0x3
    *gather4*                                           0x1, 0x2, 0x4, 0x8
    Other instructions                                  any value
    =================================================== ========================

Examples:

.. parsed-literal::

    dmask:0xf
    dmask:0b1111
    dmask:3

.. _amdgpu_synid_unorm:

unorm
~~~~~

Specifies whether the address is normalized or not (the address is normalized by default).

    ======================== ========================================
    Syntax                   Description
    ======================== ========================================
    unorm                    Force the address to be unnormalized.
    ======================== ========================================

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

slc
~~~

See a description :ref:`here<amdgpu_synid_slc>`.

.. _amdgpu_synid_r128:

r128
~~~~

Specifies texture resource size. The default size is 256 bits.

GFX7 and GFX8 only.

    =================== ================================================
    Syntax              Description
    =================== ================================================
    r128                Specifies a 128-bit texture resource size.
    =================== ================================================

.. WARNING:: Using this modifier should decrease *rsrc* operand size from 8 to 4 dwords, but the assembler does not currently support this feature.

tfe
~~~

See a description :ref:`here<amdgpu_synid_tfe>`.

.. _amdgpu_synid_lwe:

lwe
~~~

Specifies LOD warning status (LOD warning is disabled by default).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    lwe                                      Enables LOD warning.
    ======================================== ================================================

.. _amdgpu_synid_da:

da
~~

Specifies if an array index must be sent to TA. By default, the array index is not sent.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    da                                       Send an array index to TA.
    ======================================== ================================================

.. _amdgpu_synid_d16:

d16
~~~

Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    d16                                      Enables 16-bit data mode.

                                             On loads, convert data in memory to 16-bit
                                             format before storing it in VGPRs.

                                             For stores, convert 16-bit data in VGPRs to
                                             32 bits before going to memory.

                                             Note that GFX8.0 does not support data packing.
                                             Each 16-bit data element occupies 1 VGPR.

                                             GFX8.1 and GFX9 support data packing.
                                             Each pair of 16-bit data elements
                                             occupies 1 VGPR.
    ======================================== ================================================

.. _amdgpu_synid_a16:

a16
~~~

Specifies size of image address components: 16 or 32 bits (32 bits by default). GFX9 only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    a16                                      Enables 16-bit image address components.
    ======================================== ================================================

Miscellaneous Modifiers
-----------------------

.. _amdgpu_synid_glc:

glc
~~~

This modifier has different meaning for loads, stores, and atomic operations.
The default value is off (0).

See AMD documentation for details.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    glc                                      Set glc bit to 1.
    ======================================== ================================================

.. _amdgpu_synid_slc:

slc
~~~

Specifies cache policy. The default value is off (0).

See AMD documentation for details.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    slc                                      Set slc bit to 1.
    ======================================== ================================================

.. _amdgpu_synid_tfe:

tfe
~~~

Controls access to partially resident textures. The default value is off (0).

See AMD documentation for details.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    tfe                                      Set tfe bit to 1.
    ======================================== ================================================

.. _amdgpu_synid_nv:

nv
~~

Specifies if the instruction is operating on non-volatile memory. By default, memory is volatile.

GFX9 only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    nv                                       Indicates that instruction operates on
                                             non-volatile memory.
    ======================================== ================================================

MUBUF/MTBUF Modifiers
---------------------

.. _amdgpu_synid_idxen:

idxen
~~~~~

Specifies whether address components include an index. By default, no components are used.

Can be used together with :ref:`offen<amdgpu_synid_offen>`.

Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    idxen                                    Address components include an index.
    ======================================== ================================================

.. _amdgpu_synid_offen:

offen
~~~~~

Specifies whether address components include an offset. By default, no components are used.

Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.

Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    offen                                    Address components include an offset.
    ======================================== ================================================

.. _amdgpu_synid_addr64:

addr64
~~~~~~

Specifies whether a 64-bit address is used. By default, a 64-bit address is not used.

GFX7 only. Cannot be used with the :ref:`offen<amdgpu_synid_offen>` and
:ref:`idxen<amdgpu_synid_idxen>` modifiers.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    addr64                                   A 64-bit address is used.
    ======================================== ================================================

.. _amdgpu_synid_buf_offset12:

offset12
~~~~~~~~

Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.

    =============================== ======================================================
    Syntax                          Description
    =============================== ======================================================
    offset:{0..0xFFF}               Specifies a 12-bit unsigned offset as a positive
                                    :ref:`integer number <amdgpu_synid_integer_number>`.
    =============================== ======================================================

Examples:

.. parsed-literal::

    offset:0
    offset:0x10

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

slc
~~~

See a description :ref:`here<amdgpu_synid_slc>`.

.. _amdgpu_synid_lds:

lds
~~~

Specifies where to store the result: VGPRs or LDS (VGPRs by default).

    ======================================== ===========================
    Syntax                                   Description
    ======================================== ===========================
    lds                                      Store result in LDS.
    ======================================== ===========================

tfe
~~~

See a description :ref:`here<amdgpu_synid_tfe>`.

.. _amdgpu_synid_dfmt:

dfmt
~~~~

TBD

.. _amdgpu_synid_nfmt:

nfmt
~~~~

TBD

SMRD/SMEM Modifiers
-------------------

glc
~~~

See a description :ref:`here<amdgpu_synid_glc>`.

nv
~~

See a description :ref:`here<amdgpu_synid_nv>`.

VINTRP Modifiers
----------------

.. _amdgpu_synid_high:

high
~~~~

Specifies which half of the LDS word to use. Low half of LDS word is used by default.
GFX9 only.

    ======================================== ================================
    Syntax                                   Description
    ======================================== ================================
    high                                     Use high half of LDS word.
    ======================================== ================================

VOP1/VOP2 DPP Modifiers
-----------------------

GFX8 and GFX9 only.

.. _amdgpu_synid_dpp_ctrl:

dpp_ctrl
~~~~~~~~

Specifies how data are shared between threads. This is a mandatory modifier.
There is no default value.

Note. The lanes of a wavefront are organized in four banks and four rows.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
    row_mirror                               Mirror threads within row.
    row_half_mirror                          Mirror threads within 1/2 row (8 threads).
    row_bcast:15                             Broadcast 15th thread of each row to next row.
    row_bcast:31                             Broadcast thread 31 to rows 2 and 3.
    wave_shl:1                               Wavefront left shift by 1 thread.
    wave_rol:1                               Wavefront left rotate by 1 thread.
    wave_shr:1                               Wavefront right shift by 1 thread.
    wave_ror:1                               Wavefront right rotate by 1 thread.
    row_shl:{1..15}                          Row shift left by 1-15 threads.
    row_shr:{1..15}                          Row shift right by 1-15 threads.
    row_ror:{1..15}                          Row rotate right by 1-15 threads.
    ======================================== ================================================

Note: Numeric parameters may be specified as either
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.

Examples:

.. parsed-literal::

    quad_perm:[0, 1, 2, 3]
    row_shl:3

.. _amdgpu_synid_row_mask:

row_mask
~~~~~~~~

Controls which rows are enabled for data sharing. By default, all rows are enabled.

Note. The lanes of a wavefront are organized in four banks and four rows.
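
The row and bank organization can be pictured with a small Python model. This is
only a sketch under stated assumptions, not assembler or hardware behavior: it
assumes a 64-lane wavefront in which a row is 16 consecutive lanes and a bank is
4 consecutive lanes within a row, and that mask bit *i* controls row (or bank) *i*.
The function names ``lane_position`` and ``lane_enabled`` are hypothetical.

.. code-block:: python

    # Hypothetical model of DPP row/bank masking (illustration only).
    # Assumed layout: 64 lanes, 4 rows of 16 lanes, 4 banks of 4 lanes per row.

    def lane_position(lane):
        """Map a lane id (0..63) to its (row, bank) under the assumed layout."""
        if not 0 <= lane < 64:
            raise ValueError("lane id must be in 0..63")
        return lane // 16, (lane % 16) // 4

    def lane_enabled(lane, row_mask=0xF, bank_mask=0xF):
        """Return True if both the lane's row bit and bank bit are set.

        The defaults model "all rows enabled" and "all banks enabled".
        """
        row, bank = lane_position(lane)
        return bool((row_mask >> row) & 1) and bool((bank_mask >> bank) & 1)

Under these assumptions, ``lane_enabled(5, row_mask=0b1010)`` is false because
lane 5 belongs to row 0, whose mask bit is cleared.
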

    ======================================== =====================================================
    Syntax                                   Description
    ======================================== =====================================================
    row_mask:{0..15}                         Specifies a *row mask* as a positive
                                             :ref:`integer number <amdgpu_synid_integer_number>`.

                                             Each of 4 bits in the mask controls one
                                             row (0 - disabled, 1 - enabled).
    ======================================== =====================================================

Examples:

.. parsed-literal::

    row_mask:0xf
    row_mask:0b1010
    row_mask:0b1111

.. _amdgpu_synid_bank_mask:

bank_mask
~~~~~~~~~

Controls which banks are enabled for data sharing. By default, all banks are enabled.

Note. The lanes of a wavefront are organized in four banks and four rows.

    ======================================== =======================================================
    Syntax                                   Description
    ======================================== =======================================================
    bank_mask:{0..15}                        Specifies a *bank mask* as a positive
                                             :ref:`integer number <amdgpu_synid_integer_number>`.

                                             Each of 4 bits in the mask controls one
                                             bank (0 - disabled, 1 - enabled).
    ======================================== =======================================================

Examples:

.. parsed-literal::

    bank_mask:0x3
    bank_mask:0b0011
    bank_mask:0b1111

.. _amdgpu_synid_bound_ctrl:

bound_ctrl
~~~~~~~~~~

Controls data sharing when accessing an invalid lane. By default, data sharing with
invalid lanes is disabled.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    bound_ctrl:0                             Enables data sharing with invalid lanes.

                                             Accessing data from an invalid lane will
                                             return zero.
    ======================================== ================================================

VOP1/VOP2/VOPC SDWA Modifiers
-----------------------------

GFX8 and GFX9 only.

clamp
~~~~~

See a description :ref:`here<amdgpu_synid_clamp>`.

omod
~~~~

See a description :ref:`here<amdgpu_synid_omod>`.

GFX9 only.

.. _amdgpu_synid_dst_sel:

dst_sel
~~~~~~~

Selects which bits in the destination are affected. By default, all bits are affected.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    dst_sel:DWORD                            Use bits 31:0.
    dst_sel:BYTE_0                           Use bits 7:0.
    dst_sel:BYTE_1                           Use bits 15:8.
    dst_sel:BYTE_2                           Use bits 23:16.
    dst_sel:BYTE_3                           Use bits 31:24.
    dst_sel:WORD_0                           Use bits 15:0.
    dst_sel:WORD_1                           Use bits 31:16.
    ======================================== ================================================


.. _amdgpu_synid_dst_unused:

dst_unused
~~~~~~~~~~

Controls what to do with the bits in the destination which are not selected
by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
By default, unused bits are preserved.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    dst_unused:UNUSED_PAD                    Pad with zeros.
    dst_unused:UNUSED_SEXT                   Sign-extend upper bits, zero lower bits.
    dst_unused:UNUSED_PRESERVE               Preserve bits.
    ======================================== ================================================

.. _amdgpu_synid_src0_sel:

src0_sel
~~~~~~~~

Controls which bits in the src0 are used. By default, all bits are used.
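
The effect of a source selector can be sketched in Python. This is an
illustration only, not assembler or hardware behavior; the names ``SEL_BITS``
and ``select_bits`` are hypothetical, and the bit ranges are the ones this
document lists for the ``DWORD``, ``BYTE_n`` and ``WORD_n`` selectors. Returning
the selected bits right-aligned is an assumption of the sketch.

.. code-block:: python

    # Hypothetical model of SDWA bit selection (illustration only).
    # Each selector maps to the (high, low) bit range it uses.
    SEL_BITS = {
        "DWORD": (31, 0),
        "BYTE_0": (7, 0),
        "BYTE_1": (15, 8),
        "BYTE_2": (23, 16),
        "BYTE_3": (31, 24),
        "WORD_0": (15, 0),
        "WORD_1": (31, 16),
    }

    def select_bits(value, sel="DWORD"):
        """Return the bits of a 32-bit value picked by a selector, right-aligned."""
        high, low = SEL_BITS[sel]
        width = high - low + 1
        return (value >> low) & ((1 << width) - 1)

For example, ``select_bits(0x11223344, "BYTE_1")`` yields ``0x33``, and the
default selector returns the value unchanged.
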

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    src0_sel:DWORD                           Use bits 31:0.
    src0_sel:BYTE_0                          Use bits 7:0.
    src0_sel:BYTE_1                          Use bits 15:8.
    src0_sel:BYTE_2                          Use bits 23:16.
    src0_sel:BYTE_3                          Use bits 31:24.
    src0_sel:WORD_0                          Use bits 15:0.
    src0_sel:WORD_1                          Use bits 31:16.
    ======================================== ================================================

.. _amdgpu_synid_src1_sel:

src1_sel
~~~~~~~~

Controls which bits in the src1 are used. By default, all bits are used.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    src1_sel:DWORD                           Use bits 31:0.
    src1_sel:BYTE_0                          Use bits 7:0.
    src1_sel:BYTE_1                          Use bits 15:8.
    src1_sel:BYTE_2                          Use bits 23:16.
    src1_sel:BYTE_3                          Use bits 31:24.
    src1_sel:WORD_0                          Use bits 15:0.
    src1_sel:WORD_1                          Use bits 31:16.
    ======================================== ================================================

.. _amdgpu_synid_sdwa_operand_modifiers:

VOP1/VOP2/VOPC SDWA Operand Modifiers
-------------------------------------

Operand modifiers are not used separately. They are applied to source operands.

GFX8 and GFX9 only.

abs
~~~

See a description :ref:`here<amdgpu_synid_abs>`.

neg
~~~

See a description :ref:`here<amdgpu_synid_neg>`.

.. _amdgpu_synid_sext:

sext
~~~~

Sign-extends value of a (sub-dword) operand to fill all 32 bits.
Has no effect for 32-bit operands.

Valid for integer operands only.
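
Sign extension of a sub-dword value can be sketched in Python as follows. This
is an illustration of the arithmetic only; the helper name ``sext`` and its
``width`` parameter are hypothetical and do not correspond to any assembler
interface.

.. code-block:: python

    # Hypothetical model of sign extension to 32 bits (illustration only).

    def sext(value, width):
        """Sign-extend the low `width` bits of value to a 32-bit pattern."""
        value &= (1 << width) - 1      # keep only the sub-dword bits
        sign = 1 << (width - 1)        # position of the sign bit
        return ((value ^ sign) - sign) & 0xFFFFFFFF

With this sketch, an 8-bit value ``0x80`` becomes ``0xFFFFFF80``, an 8-bit value
``0x7F`` stays ``0x7F``, and a 32-bit operand is returned unchanged.
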

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    sext(<operand>)                          Sign-extend operand value.
    ======================================== ================================================

Examples:

.. parsed-literal::

    sext(v4)
    sext(v255)

VOP3 Modifiers
--------------

.. _amdgpu_synid_vop3_op_sel:

op_sel
~~~~~~

Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
By default, low bits are used for all operands.

The number of values specified with the op_sel modifier must match the number of instruction
operands (both source and destination). First value controls src0, second value controls src1
and so on, except that the last value controls destination.
The value 0 selects the low bits, while 1 selects the high bits.

Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
by op_sel must be 0.

GFX9 only.

    ======================================== ============================================================
    Syntax                                   Description
    ======================================== ============================================================
    op_sel:[{0..1},{0..1}]                   Select operand bits for instructions with 1 source operand.
    op_sel:[{0..1},{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
    op_sel:[{0..1},{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
    ======================================== ============================================================

Examples:

.. parsed-literal::

    op_sel:[0,0]
    op_sel:[0,1]

.. _amdgpu_synid_clamp:

clamp
~~~~~

Clamp meaning depends on instruction.

For *v_cmp* instructions, the *clamp* modifier indicates that the comparison signals
when a floating point exception occurs. By default, signaling is disabled.
Not supported by GFX7.

For integer operations, the *clamp* modifier indicates that the result must be clamped
to the range between the smallest and largest representable values. By default, there is no clamping.
Integer clamping is not supported by GFX7.

For floating point operations, the *clamp* modifier indicates that the result must be clamped
to the range [0.0, 1.0]. By default, there is no clamping.

Note. The *clamp* modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    clamp                                    Enables clamping (or signaling).
    ======================================== ================================================

.. _amdgpu_synid_omod:

omod
~~~~

Specifies if an output modifier must be applied to the result.
By default, no output modifiers are applied.

Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).

Output modifiers are valid for f32 and f64 floating point results only.
They must not be used with f16.

Note. *v_cvt_f16_f32* is an exception. This instruction produces an f16 result
but accepts output modifiers.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    mul:2                                    Multiply the result by 2.
    mul:4                                    Multiply the result by 4.
    div:2                                    Multiply the result by 0.5.
    ======================================== ================================================

.. _amdgpu_synid_vop3_operand_modifiers:

VOP3 Operand Modifiers
----------------------

Operand modifiers are not used separately. They are applied to source operands.

.. _amdgpu_synid_abs:

abs
~~~

Computes the absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any).
Valid for floating point operands only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    abs(<operand>)                           Get absolute value of operand.
    \|<operand>|                             The same as above.
    ======================================== ================================================

Examples:

.. parsed-literal::

    abs(v36)
    \|v36|

.. _amdgpu_synid_neg:

neg
~~~

Computes the negative value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any).
Valid for floating point operands only.

    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    neg(<operand>)                           Get negative value of operand.
    -<operand>                               The same as above.
    ======================================== ================================================

Examples:

.. parsed-literal::

    neg(v[0])
    -v4

VOP3P Modifiers
---------------

This section describes modifiers of *regular* VOP3P instructions.

*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16*
instructions use these modifiers :ref:`in a special manner<amdgpu_synid_mad_mix>`.

GFX9 only.

.. _amdgpu_synid_op_sel:

op_sel
~~~~~~

Selects the low [15:0] or high [31:16] operand bits as input to the operation
which results in the lower-half of the destination.
By default, low bits are used for all operands.

The number of values specified by the *op_sel* modifier must match the number of source
operands. The first value controls src0, the second value controls src1, and so on.

The value 0 selects the low bits, while 1 selects the high bits.

    ================================= =============================================================
    Syntax                            Description
    ================================= =============================================================
    op_sel:[{0..1}]                   Select operand bits for instructions with 1 source operand.
    op_sel:[{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
    op_sel:[{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
    ================================= =============================================================

Examples:

.. parsed-literal::

    op_sel:[0,0]
    op_sel:[0,1,0]

.. _amdgpu_synid_op_sel_hi:

op_sel_hi
~~~~~~~~~

Selects the low [15:0] or high [31:16] operand bits as input to the operation
which results in the upper-half of the destination.
By default, high bits are used for all operands.

The number of values specified by the *op_sel_hi* modifier must match the number of source
operands. The first value controls src0, the second value controls src1, and so on.

The value 0 selects the low bits, while 1 selects the high bits.
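
As an illustration only, the following Python sketch models how *op_sel* and
*op_sel_hi* pick operand halves for a packed 16-bit operation. It is a simplified
model under stated assumptions, not the hardware definition: the helper names
(``select_half``, ``packed_op``, ``add_u16``) are hypothetical, and each half is
treated as an unsigned 16-bit integer.

.. code-block:: python

    def select_half(value32, sel):
        """Return bits [31:16] of value32 if sel is 1, otherwise bits [15:0]."""
        return (value32 >> 16) & 0xFFFF if sel else value32 & 0xFFFF

    def packed_op(op, srcs, op_sel=None, op_sel_hi=None):
        """Apply a 16-bit operation twice to build a packed 32-bit result."""
        n = len(srcs)
        # Defaults described in the text: op_sel uses low bits for all operands,
        # op_sel_hi uses high bits for all operands.
        op_sel = [0] * n if op_sel is None else list(op_sel)
        op_sel_hi = [1] * n if op_sel_hi is None else list(op_sel_hi)
        if len(op_sel) != n or len(op_sel_hi) != n:
            raise ValueError("one selector value is required per source operand")
        # op_sel feeds the operation that produces the lower half of the destination.
        lo = op(*[select_half(s, b) for s, b in zip(srcs, op_sel)]) & 0xFFFF
        # op_sel_hi feeds the operation that produces the upper half.
        hi = op(*[select_half(s, b) for s, b in zip(srcs, op_sel_hi)]) & 0xFFFF
        return (hi << 16) | lo

    def add_u16(a, b):
        """Hypothetical 16-bit operation used only for this illustration."""
        return (a + b) & 0xFFFF

    a = 0x00030001  # high half = 0x0003, low half = 0x0001
    b = 0x00400020  # high half = 0x0040, low half = 0x0020

    default_result = packed_op(add_u16, [a, b])
    swapped_result = packed_op(add_u16, [a, b], op_sel=[0, 1], op_sel_hi=[1, 0])

With the default selectors each half of the destination is computed from the matching
halves of the sources. With ``op_sel=[0, 1]`` and ``op_sel_hi=[1, 0]`` the lower half
combines the low bits of src0 with the high bits of src1, and the upper half does the
opposite.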

    =================================== =============================================================
    Syntax                              Description
    =================================== =============================================================
    op_sel_hi:[{0..1}]                  Select operand bits for instructions with 1 source operand.
    op_sel_hi:[{0..1},{0..1}]           Select operand bits for instructions with 2 source operands.
    op_sel_hi:[{0..1},{0..1},{0..1}]    Select operand bits for instructions with 3 source operands.
    =================================== =============================================================

Examples:

.. parsed-literal::

    op_sel_hi:[0,0]
    op_sel_hi:[0,0,1]

.. _amdgpu_synid_neg_lo:

neg_lo
~~~~~~

Specifies whether to change the sign of operand values selected by
:ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
as input to the operation which results in the lower-half of the destination.

The number of values specified by this modifier must match the number of source
operands. The first value controls src0, the second value controls src1, and so on.

The value 0 indicates that the corresponding operand value is used unmodified,
while 1 indicates that the negated value of the operand must be used.

By default, operand values are used unmodified.

This modifier is valid for floating point operands only.

    ================================ ==================================================================
    Syntax                           Description
    ================================ ==================================================================
    neg_lo:[{0..1}]                  Select affected operands for instructions with 1 source operand.
    neg_lo:[{0..1},{0..1}]           Select affected operands for instructions with 2 source operands.
    neg_lo:[{0..1},{0..1},{0..1}]    Select affected operands for instructions with 3 source operands.
    ================================ ==================================================================

Examples:

.. parsed-literal::

    neg_lo:[0]
    neg_lo:[0,1]

.. _amdgpu_synid_neg_hi:

neg_hi
~~~~~~

Specifies whether to change the sign of operand values selected by
:ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
as input to the operation which results in the upper-half of the destination.

The number of values specified by this modifier must match the number of source
operands. The first value controls src0, the second value controls src1, and so on.

The value 0 indicates that the corresponding operand value is used unmodified,
while 1 indicates that the negated value of the operand must be used.

By default, operand values are used unmodified.

This modifier is valid for floating point operands only.

    =============================== ==================================================================
    Syntax                          Description
    =============================== ==================================================================
    neg_hi:[{0..1}]                 Select affected operands for instructions with 1 source operand.
    neg_hi:[{0..1},{0..1}]          Select affected operands for instructions with 2 source operands.
    neg_hi:[{0..1},{0..1},{0..1}]   Select affected operands for instructions with 3 source operands.
    =============================== ==================================================================

Examples:

.. parsed-literal::

    neg_hi:[1,0]
    neg_hi:[0,1,1]

clamp
~~~~~

See a description :ref:`here<amdgpu_synid_clamp>`.

.. _amdgpu_synid_mad_mix:

VOP3P V_MAD_MIX Modifiers
-------------------------

*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16* instructions
use *op_sel* and *op_sel_hi* modifiers
in a manner different from *regular* VOP3P instructions.

See a description below.

GFX9 only.

.. _amdgpu_synid_mad_mix_op_sel:

m_op_sel
~~~~~~~~

This modifier has meaning only for 16-bit source operands as indicated by
:ref:`m_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
It selects either the low [15:0] or high [31:16] operand bits
as input to the operation.

The number of values specified by the *op_sel* modifier must match the number of source
operands. The first value controls src0, the second value controls src1, and so on.

The value 0 selects the low 16 bits, while 1 selects the high 16 bits.

By default, low bits are used for all operands.

    =============================== ================================================
    Syntax                          Description
    =============================== ================================================
    op_sel:[{0..1},{0..1},{0..1}]   Select location of each 16-bit source operand.
    =============================== ================================================

Examples:

.. parsed-literal::

    op_sel:[0,1]

.. _amdgpu_synid_mad_mix_op_sel_hi:

m_op_sel_hi
~~~~~~~~~~~

Selects the size of source operands: either 32 bits or 16 bits.
By default, 32 bits are used for all source operands.

The number of values specified by the *op_sel_hi* modifier must match the number of source
operands. The first value controls src0, the second value controls src1, and so on.

The value 0 indicates 32 bits, while 1 indicates 16 bits.

The location of 16 bits in the operand may be specified by
:ref:`m_op_sel<amdgpu_synid_mad_mix_op_sel>`.
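
The interaction between the two modifiers can be sketched in Python as follows.
This is a rough model under stated assumptions, not a definition of the instruction:
the helper names (``bits_to_f32``, ``bits_to_f16``, ``mix_source``, ``mad_mix``) are
hypothetical, the arithmetic is done in Python floats without modeling hardware
rounding, and *abs*, *neg* and *clamp* are ignored.

.. code-block:: python

    import struct

    def bits_to_f32(bits):
        """Reinterpret a 32-bit pattern as an IEEE single-precision value."""
        return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

    def bits_to_f16(bits):
        """Reinterpret a 16-bit pattern as an IEEE half-precision value."""
        return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]

    def mix_source(value32, sel, sel_hi):
        """Read one source operand as described for m_op_sel / m_op_sel_hi."""
        if sel_hi == 0:
            # op_sel_hi value 0: the operand is a full 32-bit value.
            return bits_to_f32(value32)
        # op_sel_hi value 1: the operand is 16 bits wide; op_sel picks its location.
        half = (value32 >> 16) & 0xFFFF if sel else value32 & 0xFFFF
        return bits_to_f16(half)

    def mad_mix(srcs, op_sel=(0, 0, 0), op_sel_hi=(0, 0, 0)):
        """Multiply-add over three sources; the defaults mirror the text above."""
        a, b, c = (mix_source(s, lo, hi)
                   for s, lo, hi in zip(srcs, op_sel, op_sel_hi))
        return a * b + c

    # src0 is f32 2.0; src1 holds f16 3.0 in its high half and f16 1.0 in its
    # low half; src2 holds f16 0.5 in its low half.
    mixed = mad_mix([0x40000000, 0x42003C00, 0x00003800],
                    op_sel=[0, 1, 0], op_sel_hi=[0, 1, 1])

    # With the defaults every source is read as a 32-bit value: 2.0 * 1.0 + 3.0.
    all_f32 = mad_mix([0x40000000, 0x3F800000, 0x40400000])

In the first call src0 is read as a 32-bit value, src1 as the 16-bit value in its high
half, and src2 as the 16-bit value in its low half. The *op_sel* value for src0 has no
effect because that operand is 32 bits wide.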

    ======================================== ====================================
    Syntax                                   Description
    ======================================== ====================================
    op_sel_hi:[{0..1},{0..1},{0..1}]         Select size of each source operand.
    ======================================== ====================================

Examples:

.. parsed-literal::

    op_sel_hi:[1,1,1]

abs
~~~

See a description :ref:`here<amdgpu_synid_abs>`.

neg
~~~

See a description :ref:`here<amdgpu_synid_neg>`.

clamp
~~~~~

See a description :ref:`here<amdgpu_synid_clamp>`.
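
The ordering rules behind the *abs*, *neg* and *clamp* entries referenced above can be
summarised with a small Python sketch. It is illustrative only: the function names
(``apply_source_modifiers``, ``clamp_float``) are hypothetical, and special values
such as NaN are not modeled.

.. code-block:: python

    def apply_source_modifiers(value, use_abs=False, use_neg=False):
        """Apply abs first, then neg, matching the documented order."""
        if use_abs:
            value = abs(value)
        if use_neg:
            value = -value
        return value

    def clamp_float(value):
        """Clamp a floating point result to the range [0.0, 1.0]."""
        return min(max(value, 0.0), 1.0)

    # neg(abs(x)) always yields a non-positive value: -|-2.5| is -2.5.
    neg_abs = apply_source_modifiers(-2.5, use_abs=True, use_neg=True)

    # |-2.5| * 0.5 is 1.25, which clamping reduces to 1.0.
    clamped = clamp_float(apply_source_modifiers(-2.5, use_abs=True) * 0.5)

Because *abs* is applied before *neg*, combining both modifiers on one operand never
produces a positive input.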