======================================
Syntax of AMDGPU Instruction Modifiers
======================================
4
5.. contents::
6   :local:
7
8Conventions
9===========
10
11The following notation is used throughout this document:
12
13    =================== =============================================================
14    Notation            Description
15    =================== =============================================================
16    {0..N}              Any integer value in the range from 0 to N (inclusive).
17    <x>                 Syntax and meaning of *x* is explained elsewhere.
18    =================== =============================================================
19
20.. _amdgpu_syn_modifiers:
21
22Modifiers
23=========
24
25DS Modifiers
26------------
27
28.. _amdgpu_synid_ds_offset8:
29
30offset8
31~~~~~~~
32
33Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.
34
35Used with DS instructions which have 2 addresses.
36
37    =================== =====================================================
38    Syntax              Description
39    =================== =====================================================
40    offset:{0..0xFF}    Specifies an unsigned 8-bit offset as a positive
41                        :ref:`integer number <amdgpu_synid_integer_number>`.
42    =================== =====================================================
43
44Examples:
45
46.. parsed-literal::
47
48  offset:255
49  offset:0xff
50
51.. _amdgpu_synid_ds_offset16:
52
53offset16
54~~~~~~~~
55
56Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.
57
58Used with DS instructions which have 1 address.
59
60    ==================== ======================================================
61    Syntax               Description
62    ==================== ======================================================
63    offset:{0..0xFFFF}   Specifies an unsigned 16-bit offset as a positive
64                         :ref:`integer number <amdgpu_synid_integer_number>`.
65    ==================== ======================================================
66
67Examples:
68
69.. parsed-literal::
70
71  offset:65535
72  offset:0xffff
73
74.. _amdgpu_synid_sw_offset16:
75
76swizzle pattern
77~~~~~~~~~~~~~~~
78
This is a special modifier which may be used only with the *ds_swizzle_b32* instruction.
It specifies a swizzle pattern in numeric or symbolic form. The default value is 0.
81
82See AMD documentation for more information.
83
    ======================================================= ===========================================================
    Syntax                                                  Description
    ======================================================= ===========================================================
    offset:{0..0xFFFF}                                      Specifies a 16-bit swizzle pattern.
    offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3})   Specifies a quad permute mode pattern.

                                                            Each number is a lane *id*.
    offset:swizzle(BITMASK_PERM, "<mask>")                  Specifies a bitmask permute mode pattern.

                                                            The pattern converts a 5-bit lane *id* to another
                                                            lane *id* with which the lane interacts.

                                                            *mask* is a 5 character sequence which
                                                            specifies how to transform the bits of the
                                                            lane *id*.

                                                            The following characters are allowed:

                                                            * "0" - set bit to 0.

                                                            * "1" - set bit to 1.

                                                            * "p" - preserve bit.

                                                            * "i" - invert bit.

    offset:swizzle(BROADCAST,{2..32},{0..N})                Specifies a broadcast mode.

                                                            Broadcasts the value of any particular lane to
                                                            all lanes in its group.

                                                            The first numeric parameter is a group
                                                            size and must be equal to 2, 4, 8, 16 or 32.

                                                            The second numeric parameter is an index of the
                                                            lane being broadcast.

                                                            The index must be less than the group size.
    offset:swizzle(SWAP,{1..16})                            Specifies a swap mode.

                                                            Swaps the neighboring groups of
                                                            1, 2, 4, 8 or 16 lanes.
    offset:swizzle(REVERSE,{2..32})                         Specifies a reverse mode.

                                                            Reverses the lanes for groups of 2, 4, 8, 16 or 32 lanes.
    ======================================================= ===========================================================
130
131Numeric parameters may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
132:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
133
134Examples:
135
136.. parsed-literal::
137
138  offset:255
139  offset:0xffff
  offset:swizzle(QUAD_PERM, 0, 1, 2, 3)
141  offset:swizzle(BITMASK_PERM, "01pi0")
142  offset:swizzle(BROADCAST, 2, 0)
143  offset:swizzle(SWAP, 8)
144  offset:swizzle(REVERSE, 30 + 2)
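
The symbolic swizzle modes above can be read as mappings from a lane *id* to the
lane it interacts with. The following is a minimal Python sketch of that reading,
not the assembler's implementation. All function names are hypothetical, lane ids
are taken to be in the range 0..31, and the sketch assumes that the leftmost
character of *mask* controls the most significant bit of the lane *id*.

.. code-block:: python

    def bitmask_perm(mask: str, lane: int) -> int:
        """Apply a 5-character BITMASK_PERM mask to a 5-bit lane id.

        Assumption: the leftmost character controls the most significant bit.
        """
        if len(mask) != 5 or any(c not in "01pi" for c in mask):
            raise ValueError("mask must be 5 characters from '0', '1', 'p', 'i'")
        result = 0
        for pos, ch in enumerate(mask):
            bit = 4 - pos                  # leftmost character -> bit 4
            old = (lane >> bit) & 1
            new = {"0": 0, "1": 1, "p": old, "i": old ^ 1}[ch]
            result |= new << bit
        return result


    def broadcast(group_size: int, index: int, lane: int) -> int:
        """Lane selected by swizzle(BROADCAST, group_size, index)."""
        return (lane // group_size) * group_size + index


    def swap(group_size: int, lane: int) -> int:
        """Lane selected by swizzle(SWAP, group_size): exchange neighboring groups."""
        return lane ^ group_size


    def reverse(group_size: int, lane: int) -> int:
        """Lane selected by swizzle(REVERSE, group_size): mirror within each group."""
        return lane ^ (group_size - 1)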
145
146.. _amdgpu_synid_gds:
147
148gds
149~~~
150
151Specifies whether to use GDS or LDS memory (LDS is the default).
152
153    ======================================== ================================================
154    Syntax                                   Description
155    ======================================== ================================================
156    gds                                      Use GDS memory.
157    ======================================== ================================================
158
159
160EXP Modifiers
161-------------
162
163.. _amdgpu_synid_done:
164
165done
166~~~~
167
Specifies whether this is the last export from the shader to the target. By default, an
*exp* instruction does not finish an export sequence.
170
171    ======================================== ================================================
172    Syntax                                   Description
173    ======================================== ================================================
174    done                                     Indicates the last export operation.
175    ======================================== ================================================
176
177.. _amdgpu_synid_compr:
178
179compr
180~~~~~
181
182Indicates if the data are compressed (data are not compressed by default).
183
184    ======================================== ================================================
185    Syntax                                   Description
186    ======================================== ================================================
187    compr                                    Data are compressed.
188    ======================================== ================================================
189
190.. _amdgpu_synid_vm:
191
192vm
193~~
194
195Specifies valid mask flag state (off by default).
196
197    ======================================== ================================================
198    Syntax                                   Description
199    ======================================== ================================================
200    vm                                       Set valid mask flag.
201    ======================================== ================================================
202
203FLAT Modifiers
204--------------
205
206.. _amdgpu_synid_flat_offset12:
207
208offset12
209~~~~~~~~
210
211Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
212
213Cannot be used with *global/scratch* opcodes. GFX9 only.
214
215    ================= ======================================================
216    Syntax            Description
217    ================= ======================================================
218    offset:{0..4095}  Specifies a 12-bit unsigned offset as a positive
219                      :ref:`integer number <amdgpu_synid_integer_number>`.
220    ================= ======================================================
221
222Examples:
223
224.. parsed-literal::
225
226  offset:4095
227  offset:0xff
228
229.. _amdgpu_synid_flat_offset13s:
230
231offset13s
232~~~~~~~~~
233
234Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.
235
236Can be used with *global/scratch* opcodes only. GFX9 only.
237
238    ============================ =======================================================
239    Syntax                       Description
240    ============================ =======================================================
241    offset:{-4096..4095}         Specifies a 13-bit signed offset as an
242                                 :ref:`integer number <amdgpu_synid_integer_number>`.
243    ============================ =======================================================
244
245Examples:
246
247.. parsed-literal::
248
249  offset:-4000
250  offset:0x10
251
252.. _amdgpu_synid_flat_offset12s:
253
254offset12s
255~~~~~~~~~
256
257Specifies an immediate signed 12-bit offset, in bytes. The default value is 0.
258
259Can be used with *global/scratch* opcodes only.
260
261GFX10 only.
262
263    ============================ =======================================================
264    Syntax                       Description
265    ============================ =======================================================
266    offset:{-2048..2047}         Specifies a 12-bit signed offset as an
267                                 :ref:`integer number <amdgpu_synid_integer_number>`.
268    ============================ =======================================================
269
270Examples:
271
272.. parsed-literal::
273
274  offset:-2000
275  offset:0x10
276
277.. _amdgpu_synid_flat_offset11:
278
279offset11
280~~~~~~~~
281
282Specifies an immediate unsigned 11-bit offset, in bytes. The default value is 0.
283
284Cannot be used with *global/scratch* opcodes.
285
286GFX10 only.
287
288    ================= ======================================================
289    Syntax            Description
290    ================= ======================================================
291    offset:{0..2047}  Specifies an 11-bit unsigned offset as a positive
292                      :ref:`integer number <amdgpu_synid_integer_number>`.
293    ================= ======================================================
294
295Examples:
296
297.. parsed-literal::
298
299  offset:2047
300  offset:0xff
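
The four FLAT offset variants differ only in the target and in whether the opcode
is a *global/scratch* one. As a rough illustration, the ranges listed in this
section can be collected into a lookup table. The names ``FLAT_OFFSET_RANGES`` and
``flat_offset_is_valid`` and the ``"gfx9"``/``"gfx10"`` keys are hypothetical and
are not part of the assembler.

.. code-block:: python

    # (target, is_global_or_scratch) -> inclusive (min, max) offset in bytes,
    # taken from the tables in this section.
    FLAT_OFFSET_RANGES = {
        ("gfx9", False): (0, 4095),       # offset12
        ("gfx9", True): (-4096, 4095),    # offset13s
        ("gfx10", False): (0, 2047),      # offset11
        ("gfx10", True): (-2048, 2047),   # offset12s
    }


    def flat_offset_is_valid(target, is_global_or_scratch, offset):
        """Return True if the offset fits the range documented for this case."""
        low, high = FLAT_OFFSET_RANGES[(target, is_global_or_scratch)]
        return low <= offset <= high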
301
302dlc
303~~~
304
305See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
306
307glc
308~~~
309
310See a description :ref:`here<amdgpu_synid_glc>`.
311
312lds
313~~~
314
315See a description :ref:`here<amdgpu_synid_lds>`. GFX10 only.
316
317slc
318~~~
319
320See a description :ref:`here<amdgpu_synid_slc>`.
321
322tfe
323~~~
324
325See a description :ref:`here<amdgpu_synid_tfe>`.
326
327nv
328~~
329
330See a description :ref:`here<amdgpu_synid_nv>`.
331
332MIMG Modifiers
333--------------
334
335.. _amdgpu_synid_dmask:
336
337dmask
338~~~~~
339
340Specifies which channels (image components) are used by the operation. By default, no channels
341are used.
342
343    =============== =====================================================
344    Syntax          Description
345    =============== =====================================================
346    dmask:{0..15}   Specifies image channels as a positive
347                    :ref:`integer number <amdgpu_synid_integer_number>`.
348
349                    Each bit corresponds to one of 4 image
350                    components (RGBA).
351
352                    If the specified bit value
353                    is 0, the component is not used, value 1 means
354                    that the component is used.
355    =============== =====================================================
356
357This modifier has some limitations depending on instruction kind:
358
359    =================================================== ========================
360    Instruction Kind                                    Valid dmask Values
361    =================================================== ========================
362    32-bit atomic *cmpswap*                             0x3
363    32-bit atomic instructions except for *cmpswap*     0x1
364    64-bit atomic *cmpswap*                             0xF
365    64-bit atomic instructions except for *cmpswap*     0x3
366    *gather4*                                           0x1, 0x2, 0x4, 0x8
367    Other instructions                                  any value
368    =================================================== ========================
369
370Examples:
371
372.. parsed-literal::
373
374  dmask:0xf
375  dmask:0b1111
376  dmask:3
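
A short Python sketch may help to see how a *dmask* value maps to channels and how
the limitations table can be checked. It assumes that bit 0 selects R and bit 3
selects A, which this document does not state explicitly. The instruction kind
names and helper functions are hypothetical.

.. code-block:: python

    CHANNELS = "RGBA"  # assumption: bit 0 selects R, bit 3 selects A

    # Valid dmask values per instruction kind, from the table above.
    VALID_DMASK = {
        "atomic32_cmpswap": {0x3},
        "atomic32": {0x1},
        "atomic64_cmpswap": {0xF},
        "atomic64": {0x3},
        "gather4": {0x1, 0x2, 0x4, 0x8},
    }


    def dmask_channels(dmask):
        """List the channels enabled by a dmask value in the range 0..15."""
        if not 0 <= dmask <= 15:
            raise ValueError("dmask must be in the range 0..15")
        return [name for bit, name in enumerate(CHANNELS) if dmask & (1 << bit)]


    def dmask_is_valid(kind, dmask):
        """Check a dmask value against the limitations for an instruction kind."""
        allowed = VALID_DMASK.get(kind)
        if allowed is None:            # other instructions accept any value
            return 0 <= dmask <= 15
        return dmask in allowed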
377
378.. _amdgpu_synid_unorm:
379
380unorm
381~~~~~
382
383Specifies whether the address is normalized or not (the address is normalized by default).
384
385    ======================== ========================================
386    Syntax                   Description
387    ======================== ========================================
388    unorm                    Force the address to be unnormalized.
389    ======================== ========================================
390
391glc
392~~~
393
394See a description :ref:`here<amdgpu_synid_glc>`.
395
396slc
397~~~
398
399See a description :ref:`here<amdgpu_synid_slc>`.
400
401.. _amdgpu_synid_r128:
402
403r128
404~~~~
405
406Specifies texture resource size. The default size is 256 bits.
407
408GFX7, GFX8 and GFX10 only.
409
    =================== ================================================
    Syntax              Description
    =================== ================================================
    r128                Specifies a 128-bit texture resource size.
    =================== ================================================

.. WARNING:: Using this modifier should decrease the *rsrc* operand size from 8 to 4 dwords, but the assembler does not currently support this feature.
417
418tfe
419~~~
420
421See a description :ref:`here<amdgpu_synid_tfe>`.
422
423.. _amdgpu_synid_lwe:
424
425lwe
426~~~
427
428Specifies LOD warning status (LOD warning is disabled by default).
429
430    ======================================== ================================================
431    Syntax                                   Description
432    ======================================== ================================================
433    lwe                                      Enables LOD warning.
434    ======================================== ================================================
435
436.. _amdgpu_synid_da:
437
438da
439~~
440
Specifies whether an array index must be sent to TA. By default, the array index is not sent.
442
443    ======================================== ================================================
444    Syntax                                   Description
445    ======================================== ================================================
446    da                                       Send an array-index to TA.
447    ======================================== ================================================
448
449.. _amdgpu_synid_d16:
450
451d16
452~~~
453
454Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
455
    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    d16                                      Enables 16-bit data mode.

                                             On loads, convert data in memory to 16-bit
                                             format before storing it in VGPRs.

                                             On stores, convert 16-bit data in VGPRs to
                                             32 bits before going to memory.

                                             Note that GFX8.0 does not support data packing.
                                             Each 16-bit data element occupies 1 VGPR.

                                             GFX8.1, GFX9 and GFX10 support data packing.
                                             Each pair of 16-bit data elements
                                             occupies 1 VGPR.
    ======================================== ================================================
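
The effect of data packing on register usage can be illustrated with a small
calculation. This is only a sketch based on the description above.
``d16_vgpr_count`` is a hypothetical helper, and ``packed`` should be true for
targets that support data packing (GFX8.1, GFX9 and GFX10).

.. code-block:: python

    def d16_vgpr_count(num_elements, packed):
        """Number of VGPRs occupied by num_elements 16-bit data elements."""
        if packed:
            # Each pair of elements shares one VGPR; an odd element needs its own.
            return (num_elements + 1) // 2
        # Without packing, each element occupies a whole VGPR.
        return num_elements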
474
475.. _amdgpu_synid_a16:
476
477a16
478~~~
479
480Specifies size of image address components: 16 or 32 bits (32 bits by default).
481GFX9 and GFX10 only.
482
    ======================================== ================================================
    Syntax                                   Description
    ======================================== ================================================
    a16                                      Enables 16-bit image address components.
    ======================================== ================================================
488
489.. _amdgpu_synid_dim:
490
491dim
492~~~
493
494Specifies surface dimension. This is a mandatory modifier. There is no default value.
495
496GFX10 only.
497
    =============================== =========================================================
    Syntax                          Description
    =============================== =========================================================
    dim:1D                          One-dimensional image.
    dim:2D                          Two-dimensional image.
    dim:3D                          Three-dimensional image.
    dim:CUBE                        Cubemap array.
    dim:1D_ARRAY                    One-dimensional image array.
    dim:2D_ARRAY                    Two-dimensional image array.
    dim:2D_MSAA                     Two-dimensional multi-sample anti-aliasing image.
    dim:2D_MSAA_ARRAY               Two-dimensional multi-sample anti-aliasing image array.
    =============================== =========================================================
510
511The following table defines an alternative syntax which is supported
512for compatibility with SP3 assembler:
513
    =============================== =========================================================
    Syntax                          Description
    =============================== =========================================================
    dim:SQ_RSRC_IMG_1D              One-dimensional image.
    dim:SQ_RSRC_IMG_2D              Two-dimensional image.
    dim:SQ_RSRC_IMG_3D              Three-dimensional image.
    dim:SQ_RSRC_IMG_CUBE            Cubemap array.
    dim:SQ_RSRC_IMG_1D_ARRAY        One-dimensional image array.
    dim:SQ_RSRC_IMG_2D_ARRAY        Two-dimensional image array.
    dim:SQ_RSRC_IMG_2D_MSAA         Two-dimensional multi-sample anti-aliasing image.
    dim:SQ_RSRC_IMG_2D_MSAA_ARRAY   Two-dimensional multi-sample anti-aliasing image array.
    =============================== =========================================================
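
In the two tables above, each SP3 name is the short name with an
``SQ_RSRC_IMG_`` prefix. A tool that accepts both spellings could normalize them
roughly as follows. This is a sketch only, and ``normalize_dim`` is a hypothetical
helper rather than an assembler function.

.. code-block:: python

    DIM_VALUES = {
        "1D", "2D", "3D", "CUBE",
        "1D_ARRAY", "2D_ARRAY", "2D_MSAA", "2D_MSAA_ARRAY",
    }
    SP3_PREFIX = "SQ_RSRC_IMG_"


    def normalize_dim(value):
        """Map either spelling of a dim value to its short form."""
        short = value[len(SP3_PREFIX):] if value.startswith(SP3_PREFIX) else value
        if short not in DIM_VALUES:
            raise ValueError(f"unknown dim value: {value}")
        return short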
526
527dlc
528~~~
529
530See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
531
532Miscellaneous Modifiers
533-----------------------
534
535.. _amdgpu_synid_dlc:
536
537dlc
538~~~
539
540Controls device level cache policy for memory operations. Used for synchronization.
541When specified, forces operation to bypass device level cache making the operation device
542level coherent. By default, instructions use device level cache.
543
544GFX10 only.
545
546    ======================================== ================================================
547    Syntax                                   Description
548    ======================================== ================================================
549    dlc                                      Bypass device level cache.
550    ======================================== ================================================
551
552.. _amdgpu_synid_glc:
553
554glc
555~~~
556
557This modifier has different meaning for loads, stores, and atomic operations.
558The default value is off (0).
559
560See AMD documentation for details.
561
562    ======================================== ================================================
563    Syntax                                   Description
564    ======================================== ================================================
565    glc                                      Set glc bit to 1.
566    ======================================== ================================================
567
568.. _amdgpu_synid_lds:
569
570lds
571~~~
572
573Specifies where to store the result: VGPRs or LDS (VGPRs by default).
574
575    ======================================== ===========================
576    Syntax                                   Description
577    ======================================== ===========================
578    lds                                      Store result in LDS.
579    ======================================== ===========================
580
581.. _amdgpu_synid_nv:
582
583nv
584~~
585
Specifies whether the instruction operates on non-volatile memory. By default, memory is volatile.
587
588GFX9 only.
589
590    ======================================== ================================================
591    Syntax                                   Description
592    ======================================== ================================================
593    nv                                       Indicates that instruction operates on
594                                             non-volatile memory.
595    ======================================== ================================================
596
597.. _amdgpu_synid_slc:
598
599slc
600~~~
601
602Specifies cache policy. The default value is off (0).
603
604See AMD documentation for details.
605
606    ======================================== ================================================
607    Syntax                                   Description
608    ======================================== ================================================
609    slc                                      Set slc bit to 1.
610    ======================================== ================================================
611
612.. _amdgpu_synid_tfe:
613
614tfe
615~~~
616
617Controls access to partially resident textures. The default value is off (0).
618
619See AMD documentation for details.
620
621    ======================================== ================================================
622    Syntax                                   Description
623    ======================================== ================================================
624    tfe                                      Set tfe bit to 1.
625    ======================================== ================================================
626
627MUBUF/MTBUF Modifiers
628---------------------
629
630.. _amdgpu_synid_idxen:
631
632idxen
633~~~~~
634
635Specifies whether address components include an index. By default, no components are used.
636
637Can be used together with :ref:`offen<amdgpu_synid_offen>`.
638
639Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
640
641    ======================================== ================================================
642    Syntax                                   Description
643    ======================================== ================================================
644    idxen                                    Address components include an index.
645    ======================================== ================================================
646
647.. _amdgpu_synid_offen:
648
649offen
650~~~~~
651
652Specifies whether address components include an offset. By default, no components are used.
653
654Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.
655
656Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
657
658    ======================================== ================================================
659    Syntax                                   Description
660    ======================================== ================================================
661    offen                                    Address components include an offset.
662    ======================================== ================================================
663
664.. _amdgpu_synid_addr64:
665
666addr64
667~~~~~~
668
669Specifies whether a 64-bit address is used. By default, no address is used.
670
671GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and
672:ref:`idxen<amdgpu_synid_idxen>` modifiers.
673
674    ======================================== ================================================
675    Syntax                                   Description
676    ======================================== ================================================
677    addr64                                   A 64-bit address is used.
678    ======================================== ================================================
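
The compatibility rules for *idxen*, *offen* and *addr64* can be summarized as a
small check. The sketch below only restates the rules given in this section. The
function name, the error strings and the ``"gfx7"`` target key are hypothetical.

.. code-block:: python

    def check_addressing_modifiers(modifiers, target):
        """Return a list of rule violations for a set of modifier names."""
        errors = []
        if "addr64" in modifiers:
            if target != "gfx7":
                errors.append("addr64 is GFX7 only")
            for other in ("offen", "idxen"):
                if other in modifiers:
                    errors.append(f"addr64 cannot be combined with {other}")
        # idxen and offen may be used together, so no check is needed for them.
        return errors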
679
680.. _amdgpu_synid_buf_offset12:
681
682offset12
683~~~~~~~~
684
685Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
686
687    =============================== ======================================================
688    Syntax                          Description
689    =============================== ======================================================
690    offset:{0..0xFFF}               Specifies a 12-bit unsigned offset as a positive
691                                    :ref:`integer number <amdgpu_synid_integer_number>`.
692    =============================== ======================================================
693
694Examples:
695
696.. parsed-literal::
697
698  offset:0
699  offset:0x10
700
701glc
702~~~
703
704See a description :ref:`here<amdgpu_synid_glc>`.
705
706slc
707~~~
708
709See a description :ref:`here<amdgpu_synid_slc>`.
710
711lds
712~~~
713
714See a description :ref:`here<amdgpu_synid_lds>`.
715
716dlc
717~~~
718
719See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
720
721tfe
722~~~
723
724See a description :ref:`here<amdgpu_synid_tfe>`.
725
726.. _amdgpu_synid_dfmt:
727
728dfmt
729~~~~
730
731TBD
732
733.. _amdgpu_synid_nfmt:
734
735nfmt
736~~~~
737
738TBD
739
740SMRD/SMEM Modifiers
741-------------------
742
743glc
744~~~
745
746See a description :ref:`here<amdgpu_synid_glc>`.
747
748nv
749~~
750
751See a description :ref:`here<amdgpu_synid_nv>`. GFX9 only.
752
753dlc
754~~~
755
756See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
757
758VINTRP Modifiers
759----------------
760
761.. _amdgpu_synid_high:
762
763high
764~~~~
765
Specifies which half of the LDS word to use. The low half of the LDS word is used by default.
GFX9 and GFX10 only.
768
769    ======================================== ================================
770    Syntax                                   Description
771    ======================================== ================================
772    high                                     Use high half of LDS word.
773    ======================================== ================================
774
775DPP8 Modifiers
776--------------
777
778GFX10 only.
779
780.. _amdgpu_synid_dpp8_sel:
781
782dpp8_sel
783~~~~~~~~
784
785Selects which lane to pull data from, within a group of 8 lanes. This is a mandatory modifier.
786There is no default value.
787
788GFX10 only.
789
790The *dpp8_sel* modifier must specify exactly 8 values, each ranging from 0 to 7.
The first value selects the lane that supplies data to lane 0,
the second value selects the source for lane 1, and so on.
793
794    =============================================================== ===========================
795    Syntax                                                          Description
796    =============================================================== ===========================
797    dpp8:[{0..7},{0..7},{0..7},{0..7},{0..7},{0..7},{0..7},{0..7}]  Select lanes to read from.
798    =============================================================== ===========================
799
800Examples:
801
802.. parsed-literal::
803
804  dpp8:[7,6,5,4,3,2,1,0]
805  dpp8:[0,1,0,1,0,1,0,1]
806
807.. _amdgpu_synid_fi8:
808
809fi
810~~
811
812Controls interaction with inactive lanes for *dpp8* instructions. The default value is zero.
813
814Note. *Inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.
815
816GFX10 only.
817
    ==================================== =====================================================
    Syntax                               Description
    ==================================== =====================================================
    fi:0                                 Fetch zero when accessing data from inactive lanes.
    fi:1                                 Fetch pre-existing values from inactive lanes.
    ==================================== =====================================================
824
825DPP/DPP16 Modifiers
826-------------------
827
828GFX8, GFX9 and GFX10 only.
829
830.. _amdgpu_synid_dpp_ctrl:
831
832dpp_ctrl
833~~~~~~~~
834
835Specifies how data are shared between threads. This is a mandatory modifier.
836There is no default value.
837
838GFX8 and GFX9 only. Use :ref:`dpp16_ctrl<amdgpu_synid_dpp16_ctrl>` for GFX10.
839
840Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
841
842    ======================================== ================================================
843    Syntax                                   Description
844    ======================================== ================================================
845    quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
846    row_mirror                               Mirror threads within row.
847    row_half_mirror                          Mirror threads within 1/2 row (8 threads).
848    row_bcast:15                             Broadcast 15th thread of each row to next row.
849    row_bcast:31                             Broadcast thread 31 to rows 2 and 3.
850    wave_shl:1                               Wavefront left shift by 1 thread.
851    wave_rol:1                               Wavefront left rotate by 1 thread.
852    wave_shr:1                               Wavefront right shift by 1 thread.
853    wave_ror:1                               Wavefront right rotate by 1 thread.
854    row_shl:{1..15}                          Row shift left by 1-15 threads.
855    row_shr:{1..15}                          Row shift right by 1-15 threads.
856    row_ror:{1..15}                          Row rotate right by 1-15 threads.
857    ======================================== ================================================
858
859Note: Numeric parameters may be specified as either
860:ref:`integer numbers<amdgpu_synid_integer_number>` or
861:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
862
863Examples:
864
865.. parsed-literal::
866
867  quad_perm:[0, 1, 2, 3]
868  row_shl:3
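
The effect of *quad_perm* can be modelled in a few lines of Python. This is an
illustrative sketch, not a definition of the hardware behavior: the helper name
``quad_perm`` is hypothetical, and the sketch assumes that lane *i* reads from lane
``perm[i % 4]`` of its own group of four lanes.

.. code-block:: python

  def quad_perm(lanes, perm):
      """Return the value each lane reads under quad_perm:[a,b,c,d].

      Assumption: the permutation is applied independently inside every
      group of 4 consecutive lanes.
      """
      if len(perm) != 4 or any(p not in range(4) for p in perm):
          raise ValueError("quad_perm needs four values in {0..3}")
      if len(lanes) % 4 != 0:
          raise ValueError("lane count must be a multiple of 4")
      return [lanes[(i - i % 4) + perm[i % 4]] for i in range(len(lanes))]


  lanes = list(range(8))
  print(quad_perm(lanes, [0, 1, 2, 3]))  # identity permutation
  print(quad_perm(lanes, [3, 2, 1, 0]))  # reverses each group of four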
869
870.. _amdgpu_synid_dpp16_ctrl:
871
872dpp16_ctrl
873~~~~~~~~~~
874
875Specifies how data are shared between threads. This is a mandatory modifier.
876There is no default value.
877
878GFX10 only. Use :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` for GFX8 and GFX9.
879
880Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
881(There are only two rows in *wave32* mode.)
882
883    ======================================== ====================================================
884    Syntax                                   Description
885    ======================================== ====================================================
886    quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
887    row_mirror                               Mirror threads within row.
888    row_half_mirror                          Mirror threads within 1/2 row (8 threads).
889    row_share:{0..15}                        Share the value from the specified lane with other
890                                             lanes in the row.
891    row_xmask:{0..15}                        Fetch from XOR(current lane id, specified lane id).
892    row_shl:{1..15}                          Row shift left by 1-15 threads.
893    row_shr:{1..15}                          Row shift right by 1-15 threads.
894    row_ror:{1..15}                          Row rotate right by 1-15 threads.
895    ======================================== ====================================================
896
897Note: Numeric parameters may be specified as either
898:ref:`integer numbers<amdgpu_synid_integer_number>` or
899:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
900
901Examples:
902
903.. parsed-literal::
904
905  quad_perm:[0, 1, 2, 3]
906  row_shl:3
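
The *row_share* and *row_xmask* controls can be modelled as follows. This is a
hedged Python sketch: the function names are hypothetical, and a row is assumed to be
16 consecutive lanes (consistent with *row_half_mirror* covering 8 threads).

.. code-block:: python

  ROW_SIZE = 16  # assumption: one row is 16 consecutive lanes


  def _check(lanes, n):
      if n not in range(ROW_SIZE):
          raise ValueError("lane id must be in {0..15}")
      if len(lanes) % ROW_SIZE != 0:
          raise ValueError("lane count must be a multiple of the row size")


  def row_share(lanes, n):
      """Every lane reads lane n of its own row."""
      _check(lanes, n)
      return [lanes[(i - i % ROW_SIZE) + n] for i in range(len(lanes))]


  def row_xmask(lanes, n):
      """Every lane reads the lane whose in-row id is (own id XOR n)."""
      _check(lanes, n)
      return [lanes[(i - i % ROW_SIZE) + ((i % ROW_SIZE) ^ n)]
              for i in range(len(lanes))]


  wave32 = list(range(32))          # two rows
  print(row_share(wave32, 5))       # lane 5 in row 0, lane 21 in row 1
  print(row_xmask(wave32, 1)[:4])   # neighbouring lanes swapped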
907
908.. _amdgpu_synid_row_mask:
909
910row_mask
911~~~~~~~~
912
913Controls which rows are enabled for data sharing. By default, all rows are enabled.
914
915Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
916(There are only two rows in *wave32* mode.)
917
918    ======================================== =====================================================
919    Syntax                                   Description
920    ======================================== =====================================================
921    row_mask:{0..15}                         Specifies a *row mask* as a positive
922                                             :ref:`integer number <amdgpu_synid_integer_number>`.
923
924                                             Each of 4 bits in the mask controls one
925                                             row (0 - disabled, 1 - enabled).
926
927                                             In *wave32* mode the values should be limited to
928                                             {0..7}.
929    ======================================== =====================================================
930
931Examples:
932
933.. parsed-literal::
934
935  row_mask:0xf
936  row_mask:0b1010
937  row_mask:0b1111
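
A row mask can be decoded with a short Python helper. The sketch below is
illustrative only; ``enabled_rows`` is a hypothetical name, and it assumes that
bit *i* of the mask controls row *i*.

.. code-block:: python

  def enabled_rows(row_mask, rows=4):
      """List the rows enabled by a row_mask value.

      Assumption: bit i of the mask corresponds to row i.
      """
      if not 0 <= row_mask <= 0xF:
          raise ValueError("row_mask must be in {0..15}")
      return [row for row in range(rows) if (row_mask >> row) & 1]


  print(enabled_rows(0xF))      # all four rows
  print(enabled_rows(0b1010))   # rows 1 and 3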
938
939.. _amdgpu_synid_bank_mask:
940
941bank_mask
942~~~~~~~~~
943
944Controls which banks are enabled for data sharing. By default, all banks are enabled.
945
946Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
947(There are only two rows in *wave32* mode.)
948
949    ======================================== =======================================================
950    Syntax                                   Description
951    ======================================== =======================================================
952    bank_mask:{0..15}                        Specifies a *bank mask* as a positive
953                                             :ref:`integer number <amdgpu_synid_integer_number>`.
954
955                                             Each of 4 bits in the mask controls one
956                                             bank (0 - disabled, 1 - enabled).
957    ======================================== =======================================================
958
959Examples:
960
961.. parsed-literal::
962
963  bank_mask:0x3
964  bank_mask:0b0011
965  bank_mask:0b1111
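
The combined effect of *row_mask* and *bank_mask* can be sketched in Python. The
layout used here is an assumption made for illustration: rows of 16 lanes, each
split into four banks of 4 lanes, with bit *i* of a mask controlling row or bank *i*.
The helper name ``enabled_lanes`` is hypothetical.

.. code-block:: python

  def enabled_lanes(row_mask, bank_mask, wave_size=64):
      """List lanes for which both the row bit and the bank bit are set.

      Assumptions: 16 lanes per row, 4 lanes per bank, bit i <-> row/bank i.
      """
      lanes = []
      for lane in range(wave_size):
          row = lane // 16
          bank = (lane % 16) // 4
          if (row_mask >> row) & 1 and (bank_mask >> bank) & 1:
              lanes.append(lane)
      return lanes


  # Row 0 only, banks 0 and 1 only.
  print(enabled_lanes(0b0001, 0b0011))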
966
967.. _amdgpu_synid_bound_ctrl:
968
969bound_ctrl
970~~~~~~~~~~
971
972Controls data sharing when accessing an invalid lane. By default, data sharing with
973invalid lanes is disabled.
974
975    ======================================== ================================================
976    Syntax                                   Description
977    ======================================== ================================================
978    bound_ctrl:0                             Enables data sharing with invalid lanes.
979
980                                             Accessing data from an invalid lane will
981                                             return zero.
982    ======================================== ================================================
983
984.. _amdgpu_synid_fi16:
985
986fi
987~~
988
989Controls interaction with *inactive* lanes for *dpp16* instructions. The default value is zero.
990
991Note. *Inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.
992
993GFX10 only.
994
995    ======================================== ==================================================
996    Syntax                                   Description
997    ======================================== ==================================================
998    fi:0                                     Interaction with inactive lanes is controlled by
999                                             :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`.
1000
    fi:1                                     Fetch pre-existing values from inactive lanes.
1002    ======================================== ==================================================
1003
1004SDWA Modifiers
1005--------------
1006
1007GFX8, GFX9 and GFX10 only.
1008
1009clamp
1010~~~~~
1011
1012See a description :ref:`here<amdgpu_synid_clamp>`.
1013
1014omod
1015~~~~
1016
1017See a description :ref:`here<amdgpu_synid_omod>`.
1018
1019GFX9 and GFX10 only.
1020
1021.. _amdgpu_synid_dst_sel:
1022
1023dst_sel
1024~~~~~~~
1025
1026Selects which bits in the destination are affected. By default, all bits are affected.
1027
1028    ======================================== ================================================
1029    Syntax                                   Description
1030    ======================================== ================================================
1031    dst_sel:DWORD                            Use bits 31:0.
1032    dst_sel:BYTE_0                           Use bits 7:0.
1033    dst_sel:BYTE_1                           Use bits 15:8.
1034    dst_sel:BYTE_2                           Use bits 23:16.
1035    dst_sel:BYTE_3                           Use bits 31:24.
1036    dst_sel:WORD_0                           Use bits 15:0.
1037    dst_sel:WORD_1                           Use bits 31:16.
1038    ======================================== ================================================
1039
1040
1041.. _amdgpu_synid_dst_unused:
1042
1043dst_unused
1044~~~~~~~~~~
1045
1046Controls what to do with the bits in the destination which are not selected
1047by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
1048By default, unused bits are preserved.
1049
1050    ======================================== ================================================
1051    Syntax                                   Description
1052    ======================================== ================================================
1053    dst_unused:UNUSED_PAD                    Pad with zeros.
1054    dst_unused:UNUSED_SEXT                   Sign-extend upper bits, zero lower bits.
1055    dst_unused:UNUSED_PRESERVE               Preserve bits.
1056    ======================================== ================================================
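
The interaction of *dst_sel* and *dst_unused* can be modelled on 32-bit integers.
The following Python sketch is an illustration built from the two tables above, not
a description of the hardware implementation; ``write_dst`` and ``FIELDS`` are
hypothetical names.

.. code-block:: python

  # (low bit, width) of each selectable field, taken from the dst_sel table.
  FIELDS = {
      "DWORD": (0, 32),
      "BYTE_0": (0, 8), "BYTE_1": (8, 8), "BYTE_2": (16, 8), "BYTE_3": (24, 8),
      "WORD_0": (0, 16), "WORD_1": (16, 16),
  }


  def write_dst(old, result, dst_sel="DWORD", dst_unused="UNUSED_PRESERVE"):
      """Return the new 32-bit destination value."""
      low, width = FIELDS[dst_sel]
      field_mask = ((1 << width) - 1) << low
      value = (result << low) & field_mask
      if dst_unused == "UNUSED_PRESERVE":
          rest = old & ~field_mask              # keep the unselected bits
      elif dst_unused == "UNUSED_PAD":
          rest = 0                              # zero the unselected bits
      elif dst_unused == "UNUSED_SEXT":
          sign = (result >> (width - 1)) & 1    # sign bit of the written field
          upper = 0xFFFFFFFF & ~((1 << (low + width)) - 1)
          rest = upper if sign else 0           # lower bits stay zero
      else:
          raise ValueError(dst_unused)
      return (value | rest) & 0xFFFFFFFF


  print(hex(write_dst(0xAABBCCDD, 0x12, "BYTE_1")))
  print(hex(write_dst(0xAABBCCDD, 0x80, "BYTE_1", "UNUSED_SEXT")))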
1057
1058.. _amdgpu_synid_src0_sel:
1059
1060src0_sel
1061~~~~~~~~
1062
1063Controls which bits in the src0 are used. By default, all bits are used.
1064
1065    ======================================== ================================================
1066    Syntax                                   Description
1067    ======================================== ================================================
1068    src0_sel:DWORD                           Use bits 31:0.
1069    src0_sel:BYTE_0                          Use bits 7:0.
1070    src0_sel:BYTE_1                          Use bits 15:8.
1071    src0_sel:BYTE_2                          Use bits 23:16.
1072    src0_sel:BYTE_3                          Use bits 31:24.
1073    src0_sel:WORD_0                          Use bits 15:0.
1074    src0_sel:WORD_1                          Use bits 31:16.
1075    ======================================== ================================================
1076
1077.. _amdgpu_synid_src1_sel:
1078
1079src1_sel
1080~~~~~~~~
1081
1082Controls which bits in the src1 are used. By default, all bits are used.
1083
1084    ======================================== ================================================
1085    Syntax                                   Description
1086    ======================================== ================================================
1087    src1_sel:DWORD                           Use bits 31:0.
1088    src1_sel:BYTE_0                          Use bits 7:0.
1089    src1_sel:BYTE_1                          Use bits 15:8.
1090    src1_sel:BYTE_2                          Use bits 23:16.
1091    src1_sel:BYTE_3                          Use bits 31:24.
1092    src1_sel:WORD_0                          Use bits 15:0.
1093    src1_sel:WORD_1                          Use bits 31:16.
1094    ======================================== ================================================
1095
1096.. _amdgpu_synid_sdwa_operand_modifiers:
1097
1098SDWA Operand Modifiers
1099----------------------
1100
1101Operand modifiers are not used separately. They are applied to source operands.
1102
1103GFX8, GFX9 and GFX10 only.
1104
1105abs
1106~~~
1107
1108See a description :ref:`here<amdgpu_synid_abs>`.
1109
1110neg
1111~~~
1112
1113See a description :ref:`here<amdgpu_synid_neg>`.
1114
1115.. _amdgpu_synid_sext:
1116
1117sext
1118~~~~
1119
Sign-extends the value of a sub-dword operand to fill all 32 bits.
Has no effect on 32-bit operands.
1122
1123Valid for integer operands only.
1124
1125    ======================================== ================================================
1126    Syntax                                   Description
1127    ======================================== ================================================
1128    sext(<operand>)                          Sign-extend operand value.
1129    ======================================== ================================================
1130
1131Examples:
1132
1133.. parsed-literal::
1134
1135  sext(v4)
1136  sext(v255)
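
The combination of a source selector and *sext* can be sketched in Python. This is
an illustrative model operating on 32-bit integers; ``select_src`` and
``SRC_FIELDS`` are hypothetical names.

.. code-block:: python

  # (low bit, width) of each field, as listed for src0_sel and src1_sel.
  SRC_FIELDS = {
      "DWORD": (0, 32),
      "BYTE_0": (0, 8), "BYTE_1": (8, 8), "BYTE_2": (16, 8), "BYTE_3": (24, 8),
      "WORD_0": (0, 16), "WORD_1": (16, 16),
  }


  def select_src(value, sel="DWORD", sext=False):
      """Extract the selected field and optionally sign-extend it to 32 bits."""
      low, width = SRC_FIELDS[sel]
      field = (value >> low) & ((1 << width) - 1)
      if sext and (field >> (width - 1)) & 1:
          # Fill all bits above the field with ones.
          field |= 0xFFFFFFFF & ~((1 << width) - 1)
      return field


  print(hex(select_src(0x1234FF80, "BYTE_0")))             # zero-extended
  print(hex(select_src(0x1234FF80, "BYTE_0", sext=True)))  # sign-extended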
1137
1138VOP3 Modifiers
1139--------------
1140
1141.. _amdgpu_synid_vop3_op_sel:
1142
1143op_sel
1144~~~~~~
1145
1146Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
1147By default, low bits are used for all operands.
1148
1149The number of values specified with the op_sel modifier must match the number of instruction
1150operands (both source and destination). First value controls src0, second value controls src1
1151and so on, except that the last value controls destination.
1152The value 0 selects the low bits, while 1 selects the high bits.
1153
1154Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
1155by op_sel must be 0.
1156
1157GFX9 and GFX10 only.
1158
1159    ======================================== ============================================================
1160    Syntax                                   Description
1161    ======================================== ============================================================
1162    op_sel:[{0..1},{0..1}]                   Select operand bits for instructions with 1 source operand.
1163    op_sel:[{0..1},{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
1164    op_sel:[{0..1},{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
1165    ======================================== ============================================================
1166
1167Examples:
1168
1169.. parsed-literal::
1170
1171  op_sel:[0,0]
1172  op_sel:[0,1]
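
The mapping between *op_sel* values and operands can be spelled out with a small
Python helper. It is a sketch for illustration; ``describe_op_sel`` is a
hypothetical name.

.. code-block:: python

  def describe_op_sel(op_sel):
      """Map each VOP3 op_sel value to the operand it controls.

      The last value controls the destination; the others control
      src0, src1, ... in order.
      """
      if not 2 <= len(op_sel) <= 4 or any(v not in (0, 1) for v in op_sel):
          raise ValueError("expected 2 to 4 values, each 0 or 1")
      names = [f"src{i}" for i in range(len(op_sel) - 1)] + ["dst"]
      return {name: "high [31:16]" if v else "low [15:0]"
              for name, v in zip(names, op_sel)}


  print(describe_op_sel([0, 1]))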
1173
1174.. _amdgpu_synid_clamp:
1175
1176clamp
1177~~~~~
1178
The meaning of the clamp modifier depends on the instruction.
1180
1181For *v_cmp* instructions, clamp modifier indicates that the compare signals
1182if a floating point exception occurs. By default, signaling is disabled.
1183Not supported by GFX7.
1184
For integer operations, clamp modifier indicates that the result must be clamped
to the range between the smallest and largest representable values. By default, there is no clamping.
1187Integer clamping is not supported by GFX7.
1188
1189For floating point operations, clamp modifier indicates that the result must be clamped
1190to the range [0.0, 1.0]. By default, there is no clamping.
1191
1192Note. Clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).
1193
1194    ======================================== ================================================
1195    Syntax                                   Description
1196    ======================================== ================================================
1197    clamp                                    Enables clamping (or signaling).
1198    ======================================== ================================================
1199
1200.. _amdgpu_synid_omod:
1201
1202omod
1203~~~~
1204
1205Specifies if an output modifier must be applied to the result.
1206By default, no output modifiers are applied.
1207
1208Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).
1209
1210Output modifiers are valid for f32 and f64 floating point results only.
1211They must not be used with f16.
1212
Note. *v_cvt_f16_f32* is an exception. This instruction produces an f16 result
but accepts output modifiers.
1215
1216    ======================================== ================================================
1217    Syntax                                   Description
1218    ======================================== ================================================
1219    mul:2                                    Multiply the result by 2.
1220    mul:4                                    Multiply the result by 4.
1221    div:2                                    Multiply the result by 0.5.
1222    ======================================== ================================================
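
The order in which output modifiers and floating point clamping are applied can be
illustrated in Python. This is a sketch under simplifying assumptions (ordinary
Python floats, no special handling of NaN); ``finish_float_result`` and ``OMOD``
are hypothetical names.

.. code-block:: python

  # Scale factor for each output modifier from the table above.
  OMOD = {None: 1.0, "mul:2": 2.0, "mul:4": 4.0, "div:2": 0.5}


  def finish_float_result(result, omod=None, clamp=False):
      """Apply the output modifier first, then clamp to [0.0, 1.0]."""
      result *= OMOD[omod]
      if clamp:
          result = min(max(result, 0.0), 1.0)
      return result


  print(finish_float_result(0.75, "mul:2"))              # scaled only
  print(finish_float_result(0.75, "mul:2", clamp=True))  # scaled, then clamped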
1223
1224.. _amdgpu_synid_vop3_operand_modifiers:
1225
1226VOP3 Operand Modifiers
1227----------------------
1228
1229Operand modifiers are not used separately. They are applied to source operands.
1230
1231.. _amdgpu_synid_abs:
1232
1233abs
1234~~~
1235
1236Computes absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any).
1237Valid for floating point operands only.
1238
1239    ======================================== ================================================
1240    Syntax                                   Description
1241    ======================================== ================================================
1242    abs(<operand>)                           Get absolute value of operand.
1243    \|<operand>|                             The same as above.
1244    ======================================== ================================================
1245
1246Examples:
1247
1248.. parsed-literal::
1249
1250  abs(v36)
1251  \|v36|
1252
1253.. _amdgpu_synid_neg:
1254
1255neg
1256~~~
1257
1258Computes negative value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any).
1259Valid for floating point operands only.
1260
1261    ======================================== ================================================
1262    Syntax                                   Description
1263    ======================================== ================================================
1264    neg(<operand>)                           Get negative value of operand.
1265    -<operand>                               The same as above.
1266    ======================================== ================================================
1267
1268Examples:
1269
1270.. parsed-literal::
1271
1272  neg(v[0])
1273  -v4
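
When both *abs* and *neg* are applied to one operand, the order matters. A minimal
Python sketch of that order follows; ``apply_source_modifiers`` is a hypothetical
name.

.. code-block:: python

  def apply_source_modifiers(value, use_abs=False, use_neg=False):
      """Apply abs first, then neg, as described for VOP3 operand modifiers."""
      if use_abs:
          value = abs(value)
      if use_neg:
          value = -value
      return value


  # With both modifiers the result is never positive.
  print(apply_source_modifiers(-3.0, use_abs=True, use_neg=True))
  print(apply_source_modifiers(3.0, use_abs=True, use_neg=True))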
1274
1275VOP3P Modifiers
1276---------------
1277
1278This section describes modifiers of *regular* VOP3P instructions.
1279
1280*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16*
1281instructions use these modifiers :ref:`in a special manner<amdgpu_synid_mad_mix>`.
1282
1283GFX9 and GFX10 only.
1284
1285.. _amdgpu_synid_op_sel:
1286
1287op_sel
1288~~~~~~
1289
1290Selects the low [15:0] or high [31:16] operand bits as input to the operation
1291which results in the lower-half of the destination.
1292By default, low bits are used for all operands.
1293
1294The number of values specified by the *op_sel* modifier must match the number of source
1295operands. First value controls src0, second value controls src1 and so on.
1296
1297The value 0 selects the low bits, while 1 selects the high bits.
1298
1299    ================================= =============================================================
1300    Syntax                            Description
1301    ================================= =============================================================
1302    op_sel:[{0..1}]                   Select operand bits for instructions with 1 source operand.
1303    op_sel:[{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
1304    op_sel:[{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
1305    ================================= =============================================================
1306
1307Examples:
1308
1309.. parsed-literal::
1310
1311  op_sel:[0,0]
1312  op_sel:[0,1,0]
1313
1314.. _amdgpu_synid_op_sel_hi:
1315
1316op_sel_hi
1317~~~~~~~~~
1318
1319Selects the low [15:0] or high [31:16] operand bits as input to the operation
1320which results in the upper-half of the destination.
1321By default, high bits are used for all operands.
1322
1323The number of values specified by the *op_sel_hi* modifier must match the number of source
1324operands. First value controls src0, second value controls src1 and so on.
1325
1326The value 0 selects the low bits, while 1 selects the high bits.
1327
1328    =================================== =============================================================
1329    Syntax                              Description
1330    =================================== =============================================================
1331    op_sel_hi:[{0..1}]                  Select operand bits for instructions with 1 source operand.
1332    op_sel_hi:[{0..1},{0..1}]           Select operand bits for instructions with 2 source operands.
1333    op_sel_hi:[{0..1},{0..1},{0..1}]    Select operand bits for instructions with 3 source operands.
1334    =================================== =============================================================
1335
1336Examples:
1337
1338.. parsed-literal::
1339
1340  op_sel_hi:[0,0]
1341  op_sel_hi:[0,0,1]
1342
1343.. _amdgpu_synid_neg_lo:
1344
1345neg_lo
1346~~~~~~
1347
Specifies whether to change the sign of operand values selected by
:ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
as input to the operation which results in the lower-half of the destination.
1351
1352The number of values specified by this modifier must match the number of source
1353operands. First value controls src0, second value controls src1 and so on.
1354
The value 0 indicates that the corresponding operand value is used unmodified,
while the value 1 indicates that the negated value of the operand is used.
1357
1358By default, operand values are used unmodified.
1359
1360This modifier is valid for floating point operands only.
1361
1362    ================================ ==================================================================
1363    Syntax                           Description
1364    ================================ ==================================================================
1365    neg_lo:[{0..1}]                  Select affected operands for instructions with 1 source operand.
1366    neg_lo:[{0..1},{0..1}]           Select affected operands for instructions with 2 source operands.
1367    neg_lo:[{0..1},{0..1},{0..1}]    Select affected operands for instructions with 3 source operands.
1368    ================================ ==================================================================
1369
1370Examples:
1371
1372.. parsed-literal::
1373
1374  neg_lo:[0]
1375  neg_lo:[0,1]
1376
1377.. _amdgpu_synid_neg_hi:
1378
1379neg_hi
1380~~~~~~
1381
1382Specifies whether to change sign of operand values selected by
1383:ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
1384as input to the operation which results in the upper-half of the destination.
1385
1386The number of values specified by this modifier must match the number of source
1387operands. First value controls src0, second value controls src1 and so on.
1388
The value 0 indicates that the corresponding operand value is used unmodified,
while the value 1 indicates that the negated value of the operand is used.
1391
1392By default, operand values are used unmodified.
1393
1394This modifier is valid for floating point operands only.
1395
1396    =============================== ==================================================================
1397    Syntax                          Description
1398    =============================== ==================================================================
1399    neg_hi:[{0..1}]                 Select affected operands for instructions with 1 source operand.
1400    neg_hi:[{0..1},{0..1}]          Select affected operands for instructions with 2 source operands.
1401    neg_hi:[{0..1},{0..1},{0..1}]   Select affected operands for instructions with 3 source operands.
1402    =============================== ==================================================================
1403
1404Examples:
1405
1406.. parsed-literal::
1407
1408  neg_hi:[1,0]
1409  neg_hi:[0,1,1]
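
The way *op_sel*, *op_sel_hi*, *neg_lo* and *neg_hi* cooperate in a packed
operation can be modelled in Python. This is an illustrative sketch, not the
hardware definition: each packed operand is represented as a ``(low, high)`` pair
of floats, *neg_lo* is paired with *op_sel* and *neg_hi* with *op_sel_hi*, and
``packed_op`` is a hypothetical name.

.. code-block:: python

  def packed_op(op, srcs, op_sel=None, op_sel_hi=None, neg_lo=None, neg_hi=None):
      """Evaluate a packed operation on (low, high) pairs.

      Returns (lower-half result, upper-half result).
      """
      n = len(srcs)
      # Defaults as documented: low bits for op_sel, high bits for op_sel_hi,
      # operands used unmodified.
      op_sel = [0] * n if op_sel is None else op_sel
      op_sel_hi = [1] * n if op_sel_hi is None else op_sel_hi
      neg_lo = [0] * n if neg_lo is None else neg_lo
      neg_hi = [0] * n if neg_hi is None else neg_hi

      def gather(sel, neg):
          # Pick the low (0) or high (1) half of each source, then negate
          # the ones flagged in neg.
          return [-src[half] if flag else src[half]
                  for src, half, flag in zip(srcs, sel, neg)]

      return (op(*gather(op_sel, neg_lo)), op(*gather(op_sel_hi, neg_hi)))


  a, b = (1.0, 2.0), (10.0, 20.0)
  print(packed_op(lambda x, y: x + y, [a, b]))                 # defaults
  print(packed_op(lambda x, y: x + y, [a, b], op_sel=[1, 0]))  # a.high + b.low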
1410
1411clamp
1412~~~~~
1413
1414See a description :ref:`here<amdgpu_synid_clamp>`.
1415
1416.. _amdgpu_synid_mad_mix:
1417
1418VOP3P V_MAD_MIX Modifiers
1419-------------------------
1420
1421*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16* instructions
1422use *op_sel* and *op_sel_hi* modifiers
1423in a manner different from *regular* VOP3P instructions.
1424
1425See a description below.
1426
1427GFX9 and GFX10 only.
1428
1429.. _amdgpu_synid_mad_mix_op_sel:
1430
1431m_op_sel
1432~~~~~~~~
1433
This modifier is meaningful only for 16-bit source operands, as indicated by
:ref:`m_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
It selects either the low [15:0] or high [31:16] operand bits
as input to the operation.
1438
1439The number of values specified by the *op_sel* modifier must match the number of source
1440operands. First value controls src0, second value controls src1 and so on.
1441
1442The value 0 indicates the low bits, the value 1 indicates the high 16 bits.
1443
1444By default, low bits are used for all operands.
1445
1446    =============================== ================================================
1447    Syntax                          Description
1448    =============================== ================================================
1449    op_sel:[{0..1},{0..1},{0..1}]   Select location of each 16-bit source operand.
1450    =============================== ================================================
1451
1452Examples:
1453
1454.. parsed-literal::
1455
  op_sel:[0,1,0]
1457
1458.. _amdgpu_synid_mad_mix_op_sel_hi:
1459
1460m_op_sel_hi
1461~~~~~~~~~~~
1462
1463Selects the size of source operands: either 32 bits or 16 bits.
1464By default, 32 bits are used for all source operands.
1465
1466The number of values specified by the *op_sel_hi* modifier must match the number of source
1467operands. First value controls src0, second value controls src1 and so on.
1468
1469The value 0 indicates 32 bits, the value 1 indicates 16 bits.
1470
1471The location of 16 bits in the operand may be specified by
1472:ref:`m_op_sel<amdgpu_synid_mad_mix_op_sel>`.
1473
1474    ======================================== ====================================
1475    Syntax                                   Description
1476    ======================================== ====================================
1477    op_sel_hi:[{0..1},{0..1},{0..1}]         Select size of each source operand.
1478    ======================================== ====================================
1479
1480Examples:
1481
1482.. parsed-literal::
1483
1484  op_sel_hi:[1,1,1]
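
For the *v_mad_mix* family, the two modifiers together determine what each source
operand reads. The Python sketch below illustrates that decoding; the function
name ``describe_mad_mix_sources`` is hypothetical.

.. code-block:: python

  def describe_mad_mix_sources(op_sel=(0, 0, 0), op_sel_hi=(0, 0, 0)):
      """Describe the bits read by src0, src1 and src2.

      op_sel_hi selects the size (0: 32 bits, 1: 16 bits); op_sel selects
      the location of a 16-bit operand (0: low, 1: high).
      """
      if len(op_sel) != 3 or len(op_sel_hi) != 3:
          raise ValueError("both modifiers take exactly 3 values")
      result = []
      for location, is_16bit in zip(op_sel, op_sel_hi):
          if not is_16bit:
              result.append("32-bit [31:0]")
          elif location:
              result.append("16-bit high [31:16]")
          else:
              result.append("16-bit low [15:0]")
      return result


  print(describe_mad_mix_sources(op_sel=(0, 1, 0), op_sel_hi=(1, 1, 0)))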
1485
1486abs
1487~~~
1488
1489See a description :ref:`here<amdgpu_synid_abs>`.
1490
1491neg
1492~~~
1493
1494See a description :ref:`here<amdgpu_synid_neg>`.
1495
1496clamp
1497~~~~~
1498
1499See a description :ref:`here<amdgpu_synid_clamp>`.
1500