1=====================================
2Syntax of AMDGPU Instruction Operands
3=====================================
4
5.. contents::
6   :local:
7
8Conventions
9===========
10
11The following notation is used throughout this document:
12
13    =================== =============================================================================
14    Notation            Description
15    =================== =============================================================================
16    {0..N}              Any integer value in the range from 0 to N (inclusive).
17    <x>                 Syntax and meaning of *x* is explained elsewhere.
18    =================== =============================================================================
19
20.. _amdgpu_syn_operands:
21
22Operands
23========
24
25.. _amdgpu_synid_v:
26
27v
28-
29
30Vector registers. There are 256 32-bit vector registers.
31
32A sequence of *vector* registers may be used to operate with more than 32 bits of data.
33
34Assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* registers.
35
36    =================================================== ====================================================================
37    Syntax                                              Description
38    =================================================== ====================================================================
39    **v**\<N>                                           A single 32-bit *vector* register.
40
41                                                        *N* must be a decimal integer number.
42    **v[**\ <N>\ **]**                                  A single 32-bit *vector* register.
43
44                                                        *N* may be specified as an
45                                                        :ref:`integer number<amdgpu_synid_integer_number>`
46                                                        or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
47    **v[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *vector* registers.
48
49                                                        *N* and *K* may be specified as
50                                                        :ref:`integer numbers<amdgpu_synid_integer_number>`
51                                                        or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
52    **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *vector* registers.
53
54                                                        Register indices must be specified as decimal integer numbers.
55    =================================================== ====================================================================
56
57Note. *N* and *K* must satisfy the following conditions:
58
59* *N* <= *K*.
60* 0 <= *N* <= 255.
61* 0 <= *K* <= 255.
62* *K-N+1* must be equal to 1, 2, 3, 4, 8 or 16.
63
64Examples:
65
66.. parsed-literal::
67
68  v255
69  v[0]
70  v[0:1]
71  v[1:1]
72  v[0:3]
73  v[2*2]
74  v[1-1:2-1]
75  [v252]
76  [v252,v253,v254,v255]
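
The conditions on *N* and *K* listed above can be checked mechanically.
The following Python sketch only illustrates those rules and is not part of the assembler;
the name ``is_valid_vector_range`` is hypothetical.

.. code-block:: python

    # Sequence sizes the document lists as supported for vector registers.
    VALID_VECTOR_SIZES = {1, 2, 3, 4, 8, 16}

    def is_valid_vector_range(n, k):
        """Return True if v[n:k] satisfies the conditions on N and K."""
        # Both indices must name one of the 256 vector registers.
        if not (0 <= n <= 255 and 0 <= k <= 255):
            return False
        # The first index must not exceed the last one.
        if n > k:
            return False
        # The number of registers in the sequence must be supported.
        return (k - n + 1) in VALID_VECTOR_SIZES

With this sketch, ``is_valid_vector_range(0, 3)`` accepts ``v[0:3]``,
while a five-register sequence such as ``v[0:4]`` is rejected.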
77
78.. _amdgpu_synid_s:
79
80s
81-
82
Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU:
84
85    ======= ============================
86    GPU     Number of *scalar* registers
87    ======= ============================
88    GFX7    104
89    GFX8    102
90    GFX9    102
91    ======= ============================
92
93A sequence of *scalar* registers may be used to operate with more than 32 bits of data.
94Assembler currently supports sequences of 1, 2, 4, 8 and 16 *scalar* registers.
95
Pairs of *scalar* registers must be even-aligned (the first register must be even).
Sequences of 4 or more *scalar* registers must be quad-aligned.
98
99    ======================================================== ====================================================================
100    Syntax                                                   Description
101    ======================================================== ====================================================================
102    **s**\ <N>                                               A single 32-bit *scalar* register.
103
104                                                             *N* must be a decimal integer number.
105    **s[**\ <N>\ **]**                                       A single 32-bit *scalar* register.
106
107                                                             *N* may be specified as an
108                                                             :ref:`integer number<amdgpu_synid_integer_number>`
109                                                             or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
110    **s[**\ <N>:<K>\ **]**                                   A sequence of (\ *K-N+1*\ ) *scalar* registers.
111
112                                                             *N* and *K* may be specified as
113                                                             :ref:`integer numbers<amdgpu_synid_integer_number>`
114                                                             or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
115    **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]**       A sequence of (\ *K-N+1*\ ) *scalar* registers.
116
117                                                             Register indices must be specified as decimal integer numbers.
118    ======================================================== ====================================================================
119
120Note. *N* and *K* must satisfy the following conditions:
121
122* *N* must be properly aligned based on sequence size.
123* *N* <= *K*.
124* 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
125* 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
126* *K-N+1* must be equal to 1, 2, 4, 8 or 16.
127
128Examples:
129
130.. parsed-literal::
131
132  s0
133  s[0]
134  s[0:1]
135  s[1:1]
136  s[0:3]
137  s[2*2]
138  s[1-1:2-1]
139  [s4]
140  [s4,s5,s6,s7]
141
142Examples of *scalar* registers with an invalid alignment:
143
144.. parsed-literal::
145
146  s[1:2]
147  s[2:5]
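
The alignment and range rules for *scalar* sequences can be sketched in Python as follows.
This is an illustration under stated assumptions rather than the assembler's implementation:
the function name is hypothetical, and *SMAX* is passed in by the caller (for example 102 or 104, per the table above).

.. code-block:: python

    # Sequence sizes the document lists as supported for scalar registers.
    VALID_SCALAR_SIZES = {1, 2, 4, 8, 16}

    def is_valid_scalar_range(n, k, smax):
        """Return True if s[n:k] satisfies the range, size and alignment rules."""
        # Indices must be ordered and below the number of available registers.
        if not (0 <= n <= k < smax):
            return False
        size = k - n + 1
        if size not in VALID_SCALAR_SIZES:
            return False
        # Single registers need no alignment, pairs must be even-aligned,
        # and sequences of 4 or more registers must be quad-aligned.
        if size == 1:
            alignment = 1
        elif size == 2:
            alignment = 2
        else:
            alignment = 4
        return n % alignment == 0

Both invalid examples above fail this check: ``s[1:2]`` is a pair starting at an odd register,
and ``s[2:5]`` is a sequence of four registers that does not start at a multiple of 4.
The same rules are stated for *ttmp* sequences, with *TMAX* in place of *SMAX*.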
148
149.. _amdgpu_synid_trap:
150
151trap
152----
153
154A set of trap handler registers:
155
156* :ref:`ttmp<amdgpu_synid_ttmp>`
157* :ref:`tba<amdgpu_synid_tba>`
158* :ref:`tma<amdgpu_synid_tma>`
159
160.. _amdgpu_synid_ttmp:
161
162ttmp
163----
164
Trap handler temporary scalar registers, 32 bits wide.
The number of available *ttmp* registers depends on the GPU:
167
168    ======= ===========================
169    GPU     Number of *ttmp* registers
170    ======= ===========================
171    GFX7    12
172    GFX8    12
173    GFX9    16
174    ======= ===========================
175
176A sequence of *ttmp* registers may be used to operate with more than 32 bits of data.
177Assembler currently supports sequences of 1, 2, 4, 8 and 16 *ttmp* registers.
178
Pairs of *ttmp* registers must be even-aligned (the first register must be even).
Sequences of 4 or more *ttmp* registers must be quad-aligned.
181
182    ============================================================= ====================================================================
183    Syntax                                                        Description
184    ============================================================= ====================================================================
185    **ttmp**\ <N>                                                 A single 32-bit *ttmp* register.
186
187                                                                  *N* must be a decimal integer number.
188    **ttmp[**\ <N>\ **]**                                         A single 32-bit *ttmp* register.
189
190                                                                  *N* may be specified as an
191                                                                  :ref:`integer number<amdgpu_synid_integer_number>`
192                                                                  or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
193    **ttmp[**\ <N>:<K>\ **]**                                     A sequence of (\ *K-N+1*\ ) *ttmp* registers.
194
195                                                                  *N* and *K* may be specified as
196                                                                  :ref:`integer numbers<amdgpu_synid_integer_number>`
197                                                                  or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
198    **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]**   A sequence of (\ *K-N+1*\ ) *ttmp* registers.
199
200                                                                  Register indices must be specified as decimal integer numbers.
201    ============================================================= ====================================================================
202
203Note. *N* and *K* must satisfy the following conditions:
204
205* *N* must be properly aligned based on sequence size.
206* *N* <= *K*.
207* 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
208* 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
209* *K-N+1* must be equal to 1, 2, 4, 8 or 16.
210
211Examples:
212
213.. parsed-literal::
214
215  ttmp0
216  ttmp[0]
217  ttmp[0:1]
218  ttmp[1:1]
219  ttmp[0:3]
220  ttmp[2*2]
221  ttmp[1-1:2-1]
222  [ttmp4]
223  [ttmp4,ttmp5,ttmp6,ttmp7]
224
225Examples of *ttmp* registers with an invalid alignment:
226
227.. parsed-literal::
228
229  ttmp[1:2]
230  ttmp[2:5]
231
232.. _amdgpu_synid_tba:
233
234tba
235---
236
Trap base address, 64 bits wide. Holds the pointer to the current trap handler program.
238
239    ================== ======================================================================= =============
240    Syntax             Description                                                             Availability
241    ================== ======================================================================= =============
242    tba                64-bit *trap base address* register.                                    GFX7, GFX8
243    [tba]              64-bit *trap base address* register (an alternative syntax).            GFX7, GFX8
244    [tba_lo,tba_hi]    64-bit *trap base address* register (an alternative syntax).            GFX7, GFX8
245    ================== ======================================================================= =============
246
247High and low 32 bits of *trap base address* may be accessed as separate registers:
248
249    ================== ======================================================================= =============
250    Syntax             Description                                                             Availability
251    ================== ======================================================================= =============
252    tba_lo             Low 32 bits of *trap base address* register.                            GFX7, GFX8
253    tba_hi             High 32 bits of *trap base address* register.                           GFX7, GFX8
254    [tba_lo]           Low 32 bits of *trap base address* register (an alternative syntax).    GFX7, GFX8
255    [tba_hi]           High 32 bits of *trap base address* register (an alternative syntax).   GFX7, GFX8
256    ================== ======================================================================= =============
257
258Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9,
259but *tba* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
260
261.. _amdgpu_synid_tma:
262
263tma
264---
265
Trap memory address, 64 bits wide.
267
268    ================= ======================================================================= ==================
269    Syntax            Description                                                             Availability
270    ================= ======================================================================= ==================
271    tma               64-bit *trap memory address* register.                                  GFX7, GFX8
272    [tma]             64-bit *trap memory address* register (an alternative syntax).          GFX7, GFX8
273    [tma_lo,tma_hi]   64-bit *trap memory address* register (an alternative syntax).          GFX7, GFX8
274    ================= ======================================================================= ==================
275
276High and low 32 bits of *trap memory address* may be accessed as separate registers:
277
278    ================= ======================================================================= ==================
279    Syntax            Description                                                             Availability
280    ================= ======================================================================= ==================
281    tma_lo            Low 32 bits of *trap memory address* register.                          GFX7, GFX8
282    tma_hi            High 32 bits of *trap memory address* register.                         GFX7, GFX8
283    [tma_lo]          Low 32 bits of *trap memory address* register (an alternative syntax).  GFX7, GFX8
284    [tma_hi]          High 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8
285    ================= ======================================================================= ==================
286
287Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9,
288but *tma* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
289
290.. _amdgpu_synid_flat_scratch:
291
292flat_scratch
293------------
294
Flat scratch address, 64 bits wide. Holds the base address of scratch memory.
296
297    ================================== ================================================================
298    Syntax                             Description
299    ================================== ================================================================
300    flat_scratch                       64-bit *flat scratch* address register.
301    [flat_scratch]                     64-bit *flat scratch* address register (an alternative syntax).
302    [flat_scratch_lo,flat_scratch_hi]  64-bit *flat scratch* address register (an alternative syntax).
303    ================================== ================================================================
304
305High and low 32 bits of *flat scratch* address may be accessed as separate registers:
306
307    ========================= =========================================================================
308    Syntax                    Description
309    ========================= =========================================================================
310    flat_scratch_lo           Low 32 bits of *flat scratch* address register.
311    flat_scratch_hi           High 32 bits of *flat scratch* address register.
312    [flat_scratch_lo]         Low 32 bits of *flat scratch* address register (an alternative syntax).
313    [flat_scratch_hi]         High 32 bits of *flat scratch* address register (an alternative syntax).
314    ========================= =========================================================================
315
316.. _amdgpu_synid_xnack:
317
318xnack
319-----
320
Xnack mask, 64 bits wide. Holds a 64-bit mask of which threads
received an *XNACK* due to a vector memory operation.
323
324.. WARNING:: GFX7 does not support *xnack* feature. Not all GFX8 and GFX9 :ref:`processors<amdgpu-processors>` support *xnack* feature.
325
326\
327
328    ============================== =====================================================
329    Syntax                         Description
330    ============================== =====================================================
331    xnack_mask                     64-bit *xnack mask* register.
332    [xnack_mask]                   64-bit *xnack mask* register (an alternative syntax).
333    [xnack_mask_lo,xnack_mask_hi]  64-bit *xnack mask* register (an alternative syntax).
334    ============================== =====================================================
335
336High and low 32 bits of *xnack mask* may be accessed as separate registers:
337
338    ===================== ==============================================================
339    Syntax                Description
340    ===================== ==============================================================
341    xnack_mask_lo         Low 32 bits of *xnack mask* register.
342    xnack_mask_hi         High 32 bits of *xnack mask* register.
343    [xnack_mask_lo]       Low 32 bits of *xnack mask* register (an alternative syntax).
344    [xnack_mask_hi]       High 32 bits of *xnack mask* register (an alternative syntax).
345    ===================== ==============================================================
346
347.. _amdgpu_synid_vcc:
348
349vcc
350---
351
Vector condition code, 64 bits wide. A bit mask with one bit per thread;
353it holds the result of a vector compare operation.
354
355    ================ =========================================================================
356    Syntax           Description
357    ================ =========================================================================
358    vcc              64-bit *vector condition code* register.
359    [vcc]            64-bit *vector condition code* register (an alternative syntax).
360    [vcc_lo,vcc_hi]  64-bit *vector condition code* register (an alternative syntax).
361    ================ =========================================================================
362
363High and low 32 bits of *vector condition code* may be accessed as separate registers:
364
365    ================ =========================================================================
366    Syntax           Description
367    ================ =========================================================================
368    vcc_lo           Low 32 bits of *vector condition code* register.
369    vcc_hi           High 32 bits of *vector condition code* register.
370    [vcc_lo]         Low 32 bits of *vector condition code* register (an alternative syntax).
371    [vcc_hi]         High 32 bits of *vector condition code* register (an alternative syntax).
372    ================ =========================================================================
373
374.. _amdgpu_synid_m0:
375
376m0
377--
378
379A 32-bit memory register. It has various uses,
380including register indexing and bounds checking.
381
382    =========== ===================================================
383    Syntax      Description
384    =========== ===================================================
385    m0          A 32-bit *memory* register.
386    [m0]        A 32-bit *memory* register (an alternative syntax).
387    =========== ===================================================
388
389.. _amdgpu_synid_exec:
390
391exec
392----
393
Execute mask, 64 bits wide. A bit mask with one bit per thread,
395which is applied to vector instructions and controls which threads execute
396and which ignore the instruction.
397
398    ===================== =================================================================
399    Syntax                Description
400    ===================== =================================================================
401    exec                  64-bit *execute mask* register.
402    [exec]                64-bit *execute mask* register (an alternative syntax).
403    [exec_lo,exec_hi]     64-bit *execute mask* register (an alternative syntax).
404    ===================== =================================================================
405
406High and low 32 bits of *execute mask* may be accessed as separate registers:
407
408    ===================== =================================================================
409    Syntax                Description
410    ===================== =================================================================
411    exec_lo               Low 32 bits of *execute mask* register.
412    exec_hi               High 32 bits of *execute mask* register.
413    [exec_lo]             Low 32 bits of *execute mask* register (an alternative syntax).
414    [exec_hi]             High 32 bits of *execute mask* register (an alternative syntax).
415    ===================== =================================================================
416
417.. _amdgpu_synid_vccz:
418
419vccz
420----
421
A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
423
424.. WARNING:: This operand is not currently supported by AMDGPU assembler.
425
426.. _amdgpu_synid_execz:
427
428execz
429-----
430
431A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros.
432
433.. WARNING:: This operand is not currently supported by AMDGPU assembler.
434
435.. _amdgpu_synid_scc:
436
437scc
438---
439
440A single bit flag indicating the result of a scalar compare operation.
441
442.. WARNING:: This operand is not currently supported by AMDGPU assembler.
443
444lds_direct
445----------
446
447A special operand which supplies a 32-bit value
448fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.
449
450.. WARNING:: This operand is not currently supported by AMDGPU assembler.
451
452.. _amdgpu_synid_constant:
453
454constant
455--------
456
457A set of integer and floating-point *inline constants*:
458
459* :ref:`iconst<amdgpu_synid_iconst>`
460* :ref:`fconst<amdgpu_synid_fconst>`
461
These operands are encoded as a part of the instruction.
463
464If a number may be encoded as either
465a :ref:`literal<amdgpu_synid_literal>` or
466an :ref:`inline constant<amdgpu_synid_constant>`,
467assembler selects the latter encoding as more efficient.
468
469.. _amdgpu_synid_iconst:
470
471iconst
472------
473
474An :ref:`integer number<amdgpu_synid_integer_number>`
475encoded as an *inline constant*.
476
477Only a small fraction of integer numbers may be encoded as *inline constants*.
478They are enumerated in the table below.
479Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
480
481Integer *inline constants* are converted to
482:ref:`expected operand type<amdgpu_syn_instruction_type>`
483as described :ref:`here<amdgpu_synid_int_const_conv>`.
484
485    ================================== ====================================
486    Value                              Note
487    ================================== ====================================
488    {0..64}                            Positive integer inline constants.
489    {-16..-1}                          Negative integer inline constants.
490    ================================== ====================================
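
The choice between the two encodings for an integer can be sketched in Python.
This only illustrates the value ranges in the table above; the function name is hypothetical,
and operand-type restrictions (such as the GFX7 *f16* limitation) are not modeled.

.. code-block:: python

    def integer_encoding(value):
        """Classify an integer operand by the encoding the ranges above allow."""
        # {-16..-1} and {0..64} are the integer inline constant ranges.
        if -16 <= value <= 64:
            return "inline constant"
        # Any other integer has to be encoded as a literal.
        return "literal"

For instance, ``integer_encoding(64)`` yields ``"inline constant"``, whereas ``integer_encoding(65)`` yields ``"literal"``.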
491
492.. WARNING:: GFX7 does not support inline constants for *f16* operands.
493
494There are also symbolic inline constants which provide read-only access to H/W registers.
495
496.. WARNING:: These inline constants are not currently supported by AMDGPU assembler.
497
498\
499
500    ======================== ================================================ =============
501    Syntax                   Note                                             Availability
502    ======================== ================================================ =============
503    shared_base              Base address of shared memory region.            GFX9
504    shared_limit             Address of the end of shared memory region.      GFX9
505    private_base             Base address of private memory region.           GFX9
506    private_limit            Address of the end of private memory region.     GFX9
507    pops_exiting_wave_id     A dedicated counter for POPS.                    GFX9
508    ======================== ================================================ =============
509
510.. _amdgpu_synid_fconst:
511
512fconst
513------
514
515A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
516encoded as an *inline constant*.
517
518Only a small fraction of floating-point numbers may be encoded as *inline constants*.
519They are enumerated in the table below.
520Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
521
522Floating-point *inline constants* are converted to
523:ref:`expected operand type<amdgpu_syn_instruction_type>`
524as described :ref:`here<amdgpu_synid_fp_const_conv>`.
525
526    ===================== ===================================================== ==================
527    Value                 Note                                                  Availability
528    ===================== ===================================================== ==================
529    0.0                   The same as integer constant 0.                       All GPUs
530    0.5                   Floating-point constant 0.5                           All GPUs
531    1.0                   Floating-point constant 1.0                           All GPUs
532    2.0                   Floating-point constant 2.0                           All GPUs
533    4.0                   Floating-point constant 4.0                           All GPUs
534    -0.5                  Floating-point constant -0.5                          All GPUs
535    -1.0                  Floating-point constant -1.0                          All GPUs
536    -2.0                  Floating-point constant -2.0                          All GPUs
537    -4.0                  Floating-point constant -4.0                          All GPUs
538    0.1592                1.0/(2.0*pi). Use only for 16-bit operands.           GFX8, GFX9
539    0.15915494            1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8, GFX9
540    0.15915494309189532   1.0/(2.0*pi).                                         GFX8, GFX9
541    ===================== ===================================================== ==================
542
543.. WARNING:: GFX7 does not support inline constants for *f16* operands.
544
545.. _amdgpu_synid_literal:
546
547literal
548-------
549
A literal is a value which the assembler handles as a 64-bit number and encodes as a separate 32-bit dword in the instruction stream.
551
552If a number may be encoded as either
553a :ref:`literal<amdgpu_synid_literal>` or
554an :ref:`inline constant<amdgpu_synid_constant>`,
555assembler selects the latter encoding as more efficient.
556
557Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
558:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or
559:ref:`expressions<amdgpu_synid_expression>`
560(expressions are currently supported for 32-bit operands only).
561
562A 64-bit literal value is converted by assembler
563to an :ref:`expected operand type<amdgpu_syn_instruction_type>`
564as described :ref:`here<amdgpu_synid_lit_conv>`.
565
An instruction may use only one literal, but several operands may refer to the same literal.
567
568.. _amdgpu_synid_uimm8:
569
570uimm8
571-----
572
An 8-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
The value is encoded as part of the opcode so it is free to use.
575
576.. _amdgpu_synid_uimm32:
577
578uimm32
579------
580
581A 32-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
582The value is stored as a separate 32-bit dword in the instruction stream.
583
584.. _amdgpu_synid_uimm20:
585
586uimm20
587------
588
589A 20-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
590
591.. _amdgpu_synid_uimm21:
592
593uimm21
594------
595
596A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
597
598.. WARNING:: Assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
599
600.. _amdgpu_synid_simm21:
601
602simm21
603------
604
605A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`.
606
.. WARNING:: Assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
608
609.. _amdgpu_synid_off:
610
611off
612---
613
614A special entity which indicates that the value of this operand is not used.
615
616    ================================== ===================================================
617    Syntax                             Description
618    ================================== ===================================================
619    off                                Indicates an unused operand.
620    ================================== ===================================================
621
622
623.. _amdgpu_synid_number:
624
625Numbers
626=======
627
628.. _amdgpu_synid_integer_number:
629
630Integer Numbers
631---------------
632
633Integer numbers are 64 bits wide.
634They may be specified in binary, octal, hexadecimal and decimal formats:
635
636    ============== ====================================
637    Format         Syntax
638    ============== ====================================
639    Decimal        [-]?[1-9][0-9]*
640    Binary         [-]?0b[01]+
641    Octal          [-]?0[0-7]+
642    Hexadecimal    [-]?0x[0-9a-fA-F]+
643    \              [-]?[0x]?[0-9][0-9a-fA-F]*[hH]
644    ============== ====================================
645
646Examples:
647
648.. parsed-literal::
649
650  -1234
651  0b1010
652  010
653  0xff
654  0ffh
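
To make the formats concrete, the following Python sketch converts the example strings to values.
It is an approximation written for illustration, not the assembler's lexer:
the name ``parse_integer`` is hypothetical, a plain ``0`` is accepted as decimal,
and the ``h``-suffixed form is matched without the optional leading ``[0x]``.
Both simplifications are assumptions of this sketch.

.. code-block:: python

    import re

    # (pattern, converter) pairs, tried in order on the text without its sign.
    # The h-suffixed form is tried first because it may start like other forms.
    _FORMATS = [
        (re.compile(r"[0-9][0-9a-fA-F]*[hH]"), lambda s: int(s[:-1], 16)),
        (re.compile(r"0x[0-9a-fA-F]+"), lambda s: int(s[2:], 16)),
        (re.compile(r"0b[01]+"), lambda s: int(s[2:], 2)),
        (re.compile(r"0[0-7]+"), lambda s: int(s[1:], 8)),
        # Assumption: a plain "0" is treated as a decimal number.
        (re.compile(r"0|[1-9][0-9]*"), lambda s: int(s, 10)),
    ]

    def parse_integer(text):
        """Convert an integer written in one of the formats above to an int."""
        negative = text.startswith("-")
        body = text[1:] if negative else text
        for pattern, convert in _FORMATS:
            if pattern.fullmatch(body):
                value = convert(body)
                return -value if negative else value
        raise ValueError("not a recognized integer number: " + repr(text))

Applied to the examples above, ``010`` is read as octal 8, and both ``0xff`` and ``0ffh`` are read as 255.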
655
656.. _amdgpu_synid_floating-point_number:
657
658Floating-Point Numbers
659----------------------
660
661All floating-point numbers are handled as double (64 bits wide).
662
663Floating-point numbers may be specified in hexadecimal and decimal formats:
664
    ============== ======================================================== ========================================================
    Format         Syntax                                                   Note
    ============== ======================================================== ========================================================
    Decimal        [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)?                    Must include either a decimal separator or an exponent.
    Hexadecimal    [-]?0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+
    ============== ======================================================== ========================================================
671
672Examples:
673
674.. parsed-literal::
675
676 -1.234
677 234e2
678 -0x1afp-10
679 0x.1afp10
680
681.. _amdgpu_synid_expression:
682
683Expressions
684===========
685
686An expression specifies an address or a numeric value.
687There are two kinds of expressions:
688
689* :ref:`Absolute<amdgpu_synid_absolute_expression>`.
690* :ref:`Relocatable<amdgpu_synid_relocatable_expression>`.
691
692.. _amdgpu_synid_absolute_expression:
693
694Absolute Expressions
695--------------------
696
697The value of an absolute expression remains the same after program relocation.
698Absolute expressions must not include unassigned and relocatable values
699such as labels.
700
701Examples:
702
703.. parsed-literal::
704
705    x = -1
706    y = x + 10
707
708.. _amdgpu_synid_relocatable_expression:
709
710Relocatable Expressions
711-----------------------
712
713The value of a relocatable expression depends on program relocation.
714
Note that use of relocatable expressions is limited to branch targets
and 32-bit :ref:`literals<amdgpu_synid_literal>`.

Additional information about relocation may be found :ref:`here<amdgpu-relocation-records>`.
719
720Examples:
721
722.. parsed-literal::
723
724    y = x + 10 // x is not yet defined. Undefined symbols are assumed to be PC-relative.
725    z = .
726
727Expression Data Type
728--------------------
729
730Expressions and operands of expressions are interpreted as 64-bit integers.
731
Expressions may include 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>` (double).
However, these operands are also handled as 64-bit integers,
using the binary representation of the specified floating-point numbers.
No conversion from floating-point to integer is performed.
736
737Examples:
738
739.. parsed-literal::
740
    x = 0.1    // x is assigned the integer 4591870180066957722, which is the binary representation of 0.1.
742    y = x + x  // y is a sum of two integer values; it is not equal to 0.2!
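
The reinterpretation described above can be reproduced outside the assembler.
The following Python sketch is only an illustration: the helper name
``double_bits`` is hypothetical and not part of any AMDGPU tool. It uses the
standard ``struct`` module to obtain the 64-bit pattern of a double.

.. code-block:: python

    import struct

    def double_bits(value):
        """Return the IEEE 754 binary64 bit pattern of value as an unsigned integer."""
        return struct.unpack("<Q", struct.pack("<d", value))[0]

    x = double_bits(0.1)   # 4591870180066957722 (0x3FB999999999999A)
    y = x + x              # sum of two integers, not of two doubles

    print(hex(x))
    print(y == double_bits(0.2))   # False

The sum of the two bit patterns differs from the bit pattern of 0.2, which is
why such expressions should not be used for floating-point arithmetic.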
743
744Syntax
745------
746
747Expressions are composed of
748:ref:`symbols<amdgpu_synid_symbol>`,
749:ref:`integer numbers<amdgpu_synid_integer_number>`,
750:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
751:ref:`binary operators<amdgpu_synid_expression_bin_op>`,
752:ref:`unary operators<amdgpu_synid_expression_un_op>` and subexpressions.
753
754Expressions may also use "." which is a reference to the current PC (program counter).
755
756The syntax of expressions is shown below::
757
758    expr ::= expr binop expr | primaryexpr ;
759
760    primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ;
761
762    binop ::= '&&'
763            | '||'
764            | '|'
765            | '^'
766            | '&'
767            | '!'
768            | '=='
769            | '!='
770            | '<>'
771            | '<'
772            | '<='
773            | '>'
774            | '>='
775            | '<<'
776            | '>>'
777            | '+'
778            | '-'
779            | '*'
780            | '/'
781            | '%' ;
782
783    unop ::= '~'
784           | '+'
785           | '-'
786           | '!' ;
787
788.. _amdgpu_synid_expression_bin_op:
789
790Binary Operators
791----------------
792
793Binary operators are described in the following table.
794They operate on and produce 64-bit integers.
795Operators with higher priority are performed first.
796
797    ========== ========= ===============================================
798    Operator   Priority  Meaning
799    ========== ========= ===============================================
800       \*         5      Integer multiplication.
801       /          5      Integer division.
802       %          5      Integer signed remainder.
803       \+         4      Integer addition.
804       \-         4      Integer subtraction.
805       <<         3      Integer shift left.
806       >>         3      Logical shift right.
807       ==         2      Equality comparison.
808       !=         2      Inequality comparison.
809       <>         2      Inequality comparison.
810       <          2      Signed less than comparison.
811       <=         2      Signed less than or equal comparison.
812       >          2      Signed greater than comparison.
813       >=         2      Signed greater than or equal comparison.
814      \|          1      Bitwise or.
815       ^          1      Bitwise xor.
816       &          1      Bitwise and.
817       &&         0      Logical and.
818       ||         0      Logical or.
819    ========== ========= ===============================================
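
To illustrate how the priorities in the table group operands, here is a small
Python sketch of a precedence-climbing evaluator. It is not the assembler's
parser: it models only a subset of the operators, ignores unary operators and
symbols, and assumes that operators of equal priority group from left to
right. All names (``PRIORITY``, ``evaluate`` and so on) are hypothetical.

.. code-block:: python

    import re

    MASK64 = (1 << 64) - 1

    # Priorities copied from the table above; only a subset of operators is modelled.
    PRIORITY = {"*": 5, "+": 4, "-": 4, "<<": 3, ">>": 3, "|": 1, "^": 1, "&": 1}

    APPLY = {
        "*": lambda a, b: a * b,
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "<<": lambda a, b: a << b,
        ">>": lambda a, b: a >> b,  # values are kept unsigned, so this shift is logical
        "|": lambda a, b: a | b,
        "^": lambda a, b: a ^ b,
        "&": lambda a, b: a & b,
    }

    TOKEN = re.compile(r"\s*(<<|>>|[*+\-|^&()]|[0-9]+)")

    def tokenize(text):
        text = text.rstrip()
        tokens, pos = [], 0
        while pos < len(text):
            match = TOKEN.match(text, pos)
            if match is None:
                raise ValueError(f"unexpected input at offset {pos}")
            tokens.append(match.group(1))
            pos = match.end()
        return tokens

    def evaluate(text):
        tokens = tokenize(text)
        pos = 0

        def peek():
            return tokens[pos] if pos < len(tokens) else None

        def primary():
            nonlocal pos
            token = peek()
            if token is None:
                raise ValueError("unexpected end of expression")
            pos += 1
            if token == "(":
                value = expr(0)
                if peek() != ")":
                    raise ValueError("missing ')'")
                pos += 1
                return value
            return int(token) & MASK64

        def expr(min_priority):
            nonlocal pos
            left = primary()
            while peek() in PRIORITY and PRIORITY[peek()] >= min_priority:
                op = peek()
                pos += 1
                # Parsing the right side at a higher priority groups equal-priority
                # operators from left to right (an assumption of this sketch).
                right = expr(PRIORITY[op] + 1)
                left = APPLY[op](left, right) & MASK64
            return left

        return expr(0)

    print(evaluate("1 + 2 << 3"))    # 24: addition (4) binds tighter than shift (3)
    print(evaluate("1 | 2 ^ 3"))     # 0: equal priority, grouped as (1 | 2) ^ 3
    print(evaluate("0 - 1 >> 60"))   # 15: logical shift of 0xFFFFFFFFFFFFFFFF

Under these assumptions the results may differ from the same text evaluated in
another language; Python itself, for example, gives ``^`` a higher priority
than ``|``.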
820
821.. _amdgpu_synid_expression_un_op:
822
823Unary Operators
824---------------
825
826Unary operators are described in the following table.
827They operate on and produce 64-bit integers.
828
829    ========== ===============================================
830    Operator   Meaning
831    ========== ===============================================
832       !       Logical negation.
833       ~       Bitwise negation.
834       \+      Integer unary plus.
835       \-      Integer unary minus.
836    ========== ===============================================
837
838.. _amdgpu_synid_symbol:
839
840Symbols
841-------
842
843A symbol is a named 64-bit value, representing a relocatable
844address or an absolute (non-relocatable) number.
845
846Symbol names have the following syntax:
847    ``[a-zA-Z_.][a-zA-Z0-9_$.@]*``
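
As a quick way to check candidate names against this syntax, the pattern can
be applied with Python's ``re`` module. This is a minimal sketch; the names
``SYMBOL_NAME`` and ``is_symbol_name`` are hypothetical, and the check covers
only the pattern shown above, not any further rules the assembler may apply.

.. code-block:: python

    import re

    # Pattern copied verbatim from the syntax above.
    SYMBOL_NAME = re.compile(r"[a-zA-Z_.][a-zA-Z0-9_$.@]*")

    def is_symbol_name(text):
        """Return True if the whole string matches the symbol name pattern."""
        return SYMBOL_NAME.fullmatch(text) is not None

    print(is_symbol_name(".L_loop$1"))   # True
    print(is_symbol_name("1abc"))        # False: a name cannot start with a digit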
848
849The table below provides several examples of syntax used for symbol definition.
850
851    ================ ==========================================================
852    Syntax           Meaning
853    ================ ==========================================================
854    .globl <S>       Declares a global symbol S without assigning it a value.
855    .set <S>, <E>    Assigns the value of an expression E to a symbol S.
856    <S> = <E>        Assigns the value of an expression E to a symbol S.
857    <S>:             Declares a label S and assigns it the current PC value.
858    ================ ==========================================================
859
860A symbol may be used before it is declared or assigned;
861unassigned symbols are assumed to be PC-relative.
862
Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.
864
865.. _amdgpu_synid_conv:
866
867Conversions
868===========
869
This section describes what happens when a 64-bit
:ref:`integer number<amdgpu_synid_integer_number>`, a
:ref:`floating-point number<amdgpu_synid_floating-point_number>` or a
:ref:`symbol<amdgpu_synid_symbol>`
is used for an operand which has a different type or size.
875
Depending on the operand kind, this conversion is performed by either the assembler or the AMDGPU hardware:

* Values encoded as :ref:`inline constants<amdgpu_synid_constant>` are handled by the hardware.
* Values encoded as :ref:`literals<amdgpu_synid_literal>` are converted by the assembler.
880
881.. _amdgpu_synid_const_conv:
882
883Inline Constants
884----------------
885
886.. _amdgpu_synid_int_const_conv:
887
888Integer Inline Constants
889~~~~~~~~~~~~~~~~~~~~~~~~
890
891Integer :ref:`inline constants<amdgpu_synid_constant>`
892may be thought of as 64-bit
893:ref:`integer numbers<amdgpu_synid_integer_number>`;
when used as operands they are truncated to the size of the
:ref:`expected operand type<amdgpu_syn_instruction_type>`.
896No data type conversions are performed.
897
898Examples:
899
900.. parsed-literal::
901
902    // GFX9
903
904    v_add_u16 v0, -1, 0    // v0 = 0xFFFF
905    v_add_f16 v0, -1, 0    // v0 = 0xFFFF (NaN)
906
907    v_add_u32 v0, -1, 0    // v0 = 0xFFFFFFFF
908    v_add_f32 v0, -1, 0    // v0 = 0xFFFFFFFF (NaN)
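
The truncation in these examples can be modelled with plain integer masking.
The Python sketch below is illustrative only (``truncate``, ``as_f16`` and
``as_f32`` are hypothetical helper names); it also shows why the resulting
bit patterns read as NaN when interpreted as floating-point values.

.. code-block:: python

    import math
    import struct

    def truncate(value, bits):
        """Keep the low `bits` bits of an integer; no type conversion is done."""
        return value & ((1 << bits) - 1)

    def as_f16(pattern):
        """Reinterpret a 16-bit pattern as an IEEE 754 binary16 value."""
        return struct.unpack("<e", struct.pack("<H", pattern))[0]

    def as_f32(pattern):
        """Reinterpret a 32-bit pattern as an IEEE 754 binary32 value."""
        return struct.unpack("<f", struct.pack("<I", pattern))[0]

    print(hex(truncate(-1, 16)))                  # 0xffff
    print(hex(truncate(-1, 32)))                  # 0xffffffff
    print(math.isnan(as_f16(truncate(-1, 16))))   # True
    print(math.isnan(as_f32(truncate(-1, 32))))   # True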
909
910.. _amdgpu_synid_fp_const_conv:
911
912Floating-Point Inline Constants
913~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
914
915Floating-point :ref:`inline constants<amdgpu_synid_constant>`
916may be thought of as 64-bit
917:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`;
when used as operands they are converted to a floating-point number of the
:ref:`expected operand size<amdgpu_syn_instruction_type>`.
920
921Examples:
922
923.. parsed-literal::
924
925    // GFX9
926
927    v_add_f16 v0, 1.0, 0    // v0 = 0x3C00 (1.0)
928    v_add_u16 v0, 1.0, 0    // v0 = 0x3C00
929
930    v_add_f32 v0, 1.0, 0    // v0 = 0x3F800000 (1.0)
931    v_add_u32 v0, 1.0, 0    // v0 = 0x3F800000
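
The bit patterns in these examples can be checked with Python's ``struct``
module, which supports IEEE 754 half (``e``) and single (``f``) precision.
This is a sketch with hypothetical helper names, not a description of how the
hardware performs the conversion.

.. code-block:: python

    import struct

    def f16_bits(value):
        """Convert a double to binary16 and return its 16-bit pattern."""
        return struct.unpack("<H", struct.pack("<e", value))[0]

    def f32_bits(value):
        """Convert a double to binary32 and return its 32-bit pattern."""
        return struct.unpack("<I", struct.pack("<f", value))[0]

    print(hex(f16_bits(1.0)))   # 0x3c00
    print(hex(f32_bits(1.0)))   # 0x3f800000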
932
933
934.. _amdgpu_synid_lit_conv:
935
936Literals
937--------
938
939.. _amdgpu_synid_int_lit_conv:
940
941Integer Literals
942~~~~~~~~~~~~~~~~
943
944Integer :ref:`literals<amdgpu_synid_literal>`
945are specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>`.
946
When used as operands they are converted to the
:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
949
950    ============== ============== =============== ====================================================================
951    Expected type  Condition      Result          Note
952    ============== ============== =============== ====================================================================
953    i16, u16, b16  cond(num,16)   num.u16         Truncate to 16 bits.
954    i32, u32, b32  cond(num,32)   num.u32         Truncate to 32 bits.
955    i64            cond(num,32)   {-1,num.i32}    Truncate to 32 bits and then sign-extend the result to 64 bits.
956    u64, b64       cond(num,32)   { 0,num.u32}    Truncate to 32 bits and then zero-extend the result to 64 bits.
957    f16            cond(num,16)   num.u16         Use low 16 bits as an f16 value.
958    f32            cond(num,32)   num.u32         Use low 32 bits as an f32 value.
959    f64            cond(num,32)   {num.u32,0}     Use low 32 bits of the number as high 32 bits
960                                                  of the result; low 32 bits of the result are zeroed.
961    ============== ============== =============== ====================================================================
962
The condition *cond(X,S)* indicates whether a 64-bit number *X*
can be converted to a smaller size *S* by truncation of upper bits.
There are two cases when the conversion is possible:

* The truncated bits are all 0.
* The truncated bits are all 1 and the value after truncation has its MSB set.
969
970Examples of valid literals:
971
972.. parsed-literal::
973
974    // GFX9
975                                             // Literal value after conversion:
976    v_add_u16 v0, 0xff00, v0                 //   0xff00
977    v_add_u16 v0, 0xffffffffffffff00, v0     //   0xff00
978    v_add_u16 v0, -256, v0                   //   0xff00
979                                             // Literal value after conversion:
980    s_bfe_i64 s[0:1], 0xffefffff, s3         //   0xffffffffffefffff
981    s_bfe_u64 s[0:1], 0xffefffff, s3         //   0x00000000ffefffff
982    v_ceil_f64_e32 v[0:1], 0xffefffff        //   0xffefffff00000000 (-1.7976922776554302e308)
983
984Examples of invalid literals:
985
986.. parsed-literal::
987
988    // GFX9
989
990    v_add_u16 v0, 0x1ff00, v0               // truncated bits are not all 0 or 1
991    v_add_u16 v0, 0xffffffffffff00ff, v0    // truncated bits do not match MSB of the result
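
The condition *cond(X,S)* and the conversion table above can be expressed as a
short Python sketch. The function names and the string encoding of the
expected type are hypothetical and chosen only for illustration; the logic
follows the table and the two cases listed for *cond(X,S)*.

.. code-block:: python

    MASK64 = (1 << 64) - 1

    def cond(num, size):
        """Return True if the 64-bit number can be truncated to `size` bits."""
        num &= MASK64
        upper = num >> size                  # bits that would be dropped
        msb = (num >> (size - 1)) & 1        # MSB of the truncated value
        all_ones = (1 << (64 - size)) - 1
        return upper == 0 or (upper == all_ones and msb == 1)

    def convert(num, expected_type):
        """Convert an integer literal as described in the table above."""
        size = 16 if expected_type.endswith("16") else 32
        if not cond(num, size):
            raise ValueError("literal cannot be truncated to the operand size")
        low = num & ((1 << size) - 1)
        if expected_type == "i64" and low >> 31:
            return low | 0xFFFFFFFF00000000  # sign-extend to 64 bits
        if expected_type == "f64":
            return low << 32                 # low 32 bits become the high half
        return low                           # plain truncation or zero-extension

    print(hex(convert(-256, "u16")))         # 0xff00
    print(hex(convert(0xffefffff, "i64")))   # 0xffffffffffefffff
    print(hex(convert(0xffefffff, "f64")))   # 0xffefffff00000000
    print(cond(0x1ff00, 16))                 # False

The printed values match the valid and invalid literal examples listed in this
section.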
992
993.. _amdgpu_synid_fp_lit_conv:
994
995Floating-Point Literals
996~~~~~~~~~~~~~~~~~~~~~~~
997
998Floating-point :ref:`literals<amdgpu_synid_literal>` are specified as 64-bit
999:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
1000
When used as operands they are converted to the
:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
1003
1004    ============== ============== ================= =================================================================
1005    Expected type  Condition      Result            Note
1006    ============== ============== ================= =================================================================
1007    i16, u16, b16  cond(num,16)   f16(num)          Convert to f16 and use bits of the result as an integer value.
1008    i32, u32, b32  cond(num,32)   f32(num)          Convert to f32 and use bits of the result as an integer value.
    i64, u64, b64  false          \-                Conversion disabled because of unclear semantics.
1010    f16            cond(num,16)   f16(num)          Convert to f16.
1011    f32            cond(num,32)   f32(num)          Convert to f32.
1012    f64            true           {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
1013                                                    zero-fill low 32 bits of the result.
1014
1015                                                    Note that the result may differ from the original number.
1016    ============== ============== ================= =================================================================
1017
The condition *cond(X,S)* indicates whether an f64 number *X* can be converted
to a smaller *S*-bit floating-point type without overflow or underflow.
Loss of precision is allowed.
1021
1022Examples of valid literals:
1023
1024.. parsed-literal::
1025
1026    // GFX9
1027
1028    v_add_f16 v1, 65500.0, v2
1029    v_add_f32 v1, 65600.0, v2
1030
1031    // Literal value before conversion: 1.7976931348623157e308 (0x7fefffffffffffff)
1032    // Literal value after conversion:  1.7976922776554302e308 (0x7fefffff00000000)
1033    v_ceil_f64 v[0:1], 1.7976931348623157e308
1034
1035Examples of invalid literals:
1036
1037.. parsed-literal::
1038
1039    // GFX9
1040
1041    v_add_f16 v1, 65600.0, v2    // overflow
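
The overflow check and the f64 rule can be approximated in Python. This is a
rough sketch with hypothetical helper names: it relies on ``struct.pack``
raising ``OverflowError`` for values that are too large for the target format,
and it does not model the underflow part of *cond(X,S)*.

.. code-block:: python

    import struct

    def fits(num, fmt):
        """Return True if the double converts to the struct format without overflow."""
        try:
            struct.pack(fmt, num)
            return True
        except OverflowError:
            return False

    def f64_literal_bits(num):
        """Keep the high 32 bits of the double and zero the low 32 bits."""
        bits = struct.unpack("<Q", struct.pack("<d", num))[0]
        return bits & 0xFFFFFFFF00000000

    print(fits(65500.0, "<e"))   # True:  valid f16 literal
    print(fits(65600.0, "<e"))   # False: overflows f16
    print(fits(65600.0, "<f"))   # True:  valid f32 literal
    print(hex(f64_literal_bits(1.7976931348623157e308)))   # 0x7fefffff00000000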
1042
1043.. _amdgpu_synid_exp_conv:
1044
1045Expressions
1046~~~~~~~~~~~
1047
Expressions operate on and produce 64-bit integers.

When used as operands they are truncated to the
:ref:`expected operand size<amdgpu_syn_instruction_type>`.
1052No data type conversions are performed.
1053
1054Examples:
1055
1056.. parsed-literal::
1057
1058    // GFX9
1059
1060    x = 0.1
1061    v_sqrt_f32 v0, x           // v0 = [low 32 bits of 0.1 (double)]
1062    v_sqrt_f32 v0, (0.1 + 0)   // the same as above
1063    v_sqrt_f32 v0, 0.1         // v0 = [0.1 (double) converted to float]
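
The difference between the two forms can be made concrete with a short Python
sketch. The variable names are hypothetical; the code only computes the bit
patterns that the comments above describe.

.. code-block:: python

    import struct

    # 0.1 used inside an expression: 64-bit pattern of the double, truncated to 32 bits.
    double_bits = struct.unpack("<Q", struct.pack("<d", 0.1))[0]
    as_expression = double_bits & 0xFFFFFFFF

    # 0.1 used directly as a literal: the double is converted to a 32-bit float.
    as_literal = struct.unpack("<I", struct.pack("<f", 0.1))[0]

    print(hex(as_expression))   # 0x9999999a
    print(hex(as_literal))      # 0x3dcccccd

Only the second pattern is the f32 encoding of 0.1; the first is the low half
of the f64 encoding.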
1064
1065