=====================================
Syntax of AMDGPU Instruction Operands
=====================================

.. contents::
   :local:

Conventions
===========

The following notation is used throughout this document:

    =================== =============================================================================
    Notation            Description
    =================== =============================================================================
    {0..N}              Any integer value in the range from 0 to N (inclusive).
    <x>                 Syntax and meaning of *x* are explained elsewhere.
    =================== =============================================================================

.. _amdgpu_syn_operands:

Operands
========

.. _amdgpu_synid_v:

v
-

Vector registers. There are 256 32-bit vector registers.

A sequence of *vector* registers may be used to operate on more than 32 bits of data.

The assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* registers.

    =================================================== ====================================================================
    Syntax                                              Description
    =================================================== ====================================================================
    **v**\<N>                                           A single 32-bit *vector* register.

                                                        *N* must be a decimal integer number.
    **v[**\ <N>\ **]**                                  A single 32-bit *vector* register.

                                                        *N* may be specified as an
                                                        :ref:`integer number<amdgpu_synid_integer_number>`
                                                        or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    **v[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *vector* registers.

                                                        *N* and *K* may be specified as
                                                        :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                        or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
    **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *vector* registers.

                                                        Register indices must be specified as decimal integer numbers.
    =================================================== ====================================================================

Note. *N* and *K* must satisfy the following conditions:

* *N* <= *K*.
* 0 <= *N* <= 255.
* 0 <= *K* <= 255.
* *K-N+1* must be equal to 1, 2, 3, 4, 8 or 16.

Examples:

.. parsed-literal::

    v255
    v[0]
    v[0:1]
    v[1:1]
    v[0:3]
    v[2*2]
    v[1-1:2-1]
    [v252]
    [v252,v253,v254,v255]

.. _amdgpu_synid_s:

s
-

Scalar 32-bit registers. The number of available *scalar* registers depends on GPU:

    ======= ============================
    GPU     Number of *scalar* registers
    ======= ============================
    GFX7    104
    GFX8    102
    GFX9    102
    ======= ============================

A sequence of *scalar* registers may be used to operate on more than 32 bits of data.
The assembler currently supports sequences of 1, 2, 4, 8 and 16 *scalar* registers.

Pairs of *scalar* registers must be even-aligned (the first register must be even).
Sequences of 4 or more *scalar* registers must be quad-aligned
(the index of the first register must be a multiple of 4).

    =================================================== ====================================================================
    Syntax                                              Description
    =================================================== ====================================================================
    **s**\ <N>                                          A single 32-bit *scalar* register.

                                                        *N* must be a decimal integer number.
    **s[**\ <N>\ **]**                                  A single 32-bit *scalar* register.

                                                        *N* may be specified as an
                                                        :ref:`integer number<amdgpu_synid_integer_number>`
                                                        or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    **s[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *scalar* registers.

                                                        *N* and *K* may be specified as
                                                        :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                        or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
    **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *scalar* registers.

                                                        Register indices must be specified as decimal integer numbers.
    =================================================== ====================================================================

Note. *N* and *K* must satisfy the following conditions:

* *N* must be properly aligned based on sequence size.
* *N* <= *K*.
* 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
* 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
* *K-N+1* must be equal to 1, 2, 4, 8 or 16.

Examples:

.. parsed-literal::

    s0
    s[0]
    s[0:1]
    s[1:1]
    s[0:3]
    s[2*2]
    s[1-1:2-1]
    [s4]
    [s4,s5,s6,s7]

Examples of *scalar* registers with an invalid alignment:

.. parsed-literal::

    s[1:2]
    s[2:5]

.. _amdgpu_synid_trap:

trap
----

A set of trap handler registers:

* :ref:`ttmp<amdgpu_synid_ttmp>`
* :ref:`tba<amdgpu_synid_tba>`
* :ref:`tma<amdgpu_synid_tma>`

.. _amdgpu_synid_ttmp:

ttmp
----

Trap handler temporary scalar registers, 32 bits wide.
The number of available *ttmp* registers depends on GPU:

    ======= ===========================
    GPU     Number of *ttmp* registers
    ======= ===========================
    GFX7    12
    GFX8    12
    GFX9    16
    ======= ===========================

A sequence of *ttmp* registers may be used to operate on more than 32 bits of data.
The assembler currently supports sequences of 1, 2, 4, 8 and 16 *ttmp* registers.

Pairs of *ttmp* registers must be even-aligned (the first register must be even).
Sequences of 4 or more *ttmp* registers must be quad-aligned
(the index of the first register must be a multiple of 4).

    ============================================================= ====================================================================
    Syntax                                                        Description
    ============================================================= ====================================================================
    **ttmp**\ <N>                                                 A single 32-bit *ttmp* register.

                                                                  *N* must be a decimal integer number.
    **ttmp[**\ <N>\ **]**                                         A single 32-bit *ttmp* register.

                                                                  *N* may be specified as an
                                                                  :ref:`integer number<amdgpu_synid_integer_number>`
                                                                  or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    **ttmp[**\ <N>:<K>\ **]**                                     A sequence of (\ *K-N+1*\ ) *ttmp* registers.

                                                                  *N* and *K* may be specified as
                                                                  :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                                  or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
    **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]**   A sequence of (\ *K-N+1*\ ) *ttmp* registers.

                                                                  Register indices must be specified as decimal integer numbers.
    ============================================================= ====================================================================

Note. *N* and *K* must satisfy the following conditions:

* *N* must be properly aligned based on sequence size.
* *N* <= *K*.
* 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
* 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
* *K-N+1* must be equal to 1, 2, 4, 8 or 16.

Examples:

.. parsed-literal::

    ttmp0
    ttmp[0]
    ttmp[0:1]
    ttmp[1:1]
    ttmp[0:3]
    ttmp[2*2]
    ttmp[1-1:2-1]
    [ttmp4]
    [ttmp4,ttmp5,ttmp6,ttmp7]

Examples of *ttmp* registers with an invalid alignment:

.. parsed-literal::

    ttmp[1:2]
    ttmp[2:5]

.. _amdgpu_synid_tba:

tba
---

Trap base address, 64 bits wide. Holds the pointer to the current trap handler program.

    ================== ======================================================================= =============
    Syntax             Description                                                             Availability
    ================== ======================================================================= =============
    tba                64-bit *trap base address* register.                                    GFX7, GFX8
    [tba]              64-bit *trap base address* register (an alternative syntax).            GFX7, GFX8
    [tba_lo,tba_hi]    64-bit *trap base address* register (an alternative syntax).            GFX7, GFX8
    ================== ======================================================================= =============

High and low 32 bits of *trap base address* may be accessed as separate registers:

    ================== ======================================================================= =============
    Syntax             Description                                                             Availability
    ================== ======================================================================= =============
    tba_lo             Low 32 bits of *trap base address* register.                            GFX7, GFX8
    tba_hi             High 32 bits of *trap base address* register.                           GFX7, GFX8
    [tba_lo]           Low 32 bits of *trap base address* register (an alternative syntax).    GFX7, GFX8
    [tba_hi]           High 32 bits of *trap base address* register (an alternative syntax).   GFX7, GFX8
    ================== ======================================================================= =============

Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9,
but *tba* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.

.. _amdgpu_synid_tma:

tma
---

Trap memory address, 64 bits wide.

    ================= ======================================================================= ==================
    Syntax            Description                                                             Availability
    ================= ======================================================================= ==================
    tma               64-bit *trap memory address* register.                                  GFX7, GFX8
    [tma]             64-bit *trap memory address* register (an alternative syntax).          GFX7, GFX8
    [tma_lo,tma_hi]   64-bit *trap memory address* register (an alternative syntax).          GFX7, GFX8
    ================= ======================================================================= ==================

High and low 32 bits of *trap memory address* may be accessed as separate registers:

    ================= ======================================================================= ==================
    Syntax            Description                                                             Availability
    ================= ======================================================================= ==================
    tma_lo            Low 32 bits of *trap memory address* register.                          GFX7, GFX8
    tma_hi            High 32 bits of *trap memory address* register.                         GFX7, GFX8
    [tma_lo]          Low 32 bits of *trap memory address* register (an alternative syntax).  GFX7, GFX8
    [tma_hi]          High 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8
    ================= ======================================================================= ==================

Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9,
but *tma* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.

.. _amdgpu_synid_flat_scratch:

flat_scratch
------------

Flat scratch address, 64 bits wide. Holds the base address of scratch memory.

    ================================== ================================================================
    Syntax                             Description
    ================================== ================================================================
    flat_scratch                       64-bit *flat scratch* address register.
    [flat_scratch]                     64-bit *flat scratch* address register (an alternative syntax).
    [flat_scratch_lo,flat_scratch_hi]  64-bit *flat scratch* address register (an alternative syntax).
    ================================== ================================================================

High and low 32 bits of *flat scratch* address may be accessed as separate registers:

    ========================= =========================================================================
    Syntax                    Description
    ========================= =========================================================================
    flat_scratch_lo           Low 32 bits of *flat scratch* address register.
    flat_scratch_hi           High 32 bits of *flat scratch* address register.
    [flat_scratch_lo]         Low 32 bits of *flat scratch* address register (an alternative syntax).
    [flat_scratch_hi]         High 32 bits of *flat scratch* address register (an alternative syntax).
    ========================= =========================================================================

.. _amdgpu_synid_xnack:

xnack
-----

Xnack mask, 64 bits wide. Holds a 64-bit mask of which threads
received an *XNACK* due to a vector memory operation.

.. WARNING:: GFX7 does not support the *xnack* feature. Not all GFX8 and GFX9 :ref:`processors<amdgpu-processors>` support the *xnack* feature.

\ 

    ============================== =====================================================
    Syntax                         Description
    ============================== =====================================================
    xnack_mask                     64-bit *xnack mask* register.
    [xnack_mask]                   64-bit *xnack mask* register (an alternative syntax).
    [xnack_mask_lo,xnack_mask_hi]  64-bit *xnack mask* register (an alternative syntax).
    ============================== =====================================================

High and low 32 bits of *xnack mask* may be accessed as separate registers:

    ===================== ==============================================================
    Syntax                Description
    ===================== ==============================================================
    xnack_mask_lo         Low 32 bits of *xnack mask* register.
    xnack_mask_hi         High 32 bits of *xnack mask* register.
    [xnack_mask_lo]       Low 32 bits of *xnack mask* register (an alternative syntax).
    [xnack_mask_hi]       High 32 bits of *xnack mask* register (an alternative syntax).
    ===================== ==============================================================

.. _amdgpu_synid_vcc:

vcc
---

Vector condition code, 64 bits wide. A bit mask with one bit per thread;
it holds the result of a vector compare operation.

    ================ =========================================================================
    Syntax           Description
    ================ =========================================================================
    vcc              64-bit *vector condition code* register.
    [vcc]            64-bit *vector condition code* register (an alternative syntax).
    [vcc_lo,vcc_hi]  64-bit *vector condition code* register (an alternative syntax).
    ================ =========================================================================

High and low 32 bits of *vector condition code* may be accessed as separate registers:

    ================ =========================================================================
    Syntax           Description
    ================ =========================================================================
    vcc_lo           Low 32 bits of *vector condition code* register.
    vcc_hi           High 32 bits of *vector condition code* register.
    [vcc_lo]         Low 32 bits of *vector condition code* register (an alternative syntax).
    [vcc_hi]         High 32 bits of *vector condition code* register (an alternative syntax).
    ================ =========================================================================

.. _amdgpu_synid_m0:

m0
--

A 32-bit memory register. It has various uses,
including register indexing and bounds checking.

    =========== ===================================================
    Syntax      Description
    =========== ===================================================
    m0          A 32-bit *memory* register.
    [m0]        A 32-bit *memory* register (an alternative syntax).
    =========== ===================================================

.. _amdgpu_synid_exec:

exec
----

Execute mask, 64 bits wide. A bit mask with one bit per thread,
which is applied to vector instructions and controls which threads execute
and which ignore the instruction.

    ===================== =================================================================
    Syntax                Description
    ===================== =================================================================
    exec                  64-bit *execute mask* register.
    [exec]                64-bit *execute mask* register (an alternative syntax).
    [exec_lo,exec_hi]     64-bit *execute mask* register (an alternative syntax).
    ===================== =================================================================

High and low 32 bits of *execute mask* may be accessed as separate registers:

    ===================== =================================================================
    Syntax                Description
    ===================== =================================================================
    exec_lo               Low 32 bits of *execute mask* register.
    exec_hi               High 32 bits of *execute mask* register.
    [exec_lo]             Low 32 bits of *execute mask* register (an alternative syntax).
    [exec_hi]             High 32 bits of *execute mask* register (an alternative syntax).
    ===================== =================================================================

.. _amdgpu_synid_vccz:

vccz
----

A single-bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros.

.. WARNING:: This operand is not currently supported by AMDGPU assembler.

.. _amdgpu_synid_execz:

execz
-----

A single-bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros.

.. WARNING:: This operand is not currently supported by AMDGPU assembler.

.. _amdgpu_synid_scc:

scc
---

A single-bit flag indicating the result of a scalar compare operation.

.. WARNING:: This operand is not currently supported by AMDGPU assembler.

lds_direct
----------

A special operand which supplies a 32-bit value
fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.

.. WARNING:: This operand is not currently supported by AMDGPU assembler.

.. _amdgpu_synid_constant:

constant
--------

A set of integer and floating-point *inline constants*:

* :ref:`iconst<amdgpu_synid_iconst>`
* :ref:`fconst<amdgpu_synid_fconst>`

These operands are encoded as part of the instruction.

If a number may be encoded as either
a :ref:`literal<amdgpu_synid_literal>` or
an :ref:`inline constant<amdgpu_synid_constant>`,
the assembler selects the latter encoding as more efficient.

.. _amdgpu_synid_iconst:

iconst
------

An :ref:`integer number<amdgpu_synid_integer_number>`
encoded as an *inline constant*.

Only a small fraction of integer numbers may be encoded as *inline constants*.
They are enumerated in the table below.
Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.

Integer *inline constants* are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_int_const_conv>`.

    ================================== ====================================
    Value                              Note
    ================================== ====================================
    {0..64}                            Non-negative integer inline constants.
    {-16..-1}                          Negative integer inline constants.
    ================================== ====================================

.. WARNING:: GFX7 does not support inline constants for *f16* operands.

There are also symbolic inline constants which provide read-only access to H/W registers.

.. WARNING:: These inline constants are not currently supported by AMDGPU assembler.

\ 

    ======================== ================================================ =============
    Syntax                   Note                                             Availability
    ======================== ================================================ =============
    shared_base              Base address of shared memory region.            GFX9
    shared_limit             Address of the end of shared memory region.      GFX9
    private_base             Base address of private memory region.           GFX9
    private_limit            Address of the end of private memory region.     GFX9
    pops_exiting_wave_id     A dedicated counter for POPS.                    GFX9
    ======================== ================================================ =============

.. _amdgpu_synid_fconst:

fconst
------

A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
encoded as an *inline constant*.

Only a small fraction of floating-point numbers may be encoded as *inline constants*.
They are enumerated in the table below.
Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.

Floating-point *inline constants* are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_fp_const_conv>`.

    ===================== ===================================================== ==================
    Value                 Note                                                  Availability
    ===================== ===================================================== ==================
    0.0                   The same as integer constant 0.                       All GPUs
    0.5                   Floating-point constant 0.5                           All GPUs
    1.0                   Floating-point constant 1.0                           All GPUs
    2.0                   Floating-point constant 2.0                           All GPUs
    4.0                   Floating-point constant 4.0                           All GPUs
    -0.5                  Floating-point constant -0.5                          All GPUs
    -1.0                  Floating-point constant -1.0                          All GPUs
    -2.0                  Floating-point constant -2.0                          All GPUs
    -4.0                  Floating-point constant -4.0                          All GPUs
    0.1592                1.0/(2.0*pi). Use only for 16-bit operands.           GFX8, GFX9
    0.15915494            1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8, GFX9
    0.15915494309189532   1.0/(2.0*pi).                                         GFX8, GFX9
    ===================== ===================================================== ==================

.. WARNING:: GFX7 does not support inline constants for *f16* operands.

.. _amdgpu_synid_literal:

literal
-------

A literal is a 64-bit value which is encoded as a separate 32-bit dword in the instruction stream.

If a number may be encoded as either
a :ref:`literal<amdgpu_synid_literal>` or
an :ref:`inline constant<amdgpu_synid_constant>`,
the assembler selects the latter encoding as more efficient.

Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or
:ref:`expressions<amdgpu_synid_expression>`
(expressions are currently supported for 32-bit operands only).

A 64-bit literal value is converted by the assembler
to an :ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_lit_conv>`.

An instruction may use only one literal, but several operands may refer to the same literal.

.. _amdgpu_synid_uimm8:

uimm8
-----

An 8-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
The value is encoded as part of the opcode so it is free to use.

.. _amdgpu_synid_uimm32:

uimm32
------

A 32-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
The value is stored as a separate 32-bit dword in the instruction stream.

.. _amdgpu_synid_uimm20:

uimm20
------

A 20-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.

.. _amdgpu_synid_uimm21:

uimm21
------

A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.

.. WARNING:: Assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.

.. _amdgpu_synid_simm21:

simm21
------

A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`.

.. WARNING:: Assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.

.. _amdgpu_synid_off:

off
---

A special entity which indicates that the value of this operand is not used.

    ================================== ===================================================
    Syntax                             Description
    ================================== ===================================================
    off                                Indicates an unused operand.
    ================================== ===================================================


.. _amdgpu_synid_number:

Numbers
=======

.. _amdgpu_synid_integer_number:

Integer Numbers
---------------

Integer numbers are 64 bits wide.
They may be specified in binary, octal, hexadecimal and decimal formats:

    ============== ====================================
    Format         Syntax
    ============== ====================================
    Decimal        [-]?[1-9][0-9]*
    Binary         [-]?0b[01]+
    Octal          [-]?0[0-7]+
    Hexadecimal    [-]?0x[0-9a-fA-F]+
    \              [-]?[0x]?[0-9][0-9a-fA-F]*[hH]
    ============== ====================================

Examples:

.. parsed-literal::

    -1234
    0b1010
    010
    0xff
    0ffh

.. _amdgpu_synid_floating-point_number:

Floating-Point Numbers
----------------------

All floating-point numbers are handled as double (64 bits wide).

Floating-point numbers may be specified in hexadecimal and decimal formats:

    ============== ======================================================== ========================================================
    Format         Syntax                                                   Note
    ============== ======================================================== ========================================================
    Decimal        [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)?                    Must include either a decimal separator or an exponent.
    Hexadecimal    [-]?0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+
    ============== ======================================================== ========================================================

Examples:

.. parsed-literal::

    -1.234
    234e2
    -0x1afp-10
    0x.1afp10

.. _amdgpu_synid_expression:

Expressions
===========

An expression specifies an address or a numeric value.
There are two kinds of expressions:

* :ref:`Absolute<amdgpu_synid_absolute_expression>`.
* :ref:`Relocatable<amdgpu_synid_relocatable_expression>`.

.. _amdgpu_synid_absolute_expression:

Absolute Expressions
--------------------

The value of an absolute expression remains the same after program relocation.
Absolute expressions must not include unassigned and relocatable values
such as labels.

Examples:

.. parsed-literal::

    x = -1
    y = x + 10

.. _amdgpu_synid_relocatable_expression:

Relocatable Expressions
-----------------------

The value of a relocatable expression depends on program relocation.

Note that use of relocatable expressions is limited to branch targets
and 32-bit :ref:`literals<amdgpu_synid_literal>`.

Additional information about relocation may be found :ref:`here<amdgpu-relocation-records>`.

Examples:

.. parsed-literal::

    y = x + 10 // x is not yet defined. Undefined symbols are assumed to be PC-relative.
    z = .

Expression Data Type
--------------------

Expressions and operands of expressions are interpreted as 64-bit integers.

Expressions may include 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>` (double).
However, these operands are also handled as 64-bit integers
using the binary representation of the specified floating-point numbers.
No conversion from floating-point to integer is performed.

Examples:

.. parsed-literal::

    x = 0.1    // x is assigned an integer 4591870180066957722 which is a binary representation of 0.1.
    y = x + x  // y is a sum of two integer values; it is not equal to 0.2!

Syntax
------

Expressions are composed of
:ref:`symbols<amdgpu_synid_symbol>`,
:ref:`integer numbers<amdgpu_synid_integer_number>`,
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
:ref:`binary operators<amdgpu_synid_expression_bin_op>`,
:ref:`unary operators<amdgpu_synid_expression_un_op>` and subexpressions.

Expressions may also use "." which is a reference to the current PC (program counter).
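
The integer interpretation of floating-point operands described in the
*Expression Data Type* section can be reproduced outside the assembler.
The Python sketch below is illustrative only and is not part of the AMDGPU
assembler; the names ``double_bits`` and ``MASK64`` are hypothetical, and
wrapping the sum to 64 bits is an assumption made for the sketch.

.. code-block:: python

    import struct

    MASK64 = 0xFFFFFFFFFFFFFFFF  # assumption: results are kept to 64 bits


    def double_bits(value: float) -> int:
        """Return the IEEE-754 binary64 bit pattern of value as an unsigned integer."""
        return struct.unpack("<Q", struct.pack("<d", value))[0]


    x = double_bits(0.1)   # 4591870180066957722, the value quoted for "x = 0.1"
    y = (x + x) & MASK64   # integer sum of two bit patterns, not a floating-point sum
    print(x, y, y == double_bits(0.2))

Under these assumptions the final comparison is false, because adding two bit
patterns as integers is not the same operation as floating-point addition.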

The syntax of expressions is shown below::

    expr ::= expr binop expr | primaryexpr ;

    primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ;

    binop ::= '&&'
            | '||'
            | '|'
            | '^'
            | '&'
            | '!'
            | '=='
            | '!='
            | '<>'
            | '<'
            | '<='
            | '>'
            | '>='
            | '<<'
            | '>>'
            | '+'
            | '-'
            | '*'
            | '/'
            | '%' ;

    unop ::= '~'
           | '+'
           | '-'
           | '!' ;

.. _amdgpu_synid_expression_bin_op:

Binary Operators
----------------

Binary operators are described in the following table.
They operate on and produce 64-bit integers.
Operators with higher priority are performed first.

    ========== ========= ===============================================
    Operator   Priority  Meaning
    ========== ========= ===============================================
    \*         5         Integer multiplication.
    /          5         Integer division.
    %          5         Integer signed remainder.
    \+         4         Integer addition.
    \-         4         Integer subtraction.
    <<         3         Integer shift left.
    >>         3         Logical shift right.
    ==         2         Equality comparison.
    !=         2         Inequality comparison.
    <>         2         Inequality comparison.
    <          2         Signed less than comparison.
    <=         2         Signed less than or equal comparison.
    >          2         Signed greater than comparison.
    >=         2         Signed greater than or equal comparison.
    \|         1         Bitwise or.
    ^          1         Bitwise xor.
    &          1         Bitwise and.
    &&         0         Logical and.
    ||         0         Logical or.
    ========== ========= ===============================================

.. _amdgpu_synid_expression_un_op:

Unary Operators
---------------

Unary operators are described in the following table.
They operate on and produce 64-bit integers.

    ========== ===============================================
    Operator   Meaning
    ========== ===============================================
    !          Logical negation.
    ~          Bitwise negation.
    \+         Integer unary plus.
    \-         Integer unary minus.
    ========== ===============================================

.. _amdgpu_synid_symbol:

Symbols
-------

A symbol is a named 64-bit value, representing a relocatable
address or an absolute (non-relocatable) number.

Symbol names have the following syntax:
    ``[a-zA-Z_.][a-zA-Z0-9_$.@]*``

The table below provides several examples of syntax used for symbol definition.

    ================ ==========================================================
    Syntax           Meaning
    ================ ==========================================================
    .globl <S>       Declares a global symbol S without assigning it a value.
    .set <S>, <E>    Assigns the value of an expression E to a symbol S.
    <S> = <E>        Assigns the value of an expression E to a symbol S.
    <S>:             Declares a label S and assigns it the current PC value.
    ================ ==========================================================

A symbol may be used before it is declared or assigned;
unassigned symbols are assumed to be PC-relative.

Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.

.. _amdgpu_synid_conv:

Conversions
===========

This section describes what happens when a 64-bit
:ref:`integer number<amdgpu_synid_integer_number>`, a
:ref:`floating-point number<amdgpu_synid_floating-point_number>` or a
:ref:`symbol<amdgpu_synid_symbol>`
is used for an operand which has a different type or size.

Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W:

* Values encoded as :ref:`inline constants<amdgpu_synid_constant>` are handled by H/W.
* Values encoded as :ref:`literals<amdgpu_synid_literal>` are converted by assembler.

.. _amdgpu_synid_const_conv:

Inline Constants
----------------

.. _amdgpu_synid_int_const_conv:

Integer Inline Constants
~~~~~~~~~~~~~~~~~~~~~~~~

Integer :ref:`inline constants<amdgpu_synid_constant>`
may be thought of as 64-bit
:ref:`integer numbers<amdgpu_synid_integer_number>`;
when used as operands they are truncated to the size of
:ref:`expected operand type<amdgpu_syn_instruction_type>`.
No data type conversions are performed.

Examples:

.. parsed-literal::

    // GFX9

    v_add_u16 v0, -1, 0        // v0 = 0xFFFF
    v_add_f16 v0, -1, 0        // v0 = 0xFFFF (NaN)

    v_add_u32 v0, -1, 0        // v0 = 0xFFFFFFFF
    v_add_f32 v0, -1, 0        // v0 = 0xFFFFFFFF (NaN)

.. _amdgpu_synid_fp_const_conv:

Floating-Point Inline Constants
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Floating-point :ref:`inline constants<amdgpu_synid_constant>`
may be thought of as 64-bit
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`;
when used as operands they are converted to a floating-point number of
:ref:`expected operand size<amdgpu_syn_instruction_type>`.

Examples:

.. parsed-literal::

    // GFX9

    v_add_f16 v0, 1.0, 0        // v0 = 0x3C00 (1.0)
    v_add_u16 v0, 1.0, 0        // v0 = 0x3C00

    v_add_f32 v0, 1.0, 0        // v0 = 0x3F800000 (1.0)
    v_add_u32 v0, 1.0, 0        // v0 = 0x3F800000


.. _amdgpu_synid_lit_conv:

Literals
--------

.. _amdgpu_synid_int_lit_conv:

Integer Literals
~~~~~~~~~~~~~~~~

Integer :ref:`literals<amdgpu_synid_literal>`
are specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>`.

When used as operands they are converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
949 950 ============== ============== =============== ==================================================================== 951 Expected type Condition Result Note 952 ============== ============== =============== ==================================================================== 953 i16, u16, b16 cond(num,16) num.u16 Truncate to 16 bits. 954 i32, u32, b32 cond(num,32) num.u32 Truncate to 32 bits. 955 i64 cond(num,32) {-1,num.i32} Truncate to 32 bits and then sign-extend the result to 64 bits. 956 u64, b64 cond(num,32) { 0,num.u32} Truncate to 32 bits and then zero-extend the result to 64 bits. 957 f16 cond(num,16) num.u16 Use low 16 bits as an f16 value. 958 f32 cond(num,32) num.u32 Use low 32 bits as an f32 value. 959 f64 cond(num,32) {num.u32,0} Use low 32 bits of the number as high 32 bits 960 of the result; low 32 bits of the result are zeroed. 961 ============== ============== =============== ==================================================================== 962 963The condition *cond(X,S)* indicates if a 64-bit number *X* 964can be converted to a smaller size *S* by truncation of upper bits. 965There are two cases when the conversion is possible: 966 967* The truncated bits are all 0. 968* The truncated bits are all 1 and the value after truncation has its MSB bit set. 969 970Examples of valid literals: 971 972.. parsed-literal:: 973 974 // GFX9 975 // Literal value after conversion: 976 v_add_u16 v0, 0xff00, v0 // 0xff00 977 v_add_u16 v0, 0xffffffffffffff00, v0 // 0xff00 978 v_add_u16 v0, -256, v0 // 0xff00 979 // Literal value after conversion: 980 s_bfe_i64 s[0:1], 0xffefffff, s3 // 0xffffffffffefffff 981 s_bfe_u64 s[0:1], 0xffefffff, s3 // 0x00000000ffefffff 982 v_ceil_f64_e32 v[0:1], 0xffefffff // 0xffefffff00000000 (-1.7976922776554302e308) 983 984Examples of invalid literals: 985 986.. 
parsed-literal:: 987 988 // GFX9 989 990 v_add_u16 v0, 0x1ff00, v0 // truncated bits are not all 0 or 1 991 v_add_u16 v0, 0xffffffffffff00ff, v0 // truncated bits do not match MSB of the result 992 993.. _amdgpu_synid_fp_lit_conv: 994 995Floating-Point Literals 996~~~~~~~~~~~~~~~~~~~~~~~ 997 998Floating-point :ref:`literals<amdgpu_synid_literal>` are specified as 64-bit 999:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`. 1000 1001When used as operands they are converted to 1002:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below. 1003 1004 ============== ============== ================= ================================================================= 1005 Expected type Condition Result Note 1006 ============== ============== ================= ================================================================= 1007 i16, u16, b16 cond(num,16) f16(num) Convert to f16 and use bits of the result as an integer value. 1008 i32, u32, b32 cond(num,32) f32(num) Convert to f32 and use bits of the result as an integer value. 1009 i64, u64, b64 false \- Conversion disabled because of an unclear semantics. 1010 f16 cond(num,16) f16(num) Convert to f16. 1011 f32 cond(num,32) f32(num) Convert to f32. 1012 f64 true {num.u32.hi,0} Use high 32 bits of the number as high 32 bits of the result; 1013 zero-fill low 32 bits of the result. 1014 1015 Note that the result may differ from the original number. 1016 ============== ============== ================= ================================================================= 1017 1018The condition *cond(X,S)* indicates if an f64 number *X* can be converted 1019to a smaller *S*-bit floating-point type without overflow or underflow. 1020Precision lost is allowed. 1021 1022Examples of valid literals: 1023 1024.. 
parsed-literal:: 1025 1026 // GFX9 1027 1028 v_add_f16 v1, 65500.0, v2 1029 v_add_f32 v1, 65600.0, v2 1030 1031 // Literal value before conversion: 1.7976931348623157e308 (0x7fefffffffffffff) 1032 // Literal value after conversion: 1.7976922776554302e308 (0x7fefffff00000000) 1033 v_ceil_f64 v[0:1], 1.7976931348623157e308 1034 1035Examples of invalid literals: 1036 1037.. parsed-literal:: 1038 1039 // GFX9 1040 1041 v_add_f16 v1, 65600.0, v2 // overflow 1042 1043.. _amdgpu_synid_exp_conv: 1044 1045Expressions 1046~~~~~~~~~~~ 1047 1048Expressions operate with and result in 64-bit integers. 1049 1050When used as operands they are truncated to 1051:ref:`expected operand size<amdgpu_syn_instruction_type>`. 1052No data type conversions are performed. 1053 1054Examples: 1055 1056.. parsed-literal:: 1057 1058 // GFX9 1059 1060 x = 0.1 1061 v_sqrt_f32 v0, x // v0 = [low 32 bits of 0.1 (double)] 1062 v_sqrt_f32 v0, (0.1 + 0) // the same as above 1063 v_sqrt_f32 v0, 0.1 // v0 = [0.1 (double) converted to float] 1064 1065
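The difference between truncating an expression and converting a floating-point literal can be checked outside the assembler. The Python sketch below is illustrative only and is not part of the assembler: the helper names ``cond``, ``truncate``, ``low32_of_double`` and ``double_to_f32_bits`` are hypothetical. It assumes IEEE 754 binary64 and binary32 encodings, and it models only the integer form of *cond(X,S)* described under Integer Literals.

```python
import struct

MASK64 = (1 << 64) - 1


def cond(num: int, size: int) -> bool:
    """Integer cond(X,S): can the 64-bit value be truncated to `size` bits?"""
    num &= MASK64                        # view the number as a 64-bit pattern
    high = num >> size                   # the bits that truncation would drop
    all_ones = (1 << (64 - size)) - 1
    msb_set = ((num >> (size - 1)) & 1) == 1
    # Allowed when the dropped bits are all 0, or all 1 with the result's MSB set.
    return high == 0 or (high == all_ones and msb_set)


def truncate(num: int, size: int) -> int:
    """Keep the low `size` bits; no data type conversion is performed."""
    return num & ((1 << size) - 1)


def low32_of_double(value: float) -> int:
    """Bits an f32 operand receives when `value` comes from an expression."""
    bits64 = struct.unpack("<Q", struct.pack("<d", value))[0]
    return truncate(bits64, 32)


def double_to_f32_bits(value: float) -> int:
    """Bits an f32 operand receives when `value` is a floating-point literal."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


if __name__ == "__main__":
    print(hex(low32_of_double(0.1)))     # expression: low 32 bits of the double
    print(hex(double_to_f32_bits(0.1)))  # literal: the double converted to float
    print(cond(-256, 16), cond(0x1ff00, 16))
```

Under these assumptions, ``low32_of_double(0.1)`` yields ``0x9999999a`` while ``double_to_f32_bits(0.1)`` yields ``0x3dcccccd``, which is why ``v_sqrt_f32 v0, x`` and ``v_sqrt_f32 v0, 0.1`` load different values.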