=====================================
Syntax of AMDGPU Instruction Operands
=====================================

.. contents::
   :local:

Conventions
===========

The following notation is used throughout this document:

    =================== =============================================================================
    Notation            Description
    =================== =============================================================================
    {0..N}              Any integer value in the range from 0 to N (inclusive).
    <x>                 Syntax and meaning of *x* are explained elsewhere.
    =================== =============================================================================

.. _amdgpu_syn_operands:

Operands
========

.. _amdgpu_synid_v:

v
-

Vector registers. There are 256 32-bit *vector* registers.

A sequence of *vector* registers may be used to operate with more than 32 bits of data.

The assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* registers.

    =================================================== ====================================================================
    Syntax                                              Description
    =================================================== ====================================================================
    **v**\ <N>                                          A single 32-bit *vector* register.

                                                        *N* must be a decimal
                                                        :ref:`integer number<amdgpu_synid_integer_number>`.
    **v[**\ <N>\ **]**                                  A single 32-bit *vector* register.

                                                        *N* may be specified as an
                                                        :ref:`integer number<amdgpu_synid_integer_number>`
                                                        or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    **v[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *vector* registers.

                                                        *N* and *K* may be specified as
                                                        :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                        or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
    **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *vector* registers.

                                                        Register indices must be specified as decimal
                                                        :ref:`integer numbers<amdgpu_synid_integer_number>`.
    =================================================== ====================================================================

Note: *N* and *K* must satisfy the following conditions:

* *N* <= *K*.
* 0 <= *N* <= 255.
* 0 <= *K* <= 255.
* *K-N+1* must be equal to 1, 2, 3, 4, 8 or 16.

Examples:

.. parsed-literal::

  v255
  v[0]
  v[0:1]
  v[1:1]
  v[0:3]
  v[2*2]
  v[1-1:2-1]
  [v252]
  [v252,v253,v254,v255]

.. _amdgpu_synid_nsa:

GFX10 *Image* instructions may use a special *NSA* (Non-Sequential Address) syntax for *image addresses*:

    ===================================== =================================================
    Syntax                                Description
    ===================================== =================================================
    **[Vm**, \ **Vn**, ... **Vk**\ **]**  A sequence of 32-bit *vector* registers.
                                          Each register may be specified using the syntax
                                          defined :ref:`above<amdgpu_synid_v>`.

                                          In contrast with the standard syntax, registers
                                          in an *NSA* sequence are not required to have
                                          consecutive indices. Moreover, the same register
                                          may appear in the list more than once.
    ===================================== =================================================

Examples:

.. parsed-literal::

  [v32,v1,v[2]]
  [v[32],v[1:1],[v2]]
  [v4,v4,v4,v4]

.. _amdgpu_synid_s:

s
-

Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU:

    ======= ============================
    GPU     Number of *scalar* registers
    ======= ============================
    GFX7    104
    GFX8    102
    GFX9    102
    GFX10   106
    ======= ============================

A sequence of *scalar* registers may be used to operate with more than 32 bits of data.
The assembler currently supports sequences of 1, 2, 4, 8 and 16 *scalar* registers.
Pairs of *scalar* registers must be even-aligned (the first register must be even).
Sequences of 4 or more *scalar* registers must be quad-aligned.

    =================================================== ====================================================================
    Syntax                                              Description
    =================================================== ====================================================================
    **s**\ <N>                                          A single 32-bit *scalar* register.

                                                        *N* must be a decimal
                                                        :ref:`integer number<amdgpu_synid_integer_number>`.
    **s[**\ <N>\ **]**                                  A single 32-bit *scalar* register.

                                                        *N* may be specified as an
                                                        :ref:`integer number<amdgpu_synid_integer_number>`
                                                        or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    **s[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *scalar* registers.

                                                        *N* and *K* may be specified as
                                                        :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                        or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
    **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *scalar* registers.

                                                        Register indices must be specified as decimal
                                                        :ref:`integer numbers<amdgpu_synid_integer_number>`.
    =================================================== ====================================================================

Note: *N* and *K* must satisfy the following conditions:

* *N* must be properly aligned based on the sequence size.
* *N* <= *K*.
* 0 <= *N* < *SMAX*, where *SMAX* is the number of available *scalar* registers.
* 0 <= *K* < *SMAX*, where *SMAX* is the number of available *scalar* registers.
* *K-N+1* must be equal to 1, 2, 4, 8 or 16.

Examples:

.. parsed-literal::

  s0
  s[0]
  s[0:1]
  s[1:1]
  s[0:3]
  s[2*2]
  s[1-1:2-1]
  [s4]
  [s4,s5,s6,s7]

Examples of *scalar* registers with an invalid alignment:

.. parsed-literal::

  s[1:2]
  s[2:5]

.. _amdgpu_synid_trap:

trap
----

A set of trap handler registers:

* :ref:`ttmp<amdgpu_synid_ttmp>`
* :ref:`tba<amdgpu_synid_tba>`
* :ref:`tma<amdgpu_synid_tma>`

.. _amdgpu_synid_ttmp:

ttmp
----

Trap handler temporary scalar registers, 32 bits wide.
The number of available *ttmp* registers depends on the GPU:

    ======= ===========================
    GPU     Number of *ttmp* registers
    ======= ===========================
    GFX7    12
    GFX8    12
    GFX9    16
    GFX10   16
    ======= ===========================

A sequence of *ttmp* registers may be used to operate with more than 32 bits of data.
The assembler currently supports sequences of 1, 2, 4, 8 and 16 *ttmp* registers.

Pairs of *ttmp* registers must be even-aligned (the first register must be even).
Sequences of 4 or more *ttmp* registers must be quad-aligned.

    ============================================================= ====================================================================
    Syntax                                                        Description
    ============================================================= ====================================================================
    **ttmp**\ <N>                                                 A single 32-bit *ttmp* register.

                                                                  *N* must be a decimal
                                                                  :ref:`integer number<amdgpu_synid_integer_number>`.
    **ttmp[**\ <N>\ **]**                                         A single 32-bit *ttmp* register.

                                                                  *N* may be specified as an
                                                                  :ref:`integer number<amdgpu_synid_integer_number>`
                                                                  or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
    **ttmp[**\ <N>:<K>\ **]**                                     A sequence of (\ *K-N+1*\ ) *ttmp* registers.

                                                                  *N* and *K* may be specified as
                                                                  :ref:`integer numbers<amdgpu_synid_integer_number>`
                                                                  or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
    **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]**   A sequence of (\ *K-N+1*\ ) *ttmp* registers.

                                                                  Register indices must be specified as decimal
                                                                  :ref:`integer numbers<amdgpu_synid_integer_number>`.
    ============================================================= ====================================================================

Note: *N* and *K* must satisfy the following conditions:

* *N* must be properly aligned based on the sequence size.
* *N* <= *K*.
* 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
* 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
* *K-N+1* must be equal to 1, 2, 4, 8 or 16.

Examples:

.. parsed-literal::

  ttmp0
  ttmp[0]
  ttmp[0:1]
  ttmp[1:1]
  ttmp[0:3]
  ttmp[2*2]
  ttmp[1-1:2-1]
  [ttmp4]
  [ttmp4,ttmp5,ttmp6,ttmp7]

Examples of *ttmp* registers with an invalid alignment:

.. parsed-literal::

  ttmp[1:2]
  ttmp[2:5]

.. _amdgpu_synid_tba:

tba
---

Trap base address, 64 bits wide. Holds the pointer to the current trap handler program.

    ================== ================================================================ =============
    Syntax             Description                                                      Availability
    ================== ================================================================ =============
    tba                64-bit *trap base address* register.                             GFX7, GFX8
    [tba]              64-bit *trap base address* register (an SP3 syntax).             GFX7, GFX8
    [tba_lo,tba_hi]    64-bit *trap base address* register (an SP3 syntax).             GFX7, GFX8
    ================== ================================================================ =============

The high and low 32 bits of the *trap base address* may be accessed as separate registers:

    ================== ================================================================ =============
    Syntax             Description                                                      Availability
    ================== ================================================================ =============
    tba_lo             Low 32 bits of *trap base address* register.                     GFX7, GFX8
    tba_hi             High 32 bits of *trap base address* register.                    GFX7, GFX8
    [tba_lo]           Low 32 bits of *trap base address* register (an SP3 syntax).     GFX7, GFX8
    [tba_hi]           High 32 bits of *trap base address* register (an SP3 syntax).    GFX7, GFX8
    ================== ================================================================ =============

Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9 and GFX10,
but *tba* is readable and writable with the help of the *s_getreg_b32* and *s_setreg_b32* instructions.

.. _amdgpu_synid_tma:

tma
---

Trap memory address, 64 bits wide.

    ================== ================================================================ =============
    Syntax             Description                                                      Availability
    ================== ================================================================ =============
    tma                64-bit *trap memory address* register.                           GFX7, GFX8
    [tma]              64-bit *trap memory address* register (an SP3 syntax).           GFX7, GFX8
    [tma_lo,tma_hi]    64-bit *trap memory address* register (an SP3 syntax).           GFX7, GFX8
    ================== ================================================================ =============

The high and low 32 bits of the *trap memory address* may be accessed as separate registers:

    ================== ================================================================ =============
    Syntax             Description                                                      Availability
    ================== ================================================================ =============
    tma_lo             Low 32 bits of *trap memory address* register.                   GFX7, GFX8
    tma_hi             High 32 bits of *trap memory address* register.                  GFX7, GFX8
    [tma_lo]           Low 32 bits of *trap memory address* register (an SP3 syntax).   GFX7, GFX8
    [tma_hi]           High 32 bits of *trap memory address* register (an SP3 syntax).  GFX7, GFX8
    ================== ================================================================ =============

Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9 and GFX10,
but *tma* is readable and writable with the help of the *s_getreg_b32* and *s_setreg_b32* instructions.

.. _amdgpu_synid_flat_scratch:

flat_scratch
------------

Flat scratch address, 64 bits wide. Holds the base address of scratch memory.

    ================================== ================================================================
    Syntax                             Description
    ================================== ================================================================
    flat_scratch                       64-bit *flat scratch* address register.
    [flat_scratch]                     64-bit *flat scratch* address register (an SP3 syntax).
    [flat_scratch_lo,flat_scratch_hi]  64-bit *flat scratch* address register (an SP3 syntax).
    ================================== ================================================================

The high and low 32 bits of the *flat scratch* address may be accessed as separate registers:

    ========================= =========================================================================
    Syntax                    Description
    ========================= =========================================================================
    flat_scratch_lo           Low 32 bits of *flat scratch* address register.
    flat_scratch_hi           High 32 bits of *flat scratch* address register.
    [flat_scratch_lo]         Low 32 bits of *flat scratch* address register (an SP3 syntax).
    [flat_scratch_hi]         High 32 bits of *flat scratch* address register (an SP3 syntax).
    ========================= =========================================================================

.. _amdgpu_synid_xnack:

xnack
-----

Xnack mask, 64 bits wide. Holds a 64-bit mask of the threads that
received an *XNACK* due to a vector memory operation.

.. WARNING:: GFX7 does not support the *xnack* feature. For the availability of this feature on other GPUs, refer to :ref:`this table<amdgpu-processors>`.

\ 

    ============================== =====================================================
    Syntax                         Description
    ============================== =====================================================
    xnack_mask                     64-bit *xnack mask* register.
    [xnack_mask]                   64-bit *xnack mask* register (an SP3 syntax).
    [xnack_mask_lo,xnack_mask_hi]  64-bit *xnack mask* register (an SP3 syntax).
    ============================== =====================================================

The high and low 32 bits of the *xnack mask* may be accessed as separate registers:

    ===================== ==============================================================
    Syntax                Description
    ===================== ==============================================================
    xnack_mask_lo         Low 32 bits of *xnack mask* register.
    xnack_mask_hi         High 32 bits of *xnack mask* register.
    [xnack_mask_lo]       Low 32 bits of *xnack mask* register (an SP3 syntax).
    [xnack_mask_hi]       High 32 bits of *xnack mask* register (an SP3 syntax).
    ===================== ==============================================================

.. _amdgpu_synid_vcc:
.. _amdgpu_synid_vcc_lo:

vcc
---

Vector condition code, 64 bits wide. A bit mask with one bit per thread;
it holds the result of a vector compare operation.

Note that GFX10 H/W does not use the high 32 bits of *vcc* in *wave32* mode.

    ================ =========================================================================
    Syntax           Description
    ================ =========================================================================
    vcc              64-bit *vector condition code* register.
    [vcc]            64-bit *vector condition code* register (an SP3 syntax).
    [vcc_lo,vcc_hi]  64-bit *vector condition code* register (an SP3 syntax).
    ================ =========================================================================

The high and low 32 bits of the *vector condition code* may be accessed as separate registers:

    ================ =========================================================================
    Syntax           Description
    ================ =========================================================================
    vcc_lo           Low 32 bits of *vector condition code* register.
    vcc_hi           High 32 bits of *vector condition code* register.
    [vcc_lo]         Low 32 bits of *vector condition code* register (an SP3 syntax).
    [vcc_hi]         High 32 bits of *vector condition code* register (an SP3 syntax).
    ================ =========================================================================

.. _amdgpu_synid_m0:

m0
--

A 32-bit memory register. It has various uses,
including register indexing and bounds checking.

    =========== ===================================================
    Syntax      Description
    =========== ===================================================
    m0          A 32-bit *memory* register.
    [m0]        A 32-bit *memory* register (an SP3 syntax).
    =========== ===================================================

.. _amdgpu_synid_exec:

exec
----

Execute mask, 64 bits wide. A bit mask with one bit per thread,
which is applied to vector instructions and controls which threads execute
and which ignore the instruction.

Note that GFX10 H/W does not use the high 32 bits of *exec* in *wave32* mode.

    ===================== =================================================================
    Syntax                Description
    ===================== =================================================================
    exec                  64-bit *execute mask* register.
    [exec]                64-bit *execute mask* register (an SP3 syntax).
    [exec_lo,exec_hi]     64-bit *execute mask* register (an SP3 syntax).
    ===================== =================================================================

The high and low 32 bits of the *execute mask* may be accessed as separate registers:

    ===================== =================================================================
    Syntax                Description
    ===================== =================================================================
    exec_lo               Low 32 bits of *execute mask* register.
    exec_hi               High 32 bits of *execute mask* register.
    [exec_lo]             Low 32 bits of *execute mask* register (an SP3 syntax).
    [exec_hi]             High 32 bits of *execute mask* register (an SP3 syntax).
    ===================== =================================================================

.. _amdgpu_synid_vccz:

vccz
----

A single-bit flag indicating that :ref:`vcc<amdgpu_synid_vcc>` is all zeros.

Note: when GFX10 operates in *wave32* mode, this register reflects the state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.

.. _amdgpu_synid_execz:

execz
-----

A single-bit flag indicating that :ref:`exec<amdgpu_synid_exec>` is all zeros.

Note: when GFX10 operates in *wave32* mode, this register reflects the state of :ref:`exec_lo<amdgpu_synid_exec>`.

.. _amdgpu_synid_scc:

scc
---

A single-bit flag indicating the result of a scalar compare operation.

.. _amdgpu_synid_lds_direct:

lds_direct
----------

A special operand that supplies a 32-bit value
fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.

.. _amdgpu_synid_null:

null
----

A special operand that may be used as a source or a destination.

When used as a destination, the result of the operation is discarded.

When used as a source, it supplies a zero value.

GFX10 only.

.. WARNING:: Due to an H/W bug, this operand cannot be used with VALU instructions in the first generation of GFX10.

.. _amdgpu_synid_constant:

inline constant
---------------

An *inline constant* is an integer or a floating-point value encoded as part of an instruction.
Compare *inline constants* with :ref:`literals<amdgpu_synid_literal>`.
Inline constants include:

* :ref:`iconst<amdgpu_synid_iconst>`
* :ref:`fconst<amdgpu_synid_fconst>`
* :ref:`ival<amdgpu_synid_ival>`

If a number may be encoded as either
a :ref:`literal<amdgpu_synid_literal>` or
a :ref:`constant<amdgpu_synid_constant>`,
the assembler selects the latter, more efficient encoding.

.. _amdgpu_synid_iconst:

iconst
~~~~~~

An :ref:`integer number<amdgpu_synid_integer_number>` or
an :ref:`absolute expression<amdgpu_synid_absolute_expression>`
encoded as an *inline constant*.

Only a small fraction of integer numbers may be encoded as *inline constants*.
They are enumerated in the table below.
Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.

    ================================== ====================================
    Value                              Note
    ================================== ====================================
    {0..64}                            Non-negative integer inline constants.
    {-16..-1}                          Negative integer inline constants.
    ================================== ====================================

.. WARNING:: GFX7 does not support inline constants for *f16* operands.

.. _amdgpu_synid_fconst:

fconst
~~~~~~

A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
encoded as an *inline constant*.

Only a small fraction of floating-point numbers may be encoded as *inline constants*.
They are enumerated in the table below.
Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.

    ===================== ===================================================== ==================
    Value                 Note                                                  Availability
    ===================== ===================================================== ==================
    0.0                   The same as integer constant 0.                       All GPUs
    0.5                   Floating-point constant 0.5.                          All GPUs
    1.0                   Floating-point constant 1.0.                          All GPUs
    2.0                   Floating-point constant 2.0.                          All GPUs
    4.0                   Floating-point constant 4.0.                          All GPUs
    -0.5                  Floating-point constant -0.5.                         All GPUs
    -1.0                  Floating-point constant -1.0.                         All GPUs
    -2.0                  Floating-point constant -2.0.                         All GPUs
    -4.0                  Floating-point constant -4.0.                         All GPUs
    0.1592                1.0/(2.0*pi). Use only for 16-bit operands.           GFX8, GFX9, GFX10
    0.15915494            1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8, GFX9, GFX10
    0.15915494309189532   1.0/(2.0*pi).                                         GFX8, GFX9, GFX10
    ===================== ===================================================== ==================

.. WARNING:: GFX7 does not support inline constants for *f16* operands.

.. _amdgpu_synid_ival:

ival
~~~~

A symbolic operand encoded as an *inline constant*.
These operands provide read-only access to H/W registers.

    ======================== ================================================ =============
    Syntax                   Note                                             Availability
    ======================== ================================================ =============
    shared_base              Base address of shared memory region.            GFX9, GFX10
    shared_limit             Address of the end of shared memory region.      GFX9, GFX10
    private_base             Base address of private memory region.           GFX9, GFX10
    private_limit            Address of the end of private memory region.     GFX9, GFX10
    pops_exiting_wave_id     A dedicated counter for POPS.                    GFX9, GFX10
    ======================== ================================================ =============

.. _amdgpu_synid_literal:

literal
-------

A *literal* is a 64-bit value that is encoded as a separate 32-bit dword in the instruction stream
after being converted to the operand type (see :ref:`Type and Size Conversion<amdgpu_synid_conv>`).
Compare *literals* with :ref:`inline constants<amdgpu_synid_constant>`.
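For integer operands, the choice between the two encodings can be sketched in a few lines of
Python. This is a simplified model, not the assembler's code: the helper names are hypothetical,
only the :ref:`iconst<amdgpu_synid_iconst>` ranges are considered, and the value is assumed to be
already :ref:`converted<amdgpu_synid_int_conv>` to the operand type.

.. code-block:: python

  # Hypothetical helpers modelling the integer inline-constant ranges
  # {0..64} and {-16..-1}; fconst and ival values are not covered.

  def is_integer_inline_constant(value: int) -> bool:
      """True if value is in {0..64} or {-16..-1}, i.e. in -16..64."""
      return -16 <= value <= 64


  def integer_operand_encoding(value: int) -> str:
      """Prefer the inline-constant encoding; otherwise use a literal."""
      if is_integer_inline_constant(value):
          return "inline constant"
      return "literal"


  if __name__ == "__main__":
      for v in (0, 64, 65, -16, -17):
          print(v, integer_operand_encoding(v))

In this model, ``65`` and ``-17`` fall outside both ranges and would need a literal dword.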
If a number may be encoded as either
a :ref:`literal<amdgpu_synid_literal>` or
an :ref:`inline constant<amdgpu_synid_constant>`,
the assembler selects the latter, more efficient encoding.

Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
:ref:`absolute expressions<amdgpu_synid_absolute_expression>` or
:ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`.

An instruction may use only one literal, but several operands may refer to the same literal.

.. _amdgpu_synid_uimm8:

uimm8
-----

An 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range 0..0xFF.

.. _amdgpu_synid_uimm32:

uimm32
------

A 32-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range 0..0xFFFFFFFF.

.. _amdgpu_synid_uimm20:

uimm20
------

A 20-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.

The value must be in the range 0..0xFFFFF.

.. _amdgpu_synid_uimm21:

uimm21
------

A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.

The value must be in the range 0..0x1FFFFF.

.. WARNING:: The assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.

.. _amdgpu_synid_simm21:

simm21
------

A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.

The value must be in the range -0x100000..0x0FFFFF.

.. WARNING:: The assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.

.. _amdgpu_synid_off:

off
---

A special entity that indicates that the value of this operand is not used.

    ================================== ===================================================
    Syntax                             Description
    ================================== ===================================================
    off                                Indicates an unused operand.
    ================================== ===================================================

.. _amdgpu_synid_number:

Numbers
=======

.. _amdgpu_synid_integer_number:

Integer Numbers
---------------

Integer numbers are 64 bits wide.
They are converted to the :ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_int_conv>`.

Integer numbers may be specified in binary, octal, hexadecimal and decimal formats:

    ============ =============================== ========
    Format       Syntax                          Example
    ============ =============================== ========
    Decimal      [-]?[1-9][0-9]*                 -1234
    Binary       [-]?0b[01]+                     0b1010
    Octal        [-]?0[0-7]+                     010
    Hexadecimal  [-]?0x[0-9a-fA-F]+              0xff
    \            [-]?[0x]?[0-9][0-9a-fA-F]*[hH]  0ffh
    ============ =============================== ========

.. _amdgpu_synid_floating-point_number:

Floating-Point Numbers
----------------------

All floating-point numbers are handled as doubles (64 bits wide).
They are converted to the
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_fp_conv>`.
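Python's ``float`` is also an IEEE 754 double, so it can show which 64-bit value a
floating-point operand starts from before any conversion. The sketch below parses the decimal
and hexadecimal formats described next and returns the bit pattern of the resulting double.
The helpers ``parse_fp`` and ``double_bits`` are hypothetical, and their format checks are a
simplification of the assembler's rules, not its actual lexer.

.. code-block:: python

  import struct


  def parse_fp(text: str) -> float:
      """Parse a decimal or hexadecimal floating-point number as a double.

      Simplified checks (an assumption): a decimal number must contain '.'
      or an exponent; a hexadecimal number must have a 'p'/'P' exponent.
      """
      lowered = text.lower()
      if lowered.lstrip("-").startswith("0x"):
          if "p" not in lowered:
              raise ValueError("hexadecimal form needs a 'p' exponent")
          return float.fromhex(text)
      if "." not in lowered and "e" not in lowered:
          raise ValueError("decimal form needs '.' or an exponent")
      return float(text)


  def double_bits(text: str) -> int:
      """Return the 64-bit IEEE 754 bit pattern of the parsed double."""
      return struct.unpack("<Q", struct.pack("<d", parse_fp(text)))[0]


  if __name__ == "__main__":
      for number in ("-1.234", "234e2", "-0x1afp-10", "0x.1afp10"):
          print(number, parse_fp(number), hex(double_bits(number)))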
Floating-point numbers may be specified in hexadecimal and decimal formats:

    ============ ======================================================== ====================== ====================
    Format       Syntax                                                   Examples               Note
    ============ ======================================================== ====================== ====================
    Decimal      [-]?[0-9]*[.]?[0-9]*([eE][+-]?[0-9]*)?                   -1.234, 234e2          Must include either a decimal separator or an exponent.
    Hexadecimal  [-]?0x[0-9a-fA-F]*([.][0-9a-fA-F]*)?[pP][+-]?[0-9]+      -0x1afp-10, 0x.1afp10
    ============ ======================================================== ====================== ====================

.. _amdgpu_synid_expression:

Expressions
===========

An expression is evaluated to a 64-bit integer.
Note that floating-point expressions are not supported.

There are two kinds of expressions:

* :ref:`Absolute<amdgpu_synid_absolute_expression>`.
* :ref:`Relocatable<amdgpu_synid_relocatable_expression>`.

.. _amdgpu_synid_absolute_expression:

Absolute Expressions
--------------------

The value of an absolute expression does not change after program relocation.
Absolute expressions must not include unassigned or relocatable values,
such as labels.

Absolute expressions are evaluated to 64-bit integer values and converted to the
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_int_conv>`.

Examples:

.. parsed-literal::

  x = -1
  y = x + 10

.. _amdgpu_synid_relocatable_expression:

Relocatable Expressions
-----------------------

The value of a relocatable expression depends on program relocation.

Note that the use of relocatable expressions is limited to branch targets
and 32-bit integer operands.
A relocatable expression is evaluated to a 64-bit integer value
that depends on the operand kind and the :ref:`relocation type<amdgpu-relocation-records>`
of the symbol(s) used in the expression. For example, if an instruction refers to a label,
this reference is evaluated to an offset from the address after the instruction
to the label address:

.. parsed-literal::

  label:
    v_add_co_u32_e32 v0, vcc, label, v1  // 'label' operand is evaluated to -4

Note that values of relocatable expressions are usually unknown at assembly time;
they are resolved later by a linker and converted to the
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_rl_conv>`.

Operands and Operations
-----------------------

Expressions are composed of 64-bit integer operands and operations.
Operands include :ref:`integer numbers<amdgpu_synid_integer_number>`
and :ref:`symbols<amdgpu_synid_symbol>`.

Expressions may also use ".", which is a reference to the current PC (program counter).

:ref:`Unary<amdgpu_synid_expression_un_op>` and :ref:`binary<amdgpu_synid_expression_bin_op>`
operations produce 64-bit integer results.

Syntax of Expressions
---------------------

The syntax of expressions is shown below::

    expr ::= expr binop expr | primaryexpr ;

    primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ;

    binop ::= '&&'
            | '||'
            | '|'
            | '^'
            | '&'
            | '!'
            | '=='
            | '!='
            | '<>'
            | '<'
            | '<='
            | '>'
            | '>='
            | '<<'
            | '>>'
            | '+'
            | '-'
            | '*'
            | '/'
            | '%' ;

    unop ::= '~'
           | '+'
           | '-'
           | '!' ;

.. _amdgpu_synid_expression_bin_op:

Binary Operators
----------------

Binary operators are described in the following table.
They operate on and produce 64-bit integers.
Operators with higher priority are performed first.

    ========== ========= ===============================================
    Operator   Priority  Meaning
    ========== ========= ===============================================
    \*         5         Integer multiplication.
    /          5         Integer division.
    %          5         Integer signed remainder.
    \+         4         Integer addition.
    \-         4         Integer subtraction.
    <<         3         Integer shift left.
    >>         3         Logical shift right.
    ==         2         Equality comparison.
    !=         2         Inequality comparison.
    <>         2         Inequality comparison.
    <          2         Signed less than comparison.
    <=         2         Signed less than or equal comparison.
    >          2         Signed greater than comparison.
    >=         2         Signed greater than or equal comparison.
    \|         1         Bitwise or.
    ^          1         Bitwise xor.
    &          1         Bitwise and.
    &&         0         Logical and.
    ||         0         Logical or.
    ========== ========= ===============================================

.. _amdgpu_synid_expression_un_op:

Unary Operators
---------------

Unary operators are described in the following table.
They operate on and produce 64-bit integers.

    ========== ===============================================
    Operator   Meaning
    ========== ===============================================
    !          Logical negation.
    ~          Bitwise negation.
    \+         Integer unary plus.
    \-         Integer unary minus.
    ========== ===============================================

.. _amdgpu_synid_symbol:

Symbols
-------

A symbol is a named 64-bit integer value, representing a relocatable
address or an absolute (non-relocatable) number.

Symbol names have the following syntax:
    ``[a-zA-Z_.][a-zA-Z0-9_$.@]*``

The table below provides several examples of syntax used for symbol definition.

    ================ ==========================================================
    Syntax           Meaning
    ================ ==========================================================
    .globl <S>       Declares a global symbol S without assigning it a value.
    .set <S>, <E>    Assigns the value of an expression E to a symbol S.
    <S> = <E>        Assigns the value of an expression E to a symbol S.
    <S>:             Declares a label S and assigns it the current PC value.
    ================ ==========================================================

A symbol may be used before it is declared or assigned;
unassigned symbols are assumed to be PC-relative.

Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.

.. _amdgpu_synid_conv:

Type and Size Conversion
========================

This section describes what happens when a 64-bit
:ref:`integer number<amdgpu_synid_integer_number>`, a
:ref:`floating-point number<amdgpu_synid_floating-point_number>` or an
:ref:`expression<amdgpu_synid_expression>`
is used for an operand which has a different type or size.

.. _amdgpu_synid_int_conv:

Conversion of Integer Values
----------------------------

Instruction operands may be specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`. These values are converted to
the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:

1. *Validation*. The assembler checks whether the input value can be truncated without loss
   to the required *truncation width* (see the table below). Truncation is allowed in two cases:

   * The truncated bits are all 0.
   * The truncated bits are all 1, and the value after truncation has its MSB set.

   In all other cases, the assembler triggers an error.

2. *Conversion*. The input value is converted to the expected type as described in the table below.
   Depending on operand kind, this conversion is performed by either the assembler or AMDGPU H/W (or both).
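
The truncation check in step 1 can be sketched as a small Python predicate. This is a
hypothetical model of the rule as stated above, not the assembler's implementation. It
assumes the operand is a Python ``int`` holding a 64-bit value, with negative numbers
reinterpreted as their 64-bit two's-complement pattern. The names ``MASK64`` and
``can_truncate`` are illustrative only. The *truncation width* is passed explicitly
(16 or 32, as listed in the table below).

.. code-block:: python

    MASK64 = (1 << 64) - 1  # hypothetical helper: all 64 bits set


    def can_truncate(value: int, width: int) -> bool:
        """Hypothetical model of step 1: may `value` be truncated to `width` bits?"""
        bits = value & MASK64                    # 64-bit two's-complement pattern
        kept = bits & ((1 << width) - 1)         # low `width` bits that survive
        dropped = bits >> width                  # high bits removed by truncation
        all_ones = (1 << (64 - width)) - 1       # value of `dropped` if all bits are 1
        if dropped == 0:
            return True                          # case 1: truncated bits are all 0
        msb_set = ((kept >> (width - 1)) & 1) == 1
        return dropped == all_ones and msb_set   # case 2: all 1 and result MSB set


    # 16-bit operand values taken from the GFX9 examples in this section.
    for value in (0xff00, 0xffffffffffffff00, -256, 0x1ff00, 0xffffffffffff00ff):
        print(hex(value & MASK64), can_truncate(value, 16))

Under these assumptions, the first three values are accepted and the last two are
rejected. This matches the enabled and disabled integer conversion examples for
``v_add_u16``.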

    ============== ================= =============== ====================================================================
    Expected type  Truncation Width  Conversion      Description
    ============== ================= =============== ====================================================================
    i16, u16, b16  16                num.u16         Truncate to 16 bits.
    i32, u32, b32  32                num.u32         Truncate to 32 bits.
    i64            32                {-1,num.i32}    Truncate to 32 bits and then sign-extend the result to 64 bits.
    u64, b64       32                {0,num.u32}     Truncate to 32 bits and then zero-extend the result to 64 bits.
    f16            16                num.u16         Use low 16 bits as an f16 value.
    f32            32                num.u32         Use low 32 bits as an f32 value.
    f64            32                {num.u32,0}     Use low 32 bits of the number as high 32 bits
                                                     of the result; low 32 bits of the result are zeroed.
    ============== ================= =============== ====================================================================

Examples of enabled conversions:

.. parsed-literal::

    // GFX9

    v_add_u16 v0, -1, 0                     // src0 = 0xFFFF
    v_add_f16 v0, -1, 0                     // src0 = 0xFFFF (NaN)
                                            //
    v_add_u32 v0, -1, 0                     // src0 = 0xFFFFFFFF
    v_add_f32 v0, -1, 0                     // src0 = 0xFFFFFFFF (NaN)
                                            //
    v_add_u16 v0, 0xff00, v0                // src0 = 0xff00
    v_add_u16 v0, 0xffffffffffffff00, v0    // src0 = 0xff00
    v_add_u16 v0, -256, v0                  // src0 = 0xff00
                                            //
    s_bfe_i64 s[0:1], 0xffefffff, s3        // src0 = 0xffffffffffefffff
    s_bfe_u64 s[0:1], 0xffefffff, s3        // src0 = 0x00000000ffefffff
    v_ceil_f64_e32 v[0:1], 0xffefffff       // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
                                            //
    x = 0xffefffff                          //
    s_bfe_i64 s[0:1], x, s3                 // src0 = 0xffffffffffefffff
    s_bfe_u64 s[0:1], x, s3                 // src0 = 0x00000000ffefffff
    v_ceil_f64_e32 v[0:1], x                // src0 = 0xffefffff00000000 (-1.7976922776554302e308)

Examples of disabled conversions:

.. parsed-literal::

    // GFX9

    v_add_u16 v0, 0x1ff00, v0               // truncated bits are not all 0 or 1
    v_add_u16 v0, 0xffffffffffff00ff, v0    // truncated bits do not match MSB of the result

.. _amdgpu_synid_fp_conv:

Conversion of Floating-Point Values
-----------------------------------

Instruction operands may be specified as 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
These values are converted to the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:

1. *Validation*. The assembler checks whether the input f64 number can be converted
   to the *required floating-point type* (see the table below) without overflow or underflow.
   Loss of precision is allowed. If this conversion is not possible, the assembler triggers an error.

2. *Conversion*. The input value is converted to the expected type as described in the table below.
   Depending on operand kind, this is performed by either the assembler or AMDGPU H/W (or both).

    ============== ================ ================= =================================================================
    Expected type  Required FP Type Conversion        Description
    ============== ================ ================= =================================================================
    i16, u16, b16  f16              f16(num)          Convert to f16 and use bits of the result as an integer value.
    i32, u32, b32  f32              f32(num)          Convert to f32 and use bits of the result as an integer value.
    i64, u64, b64  \-               \-                Conversion disabled.
    f16            f16              f16(num)          Convert to f16.
    f32            f32              f32(num)          Convert to f32.
    f64            f64              {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
                                                      zero-fill low 32 bits of the result.

                                                      Note that the result may differ from the original number.
    ============== ================ ================= =================================================================

Examples of enabled conversions:

.. parsed-literal::

    // GFX9

    v_add_f16 v0, 1.0, 0        // src0 = 0x3C00 (1.0)
    v_add_u16 v0, 1.0, 0        // src0 = 0x3C00
                                //
    v_add_f32 v0, 1.0, 0        // src0 = 0x3F800000 (1.0)
    v_add_u32 v0, 1.0, 0        // src0 = 0x3F800000

                                // src0 before conversion:
                                // 1.7976931348623157e308 = 0x7fefffffffffffff
                                // src0 after conversion:
                                // 1.7976922776554302e308 = 0x7fefffff00000000
    v_ceil_f64 v[0:1], 1.7976931348623157e308

    v_add_f16 v1, 65500.0, v2   // ok for f16.
    v_add_f32 v1, 65600.0, v2   // ok for f32, but would result in overflow for f16.

Examples of disabled conversions:

.. parsed-literal::

    // GFX9

    v_add_f16 v1, 65600.0, v2   // overflow

.. _amdgpu_synid_rl_conv:

Conversion of Relocatable Values
--------------------------------

:ref:`Relocatable expressions<amdgpu_synid_relocatable_expression>`
may be used with 32-bit integer operands and jump targets.

When the value of a relocatable expression is resolved by a linker, it is
converted as needed and truncated to the operand size. The conversion depends
on the :ref:`relocation type<amdgpu-relocation-records>` and operand kind.

For example, when a 32-bit operand of an instruction refers to a relocatable expression *expr*,
this reference is evaluated to a 64-bit offset from the address after the
instruction to the address being referenced, *counted in bytes*.
Then the value is truncated to 32 bits and encoded as a literal:

.. parsed-literal::

    expr = .
    v_add_co_u32_e32 v0, vcc, expr, v1  // 'expr' operand is evaluated to -4
                                        // and then truncated to 0xFFFFFFFC

As another example, when a branch instruction refers to a label,
this reference is evaluated to an offset from the address after the
instruction to the label address, *counted in dwords*.
Then the value is truncated to 16 bits:

.. parsed-literal::

    label:
    s_branch label  // 'label' operand is evaluated to -1 and truncated to 0xFFFF

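
The arithmetic behind the two examples above can be sketched in Python. This is only a
hypothetical model under stated assumptions. The helper names ``literal_offset`` and
``branch_offset`` are illustrative. The caller supplies the reference address (the
"address after the instruction") explicitly. In practice, the value is computed by the
linker according to the relocation type.

.. code-block:: python

    def literal_offset(target: int, after: int) -> int:
        """Hypothetical model: byte offset to `target`, truncated to a 32-bit literal."""
        delta = target - after            # signed offset, counted in bytes
        return delta & 0xFFFFFFFF         # keep the low 32 bits


    def branch_offset(target: int, after: int) -> int:
        """Hypothetical model: dword offset to `target`, truncated to 16 bits."""
        delta = target - after            # signed offset, counted in bytes
        if delta % 4 != 0:
            raise ValueError("branch offset is assumed to be a whole number of dwords")
        return (delta // 4) & 0xFFFF      # convert to dwords, keep the low 16 bits


    # s_branch is a single 4-byte instruction, so a branch to itself at 0x100
    # is measured from 0x104: -4 bytes, i.e. -1 dword.
    print(hex(branch_offset(0x100, 0x104)))   # prints 0xffff

    # A byte offset of -4 truncated to 32 bits.
    print(hex(literal_offset(0x100, 0x104)))  # prints 0xfffffffc

Truncation is modeled with a bit mask, so a negative offset becomes its
two's-complement pattern, as in the ``0xFFFF`` and ``0xFFFFFFFC`` values above.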