# Allow Location Descriptions on the DWARF Expression Stack <!-- omit in toc -->

- [Extension](#extension)
- [Heterogeneous Computing Devices](#heterogeneous-computing-devices)
- [DWARF 5](#dwarf-5)
  - [What is DWARF?](#what-is-dwarf)
  - [Examples](#examples)
    - [Dynamic Array Size](#dynamic-array-size)
    - [Variable Location in Register](#variable-location-in-register)
    - [Variable Location in Memory](#variable-location-in-memory)
    - [Variable Spread Across Different Locations](#variable-spread-across-different-locations)
    - [Offsetting a Composite Location](#offsetting-a-composite-location)
  - [Limitations](#limitations)
- [Extension Solution](#extension-solution)
  - [Location Description](#location-description)
  - [Stack Location Description Operations](#stack-location-description-operations)
  - [Examples](#examples-1)
    - [Source Language Variable Spilled to Part of a Vector Register](#source-language-variable-spilled-to-part-of-a-vector-register)
    - [Source Language Variable Spread Across Multiple Vector Registers](#source-language-variable-spread-across-multiple-vector-registers)
    - [Source Language Variable Spread Across Multiple Kinds of Locations](#source-language-variable-spread-across-multiple-kinds-of-locations)
    - [Address Spaces](#address-spaces)
    - [Bit Offsets](#bit-offsets)
  - [Call Frame Information (CFI)](#call-frame-information-cfi)
  - [Objects Not In Byte Aligned Global Memory](#objects-not-in-byte-aligned-global-memory)
  - [Higher Order Operations](#higher-order-operations)
  - [Objects In Multiple Places](#objects-in-multiple-places)
- [Conclusion](#conclusion)
- [Further Information](#further-information)

# Extension

This extension is to generalize the DWARF expression evaluation model to allow
location descriptions to be manipulated on the stack. It is done in a manner
that is backwards compatible with DWARF 5.
This permits operations to act on
location descriptions in an incremental, consistent, and composable manner.

It allows a small number of operations to be defined to address the requirements
of heterogeneous devices as well as providing benefits to non-heterogeneous
devices. It also acts as a foundation to provide support for other issues that
have been raised that would benefit all devices.

Other approaches were explored that involved adding specialized operations and
rules. However, these resulted in the need for more operations that did not
compose. They also resulted in operations with context sensitive semantics and
corner cases that had to be defined. The observation was that numerous
specialized context sensitive operations are harder for both producers and
consumers than a smaller number of general composable operations that have
consistent semantics regardless of context.

The following sections first describe heterogeneous devices and the features
they have that are not addressed by DWARF 5. Then a brief simplified overview of
the DWARF 5 expression evaluation model is presented that highlights the
difficulties for supporting the heterogeneous features. Finally, an overview of
the extension is presented, using simplified examples to illustrate how it can
address the issues of heterogeneous devices and also benefit non-heterogeneous
devices. References to further information are provided.

# Heterogeneous Computing Devices

GPUs and other heterogeneous computing devices have features not common to CPU
computing devices.

These devices often have many more registers than a CPU. This helps reduce
memory accesses which tend to be more expensive than on a CPU due to the much
larger number of threads concurrently executing. In addition to traditional
scalar registers of a CPU, these devices often have many wide vector registers.

They may support masked vector instructions that are used by the compiler to map
high level language threads onto the lanes of the vector registers. As a
consequence, multiple language threads execute in lockstep as the vector
instructions are executed. This is termed single instruction multiple thread
(SIMT) execution.

GPUs can have multiple memory address spaces in addition to the single global
memory address space of a CPU. These additional address spaces are accessed
using distinct instructions and are often local to a particular thread or group
of threads.

For example, a GPU may have a per thread block address space that is implemented
as scratch pad memory with explicit hardware support to isolate portions to
specific groups of threads created as a single thread block.

A GPU may also use global memory in a non linear manner. For example, to support
providing a SIMT per lane address space efficiently, there may be instructions
that support interleaved access.

Through optimization, the source variables may be located across these different
storage kinds. SIMT execution requires locations to be able to express selection
of runtime defined pieces of vector registers. With the more complex locations,
there is a benefit to be able to factorize their calculation which requires all
location kinds to be supported uniformly, otherwise duplication is necessary.

# DWARF 5

Before presenting the proposed solution to supporting heterogeneous devices, a
brief overview of the DWARF 5 expression evaluation model will be given to
highlight the aspects being addressed by the extension.

## What is DWARF?

DWARF is a standardized way to specify debug information. It describes source
language entities such as compilation units, functions, types, variables, etc.
It is either embedded directly in sections of the code object executables, or
split into separate files that they reference.

DWARF maps between source program language entities and their hardware
representations. For example:

- It maps a hardware instruction program counter to a source language program
  line, and vice versa.
- It maps a source language function to the hardware instruction program counter
  for its entry point.
- It maps a source language variable to its hardware location when at a
  particular program counter.
- It provides information to allow virtual unwinding of hardware registers for a
  source language function call stack.
- In addition, it provides a wide range of other information about the source
  language program.

In particular, there is great diversity in the way a source language entity
could be mapped to a hardware location. The location may involve runtime values.
For example, a source language variable location could be:

- In a register.
- At a memory address.
- At an offset from the current stack pointer.
- Optimized away, but with a known compile time value.
- Optimized away, but with an unknown value, such as happens for unused
  variables.
- Spread across a combination of the above kinds of locations.
- At a memory address, but also transiently loaded into registers.

To support this DWARF 5 defines a rich expression language comprised of loclist
expressions and operation expressions. Loclist expressions allow the result to
vary depending on the PC. Operation expressions are made up of a list of
operations that are evaluated on a simple stack machine.

A DWARF expression can be used as the value of different attributes of different
debug information entries (DIE). A DWARF expression can also be used as an
argument to call frame information (CFI) entry operations.
An
expression is evaluated in a context dictated by where it is used. The context
may include:

- Whether the expression needs to produce a value or the location of an entity.
- The current execution point including process, thread, PC, and stack frame.
- Some expressions are evaluated with the stack initialized with a specific
  value or with the location of a base object that is available using the
  DW_OP_push_object_address operation.

## Examples

The following examples illustrate how DWARF expressions involving operations are
evaluated in DWARF 5. DWARF also has expressions involving location lists that
are not covered in these examples.

### Dynamic Array Size

The first example is for an operation expression associated with a DIE attribute
that provides the number of elements in a dynamic array type. Such an attribute
dictates that the expression must be evaluated in the context of providing a
value result kind.

In this hypothetical example, the compiler has allocated an array descriptor in
memory and placed the descriptor's address in architecture register SGPR0. The
first location of the array descriptor is the runtime size of the array.

A possible expression to retrieve the dynamic size of the array is:

    DW_OP_regval_type SGPR0 Generic
    DW_OP_deref

The expression is evaluated one operation at a time. Operations have operands
and can pop and push entries on a stack.

The expression evaluation starts with the first DW_OP_regval_type operation.
This operation reads the current value of an architecture register specified by
its first operand: SGPR0. The second operand specifies the type, and therefore
the size, of the data to read. The read value is pushed on the stack. Each stack
element is a value and its associated type.

The type must be a DWARF base type. It specifies the encoding, byte ordering,
and size of values of the type.
DWARF defines that each architecture has a
default generic type: it is an architecture specific integral encoding and byte
ordering, that is the size of the architecture's global memory address.

The DW_OP_deref operation pops a value off the stack, treats it as a global
memory address, and reads the contents of that location using the generic type.
It pushes the read value on the stack as the value and its associated generic
type.

The evaluation stops when it reaches the end of the expression. The result of an
expression that is evaluated with a value result kind context is the top element
of the stack, which provides the value and its type.

### Variable Location in Register

This example is for an operation expression associated with a DIE attribute that
provides the location of a source language variable. Such an attribute dictates
that the expression must be evaluated in the context of providing a location
result kind.

DWARF defines the locations of objects in terms of location descriptions.

In this example, the compiler has allocated a source language variable in
architecture register SGPR0.

A possible expression to specify the location of the variable is:

    DW_OP_regx SGPR0

The DW_OP_regx operation creates a location description that specifies the
location of the architecture register specified by the operand: SGPR0. Unlike
values, location descriptions are not pushed on the stack. Instead they are
conceptually placed in a location area. Unlike values, location descriptions do
not have an associated type, they only denote the location of the base of the
object.

Again, evaluation stops when it reaches the end of the expression. The result of
an expression that is evaluated with a location result kind context is the
location description in the location area.
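
As a rough illustration of the DWARF 5 model used in the two examples above, the
following Python sketch keeps values on a stack and places location descriptions
in a separate location area. The tuple encoding of operations, the
`evaluate_dwarf5` name, and the register and memory contents are assumptions
made for illustration only; they are not defined by DWARF.

```python
def evaluate_dwarf5(ops, registers, memory):
    """Toy evaluator for the three operations used in the examples above."""
    stack = []            # entries are (value, type) pairs
    location_area = None  # location descriptions live here, not on the stack
    for op, *operands in ops:
        if op == "DW_OP_regval_type":
            reg, type_name = operands
            stack.append((registers[reg], type_name))
        elif op == "DW_OP_deref":
            address, _ = stack.pop()  # treated as a global memory address
            stack.append((memory[address], "Generic"))
        elif op == "DW_OP_regx":
            (reg,) = operands
            location_area = ("register", reg)
        else:
            raise NotImplementedError(op)
    return stack, location_area


# Hypothetical machine state: SGPR0 points at an array descriptor whose first
# field (the element count) is 42.
registers = {"SGPR0": 0x1000}
memory = {0x1000: 42}

# Value result kind: the result is the top of the stack.
size_stack, _ = evaluate_dwarf5(
    [("DW_OP_regval_type", "SGPR0", "Generic"), ("DW_OP_deref",)],
    registers, memory)
array_size = size_stack[-1]

# Location result kind: the result is whatever is in the location area.
_, variable_location = evaluate_dwarf5([("DW_OP_regx", "SGPR0")],
                                       registers, memory)
```

Note that the two result kinds come out of different places in this sketch,
which is the split between stack values and location descriptions that the
later sections revisit.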

### Variable Location in Memory

The next example is for an operation expression associated with a DIE attribute
that provides the location of a source language variable that is allocated in a
stack frame. The compiler has placed the stack frame pointer in architecture
register SGPR0, and allocated the variable at offset 0x10 from the stack frame
base. The stack frames are allocated in global memory, so SGPR0 contains a
global memory address.

A possible expression to specify the location of the variable is:

    DW_OP_regval_type SGPR0 Generic
    DW_OP_plus_uconst 0x10

As in the previous example, the DW_OP_regval_type operation pushes the stack
frame pointer global memory address onto the stack. The generic type is the size
of a global memory address.

The DW_OP_plus_uconst operation pops a value from the stack, which must have a
type with an integral encoding, adds the value of its operand, and pushes the
result back on the stack with the same associated type. In this example, that
computes the global memory address of the source language variable.

Evaluation stops when it reaches the end of the expression. If the expression
that is evaluated has a location result kind context, and the location area is
empty, then the top stack element must be a value with the generic type. The
value is implicitly popped from the stack, and treated as a global memory
address to create a global memory location description, which is placed in the
location area. The result of the expression is the location description in the
location area.

### Variable Spread Across Different Locations

This example is for a source variable that is partly in a register, partly
undefined, and partly in memory.

DWARF defines composite location descriptions that can have one or more parts.
Each part specifies a location description and the number of bytes used from it.
The following operation expression creates a composite location description.

    DW_OP_regx SGPR3
    DW_OP_piece 4
    DW_OP_piece 2
    DW_OP_bregx SGPR0 0x10
    DW_OP_piece 2

The DW_OP_regx operation creates a register location description in the location
area.

The first DW_OP_piece operation creates an incomplete composite location
description in the location area with a single part. The location description in
the location area is used to define the beginning of the part for the size
specified by the operand, namely 4 bytes.

A subsequent DW_OP_piece adds a new part to an incomplete composite location
description already in the location area. The parts form a contiguous set of
bytes. If there are no other location descriptions in the location area, and no
value on the stack, then the part implicitly uses the undefined location
description. Again, the operand specifies the size of the part in bytes. The
undefined location description can be used to indicate a part that has been
optimized away. In this case, the part is 2 bytes of undefined value.

The DW_OP_bregx operation reads the architecture register specified by the first
operand (SGPR0) as the generic type, adds the value of the second operand
(0x10), and pushes the value on the stack.

The next DW_OP_piece operation adds another part to the already created
incomplete composite location.

If there is no other location in the location area, but there is a value on the
stack, the new part is a memory location description. The memory address used is
popped from the stack. In this case, the operand of 2 indicates there are 2
bytes from memory.

Evaluation stops when it reaches the end of the expression.
If the expression
that is evaluated has a location result kind context, and the location area has
an incomplete composite location description, the incomplete composite location
is implicitly converted to a complete composite location description. The result
of the expression is the location description in the location area.

### Offsetting a Composite Location

This example attempts to extend the previous example to offset the composite
location description it created. The *Variable Location in Memory* example
conveniently used the DW_OP_plus_uconst operation to offset a memory address.

    DW_OP_regx SGPR3
    DW_OP_piece 4
    DW_OP_piece 2
    DW_OP_bregx SGPR0 0x10
    DW_OP_piece 2
    DW_OP_plus_uconst 5

However, DW_OP_plus_uconst cannot be used to offset a composite location. It
only operates on the stack.

To offset a composite location description, the compiler would need to make a
different composite location description, starting at the part corresponding to
the offset. For example:

    DW_OP_piece 1
    DW_OP_bregx SGPR0 0x10
    DW_OP_piece 2

This illustrates that operations on stack values are not composable with
operations on location descriptions.

## Limitations

DWARF 5 is unable to describe variables in runtime indexed parts of registers.
This is required to describe a source variable that is located in a lane of a
SIMT vector register.

Some features only work when located in global memory. The type attribute
expressions require a base object which could be in any kind of location.

DWARF procedures can only accept global memory address arguments. This limits
the ability to factorize the creation of locations that involve other location
kinds.

There are no vector base types. This is required to describe vector registers.

There is no operation to create a memory location in a non-global address space.
Only the dereference operation supports providing an address space.

CFI location expressions do not allow composite locations or non-global address
space memory locations. Both these are needed in optimized code for devices with
vector registers and address spaces.

Bit field offsets are only supported in a limited way for register locations.
Supporting them in a uniform manner for all location kinds is required to
support languages with bit sized entities.

# Extension Solution

This section outlines the extension to generalize the DWARF expression evaluation
model to allow location descriptions to be manipulated on the stack. It presents
a number of simplified examples to demonstrate the benefits and how the extension
solves the issues of heterogeneous devices. It presents how this is done in
a manner that is backwards compatible with DWARF 5.

## Location Description

In order to have consistent, composable operations that act on location
descriptions, the extension defines a uniform way to handle all location kinds.
That includes memory, register, implicit, implicit pointer, undefined, and
composite location descriptions.

Each kind of location description is conceptually a zero-based offset within a
piece of storage. The storage is a contiguous linear organization of a certain
number of bytes (see below for how this is extended to support bit sized
storage).

- For global memory, the storage is the linear stream of bytes of the
  architecture's address size.
- For each separate architecture register, it is the linear stream of bytes of
  the size of that specific register.
- For an implicit, it is the linear stream of bytes of the value when
  represented using the value's base type which specifies the encoding, size,
  and byte ordering.
- For undefined, it is an infinitely sized linear stream where every byte is
  undefined.
- For composite, it is a linear stream of bytes defined by the composite's parts.

## Stack Location Description Operations

The DWARF expression stack is extended to allow each stack entry to either be a
value or a location description.

Evaluation rules are defined to implicitly convert a stack element that is a
value to a location description, or vice versa, so that all DWARF 5 expressions
continue to have the same semantics. This reflects that a memory address is
effectively used as a proxy for a memory location description.

For each place that allows a DWARF expression to be specified, it is defined if
the expression is to be evaluated as a value or a location description.

Existing DWARF expression operations that are used to act on memory addresses
are generalized to act on any location description kind. For example, the
DW_OP_deref operation pops a location description rather than a memory address
value from the stack and reads the storage associated with the location kind
starting at the location description's offset.

Existing DWARF expression operations that create location descriptions are
changed to pop and push location descriptions on the stack. For example, the
DW_OP_value, DW_OP_regx, DW_OP_implicit_value, DW_OP_implicit_pointer,
DW_OP_stack_value, and DW_OP_piece.

New operations that act on location descriptions can be added. For example, a
DW_OP_offset operation that modifies the offset of the location description on
top of the stack. Unlike the DW_OP_plus operation that only works with memory
addresses, a DW_OP_offset operation can work with any location kind.

To allow incremental and nested creation of composite location descriptions, a
DW_OP_piece_end can be defined to explicitly indicate the last part of a
composite. Currently, creating a composite must always be the last operation of
an expression.
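
The idea that every location description is an offset into some storage, and
that the generalized DW_OP_deref and DW_OP_offset operations described above
therefore work the same way for every kind, can be sketched in Python as
follows. The `Location` class, the function names, the storage contents, and
the little-endian byte order are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Location:
    """A location description: a byte offset into some piece of storage."""
    kind: str        # "memory", "register", "implicit", ...
    storage: bytes   # hypothetical snapshot of the storage contents
    offset: int = 0  # zero-based offset within the storage


def op_offset(loc: Location, delta: int) -> Location:
    """Sketch of DW_OP_offset: adjust the offset, whatever the location kind."""
    return replace(loc, offset=loc.offset + delta)


def op_deref(loc: Location, size: int) -> int:
    """Sketch of the generalized DW_OP_deref: read from any kind of storage."""
    data = loc.storage[loc.offset:loc.offset + size]
    return int.from_bytes(data, "little")  # byte order is an assumption


register = Location("register", bytes(range(16)))   # made-up 16 byte register
memory = Location("memory", bytes([0xAA] * 32), 8)  # made-up memory, offset 8

register_value = op_deref(op_offset(register, 4), 4)  # bytes 4..7 of register
memory_value = op_deref(op_offset(memory, 2), 1)      # byte at offset 10
```

Neither function inspects `kind`, which is the point of the uniform model: the
same operation composes with register, memory, and other location kinds.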

A DW_OP_undefined operation can be defined that explicitly creates the undefined
location description. Currently this is only possible as a piece of a composite
when the stack is empty.

## Examples

This section provides some motivating examples to illustrate the benefits that
result from allowing location descriptions on the stack.

### Source Language Variable Spilled to Part of a Vector Register

A compiler generating code for a GPU may allocate a source language variable
that it proves has the same value for every lane of a SIMT thread in a scalar
register. It may then need to spill that scalar register. To avoid the high cost
of spilling to memory, it may spill to a fixed lane of one of the numerous
vector registers.

The following expression defines the location of a source language variable that
the compiler allocated in a scalar register, but had to spill to lane 5 of a
vector register at this point of the code.

    DW_OP_regx VGPR0
    DW_OP_offset_uconst 20

The DW_OP_regx pushes a register location description on the stack. The storage
for the register is the size of the vector register. The register location
description conceptually references that storage with an initial offset of 0.
The architecture defines the byte ordering of the register.

The DW_OP_offset_uconst pops a location description off the stack, adds its
operand value to the offset, and pushes the updated location description back on
the stack. In this case the source language variable is being spilled to lane 5
and each lane's component is 32 bits (4 bytes), so the offset is 5*4=20.

The result of the expression evaluation is the location description on the top
of the stack.

An alternative approach could be for the target to define distinct register
names for each part of each vector register.
However, this is not practical for
GPUs due to the sheer number of registers that would have to be defined. It
would also not permit a runtime index into part of the whole register to be used
as shown in the next example.

### Source Language Variable Spread Across Multiple Vector Registers

A compiler may generate SIMT code for a GPU. Each source language thread of
execution is mapped to a single lane of the GPU thread. Source language
variables that are mapped to a register are mapped to the lane component of the
vector registers corresponding to the source language's thread of execution.

The location expression for such variables must therefore be executed in the
context of the focused source language thread of execution. A DW_OP_push_lane
operation can be defined to push the value of the lane for the currently focused
source language thread of execution. The value to use would be provided by the
consumer of DWARF when it evaluates the location expression.

If the source language variable is larger than the size of the vector register
lane component, then multiple vector registers are used. Each source language
thread of execution will only use the vector register components for its
associated lane.

The following expression defines the location of a source language variable that
has to occupy two vector registers. A composite location description is created
that combines the two parts. It will give the correct result regardless of which
lane corresponds to the source language thread of execution that the user is
focused on.

    DW_OP_regx VGPR0
    DW_OP_push_lane
    DW_OP_uconst 4
    DW_OP_mul
    DW_OP_offset
    DW_OP_piece 4
    DW_OP_regx VGPR1
    DW_OP_push_lane
    DW_OP_uconst 4
    DW_OP_mul
    DW_OP_offset
    DW_OP_piece 4

The DW_OP_regx VGPR0 pushes a location description for the first register.

The DW_OP_push_lane; DW_OP_uconst 4; DW_OP_mul calculates the offset for the
focused lane's vector register component as 4 times the lane number.

The DW_OP_offset adjusts the register location description's offset to the
runtime computed value.

The DW_OP_piece either creates a new composite location description, or adds a
new part to an existing incomplete one. It pops the location description to use
for the new part. It then pops the next stack element if it is an incomplete
composite location description, otherwise it creates a new incomplete composite
location description with no parts. Finally it pushes the incomplete composite
after adding the new part.

In this case a register location description is added to a new incomplete
composite location description. The 4 of the DW_OP_piece specifies the size of
the register storage that comprises the part. Note that the 4 bytes start at the
computed register offset.

For backwards compatibility, if the stack is empty or the top stack element is
an incomplete composite, an undefined location description is used for the part.
If the top stack element is a generic base type value, then it is implicitly
converted to a global memory location description with an offset equal to the
value.

The rest of the expression does the same for VGPR1. However, when the
DW_OP_piece is evaluated there is an incomplete composite on the stack. So the
VGPR1 register location description is added as a second part.

At the end of the expression, if the top stack element is an incomplete
composite location description, it is converted to a complete location
description and returned as the result.
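
The walkthrough above can be approximated with a small Python sketch in which
stack entries are either integers or location descriptions. The class names, the
tuple encoding of operations, and the `evaluate` function are hypothetical, and
the backwards compatibility cases of DW_OP_piece (empty stack, generic value on
top of the stack) are deliberately left out to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterLocation:
    register: str
    offset: int = 0  # byte offset into the register's storage


@dataclass
class Composite:
    parts: list = field(default_factory=list)  # (location, size in bytes)
    complete: bool = False


def evaluate(ops, lane):
    """Evaluate the operations used in the example for the focused lane."""
    stack = []
    for op, *operands in ops:
        if op == "DW_OP_regx":
            stack.append(RegisterLocation(operands[0]))
        elif op == "DW_OP_push_lane":
            stack.append(lane)  # supplied by the consumer
        elif op == "DW_OP_uconst":
            stack.append(operands[0])
        elif op == "DW_OP_mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        elif op == "DW_OP_offset":
            delta, loc = stack.pop(), stack.pop()
            stack.append(RegisterLocation(loc.register, loc.offset + delta))
        elif op == "DW_OP_piece":
            part = stack.pop()
            # Reuse an incomplete composite if one is next on the stack.
            if stack and isinstance(stack[-1], Composite) and not stack[-1].complete:
                composite = stack.pop()
            else:
                composite = Composite()
            composite.parts.append((part, operands[0]))
            stack.append(composite)
        else:
            raise NotImplementedError(op)
    result = stack[-1]
    if isinstance(result, Composite):
        result.complete = True  # implicit completion at end of expression
    return result


def lane_part(register):
    """The six operations that select 4 bytes of `register` for the lane."""
    return [("DW_OP_regx", register), ("DW_OP_push_lane",), ("DW_OP_uconst", 4),
            ("DW_OP_mul",), ("DW_OP_offset",), ("DW_OP_piece", 4)]


location = evaluate(lane_part("VGPR0") + lane_part("VGPR1"), lane=3)
```

For lane 3 the result in this sketch is a complete composite with two 4 byte
parts, at byte offset 12 of VGPR0 and of VGPR1.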

### Source Language Variable Spread Across Multiple Kinds of Locations

This example is the same as the previous one, except the first 2 bytes of the
second vector register have been spilled to memory, and the last 2 bytes have
been proven to be a constant and optimized away.

    DW_OP_regx VGPR0
    DW_OP_push_lane
    DW_OP_uconst 4
    DW_OP_mul
    DW_OP_offset
    DW_OP_piece 4
    DW_OP_addr 0xbeef
    DW_OP_piece 2
    DW_OP_uconst 0xf00d
    DW_OP_stack_value
    DW_OP_piece 2
    DW_OP_piece_end

The first 6 operations are the same.

The DW_OP_addr operation pushes a global memory location description on the
stack with an offset equal to the address.

The next DW_OP_piece adds the global memory location description as the next 2
byte part of the composite.

The DW_OP_uconst 0xf00d; DW_OP_stack_value pushes an implicit location
description on the stack. The storage of the implicit location description is
the representation of the value 0xf00d using the generic base type's encoding,
size, and byte ordering.

The final DW_OP_piece adds 2 bytes of the implicit location description as the
third part of the composite location description.

The DW_OP_piece_end operation explicitly makes the incomplete composite location
description into a complete location description. This allows a complete
composite location description to be created on the stack that can be used as
the location description of another following operation. For example, the
DW_OP_offset can be applied to it. More practically, it permits creation of
multiple composite location descriptions on the stack which can be used to pass
arguments to a DWARF procedure using a DW_OP_call* operation. This can be
beneficial to factor the incremental creation of location descriptions.
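
To make the storage view of this composite concrete, the following Python sketch
shows how a consumer might gather the 8 bytes of the variable from the three
parts. It assumes a little-endian target with an 8 byte generic type, and the
register and memory contents are invented; the helper names are hypothetical.

```python
def implicit_storage(value, size=8, byteorder="little"):
    """Storage of an implicit location: the value in the generic type's
    representation (8 byte little-endian is an assumption)."""
    return value.to_bytes(size, byteorder)


def read_composite(parts):
    """Concatenate `size` bytes from each part's storage, in order."""
    data = bytearray()
    for storage, offset, size in parts:
        data += storage[offset:offset + size]
    return bytes(data)


lane = 5                          # focused lane, chosen arbitrarily
vgpr0 = bytes(range(256))         # made-up contents: 64 lanes of 4 bytes
global_memory = bytearray(0x10000)
global_memory[0xBEEF:0xBEF1] = b"\x11\x22"  # made-up spilled bytes

parts = [
    (vgpr0, lane * 4, 4),                # register part at the lane's offset
    (bytes(global_memory), 0xBEEF, 2),   # memory part from DW_OP_addr 0xbeef
    (implicit_storage(0xF00D), 0, 2),    # first 2 bytes of the implicit value
]
variable_bytes = read_composite(parts)
```

Every part is treated the same way (a storage, an offset, and a size), which is
what lets DW_OP_piece accept any kind of location description.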

### Address Spaces

Heterogeneous devices can have multiple hardware supported address spaces which
use specific hardware instructions to access them.

For example, GPUs that use SIMT execution may provide hardware support to access
memory such that each lane can see a linear memory view, while the backing
memory is actually being accessed in an interleaved manner so that the locations
for each lane's Nth dword are contiguous. This minimizes cache lines read by the
SIMT execution.

The following expression defines the location of a source language variable that
is allocated at offset 0x10 in the current subprogram's stack frame. The
subprogram stack frames are per lane and reside in an interleaved address space.

    DW_OP_regval_type SGPR0 Generic
    DW_OP_uconst 1
    DW_OP_form_aspace_address
    DW_OP_offset 0x10

The DW_OP_regval_type operation pushes the contents of SGPR0 as a generic value.
This is the register that holds the address of the current stack frame.

The DW_OP_uconst operation pushes the address space number. Each architecture
defines the numbers it uses in DWARF. In this case, address space 1 is being
used as the per lane memory.

The DW_OP_form_aspace_address operation pops a value and an address space
number. Each address space is associated with a separate storage. A memory
location description is pushed which refers to the address space's storage, with
an offset of the popped value.

All operations that act on location descriptions work with memory locations
regardless of their address space.

Every architecture defines address space 0 as the default global memory address
space.

Generalizing memory location descriptions to include an address space component
avoids having to create specialized operations to work with address spaces.
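
The interleaved per lane view mentioned at the start of this section can be
pictured with a short arithmetic sketch. The mapping below is one plausible
layout, assuming 64 lanes and 4 byte dwords; it is an illustration of the idea
that each lane's Nth dword is contiguous with the other lanes' Nth dword, not a
description of any particular hardware.

```python
DWORD_SIZE = 4  # bytes per dword
NUM_LANES = 64  # assumed number of SIMT lanes


def interleaved_address(base, lane, lane_offset):
    """Map a lane's linear byte offset to a backing memory address.

    `base` is the start of the interleaved region, `lane_offset` is the
    offset as seen in the lane's linear view (hypothetical layout).
    """
    dword_index, byte_in_dword = divmod(lane_offset, DWORD_SIZE)
    return (base
            + dword_index * DWORD_SIZE * NUM_LANES  # skip whole dword rows
            + lane * DWORD_SIZE                     # this lane's slot in the row
            + byte_in_dword)


# The variable at offset 0x10 of the per lane stack frame, for lanes 0 and 1.
lane0_address = interleaved_address(0, lane=0, lane_offset=0x10)
lane1_address = interleaved_address(0, lane=1, lane_offset=0x10)
```

In this layout the same frame offset for adjacent lanes lands 4 bytes apart in
the backing memory, so a SIMT access to one variable touches one contiguous run
of bytes. A consumer would hide this mapping behind the address space's storage,
so the expression itself only deals with the linear offset.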

The source variable is at offset 0x10 in the stack frame. The DW_OP_offset
operation works on memory location descriptions that have an address space just
like for any other kind of location description.

The only operations in DWARF 5 that take an address space are DW_OP_xderef*.
They treat a value as the address in a specified address space, and read its
contents. There is no operation to actually create a location description that
references an address space. There is no way to include address space memory
locations in parts of composite locations.

Since DW_OP_piece now takes any kind of location description for its pieces, it
is now possible for parts of a composite to involve locations in different
address spaces. For example, this can happen when parts of a source variable
allocated in a register are spilled to a stack frame that resides in the
non-global address space.

### Bit Offsets

With the generalization of location descriptions on the stack, it is possible to
define a DW_OP_bit_offset operation that adjusts the offset of any kind of
location in terms of bits rather than bytes. The offset can be a runtime
computed value. This is generally useful for any source language that supports
bit sized entities, and for registers that are not a whole number of bytes.

DWARF 5 only supports bit fields in composites using DW_OP_bit_piece. It does
not support runtime computed offsets which can happen for bit field packed
arrays. It is also not generally composable as it must be the last part of an
expression.

The following example defines a location description for a source variable that
is allocated starting at bit 20 of a register. A similar expression could be
used if the source variable was at a bit offset within memory or a particular
address space, or if the offset is a runtime value.


    DW_OP_regx SGPR3
    DW_OP_uconst 20
    DW_OP_bit_offset

The DW_OP_bit_offset operation pops a value and location description from the
stack. It pushes the location description after updating its offset using the
value as a bit count.

The ordering of bits within a byte, like byte ordering, is defined by the target
architecture. A base type could be extended to specify bit ordering in addition
to byte ordering.

## Call Frame Information (CFI)

DWARF defines call frame information (CFI) that can be used to virtually unwind
the subprogram call stack. This involves determining the location where register
values have been spilled. DWARF 5 limits these locations to either be registers
or global memory. As shown in the earlier examples, heterogeneous devices may
spill registers to parts of other registers, to non-global memory address
spaces, or even a composite of different location kinds.

Therefore, the extension extends the CFI rules to support any kind of location
description, and operations to create locations in address spaces.

## Objects Not In Byte Aligned Global Memory

DWARF 5 only effectively supports byte aligned memory locations on the stack by
using a global memory address as a proxy for a memory location description. This
is a problem for attributes that define DWARF expressions that require the
location of some source language entity that is not allocated in byte aligned
global memory.

For example, the DWARF expression of the DW_AT_data_member_location attribute is
evaluated with an initial stack containing the location of a type instance
object. That object could be located in a register, in a non-global memory
address space, be described by a composite location description, or could even
be an implicit location description.
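
To illustrate why the initial stack entry needs to be a general location
description, the following Python sketch evaluates one member expression against
two differently located objects. It is a minimal model under stated assumptions:
the `Loc` class, the `eval_member_location` function, the storage names, and the
choice of a member at bit 20 are all hypothetical, and only DW_OP_uconst and the
DW_OP_bit_offset behavior described earlier are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """Hypothetical location description: a storage plus a bit offset."""
    storage: str         # e.g. "reg:SGPR3" or "aspace:1" (made-up names)
    bit_offset: int = 0  # offset into that storage, in bits


def eval_member_location(ops, object_loc):
    """Evaluate a member expression whose initial stack holds the object."""
    stack = [object_loc]
    for op, *args in ops:
        if op == "DW_OP_uconst":
            stack.append(args[0])
        elif op == "DW_OP_bit_offset":
            nbits = stack.pop()   # top of stack: bit count value
            loc = stack.pop()     # below it: any kind of location
            stack.append(Loc(loc.storage, loc.bit_offset + nbits))
        else:
            raise ValueError(f"unsupported operation: {op}")
    return stack.pop()


# Hypothetical member that starts 20 bits into its containing object.
member_expr = [("DW_OP_uconst", 20), ("DW_OP_bit_offset",)]

in_register = eval_member_location(member_expr, Loc("reg:SGPR3"))
in_aspace = eval_member_location(member_expr, Loc("aspace:1", 128))
print(in_register)
print(in_aspace)
```

The same member expression yields bit 20 of the register in the first case and
bit 148 of the address space 1 storage in the second, without the expression
needing to know where the object lives.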

A similar problem exists for DWARF expressions that use the
DW_OP_push_object_address operation. This operation pushes the location of a
program object associated with the attribute that defines the expression.

Allowing any kind of location description on the stack permits the DW_OP_call*
operations to be used to factor the creation of location descriptions. The
inputs and outputs of the call are passed on the stack. For example, on GPUs an
expression can be defined to describe the effective PC of inactive lanes of SIMT
execution. This is naturally done by composing the result of expressions for
each nested control flow region. This can be done by making each control flow
region have its own DWARF procedure, and then calling it from the expressions of
the nested control flow regions. The alternative is to make each control flow
region have the complete expression, which results in much larger DWARF and is
less convenient to generate.

GPU compilers work hard to allocate objects in the larger number of registers to
reduce memory accesses. They also have to use different memory address spaces,
and they perform optimizations that result in composites of these. Allowing
operations to work with any kind of location description enables creating
expressions that support all of these.

Full general support for bit fields and implicit locations benefits
optimizations on any target.

## Higher Order Operations

The generalization allows an elegant way to add higher order operations that
create location descriptions out of other location descriptions in a general
composable manner.

For example, a DW_OP_extend operation could create a composite location
description out of a location description, an element size, and an element
count.
The resulting composite would effectively be a vector of element count
elements with each element being the same location description of the specified
bit size.

A DW_OP_select_bit_piece operation could create a composite location description
out of two location descriptions, a bit mask value, and an element size. The
resulting composite would effectively be a vector of elements, selecting from
one of the two input locations according to the bit mask.

These could be used in the expression of an attribute that computes the
effective PC of lanes of SIMT execution. The vector result efficiently computes
the PC for each SIMT lane at once. The mask could be the hardware execution mask
register that controls which SIMT lanes are executing. For active divergent
lanes the vector element would be the current PC, and for inactive divergent
lanes the PC would correspond to the source language line at which the lane is
logically positioned.

Similarly, a DW_OP_overlay_piece operation could be defined that creates a
composite location description out of two location descriptions, an offset
value, and a size. The resulting composite would consist of parts that are
equivalent to one of the location descriptions, but with the other location
description replacing a slice defined by the offset and size. This could be used
to efficiently express a source language array that has had a set of elements
promoted into a vector register when executing a set of iterations of a loop in
a SIMD manner.

## Objects In Multiple Places

A compiler may allocate a source variable in stack frame memory, but for some
range of code may promote it to a register. If the generated code does not
change the register value, then there is no need to save it back to memory.
Effectively, during that range, the source variable is in both memory and a
register.
If a consumer, such as a debugger, allows the user to change the value
of the source variable in that PC range, then it would need to change both
places.

DWARF 5 supports loclists, which can specify that a source language entity is
in different places at different PC locations. They can also express that a
source language entity is in multiple places at the same time.

DWARF 5 defines operation expressions and loclists separately. In general, this
is adequate as non-memory location descriptions can only be computed as the last
step of an expression evaluation.

However, allowing location descriptions on the stack permits non-memory location
descriptions to be used in the middle of expression evaluation. For example, the
DW_OP_call* and DW_OP_implicit_pointer operations can result in evaluating the
expression of a DW_AT_location attribute of a DIE. The DW_AT_location attribute
allows the loclist form. So the result could include multiple location
descriptions.

Similarly, the DWARF expression associated with attributes such as
DW_AT_data_member_location that are evaluated with an initial stack containing a
location description, or a DWARF operation expression that uses the
DW_OP_push_object_address operation, may want to act on the result of another
expression that returned a location description involving multiple places.

Therefore, the extension needs to define how expression operations that use those
results will behave. The extension does this by generalizing the expression stack
to allow an entry to be one or more single location descriptions. In doing this,
it unifies the definitions of DWARF operation expressions and loclist
expressions in a natural way.

All operations that act on location descriptions are extended to act on multiple
single location descriptions.
For example, the DW_OP_offset operation adds the
offset to each single location description. The DW_OP_deref* operations simply
read the storage of one of the single location descriptions, since multiple
single location descriptions must all hold the same value. Similarly, if the
evaluation of a DWARF expression results in multiple single location
descriptions, the consumer can ensure any updates are done to all of them, and
any reads can use any one of them.

# Conclusion

A strength of DWARF is that it has generally sought to provide generalized
composable solutions that address many problems, rather than solutions that only
address one-off issues. This extension attempts to follow that tradition by
defining a backwards compatible composable generalization that can address a
significant family of issues. It addresses the specific issues present for
heterogeneous computing devices, provides benefits for non-heterogeneous
devices, and can help address a number of other previously reported issues.

# Further Information

The following references provide additional information on the extension.

Slides and a video of a presentation at the Linux Plumbers Conference 2021
related to this extension are available.

The LLVM compiler extension includes possible normative text changes for this
extension as well as the operations mentioned in the motivating examples. It
also covers other extensions needed for heterogeneous devices.

- DWARF extensions for optimized SIMT/SIMD (GPU) debugging - Linux Plumbers Conference 2021
  - [Video](https://www.youtube.com/watch?v=QiR0ra0ymEY&t=10015s)
  - [Slides](https://linuxplumbersconf.org/event/11/contributions/1012/attachments/798/1505/DWARF_Extensions_for_Optimized_SIMT-SIMD_GPU_Debugging-LPC2021.pdf)
- [DWARF Extensions For Heterogeneous Debugging](https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html)