# Allow Location Descriptions on the DWARF Expression Stack <!-- omit in toc -->

- [Extension](#extension)
- [Heterogeneous Computing Devices](#heterogeneous-computing-devices)
- [DWARF 5](#dwarf-5)
  - [How DWARF Maps Source Language To Hardware](#how-dwarf-maps-source-language-to-hardware)
  - [Examples](#examples)
    - [Dynamic Array Size](#dynamic-array-size)
    - [Variable Location in Register](#variable-location-in-register)
    - [Variable Location in Memory](#variable-location-in-memory)
    - [Variable Spread Across Different Locations](#variable-spread-across-different-locations)
    - [Offsetting a Composite Location](#offsetting-a-composite-location)
  - [Limitations](#limitations)
- [Extension Solution](#extension-solution)
  - [Location Description](#location-description)
  - [Stack Location Description Operations](#stack-location-description-operations)
  - [Examples](#examples-1)
    - [Source Language Variable Spilled to Part of a Vector Register](#source-language-variable-spilled-to-part-of-a-vector-register)
    - [Source Language Variable Spread Across Multiple Vector Registers](#source-language-variable-spread-across-multiple-vector-registers)
    - [Source Language Variable Spread Across Multiple Kinds of Locations](#source-language-variable-spread-across-multiple-kinds-of-locations)
    - [Address Spaces](#address-spaces)
    - [Bit Offsets](#bit-offsets)
  - [Call Frame Information (CFI)](#call-frame-information-cfi)
  - [Objects Not In Byte Aligned Global Memory](#objects-not-in-byte-aligned-global-memory)
  - [Higher Order Operations](#higher-order-operations)
  - [Objects In Multiple Places](#objects-in-multiple-places)
- [Conclusion](#conclusion)
- [Further Information](#further-information)

# Extension

In DWARF 5, expressions are evaluated using a typed value stack, a separate
location area, and an independent loclist mechanism.
This extension unifies all
three mechanisms into a single generalized DWARF expression evaluation model
that allows both typed values and location descriptions to be manipulated on the
evaluation stack. Both single and multiple location descriptions are supported
on the stack. In addition, the call frame information (CFI) is extended to
support the full generality of location descriptions. This is done in a manner
that is backwards compatible with DWARF 5. The extension involves changes to the
DWARF 5 sections 2.5 (pp 26-38), 2.6 (pp 38-45), and 6.4 (pp 171-182).

The extension permits operations to act on location descriptions in an
incremental, consistent, and composable manner. It allows a small number of
operations to be defined to address the requirements of heterogeneous devices as
well as providing benefits to non-heterogeneous devices. It acts as a foundation
to provide support for other issues that have been raised that would benefit all
devices.

Other approaches were explored that involved adding specialized operations and
rules. However, these resulted in the need for more operations that did not
compose. They also resulted in operations with context sensitive semantics and
corner cases that had to be defined. The observation was that numerous
specialized context sensitive operations are harder for both producers and
consumers than a smaller number of general composable operations that have
consistent semantics regardless of context.

The following sections first describe heterogeneous devices and the features
they have that are not addressed by DWARF 5. Then a brief simplified overview of
the DWARF 5 expression evaluation model is presented that highlights the
difficulties for supporting the heterogeneous features.
Finally, an overview of
the extension is presented, using simplified examples to illustrate how it can
address the issues of heterogeneous devices and also benefit non-heterogeneous
devices. References to further information are provided.

# Heterogeneous Computing Devices

GPUs and other heterogeneous computing devices have features not common to CPU
computing devices.

These devices often have many more registers than a CPU. This helps reduce
memory accesses, which tend to be more expensive than on a CPU due to the much
larger number of threads concurrently executing. In addition to the traditional
scalar registers of a CPU, these devices often have many wide vector registers.

They may support masked vector instructions that are used by the compiler to map
high level language threads onto the lanes of the vector registers. As a
consequence, multiple language threads execute in lockstep as the vector
instructions are executed. This is termed single instruction multiple thread
(SIMT) execution.

GPUs can have multiple memory address spaces in addition to the single global
memory address space of a CPU. These additional address spaces are accessed
using distinct instructions and are often local to a particular thread or group
of threads.

For example, a GPU may have a per thread block address space that is implemented
as scratch pad memory with explicit hardware support to isolate portions to
specific groups of threads created as a single thread block.

A GPU may also use global memory in a non-linear manner. For example, to support
providing a SIMT per lane address space efficiently, there may be instructions
that support interleaved access.

Through optimization, the source variables may be located across these different
storage kinds. SIMT execution requires locations to be able to express selection
of runtime defined pieces of vector registers.
With these more complex locations,
it is beneficial to be able to factorize their calculation, which requires all
location kinds to be supported uniformly; otherwise duplication is necessary.

# DWARF 5

Before presenting the proposed solution to supporting heterogeneous devices, a
brief overview of the DWARF 5 expression evaluation model will be given to
highlight the aspects being addressed by the extension.

## How DWARF Maps Source Language To Hardware

DWARF is a standardized way to specify debug information. It describes source
language entities such as compilation units, functions, types, variables, etc.
It is either embedded directly in sections of the code object executables, or
split into separate files that they reference.

DWARF maps between source program language entities and their hardware
representations. For example:

- It maps a hardware instruction program counter to a source language program
  line, and vice versa.
- It maps a source language function to the hardware instruction program counter
  for its entry point.
- It maps a source language variable to its hardware location when at a
  particular program counter.
- It provides information to allow virtual unwinding of hardware registers for a
  source language function call stack.
- In addition, it provides a great deal of other information about the source
  language program.

In particular, there is great diversity in the way a source language entity
could be mapped to a hardware location. The location may involve runtime values.
For example, a source language variable location could be:

- In a register.
- At a memory address.
- At an offset from the current stack pointer.
- Optimized away, but with a known compile time value.
- Optimized away, but with an unknown value, such as happens for unused
  variables.
- Spread across a combination of the above kinds of locations.
- At a memory address, but also transiently loaded into registers.

To support this, DWARF 5 defines a rich expression language composed of loclist
expressions and operation expressions. Loclist expressions allow the result to
vary depending on the PC. Operation expressions are made up of a list of
operations that are evaluated on a simple stack machine.

A DWARF expression can be used as the value of different attributes of different
debug information entries (DIE). A DWARF expression can also be used as an
argument to call frame information (CFI) entry operations. An
expression is evaluated in a context dictated by where it is used. The context
may include:

- Whether the expression needs to produce a value or the location of an entity.
- The current execution point including process, thread, PC, and stack frame.
- Some expressions are evaluated with the stack initialized with a specific
  value or with the location of a base object that is available using the
  DW_OP_push_object_address operation.

## Examples

The following examples illustrate how DWARF expressions involving operations are
evaluated in DWARF 5. DWARF also has expressions involving location lists that
are not covered in these examples.

### Dynamic Array Size

The first example is for an operation expression associated with a DIE attribute
that provides the number of elements in a dynamic array type. Such an attribute
dictates that the expression must be evaluated in the context of providing a
value result kind.

In this hypothetical example, the compiler has allocated an array descriptor in
memory and placed the descriptor's address in architecture register SGPR0. The
first location of the array descriptor is the runtime size of the array.

A possible expression to retrieve the dynamic size of the array is:

    DW_OP_regval_type SGPR0 Generic
    DW_OP_deref

The expression is evaluated one operation at a time. Operations have operands
and can pop and push entries on a stack.

The expression evaluation starts with the first DW_OP_regval_type operation.
This operation reads the current value of an architecture register specified by
its first operand: SGPR0. The second operand specifies the base type, and
therefore the size, of the data to read. The read value is pushed on the stack.
Each stack element is a value and its associated type.

The type must be a DWARF base type. It specifies the encoding, byte ordering,
and size of values of the type. DWARF defines that each architecture has a
default generic type: it is an architecture specific integral encoding and byte
ordering, that is the size of the architecture's global memory address.

The DW_OP_deref operation pops a value off the stack, treats it as a global
memory address, and reads the contents of that location using the generic type.
It pushes the read value on the stack as the value and its associated generic
type.

The evaluation stops when it reaches the end of the expression. The result of an
expression that is evaluated with a value result kind context is the top element
of the stack, which provides the value and its type.

### Variable Location in Register

This example is for an operation expression associated with a DIE attribute that
provides the location of a source language variable. Such an attribute dictates
that the expression must be evaluated in the context of providing a location
result kind.

DWARF defines the locations of objects in terms of location descriptions.

In this example, the compiler has allocated a source language variable in
architecture register SGPR0.

A possible expression to specify the location of the variable is:

    DW_OP_regx SGPR0

The DW_OP_regx operation creates a location description that specifies the
location of the architecture register specified by the operand: SGPR0. Unlike
values, location descriptions are not pushed on the stack. Instead they are
conceptually placed in a location area. Unlike values, location descriptions do
not have an associated type; they only denote the location of the base of the
object.

Again, evaluation stops when it reaches the end of the expression. The result of
an expression that is evaluated with a location result kind context is the
location description in the location area.

### Variable Location in Memory

The next example is for an operation expression associated with a DIE attribute
that provides the location of a source language variable that is allocated in a
stack frame. The compiler has placed the stack frame pointer in architecture
register SGPR0, and allocated the variable at offset 0x10 from the stack frame
base. The stack frames are allocated in global memory, so SGPR0 contains a
global memory address.

A possible expression to specify the location of the variable is:

    DW_OP_regval_type SGPR0 Generic
    DW_OP_plus_uconst 0x10

As in the previous example, the DW_OP_regval_type operation pushes the stack
frame pointer global memory address onto the stack. The generic type is the size
of a global memory address.

The DW_OP_plus_uconst operation pops a value from the stack, which must have a
type with an integral encoding, adds the value of its operand, and pushes the
result back on the stack with the same associated type. In this example, that
computes the global memory address of the source language variable.

Evaluation stops when it reaches the end of the expression.
If the expression
that is evaluated has a location result kind context, and the location area is
empty, then the top stack element must be a value with the generic type. The
value is implicitly popped from the stack, and treated as a global memory
address to create a global memory location description, which is placed in the
location area. The result of the expression is the location description in the
location area.

### Variable Spread Across Different Locations

This example is for a source variable that is partly in a register, partly
undefined, and partly in memory.

DWARF defines composite location descriptions that can have one or more parts.
Each part specifies a location description and the number of bytes used from it.
The following operation expression creates a composite location description.

    DW_OP_regx SGPR3
    DW_OP_piece 4
    DW_OP_piece 2
    DW_OP_bregx SGPR0 0x10
    DW_OP_piece 2

The DW_OP_regx operation creates a register location description in the location
area.

The first DW_OP_piece operation creates an incomplete composite location
description in the location area with a single part. The location description in
the location area is used to define the beginning of the part for the size
specified by the operand, namely 4 bytes.

A subsequent DW_OP_piece adds a new part to an incomplete composite location
description already in the location area. The parts form a contiguous set of
bytes. If there are no other location descriptions in the location area, and no
value on the stack, then the part implicitly uses the undefined location
description. Again, the operand specifies the size of the part in bytes. The
undefined location description can be used to indicate a part that has been
optimized away. In this case, the part is 2 bytes of undefined value.

The DW_OP_bregx operation reads the architecture register specified by the first
operand (SGPR0) as the generic type, adds the value of the second operand
(0x10), and pushes the value on the stack.

The next DW_OP_piece operation adds another part to the already created
incomplete composite location.

If there is no other location in the location area, but there is a value on
the stack, the new part is a memory location description. The memory address
used is popped from the stack. In this case, the operand of 2 indicates there
are 2 bytes from memory.

Evaluation stops when it reaches the end of the expression. If the expression
that is evaluated has a location result kind context, and the location area has
an incomplete composite location description, the incomplete composite location
is implicitly converted to a complete composite location description. The result
of the expression is the location description in the location area.

### Offsetting a Composite Location

This example attempts to extend the previous example to offset the composite
location description it created. The *Variable Location in Memory* example
conveniently used the DW_OP_plus_uconst operation to offset a memory address.

    DW_OP_regx SGPR3
    DW_OP_piece 4
    DW_OP_piece 2
    DW_OP_bregx SGPR0 0x10
    DW_OP_piece 2
    DW_OP_plus_uconst 5

However, DW_OP_plus_uconst cannot be used to offset a composite location. It
only operates on the stack.

To offset a composite location description, the compiler would need to make a
different composite location description, starting at the part corresponding to
the offset. For example:

    DW_OP_piece 1
    DW_OP_bregx SGPR0 0x10
    DW_OP_piece 2

This illustrates that operations on stack values are not composable with
operations on location descriptions.
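
The split between the value stack and the location area can be made concrete
with a small Python sketch. It is a toy model written for this overview, not an
implementation of the standard: the `evaluate_dwarf5` function, the tuple
encodings, and the register contents are assumptions made for illustration, and
only the four operations used in the examples above are modeled.

```python
def evaluate_dwarf5(ops, regs):
    """Toy model of DWARF 5 evaluation for a location result kind.

    Values live on `stack`; location descriptions live in a separate
    location area (`location` plus the `parts` of an incomplete composite).
    """
    stack = []        # integer values only
    location = None   # at most one pending location description
    parts = []        # (location description, size in bytes) pairs
    for op, *args in ops:
        if op == "DW_OP_regx":
            location = ("register", args[0])
        elif op == "DW_OP_bregx":
            stack.append(regs[args[0]] + args[1])
        elif op == "DW_OP_plus_uconst":
            if not stack:
                raise ValueError("DW_OP_plus_uconst needs a value on the stack")
            stack.append(stack.pop() + args[0])
        elif op == "DW_OP_piece":
            if location is not None:
                part, location = location, None
            elif stack:
                part = ("memory", stack.pop())
            else:
                part = ("undefined",)
            parts.append((part, args[0]))
        else:
            raise ValueError(f"operation not modeled: {op}")
    if parts:
        return ("composite", parts)
    if location is not None:
        return location
    return ("memory", stack.pop())  # top value used as a global memory address


regs = {"SGPR0": 0x1000, "SGPR3": 7}  # hypothetical register contents
composite = [
    ("DW_OP_regx", "SGPR3"), ("DW_OP_piece", 4),
    ("DW_OP_piece", 2),
    ("DW_OP_bregx", "SGPR0", 0x10), ("DW_OP_piece", 2),
]
print(evaluate_dwarf5(composite, regs))
try:
    evaluate_dwarf5(composite + [("DW_OP_plus_uconst", 5)], regs)
except ValueError as err:
    print("rejected:", err)
```

In this model the composite expression produces three parts, while appending
`DW_OP_plus_uconst 5` is rejected: the composite sits in the location area and
the stack is empty, so there is nothing for the arithmetic operation to act on.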

## Limitations

DWARF 5 is unable to describe variables in runtime indexed parts of registers.
This is required to describe a source variable that is located in a lane of a
SIMT vector register.

Some features only work when located in global memory. The type attribute
expressions require a base object which could be in any kind of location.

DWARF procedures can only accept global memory address arguments. This limits
the ability to factorize the creation of locations that involve other location
kinds.

There are no vector base types. These are required to describe vector registers.

There is no operation to create a memory location in a non-global address space.
Only the dereference operation supports providing an address space.

CFI location expressions do not allow composite locations or non-global address
space memory locations. Both of these are needed in optimized code for devices
with vector registers and address spaces.

Bit field offsets are only supported in a limited way for register locations.
Supporting them in a uniform manner for all location kinds is required to
support languages with bit sized entities.

# Extension Solution

This section outlines the extension to generalize the DWARF expression evaluation
model to allow location descriptions to be manipulated on the stack. It presents
a number of simplified examples to demonstrate the benefits and how the extension
solves the issues of heterogeneous devices. It presents how this is done in
a manner that is backwards compatible with DWARF 5.

## Location Description

In order to have consistent, composable operations that act on location
descriptions, the extension defines a uniform way to handle all location kinds.
That includes memory, register, implicit, implicit pointer, undefined, and
composite location descriptions.

Each kind of location description is conceptually a zero-based offset within a
piece of storage. The storage is a contiguous linear organization of a certain
number of bytes (see below for how this is extended to support bit sized
storage).

- For global memory, the storage is the linear stream of bytes of the
  architecture's address size.
- For each separate architecture register, it is the linear stream of bytes of
  the size of that specific register.
- For an implicit, it is the linear stream of bytes of the value when
  represented using the value's base type which specifies the encoding, size,
  and byte ordering.
- For undefined, it is an infinitely sized linear stream where every byte is
  undefined.
- For composite, it is a linear stream of bytes defined by the composite's parts.

## Stack Location Description Operations

The DWARF expression stack is extended to allow each stack entry to either be a
value or a location description.

Evaluation rules are defined to implicitly convert a stack element that is a
value to a location description, or vice versa, so that all DWARF 5 expressions
continue to have the same semantics. This reflects that a memory address is
effectively used as a proxy for a memory location description.

For each place that allows a DWARF expression to be specified, it is defined if
the expression is to be evaluated as a value or a location description.

Existing DWARF expression operations that are used to act on memory addresses
are generalized to act on any location description kind. For example, the
DW_OP_deref operation pops a location description rather than a memory address
value from the stack and reads the storage associated with the location kind
starting at the location description's offset.
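
The idea that every location description is an offset into some storage, and
that a generalized dereference simply reads from that storage, can be sketched
in a few lines of Python. This is an illustration under stated assumptions, not
the extension's definition: the `Location` class, the `deref` helper, the
little-endian byte order, and the storage contents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A location description: a zero-based byte offset into some storage."""
    kind: str        # "memory", "register", "implicit", ...
    storage: bytes   # the linear stream of bytes being referenced
    offset: int = 0


def deref(loc, size, byteorder="little"):
    """Generalized DW_OP_deref sketch: read `size` bytes at the offset."""
    data = loc.storage[loc.offset:loc.offset + size]
    if len(data) != size:
        raise ValueError("read extends past the end of the storage")
    return int.from_bytes(data, byteorder)


# Hypothetical storage contents, little-endian throughout.
global_memory = bytes(range(64))
vgpr0 = b"".join((100 + lane).to_bytes(4, "little") for lane in range(16))
implicit = (0xF00D).to_bytes(8, "little")

in_memory = Location("memory", global_memory, offset=0x10)
in_register = Location("register", vgpr0, offset=5 * 4)   # lane 5
in_implicit = Location("implicit", implicit)

print(deref(in_memory, 1), deref(in_register, 4), hex(deref(in_implicit, 2)))
```

The same `deref` code path serves all three location kinds, because the only
things it depends on are the storage and the offset.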

Existing DWARF expression operations that create location descriptions are
changed to pop and push location descriptions on the stack. Examples include
the DW_OP_value, DW_OP_regx, DW_OP_implicit_value, DW_OP_implicit_pointer,
DW_OP_stack_value, and DW_OP_piece operations.

New operations that act on location descriptions can be added. For example, a
DW_OP_offset operation that modifies the offset of the location description on
top of the stack. Unlike the DW_OP_plus operation that only works with memory
addresses, a DW_OP_offset operation can work with any location kind.

To allow incremental and nested creation of composite location descriptions, a
DW_OP_piece_end can be defined to explicitly indicate the last part of a
composite. Currently, creating a composite must always be the last operation of
an expression.

A DW_OP_undefined operation can be defined that explicitly creates the undefined
location description. Currently this is only possible as a piece of a composite
when the stack is empty.

## Examples

This section provides some motivating examples to illustrate the benefits that
result from allowing location descriptions on the stack.

### Source Language Variable Spilled to Part of a Vector Register

A compiler generating code for a GPU may allocate a source language variable
that it proves has the same value for every lane of a SIMT thread in a scalar
register. It may then need to spill that scalar register. To avoid the high cost
of spilling to memory, it may spill to a fixed lane of one of the numerous
vector registers.

The following expression defines the location of a source language variable that
the compiler allocated in a scalar register, but had to spill to lane 5 of a
vector register at this point of the code.

    DW_OP_regx VGPR0
    DW_OP_offset_uconst 20

The DW_OP_regx pushes a register location description on the stack.
The storage
for the register is the size of the vector register. The register location
description conceptually references that storage with an initial offset of 0.
The architecture defines the byte ordering of the register.

The DW_OP_offset_uconst pops a location description off the stack, adds its
operand value to the offset, and pushes the updated location description back on
the stack. In this case the source language variable is being spilled to lane 5
and each lane's component is 32 bits (4 bytes), so the offset is 5*4=20.

The result of the expression evaluation is the location description on the top
of the stack.

An alternative approach could be for the target to define distinct register
names for each part of each vector register. However, this is not practical for
GPUs due to the sheer number of registers that would have to be defined. It
would also not permit a runtime index into part of the whole register to be used
as shown in the next example.

### Source Language Variable Spread Across Multiple Vector Registers

A compiler may generate SIMT code for a GPU. Each source language thread of
execution is mapped to a single lane of the GPU thread. Source language
variables that are mapped to a register are mapped to the lane component of the
vector registers corresponding to the source language's thread of execution.

The location expression for such variables must therefore be executed in the
context of the focused source language thread of execution. A DW_OP_push_lane
operation can be defined to push the value of the lane for the currently focused
source language thread of execution. The value to use would be provided by the
consumer of DWARF when it evaluates the location expression.

If the source language variable is larger than the size of the vector register
lane component, then multiple vector registers are used.
Each source language
thread of execution will only use the vector register components for its
associated lane.

The following expression defines the location of a source language variable that
has to occupy two vector registers. A composite location description is created
that combines the two parts. It will give the correct result regardless of which
lane corresponds to the source language thread of execution that the user is
focused on.

    DW_OP_regx VGPR0
    DW_OP_push_lane
    DW_OP_uconst 4
    DW_OP_mul
    DW_OP_offset
    DW_OP_piece 4
    DW_OP_regx VGPR1
    DW_OP_push_lane
    DW_OP_uconst 4
    DW_OP_mul
    DW_OP_offset
    DW_OP_piece 4

The DW_OP_regx VGPR0 pushes a location description for the first register.

The DW_OP_push_lane; DW_OP_uconst 4; DW_OP_mul calculates the offset for the
focused lane's vector register component as 4 times the lane number.

The DW_OP_offset adjusts the register location description's offset to the
runtime computed value.

The DW_OP_piece either creates a new composite location description, or adds a
new part to an existing incomplete one. It pops the location description to use
for the new part. It then pops the next stack element if it is an incomplete
composite location description, otherwise it creates a new incomplete composite
location description with no parts. Finally it pushes the incomplete composite
after adding the new part.

In this case a register location description is added to a new incomplete
composite location description. The 4 of the DW_OP_piece specifies the size of
the register storage that comprises the part. Note that the 4 bytes start at the
computed register offset.

For backwards compatibility, if the stack is empty or the top stack element is
an incomplete composite, an undefined location description is used for the part.
If the top stack element is a generic base type value, then it is implicitly
converted to a global memory location description with an offset equal to the
value.

The rest of the expression does the same for VGPR1. However, when the
DW_OP_piece is evaluated there is an incomplete composite on the stack. So the
VGPR1 register location description is added as a second part.

At the end of the expression, if the top stack element is an incomplete
composite location description, it is converted to a complete location
description and returned as the result.

### Source Language Variable Spread Across Multiple Kinds of Locations

This example is the same as the previous one, except the first 2 bytes of the
second vector register have been spilled to memory, and the last 2 bytes have
been proven to be a constant and optimized away.

    DW_OP_regx VGPR0
    DW_OP_push_lane
    DW_OP_uconst 4
    DW_OP_mul
    DW_OP_offset
    DW_OP_piece 4
    DW_OP_addr 0xbeef
    DW_OP_piece 2
    DW_OP_uconst 0xf00d
    DW_OP_stack_value
    DW_OP_piece 2
    DW_OP_piece_end

The first 6 operations are the same.

The DW_OP_addr operation pushes a global memory location description on the
stack with an offset equal to the address.

The next DW_OP_piece adds the global memory location description as the next 2
byte part of the composite.

The DW_OP_uconst 0xf00d; DW_OP_stack_value pushes an implicit location
description on the stack. The storage of the implicit location description is
the representation of the value 0xf00d using the generic base type's encoding,
size, and byte ordering.

The final DW_OP_piece adds 2 bytes of the implicit location description as the
third part of the composite location description.
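
The piece-by-piece construction walked through so far can be sketched as a toy
Python evaluator in which a single stack holds both values and location
descriptions. Everything here is an assumption made for illustration: the
`evaluate_extended` name, the `(kind, storage, offset)` tuples, and the use of a
Python list for an incomplete composite are hypothetical, and the backwards
compatibility rules for DW_OP_piece (empty stack, value on top) are left out.

```python
def evaluate_extended(ops, lane):
    """Toy evaluator: ints are values, tuples are location descriptions,
    and a list is an incomplete composite of (location, size) parts."""
    stack = []
    for op, *args in ops:
        if op == "DW_OP_regx":
            stack.append(("register", args[0], 0))
        elif op == "DW_OP_addr":
            stack.append(("memory", "global", args[0]))
        elif op == "DW_OP_push_lane":
            stack.append(lane)            # supplied by the DWARF consumer
        elif op == "DW_OP_uconst":
            stack.append(args[0])
        elif op == "DW_OP_mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        elif op == "DW_OP_offset":
            delta = stack.pop()
            kind, storage, offset = stack.pop()
            stack.append((kind, storage, offset + delta))
        elif op == "DW_OP_stack_value":
            stack.append(("implicit", stack.pop(), 0))
        elif op == "DW_OP_piece":
            part = stack.pop()
            # Extend an incomplete composite if one is next on the stack.
            parts = stack.pop() if stack and isinstance(stack[-1], list) else []
            stack.append(parts + [(part, args[0])])
        else:
            raise ValueError(f"operation not modeled: {op}")
    return stack[-1]


expression = [
    ("DW_OP_regx", "VGPR0"), ("DW_OP_push_lane",), ("DW_OP_uconst", 4),
    ("DW_OP_mul",), ("DW_OP_offset",), ("DW_OP_piece", 4),
    ("DW_OP_addr", 0xBEEF), ("DW_OP_piece", 2),
    ("DW_OP_uconst", 0xF00D), ("DW_OP_stack_value",), ("DW_OP_piece", 2),
]
for part in evaluate_extended(expression, lane=3):
    print(part)
```

With lane 3 focused, the first part starts at byte offset 12 of VGPR0; the
memory and implicit parts do not depend on the lane.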

The DW_OP_piece_end operation explicitly makes the incomplete composite location
description into a complete location description. This allows a complete
composite location description to be created on the stack that can be used as
the location description of another following operation. For example, the
DW_OP_offset can be applied to it. More practically, it permits creation of
multiple composite location descriptions on the stack which can be used to pass
arguments to a DWARF procedure using a DW_OP_call* operation. This can be
beneficial to factor the incremental creation of location descriptions.

### Address Spaces

Heterogeneous devices can have multiple hardware supported address spaces which
use specific hardware instructions to access them.

For example, GPUs that use SIMT execution may provide hardware support to access
memory such that each lane can see a linear memory view, while the backing
memory is actually being accessed in an interleaved manner so that the locations
for each lane's Nth dword are contiguous. This minimizes cache lines read by the
SIMT execution.

The following expression defines the location of a source language variable that
is allocated at offset 0x10 in the current subprogram's stack frame. The
subprogram stack frames are per lane and reside in an interleaved address space.

    DW_OP_regval_type SGPR0 Generic
    DW_OP_uconst 1
    DW_OP_form_aspace_address
    DW_OP_offset_uconst 0x10

The DW_OP_regval_type operation pushes the contents of SGPR0 as a generic value.
This is the register that holds the address of the current stack frame.

The DW_OP_uconst operation pushes the address space number. Each architecture
defines the numbers it uses in DWARF. In this case, address space 1 is being
used as the per lane memory.

The DW_OP_form_aspace_address operation pops a value and an address space
number. Each address space is associated with a separate storage. A memory
location description is pushed which refers to the address space's storage, with
an offset of the popped value.

All operations that act on location descriptions work with memory locations
regardless of their address space.

Every architecture defines address space 0 as the default global memory address
space.

Generalizing memory location descriptions to include an address space component
avoids having to create specialized operations to work with address spaces.

The source variable is at offset 0x10 in the stack frame. The DW_OP_offset*
operations work on memory location descriptions that have an address space just
as they do for any other kind of location description.

The only operations in DWARF 5 that take an address space are DW_OP_xderef*.
They treat a value as the address in a specified address space, and read its
contents. There is no operation to actually create a location description that
references an address space. There is no way to include address space memory
locations in parts of composite locations.

Since DW_OP_piece now takes any kind of location description for its pieces, it
is now possible for parts of a composite to involve locations in different
address spaces. For example, this can happen when parts of a source variable
allocated in a register are spilled to a stack frame that resides in the
non-global address space.

### Bit Offsets

With the generalization of location descriptions on the stack, it is possible to
define a DW_OP_bit_offset operation that adjusts the offset of any kind of
location in terms of bits rather than bytes. The offset can be a runtime
computed value.
This is generally useful for any source language that supports bit sized
entities, and for registers that are not a whole number of bytes.

DWARF 5 only supports bit fields in composites using DW_OP_bit_piece. It does
not support runtime computed offsets which can happen for bit field packed
arrays. It is also not generally composable as it must be the last part of an
expression.

The following example defines a location description for a source variable that
is allocated starting at bit 20 of a register. A similar expression could be
used if the source variable was at a bit offset within memory or a particular
address space, or if the offset is a runtime value.

    DW_OP_regx SGPR3
    DW_OP_uconst 20
    DW_OP_bit_offset

The DW_OP_bit_offset operation pops a value and location description from the
stack. It pushes the location description after updating its offset using the
value as a bit count.

The ordering of bits within a byte, like byte ordering, is defined by the target
architecture. A base type could be extended to specify bit ordering in addition
to byte ordering.

## Call Frame Information (CFI)

DWARF defines call frame information (CFI) that can be used to virtually unwind
the subprogram call stack. This involves determining the location where register
values have been spilled. DWARF 5 limits these locations to either be registers
or global memory. As shown in the earlier examples, heterogeneous devices may
spill registers to parts of other registers, to non-global memory address
spaces, or even a composite of different location kinds.

Therefore, the extension extends the CFI rules to support any kind of location
description, and operations to create locations in address spaces.
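
Since the generalized CFI rules can use the same kinds of location descriptions
as any other expression, the two example expressions from the Address Spaces and
Bit Offsets sections can be modeled with a small stack evaluator. This is a
minimal Python sketch and not the normative semantics: the `Location` class, the
storage naming, and the register value are assumptions made for illustration,
and DW_OP_offset is treated as taking an inline byte operand because that is how
the example listing writes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A single location description: a storage plus a bit offset into it."""
    storage: str          # illustrative names such as "aspace:1" or "reg:SGPR3"
    bit_offset: int = 0


def evaluate(ops, registers):
    """Evaluate (opcode, operand...) tuples and return the top stack entry."""
    stack = []
    for op, *args in ops:
        if op == "DW_OP_regval_type":       # push register contents as a value
            stack.append(registers[args[0]])
        elif op == "DW_OP_uconst":          # push an unsigned constant value
            stack.append(args[0])
        elif op == "DW_OP_regx":            # push a register location description
            stack.append(Location(f"reg:{args[0]}"))
        elif op == "DW_OP_form_aspace_address":
            aspace = stack.pop()            # address space number is on top
            address = stack.pop()           # byte address within that space
            stack.append(Location(f"aspace:{aspace}", address * 8))
        elif op == "DW_OP_offset":          # inline byte offset (assumption)
            loc = stack.pop()
            stack.append(Location(loc.storage, loc.bit_offset + args[0] * 8))
        elif op == "DW_OP_bit_offset":      # bit count popped from the stack
            bits = stack.pop()
            loc = stack.pop()
            stack.append(Location(loc.storage, loc.bit_offset + bits))
        else:
            raise ValueError(f"unsupported operation: {op}")
    return stack[-1]


frame_var = evaluate(
    [("DW_OP_regval_type", "SGPR0"), ("DW_OP_uconst", 1),
     ("DW_OP_form_aspace_address",), ("DW_OP_offset", 0x10)],
    registers={"SGPR0": 0x1000},            # made-up frame address
)
bit_var = evaluate(
    [("DW_OP_regx", "SGPR3"), ("DW_OP_uconst", 20), ("DW_OP_bit_offset",)],
    registers={},
)
```

Both results are ordinary stack entries, so the same offset operations apply
whether the storage is a register or memory in some address space.
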

## Objects Not In Byte Aligned Global Memory

DWARF 5 only effectively supports byte aligned memory locations on the stack by
using a global memory address as a proxy for a memory location description. This
is a problem for attributes that define DWARF expressions that require the
location of some source language entity that is not allocated in byte aligned
global memory.

For example, the DWARF expression of the DW_AT_data_member_location attribute is
evaluated with an initial stack containing the location of a type instance
object. That object could be located in a register, in a non-global memory
address space, be described by a composite location description, or could even
be an implicit location description.

A similar problem exists for DWARF expressions that use the
DW_OP_push_object_address operation. This operation pushes the location of a
program object associated with the attribute that defines the expression.

Allowing any kind of location description on the stack permits the DW_OP_call*
operations to be used to factor the creation of location descriptions. The
inputs and outputs of the call are passed on the stack. For example, on GPUs an
expression can be defined to describe the effective PC of inactive lanes of SIMT
execution. This is naturally done by composing the result of expressions for
each nested control flow region. This can be done by making each control flow
region have its own DWARF procedure, and then calling it from the expressions of
the nested control flow regions. The alternative is to make each control flow
region have the complete expression which results in much larger DWARF and is
less convenient to generate.

GPU compilers work hard to allocate objects in the larger number of registers to
reduce memory accesses, they have to use different memory address spaces, and
they perform optimizations that result in composites of these.
Allowing operations to work with any kind of location description enables
creating expressions that support all of these.

Full general support for bit fields and implicit locations benefits
optimizations on any target.

## Higher Order Operations

The generalization allows an elegant way to add higher order operations that
create location descriptions out of other location descriptions in a general
composable manner.

For example, a DW_OP_extend operation could create a composite location
description out of a location description, an element size, and an element
count. The resulting composite would effectively be a vector of element count
elements with each element being the same location description of the specified
bit size.

A DW_OP_select_bit_piece operation could create a composite location description
out of two location descriptions, a bit mask value, and an element size. The
resulting composite would effectively be a vector of elements, selecting from
one of the two input locations according to the bit mask.

These could be used in the expression of an attribute that computes the
effective PC of lanes of SIMT execution. The vector result efficiently computes
the PC for each SIMT lane at once. The mask could be the hardware execution mask
register that controls which SIMT lanes are executing. For active divergent
lanes the vector element would be the current PC, and for inactive divergent
lanes the PC would correspond to the source language line at which the lane is
logically positioned.

Similarly, a DW_OP_overlay_piece operation could be defined that creates a
composite location description out of two location descriptions, an offset
value, and a size. The resulting composite would consist of parts that are
equivalent to one of the location descriptions, but with the other location
description replacing a slice defined by the offset and size.
This could be used to efficiently express a source language array that has had a
set of elements promoted into a vector register when executing a set of
iterations of a loop in a SIMD manner.

## Objects In Multiple Places

A compiler may allocate a source variable in stack frame memory, but for some
range of code may promote it to a register. If the generated code does not
change the register value, then there is no need to save it back to memory.
Effectively, during that range, the source variable is in both memory and a
register. If a consumer, such as a debugger, allows the user to change the value
of the source variable in that PC range, then it would need to change both
places.

DWARF 5 supports loclists which are able to specify that a source language
entity is in different places at different PC locations. It can also express
that a source language entity is in multiple places at the same time.

DWARF 5 defines operation expressions and loclists separately. In general, this
is adequate as non-memory location descriptions can only be computed as the last
step of an expression evaluation.

However, allowing location descriptions on the stack permits non-memory location
descriptions to be used in the middle of expression evaluation. For example, the
DW_OP_call* and DW_OP_implicit_pointer operations can result in evaluating the
expression of a DW_AT_location attribute of a DIE. The DW_AT_location attribute
allows the loclist form. So the result could include multiple location
descriptions.
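
The situation of a variable living in both memory and a register for part of its
lifetime can be sketched as a loclist whose PC ranges overlap. The Python below
is only an illustration under stated assumptions: `LoclistEntry`, the place
names, and the PC values are hypothetical, and a real consumer would work with
location descriptions rather than strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoclistEntry:
    low_pc: int    # inclusive start of the PC range
    high_pc: int   # exclusive end of the PC range
    place: str     # illustrative name of a storage, e.g. "frame+0x10"


def places_at(loclist, pc):
    """Return every place that holds the entity at the given PC."""
    return [e.place for e in loclist if e.low_pc <= pc < e.high_pc]


def write_value(storage, loclist, pc, value):
    """A consumer changing the entity has to update all of its places."""
    for place in places_at(loclist, pc):
        storage[place] = value


def read_value(storage, loclist, pc):
    """All places hold the same value, so reading any one of them suffices."""
    return storage[places_at(loclist, pc)[0]]


# The variable is in the stack frame throughout, and also in a register for
# PCs 0x120 up to 0x140 (all addresses and names here are made up).
loclist = [LoclistEntry(0x100, 0x200, "frame+0x10"),
           LoclistEntry(0x120, 0x140, "reg:VGPR0")]
storage = {"frame+0x10": 7, "reg:VGPR0": 7}
write_value(storage, loclist, 0x130, 42)
```

Writing at PC 0x130 updates both the frame slot and the register, whereas at a
PC outside the register range only the frame slot would be reported.
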

Similarly, the DWARF expression associated with attributes such as
DW_AT_data_member_location that are evaluated with an initial stack containing a
location description, or a DWARF operation expression that uses the
DW_OP_push_object_address operation, may want to act on the result of another
expression that returned a location description involving multiple places.

Therefore, the extension needs to define how expression operations that use those
results will behave. The extension does this by generalizing the expression stack
to allow an entry to be one or more single location descriptions. In doing this,
it unifies the definitions of DWARF operation expressions and loclist
expressions in a natural way.

All operations that act on location descriptions are extended to act on multiple
single location descriptions. For example, the DW_OP_offset operation adds the
offset to each single location description. The DW_OP_deref* operations simply
read the storage of one of the single location descriptions, since multiple
single location descriptions must all hold the same value. Similarly, if the
evaluation of a DWARF expression results in multiple single location
descriptions, the consumer can ensure any updates are done to all of them, and
any reads can use any one of them.

# Conclusion

A strength of DWARF is that it has generally sought to provide generalized
composable solutions that address many problems, rather than solutions that only
address one-off issues. This extension attempts to follow that tradition by
defining a backwards compatible composable generalization that can address a
significant family of issues. It addresses the specific issues present for
heterogeneous computing devices, provides benefits for non-heterogeneous
devices, and can help address a number of other previously reported issues.

# Further Information

The following references provide additional information on the extension.

Slides and a video of a presentation at the Linux Plumbers Conference 2021
related to this extension are available.

The LLVM compiler extension includes possible normative text changes for this
extension as well as the operations mentioned in the motivating examples. It
also covers other extensions needed for heterogeneous devices.

- DWARF extensions for optimized SIMT/SIMD (GPU) debugging - Linux Plumbers Conference 2021
  - [Video](https://www.youtube.com/watch?v=QiR0ra0ymEY&t=10015s)
  - [Slides](https://linuxplumbersconf.org/event/11/contributions/1012/attachments/798/1505/DWARF_Extensions_for_Optimized_SIMT-SIMD_GPU_Debugging-LPC2021.pdf)
- [DWARF Extensions For Heterogeneous Debugging](https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html)