# Allow Location Descriptions on the DWARF Expression Stack <!-- omit in toc -->

- [Extension](#extension)
- [Heterogeneous Computing Devices](#heterogeneous-computing-devices)
- [DWARF 5](#dwarf-5)
  - [What is DWARF?](#what-is-dwarf)
  - [Examples](#examples)
    - [Dynamic Array Size](#dynamic-array-size)
    - [Variable Location in Register](#variable-location-in-register)
    - [Variable Location in Memory](#variable-location-in-memory)
    - [Variable Spread Across Different Locations](#variable-spread-across-different-locations)
    - [Offsetting a Composite Location](#offsetting-a-composite-location)
  - [Limitations](#limitations)
- [Extension Solution](#extension-solution)
  - [Location Description](#location-description)
  - [Stack Location Description Operations](#stack-location-description-operations)
  - [Examples](#examples-1)
    - [Source Language Variable Spilled to Part of a Vector Register](#source-language-variable-spilled-to-part-of-a-vector-register)
    - [Source Language Variable Spread Across Multiple Vector Registers](#source-language-variable-spread-across-multiple-vector-registers)
    - [Source Language Variable Spread Across Multiple Kinds of Locations](#source-language-variable-spread-across-multiple-kinds-of-locations)
    - [Address Spaces](#address-spaces)
    - [Bit Offsets](#bit-offsets)
  - [Call Frame Information (CFI)](#call-frame-information-cfi)
  - [Objects Not In Byte Aligned Global Memory](#objects-not-in-byte-aligned-global-memory)
  - [Higher Order Operations](#higher-order-operations)
  - [Objects In Multiple Places](#objects-in-multiple-places)
- [Conclusion](#conclusion)
- [Further Information](#further-information)
29
# Extension
31
This extension generalizes the DWARF expression evaluation model to allow
location descriptions to be manipulated on the stack. It does so in a manner
that is backwards compatible with DWARF 5. This permits operations to act on
location descriptions in an incremental, consistent, and composable manner.
36
37It allows a small number of operations to be defined to address the requirements
38of heterogeneous devices as well as providing benefits to non-heterogeneous
39devices. It also acts as a foundation to provide support for other issues that
40have been raised that would benefit all devices.
41
Other approaches were explored that involved adding specialized operations and
rules. However, these required more operations, and the operations did not
compose. They also led to operations with context sensitive semantics and
corner cases that had to be defined. The observation was that numerous
specialized context sensitive operations are harder for both producers and
consumers than a smaller number of general composable operations that have
consistent semantics regardless of context.
49
50The following sections first describe heterogeneous devices and the features
51they have that are not addressed by DWARF 5. Then a brief simplified overview of
52the DWARF 5 expression evaluation model is presented that highlights the
53difficulties for supporting the heterogeneous features. Finally, an overview of
54the extension is presented, using simplified examples to illustrate how it can
55address the issues of heterogeneous devices and also benefit non-heterogeneous
56devices. References to further information are provided.
57
# Heterogeneous Computing Devices
59
60GPUs and other heterogeneous computing devices have features not common to CPU
61computing devices.
62
63These devices often have many more registers than a CPU. This helps reduce
64memory accesses which tend to be more expensive than on a CPU due to the much
65larger number of threads concurrently executing. In addition to traditional
66scalar registers of a CPU, these devices often have many wide vector registers.
67
68![Example GPU Hardware](images/example-gpu-hardware.png)
69
70They may support masked vector instructions that are used by the compiler to map
71high level language threads onto the lanes of the vector registers. As a
72consequence, multiple language threads execute in lockstep as the vector
73instructions are executed. This is termed single instruction multiple thread
74(SIMT) execution.
75
76![SIMT/SIMD Execution Model](images/simt-execution-model.png)
77
78GPUs can have multiple memory address spaces in addition to the single global
79memory address space of a CPU. These additional address spaces are accessed
80using distinct instructions and are often local to a particular thread or group
81of threads.
82
83For example, a GPU may have a per thread block address space that is implemented
84as scratch pad memory with explicit hardware support to isolate portions to
85specific groups of threads created as a single thread block.
86
A GPU may also use global memory in a non-linear manner. For example, to support
providing a SIMT per lane address space efficiently, there may be instructions
that support interleaved access.
90
Through optimization, source variables may be located across these different
storage kinds. SIMT execution requires locations to be able to express selection
of runtime defined pieces of vector registers. With these more complex
locations, it is beneficial to factorize their calculation. That requires all
location kinds to be supported uniformly, otherwise duplication is necessary.
96
# DWARF 5
98
99Before presenting the proposed solution to supporting heterogeneous devices, a
100brief overview of the DWARF 5 expression evaluation model will be given to
101highlight the aspects being addressed by the extension.
102
## What is DWARF?
104
105DWARF is a standardized way to specify debug information. It describes source
106language entities such as compilation units, functions, types, variables, etc.
107It is either embedded directly in sections of the code object executables, or
108split into separate files that they reference.
109
110DWARF maps between source program language entities and their hardware
111representations. For example:
112
113- It maps a hardware instruction program counter to a source language program
114  line, and vice versa.
115- It maps a source language function to the hardware instruction program counter
116  for its entry point.
117- It maps a source language variable to its hardware location when at a
118  particular program counter.
119- It provides information to allow virtual unwinding of hardware registers for a
120  source language function call stack.
- In addition, it provides a great deal of other information about the source
  language program.
123
124In particular, there is great diversity in the way a source language entity
125could be mapped to a hardware location. The location may involve runtime values.
126For example, a source language variable location could be:
127
- In a register.
- At a memory address.
- At an offset from the current stack pointer.
- Optimized away, but with a known compile time value.
- Optimized away, but with an unknown value, such as happens for unused
  variables.
- Spread across a combination of the above kinds of locations.
135- At a memory address, but also transiently loaded into registers.
136
To support this, DWARF 5 defines a rich expression language composed of loclist
expressions and operation expressions. Loclist expressions allow the result to
vary depending on the PC. Operation expressions are made up of a list of
operations that are evaluated on a simple stack machine.
141
A DWARF expression can be used as the value of different attributes of different
debug information entries (DIE). A DWARF expression can also be used as an
argument to call frame information (CFI) entry operations. An expression is
evaluated in a context dictated by where it is used. The context may include:
147
148- Whether the expression needs to produce a value or the location of an entity.
149- The current execution point including process, thread, PC, and stack frame.
150- Some expressions are evaluated with the stack initialized with a specific
151  value or with the location of a base object that is available using the
152  DW_OP_push_object_address operation.
153
## Examples
155
156The following examples illustrate how DWARF expressions involving operations are
157evaluated in DWARF 5. DWARF also has expressions involving location lists that
158are not covered in these examples.
159
### Dynamic Array Size
161
162The first example is for an operation expression associated with a DIE attribute
163that provides the number of elements in a dynamic array type. Such an attribute
164dictates that the expression must be evaluated in the context of providing a
165value result kind.
166
167![Dynamic Array Size Example](images/01-value.example.png)
168
169In this hypothetical example, the compiler has allocated an array descriptor in
170memory and placed the descriptor's address in architecture register SGPR0. The
171first location of the array descriptor is the runtime size of the array.
172
173A possible expression to retrieve the dynamic size of the array is:
174
    DW_OP_regval_type SGPR0 Generic
    DW_OP_deref
177
178The expression is evaluated one operation at a time. Operations have operands
179and can pop and push entries on a stack.
180
181![Dynamic Array Size Example: Step 1](images/01-value.example.frame.1.png)
182
The expression evaluation starts with the first DW_OP_regval_type operation.
This operation reads the current value of an architecture register specified by
its first operand: SGPR0. The second operand specifies the base type, and hence
the size, of the data to read. The read value is pushed on the stack. Each stack
element is a value and its associated type.
188
189![Dynamic Array Size Example: Step 2](images/01-value.example.frame.2.png)
190
191The type must be a DWARF base type. It specifies the encoding, byte ordering,
192and size of values of the type. DWARF defines that each architecture has a
193default generic type: it is an architecture specific integral encoding and byte
194ordering, that is the size of the architecture's global memory address.
195
196The DW_OP_deref operation pops a value off the stack, treats it as a global
197memory address, and reads the contents of that location using the generic type.
198It pushes the read value on the stack as the value and its associated generic
199type.
200
201![Dynamic Array Size Example: Step 3](images/01-value.example.frame.3.png)
202
203The evaluation stops when it reaches the end of the expression. The result of an
204expression that is evaluated with a value result kind context is the top element
205of the stack, which provides the value and its type.
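
The evaluation steps above can be sketched as a tiny stack machine. This is only
an illustration under stated assumptions, not how any particular consumer is
implemented: the register contents, the memory contents, and the
`evaluate_value` helper are hypothetical, and operations are modeled as Python
tuples rather than encoded bytes.

```python
# Hypothetical machine state: SGPR0 holds the address of the array descriptor,
# and the first location of the descriptor holds the runtime array size.
REGISTERS = {"SGPR0": 0x1000}
MEMORY = {0x1000: 42}

def evaluate_value(expr):
    """Evaluate (opcode, *operands) tuples and return the top stack entry."""
    stack = []  # each entry is a (value, type) pair
    for op, *operands in expr:
        if op == "DW_OP_regval_type":
            reg, base_type = operands
            stack.append((REGISTERS[reg], base_type))
        elif op == "DW_OP_deref":
            address, _ = stack.pop()            # treat the value as an address
            stack.append((MEMORY[address], "Generic"))
        else:
            raise ValueError(f"unsupported operation: {op}")
    return stack[-1]                            # value result kind: top of stack

array_size = evaluate_value([
    ("DW_OP_regval_type", "SGPR0", "Generic"),
    ("DW_OP_deref",),
])
```

With the made-up state above, `array_size` ends up as the value read from the
descriptor together with the generic type.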
206
### Variable Location in Register
208
209This example is for an operation expression associated with a DIE attribute that
210provides the location of a source language variable. Such an attribute dictates
211that the expression must be evaluated in the context of providing a location
212result kind.
213
214DWARF defines the locations of objects in terms of location descriptions.
215
216In this example, the compiler has allocated a source language variable in
217architecture register SGPR0.
218
219![Variable Location in Register Example](images/02-reg.example.png)
220
221A possible expression to specify the location of the variable is:
222
    DW_OP_regx SGPR0
224
225![Variable Location in Register Example: Step 1](images/02-reg.example.frame.1.png)
226
The DW_OP_regx operation creates a location description that specifies the
location of the architecture register specified by the operand: SGPR0. Unlike
values, location descriptions are not pushed on the stack. Instead they are
conceptually placed in a location area. Location descriptions also do not have
an associated type; they only denote the location of the base of the object.
233
234![Variable Location in Register Example: Step 2](images/02-reg.example.frame.2.png)
235
236Again, evaluation stops when it reaches the end of the expression. The result of
237an expression that is evaluated with a location result kind context is the
238location description in the location area.
239
### Variable Location in Memory
241
242The next example is for an operation expression associated with a DIE attribute
243that provides the location of a source language variable that is allocated in a
244stack frame. The compiler has placed the stack frame pointer in architecture
245register SGPR0, and allocated the variable at offset 0x10 from the stack frame
246base. The stack frames are allocated in global memory, so SGPR0 contains a
247global memory address.
248
249![Variable Location in Memory Example](images/03-memory.example.png)
250
251A possible expression to specify the location of the variable is:
252
    DW_OP_regval_type SGPR0 Generic
    DW_OP_plus_uconst 0x10
255
256![Variable Location in Memory Example: Step 1](images/03-memory.example.frame.1.png)
257
258As in the previous example, the DW_OP_regval_type operation pushes the stack
259frame pointer global memory address onto the stack. The generic type is the size
260of a global memory address.
261
262![Variable Location in Memory Example: Step 2](images/03-memory.example.frame.2.png)
263
264The DW_OP_plus_uconst operation pops a value from the stack, which must have a
265type with an integral encoding, adds the value of its operand, and pushes the
266result back on the stack with the same associated type. In this example, that
267computes the global memory address of the source language variable.
268
269![Variable Location in Memory Example: Step 3](images/03-memory.example.frame.3.png)
270
271Evaluation stops when it reaches the end of the expression. If the expression
272that is evaluated has a location result kind context, and the location area is
273empty, then the top stack element must be a value with the generic type. The
274value is implicitly popped from the stack, and treated as a global memory
275address to create a global memory location description, which is placed in the
276location area. The result of the expression is the location description in the
277location area.
278
279![Variable Location in Memory Example: Step 4](images/03-memory.example.frame.4.png)
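
The end-of-expression rule can be sketched as follows. This is a simplified
model, assuming a hypothetical register value and a hypothetical
`evaluate_location` helper; it keeps the DWARF 5 location area separate from the
value stack and performs the implicit conversion only when the location area is
empty.

```python
REGISTERS = {"SGPR0": 0x2000}   # hypothetical stack frame pointer value

def evaluate_location(expr):
    """Return the location description left in the location area."""
    stack = []             # value stack of (value, type) pairs
    location_area = None   # DWARF 5 keeps location descriptions off the stack
    for op, *operands in expr:
        if op == "DW_OP_regval_type":
            reg, base_type = operands
            stack.append((REGISTERS[reg], base_type))
        elif op == "DW_OP_plus_uconst":
            value, base_type = stack.pop()
            stack.append((value + operands[0], base_type))
        elif op == "DW_OP_regx":
            location_area = ("register", operands[0])
        else:
            raise ValueError(f"unsupported operation: {op}")
    if location_area is None:
        # Empty location area: the top stack value is implicitly popped and
        # treated as a global memory address.
        address, base_type = stack.pop()
        if base_type != "Generic":
            raise ValueError("top of stack must have the generic type")
        location_area = ("memory", address)
    return location_area

in_memory = evaluate_location([
    ("DW_OP_regval_type", "SGPR0", "Generic"),
    ("DW_OP_plus_uconst", 0x10),
])
in_register = evaluate_location([("DW_OP_regx", "SGPR0")])
```

The two calls mirror the memory and register examples: the first yields a memory
location at the frame pointer plus 0x10, the second a register location.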
280
### Variable Spread Across Different Locations
282
283This example is for a source variable that is partly in a register, partly undefined, and partly in memory.
284
285![Variable Spread Across Different Locations Example](images/04-composite.example.png)
286
287DWARF defines composite location descriptions that can have one or more parts.
288Each part specifies a location description and the number of bytes used from it.
289The following operation expression creates a composite location description.
290
    DW_OP_regx SGPR3
    DW_OP_piece 4
    DW_OP_piece 2
    DW_OP_bregx SGPR0 0x10
    DW_OP_piece 2
296
297![Variable Spread Across Different Locations Example: Step 1](images/04-composite.example.frame.1.png)
298
299The DW_OP_regx operation creates a register location description in the location
300area.
301
302![Variable Spread Across Different Locations Example: Step 2](images/04-composite.example.frame.2.png)
303
304The first DW_OP_piece operation creates an incomplete composite location
305description in the location area with a single part. The location description in
306the location area is used to define the beginning of the part for the size
307specified by the operand, namely 4 bytes.
308
309![Variable Spread Across Different Locations Example: Step 3](images/04-composite.example.frame.3.png)
310
311A subsequent DW_OP_piece adds a new part to an incomplete composite location
312description already in the location area. The parts form a contiguous set of
313bytes. If there are no other location descriptions in the location area, and no
314value on the stack, then the part implicitly uses the undefined location
315description. Again, the operand specifies the size of the part in bytes. The
316undefined location description can be used to indicate a part that has been
optimized away. In this case, the part is 2 bytes of undefined value.
318
319![Variable Spread Across Different Locations Example: Step 4](images/04-composite.example.frame.4.png)
320
321The DW_OP_bregx operation reads the architecture register specified by the first
322operand (SGPR0) as the generic type, adds the value of the second operand
323(0x10), and pushes the value on the stack.
324
325![Variable Spread Across Different Locations Example: Step 5](images/04-composite.example.frame.5.png)
326
327The next DW_OP_piece operation adds another part to the already created
328incomplete composite location.
329
If there is no other location in the location area, but there is a value on the
stack, the new part is a memory location description. The memory address used is
popped from the stack. In this case, the operand of 2 indicates there are 2
bytes from memory.
334
335![Variable Spread Across Different Locations Example: Step 6](images/04-composite.example.frame.6.png)
336
337Evaluation stops when it reaches the end of the expression. If the expression
338that is evaluated has a location result kind context, and the location area has
339an incomplete composite location description, the incomplete composite location
340is implicitly converted to a complete composite location description. The result
341of the expression is the location description in the location area.
342
343![Variable Spread Across Different Locations Example: Step 7](images/04-composite.example.frame.7.png)
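
The three cases that DW_OP_piece distinguishes in DWARF 5 can be sketched as
below. The tuple representation, the register value, and the
`evaluate_composite` helper are assumptions made for illustration only.

```python
REGISTERS = {"SGPR0": 0x2000}   # hypothetical register contents

def evaluate_composite(expr):
    """Return the composite's parts as ((kind, detail), size_in_bytes) pairs."""
    stack = []             # value stack (generic values only in this sketch)
    location_area = None   # pending location description, if any
    parts = []             # the incomplete composite being built
    for op, *operands in expr:
        if op == "DW_OP_regx":
            location_area = ("register", operands[0])
        elif op == "DW_OP_bregx":
            reg, offset = operands
            stack.append(REGISTERS[reg] + offset)
        elif op == "DW_OP_piece":
            if location_area is not None:       # a location description exists
                part, location_area = location_area, None
            elif stack:                         # a value on the stack: address
                part = ("memory", stack.pop())
            else:                               # nothing available: undefined
                part = ("undefined", None)
            parts.append((part, operands[0]))
        else:
            raise ValueError(f"unsupported operation: {op}")
    return parts            # implicitly completed at the end of the expression

parts = evaluate_composite([
    ("DW_OP_regx", "SGPR3"),
    ("DW_OP_piece", 4),
    ("DW_OP_piece", 2),
    ("DW_OP_bregx", "SGPR0", 0x10),
    ("DW_OP_piece", 2),
])
```

For the expression in this example the sketch produces a 4 byte register part,
a 2 byte undefined part, and a 2 byte memory part, in that order.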
344
### Offsetting a Composite Location
346
This example attempts to extend the previous example to offset the composite
location description it created. The *Variable Location in Memory* example
conveniently used the DW_OP_plus_uconst operation to offset a memory address.
350
    DW_OP_regx SGPR3
    DW_OP_piece 4
    DW_OP_piece 2
    DW_OP_bregx SGPR0 0x10
    DW_OP_piece 2
    DW_OP_plus_uconst 5
357
358![Offsetting a Composite Location Example: Step 6](images/05-composite-plus.example.frame.1.png)
359
However, DW_OP_plus_uconst cannot be used to offset a composite location
description. It only operates on values on the stack, whereas the composite
location description is in the location area.
362
363![Offsetting a Composite Location Example: Step 7](images/05-composite-plus.example.frame.2.png)
364
365To offset a composite location description, the compiler would need to make a
366different composite location description, starting at the part corresponding to
367the offset. For example:
368
    DW_OP_piece 1
    DW_OP_bregx SGPR0 0x10
    DW_OP_piece 2
372
373This illustrates that operations on stack values are not composable with
374operations on location descriptions.
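
The work a producer has to do by hand can be sketched as a function that
rebuilds the part list starting at a byte offset. The part representation and
the `offset_composite` helper are hypothetical; the sketch also gives up when
the offset lands inside a register part, since that case needs a different
description of the part.

```python
def offset_composite(parts, offset):
    """Return a new part list that starts `offset` bytes into `parts`."""
    result = []
    for (kind, detail), size in parts:
        if offset >= size:
            offset -= size              # part lies entirely before the offset
            continue
        if offset > 0:                  # the offset lands inside this part
            if kind == "memory":
                detail += offset        # advance the memory address
            elif kind != "undefined":
                raise NotImplementedError(f"cannot split a {kind} part here")
            size -= offset
            offset = 0
        result.append(((kind, detail), size))
    return result

# The composite from the previous example, with a made-up memory address.
parts = [
    (("register", "SGPR3"), 4),
    (("undefined", None), 2),
    (("memory", 0x2010), 2),
]
shifted = offset_composite(parts, 5)
```

With an offset of 5 the register part is dropped, 1 byte of the undefined part
remains, and the memory part is kept whole, which matches the replacement
expression shown above.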
375
## Limitations
377
378DWARF 5 is unable to describe variables in runtime indexed parts of registers.
379This is required to describe a source variable that is located in a lane of a
380SIMT vector register.
381
382Some features only work when located in global memory. The type attribute
383expressions require a base object which could be in any kind of location.
384
385DWARF procedures can only accept global memory address arguments. This limits
386the ability to factorize the creation of locations that involve other location
387kinds.
388
389There are no vector base types. This is required to describe vector registers.
390
391There is no operation to create a memory location in a non-global address space.
392Only the dereference operation supports providing an address space.
393
394CFI location expressions do not allow composite locations or non-global address
395space memory locations. Both these are needed in optimized code for devices with
396vector registers and address spaces.
397
398Bit field offsets are only supported in a limited way for register locations.
399Supporting them in a uniform manner for all location kinds is required to
400support languages with bit sized entities.
401
# Extension Solution
403
404This section outlines the extension to generalize the DWARF expression evaluation
405model to allow location descriptions to be manipulated on the stack. It presents
406a number of simplified examples to demonstrate the benefits and how the extension
407solves the issues of heterogeneous devices. It presents how this is done in
408a manner that is backwards compatible with DWARF 5.
409
## Location Description
411
412In order to have consistent, composable operations that act on location
413descriptions, the extension defines a uniform way to handle all location kinds.
414That includes memory, register, implicit, implicit pointer, undefined, and
415composite location descriptions.
416
417Each kind of location description is conceptually a zero-based offset within a
418piece of storage. The storage is a contiguous linear organization of a certain
419number of bytes (see below for how this is extended to support bit sized
420storage).
421
422- For global memory, the storage is the linear stream of bytes of the
423  architecture's address size.
424- For each separate architecture register, it is the linear stream of bytes of
425  the size of that specific register.
426- For an implicit, it is the linear stream of bytes of the value when
427  represented using the value's base type which specifies the encoding, size,
428  and byte ordering.
429- For undefined, it is an infinitely sized linear stream where every byte is
430  undefined.
431- For composite, it is a linear stream of bytes defined by the composite's parts.
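
One way to picture this uniform model is as a pair of a storage and an offset.
The following is a minimal sketch, not the extension's definition: the class
names, the `offset_by` method, and the 64 lane register size are assumptions
chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Storage:
    kind: str             # "memory", "register", "implicit", "undefined", ...
    name: str             # illustrative label, e.g. a register name
    size: Optional[int]   # size in bytes; None models an unbounded stream

@dataclass(frozen=True)
class LocationDescription:
    storage: Storage
    offset: int = 0       # zero-based byte offset into the storage

    def offset_by(self, delta: int) -> "LocationDescription":
        """Return a new location description moved by `delta` bytes."""
        new_offset = self.offset + delta
        size = self.storage.size
        if new_offset < 0 or (size is not None and new_offset > size):
            raise ValueError("offset falls outside the storage")
        return LocationDescription(self.storage, new_offset)

VGPR0 = Storage("register", "VGPR0", 64 * 4)   # assumed: 64 lanes of 4 bytes
UNDEFINED = Storage("undefined", "", None)     # every byte is undefined

lane5 = LocationDescription(VGPR0).offset_by(5 * 4)
far = LocationDescription(UNDEFINED).offset_by(1_000_000)
```

Because every kind shares the same shape, an operation that adjusts the offset
does not need to know which kind of storage it is acting on.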
432
## Stack Location Description Operations
434
435The DWARF expression stack is extended to allow each stack entry to either be a
436value or a location description.
437
438Evaluation rules are defined to implicitly convert a stack element that is a
439value to a location description, or vice versa, so that all DWARF 5 expressions
440continue to have the same semantics. This reflects that a memory address is
441effectively used as a proxy for a memory location description.
442
443For each place that allows a DWARF expression to be specified, it is defined if
444the expression is to be evaluated as a value or a location description.
445
446Existing DWARF expression operations that are used to act on memory addresses
447are generalized to act on any location description kind. For example, the
448DW_OP_deref operation pops a location description rather than a memory address
449value from the stack and reads the storage associated with the location kind
450starting at the location description's offset.
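
A generalized dereference, together with the implicit conversion of a value to
a memory location description, might look like the sketch below. The storage
contents, the 8 byte generic type, the little-endian byte ordering, and the
helper names are all assumptions made for this illustration.

```python
GENERIC_SIZE = 8        # assumed size of the generic type, in bytes
BYTE_ORDER = "little"   # assumed byte ordering of the architecture

# Stand-in storage, keyed by (kind, name).
STORAGE = {
    ("memory", None): bytes(range(32)),              # pretend global memory
    ("register", "VGPR0"): bytes(range(100, 132)),   # pretend vector register
}

def as_location(entry):
    """Implicitly convert a stack entry to a (kind, name, offset) location."""
    if isinstance(entry, int):          # a generic value acts as an address
        return ("memory", None, entry)
    return entry

def deref(stack):
    """Pop a location (or address value) and push the generic value read."""
    kind, name, offset = as_location(stack.pop())
    data = STORAGE[(kind, name)][offset:offset + GENERIC_SIZE]
    stack.append(int.from_bytes(data, BYTE_ORDER))

dwarf5_style = [8]                            # a plain memory address value
deref(dwarf5_style)
extended_style = [("register", "VGPR0", 4)]   # a register location description
deref(extended_style)
```

The same `deref` handles both stacks: the DWARF 5 style address is converted on
the fly, while the register location description is read from its own storage.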
451
Existing DWARF expression operations that create location descriptions are
changed to pop and push location descriptions on the stack. Examples include the
DW_OP_value, DW_OP_regx, DW_OP_implicit_value, DW_OP_implicit_pointer,
DW_OP_stack_value, and DW_OP_piece operations.
456
457New operations that act on location descriptions can be added. For example, a
458DW_OP_offset operation that modifies the offset of the location description on
top of the stack. Unlike the DW_OP_plus operation that only works with memory
addresses, a DW_OP_offset operation can work with any location kind.
461
462To allow incremental and nested creation of composite location descriptions, a
463DW_OP_piece_end can be defined to explicitly indicate the last part of a
464composite. Currently, creating a composite must always be the last operation of
465an expression.
466
467A DW_OP_undefined operation can be defined that explicitly creates the undefined
468location description. Currently this is only possible as a piece of a composite
469when the stack is empty.
470
## Examples
472
473This section provides some motivating examples to illustrate the benefits that
474result from allowing location descriptions on the stack.
475
### Source Language Variable Spilled to Part of a Vector Register
477
478A compiler generating code for a GPU may allocate a source language variable
479that it proves has the same value for every lane of a SIMT thread in a scalar
480register. It may then need to spill that scalar register. To avoid the high cost
481of spilling to memory, it may spill to a fixed lane of one of the numerous
482vector registers.
483
484![Source Language Variable Spilled to Part of a Vector Register Example](images/06-extension-spill-sgpr-to-static-vpgr-lane.example.png)
485
486The following expression defines the location of a source language variable that
487the compiler allocated in a scalar register, but had to spill to lane 5 of a
488vector register at this point of the code.
489
    DW_OP_regx VGPR0
    DW_OP_offset_uconst 20
492
493![Source Language Variable Spilled to Part of a Vector Register Example: Step 1](images/06-extension-spill-sgpr-to-static-vpgr-lane.example.frame.1.png)
494
495The DW_OP_regx pushes a register location description on the stack. The storage
496for the register is the size of the vector register. The register location
497description conceptually references that storage with an initial offset of 0.
498The architecture defines the byte ordering of the register.
499
500![Source Language Variable Spilled to Part of a Vector Register Example: Step 2](images/06-extension-spill-sgpr-to-static-vpgr-lane.example.frame.2.png)
501
The DW_OP_offset_uconst pops a location description off the stack, adds its
operand value to the offset, and pushes the updated location description back on
the stack. In this case the source language variable is being spilled to lane 5,
and each lane's component is 32 bits (4 bytes), so the offset is 5*4=20.
506
507![Source Language Variable Spilled to Part of a Vector Register Example: Step 3](images/06-extension-spill-sgpr-to-static-vpgr-lane.example.frame.3.png)
508
509The result of the expression evaluation is the location description on the top
510of the stack.
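
The offset arithmetic, and what a consumer would read at the resulting
location, can be sketched as follows. The lane count, the byte ordering, the
spilled value, and both helper names are assumptions; only the 4 byte lane
size and lane 5 come from the example.

```python
LANE_SIZE = 4     # bytes per lane component, as in the example
LANE_COUNT = 64   # assumed number of lanes; not stated in the text

def lane_offset(lane):
    """Byte offset of a lane's component within the vector register."""
    if not 0 <= lane < LANE_COUNT:
        raise ValueError("lane out of range")
    return lane * LANE_SIZE

def read_spilled(register_bytes, offset, size=LANE_SIZE):
    """Read `size` bytes at `offset`, assuming little-endian byte ordering."""
    return int.from_bytes(register_bytes[offset:offset + size], "little")

# Pretend VGPR0 storage with a made-up value spilled into lane 5.
vgpr0 = bytearray(LANE_COUNT * LANE_SIZE)
start = lane_offset(5)
vgpr0[start:start + LANE_SIZE] = (0xCAFE).to_bytes(LANE_SIZE, "little")

spilled_value = read_spilled(vgpr0, lane_offset(5))
```

Here `lane_offset(5)` is the same 20 that appears as the DW_OP_offset_uconst
operand.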
511
512An alternative approach could be for the target to define distinct register
513names for each part of each vector register. However, this is not practical for
514GPUs due to the sheer number of registers that would have to be defined. It
515would also not permit a runtime index into part of the whole register to be used
516as shown in the next example.
517
### Source Language Variable Spread Across Multiple Vector Registers
519
520A compiler may generate SIMT code for a GPU. Each source language thread of
521execution is mapped to a single lane of the GPU thread. Source language
522variables that are mapped to a register, are mapped to the lane component of the
523vector registers corresponding to the source language's thread of execution.
524
525The location expression for such variables must therefore be executed in the
526context of the focused source language thread of execution. A DW_OP_push_lane
527operation can be defined to push the value of the lane for the currently focused
528source language thread of execution. The value to use would be provided by the
529consumer of DWARF when it evaluates the location expression.
530
531If the source language variable is larger than the size of the vector register
532lane component, then multiple vector registers are used. Each source language
533thread of execution will only use the vector register components for its
534associated lane.
535
536![Source Language Variable Spread Across Multiple Vector Registers Example](images/07-extension-multi-lane-vgpr.example.png)
537
538The following expression defines the location of a source language variable that
539has to occupy two vector registers. A composite location description is created
540that combines the two parts. It will give the correct result regardless of which
541lane corresponds to the source language thread of execution that the user is
542focused on.
543
    DW_OP_regx VGPR0
    DW_OP_push_lane
    DW_OP_uconst 4
    DW_OP_mul
    DW_OP_offset
    DW_OP_piece 4
    DW_OP_regx VGPR1
    DW_OP_push_lane
    DW_OP_uconst 4
    DW_OP_mul
    DW_OP_offset
    DW_OP_piece 4
556
557![Source Language Variable Spread Across Multiple Vector Registers Example: Step 1](images/07-extension-multi-lane-vgpr.example.frame.1.png)
558
559The DW_OP_regx VGPR0 pushes a location description for the first register.
560
561![Source Language Variable Spread Across Multiple Vector Registers Example: Step 2](images/07-extension-multi-lane-vgpr.example.frame.2.png)
562
The DW_OP_push_lane; DW_OP_uconst 4; DW_OP_mul calculates the offset for the
focused lane's vector register component as 4 times the lane number.
565
566![Source Language Variable Spread Across Multiple Vector Registers Example: Step 3](images/07-extension-multi-lane-vgpr.example.frame.3.png)
567
568![Source Language Variable Spread Across Multiple Vector Registers Example: Step 4](images/07-extension-multi-lane-vgpr.example.frame.4.png)
569
570![Source Language Variable Spread Across Multiple Vector Registers Example: Step 5](images/07-extension-multi-lane-vgpr.example.frame.5.png)
571
572The DW_OP_offset adjusts the register location description's offset to the
573runtime computed value.
574
575![Source Language Variable Spread Across Multiple Vector Registers Example: Step 6](images/07-extension-multi-lane-vgpr.example.frame.6.png)
576
577The DW_OP_piece either creates a new composite location description, or adds a
578new part to an existing incomplete one. It pops the location description to use
579for the new part. It then pops the next stack element if it is an incomplete
580composite location description, otherwise it creates a new incomplete composite
581location description with no parts. Finally it pushes the incomplete composite
582after adding the new part.
583
584In this case a register location description is added to a new incomplete
585composite location description. The 4 of the DW_OP_piece specifies the size of
586the register storage that comprises the part. Note that the 4 bytes start at the
587computed register offset.
588
589For backwards compatibility, if the stack is empty or the top stack element is
590an incomplete composite, an undefined location description is used for the part.
591If the top stack element is a generic base type value, then it is implicitly
592converted to a global memory location description with an offset equal to the
593value.
594
595![Source Language Variable Spread Across Multiple Vector Registers Example: Step 7](images/07-extension-multi-lane-vgpr.example.frame.7.png)
596
597The rest of the expression does the same for VGPR1. However, when the
598DW_OP_piece is evaluated there is an incomplete composite on the stack. So the
599VGPR1 register location description is added as a second part.
600
601![Source Language Variable Spread Across Multiple Vector Registers Example: Step 8](images/07-extension-multi-lane-vgpr.example.frame.8.png)
602
603![Source Language Variable Spread Across Multiple Vector Registers Example: Step 9](images/07-extension-multi-lane-vgpr.example.frame.9.png)
604
605![Source Language Variable Spread Across Multiple Vector Registers Example: Step 10](images/07-extension-multi-lane-vgpr.example.frame.10.png)
606
607![Source Language Variable Spread Across Multiple Vector Registers Example: Step 11](images/07-extension-multi-lane-vgpr.example.frame.11.png)
608
609![Source Language Variable Spread Across Multiple Vector Registers Example: Step 12](images/07-extension-multi-lane-vgpr.example.frame.12.png)
610
611![Source Language Variable Spread Across Multiple Vector Registers Example: Step 13](images/07-extension-multi-lane-vgpr.example.frame.13.png)
612
At the end of the expression, if the top stack element is an incomplete
composite location description, it is converted to a complete composite location
description and returned as the result.
616
617![Source Language Variable Spread Across Multiple Vector Registers Example: Step 14](images/07-extension-multi-lane-vgpr.example.frame.14.png)
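
The whole evaluation can be sketched with a single stack that holds values,
location descriptions, and incomplete composites. This is an illustrative model
only: the tuple and list representations and the `evaluate` and `lane_piece`
helpers are assumptions, and the lane number is passed in by the caller to
stand for the consumer's focused lane.

```python
UNDEFINED = ("undefined", None, 0)

def evaluate(expr, lane):
    """Evaluate `expr` for `lane`; locations are (kind, name, offset) tuples."""
    stack = []   # ints are values, tuples are locations, lists are composites
    for op, *operands in expr:
        if op == "DW_OP_regx":
            stack.append(("register", operands[0], 0))
        elif op == "DW_OP_push_lane":
            stack.append(lane)
        elif op == "DW_OP_uconst":
            stack.append(operands[0])
        elif op == "DW_OP_mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        elif op == "DW_OP_offset":
            delta = stack.pop()
            kind, name, offset = stack.pop()
            stack.append((kind, name, offset + delta))
        elif op == "DW_OP_piece":
            if not stack or isinstance(stack[-1], list):
                part = UNDEFINED                  # backwards compatible rule
            else:
                part = stack.pop()
                if isinstance(part, int):         # generic value: an address
                    part = ("memory", None, part)
            # Extend the incomplete composite below the part, or start one.
            has_composite = bool(stack) and isinstance(stack[-1], list)
            composite = stack.pop() if has_composite else []
            stack.append(composite + [(part, operands[0])])
        else:
            raise ValueError(f"unsupported operation: {op}")
    return stack[-1]

def lane_piece(register):
    """The six operations that select one 4 byte lane component."""
    return [("DW_OP_regx", register), ("DW_OP_push_lane",), ("DW_OP_uconst", 4),
            ("DW_OP_mul",), ("DW_OP_offset",), ("DW_OP_piece", 4)]

expression = lane_piece("VGPR0") + lane_piece("VGPR1")
result = evaluate(expression, lane=3)
```

For lane 3 both parts end up at byte offset 12 of their register, and changing
only the `lane` argument moves both parts together.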
618
619### Source Language Variable Spread Across Multiple Kinds of Locations
620
621This example is the same as the previous one, except the first 2 bytes of the
622second vector register have been spilled to memory, and the last 2 bytes have
623been proven to be a constant and optimized away.
624
625![Source Language Variable Spread Across Multiple Kinds of Locations Example](images/08-extension-mixed-composite.example.png)
626
627    DW_OP_regx VGPR0
628    DW_OP_push_lane
629    DW_OP_uconst 4
630    DW_OP_mul
631    DW_OP_offset
632    DW_OP_piece 4
633    DW_OP_addr 0xbeef
634    DW_OP_piece 2
635    DW_OP_uconst 0xf00d
636    DW_OP_stack_value
637    DW_OP_piece 2
638    DW_OP_piece_end
639
The first 6 operations are the same as in the previous example.
641
642![Source Language Variable Spread Across Multiple Kinds of Locations Example: Step 7](images/08-extension-mixed-composite.example.frame.1.png)
643
644The DW_OP_addr operation pushes a global memory location description on the
645stack with an offset equal to the address.
646
647![Source Language Variable Spread Across Multiple Kinds of Locations Example: Step 8](images/08-extension-mixed-composite.example.frame.2.png)
648
649The next DW_OP_piece adds the global memory location description as the next 2
650byte part of the composite.
651
652![Source Language Variable Spread Across Multiple Kinds of Locations Example: Step 9](images/08-extension-mixed-composite.example.frame.3.png)
653
The DW_OP_uconst 0xf00d; DW_OP_stack_value sequence pushes an implicit location
description on the stack. The storage of the implicit location description is
the representation of the value 0xf00d using the generic base type's encoding,
size, and byte ordering.
658
659![Source Language Variable Spread Across Multiple Kinds of Locations Example: Step 10](images/08-extension-mixed-composite.example.frame.4.png)
660
661![Source Language Variable Spread Across Multiple Kinds of Locations Example: Step 11](images/08-extension-mixed-composite.example.frame.5.png)
662
663The final DW_OP_piece adds 2 bytes of the implicit location description as the
664third part of the composite location description.
665
666![Source Language Variable Spread Across Multiple Kinds of Locations Example: Step 12](images/08-extension-mixed-composite.example.frame.6.png)
667
668The DW_OP_piece_end operation explicitly makes the incomplete composite location
669description into a complete location description. This allows a complete
670composite location description to be created on the stack that can be used as
671the location description of another following operation. For example, the
672DW_OP_offset can be applied to it. More practically, it permits creation of
673multiple composite location descriptions on the stack which can be used to pass
arguments to a DWARF procedure using a DW_OP_call* operation. This can be
beneficial to factor the incremental creation of location descriptions.
676
![Source Language Variable Spread Across Multiple Kinds of Locations Example: Step 13](images/08-extension-mixed-composite.example.frame.7.png)
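
To make the resulting composite concrete, the following sketch reads the variable's bytes through its three parts. It is only an illustration: the `read_composite` helper, the `(storage, offset, size)` tuples, the lane number, and all storage contents are invented for this sketch. It also assumes the generic type is 8 bytes and little-endian, which in practice is defined by the target architecture.

```python
def read_composite(parts, storages):
    """Concatenate the bytes covered by each (storage, offset, size) part."""
    data = bytearray()
    for storage, offset, size in parts:
        data += storages[storage][offset:offset + size]
    return bytes(data)


lane = 2  # hypothetical current lane

memory = bytearray(0x10000)          # made-up global memory contents
memory[0xBEEF:0xBEF1] = b"\xaa\xbb"

storages = {
    "VGPR0": bytes(range(256)),      # made-up contents: 64 lanes * 4 bytes
    "memory": memory,
    # Assumption: the generic type is 8 bytes, little-endian.
    "implicit": (0xF00D).to_bytes(8, "little"),
}

parts = [
    ("VGPR0", lane * 4, 4),   # DW_OP_regx ... DW_OP_offset; DW_OP_piece 4
    ("memory", 0xBEEF, 2),    # DW_OP_addr 0xbeef; DW_OP_piece 2
    ("implicit", 0, 2),       # DW_OP_uconst 0xf00d; DW_OP_stack_value; DW_OP_piece 2
]

value = read_composite(parts, storages)
```

A consumer reading the variable does not need to know that its 8 bytes come from a register, memory, and a constant. It only walks the parts in order.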
678
679### Address Spaces
680
681Heterogeneous devices can have multiple hardware supported address spaces which
682use specific hardware instructions to access them.
683
684For example, GPUs that use SIMT execution may provide hardware support to access
685memory such that each lane can see a linear memory view, while the backing
memory is actually being accessed in an interleaved manner so that the locations
for each lane's Nth dword are contiguous. This minimizes cache lines read by the
SIMT execution.
689
690![Address Spaces Example](images/09-extension-form-aspace.example.png)
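
One way to picture such interleaving is the mapping below. It is a sketch under stated assumptions, not a description of any particular hardware: the 64 lane count, the 4 byte dword, and the `interleaved_offset` function are all chosen for illustration, and real devices may swizzle addresses differently.

```python
LANES = 64   # assumed number of lanes executing together
DWORD = 4    # bytes per dword


def interleaved_offset(lane, lane_address):
    """Map a lane's linear byte address to an offset in the backing memory."""
    dword_index, byte_in_dword = divmod(lane_address, DWORD)
    # All lanes' Nth dwords sit next to each other in the backing memory.
    return (dword_index * LANES + lane) * DWORD + byte_in_dword


# Backing offsets of dword 2 (linear address 8) for the first four lanes.
neighbours = [interleaved_offset(lane, 8) for lane in range(4)]
```

Under these assumptions, adjacent lanes that access the same linear address touch adjacent dwords of the backing memory.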
691
The following expression defines the location of a source language variable that
is allocated at offset 0x10 in the current subprogram's stack frame. The
subprogram stack frames are per lane and reside in an interleaved address space.
695
696    DW_OP_regval_type SGPR0 Generic
697    DW_OP_uconst 1
698    DW_OP_form_aspace_address
699    DW_OP_offset 0x10
700
701![Address Spaces Example: Step 1](images/09-extension-form-aspace.example.frame.1.png)
702
703The DW_OP_regval_type operation pushes the contents of SGPR0 as a generic value.
704This is the register that holds the address of the current stack frame.
705
706![Address Spaces Example: Step 2](images/09-extension-form-aspace.example.frame.2.png)
707
708The DW_OP_uconst operation pushes the address space number. Each architecture
709defines the numbers it uses in DWARF. In this case, address space 1 is being
710used as the per lane memory.
711
712![Address Spaces Example: Step 3](images/09-extension-form-aspace.example.frame.3.png)
713
714The DW_OP_form_aspace_address operation pops a value and an address space
715number. Each address space is associated with a separate storage. A memory
716location description is pushed which refers to the address space's storage, with
717an offset of the popped value.
718
719![Address Spaces Example: Step 4](images/09-extension-form-aspace.example.frame.4.png)
720
721All operations that act on location descriptions work with memory locations
722regardless of their address space.
723
724Every architecture defines address space 0 as the default global memory address
725space.
726
727Generalizing memory location descriptions to include an address space component
728avoids having to create specialized operations to work with address spaces.
729
730The source variable is at offset 0x10 in the stack frame. The DW_OP_offset
731operation works on memory location descriptions that have an address space just
732like for any other kind of location description.
733
734![Address Spaces Example: Step 5](images/09-extension-form-aspace.example.frame.5.png)
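
The evaluation above can be traced with a minimal sketch. The `MemLoc` type, the `op_*` helpers, and the SGPR0 value are hypothetical and introduced only for this illustration. The sketch models a memory location description as an address space number plus a byte offset, and passes the 0x10 operand directly to the offset helper, mirroring how the listing writes it.

```python
from typing import NamedTuple


class MemLoc(NamedTuple):
    """A memory location description: an address space plus a byte offset."""
    address_space: int
    offset: int


def op_form_aspace_address(stack):
    address_space = stack.pop()   # pushed last, so popped first
    address = stack.pop()
    stack.append(MemLoc(address_space, address))


def op_offset(stack, byte_offset):
    loc = stack.pop()
    # The address space is carried along unchanged; only the offset moves.
    stack.append(loc._replace(offset=loc.offset + byte_offset))


sgpr0 = 0x2000                    # made-up frame address held in SGPR0
stack = []
stack.append(sgpr0)               # DW_OP_regval_type SGPR0 Generic
stack.append(1)                   # DW_OP_uconst 1
op_form_aspace_address(stack)     # DW_OP_form_aspace_address
op_offset(stack, 0x10)            # DW_OP_offset 0x10
```

Because the address space is just a field of the location description, the offset helper needs no address space specific logic.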
735
736The only operations in DWARF 5 that take an address space are DW_OP_xderef*.
737They treat a value as the address in a specified address space, and read its
738contents. There is no operation to actually create a location description that
739references an address space. There is no way to include address space memory
740locations in parts of composite locations.
741
742Since DW_OP_piece now takes any kind of location description for its pieces, it
743is now possible for parts of a composite to involve locations in different
744address spaces. For example, this can happen when parts of a source variable
745allocated in a register are spilled to a stack frame that resides in the
746non-global address space.
747
748### Bit Offsets
749
750With the generalization of location descriptions on the stack, it is possible to
751define a DW_OP_bit_offset operation that adjusts the offset of any kind of
752location in terms of bits rather than bytes. The offset can be a runtime
computed value. This is generally useful for any source language that supports
bit sized entities, and for registers that are not a whole number of bytes.
755
756DWARF 5 only supports bit fields in composites using DW_OP_bit_piece. It does
757not support runtime computed offsets which can happen for bit field packed
758arrays. It is also not generally composable as it must be the last part of an
759expression.
760
761The following example defines a location description for a source variable that
762is allocated starting at bit 20 of a register. A similar expression could be
763used if the source variable was at a bit offset within memory or a particular
764address space, or if the offset is a runtime value.
765
766![Bit Offsets Example](images/10-extension-bit-offset.example.png)
767
768    DW_OP_regx SGPR3
769    DW_OP_uconst 20
770    DW_OP_bit_offset
771
772![Bit Offsets Example: Step 1](images/10-extension-bit-offset.example.frame.1.png)
773
774![Bit Offsets Example: Step 2](images/10-extension-bit-offset.example.frame.2.png)
775
776![Bit Offsets Example: Step 3](images/10-extension-bit-offset.example.frame.3.png)
777
778The DW_OP_bit_offset operation pops a value and location description from the
779stack. It pushes the location description after updating its offset using the
780value as a bit count.
781
782![Bit Offsets Example: Step 4](images/10-extension-bit-offset.example.frame.4.png)
783
784The ordering of bits within a byte, like byte ordering, is defined by the target
785architecture. A base type could be extended to specify bit ordering in addition
786to byte ordering.
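
The effect of DW_OP_bit_offset can be sketched as follows. The `Loc` type and both helpers are hypothetical names used only here, and the register contents are made up. The sketch assumes bit 0 is the least significant bit, whereas the actual bit ordering is defined by the target architecture.

```python
from typing import NamedTuple


class Loc(NamedTuple):
    """A location description reduced to a storage name and a bit offset."""
    storage: str
    bit_offset: int = 0


def op_bit_offset(stack):
    bits = stack.pop()            # the value: a bit count
    loc = stack.pop()             # the location description to adjust
    stack.append(loc._replace(bit_offset=loc.bit_offset + bits))


def read_bits(register_value, loc, width):
    # Assumption: bit 0 is the least significant bit of the register.
    return (register_value >> loc.bit_offset) & ((1 << width) - 1)


stack = [Loc("SGPR3")]            # DW_OP_regx SGPR3
stack.append(20)                  # DW_OP_uconst 20
op_bit_offset(stack)              # DW_OP_bit_offset

sgpr3 = 0xABC00000                # made-up register contents
field_value = read_bits(sgpr3, stack[-1], 12)
```

Since the bit count is popped from the stack, the same helper works when the offset is computed at runtime.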
787
788## Call Frame Information (CFI)
789
790DWARF defines call frame information (CFI) that can be used to virtually unwind
791the subprogram call stack. This involves determining the location where register
792values have been spilled. DWARF 5 limits these locations to either be registers
793or global memory. As shown in the earlier examples, heterogeneous devices may
794spill registers to parts of other registers, to non-global memory address
795spaces, or even a composite of different location kinds.
796
797Therefore, the extension extends the CFI rules to support any kind of location
798description, and operations to create locations in address spaces.
799
800## Objects Not In Byte Aligned Global Memory
801
DWARF 5 effectively supports only byte aligned memory locations on the stack, by
using a global memory address as a proxy for a memory location description. This
804is a problem for attributes that define DWARF expressions that require the
805location of some source language entity that is not allocated in byte aligned
806global memory.
807
808For example, the DWARF expression of the DW_AT_data_member_location attribute is
809evaluated with an initial stack containing the location of a type instance
810object. That object could be located in a register, in a non-global memory
811address space, be described by a composite location description, or could even
812be an implicit location description.
813
814A similar problem exists for DWARF expressions that use the
815DW_OP_push_object_address operation. This operation pushes the location of a
816program object associated with the attribute that defines the expression.
817
818Allowing any kind of location description on the stack permits the DW_OP_call*
819operations to be used to factor the creation of location descriptions. The
820inputs and outputs of the call are passed on the stack. For example, on GPUs an
821expression can be defined to describe the effective PC of inactive lanes of SIMT
822execution. This is naturally done by composing the result of expressions for
823each nested control flow region. This can be done by making each control flow
824region have its own DWARF procedure, and then calling it from the expressions of
825the nested control flow regions. The alternative is to make each control flow
826region have the complete expression which results in much larger DWARF and is
827less convenient to generate.
828
GPU compilers work hard to allocate objects in the larger number of registers to
reduce memory accesses. They also have to use different memory address spaces,
and they perform optimizations that result in composites of these. Allowing
operations to work with any kind of location description enables creating
expressions that support all of these.
834
835Full general support for bit fields and implicit locations benefits
836optimizations on any target.
837
838## Higher Order Operations
839
840The generalization allows an elegant way to add higher order operations that
841create location descriptions out of other location descriptions in a general
842composable manner.
843
844For example, a DW_OP_extend operation could create a composite location
845description out of a location description, an element size, and an element
846count. The resulting composite would effectively be a vector of element count
847elements with each element being the same location description of the specified
848bit size.
849
850A DW_OP_select_bit_piece operation could create a composite location description
851out of two location descriptions, a bit mask value, and an element size. The
852resulting composite would effectively be a vector of elements, selecting from
853one of the two input locations according to the bit mask.
854
855These could be used in the expression of an attribute that computes the
856effective PC of lanes of SIMT execution. The vector result efficiently computes
857the PC for each SIMT lane at once. The mask could be the hardware execution mask
858register that controls which SIMT lanes are executing. For active divergent
859lanes the vector element would be the current PC, and for inactive divergent
860lanes the PC would correspond to the source language line at which the lane is
861logically positioned.
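
A rough model of these two operations is sketched below. The operations are only proposed in the text, so everything here is an assumption made for illustration: the `Loc` type, the function signatures, the explicit element count, and the choice that the i-th selected part refers to the i-th element of its source location.

```python
from typing import NamedTuple


class Loc(NamedTuple):
    """A location description reduced to a storage name and a bit offset."""
    storage: str
    bit_offset: int = 0


def extend(loc, element_bits, count):
    """Composite of `count` parts that all refer to the same location."""
    return [(loc, element_bits) for _ in range(count)]


def select_bit_piece(zero_loc, one_loc, mask, element_bits, count):
    """Composite whose i-th part comes from one_loc if bit i of mask is set."""
    parts = []
    for i in range(count):
        source = one_loc if (mask >> i) & 1 else zero_loc
        # Assumption: each part refers to the i-th element of its source.
        shifted = source._replace(bit_offset=source.bit_offset + i * element_bits)
        parts.append((shifted, element_bits))
    return parts


# Four lanes with 64 bit PCs; lanes 0 and 2 are active in this made-up mask.
exec_mask = 0b0101
lane_pcs = select_bit_piece(Loc("saved_pc"), Loc("current_pc"), exec_mask, 64, 4)
broadcast = extend(Loc("pc"), 64, 2)
```

Here the storage names `saved_pc` and `current_pc` stand in for whatever location descriptions an expression would supply for inactive and active lanes.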
862
863Similarly, a DW_OP_overlay_piece operation could be defined that creates a
864composite location description out of two location descriptions, an offset
865value, and a size. The resulting composite would consist of parts that are
866equivalent to one of the location descriptions, but with the other location
867description replacing a slice defined by the offset and size. This could be used
868to efficiently express a source language array that has had a set of elements
869promoted into a vector register when executing a set of iterations of a loop in
870a SIMD manner.
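
The overlay idea can be sketched in the same spirit. This is a hypothetical model, not a definition of the operation: the `overlay_piece` function, its explicit `base_size` parameter, the `(storage, offset, size)` tuples, and the addresses and register name in the example are all invented for illustration.

```python
def overlay_piece(base, base_size, overlay, offset, size):
    """Replace `size` bytes of `base`, starting at `offset`, with `overlay`."""
    base_storage, base_offset = base
    overlay_storage, overlay_offset = overlay
    parts = [
        (base_storage, base_offset, offset),
        (overlay_storage, overlay_offset, size),
        (base_storage, base_offset + offset + size, base_size - offset - size),
    ]
    # Drop empty parts, for example when the overlay starts at offset 0.
    return [part for part in parts if part[2] > 0]


# A 16 element array of 4 byte values in memory at a made-up address, with
# elements 4..7 promoted into a made-up vector register.
array = overlay_piece(("memory", 0x1000), 64, ("VGPR5", 0), 16, 16)
```

The result is an ordinary composite, so any operation that accepts a location description can use it without knowing an overlay was involved.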
871
872## Objects In Multiple Places
873
874A compiler may allocate a source variable in stack frame memory, but for some
875range of code may promote it to a register. If the generated code does not
876change the register value, then there is no need to save it back to memory.
877Effectively, during that range, the source variable is in both memory and a
878register. If a consumer, such as a debugger, allows the user to change the value
879of the source variable in that PC range, then it would need to change both
880places.
881
DWARF 5 supports loclists, which can specify that a source language entity is in
different places at different PC locations. They can also express that a source
language entity is in multiple places at the same time.
885
886DWARF 5 defines operation expressions and loclists separately. In general, this
887is adequate as non-memory location descriptions can only be computed as the last
888step of an expression evaluation.
889
890However, allowing location descriptions on the stack permits non-memory location
891descriptions to be used in the middle of expression evaluation. For example, the
892DW_OP_call* and DW_OP_implicit_pointer operations can result in evaluating the
893expression of a DW_AT_location attribute of a DIE. The DW_AT_location attribute
894allows the loclist form. So the result could include multiple location
895descriptions.
896
897Similarly, the DWARF expression associated with attributes such as
898DW_AT_data_member_location that are evaluated with an initial stack containing a
899location description, or a DWARF operation expression that uses the
900DW_OP_push_object_address operation, may want to act on the result of another
901expression that returned a location description involving multiple places.
902
903Therefore, the extension needs to define how expression operations that use those
904results will behave. The extension does this by generalizing the expression stack
905to allow an entry to be one or more single location descriptions. In doing this,
906it unifies the definitions of DWARF operation expressions and loclist
907expressions in a natural way.
908
909All operations that act on location descriptions are extended to act on multiple
910single location descriptions. For example, the DW_OP_offset operation adds the
911offset to each single location description. The DW_OP_deref* operations simply
912read the storage of one of the single location descriptions, since multiple
913single location descriptions must all hold the same value. Similarly, if the
914evaluation of a DWARF expression results in multiple single location
915descriptions, the consumer can ensure any updates are done to all of them, and
916any reads can use any one of them.
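
A small sketch of how a consumer might treat such a stack entry follows. The representation of an entry as a list of `(storage, offset)` pairs, the helper names, and the storage names and sizes are assumptions made for this illustration only.

```python
def op_offset(entry, byte_offset):
    """Apply an offset to every single location description in the entry."""
    return [(storage, offset + byte_offset) for storage, offset in entry]


def write(entry, data, storages):
    """Update every place the object lives so they stay consistent."""
    for storage, offset in entry:
        storages[storage][offset:offset + len(data)] = data


def read(entry, size, storages):
    """Any single location will do, since all hold the same value."""
    storage, offset = entry[0]
    return bytes(storages[storage][offset:offset + size])


# A variable that is in stack frame memory and in a register at the same time.
storages = {"memory": bytearray(64), "R5": bytearray(8)}
variable = [("memory", 0x20), ("R5", 0)]

write(variable, (42).to_bytes(4, "little"), storages)
upper_half = op_offset(variable, 2)
```

After the write, both the memory copy and the register copy hold the new value, so a later read through either one returns the same bytes.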
917
918# Conclusion
919
920A strength of DWARF is that it has generally sought to provide generalized
921composable solutions that address many problems, rather than solutions that only
922address one-off issues. This extension attempts to follow that tradition by
923defining a backwards compatible composable generalization that can address a
924significant family of issues. It addresses the specific issues present for
925heterogeneous computing devices, provides benefits for non-heterogeneous
926devices, and can help address a number of other previously reported issues.
927
928# Further Information
929
930The following references provide additional information on the extension.
931
932Slides and a video of a presentation at the Linux Plumbers Conference 2021
933related to this extension are available.
934
935The LLVM compiler extension includes possible normative text changes for this
936extension as well as the operations mentioned in the motivating examples. It
937also covers other extensions needed for heterogeneous devices.
938
939- DWARF extensions for optimized SIMT/SIMD (GPU) debugging - Linux Plumbers Conference 2021
940  - [Video](https://www.youtube.com/watch?v=QiR0ra0ymEY&t=10015s)
941  - [Slides](https://linuxplumbersconf.org/event/11/contributions/1012/attachments/798/1505/DWARF_Extensions_for_Optimized_SIMT-SIMD_GPU_Debugging-LPC2021.pdf)
942- [DWARF Extensions For Heterogeneous Debugging](https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html)
943