================================
Source Level Debugging with LLVM
================================

.. contents::
   :local:

Introduction
============

This document is the central repository for all information pertaining to debug
information in LLVM.  It describes the :ref:`actual format that the LLVM debug
information takes <format>`, which is useful for those interested in creating
front-ends or dealing directly with the information.  Further, this document
provides specific examples of what debug information for C/C++ looks like.

Philosophy behind LLVM debugging information
--------------------------------------------

The idea of the LLVM debugging information is to capture how the important
pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
Several design aspects have shaped the solution that appears here.  The
important ones are:

* Debugging information should have very little impact on the rest of the
  compiler.  No transformations, analyses, or code generators should need to
  be modified because of debugging information.

* LLVM optimizations should interact in :ref:`well-defined and easily described
  ways <intro_debugopt>` with the debugging information.

* Because LLVM is designed to support arbitrary programming languages,
  LLVM-to-LLVM tools should not need to know anything about the semantics of
  the source-level language.

* Source-level languages are often **widely** different from one another.
  LLVM should not put any restrictions on the flavor of the source-language,
  and the debugging information should work with any language.

* With code generator support, it should be possible to use an LLVM compiler
  to compile a program to native machine code and standard debugging
  formats.  This allows compatibility with traditional machine-code level
  debuggers, like GDB or DBX.

The approach used by the LLVM implementation is to use a small set of
:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
between LLVM program objects and the source-level objects.  The description of
the source-level program is maintained in LLVM metadata in an
:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
currently uses working draft 7 of the `DWARF 3 standard
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_).

When a program is being debugged, a debugger interacts with the user and turns
the stored debug information into source-language specific information.  As
such, a debugger must be aware of the source-language, and is thus tied to a
specific language or family of languages.

Debug information consumers
---------------------------

The role of debug information is to provide meta information normally stripped
away during the compilation process.  This meta information provides an LLVM
user with a relationship between generated code and the original program source
code.

Currently, there are two backend consumers of debug info: DwarfDebug and
CodeViewDebug. DwarfDebug produces DWARF suitable for use with GDB, LLDB, and
other DWARF-based debuggers. :ref:`CodeViewDebug <codeview>` produces CodeView,
the Microsoft debug info format, which is usable with Microsoft debuggers such
as Visual Studio and WinDBG. LLVM's debug information format is mostly derived
from and inspired by DWARF, but it is feasible to translate into other target
debug info formats such as STABS.

It would also be reasonable to use debug information to feed profiling tools
for analysis of generated code, or tools for reconstructing the original
source from generated code.

.. _intro_debugopt:

Debug information and optimizations
-----------------------------------

An extremely high priority of LLVM debugging information is to make it interact
well with optimizations and analysis.  In particular, the LLVM debug
information provides the following guarantees:

* LLVM debug information **always provides information to accurately read
  the source-level state of the program**, regardless of which LLVM
  optimizations have been run. :doc:`HowToUpdateDebugInfo` specifies how debug
  info should be updated in various kinds of code transformations to avoid
  breaking this guarantee, and how to preserve as much useful debug info as
  possible.  Note that some optimizations may impact the ability to modify the
  current state of the program with a debugger, such as setting program
  variables, or calling functions that have been deleted.

* As desired, LLVM optimizations can be upgraded to be aware of debugging
  information, allowing them to update the debugging information as they
  perform aggressive optimizations.  This means that, with effort, the LLVM
  optimizers could optimize debug code just as well as non-debug code.

* LLVM debug information does not prevent optimizations from
  happening (for example inlining, basic block reordering/merging/cleanup,
  tail duplication, etc.).

* LLVM debug information is automatically optimized along with the rest of
  the program, using existing facilities.  For example, duplicate
  information is automatically merged by the linker, and unused information
  is automatically removed.

Basically, the debug information allows you to compile a program with
"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
the program as it executes from a debugger.  Compiling a program with
"``-O3 -g``" gives you full debug information that is always available and
accurate for reading (e.g., you get accurate stack traces despite tail call
elimination and inlining), but you might lose the ability to modify the program
and call functions which were optimized out of the program, or inlined away
completely.

The :doc:`LLVM test-suite <TestSuiteMakefileGuide>` provides a framework to
test the optimizer's handling of debugging information.  It can be run like
this:

.. code-block:: bash

  % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
  % make TEST=dbgopt

This will test the impact of debugging information on optimization passes.  If
debugging information influences optimization passes then it will be reported
as a failure.  See :doc:`TestingGuide` for more information on LLVM test
infrastructure and how to run various tests.

.. _format:

Debugging information format
============================

LLVM debugging information has been carefully designed to make it possible for
the optimizer to optimize the program and debugging information without
necessarily having to know anything about debugging information.  In
particular, the use of metadata avoids duplicated debugging information from
the beginning, and the global dead code elimination pass automatically deletes
debugging information for a function if it decides to delete the function.

To do this, most of the debugging information (descriptors for types,
variables, functions, source files, etc.) is inserted by the language front-end
in the form of LLVM metadata.

Debug information is designed to be agnostic about the target debugger and
debugging information representation (e.g., DWARF, Stabs, etc.).  It uses a
generic pass to decode the information that represents variables, types,
functions, namespaces, etc.  This allows for arbitrary source-language
semantics and type-systems to be used, as long as there is a module written for
the target debugger to interpret the information.

To provide basic functionality, the LLVM debugger does have to make some
assumptions about the source-level language being debugged, though it keeps
these to a minimum.  The only common features that the LLVM debugger assumes
exist are `source files <LangRef.html#difile>`_ and `program objects
<LangRef.html#diglobalvariable>`_.  These abstract objects are used by a
debugger to form stack traces, show information about local variables, etc.

This section of the documentation first describes the representation aspects
common to any source-language.  :ref:`ccxx_frontend` describes the data layout
conventions used by the C and C++ front-ends.

Debug information descriptors are `specialized metadata nodes
<LangRef.html#specialized-metadata>`_, first-class subclasses of ``Metadata``.

.. _format_common_intrinsics:

Debugger intrinsic functions
----------------------------

LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
track source local variables through optimization and code generation.

``llvm.dbg.addr``
^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.addr(metadata, metadata, metadata)

This intrinsic provides information about a local element (e.g., variable).
The first argument is metadata holding the address of the variable, typically a
static alloca in the function entry block.  The second argument is a
`local variable <LangRef.html#dilocalvariable>`_ containing a description of
the variable.  The third argument is a `complex expression
<LangRef.html#diexpression>`_.  An ``llvm.dbg.addr`` intrinsic describes the
*address* of a source variable.

.. code-block:: text

    %i.addr = alloca i32, align 4
    call void @llvm.dbg.addr(metadata i32* %i.addr, metadata !1,
                             metadata !DIExpression()), !dbg !2
    !1 = !DILocalVariable(name: "i", ...) ; int i
    !2 = !DILocation(...)
    ...
    %buffer = alloca [256 x i8], align 8
    ; The address of i is buffer+64.
    call void @llvm.dbg.addr(metadata [256 x i8]* %buffer, metadata !3,
                             metadata !DIExpression(DW_OP_plus, 64)), !dbg !4
    !3 = !DILocalVariable(name: "i", ...) ; int i
    !4 = !DILocation(...)

A frontend should generate exactly one call to ``llvm.dbg.addr`` at the point
of declaration of a source variable. Optimization passes that fully promote the
variable from memory to SSA values will replace this call with possibly
multiple calls to ``llvm.dbg.value``. Passes that delete stores effectively
perform partial promotion, and they will insert a mix of calls to
``llvm.dbg.value`` and ``llvm.dbg.addr`` to track the source variable value
when it is available.  After optimization, there may be multiple calls to
``llvm.dbg.addr`` describing the program points where the variable lives in
memory. All calls for the same concrete source variable must agree on the
memory location.
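
As a rough illustration of the partial promotion described above, the
following sketch assumes that a pass has deleted a store whose value is still
known.  The variable ``x``, the value ``%v``, and the metadata numbering are
hypothetical, and the exact intrinsics a given pass emits may differ:

.. code-block:: text

    %x.addr = alloca i32, align 4
    ; A store of 7 to %x.addr was deleted here, so the value of x is
    ; described directly for this range.
    call void @llvm.dbg.value(metadata i32 7, metadata !1,
                              metadata !DIExpression()), !dbg !2
    ...
    store i32 %v, i32* %x.addr, align 4
    ; From this point on, x lives in memory again.
    call void @llvm.dbg.addr(metadata i32* %x.addr, metadata !1,
                             metadata !DIExpression()), !dbg !2
    !1 = !DILocalVariable(name: "x", ...) ; int x
    !2 = !DILocation(...)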

``llvm.dbg.declare``
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.declare(metadata, metadata, metadata)

This intrinsic is identical to ``llvm.dbg.addr``, except that there can only be
one call to ``llvm.dbg.declare`` for a given concrete `local variable
<LangRef.html#dilocalvariable>`_. It is not control-dependent, meaning that if
a call to ``llvm.dbg.declare`` exists and has a valid location argument, that
address is considered to be the true home of the variable across its entire
lifetime. This makes it hard for optimizations to preserve accurate debug info
in the presence of ``llvm.dbg.declare``, so we are transitioning away from it,
and we plan to deprecate it in future LLVM releases.
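
For comparison with the ``llvm.dbg.addr`` example, a frontend-emitted
``llvm.dbg.declare`` might look like the following sketch.  The variable name
and the metadata numbering are illustrative only:

.. code-block:: text

    %x.addr = alloca i32, align 4
    ; The single declare for x: %x.addr is treated as its home for its
    ; entire lifetime.
    call void @llvm.dbg.declare(metadata i32* %x.addr, metadata !1,
                                metadata !DIExpression()), !dbg !2
    !1 = !DILocalVariable(name: "x", ...) ; int x
    !2 = !DILocation(...)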

``llvm.dbg.value``
^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.value(metadata, metadata, metadata)

This intrinsic provides information when a user source variable is set to a new
value.  The first argument is the new value (wrapped as metadata).  The second
argument is a `local variable <LangRef.html#dilocalvariable>`_ containing a
description of the variable.  The third argument is a `complex expression
<LangRef.html#diexpression>`_.

An ``llvm.dbg.value`` intrinsic describes the *value* of a source variable
directly, not its address.  Note that the value operand of this intrinsic may
be indirect (i.e., a pointer to the source variable), provided that interpreting
the complex expression derives the direct value.
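
The difference between a direct and an indirect value operand can be sketched
as follows.  This is a hedged illustration rather than compiler output: the
names ``%sum`` and ``%p`` and the metadata numbering are assumptions made for
the example.

.. code-block:: text

    ; Direct: the SSA value %sum is the current value of x.
    call void @llvm.dbg.value(metadata i32 %sum, metadata !1,
                              metadata !DIExpression()), !dbg !2
    ...
    ; Indirect: %p points at x, and DW_OP_deref loads through the pointer
    ; to obtain the value.
    call void @llvm.dbg.value(metadata i32* %p, metadata !1,
                              metadata !DIExpression(DW_OP_deref)), !dbg !2
    !1 = !DILocalVariable(name: "x", ...) ; int x
    !2 = !DILocation(...)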

Object lifetimes and scoping
============================

In many languages, the local variables in functions can have their lifetimes or
scopes limited to a subset of a function.  In the C family of languages, for
example, variables are only live (readable and writable) within the source
block that they are defined in.  In functional languages, values are only
readable after they have been defined.  Though this is a very obvious concept,
it is non-trivial to model in LLVM, because it has no notion of scoping in this
sense, and does not want to be tied to a language's scoping rules.

In order to handle this, the LLVM debug format uses the metadata attached to
LLVM instructions to encode line number and scoping information.  Consider the
following C fragment, for example:

.. code-block:: c

  1.  void foo() {
  2.    int X = 21;
  3.    int Y = 22;
  4.    {
  5.      int Z = 23;
  6.      Z = X;
  7.    }
  8.    X = Y;
  9.  }

.. FIXME: Update the following example to use llvm.dbg.addr once that is the
   default in clang.

Compiled to LLVM, this function would be represented like this:

.. code-block:: text

  ; Function Attrs: nounwind ssp uwtable
  define void @foo() #0 !dbg !4 {
  entry:
    %X = alloca i32, align 4
    %Y = alloca i32, align 4
    %Z = alloca i32, align 4
    call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
    store i32 21, i32* %X, align 4, !dbg !14
    call void @llvm.dbg.declare(metadata i32* %Y, metadata !15, metadata !13), !dbg !16
    store i32 22, i32* %Y, align 4, !dbg !16
    call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
    store i32 23, i32* %Z, align 4, !dbg !19
    %0 = load i32, i32* %X, align 4, !dbg !20
    store i32 %0, i32* %Z, align 4, !dbg !21
    %1 = load i32, i32* %Y, align 4, !dbg !22
    store i32 %1, i32* %X, align 4, !dbg !23
    ret void, !dbg !24
  }

  ; Function Attrs: nounwind readnone
  declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

  attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "frame-pointer"="all" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
  attributes #1 = { nounwind readnone }

  !llvm.dbg.cu = !{!0}
  !llvm.module.flags = !{!7, !8, !9}
  !llvm.ident = !{!10}

  !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !2, subprograms: !3, globals: !2, imports: !2)
  !1 = !DIFile(filename: "/dev/stdin", directory: "/Users/dexonsmith/data/llvm/debug-info")
  !2 = !{}
  !3 = !{!4}
  !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: false, retainedNodes: !2)
  !5 = !DISubroutineType(types: !6)
  !6 = !{null}
  !7 = !{i32 2, !"Dwarf Version", i32 2}
  !8 = !{i32 2, !"Debug Info Version", i32 3}
  !9 = !{i32 1, !"PIC Level", i32 2}
  !10 = !{!"clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)"}
  !11 = !DILocalVariable(name: "X", scope: !4, file: !1, line: 2, type: !12)
  !12 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
  !13 = !DIExpression()
  !14 = !DILocation(line: 2, column: 9, scope: !4)
  !15 = !DILocalVariable(name: "Y", scope: !4, file: !1, line: 3, type: !12)
  !16 = !DILocation(line: 3, column: 9, scope: !4)
  !17 = !DILocalVariable(name: "Z", scope: !18, file: !1, line: 5, type: !12)
  !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
  !19 = !DILocation(line: 5, column: 11, scope: !18)
  !20 = !DILocation(line: 6, column: 11, scope: !18)
  !21 = !DILocation(line: 6, column: 9, scope: !18)
  !22 = !DILocation(line: 8, column: 9, scope: !4)
  !23 = !DILocation(line: 8, column: 7, scope: !4)
  !24 = !DILocation(line: 9, column: 3, scope: !4)

This example illustrates a few important details about LLVM debugging
information.  In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
location information, which are attached to an instruction, are applied
together to allow a debugger to analyze the relationship between statements,
variable definitions, and the code used to implement the function.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
    ; [debug line = 2:9] [debug variable = X]

The first ``llvm.dbg.declare`` intrinsic encodes debugging information for the
variable ``X``.  The metadata ``!dbg !14`` attached to the intrinsic provides
scope information for the variable ``X``.

.. code-block:: text

  !14 = !DILocation(line: 2, column: 9, scope: !4)
  !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5,
                              isLocal: false, isDefinition: true, scopeLine: 1,
                              isOptimized: false, retainedNodes: !2)

Here ``!14`` is metadata providing `location information
<LangRef.html#dilocation>`_.  In this example, scope is encoded by ``!4``, a
`subprogram descriptor <LangRef.html#disubprogram>`_.  This way the location
information attached to the intrinsics indicates that the variable ``X`` is
declared at line number 2 at a function level scope in function ``foo``.

Now let's take another example.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
    ; [debug line = 5:11] [debug variable = Z]

The third ``llvm.dbg.declare`` intrinsic encodes debugging information for
variable ``Z``.  The metadata ``!dbg !19`` attached to the intrinsic provides
scope information for the variable ``Z``.

.. code-block:: text

  !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
  !19 = !DILocation(line: 5, column: 11, scope: !18)

Here ``!19`` indicates that ``Z`` is declared at line number 5 and column
number 11 inside of lexical scope ``!18``.  The lexical scope itself resides
inside of subprogram ``!4`` described above.

The scope information attached to each instruction provides a straightforward
way to find instructions covered by a scope.

Object lifetime in optimized code
=================================

In the example above, every variable assignment uniquely corresponds to a
memory store to the variable's position on the stack. However, in heavily
optimized code LLVM promotes most variables into SSA values, which can
eventually be placed in physical registers or memory locations. To track SSA
values through compilation, when objects are promoted to SSA values an
``llvm.dbg.value`` intrinsic is created for each assignment, recording the
variable's new location. Compared with the ``llvm.dbg.declare`` intrinsic:

* A dbg.value terminates the effect of any preceding dbg.values for (any
  overlapping fragments of) the specified variable.
* The dbg.value's position in the IR defines where in the instruction stream
  the variable's value changes.
* Operands can be constants, indicating the variable is assigned a
  constant value.
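
The three properties above can be seen together in a small sketch.  The
variable ``x``, the values ``%n`` and ``%inc``, and the metadata numbering are
hypothetical and chosen only for illustration:

.. code-block:: text

    ; x is the constant 0 from here ...
    call void @llvm.dbg.value(metadata i32 0, metadata !1,
                              metadata !DIExpression()), !dbg !2
    %inc = add i32 %n, 1
    ; ... until this point, where the earlier location ends and x is
    ; described by %inc instead.
    call void @llvm.dbg.value(metadata i32 %inc, metadata !1,
                              metadata !DIExpression()), !dbg !2
    !1 = !DILocalVariable(name: "x", ...) ; int x
    !2 = !DILocation(...)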

Care must be taken to update ``llvm.dbg.value`` intrinsics when optimization
passes alter or move instructions and blocks -- the developer could observe such
changes reflected in the value of variables when debugging the program. For any
execution of the optimized program, the set of variable values presented to the
developer by the debugger should not show a state that would never have existed
in the execution of the unoptimized program, given the same input. Doing so
risks misleading the developer by reporting a state that does not exist,
damaging their understanding of the optimized program and undermining their
trust in the debugger.

Sometimes perfectly preserving variable locations is not possible, often when a
redundant calculation is optimized out. In such cases, an ``llvm.dbg.value``
with operand ``undef`` should be used, to terminate earlier variable locations
and let the debugger present ``optimized out`` to the developer. Withholding
these potentially stale variable values from the developer diminishes the
amount of available debug information, but increases the reliability of the
remaining information.
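
A minimal sketch of such a terminating intrinsic, assuming a hypothetical
variable ``x`` whose defining computation has been removed (the metadata
numbering is illustrative):

.. code-block:: text

    ; The computation that produced the value of x was optimized out, so the
    ; previous location is terminated and no new value is offered.
    call void @llvm.dbg.value(metadata i32 undef, metadata !1,
                              metadata !DIExpression()), !dbg !2
    !1 = !DILocalVariable(name: "x", ...) ; int x
    !2 = !DILocation(...)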

To illustrate some potential issues, consider the following example:

.. code-block:: llvm

  define i32 @foo(i32 %bar, i1 %cond) {
  entry:
    call void @llvm.dbg.value(metadata i32 0, metadata !1, metadata !2)
    br i1 %cond, label %truebr, label %falsebr
  truebr:
    %tval = add i32 %bar, 1
    call void @llvm.dbg.value(metadata i32 %tval, metadata !1, metadata !2)
    %g1 = call i32 @gazonk()
    br label %exit
  falsebr:
    %fval = add i32 %bar, 2
    call void @llvm.dbg.value(metadata i32 %fval, metadata !1, metadata !2)
    %g2 = call i32 @gazonk()
    br label %exit
  exit:
    %merge = phi i32 [ %tval, %truebr ], [ %fval, %falsebr ]
    %g = phi i32 [ %g1, %truebr ], [ %g2, %falsebr ]
    call void @llvm.dbg.value(metadata i32 %merge, metadata !1, metadata !2)
    call void @llvm.dbg.value(metadata i32 %g, metadata !3, metadata !2)
    %plusten = add i32 %merge, 10
    %toret = add i32 %plusten, %g
    call void @llvm.dbg.value(metadata i32 %toret, metadata !1, metadata !2)
    ret i32 %toret
  }

This function contains two source-level variables, ``!1`` and ``!3``. It could,
perhaps, be optimized into the following code:

.. code-block:: llvm

  define i32 @foo(i32 %bar, i1 %cond) {
  entry:
    %g = call i32 @gazonk()
    %addoper = select i1 %cond, i32 11, i32 12
    %plusten = add i32 %bar, %addoper
    %toret = add i32 %plusten, %g
    ret i32 %toret
  }

What ``llvm.dbg.value`` intrinsics should be placed to represent the original
variable locations in this code? Unfortunately the second, third and fourth
dbg.values for ``!1`` in the source function have had their operands
(``%tval``, ``%fval``, ``%merge``) optimized out. Assuming we cannot recover
them, we might consider this placement of dbg.values:

.. code-block:: llvm

  define i32 @foo(i32 %bar, i1 %cond) {
  entry:
    call void @llvm.dbg.value(metadata i32 0, metadata !1, metadata !2)
    %g = call i32 @gazonk()
    call void @llvm.dbg.value(metadata i32 %g, metadata !3, metadata !2)
    %addoper = select i1 %cond, i32 11, i32 12
    %plusten = add i32 %bar, %addoper
    %toret = add i32 %plusten, %g
    call void @llvm.dbg.value(metadata i32 %toret, metadata !1, metadata !2)
    ret i32 %toret
  }

However, this will cause ``!3`` to have the return value of ``@gazonk()`` at
the same time as ``!1`` has the constant value zero -- a pair of assignments
that never occurred in the unoptimized program. To avoid this, we must terminate
the range over which ``!1`` has the constant value by inserting an undef
dbg.value before the dbg.value for ``!3``:

.. code-block:: llvm

  define i32 @foo(i32 %bar, i1 %cond) {
  entry:
    call void @llvm.dbg.value(metadata i32 0, metadata !1, metadata !2)
    %g = call i32 @gazonk()
    call void @llvm.dbg.value(metadata i32 undef, metadata !1, metadata !2)
    call void @llvm.dbg.value(metadata i32 %g, metadata !3, metadata !2)
    %addoper = select i1 %cond, i32 11, i32 12
    %plusten = add i32 %bar, %addoper
    %toret = add i32 %plusten, %g
    call void @llvm.dbg.value(metadata i32 %toret, metadata !1, metadata !2)
    ret i32 %toret
  }

In general, if any dbg.value has its operand optimized out and cannot be
recovered, then an undef dbg.value is necessary to terminate earlier variable
locations. Additional undef dbg.values may be necessary when the debugger can
observe re-ordering of assignments.
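
The effect of an undef dbg.value can be modelled outside of LLVM. The following
C++ sketch is not LLVM code: ``DbgValue``, ``State`` and ``snapshots`` are
hypothetical names chosen for illustration, and locations are plain strings. It
replays a sequence of dbg.values in program order and records the variable
state a debugger could present after each one, with an empty ``std::optional``
standing for ``undef``.

.. code-block:: c++

  #include <map>
  #include <optional>
  #include <string>
  #include <vector>

  // Hypothetical model of one dbg.value: a variable and its operand.
  // An empty optional stands for an undef operand ("optimized out").
  struct DbgValue {
    std::string Var;
    std::optional<std::string> Loc;
  };

  // The variable values a debugger could present at one program point.
  using State = std::map<std::string, std::optional<std::string>>;

  // Replay the dbg.values in order and record the state after each one.
  std::vector<State> snapshots(const std::vector<DbgValue> &Seq) {
    std::vector<State> Result;
    State Current;
    for (const DbgValue &DV : Seq) {
      Current[DV.Var] = DV.Loc;
      Result.push_back(Current);
    }
    return Result;
  }

Replaying ``{!1, 0}`` followed by ``{!3, %g}`` produces a state that pairs
``!1 = 0`` with ``!3 = %g``, the combination that never existed in the
unoptimized program. Placing ``{!1, undef}`` before the ``!3`` assignment makes
``!1`` read as optimized out in that state instead.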

How variable location metadata is transformed during CodeGen
============================================================

LLVM preserves debug information throughout mid-level and backend passes,
ultimately producing a mapping between source-level information and
instruction ranges. This is relatively straightforward for line number
information, as mapping instructions to line numbers is a simple association.
For variable locations, however, the story is more complex. As each
``llvm.dbg.value`` intrinsic represents a source-level assignment of a value to
a source variable, the variable location intrinsics effectively embed a small
imperative program within the LLVM IR. By the end of CodeGen, this becomes a
mapping from each variable to its machine locations over ranges of
instructions. From IR to object emission, the major transformations which
affect variable location fidelity are:

1. Instruction Selection
2. Register allocation
3. Block layout

Each of these is discussed below. In addition, instruction scheduling can
significantly change the ordering of the program, and occurs in a number of
different passes.

Some variable locations are not transformed during CodeGen. Stack locations
specified by ``llvm.dbg.declare`` are valid and unchanging for the entire
duration of the function, and are recorded in a simple MachineFunction table.
Location changes in the prologue and epilogue of a function are also ignored:
frame setup and destruction may take several instructions, require a
disproportionate amount of debugging information in the output binary to
describe, and should be stepped over by debuggers anyway.

Variable locations in Instruction Selection and MIR
---------------------------------------------------

Instruction selection creates a MIR function from an IR function, and just as
it transforms ``intermediate`` instructions into machine instructions, so must
``intermediate`` variable locations become machine variable locations.
Within IR, variable locations are always identified by a Value, but in MIR
there can be different types of variable locations. In addition, some IR
locations become unavailable: for example, if the operations of multiple IR
instructions are combined into one machine instruction (such as
multiply-and-accumulate), then intermediate Values are lost. To track variable
locations through instruction selection, they are first separated into
locations that do not depend on code generation (constants, stack locations,
allocated virtual registers) and those that do. For those that do, debug
metadata is attached to SDNodes in SelectionDAGs. After instruction selection
has occurred and a MIR function is created, if the SDNode associated with debug
metadata is allocated a virtual register, that virtual register is used as the
variable location. If the SDNode is folded into a machine instruction or
otherwise transformed into a non-register, the variable location becomes
unavailable.

Locations that are unavailable are treated as if they have been optimized out:
in IR the location would be assigned ``undef`` by a debug intrinsic, and in MIR
the equivalent location is used.

After MIR locations are assigned to each variable, machine pseudo-instructions
corresponding to each ``llvm.dbg.value`` and ``llvm.dbg.addr`` intrinsic are
inserted. There are two forms of this type of instruction.

The first form, ``DBG_VALUE``, appears thus:

.. code-block:: text

  DBG_VALUE %1, $noreg, !123, !DIExpression()

And has the following operands:
 * The first operand can record the variable location as a register,
   a frame index, an immediate, or the base address register if the original
   debug intrinsic referred to memory. ``$noreg`` indicates the variable
   location is undefined, equivalent to an ``undef`` dbg.value operand.
 * The type of the second operand indicates whether the variable location is
   directly referred to by the DBG_VALUE, or whether it is indirect. The
   ``$noreg`` register signifies the former, an immediate operand (0) the
   latter.
 * Operand 3 is the Variable field of the original debug intrinsic.
 * Operand 4 is the Expression field of the original debug intrinsic.

The second form, ``DBG_VALUE_LIST``, appears thus:

.. code-block:: text

  DBG_VALUE_LIST !123, !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus), %1, %2

And has the following operands:
 * The first operand is the Variable field of the original debug intrinsic.
 * The second operand is the Expression field of the original debug intrinsic.
 * Any number of operands, from the 3rd onwards, record a sequence of variable
   location operands, which may take any of the same values as the first
   operand of the ``DBG_VALUE`` instruction above. These variable location
   operands are inserted into the final DWARF Expression in positions indicated
   by the ``DW_OP_LLVM_arg`` operator in the `DIExpression
   <LangRef.html#diexpression>`_.
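
To make the role of ``DW_OP_LLVM_arg`` concrete, the following C++ sketch
evaluates such an expression on a small stack machine. It is an illustration
only and not how LLVM emits DWARF: the ``Op`` enumeration, ``ExprElem`` and
``evaluate`` are hypothetical names, the opcodes are not LLVM's encodings, and
each location operand is modelled as the integer value held in the
corresponding register at runtime.

.. code-block:: c++

  #include <cstdint>
  #include <optional>
  #include <vector>

  // Hypothetical opcode set for this sketch; these are not LLVM's encodings.
  enum class Op { Arg, PlusUConst, Plus };

  struct ExprElem {
    Op Code;
    std::uint64_t Operand; // Argument index or constant; unused for Plus.
  };

  // Evaluate the expression on a stack. An Arg element with index N pushes
  // location operand N. Returns std::nullopt for malformed expressions.
  std::optional<std::uint64_t>
  evaluate(const std::vector<ExprElem> &Expr,
           const std::vector<std::uint64_t> &LocationOperands) {
    std::vector<std::uint64_t> Stack;
    for (const ExprElem &E : Expr) {
      switch (E.Code) {
      case Op::Arg:
        if (E.Operand >= LocationOperands.size())
          return std::nullopt;
        Stack.push_back(LocationOperands[E.Operand]);
        break;
      case Op::PlusUConst:
        if (Stack.empty())
          return std::nullopt;
        Stack.back() += E.Operand;
        break;
      case Op::Plus: {
        if (Stack.size() < 2)
          return std::nullopt;
        std::uint64_t RHS = Stack.back();
        Stack.pop_back();
        Stack.back() += RHS;
        break;
      }
      }
    }
    if (Stack.size() != 1)
      return std::nullopt;
    return Stack.back();
  }

With the expression from the ``DBG_VALUE_LIST`` above and location operands
holding 3 and 4, ``evaluate`` yields 7: the two ``Arg`` elements push the
operands in order and ``Plus`` adds them.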

The position at which the DBG_VALUEs are inserted should correspond to the
positions of their matching ``llvm.dbg.value`` intrinsics in the IR block.  As
with optimization, LLVM aims to preserve the order in which variable
assignments occurred in the source program. However SelectionDAG performs some
instruction scheduling, which can reorder assignments (discussed below).
Function parameter locations are moved to the beginning of the function if
they're not already, to ensure they're immediately available on function entry.

To demonstrate variable locations during instruction selection, consider
the following example:

.. code-block:: llvm

  define i32 @foo(i32* %addr) {
  entry:
    call void @llvm.dbg.value(metadata i32 0, metadata !3, metadata !DIExpression()), !dbg !5
    br label %bb1, !dbg !5

  bb1:                                              ; preds = %bb1, %entry
    %bar.0 = phi i32 [ 0, %entry ], [ %add, %bb1 ]
    call void @llvm.dbg.value(metadata i32 %bar.0, metadata !3, metadata !DIExpression()), !dbg !5
    %addr1 = getelementptr i32, i32 *%addr, i32 1, !dbg !5
    call void @llvm.dbg.value(metadata i32 *%addr1, metadata !3, metadata !DIExpression()), !dbg !5
    %loaded1 = load i32, i32* %addr1, !dbg !5
    %addr2 = getelementptr i32, i32 *%addr, i32 %bar.0, !dbg !5
    call void @llvm.dbg.value(metadata i32 *%addr2, metadata !3, metadata !DIExpression()), !dbg !5
    %loaded2 = load i32, i32* %addr2, !dbg !5
    %add = add i32 %bar.0, 1, !dbg !5
    call void @llvm.dbg.value(metadata i32 %add, metadata !3, metadata !DIExpression()), !dbg !5
    %added = add i32 %loaded1, %loaded2
    %cond = icmp ult i32 %added, %bar.0, !dbg !5
    br i1 %cond, label %bb1, label %bb2, !dbg !5

  bb2:                                              ; preds = %bb1
    ret i32 0, !dbg !5
  }

If one compiles this IR with ``llc -o - -start-after=codegen-prepare -stop-after=expand-isel-pseudos -mtriple=x86_64--``, the following MIR is produced:

.. code-block:: text

  bb.0.entry:
    successors: %bb.1(0x80000000)
    liveins: $rdi

    %2:gr64 = COPY $rdi
    %3:gr32 = MOV32r0 implicit-def dead $eflags
    DBG_VALUE 0, $noreg, !3, !DIExpression(), debug-location !5

  bb.1.bb1:
    successors: %bb.1(0x7c000000), %bb.2(0x04000000)

    %0:gr32 = PHI %3, %bb.0, %1, %bb.1
    DBG_VALUE %0, $noreg, !3, !DIExpression(), debug-location !5
    DBG_VALUE %2, $noreg, !3, !DIExpression(DW_OP_plus_uconst, 4, DW_OP_stack_value), debug-location !5
    %4:gr32 = MOV32rm %2, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
    %5:gr64_nosp = MOVSX64rr32 %0, debug-location !5
    DBG_VALUE $noreg, $noreg, !3, !DIExpression(), debug-location !5
    %1:gr32 = INC32r %0, implicit-def dead $eflags, debug-location !5
    DBG_VALUE %1, $noreg, !3, !DIExpression(), debug-location !5
    %6:gr32 = ADD32rm %4, %2, 4, killed %5, 0, $noreg, implicit-def dead $eflags :: (load 4 from %ir.addr2)
    %7:gr32 = SUB32rr %6, %0, implicit-def $eflags, debug-location !5
    JB_1 %bb.1, implicit $eflags, debug-location !5
    JMP_1 %bb.2, debug-location !5

  bb.2.bb2:
    %8:gr32 = MOV32r0 implicit-def dead $eflags
    $eax = COPY %8, debug-location !5
    RET 0, $eax, debug-location !5

Observe first that there is a DBG_VALUE instruction for every ``llvm.dbg.value``
intrinsic in the source IR, ensuring no source level assignments go missing.
Then consider the different ways in which variable locations have been recorded:

* For the first dbg.value an immediate operand is used to record a zero value.
* The dbg.value of the PHI instruction leads to a DBG_VALUE of virtual register
  ``%0``.
* The first GEP has its effect folded into the first load instruction
  (as a 4-byte offset), but the variable location is salvaged by folding
  the GEP's effect into the DIExpression.
* The second GEP is also folded into the corresponding load. However, it is
  insufficiently simple to be salvaged, and is emitted as a ``$noreg``
  DBG_VALUE, indicating that the variable takes on an undefined location.
* The final dbg.value has its Value placed in virtual register ``%1``.

Instruction Scheduling
----------------------

A number of passes can reschedule instructions, notably instruction selection
and the pre- and post-RA machine schedulers. Instruction scheduling can
significantly change the nature of the program -- in the (very unlikely) worst
case the instruction sequence could be completely reversed. In such
circumstances LLVM follows the principle applied to optimizations, that it is
better for the debugger not to display any state than a misleading state.
Thus, whenever instructions are advanced in order of execution, any
corresponding DBG_VALUE is kept in its original position, and if an instruction
is delayed then the variable is given an undefined location for the duration
of the delay. To illustrate, consider this pseudo-MIR:

.. code-block:: text

  %1:gr32 = MOV32rm %0, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
  DBG_VALUE %1, $noreg, !1, !2
  %4:gr32 = ADD32rr %3, %2, implicit-def dead $eflags
  DBG_VALUE %4, $noreg, !3, !4
  %7:gr32 = SUB32rr %6, %5, implicit-def dead $eflags
  DBG_VALUE %7, $noreg, !5, !6

Imagine that the SUB32rr were moved forward to give us the following MIR:

.. code-block:: text

  %7:gr32 = SUB32rr %6, %5, implicit-def dead $eflags
  %1:gr32 = MOV32rm %0, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
  DBG_VALUE %1, $noreg, !1, !2
  %4:gr32 = ADD32rr %3, %2, implicit-def dead $eflags
  DBG_VALUE %4, $noreg, !3, !4
  DBG_VALUE %7, $noreg, !5, !6

In this circumstance LLVM would leave the MIR as shown above. Were we to move
the DBG_VALUE of virtual register ``%7`` upwards with the SUB32rr, we would
re-order assignments and introduce a new state of the program. With the
solution above, by contrast, the debugger will see one fewer combination of
variable values, because ``!3`` and ``!5`` will change value at the same time.
This is preferred over misrepresenting the original program.

In comparison, if one were to sink the MOV32rm, LLVM would produce the
following:

.. code-block:: text

  DBG_VALUE $noreg, $noreg, !1, !2
  %4:gr32 = ADD32rr %3, %2, implicit-def dead $eflags
  DBG_VALUE %4, $noreg, !3, !4
  %7:gr32 = SUB32rr %6, %5, implicit-def dead $eflags
  DBG_VALUE %7, $noreg, !5, !6
  %1:gr32 = MOV32rm %0, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
  DBG_VALUE %1, $noreg, !1, !2

Here, to avoid presenting a state in which the first assignment to ``!1``
disappears, the DBG_VALUE at the top of the block assigns the variable the
undefined location, until its value is available at the end of the block where
an additional DBG_VALUE is added. Were any other DBG_VALUE for ``!1`` to occur
in the instructions that the MOV32rm was sunk past, the DBG_VALUE for ``%1``
would be dropped and the debugger would never observe it in the variable. This
accurately reflects that the value is not available during the corresponding
portion of the original program.

Variable locations during Register Allocation
---------------------------------------------

To avoid debug instructions interfering with the register allocator, the
LiveDebugVariables pass extracts variable locations from a MIR function and
deletes the corresponding DBG_VALUE instructions. Some localized copy
propagation is performed within blocks. After register allocation, the
VirtRegRewriter pass re-inserts DBG_VALUE instructions in their original
positions, translating virtual register references into their physical
machine locations. To avoid encoding incorrect variable locations, in this
pass any DBG_VALUE of a virtual register that is not live is replaced by
the undefined location. LiveDebugVariables may insert redundant DBG_VALUEs
because of virtual register rewriting. These are subsequently removed by
the RemoveRedundantDebugValues pass.
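
The rewriting rule can be summarised in a few lines. The C++ sketch below is a
simplified model rather than the VirtRegRewriter implementation:
``Assignment`` and ``rewriteLocation`` are hypothetical names, registers are
plain strings, and liveness is reduced to "the virtual register has an entry in
the map at this DBG_VALUE", whereas the real pass consults live intervals.

.. code-block:: c++

  #include <map>
  #include <optional>
  #include <string>

  // Hypothetical register allocation result at one DBG_VALUE position:
  // live virtual register -> assigned physical location. A virtual register
  // that is not live at this position has no entry.
  using Assignment = std::map<std::string, std::string>;

  // Returns the physical location for a DBG_VALUE of VReg, or std::nullopt
  // (modelling the undefined location, $noreg) if VReg is not live.
  std::optional<std::string> rewriteLocation(const std::string &VReg,
                                             const Assignment &Live) {
    auto It = Live.find(VReg);
    if (It == Live.end())
      return std::nullopt;
    return It->second;
  }

Returning an empty value rather than a guessed register mirrors the principle
used throughout this document: an undefined location is preferred over a
location that might hold an unrelated value.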

LiveDebugValues expansion of variable locations
-----------------------------------------------

After all optimizations have run and shortly before emission, the
LiveDebugValues pass runs to achieve two aims:

* To propagate the location of variables through copies and register spills.
* For every block, to record every valid variable location in that block.

After this pass the DBG_VALUE instruction changes meaning: rather than
corresponding to a source-level assignment where the variable may change value,
it asserts the location of a variable in a block, and loses effect outside the
block. Propagating variable locations through copies and spills is
straightforward; determining the variable location in every basic block,
however, requires consideration of control flow. Consider the following IR,
which presents several difficulties:

.. code-block:: text

  define dso_local i32 @foo(i1 %cond, i32 %input) !dbg !12 {
  entry:
    br i1 %cond, label %truebr, label %falsebr

  bb1:
    %value = phi i32 [ %value1, %truebr ], [ %value2, %falsebr ]
    br label %exit, !dbg !26

  truebr:
    call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !24
    call void @llvm.dbg.value(metadata i32 1, metadata !23, metadata !DIExpression()), !dbg !24
    %value1 = add i32 %input, 1
    br label %bb1

  falsebr:
    call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !24
    call void @llvm.dbg.value(metadata i32 2, metadata !23, metadata !DIExpression()), !dbg !24
    %value2 = add i32 %input, 2
    br label %bb1

  exit:
    ret i32 %value, !dbg !30
  }

Here the difficulties are:

* The control flow is roughly the opposite of basic block order
* The value of the ``!23`` variable merges into ``%bb1``, but there is no PHI
  node

As mentioned above, the ``llvm.dbg.value`` intrinsics essentially form an
imperative program embedded in the IR, with each intrinsic defining a variable
location. This *could* be converted to an SSA form by mem2reg, in the same way
that it uses use-def chains to identify control flow merges and insert phi
nodes for IR Values. However, because debug variable locations are defined for
every machine instruction, in effect every IR instruction uses every variable
location, which would lead to a large number of debugging intrinsics being
generated.

Examining the example above, variable ``!30`` is assigned ``%input`` on both
conditional paths through the function, while ``!23`` is assigned differing
constant values on either path. Where control flow merges in ``%bb1`` we would
want ``!30`` to keep its location (``%input``), but ``!23`` to become undefined
as we cannot determine at runtime what value it should have in ``%bb1`` without
inserting a PHI node. mem2reg does not insert the PHI node to avoid changing
codegen when debugging is enabled, and does not insert the other dbg.values
to avoid adding very large numbers of intrinsics.

Instead, LiveDebugValues determines variable locations when control
flow merges. A dataflow analysis is used to propagate locations between blocks:
when control flow merges, if a variable has the same location in all
predecessors then that location is propagated into the successor. If the
predecessor locations disagree, the location becomes undefined.
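
The merge rule can be sketched as a join over the locations leaving each
predecessor. The following C++ is a minimal model under stated assumptions, not
the LiveDebugValues implementation: ``LocMap`` and ``joinPredecessors`` are
hypothetical names, locations are plain strings, an undefined location is
modelled as a missing map entry, and only a single merge point is handled,
whereas the real analysis iterates over the whole control flow graph.

.. code-block:: c++

  #include <cstddef>
  #include <map>
  #include <string>
  #include <vector>

  // Variable -> location at the end of a block. A variable whose location
  // is undefined has no entry.
  using LocMap = std::map<std::string, std::string>;

  // Join the outgoing locations of all predecessors: a variable keeps its
  // location only if every predecessor agrees on it.
  LocMap joinPredecessors(const std::vector<LocMap> &PredOuts) {
    if (PredOuts.empty())
      return {};
    LocMap Result = PredOuts.front();
    for (std::size_t I = 1; I < PredOuts.size(); ++I) {
      const LocMap &Pred = PredOuts[I];
      for (auto It = Result.begin(); It != Result.end();) {
        auto Found = Pred.find(It->first);
        if (Found == Pred.end() || Found->second != It->second)
          It = Result.erase(It); // Disagreement: location becomes undefined.
        else
          ++It;
      }
    }
    return Result;
  }

Applied to the example, joining ``{!30: %input, !23: 1}`` from ``%truebr``
with ``{!30: %input, !23: 2}`` from ``%falsebr`` keeps ``!30`` at ``%input``
and drops ``!23``.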

Once LiveDebugValues has run, every block should have all valid variable
locations described by DBG_VALUE instructions within the block. Very little
effort is then required by supporting classes (such as
DbgEntityHistoryCalculator) to build a map of each instruction to every
valid variable location, without the need to consider control flow. From
the example above, it is otherwise difficult to determine that the location
of variable ``!30`` should flow "up" into block ``%bb1``, but that the location
of variable ``!23`` should not flow "down" into the ``%exit`` block.

.. _ccxx_frontend:

C/C++ front-end specific debug information
==========================================

The C and C++ front-ends represent information about the program in a
format that is effectively identical to `DWARF <http://www.dwarfstd.org/>`_
in terms of information content.  This allows code generators to
trivially support native debuggers by generating standard DWARF
information, and contains enough information for non-DWARF targets to
translate it as needed.

This section describes the forms used to represent C and C++ programs.  Other
languages could pattern themselves after this (which itself is tuned to
representing programs in the same way that DWARF does), or they could choose
to provide completely different forms if they don't fit into the DWARF model.
As support for debugging information gets added to the various LLVM
source-language front-ends, the information used should be documented here.

The following sections provide examples of a few C/C++ constructs and
the debug information that would best describe those constructs.  The
canonical references are the ``DINode`` classes defined in
``include/llvm/IR/DebugInfoMetadata.h`` and the implementations of the
helper functions in ``lib/IR/DIBuilder.cpp``.

C/C++ source file information
-----------------------------

``llvm::Instruction`` provides easy access to metadata attached to an
instruction.  One can extract line number information encoded in LLVM IR using
``Instruction::getDebugLoc()`` and ``DILocation::getLine()``.

.. code-block:: c++

  if (DILocation *Loc = I->getDebugLoc()) { // Here I is an LLVM instruction
    unsigned Line = Loc->getLine();
    StringRef File = Loc->getFilename();
    StringRef Dir = Loc->getDirectory();
    bool ImplicitCode = Loc->isImplicitCode();
  }
8996ac1de48SDmitri Gribenko
900eb7f6020SCalixte DenizetWhen the flag ImplicitCode is true then it means that the Instruction has been
901eb7f6020SCalixte Denizetadded by the front-end but doesn't correspond to source code written by the user. For example
902eb7f6020SCalixte Denizet
903eb7f6020SCalixte Denizet.. code-block:: c++
904eb7f6020SCalixte Denizet
905eb7f6020SCalixte Denizet  if (MyBoolean) {
906eb7f6020SCalixte Denizet    MyObject MO;
907eb7f6020SCalixte Denizet    ...
908eb7f6020SCalixte Denizet  }
909eb7f6020SCalixte Denizet
910eb7f6020SCalixte DenizetAt the end of the scope the MyObject's destructor is called but it isn't written
911eb7f6020SCalixte Denizetexplicitly. This information is useful to avoid to have counters on brackets when
912eb7f6020SCalixte Denizetmaking code coverage.
913eb7f6020SCalixte Denizet
C/C++ global variable information
---------------------------------

Given an integer global variable declared as follows:

.. code-block:: c

  _Alignas(8) int MyGlobal = 100;

a C/C++ front-end would generate the following descriptors:

.. code-block:: text

  ;;
  ;; Define the global itself.
  ;;
  @MyGlobal = global i32 100, align 8, !dbg !0

  ;;
  ;; List of compile units.
  ;;
  !llvm.dbg.cu = !{!1}

  ;; Some unrelated metadata.
  !llvm.module.flags = !{!6, !7}
  !llvm.ident = !{!8}

  ;; Define the debug info descriptor for the global variable.
  !0 = distinct !DIGlobalVariable(name: "MyGlobal", scope: !1, file: !2, line: 1, type: !5, isLocal: false, isDefinition: true, align: 64)

  ;; Define the compile unit.
  !1 = distinct !DICompileUnit(language: DW_LANG_C99, file: !2,
                               producer: "clang version 4.0.0",
                               isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug,
                               enums: !3, globals: !4)

  ;;
  ;; Define the file
  ;;
  !2 = !DIFile(filename: "/dev/stdin",
               directory: "/Users/dexonsmith/data/llvm/debug-info")

  ;; An empty array.
  !3 = !{}

  ;; The Array of Global Variables
  !4 = !{!0}

  ;;
  ;; Define the type
  ;;
  !5 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

  ;; Dwarf version to output.
  !6 = !{i32 2, !"Dwarf Version", i32 4}

  ;; Debug info schema version.
  !7 = !{i32 2, !"Debug Info Version", i32 3}

  ;; Compiler identification
  !8 = !{!"clang version 4.0.0"}


The ``align`` value in the ``DIGlobalVariable`` description specifies the
variable's alignment when it was forced by the C11 ``_Alignas()`` or C++11
``alignas()`` keywords or by the compiler attribute
``__attribute__((aligned ()))``.  Otherwise (when this field is missing), the
alignment is considered default.  The value is used to produce the
``DW_AT_alignment`` attribute in the DWARF output.

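In the example above, ``_Alignas(8)`` shows up as ``align: 64``, which
indicates that the metadata field is expressed in bits rather than bytes.  The
following minimal sketch illustrates the conversion; the helper names are
hypothetical and not part of the LLVM API, a missing ``align`` field is
modeled as ``0``, and the sketch assumes that ``DW_AT_alignment`` carries the
alignment in bytes.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical helper: convert an alignment in bits (as written in the
  // ``align:`` metadata field) to an alignment in bytes.
  constexpr uint32_t alignBitsToBytes(uint32_t AlignInBits) {
    return AlignInBits / 8;
  }

  // Hypothetical helper: a missing ``align:`` field is modeled as 0 here,
  // meaning that the default alignment applies.
  constexpr bool hasForcedAlignment(uint32_t AlignInBits) {
    return AlignInBits != 0;
  }

  // ``_Alignas(8)`` in the example above is recorded as ``align: 64``.
  static_assert(alignBitsToBytes(64) == 8, "64 bits correspond to 8 bytes");
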
C/C++ function information
--------------------------

Given a function declared as follows:

.. code-block:: c

  int main(int argc, char *argv[]) {
    return 0;
  }

a C/C++ front-end would generate the following descriptors:

.. code-block:: text

  ;;
  ;; Define the debug info descriptor for the subprogram.
  ;;
  !4 = !DISubprogram(name: "main", scope: !1, file: !1, line: 1, type: !5,
                     isLocal: false, isDefinition: true, scopeLine: 1,
                     flags: DIFlagPrototyped, isOptimized: false,
                     retainedNodes: !2)

  ;;
  ;; Define the subprogram itself.
  ;;
  define i32 @main(i32 %argc, i8** %argv) !dbg !4 {
  ...
  }

C++ specific debug information
==============================

C++ special member functions information
----------------------------------------

DWARF v5 introduces attributes that enhance the debugging information of C++
programs.  LLVM can generate (or omit) these DWARF attributes as appropriate.
In C++, a special member function (constructor, destructor, copy/move
constructor, or assignment operator) can be declared as deleted using the
C++11 ``= delete`` syntax.  This is represented in LLVM using the ``spFlags``
value ``DISPFlagDeleted``.

Given a class declaration with a copy constructor declared as deleted:

.. code-block:: c++

  class foo {
   public:
     foo(const foo&) = delete;
  };

a C++ front-end would generate the following:

.. code-block:: text

  !17 = !DISubprogram(name: "foo", scope: !11, file: !1, line: 5, type: !18, scopeLine: 5, flags: DIFlagPublic | DIFlagPrototyped, spFlags: DISPFlagDeleted)

and this will produce an additional DWARF attribute:

.. code-block:: text

  DW_TAG_subprogram [7] *
    DW_AT_name [DW_FORM_strx1]    (indexed (00000006) string = "foo")
    DW_AT_decl_line [DW_FORM_data1]       (5)
    ...
    DW_AT_deleted [DW_FORM_flag_present]  (true)

Fortran specific debug information
==================================

Fortran function information
----------------------------

There are a few DWARF attributes defined to support client debugging of
Fortran programs.  LLVM can generate (or omit) the appropriate DWARF
attributes for the prefix-specs of ``ELEMENTAL``, ``PURE``, ``IMPURE``,
``RECURSIVE``, and ``NON_RECURSIVE``.  This is done by using the ``spFlags``
values ``DISPFlagElemental``, ``DISPFlagPure``, and ``DISPFlagRecursive``.
Given a function declared as follows:

.. code-block:: fortran

  elemental function elem_func(a)

a Fortran front-end would generate the following descriptors:

.. code-block:: text

  !11 = distinct !DISubprogram(name: "subroutine2", scope: !1, file: !1,
          line: 5, type: !8, scopeLine: 6,
          spFlags: DISPFlagDefinition | DISPFlagElemental, unit: !0,
          retainedNodes: !2)

and this will materialize an additional DWARF attribute as:

.. code-block:: text

  DW_TAG_subprogram [3]
     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000010 ".text")
     DW_AT_high_pc [DW_FORM_data4]   (0x00000001)
     ...
     DW_AT_elemental [DW_FORM_flag_present]  (true)

There are a few DWARF tags defined to represent Fortran-specific constructs,
e.g., ``DW_TAG_string_type`` for representing Fortran ``character(n)``.  In
LLVM this is represented as ``DIStringType``.  Given the following
declaration:

.. code-block:: fortran

  character(len=*), intent(in) :: string

a Fortran front-end would generate the following descriptors:

.. code-block:: text

  !DILocalVariable(name: "string", arg: 1, scope: !10, file: !3, line: 4, type: !15)
  !DIStringType(name: "character(*)!2", stringLength: !16, stringLengthExpression: !DIExpression(), size: 32)

A Fortran deferred-length character can also carry information about the raw
storage of the characters in addition to the length of the string.  This
information is encoded in the ``stringLocationExpression`` field.  Based on
this information, a ``DW_AT_data_location`` attribute is emitted in the
``DW_TAG_string_type`` debug info entry.

.. code-block:: text

  !DIStringType(name: "character(*)!2", stringLengthExpression: !DIExpression(), stringLocationExpression: !DIExpression(DW_OP_push_object_address, DW_OP_deref), size: 32)

and this will materialize in DWARF tags as:

.. code-block:: text

   DW_TAG_string_type
                DW_AT_name      ("character(*)!2")
                DW_AT_string_length     (0x00000064)
   0x00000064:    DW_TAG_variable
                  DW_AT_location      (DW_OP_fbreg +16)
                  DW_AT_type  (0x00000083 "integer*8")
                  DW_AT_data_location (DW_OP_push_object_address, DW_OP_deref)
                  ...
                  DW_AT_artificial    (true)

A Fortran front-end may need to generate a *trampoline* function to call a
function defined in a different compilation unit. In this case, the front-end
can emit the following descriptor for the trampoline function:

.. code-block:: text

  !DISubprogram(name: "sub1_.t0p", linkageName: "sub1_.t0p", scope: !4, file: !4, type: !5, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition, unit: !7, retainedNodes: !24, targetFuncName: "sub1_")

The ``targetFuncName`` field is the name of the function that the trampoline
calls. This descriptor results in the following DWARF tag:

.. code-block:: text

  DW_TAG_subprogram
    ...
    DW_AT_linkage_name  ("sub1_.t0p")
    DW_AT_name          ("sub1_.t0p")
    DW_AT_trampoline    ("sub1_")

Debugging information format
============================

Debugging Information Extension for Objective C Properties
----------------------------------------------------------

Introduction
^^^^^^^^^^^^

Objective C provides a simpler way to declare and define accessor methods using
declared properties.  The language provides features to declare a property and
to let the compiler synthesize accessor methods.

The debugger lets developers inspect Objective C interfaces and their instance
variables and class variables.  However, the debugger does not know anything
about the properties defined in Objective C interfaces.  The debugger consumes
information generated by the compiler in DWARF format.  The format does not
support encoding of Objective C properties.  This proposal describes DWARF
extensions to encode Objective C properties, which the debugger can use to let
developers inspect Objective C properties.

Proposal
^^^^^^^^

Objective C properties exist separately from class members.  A property can be
defined only by "setter" and "getter" selectors, and be calculated anew on each
access.  Or a property can just be a direct access to some declared ivar.
Finally it can have an ivar "automatically synthesized" for it by the compiler,
in which case the property can be referred to in user code directly using the
standard C dereference syntax as well as through the property "dot" syntax, but
there is no entry in the ``@interface`` declaration corresponding to this ivar.

To facilitate debugging these properties, we will add a new DWARF TAG into the
``DW_TAG_structure_type`` definition for the class to hold the description of a
given property, and a set of DWARF attributes that provide said description.
The property tag will also contain the name and declared type of the property.

If there is a related ivar, there will also be a DWARF property attribute placed
in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG
for that property.  And in the case where the compiler synthesizes the ivar
directly, the compiler is expected to generate a ``DW_TAG_member`` for that
ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used
to access this ivar directly in code, and with the property attribute pointing
back to the property it is backing.

The following examples will serve as illustration for our discussion:

.. code-block:: objc

  @interface I1 {
    int n2;
  }

  @property int p1;
  @property int p2;
  @end

  @implementation I1
  @synthesize p1;
  @synthesize p2 = n2;
  @end

This produces the following DWARF (this is a "pseudo dwarfdump" output):

.. code-block:: none

  0x00000100:  TAG_structure_type [7] *
                 AT_APPLE_runtime_class( 0x10 )
                 AT_name( "I1" )
                 AT_decl_file( "Objc_Property.m" )
                 AT_decl_line( 3 )

  0x00000110:   TAG_APPLE_property
                  AT_name ( "p1" )
                  AT_type ( {0x00000150} ( int ) )

  0x00000120:   TAG_APPLE_property
                  AT_name ( "p2" )
                  AT_type ( {0x00000150} ( int ) )

  0x00000130:   TAG_member [8]
                  AT_name( "_p1" )
                  AT_APPLE_property ( {0x00000110} "p1" )
                  AT_type( {0x00000150} ( int ) )
                  AT_artificial ( 0x1 )

  0x00000140:    TAG_member [8]
                   AT_name( "n2" )
                   AT_APPLE_property ( {0x00000120} "p2" )
                   AT_type( {0x00000150} ( int ) )

  0x00000150:  AT_type( ( int ) )

Note, the current convention is that the name of the ivar for an
auto-synthesized property is the name of the property from which it derives
with an underscore prepended, as is shown in the example.  But we actually
don't need to know this convention, since we are given the name of the ivar
directly.

Also, it is common practice in ObjC to have different property declarations in
the ``@interface`` and ``@implementation`` - e.g. to provide a read-only
property in the interface, and a read-write property in the implementation.  In
that case, the compiler should emit whichever property declaration will be in
force in the current translation unit.

Developers can decorate a property with attributes which are encoded using
``DW_AT_APPLE_property_attribute``.

.. code-block:: objc

  @property (readonly, nonatomic) int pr;

.. code-block:: none

  TAG_APPLE_property [8]
    AT_name( "pr" )
    AT_type ( {0x00000147} (int) )
    AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)

The setter and getter method names are attached to the property using
``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes.

.. code-block:: objc

  @interface I1
  @property (setter=myOwnP3Setter:) int p3;
  -(void)myOwnP3Setter:(int)a;
  @end

  @implementation I1
  @synthesize p3;
  -(void)myOwnP3Setter:(int)a{ }
  @end

The DWARF for this would be:

.. code-block:: none

  0x000003bd: TAG_structure_type [7] *
                AT_APPLE_runtime_class( 0x10 )
                AT_name( "I1" )
                AT_decl_file( "Objc_Property.m" )
                AT_decl_line( 3 )

  0x000003cd:     TAG_APPLE_property
                    AT_name ( "p3" )
                    AT_APPLE_property_setter ( "myOwnP3Setter:" )
                    AT_type( {0x00000147} ( int ) )

  0x000003f3:     TAG_member [8]
                    AT_name( "_p3" )
                    AT_type ( {0x00000147} ( int ) )
                    AT_APPLE_property ( {0x000003cd} )
                    AT_artificial ( 0x1 )

New DWARF Tags
^^^^^^^^^^^^^^

+-----------------------+--------+
| TAG                   | Value  |
+=======================+========+
| DW_TAG_APPLE_property | 0x4200 |
+-----------------------+--------+

New DWARF Attributes
^^^^^^^^^^^^^^^^^^^^

+--------------------------------+--------+-----------+
| Attribute                      | Value  | Classes   |
+================================+========+===========+
| DW_AT_APPLE_property           | 0x3fed | Reference |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_getter    | 0x3fe9 | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_setter    | 0x3fea | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_attribute | 0x3feb | Constant  |
+--------------------------------+--------+-----------+

New DWARF Constants
^^^^^^^^^^^^^^^^^^^

+--------------------------------------+--------+
| Name                                 | Value  |
+======================================+========+
| DW_APPLE_PROPERTY_readonly           | 0x01   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_getter             | 0x02   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_assign             | 0x04   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_readwrite          | 0x08   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_retain             | 0x10   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_copy               | 0x20   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_nonatomic          | 0x40   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_setter             | 0x80   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_atomic             | 0x100  |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_weak               | 0x200  |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_strong             | 0x400  |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_unsafe_unretained  | 0x800  |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_nullability        | 0x1000 |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_null_resettable    | 0x2000 |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_class              | 0x4000 |
+--------------------------------------+--------+

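The constants in the table above are distinct powers of two, so a single
``DW_AT_APPLE_property_attribute`` value can presumably carry several of them
combined with a bitwise OR.  The following is a minimal sketch of how a
consumer might decode such a value.  The ``PropertyFlag`` structure and the
``decodePropertyAttributes`` function are hypothetical names used for
illustration only; they are not part of LLVM or LLDB.

.. code-block:: c++

  #include <cstdint>
  #include <string>
  #include <vector>

  // One row of the "New DWARF Constants" table.
  struct PropertyFlag {
    uint32_t Value;
    const char *Name;
  };

  // Values copied from the table above.
  static const PropertyFlag PropertyFlags[] = {
      {0x01, "readonly"},           {0x02, "getter"},
      {0x04, "assign"},             {0x08, "readwrite"},
      {0x10, "retain"},             {0x20, "copy"},
      {0x40, "nonatomic"},          {0x80, "setter"},
      {0x100, "atomic"},            {0x200, "weak"},
      {0x400, "strong"},            {0x800, "unsafe_unretained"},
      {0x1000, "nullability"},      {0x2000, "null_resettable"},
      {0x4000, "class"},
  };

  // Hypothetical helper: return the names of all flags set in Attr, in
  // ascending order of their values.
  std::vector<std::string> decodePropertyAttributes(uint32_t Attr) {
    std::vector<std::string> Names;
    for (const PropertyFlag &Flag : PropertyFlags)
      if (Attr & Flag.Value)
        Names.push_back(Flag.Name);
    return Names;
  }

Under this assumption, the ``@property (readonly, nonatomic) int pr;`` example
would be encoded as ``0x01 | 0x40``, and decoding that value yields
``readonly`` and ``nonatomic``.
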
Name Accelerator Tables
-----------------------

Introduction
^^^^^^^^^^^^

The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a
debugger needs.  The "``pub``" in the section name indicates that the entries
in the table are publicly visible names only.  This means no static or hidden
functions show up in the "``.debug_pubnames``".  No static variables or private
class variables are in the "``.debug_pubtypes``".  Many compilers add different
things to these tables, so we can't rely upon the contents between gcc, icc, or
clang.

The typical query given by users tends not to match up with the contents of
these tables.  For example, the DWARF spec states that "In the case of the name
of a function member or static data member of a C++ structure, class or union,
the name presented in the "``.debug_pubnames``" section is not the simple name
given by the ``DW_AT_name`` attribute of the referenced debugging information
entry, but rather the fully qualified name of the data or function member."
So the only name in these tables for a complex C++ entry is its fully
qualified name.  Debugger users tend not to enter their search strings as
"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``", or
"``a::b::c``".  So the name entered in the name table must be demangled in
order to chop it up appropriately and additional names must be manually entered
into the table to make it effective as a name lookup table for debuggers to
use.

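To illustrate the "chopping up" described above, the following sketch derives
the names a user might type from a fully qualified name.  It is only an
illustration under simplifying assumptions: the input is already demangled and
has no parameter list, and ``::`` never appears inside template arguments.
The function name ``lookupNames`` is hypothetical.

.. code-block:: c++

  #include <string>
  #include <vector>

  // Hypothetical helper: given "a::b::c", return "c", "b::c" and "a::b::c",
  // i.e. every suffix that starts at a scope boundary, shortest first.
  std::vector<std::string> lookupNames(const std::string &Qualified) {
    std::vector<std::string> Names;
    if (Qualified.empty())
      return Names;
    std::string::size_type End = Qualified.size();
    while (true) {
      // Find the last "::" that lies entirely before End.
      std::string::size_type Sep =
          End >= 2 ? Qualified.rfind("::", End - 2) : std::string::npos;
      if (Sep == std::string::npos) {
        // No further scope separator: the remaining name is the full name.
        Names.push_back(Qualified);
        return Names;
      }
      Names.push_back(Qualified.substr(Sep + 2));
      End = Sep;
    }
  }

A table that contains only ``a::b::c`` cannot answer a query for ``c`` without
this kind of work being repeated by every debugger, which is why the text
argues for storing the additional names in the table itself.
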
All debuggers currently ignore the "``.debug_pubnames``" table as a result of
its inconsistent and useless public-only name content, making it a waste of
space in the object file.  These tables, when they are written to disk, are not
sorted in any way, leaving every debugger to do its own parsing and sorting.
These tables also include an inlined copy of the string values in the table
itself, making the tables much larger than they need to be on disk, especially
for large C++ programs.

Can't we just fix the sections by adding all of the names we need to this
table? No, because that is not what the tables are defined to contain and we
won't know the difference between the old bad tables and the new good tables.
At best we could make our own renamed sections that contain all of the data we
need.

These tables are also insufficient for what a debugger like LLDB needs.  LLDB
uses clang for its expression parsing where LLDB acts as a PCH.  LLDB is then
often asked to look for type "``foo``" or namespace "``bar``", or list items in
namespace "``baz``".  Namespaces are not included in the pubnames or pubtypes
tables.  Since clang asks a lot of questions when it is parsing an expression,
we need to be very fast when looking up names, as it happens a lot.  Having new
accelerator tables that are optimized for very quick lookups will benefit this
type of debugging experience greatly.

We would like to generate name lookup tables that can be mapped into memory
from disk, and used as is, with little or no up-front parsing.  We would also
like to control the exact content of these different tables so they contain
exactly what we need.  The Name Accelerator Tables were designed to fix these
issues.  In order to solve these issues we need:

* A format that can be mapped into memory from disk and used as is
* Very fast lookups
* An extensible table format so these tables can be made by many producers
* All of the names needed for typical lookups out of the box
* Strict rules for the contents of tables
14056ac1de48SDmitri Gribenko
14066ac1de48SDmitri GribenkoTable size is important and the accelerator table format should allow the reuse
14076ac1de48SDmitri Gribenkoof strings from common string tables so the strings for the names are not
14086ac1de48SDmitri Gribenkoduplicated.  We also want to make sure the table is ready to be used as-is by
14096ac1de48SDmitri Gribenkosimply mapping the table into memory with minimal header parsing.
14106ac1de48SDmitri Gribenko
14116ac1de48SDmitri GribenkoThe name lookups need to be fast and optimized for the kinds of lookups that
14126ac1de48SDmitri Gribenkodebuggers tend to do.  Optimally we would like to touch as few parts of the
14136ac1de48SDmitri Gribenkomapped table as possible when doing a name lookup and be able to quickly find
14146ac1de48SDmitri Gribenkothe name entry we are looking for, or discover there are no matches.  In the
14156ac1de48SDmitri Gribenkocase of debuggers we optimized for lookups that fail most of the time.
14166ac1de48SDmitri Gribenko
Each table that is defined should have strict, documented rules on exactly
what it contains so clients can rely on the content.
14196ac1de48SDmitri Gribenko
14206ac1de48SDmitri GribenkoHash Tables
14216ac1de48SDmitri Gribenko^^^^^^^^^^^
14226ac1de48SDmitri Gribenko
14236ac1de48SDmitri GribenkoStandard Hash Tables
14246ac1de48SDmitri Gribenko""""""""""""""""""""
14256ac1de48SDmitri Gribenko
14266ac1de48SDmitri GribenkoTypical hash tables have a header, buckets, and each bucket points to the
14276ac1de48SDmitri Gribenkobucket contents:
14286ac1de48SDmitri Gribenko
14296ac1de48SDmitri Gribenko.. code-block:: none
14306ac1de48SDmitri Gribenko
14316ac1de48SDmitri Gribenko  .------------.
14326ac1de48SDmitri Gribenko  |  HEADER    |
14336ac1de48SDmitri Gribenko  |------------|
14346ac1de48SDmitri Gribenko  |  BUCKETS   |
14356ac1de48SDmitri Gribenko  |------------|
14366ac1de48SDmitri Gribenko  |  DATA      |
14376ac1de48SDmitri Gribenko  `------------'
14386ac1de48SDmitri Gribenko
14396ac1de48SDmitri GribenkoThe BUCKETS are an array of offsets to DATA for each hash:
14406ac1de48SDmitri Gribenko
14416ac1de48SDmitri Gribenko.. code-block:: none
14426ac1de48SDmitri Gribenko
14436ac1de48SDmitri Gribenko  .------------.
14446ac1de48SDmitri Gribenko  | 0x00001000 | BUCKETS[0]
14456ac1de48SDmitri Gribenko  | 0x00002000 | BUCKETS[1]
14466ac1de48SDmitri Gribenko  | 0x00002200 | BUCKETS[2]
14476ac1de48SDmitri Gribenko  | 0x000034f0 | BUCKETS[3]
14486ac1de48SDmitri Gribenko  |            | ...
  | 0xXXXXXXXX | BUCKETS[n_buckets - 1]
14506ac1de48SDmitri Gribenko  '------------'
14516ac1de48SDmitri Gribenko
So for ``BUCKETS[3]`` in the example above, we have an offset into the table
0x000034f0 which points to a chain of entries for the bucket.  Each entry in
the chain must contain a next pointer, the full 32 bit hash value, the string
itself, and the data for the current string value.
14566ac1de48SDmitri Gribenko
14576ac1de48SDmitri Gribenko.. code-block:: none
14586ac1de48SDmitri Gribenko
14596ac1de48SDmitri Gribenko              .------------.
14606ac1de48SDmitri Gribenko  0x000034f0: | 0x00003500 | next pointer
14616ac1de48SDmitri Gribenko              | 0x12345678 | 32 bit hash
14626ac1de48SDmitri Gribenko              | "erase"    | string value
14636ac1de48SDmitri Gribenko              | data[n]    | HashData for this bucket
14646ac1de48SDmitri Gribenko              |------------|
14656ac1de48SDmitri Gribenko  0x00003500: | 0x00003550 | next pointer
14666ac1de48SDmitri Gribenko              | 0x29273623 | 32 bit hash
14676ac1de48SDmitri Gribenko              | "dump"     | string value
14686ac1de48SDmitri Gribenko              | data[n]    | HashData for this bucket
14696ac1de48SDmitri Gribenko              |------------|
14706ac1de48SDmitri Gribenko  0x00003550: | 0x00000000 | next pointer
14716ac1de48SDmitri Gribenko              | 0x82638293 | 32 bit hash
14726ac1de48SDmitri Gribenko              | "main"     | string value
14736ac1de48SDmitri Gribenko              | data[n]    | HashData for this bucket
14746ac1de48SDmitri Gribenko              `------------'
14756ac1de48SDmitri Gribenko
The problem with this layout for debuggers is that we need to optimize for the
negative lookup case where the symbol we're searching for is not present.  So
if we were to look up "``printf``" in the table above, we would make a 32 bit
hash for "``printf``", which might map to ``BUCKETS[3]``.  We would need to go
to the offset 0x000034f0 and start looking to see if our 32 bit hash matches.
To do so, we need to read the next pointer, then read the hash, compare it, and
skip to the next entry in the chain.  Each time we are skipping many bytes in
memory and touching new pages just to do the compare on the full 32 bit hash.
All of these accesses then tell us that we didn't have a match.
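
The cost of a failed lookup in this layout can be sketched in C.  This is only
an illustration of the chained layout drawn above: ``ChainEntry`` and
``chain_lookup`` are hypothetical names, and real tables would store offsets
rather than pointers.

.. code-block:: c

  #include <stdint.h>
  #include <string.h>

  /* Hypothetical in-memory form of one chain entry from the diagram above. */
  struct ChainEntry {
    const struct ChainEntry *next; /* next pointer, NULL ends the chain */
    uint32_t hash;                 /* full 32 bit hash */
    const char *name;              /* string value */
  };

  /* Walks one bucket chain.  *touched reports how many separate entries
     had to be read before the lookup succeeded or failed. */
  static const struct ChainEntry *
  chain_lookup(const struct ChainEntry *head, uint32_t hash, const char *name,
               unsigned *touched)
  {
    *touched = 0;
    for (const struct ChainEntry *e = head; e != NULL; e = e->next) {
      ++*touched;
      if (e->hash == hash && strcmp(e->name, name) == 0)
        return e;
    }
    return NULL;
  }

A lookup that fails has to visit every entry in the chain, and each entry may
live on a different page.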
14856ac1de48SDmitri Gribenko
14866ac1de48SDmitri GribenkoName Hash Tables
14876ac1de48SDmitri Gribenko""""""""""""""""
14886ac1de48SDmitri Gribenko
14896ac1de48SDmitri GribenkoTo solve the issues mentioned above we have structured the hash tables a bit
14906ac1de48SDmitri Gribenkodifferently: a header, buckets, an array of all unique 32 bit hash values,
14916ac1de48SDmitri Gribenkofollowed by an array of hash value data offsets, one for each hash value, then
14926ac1de48SDmitri Gribenkothe data for all hash values:
14936ac1de48SDmitri Gribenko
14946ac1de48SDmitri Gribenko.. code-block:: none
14956ac1de48SDmitri Gribenko
14966ac1de48SDmitri Gribenko  .-------------.
14976ac1de48SDmitri Gribenko  |  HEADER     |
14986ac1de48SDmitri Gribenko  |-------------|
14996ac1de48SDmitri Gribenko  |  BUCKETS    |
15006ac1de48SDmitri Gribenko  |-------------|
15016ac1de48SDmitri Gribenko  |  HASHES     |
15026ac1de48SDmitri Gribenko  |-------------|
15036ac1de48SDmitri Gribenko  |  OFFSETS    |
15046ac1de48SDmitri Gribenko  |-------------|
15056ac1de48SDmitri Gribenko  |  DATA       |
15066ac1de48SDmitri Gribenko  `-------------'
15076ac1de48SDmitri Gribenko
Each entry in the ``BUCKETS`` array of a name table is an index into the
``HASHES`` array.  By making all of the full 32 bit hash values contiguous in
memory, we allow ourselves to efficiently check for a match while touching as
little memory as possible.  Most often checking the 32 bit hash values is as
far as the lookup goes.  If it does match, it usually is a match with no
collisions.  So for a table with "``n_buckets``" buckets, and "``n_hashes``"
unique 32 bit hash values, we can describe the contents of the ``BUCKETS``,
``HASHES`` and ``OFFSETS`` as:
15166ac1de48SDmitri Gribenko
15176ac1de48SDmitri Gribenko.. code-block:: none
15186ac1de48SDmitri Gribenko
15196ac1de48SDmitri Gribenko  .-------------------------.
15206ac1de48SDmitri Gribenko  |  HEADER.magic           | uint32_t
15216ac1de48SDmitri Gribenko  |  HEADER.version         | uint16_t
15226ac1de48SDmitri Gribenko  |  HEADER.hash_function   | uint16_t
15236ac1de48SDmitri Gribenko  |  HEADER.bucket_count    | uint32_t
15246ac1de48SDmitri Gribenko  |  HEADER.hashes_count    | uint32_t
15256ac1de48SDmitri Gribenko  |  HEADER.header_data_len | uint32_t
15266ac1de48SDmitri Gribenko  |  HEADER_DATA            | HeaderData
15276ac1de48SDmitri Gribenko  |-------------------------|
15287e66bd39SEric Christopher  |  BUCKETS                | uint32_t[n_buckets] // 32 bit hash indexes
15296ac1de48SDmitri Gribenko  |-------------------------|
15307e66bd39SEric Christopher  |  HASHES                 | uint32_t[n_hashes] // 32 bit hash values
15316ac1de48SDmitri Gribenko  |-------------------------|
15327e66bd39SEric Christopher  |  OFFSETS                | uint32_t[n_hashes] // 32 bit offsets to hash value data
15336ac1de48SDmitri Gribenko  |-------------------------|
15346ac1de48SDmitri Gribenko  |  ALL HASH DATA          |
15356ac1de48SDmitri Gribenko  `-------------------------'
15366ac1de48SDmitri Gribenko
15376ac1de48SDmitri GribenkoSo taking the exact same data from the standard hash example above we end up
15386ac1de48SDmitri Gribenkowith:
15396ac1de48SDmitri Gribenko
15406ac1de48SDmitri Gribenko.. code-block:: none
15416ac1de48SDmitri Gribenko
15426ac1de48SDmitri Gribenko              .------------.
15436ac1de48SDmitri Gribenko              | HEADER     |
15446ac1de48SDmitri Gribenko              |------------|
15456ac1de48SDmitri Gribenko              |          0 | BUCKETS[0]
15466ac1de48SDmitri Gribenko              |          2 | BUCKETS[1]
15476ac1de48SDmitri Gribenko              |          5 | BUCKETS[2]
15486ac1de48SDmitri Gribenko              |          6 | BUCKETS[3]
15496ac1de48SDmitri Gribenko              |            | ...
              |        ... | BUCKETS[n_buckets - 1]
15516ac1de48SDmitri Gribenko              |------------|
15526ac1de48SDmitri Gribenko              | 0x........ | HASHES[0]
15536ac1de48SDmitri Gribenko              | 0x........ | HASHES[1]
15546ac1de48SDmitri Gribenko              | 0x........ | HASHES[2]
15556ac1de48SDmitri Gribenko              | 0x........ | HASHES[3]
15566ac1de48SDmitri Gribenko              | 0x........ | HASHES[4]
15576ac1de48SDmitri Gribenko              | 0x........ | HASHES[5]
15586ac1de48SDmitri Gribenko              | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
15596ac1de48SDmitri Gribenko              | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
15606ac1de48SDmitri Gribenko              | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
15616ac1de48SDmitri Gribenko              | 0x........ | HASHES[9]
15626ac1de48SDmitri Gribenko              | 0x........ | HASHES[10]
15636ac1de48SDmitri Gribenko              | 0x........ | HASHES[11]
15646ac1de48SDmitri Gribenko              | 0x........ | HASHES[12]
15656ac1de48SDmitri Gribenko              | 0x........ | HASHES[13]
              | 0x........ | HASHES[n_hashes - 1]
15676ac1de48SDmitri Gribenko              |------------|
15686ac1de48SDmitri Gribenko              | 0x........ | OFFSETS[0]
15696ac1de48SDmitri Gribenko              | 0x........ | OFFSETS[1]
15706ac1de48SDmitri Gribenko              | 0x........ | OFFSETS[2]
15716ac1de48SDmitri Gribenko              | 0x........ | OFFSETS[3]
15726ac1de48SDmitri Gribenko              | 0x........ | OFFSETS[4]
15736ac1de48SDmitri Gribenko              | 0x........ | OFFSETS[5]
15746ac1de48SDmitri Gribenko              | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
15756ac1de48SDmitri Gribenko              | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
15766ac1de48SDmitri Gribenko              | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
15776ac1de48SDmitri Gribenko              | 0x........ | OFFSETS[9]
15786ac1de48SDmitri Gribenko              | 0x........ | OFFSETS[10]
15796ac1de48SDmitri Gribenko              | 0x........ | OFFSETS[11]
15806ac1de48SDmitri Gribenko              | 0x........ | OFFSETS[12]
15816ac1de48SDmitri Gribenko              | 0x........ | OFFSETS[13]
              | 0x........ | OFFSETS[n_hashes - 1]
15836ac1de48SDmitri Gribenko              |------------|
15846ac1de48SDmitri Gribenko              |            |
15856ac1de48SDmitri Gribenko              |            |
15866ac1de48SDmitri Gribenko              |            |
15876ac1de48SDmitri Gribenko              |            |
15886ac1de48SDmitri Gribenko              |            |
15896ac1de48SDmitri Gribenko              |------------|
  0x000034f0: | 0x00001203 | String offset into .debug_str ("erase")
15916ac1de48SDmitri Gribenko              | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
15926ac1de48SDmitri Gribenko              | 0x........ | HashData[0]
15936ac1de48SDmitri Gribenko              | 0x........ | HashData[1]
15946ac1de48SDmitri Gribenko              | 0x........ | HashData[2]
15956ac1de48SDmitri Gribenko              | 0x........ | HashData[3]
15966ac1de48SDmitri Gribenko              | 0x00000000 | String offset into .debug_str (terminate data for hash)
15976ac1de48SDmitri Gribenko              |------------|
15986ac1de48SDmitri Gribenko  0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
15996ac1de48SDmitri Gribenko              | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
16006ac1de48SDmitri Gribenko              | 0x........ | HashData[0]
16016ac1de48SDmitri Gribenko              | 0x........ | HashData[1]
16026ac1de48SDmitri Gribenko              | 0x00001203 | String offset into .debug_str ("dump")
16036ac1de48SDmitri Gribenko              | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
16046ac1de48SDmitri Gribenko              | 0x........ | HashData[0]
16056ac1de48SDmitri Gribenko              | 0x........ | HashData[1]
16066ac1de48SDmitri Gribenko              | 0x........ | HashData[2]
16076ac1de48SDmitri Gribenko              | 0x00000000 | String offset into .debug_str (terminate data for hash)
16086ac1de48SDmitri Gribenko              |------------|
16096ac1de48SDmitri Gribenko  0x00003550: | 0x00001203 | String offset into .debug_str ("main")
16106ac1de48SDmitri Gribenko              | 0x00000009 | A 32 bit array count - number of HashData with name "main"
16116ac1de48SDmitri Gribenko              | 0x........ | HashData[0]
16126ac1de48SDmitri Gribenko              | 0x........ | HashData[1]
16136ac1de48SDmitri Gribenko              | 0x........ | HashData[2]
16146ac1de48SDmitri Gribenko              | 0x........ | HashData[3]
16156ac1de48SDmitri Gribenko              | 0x........ | HashData[4]
16166ac1de48SDmitri Gribenko              | 0x........ | HashData[5]
16176ac1de48SDmitri Gribenko              | 0x........ | HashData[6]
16186ac1de48SDmitri Gribenko              | 0x........ | HashData[7]
16196ac1de48SDmitri Gribenko              | 0x........ | HashData[8]
16206ac1de48SDmitri Gribenko              | 0x00000000 | String offset into .debug_str (terminate data for hash)
16216ac1de48SDmitri Gribenko              `------------'
16226ac1de48SDmitri Gribenko
So we still have all of the same data, we just organize it more efficiently for
debugger lookup.  If we repeat the same "``printf``" lookup from above, we
would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit
hash value modulo ``n_buckets``.  ``BUCKETS[3]`` contains "6", which is the
index into the ``HASHES`` table.  We would then compare consecutive 32 bit hash
values in the ``HASHES`` array for as long as the hashes belong to
``BUCKETS[3]``.  We do this by verifying that each subsequent hash value modulo
``n_buckets`` is still 3.  In the case of a failed lookup we would access the
memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
before we know that we have no match.  We don't end up jumping between
scattered regions of memory, and we keep the number of processor data cache
lines being accessed as small as possible.
16356ac1de48SDmitri Gribenko
16366ac1de48SDmitri GribenkoThe string hash that is used for these lookup tables is the Daniel J.
16376ac1de48SDmitri GribenkoBernstein hash which is also used in the ELF ``GNU_HASH`` sections.  It is a
16386ac1de48SDmitri Gribenkovery good hash for all kinds of names in programs with very few hash
16396ac1de48SDmitri Gribenkocollisions.
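
As a minimal sketch, the hash can be written as follows.  This assumes the
common formulation of the Bernstein hash (initial value 5381, multiply by 33,
add each byte); ``djb_hash`` is a name chosen for illustration, and a real
producer may treat non-ASCII or case-folded names differently.

.. code-block:: c

  #include <stdint.h>

  /* Bernstein string hash: start at 5381, then h = h * 33 + c per byte.
     Arithmetic wraps modulo 2^32. */
  static uint32_t djb_hash(const char *s)
  {
    uint32_t h = 5381u;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p)
      h = (h << 5) + h + *p; /* (h << 5) + h is h * 33 */
    return h;
  }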
16406ac1de48SDmitri Gribenko
16416ac1de48SDmitri GribenkoEmpty buckets are designated by using an invalid hash index of ``UINT32_MAX``.
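
The lookup steps described above, together with the empty bucket convention,
can be sketched as follows.  This is an illustrative sketch, not the code of
any particular consumer: ``find_hash_index`` is a hypothetical helper, and the
arrays are assumed to be already mapped into memory in the reader's byte order.

.. code-block:: c

  #include <stdint.h>

  /* Returns the index into hashes[] (and offsets[]) whose value equals
     `hash`, or UINT32_MAX when the table holds no such hash. */
  static uint32_t
  find_hash_index(const uint32_t *buckets, uint32_t n_buckets,
                  const uint32_t *hashes, uint32_t n_hashes, uint32_t hash)
  {
    if (n_buckets == 0)
      return UINT32_MAX;
    uint32_t bucket = hash % n_buckets;
    uint32_t i = buckets[bucket];
    if (i == UINT32_MAX) /* empty bucket */
      return UINT32_MAX;
    for (; i < n_hashes; ++i) {
      if (hashes[i] == hash)
        return i;
      if (hashes[i] % n_buckets != bucket) /* ran into the next bucket */
        break;
    }
    return UINT32_MAX;
  }

A failed lookup reads one entry of ``buckets`` and a short run of adjacent
entries of ``hashes``, and nothing else.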
16426ac1de48SDmitri Gribenko
16436ac1de48SDmitri GribenkoDetails
16446ac1de48SDmitri Gribenko^^^^^^^
16456ac1de48SDmitri Gribenko
16466ac1de48SDmitri GribenkoThese name hash tables are designed to be generic where specializations of the
16476ac1de48SDmitri Gribenkotable get to define additional data that goes into the header ("``HeaderData``"),
16486ac1de48SDmitri Gribenkohow the string value is stored ("``KeyType``") and the content of the data for each
16496ac1de48SDmitri Gribenkohash value.
16506ac1de48SDmitri Gribenko
16516ac1de48SDmitri GribenkoHeader Layout
16526ac1de48SDmitri Gribenko"""""""""""""
16536ac1de48SDmitri Gribenko
16546ac1de48SDmitri GribenkoThe header has a fixed part, and the specialized part.  The exact format of the
16556ac1de48SDmitri Gribenkoheader is:
16566ac1de48SDmitri Gribenko
16576ac1de48SDmitri Gribenko.. code-block:: c
16586ac1de48SDmitri Gribenko
16596ac1de48SDmitri Gribenko  struct Header
16606ac1de48SDmitri Gribenko  {
16616ac1de48SDmitri Gribenko    uint32_t   magic;           // 'HASH' magic value to allow endian detection
16626ac1de48SDmitri Gribenko    uint16_t   version;         // Version number
16636ac1de48SDmitri Gribenko    uint16_t   hash_function;   // The hash function enumeration that was used
16646ac1de48SDmitri Gribenko    uint32_t   bucket_count;    // The number of buckets in this hash table
16656ac1de48SDmitri Gribenko    uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
16666ac1de48SDmitri Gribenko    uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
16676ac1de48SDmitri Gribenko                                // Specifically the length of the following HeaderData field - this does not
16686ac1de48SDmitri Gribenko                                // include the size of the preceding fields
16696ac1de48SDmitri Gribenko    HeaderData header_data;     // Implementation specific header data
16706ac1de48SDmitri Gribenko  };
16716ac1de48SDmitri Gribenko
16726ac1de48SDmitri GribenkoThe header starts with a 32 bit "``magic``" value which must be ``'HASH'``
16736ac1de48SDmitri Gribenkoencoded as an ASCII integer.  This allows the detection of the start of the
16746ac1de48SDmitri Gribenkohash table and also allows the table's byte order to be determined so the table
16756ac1de48SDmitri Gribenkocan be correctly extracted.  The "``magic``" value is followed by a 16 bit
16766ac1de48SDmitri Gribenko``version`` number which allows the table to be revised and modified in the
16776ac1de48SDmitri Gribenkofuture.  The current version number is 1. ``hash_function`` is a ``uint16_t``
16786ac1de48SDmitri Gribenkoenumeration that specifies which hash function was used to produce this table.
16796ac1de48SDmitri GribenkoThe current values for the hash function enumerations include:
16806ac1de48SDmitri Gribenko
16816ac1de48SDmitri Gribenko.. code-block:: c
16826ac1de48SDmitri Gribenko
16836ac1de48SDmitri Gribenko  enum HashFunctionType
16846ac1de48SDmitri Gribenko  {
16856ac1de48SDmitri Gribenko    eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
16866ac1de48SDmitri Gribenko  };
16876ac1de48SDmitri Gribenko
``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
are in the ``BUCKETS`` array.  ``hashes_count`` is the number of unique 32 bit
hash values that are in the ``HASHES`` array, which is also the number of
offsets contained in the ``OFFSETS`` array.  ``header_data_len`` specifies the
size in bytes of the ``HeaderData`` that is filled in by specialized versions
of this table.
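
One way a reader might use the ``magic`` field to detect byte order is sketched
below.  The constant is simply the ASCII bytes ``'H' 'A' 'S' 'H'`` packed into
a 32 bit integer; ``check_magic`` and ``swap32`` are hypothetical helper names.

.. code-block:: c

  #include <stdint.h>

  #define HASH_MAGIC 0x48415348u /* 'H' 'A' 'S' 'H' */

  /* Reverses the byte order of a 32 bit value. */
  static uint32_t swap32(uint32_t v)
  {
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
  }

  /* Returns 1 if the table uses the reader's byte order, -1 if every field
     must be byte swapped, and 0 if this is not a name hash table. */
  static int check_magic(uint32_t magic_as_read)
  {
    if (magic_as_read == HASH_MAGIC)
      return 1;
    if (swap32(magic_as_read) == HASH_MAGIC)
      return -1;
    return 0;
  }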
16946ac1de48SDmitri Gribenko
16956ac1de48SDmitri GribenkoFixed Lookup
16966ac1de48SDmitri Gribenko""""""""""""
16976ac1de48SDmitri Gribenko
16986ac1de48SDmitri GribenkoThe header is followed by the buckets, hashes, offsets, and hash value data.
16996ac1de48SDmitri Gribenko
17006ac1de48SDmitri Gribenko.. code-block:: c
17016ac1de48SDmitri Gribenko
17026ac1de48SDmitri Gribenko  struct FixedTable
17036ac1de48SDmitri Gribenko  {
17046ac1de48SDmitri Gribenko    uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
17056ac1de48SDmitri Gribenko    uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
17066ac1de48SDmitri Gribenko    uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
17076ac1de48SDmitri Gribenko  };
17086ac1de48SDmitri Gribenko
17096ac1de48SDmitri Gribenko``buckets`` is an array of 32 bit indexes into the ``hashes`` array.  The
17106ac1de48SDmitri Gribenko``hashes`` array contains all of the 32 bit hash values for all names in the
17116ac1de48SDmitri Gribenkohash table.  Each hash in the ``hashes`` table has an offset in the ``offsets``
17126ac1de48SDmitri Gribenkoarray that points to the data for the hash value.
17136ac1de48SDmitri Gribenko
17146ac1de48SDmitri GribenkoThis table setup makes it very easy to repurpose these tables to contain
17156ac1de48SDmitri Gribenkodifferent data, while keeping the lookup mechanism the same for all tables.
17166ac1de48SDmitri GribenkoThis layout also makes it possible to save the table to disk and map it in
17176ac1de48SDmitri Gribenkolater and do very efficient name lookups with little or no parsing.
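
Because every array has a fixed element size, the start of each array can be
computed directly from the header fields.  The sketch below assumes the arrays
are packed back to back with no extra padding, and that the fixed part of the
header occupies 20 bytes (the sum of the field sizes listed above);
``TableLayout`` and ``compute_layout`` are names introduced for illustration.

.. code-block:: c

  #include <stdint.h>

  /* magic(4) + version(2) + hash_function(2) + bucket_count(4) +
     hashes_count(4) + header_data_len(4) */
  enum { FIXED_HEADER_SIZE = 20 };

  /* Byte offsets from the start of the table. */
  struct TableLayout {
    uint32_t buckets_offset;
    uint32_t hashes_offset;
    uint32_t offsets_offset;
    uint32_t data_offset;
  };

  static struct TableLayout
  compute_layout(uint32_t bucket_count, uint32_t hashes_count,
                 uint32_t header_data_len)
  {
    struct TableLayout l;
    l.buckets_offset = FIXED_HEADER_SIZE + header_data_len;
    l.hashes_offset = l.buckets_offset + 4u * bucket_count;
    l.offsets_offset = l.hashes_offset + 4u * hashes_count;
    l.data_offset = l.offsets_offset + 4u * hashes_count;
    return l;
  }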
17186ac1de48SDmitri Gribenko
17196ac1de48SDmitri GribenkoDWARF lookup tables can be implemented in a variety of ways and can store a lot
17206ac1de48SDmitri Gribenkoof information for each name.  We want to make the DWARF tables extensible and
17216ac1de48SDmitri Gribenkoable to store the data efficiently so we have used some of the DWARF features
17226ac1de48SDmitri Gribenkothat enable efficient data storage to define exactly what kind of data we store
17236ac1de48SDmitri Gribenkofor each name.
17246ac1de48SDmitri Gribenko
17256ac1de48SDmitri GribenkoThe ``HeaderData`` contains a definition of the contents of each HashData chunk.
17266ac1de48SDmitri GribenkoWe might want to store an offset to all of the debug information entries (DIEs)
17276ac1de48SDmitri Gribenkofor each name.  To keep things extensible, we create a list of items, or
17286ac1de48SDmitri GribenkoAtoms, that are contained in the data for each name.  First comes the type of
17296ac1de48SDmitri Gribenkothe data in each atom:
17306ac1de48SDmitri Gribenko
17316ac1de48SDmitri Gribenko.. code-block:: c
17326ac1de48SDmitri Gribenko
17336ac1de48SDmitri Gribenko  enum AtomType
17346ac1de48SDmitri Gribenko  {
17356ac1de48SDmitri Gribenko    eAtomTypeNULL       = 0u,
17366ac1de48SDmitri Gribenko    eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
    eAtomTypeCUOffset   = 2u,   // DIE offset of the compile unit header that contains the item in question
17386ac1de48SDmitri Gribenko    eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
17396ac1de48SDmitri Gribenko    eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
17406ac1de48SDmitri Gribenko    eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
17416ac1de48SDmitri Gribenko  };
17426ac1de48SDmitri Gribenko
17436ac1de48SDmitri GribenkoThe enumeration values and their meanings are:
17446ac1de48SDmitri Gribenko
17456ac1de48SDmitri Gribenko.. code-block:: none
17466ac1de48SDmitri Gribenko
17476ac1de48SDmitri Gribenko  eAtomTypeNULL       - a termination atom that specifies the end of the atom list
17486ac1de48SDmitri Gribenko  eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
17496ac1de48SDmitri Gribenko  eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
  eAtomTypeTag        - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
17516ac1de48SDmitri Gribenko  eAtomTypeNameFlags  - Flags for functions and global variables (isFunction, isInlined, isExternal...)
17526ac1de48SDmitri Gribenko  eAtomTypeTypeFlags  - Flags for types (isCXXClass, isObjCClass, ...)
17536ac1de48SDmitri Gribenko
Then we allow each atom to define its type and how the data for that atom is
encoded:
17566ac1de48SDmitri Gribenko
17576ac1de48SDmitri Gribenko.. code-block:: c
17586ac1de48SDmitri Gribenko
17596ac1de48SDmitri Gribenko  struct Atom
17606ac1de48SDmitri Gribenko  {
17616ac1de48SDmitri Gribenko    uint16_t type;  // AtomType enum value
17626ac1de48SDmitri Gribenko    uint16_t form;  // DWARF DW_FORM_XXX defines
17636ac1de48SDmitri Gribenko  };
17646ac1de48SDmitri Gribenko
17656ac1de48SDmitri GribenkoThe ``form`` type above is from the DWARF specification and defines the exact
17666ac1de48SDmitri Gribenkoencoding of the data for the Atom type.  See the DWARF specification for the
17676ac1de48SDmitri Gribenko``DW_FORM_`` definitions.
17686ac1de48SDmitri Gribenko
17696ac1de48SDmitri Gribenko.. code-block:: c
17706ac1de48SDmitri Gribenko
17716ac1de48SDmitri Gribenko  struct HeaderData
17726ac1de48SDmitri Gribenko  {
17736ac1de48SDmitri Gribenko    uint32_t die_offset_base;
17746ac1de48SDmitri Gribenko    uint32_t atom_count;
    Atom     atoms[atom_count];
17766ac1de48SDmitri Gribenko  };
17776ac1de48SDmitri Gribenko
``HeaderData`` defines the base DIE offset that should be added to any atoms
that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``,
``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata`` forms.  It also
defines what is contained in each ``HashData`` object: ``Atom.form`` tells us
how large each field will be in the ``HashData`` and ``Atom.type`` tells us how
this data should be interpreted.
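
As an illustration of how ``Atom.form`` determines the size of each
``HashData``, the following sketch sums the sizes of the atoms.  It handles
only the fixed-size ``DW_FORM_data*`` forms, whose numeric codes are taken from
the DWARF specification; ``form_size`` and ``hash_data_size`` are hypothetical
helpers, and a real consumer would need to support more forms.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Form codes from the DWARF specification (fixed-size data forms only). */
  enum {
    DW_FORM_data2 = 0x05,
    DW_FORM_data4 = 0x06,
    DW_FORM_data8 = 0x07,
    DW_FORM_data1 = 0x0b
  };

  struct Atom {
    uint16_t type; /* AtomType enum value */
    uint16_t form; /* DW_FORM_XXX value */
  };

  /* Byte size of a form, or 0 if this sketch does not handle it. */
  static size_t form_size(uint16_t form)
  {
    switch (form) {
    case DW_FORM_data1: return 1;
    case DW_FORM_data2: return 2;
    case DW_FORM_data4: return 4;
    case DW_FORM_data8: return 8;
    default:            return 0;
    }
  }

  /* Bytes occupied by one HashData, or 0 if any atom uses an unhandled form. */
  static size_t hash_data_size(const struct Atom *atoms, uint32_t atom_count)
  {
    size_t total = 0;
    for (uint32_t i = 0; i < atom_count; ++i) {
      size_t n = form_size(atoms[i].form);
      if (n == 0)
        return 0;
      total += n;
    }
    return total;
  }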
17846ac1de48SDmitri Gribenko
17856ac1de48SDmitri GribenkoFor the current implementations of the "``.apple_names``" (all functions +
17866ac1de48SDmitri Gribenkoglobals), the "``.apple_types``" (names of all types that are defined), and
17876ac1de48SDmitri Gribenkothe "``.apple_namespaces``" (all namespaces), we currently set the ``Atom``
17886ac1de48SDmitri Gribenkoarray to be:
17896ac1de48SDmitri Gribenko
17906ac1de48SDmitri Gribenko.. code-block:: c
17916ac1de48SDmitri Gribenko
17926ac1de48SDmitri Gribenko  HeaderData.atom_count = 1;
17936ac1de48SDmitri Gribenko  HeaderData.atoms[0].type = eAtomTypeDIEOffset;
17946ac1de48SDmitri Gribenko  HeaderData.atoms[0].form = DW_FORM_data4;
17956ac1de48SDmitri Gribenko
17966ac1de48SDmitri GribenkoThis defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
17976ac1de48SDmitri Gribenkoencoded as a 32 bit value (DW_FORM_data4).  This allows a single name to have
17986ac1de48SDmitri Gribenkomultiple matching DIEs in a single file, which could come up with an inlined
17996ac1de48SDmitri Gribenkofunction for instance.  Future tables could include more information about the
18006ac1de48SDmitri GribenkoDIE such as flags indicating if the DIE is a function, method, block,
18016ac1de48SDmitri Gribenkoor inlined.
18026ac1de48SDmitri Gribenko
The ``KeyType`` for the DWARF table is a 32 bit string table offset into the
"``.debug_str``" table.  The "``.debug_str``" is the string table for the DWARF
which may already contain copies of all of the strings.  This helps make sure,
with help from the compiler, that we reuse the strings between all of the DWARF
sections and keeps the hash table size down.  Another benefit to having the
compiler generate all strings as ``DW_FORM_strp`` in the debug info is that
DWARF parsing can be made much faster.
18106ac1de48SDmitri Gribenko
18116ac1de48SDmitri GribenkoAfter a lookup is made, we get an offset into the hash data.  The hash data
18126ac1de48SDmitri Gribenkoneeds to be able to deal with 32 bit hash collisions, so the chunk of data
18136ac1de48SDmitri Gribenkoat the offset in the hash data consists of a triple:
18146ac1de48SDmitri Gribenko
18156ac1de48SDmitri Gribenko.. code-block:: c
18166ac1de48SDmitri Gribenko
  uint32_t str_offset;                  // KeyType: offset of the name in .debug_str
  uint32_t hash_data_count;             // Number of HashData entries that follow
  HashData hash_data[hash_data_count];  // Layout defined by the atoms in HeaderData
18206ac1de48SDmitri Gribenko
If ``str_offset`` is zero, then the bucket contents are done.  99.9% of the
hash data chunks contain a single name (no 32 bit hash collision):
18236ac1de48SDmitri Gribenko
18246ac1de48SDmitri Gribenko.. code-block:: none
18256ac1de48SDmitri Gribenko
18266ac1de48SDmitri Gribenko  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
18286ac1de48SDmitri Gribenko  | 0x00000004 | uint32_t HashData count
18296ac1de48SDmitri Gribenko  | 0x........ | uint32_t HashData[0] DIE offset
18306ac1de48SDmitri Gribenko  | 0x........ | uint32_t HashData[1] DIE offset
18316ac1de48SDmitri Gribenko  | 0x........ | uint32_t HashData[2] DIE offset
18326ac1de48SDmitri Gribenko  | 0x........ | uint32_t HashData[3] DIE offset
18336ac1de48SDmitri Gribenko  | 0x00000000 | uint32_t KeyType (end of hash chain)
18346ac1de48SDmitri Gribenko  `------------'
18356ac1de48SDmitri Gribenko
18366ac1de48SDmitri GribenkoIf there are collisions, you will have multiple valid string offsets:
18376ac1de48SDmitri Gribenko
18386ac1de48SDmitri Gribenko.. code-block:: none
18396ac1de48SDmitri Gribenko
18406ac1de48SDmitri Gribenko  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
18426ac1de48SDmitri Gribenko  | 0x00000004 | uint32_t HashData count
18436ac1de48SDmitri Gribenko  | 0x........ | uint32_t HashData[0] DIE offset
18446ac1de48SDmitri Gribenko  | 0x........ | uint32_t HashData[1] DIE offset
18456ac1de48SDmitri Gribenko  | 0x........ | uint32_t HashData[2] DIE offset
18466ac1de48SDmitri Gribenko  | 0x........ | uint32_t HashData[3] DIE offset
18476ac1de48SDmitri Gribenko  | 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] => "print")
18486ac1de48SDmitri Gribenko  | 0x00000002 | uint32_t HashData count
18496ac1de48SDmitri Gribenko  | 0x........ | uint32_t HashData[0] DIE offset
18506ac1de48SDmitri Gribenko  | 0x........ | uint32_t HashData[1] DIE offset
18516ac1de48SDmitri Gribenko  | 0x00000000 | uint32_t KeyType (end of hash chain)
18526ac1de48SDmitri Gribenko  `------------'
18536ac1de48SDmitri Gribenko
Current testing with real-world C++ binaries has shown that there is about one
32-bit hash collision per 100,000 name entries.

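As a rough sketch (not the code any actual consumer uses), walking one such
chain could look like the following.  It assumes that each ``HashData`` entry
is a single 32-bit DIE offset, as in the diagrams above, and that the table
has already been read into host byte order.  The function name
``find_die_offsets`` and its parameters are made up for illustration:

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Walk the hash data chain starting at `chain` and look for `name`.
   * `debug_str` is the string table that the str_offset values index into.
   * On a match, stores a pointer to the DIE offsets in *offsets and returns
   * how many there are; returns 0 if the chain ends without a match.
   * Assumption: every HashData entry is exactly one uint32_t DIE offset. */
  static uint32_t find_die_offsets(const uint32_t *chain, const char *debug_str,
                                   const char *name, const uint32_t **offsets) {
    while (chain[0] != 0) {            /* str_offset == 0 ends the chain */
      uint32_t str_offset = chain[0];
      uint32_t count = chain[1];
      if (strcmp(debug_str + str_offset, name) == 0) {
        *offsets = chain + 2;          /* HashData[0] follows the count */
        return count;
      }
      chain += 2 + count;              /* skip this chunk's HashData array */
    }
    return 0;
  }

Comparing the strings, and not just the 32-bit hashes, is what lets the
consumer tell "main" and "print" apart when both land in the same chain.
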
Contents
^^^^^^^^

As we said, we want to strictly define exactly what is included in the
different tables.  For DWARF, we have 3 tables: "``.apple_names``",
"``.apple_types``", and "``.apple_namespaces``".

"``.apple_names``" sections should contain an entry for each DWARF DIE whose
``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or
``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``,
``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``.  They also contain
``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and
static variables).  All global and static variables should be included,
including those scoped within functions and classes.  For example, using the
following code:

.. code-block:: c

  static int var = 0;

  void f ()
  {
    static int var = 0;
  }

Both of the static ``var`` variables would be included in the table.  All
functions should emit both their full names and their basenames.  For C or C++,
the full name is the mangled name (if available) which is usually in the
``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the
function basename.  If global or static variables have a mangled name in a
``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the
simple name found in the ``DW_AT_name`` attribute.

"``.apple_types``" sections should contain an entry for each DWARF DIE whose
tag is one of:

* DW_TAG_array_type
* DW_TAG_class_type
* DW_TAG_enumeration_type
* DW_TAG_pointer_type
* DW_TAG_reference_type
* DW_TAG_string_type
* DW_TAG_structure_type
* DW_TAG_subroutine_type
* DW_TAG_typedef
* DW_TAG_union_type
* DW_TAG_ptr_to_member_type
* DW_TAG_set_type
* DW_TAG_subrange_type
* DW_TAG_base_type
* DW_TAG_const_type
* DW_TAG_immutable_type
* DW_TAG_file_type
* DW_TAG_namelist
* DW_TAG_packed_type
* DW_TAG_volatile_type
* DW_TAG_restrict_type
* DW_TAG_atomic_type
* DW_TAG_interface_type
* DW_TAG_unspecified_type
* DW_TAG_shared_type

Only entries with a ``DW_AT_name`` attribute are included, and the entry must
not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero
value).  For example, using the following code:

.. code-block:: c

  int main ()
  {
    int *b = 0;
    return *b;
  }

We get a few type DIEs:

.. code-block:: none

  0x00000067:     TAG_base_type [5]
                  AT_encoding( DW_ATE_signed )
                  AT_name( "int" )
                  AT_byte_size( 0x04 )

  0x0000006e:     TAG_pointer_type [6]
                  AT_type( {0x00000067} ( int ) )
                  AT_byte_size( 0x08 )

The ``DW_TAG_pointer_type`` is not included because it does not have a
``DW_AT_name``.

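The inclusion rule can be sketched as a small predicate.  The ``type_die``
structure below is a hypothetical, simplified view of a DIE invented for this
example; it is not an LLVM data structure, and the tag check is reduced to a
single flag:

.. code-block:: c

  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical, simplified view of a DIE (an assumption of this sketch). */
  struct type_die {
    bool tag_is_listed;   /* DW_TAG is one of the type tags listed above */
    const char *name;     /* DW_AT_name, or NULL if the attribute is absent */
    bool is_declaration;  /* DW_AT_declaration present with non-zero value */
  };

  /* A DIE is entered only if it has a listed tag, has a name, and is not a
   * forward declaration. */
  static bool belongs_in_apple_types(const struct type_die *die) {
    return die->tag_is_listed && die->name != NULL && !die->is_declaration;
  }

For the example above, the ``int`` base type passes all three checks, while
the unnamed pointer type fails the name check.
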
The "``.apple_namespaces``" section should contain all ``DW_TAG_namespace``
DIEs.  If we run into a namespace that has no name, it is an anonymous
namespace, and the name should be output as "``(anonymous namespace)``"
(without the quotes).  Why?  This matches the output of
``abi::__cxa_demangle()``, the function in the standard C++ library that
demangles mangled names.


Language Extensions and File Format Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Objective-C Extensions
""""""""""""""""""""""

The "``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for
an Objective-C class.  The name used in the hash table is the name of the
Objective-C class itself.  If the Objective-C class has a category, then an
entry is made for both the class name without the category, and for the class
name with the category.  So if we have a DIE at offset 0x1234 for the method
"``-[NSString(my_additions) stringWithSpecialString:]``", we would add
an entry for "``NSString``" that points to DIE 0x1234, and an entry for
"``NSString(my_additions)``" that points to 0x1234.  This allows us to quickly
track down all Objective-C methods for an Objective-C class when doing
expressions.  It is needed because of the dynamic nature of Objective-C, where
anyone can add methods to a class.  The DWARF for Objective-C methods is also
emitted differently from that for C++ classes: the methods are not usually
contained in the class definition, but are scattered across one or more
compile units.  Categories can also be defined in different shared libraries.
So we need to be able to quickly find all of the methods and class functions
given the Objective-C class name, or quickly find all methods and class
functions for a class + category name.  This table does not contain any
selector names; it just maps Objective-C class names (or class names +
category) to all of the methods and class functions.  The selectors are added
as function basenames in the "``.apple_names``" section.

In the "``.apple_names``" section for Objective-C functions, the full name is
the entire function name with the brackets ("``-[NSString
stringWithCString:]``") and the basename is the selector only
("``stringWithCString:``").

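To make the naming concrete, here is a small sketch that splits a bracketed
method name into the class name and class + category name (the
"``.apple_objc``" keys) and the selector (the "``.apple_names``" basename).
The ``objc_names`` structure, the helper names, and the fixed 64-byte buffers
are assumptions made for this example only:

.. code-block:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <string.h>

  /* Illustrative only: the names derived from one Objective-C method name. */
  struct objc_names {
    char class_name[64];          /* e.g. "NSString" */
    char class_with_category[64]; /* e.g. "NSString(my_additions)", or "" */
    char selector[64];            /* e.g. "stringWithSpecialString:" */
  };

  /* Copy n bytes of src into a 64-byte dst; fail if they do not fit. */
  static bool copy_span(char dst[64], const char *src, size_t n) {
    if (n >= 64)
      return false;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return true;
  }

  /* Split "-[Class(Category) selector]" or "+[Class selector]" into parts. */
  static bool split_objc_method(const char *full, struct objc_names *out) {
    size_t len = strlen(full);
    if (len < 5 || (full[0] != '-' && full[0] != '+') || full[1] != '[' ||
        full[len - 1] != ']')
      return false;
    const char *start = full + 2;
    const char *space = strchr(start, ' ');
    if (space == NULL)
      return false;
    /* A '(' before the space marks the start of a category name. */
    const char *paren = memchr(start, '(', (size_t)(space - start));
    const char *class_end = paren ? paren : space;
    out->class_with_category[0] = '\0';
    if (!copy_span(out->class_name, start, (size_t)(class_end - start)))
      return false;
    if (paren && !copy_span(out->class_with_category, start,
                            (size_t)(space - start)))
      return false;
    /* The selector runs from after the space up to the closing bracket. */
    return copy_span(out->selector, space + 1,
                     (size_t)(full + len - 1 - (space + 1)));
  }
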
Mach-O Changes
""""""""""""""

The section names given above for the Apple hash tables are for non-Mach-O
files.  For Mach-O files, the sections should be contained in the ``__DWARF``
segment with names as follows:

* "``.apple_names``" -> "``__apple_names``"
* "``.apple_types``" -> "``__apple_types``"
* "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit)
* "``.apple_objc``" -> "``__apple_objc``"

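The four names above are consistent with a simple rule: replace the leading
dot with two underscores and truncate to 16 characters.  The sketch below
assumes that rule; it is inferred from the list and is not taken from any
Mach-O writer, and ``macho_section_name`` is a made-up helper name:

.. code-block:: c

  #include <stddef.h>
  #include <string.h>

  #define MACHO_SECT_NAME_MAX 16 /* the 16 character limit noted above */

  /* Writes the Mach-O name for a dotted section name into `out`, which must
   * hold MACHO_SECT_NAME_MAX + 1 bytes.  The result is NUL-terminated here
   * for convenience. */
  static void macho_section_name(const char *dotted,
                                 char out[MACHO_SECT_NAME_MAX + 1]) {
    size_t n = 0;
    out[n++] = '_';
    out[n++] = '_';
    if (*dotted == '.')
      ++dotted;                      /* drop the leading dot */
    while (*dotted != '\0' && n < MACHO_SECT_NAME_MAX)
      out[n++] = *dotted++;          /* copy until the limit is reached */
    out[n] = '\0';
  }
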
.. _codeview:

CodeView Debug Info Format
==========================

LLVM supports emitting CodeView, the Microsoft debug info format, and this
section describes the design and implementation of that support.

Format Background
-----------------

CodeView as a format is clearly oriented around C++ debugging, and in C++, the
majority of debug information tends to be type information. Therefore, the
overriding design constraint of CodeView is the separation of type information
from other "symbol" information so that type information can be efficiently
merged across translation units. Both type information and symbol information
are generally stored as a sequence of records, where each record begins with a
16-bit record size and a 16-bit record kind.

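A minimal sketch of walking such a record sequence follows.  It assumes that
the 16-bit size counts the bytes that follow the size field (so it includes
the 2-byte kind), that fields are little-endian, and that ``data`` points at
the first record, past any signature at the start of the section.  The
function names are hypothetical:

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Read a little-endian 16-bit value. */
  static uint16_t read_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
  }

  /* Count the records in `data` whose kind equals `kind`.
   * Returns -1 if a record header or body runs past the end of the buffer.
   * Assumption: record_size counts the bytes after the size field itself. */
  static int count_records_of_kind(const uint8_t *data, size_t size,
                                   uint16_t kind) {
    int count = 0;
    size_t pos = 0;
    while (pos < size) {
      if (size - pos < 4)
        return -1;                   /* no room for size + kind */
      uint16_t record_size = read_u16le(data + pos);
      uint16_t record_kind = read_u16le(data + pos + 2);
      if (record_size < 2 || record_size > size - pos - 2)
        return -1;                   /* malformed or truncated record */
      if (record_kind == kind)
        ++count;
      pos += 2 + (size_t)record_size; /* advance to the next record */
    }
    return count;
  }

Because every record carries its own size, a consumer can skip record kinds
it does not understand without decoding their contents.
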
Type information is usually stored in the ``.debug$T`` section of the object
file.  All other debug info, such as line info, string table, symbol info, and
inlinee info, is stored in one or more ``.debug$S`` sections. There may only be
one ``.debug$T`` section per object file, since all other debug info refers to
it. If a PDB (enabled by the ``/Zi`` MSVC option) was used during compilation,
the ``.debug$T`` section will contain only an ``LF_TYPESERVER2`` record pointing
to the PDB. When using PDBs, symbol information appears to remain in the object
file ``.debug$S`` sections.

Type records are referred to by their index, which is the number of records in
the stream before a given record plus ``0x1000``. Many common basic types, such
as the basic integral types and unqualified pointers to them, are represented
using type indices less than ``0x1000``. Such basic types are built in to
CodeView consumers and do not require type records.

Each type record may only contain type indices that are less than its own type
index. This ensures that the graph of type stream references is acyclic. While
the source-level type graph may contain cycles through pointer types (consider a
linked list struct), these cycles are removed from the type stream by always
referring to the forward declaration record of user-defined record types. Only
"symbol" records in the ``.debug$S`` streams may refer to complete,
non-forward-declaration type records.

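The index arithmetic and the ordering rule can be written down directly.
This is only a sketch of the two rules stated above; the macro and function
names are invented for the example:

.. code-block:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  #define FIRST_RECORD_TYPE_INDEX 0x1000u

  /* Index of the record that has `records_before` records ahead of it. */
  static uint32_t type_index_for_position(uint32_t records_before) {
    return FIRST_RECORD_TYPE_INDEX + records_before;
  }

  /* Indices below 0x1000 name built-in types and have no record. */
  static bool is_builtin_type_index(uint32_t index) {
    return index < FIRST_RECORD_TYPE_INDEX;
  }

  /* True if every index in `refs` is less than `own_index`, which is the
   * condition that keeps the type stream acyclic. */
  static bool refs_are_valid(uint32_t own_index, const uint32_t *refs,
                             size_t n) {
    for (size_t i = 0; i < n; ++i)
      if (refs[i] >= own_index)
        return false;
    return true;
  }

A record can therefore refer to any built-in type and to any earlier record,
but never to itself or to a later record.
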
Working with CodeView
---------------------

These are instructions for some common tasks for developers working to improve
LLVM's CodeView support. Most of them revolve around using the CodeView dumper
embedded in ``llvm-readobj``.

* Testing MSVC's output::

    $ cl -c -Z7 foo.cpp # Use /Z7 to keep types in the object file
    $ llvm-readobj --codeview foo.obj

* Getting LLVM IR debug info out of Clang::

    $ clang -g -gcodeview --target=x86_64-windows-msvc foo.cpp -S -emit-llvm

  Use this to generate LLVM IR for LLVM test cases.

* Generating and dumping CodeView from LLVM IR metadata::

    $ llc foo.ll -filetype=obj -o foo.obj
    $ llvm-readobj --codeview foo.obj > foo.txt

  Use this pattern in lit test cases and FileCheck the output of
  ``llvm-readobj``.

Improving LLVM's CodeView support is a process of finding interesting type
records, constructing a C++ test case that makes MSVC emit those records,
dumping the records, understanding them, and then generating equivalent records
in LLVM's backend.