================================
Source Level Debugging with LLVM
================================

.. contents::
   :local:

Introduction
============

This document is the central repository for all information pertaining to debug
information in LLVM.  It describes the :ref:`actual format that the LLVM debug
information takes <format>`, which is useful for those interested in creating
front-ends or dealing directly with the information.  Further, this document
provides specific examples of what debug information for C/C++ looks like.

Philosophy behind LLVM debugging information
--------------------------------------------

The idea of the LLVM debugging information is to capture how the important
pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
Several design aspects have shaped the solution that appears here.  The
important ones are:

* Debugging information should have very little impact on the rest of the
  compiler.  No transformations, analyses, or code generators should need to
  be modified because of debugging information.

* LLVM optimizations should interact in :ref:`well-defined and easily described
  ways <intro_debugopt>` with the debugging information.

* Because LLVM is designed to support arbitrary programming languages,
  LLVM-to-LLVM tools should not need to know anything about the semantics of
  the source-level-language.

* Source-level languages are often **widely** different from one another.
  LLVM should not put any restrictions on the flavor of the source-language,
  and the debugging information should work with any language.

* With code generator support, it should be possible to use an LLVM compiler
  to compile a program to native machine code and standard debugging
  formats.  This allows compatibility with traditional machine-code level
  debuggers, like GDB or DBX.

The approach used by the LLVM implementation is to use a small set of
:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
between LLVM program objects and the source-level objects.  The description of
the source-level program is maintained in LLVM metadata in an
:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
currently uses working draft 7 of the `DWARF 3 standard
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_).

When a program is being debugged, a debugger interacts with the user and turns
the stored debug information into source-language specific information.
As such, a debugger must be aware of the source-language, and is thus tied to a
specific language or family of languages.

Debug information consumers
---------------------------

The role of debug information is to provide meta information normally stripped
away during the compilation process.  This meta information provides an LLVM
user with a relationship between generated code and the original program source
code.

Currently, there are two backend consumers of debug info: DwarfDebug and
CodeViewDebug. DwarfDebug produces DWARF suitable for use with GDB, LLDB, and
other DWARF-based debuggers. :ref:`CodeViewDebug <codeview>` produces CodeView,
the Microsoft debug info format, which is usable with Microsoft debuggers such
as Visual Studio and WinDBG. LLVM's debug information format is mostly derived
from and inspired by DWARF, but it is feasible to translate into other target
debug info formats such as STABS.

It would also be reasonable to use debug information to feed profiling tools
for analysis of generated code, or tools for reconstructing the original
source from generated code.

.. _intro_debugopt:

Debug information and optimizations
-----------------------------------

An extremely high priority of LLVM debugging information is to make it interact
well with optimizations and analysis.  In particular, the LLVM debug
information provides the following guarantees:

* LLVM debug information **always provides information to accurately read
  the source-level state of the program**, regardless of which LLVM
  optimizations have been run. :doc:`HowToUpdateDebugInfo` specifies how debug
  info should be updated in various kinds of code transformations to avoid
  breaking this guarantee, and how to preserve as much useful debug info as
  possible.  Note that some optimizations may impact the ability to modify the
  current state of the program with a debugger, such as setting program
  variables, or calling functions that have been deleted.

* As desired, LLVM optimizations can be upgraded to be aware of debugging
  information, allowing them to update the debugging information as they
  perform aggressive optimizations.  This means that, with effort, the LLVM
  optimizers could optimize debug code just as well as non-debug code.

* LLVM debug information does not prevent optimizations from
  happening (for example inlining, basic block reordering/merging/cleanup,
  tail duplication, etc).

* LLVM debug information is automatically optimized along with the rest of
  the program, using existing facilities.  For example, duplicate
  information is automatically merged by the linker, and unused information
  is automatically removed.

Basically, the debug information allows you to compile a program with
"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
the program as it executes from a debugger.  Compiling a program with
"``-O3 -g``" gives you full debug information that is always available and
accurate for reading (e.g., you get accurate stack traces despite tail call
elimination and inlining), but you might lose the ability to modify the program
and call functions which were optimized out of the program, or inlined away
completely.

The :doc:`LLVM test-suite <TestSuiteMakefileGuide>` provides a framework to
test the optimizer's handling of debugging information.  It can be run like
this:

.. code-block:: bash

  % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
  % make TEST=dbgopt

This will test the impact of debugging information on optimization passes.  If
debugging information influences optimization passes then it will be reported
as a failure.
See :doc:`TestingGuide` for more information on LLVM test
infrastructure and how to run various tests.

.. _format:

Debugging information format
============================

LLVM debugging information has been carefully designed to make it possible for
the optimizer to optimize the program and debugging information without
necessarily having to know anything about debugging information.  In
particular, the use of metadata avoids duplicated debugging information from
the beginning, and the global dead code elimination pass automatically deletes
debugging information for a function if it decides to delete the function.

To do this, most of the debugging information (descriptors for types,
variables, functions, source files, etc) is inserted by the language front-end
in the form of LLVM metadata.

Debug information is designed to be agnostic about the target debugger and
debugging information representation (e.g. DWARF/Stabs/etc).  It uses a generic
pass to decode the information that represents variables, types, functions,
namespaces, etc: this allows for arbitrary source-language semantics and
type-systems to be used, as long as there is a module written for the target
debugger to interpret the information.
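
As a rough illustration of the metadata form, the sketch below shows the kind
of module-level descriptors a front-end might emit for a single C file.  The
file name, directory, producer string, and node numbers are placeholders chosen
for this example; they are not output from a real compiler run.

.. code-block:: text

  !llvm.dbg.cu = !{!0}
  !llvm.module.flags = !{!3}

  ; Hypothetical compile unit for a source file named example.c.
  !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "example front-end", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug)
  !1 = !DIFile(filename: "example.c", directory: "/path/to/src")
  ; A type descriptor that variable and function descriptors can refer to.
  !2 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
  !3 = !{i32 2, !"Debug Info Version", i32 3}

Descriptors refer to one another by node number, so a type such as ``!2`` is
written once and can be shared by every variable that uses it.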

To provide basic functionality, the LLVM debugger does have to make some
assumptions about the source-level language being debugged, though it keeps
these to a minimum.  The only common features that the LLVM debugger assumes
exist are `source files <LangRef.html#difile>`_, and `program objects
<LangRef.html#diglobalvariable>`_.  These abstract objects are used by a
debugger to form stack traces, show information about local variables, etc.

This section of the documentation first describes the representation aspects
common to any source-language.  :ref:`ccxx_frontend` describes the data layout
conventions used by the C and C++ front-ends.

Debug information descriptors are `specialized metadata nodes
<LangRef.html#specialized-metadata>`_, first-class subclasses of ``Metadata``.

.. _format_common_intrinsics:

Debugger intrinsic functions
----------------------------

LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
track source local variables through optimization and code generation.

``llvm.dbg.addr``
^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.addr(metadata, metadata, metadata)

This intrinsic provides information about a local element (e.g., variable).
The first argument is metadata holding the address of the variable, typically a
static alloca in the function entry block.  The second argument is a
`local variable <LangRef.html#dilocalvariable>`_ containing a description of
the variable.  The third argument is a `complex expression
<LangRef.html#diexpression>`_.  An ``llvm.dbg.addr`` intrinsic describes the
*address* of a source variable.

.. code-block:: text

    %i.addr = alloca i32, align 4
    call void @llvm.dbg.addr(metadata i32* %i.addr, metadata !1,
                             metadata !DIExpression()), !dbg !2
    !1 = !DILocalVariable(name: "i", ...) ; int i
    !2 = !DILocation(...)
    ...
    %buffer = alloca [256 x i8], align 8
    ; The address of i is buffer+64.
    call void @llvm.dbg.addr(metadata [256 x i8]* %buffer, metadata !3,
                             metadata !DIExpression(DW_OP_plus_uconst, 64)), !dbg !4
    !3 = !DILocalVariable(name: "i", ...) ; int i
    !4 = !DILocation(...)

A frontend should generate exactly one call to ``llvm.dbg.addr`` at the point
of declaration of a source variable.
Optimization passes that fully promote the
variable from memory to SSA values will replace this call with possibly
multiple calls to ``llvm.dbg.value``. Passes that delete stores effectively
perform partial promotion, and they will insert a mix of calls to
``llvm.dbg.value`` and ``llvm.dbg.addr`` to track the source variable value
when it is available.  After optimization, there may be multiple calls to
``llvm.dbg.addr`` describing the program points where the variable lives in
memory.  All calls for the same concrete source variable must agree on the
memory location.


``llvm.dbg.declare``
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.declare(metadata, metadata, metadata)

This intrinsic is identical to ``llvm.dbg.addr``, except that there can only be
one call to ``llvm.dbg.declare`` for a given concrete `local variable
<LangRef.html#dilocalvariable>`_. It is not control-dependent, meaning that if
a call to ``llvm.dbg.declare`` exists and has a valid location argument, that
address is considered to be the true home of the variable across its entire
lifetime. This makes it hard for optimizations to preserve accurate debug info
in the presence of ``llvm.dbg.declare``, so we are transitioning away from it,
and we plan to deprecate it in future LLVM releases.
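
For comparison with the ``llvm.dbg.addr`` example above, a minimal sketch of
the single-declaration form might look as follows.  The variable name ``x`` and
the metadata node numbers are made up for illustration, and elided fields are
left as ``...``.

.. code-block:: text

    %x.addr = alloca i32, align 4
    ; The only llvm.dbg.declare for "x": %x.addr is treated as the home of
    ; the variable for its entire lifetime.
    call void @llvm.dbg.declare(metadata i32* %x.addr, metadata !1,
                                metadata !DIExpression()), !dbg !2
    !1 = !DILocalVariable(name: "x", ...) ; int x
    !2 = !DILocation(...)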


``llvm.dbg.value``
^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.value(metadata, metadata, metadata)

This intrinsic provides information when a user source variable is set to a new
value.  The first argument is the new value (wrapped as metadata).  The second
argument is a `local variable <LangRef.html#dilocalvariable>`_ containing a
description of the variable.  The third argument is a `complex expression
<LangRef.html#diexpression>`_.

An ``llvm.dbg.value`` intrinsic describes the *value* of a source variable
directly, not its address.  Note that the value operand of this intrinsic may
be indirect (i.e., a pointer to the source variable), provided that interpreting
the complex expression derives the direct value.

Object lifetimes and scoping
============================

In many languages, the local variables in functions can have their lifetimes or
scopes limited to a subset of a function.  In the C family of languages, for
example, variables are only live (readable and writable) within the source
block that they are defined in.  In functional languages, values are only
readable after they have been defined.
Though this is a very obvious concept,
it is non-trivial to model in LLVM, because it has no notion of scoping in this
sense, and does not want to be tied to a language's scoping rules.

In order to handle this, the LLVM debug format uses the metadata attached to
LLVM instructions to encode line number and scoping information.  Consider the
following C fragment, for example:

.. code-block:: c

  1.  void foo() {
  2.    int X = 21;
  3.    int Y = 22;
  4.    {
  5.      int Z = 23;
  6.      Z = X;
  7.    }
  8.    X = Y;
  9.  }

.. FIXME: Update the following example to use llvm.dbg.addr once that is the
   default in clang.

Compiled to LLVM, this function would be represented like this:

.. code-block:: text

  ; Function Attrs: nounwind ssp uwtable
  define void @foo() #0 !dbg !4 {
  entry:
    %X = alloca i32, align 4
    %Y = alloca i32, align 4
    %Z = alloca i32, align 4
    call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
    store i32 21, i32* %X, align 4, !dbg !14
    call void @llvm.dbg.declare(metadata i32* %Y, metadata !15, metadata !13), !dbg !16
    store i32 22, i32* %Y, align 4, !dbg !16
    call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
    store i32 23, i32* %Z, align 4, !dbg !19
    %0 = load i32, i32* %X, align 4, !dbg !20
    store i32 %0, i32* %Z, align 4, !dbg !21
    %1 = load i32, i32* %Y, align 4, !dbg !22
    store i32 %1, i32* %X, align 4, !dbg !23
    ret void, !dbg !24
  }

  ; Function Attrs: nounwind readnone
  declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

  attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "frame-pointer"="all" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
  attributes #1 = { nounwind readnone }

  !llvm.dbg.cu = !{!0}
  !llvm.module.flags = !{!7, !8, !9}
  !llvm.ident = !{!10}

  !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !2, subprograms: !3, globals: !2, imports: !2)
  !1 = !DIFile(filename: "/dev/stdin", directory: "/Users/dexonsmith/data/llvm/debug-info")
  !2 = !{}
  !3 = !{!4}
  !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: false, retainedNodes: !2)
  !5 = !DISubroutineType(types: !6)
  !6 = !{null}
  !7 = !{i32 2, !"Dwarf Version", i32 2}
  !8 = !{i32 2, !"Debug Info Version", i32 3}
  !9 = !{i32 1, !"PIC Level", i32 2}
  !10 = !{!"clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)"}
  !11 = !DILocalVariable(name: "X", scope: !4, file: !1, line: 2, type: !12)
  !12 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
  !13 = !DIExpression()
  !14 = !DILocation(line: 2, column: 9, scope: !4)
  !15 = !DILocalVariable(name: "Y", scope: !4, file: !1, line: 3, type: !12)
  !16 = !DILocation(line: 3, column: 9, scope: !4)
  !17 = !DILocalVariable(name: "Z", scope: !18, file: !1, line: 5, type: !12)
  !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
  !19 = !DILocation(line: 5, column: 11, scope: !18)
  !20 = !DILocation(line: 6, column: 11, scope: !18)
  !21 = !DILocation(line: 6, column: 9, scope: !18)
  !22 = !DILocation(line: 8, column: 9, scope: !4)
  !23 = !DILocation(line: 8, column: 7, scope: !4)
  !24 = !DILocation(line: 9, column: 3, scope: !4)


This example illustrates a few important details about LLVM debugging
information.  In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
location information, which are attached to an instruction, are applied
together to allow a debugger to analyze the relationship between statements,
variable definitions, and the code used to implement the function.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
    ; [debug line = 2:9] [debug variable = X]

The first intrinsic ``@llvm.dbg.declare`` encodes debugging information for the
variable ``X``.  The metadata ``!dbg !14`` attached to the intrinsic provides
scope information for the variable ``X``.

.. code-block:: text

  !14 = !DILocation(line: 2, column: 9, scope: !4)
  !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5,
                              isLocal: false, isDefinition: true, scopeLine: 1,
                              isOptimized: false, retainedNodes: !2)

Here ``!14`` is metadata providing `location information
<LangRef.html#dilocation>`_.  In this example, scope is encoded by ``!4``, a
`subprogram descriptor <LangRef.html#disubprogram>`_.  This way the location
information attached to the intrinsics indicates that the variable ``X`` is
declared at line number 2 at a function level scope in function ``foo``.

Now let's take another example.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
    ; [debug line = 5:11] [debug variable = Z]

The third intrinsic ``@llvm.dbg.declare`` encodes debugging information for
variable ``Z``.  The metadata ``!dbg !19`` attached to the intrinsic provides
scope information for the variable ``Z``.

.. code-block:: text

  !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
  !19 = !DILocation(line: 5, column: 11, scope: !18)

Here ``!19`` indicates that ``Z`` is declared at line number 5 and column
number 11 inside of lexical scope ``!18``.  The lexical scope itself resides
inside of subprogram ``!4`` described above.

The scope information attached to each instruction provides a straightforward
way to find instructions covered by a scope.

Object lifetime in optimized code
=================================

In the example above, every variable assignment uniquely corresponds to a
memory store to the variable's position on the stack.  However, in heavily
optimized code LLVM promotes most variables into SSA values, which can
eventually be placed in physical registers or memory locations.  To track SSA
To track SSA 40266943c32SJeremy Morsevalues through compilation, when objects are promoted to SSA values an 40366943c32SJeremy Morse``llvm.dbg.value`` intrinsic is created for each assignment, recording the 40466943c32SJeremy Morsevariable's new location. Compared with the ``llvm.dbg.declare`` intrinsic: 40566943c32SJeremy Morse 4069370a741SAdrian Prantl* A dbg.value terminates the effect of any preceding dbg.values for (any 40766943c32SJeremy Morse overlapping fragments of) the specified variable. 40866943c32SJeremy Morse* The dbg.value's position in the IR defines where in the instruction stream 40966943c32SJeremy Morse the variable's value changes. 41066943c32SJeremy Morse* Operands can be constants, indicating the variable is assigned a 41166943c32SJeremy Morse constant value. 41266943c32SJeremy Morse 41366943c32SJeremy MorseCare must be taken to update ``llvm.dbg.value`` intrinsics when optimization 41466943c32SJeremy Morsepasses alter or move instructions and blocks -- the developer could observe such 41566943c32SJeremy Morsechanges reflected in the value of variables when debugging the program. For any 41666943c32SJeremy Morseexecution of the optimized program, the set of variable values presented to the 41766943c32SJeremy Morsedeveloper by the debugger should not show a state that would never have existed 41866943c32SJeremy Morsein the execution of the unoptimized program, given the same input. Doing so 41966943c32SJeremy Morserisks misleading the developer by reporting a state that does not exist, 42066943c32SJeremy Morsedamaging their understanding of the optimized program and undermining their 42166943c32SJeremy Morsetrust in the debugger. 42266943c32SJeremy Morse 42366943c32SJeremy MorseSometimes perfectly preserving variable locations is not possible, often when a 42466943c32SJeremy Morseredundant calculation is optimized out. 
In such cases, a ``llvm.dbg.value``
with operand ``undef`` should be used, to terminate earlier variable locations
and let the debugger present ``optimized out`` to the developer. Withholding
these potentially stale variable values from the developer diminishes the
amount of available debug information, but increases the reliability of the
remaining information.

To illustrate some potential issues, consider the following example:

.. code-block:: llvm

  define i32 @foo(i32 %bar, i1 %cond) {
  entry:
    call void @llvm.dbg.value(metadata i32 0, metadata !1, metadata !2)
    br i1 %cond, label %truebr, label %falsebr
  truebr:
    %tval = add i32 %bar, 1
    call void @llvm.dbg.value(metadata i32 %tval, metadata !1, metadata !2)
    %g1 = call i32 @gazonk()
    br label %exit
  falsebr:
    %fval = add i32 %bar, 2
    call void @llvm.dbg.value(metadata i32 %fval, metadata !1, metadata !2)
    %g2 = call i32 @gazonk()
    br label %exit
  exit:
    %merge = phi i32 [ %tval, %truebr ], [ %fval, %falsebr ]
    %g = phi i32 [ %g1, %truebr ], [ %g2, %falsebr ]
    call void @llvm.dbg.value(metadata i32 %merge, metadata !1, metadata !2)
    call void @llvm.dbg.value(metadata i32 %g, metadata !3, metadata !2)
    %plusten = add i32 %merge, 10
    %toret = add i32 %plusten, %g
    call void @llvm.dbg.value(metadata i32 %toret, metadata !1, metadata !2)
    ret i32 %toret
  }

This function contains two source-level variables, ``!1`` and ``!3``. It could,
perhaps, be optimized into the following code:

.. code-block:: llvm

  define i32 @foo(i32 %bar, i1 %cond) {
  entry:
    %g = call i32 @gazonk()
    %addoper = select i1 %cond, i32 11, i32 12
    %plusten = add i32 %bar, %addoper
    %toret = add i32 %plusten, %g
    ret i32 %toret
  }

What ``llvm.dbg.value`` intrinsics should be placed to represent the original variable
locations in this code? Unfortunately the second, third and fourth
dbg.values for ``!1`` in the source function have had their operands
(``%tval``, ``%fval``, ``%merge``) optimized out. Assuming we cannot recover
them, we might consider this placement of dbg.values:

.. code-block:: llvm

  define i32 @foo(i32 %bar, i1 %cond) {
  entry:
    call void @llvm.dbg.value(metadata i32 0, metadata !1, metadata !2)
    %g = call i32 @gazonk()
    call void @llvm.dbg.value(metadata i32 %g, metadata !3, metadata !2)
    %addoper = select i1 %cond, i32 11, i32 12
    %plusten = add i32 %bar, %addoper
    %toret = add i32 %plusten, %g
    call void @llvm.dbg.value(metadata i32 %toret, metadata !1, metadata !2)
    ret i32 %toret
  }

However, this will cause ``!3`` to have the return value of ``@gazonk()`` at
the same time as ``!1`` has the constant value zero -- a pair of assignments
that never occurred in the unoptimized program. To avoid this, we must terminate
the range over which ``!1`` has the constant value by inserting an undef
dbg.value before the dbg.value for ``!3``:

.. code-block:: llvm

  define i32 @foo(i32 %bar, i1 %cond) {
  entry:
    call void @llvm.dbg.value(metadata i32 0, metadata !1, metadata !2)
    %g = call i32 @gazonk()
    call void @llvm.dbg.value(metadata i32 undef, metadata !1, metadata !2)
    call void @llvm.dbg.value(metadata i32 %g, metadata !3, metadata !2)
    %addoper = select i1 %cond, i32 11, i32 12
    %plusten = add i32 %bar, %addoper
    %toret = add i32 %plusten, %g
    call void @llvm.dbg.value(metadata i32 %toret, metadata !1, metadata !2)
    ret i32 %toret
  }

In general, if any dbg.value has its operand optimized out and cannot be
recovered, then an undef dbg.value is necessary to terminate earlier variable
locations. Additional undef dbg.values may be necessary when the debugger can
observe re-ordering of assignments.

How variable location metadata is transformed during CodeGen
============================================================

LLVM preserves debug information throughout mid-level and backend passes,
ultimately producing a mapping between source-level information and
instruction ranges. This
is relatively straightforward for line number information, as mapping
instructions to line numbers is a simple association. For variable locations
however the story is more complex.
As each ``llvm.dbg.value`` intrinsic
represents a source-level assignment of a value to a source variable, the
variable location intrinsics effectively embed a small imperative program
within the LLVM IR. By the end of CodeGen, this becomes a mapping from each
variable to its machine locations over ranges of instructions.
From IR to object emission, the major transformations which affect variable
location fidelity are:

1. Instruction Selection
2. Register allocation
3. Block layout

each of which is discussed below. In addition, instruction scheduling can
significantly change the ordering of the program, and occurs in a number of
different passes.

Some variable locations are not transformed during CodeGen. Stack locations
specified by ``llvm.dbg.declare`` are valid and unchanging for the entire
duration of the function, and are recorded in a simple MachineFunction table.
Location changes in the prologue and epilogue of a function are also ignored:
frame setup and destruction may take several instructions, require a
disproportionate amount of debugging information in the output binary to
describe, and should be stepped over by debuggers anyway.
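
The "small imperative program" view described at the start of this section can
be made concrete with a toy model. The sketch below is not LLVM code: the
``DebugValue`` and ``LocationRange`` types and the ``buildRanges`` function are
hypothetical names chosen for illustration, and it assumes a single
straight-line block whose debug values are listed in program order. It turns a
sequence of assignments into the per-variable location ranges that CodeGen
ultimately has to produce.

.. code-block:: c++

  #include <cstddef>
  #include <map>
  #include <optional>
  #include <string>
  #include <vector>

  // Hypothetical model of one debug value: at instruction index Pos, variable
  // Var takes location Loc. An empty optional models an undefined location.
  struct DebugValue {
    std::size_t Pos;
    std::string Var;
    std::optional<std::string> Loc;
  };

  // Half-open range [Begin, End) of instruction indices over which a variable
  // lives in Loc.
  struct LocationRange {
    std::size_t Begin;
    std::size_t End;
    std::string Loc;
  };

  // Convert debug values (sorted by Pos) for a block of NumInsts instructions
  // into per-variable ranges. Each debug value terminates the previous
  // location of the same variable; undefined values open no new range.
  std::map<std::string, std::vector<LocationRange>>
  buildRanges(const std::vector<DebugValue> &Values, std::size_t NumInsts) {
    std::map<std::string, std::vector<LocationRange>> Ranges;
    std::map<std::string, DebugValue> Open; // most recent value per variable

    auto Close = [&](const DebugValue &DV, std::size_t End) {
      if (DV.Loc && DV.Pos < End)
        Ranges[DV.Var].push_back({DV.Pos, End, *DV.Loc});
    };

    for (const DebugValue &DV : Values) {
      auto It = Open.find(DV.Var);
      if (It != Open.end()) {
        Close(It->second, DV.Pos);
        It->second = DV;
      } else {
        Open.emplace(DV.Var, DV);
      }
    }
    // Locations still open at the end of the block extend to its last
    // instruction.
    for (const auto &Entry : Open)
      Close(Entry.second, NumInsts);
    return Ranges;
  }

Feeding this model a constant assignment, then an undefined value, then a
register assignment for the same variable yields two ranges separated by a gap,
which is the gap a debugger would report as ``optimized out``.
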

Variable locations in Instruction Selection and MIR
---------------------------------------------------

Instruction selection creates a MIR function from an IR function, and just as
it transforms ``intermediate`` instructions into machine instructions, so must
``intermediate`` variable locations become machine variable locations.
Within IR, variable locations are always identified by a Value, but in MIR
there can be different types of variable locations. In addition, some IR
locations become unavailable, for example if the operations of multiple IR
instructions are combined into one machine instruction (such as
multiply-and-accumulate) then intermediate Values are lost. To track variable
locations through instruction selection, they are first separated into
locations that do not depend on code generation (constants, stack locations,
allocated virtual registers) and those that do. For those that do, debug
metadata is attached to SDNodes in SelectionDAGs. After instruction selection
has occurred and a MIR function is created, if the SDNode associated with debug
metadata is allocated a virtual register, that virtual register is used as the
variable location. If the SDNode is folded into a machine instruction or
otherwise transformed into a non-register, the variable location becomes
unavailable.

Locations that are unavailable are treated as if they have been optimized out:
in IR the location would be assigned ``undef`` by a debug intrinsic, and in MIR
the equivalent location is used.

After MIR locations are assigned to each variable, machine pseudo-instructions
corresponding to each ``llvm.dbg.value`` and ``llvm.dbg.addr`` intrinsic are
inserted. There are two forms of this type of instruction.

The first form, ``DBG_VALUE``, appears thus:

.. code-block:: text

  DBG_VALUE %1, $noreg, !123, !DIExpression()

It has the following operands:

* The first operand can record the variable location as a register,
  a frame index, an immediate, or the base address register if the original
  debug intrinsic referred to memory. ``$noreg`` indicates the variable
  location is undefined, equivalent to an ``undef`` dbg.value operand.
* The type of the second operand indicates whether the variable location is
  directly referred to by the DBG_VALUE, or whether it is indirect. The
  ``$noreg`` register signifies the former, an immediate operand (0) the
  latter.
* Operand 3 is the Variable field of the original debug intrinsic.
* Operand 4 is the Expression field of the original debug intrinsic.

The second form, ``DBG_VALUE_LIST``, appears thus:

.. code-block:: text

  DBG_VALUE_LIST !123, !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus), %1, %2

It has the following operands:

* The first operand is the Variable field of the original debug intrinsic.
* The second operand is the Expression field of the original debug intrinsic.
* Any number of operands, from the 3rd onwards, record a sequence of variable
  location operands, which may take any of the same values as the first
  operand of the ``DBG_VALUE`` instruction above. These variable location
  operands are inserted into the final DWARF Expression in positions indicated
  by the DW_OP_LLVM_arg operator in the `DIExpression
  <LangRef.html#diexpression>`_.

The position at which the DBG_VALUEs are inserted should correspond to the
positions of their matching ``llvm.dbg.value`` intrinsics in the IR block. As
with optimization, LLVM aims to preserve the order in which variable
assignments occurred in the source program. However SelectionDAG performs some
instruction scheduling, which can reorder assignments (discussed below).
Function parameter locations are moved to the beginning of the function if
they're not already, to ensure they're immediately available on function entry.

To demonstrate variable locations during instruction selection, consider
the following example:

.. code-block:: llvm

  define i32 @foo(i32* %addr) {
  entry:
    call void @llvm.dbg.value(metadata i32 0, metadata !3, metadata !DIExpression()), !dbg !5
    br label %bb1, !dbg !5

  bb1:                                              ; preds = %bb1, %entry
    %bar.0 = phi i32 [ 0, %entry ], [ %add, %bb1 ]
    call void @llvm.dbg.value(metadata i32 %bar.0, metadata !3, metadata !DIExpression()), !dbg !5
    %addr1 = getelementptr i32, i32 *%addr, i32 1, !dbg !5
    call void @llvm.dbg.value(metadata i32 *%addr1, metadata !3, metadata !DIExpression()), !dbg !5
    %loaded1 = load i32, i32* %addr1, !dbg !5
    %addr2 = getelementptr i32, i32 *%addr, i32 %bar.0, !dbg !5
    call void @llvm.dbg.value(metadata i32 *%addr2, metadata !3, metadata !DIExpression()), !dbg !5
    %loaded2 = load i32, i32* %addr2, !dbg !5
    %add = add i32 %bar.0, 1, !dbg !5
    call void @llvm.dbg.value(metadata i32 %add, metadata !3, metadata !DIExpression()), !dbg !5
    %added = add i32 %loaded1, %loaded2
    %cond = icmp ult i32 %added, %bar.0, !dbg !5
    br i1 %cond, label %bb1, label %bb2, !dbg !5

  bb2:                                              ; preds = %bb1
    ret i32 0, !dbg !5
  }

If one compiles this IR with ``llc -o - -start-after=codegen-prepare -stop-after=expand-isel-pseudos -mtriple=x86_64--``, the following MIR is produced:

.. code-block:: text

  bb.0.entry:
    successors: %bb.1(0x80000000)
    liveins: $rdi

    %2:gr64 = COPY $rdi
    %3:gr32 = MOV32r0 implicit-def dead $eflags
    DBG_VALUE 0, $noreg, !3, !DIExpression(), debug-location !5

  bb.1.bb1:
    successors: %bb.1(0x7c000000), %bb.2(0x04000000)

    %0:gr32 = PHI %3, %bb.0, %1, %bb.1
    DBG_VALUE %0, $noreg, !3, !DIExpression(), debug-location !5
    DBG_VALUE %2, $noreg, !3, !DIExpression(DW_OP_plus_uconst, 4, DW_OP_stack_value), debug-location !5
    %4:gr32 = MOV32rm %2, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
    %5:gr64_nosp = MOVSX64rr32 %0, debug-location !5
    DBG_VALUE $noreg, $noreg, !3, !DIExpression(), debug-location !5
    %1:gr32 = INC32r %0, implicit-def dead $eflags, debug-location !5
    DBG_VALUE %1, $noreg, !3, !DIExpression(), debug-location !5
    %6:gr32 = ADD32rm %4, %2, 4, killed %5, 0, $noreg, implicit-def dead $eflags :: (load 4 from %ir.addr2)
    %7:gr32 = SUB32rr %6, %0, implicit-def $eflags, debug-location !5
    JB_1 %bb.1, implicit $eflags, debug-location !5
    JMP_1 %bb.2, debug-location !5

  bb.2.bb2:
    %8:gr32 = MOV32r0 implicit-def dead $eflags
    $eax = COPY %8, debug-location !5
    RET 0, $eax, debug-location !5

Observe first that there is a DBG_VALUE instruction for every ``llvm.dbg.value``
intrinsic in the source IR, ensuring no source level assignments go missing.
Then consider the different ways in which variable locations have been recorded:

* For the first dbg.value an immediate operand is used to record a zero value.
* The dbg.value of the PHI instruction leads to a DBG_VALUE of virtual register
  ``%0``.
* The first GEP has its effect folded into the first load instruction
  (as a 4-byte offset), but the variable location is salvaged by folding
  the GEP's effect into the DIExpression.
* The second GEP is also folded into the corresponding load. However, it is
  not simple enough to be salvaged, and is emitted as a ``$noreg``
  DBG_VALUE, indicating that the variable takes on an undefined location.
* The final dbg.value has its Value placed in virtual register ``%1``.

Instruction Scheduling
----------------------

A number of passes can reschedule instructions, notably instruction selection
and the pre- and post-RA machine schedulers. Instruction scheduling can
significantly change the nature of the program -- in the (very unlikely) worst
case the instruction sequence could be completely reversed. In such
circumstances LLVM follows the principle applied to optimizations, that it is
better for the debugger not to display any state than a misleading state.
Thus, whenever instructions are advanced in order of execution, any
corresponding DBG_VALUE is kept in its original position, and if an instruction
is delayed then the variable is given an undefined location for the duration
of the delay. To illustrate, consider this pseudo-MIR:

.. code-block:: text

  %1:gr32 = MOV32rm %0, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
  DBG_VALUE %1, $noreg, !1, !2
  %4:gr32 = ADD32rr %3, %2, implicit-def dead $eflags
  DBG_VALUE %4, $noreg, !3, !4
  %7:gr32 = SUB32rr %6, %5, implicit-def dead $eflags
  DBG_VALUE %7, $noreg, !5, !6

Imagine that the SUB32rr were moved forward to give us the following MIR:

.. code-block:: text

  %7:gr32 = SUB32rr %6, %5, implicit-def dead $eflags
  %1:gr32 = MOV32rm %0, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
  DBG_VALUE %1, $noreg, !1, !2
  %4:gr32 = ADD32rr %3, %2, implicit-def dead $eflags
  DBG_VALUE %4, $noreg, !3, !4
  DBG_VALUE %7, $noreg, !5, !6

In this circumstance LLVM would leave the MIR as shown above. Were we to move
the DBG_VALUE of virtual register %7 upwards with the SUB32rr, we would re-order
assignments and introduce a new state of the program.
With the solution
above, by contrast, the debugger will see one fewer combination of variable
values, because ``!3`` and ``!5`` will change value at the same time. This is
preferred over misrepresenting the original program.

In comparison, if the MOV32rm were sunk, LLVM would produce the following:

.. code-block:: text

  DBG_VALUE $noreg, $noreg, !1, !2
  %4:gr32 = ADD32rr %3, %2, implicit-def dead $eflags
  DBG_VALUE %4, $noreg, !3, !4
  %7:gr32 = SUB32rr %6, %5, implicit-def dead $eflags
  DBG_VALUE %7, $noreg, !5, !6
  %1:gr32 = MOV32rm %0, 1, $noreg, 4, $noreg, debug-location !5 :: (load 4 from %ir.addr1)
  DBG_VALUE %1, $noreg, !1, !2

Here, to avoid presenting a state in which the first assignment to ``!1``
disappears, the DBG_VALUE at the top of the block assigns the variable the
undefined location, until its value is available at the end of the block where
an additional DBG_VALUE is added. Were any other DBG_VALUE for ``!1`` to occur
in the instructions that the MOV32rm was sunk past, the DBG_VALUE for ``%1``
would be dropped and the debugger would never observe it in the variable. This
accurately reflects that the value is not available during the corresponding
portion of the original program.
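
The two rules in this section (a hoisted instruction leaves its DBG_VALUE
behind; a sunk instruction leaves an undefined DBG_VALUE behind and gains a new
one at its destination) can be sketched as a toy model. This is an illustration
only and not the scheduler's implementation: ``Item``, ``Block``, ``hoist`` and
``sink`` are hypothetical names, and each instruction is reduced to the
register it defines.

.. code-block:: c++

  #include <cstddef>
  #include <string>
  #include <vector>

  // A toy MIR item: either a real instruction or a DBG_VALUE.
  struct Item {
    bool IsDebug;     // true for a DBG_VALUE
    std::string Name; // instruction: register it defines; DBG_VALUE: variable
    std::string Loc;  // DBG_VALUE only: register operand, or "$noreg"
  };

  using Block = std::vector<Item>;

  // Hoist the instruction at index From up to index To (To <= From).
  // Only the instruction moves; every DBG_VALUE keeps its original position.
  Block hoist(Block B, std::size_t From, std::size_t To) {
    Item Inst = B[From];
    B.erase(B.begin() + From);
    B.insert(B.begin() + To, Inst);
    return B;
  }

  // Sink the instruction at index From so that it lands just after the item
  // originally at index To (From < To).
  Block sink(const Block &B, std::size_t From, std::size_t To) {
    const Item Inst = B[From];
    Block Result(B.begin(), B.begin() + From);
    Block Reestablished;
    for (std::size_t I = From + 1; I <= To; ++I) {
      Item Cur = B[I];
      if (Cur.IsDebug && Cur.Loc == Inst.Name) {
        // The value does not exist yet at this point: mark it undefined.
        Cur.Loc = "$noreg";
        // Re-establish the location after the sunk instruction, unless a
        // later DBG_VALUE for the same variable in the skipped region
        // supersedes it.
        bool Superseded = false;
        for (std::size_t J = I + 1; J <= To; ++J)
          if (B[J].IsDebug && B[J].Name == Cur.Name)
            Superseded = true;
        if (!Superseded)
          Reestablished.push_back({true, Cur.Name, Inst.Name});
      }
      Result.push_back(Cur);
    }
    Result.push_back(Inst);
    Result.insert(Result.end(), Reestablished.begin(), Reestablished.end());
    Result.insert(Result.end(), B.begin() + To + 1, B.end());
    return Result;
  }

Applied to the six-item pseudo-MIR block of this section, ``sink`` on the
MOV32rm reproduces the seven-item sequence shown for the sunk case, and
``hoist`` on the SUB32rr leaves the DBG_VALUE of ``%7`` in last position.
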

Variable locations during Register Allocation
---------------------------------------------

To avoid debug instructions interfering with the register allocator, the
LiveDebugVariables pass extracts variable locations from a MIR function and
deletes the corresponding DBG_VALUE instructions. Some localized copy
propagation is performed within blocks. After register allocation, the
VirtRegRewriter pass re-inserts DBG_VALUE instructions in their original
positions, translating virtual register references into their physical
machine locations. To avoid encoding incorrect variable locations, in this
pass any DBG_VALUE of a virtual register that is not live is replaced by
the undefined location. The LiveDebugVariables pass may insert redundant
DBG_VALUEs because of virtual register rewriting. These will be subsequently
removed by the RemoveRedundantDebugValues pass.

LiveDebugValues expansion of variable locations
-----------------------------------------------

After all optimizations have run and shortly before emission, the
LiveDebugValues pass runs to achieve two aims:

* To propagate the location of variables through copies and register spills,
* For every block, to record every valid variable location in that block.

After this pass the DBG_VALUE instruction changes meaning: rather than
corresponding to a source-level assignment where the variable may change value,
it asserts the location of a variable in a block, and loses effect outside the
block. Propagating variable locations through copies and spills is
straightforward; determining the variable location in every basic block,
however, requires the consideration of control flow. Consider the following IR,
which presents several difficulties:

.. code-block:: text

  define dso_local i32 @foo(i1 %cond, i32 %input) !dbg !12 {
  entry:
    br i1 %cond, label %truebr, label %falsebr

  bb1:
    %value = phi i32 [ %value1, %truebr ], [ %value2, %falsebr ]
    br label %exit, !dbg !26

  truebr:
    call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !24
    call void @llvm.dbg.value(metadata i32 1, metadata !23, metadata !DIExpression()), !dbg !24
    %value1 = add i32 %input, 1
    br label %bb1

  falsebr:
    call void @llvm.dbg.value(metadata i32 %input, metadata !30, metadata !DIExpression()), !dbg !24
    call void @llvm.dbg.value(metadata i32 2, metadata !23, metadata !DIExpression()), !dbg !24
    %value2 = add i32 %input, 2
    br label %bb1

  exit:
    ret i32 %value, !dbg !30
  }

Here the difficulties are:

* The control flow is roughly the opposite of basic block order
* The value of the ``!23`` variable merges into ``%bb1``, but there is no PHI
  node

As mentioned above, the ``llvm.dbg.value`` intrinsics essentially form an
imperative program embedded in the IR, with each intrinsic defining a variable
location. This *could* be converted to an SSA form by mem2reg, in the same way
that it uses use-def chains to identify control flow merges and insert phi
nodes for IR Values. However, because debug variable locations are defined for
every machine instruction, in effect every IR instruction uses every variable
location, which would lead to a large number of debugging intrinsics being
generated.

Examining the example above, variable ``!30`` is assigned ``%input`` on both
conditional paths through the function, while ``!23`` is assigned differing
constant values on either path. Where control flow merges in ``%bb1`` we would
want ``!30`` to keep its location (``%input``), but ``!23`` to become undefined
as we cannot determine at runtime what value it should have in ``%bb1`` without
inserting a PHI node. mem2reg does not insert the PHI node to avoid changing
codegen when debugging is enabled, and does not insert the other dbg.values
to avoid adding very large numbers of intrinsics.
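
The merge behaviour wanted at ``%bb1`` (keep a location only when every
incoming path agrees on it) can be written down as a single join step. The
sketch below is a simplified model under stated assumptions, not the pass
itself: ``LocationMap`` and ``joinPredecessors`` are hypothetical names,
locations are plain strings, and only one merge point is handled, whereas a
real dataflow analysis would iterate over the whole control flow graph until
nothing changes.

.. code-block:: c++

  #include <cstddef>
  #include <map>
  #include <string>
  #include <vector>

  // Variable -> location at the end of one predecessor block. A variable with
  // no entry has an undefined location on that path.
  using LocationMap = std::map<std::string, std::string>;

  // Compute the locations live into a block from the locations live out of
  // its predecessors: a variable survives only if all predecessors agree.
  LocationMap joinPredecessors(const std::vector<LocationMap> &Preds) {
    if (Preds.empty())
      return {};
    LocationMap Result = Preds.front();
    for (std::size_t I = 1; I < Preds.size(); ++I) {
      for (auto It = Result.begin(); It != Result.end();) {
        auto Found = Preds[I].find(It->first);
        if (Found == Preds[I].end() || Found->second != It->second)
          It = Result.erase(It); // disagreement: location becomes undefined
        else
          ++It;
      }
    }
    return Result;
  }

With the two predecessors of ``%bb1`` from the example, where ``!30`` maps to
``%input`` on both paths and ``!23`` maps to ``1`` on one path and ``2`` on the
other, the join keeps ``!30`` and drops ``!23``.
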

Instead, LiveDebugValues determines variable locations when control
flow merges. A dataflow analysis is used to propagate locations between blocks:
when control flow merges, if a variable has the same location in all
predecessors then that location is propagated into the successor. If the
predecessor locations disagree, the location becomes undefined.

Once LiveDebugValues has run, every block should have all valid variable
locations described by DBG_VALUE instructions within the block. Very little
effort is then required by supporting classes (such as
DbgEntityHistoryCalculator) to build a map of each instruction to every
valid variable location, without the need to consider control flow. From
the example above, it is otherwise difficult to determine that the location
of variable ``!30`` should flow "up" into block ``%bb1``, but that the location
of variable ``!23`` should not flow "down" into the ``%exit`` block.

.. _ccxx_frontend:

C/C++ front-end specific debug information
==========================================

The C and C++ front-ends represent information about the program in a
format that is effectively identical to `DWARF <http://www.dwarfstd.org/>`_
in terms of information content.
This allows code generators to
trivially support native debuggers by generating standard DWARF
information, and contains enough information for non-DWARF targets to
translate it as needed.

This section describes the forms used to represent C and C++ programs. Other
languages could pattern themselves after this (which itself is tuned to
representing programs in the same way that DWARF does), or they could choose
to provide completely different forms if they don't fit into the DWARF model.
As support for debugging information gets added to the various LLVM
source-language front-ends, the information used should be documented here.

The following sections provide examples of a few C/C++ constructs and
the debug information that would best describe those constructs. The
canonical references are the ``DINode`` classes defined in
``include/llvm/IR/DebugInfoMetadata.h`` and the implementations of the
helper functions in ``lib/IR/DIBuilder.cpp``.

C/C++ source file information
-----------------------------

``llvm::Instruction`` provides easy access to metadata attached to an
instruction. One can extract line number information encoded in LLVM IR using
``Instruction::getDebugLoc()`` and ``DILocation::getLine()``.

.. code-block:: c++

  if (DILocation *Loc = I->getDebugLoc()) { // Here I is an LLVM instruction
    unsigned Line = Loc->getLine();
    StringRef File = Loc->getFilename();
    StringRef Dir = Loc->getDirectory();
    bool ImplicitCode = Loc->isImplicitCode();
  }

When the flag ImplicitCode is true, the Instruction has been added by the
front-end but doesn't correspond to source code written by the user. For
example:

.. code-block:: c++

  if (MyBoolean) {
    MyObject MO;
    ...
  }

At the end of the scope the MyObject's destructor is called but it isn't written
explicitly. This information is useful for code coverage, where it avoids
placing counters on closing braces.

C/C++ global variable information
---------------------------------

Given an integer global variable declared as follows:

.. code-block:: c

  _Alignas(8) int MyGlobal = 100;

a C/C++ front-end would generate the following descriptors:

.. code-block:: text

  ;;
  ;; Define the global itself.
  ;;
  @MyGlobal = global i32 100, align 8, !dbg !0

  ;;
  ;; List of debug info of globals
  ;;
  !llvm.dbg.cu = !{!1}

  ;; Some unrelated metadata.
  !llvm.module.flags = !{!6, !7}
  !llvm.ident = !{!8}

  ;; Define the global variable itself
  !0 = distinct !DIGlobalVariable(name: "MyGlobal", scope: !1, file: !2, line: 1, type: !5, isLocal: false, isDefinition: true, align: 64)

  ;; Define the compile unit.
  !1 = distinct !DICompileUnit(language: DW_LANG_C99, file: !2,
                               producer: "clang version 4.0.0",
                               isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug,
                               enums: !3, globals: !4)

  ;;
  ;; Define the file
  ;;
  !2 = !DIFile(filename: "/dev/stdin",
               directory: "/Users/dexonsmith/data/llvm/debug-info")

  ;; An empty array.
  !3 = !{}

  ;; The Array of Global Variables
  !4 = !{!0}

  ;;
  ;; Define the type
  ;;
  !5 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

  ;; Dwarf version to output.
  !6 = !{i32 2, !"Dwarf Version", i32 4}

  ;; Debug info schema version.
  !7 = !{i32 2, !"Debug Info Version", i32 3}

  ;; Compiler identification
  !8 = !{!"clang version 4.0.0"}


The align value in the DIGlobalVariable description specifies the variable's
alignment when it was forced by the C11 ``_Alignas()`` or C++11 ``alignas()``
keywords, or by the compiler attribute ``__attribute__((aligned ()))``.
Otherwise (when this field is missing) the alignment is considered default.
The value is used to emit ``DW_AT_alignment`` when producing DWARF output.

C/C++ function information
--------------------------

Given a function declared as follows:

.. code-block:: c

  int main(int argc, char *argv[]) {
    return 0;
  }

a C/C++ front-end would generate the following descriptors:

.. code-block:: text

  ;;
  ;; Define the anchor for subprograms.
  ;;
  !4 = !DISubprogram(name: "main", scope: !1, file: !1, line: 1, type: !5,
Exon Smith isLocal: false, isDefinition: true, scopeLine: 1, 1003d937cd9fSDuncan P. N. Exon Smith flags: DIFlagPrototyped, isOptimized: false, 1004f623dc9aSEllis Hoag retainedNodes: !2) 1005936675e2SDuncan P. N. Exon Smith 10066ac1de48SDmitri Gribenko ;; 10076ac1de48SDmitri Gribenko ;; Define the subprogram itself. 10086ac1de48SDmitri Gribenko ;; 100950108683SPeter Collingbourne define i32 @main(i32 %argc, i8** %argv) !dbg !4 { 10106ac1de48SDmitri Gribenko ... 10116ac1de48SDmitri Gribenko } 10126ac1de48SDmitri Gribenko 1013f919be33SAdrian PrantlC++ specific debug information 1014f919be33SAdrian Prantl============================== 1015f919be33SAdrian Prantl 1016f919be33SAdrian PrantlC++ special member functions information 1017f919be33SAdrian Prantl---------------------------------------- 1018f919be33SAdrian Prantl 1019f919be33SAdrian PrantlDWARF v5 introduces attributes defined to enhance debugging information of C++ programs. LLVM can generate (or omit) these appropriate DWARF attributes. In C++ a special member function Ctors, Dtors, Copy/Move Ctors, assignment operators can be declared with C++11 keyword deleted. This is represented in LLVM using spFlags value DISPFlagDeleted. 1020f919be33SAdrian Prantl 1021f919be33SAdrian PrantlGiven a class declaration with copy constructor declared as deleted: 1022f919be33SAdrian Prantl 1023f919be33SAdrian Prantl.. code-block:: c 1024f919be33SAdrian Prantl 1025f919be33SAdrian Prantl class foo { 1026f919be33SAdrian Prantl public: 1027f919be33SAdrian Prantl foo(const foo&) = deleted; 1028f919be33SAdrian Prantl }; 1029f919be33SAdrian Prantl 1030f65d4aa9SKazuaki IshizakiA C++ frontend would generate following: 1031f919be33SAdrian Prantl 1032f919be33SAdrian Prantl.. 
code-block:: text 1033f919be33SAdrian Prantl 1034f919be33SAdrian Prantl !17 = !DISubprogram(name: "foo", scope: !11, file: !1, line: 5, type: !18, scopeLine: 5, flags: DIFlagPublic | DIFlagPrototyped, spFlags: DISPFlagDeleted) 1035f919be33SAdrian Prantl 1036f65d4aa9SKazuaki Ishizakiand this will produce an additional DWARF attribute as: 1037f919be33SAdrian Prantl 1038f919be33SAdrian Prantl.. code-block:: text 1039f919be33SAdrian Prantl 1040f919be33SAdrian Prantl DW_TAG_subprogram [7] * 1041f919be33SAdrian Prantl DW_AT_name [DW_FORM_strx1] (indexed (00000006) string = "foo") 1042f919be33SAdrian Prantl DW_AT_decl_line [DW_FORM_data1] (5) 1043f919be33SAdrian Prantl ... 1044f919be33SAdrian Prantl DW_AT_deleted [DW_FORM_flag_present] (true) 1045f919be33SAdrian Prantl 1046e69917f1SAdrian PrantlFortran specific debug information 1047e69917f1SAdrian Prantl================================== 1048e69917f1SAdrian Prantl 1049e69917f1SAdrian PrantlFortran function information 1050e69917f1SAdrian Prantl---------------------------- 1051e69917f1SAdrian Prantl 1052e69917f1SAdrian PrantlThere are a few DWARF attributes defined to support client debugging of Fortran programs. LLVM can generate (or omit) the appropriate DWARF attributes for the prefix-specs of ELEMENTAL, PURE, IMPURE, RECURSIVE, and NON_RECURSIVE. This is done by using the spFlags values: DISPFlagElemental, DISPFlagPure, and DISPFlagRecursive. 1053e69917f1SAdrian Prantl 1054e69917f1SAdrian Prantl.. code-block:: fortran 1055e69917f1SAdrian Prantl 1056e69917f1SAdrian Prantl elemental function elem_func(a) 1057e69917f1SAdrian Prantl 1058e69917f1SAdrian Prantla Fortran front-end would generate the following descriptors: 1059e69917f1SAdrian Prantl 1060e69917f1SAdrian Prantl.. 
code-block:: text 1061e69917f1SAdrian Prantl 1062e69917f1SAdrian Prantl !11 = distinct !DISubprogram(name: "subroutine2", scope: !1, file: !1, 1063e69917f1SAdrian Prantl line: 5, type: !8, scopeLine: 6, 1064e69917f1SAdrian Prantl spFlags: DISPFlagDefinition | DISPFlagElemental, unit: !0, 1065e69917f1SAdrian Prantl retainedNodes: !2) 1066e69917f1SAdrian Prantl 1067e69917f1SAdrian Prantland this will materialize an additional DWARF attribute as: 1068e69917f1SAdrian Prantl 1069e69917f1SAdrian Prantl.. code-block:: text 1070e69917f1SAdrian Prantl 1071e69917f1SAdrian Prantl DW_TAG_subprogram [3] 1072e69917f1SAdrian Prantl DW_AT_low_pc [DW_FORM_addr] (0x0000000000000010 ".text") 1073e69917f1SAdrian Prantl DW_AT_high_pc [DW_FORM_data4] (0x00000001) 1074e69917f1SAdrian Prantl ... 1075e69917f1SAdrian Prantl DW_AT_elemental [DW_FORM_flag_present] (true) 1076e69917f1SAdrian Prantl 1077f91d18eaSSourabh Singh TomarThere are a few DWARF tags defined to represent Fortran specific constructs i.e DW_TAG_string_type for representing Fortran character(n). In LLVM this is represented as DIStringType. 1078f91d18eaSSourabh Singh Tomar 1079f91d18eaSSourabh Singh Tomar.. code-block:: fortran 1080f91d18eaSSourabh Singh Tomar 1081f91d18eaSSourabh Singh Tomar character(len=*), intent(in) :: string 1082f91d18eaSSourabh Singh Tomar 1083f91d18eaSSourabh Singh Tomara Fortran front-end would generate the following descriptors: 1084f91d18eaSSourabh Singh Tomar 1085f91d18eaSSourabh Singh Tomar.. 
code-block:: text 1086f91d18eaSSourabh Singh Tomar 1087f91d18eaSSourabh Singh Tomar !DILocalVariable(name: "string", arg: 1, scope: !10, file: !3, line: 4, type: !15) 1088f91d18eaSSourabh Singh Tomar !DIStringType(name: "character(*)!2", stringLength: !16, stringLengthExpression: !DIExpression(), size: 32) 1089f91d18eaSSourabh Singh Tomar 109070fdbf35SYASHASVI KHATAVKARA fortran deferred-length character can also contain the information of raw storage of the characters in addition to the length of the string. This information is encoded in the stringLocationExpression field. Based on this information, DW_AT_data_location attribute is emitted in a DW_TAG_string_type debug info. 109170fdbf35SYASHASVI KHATAVKAR 109270fdbf35SYASHASVI KHATAVKAR !DIStringType(name: "character(*)!2", stringLengthExpression: !DIExpression(), stringLocationExpression: !DIExpression(DW_OP_push_object_address, DW_OP_deref), size: 32) 1093f9f78a2cSYASHASVI KHATAVKAR 1094f91d18eaSSourabh Singh Tomarand this will materialize in DWARF tags as: 1095f91d18eaSSourabh Singh Tomar 1096f91d18eaSSourabh Singh Tomar.. code-block:: text 1097f91d18eaSSourabh Singh Tomar 1098f91d18eaSSourabh Singh Tomar DW_TAG_string_type 1099f91d18eaSSourabh Singh Tomar DW_AT_name ("character(*)!2") 1100f91d18eaSSourabh Singh Tomar DW_AT_string_length (0x00000064) 1101f91d18eaSSourabh Singh Tomar 0x00000064: DW_TAG_variable 1102f91d18eaSSourabh Singh Tomar DW_AT_location (DW_OP_fbreg +16) 1103f91d18eaSSourabh Singh Tomar DW_AT_type (0x00000083 "integer*8") 110470fdbf35SYASHASVI KHATAVKAR DW_AT_data_location (DW_OP_push_object_address, DW_OP_deref) 1105f91d18eaSSourabh Singh Tomar ... 1106f91d18eaSSourabh Singh Tomar DW_AT_artificial (true) 1107*eab6e94fSChih-Ping Chen 1108*eab6e94fSChih-Ping ChenA Fortran front-end may need to generate a *trampoline* function to call a 1109*eab6e94fSChih-Ping Chenfunction defined in a different compilation unit. 
In this case, the front-end
can emit the following descriptor for the trampoline function:

.. code-block:: text

  !DISubprogram(name: "sub1_.t0p", linkageName: "sub1_.t0p", scope: !4, file: !4, type: !5, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition, unit: !7, retainedNodes: !24, targetFuncName: "sub1_")

The ``targetFuncName`` field is the name of the function that the trampoline
calls. This descriptor results in the following DWARF tag:

.. code-block:: text

  DW_TAG_subprogram
    ...
    DW_AT_linkage_name  ("sub1_.t0p")
    DW_AT_name  ("sub1_.t0p")
    DW_AT_trampoline  ("sub1_")

Debugging information format
============================

Debugging Information Extension for Objective C Properties
----------------------------------------------------------

Introduction
^^^^^^^^^^^^

Objective C provides a simpler way to declare and define accessor methods using
declared properties. The language provides features to declare a property and
to let the compiler synthesize accessor methods.

The debugger lets developers inspect Objective C interfaces and their instance
variables and class variables.
However, the debugger does not know anything
about the properties defined in Objective C interfaces. The debugger consumes
information generated by the compiler in DWARF format. The format does not
support encoding of Objective C properties. This proposal describes DWARF
extensions to encode Objective C properties, which the debugger can use to let
developers inspect Objective C properties.

Proposal
^^^^^^^^

Objective C properties exist separately from class members. A property can be
defined only by "setter" and "getter" selectors, and be calculated anew on each
access. Or a property can just be a direct access to some declared ivar.
Finally it can have an ivar "automatically synthesized" for it by the compiler,
in which case the property can be referred to in user code directly using the
standard C dereference syntax as well as through the property "dot" syntax, but
there is no entry in the ``@interface`` declaration corresponding to this ivar.

To facilitate debugging these properties, we will add a new DWARF tag to the
``DW_TAG_structure_type`` definition for the class to hold the description of a
given property, and a set of DWARF attributes that provide said description.
The property tag will also contain the name and declared type of the property.

If there is a related ivar, there will also be a DWARF property attribute placed
in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG
for that property. And in the case where the compiler synthesizes the ivar
directly, the compiler is expected to generate a ``DW_TAG_member`` for that
ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used
to access this ivar directly in code, and with the property attribute pointing
back to the property it is backing.

The following examples will serve as illustration for our discussion:

.. code-block:: objc

  @interface I1 {
    int n2;
  }

  @property int p1;
  @property int p2;
  @end

  @implementation I1
  @synthesize p1;
  @synthesize p2 = n2;
  @end

This produces the following DWARF (this is a "pseudo dwarfdump" output):

.. code-block:: none

  0x00000100:  TAG_structure_type [7] *
                 AT_APPLE_runtime_class( 0x10 )
                 AT_name( "I1" )
                 AT_decl_file( "Objc_Property.m" )
                 AT_decl_line( 3 )

  0x00000110    TAG_APPLE_property
                  AT_name ( "p1" )
                  AT_type ( {0x00000150} ( int ) )

  0x00000120:   TAG_APPLE_property
                  AT_name ( "p2" )
                  AT_type ( {0x00000150} ( int ) )

  0x00000130:   TAG_member [8]
                  AT_name( "_p1" )
                  AT_APPLE_property ( {0x00000110} "p1" )
                  AT_type( {0x00000150} ( int ) )
                  AT_artificial ( 0x1 )

  0x00000140:    TAG_member [8]
                   AT_name( "n2" )
                   AT_APPLE_property ( {0x00000120} "p2" )
                   AT_type( {0x00000150} ( int ) )

  0x00000150:  AT_type( ( int ) )

Note, the current convention is that the name of the ivar for an
auto-synthesized property is the name of the property from which it derives
with an underscore prepended, as is shown in the example. But we actually
don't need to know this convention, since we are given the name of the ivar
directly.

Also, it is common practice in ObjC to have different property declarations in
the ``@interface`` and ``@implementation``, e.g. to provide a read-only
property in the interface and a read-write property in the implementation. In
that case, the compiler should emit whichever property declaration will be in
force in the current translation unit.

Developers can decorate a property with attributes which are encoded using
``DW_AT_APPLE_property_attribute``.

.. code-block:: objc

  @property (readonly, nonatomic) int pr;

.. code-block:: none

  TAG_APPLE_property [8]
    AT_name( "pr" )
    AT_type ( {0x00000147} (int) )
    AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)

The setter and getter method names are attached to the property using
``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes.

.. code-block:: objc

  @interface I1
  @property (setter=myOwnP3Setter:) int p3;
  -(void)myOwnP3Setter:(int)a;
  @end

  @implementation I1
  @synthesize p3;
  -(void)myOwnP3Setter:(int)a{ }
  @end

The DWARF for this would be:

.. code-block:: none

  0x000003bd: TAG_structure_type [7] *
                AT_APPLE_runtime_class( 0x10 )
                AT_name( "I1" )
                AT_decl_file( "Objc_Property.m" )
                AT_decl_line( 3 )

  0x000003cd      TAG_APPLE_property
                    AT_name ( "p3" )
                    AT_APPLE_property_setter ( "myOwnP3Setter:" )
                    AT_type( {0x00000147} ( int ) )

  0x000003f3:     TAG_member [8]
                    AT_name( "_p3" )
                    AT_type ( {0x00000147} ( int ) )
                    AT_APPLE_property ( {0x000003cd} )
                    AT_artificial ( 0x1 )

New DWARF Tags
^^^^^^^^^^^^^^

+-----------------------+--------+
| TAG                   | Value  |
+=======================+========+
| DW_TAG_APPLE_property | 0x4200 |
+-----------------------+--------+

New DWARF Attributes
^^^^^^^^^^^^^^^^^^^^

+--------------------------------+--------+-----------+
| Attribute                      | Value  | Classes   |
+================================+========+===========+
| DW_AT_APPLE_property           | 0x3fed | Reference |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_getter    | 0x3fe9 | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_setter    | 0x3fea | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_attribute | 0x3feb | Constant  |
+--------------------------------+--------+-----------+

New DWARF Constants
^^^^^^^^^^^^^^^^^^^

+--------------------------------------+--------+
| Name                                 | Value  |
+======================================+========+
| DW_APPLE_PROPERTY_readonly           | 0x01   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_getter             | 0x02   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_assign             | 0x04   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_readwrite          | 0x08   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_retain             | 0x10   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_copy               | 0x20   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_nonatomic          | 0x40   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_setter             | 0x80   |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_atomic             | 0x100  |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_weak               | 0x200  |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_strong             | 0x400  |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_unsafe_unretained  | 0x800  |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_nullability        | 0x1000 |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_null_resettable    | 0x2000 |
+--------------------------------------+--------+
| DW_APPLE_PROPERTY_class              | 0x4000 |
+--------------------------------------+--------+

Name Accelerator Tables
-----------------------

Introduction
^^^^^^^^^^^^

The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a
debugger needs.
The "``pub``" in the section name indicates that the entries
in the table are publicly visible names only. This means no static or hidden
functions show up in the "``.debug_pubnames``". No static variables or private
class variables are in the "``.debug_pubtypes``". Many compilers add different
things to these tables, so we can't rely upon the contents between gcc, icc, or
clang.

The typical query given by users tends not to match up with the contents of
these tables. For example, the DWARF spec states that "In the case of the name
of a function member or static data member of a C++ structure, class or union,
the name presented in the "``.debug_pubnames``" section is not the simple name
given by the ``DW_AT_name`` attribute of the referenced debugging information
entry, but rather the fully qualified name of the data or function member."
So the only names in these tables for complex C++ entries are fully
qualified names. Debugger users tend not to enter their search strings as
"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``", or
"``a::b::c``". So the name entered in the name table must be demangled in
order to chop it up appropriately and additional names must be manually entered
into the table to make it effective as a name lookup table for debuggers to
use.
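
The chopping described above can be sketched as follows. This is only an
illustration, not code taken from LLVM or LLDB: ``lookupNames`` is a
hypothetical helper, and it assumes that the demangled name uses ``::`` as the
scope separator and that everything from the first ``(`` onward is the
parameter list (which does not hold for names such as ``operator()``).

.. code-block:: c++

  #include <string>
  #include <vector>

  // Hypothetical helper: returns every "::"-separated suffix of a demangled
  // name, shortest first, followed by the fully qualified name itself.
  std::vector<std::string> lookupNames(const std::string &Demangled) {
    // Drop the parameter list and trailing qualifiers (simplifying assumption).
    const std::string Base = Demangled.substr(0, Demangled.find('('));

    std::vector<std::string> Names;
    std::string::size_type Pos = Base.rfind("::");
    while (Pos != std::string::npos) {
      Names.push_back(Base.substr(Pos + 2)); // Name to the right of this "::".
      if (Pos == 0)
        break;
      Pos = Base.rfind("::", Pos - 1); // Move to the previous separator.
    }
    Names.push_back(Base); // The fully qualified name.
    return Names;
  }

For "``a::b::c(int,const Foo&) const``" this sketch yields "``c``",
"``b::c``", and "``a::b::c``", which are the forms a user is likely to type.
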

All debuggers currently ignore the "``.debug_pubnames``" table as a result of
its inconsistent and useless public-only name content, making it a waste of
space in the object file. These tables, when they are written to disk, are not
sorted in any way, leaving every debugger to do its own parsing and sorting.
These tables also include an inlined copy of the string values in the table
itself, making the tables much larger than they need to be on disk, especially
for large C++ programs.

Can't we just fix the sections by adding all of the names we need to this
table? No, because that is not what the tables are defined to contain and we
won't know the difference between the old bad tables and the new good tables.
At best we could make our own renamed sections that contain all of the data we
need.

These tables are also insufficient for what a debugger like LLDB needs. LLDB
uses clang for its expression parsing where LLDB acts as a PCH. LLDB is then
often asked to look for type "``foo``" or namespace "``bar``", or list items in
namespace "``baz``". Namespaces are not included in the pubnames or pubtypes
tables. Since clang asks a lot of questions when it is parsing an expression,
we need to be very fast when looking up names, as it happens a lot.
Having new
accelerator tables that are optimized for very quick lookups will benefit this
type of debugging experience greatly.

We would like to generate name lookup tables that can be mapped into memory
from disk, and used as is, with little or no up-front parsing. We would also
like to be able to control the exact content of these different tables so they
contain exactly what we need. The Name Accelerator Tables were designed to fix
these issues. In order to solve these issues we need:

* A format that can be mapped into memory from disk and used as is
* Very fast lookups
* An extensible table format so these tables can be made by many producers
* All of the names needed for typical lookups, out of the box
* Strict rules for the contents of tables

Table size is important and the accelerator table format should allow the reuse
of strings from common string tables so the strings for the names are not
duplicated. We also want to make sure the table is ready to be used as-is by
simply mapping the table into memory with minimal header parsing.

The name lookups need to be fast and optimized for the kinds of lookups that
debuggers tend to do.
Optimally we would like to touch as few parts of the
mapped table as possible when doing a name lookup and be able to quickly find
the name entry we are looking for, or discover there are no matches. In the
case of debuggers we optimized for lookups that fail most of the time.

Each table that is defined should have strict rules on exactly what is in the
accelerator tables, and these rules should be documented so clients can rely on
the content.

Hash Tables
^^^^^^^^^^^

Standard Hash Tables
""""""""""""""""""""

Typical hash tables have a header, buckets, and each bucket points to the
bucket contents:

.. code-block:: none

  .------------.
  |  HEADER    |
  |------------|
  |  BUCKETS   |
  |------------|
  |  DATA      |
  `------------'

The BUCKETS are an array of offsets to DATA for each hash:

.. code-block:: none

  .------------.
  | 0x00001000 | BUCKETS[0]
  | 0x00002000 | BUCKETS[1]
  | 0x00002200 | BUCKETS[2]
  | 0x000034f0 | BUCKETS[3]
  |            | ...
  | 0xXXXXXXXX | BUCKETS[n_buckets]
  '------------'

So for ``bucket[3]`` in the example above, we have an offset into the table
0x000034f0 which points to a chain of entries for the bucket. Each entry in a
bucket must contain a next pointer, the full 32 bit hash value, the string
itself, and the data for the current string value.

.. code-block:: none

              .------------.
  0x000034f0: | 0x00003500 | next pointer
              | 0x12345678 | 32 bit hash
              | "erase"    | string value
              | data[n]    | HashData for this bucket
              |------------|
  0x00003500: | 0x00003550 | next pointer
              | 0x29273623 | 32 bit hash
              | "dump"     | string value
              | data[n]    | HashData for this bucket
              |------------|
  0x00003550: | 0x00000000 | next pointer
              | 0x82638293 | 32 bit hash
              | "main"     | string value
              | data[n]    | HashData for this bucket
              `------------'

The problem with this layout for debuggers is that we need to optimize for the
negative lookup case where the symbol we're searching for is not present. So
if we were to look up "``printf``" in the table above, we would make a 32-bit
hash for "``printf``", which might map to ``bucket[3]``.
We would need to go to
the offset 0x000034f0 and start looking to see if our 32 bit hash matches. To
do so, we need to read the next pointer, then read the hash, compare it, and
skip to the next entry in the chain. Each time we are skipping many bytes in
memory and touching new pages just to do the compare on the full 32 bit hash.
All of these accesses then tell us that we didn't have a match.

Name Hash Tables
""""""""""""""""

To solve the issues mentioned above we have structured the hash tables a bit
differently: a header, buckets, an array of all unique 32 bit hash values,
followed by an array of hash value data offsets, one for each hash value, then
the data for all hash values:

.. code-block:: none

  .-------------.
  |  HEADER     |
  |-------------|
  |  BUCKETS    |
  |-------------|
  |  HASHES     |
  |-------------|
  |  OFFSETS    |
  |-------------|
  |  DATA       |
  `-------------'

The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array. By
making all of the full 32 bit hash values contiguous in memory, we allow
ourselves to efficiently check for a match while touching as little memory as
possible.
Most often checking the 32 bit hash values is as far as the lookup
goes. If it does match, it usually is a match with no collisions. So for a
table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash
values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
``OFFSETS`` as:

.. code-block:: none

  .-------------------------.
  |  HEADER.magic           | uint32_t
  |  HEADER.version         | uint16_t
  |  HEADER.hash_function   | uint16_t
  |  HEADER.bucket_count    | uint32_t
  |  HEADER.hashes_count    | uint32_t
  |  HEADER.header_data_len | uint32_t
  |  HEADER_DATA            | HeaderData
  |-------------------------|
  |  BUCKETS                | uint32_t[n_buckets] // 32 bit hash indexes
  |-------------------------|
  |  HASHES                 | uint32_t[n_hashes] // 32 bit hash values
  |-------------------------|
  |  OFFSETS                | uint32_t[n_hashes] // 32 bit offsets to hash value data
  |-------------------------|
  |  ALL HASH DATA          |
  `-------------------------'

So taking the exact same data from the standard hash example above we end up
with:

.. code-block:: none

              .------------.
              | HEADER     |
              |------------|
              |          0 | BUCKETS[0]
              |          2 | BUCKETS[1]
              |          5 | BUCKETS[2]
              |          6 | BUCKETS[3]
              |            | ...
              |        ... | BUCKETS[n_buckets]
              |------------|
              | 0x........ | HASHES[0]
              | 0x........ | HASHES[1]
              | 0x........ | HASHES[2]
              | 0x........ | HASHES[3]
              | 0x........ | HASHES[4]
              | 0x........ | HASHES[5]
              | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
              | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
              | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
              | 0x........ | HASHES[9]
              | 0x........ | HASHES[10]
              | 0x........ | HASHES[11]
              | 0x........ | HASHES[12]
              | 0x........ | HASHES[13]
              | 0x........ | HASHES[n_hashes]
              |------------|
              | 0x........ | OFFSETS[0]
              | 0x........ | OFFSETS[1]
              | 0x........ | OFFSETS[2]
              | 0x........ | OFFSETS[3]
              | 0x........ | OFFSETS[4]
              | 0x........ | OFFSETS[5]
              | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
              | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
              | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
              | 0x........ | OFFSETS[9]
              | 0x........ | OFFSETS[10]
              | 0x........ | OFFSETS[11]
              | 0x........ | OFFSETS[12]
              | 0x........ | OFFSETS[13]
              | 0x........ | OFFSETS[n_hashes]
              |------------|
              |            |
              |            |
              |            |
              |            |
              |            |
              |------------|
  0x000034f0: | 0x00001203 | String offset into .debug_str ("erase")
              | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x........ | HashData[3]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              |------------|
  0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
              | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x00001203 | String offset into .debug_str ("dump")
              | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              |------------|
  0x00003550: | 0x00001203 | String offset into .debug_str ("main")
              | 0x00000009 | A 32 bit array count - number of HashData with name "main"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x........ | HashData[3]
              | 0x........ | HashData[4]
              | 0x........ | HashData[5]
              | 0x........ | HashData[6]
              | 0x........ | HashData[7]
              | 0x........ | HashData[8]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              `------------'

So we still have all of the same data, we just organize it more efficiently for
debugger lookup. If we repeat the same "``printf``" lookup from above, we
would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit
hash value modulo ``n_buckets``. ``BUCKETS[3]`` contains "6" which
is the index into the ``HASHES`` table. We would then compare any consecutive
32 bit hash values in the ``HASHES`` array as long as the hashes would be in
``BUCKETS[3]``. We do this by verifying that each subsequent hash value modulo
``n_buckets`` is still 3.
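
The lookup walk described above can be sketched in C as follows. This is only
an illustration, not LLVM's or LLDB's implementation: ``NameTableView``,
``find_hash_data_offset`` and the ``NAME_NOT_FOUND`` sentinel are hypothetical
names, and the arrays are assumed to be already mapped and in host byte order.
The hash is the Daniel J. Bernstein string hash (seed 5381, multiply by 33).

```c
#include <stddef.h>
#include <stdint.h>

/* DJB string hash: h = h * 33 + c, seeded with 5381. */
uint32_t djb_hash(const char *s)
{
    uint32_t h = 5381;
    for (; *s != '\0'; ++s)
        h = h * 33 + (unsigned char)*s;
    return h;
}

/* Hypothetical view of the BUCKETS, HASHES and OFFSETS arrays. */
struct NameTableView {
    uint32_t n_buckets;
    uint32_t n_hashes;
    const uint32_t *buckets;  /* indexes into hashes[] */
    const uint32_t *hashes;   /* unique 32 bit hashes, grouped by bucket */
    const uint32_t *offsets;  /* offsets[i] locates the data for hashes[i] */
};

/* Illustrative "no match" result; an assumption of this sketch. */
#define NAME_NOT_FOUND UINT32_MAX

/* Returns the hash data offset for `hash`, or NAME_NOT_FOUND. */
uint32_t find_hash_data_offset(const struct NameTableView *t, uint32_t hash)
{
    if (t->n_buckets == 0)
        return NAME_NOT_FOUND;

    uint32_t bucket = hash % t->n_buckets;
    uint32_t i = t->buckets[bucket];
    if (i == UINT32_MAX)              /* empty bucket marker */
        return NAME_NOT_FOUND;

    /* Scan consecutive hashes while they still belong to this bucket. */
    for (; i < t->n_hashes; ++i) {
        uint32_t h = t->hashes[i];
        if (h == hash)
            return t->offsets[i];
        if (h % t->n_buckets != bucket)
            break;
    }
    return NAME_NOT_FOUND;
}
```

A matching hash only locates the hash data; the caller still has to compare
the string stored there, because two different names can share a 32 bit hash.
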
In the case of a failed lookup we would access the
memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
before we know that we have no match. We don't end up marching through
multiple words of memory and we really keep the number of processor data cache
lines being accessed as small as possible.

The string hash that is used for these lookup tables is the Daniel J.
Bernstein hash which is also used in the ELF ``GNU_HASH`` sections. It is a
very good hash for all kinds of names in programs with very few hash
collisions.

Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``.

Details
^^^^^^^

These name hash tables are designed to be generic where specializations of the
table get to define additional data that goes into the header ("``HeaderData``"),
how the string value is stored ("``KeyType``") and the content of the data for each
hash value.

Header Layout
"""""""""""""

The header has a fixed part, and the specialized part. The exact format of the
header is:

.. code-block:: c

  struct Header
  {
    uint32_t   magic;           // 'HASH' magic value to allow endian detection
    uint16_t   version;         // Version number
    uint16_t   hash_function;   // The hash function enumeration that was used
    uint32_t   bucket_count;    // The number of buckets in this hash table
    uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
    uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
                                // Specifically the length of the following HeaderData field - this does not
                                // include the size of the preceding fields
    HeaderData header_data;     // Implementation specific header data
  };

The header starts with a 32 bit "``magic``" value which must be ``'HASH'``
encoded as an ASCII integer. This allows the detection of the start of the
hash table and also allows the table's byte order to be determined so the table
can be correctly extracted. The "``magic``" value is followed by a 16 bit
``version`` number which allows the table to be revised and modified in the
future. The current version number is 1. ``hash_function`` is a ``uint16_t``
enumeration that specifies which hash function was used to produce this table.
The current values for the hash function enumerations include:

.. code-block:: c

  enum HashFunctionType
  {
    eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
  };

``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32 bit
hash values that are in the ``HASHES`` array, and is also the number of offsets
contained in the ``OFFSETS`` array. ``header_data_len`` specifies the size
in bytes of the ``HeaderData`` that is filled in by specialized versions of
this table.

Fixed Lookup
""""""""""""

The header is followed by the buckets, hashes, offsets, and hash value data.

.. code-block:: c

  struct FixedTable
  {
    uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
    uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
    uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
  };

``buckets`` is an array of 32 bit indexes into the ``hashes`` array. The
``hashes`` array contains all of the 32 bit hash values for all names in the
hash table.
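
Because the three arrays follow the header back to back, their byte positions
can be derived from the header counts alone. The sketch below illustrates that
arithmetic under stated assumptions: ``FixedHeader``, ``TableLayout`` and
``compute_table_layout`` are hypothetical names, the fixed part of the header
is taken to be the 20 bytes of the six fields listed above, and the header is
assumed to have been read in the correct byte order already.

```c
#include <stdint.h>

/* Size in bytes of the six fixed header fields (4 + 2 + 2 + 4 + 4 + 4). */
enum { FIXED_HEADER_SIZE = 20 };

/* Hypothetical copy of the fixed header fields. */
struct FixedHeader {
    uint32_t magic;
    uint16_t version;
    uint16_t hash_function;
    uint32_t bucket_count;
    uint32_t hashes_count;
    uint32_t header_data_len;
};

/* Byte offsets of each region, measured from the start of the table. */
struct TableLayout {
    uint64_t buckets_offset;
    uint64_t hashes_offset;
    uint64_t offsets_offset;
    uint64_t data_offset;
};

struct TableLayout compute_table_layout(const struct FixedHeader *h)
{
    struct TableLayout l;
    /* Skip the fixed fields and the specialized HeaderData. */
    l.buckets_offset = (uint64_t)FIXED_HEADER_SIZE + h->header_data_len;
    /* Each array element is a 32 bit value. */
    l.hashes_offset  = l.buckets_offset + 4u * (uint64_t)h->bucket_count;
    l.offsets_offset = l.hashes_offset  + 4u * (uint64_t)h->hashes_count;
    l.data_offset    = l.offsets_offset + 4u * (uint64_t)h->hashes_count;
    return l;
}
```

The 64 bit arithmetic is a defensive choice in this sketch so that large
counts in a malformed header cannot wrap around before a bounds check.
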
Each hash in the ``hashes`` table has an offset in the ``offsets``
array that points to the data for the hash value.

This table setup makes it very easy to repurpose these tables to contain
different data, while keeping the lookup mechanism the same for all tables.
This layout also makes it possible to save the table to disk and map it in
later and do very efficient name lookups with little or no parsing.

DWARF lookup tables can be implemented in a variety of ways and can store a lot
of information for each name. We want to make the DWARF tables extensible and
able to store the data efficiently so we have used some of the DWARF features
that enable efficient data storage to define exactly what kind of data we store
for each name.

The ``HeaderData`` contains a definition of the contents of each HashData chunk.
We might want to store an offset to all of the debug information entries (DIEs)
for each name. To keep things extensible, we create a list of items, or
Atoms, that are contained in the data for each name. First comes the type of
the data in each atom:

.. code-block:: c

  enum AtomType
  {
    eAtomTypeNULL       = 0u,
    eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
    eAtomTypeCUOffset   = 2u,   // DIE offset of the compiler unit header that contains the item in question
    eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
    eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
    eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
  };

The enumeration values and their meanings are:

.. code-block:: none

  eAtomTypeNULL       - a termination atom that specifies the end of the atom list
  eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
  eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
  eAtomTypeTag        - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
  eAtomTypeNameFlags  - Flags for functions and global variables (isFunction, isInlined, isExternal...)
  eAtomTypeTypeFlags  - Flags for types (isCXXClass, isObjCClass, ...)

Then each atom specifies its type and how the data for that type is encoded:

.. code-block:: c

  struct Atom
  {
    uint16_t type;  // AtomType enum value
    uint16_t form;  // DWARF DW_FORM_XXX defines
  };

The ``form`` type above is from the DWARF specification and defines the exact
encoding of the data for the Atom type. See the DWARF specification for the
``DW_FORM_`` definitions.

.. code-block:: c

  struct HeaderData
  {
    uint32_t die_offset_base;
    uint32_t atom_count;
    Atom     atoms[atom_count];
  };

``HeaderData`` defines the base DIE offset that should be added to any atoms
that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``,
``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata`` forms. It also
defines what is contained in each ``HashData`` object: ``Atom.form`` tells us
how large each field will be in the ``HashData`` and the ``Atom.type`` tells us
how this data should be interpreted.

For the current implementations of the "``.apple_names``" (all functions +
globals), the "``.apple_types``" (names of all types that are defined), and
the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom``
array to be:

.. code-block:: c

  HeaderData.atom_count = 1;
  HeaderData.atoms[0].type = eAtomTypeDIEOffset;
  HeaderData.atoms[0].form = DW_FORM_data4;

This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
multiple matching DIEs in a single file, which could come up with an inlined
function for instance. Future tables could include more information about the
DIE such as flags indicating if the DIE is a function, method, block,
or inlined.

The KeyType for the DWARF table is a 32 bit string table offset into the
".debug_str" table. The ".debug_str" is the string table for the DWARF which
may already contain copies of all of the strings. This helps make sure, with
help from the compiler, that we reuse the strings between all of the DWARF
sections and keeps the hash table size down. Another benefit to having the
compiler generate all strings as DW_FORM_strp in the debug info, is that
DWARF parsing can be made much faster.

After a lookup is made, we get an offset into the hash data. The hash data
needs to be able to deal with 32 bit hash collisions, so the chunk of data
at the offset in the hash data consists of a triple:

.. code-block:: c

  uint32_t str_offset
  uint32_t hash_data_count
  HashData[hash_data_count]

If "str_offset" is zero, then the bucket contents are done. 99.9% of the
hash data chunks contain a single item (no 32 bit hash collision):

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'

If there are collisions, you will have multiple valid string offsets:

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print")
  | 0x00000002 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'

Current testing with real world C++ binaries has shown that there is around 1
32 bit hash collision per 100,000 name entries.

Contents
^^^^^^^^

As we said, we want to strictly define exactly what is included in the
different tables. For DWARF, we have 3 tables: "``.apple_names``",
"``.apple_types``", and "``.apple_namespaces``".

"``.apple_names``" sections should contain an entry for each DWARF DIE whose
``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or
``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``,
``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``. It also contains
``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and
static variables). All global and static variables should be included,
including those scoped within functions and classes. For example using the
following code:

.. code-block:: c

  static int var = 0;

  void f ()
  {
    static int var = 0;
  }

Both of the static ``var`` variables would be included in the table. All
functions should emit both their full names and their basenames. For C or C++,
the full name is the mangled name (if available) which is usually in the
``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the
function basename. If global or static variables have a mangled name in a
``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the
simple name found in the ``DW_AT_name`` attribute.

"``.apple_types``" sections should contain an entry for each DWARF DIE whose
tag is one of:

* DW_TAG_array_type
* DW_TAG_class_type
* DW_TAG_enumeration_type
* DW_TAG_pointer_type
* DW_TAG_reference_type
* DW_TAG_string_type
* DW_TAG_structure_type
* DW_TAG_subroutine_type
* DW_TAG_typedef
* DW_TAG_union_type
* DW_TAG_ptr_to_member_type
* DW_TAG_set_type
* DW_TAG_subrange_type
* DW_TAG_base_type
* DW_TAG_const_type
* DW_TAG_immutable_type
* DW_TAG_file_type
* DW_TAG_namelist
* DW_TAG_packed_type
* DW_TAG_volatile_type
* DW_TAG_restrict_type
* DW_TAG_atomic_type
* DW_TAG_interface_type
* DW_TAG_unspecified_type
* DW_TAG_shared_type

Only entries with a ``DW_AT_name`` attribute are included, and the entry must
not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero
value). For example, using the following code:

.. code-block:: c

  int main ()
  {
    int *b = 0;
    return *b;
  }

We get a few type DIEs:

.. code-block:: none

  0x00000067:     TAG_base_type [5]
                  AT_encoding( DW_ATE_signed )
                  AT_name( "int" )
                  AT_byte_size( 0x04 )

  0x0000006e:     TAG_pointer_type [6]
                  AT_type( {0x00000067} ( int ) )
                  AT_byte_size( 0x08 )

The DW_TAG_pointer_type is not included because it does not have a ``DW_AT_name``.

"``.apple_namespaces``" section should contain all ``DW_TAG_namespace`` DIEs.
A namespace that has no name is an anonymous namespace, and its name should be
output as "``(anonymous namespace)``" (without the quotes). Why? This matches
the output of ``abi::__cxa_demangle()``, the function in the standard C++
library that demangles mangled names.


Language Extensions and File Format Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Objective-C Extensions
""""""""""""""""""""""

"``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for an
Objective-C class. The name used in the hash table is the name of the
Objective-C class itself. If the Objective-C class has a category, then an
entry is made for both the class name without the category, and for the class
name with the category. So if we have a DIE at offset 0x1234 with a name of
method "``-[NSString(my_additions) stringWithSpecialString:]``", we would add
an entry for "``NSString``" that points to DIE 0x1234, and an entry for
"``NSString(my_additions)``" that points to 0x1234. This allows us to quickly
track down all Objective-C methods for an Objective-C class when doing
expressions. It is needed because of the dynamic nature of Objective-C where
anyone can add methods to a class.
The DWARF for Objective-C methods is also emitted differently from that for
C++ classes: the methods are not usually contained in the class definition but
are scattered across one or more compile units. Categories can also be defined
in different shared libraries. So we need to be able to quickly find all of the
methods and class functions given the Objective-C class name, or quickly find
all methods and class functions for a class + category name. This table does
not contain any selector names; it just maps Objective-C class names (or class
names + category) to all of the methods and class functions. The selectors are
added as function basenames in the "``.debug_names``" section.

In the "``.apple_names``" section for Objective-C functions, the full name is
the entire function name with the brackets ("``-[NSString
stringWithCString:]``") and the basename is the selector only
("``stringWithCString:``").

Mach-O Changes
""""""""""""""

The section names for the Apple hash tables are for non-Mach-O files.
For Mach-O files, the sections should be contained in the ``__DWARF`` segment
with names as follows:

* "``.apple_names``" -> "``__apple_names``"
* "``.apple_types``" -> "``__apple_types``"
* "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit)
* "``.apple_objc``" -> "``__apple_objc``"

.. _codeview:

CodeView Debug Info Format
==========================

LLVM supports emitting CodeView, the Microsoft debug info format, and this
section describes the design and implementation of that support.

Format Background
-----------------

CodeView as a format is clearly oriented around C++ debugging, and in C++, the
majority of debug information tends to be type information. Therefore, the
overriding design constraint of CodeView is the separation of type information
from other "symbol" information so that type information can be efficiently
merged across translation units. Both type information and symbol information
are generally stored as a sequence of records, where each record begins with a
16-bit record size and a 16-bit record kind.

Type information is usually stored in the ``.debug$T`` section of the object
file.
All other debug info, such as line info, string table, symbol info, and
inlinee info, is stored in one or more ``.debug$S`` sections. There may only be
one ``.debug$T`` section per object file, since all other debug info refers to
it. If a PDB (enabled by the ``/Zi`` MSVC option) was used during compilation,
the ``.debug$T`` section will contain only an ``LF_TYPESERVER2`` record pointing
to the PDB. When using PDBs, symbol information appears to remain in the object
file ``.debug$S`` sections.

Type records are referred to by their index, which is the number of records in
the stream before a given record plus ``0x1000``. Many common basic types, such
as the basic integral types and unqualified pointers to them, are represented
using type indices less than ``0x1000``. Such basic types are built in to
CodeView consumers and do not require type records.

Each type record may only contain type indices that are less than its own type
index. This ensures that the graph of type stream references is acyclic. While
the source-level type graph may contain cycles through pointer types (consider a
linked list struct), these cycles are removed from the type stream by always
referring to the forward declaration record of user-defined record types. Only
"symbol" records in the ``.debug$S`` streams may refer to complete,
non-forward-declaration type records.
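
The record layout and type index numbering described above can be illustrated
with a small Python sketch. It is not part of LLVM, and the helper names
``make_record`` and ``iter_type_records`` are hypothetical. The sketch assumes
that the input holds only records (any section signature has already been
stripped), that fields are little-endian, and that the 16-bit size counts the
bytes that follow the size field (the kind plus the payload). The payload bytes
in the usage note are placeholders, not real record bodies.

.. code-block:: python

  import struct

  # Type indices below this value name built-in basic types.
  FIRST_RECORD_TYPE_INDEX = 0x1000


  def make_record(kind, payload):
      """Build one record: 16-bit size, 16-bit kind, then the payload.

      Assumption: the size field excludes its own two bytes.
      """
      return struct.pack("<HH", 2 + len(payload), kind) + payload


  def iter_type_records(stream):
      """Yield (type_index, kind, payload) for each record in `stream`."""
      offset = 0
      type_index = FIRST_RECORD_TYPE_INDEX
      while offset < len(stream):
          if offset + 4 > len(stream):
              raise ValueError("truncated record header")
          size, kind = struct.unpack_from("<HH", stream, offset)
          end = offset + 2 + size
          if size < 2 or end > len(stream):
              raise ValueError("bad record size")
          # The payload starts after the 4-byte header.
          yield type_index, kind, stream[offset + 4:end]
          offset = end
          # The index is the count of preceding records plus 0x1000.
          type_index += 1

For example, feeding it ``make_record(0x1505, bytes(6)) + make_record(0x1002,
bytes(8))`` yields two records with type indices ``0x1000`` and ``0x1001``, in
stream order.
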

Working with CodeView
---------------------

These are instructions for some common tasks for developers working to improve
LLVM's CodeView support. Most of them revolve around using the CodeView dumper
embedded in ``llvm-readobj``.

* Testing MSVC's output::

    $ cl -c -Z7 foo.cpp # Use /Z7 to keep types in the object file
    $ llvm-readobj --codeview foo.obj

* Getting LLVM IR debug info out of Clang::

    $ clang -g -gcodeview --target=x86_64-windows-msvc foo.cpp -S -emit-llvm

  Use this to generate LLVM IR for LLVM test cases.

* Generate and dump CodeView from LLVM IR metadata::

    $ llc foo.ll -filetype=obj -o foo.obj
    $ llvm-readobj --codeview foo.obj > foo.txt

  Use this pattern in lit test cases and FileCheck the output of
  ``llvm-readobj``.

Improving LLVM's CodeView support is a process of finding interesting type
records, constructing a C++ test case that makes MSVC emit those records,
dumping the records, understanding them, and then generating equivalent records
in LLVM's backend.