================================
Source Level Debugging with LLVM
================================

.. sectionauthor:: Chris Lattner <[email protected]> and Jim Laskey <[email protected]>

.. contents::
   :local:

Introduction
============

This document is the central repository for all information pertaining to debug
information in LLVM. It describes the :ref:`actual format that the LLVM debug
information takes <format>`, which is useful for those interested in creating
front-ends or dealing directly with the information. Further, this document
provides specific examples of what debug information for C/C++ looks like.

Philosophy behind LLVM debugging information
--------------------------------------------

The idea of the LLVM debugging information is to capture how the important
pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
Several design aspects have shaped the solution that appears here. The
important ones are:

* Debugging information should have very little impact on the rest of the
  compiler. No transformations, analyses, or code generators should need to
  be modified because of debugging information.

* LLVM optimizations should interact in :ref:`well-defined and easily described
  ways <intro_debugopt>` with the debugging information.

* Because LLVM is designed to support arbitrary programming languages,
  LLVM-to-LLVM tools should not need to know anything about the semantics of
  the source-level-language.

* Source-level languages are often **widely** different from one another.
  LLVM should not put any restrictions on the flavor of the source-language,
  and the debugging information should work with any language.

* With code generator support, it should be possible to use an LLVM compiler
  to compile a program to native machine code and standard debugging
  formats.
  This allows compatibility with traditional machine-code level
  debuggers, like GDB or DBX.

The approach used by the LLVM implementation is to use a small set of
:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
between LLVM program objects and the source-level objects. The description of
the source-level program is maintained in LLVM metadata in an
:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
currently uses working draft 7 of the `DWARF 3 standard
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_).

When a program is being debugged, a debugger interacts with the user and turns
the stored debug information into source-language specific information. As
such, a debugger must be aware of the source-language, and is thus tied to a
specific language or family of languages.

Debug information consumers
---------------------------

The role of debug information is to provide meta information normally stripped
away during the compilation process. This meta information provides an LLVM
user a relationship between generated code and the original program source
code.

Currently, debug information is consumed by DwarfDebug to produce DWARF
information used by the gdb debugger. Other targets could use the same
information to produce stabs or other debug forms.

It would also be reasonable to use debug information to feed profiling tools
for analysis of generated code, or tools for reconstructing the original
source from generated code.

TODO - expound a bit more.

.. _intro_debugopt:

Debugging optimized code
------------------------

An extremely high priority of LLVM debugging information is to make it interact
well with optimizations and analysis.
In particular, the LLVM debug
information provides the following guarantees:

* LLVM debug information **always provides information to accurately read
  the source-level state of the program**, regardless of which LLVM
  optimizations have been run, and without any modification to the
  optimizations themselves. However, some optimizations may impact the
  ability to modify the current state of the program with a debugger, such
  as setting program variables, or calling functions that have been
  deleted.

* As desired, LLVM optimizations can be upgraded to be aware of the LLVM
  debugging information, allowing them to update the debugging information
  as they perform aggressive optimizations. This means that, with effort,
  the LLVM optimizers could optimize debug code just as well as non-debug
  code.

* LLVM debug information does not prevent optimizations from
  happening (for example inlining, basic block reordering/merging/cleanup,
  tail duplication, etc).

* LLVM debug information is automatically optimized along with the rest of
  the program, using existing facilities. For example, duplicate
  information is automatically merged by the linker, and unused information
  is automatically removed.

Basically, the debug information allows you to compile a program with
"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
the program as it executes from a debugger. Compiling a program with
"``-O3 -g``" gives you full debug information that is always available and
accurate for reading (e.g., you get accurate stack traces despite tail call
elimination and inlining), but you might lose the ability to modify the program
and call functions which were optimized out of the program, or inlined away
completely.

The :ref:`LLVM test suite <test-suite-quickstart>` provides a framework to test
the optimizer's handling of debugging information.
It can be run like this:

.. code-block:: bash

  % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
  % make TEST=dbgopt

This tests the impact of debugging information on optimization passes. If
debugging information influences optimization passes then it will be reported
as a failure. See :doc:`TestingGuide` for more information on the LLVM test
infrastructure and how to run various tests.

.. _format:

Debugging information format
============================

LLVM debugging information has been carefully designed to make it possible for
the optimizer to optimize the program and debugging information without
necessarily having to know anything about debugging information. In
particular, the use of metadata avoids duplicated debugging information from
the beginning, and the global dead code elimination pass automatically deletes
debugging information for a function if it decides to delete the function.

To do this, most of the debugging information (descriptors for types,
variables, functions, source files, etc) is inserted by the language front-end
in the form of LLVM metadata.

Debug information is designed to be agnostic about the target debugger and
debugging information representation (e.g. DWARF/Stabs/etc). It uses a generic
pass to decode the information that represents variables, types, functions,
namespaces, etc: this allows for arbitrary source-language semantics and
type-systems to be used, as long as there is a module written for the target
debugger to interpret the information.

To provide basic functionality, the LLVM debugger does have to make some
assumptions about the source-level language being debugged, though it keeps
these to a minimum. The only common features that the LLVM debugger assumes
exist are :ref:`source files <format_files>`, and :ref:`program objects
<format_global_variables>`.
These abstract objects are used by a debugger to
form stack traces, show information about local variables, etc.

This section of the documentation first describes the representation aspects
common to any source-language. :ref:`ccxx_frontend` describes the data layout
conventions used by the C and C++ front-ends.

Debug information descriptors
-----------------------------

In consideration of the complexity and volume of debug information, LLVM
provides a specification for well formed debug descriptors.

Consumers of LLVM debug information expect the descriptors for program objects
to start in a canonical format, but the descriptors can include additional
information appended at the end that is source-language specific. All LLVM
debugging information is versioned, allowing backwards compatibility in the
case that the core structures need to change in some way. Also, all debugging
information objects start with a tag to indicate what type of object it is.
The source-language is allowed to define its own objects, by using unreserved
tag numbers. We recommend using tags in the range 0x1000 through 0x2000
(there is a defined ``enum DW_TAG_user_base = 0x1000``.)

The fields of debug descriptors used internally by LLVM are restricted to only
the simple data types ``i32``, ``i1``, ``float``, ``double``, ``mdstring`` and
``mdnode``.

.. code-block:: llvm

  !1 = metadata !{
    i32,   ;; A tag
    ...
  }

The first field of a descriptor is always an
``i32`` containing a tag value identifying the content of the descriptor.
The remaining fields are specific to the descriptor. The values of tags are
loosely bound to the tag values of DWARF information entries. However, that
does not restrict the use of the information supplied to DWARF targets.
To
facilitate versioning of debug information, the tag is augmented with the
current debug version (``LLVMDebugVersion = 8 << 16`` or 0x80000 or
524288.)

The details of the various descriptors follow.

Compile unit descriptors
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !0 = metadata !{
    i32,       ;; Tag = 17 + LLVMDebugVersion (DW_TAG_compile_unit)
    i32,       ;; Unused field.
    i32,       ;; DWARF language identifier (ex. DW_LANG_C89)
    metadata,  ;; Source file name
    metadata,  ;; Source file directory (includes trailing slash)
    metadata,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
    i1,        ;; True if this is a main compile unit.
    i1,        ;; True if this is optimized.
    metadata,  ;; Flags
    i32,       ;; Runtime version
    metadata,  ;; List of enum types
    metadata,  ;; List of retained types
    metadata,  ;; List of subprograms
    metadata   ;; List of global variables
  }

These descriptors contain a source language ID for the file (we use the DWARF
3.0 ID numbers, such as ``DW_LANG_C89``, ``DW_LANG_C_plus_plus``,
``DW_LANG_Cobol74``, etc), and three strings describing the filename, the
working directory of the compiler, and the compiler that produced it.

Compile unit descriptors provide the root context for objects declared in a
specific compilation unit. File descriptors are defined using this context.
These descriptors are collected by a named metadata ``!llvm.dbg.cu``. They
keep track of subprograms, global variables and type information.

.. _format_files:

File descriptors
^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !0 = metadata !{
    i32,       ;; Tag = 41 + LLVMDebugVersion (DW_TAG_file_type)
    metadata,  ;; Source file name
    metadata,  ;; Source file directory (includes trailing slash)
    metadata   ;; Unused
  }

These descriptors contain information for a file.
Global variables and top
level functions would be defined using this context. File descriptors also
provide context for source line correspondence.

Each input file is encoded as a separate file descriptor in LLVM debugging
information output.

.. _format_global_variables:

Global variable descriptors
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !1 = metadata !{
    i32,      ;; Tag = 52 + LLVMDebugVersion (DW_TAG_variable)
    i32,      ;; Unused field.
    metadata, ;; Reference to context descriptor
    metadata, ;; Name
    metadata, ;; Display name (fully qualified C++ name)
    metadata, ;; MIPS linkage name (for C++)
    metadata, ;; Reference to file where defined
    i32,      ;; Line number where defined
    metadata, ;; Reference to type descriptor
    i1,       ;; True if the global is local to compile unit (static)
    i1,       ;; True if the global is defined in the compile unit (not extern)
    {}*       ;; Reference to the global variable
  }

These descriptors provide debug information about global variables. They
provide details such as name, type and where the variable is defined. All
global variables are collected inside the named metadata ``!llvm.dbg.cu``.

.. _format_subprograms:

Subprogram descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32,      ;; Tag = 46 + LLVMDebugVersion (DW_TAG_subprogram)
    i32,      ;; Unused field.
    metadata, ;; Reference to context descriptor
    metadata, ;; Name
    metadata, ;; Display name (fully qualified C++ name)
    metadata, ;; MIPS linkage name (for C++)
    metadata, ;; Reference to file where defined
    i32,      ;; Line number where defined
    metadata, ;; Reference to type descriptor
    i1,       ;; True if the global is local to compile unit (static)
    i1,       ;; True if the global is defined in the compile unit (not extern)
    i32,      ;; Line number where the scope of the subprogram begins
    i32,      ;; Virtuality, e.g.
              ;; dwarf::DW_VIRTUALITY__virtual
    i32,      ;; Index into a virtual function
    metadata, ;; indicates which base type contains the vtable pointer for the
              ;; derived class
    i32,      ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
    i1,       ;; isOptimized
    Function *, ;; Pointer to LLVM function
    metadata, ;; Lists function template parameters
    metadata, ;; Function declaration descriptor
    metadata  ;; List of function variables
  }

These descriptors provide debug information about functions, methods and
subprograms. They provide details such as name, return types and the source
location where the subprogram is defined.

Block descriptors
^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !3 = metadata !{
    i32,      ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
    metadata, ;; Reference to context descriptor
    i32,      ;; Line number
    i32,      ;; Column number
    metadata, ;; Reference to source file
    i32       ;; Unique ID to identify blocks from a template function
  }

This descriptor provides debug information about nested blocks within a
subprogram. The line number and column number are used to distinguish two
lexical blocks at the same depth.

.. code-block:: llvm

  !3 = metadata !{
    i32,      ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
    metadata, ;; Reference to the scope we're annotating with a file change
    metadata  ;; Reference to the file the scope is enclosed in.
  }

This descriptor provides a wrapper around a lexical scope to handle file
changes in the middle of a lexical block.

.. _format_basic_type:

Basic type descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !4 = metadata !{
    i32,      ;; Tag = 36 + LLVMDebugVersion (DW_TAG_base_type)
    metadata, ;; Reference to context
    metadata, ;; Name (may be "" for anonymous types)
    metadata, ;; Reference to file where defined (may be NULL)
    i32,      ;; Line number where defined (may be 0)
    i64,      ;; Size in bits
    i64,      ;; Alignment in bits
    i64,      ;; Offset in bits
    i32,      ;; Flags
    i32       ;; DWARF type encoding
  }

These descriptors define primitive types used in the code, for example ``int``,
``bool`` and ``float``. The context provides the scope of the type, which is
usually the top level. Since basic types are not usually user defined the
context and line number can be left as NULL and 0. The size, alignment and
offset are expressed in bits and can be 64 bit values. The alignment is used
to round the offset when embedded in a :ref:`composite type
<format_composite_type>` (for example, to keep doubles on 64 bit boundaries).
The offset is the bit offset if embedded in a :ref:`composite type
<format_composite_type>`.

The type encoding provides the details of the type. The values are typically
one of the following:

.. code-block:: llvm

  DW_ATE_address       = 1
  DW_ATE_boolean       = 2
  DW_ATE_float         = 4
  DW_ATE_signed        = 5
  DW_ATE_signed_char   = 6
  DW_ATE_unsigned      = 7
  DW_ATE_unsigned_char = 8

.. _format_derived_type:

Derived type descriptors
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !5 = metadata !{
    i32,      ;; Tag (see below)
    metadata, ;; Reference to context
    metadata, ;; Name (may be "" for anonymous types)
    metadata, ;; Reference to file where defined (may be NULL)
    i32,      ;; Line number where defined (may be 0)
    i64,      ;; Size in bits
    i64,      ;; Alignment in bits
    i64,      ;; Offset in bits
    i32,      ;; Flags to encode attributes, e.g.
              ;; private
    metadata, ;; Reference to type derived from
    metadata, ;; (optional) Name of the Objective C property associated with
              ;; an Objective-C ivar, or the type that this
              ;; pointer-to-member points to members of.
    metadata, ;; (optional) Name of the Objective C property getter selector.
    metadata, ;; (optional) Name of the Objective C property setter selector.
    i32       ;; (optional) Objective C property attributes.
  }

These descriptors are used to define types derived from other types. The value
of the tag varies depending on the meaning. The following are possible tag
values:

.. code-block:: llvm

  DW_TAG_formal_parameter   = 5
  DW_TAG_member             = 13
  DW_TAG_pointer_type       = 15
  DW_TAG_reference_type     = 16
  DW_TAG_typedef            = 22
  DW_TAG_ptr_to_member_type = 31
  DW_TAG_const_type         = 38
  DW_TAG_volatile_type      = 53
  DW_TAG_restrict_type      = 55

``DW_TAG_member`` is used to define a member of a :ref:`composite type
<format_composite_type>` or :ref:`subprogram <format_subprograms>`. The type
of the member is the :ref:`derived type <format_derived_type>`.
``DW_TAG_formal_parameter`` is used to define a member which is a formal
argument of a subprogram.

``DW_TAG_typedef`` is used to provide a name for the derived type.

``DW_TAG_pointer_type``, ``DW_TAG_reference_type``, ``DW_TAG_const_type``,
``DW_TAG_volatile_type`` and ``DW_TAG_restrict_type`` are used to qualify the
:ref:`derived type <format_derived_type>`.

:ref:`Derived type <format_derived_type>` location can be determined from the
context and line number. The size, alignment and offset are expressed in bits
and can be 64 bit values. The alignment is used to round the offset when
embedded in a :ref:`composite type <format_composite_type>` (for example, to
keep doubles on 64 bit boundaries). The offset is the bit offset if embedded
in a :ref:`composite type <format_composite_type>`.
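
The role of the alignment field can be illustrated with a short sketch. It
assumes that the alignment is a non-zero bit count and that offsets are rounded
*up* to the next multiple of the alignment; the helper name ``alignBitOffset``
is hypothetical and is not part of any LLVM API.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical helper: round a bit offset up to the next multiple of
  // AlignInBits. Assumes AlignInBits is non-zero.
  constexpr uint64_t alignBitOffset(uint64_t OffsetInBits,
                                    uint64_t AlignInBits) {
    return (OffsetInBits + AlignInBits - 1) / AlignInBits * AlignInBits;
  }

  // A member aligned to 64 bits that follows a 32-bit member starts at bit 64.
  static_assert(alignBitOffset(32, 64) == 64, "padded up to bit 64");
  // An offset that is already aligned is left unchanged.
  static_assert(alignBitOffset(64, 64) == 64, "already aligned");

Under these assumptions, a front-end would store the rounded value in the
"Offset in bits" field of the member's descriptor.
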

Note that the ``void *`` type is expressed as a type derived from NULL.

.. _format_composite_type:

Composite type descriptors
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !6 = metadata !{
    i32,      ;; Tag (see below)
    metadata, ;; Reference to context
    metadata, ;; Name (may be "" for anonymous types)
    metadata, ;; Reference to file where defined (may be NULL)
    i32,      ;; Line number where defined (may be 0)
    i64,      ;; Size in bits
    i64,      ;; Alignment in bits
    i64,      ;; Offset in bits
    i32,      ;; Flags
    metadata, ;; Reference to type derived from
    metadata, ;; Reference to array of member descriptors
    i32       ;; Runtime languages
  }

These descriptors are used to define types that are composed of 0 or more
elements. The value of the tag varies depending on the meaning. The following
are possible tag values:

.. code-block:: llvm

  DW_TAG_array_type       = 1
  DW_TAG_enumeration_type = 4
  DW_TAG_structure_type   = 19
  DW_TAG_union_type       = 23
  DW_TAG_vector_type      = 259
  DW_TAG_subroutine_type  = 21
  DW_TAG_inheritance      = 28

The vector flag indicates that an array type is a native packed vector.

The members of array types (tag = ``DW_TAG_array_type``) or vector types (tag =
``DW_TAG_vector_type``) are :ref:`subrange descriptors <format_subrange>`, each
representing the range of subscripts at that level of indexing.

The members of enumeration types (tag = ``DW_TAG_enumeration_type``) are
:ref:`enumerator descriptors <format_enumerator>`, each representing the
definition of an enumeration value for the set. All enumeration type
descriptors are collected inside the named metadata ``!llvm.dbg.cu``.
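
The tag values listed above are plain DWARF tag numbers; in emitted metadata the
tag field also carries the debug version, as described under *Debug information
descriptors*. The following is a minimal sketch of splitting such a field
apart. It assumes the version occupies the upper 16 bits of the ``i32``
(consistent with ``LLVMDebugVersion = 8 << 16``); the names ``VersionMask``,
``debugVersion`` and ``dwarfTag`` are made up for this illustration.

.. code-block:: c++

  #include <cstdint>

  // Assumption: the debug version lives in the upper 16 bits of the tag field.
  constexpr uint32_t VersionMask = 0xffff0000u;

  // Extract the version part (still shifted, e.g. 8 << 16).
  constexpr uint32_t debugVersion(uint32_t Field) { return Field & VersionMask; }

  // Extract the DWARF tag number from the low 16 bits.
  constexpr uint32_t dwarfTag(uint32_t Field) { return Field & ~VersionMask; }

  // 524305 is the compile unit tag in the C/C++ examples: 17 + (8 << 16).
  static_assert(debugVersion(524305) == (8u << 16), "debug version 8");
  static_assert(dwarfTag(524305) == 17, "DW_TAG_compile_unit");
  // 458773 appears in the scoping example: 21 + (7 << 16).
  static_assert(dwarfTag(458773) == 21, "DW_TAG_subroutine_type");

A consumer written this way can compare ``dwarfTag(...)`` against the tag tables
in this document regardless of which debug version produced the metadata.
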

The members of structure (tag = ``DW_TAG_structure_type``) or union (tag =
``DW_TAG_union_type``) types are any one of the :ref:`basic
<format_basic_type>`, :ref:`derived <format_derived_type>` or :ref:`composite
<format_composite_type>` type descriptors, each representing a field member of
the structure or union.

For C++ classes (tag = ``DW_TAG_structure_type``), member descriptors provide
information about base classes, static members and member functions. If a
member is a :ref:`derived type descriptor <format_derived_type>` and has a tag
of ``DW_TAG_inheritance``, then the type represents a base class. If the member
is a :ref:`global variable descriptor <format_global_variables>` then it
represents a static member. And, if the member is a :ref:`subprogram
descriptor <format_subprograms>` then it represents a member function. For
static members and member functions, ``getName()`` returns the member's linkage
name or C++ mangled name, and ``getDisplayName()`` returns the simplified
version of the name.

The first member of subroutine (tag = ``DW_TAG_subroutine_type``) type elements
is the return type for the subroutine. The remaining elements are the formal
arguments to the subroutine.

:ref:`Composite type <format_composite_type>` location can be determined from
the context and line number. The size, alignment and offset are expressed in
bits and can be 64 bit values. The alignment is used to round the offset when
embedded in a :ref:`composite type <format_composite_type>` (as an example, to
keep doubles on 64 bit boundaries). The offset is the bit offset if
embedded in a :ref:`composite type <format_composite_type>`.

.. _format_subrange:

Subrange descriptors
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !42 = metadata !{
    i32,    ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type)
    i64,    ;; Low value
    i64     ;; High value
  }

These descriptors are used to define ranges of array subscripts for an array
:ref:`composite type <format_composite_type>`. The low value defines the lower
bound, typically zero for C/C++. The high value is the upper bound. Values
are 64 bit. ``High - Low + 1`` is the size of the array. If ``Low > High``
the array bounds are not included in generated debugging information.

.. _format_enumerator:

Enumerator descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !6 = metadata !{
    i32,      ;; Tag = 40 + LLVMDebugVersion (DW_TAG_enumerator)
    metadata, ;; Name
    i64       ;; Value
  }

These descriptors are used to define members of an enumeration :ref:`composite
type <format_composite_type>`. Each one associates a name with a value.

Local variables
^^^^^^^^^^^^^^^

.. code-block:: llvm

  !7 = metadata !{
    i32,      ;; Tag (see below)
    metadata, ;; Context
    metadata, ;; Name
    metadata, ;; Reference to file where defined
    i32,      ;; 24 bit - Line number where defined
              ;; 8 bit - Argument number. 1 indicates 1st argument.
    metadata, ;; Type descriptor
    i32,      ;; flags
    metadata  ;; (optional) Reference to inline location
  }

These descriptors are used to define variables local to a subprogram. The
value of the tag depends on the usage of the variable:

.. code-block:: llvm

  DW_TAG_auto_variable   = 256
  DW_TAG_arg_variable    = 257
  DW_TAG_return_variable = 258

An auto variable is any variable declared in the body of the function. An
argument variable is any variable that appears as a formal argument to the
function. A return variable is used to track the result of a function and has
no source correspondent.
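
The fifth field of a local variable descriptor packs two numbers into a single
``i32``. The sketch below shows one way to pack and unpack such a field. It
assumes the line number occupies the low 24 bits and the argument number the
high 8 bits; that bit layout and the helper names (``packLineAndArg``,
``lineOf``, ``argOf``) are illustrative assumptions and should be checked
against the LLVM version in use.

.. code-block:: c++

  #include <cstdint>

  // Assumed layout: bits 0-23 hold the line number, bits 24-31 the argument
  // number (1 for the first argument).
  constexpr uint32_t packLineAndArg(uint32_t Line, uint32_t ArgNo) {
    return (Line & 0x00ffffffu) | ((ArgNo & 0xffu) << 24);
  }

  // Recover the line number from the packed field.
  constexpr uint32_t lineOf(uint32_t Field) { return Field & 0x00ffffffu; }

  // Recover the argument number from the packed field.
  constexpr uint32_t argOf(uint32_t Field) { return Field >> 24; }

  // Round trip: first argument, declared on line 42.
  static_assert(lineOf(packLineAndArg(42, 1)) == 42, "line survives packing");
  static_assert(argOf(packLineAndArg(42, 1)) == 1, "argument number survives");

Packing both values into one field keeps the descriptor small, at the cost of
limiting line numbers to 24 bits and argument numbers to 8 bits.
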

The context is either the subprogram or block where the variable is defined.
Name is the source variable name. Context and line indicate where the variable
was defined. Type descriptor defines the declared type of the variable.

.. _format_common_intrinsics:

Debugger intrinsic functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
provide debug information at various points in generated code.

``llvm.dbg.declare``
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.declare(metadata, metadata)

This intrinsic provides information about a local element (e.g., variable).
The first argument is metadata holding the alloca for the variable. The second
argument is metadata containing a description of the variable.

``llvm.dbg.value``
^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.value(metadata, i64, metadata)

This intrinsic provides information when a user source variable is set to a new
value. The first argument is the new value (wrapped as metadata). The second
argument is the offset in the user source variable where the new value is
written. The third argument is metadata containing a description of the user
source variable.

Object lifetimes and scoping
============================

In many languages, the local variables in functions can have their lifetimes or
scopes limited to a subset of a function. In the C family of languages, for
example, variables are only live (readable and writable) within the source
block that they are defined in. In functional languages, values are only
readable after they have been defined. Though this is a very obvious concept,
it is non-trivial to model in LLVM, because it has no notion of scoping in this
sense, and does not want to be tied to a language's scoping rules.

In order to handle this, the LLVM debug format uses the metadata attached to
LLVM instructions to encode line number and scoping information. Consider the
following C fragment, for example:

.. code-block:: c

  1.  void foo() {
  2.    int X = 21;
  3.    int Y = 22;
  4.    {
  5.      int Z = 23;
  6.      Z = X + Y;
  7.    }
  8.    X = Y;
  9.  }

Compiled to LLVM, this function would be represented like this:

.. code-block:: llvm

  define void @foo() nounwind ssp {
  entry:
    %X = alloca i32, align 4                        ; <i32*> [#uses=4]
    %Y = alloca i32, align 4                        ; <i32*> [#uses=4]
    %Z = alloca i32, align 4                        ; <i32*> [#uses=3]
    %0 = bitcast i32* %X to {}*                     ; <{}*> [#uses=1]
    call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
    store i32 21, i32* %X, !dbg !8
    %1 = bitcast i32* %Y to {}*                     ; <{}*> [#uses=1]
    call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10
    store i32 22, i32* %Y, !dbg !11
    %2 = bitcast i32* %Z to {}*                     ; <{}*> [#uses=1]
    call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
    store i32 23, i32* %Z, !dbg !15
    %tmp = load i32* %X, !dbg !16                   ; <i32> [#uses=1]
    %tmp1 = load i32* %Y, !dbg !16                  ; <i32> [#uses=1]
    %add = add nsw i32 %tmp, %tmp1, !dbg !16        ; <i32> [#uses=1]
    store i32 %add, i32* %Z, !dbg !16
    %tmp2 = load i32* %Y, !dbg !17                  ; <i32> [#uses=1]
    store i32 %tmp2, i32* %X, !dbg !17
    ret void, !dbg !18
  }

  declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone

  !0 = metadata !{i32 459008, metadata !1, metadata !"X",
                  metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ]
  !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
  !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo",
                 metadata !"foo", metadata !3, i32 1, metadata !4,
                 i1 false, i1 true}; [DW_TAG_subprogram ]
  !3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
                  metadata !"/private/tmp", metadata !"clang 1.1", i1 true,
                  i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ]
  !4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0,
                  i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ]
  !5 = metadata !{null}
  !6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0,
                  i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ]
  !7 = metadata !{i32 2, i32 7, metadata !1, null}
  !8 = metadata !{i32 2, i32 3, metadata !1, null}
  !9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3,
                  metadata !6}; [ DW_TAG_auto_variable ]
  !10 = metadata !{i32 3, i32 7, metadata !1, null}
  !11 = metadata !{i32 3, i32 3, metadata !1, null}
  !12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5,
                   metadata !6}; [ DW_TAG_auto_variable ]
  !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
  !14 = metadata !{i32 5, i32 9, metadata !13, null}
  !15 = metadata !{i32 5, i32 5, metadata !13, null}
  !16 = metadata !{i32 6, i32 5, metadata !13, null}
  !17 = metadata !{i32 8, i32 3, metadata !1, null}
  !18 = metadata !{i32 9, i32 1, metadata !2, null}

This example illustrates a few important details about LLVM debugging
information. In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
location information, which are attached to an instruction, are applied
together to allow a debugger to analyze the relationship between statements,
variable definitions, and the code used to implement the function.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata, metadata !0), !dbg !7

The first ``llvm.dbg.declare`` intrinsic encodes debugging information for the
variable ``X``. The metadata ``!dbg !7`` attached to the intrinsic provides
scope information for the variable ``X``.

.. code-block:: llvm

  !7 = metadata !{i32 2, i32 7, metadata !1, null}
  !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
  !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo",
                  metadata !"foo", metadata !"foo", metadata !3, i32 1,
                  metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]

Here ``!7`` is metadata providing location information. It has four fields:
line number, column number, scope, and original scope. The original scope
represents the inline location if this instruction is inlined inside a caller,
and is null otherwise. In this example, scope is encoded by ``!1``. ``!1``
represents a lexical block inside the scope ``!2``, where ``!2`` is a
:ref:`subprogram descriptor <format_subprograms>`. This way the location
information attached to the intrinsic indicates that the variable ``X`` is
declared at line number 2 at a function level scope in function ``foo``.

Now let's take another example.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata, metadata !12), !dbg !14

The second ``llvm.dbg.declare`` intrinsic encodes debugging information for
variable ``Z``. The metadata ``!dbg !14`` attached to the intrinsic provides
scope information for the variable ``Z``.

.. code-block:: llvm

  !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
  !14 = metadata !{i32 5, i32 9, metadata !13, null}

Here ``!14`` indicates that ``Z`` is declared at line number 5 and
column number 9 inside of lexical scope ``!13``. The lexical scope itself
resides inside of lexical scope ``!1`` described above.

The scope information attached to each instruction provides a straightforward
way to find instructions covered by a scope.

.. _ccxx_frontend:

C/C++ front-end specific debug information
==========================================

The C and C++ front-ends represent information about the program in a format
that is effectively identical to `DWARF 3.0
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_ in terms of information
content. This allows code generators to trivially support native debuggers by
generating standard DWARF information, and contains enough information for
non-DWARF targets to translate it as needed.

This section describes the forms used to represent C and C++ programs. Other
languages could pattern themselves after this (which itself is tuned to
representing programs in the same way that DWARF 3 does), or they could choose
to provide completely different forms if they don't fit into the DWARF model.
As support for debugging information gets added to the various LLVM
source-language front-ends, the information used should be documented here.

The following sections provide examples of various C/C++ constructs and the
debug information that would best describe those constructs.

C/C++ source file information
-----------------------------

Given the source files ``MySource.cpp`` and ``MyHeader.h`` located in the
directory ``/Users/mine/sources``, and the following code:

.. code-block:: c

  #include "MyHeader.h"

  int main(int argc, char *argv[]) {
    return 0;
  }

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ...
  ;;
  ;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
  ;;
  !2 = metadata !{
    i32 524305,    ;; Tag
    i32 0,         ;; Unused
    i32 4,         ;; Language Id
    metadata !"MySource.cpp",
    metadata !"/Users/mine/sources",
    metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)",
    i1 true,       ;; Main Compile Unit
    i1 false,      ;; Optimized compile unit
    metadata !"",  ;; Compiler flags
    i32 0}         ;; Runtime version

  ;;
  ;; Define the file for the file "/Users/mine/sources/MySource.cpp".
  ;;
  !1 = metadata !{
    i32 524329,    ;; Tag
    metadata !"MySource.cpp",
    metadata !"/Users/mine/sources",
    metadata !2    ;; Compile unit
  }

  ;;
  ;; Define the file for the file "/Users/mine/sources/MyHeader.h"
  ;;
  !3 = metadata !{
    i32 524329,    ;; Tag
    metadata !"MyHeader.h",
    metadata !"/Users/mine/sources",
    metadata !2    ;; Compile unit
  }

  ...

``llvm::Instruction`` provides easy access to metadata attached to an
instruction. One can extract line number information encoded in LLVM IR using
``Instruction::getMetadata()`` and ``DILocation::getLineNumber()``.

.. code-block:: c++

  if (MDNode *N = I->getMetadata("dbg")) {  // Here I is an LLVM instruction
    DILocation Loc(N);                      // DILocation is in DebugInfo.h
    unsigned Line = Loc.getLineNumber();
    StringRef File = Loc.getFilename();
    StringRef Dir = Loc.getDirectory();
  }

C/C++ global variable information
---------------------------------

Given an integer global variable declared as follows:

.. code-block:: c

  int MyGlobal = 100;

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define the global itself.
  ;;
  @MyGlobal = global i32 100
  ...
  ;;
  ;; List of debug info of globals
  ;;
  !llvm.dbg.cu = !{!0}

  ;; Define the compile unit.
  !0 = metadata !{
    i32 786449,                       ;; Tag
    i32 0,                            ;; Context
    i32 4,                            ;; Language
    metadata !"foo.cpp",              ;; File
    metadata !"/Volumes/Data/tmp",    ;; Directory
    metadata !"clang version 3.1 ",   ;; Producer
    i1 true,                          ;; Deprecated field
    i1 false,                         ;; "isOptimized"?
    metadata !"",                     ;; Flags
    i32 0,                            ;; Runtime Version
    metadata !1,                      ;; Enum Types
    metadata !1,                      ;; Retained Types
    metadata !1,                      ;; Subprograms
    metadata !3                       ;; Global Variables
  } ; [ DW_TAG_compile_unit ]

  ;; The Array of Global Variables
  !3 = metadata !{
    metadata !4
  }

  !4 = metadata !{
    metadata !5
  }

  ;;
  ;; Define the global variable itself.
  ;;
  !5 = metadata !{
    i32 786484,                        ;; Tag
    i32 0,                             ;; Unused
    null,                              ;; Unused
    metadata !"MyGlobal",              ;; Name
    metadata !"MyGlobal",              ;; Display Name
    metadata !"",                      ;; Linkage Name
    metadata !6,                       ;; File
    i32 1,                             ;; Line
    metadata !7,                       ;; Type
    i32 0,                             ;; IsLocalToUnit
    i32 1,                             ;; IsDefinition
    i32* @MyGlobal                     ;; LLVM-IR Value
  } ; [ DW_TAG_variable ]

  ;;
  ;; Define the file
  ;;
  !6 = metadata !{
    i32 786473,                        ;; Tag
    metadata !"foo.cpp",               ;; File
    metadata !"/Volumes/Data/tmp",     ;; Directory
    null                               ;; Unused
  } ; [ DW_TAG_file_type ]

  ;;
  ;; Define the type
  ;;
  !7 = metadata !{
    i32 786468,                         ;; Tag
    null,                               ;; Unused
    metadata !"int",                    ;; Name
    null,                               ;; Unused
    i32 0,                              ;; Line
    i64 32,                             ;; Size in Bits
    i64 32,                             ;; Align in Bits
    i64 0,                              ;; Offset
    i32 0,                              ;; Flags
    i32 5                               ;; Encoding
  } ; [ DW_TAG_base_type ]

C/C++ function information
--------------------------

Given a function declared as follows:

.. code-block:: c

  int main(int argc, char *argv[]) {
    return 0;
  }

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define the descriptor for the subprogram. Note that the low 16 bits of
  ;; the tag field (524334 = 0x8002e) are 46, which is the DWARF tag for
  ;; subprograms (46 = DW_TAG_subprogram.)
  ;;
  !6 = metadata !{
    i32 524334,               ;; Tag
    i32 0,                    ;; Unused
    metadata !1,              ;; Context
    metadata !"main",         ;; Name
    metadata !"main",         ;; Display name
    metadata !"main",         ;; Linkage name
    metadata !1,              ;; File
    i32 1,                    ;; Line number
    metadata !4,              ;; Type
    i1 false,                 ;; Is local
    i1 true,                  ;; Is definition
    i32 0,                    ;; Virtuality attribute, e.g. pure virtual function
    i32 0,                    ;; Index into virtual table for C++ methods
    i32 0,                    ;; Type that holds virtual table.
    i32 0,                    ;; Flags
    i1 false,                 ;; True if this function is optimized
    i32 (i32, i8**)* @main,   ;; Pointer to llvm::Function
    null                      ;; Function template parameters
  }
  ;;
  ;; Define the subprogram itself.
  ;;
  define i32 @main(i32 %argc, i8** %argv) {
  ...
  }

C/C++ basic types
-----------------

The following are the basic type descriptors for C/C++ core types:

bool
^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"bool",  ;; Name
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 8,             ;; Size in Bits
    i64 8,             ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 2              ;; Encoding
  }

char
^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"char",  ;; Name
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 8,             ;; Size in Bits
    i64 8,             ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 6              ;; Encoding
  }

unsigned char
^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"unsigned char",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 8,             ;; Size in Bits
    i64 8,             ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 8              ;; Encoding
  }

short
^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"short int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 16,            ;; Size in Bits
    i64 16,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 5              ;; Encoding
  }

unsigned short
^^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"short unsigned int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 16,            ;; Size in Bits
    i64 16,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }

int
^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"int",   ;; Name
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 5              ;; Encoding
  }

unsigned int
^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"unsigned int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }

long long
^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"long long int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 64,            ;; Size in Bits
    i64 64,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 5              ;; Encoding
  }

unsigned long long
^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"long long unsigned int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 64,            ;; Size in Bits
    i64 64,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }

float
^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"float",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 4              ;; Encoding
  }

double
^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,          ;; Tag
    metadata !1,         ;; Context
    metadata !"double",  ;; Name
    metadata !1,         ;; File
    i32 0,               ;; Line number
    i64 64,              ;; Size in Bits
    i64 64,              ;; Align in Bits
    i64 0,               ;; Offset in Bits
    i32 0,               ;; Flags
    i32 4                ;; Encoding
  }

C/C++ derived types
-------------------

Given the following as an example of a C/C++ derived type:

.. code-block:: c

  typedef const int *IntPtr;

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define the typedef "IntPtr".
  ;;
  !2 = metadata !{
    i32 524310,          ;; Tag
    metadata !1,         ;; Context
    metadata !"IntPtr",  ;; Name
    metadata !3,         ;; File
    i32 0,               ;; Line number
    i64 0,               ;; Size in bits
    i64 0,               ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    metadata !4          ;; Derived From type
  }
  ;;
  ;; Define the pointer type.
  ;;
  !4 = metadata !{
    i32 524303,          ;; Tag
    metadata !1,         ;; Context
    metadata !"",        ;; Name
    metadata !1,         ;; File
    i32 0,               ;; Line number
    i64 64,              ;; Size in bits
    i64 64,              ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    metadata !5          ;; Derived From type
  }
  ;;
  ;; Define the const type.
  ;;
  !5 = metadata !{
    i32 524326,          ;; Tag
    metadata !1,         ;; Context
    metadata !"",        ;; Name
    metadata !1,         ;; File
    i32 0,               ;; Line number
    i64 32,              ;; Size in bits
    i64 32,              ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    metadata !6          ;; Derived From type
  }
  ;;
  ;; Define the int type.
  ;;
  !6 = metadata !{
    i32 524324,          ;; Tag
    metadata !1,         ;; Context
    metadata !"int",     ;; Name
    metadata !1,         ;; File
    i32 0,               ;; Line number
    i64 32,              ;; Size in bits
    i64 32,              ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    i32 5                ;; Encoding
  }

C/C++ struct/union types
------------------------

Given the following as an example of a C/C++ struct type:

.. code-block:: c

  struct Color {
    unsigned Red;
    unsigned Green;
    unsigned Blue;
  };

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define basic type for unsigned int.
  ;;
  !5 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"unsigned int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }
  ;;
  ;; Define composite type for struct Color.
  ;;
  !2 = metadata !{
    i32 524307,        ;; Tag
    metadata !1,       ;; Context
    metadata !"Color", ;; Name
    metadata !1,       ;; Compile unit
    i32 1,             ;; Line number
    i64 96,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 0,             ;; Offset in bits
    i32 0,             ;; Flags
    null,              ;; Derived From
    metadata !3,       ;; Elements
    i32 0              ;; Runtime Language
  }

  ;;
  ;; Define the Red field.
  ;;
  !4 = metadata !{
    i32 524301,        ;; Tag
    metadata !1,       ;; Context
    metadata !"Red",   ;; Name
    metadata !1,       ;; File
    i32 2,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 0,             ;; Offset in bits
    i32 0,             ;; Flags
    metadata !5        ;; Derived From type
  }

  ;;
  ;; Define the Green field.
  ;;
  !6 = metadata !{
    i32 524301,        ;; Tag
    metadata !1,       ;; Context
    metadata !"Green", ;; Name
    metadata !1,       ;; File
    i32 3,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 32,            ;; Offset in bits
    i32 0,             ;; Flags
    metadata !5        ;; Derived From type
  }

  ;;
  ;; Define the Blue field.
  ;;
  !7 = metadata !{
    i32 524301,        ;; Tag
    metadata !1,       ;; Context
    metadata !"Blue",  ;; Name
    metadata !1,       ;; File
    i32 4,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 64,            ;; Offset in bits
    i32 0,             ;; Flags
    metadata !5        ;; Derived From type
  }

  ;;
  ;; Define the array of fields used by the composite type Color.
  ;;
  !3 = metadata !{metadata !4, metadata !6, metadata !7}

C/C++ enumeration types
-----------------------

Given the following as an example of a C/C++ enumeration type:

.. code-block:: c

  enum Trees {
    Spruce = 100,
    Oak = 200,
    Maple = 300
  };

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define composite type for enum Trees
  ;;
  !2 = metadata !{
    i32 524292,        ;; Tag
    metadata !1,       ;; Context
    metadata !"Trees", ;; Name
    metadata !1,       ;; File
    i32 1,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 0,             ;; Offset in bits
    i32 0,             ;; Flags
    null,              ;; Derived From type
    metadata !3,       ;; Elements
    i32 0              ;; Runtime language
  }

  ;;
  ;; Define the array of enumerators used by composite type Trees.
  ;;
  !3 = metadata !{metadata !4, metadata !5, metadata !6}

  ;;
  ;; Define Spruce enumerator.
  ;;
  !4 = metadata !{i32 524328, metadata !"Spruce", i64 100}

  ;;
  ;; Define Oak enumerator.
  ;;
  !5 = metadata !{i32 524328, metadata !"Oak", i64 200}

  ;;
  ;; Define Maple enumerator.
  ;;
  !6 = metadata !{i32 524328, metadata !"Maple", i64 300}

Debugging information format
============================

Debugging Information Extension for Objective C Properties
----------------------------------------------------------

Introduction
^^^^^^^^^^^^

Objective C provides a simpler way to declare and define accessor methods using
declared properties. The language provides features to declare a property and
to let the compiler synthesize accessor methods.

The debugger lets developers inspect Objective C interfaces and their instance
variables and class variables.
However, the debugger does not know anything
about the properties defined in Objective C interfaces. The debugger consumes
information generated by the compiler in DWARF format. The format does not
support encoding of Objective C properties. This proposal describes DWARF
extensions to encode Objective C properties, which the debugger can use to let
developers inspect Objective C properties.

Proposal
^^^^^^^^

Objective C properties exist separately from class members. A property can be
defined only by "setter" and "getter" selectors, and be calculated anew on each
access. Or a property can just be a direct access to some declared ivar.
Finally, it can have an ivar "automatically synthesized" for it by the compiler,
in which case the property can be referred to in user code directly using the
standard C dereference syntax as well as through the property "dot" syntax, but
there is no entry in the ``@interface`` declaration corresponding to this ivar.

To facilitate debugging these properties, we will add a new DWARF TAG to the
``DW_TAG_structure_type`` definition for the class to hold the description of a
given property, and a set of DWARF attributes that provide said description.
The property tag will also contain the name and declared type of the property.

If there is a related ivar, there will also be a DWARF property attribute placed
in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG
for that property. And in the case where the compiler synthesizes the ivar
directly, the compiler is expected to generate a ``DW_TAG_member`` for that
ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used
to access this ivar directly in code, and with the property attribute pointing
back to the property it is backing.

The following examples will serve as illustration for our discussion:

.. code-block:: objc

  @interface I1 {
    int n2;
  }

  @property int p1;
  @property int p2;
  @end

  @implementation I1
  @synthesize p1;
  @synthesize p2 = n2;
  @end

This produces the following DWARF (this is a "pseudo dwarfdump" output):

.. code-block:: none

  0x00000100:  TAG_structure_type [7] *
                 AT_APPLE_runtime_class( 0x10 )
                 AT_name( "I1" )
                 AT_decl_file( "Objc_Property.m" )
                 AT_decl_line( 3 )

  0x00000110:    TAG_APPLE_property
                   AT_name ( "p1" )
                   AT_type ( {0x00000150} ( int ) )

  0x00000120:    TAG_APPLE_property
                   AT_name ( "p2" )
                   AT_type ( {0x00000150} ( int ) )

  0x00000130:    TAG_member [8]
                   AT_name( "_p1" )
                   AT_APPLE_property ( {0x00000110} "p1" )
                   AT_type( {0x00000150} ( int ) )
                   AT_artificial ( 0x1 )

  0x00000140:    TAG_member [8]
                   AT_name( "n2" )
                   AT_APPLE_property ( {0x00000120} "p2" )
                   AT_type( {0x00000150} ( int ) )

  0x00000150:  AT_type( ( int ) )

Note, the current convention is that the name of the ivar for an
auto-synthesized property is the name of the property from which it derives
with an underscore prepended, as is shown in the example. But we actually
don't need to know this convention, since we are given the name of the ivar
directly.

Also, it is common practice in ObjC to have different property declarations in
the ``@interface`` and ``@implementation`` - e.g. to provide a read-only
property in the interface, and a read-write property in the implementation. In
that case, the compiler should emit whichever property declaration will be in
force in the current translation unit.

Developers can decorate a property with attributes which are encoded using
``DW_AT_APPLE_property_attribute``.

.. code-block:: objc

  @property (readonly, nonatomic) int pr;

.. code-block:: none

  TAG_APPLE_property [8]
    AT_name( "pr" )
    AT_type ( {0x00000147} (int) )
    AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)

The setter and getter method names are attached to the property using
``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes.

.. code-block:: objc

  @interface I1
  @property (setter=myOwnP3Setter:) int p3;
  -(void)myOwnP3Setter:(int)a;
  @end

  @implementation I1
  @synthesize p3;
  -(void)myOwnP3Setter:(int)a{ }
  @end

The DWARF for this would be:

.. code-block:: none

  0x000003bd:  TAG_structure_type [7] *
                 AT_APPLE_runtime_class( 0x10 )
                 AT_name( "I1" )
                 AT_decl_file( "Objc_Property.m" )
                 AT_decl_line( 3 )

  0x000003cd:    TAG_APPLE_property
                   AT_name ( "p3" )
                   AT_APPLE_property_setter ( "myOwnP3Setter:" )
                   AT_type( {0x00000147} ( int ) )

  0x000003f3:    TAG_member [8]
                   AT_name( "_p3" )
                   AT_type ( {0x00000147} ( int ) )
                   AT_APPLE_property ( {0x000003cd} )
                   AT_artificial ( 0x1 )

New DWARF Tags
^^^^^^^^^^^^^^

+-----------------------+--------+
| TAG                   | Value  |
+=======================+========+
| DW_TAG_APPLE_property | 0x4200 |
+-----------------------+--------+

New DWARF Attributes
^^^^^^^^^^^^^^^^^^^^

+--------------------------------+--------+-----------+
| Attribute                      | Value  | Classes   |
+================================+========+===========+
| DW_AT_APPLE_property           | 0x3fed | Reference |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_getter    | 0x3fe9 | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_setter    | 0x3fea | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_attribute | 0x3feb | Constant  |
+--------------------------------+--------+-----------+

New DWARF Constants
^^^^^^^^^^^^^^^^^^^

+--------------------------------+-------+
| Name                           | Value |
+================================+=======+
| DW_APPLE_PROPERTY_readonly     | 0x1   |
+--------------------------------+-------+
| DW_APPLE_PROPERTY_readwrite    | 0x2   |
+--------------------------------+-------+
| DW_APPLE_PROPERTY_assign       | 0x4   |
+--------------------------------+-------+
| DW_APPLE_PROPERTY_retain       | 0x8   |
+--------------------------------+-------+
| DW_APPLE_PROPERTY_copy         | 0x10  |
+--------------------------------+-------+
| DW_APPLE_PROPERTY_nonatomic    | 0x20  |
+--------------------------------+-------+

Name Accelerator Tables
-----------------------

Introduction
^^^^^^^^^^^^

The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a
debugger needs. The "``pub``" in the section name indicates that the entries
in the table are publicly visible names only. This means no static or hidden
functions show up in the "``.debug_pubnames``". No static variables or private
class variables are in the "``.debug_pubtypes``". Many compilers add different
things to these tables, so we can't rely upon the contents between gcc, icc, or
clang.

The typical query given by users tends not to match up with the contents of
these tables. For example, the DWARF spec states that "In the case of the name
of a function member or static data member of a C++ structure, class or union,
the name presented in the "``.debug_pubnames``" section is not the simple name
given by the ``DW_AT_name`` attribute of the referenced debugging information
entry, but rather the fully qualified name of the data or function member."
So the only name in these tables for a complex C++ entry is its fully
qualified name.
Debugger users tend not to enter their search strings as
"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``", or
"``a::b::c``". So the name entered in the name table must be demangled in
order to chop it up appropriately, and additional names must be manually
entered into the table to make it effective as a name lookup table for
debuggers to use.

All debuggers currently ignore the "``.debug_pubnames``" table because its
inconsistent and useless public-only name content makes it a waste of
space in the object file. These tables, when they are written to disk, are not
sorted in any way, leaving every debugger to do its own parsing and sorting.
These tables also include an inlined copy of the string values in the table
itself, making the tables much larger than they need to be on disk, especially
for large C++ programs.

Can't we just fix the sections by adding all of the names we need to this
table? No, because that is not what the tables are defined to contain and we
won't know the difference between the old bad tables and the new good tables.
At best we could make our own renamed sections that contain all of the data we
need.

These tables are also insufficient for what a debugger like LLDB needs. LLDB
uses clang for its expression parsing where LLDB acts as a PCH. LLDB is then
often asked to look for type "``foo``" or namespace "``bar``", or list items in
namespace "``baz``". Namespaces are not included in the pubnames or pubtypes
tables. Since clang asks a lot of questions when it is parsing an expression,
we need to be very fast when looking up names, as it happens a lot. Having new
accelerator tables that are optimized for very quick lookups will benefit this
type of debugging experience greatly.

We would like to generate name lookup tables that can be mapped into memory
from disk, and used as is, with little or no up-front parsing. We would also
like to be able to control the exact content of these different tables so they
contain exactly what we need. The Name Accelerator Tables were designed to fix
these issues. In order to solve these issues we need to:

* Have a format that can be mapped into memory from disk and used as is
* Make lookups very fast
* Have an extensible table format so these tables can be made by many producers
* Contain all of the names needed for typical lookups out of the box
* Have strict rules for the contents of tables

Table size is important and the accelerator table format should allow the reuse
of strings from common string tables so the strings for the names are not
duplicated. We also want to make sure the table is ready to be used as-is by
simply mapping the table into memory with minimal header parsing.

The name lookups need to be fast and optimized for the kinds of lookups that
debuggers tend to do. Optimally we would like to touch as few parts of the
mapped table as possible when doing a name lookup and be able to quickly find
the name entry we are looking for, or discover there are no matches. In the
case of debuggers we optimized for lookups that fail most of the time.

Each table that is defined should have strict rules on exactly what is in the
accelerator tables, and these rules should be documented so clients can rely on
the content.

Hash Tables
^^^^^^^^^^^

Standard Hash Tables
""""""""""""""""""""

Typical hash tables have a header, buckets, and each bucket points to the
bucket contents:

.. code-block:: none

  .------------.
  |  HEADER    |
  |------------|
  |  BUCKETS   |
  |------------|
  |  DATA      |
  `------------'

The BUCKETS are an array of offsets to DATA for each hash:

.. code-block:: none

  .------------.
  | 0x00001000 | BUCKETS[0]
  | 0x00002000 | BUCKETS[1]
  | 0x00002200 | BUCKETS[2]
  | 0x000034f0 | BUCKETS[3]
  |            | ...
  | 0xXXXXXXXX | BUCKETS[n_buckets]
  '------------'

So for ``bucket[3]`` in the example above, we have an offset into the table,
0x000034f0, which points to a chain of entries for the bucket. Each entry in
the chain must contain a next pointer, the full 32 bit hash value, the string
itself, and the data for the current string value.

.. code-block:: none

              .------------.
  0x000034f0: | 0x00003500 | next pointer
              | 0x12345678 | 32 bit hash
              | "erase"    | string value
              | data[n]    | HashData for this bucket
              |------------|
  0x00003500: | 0x00003550 | next pointer
              | 0x29273623 | 32 bit hash
              | "dump"     | string value
              | data[n]    | HashData for this bucket
              |------------|
  0x00003550: | 0x00000000 | next pointer
              | 0x82638293 | 32 bit hash
              | "main"     | string value
              | data[n]    | HashData for this bucket
              `------------'

The problem with this layout for debuggers is that we need to optimize for the
negative lookup case where the symbol we're searching for is not present. So
if we were to look up "``printf``" in the table above, we would make a 32 bit
hash for "``printf``", and it might match ``bucket[3]``. We would need to go to
the offset 0x000034f0 and start looking to see if our 32 bit hash matches. To
do so, we need to read the next pointer, then read the hash, compare it, and
skip to the next entry in the chain. Each time we are skipping many bytes in
memory and touching new cache pages just to do the compare on the full 32 bit
hash. All of these accesses then tell us that we didn't have a match.
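
The cost of a failed lookup in this chained layout can be made concrete with a
small sketch. The types and function names below (``ChainEntry``,
``ChainedTable``, ``lookupChained``, ``makeExampleTable``) are hypothetical and
exist only for illustration; they are not part of LLVM. The hash values are
also made up, chosen so that every entry falls into bucket 3 of an 8 bucket
table, and vector indexes stand in for the offsets and next pointers shown
above.

.. code-block:: c++

  #include <cstdint>
  #include <string>
  #include <vector>

  // Hypothetical in-memory model of one chain entry (not an LLVM type).
  struct ChainEntry {
    int next;          // index of the next entry in the chain, or -1 at the end
    uint32_t hash;     // full 32 bit hash of the string
    std::string name;  // the string value itself
  };

  struct ChainedTable {
    std::vector<int> buckets;         // head entry index per bucket, -1 if empty
    std::vector<ChainEntry> entries;  // the DATA area
  };

  // Returns the index of the matching entry, or -1 if there is none.
  // 'visited' reports how many separate chain entries had to be read.
  int lookupChained(const ChainedTable &table, uint32_t hash,
                    const std::string &name, unsigned &visited) {
    visited = 0;
    if (table.buckets.empty())
      return -1;
    for (int i = table.buckets[hash % table.buckets.size()]; i != -1;
         i = table.entries[i].next) {
      ++visited;  // every step reads a different entry somewhere in DATA
      const ChainEntry &entry = table.entries[i];
      if (entry.hash == hash && entry.name == name)
        return i;
    }
    return -1;
  }

  // Builds a table whose bucket 3 holds the chain "erase" -> "dump" -> "main".
  // The hashes 0x0B, 0x13 and 0x1B are illustrative; each is 3 modulo 8.
  ChainedTable makeExampleTable() {
    ChainedTable table;
    table.buckets.assign(8, -1);
    table.entries = {{1, 0x0B, "erase"}, {2, 0x13, "dump"}, {-1, 0x1B, "main"}};
    table.buckets[3] = 0;
    return table;
  }

Looking up a missing name whose (made up) hash ``0x23`` also maps to bucket 3
walks all three entries before reporting a miss, which is the access pattern
the paragraph above describes.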

Name Hash Tables
""""""""""""""""

To solve the issues mentioned above we have structured the hash tables a bit
differently: a header, buckets, an array of all unique 32 bit hash values,
followed by an array of hash value data offsets, one for each hash value, then
the data for all hash values:

.. code-block:: none

  .-------------.
  |  HEADER     |
  |-------------|
  |  BUCKETS    |
  |-------------|
  |  HASHES     |
  |-------------|
  |  OFFSETS    |
  |-------------|
  |  DATA       |
  `-------------'

The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array. By
making all of the full 32 bit hash values contiguous in memory, we allow
ourselves to efficiently check for a match while touching as little memory as
possible. Most often checking the 32 bit hash values is as far as the lookup
goes. If it does match, it usually is a match with no collisions. So for a
table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash
values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
``OFFSETS`` as:

.. code-block:: none

  .-------------------------.
  |  HEADER.magic           | uint32_t
  |  HEADER.version         | uint16_t
  |  HEADER.hash_function   | uint16_t
  |  HEADER.bucket_count    | uint32_t
  |  HEADER.hashes_count    | uint32_t
  |  HEADER.header_data_len | uint32_t
  |  HEADER_DATA            | HeaderData
  |-------------------------|
  |  BUCKETS                | uint32_t[bucket_count] // 32 bit hash indexes
  |-------------------------|
  |  HASHES                 | uint32_t[hashes_count] // 32 bit hash values
  |-------------------------|
  |  OFFSETS                | uint32_t[hashes_count] // 32 bit offsets to hash value data
  |-------------------------|
  |  ALL HASH DATA          |
  `-------------------------'

So taking the exact same data from the standard hash example above we end up
with:

.. code-block:: none

              .------------.
              | HEADER     |
              |------------|
              |          0 | BUCKETS[0]
              |          2 | BUCKETS[1]
              |          5 | BUCKETS[2]
              |          6 | BUCKETS[3]
              |            | ...
              |        ... | BUCKETS[n_buckets]
              |------------|
              | 0x........ | HASHES[0]
              | 0x........ | HASHES[1]
              | 0x........ | HASHES[2]
              | 0x........ | HASHES[3]
              | 0x........ | HASHES[4]
              | 0x........ | HASHES[5]
              | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
              | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
              | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
              | 0x........ | HASHES[9]
              | 0x........ | HASHES[10]
              | 0x........ | HASHES[11]
              | 0x........ | HASHES[12]
              | 0x........ | HASHES[13]
              | 0x........ | HASHES[n_hashes]
              |------------|
              | 0x........ | OFFSETS[0]
              | 0x........ | OFFSETS[1]
              | 0x........ | OFFSETS[2]
              | 0x........ | OFFSETS[3]
              | 0x........ | OFFSETS[4]
              | 0x........ | OFFSETS[5]
              | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
              | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
              | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
              | 0x........ | OFFSETS[9]
              | 0x........ | OFFSETS[10]
              | 0x........ | OFFSETS[11]
              | 0x........ | OFFSETS[12]
              | 0x........ | OFFSETS[13]
              | 0x........ | OFFSETS[n_hashes]
              |------------|
              |            |
              |            |
              |            |
              |            |
              |            |
              |------------|
  0x000034f0: | 0x00001203 | String offset into .debug_str ("erase")
              | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x........ | HashData[3]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              |------------|
  0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
              | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x00001203 | String offset into .debug_str ("dump")
              | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              |------------|
  0x00003550: | 0x00001203 | String offset into .debug_str ("main")
              | 0x00000009 | A 32 bit array count - number of HashData with name "main"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x........ | HashData[3]
              | 0x........ | HashData[4]
              | 0x........ | HashData[5]
              | 0x........ | HashData[6]
              | 0x........ | HashData[7]
              | 0x........ | HashData[8]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              `------------'

So we still have all of the same data, we just organize it more efficiently for
debugger lookup. If we repeat the same "``printf``" lookup from above, we
would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit
hash value modulo ``n_buckets``. ``BUCKETS[3]`` contains "6" which is the
index into the ``HASHES`` table. We would then compare any consecutive 32 bit
hash values in the ``HASHES`` array as long as the hashes would be in
``BUCKETS[3]``. We do this by verifying that each subsequent hash value modulo
``n_buckets`` is still 3. In the case of a failed lookup we would access the
memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
before we know that we have no match. We don't end up marching through
multiple words of memory and we really keep the number of processor data cache
lines being accessed as small as possible.

The string hash that is used for these lookup tables is the Daniel J.
Bernstein hash which is also used in the ELF ``GNU_HASH`` sections.
It is a very good hash for all kinds of names in programs with very few hash
collisions.

Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``.

Details
^^^^^^^

These name hash tables are designed to be generic where specializations of the
table get to define additional data that goes into the header ("``HeaderData``"),
how the string value is stored ("``KeyType``") and the content of the data for each
hash value.

Header Layout
"""""""""""""

The header has a fixed part, and the specialized part. The exact format of the
header is:

.. code-block:: c

  struct Header
  {
    uint32_t   magic;           // 'HASH' magic value to allow endian detection
    uint16_t   version;         // Version number
    uint16_t   hash_function;   // The hash function enumeration that was used
    uint32_t   bucket_count;    // The number of buckets in this hash table
    uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
    uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
                                // Specifically the length of the following HeaderData field - this does not
                                // include the size of the preceding fields
    HeaderData header_data;     // Implementation specific header data
  };

The header starts with a 32 bit "``magic``" value which must be ``'HASH'``
encoded as an ASCII integer. This allows the detection of the start of the
hash table and also allows the table's byte order to be determined so the table
can be correctly extracted. The "``magic``" value is followed by a 16 bit
``version`` number which allows the table to be revised and modified in the
future. The current version number is 1. ``hash_function`` is a ``uint16_t``
enumeration that specifies which hash function was used to produce this table.
The current values for the hash function enumerations include:

.. code-block:: c

  enum HashFunctionType
  {
    eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
  };

``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32 bit
hash values that are in the ``HASHES`` array, which is also the number of
offsets contained in the ``OFFSETS`` array. ``header_data_len`` specifies the
size in bytes of the ``HeaderData`` that is filled in by specialized versions
of this table.

Fixed Lookup
""""""""""""

The header is followed by the buckets, hashes, offsets, and hash value data.

.. code-block:: c

  struct FixedTable
  {
    uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
    uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
    uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
  };

``buckets`` is an array of 32 bit indexes into the ``hashes`` array. The
``hashes`` array contains all of the 32 bit hash values for all names in the
hash table. Each hash in the ``hashes`` table has an offset in the ``offsets``
array that points to the data for the hash value.

This table setup makes it very easy to repurpose these tables to contain
different data, while keeping the lookup mechanism the same for all tables.
This layout also makes it possible to save the table to disk and map it in
later and do very efficient name lookups with little or no parsing.

DWARF lookup tables can be implemented in a variety of ways and can store a lot
of information for each name.
We want to make the DWARF tables extensible and able to store the data
efficiently so we have used some of the DWARF features that enable efficient
data storage to define exactly what kind of data we store for each name.

The ``HeaderData`` contains a definition of the contents of each HashData chunk.
We might want to store an offset to all of the debug information entries (DIEs)
for each name. To keep things extensible, we create a list of items, or
Atoms, that are contained in the data for each name. First comes the type of
the data in each atom:

.. code-block:: c

  enum AtomType
  {
    eAtomTypeNULL       = 0u,
    eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
    eAtomTypeCUOffset   = 2u,   // DIE offset of the compiler unit header that contains the item in question
    eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
    eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
    eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
  };

The enumeration values and their meanings are:

.. code-block:: none

  eAtomTypeNULL       - a termination atom that specifies the end of the atom list
  eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
  eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
  eAtomTypeTag        - the DW_TAG_xxx enumeration value so you don't have to parse the DWARF to see what it is
  eAtomTypeNameFlags  - flags for functions and global variables (isFunction, isInlined, isExternal...)
  eAtomTypeTypeFlags  - flags for types (isCXXClass, isObjCClass, ...)

Then each atom specifies its type and how the data for that atom type is
encoded:

.. code-block:: c

  struct Atom
  {
    uint16_t type;  // AtomType enum value
    uint16_t form;  // DWARF DW_FORM_XXX defines
  };

The ``form`` type above is from the DWARF specification and defines the exact
encoding of the data for the Atom type. See the DWARF specification for the
``DW_FORM_`` definitions.

.. code-block:: c

  struct HeaderData
  {
    uint32_t die_offset_base;
    uint32_t atom_count;
    Atom     atoms[atom_count];
  };

``HeaderData`` defines the base DIE offset that should be added to any atoms
that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``,
``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata`` forms. It also
defines what is contained in each ``HashData`` object -- ``Atom.form`` tells us
how large each field will be in the ``HashData`` and the ``Atom.type`` tells us
how this data should be interpreted.

For the current implementations of the "``.apple_names``" (all functions +
globals), the "``.apple_types``" (names of all types that are defined), and
the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom``
array to be:

.. code-block:: c

  HeaderData.atom_count = 1;
  HeaderData.atoms[0].type = eAtomTypeDIEOffset;
  HeaderData.atoms[0].form = DW_FORM_data4;

This defines the contents to be the DIE offset (``eAtomTypeDIEOffset``) that is
encoded as a 32 bit value (``DW_FORM_data4``). This allows a single name to
have multiple matching DIEs in a single file, which could come up with an
inlined function for instance. Future tables could include more information
about the DIE such as flags indicating if the DIE is a function, method, block,
or inlined.

The KeyType for the DWARF table is a 32 bit string table offset into the
".debug_str" table.
The ".debug_str" is the string table for the DWARF which may already contain
copies of all of the strings. This helps make sure, with help from the
compiler, that we reuse the strings between all of the DWARF sections and keeps
the hash table size down. Another benefit to having the compiler generate all
strings as ``DW_FORM_strp`` in the debug info is that DWARF parsing can be made
much faster.

After a lookup is made, we get an offset into the hash data. The hash data
needs to be able to deal with 32 bit hash collisions, so the chunk of data
at the offset in the hash data consists of a triple:

.. code-block:: c

  uint32_t str_offset;                  // KeyType: string offset into .debug_str
  uint32_t hash_data_count;             // Number of HashData entries that follow
  HashData hash_data[hash_data_count];  // The data for this string

If ``str_offset`` is zero, then the bucket contents are done. 99.9% of the
hash data chunks contain a single item (no 32 bit hash collision):

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'

If there are collisions, you will have multiple valid string offsets:

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print")
  | 0x00000002 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'

Current testing with real world C++ binaries has shown that there is around 1
32 bit hash collision per 100,000 name entries.

Contents
^^^^^^^^

As we said, we want to strictly define exactly what is included in the
different tables. For DWARF, we have 3 tables: "``.apple_names``",
"``.apple_types``", and "``.apple_namespaces``".

"``.apple_names``" sections should contain an entry for each DWARF DIE whose
``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or
``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``,
``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``. It also contains
``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and
static variables). All global and static variables should be included,
including those scoped within functions and classes. For example using the
following code:

.. code-block:: c

  static int var = 0;

  void f ()
  {
    static int var = 0;
  }

Both of the static ``var`` variables would be included in the table. All
functions should emit both their full names and their basenames. For C or C++,
the full name is the mangled name (if available) which is usually in the
``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the
function basename. If global or static variables have a mangled name in a
``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the
simple name found in the ``DW_AT_name`` attribute.

"``.apple_types``" sections should contain an entry for each DWARF DIE whose
tag is one of:

* DW_TAG_array_type
* DW_TAG_class_type
* DW_TAG_enumeration_type
* DW_TAG_pointer_type
* DW_TAG_reference_type
* DW_TAG_string_type
* DW_TAG_structure_type
* DW_TAG_subroutine_type
* DW_TAG_typedef
* DW_TAG_union_type
* DW_TAG_ptr_to_member_type
* DW_TAG_set_type
* DW_TAG_subrange_type
* DW_TAG_base_type
* DW_TAG_const_type
* DW_TAG_constant
* DW_TAG_file_type
* DW_TAG_namelist
* DW_TAG_packed_type
* DW_TAG_volatile_type
* DW_TAG_restrict_type
* DW_TAG_interface_type
* DW_TAG_unspecified_type
* DW_TAG_shared_type

Only entries with a ``DW_AT_name`` attribute are included, and the entry must
not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero
value). For example, using the following code:

.. code-block:: c

  int main ()
  {
    int *b = 0;
    return *b;
  }

We get a few type DIEs:

.. code-block:: none

  0x00000067:     TAG_base_type [5]
                  AT_encoding( DW_ATE_signed )
                  AT_name( "int" )
                  AT_byte_size( 0x04 )

  0x0000006e:     TAG_pointer_type [6]
                  AT_type( {0x00000067} ( int ) )
                  AT_byte_size( 0x08 )

The ``DW_TAG_pointer_type`` is not included because it does not have a
``DW_AT_name``.

"``.apple_namespaces``" section should contain all ``DW_TAG_namespace`` DIEs.
If we run into a namespace that has no name this is an anonymous namespace, and
the name should be output as "``(anonymous namespace)``" (without the quotes).
Why? This matches the output of ``abi::__cxa_demangle()``, the function in the
standard C++ library that demangles mangled names.
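
To tie the table layout and the hash data chunks together, the lookup described
for the name hash tables can be sketched as follows. This is a minimal
illustration, not the implementation used by any particular debugger:
``NameTable``, ``djb_hash``, ``read_u32`` and ``name_lookup`` are hypothetical
names, and the sketch assumes host byte order, a non-empty table whose only
atom is a ``DW_FORM_data4`` DIE offset, and ``offsets`` values that are byte
offsets relative to the ``data`` pointer:

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* DJB hash: start at 5381 and multiply by 33 for every byte. */
  static uint32_t djb_hash(const char *s) {
    uint32_t h = 5381;
    for (; *s; ++s)
      h = h * 33 + (unsigned char)*s;
    return h;
  }

  /* Hypothetical, already decoded view of one name table. */
  struct NameTable {
    uint32_t bucket_count;
    uint32_t hashes_count;
    const uint32_t *buckets; /* bucket_count indexes into hashes[] */
    const uint32_t *hashes;  /* hashes_count unique hash values */
    const uint32_t *offsets; /* hashes_count byte offsets into data[] */
    const uint8_t *data;     /* assumed base for the values in offsets[] */
    const char *debug_str;   /* contents of the .debug_str section */
  };

  static uint32_t read_u32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v); /* host byte order is assumed */
    return v;
  }

  /* Returns a pointer to the first 32 bit DIE offset for `name` and stores
     the number of DIE offsets in `*count`, or returns NULL if absent. */
  static const uint8_t *name_lookup(const struct NameTable *t,
                                    const char *name, uint32_t *count) {
    if (t->bucket_count == 0)
      return NULL;
    uint32_t hash = djb_hash(name);
    uint32_t bucket = hash % t->bucket_count;
    uint32_t i = t->buckets[bucket];
    if (i == UINT32_MAX) /* empty bucket */
      return NULL;
    /* Scan the consecutive hashes that still belong to this bucket. */
    for (; i < t->hashes_count && t->hashes[i] % t->bucket_count == bucket;
         ++i) {
      if (t->hashes[i] != hash)
        continue;
      /* Walk the (str_offset, count, HashData[count]) chunks until the
         zero str_offset terminator, comparing strings to resolve
         32 bit hash collisions. */
      const uint8_t *p = t->data + t->offsets[i];
      for (;;) {
        uint32_t str_offset = read_u32(p);
        if (str_offset == 0)
          break;
        uint32_t n = read_u32(p + 4);
        if (strcmp(t->debug_str + str_offset, name) == 0) {
          *count = n;
          return p + 8;
        }
        p += 8 + 4 * (size_t)n;
      }
      break; /* hash values are unique, so no other entry can match */
    }
    return NULL;
  }

A failed lookup normally ends in the scan over the contiguous ``hashes``
array, so the hash data is only read when a full 32 bit hash matches.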


Language Extensions and File Format Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Objective-C Extensions
""""""""""""""""""""""

"``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for an
Objective-C class. The name used in the hash table is the name of the
Objective-C class itself. If the Objective-C class has a category, then an
entry is made for both the class name without the category, and for the class
name with the category. So if we have a DIE at offset 0x1234 with a name of
method "``-[NSString(my_additions) stringWithSpecialString:]``", we would add
an entry for "``NSString``" that points to DIE 0x1234, and an entry for
"``NSString(my_additions)``" that points to 0x1234. This allows us to quickly
track down all Objective-C methods for an Objective-C class when evaluating
expressions. It is needed because of the dynamic nature of Objective-C where
anyone can add methods to a class. The DWARF for Objective-C methods is also
emitted differently from C++ classes: the methods are not usually contained in
the class definition, they are scattered across one or more compile units.
Categories can also be defined in different shared libraries. So we need to be
able to quickly find all of the methods and class functions given the
Objective-C class name, or quickly find all methods and class functions for a
class + category name. This table does not contain any selector names, it just
maps Objective-C class names (or class names + category) to all of the methods
and class functions. The selectors are added as function basenames in the
"``.apple_names``" section.

In the "``.apple_names``" section for Objective-C functions, the full name is
the entire function name with the brackets ("``-[NSString
stringWithCString:]``") and the basename is the selector only
("``stringWithCString:``").
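
The names that go into these tables for an Objective-C method can be derived
from the full method name. The following is a minimal sketch of that split;
``split_objc_name`` is a hypothetical helper written for this illustration,
and it assumes the name has the form ``-[Class(Category) selector]`` or
``+[Class selector]`` with a single space before the selector:

.. code-block:: c

  #include <stddef.h>
  #include <string.h>

  /* Hypothetical helper: splits a full Objective-C method name into the
     class name, the class name with category (equal to the class name when
     there is no category) and the selector. Every output buffer must hold
     `cap` bytes. Returns 1 on success and 0 if the name does not have the
     expected form or does not fit. */
  static int split_objc_name(const char *full, char *cls, char *cls_cat,
                             char *sel, size_t cap) {
    size_t len = strlen(full);
    if (len < 4 || (full[0] != '-' && full[0] != '+') || full[1] != '[' ||
        full[len - 1] != ']')
      return 0;
    const char *start = full + 2;
    const char *space = strchr(start, ' ');
    if (!space)
      return 0;
    size_t cls_cat_len = (size_t)(space - start);
    const char *paren = memchr(start, '(', cls_cat_len);
    size_t cls_len = paren ? (size_t)(paren - start) : cls_cat_len;
    size_t sel_len = (size_t)((full + len - 1) - (space + 1));
    if (cls_len == 0 || sel_len == 0 || cls_cat_len >= cap || sel_len >= cap)
      return 0;
    memcpy(cls, start, cls_len);
    cls[cls_len] = '\0';
    memcpy(cls_cat, start, cls_cat_len);
    cls_cat[cls_cat_len] = '\0';
    memcpy(sel, space + 1, sel_len);
    sel[sel_len] = '\0';
    return 1;
  }

For "``-[NSString(my_additions) stringWithSpecialString:]``" this yields
"``NSString``" and "``NSString(my_additions)``" as the "``.apple_objc``" keys
and "``stringWithSpecialString:``" as the basename.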

Mach-O Changes
""""""""""""""

The section names given above for the Apple hash tables are for non-Mach-O
files. For Mach-O files, the sections should be contained in the ``__DWARF``
segment with names as follows:

* "``.apple_names``" -> "``__apple_names``"
* "``.apple_types``" -> "``__apple_types``"
* "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit)
* "``.apple_objc``" -> "``__apple_objc``"
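
The renaming in the list above follows a simple pattern: the leading "``.``"
becomes "``__``" and the result is cut off at 16 characters. A minimal sketch
of that pattern, using the hypothetical helper ``macho_section_name`` (not an
LLVM API), could look like this:

.. code-block:: c

  #include <stddef.h>
  #include <string.h>

  /* Hypothetical helper: converts a name such as ".apple_names" into the
     Mach-O form by replacing the leading '.' with "__" and truncating the
     result to 16 characters. `out` must hold at least 17 bytes. Returns 1
     on success and 0 if `name` does not start with '.'. */
  static int macho_section_name(const char *name, char out[17]) {
    if (name[0] != '.')
      return 0;
    size_t n = strlen(name + 1);
    if (n > 14) /* 2 characters for "__" plus 14 more gives the limit of 16 */
      n = 14;
    out[0] = '_';
    out[1] = '_';
    memcpy(out + 2, name + 1, n);
    out[2 + n] = '\0';
    return 1;
  }

Only "``.apple_namespaces``" is long enough to be affected by the truncation.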