================================
Source Level Debugging with LLVM
================================

.. sectionauthor:: Chris Lattner and Jim Laskey

.. contents::
   :local:

Introduction
============

This document is the central repository for all information pertaining to debug
information in LLVM. It describes the :ref:`actual format that the LLVM debug
information takes <format>`, which is useful for those interested in creating
front-ends or dealing directly with the information. Further, this document
provides specific examples of what debug information for C/C++ looks like.

Philosophy behind LLVM debugging information
--------------------------------------------

The idea of the LLVM debugging information is to capture how the important
pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
Several design aspects have shaped the solution that appears here. The
important ones are:

* Debugging information should have very little impact on the rest of the
  compiler. No transformations, analyses, or code generators should need to
  be modified because of debugging information.

* LLVM optimizations should interact in :ref:`well-defined and easily described
  ways <intro_debugopt>` with the debugging information.

* Because LLVM is designed to support arbitrary programming languages,
  LLVM-to-LLVM tools should not need to know anything about the semantics of
  the source-level-language.

* Source-level languages are often **widely** different from one another.
  LLVM should not put any restrictions on the flavor of the source-language,
  and the debugging information should work with any language.

* With code generator support, it should be possible to use an LLVM compiler
  to compile a program to native machine code and standard debugging
  formats.
  This allows compatibility with traditional machine-code level
  debuggers, like GDB or DBX.

The approach used by the LLVM implementation is to use a small set of
:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
between LLVM program objects and the source-level objects. The description of
the source-level program is maintained in LLVM metadata in an
:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
currently uses working draft 7 of the `DWARF 3 standard
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_).

When a program is being debugged, a debugger interacts with the user and turns
the stored debug information into source-language specific information. As
such, a debugger must be aware of the source-language, and is thus tied to a
specific language or family of languages.

Debug information consumers
---------------------------

The role of debug information is to provide meta information normally stripped
away during the compilation process. This meta information provides an LLVM
user a relationship between generated code and the original program source
code.

Currently, debug information is consumed by DwarfDebug to produce DWARF
information used by the GDB debugger. Other targets could use the same
information to produce stabs or other debug forms.

It would also be reasonable to use debug information to feed profiling tools
for analysis of generated code, or tools for reconstructing the original
source from generated code.

TODO - expound a bit more.

.. _intro_debugopt:

Debugging optimized code
------------------------

An extremely high priority of LLVM debugging information is to make it interact
well with optimizations and analysis.
In particular, the LLVM debug
information provides the following guarantees:

* LLVM debug information **always provides information to accurately read
  the source-level state of the program**, regardless of which LLVM
  optimizations have been run, and without any modification to the
  optimizations themselves. However, some optimizations may impact the
  ability to modify the current state of the program with a debugger, such
  as setting program variables, or calling functions that have been
  deleted.

* As desired, LLVM optimizations can be upgraded to be aware of the LLVM
  debugging information, allowing them to update the debugging information
  as they perform aggressive optimizations. This means that, with effort,
  the LLVM optimizers could optimize debug code just as well as non-debug
  code.

* LLVM debug information does not prevent optimizations from
  happening (for example inlining, basic block reordering/merging/cleanup,
  tail duplication, etc).

* LLVM debug information is automatically optimized along with the rest of
  the program, using existing facilities. For example, duplicate
  information is automatically merged by the linker, and unused information
  is automatically removed.

Basically, the debug information allows you to compile a program with
"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
the program as it executes from a debugger. Compiling a program with
"``-O3 -g``" gives you full debug information that is always available and
accurate for reading (e.g., you get accurate stack traces despite tail call
elimination and inlining), but you might lose the ability to modify the program
and call functions which were optimized out of the program, or inlined away
completely.

The :ref:`LLVM test suite <test-suite-quickstart>` provides a framework to test
the optimizer's handling of debugging information.
It can be run like this:

.. code-block:: bash

  % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
  % make TEST=dbgopt

This tests the impact of debugging information on optimization passes. If
debugging information influences optimization passes then it will be reported
as a failure. See :doc:`TestingGuide` for more information on LLVM test
infrastructure and how to run various tests.

.. _format:

Debugging information format
============================

LLVM debugging information has been carefully designed to make it possible for
the optimizer to optimize the program and debugging information without
necessarily having to know anything about debugging information. In
particular, the use of metadata avoids duplicated debugging information from
the beginning, and the global dead code elimination pass automatically deletes
debugging information for a function if it decides to delete the function.

To do this, most of the debugging information (descriptors for types,
variables, functions, source files, etc) is inserted by the language front-end
in the form of LLVM metadata.

Debug information is designed to be agnostic about the target debugger and
debugging information representation (e.g. DWARF/Stabs/etc). It uses a generic
pass to decode the information that represents variables, types, functions,
namespaces, etc: this allows for arbitrary source-language semantics and
type-systems to be used, as long as there is a module written for the target
debugger to interpret the information.

To provide basic functionality, the LLVM debugger does have to make some
assumptions about the source-level language being debugged, though it keeps
these to a minimum. The only common features that the LLVM debugger assumes
exist are :ref:`source files <format_files>`, and :ref:`program objects
<format_global_variables>`.
These abstract objects are used by a debugger to
form stack traces, show information about local variables, etc.

This section of the documentation first describes the representation aspects
common to any source-language. :ref:`ccxx_frontend` describes the data layout
conventions used by the C and C++ front-ends.

Debug information descriptors
-----------------------------

In consideration of the complexity and volume of debug information, LLVM
provides a specification for well formed debug descriptors.

Consumers of LLVM debug information expect the descriptors for program objects
to start in a canonical format, but the descriptors can include additional
information appended at the end that is source-language specific. All LLVM
debugging information is versioned, allowing backwards compatibility in the
case that the core structures need to change in some way. Also, all debugging
information objects start with a tag to indicate what type of object it is.
The source-language is allowed to define its own objects, by using unreserved
tag numbers. We recommend using tags in the range 0x1000 through 0x2000
(there is a defined ``enum DW_TAG_user_base = 0x1000``.)

The fields of debug descriptors used internally by LLVM are restricted to only
the simple data types ``i32``, ``i1``, ``float``, ``double``, ``mdstring`` and
``mdnode``.

.. code-block:: llvm

  !1 = metadata !{
    i32,   ;; A tag
    ...
  }

.. _LLVMDebugVersion:

The first field of a descriptor is always an
``i32`` containing a tag value identifying the content of the descriptor.
The remaining fields are specific to the descriptor. The values of tags are
loosely bound to the tag values of DWARF information entries. However, that
does not restrict the use of the information supplied to DWARF targets.
To
facilitate versioning of debug information, the tag is augmented with the
current debug version (``LLVMDebugVersion = 8 << 16`` or 0x80000 or
524288.)

The details of the various descriptors follow.

Compile unit descriptors
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !0 = metadata !{
    i32,       ;; Tag = 17 + LLVMDebugVersion (DW_TAG_compile_unit)
    i32,       ;; Unused field.
    i32,       ;; DWARF language identifier (ex. DW_LANG_C89)
    metadata,  ;; Source file name
    metadata,  ;; Source file directory (includes trailing slash)
    metadata,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
    i1,        ;; True if this is a main compile unit.
    i1,        ;; True if this is optimized.
    metadata,  ;; Flags
    i32,       ;; Runtime version
    metadata,  ;; List of enum types
    metadata,  ;; List of retained types
    metadata,  ;; List of subprograms
    metadata   ;; List of global variables
  }

These descriptors contain a source language ID for the file (we use the DWARF
3.0 ID numbers, such as ``DW_LANG_C89``, ``DW_LANG_C_plus_plus``,
``DW_LANG_Cobol74``, etc), three strings describing the filename, working
directory of the compiler, and an identifier string for the compiler that
produced it.

Compile unit descriptors provide the root context for objects declared in a
specific compilation unit. File descriptors are defined using this context.
These descriptors are collected by a named metadata ``!llvm.dbg.cu``. A compile
unit descriptor keeps track of subprograms, global variables and type
information.

.. _format_files:

File descriptors
^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !0 = metadata !{
    i32,       ;; Tag = 41 + LLVMDebugVersion (DW_TAG_file_type)
    metadata,  ;; Source file name
    metadata,  ;; Source file directory (includes trailing slash)
    metadata   ;; Unused
  }

These descriptors contain information for a file.
Global variables and
top-level functions would be defined using this context. File descriptors also
provide context for source line correspondence.

Each input file is encoded as a separate file descriptor in LLVM debugging
information output.

.. _format_global_variables:

Global variable descriptors
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !1 = metadata !{
    i32,      ;; Tag = 52 + LLVMDebugVersion (DW_TAG_variable)
    i32,      ;; Unused field.
    metadata, ;; Reference to context descriptor
    metadata, ;; Name
    metadata, ;; Display name (fully qualified C++ name)
    metadata, ;; MIPS linkage name (for C++)
    metadata, ;; Reference to file where defined
    i32,      ;; Line number where defined
    metadata, ;; Reference to type descriptor
    i1,       ;; True if the global is local to compile unit (static)
    i1,       ;; True if the global is defined in the compile unit (not extern)
    {}*       ;; Reference to the global variable
  }

These descriptors provide debug information about global variables. They
provide details such as name, type and where the variable is defined. All
global variables are collected inside the named metadata ``!llvm.dbg.cu``.

.. _format_subprograms:

Subprogram descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32,      ;; Tag = 46 + LLVMDebugVersion (DW_TAG_subprogram)
    i32,      ;; Unused field.
    metadata, ;; Reference to context descriptor
    metadata, ;; Name
    metadata, ;; Display name (fully qualified C++ name)
    metadata, ;; MIPS linkage name (for C++)
    metadata, ;; Reference to file where defined
    i32,      ;; Line number where defined
    metadata, ;; Reference to type descriptor
    i1,       ;; True if the global is local to compile unit (static)
    i1,       ;; True if the global is defined in the compile unit (not extern)
    i32,      ;; Line number where the scope of the subprogram begins
    i32,      ;; Virtuality, e.g.
              ;; dwarf::DW_VIRTUALITY__virtual
    i32,      ;; Index into a virtual function
    metadata, ;; indicates which base type contains the vtable pointer for the
              ;; derived class
    i32,      ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
    i1,       ;; isOptimized
    Function *, ;; Pointer to LLVM function
    metadata, ;; Lists function template parameters
    metadata, ;; Function declaration descriptor
    metadata  ;; List of function variables
  }

These descriptors provide debug information about functions, methods and
subprograms. They provide details such as name, return type and the source
location where the subprogram is defined.

Block descriptors
^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !3 = metadata !{
    i32,     ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
    metadata,;; Reference to context descriptor
    i32,     ;; Line number
    i32,     ;; Column number
    metadata,;; Reference to source file
    i32      ;; Unique ID to identify blocks from a template function
  }

This descriptor provides debug information about nested blocks within a
subprogram. The line number and column number are used to distinguish two
lexical blocks at the same depth.

.. code-block:: llvm

  !3 = metadata !{
    i32,     ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
    metadata,;; Reference to the scope we're annotating with a file change
    metadata ;; Reference to the file the scope is enclosed in.
  }

This descriptor provides a wrapper around a lexical scope to handle file
changes in the middle of a lexical block.

.. _format_basic_type:

Basic type descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !4 = metadata !{
    i32,      ;; Tag = 36 + LLVMDebugVersion (DW_TAG_base_type)
    metadata, ;; Reference to context
    metadata, ;; Name (may be "" for anonymous types)
    metadata, ;; Reference to file where defined (may be NULL)
    i32,      ;; Line number where defined (may be 0)
    i64,      ;; Size in bits
    i64,      ;; Alignment in bits
    i64,      ;; Offset in bits
    i32,      ;; Flags
    i32       ;; DWARF type encoding
  }

These descriptors define primitive types used in the code. Examples include
``int``, ``bool`` and ``float``. The context provides the scope of the type,
which is usually the top level. Since basic types are not usually user defined,
the context and line number can be left as NULL and 0. The size, alignment and
offset are expressed in bits and can be 64 bit values. The alignment is used
to round the offset when embedded in a :ref:`composite type
<format_composite_type>` (for example, to keep doubles on 64 bit boundaries).
The offset is the bit offset if embedded in a :ref:`composite type
<format_composite_type>`.

The type encoding provides the details of the type. The values are typically
one of the following:

.. code-block:: llvm

  DW_ATE_address       = 1
  DW_ATE_boolean       = 2
  DW_ATE_float         = 4
  DW_ATE_signed        = 5
  DW_ATE_signed_char   = 6
  DW_ATE_unsigned      = 7
  DW_ATE_unsigned_char = 8

.. _format_derived_type:

Derived type descriptors
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !5 = metadata !{
    i32,      ;; Tag (see below)
    metadata, ;; Reference to context
    metadata, ;; Name (may be "" for anonymous types)
    metadata, ;; Reference to file where defined (may be NULL)
    i32,      ;; Line number where defined (may be 0)
    i64,      ;; Size in bits
    i64,      ;; Alignment in bits
    i64,      ;; Offset in bits
    i32,      ;; Flags to encode attributes, e.g.
              ;; private
    metadata, ;; Reference to type derived from
    metadata, ;; (optional) Name of the Objective C property associated with
              ;; an Objective-C ivar
    metadata, ;; (optional) Name of the Objective C property getter selector.
    metadata, ;; (optional) Name of the Objective C property setter selector.
    i32       ;; (optional) Objective C property attributes.
  }

These descriptors are used to define types derived from other types. The value
of the tag varies depending on the meaning. The following are possible tag
values:

.. code-block:: llvm

  DW_TAG_formal_parameter = 5
  DW_TAG_member           = 13
  DW_TAG_pointer_type     = 15
  DW_TAG_reference_type   = 16
  DW_TAG_typedef          = 22
  DW_TAG_const_type       = 38
  DW_TAG_volatile_type    = 53
  DW_TAG_restrict_type    = 55

``DW_TAG_member`` is used to define a member of a :ref:`composite type
<format_composite_type>` or :ref:`subprogram <format_subprograms>`. The type
of the member is the :ref:`derived type <format_derived_type>`.
``DW_TAG_formal_parameter`` is used to define a member which is a formal
argument of a subprogram.

``DW_TAG_typedef`` is used to provide a name for the derived type.

``DW_TAG_pointer_type``, ``DW_TAG_reference_type``, ``DW_TAG_const_type``,
``DW_TAG_volatile_type`` and ``DW_TAG_restrict_type`` are used to qualify the
:ref:`derived type <format_derived_type>`.

:ref:`Derived type <format_derived_type>` location can be determined from the
context and line number. The size, alignment and offset are expressed in bits
and can be 64 bit values. The alignment is used to round the offset when
embedded in a :ref:`composite type <format_composite_type>` (for example, to
keep doubles on 64 bit boundaries). The offset is the bit offset if embedded
in a :ref:`composite type <format_composite_type>`.

Note that the ``void *`` type is expressed as a type derived from NULL.

.. _format_composite_type:

Composite type descriptors
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !6 = metadata !{
    i32,      ;; Tag (see below)
    metadata, ;; Reference to context
    metadata, ;; Name (may be "" for anonymous types)
    metadata, ;; Reference to file where defined (may be NULL)
    i32,      ;; Line number where defined (may be 0)
    i64,      ;; Size in bits
    i64,      ;; Alignment in bits
    i64,      ;; Offset in bits
    i32,      ;; Flags
    metadata, ;; Reference to type derived from
    metadata, ;; Reference to array of member descriptors
    i32       ;; Runtime languages
  }

These descriptors are used to define types that are composed of 0 or more
elements. The value of the tag varies depending on the meaning. The following
are possible tag values:

.. code-block:: llvm

  DW_TAG_array_type       = 1
  DW_TAG_enumeration_type = 4
  DW_TAG_structure_type   = 19
  DW_TAG_union_type       = 23
  DW_TAG_vector_type      = 259
  DW_TAG_subroutine_type  = 21
  DW_TAG_inheritance      = 28

The vector flag indicates that an array type is a native packed vector.

The members of array types (tag = ``DW_TAG_array_type``) or vector types (tag =
``DW_TAG_vector_type``) are :ref:`subrange descriptors <format_subrange>`, each
representing the range of subscripts at that level of indexing.

The members of enumeration types (tag = ``DW_TAG_enumeration_type``) are
:ref:`enumerator descriptors <format_enumerator>`, each representing the
definition of an enumeration value for the set. All enumeration type
descriptors are collected inside the named metadata ``!llvm.dbg.cu``.

The members of structure (tag = ``DW_TAG_structure_type``) or union (tag =
``DW_TAG_union_type``) types are any one of the :ref:`basic
<format_basic_type>`, :ref:`derived <format_derived_type>` or :ref:`composite
<format_composite_type>` type descriptors, each representing a field member of
the structure or union.
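
The way alignment rounds a member's bit offset inside a structure can be
sketched in a few lines of C++. This is only an illustration under stated
assumptions: ``roundUpToAlign`` is a hypothetical helper rather than an LLVM
API, and the example layout assumes an 8 bit ``char`` followed by a ``double``
that is 64 bits wide and 64 bit aligned.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical helper (not part of LLVM): round OffsetInBits up to the next
  // multiple of AlignInBits. AlignInBits must be non-zero.
  constexpr std::uint64_t roundUpToAlign(std::uint64_t OffsetInBits,
                                         std::uint64_t AlignInBits) {
    return (OffsetInBits + AlignInBits - 1) / AlignInBits * AlignInBits;
  }

  // struct { char C; double D; } under the assumptions above:
  // C occupies bits [0, 8), so D would start at bit 8 without padding.
  constexpr std::uint64_t OffsetOfC = roundUpToAlign(0, 8);
  constexpr std::uint64_t OffsetOfD = roundUpToAlign(OffsetOfC + 8, 64);

  static_assert(OffsetOfC == 0, "first member starts at bit 0");
  static_assert(OffsetOfD == 64, "D is pushed to the next 64 bit boundary");

Under these assumptions, the member descriptor for ``D`` would carry a size of
64, an alignment of 64 and an offset of 64, all expressed in bits.
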

For C++ classes (tag = ``DW_TAG_structure_type``), member descriptors provide
information about base classes, static members and member functions. If a
member is a :ref:`derived type descriptor <format_derived_type>` and has a tag
of ``DW_TAG_inheritance``, then the type represents a base class. If the member
is a :ref:`global variable descriptor <format_global_variables>` then it
represents a static member. And, if the member is a :ref:`subprogram
descriptor <format_subprograms>` then it represents a member function. For
static members and member functions, ``getName()`` returns the member's linkage
name or the C++ mangled name. ``getDisplayName()`` returns the simplified
version of the name.

The first member of subroutine (tag = ``DW_TAG_subroutine_type``) type elements
is the return type for the subroutine. The remaining elements are the formal
arguments to the subroutine.

:ref:`Composite type <format_composite_type>` location can be determined from
the context and line number. The size, alignment and offset are expressed in
bits and can be 64 bit values. The alignment is used to round the offset when
embedded in a :ref:`composite type <format_composite_type>` (as an example, to
keep doubles on 64 bit boundaries). The offset is the bit offset if
embedded in a :ref:`composite type <format_composite_type>`.

.. _format_subrange:

Subrange descriptors
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !42 = metadata !{
    i32,    ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type)
    i64,    ;; Low value
    i64     ;; High value
  }

These descriptors are used to define ranges of array subscripts for an array
:ref:`composite type <format_composite_type>`. The low value defines the lower
bound, typically zero for C/C++. The high value is the upper bound. Values
are 64 bit. ``High - Low + 1`` is the size of the array.
If ``Low > High``
the array bounds are not included in generated debugging information.

.. _format_enumerator:

Enumerator descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !6 = metadata !{
    i32,      ;; Tag = 40 + LLVMDebugVersion (DW_TAG_enumerator)
    metadata, ;; Name
    i64       ;; Value
  }

These descriptors are used to define members of an enumeration :ref:`composite
type <format_composite_type>`. Each one associates a name with a value.

Local variables
^^^^^^^^^^^^^^^

.. code-block:: llvm

  !7 = metadata !{
    i32,      ;; Tag (see below)
    metadata, ;; Context
    metadata, ;; Name
    metadata, ;; Reference to file where defined
    i32,      ;; 24 bit - Line number where defined
              ;; 8 bit - Argument number. 1 indicates 1st argument.
    metadata, ;; Type descriptor
    i32,      ;; flags
    metadata  ;; (optional) Reference to inline location
  }

These descriptors are used to define variables local to a subprogram. The
value of the tag depends on the usage of the variable:

.. code-block:: llvm

  DW_TAG_auto_variable   = 256
  DW_TAG_arg_variable    = 257
  DW_TAG_return_variable = 258

An auto variable is any variable declared in the body of the function. An
argument variable is any variable that appears as a formal argument to the
function. A return variable is used to track the result of a function and has
no source correspondent.

The context is either the subprogram or block where the variable is defined.
Name is the source variable name. Context and line indicate where the variable
was defined. Type descriptor defines the declared type of the variable.

.. _format_common_intrinsics:

Debugger intrinsic functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
provide debug information at various points in generated code.

``llvm.dbg.declare``
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.declare(metadata, metadata)

This intrinsic provides information about a local element (e.g., variable).
The first argument is metadata holding the alloca for the variable. The second
argument is metadata containing a description of the variable.

``llvm.dbg.value``
^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.value(metadata, i64, metadata)

This intrinsic provides information when a user source variable is set to a new
value. The first argument is the new value (wrapped as metadata). The second
argument is the offset in the user source variable where the new value is
written. The third argument is metadata containing a description of the user
source variable.

Object lifetimes and scoping
============================

In many languages, the local variables in functions can have their lifetimes or
scopes limited to a subset of a function. In the C family of languages, for
example, variables are only live (readable and writable) within the source
block that they are defined in. In functional languages, values are only
readable after they have been defined. Though this is a very obvious concept,
it is non-trivial to model in LLVM, because it has no notion of scoping in this
sense, and does not want to be tied to a language's scoping rules.

In order to handle this, the LLVM debug format uses the metadata attached to
LLVM instructions to encode line number and scoping information. Consider the
following C fragment, for example:

.. code-block:: c

  1.  void foo() {
  2.    int X = 21;
  3.    int Y = 22;
  4.    {
  5.      int Z = 23;
  6.      Z = X + Y;
  7.    }
  8.    X = Y;
  9.  }

Compiled to LLVM, this function would be represented like this:

.. code-block:: llvm

  define void @foo() nounwind ssp {
  entry:
    %X = alloca i32, align 4                        ; <i32*> [#uses=4]
    %Y = alloca i32, align 4                        ; <i32*> [#uses=4]
    %Z = alloca i32, align 4                        ; <i32*> [#uses=3]
    %0 = bitcast i32* %X to {}*                     ; <{}*> [#uses=1]
    call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
    store i32 21, i32* %X, !dbg !8
    %1 = bitcast i32* %Y to {}*                     ; <{}*> [#uses=1]
    call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10
    store i32 22, i32* %Y, !dbg !11
    %2 = bitcast i32* %Z to {}*                     ; <{}*> [#uses=1]
    call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
    store i32 23, i32* %Z, !dbg !15
    %tmp = load i32* %X, !dbg !16                   ; <i32> [#uses=1]
    %tmp1 = load i32* %Y, !dbg !16                  ; <i32> [#uses=1]
    %add = add nsw i32 %tmp, %tmp1, !dbg !16        ; <i32> [#uses=1]
    store i32 %add, i32* %Z, !dbg !16
    %tmp2 = load i32* %Y, !dbg !17                  ; <i32> [#uses=1]
    store i32 %tmp2, i32* %X, !dbg !17
    ret void, !dbg !18
  }

  declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone

  !0 = metadata !{i32 459008, metadata !1, metadata !"X",
                  metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ]
  !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
  !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo",
                  metadata !"foo", metadata !3, i32 1, metadata !4,
                  i1 false, i1 true}; [DW_TAG_subprogram ]
  !3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
                  metadata !"/private/tmp", metadata !"clang 1.1", i1 true,
                  i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ]
  !4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0,
                  i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ]
  !5 = metadata !{null}
  !6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0,
                  i64 32, i64 32, i64 0, i32 0, i32 5};
                  ; [DW_TAG_base_type ]
  !7 = metadata !{i32 2, i32 7, metadata !1, null}
  !8 = metadata !{i32 2, i32 3, metadata !1, null}
  !9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3,
                  metadata !6}; [ DW_TAG_auto_variable ]
  !10 = metadata !{i32 3, i32 7, metadata !1, null}
  !11 = metadata !{i32 3, i32 3, metadata !1, null}
  !12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5,
                   metadata !6}; [ DW_TAG_auto_variable ]
  !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
  !14 = metadata !{i32 5, i32 9, metadata !13, null}
  !15 = metadata !{i32 5, i32 5, metadata !13, null}
  !16 = metadata !{i32 6, i32 5, metadata !13, null}
  !17 = metadata !{i32 8, i32 3, metadata !1, null}
  !18 = metadata !{i32 9, i32 1, metadata !2, null}

This example illustrates a few important details about LLVM debugging
information. In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
location information, which are attached to an instruction, are applied
together to allow a debugger to analyze the relationship between statements,
variable definitions, and the code used to implement the function.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7

The first ``llvm.dbg.declare`` intrinsic encodes debugging information for the
variable ``X``. The metadata ``!dbg !7`` attached to the intrinsic provides
scope information for the variable ``X``.

.. code-block:: llvm

  !7 = metadata !{i32 2, i32 7, metadata !1, null}
  !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
  !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo",
                  metadata !"foo", metadata !"foo", metadata !3, i32 1,
                  metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]

Here ``!7`` is metadata providing location information. It has four fields:
line number, column number, scope, and original scope.
The original scope
represents inline location if this instruction is inlined inside a caller, and
is null otherwise. In this example, scope is encoded by ``!1``. ``!1``
represents a lexical block inside the scope ``!2``, where ``!2`` is a
:ref:`subprogram descriptor <format_subprograms>`. This way the location
information attached to the intrinsic indicates that the variable ``X`` is
declared at line number 2 at function-level scope in function ``foo``.

Now let's take another example.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14

The third ``llvm.dbg.declare`` intrinsic encodes debugging information for
the variable ``Z``. The metadata ``!dbg !14`` attached to the intrinsic provides
scope information for the variable ``Z``.

.. code-block:: llvm

  !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
  !14 = metadata !{i32 5, i32 9, metadata !13, null}

Here ``!14`` indicates that ``Z`` is declared at line number 5 and
column number 9 inside of lexical scope ``!13``. The lexical scope itself
resides inside of lexical scope ``!1`` described above.

The scope information attached to each instruction provides a straightforward
way to find instructions covered by a scope.

.. _ccxx_frontend:

C/C++ front-end specific debug information
==========================================

The C and C++ front-ends represent information about the program in a format
that is effectively identical to `DWARF 3.0
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_ in terms of information
content. This allows code generators to trivially support native debuggers by
generating standard DWARF information, and contains enough information for
non-DWARF targets to translate it as needed.

This section describes the forms used to represent C and C++ programs.
Other
languages could pattern themselves after this (which itself is tuned to
representing programs in the same way that DWARF 3 does), or they could choose
to provide completely different forms if they don't fit into the DWARF model.
As support for debugging information gets added to the various LLVM
source-language front-ends, the information used should be documented here.

The following sections provide examples of various C/C++ constructs and the
debug information that would best describe those constructs.

C/C++ source file information
-----------------------------

Given the source files ``MySource.cpp`` and ``MyHeader.h`` located in the
directory ``/Users/mine/sources``, and the following code:

.. code-block:: c

  #include "MyHeader.h"

  int main(int argc, char *argv[]) {
    return 0;
  }

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ...
  ;;
  ;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
  ;;
  !2 = metadata !{
    i32 524305,    ;; Tag
    i32 0,         ;; Unused
    i32 4,         ;; Language Id
    metadata !"MySource.cpp",
    metadata !"/Users/mine/sources",
    metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)",
    i1 true,       ;; Main Compile Unit
    i1 false,      ;; Optimized compile unit
    metadata !"",  ;; Compiler flags
    i32 0}         ;; Runtime version

  ;;
  ;; Define the file for the file "/Users/mine/sources/MySource.cpp".
  ;;
  !1 = metadata !{
    i32 524329,    ;; Tag
    metadata !"MySource.cpp",
    metadata !"/Users/mine/sources",
    metadata !2    ;; Compile unit
  }

  ;;
  ;; Define the file for the file "/Users/mine/sources/MyHeader.h"
  ;;
  !3 = metadata !{
    i32 524329,    ;; Tag
    metadata !"MyHeader.h",
    metadata !"/Users/mine/sources",
    metadata !2    ;; Compile unit
  }

  ...
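
The ``Tag`` values in the descriptors above combine two numbers. As a minimal
sketch, assuming the tag field stores the LLVM debug version in its upper 16
bits and the DWARF tag in its lower 16 bits, they can be decoded as shown
below. ``decodeTag``, ``DecodedTag`` and ``kVersionMask`` are hypothetical
names used only for illustration; they are not part of the LLVM API.

.. code-block:: c++

  #include <cstdint>

  // Assumption: upper 16 bits = LLVM debug version, lower 16 bits = DWARF tag.
  constexpr uint32_t kVersionMask = 0xffff0000u;  // hypothetical name

  struct DecodedTag {
    uint32_t Version;   // debug info version, e.g. 8 for 524288 (8 << 16)
    uint32_t DwarfTag;  // DWARF tag, e.g. 0x11 for DW_TAG_compile_unit
  };

  // Split the first field of a descriptor into its two components.
  inline DecodedTag decodeTag(uint32_t Field) {
    return DecodedTag{(Field & kVersionMask) >> 16, Field & ~kVersionMask};
  }

Under this assumption, ``524305`` decodes to version 8 and DWARF tag ``0x11``
(``DW_TAG_compile_unit``), and ``524329`` decodes to DWARF tag ``0x29``
(``DW_TAG_file_type``).
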

``llvm::Instruction`` provides easy access to metadata attached to an
instruction. One can extract line number information encoded in LLVM IR using
``Instruction::getMetadata()`` and ``DILocation::getLineNumber()``.

.. code-block:: c++

  if (MDNode *N = I->getMetadata("dbg")) {  // Here I is an LLVM instruction
    DILocation Loc(N);                      // DILocation is in DebugInfo.h
    unsigned Line = Loc.getLineNumber();
    StringRef File = Loc.getFilename();
    StringRef Dir = Loc.getDirectory();
  }

C/C++ global variable information
---------------------------------

Given an integer global variable declared as follows:

.. code-block:: c

  int MyGlobal = 100;

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define the global itself.
  ;;
  @MyGlobal = global i32 100
  ...
  ;;
  ;; List of debug info of globals
  ;;
  !llvm.dbg.cu = !{!0}

  ;; Define the compile unit.
  !0 = metadata !{
    i32 786449,                       ;; Tag
    i32 0,                            ;; Context
    i32 4,                            ;; Language
    metadata !"foo.cpp",              ;; File
    metadata !"/Volumes/Data/tmp",    ;; Directory
    metadata !"clang version 3.1 ",   ;; Producer
    i1 true,                          ;; Deprecated field
    i1 false,                         ;; "isOptimized"?
    metadata !"",                     ;; Flags
    i32 0,                            ;; Runtime Version
    metadata !1,                      ;; Enum Types
    metadata !1,                      ;; Retained Types
    metadata !1,                      ;; Subprograms
    metadata !3                       ;; Global Variables
  } ; [ DW_TAG_compile_unit ]

  ;; The Array of Global Variables
  !3 = metadata !{
    metadata !4
  }

  !4 = metadata !{
    metadata !5
  }

  ;;
  ;; Define the global variable itself.
  ;;
  !5 = metadata !{
    i32 786484,                        ;; Tag
    i32 0,                             ;; Unused
    null,                              ;; Unused
    metadata !"MyGlobal",              ;; Name
    metadata !"MyGlobal",              ;; Display Name
    metadata !"",                      ;; Linkage Name
    metadata !6,                       ;; File
    i32 1,                             ;; Line
    metadata !7,                       ;; Type
    i32 0,                             ;; IsLocalToUnit
    i32 1,                             ;; IsDefinition
    i32* @MyGlobal                     ;; LLVM-IR Value
  } ; [ DW_TAG_variable ]

  ;;
  ;; Define the file
  ;;
  !6 = metadata !{
    i32 786473,                        ;; Tag
    metadata !"foo.cpp",               ;; File
    metadata !"/Volumes/Data/tmp",     ;; Directory
    null                               ;; Unused
  } ; [ DW_TAG_file_type ]

  ;;
  ;; Define the type
  ;;
  !7 = metadata !{
    i32 786468,                         ;; Tag
    null,                               ;; Unused
    metadata !"int",                    ;; Name
    null,                               ;; Unused
    i32 0,                              ;; Line
    i64 32,                             ;; Size in Bits
    i64 32,                             ;; Align in Bits
    i64 0,                              ;; Offset
    i32 0,                              ;; Flags
    i32 5                               ;; Encoding
  } ; [ DW_TAG_base_type ]

C/C++ function information
--------------------------

Given a function declared as follows:

.. code-block:: c

  int main(int argc, char *argv[]) {
    return 0;
  }

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define the descriptor for the subprogram. Note that the low 16 bits of
  ;; the tag are 46, which is the DWARF tag for subprograms
  ;; (46 = DW_TAG_subprogram.)
  ;;
  !6 = metadata !{
    i32 524334,        ;; Tag
    i32 0,             ;; Unused
    metadata !1,       ;; Context
    metadata !"main",  ;; Name
    metadata !"main",  ;; Display name
    metadata !"main",  ;; Linkage name
    metadata !1,       ;; File
    i32 1,             ;; Line number
    metadata !4,       ;; Type
    i1 false,          ;; Is local
    i1 true,           ;; Is definition
    i32 0,             ;; Virtuality attribute, e.g. pure virtual function
    i32 0,             ;; Index into virtual table for C++ methods
    i32 0,             ;; Type that holds virtual table.
    i32 0,             ;; Flags
    i1 false,          ;; True if this function is optimized
    i32 (i32, i8**)* @main, ;; Pointer to llvm::Function
    null               ;; Function template parameters
  }
  ;;
  ;; Define the subprogram itself.
  ;;
  define i32 @main(i32 %argc, i8** %argv) {
  ...
  }

C/C++ basic types
-----------------

The following are the basic type descriptors for C/C++ core types:

bool
^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"bool",  ;; Name
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 8,             ;; Size in Bits
    i64 8,             ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 2              ;; Encoding
  }

char
^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"char",  ;; Name
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 8,             ;; Size in Bits
    i64 8,             ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 6              ;; Encoding
  }

unsigned char
^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"unsigned char",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 8,             ;; Size in Bits
    i64 8,             ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 8              ;; Encoding
  }

short
^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"short int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 16,            ;; Size in Bits
    i64 16,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 5              ;; Encoding
  }

unsigned short
^^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"short unsigned int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 16,            ;; Size in Bits
    i64 16,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }

int
^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"int",   ;; Name
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 5              ;; Encoding
  }

unsigned int
^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"unsigned int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }

long long
^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"long long int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 64,            ;; Size in Bits
    i64 64,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 5              ;; Encoding
  }

unsigned long long
^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"long long unsigned int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 64,            ;; Size in Bits
    i64 64,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }

float
^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"float",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 4              ;; Encoding
  }

double
^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 524324,          ;; Tag
    metadata !1,         ;; Context
    metadata !"double",  ;; Name
    metadata !1,         ;; File
    i32 0,               ;; Line number
    i64 64,              ;; Size in Bits
    i64 64,              ;; Align in Bits
    i64 0,               ;; Offset in Bits
    i32 0,               ;; Flags
    i32 4                ;; Encoding
  }

C/C++ derived types
-------------------

Given the following as an example of C/C++ derived type:

.. code-block:: c

  typedef const int *IntPtr;

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define the typedef "IntPtr".
  ;;
  !2 = metadata !{
    i32 524310,          ;; Tag
    metadata !1,         ;; Context
    metadata !"IntPtr",  ;; Name
    metadata !3,         ;; File
    i32 0,               ;; Line number
    i64 0,               ;; Size in bits
    i64 0,               ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    metadata !4          ;; Derived From type
  }
  ;;
  ;; Define the pointer type.
  ;;
  !4 = metadata !{
    i32 524303,          ;; Tag
    metadata !1,         ;; Context
    metadata !"",        ;; Name
    metadata !1,         ;; File
    i32 0,               ;; Line number
    i64 64,              ;; Size in bits
    i64 64,              ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    metadata !5          ;; Derived From type
  }
  ;;
  ;; Define the const type.
  ;;
  !5 = metadata !{
    i32 524326,          ;; Tag
    metadata !1,         ;; Context
    metadata !"",        ;; Name
    metadata !1,         ;; File
    i32 0,               ;; Line number
    i64 32,              ;; Size in bits
    i64 32,              ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    metadata !6          ;; Derived From type
  }
  ;;
  ;; Define the int type.
  ;;
  !6 = metadata !{
    i32 524324,          ;; Tag
    metadata !1,         ;; Context
    metadata !"int",     ;; Name
    metadata !1,         ;; File
    i32 0,               ;; Line number
    i64 32,              ;; Size in bits
    i64 32,              ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    i32 5                ;; Encoding
  }

C/C++ struct/union types
------------------------

Given the following as an example of C/C++ struct type:

.. code-block:: c

  struct Color {
    unsigned Red;
    unsigned Green;
    unsigned Blue;
  };

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define basic type for unsigned int.
  ;;
  !5 = metadata !{
    i32 524324,        ;; Tag
    metadata !1,       ;; Context
    metadata !"unsigned int",
    metadata !1,       ;; File
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }
  ;;
  ;; Define composite type for struct Color.
  ;;
  !2 = metadata !{
    i32 524307,        ;; Tag
    metadata !1,       ;; Context
    metadata !"Color", ;; Name
    metadata !1,       ;; Compile unit
    i32 1,             ;; Line number
    i64 96,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 0,             ;; Offset in bits
    i32 0,             ;; Flags
    null,              ;; Derived From
    metadata !3,       ;; Elements
    i32 0              ;; Runtime Language
  }

  ;;
  ;; Define the Red field.
  ;;
  !4 = metadata !{
    i32 524301,        ;; Tag
    metadata !1,       ;; Context
    metadata !"Red",   ;; Name
    metadata !1,       ;; File
    i32 2,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 0,             ;; Offset in bits
    i32 0,             ;; Flags
    metadata !5        ;; Derived From type
  }

  ;;
  ;; Define the Green field.
  ;;
  !6 = metadata !{
    i32 524301,        ;; Tag
    metadata !1,       ;; Context
    metadata !"Green", ;; Name
    metadata !1,       ;; File
    i32 3,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 32,            ;; Offset in bits
    i32 0,             ;; Flags
    metadata !5        ;; Derived From type
  }

  ;;
  ;; Define the Blue field.
  ;;
  !7 = metadata !{
    i32 524301,        ;; Tag
    metadata !1,       ;; Context
    metadata !"Blue",  ;; Name
    metadata !1,       ;; File
    i32 4,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 64,            ;; Offset in bits
    i32 0,             ;; Flags
    metadata !5        ;; Derived From type
  }

  ;;
  ;; Define the array of fields used by the composite type Color.
  ;;
  !3 = metadata !{metadata !4, metadata !6, metadata !7}

C/C++ enumeration types
-----------------------

Given the following as an example of C/C++ enumeration type:

.. code-block:: c

  enum Trees {
    Spruce = 100,
    Oak = 200,
    Maple = 300
  };

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define composite type for enum Trees
  ;;
  !2 = metadata !{
    i32 524292,        ;; Tag
    metadata !1,       ;; Context
    metadata !"Trees", ;; Name
    metadata !1,       ;; File
    i32 1,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 0,             ;; Offset in bits
    i32 0,             ;; Flags
    null,              ;; Derived From type
    metadata !3,       ;; Elements
    i32 0              ;; Runtime language
  }

  ;;
  ;; Define the array of enumerators used by composite type Trees.
  ;;
  !3 = metadata !{metadata !4, metadata !5, metadata !6}

  ;;
  ;; Define Spruce enumerator.
  ;;
  !4 = metadata !{i32 524328, metadata !"Spruce", i64 100}

  ;;
  ;; Define Oak enumerator.
  ;;
  !5 = metadata !{i32 524328, metadata !"Oak", i64 200}

  ;;
  ;; Define Maple enumerator.
  ;;
  !6 = metadata !{i32 524328, metadata !"Maple", i64 300}

Debugging information format
============================

Debugging Information Extension for Objective C Properties
----------------------------------------------------------

Introduction
^^^^^^^^^^^^

Objective C provides a simpler way to declare and define accessor methods using
declared properties. The language provides features to declare a property and
to let the compiler synthesize accessor methods.

The debugger lets developers inspect Objective C interfaces and their instance
variables and class variables. However, the debugger does not know anything
about the properties defined in Objective C interfaces. The debugger consumes
information generated by the compiler in DWARF format. The format does not
support encoding of Objective C properties. This proposal describes DWARF
extensions to encode Objective C properties, which the debugger can use to let
developers inspect Objective C properties.

Proposal
^^^^^^^^

Objective C properties exist separately from class members. A property can be
defined only by "setter" and "getter" selectors, and be calculated anew on each
access. Or a property can just be a direct access to some declared ivar.
Finally, it can have an ivar "automatically synthesized" for it by the compiler,
in which case the property can be referred to in user code directly using the
standard C dereference syntax as well as through the property "dot" syntax, but
there is no entry in the ``@interface`` declaration corresponding to this ivar.

To facilitate debugging of these properties, we will add a new DWARF TAG into
the ``DW_TAG_structure_type`` definition for the class to hold the description
of a given property, and a set of DWARF attributes that provide said
description. The property tag will also contain the name and declared type of
the property.

If there is a related ivar, there will also be a DWARF property attribute placed
in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG
for that property. And in the case where the compiler synthesizes the ivar
directly, the compiler is expected to generate a ``DW_TAG_member`` for that
ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used
to access this ivar directly in code, and with the property attribute pointing
back to the property it is backing.

The following examples will serve as illustration for our discussion:

.. code-block:: objc

  @interface I1 {
    int n2;
  }

  @property int p1;
  @property int p2;
  @end

  @implementation I1
  @synthesize p1;
  @synthesize p2 = n2;
  @end

This produces the following DWARF (this is a "pseudo dwarfdump" output):

.. code-block:: none

  0x00000100:  TAG_structure_type [7] *
                 AT_APPLE_runtime_class( 0x10 )
                 AT_name( "I1" )
                 AT_decl_file( "Objc_Property.m" )
                 AT_decl_line( 3 )

  0x00000110:    TAG_APPLE_property
                   AT_name ( "p1" )
                   AT_type ( {0x00000150} ( int ) )

  0x00000120:    TAG_APPLE_property
                   AT_name ( "p2" )
                   AT_type ( {0x00000150} ( int ) )

  0x00000130:    TAG_member [8]
                   AT_name( "_p1" )
                   AT_APPLE_property ( {0x00000110} "p1" )
                   AT_type( {0x00000150} ( int ) )
                   AT_artificial ( 0x1 )

  0x00000140:    TAG_member [8]
                   AT_name( "n2" )
                   AT_APPLE_property ( {0x00000120} "p2" )
                   AT_type( {0x00000150} ( int ) )

  0x00000150:  AT_type( ( int ) )

Note, the current convention is that the name of the ivar for an
auto-synthesized property is the name of the property from which it derives
with an underscore prepended, as is shown in the example. But we actually
don't need to know this convention, since we are given the name of the ivar
directly.

Also, it is common practice in ObjC to have different property declarations in
the ``@interface`` and ``@implementation``, e.g. to provide a read-only property
in the interface, and a read-write property in the implementation. In that
case, the compiler should emit whichever property declaration will be in force
in the current translation unit.

Developers can decorate a property with attributes which are encoded using
``DW_AT_APPLE_property_attribute``.

.. code-block:: objc

  @property (readonly, nonatomic) int pr;

.. code-block:: none

  TAG_APPLE_property [8]
    AT_name( "pr" )
    AT_type ( {0x00000147} (int) )
    AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)

The setter and getter method names are attached to the property using
``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes.

.. code-block:: objc

  @interface I1
  @property (setter=myOwnP3Setter:) int p3;
  -(void)myOwnP3Setter:(int)a;
  @end

  @implementation I1
  @synthesize p3;
  -(void)myOwnP3Setter:(int)a{ }
  @end

The DWARF for this would be:

.. code-block:: none

  0x000003bd:  TAG_structure_type [7] *
                 AT_APPLE_runtime_class( 0x10 )
                 AT_name( "I1" )
                 AT_decl_file( "Objc_Property.m" )
                 AT_decl_line( 3 )

  0x000003cd:    TAG_APPLE_property
                   AT_name ( "p3" )
                   AT_APPLE_property_setter ( "myOwnP3Setter:" )
                   AT_type( {0x00000147} ( int ) )

  0x000003f3:    TAG_member [8]
                   AT_name( "_p3" )
                   AT_type ( {0x00000147} ( int ) )
                   AT_APPLE_property ( {0x000003cd} )
                   AT_artificial ( 0x1 )

New DWARF Tags
^^^^^^^^^^^^^^

+-----------------------+--------+
| TAG                   | Value  |
+=======================+========+
| DW_TAG_APPLE_property | 0x4200 |
+-----------------------+--------+

New DWARF Attributes
^^^^^^^^^^^^^^^^^^^^

+--------------------------------+--------+-----------+
| Attribute                      | Value  | Classes   |
+================================+========+===========+
| DW_AT_APPLE_property           | 0x3fed | Reference |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_getter    | 0x3fe9 | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_setter    | 0x3fea | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_attribute | 0x3feb | Constant  |
+--------------------------------+--------+-----------+

New DWARF Constants
^^^^^^^^^^^^^^^^^^^

+-----------------------------+-------+
| Name                        | Value |
+=============================+=======+
| DW_APPLE_PROPERTY_readonly  | 0x1   |
+-----------------------------+-------+
| DW_APPLE_PROPERTY_readwrite | 0x2   |
+-----------------------------+-------+
| DW_APPLE_PROPERTY_assign    | 0x4   |
+-----------------------------+-------+
| DW_APPLE_PROPERTY_retain    | 0x8   |
+-----------------------------+-------+
| DW_APPLE_PROPERTY_copy      | 0x10  |
+-----------------------------+-------+
| DW_APPLE_PROPERTY_nonatomic | 0x20  |
+-----------------------------+-------+

Name Accelerator Tables
-----------------------

Introduction
^^^^^^^^^^^^

The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a
debugger needs. The "``pub``" in the section name indicates that the entries
in the table are publicly visible names only. This means no static or hidden
functions show up in the "``.debug_pubnames``". No static variables or private
class variables are in the "``.debug_pubtypes``". Many compilers add different
things to these tables, so we can't rely upon the contents between gcc, icc, or
clang.

The typical query given by users tends not to match up with the contents of
these tables. For example, the DWARF spec states that "In the case of the name
of a function member or static data member of a C++ structure, class or union,
the name presented in the "``.debug_pubnames``" section is not the simple name
given by the ``DW_AT_name`` attribute of the referenced debugging information
entry, but rather the fully qualified name of the data or function member."
So the only names in these tables for complex C++ entries are fully
qualified names.
Debugger users tend not to enter their search strings as
"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``", or
"``a::b::c``". So the name entered in the name table must be demangled in
order to chop it up appropriately and additional names must be manually entered
into the table to make it effective as a name lookup table for debuggers to
use.

All debuggers currently ignore the "``.debug_pubnames``" table as a result of
its inconsistent and useless public-only name content, making it a waste of
space in the object file. These tables, when they are written to disk, are not
sorted in any way, leaving every debugger to do its own parsing and sorting.
These tables also include an inlined copy of the string values in the table
itself, making the tables much larger than they need to be on disk, especially
for large C++ programs.

Can't we just fix the sections by adding all of the names we need to this
table? No, because that is not what the tables are defined to contain and we
won't know the difference between the old bad tables and the new good tables.
At best we could make our own renamed sections that contain all of the data we
need.

These tables are also insufficient for what a debugger like LLDB needs. LLDB
uses clang for its expression parsing where LLDB acts as a PCH. LLDB is then
often asked to look for type "``foo``" or namespace "``bar``", or list items in
namespace "``baz``". Namespaces are not included in the pubnames or pubtypes
tables. Since clang asks a lot of questions when it is parsing an expression,
we need to be very fast when looking up names, as it happens a lot. Having new
accelerator tables that are optimized for very quick lookups will benefit this
type of debugging experience greatly.
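
As an illustration of the name chopping mentioned earlier in this
introduction, the following minimal sketch produces the lookup strings a user
might type for a qualified name. It assumes the name has already been
demangled and stripped of its parameter list and qualifiers; ``lookupNames``
is a hypothetical helper and not an LLVM or LLDB function.

.. code-block:: c++

  #include <string>
  #include <vector>

  // Hypothetical helper: for "a::b::c" return {"c", "b::c", "a::b::c"}.
  // Assumes the input contains only "::"-separated identifiers.
  std::vector<std::string> lookupNames(const std::string &Qualified) {
    std::vector<std::string> Names;
    std::string::size_type End = Qualified.size();
    while (End >= 2) {
      // Find the last "::" that starts before position End.
      std::string::size_type Sep = Qualified.rfind("::", End - 2);
      if (Sep == std::string::npos)
        break;
      Names.push_back(Qualified.substr(Sep + 2));
      End = Sep;
    }
    // The full name is always a valid lookup string.
    Names.push_back(Qualified);
    return Names;
  }

A producer following this approach would add one table entry per returned
string, all referring to the same debugging information entry.
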

We would like to generate name lookup tables that can be mapped into memory
from disk, and used as is, with little or no up-front parsing. We would also
like to be able to control the exact content of these different tables so they
contain exactly what we need. The Name Accelerator Tables were designed to fix
these issues. In order to solve these issues we need to:

* Have a format that can be mapped into memory from disk and used as is
* Make lookups very fast
* Use an extensible table format so these tables can be made by many producers
* Contain all of the names needed for typical lookups out of the box
* Follow strict rules for the contents of tables

Table size is important and the accelerator table format should allow the reuse
of strings from common string tables so the strings for the names are not
duplicated. We also want to make sure the table is ready to be used as-is by
simply mapping the table into memory with minimal header parsing.

The name lookups need to be fast and optimized for the kinds of lookups that
debuggers tend to do. Optimally we would like to touch as few parts of the
mapped table as possible when doing a name lookup and be able to quickly find
the name entry we are looking for, or discover there are no matches. In the
case of debuggers we optimized for lookups that fail most of the time.

Each table that is defined should have strict rules on exactly what is in the
accelerator tables, and these rules should be documented so clients can rely on
the content.

Hash Tables
^^^^^^^^^^^

Standard Hash Tables
""""""""""""""""""""

Typical hash tables have a header, buckets, and each bucket points to the
bucket contents:

.. code-block:: none

  .------------.
  |  HEADER    |
  |------------|
  |  BUCKETS   |
  |------------|
  |  DATA      |
  `------------'

The BUCKETS are an array of offsets to DATA for each hash:

.. code-block:: none

  .------------.
  | 0x00001000 | BUCKETS[0]
  | 0x00002000 | BUCKETS[1]
  | 0x00002200 | BUCKETS[2]
  | 0x000034f0 | BUCKETS[3]
  |            | ...
  | 0xXXXXXXXX | BUCKETS[n_buckets]
  '------------'

So for ``bucket[3]`` in the example above, we have an offset into the table
0x000034f0 which points to a chain of entries for the bucket. Each entry in
the chain must contain a next pointer, full 32 bit hash value, the string
itself, and the data for the current string value.

.. code-block:: none

               .------------.
  0x000034f0:  | 0x00003500 | next pointer
               | 0x12345678 | 32 bit hash
               | "erase"    | string value
               | data[n]    | HashData for this bucket
               |------------|
  0x00003500:  | 0x00003550 | next pointer
               | 0x29273623 | 32 bit hash
               | "dump"     | string value
               | data[n]    | HashData for this bucket
               |------------|
  0x00003550:  | 0x00000000 | next pointer
               | 0x82638293 | 32 bit hash
               | "main"     | string value
               | data[n]    | HashData for this bucket
               `------------'

The problem with this layout for debuggers is that we need to optimize for the
negative lookup case where the symbol we're searching for is not present. So
if we were to lookup "``printf``" in the table above, we would make a 32 bit
hash for "``printf``", and it might match ``bucket[3]``. We would need to go to
the offset 0x000034f0 and start looking to see if our 32 bit hash matches. To
do so, we need to read the next pointer, then read the hash, compare it, and
skip to the next entry in the chain. Each time we are skipping many bytes in
memory and touching new cache pages just to do the compare on the full 32 bit
hash. All of these accesses then tell us that we didn't have a match.
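
The cost of a failed lookup in this standard layout can be sketched as
follows. This is an illustrative in-memory model only, not code from LLVM:
``ChainEntry``, ``LookupResult`` and ``chainedLookup`` are hypothetical names,
array indexes stand in for the byte offsets shown above, and ``-1`` stands in
for the terminating next pointer.

.. code-block:: c++

  #include <cstdint>
  #include <string>
  #include <vector>

  // Hypothetical model of one chain entry in the standard layout.
  struct ChainEntry {
    int32_t Next;      // index of the next entry in the chain, -1 terminates
    uint32_t Hash;     // full 32 bit hash
    std::string Name;  // string value stored inline with the entry
  };

  struct LookupResult {
    bool Found;
    unsigned EntriesTouched;  // how many separate entries had to be read
  };

  // Walk the chain of the bucket selected by Hash; every step reads a
  // different entry, which is what makes negative lookups expensive.
  LookupResult chainedLookup(const std::vector<int32_t> &Buckets,
                             const std::vector<ChainEntry> &Entries,
                             uint32_t Hash, const std::string &Name) {
    LookupResult R{false, 0};
    if (Buckets.empty())
      return R;
    for (int32_t I = Buckets[Hash % Buckets.size()]; I != -1;
         I = Entries[I].Next) {
      ++R.EntriesTouched;
      if (Entries[I].Hash == Hash && Entries[I].Name == Name) {
        R.Found = true;
        break;
      }
    }
    return R;
  }

With three entries chained in one bucket, a lookup for a name that is absent
has to read all three entries before it can report a miss.
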

Name Hash Tables
""""""""""""""""

To solve the issues mentioned above we have structured the hash tables a bit
differently: a header, buckets, an array of all unique 32 bit hash values,
followed by an array of hash value data offsets, one for each hash value, then
the data for all hash values:

.. code-block:: none

  .-------------.
  |  HEADER     |
  |-------------|
  |  BUCKETS    |
  |-------------|
  |  HASHES     |
  |-------------|
  |  OFFSETS    |
  |-------------|
  |  DATA       |
  `-------------'

Each entry in ``BUCKETS`` is an index into the ``HASHES`` array. By
making all of the full 32 bit hash values contiguous in memory, we allow
ourselves to efficiently check for a match while touching as little memory as
possible. Most often checking the 32 bit hash values is as far as the lookup
goes. If it does match, it usually is a match with no collisions. So for a
table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash
values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
``OFFSETS`` as:

.. code-block:: none

  .-------------------------.
  |  HEADER.magic           | uint32_t
  |  HEADER.version         | uint16_t
  |  HEADER.hash_function   | uint16_t
  |  HEADER.bucket_count    | uint32_t
  |  HEADER.hashes_count    | uint32_t
  |  HEADER.header_data_len | uint32_t
  |  HEADER_DATA            | HeaderData
  |-------------------------|
  |  BUCKETS                | uint32_t[bucket_count] // 32 bit hash indexes
  |-------------------------|
  |  HASHES                 | uint32_t[hashes_count] // 32 bit hash values
  |-------------------------|
  |  OFFSETS                | uint32_t[hashes_count] // 32 bit offsets to hash value data
  |-------------------------|
  |  ALL HASH DATA          |
  `-------------------------'

So taking the exact same data from the standard hash example above we end up
with:

.. code-block:: none

              .------------.
              | HEADER     |
              |------------|
              |          0 | BUCKETS[0]
              |          2 | BUCKETS[1]
              |          5 | BUCKETS[2]
              |          6 | BUCKETS[3]
              |            | ...
              |        ... | BUCKETS[n_buckets]
              |------------|
              | 0x........ | HASHES[0]
              | 0x........ | HASHES[1]
              | 0x........ | HASHES[2]
              | 0x........ | HASHES[3]
              | 0x........ | HASHES[4]
              | 0x........ | HASHES[5]
              | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
              | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
              | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
              | 0x........ | HASHES[9]
              | 0x........ | HASHES[10]
              | 0x........ | HASHES[11]
              | 0x........ | HASHES[12]
              | 0x........ | HASHES[13]
              | 0x........ | HASHES[n_hashes]
              |------------|
              | 0x........ | OFFSETS[0]
              | 0x........ | OFFSETS[1]
              | 0x........ | OFFSETS[2]
              | 0x........ | OFFSETS[3]
              | 0x........ | OFFSETS[4]
              | 0x........ | OFFSETS[5]
              | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
              | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
              | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
              | 0x........ | OFFSETS[9]
              | 0x........ | OFFSETS[10]
              | 0x........ | OFFSETS[11]
              | 0x........ | OFFSETS[12]
              | 0x........ | OFFSETS[13]
              | 0x........ | OFFSETS[n_hashes]
              |------------|
              |            |
              |            |
              |            |
              |            |
              |            |
              |------------|
  0x000034f0: | 0x00001203 | .debug_str ("erase")
              | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x........ | HashData[3]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              |------------|
  0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
              | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x00001203 | String offset into .debug_str ("dump")
              | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              |------------|
  0x00003550: | 0x00001203 | String offset into .debug_str ("main")
              | 0x00000009 | A 32 bit array count - number of HashData with name "main"
              | 0x........ | HashData[0]
              | 0x........ | HashData[1]
              | 0x........ | HashData[2]
              | 0x........ | HashData[3]
              | 0x........ | HashData[4]
              | 0x........ | HashData[5]
              | 0x........ | HashData[6]
              | 0x........ | HashData[7]
              | 0x........ | HashData[8]
              | 0x00000000 | String offset into .debug_str (terminate data for hash)
              `------------'

So we still have all of the same data; we just organize it more efficiently for
debugger lookup. If we repeat the same "``printf``" lookup from above, we
would hash "``printf``" and find that it maps to ``BUCKETS[3]`` by taking the
32 bit hash value modulo ``n_buckets``. ``BUCKETS[3]`` contains "6", which
is the index into the ``HASHES`` table. We would then compare consecutive
32 bit hash values in the ``HASHES`` array for as long as the hashes belong to
``BUCKETS[3]``. We do this by verifying that each subsequent hash value modulo
``n_buckets`` is still 3. In the case of a failed lookup we would access the
memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
before we know that we have no match. We don't end up marching through
multiple words of memory and we really keep the number of processor data cache
lines being accessed as small as possible.

The string hash that is used for these lookup tables is the Daniel J.
Bernstein hash which is also used in the ELF ``GNU_HASH`` sections.
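
The lookup described above can be sketched in C as follows. This is a minimal
illustration, not the implementation used by any particular debugger: the
``name_table`` struct and the ``djb_hash`` and ``lookup_hash`` helpers are
hypothetical names, and the sketch assumes the three arrays have already been
read into memory in host byte order. The hash is the classic Bernstein
recurrence (``h = h * 33 + c``, seeded with 5381).

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Bernstein (DJB) string hash: h = h * 33 + c, seeded with 5381. */
  static uint32_t djb_hash(const char *s)
  {
    uint32_t h = 5381u;
    for (; *s != '\0'; ++s)
      h = (h << 5) + h + (unsigned char)*s;
    return h;
  }

  /* Hypothetical in-memory view of the BUCKETS, HASHES and OFFSETS arrays. */
  struct name_table
  {
    uint32_t n_buckets;
    uint32_t n_hashes;
    const uint32_t *buckets; /* n_buckets indexes into hashes[] */
    const uint32_t *hashes;  /* n_hashes unique 32 bit hash values */
    const uint32_t *offsets; /* n_hashes offsets to the hash data */
  };

  /* Returns 1 and stores the hash data offset if the 32 bit hash is present,
     0 otherwise. */
  static int lookup_hash(const struct name_table *t, uint32_t hash,
                         uint32_t *offset_out)
  {
    if (t->n_buckets == 0)
      return 0;
    uint32_t bucket = hash % t->n_buckets;
    uint32_t i = t->buckets[bucket];
    if (i == UINT32_MAX) /* empty bucket */
      return 0;
    for (; i < t->n_hashes; ++i)
    {
      uint32_t h = t->hashes[i];
      if (h == hash)
      {
        *offset_out = t->offsets[i];
        return 1;
      }
      if (h % t->n_buckets != bucket) /* walked past this bucket's hashes */
        break;
    }
    return 0;
  }

A hit from ``lookup_hash`` only says that the 32 bit hash matched; the caller
still has to compare the string stored in the hash data at the returned offset,
because distinct names can share a hash value. The only table-specific
ingredient in the sketch is the hash function itself, here the Bernstein hash.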
It is a very good hash for all kinds of names in programs, with very few hash
collisions.

Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``.

Details
^^^^^^^

These name hash tables are designed to be generic where specializations of the
table get to define additional data that goes into the header ("``HeaderData``"),
how the string value is stored ("``KeyType``") and the content of the data for each
hash value.

Header Layout
"""""""""""""

The header has a fixed part, and the specialized part. The exact format of the
header is:

.. code-block:: c

  struct Header
  {
    uint32_t   magic;           // 'HASH' magic value to allow endian detection
    uint16_t   version;         // Version number
    uint16_t   hash_function;   // The hash function enumeration that was used
    uint32_t   bucket_count;    // The number of buckets in this hash table
    uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
    uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
                                // Specifically the length of the following HeaderData field - this does not
                                // include the size of the preceding fields
    HeaderData header_data;     // Implementation specific header data
  };

The header starts with a 32 bit "``magic``" value which must be ``'HASH'``
encoded as an ASCII integer. This allows the detection of the start of the
hash table and also allows the table's byte order to be determined so the table
can be correctly extracted. The "``magic``" value is followed by a 16 bit
``version`` number which allows the table to be revised and modified in the
future. The current version number is 1. ``hash_function`` is a ``uint16_t``
enumeration that specifies which hash function was used to produce this table.
The current values for the hash function enumerations include:

.. code-block:: c

  enum HashFunctionType
  {
    eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
  };

``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32 bit
hash values that are in the ``HASHES`` array, which is also the number of
offsets contained in the ``OFFSETS`` array. ``header_data_len`` specifies the
size in bytes of the ``HeaderData`` that is filled in by specialized versions of
this table.

Fixed Lookup
""""""""""""

The header is followed by the buckets, hashes, offsets, and hash value data.

.. code-block:: c

  struct FixedTable
  {
    uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
    uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
    uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
  };

``buckets`` is an array of 32 bit indexes into the ``hashes`` array. The
``hashes`` array contains all of the 32 bit hash values for all names in the
hash table. Each hash in the ``hashes`` table has an offset in the ``offsets``
array that points to the data for the hash value.

This table setup makes it very easy to repurpose these tables to contain
different data, while keeping the lookup mechanism the same for all tables.
This layout also makes it possible to save the table to disk and map it in
later and do very efficient name lookups with little or no parsing.

DWARF lookup tables can be implemented in a variety of ways and can store a lot
of information for each name.
We want to make the DWARF tables extensible and
able to store the data efficiently so we have used some of the DWARF features
that enable efficient data storage to define exactly what kind of data we store
for each name.

The ``HeaderData`` contains a definition of the contents of each HashData chunk.
We might want to store an offset to all of the debug information entries (DIEs)
for each name. To keep things extensible, we create a list of items, or
Atoms, that are contained in the data for each name. First comes the type of
the data in each atom:

.. code-block:: c

  enum AtomType
  {
    eAtomTypeNULL       = 0u,
    eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
    eAtomTypeCUOffset   = 2u,   // DIE offset of the compile unit header that contains the item in question
    eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
    eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
    eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
  };

The enumeration values and their meanings are:

.. code-block:: none

  eAtomTypeNULL       - a termination atom that specifies the end of the atom list
  eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
  eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
  eAtomTypeTag        - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
  eAtomTypeNameFlags  - Flags for functions and global variables (isFunction, isInlined, isExternal...)
  eAtomTypeTypeFlags  - Flags for types (isCXXClass, isObjCClass, ...)

Then each atom specifies its type and how the data for that atom is encoded:

.. code-block:: c

  struct Atom
  {
    uint16_t type;  // AtomType enum value
    uint16_t form;  // DWARF DW_FORM_XXX defines
  };

The ``form`` type above is from the DWARF specification and defines the exact
encoding of the data for the Atom type. See the DWARF specification for the
``DW_FORM_`` definitions.

.. code-block:: c

  struct HeaderData
  {
    uint32_t die_offset_base;
    uint32_t atom_count;
    Atom     atoms[atom_count];
  };

``HeaderData`` defines the base DIE offset that should be added to any atoms
that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``,
``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata`` forms. It also
defines what is contained in each ``HashData`` object -- ``Atom.form`` tells us
how large each field will be in the ``HashData`` and the ``Atom.type`` tells us
how this data should be interpreted.

For the current implementations of the "``.apple_names``" (all functions +
globals), the "``.apple_types``" (names of all types that are defined), and
the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom``
array to be:

.. code-block:: c

  HeaderData.atom_count = 1;
  HeaderData.atoms[0].type = eAtomTypeDIEOffset;
  HeaderData.atoms[0].form = DW_FORM_data4;

This defines the contents to be the DIE offset (``eAtomTypeDIEOffset``) that is
encoded as a 32 bit value (``DW_FORM_data4``). This allows a single name to have
multiple matching DIEs in a single file, which could come up with an inlined
function for instance. Future tables could include more information about the
DIE such as flags indicating if the DIE is a function, method, block,
or inlined.

The KeyType for the DWARF table is a 32 bit string table offset into the
"``.debug_str``" table.
The "``.debug_str``" section is the string table for the DWARF, which
may already contain copies of all of the strings. This helps make sure, with
help from the compiler, that we reuse the strings between all of the DWARF
sections and keeps the hash table size down. Another benefit to having the
compiler generate all strings as ``DW_FORM_strp`` in the debug info is that
DWARF parsing can be made much faster.

After a lookup is made, we get an offset into the hash data. The hash data
needs to be able to deal with 32 bit hash collisions, so the chunk of data
at the offset in the hash data consists of a triple:

.. code-block:: c

  uint32_t str_offset
  uint32_t hash_data_count
  HashData[hash_data_count]

If "``str_offset``" is zero, then the bucket contents are done. 99.9% of the
hash data chunks contain a single item (no 32 bit hash collision):

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'

If there are collisions, you will have multiple valid string offsets:

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print")
  | 0x00000002 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'

Current testing with real world C++ binaries has shown that there is around 1
32 bit hash collision per 100,000 name entries.

Contents
^^^^^^^^

As we said, we want to strictly define exactly what is included in the
different tables. For DWARF, we have 3 tables: "``.apple_names``",
"``.apple_types``", and "``.apple_namespaces``".

"``.apple_names``" sections should contain an entry for each DWARF DIE whose
``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or
``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``,
``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``. It also contains
``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and
static variables). All global and static variables should be included,
including those scoped within functions and classes. For example using the
following code:

.. code-block:: c

  static int var = 0;

  void f ()
  {
    static int var = 0;
  }

Both of the static ``var`` variables would be included in the table. All
functions should emit both their full names and their basenames. For C or C++,
the full name is the mangled name (if available) which is usually in the
``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the
function basename. If global or static variables have a mangled name in a
``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the
simple name found in the ``DW_AT_name`` attribute.

"``.apple_types``" sections should contain an entry for each DWARF DIE whose
tag is one of:

* DW_TAG_array_type
* DW_TAG_class_type
* DW_TAG_enumeration_type
* DW_TAG_pointer_type
* DW_TAG_reference_type
* DW_TAG_string_type
* DW_TAG_structure_type
* DW_TAG_subroutine_type
* DW_TAG_typedef
* DW_TAG_union_type
* DW_TAG_ptr_to_member_type
* DW_TAG_set_type
* DW_TAG_subrange_type
* DW_TAG_base_type
* DW_TAG_const_type
* DW_TAG_constant
* DW_TAG_file_type
* DW_TAG_namelist
* DW_TAG_packed_type
* DW_TAG_volatile_type
* DW_TAG_restrict_type
* DW_TAG_interface_type
* DW_TAG_unspecified_type
* DW_TAG_shared_type

Only entries with a ``DW_AT_name`` attribute are included, and the entry must
not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero
value). For example, using the following code:

.. code-block:: c

  int main ()
  {
    int *b = 0;
    return *b;
  }

We get a few type DIEs:

.. code-block:: none

  0x00000067:     TAG_base_type [5]
                  AT_encoding( DW_ATE_signed )
                  AT_name( "int" )
                  AT_byte_size( 0x04 )

  0x0000006e:     TAG_pointer_type [6]
                  AT_type( {0x00000067} ( int ) )
                  AT_byte_size( 0x08 )

The ``DW_TAG_pointer_type`` is not included because it does not have a
``DW_AT_name``.

"``.apple_namespaces``" section should contain all ``DW_TAG_namespace`` DIEs.
If we run into a namespace that has no name this is an anonymous namespace, and
the name should be output as "``(anonymous namespace)``" (without the quotes).
Why? This matches the output of the ``abi::__cxa_demangle()`` that is in the
standard C++ library that demangles mangled names.


Language Extensions and File Format Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Objective-C Extensions
""""""""""""""""""""""

"``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for an
Objective-C class. The name used in the hash table is the name of the
Objective-C class itself. If the Objective-C class has a category, then an
entry is made for both the class name without the category, and for the class
name with the category. So if we have a DIE at offset 0x1234 with a name of
method "``-[NSString(my_additions) stringWithSpecialString:]``", we would add
an entry for "``NSString``" that points to DIE 0x1234, and an entry for
"``NSString(my_additions)``" that points to 0x1234. This allows us to quickly
track down all Objective-C methods for an Objective-C class when doing
expressions. It is needed because of the dynamic nature of Objective-C where
anyone can add methods to a class. The DWARF for Objective-C methods is also
emitted differently from C++ classes: the methods are not usually
contained in the class definition; they are scattered across one or more
compile units. Categories can also be defined in different shared libraries.
So we need to be able to quickly find all of the methods and class functions
given the Objective-C class name, or quickly find all methods and class
functions for a class + category name. This table does not contain any
selector names; it just maps Objective-C class names (or class names +
category) to all of the methods and class functions. The selectors are added
as function basenames in the "``.apple_names``" section.

In the "``.apple_names``" section for Objective-C functions, the full name is
the entire function name with the brackets ("``-[NSString
stringWithCString:]``") and the basename is the selector only
("``stringWithCString:``").
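
The naming rules above can be illustrated with a small C sketch that splits an
Objective-C method name into the strings that would be used as table keys. It
is only an illustration under stated assumptions: ``objc_names`` and
``split_objc_method_name`` are hypothetical names, the fixed 128 byte buffers
are an arbitrary choice, and the input is assumed to have the
``-[Class(Category) selector]`` or ``+[Class selector]`` shape shown in the
text.

.. code-block:: c

  #include <stddef.h>
  #include <string.h>

  /* Hypothetical result: the names derived from one Objective-C method name. */
  struct objc_names
  {
    char class_name[128];          /* e.g. "NSString" */
    char class_with_category[128]; /* e.g. "NSString(my_additions)", or "" */
    char selector[128];            /* e.g. "stringWithSpecialString:" */
  };

  /* Returns 1 on success, 0 if the name is malformed or does not fit. */
  static int split_objc_method_name(const char *full, struct objc_names *out)
  {
    size_t len = strlen(full);
    if (len < 6 || (full[0] != '-' && full[0] != '+') || full[1] != '[' ||
        full[len - 1] != ']')
      return 0;

    const char *cls = full + 2;             /* start of the class name */
    const char *space = strchr(cls, ' ');   /* separates class and selector */
    if (space == NULL || space == cls)
      return 0;

    size_t cls_cat_len = (size_t)(space - cls);
    const char *paren = memchr(cls, '(', cls_cat_len);
    size_t cls_len = paren ? (size_t)(paren - cls) : cls_cat_len;
    size_t sel_len = (size_t)((full + len - 1) - (space + 1));
    if (cls_len == 0 || sel_len == 0 ||
        cls_cat_len >= sizeof out->class_name ||
        sel_len >= sizeof out->selector)
      return 0;

    memcpy(out->class_name, cls, cls_len);
    out->class_name[cls_len] = '\0';
    if (paren)
    {
      memcpy(out->class_with_category, cls, cls_cat_len);
      out->class_with_category[cls_cat_len] = '\0';
    }
    else
      out->class_with_category[0] = '\0';
    memcpy(out->selector, space + 1, sel_len);
    out->selector[sel_len] = '\0';
    return 1;
  }

For "``-[NSString(my_additions) stringWithSpecialString:]``" the sketch yields
``NSString`` and ``NSString(my_additions)``, the two "``.apple_objc``" keys
named in the text, plus the selector ``stringWithSpecialString:``, which is the
basename for "``.apple_names``".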

Mach-O Changes
""""""""""""""

The section names given above for the Apple hash tables are for non-Mach-O
files. For Mach-O files, the sections should be contained in the ``__DWARF``
segment with names as follows:

* "``.apple_names``" -> "``__apple_names``"
* "``.apple_types``" -> "``__apple_types``"
* "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit)
* "``.apple_objc``" -> "``__apple_objc``"
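
The mapping in the list above follows a simple pattern, which the following C
sketch reproduces. The rule it encodes (replace the leading dot with two
underscores, then truncate to 16 characters) is inferred from the four entries
listed here rather than stated by the text, and ``macho_section_name`` and
``MACHO_SECTNAME_MAX`` are hypothetical names used only for illustration.

.. code-block:: c

  #include <stddef.h>
  #include <string.h>

  #define MACHO_SECTNAME_MAX 16 /* Mach-O section names hold at most 16 characters */

  /* Writes the Mach-O form of a ".apple_*" section name into out.
     Returns 1 on success, 0 if name does not start with '.'. */
  static int macho_section_name(const char *name,
                                char out[MACHO_SECTNAME_MAX + 1])
  {
    if (name == NULL || name[0] != '.' || name[1] == '\0')
      return 0;

    size_t n = strlen(name + 1);      /* length without the leading dot */
    if (n > MACHO_SECTNAME_MAX - 2)   /* leave room for the "__" prefix */
      n = MACHO_SECTNAME_MAX - 2;

    out[0] = '_';
    out[1] = '_';
    memcpy(out + 2, name + 1, n);
    out[2 + n] = '\0';
    return 1;
  }

With this rule "``.apple_namespaces``" becomes "``__apple_namespac``", while
the three shorter names fit without truncation.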