1===================================== 2Garbage Collection with LLVM 3===================================== 4 5.. contents:: 6 :local: 7 8Abstract 9======== 10 11This document covers how to integrate LLVM into a compiler for a language which 12supports garbage collection. **Note that LLVM itself does not provide a 13garbage collector.** You must provide your own. 14 15Quick Start 16============ 17 18First, you should pick a collector strategy. LLVM includes a number of built 19in ones, but you can also implement a loadable plugin with a custom definition. 20Note that the collector strategy is a description of how LLVM should generate 21code such that it interacts with your collector and runtime, not a description 22of the collector itself. 23 24Next, mark your generated functions as using your chosen collector strategy. 25From c++, you can call: 26 27.. code-block:: c++ 28 29 F.setGC(<collector description name>); 30 31 32This will produce IR like the following fragment: 33 34.. code-block:: llvm 35 36 define void @foo() gc "<collector description name>" { ... } 37 38 39When generating LLVM IR for your functions, you will need to: 40 41* Use ``@llvm.gcread`` and/or ``@llvm.gcwrite`` in place of standard load and 42 store instructions. These intrinsics are used to represent load and store 43 barriers. If you collector does not require such barriers, you can skip 44 this step. 45 46* Use the memory allocation routines provided by your garbage collector's 47 runtime library. 48 49* If your collector requires them, generate type maps according to your 50 runtime's binary interface. LLVM is not involved in the process. In 51 particular, the LLVM type system is not suitable for conveying such 52 information though the compiler. 53 54* Insert any coordination code required for interacting with your collector. 55 Many collectors require running application code to periodically check a 56 flag and conditionally call a runtime function. This is often referred to 57 as a safepoint poll. 58 59You will need to identify roots (i.e. references to heap objects your collector 60needs to know about) in your generated IR, so that LLVM can encode them into 61your final stack maps. Depending on the collector strategy chosen, this is 62accomplished by using either the ``@llvm.gcroot`` intrinsics or an 63``gc.statepoint`` relocation sequence. 64 65Don't forget to create a root for each intermediate value that is generated when 66evaluating an expression. In ``h(f(), g())``, the result of ``f()`` could 67easily be collected if evaluating ``g()`` triggers a collection. 68 69Finally, you need to link your runtime library with the generated program 70executable (for a static compiler) or ensure the appropriate symbols are 71available for the runtime linker (for a JIT compiler). 72 73 74Introduction 75============ 76 77What is Garbage Collection? 78--------------------------- 79 80Garbage collection is a widely used technique that frees the programmer from 81having to know the lifetimes of heap objects, making software easier to produce 82and maintain. Many programming languages rely on garbage collection for 83automatic memory management. There are two primary forms of garbage collection: 84conservative and accurate. 85 86Conservative garbage collection often does not require any special support from 87either the language or the compiler: it can handle non-type-safe programming 88languages (such as C/C++) and does not require any special information from the 89compiler. The `Boehm collector 90<http://www.hpl.hp.com/personal/Hans_Boehm/gc/>`__ is an example of a 91state-of-the-art conservative collector. 92 93Accurate garbage collection requires the ability to identify all pointers in the 94program at run-time (which requires that the source-language be type-safe in 95most cases). Identifying pointers at run-time requires compiler support to 96locate all places that hold live pointer variables at run-time, including the 97:ref:`processor stack and registers <gcroot>`. 98 99Conservative garbage collection is attractive because it does not require any 100special compiler support, but it does have problems. In particular, because the 101conservative garbage collector cannot *know* that a particular word in the 102machine is a pointer, it cannot move live objects in the heap (preventing the 103use of compacting and generational GC algorithms) and it can occasionally suffer 104from memory leaks due to integer values that happen to point to objects in the 105program. In addition, some aggressive compiler transformations can break 106conservative garbage collectors (though these seem rare in practice). 107 108Accurate garbage collectors do not suffer from any of these problems, but they 109can suffer from degraded scalar optimization of the program. In particular, 110because the runtime must be able to identify and update all pointers active in 111the program, some optimizations are less effective. In practice, however, the 112locality and performance benefits of using aggressive garbage collection 113techniques dominates any low-level losses. 114 115This document describes the mechanisms and interfaces provided by LLVM to 116support accurate garbage collection. 117 118Goals and non-goals 119------------------- 120 121LLVM's intermediate representation provides :ref:`garbage collection intrinsics 122<gc_intrinsics>` that offer support for a broad class of collector models. For 123instance, the intrinsics permit: 124 125* semi-space collectors 126 127* mark-sweep collectors 128 129* generational collectors 130 131* incremental collectors 132 133* concurrent collectors 134 135* cooperative collectors 136 137* reference counting 138 139We hope that the support built into the LLVM IR is sufficient to support a 140broad class of garbage collected languages including Scheme, ML, Java, C#, 141Perl, Python, Lua, Ruby, other scripting languages, and more. 142 143Note that LLVM **does not itself provide a garbage collector** --- this should 144be part of your language's runtime library. LLVM provides a framework for 145describing the garbage collectors requirements to the compiler. In particular, 146LLVM provides support for generating stack maps at call sites, polling for a 147safepoint, and emitting load and store barriers. You can also extend LLVM - 148possibly through a loadable :ref:`code generation plugins <plugin>` - to 149generate code and data structures which conforms to the *binary interface* 150specified by the *runtime library*. This is similar to the relationship between 151LLVM and DWARF debugging info, for example. The difference primarily lies in 152the lack of an established standard in the domain of garbage collection --- thus 153the need for a flexible extension mechanism. 154 155The aspects of the binary interface with which LLVM's GC support is 156concerned are: 157 158* Creation of GC safepoints within code where collection is allowed to execute 159 safely. 160 161* Computation of the stack map. For each safe point in the code, object 162 references within the stack frame must be identified so that the collector may 163 traverse and perhaps update them. 164 165* Write barriers when storing object references to the heap. These are commonly 166 used to optimize incremental scans in generational collectors. 167 168* Emission of read barriers when loading object references. These are useful 169 for interoperating with concurrent collectors. 170 171There are additional areas that LLVM does not directly address: 172 173* Registration of global roots with the runtime. 174 175* Registration of stack map entries with the runtime. 176 177* The functions used by the program to allocate memory, trigger a collection, 178 etc. 179 180* Computation or compilation of type maps, or registration of them with the 181 runtime. These are used to crawl the heap for object references. 182 183In general, LLVM's support for GC does not include features which can be 184adequately addressed with other features of the IR and does not specify a 185particular binary interface. On the plus side, this means that you should be 186able to integrate LLVM with an existing runtime. On the other hand, it can 187have the effect of leaving a lot of work for the developer of a novel 188language. We try to mitigate this by providing built in collector strategy 189descriptions that can work with many common collector designs and easy 190extension points. If you don't already have a specific binary interface 191you need to support, we recommend trying to use one of these built in collector 192strategies. 193 194.. _gc_intrinsics: 195 196LLVM IR Features 197================ 198 199This section describes the garbage collection facilities provided by the 200:doc:`LLVM intermediate representation <LangRef>`. The exact behavior of these 201IR features is specified by the selected :ref:`GC strategy description 202<plugin>`. 203 204Specifying GC code generation: ``gc "..."`` 205------------------------------------------- 206 207.. code-block:: llvm 208 209 define <returntype> @name(...) gc "name" { ... } 210 211The ``gc`` function attribute is used to specify the desired GC strategy to the 212compiler. Its programmatic equivalent is the ``setGC`` method of ``Function``. 213 214Setting ``gc "name"`` on a function triggers a search for a matching subclass 215of GCStrategy. Some collector strategies are built in. You can add others 216using either the loadable plugin mechanism, or by patching your copy of LLVM. 217It is the selected GC strategy which defines the exact nature of the code 218generated to support GC. If none is found, the compiler will raise an error. 219 220Specifying the GC style on a per-function basis allows LLVM to link together 221programs that use different garbage collection algorithms (or none at all). 222 223.. _gcroot: 224 225Identifying GC roots on the stack 226---------------------------------- 227 228LLVM currently supports two different mechanisms for describing references in 229compiled code at safepoints. ``llvm.gcroot`` is the older mechanism; 230``gc.statepoint`` has been added more recently. At the moment, you can choose 231either implementation (on a per :ref:`GC strategy <plugin>` basis). Longer 232term, we will probably either migrate away from ``llvm.gcroot`` entirely, or 233substantially merge their implementations. Note that most new development 234work is focused on ``gc.statepoint``. 235 236Using ``gc.statepoint`` 237^^^^^^^^^^^^^^^^^^^^^^^^ 238:doc:`This page <Statepoints>` contains detailed documentation for 239``gc.statepoint``. 240 241Using ``llvm.gcwrite`` 242^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 243 244.. code-block:: llvm 245 246 void @llvm.gcroot(i8** %ptrloc, i8* %metadata) 247 248The ``llvm.gcroot`` intrinsic is used to inform LLVM that a stack variable 249references an object on the heap and is to be tracked for garbage collection. 250The exact impact on generated code is specified by a :ref:`compiler plugin 251<plugin>`. All calls to ``llvm.gcroot`` **must** reside inside the first basic 252block. 253 254The first argument **must** be a value referring to an alloca instruction or a 255bitcast of an alloca. The second contains a pointer to metadata that should be 256associated with the pointer, and **must** be a constant or global value 257address. If your target collector uses tags, use a null pointer for metadata. 258 259A compiler which performs manual SSA construction **must** ensure that SSA 260values representing GC references are stored in to the alloca passed to the 261respective ``gcroot`` before every call site and reloaded after every call. 262A compiler which uses mem2reg to raise imperative code using ``alloca`` into 263SSA form need only add a call to ``@llvm.gcroot`` for those variables which 264are pointers into the GC heap. 265 266It is also important to mark intermediate values with ``llvm.gcroot``. For 267example, consider ``h(f(), g())``. Beware leaking the result of ``f()`` in the 268case that ``g()`` triggers a collection. Note, that stack variables must be 269initialized and marked with ``llvm.gcroot`` in function's prologue. 270 271The ``%metadata`` argument can be used to avoid requiring heap objects to have 272'isa' pointers or tag bits. [Appel89_, Goldberg91_, Tolmach94_] If specified, 273its value will be tracked along with the location of the pointer in the stack 274frame. 275 276Consider the following fragment of Java code: 277 278.. code-block:: java 279 280 { 281 Object X; // A null-initialized reference to an object 282 ... 283 } 284 285This block (which may be located in the middle of a function or in a loop nest), 286could be compiled to this LLVM code: 287 288.. code-block:: llvm 289 290 Entry: 291 ;; In the entry block for the function, allocate the 292 ;; stack space for X, which is an LLVM pointer. 293 %X = alloca %Object* 294 295 ;; Tell LLVM that the stack space is a stack root. 296 ;; Java has type-tags on objects, so we pass null as metadata. 297 %tmp = bitcast %Object** %X to i8** 298 call void @llvm.gcroot(i8** %tmp, i8* null) 299 ... 300 301 ;; "CodeBlock" is the block corresponding to the start 302 ;; of the scope above. 303 CodeBlock: 304 ;; Java null-initializes pointers. 305 store %Object* null, %Object** %X 306 307 ... 308 309 ;; As the pointer goes out of scope, store a null value into 310 ;; it, to indicate that the value is no longer live. 311 store %Object* null, %Object** %X 312 ... 313 314Reading and writing references in the heap 315------------------------------------------ 316 317Some collectors need to be informed when the mutator (the program that needs 318garbage collection) either reads a pointer from or writes a pointer to a field 319of a heap object. The code fragments inserted at these points are called *read 320barriers* and *write barriers*, respectively. The amount of code that needs to 321be executed is usually quite small and not on the critical path of any 322computation, so the overall performance impact of the barrier is tolerable. 323 324Barriers often require access to the *object pointer* rather than the *derived 325pointer* (which is a pointer to the field within the object). Accordingly, 326these intrinsics take both pointers as separate arguments for completeness. In 327this snippet, ``%object`` is the object pointer, and ``%derived`` is the derived 328pointer: 329 330.. code-block:: llvm 331 332 ;; An array type. 333 %class.Array = type { %class.Object, i32, [0 x %class.Object*] } 334 ... 335 336 ;; Load the object pointer from a gcroot. 337 %object = load %class.Array** %object_addr 338 339 ;; Compute the derived pointer. 340 %derived = getelementptr %object, i32 0, i32 2, i32 %n 341 342LLVM does not enforce this relationship between the object and derived pointer 343(although a particular :ref:`collector strategy <plugin>` might). However, it 344would be an unusual collector that violated it. 345 346The use of these intrinsics is naturally optional if the target GC does not 347require the corresponding barrier. The GC strategy used with such a collector 348should replace the intrinsic calls with the corresponding ``load`` or 349``store`` instruction if they are used. 350 351One known deficiency with the current design is that the barrier intrinsics do 352not include the size or alignment of the underlying operation performed. It is 353currently assumed that the operation is of pointer size and the alignment is 354assumed to be the target machine's default alignment. 355 356Write barrier: ``llvm.gcwrite`` 357^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 358 359.. code-block:: llvm 360 361 void @llvm.gcwrite(i8* %value, i8* %object, i8** %derived) 362 363For write barriers, LLVM provides the ``llvm.gcwrite`` intrinsic function. It 364has exactly the same semantics as a non-volatile ``store`` to the derived 365pointer (the third argument). The exact code generated is specified by the 366Function's selected :ref:`GC strategy <plugin>`. 367 368Many important algorithms require write barriers, including generational and 369concurrent collectors. Additionally, write barriers could be used to implement 370reference counting. 371 372Read barrier: ``llvm.gcread`` 373^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 374 375.. code-block:: llvm 376 377 i8* @llvm.gcread(i8* %object, i8** %derived) 378 379For read barriers, LLVM provides the ``llvm.gcread`` intrinsic function. It has 380exactly the same semantics as a non-volatile ``load`` from the derived pointer 381(the second argument). The exact code generated is specified by the Function's 382selected :ref:`GC strategy <plugin>`. 383 384Read barriers are needed by fewer algorithms than write barriers, and may have a 385greater performance impact since pointer reads are more frequent than writes. 386 387.. _plugin: 388 389Built In Collectors 390==================== 391 392LLVM includes built in support for several varieties of garbage collectors. 393 394The Shadow Stack GC 395---------------------- 396 397To use this collector strategy, mark your functions with: 398 399.. code-block:: c++ 400 401 F.setGC("shadow-stack"); 402 403Unlike many GC algorithms which rely on a cooperative code generator to compile 404stack maps, this algorithm carefully maintains a linked list of stack roots 405[:ref:`Henderson2002 <henderson02>`]. This so-called "shadow stack" mirrors the 406machine stack. Maintaining this data structure is slower than using a stack map 407compiled into the executable as constant data, but has a significant portability 408advantage because it requires no special support from the target code generator, 409and does not require tricky platform-specific code to crawl the machine stack. 410 411The tradeoff for this simplicity and portability is: 412 413* High overhead per function call. 414 415* Not thread-safe. 416 417Still, it's an easy way to get started. After your compiler and runtime are up 418and running, writing a :ref:`plugin <plugin>` will allow you to take advantage 419of :ref:`more advanced GC features <collector-algos>` of LLVM in order to 420improve performance. 421 422 423The shadow stack doesn't imply a memory allocation algorithm. A semispace 424collector or building atop ``malloc`` are great places to start, and can be 425implemented with very little code. 426 427When it comes time to collect, however, your runtime needs to traverse the stack 428roots, and for this it needs to integrate with the shadow stack. Luckily, doing 429so is very simple. (This code is heavily commented to help you understand the 430data structure, but there are only 20 lines of meaningful code.) 431 432.. code-block:: c++ 433 434 /// @brief The map for a single function's stack frame. One of these is 435 /// compiled as constant data into the executable for each function. 436 /// 437 /// Storage of metadata values is elided if the %metadata parameter to 438 /// @llvm.gcroot is null. 439 struct FrameMap { 440 int32_t NumRoots; //< Number of roots in stack frame. 441 int32_t NumMeta; //< Number of metadata entries. May be < NumRoots. 442 const void *Meta[0]; //< Metadata for each root. 443 }; 444 445 /// @brief A link in the dynamic shadow stack. One of these is embedded in 446 /// the stack frame of each function on the call stack. 447 struct StackEntry { 448 StackEntry *Next; //< Link to next stack entry (the caller's). 449 const FrameMap *Map; //< Pointer to constant FrameMap. 450 void *Roots[0]; //< Stack roots (in-place array). 451 }; 452 453 /// @brief The head of the singly-linked list of StackEntries. Functions push 454 /// and pop onto this in their prologue and epilogue. 455 /// 456 /// Since there is only a global list, this technique is not threadsafe. 457 StackEntry *llvm_gc_root_chain; 458 459 /// @brief Calls Visitor(root, meta) for each GC root on the stack. 460 /// root and meta are exactly the values passed to 461 /// @llvm.gcroot. 462 /// 463 /// Visitor could be a function to recursively mark live objects. Or it 464 /// might copy them to another heap or generation. 465 /// 466 /// @param Visitor A function to invoke for every GC root on the stack. 467 void visitGCRoots(void (*Visitor)(void **Root, const void *Meta)) { 468 for (StackEntry *R = llvm_gc_root_chain; R; R = R->Next) { 469 unsigned i = 0; 470 471 // For roots [0, NumMeta), the metadata pointer is in the FrameMap. 472 for (unsigned e = R->Map->NumMeta; i != e; ++i) 473 Visitor(&R->Roots[i], R->Map->Meta[i]); 474 475 // For roots [NumMeta, NumRoots), the metadata pointer is null. 476 for (unsigned e = R->Map->NumRoots; i != e; ++i) 477 Visitor(&R->Roots[i], NULL); 478 } 479 } 480 481 482The 'Erlang' and 'Ocaml' GCs 483----------------------------- 484 485LLVM ships with two example collectors which leverage the ``gcroot`` 486mechanisms. To our knowledge, these are not actually used by any language 487runtime, but they do provide a reasonable starting point for someone interested 488in writing an ``gcroot`` compatible GC plugin. In particular, these are the 489only in tree examples of how to produce a custom binary stack map format using 490a ``gcroot`` strategy. 491 492As there names imply, the binary format produced is intended to model that 493used by the Erlang and OCaml compilers respectively. 494 495 496The Statepoint Example GC 497------------------------- 498 499.. code-block:: c++ 500 501 F.setGC("statepoint-example"); 502 503This GC provides an example of how one might use the infrastructure provided 504by ``gc.statepoint``. 505 506 507Custom GC Strategies 508==================== 509 510If none of the built in GC strategy descriptions met your needs above, you will 511need to define a custom GCStrategy and possibly, a custom LLVM pass to perform 512lowering. Your best example of where to start defining a custom GCStrategy 513would be to look at one of the built in strategies. 514 515You may be able to structure this additional code as a loadable plugin library. 516Loadable plugins are sufficient if all you need is to enable a different 517combination of built in functionality, but if you need to provide a custom 518lowering pass, you will need to build a patched version of LLVM. If you think 519you need a patched build, please ask for advice on llvm-dev. There may be an 520easy way we can extend the support to make it work for your use case without 521requiring a custom build. 522 523Collector Requirements 524---------------------- 525 526You should be able to leverage any existing collector library that includes the following elements: 527 528#. A memory allocator which exposes an allocation function your compiled 529 code can call. 530 531#. A binary format for the stack map. A stack map describes the location 532 of references at a safepoint and is used by precise collectors to identify 533 references within a stack frame on the machine stack. Note that collectors 534 which conservatively scan the stack don't require such a structure. 535 536#. A stack crawler to discover functions on the call stack, and enumerate the 537 references listed in the stack map for each call site. 538 539#. A mechanism for identifying references in global locations (e.g. global 540 variables). 541 542#. If you collector requires them, an LLVM IR implementation of your collectors 543 load and store barriers. Note that since many collectors don't require 544 barriers at all, LLVM defaults to lowering such barriers to normal loads 545 and stores unless you arrange otherwise. 546 547 548Implementing a collector plugin 549------------------------------- 550 551User code specifies which GC code generation to use with the ``gc`` function 552attribute or, equivalently, with the ``setGC`` method of ``Function``. 553 554To implement a GC plugin, it is necessary to subclass ``llvm::GCStrategy``, 555which can be accomplished in a few lines of boilerplate code. LLVM's 556infrastructure provides access to several important algorithms. For an 557uncontroversial collector, all that remains may be to compile LLVM's computed 558stack map to assembly code (using the binary representation expected by the 559runtime library). This can be accomplished in about 100 lines of code. 560 561This is not the appropriate place to implement a garbage collected heap or a 562garbage collector itself. That code should exist in the language's runtime 563library. The compiler plugin is responsible for generating code which conforms 564to the binary interface defined by library, most essentially the :ref:`stack map 565<stack-map>`. 566 567To subclass ``llvm::GCStrategy`` and register it with the compiler: 568 569.. code-block:: c++ 570 571 // lib/MyGC/MyGC.cpp - Example LLVM GC plugin 572 573 #include "llvm/CodeGen/GCStrategy.h" 574 #include "llvm/CodeGen/GCMetadata.h" 575 #include "llvm/Support/Compiler.h" 576 577 using namespace llvm; 578 579 namespace { 580 class LLVM_LIBRARY_VISIBILITY MyGC : public GCStrategy { 581 public: 582 MyGC() {} 583 }; 584 585 GCRegistry::Add<MyGC> 586 X("mygc", "My bespoke garbage collector."); 587 } 588 589This boilerplate collector does nothing. More specifically: 590 591* ``llvm.gcread`` calls are replaced with the corresponding ``load`` 592 instruction. 593 594* ``llvm.gcwrite`` calls are replaced with the corresponding ``store`` 595 instruction. 596 597* No safe points are added to the code. 598 599* The stack map is not compiled into the executable. 600 601Using the LLVM makefiles, this code 602can be compiled as a plugin using a simple makefile: 603 604.. code-block:: make 605 606 # lib/MyGC/Makefile 607 608 LEVEL := ../.. 609 LIBRARYNAME = MyGC 610 LOADABLE_MODULE = 1 611 612 include $(LEVEL)/Makefile.common 613 614Once the plugin is compiled, code using it may be compiled using ``llc 615-load=MyGC.so`` (though MyGC.so may have some other platform-specific 616extension): 617 618:: 619 620 $ cat sample.ll 621 define void @f() gc "mygc" { 622 entry: 623 ret void 624 } 625 $ llvm-as < sample.ll | llc -load=MyGC.so 626 627It is also possible to statically link the collector plugin into tools, such as 628a language-specific compiler front-end. 629 630.. _collector-algos: 631 632Overview of available features 633------------------------------ 634 635``GCStrategy`` provides a range of features through which a plugin may do useful 636work. Some of these are callbacks, some are algorithms that can be enabled, 637disabled, or customized. This matrix summarizes the supported (and planned) 638features and correlates them with the collection techniques which typically 639require them. 640 641.. |v| unicode:: 0x2714 642 :trim: 643 644.. |x| unicode:: 0x2718 645 :trim: 646 647+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 648| Algorithm | Done | Shadow | refcount | mark- | copying | incremental | threaded | concurrent | 649| | | stack | | sweep | | | | | 650+============+======+========+==========+=======+=========+=============+==========+============+ 651| stack map | |v| | | | |x| | |x| | |x| | |x| | |x| | 652+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 653| initialize | |v| | |x| | |x| | |x| | |x| | |x| | |x| | |x| | 654| roots | | | | | | | | | 655+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 656| derived | NO | | | | | | **N**\* | **N**\* | 657| pointers | | | | | | | | | 658+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 659| **custom | |v| | | | | | | | | 660| lowering** | | | | | | | | | 661+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 662| *gcroot* | |v| | |x| | |x| | | | | | | 663+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 664| *gcwrite* | |v| | | |x| | | | |x| | | |x| | 665+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 666| *gcread* | |v| | | | | | | | |x| | 667+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 668| **safe | | | | | | | | | 669| points** | | | | | | | | | 670+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 671| *in | |v| | | | |x| | |x| | |x| | |x| | |x| | 672| calls* | | | | | | | | | 673+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 674| *before | |v| | | | | | | |x| | |x| | 675| calls* | | | | | | | | | 676+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 677| *for | NO | | | | | | **N** | **N** | 678| loops* | | | | | | | | | 679+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 680| *before | |v| | | | | | | |x| | |x| | 681| escape* | | | | | | | | | 682+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 683| emit code | NO | | | | | | **N** | **N** | 684| at safe | | | | | | | | | 685| points | | | | | | | | | 686+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 687| **output** | | | | | | | | | 688+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 689| *assembly* | |v| | | | |x| | |x| | |x| | |x| | |x| | 690+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 691| *JIT* | NO | | | **?** | **?** | **?** | **?** | **?** | 692+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 693| *obj* | NO | | | **?** | **?** | **?** | **?** | **?** | 694+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 695| live | NO | | | **?** | **?** | **?** | **?** | **?** | 696| analysis | | | | | | | | | 697+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 698| register | NO | | | **?** | **?** | **?** | **?** | **?** | 699| map | | | | | | | | | 700+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 701| \* Derived pointers only pose a hasard to copying collections. | 702+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 703| **?** denotes a feature which could be utilized if available. | 704+------------+------+--------+----------+-------+---------+-------------+----------+------------+ 705 706To be clear, the collection techniques above are defined as: 707 708Shadow Stack 709 The mutator carefully maintains a linked list of stack roots. 710 711Reference Counting 712 The mutator maintains a reference count for each object and frees an object 713 when its count falls to zero. 714 715Mark-Sweep 716 When the heap is exhausted, the collector marks reachable objects starting 717 from the roots, then deallocates unreachable objects in a sweep phase. 718 719Copying 720 As reachability analysis proceeds, the collector copies objects from one heap 721 area to another, compacting them in the process. Copying collectors enable 722 highly efficient "bump pointer" allocation and can improve locality of 723 reference. 724 725Incremental 726 (Including generational collectors.) Incremental collectors generally have all 727 the properties of a copying collector (regardless of whether the mature heap 728 is compacting), but bring the added complexity of requiring write barriers. 729 730Threaded 731 Denotes a multithreaded mutator; the collector must still stop the mutator 732 ("stop the world") before beginning reachability analysis. Stopping a 733 multithreaded mutator is a complicated problem. It generally requires highly 734 platform-specific code in the runtime, and the production of carefully 735 designed machine code at safe points. 736 737Concurrent 738 In this technique, the mutator and the collector run concurrently, with the 739 goal of eliminating pause times. In a *cooperative* collector, the mutator 740 further aids with collection should a pause occur, allowing collection to take 741 advantage of multiprocessor hosts. The "stop the world" problem of threaded 742 collectors is generally still present to a limited extent. Sophisticated 743 marking algorithms are necessary. Read barriers may be necessary. 744 745As the matrix indicates, LLVM's garbage collection infrastructure is already 746suitable for a wide variety of collectors, but does not currently extend to 747multithreaded programs. This will be added in the future as there is 748interest. 749 750.. _stack-map: 751 752Computing stack maps 753-------------------- 754 755LLVM automatically computes a stack map. One of the most important features 756of a ``GCStrategy`` is to compile this information into the executable in 757the binary representation expected by the runtime library. 758 759The stack map consists of the location and identity of each GC root in the 760each function in the module. For each root: 761 762* ``RootNum``: The index of the root. 763 764* ``StackOffset``: The offset of the object relative to the frame pointer. 765 766* ``RootMetadata``: The value passed as the ``%metadata`` parameter to the 767 ``@llvm.gcroot`` intrinsic. 768 769Also, for the function as a whole: 770 771* ``getFrameSize()``: The overall size of the function's initial stack frame, 772 not accounting for any dynamic allocation. 773 774* ``roots_size()``: The count of roots in the function. 775 776To access the stack map, use ``GCFunctionMetadata::roots_begin()`` and 777-``end()`` from the :ref:`GCMetadataPrinter <assembly>`: 778 779.. code-block:: c++ 780 781 for (iterator I = begin(), E = end(); I != E; ++I) { 782 GCFunctionInfo *FI = *I; 783 unsigned FrameSize = FI->getFrameSize(); 784 size_t RootCount = FI->roots_size(); 785 786 for (GCFunctionInfo::roots_iterator RI = FI->roots_begin(), 787 RE = FI->roots_end(); 788 RI != RE; ++RI) { 789 int RootNum = RI->Num; 790 int RootStackOffset = RI->StackOffset; 791 Constant *RootMetadata = RI->Metadata; 792 } 793 } 794 795If the ``llvm.gcroot`` intrinsic is eliminated before code generation by a 796custom lowering pass, LLVM will compute an empty stack map. This may be useful 797for collector plugins which implement reference counting or a shadow stack. 798 799.. _init-roots: 800 801Initializing roots to null: ``InitRoots`` 802----------------------------------------- 803 804.. code-block:: c++ 805 806 MyGC::MyGC() { 807 InitRoots = true; 808 } 809 810When set, LLVM will automatically initialize each root to ``null`` upon entry to 811the function. This prevents the GC's sweep phase from visiting uninitialized 812pointers, which will almost certainly cause it to crash. This initialization 813occurs before custom lowering, so the two may be used together. 814 815Since LLVM does not yet compute liveness information, there is no means of 816distinguishing an uninitialized stack root from an initialized one. Therefore, 817this feature should be used by all GC plugins. It is enabled by default. 818 819Custom lowering of intrinsics: ``CustomRoots``, ``CustomReadBarriers``, and ``CustomWriteBarriers`` 820--------------------------------------------------------------------------------------------------- 821 822For GCs which use barriers or unusual treatment of stack roots, these 823flags allow the collector to perform arbitrary transformations of the 824LLVM IR: 825 826.. code-block:: c++ 827 828 class MyGC : public GCStrategy { 829 public: 830 MyGC() { 831 CustomRoots = true; 832 CustomReadBarriers = true; 833 CustomWriteBarriers = true; 834 } 835 }; 836 837If any of these flags are set, LLVM suppresses its default lowering for 838the corresponding intrinsics. Instead, you must provide a custom Pass 839which lowers the intrinsics as desired. If you have opted in to custom 840lowering of a particular intrinsic your pass **must** eliminate all 841instances of the corresponding intrinsic in functions which opt in to 842your GC. The best example of such a pass is the ShadowStackGC and it's 843ShadowStackGCLowering pass. 844 845There is currently no way to register such a custom lowering pass 846without building a custom copy of LLVM. 847 848.. _safe-points: 849 850Generating safe points: ``NeededSafePoints`` 851-------------------------------------------- 852 853LLVM can compute four kinds of safe points: 854 855.. code-block:: c++ 856 857 namespace GC { 858 /// PointKind - The type of a collector-safe point. 859 /// 860 enum PointKind { 861 Loop, //< Instr is a loop (backwards branch). 862 Return, //< Instr is a return instruction. 863 PreCall, //< Instr is a call instruction. 864 PostCall //< Instr is the return address of a call. 865 }; 866 } 867 868A collector can request any combination of the four by setting the 869``NeededSafePoints`` mask: 870 871.. code-block:: c++ 872 873 MyGC::MyGC() { 874 NeededSafePoints = 1 << GC::Loop 875 | 1 << GC::Return 876 | 1 << GC::PreCall 877 | 1 << GC::PostCall; 878 } 879 880It can then use the following routines to access safe points. 881 882.. code-block:: c++ 883 884 for (iterator I = begin(), E = end(); I != E; ++I) { 885 GCFunctionInfo *MD = *I; 886 size_t PointCount = MD->size(); 887 888 for (GCFunctionInfo::iterator PI = MD->begin(), 889 PE = MD->end(); PI != PE; ++PI) { 890 GC::PointKind PointKind = PI->Kind; 891 unsigned PointNum = PI->Num; 892 } 893 } 894 895Almost every collector requires ``PostCall`` safe points, since these correspond 896to the moments when the function is suspended during a call to a subroutine. 897 898Threaded programs generally require ``Loop`` safe points to guarantee that the 899application will reach a safe point within a bounded amount of time, even if it 900is executing a long-running loop which contains no function calls. 901 902Threaded collectors may also require ``Return`` and ``PreCall`` safe points to 903implement "stop the world" techniques using self-modifying code, where it is 904important that the program not exit the function without reaching a safe point 905(because only the topmost function has been patched). 906 907.. _assembly: 908 909Emitting assembly code: ``GCMetadataPrinter`` 910--------------------------------------------- 911 912LLVM allows a plugin to print arbitrary assembly code before and after the rest 913of a module's assembly code. At the end of the module, the GC can compile the 914LLVM stack map into assembly code. (At the beginning, this information is not 915yet computed.) 916 917Since AsmWriter and CodeGen are separate components of LLVM, a separate abstract 918base class and registry is provided for printing assembly code, the 919``GCMetadaPrinter`` and ``GCMetadataPrinterRegistry``. The AsmWriter will look 920for such a subclass if the ``GCStrategy`` sets ``UsesMetadata``: 921 922.. code-block:: c++ 923 924 MyGC::MyGC() { 925 UsesMetadata = true; 926 } 927 928This separation allows JIT-only clients to be smaller. 929 930Note that LLVM does not currently have analogous APIs to support code generation 931in the JIT, nor using the object writers. 932 933.. code-block:: c++ 934 935 // lib/MyGC/MyGCPrinter.cpp - Example LLVM GC printer 936 937 #include "llvm/CodeGen/GCMetadataPrinter.h" 938 #include "llvm/Support/Compiler.h" 939 940 using namespace llvm; 941 942 namespace { 943 class LLVM_LIBRARY_VISIBILITY MyGCPrinter : public GCMetadataPrinter { 944 public: 945 virtual void beginAssembly(AsmPrinter &AP); 946 947 virtual void finishAssembly(AsmPrinter &AP); 948 }; 949 950 GCMetadataPrinterRegistry::Add<MyGCPrinter> 951 X("mygc", "My bespoke garbage collector."); 952 } 953 954The collector should use ``AsmPrinter`` to print portable assembly code. The 955collector itself contains the stack map for the entire module, and may access 956the ``GCFunctionInfo`` using its own ``begin()`` and ``end()`` methods. Here's 957a realistic example: 958 959.. code-block:: c++ 960 961 #include "llvm/CodeGen/AsmPrinter.h" 962 #include "llvm/IR/Function.h" 963 #include "llvm/IR/DataLayout.h" 964 #include "llvm/Target/TargetAsmInfo.h" 965 #include "llvm/Target/TargetMachine.h" 966 967 void MyGCPrinter::beginAssembly(AsmPrinter &AP) { 968 // Nothing to do. 969 } 970 971 void MyGCPrinter::finishAssembly(AsmPrinter &AP) { 972 MCStreamer &OS = AP.OutStreamer; 973 unsigned IntPtrSize = AP.TM.getSubtargetImpl()->getDataLayout()->getPointerSize(); 974 975 // Put this in the data section. 976 OS.SwitchSection(AP.getObjFileLowering().getDataSection()); 977 978 // For each function... 979 for (iterator FI = begin(), FE = end(); FI != FE; ++FI) { 980 GCFunctionInfo &MD = **FI; 981 982 // A compact GC layout. Emit this data structure: 983 // 984 // struct { 985 // int32_t PointCount; 986 // void *SafePointAddress[PointCount]; 987 // int32_t StackFrameSize; // in words 988 // int32_t StackArity; 989 // int32_t LiveCount; 990 // int32_t LiveOffsets[LiveCount]; 991 // } __gcmap_<FUNCTIONNAME>; 992 993 // Align to address width. 994 AP.EmitAlignment(IntPtrSize == 4 ? 2 : 3); 995 996 // Emit PointCount. 997 OS.AddComment("safe point count"); 998 AP.EmitInt32(MD.size()); 999 1000 // And each safe point... 1001 for (GCFunctionInfo::iterator PI = MD.begin(), 1002 PE = MD.end(); PI != PE; ++PI) { 1003 // Emit the address of the safe point. 1004 OS.AddComment("safe point address"); 1005 MCSymbol *Label = PI->Label; 1006 AP.EmitLabelPlusOffset(Label/*Hi*/, 0/*Offset*/, 4/*Size*/); 1007 } 1008 1009 // Stack information never change in safe points! Only print info from the 1010 // first call-site. 1011 GCFunctionInfo::iterator PI = MD.begin(); 1012 1013 // Emit the stack frame size. 1014 OS.AddComment("stack frame size (in words)"); 1015 AP.EmitInt32(MD.getFrameSize() / IntPtrSize); 1016 1017 // Emit stack arity, i.e. the number of stacked arguments. 1018 unsigned RegisteredArgs = IntPtrSize == 4 ? 5 : 6; 1019 unsigned StackArity = MD.getFunction().arg_size() > RegisteredArgs ? 1020 MD.getFunction().arg_size() - RegisteredArgs : 0; 1021 OS.AddComment("stack arity"); 1022 AP.EmitInt32(StackArity); 1023 1024 // Emit the number of live roots in the function. 1025 OS.AddComment("live root count"); 1026 AP.EmitInt32(MD.live_size(PI)); 1027 1028 // And for each live root... 1029 for (GCFunctionInfo::live_iterator LI = MD.live_begin(PI), 1030 LE = MD.live_end(PI); 1031 LI != LE; ++LI) { 1032 // Emit live root's offset within the stack frame. 1033 OS.AddComment("stack index (offset / wordsize)"); 1034 AP.EmitInt32(LI->StackOffset); 1035 } 1036 } 1037 } 1038 1039References 1040========== 1041 1042.. _appel89: 1043 1044[Appel89] Runtime Tags Aren't Necessary. Andrew W. Appel. Lisp and Symbolic 1045Computation 19(7):703-705, July 1989. 1046 1047.. _goldberg91: 1048 1049[Goldberg91] Tag-free garbage collection for strongly typed programming 1050languages. Benjamin Goldberg. ACM SIGPLAN PLDI'91. 1051 1052.. _tolmach94: 1053 1054[Tolmach94] Tag-free garbage collection using explicit type parameters. Andrew 1055Tolmach. Proceedings of the 1994 ACM conference on LISP and functional 1056programming. 1057 1058.. _henderson02: 1059 1060[Henderson2002] `Accurate Garbage Collection in an Uncooperative Environment 1061<http://citeseer.ist.psu.edu/henderson02accurate.html>`__ 1062