1======================== 2LLVM Programmer's Manual 3======================== 4 5.. contents:: 6 :local: 7 8.. warning:: 9 This is always a work in progress. 10 11.. _introduction: 12 13Introduction 14============ 15 16This document is meant to highlight some of the important classes and interfaces 17available in the LLVM source-base. This manual is not intended to explain what 18LLVM is, how it works, and what LLVM code looks like. It assumes that you know 19the basics of LLVM and are interested in writing transformations or otherwise 20analyzing or manipulating the code. 21 22This document should get you oriented so that you can find your way in the 23continuously growing source code that makes up the LLVM infrastructure. Note 24that this manual is not intended to serve as a replacement for reading the 25source code, so if you think there should be a method in one of these classes to 26do something, but it's not listed, check the source. Links to the `doxygen 27<http://llvm.org/doxygen/>`__ sources are provided to make this as easy as 28possible. 29 30The first section of this document describes general information that is useful 31to know when working in the LLVM infrastructure, and the second describes the 32Core LLVM classes. In the future this manual will be extended with information 33describing how to use extension libraries, such as dominator information, CFG 34traversal routines, and useful utilities like the ``InstVisitor`` (`doxygen 35<http://llvm.org/doxygen/InstVisitor_8h-source.html>`__) template. 36 37.. _general: 38 39General Information 40=================== 41 42This section contains general information that is useful if you are working in 43the LLVM source-base, but that isn't specific to any particular API. 44 45.. _stl: 46 47The C++ Standard Template Library 48--------------------------------- 49 50LLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much 51more than you are used to, or have seen before. Because of this, you might want 52to do a little background reading in the techniques used and capabilities of the 53library. There are many good pages that discuss the STL, and several books on 54the subject that you can get, so it will not be discussed in this document. 55 56Here are some useful links: 57 58#. `cppreference.com 59 <http://en.cppreference.com/w/>`_ - an excellent 60 reference for the STL and other parts of the standard C++ library. 61 62#. `C++ In a Nutshell <http://www.tempest-sw.com/cpp/>`_ - This is an O'Reilly 63 book in the making. It has a decent Standard Library Reference that rivals 64 Dinkumware's, and is unfortunately no longer free since the book has been 65 published. 66 67#. `C++ Frequently Asked Questions <http://www.parashift.com/c++-faq-lite/>`_. 68 69#. `SGI's STL Programmer's Guide <http://www.sgi.com/tech/stl/>`_ - Contains a 70 useful `Introduction to the STL 71 <http://www.sgi.com/tech/stl/stl_introduction.html>`_. 72 73#. `Bjarne Stroustrup's C++ Page 74 <http://www.research.att.com/%7Ebs/C++.html>`_. 75 76#. `Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0 77 (even better, get the book) 78 <http://www.mindview.net/Books/TICPP/ThinkingInCPP2e.html>`_. 79 80You are also encouraged to take a look at the :doc:`LLVM Coding Standards 81<CodingStandards>` guide which focuses on how to write maintainable code more 82than where to put your curly braces. 83 84.. _resources: 85 86Other useful references 87----------------------- 88 89#. `Using static and shared libraries across platforms 90 <http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html>`_ 91 92.. _apis: 93 94Important and useful LLVM APIs 95============================== 96 97Here we highlight some LLVM APIs that are generally useful and good to know 98about when writing transformations. 99 100.. _isa: 101 102The ``isa<>``, ``cast<>`` and ``dyn_cast<>`` templates 103------------------------------------------------------ 104 105The LLVM source-base makes extensive use of a custom form of RTTI. These 106templates have many similarities to the C++ ``dynamic_cast<>`` operator, but 107they don't have some drawbacks (primarily stemming from the fact that 108``dynamic_cast<>`` only works on classes that have a v-table). Because they are 109used so often, you must know what they do and how they work. All of these 110templates are defined in the ``llvm/Support/Casting.h`` (`doxygen 111<http://llvm.org/doxygen/Casting_8h-source.html>`__) file (note that you very 112rarely have to include this file directly). 113 114``isa<>``: 115 The ``isa<>`` operator works exactly like the Java "``instanceof``" operator. 116 It returns true or false depending on whether a reference or pointer points to 117 an instance of the specified class. This can be very useful for constraint 118 checking of various sorts (example below). 119 120``cast<>``: 121 The ``cast<>`` operator is a "checked cast" operation. It converts a pointer 122 or reference from a base class to a derived class, causing an assertion 123 failure if it is not really an instance of the right type. This should be 124 used in cases where you have some information that makes you believe that 125 something is of the right type. An example of the ``isa<>`` and ``cast<>`` 126 template is: 127 128 .. code-block:: c++ 129 130 static bool isLoopInvariant(const Value *V, const Loop *L) { 131 if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V)) 132 return true; 133 134 // Otherwise, it must be an instruction... 135 return !L->contains(cast<Instruction>(V)->getParent()); 136 } 137 138 Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``, 139 for that use the ``dyn_cast<>`` operator. 140 141``dyn_cast<>``: 142 The ``dyn_cast<>`` operator is a "checking cast" operation. It checks to see 143 if the operand is of the specified type, and if so, returns a pointer to it 144 (this operator does not work with references). If the operand is not of the 145 correct type, a null pointer is returned. Thus, this works very much like 146 the ``dynamic_cast<>`` operator in C++, and should be used in the same 147 circumstances. Typically, the ``dyn_cast<>`` operator is used in an ``if`` 148 statement or some other flow control statement like this: 149 150 .. code-block:: c++ 151 152 if (auto *AI = dyn_cast<AllocationInst>(Val)) { 153 // ... 154 } 155 156 This form of the ``if`` statement effectively combines together a call to 157 ``isa<>`` and a call to ``cast<>`` into one statement, which is very 158 convenient. 159 160 Note that the ``dyn_cast<>`` operator, like C++'s ``dynamic_cast<>`` or Java's 161 ``instanceof`` operator, can be abused. In particular, you should not use big 162 chained ``if/then/else`` blocks to check for lots of different variants of 163 classes. If you find yourself wanting to do this, it is much cleaner and more 164 efficient to use the ``InstVisitor`` class to dispatch over the instruction 165 type directly. 166 167``cast_or_null<>``: 168 The ``cast_or_null<>`` operator works just like the ``cast<>`` operator, 169 except that it allows for a null pointer as an argument (which it then 170 propagates). This can sometimes be useful, allowing you to combine several 171 null checks into one. 172 173``dyn_cast_or_null<>``: 174 The ``dyn_cast_or_null<>`` operator works just like the ``dyn_cast<>`` 175 operator, except that it allows for a null pointer as an argument (which it 176 then propagates). This can sometimes be useful, allowing you to combine 177 several null checks into one. 178 179These five templates can be used with any classes, whether they have a v-table 180or not. If you want to add support for these templates, see the document 181:doc:`How to set up LLVM-style RTTI for your class hierarchy 182<HowToSetUpLLVMStyleRTTI>` 183 184.. _string_apis: 185 186Passing strings (the ``StringRef`` and ``Twine`` classes) 187--------------------------------------------------------- 188 189Although LLVM generally does not do much string manipulation, we do have several 190important APIs which take strings. Two important examples are the Value class 191-- which has names for instructions, functions, etc. -- and the ``StringMap`` 192class which is used extensively in LLVM and Clang. 193 194These are generic classes, and they need to be able to accept strings which may 195have embedded null characters. Therefore, they cannot simply take a ``const 196char *``, and taking a ``const std::string&`` requires clients to perform a heap 197allocation which is usually unnecessary. Instead, many LLVM APIs use a 198``StringRef`` or a ``const Twine&`` for passing strings efficiently. 199 200.. _StringRef: 201 202The ``StringRef`` class 203^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 204 205The ``StringRef`` data type represents a reference to a constant string (a 206character array and a length) and supports the common operations available on 207``std::string``, but does not require heap allocation. 208 209It can be implicitly constructed using a C style null-terminated string, an 210``std::string``, or explicitly with a character pointer and length. For 211example, the ``StringRef`` find function is declared as: 212 213.. code-block:: c++ 214 215 iterator find(StringRef Key); 216 217and clients can call it using any one of: 218 219.. code-block:: c++ 220 221 Map.find("foo"); // Lookup "foo" 222 Map.find(std::string("bar")); // Lookup "bar" 223 Map.find(StringRef("\0baz", 4)); // Lookup "\0baz" 224 225Similarly, APIs which need to return a string may return a ``StringRef`` 226instance, which can be used directly or converted to an ``std::string`` using 227the ``str`` member function. See ``llvm/ADT/StringRef.h`` (`doxygen 228<http://llvm.org/doxygen/classllvm_1_1StringRef_8h-source.html>`__) for more 229information. 230 231You should rarely use the ``StringRef`` class directly, because it contains 232pointers to external memory it is not generally safe to store an instance of the 233class (unless you know that the external storage will not be freed). 234``StringRef`` is small and pervasive enough in LLVM that it should always be 235passed by value. 236 237The ``Twine`` class 238^^^^^^^^^^^^^^^^^^^ 239 240The ``Twine`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Twine.html>`__) 241class is an efficient way for APIs to accept concatenated strings. For example, 242a common LLVM paradigm is to name one instruction based on the name of another 243instruction with a suffix, for example: 244 245.. code-block:: c++ 246 247 New = CmpInst::Create(..., SO->getName() + ".cmp"); 248 249The ``Twine`` class is effectively a lightweight `rope 250<http://en.wikipedia.org/wiki/Rope_(computer_science)>`_ which points to 251temporary (stack allocated) objects. Twines can be implicitly constructed as 252the result of the plus operator applied to strings (i.e., a C strings, an 253``std::string``, or a ``StringRef``). The twine delays the actual concatenation 254of strings until it is actually required, at which point it can be efficiently 255rendered directly into a character array. This avoids unnecessary heap 256allocation involved in constructing the temporary results of string 257concatenation. See ``llvm/ADT/Twine.h`` (`doxygen 258<http://llvm.org/doxygen/Twine_8h_source.html>`__) and :ref:`here <dss_twine>` 259for more information. 260 261As with a ``StringRef``, ``Twine`` objects point to external memory and should 262almost never be stored or mentioned directly. They are intended solely for use 263when defining a function which should be able to efficiently accept concatenated 264strings. 265 266.. _formatting_strings: 267 268Formatting strings (the ``formatv`` function) 269--------------------------------------------- 270While LLVM doesn't necessarily do a lot of string manipulation and parsing, it 271does do a lot of string formatting. From diagnostic messages, to llvm tool 272outputs such as ``llvm-readobj`` to printing verbose disassembly listings and 273LLDB runtime logging, the need for string formatting is pervasive. 274 275The ``formatv`` is similar in spirit to ``printf``, but uses a different syntax 276which borrows heavily from Python and C#. Unlike ``printf`` it deduces the type 277to be formatted at compile time, so it does not need a format specifier such as 278``%d``. This reduces the mental overhead of trying to construct portable format 279strings, especially for platform-specific types like ``size_t`` or pointer types. 280Unlike both ``printf`` and Python, it additionally fails to compile if LLVM does 281not know how to format the type. These two properties ensure that the function 282is both safer and simpler to use than traditional formatting methods such as 283the ``printf`` family of functions. 284 285Simple formatting 286^^^^^^^^^^^^^^^^^ 287 288A call to ``formatv`` involves a single **format string** consisting of 0 or more 289**replacement sequences**, followed by a variable length list of **replacement values**. 290A replacement sequence is a string of the form ``{N[[,align]:style]}``. 291 292``N`` refers to the 0-based index of the argument from the list of replacement 293values. Note that this means it is possible to reference the same parameter 294multiple times, possibly with different style and/or alignment options, in any order. 295 296``align`` is an optional string specifying the width of the field to format 297the value into, and the alignment of the value within the field. It is specified as 298an optional **alignment style** followed by a positive integral **field width**. The 299alignment style can be one of the characters ``-`` (left align), ``=`` (center align), 300or ``+`` (right align). The default is right aligned. 301 302``style`` is an optional string consisting of a type specific that controls the 303formatting of the value. For example, to format a floating point value as a percentage, 304you can use the style option ``P``. 305 306Custom formatting 307^^^^^^^^^^^^^^^^^ 308 309There are two ways to customize the formatting behavior for a type. 310 3111. Provide a template specialization of ``llvm::format_provider<T>`` for your 312 type ``T`` with the appropriate static format method. 313 314 .. code-block:: c++ 315 316 namespace llvm { 317 template<> 318 struct format_provider<MyFooBar> { 319 static void format(const MyFooBar &V, raw_ostream &Stream, StringRef Style) { 320 // Do whatever is necessary to format `V` into `Stream` 321 } 322 }; 323 void foo() { 324 MyFooBar X; 325 std::string S = formatv("{0}", X); 326 } 327 } 328 329 This is a useful extensibility mechanism for adding support for formatting your own 330 custom types with your own custom Style options. But it does not help when you want 331 to extend the mechanism for formatting a type that the library already knows how to 332 format. For that, we need something else. 333 3342. Provide a **format adapter** inheriting from ``llvm::FormatAdapter<T>``. 335 336 .. code-block:: c++ 337 338 namespace anything { 339 struct format_int_custom : public llvm::FormatAdapter<int> { 340 explicit format_int_custom(int N) : llvm::FormatAdapter<int>(N) {} 341 void format(llvm::raw_ostream &Stream, StringRef Style) override { 342 // Do whatever is necessary to format ``this->Item`` into ``Stream`` 343 } 344 }; 345 } 346 namespace llvm { 347 void foo() { 348 std::string S = formatv("{0}", anything::format_int_custom(42)); 349 } 350 } 351 352 If the type is detected to be derived from ``FormatAdapter<T>``, ``formatv`` 353 will call the 354 ``format`` method on the argument passing in the specified style. This allows 355 one to provide custom formatting of any type, including one which already has 356 a builtin format provider. 357 358``formatv`` Examples 359^^^^^^^^^^^^^^^^^^^^ 360Below is intended to provide an incomplete set of examples demonstrating 361the usage of ``formatv``. More information can be found by reading the 362doxygen documentation or by looking at the unit test suite. 363 364 365.. code-block:: c++ 366 367 std::string S; 368 // Simple formatting of basic types and implicit string conversion. 369 S = formatv("{0} ({1:P})", 7, 0.35); // S == "7 (35.00%)" 370 371 // Out-of-order referencing and multi-referencing 372 outs() << formatv("{0} {2} {1} {0}", 1, "test", 3); // prints "1 3 test 1" 373 374 // Left, right, and center alignment 375 S = formatv("{0,7}", 'a'); // S == " a"; 376 S = formatv("{0,-7}", 'a'); // S == "a "; 377 S = formatv("{0,=7}", 'a'); // S == " a "; 378 S = formatv("{0,+7}", 'a'); // S == " a"; 379 380 // Custom styles 381 S = formatv("{0:N} - {0:x} - {1:E}", 12345, 123908342); // S == "12,345 - 0x3039 - 1.24E8" 382 383 // Adapters 384 S = formatv("{0}", fmt_align(42, AlignStyle::Center, 7)); // S == " 42 " 385 S = formatv("{0}", fmt_repeat("hi", 3)); // S == "hihihi" 386 S = formatv("{0}", fmt_pad("hi", 2, 6)); // S == " hi " 387 388 // Ranges 389 std::vector<int> V = {8, 9, 10}; 390 S = formatv("{0}", make_range(V.begin(), V.end())); // S == "8, 9, 10" 391 S = formatv("{0:$[+]}", make_range(V.begin(), V.end())); // S == "8+9+10" 392 S = formatv("{0:$[ + ]@[x]}", make_range(V.begin(), V.end())); // S == "0x8 + 0x9 + 0xA" 393 394.. _error_apis: 395 396Error handling 397-------------- 398 399Proper error handling helps us identify bugs in our code, and helps end-users 400understand errors in their tool usage. Errors fall into two broad categories: 401*programmatic* and *recoverable*, with different strategies for handling and 402reporting. 403 404Programmatic Errors 405^^^^^^^^^^^^^^^^^^^ 406 407Programmatic errors are violations of program invariants or API contracts, and 408represent bugs within the program itself. Our aim is to document invariants, and 409to abort quickly at the point of failure (providing some basic diagnostic) when 410invariants are broken at runtime. 411 412The fundamental tools for handling programmatic errors are assertions and the 413llvm_unreachable function. Assertions are used to express invariant conditions, 414and should include a message describing the invariant: 415 416.. code-block:: c++ 417 418 assert(isPhysReg(R) && "All virt regs should have been allocated already."); 419 420The llvm_unreachable function can be used to document areas of control flow 421that should never be entered if the program invariants hold: 422 423.. code-block:: c++ 424 425 enum { Foo, Bar, Baz } X = foo(); 426 427 switch (X) { 428 case Foo: /* Handle Foo */; break; 429 case Bar: /* Handle Bar */; break; 430 default: 431 llvm_unreachable("X should be Foo or Bar here"); 432 } 433 434Recoverable Errors 435^^^^^^^^^^^^^^^^^^ 436 437Recoverable errors represent an error in the program's environment, for example 438a resource failure (a missing file, a dropped network connection, etc.), or 439malformed input. These errors should be detected and communicated to a level of 440the program where they can be handled appropriately. Handling the error may be 441as simple as reporting the issue to the user, or it may involve attempts at 442recovery. 443 444Recoverable errors are modeled using LLVM's ``Error`` scheme. This scheme 445represents errors using function return values, similar to classic C integer 446error codes, or C++'s ``std::error_code``. However, the ``Error`` class is 447actually a lightweight wrapper for user-defined error types, allowing arbitrary 448information to be attached to describe the error. This is similar to the way C++ 449exceptions allow throwing of user-defined types. 450 451Success values are created by calling ``Error::success()``, E.g.: 452 453.. code-block:: c++ 454 455 Error foo() { 456 // Do something. 457 // Return success. 458 return Error::success(); 459 } 460 461Success values are very cheap to construct and return - they have minimal 462impact on program performance. 463 464Failure values are constructed using ``make_error<T>``, where ``T`` is any class 465that inherits from the ErrorInfo utility, E.g.: 466 467.. code-block:: c++ 468 469 class BadFileFormat : public ErrorInfo<BadFileFormat> { 470 public: 471 static char ID; 472 std::string Path; 473 474 BadFileFormat(StringRef Path) : Path(Path.str()) {} 475 476 void log(raw_ostream &OS) const override { 477 OS << Path << " is malformed"; 478 } 479 480 std::error_code convertToErrorCode() const override { 481 return make_error_code(object_error::parse_failed); 482 } 483 }; 484 485 char FileExists::ID; // This should be declared in the C++ file. 486 487 Error printFormattedFile(StringRef Path) { 488 if (<check for valid format>) 489 return make_error<InvalidObjectFile>(Path); 490 // print file contents. 491 return Error::success(); 492 } 493 494Error values can be implicitly converted to bool: true for error, false for 495success, enabling the following idiom: 496 497.. code-block:: c++ 498 499 Error mayFail(); 500 501 Error foo() { 502 if (auto Err = mayFail()) 503 return Err; 504 // Success! We can proceed. 505 ... 506 507For functions that can fail but need to return a value the ``Expected<T>`` 508utility can be used. Values of this type can be constructed with either a 509``T``, or an ``Error``. Expected<T> values are also implicitly convertible to 510boolean, but with the opposite convention to ``Error``: true for success, false 511for error. If success, the ``T`` value can be accessed via the dereference 512operator. If failure, the ``Error`` value can be extracted using the 513``takeError()`` method. Idiomatic usage looks like: 514 515.. code-block:: c++ 516 517 Expected<FormattedFile> openFormattedFile(StringRef Path) { 518 // If badly formatted, return an error. 519 if (auto Err = checkFormat(Path)) 520 return std::move(Err); 521 // Otherwise return a FormattedFile instance. 522 return FormattedFile(Path); 523 } 524 525 Error processFormattedFile(StringRef Path) { 526 // Try to open a formatted file 527 if (auto FileOrErr = openFormattedFile(Path)) { 528 // On success, grab a reference to the file and continue. 529 auto &File = *FileOrErr; 530 ... 531 } else 532 // On error, extract the Error value and return it. 533 return FileOrErr.takeError(); 534 } 535 536If an ``Expected<T>`` value is in success mode then the ``takeError()`` method 537will return a success value. Using this fact, the above function can be 538rewritten as: 539 540.. code-block:: c++ 541 542 Error processFormattedFile(StringRef Path) { 543 // Try to open a formatted file 544 auto FileOrErr = openFormattedFile(Path); 545 if (auto Err = FileOrErr.takeError()) 546 // On error, extract the Error value and return it. 547 return Err; 548 // On success, grab a reference to the file and continue. 549 auto &File = *FileOrErr; 550 ... 551 } 552 553This second form is often more readable for functions that involve multiple 554``Expected<T>`` values as it limits the indentation required. 555 556All ``Error`` instances, whether success or failure, must be either checked or 557moved from (via ``std::move`` or a return) before they are destructed. 558Accidentally discarding an unchecked error will cause a program abort at the 559point where the unchecked value's destructor is run, making it easy to identify 560and fix violations of this rule. 561 562Success values are considered checked once they have been tested (by invoking 563the boolean conversion operator): 564 565.. code-block:: c++ 566 567 if (auto Err = canFail(...)) 568 return Err; // Failure value - move error to caller. 569 570 // Safe to continue: Err was checked. 571 572In contrast, the following code will always cause an abort, even if ``canFail`` 573returns a success value: 574 575.. code-block:: c++ 576 577 canFail(); 578 // Program will always abort here, even if canFail() returns Success, since 579 // the value is not checked. 580 581Failure values are considered checked once a handler for the error type has 582been activated: 583 584.. code-block:: c++ 585 586 handleErrors( 587 processFormattedFile(...), 588 [](const BadFileFormat &BFF) { 589 report("Unable to process " + BFF.Path + ": bad format"); 590 }, 591 [](const FileNotFound &FNF) { 592 report("File not found " + FNF.Path); 593 }); 594 595The ``handleErrors`` function takes an error as its first argument, followed by 596a variadic list of "handlers", each of which must be a callable type (a 597function, lambda, or class with a call operator) with one argument. The 598``handleErrors`` function will visit each handler in the sequence and check its 599argument type against the dynamic type of the error, running the first handler 600that matches. This is the same decision process that is used decide which catch 601clause to run for a C++ exception. 602 603Since the list of handlers passed to ``handleErrors`` may not cover every error 604type that can occur, the ``handleErrors`` function also returns an Error value 605that must be checked or propagated. If the error value that is passed to 606``handleErrors`` does not match any of the handlers it will be returned from 607handleErrors. Idiomatic use of ``handleErrors`` thus looks like: 608 609.. code-block:: c++ 610 611 if (auto Err = 612 handleErrors( 613 processFormattedFile(...), 614 [](const BadFileFormat &BFF) { 615 report("Unable to process " + BFF.Path + ": bad format"); 616 }, 617 [](const FileNotFound &FNF) { 618 report("File not found " + FNF.Path); 619 })) 620 return Err; 621 622In cases where you truly know that the handler list is exhaustive the 623``handleAllErrors`` function can be used instead. This is identical to 624``handleErrors`` except that it will terminate the program if an unhandled 625error is passed in, and can therefore return void. The ``handleAllErrors`` 626function should generally be avoided: the introduction of a new error type 627elsewhere in the program can easily turn a formerly exhaustive list of errors 628into a non-exhaustive list, risking unexpected program termination. Where 629possible, use handleErrors and propagate unknown errors up the stack instead. 630 631For tool code, where errors can be handled by printing an error message then 632exiting with an error code, the :ref:`ExitOnError <err_exitonerr>` utility 633may be a better choice than handleErrors, as it simplifies control flow when 634calling fallible functions. 635 636StringError 637""""""""""" 638 639Many kinds of errors have no recovery strategy, the only action that can be 640taken is to report them to the user so that the user can attempt to fix the 641environment. In this case representing the error as a string makes perfect 642sense. LLVM provides the ``StringError`` class for this purpose. It takes two 643arguments: A string error message, and an equivalent ``std::error_code`` for 644interoperability: 645 646.. code-block:: c++ 647 648 make_error<StringError>("Bad executable", 649 make_error_code(errc::executable_format_error")); 650 651If you're certain that the error you're building will never need to be converted 652to a ``std::error_code`` you can use the ``inconvertibleErrorCode()`` function: 653 654.. code-block:: c++ 655 656 make_error<StringError>("Bad executable", inconvertibleErrorCode()); 657 658This should be done only after careful consideration. If any attempt is made to 659convert this error to a ``std::error_code`` it will trigger immediate program 660termination. Unless you are certain that your errors will not need 661interoperability you should look for an existing ``std::error_code`` that you 662can convert to, and even (as painful as it is) consider introducing a new one as 663a stopgap measure. 664 665Interoperability with std::error_code and ErrorOr 666""""""""""""""""""""""""""""""""""""""""""""""""" 667 668Many existing LLVM APIs use ``std::error_code`` and its partner ``ErrorOr<T>`` 669(which plays the same role as ``Expected<T>``, but wraps a ``std::error_code`` 670rather than an ``Error``). The infectious nature of error types means that an 671attempt to change one of these functions to return ``Error`` or ``Expected<T>`` 672instead often results in an avalanche of changes to callers, callers of callers, 673and so on. (The first such attempt, returning an ``Error`` from 674MachOObjectFile's constructor, was abandoned after the diff reached 3000 lines, 675impacted half a dozen libraries, and was still growing). 676 677To solve this problem, the ``Error``/``std::error_code`` interoperability requirement was 678introduced. Two pairs of functions allow any ``Error`` value to be converted to a 679``std::error_code``, any ``Expected<T>`` to be converted to an ``ErrorOr<T>``, and vice 680versa: 681 682.. code-block:: c++ 683 684 std::error_code errorToErrorCode(Error Err); 685 Error errorCodeToError(std::error_code EC); 686 687 template <typename T> ErrorOr<T> expectedToErrorOr(Expected<T> TOrErr); 688 template <typename T> Expected<T> errorOrToExpected(ErrorOr<T> TOrEC); 689 690 691Using these APIs it is easy to make surgical patches that update individual 692functions from ``std::error_code`` to ``Error``, and from ``ErrorOr<T>`` to 693``Expected<T>``. 694 695Returning Errors from error handlers 696"""""""""""""""""""""""""""""""""""" 697 698Error recovery attempts may themselves fail. For that reason, ``handleErrors`` 699actually recognises three different forms of handler signature: 700 701.. code-block:: c++ 702 703 // Error must be handled, no new errors produced: 704 void(UserDefinedError &E); 705 706 // Error must be handled, new errors can be produced: 707 Error(UserDefinedError &E); 708 709 // Original error can be inspected, then re-wrapped and returned (or a new 710 // error can be produced): 711 Error(std::unique_ptr<UserDefinedError> E); 712 713Any error returned from a handler will be returned from the ``handleErrors`` 714function so that it can be handled itself, or propagated up the stack. 715 716.. _err_exitonerr: 717 718Using ExitOnError to simplify tool code 719""""""""""""""""""""""""""""""""""""""" 720 721Library code should never call ``exit`` for a recoverable error, however in tool 722code (especially command line tools) this can be a reasonable approach. Calling 723``exit`` upon encountering an error dramatically simplifies control flow as the 724error no longer needs to be propagated up the stack. This allows code to be 725written in straight-line style, as long as each fallible call is wrapped in a 726check and call to exit. The ``ExitOnError`` class supports this pattern by 727providing call operators that inspect ``Error`` values, stripping the error away 728in the success case and logging to ``stderr`` then exiting in the failure case. 729 730To use this class, declare a global ``ExitOnError`` variable in your program: 731 732.. code-block:: c++ 733 734 ExitOnError ExitOnErr; 735 736Calls to fallible functions can then be wrapped with a call to ``ExitOnErr``, 737turning them into non-failing calls: 738 739.. code-block:: c++ 740 741 Error mayFail(); 742 Expected<int> mayFail2(); 743 744 void foo() { 745 ExitOnErr(mayFail()); 746 int X = ExitOnErr(mayFail2()); 747 } 748 749On failure, the error's log message will be written to ``stderr``, optionally 750preceded by a string "banner" that can be set by calling the setBanner method. A 751mapping can also be supplied from ``Error`` values to exit codes using the 752``setExitCodeMapper`` method: 753 754.. code-block:: c++ 755 756 int main(int argc, char *argv[]) { 757 ExitOnErr.setBanner(std::string(argv[0]) + " error:"); 758 ExitOnErr.setExitCodeMapper( 759 [](const Error &Err) { 760 if (Err.isA<BadFileFormat>()) 761 return 2; 762 return 1; 763 }); 764 765Use ``ExitOnError`` in your tool code where possible as it can greatly improve 766readability. 767 768Fallible constructors 769""""""""""""""""""""" 770 771Some classes require resource acquisition or other complex initialization that 772can fail during construction. Unfortunately constructors can't return errors, 773and having clients test objects after they're constructed to ensure that they're 774valid is error prone as it's all too easy to forget the test. To work around 775this, use the named constructor idiom and return an ``Expected<T>``: 776 777.. code-block:: c++ 778 779 class Foo { 780 public: 781 782 static Expected<Foo> Create(Resource R1, Resource R2) { 783 Error Err; 784 Foo F(R1, R2, Err); 785 if (Err) 786 return std::move(Err); 787 return std::move(F); 788 } 789 790 private: 791 792 Foo(Resource R1, Resource R2, Error &Err) { 793 ErrorAsOutParameter EAO(&Err); 794 if (auto Err2 = R1.acquire()) { 795 Err = std::move(Err2); 796 return; 797 } 798 Err = R2.acquire(); 799 } 800 }; 801 802 803Here, the named constructor passes an ``Error`` by reference into the actual 804constructor, which the constructor can then use to return errors. The 805``ErrorAsOutParameter`` utility sets the ``Error`` value's checked flag on entry 806to the constructor so that the error can be assigned to, then resets it on exit 807to force the client (the named constructor) to check the error. 808 809By using this idiom, clients attempting to construct a Foo receive either a 810well-formed Foo or an Error, never an object in an invalid state. 811 812Propagating and consuming errors based on types 813""""""""""""""""""""""""""""""""""""""""""""""" 814 815In some contexts, certain types of error are known to be benign. For example, 816when walking an archive, some clients may be happy to skip over badly formatted 817object files rather than terminating the walk immediately. Skipping badly 818formatted objects could be achieved using an elaborate handler method, but the 819Error.h header provides two utilities that make this idiom much cleaner: the 820type inspection method, ``isA``, and the ``consumeError`` function: 821 822.. code-block:: c++ 823 824 Error walkArchive(Archive A) { 825 for (unsigned I = 0; I != A.numMembers(); ++I) { 826 auto ChildOrErr = A.getMember(I); 827 if (auto Err = ChildOrErr.takeError()) { 828 if (Err.isA<BadFileFormat>()) 829 consumeError(std::move(Err)) 830 else 831 return Err; 832 } 833 auto &Child = *ChildOrErr; 834 // Use Child 835 ... 836 } 837 return Error::success(); 838 } 839 840Concatenating Errors with joinErrors 841"""""""""""""""""""""""""""""""""""" 842 843In the archive walking example above ``BadFileFormat`` errors are simply 844consumed and ignored. If the client had wanted report these errors after 845completing the walk over the archive they could use the ``joinErrors`` utility: 846 847.. code-block:: c++ 848 849 Error walkArchive(Archive A) { 850 Error DeferredErrs = Error::success(); 851 for (unsigned I = 0; I != A.numMembers(); ++I) { 852 auto ChildOrErr = A.getMember(I); 853 if (auto Err = ChildOrErr.takeError()) 854 if (Err.isA<BadFileFormat>()) 855 DeferredErrs = joinErrors(std::move(DeferredErrs), std::move(Err)); 856 else 857 return Err; 858 auto &Child = *ChildOrErr; 859 // Use Child 860 ... 861 } 862 return DeferredErrs; 863 } 864 865The ``joinErrors`` routine builds a special error type called ``ErrorList``, 866which holds a list of user defined errors. The ``handleErrors`` routine 867recognizes this type and will attempt to handle each of the contained erorrs in 868order. If all contained errors can be handled, ``handleErrors`` will return 869``Error::success()``, otherwise ``handleErrors`` will concatenate the remaining 870errors and return the resulting ``ErrorList``. 871 872Building fallible iterators and iterator ranges 873""""""""""""""""""""""""""""""""""""""""""""""" 874 875The archive walking examples above retrieve archive members by index, however 876this requires considerable boiler-plate for iteration and error checking. We can 877clean this up by using ``Error`` with the "fallible iterator" pattern. The usual 878C++ iterator patterns do not allow for failure on increment, but we can 879incorporate support for it by having iterators hold an Error reference through 880which they can report failure. In this pattern, if an increment operation fails 881the failure is recorded via the Error reference and the iterator value is set to 882the end of the range in order to terminate the loop. This ensures that the 883dereference operation is safe anywhere that an ordinary iterator dereference 884would be safe (i.e. when the iterator is not equal to end). Where this pattern 885is followed (as in the ``llvm::object::Archive`` class) the result is much 886cleaner iteration idiom: 887 888.. code-block:: c++ 889 890 Error Err; 891 for (auto &Child : Ar->children(Err)) { 892 // Use Child - we only enter the loop when it's valid 893 ... 894 } 895 // Check Err after the loop to ensure it didn't break due to an error. 896 if (Err) 897 return Err; 898 899.. _function_apis: 900 901More information on Error and its related utilities can be found in the 902Error.h header file. 903 904Passing functions and other callable objects 905-------------------------------------------- 906 907Sometimes you may want a function to be passed a callback object. In order to 908support lambda expressions and other function objects, you should not use the 909traditional C approach of taking a function pointer and an opaque cookie: 910 911.. code-block:: c++ 912 913 void takeCallback(bool (*Callback)(Function *, void *), void *Cookie); 914 915Instead, use one of the following approaches: 916 917Function template 918^^^^^^^^^^^^^^^^^ 919 920If you don't mind putting the definition of your function into a header file, 921make it a function template that is templated on the callable type. 922 923.. code-block:: c++ 924 925 template<typename Callable> 926 void takeCallback(Callable Callback) { 927 Callback(1, 2, 3); 928 } 929 930The ``function_ref`` class template 931^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 932 933The ``function_ref`` 934(`doxygen <http://llvm.org/docs/doxygen/html/classllvm_1_1function__ref_3_01Ret_07Params_8_8_8_08_4.html>`__) class 935template represents a reference to a callable object, templated over the type 936of the callable. This is a good choice for passing a callback to a function, 937if you don't need to hold onto the callback after the function returns. In this 938way, ``function_ref`` is to ``std::function`` as ``StringRef`` is to 939``std::string``. 940 941``function_ref<Ret(Param1, Param2, ...)>`` can be implicitly constructed from 942any callable object that can be called with arguments of type ``Param1``, 943``Param2``, ..., and returns a value that can be converted to type ``Ret``. 944For example: 945 946.. code-block:: c++ 947 948 void visitBasicBlocks(Function *F, function_ref<bool (BasicBlock*)> Callback) { 949 for (BasicBlock &BB : *F) 950 if (Callback(&BB)) 951 return; 952 } 953 954can be called using: 955 956.. code-block:: c++ 957 958 visitBasicBlocks(F, [&](BasicBlock *BB) { 959 if (process(BB)) 960 return isEmpty(BB); 961 return false; 962 }); 963 964Note that a ``function_ref`` object contains pointers to external memory, so it 965is not generally safe to store an instance of the class (unless you know that 966the external storage will not be freed). If you need this ability, consider 967using ``std::function``. ``function_ref`` is small enough that it should always 968be passed by value. 969 970.. _DEBUG: 971 972The ``DEBUG()`` macro and ``-debug`` option 973------------------------------------------- 974 975Often when working on your pass you will put a bunch of debugging printouts and 976other code into your pass. After you get it working, you want to remove it, but 977you may need it again in the future (to work out new bugs that you run across). 978 979Naturally, because of this, you don't want to delete the debug printouts, but 980you don't want them to always be noisy. A standard compromise is to comment 981them out, allowing you to enable them if you need them in the future. 982 983The ``llvm/Support/Debug.h`` (`doxygen 984<http://llvm.org/doxygen/Debug_8h-source.html>`__) file provides a macro named 985``DEBUG()`` that is a much nicer solution to this problem. Basically, you can 986put arbitrary code into the argument of the ``DEBUG`` macro, and it is only 987executed if '``opt``' (or any other tool) is run with the '``-debug``' command 988line argument: 989 990.. code-block:: c++ 991 992 DEBUG(errs() << "I am here!\n"); 993 994Then you can run your pass like this: 995 996.. code-block:: none 997 998 $ opt < a.bc > /dev/null -mypass 999 <no output> 1000 $ opt < a.bc > /dev/null -mypass -debug 1001 I am here! 1002 1003Using the ``DEBUG()`` macro instead of a home-brewed solution allows you to not 1004have to create "yet another" command line option for the debug output for your 1005pass. Note that ``DEBUG()`` macros are disabled for non-asserts builds, so they 1006do not cause a performance impact at all (for the same reason, they should also 1007not contain side-effects!). 1008 1009One additional nice thing about the ``DEBUG()`` macro is that you can enable or 1010disable it directly in gdb. Just use "``set DebugFlag=0``" or "``set 1011DebugFlag=1``" from the gdb if the program is running. If the program hasn't 1012been started yet, you can always just run it with ``-debug``. 1013 1014.. _DEBUG_TYPE: 1015 1016Fine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option 1017^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1018 1019Sometimes you may find yourself in a situation where enabling ``-debug`` just 1020turns on **too much** information (such as when working on the code generator). 1021If you want to enable debug information with more fine-grained control, you 1022should define the ``DEBUG_TYPE`` macro and use the ``-debug-only`` option as 1023follows: 1024 1025.. code-block:: c++ 1026 1027 #define DEBUG_TYPE "foo" 1028 DEBUG(errs() << "'foo' debug type\n"); 1029 #undef DEBUG_TYPE 1030 #define DEBUG_TYPE "bar" 1031 DEBUG(errs() << "'bar' debug type\n")); 1032 #undef DEBUG_TYPE 1033 1034Then you can run your pass like this: 1035 1036.. code-block:: none 1037 1038 $ opt < a.bc > /dev/null -mypass 1039 <no output> 1040 $ opt < a.bc > /dev/null -mypass -debug 1041 'foo' debug type 1042 'bar' debug type 1043 $ opt < a.bc > /dev/null -mypass -debug-only=foo 1044 'foo' debug type 1045 $ opt < a.bc > /dev/null -mypass -debug-only=bar 1046 'bar' debug type 1047 $ opt < a.bc > /dev/null -mypass -debug-only=foo,bar 1048 'foo' debug type 1049 'bar' debug type 1050 1051Of course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file, 1052to specify the debug type for the entire module. Be careful that you only do 1053this after including Debug.h and not around any #include of headers. Also, you 1054should use names more meaningful than "foo" and "bar", because there is no 1055system in place to ensure that names do not conflict. If two different modules 1056use the same string, they will all be turned on when the name is specified. 1057This allows, for example, all debug information for instruction scheduling to be 1058enabled with ``-debug-only=InstrSched``, even if the source lives in multiple 1059files. The name must not include a comma (,) as that is used to separate the 1060arguments of the ``-debug-only`` option. 1061 1062For performance reasons, -debug-only is not available in optimized build 1063(``--enable-optimized``) of LLVM. 1064 1065The ``DEBUG_WITH_TYPE`` macro is also available for situations where you would 1066like to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement. It 1067takes an additional first parameter, which is the type to use. For example, the 1068preceding example could be written as: 1069 1070.. code-block:: c++ 1071 1072 DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n"); 1073 DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n")); 1074 1075.. _Statistic: 1076 1077The ``Statistic`` class & ``-stats`` option 1078------------------------------------------- 1079 1080The ``llvm/ADT/Statistic.h`` (`doxygen 1081<http://llvm.org/doxygen/Statistic_8h-source.html>`__) file provides a class 1082named ``Statistic`` that is used as a unified way to keep track of what the LLVM 1083compiler is doing and how effective various optimizations are. It is useful to 1084see what optimizations are contributing to making a particular program run 1085faster. 1086 1087Often you may run your pass on some big program, and you're interested to see 1088how many times it makes a certain transformation. Although you can do this with 1089hand inspection, or some ad-hoc method, this is a real pain and not very useful 1090for big programs. Using the ``Statistic`` class makes it very easy to keep 1091track of this information, and the calculated information is presented in a 1092uniform manner with the rest of the passes being executed. 1093 1094There are many examples of ``Statistic`` uses, but the basics of using it are as 1095follows: 1096 1097#. Define your statistic like this: 1098 1099 .. code-block:: c++ 1100 1101 #define DEBUG_TYPE "mypassname" // This goes before any #includes. 1102 STATISTIC(NumXForms, "The # of times I did stuff"); 1103 1104 The ``STATISTIC`` macro defines a static variable, whose name is specified by 1105 the first argument. The pass name is taken from the ``DEBUG_TYPE`` macro, and 1106 the description is taken from the second argument. The variable defined 1107 ("NumXForms" in this case) acts like an unsigned integer. 1108 1109#. Whenever you make a transformation, bump the counter: 1110 1111 .. code-block:: c++ 1112 1113 ++NumXForms; // I did stuff! 1114 1115That's all you have to do. To get '``opt``' to print out the statistics 1116gathered, use the '``-stats``' option: 1117 1118.. code-block:: none 1119 1120 $ opt -stats -mypassname < program.bc > /dev/null 1121 ... statistics output ... 1122 1123Note that in order to use the '``-stats``' option, LLVM must be 1124compiled with assertions enabled. 1125 1126When running ``opt`` on a C file from the SPEC benchmark suite, it gives a 1127report that looks like this: 1128 1129.. code-block:: none 1130 1131 7646 bitcodewriter - Number of normal instructions 1132 725 bitcodewriter - Number of oversized instructions 1133 129996 bitcodewriter - Number of bitcode bytes written 1134 2817 raise - Number of insts DCEd or constprop'd 1135 3213 raise - Number of cast-of-self removed 1136 5046 raise - Number of expression trees converted 1137 75 raise - Number of other getelementptr's formed 1138 138 raise - Number of load/store peepholes 1139 42 deadtypeelim - Number of unused typenames removed from symtab 1140 392 funcresolve - Number of varargs functions resolved 1141 27 globaldce - Number of global variables removed 1142 2 adce - Number of basic blocks removed 1143 134 cee - Number of branches revectored 1144 49 cee - Number of setcc instruction eliminated 1145 532 gcse - Number of loads removed 1146 2919 gcse - Number of instructions removed 1147 86 indvars - Number of canonical indvars added 1148 87 indvars - Number of aux indvars removed 1149 25 instcombine - Number of dead inst eliminate 1150 434 instcombine - Number of insts combined 1151 248 licm - Number of load insts hoisted 1152 1298 licm - Number of insts hoisted to a loop pre-header 1153 3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header) 1154 75 mem2reg - Number of alloca's promoted 1155 1444 cfgsimplify - Number of blocks simplified 1156 1157Obviously, with so many optimizations, having a unified framework for this stuff 1158is very nice. Making your pass fit well into the framework makes it more 1159maintainable and useful. 1160 1161.. _ViewGraph: 1162 1163Viewing graphs while debugging code 1164----------------------------------- 1165 1166Several of the important data structures in LLVM are graphs: for example CFGs 1167made out of LLVM :ref:`BasicBlocks <BasicBlock>`, CFGs made out of LLVM 1168:ref:`MachineBasicBlocks <MachineBasicBlock>`, and :ref:`Instruction Selection 1169DAGs <SelectionDAG>`. In many cases, while debugging various parts of the 1170compiler, it is nice to instantly visualize these graphs. 1171 1172LLVM provides several callbacks that are available in a debug build to do 1173exactly that. If you call the ``Function::viewCFG()`` method, for example, the 1174current LLVM tool will pop up a window containing the CFG for the function where 1175each basic block is a node in the graph, and each node contains the instructions 1176in the block. Similarly, there also exists ``Function::viewCFGOnly()`` (does 1177not include the instructions), the ``MachineFunction::viewCFG()`` and 1178``MachineFunction::viewCFGOnly()``, and the ``SelectionDAG::viewGraph()`` 1179methods. Within GDB, for example, you can usually use something like ``call 1180DAG.viewGraph()`` to pop up a window. Alternatively, you can sprinkle calls to 1181these functions in your code in places you want to debug. 1182 1183Getting this to work requires a small amount of setup. On Unix systems 1184with X11, install the `graphviz <http://www.graphviz.org>`_ toolkit, and make 1185sure 'dot' and 'gv' are in your path. If you are running on Mac OS X, download 1186and install the Mac OS X `Graphviz program 1187<http://www.pixelglow.com/graphviz/>`_ and add 1188``/Applications/Graphviz.app/Contents/MacOS/`` (or wherever you install it) to 1189your path. The programs need not be present when configuring, building or 1190running LLVM and can simply be installed when needed during an active debug 1191session. 1192 1193``SelectionDAG`` has been extended to make it easier to locate *interesting* 1194nodes in large complex graphs. From gdb, if you ``call DAG.setGraphColor(node, 1195"color")``, then the next ``call DAG.viewGraph()`` would highlight the node in 1196the specified color (choices of colors can be found at `colors 1197<http://www.graphviz.org/doc/info/colors.html>`_.) More complex node attributes 1198can be provided with ``call DAG.setGraphAttrs(node, "attributes")`` (choices can 1199be found at `Graph attributes <http://www.graphviz.org/doc/info/attrs.html>`_.) 1200If you want to restart and clear all the current graph attributes, then you can 1201``call DAG.clearGraphAttrs()``. 1202 1203Note that graph visualization features are compiled out of Release builds to 1204reduce file size. This means that you need a Debug+Asserts or Release+Asserts 1205build to use these features. 1206 1207.. _datastructure: 1208 1209Picking the Right Data Structure for a Task 1210=========================================== 1211 1212LLVM has a plethora of data structures in the ``llvm/ADT/`` directory, and we 1213commonly use STL data structures. This section describes the trade-offs you 1214should consider when you pick one. 1215 1216The first step is a choose your own adventure: do you want a sequential 1217container, a set-like container, or a map-like container? The most important 1218thing when choosing a container is the algorithmic properties of how you plan to 1219access the container. Based on that, you should use: 1220 1221 1222* a :ref:`map-like <ds_map>` container if you need efficient look-up of a 1223 value based on another value. Map-like containers also support efficient 1224 queries for containment (whether a key is in the map). Map-like containers 1225 generally do not support efficient reverse mapping (values to keys). If you 1226 need that, use two maps. Some map-like containers also support efficient 1227 iteration through the keys in sorted order. Map-like containers are the most 1228 expensive sort, only use them if you need one of these capabilities. 1229 1230* a :ref:`set-like <ds_set>` container if you need to put a bunch of stuff into 1231 a container that automatically eliminates duplicates. Some set-like 1232 containers support efficient iteration through the elements in sorted order. 1233 Set-like containers are more expensive than sequential containers. 1234 1235* a :ref:`sequential <ds_sequential>` container provides the most efficient way 1236 to add elements and keeps track of the order they are added to the collection. 1237 They permit duplicates and support efficient iteration, but do not support 1238 efficient look-up based on a key. 1239 1240* a :ref:`string <ds_string>` container is a specialized sequential container or 1241 reference structure that is used for character or byte arrays. 1242 1243* a :ref:`bit <ds_bit>` container provides an efficient way to store and 1244 perform set operations on sets of numeric id's, while automatically 1245 eliminating duplicates. Bit containers require a maximum of 1 bit for each 1246 identifier you want to store. 1247 1248Once the proper category of container is determined, you can fine tune the 1249memory use, constant factors, and cache behaviors of access by intelligently 1250picking a member of the category. Note that constant factors and cache behavior 1251can be a big deal. If you have a vector that usually only contains a few 1252elements (but could contain many), for example, it's much better to use 1253:ref:`SmallVector <dss_smallvector>` than :ref:`vector <dss_vector>`. Doing so 1254avoids (relatively) expensive malloc/free calls, which dwarf the cost of adding 1255the elements to the container. 1256 1257.. _ds_sequential: 1258 1259Sequential Containers (std::vector, std::list, etc) 1260--------------------------------------------------- 1261 1262There are a variety of sequential containers available for you, based on your 1263needs. Pick the first in this section that will do what you want. 1264 1265.. _dss_arrayref: 1266 1267llvm/ADT/ArrayRef.h 1268^^^^^^^^^^^^^^^^^^^ 1269 1270The ``llvm::ArrayRef`` class is the preferred class to use in an interface that 1271accepts a sequential list of elements in memory and just reads from them. By 1272taking an ``ArrayRef``, the API can be passed a fixed size array, an 1273``std::vector``, an ``llvm::SmallVector`` and anything else that is contiguous 1274in memory. 1275 1276.. _dss_fixedarrays: 1277 1278Fixed Size Arrays 1279^^^^^^^^^^^^^^^^^ 1280 1281Fixed size arrays are very simple and very fast. They are good if you know 1282exactly how many elements you have, or you have a (low) upper bound on how many 1283you have. 1284 1285.. _dss_heaparrays: 1286 1287Heap Allocated Arrays 1288^^^^^^^^^^^^^^^^^^^^^ 1289 1290Heap allocated arrays (``new[]`` + ``delete[]``) are also simple. They are good 1291if the number of elements is variable, if you know how many elements you will 1292need before the array is allocated, and if the array is usually large (if not, 1293consider a :ref:`SmallVector <dss_smallvector>`). The cost of a heap allocated 1294array is the cost of the new/delete (aka malloc/free). Also note that if you 1295are allocating an array of a type with a constructor, the constructor and 1296destructors will be run for every element in the array (re-sizable vectors only 1297construct those elements actually used). 1298 1299.. _dss_tinyptrvector: 1300 1301llvm/ADT/TinyPtrVector.h 1302^^^^^^^^^^^^^^^^^^^^^^^^ 1303 1304``TinyPtrVector<Type>`` is a highly specialized collection class that is 1305optimized to avoid allocation in the case when a vector has zero or one 1306elements. It has two major restrictions: 1) it can only hold values of pointer 1307type, and 2) it cannot hold a null pointer. 1308 1309Since this container is highly specialized, it is rarely used. 1310 1311.. _dss_smallvector: 1312 1313llvm/ADT/SmallVector.h 1314^^^^^^^^^^^^^^^^^^^^^^ 1315 1316``SmallVector<Type, N>`` is a simple class that looks and smells just like 1317``vector<Type>``: it supports efficient iteration, lays out elements in memory 1318order (so you can do pointer arithmetic between elements), supports efficient 1319push_back/pop_back operations, supports efficient random access to its elements, 1320etc. 1321 1322The advantage of SmallVector is that it allocates space for some number of 1323elements (N) **in the object itself**. Because of this, if the SmallVector is 1324dynamically smaller than N, no malloc is performed. This can be a big win in 1325cases where the malloc/free call is far more expensive than the code that 1326fiddles around with the elements. 1327 1328This is good for vectors that are "usually small" (e.g. the number of 1329predecessors/successors of a block is usually less than 8). On the other hand, 1330this makes the size of the SmallVector itself large, so you don't want to 1331allocate lots of them (doing so will waste a lot of space). As such, 1332SmallVectors are most useful when on the stack. 1333 1334SmallVector also provides a nice portable and efficient replacement for 1335``alloca``. 1336 1337.. note:: 1338 1339 Prefer to use ``SmallVectorImpl<T>`` as a parameter type. 1340 1341 In APIs that don't care about the "small size" (most?), prefer to use 1342 the ``SmallVectorImpl<T>`` class, which is basically just the "vector 1343 header" (and methods) without the elements allocated after it. Note that 1344 ``SmallVector<T, N>`` inherits from ``SmallVectorImpl<T>`` so the 1345 conversion is implicit and costs nothing. E.g. 1346 1347 .. code-block:: c++ 1348 1349 // BAD: Clients cannot pass e.g. SmallVector<Foo, 4>. 1350 hardcodedSmallSize(SmallVector<Foo, 2> &Out); 1351 // GOOD: Clients can pass any SmallVector<Foo, N>. 1352 allowsAnySmallSize(SmallVectorImpl<Foo> &Out); 1353 1354 void someFunc() { 1355 SmallVector<Foo, 8> Vec; 1356 hardcodedSmallSize(Vec); // Error. 1357 allowsAnySmallSize(Vec); // Works. 1358 } 1359 1360 Even though it has "``Impl``" in the name, this is so widely used that 1361 it really isn't "private to the implementation" anymore. A name like 1362 ``SmallVectorHeader`` would be more appropriate. 1363 1364.. _dss_vector: 1365 1366<vector> 1367^^^^^^^^ 1368 1369``std::vector`` is well loved and respected. It is useful when SmallVector 1370isn't: when the size of the vector is often large (thus the small optimization 1371will rarely be a benefit) or if you will be allocating many instances of the 1372vector itself (which would waste space for elements that aren't in the 1373container). vector is also useful when interfacing with code that expects 1374vectors :). 1375 1376One worthwhile note about std::vector: avoid code like this: 1377 1378.. code-block:: c++ 1379 1380 for ( ... ) { 1381 std::vector<foo> V; 1382 // make use of V. 1383 } 1384 1385Instead, write this as: 1386 1387.. code-block:: c++ 1388 1389 std::vector<foo> V; 1390 for ( ... ) { 1391 // make use of V. 1392 V.clear(); 1393 } 1394 1395Doing so will save (at least) one heap allocation and free per iteration of the 1396loop. 1397 1398.. _dss_deque: 1399 1400<deque> 1401^^^^^^^ 1402 1403``std::deque`` is, in some senses, a generalized version of ``std::vector``. 1404Like ``std::vector``, it provides constant time random access and other similar 1405properties, but it also provides efficient access to the front of the list. It 1406does not guarantee continuity of elements within memory. 1407 1408In exchange for this extra flexibility, ``std::deque`` has significantly higher 1409constant factor costs than ``std::vector``. If possible, use ``std::vector`` or 1410something cheaper. 1411 1412.. _dss_list: 1413 1414<list> 1415^^^^^^ 1416 1417``std::list`` is an extremely inefficient class that is rarely useful. It 1418performs a heap allocation for every element inserted into it, thus having an 1419extremely high constant factor, particularly for small data types. 1420``std::list`` also only supports bidirectional iteration, not random access 1421iteration. 1422 1423In exchange for this high cost, std::list supports efficient access to both ends 1424of the list (like ``std::deque``, but unlike ``std::vector`` or 1425``SmallVector``). In addition, the iterator invalidation characteristics of 1426std::list are stronger than that of a vector class: inserting or removing an 1427element into the list does not invalidate iterator or pointers to other elements 1428in the list. 1429 1430.. _dss_ilist: 1431 1432llvm/ADT/ilist.h 1433^^^^^^^^^^^^^^^^ 1434 1435``ilist<T>`` implements an 'intrusive' doubly-linked list. It is intrusive, 1436because it requires the element to store and provide access to the prev/next 1437pointers for the list. 1438 1439``ilist`` has the same drawbacks as ``std::list``, and additionally requires an 1440``ilist_traits`` implementation for the element type, but it provides some novel 1441characteristics. In particular, it can efficiently store polymorphic objects, 1442the traits class is informed when an element is inserted or removed from the 1443list, and ``ilist``\ s are guaranteed to support a constant-time splice 1444operation. 1445 1446These properties are exactly what we want for things like ``Instruction``\ s and 1447basic blocks, which is why these are implemented with ``ilist``\ s. 1448 1449Related classes of interest are explained in the following subsections: 1450 1451* :ref:`ilist_traits <dss_ilist_traits>` 1452 1453* :ref:`iplist <dss_iplist>` 1454 1455* :ref:`llvm/ADT/ilist_node.h <dss_ilist_node>` 1456 1457* :ref:`Sentinels <dss_ilist_sentinel>` 1458 1459.. _dss_packedvector: 1460 1461llvm/ADT/PackedVector.h 1462^^^^^^^^^^^^^^^^^^^^^^^ 1463 1464Useful for storing a vector of values using only a few number of bits for each 1465value. Apart from the standard operations of a vector-like container, it can 1466also perform an 'or' set operation. 1467 1468For example: 1469 1470.. code-block:: c++ 1471 1472 enum State { 1473 None = 0x0, 1474 FirstCondition = 0x1, 1475 SecondCondition = 0x2, 1476 Both = 0x3 1477 }; 1478 1479 State get() { 1480 PackedVector<State, 2> Vec1; 1481 Vec1.push_back(FirstCondition); 1482 1483 PackedVector<State, 2> Vec2; 1484 Vec2.push_back(SecondCondition); 1485 1486 Vec1 |= Vec2; 1487 return Vec1[0]; // returns 'Both'. 1488 } 1489 1490.. _dss_ilist_traits: 1491 1492ilist_traits 1493^^^^^^^^^^^^ 1494 1495``ilist_traits<T>`` is ``ilist<T>``'s customization mechanism. ``iplist<T>`` 1496(and consequently ``ilist<T>``) publicly derive from this traits class. 1497 1498.. _dss_iplist: 1499 1500iplist 1501^^^^^^ 1502 1503``iplist<T>`` is ``ilist<T>``'s base and as such supports a slightly narrower 1504interface. Notably, inserters from ``T&`` are absent. 1505 1506``ilist_traits<T>`` is a public base of this class and can be used for a wide 1507variety of customizations. 1508 1509.. _dss_ilist_node: 1510 1511llvm/ADT/ilist_node.h 1512^^^^^^^^^^^^^^^^^^^^^ 1513 1514``ilist_node<T>`` implements the forward and backward links that are expected 1515by the ``ilist<T>`` (and analogous containers) in the default manner. 1516 1517``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually 1518``T`` publicly derives from ``ilist_node<T>``. 1519 1520.. _dss_ilist_sentinel: 1521 1522Sentinels 1523^^^^^^^^^ 1524 1525``ilist``\ s have another specialty that must be considered. To be a good 1526citizen in the C++ ecosystem, it needs to support the standard container 1527operations, such as ``begin`` and ``end`` iterators, etc. Also, the 1528``operator--`` must work correctly on the ``end`` iterator in the case of 1529non-empty ``ilist``\ s. 1530 1531The only sensible solution to this problem is to allocate a so-called *sentinel* 1532along with the intrusive list, which serves as the ``end`` iterator, providing 1533the back-link to the last element. However conforming to the C++ convention it 1534is illegal to ``operator++`` beyond the sentinel and it also must not be 1535dereferenced. 1536 1537These constraints allow for some implementation freedom to the ``ilist`` how to 1538allocate and store the sentinel. The corresponding policy is dictated by 1539``ilist_traits<T>``. By default a ``T`` gets heap-allocated whenever the need 1540for a sentinel arises. 1541 1542While the default policy is sufficient in most cases, it may break down when 1543``T`` does not provide a default constructor. Also, in the case of many 1544instances of ``ilist``\ s, the memory overhead of the associated sentinels is 1545wasted. To alleviate the situation with numerous and voluminous 1546``T``-sentinels, sometimes a trick is employed, leading to *ghostly sentinels*. 1547 1548Ghostly sentinels are obtained by specially-crafted ``ilist_traits<T>`` which 1549superpose the sentinel with the ``ilist`` instance in memory. Pointer 1550arithmetic is used to obtain the sentinel, which is relative to the ``ilist``'s 1551``this`` pointer. The ``ilist`` is augmented by an extra pointer, which serves 1552as the back-link of the sentinel. This is the only field in the ghostly 1553sentinel which can be legally accessed. 1554 1555.. _dss_other: 1556 1557Other Sequential Container options 1558^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1559 1560Other STL containers are available, such as ``std::string``. 1561 1562There are also various STL adapter classes such as ``std::queue``, 1563``std::priority_queue``, ``std::stack``, etc. These provide simplified access 1564to an underlying container but don't affect the cost of the container itself. 1565 1566.. _ds_string: 1567 1568String-like containers 1569---------------------- 1570 1571There are a variety of ways to pass around and use strings in C and C++, and 1572LLVM adds a few new options to choose from. Pick the first option on this list 1573that will do what you need, they are ordered according to their relative cost. 1574 1575Note that it is generally preferred to *not* pass strings around as ``const 1576char*``'s. These have a number of problems, including the fact that they 1577cannot represent embedded nul ("\0") characters, and do not have a length 1578available efficiently. The general replacement for '``const char*``' is 1579StringRef. 1580 1581For more information on choosing string containers for APIs, please see 1582:ref:`Passing Strings <string_apis>`. 1583 1584.. _dss_stringref: 1585 1586llvm/ADT/StringRef.h 1587^^^^^^^^^^^^^^^^^^^^ 1588 1589The StringRef class is a simple value class that contains a pointer to a 1590character and a length, and is quite related to the :ref:`ArrayRef 1591<dss_arrayref>` class (but specialized for arrays of characters). Because 1592StringRef carries a length with it, it safely handles strings with embedded nul 1593characters in it, getting the length does not require a strlen call, and it even 1594has very convenient APIs for slicing and dicing the character range that it 1595represents. 1596 1597StringRef is ideal for passing simple strings around that are known to be live, 1598either because they are C string literals, std::string, a C array, or a 1599SmallVector. Each of these cases has an efficient implicit conversion to 1600StringRef, which doesn't result in a dynamic strlen being executed. 1601 1602StringRef has a few major limitations which make more powerful string containers 1603useful: 1604 1605#. You cannot directly convert a StringRef to a 'const char*' because there is 1606 no way to add a trailing nul (unlike the .c_str() method on various stronger 1607 classes). 1608 1609#. StringRef doesn't own or keep alive the underlying string bytes. 1610 As such it can easily lead to dangling pointers, and is not suitable for 1611 embedding in datastructures in most cases (instead, use an std::string or 1612 something like that). 1613 1614#. For the same reason, StringRef cannot be used as the return value of a 1615 method if the method "computes" the result string. Instead, use std::string. 1616 1617#. StringRef's do not allow you to mutate the pointed-to string bytes and it 1618 doesn't allow you to insert or remove bytes from the range. For editing 1619 operations like this, it interoperates with the :ref:`Twine <dss_twine>` 1620 class. 1621 1622Because of its strengths and limitations, it is very common for a function to 1623take a StringRef and for a method on an object to return a StringRef that points 1624into some string that it owns. 1625 1626.. _dss_twine: 1627 1628llvm/ADT/Twine.h 1629^^^^^^^^^^^^^^^^ 1630 1631The Twine class is used as an intermediary datatype for APIs that want to take a 1632string that can be constructed inline with a series of concatenations. Twine 1633works by forming recursive instances of the Twine datatype (a simple value 1634object) on the stack as temporary objects, linking them together into a tree 1635which is then linearized when the Twine is consumed. Twine is only safe to use 1636as the argument to a function, and should always be a const reference, e.g.: 1637 1638.. code-block:: c++ 1639 1640 void foo(const Twine &T); 1641 ... 1642 StringRef X = ... 1643 unsigned i = ... 1644 foo(X + "." + Twine(i)); 1645 1646This example forms a string like "blarg.42" by concatenating the values 1647together, and does not form intermediate strings containing "blarg" or "blarg.". 1648 1649Because Twine is constructed with temporary objects on the stack, and because 1650these instances are destroyed at the end of the current statement, it is an 1651inherently dangerous API. For example, this simple variant contains undefined 1652behavior and will probably crash: 1653 1654.. code-block:: c++ 1655 1656 void foo(const Twine &T); 1657 ... 1658 StringRef X = ... 1659 unsigned i = ... 1660 const Twine &Tmp = X + "." + Twine(i); 1661 foo(Tmp); 1662 1663... because the temporaries are destroyed before the call. That said, Twine's 1664are much more efficient than intermediate std::string temporaries, and they work 1665really well with StringRef. Just be aware of their limitations. 1666 1667.. _dss_smallstring: 1668 1669llvm/ADT/SmallString.h 1670^^^^^^^^^^^^^^^^^^^^^^ 1671 1672SmallString is a subclass of :ref:`SmallVector <dss_smallvector>` that adds some 1673convenience APIs like += that takes StringRef's. SmallString avoids allocating 1674memory in the case when the preallocated space is enough to hold its data, and 1675it calls back to general heap allocation when required. Since it owns its data, 1676it is very safe to use and supports full mutation of the string. 1677 1678Like SmallVector's, the big downside to SmallString is their sizeof. While they 1679are optimized for small strings, they themselves are not particularly small. 1680This means that they work great for temporary scratch buffers on the stack, but 1681should not generally be put into the heap: it is very rare to see a SmallString 1682as the member of a frequently-allocated heap data structure or returned 1683by-value. 1684 1685.. _dss_stdstring: 1686 1687std::string 1688^^^^^^^^^^^ 1689 1690The standard C++ std::string class is a very general class that (like 1691SmallString) owns its underlying data. sizeof(std::string) is very reasonable 1692so it can be embedded into heap data structures and returned by-value. On the 1693other hand, std::string is highly inefficient for inline editing (e.g. 1694concatenating a bunch of stuff together) and because it is provided by the 1695standard library, its performance characteristics depend a lot of the host 1696standard library (e.g. libc++ and MSVC provide a highly optimized string class, 1697GCC contains a really slow implementation). 1698 1699The major disadvantage of std::string is that almost every operation that makes 1700them larger can allocate memory, which is slow. As such, it is better to use 1701SmallVector or Twine as a scratch buffer, but then use std::string to persist 1702the result. 1703 1704.. _ds_set: 1705 1706Set-Like Containers (std::set, SmallSet, SetVector, etc) 1707-------------------------------------------------------- 1708 1709Set-like containers are useful when you need to canonicalize multiple values 1710into a single representation. There are several different choices for how to do 1711this, providing various trade-offs. 1712 1713.. _dss_sortedvectorset: 1714 1715A sorted 'vector' 1716^^^^^^^^^^^^^^^^^ 1717 1718If you intend to insert a lot of elements, then do a lot of queries, a great 1719approach is to use a vector (or other sequential container) with 1720std::sort+std::unique to remove duplicates. This approach works really well if 1721your usage pattern has these two distinct phases (insert then query), and can be 1722coupled with a good choice of :ref:`sequential container <ds_sequential>`. 1723 1724This combination provides the several nice properties: the result data is 1725contiguous in memory (good for cache locality), has few allocations, is easy to 1726address (iterators in the final vector are just indices or pointers), and can be 1727efficiently queried with a standard binary search (e.g. 1728``std::lower_bound``; if you want the whole range of elements comparing 1729equal, use ``std::equal_range``). 1730 1731.. _dss_smallset: 1732 1733llvm/ADT/SmallSet.h 1734^^^^^^^^^^^^^^^^^^^ 1735 1736If you have a set-like data structure that is usually small and whose elements 1737are reasonably small, a ``SmallSet<Type, N>`` is a good choice. This set has 1738space for N elements in place (thus, if the set is dynamically smaller than N, 1739no malloc traffic is required) and accesses them with a simple linear search. 1740When the set grows beyond N elements, it allocates a more expensive 1741representation that guarantees efficient access (for most types, it falls back 1742to :ref:`std::set <dss_set>`, but for pointers it uses something far better, 1743:ref:`SmallPtrSet <dss_smallptrset>`. 1744 1745The magic of this class is that it handles small sets extremely efficiently, but 1746gracefully handles extremely large sets without loss of efficiency. The 1747drawback is that the interface is quite small: it supports insertion, queries 1748and erasing, but does not support iteration. 1749 1750.. _dss_smallptrset: 1751 1752llvm/ADT/SmallPtrSet.h 1753^^^^^^^^^^^^^^^^^^^^^^ 1754 1755``SmallPtrSet`` has all the advantages of ``SmallSet`` (and a ``SmallSet`` of 1756pointers is transparently implemented with a ``SmallPtrSet``), but also supports 1757iterators. If more than N insertions are performed, a single quadratically 1758probed hash table is allocated and grows as needed, providing extremely 1759efficient access (constant time insertion/deleting/queries with low constant 1760factors) and is very stingy with malloc traffic. 1761 1762Note that, unlike :ref:`std::set <dss_set>`, the iterators of ``SmallPtrSet`` 1763are invalidated whenever an insertion occurs. Also, the values visited by the 1764iterators are not visited in sorted order. 1765 1766.. _dss_stringset: 1767 1768llvm/ADT/StringSet.h 1769^^^^^^^^^^^^^^^^^^^^ 1770 1771``StringSet`` is a thin wrapper around :ref:`StringMap\<char\> <dss_stringmap>`, 1772and it allows efficient storage and retrieval of unique strings. 1773 1774Functionally analogous to ``SmallSet<StringRef>``, ``StringSet`` also supports 1775iteration. (The iterator dereferences to a ``StringMapEntry<char>``, so you 1776need to call ``i->getKey()`` to access the item of the StringSet.) On the 1777other hand, ``StringSet`` doesn't support range-insertion and 1778copy-construction, which :ref:`SmallSet <dss_smallset>` and :ref:`SmallPtrSet 1779<dss_smallptrset>` do support. 1780 1781.. _dss_denseset: 1782 1783llvm/ADT/DenseSet.h 1784^^^^^^^^^^^^^^^^^^^ 1785 1786DenseSet is a simple quadratically probed hash table. It excels at supporting 1787small values: it uses a single allocation to hold all of the pairs that are 1788currently inserted in the set. DenseSet is a great way to unique small values 1789that are not simple pointers (use :ref:`SmallPtrSet <dss_smallptrset>` for 1790pointers). Note that DenseSet has the same requirements for the value type that 1791:ref:`DenseMap <dss_densemap>` has. 1792 1793.. _dss_sparseset: 1794 1795llvm/ADT/SparseSet.h 1796^^^^^^^^^^^^^^^^^^^^ 1797 1798SparseSet holds a small number of objects identified by unsigned keys of 1799moderate size. It uses a lot of memory, but provides operations that are almost 1800as fast as a vector. Typical keys are physical registers, virtual registers, or 1801numbered basic blocks. 1802 1803SparseSet is useful for algorithms that need very fast clear/find/insert/erase 1804and fast iteration over small sets. It is not intended for building composite 1805data structures. 1806 1807.. _dss_sparsemultiset: 1808 1809llvm/ADT/SparseMultiSet.h 1810^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1811 1812SparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet's 1813desirable attributes. Like SparseSet, it typically uses a lot of memory, but 1814provides operations that are almost as fast as a vector. Typical keys are 1815physical registers, virtual registers, or numbered basic blocks. 1816 1817SparseMultiSet is useful for algorithms that need very fast 1818clear/find/insert/erase of the entire collection, and iteration over sets of 1819elements sharing a key. It is often a more efficient choice than using composite 1820data structures (e.g. vector-of-vectors, map-of-vectors). It is not intended for 1821building composite data structures. 1822 1823.. _dss_FoldingSet: 1824 1825llvm/ADT/FoldingSet.h 1826^^^^^^^^^^^^^^^^^^^^^ 1827 1828FoldingSet is an aggregate class that is really good at uniquing 1829expensive-to-create or polymorphic objects. It is a combination of a chained 1830hash table with intrusive links (uniqued objects are required to inherit from 1831FoldingSetNode) that uses :ref:`SmallVector <dss_smallvector>` as part of its ID 1832process. 1833 1834Consider a case where you want to implement a "getOrCreateFoo" method for a 1835complex object (for example, a node in the code generator). The client has a 1836description of **what** it wants to generate (it knows the opcode and all the 1837operands), but we don't want to 'new' a node, then try inserting it into a set 1838only to find out it already exists, at which point we would have to delete it 1839and return the node that already exists. 1840 1841To support this style of client, FoldingSet perform a query with a 1842FoldingSetNodeID (which wraps SmallVector) that can be used to describe the 1843element that we want to query for. The query either returns the element 1844matching the ID or it returns an opaque ID that indicates where insertion should 1845take place. Construction of the ID usually does not require heap traffic. 1846 1847Because FoldingSet uses intrusive links, it can support polymorphic objects in 1848the set (for example, you can have SDNode instances mixed with LoadSDNodes). 1849Because the elements are individually allocated, pointers to the elements are 1850stable: inserting or removing elements does not invalidate any pointers to other 1851elements. 1852 1853.. _dss_set: 1854 1855<set> 1856^^^^^ 1857 1858``std::set`` is a reasonable all-around set class, which is decent at many 1859things but great at nothing. std::set allocates memory for each element 1860inserted (thus it is very malloc intensive) and typically stores three pointers 1861per element in the set (thus adding a large amount of per-element space 1862overhead). It offers guaranteed log(n) performance, which is not particularly 1863fast from a complexity standpoint (particularly if the elements of the set are 1864expensive to compare, like strings), and has extremely high constant factors for 1865lookup, insertion and removal. 1866 1867The advantages of std::set are that its iterators are stable (deleting or 1868inserting an element from the set does not affect iterators or pointers to other 1869elements) and that iteration over the set is guaranteed to be in sorted order. 1870If the elements in the set are large, then the relative overhead of the pointers 1871and malloc traffic is not a big deal, but if the elements of the set are small, 1872std::set is almost never a good choice. 1873 1874.. _dss_setvector: 1875 1876llvm/ADT/SetVector.h 1877^^^^^^^^^^^^^^^^^^^^ 1878 1879LLVM's ``SetVector<Type>`` is an adapter class that combines your choice of a 1880set-like container along with a :ref:`Sequential Container <ds_sequential>` The 1881important property that this provides is efficient insertion with uniquing 1882(duplicate elements are ignored) with iteration support. It implements this by 1883inserting elements into both a set-like container and the sequential container, 1884using the set-like container for uniquing and the sequential container for 1885iteration. 1886 1887The difference between SetVector and other sets is that the order of iteration 1888is guaranteed to match the order of insertion into the SetVector. This property 1889is really important for things like sets of pointers. Because pointer values 1890are non-deterministic (e.g. vary across runs of the program on different 1891machines), iterating over the pointers in the set will not be in a well-defined 1892order. 1893 1894The drawback of SetVector is that it requires twice as much space as a normal 1895set and has the sum of constant factors from the set-like container and the 1896sequential container that it uses. Use it **only** if you need to iterate over 1897the elements in a deterministic order. SetVector is also expensive to delete 1898elements out of (linear time), unless you use its "pop_back" method, which is 1899faster. 1900 1901``SetVector`` is an adapter class that defaults to using ``std::vector`` and a 1902size 16 ``SmallSet`` for the underlying containers, so it is quite expensive. 1903However, ``"llvm/ADT/SetVector.h"`` also provides a ``SmallSetVector`` class, 1904which defaults to using a ``SmallVector`` and ``SmallSet`` of a specified size. 1905If you use this, and if your sets are dynamically smaller than ``N``, you will 1906save a lot of heap traffic. 1907 1908.. _dss_uniquevector: 1909 1910llvm/ADT/UniqueVector.h 1911^^^^^^^^^^^^^^^^^^^^^^^ 1912 1913UniqueVector is similar to :ref:`SetVector <dss_setvector>` but it retains a 1914unique ID for each element inserted into the set. It internally contains a map 1915and a vector, and it assigns a unique ID for each value inserted into the set. 1916 1917UniqueVector is very expensive: its cost is the sum of the cost of maintaining 1918both the map and vector, it has high complexity, high constant factors, and 1919produces a lot of malloc traffic. It should be avoided. 1920 1921.. _dss_immutableset: 1922 1923llvm/ADT/ImmutableSet.h 1924^^^^^^^^^^^^^^^^^^^^^^^ 1925 1926ImmutableSet is an immutable (functional) set implementation based on an AVL 1927tree. Adding or removing elements is done through a Factory object and results 1928in the creation of a new ImmutableSet object. If an ImmutableSet already exists 1929with the given contents, then the existing one is returned; equality is compared 1930with a FoldingSetNodeID. The time and space complexity of add or remove 1931operations is logarithmic in the size of the original set. 1932 1933There is no method for returning an element of the set, you can only check for 1934membership. 1935 1936.. _dss_otherset: 1937 1938Other Set-Like Container Options 1939^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1940 1941The STL provides several other options, such as std::multiset and the various 1942"hash_set" like containers (whether from C++ TR1 or from the SGI library). We 1943never use hash_set and unordered_set because they are generally very expensive 1944(each insertion requires a malloc) and very non-portable. 1945 1946std::multiset is useful if you're not interested in elimination of duplicates, 1947but has all the drawbacks of :ref:`std::set <dss_set>`. A sorted vector 1948(where you don't delete duplicate entries) or some other approach is almost 1949always better. 1950 1951.. _ds_map: 1952 1953Map-Like Containers (std::map, DenseMap, etc) 1954--------------------------------------------- 1955 1956Map-like containers are useful when you want to associate data to a key. As 1957usual, there are a lot of different ways to do this. :) 1958 1959.. _dss_sortedvectormap: 1960 1961A sorted 'vector' 1962^^^^^^^^^^^^^^^^^ 1963 1964If your usage pattern follows a strict insert-then-query approach, you can 1965trivially use the same approach as :ref:`sorted vectors for set-like containers 1966<dss_sortedvectorset>`. The only difference is that your query function (which 1967uses std::lower_bound to get efficient log(n) lookup) should only compare the 1968key, not both the key and value. This yields the same advantages as sorted 1969vectors for sets. 1970 1971.. _dss_stringmap: 1972 1973llvm/ADT/StringMap.h 1974^^^^^^^^^^^^^^^^^^^^ 1975 1976Strings are commonly used as keys in maps, and they are difficult to support 1977efficiently: they are variable length, inefficient to hash and compare when 1978long, expensive to copy, etc. StringMap is a specialized container designed to 1979cope with these issues. It supports mapping an arbitrary range of bytes to an 1980arbitrary other object. 1981 1982The StringMap implementation uses a quadratically-probed hash table, where the 1983buckets store a pointer to the heap allocated entries (and some other stuff). 1984The entries in the map must be heap allocated because the strings are variable 1985length. The string data (key) and the element object (value) are stored in the 1986same allocation with the string data immediately after the element object. 1987This container guarantees the "``(char*)(&Value+1)``" points to the key string 1988for a value. 1989 1990The StringMap is very fast for several reasons: quadratic probing is very cache 1991efficient for lookups, the hash value of strings in buckets is not recomputed 1992when looking up an element, StringMap rarely has to touch the memory for 1993unrelated objects when looking up a value (even when hash collisions happen), 1994hash table growth does not recompute the hash values for strings already in the 1995table, and each pair in the map is store in a single allocation (the string data 1996is stored in the same allocation as the Value of a pair). 1997 1998StringMap also provides query methods that take byte ranges, so it only ever 1999copies a string if a value is inserted into the table. 2000 2001StringMap iteratation order, however, is not guaranteed to be deterministic, so 2002any uses which require that should instead use a std::map. 2003 2004.. _dss_indexmap: 2005 2006llvm/ADT/IndexedMap.h 2007^^^^^^^^^^^^^^^^^^^^^ 2008 2009IndexedMap is a specialized container for mapping small dense integers (or 2010values that can be mapped to small dense integers) to some other type. It is 2011internally implemented as a vector with a mapping function that maps the keys 2012to the dense integer range. 2013 2014This is useful for cases like virtual registers in the LLVM code generator: they 2015have a dense mapping that is offset by a compile-time constant (the first 2016virtual register ID). 2017 2018.. _dss_densemap: 2019 2020llvm/ADT/DenseMap.h 2021^^^^^^^^^^^^^^^^^^^ 2022 2023DenseMap is a simple quadratically probed hash table. It excels at supporting 2024small keys and values: it uses a single allocation to hold all of the pairs 2025that are currently inserted in the map. DenseMap is a great way to map 2026pointers to pointers, or map other small types to each other. 2027 2028There are several aspects of DenseMap that you should be aware of, however. 2029The iterators in a DenseMap are invalidated whenever an insertion occurs, 2030unlike map. Also, because DenseMap allocates space for a large number of 2031key/value pairs (it starts with 64 by default), it will waste a lot of space if 2032your keys or values are large. Finally, you must implement a partial 2033specialization of DenseMapInfo for the key that you want, if it isn't already 2034supported. This is required to tell DenseMap about two special marker values 2035(which can never be inserted into the map) that it needs internally. 2036 2037DenseMap's find_as() method supports lookup operations using an alternate key 2038type. This is useful in cases where the normal key type is expensive to 2039construct, but cheap to compare against. The DenseMapInfo is responsible for 2040defining the appropriate comparison and hashing methods for each alternate key 2041type used. 2042 2043.. _dss_valuemap: 2044 2045llvm/IR/ValueMap.h 2046^^^^^^^^^^^^^^^^^^^ 2047 2048ValueMap is a wrapper around a :ref:`DenseMap <dss_densemap>` mapping 2049``Value*``\ s (or subclasses) to another type. When a Value is deleted or 2050RAUW'ed, ValueMap will update itself so the new version of the key is mapped to 2051the same value, just as if the key were a WeakVH. You can configure exactly how 2052this happens, and what else happens on these two events, by passing a ``Config`` 2053parameter to the ValueMap template. 2054 2055.. _dss_intervalmap: 2056 2057llvm/ADT/IntervalMap.h 2058^^^^^^^^^^^^^^^^^^^^^^ 2059 2060IntervalMap is a compact map for small keys and values. It maps key intervals 2061instead of single keys, and it will automatically coalesce adjacent intervals. 2062When the map only contains a few intervals, they are stored in the map object 2063itself to avoid allocations. 2064 2065The IntervalMap iterators are quite big, so they should not be passed around as 2066STL iterators. The heavyweight iterators allow a smaller data structure. 2067 2068.. _dss_map: 2069 2070<map> 2071^^^^^ 2072 2073std::map has similar characteristics to :ref:`std::set <dss_set>`: it uses a 2074single allocation per pair inserted into the map, it offers log(n) lookup with 2075an extremely large constant factor, imposes a space penalty of 3 pointers per 2076pair in the map, etc. 2077 2078std::map is most useful when your keys or values are very large, if you need to 2079iterate over the collection in sorted order, or if you need stable iterators 2080into the map (i.e. they don't get invalidated if an insertion or deletion of 2081another element takes place). 2082 2083.. _dss_mapvector: 2084 2085llvm/ADT/MapVector.h 2086^^^^^^^^^^^^^^^^^^^^ 2087 2088``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface. The 2089main difference is that the iteration order is guaranteed to be the insertion 2090order, making it an easy (but somewhat expensive) solution for non-deterministic 2091iteration over maps of pointers. 2092 2093It is implemented by mapping from key to an index in a vector of key,value 2094pairs. This provides fast lookup and iteration, but has two main drawbacks: 2095the key is stored twice and removing elements takes linear time. If it is 2096necessary to remove elements, it's best to remove them in bulk using 2097``remove_if()``. 2098 2099.. _dss_inteqclasses: 2100 2101llvm/ADT/IntEqClasses.h 2102^^^^^^^^^^^^^^^^^^^^^^^ 2103 2104IntEqClasses provides a compact representation of equivalence classes of small 2105integers. Initially, each integer in the range 0..n-1 has its own equivalence 2106class. Classes can be joined by passing two class representatives to the 2107join(a, b) method. Two integers are in the same class when findLeader() returns 2108the same representative. 2109 2110Once all equivalence classes are formed, the map can be compressed so each 2111integer 0..n-1 maps to an equivalence class number in the range 0..m-1, where m 2112is the total number of equivalence classes. The map must be uncompressed before 2113it can be edited again. 2114 2115.. _dss_immutablemap: 2116 2117llvm/ADT/ImmutableMap.h 2118^^^^^^^^^^^^^^^^^^^^^^^ 2119 2120ImmutableMap is an immutable (functional) map implementation based on an AVL 2121tree. Adding or removing elements is done through a Factory object and results 2122in the creation of a new ImmutableMap object. If an ImmutableMap already exists 2123with the given key set, then the existing one is returned; equality is compared 2124with a FoldingSetNodeID. The time and space complexity of add or remove 2125operations is logarithmic in the size of the original map. 2126 2127.. _dss_othermap: 2128 2129Other Map-Like Container Options 2130^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2131 2132The STL provides several other options, such as std::multimap and the various 2133"hash_map" like containers (whether from C++ TR1 or from the SGI library). We 2134never use hash_set and unordered_set because they are generally very expensive 2135(each insertion requires a malloc) and very non-portable. 2136 2137std::multimap is useful if you want to map a key to multiple values, but has all 2138the drawbacks of std::map. A sorted vector or some other approach is almost 2139always better. 2140 2141.. _ds_bit: 2142 2143Bit storage containers (BitVector, SparseBitVector) 2144--------------------------------------------------- 2145 2146Unlike the other containers, there are only two bit storage containers, and 2147choosing when to use each is relatively straightforward. 2148 2149One additional option is ``std::vector<bool>``: we discourage its use for two 2150reasons 1) the implementation in many common compilers (e.g. commonly 2151available versions of GCC) is extremely inefficient and 2) the C++ standards 2152committee is likely to deprecate this container and/or change it significantly 2153somehow. In any case, please don't use it. 2154 2155.. _dss_bitvector: 2156 2157BitVector 2158^^^^^^^^^ 2159 2160The BitVector container provides a dynamic size set of bits for manipulation. 2161It supports individual bit setting/testing, as well as set operations. The set 2162operations take time O(size of bitvector), but operations are performed one word 2163at a time, instead of one bit at a time. This makes the BitVector very fast for 2164set operations compared to other containers. Use the BitVector when you expect 2165the number of set bits to be high (i.e. a dense set). 2166 2167.. _dss_smallbitvector: 2168 2169SmallBitVector 2170^^^^^^^^^^^^^^ 2171 2172The SmallBitVector container provides the same interface as BitVector, but it is 2173optimized for the case where only a small number of bits, less than 25 or so, 2174are needed. It also transparently supports larger bit counts, but slightly less 2175efficiently than a plain BitVector, so SmallBitVector should only be used when 2176larger counts are rare. 2177 2178At this time, SmallBitVector does not support set operations (and, or, xor), and 2179its operator[] does not provide an assignable lvalue. 2180 2181.. _dss_sparsebitvector: 2182 2183SparseBitVector 2184^^^^^^^^^^^^^^^ 2185 2186The SparseBitVector container is much like BitVector, with one major difference: 2187Only the bits that are set, are stored. This makes the SparseBitVector much 2188more space efficient than BitVector when the set is sparse, as well as making 2189set operations O(number of set bits) instead of O(size of universe). The 2190downside to the SparseBitVector is that setting and testing of random bits is 2191O(N), and on large SparseBitVectors, this can be slower than BitVector. In our 2192implementation, setting or testing bits in sorted order (either forwards or 2193reverse) is O(1) worst case. Testing and setting bits within 128 bits (depends 2194on size) of the current bit is also O(1). As a general statement, 2195testing/setting bits in a SparseBitVector is O(distance away from last set bit). 2196 2197.. _debugging: 2198 2199Debugging 2200========= 2201 2202A handful of `GDB pretty printers 2203<https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html>`__ are 2204provided for some of the core LLVM libraries. To use them, execute the 2205following (or add it to your ``~/.gdbinit``):: 2206 2207 source /path/to/llvm/src/utils/gdb-scripts/prettyprinters.py 2208 2209It also might be handy to enable the `print pretty 2210<http://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_57.html>`__ option to 2211avoid data structures being printed as a big block of text. 2212 2213.. _common: 2214 2215Helpful Hints for Common Operations 2216=================================== 2217 2218This section describes how to perform some very simple transformations of LLVM 2219code. This is meant to give examples of common idioms used, showing the 2220practical side of LLVM transformations. 2221 2222Because this is a "how-to" section, you should also read about the main classes 2223that you will be working with. The :ref:`Core LLVM Class Hierarchy Reference 2224<coreclasses>` contains details and descriptions of the main classes that you 2225should know about. 2226 2227.. _inspection: 2228 2229Basic Inspection and Traversal Routines 2230--------------------------------------- 2231 2232The LLVM compiler infrastructure have many different data structures that may be 2233traversed. Following the example of the C++ standard template library, the 2234techniques used to traverse these various data structures are all basically the 2235same. For a enumerable sequence of values, the ``XXXbegin()`` function (or 2236method) returns an iterator to the start of the sequence, the ``XXXend()`` 2237function returns an iterator pointing to one past the last valid element of the 2238sequence, and there is some ``XXXiterator`` data type that is common between the 2239two operations. 2240 2241Because the pattern for iteration is common across many different aspects of the 2242program representation, the standard template library algorithms may be used on 2243them, and it is easier to remember how to iterate. First we show a few common 2244examples of the data structures that need to be traversed. Other data 2245structures are traversed in very similar ways. 2246 2247.. _iterate_function: 2248 2249Iterating over the ``BasicBlock`` in a ``Function`` 2250^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2251 2252It's quite common to have a ``Function`` instance that you'd like to transform 2253in some way; in particular, you'd like to manipulate its ``BasicBlock``\ s. To 2254facilitate this, you'll need to iterate over all of the ``BasicBlock``\ s that 2255constitute the ``Function``. The following is an example that prints the name 2256of a ``BasicBlock`` and the number of ``Instruction``\ s it contains: 2257 2258.. code-block:: c++ 2259 2260 // func is a pointer to a Function instance 2261 for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i) 2262 // Print out the name of the basic block if it has one, and then the 2263 // number of instructions that it contains 2264 errs() << "Basic block (name=" << i->getName() << ") has " 2265 << i->size() << " instructions.\n"; 2266 2267Note that i can be used as if it were a pointer for the purposes of invoking 2268member functions of the ``Instruction`` class. This is because the indirection 2269operator is overloaded for the iterator classes. In the above code, the 2270expression ``i->size()`` is exactly equivalent to ``(*i).size()`` just like 2271you'd expect. 2272 2273.. _iterate_basicblock: 2274 2275Iterating over the ``Instruction`` in a ``BasicBlock`` 2276^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2277 2278Just like when dealing with ``BasicBlock``\ s in ``Function``\ s, it's easy to 2279iterate over the individual instructions that make up ``BasicBlock``\ s. Here's 2280a code snippet that prints out each instruction in a ``BasicBlock``: 2281 2282.. code-block:: c++ 2283 2284 // blk is a pointer to a BasicBlock instance 2285 for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i) 2286 // The next statement works since operator<<(ostream&,...) 2287 // is overloaded for Instruction& 2288 errs() << *i << "\n"; 2289 2290 2291However, this isn't really the best way to print out the contents of a 2292``BasicBlock``! Since the ostream operators are overloaded for virtually 2293anything you'll care about, you could have just invoked the print routine on the 2294basic block itself: ``errs() << *blk << "\n";``. 2295 2296.. _iterate_insiter: 2297 2298Iterating over the ``Instruction`` in a ``Function`` 2299^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2300 2301If you're finding that you commonly iterate over a ``Function``'s 2302``BasicBlock``\ s and then that ``BasicBlock``'s ``Instruction``\ s, 2303``InstIterator`` should be used instead. You'll need to include 2304``llvm/IR/InstIterator.h`` (`doxygen 2305<http://llvm.org/doxygen/InstIterator_8h.html>`__) and then instantiate 2306``InstIterator``\ s explicitly in your code. Here's a small example that shows 2307how to dump all instructions in a function to the standard error stream: 2308 2309.. code-block:: c++ 2310 2311 #include "llvm/IR/InstIterator.h" 2312 2313 // F is a pointer to a Function instance 2314 for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) 2315 errs() << *I << "\n"; 2316 2317Easy, isn't it? You can also use ``InstIterator``\ s to fill a work list with 2318its initial contents. For example, if you wanted to initialize a work list to 2319contain all instructions in a ``Function`` F, all you would need to do is 2320something like: 2321 2322.. code-block:: c++ 2323 2324 std::set<Instruction*> worklist; 2325 // or better yet, SmallPtrSet<Instruction*, 64> worklist; 2326 2327 for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) 2328 worklist.insert(&*I); 2329 2330The STL set ``worklist`` would now contain all instructions in the ``Function`` 2331pointed to by F. 2332 2333.. _iterate_convert: 2334 2335Turning an iterator into a class pointer (and vice-versa) 2336^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2337 2338Sometimes, it'll be useful to grab a reference (or pointer) to a class instance 2339when all you've got at hand is an iterator. Well, extracting a reference or a 2340pointer from an iterator is very straight-forward. Assuming that ``i`` is a 2341``BasicBlock::iterator`` and ``j`` is a ``BasicBlock::const_iterator``: 2342 2343.. code-block:: c++ 2344 2345 Instruction& inst = *i; // Grab reference to instruction reference 2346 Instruction* pinst = &*i; // Grab pointer to instruction reference 2347 const Instruction& inst = *j; 2348 2349However, the iterators you'll be working with in the LLVM framework are special: 2350they will automatically convert to a ptr-to-instance type whenever they need to. 2351Instead of dereferencing the iterator and then taking the address of the result, 2352you can simply assign the iterator to the proper pointer type and you get the 2353dereference and address-of operation as a result of the assignment (behind the 2354scenes, this is a result of overloading casting mechanisms). Thus the second 2355line of the last example, 2356 2357.. code-block:: c++ 2358 2359 Instruction *pinst = &*i; 2360 2361is semantically equivalent to 2362 2363.. code-block:: c++ 2364 2365 Instruction *pinst = i; 2366 2367It's also possible to turn a class pointer into the corresponding iterator, and 2368this is a constant time operation (very efficient). The following code snippet 2369illustrates use of the conversion constructors provided by LLVM iterators. By 2370using these, you can explicitly grab the iterator of something without actually 2371obtaining it via iteration over some structure: 2372 2373.. code-block:: c++ 2374 2375 void printNextInstruction(Instruction* inst) { 2376 BasicBlock::iterator it(inst); 2377 ++it; // After this line, it refers to the instruction after *inst 2378 if (it != inst->getParent()->end()) errs() << *it << "\n"; 2379 } 2380 2381Unfortunately, these implicit conversions come at a cost; they prevent these 2382iterators from conforming to standard iterator conventions, and thus from being 2383usable with standard algorithms and containers. For example, they prevent the 2384following code, where ``B`` is a ``BasicBlock``, from compiling: 2385 2386.. code-block:: c++ 2387 2388 llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end()); 2389 2390Because of this, these implicit conversions may be removed some day, and 2391``operator*`` changed to return a pointer instead of a reference. 2392 2393.. _iterate_complex: 2394 2395Finding call sites: a slightly more complex example 2396^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2397 2398Say that you're writing a FunctionPass and would like to count all the locations 2399in the entire module (that is, across every ``Function``) where a certain 2400function (i.e., some ``Function *``) is already in scope. As you'll learn 2401later, you may want to use an ``InstVisitor`` to accomplish this in a much more 2402straight-forward manner, but this example will allow us to explore how you'd do 2403it if you didn't have ``InstVisitor`` around. In pseudo-code, this is what we 2404want to do: 2405 2406.. code-block:: none 2407 2408 initialize callCounter to zero 2409 for each Function f in the Module 2410 for each BasicBlock b in f 2411 for each Instruction i in b 2412 if (i is a CallInst and calls the given function) 2413 increment callCounter 2414 2415And the actual code is (remember, because we're writing a ``FunctionPass``, our 2416``FunctionPass``-derived class simply has to override the ``runOnFunction`` 2417method): 2418 2419.. code-block:: c++ 2420 2421 Function* targetFunc = ...; 2422 2423 class OurFunctionPass : public FunctionPass { 2424 public: 2425 OurFunctionPass(): callCounter(0) { } 2426 2427 virtual runOnFunction(Function& F) { 2428 for (Function::iterator b = F.begin(), be = F.end(); b != be; ++b) { 2429 for (BasicBlock::iterator i = b->begin(), ie = b->end(); i != ie; ++i) { 2430 if (CallInst* callInst = dyn_cast<CallInst>(&*i)) { 2431 // We know we've encountered a call instruction, so we 2432 // need to determine if it's a call to the 2433 // function pointed to by m_func or not. 2434 if (callInst->getCalledFunction() == targetFunc) 2435 ++callCounter; 2436 } 2437 } 2438 } 2439 } 2440 2441 private: 2442 unsigned callCounter; 2443 }; 2444 2445.. _calls_and_invokes: 2446 2447Treating calls and invokes the same way 2448^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2449 2450You may have noticed that the previous example was a bit oversimplified in that 2451it did not deal with call sites generated by 'invoke' instructions. In this, 2452and in other situations, you may find that you want to treat ``CallInst``\ s and 2453``InvokeInst``\ s the same way, even though their most-specific common base 2454class is ``Instruction``, which includes lots of less closely-related things. 2455For these cases, LLVM provides a handy wrapper class called ``CallSite`` 2456(`doxygen <http://llvm.org/doxygen/classllvm_1_1CallSite.html>`__) It is 2457essentially a wrapper around an ``Instruction`` pointer, with some methods that 2458provide functionality common to ``CallInst``\ s and ``InvokeInst``\ s. 2459 2460This class has "value semantics": it should be passed by value, not by reference 2461and it should not be dynamically allocated or deallocated using ``operator new`` 2462or ``operator delete``. It is efficiently copyable, assignable and 2463constructable, with costs equivalents to that of a bare pointer. If you look at 2464its definition, it has only a single pointer member. 2465 2466.. _iterate_chains: 2467 2468Iterating over def-use & use-def chains 2469^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2470 2471Frequently, we might have an instance of the ``Value`` class (`doxygen 2472<http://llvm.org/doxygen/classllvm_1_1Value.html>`__) and we want to determine 2473which ``User`` s use the ``Value``. The list of all ``User``\ s of a particular 2474``Value`` is called a *def-use* chain. For example, let's say we have a 2475``Function*`` named ``F`` to a particular function ``foo``. Finding all of the 2476instructions that *use* ``foo`` is as simple as iterating over the *def-use* 2477chain of ``F``: 2478 2479.. code-block:: c++ 2480 2481 Function *F = ...; 2482 2483 for (User *U : F->users()) { 2484 if (Instruction *Inst = dyn_cast<Instruction>(U)) { 2485 errs() << "F is used in instruction:\n"; 2486 errs() << *Inst << "\n"; 2487 } 2488 2489Alternatively, it's common to have an instance of the ``User`` Class (`doxygen 2490<http://llvm.org/doxygen/classllvm_1_1User.html>`__) and need to know what 2491``Value``\ s are used by it. The list of all ``Value``\ s used by a ``User`` is 2492known as a *use-def* chain. Instances of class ``Instruction`` are common 2493``User`` s, so we might want to iterate over all of the values that a particular 2494instruction uses (that is, the operands of the particular ``Instruction``): 2495 2496.. code-block:: c++ 2497 2498 Instruction *pi = ...; 2499 2500 for (Use &U : pi->operands()) { 2501 Value *v = U.get(); 2502 // ... 2503 } 2504 2505Declaring objects as ``const`` is an important tool of enforcing mutation free 2506algorithms (such as analyses, etc.). For this purpose above iterators come in 2507constant flavors as ``Value::const_use_iterator`` and 2508``Value::const_op_iterator``. They automatically arise when calling 2509``use/op_begin()`` on ``const Value*``\ s or ``const User*``\ s respectively. 2510Upon dereferencing, they return ``const Use*``\ s. Otherwise the above patterns 2511remain unchanged. 2512 2513.. _iterate_preds: 2514 2515Iterating over predecessors & successors of blocks 2516^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2517 2518Iterating over the predecessors and successors of a block is quite easy with the 2519routines defined in ``"llvm/IR/CFG.h"``. Just use code like this to 2520iterate over all predecessors of BB: 2521 2522.. code-block:: c++ 2523 2524 #include "llvm/IR/CFG.h" 2525 BasicBlock *BB = ...; 2526 2527 for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) { 2528 BasicBlock *Pred = *PI; 2529 // ... 2530 } 2531 2532Similarly, to iterate over successors use ``succ_iterator/succ_begin/succ_end``. 2533 2534.. _simplechanges: 2535 2536Making simple changes 2537--------------------- 2538 2539There are some primitive transformation operations present in the LLVM 2540infrastructure that are worth knowing about. When performing transformations, 2541it's fairly common to manipulate the contents of basic blocks. This section 2542describes some of the common methods for doing so and gives example code. 2543 2544.. _schanges_creating: 2545 2546Creating and inserting new ``Instruction``\ s 2547^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2548 2549*Instantiating Instructions* 2550 2551Creation of ``Instruction``\ s is straight-forward: simply call the constructor 2552for the kind of instruction to instantiate and provide the necessary parameters. 2553For example, an ``AllocaInst`` only *requires* a (const-ptr-to) ``Type``. Thus: 2554 2555.. code-block:: c++ 2556 2557 AllocaInst* ai = new AllocaInst(Type::Int32Ty); 2558 2559will create an ``AllocaInst`` instance that represents the allocation of one 2560integer in the current stack frame, at run time. Each ``Instruction`` subclass 2561is likely to have varying default parameters which change the semantics of the 2562instruction, so refer to the `doxygen documentation for the subclass of 2563Instruction <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ that 2564you're interested in instantiating. 2565 2566*Naming values* 2567 2568It is very useful to name the values of instructions when you're able to, as 2569this facilitates the debugging of your transformations. If you end up looking 2570at generated LLVM machine code, you definitely want to have logical names 2571associated with the results of instructions! By supplying a value for the 2572``Name`` (default) parameter of the ``Instruction`` constructor, you associate a 2573logical name with the result of the instruction's execution at run time. For 2574example, say that I'm writing a transformation that dynamically allocates space 2575for an integer on the stack, and that integer is going to be used as some kind 2576of index by some other code. To accomplish this, I place an ``AllocaInst`` at 2577the first point in the first ``BasicBlock`` of some ``Function``, and I'm 2578intending to use it within the same ``Function``. I might do: 2579 2580.. code-block:: c++ 2581 2582 AllocaInst* pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc"); 2583 2584where ``indexLoc`` is now the logical name of the instruction's execution value, 2585which is a pointer to an integer on the run time stack. 2586 2587*Inserting instructions* 2588 2589There are essentially three ways to insert an ``Instruction`` into an existing 2590sequence of instructions that form a ``BasicBlock``: 2591 2592* Insertion into an explicit instruction list 2593 2594 Given a ``BasicBlock* pb``, an ``Instruction* pi`` within that ``BasicBlock``, 2595 and a newly-created instruction we wish to insert before ``*pi``, we do the 2596 following: 2597 2598 .. code-block:: c++ 2599 2600 BasicBlock *pb = ...; 2601 Instruction *pi = ...; 2602 Instruction *newInst = new Instruction(...); 2603 2604 pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb 2605 2606 Appending to the end of a ``BasicBlock`` is so common that the ``Instruction`` 2607 class and ``Instruction``-derived classes provide constructors which take a 2608 pointer to a ``BasicBlock`` to be appended to. For example code that looked 2609 like: 2610 2611 .. code-block:: c++ 2612 2613 BasicBlock *pb = ...; 2614 Instruction *newInst = new Instruction(...); 2615 2616 pb->getInstList().push_back(newInst); // Appends newInst to pb 2617 2618 becomes: 2619 2620 .. code-block:: c++ 2621 2622 BasicBlock *pb = ...; 2623 Instruction *newInst = new Instruction(..., pb); 2624 2625 which is much cleaner, especially if you are creating long instruction 2626 streams. 2627 2628* Insertion into an implicit instruction list 2629 2630 ``Instruction`` instances that are already in ``BasicBlock``\ s are implicitly 2631 associated with an existing instruction list: the instruction list of the 2632 enclosing basic block. Thus, we could have accomplished the same thing as the 2633 above code without being given a ``BasicBlock`` by doing: 2634 2635 .. code-block:: c++ 2636 2637 Instruction *pi = ...; 2638 Instruction *newInst = new Instruction(...); 2639 2640 pi->getParent()->getInstList().insert(pi, newInst); 2641 2642 In fact, this sequence of steps occurs so frequently that the ``Instruction`` 2643 class and ``Instruction``-derived classes provide constructors which take (as 2644 a default parameter) a pointer to an ``Instruction`` which the newly-created 2645 ``Instruction`` should precede. That is, ``Instruction`` constructors are 2646 capable of inserting the newly-created instance into the ``BasicBlock`` of a 2647 provided instruction, immediately before that instruction. Using an 2648 ``Instruction`` constructor with a ``insertBefore`` (default) parameter, the 2649 above code becomes: 2650 2651 .. code-block:: c++ 2652 2653 Instruction* pi = ...; 2654 Instruction* newInst = new Instruction(..., pi); 2655 2656 which is much cleaner, especially if you're creating a lot of instructions and 2657 adding them to ``BasicBlock``\ s. 2658 2659* Insertion using an instance of ``IRBuilder`` 2660 2661 Inserting several ``Instruction``\ s can be quite laborious using the previous 2662 methods. The ``IRBuilder`` is a convenience class that can be used to add 2663 several instructions to the end of a ``BasicBlock`` or before a particular 2664 ``Instruction``. It also supports constant folding and renaming named 2665 registers (see ``IRBuilder``'s template arguments). 2666 2667 The example below demonstrates a very simple use of the ``IRBuilder`` where 2668 three instructions are inserted before the instruction ``pi``. The first two 2669 instructions are Call instructions and third instruction multiplies the return 2670 value of the two calls. 2671 2672 .. code-block:: c++ 2673 2674 Instruction *pi = ...; 2675 IRBuilder<> Builder(pi); 2676 CallInst* callOne = Builder.CreateCall(...); 2677 CallInst* callTwo = Builder.CreateCall(...); 2678 Value* result = Builder.CreateMul(callOne, callTwo); 2679 2680 The example below is similar to the above example except that the created 2681 ``IRBuilder`` inserts instructions at the end of the ``BasicBlock`` ``pb``. 2682 2683 .. code-block:: c++ 2684 2685 BasicBlock *pb = ...; 2686 IRBuilder<> Builder(pb); 2687 CallInst* callOne = Builder.CreateCall(...); 2688 CallInst* callTwo = Builder.CreateCall(...); 2689 Value* result = Builder.CreateMul(callOne, callTwo); 2690 2691 See :doc:`tutorial/LangImpl03` for a practical use of the ``IRBuilder``. 2692 2693 2694.. _schanges_deleting: 2695 2696Deleting Instructions 2697^^^^^^^^^^^^^^^^^^^^^ 2698 2699Deleting an instruction from an existing sequence of instructions that form a 2700BasicBlock_ is very straight-forward: just call the instruction's 2701``eraseFromParent()`` method. For example: 2702 2703.. code-block:: c++ 2704 2705 Instruction *I = .. ; 2706 I->eraseFromParent(); 2707 2708This unlinks the instruction from its containing basic block and deletes it. If 2709you'd just like to unlink the instruction from its containing basic block but 2710not delete it, you can use the ``removeFromParent()`` method. 2711 2712.. _schanges_replacing: 2713 2714Replacing an Instruction with another Value 2715^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2716 2717Replacing individual instructions 2718""""""""""""""""""""""""""""""""" 2719 2720Including "`llvm/Transforms/Utils/BasicBlockUtils.h 2721<http://llvm.org/doxygen/BasicBlockUtils_8h-source.html>`_" permits use of two 2722very useful replace functions: ``ReplaceInstWithValue`` and 2723``ReplaceInstWithInst``. 2724 2725.. _schanges_deleting_sub: 2726 2727Deleting Instructions 2728""""""""""""""""""""" 2729 2730* ``ReplaceInstWithValue`` 2731 2732 This function replaces all uses of a given instruction with a value, and then 2733 removes the original instruction. The following example illustrates the 2734 replacement of the result of a particular ``AllocaInst`` that allocates memory 2735 for a single integer with a null pointer to an integer. 2736 2737 .. code-block:: c++ 2738 2739 AllocaInst* instToReplace = ...; 2740 BasicBlock::iterator ii(instToReplace); 2741 2742 ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii, 2743 Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty))); 2744 2745* ``ReplaceInstWithInst`` 2746 2747 This function replaces a particular instruction with another instruction, 2748 inserting the new instruction into the basic block at the location where the 2749 old instruction was, and replacing any uses of the old instruction with the 2750 new instruction. The following example illustrates the replacement of one 2751 ``AllocaInst`` with another. 2752 2753 .. code-block:: c++ 2754 2755 AllocaInst* instToReplace = ...; 2756 BasicBlock::iterator ii(instToReplace); 2757 2758 ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii, 2759 new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt")); 2760 2761 2762Replacing multiple uses of Users and Values 2763""""""""""""""""""""""""""""""""""""""""""" 2764 2765You can use ``Value::replaceAllUsesWith`` and ``User::replaceUsesOfWith`` to 2766change more than one use at a time. See the doxygen documentation for the 2767`Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_ and `User Class 2768<http://llvm.org/doxygen/classllvm_1_1User.html>`_, respectively, for more 2769information. 2770 2771.. _schanges_deletingGV: 2772 2773Deleting GlobalVariables 2774^^^^^^^^^^^^^^^^^^^^^^^^ 2775 2776Deleting a global variable from a module is just as easy as deleting an 2777Instruction. First, you must have a pointer to the global variable that you 2778wish to delete. You use this pointer to erase it from its parent, the module. 2779For example: 2780 2781.. code-block:: c++ 2782 2783 GlobalVariable *GV = .. ; 2784 2785 GV->eraseFromParent(); 2786 2787 2788.. _create_types: 2789 2790How to Create Types 2791------------------- 2792 2793In generating IR, you may need some complex types. If you know these types 2794statically, you can use ``TypeBuilder<...>::get()``, defined in 2795``llvm/Support/TypeBuilder.h``, to retrieve them. ``TypeBuilder`` has two forms 2796depending on whether you're building types for cross-compilation or native 2797library use. ``TypeBuilder<T, true>`` requires that ``T`` be independent of the 2798host environment, meaning that it's built out of types from the ``llvm::types`` 2799(`doxygen <http://llvm.org/doxygen/namespacellvm_1_1types.html>`__) namespace 2800and pointers, functions, arrays, etc. built of those. ``TypeBuilder<T, false>`` 2801additionally allows native C types whose size may depend on the host compiler. 2802For example, 2803 2804.. code-block:: c++ 2805 2806 FunctionType *ft = TypeBuilder<types::i<8>(types::i<32>*), true>::get(); 2807 2808is easier to read and write than the equivalent 2809 2810.. code-block:: c++ 2811 2812 std::vector<const Type*> params; 2813 params.push_back(PointerType::getUnqual(Type::Int32Ty)); 2814 FunctionType *ft = FunctionType::get(Type::Int8Ty, params, false); 2815 2816See the `class comment 2817<http://llvm.org/doxygen/TypeBuilder_8h-source.html#l00001>`_ for more details. 2818 2819.. _threading: 2820 2821Threads and LLVM 2822================ 2823 2824This section describes the interaction of the LLVM APIs with multithreading, 2825both on the part of client applications, and in the JIT, in the hosted 2826application. 2827 2828Note that LLVM's support for multithreading is still relatively young. Up 2829through version 2.5, the execution of threaded hosted applications was 2830supported, but not threaded client access to the APIs. While this use case is 2831now supported, clients *must* adhere to the guidelines specified below to ensure 2832proper operation in multithreaded mode. 2833 2834Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic 2835intrinsics in order to support threaded operation. If you need a 2836multhreading-capable LLVM on a platform without a suitably modern system 2837compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and 2838using the resultant compiler to build a copy of LLVM with multithreading 2839support. 2840 2841.. _shutdown: 2842 2843Ending Execution with ``llvm_shutdown()`` 2844----------------------------------------- 2845 2846When you are done using the LLVM APIs, you should call ``llvm_shutdown()`` to 2847deallocate memory used for internal structures. 2848 2849.. _managedstatic: 2850 2851Lazy Initialization with ``ManagedStatic`` 2852------------------------------------------ 2853 2854``ManagedStatic`` is a utility class in LLVM used to implement static 2855initialization of static resources, such as the global type tables. In a 2856single-threaded environment, it implements a simple lazy initialization scheme. 2857When LLVM is compiled with support for multi-threading, however, it uses 2858double-checked locking to implement thread-safe lazy initialization. 2859 2860.. _llvmcontext: 2861 2862Achieving Isolation with ``LLVMContext`` 2863---------------------------------------- 2864 2865``LLVMContext`` is an opaque class in the LLVM API which clients can use to 2866operate multiple, isolated instances of LLVM concurrently within the same 2867address space. For instance, in a hypothetical compile-server, the compilation 2868of an individual translation unit is conceptually independent from all the 2869others, and it would be desirable to be able to compile incoming translation 2870units concurrently on independent server threads. Fortunately, ``LLVMContext`` 2871exists to enable just this kind of scenario! 2872 2873Conceptually, ``LLVMContext`` provides isolation. Every LLVM entity 2874(``Module``\ s, ``Value``\ s, ``Type``\ s, ``Constant``\ s, etc.) in LLVM's 2875in-memory IR belongs to an ``LLVMContext``. Entities in different contexts 2876*cannot* interact with each other: ``Module``\ s in different contexts cannot be 2877linked together, ``Function``\ s cannot be added to ``Module``\ s in different 2878contexts, etc. What this means is that is is safe to compile on multiple 2879threads simultaneously, as long as no two threads operate on entities within the 2880same context. 2881 2882In practice, very few places in the API require the explicit specification of a 2883``LLVMContext``, other than the ``Type`` creation/lookup APIs. Because every 2884``Type`` carries a reference to its owning context, most other entities can 2885determine what context they belong to by looking at their own ``Type``. If you 2886are adding new entities to LLVM IR, please try to maintain this interface 2887design. 2888 2889.. _jitthreading: 2890 2891Threads and the JIT 2892------------------- 2893 2894LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple 2895threads can call ``ExecutionEngine::getPointerToFunction()`` or 2896``ExecutionEngine::runFunction()`` concurrently, and multiple threads can run 2897code output by the JIT concurrently. The user must still ensure that only one 2898thread accesses IR in a given ``LLVMContext`` while another thread might be 2899modifying it. One way to do that is to always hold the JIT lock while accessing 2900IR outside the JIT (the JIT *modifies* the IR by adding ``CallbackVH``\ s). 2901Another way is to only call ``getPointerToFunction()`` from the 2902``LLVMContext``'s thread. 2903 2904When the JIT is configured to compile lazily (using 2905``ExecutionEngine::DisableLazyCompilation(false)``), there is currently a `race 2906condition <http://llvm.org/bugs/show_bug.cgi?id=5184>`_ in updating call sites 2907after a function is lazily-jitted. It's still possible to use the lazy JIT in a 2908threaded program if you ensure that only one thread at a time can call any 2909particular lazy stub and that the JIT lock guards any IR access, but we suggest 2910using only the eager JIT in threaded programs. 2911 2912.. _advanced: 2913 2914Advanced Topics 2915=============== 2916 2917This section describes some of the advanced or obscure API's that most clients 2918do not need to be aware of. These API's tend manage the inner workings of the 2919LLVM system, and only need to be accessed in unusual circumstances. 2920 2921.. _SymbolTable: 2922 2923The ``ValueSymbolTable`` class 2924------------------------------ 2925 2926The ``ValueSymbolTable`` (`doxygen 2927<http://llvm.org/doxygen/classllvm_1_1ValueSymbolTable.html>`__) class provides 2928a symbol table that the :ref:`Function <c_Function>` and Module_ classes use for 2929naming value definitions. The symbol table can provide a name for any Value_. 2930 2931Note that the ``SymbolTable`` class should not be directly accessed by most 2932clients. It should only be used when iteration over the symbol table names 2933themselves are required, which is very special purpose. Note that not all LLVM 2934Value_\ s have names, and those without names (i.e. they have an empty name) do 2935not exist in the symbol table. 2936 2937Symbol tables support iteration over the values in the symbol table with 2938``begin/end/iterator`` and supports querying to see if a specific name is in the 2939symbol table (with ``lookup``). The ``ValueSymbolTable`` class exposes no 2940public mutator methods, instead, simply call ``setName`` on a value, which will 2941autoinsert it into the appropriate symbol table. 2942 2943.. _UserLayout: 2944 2945The ``User`` and owned ``Use`` classes' memory layout 2946----------------------------------------------------- 2947 2948The ``User`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1User.html>`__) 2949class provides a basis for expressing the ownership of ``User`` towards other 2950`Value instance <http://llvm.org/doxygen/classllvm_1_1Value.html>`_\ s. The 2951``Use`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Use.html>`__) helper 2952class is employed to do the bookkeeping and to facilitate *O(1)* addition and 2953removal. 2954 2955.. _Use2User: 2956 2957Interaction and relationship between ``User`` and ``Use`` objects 2958^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2959 2960A subclass of ``User`` can choose between incorporating its ``Use`` objects or 2961refer to them out-of-line by means of a pointer. A mixed variant (some ``Use`` 2962s inline others hung off) is impractical and breaks the invariant that the 2963``Use`` objects belonging to the same ``User`` form a contiguous array. 2964 2965We have 2 different layouts in the ``User`` (sub)classes: 2966 2967* Layout a) 2968 2969 The ``Use`` object(s) are inside (resp. at fixed offset) of the ``User`` 2970 object and there are a fixed number of them. 2971 2972* Layout b) 2973 2974 The ``Use`` object(s) are referenced by a pointer to an array from the 2975 ``User`` object and there may be a variable number of them. 2976 2977As of v2.4 each layout still possesses a direct pointer to the start of the 2978array of ``Use``\ s. Though not mandatory for layout a), we stick to this 2979redundancy for the sake of simplicity. The ``User`` object also stores the 2980number of ``Use`` objects it has. (Theoretically this information can also be 2981calculated given the scheme presented below.) 2982 2983Special forms of allocation operators (``operator new``) enforce the following 2984memory layouts: 2985 2986* Layout a) is modelled by prepending the ``User`` object by the ``Use[]`` 2987 array. 2988 2989 .. code-block:: none 2990 2991 ...---.---.---.---.-------... 2992 | P | P | P | P | User 2993 '''---'---'---'---'-------''' 2994 2995* Layout b) is modelled by pointing at the ``Use[]`` array. 2996 2997 .. code-block:: none 2998 2999 .-------... 3000 | User 3001 '-------''' 3002 | 3003 v 3004 .---.---.---.---... 3005 | P | P | P | P | 3006 '---'---'---'---''' 3007 3008*(In the above figures* '``P``' *stands for the* ``Use**`` *that is stored in 3009each* ``Use`` *object in the member* ``Use::Prev`` *)* 3010 3011.. _Waymarking: 3012 3013The waymarking algorithm 3014^^^^^^^^^^^^^^^^^^^^^^^^ 3015 3016Since the ``Use`` objects are deprived of the direct (back)pointer to their 3017``User`` objects, there must be a fast and exact method to recover it. This is 3018accomplished by the following scheme: 3019 3020A bit-encoding in the 2 LSBits (least significant bits) of the ``Use::Prev`` 3021allows to find the start of the ``User`` object: 3022 3023* ``00`` --- binary digit 0 3024 3025* ``01`` --- binary digit 1 3026 3027* ``10`` --- stop and calculate (``s``) 3028 3029* ``11`` --- full stop (``S``) 3030 3031Given a ``Use*``, all we have to do is to walk till we get a stop and we either 3032have a ``User`` immediately behind or we have to walk to the next stop picking 3033up digits and calculating the offset: 3034 3035.. code-block:: none 3036 3037 .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---------------- 3038 | 1 | s | 1 | 0 | 1 | 0 | s | 1 | 1 | 0 | s | 1 | 1 | s | 1 | S | User (or User*) 3039 '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---------------- 3040 |+15 |+10 |+6 |+3 |+1 3041 | | | | | __> 3042 | | | | __________> 3043 | | | ______________________> 3044 | | ______________________________________> 3045 | __________________________________________________________> 3046 3047Only the significant number of bits need to be stored between the stops, so that 3048the *worst case is 20 memory accesses* when there are 1000 ``Use`` objects 3049associated with a ``User``. 3050 3051.. _ReferenceImpl: 3052 3053Reference implementation 3054^^^^^^^^^^^^^^^^^^^^^^^^ 3055 3056The following literate Haskell fragment demonstrates the concept: 3057 3058.. code-block:: haskell 3059 3060 > import Test.QuickCheck 3061 > 3062 > digits :: Int -> [Char] -> [Char] 3063 > digits 0 acc = '0' : acc 3064 > digits 1 acc = '1' : acc 3065 > digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc 3066 > 3067 > dist :: Int -> [Char] -> [Char] 3068 > dist 0 [] = ['S'] 3069 > dist 0 acc = acc 3070 > dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r 3071 > dist n acc = dist (n - 1) $ dist 1 acc 3072 > 3073 > takeLast n ss = reverse $ take n $ reverse ss 3074 > 3075 > test = takeLast 40 $ dist 20 [] 3076 > 3077 3078Printing <test> gives: ``"1s100000s11010s10100s1111s1010s110s11s1S"`` 3079 3080The reverse algorithm computes the length of the string just by examining a 3081certain prefix: 3082 3083.. code-block:: haskell 3084 3085 > pref :: [Char] -> Int 3086 > pref "S" = 1 3087 > pref ('s':'1':rest) = decode 2 1 rest 3088 > pref (_:rest) = 1 + pref rest 3089 > 3090 > decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest 3091 > decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest 3092 > decode walk acc _ = walk + acc 3093 > 3094 3095Now, as expected, printing <pref test> gives ``40``. 3096 3097We can *quickCheck* this with following property: 3098 3099.. code-block:: haskell 3100 3101 > testcase = dist 2000 [] 3102 > testcaseLength = length testcase 3103 > 3104 > identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr 3105 > where arr = takeLast n testcase 3106 > 3107 3108As expected <quickCheck identityProp> gives: 3109 3110:: 3111 3112 *Main> quickCheck identityProp 3113 OK, passed 100 tests. 3114 3115Let's be a bit more exhaustive: 3116 3117.. code-block:: haskell 3118 3119 > 3120 > deepCheck p = check (defaultConfig { configMaxTest = 500 }) p 3121 > 3122 3123And here is the result of <deepCheck identityProp>: 3124 3125:: 3126 3127 *Main> deepCheck identityProp 3128 OK, passed 500 tests. 3129 3130.. _Tagging: 3131 3132Tagging considerations 3133^^^^^^^^^^^^^^^^^^^^^^ 3134 3135To maintain the invariant that the 2 LSBits of each ``Use**`` in ``Use`` never 3136change after being set up, setters of ``Use::Prev`` must re-tag the new 3137``Use**`` on every modification. Accordingly getters must strip the tag bits. 3138 3139For layout b) instead of the ``User`` we find a pointer (``User*`` with LSBit 3140set). Following this pointer brings us to the ``User``. A portable trick 3141ensures that the first bytes of ``User`` (if interpreted as a pointer) never has 3142the LSBit set. (Portability is relying on the fact that all known compilers 3143place the ``vptr`` in the first word of the instances.) 3144 3145.. _polymorphism: 3146 3147Designing Type Hiercharies and Polymorphic Interfaces 3148----------------------------------------------------- 3149 3150There are two different design patterns that tend to result in the use of 3151virtual dispatch for methods in a type hierarchy in C++ programs. The first is 3152a genuine type hierarchy where different types in the hierarchy model 3153a specific subset of the functionality and semantics, and these types nest 3154strictly within each other. Good examples of this can be seen in the ``Value`` 3155or ``Type`` type hierarchies. 3156 3157A second is the desire to dispatch dynamically across a collection of 3158polymorphic interface implementations. This latter use case can be modeled with 3159virtual dispatch and inheritance by defining an abstract interface base class 3160which all implementations derive from and override. However, this 3161implementation strategy forces an **"is-a"** relationship to exist that is not 3162actually meaningful. There is often not some nested hierarchy of useful 3163generalizations which code might interact with and move up and down. Instead, 3164there is a singular interface which is dispatched across a range of 3165implementations. 3166 3167The preferred implementation strategy for the second use case is that of 3168generic programming (sometimes called "compile-time duck typing" or "static 3169polymorphism"). For example, a template over some type parameter ``T`` can be 3170instantiated across any particular implementation that conforms to the 3171interface or *concept*. A good example here is the highly generic properties of 3172any type which models a node in a directed graph. LLVM models these primarily 3173through templates and generic programming. Such templates include the 3174``LoopInfoBase`` and ``DominatorTreeBase``. When this type of polymorphism 3175truly needs **dynamic** dispatch you can generalize it using a technique 3176called *concept-based polymorphism*. This pattern emulates the interfaces and 3177behaviors of templates using a very limited form of virtual dispatch for type 3178erasure inside its implementation. You can find examples of this technique in 3179the ``PassManager.h`` system, and there is a more detailed introduction to it 3180by Sean Parent in several of his talks and papers: 3181 3182#. `Inheritance Is The Base Class of Evil 3183 <http://channel9.msdn.com/Events/GoingNative/2013/Inheritance-Is-The-Base-Class-of-Evil>`_ 3184 - The GoingNative 2013 talk describing this technique, and probably the best 3185 place to start. 3186#. `Value Semantics and Concepts-based Polymorphism 3187 <http://www.youtube.com/watch?v=_BpMYeUFXv8>`_ - The C++Now! 2012 talk 3188 describing this technique in more detail. 3189#. `Sean Parent's Papers and Presentations 3190 <http://github.com/sean-parent/sean-parent.github.com/wiki/Papers-and-Presentations>`_ 3191 - A Github project full of links to slides, video, and sometimes code. 3192 3193When deciding between creating a type hierarchy (with either tagged or virtual 3194dispatch) and using templates or concepts-based polymorphism, consider whether 3195there is some refinement of an abstract base class which is a semantically 3196meaningful type on an interface boundary. If anything more refined than the 3197root abstract interface is meaningless to talk about as a partial extension of 3198the semantic model, then your use case likely fits better with polymorphism and 3199you should avoid using virtual dispatch. However, there may be some exigent 3200circumstances that require one technique or the other to be used. 3201 3202If you do need to introduce a type hierarchy, we prefer to use explicitly 3203closed type hierarchies with manual tagged dispatch and/or RTTI rather than the 3204open inheritance model and virtual dispatch that is more common in C++ code. 3205This is because LLVM rarely encourages library consumers to extend its core 3206types, and leverages the closed and tag-dispatched nature of its hierarchies to 3207generate significantly more efficient code. We have also found that a large 3208amount of our usage of type hierarchies fits better with tag-based pattern 3209matching rather than dynamic dispatch across a common interface. Within LLVM we 3210have built custom helpers to facilitate this design. See this document's 3211section on :ref:`isa and dyn_cast <isa>` and our :doc:`detailed document 3212<HowToSetUpLLVMStyleRTTI>` which describes how you can implement this 3213pattern for use with the LLVM helpers. 3214 3215.. _abi_breaking_checks: 3216 3217ABI Breaking Checks 3218------------------- 3219 3220Checks and asserts that alter the LLVM C++ ABI are predicated on the 3221preprocessor symbol `LLVM_ENABLE_ABI_BREAKING_CHECKS` -- LLVM 3222libraries built with `LLVM_ENABLE_ABI_BREAKING_CHECKS` are not ABI 3223compatible LLVM libraries built without it defined. By default, 3224turning on assertions also turns on `LLVM_ENABLE_ABI_BREAKING_CHECKS` 3225so a default +Asserts build is not ABI compatible with a 3226default -Asserts build. Clients that want ABI compatibility 3227between +Asserts and -Asserts builds should use the CMake or autoconf 3228build systems to set `LLVM_ENABLE_ABI_BREAKING_CHECKS` independently 3229of `LLVM_ENABLE_ASSERTIONS`. 3230 3231.. _coreclasses: 3232 3233The Core LLVM Class Hierarchy Reference 3234======================================= 3235 3236``#include "llvm/IR/Type.h"`` 3237 3238header source: `Type.h <http://llvm.org/doxygen/Type_8h-source.html>`_ 3239 3240doxygen info: `Type Clases <http://llvm.org/doxygen/classllvm_1_1Type.html>`_ 3241 3242The Core LLVM classes are the primary means of representing the program being 3243inspected or transformed. The core LLVM classes are defined in header files in 3244the ``include/llvm/IR`` directory, and implemented in the ``lib/IR`` 3245directory. It's worth noting that, for historical reasons, this library is 3246called ``libLLVMCore.so``, not ``libLLVMIR.so`` as you might expect. 3247 3248.. _Type: 3249 3250The Type class and Derived Types 3251-------------------------------- 3252 3253``Type`` is a superclass of all type classes. Every ``Value`` has a ``Type``. 3254``Type`` cannot be instantiated directly but only through its subclasses. 3255Certain primitive types (``VoidType``, ``LabelType``, ``FloatType`` and 3256``DoubleType``) have hidden subclasses. They are hidden because they offer no 3257useful functionality beyond what the ``Type`` class offers except to distinguish 3258themselves from other subclasses of ``Type``. 3259 3260All other types are subclasses of ``DerivedType``. Types can be named, but this 3261is not a requirement. There exists exactly one instance of a given shape at any 3262one time. This allows type equality to be performed with address equality of 3263the Type Instance. That is, given two ``Type*`` values, the types are identical 3264if the pointers are identical. 3265 3266.. _m_Type: 3267 3268Important Public Methods 3269^^^^^^^^^^^^^^^^^^^^^^^^ 3270 3271* ``bool isIntegerTy() const``: Returns true for any integer type. 3272 3273* ``bool isFloatingPointTy()``: Return true if this is one of the five 3274 floating point types. 3275 3276* ``bool isSized()``: Return true if the type has known size. Things 3277 that don't have a size are abstract types, labels and void. 3278 3279.. _derivedtypes: 3280 3281Important Derived Types 3282^^^^^^^^^^^^^^^^^^^^^^^ 3283 3284``IntegerType`` 3285 Subclass of DerivedType that represents integer types of any bit width. Any 3286 bit width between ``IntegerType::MIN_INT_BITS`` (1) and 3287 ``IntegerType::MAX_INT_BITS`` (~8 million) can be represented. 3288 3289 * ``static const IntegerType* get(unsigned NumBits)``: get an integer 3290 type of a specific bit width. 3291 3292 * ``unsigned getBitWidth() const``: Get the bit width of an integer type. 3293 3294``SequentialType`` 3295 This is subclassed by ArrayType and VectorType. 3296 3297 * ``const Type * getElementType() const``: Returns the type of each 3298 of the elements in the sequential type. 3299 3300 * ``uint64_t getNumElements() const``: Returns the number of elements 3301 in the sequential type. 3302 3303``ArrayType`` 3304 This is a subclass of SequentialType and defines the interface for array 3305 types. 3306 3307``PointerType`` 3308 Subclass of Type for pointer types. 3309 3310``VectorType`` 3311 Subclass of SequentialType for vector types. A vector type is similar to an 3312 ArrayType but is distinguished because it is a first class type whereas 3313 ArrayType is not. Vector types are used for vector operations and are usually 3314 small vectors of an integer or floating point type. 3315 3316``StructType`` 3317 Subclass of DerivedTypes for struct types. 3318 3319.. _FunctionType: 3320 3321``FunctionType`` 3322 Subclass of DerivedTypes for function types. 3323 3324 * ``bool isVarArg() const``: Returns true if it's a vararg function. 3325 3326 * ``const Type * getReturnType() const``: Returns the return type of the 3327 function. 3328 3329 * ``const Type * getParamType (unsigned i)``: Returns the type of the ith 3330 parameter. 3331 3332 * ``const unsigned getNumParams() const``: Returns the number of formal 3333 parameters. 3334 3335.. _Module: 3336 3337The ``Module`` class 3338-------------------- 3339 3340``#include "llvm/IR/Module.h"`` 3341 3342header source: `Module.h <http://llvm.org/doxygen/Module_8h-source.html>`_ 3343 3344doxygen info: `Module Class <http://llvm.org/doxygen/classllvm_1_1Module.html>`_ 3345 3346The ``Module`` class represents the top level structure present in LLVM 3347programs. An LLVM module is effectively either a translation unit of the 3348original program or a combination of several translation units merged by the 3349linker. The ``Module`` class keeps track of a list of :ref:`Function 3350<c_Function>`\ s, a list of GlobalVariable_\ s, and a SymbolTable_. 3351Additionally, it contains a few helpful member functions that try to make common 3352operations easy. 3353 3354.. _m_Module: 3355 3356Important Public Members of the ``Module`` class 3357^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3358 3359* ``Module::Module(std::string name = "")`` 3360 3361 Constructing a Module_ is easy. You can optionally provide a name for it 3362 (probably based on the name of the translation unit). 3363 3364* | ``Module::iterator`` - Typedef for function list iterator 3365 | ``Module::const_iterator`` - Typedef for const_iterator. 3366 | ``begin()``, ``end()``, ``size()``, ``empty()`` 3367 3368 These are forwarding methods that make it easy to access the contents of a 3369 ``Module`` object's :ref:`Function <c_Function>` list. 3370 3371* ``Module::FunctionListType &getFunctionList()`` 3372 3373 Returns the list of :ref:`Function <c_Function>`\ s. This is necessary to use 3374 when you need to update the list or perform a complex action that doesn't have 3375 a forwarding method. 3376 3377---------------- 3378 3379* | ``Module::global_iterator`` - Typedef for global variable list iterator 3380 | ``Module::const_global_iterator`` - Typedef for const_iterator. 3381 | ``global_begin()``, ``global_end()``, ``global_size()``, ``global_empty()`` 3382 3383 These are forwarding methods that make it easy to access the contents of a 3384 ``Module`` object's GlobalVariable_ list. 3385 3386* ``Module::GlobalListType &getGlobalList()`` 3387 3388 Returns the list of GlobalVariable_\ s. This is necessary to use when you 3389 need to update the list or perform a complex action that doesn't have a 3390 forwarding method. 3391 3392---------------- 3393 3394* ``SymbolTable *getSymbolTable()`` 3395 3396 Return a reference to the SymbolTable_ for this ``Module``. 3397 3398---------------- 3399 3400* ``Function *getFunction(StringRef Name) const`` 3401 3402 Look up the specified function in the ``Module`` SymbolTable_. If it does not 3403 exist, return ``null``. 3404 3405* ``Function *getOrInsertFunction(const std::string &Name, const FunctionType 3406 *T)`` 3407 3408 Look up the specified function in the ``Module`` SymbolTable_. If it does not 3409 exist, add an external declaration for the function and return it. 3410 3411* ``std::string getTypeName(const Type *Ty)`` 3412 3413 If there is at least one entry in the SymbolTable_ for the specified Type_, 3414 return it. Otherwise return the empty string. 3415 3416* ``bool addTypeName(const std::string &Name, const Type *Ty)`` 3417 3418 Insert an entry in the SymbolTable_ mapping ``Name`` to ``Ty``. If there is 3419 already an entry for this name, true is returned and the SymbolTable_ is not 3420 modified. 3421 3422.. _Value: 3423 3424The ``Value`` class 3425------------------- 3426 3427``#include "llvm/IR/Value.h"`` 3428 3429header source: `Value.h <http://llvm.org/doxygen/Value_8h-source.html>`_ 3430 3431doxygen info: `Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_ 3432 3433The ``Value`` class is the most important class in the LLVM Source base. It 3434represents a typed value that may be used (among other things) as an operand to 3435an instruction. There are many different types of ``Value``\ s, such as 3436Constant_\ s, Argument_\ s. Even Instruction_\ s and :ref:`Function 3437<c_Function>`\ s are ``Value``\ s. 3438 3439A particular ``Value`` may be used many times in the LLVM representation for a 3440program. For example, an incoming argument to a function (represented with an 3441instance of the Argument_ class) is "used" by every instruction in the function 3442that references the argument. To keep track of this relationship, the ``Value`` 3443class keeps a list of all of the ``User``\ s that is using it (the User_ class 3444is a base class for all nodes in the LLVM graph that can refer to ``Value``\ s). 3445This use list is how LLVM represents def-use information in the program, and is 3446accessible through the ``use_*`` methods, shown below. 3447 3448Because LLVM is a typed representation, every LLVM ``Value`` is typed, and this 3449Type_ is available through the ``getType()`` method. In addition, all LLVM 3450values can be named. The "name" of the ``Value`` is a symbolic string printed 3451in the LLVM code: 3452 3453.. code-block:: llvm 3454 3455 %foo = add i32 1, 2 3456 3457.. _nameWarning: 3458 3459The name of this instruction is "foo". **NOTE** that the name of any value may 3460be missing (an empty string), so names should **ONLY** be used for debugging 3461(making the source code easier to read, debugging printouts), they should not be 3462used to keep track of values or map between them. For this purpose, use a 3463``std::map`` of pointers to the ``Value`` itself instead. 3464 3465One important aspect of LLVM is that there is no distinction between an SSA 3466variable and the operation that produces it. Because of this, any reference to 3467the value produced by an instruction (or the value available as an incoming 3468argument, for example) is represented as a direct pointer to the instance of the 3469class that represents this value. Although this may take some getting used to, 3470it simplifies the representation and makes it easier to manipulate. 3471 3472.. _m_Value: 3473 3474Important Public Members of the ``Value`` class 3475^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3476 3477* | ``Value::use_iterator`` - Typedef for iterator over the use-list 3478 | ``Value::const_use_iterator`` - Typedef for const_iterator over the 3479 use-list 3480 | ``unsigned use_size()`` - Returns the number of users of the value. 3481 | ``bool use_empty()`` - Returns true if there are no users. 3482 | ``use_iterator use_begin()`` - Get an iterator to the start of the 3483 use-list. 3484 | ``use_iterator use_end()`` - Get an iterator to the end of the use-list. 3485 | ``User *use_back()`` - Returns the last element in the list. 3486 3487 These methods are the interface to access the def-use information in LLVM. 3488 As with all other iterators in LLVM, the naming conventions follow the 3489 conventions defined by the STL_. 3490 3491* ``Type *getType() const`` 3492 This method returns the Type of the Value. 3493 3494* | ``bool hasName() const`` 3495 | ``std::string getName() const`` 3496 | ``void setName(const std::string &Name)`` 3497 3498 This family of methods is used to access and assign a name to a ``Value``, be 3499 aware of the :ref:`precaution above <nameWarning>`. 3500 3501* ``void replaceAllUsesWith(Value *V)`` 3502 3503 This method traverses the use list of a ``Value`` changing all User_\ s of the 3504 current value to refer to "``V``" instead. For example, if you detect that an 3505 instruction always produces a constant value (for example through constant 3506 folding), you can replace all uses of the instruction with the constant like 3507 this: 3508 3509 .. code-block:: c++ 3510 3511 Inst->replaceAllUsesWith(ConstVal); 3512 3513.. _User: 3514 3515The ``User`` class 3516------------------ 3517 3518``#include "llvm/IR/User.h"`` 3519 3520header source: `User.h <http://llvm.org/doxygen/User_8h-source.html>`_ 3521 3522doxygen info: `User Class <http://llvm.org/doxygen/classllvm_1_1User.html>`_ 3523 3524Superclass: Value_ 3525 3526The ``User`` class is the common base class of all LLVM nodes that may refer to 3527``Value``\ s. It exposes a list of "Operands" that are all of the ``Value``\ s 3528that the User is referring to. The ``User`` class itself is a subclass of 3529``Value``. 3530 3531The operands of a ``User`` point directly to the LLVM ``Value`` that it refers 3532to. Because LLVM uses Static Single Assignment (SSA) form, there can only be 3533one definition referred to, allowing this direct connection. This connection 3534provides the use-def information in LLVM. 3535 3536.. _m_User: 3537 3538Important Public Members of the ``User`` class 3539^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3540 3541The ``User`` class exposes the operand list in two ways: through an index access 3542interface and through an iterator based interface. 3543 3544* | ``Value *getOperand(unsigned i)`` 3545 | ``unsigned getNumOperands()`` 3546 3547 These two methods expose the operands of the ``User`` in a convenient form for 3548 direct access. 3549 3550* | ``User::op_iterator`` - Typedef for iterator over the operand list 3551 | ``op_iterator op_begin()`` - Get an iterator to the start of the operand 3552 list. 3553 | ``op_iterator op_end()`` - Get an iterator to the end of the operand list. 3554 3555 Together, these methods make up the iterator based interface to the operands 3556 of a ``User``. 3557 3558 3559.. _Instruction: 3560 3561The ``Instruction`` class 3562------------------------- 3563 3564``#include "llvm/IR/Instruction.h"`` 3565 3566header source: `Instruction.h 3567<http://llvm.org/doxygen/Instruction_8h-source.html>`_ 3568 3569doxygen info: `Instruction Class 3570<http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ 3571 3572Superclasses: User_, Value_ 3573 3574The ``Instruction`` class is the common base class for all LLVM instructions. 3575It provides only a few methods, but is a very commonly used class. The primary 3576data tracked by the ``Instruction`` class itself is the opcode (instruction 3577type) and the parent BasicBlock_ the ``Instruction`` is embedded into. To 3578represent a specific type of instruction, one of many subclasses of 3579``Instruction`` are used. 3580 3581Because the ``Instruction`` class subclasses the User_ class, its operands can 3582be accessed in the same way as for other ``User``\ s (with the 3583``getOperand()``/``getNumOperands()`` and ``op_begin()``/``op_end()`` methods). 3584An important file for the ``Instruction`` class is the ``llvm/Instruction.def`` 3585file. This file contains some meta-data about the various different types of 3586instructions in LLVM. It describes the enum values that are used as opcodes 3587(for example ``Instruction::Add`` and ``Instruction::ICmp``), as well as the 3588concrete sub-classes of ``Instruction`` that implement the instruction (for 3589example BinaryOperator_ and CmpInst_). Unfortunately, the use of macros in this 3590file confuses doxygen, so these enum values don't show up correctly in the 3591`doxygen output <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_. 3592 3593.. _s_Instruction: 3594 3595Important Subclasses of the ``Instruction`` class 3596^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3597 3598.. _BinaryOperator: 3599 3600* ``BinaryOperator`` 3601 3602 This subclasses represents all two operand instructions whose operands must be 3603 the same type, except for the comparison instructions. 3604 3605.. _CastInst: 3606 3607* ``CastInst`` 3608 This subclass is the parent of the 12 casting instructions. It provides 3609 common operations on cast instructions. 3610 3611.. _CmpInst: 3612 3613* ``CmpInst`` 3614 3615 This subclass respresents the two comparison instructions, 3616 `ICmpInst <LangRef.html#i_icmp>`_ (integer opreands), and 3617 `FCmpInst <LangRef.html#i_fcmp>`_ (floating point operands). 3618 3619.. _TerminatorInst: 3620 3621* ``TerminatorInst`` 3622 3623 This subclass is the parent of all terminator instructions (those which can 3624 terminate a block). 3625 3626.. _m_Instruction: 3627 3628Important Public Members of the ``Instruction`` class 3629^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3630 3631* ``BasicBlock *getParent()`` 3632 3633 Returns the BasicBlock_ that this 3634 ``Instruction`` is embedded into. 3635 3636* ``bool mayWriteToMemory()`` 3637 3638 Returns true if the instruction writes to memory, i.e. it is a ``call``, 3639 ``free``, ``invoke``, or ``store``. 3640 3641* ``unsigned getOpcode()`` 3642 3643 Returns the opcode for the ``Instruction``. 3644 3645* ``Instruction *clone() const`` 3646 3647 Returns another instance of the specified instruction, identical in all ways 3648 to the original except that the instruction has no parent (i.e. it's not 3649 embedded into a BasicBlock_), and it has no name. 3650 3651.. _Constant: 3652 3653The ``Constant`` class and subclasses 3654------------------------------------- 3655 3656Constant represents a base class for different types of constants. It is 3657subclassed by ConstantInt, ConstantArray, etc. for representing the various 3658types of Constants. GlobalValue_ is also a subclass, which represents the 3659address of a global variable or function. 3660 3661.. _s_Constant: 3662 3663Important Subclasses of Constant 3664^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3665 3666* ConstantInt : This subclass of Constant represents an integer constant of 3667 any width. 3668 3669 * ``const APInt& getValue() const``: Returns the underlying 3670 value of this constant, an APInt value. 3671 3672 * ``int64_t getSExtValue() const``: Converts the underlying APInt value to an 3673 int64_t via sign extension. If the value (not the bit width) of the APInt 3674 is too large to fit in an int64_t, an assertion will result. For this 3675 reason, use of this method is discouraged. 3676 3677 * ``uint64_t getZExtValue() const``: Converts the underlying APInt value 3678 to a uint64_t via zero extension. IF the value (not the bit width) of the 3679 APInt is too large to fit in a uint64_t, an assertion will result. For this 3680 reason, use of this method is discouraged. 3681 3682 * ``static ConstantInt* get(const APInt& Val)``: Returns the ConstantInt 3683 object that represents the value provided by ``Val``. The type is implied 3684 as the IntegerType that corresponds to the bit width of ``Val``. 3685 3686 * ``static ConstantInt* get(const Type *Ty, uint64_t Val)``: Returns the 3687 ConstantInt object that represents the value provided by ``Val`` for integer 3688 type ``Ty``. 3689 3690* ConstantFP : This class represents a floating point constant. 3691 3692 * ``double getValue() const``: Returns the underlying value of this constant. 3693 3694* ConstantArray : This represents a constant array. 3695 3696 * ``const std::vector<Use> &getValues() const``: Returns a vector of 3697 component constants that makeup this array. 3698 3699* ConstantStruct : This represents a constant struct. 3700 3701 * ``const std::vector<Use> &getValues() const``: Returns a vector of 3702 component constants that makeup this array. 3703 3704* GlobalValue : This represents either a global variable or a function. In 3705 either case, the value is a constant fixed address (after linking). 3706 3707.. _GlobalValue: 3708 3709The ``GlobalValue`` class 3710------------------------- 3711 3712``#include "llvm/IR/GlobalValue.h"`` 3713 3714header source: `GlobalValue.h 3715<http://llvm.org/doxygen/GlobalValue_8h-source.html>`_ 3716 3717doxygen info: `GlobalValue Class 3718<http://llvm.org/doxygen/classllvm_1_1GlobalValue.html>`_ 3719 3720Superclasses: Constant_, User_, Value_ 3721 3722Global values ( GlobalVariable_\ s or :ref:`Function <c_Function>`\ s) are the 3723only LLVM values that are visible in the bodies of all :ref:`Function 3724<c_Function>`\ s. Because they are visible at global scope, they are also 3725subject to linking with other globals defined in different translation units. 3726To control the linking process, ``GlobalValue``\ s know their linkage rules. 3727Specifically, ``GlobalValue``\ s know whether they have internal or external 3728linkage, as defined by the ``LinkageTypes`` enumeration. 3729 3730If a ``GlobalValue`` has internal linkage (equivalent to being ``static`` in C), 3731it is not visible to code outside the current translation unit, and does not 3732participate in linking. If it has external linkage, it is visible to external 3733code, and does participate in linking. In addition to linkage information, 3734``GlobalValue``\ s keep track of which Module_ they are currently part of. 3735 3736Because ``GlobalValue``\ s are memory objects, they are always referred to by 3737their **address**. As such, the Type_ of a global is always a pointer to its 3738contents. It is important to remember this when using the ``GetElementPtrInst`` 3739instruction because this pointer must be dereferenced first. For example, if 3740you have a ``GlobalVariable`` (a subclass of ``GlobalValue)`` that is an array 3741of 24 ints, type ``[24 x i32]``, then the ``GlobalVariable`` is a pointer to 3742that array. Although the address of the first element of this array and the 3743value of the ``GlobalVariable`` are the same, they have different types. The 3744``GlobalVariable``'s type is ``[24 x i32]``. The first element's type is 3745``i32.`` Because of this, accessing a global value requires you to dereference 3746the pointer with ``GetElementPtrInst`` first, then its elements can be accessed. 3747This is explained in the `LLVM Language Reference Manual 3748<LangRef.html#globalvars>`_. 3749 3750.. _m_GlobalValue: 3751 3752Important Public Members of the ``GlobalValue`` class 3753^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3754 3755* | ``bool hasInternalLinkage() const`` 3756 | ``bool hasExternalLinkage() const`` 3757 | ``void setInternalLinkage(bool HasInternalLinkage)`` 3758 3759 These methods manipulate the linkage characteristics of the ``GlobalValue``. 3760 3761* ``Module *getParent()`` 3762 3763 This returns the Module_ that the 3764 GlobalValue is currently embedded into. 3765 3766.. _c_Function: 3767 3768The ``Function`` class 3769---------------------- 3770 3771``#include "llvm/IR/Function.h"`` 3772 3773header source: `Function.h <http://llvm.org/doxygen/Function_8h-source.html>`_ 3774 3775doxygen info: `Function Class 3776<http://llvm.org/doxygen/classllvm_1_1Function.html>`_ 3777 3778Superclasses: GlobalValue_, Constant_, User_, Value_ 3779 3780The ``Function`` class represents a single procedure in LLVM. It is actually 3781one of the more complex classes in the LLVM hierarchy because it must keep track 3782of a large amount of data. The ``Function`` class keeps track of a list of 3783BasicBlock_\ s, a list of formal Argument_\ s, and a SymbolTable_. 3784 3785The list of BasicBlock_\ s is the most commonly used part of ``Function`` 3786objects. The list imposes an implicit ordering of the blocks in the function, 3787which indicate how the code will be laid out by the backend. Additionally, the 3788first BasicBlock_ is the implicit entry node for the ``Function``. It is not 3789legal in LLVM to explicitly branch to this initial block. There are no implicit 3790exit nodes, and in fact there may be multiple exit nodes from a single 3791``Function``. If the BasicBlock_ list is empty, this indicates that the 3792``Function`` is actually a function declaration: the actual body of the function 3793hasn't been linked in yet. 3794 3795In addition to a list of BasicBlock_\ s, the ``Function`` class also keeps track 3796of the list of formal Argument_\ s that the function receives. This container 3797manages the lifetime of the Argument_ nodes, just like the BasicBlock_ list does 3798for the BasicBlock_\ s. 3799 3800The SymbolTable_ is a very rarely used LLVM feature that is only used when you 3801have to look up a value by name. Aside from that, the SymbolTable_ is used 3802internally to make sure that there are not conflicts between the names of 3803Instruction_\ s, BasicBlock_\ s, or Argument_\ s in the function body. 3804 3805Note that ``Function`` is a GlobalValue_ and therefore also a Constant_. The 3806value of the function is its address (after linking) which is guaranteed to be 3807constant. 3808 3809.. _m_Function: 3810 3811Important Public Members of the ``Function`` 3812^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3813 3814* ``Function(const FunctionType *Ty, LinkageTypes Linkage, 3815 const std::string &N = "", Module* Parent = 0)`` 3816 3817 Constructor used when you need to create new ``Function``\ s to add the 3818 program. The constructor must specify the type of the function to create and 3819 what type of linkage the function should have. The FunctionType_ argument 3820 specifies the formal arguments and return value for the function. The same 3821 FunctionType_ value can be used to create multiple functions. The ``Parent`` 3822 argument specifies the Module in which the function is defined. If this 3823 argument is provided, the function will automatically be inserted into that 3824 module's list of functions. 3825 3826* ``bool isDeclaration()`` 3827 3828 Return whether or not the ``Function`` has a body defined. If the function is 3829 "external", it does not have a body, and thus must be resolved by linking with 3830 a function defined in a different translation unit. 3831 3832* | ``Function::iterator`` - Typedef for basic block list iterator 3833 | ``Function::const_iterator`` - Typedef for const_iterator. 3834 | ``begin()``, ``end()``, ``size()``, ``empty()`` 3835 3836 These are forwarding methods that make it easy to access the contents of a 3837 ``Function`` object's BasicBlock_ list. 3838 3839* ``Function::BasicBlockListType &getBasicBlockList()`` 3840 3841 Returns the list of BasicBlock_\ s. This is necessary to use when you need to 3842 update the list or perform a complex action that doesn't have a forwarding 3843 method. 3844 3845* | ``Function::arg_iterator`` - Typedef for the argument list iterator 3846 | ``Function::const_arg_iterator`` - Typedef for const_iterator. 3847 | ``arg_begin()``, ``arg_end()``, ``arg_size()``, ``arg_empty()`` 3848 3849 These are forwarding methods that make it easy to access the contents of a 3850 ``Function`` object's Argument_ list. 3851 3852* ``Function::ArgumentListType &getArgumentList()`` 3853 3854 Returns the list of Argument_. This is necessary to use when you need to 3855 update the list or perform a complex action that doesn't have a forwarding 3856 method. 3857 3858* ``BasicBlock &getEntryBlock()`` 3859 3860 Returns the entry ``BasicBlock`` for the function. Because the entry block 3861 for the function is always the first block, this returns the first block of 3862 the ``Function``. 3863 3864* | ``Type *getReturnType()`` 3865 | ``FunctionType *getFunctionType()`` 3866 3867 This traverses the Type_ of the ``Function`` and returns the return type of 3868 the function, or the FunctionType_ of the actual function. 3869 3870* ``SymbolTable *getSymbolTable()`` 3871 3872 Return a pointer to the SymbolTable_ for this ``Function``. 3873 3874.. _GlobalVariable: 3875 3876The ``GlobalVariable`` class 3877---------------------------- 3878 3879``#include "llvm/IR/GlobalVariable.h"`` 3880 3881header source: `GlobalVariable.h 3882<http://llvm.org/doxygen/GlobalVariable_8h-source.html>`_ 3883 3884doxygen info: `GlobalVariable Class 3885<http://llvm.org/doxygen/classllvm_1_1GlobalVariable.html>`_ 3886 3887Superclasses: GlobalValue_, Constant_, User_, Value_ 3888 3889Global variables are represented with the (surprise surprise) ``GlobalVariable`` 3890class. Like functions, ``GlobalVariable``\ s are also subclasses of 3891GlobalValue_, and as such are always referenced by their address (global values 3892must live in memory, so their "name" refers to their constant address). See 3893GlobalValue_ for more on this. Global variables may have an initial value 3894(which must be a Constant_), and if they have an initializer, they may be marked 3895as "constant" themselves (indicating that their contents never change at 3896runtime). 3897 3898.. _m_GlobalVariable: 3899 3900Important Public Members of the ``GlobalVariable`` class 3901^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3902 3903* ``GlobalVariable(const Type *Ty, bool isConstant, LinkageTypes &Linkage, 3904 Constant *Initializer = 0, const std::string &Name = "", Module* Parent = 0)`` 3905 3906 Create a new global variable of the specified type. If ``isConstant`` is true 3907 then the global variable will be marked as unchanging for the program. The 3908 Linkage parameter specifies the type of linkage (internal, external, weak, 3909 linkonce, appending) for the variable. If the linkage is InternalLinkage, 3910 WeakAnyLinkage, WeakODRLinkage, LinkOnceAnyLinkage or LinkOnceODRLinkage, then 3911 the resultant global variable will have internal linkage. AppendingLinkage 3912 concatenates together all instances (in different translation units) of the 3913 variable into a single variable but is only applicable to arrays. See the 3914 `LLVM Language Reference <LangRef.html#modulestructure>`_ for further details 3915 on linkage types. Optionally an initializer, a name, and the module to put 3916 the variable into may be specified for the global variable as well. 3917 3918* ``bool isConstant() const`` 3919 3920 Returns true if this is a global variable that is known not to be modified at 3921 runtime. 3922 3923* ``bool hasInitializer()`` 3924 3925 Returns true if this ``GlobalVariable`` has an intializer. 3926 3927* ``Constant *getInitializer()`` 3928 3929 Returns the initial value for a ``GlobalVariable``. It is not legal to call 3930 this method if there is no initializer. 3931 3932.. _BasicBlock: 3933 3934The ``BasicBlock`` class 3935------------------------ 3936 3937``#include "llvm/IR/BasicBlock.h"`` 3938 3939header source: `BasicBlock.h 3940<http://llvm.org/doxygen/BasicBlock_8h-source.html>`_ 3941 3942doxygen info: `BasicBlock Class 3943<http://llvm.org/doxygen/classllvm_1_1BasicBlock.html>`_ 3944 3945Superclass: Value_ 3946 3947This class represents a single entry single exit section of the code, commonly 3948known as a basic block by the compiler community. The ``BasicBlock`` class 3949maintains a list of Instruction_\ s, which form the body of the block. Matching 3950the language definition, the last element of this list of instructions is always 3951a terminator instruction (a subclass of the TerminatorInst_ class). 3952 3953In addition to tracking the list of instructions that make up the block, the 3954``BasicBlock`` class also keeps track of the :ref:`Function <c_Function>` that 3955it is embedded into. 3956 3957Note that ``BasicBlock``\ s themselves are Value_\ s, because they are 3958referenced by instructions like branches and can go in the switch tables. 3959``BasicBlock``\ s have type ``label``. 3960 3961.. _m_BasicBlock: 3962 3963Important Public Members of the ``BasicBlock`` class 3964^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3965 3966* ``BasicBlock(const std::string &Name = "", Function *Parent = 0)`` 3967 3968 The ``BasicBlock`` constructor is used to create new basic blocks for 3969 insertion into a function. The constructor optionally takes a name for the 3970 new block, and a :ref:`Function <c_Function>` to insert it into. If the 3971 ``Parent`` parameter is specified, the new ``BasicBlock`` is automatically 3972 inserted at the end of the specified :ref:`Function <c_Function>`, if not 3973 specified, the BasicBlock must be manually inserted into the :ref:`Function 3974 <c_Function>`. 3975 3976* | ``BasicBlock::iterator`` - Typedef for instruction list iterator 3977 | ``BasicBlock::const_iterator`` - Typedef for const_iterator. 3978 | ``begin()``, ``end()``, ``front()``, ``back()``, 3979 ``size()``, ``empty()`` 3980 STL-style functions for accessing the instruction list. 3981 3982 These methods and typedefs are forwarding functions that have the same 3983 semantics as the standard library methods of the same names. These methods 3984 expose the underlying instruction list of a basic block in a way that is easy 3985 to manipulate. To get the full complement of container operations (including 3986 operations to update the list), you must use the ``getInstList()`` method. 3987 3988* ``BasicBlock::InstListType &getInstList()`` 3989 3990 This method is used to get access to the underlying container that actually 3991 holds the Instructions. This method must be used when there isn't a 3992 forwarding function in the ``BasicBlock`` class for the operation that you 3993 would like to perform. Because there are no forwarding functions for 3994 "updating" operations, you need to use this if you want to update the contents 3995 of a ``BasicBlock``. 3996 3997* ``Function *getParent()`` 3998 3999 Returns a pointer to :ref:`Function <c_Function>` the block is embedded into, 4000 or a null pointer if it is homeless. 4001 4002* ``TerminatorInst *getTerminator()`` 4003 4004 Returns a pointer to the terminator instruction that appears at the end of the 4005 ``BasicBlock``. If there is no terminator instruction, or if the last 4006 instruction in the block is not a terminator, then a null pointer is returned. 4007 4008.. _Argument: 4009 4010The ``Argument`` class 4011---------------------- 4012 4013This subclass of Value defines the interface for incoming formal arguments to a 4014function. A Function maintains a list of its formal arguments. An argument has 4015a pointer to the parent Function. 4016 4017 4018