17daef91dSBill Wendling========================================== 27daef91dSBill WendlingThe LLVM Target-Independent Code Generator 37daef91dSBill Wendling========================================== 47daef91dSBill Wendling 5ff9feeb5SBill Wendling.. role:: raw-html(raw) 6ff9feeb5SBill Wendling :format: html 7ff9feeb5SBill Wendling 8ff9feeb5SBill Wendling.. raw:: html 9ff9feeb5SBill Wendling 10ff9feeb5SBill Wendling <style> 11ff9feeb5SBill Wendling .unknown { background-color: #C0C0C0; text-align: center; } 12ff9feeb5SBill Wendling .unknown:before { content: "?" } 13ff9feeb5SBill Wendling .no { background-color: #C11B17 } 14ff9feeb5SBill Wendling .no:before { content: "N" } 15ff9feeb5SBill Wendling .partial { background-color: #F88017 } 16ff9feeb5SBill Wendling .yes { background-color: #0F0; } 17ff9feeb5SBill Wendling .yes:before { content: "Y" } 18ceab0deaSJustin Holewinski .na { background-color: #6666FF; } 19ceab0deaSJustin Holewinski .na:before { content: "N/A" } 20ff9feeb5SBill Wendling </style> 21ff9feeb5SBill Wendling 22ff9feeb5SBill Wendling.. contents:: 23ff9feeb5SBill Wendling :local: 24ff9feeb5SBill Wendling 25ff9feeb5SBill Wendling.. warning:: 26ff9feeb5SBill Wendling This is a work in progress. 27ff9feeb5SBill Wendling 28ff9feeb5SBill WendlingIntroduction 29ff9feeb5SBill Wendling============ 30ff9feeb5SBill Wendling 31ff9feeb5SBill WendlingThe LLVM target-independent code generator is a framework that provides a suite 32ff9feeb5SBill Wendlingof reusable components for translating the LLVM internal representation to the 33ff9feeb5SBill Wendlingmachine code for a specified target---either in assembly form (suitable for a 34ff9feeb5SBill Wendlingstatic compiler) or in binary machine code format (usable for a JIT 35ff9feeb5SBill Wendlingcompiler). The LLVM target-independent code generator consists of six main 36ff9feeb5SBill Wendlingcomponents: 37ff9feeb5SBill Wendling 38ff9feeb5SBill Wendling1. `Abstract target description`_ interfaces which capture important properties 39ff9feeb5SBill Wendling about various aspects of the machine, independently of how they will be used. 40ff9feeb5SBill Wendling These interfaces are defined in ``include/llvm/Target/``. 41ff9feeb5SBill Wendling 42ff9feeb5SBill Wendling2. Classes used to represent the `code being generated`_ for a target. These 43ff9feeb5SBill Wendling classes are intended to be abstract enough to represent the machine code for 44ff9feeb5SBill Wendling *any* target machine. These classes are defined in 45ff9feeb5SBill Wendling ``include/llvm/CodeGen/``. At this level, concepts like "constant pool 46ff9feeb5SBill Wendling entries" and "jump tables" are explicitly exposed. 47ff9feeb5SBill Wendling 4820adbdd2SJustin Lebar3. Classes and algorithms used to represent code at the object file level, the 49ff9feeb5SBill Wendling `MC Layer`_. These classes represent assembly level constructs like labels, 50ff9feeb5SBill Wendling sections, and instructions. At this level, concepts like "constant pool 51ff9feeb5SBill Wendling entries" and "jump tables" don't exist. 52ff9feeb5SBill Wendling 53ff9feeb5SBill Wendling4. `Target-independent algorithms`_ used to implement various phases of native 54ff9feeb5SBill Wendling code generation (register allocation, scheduling, stack frame representation, 55ff9feeb5SBill Wendling etc). This code lives in ``lib/CodeGen/``. 56ff9feeb5SBill Wendling 57ff9feeb5SBill Wendling5. `Implementations of the abstract target description interfaces`_ for 58ff9feeb5SBill Wendling particular targets. These machine descriptions make use of the components 59ff9feeb5SBill Wendling provided by LLVM, and can optionally provide custom target-specific passes, 60ff9feeb5SBill Wendling to build complete code generators for a specific target. Target descriptions 61ff9feeb5SBill Wendling live in ``lib/Target/``. 62ff9feeb5SBill Wendling 63ff9feeb5SBill Wendling6. The target-independent JIT components. The LLVM JIT is completely target 64ff9feeb5SBill Wendling independent (it uses the ``TargetJITInfo`` structure to interface for 65ff9feeb5SBill Wendling target-specific issues. The code for the target-independent JIT lives in 66ff9feeb5SBill Wendling ``lib/ExecutionEngine/JIT``. 67ff9feeb5SBill Wendling 68ff9feeb5SBill WendlingDepending on which part of the code generator you are interested in working on, 69ff9feeb5SBill Wendlingdifferent pieces of this will be useful to you. In any case, you should be 70ff9feeb5SBill Wendlingfamiliar with the `target description`_ and `machine code representation`_ 71ff9feeb5SBill Wendlingclasses. If you want to add a backend for a new target, you will need to 72ff9feeb5SBill Wendling`implement the target description`_ classes for your new target and understand 731703e705SSean Silvathe :doc:`LLVM code representation <LangRef>`. If you are interested in 74ff9feeb5SBill Wendlingimplementing a new `code generation algorithm`_, it should only depend on the 75ff9feeb5SBill Wendlingtarget-description and machine code representation classes, ensuring that it is 76ff9feeb5SBill Wendlingportable. 77ff9feeb5SBill Wendling 78ff9feeb5SBill WendlingRequired components in the code generator 79ff9feeb5SBill Wendling----------------------------------------- 80ff9feeb5SBill Wendling 81ff9feeb5SBill WendlingThe two pieces of the LLVM code generator are the high-level interface to the 82ff9feeb5SBill Wendlingcode generator and the set of reusable components that can be used to build 83ff9feeb5SBill Wendlingtarget-specific backends. The two most important interfaces (:raw-html:`<tt>` 849cfc13d4SMicah Villmow`TargetMachine`_ :raw-html:`</tt>` and :raw-html:`<tt>` `DataLayout`_ 85ff9feeb5SBill Wendling:raw-html:`</tt>`) are the only ones that are required to be defined for a 86ff9feeb5SBill Wendlingbackend to fit into the LLVM system, but the others must be defined if the 87ff9feeb5SBill Wendlingreusable code generator components are going to be used. 88ff9feeb5SBill Wendling 89ff9feeb5SBill WendlingThis design has two important implications. The first is that LLVM can support 90ff9feeb5SBill Wendlingcompletely non-traditional code generation targets. For example, the C backend 91ff9feeb5SBill Wendlingdoes not require register allocation, instruction selection, or any of the other 92ff9feeb5SBill Wendlingstandard components provided by the system. As such, it only implements these 93ff9feeb5SBill Wendlingtwo interfaces, and does its own thing. Note that C backend was removed from the 94ff9feeb5SBill Wendlingtrunk since LLVM 3.1 release. Another example of a code generator like this is a 95ff9feeb5SBill Wendling(purely hypothetical) backend that converts LLVM to the GCC RTL form and uses 96ff9feeb5SBill WendlingGCC to emit machine code for a target. 97ff9feeb5SBill Wendling 98ff9feeb5SBill WendlingThis design also implies that it is possible to design and implement radically 99ff9feeb5SBill Wendlingdifferent code generators in the LLVM system that do not make use of any of the 100ff9feeb5SBill Wendlingbuilt-in components. Doing so is not recommended at all, but could be required 101ff9feeb5SBill Wendlingfor radically different targets that do not fit into the LLVM machine 102ff9feeb5SBill Wendlingdescription model: FPGAs for example. 103ff9feeb5SBill Wendling 104ff9feeb5SBill Wendling.. _high-level design of the code generator: 105ff9feeb5SBill Wendling 106ff9feeb5SBill WendlingThe high-level design of the code generator 107ff9feeb5SBill Wendling------------------------------------------- 108ff9feeb5SBill Wendling 109ff9feeb5SBill WendlingThe LLVM target-independent code generator is designed to support efficient and 110ff9feeb5SBill Wendlingquality code generation for standard register-based microprocessors. Code 111ff9feeb5SBill Wendlinggeneration in this model is divided into the following stages: 112ff9feeb5SBill Wendling 113ff9feeb5SBill Wendling1. `Instruction Selection`_ --- This phase determines an efficient way to 114ff9feeb5SBill Wendling express the input LLVM code in the target instruction set. This stage 115ff9feeb5SBill Wendling produces the initial code for the program in the target instruction set, then 116ff9feeb5SBill Wendling makes use of virtual registers in SSA form and physical registers that 117ff9feeb5SBill Wendling represent any required register assignments due to target constraints or 118ff9feeb5SBill Wendling calling conventions. This step turns the LLVM code into a DAG of target 119ff9feeb5SBill Wendling instructions. 120ff9feeb5SBill Wendling 121ff9feeb5SBill Wendling2. `Scheduling and Formation`_ --- This phase takes the DAG of target 122ff9feeb5SBill Wendling instructions produced by the instruction selection phase, determines an 123ff9feeb5SBill Wendling ordering of the instructions, then emits the instructions as :raw-html:`<tt>` 124ff9feeb5SBill Wendling `MachineInstr`_\s :raw-html:`</tt>` with that ordering. Note that we 125ff9feeb5SBill Wendling describe this in the `instruction selection section`_ because it operates on 126ff9feeb5SBill Wendling a `SelectionDAG`_. 127ff9feeb5SBill Wendling 128ff9feeb5SBill Wendling3. `SSA-based Machine Code Optimizations`_ --- This optional stage consists of a 129ff9feeb5SBill Wendling series of machine-code optimizations that operate on the SSA-form produced by 130ff9feeb5SBill Wendling the instruction selector. Optimizations like modulo-scheduling or peephole 131ff9feeb5SBill Wendling optimization work here. 132ff9feeb5SBill Wendling 133ff9feeb5SBill Wendling4. `Register Allocation`_ --- The target code is transformed from an infinite 134ff9feeb5SBill Wendling virtual register file in SSA form to the concrete register file used by the 135ff9feeb5SBill Wendling target. This phase introduces spill code and eliminates all virtual register 136ff9feeb5SBill Wendling references from the program. 137ff9feeb5SBill Wendling 138ff9feeb5SBill Wendling5. `Prolog/Epilog Code Insertion`_ --- Once the machine code has been generated 139ff9feeb5SBill Wendling for the function and the amount of stack space required is known (used for 140ff9feeb5SBill Wendling LLVM alloca's and spill slots), the prolog and epilog code for the function 141ff9feeb5SBill Wendling can be inserted and "abstract stack location references" can be eliminated. 142ff9feeb5SBill Wendling This stage is responsible for implementing optimizations like frame-pointer 143ff9feeb5SBill Wendling elimination and stack packing. 144ff9feeb5SBill Wendling 145ff9feeb5SBill Wendling6. `Late Machine Code Optimizations`_ --- Optimizations that operate on "final" 146ff9feeb5SBill Wendling machine code can go here, such as spill code scheduling and peephole 147ff9feeb5SBill Wendling optimizations. 148ff9feeb5SBill Wendling 149ff9feeb5SBill Wendling7. `Code Emission`_ --- The final stage actually puts out the code for the 150ff9feeb5SBill Wendling current function, either in the target assembler format or in machine 151ff9feeb5SBill Wendling code. 152ff9feeb5SBill Wendling 153ff9feeb5SBill WendlingThe code generator is based on the assumption that the instruction selector will 154ff9feeb5SBill Wendlinguse an optimal pattern matching selector to create high-quality sequences of 155ff9feeb5SBill Wendlingnative instructions. Alternative code generator designs based on pattern 156ff9feeb5SBill Wendlingexpansion and aggressive iterative peephole optimization are much slower. This 157ff9feeb5SBill Wendlingdesign permits efficient compilation (important for JIT environments) and 158ff9feeb5SBill Wendlingaggressive optimization (used when generating code offline) by allowing 159ff9feeb5SBill Wendlingcomponents of varying levels of sophistication to be used for any step of 160ff9feeb5SBill Wendlingcompilation. 161ff9feeb5SBill Wendling 162ff9feeb5SBill WendlingIn addition to these stages, target implementations can insert arbitrary 163ff9feeb5SBill Wendlingtarget-specific passes into the flow. For example, the X86 target uses a 164ff9feeb5SBill Wendlingspecial pass to handle the 80x87 floating point stack architecture. Other 165ff9feeb5SBill Wendlingtargets with unusual requirements can be supported with custom passes as needed. 166ff9feeb5SBill Wendling 167ff9feeb5SBill WendlingUsing TableGen for target description 168ff9feeb5SBill Wendling------------------------------------- 169ff9feeb5SBill Wendling 170ff9feeb5SBill WendlingThe target description classes require a detailed description of the target 171ff9feeb5SBill Wendlingarchitecture. These target descriptions often have a large amount of common 172ff9feeb5SBill Wendlinginformation (e.g., an ``add`` instruction is almost identical to a ``sub`` 173ff9feeb5SBill Wendlinginstruction). In order to allow the maximum amount of commonality to be 174ff9feeb5SBill Wendlingfactored out, the LLVM code generator uses the 175397ee6ecSSean Silva:doc:`TableGen/index` tool to describe big chunks of the 176ff9feeb5SBill Wendlingtarget machine, which allows the use of domain-specific and target-specific 177ff9feeb5SBill Wendlingabstractions to reduce the amount of repetition. 178ff9feeb5SBill Wendling 179ff9feeb5SBill WendlingAs LLVM continues to be developed and refined, we plan to move more and more of 180ff9feeb5SBill Wendlingthe target description to the ``.td`` form. Doing so gives us a number of 181ff9feeb5SBill Wendlingadvantages. The most important is that it makes it easier to port LLVM because 182ff9feeb5SBill Wendlingit reduces the amount of C++ code that has to be written, and the surface area 183ff9feeb5SBill Wendlingof the code generator that needs to be understood before someone can get 184ff9feeb5SBill Wendlingsomething working. Second, it makes it easier to change things. In particular, 185ff9feeb5SBill Wendlingif tables and other things are all emitted by ``tblgen``, we only need a change 186ff9feeb5SBill Wendlingin one place (``tblgen``) to update all of the targets to a new interface. 187ff9feeb5SBill Wendling 188ff9feeb5SBill Wendling.. _Abstract target description: 189ff9feeb5SBill Wendling.. _target description: 190ff9feeb5SBill Wendling 191ff9feeb5SBill WendlingTarget description classes 192ff9feeb5SBill Wendling========================== 193ff9feeb5SBill Wendling 194ff9feeb5SBill WendlingThe LLVM target description classes (located in the ``include/llvm/Target`` 195ff9feeb5SBill Wendlingdirectory) provide an abstract description of the target machine independent of 196ff9feeb5SBill Wendlingany particular client. These classes are designed to capture the *abstract* 197ff9feeb5SBill Wendlingproperties of the target (such as the instructions and registers it has), and do 198ff9feeb5SBill Wendlingnot incorporate any particular pieces of code generation algorithms. 199ff9feeb5SBill Wendling 2009cfc13d4SMicah VillmowAll of the target description classes (except the :raw-html:`<tt>` `DataLayout`_ 201ff9feeb5SBill Wendling:raw-html:`</tt>` class) are designed to be subclassed by the concrete target 202ff9feeb5SBill Wendlingimplementation, and have virtual methods implemented. To get to these 203ff9feeb5SBill Wendlingimplementations, the :raw-html:`<tt>` `TargetMachine`_ :raw-html:`</tt>` class 204ff9feeb5SBill Wendlingprovides accessors that should be implemented by the target. 205ff9feeb5SBill Wendling 206ff9feeb5SBill Wendling.. _TargetMachine: 207ff9feeb5SBill Wendling 208ff9feeb5SBill WendlingThe ``TargetMachine`` class 209ff9feeb5SBill Wendling--------------------------- 210ff9feeb5SBill Wendling 211ff9feeb5SBill WendlingThe ``TargetMachine`` class provides virtual methods that are used to access the 212ff9feeb5SBill Wendlingtarget-specific implementations of the various target description classes via 213ff9feeb5SBill Wendlingthe ``get*Info`` methods (``getInstrInfo``, ``getRegisterInfo``, 214ff9feeb5SBill Wendling``getFrameInfo``, etc.). This class is designed to be specialized by a concrete 215ff9feeb5SBill Wendlingtarget implementation (e.g., ``X86TargetMachine``) which implements the various 216ff9feeb5SBill Wendlingvirtual methods. The only required target description class is the 2179cfc13d4SMicah Villmow:raw-html:`<tt>` `DataLayout`_ :raw-html:`</tt>` class, but if the code 218ff9feeb5SBill Wendlinggenerator components are to be used, the other interfaces should be implemented 219ff9feeb5SBill Wendlingas well. 220ff9feeb5SBill Wendling 2219cfc13d4SMicah Villmow.. _DataLayout: 222ff9feeb5SBill Wendling 2239cfc13d4SMicah VillmowThe ``DataLayout`` class 224ff9feeb5SBill Wendling------------------------ 225ff9feeb5SBill Wendling 2269cfc13d4SMicah VillmowThe ``DataLayout`` class is the only required target description class, and it 22713539d1bSDmitri Gribenkois the only class that is not extensible (you cannot derive a new class from 2289cfc13d4SMicah Villmowit). ``DataLayout`` specifies information about how the target lays out memory 229ff9feeb5SBill Wendlingfor structures, the alignment requirements for various data types, the size of 230ff9feeb5SBill Wendlingpointers in the target, and whether the target is little-endian or 231ff9feeb5SBill Wendlingbig-endian. 232ff9feeb5SBill Wendling 2338bd389d8SDmitri Gribenko.. _TargetLowering: 234ff9feeb5SBill Wendling 235ff9feeb5SBill WendlingThe ``TargetLowering`` class 236ff9feeb5SBill Wendling---------------------------- 237ff9feeb5SBill Wendling 238ff9feeb5SBill WendlingThe ``TargetLowering`` class is used by SelectionDAG based instruction selectors 239ff9feeb5SBill Wendlingprimarily to describe how LLVM code should be lowered to SelectionDAG 240ff9feeb5SBill Wendlingoperations. Among other things, this class indicates: 241ff9feeb5SBill Wendling 242ff9feeb5SBill Wendling* an initial register class to use for various ``ValueType``\s, 243ff9feeb5SBill Wendling 244ff9feeb5SBill Wendling* which operations are natively supported by the target machine, 245ff9feeb5SBill Wendling 246ff9feeb5SBill Wendling* the return type of ``setcc`` operations, 247ff9feeb5SBill Wendling 248ff9feeb5SBill Wendling* the type to use for shift amounts, and 249ff9feeb5SBill Wendling 250ff9feeb5SBill Wendling* various high-level characteristics, like whether it is profitable to turn 25113539d1bSDmitri Gribenko division by a constant into a multiplication sequence. 252ff9feeb5SBill Wendling 2537174c5a0SDmitri Gribenko.. _TargetRegisterInfo: 2547174c5a0SDmitri Gribenko 255ff9feeb5SBill WendlingThe ``TargetRegisterInfo`` class 256ff9feeb5SBill Wendling-------------------------------- 257ff9feeb5SBill Wendling 258ff9feeb5SBill WendlingThe ``TargetRegisterInfo`` class is used to describe the register file of the 259ff9feeb5SBill Wendlingtarget and any interactions between the registers. 260ff9feeb5SBill Wendling 26170f4e794SEli BenderskyRegisters are represented in the code generator by unsigned integers. Physical 26270f4e794SEli Benderskyregisters (those that actually exist in the target description) are unique 26370f4e794SEli Benderskysmall numbers, and virtual registers are generally large. Note that 26470f4e794SEli Benderskyregister ``#0`` is reserved as a flag value. 265ff9feeb5SBill Wendling 266ff9feeb5SBill WendlingEach register in the processor description has an associated 267ff9feeb5SBill Wendling``TargetRegisterDesc`` entry, which provides a textual name for the register 268ff9feeb5SBill Wendling(used for assembly output and debugging dumps) and a set of aliases (used to 269ff9feeb5SBill Wendlingindicate whether one register overlaps with another). 270ff9feeb5SBill Wendling 271ff9feeb5SBill WendlingIn addition to the per-register description, the ``TargetRegisterInfo`` class 272ff9feeb5SBill Wendlingexposes a set of processor specific register classes (instances of the 273ff9feeb5SBill Wendling``TargetRegisterClass`` class). Each register class contains sets of registers 274ff9feeb5SBill Wendlingthat have the same properties (for example, they are all 32-bit integer 275ff9feeb5SBill Wendlingregisters). Each SSA virtual register created by the instruction selector has 276ff9feeb5SBill Wendlingan associated register class. When the register allocator runs, it replaces 277ff9feeb5SBill Wendlingvirtual registers with a physical register in the set. 278ff9feeb5SBill Wendling 279ff9feeb5SBill WendlingThe target-specific implementations of these classes is auto-generated from a 280397ee6ecSSean Silva:doc:`TableGen/index` description of the register file. 281ff9feeb5SBill Wendling 282ff9feeb5SBill Wendling.. _TargetInstrInfo: 283ff9feeb5SBill Wendling 284ff9feeb5SBill WendlingThe ``TargetInstrInfo`` class 285ff9feeb5SBill Wendling----------------------------- 286ff9feeb5SBill Wendling 287ff9feeb5SBill WendlingThe ``TargetInstrInfo`` class is used to describe the machine instructions 2880b85ba7dSEli Benderskysupported by the target. Descriptions define things like the mnemonic for 2890b85ba7dSEli Benderskythe opcode, the number of operands, the list of implicit register uses and defs, 2900b85ba7dSEli Benderskywhether the instruction has certain target-independent properties (accesses 2910b85ba7dSEli Benderskymemory, is commutable, etc), and holds any target-specific flags. 292ff9feeb5SBill Wendling 2934b916b21SNico WeberThe ``TargetFrameLowering`` class 2944773d0b4SDan Liew--------------------------------- 295ff9feeb5SBill Wendling 2964b916b21SNico WeberThe ``TargetFrameLowering`` class is used to provide information about the stack 297ff9feeb5SBill Wendlingframe layout of the target. It holds the direction of stack growth, the known 298ff9feeb5SBill Wendlingstack alignment on entry to each function, and the offset to the local area. 299ff9feeb5SBill WendlingThe offset to the local area is the offset from the stack pointer on function 300ff9feeb5SBill Wendlingentry to the first location where function data (local variables, spill 301ff9feeb5SBill Wendlinglocations) can be stored. 302ff9feeb5SBill Wendling 303ff9feeb5SBill WendlingThe ``TargetSubtarget`` class 304ff9feeb5SBill Wendling----------------------------- 305ff9feeb5SBill Wendling 306ff9feeb5SBill WendlingThe ``TargetSubtarget`` class is used to provide information about the specific 307ff9feeb5SBill Wendlingchip set being targeted. A sub-target informs code generation of which 308ff9feeb5SBill Wendlinginstructions are supported, instruction latencies and instruction execution 309ff9feeb5SBill Wendlingitinerary; i.e., which processing units are used, in what order, and for how 310ff9feeb5SBill Wendlinglong. 311ff9feeb5SBill Wendling 312ff9feeb5SBill WendlingThe ``TargetJITInfo`` class 313ff9feeb5SBill Wendling--------------------------- 314ff9feeb5SBill Wendling 315ff9feeb5SBill WendlingThe ``TargetJITInfo`` class exposes an abstract interface used by the 316ff9feeb5SBill WendlingJust-In-Time code generator to perform target-specific activities, such as 317ff9feeb5SBill Wendlingemitting stubs. If a ``TargetMachine`` supports JIT code generation, it should 318ff9feeb5SBill Wendlingprovide one of these objects through the ``getJITInfo`` method. 319ff9feeb5SBill Wendling 320ff9feeb5SBill Wendling.. _code being generated: 321ff9feeb5SBill Wendling.. _machine code representation: 322ff9feeb5SBill Wendling 323ff9feeb5SBill WendlingMachine code description classes 324ff9feeb5SBill Wendling================================ 325ff9feeb5SBill Wendling 326ff9feeb5SBill WendlingAt the high-level, LLVM code is translated to a machine specific representation 327ff9feeb5SBill Wendlingformed out of :raw-html:`<tt>` `MachineFunction`_ :raw-html:`</tt>`, 328ff9feeb5SBill Wendling:raw-html:`<tt>` `MachineBasicBlock`_ :raw-html:`</tt>`, and :raw-html:`<tt>` 329ff9feeb5SBill Wendling`MachineInstr`_ :raw-html:`</tt>` instances (defined in 330ff9feeb5SBill Wendling``include/llvm/CodeGen``). This representation is completely target agnostic, 331ff9feeb5SBill Wendlingrepresenting instructions in their most abstract form: an opcode and a series of 332ff9feeb5SBill Wendlingoperands. This representation is designed to support both an SSA representation 333ff9feeb5SBill Wendlingfor machine code, as well as a register allocated, non-SSA form. 334ff9feeb5SBill Wendling 335ff9feeb5SBill Wendling.. _MachineInstr: 336ff9feeb5SBill Wendling 337ff9feeb5SBill WendlingThe ``MachineInstr`` class 338ff9feeb5SBill Wendling-------------------------- 339ff9feeb5SBill Wendling 340ff9feeb5SBill WendlingTarget machine instructions are represented as instances of the ``MachineInstr`` 341ff9feeb5SBill Wendlingclass. This class is an extremely abstract way of representing machine 342ff9feeb5SBill Wendlinginstructions. In particular, it only keeps track of an opcode number and a set 343ff9feeb5SBill Wendlingof operands. 344ff9feeb5SBill Wendling 345ff9feeb5SBill WendlingThe opcode number is a simple unsigned integer that only has meaning to a 346ff9feeb5SBill Wendlingspecific backend. All of the instructions for a target should be defined in the 347ff9feeb5SBill Wendling``*InstrInfo.td`` file for the target. The opcode enum values are auto-generated 348ff9feeb5SBill Wendlingfrom this description. The ``MachineInstr`` class does not have any information 349ff9feeb5SBill Wendlingabout how to interpret the instruction (i.e., what the semantics of the 350ff9feeb5SBill Wendlinginstruction are); for that you must refer to the :raw-html:`<tt>` 351ff9feeb5SBill Wendling`TargetInstrInfo`_ :raw-html:`</tt>` class. 352ff9feeb5SBill Wendling 353ff9feeb5SBill WendlingThe operands of a machine instruction can be of several different types: a 354ff9feeb5SBill Wendlingregister reference, a constant integer, a basic block reference, etc. In 355ff9feeb5SBill Wendlingaddition, a machine operand should be marked as a def or a use of the value 356ff9feeb5SBill Wendling(though only registers are allowed to be defs). 357ff9feeb5SBill Wendling 358ff9feeb5SBill WendlingBy convention, the LLVM code generator orders instruction operands so that all 359ff9feeb5SBill Wendlingregister definitions come before the register uses, even on architectures that 360ff9feeb5SBill Wendlingare normally printed in other orders. For example, the SPARC add instruction: 361ff9feeb5SBill Wendling"``add %i1, %i2, %i3``" adds the "%i1", and "%i2" registers and stores the 362ff9feeb5SBill Wendlingresult into the "%i3" register. In the LLVM code generator, the operands should 363ff9feeb5SBill Wendlingbe stored as "``%i3, %i1, %i2``": with the destination first. 364ff9feeb5SBill Wendling 365ff9feeb5SBill WendlingKeeping destination (definition) operands at the beginning of the operand list 366ff9feeb5SBill Wendlinghas several advantages. In particular, the debugging printer will print the 367ff9feeb5SBill Wendlinginstruction like this: 368ff9feeb5SBill Wendling 369ff9feeb5SBill Wendling.. code-block:: llvm 370ff9feeb5SBill Wendling 371ff9feeb5SBill Wendling %r3 = add %i1, %i2 372ff9feeb5SBill Wendling 373ff9feeb5SBill WendlingAlso if the first operand is a def, it is easier to `create instructions`_ whose 374ff9feeb5SBill Wendlingonly def is the first operand. 375ff9feeb5SBill Wendling 376ff9feeb5SBill Wendling.. _create instructions: 377ff9feeb5SBill Wendling 378ff9feeb5SBill WendlingUsing the ``MachineInstrBuilder.h`` functions 379ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 380ff9feeb5SBill Wendling 381ff9feeb5SBill WendlingMachine instructions are created by using the ``BuildMI`` functions, located in 382ff9feeb5SBill Wendlingthe ``include/llvm/CodeGen/MachineInstrBuilder.h`` file. The ``BuildMI`` 383ff9feeb5SBill Wendlingfunctions make it easy to build arbitrary machine instructions. Usage of the 384ff9feeb5SBill Wendling``BuildMI`` functions look like this: 385ff9feeb5SBill Wendling 386ff9feeb5SBill Wendling.. code-block:: c++ 387ff9feeb5SBill Wendling 388ff9feeb5SBill Wendling // Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42') 3893da1078aSKrzysztof Parzyszek // instruction and insert it at the end of the given MachineBasicBlock. 3903da1078aSKrzysztof Parzyszek const TargetInstrInfo &TII = ... 39137f92c74SDmitri Gribenko MachineBasicBlock &MBB = ... 3923da1078aSKrzysztof Parzyszek DebugLoc DL; 3933da1078aSKrzysztof Parzyszek MachineInstr *MI = BuildMI(MBB, DL, TII.get(X86::MOV32ri), DestReg).addImm(42); 394ff9feeb5SBill Wendling 395ff9feeb5SBill Wendling // Create the same instr, but insert it before a specified iterator point. 396ff9feeb5SBill Wendling MachineBasicBlock::iterator MBBI = ... 3973da1078aSKrzysztof Parzyszek BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), DestReg).addImm(42); 398ff9feeb5SBill Wendling 399ff9feeb5SBill Wendling // Create a 'cmp Reg, 0' instruction, no destination reg. 4003da1078aSKrzysztof Parzyszek MI = BuildMI(MBB, DL, TII.get(X86::CMP32ri8)).addReg(Reg).addImm(42); 401ff9feeb5SBill Wendling 402ff9feeb5SBill Wendling // Create an 'sahf' instruction which takes no operands and stores nothing. 4033da1078aSKrzysztof Parzyszek MI = BuildMI(MBB, DL, TII.get(X86::SAHF)); 404ff9feeb5SBill Wendling 405ff9feeb5SBill Wendling // Create a self looping branch instruction. 4063da1078aSKrzysztof Parzyszek BuildMI(MBB, DL, TII.get(X86::JNE)).addMBB(&MBB); 407ff9feeb5SBill Wendling 4083da1078aSKrzysztof ParzyszekIf you need to add a definition operand (other than the optional destination 4093da1078aSKrzysztof Parzyszekregister), you must explicitly mark it as such: 410ff9feeb5SBill Wendling 411ff9feeb5SBill Wendling.. code-block:: c++ 412ff9feeb5SBill Wendling 413ff9feeb5SBill Wendling MI.addReg(Reg, RegState::Define); 414ff9feeb5SBill Wendling 415ff9feeb5SBill WendlingFixed (preassigned) registers 416ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 417ff9feeb5SBill Wendling 418ff9feeb5SBill WendlingOne important issue that the code generator needs to be aware of is the presence 419ff9feeb5SBill Wendlingof fixed registers. In particular, there are often places in the instruction 420ff9feeb5SBill Wendlingstream where the register allocator *must* arrange for a particular value to be 421ff9feeb5SBill Wendlingin a particular register. This can occur due to limitations of the instruction 422ff9feeb5SBill Wendlingset (e.g., the X86 can only do a 32-bit divide with the ``EAX``/``EDX`` 423ff9feeb5SBill Wendlingregisters), or external factors like calling conventions. In any case, the 424ff9feeb5SBill Wendlinginstruction selector should emit code that copies a virtual register into or out 425ff9feeb5SBill Wendlingof a physical register when needed. 426ff9feeb5SBill Wendling 427ff9feeb5SBill WendlingFor example, consider this simple LLVM example: 428ff9feeb5SBill Wendling 429ff9feeb5SBill Wendling.. code-block:: llvm 430ff9feeb5SBill Wendling 431ff9feeb5SBill Wendling define i32 @test(i32 %X, i32 %Y) { 432cdc53956STim Northover %Z = sdiv i32 %X, %Y 433ff9feeb5SBill Wendling ret i32 %Z 434ff9feeb5SBill Wendling } 435ff9feeb5SBill Wendling 436cdc53956STim NorthoverThe X86 instruction selector might produce this machine code for the ``div`` and 437cdc53956STim Northover``ret``: 438ff9feeb5SBill Wendling 439124f2593SRenato Golin.. code-block:: text 440ff9feeb5SBill Wendling 441ff9feeb5SBill Wendling ;; Start of div 442ff9feeb5SBill Wendling %EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX 443ff9feeb5SBill Wendling %reg1027 = sar %reg1024, 31 444ff9feeb5SBill Wendling %EDX = mov %reg1027 ;; Sign extend X into EDX 445ff9feeb5SBill Wendling idiv %reg1025 ;; Divide by Y (in reg1025) 446ff9feeb5SBill Wendling %reg1026 = mov %EAX ;; Read the result (Z) out of EAX 447ff9feeb5SBill Wendling 448ff9feeb5SBill Wendling ;; Start of ret 449ff9feeb5SBill Wendling %EAX = mov %reg1026 ;; 32-bit return value goes in EAX 450ff9feeb5SBill Wendling ret 451ff9feeb5SBill Wendling 452cdc53956STim NorthoverBy the end of code generation, the register allocator would coalesce the 453cdc53956STim Northoverregisters and delete the resultant identity moves producing the following 454ff9feeb5SBill Wendlingcode: 455ff9feeb5SBill Wendling 456124f2593SRenato Golin.. code-block:: text 457ff9feeb5SBill Wendling 458ff9feeb5SBill Wendling ;; X is in EAX, Y is in ECX 459ff9feeb5SBill Wendling mov %EAX, %EDX 460ff9feeb5SBill Wendling sar %EDX, 31 461ff9feeb5SBill Wendling idiv %ECX 462ff9feeb5SBill Wendling ret 463ff9feeb5SBill Wendling 464ff9feeb5SBill WendlingThis approach is extremely general (if it can handle the X86 architecture, it 465ff9feeb5SBill Wendlingcan handle anything!) and allows all of the target specific knowledge about the 466ff9feeb5SBill Wendlinginstruction stream to be isolated in the instruction selector. Note that 467ff9feeb5SBill Wendlingphysical registers should have a short lifetime for good code generation, and 468ff9feeb5SBill Wendlingall physical registers are assumed dead on entry to and exit from basic blocks 469ff9feeb5SBill Wendling(before register allocation). Thus, if you need a value to be live across basic 470ff9feeb5SBill Wendlingblock boundaries, it *must* live in a virtual register. 471ff9feeb5SBill Wendling 472ff9feeb5SBill WendlingCall-clobbered registers 473ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^ 474ff9feeb5SBill Wendling 475ff9feeb5SBill WendlingSome machine instructions, like calls, clobber a large number of physical 476ff9feeb5SBill Wendlingregisters. Rather than adding ``<def,dead>`` operands for all of them, it is 477ff9feeb5SBill Wendlingpossible to use an ``MO_RegisterMask`` operand instead. The register mask 478ff9feeb5SBill Wendlingoperand holds a bit mask of preserved registers, and everything else is 479ff9feeb5SBill Wendlingconsidered to be clobbered by the instruction. 480ff9feeb5SBill Wendling 481ff9feeb5SBill WendlingMachine code in SSA form 482ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^ 483ff9feeb5SBill Wendling 484ff9feeb5SBill Wendling``MachineInstr``'s are initially selected in SSA-form, and are maintained in 485ff9feeb5SBill WendlingSSA-form until register allocation happens. For the most part, this is 486ff9feeb5SBill Wendlingtrivially simple since LLVM is already in SSA form; LLVM PHI nodes become 487ff9feeb5SBill Wendlingmachine code PHI nodes, and virtual registers are only allowed to have a single 488ff9feeb5SBill Wendlingdefinition. 489ff9feeb5SBill Wendling 490ff9feeb5SBill WendlingAfter register allocation, machine code is no longer in SSA-form because there 491ff9feeb5SBill Wendlingare no virtual registers left in the code. 492ff9feeb5SBill Wendling 493ff9feeb5SBill Wendling.. _MachineBasicBlock: 494ff9feeb5SBill Wendling 495ff9feeb5SBill WendlingThe ``MachineBasicBlock`` class 496ff9feeb5SBill Wendling------------------------------- 497ff9feeb5SBill Wendling 498ff9feeb5SBill WendlingThe ``MachineBasicBlock`` class contains a list of machine instructions 499ff9feeb5SBill Wendling(:raw-html:`<tt>` `MachineInstr`_ :raw-html:`</tt>` instances). It roughly 500ff9feeb5SBill Wendlingcorresponds to the LLVM code input to the instruction selector, but there can be 501ff9feeb5SBill Wendlinga one-to-many mapping (i.e. one LLVM basic block can map to multiple machine 502ff9feeb5SBill Wendlingbasic blocks). The ``MachineBasicBlock`` class has a "``getBasicBlock``" method, 503ff9feeb5SBill Wendlingwhich returns the LLVM basic block that it comes from. 504ff9feeb5SBill Wendling 505ff9feeb5SBill Wendling.. _MachineFunction: 506ff9feeb5SBill Wendling 507ff9feeb5SBill WendlingThe ``MachineFunction`` class 508ff9feeb5SBill Wendling----------------------------- 509ff9feeb5SBill Wendling 510ff9feeb5SBill WendlingThe ``MachineFunction`` class contains a list of machine basic blocks 511ff9feeb5SBill Wendling(:raw-html:`<tt>` `MachineBasicBlock`_ :raw-html:`</tt>` instances). It 512ff9feeb5SBill Wendlingcorresponds one-to-one with the LLVM function input to the instruction selector. 513e1567e77SYoungsuk KimIn addition to a list of basic blocks, the ``MachineFunction`` contains a 514ff9feeb5SBill Wendling``MachineConstantPool``, a ``MachineFrameInfo``, a ``MachineFunctionInfo``, and 515ff9feeb5SBill Wendlinga ``MachineRegisterInfo``. See ``include/llvm/CodeGen/MachineFunction.h`` for 516ff9feeb5SBill Wendlingmore information. 517ff9feeb5SBill Wendling 518ff9feeb5SBill Wendling``MachineInstr Bundles`` 519ff9feeb5SBill Wendling------------------------ 520ff9feeb5SBill Wendling 521ff9feeb5SBill WendlingLLVM code generator can model sequences of instructions as MachineInstr 522ff9feeb5SBill Wendlingbundles. A MI bundle can model a VLIW group / pack which contains an arbitrary 523ff9feeb5SBill Wendlingnumber of parallel instructions. It can also be used to model a sequential list 524ff9feeb5SBill Wendlingof instructions (potentially with data dependencies) that cannot be legally 525ff9feeb5SBill Wendlingseparated (e.g. ARM Thumb2 IT blocks). 526ff9feeb5SBill Wendling 527ff9feeb5SBill WendlingConceptually a MI bundle is a MI with a number of other MIs nested within: 528ff9feeb5SBill Wendling 529ff9feeb5SBill Wendling:: 530ff9feeb5SBill Wendling 531ff9feeb5SBill Wendling -------------- 532ff9feeb5SBill Wendling | Bundle | --------- 533ff9feeb5SBill Wendling -------------- \ 534ff9feeb5SBill Wendling | ---------------- 535ff9feeb5SBill Wendling | | MI | 536ff9feeb5SBill Wendling | ---------------- 537ff9feeb5SBill Wendling | | 538ff9feeb5SBill Wendling | ---------------- 539ff9feeb5SBill Wendling | | MI | 540ff9feeb5SBill Wendling | ---------------- 541ff9feeb5SBill Wendling | | 542ff9feeb5SBill Wendling | ---------------- 543ff9feeb5SBill Wendling | | MI | 544ff9feeb5SBill Wendling | ---------------- 545ff9feeb5SBill Wendling | 546ff9feeb5SBill Wendling -------------- 547ff9feeb5SBill Wendling | Bundle | -------- 548ff9feeb5SBill Wendling -------------- \ 549ff9feeb5SBill Wendling | ---------------- 550ff9feeb5SBill Wendling | | MI | 551ff9feeb5SBill Wendling | ---------------- 552ff9feeb5SBill Wendling | | 553ff9feeb5SBill Wendling | ---------------- 554ff9feeb5SBill Wendling | | MI | 555ff9feeb5SBill Wendling | ---------------- 556ff9feeb5SBill Wendling | | 557ff9feeb5SBill Wendling | ... 558ff9feeb5SBill Wendling | 559ff9feeb5SBill Wendling -------------- 560ff9feeb5SBill Wendling | Bundle | -------- 561ff9feeb5SBill Wendling -------------- \ 562ff9feeb5SBill Wendling | 563ff9feeb5SBill Wendling ... 564ff9feeb5SBill Wendling 565ff9feeb5SBill WendlingMI bundle support does not change the physical representations of 566ff9feeb5SBill WendlingMachineBasicBlock and MachineInstr. All the MIs (including top level and nested 567ff9feeb5SBill Wendlingones) are stored as sequential list of MIs. The "bundled" MIs are marked with 568ff9feeb5SBill Wendlingthe 'InsideBundle' flag. A top level MI with the special BUNDLE opcode is used 569c36a1f1cSHiroshi Inoueto represent the start of a bundle. It's legal to mix BUNDLE MIs with individual 570ff9feeb5SBill WendlingMIs that are not inside bundles nor represent bundles. 571ff9feeb5SBill Wendling 572ff9feeb5SBill WendlingMachineInstr passes should operate on a MI bundle as a single unit. Member 573ff9feeb5SBill Wendlingmethods have been taught to correctly handle bundles and MIs inside bundles. 574ff9feeb5SBill WendlingThe MachineBasicBlock iterator has been modified to skip over bundled MIs to 575ff9feeb5SBill Wendlingenforce the bundle-as-a-single-unit concept. An alternative iterator 576ff9feeb5SBill Wendlinginstr_iterator has been added to MachineBasicBlock to allow passes to iterate 577ff9feeb5SBill Wendlingover all of the MIs in a MachineBasicBlock, including those which are nested 578ff9feeb5SBill Wendlinginside bundles. The top level BUNDLE instruction must have the correct set of 579ff9feeb5SBill Wendlingregister MachineOperand's that represent the cumulative inputs and outputs of 580ff9feeb5SBill Wendlingthe bundled MIs. 581ff9feeb5SBill Wendling 582e56a2ad8SMatt ArsenaultPacking / bundling of MachineInstrs for VLIW architectures should 583e56a2ad8SMatt Arsenaultgenerally be done as part of the register allocation super-pass. More 584e56a2ad8SMatt Arsenaultspecifically, the pass which determines what MIs should be bundled 585e56a2ad8SMatt Arsenaulttogether should be done after code generator exits SSA form 586e56a2ad8SMatt Arsenault(i.e. after two-address pass, PHI elimination, and copy coalescing). 587e56a2ad8SMatt ArsenaultSuch bundles should be finalized (i.e. adding BUNDLE MIs and input and 588e56a2ad8SMatt Arsenaultoutput register MachineOperands) after virtual registers have been 589e56a2ad8SMatt Arsenaultrewritten into physical registers. This eliminates the need to add 590e56a2ad8SMatt Arsenaultvirtual register operands to BUNDLE instructions which would 591e56a2ad8SMatt Arsenaulteffectively double the virtual register def and use lists. Bundles may 592e56a2ad8SMatt Arsenaultuse virtual registers and be formed in SSA form, but may not be 593e56a2ad8SMatt Arsenaultappropriate for all use cases. 594ff9feeb5SBill Wendling 595ff9feeb5SBill Wendling.. _MC Layer: 596ff9feeb5SBill Wendling 597ff9feeb5SBill WendlingThe "MC" Layer 598ff9feeb5SBill Wendling============== 599ff9feeb5SBill Wendling 600ff9feeb5SBill WendlingThe MC Layer is used to represent and process code at the raw machine code 601ff9feeb5SBill Wendlinglevel, devoid of "high level" information like "constant pools", "jump tables", 602ff9feeb5SBill Wendling"global variables" or anything like that. At this level, LLVM handles things 603ff9feeb5SBill Wendlinglike label names, machine instructions, and sections in the object file. The 604ff9feeb5SBill Wendlingcode in this layer is used for a number of important purposes: the tail end of 605ff9feeb5SBill Wendlingthe code generator uses it to write a .s or .o file, and it is also used by the 606ff9feeb5SBill Wendlingllvm-mc tool to implement standalone machine code assemblers and disassemblers. 607ff9feeb5SBill Wendling 608ff9feeb5SBill WendlingThis section describes some of the important classes. There are also a number 609ff9feeb5SBill Wendlingof important subsystems that interact at this layer, they are described later in 610ff9feeb5SBill Wendlingthis manual. 611ff9feeb5SBill Wendling 612ff9feeb5SBill Wendling.. _MCStreamer: 613ff9feeb5SBill Wendling 614ff9feeb5SBill WendlingThe ``MCStreamer`` API 615ff9feeb5SBill Wendling---------------------- 616ff9feeb5SBill Wendling 617ff9feeb5SBill WendlingMCStreamer is best thought of as an assembler API. It is an abstract API which 618ff9feeb5SBill Wendlingis *implemented* in different ways (e.g. to output a .s file, output an ELF .o 619ff9feeb5SBill Wendlingfile, etc) but whose API correspond directly to what you see in a .s file. 620ff9feeb5SBill WendlingMCStreamer has one method per directive, such as EmitLabel, EmitSymbolAttribute, 621adf4142fSFangrui SongswitchSection, emitValue (for .byte, .word), etc, which directly correspond to 622ff9feeb5SBill Wendlingassembly level directives. It also has an EmitInstruction method, which is used 623ff9feeb5SBill Wendlingto output an MCInst to the streamer. 624ff9feeb5SBill Wendling 625ff9feeb5SBill WendlingThis API is most important for two clients: the llvm-mc stand-alone assembler is 626ff9feeb5SBill Wendlingeffectively a parser that parses a line, then invokes a method on MCStreamer. In 627ff9feeb5SBill Wendlingthe code generator, the `Code Emission`_ phase of the code generator lowers 628ff9feeb5SBill Wendlinghigher level LLVM IR and Machine* constructs down to the MC layer, emitting 629ff9feeb5SBill Wendlingdirectives through MCStreamer. 630ff9feeb5SBill Wendling 631ff9feeb5SBill WendlingOn the implementation side of MCStreamer, there are two major implementations: 632ff9feeb5SBill Wendlingone for writing out a .s file (MCAsmStreamer), and one for writing out a .o 633ca35b090SJustin Lebarfile (MCObjectStreamer). MCAsmStreamer is a straightforward implementation 634ff9feeb5SBill Wendlingthat prints out a directive for each method (e.g. ``EmitValue -> .byte``), but 635ff9feeb5SBill WendlingMCObjectStreamer implements a full assembler. 636ff9feeb5SBill Wendling 637974efd32SRafael EspindolaFor target specific directives, the MCStreamer has a MCTargetStreamer instance. 638974efd32SRafael EspindolaEach target that needs it defines a class that inherits from it and is a lot 639974efd32SRafael Espindolalike MCStreamer itself: It has one method per directive and two classes that 640974efd32SRafael Espindolainherit from it, a target object streamer and a target asm streamer. The target 64148aee7c6SArtyom Skrobovasm streamer just prints it (``emitFnStart -> .fnstart``), and the object 642974efd32SRafael Espindolastreamer implement the assembler logic for it. 643974efd32SRafael Espindola 644b665d79fSRafael EspindolaTo make llvm use these classes, the target initialization must call 645b665d79fSRafael EspindolaTargetRegistry::RegisterAsmStreamer and TargetRegistry::RegisterMCObjectStreamer 646b665d79fSRafael Espindolapassing callbacks that allocate the corresponding target streamer and pass it 647b665d79fSRafael Espindolato createAsmStreamer or to the appropriate object streamer constructor. 648b665d79fSRafael Espindola 649ff9feeb5SBill WendlingThe ``MCContext`` class 650ff9feeb5SBill Wendling----------------------- 651ff9feeb5SBill Wendling 652ff9feeb5SBill WendlingThe MCContext class is the owner of a variety of uniqued data structures at the 653ff9feeb5SBill WendlingMC layer, including symbols, sections, etc. As such, this is the class that you 654ff9feeb5SBill Wendlinginteract with to create symbols and sections. This class can not be subclassed. 655ff9feeb5SBill Wendling 656ff9feeb5SBill WendlingThe ``MCSymbol`` class 657ff9feeb5SBill Wendling---------------------- 658ff9feeb5SBill Wendling 659ff9feeb5SBill WendlingThe MCSymbol class represents a symbol (aka label) in the assembly file. There 660ff9feeb5SBill Wendlingare two interesting kinds of symbols: assembler temporary symbols, and normal 661ff9feeb5SBill Wendlingsymbols. Assembler temporary symbols are used and processed by the assembler 662ff9feeb5SBill Wendlingbut are discarded when the object file is produced. The distinction is usually 663ff9feeb5SBill Wendlingrepresented by adding a prefix to the label, for example "L" labels are 664ff9feeb5SBill Wendlingassembler temporary labels in MachO. 665ff9feeb5SBill Wendling 666ff9feeb5SBill WendlingMCSymbols are created by MCContext and uniqued there. This means that MCSymbols 667ff9feeb5SBill Wendlingcan be compared for pointer equivalence to find out if they are the same symbol. 668ff9feeb5SBill WendlingNote that pointer inequality does not guarantee the labels will end up at 669ff9feeb5SBill Wendlingdifferent addresses though. It's perfectly legal to output something like this 670ff9feeb5SBill Wendlingto the .s file: 671ff9feeb5SBill Wendling 672ff9feeb5SBill Wendling:: 673ff9feeb5SBill Wendling 674ff9feeb5SBill Wendling foo: 675ff9feeb5SBill Wendling bar: 676ff9feeb5SBill Wendling .byte 4 677ff9feeb5SBill Wendling 678ff9feeb5SBill WendlingIn this case, both the foo and bar symbols will have the same address. 679ff9feeb5SBill Wendling 680ff9feeb5SBill WendlingThe ``MCSection`` class 681ff9feeb5SBill Wendling----------------------- 682ff9feeb5SBill Wendling 683ff9feeb5SBill WendlingThe ``MCSection`` class represents an object-file specific section. It is 684ff9feeb5SBill Wendlingsubclassed by object file specific implementations (e.g. ``MCSectionMachO``, 685ff9feeb5SBill Wendling``MCSectionCOFF``, ``MCSectionELF``) and these are created and uniqued by 686ff9feeb5SBill WendlingMCContext. The MCStreamer has a notion of the current section, which can be 687ff9feeb5SBill Wendlingchanged with the SwitchToSection method (which corresponds to a ".section" 688ff9feeb5SBill Wendlingdirective in a .s file). 689ff9feeb5SBill Wendling 690ff9feeb5SBill Wendling.. _MCInst: 691ff9feeb5SBill Wendling 692ff9feeb5SBill WendlingThe ``MCInst`` class 693ff9feeb5SBill Wendling-------------------- 694ff9feeb5SBill Wendling 695ff9feeb5SBill WendlingThe ``MCInst`` class is a target-independent representation of an instruction. 696ff9feeb5SBill WendlingIt is a simple class (much more so than `MachineInstr`_) that holds a 697ff9feeb5SBill Wendlingtarget-specific opcode and a vector of MCOperands. MCOperand, in turn, is a 698ff9feeb5SBill Wendlingsimple discriminated union of three cases: 1) a simple immediate, 2) a target 699ff9feeb5SBill Wendlingregister ID, 3) a symbolic expression (e.g. "``Lfoo-Lbar+42``") as an MCExpr. 700ff9feeb5SBill Wendling 701ff9feeb5SBill WendlingMCInst is the common currency used to represent machine instructions at the MC 702ff9feeb5SBill Wendlinglayer. It is the type used by the instruction encoder, the instruction printer, 703ff9feeb5SBill Wendlingand the type generated by the assembly parser and disassembler. 704ff9feeb5SBill Wendling 705d5745d00SChris Bieneman.. _ObjectFormats: 706d5745d00SChris Bieneman 707d5745d00SChris BienemanObject File Format 708d5745d00SChris Bieneman------------------ 709d5745d00SChris Bieneman 710d5745d00SChris BienemanThe MC layer's object writers support a variety of object formats. Because of 711d5745d00SChris Bienemantarget-specific aspects of object formats each target only supports a subset of 712d5745d00SChris Bienemanthe formats supported by the MC layer. Most targets support emitting ELF 713d5745d00SChris Bienemanobjects. Other vendor-specific objects are generally supported only on targets 714d5745d00SChris Bienemanthat are supported by that vendor (i.e. MachO is only supported on targets 715d5745d00SChris Bienemansupported by Darwin, and XCOFF is only supported on targets that support AIX). 716d5745d00SChris BienemanAdditionally some targets have their own object formats (i.e. DirectX, SPIR-V 717d5745d00SChris Bienemanand WebAssembly). 718d5745d00SChris Bieneman 719d5745d00SChris BienemanThe table below captures a snapshot of object file support in LLVM: 720d5745d00SChris Bieneman 721d5745d00SChris Bieneman .. table:: Object File Formats 722d5745d00SChris Bieneman 723d5745d00SChris Bieneman ================== ======================================================== 724d5745d00SChris Bieneman Format Supported Targets 725d5745d00SChris Bieneman ================== ======================================================== 726d5745d00SChris Bieneman ``COFF`` AArch64, ARM, X86 727d5745d00SChris Bieneman ``DXContainer`` DirectX 728d5745d00SChris Bieneman ``ELF`` AArch64, AMDGPU, ARM, AVR, BPF, CSKY, Hexagon, Lanai, LoongArch, M86k, MSP430, MIPS, PowerPC, RISCV, SPARC, SystemZ, VE, X86 729d5745d00SChris Bieneman ``GCOFF`` SystemZ 730d5745d00SChris Bieneman ``MachO`` AArch64, ARM, X86 731d5745d00SChris Bieneman ``SPIR-V`` SPIRV 732d5745d00SChris Bieneman ``WASM`` WebAssembly 733d5745d00SChris Bieneman ``XCOFF`` PowerPC 734d5745d00SChris Bieneman ================== ======================================================== 735d5745d00SChris Bieneman 736ff9feeb5SBill Wendling.. _Target-independent algorithms: 737ff9feeb5SBill Wendling.. _code generation algorithm: 738ff9feeb5SBill Wendling 739ff9feeb5SBill WendlingTarget-independent code generation algorithms 740ff9feeb5SBill Wendling============================================= 741ff9feeb5SBill Wendling 742ff9feeb5SBill WendlingThis section documents the phases described in the `high-level design of the 743ff9feeb5SBill Wendlingcode generator`_. It explains how they work and some of the rationale behind 744ff9feeb5SBill Wendlingtheir design. 745ff9feeb5SBill Wendling 746ff9feeb5SBill Wendling.. _Instruction Selection: 747ff9feeb5SBill Wendling.. _instruction selection section: 748ff9feeb5SBill Wendling 749ff9feeb5SBill WendlingInstruction Selection 750ff9feeb5SBill Wendling--------------------- 751ff9feeb5SBill Wendling 752ff9feeb5SBill WendlingInstruction Selection is the process of translating LLVM code presented to the 753ff9feeb5SBill Wendlingcode generator into target-specific machine instructions. There are several 754ff9feeb5SBill Wendlingwell-known ways to do this in the literature. LLVM uses a SelectionDAG based 755ff9feeb5SBill Wendlinginstruction selector. 756ff9feeb5SBill Wendling 757ff9feeb5SBill WendlingPortions of the DAG instruction selector are generated from the target 758ff9feeb5SBill Wendlingdescription (``*.td``) files. Our goal is for the entire instruction selector 759ff9feeb5SBill Wendlingto be generated from these ``.td`` files, though currently there are still 760ff9feeb5SBill Wendlingthings that require custom C++ code. 761ff9feeb5SBill Wendling 762a15f9ff9Spooja2299`GlobalISel <https://llvm.org/docs/GlobalISel/index.html>`_ is another 763a15f9ff9Spooja2299instruction selection framework. 764a15f9ff9Spooja2299 765ff9feeb5SBill Wendling.. _SelectionDAG: 766ff9feeb5SBill Wendling 767ff9feeb5SBill WendlingIntroduction to SelectionDAGs 768ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 769ff9feeb5SBill Wendling 770ff9feeb5SBill WendlingThe SelectionDAG provides an abstraction for code representation in a way that 771ff9feeb5SBill Wendlingis amenable to instruction selection using automatic techniques 772ff9feeb5SBill Wendling(e.g. dynamic-programming based optimal pattern matching selectors). It is also 773ff9feeb5SBill Wendlingwell-suited to other phases of code generation; in particular, instruction 774ff9feeb5SBill Wendlingscheduling (SelectionDAG's are very close to scheduling DAGs post-selection). 775ff9feeb5SBill WendlingAdditionally, the SelectionDAG provides a host representation where a large 776ff9feeb5SBill Wendlingvariety of very-low-level (but target-independent) `optimizations`_ may be 777ff9feeb5SBill Wendlingperformed; ones which require extensive information about the instructions 778ff9feeb5SBill Wendlingefficiently supported by the target. 779ff9feeb5SBill Wendling 780ff9feeb5SBill WendlingThe SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the 781ff9feeb5SBill Wendling``SDNode`` class. The primary payload of the ``SDNode`` is its operation code 782ff9feeb5SBill Wendling(Opcode) that indicates what operation the node performs and the operands to the 783ff9feeb5SBill Wendlingoperation. The various operation node types are described at the top of the 7845b41ea0dSCharlie Turner``include/llvm/CodeGen/ISDOpcodes.h`` file. 785ff9feeb5SBill Wendling 786ff9feeb5SBill WendlingAlthough most operations define a single value, each node in the graph may 787ff9feeb5SBill Wendlingdefine multiple values. For example, a combined div/rem operation will define 788ff9feeb5SBill Wendlingboth the dividend and the remainder. Many other situations require multiple 789ff9feeb5SBill Wendlingvalues as well. Each node also has some number of operands, which are edges to 790ff9feeb5SBill Wendlingthe node defining the used value. Because nodes may define multiple values, 791ff9feeb5SBill Wendlingedges are represented by instances of the ``SDValue`` class, which is a 792ff9feeb5SBill Wendling``<SDNode, unsigned>`` pair, indicating the node and result value being used, 793ff9feeb5SBill Wendlingrespectively. Each value produced by an ``SDNode`` has an associated ``MVT`` 794ff9feeb5SBill Wendling(Machine Value Type) indicating what the type of the value is. 795ff9feeb5SBill Wendling 796ff9feeb5SBill WendlingSelectionDAGs contain two different kinds of values: those that represent data 797ff9feeb5SBill Wendlingflow and those that represent control flow dependencies. Data values are simple 798ff9feeb5SBill Wendlingedges with an integer or floating point value type. Control edges are 799ff9feeb5SBill Wendlingrepresented as "chain" edges which are of type ``MVT::Other``. These edges 800ff9feeb5SBill Wendlingprovide an ordering between nodes that have side effects (such as loads, stores, 801ff9feeb5SBill Wendlingcalls, returns, etc). All nodes that have side effects should take a token 802ff9feeb5SBill Wendlingchain as input and produce a new one as output. By convention, token chain 803ff9feeb5SBill Wendlinginputs are always operand #0, and chain results are always the last value 804907e64b4SMatt Arsenaultproduced by an operation. However, after instruction selection, the 805907e64b4SMatt Arsenaultmachine nodes have their chain after the instruction's operands, and 806907e64b4SMatt Arsenaultmay be followed by glue nodes. 807ff9feeb5SBill Wendling 808ff9feeb5SBill WendlingA SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is 809ff9feeb5SBill Wendlingalways a marker node with an Opcode of ``ISD::EntryToken``. The Root node is 810ff9feeb5SBill Wendlingthe final side-effecting node in the token chain. For example, in a single basic 811ff9feeb5SBill Wendlingblock function it would be the return node. 812ff9feeb5SBill Wendling 813ff9feeb5SBill WendlingOne important concept for SelectionDAGs is the notion of a "legal" vs. 814ff9feeb5SBill Wendling"illegal" DAG. A legal DAG for a target is one that only uses supported 815ff9feeb5SBill Wendlingoperations and supported types. On a 32-bit PowerPC, for example, a DAG with a 816ff9feeb5SBill Wendlingvalue of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses a 817ff9feeb5SBill WendlingSREM or UREM operation. The `legalize types`_ and `legalize operations`_ phases 818ff9feeb5SBill Wendlingare responsible for turning an illegal DAG into a legal DAG. 819ff9feeb5SBill Wendling 8207174c5a0SDmitri Gribenko.. _SelectionDAG-Process: 8217174c5a0SDmitri Gribenko 822ff9feeb5SBill WendlingSelectionDAG Instruction Selection Process 823ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 824ff9feeb5SBill Wendling 825ff9feeb5SBill WendlingSelectionDAG-based instruction selection consists of the following steps: 826ff9feeb5SBill Wendling 827ff9feeb5SBill Wendling#. `Build initial DAG`_ --- This stage performs a simple translation from the 828ff9feeb5SBill Wendling input LLVM code to an illegal SelectionDAG. 829ff9feeb5SBill Wendling 830ff9feeb5SBill Wendling#. `Optimize SelectionDAG`_ --- This stage performs simple optimizations on the 831ff9feeb5SBill Wendling SelectionDAG to simplify it, and recognize meta instructions (like rotates 832ff9feeb5SBill Wendling and ``div``/``rem`` pairs) for targets that support these meta operations. 833ff9feeb5SBill Wendling This makes the resultant code more efficient and the `select instructions 834ff9feeb5SBill Wendling from DAG`_ phase (below) simpler. 835ff9feeb5SBill Wendling 836ff9feeb5SBill Wendling#. `Legalize SelectionDAG Types`_ --- This stage transforms SelectionDAG nodes 837ff9feeb5SBill Wendling to eliminate any types that are unsupported on the target. 838ff9feeb5SBill Wendling 839ff9feeb5SBill Wendling#. `Optimize SelectionDAG`_ --- The SelectionDAG optimizer is run to clean up 840ff9feeb5SBill Wendling redundancies exposed by type legalization. 841ff9feeb5SBill Wendling 842ff9feeb5SBill Wendling#. `Legalize SelectionDAG Ops`_ --- This stage transforms SelectionDAG nodes to 843ff9feeb5SBill Wendling eliminate any operations that are unsupported on the target. 844ff9feeb5SBill Wendling 845ff9feeb5SBill Wendling#. `Optimize SelectionDAG`_ --- The SelectionDAG optimizer is run to eliminate 846ff9feeb5SBill Wendling inefficiencies introduced by operation legalization. 847ff9feeb5SBill Wendling 848ff9feeb5SBill Wendling#. `Select instructions from DAG`_ --- Finally, the target instruction selector 849ff9feeb5SBill Wendling matches the DAG operations to target instructions. This process translates 850ff9feeb5SBill Wendling the target-independent input DAG into another DAG of target instructions. 851ff9feeb5SBill Wendling 852ff9feeb5SBill Wendling#. `SelectionDAG Scheduling and Formation`_ --- The last phase assigns a linear 853ff9feeb5SBill Wendling order to the instructions in the target-instruction DAG and emits them into 854ff9feeb5SBill Wendling the MachineFunction being compiled. This step uses traditional prepass 855ff9feeb5SBill Wendling scheduling techniques. 856ff9feeb5SBill Wendling 857ff9feeb5SBill WendlingAfter all of these steps are complete, the SelectionDAG is destroyed and the 858ff9feeb5SBill Wendlingrest of the code generation passes are run. 859ff9feeb5SBill Wendling 860ff9feeb5SBill WendlingOne great way to visualize what is going on here is to take advantage of a few 861ff9feeb5SBill WendlingLLC command line options. The following options pop up a window displaying the 862ff9feeb5SBill WendlingSelectionDAG at specific times (if you only get errors printed to the console 863ff9feeb5SBill Wendlingwhile using this, you probably `need to configure your 8645b41ea0dSCharlie Turnersystem <ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ to add support for it). 865ff9feeb5SBill Wendling 866ff9feeb5SBill Wendling* ``-view-dag-combine1-dags`` displays the DAG after being built, before the 867ff9feeb5SBill Wendling first optimization pass. 868ff9feeb5SBill Wendling 869ff9feeb5SBill Wendling* ``-view-legalize-dags`` displays the DAG before Legalization. 870ff9feeb5SBill Wendling 871ff9feeb5SBill Wendling* ``-view-dag-combine2-dags`` displays the DAG before the second optimization 872ff9feeb5SBill Wendling pass. 873ff9feeb5SBill Wendling 874ff9feeb5SBill Wendling* ``-view-isel-dags`` displays the DAG before the Select phase. 875ff9feeb5SBill Wendling 876ff9feeb5SBill Wendling* ``-view-sched-dags`` displays the DAG before Scheduling. 877ff9feeb5SBill Wendling 878ff9feeb5SBill WendlingThe ``-view-sunit-dags`` displays the Scheduler's dependency graph. This graph 879ff9feeb5SBill Wendlingis based on the final SelectionDAG, with nodes that must be scheduled together 880ff9feeb5SBill Wendlingbundled into a single scheduling-unit node, and with immediate operands and 881ff9feeb5SBill Wendlingother nodes that aren't relevant for scheduling omitted. 882d8976b8eSMehdi Amini 883d8976b8eSMehdi AminiThe option ``-filter-view-dags`` allows to select the name of the basic block 884d8976b8eSMehdi Aminithat you are interested to visualize and filters all the previous 885d8976b8eSMehdi Amini``view-*-dags`` options. 886ff9feeb5SBill Wendling 887ff9feeb5SBill Wendling.. _Build initial DAG: 888ff9feeb5SBill Wendling 889ff9feeb5SBill WendlingInitial SelectionDAG Construction 890ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 891ff9feeb5SBill Wendling 892ff9feeb5SBill WendlingThe initial SelectionDAG is na\ :raw-html:`ï`\ vely peephole expanded from 8936f6f55eeSEli Benderskythe LLVM input by the ``SelectionDAGBuilder`` class. The intent of this pass 894ff9feeb5SBill Wendlingis to expose as much low-level, target-specific details to the SelectionDAG as 895ff9feeb5SBill Wendlingpossible. This pass is mostly hard-coded (e.g. an LLVM ``add`` turns into an 896ff9feeb5SBill Wendling``SDNode add`` while a ``getelementptr`` is expanded into the obvious 897ff9feeb5SBill Wendlingarithmetic). This pass requires target-specific hooks to lower calls, returns, 898ff9feeb5SBill Wendlingvarargs, etc. For these features, the :raw-html:`<tt>` `TargetLowering`_ 899ff9feeb5SBill Wendling:raw-html:`</tt>` interface is used. 900ff9feeb5SBill Wendling 901ff9feeb5SBill Wendling.. _legalize types: 902ff9feeb5SBill Wendling.. _Legalize SelectionDAG Types: 903ff9feeb5SBill Wendling.. _Legalize SelectionDAG Ops: 904ff9feeb5SBill Wendling 905ff9feeb5SBill WendlingSelectionDAG LegalizeTypes Phase 906ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 907ff9feeb5SBill Wendling 908ff9feeb5SBill WendlingThe Legalize phase is in charge of converting a DAG to only use the types that 909ff9feeb5SBill Wendlingare natively supported by the target. 910ff9feeb5SBill Wendling 911ff9feeb5SBill WendlingThere are two main ways of converting values of unsupported scalar types to 912ff9feeb5SBill Wendlingvalues of supported types: converting small types to larger types ("promoting"), 913ff9feeb5SBill Wendlingand breaking up large integer types into smaller ones ("expanding"). For 914ff9feeb5SBill Wendlingexample, a target might require that all f32 values are promoted to f64 and that 915ff9feeb5SBill Wendlingall i1/i8/i16 values are promoted to i32. The same target might require that 916ff9feeb5SBill Wendlingall i64 values be expanded into pairs of i32 values. These changes can insert 917ff9feeb5SBill Wendlingsign and zero extensions as needed to make sure that the final code has the same 918ff9feeb5SBill Wendlingbehavior as the input. 919ff9feeb5SBill Wendling 920ff9feeb5SBill WendlingThere are two main ways of converting values of unsupported vector types to 921ff9feeb5SBill Wendlingvalue of supported types: splitting vector types, multiple times if necessary, 922ff9feeb5SBill Wendlinguntil a legal type is found, and extending vector types by adding elements to 923ff9feeb5SBill Wendlingthe end to round them out to legal types ("widening"). If a vector gets split 924ff9feeb5SBill Wendlingall the way down to single-element parts with no supported vector type being 925ff9feeb5SBill Wendlingfound, the elements are converted to scalars ("scalarizing"). 926ff9feeb5SBill Wendling 927ff9feeb5SBill WendlingA target implementation tells the legalizer which types are supported (and which 928ff9feeb5SBill Wendlingregister class to use for them) by calling the ``addRegisterClass`` method in 9298bd389d8SDmitri Gribenkoits ``TargetLowering`` constructor. 930ff9feeb5SBill Wendling 931ff9feeb5SBill Wendling.. _legalize operations: 932ff9feeb5SBill Wendling.. _Legalizer: 933ff9feeb5SBill Wendling 934ff9feeb5SBill WendlingSelectionDAG Legalize Phase 935ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^ 936ff9feeb5SBill Wendling 937ff9feeb5SBill WendlingThe Legalize phase is in charge of converting a DAG to only use the operations 938ff9feeb5SBill Wendlingthat are natively supported by the target. 939ff9feeb5SBill Wendling 940ff9feeb5SBill WendlingTargets often have weird constraints, such as not supporting every operation on 941ff9feeb5SBill Wendlingevery supported datatype (e.g. X86 does not support byte conditional moves and 942ff9feeb5SBill WendlingPowerPC does not support sign-extending loads from a 16-bit memory location). 943ff9feeb5SBill WendlingLegalize takes care of this by open-coding another sequence of operations to 944ff9feeb5SBill Wendlingemulate the operation ("expansion"), by promoting one type to a larger type that 945ff9feeb5SBill Wendlingsupports the operation ("promotion"), or by using a target-specific hook to 946ff9feeb5SBill Wendlingimplement the legalization ("custom"). 947ff9feeb5SBill Wendling 948ff9feeb5SBill WendlingA target implementation tells the legalizer which operations are not supported 949ff9feeb5SBill Wendling(and which of the above three actions to take) by calling the 950ff9feeb5SBill Wendling``setOperationAction`` method in its ``TargetLowering`` constructor. 951ff9feeb5SBill Wendling 9526a554188SSanjay PatelIf a target has legal vector types, it is expected to produce efficient machine 9536a554188SSanjay Patelcode for common forms of the shufflevector IR instruction using those types. 9546a554188SSanjay PatelThis may require custom legalization for SelectionDAG vector operations that 9556a554188SSanjay Patelare created from the shufflevector IR. The shufflevector forms that should be 9566a554188SSanjay Patelhandled include: 9576a554188SSanjay Patel 9586a554188SSanjay Patel* Vector select --- Each element of the vector is chosen from either of the 9596a554188SSanjay Patel corresponding elements of the 2 input vectors. This operation may also be 9606a554188SSanjay Patel known as a "blend" or "bitwise select" in target assembly. This type of shuffle 9616a554188SSanjay Patel maps directly to the ``shuffle_vector`` SelectionDAG node. 9626a554188SSanjay Patel 9636a554188SSanjay Patel* Insert subvector --- A vector is placed into a longer vector type starting 9646a554188SSanjay Patel at index 0. This type of shuffle maps directly to the ``insert_subvector`` 9656a554188SSanjay Patel SelectionDAG node with the ``index`` operand set to 0. 9666a554188SSanjay Patel 9676a554188SSanjay Patel* Extract subvector --- A vector is pulled from a longer vector type starting 9686a554188SSanjay Patel at index 0. This type of shuffle maps directly to the ``extract_subvector`` 9696a554188SSanjay Patel SelectionDAG node with the ``index`` operand set to 0. 9706a554188SSanjay Patel 9716a554188SSanjay Patel* Splat --- All elements of the vector have identical scalar elements. This 9726a554188SSanjay Patel operation may also be known as a "broadcast" or "duplicate" in target assembly. 9736a554188SSanjay Patel The shufflevector IR instruction may change the vector length, so this operation 9746a554188SSanjay Patel may map to multiple SelectionDAG nodes including ``shuffle_vector``, 9756a554188SSanjay Patel ``concat_vectors``, ``insert_subvector``, and ``extract_subvector``. 9766a554188SSanjay Patel 977ff9feeb5SBill WendlingPrior to the existence of the Legalize passes, we required that every target 978ff9feeb5SBill Wendling`selector`_ supported and handled every operator and type even if they are not 979ff9feeb5SBill Wendlingnatively supported. The introduction of the Legalize phases allows all of the 980ff9feeb5SBill Wendlingcanonicalization patterns to be shared across targets, and makes it very easy to 981ff9feeb5SBill Wendlingoptimize the canonicalized code because it is still in the form of a DAG. 982ff9feeb5SBill Wendling 983ff9feeb5SBill Wendling.. _optimizations: 984ff9feeb5SBill Wendling.. _Optimize SelectionDAG: 985ff9feeb5SBill Wendling.. _selector: 986ff9feeb5SBill Wendling 987ff9feeb5SBill WendlingSelectionDAG Optimization Phase: the DAG Combiner 988ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 989ff9feeb5SBill Wendling 990ff9feeb5SBill WendlingThe SelectionDAG optimization phase is run multiple times for code generation, 991ff9feeb5SBill Wendlingimmediately after the DAG is built and once after each legalization. The first 992ff9feeb5SBill Wendlingrun of the pass allows the initial code to be cleaned up (e.g. performing 993ff9feeb5SBill Wendlingoptimizations that depend on knowing that the operators have restricted type 994ff9feeb5SBill Wendlinginputs). Subsequent runs of the pass clean up the messy code generated by the 995ff9feeb5SBill WendlingLegalize passes, which allows Legalize to be very simple (it can focus on making 996ff9feeb5SBill Wendlingcode legal instead of focusing on generating *good* and legal code). 997ff9feeb5SBill Wendling 998ff9feeb5SBill WendlingOne important class of optimizations performed is optimizing inserted sign and 999ff9feeb5SBill Wendlingzero extension instructions. We currently use ad-hoc techniques, but could move 1000ff9feeb5SBill Wendlingto more rigorous techniques in the future. Here are some good papers on the 1001ff9feeb5SBill Wendlingsubject: 1002ff9feeb5SBill Wendling 1003ff9feeb5SBill Wendling"`Widening integer arithmetic <http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html>`_" :raw-html:`<br>` 1004ff9feeb5SBill WendlingKevin Redwine and Norman Ramsey :raw-html:`<br>` 1005ff9feeb5SBill WendlingInternational Conference on Compiler Construction (CC) 2004 1006ff9feeb5SBill Wendling 1007ff9feeb5SBill Wendling"`Effective sign extension elimination <http://portal.acm.org/citation.cfm?doid=512529.512552>`_" :raw-html:`<br>` 1008ff9feeb5SBill WendlingMotohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani :raw-html:`<br>` 1009ff9feeb5SBill WendlingProceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design 1010ff9feeb5SBill Wendlingand Implementation. 1011ff9feeb5SBill Wendling 1012ff9feeb5SBill Wendling.. _Select instructions from DAG: 1013ff9feeb5SBill Wendling 1014ff9feeb5SBill WendlingSelectionDAG Select Phase 1015ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^ 1016ff9feeb5SBill Wendling 1017ff9feeb5SBill WendlingThe Select phase is the bulk of the target-specific code for instruction 1018ff9feeb5SBill Wendlingselection. This phase takes a legal SelectionDAG as input, pattern matches the 1019ff9feeb5SBill Wendlinginstructions supported by the target to this DAG, and produces a new DAG of 1020ff9feeb5SBill Wendlingtarget code. For example, consider the following LLVM fragment: 1021ff9feeb5SBill Wendling 1022ff9feeb5SBill Wendling.. code-block:: llvm 1023ff9feeb5SBill Wendling 1024ff9feeb5SBill Wendling %t1 = fadd float %W, %X 1025ff9feeb5SBill Wendling %t2 = fmul float %t1, %Y 1026ff9feeb5SBill Wendling %t3 = fadd float %t2, %Z 1027ff9feeb5SBill Wendling 1028ff9feeb5SBill WendlingThis LLVM code corresponds to a SelectionDAG that looks basically like this: 1029ff9feeb5SBill Wendling 1030124f2593SRenato Golin.. code-block:: text 1031ff9feeb5SBill Wendling 1032ff9feeb5SBill Wendling (fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z) 1033ff9feeb5SBill Wendling 1034ff9feeb5SBill WendlingIf a target supports floating point multiply-and-add (FMA) operations, one of 1035ff9feeb5SBill Wendlingthe adds can be merged with the multiply. On the PowerPC, for example, the 1036ff9feeb5SBill Wendlingoutput of the instruction selector might look like this DAG: 1037ff9feeb5SBill Wendling 1038ff9feeb5SBill Wendling:: 1039ff9feeb5SBill Wendling 1040ff9feeb5SBill Wendling (FMADDS (FADDS W, X), Y, Z) 1041ff9feeb5SBill Wendling 1042ff9feeb5SBill WendlingThe ``FMADDS`` instruction is a ternary instruction that multiplies its first 1043ff9feeb5SBill Wendlingtwo operands and adds the third (as single-precision floating-point numbers). 1044ff9feeb5SBill WendlingThe ``FADDS`` instruction is a simple binary single-precision add instruction. 1045ff9feeb5SBill WendlingTo perform this pattern match, the PowerPC backend includes the following 1046ff9feeb5SBill Wendlinginstruction definitions: 1047ff9feeb5SBill Wendling 104833f2c07cSSean Silva.. code-block:: text 104933f2c07cSSean Silva :emphasize-lines: 4-5,9 1050ff9feeb5SBill Wendling 1051ff9feeb5SBill Wendling def FMADDS : AForm_1<59, 29, 1052ff9feeb5SBill Wendling (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB), 1053ff9feeb5SBill Wendling "fmadds $FRT, $FRA, $FRC, $FRB", 1054ff9feeb5SBill Wendling [(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC), 1055ff9feeb5SBill Wendling F4RC:$FRB))]>; 1056ff9feeb5SBill Wendling def FADDS : AForm_2<59, 21, 1057ff9feeb5SBill Wendling (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB), 1058ff9feeb5SBill Wendling "fadds $FRT, $FRA, $FRB", 1059ff9feeb5SBill Wendling [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>; 1060ff9feeb5SBill Wendling 106133f2c07cSSean SilvaThe highlighted portion of the instruction definitions indicates the pattern 106233f2c07cSSean Silvaused to match the instructions. The DAG operators (like ``fmul``/``fadd``) 106333f2c07cSSean Silvaare defined in the ``include/llvm/Target/TargetSelectionDAG.td`` file. 106433f2c07cSSean Silva"``F4RC``" is the register class of the input and result values. 1065ff9feeb5SBill Wendling 1066ff9feeb5SBill WendlingThe TableGen DAG instruction selector generator reads the instruction patterns 1067ff9feeb5SBill Wendlingin the ``.td`` file and automatically builds parts of the pattern matching code 1068ff9feeb5SBill Wendlingfor your target. It has the following strengths: 1069ff9feeb5SBill Wendling 1070ebba0507SJonathan Roelofs* At compiler-compile time, it analyzes your instruction patterns and tells you 1071ff9feeb5SBill Wendling if your patterns make sense or not. 1072ff9feeb5SBill Wendling 1073ff9feeb5SBill Wendling* It can handle arbitrary constraints on operands for the pattern match. In 1074ff9feeb5SBill Wendling particular, it is straight-forward to say things like "match any immediate 1075ff9feeb5SBill Wendling that is a 13-bit sign-extended value". For examples, see the ``immSExt16`` 1076ff9feeb5SBill Wendling and related ``tblgen`` classes in the PowerPC backend. 1077ff9feeb5SBill Wendling 1078ff9feeb5SBill Wendling* It knows several important identities for the patterns defined. For example, 1079ff9feeb5SBill Wendling it knows that addition is commutative, so it allows the ``FMADDS`` pattern 1080ff9feeb5SBill Wendling above to match "``(fadd X, (fmul Y, Z))``" as well as "``(fadd (fmul X, Y), 1081ff9feeb5SBill Wendling Z)``", without the target author having to specially handle this case. 1082ff9feeb5SBill Wendling 1083ff9feeb5SBill Wendling* It has a full-featured type-inferencing system. In particular, you should 1084ff9feeb5SBill Wendling rarely have to explicitly tell the system what type parts of your patterns 1085ff9feeb5SBill Wendling are. In the ``FMADDS`` case above, we didn't have to tell ``tblgen`` that all 1086ff9feeb5SBill Wendling of the nodes in the pattern are of type 'f32'. It was able to infer and 1087ff9feeb5SBill Wendling propagate this knowledge from the fact that ``F4RC`` has type 'f32'. 1088ff9feeb5SBill Wendling 1089ff9feeb5SBill Wendling* Targets can define their own (and rely on built-in) "pattern fragments". 1090ff9feeb5SBill Wendling Pattern fragments are chunks of reusable patterns that get inlined into your 1091ebba0507SJonathan Roelofs patterns during compiler-compile time. For example, the integer "``(not 1092ff9feeb5SBill Wendling x)``" operation is actually defined as a pattern fragment that expands as 1093ff9feeb5SBill Wendling "``(xor x, -1)``", since the SelectionDAG does not have a native '``not``' 1094ff9feeb5SBill Wendling operation. Targets can define their own short-hand fragments as they see fit. 1095ff9feeb5SBill Wendling See the definition of '``not``' and '``ineg``' for examples. 1096ff9feeb5SBill Wendling 1097ff9feeb5SBill Wendling* In addition to instructions, targets can specify arbitrary patterns that map 1098ff9feeb5SBill Wendling to one or more instructions using the 'Pat' class. For example, the PowerPC 1099ff9feeb5SBill Wendling has no way to load an arbitrary integer immediate into a register in one 1100ff9feeb5SBill Wendling instruction. To tell tblgen how to do this, it defines: 1101ff9feeb5SBill Wendling 1102ff9feeb5SBill Wendling :: 1103ff9feeb5SBill Wendling 1104ff9feeb5SBill Wendling // Arbitrary immediate support. Implement in terms of LIS/ORI. 1105ff9feeb5SBill Wendling def : Pat<(i32 imm:$imm), 1106ff9feeb5SBill Wendling (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>; 1107ff9feeb5SBill Wendling 1108ff9feeb5SBill Wendling If none of the single-instruction patterns for loading an immediate into a 1109ff9feeb5SBill Wendling register match, this will be used. This rule says "match an arbitrary i32 1110ff9feeb5SBill Wendling immediate, turning it into an ``ORI`` ('or a 16-bit immediate') and an ``LIS`` 1111ff9feeb5SBill Wendling ('load 16-bit immediate, where the immediate is shifted to the left 16 bits') 1112ff9feeb5SBill Wendling instruction". To make this work, the ``LO16``/``HI16`` node transformations 1113ff9feeb5SBill Wendling are used to manipulate the input immediate (in this case, take the high or low 1114ff9feeb5SBill Wendling 16-bits of the immediate). 1115ff9feeb5SBill Wendling 1116e618abd6SUlrich Weigand* When using the 'Pat' class to map a pattern to an instruction that has one 1117e618abd6SUlrich Weigand or more complex operands (like e.g. `X86 addressing mode`_), the pattern may 1118e618abd6SUlrich Weigand either specify the operand as a whole using a ``ComplexPattern``, or else it 1119e618abd6SUlrich Weigand may specify the components of the complex operand separately. The latter is 1120e618abd6SUlrich Weigand done e.g. for pre-increment instructions by the PowerPC back end: 1121e618abd6SUlrich Weigand 1122e618abd6SUlrich Weigand :: 1123e618abd6SUlrich Weigand 1124e618abd6SUlrich Weigand def STWU : DForm_1<37, (outs ptr_rc:$ea_res), (ins GPRC:$rS, memri:$dst), 1125e618abd6SUlrich Weigand "stwu $rS, $dst", LdStStoreUpd, []>, 1126e618abd6SUlrich Weigand RegConstraint<"$dst.reg = $ea_res">, NoEncode<"$ea_res">; 1127e618abd6SUlrich Weigand 1128e618abd6SUlrich Weigand def : Pat<(pre_store GPRC:$rS, ptr_rc:$ptrreg, iaddroff:$ptroff), 1129e618abd6SUlrich Weigand (STWU GPRC:$rS, iaddroff:$ptroff, ptr_rc:$ptrreg)>; 1130e618abd6SUlrich Weigand 1131e618abd6SUlrich Weigand Here, the pair of ``ptroff`` and ``ptrreg`` operands is matched onto the 1132e618abd6SUlrich Weigand complex operand ``dst`` of class ``memri`` in the ``STWU`` instruction. 1133e618abd6SUlrich Weigand 1134ff9feeb5SBill Wendling* While the system does automate a lot, it still allows you to write custom C++ 1135ff9feeb5SBill Wendling code to match special cases if there is something that is hard to 1136ff9feeb5SBill Wendling express. 1137ff9feeb5SBill Wendling 1138ff9feeb5SBill WendlingWhile it has many strengths, the system currently has some limitations, 1139ff9feeb5SBill Wendlingprimarily because it is a work in progress and is not yet finished: 1140ff9feeb5SBill Wendling 1141ff9feeb5SBill Wendling* Overall, there is no way to define or match SelectionDAG nodes that define 1142ff9feeb5SBill Wendling multiple values (e.g. ``SMUL_LOHI``, ``LOAD``, ``CALL``, etc). This is the 1143ff9feeb5SBill Wendling biggest reason that you currently still *have to* write custom C++ code 1144ff9feeb5SBill Wendling for your instruction selector. 1145ff9feeb5SBill Wendling 1146ff9feeb5SBill Wendling* There is no great way to support matching complex addressing modes yet. In 1147ff9feeb5SBill Wendling the future, we will extend pattern fragments to allow them to define multiple 1148ff9feeb5SBill Wendling values (e.g. the four operands of the `X86 addressing mode`_, which are 1149ff9feeb5SBill Wendling currently matched with custom C++ code). In addition, we'll extend fragments 1150ff9feeb5SBill Wendling so that a fragment can match multiple different patterns. 1151ff9feeb5SBill Wendling 1152ff9feeb5SBill Wendling* We don't automatically infer flags like ``isStore``/``isLoad`` yet. 1153ff9feeb5SBill Wendling 1154ff9feeb5SBill Wendling* We don't automatically generate the set of supported registers and operations 1155ff9feeb5SBill Wendling for the `Legalizer`_ yet. 1156ff9feeb5SBill Wendling 1157ff9feeb5SBill Wendling* We don't have a way of tying in custom legalized nodes yet. 1158ff9feeb5SBill Wendling 1159ff9feeb5SBill WendlingDespite these limitations, the instruction selector generator is still quite 1160ff9feeb5SBill Wendlinguseful for most of the binary and logical operations in typical instruction 1161ff9feeb5SBill Wendlingsets. If you run into any problems or can't figure out how to do something, 1162ff9feeb5SBill Wendlingplease let Chris know! 1163ff9feeb5SBill Wendling 1164ff9feeb5SBill Wendling.. _Scheduling and Formation: 1165ff9feeb5SBill Wendling.. _SelectionDAG Scheduling and Formation: 1166ff9feeb5SBill Wendling 1167ff9feeb5SBill WendlingSelectionDAG Scheduling and Formation Phase 1168ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1169ff9feeb5SBill Wendling 1170ff9feeb5SBill WendlingThe scheduling phase takes the DAG of target instructions from the selection 1171ff9feeb5SBill Wendlingphase and assigns an order. The scheduler can pick an order depending on 1172ff9feeb5SBill Wendlingvarious constraints of the machines (i.e. order for minimal register pressure or 1173ff9feeb5SBill Wendlingtry to cover instruction latencies). Once an order is established, the DAG is 1174ff9feeb5SBill Wendlingconverted to a list of :raw-html:`<tt>` `MachineInstr`_\s :raw-html:`</tt>` and 1175ff9feeb5SBill Wendlingthe SelectionDAG is destroyed. 1176ff9feeb5SBill Wendling 1177ff9feeb5SBill WendlingNote that this phase is logically separate from the instruction selection phase, 1178ff9feeb5SBill Wendlingbut is tied to it closely in the code because it operates on SelectionDAGs. 1179ff9feeb5SBill Wendling 1180ff9feeb5SBill WendlingFuture directions for the SelectionDAG 1181ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1182ff9feeb5SBill Wendling 1183ff9feeb5SBill Wendling#. Optional function-at-a-time selection. 1184ff9feeb5SBill Wendling 1185ff9feeb5SBill Wendling#. Auto-generate entire selector from ``.td`` file. 1186ff9feeb5SBill Wendling 1187ff9feeb5SBill Wendling.. _SSA-based Machine Code Optimizations: 1188ff9feeb5SBill Wendling 1189ff9feeb5SBill WendlingSSA-based Machine Code Optimizations 1190ff9feeb5SBill Wendling------------------------------------ 1191ff9feeb5SBill Wendling 1192ff9feeb5SBill WendlingTo Be Written 1193ff9feeb5SBill Wendling 1194ff9feeb5SBill WendlingLive Intervals 1195ff9feeb5SBill Wendling-------------- 1196ff9feeb5SBill Wendling 1197ff9feeb5SBill WendlingLive Intervals are the ranges (intervals) where a variable is *live*. They are 1198ff9feeb5SBill Wendlingused by some `register allocator`_ passes to determine if two or more virtual 1199ff9feeb5SBill Wendlingregisters which require the same physical register are live at the same point in 1200ff9feeb5SBill Wendlingthe program (i.e., they conflict). When this situation occurs, one virtual 1201ff9feeb5SBill Wendlingregister must be *spilled*. 1202ff9feeb5SBill Wendling 1203ff9feeb5SBill WendlingLive Variable Analysis 1204ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^ 1205ff9feeb5SBill Wendling 1206ff9feeb5SBill WendlingThe first step in determining the live intervals of variables is to calculate 1207ff9feeb5SBill Wendlingthe set of registers that are immediately dead after the instruction (i.e., the 1208ff9feeb5SBill Wendlinginstruction calculates the value, but it is never used) and the set of registers 1209ff9feeb5SBill Wendlingthat are used by the instruction, but are never used after the instruction 1210ff9feeb5SBill Wendling(i.e., they are killed). Live variable information is computed for 1211ff9feeb5SBill Wendlingeach *virtual* register and *register allocatable* physical register 1212ff9feeb5SBill Wendlingin the function. This is done in a very efficient manner because it uses SSA to 1213ff9feeb5SBill Wendlingsparsely compute lifetime information for virtual registers (which are in SSA 1214ff9feeb5SBill Wendlingform) and only has to track physical registers within a block. Before register 1215ff9feeb5SBill Wendlingallocation, LLVM can assume that physical registers are only live within a 1216ff9feeb5SBill Wendlingsingle basic block. This allows it to do a single, local analysis to resolve 1217ff9feeb5SBill Wendlingphysical register lifetimes within each basic block. If a physical register is 1218ff9feeb5SBill Wendlingnot register allocatable (e.g., a stack pointer or condition codes), it is not 1219ff9feeb5SBill Wendlingtracked. 1220ff9feeb5SBill Wendling 1221ff9feeb5SBill WendlingPhysical registers may be live in to or out of a function. Live in values are 1222ff9feeb5SBill Wendlingtypically arguments in registers. Live out values are typically return values in 1223ff9feeb5SBill Wendlingregisters. Live in values are marked as such, and are given a dummy "defining" 1224ff9feeb5SBill Wendlinginstruction during live intervals analysis. If the last basic block of a 1225ff9feeb5SBill Wendlingfunction is a ``return``, then it's marked as using all live out values in the 1226ff9feeb5SBill Wendlingfunction. 1227ff9feeb5SBill Wendling 1228ff9feeb5SBill Wendling``PHI`` nodes need to be handled specially, because the calculation of the live 1229ff9feeb5SBill Wendlingvariable information from a depth first traversal of the CFG of the function 1230ff9feeb5SBill Wendlingwon't guarantee that a virtual register used by the ``PHI`` node is defined 1231ff9feeb5SBill Wendlingbefore it's used. When a ``PHI`` node is encountered, only the definition is 1232ff9feeb5SBill Wendlinghandled, because the uses will be handled in other basic blocks. 1233ff9feeb5SBill Wendling 1234ff9feeb5SBill WendlingFor each ``PHI`` node of the current basic block, we simulate an assignment at 1235ff9feeb5SBill Wendlingthe end of the current basic block and traverse the successor basic blocks. If a 1236ff9feeb5SBill Wendlingsuccessor basic block has a ``PHI`` node and one of the ``PHI`` node's operands 1237ff9feeb5SBill Wendlingis coming from the current basic block, then the variable is marked as *alive* 1238ff9feeb5SBill Wendlingwithin the current basic block and all of its predecessor basic blocks, until 1239ff9feeb5SBill Wendlingthe basic block with the defining instruction is encountered. 1240ff9feeb5SBill Wendling 1241ff9feeb5SBill WendlingLive Intervals Analysis 1242ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^ 1243ff9feeb5SBill Wendling 1244ff9feeb5SBill WendlingWe now have the information available to perform the live intervals analysis and 1245ff9feeb5SBill Wendlingbuild the live intervals themselves. We start off by numbering the basic blocks 1246ff9feeb5SBill Wendlingand machine instructions. We then handle the "live-in" values. These are in 1247ff9feeb5SBill Wendlingphysical registers, so the physical register is assumed to be killed by the end 1248ff9feeb5SBill Wendlingof the basic block. Live intervals for virtual registers are computed for some 1249ff9feeb5SBill Wendlingordering of the machine instructions ``[1, N]``. A live interval is an interval 1250ff9feeb5SBill Wendling``[i, j)``, where ``1 >= i >= j > N``, for which a variable is live. 1251ff9feeb5SBill Wendling 1252ff9feeb5SBill Wendling.. note:: 1253ff9feeb5SBill Wendling More to come... 1254ff9feeb5SBill Wendling 1255ff9feeb5SBill Wendling.. _Register Allocation: 1256ff9feeb5SBill Wendling.. _register allocator: 1257ff9feeb5SBill Wendling 1258ff9feeb5SBill WendlingRegister Allocation 1259ff9feeb5SBill Wendling------------------- 1260ff9feeb5SBill Wendling 1261ff9feeb5SBill WendlingThe *Register Allocation problem* consists in mapping a program 1262ff9feeb5SBill Wendling:raw-html:`<b><tt>` P\ :sub:`v`\ :raw-html:`</tt></b>`, that can use an unbounded 1263ff9feeb5SBill Wendlingnumber of virtual registers, to a program :raw-html:`<b><tt>` P\ :sub:`p`\ 1264ff9feeb5SBill Wendling:raw-html:`</tt></b>` that contains a finite (possibly small) number of physical 1265ff9feeb5SBill Wendlingregisters. Each target architecture has a different number of physical 1266ff9feeb5SBill Wendlingregisters. If the number of physical registers is not enough to accommodate all 1267ff9feeb5SBill Wendlingthe virtual registers, some of them will have to be mapped into memory. These 1268ff9feeb5SBill Wendlingvirtuals are called *spilled virtuals*. 1269ff9feeb5SBill Wendling 1270ff9feeb5SBill WendlingHow registers are represented in LLVM 1271ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1272ff9feeb5SBill Wendling 1273ff9feeb5SBill WendlingIn LLVM, physical registers are denoted by integer numbers that normally range 1274ff9feeb5SBill Wendlingfrom 1 to 1023. To see how this numbering is defined for a particular 1275ff9feeb5SBill Wendlingarchitecture, you can read the ``GenRegisterNames.inc`` file for that 1276ff9feeb5SBill Wendlingarchitecture. For instance, by inspecting 1277ff9feeb5SBill Wendling``lib/Target/X86/X86GenRegisterInfo.inc`` we see that the 32-bit register 1278ff9feeb5SBill Wendling``EAX`` is denoted by 43, and the MMX register ``MM0`` is mapped to 65. 1279ff9feeb5SBill Wendling 1280ff9feeb5SBill WendlingSome architectures contain registers that share the same physical location. A 1281ff9feeb5SBill Wendlingnotable example is the X86 platform. For instance, in the X86 architecture, the 1282ff9feeb5SBill Wendlingregisters ``EAX``, ``AX`` and ``AL`` share the first eight bits. These physical 1283ff9feeb5SBill Wendlingregisters are marked as *aliased* in LLVM. Given a particular architecture, you 1284ff9feeb5SBill Wendlingcan check which registers are aliased by inspecting its ``RegisterInfo.td`` 1285ff9feeb5SBill Wendlingfile. Moreover, the class ``MCRegAliasIterator`` enumerates all the physical 1286ff9feeb5SBill Wendlingregisters aliased to a register. 1287ff9feeb5SBill Wendling 1288ff9feeb5SBill WendlingPhysical registers, in LLVM, are grouped in *Register Classes*. Elements in the 1289ff9feeb5SBill Wendlingsame register class are functionally equivalent, and can be interchangeably 1290ff9feeb5SBill Wendlingused. Each virtual register can only be mapped to physical registers of a 1291ff9feeb5SBill Wendlingparticular class. For instance, in the X86 architecture, some virtuals can only 1292ff9feeb5SBill Wendlingbe allocated to 8 bit registers. A register class is described by 1293ff9feeb5SBill Wendling``TargetRegisterClass`` objects. To discover if a virtual register is 1294ff6a7d6dSSean Silvacompatible with a given physical, this code can be used: 1295ff9feeb5SBill Wendling 1296ff9feeb5SBill Wendling.. code-block:: c++ 1297ff9feeb5SBill Wendling 1298ff9feeb5SBill Wendling bool RegMapping_Fer::compatible_class(MachineFunction &mf, 1299ff9feeb5SBill Wendling unsigned v_reg, 1300ff9feeb5SBill Wendling unsigned p_reg) { 1301ff9feeb5SBill Wendling assert(TargetRegisterInfo::isPhysicalRegister(p_reg) && 1302ff9feeb5SBill Wendling "Target register must be physical"); 1303ff9feeb5SBill Wendling const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg); 1304ff9feeb5SBill Wendling return trc->contains(p_reg); 1305ff9feeb5SBill Wendling } 1306ff9feeb5SBill Wendling 1307ff9feeb5SBill WendlingSometimes, mostly for debugging purposes, it is useful to change the number of 1308ff9feeb5SBill Wendlingphysical registers available in the target architecture. This must be done 1309f65d4aa9SKazuaki Ishizakistatically, inside the ``TargetRegisterInfo.td`` file. Just ``grep`` for 1310ff9feeb5SBill Wendling``RegisterClass``, the last parameter of which is a list of registers. Just 1311ff9feeb5SBill Wendlingcommenting some out is one simple way to avoid them being used. A more polite 1312ff9feeb5SBill Wendlingway is to explicitly exclude some registers from the *allocation order*. See the 1313ff9feeb5SBill Wendlingdefinition of the ``GR8`` register class in 1314ff9feeb5SBill Wendling``lib/Target/X86/X86RegisterInfo.td`` for an example of this. 1315ff9feeb5SBill Wendling 1316ff9feeb5SBill WendlingVirtual registers are also denoted by integer numbers. Contrary to physical 1317ff9feeb5SBill Wendlingregisters, different virtual registers never share the same number. Whereas 1318ff9feeb5SBill Wendlingphysical registers are statically defined in a ``TargetRegisterInfo.td`` file 1319ff9feeb5SBill Wendlingand cannot be created by the application developer, that is not the case with 1320ff9feeb5SBill Wendlingvirtual registers. In order to create new virtual registers, use the method 1321ff9feeb5SBill Wendling``MachineRegisterInfo::createVirtualRegister()``. This method will return a new 1322ff9feeb5SBill Wendlingvirtual register. Use an ``IndexedMap<Foo, VirtReg2IndexFunctor>`` to hold 1323ff9feeb5SBill Wendlinginformation per virtual register. If you need to enumerate all virtual 1324ff9feeb5SBill Wendlingregisters, use the function ``TargetRegisterInfo::index2VirtReg()`` to find the 1325ff9feeb5SBill Wendlingvirtual register numbers: 1326ff9feeb5SBill Wendling 1327ff9feeb5SBill Wendling.. code-block:: c++ 1328ff9feeb5SBill Wendling 1329ff9feeb5SBill Wendling for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) { 1330ff9feeb5SBill Wendling unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i); 1331ff9feeb5SBill Wendling stuff(VirtReg); 1332ff9feeb5SBill Wendling } 1333ff9feeb5SBill Wendling 1334ff9feeb5SBill WendlingBefore register allocation, the operands of an instruction are mostly virtual 1335ff9feeb5SBill Wendlingregisters, although physical registers may also be used. In order to check if a 1336ff9feeb5SBill Wendlinggiven machine operand is a register, use the boolean function 1337ff9feeb5SBill Wendling``MachineOperand::isRegister()``. To obtain the integer code of a register, use 1338ff9feeb5SBill Wendling``MachineOperand::getReg()``. An instruction may define or use a register. For 1339ff9feeb5SBill Wendlinginstance, ``ADD reg:1026 := reg:1025 reg:1024`` defines the registers 1024, and 1340ff9feeb5SBill Wendlinguses registers 1025 and 1026. Given a register operand, the method 1341ff9feeb5SBill Wendling``MachineOperand::isUse()`` informs if that register is being used by the 1342ff9feeb5SBill Wendlinginstruction. The method ``MachineOperand::isDef()`` informs if that registers is 1343ff9feeb5SBill Wendlingbeing defined. 1344ff9feeb5SBill Wendling 1345ff9feeb5SBill WendlingWe will call physical registers present in the LLVM bitcode before register 1346ff9feeb5SBill Wendlingallocation *pre-colored registers*. Pre-colored registers are used in many 1347ff9feeb5SBill Wendlingdifferent situations, for instance, to pass parameters of functions calls, and 1348ff9feeb5SBill Wendlingto store results of particular instructions. There are two types of pre-colored 1349ff9feeb5SBill Wendlingregisters: the ones *implicitly* defined, and those *explicitly* 1350ff9feeb5SBill Wendlingdefined. Explicitly defined registers are normal operands, and can be accessed 1351ff9feeb5SBill Wendlingwith ``MachineInstr::getOperand(int)::getReg()``. In order to check which 1352ff9feeb5SBill Wendlingregisters are implicitly defined by an instruction, use the 1353ff9feeb5SBill Wendling``TargetInstrInfo::get(opcode)::ImplicitDefs``, where ``opcode`` is the opcode 1354ff9feeb5SBill Wendlingof the target instruction. One important difference between explicit and 1355ff9feeb5SBill Wendlingimplicit physical registers is that the latter are defined statically for each 1356ff9feeb5SBill Wendlinginstruction, whereas the former may vary depending on the program being 1357ff9feeb5SBill Wendlingcompiled. For example, an instruction that represents a function call will 1358ff9feeb5SBill Wendlingalways implicitly define or use the same set of physical registers. To read the 1359ff9feeb5SBill Wendlingregisters implicitly used by an instruction, use 1360ff9feeb5SBill Wendling``TargetInstrInfo::get(opcode)::ImplicitUses``. Pre-colored registers impose 1361ff9feeb5SBill Wendlingconstraints on any register allocation algorithm. The register allocator must 1362ff9feeb5SBill Wendlingmake sure that none of them are overwritten by the values of virtual registers 1363ff9feeb5SBill Wendlingwhile still alive. 1364ff9feeb5SBill Wendling 1365ff9feeb5SBill WendlingMapping virtual registers to physical registers 1366ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1367ff9feeb5SBill Wendling 1368ff9feeb5SBill WendlingThere are two ways to map virtual registers to physical registers (or to memory 1369ff9feeb5SBill Wendlingslots). The first way, that we will call *direct mapping*, is based on the use 1370ff9feeb5SBill Wendlingof methods of the classes ``TargetRegisterInfo``, and ``MachineOperand``. The 1371ff9feeb5SBill Wendlingsecond way, that we will call *indirect mapping*, relies on the ``VirtRegMap`` 1372ff9feeb5SBill Wendlingclass in order to insert loads and stores sending and getting values to and from 1373ff9feeb5SBill Wendlingmemory. 1374ff9feeb5SBill Wendling 1375ff9feeb5SBill WendlingThe direct mapping provides more flexibility to the developer of the register 1376ff9feeb5SBill Wendlingallocator; however, it is more error prone, and demands more implementation 1377ff9feeb5SBill Wendlingwork. Basically, the programmer will have to specify where load and store 1378ff9feeb5SBill Wendlinginstructions should be inserted in the target function being compiled in order 1379ff9feeb5SBill Wendlingto get and store values in memory. To assign a physical register to a virtual 1380ff9feeb5SBill Wendlingregister present in a given operand, use ``MachineOperand::setReg(p_reg)``. To 1381ff9feeb5SBill Wendlinginsert a store instruction, use ``TargetInstrInfo::storeRegToStackSlot(...)``, 1382ff9feeb5SBill Wendlingand to insert a load instruction, use ``TargetInstrInfo::loadRegFromStackSlot``. 1383ff9feeb5SBill Wendling 1384ff9feeb5SBill WendlingThe indirect mapping shields the application developer from the complexities of 1385ff9feeb5SBill Wendlinginserting load and store instructions. In order to map a virtual register to a 1386ff9feeb5SBill Wendlingphysical one, use ``VirtRegMap::assignVirt2Phys(vreg, preg)``. In order to map 1387ff9feeb5SBill Wendlinga certain virtual register to memory, use 1388ff9feeb5SBill Wendling``VirtRegMap::assignVirt2StackSlot(vreg)``. This method will return the stack 1389ff9feeb5SBill Wendlingslot where ``vreg``'s value will be located. If it is necessary to map another 1390ff9feeb5SBill Wendlingvirtual register to the same stack slot, use 1391ff9feeb5SBill Wendling``VirtRegMap::assignVirt2StackSlot(vreg, stack_location)``. One important point 1392ff9feeb5SBill Wendlingto consider when using the indirect mapping, is that even if a virtual register 1393ff9feeb5SBill Wendlingis mapped to memory, it still needs to be mapped to a physical register. This 1394ff9feeb5SBill Wendlingphysical register is the location where the virtual register is supposed to be 1395ff9feeb5SBill Wendlingfound before being stored or after being reloaded. 1396ff9feeb5SBill Wendling 1397ff9feeb5SBill WendlingIf the indirect strategy is used, after all the virtual registers have been 1398ff9feeb5SBill Wendlingmapped to physical registers or stack slots, it is necessary to use a spiller 1399ff9feeb5SBill Wendlingobject to place load and store instructions in the code. Every virtual that has 14001e61ffddSEric Christopherbeen mapped to a stack slot will be stored to memory after being defined and will 1401ff9feeb5SBill Wendlingbe loaded before being used. The implementation of the spiller tries to recycle 1402ff9feeb5SBill Wendlingload/store instructions, avoiding unnecessary instructions. For an example of 1403ff9feeb5SBill Wendlinghow to invoke the spiller, see ``RegAllocLinearScan::runOnMachineFunction`` in 1404ff9feeb5SBill Wendling``lib/CodeGen/RegAllocLinearScan.cpp``. 1405ff9feeb5SBill Wendling 1406ff9feeb5SBill WendlingHandling two address instructions 1407ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1408ff9feeb5SBill Wendling 1409ff9feeb5SBill WendlingWith very rare exceptions (e.g., function calls), the LLVM machine code 1410ff9feeb5SBill Wendlinginstructions are three address instructions. That is, each instruction is 1411ff9feeb5SBill Wendlingexpected to define at most one register, and to use at most two registers. 1412ff9feeb5SBill WendlingHowever, some architectures use two address instructions. In this case, the 14131e61ffddSEric Christopherdefined register is also one of the used registers. For instance, an instruction 1414ff9feeb5SBill Wendlingsuch as ``ADD %EAX, %EBX``, in X86 is actually equivalent to ``%EAX = %EAX + 1415ff9feeb5SBill Wendling%EBX``. 1416ff9feeb5SBill Wendling 1417ff9feeb5SBill WendlingIn order to produce correct code, LLVM must convert three address instructions 1418ff9feeb5SBill Wendlingthat represent two address instructions into true two address instructions. LLVM 1419ff9feeb5SBill Wendlingprovides the pass ``TwoAddressInstructionPass`` for this specific purpose. It 1420ff9feeb5SBill Wendlingmust be run before register allocation takes place. After its execution, the 1421ff9feeb5SBill Wendlingresulting code may no longer be in SSA form. This happens, for instance, in 1422ff9feeb5SBill Wendlingsituations where an instruction such as ``%a = ADD %b %c`` is converted to two 1423ff9feeb5SBill Wendlinginstructions such as: 1424ff9feeb5SBill Wendling 1425ff9feeb5SBill Wendling:: 1426ff9feeb5SBill Wendling 1427ff9feeb5SBill Wendling %a = MOVE %b 1428ff9feeb5SBill Wendling %a = ADD %a %c 1429ff9feeb5SBill Wendling 1430ff9feeb5SBill WendlingNotice that, internally, the second instruction is represented as ``ADD 1431ff9feeb5SBill Wendling%a[def/use] %c``. I.e., the register operand ``%a`` is both used and defined by 1432ff9feeb5SBill Wendlingthe instruction. 1433ff9feeb5SBill Wendling 1434ff9feeb5SBill WendlingThe SSA deconstruction phase 1435ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1436ff9feeb5SBill Wendling 1437ff9feeb5SBill WendlingAn important transformation that happens during register allocation is called 1438ff9feeb5SBill Wendlingthe *SSA Deconstruction Phase*. The SSA form simplifies many analyses that are 1439ff9feeb5SBill Wendlingperformed on the control flow graph of programs. However, traditional 1440ff9feeb5SBill Wendlinginstruction sets do not implement PHI instructions. Thus, in order to generate 1441ff9feeb5SBill Wendlingexecutable code, compilers must replace PHI instructions with other instructions 1442ff9feeb5SBill Wendlingthat preserve their semantics. 1443ff9feeb5SBill Wendling 1444ff9feeb5SBill WendlingThere are many ways in which PHI instructions can safely be removed from the 1445ff9feeb5SBill Wendlingtarget code. The most traditional PHI deconstruction algorithm replaces PHI 1446ff9feeb5SBill Wendlinginstructions with copy instructions. That is the strategy adopted by LLVM. The 1447ff9feeb5SBill WendlingSSA deconstruction algorithm is implemented in 1448ff9feeb5SBill Wendling``lib/CodeGen/PHIElimination.cpp``. In order to invoke this pass, the identifier 1449ff9feeb5SBill Wendling``PHIEliminationID`` must be marked as required in the code of the register 1450ff9feeb5SBill Wendlingallocator. 1451ff9feeb5SBill Wendling 1452ff9feeb5SBill WendlingInstruction folding 1453ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^ 1454ff9feeb5SBill Wendling 1455ff9feeb5SBill Wendling*Instruction folding* is an optimization performed during register allocation 1456ff9feeb5SBill Wendlingthat removes unnecessary copy instructions. For instance, a sequence of 1457ff9feeb5SBill Wendlinginstructions such as: 1458ff9feeb5SBill Wendling 1459ff9feeb5SBill Wendling:: 1460ff9feeb5SBill Wendling 1461ff9feeb5SBill Wendling %EBX = LOAD %mem_address 1462ff9feeb5SBill Wendling %EAX = COPY %EBX 1463ff9feeb5SBill Wendling 1464ff9feeb5SBill Wendlingcan be safely substituted by the single instruction: 1465ff9feeb5SBill Wendling 1466ff9feeb5SBill Wendling:: 1467ff9feeb5SBill Wendling 1468ff9feeb5SBill Wendling %EAX = LOAD %mem_address 1469ff9feeb5SBill Wendling 1470ff9feeb5SBill WendlingInstructions can be folded with the 1471ff9feeb5SBill Wendling``TargetRegisterInfo::foldMemoryOperand(...)`` method. Care must be taken when 1472ff9feeb5SBill Wendlingfolding instructions; a folded instruction can be quite different from the 1473ff9feeb5SBill Wendlingoriginal instruction. See ``LiveIntervals::addIntervalsForSpills`` in 1474ff9feeb5SBill Wendling``lib/CodeGen/LiveIntervalAnalysis.cpp`` for an example of its use. 1475ff9feeb5SBill Wendling 1476ff9feeb5SBill WendlingBuilt in register allocators 1477ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1478ff9feeb5SBill Wendling 1479ff9feeb5SBill WendlingThe LLVM infrastructure provides the application developer with three different 1480ff9feeb5SBill Wendlingregister allocators: 1481ff9feeb5SBill Wendling 1482ff9feeb5SBill Wendling* *Fast* --- This register allocator is the default for debug builds. It 1483ff9feeb5SBill Wendling allocates registers on a basic block level, attempting to keep values in 1484ff9feeb5SBill Wendling registers and reusing registers as appropriate. 1485ff9feeb5SBill Wendling 1486ff9feeb5SBill Wendling* *Basic* --- This is an incremental approach to register allocation. Live 1487ff9feeb5SBill Wendling ranges are assigned to registers one at a time in an order that is driven by 1488ff9feeb5SBill Wendling heuristics. Since code can be rewritten on-the-fly during allocation, this 1489ff9feeb5SBill Wendling framework allows interesting allocators to be developed as extensions. It is 1490ff9feeb5SBill Wendling not itself a production register allocator but is a potentially useful 1491ff9feeb5SBill Wendling stand-alone mode for triaging bugs and as a performance baseline. 1492ff9feeb5SBill Wendling 1493ff9feeb5SBill Wendling* *Greedy* --- *The default allocator*. This is a highly tuned implementation of 1494ff9feeb5SBill Wendling the *Basic* allocator that incorporates global live range splitting. This 1495ff9feeb5SBill Wendling allocator works hard to minimize the cost of spill code. 1496ff9feeb5SBill Wendling 1497ff9feeb5SBill Wendling* *PBQP* --- A Partitioned Boolean Quadratic Programming (PBQP) based register 1498ff9feeb5SBill Wendling allocator. This allocator works by constructing a PBQP problem representing 1499ff9feeb5SBill Wendling the register allocation problem under consideration, solving this using a PBQP 1500ff9feeb5SBill Wendling solver, and mapping the solution back to a register assignment. 1501ff9feeb5SBill Wendling 1502ff9feeb5SBill WendlingThe type of register allocator used in ``llc`` can be chosen with the command 1503ff9feeb5SBill Wendlingline option ``-regalloc=...``: 1504ff9feeb5SBill Wendling 1505ff9feeb5SBill Wendling.. code-block:: bash 1506ff9feeb5SBill Wendling 1507ff9feeb5SBill Wendling $ llc -regalloc=linearscan file.bc -o ln.s 1508ff9feeb5SBill Wendling $ llc -regalloc=fast file.bc -o fa.s 1509ff9feeb5SBill Wendling $ llc -regalloc=pbqp file.bc -o pbqp.s 1510ff9feeb5SBill Wendling 1511ff9feeb5SBill Wendling.. _Prolog/Epilog Code Insertion: 1512ff9feeb5SBill Wendling 1513ff9feeb5SBill WendlingProlog/Epilog Code Insertion 1514ff9feeb5SBill Wendling---------------------------- 1515ff9feeb5SBill Wendling 15160ee7a2c3SDaniel McIntosh.. note:: 15170ee7a2c3SDaniel McIntosh 15180ee7a2c3SDaniel McIntosh To Be Written 15190ee7a2c3SDaniel McIntosh 1520ff9feeb5SBill WendlingCompact Unwind 15210ee7a2c3SDaniel McIntosh-------------- 1522ff9feeb5SBill Wendling 1523ff9feeb5SBill WendlingThrowing an exception requires *unwinding* out of a function. The information on 1524ff9feeb5SBill Wendlinghow to unwind a given function is traditionally expressed in DWARF unwind 1525ff9feeb5SBill Wendling(a.k.a. frame) info. But that format was originally developed for debuggers to 1526ff9feeb5SBill Wendlingbacktrace, and each Frame Description Entry (FDE) requires ~20-30 bytes per 1527ff9feeb5SBill Wendlingfunction. There is also the cost of mapping from an address in a function to the 1528ff9feeb5SBill Wendlingcorresponding FDE at runtime. An alternative unwind encoding is called *compact 1529ff9feeb5SBill Wendlingunwind* and requires just 4-bytes per function. 1530ff9feeb5SBill Wendling 1531ff9feeb5SBill WendlingThe compact unwind encoding is a 32-bit value, which is encoded in an 1532ff9feeb5SBill Wendlingarchitecture-specific way. It specifies which registers to restore and from 1533ff9feeb5SBill Wendlingwhere, and how to unwind out of the function. When the linker creates a final 1534ff9feeb5SBill Wendlinglinked image, it will create a ``__TEXT,__unwind_info`` section. This section is 1535ff9feeb5SBill Wendlinga small and fast way for the runtime to access unwind info for any given 1536ff9feeb5SBill Wendlingfunction. If we emit compact unwind info for the function, that compact unwind 1537ff9feeb5SBill Wendlinginfo will be encoded in the ``__TEXT,__unwind_info`` section. If we emit DWARF 1538ff9feeb5SBill Wendlingunwind info, the ``__TEXT,__unwind_info`` section will contain the offset of the 1539ff9feeb5SBill WendlingFDE in the ``__TEXT,__eh_frame`` section in the final linked image. 1540ff9feeb5SBill Wendling 1541ff9feeb5SBill WendlingFor X86, there are three modes for the compact unwind encoding: 1542ff9feeb5SBill Wendling 1543ff9feeb5SBill Wendling*Function with a Frame Pointer (``EBP`` or ``RBP``)* 1544ff9feeb5SBill Wendling ``EBP/RBP``-based frame, where ``EBP/RBP`` is pushed onto the stack 1545ff9feeb5SBill Wendling immediately after the return address, then ``ESP/RSP`` is moved to 1546ff9feeb5SBill Wendling ``EBP/RBP``. Thus to unwind, ``ESP/RSP`` is restored with the current 1547ff9feeb5SBill Wendling ``EBP/RBP`` value, then ``EBP/RBP`` is restored by popping the stack, and the 1548ff9feeb5SBill Wendling return is done by popping the stack once more into the PC. All non-volatile 1549ff9feeb5SBill Wendling registers that need to be restored must have been saved in a small range on 1550ff9feeb5SBill Wendling the stack that starts ``EBP-4`` to ``EBP-1020`` (``RBP-8`` to 1551ff9feeb5SBill Wendling ``RBP-1020``). The offset (divided by 4 in 32-bit mode and 8 in 64-bit mode) 1552ff9feeb5SBill Wendling is encoded in bits 16-23 (mask: ``0x00FF0000``). The registers saved are 1553ff9feeb5SBill Wendling encoded in bits 0-14 (mask: ``0x00007FFF``) as five 3-bit entries from the 1554ff9feeb5SBill Wendling following table: 1555ff9feeb5SBill Wendling 1556ff9feeb5SBill Wendling ============== ============= =============== 1557ff9feeb5SBill Wendling Compact Number i386 Register x86-64 Register 1558ff9feeb5SBill Wendling ============== ============= =============== 1559ff9feeb5SBill Wendling 1 ``EBX`` ``RBX`` 1560ff9feeb5SBill Wendling 2 ``ECX`` ``R12`` 1561ff9feeb5SBill Wendling 3 ``EDX`` ``R13`` 1562ff9feeb5SBill Wendling 4 ``EDI`` ``R14`` 1563ff9feeb5SBill Wendling 5 ``ESI`` ``R15`` 1564ff9feeb5SBill Wendling 6 ``EBP`` ``RBP`` 1565ff9feeb5SBill Wendling ============== ============= =============== 1566ff9feeb5SBill Wendling 1567ff9feeb5SBill Wendling*Frameless with a Small Constant Stack Size (``EBP`` or ``RBP`` is not used as a frame pointer)* 1568ff9feeb5SBill Wendling To return, a constant (encoded in the compact unwind encoding) is added to the 1569ff9feeb5SBill Wendling ``ESP/RSP``. Then the return is done by popping the stack into the PC. All 1570ff9feeb5SBill Wendling non-volatile registers that need to be restored must have been saved on the 1571ff9feeb5SBill Wendling stack immediately after the return address. The stack size (divided by 4 in 1572ff9feeb5SBill Wendling 32-bit mode and 8 in 64-bit mode) is encoded in bits 16-23 (mask: 1573ff9feeb5SBill Wendling ``0x00FF0000``). There is a maximum stack size of 1024 bytes in 32-bit mode 1574ff9feeb5SBill Wendling and 2048 in 64-bit mode. The number of registers saved is encoded in bits 9-12 1575ff9feeb5SBill Wendling (mask: ``0x00001C00``). Bits 0-9 (mask: ``0x000003FF``) contain which 1576ff9feeb5SBill Wendling registers were saved and their order. (See the 1577ff9feeb5SBill Wendling ``encodeCompactUnwindRegistersWithoutFrame()`` function in 1578ff9feeb5SBill Wendling ``lib/Target/X86FrameLowering.cpp`` for the encoding algorithm.) 1579ff9feeb5SBill Wendling 1580ff9feeb5SBill Wendling*Frameless with a Large Constant Stack Size (``EBP`` or ``RBP`` is not used as a frame pointer)* 1581ff9feeb5SBill Wendling This case is like the "Frameless with a Small Constant Stack Size" case, but 1582ff9feeb5SBill Wendling the stack size is too large to encode in the compact unwind encoding. Instead 1583ff9feeb5SBill Wendling it requires that the function contains "``subl $nnnnnn, %esp``" in its 1584ff9feeb5SBill Wendling prolog. The compact encoding contains the offset to the ``$nnnnnn`` value in 1585ff9feeb5SBill Wendling the function in bits 9-12 (mask: ``0x00001C00``). 1586ff9feeb5SBill Wendling 1587ff9feeb5SBill Wendling.. _Late Machine Code Optimizations: 1588ff9feeb5SBill Wendling 1589ff9feeb5SBill WendlingLate Machine Code Optimizations 1590ff9feeb5SBill Wendling------------------------------- 1591ff9feeb5SBill Wendling 1592ff9feeb5SBill Wendling.. note:: 1593ff9feeb5SBill Wendling 1594ff9feeb5SBill Wendling To Be Written 1595ff9feeb5SBill Wendling 1596ff9feeb5SBill Wendling.. _Code Emission: 1597ff9feeb5SBill Wendling 1598ff9feeb5SBill WendlingCode Emission 1599ff9feeb5SBill Wendling------------- 1600ff9feeb5SBill Wendling 1601ff9feeb5SBill WendlingThe code emission step of code generation is responsible for lowering from the 1602ff9feeb5SBill Wendlingcode generator abstractions (like `MachineFunction`_, `MachineInstr`_, etc) down 1603ff9feeb5SBill Wendlingto the abstractions used by the MC layer (`MCInst`_, `MCStreamer`_, etc). This 1604ff9feeb5SBill Wendlingis done with a combination of several different classes: the (misnamed) 1605ff9feeb5SBill Wendlingtarget-independent AsmPrinter class, target-specific subclasses of AsmPrinter 1606ff9feeb5SBill Wendling(such as SparcAsmPrinter), and the TargetLoweringObjectFile class. 1607ff9feeb5SBill Wendling 1608ff9feeb5SBill WendlingSince the MC layer works at the level of abstraction of object files, it doesn't 1609ff9feeb5SBill Wendlinghave a notion of functions, global variables etc. Instead, it thinks about 1610ff9feeb5SBill Wendlinglabels, directives, and instructions. A key class used at this time is the 1611ff9feeb5SBill WendlingMCStreamer class. This is an abstract API that is implemented in different ways 1612ff9feeb5SBill Wendling(e.g. to output a .s file, output an ELF .o file, etc) that is effectively an 1613ff9feeb5SBill Wendling"assembler API". MCStreamer has one method per directive, such as EmitLabel, 1614adf4142fSFangrui SongEmitSymbolAttribute, switchSection, etc, which directly correspond to assembly 1615ff9feeb5SBill Wendlinglevel directives. 1616ff9feeb5SBill Wendling 1617ff9feeb5SBill WendlingIf you are interested in implementing a code generator for a target, there are 1618ff9feeb5SBill Wendlingthree important things that you have to implement for your target: 1619ff9feeb5SBill Wendling 1620ff9feeb5SBill Wendling#. First, you need a subclass of AsmPrinter for your target. This class 1621ff9feeb5SBill Wendling implements the general lowering process converting MachineFunction's into MC 1622ff9feeb5SBill Wendling label constructs. The AsmPrinter base class provides a number of useful 1623ff9feeb5SBill Wendling methods and routines, and also allows you to override the lowering process in 1624ff9feeb5SBill Wendling some important ways. You should get much of the lowering for free if you are 1625ff9feeb5SBill Wendling implementing an ELF, COFF, or MachO target, because the 1626ff9feeb5SBill Wendling TargetLoweringObjectFile class implements much of the common logic. 1627ff9feeb5SBill Wendling 1628ff9feeb5SBill Wendling#. Second, you need to implement an instruction printer for your target. The 1629ff9feeb5SBill Wendling instruction printer takes an `MCInst`_ and renders it to a raw_ostream as 1630ff9feeb5SBill Wendling text. Most of this is automatically generated from the .td file (when you 1631ff9feeb5SBill Wendling specify something like "``add $dst, $src1, $src2``" in the instructions), but 1632ff9feeb5SBill Wendling you need to implement routines to print operands. 1633ff9feeb5SBill Wendling 1634ff9feeb5SBill Wendling#. Third, you need to implement code that lowers a `MachineInstr`_ to an MCInst, 1635ff9feeb5SBill Wendling usually implemented in "<target>MCInstLower.cpp". This lowering process is 1636ff9feeb5SBill Wendling often target specific, and is responsible for turning jump table entries, 1637ff9feeb5SBill Wendling constant pool indices, global variable addresses, etc into MCLabels as 1638ff9feeb5SBill Wendling appropriate. This translation layer is also responsible for expanding pseudo 1639ff9feeb5SBill Wendling ops used by the code generator into the actual machine instructions they 1640ff9feeb5SBill Wendling correspond to. The MCInsts that are generated by this are fed into the 1641ff9feeb5SBill Wendling instruction printer or the encoder. 1642ff9feeb5SBill Wendling 16431e61ffddSEric ChristopherFinally, at your choosing, you can also implement a subclass of MCCodeEmitter 1644ff9feeb5SBill Wendlingwhich lowers MCInst's into machine code bytes and relocations. This is 1645ff9feeb5SBill Wendlingimportant if you want to support direct .o file emission, or would like to 1646ff9feeb5SBill Wendlingimplement an assembler for your target. 1647ff9feeb5SBill Wendling 1648a6bcd53dSSean EvesonEmitting function stack size information 1649a6bcd53dSSean Eveson^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1650a6bcd53dSSean Eveson 1651a6bcd53dSSean EvesonA section containing metadata on function stack sizes will be emitted when 1652a6bcd53dSSean Eveson``TargetLoweringObjectFile::StackSizesSection`` is not null, and 1653a6bcd53dSSean Eveson``TargetOptions::EmitStackSizeSection`` is set (-stack-size-section). The 16542ae6037dSSean Evesonsection will contain an array of pairs of function symbol values (pointer size) 1655a6bcd53dSSean Evesonand stack sizes (unsigned LEB128). The stack size values only include the space 1656a6bcd53dSSean Evesonallocated in the function prologue. Functions with dynamic stack allocations are 1657a6bcd53dSSean Evesonnot included. 1658a6bcd53dSSean Eveson 1659ff9feeb5SBill WendlingVLIW Packetizer 1660ff9feeb5SBill Wendling--------------- 1661ff9feeb5SBill Wendling 1662ff9feeb5SBill WendlingIn a Very Long Instruction Word (VLIW) architecture, the compiler is responsible 1663ff9feeb5SBill Wendlingfor mapping instructions to functional-units available on the architecture. To 1664ff9feeb5SBill Wendlingthat end, the compiler creates groups of instructions called *packets* or 1665ff9feeb5SBill Wendling*bundles*. The VLIW packetizer in LLVM is a target-independent mechanism to 1666ff9feeb5SBill Wendlingenable the packetization of machine instructions. 1667ff9feeb5SBill Wendling 1668ff9feeb5SBill WendlingMapping from instructions to functional units 1669ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1670ff9feeb5SBill Wendling 1671ff9feeb5SBill WendlingInstructions in a VLIW target can typically be mapped to multiple functional 1672ff9feeb5SBill Wendlingunits. During the process of packetizing, the compiler must be able to reason 1673ff9feeb5SBill Wendlingabout whether an instruction can be added to a packet. This decision can be 1674ff9feeb5SBill Wendlingcomplex since the compiler has to examine all possible mappings of instructions 1675ff9feeb5SBill Wendlingto functional units. Therefore to alleviate compilation-time complexity, the 1676ff9feeb5SBill WendlingVLIW packetizer parses the instruction classes of a target and generates tables 1677ff9feeb5SBill Wendlingat compiler build time. These tables can then be queried by the provided 1678ff9feeb5SBill Wendlingmachine-independent API to determine if an instruction can be accommodated in a 1679ff9feeb5SBill Wendlingpacket. 1680ff9feeb5SBill Wendling 1681ff9feeb5SBill WendlingHow the packetization tables are generated and used 1682ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1683ff9feeb5SBill Wendling 1684ff9feeb5SBill WendlingThe packetizer reads instruction classes from a target's itineraries and creates 1685ff9feeb5SBill Wendlinga deterministic finite automaton (DFA) to represent the state of a packet. A DFA 1686ff9feeb5SBill Wendlingconsists of three major elements: inputs, states, and transitions. The set of 1687ff9feeb5SBill Wendlinginputs for the generated DFA represents the instruction being added to a 1688ff9feeb5SBill Wendlingpacket. The states represent the possible consumption of functional units by 1689ff9feeb5SBill Wendlinginstructions in a packet. In the DFA, transitions from one state to another 1690ff9feeb5SBill Wendlingoccur on the addition of an instruction to an existing packet. If there is a 1691ff9feeb5SBill Wendlinglegal mapping of functional units to instructions, then the DFA contains a 1692ff9feeb5SBill Wendlingcorresponding transition. The absence of a transition indicates that a legal 1693ff9feeb5SBill Wendlingmapping does not exist and that the instruction cannot be added to the packet. 1694ff9feeb5SBill Wendling 1695ff9feeb5SBill WendlingTo generate tables for a VLIW target, add *Target*\ GenDFAPacketizer.inc as a 1696ff9feeb5SBill Wendlingtarget to the Makefile in the target directory. The exported API provides three 1697ff9feeb5SBill Wendlingfunctions: ``DFAPacketizer::clearResources()``, 1698ff9feeb5SBill Wendling``DFAPacketizer::reserveResources(MachineInstr *MI)``, and 1699ff9feeb5SBill Wendling``DFAPacketizer::canReserveResources(MachineInstr *MI)``. These functions allow 1700ff9feeb5SBill Wendlinga target packetizer to add an instruction to an existing packet and to check 1701ff9feeb5SBill Wendlingwhether an instruction can be added to a packet. See 1702ff9feeb5SBill Wendling``llvm/CodeGen/DFAPacketizer.h`` for more information. 1703ff9feeb5SBill Wendling 1704ff9feeb5SBill WendlingImplementing a Native Assembler 1705ff9feeb5SBill Wendling=============================== 1706ff9feeb5SBill Wendling 1707ff9feeb5SBill WendlingThough you're probably reading this because you want to write or maintain a 1708a9853efbSSylvestre Ledrucompiler backend, LLVM also fully supports building a native assembler. 1709ff9feeb5SBill WendlingWe've tried hard to automate the generation of the assembler from the .td files 1710ff9feeb5SBill Wendling(in particular the instruction syntax and encodings), which means that a large 1711ff9feeb5SBill Wendlingpart of the manual and repetitive data entry can be factored and shared with the 1712ff9feeb5SBill Wendlingcompiler. 1713ff9feeb5SBill Wendling 1714ff9feeb5SBill WendlingInstruction Parsing 1715ff9feeb5SBill Wendling------------------- 1716ff9feeb5SBill Wendling 1717ff9feeb5SBill Wendling.. note:: 1718ff9feeb5SBill Wendling 1719ff9feeb5SBill Wendling To Be Written 1720ff9feeb5SBill Wendling 1721ff9feeb5SBill Wendling 1722ff9feeb5SBill WendlingInstruction Alias Processing 1723ff9feeb5SBill Wendling---------------------------- 1724ff9feeb5SBill Wendling 1725ff9feeb5SBill WendlingOnce the instruction is parsed, it enters the MatchInstructionImpl function. 1726ff9feeb5SBill WendlingThe MatchInstructionImpl function performs alias processing and then does actual 1727ff9feeb5SBill Wendlingmatching. 1728ff9feeb5SBill Wendling 1729ff9feeb5SBill WendlingAlias processing is the phase that canonicalizes different lexical forms of the 1730ff9feeb5SBill Wendlingsame instructions down to one representation. There are several different kinds 1731ff9feeb5SBill Wendlingof alias that are possible to implement and they are listed below in the order 1732ff9feeb5SBill Wendlingthat they are processed (which is in order from simplest/weakest to most 1733ff9feeb5SBill Wendlingcomplex/powerful). Generally you want to use the first alias mechanism that 1734ff9feeb5SBill Wendlingmeets the needs of your instruction, because it will allow a more concise 1735ff9feeb5SBill Wendlingdescription. 1736ff9feeb5SBill Wendling 1737ff9feeb5SBill WendlingMnemonic Aliases 1738ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^ 1739ff9feeb5SBill Wendling 1740ff9feeb5SBill WendlingThe first phase of alias processing is simple instruction mnemonic remapping for 1741ff9feeb5SBill Wendlingclasses of instructions which are allowed with two different mnemonics. This 1742ff9feeb5SBill Wendlingphase is a simple and unconditionally remapping from one input mnemonic to one 1743ff9feeb5SBill Wendlingoutput mnemonic. It isn't possible for this form of alias to look at the 1744ff9feeb5SBill Wendlingoperands at all, so the remapping must apply for all forms of a given mnemonic. 1745ff9feeb5SBill WendlingMnemonic aliases are defined simply, for example X86 has: 1746ff9feeb5SBill Wendling 1747ff9feeb5SBill Wendling:: 1748ff9feeb5SBill Wendling 1749ff9feeb5SBill Wendling def : MnemonicAlias<"cbw", "cbtw">; 1750ff9feeb5SBill Wendling def : MnemonicAlias<"smovq", "movsq">; 1751ff9feeb5SBill Wendling def : MnemonicAlias<"fldcww", "fldcw">; 1752ff9feeb5SBill Wendling def : MnemonicAlias<"fucompi", "fucomip">; 1753ff9feeb5SBill Wendling def : MnemonicAlias<"ud2a", "ud2">; 1754ff9feeb5SBill Wendling 1755ff9feeb5SBill Wendling... and many others. With a MnemonicAlias definition, the mnemonic is remapped 1756ff9feeb5SBill Wendlingsimply and directly. Though MnemonicAlias's can't look at any aspect of the 1757ff9feeb5SBill Wendlinginstruction (such as the operands) they can depend on global modes (the same 1758ff9feeb5SBill Wendlingones supported by the matcher), through a Requires clause: 1759ff9feeb5SBill Wendling 1760ff9feeb5SBill Wendling:: 1761ff9feeb5SBill Wendling 1762ff9feeb5SBill Wendling def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>; 1763ff9feeb5SBill Wendling def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>; 1764ff9feeb5SBill Wendling 17659ab8899fSSean SilvaIn this example, the mnemonic gets mapped into a different one depending on 1766ff9feeb5SBill Wendlingthe current instruction set. 1767ff9feeb5SBill Wendling 1768ff9feeb5SBill WendlingInstruction Aliases 1769ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^ 1770ff9feeb5SBill Wendling 1771ff9feeb5SBill WendlingThe most general phase of alias processing occurs while matching is happening: 1772ff9feeb5SBill Wendlingit provides new forms for the matcher to match along with a specific instruction 1773ff9feeb5SBill Wendlingto generate. An instruction alias has two parts: the string to match and the 1774ff9feeb5SBill Wendlinginstruction to generate. For example: 1775ff9feeb5SBill Wendling 1776ff9feeb5SBill Wendling:: 1777ff9feeb5SBill Wendling 1778ff9feeb5SBill Wendling def : InstAlias<"movsx $src, $dst", (MOVSX16rr8W GR16:$dst, GR8 :$src)>; 1779ff9feeb5SBill Wendling def : InstAlias<"movsx $src, $dst", (MOVSX16rm8W GR16:$dst, i8mem:$src)>; 1780ff9feeb5SBill Wendling def : InstAlias<"movsx $src, $dst", (MOVSX32rr8 GR32:$dst, GR8 :$src)>; 1781ff9feeb5SBill Wendling def : InstAlias<"movsx $src, $dst", (MOVSX32rr16 GR32:$dst, GR16 :$src)>; 1782ff9feeb5SBill Wendling def : InstAlias<"movsx $src, $dst", (MOVSX64rr8 GR64:$dst, GR8 :$src)>; 1783ff9feeb5SBill Wendling def : InstAlias<"movsx $src, $dst", (MOVSX64rr16 GR64:$dst, GR16 :$src)>; 1784ff9feeb5SBill Wendling def : InstAlias<"movsx $src, $dst", (MOVSX64rr32 GR64:$dst, GR32 :$src)>; 1785ff9feeb5SBill Wendling 1786ff9feeb5SBill WendlingThis shows a powerful example of the instruction aliases, matching the same 1787ff9feeb5SBill Wendlingmnemonic in multiple different ways depending on what operands are present in 1788ff9feeb5SBill Wendlingthe assembly. The result of instruction aliases can include operands in a 1789ff9feeb5SBill Wendlingdifferent order than the destination instruction, and can use an input multiple 1790ff9feeb5SBill Wendlingtimes, for example: 1791ff9feeb5SBill Wendling 1792ff9feeb5SBill Wendling:: 1793ff9feeb5SBill Wendling 1794ff9feeb5SBill Wendling def : InstAlias<"clrb $reg", (XOR8rr GR8 :$reg, GR8 :$reg)>; 1795ff9feeb5SBill Wendling def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>; 1796ff9feeb5SBill Wendling def : InstAlias<"clrl $reg", (XOR32rr GR32:$reg, GR32:$reg)>; 1797ff9feeb5SBill Wendling def : InstAlias<"clrq $reg", (XOR64rr GR64:$reg, GR64:$reg)>; 1798ff9feeb5SBill Wendling 1799ff9feeb5SBill WendlingThis example also shows that tied operands are only listed once. In the X86 1800ff9feeb5SBill Wendlingbackend, XOR8rr has two input GR8's and one output GR8 (where an input is tied 1801ff9feeb5SBill Wendlingto the output). InstAliases take a flattened operand list without duplicates 1802ff9feeb5SBill Wendlingfor tied operands. The result of an instruction alias can also use immediates 1803ff9feeb5SBill Wendlingand fixed physical registers which are added as simple immediate operands in the 1804ff9feeb5SBill Wendlingresult, for example: 1805ff9feeb5SBill Wendling 1806ff9feeb5SBill Wendling:: 1807ff9feeb5SBill Wendling 1808ff9feeb5SBill Wendling // Fixed Immediate operand. 1809ff9feeb5SBill Wendling def : InstAlias<"aad", (AAD8i8 10)>; 1810ff9feeb5SBill Wendling 1811ff9feeb5SBill Wendling // Fixed register operand. 1812ff9feeb5SBill Wendling def : InstAlias<"fcomi", (COM_FIr ST1)>; 1813ff9feeb5SBill Wendling 1814ff9feeb5SBill Wendling // Simple alias. 1815ff9feeb5SBill Wendling def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>; 1816ff9feeb5SBill Wendling 1817ff9feeb5SBill WendlingInstruction aliases can also have a Requires clause to make them subtarget 1818ff9feeb5SBill Wendlingspecific. 1819ff9feeb5SBill Wendling 1820ff9feeb5SBill WendlingIf the back-end supports it, the instruction printer can automatically emit the 1821ff9feeb5SBill Wendlingalias rather than what's being aliased. It typically leads to better, more 1822ff9feeb5SBill Wendlingreadable code. If it's better to print out what's being aliased, then pass a '0' 1823ff9feeb5SBill Wendlingas the third parameter to the InstAlias definition. 1824ff9feeb5SBill Wendling 1825ff9feeb5SBill WendlingInstruction Matching 1826ff9feeb5SBill Wendling-------------------- 1827ff9feeb5SBill Wendling 1828ff9feeb5SBill Wendling.. note:: 1829ff9feeb5SBill Wendling 1830ff9feeb5SBill Wendling To Be Written 1831ff9feeb5SBill Wendling 1832ff9feeb5SBill Wendling.. _Implementations of the abstract target description interfaces: 1833ff9feeb5SBill Wendling.. _implement the target description: 1834ff9feeb5SBill Wendling 1835ff9feeb5SBill WendlingTarget-specific Implementation Notes 1836ff9feeb5SBill Wendling==================================== 1837ff9feeb5SBill Wendling 1838ff9feeb5SBill WendlingThis section of the document explains features or design decisions that are 1839*86c42429SAlex Bradburyspecific to the code generator for a particular target. 1840ff9feeb5SBill Wendling 18415ace2cd5SJay Foad.. _tail call section: 1842ff9feeb5SBill Wendling 1843ff9feeb5SBill WendlingTail call optimization 1844ff9feeb5SBill Wendling---------------------- 1845ff9feeb5SBill Wendling 1846ff9feeb5SBill WendlingTail call optimization, callee reusing the stack of the caller, is currently 184782a0e808STim Northoversupported on x86/x86-64, PowerPC, AArch64, and WebAssembly. It is performed on 184882a0e808STim Northoverx86/x86-64, PowerPC, and AArch64 if: 1849ff9feeb5SBill Wendling 1850d71b4e45SDuncan Sands* Caller and callee have the calling convention ``fastcc``, ``cc 10`` (GHC 185182a0e808STim Northover calling convention), ``cc 11`` (HiPE calling convention), ``tailcc``, or 185282a0e808STim Northover ``swifttailcc``. 1853ff9feeb5SBill Wendling 1854ff9feeb5SBill Wendling* The call is a tail call - in tail position (ret immediately follows call and 1855ff9feeb5SBill Wendling ret uses value of call or is void). 1856ff9feeb5SBill Wendling 1857f9b67b81SReid Kleckner* Option ``-tailcallopt`` is enabled or the calling convention is ``tailcc``. 1858ff9feeb5SBill Wendling 1859cf21875dSAlp Toker* Platform-specific constraints are met. 1860ff9feeb5SBill Wendling 1861ff9feeb5SBill Wendlingx86/x86-64 constraints: 1862ff9feeb5SBill Wendling 1863ff9feeb5SBill Wendling* No variable argument lists are used. 1864ff9feeb5SBill Wendling 1865ff9feeb5SBill Wendling* On x86-64 when generating GOT/PIC code only module-local calls (visibility = 1866ff9feeb5SBill Wendling hidden or protected) are supported. 1867ff9feeb5SBill Wendling 1868ff9feeb5SBill WendlingPowerPC constraints: 1869ff9feeb5SBill Wendling 1870ff9feeb5SBill Wendling* No variable argument lists are used. 1871ff9feeb5SBill Wendling 1872ff9feeb5SBill Wendling* No byval parameters are used. 1873ff9feeb5SBill Wendling 1874ff9feeb5SBill Wendling* On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) 1875ff9feeb5SBill Wendling are supported. 1876ff9feeb5SBill Wendling 1877e0a9dce5SThomas LivelyWebAssembly constraints: 1878e0a9dce5SThomas Lively 1879e0a9dce5SThomas Lively* No variable argument lists are used 1880e0a9dce5SThomas Lively 1881e0a9dce5SThomas Lively* The 'tail-call' target attribute is enabled. 1882e0a9dce5SThomas Lively 1883e0a9dce5SThomas Lively* The caller and callee's return types must match. The caller cannot 1884e0a9dce5SThomas Lively be void unless the callee is, too. 1885a1d97a96SThomas Lively 188682a0e808STim NorthoverAArch64 constraints: 188782a0e808STim Northover 188882a0e808STim Northover* No variable argument lists are used. 188982a0e808STim Northover 1890ff9feeb5SBill WendlingExample: 1891ff9feeb5SBill Wendling 1892ff9feeb5SBill WendlingCall as ``llc -tailcallopt test.ll``. 1893ff9feeb5SBill Wendling 1894ff9feeb5SBill Wendling.. code-block:: llvm 1895ff9feeb5SBill Wendling 1896ff9feeb5SBill Wendling declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4) 1897ff9feeb5SBill Wendling 1898ff9feeb5SBill Wendling define fastcc i32 @tailcaller(i32 %in1, i32 %in2) { 1899ff9feeb5SBill Wendling %l1 = add i32 %in1, %in2 19009399681aSKai Nacke %tmp = tail call fastcc i32 @tailcallee(i32 inreg %in1, i32 inreg %in2, i32 %in1, i32 %l1) 1901ff9feeb5SBill Wendling ret i32 %tmp 1902ff9feeb5SBill Wendling } 1903ff9feeb5SBill Wendling 1904ff9feeb5SBill WendlingImplications of ``-tailcallopt``: 1905ff9feeb5SBill Wendling 1906ff9feeb5SBill WendlingTo support tail call optimization in situations where the callee has more 1907ff9feeb5SBill Wendlingarguments than the caller a 'callee pops arguments' convention is used. This 1908ff9feeb5SBill Wendlingcurrently causes each ``fastcc`` call that is not tail call optimized (because 1909ff9feeb5SBill Wendlingone or more of above constraints are not met) to be followed by a readjustment 1910ff9feeb5SBill Wendlingof the stack. So performance might be worse in such cases. 1911ff9feeb5SBill Wendling 1912ff9feeb5SBill WendlingSibling call optimization 1913ff9feeb5SBill Wendling------------------------- 1914ff9feeb5SBill Wendling 1915ff9feeb5SBill WendlingSibling call optimization is a restricted form of tail call optimization. 1916ff9feeb5SBill WendlingUnlike tail call optimization described in the previous section, it can be 1917ff9feeb5SBill Wendlingperformed automatically on any tail calls when ``-tailcallopt`` option is not 1918ff9feeb5SBill Wendlingspecified. 1919ff9feeb5SBill Wendling 1920ff9feeb5SBill WendlingSibling call optimization is currently performed on x86/x86-64 when the 1921ff9feeb5SBill Wendlingfollowing constraints are met: 1922ff9feeb5SBill Wendling 1923ff9feeb5SBill Wendling* Caller and callee have the same calling convention. It can be either ``c`` or 1924ff9feeb5SBill Wendling ``fastcc``. 1925ff9feeb5SBill Wendling 1926ff9feeb5SBill Wendling* The call is a tail call - in tail position (ret immediately follows call and 1927ff9feeb5SBill Wendling ret uses value of call or is void). 1928ff9feeb5SBill Wendling 1929ff9feeb5SBill Wendling* Caller and callee have matching return type or the callee result is not used. 1930ff9feeb5SBill Wendling 1931ff9feeb5SBill Wendling* If any of the callee arguments are being passed in stack, they must be 1932ff9feeb5SBill Wendling available in caller's own incoming argument stack and the frame offsets must 1933ff9feeb5SBill Wendling be the same. 1934ff9feeb5SBill Wendling 1935ff9feeb5SBill WendlingExample: 1936ff9feeb5SBill Wendling 1937ff9feeb5SBill Wendling.. code-block:: llvm 1938ff9feeb5SBill Wendling 1939ff9feeb5SBill Wendling declare i32 @bar(i32, i32) 1940ff9feeb5SBill Wendling 1941ff9feeb5SBill Wendling define i32 @foo(i32 %a, i32 %b, i32 %c) { 1942ff9feeb5SBill Wendling entry: 1943ff9feeb5SBill Wendling %0 = tail call i32 @bar(i32 %a, i32 %b) 1944ff9feeb5SBill Wendling ret i32 %0 1945ff9feeb5SBill Wendling } 1946ff9feeb5SBill Wendling 1947ff9feeb5SBill WendlingThe X86 backend 1948ff9feeb5SBill Wendling--------------- 1949ff9feeb5SBill Wendling 1950ff9feeb5SBill WendlingThe X86 code generator lives in the ``lib/Target/X86`` directory. This code 1951ff9feeb5SBill Wendlinggenerator is capable of targeting a variety of x86-32 and x86-64 processors, and 1952ff9feeb5SBill Wendlingincludes support for ISA extensions such as MMX and SSE. 1953ff9feeb5SBill Wendling 1954ff9feeb5SBill WendlingX86 Target Triples supported 1955ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1956ff9feeb5SBill Wendling 1957ff9feeb5SBill WendlingThe following are the known target triples that are supported by the X86 1958ff9feeb5SBill Wendlingbackend. This is not an exhaustive list, and it would be useful to add those 1959ff9feeb5SBill Wendlingthat people test. 1960ff9feeb5SBill Wendling 1961ff9feeb5SBill Wendling* **i686-pc-linux-gnu** --- Linux 1962ff9feeb5SBill Wendling 1963ff9feeb5SBill Wendling* **i386-unknown-freebsd5.3** --- FreeBSD 5.3 1964ff9feeb5SBill Wendling 1965ff9feeb5SBill Wendling* **i686-pc-cygwin** --- Cygwin on Win32 1966ff9feeb5SBill Wendling 1967ff9feeb5SBill Wendling* **i686-pc-mingw32** --- MingW on Win32 1968ff9feeb5SBill Wendling 1969ff9feeb5SBill Wendling* **i386-pc-mingw32msvc** --- MingW crosscompiler on Linux 1970ff9feeb5SBill Wendling 1971ff9feeb5SBill Wendling* **i686-apple-darwin*** --- Apple Darwin on X86 1972ff9feeb5SBill Wendling 1973ff9feeb5SBill Wendling* **x86_64-unknown-linux-gnu** --- Linux 1974ff9feeb5SBill Wendling 1975ff9feeb5SBill WendlingX86 Calling Conventions supported 1976ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1977ff9feeb5SBill Wendling 1978ff9feeb5SBill WendlingThe following target-specific calling conventions are known to backend: 1979ff9feeb5SBill Wendling 1980ff9feeb5SBill Wendling* **x86_StdCall** --- stdcall calling convention seen on Microsoft Windows 1981ff9feeb5SBill Wendling platform (CC ID = 64). 1982ff9feeb5SBill Wendling 1983ff9feeb5SBill Wendling* **x86_FastCall** --- fastcall calling convention seen on Microsoft Windows 1984ff9feeb5SBill Wendling platform (CC ID = 65). 1985ff9feeb5SBill Wendling 1986ff9feeb5SBill Wendling* **x86_ThisCall** --- Similar to X86_StdCall. Passes first argument in ECX, 1987ff9feeb5SBill Wendling others via stack. Callee is responsible for stack cleaning. This convention is 1988ff9feeb5SBill Wendling used by MSVC by default for methods in its ABI (CC ID = 70). 1989ff9feeb5SBill Wendling 1990ff9feeb5SBill Wendling.. _X86 addressing mode: 1991ff9feeb5SBill Wendling 1992ff9feeb5SBill WendlingRepresenting X86 addressing modes in MachineInstrs 1993ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1994ff9feeb5SBill Wendling 1995ff9feeb5SBill WendlingThe x86 has a very flexible way of accessing memory. It is capable of forming 1996ff9feeb5SBill Wendlingmemory addresses of the following expression directly in integer instructions 1997ff9feeb5SBill Wendling(which use ModR/M addressing): 1998ff9feeb5SBill Wendling 1999ff9feeb5SBill Wendling:: 2000ff9feeb5SBill Wendling 2001ff9feeb5SBill Wendling SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32 2002ff9feeb5SBill Wendling 2003ff9feeb5SBill WendlingIn order to represent this, LLVM tracks no less than 5 operands for each memory 2004ff9feeb5SBill Wendlingoperand of this form. This means that the "load" form of '``mov``' has the 2005ff9feeb5SBill Wendlingfollowing ``MachineOperand``\s in this order: 2006ff9feeb5SBill Wendling 2007ff9feeb5SBill Wendling:: 2008ff9feeb5SBill Wendling 2009ff9feeb5SBill Wendling Index: 0 | 1 2 3 4 5 2010ff9feeb5SBill Wendling Meaning: DestReg, | BaseReg, Scale, IndexReg, Displacement Segment 2011ff9feeb5SBill Wendling OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg, SignExtImm PhysReg 2012ff9feeb5SBill Wendling 2013ff9feeb5SBill WendlingStores, and all other instructions, treat the four memory operands in the same 2014ff9feeb5SBill Wendlingway and in the same order. If the segment register is unspecified (regno = 0), 2015ff9feeb5SBill Wendlingthen no segment override is generated. "Lea" operations do not have a segment 2016ff9feeb5SBill Wendlingregister specified, so they only have 4 operands for their memory reference. 2017ff9feeb5SBill Wendling 2018ff9feeb5SBill WendlingX86 address spaces supported 2019ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2020ff9feeb5SBill Wendling 2021ff9feeb5SBill Wendlingx86 has a feature which provides the ability to perform loads and stores to 2022ff9feeb5SBill Wendlingdifferent address spaces via the x86 segment registers. A segment override 2023ff9feeb5SBill Wendlingprefix byte on an instruction causes the instruction's memory access to go to 2024ff9feeb5SBill Wendlingthe specified segment. LLVM address space 0 is the default address space, which 2025ff9feeb5SBill Wendlingincludes the stack, and any unqualified memory accesses in a program. Address 2026ff9feeb5SBill Wendlingspaces 1-255 are currently reserved for user-defined code. The GS-segment is 2027c9fbf101SDavid L Kreitzerrepresented by address space 256, the FS-segment is represented by address space 2028c9fbf101SDavid L Kreitzer257, and the SS-segment is represented by address space 258. Other x86 segments 2029c9fbf101SDavid L Kreitzerhave yet to be allocated address space numbers. 2030ff9feeb5SBill Wendling 2031ff9feeb5SBill WendlingWhile these address spaces may seem similar to TLS via the ``thread_local`` 2032ff9feeb5SBill Wendlingkeyword, and often use the same underlying hardware, there are some fundamental 2033ff9feeb5SBill Wendlingdifferences. 2034ff9feeb5SBill Wendling 2035ff9feeb5SBill WendlingThe ``thread_local`` keyword applies to global variables and specifies that they 2036ff9feeb5SBill Wendlingare to be allocated in thread-local memory. There are no type qualifiers 2037ff9feeb5SBill Wendlinginvolved, and these variables can be pointed to with normal pointers and 2038ff9feeb5SBill Wendlingaccessed with normal loads and stores. The ``thread_local`` keyword is 2039ff9feeb5SBill Wendlingtarget-independent at the LLVM IR level (though LLVM doesn't yet have 2040ff9feeb5SBill Wendlingimplementations of it for some configurations) 2041ff9feeb5SBill Wendling 2042ff9feeb5SBill WendlingSpecial address spaces, in contrast, apply to static types. Every load and store 2043ff9feeb5SBill Wendlinghas a particular address space in its address operand type, and this is what 2044ff9feeb5SBill Wendlingdetermines which address space is accessed. LLVM ignores these special address 2045ff9feeb5SBill Wendlingspace qualifiers on global variables, and does not provide a way to directly 2046ff9feeb5SBill Wendlingallocate storage in them. At the LLVM IR level, the behavior of these special 2047ff9feeb5SBill Wendlingaddress spaces depends in part on the underlying OS or runtime environment, and 2048ff9feeb5SBill Wendlingthey are specific to x86 (and LLVM doesn't yet handle them correctly in some 2049ff9feeb5SBill Wendlingcases). 2050ff9feeb5SBill Wendling 2051ff9feeb5SBill WendlingSome operating systems and runtime environments use (or may in the future use) 2052ff9feeb5SBill Wendlingthe FS/GS-segment registers for various low-level purposes, so care should be 2053ff9feeb5SBill Wendlingtaken when considering them. 2054ff9feeb5SBill Wendling 2055ff9feeb5SBill WendlingInstruction naming 2056ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^ 2057ff9feeb5SBill Wendling 2058e1567e77SYoungsuk KimAn instruction name consists of the base name, a default operand size, and a 2059ff9feeb5SBill Wendlingcharacter per operand with an optional special size. For example: 2060ff9feeb5SBill Wendling 2061ff9feeb5SBill Wendling:: 2062ff9feeb5SBill Wendling 2063ff9feeb5SBill Wendling ADD8rr -> add, 8-bit register, 8-bit register 2064ff9feeb5SBill Wendling IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate 2065ff9feeb5SBill Wendling IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate 2066ff9feeb5SBill Wendling MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory 2067ff9feeb5SBill Wendling 2068ff9feeb5SBill WendlingThe PowerPC backend 2069ff9feeb5SBill Wendling------------------- 2070ff9feeb5SBill Wendling 2071ff9feeb5SBill WendlingThe PowerPC code generator lives in the lib/Target/PowerPC directory. The code 2072ff9feeb5SBill Wendlinggeneration is retargetable to several variations or *subtargets* of the PowerPC 2073ff9feeb5SBill WendlingISA; including ppc32, ppc64 and altivec. 2074ff9feeb5SBill Wendling 2075ff9feeb5SBill WendlingLLVM PowerPC ABI 2076ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^ 2077ff9feeb5SBill Wendling 2078ff9feeb5SBill WendlingLLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC relative 2079ff9feeb5SBill Wendling(PIC) or static addressing for accessing global values, so no TOC (r2) is 2080ff9feeb5SBill Wendlingused. Second, r31 is used as a frame pointer to allow dynamic growth of a stack 2081ff9feeb5SBill Wendlingframe. LLVM takes advantage of having no TOC to provide space to save the frame 2082ff9feeb5SBill Wendlingpointer in the PowerPC linkage area of the caller frame. Other details of 2083ff9feeb5SBill WendlingPowerPC ABI can be found at `PowerPC ABI 2084ff9feeb5SBill Wendling<http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html>`_\ 2085ff9feeb5SBill Wendling. Note: This link describes the 32 bit ABI. The 64 bit ABI is similar except 2086ff9feeb5SBill Wendlingspace for GPRs are 8 bytes wide (not 4) and r13 is reserved for system use. 2087ff9feeb5SBill Wendling 2088ff9feeb5SBill WendlingFrame Layout 2089ff9feeb5SBill Wendling^^^^^^^^^^^^ 2090ff9feeb5SBill Wendling 2091ff9feeb5SBill WendlingThe size of a PowerPC frame is usually fixed for the duration of a function's 2092ff9feeb5SBill Wendlinginvocation. Since the frame is fixed size, all references into the frame can be 2093ff9feeb5SBill Wendlingaccessed via fixed offsets from the stack pointer. The exception to this is 2094ff9feeb5SBill Wendlingwhen dynamic alloca or variable sized arrays are present, then a base pointer 2095ff9feeb5SBill Wendling(r31) is used as a proxy for the stack pointer and stack pointer is free to grow 2096ff9feeb5SBill Wendlingor shrink. A base pointer is also used if llvm-gcc is not passed the 2097ff9feeb5SBill Wendling-fomit-frame-pointer flag. The stack pointer is always aligned to 16 bytes, so 2098ff9feeb5SBill Wendlingthat space allocated for altivec vectors will be properly aligned. 2099ff9feeb5SBill Wendling 2100ff9feeb5SBill WendlingAn invocation frame is laid out as follows (low memory at top): 2101ff9feeb5SBill Wendling 2102ff9feeb5SBill Wendling:raw-html:`<table border="1" cellspacing="0">` 2103ff9feeb5SBill Wendling:raw-html:`<tr>` 2104ff9feeb5SBill Wendling:raw-html:`<td>Linkage<br><br></td>` 2105ff9feeb5SBill Wendling:raw-html:`</tr>` 2106ff9feeb5SBill Wendling:raw-html:`<tr>` 2107ff9feeb5SBill Wendling:raw-html:`<td>Parameter area<br><br></td>` 2108ff9feeb5SBill Wendling:raw-html:`</tr>` 2109ff9feeb5SBill Wendling:raw-html:`<tr>` 2110ff9feeb5SBill Wendling:raw-html:`<td>Dynamic area<br><br></td>` 2111ff9feeb5SBill Wendling:raw-html:`</tr>` 2112ff9feeb5SBill Wendling:raw-html:`<tr>` 2113ff9feeb5SBill Wendling:raw-html:`<td>Locals area<br><br></td>` 2114ff9feeb5SBill Wendling:raw-html:`</tr>` 2115ff9feeb5SBill Wendling:raw-html:`<tr>` 2116ff9feeb5SBill Wendling:raw-html:`<td>Saved registers area<br><br></td>` 2117ff9feeb5SBill Wendling:raw-html:`</tr>` 2118ff9feeb5SBill Wendling:raw-html:`<tr style="border-style: none hidden none hidden;">` 2119ff9feeb5SBill Wendling:raw-html:`<td><br></td>` 2120ff9feeb5SBill Wendling:raw-html:`</tr>` 2121ff9feeb5SBill Wendling:raw-html:`<tr>` 2122ff9feeb5SBill Wendling:raw-html:`<td>Previous Frame<br><br></td>` 2123ff9feeb5SBill Wendling:raw-html:`</tr>` 2124ff9feeb5SBill Wendling:raw-html:`</table>` 2125ff9feeb5SBill Wendling 2126ff9feeb5SBill WendlingThe *linkage* area is used by a callee to save special registers prior to 2127ff9feeb5SBill Wendlingallocating its own frame. Only three entries are relevant to LLVM. The first 2128ff9feeb5SBill Wendlingentry is the previous stack pointer (sp), aka link. This allows probing tools 2129ff9feeb5SBill Wendlinglike gdb or exception handlers to quickly scan the frames in the stack. A 2130ff9feeb5SBill Wendlingfunction epilog can also use the link to pop the frame from the stack. The 2131ff9feeb5SBill Wendlingthird entry in the linkage area is used to save the return address from the lr 2132ff9feeb5SBill Wendlingregister. Finally, as mentioned above, the last entry is used to save the 2133ff9feeb5SBill Wendlingprevious frame pointer (r31.) The entries in the linkage area are the size of a 2134ff9feeb5SBill WendlingGPR, thus the linkage area is 24 bytes long in 32 bit mode and 48 bytes in 64 2135ff9feeb5SBill Wendlingbit mode. 2136ff9feeb5SBill Wendling 2137ff9feeb5SBill Wendling32 bit linkage area: 2138ff9feeb5SBill Wendling 2139ff9feeb5SBill Wendling:raw-html:`<table border="1" cellspacing="0">` 2140ff9feeb5SBill Wendling:raw-html:`<tr>` 2141ff9feeb5SBill Wendling:raw-html:`<td>0</td>` 2142ff9feeb5SBill Wendling:raw-html:`<td>Saved SP (r1)</td>` 2143ff9feeb5SBill Wendling:raw-html:`</tr>` 2144ff9feeb5SBill Wendling:raw-html:`<tr>` 2145ff9feeb5SBill Wendling:raw-html:`<td>4</td>` 2146ff9feeb5SBill Wendling:raw-html:`<td>Saved CR</td>` 2147ff9feeb5SBill Wendling:raw-html:`</tr>` 2148ff9feeb5SBill Wendling:raw-html:`<tr>` 2149ff9feeb5SBill Wendling:raw-html:`<td>8</td>` 2150ff9feeb5SBill Wendling:raw-html:`<td>Saved LR</td>` 2151ff9feeb5SBill Wendling:raw-html:`</tr>` 2152ff9feeb5SBill Wendling:raw-html:`<tr>` 2153ff9feeb5SBill Wendling:raw-html:`<td>12</td>` 2154ff9feeb5SBill Wendling:raw-html:`<td>Reserved</td>` 2155ff9feeb5SBill Wendling:raw-html:`</tr>` 2156ff9feeb5SBill Wendling:raw-html:`<tr>` 2157ff9feeb5SBill Wendling:raw-html:`<td>16</td>` 2158ff9feeb5SBill Wendling:raw-html:`<td>Reserved</td>` 2159ff9feeb5SBill Wendling:raw-html:`</tr>` 2160ff9feeb5SBill Wendling:raw-html:`<tr>` 2161ff9feeb5SBill Wendling:raw-html:`<td>20</td>` 2162ff9feeb5SBill Wendling:raw-html:`<td>Saved FP (r31)</td>` 2163ff9feeb5SBill Wendling:raw-html:`</tr>` 2164ff9feeb5SBill Wendling:raw-html:`</table>` 2165ff9feeb5SBill Wendling 2166ff9feeb5SBill Wendling64 bit linkage area: 2167ff9feeb5SBill Wendling 2168ff9feeb5SBill Wendling:raw-html:`<table border="1" cellspacing="0">` 2169ff9feeb5SBill Wendling:raw-html:`<tr>` 2170ff9feeb5SBill Wendling:raw-html:`<td>0</td>` 2171ff9feeb5SBill Wendling:raw-html:`<td>Saved SP (r1)</td>` 2172ff9feeb5SBill Wendling:raw-html:`</tr>` 2173ff9feeb5SBill Wendling:raw-html:`<tr>` 2174ff9feeb5SBill Wendling:raw-html:`<td>8</td>` 2175ff9feeb5SBill Wendling:raw-html:`<td>Saved CR</td>` 2176ff9feeb5SBill Wendling:raw-html:`</tr>` 2177ff9feeb5SBill Wendling:raw-html:`<tr>` 2178ff9feeb5SBill Wendling:raw-html:`<td>16</td>` 2179ff9feeb5SBill Wendling:raw-html:`<td>Saved LR</td>` 2180ff9feeb5SBill Wendling:raw-html:`</tr>` 2181ff9feeb5SBill Wendling:raw-html:`<tr>` 2182ff9feeb5SBill Wendling:raw-html:`<td>24</td>` 2183ff9feeb5SBill Wendling:raw-html:`<td>Reserved</td>` 2184ff9feeb5SBill Wendling:raw-html:`</tr>` 2185ff9feeb5SBill Wendling:raw-html:`<tr>` 2186ff9feeb5SBill Wendling:raw-html:`<td>32</td>` 2187ff9feeb5SBill Wendling:raw-html:`<td>Reserved</td>` 2188ff9feeb5SBill Wendling:raw-html:`</tr>` 2189ff9feeb5SBill Wendling:raw-html:`<tr>` 2190ff9feeb5SBill Wendling:raw-html:`<td>40</td>` 2191ff9feeb5SBill Wendling:raw-html:`<td>Saved FP (r31)</td>` 2192ff9feeb5SBill Wendling:raw-html:`</tr>` 2193ff9feeb5SBill Wendling:raw-html:`</table>` 2194ff9feeb5SBill Wendling 2195ff9feeb5SBill WendlingThe *parameter area* is used to store arguments being passed to a callee 2196ff9feeb5SBill Wendlingfunction. Following the PowerPC ABI, the first few arguments are actually 2197ff9feeb5SBill Wendlingpassed in registers, with the space in the parameter area unused. However, if 2198ff9feeb5SBill Wendlingthere are not enough registers or the callee is a thunk or vararg function, 2199ff9feeb5SBill Wendlingthese register arguments can be spilled into the parameter area. Thus, the 2200ff9feeb5SBill Wendlingparameter area must be large enough to store all the parameters for the largest 2201ff9feeb5SBill Wendlingcall sequence made by the caller. The size must also be minimally large enough 2202ff9feeb5SBill Wendlingto spill registers r3-r10. This allows callees blind to the call signature, 2203ff9feeb5SBill Wendlingsuch as thunks and vararg functions, enough space to cache the argument 2204ff9feeb5SBill Wendlingregisters. Therefore, the parameter area is minimally 32 bytes (64 bytes in 64 2205ff9feeb5SBill Wendlingbit mode.) Also note that since the parameter area is a fixed offset from the 2206f65d4aa9SKazuaki Ishizakitop of the frame, that a callee can access its split arguments using fixed 2207ff9feeb5SBill Wendlingoffsets from the stack pointer (or base pointer.) 2208ff9feeb5SBill Wendling 2209ff9feeb5SBill WendlingCombining the information about the linkage, parameter areas and alignment. A 2210ff9feeb5SBill Wendlingstack frame is minimally 64 bytes in 32 bit mode and 128 bytes in 64 bit mode. 2211ff9feeb5SBill Wendling 2212ff9feeb5SBill WendlingThe *dynamic area* starts out as size zero. If a function uses dynamic alloca 2213ff9feeb5SBill Wendlingthen space is added to the stack, the linkage and parameter areas are shifted to 2214ff9feeb5SBill Wendlingtop of stack, and the new space is available immediately below the linkage and 2215ff9feeb5SBill Wendlingparameter areas. The cost of shifting the linkage and parameter areas is minor 2216ff9feeb5SBill Wendlingsince only the link value needs to be copied. The link value can be easily 2217ff9feeb5SBill Wendlingfetched by adding the original frame size to the base pointer. Note that 2218ff9feeb5SBill Wendlingallocations in the dynamic space need to observe 16 byte alignment. 2219ff9feeb5SBill Wendling 2220ff9feeb5SBill WendlingThe *locals area* is where the llvm compiler reserves space for local variables. 2221ff9feeb5SBill Wendling 2222ff9feeb5SBill WendlingThe *saved registers area* is where the llvm compiler spills callee saved 2223ff9feeb5SBill Wendlingregisters on entry to the callee. 2224ff9feeb5SBill Wendling 2225ff9feeb5SBill WendlingProlog/Epilog 2226ff9feeb5SBill Wendling^^^^^^^^^^^^^ 2227ff9feeb5SBill Wendling 2228ff9feeb5SBill WendlingThe llvm prolog and epilog are the same as described in the PowerPC ABI, with 2229ff9feeb5SBill Wendlingthe following exceptions. Callee saved registers are spilled after the frame is 2230ff9feeb5SBill Wendlingcreated. This allows the llvm epilog/prolog support to be common with other 2231ff9feeb5SBill Wendlingtargets. The base pointer callee saved register r31 is saved in the TOC slot of 2232ff9feeb5SBill Wendlinglinkage area. This simplifies allocation of space for the base pointer and 2233843b7515SSylvestre Ledrumakes it convenient to locate programmatically and during debugging. 2234ff9feeb5SBill Wendling 2235ff9feeb5SBill WendlingDynamic Allocation 2236ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^ 2237ff9feeb5SBill Wendling 2238ff9feeb5SBill Wendling.. note:: 2239ff9feeb5SBill Wendling 2240ff9feeb5SBill Wendling TODO - More to come. 2241ff9feeb5SBill Wendling 2242f73d7a53SJustin HolewinskiThe NVPTX backend 2243f73d7a53SJustin Holewinski----------------- 2244ff9feeb5SBill Wendling 2245f73d7a53SJustin HolewinskiThe NVPTX code generator under lib/Target/NVPTX is an open-source version of 2246f73d7a53SJustin Holewinskithe NVIDIA NVPTX code generator for LLVM. It is contributed by NVIDIA and is 2247f73d7a53SJustin Holewinskia port of the code generator used in the CUDA compiler (nvcc). It targets the 2248f73d7a53SJustin HolewinskiPTX 3.0/3.1 ISA and can target any compute capability greater than or equal to 2249f73d7a53SJustin Holewinski2.0 (Fermi). 2250ff9feeb5SBill Wendling 2251f73d7a53SJustin HolewinskiThis target is of production quality and should be completely compatible with 2252f73d7a53SJustin Holewinskithe official NVIDIA toolchain. 2253ff9feeb5SBill Wendling 2254ff9feeb5SBill WendlingCode Generator Options: 2255ff9feeb5SBill Wendling 2256ff9feeb5SBill Wendling:raw-html:`<table border="1" cellspacing="0">` 2257ff9feeb5SBill Wendling:raw-html:`<tr>` 2258ff9feeb5SBill Wendling:raw-html:`<th>Option</th>` 2259ff9feeb5SBill Wendling:raw-html:`<th>Description</th>` 2260ff9feeb5SBill Wendling:raw-html:`</tr>` 2261ff9feeb5SBill Wendling:raw-html:`<tr>` 2262f73d7a53SJustin Holewinski:raw-html:`<td>sm_20</td>` 2263f73d7a53SJustin Holewinski:raw-html:`<td align="left">Set shader model/compute capability to 2.0</td>` 2264ff9feeb5SBill Wendling:raw-html:`</tr>` 2265ff9feeb5SBill Wendling:raw-html:`<tr>` 2266f73d7a53SJustin Holewinski:raw-html:`<td>sm_21</td>` 2267f73d7a53SJustin Holewinski:raw-html:`<td align="left">Set shader model/compute capability to 2.1</td>` 2268ff9feeb5SBill Wendling:raw-html:`</tr>` 2269ff9feeb5SBill Wendling:raw-html:`<tr>` 2270f73d7a53SJustin Holewinski:raw-html:`<td>sm_30</td>` 2271f73d7a53SJustin Holewinski:raw-html:`<td align="left">Set shader model/compute capability to 3.0</td>` 2272f73d7a53SJustin Holewinski:raw-html:`</tr>` 2273f73d7a53SJustin Holewinski:raw-html:`<tr>` 2274f73d7a53SJustin Holewinski:raw-html:`<td>sm_35</td>` 2275f73d7a53SJustin Holewinski:raw-html:`<td align="left">Set shader model/compute capability to 3.5</td>` 2276f73d7a53SJustin Holewinski:raw-html:`</tr>` 2277f73d7a53SJustin Holewinski:raw-html:`<tr>` 2278f73d7a53SJustin Holewinski:raw-html:`<td>ptx30</td>` 2279f73d7a53SJustin Holewinski:raw-html:`<td align="left">Target PTX 3.0</td>` 2280f73d7a53SJustin Holewinski:raw-html:`</tr>` 2281f73d7a53SJustin Holewinski:raw-html:`<tr>` 2282f73d7a53SJustin Holewinski:raw-html:`<td>ptx31</td>` 2283f73d7a53SJustin Holewinski:raw-html:`<td align="left">Target PTX 3.1</td>` 2284ff9feeb5SBill Wendling:raw-html:`</tr>` 2285ff9feeb5SBill Wendling:raw-html:`</table>` 2286ff9feeb5SBill Wendling 2287cb6b408dSAlexei StarovoitovThe extended Berkeley Packet Filter (eBPF) backend 2288cb6b408dSAlexei Starovoitov-------------------------------------------------- 2289cb6b408dSAlexei Starovoitov 2290cb6b408dSAlexei StarovoitovExtended BPF (or eBPF) is similar to the original ("classic") BPF (cBPF) used 2291cb6b408dSAlexei Starovoitovto filter network packets. The 2292cb6b408dSAlexei Starovoitov`bpf() system call <http://man7.org/linux/man-pages/man2/bpf.2.html>`_ 2293cb6b408dSAlexei Starovoitovperforms a range of operations related to eBPF. For both cBPF and eBPF 2294cb6b408dSAlexei Starovoitovprograms, the Linux kernel statically analyzes the programs before loading 2295cb6b408dSAlexei Starovoitovthem, in order to ensure that they cannot harm the running system. eBPF is 2296cb6b408dSAlexei Starovoitova 64-bit RISC instruction set designed for one to one mapping to 64-bit CPUs. 2297cb6b408dSAlexei StarovoitovOpcodes are 8-bit encoded, and 87 instructions are defined. There are 10 2298cb6b408dSAlexei Starovoitovregisters, grouped by function as outlined below. 2299cb6b408dSAlexei Starovoitov 2300cb6b408dSAlexei Starovoitov:: 2301cb6b408dSAlexei Starovoitov 2302cb6b408dSAlexei Starovoitov R0 return value from in-kernel functions; exit value for eBPF program 2303cb6b408dSAlexei Starovoitov R1 - R5 function call arguments to in-kernel functions 2304cb6b408dSAlexei Starovoitov R6 - R9 callee-saved registers preserved by in-kernel functions 2305cb6b408dSAlexei Starovoitov R10 stack frame pointer (read only) 2306cb6b408dSAlexei Starovoitov 2307cb6b408dSAlexei StarovoitovInstruction encoding (arithmetic and jump) 2308cb6b408dSAlexei Starovoitov^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2309cb6b408dSAlexei StarovoitoveBPF is reusing most of the opcode encoding from classic to simplify conversion 2310cb6b408dSAlexei Starovoitovof classic BPF to eBPF. For arithmetic and jump instructions the 8-bit 'code' 2311cb6b408dSAlexei Starovoitovfield is divided into three parts: 2312cb6b408dSAlexei Starovoitov 2313cb6b408dSAlexei Starovoitov:: 2314cb6b408dSAlexei Starovoitov 2315cb6b408dSAlexei Starovoitov +----------------+--------+--------------------+ 2316cb6b408dSAlexei Starovoitov | 4 bits | 1 bit | 3 bits | 2317cb6b408dSAlexei Starovoitov | operation code | source | instruction class | 2318cb6b408dSAlexei Starovoitov +----------------+--------+--------------------+ 2319cb6b408dSAlexei Starovoitov (MSB) (LSB) 2320cb6b408dSAlexei Starovoitov 2321cb6b408dSAlexei StarovoitovThree LSB bits store instruction class which is one of: 2322cb6b408dSAlexei Starovoitov 2323cb6b408dSAlexei Starovoitov:: 2324cb6b408dSAlexei Starovoitov 2325cb6b408dSAlexei Starovoitov BPF_LD 0x0 2326cb6b408dSAlexei Starovoitov BPF_LDX 0x1 2327cb6b408dSAlexei Starovoitov BPF_ST 0x2 2328cb6b408dSAlexei Starovoitov BPF_STX 0x3 2329cb6b408dSAlexei Starovoitov BPF_ALU 0x4 2330cb6b408dSAlexei Starovoitov BPF_JMP 0x5 2331cb6b408dSAlexei Starovoitov (unused) 0x6 2332cb6b408dSAlexei Starovoitov BPF_ALU64 0x7 2333cb6b408dSAlexei Starovoitov 2334cb6b408dSAlexei StarovoitovWhen BPF_CLASS(code) == BPF_ALU or BPF_ALU64 or BPF_JMP, 2335cb6b408dSAlexei Starovoitov4th bit encodes source operand 2336cb6b408dSAlexei Starovoitov 2337cb6b408dSAlexei Starovoitov:: 2338cb6b408dSAlexei Starovoitov 233933434d5fSYonghong Song BPF_X 0x1 use src_reg register as source operand 234033434d5fSYonghong Song BPF_K 0x0 use 32 bit immediate as source operand 2341cb6b408dSAlexei Starovoitov 2342cb6b408dSAlexei Starovoitovand four MSB bits store operation code 2343cb6b408dSAlexei Starovoitov 2344cb6b408dSAlexei Starovoitov:: 2345cb6b408dSAlexei Starovoitov 2346cb6b408dSAlexei Starovoitov BPF_ADD 0x0 add 2347cb6b408dSAlexei Starovoitov BPF_SUB 0x1 subtract 2348cb6b408dSAlexei Starovoitov BPF_MUL 0x2 multiply 2349cb6b408dSAlexei Starovoitov BPF_DIV 0x3 divide 2350cb6b408dSAlexei Starovoitov BPF_OR 0x4 bitwise logical OR 2351cb6b408dSAlexei Starovoitov BPF_AND 0x5 bitwise logical AND 2352cb6b408dSAlexei Starovoitov BPF_LSH 0x6 left shift 2353cb6b408dSAlexei Starovoitov BPF_RSH 0x7 right shift (zero extended) 2354cb6b408dSAlexei Starovoitov BPF_NEG 0x8 arithmetic negation 2355cb6b408dSAlexei Starovoitov BPF_MOD 0x9 modulo 2356cb6b408dSAlexei Starovoitov BPF_XOR 0xa bitwise logical XOR 2357cb6b408dSAlexei Starovoitov BPF_MOV 0xb move register to register 2358cb6b408dSAlexei Starovoitov BPF_ARSH 0xc right shift (sign extended) 2359cb6b408dSAlexei Starovoitov BPF_END 0xd endianness conversion 2360cb6b408dSAlexei Starovoitov 2361cb6b408dSAlexei StarovoitovIf BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of 2362cb6b408dSAlexei Starovoitov 2363cb6b408dSAlexei Starovoitov:: 2364cb6b408dSAlexei Starovoitov 2365cb6b408dSAlexei Starovoitov BPF_JA 0x0 unconditional jump 2366cb6b408dSAlexei Starovoitov BPF_JEQ 0x1 jump == 2367cb6b408dSAlexei Starovoitov BPF_JGT 0x2 jump > 2368cb6b408dSAlexei Starovoitov BPF_JGE 0x3 jump >= 2369cb6b408dSAlexei Starovoitov BPF_JSET 0x4 jump if (DST & SRC) 2370cb6b408dSAlexei Starovoitov BPF_JNE 0x5 jump != 2371cb6b408dSAlexei Starovoitov BPF_JSGT 0x6 jump signed > 2372cb6b408dSAlexei Starovoitov BPF_JSGE 0x7 jump signed >= 2373cb6b408dSAlexei Starovoitov BPF_CALL 0x8 function call 2374cb6b408dSAlexei Starovoitov BPF_EXIT 0x9 function return 2375cb6b408dSAlexei Starovoitov 2376cb6b408dSAlexei StarovoitovInstruction encoding (load, store) 2377cb6b408dSAlexei Starovoitov^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2378cb6b408dSAlexei StarovoitovFor load and store instructions the 8-bit 'code' field is divided as: 2379cb6b408dSAlexei Starovoitov 2380cb6b408dSAlexei Starovoitov:: 2381cb6b408dSAlexei Starovoitov 2382cb6b408dSAlexei Starovoitov +--------+--------+-------------------+ 2383cb6b408dSAlexei Starovoitov | 3 bits | 2 bits | 3 bits | 2384cb6b408dSAlexei Starovoitov | mode | size | instruction class | 2385cb6b408dSAlexei Starovoitov +--------+--------+-------------------+ 2386cb6b408dSAlexei Starovoitov (MSB) (LSB) 2387cb6b408dSAlexei Starovoitov 2388cb6b408dSAlexei StarovoitovSize modifier is one of 2389cb6b408dSAlexei Starovoitov 2390cb6b408dSAlexei Starovoitov:: 2391cb6b408dSAlexei Starovoitov 2392cb6b408dSAlexei Starovoitov BPF_W 0x0 word 2393cb6b408dSAlexei Starovoitov BPF_H 0x1 half word 2394cb6b408dSAlexei Starovoitov BPF_B 0x2 byte 2395cb6b408dSAlexei Starovoitov BPF_DW 0x3 double word 2396cb6b408dSAlexei Starovoitov 2397cb6b408dSAlexei StarovoitovMode modifier is one of 2398cb6b408dSAlexei Starovoitov 2399cb6b408dSAlexei Starovoitov:: 2400cb6b408dSAlexei Starovoitov 2401cb6b408dSAlexei Starovoitov BPF_IMM 0x0 immediate 2402cb6b408dSAlexei Starovoitov BPF_ABS 0x1 used to access packet data 2403cb6b408dSAlexei Starovoitov BPF_IND 0x2 used to access packet data 2404cb6b408dSAlexei Starovoitov BPF_MEM 0x3 memory 2405cb6b408dSAlexei Starovoitov (reserved) 0x4 2406cb6b408dSAlexei Starovoitov (reserved) 0x5 2407cb6b408dSAlexei Starovoitov BPF_XADD 0x6 exclusive add 2408cb6b408dSAlexei Starovoitov 2409cb6b408dSAlexei Starovoitov 2410cb6b408dSAlexei StarovoitovPacket data access (BPF_ABS, BPF_IND) 2411cb6b408dSAlexei Starovoitov^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2412cb6b408dSAlexei Starovoitov 2413cb6b408dSAlexei StarovoitovTwo non-generic instructions: (BPF_ABS | <size> | BPF_LD) and 2414cb6b408dSAlexei Starovoitov(BPF_IND | <size> | BPF_LD) which are used to access packet data. 2415cb6b408dSAlexei StarovoitovRegister R6 is an implicit input that must contain pointer to sk_buff. 2416cb6b408dSAlexei StarovoitovRegister R0 is an implicit output which contains the data fetched 2417cb6b408dSAlexei Starovoitovfrom the packet. Registers R1-R5 are scratch registers and must not 2418cb6b408dSAlexei Starovoitovbe used to store the data across BPF_ABS | BPF_LD or BPF_IND | BPF_LD 2419cb6b408dSAlexei Starovoitovinstructions. These instructions have implicit program exit condition 2420cb6b408dSAlexei Starovoitovas well. When eBPF program is trying to access the data beyond 2421cb6b408dSAlexei Starovoitovthe packet boundary, the interpreter will abort the execution of the program. 2422cb6b408dSAlexei Starovoitov 2423cb6b408dSAlexei StarovoitovBPF_IND | BPF_W | BPF_LD is equivalent to: 2424cb6b408dSAlexei Starovoitov R0 = ntohl(\*(u32 \*) (((struct sk_buff \*) R6)->data + src_reg + imm32)) 2425cb6b408dSAlexei Starovoitov 2426cb6b408dSAlexei StarovoitoveBPF maps 2427cb6b408dSAlexei Starovoitov^^^^^^^^^ 2428cb6b408dSAlexei Starovoitov 2429cb6b408dSAlexei StarovoitoveBPF maps are provided for sharing data between kernel and user-space. 2430cb6b408dSAlexei StarovoitovCurrently implemented types are hash and array, with potential extension to 2431cb6b408dSAlexei Starovoitovsupport bloom filters, radix trees, etc. A map is defined by its type, 2432cb6b408dSAlexei Starovoitovmaximum number of elements, key size and value size in bytes. eBPF syscall 2433cb6b408dSAlexei Starovoitovsupports create, update, find and delete functions on maps. 2434cb6b408dSAlexei Starovoitov 2435cb6b408dSAlexei StarovoitovFunction calls 2436cb6b408dSAlexei Starovoitov^^^^^^^^^^^^^^ 2437cb6b408dSAlexei Starovoitov 2438cb6b408dSAlexei StarovoitovFunction call arguments are passed using up to five registers (R1 - R5). 2439cb6b408dSAlexei StarovoitovThe return value is passed in a dedicated register (R0). Four additional 2440cb6b408dSAlexei Starovoitovregisters (R6 - R9) are callee-saved, and the values in these registers 2441cb6b408dSAlexei Starovoitovare preserved within kernel functions. R0 - R5 are scratch registers within 2442cb6b408dSAlexei Starovoitovkernel functions, and eBPF programs must therefor store/restore values in 2443cb6b408dSAlexei Starovoitovthese registers if needed across function calls. The stack can be accessed 2444cb6b408dSAlexei Starovoitovusing the read-only frame pointer R10. eBPF registers map 1:1 to hardware 2445cb6b408dSAlexei Starovoitovregisters on x86_64 and other 64-bit architectures. For example, x86_64 2446cb6b408dSAlexei Starovoitovin-kernel JIT maps them as 2447cb6b408dSAlexei Starovoitov 2448cb6b408dSAlexei Starovoitov:: 2449cb6b408dSAlexei Starovoitov 2450cb6b408dSAlexei Starovoitov R0 - rax 2451cb6b408dSAlexei Starovoitov R1 - rdi 2452cb6b408dSAlexei Starovoitov R2 - rsi 2453cb6b408dSAlexei Starovoitov R3 - rdx 2454cb6b408dSAlexei Starovoitov R4 - rcx 2455cb6b408dSAlexei Starovoitov R5 - r8 2456cb6b408dSAlexei Starovoitov R6 - rbx 2457cb6b408dSAlexei Starovoitov R7 - r13 2458cb6b408dSAlexei Starovoitov R8 - r14 2459cb6b408dSAlexei Starovoitov R9 - r15 2460cb6b408dSAlexei Starovoitov R10 - rbp 2461cb6b408dSAlexei Starovoitov 2462cb6b408dSAlexei Starovoitovsince x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing 2463cb6b408dSAlexei Starovoitovand rbx, r12 - r15 are callee saved. 2464cb6b408dSAlexei Starovoitov 2465cb6b408dSAlexei StarovoitovProgram start 2466cb6b408dSAlexei Starovoitov^^^^^^^^^^^^^ 2467cb6b408dSAlexei Starovoitov 2468cb6b408dSAlexei StarovoitovAn eBPF program receives a single argument and contains 2469cb6b408dSAlexei Starovoitova single eBPF main routine; the program does not contain eBPF functions. 2470cb6b408dSAlexei StarovoitovFunction calls are limited to a predefined set of kernel functions. The size 2471cb6b408dSAlexei Starovoitovof a program is limited to 4K instructions: this ensures fast termination and 2472cb6b408dSAlexei Starovoitova limited number of kernel function calls. Prior to running an eBPF program, 2473cb6b408dSAlexei Starovoitova verifier performs static analysis to prevent loops in the code and 2474cb6b408dSAlexei Starovoitovto ensure valid register usage and operand types. 24759fdfa518STom Stellard 24769fdfa518STom StellardThe AMDGPU backend 24779fdfa518STom Stellard------------------ 24789fdfa518STom Stellard 2479f16a45eaSTony TyeThe AMDGPU code generator lives in the ``lib/Target/AMDGPU`` 2480f16a45eaSTony Tyedirectory. This code generator is capable of targeting a variety of 2481f16a45eaSTony TyeAMD GPU processors. Refer to :doc:`AMDGPUUsage` for more information. 2482