==========================================
The LLVM Target-Independent Code Generator
==========================================

.. role:: raw-html(raw)
   :format: html

.. raw:: html

  <style>
    .unknown { background-color: #C0C0C0; text-align: center; }
    .unknown:before { content: "?" }
    .no { background-color: #C11B17 }
    .no:before { content: "N" }
    .partial { background-color: #F88017 }
    .yes { background-color: #0F0; }
    .yes:before { content: "Y" }
    .na { background-color: #6666FF; }
    .na:before { content: "N/A" }
  </style>

.. contents::
   :local:

.. warning::
  This is a work in progress.

Introduction
============

The LLVM target-independent code generator is a framework that provides a suite
of reusable components for translating the LLVM internal representation to the
machine code for a specified target, either in assembly form (suitable for a
static compiler) or in binary machine code format (usable for a JIT
compiler). The LLVM target-independent code generator consists of six main
components:

1. `Abstract target description`_ interfaces that capture important properties
   about various aspects of the machine, independently of how they will be used.
   These interfaces are defined in ``include/llvm/Target/``.

2. Classes used to represent the `code being generated`_ for a target.  These
   classes are intended to be abstract enough to represent the machine code for
   *any* target machine.  These classes are defined in
   ``include/llvm/CodeGen/``. At this level, concepts like "constant pool
   entries" and "jump tables" are explicitly exposed.

3. Classes and algorithms used to represent code at the object file level, the
   `MC Layer`_.  These classes represent assembly-level constructs like labels,
   sections, and instructions.  At this level, concepts like "constant pool
   entries" and "jump tables" don't exist.

4. `Target-independent algorithms`_ used to implement various phases of native
   code generation (register allocation, scheduling, stack frame representation,
   etc.).  This code lives in ``lib/CodeGen/``.

5. `Implementations of the abstract target description interfaces`_ for
   particular targets.  These machine descriptions make use of the components
   provided by LLVM, and can optionally provide custom target-specific passes,
   to build complete code generators for a specific target.  Target descriptions
   live in ``lib/Target/``.

6. The target-independent JIT components.  The LLVM JIT is completely
   target-independent (it uses the ``TargetJITInfo`` structure to handle
   target-specific issues).  The code for the target-independent JIT lives in
   ``lib/ExecutionEngine/JIT``.

Depending on which part of the code generator you are interested in working on,
different pieces of this will be useful to you.  In any case, you should be
familiar with the `target description`_ and `machine code representation`_
classes.  If you want to add a backend for a new target, you will need to
`implement the target description`_ classes for your new target and understand
the :doc:`LLVM code representation <LangRef>`.  If you are interested in
implementing a new `code generation algorithm`_, it should only depend on the
target-description and machine code representation classes, ensuring that it is
portable.

Required components in the code generator
-----------------------------------------

The two pieces of the LLVM code generator are the high-level interface to the
code generator and the set of reusable components that can be used to build
target-specific backends.  The two most important interfaces (:raw-html:`<tt>`
`TargetMachine`_ :raw-html:`</tt>` and :raw-html:`<tt>` `DataLayout`_
:raw-html:`</tt>`) are the only ones that are required to be defined for a
backend to fit into the LLVM system, but the others must be defined if the
reusable code generator components are going to be used.

This design has two important implications.  The first is that LLVM can support
completely non-traditional code generation targets.  For example, the C backend
does not require register allocation, instruction selection, or any of the other
standard components provided by the system.  As such, it only implements these
two interfaces, and does its own thing.  Note that the C backend was removed
from trunk as of the LLVM 3.1 release.  Another example of a code generator like
this is a (purely hypothetical) backend that converts LLVM to the GCC RTL form
and uses GCC to emit machine code for a target.

This design also implies that it is possible to design and implement radically
different code generators in the LLVM system that do not make use of any of the
built-in components.  Doing so is not recommended at all, but could be required
for radically different targets that do not fit into the LLVM machine
description model: FPGAs, for example.

.. _high-level design of the code generator:

The high-level design of the code generator
-------------------------------------------

The LLVM target-independent code generator is designed to support efficient,
high-quality code generation for standard register-based microprocessors.  Code
generation in this model is divided into the following stages:

1. `Instruction Selection`_ --- This phase determines an efficient way to
   express the input LLVM code in the target instruction set.  This stage
   produces the initial code for the program in the target instruction set,
   making use of virtual registers in SSA form and physical registers that
   represent any required register assignments due to target constraints or
   calling conventions.  This step turns the LLVM code into a DAG of target
   instructions.

2. `Scheduling and Formation`_ --- This phase takes the DAG of target
   instructions produced by the instruction selection phase, determines an
   ordering of the instructions, then emits the instructions as :raw-html:`<tt>`
   `MachineInstr`_\s :raw-html:`</tt>` with that ordering.  Note that we
   describe this in the `instruction selection section`_ because it operates on
   a `SelectionDAG`_.

3. `SSA-based Machine Code Optimizations`_ --- This optional stage consists of a
   series of machine-code optimizations that operate on the SSA form produced by
   the instruction selector.  Optimizations like modulo-scheduling or peephole
   optimization work here.

4. `Register Allocation`_ --- The target code is transformed from an infinite
   virtual register file in SSA form to the concrete register file used by the
   target.  This phase introduces spill code and eliminates all virtual register
   references from the program.

5. `Prolog/Epilog Code Insertion`_ --- Once the machine code has been generated
   for the function and the amount of stack space required is known (used for
   LLVM allocas and spill slots), the prolog and epilog code for the function
   can be inserted and "abstract stack location references" can be eliminated.
   This stage is responsible for implementing optimizations like frame-pointer
   elimination and stack packing.

6. `Late Machine Code Optimizations`_ --- Optimizations that operate on "final"
   machine code can go here, such as spill code scheduling and peephole
   optimizations.

7. `Code Emission`_ --- The final stage emits the code for the current
   function, either in the target assembler format or in machine code.
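
The ordering of these stages can be pictured as a fixed sequence of passes run
over each function.  The sketch below is a minimal model under that assumption:
``ToyFunction``, ``Stage``, ``buildPipeline``, and ``runPipeline`` are
hypothetical names, not LLVM's pass manager API, and each stage only records its
name instead of transforming code.  Following the text above, only the SSA-based
optimization stage is treated as optional.

.. code-block:: c++

  #include <cassert>
  #include <functional>
  #include <string>
  #include <vector>

  // Hypothetical stand-in for a function being compiled: it only records the
  // names of the stages that ran, in order.
  struct ToyFunction {
    std::vector<std::string> Log;
  };

  // One code generation stage: a name, whether it may be skipped, and its work.
  struct Stage {
    std::string Name;
    bool Optional;
    std::function<void(ToyFunction &)> Run;
  };

  // The seven stages in the order described above.  Only the SSA-based machine
  // code optimizations are marked optional, as the text describes them.
  std::vector<Stage> buildPipeline() {
    auto Record = [](std::string N) {
      return [N](ToyFunction &F) { F.Log.push_back(N); };
    };
    return {
        {"isel", false, Record("isel")},
        {"sched", false, Record("sched")},
        {"ssa-opt", true, Record("ssa-opt")},
        {"regalloc", false, Record("regalloc")},
        {"prolog-epilog", false, Record("prolog-epilog")},
        {"late-opt", false, Record("late-opt")},
        {"emit", false, Record("emit")},
    };
  }

  // Run the stages in order, skipping optional ones when Optimize is false.
  void runPipeline(ToyFunction &F, bool Optimize) {
    for (const Stage &S : buildPipeline())
      if (Optimize || !S.Optional)
        S.Run(F);
  }

With ``Optimize`` set to false, the sketch runs six stages and skips
``ssa-opt``, which is one way a compile-time-sensitive (for example, JIT)
configuration could trade code quality for speed.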

The code generator is based on the assumption that the instruction selector will
use an optimal pattern matching selector to create high-quality sequences of
native instructions.  Alternative code generator designs based on pattern
expansion and aggressive iterative peephole optimization are much slower.  This
design permits efficient compilation (important for JIT environments) and
aggressive optimization (used when generating code offline) by allowing
components of varying levels of sophistication to be used for any step of
compilation.

In addition to these stages, target implementations can insert arbitrary
target-specific passes into the flow.  For example, the X86 target uses a
special pass to handle the 80x87 floating point stack architecture.  Other
targets with unusual requirements can be supported with custom passes as needed.

Using TableGen for target description
-------------------------------------

The target description classes require a detailed description of the target
architecture.  These target descriptions often have a large amount of common
information (e.g., an ``add`` instruction is almost identical to a ``sub``
instruction).  In order to allow the maximum amount of commonality to be
factored out, the LLVM code generator uses the
:doc:`TableGen/index` tool to describe big chunks of the
target machine, which allows the use of domain-specific and target-specific
abstractions to reduce the amount of repetition.

As LLVM continues to be developed and refined, we plan to move more and more of
the target description to the ``.td`` form.  Doing so gives us a number of
advantages.  The most important is that it makes it easier to port LLVM because
it reduces the amount of C++ code that has to be written, and the surface area
of the code generator that needs to be understood before someone can get
something working.  Second, it makes it easier to change things. In particular,
if tables and other things are all emitted by ``tblgen``, we only need a change
in one place (``tblgen``) to update all of the targets to a new interface.

.. _Abstract target description:
.. _target description:

Target description classes
==========================

The LLVM target description classes (located in the ``include/llvm/Target``
directory) provide an abstract description of the target machine independent of
any particular client.  These classes are designed to capture the *abstract*
properties of the target (such as the instructions and registers it has), and do
not incorporate any particular pieces of code generation algorithms.

All of the target description classes (except the :raw-html:`<tt>` `DataLayout`_
:raw-html:`</tt>` class) are designed to be subclassed by the concrete target
implementation, and have virtual methods implemented.  To get to these
implementations, the :raw-html:`<tt>` `TargetMachine`_ :raw-html:`</tt>` class
provides accessors that should be implemented by the target.

.. _TargetMachine:

The ``TargetMachine`` class
---------------------------

The ``TargetMachine`` class provides virtual methods that are used to access the
target-specific implementations of the various target description classes via
the ``get*Info`` methods (``getInstrInfo``, ``getRegisterInfo``,
``getFrameInfo``, etc.).  This class is designed to be specialized by a concrete
target implementation (e.g., ``X86TargetMachine``) which implements the various
virtual methods.  The only required target description class is the
:raw-html:`<tt>` `DataLayout`_ :raw-html:`</tt>` class, but if the code
generator components are to be used, the other interfaces should be implemented
as well.

.. _DataLayout:

The ``DataLayout`` class
------------------------

The ``DataLayout`` class is the only required target description class, and it
is the only class that is not extensible (you cannot derive a new class from
it).  ``DataLayout`` specifies information about how the target lays out memory
for structures, the alignment requirements for various data types, the size of
pointers in the target, and whether the target is little-endian or
big-endian.
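
As a worked example of the kind of struct layout rule a target's
``DataLayout`` describes, the sketch below places each field at the next offset
that is a multiple of its alignment and rounds the total size up to the largest
field alignment (ordinary, non-packed layout).  ``FieldInfo``,
``StructLayoutInfo``, ``layoutStruct``, and ``alignTo`` are hypothetical helpers;
the real class derives sizes and alignments from the target's data layout
description rather than taking them as parameters.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <vector>

  // Size and ABI alignment (both in bytes) of one struct field.
  struct FieldInfo {
    uint64_t Size;
    uint64_t Align;  // must be a power of two
  };

  struct StructLayoutInfo {
    std::vector<uint64_t> Offsets;  // byte offset of each field
    uint64_t Size = 0;              // total size, including tail padding
    uint64_t Align = 1;             // alignment of the whole struct
  };

  // Round Value up to the next multiple of Align (a power of two).
  uint64_t alignTo(uint64_t Value, uint64_t Align) {
    assert(Align != 0 && (Align & (Align - 1)) == 0 && "not a power of two");
    return (Value + Align - 1) & ~(Align - 1);
  }

  // Lay out fields in declaration order with natural (non-packed) alignment.
  StructLayoutInfo layoutStruct(const std::vector<FieldInfo> &Fields) {
    StructLayoutInfo L;
    uint64_t Offset = 0;
    for (const FieldInfo &F : Fields) {
      Offset = alignTo(Offset, F.Align);  // padding before the field
      L.Offsets.push_back(Offset);
      Offset += F.Size;
      if (F.Align > L.Align)
        L.Align = F.Align;
    }
    L.Size = alignTo(Offset, L.Align);  // tail padding keeps arrays aligned
    return L;
  }

With a 4-byte-aligned ``i32``, a ``{ i8, i32, i8 }`` struct gets field offsets
0, 4, and 8 and a size of 12 bytes, including 3 bytes of tail padding.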

.. _TargetLowering:

The ``TargetLowering`` class
----------------------------

The ``TargetLowering`` class is used by SelectionDAG-based instruction selectors
primarily to describe how LLVM code should be lowered to SelectionDAG
operations.  Among other things, this class indicates:

* an initial register class to use for various ``ValueType``\s,

* which operations are natively supported by the target machine,

* the return type of ``setcc`` operations,

* the type to use for shift amounts, and

* various high-level characteristics, like whether it is profitable to turn
  division by a constant into a multiplication sequence.
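
The "which operations are natively supported" item amounts to a table keyed by
operation and value type.  The sketch below models such a table.  The ``Action``
values loosely mirror the kinds of answers a target can give (legal, promote,
expand, custom), while ``OperationActions``, the reduced ``Op`` and ``VT``
enums, and the choice to treat unlisted pairs as legal are simplifying
assumptions, not the actual ``TargetLowering`` interface.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <utility>

  // Hypothetical, heavily reduced opcode and value type sets.
  enum class Op { Add, Mul, SDiv, CTPop };
  enum class VT { i8, i16, i32, i64 };

  // How the target wants an (operation, type) pair handled.
  enum class Action {
    Legal,    // natively supported by the target
    Promote,  // perform the operation in a wider type
    Expand,   // rewrite in terms of other operations
    Custom    // use a target-specific lowering hook
  };

  class OperationActions {
    std::map<std::pair<Op, VT>, Action> Table;

  public:
    void set(Op O, VT T, Action A) { Table[{O, T}] = A; }

    // In this sketch, anything the target has not mentioned is legal.
    Action get(Op O, VT T) const {
      auto It = Table.find({O, T});
      return It == Table.end() ? Action::Legal : It->second;
    }

    bool isNativelySupported(Op O, VT T) const {
      return get(O, T) == Action::Legal;
    }
  };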

.. _TargetRegisterInfo:

The ``TargetRegisterInfo`` class
--------------------------------

The ``TargetRegisterInfo`` class is used to describe the register file of the
target and any interactions between the registers.

Registers are represented in the code generator by unsigned integers.  Physical
registers (those that actually exist in the target description) are unique
small numbers, and virtual registers are generally large.  Note that
register ``#0`` is reserved as a flag value.
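
One encoding that satisfies all three properties reserves ``0``, gives physical
registers small numbers starting at ``1``, and tags virtual registers with a
high bit so that the two ranges can never collide.  The sketch below assumes
that encoding; the choice of bit 31 and the helper names are assumptions made
for this example.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Register number 0 is reserved as a flag value meaning "no register".
  constexpr uint32_t NoRegister = 0;

  // Assumed encoding for this sketch: virtual registers have the top bit set,
  // so they are always larger than any physical register number.
  constexpr uint32_t VirtualRegFlag = 1u << 31;

  constexpr bool isVirtualRegister(uint32_t Reg) {
    return (Reg & VirtualRegFlag) != 0;
  }

  constexpr bool isPhysicalRegister(uint32_t Reg) {
    return Reg != NoRegister && !isVirtualRegister(Reg);
  }

  // Convert between a dense virtual register index (0, 1, 2, ...) and the
  // register number stored in instructions.
  constexpr uint32_t indexToVirtReg(uint32_t Index) {
    return Index | VirtualRegFlag;
  }
  constexpr uint32_t virtRegToIndex(uint32_t Reg) {
    return Reg & ~VirtualRegFlag;
  }

  static_assert(!isPhysicalRegister(NoRegister), "register 0 is reserved");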

Each register in the processor description has an associated
``TargetRegisterDesc`` entry, which provides a textual name for the register
(used for assembly output and debugging dumps) and a set of aliases (used to
indicate whether one register overlaps with another).

In addition to the per-register description, the ``TargetRegisterInfo`` class
exposes a set of processor-specific register classes (instances of the
``TargetRegisterClass`` class).  Each register class contains a set of registers
that share the same properties (for example, they are all 32-bit integer
registers).  Each SSA virtual register created by the instruction selector has
an associated register class.  When the register allocator runs, it replaces
each virtual register with a physical register from its class.
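
The alias sets and register classes can be modeled with two small tables.  The
sketch below uses a hypothetical slice of the X86 integer registers;
``RegDesc``, ``RegClass``, ``regsOverlap``, and the register numbering are
illustrative stand-ins for the TableGen-generated descriptions, not their real
layout.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <string>
  #include <vector>

  // Hypothetical physical register numbers (0 is reserved).
  enum PhysReg : unsigned { NoReg = 0, AL, AH, AX, EAX, BL, BX, EBX, NumRegs };

  // Per-register description: a printable name and the registers it overlaps.
  struct RegDesc {
    std::string Name;
    std::vector<unsigned> Aliases;
  };

  const std::vector<RegDesc> &regDescs() {
    static const std::vector<RegDesc> Descs = {
        {"", {}},               // NoReg
        {"AL", {AX, EAX}},      // low byte of AX / EAX
        {"AH", {AX, EAX}},      // high byte of AX / EAX
        {"AX", {AL, AH, EAX}},
        {"EAX", {AL, AH, AX}},
        {"BL", {BX, EBX}},
        {"BX", {BL, EBX}},
        {"EBX", {BL, BX}},
    };
    return Descs;
  }

  // Two registers overlap if they are equal or one lists the other as an alias.
  bool regsOverlap(unsigned A, unsigned B) {
    if (A == B)
      return true;
    const std::vector<unsigned> &Al = regDescs()[A].Aliases;
    return std::find(Al.begin(), Al.end(), B) != Al.end();
  }

  // A register class: a set of registers with the same properties.
  struct RegClass {
    std::string Name;
    std::vector<unsigned> Regs;
    bool contains(unsigned R) const {
      return std::find(Regs.begin(), Regs.end(), R) != Regs.end();
    }
  };

  const RegClass GR8{"GR8", {AL, AH, BL}};
  const RegClass GR32{"GR32", {EAX, EBX}};

Here ``AL`` and ``AH`` do not overlap each other, but both overlap ``AX`` and
``EAX``, so two values that are live at the same time could share ``AL`` and
``AH`` but not ``AL`` and ``EAX``.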

The target-specific implementations of these classes are auto-generated from a
:doc:`TableGen/index` description of the register file.

.. _TargetInstrInfo:

The ``TargetInstrInfo`` class
-----------------------------

The ``TargetInstrInfo`` class is used to describe the machine instructions
supported by the target.  Descriptions define things like the mnemonic for
the opcode, the number of operands, the list of implicit register uses and defs,
whether the instruction has certain target-independent properties (accesses
memory, is commutable, etc.), and any target-specific flags.

The ``TargetFrameLowering`` class
---------------------------------

The ``TargetFrameLowering`` class is used to provide information about the stack
frame layout of the target. It holds the direction of stack growth, the known
stack alignment on entry to each function, and the offset to the local area.
The offset to the local area is the offset from the stack pointer on function
entry to the first location where function data (local variables, spill
locations) can be stored.
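
The sketch below shows how these three properties combine when stack objects
(locals, spill slots) are assigned offsets relative to the stack pointer on
function entry.  ``FrameInfo``, ``StackObject``, and ``assignOffsets`` are
hypothetical; objects are laid out in order with no reordering or packing, and
the real prolog/epilog insertion logic handles many more cases.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <vector>

  enum class StackDirection { GrowsDown, GrowsUp };

  // Hypothetical summary of the three frame properties described above.
  struct FrameInfo {
    StackDirection Direction;
    int64_t StackAlign;       // stack pointer alignment on entry (power of two)
    int64_t LocalAreaOffset;  // entry SP -> first slot usable for function data
  };

  struct StackObject {
    int64_t Size;
    int64_t Align;   // power of two, at most StackAlign in this sketch
    int64_t Offset;  // filled in by assignOffsets, relative to the entry SP
  };

  // Round V down / up to a multiple of A; also correct for negative V.
  int64_t alignDown(int64_t V, int64_t A) { return V - ((V % A) + A) % A; }
  int64_t alignUp(int64_t V, int64_t A) { return alignDown(V + A - 1, A); }

  // Give each object an aligned offset, in order, starting at the local area.
  // Returns the number of bytes used, rounded up to the stack alignment.
  int64_t assignOffsets(const FrameInfo &FI, std::vector<StackObject> &Objects) {
    int64_t Offset = FI.LocalAreaOffset;
    for (StackObject &O : Objects) {
      assert(O.Align <= FI.StackAlign && "over-aligned objects need realignment");
      if (FI.Direction == StackDirection::GrowsDown) {
        // Move down past the object, then down again to satisfy its alignment;
        // the object occupies [Offset, Offset + Size).
        Offset = alignDown(Offset - O.Size, O.Align);
        O.Offset = Offset;
      } else {
        // Align upward first, then move past the object.
        Offset = alignUp(Offset, O.Align);
        O.Offset = Offset;
        Offset += O.Size;
      }
    }
    int64_t Used = FI.Direction == StackDirection::GrowsDown
                       ? FI.LocalAreaOffset - Offset
                       : Offset - FI.LocalAreaOffset;
    return alignUp(Used, FI.StackAlign);
  }

For example, on a downward-growing stack with 16-byte entry alignment and a
local area offset of ``-8``, a 4-byte, an 8-byte, and a 1-byte object land at
offsets ``-12``, ``-24``, and ``-25``, and the frame needs 32 bytes.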

The ``TargetSubtarget`` class
-----------------------------

The ``TargetSubtarget`` class is used to provide information about the specific
chip set being targeted.  A sub-target informs code generation of which
instructions are supported, the instruction latencies, and the instruction
execution itinerary (i.e., which processing units are used, in what order, and
for how long).

The ``TargetJITInfo`` class
---------------------------

The ``TargetJITInfo`` class exposes an abstract interface used by the
Just-In-Time code generator to perform target-specific activities, such as
emitting stubs.  If a ``TargetMachine`` supports JIT code generation, it should
provide one of these objects through the ``getJITInfo`` method.

.. _code being generated:
.. _machine code representation:

Machine code description classes
================================

At the high level, LLVM code is translated to a machine-specific representation
formed out of :raw-html:`<tt>` `MachineFunction`_ :raw-html:`</tt>`,
:raw-html:`<tt>` `MachineBasicBlock`_ :raw-html:`</tt>`, and :raw-html:`<tt>`
`MachineInstr`_ :raw-html:`</tt>` instances (defined in
``include/llvm/CodeGen``).  This representation is completely target-agnostic,
representing instructions in their most abstract form: an opcode and a series of
operands.  This representation is designed to support both an SSA representation
for machine code, as well as a register-allocated, non-SSA form.

.. _MachineInstr:

The ``MachineInstr`` class
--------------------------

Target machine instructions are represented as instances of the ``MachineInstr``
class.  This class is an extremely abstract way of representing machine
instructions.  In particular, it only keeps track of an opcode number and a set
of operands.

The opcode number is a simple unsigned integer that only has meaning to a
specific backend.  All of the instructions for a target should be defined in the
``*InstrInfo.td`` file for the target. The opcode enum values are auto-generated
from this description.  The ``MachineInstr`` class does not have any information
about how to interpret the instruction (i.e., what the semantics of the
instruction are); for that you must refer to the :raw-html:`<tt>`
`TargetInstrInfo`_ :raw-html:`</tt>` class.

The operands of a machine instruction can be of several different types: a
register reference, a constant integer, a basic block reference, etc.  In
addition, a machine operand should be marked as a def or a use of the value
(though only registers are allowed to be defs).

By convention, the LLVM code generator orders instruction operands so that all
register definitions come before the register uses, even on architectures that
are normally printed in other orders.  For example, the SPARC add instruction
"``add %i1, %i2, %i3``" adds the "%i1" and "%i2" registers and stores the
result into the "%i3" register.  In the LLVM code generator, the operands should
be stored as "``%i3, %i1, %i2``": with the destination first.

Keeping destination (definition) operands at the beginning of the operand list
has several advantages.  In particular, the debugging printer will print the
instruction like this:

.. code-block:: llvm

  %i3 = add %i1, %i2

Also, if the first operand is a def, it is easier to `create instructions`_
whose only def is the first operand.
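
The defs-first convention can be illustrated with a toy instruction that is just
an opcode plus a list of register operands.  ``ToyInstr`` and ``ToyOperand`` are
hypothetical and far simpler than the real ``MachineInstr`` and its operands;
``print`` imitates the debugging format shown above by printing the leading defs
before an ``=``.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // A register operand that is either a definition or a use.
  struct ToyOperand {
    std::string Reg;
    bool IsDef;
  };

  // An opcode name plus operands, with all defs stored before all uses.
  struct ToyInstr {
    std::string Opcode;
    std::vector<ToyOperand> Ops;

    // Check the convention: once a use has been seen, no def may follow.
    bool defsFirst() const {
      bool SeenUse = false;
      for (const ToyOperand &O : Ops) {
        if (!O.IsDef)
          SeenUse = true;
        else if (SeenUse)
          return false;
      }
      return true;
    }

    // Print as "%def = opcode %use1, %use2", like the debugging printer.
    std::string print() const {
      std::string Defs, Uses;
      for (const ToyOperand &O : Ops) {
        std::string &Out = O.IsDef ? Defs : Uses;
        if (!Out.empty())
          Out += ", ";
        Out += "%" + O.Reg;
      }
      std::string S = Defs.empty() ? Opcode : Defs + " = " + Opcode;
      if (!Uses.empty())
        S += " " + Uses;
      return S;
    }
  };

Storing the SPARC example as ``{i3 (def), i1, i2}`` makes ``print()`` return
``%i3 = add %i1, %i2``.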

.. _create instructions:

Using the ``MachineInstrBuilder.h`` functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Machine instructions are created by using the ``BuildMI`` functions, located in
the ``include/llvm/CodeGen/MachineInstrBuilder.h`` file.  The ``BuildMI``
functions make it easy to build arbitrary machine instructions.  Usage of the
``BuildMI`` functions looks like this:

.. code-block:: c++

  // Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
  // instruction and insert it at the end of the given MachineBasicBlock.
  const TargetInstrInfo &TII = ...
  MachineBasicBlock &MBB = ...
  DebugLoc DL;
  MachineInstr *MI = BuildMI(MBB, DL, TII.get(X86::MOV32ri), DestReg).addImm(42);

  // Create the same instr, but insert it before a specified iterator point.
  MachineBasicBlock::iterator MBBI = ...
  BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), DestReg).addImm(42);

  // Create a 'cmp Reg, 0' instruction, no destination reg.
  MI = BuildMI(MBB, DL, TII.get(X86::CMP32ri8)).addReg(Reg).addImm(0);

  // Create an 'sahf' instruction, which has no explicit operands.
  MI = BuildMI(MBB, DL, TII.get(X86::SAHF));

  // Create a self-looping branch instruction.
  BuildMI(MBB, DL, TII.get(X86::JNE)).addMBB(&MBB);

If you need to add a definition operand (other than the optional destination
register), you must explicitly mark it as such on the ``MachineInstrBuilder``:

.. code-block:: c++

  // MIB is the MachineInstrBuilder returned by a BuildMI call.
  MIB.addReg(Reg, RegState::Define);
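
The chained ``addReg``/``addImm`` calls work because each call appends one
operand and returns the builder again.  The sketch below shows that
fluent-builder pattern in isolation; ``ToyMO``, ``ToyMI``, ``ToyBuilder``, and
``toyBuildMI`` are hypothetical and much smaller than ``MachineInstrBuilder``,
and the ``IsDef`` flag stands in for ``RegState::Define``.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <string>
  #include <vector>

  // A toy operand: a register (possibly a def) or an immediate.
  struct ToyMO {
    enum Kind { Reg, Imm } K;
    int64_t Val;  // register number or immediate value
    bool IsDef;   // only meaningful for registers
  };

  struct ToyMI {
    std::string Opcode;
    std::vector<ToyMO> Ops;
  };

  // Each add* call appends one operand and returns the builder for chaining.
  // The builder holds a pointer into its block, so it must not be used after
  // another instruction is appended to that block.
  class ToyBuilder {
    ToyMI *MI;

  public:
    explicit ToyBuilder(ToyMI *MI) : MI(MI) {}
    ToyBuilder &addReg(unsigned R, bool IsDef = false) {
      MI->Ops.push_back({ToyMO::Reg, static_cast<int64_t>(R), IsDef});
      return *this;
    }
    ToyBuilder &addImm(int64_t V) {
      MI->Ops.push_back({ToyMO::Imm, V, false});
      return *this;
    }
    ToyMI *get() const { return MI; }
  };

  // Append a new instruction to Block; a nonzero DestReg becomes the first
  // operand and is marked as a def, like the BuildMI overloads above.
  ToyBuilder toyBuildMI(std::vector<ToyMI> &Block, const std::string &Opcode,
                        unsigned DestReg = 0) {
    Block.push_back({Opcode, {}});
    ToyBuilder B(&Block.back());
    if (DestReg != 0)
      B.addReg(DestReg, /*IsDef=*/true);
    return B;
  }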

Fixed (preassigned) registers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

One important issue that the code generator needs to be aware of is the presence
of fixed registers.  In particular, there are often places in the instruction
stream where the register allocator *must* arrange for a particular value to be
in a particular register.  This can occur due to limitations of the instruction
set (e.g., the X86 can only do a 32-bit divide with the ``EAX``/``EDX``
registers), or external factors like calling conventions.  In any case, the
instruction selector should emit code that copies a virtual register into or out
of a physical register when needed.

For example, consider this simple LLVM function:

.. code-block:: llvm

  define i32 @test(i32 %X, i32 %Y) {
    %Z = sdiv i32 %X, %Y
    ret i32 %Z
  }

The X86 instruction selector might produce this machine code for the ``sdiv``
and ``ret``:

.. code-block:: text

  ;; Start of div
  %EAX = mov %reg1024           ;; Copy X (in reg1024) into EAX
  %reg1027 = sar %reg1024, 31
  %EDX = mov %reg1027           ;; Sign-extend X into EDX
  idiv %reg1025                 ;; Divide by Y (in reg1025)
  %reg1026 = mov %EAX           ;; Read the result (Z) out of EAX

  ;; Start of ret
  %EAX = mov %reg1026           ;; 32-bit return value goes in EAX
  ret

By the end of code generation, the register allocator would coalesce the
registers and delete the resultant identity moves, producing the following
code:

.. code-block:: text

  ;; X is in EAX, Y is in ECX
  mov %EAX, %EDX
  sar %EDX, 31
  idiv %ECX
  ret

This approach is extremely general (if it can handle the X86 architecture, it
can handle anything!) and allows all of the target-specific knowledge about the
instruction stream to be isolated in the instruction selector.  Note that
physical registers should have a short lifetime for good code generation, and
all physical registers are assumed dead on entry to and exit from basic blocks
(before register allocation).  Thus, if you need a value to be live across basic
block boundaries, it *must* live in a virtual register.
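
The step from the first listing to the second can be approximated as: replace
every virtual register with its assigned physical register, then delete any
``mov`` whose source and destination became the same register.  The sketch
below takes the assignment as given (choosing it is the register allocator's
and coalescer's job) and does not model the two-address form of ``sar`` that
produces the ``mov``/``sar`` pair in the final listing.  ``Instr`` and
``rewriteAndDropIdentityMoves`` are hypothetical names.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  // A toy instruction: optional destination, opcode, and source operands.
  // Operands are register names ("reg1024", "EAX") or immediates ("31").
  struct Instr {
    std::string Dst;  // empty if the instruction defines nothing
    std::string Opcode;
    std::vector<std::string> Srcs;
  };

  // Replace every virtual register by its assigned physical register, then
  // delete copies that have become identity moves ("EAX = mov EAX").
  std::vector<Instr> rewriteAndDropIdentityMoves(
      const std::vector<Instr> &Code,
      const std::map<std::string, std::string> &Assignment) {
    auto Map = [&](const std::string &R) {
      auto It = Assignment.find(R);
      // Physical registers and immediates are not in the map; keep them as-is.
      return It == Assignment.end() ? R : It->second;
    };
    std::vector<Instr> Out;
    for (const Instr &I : Code) {
      Instr N{Map(I.Dst), I.Opcode, {}};
      for (const std::string &S : I.Srcs)
        N.Srcs.push_back(Map(S));
      bool IsIdentityMove =
          N.Opcode == "mov" && N.Srcs.size() == 1 && N.Srcs[0] == N.Dst;
      if (!IsIdentityMove)
        Out.push_back(N);
    }
    return Out;
  }

With ``reg1024`` and ``reg1026`` assigned to ``EAX``, ``reg1025`` to ``ECX``,
and ``reg1027`` to ``EDX``, all four ``mov`` instructions in the first listing
become identity moves and are removed, leaving ``sar``, ``idiv``, and ``ret``.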

Call-clobbered registers
^^^^^^^^^^^^^^^^^^^^^^^^

Some machine instructions, like calls, clobber a large number of physical
registers.  Rather than adding ``<def,dead>`` operands for all of them, it is
possible to use an ``MO_RegisterMask`` operand instead.  The register mask
operand holds a bit mask of preserved registers, and everything else is
considered to be clobbered by the instruction.
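
To make the "preserved bit" convention concrete, here is a minimal C++ sketch of
a register mask stored as 32-bit words, one bit per physical register.
``RegMask``, ``setPreserved`` and ``clobbers`` are hypothetical names invented
for this example, not LLVM API; the clear-bit test in ``clobbers`` is, as far as
we know, the same check that ``MachineOperand::clobbersPhysReg`` performs on a
real mask.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <vector>

  // Hypothetical model of a register-mask operand: one bit per physical
  // register, packed into 32-bit words. A set bit means "preserved"; every
  // register whose bit is clear is clobbered by the instruction.
  struct RegMask {
    std::vector<uint32_t> Words;

    explicit RegMask(unsigned NumRegs) : Words((NumRegs + 31) / 32, 0) {}

    void setPreserved(unsigned Reg) {
      assert(Reg / 32 < Words.size() && "register out of range");
      Words[Reg / 32] |= 1u << (Reg % 32);
    }

    // Bit clear => clobbered.
    bool clobbers(unsigned Reg) const {
      assert(Reg / 32 < Words.size() && "register out of range");
      return !(Words[Reg / 32] & (1u << (Reg % 32)));
    }
  };

A call instruction would carry a single mask like this, typically derived from
the calling convention's callee-saved registers, instead of one ``<def,dead>``
operand per clobbered register.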

Machine code in SSA form
^^^^^^^^^^^^^^^^^^^^^^^^

Machine instructions are initially selected in SSA form, and are maintained in
SSA form until register allocation happens.  For the most part, this is
trivially simple since LLVM is already in SSA form; LLVM PHI nodes become
machine code PHI nodes, and virtual registers are only allowed to have a single
definition.
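
The single-definition rule can be checked mechanically. The following C++ sketch
assumes a simplified instruction that only lists the registers it defines and,
echoing the ``%reg1024`` numbering in the example earlier in this section,
treats register numbers of 1024 and above as virtual. Both the struct and the
cutoff are illustrative assumptions; LLVM itself encodes virtual registers
differently.

.. code-block:: c++

  #include <cassert>
  #include <set>
  #include <vector>

  // Assumed cutoff for this sketch only: registers >= 1024 are virtual.
  constexpr unsigned FirstVirtualReg = 1024;

  // Hypothetical, simplified instruction: just the registers it defines.
  struct SimpleInstr {
    std::vector<unsigned> Defs;
  };

  // Returns true if every virtual register is defined at most once, the
  // property machine code keeps while in SSA form. Physical registers may
  // be redefined freely.
  bool isVirtRegSSA(const std::vector<SimpleInstr> &Instrs) {
    std::set<unsigned> Defined;
    for (const SimpleInstr &MI : Instrs)
      for (unsigned Reg : MI.Defs)
        if (Reg >= FirstVirtualReg && !Defined.insert(Reg).second)
          return false; // second definition of the same virtual register
    return true;
  }

Redefining a physical register such as ``EAX`` twice is fine under this check;
only a repeated virtual register definition breaks SSA form.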

After register allocation, machine code is no longer in SSA form because there
are no virtual registers left in the code.

.. _MachineBasicBlock:

The ``MachineBasicBlock`` class
-------------------------------

The ``MachineBasicBlock`` class contains a list of machine instructions
(:raw-html:`<tt>` `MachineInstr`_ :raw-html:`</tt>` instances).  It roughly
corresponds to an LLVM basic block in the input to the instruction selector,
but the mapping can be one-to-many (i.e. one LLVM basic block can map to
multiple machine basic blocks). The ``MachineBasicBlock`` class has a
"``getBasicBlock``" method, which returns the LLVM basic block that it comes
from.

.. _MachineFunction:

The ``MachineFunction`` class
-----------------------------

The ``MachineFunction`` class contains a list of machine basic blocks
(:raw-html:`<tt>` `MachineBasicBlock`_ :raw-html:`</tt>` instances).  It
corresponds one-to-one with the LLVM function input to the instruction selector.
In addition to a list of basic blocks, the ``MachineFunction`` contains a
``MachineConstantPool``, a ``MachineFrameInfo``, a ``MachineFunctionInfo``, and
a ``MachineRegisterInfo``.  See ``include/llvm/CodeGen/MachineFunction.h`` for
more information.
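
The containment described above (a function owns blocks; a block owns
instructions and remembers the IR block it came from) can be modeled in a few
lines of C++. All types here are hypothetical stand-ins, not LLVM classes; the
point is that two machine blocks may point back to the same IR block.

.. code-block:: c++

  #include <cassert>
  #include <list>
  #include <string>
  #include <vector>

  // Hypothetical stand-ins for the IR and machine-level containers.
  struct IRBlock {
    std::string Name;
  };

  struct MBBModel {
    const IRBlock *Origin;           // what getBasicBlock() would return
    std::vector<std::string> Instrs; // stand-in for the MachineInstr list
    const IRBlock *getBasicBlock() const { return Origin; }
  };

  struct MFModel {
    // std::list keeps references to existing blocks valid as blocks are added.
    std::list<MBBModel> Blocks;

    MBBModel &createBlock(const IRBlock *Origin) {
      Blocks.push_back(MBBModel{Origin, {}});
      return Blocks.back();
    }
  };

In LLVM, ``getBasicBlock()`` may also return null for a machine block that has
no direct IR counterpart; this sketch does not model that case.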

``MachineInstr Bundles``
------------------------

The LLVM code generator can model sequences of instructions as MachineInstr
bundles. An MI bundle can model a VLIW group / pack which contains an arbitrary
number of parallel instructions. It can also be used to model a sequential list
of instructions (potentially with data dependencies) that cannot be legally
separated (e.g. ARM Thumb2 IT blocks).

Conceptually, an MI bundle is an MI with a number of other MIs nested within:

::

  --------------
  |   Bundle   | ---------
  --------------          \
         |           ----------------
         |           |      MI      |
         |           ----------------
         |                   |
         |           ----------------
         |           |      MI      |
         |           ----------------
         |                   |
         |           ----------------
         |           |      MI      |
         |           ----------------
         |
  --------------
  |   Bundle   | --------
  --------------         \
         |           ----------------
         |           |      MI      |
         |           ----------------
         |                   |
         |           ----------------
         |           |      MI      |
         |           ----------------
         |                   |
         |                  ...
         |
  --------------
  |   Bundle   | --------
  --------------         \
         |
        ...

MI bundle support does not change the physical representations of
MachineBasicBlock and MachineInstr. All the MIs (including top-level and nested
ones) are stored as a sequential list of MIs. The "bundled" MIs are marked with
the 'InsideBundle' flag. A top-level MI with the special BUNDLE opcode is used
to represent the start of a bundle. It's legal to mix BUNDLE MIs with individual
MIs that are neither inside bundles nor represent bundles.

MachineInstr passes should operate on an MI bundle as a single unit. Member
methods have been taught to correctly handle bundles and MIs inside bundles.
The MachineBasicBlock iterator has been modified to skip over bundled MIs to
enforce the bundle-as-a-single-unit concept. An alternative iterator,
``instr_iterator``, has been added to MachineBasicBlock to allow passes to
iterate over all of the MIs in a MachineBasicBlock, including those which are
nested inside bundles. The top-level BUNDLE instruction must have the correct
set of register MachineOperands that represent the cumulative inputs and
outputs of the bundled MIs.
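
The difference between the bundle-aware iterator and ``instr_iterator`` can be
sketched over the flat list described above. This C++ model is hypothetical:
``MIModel`` stands in for ``MachineInstr``, and two plain loops over the
``InsideBundle`` flag stand in for the real iterator types.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // Hypothetical model: a block stores every MI in one flat list, and MIs
  // nested in a bundle carry the InsideBundle flag.
  struct MIModel {
    std::string Opcode;
    bool InsideBundle = false;
  };

  // Bundle-aware walk: visit top-level MIs only (BUNDLE headers and
  // unbundled MIs), skipping the MIs nested inside bundles.
  std::vector<std::string> topLevelOpcodes(const std::vector<MIModel> &MBB) {
    std::vector<std::string> Out;
    for (const MIModel &MI : MBB)
      if (!MI.InsideBundle)
        Out.push_back(MI.Opcode);
    return Out;
  }

  // Counterpart of instr_iterator: visit every MI, nested ones included.
  std::vector<std::string> allOpcodes(const std::vector<MIModel> &MBB) {
    std::vector<std::string> Out;
    for (const MIModel &MI : MBB)
      Out.push_back(MI.Opcode);
    return Out;
  }

For a block holding ``BUNDLE``, two bundled MIs, and one unbundled MI, the
bundle-aware walk sees two units while the full walk sees all four MIs.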

Packing / bundling of MachineInstrs for VLIW architectures should
generally be done as part of the register allocation super-pass. More
specifically, the pass which determines what MIs should be bundled
together should run after the code generator exits SSA form
(i.e. after the two-address pass, PHI elimination, and copy coalescing).
Such bundles should be finalized (i.e. adding BUNDLE MIs and input and
output register MachineOperands) after virtual registers have been
rewritten into physical registers. This eliminates the need to add
virtual register operands to BUNDLE instructions, which would
effectively double the virtual register def and use lists. Bundles may
use virtual registers and be formed in SSA form, but this may not be
appropriate for all use cases.

.. _MC Layer:

The "MC" Layer
==============

The MC Layer is used to represent and process code at the raw machine code
level, devoid of "high level" information like "constant pools", "jump tables",
"global variables" or anything like that.  At this level, LLVM handles things
like label names, machine instructions, and sections in the object file.  The
code in this layer is used for a number of important purposes: the tail end of
the code generator uses it to write a .s or .o file, and it is also used by the
llvm-mc tool to implement standalone machine code assemblers and disassemblers.

This section describes some of the important classes.  There are also a number
of important subsystems that interact at this layer; they are described later in
this manual.

.. _MCStreamer:

The ``MCStreamer`` API
----------------------

MCStreamer is best thought of as an assembler API.  It is an abstract API which
is *implemented* in different ways (e.g. to output a .s file, output an ELF .o
file, etc.) but whose API corresponds directly to what you see in a .s file.
MCStreamer has one method per directive, such as emitLabel, emitSymbolAttribute,
switchSection, emitValue (for .byte, .word), etc., which directly correspond to
assembly level directives.  It also has an emitInstruction method, which is used
to output an MCInst to the streamer.

This API is most important for two clients: the llvm-mc stand-alone assembler is
effectively a parser that parses a line, then invokes a method on MCStreamer. In
the code generator, the `Code Emission`_ phase of the code generator lowers
higher level LLVM IR and Machine* constructs down to the MC layer, emitting
directives through MCStreamer.

On the implementation side of MCStreamer, there are two major implementations:
one for writing out a .s file (MCAsmStreamer), and one for writing out a .o
file (MCObjectStreamer).  MCAsmStreamer is a straightforward implementation
that prints out a directive for each method (e.g. ``emitValue -> .byte``), but
MCObjectStreamer implements a full assembler.
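
A rough C++ sketch of the split between the abstract streamer interface and a
text-printing implementation follows. The class names and method signatures are
hypothetical simplifications (the real ``MCStreamer`` methods take ``MCSymbol``
and ``MCExpr`` arguments), and the GNU-as-style directive spellings are an
assumption of this sketch.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <sstream>
  #include <string>

  // Hypothetical, heavily simplified streamer: one virtual method per
  // directive. Names echo MCStreamer's; signatures are not LLVM's.
  class StreamerModel {
  public:
    virtual ~StreamerModel() = default;
    virtual void emitLabel(const std::string &Name) = 0;
    virtual void emitValue(uint64_t Value, unsigned Size) = 0;
  };

  // Like MCAsmStreamer: print one directive per call.
  class AsmStreamerModel : public StreamerModel {
    std::ostringstream OS;

  public:
    void emitLabel(const std::string &Name) override { OS << Name << ":\n"; }

    void emitValue(uint64_t Value, unsigned Size) override {
      assert((Size == 1 || Size == 2 || Size == 4 || Size == 8) &&
             "unsupported value size");
      // Assumed GNU-as-style spellings for 1/2/4/8-byte data.
      const char *Directive = Size == 1   ? ".byte"
                              : Size == 2 ? ".short"
                              : Size == 4 ? ".long"
                                          : ".quad";
      OS << "  " << Directive << ' ' << Value << '\n';
    }

    std::string str() const { return OS.str(); }
  };

An object-file implementation of the same interface would encode the bytes and
record the label's offset instead of printing text, so a client driving the
interface does not need to know which kind of output it is producing.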

For target-specific directives, the MCStreamer has an MCTargetStreamer instance.
Each target that needs it defines a class that inherits from it and is a lot
like MCStreamer itself: it has one method per directive and two classes that
inherit from it, a target object streamer and a target asm streamer. The target
asm streamer just prints it (``emitFnStart -> .fnstart``), and the object
streamer implements the assembler logic for it.

To make LLVM use these classes, the target initialization must call
TargetRegistry::RegisterAsmStreamer and TargetRegistry::RegisterMCObjectStreamer
passing callbacks that allocate the corresponding target streamer and pass it
to createAsmStreamer or to the appropriate object streamer constructor.

The ``MCContext`` class
-----------------------

The MCContext class is the owner of a variety of uniqued data structures at the
MC layer, including symbols, sections, etc.  As such, this is the class that you
interact with to create symbols and sections.  This class cannot be subclassed.

The ``MCSymbol`` class
----------------------

The MCSymbol class represents a symbol (aka label) in the assembly file.  There
are two interesting kinds of symbols: assembler temporary symbols, and normal
symbols.  Assembler temporary symbols are used and processed by the assembler
but are discarded when the object file is produced.  The distinction is usually
represented by adding a prefix to the label, for example "L" labels are
assembler temporary labels in MachO.

MCSymbols are created by MCContext and uniqued there.  This means that MCSymbols
can be compared for pointer equivalence to find out if they are the same symbol.
Note that pointer inequality does not guarantee the labels will end up at
different addresses though.  It's perfectly legal to output something like this
to the .s file:

::

  foo:
  bar:
    .byte 4

In this case, both the foo and bar symbols will have the same address.
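
The uniquing rule can be illustrated with a small C++ model of a symbol table
owned by a context. ``ContextModel`` and ``SymbolModel`` are hypothetical types;
the method name only echoes ``MCContext::getOrCreateSymbol``, and the ``Offset``
field is a stand-in for the address a symbol eventually receives.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <map>
  #include <memory>
  #include <string>

  // Hypothetical symbol: a name plus the offset assigned when it is emitted.
  struct SymbolModel {
    std::string Name;
    uint64_t Offset = 0;
  };

  // Hypothetical context that owns symbols and uniques them by name.
  class ContextModel {
    std::map<std::string, std::unique_ptr<SymbolModel>> Symbols;

  public:
    // Returns the unique symbol for Name, creating it on first use.
    SymbolModel *getOrCreateSymbol(const std::string &Name) {
      std::unique_ptr<SymbolModel> &Slot = Symbols[Name];
      if (!Slot)
        Slot.reset(new SymbolModel{Name, 0});
      return Slot.get();
    }
  };

Asking for ``"foo"`` twice yields the same pointer, while ``foo`` and ``bar``
are distinct objects that can still be given the same offset, as in the
``.byte 4`` example above.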

The ``MCSection`` class
-----------------------

The ``MCSection`` class represents an object-file specific section. It is
subclassed by object file specific implementations (e.g. ``MCSectionMachO``,
``MCSectionCOFF``, ``MCSectionELF``) and these are created and uniqued by
MCContext.  The MCStreamer has a notion of the current section, which can be
changed with the switchSection method (which corresponds to a ".section"
directive in a .s file).

.. _MCInst:

The ``MCInst`` class
--------------------

The ``MCInst`` class is a target-independent representation of an instruction.
It is a simple class (much more so than `MachineInstr`_) that holds a
target-specific opcode and a vector of MCOperands.  MCOperand, in turn, is a
simple discriminated union of three cases: 1) a simple immediate, 2) a target
register ID, 3) a symbolic expression (e.g. "``Lfoo-Lbar+42``") as an MCExpr.

MCInst is the common currency used to represent machine instructions at the MC
layer.  It is the type used by the instruction encoder, the instruction printer,
and the type generated by the assembly parser and disassembler.
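
One way to picture the three-way ``MCOperand`` union is with ``std::variant``.
This C++17 sketch is hypothetical: it keeps expressions as plain strings instead
of ``MCExpr`` trees, the opcode number is arbitrary, and the ``%r``/``$``
spelling in ``toString`` is invented for display.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <string>
  #include <variant>
  #include <vector>

  // Hypothetical operand kinds mirroring the three cases described above.
  struct Imm { int64_t Value; };
  struct Reg { unsigned RegNo; };
  struct Expr { std::string Text; }; // stand-in for an MCExpr tree

  using OperandModel = std::variant<Imm, Reg, Expr>;

  struct InstModel {
    unsigned Opcode; // target-specific opcode number
    std::vector<OperandModel> Operands;
  };

  // Render one operand by dispatching on the alternative it holds.
  std::string toString(const OperandModel &Op) {
    if (const Imm *I = std::get_if<Imm>(&Op))
      return "$" + std::to_string(I->Value);
    if (const Reg *R = std::get_if<Reg>(&Op))
      return "%r" + std::to_string(R->RegNo);
    return std::get<Expr>(Op).Text;
  }

A printer or an encoder dispatches on the active alternative in much the same
way, which is what lets a single instruction type serve all of those clients.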

.. _ObjectFormats:

Object File Format
------------------

The MC layer's object writers support a variety of object formats. Because of
target-specific aspects of object formats, each target only supports a subset of
the formats supported by the MC layer. Most targets support emitting ELF
objects. Other vendor-specific objects are generally supported only on targets
that are supported by that vendor (e.g. MachO is only supported on targets
supported by Darwin, and XCOFF is only supported on targets that support AIX).
Additionally, some targets have their own object formats (e.g. DirectX, SPIR-V
and WebAssembly).

The table below captures a snapshot of object file support in LLVM:

  .. table:: Object File Formats

     ================== ========================================================
     Format              Supported Targets
     ================== ========================================================
     ``COFF``            AArch64, ARM, X86
     ``DXContainer``     DirectX
     ``ELF``             AArch64, AMDGPU, ARM, AVR, BPF, CSKY, Hexagon, Lanai, LoongArch, M68k, MIPS, MSP430, PowerPC, RISCV, SPARC, SystemZ, VE, X86
     ``GOFF``            SystemZ
     ``MachO``           AArch64, ARM, X86
     ``SPIR-V``          SPIRV
     ``WASM``            WebAssembly
     ``XCOFF``           PowerPC
     ================== ========================================================

.. _Target-independent algorithms:
.. _code generation algorithm:

Target-independent code generation algorithms
=============================================

This section documents the phases described in the `high-level design of the
code generator`_.  It explains how they work and some of the rationale behind
their design.

.. _Instruction Selection:
.. _instruction selection section:

Instruction Selection
---------------------

Instruction Selection is the process of translating LLVM code presented to the
code generator into target-specific machine instructions.  There are several
well-known ways to do this in the literature.  LLVM uses a SelectionDAG-based
instruction selector.

Portions of the DAG instruction selector are generated from the target
description (``*.td``) files.  Our goal is for the entire instruction selector
to be generated from these ``.td`` files, though currently there are still
things that require custom C++ code.

`GlobalISel <https://llvm.org/docs/GlobalISel/index.html>`_ is another
instruction selection framework.

.. _SelectionDAG:

Introduction to SelectionDAGs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The SelectionDAG provides an abstraction for code representation in a way that
is amenable to instruction selection using automatic techniques
(e.g. dynamic-programming based optimal pattern matching selectors). It is also
well-suited to other phases of code generation; in particular, instruction
scheduling (SelectionDAGs are very close to scheduling DAGs post-selection).
Additionally, the SelectionDAG provides a host representation where a large
variety of very-low-level (but target-independent) `optimizations`_ may be
performed; ones which require extensive information about the instructions
efficiently supported by the target.

The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the
``SDNode`` class.  The primary payload of the ``SDNode`` is its operation code
(Opcode) that indicates what operation the node performs and the operands to the
operation.  The various operation node types are described at the top of the
``include/llvm/CodeGen/ISDOpcodes.h`` file.

Although most operations define a single value, each node in the graph may
define multiple values.  For example, a combined div/rem operation will define
both the quotient and the remainder. Many other situations require multiple
values as well.  Each node also has some number of operands, which are edges to
the node defining the used value.  Because nodes may define multiple values,
edges are represented by instances of the ``SDValue`` class, which is a
``<SDNode, unsigned>`` pair, indicating the node and result value being used,
respectively.  Each value produced by an ``SDNode`` has an associated ``MVT``
(Machine Value Type) indicating what the type of the value is.
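
The following hypothetical C++ model shows why an edge must name both a node and
a result number: a ``sdivrem``-style node defines two values, and each use picks
one of them. ``NodeModel`` and ``ValueModel`` are stand-ins for ``SDNode`` and
``SDValue``, the ``"arg"`` opcode is invented, and types are kept as strings
rather than ``MVT`` values.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  struct NodeModel;

  // Like SDValue: a (node, result number) pair naming one produced value.
  struct ValueModel {
    const NodeModel *Node;
    unsigned ResNo;
  };

  // Like SDNode: an opcode, operand edges, and one type per result.
  struct NodeModel {
    std::string Opcode;
    std::vector<ValueModel> Operands;
    std::vector<std::string> ResultTypes; // stand-in for one MVT per result

    ValueModel getValue(unsigned ResNo) const {
      assert(ResNo < ResultTypes.size() && "node has no such result");
      return ValueModel{this, ResNo};
    }
  };

  // Type of the value an edge refers to.
  const std::string &getValueType(ValueModel V) {
    return V.Node->ResultTypes[V.ResNo];
  }

Both the quotient (result 0) and the remainder (result 1) point at the same
node; only the result number tells them apart.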

SelectionDAGs contain two different kinds of values: those that represent data
flow and those that represent control flow dependencies.  Data values are simple
edges with an integer or floating-point value type.  Control edges are
represented as "chain" edges which are of type ``MVT::Other``.  These edges
provide an ordering between nodes that have side effects (such as loads, stores,
calls, returns, etc.).  All nodes that have side effects should take a token
chain as input and produce a new one as output.  By convention, token chain
inputs are always operand #0, and chain results are always the last value
produced by an operation. However, after instruction selection, the
machine nodes have their chain after the instruction's operands, and
may be followed by glue nodes.

A SelectionDAG has designated "Entry" and "Root" nodes.  The Entry node is
always a marker node with an Opcode of ``ISD::EntryToken``.  The Root node is
the final side-effecting node in the token chain. For example, in a single basic
block function it would be the return node.
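
Chain ordering can be modeled in the same style. In this hypothetical C++ sketch
each side-effecting node keeps its incoming chain as operand #0 and lists
``"ch"`` (standing in for ``MVT::Other``) as its last result; walking back from
the root reaches the entry token. Real DAGs can merge several chains with an
``ISD::TokenFactor`` node, which this linear sketch does not model.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // Hypothetical pre-selection node: incoming chain is operand #0, and the
  // output chain ("ch") is the last entry in Results.
  struct ChainNode {
    std::string Opcode;
    const ChainNode *InChain;         // null only for the entry node
    std::vector<std::string> Results; // last entry is the output chain
  };

  // Walk from the root back to the entry token, returning opcodes in the
  // order the chain imposes.
  std::vector<std::string> chainOrder(const ChainNode *Root) {
    std::vector<std::string> Order;
    for (const ChainNode *N = Root; N; N = N->InChain)
      Order.insert(Order.begin(), N->Opcode);
    assert(!Order.empty() && Order.front() == "EntryToken" &&
           "chain must start at the entry node");
    return Order;
  }

For a single-block function, the walk from the ``ret`` root visits the entry
token, then each side-effecting node, in program order.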

One important concept for SelectionDAGs is the notion of a "legal" vs.
"illegal" DAG.  A legal DAG for a target is one that only uses supported
operations and supported types.  On a 32-bit PowerPC, for example, a DAG with a
value of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses an
SREM or UREM operation.  The `legalize types`_ and `legalize operations`_ phases
are responsible for turning an illegal DAG into a legal DAG.

.. _SelectionDAG-Process:

SelectionDAG Instruction Selection Process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

SelectionDAG-based instruction selection consists of the following steps:

#. `Build initial DAG`_ --- This stage performs a simple translation from the
   input LLVM code to an illegal SelectionDAG.

#. `Optimize SelectionDAG`_ --- This stage performs simple optimizations on the
   SelectionDAG to simplify it, and recognize meta instructions (like rotates
   and ``div``/``rem`` pairs) for targets that support these meta operations.
   This makes the resultant code more efficient and the `select instructions
   from DAG`_ phase (below) simpler.

#. `Legalize SelectionDAG Types`_ --- This stage transforms SelectionDAG nodes
   to eliminate any types that are unsupported on the target.

#. `Optimize SelectionDAG`_ --- The SelectionDAG optimizer is run to clean up
   redundancies exposed by type legalization.

#. `Legalize SelectionDAG Ops`_ --- This stage transforms SelectionDAG nodes to
   eliminate any operations that are unsupported on the target.

#. `Optimize SelectionDAG`_ --- The SelectionDAG optimizer is run to eliminate
   inefficiencies introduced by operation legalization.

#. `Select instructions from DAG`_ --- Finally, the target instruction selector
   matches the DAG operations to target instructions.  This process translates
   the target-independent input DAG into another DAG of target instructions.

#. `SelectionDAG Scheduling and Formation`_ --- The last phase assigns a linear
   order to the instructions in the target-instruction DAG and emits them into
   the MachineFunction being compiled.  This step uses traditional prepass
   scheduling techniques.

After all of these steps are complete, the SelectionDAG is destroyed and the
rest of the code generation passes are run.

One great way to visualize what is going on here is to take advantage of a few
``llc`` command line options.  The following options pop up a window displaying
the SelectionDAG at specific times (if you only get errors printed to the console
while using this, you probably `need to configure your
system <ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ to add support for it).

* ``-view-dag-combine1-dags`` displays the DAG after being built, before the
  first optimization pass.

* ``-view-legalize-dags`` displays the DAG before Legalization.

* ``-view-dag-combine2-dags`` displays the DAG before the second optimization
  pass.

* ``-view-isel-dags`` displays the DAG before the Select phase.

* ``-view-sched-dags`` displays the DAG before Scheduling.

The ``-view-sunit-dags`` option displays the Scheduler's dependency graph.  This
graph is based on the final SelectionDAG, with nodes that must be scheduled
together bundled into a single scheduling-unit node, and with immediate operands
and other nodes that aren't relevant for scheduling omitted.

The ``-filter-view-dags`` option lets you select the name of the basic block
that you are interested in visualizing, and filters all of the previous
``-view-*-dags`` options.

.. _Build initial DAG:

Initial SelectionDAG Construction
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The initial SelectionDAG is na\ :raw-html:`&iuml;`\ vely peephole expanded from
the LLVM input by the ``SelectionDAGBuilder`` class.  The intent of this pass
is to expose as many low-level, target-specific details to the SelectionDAG as
possible.  This pass is mostly hard-coded (e.g. an LLVM ``add`` turns into an
``SDNode add`` while a ``getelementptr`` is expanded into the obvious
arithmetic). This pass requires target-specific hooks to lower calls, returns,
varargs, etc.  For these features, the :raw-html:`<tt>` `TargetLowering`_
:raw-html:`</tt>` interface is used.

.. _legalize types:
.. _Legalize SelectionDAG Types:
.. _Legalize SelectionDAG Ops:

SelectionDAG LegalizeTypes Phase
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The LegalizeTypes phase is in charge of converting a DAG to only use the types
that are natively supported by the target.

There are two main ways of converting values of unsupported scalar types to
values of supported types: converting small types to larger types ("promoting"),
and breaking up large integer types into smaller ones ("expanding").  For
example, a target might require that all f32 values are promoted to f64 and that
all i1/i8/i16 values are promoted to i32.  The same target might require that
all i64 values be expanded into pairs of i32 values.  These changes can insert
sign and zero extensions as needed to make sure that the final code has the same
behavior as the input.
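
Expansion is easiest to see on addition. Assuming only 32-bit integer
operations are available, an i64 add becomes two i32 adds plus a carry, as in
this C++ sketch. The helper names are hypothetical; as we understand it, the
real legalizer uses carry-propagating DAG nodes where the target provides them
and otherwise recovers the carry with an unsigned compare much like this one.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // A 64-bit value split into two 32-bit halves, as type expansion does.
  struct I64Parts {
    uint32_t Lo, Hi;
  };

  I64Parts split(uint64_t V) {
    return {static_cast<uint32_t>(V), static_cast<uint32_t>(V >> 32)};
  }

  uint64_t join(I64Parts P) {
    return (static_cast<uint64_t>(P.Hi) << 32) | P.Lo;
  }

  // Expanded i64 add: add the low halves, detect the carry out of the low
  // half by unsigned wrap-around, and add it into the high half.
  I64Parts expandedAdd(I64Parts A, I64Parts B) {
    uint32_t Lo = A.Lo + B.Lo;          // wraps modulo 2^32
    uint32_t Carry = Lo < A.Lo ? 1 : 0; // wrapped => carry out
    uint32_t Hi = A.Hi + B.Hi + Carry;  // wraps modulo 2^32, like i64 add
    return {Lo, Hi};
  }

For example, adding 1 to ``0x1FFFFFFFF`` wraps the low half to 0, produces a
carry, and yields a high half of 2, i.e. ``0x200000000``.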

There are two main ways of converting values of unsupported vector types to
values of supported types: splitting vector types, multiple times if necessary,
until a legal type is found, and extending vector types by adding elements to
the end to round them out to legal types ("widening").  If a vector gets split
all the way down to single-element parts with no supported vector type being
found, the elements are converted to scalars ("scalarizing").
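
A simplified C++ model of the split-or-scalarize rule follows. It assumes
power-of-two element counts and an explicit list of legal vector types, both
hypothetical simplifications; widening, and the target-specific choice between
widening and splitting, are not modeled.

.. code-block:: c++

  #include <cassert>
  #include <set>
  #include <string>

  // Hypothetical vector type: element count plus element type name.
  struct VecTy {
    unsigned NumElts;
    std::string EltTy;
    bool operator<(const VecTy &O) const {
      return NumElts != O.NumElts ? NumElts < O.NumElts : EltTy < O.EltTy;
    }
  };

  struct SplitResult {
    unsigned NumParts;
    VecTy PartTy;
    bool Scalarized; // true if no legal vector type was found
  };

  // Halve the vector until a legal type is found or one element remains.
  SplitResult splitVector(VecTy T, const std::set<VecTy> &Legal) {
    unsigned NumParts = 1;
    while (!Legal.count(T) && T.NumElts > 1) {
      assert(T.NumElts % 2 == 0 && "sketch handles power-of-two counts only");
      T.NumElts /= 2;
      NumParts *= 2;
    }
    return {NumParts, T, !Legal.count(T)};
  }

In this model, with ``v4i32`` legal, a ``v16i32`` value becomes four ``v4i32``
parts, while a ``v4i8`` value with no legal ``i8`` vector type is split all the
way down to four scalars.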

A target implementation tells the legalizer which types are supported (and which
register class to use for them) by calling the ``addRegisterClass`` method in
its ``TargetLowering`` constructor.

.. _legalize operations:
.. _Legalizer:

SelectionDAG Legalize Phase
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Legalize phase is in charge of converting a DAG to only use the operations
that are natively supported by the target.

Targets often have weird constraints, such as not supporting every operation on
every supported datatype (e.g. X86 does not support byte conditional moves and
PowerPC does not support sign-extending loads from an 8-bit memory location).
Legalize takes care of this by open-coding another sequence of operations to
emulate the operation ("expansion"), by promoting one type to a larger type that
supports the operation ("promotion"), or by using a target-specific hook to
implement the legalization ("custom").
ff9feeb5SBill Wendling
ff9feeb5SBill WendlingA target implementation tells the legalizer which operations are not supported
ff9feeb5SBill Wendling(and which of the above three actions to take) by calling the
ff9feeb5SBill Wendling``setOperationAction`` method in its ``TargetLowering`` constructor.
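
To make the shape of this table concrete, here is a toy model of the per-(operation, type) action map that ``setOperationAction`` populates. The ``Op``, ``Ty``, and ``ToyTargetLowering`` types are hypothetical stand-ins for LLVM's opcode and value-type enums and for ``TargetLowering``. The constructor merely echoes the X86 and PowerPC constraints mentioned above; it does not reproduce either backend's real settings.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <utility>

  // Toy opcodes and value types; not the real LLVM enums.
  enum class Op { Add, Select, SExtLoad16 };
  enum class Ty { i8, i16, i32, f32 };

  // "Legal" plus the three legalization actions described in the text.
  enum class LegalizeAction { Legal, Promote, Expand, Custom };

  class ToyTargetLowering {
    std::map<std::pair<Op, Ty>, LegalizeAction> Actions;

  public:
    ToyTargetLowering() {
      // No byte-sized conditional select: promote i8 selects to a wider type.
      setOperationAction(Op::Select, Ty::i8, LegalizeAction::Promote);
      // No sign-extending 16-bit load: expand into a load plus an extension.
      setOperationAction(Op::SExtLoad16, Ty::i32, LegalizeAction::Expand);
    }

    // Record how an (operation, type) pair must be legalized.
    void setOperationAction(Op O, Ty T, LegalizeAction A) {
      Actions[{O, T}] = A;
    }

    // In this model, pairs never configured are treated as natively supported.
    LegalizeAction getOperationAction(Op O, Ty T) const {
      auto It = Actions.find({O, T});
      return It == Actions.end() ? LegalizeAction::Legal : It->second;
    }
  };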
ff9feeb5SBill Wendling
6a554188SSanjay PatelIf a target has legal vector types, it is expected to produce efficient machine
6a554188SSanjay Patelcode for common forms of the shufflevector IR instruction using those types.
6a554188SSanjay PatelThis may require custom legalization for SelectionDAG vector operations that
6a554188SSanjay Patelare created from the shufflevector IR. The shufflevector forms that should be
6a554188SSanjay Patelhandled include:

* Vector select --- Each element of the result is chosen from the
  corresponding element of either of the two input vectors. This operation may
  also be known as a "blend" or "bitwise select" in target assembly. This type
  of shuffle maps directly to the ``shuffle_vector`` SelectionDAG node.

* Insert subvector --- A vector is placed into a longer vector type starting
  at index 0. This type of shuffle maps directly to the ``insert_subvector``
  SelectionDAG node with the ``index`` operand set to 0.

* Extract subvector --- A vector is pulled from a longer vector type starting
  at index 0. This type of shuffle maps directly to the ``extract_subvector``
  SelectionDAG node with the ``index`` operand set to 0.

* Splat --- All elements of the vector have identical scalar elements. This
  operation may also be known as a "broadcast" or "duplicate" in target assembly.
  The shufflevector IR instruction may change the vector length, so this operation
  may map to multiple SelectionDAG nodes including ``shuffle_vector``,
  ``concat_vectors``, ``insert_subvector``, and ``extract_subvector``.
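
The following sketch classifies a shufflevector mask into the forms listed above. This is roughly the kind of check a target's custom lowering performs before choosing an instruction. It is hypothetical and simplified:

* mask entries index into the concatenation of the two inputs;
* ``-1`` marks an undefined lane, as in LLVM IR;
* only the first input is considered for the subvector forms;
* the names ``ShuffleKind`` and ``classifyShuffle`` are invented for this example.

.. code-block:: c++

  #include <cassert>
  #include <vector>

  enum class ShuffleKind { Select, InsertSubvector, ExtractSubvector, Splat, Other };

  // Classify a shuffle of two NumInputElts-wide inputs by its Mask.
  ShuffleKind classifyShuffle(const std::vector<int> &Mask, int NumInputElts) {
    const int NumResultElts = static_cast<int>(Mask.size());
    assert(NumInputElts > 0 && NumResultElts > 0 && "empty shuffles");

    // Splat: every defined lane reads the same source element.
    int SplatIdx = -1;
    bool IsSplat = true;
    for (int M : Mask) {
      if (M < 0)
        continue;
      if (SplatIdx < 0)
        SplatIdx = M;
      else if (M != SplatIdx)
        IsSplat = false;
    }
    if (IsSplat && SplatIdx >= 0)
      return ShuffleKind::Splat;

    // Select (blend): same length, lane I comes from lane I of either input.
    if (NumResultElts == NumInputElts) {
      bool IsSelect = true;
      for (int I = 0; I < NumResultElts; ++I)
        if (Mask[I] >= 0 && Mask[I] != I && Mask[I] != I + NumInputElts)
          IsSelect = false;
      if (IsSelect)
        return ShuffleKind::Select;
    }

    // Extract subvector at index 0: a shorter prefix of the first input.
    if (NumResultElts < NumInputElts) {
      bool IsPrefix = true;
      for (int I = 0; I < NumResultElts; ++I)
        if (Mask[I] >= 0 && Mask[I] != I)
          IsPrefix = false;
      if (IsPrefix)
        return ShuffleKind::ExtractSubvector;
    }

    // Insert subvector at index 0: the first input at the start of a longer
    // vector whose remaining lanes are undefined.
    if (NumResultElts > NumInputElts) {
      bool IsInsert = true;
      for (int I = 0; I < NumResultElts; ++I) {
        int Expected = I < NumInputElts ? I : -1;
        if (Mask[I] >= 0 && Mask[I] != Expected)
          IsInsert = false;
      }
      if (IsInsert)
        return ShuffleKind::InsertSubvector;
    }
    return ShuffleKind::Other;
  }

For 4-element inputs:

* the mask ``<0, 5, 2, 7>`` is a select;
* ``<1, 1, 1, 1>`` is a splat;
* the reversal ``<3, 2, 1, 0>`` falls into ``Other``.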

Prior to the existence of the Legalize passes, we required that every target
`selector`_ supported and handled every operator and type even if they were not
natively supported.  The introduction of the Legalize phases allows all of the
canonicalization patterns to be shared across targets, and makes it very easy to
optimize the canonicalized code because it is still in the form of a DAG.

.. _optimizations:
.. _Optimize SelectionDAG:
.. _selector:

SelectionDAG Optimization Phase: the DAG Combiner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The SelectionDAG optimization phase is run multiple times for code generation,
immediately after the DAG is built and once after each legalization.  The first
run of the pass allows the initial code to be cleaned up (e.g. performing
optimizations that depend on knowing that the operators have restricted type
inputs).  Subsequent runs of the pass clean up the messy code generated by the
Legalize passes, which allows Legalize to be very simple (it can focus on making
code legal instead of focusing on generating *good* and legal code).

One important class of optimizations performed is optimizing inserted sign and
zero extension instructions.  We currently use ad-hoc techniques, but could move
to more rigorous techniques in the future.  Here are some good papers on the
subject:

"`Widening integer arithmetic <http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html>`_" :raw-html:`<br>`
Kevin Redwine and Norman Ramsey :raw-html:`<br>`
International Conference on Compiler Construction (CC) 2004

"`Effective sign extension elimination <http://portal.acm.org/citation.cfm?doid=512529.512552>`_"  :raw-html:`<br>`
Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani :raw-html:`<br>`
Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
and Implementation.

.. _Select instructions from DAG:

SelectionDAG Select Phase
^^^^^^^^^^^^^^^^^^^^^^^^^

The Select phase is the bulk of the target-specific code for instruction
selection.  This phase takes a legal SelectionDAG as input, pattern matches the
instructions supported by the target to this DAG, and produces a new DAG of
target code.  For example, consider the following LLVM fragment:

.. code-block:: llvm

  %t1 = fadd float %W, %X
  %t2 = fmul float %t1, %Y
  %t3 = fadd float %t2, %Z

This LLVM code corresponds to a SelectionDAG that looks basically like this:

.. code-block:: text

  (fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)

If a target supports floating-point multiply-and-add (FMA) operations, one of
the adds can be merged with the multiply.  On the PowerPC, for example, the
output of the instruction selector might look like this DAG:

::

  (FMADDS (FADDS W, X), Y, Z)

The ``FMADDS`` instruction is a ternary instruction that multiplies its first
two operands and adds the third (as single-precision floating-point numbers).
The ``FADDS`` instruction is a simple binary single-precision add instruction.
To perform this pattern match, the PowerPC backend includes the following
instruction definitions:

.. code-block:: text
  :emphasize-lines: 4-5,9

  def FMADDS : AForm_1<59, 29,
                      (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
                      "fmadds $FRT, $FRA, $FRC, $FRB",
                      [(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
                                             F4RC:$FRB))]>;
  def FADDS : AForm_2<59, 21,
                      (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
                      "fadds $FRT, $FRA, $FRB",
                      [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>;

The highlighted portion of the instruction definitions indicates the pattern
used to match the instructions. The DAG operators (like ``fmul``/``fadd``)
are defined in the ``include/llvm/Target/TargetSelectionDAG.td`` file.
"``F4RC``" is the register class of the input and result values.
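
The toy selector below makes the matching idea concrete. It walks a miniature expression DAG top-down and greedily prefers the larger ``FMADDS`` pattern over a plain ``FADDS``. Because ``fadd`` is commutative, it also tries the commuted form. It is only an illustration under stated assumptions:

* the ``Node`` type and ``selectNode`` function are hypothetical;
* it prints its result as text;
* a TableGen-generated selector is table-driven rather than hand-written code like this.

.. code-block:: c++

  #include <cassert>
  #include <memory>
  #include <string>
  #include <vector>

  // Hypothetical miniature DAG node: an opcode ("fadd", "fmul") with two
  // operands, or a leaf whose Op names an input value ("W", "X", ...).
  struct Node {
    std::string Op;
    std::vector<std::shared_ptr<Node>> Ops;
  };
  using NodePtr = std::shared_ptr<Node>;

  NodePtr leaf(const std::string &Name) {
    return std::make_shared<Node>(Node{Name, {}});
  }
  NodePtr node(const std::string &Op, NodePtr A, NodePtr B) {
    return std::make_shared<Node>(Node{Op, {A, B}});
  }

  static bool isMul(const NodePtr &N) {
    return N->Op == "fmul" && N->Ops.size() == 2;
  }

  // Greedy top-down selection: (fadd (fmul a, b), c) becomes FMADDS, trying
  // the fmul in either fadd operand because fadd is commutative. Anything
  // else becomes FADDS or FMULS. The result is printed as text.
  std::string selectNode(const NodePtr &N) {
    if (N->Ops.empty())
      return N->Op;
    assert(N->Ops.size() == 2 && "only binary nodes are modeled");
    if (N->Op == "fadd") {
      for (int I = 0; I < 2; ++I)
        if (isMul(N->Ops[I])) {
          const NodePtr &Mul = N->Ops[I];
          return "(FMADDS " + selectNode(Mul->Ops[0]) + ", " +
                 selectNode(Mul->Ops[1]) + ", " + selectNode(N->Ops[1 - I]) +
                 ")";
        }
      return "(FADDS " + selectNode(N->Ops[0]) + ", " +
             selectNode(N->Ops[1]) + ")";
    }
    assert(N->Op == "fmul" && "only fadd and fmul are modeled");
    return "(FMULS " + selectNode(N->Ops[0]) + ", " + selectNode(N->Ops[1]) +
           ")";
  }

Selecting the DAG ``(fadd (fmul (fadd W, X), Y), Z)`` with this sketch yields ``(FMADDS (FADDS W, X), Y, Z)``. This is the same shape as the PowerPC output shown earlier.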

The TableGen DAG instruction selector generator reads the instruction patterns
in the ``.td`` file and automatically builds parts of the pattern matching code
for your target.  It has the following strengths:

* At compiler-compile time, it analyzes your instruction patterns and tells you
  if your patterns make sense or not.

* It can handle arbitrary constraints on operands for the pattern match.  In
  particular, it is straightforward to say things like "match any immediate
  that is a 16-bit sign-extended value".  For examples, see the ``immSExt16``
  and related ``tblgen`` classes in the PowerPC backend.

* It knows several important identities for the patterns defined.  For example,
  it knows that addition is commutative, so it allows the ``FMADDS`` pattern
  above to match "``(fadd X, (fmul Y, Z))``" as well as "``(fadd (fmul X, Y),
  Z)``", without the target author having to specially handle this case.

* It has a full-featured type-inferencing system.  In particular, you should
  rarely have to explicitly tell the system what type parts of your patterns
  are.  In the ``FMADDS`` case above, we didn't have to tell ``tblgen`` that all
  of the nodes in the pattern are of type 'f32'.  It was able to infer and
  propagate this knowledge from the fact that ``F4RC`` has type 'f32'.

* Targets can define their own (and rely on built-in) "pattern fragments".
  Pattern fragments are chunks of reusable patterns that get inlined into your
  patterns during compiler-compile time.  For example, the integer
  "``(not x)``" operation is actually defined as a pattern fragment that
  expands as "``(xor x, -1)``", since the SelectionDAG does not have a native
  '``not``' operation.  Targets can define their own shorthand fragments as
  they see fit.  See the definition of '``not``' and '``ineg``' for examples.

* In addition to instructions, targets can specify arbitrary patterns that map
  to one or more instructions using the 'Pat' class.  For example, the PowerPC
  has no way to load an arbitrary integer immediate into a register in one
  instruction. To tell tblgen how to do this, it defines:

  ::

    // Arbitrary immediate support.  Implement in terms of LIS/ORI.
    def : Pat<(i32 imm:$imm),
              (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>;

  If none of the single-instruction patterns for loading an immediate into a
  register match, this pattern is used.  This rule says "match an arbitrary i32
  immediate, turning it into an ``ORI`` ('or a 16-bit immediate') and an ``LIS``
  ('load 16-bit immediate, where the immediate is shifted to the left 16 bits')
  instruction".  To make this work, the ``LO16``/``HI16`` node transformations
  are used to manipulate the input immediate (in this case, take the high or low
  16 bits of the immediate).

* When using the 'Pat' class to map a pattern to an instruction that has one
  or more complex operands (such as an `X86 addressing mode`_), the pattern may
  either specify the operand as a whole using a ``ComplexPattern``, or else it
  may specify the components of the complex operand separately.  The latter is
  done, for example, for pre-increment instructions in the PowerPC backend:

  ::

    def STWU  : DForm_1<37, (outs ptr_rc:$ea_res), (ins GPRC:$rS, memri:$dst),
                    "stwu $rS, $dst", LdStStoreUpd, []>,
                    RegConstraint<"$dst.reg = $ea_res">, NoEncode<"$ea_res">;

    def : Pat<(pre_store GPRC:$rS, ptr_rc:$ptrreg, iaddroff:$ptroff),
              (STWU GPRC:$rS, iaddroff:$ptroff, ptr_rc:$ptrreg)>;

  Here, the pair of ``ptroff`` and ``ptrreg`` operands is matched onto the
  complex operand ``dst`` of class ``memri`` in the ``STWU`` instruction.

* While the system does automate a lot, it still allows you to write custom C++
  code to match special cases if there is something that is hard to
  express.
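
The ``LIS``/``ORI`` pattern described in the list above relies on a simple identity: shifting the high half left by 16 bits and OR-ing in the low half reconstructs the original 32-bit value. The sketch below checks that identity on plain integers. The helper names are hypothetical and only model the arithmetic on a 32-bit value. Because ``ORI`` ORs in a zero-extended immediate, the high half needs no adjustment. An add-based sequence with a sign-extended low half would need one.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Models the HI16/LO16 transforms: the high or low 16 bits of an immediate.
  uint16_t hi16(uint32_t Imm) { return static_cast<uint16_t>(Imm >> 16); }
  uint16_t lo16(uint32_t Imm) { return static_cast<uint16_t>(Imm & 0xFFFF); }

  // Models LIS on a 32-bit value: the immediate shifted left by 16 bits.
  uint32_t lis(uint16_t Hi) { return static_cast<uint32_t>(Hi) << 16; }

  // Models ORI: OR a zero-extended 16-bit immediate into a register value.
  uint32_t ori(uint32_t Reg, uint16_t Lo) { return Reg | Lo; }

  // (ORI (LIS (HI16 imm)), (LO16 imm)) rebuilds the original immediate.
  uint32_t materialize(uint32_t Imm) { return ori(lis(hi16(Imm)), lo16(Imm)); }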

While it has many strengths, the system currently has some limitations,
primarily because it is a work in progress and is not yet finished:

* Overall, there is no way to define or match SelectionDAG nodes that define
  multiple values (e.g. ``SMUL_LOHI``, ``LOAD``, ``CALL``, etc.).  This is the
  biggest reason that you currently still *have to* write custom C++ code
  for your instruction selector.

* There is no great way to support matching complex addressing modes yet.  In
  the future, we will extend pattern fragments to allow them to define multiple
  values (e.g. the four operands of the `X86 addressing mode`_, which are
  currently matched with custom C++ code).  In addition, we'll extend fragments
  so that a fragment can match multiple different patterns.

* We don't automatically infer flags like ``isStore``/``isLoad`` yet.

* We don't automatically generate the set of supported registers and operations
  for the `Legalizer`_ yet.

* We don't have a way of tying in custom legalized nodes yet.

Despite these limitations, the instruction selector generator is still quite
useful for most of the binary and logical operations in typical instruction
sets.  If you run into any problems or can't figure out how to do something,
please let Chris know!

.. _Scheduling and Formation:
.. _SelectionDAG Scheduling and Formation:

SelectionDAG Scheduling and Formation Phase
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The scheduling phase takes the DAG of target instructions from the selection
phase and assigns an order.  The scheduler can pick an order depending on
various constraints of the machine (e.g. ordering to minimize register pressure
or to cover instruction latencies).  Once an order is established, the DAG is
converted to a list of :raw-html:`<tt>` `MachineInstr`_\s :raw-html:`</tt>` and
the SelectionDAG is destroyed.

Note that this phase is logically separate from the instruction selection phase,
but is tied to it closely in the code because it operates on SelectionDAGs.
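
A common way to cover instruction latencies is list scheduling. Ready nodes are ordered by a priority such as their height, the longest latency path from the node to the end of the DAG. The sketch below is a hypothetical, simplified top-down list scheduler over a DAG given as successor lists, and ``listSchedule`` is an invented name. It ignores register pressure, functional units, and cycle-by-cycle readiness, which real schedulers also take into account.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <cstddef>
  #include <queue>
  #include <utility>
  #include <vector>

  // Succs[N] lists the nodes that consume N's result; Latency[N] is N's
  // latency in cycles. Returns the nodes in the order they are scheduled.
  std::vector<int> listSchedule(const std::vector<std::vector<int>> &Succs,
                                const std::vector<int> &Latency) {
    const int N = static_cast<int>(Succs.size());
    assert(static_cast<int>(Latency.size()) == N && "one latency per node");

    // Count each node's predecessors; a node is ready once all are scheduled.
    std::vector<int> NumPreds(N, 0);
    for (const auto &S : Succs)
      for (int V : S)
        ++NumPreds[V];

    // Find any topological order (Kahn's algorithm) to compute heights.
    std::vector<int> Pending = NumPreds, Topo;
    for (int V = 0; V < N; ++V)
      if (Pending[V] == 0)
        Topo.push_back(V);
    for (std::size_t I = 0; I < Topo.size(); ++I)
      for (int S : Succs[Topo[I]])
        if (--Pending[S] == 0)
          Topo.push_back(S);
    assert(static_cast<int>(Topo.size()) == N && "the DAG must be acyclic");

    // Height = own latency + the largest successor height (exits first).
    std::vector<int> Height(N, 0);
    for (int I = N - 1; I >= 0; --I) {
      int V = Topo[I], MaxSucc = 0;
      for (int S : Succs[V])
        MaxSucc = std::max(MaxSucc, Height[S]);
      Height[V] = Latency[V] + MaxSucc;
    }

    // Repeatedly pick the ready node with the greatest height; ties go to
    // the lower node number because -V is larger for smaller V.
    std::priority_queue<std::pair<int, int>> Ready;
    for (int V = 0; V < N; ++V)
      if (NumPreds[V] == 0)
        Ready.push({Height[V], -V});
    std::vector<int> Order;
    while (!Ready.empty()) {
      int V = -Ready.top().second;
      Ready.pop();
      Order.push_back(V);
      for (int S : Succs[V])
        if (--NumPreds[S] == 0)
          Ready.push({Height[S], -S});
    }
    return Order;
  }

Suppose a constant (node 0, latency 1) and a load (node 1, latency 3) both feed an add. The sketch issues the load first, because the load lies on the longest path.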

Future directions for the SelectionDAG
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#. Optional function-at-a-time selection.

#. Auto-generate entire selector from ``.td`` file.

.. _SSA-based Machine Code Optimizations:

SSA-based Machine Code Optimizations
------------------------------------

To Be Written

Live Intervals
--------------

Live Intervals are the ranges (intervals) where a variable is *live*.  They are
used by some `register allocator`_ passes to determine if two or more virtual
registers which require the same physical register are live at the same point in
the program (i.e., they conflict).  When this situation occurs, one virtual
register must be *spilled*.

Live Variable Analysis
^^^^^^^^^^^^^^^^^^^^^^

The first step in determining the live intervals of variables is to calculate
the set of registers that are immediately dead after each instruction (i.e., the
instruction calculates the value, but it is never used) and the set of registers
that are used by the instruction, but are never used after the instruction
(i.e., they are killed). Live variable information is computed for
each *virtual* register and *register allocatable* physical register
in the function.  This is done in a very efficient manner because it uses SSA to
sparsely compute lifetime information for virtual registers (which are in SSA
form) and only has to track physical registers within a block.  Before register
allocation, LLVM can assume that physical registers are only live within a
single basic block.  This allows it to do a single, local analysis to resolve
physical register lifetimes within each basic block. If a physical register is
not register allocatable (e.g., a stack pointer or condition codes), it is not
tracked.
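
Within a single basic block, dead definitions and kills can be found with one backward walk. The walk tracks which registers are live after the current instruction. The sketch below shows that idea on a hypothetical straight-line block; ``Instr``, ``LiveInfo``, and ``computeLiveness`` are invented names. The real ``LiveVariables`` analysis additionally handles multiple blocks, ``PHI`` nodes, and physical register aliasing.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <set>
  #include <utility>
  #include <vector>

  // Hypothetical instruction: the registers it defines and uses.
  struct Instr {
    std::vector<unsigned> Defs, Uses;
  };

  struct LiveInfo {
    std::vector<std::set<unsigned>> Dead;   // per instruction: unused defs
    std::vector<std::set<unsigned>> Killed; // per instruction: last uses
  };

  // Walk the block bottom-up, keeping the set of registers live *after* the
  // current instruction. A def that is not live there is dead; a use that
  // is not live there is the last use of that register, i.e. a kill.
  LiveInfo computeLiveness(const std::vector<Instr> &Block,
                           std::set<unsigned> LiveOut) {
    LiveInfo Info;
    Info.Dead.resize(Block.size());
    Info.Killed.resize(Block.size());
    std::set<unsigned> Live = std::move(LiveOut);
    for (std::size_t I = Block.size(); I-- > 0;) {
      for (unsigned D : Block[I].Defs) {
        if (!Live.count(D))
          Info.Dead[I].insert(D);
        Live.erase(D); // not live above its definition
      }
      for (unsigned U : Block[I].Uses) {
        if (!Live.count(U))
          Info.Killed[I].insert(U);
        Live.insert(U);
      }
    }
    return Info;
  }

Consider the block ``v1 = ...; v2 = f(v1); v3 = g(v1, v2); v4 = ...`` with only ``v3`` live out. The third instruction kills both ``v1`` and ``v2``, and the definition of ``v4`` is dead.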

Physical registers may be live into or out of a function. Live-in values are
typically arguments in registers. Live-out values are typically return values in
registers. Live-in values are marked as such, and are given a dummy "defining"
instruction during live intervals analysis. If the last basic block of a
function ends with a ``return``, then it's marked as using all live-out values
in the function.

``PHI`` nodes need to be handled specially, because the calculation of the live
variable information from a depth-first traversal of the CFG of the function
won't guarantee that a virtual register used by the ``PHI`` node is defined
before it's used. When a ``PHI`` node is encountered, only the definition is
handled, because the uses will be handled in other basic blocks.

For each ``PHI`` node of the current basic block, we simulate an assignment at
the end of the current basic block and traverse the successor basic blocks. If a
successor basic block has a ``PHI`` node and one of the ``PHI`` node's operands
is coming from the current basic block, then the variable is marked as *alive*
within the current basic block and all of its predecessor basic blocks, until
the basic block with the defining instruction is encountered.
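
The upward walk in the last step can be sketched as a worklist traversal over predecessor edges that stops at the defining block. This is a hypothetical simplification, not the code of LLVM's ``LiveVariables`` pass:

* ``markAliveUpward`` is an invented name;
* blocks are plain integers;
* ``Preds[B]`` lists the predecessors of block ``B``.

.. code-block:: c++

  #include <cassert>
  #include <set>
  #include <vector>

  // A value defined in DefBB is needed at the end of UseBB (for example, as
  // a PHI operand in one of UseBB's successors). Mark every block the value
  // is live through by walking predecessor edges upward from UseBB, without
  // walking past the defining block.
  std::set<int> markAliveUpward(const std::vector<std::vector<int>> &Preds,
                                int DefBB, int UseBB) {
    assert(DefBB >= 0 && UseBB >= 0 && "block numbers are non-negative");
    std::set<int> Alive;
    std::vector<int> Worklist = {UseBB};
    while (!Worklist.empty()) {
      int BB = Worklist.back();
      Worklist.pop_back();
      if (!Alive.insert(BB).second)
        continue; // already visited, e.g. through a loop back edge
      if (BB == DefBB)
        continue; // the definition ends the upward walk
      for (int P : Preds[BB])
        Worklist.push_back(P);
    }
    return Alive;
  }

Consider a diamond CFG where block 0 branches to blocks 1 and 2, which both flow into block 3. A value defined in block 0 and used by a ``PHI`` in block 3 for the edge from block 1 is marked alive in blocks 1 and 0 only.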

Live Intervals Analysis
^^^^^^^^^^^^^^^^^^^^^^^

We now have the information available to perform the live intervals analysis and
build the live intervals themselves.  We start off by numbering the basic blocks
and machine instructions.  We then handle the "live-in" values.  These are in
physical registers, so the physical register is assumed to be killed by the end
of the basic block.  Live intervals for virtual registers are computed for some
ordering of the machine instructions ``[1, N]``.  A live interval is an interval
``[i, j)``, where ``1 <= i < j <= N``, for which a variable is live.
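
The sketch below is a hypothetical illustration of this numbering. It builds ``[i, j)`` intervals for virtual registers in a single straight-line block numbered ``1..N``, where ``i`` is the defining instruction and ``j`` the last use. It then checks two intervals for the kind of conflict described at the start of this section. The sketch assumes that an interval ends at its last use, so a value last used by instruction ``j`` may share a register with a value defined by ``j``. LLVM's ``LiveIntervals`` uses finer-grained slot indexes within each instruction instead.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <map>
  #include <vector>

  // A half-open live interval [Start, End) over instruction numbers.
  struct Interval {
    unsigned Start, End;
  };

  // Defs[I - 1] and Uses[I - 1] list the virtual registers defined and used
  // by instruction I of a straight-line block numbered 1..N. Every register
  // is assumed to be defined once, before its uses, and used at least once.
  std::map<unsigned, Interval>
  buildIntervals(const std::vector<std::vector<unsigned>> &Defs,
                 const std::vector<std::vector<unsigned>> &Uses) {
    assert(Defs.size() == Uses.size() && "one entry per instruction");
    std::map<unsigned, Interval> Intervals;
    for (unsigned I = 1; I <= Defs.size(); ++I) {
      for (unsigned R : Uses[I - 1]) {
        auto It = Intervals.find(R);
        assert(It != Intervals.end() && "use before def");
        It->second.End = std::max(It->second.End, I);
      }
      for (unsigned R : Defs[I - 1])
        Intervals[R] = {I, I}; // empty until a later use extends it
    }
    for (const auto &KV : Intervals)
      assert(KV.second.Start < KV.second.End && "dead defs are not modeled");
    return Intervals;
  }

  // Two values conflict (cannot share a physical register) if they overlap.
  bool conflicts(Interval A, Interval B) {
    return A.Start < B.End && B.Start < A.End;
  }

Take the instructions ``1: v1 = ...``, ``2: v2 = f(v1)``, ``3: v3 = g(v1, v2)``, and ``4: use v3``. The resulting intervals are ``[1, 3)``, ``[2, 3)``, and ``[3, 4)``. So ``v1`` conflicts with ``v2`` but not with ``v3``.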

.. note::
  More to come...

.. _Register Allocation:
.. _register allocator:

Register Allocation
-------------------

The *Register Allocation problem* consists of mapping a program
:raw-html:`<b><tt>` P\ :sub:`v`\ :raw-html:`</tt></b>`, that can use an unbounded
number of virtual registers, to a program :raw-html:`<b><tt>` P\ :sub:`p`\
:raw-html:`</tt></b>` that contains a finite (possibly small) number of physical
registers. Each target architecture has a different number of physical
registers. If the number of physical registers is not enough to accommodate all
the virtual registers, some of them will have to be mapped into memory. These
virtuals are called *spilled virtuals*.

How registers are represented in LLVM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In LLVM, physical registers are denoted by integer numbers that normally range
from 1 to 1023. To see how this numbering is defined for a particular
architecture, you can read the ``<Target>GenRegisterInfo.inc`` file that
TableGen generates for that architecture. For instance, by inspecting
``lib/Target/X86/X86GenRegisterInfo.inc`` in the build directory, we see that
the 32-bit register ``EAX`` is denoted by 43, and the MMX register ``MM0`` is
mapped to 65.

Some architectures contain registers that share the same physical location. A
notable example is the X86 platform. For instance, in the X86 architecture, the
registers ``EAX``, ``AX`` and ``AL`` share the first eight bits. These physical
registers are marked as *aliased* in LLVM. Given a particular architecture, you
can check which registers are aliased by inspecting its ``RegisterInfo.td``
file. Moreover, the class ``MCRegAliasIterator`` enumerates all the physical
registers aliased to a register.
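
One way to think about aliasing is to describe each register by the disjoint pieces (units) of storage it occupies. Two registers alias exactly when they share a unit. The sketch below models the ``EAX``/``AX``/``AL``/``AH`` family this way. It is a hypothetical model, and the unit names are invented here, including ``EAXHi16`` for the upper half of ``EAX``. LLVM does track register units internally, but this is not its data structure.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  // Each register maps to the set of disjoint storage units it occupies.
  const std::map<std::string, std::set<std::string>> RegUnits = {
      {"EAX", {"AL", "AH", "EAXHi16"}},
      {"AX", {"AL", "AH"}},
      {"AL", {"AL"}},
      {"AH", {"AH"}},
  };

  // Two registers alias exactly when their unit sets intersect.
  bool regsAlias(const std::string &A, const std::string &B) {
    for (const std::string &U : RegUnits.at(A))
      if (RegUnits.at(B).count(U))
        return true;
    return false;
  }

  // Every register that aliases Reg, including Reg itself, in name order.
  std::vector<std::string> aliasesOf(const std::string &Reg) {
    std::vector<std::string> Result;
    for (const auto &KV : RegUnits)
      if (regsAlias(Reg, KV.first))
        Result.push_back(KV.first);
    return Result;
  }

In this model, ``aliasesOf("AL")`` returns ``AL``, ``AX``, and ``EAX`` but not ``AH``. This is the kind of enumeration ``MCRegAliasIterator`` provides.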

Physical registers, in LLVM, are grouped in *Register Classes*.  Elements in the
same register class are functionally equivalent, and can be interchangeably
used. Each virtual register can only be mapped to physical registers of a
particular class. For instance, in the X86 architecture, some virtuals can only
be allocated to 8-bit registers.  A register class is described by
``TargetRegisterClass`` objects.  To discover if a virtual register is
compatible with a given physical register, this code can be used:

.. code-block:: c++

  bool RegMapping_Fer::compatible_class(MachineFunction &mf,
                                        unsigned v_reg,
                                        unsigned p_reg) {
    assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &&
           "Target register must be physical");
    // A virtual register may only be assigned to members of its class.
    const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
    return trc->contains(p_reg);
  }

Sometimes, mostly for debugging purposes, it is useful to change the number of
physical registers available in the target architecture. This must be done
statically, inside the ``TargetRegisterInfo.td`` file. Just ``grep`` for
``RegisterClass``, the last parameter of which is a list of registers.
Commenting some of them out is one simple way to keep them from being used. A
more polite way is to explicitly exclude some registers from the
*allocation order*. See the definition of the ``GR8`` register class in
``lib/Target/X86/X86RegisterInfo.td`` for an example of this.

Virtual registers are also denoted by integer numbers. Unlike physical
registers, different virtual registers never share the same number. Whereas
physical registers are statically defined in a ``TargetRegisterInfo.td`` file
and cannot be created by the application developer, that is not the case with
virtual registers. In order to create new virtual registers, use the method
``MachineRegisterInfo::createVirtualRegister()``. This method will return a new
virtual register. Use an ``IndexedMap<Foo, VirtReg2IndexFunctor>`` to hold
information per virtual register. If you need to enumerate all virtual
registers, use the function ``TargetRegisterInfo::index2VirtReg()`` to find the
virtual register numbers:

.. code-block:: c++

    for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
      // Map the dense index i back to a virtual register number.
      unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i);
      stuff(VirtReg); // placeholder for per-register work
    }

Before register allocation, the operands of an instruction are mostly virtual
registers, although physical registers may also be used. In order to check if a
given machine operand is a register, use the boolean function
``MachineOperand::isReg()``. To obtain the integer code of a register, use
``MachineOperand::getReg()``. An instruction may define or use a register. For
instance, ``ADD reg:1026 := reg:1025 reg:1024`` defines register 1026 and uses
registers 1025 and 1024. Given a register operand, the method
``MachineOperand::isUse()`` tells whether that register is being used by the
instruction, and the method ``MachineOperand::isDef()`` tells whether that
register is being defined.

We will call physical registers present in the machine code before register
allocation *pre-colored registers*. Pre-colored registers are used in many
different situations, for instance, to pass parameters of function calls, and
to store results of particular instructions. There are two types of pre-colored
registers: the ones *implicitly* defined, and those *explicitly*
defined. Explicitly defined registers are normal operands, and can be accessed
with ``MachineInstr::getOperand(int).getReg()``.  In order to check which
registers are implicitly defined by an instruction, use
``TargetInstrInfo::get(opcode).ImplicitDefs``, where ``opcode`` is the opcode
of the target instruction. One important difference between explicit and
implicit physical registers is that the latter are defined statically for each
instruction, whereas the former may vary depending on the program being
compiled. For example, an instruction that represents a function call will
always implicitly define or use the same set of physical registers. To read the
registers implicitly used by an instruction, use
``TargetInstrInfo::get(opcode).ImplicitUses``. Pre-colored registers impose
constraints on any register allocation algorithm. The register allocator must
make sure that none of them are overwritten by the values of virtual registers
while still alive.
ff9feeb5SBill Wendling
ff9feeb5SBill WendlingMapping virtual registers to physical registers
ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ff9feeb5SBill Wendling
ff9feeb5SBill WendlingThere are two ways to map virtual registers to physical registers (or to memory
ff9feeb5SBill Wendlingslots). The first way, that we will call *direct mapping*, is based on the use
ff9feeb5SBill Wendlingof methods of the classes ``TargetRegisterInfo``, and ``MachineOperand``. The
ff9feeb5SBill Wendlingsecond way, that we will call *indirect mapping*, relies on the ``VirtRegMap``
ff9feeb5SBill Wendlingclass in order to insert loads and stores sending and getting values to and from
ff9feeb5SBill Wendlingmemory.
ff9feeb5SBill Wendling
ff9feeb5SBill WendlingThe direct mapping provides more flexibility to the developer of the register
ff9feeb5SBill Wendlingallocator; however, it is more error prone, and demands more implementation
ff9feeb5SBill Wendlingwork.  Basically, the programmer will have to specify where load and store
ff9feeb5SBill Wendlinginstructions should be inserted in the target function being compiled in order
ff9feeb5SBill Wendlingto get and store values in memory. To assign a physical register to a virtual
ff9feeb5SBill Wendlingregister present in a given operand, use ``MachineOperand::setReg(p_reg)``. To
ff9feeb5SBill Wendlinginsert a store instruction, use ``TargetInstrInfo::storeRegToStackSlot(...)``,
ff9feeb5SBill Wendlingand to insert a load instruction, use ``TargetInstrInfo::loadRegFromStackSlot``.
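
Direct mapping can be illustrated on a toy instruction representation instead
of LLVM's classes. In the C++ sketch below, ``ToyOperand``, ``ToyInstr``,
``Location`` and ``rewriteDirect`` are hypothetical, ``setReg`` merely imitates
``MachineOperand::setReg``, and register numbers of 100 and above are assumed
to be virtual. It shows the allocator doing all of the work itself: it rewrites
each virtual register operand in place and emits a reload before a use, or a
store after a definition, of every value that lives in a stack slot.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  // Hypothetical toy IR: register numbers >= FirstVirtReg are virtual.
  constexpr int FirstVirtReg = 100;

  struct ToyOperand {
    int Reg;
    bool IsDef;
    void setReg(int R) { Reg = R; }  // imitates MachineOperand::setReg
  };

  struct ToyInstr {
    std::string Opcode;
    std::vector<ToyOperand> Ops;
    int Slot = -1;  // stack slot accessed by LOAD/STORE, -1 otherwise
  };

  // The allocator's decision for one virtual register.
  struct Location {
    int PhysReg;    // register holding the value around each instruction
    int Slot = -1;  // stack slot if the value lives in memory, else -1
  };

  // Direct mapping: the allocator rewrites operands in place and inserts
  // the spill code itself.
  std::vector<ToyInstr> rewriteDirect(const std::vector<ToyInstr> &Code,
                                      const std::map<int, Location> &Assign) {
    std::vector<ToyInstr> Out;
    for (ToyInstr MI : Code) {  // copy, so the input stays untouched
      std::vector<ToyInstr> Stores;
      for (ToyOperand &MO : MI.Ops) {
        if (MO.Reg < FirstVirtReg)
          continue;  // pre-colored physical register: leave it alone
        assert(Assign.count(MO.Reg) && "every virtual register needs a location");
        const Location &L = Assign.at(MO.Reg);
        if (L.Slot >= 0) {
          if (MO.IsDef)  // store the new value right after the definition
            Stores.push_back({"STORE", {{L.PhysReg, false}}, L.Slot});
          else           // reload the value right before the use
            Out.push_back({"LOAD", {{L.PhysReg, true}}, L.Slot});
        }
        MO.setReg(L.PhysReg);
      }
      Out.push_back(MI);
      Out.insert(Out.end(), Stores.begin(), Stores.end());
    }
    return Out;
  }

Every choice (where the value lives, which register holds it, where the spill
code goes) is made and materialized by the allocator, which is what makes this
approach flexible but error prone.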

The indirect mapping shields the application developer from the complexities of
inserting load and store instructions. To map a virtual register to a physical
one, use ``VirtRegMap::assignVirt2Phys(vreg, preg)``.  To map a certain virtual
register to memory, use
``VirtRegMap::assignVirt2StackSlot(vreg)``. This method will return the stack
slot where ``vreg``'s value will be located.  If it is necessary to map another
virtual register to the same stack slot, use
``VirtRegMap::assignVirt2StackSlot(vreg, stack_location)``. One important point
to consider when using the indirect mapping is that even if a virtual register
is mapped to memory, it still needs to be mapped to a physical register. This
physical register is the location where the virtual register is supposed to be
found before being stored or after being reloaded.
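
The bookkeeping half of the indirect approach is sketched below.
``ToyVirtRegMap`` is a hypothetical class whose three assignment methods mirror
the ``VirtRegMap`` calls named above; the real class works with LLVM's own
register and stack-slot types, so its exact signatures may differ. Nothing here
touches instructions: the map only records decisions for a spiller to act on
later.

.. code-block:: c++

  #include <cassert>
  #include <map>

  // Hypothetical stand-in for VirtRegMap: it records decisions only.
  class ToyVirtRegMap {
    std::map<unsigned, unsigned> Virt2Phys;
    std::map<unsigned, int> Virt2Slot;
    int NextSlot = 0;

  public:
    void assignVirt2Phys(unsigned VReg, unsigned PReg) { Virt2Phys[VReg] = PReg; }

    // Creates a fresh stack slot for VReg and returns it.
    int assignVirt2StackSlot(unsigned VReg) {
      int Slot = NextSlot++;
      Virt2Slot[VReg] = Slot;
      return Slot;
    }

    // Puts VReg in a slot that was already handed out for another register.
    void assignVirt2StackSlot(unsigned VReg, int Slot) {
      assert(Slot >= 0 && Slot < NextSlot && "unknown stack slot");
      Virt2Slot[VReg] = Slot;
    }

    bool hasPhys(unsigned VReg) const { return Virt2Phys.count(VReg) != 0; }
    unsigned getPhys(unsigned VReg) const { return Virt2Phys.at(VReg); }

    // Returns -1 when VReg was never mapped to memory.
    int getStackSlot(unsigned VReg) const {
      auto It = Virt2Slot.find(VReg);
      return It == Virt2Slot.end() ? -1 : It->second;
    }
  };

As the text requires, a virtual register sent to a stack slot should still be
given a physical register with ``assignVirt2Phys``; the sketch does not enforce
this.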

If the indirect strategy is used, after all the virtual registers have been
mapped to physical registers or stack slots, it is necessary to use a spiller
object to place load and store instructions in the code. Every virtual register
that has been mapped to a stack slot will be stored to memory after being
defined and will be loaded before being used. The implementation of the spiller
tries to recycle load/store instructions, avoiding unnecessary instructions. For
an example of how a register allocator invokes the spiller, see
``RABasic::selectOrSplit`` in ``lib/CodeGen/RegAllocBasic.cpp``.
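
A spiller can then be sketched as one pass over the code. This is a simplified
illustration, not LLVM's spiller: ``Instr`` and ``runSpiller`` are
hypothetical, every instruction defines at most one register, and the two maps
play the roles of the physical-register and stack-slot assignments. As a small
example of avoiding unnecessary instructions, a spilled register that one
instruction reads twice is reloaded only once.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  // Hypothetical toy instruction: at most one def, any number of uses.
  struct Instr {
    std::string Op;
    int Def = -1;           // defined register, -1 if none
    std::vector<int> Uses;  // used registers
    int Slot = -1;          // stack slot for LOAD/STORE, -1 otherwise
  };

  // Keys of VirtToPhys are the virtual registers. Every key of VirtToSlot
  // must also be a key of VirtToPhys: a spilled value still needs a register
  // to be reloaded into or stored from.
  std::vector<Instr> runSpiller(const std::vector<Instr> &Code,
                                const std::map<int, int> &VirtToPhys,
                                const std::map<int, int> &VirtToSlot) {
    for (const auto &Entry : VirtToSlot)
      assert(VirtToPhys.count(Entry.first) && "spilled vreg lacks a register");
    std::vector<Instr> Out;
    for (Instr MI : Code) {
      std::vector<int> Reloaded;  // spilled vregs already reloaded for MI
      for (int &R : MI.Uses) {
        auto P = VirtToPhys.find(R);
        if (P == VirtToPhys.end())
          continue;  // already a physical register
        auto S = VirtToSlot.find(R);
        if (S != VirtToSlot.end() &&
            std::find(Reloaded.begin(), Reloaded.end(), R) == Reloaded.end()) {
          Out.push_back({"LOAD", P->second, {}, S->second});  // reload before use
          Reloaded.push_back(R);
        }
        R = P->second;
      }
      int StoreSlot = -1;
      if (auto P = VirtToPhys.find(MI.Def); P != VirtToPhys.end()) {
        if (auto S = VirtToSlot.find(MI.Def); S != VirtToSlot.end())
          StoreSlot = S->second;
        MI.Def = P->second;
      }
      Out.push_back(MI);
      if (StoreSlot >= 0)  // store right after the definition
        Out.push_back({"STORE", -1, {MI.Def}, StoreSlot});
    }
    return Out;
  }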

Handling two address instructions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With very rare exceptions (e.g., function calls), the LLVM machine code
instructions are three address instructions. That is, each instruction is
expected to define at most one register, and to use at most two registers.
However, some architectures use two address instructions. In this case, the
defined register is also one of the used registers. For instance, an instruction
such as ``ADD %EAX, %EBX`` in X86 is actually equivalent to ``%EAX = %EAX +
%EBX``.

In order to produce correct code, LLVM must convert three address instructions
that represent two address instructions into true two address instructions. LLVM
provides the pass ``TwoAddressInstructionPass`` for this specific purpose. It
must be run before register allocation takes place. After its execution, the
resulting code may no longer be in SSA form. This happens, for instance, in
situations where an instruction such as ``%a = ADD %b %c`` is converted to two
instructions such as:

::

  %a = COPY %b
  %a = ADD %a %c

Notice that, internally, the second instruction is represented as ``ADD
%a[def/use] %c``. I.e., the register operand ``%a`` is both used and defined by
the instruction.
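
The rewrite of a single instruction can be modelled as follows.
``ThreeAddr`` and ``makeTwoAddress`` are hypothetical, and the sketch treats
every instruction as two-address with its destination tied to the first
source. The real ``TwoAddressInstructionPass`` consults each instruction's
tied-operand constraints and tries cheaper alternatives, such as commuting the
operands, before it falls back to inserting a copy.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // Hypothetical SSA three-address instruction: Dst = Op Src1, Src2.
  struct ThreeAddr {
    std::string Op, Dst, Src1, Src2;
  };

  // Lowers each instruction to the two-address shape: copy the first source
  // into the destination, then make the destination a def/use operand.
  std::vector<ThreeAddr> makeTwoAddress(const std::vector<ThreeAddr> &Code) {
    std::vector<ThreeAddr> Out;
    for (const ThreeAddr &I : Code) {
      // In SSA form the destination is a fresh value and cannot also be a
      // source, which is what makes the unconditional copy below safe.
      assert(I.Dst != I.Src1 && I.Dst != I.Src2 && "input must be in SSA form");
      Out.push_back({"COPY", I.Dst, I.Src1, ""});
      Out.push_back({I.Op, I.Dst, I.Dst, I.Src2});  // Dst is used and defined
    }
    return Out;
  }

The output defines ``%a`` twice, which is exactly why the code is no longer in
SSA form after this step.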

The SSA deconstruction phase
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An important transformation that happens during register allocation is called
the *SSA Deconstruction Phase*. The SSA form simplifies many analyses that are
performed on the control flow graph of programs. However, traditional
instruction sets do not implement PHI instructions. Thus, in order to generate
executable code, compilers must replace PHI instructions with other instructions
that preserve their semantics.

There are many ways in which PHI instructions can safely be removed from the
target code. The most traditional PHI deconstruction algorithm replaces PHI
instructions with copy instructions. That is the strategy adopted by LLVM. The
SSA deconstruction algorithm is implemented in
``lib/CodeGen/PHIElimination.cpp``. In order to invoke this pass, the identifier
``PHIEliminationID`` must be marked as required in the code of the register
allocator.
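
The basic idea of replacing PHIs with copies can be shown on a toy control flow
graph. ``Phi``, ``Block`` and ``eliminatePhis`` are hypothetical, and the
sketch is deliberately naive: each PHI becomes one copy at the end of every
predecessor, just before its terminator. The pass in ``PHIElimination.cpp`` has
to handle situations this sketch ignores, such as critical edges.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical toy CFG: blocks are named, instructions are plain strings.
  struct Phi {
    std::string Dst;
    std::vector<std::pair<std::string, std::string>> Incoming;  // (value, pred)
  };

  struct Block {
    std::vector<Phi> Phis;
    std::vector<std::string> Body;  // the last entry is the terminator
  };

  // Replaces every PHI with a copy in each predecessor, placed just before
  // the predecessor's terminator so that it runs on the way into the block.
  void eliminatePhis(std::map<std::string, Block> &Fn) {
    for (auto &Entry : Fn) {
      Block &BB = Entry.second;
      for (const Phi &P : BB.Phis)
        for (const auto &[Value, Pred] : P.Incoming) {
          std::vector<std::string> &PredBody = Fn.at(Pred).Body;
          assert(!PredBody.empty() && "predecessor needs a terminator");
          PredBody.insert(PredBody.end() - 1, P.Dst + " = COPY " + Value);
        }
      BB.Phis.clear();
    }
  }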

Instruction folding
^^^^^^^^^^^^^^^^^^^

*Instruction folding* is an optimization performed during register allocation
that removes unnecessary copy instructions. For instance, a sequence of
instructions such as:

::

  %EBX = LOAD %mem_address
  %EAX = COPY %EBX

can be safely replaced by the single instruction below, provided that ``%EBX``
is not read afterwards:

::

  %EAX = LOAD %mem_address

Instructions can be folded with the
``TargetInstrInfo::foldMemoryOperand(...)`` method. Care must be taken when
folding instructions; a folded instruction can be quite different from the
original instruction. See ``InlineSpiller::foldMemoryOperand`` in
``lib/CodeGen/InlineSpiller.cpp`` for an example of its use.
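
The example above can be captured by a small peephole over straight-line code.
This only illustrates the effect of the transformation: ``MI`` and
``foldLoadCopies`` are hypothetical, and the sketch does not reproduce how
``TargetInstrInfo::foldMemoryOperand`` works, since that is a more general
target hook. The fold is applied only when the copy is the sole reader of the
loaded register, the condition under which dropping that register is safe.

.. code-block:: c++

  #include <cstddef>
  #include <string>
  #include <vector>

  // Hypothetical toy instruction with one destination and one source.
  struct MI {
    std::string Op;   // "LOAD", "COPY", or any other opcode
    std::string Dst;  // destination register
    std::string Src;  // memory operand for LOAD, source register otherwise
  };

  // Rewrites "R1 = LOAD M; R2 = COPY R1" into "R2 = LOAD M" when the copy is
  // the only instruction that reads R1.
  std::vector<MI> foldLoadCopies(const std::vector<MI> &Code) {
    auto readers = [&Code](const std::string &Reg) {
      int N = 0;
      for (const MI &I : Code)
        if (I.Op != "LOAD" && I.Src == Reg)
          ++N;
      return N;
    };
    std::vector<MI> Out;
    for (std::size_t i = 0; i < Code.size(); ++i) {
      const MI &I = Code[i];
      if (I.Op == "LOAD" && i + 1 < Code.size() && Code[i + 1].Op == "COPY" &&
          Code[i + 1].Src == I.Dst && readers(I.Dst) == 1) {
        Out.push_back({"LOAD", Code[i + 1].Dst, I.Src});
        ++i;  // the copy has been folded away
        continue;
      }
      Out.push_back(I);
    }
    return Out;
  }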

Built-in register allocators
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The LLVM infrastructure provides the application developer with four different
register allocators:

* *Fast* --- This register allocator is the default for debug builds. It
  allocates registers on a basic block level, attempting to keep values in
  registers and reusing registers as appropriate.

* *Basic* --- This is an incremental approach to register allocation. Live
  ranges are assigned to registers one at a time in an order that is driven by
  heuristics. Since code can be rewritten on-the-fly during allocation, this
  framework allows interesting allocators to be developed as extensions. It is
  not itself a production register allocator but is a potentially useful
  stand-alone mode for triaging bugs and as a performance baseline.

* *Greedy* --- *The default allocator*. This is a highly tuned implementation of
  the *Basic* allocator that incorporates global live range splitting. This
  allocator works hard to minimize the cost of spill code.

* *PBQP* --- A Partitioned Boolean Quadratic Programming (PBQP) based register
  allocator. This allocator works by constructing a PBQP problem representing
  the register allocation problem under consideration, solving this using a PBQP
  solver, and mapping the solution back to a register assignment.

The type of register allocator used in ``llc`` can be chosen with the command
line option ``-regalloc=...``:

.. code-block:: bash

  $ llc -regalloc=greedy file.bc -o gr.s
  $ llc -regalloc=basic file.bc -o ba.s
  $ llc -regalloc=fast file.bc -o fa.s
  $ llc -regalloc=pbqp file.bc -o pbqp.s

.. _Prolog/Epilog Code Insertion:

Prolog/Epilog Code Insertion
----------------------------

.. note::

  To Be Written

Compact Unwind
--------------

Throwing an exception requires *unwinding* out of a function. The information on
how to unwind a given function is traditionally expressed in DWARF unwind
(a.k.a. frame) info. But that format was originally developed for debuggers to
backtrace, and each Frame Description Entry (FDE) requires ~20-30 bytes per
function. There is also the cost of mapping from an address in a function to the
corresponding FDE at runtime. An alternative unwind encoding is called *compact
unwind* and requires just 4 bytes per function.

The compact unwind encoding is a 32-bit value, which is encoded in an
architecture-specific way. It specifies which registers to restore and from
where, and how to unwind out of the function. When the linker creates a final
linked image, it will create a ``__TEXT,__unwind_info`` section. This section is
a small and fast way for the runtime to access unwind info for any given
function. If we emit compact unwind info for the function, that compact unwind
info will be encoded in the ``__TEXT,__unwind_info`` section. If we emit DWARF
unwind info, the ``__TEXT,__unwind_info`` section will contain the offset of the
FDE in the ``__TEXT,__eh_frame`` section in the final linked image.

For X86, there are three modes for the compact unwind encoding:

*Function with a Frame Pointer (``EBP`` or ``RBP``)*
  ``EBP/RBP``-based frame, where ``EBP/RBP`` is pushed onto the stack
  immediately after the return address, then ``ESP/RSP`` is moved to
  ``EBP/RBP``. Thus to unwind, ``ESP/RSP`` is restored with the current
  ``EBP/RBP`` value, then ``EBP/RBP`` is restored by popping the stack, and the
  return is done by popping the stack once more into the PC. All non-volatile
  registers that need to be restored must have been saved in a small range on
  the stack that runs from ``EBP-4`` to ``EBP-1020`` (``RBP-8`` to
  ``RBP-2040``). The offset (divided by 4 in 32-bit mode and 8 in 64-bit mode)
  is encoded in bits 16-23 (mask: ``0x00FF0000``).  The registers saved are
  encoded in bits 0-14 (mask: ``0x00007FFF``) as five 3-bit entries from the
  following table:

    ==============  =============  ===============
    Compact Number  i386 Register  x86-64 Register
    ==============  =============  ===============
    1               ``EBX``        ``RBX``
    2               ``ECX``        ``R12``
    3               ``EDX``        ``R13``
    4               ``EDI``        ``R14``
    5               ``ESI``        ``R15``
    6               ``EBP``        ``RBP``
    ==============  =============  ===============

*Frameless with a Small Constant Stack Size (``EBP`` or ``RBP`` is not used as a frame pointer)*
  To return, a constant (encoded in the compact unwind encoding) is added to
  ``ESP/RSP``.  Then the return is done by popping the stack into the PC. All
  non-volatile registers that need to be restored must have been saved on the
  stack immediately after the return address. The stack size (divided by 4 in
  32-bit mode and 8 in 64-bit mode) is encoded in bits 16-23 (mask:
  ``0x00FF0000``). There is a maximum stack size of 1024 bytes in 32-bit mode
  and 2048 in 64-bit mode. The number of registers saved is encoded in bits
  10-12 (mask: ``0x00001C00``). Bits 0-9 (mask: ``0x000003FF``) contain which
  registers were saved and their order. (See the
  ``encodeCompactUnwindRegistersWithoutFrame()`` function in
  ``lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp`` for the encoding algorithm.)

*Frameless with a Large Constant Stack Size (``EBP`` or ``RBP`` is not used as a frame pointer)*
  This case is like the "Frameless with a Small Constant Stack Size" case, but
  the stack size is too large to encode in the compact unwind encoding. Instead,
  it requires that the function contain "``subl $nnnnnn, %esp``" in its
  prolog. The compact encoding contains the offset to the ``$nnnnnn`` value in
  the function in bits 16-23 (mask: ``0x00FF0000``).
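
Because these bit layouts are easy to get wrong, the C++ sketch below packs the
fields for the frame-pointer mode and for the small frameless mode.
``encodeFrameRegs`` and ``encodeFramelessSmall`` are hypothetical helpers, not
LLVM functions. They assume that the first saved register occupies bits 0-2,
the second bits 3-5, and so on, and they place the register count in the bits
selected by the mask ``0x00001C00`` (bits 10-12). The bits that select which of
the three modes is in use are not described in this section and are left
unset, and the frameless register permutation is taken as an input rather than
computed.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Frame-pointer mode. CompactRegNums holds numbers from the table above
  // (1 = EBX/RBX ... 6 = EBP/RBP); SaveOffsetBytes is the offset described
  // in the text, in bytes. Assumption: entry i goes in bits 3*i .. 3*i+2.
  std::uint32_t encodeFrameRegs(const std::vector<unsigned> &CompactRegNums,
                                unsigned SaveOffsetBytes, bool Is64Bit) {
    unsigned Unit = Is64Bit ? 8 : 4;
    assert(SaveOffsetBytes % Unit == 0 && SaveOffsetBytes / Unit <= 0xFF);
    assert(CompactRegNums.size() <= 5 && "only five 3-bit entries fit");
    std::uint32_t Enc = (SaveOffsetBytes / Unit) << 16;  // bits 16-23
    for (std::size_t i = 0; i < CompactRegNums.size(); ++i) {
      assert(CompactRegNums[i] >= 1 && CompactRegNums[i] <= 6);
      Enc |= CompactRegNums[i] << (3 * i);                 // bits 0-14
    }
    return Enc;
  }

  // Frameless mode with a small constant stack size.
  std::uint32_t encodeFramelessSmall(unsigned StackSizeBytes, unsigned NumSavedRegs,
                                     std::uint32_t Permutation, bool Is64Bit) {
    unsigned Unit = Is64Bit ? 8 : 4;
    assert(StackSizeBytes % Unit == 0 && StackSizeBytes / Unit <= 0xFF);
    assert(NumSavedRegs <= 6 && Permutation <= 0x3FF);
    return ((StackSizeBytes / Unit) << 16) |  // bits 16-23
           (NumSavedRegs << 10) |             // bits 10-12
           Permutation;                       // bits 0-9
  }

For example, saving ``EBX`` and ``ESI`` in 32-bit mode with an offset of 8
bytes yields ``0x00020029`` under these assumptions: the offset field is 2, and
the register field is ``1 | (5 << 3)``.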

.. _Late Machine Code Optimizations:

Late Machine Code Optimizations
-------------------------------

.. note::

  To Be Written

.. _Code Emission:

Code Emission
-------------

The code emission step of code generation is responsible for lowering from the
code generator abstractions (like `MachineFunction`_, `MachineInstr`_, etc.) down
to the abstractions used by the MC layer (`MCInst`_, `MCStreamer`_, etc.).  This
is done with a combination of several different classes: the (misnamed)
target-independent AsmPrinter class, target-specific subclasses of AsmPrinter
(such as SparcAsmPrinter), and the TargetLoweringObjectFile class.

Since the MC layer works at the level of abstraction of object files, it doesn't
have a notion of functions, global variables etc.  Instead, it thinks about
labels, directives, and instructions.  A key class used at this time is the
MCStreamer class.  This is an abstract API that is implemented in different ways
(e.g. to output a .s file, output an ELF .o file, etc.) that is effectively an
"assembler API".  MCStreamer has one method per directive, such as emitLabel,
emitSymbolAttribute, switchSection, etc., which directly correspond to assembly
level directives.

If you are interested in implementing a code generator for a target, there are
three important things that you have to implement for your target:

#. First, you need a subclass of AsmPrinter for your target.  This class
   implements the general lowering process converting MachineFunctions into MC
   label constructs.  The AsmPrinter base class provides a number of useful
   methods and routines, and also allows you to override the lowering process in
   some important ways.  You should get much of the lowering for free if you are
   implementing an ELF, COFF, or MachO target, because the
   TargetLoweringObjectFile class implements much of the common logic.

#. Second, you need to implement an instruction printer for your target.  The
   instruction printer takes an `MCInst`_ and renders it to a raw_ostream as
   text.  Most of this is automatically generated from the .td file (when you
   specify something like "``add $dst, $src1, $src2``" in the instructions), but
   you need to implement routines to print operands.

#. Third, you need to implement code that lowers a `MachineInstr`_ to an MCInst,
   usually implemented in "<target>MCInstLower.cpp".  This lowering process is
   often target specific, and is responsible for turning jump table entries,
   constant pool indices, global variable addresses, etc. into MCSymbols as
   appropriate.  This translation layer is also responsible for expanding pseudo
   ops used by the code generator into the actual machine instructions they
   correspond to. The MCInsts that are generated by this are fed into the
   instruction printer or the encoder.

Finally, if you choose, you can also implement a subclass of MCCodeEmitter
which lowers MCInsts into machine code bytes and relocations.  This is
important if you want to support direct .o file emission, or would like to
implement an assembler for your target.
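
To make the instruction printer step more concrete, the sketch below expands an
assembly string such as ``"add $dst, $src1, $src2"`` by replacing each
``$name`` with the output of a per-operand printing routine. Everything in it
is hypothetical: TableGen generates the real printer from the ``.td``
descriptions, and ``printOperand`` only stands in for the target-specific
operand printing that a backend has to provide.

.. code-block:: c++

  #include <cctype>
  #include <cstddef>
  #include <map>
  #include <string>

  // Hypothetical target hook: how a register operand is spelled in assembly.
  std::string printOperand(unsigned RegNo) { return "%r" + std::to_string(RegNo); }

  // Expands each "$name" in an assembly string, looking the operand up by name.
  std::string printAsmString(const std::string &AsmString,
                             const std::map<std::string, unsigned> &Operands) {
    std::string Out;
    for (std::size_t i = 0; i < AsmString.size();) {
      if (AsmString[i] != '$') {
        Out += AsmString[i++];  // literal text is copied unchanged
        continue;
      }
      std::size_t End = i + 1;
      while (End < AsmString.size() &&
             (std::isalnum(static_cast<unsigned char>(AsmString[End])) ||
              AsmString[End] == '_'))
        ++End;
      Out += printOperand(Operands.at(AsmString.substr(i + 1, End - i - 1)));
      i = End;
    }
    return Out;
  }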

Emitting function stack size information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A section containing metadata on function stack sizes will be emitted when
``TargetLoweringObjectFile::StackSizesSection`` is not null and
``TargetOptions::EmitStackSizeSection`` is set (``-stack-size-section``). The
section will contain an array of pairs of function symbol values (pointer size)
and stack sizes (unsigned LEB128). The stack size values only include the space
allocated in the function prologue. Functions with dynamic stack allocations are
not included.
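
To illustrate that layout, the following sketch decodes the raw bytes of such a
section into (function address, stack size) pairs. ``readULEB128`` and
``parseStackSizes`` are hypothetical helpers written for this example. The
sketch assumes a little-endian target and reads each address field exactly as
stored; in a relocatable object file that field is normally filled in by a
relocation, so the raw bytes there may be zero.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <stdexcept>
  #include <utility>
  #include <vector>

  // Decodes one unsigned LEB128 value starting at Data[Pos] and advances Pos.
  std::uint64_t readULEB128(const std::vector<std::uint8_t> &Data, std::size_t &Pos) {
    std::uint64_t Value = 0;
    unsigned Shift = 0;
    while (true) {
      if (Pos >= Data.size())
        throw std::runtime_error("truncated ULEB128 value");
      std::uint8_t Byte = Data[Pos++];
      Value |= std::uint64_t(Byte & 0x7F) << Shift;  // low 7 bits carry data
      if ((Byte & 0x80) == 0)
        return Value;  // high bit clear: this was the last byte
      Shift += 7;
      if (Shift >= 64)
        throw std::runtime_error("ULEB128 value too large");
    }
  }

  // Splits the section contents into (function address, stack size) pairs,
  // assuming little-endian addresses of PointerSize bytes.
  std::vector<std::pair<std::uint64_t, std::uint64_t>>
  parseStackSizes(const std::vector<std::uint8_t> &Data, unsigned PointerSize) {
    std::vector<std::pair<std::uint64_t, std::uint64_t>> Entries;
    std::size_t Pos = 0;
    while (Pos < Data.size()) {
      if (Data.size() - Pos < PointerSize)
        throw std::runtime_error("truncated address field");
      std::uint64_t Addr = 0;
      for (unsigned i = 0; i < PointerSize; ++i)
        Addr |= std::uint64_t(Data[Pos + i]) << (8 * i);
      Pos += PointerSize;
      std::uint64_t Size = readULEB128(Data, Pos);
      Entries.emplace_back(Addr, Size);
    }
    return Entries;
  }

For instance, a stack size of 200 bytes is stored as the two ULEB128 bytes
``0xC8 0x01``.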

VLIW Packetizer
---------------

In a Very Long Instruction Word (VLIW) architecture, the compiler is responsible
for mapping instructions to functional units available on the architecture. To
that end, the compiler creates groups of instructions called *packets* or
*bundles*. The VLIW packetizer in LLVM is a target-independent mechanism to
enable the packetization of machine instructions.

Mapping from instructions to functional units
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Instructions in a VLIW target can typically be mapped to multiple functional
units. During the process of packetizing, the compiler must be able to reason
about whether an instruction can be added to a packet. This decision can be
complex since the compiler has to examine all possible mappings of instructions
to functional units. Therefore, to alleviate compilation-time complexity, the
VLIW packetizer parses the instruction classes of a target and generates tables
at compiler build time. These tables can then be queried by the provided
machine-independent API to determine if an instruction can be accommodated in a
packet.

How the packetization tables are generated and used
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The packetizer reads instruction classes from a target's itineraries and creates
a deterministic finite automaton (DFA) to represent the state of a packet. A DFA
consists of three major elements: inputs, states, and transitions. The set of
inputs for the generated DFA represents the instruction being added to a
packet. The states represent the possible consumption of functional units by
instructions in a packet. In the DFA, transitions from one state to another
occur on the addition of an instruction to an existing packet. If there is a
legal mapping of functional units to instructions, then the DFA contains a
corresponding transition. The absence of a transition indicates that a legal
mapping does not exist and that the instruction cannot be added to the packet.
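
The construction can be sketched for a made-up machine. The C++ below assumes
that each instruction class simply lists the functional units it may issue on,
using exactly one of them, and represents a state as the set of every
unit-usage bitmask consistent with the instructions already in the packet; this
is a standard subset construction. ``ToyPacketDFA`` is hypothetical, and LLVM's
generator works from itineraries and emits its own table format.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <map>
  #include <queue>
  #include <set>
  #include <utility>
  #include <vector>

  using UnitMask = std::uint32_t;          // one bit per functional unit
  using ResourceSet = std::set<UnitMask>;  // all possible unit usages of a packet

  // Hypothetical automaton: states are ResourceSets, inputs are class ids.
  struct ToyPacketDFA {
    std::vector<ResourceSet> States;           // state id -> contents
    std::map<std::pair<int, int>, int> Trans;  // (state, class) -> state

    // Adds one instruction that may use any single unit in Alternatives.
    static ResourceSet step(const ResourceSet &S,
                            const std::vector<UnitMask> &Alternatives) {
      ResourceSet Next;
      for (UnitMask Used : S)
        for (UnitMask Unit : Alternatives)
          if ((Used & Unit) == 0)
            Next.insert(Used | Unit);
      return Next;
    }

    // "Build time": explore every state reachable from the empty packet.
    explicit ToyPacketDFA(const std::vector<std::vector<UnitMask>> &Classes) {
      std::map<ResourceSet, int> Ids;
      States.push_back(ResourceSet{0});
      Ids[States[0]] = 0;
      std::queue<int> Work;
      Work.push(0);
      while (!Work.empty()) {
        int Cur = Work.front();
        Work.pop();
        for (int C = 0; C < static_cast<int>(Classes.size()); ++C) {
          ResourceSet Next = step(States[Cur], Classes[C]);
          if (Next.empty())
            continue;  // no legal mapping, so no transition
          auto [It, Inserted] = Ids.emplace(Next, static_cast<int>(States.size()));
          if (Inserted) {
            States.push_back(Next);
            Work.push(It->second);
          }
          Trans[{Cur, C}] = It->second;
        }
      }
    }

    // "Compile time": a table lookup; -1 means the class does not fit.
    int next(int State, int Class) const {
      assert(State >= 0 && State < static_cast<int>(States.size()));
      auto It = Trans.find({State, Class});
      return It == Trans.end() ? -1 : It->second;
    }
  };

With two ALU units and one memory unit, for instance, two ALU instructions and
one memory instruction fit in a packet, but a third ALU instruction has no
transition.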

To generate tables for a VLIW target, add *Target*\ GenDFAPacketizer.inc as a
TableGen output, using the ``-gen-dfa-packetizer`` backend, in the target's
``CMakeLists.txt``. The exported API provides three functions:
``DFAPacketizer::clearResources()``,
``DFAPacketizer::reserveResources(MachineInstr &MI)``, and
``DFAPacketizer::canReserveResources(MachineInstr &MI)``. These functions allow
a target packetizer to add an instruction to an existing packet and to check
whether an instruction can be added to a packet. See
``llvm/CodeGen/DFAPacketizer.h`` for more information.
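
A target packetizer could drive these calls in a loop such as the one below.
The sketch is hypothetical and self-contained: ``ToyPacketizer`` has methods
named after the ``DFAPacketizer`` API, but it takes a list of allowed
functional units instead of a ``MachineInstr`` and recomputes resource sets on
the fly instead of consulting generated tables. It also ignores dependences
between instructions, which a real packetizer must respect.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <set>
  #include <vector>

  using UnitMask = std::uint32_t;
  using InstrClass = std::vector<UnitMask>;  // units an instruction may issue on

  // Hypothetical stand-in for DFAPacketizer: tracks every way the current
  // packet could be using the functional units.
  class ToyPacketizer {
    std::set<UnitMask> State{0};

    std::set<UnitMask> after(const InstrClass &C) const {
      std::set<UnitMask> Next;
      for (UnitMask Used : State)
        for (UnitMask U : C)
          if ((Used & U) == 0)
            Next.insert(Used | U);
      return Next;
    }

  public:
    void clearResources() { State = {0}; }
    bool canReserveResources(const InstrClass &C) const { return !after(C).empty(); }
    void reserveResources(const InstrClass &C) {
      assert(canReserveResources(C));
      State = after(C);
    }
  };

  // Greedily groups straight-line code into packets, starting a new packet
  // whenever the next instruction does not fit into the current one.
  std::vector<std::vector<int>> packetize(const std::vector<InstrClass> &Instrs) {
    ToyPacketizer P;
    std::vector<std::vector<int>> Packets;
    for (int i = 0; i < static_cast<int>(Instrs.size()); ++i) {
      if (Packets.empty() || !P.canReserveResources(Instrs[i])) {
        P.clearResources();
        Packets.emplace_back();
      }
      assert(P.canReserveResources(Instrs[i]) && "instruction fits no unit");
      P.reserveResources(Instrs[i]);
      Packets.back().push_back(i);
    }
    return Packets;
  }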

Implementing a Native Assembler
===============================

Though you're probably reading this because you want to write or maintain a
compiler backend, LLVM also fully supports building a native assembler.
We've tried hard to automate the generation of the assembler from the .td files
(in particular the instruction syntax and encodings), which means that a large
part of the manual and repetitive data entry can be factored and shared with the
compiler.

Instruction Parsing
-------------------

.. note::

  To Be Written

Instruction Alias Processing
----------------------------

Once the instruction is parsed, it enters the MatchInstructionImpl function.
The MatchInstructionImpl function performs alias processing and then does actual
matching.

Alias processing is the phase that canonicalizes different lexical forms of the
same instruction down to one representation.  There are several different kinds
of alias that are possible to implement and they are listed below in the order
that they are processed (which is in order from simplest/weakest to most
complex/powerful).  Generally you want to use the first alias mechanism that
meets the needs of your instruction, because it will allow a more concise
description.

Mnemonic Aliases
^^^^^^^^^^^^^^^^

The first phase of alias processing is simple instruction mnemonic remapping for
classes of instructions which are allowed with two different mnemonics.  This
phase is a simple, unconditional remapping from one input mnemonic to one
output mnemonic.  It isn't possible for this form of alias to look at the
operands at all, so the remapping must apply for all forms of a given mnemonic.
Mnemonic aliases are defined simply, for example X86 has:

::

  def : MnemonicAlias<"cbw",     "cbtw">;
  def : MnemonicAlias<"smovq",   "movsq">;
  def : MnemonicAlias<"fldcww",  "fldcw">;
  def : MnemonicAlias<"fucompi", "fucomip">;
  def : MnemonicAlias<"ud2a",    "ud2">;

... and many others.  With a MnemonicAlias definition, the mnemonic is remapped
simply and directly.  Though MnemonicAliases can't look at any aspect of the
instruction (such as the operands), they can depend on global modes (the same
ones supported by the matcher) through a Requires clause:

::

  def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>;
  def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>;

In this example, the mnemonic gets mapped into a different one depending on
the current instruction set.
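
The behaviour of this phase fits in a few lines. In the hypothetical sketch
below, ``AliasEntry`` and ``remapMnemonic`` stand in for the code that
TableGen generates from ``MnemonicAlias`` records, and ``Mode`` plays the role
of the ``Requires`` predicates. As in the text, only the mnemonic is inspected,
never the operands.

.. code-block:: c++

  #include <string>
  #include <vector>

  // Hypothetical stand-ins for the Requires<[...]> predicates.
  enum class Mode { Any, In32BitMode, In64BitMode };

  struct AliasEntry {
    std::string From, To;
    Mode Requires = Mode::Any;  // Mode::Any means no Requires clause
  };

  // The first entry whose mode predicate holds wins; mnemonics without an
  // alias pass through unchanged.
  std::string remapMnemonic(const std::string &Mnemonic, Mode Current,
                            const std::vector<AliasEntry> &Aliases) {
    for (const AliasEntry &A : Aliases)
      if (A.From == Mnemonic &&
          (A.Requires == Mode::Any || A.Requires == Current))
        return A.To;
    return Mnemonic;
  }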

Instruction Aliases
^^^^^^^^^^^^^^^^^^^

The most general phase of alias processing occurs while matching is happening:
it provides new forms for the matcher to match along with a specific instruction
to generate.  An instruction alias has two parts: the string to match and the
instruction to generate.  For example:

::

  def : InstAlias<"movsx $src, $dst", (MOVSX16rr8W GR16:$dst, GR8  :$src)>;
  def : InstAlias<"movsx $src, $dst", (MOVSX16rm8W GR16:$dst, i8mem:$src)>;
  def : InstAlias<"movsx $src, $dst", (MOVSX32rr8  GR32:$dst, GR8  :$src)>;
  def : InstAlias<"movsx $src, $dst", (MOVSX32rr16 GR32:$dst, GR16 :$src)>;
  def : InstAlias<"movsx $src, $dst", (MOVSX64rr8  GR64:$dst, GR8  :$src)>;
  def : InstAlias<"movsx $src, $dst", (MOVSX64rr16 GR64:$dst, GR16 :$src)>;
  def : InstAlias<"movsx $src, $dst", (MOVSX64rr32 GR64:$dst, GR32 :$src)>;

This shows a powerful example of the instruction aliases, matching the same
mnemonic in multiple different ways depending on what operands are present in
the assembly.  The result of instruction aliases can include operands in a
different order than the destination instruction, and can use an input multiple
times, for example:

::

  def : InstAlias<"clrb $reg", (XOR8rr  GR8 :$reg, GR8 :$reg)>;
  def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>;
  def : InstAlias<"clrl $reg", (XOR32rr GR32:$reg, GR32:$reg)>;
  def : InstAlias<"clrq $reg", (XOR64rr GR64:$reg, GR64:$reg)>;

This example also shows that tied operands are only listed once.  In the X86
backend, XOR8rr has two input GR8s and one output GR8 (where an input is tied
to the output).  InstAliases take a flattened operand list without duplicates
for tied operands.  The result of an instruction alias can also use immediates
and fixed physical registers, which are added as simple immediate or register
operands in the result, for example:

::

  // Fixed immediate operand.
  def : InstAlias<"aad", (AAD8i8 10)>;

  // Fixed register operand.
  def : InstAlias<"fcomi", (COM_FIr ST1)>;

  // Simple alias.
  def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>;

Instruction aliases can also have a Requires clause to make them subtarget
specific.

If the back-end supports it, the instruction printer can automatically emit the
alias rather than what's being aliased. It typically leads to better, more
readable code. If it's better to print out what's being aliased, then pass a '0'
as the third parameter to the InstAlias definition.
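
The essence of an instruction alias can be sketched as follows: operands
matched by name are substituted into the result, a name may be used more than
once (as with the tied operands above), and fixed operands such as immediates
or physical registers are copied verbatim. ``AliasDef``, ``matchAlias`` and the
string-based operands are hypothetical simplifications; the real matcher is
generated by TableGen and works on parsed operand objects.

.. code-block:: c++

  #include <cstddef>
  #include <map>
  #include <optional>
  #include <string>
  #include <vector>

  // A result operand either names a matched "$operand" or is fixed text,
  // such as an immediate or a physical register.
  struct ResultOp {
    bool IsRef;
    std::string Text;
  };

  // Hypothetical alias record, e.g. "clrl $reg" -> (XOR32rr $reg, $reg).
  struct AliasDef {
    std::string Mnemonic;
    std::vector<std::string> OperandNames;
    std::string ResultOpcode;
    std::vector<ResultOp> ResultOps;
  };

  struct MatchedInst {
    std::string Opcode;
    std::vector<std::string> Ops;
  };

  std::optional<MatchedInst> matchAlias(const AliasDef &A,
                                        const std::string &Mnemonic,
                                        const std::vector<std::string> &Operands) {
    if (Mnemonic != A.Mnemonic || Operands.size() != A.OperandNames.size())
      return std::nullopt;
    std::map<std::string, std::string> Bound;  // operand name -> parsed text
    for (std::size_t i = 0; i < Operands.size(); ++i)
      Bound[A.OperandNames[i]] = Operands[i];
    MatchedInst Result{A.ResultOpcode, {}};
    for (const ResultOp &Op : A.ResultOps)
      Result.Ops.push_back(Op.IsRef ? Bound.at(Op.Text) : Op.Text);
    return Result;
  }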
ff9feeb5SBill Wendling
ff9feeb5SBill WendlingInstruction Matching
ff9feeb5SBill Wendling--------------------
ff9feeb5SBill Wendling
ff9feeb5SBill Wendling.. note::
ff9feeb5SBill Wendling
ff9feeb5SBill Wendling  To Be Written
ff9feeb5SBill Wendling
ff9feeb5SBill Wendling.. _Implementations of the abstract target description interfaces:
ff9feeb5SBill Wendling.. _implement the target description:
ff9feeb5SBill Wendling
ff9feeb5SBill WendlingTarget-specific Implementation Notes
ff9feeb5SBill Wendling====================================
ff9feeb5SBill Wendling
ff9feeb5SBill WendlingThis section of the document explains features or design decisions that are
*86c42429SAlex Bradburyspecific to the code generator for a particular target.
ff9feeb5SBill Wendling
5ace2cd5SJay Foad.. _tail call section:
ff9feeb5SBill Wendling
ff9feeb5SBill WendlingTail call optimization
ff9feeb5SBill Wendling----------------------

Tail call optimization, in which the callee reuses the stack frame of the
caller, is currently supported on x86/x86-64, PowerPC, AArch64, and WebAssembly.
It is performed on x86/x86-64, PowerPC, and AArch64 if:

* Caller and callee have the calling convention ``fastcc``, ``cc 10`` (GHC
  calling convention), ``cc 11`` (HiPE calling convention), ``tailcc``, or
  ``swifttailcc``.

* The call is a tail call, i.e. it is in tail position (``ret`` immediately
  follows the call, and ``ret`` uses the value of the call or is void).

* The ``-tailcallopt`` option is enabled, or the calling convention is
  ``tailcc``.

* Platform-specific constraints are met.

x86/x86-64 constraints:

* No variable argument lists are used.

* On x86-64, when generating GOT/PIC code, only module-local calls
  (visibility = hidden or protected) are supported.

PowerPC constraints:

* No variable argument lists are used.

* No byval parameters are used.

* On ppc32/64, when generating GOT/PIC code, only module-local calls
  (visibility = hidden or protected) are supported.

WebAssembly constraints:

* No variable argument lists are used.

* The ``tail-call`` target attribute is enabled.

* The return types of the caller and callee must match. The caller cannot
  be void unless the callee is, too.

AArch64 constraints:

* No variable argument lists are used.

Example:

Compile with ``llc -tailcallopt test.ll``.

.. code-block:: llvm

  declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)

  define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
    %l1 = add i32 %in1, %in2
    %tmp = tail call fastcc i32 @tailcallee(i32 inreg %in1, i32 inreg %in2, i32 %in1, i32 %l1)
    ret i32 %tmp
  }

Implications of ``-tailcallopt``:

To support tail call optimization in situations where the callee has more
arguments than the caller, a "callee pops arguments" convention is used. This
currently causes each ``fastcc`` call that is not tail call optimized (because
one or more of the above constraints are not met) to be followed by a
readjustment of the stack, so performance might be worse in such cases.
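
The general conditions and the x86/x86-64 constraints listed earlier in this
section can be restated as a single predicate. The following is a minimal
sketch, not LLVM's implementation: ``CallConv``, ``CallSite``, and
``canTailCallOptimizeX86`` are hypothetical names, and the sketch assumes that
caller and callee must use the *same* one of the listed conventions.

.. code-block:: c++

  // Hypothetical model of a call site; these types are illustrative only and
  // are not part of LLVM's API.
  enum class CallConv { C, Fast, GHC, HiPE, Tail, SwiftTail };

  struct CallSite {
    CallConv callerCC;
    CallConv calleeCC;
    bool inTailPosition; // ret follows the call and uses its value (or is void)
    bool isVarArg;       // a variable argument list is used
    bool is64Bit;        // x86-64 rather than 32-bit x86
    bool isPIC;          // generating GOT/PIC code
    bool calleeIsLocal;  // callee has hidden or protected visibility
  };

  // fastcc, cc 10 (GHC), cc 11 (HiPE), tailcc and swifttailcc.
  constexpr bool isTailCallConvention(CallConv cc) { return cc != CallConv::C; }

  constexpr bool canTailCallOptimizeX86(const CallSite &cs, bool tailCallOpt) {
    // Assumption: both sides use the same one of the listed conventions.
    if (cs.callerCC != cs.calleeCC || !isTailCallConvention(cs.calleeCC))
      return false;
    if (!cs.inTailPosition)
      return false;
    // -tailcallopt is required unless the convention is tailcc.
    if (!tailCallOpt && cs.calleeCC != CallConv::Tail)
      return false;
    // x86/x86-64 platform constraints.
    if (cs.isVarArg)
      return false;
    if (cs.is64Bit && cs.isPIC && !cs.calleeIsLocal)
      return false;
    return true;
  }

In this sketch, a non-PIC ``fastcc`` tail call such as the one in
``@tailcaller`` above passes every check when ``-tailcallopt`` is given and
fails without it, while the same call using ``tailcc`` passes either way.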

Sibling call optimization
-------------------------

Sibling call optimization is a restricted form of tail call optimization.
Unlike the tail call optimization described in the previous section, it can be
performed automatically on any tail call when the ``-tailcallopt`` option is not
specified.

Sibling call optimization is currently performed on x86/x86-64 when the
following constraints are met:

* Caller and callee have the same calling convention. It can be either ``c`` or
  ``fastcc``.

* The call is a tail call, i.e. it is in tail position (``ret`` immediately
  follows the call, and ``ret`` uses the value of the call or is void).

* Caller and callee have matching return types, or the callee's result is not
  used.

* If any of the callee's arguments are passed on the stack, they must already
  be available in the caller's own incoming argument stack area, at the same
  frame offsets.

Example:

.. code-block:: llvm

  declare i32 @bar(i32, i32)

  define i32 @foo(i32 %a, i32 %b, i32 %c) {
  entry:
    %0 = tail call i32 @bar(i32 %a, i32 %b)
    ret i32 %0
  }
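
The last sibling-call constraint is the one that distinguishes it from full
tail call optimization: nothing already on the stack may need to move. A
minimal sketch of that check follows; ``StackArg`` and
``stackArgsAllowSiblingCall`` are hypothetical names, and the sketch assumes
each stack-passed argument is already classified as either a forwarded incoming
argument of the caller (with its frame offset) or some other value.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>

  // Hypothetical description of one argument that the callee expects on the
  // stack. Not an LLVM type.
  struct StackArg {
    int64_t calleeOffset;         // frame offset the callee reads it from
    bool isCallerIncomingArg;     // value is one of the caller's incoming stack args
    int64_t callerIncomingOffset; // frame offset of that incoming argument
  };

  // True if every stack-passed argument is already in place: it is one of the
  // caller's own incoming stack arguments and sits at the same frame offset.
  constexpr bool stackArgsAllowSiblingCall(const StackArg *args,
                                           std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
      if (!args[i].isCallerIncomingArg ||
          args[i].callerIncomingOffset != args[i].calleeOffset)
        return false;
    return true;
  }

In this model, a value computed inside the caller, or an incoming argument that
the callee expects at a different offset, makes the call ineligible.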

The X86 backend
---------------

The X86 code generator lives in the ``lib/Target/X86`` directory.  This code
generator is capable of targeting a variety of x86-32 and x86-64 processors, and
includes support for ISA extensions such as MMX and SSE.

X86 Target Triples supported
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following are the known target triples that are supported by the X86
backend.  This is not an exhaustive list, and it would be useful to add those
that people test.

* **i686-pc-linux-gnu** --- Linux

* **i386-unknown-freebsd5.3** --- FreeBSD 5.3

* **i686-pc-cygwin** --- Cygwin on Win32

* **i686-pc-mingw32** --- MinGW on Win32

* **i386-pc-mingw32msvc** --- MinGW cross-compiler on Linux

* **i686-apple-darwin*** --- Apple Darwin on X86

* **x86_64-unknown-linux-gnu** --- Linux

X86 Calling Conventions supported
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following target-specific calling conventions are known to the backend:

* **x86_StdCall** --- stdcall calling convention seen on the Microsoft Windows
  platform (CC ID = 64).

* **x86_FastCall** --- fastcall calling convention seen on the Microsoft Windows
  platform (CC ID = 65).

* **x86_ThisCall** --- Similar to x86_StdCall. Passes the first argument in ECX,
  others via the stack. The callee is responsible for cleaning up the stack.
  This convention is used by MSVC by default for methods in its ABI (CC ID = 70).

.. _X86 addressing mode:

Representing X86 addressing modes in MachineInstrs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The x86 has a very flexible way of accessing memory.  It is capable of forming
memory addresses of the following expression directly in integer instructions
(which use ModR/M addressing):

::

  SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32

In order to represent this, LLVM tracks no fewer than five operands for each
memory operand of this form.  This means that the "load" form of '``mov``' has
the following ``MachineOperand``\s in this order:

::

  Index:        0     |    1        2       3           4          5
  Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement, Segment
  OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,  SignExtImm,   PhysReg

Stores, and all other instructions, treat the five memory operands in the same
way and in the same order.  If the segment register is unspecified (regno = 0),
then no segment override is generated.  "Lea" operations do not have a segment
register specified, so they only have 4 operands for their memory reference.
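
To make the operand layout concrete, the following minimal sketch evaluates the
addressing expression above from the five memory operands. ``X86MemOperand``
and ``effectiveAddress`` are hypothetical names; register operands are modeled
by the values they hold (0 when the register is absent), and the segment is
modeled as contributing a base address, as FS and GS do in 64-bit mode.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical model of the memory operands BaseReg, Scale, IndexReg,
  // Displacement and Segment, with registers replaced by their contents.
  struct X86MemOperand {
    uint64_t base;    // contents of BaseReg, or 0 if there is none
    uint8_t scale;    // 1, 2, 4 or 8
    uint64_t index;   // contents of IndexReg, or 0 if there is none
    int32_t disp;     // Disp32, sign-extended before the addition
    uint64_t segBase; // base of the segment override, or 0 if none
  };

  // SegmentReg: Base + Scale * IndexReg + Disp32, wrapping modulo 2^64.
  constexpr uint64_t effectiveAddress(const X86MemOperand &m) {
    return m.segBase + m.base + uint64_t(m.scale) * m.index +
           static_cast<uint64_t>(static_cast<int64_t>(m.disp));
  }

For example, base ``0x1000``, scale 4, index 3 and displacement -8 give
``0x1004``; the negative displacement works because it is sign-extended first.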

X86 address spaces supported
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

x86 has a feature which provides the ability to perform loads and stores to
different address spaces via the x86 segment registers.  A segment override
prefix byte on an instruction causes the instruction's memory access to go to
the specified segment.  LLVM address space 0 is the default address space, which
includes the stack, and any unqualified memory accesses in a program.  Address
spaces 1-255 are currently reserved for user-defined code.  The GS-segment is
represented by address space 256, the FS-segment is represented by address space
257, and the SS-segment is represented by address space 258. Other x86 segments
have yet to be allocated address space numbers.
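
The address space assignments above amount to a small lookup. A minimal sketch,
with hypothetical names ``X86Segment`` and ``segmentForAddressSpace``:

.. code-block:: c++

  // Hypothetical enumeration of the segment override a memory access gets.
  enum class X86Segment { None, GS, FS, SS };

  constexpr X86Segment segmentForAddressSpace(unsigned addrSpace) {
    switch (addrSpace) {
    case 256: return X86Segment::GS;
    case 257: return X86Segment::FS;
    case 258: return X86Segment::SS;
    // Address space 0 (the default), the user-reserved range 1-255 and any
    // number not yet allocated to a segment get no override in this sketch.
    default:  return X86Segment::None;
    }
  }

In LLVM IR the choice is carried by the pointer type, so a load through a
pointer in ``addrspace(256)`` is the one that receives a GS override.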

While these address spaces may seem similar to TLS via the ``thread_local``
keyword, and often use the same underlying hardware, there are some fundamental
differences.

The ``thread_local`` keyword applies to global variables and specifies that they
are to be allocated in thread-local memory. There are no type qualifiers
involved, and these variables can be pointed to with normal pointers and
accessed with normal loads and stores.  The ``thread_local`` keyword is
target-independent at the LLVM IR level (though LLVM doesn't yet have
implementations of it for some configurations).

Special address spaces, in contrast, apply to static types. Every load and store
has a particular address space in its address operand type, and this is what
determines which address space is accessed.  LLVM ignores these special address
space qualifiers on global variables, and does not provide a way to directly
allocate storage in them.  At the LLVM IR level, the behavior of these special
address spaces depends in part on the underlying OS or runtime environment, and
they are specific to x86 (and LLVM doesn't yet handle them correctly in some
cases).

Some operating systems and runtime environments use (or may in the future use)
the FS/GS-segment registers for various low-level purposes, so care should be
taken when considering them.

Instruction naming
^^^^^^^^^^^^^^^^^^

An instruction name consists of the base name, a default operand size, and a
character per operand with an optional special size. For example:

::

  ADD8rr      -> add, 8-bit register, 8-bit register
  IMUL16rmi   -> imul, 16-bit register, 16-bit memory, 16-bit immediate
  IMUL16rmi8  -> imul, 16-bit register, 16-bit memory, 8-bit immediate
  MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
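
The naming scheme can be read mechanically. The following minimal C++17 sketch
splits a name into its three parts; ``X86NameParts`` and ``splitX86Name`` are
hypothetical helpers. It assumes the base name contains no digits, so it only
handles names shaped like the examples above, not every instruction name in
the X86 backend.

.. code-block:: c++

  #include <string_view>

  // Hypothetical decomposition of an X86 instruction name.
  struct X86NameParts {
    std::string_view base;     // mnemonic, e.g. "IMUL"
    unsigned defaultSize;      // default operand size in bits, e.g. 16
    std::string_view operands; // one character per operand plus optional
                               // special sizes, e.g. "rmi8"
  };

  constexpr bool isAsciiDigit(char c) { return c >= '0' && c <= '9'; }

  // Assumes the base name itself contains no digits.
  constexpr X86NameParts splitX86Name(std::string_view name) {
    std::string_view::size_type i = 0;
    while (i < name.size() && !isAsciiDigit(name[i]))
      ++i;
    const std::string_view base = name.substr(0, i);
    unsigned size = 0;
    while (i < name.size() && isAsciiDigit(name[i]))
      size = size * 10 + static_cast<unsigned>(name[i++] - '0');
    return {base, size, name.substr(i)};
  }

For ``IMUL16rmi8`` this yields base ``IMUL``, default size 16 and operand
string ``rmi8`` (register, memory, 8-bit immediate).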

The PowerPC backend
-------------------

The PowerPC code generator lives in the ``lib/Target/PowerPC`` directory.  The
code generation is retargetable to several variations or *subtargets* of the
PowerPC ISA, including ppc32, ppc64 and altivec.

LLVM PowerPC ABI
^^^^^^^^^^^^^^^^

LLVM follows the AIX PowerPC ABI, with two deviations. First, LLVM uses
PC-relative (PIC) or static addressing for accessing global values, so no TOC
(r2) is used. Second, r31 is used as a frame pointer to allow dynamic growth of
a stack frame.  LLVM takes advantage of having no TOC to provide space to save
the frame pointer in the PowerPC linkage area of the caller frame.  Other
details of the PowerPC ABI can be found at `PowerPC ABI
<http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html>`_\
. Note: This link describes the 32-bit ABI.  The 64-bit ABI is similar except
that GPR slots are 8 bytes wide (not 4) and r13 is reserved for system use.

Frame Layout
^^^^^^^^^^^^

The size of a PowerPC frame is usually fixed for the duration of a function's
invocation.  Since the frame is fixed size, all references into the frame can be
accessed via fixed offsets from the stack pointer.  The exception to this is
when dynamic alloca or variable-sized arrays are present; then a base pointer
(r31) is used as a proxy for the stack pointer, and the stack pointer is free to
grow or shrink.  A base pointer is also used if llvm-gcc is not passed the
``-fomit-frame-pointer`` flag. The stack pointer is always aligned to 16 bytes,
so that space allocated for altivec vectors will be properly aligned.

An invocation frame is laid out as follows (low memory at top):

:raw-html:`<table border="1" cellspacing="0">`
:raw-html:`<tr>`
:raw-html:`<td>Linkage<br><br></td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>Parameter area<br><br></td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>Dynamic area<br><br></td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>Locals area<br><br></td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>Saved registers area<br><br></td>`
:raw-html:`</tr>`
:raw-html:`<tr style="border-style: none hidden none hidden;">`
:raw-html:`<td><br></td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>Previous Frame<br><br></td>`
:raw-html:`</tr>`
:raw-html:`</table>`

The *linkage* area is used by a callee to save special registers prior to
allocating its own frame.  Only three entries are relevant to LLVM. The first
entry is the previous stack pointer (sp), aka link.  This allows probing tools
like gdb or exception handlers to quickly scan the frames in the stack.  A
function epilog can also use the link to pop the frame from the stack.  The
third entry in the linkage area is used to save the return address from the lr
register. Finally, as mentioned above, the last entry is used to save the
previous frame pointer (r31).  The entries in the linkage area are the size of a
GPR, thus the linkage area is 24 bytes long in 32-bit mode and 48 bytes in
64-bit mode.

32-bit linkage area:

:raw-html:`<table  border="1" cellspacing="0">`
:raw-html:`<tr>`
:raw-html:`<td>0</td>`
:raw-html:`<td>Saved SP (r1)</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>4</td>`
:raw-html:`<td>Saved CR</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>8</td>`
:raw-html:`<td>Saved LR</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>12</td>`
:raw-html:`<td>Reserved</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>16</td>`
:raw-html:`<td>Reserved</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>20</td>`
:raw-html:`<td>Saved FP (r31)</td>`
:raw-html:`</tr>`
:raw-html:`</table>`

64-bit linkage area:

:raw-html:`<table border="1" cellspacing="0">`
:raw-html:`<tr>`
:raw-html:`<td>0</td>`
:raw-html:`<td>Saved SP (r1)</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>8</td>`
:raw-html:`<td>Saved CR</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>16</td>`
:raw-html:`<td>Saved LR</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>24</td>`
:raw-html:`<td>Reserved</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>32</td>`
:raw-html:`<td>Reserved</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>40</td>`
:raw-html:`<td>Saved FP (r31)</td>`
:raw-html:`</tr>`
:raw-html:`</table>`
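
Because every linkage-area entry is one GPR wide, each offset in the two tables
above is the entry's index times the GPR size. A minimal sketch of that
arithmetic; ``LinkageSlot`` and the helper functions are hypothetical names:

.. code-block:: c++

  // Entries of the PowerPC linkage area, in the order shown in the tables.
  enum class LinkageSlot : unsigned {
    SavedSP = 0,
    SavedCR = 1,
    SavedLR = 2,
    Reserved0 = 3,
    Reserved1 = 4,
    SavedFP = 5, // the TOC slot, which LLVM uses for r31
  };

  constexpr unsigned kNumLinkageSlots = 6;

  // A GPR is 4 bytes in 32-bit mode and 8 bytes in 64-bit mode.
  constexpr unsigned gprSize(bool is64Bit) { return is64Bit ? 8 : 4; }

  constexpr unsigned linkageSlotOffset(LinkageSlot slot, bool is64Bit) {
    return static_cast<unsigned>(slot) * gprSize(is64Bit);
  }

  constexpr unsigned linkageAreaSize(bool is64Bit) {
    return kNumLinkageSlots * gprSize(is64Bit);
  }

The saved LR therefore lands at offset 8 in 32-bit mode and 16 in 64-bit mode,
and the whole area is 24 or 48 bytes, matching the tables.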

The *parameter area* is used to store arguments being passed to a callee
function.  Following the PowerPC ABI, the first few arguments are actually
passed in registers, with the space in the parameter area unused.  However, if
there are not enough registers or the callee is a thunk or vararg function,
these register arguments can be spilled into the parameter area.  Thus, the
parameter area must be large enough to store all the parameters for the largest
call sequence made by the caller.  The size must also be at least large enough
to spill registers r3-r10.  This gives callees that are blind to the call
signature, such as thunks and vararg functions, enough space to cache the
argument registers.  Therefore, the parameter area is at least 32 bytes (64
bytes in 64-bit mode).  Also note that since the parameter area is at a fixed
offset from the top of the frame, a callee can access its split arguments using
fixed offsets from the stack pointer (or base pointer).
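
The sizing rule above (room for the largest outgoing call, but never less than
the eight argument registers r3-r10) is a one-line computation. A minimal
sketch; ``parameterAreaSize`` is a hypothetical helper, and
``largestCallArgWords`` is assumed to be the number of GPR-sized words the
largest call sequence needs.

.. code-block:: c++

  // Number of argument registers r3 through r10.
  constexpr unsigned kNumArgGPRs = 8;

  constexpr unsigned parameterAreaSize(unsigned largestCallArgWords,
                                       bool is64Bit) {
    const unsigned gprSize = is64Bit ? 8 : 4;
    const unsigned words = largestCallArgWords > kNumArgGPRs
                               ? largestCallArgWords
                               : kNumArgGPRs;
    return words * gprSize;
  }

Even a function that makes no calls gets 32 bytes (64 bytes in 64-bit mode),
the minimum stated above.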

Combining the information about the linkage and parameter areas with the
alignment requirement, a stack frame is at least 64 bytes in 32-bit mode and
128 bytes in 64-bit mode.

The *dynamic area* starts out as size zero.  If a function uses dynamic alloca,
then space is added to the stack, the linkage and parameter areas are shifted to
the top of the stack, and the new space is available immediately below the
linkage and parameter areas.  The cost of shifting the linkage and parameter
areas is minor since only the link value needs to be copied.  The link value can
be easily fetched by adding the original frame size to the base pointer.  Note
that allocations in the dynamic space need to observe 16-byte alignment.
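
The dynamic-area mechanism can be sketched with a little address arithmetic.
In this minimal sketch (hypothetical helper names), the stack grows toward
lower addresses, the new stack pointer is the old one minus the 16-byte-aligned
allocation size, and the linkage and parameter areas are re-created at the new
top of stack, so the allocated block starts right after them. It assumes the
combined size of those two areas is itself a multiple of 16, so the block keeps
the stack pointer's alignment.

.. code-block:: c++

  #include <cstdint>

  // Round an allocation size up to the required 16-byte alignment.
  constexpr uint64_t alignTo16(uint64_t size) {
    return (size + 15) & ~uint64_t(15);
  }

  // Address of a new dynamic allocation, given the stack pointer before the
  // allocation and the combined size of the linkage and parameter areas.
  constexpr uint64_t dynamicAllocAddress(uint64_t oldSP, uint64_t size,
                                         uint64_t linkageAndParamSize) {
    const uint64_t newSP = oldSP - alignTo16(size);
    return newSP + linkageAndParamSize;
  }

A 20-byte request is rounded up to 32 bytes; with the stack pointer at
``0x1000`` and 64 bytes of linkage and parameter areas, the block starts at
``0x1020``.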

The *locals area* is where LLVM reserves space for local variables.

The *saved registers area* is where LLVM spills callee-saved registers on entry
to the callee.

Prolog/Epilog
^^^^^^^^^^^^^

The LLVM prolog and epilog are the same as described in the PowerPC ABI, with
the following exceptions.  Callee-saved registers are spilled after the frame is
created.  This allows the LLVM prolog/epilog support to be common with other
targets.  The base pointer callee-saved register r31 is saved in the TOC slot of
the linkage area.  This simplifies allocation of space for the base pointer and
makes it convenient to locate programmatically and during debugging.

Dynamic Allocation
^^^^^^^^^^^^^^^^^^

.. note::

  TODO - More to come.

The NVPTX backend
-----------------

The NVPTX code generator under ``lib/Target/NVPTX`` is an open-source version of
the NVIDIA NVPTX code generator for LLVM.  It is contributed by NVIDIA and is
a port of the code generator used in the CUDA compiler (nvcc).  It targets the
PTX 3.0/3.1 ISA and can target any compute capability greater than or equal to
2.0 (Fermi).

This target is of production quality and should be completely compatible with
the official NVIDIA toolchain.

Code Generator Options:

:raw-html:`<table border="1" cellspacing="0">`
:raw-html:`<tr>`
:raw-html:`<th>Option</th>`
:raw-html:`<th>Description</th>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>sm_20</td>`
:raw-html:`<td align="left">Set shader model/compute capability to 2.0</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>sm_21</td>`
:raw-html:`<td align="left">Set shader model/compute capability to 2.1</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>sm_30</td>`
:raw-html:`<td align="left">Set shader model/compute capability to 3.0</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>sm_35</td>`
:raw-html:`<td align="left">Set shader model/compute capability to 3.5</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>ptx30</td>`
:raw-html:`<td align="left">Target PTX 3.0</td>`
:raw-html:`</tr>`
:raw-html:`<tr>`
:raw-html:`<td>ptx31</td>`
:raw-html:`<td align="left">Target PTX 3.1</td>`
:raw-html:`</tr>`
:raw-html:`</table>`

The extended Berkeley Packet Filter (eBPF) backend
--------------------------------------------------

Extended BPF (or eBPF) is similar to the original ("classic") BPF (cBPF) used
to filter network packets.  The
`bpf() system call <http://man7.org/linux/man-pages/man2/bpf.2.html>`_
performs a range of operations related to eBPF.  For both cBPF and eBPF
programs, the Linux kernel statically analyzes the programs before loading
them, in order to ensure that they cannot harm the running system.  eBPF is
a 64-bit RISC instruction set designed for one-to-one mapping to 64-bit CPUs.
Opcodes are 8-bit encoded, and 87 instructions are defined.  There are 10
general-purpose registers (R0-R9) and a read-only frame pointer (R10), grouped
by function as outlined below.

::

  R0        return value from in-kernel functions; exit value for eBPF program
  R1 - R5   function call arguments to in-kernel functions
  R6 - R9   callee-saved registers preserved by in-kernel functions
  R10       stack frame pointer (read only)

Instruction encoding (arithmetic and jump)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

eBPF reuses most of the opcode encoding from classic BPF to simplify conversion
of classic BPF programs to eBPF.  For arithmetic and jump instructions, the
8-bit 'code' field is divided into three parts:

::

  +----------------+--------+--------------------+
  |   4 bits       |  1 bit |   3 bits           |
  | operation code | source | instruction class  |
  +----------------+--------+--------------------+
  (MSB)                                      (LSB)

The three LSB bits store the instruction class, which is one of:

::

  BPF_LD     0x0
  BPF_LDX    0x1
  BPF_ST     0x2
  BPF_STX    0x3
  BPF_ALU    0x4
  BPF_JMP    0x5
  (unused)   0x6
  BPF_ALU64  0x7

When ``BPF_CLASS(code)`` is ``BPF_ALU``, ``BPF_ALU64``, or ``BPF_JMP``, the 4th
bit encodes the source operand:

::

  BPF_X     0x1  use src_reg register as source operand
  BPF_K     0x0  use 32-bit immediate as source operand

and the four MSB bits store the operation code:

::

  BPF_ADD   0x0  add
  BPF_SUB   0x1  subtract
  BPF_MUL   0x2  multiply
  BPF_DIV   0x3  divide
  BPF_OR    0x4  bitwise logical OR
  BPF_AND   0x5  bitwise logical AND
  BPF_LSH   0x6  left shift
  BPF_RSH   0x7  right shift (zero extended)
  BPF_NEG   0x8  arithmetic negation
  BPF_MOD   0x9  modulo
  BPF_XOR   0xa  bitwise logical XOR
  BPF_MOV   0xb  move register to register
  BPF_ARSH  0xc  right shift (sign extended)
  BPF_END   0xd  endianness conversion

If ``BPF_CLASS(code)`` is ``BPF_JMP``, ``BPF_OP(code)`` is one of:

::

  BPF_JA    0x0  unconditional jump
  BPF_JEQ   0x1  jump ==
  BPF_JGT   0x2  jump >
  BPF_JGE   0x3  jump >=
  BPF_JSET  0x4  jump if (DST & SRC)
  BPF_JNE   0x5  jump !=
  BPF_JSGT  0x6  jump signed >
  BPF_JSGE  0x7  jump signed >=
  BPF_CALL  0x8  function call
  BPF_EXIT  0x9  function return
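
Putting the three fields together, an arithmetic or jump code byte is
``(op << 4) | (source << 3) | class``, using the field values from the tables
above. A minimal sketch with hypothetical helper names; note that the Linux
kernel's own ``BPF_OP()`` and ``BPF_SRC()`` macros return the masked but
*unshifted* bits (for example, ``BPF_X`` is ``0x08`` there), whereas these
helpers return the field values as listed in this document.

.. code-block:: c++

  #include <cstdint>

  // Assemble an arithmetic/jump 'code' byte: bits 7-4 operation code,
  // bit 3 source, bits 2-0 instruction class.
  constexpr uint8_t encodeAluJmp(uint8_t op, uint8_t src, uint8_t cls) {
    return static_cast<uint8_t>((op << 4) | (src << 3) | cls);
  }

  // Extract the individual fields again.
  constexpr uint8_t bpfClass(uint8_t code) {
    return static_cast<uint8_t>(code & 0x07);
  }
  constexpr uint8_t bpfSrc(uint8_t code) {
    return static_cast<uint8_t>((code >> 3) & 0x01);
  }
  constexpr uint8_t bpfOp(uint8_t code) {
    return static_cast<uint8_t>(code >> 4);
  }

For instance, the program-exit instruction ``BPF_JMP | BPF_EXIT`` encodes to
``0x95``, and a 64-bit register-to-register move (``BPF_ALU64 | BPF_MOV |
BPF_X``) encodes to ``0xbf``.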

Instruction encoding (load, store)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For load and store instructions, the 8-bit 'code' field is divided as:

::

  +--------+--------+-------------------+
  | 3 bits | 2 bits |   3 bits          |
  |  mode  |  size  | instruction class |
  +--------+--------+-------------------+
  (MSB)                             (LSB)

The size modifier is one of:

::

  BPF_W       0x0  word (4 bytes)
  BPF_H       0x1  half word (2 bytes)
  BPF_B       0x2  byte
  BPF_DW      0x3  double word (8 bytes)

The mode modifier is one of:

::

  BPF_IMM     0x0  immediate
  BPF_ABS     0x1  used to access packet data
  BPF_IND     0x2  used to access packet data
  BPF_MEM     0x3  memory
  (reserved)  0x4
  (reserved)  0x5
  BPF_XADD    0x6  exclusive add
cb6b408dSAlexei Starovoitov
cb6b408dSAlexei Starovoitov
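An opcode is assembled by OR-ing a mode, a size and a class, each shifted into its field. In the sketch below, the ``MODE_*``, ``SIZE_*`` and ``CLASS_*`` names and ``ldst_encode`` are hypothetical. The class values 0x0 for BPF_LD and 0x3 for BPF_STX are an assumption taken from the eBPF instruction class encoding.

.. code-block:: c

  #include <stdint.h>

  enum { MODE_IMM = 0x0, MODE_ABS = 0x1, MODE_IND = 0x2, MODE_MEM = 0x3, MODE_XADD = 0x6 };
  enum { SIZE_W = 0x0, SIZE_H = 0x1, SIZE_B = 0x2, SIZE_DW = 0x3 };
  enum { CLASS_LD = 0x0, CLASS_STX = 0x3 }; /* assumed class values */

  /* mode -> bits 5-7, size -> bits 3-4, class -> bits 0-2 */
  uint8_t ldst_encode(unsigned mode, unsigned size, unsigned cls)
  {
      return (uint8_t)(((mode & 0x7) << 5) | ((size & 0x3) << 3) | (cls & 0x7));
  }

With these values, BPF_IND | BPF_W | BPF_LD encodes to 0x40, BPF_ABS | BPF_B | BPF_LD to 0x30, and BPF_XADD | BPF_DW | BPF_STX to 0xdb.
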
Packet data access (BPF_ABS, BPF_IND)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

eBPF has two non-generic instructions, (BPF_ABS | <size> | BPF_LD) and
(BPF_IND | <size> | BPF_LD), which are used to access packet data.
Register R6 is an implicit input that must contain a pointer to the sk_buff.
Register R0 is an implicit output that receives the data fetched from the
packet.  Registers R1-R5 are scratch registers and must not be used to hold
data across BPF_ABS | BPF_LD or BPF_IND | BPF_LD instructions.  These
instructions also have an implicit program exit condition: when an eBPF
program tries to access data beyond the packet boundary, the interpreter
aborts execution of the program.

BPF_IND | BPF_W | BPF_LD is equivalent to::

  R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))

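The semantics above can be modelled in a few lines of C. This is an illustrative sketch, not the kernel's implementation. ``struct pkt`` is a hypothetical stand-in for the packet fields of ``struct sk_buff``, and ``ld_ind_w`` is a hypothetical name. The sketch performs the bounds check whose failure terminates the program. It assembles the 32-bit value in network (big-endian) byte order, which is the result of ``ntohl`` applied to the raw load. Negative offsets are simply rejected.

.. code-block:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical stand-in for the packet fields of struct sk_buff. */
  struct pkt {
      const uint8_t *data;
      size_t len;
  };

  /* Models BPF_IND | BPF_W | BPF_LD. Returns false when the access would go
   * beyond the packet boundary (the program must then be aborted);
   * otherwise stores the 32-bit big-endian word in *r0. */
  bool ld_ind_w(const struct pkt *r6, uint64_t src_reg, int32_t imm32, uint64_t *r0)
  {
      /* Negative sums wrap to huge unsigned values and fail the check below. */
      uint64_t off = src_reg + (uint64_t)(int64_t)imm32;
      if (off > r6->len || r6->len - off < 4)
          return false;
      const uint8_t *p = r6->data + off;
      /* Same value as ntohl() of an unaligned 32-bit load at p. */
      *r0 = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
            ((uint32_t)p[2] << 8) | (uint32_t)p[3];
      return true;
  }
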
eBPF maps
^^^^^^^^^

eBPF maps are provided for sharing data between the kernel and user space.
The currently implemented types are hash and array, with potential extension
to support bloom filters, radix trees, etc.  A map is defined by its type,
maximum number of elements, key size and value size in bytes.  The eBPF
syscall supports create, update, find and delete operations on maps.

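To make the defining attributes concrete, here is a toy, single-process model of a map with create, update, find and delete operations. It is not the kernel implementation or the bpf(2) interface. All names are hypothetical, and lookup is a simple linear scan over fixed-size byte keys.

.. code-block:: c

  #include <stdbool.h>
  #include <stdlib.h>
  #include <string.h>

  /* Toy map: key size, value size and maximum number of elements are
   * fixed at creation time; the "type" is implicitly a hash-like map. */
  struct toy_map {
      size_t key_size, value_size, max_entries;
      unsigned char *keys, *values;
      bool *used;
  };

  void toy_map_destroy(struct toy_map *m)
  {
      if (m) { free(m->keys); free(m->values); free(m->used); free(m); }
  }

  struct toy_map *toy_map_create(size_t key_size, size_t value_size, size_t max_entries)
  {
      struct toy_map *m = calloc(1, sizeof *m);
      if (!m)
          return NULL;
      m->key_size = key_size;
      m->value_size = value_size;
      m->max_entries = max_entries;
      m->keys = calloc(max_entries, key_size);
      m->values = calloc(max_entries, value_size);
      m->used = calloc(max_entries, sizeof *m->used);
      if (!m->keys || !m->values || !m->used) {
          toy_map_destroy(m);
          return NULL;
      }
      return m;
  }

  /* Returns the slot holding key, or -1 if absent. */
  static long toy_map_slot(const struct toy_map *m, const void *key)
  {
      for (size_t i = 0; i < m->max_entries; i++)
          if (m->used[i] && memcmp(m->keys + i * m->key_size, key, m->key_size) == 0)
              return (long)i;
      return -1;
  }

  void *toy_map_find(struct toy_map *m, const void *key)
  {
      long i = toy_map_slot(m, key);
      return i < 0 ? NULL : m->values + (size_t)i * m->value_size;
  }

  /* Inserts or overwrites; fails once max_entries distinct keys are stored. */
  bool toy_map_update(struct toy_map *m, const void *key, const void *value)
  {
      long i = toy_map_slot(m, key);
      if (i < 0) {
          for (size_t j = 0; j < m->max_entries && i < 0; j++)
              if (!m->used[j])
                  i = (long)j;
          if (i < 0)
              return false;
          memcpy(m->keys + (size_t)i * m->key_size, key, m->key_size);
          m->used[i] = true;
      }
      memcpy(m->values + (size_t)i * m->value_size, value, m->value_size);
      return true;
  }

  bool toy_map_delete(struct toy_map *m, const void *key)
  {
      long i = toy_map_slot(m, key);
      if (i < 0)
          return false;
      m->used[i] = false;
      return true;
  }

Fixing the sizes at creation time lets every slot be addressed by simple arithmetic, which is the main reason the key size, value size and element count are part of a map's definition.
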
Function calls
^^^^^^^^^^^^^^

Function call arguments are passed using up to five registers (R1 - R5).
The return value is passed in a dedicated register (R0).  Four additional
registers (R6 - R9) are callee-saved, and the values in these registers
are preserved within kernel functions.  R0 - R5 are scratch registers within
kernel functions, and eBPF programs must therefore store/restore values in
these registers if needed across function calls.  The stack can be accessed
using the read-only frame pointer R10.  eBPF registers map 1:1 to hardware
registers on x86_64 and other 64-bit architectures.  For example, the x86_64
in-kernel JIT maps them as

::

  R0 - rax
  R1 - rdi
  R2 - rsi
  R3 - rdx
  R4 - rcx
  R5 - r8
  R6 - rbx
  R7 - r13
  R8 - r14
  R9 - r15
  R10 - rbp

since the x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing
and rbx, r12 - r15 are callee saved.

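The reasoning behind this mapping can be checked mechanically. The sketch below encodes the table and verifies two properties. R1-R5 must land, in order, in the first five x86_64 argument registers. R6-R9 must land in callee-saved registers. The arrays and function names are hypothetical, and the register lists are the ones given in the text above.

.. code-block:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <string.h>

  /* eBPF register Rn -> x86_64 register, per the table above. */
  static const char *const bpf_to_x86[11] = {
      "rax", "rdi", "rsi", "rdx", "rcx", "r8",   /* R0-R5 */
      "rbx", "r13", "r14", "r15",                /* R6-R9 */
      "rbp",                                     /* R10   */
  };

  /* x86_64 argument registers (in order) and callee-saved registers. */
  static const char *const x86_args[6] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
  static const char *const x86_callee_saved[5] = { "rbx", "r12", "r13", "r14", "r15" };

  static bool in_list(const char *reg, const char *const *list, size_t n)
  {
      for (size_t i = 0; i < n; i++)
          if (strcmp(reg, list[i]) == 0)
              return true;
      return false;
  }

  /* True if R1-R5 are the first five argument registers, in order,
   * and R6-R9 are all callee-saved. */
  bool mapping_is_consistent(void)
  {
      for (int r = 1; r <= 5; r++)
          if (strcmp(bpf_to_x86[r], x86_args[r - 1]) != 0)
              return false;
      for (int r = 6; r <= 9; r++)
          if (!in_list(bpf_to_x86[r], x86_callee_saved, 5))
              return false;
      return true;
  }

r9, the sixth x86_64 argument register, is left unused because eBPF passes at most five arguments.
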
Program start
^^^^^^^^^^^^^

An eBPF program receives a single argument and contains
a single eBPF main routine; the program does not contain eBPF functions.
Function calls are limited to a predefined set of kernel functions.  The size
of a program is limited to 4K instructions:  this ensures fast termination and
a limited number of kernel function calls.  Prior to running an eBPF program,
a verifier performs static analysis to prevent loops in the code and
to ensure valid register usage and operand types.

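As a rough illustration of the kind of check involved, the sketch below enforces the 4K-instruction limit. It also rejects any jump whose target is out of range or not strictly forward, which is one simple way to rule out loops. The real verifier is far more thorough; for example, it also tracks register state and operand types. ``struct insn`` and ``check_program`` are hypothetical. The sketch also assumes the jump encoding described earlier: the operation is in bits 4-7, the instruction class is in bits 0-2, and BPF_JMP = 0x5. It further assumes that a jump offset counts instructions relative to the next instruction.

.. code-block:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  #define MAX_INSNS 4096   /* the 4K-instruction limit */
  #define CLASS_JMP 0x5    /* assumed BPF_JMP class value */

  /* Hypothetical, simplified instruction: only the fields this check needs. */
  struct insn {
      uint8_t code;
      int16_t off;
  };

  /* Returns true if the program is within the size limit and every jump
   * (BPF_JA through BPF_JSGE) targets a later instruction in the program. */
  bool check_program(const struct insn *prog, size_t len)
  {
      if (len == 0 || len > MAX_INSNS)
          return false;
      for (size_t pc = 0; pc < len; pc++) {
          unsigned cls = prog[pc].code & 0x07;
          unsigned op = prog[pc].code >> 4;
          if (cls != CLASS_JMP || op > 0x7)
              continue;            /* not a jump with an offset (e.g. CALL, EXIT) */
          if (prog[pc].off < 0)
              return false;        /* backward jump: could form a loop */
          size_t target = pc + 1 + (size_t)prog[pc].off;
          if (target >= len)
              return false;        /* jump out of the program */
      }
      return true;
  }

Requiring strictly forward jumps makes the control-flow graph acyclic. Combined with the size limit, this bounds the number of instructions any single run can execute.
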
The AMDGPU backend
------------------

The AMDGPU code generator lives in the ``lib/Target/AMDGPU``
directory. This code generator is capable of targeting a variety of
AMD GPU processors. Refer to :doc:`AMDGPUUsage` for more information.