17daef91dSBill Wendling==========================================
27daef91dSBill WendlingThe LLVM Target-Independent Code Generator
37daef91dSBill Wendling==========================================
47daef91dSBill Wendling
5ff9feeb5SBill Wendling.. role:: raw-html(raw)
6ff9feeb5SBill Wendling   :format: html
7ff9feeb5SBill Wendling
8ff9feeb5SBill Wendling.. raw:: html
9ff9feeb5SBill Wendling
10ff9feeb5SBill Wendling  <style>
11ff9feeb5SBill Wendling    .unknown { background-color: #C0C0C0; text-align: center; }
12ff9feeb5SBill Wendling    .unknown:before { content: "?" }
13ff9feeb5SBill Wendling    .no { background-color: #C11B17 }
14ff9feeb5SBill Wendling    .no:before { content: "N" }
15ff9feeb5SBill Wendling    .partial { background-color: #F88017 }
16ff9feeb5SBill Wendling    .yes { background-color: #0F0; }
17ff9feeb5SBill Wendling    .yes:before { content: "Y" }
18ceab0deaSJustin Holewinski    .na { background-color: #6666FF; }
19ceab0deaSJustin Holewinski    .na:before { content: "N/A" }
20ff9feeb5SBill Wendling  </style>
21ff9feeb5SBill Wendling
22ff9feeb5SBill Wendling.. contents::
23ff9feeb5SBill Wendling   :local:
24ff9feeb5SBill Wendling
25ff9feeb5SBill Wendling.. warning::
26ff9feeb5SBill Wendling  This is a work in progress.
27ff9feeb5SBill Wendling
28ff9feeb5SBill WendlingIntroduction
29ff9feeb5SBill Wendling============
30ff9feeb5SBill Wendling
31ff9feeb5SBill WendlingThe LLVM target-independent code generator is a framework that provides a suite
32ff9feeb5SBill Wendlingof reusable components for translating the LLVM internal representation to the
33ff9feeb5SBill Wendlingmachine code for a specified target---either in assembly form (suitable for a
34ff9feeb5SBill Wendlingstatic compiler) or in binary machine code format (usable for a JIT
35ff9feeb5SBill Wendlingcompiler). The LLVM target-independent code generator consists of six main
36ff9feeb5SBill Wendlingcomponents:
37ff9feeb5SBill Wendling
38ff9feeb5SBill Wendling1. `Abstract target description`_ interfaces which capture important properties
39ff9feeb5SBill Wendling   about various aspects of the machine, independently of how they will be used.
40ff9feeb5SBill Wendling   These interfaces are defined in ``include/llvm/Target/``.
41ff9feeb5SBill Wendling
42ff9feeb5SBill Wendling2. Classes used to represent the `code being generated`_ for a target.  These
43ff9feeb5SBill Wendling   classes are intended to be abstract enough to represent the machine code for
44ff9feeb5SBill Wendling   *any* target machine.  These classes are defined in
45ff9feeb5SBill Wendling   ``include/llvm/CodeGen/``. At this level, concepts like "constant pool
46ff9feeb5SBill Wendling   entries" and "jump tables" are explicitly exposed.
47ff9feeb5SBill Wendling
4820adbdd2SJustin Lebar3. Classes and algorithms used to represent code at the object file level, the
49ff9feeb5SBill Wendling   `MC Layer`_.  These classes represent assembly level constructs like labels,
50ff9feeb5SBill Wendling   sections, and instructions.  At this level, concepts like "constant pool
51ff9feeb5SBill Wendling   entries" and "jump tables" don't exist.
52ff9feeb5SBill Wendling
53ff9feeb5SBill Wendling4. `Target-independent algorithms`_ used to implement various phases of native
54ff9feeb5SBill Wendling   code generation (register allocation, scheduling, stack frame representation,
55ff9feeb5SBill Wendling   etc).  This code lives in ``lib/CodeGen/``.
56ff9feeb5SBill Wendling
57ff9feeb5SBill Wendling5. `Implementations of the abstract target description interfaces`_ for
58ff9feeb5SBill Wendling   particular targets.  These machine descriptions make use of the components
59ff9feeb5SBill Wendling   provided by LLVM, and can optionally provide custom target-specific passes,
60ff9feeb5SBill Wendling   to build complete code generators for a specific target.  Target descriptions
61ff9feeb5SBill Wendling   live in ``lib/Target/``.
62ff9feeb5SBill Wendling
63ff9feeb5SBill Wendling6. The target-independent JIT components.  The LLVM JIT is completely target
64ff9feeb5SBill Wendling   independent (it uses the ``TargetJITInfo`` structure to interface for
65ff9feeb5SBill Wendling   target-specific issues.  The code for the target-independent JIT lives in
66ff9feeb5SBill Wendling   ``lib/ExecutionEngine/JIT``.
67ff9feeb5SBill Wendling
68ff9feeb5SBill WendlingDepending on which part of the code generator you are interested in working on,
69ff9feeb5SBill Wendlingdifferent pieces of this will be useful to you.  In any case, you should be
70ff9feeb5SBill Wendlingfamiliar with the `target description`_ and `machine code representation`_
71ff9feeb5SBill Wendlingclasses.  If you want to add a backend for a new target, you will need to
72ff9feeb5SBill Wendling`implement the target description`_ classes for your new target and understand
731703e705SSean Silvathe :doc:`LLVM code representation <LangRef>`.  If you are interested in
74ff9feeb5SBill Wendlingimplementing a new `code generation algorithm`_, it should only depend on the
75ff9feeb5SBill Wendlingtarget-description and machine code representation classes, ensuring that it is
76ff9feeb5SBill Wendlingportable.
77ff9feeb5SBill Wendling
78ff9feeb5SBill WendlingRequired components in the code generator
79ff9feeb5SBill Wendling-----------------------------------------
80ff9feeb5SBill Wendling
81ff9feeb5SBill WendlingThe two pieces of the LLVM code generator are the high-level interface to the
82ff9feeb5SBill Wendlingcode generator and the set of reusable components that can be used to build
83ff9feeb5SBill Wendlingtarget-specific backends.  The two most important interfaces (:raw-html:`<tt>`
849cfc13d4SMicah Villmow`TargetMachine`_ :raw-html:`</tt>` and :raw-html:`<tt>` `DataLayout`_
85ff9feeb5SBill Wendling:raw-html:`</tt>`) are the only ones that are required to be defined for a
86ff9feeb5SBill Wendlingbackend to fit into the LLVM system, but the others must be defined if the
87ff9feeb5SBill Wendlingreusable code generator components are going to be used.
88ff9feeb5SBill Wendling
89ff9feeb5SBill WendlingThis design has two important implications.  The first is that LLVM can support
90ff9feeb5SBill Wendlingcompletely non-traditional code generation targets.  For example, the C backend
91ff9feeb5SBill Wendlingdoes not require register allocation, instruction selection, or any of the other
92ff9feeb5SBill Wendlingstandard components provided by the system.  As such, it only implements these
93ff9feeb5SBill Wendlingtwo interfaces, and does its own thing. Note that C backend was removed from the
94ff9feeb5SBill Wendlingtrunk since LLVM 3.1 release. Another example of a code generator like this is a
95ff9feeb5SBill Wendling(purely hypothetical) backend that converts LLVM to the GCC RTL form and uses
96ff9feeb5SBill WendlingGCC to emit machine code for a target.
97ff9feeb5SBill Wendling
98ff9feeb5SBill WendlingThis design also implies that it is possible to design and implement radically
99ff9feeb5SBill Wendlingdifferent code generators in the LLVM system that do not make use of any of the
100ff9feeb5SBill Wendlingbuilt-in components.  Doing so is not recommended at all, but could be required
101ff9feeb5SBill Wendlingfor radically different targets that do not fit into the LLVM machine
102ff9feeb5SBill Wendlingdescription model: FPGAs for example.
103ff9feeb5SBill Wendling
104ff9feeb5SBill Wendling.. _high-level design of the code generator:
105ff9feeb5SBill Wendling
106ff9feeb5SBill WendlingThe high-level design of the code generator
107ff9feeb5SBill Wendling-------------------------------------------
108ff9feeb5SBill Wendling
109ff9feeb5SBill WendlingThe LLVM target-independent code generator is designed to support efficient and
110ff9feeb5SBill Wendlingquality code generation for standard register-based microprocessors.  Code
111ff9feeb5SBill Wendlinggeneration in this model is divided into the following stages:
112ff9feeb5SBill Wendling
113ff9feeb5SBill Wendling1. `Instruction Selection`_ --- This phase determines an efficient way to
114ff9feeb5SBill Wendling   express the input LLVM code in the target instruction set.  This stage
115ff9feeb5SBill Wendling   produces the initial code for the program in the target instruction set, then
116ff9feeb5SBill Wendling   makes use of virtual registers in SSA form and physical registers that
117ff9feeb5SBill Wendling   represent any required register assignments due to target constraints or
118ff9feeb5SBill Wendling   calling conventions.  This step turns the LLVM code into a DAG of target
119ff9feeb5SBill Wendling   instructions.
120ff9feeb5SBill Wendling
121ff9feeb5SBill Wendling2. `Scheduling and Formation`_ --- This phase takes the DAG of target
122ff9feeb5SBill Wendling   instructions produced by the instruction selection phase, determines an
123ff9feeb5SBill Wendling   ordering of the instructions, then emits the instructions as :raw-html:`<tt>`
124ff9feeb5SBill Wendling   `MachineInstr`_\s :raw-html:`</tt>` with that ordering.  Note that we
125ff9feeb5SBill Wendling   describe this in the `instruction selection section`_ because it operates on
126ff9feeb5SBill Wendling   a `SelectionDAG`_.
127ff9feeb5SBill Wendling
128ff9feeb5SBill Wendling3. `SSA-based Machine Code Optimizations`_ --- This optional stage consists of a
129ff9feeb5SBill Wendling   series of machine-code optimizations that operate on the SSA-form produced by
130ff9feeb5SBill Wendling   the instruction selector.  Optimizations like modulo-scheduling or peephole
131ff9feeb5SBill Wendling   optimization work here.
132ff9feeb5SBill Wendling
133ff9feeb5SBill Wendling4. `Register Allocation`_ --- The target code is transformed from an infinite
134ff9feeb5SBill Wendling   virtual register file in SSA form to the concrete register file used by the
135ff9feeb5SBill Wendling   target.  This phase introduces spill code and eliminates all virtual register
136ff9feeb5SBill Wendling   references from the program.
137ff9feeb5SBill Wendling
138ff9feeb5SBill Wendling5. `Prolog/Epilog Code Insertion`_ --- Once the machine code has been generated
139ff9feeb5SBill Wendling   for the function and the amount of stack space required is known (used for
140ff9feeb5SBill Wendling   LLVM alloca's and spill slots), the prolog and epilog code for the function
141ff9feeb5SBill Wendling   can be inserted and "abstract stack location references" can be eliminated.
142ff9feeb5SBill Wendling   This stage is responsible for implementing optimizations like frame-pointer
143ff9feeb5SBill Wendling   elimination and stack packing.
144ff9feeb5SBill Wendling
145ff9feeb5SBill Wendling6. `Late Machine Code Optimizations`_ --- Optimizations that operate on "final"
146ff9feeb5SBill Wendling   machine code can go here, such as spill code scheduling and peephole
147ff9feeb5SBill Wendling   optimizations.
148ff9feeb5SBill Wendling
149ff9feeb5SBill Wendling7. `Code Emission`_ --- The final stage actually puts out the code for the
150ff9feeb5SBill Wendling   current function, either in the target assembler format or in machine
151ff9feeb5SBill Wendling   code.
152ff9feeb5SBill Wendling
153ff9feeb5SBill WendlingThe code generator is based on the assumption that the instruction selector will
154ff9feeb5SBill Wendlinguse an optimal pattern matching selector to create high-quality sequences of
155ff9feeb5SBill Wendlingnative instructions.  Alternative code generator designs based on pattern
156ff9feeb5SBill Wendlingexpansion and aggressive iterative peephole optimization are much slower.  This
157ff9feeb5SBill Wendlingdesign permits efficient compilation (important for JIT environments) and
158ff9feeb5SBill Wendlingaggressive optimization (used when generating code offline) by allowing
159ff9feeb5SBill Wendlingcomponents of varying levels of sophistication to be used for any step of
160ff9feeb5SBill Wendlingcompilation.
161ff9feeb5SBill Wendling
162ff9feeb5SBill WendlingIn addition to these stages, target implementations can insert arbitrary
163ff9feeb5SBill Wendlingtarget-specific passes into the flow.  For example, the X86 target uses a
164ff9feeb5SBill Wendlingspecial pass to handle the 80x87 floating point stack architecture.  Other
165ff9feeb5SBill Wendlingtargets with unusual requirements can be supported with custom passes as needed.
166ff9feeb5SBill Wendling
167ff9feeb5SBill WendlingUsing TableGen for target description
168ff9feeb5SBill Wendling-------------------------------------
169ff9feeb5SBill Wendling
170ff9feeb5SBill WendlingThe target description classes require a detailed description of the target
171ff9feeb5SBill Wendlingarchitecture.  These target descriptions often have a large amount of common
172ff9feeb5SBill Wendlinginformation (e.g., an ``add`` instruction is almost identical to a ``sub``
173ff9feeb5SBill Wendlinginstruction).  In order to allow the maximum amount of commonality to be
174ff9feeb5SBill Wendlingfactored out, the LLVM code generator uses the
175397ee6ecSSean Silva:doc:`TableGen/index` tool to describe big chunks of the
176ff9feeb5SBill Wendlingtarget machine, which allows the use of domain-specific and target-specific
177ff9feeb5SBill Wendlingabstractions to reduce the amount of repetition.
178ff9feeb5SBill Wendling
179ff9feeb5SBill WendlingAs LLVM continues to be developed and refined, we plan to move more and more of
180ff9feeb5SBill Wendlingthe target description to the ``.td`` form.  Doing so gives us a number of
181ff9feeb5SBill Wendlingadvantages.  The most important is that it makes it easier to port LLVM because
182ff9feeb5SBill Wendlingit reduces the amount of C++ code that has to be written, and the surface area
183ff9feeb5SBill Wendlingof the code generator that needs to be understood before someone can get
184ff9feeb5SBill Wendlingsomething working.  Second, it makes it easier to change things. In particular,
185ff9feeb5SBill Wendlingif tables and other things are all emitted by ``tblgen``, we only need a change
186ff9feeb5SBill Wendlingin one place (``tblgen``) to update all of the targets to a new interface.
187ff9feeb5SBill Wendling
188ff9feeb5SBill Wendling.. _Abstract target description:
189ff9feeb5SBill Wendling.. _target description:
190ff9feeb5SBill Wendling
191ff9feeb5SBill WendlingTarget description classes
192ff9feeb5SBill Wendling==========================
193ff9feeb5SBill Wendling
194ff9feeb5SBill WendlingThe LLVM target description classes (located in the ``include/llvm/Target``
195ff9feeb5SBill Wendlingdirectory) provide an abstract description of the target machine independent of
196ff9feeb5SBill Wendlingany particular client.  These classes are designed to capture the *abstract*
197ff9feeb5SBill Wendlingproperties of the target (such as the instructions and registers it has), and do
198ff9feeb5SBill Wendlingnot incorporate any particular pieces of code generation algorithms.
199ff9feeb5SBill Wendling
2009cfc13d4SMicah VillmowAll of the target description classes (except the :raw-html:`<tt>` `DataLayout`_
201ff9feeb5SBill Wendling:raw-html:`</tt>` class) are designed to be subclassed by the concrete target
202ff9feeb5SBill Wendlingimplementation, and have virtual methods implemented.  To get to these
203ff9feeb5SBill Wendlingimplementations, the :raw-html:`<tt>` `TargetMachine`_ :raw-html:`</tt>` class
204ff9feeb5SBill Wendlingprovides accessors that should be implemented by the target.
205ff9feeb5SBill Wendling
206ff9feeb5SBill Wendling.. _TargetMachine:
207ff9feeb5SBill Wendling
208ff9feeb5SBill WendlingThe ``TargetMachine`` class
209ff9feeb5SBill Wendling---------------------------
210ff9feeb5SBill Wendling
211ff9feeb5SBill WendlingThe ``TargetMachine`` class provides virtual methods that are used to access the
212ff9feeb5SBill Wendlingtarget-specific implementations of the various target description classes via
213ff9feeb5SBill Wendlingthe ``get*Info`` methods (``getInstrInfo``, ``getRegisterInfo``,
214ff9feeb5SBill Wendling``getFrameInfo``, etc.).  This class is designed to be specialized by a concrete
215ff9feeb5SBill Wendlingtarget implementation (e.g., ``X86TargetMachine``) which implements the various
216ff9feeb5SBill Wendlingvirtual methods.  The only required target description class is the
2179cfc13d4SMicah Villmow:raw-html:`<tt>` `DataLayout`_ :raw-html:`</tt>` class, but if the code
218ff9feeb5SBill Wendlinggenerator components are to be used, the other interfaces should be implemented
219ff9feeb5SBill Wendlingas well.
220ff9feeb5SBill Wendling
2219cfc13d4SMicah Villmow.. _DataLayout:
222ff9feeb5SBill Wendling
2239cfc13d4SMicah VillmowThe ``DataLayout`` class
224ff9feeb5SBill Wendling------------------------
225ff9feeb5SBill Wendling
2269cfc13d4SMicah VillmowThe ``DataLayout`` class is the only required target description class, and it
22713539d1bSDmitri Gribenkois the only class that is not extensible (you cannot derive a new class from
2289cfc13d4SMicah Villmowit).  ``DataLayout`` specifies information about how the target lays out memory
229ff9feeb5SBill Wendlingfor structures, the alignment requirements for various data types, the size of
230ff9feeb5SBill Wendlingpointers in the target, and whether the target is little-endian or
231ff9feeb5SBill Wendlingbig-endian.
232ff9feeb5SBill Wendling
2338bd389d8SDmitri Gribenko.. _TargetLowering:
234ff9feeb5SBill Wendling
235ff9feeb5SBill WendlingThe ``TargetLowering`` class
236ff9feeb5SBill Wendling----------------------------
237ff9feeb5SBill Wendling
238ff9feeb5SBill WendlingThe ``TargetLowering`` class is used by SelectionDAG based instruction selectors
239ff9feeb5SBill Wendlingprimarily to describe how LLVM code should be lowered to SelectionDAG
240ff9feeb5SBill Wendlingoperations.  Among other things, this class indicates:
241ff9feeb5SBill Wendling
242ff9feeb5SBill Wendling* an initial register class to use for various ``ValueType``\s,
243ff9feeb5SBill Wendling
244ff9feeb5SBill Wendling* which operations are natively supported by the target machine,
245ff9feeb5SBill Wendling
246ff9feeb5SBill Wendling* the return type of ``setcc`` operations,
247ff9feeb5SBill Wendling
248ff9feeb5SBill Wendling* the type to use for shift amounts, and
249ff9feeb5SBill Wendling
250ff9feeb5SBill Wendling* various high-level characteristics, like whether it is profitable to turn
25113539d1bSDmitri Gribenko  division by a constant into a multiplication sequence.
252ff9feeb5SBill Wendling
2537174c5a0SDmitri Gribenko.. _TargetRegisterInfo:
2547174c5a0SDmitri Gribenko
255ff9feeb5SBill WendlingThe ``TargetRegisterInfo`` class
256ff9feeb5SBill Wendling--------------------------------
257ff9feeb5SBill Wendling
258ff9feeb5SBill WendlingThe ``TargetRegisterInfo`` class is used to describe the register file of the
259ff9feeb5SBill Wendlingtarget and any interactions between the registers.
260ff9feeb5SBill Wendling
26170f4e794SEli BenderskyRegisters are represented in the code generator by unsigned integers.  Physical
26270f4e794SEli Benderskyregisters (those that actually exist in the target description) are unique
26370f4e794SEli Benderskysmall numbers, and virtual registers are generally large.  Note that
26470f4e794SEli Benderskyregister ``#0`` is reserved as a flag value.
265ff9feeb5SBill Wendling
266ff9feeb5SBill WendlingEach register in the processor description has an associated
267ff9feeb5SBill Wendling``TargetRegisterDesc`` entry, which provides a textual name for the register
268ff9feeb5SBill Wendling(used for assembly output and debugging dumps) and a set of aliases (used to
269ff9feeb5SBill Wendlingindicate whether one register overlaps with another).
270ff9feeb5SBill Wendling
271ff9feeb5SBill WendlingIn addition to the per-register description, the ``TargetRegisterInfo`` class
272ff9feeb5SBill Wendlingexposes a set of processor specific register classes (instances of the
273ff9feeb5SBill Wendling``TargetRegisterClass`` class).  Each register class contains sets of registers
274ff9feeb5SBill Wendlingthat have the same properties (for example, they are all 32-bit integer
275ff9feeb5SBill Wendlingregisters).  Each SSA virtual register created by the instruction selector has
276ff9feeb5SBill Wendlingan associated register class.  When the register allocator runs, it replaces
277ff9feeb5SBill Wendlingvirtual registers with a physical register in the set.
278ff9feeb5SBill Wendling
279ff9feeb5SBill WendlingThe target-specific implementations of these classes is auto-generated from a
280397ee6ecSSean Silva:doc:`TableGen/index` description of the register file.
281ff9feeb5SBill Wendling
282ff9feeb5SBill Wendling.. _TargetInstrInfo:
283ff9feeb5SBill Wendling
284ff9feeb5SBill WendlingThe ``TargetInstrInfo`` class
285ff9feeb5SBill Wendling-----------------------------
286ff9feeb5SBill Wendling
287ff9feeb5SBill WendlingThe ``TargetInstrInfo`` class is used to describe the machine instructions
2880b85ba7dSEli Benderskysupported by the target.  Descriptions define things like the mnemonic for
2890b85ba7dSEli Benderskythe opcode, the number of operands, the list of implicit register uses and defs,
2900b85ba7dSEli Benderskywhether the instruction has certain target-independent properties (accesses
2910b85ba7dSEli Benderskymemory, is commutable, etc), and holds any target-specific flags.
292ff9feeb5SBill Wendling
2934b916b21SNico WeberThe ``TargetFrameLowering`` class
2944773d0b4SDan Liew---------------------------------
295ff9feeb5SBill Wendling
2964b916b21SNico WeberThe ``TargetFrameLowering`` class is used to provide information about the stack
297ff9feeb5SBill Wendlingframe layout of the target. It holds the direction of stack growth, the known
298ff9feeb5SBill Wendlingstack alignment on entry to each function, and the offset to the local area.
299ff9feeb5SBill WendlingThe offset to the local area is the offset from the stack pointer on function
300ff9feeb5SBill Wendlingentry to the first location where function data (local variables, spill
301ff9feeb5SBill Wendlinglocations) can be stored.
302ff9feeb5SBill Wendling
303ff9feeb5SBill WendlingThe ``TargetSubtarget`` class
304ff9feeb5SBill Wendling-----------------------------
305ff9feeb5SBill Wendling
306ff9feeb5SBill WendlingThe ``TargetSubtarget`` class is used to provide information about the specific
307ff9feeb5SBill Wendlingchip set being targeted.  A sub-target informs code generation of which
308ff9feeb5SBill Wendlinginstructions are supported, instruction latencies and instruction execution
309ff9feeb5SBill Wendlingitinerary; i.e., which processing units are used, in what order, and for how
310ff9feeb5SBill Wendlinglong.
311ff9feeb5SBill Wendling
312ff9feeb5SBill WendlingThe ``TargetJITInfo`` class
313ff9feeb5SBill Wendling---------------------------
314ff9feeb5SBill Wendling
315ff9feeb5SBill WendlingThe ``TargetJITInfo`` class exposes an abstract interface used by the
316ff9feeb5SBill WendlingJust-In-Time code generator to perform target-specific activities, such as
317ff9feeb5SBill Wendlingemitting stubs.  If a ``TargetMachine`` supports JIT code generation, it should
318ff9feeb5SBill Wendlingprovide one of these objects through the ``getJITInfo`` method.
319ff9feeb5SBill Wendling
320ff9feeb5SBill Wendling.. _code being generated:
321ff9feeb5SBill Wendling.. _machine code representation:
322ff9feeb5SBill Wendling
323ff9feeb5SBill WendlingMachine code description classes
324ff9feeb5SBill Wendling================================
325ff9feeb5SBill Wendling
326ff9feeb5SBill WendlingAt the high-level, LLVM code is translated to a machine specific representation
327ff9feeb5SBill Wendlingformed out of :raw-html:`<tt>` `MachineFunction`_ :raw-html:`</tt>`,
328ff9feeb5SBill Wendling:raw-html:`<tt>` `MachineBasicBlock`_ :raw-html:`</tt>`, and :raw-html:`<tt>`
329ff9feeb5SBill Wendling`MachineInstr`_ :raw-html:`</tt>` instances (defined in
330ff9feeb5SBill Wendling``include/llvm/CodeGen``).  This representation is completely target agnostic,
331ff9feeb5SBill Wendlingrepresenting instructions in their most abstract form: an opcode and a series of
332ff9feeb5SBill Wendlingoperands.  This representation is designed to support both an SSA representation
333ff9feeb5SBill Wendlingfor machine code, as well as a register allocated, non-SSA form.
334ff9feeb5SBill Wendling
335ff9feeb5SBill Wendling.. _MachineInstr:
336ff9feeb5SBill Wendling
337ff9feeb5SBill WendlingThe ``MachineInstr`` class
338ff9feeb5SBill Wendling--------------------------
339ff9feeb5SBill Wendling
340ff9feeb5SBill WendlingTarget machine instructions are represented as instances of the ``MachineInstr``
341ff9feeb5SBill Wendlingclass.  This class is an extremely abstract way of representing machine
342ff9feeb5SBill Wendlinginstructions.  In particular, it only keeps track of an opcode number and a set
343ff9feeb5SBill Wendlingof operands.
344ff9feeb5SBill Wendling
345ff9feeb5SBill WendlingThe opcode number is a simple unsigned integer that only has meaning to a
346ff9feeb5SBill Wendlingspecific backend.  All of the instructions for a target should be defined in the
347ff9feeb5SBill Wendling``*InstrInfo.td`` file for the target. The opcode enum values are auto-generated
348ff9feeb5SBill Wendlingfrom this description.  The ``MachineInstr`` class does not have any information
349ff9feeb5SBill Wendlingabout how to interpret the instruction (i.e., what the semantics of the
350ff9feeb5SBill Wendlinginstruction are); for that you must refer to the :raw-html:`<tt>`
351ff9feeb5SBill Wendling`TargetInstrInfo`_ :raw-html:`</tt>` class.
352ff9feeb5SBill Wendling
353ff9feeb5SBill WendlingThe operands of a machine instruction can be of several different types: a
354ff9feeb5SBill Wendlingregister reference, a constant integer, a basic block reference, etc.  In
355ff9feeb5SBill Wendlingaddition, a machine operand should be marked as a def or a use of the value
356ff9feeb5SBill Wendling(though only registers are allowed to be defs).
357ff9feeb5SBill Wendling
358ff9feeb5SBill WendlingBy convention, the LLVM code generator orders instruction operands so that all
359ff9feeb5SBill Wendlingregister definitions come before the register uses, even on architectures that
360ff9feeb5SBill Wendlingare normally printed in other orders.  For example, the SPARC add instruction:
361ff9feeb5SBill Wendling"``add %i1, %i2, %i3``" adds the "%i1", and "%i2" registers and stores the
362ff9feeb5SBill Wendlingresult into the "%i3" register.  In the LLVM code generator, the operands should
363ff9feeb5SBill Wendlingbe stored as "``%i3, %i1, %i2``": with the destination first.
364ff9feeb5SBill Wendling
365ff9feeb5SBill WendlingKeeping destination (definition) operands at the beginning of the operand list
366ff9feeb5SBill Wendlinghas several advantages.  In particular, the debugging printer will print the
367ff9feeb5SBill Wendlinginstruction like this:
368ff9feeb5SBill Wendling
369ff9feeb5SBill Wendling.. code-block:: llvm
370ff9feeb5SBill Wendling
371ff9feeb5SBill Wendling  %r3 = add %i1, %i2
372ff9feeb5SBill Wendling
373ff9feeb5SBill WendlingAlso if the first operand is a def, it is easier to `create instructions`_ whose
374ff9feeb5SBill Wendlingonly def is the first operand.
375ff9feeb5SBill Wendling
376ff9feeb5SBill Wendling.. _create instructions:
377ff9feeb5SBill Wendling
378ff9feeb5SBill WendlingUsing the ``MachineInstrBuilder.h`` functions
379ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
380ff9feeb5SBill Wendling
381ff9feeb5SBill WendlingMachine instructions are created by using the ``BuildMI`` functions, located in
382ff9feeb5SBill Wendlingthe ``include/llvm/CodeGen/MachineInstrBuilder.h`` file.  The ``BuildMI``
383ff9feeb5SBill Wendlingfunctions make it easy to build arbitrary machine instructions.  Usage of the
384ff9feeb5SBill Wendling``BuildMI`` functions look like this:
385ff9feeb5SBill Wendling
386ff9feeb5SBill Wendling.. code-block:: c++
387ff9feeb5SBill Wendling
388ff9feeb5SBill Wendling  // Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
3893da1078aSKrzysztof Parzyszek  // instruction and insert it at the end of the given MachineBasicBlock.
3903da1078aSKrzysztof Parzyszek  const TargetInstrInfo &TII = ...
39137f92c74SDmitri Gribenko  MachineBasicBlock &MBB = ...
3923da1078aSKrzysztof Parzyszek  DebugLoc DL;
3933da1078aSKrzysztof Parzyszek  MachineInstr *MI = BuildMI(MBB, DL, TII.get(X86::MOV32ri), DestReg).addImm(42);
394ff9feeb5SBill Wendling
395ff9feeb5SBill Wendling  // Create the same instr, but insert it before a specified iterator point.
396ff9feeb5SBill Wendling  MachineBasicBlock::iterator MBBI = ...
3973da1078aSKrzysztof Parzyszek  BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), DestReg).addImm(42);
398ff9feeb5SBill Wendling
399ff9feeb5SBill Wendling  // Create a 'cmp Reg, 0' instruction, no destination reg.
4003da1078aSKrzysztof Parzyszek  MI = BuildMI(MBB, DL, TII.get(X86::CMP32ri8)).addReg(Reg).addImm(42);
401ff9feeb5SBill Wendling
402ff9feeb5SBill Wendling  // Create an 'sahf' instruction which takes no operands and stores nothing.
4033da1078aSKrzysztof Parzyszek  MI = BuildMI(MBB, DL, TII.get(X86::SAHF));
404ff9feeb5SBill Wendling
405ff9feeb5SBill Wendling  // Create a self looping branch instruction.
4063da1078aSKrzysztof Parzyszek  BuildMI(MBB, DL, TII.get(X86::JNE)).addMBB(&MBB);
407ff9feeb5SBill Wendling
4083da1078aSKrzysztof ParzyszekIf you need to add a definition operand (other than the optional destination
4093da1078aSKrzysztof Parzyszekregister), you must explicitly mark it as such:
410ff9feeb5SBill Wendling
411ff9feeb5SBill Wendling.. code-block:: c++
412ff9feeb5SBill Wendling
413ff9feeb5SBill Wendling  MI.addReg(Reg, RegState::Define);
414ff9feeb5SBill Wendling
415ff9feeb5SBill WendlingFixed (preassigned) registers
416ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
417ff9feeb5SBill Wendling
418ff9feeb5SBill WendlingOne important issue that the code generator needs to be aware of is the presence
419ff9feeb5SBill Wendlingof fixed registers.  In particular, there are often places in the instruction
420ff9feeb5SBill Wendlingstream where the register allocator *must* arrange for a particular value to be
421ff9feeb5SBill Wendlingin a particular register.  This can occur due to limitations of the instruction
422ff9feeb5SBill Wendlingset (e.g., the X86 can only do a 32-bit divide with the ``EAX``/``EDX``
423ff9feeb5SBill Wendlingregisters), or external factors like calling conventions.  In any case, the
424ff9feeb5SBill Wendlinginstruction selector should emit code that copies a virtual register into or out
425ff9feeb5SBill Wendlingof a physical register when needed.
426ff9feeb5SBill Wendling
427ff9feeb5SBill WendlingFor example, consider this simple LLVM example:
428ff9feeb5SBill Wendling
429ff9feeb5SBill Wendling.. code-block:: llvm
430ff9feeb5SBill Wendling
431ff9feeb5SBill Wendling  define i32 @test(i32 %X, i32 %Y) {
432cdc53956STim Northover    %Z = sdiv i32 %X, %Y
433ff9feeb5SBill Wendling    ret i32 %Z
434ff9feeb5SBill Wendling  }
435ff9feeb5SBill Wendling
436cdc53956STim NorthoverThe X86 instruction selector might produce this machine code for the ``div`` and
437cdc53956STim Northover``ret``:
438ff9feeb5SBill Wendling
439124f2593SRenato Golin.. code-block:: text
440ff9feeb5SBill Wendling
441ff9feeb5SBill Wendling  ;; Start of div
442ff9feeb5SBill Wendling  %EAX = mov %reg1024           ;; Copy X (in reg1024) into EAX
443ff9feeb5SBill Wendling  %reg1027 = sar %reg1024, 31
444ff9feeb5SBill Wendling  %EDX = mov %reg1027           ;; Sign extend X into EDX
445ff9feeb5SBill Wendling  idiv %reg1025                 ;; Divide by Y (in reg1025)
446ff9feeb5SBill Wendling  %reg1026 = mov %EAX           ;; Read the result (Z) out of EAX
447ff9feeb5SBill Wendling
448ff9feeb5SBill Wendling  ;; Start of ret
449ff9feeb5SBill Wendling  %EAX = mov %reg1026           ;; 32-bit return value goes in EAX
450ff9feeb5SBill Wendling  ret
451ff9feeb5SBill Wendling
452cdc53956STim NorthoverBy the end of code generation, the register allocator would coalesce the
453cdc53956STim Northoverregisters and delete the resultant identity moves producing the following
454ff9feeb5SBill Wendlingcode:
455ff9feeb5SBill Wendling
456124f2593SRenato Golin.. code-block:: text
457ff9feeb5SBill Wendling
458ff9feeb5SBill Wendling  ;; X is in EAX, Y is in ECX
459ff9feeb5SBill Wendling  mov %EAX, %EDX
460ff9feeb5SBill Wendling  sar %EDX, 31
461ff9feeb5SBill Wendling  idiv %ECX
462ff9feeb5SBill Wendling  ret
463ff9feeb5SBill Wendling
464ff9feeb5SBill WendlingThis approach is extremely general (if it can handle the X86 architecture, it
465ff9feeb5SBill Wendlingcan handle anything!) and allows all of the target specific knowledge about the
466ff9feeb5SBill Wendlinginstruction stream to be isolated in the instruction selector.  Note that
467ff9feeb5SBill Wendlingphysical registers should have a short lifetime for good code generation, and
468ff9feeb5SBill Wendlingall physical registers are assumed dead on entry to and exit from basic blocks
469ff9feeb5SBill Wendling(before register allocation).  Thus, if you need a value to be live across basic
470ff9feeb5SBill Wendlingblock boundaries, it *must* live in a virtual register.
471ff9feeb5SBill Wendling
472ff9feeb5SBill WendlingCall-clobbered registers
473ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^
474ff9feeb5SBill Wendling
475ff9feeb5SBill WendlingSome machine instructions, like calls, clobber a large number of physical
476ff9feeb5SBill Wendlingregisters.  Rather than adding ``<def,dead>`` operands for all of them, it is
477ff9feeb5SBill Wendlingpossible to use an ``MO_RegisterMask`` operand instead.  The register mask
478ff9feeb5SBill Wendlingoperand holds a bit mask of preserved registers, and everything else is
479ff9feeb5SBill Wendlingconsidered to be clobbered by the instruction.
480ff9feeb5SBill Wendling
481ff9feeb5SBill WendlingMachine code in SSA form
482ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^
483ff9feeb5SBill Wendling
484ff9feeb5SBill Wendling``MachineInstr``'s are initially selected in SSA-form, and are maintained in
485ff9feeb5SBill WendlingSSA-form until register allocation happens.  For the most part, this is
486ff9feeb5SBill Wendlingtrivially simple since LLVM is already in SSA form; LLVM PHI nodes become
487ff9feeb5SBill Wendlingmachine code PHI nodes, and virtual registers are only allowed to have a single
488ff9feeb5SBill Wendlingdefinition.
489ff9feeb5SBill Wendling
490ff9feeb5SBill WendlingAfter register allocation, machine code is no longer in SSA-form because there
491ff9feeb5SBill Wendlingare no virtual registers left in the code.
492ff9feeb5SBill Wendling
493ff9feeb5SBill Wendling.. _MachineBasicBlock:
494ff9feeb5SBill Wendling
495ff9feeb5SBill WendlingThe ``MachineBasicBlock`` class
496ff9feeb5SBill Wendling-------------------------------
497ff9feeb5SBill Wendling
498ff9feeb5SBill WendlingThe ``MachineBasicBlock`` class contains a list of machine instructions
499ff9feeb5SBill Wendling(:raw-html:`<tt>` `MachineInstr`_ :raw-html:`</tt>` instances).  It roughly
500ff9feeb5SBill Wendlingcorresponds to the LLVM code input to the instruction selector, but there can be
501ff9feeb5SBill Wendlinga one-to-many mapping (i.e. one LLVM basic block can map to multiple machine
502ff9feeb5SBill Wendlingbasic blocks). The ``MachineBasicBlock`` class has a "``getBasicBlock``" method,
503ff9feeb5SBill Wendlingwhich returns the LLVM basic block that it comes from.
504ff9feeb5SBill Wendling
505ff9feeb5SBill Wendling.. _MachineFunction:
506ff9feeb5SBill Wendling
507ff9feeb5SBill WendlingThe ``MachineFunction`` class
508ff9feeb5SBill Wendling-----------------------------
509ff9feeb5SBill Wendling
510ff9feeb5SBill WendlingThe ``MachineFunction`` class contains a list of machine basic blocks
511ff9feeb5SBill Wendling(:raw-html:`<tt>` `MachineBasicBlock`_ :raw-html:`</tt>` instances).  It
512ff9feeb5SBill Wendlingcorresponds one-to-one with the LLVM function input to the instruction selector.
513e1567e77SYoungsuk KimIn addition to a list of basic blocks, the ``MachineFunction`` contains a
514ff9feeb5SBill Wendling``MachineConstantPool``, a ``MachineFrameInfo``, a ``MachineFunctionInfo``, and
515ff9feeb5SBill Wendlinga ``MachineRegisterInfo``.  See ``include/llvm/CodeGen/MachineFunction.h`` for
516ff9feeb5SBill Wendlingmore information.
517ff9feeb5SBill Wendling
518ff9feeb5SBill Wendling``MachineInstr Bundles``
519ff9feeb5SBill Wendling------------------------
520ff9feeb5SBill Wendling
521ff9feeb5SBill WendlingLLVM code generator can model sequences of instructions as MachineInstr
522ff9feeb5SBill Wendlingbundles. A MI bundle can model a VLIW group / pack which contains an arbitrary
523ff9feeb5SBill Wendlingnumber of parallel instructions. It can also be used to model a sequential list
524ff9feeb5SBill Wendlingof instructions (potentially with data dependencies) that cannot be legally
525ff9feeb5SBill Wendlingseparated (e.g. ARM Thumb2 IT blocks).
526ff9feeb5SBill Wendling
527ff9feeb5SBill WendlingConceptually a MI bundle is a MI with a number of other MIs nested within:
528ff9feeb5SBill Wendling
529ff9feeb5SBill Wendling::
530ff9feeb5SBill Wendling
531ff9feeb5SBill Wendling  --------------
532ff9feeb5SBill Wendling  |   Bundle   | ---------
533ff9feeb5SBill Wendling  --------------          \
534ff9feeb5SBill Wendling         |           ----------------
535ff9feeb5SBill Wendling         |           |      MI      |
536ff9feeb5SBill Wendling         |           ----------------
537ff9feeb5SBill Wendling         |                   |
538ff9feeb5SBill Wendling         |           ----------------
539ff9feeb5SBill Wendling         |           |      MI      |
540ff9feeb5SBill Wendling         |           ----------------
541ff9feeb5SBill Wendling         |                   |
542ff9feeb5SBill Wendling         |           ----------------
543ff9feeb5SBill Wendling         |           |      MI      |
544ff9feeb5SBill Wendling         |           ----------------
545ff9feeb5SBill Wendling         |
546ff9feeb5SBill Wendling  --------------
547ff9feeb5SBill Wendling  |   Bundle   | --------
548ff9feeb5SBill Wendling  --------------         \
549ff9feeb5SBill Wendling         |           ----------------
550ff9feeb5SBill Wendling         |           |      MI      |
551ff9feeb5SBill Wendling         |           ----------------
552ff9feeb5SBill Wendling         |                   |
553ff9feeb5SBill Wendling         |           ----------------
554ff9feeb5SBill Wendling         |           |      MI      |
555ff9feeb5SBill Wendling         |           ----------------
556ff9feeb5SBill Wendling         |                   |
557ff9feeb5SBill Wendling         |                  ...
558ff9feeb5SBill Wendling         |
559ff9feeb5SBill Wendling  --------------
560ff9feeb5SBill Wendling  |   Bundle   | --------
561ff9feeb5SBill Wendling  --------------         \
562ff9feeb5SBill Wendling         |
563ff9feeb5SBill Wendling        ...
564ff9feeb5SBill Wendling
565ff9feeb5SBill WendlingMI bundle support does not change the physical representations of
566ff9feeb5SBill WendlingMachineBasicBlock and MachineInstr. All the MIs (including top level and nested
567ff9feeb5SBill Wendlingones) are stored as sequential list of MIs. The "bundled" MIs are marked with
568ff9feeb5SBill Wendlingthe 'InsideBundle' flag. A top level MI with the special BUNDLE opcode is used
569c36a1f1cSHiroshi Inoueto represent the start of a bundle. It's legal to mix BUNDLE MIs with individual
570ff9feeb5SBill WendlingMIs that are not inside bundles nor represent bundles.
571ff9feeb5SBill Wendling
572ff9feeb5SBill WendlingMachineInstr passes should operate on a MI bundle as a single unit. Member
573ff9feeb5SBill Wendlingmethods have been taught to correctly handle bundles and MIs inside bundles.
574ff9feeb5SBill WendlingThe MachineBasicBlock iterator has been modified to skip over bundled MIs to
575ff9feeb5SBill Wendlingenforce the bundle-as-a-single-unit concept. An alternative iterator
576ff9feeb5SBill Wendlinginstr_iterator has been added to MachineBasicBlock to allow passes to iterate
577ff9feeb5SBill Wendlingover all of the MIs in a MachineBasicBlock, including those which are nested
578ff9feeb5SBill Wendlinginside bundles. The top level BUNDLE instruction must have the correct set of
579ff9feeb5SBill Wendlingregister MachineOperand's that represent the cumulative inputs and outputs of
580ff9feeb5SBill Wendlingthe bundled MIs.
581ff9feeb5SBill Wendling
582e56a2ad8SMatt ArsenaultPacking / bundling of MachineInstrs for VLIW architectures should
583e56a2ad8SMatt Arsenaultgenerally be done as part of the register allocation super-pass. More
584e56a2ad8SMatt Arsenaultspecifically, the pass which determines what MIs should be bundled
585e56a2ad8SMatt Arsenaulttogether should be done after code generator exits SSA form
586e56a2ad8SMatt Arsenault(i.e. after two-address pass, PHI elimination, and copy coalescing).
587e56a2ad8SMatt ArsenaultSuch bundles should be finalized (i.e. adding BUNDLE MIs and input and
588e56a2ad8SMatt Arsenaultoutput register MachineOperands) after virtual registers have been
589e56a2ad8SMatt Arsenaultrewritten into physical registers. This eliminates the need to add
590e56a2ad8SMatt Arsenaultvirtual register operands to BUNDLE instructions which would
591e56a2ad8SMatt Arsenaulteffectively double the virtual register def and use lists. Bundles may
592e56a2ad8SMatt Arsenaultuse virtual registers and be formed in SSA form, but may not be
593e56a2ad8SMatt Arsenaultappropriate for all use cases.
594ff9feeb5SBill Wendling
595ff9feeb5SBill Wendling.. _MC Layer:
596ff9feeb5SBill Wendling
597ff9feeb5SBill WendlingThe "MC" Layer
598ff9feeb5SBill Wendling==============
599ff9feeb5SBill Wendling
600ff9feeb5SBill WendlingThe MC Layer is used to represent and process code at the raw machine code
601ff9feeb5SBill Wendlinglevel, devoid of "high level" information like "constant pools", "jump tables",
602ff9feeb5SBill Wendling"global variables" or anything like that.  At this level, LLVM handles things
603ff9feeb5SBill Wendlinglike label names, machine instructions, and sections in the object file.  The
604ff9feeb5SBill Wendlingcode in this layer is used for a number of important purposes: the tail end of
605ff9feeb5SBill Wendlingthe code generator uses it to write a .s or .o file, and it is also used by the
606ff9feeb5SBill Wendlingllvm-mc tool to implement standalone machine code assemblers and disassemblers.
607ff9feeb5SBill Wendling
608ff9feeb5SBill WendlingThis section describes some of the important classes.  There are also a number
609ff9feeb5SBill Wendlingof important subsystems that interact at this layer, they are described later in
610ff9feeb5SBill Wendlingthis manual.
611ff9feeb5SBill Wendling
612ff9feeb5SBill Wendling.. _MCStreamer:
613ff9feeb5SBill Wendling
614ff9feeb5SBill WendlingThe ``MCStreamer`` API
615ff9feeb5SBill Wendling----------------------
616ff9feeb5SBill Wendling
617ff9feeb5SBill WendlingMCStreamer is best thought of as an assembler API.  It is an abstract API which
618ff9feeb5SBill Wendlingis *implemented* in different ways (e.g. to output a .s file, output an ELF .o
619ff9feeb5SBill Wendlingfile, etc) but whose API correspond directly to what you see in a .s file.
620ff9feeb5SBill WendlingMCStreamer has one method per directive, such as EmitLabel, EmitSymbolAttribute,
621adf4142fSFangrui SongswitchSection, emitValue (for .byte, .word), etc, which directly correspond to
622ff9feeb5SBill Wendlingassembly level directives.  It also has an EmitInstruction method, which is used
623ff9feeb5SBill Wendlingto output an MCInst to the streamer.
624ff9feeb5SBill Wendling
625ff9feeb5SBill WendlingThis API is most important for two clients: the llvm-mc stand-alone assembler is
626ff9feeb5SBill Wendlingeffectively a parser that parses a line, then invokes a method on MCStreamer. In
627ff9feeb5SBill Wendlingthe code generator, the `Code Emission`_ phase of the code generator lowers
628ff9feeb5SBill Wendlinghigher level LLVM IR and Machine* constructs down to the MC layer, emitting
629ff9feeb5SBill Wendlingdirectives through MCStreamer.
630ff9feeb5SBill Wendling
631ff9feeb5SBill WendlingOn the implementation side of MCStreamer, there are two major implementations:
632ff9feeb5SBill Wendlingone for writing out a .s file (MCAsmStreamer), and one for writing out a .o
633ca35b090SJustin Lebarfile (MCObjectStreamer).  MCAsmStreamer is a straightforward implementation
634ff9feeb5SBill Wendlingthat prints out a directive for each method (e.g. ``EmitValue -> .byte``), but
635ff9feeb5SBill WendlingMCObjectStreamer implements a full assembler.
636ff9feeb5SBill Wendling
637974efd32SRafael EspindolaFor target specific directives, the MCStreamer has a MCTargetStreamer instance.
638974efd32SRafael EspindolaEach target that needs it defines a class that inherits from it and is a lot
639974efd32SRafael Espindolalike MCStreamer itself: It has one method per directive and two classes that
640974efd32SRafael Espindolainherit from it, a target object streamer and a target asm streamer. The target
64148aee7c6SArtyom Skrobovasm streamer just prints it (``emitFnStart -> .fnstart``), and the object
642974efd32SRafael Espindolastreamer implement the assembler logic for it.
643974efd32SRafael Espindola
644b665d79fSRafael EspindolaTo make llvm use these classes, the target initialization must call
645b665d79fSRafael EspindolaTargetRegistry::RegisterAsmStreamer and TargetRegistry::RegisterMCObjectStreamer
646b665d79fSRafael Espindolapassing callbacks that allocate the corresponding target streamer and pass it
647b665d79fSRafael Espindolato createAsmStreamer or to the appropriate object streamer constructor.
648b665d79fSRafael Espindola
649ff9feeb5SBill WendlingThe ``MCContext`` class
650ff9feeb5SBill Wendling-----------------------
651ff9feeb5SBill Wendling
652ff9feeb5SBill WendlingThe MCContext class is the owner of a variety of uniqued data structures at the
653ff9feeb5SBill WendlingMC layer, including symbols, sections, etc.  As such, this is the class that you
654ff9feeb5SBill Wendlinginteract with to create symbols and sections.  This class can not be subclassed.
655ff9feeb5SBill Wendling
656ff9feeb5SBill WendlingThe ``MCSymbol`` class
657ff9feeb5SBill Wendling----------------------
658ff9feeb5SBill Wendling
659ff9feeb5SBill WendlingThe MCSymbol class represents a symbol (aka label) in the assembly file.  There
660ff9feeb5SBill Wendlingare two interesting kinds of symbols: assembler temporary symbols, and normal
661ff9feeb5SBill Wendlingsymbols.  Assembler temporary symbols are used and processed by the assembler
662ff9feeb5SBill Wendlingbut are discarded when the object file is produced.  The distinction is usually
663ff9feeb5SBill Wendlingrepresented by adding a prefix to the label, for example "L" labels are
664ff9feeb5SBill Wendlingassembler temporary labels in MachO.
665ff9feeb5SBill Wendling
666ff9feeb5SBill WendlingMCSymbols are created by MCContext and uniqued there.  This means that MCSymbols
667ff9feeb5SBill Wendlingcan be compared for pointer equivalence to find out if they are the same symbol.
668ff9feeb5SBill WendlingNote that pointer inequality does not guarantee the labels will end up at
669ff9feeb5SBill Wendlingdifferent addresses though.  It's perfectly legal to output something like this
670ff9feeb5SBill Wendlingto the .s file:
671ff9feeb5SBill Wendling
672ff9feeb5SBill Wendling::
673ff9feeb5SBill Wendling
674ff9feeb5SBill Wendling  foo:
675ff9feeb5SBill Wendling  bar:
676ff9feeb5SBill Wendling    .byte 4
677ff9feeb5SBill Wendling
678ff9feeb5SBill WendlingIn this case, both the foo and bar symbols will have the same address.
679ff9feeb5SBill Wendling
680ff9feeb5SBill WendlingThe ``MCSection`` class
681ff9feeb5SBill Wendling-----------------------
682ff9feeb5SBill Wendling
683ff9feeb5SBill WendlingThe ``MCSection`` class represents an object-file specific section. It is
684ff9feeb5SBill Wendlingsubclassed by object file specific implementations (e.g. ``MCSectionMachO``,
685ff9feeb5SBill Wendling``MCSectionCOFF``, ``MCSectionELF``) and these are created and uniqued by
686ff9feeb5SBill WendlingMCContext.  The MCStreamer has a notion of the current section, which can be
687ff9feeb5SBill Wendlingchanged with the SwitchToSection method (which corresponds to a ".section"
688ff9feeb5SBill Wendlingdirective in a .s file).
689ff9feeb5SBill Wendling
690ff9feeb5SBill Wendling.. _MCInst:
691ff9feeb5SBill Wendling
692ff9feeb5SBill WendlingThe ``MCInst`` class
693ff9feeb5SBill Wendling--------------------
694ff9feeb5SBill Wendling
695ff9feeb5SBill WendlingThe ``MCInst`` class is a target-independent representation of an instruction.
696ff9feeb5SBill WendlingIt is a simple class (much more so than `MachineInstr`_) that holds a
697ff9feeb5SBill Wendlingtarget-specific opcode and a vector of MCOperands.  MCOperand, in turn, is a
698ff9feeb5SBill Wendlingsimple discriminated union of three cases: 1) a simple immediate, 2) a target
699ff9feeb5SBill Wendlingregister ID, 3) a symbolic expression (e.g. "``Lfoo-Lbar+42``") as an MCExpr.
700ff9feeb5SBill Wendling
701ff9feeb5SBill WendlingMCInst is the common currency used to represent machine instructions at the MC
702ff9feeb5SBill Wendlinglayer.  It is the type used by the instruction encoder, the instruction printer,
703ff9feeb5SBill Wendlingand the type generated by the assembly parser and disassembler.
704ff9feeb5SBill Wendling
705d5745d00SChris Bieneman.. _ObjectFormats:
706d5745d00SChris Bieneman
707d5745d00SChris BienemanObject File Format
708d5745d00SChris Bieneman------------------
709d5745d00SChris Bieneman
710d5745d00SChris BienemanThe MC layer's object writers support a variety of object formats. Because of
711d5745d00SChris Bienemantarget-specific aspects of object formats each target only supports a subset of
712d5745d00SChris Bienemanthe formats supported by the MC layer. Most targets support emitting ELF
713d5745d00SChris Bienemanobjects. Other vendor-specific objects are generally supported only on targets
714d5745d00SChris Bienemanthat are supported by that vendor (i.e. MachO is only supported on targets
715d5745d00SChris Bienemansupported by Darwin, and XCOFF is only supported on targets that support AIX).
716d5745d00SChris BienemanAdditionally some targets have their own object formats (i.e. DirectX, SPIR-V
717d5745d00SChris Bienemanand WebAssembly).
718d5745d00SChris Bieneman
719d5745d00SChris BienemanThe table below captures a snapshot of object file support in LLVM:
720d5745d00SChris Bieneman
721d5745d00SChris Bieneman  .. table:: Object File Formats
722d5745d00SChris Bieneman
723d5745d00SChris Bieneman     ================== ========================================================
724d5745d00SChris Bieneman     Format              Supported Targets
725d5745d00SChris Bieneman     ================== ========================================================
726d5745d00SChris Bieneman     ``COFF``            AArch64, ARM, X86
727d5745d00SChris Bieneman     ``DXContainer``     DirectX
728d5745d00SChris Bieneman     ``ELF``             AArch64, AMDGPU, ARM, AVR, BPF, CSKY, Hexagon, Lanai, LoongArch, M86k, MSP430, MIPS, PowerPC, RISCV, SPARC, SystemZ, VE, X86
729d5745d00SChris Bieneman     ``GCOFF``           SystemZ
730d5745d00SChris Bieneman     ``MachO``           AArch64, ARM, X86
731d5745d00SChris Bieneman     ``SPIR-V``          SPIRV
732d5745d00SChris Bieneman     ``WASM``            WebAssembly
733d5745d00SChris Bieneman     ``XCOFF``           PowerPC
734d5745d00SChris Bieneman     ================== ========================================================
735d5745d00SChris Bieneman
736ff9feeb5SBill Wendling.. _Target-independent algorithms:
737ff9feeb5SBill Wendling.. _code generation algorithm:
738ff9feeb5SBill Wendling
739ff9feeb5SBill WendlingTarget-independent code generation algorithms
740ff9feeb5SBill Wendling=============================================
741ff9feeb5SBill Wendling
742ff9feeb5SBill WendlingThis section documents the phases described in the `high-level design of the
743ff9feeb5SBill Wendlingcode generator`_.  It explains how they work and some of the rationale behind
744ff9feeb5SBill Wendlingtheir design.
745ff9feeb5SBill Wendling
746ff9feeb5SBill Wendling.. _Instruction Selection:
747ff9feeb5SBill Wendling.. _instruction selection section:
748ff9feeb5SBill Wendling
749ff9feeb5SBill WendlingInstruction Selection
750ff9feeb5SBill Wendling---------------------
751ff9feeb5SBill Wendling
752ff9feeb5SBill WendlingInstruction Selection is the process of translating LLVM code presented to the
753ff9feeb5SBill Wendlingcode generator into target-specific machine instructions.  There are several
754ff9feeb5SBill Wendlingwell-known ways to do this in the literature.  LLVM uses a SelectionDAG based
755ff9feeb5SBill Wendlinginstruction selector.
756ff9feeb5SBill Wendling
757ff9feeb5SBill WendlingPortions of the DAG instruction selector are generated from the target
758ff9feeb5SBill Wendlingdescription (``*.td``) files.  Our goal is for the entire instruction selector
759ff9feeb5SBill Wendlingto be generated from these ``.td`` files, though currently there are still
760ff9feeb5SBill Wendlingthings that require custom C++ code.
761ff9feeb5SBill Wendling
762a15f9ff9Spooja2299`GlobalISel <https://llvm.org/docs/GlobalISel/index.html>`_ is another
763a15f9ff9Spooja2299instruction selection framework.
764a15f9ff9Spooja2299
765ff9feeb5SBill Wendling.. _SelectionDAG:
766ff9feeb5SBill Wendling
767ff9feeb5SBill WendlingIntroduction to SelectionDAGs
768ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
769ff9feeb5SBill Wendling
770ff9feeb5SBill WendlingThe SelectionDAG provides an abstraction for code representation in a way that
771ff9feeb5SBill Wendlingis amenable to instruction selection using automatic techniques
772ff9feeb5SBill Wendling(e.g. dynamic-programming based optimal pattern matching selectors). It is also
773ff9feeb5SBill Wendlingwell-suited to other phases of code generation; in particular, instruction
774ff9feeb5SBill Wendlingscheduling (SelectionDAG's are very close to scheduling DAGs post-selection).
775ff9feeb5SBill WendlingAdditionally, the SelectionDAG provides a host representation where a large
776ff9feeb5SBill Wendlingvariety of very-low-level (but target-independent) `optimizations`_ may be
777ff9feeb5SBill Wendlingperformed; ones which require extensive information about the instructions
778ff9feeb5SBill Wendlingefficiently supported by the target.
779ff9feeb5SBill Wendling
780ff9feeb5SBill WendlingThe SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the
781ff9feeb5SBill Wendling``SDNode`` class.  The primary payload of the ``SDNode`` is its operation code
782ff9feeb5SBill Wendling(Opcode) that indicates what operation the node performs and the operands to the
783ff9feeb5SBill Wendlingoperation.  The various operation node types are described at the top of the
7845b41ea0dSCharlie Turner``include/llvm/CodeGen/ISDOpcodes.h`` file.
785ff9feeb5SBill Wendling
786ff9feeb5SBill WendlingAlthough most operations define a single value, each node in the graph may
787ff9feeb5SBill Wendlingdefine multiple values.  For example, a combined div/rem operation will define
788ff9feeb5SBill Wendlingboth the dividend and the remainder. Many other situations require multiple
789ff9feeb5SBill Wendlingvalues as well.  Each node also has some number of operands, which are edges to
790ff9feeb5SBill Wendlingthe node defining the used value.  Because nodes may define multiple values,
791ff9feeb5SBill Wendlingedges are represented by instances of the ``SDValue`` class, which is a
792ff9feeb5SBill Wendling``<SDNode, unsigned>`` pair, indicating the node and result value being used,
793ff9feeb5SBill Wendlingrespectively.  Each value produced by an ``SDNode`` has an associated ``MVT``
794ff9feeb5SBill Wendling(Machine Value Type) indicating what the type of the value is.
795ff9feeb5SBill Wendling
796ff9feeb5SBill WendlingSelectionDAGs contain two different kinds of values: those that represent data
797ff9feeb5SBill Wendlingflow and those that represent control flow dependencies.  Data values are simple
798ff9feeb5SBill Wendlingedges with an integer or floating point value type.  Control edges are
799ff9feeb5SBill Wendlingrepresented as "chain" edges which are of type ``MVT::Other``.  These edges
800ff9feeb5SBill Wendlingprovide an ordering between nodes that have side effects (such as loads, stores,
801ff9feeb5SBill Wendlingcalls, returns, etc).  All nodes that have side effects should take a token
802ff9feeb5SBill Wendlingchain as input and produce a new one as output.  By convention, token chain
803ff9feeb5SBill Wendlinginputs are always operand #0, and chain results are always the last value
804907e64b4SMatt Arsenaultproduced by an operation. However, after instruction selection, the
805907e64b4SMatt Arsenaultmachine nodes have their chain after the instruction's operands, and
806907e64b4SMatt Arsenaultmay be followed by glue nodes.
807ff9feeb5SBill Wendling
808ff9feeb5SBill WendlingA SelectionDAG has designated "Entry" and "Root" nodes.  The Entry node is
809ff9feeb5SBill Wendlingalways a marker node with an Opcode of ``ISD::EntryToken``.  The Root node is
810ff9feeb5SBill Wendlingthe final side-effecting node in the token chain. For example, in a single basic
811ff9feeb5SBill Wendlingblock function it would be the return node.
812ff9feeb5SBill Wendling
813ff9feeb5SBill WendlingOne important concept for SelectionDAGs is the notion of a "legal" vs.
814ff9feeb5SBill Wendling"illegal" DAG.  A legal DAG for a target is one that only uses supported
815ff9feeb5SBill Wendlingoperations and supported types.  On a 32-bit PowerPC, for example, a DAG with a
816ff9feeb5SBill Wendlingvalue of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses a
817ff9feeb5SBill WendlingSREM or UREM operation.  The `legalize types`_ and `legalize operations`_ phases
818ff9feeb5SBill Wendlingare responsible for turning an illegal DAG into a legal DAG.
819ff9feeb5SBill Wendling
8207174c5a0SDmitri Gribenko.. _SelectionDAG-Process:
8217174c5a0SDmitri Gribenko
822ff9feeb5SBill WendlingSelectionDAG Instruction Selection Process
823ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
824ff9feeb5SBill Wendling
825ff9feeb5SBill WendlingSelectionDAG-based instruction selection consists of the following steps:
826ff9feeb5SBill Wendling
827ff9feeb5SBill Wendling#. `Build initial DAG`_ --- This stage performs a simple translation from the
828ff9feeb5SBill Wendling   input LLVM code to an illegal SelectionDAG.
829ff9feeb5SBill Wendling
830ff9feeb5SBill Wendling#. `Optimize SelectionDAG`_ --- This stage performs simple optimizations on the
831ff9feeb5SBill Wendling   SelectionDAG to simplify it, and recognize meta instructions (like rotates
832ff9feeb5SBill Wendling   and ``div``/``rem`` pairs) for targets that support these meta operations.
833ff9feeb5SBill Wendling   This makes the resultant code more efficient and the `select instructions
834ff9feeb5SBill Wendling   from DAG`_ phase (below) simpler.
835ff9feeb5SBill Wendling
836ff9feeb5SBill Wendling#. `Legalize SelectionDAG Types`_ --- This stage transforms SelectionDAG nodes
837ff9feeb5SBill Wendling   to eliminate any types that are unsupported on the target.
838ff9feeb5SBill Wendling
839ff9feeb5SBill Wendling#. `Optimize SelectionDAG`_ --- The SelectionDAG optimizer is run to clean up
840ff9feeb5SBill Wendling   redundancies exposed by type legalization.
841ff9feeb5SBill Wendling
842ff9feeb5SBill Wendling#. `Legalize SelectionDAG Ops`_ --- This stage transforms SelectionDAG nodes to
843ff9feeb5SBill Wendling   eliminate any operations that are unsupported on the target.
844ff9feeb5SBill Wendling
845ff9feeb5SBill Wendling#. `Optimize SelectionDAG`_ --- The SelectionDAG optimizer is run to eliminate
846ff9feeb5SBill Wendling   inefficiencies introduced by operation legalization.
847ff9feeb5SBill Wendling
848ff9feeb5SBill Wendling#. `Select instructions from DAG`_ --- Finally, the target instruction selector
849ff9feeb5SBill Wendling   matches the DAG operations to target instructions.  This process translates
850ff9feeb5SBill Wendling   the target-independent input DAG into another DAG of target instructions.
851ff9feeb5SBill Wendling
852ff9feeb5SBill Wendling#. `SelectionDAG Scheduling and Formation`_ --- The last phase assigns a linear
853ff9feeb5SBill Wendling   order to the instructions in the target-instruction DAG and emits them into
854ff9feeb5SBill Wendling   the MachineFunction being compiled.  This step uses traditional prepass
855ff9feeb5SBill Wendling   scheduling techniques.
856ff9feeb5SBill Wendling
857ff9feeb5SBill WendlingAfter all of these steps are complete, the SelectionDAG is destroyed and the
858ff9feeb5SBill Wendlingrest of the code generation passes are run.
859ff9feeb5SBill Wendling
860ff9feeb5SBill WendlingOne great way to visualize what is going on here is to take advantage of a few
861ff9feeb5SBill WendlingLLC command line options.  The following options pop up a window displaying the
862ff9feeb5SBill WendlingSelectionDAG at specific times (if you only get errors printed to the console
863ff9feeb5SBill Wendlingwhile using this, you probably `need to configure your
8645b41ea0dSCharlie Turnersystem <ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ to add support for it).
865ff9feeb5SBill Wendling
866ff9feeb5SBill Wendling* ``-view-dag-combine1-dags`` displays the DAG after being built, before the
867ff9feeb5SBill Wendling  first optimization pass.
868ff9feeb5SBill Wendling
869ff9feeb5SBill Wendling* ``-view-legalize-dags`` displays the DAG before Legalization.
870ff9feeb5SBill Wendling
871ff9feeb5SBill Wendling* ``-view-dag-combine2-dags`` displays the DAG before the second optimization
872ff9feeb5SBill Wendling  pass.
873ff9feeb5SBill Wendling
874ff9feeb5SBill Wendling* ``-view-isel-dags`` displays the DAG before the Select phase.
875ff9feeb5SBill Wendling
876ff9feeb5SBill Wendling* ``-view-sched-dags`` displays the DAG before Scheduling.
877ff9feeb5SBill Wendling
878ff9feeb5SBill WendlingThe ``-view-sunit-dags`` displays the Scheduler's dependency graph.  This graph
879ff9feeb5SBill Wendlingis based on the final SelectionDAG, with nodes that must be scheduled together
880ff9feeb5SBill Wendlingbundled into a single scheduling-unit node, and with immediate operands and
881ff9feeb5SBill Wendlingother nodes that aren't relevant for scheduling omitted.
882d8976b8eSMehdi Amini
883d8976b8eSMehdi AminiThe option ``-filter-view-dags`` allows to select the name of the basic block
884d8976b8eSMehdi Aminithat you are interested to visualize and filters all the previous
885d8976b8eSMehdi Amini``view-*-dags`` options.
886ff9feeb5SBill Wendling
887ff9feeb5SBill Wendling.. _Build initial DAG:
888ff9feeb5SBill Wendling
889ff9feeb5SBill WendlingInitial SelectionDAG Construction
890ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
891ff9feeb5SBill Wendling
892ff9feeb5SBill WendlingThe initial SelectionDAG is na\ :raw-html:`&iuml;`\ vely peephole expanded from
8936f6f55eeSEli Benderskythe LLVM input by the ``SelectionDAGBuilder`` class.  The intent of this pass
894ff9feeb5SBill Wendlingis to expose as much low-level, target-specific details to the SelectionDAG as
895ff9feeb5SBill Wendlingpossible.  This pass is mostly hard-coded (e.g. an LLVM ``add`` turns into an
896ff9feeb5SBill Wendling``SDNode add`` while a ``getelementptr`` is expanded into the obvious
897ff9feeb5SBill Wendlingarithmetic). This pass requires target-specific hooks to lower calls, returns,
898ff9feeb5SBill Wendlingvarargs, etc.  For these features, the :raw-html:`<tt>` `TargetLowering`_
899ff9feeb5SBill Wendling:raw-html:`</tt>` interface is used.
900ff9feeb5SBill Wendling
901ff9feeb5SBill Wendling.. _legalize types:
902ff9feeb5SBill Wendling.. _Legalize SelectionDAG Types:
903ff9feeb5SBill Wendling.. _Legalize SelectionDAG Ops:
904ff9feeb5SBill Wendling
905ff9feeb5SBill WendlingSelectionDAG LegalizeTypes Phase
906ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
907ff9feeb5SBill Wendling
908ff9feeb5SBill WendlingThe Legalize phase is in charge of converting a DAG to only use the types that
909ff9feeb5SBill Wendlingare natively supported by the target.
910ff9feeb5SBill Wendling
911ff9feeb5SBill WendlingThere are two main ways of converting values of unsupported scalar types to
912ff9feeb5SBill Wendlingvalues of supported types: converting small types to larger types ("promoting"),
913ff9feeb5SBill Wendlingand breaking up large integer types into smaller ones ("expanding").  For
914ff9feeb5SBill Wendlingexample, a target might require that all f32 values are promoted to f64 and that
915ff9feeb5SBill Wendlingall i1/i8/i16 values are promoted to i32.  The same target might require that
916ff9feeb5SBill Wendlingall i64 values be expanded into pairs of i32 values.  These changes can insert
917ff9feeb5SBill Wendlingsign and zero extensions as needed to make sure that the final code has the same
918ff9feeb5SBill Wendlingbehavior as the input.
919ff9feeb5SBill Wendling
920ff9feeb5SBill WendlingThere are two main ways of converting values of unsupported vector types to
921ff9feeb5SBill Wendlingvalue of supported types: splitting vector types, multiple times if necessary,
922ff9feeb5SBill Wendlinguntil a legal type is found, and extending vector types by adding elements to
923ff9feeb5SBill Wendlingthe end to round them out to legal types ("widening").  If a vector gets split
924ff9feeb5SBill Wendlingall the way down to single-element parts with no supported vector type being
925ff9feeb5SBill Wendlingfound, the elements are converted to scalars ("scalarizing").
926ff9feeb5SBill Wendling
927ff9feeb5SBill WendlingA target implementation tells the legalizer which types are supported (and which
928ff9feeb5SBill Wendlingregister class to use for them) by calling the ``addRegisterClass`` method in
9298bd389d8SDmitri Gribenkoits ``TargetLowering`` constructor.
930ff9feeb5SBill Wendling
931ff9feeb5SBill Wendling.. _legalize operations:
932ff9feeb5SBill Wendling.. _Legalizer:
933ff9feeb5SBill Wendling
934ff9feeb5SBill WendlingSelectionDAG Legalize Phase
935ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^
936ff9feeb5SBill Wendling
937ff9feeb5SBill WendlingThe Legalize phase is in charge of converting a DAG to only use the operations
938ff9feeb5SBill Wendlingthat are natively supported by the target.
939ff9feeb5SBill Wendling
940ff9feeb5SBill WendlingTargets often have weird constraints, such as not supporting every operation on
941ff9feeb5SBill Wendlingevery supported datatype (e.g. X86 does not support byte conditional moves and
942ff9feeb5SBill WendlingPowerPC does not support sign-extending loads from a 16-bit memory location).
943ff9feeb5SBill WendlingLegalize takes care of this by open-coding another sequence of operations to
944ff9feeb5SBill Wendlingemulate the operation ("expansion"), by promoting one type to a larger type that
945ff9feeb5SBill Wendlingsupports the operation ("promotion"), or by using a target-specific hook to
946ff9feeb5SBill Wendlingimplement the legalization ("custom").
947ff9feeb5SBill Wendling
948ff9feeb5SBill WendlingA target implementation tells the legalizer which operations are not supported
949ff9feeb5SBill Wendling(and which of the above three actions to take) by calling the
950ff9feeb5SBill Wendling``setOperationAction`` method in its ``TargetLowering`` constructor.
951ff9feeb5SBill Wendling
9526a554188SSanjay PatelIf a target has legal vector types, it is expected to produce efficient machine
9536a554188SSanjay Patelcode for common forms of the shufflevector IR instruction using those types.
9546a554188SSanjay PatelThis may require custom legalization for SelectionDAG vector operations that
9556a554188SSanjay Patelare created from the shufflevector IR. The shufflevector forms that should be
9566a554188SSanjay Patelhandled include:
9576a554188SSanjay Patel
9586a554188SSanjay Patel* Vector select --- Each element of the vector is chosen from either of the
9596a554188SSanjay Patel  corresponding elements of the 2 input vectors. This operation may also be
9606a554188SSanjay Patel  known as a "blend" or "bitwise select" in target assembly. This type of shuffle
9616a554188SSanjay Patel  maps directly to the ``shuffle_vector`` SelectionDAG node.
9626a554188SSanjay Patel
9636a554188SSanjay Patel* Insert subvector --- A vector is placed into a longer vector type starting
9646a554188SSanjay Patel  at index 0. This type of shuffle maps directly to the ``insert_subvector``
9656a554188SSanjay Patel  SelectionDAG node with the ``index`` operand set to 0.
9666a554188SSanjay Patel
9676a554188SSanjay Patel* Extract subvector --- A vector is pulled from a longer vector type starting
9686a554188SSanjay Patel  at index 0. This type of shuffle maps directly to the ``extract_subvector``
9696a554188SSanjay Patel  SelectionDAG node with the ``index`` operand set to 0.
9706a554188SSanjay Patel
9716a554188SSanjay Patel* Splat --- All elements of the vector have identical scalar elements. This
9726a554188SSanjay Patel  operation may also be known as a "broadcast" or "duplicate" in target assembly.
9736a554188SSanjay Patel  The shufflevector IR instruction may change the vector length, so this operation
9746a554188SSanjay Patel  may map to multiple SelectionDAG nodes including ``shuffle_vector``,
9756a554188SSanjay Patel  ``concat_vectors``, ``insert_subvector``, and ``extract_subvector``.
9766a554188SSanjay Patel
977ff9feeb5SBill WendlingPrior to the existence of the Legalize passes, we required that every target
978ff9feeb5SBill Wendling`selector`_ supported and handled every operator and type even if they are not
979ff9feeb5SBill Wendlingnatively supported.  The introduction of the Legalize phases allows all of the
980ff9feeb5SBill Wendlingcanonicalization patterns to be shared across targets, and makes it very easy to
981ff9feeb5SBill Wendlingoptimize the canonicalized code because it is still in the form of a DAG.
982ff9feeb5SBill Wendling
983ff9feeb5SBill Wendling.. _optimizations:
984ff9feeb5SBill Wendling.. _Optimize SelectionDAG:
985ff9feeb5SBill Wendling.. _selector:
986ff9feeb5SBill Wendling
987ff9feeb5SBill WendlingSelectionDAG Optimization Phase: the DAG Combiner
988ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
989ff9feeb5SBill Wendling
990ff9feeb5SBill WendlingThe SelectionDAG optimization phase is run multiple times for code generation,
991ff9feeb5SBill Wendlingimmediately after the DAG is built and once after each legalization.  The first
992ff9feeb5SBill Wendlingrun of the pass allows the initial code to be cleaned up (e.g. performing
993ff9feeb5SBill Wendlingoptimizations that depend on knowing that the operators have restricted type
994ff9feeb5SBill Wendlinginputs).  Subsequent runs of the pass clean up the messy code generated by the
995ff9feeb5SBill WendlingLegalize passes, which allows Legalize to be very simple (it can focus on making
996ff9feeb5SBill Wendlingcode legal instead of focusing on generating *good* and legal code).
997ff9feeb5SBill Wendling
998ff9feeb5SBill WendlingOne important class of optimizations performed is optimizing inserted sign and
999ff9feeb5SBill Wendlingzero extension instructions.  We currently use ad-hoc techniques, but could move
1000ff9feeb5SBill Wendlingto more rigorous techniques in the future.  Here are some good papers on the
1001ff9feeb5SBill Wendlingsubject:
1002ff9feeb5SBill Wendling
1003ff9feeb5SBill Wendling"`Widening integer arithmetic <http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html>`_" :raw-html:`<br>`
1004ff9feeb5SBill WendlingKevin Redwine and Norman Ramsey :raw-html:`<br>`
1005ff9feeb5SBill WendlingInternational Conference on Compiler Construction (CC) 2004
1006ff9feeb5SBill Wendling
1007ff9feeb5SBill Wendling"`Effective sign extension elimination <http://portal.acm.org/citation.cfm?doid=512529.512552>`_"  :raw-html:`<br>`
1008ff9feeb5SBill WendlingMotohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani :raw-html:`<br>`
1009ff9feeb5SBill WendlingProceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
1010ff9feeb5SBill Wendlingand Implementation.
1011ff9feeb5SBill Wendling
1012ff9feeb5SBill Wendling.. _Select instructions from DAG:
1013ff9feeb5SBill Wendling
1014ff9feeb5SBill WendlingSelectionDAG Select Phase
1015ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^
1016ff9feeb5SBill Wendling
1017ff9feeb5SBill WendlingThe Select phase is the bulk of the target-specific code for instruction
1018ff9feeb5SBill Wendlingselection.  This phase takes a legal SelectionDAG as input, pattern matches the
1019ff9feeb5SBill Wendlinginstructions supported by the target to this DAG, and produces a new DAG of
1020ff9feeb5SBill Wendlingtarget code.  For example, consider the following LLVM fragment:
1021ff9feeb5SBill Wendling
1022ff9feeb5SBill Wendling.. code-block:: llvm
1023ff9feeb5SBill Wendling
1024ff9feeb5SBill Wendling  %t1 = fadd float %W, %X
1025ff9feeb5SBill Wendling  %t2 = fmul float %t1, %Y
1026ff9feeb5SBill Wendling  %t3 = fadd float %t2, %Z
1027ff9feeb5SBill Wendling
1028ff9feeb5SBill WendlingThis LLVM code corresponds to a SelectionDAG that looks basically like this:
1029ff9feeb5SBill Wendling
1030124f2593SRenato Golin.. code-block:: text
1031ff9feeb5SBill Wendling
1032ff9feeb5SBill Wendling  (fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
1033ff9feeb5SBill Wendling
1034ff9feeb5SBill WendlingIf a target supports floating point multiply-and-add (FMA) operations, one of
1035ff9feeb5SBill Wendlingthe adds can be merged with the multiply.  On the PowerPC, for example, the
1036ff9feeb5SBill Wendlingoutput of the instruction selector might look like this DAG:
1037ff9feeb5SBill Wendling
1038ff9feeb5SBill Wendling::
1039ff9feeb5SBill Wendling
1040ff9feeb5SBill Wendling  (FMADDS (FADDS W, X), Y, Z)
1041ff9feeb5SBill Wendling
1042ff9feeb5SBill WendlingThe ``FMADDS`` instruction is a ternary instruction that multiplies its first
1043ff9feeb5SBill Wendlingtwo operands and adds the third (as single-precision floating-point numbers).
1044ff9feeb5SBill WendlingThe ``FADDS`` instruction is a simple binary single-precision add instruction.
1045ff9feeb5SBill WendlingTo perform this pattern match, the PowerPC backend includes the following
1046ff9feeb5SBill Wendlinginstruction definitions:
1047ff9feeb5SBill Wendling
104833f2c07cSSean Silva.. code-block:: text
104933f2c07cSSean Silva  :emphasize-lines: 4-5,9
1050ff9feeb5SBill Wendling
1051ff9feeb5SBill Wendling  def FMADDS : AForm_1<59, 29,
1052ff9feeb5SBill Wendling                      (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
1053ff9feeb5SBill Wendling                      "fmadds $FRT, $FRA, $FRC, $FRB",
1054ff9feeb5SBill Wendling                      [(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
1055ff9feeb5SBill Wendling                                             F4RC:$FRB))]>;
1056ff9feeb5SBill Wendling  def FADDS : AForm_2<59, 21,
1057ff9feeb5SBill Wendling                      (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
1058ff9feeb5SBill Wendling                      "fadds $FRT, $FRA, $FRB",
1059ff9feeb5SBill Wendling                      [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>;
1060ff9feeb5SBill Wendling
106133f2c07cSSean SilvaThe highlighted portion of the instruction definitions indicates the pattern
106233f2c07cSSean Silvaused to match the instructions. The DAG operators (like ``fmul``/``fadd``)
106333f2c07cSSean Silvaare defined in the ``include/llvm/Target/TargetSelectionDAG.td`` file.
106433f2c07cSSean Silva"``F4RC``" is the register class of the input and result values.
1065ff9feeb5SBill Wendling
1066ff9feeb5SBill WendlingThe TableGen DAG instruction selector generator reads the instruction patterns
1067ff9feeb5SBill Wendlingin the ``.td`` file and automatically builds parts of the pattern matching code
1068ff9feeb5SBill Wendlingfor your target.  It has the following strengths:
1069ff9feeb5SBill Wendling
1070ebba0507SJonathan Roelofs* At compiler-compile time, it analyzes your instruction patterns and tells you
1071ff9feeb5SBill Wendling  if your patterns make sense or not.
1072ff9feeb5SBill Wendling
1073ff9feeb5SBill Wendling* It can handle arbitrary constraints on operands for the pattern match.  In
1074ff9feeb5SBill Wendling  particular, it is straight-forward to say things like "match any immediate
1075ff9feeb5SBill Wendling  that is a 13-bit sign-extended value".  For examples, see the ``immSExt16``
1076ff9feeb5SBill Wendling  and related ``tblgen`` classes in the PowerPC backend.
1077ff9feeb5SBill Wendling
1078ff9feeb5SBill Wendling* It knows several important identities for the patterns defined.  For example,
1079ff9feeb5SBill Wendling  it knows that addition is commutative, so it allows the ``FMADDS`` pattern
1080ff9feeb5SBill Wendling  above to match "``(fadd X, (fmul Y, Z))``" as well as "``(fadd (fmul X, Y),
1081ff9feeb5SBill Wendling  Z)``", without the target author having to specially handle this case.
1082ff9feeb5SBill Wendling
1083ff9feeb5SBill Wendling* It has a full-featured type-inferencing system.  In particular, you should
1084ff9feeb5SBill Wendling  rarely have to explicitly tell the system what type parts of your patterns
1085ff9feeb5SBill Wendling  are.  In the ``FMADDS`` case above, we didn't have to tell ``tblgen`` that all
1086ff9feeb5SBill Wendling  of the nodes in the pattern are of type 'f32'.  It was able to infer and
1087ff9feeb5SBill Wendling  propagate this knowledge from the fact that ``F4RC`` has type 'f32'.
1088ff9feeb5SBill Wendling
1089ff9feeb5SBill Wendling* Targets can define their own (and rely on built-in) "pattern fragments".
1090ff9feeb5SBill Wendling  Pattern fragments are chunks of reusable patterns that get inlined into your
1091ebba0507SJonathan Roelofs  patterns during compiler-compile time.  For example, the integer "``(not
1092ff9feeb5SBill Wendling  x)``" operation is actually defined as a pattern fragment that expands as
1093ff9feeb5SBill Wendling  "``(xor x, -1)``", since the SelectionDAG does not have a native '``not``'
1094ff9feeb5SBill Wendling  operation.  Targets can define their own short-hand fragments as they see fit.
1095ff9feeb5SBill Wendling  See the definition of '``not``' and '``ineg``' for examples.
1096ff9feeb5SBill Wendling
1097ff9feeb5SBill Wendling* In addition to instructions, targets can specify arbitrary patterns that map
1098ff9feeb5SBill Wendling  to one or more instructions using the 'Pat' class.  For example, the PowerPC
1099ff9feeb5SBill Wendling  has no way to load an arbitrary integer immediate into a register in one
1100ff9feeb5SBill Wendling  instruction. To tell tblgen how to do this, it defines:
1101ff9feeb5SBill Wendling
1102ff9feeb5SBill Wendling  ::
1103ff9feeb5SBill Wendling
1104ff9feeb5SBill Wendling    // Arbitrary immediate support.  Implement in terms of LIS/ORI.
1105ff9feeb5SBill Wendling    def : Pat<(i32 imm:$imm),
1106ff9feeb5SBill Wendling              (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>;
1107ff9feeb5SBill Wendling
1108ff9feeb5SBill Wendling  If none of the single-instruction patterns for loading an immediate into a
1109ff9feeb5SBill Wendling  register match, this will be used.  This rule says "match an arbitrary i32
1110ff9feeb5SBill Wendling  immediate, turning it into an ``ORI`` ('or a 16-bit immediate') and an ``LIS``
1111ff9feeb5SBill Wendling  ('load 16-bit immediate, where the immediate is shifted to the left 16 bits')
1112ff9feeb5SBill Wendling  instruction".  To make this work, the ``LO16``/``HI16`` node transformations
1113ff9feeb5SBill Wendling  are used to manipulate the input immediate (in this case, take the high or low
1114ff9feeb5SBill Wendling  16-bits of the immediate).
1115ff9feeb5SBill Wendling
1116e618abd6SUlrich Weigand* When using the 'Pat' class to map a pattern to an instruction that has one
1117e618abd6SUlrich Weigand  or more complex operands (like e.g. `X86 addressing mode`_), the pattern may
1118e618abd6SUlrich Weigand  either specify the operand as a whole using a ``ComplexPattern``, or else it
1119e618abd6SUlrich Weigand  may specify the components of the complex operand separately.  The latter is
1120e618abd6SUlrich Weigand  done e.g. for pre-increment instructions by the PowerPC back end:
1121e618abd6SUlrich Weigand
1122e618abd6SUlrich Weigand  ::
1123e618abd6SUlrich Weigand
1124e618abd6SUlrich Weigand    def STWU  : DForm_1<37, (outs ptr_rc:$ea_res), (ins GPRC:$rS, memri:$dst),
1125e618abd6SUlrich Weigand                    "stwu $rS, $dst", LdStStoreUpd, []>,
1126e618abd6SUlrich Weigand                    RegConstraint<"$dst.reg = $ea_res">, NoEncode<"$ea_res">;
1127e618abd6SUlrich Weigand
1128e618abd6SUlrich Weigand    def : Pat<(pre_store GPRC:$rS, ptr_rc:$ptrreg, iaddroff:$ptroff),
1129e618abd6SUlrich Weigand              (STWU GPRC:$rS, iaddroff:$ptroff, ptr_rc:$ptrreg)>;
1130e618abd6SUlrich Weigand
1131e618abd6SUlrich Weigand  Here, the pair of ``ptroff`` and ``ptrreg`` operands is matched onto the
1132e618abd6SUlrich Weigand  complex operand ``dst`` of class ``memri`` in the ``STWU`` instruction.
1133e618abd6SUlrich Weigand
1134ff9feeb5SBill Wendling* While the system does automate a lot, it still allows you to write custom C++
1135ff9feeb5SBill Wendling  code to match special cases if there is something that is hard to
1136ff9feeb5SBill Wendling  express.
1137ff9feeb5SBill Wendling
1138ff9feeb5SBill WendlingWhile it has many strengths, the system currently has some limitations,
1139ff9feeb5SBill Wendlingprimarily because it is a work in progress and is not yet finished:
1140ff9feeb5SBill Wendling
1141ff9feeb5SBill Wendling* Overall, there is no way to define or match SelectionDAG nodes that define
1142ff9feeb5SBill Wendling  multiple values (e.g. ``SMUL_LOHI``, ``LOAD``, ``CALL``, etc).  This is the
1143ff9feeb5SBill Wendling  biggest reason that you currently still *have to* write custom C++ code
1144ff9feeb5SBill Wendling  for your instruction selector.
1145ff9feeb5SBill Wendling
1146ff9feeb5SBill Wendling* There is no great way to support matching complex addressing modes yet.  In
1147ff9feeb5SBill Wendling  the future, we will extend pattern fragments to allow them to define multiple
1148ff9feeb5SBill Wendling  values (e.g. the four operands of the `X86 addressing mode`_, which are
1149ff9feeb5SBill Wendling  currently matched with custom C++ code).  In addition, we'll extend fragments
1150ff9feeb5SBill Wendling  so that a fragment can match multiple different patterns.
1151ff9feeb5SBill Wendling
1152ff9feeb5SBill Wendling* We don't automatically infer flags like ``isStore``/``isLoad`` yet.
1153ff9feeb5SBill Wendling
1154ff9feeb5SBill Wendling* We don't automatically generate the set of supported registers and operations
1155ff9feeb5SBill Wendling  for the `Legalizer`_ yet.
1156ff9feeb5SBill Wendling
1157ff9feeb5SBill Wendling* We don't have a way of tying in custom legalized nodes yet.
1158ff9feeb5SBill Wendling
1159ff9feeb5SBill WendlingDespite these limitations, the instruction selector generator is still quite
1160ff9feeb5SBill Wendlinguseful for most of the binary and logical operations in typical instruction
1161ff9feeb5SBill Wendlingsets.  If you run into any problems or can't figure out how to do something,
1162ff9feeb5SBill Wendlingplease let Chris know!
1163ff9feeb5SBill Wendling
1164ff9feeb5SBill Wendling.. _Scheduling and Formation:
1165ff9feeb5SBill Wendling.. _SelectionDAG Scheduling and Formation:
1166ff9feeb5SBill Wendling
1167ff9feeb5SBill WendlingSelectionDAG Scheduling and Formation Phase
1168ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1169ff9feeb5SBill Wendling
1170ff9feeb5SBill WendlingThe scheduling phase takes the DAG of target instructions from the selection
1171ff9feeb5SBill Wendlingphase and assigns an order.  The scheduler can pick an order depending on
1172ff9feeb5SBill Wendlingvarious constraints of the machines (i.e. order for minimal register pressure or
1173ff9feeb5SBill Wendlingtry to cover instruction latencies).  Once an order is established, the DAG is
1174ff9feeb5SBill Wendlingconverted to a list of :raw-html:`<tt>` `MachineInstr`_\s :raw-html:`</tt>` and
1175ff9feeb5SBill Wendlingthe SelectionDAG is destroyed.
1176ff9feeb5SBill Wendling
1177ff9feeb5SBill WendlingNote that this phase is logically separate from the instruction selection phase,
1178ff9feeb5SBill Wendlingbut is tied to it closely in the code because it operates on SelectionDAGs.
1179ff9feeb5SBill Wendling
1180ff9feeb5SBill WendlingFuture directions for the SelectionDAG
1181ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1182ff9feeb5SBill Wendling
1183ff9feeb5SBill Wendling#. Optional function-at-a-time selection.
1184ff9feeb5SBill Wendling
1185ff9feeb5SBill Wendling#. Auto-generate entire selector from ``.td`` file.
1186ff9feeb5SBill Wendling
1187ff9feeb5SBill Wendling.. _SSA-based Machine Code Optimizations:
1188ff9feeb5SBill Wendling
1189ff9feeb5SBill WendlingSSA-based Machine Code Optimizations
1190ff9feeb5SBill Wendling------------------------------------
1191ff9feeb5SBill Wendling
1192ff9feeb5SBill WendlingTo Be Written
1193ff9feeb5SBill Wendling
1194ff9feeb5SBill WendlingLive Intervals
1195ff9feeb5SBill Wendling--------------
1196ff9feeb5SBill Wendling
1197ff9feeb5SBill WendlingLive Intervals are the ranges (intervals) where a variable is *live*.  They are
1198ff9feeb5SBill Wendlingused by some `register allocator`_ passes to determine if two or more virtual
1199ff9feeb5SBill Wendlingregisters which require the same physical register are live at the same point in
1200ff9feeb5SBill Wendlingthe program (i.e., they conflict).  When this situation occurs, one virtual
1201ff9feeb5SBill Wendlingregister must be *spilled*.
1202ff9feeb5SBill Wendling
1203ff9feeb5SBill WendlingLive Variable Analysis
1204ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^
1205ff9feeb5SBill Wendling
1206ff9feeb5SBill WendlingThe first step in determining the live intervals of variables is to calculate
1207ff9feeb5SBill Wendlingthe set of registers that are immediately dead after the instruction (i.e., the
1208ff9feeb5SBill Wendlinginstruction calculates the value, but it is never used) and the set of registers
1209ff9feeb5SBill Wendlingthat are used by the instruction, but are never used after the instruction
1210ff9feeb5SBill Wendling(i.e., they are killed). Live variable information is computed for
1211ff9feeb5SBill Wendlingeach *virtual* register and *register allocatable* physical register
1212ff9feeb5SBill Wendlingin the function.  This is done in a very efficient manner because it uses SSA to
1213ff9feeb5SBill Wendlingsparsely compute lifetime information for virtual registers (which are in SSA
1214ff9feeb5SBill Wendlingform) and only has to track physical registers within a block.  Before register
1215ff9feeb5SBill Wendlingallocation, LLVM can assume that physical registers are only live within a
1216ff9feeb5SBill Wendlingsingle basic block.  This allows it to do a single, local analysis to resolve
1217ff9feeb5SBill Wendlingphysical register lifetimes within each basic block. If a physical register is
1218ff9feeb5SBill Wendlingnot register allocatable (e.g., a stack pointer or condition codes), it is not
1219ff9feeb5SBill Wendlingtracked.
1220ff9feeb5SBill Wendling
1221ff9feeb5SBill WendlingPhysical registers may be live in to or out of a function. Live in values are
1222ff9feeb5SBill Wendlingtypically arguments in registers. Live out values are typically return values in
1223ff9feeb5SBill Wendlingregisters. Live in values are marked as such, and are given a dummy "defining"
1224ff9feeb5SBill Wendlinginstruction during live intervals analysis. If the last basic block of a
1225ff9feeb5SBill Wendlingfunction is a ``return``, then it's marked as using all live out values in the
1226ff9feeb5SBill Wendlingfunction.
1227ff9feeb5SBill Wendling
1228ff9feeb5SBill Wendling``PHI`` nodes need to be handled specially, because the calculation of the live
1229ff9feeb5SBill Wendlingvariable information from a depth first traversal of the CFG of the function
1230ff9feeb5SBill Wendlingwon't guarantee that a virtual register used by the ``PHI`` node is defined
1231ff9feeb5SBill Wendlingbefore it's used. When a ``PHI`` node is encountered, only the definition is
1232ff9feeb5SBill Wendlinghandled, because the uses will be handled in other basic blocks.
1233ff9feeb5SBill Wendling
1234ff9feeb5SBill WendlingFor each ``PHI`` node of the current basic block, we simulate an assignment at
1235ff9feeb5SBill Wendlingthe end of the current basic block and traverse the successor basic blocks. If a
1236ff9feeb5SBill Wendlingsuccessor basic block has a ``PHI`` node and one of the ``PHI`` node's operands
1237ff9feeb5SBill Wendlingis coming from the current basic block, then the variable is marked as *alive*
1238ff9feeb5SBill Wendlingwithin the current basic block and all of its predecessor basic blocks, until
1239ff9feeb5SBill Wendlingthe basic block with the defining instruction is encountered.
1240ff9feeb5SBill Wendling
1241ff9feeb5SBill WendlingLive Intervals Analysis
1242ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^
1243ff9feeb5SBill Wendling
1244ff9feeb5SBill WendlingWe now have the information available to perform the live intervals analysis and
1245ff9feeb5SBill Wendlingbuild the live intervals themselves.  We start off by numbering the basic blocks
1246ff9feeb5SBill Wendlingand machine instructions.  We then handle the "live-in" values.  These are in
1247ff9feeb5SBill Wendlingphysical registers, so the physical register is assumed to be killed by the end
1248ff9feeb5SBill Wendlingof the basic block.  Live intervals for virtual registers are computed for some
1249ff9feeb5SBill Wendlingordering of the machine instructions ``[1, N]``.  A live interval is an interval
1250ff9feeb5SBill Wendling``[i, j)``, where ``1 >= i >= j > N``, for which a variable is live.
1251ff9feeb5SBill Wendling
1252ff9feeb5SBill Wendling.. note::
1253ff9feeb5SBill Wendling  More to come...
1254ff9feeb5SBill Wendling
1255ff9feeb5SBill Wendling.. _Register Allocation:
1256ff9feeb5SBill Wendling.. _register allocator:
1257ff9feeb5SBill Wendling
1258ff9feeb5SBill WendlingRegister Allocation
1259ff9feeb5SBill Wendling-------------------
1260ff9feeb5SBill Wendling
1261ff9feeb5SBill WendlingThe *Register Allocation problem* consists in mapping a program
1262ff9feeb5SBill Wendling:raw-html:`<b><tt>` P\ :sub:`v`\ :raw-html:`</tt></b>`, that can use an unbounded
1263ff9feeb5SBill Wendlingnumber of virtual registers, to a program :raw-html:`<b><tt>` P\ :sub:`p`\
1264ff9feeb5SBill Wendling:raw-html:`</tt></b>` that contains a finite (possibly small) number of physical
1265ff9feeb5SBill Wendlingregisters. Each target architecture has a different number of physical
1266ff9feeb5SBill Wendlingregisters. If the number of physical registers is not enough to accommodate all
1267ff9feeb5SBill Wendlingthe virtual registers, some of them will have to be mapped into memory. These
1268ff9feeb5SBill Wendlingvirtuals are called *spilled virtuals*.
1269ff9feeb5SBill Wendling
1270ff9feeb5SBill WendlingHow registers are represented in LLVM
1271ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1272ff9feeb5SBill Wendling
1273ff9feeb5SBill WendlingIn LLVM, physical registers are denoted by integer numbers that normally range
1274ff9feeb5SBill Wendlingfrom 1 to 1023. To see how this numbering is defined for a particular
1275ff9feeb5SBill Wendlingarchitecture, you can read the ``GenRegisterNames.inc`` file for that
1276ff9feeb5SBill Wendlingarchitecture. For instance, by inspecting
1277ff9feeb5SBill Wendling``lib/Target/X86/X86GenRegisterInfo.inc`` we see that the 32-bit register
1278ff9feeb5SBill Wendling``EAX`` is denoted by 43, and the MMX register ``MM0`` is mapped to 65.
1279ff9feeb5SBill Wendling
1280ff9feeb5SBill WendlingSome architectures contain registers that share the same physical location. A
1281ff9feeb5SBill Wendlingnotable example is the X86 platform. For instance, in the X86 architecture, the
1282ff9feeb5SBill Wendlingregisters ``EAX``, ``AX`` and ``AL`` share the first eight bits. These physical
1283ff9feeb5SBill Wendlingregisters are marked as *aliased* in LLVM. Given a particular architecture, you
1284ff9feeb5SBill Wendlingcan check which registers are aliased by inspecting its ``RegisterInfo.td``
1285ff9feeb5SBill Wendlingfile. Moreover, the class ``MCRegAliasIterator`` enumerates all the physical
1286ff9feeb5SBill Wendlingregisters aliased to a register.
1287ff9feeb5SBill Wendling
1288ff9feeb5SBill WendlingPhysical registers, in LLVM, are grouped in *Register Classes*.  Elements in the
1289ff9feeb5SBill Wendlingsame register class are functionally equivalent, and can be interchangeably
1290ff9feeb5SBill Wendlingused. Each virtual register can only be mapped to physical registers of a
1291ff9feeb5SBill Wendlingparticular class. For instance, in the X86 architecture, some virtuals can only
1292ff9feeb5SBill Wendlingbe allocated to 8 bit registers.  A register class is described by
1293ff9feeb5SBill Wendling``TargetRegisterClass`` objects.  To discover if a virtual register is
1294ff6a7d6dSSean Silvacompatible with a given physical, this code can be used:
1295ff9feeb5SBill Wendling
1296ff9feeb5SBill Wendling.. code-block:: c++
1297ff9feeb5SBill Wendling
1298ff9feeb5SBill Wendling  bool RegMapping_Fer::compatible_class(MachineFunction &mf,
1299ff9feeb5SBill Wendling                                        unsigned v_reg,
1300ff9feeb5SBill Wendling                                        unsigned p_reg) {
1301ff9feeb5SBill Wendling    assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &&
1302ff9feeb5SBill Wendling           "Target register must be physical");
1303ff9feeb5SBill Wendling    const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
1304ff9feeb5SBill Wendling    return trc->contains(p_reg);
1305ff9feeb5SBill Wendling  }
1306ff9feeb5SBill Wendling
1307ff9feeb5SBill WendlingSometimes, mostly for debugging purposes, it is useful to change the number of
1308ff9feeb5SBill Wendlingphysical registers available in the target architecture. This must be done
1309f65d4aa9SKazuaki Ishizakistatically, inside the ``TargetRegisterInfo.td`` file. Just ``grep`` for
1310ff9feeb5SBill Wendling``RegisterClass``, the last parameter of which is a list of registers. Just
1311ff9feeb5SBill Wendlingcommenting some out is one simple way to avoid them being used. A more polite
1312ff9feeb5SBill Wendlingway is to explicitly exclude some registers from the *allocation order*. See the
1313ff9feeb5SBill Wendlingdefinition of the ``GR8`` register class in
1314ff9feeb5SBill Wendling``lib/Target/X86/X86RegisterInfo.td`` for an example of this.
1315ff9feeb5SBill Wendling
1316ff9feeb5SBill WendlingVirtual registers are also denoted by integer numbers. Contrary to physical
1317ff9feeb5SBill Wendlingregisters, different virtual registers never share the same number. Whereas
1318ff9feeb5SBill Wendlingphysical registers are statically defined in a ``TargetRegisterInfo.td`` file
1319ff9feeb5SBill Wendlingand cannot be created by the application developer, that is not the case with
1320ff9feeb5SBill Wendlingvirtual registers. In order to create new virtual registers, use the method
1321ff9feeb5SBill Wendling``MachineRegisterInfo::createVirtualRegister()``. This method will return a new
1322ff9feeb5SBill Wendlingvirtual register. Use an ``IndexedMap<Foo, VirtReg2IndexFunctor>`` to hold
1323ff9feeb5SBill Wendlinginformation per virtual register. If you need to enumerate all virtual
1324ff9feeb5SBill Wendlingregisters, use the function ``TargetRegisterInfo::index2VirtReg()`` to find the
1325ff9feeb5SBill Wendlingvirtual register numbers:
1326ff9feeb5SBill Wendling
1327ff9feeb5SBill Wendling.. code-block:: c++
1328ff9feeb5SBill Wendling
1329ff9feeb5SBill Wendling    for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
1330ff9feeb5SBill Wendling      unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i);
1331ff9feeb5SBill Wendling      stuff(VirtReg);
1332ff9feeb5SBill Wendling    }
1333ff9feeb5SBill Wendling
1334ff9feeb5SBill WendlingBefore register allocation, the operands of an instruction are mostly virtual
1335ff9feeb5SBill Wendlingregisters, although physical registers may also be used. In order to check if a
1336ff9feeb5SBill Wendlinggiven machine operand is a register, use the boolean function
1337ff9feeb5SBill Wendling``MachineOperand::isRegister()``. To obtain the integer code of a register, use
1338ff9feeb5SBill Wendling``MachineOperand::getReg()``. An instruction may define or use a register. For
1339ff9feeb5SBill Wendlinginstance, ``ADD reg:1026 := reg:1025 reg:1024`` defines the registers 1024, and
1340ff9feeb5SBill Wendlinguses registers 1025 and 1026. Given a register operand, the method
1341ff9feeb5SBill Wendling``MachineOperand::isUse()`` informs if that register is being used by the
1342ff9feeb5SBill Wendlinginstruction. The method ``MachineOperand::isDef()`` informs if that registers is
1343ff9feeb5SBill Wendlingbeing defined.
1344ff9feeb5SBill Wendling
1345ff9feeb5SBill WendlingWe will call physical registers present in the LLVM bitcode before register
1346ff9feeb5SBill Wendlingallocation *pre-colored registers*. Pre-colored registers are used in many
1347ff9feeb5SBill Wendlingdifferent situations, for instance, to pass parameters of functions calls, and
1348ff9feeb5SBill Wendlingto store results of particular instructions. There are two types of pre-colored
1349ff9feeb5SBill Wendlingregisters: the ones *implicitly* defined, and those *explicitly*
1350ff9feeb5SBill Wendlingdefined. Explicitly defined registers are normal operands, and can be accessed
1351ff9feeb5SBill Wendlingwith ``MachineInstr::getOperand(int)::getReg()``.  In order to check which
1352ff9feeb5SBill Wendlingregisters are implicitly defined by an instruction, use the
1353ff9feeb5SBill Wendling``TargetInstrInfo::get(opcode)::ImplicitDefs``, where ``opcode`` is the opcode
1354ff9feeb5SBill Wendlingof the target instruction. One important difference between explicit and
1355ff9feeb5SBill Wendlingimplicit physical registers is that the latter are defined statically for each
1356ff9feeb5SBill Wendlinginstruction, whereas the former may vary depending on the program being
1357ff9feeb5SBill Wendlingcompiled. For example, an instruction that represents a function call will
1358ff9feeb5SBill Wendlingalways implicitly define or use the same set of physical registers. To read the
1359ff9feeb5SBill Wendlingregisters implicitly used by an instruction, use
1360ff9feeb5SBill Wendling``TargetInstrInfo::get(opcode)::ImplicitUses``. Pre-colored registers impose
1361ff9feeb5SBill Wendlingconstraints on any register allocation algorithm. The register allocator must
1362ff9feeb5SBill Wendlingmake sure that none of them are overwritten by the values of virtual registers
1363ff9feeb5SBill Wendlingwhile still alive.
1364ff9feeb5SBill Wendling
1365ff9feeb5SBill WendlingMapping virtual registers to physical registers
1366ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1367ff9feeb5SBill Wendling
1368ff9feeb5SBill WendlingThere are two ways to map virtual registers to physical registers (or to memory
1369ff9feeb5SBill Wendlingslots). The first way, that we will call *direct mapping*, is based on the use
1370ff9feeb5SBill Wendlingof methods of the classes ``TargetRegisterInfo``, and ``MachineOperand``. The
1371ff9feeb5SBill Wendlingsecond way, that we will call *indirect mapping*, relies on the ``VirtRegMap``
1372ff9feeb5SBill Wendlingclass in order to insert loads and stores sending and getting values to and from
1373ff9feeb5SBill Wendlingmemory.
1374ff9feeb5SBill Wendling
1375ff9feeb5SBill WendlingThe direct mapping provides more flexibility to the developer of the register
1376ff9feeb5SBill Wendlingallocator; however, it is more error prone, and demands more implementation
1377ff9feeb5SBill Wendlingwork.  Basically, the programmer will have to specify where load and store
1378ff9feeb5SBill Wendlinginstructions should be inserted in the target function being compiled in order
1379ff9feeb5SBill Wendlingto get and store values in memory. To assign a physical register to a virtual
1380ff9feeb5SBill Wendlingregister present in a given operand, use ``MachineOperand::setReg(p_reg)``. To
1381ff9feeb5SBill Wendlinginsert a store instruction, use ``TargetInstrInfo::storeRegToStackSlot(...)``,
1382ff9feeb5SBill Wendlingand to insert a load instruction, use ``TargetInstrInfo::loadRegFromStackSlot``.
1383ff9feeb5SBill Wendling
1384ff9feeb5SBill WendlingThe indirect mapping shields the application developer from the complexities of
1385ff9feeb5SBill Wendlinginserting load and store instructions. In order to map a virtual register to a
1386ff9feeb5SBill Wendlingphysical one, use ``VirtRegMap::assignVirt2Phys(vreg, preg)``.  In order to map
1387ff9feeb5SBill Wendlinga certain virtual register to memory, use
1388ff9feeb5SBill Wendling``VirtRegMap::assignVirt2StackSlot(vreg)``. This method will return the stack
1389ff9feeb5SBill Wendlingslot where ``vreg``'s value will be located.  If it is necessary to map another
1390ff9feeb5SBill Wendlingvirtual register to the same stack slot, use
1391ff9feeb5SBill Wendling``VirtRegMap::assignVirt2StackSlot(vreg, stack_location)``. One important point
1392ff9feeb5SBill Wendlingto consider when using the indirect mapping, is that even if a virtual register
1393ff9feeb5SBill Wendlingis mapped to memory, it still needs to be mapped to a physical register. This
1394ff9feeb5SBill Wendlingphysical register is the location where the virtual register is supposed to be
1395ff9feeb5SBill Wendlingfound before being stored or after being reloaded.
1396ff9feeb5SBill Wendling
1397ff9feeb5SBill WendlingIf the indirect strategy is used, after all the virtual registers have been
1398ff9feeb5SBill Wendlingmapped to physical registers or stack slots, it is necessary to use a spiller
1399ff9feeb5SBill Wendlingobject to place load and store instructions in the code. Every virtual that has
14001e61ffddSEric Christopherbeen mapped to a stack slot will be stored to memory after being defined and will
1401ff9feeb5SBill Wendlingbe loaded before being used. The implementation of the spiller tries to recycle
1402ff9feeb5SBill Wendlingload/store instructions, avoiding unnecessary instructions. For an example of
1403ff9feeb5SBill Wendlinghow to invoke the spiller, see ``RegAllocLinearScan::runOnMachineFunction`` in
1404ff9feeb5SBill Wendling``lib/CodeGen/RegAllocLinearScan.cpp``.
1405ff9feeb5SBill Wendling
1406ff9feeb5SBill WendlingHandling two address instructions
1407ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1408ff9feeb5SBill Wendling
1409ff9feeb5SBill WendlingWith very rare exceptions (e.g., function calls), the LLVM machine code
1410ff9feeb5SBill Wendlinginstructions are three address instructions. That is, each instruction is
1411ff9feeb5SBill Wendlingexpected to define at most one register, and to use at most two registers.
1412ff9feeb5SBill WendlingHowever, some architectures use two address instructions. In this case, the
14131e61ffddSEric Christopherdefined register is also one of the used registers. For instance, an instruction
1414ff9feeb5SBill Wendlingsuch as ``ADD %EAX, %EBX``, in X86 is actually equivalent to ``%EAX = %EAX +
1415ff9feeb5SBill Wendling%EBX``.
1416ff9feeb5SBill Wendling
1417ff9feeb5SBill WendlingIn order to produce correct code, LLVM must convert three address instructions
1418ff9feeb5SBill Wendlingthat represent two address instructions into true two address instructions. LLVM
1419ff9feeb5SBill Wendlingprovides the pass ``TwoAddressInstructionPass`` for this specific purpose. It
1420ff9feeb5SBill Wendlingmust be run before register allocation takes place. After its execution, the
1421ff9feeb5SBill Wendlingresulting code may no longer be in SSA form. This happens, for instance, in
1422ff9feeb5SBill Wendlingsituations where an instruction such as ``%a = ADD %b %c`` is converted to two
1423ff9feeb5SBill Wendlinginstructions such as:
1424ff9feeb5SBill Wendling
1425ff9feeb5SBill Wendling::
1426ff9feeb5SBill Wendling
1427ff9feeb5SBill Wendling  %a = MOVE %b
1428ff9feeb5SBill Wendling  %a = ADD %a %c
1429ff9feeb5SBill Wendling
1430ff9feeb5SBill WendlingNotice that, internally, the second instruction is represented as ``ADD
1431ff9feeb5SBill Wendling%a[def/use] %c``. I.e., the register operand ``%a`` is both used and defined by
1432ff9feeb5SBill Wendlingthe instruction.
1433ff9feeb5SBill Wendling
1434ff9feeb5SBill WendlingThe SSA deconstruction phase
1435ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1436ff9feeb5SBill Wendling
1437ff9feeb5SBill WendlingAn important transformation that happens during register allocation is called
1438ff9feeb5SBill Wendlingthe *SSA Deconstruction Phase*. The SSA form simplifies many analyses that are
1439ff9feeb5SBill Wendlingperformed on the control flow graph of programs. However, traditional
1440ff9feeb5SBill Wendlinginstruction sets do not implement PHI instructions. Thus, in order to generate
1441ff9feeb5SBill Wendlingexecutable code, compilers must replace PHI instructions with other instructions
1442ff9feeb5SBill Wendlingthat preserve their semantics.
1443ff9feeb5SBill Wendling
1444ff9feeb5SBill WendlingThere are many ways in which PHI instructions can safely be removed from the
1445ff9feeb5SBill Wendlingtarget code. The most traditional PHI deconstruction algorithm replaces PHI
1446ff9feeb5SBill Wendlinginstructions with copy instructions. That is the strategy adopted by LLVM. The
1447ff9feeb5SBill WendlingSSA deconstruction algorithm is implemented in
1448ff9feeb5SBill Wendling``lib/CodeGen/PHIElimination.cpp``. In order to invoke this pass, the identifier
1449ff9feeb5SBill Wendling``PHIEliminationID`` must be marked as required in the code of the register
1450ff9feeb5SBill Wendlingallocator.
1451ff9feeb5SBill Wendling
1452ff9feeb5SBill WendlingInstruction folding
1453ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^
1454ff9feeb5SBill Wendling
1455ff9feeb5SBill Wendling*Instruction folding* is an optimization performed during register allocation
1456ff9feeb5SBill Wendlingthat removes unnecessary copy instructions. For instance, a sequence of
1457ff9feeb5SBill Wendlinginstructions such as:
1458ff9feeb5SBill Wendling
1459ff9feeb5SBill Wendling::
1460ff9feeb5SBill Wendling
1461ff9feeb5SBill Wendling  %EBX = LOAD %mem_address
1462ff9feeb5SBill Wendling  %EAX = COPY %EBX
1463ff9feeb5SBill Wendling
1464ff9feeb5SBill Wendlingcan be safely substituted by the single instruction:
1465ff9feeb5SBill Wendling
1466ff9feeb5SBill Wendling::
1467ff9feeb5SBill Wendling
1468ff9feeb5SBill Wendling  %EAX = LOAD %mem_address
1469ff9feeb5SBill Wendling
1470ff9feeb5SBill WendlingInstructions can be folded with the
1471ff9feeb5SBill Wendling``TargetRegisterInfo::foldMemoryOperand(...)`` method. Care must be taken when
1472ff9feeb5SBill Wendlingfolding instructions; a folded instruction can be quite different from the
1473ff9feeb5SBill Wendlingoriginal instruction. See ``LiveIntervals::addIntervalsForSpills`` in
1474ff9feeb5SBill Wendling``lib/CodeGen/LiveIntervalAnalysis.cpp`` for an example of its use.
1475ff9feeb5SBill Wendling
1476ff9feeb5SBill WendlingBuilt in register allocators
1477ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1478ff9feeb5SBill Wendling
1479ff9feeb5SBill WendlingThe LLVM infrastructure provides the application developer with three different
1480ff9feeb5SBill Wendlingregister allocators:
1481ff9feeb5SBill Wendling
1482ff9feeb5SBill Wendling* *Fast* --- This register allocator is the default for debug builds. It
1483ff9feeb5SBill Wendling  allocates registers on a basic block level, attempting to keep values in
1484ff9feeb5SBill Wendling  registers and reusing registers as appropriate.
1485ff9feeb5SBill Wendling
1486ff9feeb5SBill Wendling* *Basic* --- This is an incremental approach to register allocation. Live
1487ff9feeb5SBill Wendling  ranges are assigned to registers one at a time in an order that is driven by
1488ff9feeb5SBill Wendling  heuristics. Since code can be rewritten on-the-fly during allocation, this
1489ff9feeb5SBill Wendling  framework allows interesting allocators to be developed as extensions. It is
1490ff9feeb5SBill Wendling  not itself a production register allocator but is a potentially useful
1491ff9feeb5SBill Wendling  stand-alone mode for triaging bugs and as a performance baseline.
1492ff9feeb5SBill Wendling
1493ff9feeb5SBill Wendling* *Greedy* --- *The default allocator*. This is a highly tuned implementation of
1494ff9feeb5SBill Wendling  the *Basic* allocator that incorporates global live range splitting. This
1495ff9feeb5SBill Wendling  allocator works hard to minimize the cost of spill code.
1496ff9feeb5SBill Wendling
1497ff9feeb5SBill Wendling* *PBQP* --- A Partitioned Boolean Quadratic Programming (PBQP) based register
1498ff9feeb5SBill Wendling  allocator. This allocator works by constructing a PBQP problem representing
1499ff9feeb5SBill Wendling  the register allocation problem under consideration, solving this using a PBQP
1500ff9feeb5SBill Wendling  solver, and mapping the solution back to a register assignment.
1501ff9feeb5SBill Wendling
1502ff9feeb5SBill WendlingThe type of register allocator used in ``llc`` can be chosen with the command
1503ff9feeb5SBill Wendlingline option ``-regalloc=...``:
1504ff9feeb5SBill Wendling
1505ff9feeb5SBill Wendling.. code-block:: bash
1506ff9feeb5SBill Wendling
1507ff9feeb5SBill Wendling  $ llc -regalloc=linearscan file.bc -o ln.s
1508ff9feeb5SBill Wendling  $ llc -regalloc=fast file.bc -o fa.s
1509ff9feeb5SBill Wendling  $ llc -regalloc=pbqp file.bc -o pbqp.s
1510ff9feeb5SBill Wendling
1511ff9feeb5SBill Wendling.. _Prolog/Epilog Code Insertion:
1512ff9feeb5SBill Wendling
1513ff9feeb5SBill WendlingProlog/Epilog Code Insertion
1514ff9feeb5SBill Wendling----------------------------
1515ff9feeb5SBill Wendling
15160ee7a2c3SDaniel McIntosh.. note::
15170ee7a2c3SDaniel McIntosh
15180ee7a2c3SDaniel McIntosh  To Be Written
15190ee7a2c3SDaniel McIntosh
1520ff9feeb5SBill WendlingCompact Unwind
15210ee7a2c3SDaniel McIntosh--------------
1522ff9feeb5SBill Wendling
1523ff9feeb5SBill WendlingThrowing an exception requires *unwinding* out of a function. The information on
1524ff9feeb5SBill Wendlinghow to unwind a given function is traditionally expressed in DWARF unwind
1525ff9feeb5SBill Wendling(a.k.a. frame) info. But that format was originally developed for debuggers to
1526ff9feeb5SBill Wendlingbacktrace, and each Frame Description Entry (FDE) requires ~20-30 bytes per
1527ff9feeb5SBill Wendlingfunction. There is also the cost of mapping from an address in a function to the
1528ff9feeb5SBill Wendlingcorresponding FDE at runtime. An alternative unwind encoding is called *compact
1529ff9feeb5SBill Wendlingunwind* and requires just 4-bytes per function.
1530ff9feeb5SBill Wendling
1531ff9feeb5SBill WendlingThe compact unwind encoding is a 32-bit value, which is encoded in an
1532ff9feeb5SBill Wendlingarchitecture-specific way. It specifies which registers to restore and from
1533ff9feeb5SBill Wendlingwhere, and how to unwind out of the function. When the linker creates a final
1534ff9feeb5SBill Wendlinglinked image, it will create a ``__TEXT,__unwind_info`` section. This section is
1535ff9feeb5SBill Wendlinga small and fast way for the runtime to access unwind info for any given
1536ff9feeb5SBill Wendlingfunction. If we emit compact unwind info for the function, that compact unwind
1537ff9feeb5SBill Wendlinginfo will be encoded in the ``__TEXT,__unwind_info`` section. If we emit DWARF
1538ff9feeb5SBill Wendlingunwind info, the ``__TEXT,__unwind_info`` section will contain the offset of the
1539ff9feeb5SBill WendlingFDE in the ``__TEXT,__eh_frame`` section in the final linked image.
1540ff9feeb5SBill Wendling
1541ff9feeb5SBill WendlingFor X86, there are three modes for the compact unwind encoding:
1542ff9feeb5SBill Wendling
1543ff9feeb5SBill Wendling*Function with a Frame Pointer (``EBP`` or ``RBP``)*
1544ff9feeb5SBill Wendling  ``EBP/RBP``-based frame, where ``EBP/RBP`` is pushed onto the stack
1545ff9feeb5SBill Wendling  immediately after the return address, then ``ESP/RSP`` is moved to
1546ff9feeb5SBill Wendling  ``EBP/RBP``. Thus to unwind, ``ESP/RSP`` is restored with the current
1547ff9feeb5SBill Wendling  ``EBP/RBP`` value, then ``EBP/RBP`` is restored by popping the stack, and the
1548ff9feeb5SBill Wendling  return is done by popping the stack once more into the PC. All non-volatile
1549ff9feeb5SBill Wendling  registers that need to be restored must have been saved in a small range on
1550ff9feeb5SBill Wendling  the stack that starts ``EBP-4`` to ``EBP-1020`` (``RBP-8`` to
1551ff9feeb5SBill Wendling  ``RBP-1020``). The offset (divided by 4 in 32-bit mode and 8 in 64-bit mode)
1552ff9feeb5SBill Wendling  is encoded in bits 16-23 (mask: ``0x00FF0000``).  The registers saved are
1553ff9feeb5SBill Wendling  encoded in bits 0-14 (mask: ``0x00007FFF``) as five 3-bit entries from the
1554ff9feeb5SBill Wendling  following table:
1555ff9feeb5SBill Wendling
1556ff9feeb5SBill Wendling    ==============  =============  ===============
1557ff9feeb5SBill Wendling    Compact Number  i386 Register  x86-64 Register
1558ff9feeb5SBill Wendling    ==============  =============  ===============
1559ff9feeb5SBill Wendling    1               ``EBX``        ``RBX``
1560ff9feeb5SBill Wendling    2               ``ECX``        ``R12``
1561ff9feeb5SBill Wendling    3               ``EDX``        ``R13``
1562ff9feeb5SBill Wendling    4               ``EDI``        ``R14``
1563ff9feeb5SBill Wendling    5               ``ESI``        ``R15``
1564ff9feeb5SBill Wendling    6               ``EBP``        ``RBP``
1565ff9feeb5SBill Wendling    ==============  =============  ===============
1566ff9feeb5SBill Wendling
1567ff9feeb5SBill Wendling*Frameless with a Small Constant Stack Size (``EBP`` or ``RBP`` is not used as a frame pointer)*
1568ff9feeb5SBill Wendling  To return, a constant (encoded in the compact unwind encoding) is added to the
1569ff9feeb5SBill Wendling  ``ESP/RSP``.  Then the return is done by popping the stack into the PC. All
1570ff9feeb5SBill Wendling  non-volatile registers that need to be restored must have been saved on the
1571ff9feeb5SBill Wendling  stack immediately after the return address. The stack size (divided by 4 in
1572ff9feeb5SBill Wendling  32-bit mode and 8 in 64-bit mode) is encoded in bits 16-23 (mask:
1573ff9feeb5SBill Wendling  ``0x00FF0000``). There is a maximum stack size of 1024 bytes in 32-bit mode
1574ff9feeb5SBill Wendling  and 2048 in 64-bit mode. The number of registers saved is encoded in bits 9-12
1575ff9feeb5SBill Wendling  (mask: ``0x00001C00``). Bits 0-9 (mask: ``0x000003FF``) contain which
1576ff9feeb5SBill Wendling  registers were saved and their order. (See the
1577ff9feeb5SBill Wendling  ``encodeCompactUnwindRegistersWithoutFrame()`` function in
1578ff9feeb5SBill Wendling  ``lib/Target/X86FrameLowering.cpp`` for the encoding algorithm.)
1579ff9feeb5SBill Wendling
1580ff9feeb5SBill Wendling*Frameless with a Large Constant Stack Size (``EBP`` or ``RBP`` is not used as a frame pointer)*
1581ff9feeb5SBill Wendling  This case is like the "Frameless with a Small Constant Stack Size" case, but
1582ff9feeb5SBill Wendling  the stack size is too large to encode in the compact unwind encoding. Instead
1583ff9feeb5SBill Wendling  it requires that the function contains "``subl $nnnnnn, %esp``" in its
1584ff9feeb5SBill Wendling  prolog. The compact encoding contains the offset to the ``$nnnnnn`` value in
1585ff9feeb5SBill Wendling  the function in bits 9-12 (mask: ``0x00001C00``).
1586ff9feeb5SBill Wendling
1587ff9feeb5SBill Wendling.. _Late Machine Code Optimizations:
1588ff9feeb5SBill Wendling
1589ff9feeb5SBill WendlingLate Machine Code Optimizations
1590ff9feeb5SBill Wendling-------------------------------
1591ff9feeb5SBill Wendling
1592ff9feeb5SBill Wendling.. note::
1593ff9feeb5SBill Wendling
1594ff9feeb5SBill Wendling  To Be Written
1595ff9feeb5SBill Wendling
1596ff9feeb5SBill Wendling.. _Code Emission:
1597ff9feeb5SBill Wendling
1598ff9feeb5SBill WendlingCode Emission
1599ff9feeb5SBill Wendling-------------
1600ff9feeb5SBill Wendling
1601ff9feeb5SBill WendlingThe code emission step of code generation is responsible for lowering from the
1602ff9feeb5SBill Wendlingcode generator abstractions (like `MachineFunction`_, `MachineInstr`_, etc) down
1603ff9feeb5SBill Wendlingto the abstractions used by the MC layer (`MCInst`_, `MCStreamer`_, etc).  This
1604ff9feeb5SBill Wendlingis done with a combination of several different classes: the (misnamed)
1605ff9feeb5SBill Wendlingtarget-independent AsmPrinter class, target-specific subclasses of AsmPrinter
1606ff9feeb5SBill Wendling(such as SparcAsmPrinter), and the TargetLoweringObjectFile class.
1607ff9feeb5SBill Wendling
1608ff9feeb5SBill WendlingSince the MC layer works at the level of abstraction of object files, it doesn't
1609ff9feeb5SBill Wendlinghave a notion of functions, global variables etc.  Instead, it thinks about
1610ff9feeb5SBill Wendlinglabels, directives, and instructions.  A key class used at this time is the
1611ff9feeb5SBill WendlingMCStreamer class.  This is an abstract API that is implemented in different ways
1612ff9feeb5SBill Wendling(e.g. to output a .s file, output an ELF .o file, etc) that is effectively an
1613ff9feeb5SBill Wendling"assembler API".  MCStreamer has one method per directive, such as EmitLabel,
1614adf4142fSFangrui SongEmitSymbolAttribute, switchSection, etc, which directly correspond to assembly
1615ff9feeb5SBill Wendlinglevel directives.
1616ff9feeb5SBill Wendling
1617ff9feeb5SBill WendlingIf you are interested in implementing a code generator for a target, there are
1618ff9feeb5SBill Wendlingthree important things that you have to implement for your target:
1619ff9feeb5SBill Wendling
1620ff9feeb5SBill Wendling#. First, you need a subclass of AsmPrinter for your target.  This class
1621ff9feeb5SBill Wendling   implements the general lowering process converting MachineFunction's into MC
1622ff9feeb5SBill Wendling   label constructs.  The AsmPrinter base class provides a number of useful
1623ff9feeb5SBill Wendling   methods and routines, and also allows you to override the lowering process in
1624ff9feeb5SBill Wendling   some important ways.  You should get much of the lowering for free if you are
1625ff9feeb5SBill Wendling   implementing an ELF, COFF, or MachO target, because the
1626ff9feeb5SBill Wendling   TargetLoweringObjectFile class implements much of the common logic.
1627ff9feeb5SBill Wendling
1628ff9feeb5SBill Wendling#. Second, you need to implement an instruction printer for your target.  The
1629ff9feeb5SBill Wendling   instruction printer takes an `MCInst`_ and renders it to a raw_ostream as
1630ff9feeb5SBill Wendling   text.  Most of this is automatically generated from the .td file (when you
1631ff9feeb5SBill Wendling   specify something like "``add $dst, $src1, $src2``" in the instructions), but
1632ff9feeb5SBill Wendling   you need to implement routines to print operands.
1633ff9feeb5SBill Wendling
1634ff9feeb5SBill Wendling#. Third, you need to implement code that lowers a `MachineInstr`_ to an MCInst,
1635ff9feeb5SBill Wendling   usually implemented in "<target>MCInstLower.cpp".  This lowering process is
1636ff9feeb5SBill Wendling   often target specific, and is responsible for turning jump table entries,
1637ff9feeb5SBill Wendling   constant pool indices, global variable addresses, etc into MCLabels as
1638ff9feeb5SBill Wendling   appropriate.  This translation layer is also responsible for expanding pseudo
1639ff9feeb5SBill Wendling   ops used by the code generator into the actual machine instructions they
1640ff9feeb5SBill Wendling   correspond to. The MCInsts that are generated by this are fed into the
1641ff9feeb5SBill Wendling   instruction printer or the encoder.
1642ff9feeb5SBill Wendling
16431e61ffddSEric ChristopherFinally, at your choosing, you can also implement a subclass of MCCodeEmitter
1644ff9feeb5SBill Wendlingwhich lowers MCInst's into machine code bytes and relocations.  This is
1645ff9feeb5SBill Wendlingimportant if you want to support direct .o file emission, or would like to
1646ff9feeb5SBill Wendlingimplement an assembler for your target.
1647ff9feeb5SBill Wendling
1648a6bcd53dSSean EvesonEmitting function stack size information
1649a6bcd53dSSean Eveson^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1650a6bcd53dSSean Eveson
1651a6bcd53dSSean EvesonA section containing metadata on function stack sizes will be emitted when
1652a6bcd53dSSean Eveson``TargetLoweringObjectFile::StackSizesSection`` is not null, and
1653a6bcd53dSSean Eveson``TargetOptions::EmitStackSizeSection`` is set (-stack-size-section). The
16542ae6037dSSean Evesonsection will contain an array of pairs of function symbol values (pointer size)
1655a6bcd53dSSean Evesonand stack sizes (unsigned LEB128). The stack size values only include the space
1656a6bcd53dSSean Evesonallocated in the function prologue. Functions with dynamic stack allocations are
1657a6bcd53dSSean Evesonnot included.
1658a6bcd53dSSean Eveson
1659ff9feeb5SBill WendlingVLIW Packetizer
1660ff9feeb5SBill Wendling---------------
1661ff9feeb5SBill Wendling
1662ff9feeb5SBill WendlingIn a Very Long Instruction Word (VLIW) architecture, the compiler is responsible
1663ff9feeb5SBill Wendlingfor mapping instructions to functional-units available on the architecture. To
1664ff9feeb5SBill Wendlingthat end, the compiler creates groups of instructions called *packets* or
1665ff9feeb5SBill Wendling*bundles*. The VLIW packetizer in LLVM is a target-independent mechanism to
1666ff9feeb5SBill Wendlingenable the packetization of machine instructions.
1667ff9feeb5SBill Wendling
1668ff9feeb5SBill WendlingMapping from instructions to functional units
1669ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1670ff9feeb5SBill Wendling
1671ff9feeb5SBill WendlingInstructions in a VLIW target can typically be mapped to multiple functional
1672ff9feeb5SBill Wendlingunits. During the process of packetizing, the compiler must be able to reason
1673ff9feeb5SBill Wendlingabout whether an instruction can be added to a packet. This decision can be
1674ff9feeb5SBill Wendlingcomplex since the compiler has to examine all possible mappings of instructions
1675ff9feeb5SBill Wendlingto functional units. Therefore to alleviate compilation-time complexity, the
1676ff9feeb5SBill WendlingVLIW packetizer parses the instruction classes of a target and generates tables
1677ff9feeb5SBill Wendlingat compiler build time. These tables can then be queried by the provided
1678ff9feeb5SBill Wendlingmachine-independent API to determine if an instruction can be accommodated in a
1679ff9feeb5SBill Wendlingpacket.
1680ff9feeb5SBill Wendling
1681ff9feeb5SBill WendlingHow the packetization tables are generated and used
1682ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1683ff9feeb5SBill Wendling
1684ff9feeb5SBill WendlingThe packetizer reads instruction classes from a target's itineraries and creates
1685ff9feeb5SBill Wendlinga deterministic finite automaton (DFA) to represent the state of a packet. A DFA
1686ff9feeb5SBill Wendlingconsists of three major elements: inputs, states, and transitions. The set of
1687ff9feeb5SBill Wendlinginputs for the generated DFA represents the instruction being added to a
1688ff9feeb5SBill Wendlingpacket. The states represent the possible consumption of functional units by
1689ff9feeb5SBill Wendlinginstructions in a packet. In the DFA, transitions from one state to another
1690ff9feeb5SBill Wendlingoccur on the addition of an instruction to an existing packet. If there is a
1691ff9feeb5SBill Wendlinglegal mapping of functional units to instructions, then the DFA contains a
1692ff9feeb5SBill Wendlingcorresponding transition. The absence of a transition indicates that a legal
1693ff9feeb5SBill Wendlingmapping does not exist and that the instruction cannot be added to the packet.
1694ff9feeb5SBill Wendling
1695ff9feeb5SBill WendlingTo generate tables for a VLIW target, add *Target*\ GenDFAPacketizer.inc as a
1696ff9feeb5SBill Wendlingtarget to the Makefile in the target directory. The exported API provides three
1697ff9feeb5SBill Wendlingfunctions: ``DFAPacketizer::clearResources()``,
1698ff9feeb5SBill Wendling``DFAPacketizer::reserveResources(MachineInstr *MI)``, and
1699ff9feeb5SBill Wendling``DFAPacketizer::canReserveResources(MachineInstr *MI)``. These functions allow
1700ff9feeb5SBill Wendlinga target packetizer to add an instruction to an existing packet and to check
1701ff9feeb5SBill Wendlingwhether an instruction can be added to a packet. See
1702ff9feeb5SBill Wendling``llvm/CodeGen/DFAPacketizer.h`` for more information.
1703ff9feeb5SBill Wendling
1704ff9feeb5SBill WendlingImplementing a Native Assembler
1705ff9feeb5SBill Wendling===============================
1706ff9feeb5SBill Wendling
1707ff9feeb5SBill WendlingThough you're probably reading this because you want to write or maintain a
1708a9853efbSSylvestre Ledrucompiler backend, LLVM also fully supports building a native assembler.
1709ff9feeb5SBill WendlingWe've tried hard to automate the generation of the assembler from the .td files
1710ff9feeb5SBill Wendling(in particular the instruction syntax and encodings), which means that a large
1711ff9feeb5SBill Wendlingpart of the manual and repetitive data entry can be factored and shared with the
1712ff9feeb5SBill Wendlingcompiler.
1713ff9feeb5SBill Wendling
1714ff9feeb5SBill WendlingInstruction Parsing
1715ff9feeb5SBill Wendling-------------------
1716ff9feeb5SBill Wendling
1717ff9feeb5SBill Wendling.. note::
1718ff9feeb5SBill Wendling
1719ff9feeb5SBill Wendling  To Be Written
1720ff9feeb5SBill Wendling
1721ff9feeb5SBill Wendling
1722ff9feeb5SBill WendlingInstruction Alias Processing
1723ff9feeb5SBill Wendling----------------------------
1724ff9feeb5SBill Wendling
1725ff9feeb5SBill WendlingOnce the instruction is parsed, it enters the MatchInstructionImpl function.
1726ff9feeb5SBill WendlingThe MatchInstructionImpl function performs alias processing and then does actual
1727ff9feeb5SBill Wendlingmatching.
1728ff9feeb5SBill Wendling
1729ff9feeb5SBill WendlingAlias processing is the phase that canonicalizes different lexical forms of the
1730ff9feeb5SBill Wendlingsame instructions down to one representation.  There are several different kinds
1731ff9feeb5SBill Wendlingof alias that are possible to implement and they are listed below in the order
1732ff9feeb5SBill Wendlingthat they are processed (which is in order from simplest/weakest to most
1733ff9feeb5SBill Wendlingcomplex/powerful).  Generally you want to use the first alias mechanism that
1734ff9feeb5SBill Wendlingmeets the needs of your instruction, because it will allow a more concise
1735ff9feeb5SBill Wendlingdescription.
1736ff9feeb5SBill Wendling
1737ff9feeb5SBill WendlingMnemonic Aliases
1738ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^
1739ff9feeb5SBill Wendling
1740ff9feeb5SBill WendlingThe first phase of alias processing is simple instruction mnemonic remapping for
1741ff9feeb5SBill Wendlingclasses of instructions which are allowed with two different mnemonics.  This
1742ff9feeb5SBill Wendlingphase is a simple and unconditionally remapping from one input mnemonic to one
1743ff9feeb5SBill Wendlingoutput mnemonic.  It isn't possible for this form of alias to look at the
1744ff9feeb5SBill Wendlingoperands at all, so the remapping must apply for all forms of a given mnemonic.
1745ff9feeb5SBill WendlingMnemonic aliases are defined simply, for example X86 has:
1746ff9feeb5SBill Wendling
1747ff9feeb5SBill Wendling::
1748ff9feeb5SBill Wendling
1749ff9feeb5SBill Wendling  def : MnemonicAlias<"cbw",     "cbtw">;
1750ff9feeb5SBill Wendling  def : MnemonicAlias<"smovq",   "movsq">;
1751ff9feeb5SBill Wendling  def : MnemonicAlias<"fldcww",  "fldcw">;
1752ff9feeb5SBill Wendling  def : MnemonicAlias<"fucompi", "fucomip">;
1753ff9feeb5SBill Wendling  def : MnemonicAlias<"ud2a",    "ud2">;
1754ff9feeb5SBill Wendling
1755ff9feeb5SBill Wendling... and many others.  With a MnemonicAlias definition, the mnemonic is remapped
1756ff9feeb5SBill Wendlingsimply and directly.  Though MnemonicAlias's can't look at any aspect of the
1757ff9feeb5SBill Wendlinginstruction (such as the operands) they can depend on global modes (the same
1758ff9feeb5SBill Wendlingones supported by the matcher), through a Requires clause:
1759ff9feeb5SBill Wendling
1760ff9feeb5SBill Wendling::
1761ff9feeb5SBill Wendling
1762ff9feeb5SBill Wendling  def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>;
1763ff9feeb5SBill Wendling  def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>;
1764ff9feeb5SBill Wendling
17659ab8899fSSean SilvaIn this example, the mnemonic gets mapped into a different one depending on
1766ff9feeb5SBill Wendlingthe current instruction set.
1767ff9feeb5SBill Wendling
1768ff9feeb5SBill WendlingInstruction Aliases
1769ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^
1770ff9feeb5SBill Wendling
1771ff9feeb5SBill WendlingThe most general phase of alias processing occurs while matching is happening:
1772ff9feeb5SBill Wendlingit provides new forms for the matcher to match along with a specific instruction
1773ff9feeb5SBill Wendlingto generate.  An instruction alias has two parts: the string to match and the
1774ff9feeb5SBill Wendlinginstruction to generate.  For example:
1775ff9feeb5SBill Wendling
1776ff9feeb5SBill Wendling::
1777ff9feeb5SBill Wendling
1778ff9feeb5SBill Wendling  def : InstAlias<"movsx $src, $dst", (MOVSX16rr8W GR16:$dst, GR8  :$src)>;
1779ff9feeb5SBill Wendling  def : InstAlias<"movsx $src, $dst", (MOVSX16rm8W GR16:$dst, i8mem:$src)>;
1780ff9feeb5SBill Wendling  def : InstAlias<"movsx $src, $dst", (MOVSX32rr8  GR32:$dst, GR8  :$src)>;
1781ff9feeb5SBill Wendling  def : InstAlias<"movsx $src, $dst", (MOVSX32rr16 GR32:$dst, GR16 :$src)>;
1782ff9feeb5SBill Wendling  def : InstAlias<"movsx $src, $dst", (MOVSX64rr8  GR64:$dst, GR8  :$src)>;
1783ff9feeb5SBill Wendling  def : InstAlias<"movsx $src, $dst", (MOVSX64rr16 GR64:$dst, GR16 :$src)>;
1784ff9feeb5SBill Wendling  def : InstAlias<"movsx $src, $dst", (MOVSX64rr32 GR64:$dst, GR32 :$src)>;
1785ff9feeb5SBill Wendling
1786ff9feeb5SBill WendlingThis shows a powerful example of the instruction aliases, matching the same
1787ff9feeb5SBill Wendlingmnemonic in multiple different ways depending on what operands are present in
1788ff9feeb5SBill Wendlingthe assembly.  The result of instruction aliases can include operands in a
1789ff9feeb5SBill Wendlingdifferent order than the destination instruction, and can use an input multiple
1790ff9feeb5SBill Wendlingtimes, for example:
1791ff9feeb5SBill Wendling
1792ff9feeb5SBill Wendling::
1793ff9feeb5SBill Wendling
1794ff9feeb5SBill Wendling  def : InstAlias<"clrb $reg", (XOR8rr  GR8 :$reg, GR8 :$reg)>;
1795ff9feeb5SBill Wendling  def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>;
1796ff9feeb5SBill Wendling  def : InstAlias<"clrl $reg", (XOR32rr GR32:$reg, GR32:$reg)>;
1797ff9feeb5SBill Wendling  def : InstAlias<"clrq $reg", (XOR64rr GR64:$reg, GR64:$reg)>;
1798ff9feeb5SBill Wendling
1799ff9feeb5SBill WendlingThis example also shows that tied operands are only listed once.  In the X86
1800ff9feeb5SBill Wendlingbackend, XOR8rr has two input GR8's and one output GR8 (where an input is tied
1801ff9feeb5SBill Wendlingto the output).  InstAliases take a flattened operand list without duplicates
1802ff9feeb5SBill Wendlingfor tied operands.  The result of an instruction alias can also use immediates
1803ff9feeb5SBill Wendlingand fixed physical registers which are added as simple immediate operands in the
1804ff9feeb5SBill Wendlingresult, for example:
1805ff9feeb5SBill Wendling
1806ff9feeb5SBill Wendling::
1807ff9feeb5SBill Wendling
1808ff9feeb5SBill Wendling  // Fixed Immediate operand.
1809ff9feeb5SBill Wendling  def : InstAlias<"aad", (AAD8i8 10)>;
1810ff9feeb5SBill Wendling
1811ff9feeb5SBill Wendling  // Fixed register operand.
1812ff9feeb5SBill Wendling  def : InstAlias<"fcomi", (COM_FIr ST1)>;
1813ff9feeb5SBill Wendling
1814ff9feeb5SBill Wendling  // Simple alias.
1815ff9feeb5SBill Wendling  def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>;
1816ff9feeb5SBill Wendling
1817ff9feeb5SBill WendlingInstruction aliases can also have a Requires clause to make them subtarget
1818ff9feeb5SBill Wendlingspecific.
1819ff9feeb5SBill Wendling
1820ff9feeb5SBill WendlingIf the back-end supports it, the instruction printer can automatically emit the
1821ff9feeb5SBill Wendlingalias rather than what's being aliased. It typically leads to better, more
1822ff9feeb5SBill Wendlingreadable code. If it's better to print out what's being aliased, then pass a '0'
1823ff9feeb5SBill Wendlingas the third parameter to the InstAlias definition.
1824ff9feeb5SBill Wendling
1825ff9feeb5SBill WendlingInstruction Matching
1826ff9feeb5SBill Wendling--------------------
1827ff9feeb5SBill Wendling
1828ff9feeb5SBill Wendling.. note::
1829ff9feeb5SBill Wendling
1830ff9feeb5SBill Wendling  To Be Written
1831ff9feeb5SBill Wendling
1832ff9feeb5SBill Wendling.. _Implementations of the abstract target description interfaces:
1833ff9feeb5SBill Wendling.. _implement the target description:
1834ff9feeb5SBill Wendling
1835ff9feeb5SBill WendlingTarget-specific Implementation Notes
1836ff9feeb5SBill Wendling====================================
1837ff9feeb5SBill Wendling
1838ff9feeb5SBill WendlingThis section of the document explains features or design decisions that are
1839*86c42429SAlex Bradburyspecific to the code generator for a particular target.
1840ff9feeb5SBill Wendling
18415ace2cd5SJay Foad.. _tail call section:
1842ff9feeb5SBill Wendling
1843ff9feeb5SBill WendlingTail call optimization
1844ff9feeb5SBill Wendling----------------------
1845ff9feeb5SBill Wendling
1846ff9feeb5SBill WendlingTail call optimization, callee reusing the stack of the caller, is currently
184782a0e808STim Northoversupported on x86/x86-64, PowerPC, AArch64, and WebAssembly. It is performed on
184882a0e808STim Northoverx86/x86-64, PowerPC, and AArch64 if:
1849ff9feeb5SBill Wendling
1850d71b4e45SDuncan Sands* Caller and callee have the calling convention ``fastcc``, ``cc 10`` (GHC
185182a0e808STim Northover  calling convention), ``cc 11`` (HiPE calling convention), ``tailcc``, or
185282a0e808STim Northover  ``swifttailcc``.
1853ff9feeb5SBill Wendling
1854ff9feeb5SBill Wendling* The call is a tail call - in tail position (ret immediately follows call and
1855ff9feeb5SBill Wendling  ret uses value of call or is void).
1856ff9feeb5SBill Wendling
1857f9b67b81SReid Kleckner* Option ``-tailcallopt`` is enabled or the calling convention is ``tailcc``.
1858ff9feeb5SBill Wendling
1859cf21875dSAlp Toker* Platform-specific constraints are met.
1860ff9feeb5SBill Wendling
1861ff9feeb5SBill Wendlingx86/x86-64 constraints:
1862ff9feeb5SBill Wendling
1863ff9feeb5SBill Wendling* No variable argument lists are used.
1864ff9feeb5SBill Wendling
1865ff9feeb5SBill Wendling* On x86-64 when generating GOT/PIC code only module-local calls (visibility =
1866ff9feeb5SBill Wendling  hidden or protected) are supported.
1867ff9feeb5SBill Wendling
1868ff9feeb5SBill WendlingPowerPC constraints:
1869ff9feeb5SBill Wendling
1870ff9feeb5SBill Wendling* No variable argument lists are used.
1871ff9feeb5SBill Wendling
1872ff9feeb5SBill Wendling* No byval parameters are used.
1873ff9feeb5SBill Wendling
1874ff9feeb5SBill Wendling* On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected)
1875ff9feeb5SBill Wendling  are supported.
1876ff9feeb5SBill Wendling
1877e0a9dce5SThomas LivelyWebAssembly constraints:
1878e0a9dce5SThomas Lively
1879e0a9dce5SThomas Lively* No variable argument lists are used
1880e0a9dce5SThomas Lively
1881e0a9dce5SThomas Lively* The 'tail-call' target attribute is enabled.
1882e0a9dce5SThomas Lively
1883e0a9dce5SThomas Lively* The caller and callee's return types must match. The caller cannot
1884e0a9dce5SThomas Lively  be void unless the callee is, too.
1885a1d97a96SThomas Lively
188682a0e808STim NorthoverAArch64 constraints:
188782a0e808STim Northover
188882a0e808STim Northover* No variable argument lists are used.
188982a0e808STim Northover
1890ff9feeb5SBill WendlingExample:
1891ff9feeb5SBill Wendling
1892ff9feeb5SBill WendlingCall as ``llc -tailcallopt test.ll``.
1893ff9feeb5SBill Wendling
1894ff9feeb5SBill Wendling.. code-block:: llvm
1895ff9feeb5SBill Wendling
1896ff9feeb5SBill Wendling  declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)
1897ff9feeb5SBill Wendling
1898ff9feeb5SBill Wendling  define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
1899ff9feeb5SBill Wendling    %l1 = add i32 %in1, %in2
19009399681aSKai Nacke    %tmp = tail call fastcc i32 @tailcallee(i32 inreg %in1, i32 inreg %in2, i32 %in1, i32 %l1)
1901ff9feeb5SBill Wendling    ret i32 %tmp
1902ff9feeb5SBill Wendling  }
1903ff9feeb5SBill Wendling
1904ff9feeb5SBill WendlingImplications of ``-tailcallopt``:
1905ff9feeb5SBill Wendling
1906ff9feeb5SBill WendlingTo support tail call optimization in situations where the callee has more
1907ff9feeb5SBill Wendlingarguments than the caller a 'callee pops arguments' convention is used. This
1908ff9feeb5SBill Wendlingcurrently causes each ``fastcc`` call that is not tail call optimized (because
1909ff9feeb5SBill Wendlingone or more of above constraints are not met) to be followed by a readjustment
1910ff9feeb5SBill Wendlingof the stack. So performance might be worse in such cases.
1911ff9feeb5SBill Wendling
1912ff9feeb5SBill WendlingSibling call optimization
1913ff9feeb5SBill Wendling-------------------------
1914ff9feeb5SBill Wendling
1915ff9feeb5SBill WendlingSibling call optimization is a restricted form of tail call optimization.
1916ff9feeb5SBill WendlingUnlike tail call optimization described in the previous section, it can be
1917ff9feeb5SBill Wendlingperformed automatically on any tail calls when ``-tailcallopt`` option is not
1918ff9feeb5SBill Wendlingspecified.
1919ff9feeb5SBill Wendling
1920ff9feeb5SBill WendlingSibling call optimization is currently performed on x86/x86-64 when the
1921ff9feeb5SBill Wendlingfollowing constraints are met:
1922ff9feeb5SBill Wendling
1923ff9feeb5SBill Wendling* Caller and callee have the same calling convention. It can be either ``c`` or
1924ff9feeb5SBill Wendling  ``fastcc``.
1925ff9feeb5SBill Wendling
1926ff9feeb5SBill Wendling* The call is a tail call - in tail position (ret immediately follows call and
1927ff9feeb5SBill Wendling  ret uses value of call or is void).
1928ff9feeb5SBill Wendling
1929ff9feeb5SBill Wendling* Caller and callee have matching return type or the callee result is not used.
1930ff9feeb5SBill Wendling
1931ff9feeb5SBill Wendling* If any of the callee arguments are being passed in stack, they must be
1932ff9feeb5SBill Wendling  available in caller's own incoming argument stack and the frame offsets must
1933ff9feeb5SBill Wendling  be the same.
1934ff9feeb5SBill Wendling
1935ff9feeb5SBill WendlingExample:
1936ff9feeb5SBill Wendling
1937ff9feeb5SBill Wendling.. code-block:: llvm
1938ff9feeb5SBill Wendling
1939ff9feeb5SBill Wendling  declare i32 @bar(i32, i32)
1940ff9feeb5SBill Wendling
1941ff9feeb5SBill Wendling  define i32 @foo(i32 %a, i32 %b, i32 %c) {
1942ff9feeb5SBill Wendling  entry:
1943ff9feeb5SBill Wendling    %0 = tail call i32 @bar(i32 %a, i32 %b)
1944ff9feeb5SBill Wendling    ret i32 %0
1945ff9feeb5SBill Wendling  }
1946ff9feeb5SBill Wendling
1947ff9feeb5SBill WendlingThe X86 backend
1948ff9feeb5SBill Wendling---------------
1949ff9feeb5SBill Wendling
1950ff9feeb5SBill WendlingThe X86 code generator lives in the ``lib/Target/X86`` directory.  This code
1951ff9feeb5SBill Wendlinggenerator is capable of targeting a variety of x86-32 and x86-64 processors, and
1952ff9feeb5SBill Wendlingincludes support for ISA extensions such as MMX and SSE.
1953ff9feeb5SBill Wendling
1954ff9feeb5SBill WendlingX86 Target Triples supported
1955ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1956ff9feeb5SBill Wendling
1957ff9feeb5SBill WendlingThe following are the known target triples that are supported by the X86
1958ff9feeb5SBill Wendlingbackend.  This is not an exhaustive list, and it would be useful to add those
1959ff9feeb5SBill Wendlingthat people test.
1960ff9feeb5SBill Wendling
1961ff9feeb5SBill Wendling* **i686-pc-linux-gnu** --- Linux
1962ff9feeb5SBill Wendling
1963ff9feeb5SBill Wendling* **i386-unknown-freebsd5.3** --- FreeBSD 5.3
1964ff9feeb5SBill Wendling
1965ff9feeb5SBill Wendling* **i686-pc-cygwin** --- Cygwin on Win32
1966ff9feeb5SBill Wendling
1967ff9feeb5SBill Wendling* **i686-pc-mingw32** --- MingW on Win32
1968ff9feeb5SBill Wendling
1969ff9feeb5SBill Wendling* **i386-pc-mingw32msvc** --- MingW crosscompiler on Linux
1970ff9feeb5SBill Wendling
1971ff9feeb5SBill Wendling* **i686-apple-darwin*** --- Apple Darwin on X86
1972ff9feeb5SBill Wendling
1973ff9feeb5SBill Wendling* **x86_64-unknown-linux-gnu** --- Linux
1974ff9feeb5SBill Wendling
1975ff9feeb5SBill WendlingX86 Calling Conventions supported
1976ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1977ff9feeb5SBill Wendling
1978ff9feeb5SBill WendlingThe following target-specific calling conventions are known to backend:
1979ff9feeb5SBill Wendling
1980ff9feeb5SBill Wendling* **x86_StdCall** --- stdcall calling convention seen on Microsoft Windows
1981ff9feeb5SBill Wendling  platform (CC ID = 64).
1982ff9feeb5SBill Wendling
1983ff9feeb5SBill Wendling* **x86_FastCall** --- fastcall calling convention seen on Microsoft Windows
1984ff9feeb5SBill Wendling  platform (CC ID = 65).
1985ff9feeb5SBill Wendling
1986ff9feeb5SBill Wendling* **x86_ThisCall** --- Similar to X86_StdCall. Passes first argument in ECX,
1987ff9feeb5SBill Wendling  others via stack. Callee is responsible for stack cleaning. This convention is
1988ff9feeb5SBill Wendling  used by MSVC by default for methods in its ABI (CC ID = 70).
1989ff9feeb5SBill Wendling
1990ff9feeb5SBill Wendling.. _X86 addressing mode:
1991ff9feeb5SBill Wendling
1992ff9feeb5SBill WendlingRepresenting X86 addressing modes in MachineInstrs
1993ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1994ff9feeb5SBill Wendling
1995ff9feeb5SBill WendlingThe x86 has a very flexible way of accessing memory.  It is capable of forming
1996ff9feeb5SBill Wendlingmemory addresses of the following expression directly in integer instructions
1997ff9feeb5SBill Wendling(which use ModR/M addressing):
1998ff9feeb5SBill Wendling
1999ff9feeb5SBill Wendling::
2000ff9feeb5SBill Wendling
2001ff9feeb5SBill Wendling  SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
2002ff9feeb5SBill Wendling
2003ff9feeb5SBill WendlingIn order to represent this, LLVM tracks no less than 5 operands for each memory
2004ff9feeb5SBill Wendlingoperand of this form.  This means that the "load" form of '``mov``' has the
2005ff9feeb5SBill Wendlingfollowing ``MachineOperand``\s in this order:
2006ff9feeb5SBill Wendling
2007ff9feeb5SBill Wendling::
2008ff9feeb5SBill Wendling
2009ff9feeb5SBill Wendling  Index:        0     |    1        2       3           4          5
2010ff9feeb5SBill Wendling  Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement Segment
2011ff9feeb5SBill Wendling  OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm  PhysReg
2012ff9feeb5SBill Wendling
2013ff9feeb5SBill WendlingStores, and all other instructions, treat the four memory operands in the same
2014ff9feeb5SBill Wendlingway and in the same order.  If the segment register is unspecified (regno = 0),
2015ff9feeb5SBill Wendlingthen no segment override is generated.  "Lea" operations do not have a segment
2016ff9feeb5SBill Wendlingregister specified, so they only have 4 operands for their memory reference.
2017ff9feeb5SBill Wendling
2018ff9feeb5SBill WendlingX86 address spaces supported
2019ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2020ff9feeb5SBill Wendling
2021ff9feeb5SBill Wendlingx86 has a feature which provides the ability to perform loads and stores to
2022ff9feeb5SBill Wendlingdifferent address spaces via the x86 segment registers.  A segment override
2023ff9feeb5SBill Wendlingprefix byte on an instruction causes the instruction's memory access to go to
2024ff9feeb5SBill Wendlingthe specified segment.  LLVM address space 0 is the default address space, which
2025ff9feeb5SBill Wendlingincludes the stack, and any unqualified memory accesses in a program.  Address
2026ff9feeb5SBill Wendlingspaces 1-255 are currently reserved for user-defined code.  The GS-segment is
2027c9fbf101SDavid L Kreitzerrepresented by address space 256, the FS-segment is represented by address space
2028c9fbf101SDavid L Kreitzer257, and the SS-segment is represented by address space 258. Other x86 segments
2029c9fbf101SDavid L Kreitzerhave yet to be allocated address space numbers.
2030ff9feeb5SBill Wendling
2031ff9feeb5SBill WendlingWhile these address spaces may seem similar to TLS via the ``thread_local``
2032ff9feeb5SBill Wendlingkeyword, and often use the same underlying hardware, there are some fundamental
2033ff9feeb5SBill Wendlingdifferences.
2034ff9feeb5SBill Wendling
2035ff9feeb5SBill WendlingThe ``thread_local`` keyword applies to global variables and specifies that they
2036ff9feeb5SBill Wendlingare to be allocated in thread-local memory. There are no type qualifiers
2037ff9feeb5SBill Wendlinginvolved, and these variables can be pointed to with normal pointers and
2038ff9feeb5SBill Wendlingaccessed with normal loads and stores.  The ``thread_local`` keyword is
2039ff9feeb5SBill Wendlingtarget-independent at the LLVM IR level (though LLVM doesn't yet have
2040ff9feeb5SBill Wendlingimplementations of it for some configurations)
2041ff9feeb5SBill Wendling
2042ff9feeb5SBill WendlingSpecial address spaces, in contrast, apply to static types. Every load and store
2043ff9feeb5SBill Wendlinghas a particular address space in its address operand type, and this is what
2044ff9feeb5SBill Wendlingdetermines which address space is accessed.  LLVM ignores these special address
2045ff9feeb5SBill Wendlingspace qualifiers on global variables, and does not provide a way to directly
2046ff9feeb5SBill Wendlingallocate storage in them.  At the LLVM IR level, the behavior of these special
2047ff9feeb5SBill Wendlingaddress spaces depends in part on the underlying OS or runtime environment, and
2048ff9feeb5SBill Wendlingthey are specific to x86 (and LLVM doesn't yet handle them correctly in some
2049ff9feeb5SBill Wendlingcases).
2050ff9feeb5SBill Wendling
2051ff9feeb5SBill WendlingSome operating systems and runtime environments use (or may in the future use)
2052ff9feeb5SBill Wendlingthe FS/GS-segment registers for various low-level purposes, so care should be
2053ff9feeb5SBill Wendlingtaken when considering them.
2054ff9feeb5SBill Wendling
2055ff9feeb5SBill WendlingInstruction naming
2056ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^
2057ff9feeb5SBill Wendling
2058e1567e77SYoungsuk KimAn instruction name consists of the base name, a default operand size, and a
2059ff9feeb5SBill Wendlingcharacter per operand with an optional special size. For example:
2060ff9feeb5SBill Wendling
2061ff9feeb5SBill Wendling::
2062ff9feeb5SBill Wendling
2063ff9feeb5SBill Wendling  ADD8rr      -> add, 8-bit register, 8-bit register
2064ff9feeb5SBill Wendling  IMUL16rmi   -> imul, 16-bit register, 16-bit memory, 16-bit immediate
2065ff9feeb5SBill Wendling  IMUL16rmi8  -> imul, 16-bit register, 16-bit memory, 8-bit immediate
2066ff9feeb5SBill Wendling  MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
2067ff9feeb5SBill Wendling
2068ff9feeb5SBill WendlingThe PowerPC backend
2069ff9feeb5SBill Wendling-------------------
2070ff9feeb5SBill Wendling
2071ff9feeb5SBill WendlingThe PowerPC code generator lives in the lib/Target/PowerPC directory.  The code
2072ff9feeb5SBill Wendlinggeneration is retargetable to several variations or *subtargets* of the PowerPC
2073ff9feeb5SBill WendlingISA; including ppc32, ppc64 and altivec.
2074ff9feeb5SBill Wendling
2075ff9feeb5SBill WendlingLLVM PowerPC ABI
2076ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^
2077ff9feeb5SBill Wendling
2078ff9feeb5SBill WendlingLLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC relative
2079ff9feeb5SBill Wendling(PIC) or static addressing for accessing global values, so no TOC (r2) is
2080ff9feeb5SBill Wendlingused. Second, r31 is used as a frame pointer to allow dynamic growth of a stack
2081ff9feeb5SBill Wendlingframe.  LLVM takes advantage of having no TOC to provide space to save the frame
2082ff9feeb5SBill Wendlingpointer in the PowerPC linkage area of the caller frame.  Other details of
2083ff9feeb5SBill WendlingPowerPC ABI can be found at `PowerPC ABI
2084ff9feeb5SBill Wendling<http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html>`_\
2085ff9feeb5SBill Wendling. Note: This link describes the 32 bit ABI.  The 64 bit ABI is similar except
2086ff9feeb5SBill Wendlingspace for GPRs are 8 bytes wide (not 4) and r13 is reserved for system use.
2087ff9feeb5SBill Wendling
2088ff9feeb5SBill WendlingFrame Layout
2089ff9feeb5SBill Wendling^^^^^^^^^^^^
2090ff9feeb5SBill Wendling
2091ff9feeb5SBill WendlingThe size of a PowerPC frame is usually fixed for the duration of a function's
2092ff9feeb5SBill Wendlinginvocation.  Since the frame is fixed size, all references into the frame can be
2093ff9feeb5SBill Wendlingaccessed via fixed offsets from the stack pointer.  The exception to this is
2094ff9feeb5SBill Wendlingwhen dynamic alloca or variable sized arrays are present, then a base pointer
2095ff9feeb5SBill Wendling(r31) is used as a proxy for the stack pointer and stack pointer is free to grow
2096ff9feeb5SBill Wendlingor shrink.  A base pointer is also used if llvm-gcc is not passed the
2097ff9feeb5SBill Wendling-fomit-frame-pointer flag. The stack pointer is always aligned to 16 bytes, so
2098ff9feeb5SBill Wendlingthat space allocated for altivec vectors will be properly aligned.
2099ff9feeb5SBill Wendling
2100ff9feeb5SBill WendlingAn invocation frame is laid out as follows (low memory at top):
2101ff9feeb5SBill Wendling
2102ff9feeb5SBill Wendling:raw-html:`<table border="1" cellspacing="0">`
2103ff9feeb5SBill Wendling:raw-html:`<tr>`
2104ff9feeb5SBill Wendling:raw-html:`<td>Linkage<br><br></td>`
2105ff9feeb5SBill Wendling:raw-html:`</tr>`
2106ff9feeb5SBill Wendling:raw-html:`<tr>`
2107ff9feeb5SBill Wendling:raw-html:`<td>Parameter area<br><br></td>`
2108ff9feeb5SBill Wendling:raw-html:`</tr>`
2109ff9feeb5SBill Wendling:raw-html:`<tr>`
2110ff9feeb5SBill Wendling:raw-html:`<td>Dynamic area<br><br></td>`
2111ff9feeb5SBill Wendling:raw-html:`</tr>`
2112ff9feeb5SBill Wendling:raw-html:`<tr>`
2113ff9feeb5SBill Wendling:raw-html:`<td>Locals area<br><br></td>`
2114ff9feeb5SBill Wendling:raw-html:`</tr>`
2115ff9feeb5SBill Wendling:raw-html:`<tr>`
2116ff9feeb5SBill Wendling:raw-html:`<td>Saved registers area<br><br></td>`
2117ff9feeb5SBill Wendling:raw-html:`</tr>`
2118ff9feeb5SBill Wendling:raw-html:`<tr style="border-style: none hidden none hidden;">`
2119ff9feeb5SBill Wendling:raw-html:`<td><br></td>`
2120ff9feeb5SBill Wendling:raw-html:`</tr>`
2121ff9feeb5SBill Wendling:raw-html:`<tr>`
2122ff9feeb5SBill Wendling:raw-html:`<td>Previous Frame<br><br></td>`
2123ff9feeb5SBill Wendling:raw-html:`</tr>`
2124ff9feeb5SBill Wendling:raw-html:`</table>`
2125ff9feeb5SBill Wendling
2126ff9feeb5SBill WendlingThe *linkage* area is used by a callee to save special registers prior to
2127ff9feeb5SBill Wendlingallocating its own frame.  Only three entries are relevant to LLVM. The first
2128ff9feeb5SBill Wendlingentry is the previous stack pointer (sp), aka link.  This allows probing tools
2129ff9feeb5SBill Wendlinglike gdb or exception handlers to quickly scan the frames in the stack.  A
2130ff9feeb5SBill Wendlingfunction epilog can also use the link to pop the frame from the stack.  The
2131ff9feeb5SBill Wendlingthird entry in the linkage area is used to save the return address from the lr
2132ff9feeb5SBill Wendlingregister. Finally, as mentioned above, the last entry is used to save the
2133ff9feeb5SBill Wendlingprevious frame pointer (r31.)  The entries in the linkage area are the size of a
2134ff9feeb5SBill WendlingGPR, thus the linkage area is 24 bytes long in 32 bit mode and 48 bytes in 64
2135ff9feeb5SBill Wendlingbit mode.
2136ff9feeb5SBill Wendling
2137ff9feeb5SBill Wendling32 bit linkage area:
2138ff9feeb5SBill Wendling
2139ff9feeb5SBill Wendling:raw-html:`<table  border="1" cellspacing="0">`
2140ff9feeb5SBill Wendling:raw-html:`<tr>`
2141ff9feeb5SBill Wendling:raw-html:`<td>0</td>`
2142ff9feeb5SBill Wendling:raw-html:`<td>Saved SP (r1)</td>`
2143ff9feeb5SBill Wendling:raw-html:`</tr>`
2144ff9feeb5SBill Wendling:raw-html:`<tr>`
2145ff9feeb5SBill Wendling:raw-html:`<td>4</td>`
2146ff9feeb5SBill Wendling:raw-html:`<td>Saved CR</td>`
2147ff9feeb5SBill Wendling:raw-html:`</tr>`
2148ff9feeb5SBill Wendling:raw-html:`<tr>`
2149ff9feeb5SBill Wendling:raw-html:`<td>8</td>`
2150ff9feeb5SBill Wendling:raw-html:`<td>Saved LR</td>`
2151ff9feeb5SBill Wendling:raw-html:`</tr>`
2152ff9feeb5SBill Wendling:raw-html:`<tr>`
2153ff9feeb5SBill Wendling:raw-html:`<td>12</td>`
2154ff9feeb5SBill Wendling:raw-html:`<td>Reserved</td>`
2155ff9feeb5SBill Wendling:raw-html:`</tr>`
2156ff9feeb5SBill Wendling:raw-html:`<tr>`
2157ff9feeb5SBill Wendling:raw-html:`<td>16</td>`
2158ff9feeb5SBill Wendling:raw-html:`<td>Reserved</td>`
2159ff9feeb5SBill Wendling:raw-html:`</tr>`
2160ff9feeb5SBill Wendling:raw-html:`<tr>`
2161ff9feeb5SBill Wendling:raw-html:`<td>20</td>`
2162ff9feeb5SBill Wendling:raw-html:`<td>Saved FP (r31)</td>`
2163ff9feeb5SBill Wendling:raw-html:`</tr>`
2164ff9feeb5SBill Wendling:raw-html:`</table>`
2165ff9feeb5SBill Wendling
2166ff9feeb5SBill Wendling64 bit linkage area:
2167ff9feeb5SBill Wendling
2168ff9feeb5SBill Wendling:raw-html:`<table border="1" cellspacing="0">`
2169ff9feeb5SBill Wendling:raw-html:`<tr>`
2170ff9feeb5SBill Wendling:raw-html:`<td>0</td>`
2171ff9feeb5SBill Wendling:raw-html:`<td>Saved SP (r1)</td>`
2172ff9feeb5SBill Wendling:raw-html:`</tr>`
2173ff9feeb5SBill Wendling:raw-html:`<tr>`
2174ff9feeb5SBill Wendling:raw-html:`<td>8</td>`
2175ff9feeb5SBill Wendling:raw-html:`<td>Saved CR</td>`
2176ff9feeb5SBill Wendling:raw-html:`</tr>`
2177ff9feeb5SBill Wendling:raw-html:`<tr>`
2178ff9feeb5SBill Wendling:raw-html:`<td>16</td>`
2179ff9feeb5SBill Wendling:raw-html:`<td>Saved LR</td>`
2180ff9feeb5SBill Wendling:raw-html:`</tr>`
2181ff9feeb5SBill Wendling:raw-html:`<tr>`
2182ff9feeb5SBill Wendling:raw-html:`<td>24</td>`
2183ff9feeb5SBill Wendling:raw-html:`<td>Reserved</td>`
2184ff9feeb5SBill Wendling:raw-html:`</tr>`
2185ff9feeb5SBill Wendling:raw-html:`<tr>`
2186ff9feeb5SBill Wendling:raw-html:`<td>32</td>`
2187ff9feeb5SBill Wendling:raw-html:`<td>Reserved</td>`
2188ff9feeb5SBill Wendling:raw-html:`</tr>`
2189ff9feeb5SBill Wendling:raw-html:`<tr>`
2190ff9feeb5SBill Wendling:raw-html:`<td>40</td>`
2191ff9feeb5SBill Wendling:raw-html:`<td>Saved FP (r31)</td>`
2192ff9feeb5SBill Wendling:raw-html:`</tr>`
2193ff9feeb5SBill Wendling:raw-html:`</table>`
2194ff9feeb5SBill Wendling
2195ff9feeb5SBill WendlingThe *parameter area* is used to store arguments being passed to a callee
2196ff9feeb5SBill Wendlingfunction.  Following the PowerPC ABI, the first few arguments are actually
2197ff9feeb5SBill Wendlingpassed in registers, with the space in the parameter area unused.  However, if
2198ff9feeb5SBill Wendlingthere are not enough registers or the callee is a thunk or vararg function,
2199ff9feeb5SBill Wendlingthese register arguments can be spilled into the parameter area.  Thus, the
2200ff9feeb5SBill Wendlingparameter area must be large enough to store all the parameters for the largest
2201ff9feeb5SBill Wendlingcall sequence made by the caller.  The size must also be minimally large enough
2202ff9feeb5SBill Wendlingto spill registers r3-r10.  This allows callees blind to the call signature,
2203ff9feeb5SBill Wendlingsuch as thunks and vararg functions, enough space to cache the argument
2204ff9feeb5SBill Wendlingregisters.  Therefore, the parameter area is minimally 32 bytes (64 bytes in 64
2205ff9feeb5SBill Wendlingbit mode.)  Also note that since the parameter area is a fixed offset from the
2206f65d4aa9SKazuaki Ishizakitop of the frame, that a callee can access its split arguments using fixed
2207ff9feeb5SBill Wendlingoffsets from the stack pointer (or base pointer.)
2208ff9feeb5SBill Wendling
2209ff9feeb5SBill WendlingCombining the information about the linkage, parameter areas and alignment. A
2210ff9feeb5SBill Wendlingstack frame is minimally 64 bytes in 32 bit mode and 128 bytes in 64 bit mode.
2211ff9feeb5SBill Wendling
2212ff9feeb5SBill WendlingThe *dynamic area* starts out as size zero.  If a function uses dynamic alloca
2213ff9feeb5SBill Wendlingthen space is added to the stack, the linkage and parameter areas are shifted to
2214ff9feeb5SBill Wendlingtop of stack, and the new space is available immediately below the linkage and
2215ff9feeb5SBill Wendlingparameter areas.  The cost of shifting the linkage and parameter areas is minor
2216ff9feeb5SBill Wendlingsince only the link value needs to be copied.  The link value can be easily
2217ff9feeb5SBill Wendlingfetched by adding the original frame size to the base pointer.  Note that
2218ff9feeb5SBill Wendlingallocations in the dynamic space need to observe 16 byte alignment.
2219ff9feeb5SBill Wendling
2220ff9feeb5SBill WendlingThe *locals area* is where the llvm compiler reserves space for local variables.
2221ff9feeb5SBill Wendling
2222ff9feeb5SBill WendlingThe *saved registers area* is where the llvm compiler spills callee saved
2223ff9feeb5SBill Wendlingregisters on entry to the callee.
2224ff9feeb5SBill Wendling
2225ff9feeb5SBill WendlingProlog/Epilog
2226ff9feeb5SBill Wendling^^^^^^^^^^^^^
2227ff9feeb5SBill Wendling
2228ff9feeb5SBill WendlingThe llvm prolog and epilog are the same as described in the PowerPC ABI, with
2229ff9feeb5SBill Wendlingthe following exceptions.  Callee saved registers are spilled after the frame is
2230ff9feeb5SBill Wendlingcreated.  This allows the llvm epilog/prolog support to be common with other
2231ff9feeb5SBill Wendlingtargets.  The base pointer callee saved register r31 is saved in the TOC slot of
2232ff9feeb5SBill Wendlinglinkage area.  This simplifies allocation of space for the base pointer and
2233843b7515SSylvestre Ledrumakes it convenient to locate programmatically and during debugging.
2234ff9feeb5SBill Wendling
2235ff9feeb5SBill WendlingDynamic Allocation
2236ff9feeb5SBill Wendling^^^^^^^^^^^^^^^^^^
2237ff9feeb5SBill Wendling
2238ff9feeb5SBill Wendling.. note::
2239ff9feeb5SBill Wendling
2240ff9feeb5SBill Wendling  TODO - More to come.
2241ff9feeb5SBill Wendling
2242f73d7a53SJustin HolewinskiThe NVPTX backend
2243f73d7a53SJustin Holewinski-----------------
2244ff9feeb5SBill Wendling
2245f73d7a53SJustin HolewinskiThe NVPTX code generator under lib/Target/NVPTX is an open-source version of
2246f73d7a53SJustin Holewinskithe NVIDIA NVPTX code generator for LLVM.  It is contributed by NVIDIA and is
2247f73d7a53SJustin Holewinskia port of the code generator used in the CUDA compiler (nvcc).  It targets the
2248f73d7a53SJustin HolewinskiPTX 3.0/3.1 ISA and can target any compute capability greater than or equal to
2249f73d7a53SJustin Holewinski2.0 (Fermi).
2250ff9feeb5SBill Wendling
2251f73d7a53SJustin HolewinskiThis target is of production quality and should be completely compatible with
2252f73d7a53SJustin Holewinskithe official NVIDIA toolchain.
2253ff9feeb5SBill Wendling
2254ff9feeb5SBill WendlingCode Generator Options:
2255ff9feeb5SBill Wendling
2256ff9feeb5SBill Wendling:raw-html:`<table border="1" cellspacing="0">`
2257ff9feeb5SBill Wendling:raw-html:`<tr>`
2258ff9feeb5SBill Wendling:raw-html:`<th>Option</th>`
2259ff9feeb5SBill Wendling:raw-html:`<th>Description</th>`
2260ff9feeb5SBill Wendling:raw-html:`</tr>`
2261ff9feeb5SBill Wendling:raw-html:`<tr>`
2262f73d7a53SJustin Holewinski:raw-html:`<td>sm_20</td>`
2263f73d7a53SJustin Holewinski:raw-html:`<td align="left">Set shader model/compute capability to 2.0</td>`
2264ff9feeb5SBill Wendling:raw-html:`</tr>`
2265ff9feeb5SBill Wendling:raw-html:`<tr>`
2266f73d7a53SJustin Holewinski:raw-html:`<td>sm_21</td>`
2267f73d7a53SJustin Holewinski:raw-html:`<td align="left">Set shader model/compute capability to 2.1</td>`
2268ff9feeb5SBill Wendling:raw-html:`</tr>`
2269ff9feeb5SBill Wendling:raw-html:`<tr>`
2270f73d7a53SJustin Holewinski:raw-html:`<td>sm_30</td>`
2271f73d7a53SJustin Holewinski:raw-html:`<td align="left">Set shader model/compute capability to 3.0</td>`
2272f73d7a53SJustin Holewinski:raw-html:`</tr>`
2273f73d7a53SJustin Holewinski:raw-html:`<tr>`
2274f73d7a53SJustin Holewinski:raw-html:`<td>sm_35</td>`
2275f73d7a53SJustin Holewinski:raw-html:`<td align="left">Set shader model/compute capability to 3.5</td>`
2276f73d7a53SJustin Holewinski:raw-html:`</tr>`
2277f73d7a53SJustin Holewinski:raw-html:`<tr>`
2278f73d7a53SJustin Holewinski:raw-html:`<td>ptx30</td>`
2279f73d7a53SJustin Holewinski:raw-html:`<td align="left">Target PTX 3.0</td>`
2280f73d7a53SJustin Holewinski:raw-html:`</tr>`
2281f73d7a53SJustin Holewinski:raw-html:`<tr>`
2282f73d7a53SJustin Holewinski:raw-html:`<td>ptx31</td>`
2283f73d7a53SJustin Holewinski:raw-html:`<td align="left">Target PTX 3.1</td>`
2284ff9feeb5SBill Wendling:raw-html:`</tr>`
2285ff9feeb5SBill Wendling:raw-html:`</table>`
2286ff9feeb5SBill Wendling
2287cb6b408dSAlexei StarovoitovThe extended Berkeley Packet Filter (eBPF) backend
2288cb6b408dSAlexei Starovoitov--------------------------------------------------
2289cb6b408dSAlexei Starovoitov
2290cb6b408dSAlexei StarovoitovExtended BPF (or eBPF) is similar to the original ("classic") BPF (cBPF) used
2291cb6b408dSAlexei Starovoitovto filter network packets.  The
2292cb6b408dSAlexei Starovoitov`bpf() system call <http://man7.org/linux/man-pages/man2/bpf.2.html>`_
2293cb6b408dSAlexei Starovoitovperforms a range of operations related to eBPF.  For both cBPF and eBPF
2294cb6b408dSAlexei Starovoitovprograms, the Linux kernel statically analyzes the programs before loading
2295cb6b408dSAlexei Starovoitovthem, in order to ensure that they cannot harm the running system.  eBPF is
2296cb6b408dSAlexei Starovoitova 64-bit RISC instruction set designed for one to one mapping to 64-bit CPUs.
2297cb6b408dSAlexei StarovoitovOpcodes are 8-bit encoded, and 87 instructions are defined.  There are 10
2298cb6b408dSAlexei Starovoitovregisters, grouped by function as outlined below.
2299cb6b408dSAlexei Starovoitov
2300cb6b408dSAlexei Starovoitov::
2301cb6b408dSAlexei Starovoitov
2302cb6b408dSAlexei Starovoitov  R0        return value from in-kernel functions; exit value for eBPF program
2303cb6b408dSAlexei Starovoitov  R1 - R5   function call arguments to in-kernel functions
2304cb6b408dSAlexei Starovoitov  R6 - R9   callee-saved registers preserved by in-kernel functions
2305cb6b408dSAlexei Starovoitov  R10       stack frame pointer (read only)
2306cb6b408dSAlexei Starovoitov
2307cb6b408dSAlexei StarovoitovInstruction encoding (arithmetic and jump)
2308cb6b408dSAlexei Starovoitov^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2309cb6b408dSAlexei StarovoitoveBPF is reusing most of the opcode encoding from classic to simplify conversion
2310cb6b408dSAlexei Starovoitovof classic BPF to eBPF.  For arithmetic and jump instructions the 8-bit 'code'
2311cb6b408dSAlexei Starovoitovfield is divided into three parts:
2312cb6b408dSAlexei Starovoitov
2313cb6b408dSAlexei Starovoitov::
2314cb6b408dSAlexei Starovoitov
2315cb6b408dSAlexei Starovoitov  +----------------+--------+--------------------+
2316cb6b408dSAlexei Starovoitov  |   4 bits       |  1 bit |   3 bits           |
2317cb6b408dSAlexei Starovoitov  | operation code | source | instruction class  |
2318cb6b408dSAlexei Starovoitov  +----------------+--------+--------------------+
2319cb6b408dSAlexei Starovoitov  (MSB)                                      (LSB)
2320cb6b408dSAlexei Starovoitov
2321cb6b408dSAlexei StarovoitovThree LSB bits store instruction class which is one of:
2322cb6b408dSAlexei Starovoitov
2323cb6b408dSAlexei Starovoitov::
2324cb6b408dSAlexei Starovoitov
2325cb6b408dSAlexei Starovoitov  BPF_LD     0x0
2326cb6b408dSAlexei Starovoitov  BPF_LDX    0x1
2327cb6b408dSAlexei Starovoitov  BPF_ST     0x2
2328cb6b408dSAlexei Starovoitov  BPF_STX    0x3
2329cb6b408dSAlexei Starovoitov  BPF_ALU    0x4
2330cb6b408dSAlexei Starovoitov  BPF_JMP    0x5
2331cb6b408dSAlexei Starovoitov  (unused)   0x6
2332cb6b408dSAlexei Starovoitov  BPF_ALU64  0x7
2333cb6b408dSAlexei Starovoitov
2334cb6b408dSAlexei StarovoitovWhen BPF_CLASS(code) == BPF_ALU or BPF_ALU64 or BPF_JMP,
2335cb6b408dSAlexei Starovoitov4th bit encodes source operand
2336cb6b408dSAlexei Starovoitov
2337cb6b408dSAlexei Starovoitov::
2338cb6b408dSAlexei Starovoitov
233933434d5fSYonghong Song  BPF_X     0x1  use src_reg register as source operand
234033434d5fSYonghong Song  BPF_K     0x0  use 32 bit immediate as source operand
2341cb6b408dSAlexei Starovoitov
2342cb6b408dSAlexei Starovoitovand four MSB bits store operation code
2343cb6b408dSAlexei Starovoitov
2344cb6b408dSAlexei Starovoitov::
2345cb6b408dSAlexei Starovoitov
2346cb6b408dSAlexei Starovoitov  BPF_ADD   0x0  add
2347cb6b408dSAlexei Starovoitov  BPF_SUB   0x1  subtract
2348cb6b408dSAlexei Starovoitov  BPF_MUL   0x2  multiply
2349cb6b408dSAlexei Starovoitov  BPF_DIV   0x3  divide
2350cb6b408dSAlexei Starovoitov  BPF_OR    0x4  bitwise logical OR
2351cb6b408dSAlexei Starovoitov  BPF_AND   0x5  bitwise logical AND
2352cb6b408dSAlexei Starovoitov  BPF_LSH   0x6  left shift
2353cb6b408dSAlexei Starovoitov  BPF_RSH   0x7  right shift (zero extended)
2354cb6b408dSAlexei Starovoitov  BPF_NEG   0x8  arithmetic negation
2355cb6b408dSAlexei Starovoitov  BPF_MOD   0x9  modulo
2356cb6b408dSAlexei Starovoitov  BPF_XOR   0xa  bitwise logical XOR
2357cb6b408dSAlexei Starovoitov  BPF_MOV   0xb  move register to register
2358cb6b408dSAlexei Starovoitov  BPF_ARSH  0xc  right shift (sign extended)
2359cb6b408dSAlexei Starovoitov  BPF_END   0xd  endianness conversion
2360cb6b408dSAlexei Starovoitov
2361cb6b408dSAlexei StarovoitovIf BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of
2362cb6b408dSAlexei Starovoitov
2363cb6b408dSAlexei Starovoitov::
2364cb6b408dSAlexei Starovoitov
2365cb6b408dSAlexei Starovoitov  BPF_JA    0x0  unconditional jump
2366cb6b408dSAlexei Starovoitov  BPF_JEQ   0x1  jump ==
2367cb6b408dSAlexei Starovoitov  BPF_JGT   0x2  jump >
2368cb6b408dSAlexei Starovoitov  BPF_JGE   0x3  jump >=
2369cb6b408dSAlexei Starovoitov  BPF_JSET  0x4  jump if (DST & SRC)
2370cb6b408dSAlexei Starovoitov  BPF_JNE   0x5  jump !=
2371cb6b408dSAlexei Starovoitov  BPF_JSGT  0x6  jump signed >
2372cb6b408dSAlexei Starovoitov  BPF_JSGE  0x7  jump signed >=
2373cb6b408dSAlexei Starovoitov  BPF_CALL  0x8  function call
2374cb6b408dSAlexei Starovoitov  BPF_EXIT  0x9  function return
2375cb6b408dSAlexei Starovoitov
2376cb6b408dSAlexei StarovoitovInstruction encoding (load, store)
2377cb6b408dSAlexei Starovoitov^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2378cb6b408dSAlexei StarovoitovFor load and store instructions the 8-bit 'code' field is divided as:
2379cb6b408dSAlexei Starovoitov
2380cb6b408dSAlexei Starovoitov::
2381cb6b408dSAlexei Starovoitov
2382cb6b408dSAlexei Starovoitov  +--------+--------+-------------------+
2383cb6b408dSAlexei Starovoitov  | 3 bits | 2 bits |   3 bits          |
2384cb6b408dSAlexei Starovoitov  |  mode  |  size  | instruction class |
2385cb6b408dSAlexei Starovoitov  +--------+--------+-------------------+
2386cb6b408dSAlexei Starovoitov  (MSB)                             (LSB)
2387cb6b408dSAlexei Starovoitov
2388cb6b408dSAlexei StarovoitovSize modifier is one of
2389cb6b408dSAlexei Starovoitov
2390cb6b408dSAlexei Starovoitov::
2391cb6b408dSAlexei Starovoitov
2392cb6b408dSAlexei Starovoitov  BPF_W       0x0  word
2393cb6b408dSAlexei Starovoitov  BPF_H       0x1  half word
2394cb6b408dSAlexei Starovoitov  BPF_B       0x2  byte
2395cb6b408dSAlexei Starovoitov  BPF_DW      0x3  double word
2396cb6b408dSAlexei Starovoitov
2397cb6b408dSAlexei StarovoitovMode modifier is one of
2398cb6b408dSAlexei Starovoitov
2399cb6b408dSAlexei Starovoitov::
2400cb6b408dSAlexei Starovoitov
2401cb6b408dSAlexei Starovoitov  BPF_IMM     0x0  immediate
2402cb6b408dSAlexei Starovoitov  BPF_ABS     0x1  used to access packet data
2403cb6b408dSAlexei Starovoitov  BPF_IND     0x2  used to access packet data
2404cb6b408dSAlexei Starovoitov  BPF_MEM     0x3  memory
2405cb6b408dSAlexei Starovoitov  (reserved)  0x4
2406cb6b408dSAlexei Starovoitov  (reserved)  0x5
2407cb6b408dSAlexei Starovoitov  BPF_XADD    0x6  exclusive add
2408cb6b408dSAlexei Starovoitov
2409cb6b408dSAlexei Starovoitov
2410cb6b408dSAlexei StarovoitovPacket data access (BPF_ABS, BPF_IND)
2411cb6b408dSAlexei Starovoitov^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2412cb6b408dSAlexei Starovoitov
2413cb6b408dSAlexei StarovoitovTwo non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
2414cb6b408dSAlexei Starovoitov(BPF_IND | <size> | BPF_LD) which are used to access packet data.
2415cb6b408dSAlexei StarovoitovRegister R6 is an implicit input that must contain pointer to sk_buff.
2416cb6b408dSAlexei StarovoitovRegister R0 is an implicit output which contains the data fetched
2417cb6b408dSAlexei Starovoitovfrom the packet.  Registers R1-R5 are scratch registers and must not
2418cb6b408dSAlexei Starovoitovbe used to store the data across BPF_ABS | BPF_LD or BPF_IND | BPF_LD
2419cb6b408dSAlexei Starovoitovinstructions.  These instructions have implicit program exit condition
2420cb6b408dSAlexei Starovoitovas well.  When eBPF program is trying to access the data beyond
2421cb6b408dSAlexei Starovoitovthe packet boundary, the interpreter will abort the execution of the program.
2422cb6b408dSAlexei Starovoitov
2423cb6b408dSAlexei StarovoitovBPF_IND | BPF_W | BPF_LD is equivalent to:
2424cb6b408dSAlexei Starovoitov  R0 = ntohl(\*(u32 \*) (((struct sk_buff \*) R6)->data + src_reg + imm32))
2425cb6b408dSAlexei Starovoitov
2426cb6b408dSAlexei StarovoitoveBPF maps
2427cb6b408dSAlexei Starovoitov^^^^^^^^^
2428cb6b408dSAlexei Starovoitov
2429cb6b408dSAlexei StarovoitoveBPF maps are provided for sharing data between kernel and user-space.
2430cb6b408dSAlexei StarovoitovCurrently implemented types are hash and array, with potential extension to
2431cb6b408dSAlexei Starovoitovsupport bloom filters, radix trees, etc.  A map is defined by its type,
2432cb6b408dSAlexei Starovoitovmaximum number of elements, key size and value size in bytes.  eBPF syscall
2433cb6b408dSAlexei Starovoitovsupports create, update, find and delete functions on maps.
2434cb6b408dSAlexei Starovoitov
2435cb6b408dSAlexei StarovoitovFunction calls
2436cb6b408dSAlexei Starovoitov^^^^^^^^^^^^^^
2437cb6b408dSAlexei Starovoitov
2438cb6b408dSAlexei StarovoitovFunction call arguments are passed using up to five registers (R1 - R5).
2439cb6b408dSAlexei StarovoitovThe return value is passed in a dedicated register (R0).  Four additional
2440cb6b408dSAlexei Starovoitovregisters (R6 - R9) are callee-saved, and the values in these registers
2441cb6b408dSAlexei Starovoitovare preserved within kernel functions.  R0 - R5 are scratch registers within
2442cb6b408dSAlexei Starovoitovkernel functions, and eBPF programs must therefor store/restore values in
2443cb6b408dSAlexei Starovoitovthese registers if needed across function calls.  The stack can be accessed
2444cb6b408dSAlexei Starovoitovusing the read-only frame pointer R10.  eBPF registers map 1:1 to hardware
2445cb6b408dSAlexei Starovoitovregisters on x86_64 and other 64-bit architectures.  For example, x86_64
2446cb6b408dSAlexei Starovoitovin-kernel JIT maps them as
2447cb6b408dSAlexei Starovoitov
2448cb6b408dSAlexei Starovoitov::
2449cb6b408dSAlexei Starovoitov
2450cb6b408dSAlexei Starovoitov  R0 - rax
2451cb6b408dSAlexei Starovoitov  R1 - rdi
2452cb6b408dSAlexei Starovoitov  R2 - rsi
2453cb6b408dSAlexei Starovoitov  R3 - rdx
2454cb6b408dSAlexei Starovoitov  R4 - rcx
2455cb6b408dSAlexei Starovoitov  R5 - r8
2456cb6b408dSAlexei Starovoitov  R6 - rbx
2457cb6b408dSAlexei Starovoitov  R7 - r13
2458cb6b408dSAlexei Starovoitov  R8 - r14
2459cb6b408dSAlexei Starovoitov  R9 - r15
2460cb6b408dSAlexei Starovoitov  R10 - rbp
2461cb6b408dSAlexei Starovoitov
2462cb6b408dSAlexei Starovoitovsince x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing
2463cb6b408dSAlexei Starovoitovand rbx, r12 - r15 are callee saved.
2464cb6b408dSAlexei Starovoitov
2465cb6b408dSAlexei StarovoitovProgram start
2466cb6b408dSAlexei Starovoitov^^^^^^^^^^^^^
2467cb6b408dSAlexei Starovoitov
2468cb6b408dSAlexei StarovoitovAn eBPF program receives a single argument and contains
2469cb6b408dSAlexei Starovoitova single eBPF main routine; the program does not contain eBPF functions.
2470cb6b408dSAlexei StarovoitovFunction calls are limited to a predefined set of kernel functions.  The size
2471cb6b408dSAlexei Starovoitovof a program is limited to 4K instructions:  this ensures fast termination and
2472cb6b408dSAlexei Starovoitova limited number of kernel function calls.  Prior to running an eBPF program,
2473cb6b408dSAlexei Starovoitova verifier performs static analysis to prevent loops in the code and
2474cb6b408dSAlexei Starovoitovto ensure valid register usage and operand types.
24759fdfa518STom Stellard
24769fdfa518STom StellardThe AMDGPU backend
24779fdfa518STom Stellard------------------
24789fdfa518STom Stellard
2479f16a45eaSTony TyeThe AMDGPU code generator lives in the ``lib/Target/AMDGPU``
2480f16a45eaSTony Tyedirectory. This code generator is capable of targeting a variety of
2481f16a45eaSTony TyeAMD GPU processors. Refer to :doc:`AMDGPUUsage` for more information.
2482