==============================================
Kaleidoscope: Adding JIT and Optimizer Support
==============================================

.. contents::
   :local:

Chapter 4 Introduction
======================

Welcome to Chapter 4 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
of a simple language and added support for generating LLVM IR. This
chapter describes two new techniques: adding optimizer support to your
language, and adding JIT compiler support. These additions will
demonstrate how to get nice, efficient code for the Kaleidoscope
language.

Trivial Constant Folding
========================

Our demonstration for Chapter 3 is elegant and easy to extend.
Unfortunately, it does not produce wonderful code. The IRBuilder,
however, does give us obvious optimizations when compiling simple code:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            ret double %addtmp
    }

This code is not a literal transcription of the AST built by parsing the
input. That would be:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 2.000000e+00, 1.000000e+00
            %addtmp1 = fadd double %addtmp, %x
            ret double %addtmp1
    }

Constant folding, as seen above, is a very common and very important
optimization: so much so that many language implementors implement
constant folding support in their AST representation.

With LLVM, you don't need this support in the AST. Since all calls to
build LLVM IR go through the LLVM IR builder, the builder itself checks
whether there is a constant folding opportunity when you call it. If so,
it just does the constant fold and returns the constant instead of
creating an instruction.
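
For example, the folding happens inside calls like ``CreateFAdd()``. The
following standalone sketch (not part of the tutorial's code; the function
name ``DemoConstantFolding`` is just illustrative) shows the behaviour: when
both operands are constants, the builder hands back a ``ConstantFP`` instead
of emitting an ``fadd`` instruction.

.. code-block:: c++

    #include "llvm/ADT/APFloat.h"
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/Support/Casting.h"
    #include <cassert>

    // Minimal demonstration of the IRBuilder's implicit constant folding.
    void DemoConstantFolding() {
      llvm::LLVMContext Ctx;
      llvm::IRBuilder<> Builder(Ctx);

      llvm::Value *L = llvm::ConstantFP::get(Ctx, llvm::APFloat(1.0));
      llvm::Value *R = llvm::ConstantFP::get(Ctx, llvm::APFloat(2.0));

      // Both operands are constants, so no fadd instruction is created;
      // the result is simply the constant 3.0.
      llvm::Value *Sum = Builder.CreateFAdd(L, R, "addtmp");
      assert(llvm::isa<llvm::ConstantFP>(Sum) && "expected a folded constant");
    }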

Well, that was easy :). In practice, we recommend always using
``IRBuilder`` when generating code like this. It has no "syntactic
overhead" for its use (you don't have to uglify your compiler with
constant checks everywhere) and it can dramatically reduce the amount of
LLVM IR that is generated in some cases (particularly for languages with a
macro preprocessor or that use a lot of constants).

On the other hand, the ``IRBuilder`` is limited by the fact that it does
all of its analysis inline with the code as it is built. If you take a
slightly more complex example:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            %addtmp1 = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp1
            ret double %multmp
    }

In this case, the LHS and RHS of the multiplication are the same value.
We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
instead of computing "``x+3``" twice.

Unfortunately, no amount of local analysis will be able to detect and
correct this. This requires two transformations: reassociation of
expressions (to make the adds lexically identical) and Common
Subexpression Elimination (CSE) to delete the redundant add instruction.
Fortunately, LLVM provides a broad range of optimizations that you can
use, in the form of "passes".

LLVM Optimization Passes
========================

.. warning::

   Due to the transition to the new PassManager infrastructure, this tutorial
   is based on ``llvm::legacy::FunctionPassManager``, which can be found in
   `LegacyPassManager.h <https://llvm.org/doxygen/classllvm_1_1legacy_1_1FunctionPassManager.html>`_.
   For the purpose of this tutorial, the above should be used until
   the pass manager transition is complete.

LLVM provides many optimization passes, which do many different sorts of
things and have different tradeoffs. Unlike other systems, LLVM doesn't
hold to the mistaken notion that one set of optimizations is right for
all languages and for all situations. LLVM allows a compiler implementor
to make complete decisions about what optimizations to use, in which
order, and in what situation.

As a concrete example, LLVM supports both "whole module" passes, which
look across as large a body of code as they can (often a whole file,
but if run at link time, this can be a substantial portion of the whole
program), and "per-function" passes, which just operate on a single
function at a time without looking at other functions. For more
information on passes and how they are run, see the
`How to Write a Pass <../../WritingAnLLVMPass.html>`_ document and the
`List of LLVM Passes <../../Passes.html>`_.

For Kaleidoscope, we are currently generating functions on the fly, one
at a time, as the user types them in. We aren't shooting for the
ultimate optimization experience in this setting, but we also want to
catch the easy and quick stuff where possible. As such, we will choose
to run a few per-function optimizations as the user types the function
in. If we wanted to make a "static Kaleidoscope compiler", we would use
exactly the code we have now, except that we would defer running the
optimizer until the entire file has been parsed.

In order to get per-function optimizations going, we need to set up a
`FunctionPassManager <../../WritingAnLLVMPass.html#what-passmanager-does>`_ to hold
and organize the LLVM optimizations that we want to run. Once we have
that, we can add a set of optimizations to run. We'll need a new
FunctionPassManager for each module that we want to optimize, so we'll
write a function to create and initialize both the module and pass manager
for us:

.. code-block:: c++

    void InitializeModuleAndPassManager(void) {
      // Open a new module.
      TheModule = std::make_unique<Module>("my cool jit", TheContext);

      // Create a new pass manager attached to it.
      TheFPM = std::make_unique<legacy::FunctionPassManager>(TheModule.get());

      // Do simple "peephole" optimizations and bit-twiddling optzns.
      TheFPM->add(createInstructionCombiningPass());
      // Reassociate expressions.
      TheFPM->add(createReassociatePass());
      // Eliminate Common SubExpressions.
      TheFPM->add(createGVNPass());
      // Simplify the control flow graph (deleting unreachable blocks, etc).
      TheFPM->add(createCFGSimplificationPass());

      TheFPM->doInitialization();
    }

This code initializes the global module ``TheModule``, and the function pass
manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is
set up, we use a series of "add" calls to add a bunch of LLVM passes.

In this case, we choose to add four optimization passes.
The passes we choose here are a pretty standard set
of "cleanup" optimizations that are useful for a wide variety of code. I won't
delve into what they do but, believe me, they are a good starting place :).

Once the PassManager is set up, we need to make use of it. We do this by
running it after our newly created function is constructed (in
``FunctionAST::codegen()``), but before it is returned to the client:

.. code-block:: c++

      if (Value *RetVal = Body->codegen()) {
        // Finish off the function.
        Builder.CreateRet(RetVal);

        // Validate the generated code, checking for consistency.
        verifyFunction(*TheFunction);

        // Optimize the function.
        TheFPM->run(*TheFunction);

        return TheFunction;
      }

As you can see, this is pretty straightforward. The
``FunctionPassManager`` optimizes and updates the LLVM Function\* in
place, improving (hopefully) its body. With this in place, we can try
our test above again:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp
            ret double %multmp
    }

As expected, we now get our nicely optimized code, saving a floating
point add instruction from every execution of this function.

LLVM provides a wide variety of optimizations that can be used in
certain circumstances. Some `documentation about the various
passes <../../Passes.html>`_ is available, but it isn't very complete.
Another good source of ideas can come from looking at the passes that
``Clang`` runs to get started. The "``opt``" tool allows you to
experiment with passes from the command line, so you can see if they do
anything.

Now that we have reasonable code coming out of our front-end, let's talk
about executing it!

Adding a JIT Compiler
=====================

Code that is available in LLVM IR can have a wide variety of tools
applied to it. For example, you can run optimizations on it (as we did
above), you can dump it out in textual or binary forms, you can compile
the code to an assembly file (.s) for some target, or you can JIT
compile it. The nice thing about the LLVM IR representation is that it
is the "common currency" between many different parts of the compiler.

In this section, we'll add JIT compiler support to our interpreter. The
basic idea that we want for Kaleidoscope is to have the user enter
function bodies as they do now, but immediately evaluate the top-level
expressions they type in. For example, if they type in "1 + 2;", we
should evaluate and print out 3. If they define a function, they should
be able to call it from the command line.

In order to do this, we first prepare the environment to create code for
the current native target, and declare and initialize the JIT. This is
done by calling some ``InitializeNativeTarget*`` functions, adding a
global variable ``TheJIT``, and initializing it in ``main``:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
    ...
    int main() {
      InitializeNativeTarget();
      InitializeNativeTargetAsmPrinter();
      InitializeNativeTargetAsmParser();

      // Install standard binary operators.
      // 1 is lowest precedence.
      BinopPrecedence['<'] = 10;
      BinopPrecedence['+'] = 20;
      BinopPrecedence['-'] = 20;
      BinopPrecedence['*'] = 40; // highest.

      // Prime the first token.
      fprintf(stderr, "ready> ");
      getNextToken();

      TheJIT = std::make_unique<KaleidoscopeJIT>();

      // Run the main "interpreter loop" now.
      MainLoop();

      return 0;
    }

We also need to set up the data layout for the JIT:

.. code-block:: c++

    void InitializeModuleAndPassManager(void) {
      // Open a new module.
      TheModule = std::make_unique<Module>("my cool jit", TheContext);
      TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());

      // Create a new pass manager attached to it.
      TheFPM = std::make_unique<legacy::FunctionPassManager>(TheModule.get());
      ...

The KaleidoscopeJIT class is a simple JIT built specifically for these
tutorials, available inside the LLVM source code
at llvm-src/examples/Kaleidoscope/include/KaleidoscopeJIT.h.
In later chapters we will look at how it works and extend it with
new features, but for now we will take it as given. Its API is very simple:
``addModule`` adds an LLVM IR module to the JIT, making its functions
available for execution; ``removeModule`` removes a module, freeing any
memory associated with the code in that module; and ``findSymbol`` allows us
to look up pointers to the compiled code.

We can take this simple API and change our code that parses top-level expressions to
look like this:

.. code-block:: c++

    static void HandleTopLevelExpression() {
      // Evaluate a top-level expression into an anonymous function.
      if (auto FnAST = ParseTopLevelExpr()) {
        if (FnAST->codegen()) {

          // JIT the module containing the anonymous expression, keeping a handle so
          // we can free it later.
          auto H = TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();

          // Search the JIT for the __anon_expr symbol.
          auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
          assert(ExprSymbol && "Function not found");

          // Get the symbol's address and cast it to the right type (takes no
          // arguments, returns a double) so we can call it as a native function.
          double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
          fprintf(stderr, "Evaluated to %f\n", FP());

          // Delete the anonymous expression module from the JIT.
          TheJIT->removeModule(H);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

If parsing and codegen succeed, the next step is to add the module containing
the top-level expression to the JIT. We do this by calling addModule, which
triggers code generation for all the functions in the module, and returns a
handle that can be used to remove the module from the JIT later. Once the module
has been added to the JIT it can no longer be modified, so we also open a new
module to hold subsequent code by calling ``InitializeModuleAndPassManager()``.

Once we've added the module to the JIT we need to get a pointer to the final
generated code. We do this by calling the JIT's findSymbol method, and passing
the name of the top-level expression function: ``__anon_expr``. Since we just
added this function, we assert that findSymbol returned a result.

Next, we get the in-memory address of the ``__anon_expr`` function by calling
``getAddress()`` on the symbol. Recall that we compile top-level expressions
into a self-contained LLVM function that takes no arguments and returns the
computed double. Because the LLVM JIT compiler matches the native platform ABI,
this means that you can just cast the result pointer to a function pointer of
that type and call it directly. This means that there is no difference between
JIT-compiled code and native machine code that is statically linked into your
application.

Finally, since we don't support re-evaluation of top-level expressions, we
remove the module from the JIT when we're done to free the associated memory.
Recall, however, that the module we created a few lines earlier (via
``InitializeModuleAndPassManager``) is still open and waiting for new code to be
added.

With just these two changes, let's see how Kaleidoscope works now!

::

    ready> 4+5;
    Read top-level expression:
    define double @0() {
    entry:
      ret double 9.000000e+00
    }

    Evaluated to 9.000000

Well this looks like it is basically working. The dump of the function
shows the "no argument function that always returns double" that we
synthesize for each top-level expression that is typed in. This
demonstrates very basic functionality, but can we do more?

::

    ready> def testfunc(x y) x + y*2;
    Read function definition:
    define double @testfunc(double %x, double %y) {
    entry:
      %multmp = fmul double %y, 2.000000e+00
      %addtmp = fadd double %multmp, %x
      ret double %addtmp
    }

    ready> testfunc(4, 10);
    Read top-level expression:
    define double @1() {
    entry:
      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
      ret double %calltmp
    }

    Evaluated to 24.000000

    ready> testfunc(5, 10);
    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!


Function definitions and calls also work, but something went very wrong on that
last line. The call looks valid, so what happened? As you may have guessed from
the API, a Module is a unit of allocation for the JIT, and testfunc was part
of the same module that contained the anonymous expression. When we removed that
module from the JIT to free the memory for the anonymous expression, we deleted
the definition of ``testfunc`` along with it. Then, when we tried to call
testfunc a second time, the JIT could no longer find it.

The easiest way to fix this is to put the anonymous expression in a separate
module from the rest of the function definitions. The JIT will happily resolve
function calls across module boundaries, as long as each of the functions called
has a prototype, and is added to the JIT before it is called. By putting the
anonymous expression in a different module we can delete it without affecting
the rest of the functions.

In fact, we're going to go a step further and put every function in its own
module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT
that will make our environment more REPL-like: Functions can be added to the
JIT more than once (unlike a module where every function must have a unique
definition). When you look up a symbol in KaleidoscopeJIT it will always return
the most recent definition:

::

    ready> def foo(x) x + 1;
    Read function definition:
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 1.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 2.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 4.000000


To allow each function to live in its own module we'll need a way to
re-generate previous function declarations into each new module we open:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;

    ...

    Function *getFunction(std::string Name) {
      // First, see if the function has already been added to the current module.
      if (auto *F = TheModule->getFunction(Name))
        return F;

      // If not, check whether we can codegen the declaration from some existing
      // prototype.
      auto FI = FunctionProtos.find(Name);
      if (FI != FunctionProtos.end())
        return FI->second->codegen();

      // If no existing prototype exists, return null.
      return nullptr;
    }

    ...

    Value *CallExprAST::codegen() {
      // Look up the name in the global module table.
      Function *CalleeF = getFunction(Callee);

    ...

    Function *FunctionAST::codegen() {
      // Transfer ownership of the prototype to the FunctionProtos map, but keep a
      // reference to it for use below.
      auto &P = *Proto;
      FunctionProtos[Proto->getName()] = std::move(Proto);
      Function *TheFunction = getFunction(P.getName());
      if (!TheFunction)
        return nullptr;


To enable this, we'll start by adding a new global, ``FunctionProtos``, that
holds the most recent prototype for each function. We'll also add a convenience
method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``.
Our convenience method searches ``TheModule`` for an existing function
declaration, falling back to generating a new declaration from FunctionProtos if
it doesn't find one. In ``CallExprAST::codegen()`` we just need to replace the
call to ``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to
update the FunctionProtos map first, then call ``getFunction()``. With this
done, we can always obtain a function declaration in the current module for any
previously declared function.

We also need to update HandleDefinition and HandleExtern:

.. code-block:: c++

    static void HandleDefinition() {
      if (auto FnAST = ParseDefinition()) {
        if (auto *FnIR = FnAST->codegen()) {
          fprintf(stderr, "Read function definition:");
          FnIR->print(errs());
          fprintf(stderr, "\n");
          TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

    static void HandleExtern() {
      if (auto ProtoAST = ParseExtern()) {
        if (auto *FnIR = ProtoAST->codegen()) {
          fprintf(stderr, "Read extern: ");
          FnIR->print(errs());
          fprintf(stderr, "\n");
          FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

In HandleDefinition, we add two lines to transfer the newly defined function to
the JIT and open a new module. In HandleExtern, we just need to add one line to
add the prototype to FunctionProtos.

With these changes made, let's try our REPL again (I removed the dump of the
anonymous functions this time, you should get the idea by now :) :

::

    ready> def foo(x) x + 1;
    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    ready> foo(2);
    Evaluated to 4.000000

It works!

Even with this simple code, we get some surprisingly powerful capabilities -
check this out:

::

    ready> extern sin(x);
    Read extern:
    declare double @sin(double)

    ready> extern cos(x);
    Read extern:
    declare double @cos(double)

    ready> sin(1.0);
    Read top-level expression:
    define double @2() {
    entry:
      ret double 0x3FEAED548F090CEE
    }

    Evaluated to 0.841471

    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
    Read function definition:
    define double @foo(double %x) {
    entry:
      %calltmp = call double @sin(double %x)
      %multmp = fmul double %calltmp, %calltmp
      %calltmp2 = call double @cos(double %x)
      %multmp4 = fmul double %calltmp2, %calltmp2
      %addtmp = fadd double %multmp, %multmp4
      ret double %addtmp
    }

    ready> foo(4.0);
    Read top-level expression:
    define double @3() {
    entry:
      %calltmp = call double @foo(double 4.000000e+00)
      ret double %calltmp
    }

    Evaluated to 1.000000

Whoa, how does the JIT know about sin and cos? The answer is surprisingly
simple: The KaleidoscopeJIT has a straightforward symbol resolution rule that
it uses to find symbols that aren't available in any given module: First
it searches all the modules that have already been added to the JIT, from the
most recent to the oldest, to find the newest definition. If no definition is
found inside the JIT, it falls back to calling "``dlsym("sin")``" on the
Kaleidoscope process itself. Since "``sin``" is defined within the JIT's
address space, it simply patches up calls in the module to call the libm
version of ``sin`` directly. But in some cases this even goes further:
as sin and cos are names of standard math functions, the constant folder
will directly evaluate the function calls to the correct result when called
with constants like in the "``sin(1.0)``" above.

In the future we'll see how tweaking this symbol resolution rule can be used to
enable all sorts of useful features, from security (restricting the set of
symbols available to JIT'd code), to dynamic code generation based on symbol
names, and even lazy compilation.

One immediate benefit of the symbol resolution rule is that we can now extend
the language by writing arbitrary C++ code to implement operations. For example,
if we add:

.. code-block:: c++

    #ifdef _WIN32
    #define DLLEXPORT __declspec(dllexport)
    #else
    #define DLLEXPORT
    #endif

    /// putchard - putchar that takes a double and returns 0.
    extern "C" DLLEXPORT double putchard(double X) {
      fputc((char)X, stderr);
      return 0;
    }

Note that for Windows we need to actually export the functions because
the dynamic symbol loader will use GetProcAddress to find the symbols.

Now we can produce simple output to the console by using things like:
"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
on the console (120 is the ASCII code for 'x'). Similar code could be
used to implement file I/O, console input, and many other capabilities
in Kaleidoscope.
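
For instance, a companion helper in the same style (a sketch, not part of this
chapter's code; the name ``printd`` is just an illustrative choice here) could
print a numeric value:

.. code-block:: c++

    /// printd - prints a double followed by a newline, returning 0.
    extern "C" DLLEXPORT double printd(double X) {
      fprintf(stderr, "%f\n", X);
      return 0;
    }

With that compiled into the interpreter, declaring "``extern printd(x);``" and
then evaluating "``printd(3.5);``" should print ``3.500000`` to the console.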

This completes the JIT and optimizer chapter of the Kaleidoscope
tutorial. At this point, we can compile a non-Turing-complete
programming language, optimize and JIT compile it in a user-driven way.
Next up we'll look into `extending the language with control flow
constructs <LangImpl05.html>`_, tackling some interesting LLVM IR issues
along the way.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM JIT and optimizer. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
    # Run
    ./toy

If you are compiling this on Linux, make sure to add the "-rdynamic"
option as well. This makes sure that the external functions are resolved
properly at runtime.

Here is the code:

.. literalinclude:: ../../../examples/Kaleidoscope/Chapter4/toy.cpp
   :language: c++

`Next: Extending the language: control flow <LangImpl05.html>`_
