========================================
Kaleidoscope: Code generation to LLVM IR
========================================

.. contents::
   :local:

Chapter 3 Introduction
======================

Welcome to Chapter 3 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. This chapter shows you how to transform
the `Abstract Syntax Tree <LangImpl02.html>`_, built in Chapter 2, into
LLVM IR. This will teach you a little bit about how LLVM does things, as
well as demonstrate how easy it is to use. It's much more work to build
a lexer and parser than it is to generate LLVM IR code. :)

**Please note**: the code in this chapter and later chapters requires
LLVM 3.7 or later. LLVM 3.6 and before will not work with it. Also note
that you need to use a version of this tutorial that matches your LLVM
release: if you are using an official LLVM release, use the version of the
documentation included with your release or on the `llvm.org releases
page <https://llvm.org/releases/>`_.

Code Generation Setup
=====================

In order to generate LLVM IR, we want some simple setup to get started.
First we define virtual code generation (codegen) methods in each AST
class:

.. code-block:: c++

    /// ExprAST - Base class for all expression nodes.
    class ExprAST {
    public:
      virtual ~ExprAST() = default;
      virtual Value *codegen() = 0;
    };

    /// NumberExprAST - Expression class for numeric literals like "1.0".
    class NumberExprAST : public ExprAST {
      double Val;

    public:
      NumberExprAST(double Val) : Val(Val) {}
      Value *codegen() override;
    };
    ...

The codegen() method says to emit IR for that AST node along with all
the things it depends on, and they all return an LLVM Value object.
"Value" is the class used to represent a "`Static Single Assignment
(SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
register" or "SSA value" in LLVM. The most distinct aspect of SSA values
is that their value is computed as the related instruction executes, and
it does not get a new value until (and if) the instruction re-executes.
In other words, there is no way to "change" an SSA value. For more
information, please read up on `Static Single
Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
- the concepts are really quite natural once you grok them.
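
To make the "no way to change" point concrete, here is a minimal sketch of
how a front end for a language with reassignment might name values in SSA
style. The ``SSARenamer`` class and its ``define``/``use`` methods are
hypothetical helpers invented for this illustration; they are not part of
LLVM or of the tutorial code. Each assignment creates a brand-new name
instead of overwriting an old one:

.. code-block:: c++

    #include <map>
    #include <string>

    /// Hypothetical helper: hands out a fresh name for every assignment so
    /// that no name is ever defined twice.
    class SSARenamer {
      std::map<std::string, unsigned> Version; // latest version per variable

    public:
      /// Called for each assignment to Var; returns a never-reused name.
      std::string define(const std::string &Var) {
        return Var + "." + std::to_string(++Version[Var]);
      }

      /// Name of the most recent definition of Var, or "" if it has none.
      std::string use(const std::string &Var) const {
        auto It = Version.find(Var);
        if (It == Version.end())
          return "";
        return Var + "." + std::to_string(It->second);
      }
    };

With this scheme, a source sequence such as ``x = 1; x = x + 1`` would
define ``x.1`` and then ``x.2`` (computed from ``x.1``); the first value is
never modified.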

Note that instead of adding virtual methods to the ExprAST class
hierarchy, it could also make sense to use a `visitor
pattern <http://en.wikipedia.org/wiki/Visitor_pattern>`_ or some other
way to model this. Again, this tutorial won't dwell on good software
engineering practices: for our purposes, adding a virtual method is
simplest.
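
For comparison, a visitor-based design might look roughly like the sketch
below. It uses a cut-down AST with only two node types, and the
``ExprVisitor``, ``Accept`` and ``EvalVisitor`` names are assumptions made
for this illustration rather than code from the tutorial. To keep it
self-contained, the visitor evaluates the expression instead of emitting IR;
a code generator would be another ``ExprVisitor`` subclass.

.. code-block:: c++

    #include <memory>
    #include <utility>

    class NumberExprAST;
    class BinaryExprAST;

    /// Visitor interface: one Visit overload per concrete node type.
    class ExprVisitor {
    public:
      virtual ~ExprVisitor() = default;
      virtual void Visit(const NumberExprAST &E) = 0;
      virtual void Visit(const BinaryExprAST &E) = 0;
    };

    /// Simplified base class: nodes only know how to accept a visitor.
    class ExprAST {
    public:
      virtual ~ExprAST() = default;
      virtual void Accept(ExprVisitor &V) const = 0;
    };

    class NumberExprAST : public ExprAST {
      double Val;

    public:
      explicit NumberExprAST(double Val) : Val(Val) {}
      double getVal() const { return Val; }
      void Accept(ExprVisitor &V) const override { V.Visit(*this); }
    };

    class BinaryExprAST : public ExprAST {
      char Op;
      std::unique_ptr<ExprAST> LHS, RHS;

    public:
      BinaryExprAST(char Op, std::unique_ptr<ExprAST> LHS,
                    std::unique_ptr<ExprAST> RHS)
          : Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
      char getOp() const { return Op; }
      const ExprAST &getLHS() const { return *LHS; }
      const ExprAST &getRHS() const { return *RHS; }
      void Accept(ExprVisitor &V) const override { V.Visit(*this); }
    };

    /// Example visitor: evaluates the tree, leaving the value in Result.
    class EvalVisitor : public ExprVisitor {
    public:
      double Result = 0.0;

      void Visit(const NumberExprAST &E) override { Result = E.getVal(); }

      void Visit(const BinaryExprAST &E) override {
        E.getLHS().Accept(*this);
        double L = Result;
        E.getRHS().Accept(*this);
        double R = Result;
        switch (E.getOp()) {
        case '+': Result = L + R; break;
        case '-': Result = L - R; break;
        case '*': Result = L * R; break;
        default:  Result = 0.0; break; // unknown operator in this sketch
        }
      }
    };

The trade-off is the usual one: a visitor keeps each operation in a single
class and leaves the AST untouched, at the cost of more boilerplate than a
single virtual method.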

The second thing we want is a "LogError" method like we used for the
parser, which will be used to report errors found during code generation
(for example, use of an undeclared parameter):

.. code-block:: c++

    static LLVMContext TheContext;
    static IRBuilder<> Builder(TheContext);
    static std::unique_ptr<Module> TheModule;
    static std::map<std::string, Value *> NamedValues;

    Value *LogErrorV(const char *Str) {
      LogError(Str);
      return nullptr;
    }

The static variables will be used during code generation. ``TheContext``
is an opaque object that owns a lot of core LLVM data structures, such as
the type and constant value tables. We don't need to understand it in
detail; we just need a single instance to pass into APIs that require it.

The ``Builder`` object is a helper object that makes it easy to generate
LLVM instructions. Instances of the
`IRBuilder <https://llvm.org/doxygen/IRBuilder_8h_source.html>`_
class template keep track of the current place to insert instructions
and have methods to create new instructions.

``TheModule`` is an LLVM construct that contains functions and global
variables. In many ways, it is the top-level structure that the LLVM IR
uses to contain code. It will own the memory for all of the IR that we
generate, which is why the codegen() method returns a raw Value\*,
rather than a unique_ptr<Value>.

The ``NamedValues`` map keeps track of which values are defined in the
current scope and what their LLVM representation is. (In other words, it
is a symbol table for the code). In this form of Kaleidoscope, the only
things that can be referenced are function parameters. As such, function
parameters will be in this map when generating code for their function
body.

With these basics in place, we can start talking about how to generate
code for each expression. Note that this assumes that the ``Builder``
has been set up to generate code *into* something. For now, we'll assume
that this has already been done, and we'll just use it to emit code.

Expression Code Generation
==========================

Generating LLVM code for expression nodes is very straightforward: fewer
than 45 lines of commented code for all four of our expression nodes.
First we'll do numeric literals:

.. code-block:: c++

    Value *NumberExprAST::codegen() {
      return ConstantFP::get(TheContext, APFloat(Val));
    }

In the LLVM IR, numeric constants are represented with the
``ConstantFP`` class, which holds the numeric value in an ``APFloat``
internally (``APFloat`` has the capability of holding floating point
constants of Arbitrary Precision). This code basically just creates
and returns a ``ConstantFP``. Note that in the LLVM IR, constants
are all uniqued together and shared. For this reason, the API uses the
"foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)".
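
The idea behind the ``get`` idiom can be sketched without LLVM. The
``ToyConstant`` class below is a hypothetical stand-in, not LLVM's
``ConstantFP``, and it keeps its pool in a function-local static purely for
brevity (LLVM keeps such tables in the context object). Asking twice for
the same value returns the same object, so constants can be compared by
pointer:

.. code-block:: c++

    #include <map>
    #include <memory>

    /// Hypothetical uniqued constant used only for this illustration.
    class ToyConstant {
      double Val;

      // Private: objects can only be obtained through get().
      explicit ToyConstant(double Val) : Val(Val) {}

    public:
      double getValue() const { return Val; }

      /// Returns the single shared object for Val, creating it on first use.
      static const ToyConstant *get(double Val) {
        static std::map<double, std::unique_ptr<ToyConstant>> Pool;
        std::unique_ptr<ToyConstant> &Slot = Pool[Val];
        if (!Slot)
          Slot.reset(new ToyConstant(Val));
        return Slot.get();
      }
    };

Because construction is private, callers cannot accidentally create a
second object for the same value with ``new``.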

.. code-block:: c++

    Value *VariableExprAST::codegen() {
      // Look this variable up in the function.
      Value *V = NamedValues[Name];
      if (!V)
        LogErrorV("Unknown variable name");
      return V;
    }

References to variables are also quite simple using LLVM. In the simple
version of Kaleidoscope, we assume that the variable has already been
emitted somewhere and its value is available. In practice, the only
values that can be in the ``NamedValues`` map are function arguments.
This code simply checks to see that the specified name is in the map (if
not, an unknown variable is being referenced) and returns the value for
it. In future chapters, we'll add support for `loop induction
variables <LangImpl05.html#for-loop-expression>`_ in the symbol table, and for `local
variables <LangImpl07.html#user-defined-local-variables>`_.
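
One detail worth knowing: ``std::map::operator[]`` default-constructs an
entry when the key is missing, so the lookup above leaves a null entry
behind for every unknown name. That is harmless here because the map is
cleared for each function, but a lookup that does not modify the table can
be sketched as follows. ``lookupVariable`` is a hypothetical helper and
``Value`` is an empty placeholder standing in for ``llvm::Value``:

.. code-block:: c++

    #include <map>
    #include <string>

    struct Value {}; // placeholder for llvm::Value in this sketch

    /// Looks Name up without modifying the table; returns nullptr when the
    /// name is not present.
    Value *lookupVariable(const std::map<std::string, Value *> &NamedValues,
                          const std::string &Name) {
      auto It = NamedValues.find(Name);
      return It == NamedValues.end() ? nullptr : It->second;
    }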

.. code-block:: c++

    Value *BinaryExprAST::codegen() {
      Value *L = LHS->codegen();
      Value *R = RHS->codegen();
      if (!L || !R)
        return nullptr;

      switch (Op) {
      case '+':
        return Builder.CreateFAdd(L, R, "addtmp");
      case '-':
        return Builder.CreateFSub(L, R, "subtmp");
      case '*':
        return Builder.CreateFMul(L, R, "multmp");
      case '<':
        L = Builder.CreateFCmpULT(L, R, "cmptmp");
        // Convert bool 0/1 to double 0.0 or 1.0
        return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext),
                                    "booltmp");
      default:
        return LogErrorV("invalid binary operator");
      }
    }

Binary operators start to get more interesting. The basic idea here is
that we recursively emit code for the left-hand side of the expression,
then the right-hand side, then we compute the result of the binary
expression. In this code, we do a simple switch on the opcode to create
the right LLVM instruction.

In the example above, the LLVM builder class is starting to show its
value. IRBuilder knows where to insert the newly created instruction;
all you have to do is specify what instruction to create (e.g. with
``CreateFAdd``), which operands to use (``L`` and ``R`` here) and
optionally provide a name for the generated instruction.

One nice thing about LLVM is that the name is just a hint. For instance,
if the code above emits multiple "addtmp" variables, LLVM will
automatically provide each one with an increasing, unique numeric
suffix. Local value names for instructions are purely optional, but they
make it much easier to read the IR dumps.
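
The renaming behavior can be approximated with a few lines of plain C++.
The ``NameTable`` class below is a toy written for this explanation and is
only an approximation of the behavior described above, not LLVM's
implementation. It returns the hint unchanged the first time and appends an
increasing counter on every collision:

.. code-block:: c++

    #include <set>
    #include <string>

    /// Toy name table: makes every requested name unique within one table.
    class NameTable {
      std::set<std::string> Used; // names handed out so far
      unsigned NextSuffix = 1;    // shared counter for all collisions

    public:
      /// Returns Hint if it is still free, otherwise Hint plus a suffix.
      std::string makeUnique(const std::string &Hint) {
        std::string Name = Hint;
        while (!Used.insert(Name).second)
          Name = Hint + std::to_string(NextSuffix++);
        return Name;
      }
    };

Requesting "multmp" three times, then "addtmp", "multmp" and "addtmp" again
yields ``multmp``, ``multmp1``, ``multmp2``, ``addtmp``, ``multmp3`` and
``addtmp4``.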

`LLVM instructions <../../LangRef.html#instruction-reference>`_ are constrained by strict
rules: for example, the Left and Right operands of an `add
instruction <../../LangRef.html#add-instruction>`_ must have the same type, and the
result type of the add must match the operand types. Because all values
in Kaleidoscope are doubles, this makes for very simple code for add,
sub and mul.

On the other hand, LLVM specifies that the `fcmp
instruction <../../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a
one bit integer). The problem with this is that Kaleidoscope wants the
value to be a 0.0 or 1.0 value. In order to get these semantics, we
combine the fcmp instruction with a `uitofp
instruction <../../LangRef.html#uitofp-to-instruction>`_. This instruction converts its
input integer into a floating point value by treating the input as an
unsigned value. In contrast, if we used the `sitofp
instruction <../../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator
would return 0.0 and -1.0, depending on the input value.
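
The difference comes from how a one-bit pattern is interpreted: read as
unsigned, a set bit means 1; read as two's complement, the only bit is the
sign bit, so a set bit means -1. The following sketch reproduces that
arithmetic in plain C++ for any width from 1 to 64 bits. The helper names
``lowBits``, ``unsignedToDouble`` and ``signedToDouble`` are hypothetical
and exist only for this illustration:

.. code-block:: c++

    #include <cassert>
    #include <cstdint>

    /// Keeps only the low Bits bits of Raw.
    std::uint64_t lowBits(std::uint64_t Raw, unsigned Bits) {
      assert(Bits >= 1 && Bits <= 64 && "bit width out of range");
      return Bits == 64 ? Raw : (Raw & ((std::uint64_t{1} << Bits) - 1));
    }

    /// Reads the low Bits bits as an unsigned integer (the uitofp view).
    double unsignedToDouble(std::uint64_t Raw, unsigned Bits) {
      return static_cast<double>(lowBits(Raw, Bits));
    }

    /// Reads the low Bits bits as two's complement (the sitofp view): when
    /// the top bit is set, the value is the unsigned reading minus 2^Bits.
    double signedToDouble(std::uint64_t Raw, unsigned Bits) {
      std::uint64_t V = lowBits(Raw, Bits);
      std::uint64_t SignBit = std::uint64_t{1} << (Bits - 1);
      double Result = static_cast<double>(V);
      if (V & SignBit)
        Result -= 2.0 * static_cast<double>(SignBit);
      return Result;
    }

For a one-bit "true", ``unsignedToDouble(1, 1)`` gives 1.0 while
``signedToDouble(1, 1)`` gives -1.0, which is why the code above uses the
unsigned conversion.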

.. code-block:: c++

    Value *CallExprAST::codegen() {
      // Look up the name in the global module table.
      Function *CalleeF = TheModule->getFunction(Callee);
      if (!CalleeF)
        return LogErrorV("Unknown function referenced");

      // If argument mismatch error.
      if (CalleeF->arg_size() != Args.size())
        return LogErrorV("Incorrect # arguments passed");

      std::vector<Value *> ArgsV;
      for (unsigned i = 0, e = Args.size(); i != e; ++i) {
        ArgsV.push_back(Args[i]->codegen());
        if (!ArgsV.back())
          return nullptr;
      }

      return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
    }

Code generation for function calls is quite straightforward with LLVM. The code
above initially does a function name lookup in the LLVM Module's symbol table.
Recall that the LLVM Module is the container that holds the functions we are
JIT'ing. By giving each function the same name as what the user specifies, we
can use the LLVM symbol table to resolve function names for us.

Once we have the function to call, we recursively codegen each argument
that is to be passed in, and create an LLVM `call
instruction <../../LangRef.html#call-instruction>`_. Note that LLVM uses the native C
calling conventions by default, allowing these calls to also call into
standard library functions like "sin" and "cos", with no additional
effort.

This wraps up our handling of the four basic expressions that we have so
far in Kaleidoscope. Feel free to go in and add some more. For example,
by browsing the `LLVM language reference <../../LangRef.html>`_ you'll find
several other interesting instructions that are really easy to plug into
our basic framework.

Function Code Generation
========================

Code generation for prototypes and functions must handle a number of
details, which makes their code less beautiful than expression code
generation, but allows us to illustrate some important points. First,
let's talk about code generation for prototypes: they are used both for
function bodies and external function declarations. The code starts
with:

.. code-block:: c++

    Function *PrototypeAST::codegen() {
      // Make the function type:  double(double,double) etc.
      std::vector<Type*> Doubles(Args.size(),
                                 Type::getDoubleTy(TheContext));
      FunctionType *FT =
        FunctionType::get(Type::getDoubleTy(TheContext), Doubles, false);

      Function *F =
        Function::Create(FT, Function::ExternalLinkage, Name, TheModule.get());

This code packs a lot of power into a few lines. Note first that this
function returns a "Function\*" instead of a "Value\*". Because a
"prototype" really talks about the external interface for a function
(not the value computed by an expression), it makes sense for it to
return the LLVM Function it corresponds to when codegen'd.

The call to ``FunctionType::get`` creates the ``FunctionType`` that
should be used for a given Prototype. Since all function arguments in
Kaleidoscope are of type double, the first line creates a vector of "N"
LLVM double types. It then uses the ``FunctionType::get`` method to
create a function type that takes "N" doubles as arguments, returns one
double as a result, and that is not vararg (the false parameter
indicates this). Note that Types in LLVM are uniqued just like Constants
are, so you don't "new" a type, you "get" it.

The final line above actually creates the IR Function corresponding to
the Prototype. This indicates the type, linkage and name to use, as
well as which module to insert into. "`external
linkage <../../LangRef.html#linkage>`_" means that the function may be
defined outside the current module and/or that it is callable by
functions outside the module. The Name passed in is the name the user
specified: since "``TheModule``" is specified, this name is registered
in "``TheModule``"s symbol table.

.. code-block:: c++

  // Set names for all arguments.
  unsigned Idx = 0;
  for (auto &Arg : F->args())
    Arg.setName(Args[Idx++]);

  return F;

Finally, we set the name of each of the function's arguments according to the
names given in the Prototype. This step isn't strictly necessary, but keeping
the names consistent makes the IR more readable, and allows subsequent code to
refer directly to the arguments for their names, rather than having to look
them up in the Prototype AST.

At this point we have a function prototype with no body. This is how LLVM IR
represents function declarations. For extern statements in Kaleidoscope, this
is as far as we need to go. For function definitions however, we need to
codegen and attach a function body.

.. code-block:: c++

  Function *FunctionAST::codegen() {
    // First, check for an existing function from a previous 'extern' declaration.
    Function *TheFunction = TheModule->getFunction(Proto->getName());

    if (!TheFunction)
      TheFunction = Proto->codegen();

    if (!TheFunction)
      return nullptr;

    if (!TheFunction->empty())
      return (Function*)LogErrorV("Function cannot be redefined.");

For function definitions, we start by searching TheModule's symbol table for an
existing version of this function, in case one has already been created using an
'extern' statement. If Module::getFunction returns null then no previous version
exists, so we'll codegen one from the Prototype. In either case, we want to
assert that the function is empty (i.e. has no body yet) before we start.

.. code-block:: c++

  // Create a new basic block to start insertion into.
  BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction);
  Builder.SetInsertPoint(BB);

  // Record the function arguments in the NamedValues map.
  NamedValues.clear();
  for (auto &Arg : TheFunction->args())
    NamedValues[std::string(Arg.getName())] = &Arg;

Now we get to the point where the ``Builder`` is set up. The first line
creates a new `basic block <http://en.wikipedia.org/wiki/Basic_block>`_
(named "entry"), which is inserted into ``TheFunction``. The second line
then tells the builder that new instructions should be inserted into the
end of the new basic block. Basic blocks in LLVM are an important part
of functions that define the `Control Flow
Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
don't have any control flow, our functions will only contain one block
at this point. We'll fix this in `Chapter 5 <LangImpl05.html>`_ :).

Next we add the function arguments to the NamedValues map (after first clearing
it out) so that they're accessible to ``VariableExprAST`` nodes.

.. code-block:: c++

      if (Value *RetVal = Body->codegen()) {
        // Finish off the function.
        Builder.CreateRet(RetVal);

        // Validate the generated code, checking for consistency.
        verifyFunction(*TheFunction);

        return TheFunction;
      }

Once the insertion point has been set up and the NamedValues map populated,
we call the ``codegen()`` method for the root expression of the function. If no
error happens, this emits code to compute the expression into the entry block
and returns the value that was computed. Assuming no error, we then create an
LLVM `ret instruction <../../LangRef.html#ret-instruction>`_, which completes the function.
Once the function is built, we call ``verifyFunction``, which is
provided by LLVM. This function does a variety of consistency checks on
the generated code, to determine if our compiler is doing everything
right. Using this is important: it can catch a lot of bugs. Once the
function is finished and validated, we return it.
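
As an illustration of the kind of consistency check involved, LLVM IR
requires every basic block to end with a terminator instruction such as
``ret`` or ``br``, and a terminator may not appear earlier in the block. The
sketch below checks that one rule on a hypothetical miniature
representation. ``ToyInst``, ``ToyBlock``, ``isTerminator`` and
``verifyBlock`` are invented for this example, and only two terminator
opcodes are recognized; the real ``verifyFunction`` works on actual IR and
checks far more than this.

.. code-block:: c++

    #include <cstddef>
    #include <string>
    #include <vector>

    // Hypothetical miniature IR: an instruction is just its opcode name.
    struct ToyInst {
      std::string Opcode;
    };
    using ToyBlock = std::vector<ToyInst>;

    /// Only "ret" and "br" count as terminators in this sketch.
    bool isTerminator(const ToyInst &I) {
      return I.Opcode == "ret" || I.Opcode == "br";
    }

    /// Returns true when Block is non-empty, ends in a terminator and has
    /// no terminator anywhere before its last instruction.
    bool verifyBlock(const ToyBlock &Block) {
      if (Block.empty() || !isTerminator(Block.back()))
        return false;
      for (std::size_t I = 0; I + 1 < Block.size(); ++I)
        if (isTerminator(Block[I]))
          return false;
      return true;
    }

A block that forgets its ``ret`` fails this check, which mirrors the sort of
mistake a code generator can easily make.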

.. code-block:: c++

      // Error reading body, remove function.
      TheFunction->eraseFromParent();
      return nullptr;
    }

The only piece left here is handling of the error case. For simplicity,
we handle this by merely deleting the function we produced with the
``eraseFromParent`` method. This allows the user to redefine a function
that they incorrectly typed in before: if we didn't delete it, it would
live in the symbol table, with a body, preventing future redefinition.

This code does have a bug, though: if the ``FunctionAST::codegen()`` method
finds an existing IR Function, it does not validate its signature against the
definition's own prototype. This means that an earlier 'extern' declaration will
take precedence over the function definition's signature, which can cause
codegen to fail, for instance if the function arguments are named differently.
There are a number of ways to fix this bug; see what you can come up with! Here
is a testcase:

::

    extern foo(a);     # ok, declares foo.
    def foo(b) b;      # Error: Unknown variable name. (decl using 'a' takes precedence).
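
One possible direction, sketched under simplifying assumptions, is to
reconcile the existing declaration with the definition's prototype before
generating the body: reject the definition if the argument counts differ,
and otherwise rename the declared arguments to the names the definition
uses. ``ToyFunction``, ``ToyPrototype`` and ``adoptPrototype`` below are
hypothetical stand-ins that model only names and arity; with the real API
the renaming step would presumably go through ``setName`` on each argument,
as in ``PrototypeAST::codegen()``.

.. code-block:: c++

    #include <string>
    #include <vector>

    // Hypothetical stand-in for an already-declared IR function.
    struct ToyFunction {
      std::string Name;
      std::vector<std::string> ArgNames;
    };

    // Hypothetical stand-in for the definition's prototype.
    struct ToyPrototype {
      std::string Name;
      std::vector<std::string> Args;
    };

    /// Returns false (leaving F untouched) when the arities differ;
    /// otherwise renames F's arguments to match Proto and returns true.
    bool adoptPrototype(ToyFunction &F, const ToyPrototype &Proto) {
      if (F.ArgNames.size() != Proto.Args.size())
        return false;
      F.ArgNames = Proto.Args;
      return true;
    }

With this check in place, the testcase above would rename ``a`` to ``b``
before the body is generated, so the reference to ``b`` resolves.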

Driver Changes and Closing Thoughts
===================================

For now, code generation to LLVM doesn't really get us much, except that
we can look at the pretty IR calls. The sample code inserts calls to
codegen into the "``HandleDefinition``", "``HandleExtern``" etc
functions, and then dumps out the LLVM IR. This gives a nice way to look
at the LLVM IR for simple functions. For example:

::

    ready> 4+5;
    Read top-level expression:
    define double @0() {
    entry:
      ret double 9.000000e+00
    }

Note how the parser turns the top-level expression into anonymous
functions for us. This will be handy when we add `JIT
support <LangImpl04.html#adding-a-jit-compiler>`_ in the next chapter. Also note that the
code is very literally transcribed: no optimizations are being performed
except simple constant folding done by IRBuilder. We will `add
optimizations <LangImpl04.html#trivial-constant-folding>`_ explicitly in the next
chapter.
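
To see why ``4+5`` never shows up as an ``fadd``, it helps to picture a
builder that checks its operands before emitting anything. The following is
a rough sketch of that idea using a hypothetical miniature IR; ``ToyValue``
and ``ToyBuilder`` are invented for this illustration and do not reflect how
IRBuilder is actually implemented.

.. code-block:: c++

    #include <memory>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical miniature IR value; not an LLVM class.
    struct ToyValue {
      bool IsConstant;
      double Constant;  // meaningful only when IsConstant is true
      std::string Text; // how the value prints as an operand
    };

    class ToyBuilder {
      std::vector<std::unique_ptr<ToyValue>> Owned; // keeps values alive
      unsigned NextTemp = 0;

      ToyValue *make(bool IsConstant, double C, std::string Text) {
        Owned.push_back(std::unique_ptr<ToyValue>(
            new ToyValue{IsConstant, C, std::move(Text)}));
        return Owned.back().get();
      }

    public:
      std::vector<std::string> Instructions; // emitted in insertion order

      ToyValue *getConstant(double C) {
        return make(true, C, std::to_string(C));
      }

      ToyValue *getArgument(const std::string &Name) {
        return make(false, 0.0, "%" + Name);
      }

      /// Folds when both operands are constants; otherwise appends an fadd.
      ToyValue *createFAdd(ToyValue *L, ToyValue *R) {
        if (L->IsConstant && R->IsConstant)
          return getConstant(L->Constant + R->Constant);
        std::string Name = "%addtmp" + std::to_string(NextTemp++);
        Instructions.push_back(Name + " = fadd double " + L->Text + ", " +
                               R->Text);
        return make(false, 0.0, Name);
      }
    };

Adding two constants through this builder produces a new constant and
leaves ``Instructions`` empty, whereas adding an argument to anything emits
one instruction.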

::

    ready> def foo(a b) a*a + 2*a*b + b*b;
    Read function definition:
    define double @foo(double %a, double %b) {
    entry:
      %multmp = fmul double %a, %a
      %multmp1 = fmul double 2.000000e+00, %a
      %multmp2 = fmul double %multmp1, %b
      %addtmp = fadd double %multmp, %multmp2
      %multmp3 = fmul double %b, %b
      %addtmp4 = fadd double %addtmp, %multmp3
      ret double %addtmp4
    }

This shows some simple arithmetic. Notice the striking similarity to the
LLVM builder calls that we use to create the instructions.

::

    ready> def bar(a) foo(a, 4.0) + bar(31337);
    Read function definition:
    define double @bar(double %a) {
    entry:
      %calltmp = call double @foo(double %a, double 4.000000e+00)
      %calltmp1 = call double @bar(double 3.133700e+04)
      %addtmp = fadd double %calltmp, %calltmp1
      ret double %addtmp
    }

This shows some function calls. Note that this function will take a long
time to execute if you call it. In the future we'll add conditional
control flow to actually make recursion useful :).

::

    ready> extern cos(x);
    Read extern:
    declare double @cos(double)

    ready> cos(1.234);
    Read top-level expression:
    define double @1() {
    entry:
      %calltmp = call double @cos(double 1.234000e+00)
      ret double %calltmp
    }

This shows an extern for the libm "cos" function, and a call to it.

.. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up
   on highlighting this due to the first line.

::

    ready> ^D
    ; ModuleID = 'my cool jit'

    define double @0() {
    entry:
      ret double 9.000000e+00
    }

    define double @foo(double %a, double %b) {
    entry:
      %multmp = fmul double %a, %a
      %multmp1 = fmul double 2.000000e+00, %a
      %multmp2 = fmul double %multmp1, %b
      %addtmp = fadd double %multmp, %multmp2
      %multmp3 = fmul double %b, %b
      %addtmp4 = fadd double %addtmp, %multmp3
      ret double %addtmp4
    }

    define double @bar(double %a) {
    entry:
      %calltmp = call double @foo(double %a, double 4.000000e+00)
      %calltmp1 = call double @bar(double 3.133700e+04)
      %addtmp = fadd double %calltmp, %calltmp1
      ret double %addtmp
    }

    declare double @cos(double)

    define double @1() {
    entry:
      %calltmp = call double @cos(double 1.234000e+00)
      ret double %calltmp
    }

When you quit the current demo (by sending an EOF via CTRL+D on Linux
or CTRL+Z and ENTER on Windows), it dumps out the IR for the entire
module generated. Here you can see the big picture with all the
functions referencing each other.

This wraps up the third chapter of the Kaleidoscope tutorial. Up next,
we'll describe how to `add JIT codegen and optimizer
support <LangImpl04.html>`_ to this so we can actually start running
code!

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM code generator. Because this uses the LLVM libraries, we need
to link them in. To do this, we use the
`llvm-config <https://llvm.org/cmds/llvm-config.html>`_ tool to inform
our makefile/command line about which options to use:

.. code-block:: bash

    # Compile
    clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core` -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../../examples/Kaleidoscope/Chapter3/toy.cpp
   :language: c++

`Next: Adding JIT and Optimizer Support <LangImpl04.html>`_