:orphan:

==============================================
Kaleidoscope: Adding JIT and Optimizer Support
==============================================

.. contents::
   :local:

Chapter 4 Introduction
======================

Welcome to Chapter 4 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
of a simple language and added support for generating LLVM IR. This
chapter describes two new techniques: adding optimizer support to your
language, and adding JIT compiler support. These additions will
demonstrate how to get nice, efficient code for the Kaleidoscope
language.

Trivial Constant Folding
========================

Our demonstration for Chapter 3 is elegant and easy to extend.
Unfortunately, it does not produce wonderful code. The IRBuilder,
however, does give us obvious optimizations when compiling simple code:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 3.000000e+00, %x
      ret double %addtmp
    }

This code is not a literal transcription of the AST built by parsing the
input. That would be:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 2.000000e+00, 1.000000e+00
      %addtmp1 = fadd double %addtmp, %x
      ret double %addtmp1
    }

Constant folding, as seen above, is a very common and very important
optimization: so much so that many language implementors implement
constant folding support in their AST representation.

With LLVM, you don't need this support in the AST. Since all calls to
build LLVM IR go through the LLVM IR builder, the builder itself checks
to see if there is a constant folding opportunity when you call it. If
so, it just does the constant fold and returns the constant instead of
creating an instruction.

Well, that was easy :). In practice, we recommend always using
``IRBuilder`` when generating code like this. It has no "syntactic
overhead" for its use (you don't have to uglify your compiler with
constant checks everywhere) and it can dramatically reduce the amount of
LLVM IR that is generated in some cases (particularly for languages with
a macro preprocessor or that use a lot of constants).
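
To see this in isolation, here is a minimal stand-alone sketch (not part
of the Kaleidoscope sources; the function and variable names are ours):
because both operands of ``CreateFAdd`` are constants, the builder folds
the operation and hands back a ``ConstantFP`` without ever creating an
instruction:

.. code-block:: c++

    #include "llvm/ADT/APFloat.h"
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include <cassert>

    void demoConstantFold() {
      llvm::LLVMContext Ctx;
      llvm::IRBuilder<> Builder(Ctx);

      llvm::Value *L = llvm::ConstantFP::get(Ctx, llvm::APFloat(1.0));
      llvm::Value *R = llvm::ConstantFP::get(Ctx, llvm::APFloat(2.0));

      // No basic block or insertion point is needed: the builder folds
      // this to 3.0 and never emits an fadd instruction.
      llvm::Value *Sum = Builder.CreateFAdd(L, R, "addtmp");
      assert(llvm::isa<llvm::ConstantFP>(Sum));
    }

This is exactly what happened to the constant subexpression ``1+2`` in
``test`` above.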

On the other hand, the ``IRBuilder`` is limited by the fact that it does
all of its analysis inline with the code as it is built. If you take a
slightly more complex example:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 3.000000e+00, %x
      %addtmp1 = fadd double %x, 3.000000e+00
      %multmp = fmul double %addtmp, %addtmp1
      ret double %multmp
    }

In this case, the LHS and RHS of the multiplication are the same value.
We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
instead of computing "``x+3``" twice.

Unfortunately, no amount of local analysis will be able to detect and
correct this. This requires two transformations: reassociation of
expressions (to make the adds lexically identical) and Common
Subexpression Elimination (CSE) to delete the redundant add instruction.
Fortunately, LLVM provides a broad range of optimizations that you can
use, in the form of "passes".

LLVM Optimization Passes
========================

.. warning::

   Due to the transition to the new PassManager infrastructure, this
   tutorial is based on ``llvm::legacy::FunctionPassManager``, which can
   be found in
   `LegacyPassManager.h <http://llvm.org/doxygen/classllvm_1_1legacy_1_1FunctionPassManager.html>`_.
   For the purpose of this tutorial the above should be used until the
   pass manager transition is complete.

LLVM provides many optimization passes, which do many different sorts of
things and have different tradeoffs. Unlike other systems, LLVM doesn't
hold to the mistaken notion that one set of optimizations is right for
all languages and for all situations. LLVM allows a compiler implementor
to make complete decisions about what optimizations to use, in which
order, and in what situation.

As a concrete example, LLVM supports "whole module" passes, which look
across as large a body of code as they can (often a whole file, but if
run at link time, this can be a substantial portion of the whole
program). It also supports and includes "per-function" passes which just
operate on a single function at a time, without looking at other
functions. For more information on passes and how they are run, see the
`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
`List of LLVM Passes <../Passes.html>`_.

For Kaleidoscope, we are currently generating functions on the fly, one
at a time, as the user types them in. We aren't shooting for the
ultimate optimization experience in this setting, but we also want to
catch the easy and quick stuff where possible. As such, we will choose
to run a few per-function optimizations as the user types the function
in. If we wanted to make a "static Kaleidoscope compiler", we would use
exactly the code we have now, except that we would defer running the
optimizer until the entire file has been parsed.

In order to get per-function optimizations going, we need to set up a
`FunctionPassManager <../WritingAnLLVMPass.html#what-passmanager-does>`_
to hold and organize the LLVM optimizations that we want to run. Once we
have that, we can add a set of optimizations to run. We'll need a new
FunctionPassManager for each module that we want to optimize, so we'll
write a function to create and initialize both the module and pass
manager for us:

.. code-block:: c++

    void InitializeModuleAndPassManager(void) {
      // Open a new module.
      TheModule = std::make_unique<Module>("my cool jit", TheContext);

      // Create a new pass manager attached to it.
      TheFPM = std::make_unique<FunctionPassManager>(TheModule.get());

      // Do simple "peephole" optimizations and bit-twiddling optzns.
      TheFPM->add(createInstructionCombiningPass());
      // Reassociate expressions.
      TheFPM->add(createReassociatePass());
      // Eliminate Common SubExpressions.
      TheFPM->add(createGVNPass());
      // Simplify the control flow graph (deleting unreachable blocks, etc).
      TheFPM->add(createCFGSimplificationPass());

      TheFPM->doInitialization();
    }

This code initializes the global module ``TheModule``, and the function
pass manager ``TheFPM``, which is attached to ``TheModule``. Once the
pass manager is set up, we use a series of "add" calls to add a bunch of
LLVM passes.

In this case, we choose to add four optimization passes. The passes we
choose here are a pretty standard set of "cleanup" optimizations that
are useful for a wide variety of code. I won't delve into what they do
but, believe me, they are a good starting place :).
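
Extending the set later is just another ``add`` call. As one concrete
taste, Chapter 7 of this tutorial adds the mem2reg pass (once the
language grows mutable variables) with exactly this kind of line:

.. code-block:: c++

    // Promote allocas to SSA registers (mem2reg). This would go with the
    // other add() calls above, before TheFPM->doInitialization().
    TheFPM->add(createPromoteMemoryToRegisterPass());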

Once the PassManager is set up, we need to make use of it. We do this by
running it after our newly created function is constructed (in
``FunctionAST::codegen()``), but before it is returned to the client:

.. code-block:: c++

    if (Value *RetVal = Body->codegen()) {
      // Finish off the function.
      Builder.CreateRet(RetVal);

      // Validate the generated code, checking for consistency.
      verifyFunction(*TheFunction);

      // Optimize the function.
      TheFPM->run(*TheFunction);

      return TheFunction;
    }

As you can see, this is pretty straightforward. The
``FunctionPassManager`` optimizes and updates the LLVM Function\* in
place, improving (hopefully) its body. With this in place, we can try
our test above again:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double %x, 3.000000e+00
      %multmp = fmul double %addtmp, %addtmp
      ret double %multmp
    }

As expected, we now get our nicely optimized code, saving a floating
point add instruction from every execution of this function.

LLVM provides a wide variety of optimizations that can be used in
certain circumstances. Some `documentation about the various
passes <../Passes.html>`_ is available, but it isn't very complete.
Another good source of ideas is to look at the passes that ``Clang``
runs to get started. The "``opt``" tool allows you to experiment with
passes from the command line, so you can see if they do anything.
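
For example, assuming you have saved the unoptimized IR from above to a
file named ``test.ll`` (a name chosen here for illustration), a command
along these lines runs the same four passes and prints the transformed
IR:

.. code-block:: bash

    # Legacy pass names, matching the legacy FunctionPassManager used above.
    opt -instcombine -reassociate -gvn -simplifycfg -S test.ll

Comparing the output with and without individual flags is a quick way to
see which pass is responsible for a given cleanup.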

Now that we have reasonable code coming out of our front-end, let's talk
about executing it!

Adding a JIT Compiler
=====================

Code that is available in LLVM IR can have a wide variety of tools
applied to it. For example, you can run optimizations on it (as we did
above), you can dump it out in textual or binary forms, you can compile
the code to an assembly file (.s) for some target, or you can JIT
compile it. The nice thing about the LLVM IR representation is that it
is the "common currency" between many different parts of the compiler.

In this section, we'll add JIT compiler support to our interpreter. The
basic idea that we want for Kaleidoscope is to have the user enter
function bodies as they do now, but immediately evaluate the top-level
expressions they type in. For example, if they type in "1 + 2;", we
should evaluate and print out 3. If they define a function, they should
be able to call it from the command line.

In order to do this, we first prepare the environment to create code for
the current native target and declare and initialize the JIT. This is
done by calling some ``InitializeNativeTarget*`` functions and adding a
global variable ``TheJIT``, and initializing it in ``main``:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
    ...
    int main() {
      InitializeNativeTarget();
      InitializeNativeTargetAsmPrinter();
      InitializeNativeTargetAsmParser();

      // Install standard binary operators.
      // 1 is lowest precedence.
      BinopPrecedence['<'] = 10;
      BinopPrecedence['+'] = 20;
      BinopPrecedence['-'] = 20;
      BinopPrecedence['*'] = 40; // highest.

      // Prime the first token.
      fprintf(stderr, "ready> ");
      getNextToken();

      TheJIT = std::make_unique<KaleidoscopeJIT>();

      // Run the main "interpreter loop" now.
      MainLoop();

      return 0;
    }

We also need to set up the data layout for the JIT:

.. code-block:: c++

    void InitializeModuleAndPassManager(void) {
      // Open a new module.
      TheModule = std::make_unique<Module>("my cool jit", TheContext);
      TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());

      // Create a new pass manager attached to it.
      TheFPM = std::make_unique<FunctionPassManager>(TheModule.get());
      ...

The KaleidoscopeJIT class is a simple JIT built specifically for these
tutorials, available inside the LLVM source code at
llvm-src/examples/Kaleidoscope/include/KaleidoscopeJIT.h. In later
chapters we will look at how it works and extend it with new features,
but for now we will take it as given. Its API is very simple:
``addModule`` adds an LLVM IR module to the JIT, making its functions
available for execution; ``removeModule`` removes a module, freeing any
memory associated with the code in that module; and ``findSymbol``
allows us to look up pointers to the compiled code.

We can take this simple API and change our code that parses top-level
expressions to look like this:

.. code-block:: c++

    static void HandleTopLevelExpression() {
      // Evaluate a top-level expression into an anonymous function.
      if (auto FnAST = ParseTopLevelExpr()) {
        if (FnAST->codegen()) {

          // JIT the module containing the anonymous expression, keeping a
          // handle so we can free it later.
          auto H = TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();

          // Search the JIT for the __anon_expr symbol.
          auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
          assert(ExprSymbol && "Function not found");

          // Get the symbol's address and cast it to the right type (takes no
          // arguments, returns a double) so we can call it as a native function.
          double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
          fprintf(stderr, "Evaluated to %f\n", FP());

          // Delete the anonymous expression module from the JIT.
          TheJIT->removeModule(H);
        }

If parsing and codegen succeed, the next step is to add the module
containing the top-level expression to the JIT. We do this by calling
addModule, which triggers code generation for all the functions in the
module, and returns a handle that can be used to remove the module from
the JIT later. Once the module has been added to the JIT it can no
longer be modified, so we also open a new module to hold subsequent code
by calling ``InitializeModuleAndPassManager()``.

Once we've added the module to the JIT, we need to get a pointer to the
final generated code. We do this by calling the JIT's findSymbol method,
and passing the name of the top-level expression function:
``__anon_expr``. Since we just added this function, we assert that
findSymbol returned a result.
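
Where does the ``__anon_expr`` name come from? ``ParseTopLevelExpr``
attaches it to the prototype of the anonymous function; the full code
listing at the end of this chapter contains code along these lines:

.. code-block:: c++

    /// toplevelexpr ::= expression
    static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
      if (auto E = ParseExpression()) {
        // Make an anonymous proto.
        auto Proto = std::make_unique<PrototypeAST>("__anon_expr",
                                                    std::vector<std::string>());
        return std::make_unique<FunctionAST>(std::move(Proto), std::move(E));
      }
      return nullptr;
    }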

Next, we get the in-memory address of the ``__anon_expr`` function by
calling ``getAddress()`` on the symbol. Recall that we compile top-level
expressions into a self-contained LLVM function that takes no arguments
and returns the computed double. Because the LLVM JIT compiler matches
the native platform ABI, you can just cast the result pointer to a
function pointer of that type and call it directly. This means there is
no difference between JIT compiled code and native machine code that is
statically linked into your application.

Finally, since we don't support re-evaluation of top-level expressions,
we remove the module from the JIT when we're done to free the associated
memory. Recall, however, that the module we created a few lines earlier
(via ``InitializeModuleAndPassManager``) is still open and waiting for
new code to be added.

With just these two changes, let's see how Kaleidoscope works now!

::

    ready> 4+5;
    Read top-level expression:
    define double @0() {
    entry:
      ret double 9.000000e+00
    }

    Evaluated to 9.000000

Well this looks like it is basically working. The dump of the function
shows the "no argument function that always returns double" that we
synthesize for each top-level expression that is typed in. This
demonstrates very basic functionality, but can we do more?

::

    ready> def testfunc(x y) x + y*2;
    Read function definition:
    define double @testfunc(double %x, double %y) {
    entry:
      %multmp = fmul double %y, 2.000000e+00
      %addtmp = fadd double %multmp, %x
      ret double %addtmp
    }

    ready> testfunc(4, 10);
    Read top-level expression:
    define double @1() {
    entry:
      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
      ret double %calltmp
    }

    Evaluated to 24.000000

    ready> testfunc(5, 10);
    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!

Function definitions and calls also work, but something went very wrong
on that last line. The call looks valid, so what happened? As you may
have guessed from the API, a Module is a unit of allocation for the JIT,
and testfunc was part of the same module that contained the anonymous
expression. When we removed that module from the JIT to free the memory
for the anonymous expression, we deleted the definition of ``testfunc``
along with it. Then, when we tried to call testfunc a second time, the
JIT could no longer find it.

The easiest way to fix this is to put the anonymous expression in a
separate module from the rest of the function definitions. The JIT will
happily resolve function calls across module boundaries, as long as each
of the functions called has a prototype, and is added to the JIT before
it is called. By putting the anonymous expression in a different module
we can delete it without affecting the rest of the functions.

In fact, we're going to go a step further and put every function in its
own module. Doing so allows us to exploit a useful property of the
KaleidoscopeJIT that will make our environment more REPL-like: functions
can be added to the JIT more than once (unlike a module, where every
function must have a unique definition). When you look up a symbol in
KaleidoscopeJIT it will always return the most recent definition:

::

    ready> def foo(x) x + 1;
    Read function definition:
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 1.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 2.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 4.000000

To allow each function to live in its own module we'll need a way to
re-generate previous function declarations into each new module we open:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;

    ...

    Function *getFunction(std::string Name) {
      // First, see if the function has already been added to the current
      // module.
      if (auto *F = TheModule->getFunction(Name))
        return F;

      // If not, check whether we can codegen the declaration from some
      // existing prototype.
      auto FI = FunctionProtos.find(Name);
      if (FI != FunctionProtos.end())
        return FI->second->codegen();

      // If no existing prototype exists, return null.
      return nullptr;
    }

    ...

    Value *CallExprAST::codegen() {
      // Look up the name in the global module table.
      Function *CalleeF = getFunction(Callee);

    ...

    Function *FunctionAST::codegen() {
      // Transfer ownership of the prototype to the FunctionProtos map, but
      // keep a reference to it for use below.
      auto &P = *Proto;
      FunctionProtos[Proto->getName()] = std::move(Proto);
      Function *TheFunction = getFunction(P.getName());
      if (!TheFunction)
        return nullptr;

To enable this, we'll start by adding a new global, ``FunctionProtos``,
that holds the most recent prototype for each function. We'll also add a
convenience method, ``getFunction()``, to replace calls to
``TheModule->getFunction()``. Our convenience method searches
``TheModule`` for an existing function declaration, falling back to
generating a new declaration from FunctionProtos if it doesn't find one.
In ``CallExprAST::codegen()`` we just need to replace the call to
``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to
update the FunctionProtos map first, then call ``getFunction()``. With
this done, we can always obtain a function declaration in the current
module for any previously declared function.
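
For reference, ``FunctionProtos`` itself is just a global map keyed by
function name; its declaration in the full listing looks like this:

.. code-block:: c++

    // Maps each function name to its most recent prototype.
    static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos;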

We also need to update HandleDefinition and HandleExtern:

.. code-block:: c++

    static void HandleDefinition() {
      if (auto FnAST = ParseDefinition()) {
        if (auto *FnIR = FnAST->codegen()) {
          fprintf(stderr, "Read function definition:");
          FnIR->print(errs());
          fprintf(stderr, "\n");
          TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

    static void HandleExtern() {
      if (auto ProtoAST = ParseExtern()) {
        if (auto *FnIR = ProtoAST->codegen()) {
          fprintf(stderr, "Read extern: ");
          FnIR->print(errs());
          fprintf(stderr, "\n");
          FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

In HandleDefinition, we add two lines to transfer the newly defined
function to the JIT and open a new module. In HandleExtern, we just need
to add one line to add the prototype to FunctionProtos.

With these changes made, let's try our REPL again (I removed the dump of
the anonymous functions this time; you should get the idea by now :) :

::

    ready> def foo(x) x + 1;
    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    ready> foo(2);
    Evaluated to 4.000000

It works!

Even with this simple code, we get some surprisingly powerful
capabilities - check this out:

::

    ready> extern sin(x);
    Read extern:
    declare double @sin(double)

    ready> extern cos(x);
    Read extern:
    declare double @cos(double)

    ready> sin(1.0);
    Read top-level expression:
    define double @2() {
    entry:
      ret double 0x3FEAED548F090CEE
    }

    Evaluated to 0.841471

    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
    Read function definition:
    define double @foo(double %x) {
    entry:
      %calltmp = call double @sin(double %x)
      %multmp = fmul double %calltmp, %calltmp
      %calltmp2 = call double @cos(double %x)
      %multmp4 = fmul double %calltmp2, %calltmp2
      %addtmp = fadd double %multmp, %multmp4
      ret double %addtmp
    }

    ready> foo(4.0);
    Read top-level expression:
    define double @3() {
    entry:
      %calltmp = call double @foo(double 4.000000e+00)
      ret double %calltmp
    }

    Evaluated to 1.000000

Whoa, how does the JIT know about sin and cos? The answer is
surprisingly simple: the KaleidoscopeJIT has a straightforward symbol
resolution rule that it uses to find symbols that aren't available in
any given module. First it searches all the modules that have already
been added to the JIT, from the most recent to the oldest, to find the
newest definition. If no definition is found inside the JIT, it falls
back to calling "``dlsym("sin")``" on the Kaleidoscope process itself.
Since "``sin``" is defined within the JIT's address space, it simply
patches up calls in the module to call the libm version of ``sin``
directly. But in some cases this even goes further: as sin and cos are
names of standard math functions, the constant folder will directly
evaluate the function calls to the correct result when called with
constants, as in the "``sin(1.0)``" above.

In the future we'll see how tweaking this symbol resolution rule can be
used to enable all sorts of useful features, from security (restricting
the set of symbols available to JIT'd code), to dynamic code generation
based on symbol names, and even lazy compilation.

One immediate benefit of the symbol resolution rule is that we can now
extend the language by writing arbitrary C++ code to implement
operations. For example, if we add:

.. code-block:: c++

    #ifdef _WIN32
    #define DLLEXPORT __declspec(dllexport)
    #else
    #define DLLEXPORT
    #endif

    /// putchard - putchar that takes a double and returns 0.
    extern "C" DLLEXPORT double putchard(double X) {
      fputc((char)X, stderr);
      return 0;
    }

Note that for Windows we need to actually export the functions because
the dynamic symbol loader will use GetProcAddress to find the symbols.

Now we can produce simple output to the console by using things like
"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
on the console (120 is the ASCII code for 'x'). Similar code could be
used to implement file I/O, console input, and many other capabilities
in Kaleidoscope.
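
The full listing at the end of this chapter also includes a companion
helper in the same style, which prints a double followed by a newline
and is handy for inspecting values from Kaleidoscope code:

.. code-block:: c++

    /// printd - printf that takes a double and prints it as "%f\n",
    /// returning 0.
    extern "C" DLLEXPORT double printd(double X) {
      fprintf(stderr, "%f\n", X);
      return 0;
    }

With it, "``extern printd(x); printd(42);``" prints "42.000000" on the
console.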

This completes the JIT and optimizer chapter of the Kaleidoscope
tutorial. At this point, we can compile a non-Turing-complete
programming language, and optimize and JIT compile it in a user-driven
way. Next up we'll look into `extending the language with control flow
constructs <LangImpl05.html>`_, tackling some interesting LLVM IR issues
along the way.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM JIT and optimizer. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
    # Run
    ./toy

If you are compiling this on Linux, make sure to add the "-rdynamic"
option as well. This makes sure that the external functions are resolved
properly at runtime.

Here is the code:

.. literalinclude:: ../../../examples/Kaleidoscope/Chapter4/toy.cpp
   :language: c++

`Next: Extending the language: control flow <LangImpl05.html>`_