175897234Sdrh<html> 275897234Sdrh<head> 375897234Sdrh<title>The Lemon Parser Generator</title> 475897234Sdrh</head> 560c71b02Sdrh<body> 660c71b02Sdrh<a id="main"></a> 79a243e69Sdrh<h1 align='center'>The Lemon Parser Generator</h1> 875897234Sdrh 99bccde3dSdrh<p>Lemon is an LALR(1) parser generator for C. 109bccde3dSdrhIt does the same job as "bison" and "yacc". 119a243e69SdrhBut Lemon is not a bison or yacc clone. Lemon 1275897234Sdrhuses a different grammar syntax which is designed to 139bccde3dSdrhreduce the number of coding errors. Lemon also uses a 149bccde3dSdrhparsing engine that is faster than yacc and 159bccde3dSdrhbison and which is both reentrant and threadsafe. 169bccde3dSdrh(Update: Since the previous sentence was written, bison 179bccde3dSdrhhas also been updated so that it too can generate a 189bccde3dSdrhreentrant and threadsafe parser.) 199bccde3dSdrhLemon also implements features that can be used 209a243e69Sdrhto eliminate resource leaks, making it suitable for use 2175897234Sdrhin long-running programs such as graphical user interfaces 2275897234Sdrhor embedded controllers.</p> 2375897234Sdrh 2475897234Sdrh<p>This document is an introduction to the Lemon 2575897234Sdrhparser generator.</p> 2675897234Sdrh 2760c71b02Sdrh<a id="toc"></a> 2860c71b02Sdrh<h2>1.0 Table of Contents</h2> 2960c71b02Sdrh<ul> 3060c71b02Sdrh<li><a href="#main">Introduction</a> 3160c71b02Sdrh<li><a href="#toc">1.0 Table of Contents</a> 3260c71b02Sdrh<li><a href="#secnot">2.0 Security Notes</a><br> 3360c71b02Sdrh<li><a href="#optheory">3.0 Theory of Operation</a> 3460c71b02Sdrh <ul> 3560c71b02Sdrh <li><a href="#options">3.1 Command Line Options</a> 3660c71b02Sdrh <li><a href="#interface">3.2 The Parser Interface</a> 3760c71b02Sdrh <ul> 3860c71b02Sdrh <li><a href="#onstack">3.2.1 Allocating The Parse Object On Stack</a> 3960c71b02Sdrh <li><a href="#ifsum">3.2.2 Interface Summary</a> 4060c71b02Sdrh </ul> 4160c71b02Sdrh <li><a href="#yaccdiff">3.3 Differences With YACC and BISON</a> 4260c71b02Sdrh <li><a href="#build">3.4 Building The "lemon" Or "lemon.exe" Executable</a> 4360c71b02Sdrh </ul> 4460c71b02Sdrh<li><a href="#syntax">4.0 Input File Syntax</a> 4560c71b02Sdrh <ul> 4660c71b02Sdrh <li><a href="#tnt">4.1 Terminals and Nonterminals</a> 4760c71b02Sdrh <li><a href="#rules">4.2 Grammar Rules</a> 4860c71b02Sdrh <li><a href="#precrules">4.3 Precedence Rules</a> 4960c71b02Sdrh <li><a href="#special">4.4 Special Directives</a> 5060c71b02Sdrh </ul> 5160c71b02Sdrh<li><a href="#errors">5.0 Error Processing</a> 5260c71b02Sdrh<li><a href="#history">6.0 History of Lemon</a> 5360c71b02Sdrh<li><a href="#copyright">7.0 Copyright</a> 5460c71b02Sdrh</ul> 5560c71b02Sdrh 5660c71b02Sdrh<a id="secnot"></a> 5760c71b02Sdrh<h2>2.0 Security Note</h2> 58c5e56b34Sdrh 59c5e56b34Sdrh<p>The language parser code created by Lemon is very robust and 60c5e56b34Sdrhis well-suited for use in internet-facing applications that need to 617b852416Sdrhsafely process maliciously crafted inputs.</p> 62c5e56b34Sdrh 63c5e56b34Sdrh<p>The "lemon.exe" command-line tool itself works great when given a valid 64c5e56b34Sdrhinput grammar file and almost always gives helpful 65c5e56b34Sdrherror messages for malformed inputs. However, it is possible for 66c5e56b34Sdrha malicious user to craft a grammar file that will cause 67c5e56b34Sdrhlemon.exe to crash. 68c5e56b34SdrhWe do not see this as a problem, as lemon.exe is not intended to be used 69c5e56b34Sdrhwith hostile inputs. 70c5e56b34SdrhTo summarize:</p> 71c5e56b34Sdrh 72c5e56b34Sdrh<ul> 73c5e56b34Sdrh<li>Parser code generated by lemon → Robust and secure 74c5e56b34Sdrh<li>The "lemon.exe" command line tool itself → Not so much 75c5e56b34Sdrh</ul> 76c5e56b34Sdrh 7760c71b02Sdrh<a id="optheory"></a> 7860c71b02Sdrh<h2>3.0 Theory of Operation</h2> 7975897234Sdrh 8060c71b02Sdrh<p>Lemon is computer program that translates a context free grammar (CFG) 8175897234Sdrhfor a particular language into C code that implements a parser for 8275897234Sdrhthat language. 8360c71b02SdrhThe Lemon program has two inputs:</p> 8475897234Sdrh<ul> 8575897234Sdrh<li>The grammar specification. 8675897234Sdrh<li>A parser template file. 8775897234Sdrh</ul> 887b852416Sdrh<p>Typically, only the grammar specification is supplied by the programmer. 8960c71b02SdrhLemon comes with a default parser template 9060c71b02Sdrh("<a href="https://sqlite.org/src/file/tool/lempar.c">lempar.c</a>") 9160c71b02Sdrhthat works fine for most applications. But the user is free to substitute 9260c71b02Sdrha different parser template if desired.</p> 9375897234Sdrh 949a243e69Sdrh<p>Depending on command-line options, Lemon will generate up to 957b852416Sdrhthree output files.</p> 9675897234Sdrh<ul> 9760c71b02Sdrh<li>C code to implement a parser for the input grammar. 9860c71b02Sdrh<li>A header file defining an integer ID for each terminal symbol 9960c71b02Sdrh (or "token"). 10075897234Sdrh<li>An information file that describes the states of the generated parser 10175897234Sdrh automaton. 10275897234Sdrh</ul> 1037b852416Sdrh<p>By default, all three of these output files are generated. 1049bccde3dSdrhThe header file is suppressed if the "-m" command-line option is 1059bccde3dSdrhused and the report file is omitted when "-q" is selected.</p> 10675897234Sdrh 1079bccde3dSdrh<p>The grammar specification file uses a ".y" suffix, by convention. 10875897234SdrhIn the examples used in this document, we'll assume the name of the 1099bccde3dSdrhgrammar file is "gram.y". A typical use of Lemon would be the 1107b852416Sdrhfollowing command:</p> 11175897234Sdrh<pre> 11275897234Sdrh lemon gram.y 11375897234Sdrh</pre> 1147b852416Sdrh<p>This command will generate three output files named "gram.c", 1159bccde3dSdrh"gram.h" and "gram.out". 11675897234SdrhThe first is C code to implement the parser. The second 11775897234Sdrhis the header file that defines numerical values for all 11875897234Sdrhterminal symbols, and the last is the report that explains 11975897234Sdrhthe states used by the parser automaton.</p> 12075897234Sdrh 12160c71b02Sdrh<a id="options"></a> 12260c71b02Sdrh<h3>3.1 Command Line Options</h3> 12375897234Sdrh 12475897234Sdrh<p>The behavior of Lemon can be modified using command-line options. 12575897234SdrhYou can obtain a list of the available command-line options together 1267b852416Sdrhwith a brief explanation of what each does by typing</p> 12775897234Sdrh<pre> 1289a243e69Sdrh lemon "-?" 12975897234Sdrh</pre> 1307b852416Sdrh<p>As of this writing, the following command-line options are supported:</p> 13175897234Sdrh<ul> 1329bccde3dSdrh<li><b>-b</b> 1339bccde3dSdrhShow only the basis for each parser state in the report file. 1349bccde3dSdrh<li><b>-c</b> 1359a243e69SdrhDo not compress the generated action tables. The parser will be a 1369a243e69Sdrhlittle larger and slower, but it will detect syntax errors sooner. 137fb32c44eSdrh<li><b>-d</b><i>directory</i> 138fb32c44eSdrhWrite all output files into <i>directory</i>. Normally, output files 139fb32c44eSdrhare written into the directory that contains the input grammar file. 1409bccde3dSdrh<li><b>-D<i>name</i></b> 1419a243e69SdrhDefine C preprocessor macro <i>name</i>. This macro is usable by 1425f0d37b5Sdrh"<tt><a href='#pifdef'>%ifdef</a></tt>", 1435f0d37b5Sdrh"<tt><a href='#pifdef'>%ifndef</a></tt>", and 1445f0d37b5Sdrh"<tt><a href="#pifdef">%if</a></tt> lines 1459a243e69Sdrhin the grammar file. 1465f0d37b5Sdrh<li><b>-E</b> 1475f0d37b5SdrhRun the "%if" preprocessor step only and print the revised grammar 1485f0d37b5Sdrhfile. 1499bccde3dSdrh<li><b>-g</b> 1509bccde3dSdrhDo not generate a parser. Instead write the input grammar to standard 1519bccde3dSdrhoutput with all comments, actions, and other extraneous text removed. 1529bccde3dSdrh<li><b>-l</b> 153dfe4e6bbSdrhOmit "#line" directives in the generated parser C code. 1549bccde3dSdrh<li><b>-m</b> 1559bccde3dSdrhCause the output C source code to be compatible with the "makeheaders" 1569bccde3dSdrhprogram. 1579bccde3dSdrh<li><b>-p</b> 1589bccde3dSdrhDisplay all conflicts that are resolved by 1599bccde3dSdrh<a href='#precrules'>precedence rules</a>. 1609bccde3dSdrh<li><b>-q</b> 1619bccde3dSdrhSuppress generation of the report file. 1629bccde3dSdrh<li><b>-r</b> 1639bccde3dSdrhDo not sort or renumber the parser states as part of optimization. 1649bccde3dSdrh<li><b>-s</b> 165ed5e6688SdrhShow parser statistics before exiting. 1669bccde3dSdrh<li><b>-T<i>file</i></b> 1679bccde3dSdrhUse <i>file</i> as the template for the generated C-code parser implementation. 1689bccde3dSdrh<li><b>-x</b> 1699bccde3dSdrhPrint the Lemon version number. 17075897234Sdrh</ul> 17175897234Sdrh 17260c71b02Sdrh<a id="interface"></a> 17360c71b02Sdrh<h3>3.2 The Parser Interface</h3> 17475897234Sdrh 17575897234Sdrh<p>Lemon doesn't generate a complete, working program. It only generates 17675897234Sdrha few subroutines that implement a parser. This section describes 17775897234Sdrhthe interface to those subroutines. It is up to the programmer to 17875897234Sdrhcall these subroutines in an appropriate way in order to produce a 17975897234Sdrhcomplete system.</p> 18075897234Sdrh 18175897234Sdrh<p>Before a program begins using a Lemon-generated parser, the program 18275897234Sdrhmust first create the parser. 1837b852416SdrhA new parser is created as follows:</p> 18475897234Sdrh<pre> 18575897234Sdrh void *pParser = ParseAlloc( malloc ); 18675897234Sdrh</pre> 1877b852416Sdrh<p>The ParseAlloc() routine allocates and initializes a new parser and 18875897234Sdrhreturns a pointer to it. 1899bccde3dSdrhThe actual data structure used to represent a parser is opaque — 19075897234Sdrhits internal structure is not visible or usable by the calling routine. 19175897234SdrhFor this reason, the ParseAlloc() routine returns a pointer to void 19275897234Sdrhrather than a pointer to some particular structure. 19375897234SdrhThe sole argument to the ParseAlloc() routine is a pointer to the 1949bccde3dSdrhsubroutine used to allocate memory. Typically this means malloc().</p> 19575897234Sdrh 19675897234Sdrh<p>After a program is finished using a parser, it can reclaim all 1977b852416Sdrhmemory allocated by that parser by calling</p> 19875897234Sdrh<pre> 19975897234Sdrh ParseFree(pParser, free); 20075897234Sdrh</pre> 2017b852416Sdrh<p>The first argument is the same pointer returned by ParseAlloc(). The 20275897234Sdrhsecond argument is a pointer to the function used to release bulk 20375897234Sdrhmemory back to the system.</p> 20475897234Sdrh 20575897234Sdrh<p>After a parser has been allocated using ParseAlloc(), the programmer 20675897234Sdrhmust supply the parser with a sequence of tokens (terminal symbols) to 20775897234Sdrhbe parsed. This is accomplished by calling the following function 2087b852416Sdrhonce for each token:<p> 20975897234Sdrh<pre> 21075897234Sdrh Parse(pParser, hTokenID, sTokenData, pArg); 21175897234Sdrh</pre> 2127b852416Sdrh<p>The first argument to the Parse() routine is the pointer returned by 21375897234SdrhParseAlloc(). 2149a243e69SdrhThe second argument is a small positive integer that tells the parser the 21575897234Sdrhtype of the next token in the data stream. 21675897234SdrhThere is one token type for each terminal symbol in the grammar. 21775897234SdrhThe gram.h file generated by Lemon contains #define statements that 21875897234Sdrhmap symbolic terminal symbol names into appropriate integer values. 2199bccde3dSdrhA value of 0 for the second argument is a special flag to the 2209bccde3dSdrhparser to indicate that the end of input has been reached. 22175897234SdrhThe third argument is the value of the given token. By default, 2229a243e69Sdrhthe type of the third argument is "void*", but the grammar will 22375897234Sdrhusually redefine this type to be some kind of structure. 22475897234SdrhTypically the second argument will be a broad category of tokens 2259bccde3dSdrhsuch as "identifier" or "number" and the third argument will 22675897234Sdrhbe the name of the identifier or the value of the number.</p> 22775897234Sdrh 22875897234Sdrh<p>The Parse() function may have either three or four arguments, 22945f31be8Sdrhdepending on the grammar. If the grammar specification file requests 2309a243e69Sdrhit (via the <tt><a href='#extraarg'>%extra_argument</a></tt> directive), 23145f31be8Sdrhthe Parse() function will have a fourth parameter that can be 23275897234Sdrhof any type chosen by the programmer. The parser doesn't do anything 23375897234Sdrhwith this argument except to pass it through to action routines. 23475897234SdrhThis is a convenient mechanism for passing state information down 23575897234Sdrhto the action routines without having to use global variables.</p> 23675897234Sdrh 23775897234Sdrh<p>A typical use of a Lemon parser might look something like the 2387b852416Sdrhfollowing:</p> 23975897234Sdrh<pre> 2409a243e69Sdrh 1 ParseTree *ParseFile(const char *zFilename){ 2419a243e69Sdrh 2 Tokenizer *pTokenizer; 2429a243e69Sdrh 3 void *pParser; 2439a243e69Sdrh 4 Token sToken; 2449a243e69Sdrh 5 int hTokenId; 2459a243e69Sdrh 6 ParserState sState; 2469a243e69Sdrh 7 2479a243e69Sdrh 8 pTokenizer = TokenizerCreate(zFilename); 2489a243e69Sdrh 9 pParser = ParseAlloc( malloc ); 2499a243e69Sdrh 10 InitParserState(&sState); 2509a243e69Sdrh 11 while( GetNextToken(pTokenizer, &hTokenId, &sToken) ){ 2519a243e69Sdrh 12 Parse(pParser, hTokenId, sToken, &sState); 25275897234Sdrh 13 } 2539a243e69Sdrh 14 Parse(pParser, 0, sToken, &sState); 25475897234Sdrh 15 ParseFree(pParser, free ); 25575897234Sdrh 16 TokenizerFree(pTokenizer); 25675897234Sdrh 17 return sState.treeRoot; 25775897234Sdrh 18 } 25875897234Sdrh</pre> 2597b852416Sdrh<p>This example shows a user-written routine that parses a file of 26075897234Sdrhtext and returns a pointer to the parse tree. 2619bccde3dSdrh(All error-handling code is omitted from this example to keep it 26275897234Sdrhsimple.) 26375897234SdrhWe assume the existence of some kind of tokenizer which is created 26475897234Sdrhusing TokenizerCreate() on line 8 and deleted by TokenizerFree() 26575897234Sdrhon line 16. The GetNextToken() function on line 11 retrieves the 26675897234Sdrhnext token from the input file and puts its type in the 26775897234Sdrhinteger variable hTokenId. The sToken variable is assumed to be 26875897234Sdrhsome kind of structure that contains details about each token, 26975897234Sdrhsuch as its complete text, what line it occurs on, etc.</p> 27075897234Sdrh 2717b852416Sdrh<p>This example also assumes the existence of a structure of type 27275897234SdrhParserState that holds state information about a particular parse. 27375897234SdrhAn instance of such a structure is created on line 6 and initialized 27475897234Sdrhon line 10. A pointer to this structure is passed into the Parse() 27575897234Sdrhroutine as the optional 4th argument. 27675897234SdrhThe action routine specified by the grammar for the parser can use 27775897234Sdrhthe ParserState structure to hold whatever information is useful and 27875897234Sdrhappropriate. In the example, we note that the treeRoot field of 27975897234Sdrhthe ParserState structure is left pointing to the root of the parse 28075897234Sdrhtree.</p> 28175897234Sdrh 2827b852416Sdrh<p>The core of this example as it relates to Lemon is as follows:</p> 28375897234Sdrh<pre> 28475897234Sdrh ParseFile(){ 28575897234Sdrh pParser = ParseAlloc( malloc ); 2869a243e69Sdrh while( GetNextToken(pTokenizer,&hTokenId, &sToken) ){ 28775897234Sdrh Parse(pParser, hTokenId, sToken); 28875897234Sdrh } 28975897234Sdrh Parse(pParser, 0, sToken); 29075897234Sdrh ParseFree(pParser, free ); 29175897234Sdrh } 29275897234Sdrh</pre> 2937b852416Sdrh<p>Basically, what a program has to do to use a Lemon-generated parser 29475897234Sdrhis first create the parser, then send it lots of tokens obtained by 29575897234Sdrhtokenizing an input source. When the end of input is reached, the 29675897234SdrhParse() routine should be called one last time with a token type 29775897234Sdrhof 0. This step is necessary to inform the parser that the end of 29875897234Sdrhinput has been reached. Finally, we reclaim memory used by the 29975897234Sdrhparser by calling ParseFree().</p> 30075897234Sdrh 30175897234Sdrh<p>There is one other interface routine that should be mentioned 30275897234Sdrhbefore we move on. 30375897234SdrhThe ParseTrace() function can be used to generate debugging output 3047b852416Sdrhfrom the parser. A prototype for this routine is as follows:</p> 30575897234Sdrh<pre> 30675897234Sdrh ParseTrace(FILE *stream, char *zPrefix); 30775897234Sdrh</pre> 3087b852416Sdrh<p>After this routine is called, a short (one-line) message is written 30975897234Sdrhto the designated output stream every time the parser changes states 31075897234Sdrhor calls an action routine. Each such message is prefaced using 31175897234Sdrhthe text given by zPrefix. This debugging output can be turned off 31275897234Sdrhby calling ParseTrace() again with a first argument of NULL (0).</p> 31375897234Sdrh 31460c71b02Sdrh<a id="onstack"></a> 31560c71b02Sdrh<h4>3.2.1 Allocating The Parse Object On Stack</h4> 31660c71b02Sdrh 31760c71b02Sdrh<p>If all calls to the Parse() interface are made from within 31860c71b02Sdrh<a href="#pcode"><tt>%code</tt> directives</a>, then the parse 31960c71b02Sdrhobject can be allocated from the stack rather than from the heap. 32060c71b02SdrhThese are the steps: 32160c71b02Sdrh 32260c71b02Sdrh<ul> 32360c71b02Sdrh<li> Declare a local variable of type "yyParser" 32460c71b02Sdrh<li> Initialize the variable using ParseInit() 32560c71b02Sdrh<li> Pass a pointer to the variable in calls ot Parse() 32660c71b02Sdrh<li> Deallocate substructure in the parse variable using ParseFinalize(). 32760c71b02Sdrh</ul> 32860c71b02Sdrh 32960c71b02Sdrh<p>The following code illustrates how this is done: 33060c71b02Sdrh 33160c71b02Sdrh<pre> 33260c71b02Sdrh ParseFile(){ 33360c71b02Sdrh yyParser x; 33460c71b02Sdrh ParseInit( &x ); 33560c71b02Sdrh while( GetNextToken(pTokenizer,&hTokenId, &sToken) ){ 33660c71b02Sdrh Parse(&x, hTokenId, sToken); 33760c71b02Sdrh } 33860c71b02Sdrh Parse(&x, 0, sToken); 33960c71b02Sdrh ParseFinalize( &x ); 34060c71b02Sdrh } 34160c71b02Sdrh</pre> 34260c71b02Sdrh 34360c71b02Sdrh<a id="ifsum"></a> 34460c71b02Sdrh<h4>3.2.2 Interface Summary</h4> 34560c71b02Sdrh 34660c71b02Sdrh<p>Here is a quick overview of the C-language interface to a 34760c71b02SdrhLemon-generated parser:</p> 34860c71b02Sdrh 34960c71b02Sdrh<blockquote><pre> 35060c71b02Sdrhvoid *ParseAlloc( (void*(*malloc)(size_t) ); 35160c71b02Sdrhvoid ParseFree(void *pParser, (void(*free)(void*) ); 35260c71b02Sdrhvoid Parse(void *pParser, int tokenCode, ParseTOKENTYPE token, ...); 35360c71b02Sdrhvoid ParseTrace(FILE *stream, char *zPrefix); 35460c71b02Sdrh</pre></blockquote> 35560c71b02Sdrh 35660c71b02Sdrh<p>Notes:</p> 35760c71b02Sdrh<ul> 35860c71b02Sdrh<li> Use the <a href="#pname"><tt>%name</tt> directive</a> to change 35960c71b02Sdrhthe "Parse" prefix names of the procedures in the interface. 36060c71b02Sdrh<li> Use the <a href="#token_type"><tt>%token_type</tt> directive</a> 36160c71b02Sdrhto define the "ParseTOKENTYPE" type. 36260c71b02Sdrh<li> Use the <a href="#extraarg"><tt>%extra_argument</tt> directive</a> 36360c71b02Sdrhto specify the type and name of the 4th parameter to the 36460c71b02SdrhParse() function. 36560c71b02Sdrh</ul> 36660c71b02Sdrh 36760c71b02Sdrh<a id="yaccdiff"></a> 36860c71b02Sdrh<h3>3.3 Differences With YACC and BISON</h3> 36975897234Sdrh 37075897234Sdrh<p>Programmers who have previously used the yacc or bison parser 37175897234Sdrhgenerator will notice several important differences between yacc and/or 3727b852416Sdrhbison and Lemon.</p> 37375897234Sdrh<ul> 37475897234Sdrh<li>In yacc and bison, the parser calls the tokenizer. In Lemon, 37575897234Sdrh the tokenizer calls the parser. 37675897234Sdrh<li>Lemon uses no global variables. Yacc and bison use global variables 37775897234Sdrh to pass information between the tokenizer and parser. 37875897234Sdrh<li>Lemon allows multiple parsers to be running simultaneously. Yacc 37975897234Sdrh and bison do not. 38075897234Sdrh</ul> 3817b852416Sdrh<p>These differences may cause some initial confusion for programmers 38275897234Sdrhwith prior yacc and bison experience. 38375897234SdrhBut after years of experience using Lemon, I firmly 38475897234Sdrhbelieve that the Lemon way of doing things is better.</p> 38575897234Sdrh 38645f31be8Sdrh<p><i>Updated as of 2016-02-16:</i> 38745f31be8SdrhThe text above was written in the 1990s. 38845f31be8SdrhWe are told that Bison has lately been enhanced to support the 38960c71b02Sdrhtokenizer-calls-parser paradigm used by Lemon, eliminating the 39045f31be8Sdrhneed for global variables.</p> 39145f31be8Sdrh 39260c71b02Sdrh<a id="build"><a> 39360c71b02Sdrh<h3>3.4 Building The "lemon" or "lemon.exe" Executable</h3> 39460c71b02Sdrh 39560c71b02Sdrh<p>The "lemon" or "lemon.exe" program is built from a single file 39660c71b02Sdrhof C-code named 39760c71b02Sdrh"<a href="https://sqlite.org/src/tool/lemon.c">lemon.c</a>". 39860c71b02SdrhThe Lemon source code is generic C89 code that uses 39960c71b02Sdrhno unusual or non-standard libraries. Any 40060c71b02Sdrhreasonable C compiler should suffice to compile the lemon program. 40160c71b02SdrhA command-line like the following will usually work:</p> 40260c71b02Sdrh 40360c71b02Sdrh<blockquote><pre> 40460c71b02Sdrhcc -o lemon lemon.c 40560c71b02Sdrh</pre></blockquote 40660c71b02Sdrh 40760c71b02Sdrh<p>On Windows machines with Visual C++ installed, bring up a 40860c71b02Sdrh"VS20<i>NN</i> x64 Native Tools Command Prompt" window and enter: 40960c71b02Sdrh 41060c71b02Sdrh<blockquote><pre> 41160c71b02Sdrhcl lemon.c 41260c71b02Sdrh</pre></blockquote> 41360c71b02Sdrh 41460c71b02Sdrh<p>Compiling Lemon really is that simple. 41560c71b02SdrhAdditional compiler options such as 41660c71b02Sdrh"-O2" or "-g" or "-Wall" can be added if desired, but they are not 41760c71b02Sdrhnecessary.</p> 41860c71b02Sdrh 41960c71b02Sdrh 42060c71b02Sdrh<a id="syntax"></a> 42160c71b02Sdrh<h2>4.0 Input File Syntax</h2> 42275897234Sdrh 42375897234Sdrh<p>The main purpose of the grammar specification file for Lemon is 42475897234Sdrhto define the grammar for the parser. But the input file also 42575897234Sdrhspecifies additional information Lemon requires to do its job. 42675897234SdrhMost of the work in using Lemon is in writing an appropriate 42775897234Sdrhgrammar file.</p> 42875897234Sdrh 4297b852416Sdrh<p>The grammar file for Lemon is, for the most part, a free format. 43075897234SdrhIt does not have sections or divisions like yacc or bison. Any 4317b852416Sdrhdeclaration can occur at any point in the file. Lemon ignores 4327b852416Sdrhwhitespace (except where it is needed to separate tokens), and it 4337b852416Sdrhhonors the same commenting conventions as C and C++.</p> 43475897234Sdrh 43560c71b02Sdrh<a id="tnt"></a> 43660c71b02Sdrh<h3>4.1 Terminals and Nonterminals</h3> 43775897234Sdrh 43875897234Sdrh<p>A terminal symbol (token) is any string of alphanumeric 4399bccde3dSdrhand/or underscore characters 44075897234Sdrhthat begins with an uppercase letter. 441c8eee5e5SdrhA terminal can contain lowercase letters after the first character, 44275897234Sdrhbut the usual convention is to make terminals all uppercase. 44375897234SdrhA nonterminal, on the other hand, is any string of alphanumeric 44475897234Sdrhand underscore characters than begins with a lowercase letter. 4459a243e69SdrhAgain, the usual convention is to make nonterminals use all lowercase 4469a243e69Sdrhletters.</p> 44775897234Sdrh 44875897234Sdrh<p>In Lemon, terminal and nonterminal symbols do not need to 44975897234Sdrhbe declared or identified in a separate section of the grammar file. 45075897234SdrhLemon is able to generate a list of all terminals and nonterminals 45175897234Sdrhby examining the grammar rules, and it can always distinguish a 45275897234Sdrhterminal from a nonterminal by checking the case of the first 45375897234Sdrhcharacter of the name.</p> 45475897234Sdrh 45575897234Sdrh<p>Yacc and bison allow terminal symbols to have either alphanumeric 45675897234Sdrhnames or to be individual characters included in single quotes, like 45775897234Sdrhthis: ')' or '$'. Lemon does not allow this alternative form for 45875897234Sdrhterminal symbols. With Lemon, all symbols, terminals and nonterminals, 45975897234Sdrhmust have alphanumeric names.</p> 46075897234Sdrh 46160c71b02Sdrh<a id="rules"></a> 46260c71b02Sdrh<h3>4.2 Grammar Rules</h3> 46375897234Sdrh 46475897234Sdrh<p>The main component of a Lemon grammar file is a sequence of grammar 46575897234Sdrhrules. 46675897234SdrhEach grammar rule consists of a nonterminal symbol followed by 4679bccde3dSdrhthe special symbol "::=" and then a list of terminals and/or nonterminals. 46875897234SdrhThe rule is terminated by a period. 46975897234SdrhThe list of terminals and nonterminals on the right-hand side of the 47075897234Sdrhrule can be empty. 47175897234SdrhRules can occur in any order, except that the left-hand side of the 47275897234Sdrhfirst rule is assumed to be the start symbol for the grammar (unless 4739a243e69Sdrhspecified otherwise using the <tt><a href='#start_symbol'>%start_symbol</a></tt> 4749a243e69Sdrhdirective described below.) 4757b852416SdrhA typical sequence of grammar rules might look something like this:</p> 47675897234Sdrh<pre> 47775897234Sdrh expr ::= expr PLUS expr. 47875897234Sdrh expr ::= expr TIMES expr. 47975897234Sdrh expr ::= LPAREN expr RPAREN. 48075897234Sdrh expr ::= VALUE. 48175897234Sdrh</pre> 48275897234Sdrh 4839bccde3dSdrh<p>There is one non-terminal in this example, "expr", and five 4849bccde3dSdrhterminal symbols or tokens: "PLUS", "TIMES", "LPAREN", 4859bccde3dSdrh"RPAREN" and "VALUE".</p> 48675897234Sdrh 48775897234Sdrh<p>Like yacc and bison, Lemon allows the grammar to specify a block 48875897234Sdrhof C code that will be executed whenever a grammar rule is reduced 48975897234Sdrhby the parser. 49075897234SdrhIn Lemon, this action is specified by putting the C code (contained 49175897234Sdrhwithin curly braces <tt>{...}</tt>) immediately after the 49275897234Sdrhperiod that closes the rule. 4937b852416SdrhFor example:</p> 49475897234Sdrh<pre> 49575897234Sdrh expr ::= expr PLUS expr. { printf("Doing an addition...\n"); } 49675897234Sdrh</pre> 49775897234Sdrh 49875897234Sdrh<p>In order to be useful, grammar actions must normally be linked to 49975897234Sdrhtheir associated grammar rules. 5009bccde3dSdrhIn yacc and bison, this is accomplished by embedding a "$$" in the 50175897234Sdrhaction to stand for the value of the left-hand side of the rule and 5029bccde3dSdrhsymbols "$1", "$2", and so forth to stand for the value of 50375897234Sdrhthe terminal or nonterminal at position 1, 2 and so forth on the 50475897234Sdrhright-hand side of the rule. 50575897234SdrhThis idea is very powerful, but it is also very error-prone. The 50675897234Sdrhsingle most common source of errors in a yacc or bison grammar is 50775897234Sdrhto miscount the number of symbols on the right-hand side of a grammar 5089bccde3dSdrhrule and say "$7" when you really mean "$8".</p> 50975897234Sdrh 51075897234Sdrh<p>Lemon avoids the need to count grammar symbols by assigning symbolic 51175897234Sdrhnames to each symbol in a grammar rule and then using those symbolic 51275897234Sdrhnames in the action. 5137b852416SdrhIn yacc or bison, one would write this:</p> 51475897234Sdrh<pre> 5159a243e69Sdrh expr -> expr PLUS expr { $$ = $1 + $3; }; 51675897234Sdrh</pre> 5177b852416Sdrh<p>But in Lemon, the same rule becomes the following:</p> 51875897234Sdrh<pre> 51975897234Sdrh expr(A) ::= expr(B) PLUS expr(C). { A = B+C; } 52075897234Sdrh</pre> 5217b852416Sdrh<p>In the Lemon rule, any symbol in parentheses after a grammar rule 52275897234Sdrhsymbol becomes a place holder for that symbol in the grammar rule. 52375897234SdrhThis place holder can then be used in the associated C action to 5247b852416Sdrhstand for the value of that symbol.</p> 52575897234Sdrh 52675897234Sdrh<p>The Lemon notation for linking a grammar rule with its reduce 52775897234Sdrhaction is superior to yacc/bison on several counts. 52875897234SdrhFirst, as mentioned above, the Lemon method avoids the need to 52975897234Sdrhcount grammar symbols. 53075897234SdrhSecondly, if a terminal or nonterminal in a Lemon grammar rule 53175897234Sdrhincludes a linking symbol in parentheses but that linking symbol 53275897234Sdrhis not actually used in the reduce action, then an error message 53375897234Sdrhis generated. 5347b852416SdrhFor example, the rule</p> 53575897234Sdrh<pre> 53675897234Sdrh expr(A) ::= expr(B) PLUS expr(C). { A = B; } 53775897234Sdrh</pre> 5387b852416Sdrh<p>will generate an error because the linking symbol "C" is used 53975897234Sdrhin the grammar rule but not in the reduce action.</p> 54075897234Sdrh 54175897234Sdrh<p>The Lemon notation for linking grammar rules to reduce actions 54275897234Sdrhalso facilitates the use of destructors for reclaiming memory 54375897234Sdrhallocated by the values of terminals and nonterminals on the 54475897234Sdrhright-hand side of a rule.</p> 54575897234Sdrh 5467b852416Sdrh<a id='precrules'></a> 54760c71b02Sdrh<h3>4.3 Precedence Rules</h3> 54875897234Sdrh 54975897234Sdrh<p>Lemon resolves parsing ambiguities in exactly the same way as 55075897234Sdrhyacc and bison. A shift-reduce conflict is resolved in favor 55175897234Sdrhof the shift, and a reduce-reduce conflict is resolved by reducing 55275897234Sdrhwhichever rule comes first in the grammar file.</p> 55375897234Sdrh 55475897234Sdrh<p>Just like in 55575897234Sdrhyacc and bison, Lemon allows a measure of control 5569a243e69Sdrhover the resolution of parsing conflicts using precedence rules. 55775897234SdrhA precedence value can be assigned to any terminal symbol 5589bccde3dSdrhusing the 5599a243e69Sdrh<tt><a href='#pleft'>%left</a></tt>, 5609a243e69Sdrh<tt><a href='#pright'>%right</a></tt> or 5619a243e69Sdrh<tt><a href='#pnonassoc'>%nonassoc</a></tt> directives. Terminal symbols 5629a243e69Sdrhmentioned in earlier directives have a lower precedence than 56375897234Sdrhterminal symbols mentioned in later directives. For example:</p> 56475897234Sdrh 5657b852416Sdrh<pre> 56675897234Sdrh %left AND. 56775897234Sdrh %left OR. 56875897234Sdrh %nonassoc EQ NE GT GE LT LE. 56975897234Sdrh %left PLUS MINUS. 57075897234Sdrh %left TIMES DIVIDE MOD. 57175897234Sdrh %right EXP NOT. 5727b852416Sdrh</pre> 57375897234Sdrh 57475897234Sdrh<p>In the preceding sequence of directives, the AND operator is 57575897234Sdrhdefined to have the lowest precedence. The OR operator is one 57675897234Sdrhprecedence level higher. And so forth. Hence, the grammar would 5777b852416Sdrhattempt to group the ambiguous expression</p> 57875897234Sdrh<pre> 57975897234Sdrh a AND b OR c 58075897234Sdrh</pre> 5817b852416Sdrh<p>like this</p> 58275897234Sdrh<pre> 58375897234Sdrh a AND (b OR c). 58475897234Sdrh</pre> 5857b852416Sdrh<p>The associativity (left, right or nonassoc) is used to determine 58675897234Sdrhthe grouping when the precedence is the same. AND is left-associative 5877b852416Sdrhin our example, so</p> 58875897234Sdrh<pre> 58975897234Sdrh a AND b AND c 59075897234Sdrh</pre> 5917b852416Sdrh<p>is parsed like this</p> 59275897234Sdrh<pre> 59375897234Sdrh (a AND b) AND c. 59475897234Sdrh</pre> 5957b852416Sdrh<p>The EXP operator is right-associative, though, so</p> 59675897234Sdrh<pre> 59775897234Sdrh a EXP b EXP c 59875897234Sdrh</pre> 5997b852416Sdrh<p>is parsed like this</p> 60075897234Sdrh<pre> 60175897234Sdrh a EXP (b EXP c). 60275897234Sdrh</pre> 6037b852416Sdrh<p>The nonassoc precedence is used for non-associative operators. 6047b852416SdrhSo</p> 60575897234Sdrh<pre> 60675897234Sdrh a EQ b EQ c 60775897234Sdrh</pre> 6087b852416Sdrh<p>is an error.</p> 60975897234Sdrh 61075897234Sdrh<p>The precedence of non-terminals is transferred to rules as follows: 61175897234SdrhThe precedence of a grammar rule is equal to the precedence of the 61275897234Sdrhleft-most terminal symbol in the rule for which a precedence is 61375897234Sdrhdefined. This is normally what you want, but in those cases where 614ed5e6688Sdrhyou want the precedence of a grammar rule to be something different, 61575897234Sdrhyou can specify an alternative precedence symbol by putting the 61675897234Sdrhsymbol in square braces after the period at the end of the rule and 61775897234Sdrhbefore any C-code. For example:</p> 61875897234Sdrh 6197b852416Sdrh<pre> 62075897234Sdrh expr = MINUS expr. [NOT] 6217b852416Sdrh</pre> 62275897234Sdrh 62375897234Sdrh<p>This rule has a precedence equal to that of the NOT symbol, not the 62475897234SdrhMINUS symbol as would have been the case by default.</p> 62575897234Sdrh 62675897234Sdrh<p>With the knowledge of how precedence is assigned to terminal 62775897234Sdrhsymbols and individual 62875897234Sdrhgrammar rules, we can now explain precisely how parsing conflicts 62975897234Sdrhare resolved in Lemon. Shift-reduce conflicts are resolved 6307b852416Sdrhas follows:</p> 63175897234Sdrh<ul> 63275897234Sdrh<li> If either the token to be shifted or the rule to be reduced 63375897234Sdrh lacks precedence information, then resolve in favor of the 63475897234Sdrh shift, but report a parsing conflict. 63575897234Sdrh<li> If the precedence of the token to be shifted is greater than 63675897234Sdrh the precedence of the rule to reduce, then resolve in favor 63775897234Sdrh of the shift. No parsing conflict is reported. 6389a243e69Sdrh<li> If the precedence of the token to be shifted is less than the 63975897234Sdrh precedence of the rule to reduce, then resolve in favor of the 64075897234Sdrh reduce action. No parsing conflict is reported. 64175897234Sdrh<li> If the precedences are the same and the shift token is 64275897234Sdrh right-associative, then resolve in favor of the shift. 64375897234Sdrh No parsing conflict is reported. 6449a243e69Sdrh<li> If the precedences are the same and the shift token is 64575897234Sdrh left-associative, then resolve in favor of the reduce. 64675897234Sdrh No parsing conflict is reported. 6479a243e69Sdrh<li> Otherwise, resolve the conflict by doing the shift, and 6489a243e69Sdrh report a parsing conflict. 64975897234Sdrh</ul> 6507b852416Sdrh<p>Reduce-reduce conflicts are resolved this way:</p> 65175897234Sdrh<ul> 65275897234Sdrh<li> If either reduce rule 65375897234Sdrh lacks precedence information, then resolve in favor of the 6549a243e69Sdrh rule that appears first in the grammar, and report a parsing 65575897234Sdrh conflict. 6569a243e69Sdrh<li> If both rules have precedence and the precedence is different, 65775897234Sdrh then resolve the dispute in favor of the rule with the highest 6589a243e69Sdrh precedence, and do not report a conflict. 65975897234Sdrh<li> Otherwise, resolve the conflict by reducing by the rule that 6609a243e69Sdrh appears first in the grammar, and report a parsing conflict. 66175897234Sdrh</ul> 66275897234Sdrh 66360c71b02Sdrh<a id="special"></a> 66460c71b02Sdrh<h3>4.4 Special Directives</h3> 66575897234Sdrh 66675897234Sdrh<p>The input grammar to Lemon consists of grammar rules and special 66775897234Sdrhdirectives. We've described all the grammar rules, so now we'll 66875897234Sdrhtalk about the special directives.</p> 66975897234Sdrh 6709a243e69Sdrh<p>Directives in Lemon can occur in any order. You can put them before 6719a243e69Sdrhthe grammar rules, or after the grammar rules, or in the midst of the 67275897234Sdrhgrammar rules. It doesn't matter. The relative order of 67375897234Sdrhdirectives used to assign precedence to terminals is important, but 67475897234Sdrhother than that, the order of directives in Lemon is arbitrary.</p> 67575897234Sdrh 6767b852416Sdrh<p>Lemon supports the following special directives:</p> 67775897234Sdrh<ul> 6789a243e69Sdrh<li><tt><a href='#pcode'>%code</a></tt> 6799a243e69Sdrh<li><tt><a href='#default_destructor'>%default_destructor</a></tt> 6809a243e69Sdrh<li><tt><a href='#default_type'>%default_type</a></tt> 6819a243e69Sdrh<li><tt><a href='#destructor'>%destructor</a></tt> 6825f0d37b5Sdrh<li><tt><a href='#pifdef'>%else</a></tt> 6839a243e69Sdrh<li><tt><a href='#pifdef'>%endif</a></tt> 6849a243e69Sdrh<li><tt><a href='#extraarg'>%extra_argument</a></tt> 6859a243e69Sdrh<li><tt><a href='#pfallback'>%fallback</a></tt> 6865f0d37b5Sdrh<li><tt><a href='#pifdef'>%if</a></tt> 6879a243e69Sdrh<li><tt><a href='#pifdef'>%ifdef</a></tt> 6889a243e69Sdrh<li><tt><a href='#pifdef'>%ifndef</a></tt> 6899a243e69Sdrh<li><tt><a href='#pinclude'>%include</a></tt> 6909a243e69Sdrh<li><tt><a href='#pleft'>%left</a></tt> 6919a243e69Sdrh<li><tt><a href='#pname'>%name</a></tt> 6929a243e69Sdrh<li><tt><a href='#pnonassoc'>%nonassoc</a></tt> 6939a243e69Sdrh<li><tt><a href='#parse_accept'>%parse_accept</a></tt> 6949a243e69Sdrh<li><tt><a href='#parse_failure'>%parse_failure</a></tt> 6959a243e69Sdrh<li><tt><a href='#pright'>%right</a></tt> 6969a243e69Sdrh<li><tt><a href='#stack_overflow'>%stack_overflow</a></tt> 6979a243e69Sdrh<li><tt><a href='#stack_size'>%stack_size</a></tt> 6989a243e69Sdrh<li><tt><a href='#start_symbol'>%start_symbol</a></tt> 6999a243e69Sdrh<li><tt><a href='#syntax_error'>%syntax_error</a></tt> 700*9cffb0ffSdrh<li><tt><a href='#token'>%token</a></tt> 7019a243e69Sdrh<li><tt><a href='#token_class'>%token_class</a></tt> 7029a243e69Sdrh<li><tt><a href='#token_destructor'>%token_destructor</a></tt> 7039a243e69Sdrh<li><tt><a href='#token_prefix'>%token_prefix</a></tt> 7049a243e69Sdrh<li><tt><a href='#token_type'>%token_type</a></tt> 7059a243e69Sdrh<li><tt><a href='#ptype'>%type</a></tt> 7069a243e69Sdrh<li><tt><a href='#pwildcard'>%wildcard</a></tt> 70775897234Sdrh</ul> 7087b852416Sdrh<p>Each of these directives will be described separately in the 70975897234Sdrhfollowing sections:</p> 71075897234Sdrh 7117b852416Sdrh<a id='pcode'></a> 71260c71b02Sdrh<h4>4.4.1 The <tt>%code</tt> directive</h4> 713f2340fc7Sdrh 7149a243e69Sdrh<p>The <tt>%code</tt> directive is used to specify additional C code that 715f2340fc7Sdrhis added to the end of the main output file. This is similar to 7169a243e69Sdrhthe <tt><a href='#pinclude'>%include</a></tt> directive except that 7179a243e69Sdrh<tt>%include</tt> is inserted at the beginning of the main output file.</p> 718f2340fc7Sdrh 7199a243e69Sdrh<p><tt>%code</tt> is typically used to include some action routines or perhaps 7209bccde3dSdrha tokenizer or even the "main()" function 7219bccde3dSdrhas part of the output file.</p> 722f2340fc7Sdrh 72360c71b02Sdrh<p>There can be multiple <tt>%code</tt> directives. The arguments of 72460c71b02Sdrhall <tt>%code</tt> directives are concatenated.</p> 72560c71b02Sdrh 7267b852416Sdrh<a id='default_destructor'></a> 72760c71b02Sdrh<h4>4.4.2 The <tt>%default_destructor</tt> directive</h4> 728f2340fc7Sdrh 7299a243e69Sdrh<p>The <tt>%default_destructor</tt> directive specifies a destructor to 730f2340fc7Sdrhuse for non-terminals that do not have their own destructor 7319a243e69Sdrhspecified by a separate <tt>%destructor</tt> directive. See the documentation 7327b852416Sdrhon the <tt><a href='#destructor'>%destructor</a></tt> directive below for 7339bccde3dSdrhadditional information.</p> 734f2340fc7Sdrh 7359a243e69Sdrh<p>In some grammars, many different non-terminal symbols have the 736f2340fc7Sdrhsame data type and hence the same destructor. This directive is 7379a243e69Sdrha convenient way to specify the same destructor for all those 738f2340fc7Sdrhnon-terminals using a single statement.</p> 739f2340fc7Sdrh 7407b852416Sdrh<a id='default_type'></a> 74160c71b02Sdrh<h4>4.4.3 The <tt>%default_type</tt> directive</h4> 742f2340fc7Sdrh 7439a243e69Sdrh<p>The <tt>%default_type</tt> directive specifies the data type of non-terminal 7449a243e69Sdrhsymbols that do not have their own data type defined using a separate 7459a243e69Sdrh<tt><a href='#ptype'>%type</a></tt> directive.</p> 746f2340fc7Sdrh 7477b852416Sdrh<a id='destructor'></a> 74860c71b02Sdrh<h4>4.4.4 The <tt>%destructor</tt> directive</h4> 74975897234Sdrh 7509a243e69Sdrh<p>The <tt>%destructor</tt> directive is used to specify a destructor for 75175897234Sdrha non-terminal symbol. 7529a243e69Sdrh(See also the <tt><a href='#token_destructor'>%token_destructor</a></tt> 7539bccde3dSdrhdirective which is used to specify a destructor for terminal symbols.)</p> 75475897234Sdrh 75575897234Sdrh<p>A non-terminal's destructor is called to dispose of the 75675897234Sdrhnon-terminal's value whenever the non-terminal is popped from 7577b852416Sdrhthe stack. This includes all of the following circumstances:</p> 75875897234Sdrh<ul> 75975897234Sdrh<li> When a rule reduces and the value of a non-terminal on 76075897234Sdrh the right-hand side is not linked to C code. 76175897234Sdrh<li> When the stack is popped during error processing. 76275897234Sdrh<li> When the ParseFree() function runs. 76375897234Sdrh</ul> 7647b852416Sdrh<p>The destructor can do whatever it wants with the value of 76575897234Sdrhthe non-terminal, but its design is to deallocate memory 76675897234Sdrhor other resources held by that non-terminal.</p> 76775897234Sdrh 7687b852416Sdrh<p>Consider an example:</p> 76975897234Sdrh<pre> 77075897234Sdrh %type nt {void*} 77175897234Sdrh %destructor nt { free($$); } 77275897234Sdrh nt(A) ::= ID NUM. { A = malloc( 100 ); } 77375897234Sdrh</pre> 7747b852416Sdrh<p>This example is a bit contrived, but it serves to illustrate how 77575897234Sdrhdestructors work. The example shows a non-terminal named 7769bccde3dSdrh"nt" that holds values of type "void*". When the rule for 7779bccde3dSdrhan "nt" reduces, it sets the value of the non-terminal to 77875897234Sdrhspace obtained from malloc(). Later, when the nt non-terminal 77975897234Sdrhis popped from the stack, the destructor will fire and call 78075897234Sdrhfree() on this malloced space, thus avoiding a memory leak. 7819bccde3dSdrh(Note that the symbol "$$" in the destructor code is replaced 78275897234Sdrhby the value of the non-terminal.)</p> 78375897234Sdrh 78475897234Sdrh<p>It is important to note that the value of a non-terminal is passed 78575897234Sdrhto the destructor whenever the non-terminal is removed from the 78675897234Sdrhstack, unless the non-terminal is used in a C-code action. If 78775897234Sdrhthe non-terminal is used by C-code, then it is assumed that the 7889bccde3dSdrhC-code will take care of destroying it. 7899bccde3dSdrhMore commonly, the value is used to build some 7909a243e69Sdrhlarger structure, and we don't want to destroy it, which is why 79175897234Sdrhthe destructor is not called in this circumstance.</p> 79275897234Sdrh 7939bccde3dSdrh<p>Destructors help avoid memory leaks by automatically freeing 7949bccde3dSdrhallocated objects when they go out of scope. 79575897234SdrhTo do the same using yacc or bison is much more difficult.</p> 79675897234Sdrh 7977b852416Sdrh<a id='extraarg'></a> 79860c71b02Sdrh<h4>4.4.5 The <tt>%extra_argument</tt> directive</h4> 79975897234Sdrh 8007b852416Sdrh<p>The <tt>%extra_argument</tt> directive instructs Lemon to add a 4th parameter 80175897234Sdrhto the parameter list of the Parse() function it generates. Lemon 80275897234Sdrhdoesn't do anything itself with this extra argument, but it does 80375897234Sdrhmake the argument available to C-code action routines, destructors, 80475897234Sdrhand so forth. For example, if the grammar file contains:</p> 80575897234Sdrh 8067b852416Sdrh<pre> 80775897234Sdrh %extra_argument { MyStruct *pAbc } 8087b852416Sdrh</pre> 80975897234Sdrh 81075897234Sdrh<p>Then the Parse() function generated will have an 4th parameter 8119bccde3dSdrhof type "MyStruct*" and all action routines will have access to 8129bccde3dSdrha variable named "pAbc" that is the value of the 4th parameter 81375897234Sdrhin the most recent call to Parse().</p> 81475897234Sdrh 815fb32c44eSdrh<p>The <tt>%extra_context</tt> directive works the same except that it 816fb32c44eSdrhis passed in on the ParseAlloc() or ParseInit() routines instead of 8177b852416Sdrhon Parse().</p> 818fb32c44eSdrh 8197b852416Sdrh<a id='extractx'></a> 82060c71b02Sdrh<h4>4.4.6 The <tt>%extra_context</tt> directive</h4> 821fb32c44eSdrh 8227b852416Sdrh<p>The <tt>%extra_context</tt> directive instructs Lemon to add a 2nd parameter 8237b852416Sdrhto the parameter list of the ParseAlloc() and ParseInit() functions. Lemon 824fb32c44eSdrhdoesn't do anything itself with these extra argument, but it does 825fb32c44eSdrhstore the value make it available to C-code action routines, destructors, 826fb32c44eSdrhand so forth. For example, if the grammar file contains:</p> 827fb32c44eSdrh 8287b852416Sdrh<pre> 829fb32c44eSdrh %extra_context { MyStruct *pAbc } 8307b852416Sdrh</pre> 831fb32c44eSdrh 832ed5e6688Sdrh<p>Then the ParseAlloc() and ParseInit() functions will have an 2nd parameter 833fb32c44eSdrhof type "MyStruct*" and all action routines will have access to 834ed5e6688Sdrha variable named "pAbc" that is the value of that 2nd parameter.</p> 835fb32c44eSdrh 836fb32c44eSdrh<p>The <tt>%extra_argument</tt> directive works the same except that it 8377b852416Sdrhis passed in on the Parse() routine instead of on ParseAlloc()/ParseInit().</p> 838fb32c44eSdrh 8397b852416Sdrh<a id='pfallback'></a> 84060c71b02Sdrh<h4>4.4.7 The <tt>%fallback</tt> directive</h4> 8419bccde3dSdrh 8429a243e69Sdrh<p>The <tt>%fallback</tt> directive specifies an alternative meaning for one 8439bccde3dSdrhor more tokens. The alternative meaning is tried if the original token 8449a243e69Sdrhwould have generated a syntax error.</p> 8459bccde3dSdrh 8469a243e69Sdrh<p>The <tt>%fallback</tt> directive was added to support robust parsing of SQL 8479a243e69Sdrhsyntax in <a href='https://www.sqlite.org/'>SQLite</a>. 8489bccde3dSdrhThe SQL language contains a large assortment of keywords, each of which 8499bccde3dSdrhappears as a different token to the language parser. SQL contains so 8509a243e69Sdrhmany keywords that it can be difficult for programmers to keep up with 8519bccde3dSdrhthem all. Programmers will, therefore, sometimes mistakenly use an 8529a243e69Sdrhobscure language keyword for an identifier. The <tt>%fallback</tt> directive 8539bccde3dSdrhprovides a mechanism to tell the parser: "If you are unable to parse 8549a243e69Sdrhthis keyword, try treating it as an identifier instead."</p> 8559bccde3dSdrh 8567b852416Sdrh<p>The syntax of <tt>%fallback</tt> is as follows:</p> 8579bccde3dSdrh 8589bccde3dSdrh<blockquote> 8599bccde3dSdrh<tt>%fallback</tt> <i>ID</i> <i>TOKEN...</i> <b>.</b> 8609a243e69Sdrh</blockquote></p> 8619bccde3dSdrh 8629a243e69Sdrh<p>In words, the <tt>%fallback</tt> directive is followed by a list of token 8639a243e69Sdrhnames terminated by a period. 8649a243e69SdrhThe first token name is the fallback token — the 8659bccde3dSdrhtoken to which all the other tokens fall back to. The second and subsequent 8669bccde3dSdrharguments are tokens which fall back to the token identified by the first 8679a243e69Sdrhargument.</p> 8689bccde3dSdrh 8697b852416Sdrh<a id='pifdef'></a> 87060c71b02Sdrh<h4>4.4.8 The <tt>%if</tt> directive and its friends</h4> 8719bccde3dSdrh 8725f0d37b5Sdrh<p>The <tt>%if</tt>, <tt>%ifdef</tt>, <tt>%ifndef</tt>, <tt>%else</tt>, 8735f0d37b5Sdrhand <tt>%endif</tt> directives 8745f0d37b5Sdrhare similar to #if, #ifdef, #ifndef, #else, and #endif in the C-preprocessor, 8759a243e69Sdrhjust not as general. 8769bccde3dSdrhEach of these directives must begin at the left margin. No whitespace 8779a243e69Sdrhis allowed between the "%" and the directive name.</p> 8789bccde3dSdrh 8799a243e69Sdrh<p>Grammar text in between "<tt>%ifdef MACRO</tt>" and the next nested 8809a243e69Sdrh"<tt>%endif</tt>" is 8819bccde3dSdrhignored unless the "-DMACRO" command-line option is used. Grammar text 8829a243e69Sdrhbetwen "<tt>%ifndef MACRO</tt>" and the next nested "<tt>%endif</tt>" is 8835f0d37b5Sdrhincluded except when the "-DMACRO" command-line option is used.<p> 8849bccde3dSdrh 8855f0d37b5Sdrh<p>The text in between "<tt>%if</tt> <i>CONDITIONAL</i>" and its 8865f0d37b5Sdrhcorresponding <tt>%endif</tt> is included only if <i>CONDITIONAL</i> 8875f0d37b5Sdrhis true. The CONDITION is one or more macro names, optionally connected 8885f0d37b5Sdrhusing the "||" and "&&" binary operators, the "!" unary operator, 8895f0d37b5Sdrhand grouped using balanced parentheses. Each term is true if the 8905f0d37b5Sdrhcorresponding macro exists, and false if it does not exist.</p> 8919bccde3dSdrh 8925f0d37b5Sdrh<p>An optional "<tt>%else</tt>" directive can occur anywhere in between a 8935f0d37b5Sdrh<tt>%ifdef</tt>, <tt>%ifndef</tt>, or <tt>%if</tt> directive and 8945f0d37b5Sdrhits corresponding <tt>%endif</tt>.</p> 8955f0d37b5Sdrh 8965f0d37b5Sdrh<p>Note that the argument to <tt>%ifdef</tt> and <tt>%ifndef</tt> is 8975f0d37b5Sdrhintended to be a single preprocessor symbol name, not a general expression. 8985f0d37b5SdrhUse the "<tt>%if</tt>" directive for general expressions.</p> 8999bccde3dSdrh 9007b852416Sdrh<a id='pinclude'></a> 90160c71b02Sdrh<h4>4.4.9 The <tt>%include</tt> directive</h4> 90275897234Sdrh 9039a243e69Sdrh<p>The <tt>%include</tt> directive specifies C code that is included at the 9049a243e69Sdrhtop of the generated parser. You can include any text you want — 905f2340fc7Sdrhthe Lemon parser generator copies it blindly. If you have multiple 9069a243e69Sdrh<tt>%include</tt> directives in your grammar file, their values are concatenated 9079a243e69Sdrhso that all <tt>%include</tt> code ultimately appears near the top of the 9089a243e69Sdrhgenerated parser, in the same order as it appeared in the grammar.</p> 90975897234Sdrh 9109a243e69Sdrh<p>The <tt>%include</tt> directive is very handy for getting some extra #include 91175897234Sdrhpreprocessor statements at the beginning of the generated parser. 91275897234SdrhFor example:</p> 91375897234Sdrh 9147b852416Sdrh<pre> 91575897234Sdrh %include {#include <unistd.h>} 9167b852416Sdrh</pre> 91775897234Sdrh 91875897234Sdrh<p>This might be needed, for example, if some of the C actions in the 9199a243e69Sdrhgrammar call functions that are prototyped in unistd.h.</p> 92075897234Sdrh 92160ce5d31Sdrh<p>Use the <tt><a href="#pcode">%code</a></tt> directive to add code to 92260ce5d31Sdrhthe end of the generated parser.</p> 92360ce5d31Sdrh 9247b852416Sdrh<a id='pleft'></a> 92560c71b02Sdrh<h4>4.4.10 The <tt>%left</tt> directive</h4> 92675897234Sdrh 9279a243e69SdrhThe <tt>%left</tt> directive is used (along with the 9289a243e69Sdrh<tt><a href='#pright'>%right</a></tt> and 9299a243e69Sdrh<tt><a href='#pnonassoc'>%nonassoc</a></tt> directives) to declare 9309a243e69Sdrhprecedences of terminal symbols. 9319a243e69SdrhEvery terminal symbol whose name appears after 9329a243e69Sdrha <tt>%left</tt> directive but before the next period (".") is 93375897234Sdrhgiven the same left-associative precedence value. Subsequent 9349a243e69Sdrh<tt>%left</tt> directives have higher precedence. For example:</p> 93575897234Sdrh 9367b852416Sdrh<pre> 93775897234Sdrh %left AND. 93875897234Sdrh %left OR. 93975897234Sdrh %nonassoc EQ NE GT GE LT LE. 94075897234Sdrh %left PLUS MINUS. 94175897234Sdrh %left TIMES DIVIDE MOD. 94275897234Sdrh %right EXP NOT. 9437b852416Sdrh</pre> 94475897234Sdrh 9459a243e69Sdrh<p>Note the period that terminates each <tt>%left</tt>, 9469a243e69Sdrh<tt>%right</tt> or <tt>%nonassoc</tt> 94775897234Sdrhdirective.</p> 94875897234Sdrh 94975897234Sdrh<p>LALR(1) grammars can get into a situation where they require 95075897234Sdrha large amount of stack space if you make heavy use or right-associative 9519a243e69Sdrhoperators. For this reason, it is recommended that you use <tt>%left</tt> 9529a243e69Sdrhrather than <tt>%right</tt> whenever possible.</p> 95375897234Sdrh 9547b852416Sdrh<a id='pname'></a> 95560c71b02Sdrh<h4>4.4.11 The <tt>%name</tt> directive</h4> 95675897234Sdrh 95775897234Sdrh<p>By default, the functions generated by Lemon all begin with the 9589bccde3dSdrhfive-character string "Parse". You can change this string to something 9599a243e69Sdrhdifferent using the <tt>%name</tt> directive. For instance:</p> 96075897234Sdrh 9617b852416Sdrh<pre> 96275897234Sdrh %name Abcde 9637b852416Sdrh</pre> 96475897234Sdrh 96575897234Sdrh<p>Putting this directive in the grammar file will cause Lemon to generate 9667b852416Sdrhfunctions named</p> 96775897234Sdrh<ul> 96875897234Sdrh<li> AbcdeAlloc(), 96975897234Sdrh<li> AbcdeFree(), 97075897234Sdrh<li> AbcdeTrace(), and 97175897234Sdrh<li> Abcde(). 97275897234Sdrh</ul> 9737b852416Sdrh</p>The <tt>%name</tt> directive allows you to generate two or more different 9749a243e69Sdrhparsers and link them all into the same executable.</p> 97575897234Sdrh 9767b852416Sdrh<a id='pnonassoc'></a> 97760c71b02Sdrh<h4>4.4.12 The <tt>%nonassoc</tt> directive</h4> 97875897234Sdrh 97975897234Sdrh<p>This directive is used to assign non-associative precedence to 9809bccde3dSdrhone or more terminal symbols. See the section on 9819bccde3dSdrh<a href='#precrules'>precedence rules</a> 9829a243e69Sdrhor on the <tt><a href='#pleft'>%left</a></tt> directive 9839a243e69Sdrhfor additional information.</p> 98475897234Sdrh 9857b852416Sdrh<a id='parse_accept'></a> 98660c71b02Sdrh<h4>4.4.13 The <tt>%parse_accept</tt> directive</h4> 98775897234Sdrh 9889a243e69Sdrh<p>The <tt>%parse_accept</tt> directive specifies a block of C code that is 9899bccde3dSdrhexecuted whenever the parser accepts its input string. To "accept" 99075897234Sdrhan input string means that the parser was able to process all tokens 99175897234Sdrhwithout error.</p> 99275897234Sdrh 99375897234Sdrh<p>For example:</p> 99475897234Sdrh 9957b852416Sdrh<pre> 99675897234Sdrh %parse_accept { 99775897234Sdrh printf("parsing complete!\n"); 99875897234Sdrh } 9997b852416Sdrh</pre> 100075897234Sdrh 10017b852416Sdrh<a id='parse_failure'></a> 100260c71b02Sdrh<h4>4.4.14 The <tt>%parse_failure</tt> directive</h4> 100375897234Sdrh 10049a243e69Sdrh<p>The <tt>%parse_failure</tt> directive specifies a block of C code that 100575897234Sdrhis executed whenever the parser fails complete. This code is not 100675897234Sdrhexecuted until the parser has tried and failed to resolve an input 100775897234Sdrherror using is usual error recovery strategy. The routine is 100875897234Sdrhonly invoked when parsing is unable to continue.</p> 100975897234Sdrh 10107b852416Sdrh<pre> 101175897234Sdrh %parse_failure { 101275897234Sdrh fprintf(stderr,"Giving up. Parser is hopelessly lost...\n"); 101375897234Sdrh } 10147b852416Sdrh</pre> 101575897234Sdrh 10167b852416Sdrh<a id='pright'></a> 101760c71b02Sdrh<h4>4.4.15 The <tt>%right</tt> directive</h4> 101875897234Sdrh 101975897234Sdrh<p>This directive is used to assign right-associative precedence to 10209bccde3dSdrhone or more terminal symbols. See the section on 10219bccde3dSdrh<a href='#precrules'>precedence rules</a> 10229bccde3dSdrhor on the <a href='#pleft'>%left</a> directive for additional information.</p> 102375897234Sdrh 10247b852416Sdrh<a id='stack_overflow'></a> 102560c71b02Sdrh<h4>4.4.16 The <tt>%stack_overflow</tt> directive</h4> 102675897234Sdrh 10279a243e69Sdrh<p>The <tt>%stack_overflow</tt> directive specifies a block of C code that 102875897234Sdrhis executed if the parser's internal stack ever overflows. Typically 102975897234Sdrhthis just prints an error message. After a stack overflow, the parser 103075897234Sdrhwill be unable to continue and must be reset.</p> 103175897234Sdrh 10327b852416Sdrh<pre> 103375897234Sdrh %stack_overflow { 103475897234Sdrh fprintf(stderr,"Giving up. Parser stack overflow\n"); 103575897234Sdrh } 10367b852416Sdrh</pre> 103775897234Sdrh 103875897234Sdrh<p>You can help prevent parser stack overflows by avoiding the use 103975897234Sdrhof right recursion and right-precedence operators in your grammar. 10409a243e69SdrhUse left recursion and and left-precedence operators instead to 104175897234Sdrhencourage rules to reduce sooner and keep the stack size down. 10427b852416SdrhFor example, do rules like this:</p> 104375897234Sdrh<pre> 104475897234Sdrh list ::= list element. // left-recursion. Good! 104575897234Sdrh list ::= . 104675897234Sdrh</pre> 10477b852416Sdrh<p>Not like this:</p> 104875897234Sdrh<pre> 104975897234Sdrh list ::= element list. // right-recursion. Bad! 105075897234Sdrh list ::= . 10517b852416Sdrh</pre> 105275897234Sdrh 10537b852416Sdrh<a id='stack_size'></a> 105460c71b02Sdrh<h4>4.4.17 The <tt>%stack_size</tt> directive</h4> 105575897234Sdrh 105675897234Sdrh<p>If stack overflow is a problem and you can't resolve the trouble 105775897234Sdrhby using left-recursion, then you might want to increase the size 105875897234Sdrhof the parser's stack using this directive. Put an positive integer 10599a243e69Sdrhafter the <tt>%stack_size</tt> directive and Lemon will generate a parse 106075897234Sdrhwith a stack of the requested size. The default value is 100.</p> 106175897234Sdrh 10627b852416Sdrh<pre> 106375897234Sdrh %stack_size 2000 10647b852416Sdrh</pre> 106575897234Sdrh 10667b852416Sdrh<a id='start_symbol'></a> 106760c71b02Sdrh<h4>4.4.18 The <tt>%start_symbol</tt> directive</h4> 106875897234Sdrh 10699a243e69Sdrh<p>By default, the start symbol for the grammar that Lemon generates 107075897234Sdrhis the first non-terminal that appears in the grammar file. But you 10719a243e69Sdrhcan choose a different start symbol using the 10729a243e69Sdrh<tt>%start_symbol</tt> directive.</p> 107375897234Sdrh 10747b852416Sdrh<pre> 107575897234Sdrh %start_symbol prog 10767b852416Sdrh</pre> 107775897234Sdrh 10787b852416Sdrh<a id='syntax_error'></a> 107960c71b02Sdrh<h4>4.4.19 The <tt>%syntax_error</tt> directive</h4> 10809a243e69Sdrh 10811e2896ecSdrh<p>See <a href='#errors'>Error Processing</a>.</p> 10829a243e69Sdrh 1083*9cffb0ffSdrh<a id='token'></a> 1084*9cffb0ffSdrh<h4>4.4.20 The <tt>%token</tt> directive</h4> 1085*9cffb0ffSdrh 1086*9cffb0ffSdrh<p>Tokens are normally created automatically, the first time they are used. 1087*9cffb0ffSdrhAny identifier that begins with an upper-case letter is a token. 1088*9cffb0ffSdrh 1089*9cffb0ffSdrh<p>Sometimes it is useful to declare tokens in advance, however. The 1090*9cffb0ffSdrhinteger values assigned to each token determined by the order in which 1091*9cffb0ffSdrhthe tokens are seen. So by declaring tokens in advance, it is possible to 1092*9cffb0ffSdrhcause some tokens to have low-numbered values, which might be desirable in 1093*9cffb0ffSdrhsome grammers, or to have sequential values assigned to a sequence of 1094*9cffb0ffSdrhrelated tokens. For this reason, the %token directive is provided to 1095*9cffb0ffSdrhdeclare tokens in advance. The syntax is as follows: 1096*9cffb0ffSdrh 1097*9cffb0ffSdrh<blockquote> 1098*9cffb0ffSdrh<tt>%token</tt> <i>TOKEN</i> <i>TOKEN...</i> <b>.</b> 1099*9cffb0ffSdrh</blockquote></p> 1100*9cffb0ffSdrh 1101*9cffb0ffSdrh<p>The %token directive is followed by zero or more token symbols and 1102*9cffb0ffSdrhterminated by a single ".". Each token named is created if it does not 1103*9cffb0ffSdrhalready exist. Tokens are created in order. 1104*9cffb0ffSdrh 1105*9cffb0ffSdrh 11067b852416Sdrh<a id='token_class'></a> 1107*9cffb0ffSdrh<h4>4.4.21 The <tt>%token_class</tt> directive</h4> 11089a243e69Sdrh 11099a243e69Sdrh<p>Undocumented. Appears to be related to the MULTITERMINAL concept. 11109a243e69Sdrh<a href='http://sqlite.org/src/fdiff?v1=796930d5fc2036c7&v2=624b24c5dc048e09&sbs=0'>Implementation</a>.</p> 11119a243e69Sdrh 11127b852416Sdrh<a id='token_destructor'></a> 1113*9cffb0ffSdrh<h4>4.4.22 The <tt>%token_destructor</tt> directive</h4> 111475897234Sdrh 11159a243e69Sdrh<p>The <tt>%destructor</tt> directive assigns a destructor to a non-terminal 11169a243e69Sdrhsymbol. (See the description of the 11179a243e69Sdrh<tt><a href='%destructor'>%destructor</a></tt> directive above.) 11189a243e69SdrhThe <tt>%token_destructor</tt> directive does the same thing 11199a243e69Sdrhfor all terminal symbols.</p> 112075897234Sdrh 11217b852416Sdrh<p>Unlike non-terminal symbols, which may each have a different data type 112275897234Sdrhfor their values, terminals all use the same data type (defined by 11239a243e69Sdrhthe <tt><a href='#token_type'>%token_type</a></tt> directive) 11249a243e69Sdrhand so they use a common destructor. 11259a243e69SdrhOther than that, the token destructor works just like the non-terminal 112675897234Sdrhdestructors.</p> 112775897234Sdrh 11287b852416Sdrh<a id='token_prefix'></a> 1129*9cffb0ffSdrh<h4>4.4.23 The <tt>%token_prefix</tt> directive</h4> 113075897234Sdrh 113175897234Sdrh<p>Lemon generates #defines that assign small integer constants 113275897234Sdrhto each terminal symbol in the grammar. If desired, Lemon will 113375897234Sdrhadd a prefix specified by this directive 11349a243e69Sdrhto each of the #defines it generates.</p> 11359a243e69Sdrh 11367b852416Sdrh<p>So if the default output of Lemon looked like this:</p> 113775897234Sdrh<pre> 113875897234Sdrh #define AND 1 113975897234Sdrh #define MINUS 2 114075897234Sdrh #define OR 3 114175897234Sdrh #define PLUS 4 114275897234Sdrh</pre> 11437b852416Sdrh<p>You can insert a statement into the grammar like this:</p> 114475897234Sdrh<pre> 114575897234Sdrh %token_prefix TOKEN_ 114675897234Sdrh</pre> 11477b852416Sdrh<p>to cause Lemon to produce these symbols instead:</p> 114875897234Sdrh<pre> 114975897234Sdrh #define TOKEN_AND 1 115075897234Sdrh #define TOKEN_MINUS 2 115175897234Sdrh #define TOKEN_OR 3 115275897234Sdrh #define TOKEN_PLUS 4 11537b852416Sdrh</pre> 115475897234Sdrh 11557b852416Sdrh<a id='token_type'></a><a id='ptype'></a> 1156*9cffb0ffSdrh<h4>4.4.24 The <tt>%token_type</tt> and <tt>%type</tt> directives</h4> 115775897234Sdrh 115875897234Sdrh<p>These directives are used to specify the data types for values 115975897234Sdrhon the parser's stack associated with terminal and non-terminal 116075897234Sdrhsymbols. The values of all terminal symbols must be of the same 116175897234Sdrhtype. This turns out to be the same data type as the 3rd parameter 116275897234Sdrhto the Parse() function generated by Lemon. Typically, you will 1163ed5e6688Sdrhmake the value of a terminal symbol be a pointer to some kind of 116475897234Sdrhtoken structure. Like this:</p> 116575897234Sdrh 11667b852416Sdrh<pre> 116775897234Sdrh %token_type {Token*} 11687b852416Sdrh</pre> 116975897234Sdrh 117075897234Sdrh<p>If the data type of terminals is not specified, the default value 1171dfe4e6bbSdrhis "void*".</p> 117275897234Sdrh 117375897234Sdrh<p>Non-terminal symbols can each have their own data types. Typically 11749a243e69Sdrhthe data type of a non-terminal is a pointer to the root of a parse tree 117575897234Sdrhstructure that contains all information about that non-terminal. 117675897234SdrhFor example:</p> 117775897234Sdrh 11787b852416Sdrh<pre> 117975897234Sdrh %type expr {Expr*} 11807b852416Sdrh</pre> 118175897234Sdrh 118275897234Sdrh<p>Each entry on the parser's stack is actually a union containing 118375897234Sdrhinstances of all data types for every non-terminal and terminal symbol. 118475897234SdrhLemon will automatically use the correct element of this union depending 118575897234Sdrhon what the corresponding non-terminal or terminal symbol is. But 118675897234Sdrhthe grammar designer should keep in mind that the size of the union 118775897234Sdrhwill be the size of its largest element. So if you have a single 118875897234Sdrhnon-terminal whose data type requires 1K of storage, then your 100 118975897234Sdrhentry parser stack will require 100K of heap space. If you are willing 119075897234Sdrhand able to pay that price, fine. You just need to know.</p> 119175897234Sdrh 11927b852416Sdrh<a id='pwildcard'></a> 1193*9cffb0ffSdrh<h4>4.4.25 The <tt>%wildcard</tt> directive</h4> 11949bccde3dSdrh 11959a243e69Sdrh<p>The <tt>%wildcard</tt> directive is followed by a single token name and a 11969bccde3dSdrhperiod. This directive specifies that the identified token should 11979a243e69Sdrhmatch any input token.</p> 11989bccde3dSdrh 11999bccde3dSdrh<p>When the generated parser has the choice of matching an input against 12009bccde3dSdrhthe wildcard token and some other token, the other token is always used. 12019a243e69SdrhThe wildcard token is only matched if there are no alternatives.</p> 12029bccde3dSdrh 12031e2896ecSdrh<a id='errors'></a> 120460c71b02Sdrh<h2>5.0 Error Processing</h2> 120575897234Sdrh 120675897234Sdrh<p>After extensive experimentation over several years, it has been 120775897234Sdrhdiscovered that the error recovery strategy used by yacc is about 120875897234Sdrhas good as it gets. And so that is what Lemon uses.</p> 120975897234Sdrh 121075897234Sdrh<p>When a Lemon-generated parser encounters a syntax error, it 12119a243e69Sdrhfirst invokes the code specified by the <tt>%syntax_error</tt> directive, if 121275897234Sdrhany. It then enters its error recovery strategy. The error recovery 121375897234Sdrhstrategy is to begin popping the parsers stack until it enters a 121475897234Sdrhstate where it is permitted to shift a special non-terminal symbol 12159bccde3dSdrhnamed "error". It then shifts this non-terminal and continues 12169a243e69Sdrhparsing. The <tt>%syntax_error</tt> routine will not be called again 121775897234Sdrhuntil at least three new tokens have been successfully shifted.</p> 121875897234Sdrh 121975897234Sdrh<p>If the parser pops its stack until the stack is empty, and it still 12209a243e69Sdrhis unable to shift the error symbol, then the 12219a243e69Sdrh<tt><a href='#parse_failure'>%parse_failure</a></tt> routine 122275897234Sdrhis invoked and the parser resets itself to its start state, ready 122375897234Sdrhto begin parsing a new file. This is what will happen at the very 122475897234Sdrhfirst syntax error, of course, if there are no instances of the 12259bccde3dSdrh"error" non-terminal in your grammar.</p> 122675897234Sdrh 122760c71b02Sdrh<a id='history'></a> 122860c71b02Sdrh<h2>6.0 History of Lemon</h2> 122960c71b02Sdrh 123060c71b02Sdrh<p>Lemon was originally written by Richard Hipp sometime in the late 123160c71b02Sdrh1980s on a Sun4 Workstation using K&R C. 123260c71b02SdrhThere was a companion LL(1) parser generator program named "Lime", the 123360c71b02Sdrhsource code to which as been lost.</p> 123460c71b02Sdrh 123560c71b02Sdrh<p>The lemon.c source file was originally many separate files that were 123660c71b02Sdrhcompiled together to generate the "lemon" executable. Sometime in the 123760c71b02Sdrh1990s, the individual source code files were combined together into 123860c71b02Sdrhthe current single large "lemon.c" source file. You can still see traces 123960c71b02Sdrhof original filenames in the code.</p> 124060c71b02Sdrh 124160c71b02Sdrh<p>Since 2001, Lemon has been part of the 124260c71b02Sdrh<a href="https://sqlite.org/">SQLite project</a> and the source code 124360c71b02Sdrhto Lemon has been managed as a part of the 124460c71b02Sdrh<a href="https://sqlite.org/src">SQLite source tree</a> in the following 124560c71b02Sdrhfiles:</p> 124660c71b02Sdrh 124760c71b02Sdrh<ul> 124860c71b02Sdrh<li> <a href="https://sqlite.org/src/file/tool/lemon.c">tool/lemon.c</a> 124960c71b02Sdrh<li> <a href="https://sqlite.org/src/file/tool/lempar.c">tool/lempar.c</a> 125060c71b02Sdrh<li> <a href="https://sqlite.org/src/file/doc/lemon.html">doc/lemon.html</a> 125160c71b02Sdrh</ul> 125260c71b02Sdrh 125360c71b02Sdrh<a id="copyright"></a> 125460c71b02Sdrh<h2>7.0 Copyright</h2> 125560c71b02Sdrh 125660c71b02Sdrh<p>All of the source code to Lemon, including the template parser file 125760c71b02Sdrh"lempar.c" and this documentation file ("lemon.html") are in the public 125860c71b02Sdrhdomain. You can use the code for any purpose and without attribution.</p> 125960c71b02Sdrh 126060c71b02Sdrh<p>The code comes with no warranty. If it breaks, you get to keep both 126160c71b02Sdrhpieces.</p> 126260c71b02Sdrh 126375897234Sdrh</body> 126475897234Sdrh</html> 1265