xref: /sqlite-3.40.0/doc/lemon.html (revision 9cffb0ff)
175897234Sdrh<html>
275897234Sdrh<head>
375897234Sdrh<title>The Lemon Parser Generator</title>
475897234Sdrh</head>
560c71b02Sdrh<body>
660c71b02Sdrh<a id="main"></a>
79a243e69Sdrh<h1 align='center'>The Lemon Parser Generator</h1>
875897234Sdrh
99bccde3dSdrh<p>Lemon is an LALR(1) parser generator for C.
109bccde3dSdrhIt does the same job as "bison" and "yacc".
119a243e69SdrhBut Lemon is not a bison or yacc clone.  Lemon
1275897234Sdrhuses a different grammar syntax which is designed to
139bccde3dSdrhreduce the number of coding errors.  Lemon also uses a
149bccde3dSdrhparsing engine that is faster than yacc and
159bccde3dSdrhbison and which is both reentrant and threadsafe.
169bccde3dSdrh(Update: Since the previous sentence was written, bison
179bccde3dSdrhhas also been updated so that it too can generate a
189bccde3dSdrhreentrant and threadsafe parser.)
199bccde3dSdrhLemon also implements features that can be used
209a243e69Sdrhto eliminate resource leaks, making it suitable for use
2175897234Sdrhin long-running programs such as graphical user interfaces
2275897234Sdrhor embedded controllers.</p>
2375897234Sdrh
2475897234Sdrh<p>This document is an introduction to the Lemon
2575897234Sdrhparser generator.</p>
2675897234Sdrh
2760c71b02Sdrh<a id="toc"></a>
2860c71b02Sdrh<h2>1.0 Table of Contents</h2>
2960c71b02Sdrh<ul>
3060c71b02Sdrh<li><a href="#main">Introduction</a>
3160c71b02Sdrh<li><a href="#toc">1.0 Table of Contents</a>
3260c71b02Sdrh<li><a href="#secnot">2.0 Security Notes</a><br>
3360c71b02Sdrh<li><a href="#optheory">3.0 Theory of Operation</a>
3460c71b02Sdrh    <ul>
3560c71b02Sdrh    <li><a href="#options">3.1 Command Line Options</a>
3660c71b02Sdrh    <li><a href="#interface">3.2 The Parser Interface</a>
3760c71b02Sdrh        <ul>
3860c71b02Sdrh        <li><a href="#onstack">3.2.1 Allocating The Parse Object On Stack</a>
3960c71b02Sdrh        <li><a href="#ifsum">3.2.2 Interface Summary</a>
4060c71b02Sdrh        </ul>
4160c71b02Sdrh    <li><a href="#yaccdiff">3.3 Differences With YACC and BISON</a>
4260c71b02Sdrh    <li><a href="#build">3.4 Building The "lemon" Or "lemon.exe" Executable</a>
4360c71b02Sdrh    </ul>
4460c71b02Sdrh<li><a href="#syntax">4.0 Input File Syntax</a>
4560c71b02Sdrh    <ul>
4660c71b02Sdrh    <li><a href="#tnt">4.1 Terminals and Nonterminals</a>
4760c71b02Sdrh    <li><a href="#rules">4.2 Grammar Rules</a>
4860c71b02Sdrh    <li><a href="#precrules">4.3 Precedence Rules</a>
4960c71b02Sdrh    <li><a href="#special">4.4 Special Directives</a>
5060c71b02Sdrh    </ul>
5160c71b02Sdrh<li><a href="#errors">5.0 Error Processing</a>
5260c71b02Sdrh<li><a href="#history">6.0 History of Lemon</a>
5360c71b02Sdrh<li><a href="#copyright">7.0 Copyright</a>
5460c71b02Sdrh</ul>
5560c71b02Sdrh
5660c71b02Sdrh<a id="secnot"></a>
5760c71b02Sdrh<h2>2.0 Security Note</h2>
58c5e56b34Sdrh
59c5e56b34Sdrh<p>The language parser code created by Lemon is very robust and
60c5e56b34Sdrhis well-suited for use in internet-facing applications that need to
617b852416Sdrhsafely process maliciously crafted inputs.</p>
62c5e56b34Sdrh
63c5e56b34Sdrh<p>The "lemon.exe" command-line tool itself works great when given a valid
64c5e56b34Sdrhinput grammar file and almost always gives helpful
65c5e56b34Sdrherror messages for malformed inputs.  However,  it is possible for
66c5e56b34Sdrha malicious user to craft a grammar file that will cause
67c5e56b34Sdrhlemon.exe to crash.
68c5e56b34SdrhWe do not see this as a problem, as lemon.exe is not intended to be used
69c5e56b34Sdrhwith hostile inputs.
70c5e56b34SdrhTo summarize:</p>
71c5e56b34Sdrh
72c5e56b34Sdrh<ul>
73c5e56b34Sdrh<li>Parser code generated by lemon &rarr; Robust and secure
74c5e56b34Sdrh<li>The "lemon.exe" command line tool itself &rarr; Not so much
75c5e56b34Sdrh</ul>
76c5e56b34Sdrh
7760c71b02Sdrh<a id="optheory"></a>
7860c71b02Sdrh<h2>3.0 Theory of Operation</h2>
7975897234Sdrh
8060c71b02Sdrh<p>Lemon is computer program that translates a context free grammar (CFG)
8175897234Sdrhfor a particular language into C code that implements a parser for
8275897234Sdrhthat language.
8360c71b02SdrhThe Lemon program has two inputs:</p>
8475897234Sdrh<ul>
8575897234Sdrh<li>The grammar specification.
8675897234Sdrh<li>A parser template file.
8775897234Sdrh</ul>
887b852416Sdrh<p>Typically, only the grammar specification is supplied by the programmer.
8960c71b02SdrhLemon comes with a default parser template
9060c71b02Sdrh("<a href="https://sqlite.org/src/file/tool/lempar.c">lempar.c</a>")
9160c71b02Sdrhthat works fine for most applications.  But the user is free to substitute
9260c71b02Sdrha different parser template if desired.</p>
9375897234Sdrh
949a243e69Sdrh<p>Depending on command-line options, Lemon will generate up to
957b852416Sdrhthree output files.</p>
9675897234Sdrh<ul>
9760c71b02Sdrh<li>C code to implement a parser for the input grammar.
9860c71b02Sdrh<li>A header file defining an integer ID for each terminal symbol
9960c71b02Sdrh    (or "token").
10075897234Sdrh<li>An information file that describes the states of the generated parser
10175897234Sdrh    automaton.
10275897234Sdrh</ul>
1037b852416Sdrh<p>By default, all three of these output files are generated.
1049bccde3dSdrhThe header file is suppressed if the "-m" command-line option is
1059bccde3dSdrhused and the report file is omitted when "-q" is selected.</p>
10675897234Sdrh
1079bccde3dSdrh<p>The grammar specification file uses a ".y" suffix, by convention.
10875897234SdrhIn the examples used in this document, we'll assume the name of the
1099bccde3dSdrhgrammar file is "gram.y".  A typical use of Lemon would be the
1107b852416Sdrhfollowing command:</p>
11175897234Sdrh<pre>
11275897234Sdrh   lemon gram.y
11375897234Sdrh</pre>
1147b852416Sdrh<p>This command will generate three output files named "gram.c",
1159bccde3dSdrh"gram.h" and "gram.out".
11675897234SdrhThe first is C code to implement the parser.  The second
11775897234Sdrhis the header file that defines numerical values for all
11875897234Sdrhterminal symbols, and the last is the report that explains
11975897234Sdrhthe states used by the parser automaton.</p>
12075897234Sdrh
12160c71b02Sdrh<a id="options"></a>
12260c71b02Sdrh<h3>3.1 Command Line Options</h3>
12375897234Sdrh
12475897234Sdrh<p>The behavior of Lemon can be modified using command-line options.
12575897234SdrhYou can obtain a list of the available command-line options together
1267b852416Sdrhwith a brief explanation of what each does by typing</p>
12775897234Sdrh<pre>
1289a243e69Sdrh   lemon "-?"
12975897234Sdrh</pre>
1307b852416Sdrh<p>As of this writing, the following command-line options are supported:</p>
13175897234Sdrh<ul>
1329bccde3dSdrh<li><b>-b</b>
1339bccde3dSdrhShow only the basis for each parser state in the report file.
1349bccde3dSdrh<li><b>-c</b>
1359a243e69SdrhDo not compress the generated action tables.  The parser will be a
1369a243e69Sdrhlittle larger and slower, but it will detect syntax errors sooner.
137fb32c44eSdrh<li><b>-d</b><i>directory</i>
138fb32c44eSdrhWrite all output files into <i>directory</i>.  Normally, output files
139fb32c44eSdrhare written into the directory that contains the input grammar file.
1409bccde3dSdrh<li><b>-D<i>name</i></b>
1419a243e69SdrhDefine C preprocessor macro <i>name</i>.  This macro is usable by
1425f0d37b5Sdrh"<tt><a href='#pifdef'>%ifdef</a></tt>",
1435f0d37b5Sdrh"<tt><a href='#pifdef'>%ifndef</a></tt>", and
1445f0d37b5Sdrh"<tt><a href="#pifdef">%if</a></tt> lines
1459a243e69Sdrhin the grammar file.
1465f0d37b5Sdrh<li><b>-E</b>
1475f0d37b5SdrhRun the "%if" preprocessor step only and print the revised grammar
1485f0d37b5Sdrhfile.
1499bccde3dSdrh<li><b>-g</b>
1509bccde3dSdrhDo not generate a parser.  Instead write the input grammar to standard
1519bccde3dSdrhoutput with all comments, actions, and other extraneous text removed.
1529bccde3dSdrh<li><b>-l</b>
153dfe4e6bbSdrhOmit "#line" directives in the generated parser C code.
1549bccde3dSdrh<li><b>-m</b>
1559bccde3dSdrhCause the output C source code to be compatible with the "makeheaders"
1569bccde3dSdrhprogram.
1579bccde3dSdrh<li><b>-p</b>
1589bccde3dSdrhDisplay all conflicts that are resolved by
1599bccde3dSdrh<a href='#precrules'>precedence rules</a>.
1609bccde3dSdrh<li><b>-q</b>
1619bccde3dSdrhSuppress generation of the report file.
1629bccde3dSdrh<li><b>-r</b>
1639bccde3dSdrhDo not sort or renumber the parser states as part of optimization.
1649bccde3dSdrh<li><b>-s</b>
165ed5e6688SdrhShow parser statistics before exiting.
1669bccde3dSdrh<li><b>-T<i>file</i></b>
1679bccde3dSdrhUse <i>file</i> as the template for the generated C-code parser implementation.
1689bccde3dSdrh<li><b>-x</b>
1699bccde3dSdrhPrint the Lemon version number.
17075897234Sdrh</ul>
17175897234Sdrh
17260c71b02Sdrh<a id="interface"></a>
17360c71b02Sdrh<h3>3.2 The Parser Interface</h3>
17475897234Sdrh
17575897234Sdrh<p>Lemon doesn't generate a complete, working program.  It only generates
17675897234Sdrha few subroutines that implement a parser.  This section describes
17775897234Sdrhthe interface to those subroutines.  It is up to the programmer to
17875897234Sdrhcall these subroutines in an appropriate way in order to produce a
17975897234Sdrhcomplete system.</p>
18075897234Sdrh
18175897234Sdrh<p>Before a program begins using a Lemon-generated parser, the program
18275897234Sdrhmust first create the parser.
1837b852416SdrhA new parser is created as follows:</p>
18475897234Sdrh<pre>
18575897234Sdrh   void *pParser = ParseAlloc( malloc );
18675897234Sdrh</pre>
1877b852416Sdrh<p>The ParseAlloc() routine allocates and initializes a new parser and
18875897234Sdrhreturns a pointer to it.
1899bccde3dSdrhThe actual data structure used to represent a parser is opaque &mdash;
19075897234Sdrhits internal structure is not visible or usable by the calling routine.
19175897234SdrhFor this reason, the ParseAlloc() routine returns a pointer to void
19275897234Sdrhrather than a pointer to some particular structure.
19375897234SdrhThe sole argument to the ParseAlloc() routine is a pointer to the
1949bccde3dSdrhsubroutine used to allocate memory.  Typically this means malloc().</p>
19575897234Sdrh
19675897234Sdrh<p>After a program is finished using a parser, it can reclaim all
1977b852416Sdrhmemory allocated by that parser by calling</p>
19875897234Sdrh<pre>
19975897234Sdrh   ParseFree(pParser, free);
20075897234Sdrh</pre>
2017b852416Sdrh<p>The first argument is the same pointer returned by ParseAlloc().  The
20275897234Sdrhsecond argument is a pointer to the function used to release bulk
20375897234Sdrhmemory back to the system.</p>
20475897234Sdrh
20575897234Sdrh<p>After a parser has been allocated using ParseAlloc(), the programmer
20675897234Sdrhmust supply the parser with a sequence of tokens (terminal symbols) to
20775897234Sdrhbe parsed.  This is accomplished by calling the following function
2087b852416Sdrhonce for each token:<p>
20975897234Sdrh<pre>
21075897234Sdrh   Parse(pParser, hTokenID, sTokenData, pArg);
21175897234Sdrh</pre>
2127b852416Sdrh<p>The first argument to the Parse() routine is the pointer returned by
21375897234SdrhParseAlloc().
2149a243e69SdrhThe second argument is a small positive integer that tells the parser the
21575897234Sdrhtype of the next token in the data stream.
21675897234SdrhThere is one token type for each terminal symbol in the grammar.
21775897234SdrhThe gram.h file generated by Lemon contains #define statements that
21875897234Sdrhmap symbolic terminal symbol names into appropriate integer values.
2199bccde3dSdrhA value of 0 for the second argument is a special flag to the
2209bccde3dSdrhparser to indicate that the end of input has been reached.
22175897234SdrhThe third argument is the value of the given token.  By default,
2229a243e69Sdrhthe type of the third argument is "void*", but the grammar will
22375897234Sdrhusually redefine this type to be some kind of structure.
22475897234SdrhTypically the second argument will be a broad category of tokens
2259bccde3dSdrhsuch as "identifier" or "number" and the third argument will
22675897234Sdrhbe the name of the identifier or the value of the number.</p>
22775897234Sdrh
22875897234Sdrh<p>The Parse() function may have either three or four arguments,
22945f31be8Sdrhdepending on the grammar.  If the grammar specification file requests
2309a243e69Sdrhit (via the <tt><a href='#extraarg'>%extra_argument</a></tt> directive),
23145f31be8Sdrhthe Parse() function will have a fourth parameter that can be
23275897234Sdrhof any type chosen by the programmer.  The parser doesn't do anything
23375897234Sdrhwith this argument except to pass it through to action routines.
23475897234SdrhThis is a convenient mechanism for passing state information down
23575897234Sdrhto the action routines without having to use global variables.</p>
23675897234Sdrh
23775897234Sdrh<p>A typical use of a Lemon parser might look something like the
2387b852416Sdrhfollowing:</p>
23975897234Sdrh<pre>
2409a243e69Sdrh    1 ParseTree *ParseFile(const char *zFilename){
2419a243e69Sdrh    2    Tokenizer *pTokenizer;
2429a243e69Sdrh    3    void *pParser;
2439a243e69Sdrh    4    Token sToken;
2449a243e69Sdrh    5    int hTokenId;
2459a243e69Sdrh    6    ParserState sState;
2469a243e69Sdrh    7
2479a243e69Sdrh    8    pTokenizer = TokenizerCreate(zFilename);
2489a243e69Sdrh    9    pParser = ParseAlloc( malloc );
2499a243e69Sdrh   10    InitParserState(&amp;sState);
2509a243e69Sdrh   11    while( GetNextToken(pTokenizer, &amp;hTokenId, &amp;sToken) ){
2519a243e69Sdrh   12       Parse(pParser, hTokenId, sToken, &amp;sState);
25275897234Sdrh   13    }
2539a243e69Sdrh   14    Parse(pParser, 0, sToken, &amp;sState);
25475897234Sdrh   15    ParseFree(pParser, free );
25575897234Sdrh   16    TokenizerFree(pTokenizer);
25675897234Sdrh   17    return sState.treeRoot;
25775897234Sdrh   18 }
25875897234Sdrh</pre>
2597b852416Sdrh<p>This example shows a user-written routine that parses a file of
26075897234Sdrhtext and returns a pointer to the parse tree.
2619bccde3dSdrh(All error-handling code is omitted from this example to keep it
26275897234Sdrhsimple.)
26375897234SdrhWe assume the existence of some kind of tokenizer which is created
26475897234Sdrhusing TokenizerCreate() on line 8 and deleted by TokenizerFree()
26575897234Sdrhon line 16.  The GetNextToken() function on line 11 retrieves the
26675897234Sdrhnext token from the input file and puts its type in the
26775897234Sdrhinteger variable hTokenId.  The sToken variable is assumed to be
26875897234Sdrhsome kind of structure that contains details about each token,
26975897234Sdrhsuch as its complete text, what line it occurs on, etc.</p>
27075897234Sdrh
2717b852416Sdrh<p>This example also assumes the existence of a structure of type
27275897234SdrhParserState that holds state information about a particular parse.
27375897234SdrhAn instance of such a structure is created on line 6 and initialized
27475897234Sdrhon line 10.  A pointer to this structure is passed into the Parse()
27575897234Sdrhroutine as the optional 4th argument.
27675897234SdrhThe action routine specified by the grammar for the parser can use
27775897234Sdrhthe ParserState structure to hold whatever information is useful and
27875897234Sdrhappropriate.  In the example, we note that the treeRoot field of
27975897234Sdrhthe ParserState structure is left pointing to the root of the parse
28075897234Sdrhtree.</p>
28175897234Sdrh
2827b852416Sdrh<p>The core of this example as it relates to Lemon is as follows:</p>
28375897234Sdrh<pre>
28475897234Sdrh   ParseFile(){
28575897234Sdrh      pParser = ParseAlloc( malloc );
2869a243e69Sdrh      while( GetNextToken(pTokenizer,&amp;hTokenId, &amp;sToken) ){
28775897234Sdrh         Parse(pParser, hTokenId, sToken);
28875897234Sdrh      }
28975897234Sdrh      Parse(pParser, 0, sToken);
29075897234Sdrh      ParseFree(pParser, free );
29175897234Sdrh   }
29275897234Sdrh</pre>
2937b852416Sdrh<p>Basically, what a program has to do to use a Lemon-generated parser
29475897234Sdrhis first create the parser, then send it lots of tokens obtained by
29575897234Sdrhtokenizing an input source.  When the end of input is reached, the
29675897234SdrhParse() routine should be called one last time with a token type
29775897234Sdrhof 0.  This step is necessary to inform the parser that the end of
29875897234Sdrhinput has been reached.  Finally, we reclaim memory used by the
29975897234Sdrhparser by calling ParseFree().</p>
30075897234Sdrh
30175897234Sdrh<p>There is one other interface routine that should be mentioned
30275897234Sdrhbefore we move on.
30375897234SdrhThe ParseTrace() function can be used to generate debugging output
3047b852416Sdrhfrom the parser.  A prototype for this routine is as follows:</p>
30575897234Sdrh<pre>
30675897234Sdrh   ParseTrace(FILE *stream, char *zPrefix);
30775897234Sdrh</pre>
3087b852416Sdrh<p>After this routine is called, a short (one-line) message is written
30975897234Sdrhto the designated output stream every time the parser changes states
31075897234Sdrhor calls an action routine.  Each such message is prefaced using
31175897234Sdrhthe text given by zPrefix.  This debugging output can be turned off
31275897234Sdrhby calling ParseTrace() again with a first argument of NULL (0).</p>
31375897234Sdrh
31460c71b02Sdrh<a id="onstack"></a>
31560c71b02Sdrh<h4>3.2.1 Allocating The Parse Object On Stack</h4>
31660c71b02Sdrh
31760c71b02Sdrh<p>If all calls to the Parse() interface are made from within
31860c71b02Sdrh<a href="#pcode"><tt>%code</tt> directives</a>, then the parse
31960c71b02Sdrhobject can be allocated from the stack rather than from the heap.
32060c71b02SdrhThese are the steps:
32160c71b02Sdrh
32260c71b02Sdrh<ul>
32360c71b02Sdrh<li> Declare a local variable of type "yyParser"
32460c71b02Sdrh<li> Initialize the variable using ParseInit()
32560c71b02Sdrh<li> Pass a pointer to the variable in calls ot Parse()
32660c71b02Sdrh<li> Deallocate substructure in the parse variable using ParseFinalize().
32760c71b02Sdrh</ul>
32860c71b02Sdrh
32960c71b02Sdrh<p>The following code illustrates how this is done:
33060c71b02Sdrh
33160c71b02Sdrh<pre>
33260c71b02Sdrh   ParseFile(){
33360c71b02Sdrh      yyParser x;
33460c71b02Sdrh      ParseInit( &x );
33560c71b02Sdrh      while( GetNextToken(pTokenizer,&amp;hTokenId, &amp;sToken) ){
33660c71b02Sdrh         Parse(&x, hTokenId, sToken);
33760c71b02Sdrh      }
33860c71b02Sdrh      Parse(&x, 0, sToken);
33960c71b02Sdrh      ParseFinalize( &x );
34060c71b02Sdrh   }
34160c71b02Sdrh</pre>
34260c71b02Sdrh
34360c71b02Sdrh<a id="ifsum"></a>
34460c71b02Sdrh<h4>3.2.2 Interface Summary</h4>
34560c71b02Sdrh
34660c71b02Sdrh<p>Here is a quick overview of the C-language interface to a
34760c71b02SdrhLemon-generated parser:</p>
34860c71b02Sdrh
34960c71b02Sdrh<blockquote><pre>
35060c71b02Sdrhvoid *ParseAlloc( (void*(*malloc)(size_t) );
35160c71b02Sdrhvoid ParseFree(void *pParser, (void(*free)(void*) );
35260c71b02Sdrhvoid Parse(void *pParser, int tokenCode, ParseTOKENTYPE token, ...);
35360c71b02Sdrhvoid ParseTrace(FILE *stream, char *zPrefix);
35460c71b02Sdrh</pre></blockquote>
35560c71b02Sdrh
35660c71b02Sdrh<p>Notes:</p>
35760c71b02Sdrh<ul>
35860c71b02Sdrh<li> Use the <a href="#pname"><tt>%name</tt> directive</a> to change
35960c71b02Sdrhthe "Parse" prefix names of the procedures in the interface.
36060c71b02Sdrh<li> Use the <a href="#token_type"><tt>%token_type</tt> directive</a>
36160c71b02Sdrhto define the "ParseTOKENTYPE" type.
36260c71b02Sdrh<li> Use the <a href="#extraarg"><tt>%extra_argument</tt> directive</a>
36360c71b02Sdrhto specify the type and name of the 4th parameter to the
36460c71b02SdrhParse() function.
36560c71b02Sdrh</ul>
36660c71b02Sdrh
36760c71b02Sdrh<a id="yaccdiff"></a>
36860c71b02Sdrh<h3>3.3 Differences With YACC and BISON</h3>
36975897234Sdrh
37075897234Sdrh<p>Programmers who have previously used the yacc or bison parser
37175897234Sdrhgenerator will notice several important differences between yacc and/or
3727b852416Sdrhbison and Lemon.</p>
37375897234Sdrh<ul>
37475897234Sdrh<li>In yacc and bison, the parser calls the tokenizer.  In Lemon,
37575897234Sdrh    the tokenizer calls the parser.
37675897234Sdrh<li>Lemon uses no global variables.  Yacc and bison use global variables
37775897234Sdrh    to pass information between the tokenizer and parser.
37875897234Sdrh<li>Lemon allows multiple parsers to be running simultaneously.  Yacc
37975897234Sdrh    and bison do not.
38075897234Sdrh</ul>
3817b852416Sdrh<p>These differences may cause some initial confusion for programmers
38275897234Sdrhwith prior yacc and bison experience.
38375897234SdrhBut after years of experience using Lemon, I firmly
38475897234Sdrhbelieve that the Lemon way of doing things is better.</p>
38575897234Sdrh
38645f31be8Sdrh<p><i>Updated as of 2016-02-16:</i>
38745f31be8SdrhThe text above was written in the 1990s.
38845f31be8SdrhWe are told that Bison has lately been enhanced to support the
38960c71b02Sdrhtokenizer-calls-parser paradigm used by Lemon, eliminating the
39045f31be8Sdrhneed for global variables.</p>
39145f31be8Sdrh
39260c71b02Sdrh<a id="build"><a>
39360c71b02Sdrh<h3>3.4 Building The "lemon" or "lemon.exe" Executable</h3>
39460c71b02Sdrh
39560c71b02Sdrh<p>The "lemon" or "lemon.exe" program is built from a single file
39660c71b02Sdrhof C-code named
39760c71b02Sdrh"<a href="https://sqlite.org/src/tool/lemon.c">lemon.c</a>".
39860c71b02SdrhThe Lemon source code is generic C89 code that uses
39960c71b02Sdrhno unusual or non-standard libraries.  Any
40060c71b02Sdrhreasonable C compiler should suffice to compile the lemon program.
40160c71b02SdrhA command-line like the following will usually work:</p>
40260c71b02Sdrh
40360c71b02Sdrh<blockquote><pre>
40460c71b02Sdrhcc -o lemon lemon.c
40560c71b02Sdrh</pre></blockquote
40660c71b02Sdrh
40760c71b02Sdrh<p>On Windows machines with Visual C++ installed, bring up a
40860c71b02Sdrh"VS20<i>NN</i> x64 Native Tools Command Prompt" window and enter:
40960c71b02Sdrh
41060c71b02Sdrh<blockquote><pre>
41160c71b02Sdrhcl lemon.c
41260c71b02Sdrh</pre></blockquote>
41360c71b02Sdrh
41460c71b02Sdrh<p>Compiling Lemon really is that simple.
41560c71b02SdrhAdditional compiler options such as
41660c71b02Sdrh"-O2" or "-g" or "-Wall" can be added if desired, but they are not
41760c71b02Sdrhnecessary.</p>
41860c71b02Sdrh
41960c71b02Sdrh
42060c71b02Sdrh<a id="syntax"></a>
42160c71b02Sdrh<h2>4.0 Input File Syntax</h2>
42275897234Sdrh
42375897234Sdrh<p>The main purpose of the grammar specification file for Lemon is
42475897234Sdrhto define the grammar for the parser.  But the input file also
42575897234Sdrhspecifies additional information Lemon requires to do its job.
42675897234SdrhMost of the work in using Lemon is in writing an appropriate
42775897234Sdrhgrammar file.</p>
42875897234Sdrh
4297b852416Sdrh<p>The grammar file for Lemon is, for the most part, a free format.
43075897234SdrhIt does not have sections or divisions like yacc or bison.  Any
4317b852416Sdrhdeclaration can occur at any point in the file.  Lemon ignores
4327b852416Sdrhwhitespace (except where it is needed to separate tokens), and it
4337b852416Sdrhhonors the same commenting conventions as C and C++.</p>
43475897234Sdrh
43560c71b02Sdrh<a id="tnt"></a>
43660c71b02Sdrh<h3>4.1 Terminals and Nonterminals</h3>
43775897234Sdrh
43875897234Sdrh<p>A terminal symbol (token) is any string of alphanumeric
4399bccde3dSdrhand/or underscore characters
44075897234Sdrhthat begins with an uppercase letter.
441c8eee5e5SdrhA terminal can contain lowercase letters after the first character,
44275897234Sdrhbut the usual convention is to make terminals all uppercase.
44375897234SdrhA nonterminal, on the other hand, is any string of alphanumeric
44475897234Sdrhand underscore characters than begins with a lowercase letter.
4459a243e69SdrhAgain, the usual convention is to make nonterminals use all lowercase
4469a243e69Sdrhletters.</p>
44775897234Sdrh
44875897234Sdrh<p>In Lemon, terminal and nonterminal symbols do not need to
44975897234Sdrhbe declared or identified in a separate section of the grammar file.
45075897234SdrhLemon is able to generate a list of all terminals and nonterminals
45175897234Sdrhby examining the grammar rules, and it can always distinguish a
45275897234Sdrhterminal from a nonterminal by checking the case of the first
45375897234Sdrhcharacter of the name.</p>
45475897234Sdrh
45575897234Sdrh<p>Yacc and bison allow terminal symbols to have either alphanumeric
45675897234Sdrhnames or to be individual characters included in single quotes, like
45775897234Sdrhthis: ')' or '$'.  Lemon does not allow this alternative form for
45875897234Sdrhterminal symbols.  With Lemon, all symbols, terminals and nonterminals,
45975897234Sdrhmust have alphanumeric names.</p>
46075897234Sdrh
46160c71b02Sdrh<a id="rules"></a>
46260c71b02Sdrh<h3>4.2 Grammar Rules</h3>
46375897234Sdrh
46475897234Sdrh<p>The main component of a Lemon grammar file is a sequence of grammar
46575897234Sdrhrules.
46675897234SdrhEach grammar rule consists of a nonterminal symbol followed by
4679bccde3dSdrhthe special symbol "::=" and then a list of terminals and/or nonterminals.
46875897234SdrhThe rule is terminated by a period.
46975897234SdrhThe list of terminals and nonterminals on the right-hand side of the
47075897234Sdrhrule can be empty.
47175897234SdrhRules can occur in any order, except that the left-hand side of the
47275897234Sdrhfirst rule is assumed to be the start symbol for the grammar (unless
4739a243e69Sdrhspecified otherwise using the <tt><a href='#start_symbol'>%start_symbol</a></tt>
4749a243e69Sdrhdirective described below.)
4757b852416SdrhA typical sequence of grammar rules might look something like this:</p>
47675897234Sdrh<pre>
47775897234Sdrh  expr ::= expr PLUS expr.
47875897234Sdrh  expr ::= expr TIMES expr.
47975897234Sdrh  expr ::= LPAREN expr RPAREN.
48075897234Sdrh  expr ::= VALUE.
48175897234Sdrh</pre>
48275897234Sdrh
4839bccde3dSdrh<p>There is one non-terminal in this example, "expr", and five
4849bccde3dSdrhterminal symbols or tokens: "PLUS", "TIMES", "LPAREN",
4859bccde3dSdrh"RPAREN" and "VALUE".</p>
48675897234Sdrh
48775897234Sdrh<p>Like yacc and bison, Lemon allows the grammar to specify a block
48875897234Sdrhof C code that will be executed whenever a grammar rule is reduced
48975897234Sdrhby the parser.
49075897234SdrhIn Lemon, this action is specified by putting the C code (contained
49175897234Sdrhwithin curly braces <tt>{...}</tt>) immediately after the
49275897234Sdrhperiod that closes the rule.
4937b852416SdrhFor example:</p>
49475897234Sdrh<pre>
49575897234Sdrh  expr ::= expr PLUS expr.   { printf("Doing an addition...\n"); }
49675897234Sdrh</pre>
49775897234Sdrh
49875897234Sdrh<p>In order to be useful, grammar actions must normally be linked to
49975897234Sdrhtheir associated grammar rules.
5009bccde3dSdrhIn yacc and bison, this is accomplished by embedding a "$$" in the
50175897234Sdrhaction to stand for the value of the left-hand side of the rule and
5029bccde3dSdrhsymbols "$1", "$2", and so forth to stand for the value of
50375897234Sdrhthe terminal or nonterminal at position 1, 2 and so forth on the
50475897234Sdrhright-hand side of the rule.
50575897234SdrhThis idea is very powerful, but it is also very error-prone.  The
50675897234Sdrhsingle most common source of errors in a yacc or bison grammar is
50775897234Sdrhto miscount the number of symbols on the right-hand side of a grammar
5089bccde3dSdrhrule and say "$7" when you really mean "$8".</p>
50975897234Sdrh
51075897234Sdrh<p>Lemon avoids the need to count grammar symbols by assigning symbolic
51175897234Sdrhnames to each symbol in a grammar rule and then using those symbolic
51275897234Sdrhnames in the action.
5137b852416SdrhIn yacc or bison, one would write this:</p>
51475897234Sdrh<pre>
5159a243e69Sdrh  expr -&gt; expr PLUS expr  { $$ = $1 + $3; };
51675897234Sdrh</pre>
5177b852416Sdrh<p>But in Lemon, the same rule becomes the following:</p>
51875897234Sdrh<pre>
51975897234Sdrh  expr(A) ::= expr(B) PLUS expr(C).  { A = B+C; }
52075897234Sdrh</pre>
5217b852416Sdrh<p>In the Lemon rule, any symbol in parentheses after a grammar rule
52275897234Sdrhsymbol becomes a place holder for that symbol in the grammar rule.
52375897234SdrhThis place holder can then be used in the associated C action to
5247b852416Sdrhstand for the value of that symbol.</p>
52575897234Sdrh
52675897234Sdrh<p>The Lemon notation for linking a grammar rule with its reduce
52775897234Sdrhaction is superior to yacc/bison on several counts.
52875897234SdrhFirst, as mentioned above, the Lemon method avoids the need to
52975897234Sdrhcount grammar symbols.
53075897234SdrhSecondly, if a terminal or nonterminal in a Lemon grammar rule
53175897234Sdrhincludes a linking symbol in parentheses but that linking symbol
53275897234Sdrhis not actually used in the reduce action, then an error message
53375897234Sdrhis generated.
5347b852416SdrhFor example, the rule</p>
53575897234Sdrh<pre>
53675897234Sdrh  expr(A) ::= expr(B) PLUS expr(C).  { A = B; }
53775897234Sdrh</pre>
5387b852416Sdrh<p>will generate an error because the linking symbol "C" is used
53975897234Sdrhin the grammar rule but not in the reduce action.</p>
54075897234Sdrh
54175897234Sdrh<p>The Lemon notation for linking grammar rules to reduce actions
54275897234Sdrhalso facilitates the use of destructors for reclaiming memory
54375897234Sdrhallocated by the values of terminals and nonterminals on the
54475897234Sdrhright-hand side of a rule.</p>
54575897234Sdrh
5467b852416Sdrh<a id='precrules'></a>
54760c71b02Sdrh<h3>4.3 Precedence Rules</h3>
54875897234Sdrh
54975897234Sdrh<p>Lemon resolves parsing ambiguities in exactly the same way as
55075897234Sdrhyacc and bison.  A shift-reduce conflict is resolved in favor
55175897234Sdrhof the shift, and a reduce-reduce conflict is resolved by reducing
55275897234Sdrhwhichever rule comes first in the grammar file.</p>
55375897234Sdrh
55475897234Sdrh<p>Just like in
55575897234Sdrhyacc and bison, Lemon allows a measure of control
5569a243e69Sdrhover the resolution of parsing conflicts using precedence rules.
55775897234SdrhA precedence value can be assigned to any terminal symbol
5589bccde3dSdrhusing the
5599a243e69Sdrh<tt><a href='#pleft'>%left</a></tt>,
5609a243e69Sdrh<tt><a href='#pright'>%right</a></tt> or
5619a243e69Sdrh<tt><a href='#pnonassoc'>%nonassoc</a></tt> directives.  Terminal symbols
5629a243e69Sdrhmentioned in earlier directives have a lower precedence than
56375897234Sdrhterminal symbols mentioned in later directives.  For example:</p>
56475897234Sdrh
5657b852416Sdrh<pre>
56675897234Sdrh   %left AND.
56775897234Sdrh   %left OR.
56875897234Sdrh   %nonassoc EQ NE GT GE LT LE.
56975897234Sdrh   %left PLUS MINUS.
57075897234Sdrh   %left TIMES DIVIDE MOD.
57175897234Sdrh   %right EXP NOT.
5727b852416Sdrh</pre>
57375897234Sdrh
57475897234Sdrh<p>In the preceding sequence of directives, the AND operator is
57575897234Sdrhdefined to have the lowest precedence.  The OR operator is one
57675897234Sdrhprecedence level higher.  And so forth.  Hence, the grammar would
5777b852416Sdrhattempt to group the ambiguous expression</p>
57875897234Sdrh<pre>
57975897234Sdrh     a AND b OR c
58075897234Sdrh</pre>
5817b852416Sdrh<p>like this</p>
58275897234Sdrh<pre>
58375897234Sdrh     a AND (b OR c).
58475897234Sdrh</pre>
5857b852416Sdrh<p>The associativity (left, right or nonassoc) is used to determine
58675897234Sdrhthe grouping when the precedence is the same.  AND is left-associative
5877b852416Sdrhin our example, so</p>
58875897234Sdrh<pre>
58975897234Sdrh     a AND b AND c
59075897234Sdrh</pre>
5917b852416Sdrh<p>is parsed like this</p>
59275897234Sdrh<pre>
59375897234Sdrh     (a AND b) AND c.
59475897234Sdrh</pre>
5957b852416Sdrh<p>The EXP operator is right-associative, though, so</p>
59675897234Sdrh<pre>
59775897234Sdrh     a EXP b EXP c
59875897234Sdrh</pre>
5997b852416Sdrh<p>is parsed like this</p>
60075897234Sdrh<pre>
60175897234Sdrh     a EXP (b EXP c).
60275897234Sdrh</pre>
6037b852416Sdrh<p>The nonassoc precedence is used for non-associative operators.
6047b852416SdrhSo</p>
60575897234Sdrh<pre>
60675897234Sdrh     a EQ b EQ c
60775897234Sdrh</pre>
6087b852416Sdrh<p>is an error.</p>
60975897234Sdrh
61075897234Sdrh<p>The precedence of non-terminals is transferred to rules as follows:
61175897234SdrhThe precedence of a grammar rule is equal to the precedence of the
61275897234Sdrhleft-most terminal symbol in the rule for which a precedence is
61375897234Sdrhdefined.  This is normally what you want, but in those cases where
614ed5e6688Sdrhyou want the precedence of a grammar rule to be something different,
61575897234Sdrhyou can specify an alternative precedence symbol by putting the
61675897234Sdrhsymbol in square braces after the period at the end of the rule and
61775897234Sdrhbefore any C-code.  For example:</p>
61875897234Sdrh
6197b852416Sdrh<pre>
62075897234Sdrh   expr = MINUS expr.  [NOT]
6217b852416Sdrh</pre>
62275897234Sdrh
62375897234Sdrh<p>This rule has a precedence equal to that of the NOT symbol, not the
62475897234SdrhMINUS symbol as would have been the case by default.</p>
62575897234Sdrh
62675897234Sdrh<p>With the knowledge of how precedence is assigned to terminal
62775897234Sdrhsymbols and individual
62875897234Sdrhgrammar rules, we can now explain precisely how parsing conflicts
62975897234Sdrhare resolved in Lemon.  Shift-reduce conflicts are resolved
6307b852416Sdrhas follows:</p>
63175897234Sdrh<ul>
63275897234Sdrh<li> If either the token to be shifted or the rule to be reduced
63375897234Sdrh     lacks precedence information, then resolve in favor of the
63475897234Sdrh     shift, but report a parsing conflict.
63575897234Sdrh<li> If the precedence of the token to be shifted is greater than
63675897234Sdrh     the precedence of the rule to reduce, then resolve in favor
63775897234Sdrh     of the shift.  No parsing conflict is reported.
6389a243e69Sdrh<li> If the precedence of the token to be shifted is less than the
63975897234Sdrh     precedence of the rule to reduce, then resolve in favor of the
64075897234Sdrh     reduce action.  No parsing conflict is reported.
64175897234Sdrh<li> If the precedences are the same and the shift token is
64275897234Sdrh     right-associative, then resolve in favor of the shift.
64375897234Sdrh     No parsing conflict is reported.
6449a243e69Sdrh<li> If the precedences are the same and the shift token is
64575897234Sdrh     left-associative, then resolve in favor of the reduce.
64675897234Sdrh     No parsing conflict is reported.
6479a243e69Sdrh<li> Otherwise, resolve the conflict by doing the shift, and
6489a243e69Sdrh     report a parsing conflict.
64975897234Sdrh</ul>
6507b852416Sdrh<p>Reduce-reduce conflicts are resolved this way:</p>
65175897234Sdrh<ul>
65275897234Sdrh<li> If either reduce rule
65375897234Sdrh     lacks precedence information, then resolve in favor of the
6549a243e69Sdrh     rule that appears first in the grammar, and report a parsing
65575897234Sdrh     conflict.
6569a243e69Sdrh<li> If both rules have precedence and the precedence is different,
65775897234Sdrh     then resolve the dispute in favor of the rule with the highest
6589a243e69Sdrh     precedence, and do not report a conflict.
65975897234Sdrh<li> Otherwise, resolve the conflict by reducing by the rule that
6609a243e69Sdrh     appears first in the grammar, and report a parsing conflict.
66175897234Sdrh</ul>
66275897234Sdrh
66360c71b02Sdrh<a id="special"></a>
66460c71b02Sdrh<h3>4.4 Special Directives</h3>
66575897234Sdrh
66675897234Sdrh<p>The input grammar to Lemon consists of grammar rules and special
66775897234Sdrhdirectives.  We've described all the grammar rules, so now we'll
66875897234Sdrhtalk about the special directives.</p>
66975897234Sdrh
6709a243e69Sdrh<p>Directives in Lemon can occur in any order.  You can put them before
6719a243e69Sdrhthe grammar rules, or after the grammar rules, or in the midst of the
67275897234Sdrhgrammar rules.  It doesn't matter.  The relative order of
67375897234Sdrhdirectives used to assign precedence to terminals is important, but
67475897234Sdrhother than that, the order of directives in Lemon is arbitrary.</p>
67575897234Sdrh
6767b852416Sdrh<p>Lemon supports the following special directives:</p>
67775897234Sdrh<ul>
6789a243e69Sdrh<li><tt><a href='#pcode'>%code</a></tt>
6799a243e69Sdrh<li><tt><a href='#default_destructor'>%default_destructor</a></tt>
6809a243e69Sdrh<li><tt><a href='#default_type'>%default_type</a></tt>
6819a243e69Sdrh<li><tt><a href='#destructor'>%destructor</a></tt>
6825f0d37b5Sdrh<li><tt><a href='#pifdef'>%else</a></tt>
6839a243e69Sdrh<li><tt><a href='#pifdef'>%endif</a></tt>
6849a243e69Sdrh<li><tt><a href='#extraarg'>%extra_argument</a></tt>
6859a243e69Sdrh<li><tt><a href='#pfallback'>%fallback</a></tt>
6865f0d37b5Sdrh<li><tt><a href='#pifdef'>%if</a></tt>
6879a243e69Sdrh<li><tt><a href='#pifdef'>%ifdef</a></tt>
6889a243e69Sdrh<li><tt><a href='#pifdef'>%ifndef</a></tt>
6899a243e69Sdrh<li><tt><a href='#pinclude'>%include</a></tt>
6909a243e69Sdrh<li><tt><a href='#pleft'>%left</a></tt>
6919a243e69Sdrh<li><tt><a href='#pname'>%name</a></tt>
6929a243e69Sdrh<li><tt><a href='#pnonassoc'>%nonassoc</a></tt>
6939a243e69Sdrh<li><tt><a href='#parse_accept'>%parse_accept</a></tt>
6949a243e69Sdrh<li><tt><a href='#parse_failure'>%parse_failure</a></tt>
6959a243e69Sdrh<li><tt><a href='#pright'>%right</a></tt>
6969a243e69Sdrh<li><tt><a href='#stack_overflow'>%stack_overflow</a></tt>
6979a243e69Sdrh<li><tt><a href='#stack_size'>%stack_size</a></tt>
6989a243e69Sdrh<li><tt><a href='#start_symbol'>%start_symbol</a></tt>
6999a243e69Sdrh<li><tt><a href='#syntax_error'>%syntax_error</a></tt>
700*9cffb0ffSdrh<li><tt><a href='#token'>%token</a></tt>
7019a243e69Sdrh<li><tt><a href='#token_class'>%token_class</a></tt>
7029a243e69Sdrh<li><tt><a href='#token_destructor'>%token_destructor</a></tt>
7039a243e69Sdrh<li><tt><a href='#token_prefix'>%token_prefix</a></tt>
7049a243e69Sdrh<li><tt><a href='#token_type'>%token_type</a></tt>
7059a243e69Sdrh<li><tt><a href='#ptype'>%type</a></tt>
7069a243e69Sdrh<li><tt><a href='#pwildcard'>%wildcard</a></tt>
70775897234Sdrh</ul>
7087b852416Sdrh<p>Each of these directives will be described separately in the
70975897234Sdrhfollowing sections:</p>
71075897234Sdrh
7117b852416Sdrh<a id='pcode'></a>
71260c71b02Sdrh<h4>4.4.1 The <tt>%code</tt> directive</h4>
713f2340fc7Sdrh
7149a243e69Sdrh<p>The <tt>%code</tt> directive is used to specify additional C code that
715f2340fc7Sdrhis added to the end of the main output file.  This is similar to
7169a243e69Sdrhthe <tt><a href='#pinclude'>%include</a></tt> directive except that
7179a243e69Sdrh<tt>%include</tt> is inserted at the beginning of the main output file.</p>
718f2340fc7Sdrh
7199a243e69Sdrh<p><tt>%code</tt> is typically used to include some action routines or perhaps
7209bccde3dSdrha tokenizer or even the "main()" function
7219bccde3dSdrhas part of the output file.</p>
722f2340fc7Sdrh
72360c71b02Sdrh<p>There can be multiple <tt>%code</tt> directives.  The arguments of
72460c71b02Sdrhall <tt>%code</tt> directives are concatenated.</p>
72560c71b02Sdrh
7267b852416Sdrh<a id='default_destructor'></a>
72760c71b02Sdrh<h4>4.4.2 The <tt>%default_destructor</tt> directive</h4>
728f2340fc7Sdrh
7299a243e69Sdrh<p>The <tt>%default_destructor</tt> directive specifies a destructor to
730f2340fc7Sdrhuse for non-terminals that do not have their own destructor
7319a243e69Sdrhspecified by a separate <tt>%destructor</tt> directive.  See the documentation
7327b852416Sdrhon the <tt><a href='#destructor'>%destructor</a></tt> directive below for
7339bccde3dSdrhadditional information.</p>
734f2340fc7Sdrh
7359a243e69Sdrh<p>In some grammars, many different non-terminal symbols have the
736f2340fc7Sdrhsame data type and hence the same destructor.  This directive is
7379a243e69Sdrha convenient way to specify the same destructor for all those
738f2340fc7Sdrhnon-terminals using a single statement.</p>
739f2340fc7Sdrh
7407b852416Sdrh<a id='default_type'></a>
74160c71b02Sdrh<h4>4.4.3 The <tt>%default_type</tt> directive</h4>
742f2340fc7Sdrh
7439a243e69Sdrh<p>The <tt>%default_type</tt> directive specifies the data type of non-terminal
7449a243e69Sdrhsymbols that do not have their own data type defined using a separate
7459a243e69Sdrh<tt><a href='#ptype'>%type</a></tt> directive.</p>
746f2340fc7Sdrh
7477b852416Sdrh<a id='destructor'></a>
74860c71b02Sdrh<h4>4.4.4 The <tt>%destructor</tt> directive</h4>
74975897234Sdrh
7509a243e69Sdrh<p>The <tt>%destructor</tt> directive is used to specify a destructor for
75175897234Sdrha non-terminal symbol.
7529a243e69Sdrh(See also the <tt><a href='#token_destructor'>%token_destructor</a></tt>
7539bccde3dSdrhdirective which is used to specify a destructor for terminal symbols.)</p>
75475897234Sdrh
75575897234Sdrh<p>A non-terminal's destructor is called to dispose of the
75675897234Sdrhnon-terminal's value whenever the non-terminal is popped from
7577b852416Sdrhthe stack.  This includes all of the following circumstances:</p>
75875897234Sdrh<ul>
75975897234Sdrh<li> When a rule reduces and the value of a non-terminal on
76075897234Sdrh     the right-hand side is not linked to C code.
76175897234Sdrh<li> When the stack is popped during error processing.
76275897234Sdrh<li> When the ParseFree() function runs.
76375897234Sdrh</ul>
7647b852416Sdrh<p>The destructor can do whatever it wants with the value of
76575897234Sdrhthe non-terminal, but its design is to deallocate memory
76675897234Sdrhor other resources held by that non-terminal.</p>
76775897234Sdrh
7687b852416Sdrh<p>Consider an example:</p>
76975897234Sdrh<pre>
77075897234Sdrh   %type nt {void*}
77175897234Sdrh   %destructor nt { free($$); }
77275897234Sdrh   nt(A) ::= ID NUM.   { A = malloc( 100 ); }
77375897234Sdrh</pre>
7747b852416Sdrh<p>This example is a bit contrived, but it serves to illustrate how
77575897234Sdrhdestructors work.  The example shows a non-terminal named
7769bccde3dSdrh"nt" that holds values of type "void*".  When the rule for
7779bccde3dSdrhan "nt" reduces, it sets the value of the non-terminal to
77875897234Sdrhspace obtained from malloc().  Later, when the nt non-terminal
77975897234Sdrhis popped from the stack, the destructor will fire and call
78075897234Sdrhfree() on this malloced space, thus avoiding a memory leak.
7819bccde3dSdrh(Note that the symbol "$$" in the destructor code is replaced
78275897234Sdrhby the value of the non-terminal.)</p>
78375897234Sdrh
78475897234Sdrh<p>It is important to note that the value of a non-terminal is passed
78575897234Sdrhto the destructor whenever the non-terminal is removed from the
78675897234Sdrhstack, unless the non-terminal is used in a C-code action.  If
78775897234Sdrhthe non-terminal is used by C-code, then it is assumed that the
7889bccde3dSdrhC-code will take care of destroying it.
7899bccde3dSdrhMore commonly, the value is used to build some
7909a243e69Sdrhlarger structure, and we don't want to destroy it, which is why
79175897234Sdrhthe destructor is not called in this circumstance.</p>
79275897234Sdrh
7939bccde3dSdrh<p>Destructors help avoid memory leaks by automatically freeing
7949bccde3dSdrhallocated objects when they go out of scope.
79575897234SdrhTo do the same using yacc or bison is much more difficult.</p>
79675897234Sdrh
7977b852416Sdrh<a id='extraarg'></a>
79860c71b02Sdrh<h4>4.4.5 The <tt>%extra_argument</tt> directive</h4>
79975897234Sdrh
8007b852416Sdrh<p>The <tt>%extra_argument</tt> directive instructs Lemon to add a 4th parameter
80175897234Sdrhto the parameter list of the Parse() function it generates.  Lemon
80275897234Sdrhdoesn't do anything itself with this extra argument, but it does
80375897234Sdrhmake the argument available to C-code action routines, destructors,
80475897234Sdrhand so forth.  For example, if the grammar file contains:</p>
80575897234Sdrh
8067b852416Sdrh<pre>
80775897234Sdrh    %extra_argument { MyStruct *pAbc }
8087b852416Sdrh</pre>
80975897234Sdrh
81075897234Sdrh<p>Then the Parse() function generated will have an 4th parameter
8119bccde3dSdrhof type "MyStruct*" and all action routines will have access to
8129bccde3dSdrha variable named "pAbc" that is the value of the 4th parameter
81375897234Sdrhin the most recent call to Parse().</p>
81475897234Sdrh
815fb32c44eSdrh<p>The <tt>%extra_context</tt> directive works the same except that it
816fb32c44eSdrhis passed in on the ParseAlloc() or ParseInit() routines instead of
8177b852416Sdrhon Parse().</p>
818fb32c44eSdrh
8197b852416Sdrh<a id='extractx'></a>
82060c71b02Sdrh<h4>4.4.6 The <tt>%extra_context</tt> directive</h4>
821fb32c44eSdrh
8227b852416Sdrh<p>The <tt>%extra_context</tt> directive instructs Lemon to add a 2nd parameter
8237b852416Sdrhto the parameter list of the ParseAlloc() and ParseInit() functions.  Lemon
824fb32c44eSdrhdoesn't do anything itself with these extra argument, but it does
825fb32c44eSdrhstore the value make it available to C-code action routines, destructors,
826fb32c44eSdrhand so forth.  For example, if the grammar file contains:</p>
827fb32c44eSdrh
8287b852416Sdrh<pre>
829fb32c44eSdrh    %extra_context { MyStruct *pAbc }
8307b852416Sdrh</pre>
831fb32c44eSdrh
832ed5e6688Sdrh<p>Then the ParseAlloc() and ParseInit() functions will have an 2nd parameter
833fb32c44eSdrhof type "MyStruct*" and all action routines will have access to
834ed5e6688Sdrha variable named "pAbc" that is the value of that 2nd parameter.</p>
835fb32c44eSdrh
836fb32c44eSdrh<p>The <tt>%extra_argument</tt> directive works the same except that it
8377b852416Sdrhis passed in on the Parse() routine instead of on ParseAlloc()/ParseInit().</p>
838fb32c44eSdrh
8397b852416Sdrh<a id='pfallback'></a>
84060c71b02Sdrh<h4>4.4.7 The <tt>%fallback</tt> directive</h4>
8419bccde3dSdrh
8429a243e69Sdrh<p>The <tt>%fallback</tt> directive specifies an alternative meaning for one
8439bccde3dSdrhor more tokens.  The alternative meaning is tried if the original token
8449a243e69Sdrhwould have generated a syntax error.</p>
8459bccde3dSdrh
8469a243e69Sdrh<p>The <tt>%fallback</tt> directive was added to support robust parsing of SQL
8479a243e69Sdrhsyntax in <a href='https://www.sqlite.org/'>SQLite</a>.
8489bccde3dSdrhThe SQL language contains a large assortment of keywords, each of which
8499bccde3dSdrhappears as a different token to the language parser.  SQL contains so
8509a243e69Sdrhmany keywords that it can be difficult for programmers to keep up with
8519bccde3dSdrhthem all.  Programmers will, therefore, sometimes mistakenly use an
8529a243e69Sdrhobscure language keyword for an identifier.  The <tt>%fallback</tt> directive
8539bccde3dSdrhprovides a mechanism to tell the parser:  "If you are unable to parse
8549a243e69Sdrhthis keyword, try treating it as an identifier instead."</p>
8559bccde3dSdrh
8567b852416Sdrh<p>The syntax of <tt>%fallback</tt> is as follows:</p>
8579bccde3dSdrh
8589bccde3dSdrh<blockquote>
8599bccde3dSdrh<tt>%fallback</tt> <i>ID</i> <i>TOKEN...</i> <b>.</b>
8609a243e69Sdrh</blockquote></p>
8619bccde3dSdrh
8629a243e69Sdrh<p>In words, the <tt>%fallback</tt> directive is followed by a list of token
8639a243e69Sdrhnames terminated by a period.
8649a243e69SdrhThe first token name is the fallback token &mdash; the
8659bccde3dSdrhtoken to which all the other tokens fall back to.  The second and subsequent
8669bccde3dSdrharguments are tokens which fall back to the token identified by the first
8679a243e69Sdrhargument.</p>
8689bccde3dSdrh
8697b852416Sdrh<a id='pifdef'></a>
87060c71b02Sdrh<h4>4.4.8 The <tt>%if</tt> directive and its friends</h4>
8719bccde3dSdrh
8725f0d37b5Sdrh<p>The <tt>%if</tt>, <tt>%ifdef</tt>, <tt>%ifndef</tt>, <tt>%else</tt>,
8735f0d37b5Sdrhand <tt>%endif</tt> directives
8745f0d37b5Sdrhare similar to #if, #ifdef, #ifndef, #else, and #endif in the C-preprocessor,
8759a243e69Sdrhjust not as general.
8769bccde3dSdrhEach of these directives must begin at the left margin.  No whitespace
8779a243e69Sdrhis allowed between the "%" and the directive name.</p>
8789bccde3dSdrh
8799a243e69Sdrh<p>Grammar text in between "<tt>%ifdef MACRO</tt>" and the next nested
8809a243e69Sdrh"<tt>%endif</tt>" is
8819bccde3dSdrhignored unless the "-DMACRO" command-line option is used.  Grammar text
8829a243e69Sdrhbetwen "<tt>%ifndef MACRO</tt>" and the next nested "<tt>%endif</tt>" is
8835f0d37b5Sdrhincluded except when the "-DMACRO" command-line option is used.<p>
8849bccde3dSdrh
8855f0d37b5Sdrh<p>The text in between "<tt>%if</tt> <i>CONDITIONAL</i>" and its
8865f0d37b5Sdrhcorresponding <tt>%endif</tt> is included only if <i>CONDITIONAL</i>
8875f0d37b5Sdrhis true.  The CONDITION is one or more macro names, optionally connected
8885f0d37b5Sdrhusing the "||" and "&amp;&amp;" binary operators, the "!" unary operator,
8895f0d37b5Sdrhand grouped using balanced parentheses.  Each term is true if the
8905f0d37b5Sdrhcorresponding macro exists, and false if it does not exist.</p>
8919bccde3dSdrh
8925f0d37b5Sdrh<p>An optional "<tt>%else</tt>" directive can occur anywhere in between a
8935f0d37b5Sdrh<tt>%ifdef</tt>, <tt>%ifndef</tt>, or <tt>%if</tt> directive and
8945f0d37b5Sdrhits corresponding <tt>%endif</tt>.</p>
8955f0d37b5Sdrh
8965f0d37b5Sdrh<p>Note that the argument to <tt>%ifdef</tt> and <tt>%ifndef</tt> is
8975f0d37b5Sdrhintended to be a single preprocessor symbol name, not a general expression.
8985f0d37b5SdrhUse the "<tt>%if</tt>" directive for general expressions.</p>
8999bccde3dSdrh
9007b852416Sdrh<a id='pinclude'></a>
90160c71b02Sdrh<h4>4.4.9 The <tt>%include</tt> directive</h4>
90275897234Sdrh
9039a243e69Sdrh<p>The <tt>%include</tt> directive specifies C code that is included at the
9049a243e69Sdrhtop of the generated parser.  You can include any text you want &mdash;
905f2340fc7Sdrhthe Lemon parser generator copies it blindly.  If you have multiple
9069a243e69Sdrh<tt>%include</tt> directives in your grammar file, their values are concatenated
9079a243e69Sdrhso that all <tt>%include</tt> code ultimately appears near the top of the
9089a243e69Sdrhgenerated parser, in the same order as it appeared in the grammar.</p>
90975897234Sdrh
9109a243e69Sdrh<p>The <tt>%include</tt> directive is very handy for getting some extra #include
91175897234Sdrhpreprocessor statements at the beginning of the generated parser.
91275897234SdrhFor example:</p>
91375897234Sdrh
9147b852416Sdrh<pre>
91575897234Sdrh   %include {#include &lt;unistd.h&gt;}
9167b852416Sdrh</pre>
91775897234Sdrh
91875897234Sdrh<p>This might be needed, for example, if some of the C actions in the
9199a243e69Sdrhgrammar call functions that are prototyped in unistd.h.</p>
92075897234Sdrh
92160ce5d31Sdrh<p>Use the <tt><a href="#pcode">%code</a></tt> directive to add code to
92260ce5d31Sdrhthe end of the generated parser.</p>
92360ce5d31Sdrh
9247b852416Sdrh<a id='pleft'></a>
92560c71b02Sdrh<h4>4.4.10 The <tt>%left</tt> directive</h4>
92675897234Sdrh
9279a243e69SdrhThe <tt>%left</tt> directive is used (along with the
9289a243e69Sdrh<tt><a href='#pright'>%right</a></tt> and
9299a243e69Sdrh<tt><a href='#pnonassoc'>%nonassoc</a></tt> directives) to declare
9309a243e69Sdrhprecedences of terminal symbols.
9319a243e69SdrhEvery terminal symbol whose name appears after
9329a243e69Sdrha <tt>%left</tt> directive but before the next period (".") is
93375897234Sdrhgiven the same left-associative precedence value.  Subsequent
9349a243e69Sdrh<tt>%left</tt> directives have higher precedence.  For example:</p>
93575897234Sdrh
9367b852416Sdrh<pre>
93775897234Sdrh   %left AND.
93875897234Sdrh   %left OR.
93975897234Sdrh   %nonassoc EQ NE GT GE LT LE.
94075897234Sdrh   %left PLUS MINUS.
94175897234Sdrh   %left TIMES DIVIDE MOD.
94275897234Sdrh   %right EXP NOT.
9437b852416Sdrh</pre>
94475897234Sdrh
9459a243e69Sdrh<p>Note the period that terminates each <tt>%left</tt>,
9469a243e69Sdrh<tt>%right</tt> or <tt>%nonassoc</tt>
94775897234Sdrhdirective.</p>
94875897234Sdrh
94975897234Sdrh<p>LALR(1) grammars can get into a situation where they require
95075897234Sdrha large amount of stack space if you make heavy use or right-associative
9519a243e69Sdrhoperators.  For this reason, it is recommended that you use <tt>%left</tt>
9529a243e69Sdrhrather than <tt>%right</tt> whenever possible.</p>
95375897234Sdrh
9547b852416Sdrh<a id='pname'></a>
95560c71b02Sdrh<h4>4.4.11 The <tt>%name</tt> directive</h4>
95675897234Sdrh
95775897234Sdrh<p>By default, the functions generated by Lemon all begin with the
9589bccde3dSdrhfive-character string "Parse".  You can change this string to something
9599a243e69Sdrhdifferent using the <tt>%name</tt> directive.  For instance:</p>
96075897234Sdrh
9617b852416Sdrh<pre>
96275897234Sdrh   %name Abcde
9637b852416Sdrh</pre>
96475897234Sdrh
96575897234Sdrh<p>Putting this directive in the grammar file will cause Lemon to generate
9667b852416Sdrhfunctions named</p>
96775897234Sdrh<ul>
96875897234Sdrh<li> AbcdeAlloc(),
96975897234Sdrh<li> AbcdeFree(),
97075897234Sdrh<li> AbcdeTrace(), and
97175897234Sdrh<li> Abcde().
97275897234Sdrh</ul>
9737b852416Sdrh</p>The <tt>%name</tt> directive allows you to generate two or more different
9749a243e69Sdrhparsers and link them all into the same executable.</p>
97575897234Sdrh
9767b852416Sdrh<a id='pnonassoc'></a>
97760c71b02Sdrh<h4>4.4.12 The <tt>%nonassoc</tt> directive</h4>
97875897234Sdrh
97975897234Sdrh<p>This directive is used to assign non-associative precedence to
9809bccde3dSdrhone or more terminal symbols.  See the section on
9819bccde3dSdrh<a href='#precrules'>precedence rules</a>
9829a243e69Sdrhor on the <tt><a href='#pleft'>%left</a></tt> directive
9839a243e69Sdrhfor additional information.</p>
98475897234Sdrh
9857b852416Sdrh<a id='parse_accept'></a>
98660c71b02Sdrh<h4>4.4.13 The <tt>%parse_accept</tt> directive</h4>
98775897234Sdrh
9889a243e69Sdrh<p>The <tt>%parse_accept</tt> directive specifies a block of C code that is
9899bccde3dSdrhexecuted whenever the parser accepts its input string.  To "accept"
99075897234Sdrhan input string means that the parser was able to process all tokens
99175897234Sdrhwithout error.</p>
99275897234Sdrh
99375897234Sdrh<p>For example:</p>
99475897234Sdrh
9957b852416Sdrh<pre>
99675897234Sdrh   %parse_accept {
99775897234Sdrh      printf("parsing complete!\n");
99875897234Sdrh   }
9997b852416Sdrh</pre>
100075897234Sdrh
10017b852416Sdrh<a id='parse_failure'></a>
100260c71b02Sdrh<h4>4.4.14 The <tt>%parse_failure</tt> directive</h4>
100375897234Sdrh
10049a243e69Sdrh<p>The <tt>%parse_failure</tt> directive specifies a block of C code that
100575897234Sdrhis executed whenever the parser fails complete.  This code is not
100675897234Sdrhexecuted until the parser has tried and failed to resolve an input
100775897234Sdrherror using is usual error recovery strategy.  The routine is
100875897234Sdrhonly invoked when parsing is unable to continue.</p>
100975897234Sdrh
10107b852416Sdrh<pre>
101175897234Sdrh   %parse_failure {
101275897234Sdrh     fprintf(stderr,"Giving up.  Parser is hopelessly lost...\n");
101375897234Sdrh   }
10147b852416Sdrh</pre>
101575897234Sdrh
10167b852416Sdrh<a id='pright'></a>
101760c71b02Sdrh<h4>4.4.15 The <tt>%right</tt> directive</h4>
101875897234Sdrh
101975897234Sdrh<p>This directive is used to assign right-associative precedence to
10209bccde3dSdrhone or more terminal symbols.  See the section on
10219bccde3dSdrh<a href='#precrules'>precedence rules</a>
10229bccde3dSdrhor on the <a href='#pleft'>%left</a> directive for additional information.</p>
102375897234Sdrh
10247b852416Sdrh<a id='stack_overflow'></a>
102560c71b02Sdrh<h4>4.4.16 The <tt>%stack_overflow</tt> directive</h4>
102675897234Sdrh
10279a243e69Sdrh<p>The <tt>%stack_overflow</tt> directive specifies a block of C code that
102875897234Sdrhis executed if the parser's internal stack ever overflows.  Typically
102975897234Sdrhthis just prints an error message.  After a stack overflow, the parser
103075897234Sdrhwill be unable to continue and must be reset.</p>
103175897234Sdrh
10327b852416Sdrh<pre>
103375897234Sdrh   %stack_overflow {
103475897234Sdrh     fprintf(stderr,"Giving up.  Parser stack overflow\n");
103575897234Sdrh   }
10367b852416Sdrh</pre>
103775897234Sdrh
103875897234Sdrh<p>You can help prevent parser stack overflows by avoiding the use
103975897234Sdrhof right recursion and right-precedence operators in your grammar.
10409a243e69SdrhUse left recursion and and left-precedence operators instead to
104175897234Sdrhencourage rules to reduce sooner and keep the stack size down.
10427b852416SdrhFor example, do rules like this:</p>
104375897234Sdrh<pre>
104475897234Sdrh   list ::= list element.      // left-recursion.  Good!
104575897234Sdrh   list ::= .
104675897234Sdrh</pre>
10477b852416Sdrh<p>Not like this:</p>
104875897234Sdrh<pre>
104975897234Sdrh   list ::= element list.      // right-recursion.  Bad!
105075897234Sdrh   list ::= .
10517b852416Sdrh</pre>
105275897234Sdrh
10537b852416Sdrh<a id='stack_size'></a>
105460c71b02Sdrh<h4>4.4.17 The <tt>%stack_size</tt> directive</h4>
105575897234Sdrh
105675897234Sdrh<p>If stack overflow is a problem and you can't resolve the trouble
105775897234Sdrhby using left-recursion, then you might want to increase the size
105875897234Sdrhof the parser's stack using this directive.  Put an positive integer
10599a243e69Sdrhafter the <tt>%stack_size</tt> directive and Lemon will generate a parse
106075897234Sdrhwith a stack of the requested size.  The default value is 100.</p>
106175897234Sdrh
10627b852416Sdrh<pre>
106375897234Sdrh   %stack_size 2000
10647b852416Sdrh</pre>
106575897234Sdrh
10667b852416Sdrh<a id='start_symbol'></a>
106760c71b02Sdrh<h4>4.4.18 The <tt>%start_symbol</tt> directive</h4>
106875897234Sdrh
10699a243e69Sdrh<p>By default, the start symbol for the grammar that Lemon generates
107075897234Sdrhis the first non-terminal that appears in the grammar file.  But you
10719a243e69Sdrhcan choose a different start symbol using the
10729a243e69Sdrh<tt>%start_symbol</tt> directive.</p>
107375897234Sdrh
10747b852416Sdrh<pre>
107575897234Sdrh   %start_symbol  prog
10767b852416Sdrh</pre>
107775897234Sdrh
10787b852416Sdrh<a id='syntax_error'></a>
107960c71b02Sdrh<h4>4.4.19 The <tt>%syntax_error</tt> directive</h4>
10809a243e69Sdrh
10811e2896ecSdrh<p>See <a href='#errors'>Error Processing</a>.</p>
10829a243e69Sdrh
1083*9cffb0ffSdrh<a id='token'></a>
1084*9cffb0ffSdrh<h4>4.4.20 The <tt>%token</tt> directive</h4>
1085*9cffb0ffSdrh
1086*9cffb0ffSdrh<p>Tokens are normally created automatically, the first time they are used.
1087*9cffb0ffSdrhAny identifier that begins with an upper-case letter is a token.
1088*9cffb0ffSdrh
1089*9cffb0ffSdrh<p>Sometimes it is useful to declare tokens in advance, however.  The
1090*9cffb0ffSdrhinteger values assigned to each token determined by the order in which
1091*9cffb0ffSdrhthe tokens are seen.  So by declaring tokens in advance, it is possible to
1092*9cffb0ffSdrhcause some tokens to have low-numbered values, which might be desirable in
1093*9cffb0ffSdrhsome grammers, or to have sequential values assigned to a sequence of
1094*9cffb0ffSdrhrelated tokens.  For this reason, the %token directive is provided to
1095*9cffb0ffSdrhdeclare tokens in advance.  The syntax is as follows:
1096*9cffb0ffSdrh
1097*9cffb0ffSdrh<blockquote>
1098*9cffb0ffSdrh<tt>%token</tt> <i>TOKEN</i> <i>TOKEN...</i> <b>.</b>
1099*9cffb0ffSdrh</blockquote></p>
1100*9cffb0ffSdrh
1101*9cffb0ffSdrh<p>The %token directive is followed by zero or more token symbols and
1102*9cffb0ffSdrhterminated by a single ".".  Each token named is created if it does not
1103*9cffb0ffSdrhalready exist.  Tokens are created in order.
1104*9cffb0ffSdrh
1105*9cffb0ffSdrh
11067b852416Sdrh<a id='token_class'></a>
1107*9cffb0ffSdrh<h4>4.4.21 The <tt>%token_class</tt> directive</h4>
11089a243e69Sdrh
11099a243e69Sdrh<p>Undocumented.  Appears to be related to the MULTITERMINAL concept.
11109a243e69Sdrh<a href='http://sqlite.org/src/fdiff?v1=796930d5fc2036c7&v2=624b24c5dc048e09&sbs=0'>Implementation</a>.</p>
11119a243e69Sdrh
11127b852416Sdrh<a id='token_destructor'></a>
1113*9cffb0ffSdrh<h4>4.4.22 The <tt>%token_destructor</tt> directive</h4>
111475897234Sdrh
11159a243e69Sdrh<p>The <tt>%destructor</tt> directive assigns a destructor to a non-terminal
11169a243e69Sdrhsymbol.  (See the description of the
11179a243e69Sdrh<tt><a href='%destructor'>%destructor</a></tt> directive above.)
11189a243e69SdrhThe <tt>%token_destructor</tt> directive does the same thing
11199a243e69Sdrhfor all terminal symbols.</p>
112075897234Sdrh
11217b852416Sdrh<p>Unlike non-terminal symbols, which may each have a different data type
112275897234Sdrhfor their values, terminals all use the same data type (defined by
11239a243e69Sdrhthe <tt><a href='#token_type'>%token_type</a></tt> directive)
11249a243e69Sdrhand so they use a common destructor.
11259a243e69SdrhOther than that, the token destructor works just like the non-terminal
112675897234Sdrhdestructors.</p>
112775897234Sdrh
11287b852416Sdrh<a id='token_prefix'></a>
1129*9cffb0ffSdrh<h4>4.4.23 The <tt>%token_prefix</tt> directive</h4>
113075897234Sdrh
113175897234Sdrh<p>Lemon generates #defines that assign small integer constants
113275897234Sdrhto each terminal symbol in the grammar.  If desired, Lemon will
113375897234Sdrhadd a prefix specified by this directive
11349a243e69Sdrhto each of the #defines it generates.</p>
11359a243e69Sdrh
11367b852416Sdrh<p>So if the default output of Lemon looked like this:</p>
113775897234Sdrh<pre>
113875897234Sdrh    #define AND              1
113975897234Sdrh    #define MINUS            2
114075897234Sdrh    #define OR               3
114175897234Sdrh    #define PLUS             4
114275897234Sdrh</pre>
11437b852416Sdrh<p>You can insert a statement into the grammar like this:</p>
114475897234Sdrh<pre>
114575897234Sdrh    %token_prefix    TOKEN_
114675897234Sdrh</pre>
11477b852416Sdrh<p>to cause Lemon to produce these symbols instead:</p>
114875897234Sdrh<pre>
114975897234Sdrh    #define TOKEN_AND        1
115075897234Sdrh    #define TOKEN_MINUS      2
115175897234Sdrh    #define TOKEN_OR         3
115275897234Sdrh    #define TOKEN_PLUS       4
11537b852416Sdrh</pre>
115475897234Sdrh
11557b852416Sdrh<a id='token_type'></a><a id='ptype'></a>
1156*9cffb0ffSdrh<h4>4.4.24 The <tt>%token_type</tt> and <tt>%type</tt> directives</h4>
115775897234Sdrh
115875897234Sdrh<p>These directives are used to specify the data types for values
115975897234Sdrhon the parser's stack associated with terminal and non-terminal
116075897234Sdrhsymbols.  The values of all terminal symbols must be of the same
116175897234Sdrhtype.  This turns out to be the same data type as the 3rd parameter
116275897234Sdrhto the Parse() function generated by Lemon.  Typically, you will
1163ed5e6688Sdrhmake the value of a terminal symbol be a pointer to some kind of
116475897234Sdrhtoken structure.  Like this:</p>
116575897234Sdrh
11667b852416Sdrh<pre>
116775897234Sdrh   %token_type    {Token*}
11687b852416Sdrh</pre>
116975897234Sdrh
117075897234Sdrh<p>If the data type of terminals is not specified, the default value
1171dfe4e6bbSdrhis "void*".</p>
117275897234Sdrh
117375897234Sdrh<p>Non-terminal symbols can each have their own data types.  Typically
11749a243e69Sdrhthe data type of a non-terminal is a pointer to the root of a parse tree
117575897234Sdrhstructure that contains all information about that non-terminal.
117675897234SdrhFor example:</p>
117775897234Sdrh
11787b852416Sdrh<pre>
117975897234Sdrh   %type   expr  {Expr*}
11807b852416Sdrh</pre>
118175897234Sdrh
118275897234Sdrh<p>Each entry on the parser's stack is actually a union containing
118375897234Sdrhinstances of all data types for every non-terminal and terminal symbol.
118475897234SdrhLemon will automatically use the correct element of this union depending
118575897234Sdrhon what the corresponding non-terminal or terminal symbol is.  But
118675897234Sdrhthe grammar designer should keep in mind that the size of the union
118775897234Sdrhwill be the size of its largest element.  So if you have a single
118875897234Sdrhnon-terminal whose data type requires 1K of storage, then your 100
118975897234Sdrhentry parser stack will require 100K of heap space.  If you are willing
119075897234Sdrhand able to pay that price, fine.  You just need to know.</p>
119175897234Sdrh
11927b852416Sdrh<a id='pwildcard'></a>
1193*9cffb0ffSdrh<h4>4.4.25 The <tt>%wildcard</tt> directive</h4>
11949bccde3dSdrh
11959a243e69Sdrh<p>The <tt>%wildcard</tt> directive is followed by a single token name and a
11969bccde3dSdrhperiod.  This directive specifies that the identified token should
11979a243e69Sdrhmatch any input token.</p>
11989bccde3dSdrh
11999bccde3dSdrh<p>When the generated parser has the choice of matching an input against
12009bccde3dSdrhthe wildcard token and some other token, the other token is always used.
12019a243e69SdrhThe wildcard token is only matched if there are no alternatives.</p>
12029bccde3dSdrh
12031e2896ecSdrh<a id='errors'></a>
120460c71b02Sdrh<h2>5.0 Error Processing</h2>
120575897234Sdrh
120675897234Sdrh<p>After extensive experimentation over several years, it has been
120775897234Sdrhdiscovered that the error recovery strategy used by yacc is about
120875897234Sdrhas good as it gets.  And so that is what Lemon uses.</p>
120975897234Sdrh
121075897234Sdrh<p>When a Lemon-generated parser encounters a syntax error, it
12119a243e69Sdrhfirst invokes the code specified by the <tt>%syntax_error</tt> directive, if
121275897234Sdrhany.  It then enters its error recovery strategy.  The error recovery
121375897234Sdrhstrategy is to begin popping the parsers stack until it enters a
121475897234Sdrhstate where it is permitted to shift a special non-terminal symbol
12159bccde3dSdrhnamed "error".  It then shifts this non-terminal and continues
12169a243e69Sdrhparsing.  The <tt>%syntax_error</tt> routine will not be called again
121775897234Sdrhuntil at least three new tokens have been successfully shifted.</p>
121875897234Sdrh
121975897234Sdrh<p>If the parser pops its stack until the stack is empty, and it still
12209a243e69Sdrhis unable to shift the error symbol, then the
12219a243e69Sdrh<tt><a href='#parse_failure'>%parse_failure</a></tt> routine
122275897234Sdrhis invoked and the parser resets itself to its start state, ready
122375897234Sdrhto begin parsing a new file.  This is what will happen at the very
122475897234Sdrhfirst syntax error, of course, if there are no instances of the
12259bccde3dSdrh"error" non-terminal in your grammar.</p>
122675897234Sdrh
122760c71b02Sdrh<a id='history'></a>
122860c71b02Sdrh<h2>6.0 History of Lemon</h2>
122960c71b02Sdrh
123060c71b02Sdrh<p>Lemon was originally written by Richard Hipp sometime in the late
123160c71b02Sdrh1980s on a Sun4 Workstation using K&amp;R C.
123260c71b02SdrhThere was a companion LL(1) parser generator program named "Lime", the
123360c71b02Sdrhsource code to which as been lost.</p>
123460c71b02Sdrh
123560c71b02Sdrh<p>The lemon.c source file was originally many separate files that were
123660c71b02Sdrhcompiled together to generate the "lemon" executable.  Sometime in the
123760c71b02Sdrh1990s, the individual source code files were combined together into
123860c71b02Sdrhthe current single large "lemon.c" source file.  You can still see traces
123960c71b02Sdrhof original filenames in the code.</p>
124060c71b02Sdrh
124160c71b02Sdrh<p>Since 2001, Lemon has been part of the
124260c71b02Sdrh<a href="https://sqlite.org/">SQLite project</a> and the source code
124360c71b02Sdrhto Lemon has been managed as a part of the
124460c71b02Sdrh<a href="https://sqlite.org/src">SQLite source tree</a> in the following
124560c71b02Sdrhfiles:</p>
124660c71b02Sdrh
124760c71b02Sdrh<ul>
124860c71b02Sdrh<li> <a href="https://sqlite.org/src/file/tool/lemon.c">tool/lemon.c</a>
124960c71b02Sdrh<li> <a href="https://sqlite.org/src/file/tool/lempar.c">tool/lempar.c</a>
125060c71b02Sdrh<li> <a href="https://sqlite.org/src/file/doc/lemon.html">doc/lemon.html</a>
125160c71b02Sdrh</ul>
125260c71b02Sdrh
125360c71b02Sdrh<a id="copyright"></a>
125460c71b02Sdrh<h2>7.0 Copyright</h2>
125560c71b02Sdrh
125660c71b02Sdrh<p>All of the source code to Lemon, including the template parser file
125760c71b02Sdrh"lempar.c" and this documentation file ("lemon.html") are in the public
125860c71b02Sdrhdomain.  You can use the code for any purpose and without attribution.</p>
125960c71b02Sdrh
126060c71b02Sdrh<p>The code comes with no warranty.  If it breaks, you get to keep both
126160c71b02Sdrhpieces.</p>
126260c71b02Sdrh
126375897234Sdrh</body>
126475897234Sdrh</html>
1265