175897234Sdrh<html> 275897234Sdrh<head> 375897234Sdrh<title>The Lemon Parser Generator</title> 475897234Sdrh</head> 575897234Sdrh<body bgcolor=white> 675897234Sdrh<h1 align=center>The Lemon Parser Generator</h1> 775897234Sdrh 875897234Sdrh<p>Lemon is an LALR(1) parser generator for C or C++. 975897234SdrhIt does the same job as ``bison'' and ``yacc''. 1075897234SdrhBut lemon is not another bison or yacc clone. It 1175897234Sdrhuses a different grammar syntax which is designed to 1275897234Sdrhreduce the number of coding errors. Lemon also uses a more 1375897234Sdrhsophisticated parsing engine that is faster than yacc and 1475897234Sdrhbison and which is both reentrant and thread-safe. 1575897234SdrhFurthermore, Lemon implements features that can be used 1675897234Sdrhto eliminate resource leaks, making is suitable for use 1775897234Sdrhin long-running programs such as graphical user interfaces 1875897234Sdrhor embedded controllers.</p> 1975897234Sdrh 2075897234Sdrh<p>This document is an introduction to the Lemon 2175897234Sdrhparser generator.</p> 2275897234Sdrh 2375897234Sdrh<h2>Theory of Operation</h2> 2475897234Sdrh 2575897234Sdrh<p>The main goal of Lemon is to translate a context free grammar (CFG) 2675897234Sdrhfor a particular language into C code that implements a parser for 2775897234Sdrhthat language. 2875897234SdrhThe program has two inputs: 2975897234Sdrh<ul> 3075897234Sdrh<li>The grammar specification. 3175897234Sdrh<li>A parser template file. 3275897234Sdrh</ul> 3375897234SdrhTypically, only the grammar specification is supplied by the programmer. 3475897234SdrhLemon comes with a default parser template which works fine for most 3575897234Sdrhapplications. But the user is free to substitute a different parser 3675897234Sdrhtemplate if desired.</p> 3775897234Sdrh 3875897234Sdrh<p>Depending on command-line options, Lemon will generate between 3975897234Sdrhone and three files of outputs. 4075897234Sdrh<ul> 4175897234Sdrh<li>C code to implement the parser. 4275897234Sdrh<li>A header file defining an integer ID for each terminal symbol. 4375897234Sdrh<li>An information file that describes the states of the generated parser 4475897234Sdrh automaton. 4575897234Sdrh</ul> 4675897234SdrhBy default, all three of these output files are generated. 4775897234SdrhThe header file is suppressed if the ``-m'' command-line option is 4875897234Sdrhused and the report file is omitted when ``-q'' is selected.</p> 4975897234Sdrh 5075897234Sdrh<p>The grammar specification file uses a ``.y'' suffix, by convention. 5175897234SdrhIn the examples used in this document, we'll assume the name of the 5275897234Sdrhgrammar file is ``gram.y''. A typical use of Lemon would be the 5375897234Sdrhfollowing command: 5475897234Sdrh<pre> 5575897234Sdrh lemon gram.y 5675897234Sdrh</pre> 5775897234SdrhThis command will generate three output files named ``gram.c'', 5875897234Sdrh``gram.h'' and ``gram.out''. 5975897234SdrhThe first is C code to implement the parser. The second 6075897234Sdrhis the header file that defines numerical values for all 6175897234Sdrhterminal symbols, and the last is the report that explains 6275897234Sdrhthe states used by the parser automaton.</p> 6375897234Sdrh 6475897234Sdrh<h3>Command Line Options</h3> 6575897234Sdrh 6675897234Sdrh<p>The behavior of Lemon can be modified using command-line options. 6775897234SdrhYou can obtain a list of the available command-line options together 6875897234Sdrhwith a brief explanation of what each does by typing 6975897234Sdrh<pre> 7075897234Sdrh lemon -? 7175897234Sdrh</pre> 7275897234SdrhAs of this writing, the following command-line options are supported: 7375897234Sdrh<ul> 7475897234Sdrh<li><tt>-b</tt> 7575897234Sdrh<li><tt>-c</tt> 7675897234Sdrh<li><tt>-g</tt> 7775897234Sdrh<li><tt>-m</tt> 7875897234Sdrh<li><tt>-q</tt> 7975897234Sdrh<li><tt>-s</tt> 8075897234Sdrh<li><tt>-x</tt> 8175897234Sdrh</ul> 8275897234SdrhThe ``-b'' option reduces the amount of text in the report file by 8375897234Sdrhprinting only the basis of each parser state, rather than the full 8475897234Sdrhconfiguration. 8575897234SdrhThe ``-c'' option suppresses action table compression. Using -c 8675897234Sdrhwill make the parser a little larger and slower but it will detect 8775897234Sdrhsyntax errors sooner. 8875897234SdrhThe ``-g'' option causes no output files to be generated at all. 8975897234SdrhInstead, the input grammar file is printed on standard output but 9075897234Sdrhwith all comments, actions and other extraneous text deleted. This 9175897234Sdrhis a useful way to get a quick summary of a grammar. 9275897234SdrhThe ``-m'' option causes the output C source file to be compatible 9375897234Sdrhwith the ``makeheaders'' program. 9475897234SdrhMakeheaders is a program that automatically generates header files 9575897234Sdrhfrom C source code. When the ``-m'' option is used, the header 9675897234Sdrhfile is not output since the makeheaders program will take care 9775897234Sdrhof generated all header files automatically. 9875897234SdrhThe ``-q'' option suppresses the report file. 9975897234SdrhUsing ``-s'' causes a brief summary of parser statistics to be 10075897234Sdrhprinted. Like this: 10175897234Sdrh<pre> 10275897234Sdrh Parser statistics: 74 terminals, 70 nonterminals, 179 rules 10375897234Sdrh 340 states, 2026 parser table entries, 0 conflicts 10475897234Sdrh</pre> 10575897234SdrhFinally, the ``-x'' option causes Lemon to print its version number 106b19a2bc6Sdrhand then stops without attempting to read the grammar or generate a parser.</p> 10775897234Sdrh 10875897234Sdrh<h3>The Parser Interface</h3> 10975897234Sdrh 11075897234Sdrh<p>Lemon doesn't generate a complete, working program. It only generates 11175897234Sdrha few subroutines that implement a parser. This section describes 11275897234Sdrhthe interface to those subroutines. It is up to the programmer to 11375897234Sdrhcall these subroutines in an appropriate way in order to produce a 11475897234Sdrhcomplete system.</p> 11575897234Sdrh 11675897234Sdrh<p>Before a program begins using a Lemon-generated parser, the program 11775897234Sdrhmust first create the parser. 11875897234SdrhA new parser is created as follows: 11975897234Sdrh<pre> 12075897234Sdrh void *pParser = ParseAlloc( malloc ); 12175897234Sdrh</pre> 12275897234SdrhThe ParseAlloc() routine allocates and initializes a new parser and 12375897234Sdrhreturns a pointer to it. 12475897234SdrhThe actual data structure used to represent a parser is opaque -- 12575897234Sdrhits internal structure is not visible or usable by the calling routine. 12675897234SdrhFor this reason, the ParseAlloc() routine returns a pointer to void 12775897234Sdrhrather than a pointer to some particular structure. 12875897234SdrhThe sole argument to the ParseAlloc() routine is a pointer to the 12975897234Sdrhsubroutine used to allocate memory. Typically this means ``malloc()''.</p> 13075897234Sdrh 13175897234Sdrh<p>After a program is finished using a parser, it can reclaim all 13275897234Sdrhmemory allocated by that parser by calling 13375897234Sdrh<pre> 13475897234Sdrh ParseFree(pParser, free); 13575897234Sdrh</pre> 13675897234SdrhThe first argument is the same pointer returned by ParseAlloc(). The 13775897234Sdrhsecond argument is a pointer to the function used to release bulk 13875897234Sdrhmemory back to the system.</p> 13975897234Sdrh 14075897234Sdrh<p>After a parser has been allocated using ParseAlloc(), the programmer 14175897234Sdrhmust supply the parser with a sequence of tokens (terminal symbols) to 14275897234Sdrhbe parsed. This is accomplished by calling the following function 14375897234Sdrhonce for each token: 14475897234Sdrh<pre> 14575897234Sdrh Parse(pParser, hTokenID, sTokenData, pArg); 14675897234Sdrh</pre> 14775897234SdrhThe first argument to the Parse() routine is the pointer returned by 14875897234SdrhParseAlloc(). 14975897234SdrhThe second argument is a small positive integer that tells the parse the 15075897234Sdrhtype of the next token in the data stream. 15175897234SdrhThere is one token type for each terminal symbol in the grammar. 15275897234SdrhThe gram.h file generated by Lemon contains #define statements that 15375897234Sdrhmap symbolic terminal symbol names into appropriate integer values. 15475897234Sdrh(A value of 0 for the second argument is a special flag to the 15575897234Sdrhparser to indicate that the end of input has been reached.) 15675897234SdrhThe third argument is the value of the given token. By default, 15775897234Sdrhthe type of the third argument is integer, but the grammar will 15875897234Sdrhusually redefine this type to be some kind of structure. 15975897234SdrhTypically the second argument will be a broad category of tokens 16075897234Sdrhsuch as ``identifier'' or ``number'' and the third argument will 16175897234Sdrhbe the name of the identifier or the value of the number.</p> 16275897234Sdrh 16375897234Sdrh<p>The Parse() function may have either three or four arguments, 16475897234Sdrhdepending on the grammar. If the grammar specification file request 16575897234Sdrhit, the Parse() function will have a fourth parameter that can be 16675897234Sdrhof any type chosen by the programmer. The parser doesn't do anything 16775897234Sdrhwith this argument except to pass it through to action routines. 16875897234SdrhThis is a convenient mechanism for passing state information down 16975897234Sdrhto the action routines without having to use global variables.</p> 17075897234Sdrh 17175897234Sdrh<p>A typical use of a Lemon parser might look something like the 17275897234Sdrhfollowing: 17375897234Sdrh<pre> 17475897234Sdrh 01 ParseTree *ParseFile(const char *zFilename){ 17575897234Sdrh 02 Tokenizer *pTokenizer; 17675897234Sdrh 03 void *pParser; 17775897234Sdrh 04 Token sToken; 17875897234Sdrh 05 int hTokenId; 17975897234Sdrh 06 ParserState sState; 18075897234Sdrh 07 18175897234Sdrh 08 pTokenizer = TokenizerCreate(zFilename); 18275897234Sdrh 09 pParser = ParseAlloc( malloc ); 18375897234Sdrh 10 InitParserState(&sState); 18475897234Sdrh 11 while( GetNextToken(pTokenizer, &hTokenId, &sToken) ){ 18575897234Sdrh 12 Parse(pParser, hTokenId, sToken, &sState); 18675897234Sdrh 13 } 18775897234Sdrh 14 Parse(pParser, 0, sToken, &sState); 18875897234Sdrh 15 ParseFree(pParser, free ); 18975897234Sdrh 16 TokenizerFree(pTokenizer); 19075897234Sdrh 17 return sState.treeRoot; 19175897234Sdrh 18 } 19275897234Sdrh</pre> 19375897234SdrhThis example shows a user-written routine that parses a file of 19475897234Sdrhtext and returns a pointer to the parse tree. 19575897234Sdrh(We've omitted all error-handling from this example to keep it 19675897234Sdrhsimple.) 19775897234SdrhWe assume the existence of some kind of tokenizer which is created 19875897234Sdrhusing TokenizerCreate() on line 8 and deleted by TokenizerFree() 19975897234Sdrhon line 16. The GetNextToken() function on line 11 retrieves the 20075897234Sdrhnext token from the input file and puts its type in the 20175897234Sdrhinteger variable hTokenId. The sToken variable is assumed to be 20275897234Sdrhsome kind of structure that contains details about each token, 20375897234Sdrhsuch as its complete text, what line it occurs on, etc. </p> 20475897234Sdrh 20575897234Sdrh<p>This example also assumes the existence of structure of type 20675897234SdrhParserState that holds state information about a particular parse. 20775897234SdrhAn instance of such a structure is created on line 6 and initialized 20875897234Sdrhon line 10. A pointer to this structure is passed into the Parse() 20975897234Sdrhroutine as the optional 4th argument. 21075897234SdrhThe action routine specified by the grammar for the parser can use 21175897234Sdrhthe ParserState structure to hold whatever information is useful and 21275897234Sdrhappropriate. In the example, we note that the treeRoot field of 21375897234Sdrhthe ParserState structure is left pointing to the root of the parse 21475897234Sdrhtree.</p> 21575897234Sdrh 21675897234Sdrh<p>The core of this example as it relates to Lemon is as follows: 21775897234Sdrh<pre> 21875897234Sdrh ParseFile(){ 21975897234Sdrh pParser = ParseAlloc( malloc ); 22075897234Sdrh while( GetNextToken(pTokenizer,&hTokenId, &sToken) ){ 22175897234Sdrh Parse(pParser, hTokenId, sToken); 22275897234Sdrh } 22375897234Sdrh Parse(pParser, 0, sToken); 22475897234Sdrh ParseFree(pParser, free ); 22575897234Sdrh } 22675897234Sdrh</pre> 22775897234SdrhBasically, what a program has to do to use a Lemon-generated parser 22875897234Sdrhis first create the parser, then send it lots of tokens obtained by 22975897234Sdrhtokenizing an input source. When the end of input is reached, the 23075897234SdrhParse() routine should be called one last time with a token type 23175897234Sdrhof 0. This step is necessary to inform the parser that the end of 23275897234Sdrhinput has been reached. Finally, we reclaim memory used by the 23375897234Sdrhparser by calling ParseFree().</p> 23475897234Sdrh 23575897234Sdrh<p>There is one other interface routine that should be mentioned 23675897234Sdrhbefore we move on. 23775897234SdrhThe ParseTrace() function can be used to generate debugging output 23875897234Sdrhfrom the parser. A prototype for this routine is as follows: 23975897234Sdrh<pre> 24075897234Sdrh ParseTrace(FILE *stream, char *zPrefix); 24175897234Sdrh</pre> 24275897234SdrhAfter this routine is called, a short (one-line) message is written 24375897234Sdrhto the designated output stream every time the parser changes states 24475897234Sdrhor calls an action routine. Each such message is prefaced using 24575897234Sdrhthe text given by zPrefix. This debugging output can be turned off 24675897234Sdrhby calling ParseTrace() again with a first argument of NULL (0).</p> 24775897234Sdrh 24875897234Sdrh<h3>Differences With YACC and BISON</h3> 24975897234Sdrh 25075897234Sdrh<p>Programmers who have previously used the yacc or bison parser 25175897234Sdrhgenerator will notice several important differences between yacc and/or 25275897234Sdrhbison and Lemon. 25375897234Sdrh<ul> 25475897234Sdrh<li>In yacc and bison, the parser calls the tokenizer. In Lemon, 25575897234Sdrh the tokenizer calls the parser. 25675897234Sdrh<li>Lemon uses no global variables. Yacc and bison use global variables 25775897234Sdrh to pass information between the tokenizer and parser. 25875897234Sdrh<li>Lemon allows multiple parsers to be running simultaneously. Yacc 25975897234Sdrh and bison do not. 26075897234Sdrh</ul> 26175897234SdrhThese differences may cause some initial confusion for programmers 26275897234Sdrhwith prior yacc and bison experience. 26375897234SdrhBut after years of experience using Lemon, I firmly 26475897234Sdrhbelieve that the Lemon way of doing things is better.</p> 26575897234Sdrh 26675897234Sdrh<h2>Input File Syntax</h2> 26775897234Sdrh 26875897234Sdrh<p>The main purpose of the grammar specification file for Lemon is 26975897234Sdrhto define the grammar for the parser. But the input file also 27075897234Sdrhspecifies additional information Lemon requires to do its job. 27175897234SdrhMost of the work in using Lemon is in writing an appropriate 27275897234Sdrhgrammar file.</p> 27375897234Sdrh 27475897234Sdrh<p>The grammar file for lemon is, for the most part, free format. 27575897234SdrhIt does not have sections or divisions like yacc or bison. Any 27675897234Sdrhdeclaration can occur at any point in the file. 27775897234SdrhLemon ignores whitespace (except where it is needed to separate 27875897234Sdrhtokens) and it honors the same commenting conventions as C and C++.</p> 27975897234Sdrh 28075897234Sdrh<h3>Terminals and Nonterminals</h3> 28175897234Sdrh 28275897234Sdrh<p>A terminal symbol (token) is any string of alphanumeric 28375897234Sdrhand underscore characters 28475897234Sdrhthat begins with an upper case letter. 285*c8eee5e5SdrhA terminal can contain lowercase letters after the first character, 28675897234Sdrhbut the usual convention is to make terminals all upper case. 28775897234SdrhA nonterminal, on the other hand, is any string of alphanumeric 28875897234Sdrhand underscore characters than begins with a lower case letter. 28975897234SdrhAgain, the usual convention is to make nonterminals use all lower 29075897234Sdrhcase letters.</p> 29175897234Sdrh 29275897234Sdrh<p>In Lemon, terminal and nonterminal symbols do not need to 29375897234Sdrhbe declared or identified in a separate section of the grammar file. 29475897234SdrhLemon is able to generate a list of all terminals and nonterminals 29575897234Sdrhby examining the grammar rules, and it can always distinguish a 29675897234Sdrhterminal from a nonterminal by checking the case of the first 29775897234Sdrhcharacter of the name.</p> 29875897234Sdrh 29975897234Sdrh<p>Yacc and bison allow terminal symbols to have either alphanumeric 30075897234Sdrhnames or to be individual characters included in single quotes, like 30175897234Sdrhthis: ')' or '$'. Lemon does not allow this alternative form for 30275897234Sdrhterminal symbols. With Lemon, all symbols, terminals and nonterminals, 30375897234Sdrhmust have alphanumeric names.</p> 30475897234Sdrh 30575897234Sdrh<h3>Grammar Rules</h3> 30675897234Sdrh 30775897234Sdrh<p>The main component of a Lemon grammar file is a sequence of grammar 30875897234Sdrhrules. 30975897234SdrhEach grammar rule consists of a nonterminal symbol followed by 31075897234Sdrhthe special symbol ``::='' and then a list of terminals and/or nonterminals. 31175897234SdrhThe rule is terminated by a period. 31275897234SdrhThe list of terminals and nonterminals on the right-hand side of the 31375897234Sdrhrule can be empty. 31475897234SdrhRules can occur in any order, except that the left-hand side of the 31575897234Sdrhfirst rule is assumed to be the start symbol for the grammar (unless 31675897234Sdrhspecified otherwise using the <tt>%start</tt> directive described below.) 31775897234SdrhA typical sequence of grammar rules might look something like this: 31875897234Sdrh<pre> 31975897234Sdrh expr ::= expr PLUS expr. 32075897234Sdrh expr ::= expr TIMES expr. 32175897234Sdrh expr ::= LPAREN expr RPAREN. 32275897234Sdrh expr ::= VALUE. 32375897234Sdrh</pre> 32475897234Sdrh</p> 32575897234Sdrh 32675897234Sdrh<p>There is one non-terminal in this example, ``expr'', and five 32775897234Sdrhterminal symbols or tokens: ``PLUS'', ``TIMES'', ``LPAREN'', 32875897234Sdrh``RPAREN'' and ``VALUE''.</p> 32975897234Sdrh 33075897234Sdrh<p>Like yacc and bison, Lemon allows the grammar to specify a block 33175897234Sdrhof C code that will be executed whenever a grammar rule is reduced 33275897234Sdrhby the parser. 33375897234SdrhIn Lemon, this action is specified by putting the C code (contained 33475897234Sdrhwithin curly braces <tt>{...}</tt>) immediately after the 33575897234Sdrhperiod that closes the rule. 33675897234SdrhFor example: 33775897234Sdrh<pre> 33875897234Sdrh expr ::= expr PLUS expr. { printf("Doing an addition...\n"); } 33975897234Sdrh</pre> 34075897234Sdrh</p> 34175897234Sdrh 34275897234Sdrh<p>In order to be useful, grammar actions must normally be linked to 34375897234Sdrhtheir associated grammar rules. 34475897234SdrhIn yacc and bison, this is accomplished by embedding a ``$$'' in the 34575897234Sdrhaction to stand for the value of the left-hand side of the rule and 34675897234Sdrhsymbols ``$1'', ``$2'', and so forth to stand for the value of 34775897234Sdrhthe terminal or nonterminal at position 1, 2 and so forth on the 34875897234Sdrhright-hand side of the rule. 34975897234SdrhThis idea is very powerful, but it is also very error-prone. The 35075897234Sdrhsingle most common source of errors in a yacc or bison grammar is 35175897234Sdrhto miscount the number of symbols on the right-hand side of a grammar 35275897234Sdrhrule and say ``$7'' when you really mean ``$8''.</p> 35375897234Sdrh 35475897234Sdrh<p>Lemon avoids the need to count grammar symbols by assigning symbolic 35575897234Sdrhnames to each symbol in a grammar rule and then using those symbolic 35675897234Sdrhnames in the action. 35775897234SdrhIn yacc or bison, one would write this: 35875897234Sdrh<pre> 35975897234Sdrh expr -> expr PLUS expr { $$ = $1 + $3; }; 36075897234Sdrh</pre> 36175897234SdrhBut in Lemon, the same rule becomes the following: 36275897234Sdrh<pre> 36375897234Sdrh expr(A) ::= expr(B) PLUS expr(C). { A = B+C; } 36475897234Sdrh</pre> 36575897234SdrhIn the Lemon rule, any symbol in parentheses after a grammar rule 36675897234Sdrhsymbol becomes a place holder for that symbol in the grammar rule. 36775897234SdrhThis place holder can then be used in the associated C action to 36875897234Sdrhstand for the value of that symbol.<p> 36975897234Sdrh 37075897234Sdrh<p>The Lemon notation for linking a grammar rule with its reduce 37175897234Sdrhaction is superior to yacc/bison on several counts. 37275897234SdrhFirst, as mentioned above, the Lemon method avoids the need to 37375897234Sdrhcount grammar symbols. 37475897234SdrhSecondly, if a terminal or nonterminal in a Lemon grammar rule 37575897234Sdrhincludes a linking symbol in parentheses but that linking symbol 37675897234Sdrhis not actually used in the reduce action, then an error message 37775897234Sdrhis generated. 37875897234SdrhFor example, the rule 37975897234Sdrh<pre> 38075897234Sdrh expr(A) ::= expr(B) PLUS expr(C). { A = B; } 38175897234Sdrh</pre> 38275897234Sdrhwill generate an error because the linking symbol ``C'' is used 38375897234Sdrhin the grammar rule but not in the reduce action.</p> 38475897234Sdrh 38575897234Sdrh<p>The Lemon notation for linking grammar rules to reduce actions 38675897234Sdrhalso facilitates the use of destructors for reclaiming memory 38775897234Sdrhallocated by the values of terminals and nonterminals on the 38875897234Sdrhright-hand side of a rule.</p> 38975897234Sdrh 39075897234Sdrh<h3>Precedence Rules</h3> 39175897234Sdrh 39275897234Sdrh<p>Lemon resolves parsing ambiguities in exactly the same way as 39375897234Sdrhyacc and bison. A shift-reduce conflict is resolved in favor 39475897234Sdrhof the shift, and a reduce-reduce conflict is resolved by reducing 39575897234Sdrhwhichever rule comes first in the grammar file.</p> 39675897234Sdrh 39775897234Sdrh<p>Just like in 39875897234Sdrhyacc and bison, Lemon allows a measure of control 39975897234Sdrhover the resolution of paring conflicts using precedence rules. 40075897234SdrhA precedence value can be assigned to any terminal symbol 40175897234Sdrhusing the %left, %right or %nonassoc directives. Terminal symbols 40275897234Sdrhmentioned in earlier directives have a lower precedence that 40375897234Sdrhterminal symbols mentioned in later directives. For example:</p> 40475897234Sdrh 40575897234Sdrh<p><pre> 40675897234Sdrh %left AND. 40775897234Sdrh %left OR. 40875897234Sdrh %nonassoc EQ NE GT GE LT LE. 40975897234Sdrh %left PLUS MINUS. 41075897234Sdrh %left TIMES DIVIDE MOD. 41175897234Sdrh %right EXP NOT. 41275897234Sdrh</pre></p> 41375897234Sdrh 41475897234Sdrh<p>In the preceding sequence of directives, the AND operator is 41575897234Sdrhdefined to have the lowest precedence. The OR operator is one 41675897234Sdrhprecedence level higher. And so forth. Hence, the grammar would 41775897234Sdrhattempt to group the ambiguous expression 41875897234Sdrh<pre> 41975897234Sdrh a AND b OR c 42075897234Sdrh</pre> 42175897234Sdrhlike this 42275897234Sdrh<pre> 42375897234Sdrh a AND (b OR c). 42475897234Sdrh</pre> 42575897234SdrhThe associativity (left, right or nonassoc) is used to determine 42675897234Sdrhthe grouping when the precedence is the same. AND is left-associative 42775897234Sdrhin our example, so 42875897234Sdrh<pre> 42975897234Sdrh a AND b AND c 43075897234Sdrh</pre> 43175897234Sdrhis parsed like this 43275897234Sdrh<pre> 43375897234Sdrh (a AND b) AND c. 43475897234Sdrh</pre> 43575897234SdrhThe EXP operator is right-associative, though, so 43675897234Sdrh<pre> 43775897234Sdrh a EXP b EXP c 43875897234Sdrh</pre> 43975897234Sdrhis parsed like this 44075897234Sdrh<pre> 44175897234Sdrh a EXP (b EXP c). 44275897234Sdrh</pre> 44375897234SdrhThe nonassoc precedence is used for non-associative operators. 44475897234SdrhSo 44575897234Sdrh<pre> 44675897234Sdrh a EQ b EQ c 44775897234Sdrh</pre> 44875897234Sdrhis an error.</p> 44975897234Sdrh 45075897234Sdrh<p>The precedence of non-terminals is transferred to rules as follows: 45175897234SdrhThe precedence of a grammar rule is equal to the precedence of the 45275897234Sdrhleft-most terminal symbol in the rule for which a precedence is 45375897234Sdrhdefined. This is normally what you want, but in those cases where 45475897234Sdrhyou want to precedence of a grammar rule to be something different, 45575897234Sdrhyou can specify an alternative precedence symbol by putting the 45675897234Sdrhsymbol in square braces after the period at the end of the rule and 45775897234Sdrhbefore any C-code. For example:</p> 45875897234Sdrh 45975897234Sdrh<p><pre> 46075897234Sdrh expr = MINUS expr. [NOT] 46175897234Sdrh</pre></p> 46275897234Sdrh 46375897234Sdrh<p>This rule has a precedence equal to that of the NOT symbol, not the 46475897234SdrhMINUS symbol as would have been the case by default.</p> 46575897234Sdrh 46675897234Sdrh<p>With the knowledge of how precedence is assigned to terminal 46775897234Sdrhsymbols and individual 46875897234Sdrhgrammar rules, we can now explain precisely how parsing conflicts 46975897234Sdrhare resolved in Lemon. Shift-reduce conflicts are resolved 47075897234Sdrhas follows: 47175897234Sdrh<ul> 47275897234Sdrh<li> If either the token to be shifted or the rule to be reduced 47375897234Sdrh lacks precedence information, then resolve in favor of the 47475897234Sdrh shift, but report a parsing conflict. 47575897234Sdrh<li> If the precedence of the token to be shifted is greater than 47675897234Sdrh the precedence of the rule to reduce, then resolve in favor 47775897234Sdrh of the shift. No parsing conflict is reported. 47875897234Sdrh<li> If the precedence of the token it be shifted is less than the 47975897234Sdrh precedence of the rule to reduce, then resolve in favor of the 48075897234Sdrh reduce action. No parsing conflict is reported. 48175897234Sdrh<li> If the precedences are the same and the shift token is 48275897234Sdrh right-associative, then resolve in favor of the shift. 48375897234Sdrh No parsing conflict is reported. 48475897234Sdrh<li> If the precedences are the same the the shift token is 48575897234Sdrh left-associative, then resolve in favor of the reduce. 48675897234Sdrh No parsing conflict is reported. 48775897234Sdrh<li> Otherwise, resolve the conflict by doing the shift and 48875897234Sdrh report the parsing conflict. 48975897234Sdrh</ul> 49075897234SdrhReduce-reduce conflicts are resolved this way: 49175897234Sdrh<ul> 49275897234Sdrh<li> If either reduce rule 49375897234Sdrh lacks precedence information, then resolve in favor of the 49475897234Sdrh rule that appears first in the grammar and report a parsing 49575897234Sdrh conflict. 49675897234Sdrh<li> If both rules have precedence and the precedence is different 49775897234Sdrh then resolve the dispute in favor of the rule with the highest 49875897234Sdrh precedence and do not report a conflict. 49975897234Sdrh<li> Otherwise, resolve the conflict by reducing by the rule that 50075897234Sdrh appears first in the grammar and report a parsing conflict. 50175897234Sdrh</ul> 50275897234Sdrh 50375897234Sdrh<h3>Special Directives</h3> 50475897234Sdrh 50575897234Sdrh<p>The input grammar to Lemon consists of grammar rules and special 50675897234Sdrhdirectives. We've described all the grammar rules, so now we'll 50775897234Sdrhtalk about the special directives.</p> 50875897234Sdrh 50975897234Sdrh<p>Directives in lemon can occur in any order. You can put them before 51075897234Sdrhthe grammar rules, or after the grammar rules, or in the mist of the 51175897234Sdrhgrammar rules. It doesn't matter. The relative order of 51275897234Sdrhdirectives used to assign precedence to terminals is important, but 51375897234Sdrhother than that, the order of directives in Lemon is arbitrary.</p> 51475897234Sdrh 51575897234Sdrh<p>Lemon supports the following special directives: 51675897234Sdrh<ul> 517f2340fc7Sdrh<li><tt>%code</tt> 518f2340fc7Sdrh<li><tt>%default_destructor</tt> 519f2340fc7Sdrh<li><tt>%default_type</tt> 52075897234Sdrh<li><tt>%destructor</tt> 52175897234Sdrh<li><tt>%extra_argument</tt> 52275897234Sdrh<li><tt>%include</tt> 52375897234Sdrh<li><tt>%left</tt> 52475897234Sdrh<li><tt>%name</tt> 52575897234Sdrh<li><tt>%nonassoc</tt> 52675897234Sdrh<li><tt>%parse_accept</tt> 52775897234Sdrh<li><tt>%parse_failure </tt> 52875897234Sdrh<li><tt>%right</tt> 52975897234Sdrh<li><tt>%stack_overflow</tt> 53075897234Sdrh<li><tt>%stack_size</tt> 53175897234Sdrh<li><tt>%start_symbol</tt> 53275897234Sdrh<li><tt>%syntax_error</tt> 53375897234Sdrh<li><tt>%token_destructor</tt> 53475897234Sdrh<li><tt>%token_prefix</tt> 53575897234Sdrh<li><tt>%token_type</tt> 53675897234Sdrh<li><tt>%type</tt> 53775897234Sdrh</ul> 53875897234SdrhEach of these directives will be described separately in the 53975897234Sdrhfollowing sections:</p> 54075897234Sdrh 541f2340fc7Sdrh<h4>The <tt>%code</tt> directive</h4> 542f2340fc7Sdrh 543f2340fc7Sdrh<p>The %code directive is used to specify addition C/C++ code that 544f2340fc7Sdrhis added to the end of the main output file. This is similar to 545f2340fc7Sdrhthe %include directive except that %include is inserted at the 546f2340fc7Sdrhbeginning of the main output file.</p> 547f2340fc7Sdrh 548f2340fc7Sdrh<p>%code is typically used to include some action routines or perhaps 549f2340fc7Sdrha tokenizer as part of the output file.</p> 550f2340fc7Sdrh 551f2340fc7Sdrh<h4>The <tt>%default_destructor</tt> directive</h4> 552f2340fc7Sdrh 553f2340fc7Sdrh<p>The %default_destructor directive specifies a destructor to 554f2340fc7Sdrhuse for non-terminals that do not have their own destructor 555f2340fc7Sdrhspecified by a separate %destructor directive. See the documentation 556f2340fc7Sdrhon the %destructor directive below for additional information.</p> 557f2340fc7Sdrh 558f2340fc7Sdrh<p>In some grammers, many different non-terminal symbols have the 559f2340fc7Sdrhsame datatype and hence the same destructor. This directive is 560f2340fc7Sdrha convenience way to specify the same destructor for all those 561f2340fc7Sdrhnon-terminals using a single statement.</p> 562f2340fc7Sdrh 563f2340fc7Sdrh<h4>The <tt>%default_type</tt> directive</h4> 564f2340fc7Sdrh 565f2340fc7Sdrh<p>The %default_type directive specifies the datatype of non-terminal 566f2340fc7Sdrhsymbols that do no have their own datatype defined using a separate 567f2340fc7Sdrh%type directive. See the documentation on %type below for addition 568f2340fc7Sdrhinformation.</p> 569f2340fc7Sdrh 57075897234Sdrh<h4>The <tt>%destructor</tt> directive</h4> 57175897234Sdrh 57275897234Sdrh<p>The %destructor directive is used to specify a destructor for 57375897234Sdrha non-terminal symbol. 57475897234Sdrh(See also the %token_destructor directive which is used to 57575897234Sdrhspecify a destructor for terminal symbols.)</p> 57675897234Sdrh 57775897234Sdrh<p>A non-terminal's destructor is called to dispose of the 57875897234Sdrhnon-terminal's value whenever the non-terminal is popped from 57975897234Sdrhthe stack. This includes all of the following circumstances: 58075897234Sdrh<ul> 58175897234Sdrh<li> When a rule reduces and the value of a non-terminal on 58275897234Sdrh the right-hand side is not linked to C code. 58375897234Sdrh<li> When the stack is popped during error processing. 58475897234Sdrh<li> When the ParseFree() function runs. 58575897234Sdrh</ul> 58675897234SdrhThe destructor can do whatever it wants with the value of 58775897234Sdrhthe non-terminal, but its design is to deallocate memory 58875897234Sdrhor other resources held by that non-terminal.</p> 58975897234Sdrh 59075897234Sdrh<p>Consider an example: 59175897234Sdrh<pre> 59275897234Sdrh %type nt {void*} 59375897234Sdrh %destructor nt { free($$); } 59475897234Sdrh nt(A) ::= ID NUM. { A = malloc( 100 ); } 59575897234Sdrh</pre> 59675897234SdrhThis example is a bit contrived but it serves to illustrate how 59775897234Sdrhdestructors work. The example shows a non-terminal named 59875897234Sdrh``nt'' that holds values of type ``void*''. When the rule for 59975897234Sdrhan ``nt'' reduces, it sets the value of the non-terminal to 60075897234Sdrhspace obtained from malloc(). Later, when the nt non-terminal 60175897234Sdrhis popped from the stack, the destructor will fire and call 60275897234Sdrhfree() on this malloced space, thus avoiding a memory leak. 60375897234Sdrh(Note that the symbol ``$$'' in the destructor code is replaced 60475897234Sdrhby the value of the non-terminal.)</p> 60575897234Sdrh 60675897234Sdrh<p>It is important to note that the value of a non-terminal is passed 60775897234Sdrhto the destructor whenever the non-terminal is removed from the 60875897234Sdrhstack, unless the non-terminal is used in a C-code action. If 60975897234Sdrhthe non-terminal is used by C-code, then it is assumed that the 61075897234SdrhC-code will take care of destroying it if it should really 61175897234Sdrhbe destroyed. More commonly, the value is used to build some 61275897234Sdrhlarger structure and we don't want to destroy it, which is why 61375897234Sdrhthe destructor is not called in this circumstance.</p> 61475897234Sdrh 61575897234Sdrh<p>By appropriate use of destructors, it is possible to 61675897234Sdrhbuild a parser using Lemon that can be used within a long-running 61775897234Sdrhprogram, such as a GUI, that will not leak memory or other resources. 61875897234SdrhTo do the same using yacc or bison is much more difficult.</p> 61975897234Sdrh 62075897234Sdrh<h4>The <tt>%extra_argument</tt> directive</h4> 62175897234Sdrh 62275897234SdrhThe %extra_argument directive instructs Lemon to add a 4th parameter 62375897234Sdrhto the parameter list of the Parse() function it generates. Lemon 62475897234Sdrhdoesn't do anything itself with this extra argument, but it does 62575897234Sdrhmake the argument available to C-code action routines, destructors, 62675897234Sdrhand so forth. For example, if the grammar file contains:</p> 62775897234Sdrh 62875897234Sdrh<p><pre> 62975897234Sdrh %extra_argument { MyStruct *pAbc } 63075897234Sdrh</pre></p> 63175897234Sdrh 63275897234Sdrh<p>Then the Parse() function generated will have an 4th parameter 63375897234Sdrhof type ``MyStruct*'' and all action routines will have access to 63475897234Sdrha variable named ``pAbc'' that is the value of the 4th parameter 63575897234Sdrhin the most recent call to Parse().</p> 63675897234Sdrh 63775897234Sdrh<h4>The <tt>%include</tt> directive</h4> 63875897234Sdrh 63975897234Sdrh<p>The %include directive specifies C code that is included at the 64075897234Sdrhtop of the generated parser. You can include any text you want -- 641f2340fc7Sdrhthe Lemon parser generator copies it blindly. If you have multiple 642f2340fc7Sdrh%include directives in your grammar file the value of the last 643f2340fc7Sdrh%include directive overwrites all the others.</p. 64475897234Sdrh 64575897234Sdrh<p>The %include directive is very handy for getting some extra #include 64675897234Sdrhpreprocessor statements at the beginning of the generated parser. 64775897234SdrhFor example:</p> 64875897234Sdrh 64975897234Sdrh<p><pre> 65075897234Sdrh %include {#include <unistd.h>} 65175897234Sdrh</pre></p> 65275897234Sdrh 65375897234Sdrh<p>This might be needed, for example, if some of the C actions in the 65475897234Sdrhgrammar call functions that are prototyed in unistd.h.</p> 65575897234Sdrh 65675897234Sdrh<h4>The <tt>%left</tt> directive</h4> 65775897234Sdrh 65875897234SdrhThe %left directive is used (along with the %right and 65975897234Sdrh%nonassoc directives) to declare precedences of terminal 66075897234Sdrhsymbols. Every terminal symbol whose name appears after 66175897234Sdrha %left directive but before the next period (``.'') is 66275897234Sdrhgiven the same left-associative precedence value. Subsequent 66375897234Sdrh%left directives have higher precedence. For example:</p> 66475897234Sdrh 66575897234Sdrh<p><pre> 66675897234Sdrh %left AND. 66775897234Sdrh %left OR. 66875897234Sdrh %nonassoc EQ NE GT GE LT LE. 66975897234Sdrh %left PLUS MINUS. 67075897234Sdrh %left TIMES DIVIDE MOD. 67175897234Sdrh %right EXP NOT. 67275897234Sdrh</pre></p> 67375897234Sdrh 67475897234Sdrh<p>Note the period that terminates each %left, %right or %nonassoc 67575897234Sdrhdirective.</p> 67675897234Sdrh 67775897234Sdrh<p>LALR(1) grammars can get into a situation where they require 67875897234Sdrha large amount of stack space if you make heavy use or right-associative 67975897234Sdrhoperators. For this reason, it is recommended that you use %left 68075897234Sdrhrather than %right whenever possible.</p> 68175897234Sdrh 68275897234Sdrh<h4>The <tt>%name</tt> directive</h4> 68375897234Sdrh 68475897234Sdrh<p>By default, the functions generated by Lemon all begin with the 68575897234Sdrhfive-character string ``Parse''. You can change this string to something 68675897234Sdrhdifferent using the %name directive. For instance:</p> 68775897234Sdrh 68875897234Sdrh<p><pre> 68975897234Sdrh %name Abcde 69075897234Sdrh</pre></p> 69175897234Sdrh 69275897234Sdrh<p>Putting this directive in the grammar file will cause Lemon to generate 69375897234Sdrhfunctions named 69475897234Sdrh<ul> 69575897234Sdrh<li> AbcdeAlloc(), 69675897234Sdrh<li> AbcdeFree(), 69775897234Sdrh<li> AbcdeTrace(), and 69875897234Sdrh<li> Abcde(). 69975897234Sdrh</ul> 70075897234SdrhThe %name directive allows you to generator two or more different 70175897234Sdrhparsers and link them all into the same executable. 70275897234Sdrh</p> 70375897234Sdrh 70475897234Sdrh<h4>The <tt>%nonassoc</tt> directive</h4> 70575897234Sdrh 70675897234Sdrh<p>This directive is used to assign non-associative precedence to 70775897234Sdrhone or more terminal symbols. See the section on precedence rules 70875897234Sdrhor on the %left directive for additional information.</p> 70975897234Sdrh 71075897234Sdrh<h4>The <tt>%parse_accept</tt> directive</h4> 71175897234Sdrh 71275897234Sdrh<p>The %parse_accept directive specifies a block of C code that is 71375897234Sdrhexecuted whenever the parser accepts its input string. To ``accept'' 71475897234Sdrhan input string means that the parser was able to process all tokens 71575897234Sdrhwithout error.</p> 71675897234Sdrh 71775897234Sdrh<p>For example:</p> 71875897234Sdrh 71975897234Sdrh<p><pre> 72075897234Sdrh %parse_accept { 72175897234Sdrh printf("parsing complete!\n"); 72275897234Sdrh } 72375897234Sdrh</pre></p> 72475897234Sdrh 72575897234Sdrh 72675897234Sdrh<h4>The <tt>%parse_failure</tt> directive</h4> 72775897234Sdrh 72875897234Sdrh<p>The %parse_failure directive specifies a block of C code that 72975897234Sdrhis executed whenever the parser fails complete. This code is not 73075897234Sdrhexecuted until the parser has tried and failed to resolve an input 73175897234Sdrherror using is usual error recovery strategy. The routine is 73275897234Sdrhonly invoked when parsing is unable to continue.</p> 73375897234Sdrh 73475897234Sdrh<p><pre> 73575897234Sdrh %parse_failure { 73675897234Sdrh fprintf(stderr,"Giving up. Parser is hopelessly lost...\n"); 73775897234Sdrh } 73875897234Sdrh</pre></p> 73975897234Sdrh 74075897234Sdrh<h4>The <tt>%right</tt> directive</h4> 74175897234Sdrh 74275897234Sdrh<p>This directive is used to assign right-associative precedence to 74375897234Sdrhone or more terminal symbols. See the section on precedence rules 74475897234Sdrhor on the %left directive for additional information.</p> 74575897234Sdrh 74675897234Sdrh<h4>The <tt>%stack_overflow</tt> directive</h4> 74775897234Sdrh 74875897234Sdrh<p>The %stack_overflow directive specifies a block of C code that 74975897234Sdrhis executed if the parser's internal stack ever overflows. Typically 75075897234Sdrhthis just prints an error message. After a stack overflow, the parser 75175897234Sdrhwill be unable to continue and must be reset.</p> 75275897234Sdrh 75375897234Sdrh<p><pre> 75475897234Sdrh %stack_overflow { 75575897234Sdrh fprintf(stderr,"Giving up. Parser stack overflow\n"); 75675897234Sdrh } 75775897234Sdrh</pre></p> 75875897234Sdrh 75975897234Sdrh<p>You can help prevent parser stack overflows by avoiding the use 76075897234Sdrhof right recursion and right-precedence operators in your grammar. 76175897234SdrhUse left recursion and and left-precedence operators instead, to 76275897234Sdrhencourage rules to reduce sooner and keep the stack size down. 76375897234SdrhFor example, do rules like this: 76475897234Sdrh<pre> 76575897234Sdrh list ::= list element. // left-recursion. Good! 76675897234Sdrh list ::= . 76775897234Sdrh</pre> 76875897234SdrhNot like this: 76975897234Sdrh<pre> 77075897234Sdrh list ::= element list. // right-recursion. Bad! 77175897234Sdrh list ::= . 77275897234Sdrh</pre> 77375897234Sdrh 77475897234Sdrh<h4>The <tt>%stack_size</tt> directive</h4> 77575897234Sdrh 77675897234Sdrh<p>If stack overflow is a problem and you can't resolve the trouble 77775897234Sdrhby using left-recursion, then you might want to increase the size 77875897234Sdrhof the parser's stack using this directive. Put an positive integer 77975897234Sdrhafter the %stack_size directive and Lemon will generate a parse 78075897234Sdrhwith a stack of the requested size. The default value is 100.</p> 78175897234Sdrh 78275897234Sdrh<p><pre> 78375897234Sdrh %stack_size 2000 78475897234Sdrh</pre></p> 78575897234Sdrh 78675897234Sdrh<h4>The <tt>%start_symbol</tt> directive</h4> 78775897234Sdrh 78875897234Sdrh<p>By default, the start-symbol for the grammar that Lemon generates 78975897234Sdrhis the first non-terminal that appears in the grammar file. But you 79075897234Sdrhcan choose a different start-symbol using the %start_symbol directive.</p> 79175897234Sdrh 79275897234Sdrh<p><pre> 79375897234Sdrh %start_symbol prog 79475897234Sdrh</pre></p> 79575897234Sdrh 79675897234Sdrh<h4>The <tt>%token_destructor</tt> directive</h4> 79775897234Sdrh 79875897234Sdrh<p>The %destructor directive assigns a destructor to a non-terminal 79975897234Sdrhsymbol. (See the description of the %destructor directive above.) 80075897234SdrhThis directive does the same thing for all terminal symbols.</p> 80175897234Sdrh 80275897234Sdrh<p>Unlike non-terminal symbols which may each have a different data type 80375897234Sdrhfor their values, terminals all use the same data type (defined by 80475897234Sdrhthe %token_type directive) and so they use a common destructor. Other 80575897234Sdrhthan that, the token destructor works just like the non-terminal 80675897234Sdrhdestructors.</p> 80775897234Sdrh 80875897234Sdrh<h4>The <tt>%token_prefix</tt> directive</h4> 80975897234Sdrh 81075897234Sdrh<p>Lemon generates #defines that assign small integer constants 81175897234Sdrhto each terminal symbol in the grammar. If desired, Lemon will 81275897234Sdrhadd a prefix specified by this directive 81375897234Sdrhto each of the #defines it generates. 81475897234SdrhSo if the default output of Lemon looked like this: 81575897234Sdrh<pre> 81675897234Sdrh #define AND 1 81775897234Sdrh #define MINUS 2 81875897234Sdrh #define OR 3 81975897234Sdrh #define PLUS 4 82075897234Sdrh</pre> 82175897234SdrhYou can insert a statement into the grammar like this: 82275897234Sdrh<pre> 82375897234Sdrh %token_prefix TOKEN_ 82475897234Sdrh</pre> 82575897234Sdrhto cause Lemon to produce these symbols instead: 82675897234Sdrh<pre> 82775897234Sdrh #define TOKEN_AND 1 82875897234Sdrh #define TOKEN_MINUS 2 82975897234Sdrh #define TOKEN_OR 3 83075897234Sdrh #define TOKEN_PLUS 4 83175897234Sdrh</pre> 83275897234Sdrh 83375897234Sdrh<h4>The <tt>%token_type</tt> and <tt>%type</tt> directives</h4> 83475897234Sdrh 83575897234Sdrh<p>These directives are used to specify the data types for values 83675897234Sdrhon the parser's stack associated with terminal and non-terminal 83775897234Sdrhsymbols. The values of all terminal symbols must be of the same 83875897234Sdrhtype. This turns out to be the same data type as the 3rd parameter 83975897234Sdrhto the Parse() function generated by Lemon. Typically, you will 84075897234Sdrhmake the value of a terminal symbol by a pointer to some kind of 84175897234Sdrhtoken structure. Like this:</p> 84275897234Sdrh 84375897234Sdrh<p><pre> 84475897234Sdrh %token_type {Token*} 84575897234Sdrh</pre></p> 84675897234Sdrh 84775897234Sdrh<p>If the data type of terminals is not specified, the default value 84875897234Sdrhis ``int''.</p> 84975897234Sdrh 85075897234Sdrh<p>Non-terminal symbols can each have their own data types. Typically 85175897234Sdrhthe data type of a non-terminal is a pointer to the root of a parse-tree 85275897234Sdrhstructure that contains all information about that non-terminal. 85375897234SdrhFor example:</p> 85475897234Sdrh 85575897234Sdrh<p><pre> 85675897234Sdrh %type expr {Expr*} 85775897234Sdrh</pre></p> 85875897234Sdrh 85975897234Sdrh<p>Each entry on the parser's stack is actually a union containing 86075897234Sdrhinstances of all data types for every non-terminal and terminal symbol. 86175897234SdrhLemon will automatically use the correct element of this union depending 86275897234Sdrhon what the corresponding non-terminal or terminal symbol is. But 86375897234Sdrhthe grammar designer should keep in mind that the size of the union 86475897234Sdrhwill be the size of its largest element. So if you have a single 86575897234Sdrhnon-terminal whose data type requires 1K of storage, then your 100 86675897234Sdrhentry parser stack will require 100K of heap space. If you are willing 86775897234Sdrhand able to pay that price, fine. You just need to know.</p> 86875897234Sdrh 86975897234Sdrh<h3>Error Processing</h3> 87075897234Sdrh 87175897234Sdrh<p>After extensive experimentation over several years, it has been 87275897234Sdrhdiscovered that the error recovery strategy used by yacc is about 87375897234Sdrhas good as it gets. And so that is what Lemon uses.</p> 87475897234Sdrh 87575897234Sdrh<p>When a Lemon-generated parser encounters a syntax error, it 87675897234Sdrhfirst invokes the code specified by the %syntax_error directive, if 87775897234Sdrhany. It then enters its error recovery strategy. The error recovery 87875897234Sdrhstrategy is to begin popping the parsers stack until it enters a 87975897234Sdrhstate where it is permitted to shift a special non-terminal symbol 88075897234Sdrhnamed ``error''. It then shifts this non-terminal and continues 88175897234Sdrhparsing. But the %syntax_error routine will not be called again 88275897234Sdrhuntil at least three new tokens have been successfully shifted.</p> 88375897234Sdrh 88475897234Sdrh<p>If the parser pops its stack until the stack is empty, and it still 88575897234Sdrhis unable to shift the error symbol, then the %parse_failed routine 88675897234Sdrhis invoked and the parser resets itself to its start state, ready 88775897234Sdrhto begin parsing a new file. This is what will happen at the very 88875897234Sdrhfirst syntax error, of course, if there are no instances of the 88975897234Sdrh``error'' non-terminal in your grammar.</p> 89075897234Sdrh 89175897234Sdrh</body> 89275897234Sdrh</html> 893