From 1d9bed166c9114d035925645959f30f03030bc21 Mon Sep 17 00:00:00 2001 From: grog Date: Sun, 19 May 2002 04:37:39 +0000 Subject: Initial checkin: 4.4BSD version. These files need to be updated with current license information and adapted to the FreeBSD build environment before they will build. Approved by: David Taylor --- share/doc/psd/15.yacc/Makefile | 12 ++ share/doc/psd/15.yacc/ss.. | 61 ++++++++ share/doc/psd/15.yacc/ss0 | 206 +++++++++++++++++++++++++ share/doc/psd/15.yacc/ss1 | 143 ++++++++++++++++++ share/doc/psd/15.yacc/ss2 | 156 +++++++++++++++++++ share/doc/psd/15.yacc/ss3 | 109 ++++++++++++++ share/doc/psd/15.yacc/ss4 | 335 +++++++++++++++++++++++++++++++++++++++++ share/doc/psd/15.yacc/ss5 | 307 +++++++++++++++++++++++++++++++++++++ share/doc/psd/15.yacc/ss6 | 151 +++++++++++++++++++ share/doc/psd/15.yacc/ss7 | 129 ++++++++++++++++ share/doc/psd/15.yacc/ss8 | 98 ++++++++++++ share/doc/psd/15.yacc/ss9 | 174 +++++++++++++++++++++ share/doc/psd/15.yacc/ssA | 189 +++++++++++++++++++++++ share/doc/psd/15.yacc/ssB | 31 ++++ share/doc/psd/15.yacc/ssa | 118 +++++++++++++++ share/doc/psd/15.yacc/ssb | 115 ++++++++++++++ share/doc/psd/15.yacc/ssc | 316 ++++++++++++++++++++++++++++++++++++++ share/doc/psd/15.yacc/ssd | 45 ++++++ 18 files changed, 2695 insertions(+) create mode 100644 share/doc/psd/15.yacc/Makefile create mode 100644 share/doc/psd/15.yacc/ss.. create mode 100644 share/doc/psd/15.yacc/ss0 create mode 100644 share/doc/psd/15.yacc/ss1 create mode 100644 share/doc/psd/15.yacc/ss2 create mode 100644 share/doc/psd/15.yacc/ss3 create mode 100644 share/doc/psd/15.yacc/ss4 create mode 100644 share/doc/psd/15.yacc/ss5 create mode 100644 share/doc/psd/15.yacc/ss6 create mode 100644 share/doc/psd/15.yacc/ss7 create mode 100644 share/doc/psd/15.yacc/ss8 create mode 100644 share/doc/psd/15.yacc/ss9 create mode 100644 share/doc/psd/15.yacc/ssA create mode 100644 share/doc/psd/15.yacc/ssB create mode 100644 share/doc/psd/15.yacc/ssa create mode 100644 share/doc/psd/15.yacc/ssb create mode 100644 share/doc/psd/15.yacc/ssc create mode 100644 share/doc/psd/15.yacc/ssd (limited to 'share/doc') diff --git a/share/doc/psd/15.yacc/Makefile b/share/doc/psd/15.yacc/Makefile new file mode 100644 index 0000000..8fa7406 --- /dev/null +++ b/share/doc/psd/15.yacc/Makefile @@ -0,0 +1,12 @@ +# @(#)Makefile 8.1 (Berkeley) 8/14/93 +# $FreeBSD$ + +DIR= psd/15.yacc +SRCS= ss.. ss0 ss1 ss2 ss3 ss4 ss5 ss6 ss7 ss8 ss9 ssA ssB ssa ssb ssc ssd +MACROS= -msU +REFER= refer -e -p /usr/old/dict/papers/Ind + +paper.ps: ${SRCS} + ${REFER} ${SRCS} | ${ROFF} > ${.TARGET} + +.include diff --git a/share/doc/psd/15.yacc/ss.. b/share/doc/psd/15.yacc/ss.. new file mode 100644 index 0000000..d9786b2 --- /dev/null +++ b/share/doc/psd/15.yacc/ss.. @@ -0,0 +1,61 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ss.. 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.EH 'PSD:15-%''Yacc: Yet Another Compiler-Compiler' +.OH 'Yacc: Yet Another Compiler-Compiler''PSD:15-%' +.\".RP +.ND "July 31, 1978" +.TL +Yacc: +Yet Another Compiler-Compiler +.AU "MH 2C-559" 3968 +Stephen C. Johnson +.AI +.MH +.AB +.PP +Computer program input generally has some structure; +in fact, every computer program that does input can be thought of as defining +an ``input language'' which it accepts. +An input language may be as complex as a programming language, or as simple as +a sequence of numbers. +Unfortunately, usual input facilities +are limited, difficult to use, +and often are lax about checking their inputs for validity. +.PP +Yacc provides a general tool for describing +the input to a computer program. +The Yacc user specifies the structures +of his input, together with code to be invoked as +each such structure is recognized. +Yacc turns such a specification into a subroutine that +handles the input process; +frequently, it is convenient and appropriate to have most +of the flow of control in the user's application +handled by this subroutine. +.PP +The input subroutine produced by Yacc calls a user-supplied routine to +return the next basic input item. +Thus, the user can specify his input in terms of individual input characters, or +in terms of higher level constructs such as names and numbers. +The user-supplied routine may also handle idiomatic features such as +comment and continuation conventions, which typically defy easy grammatical specification. +.PP +Yacc is written in portable C. +The class of specifications accepted is a very general one: LALR(1) +grammars with disambiguating rules. +.PP +In addition to compilers for C, APL, Pascal, RATFOR, etc., Yacc +has also been used for less conventional languages, +including a phototypesetter language, several desk calculator languages, a document retrieval system, +and a Fortran debugging system. +.AE +.OK +.\"Computer Languages +.\"Compilers +.\"Formal Language Theory +.CS 23 11 34 0 0 8 diff --git a/share/doc/psd/15.yacc/ss0 b/share/doc/psd/15.yacc/ss0 new file mode 100644 index 0000000..aed127a --- /dev/null +++ b/share/doc/psd/15.yacc/ss0 @@ -0,0 +1,206 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ss0 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +0: Introduction +.PP +Yacc provides a general tool for imposing structure on the input to a computer program. +The Yacc user prepares a +specification of the input process; this includes rules +describing the input structure, code to be invoked when these +rules are recognized, and a low-level routine to do the +basic input. +Yacc then generates a function to control the input process. +This function, called a +.I parser , +calls the user-supplied low-level input routine +(the +.I "lexical analyzer" ) +to pick up the basic items +(called +.I tokens ) +from the input stream. +These tokens are organized according to the input structure rules, +called +.I "grammar rules" \|; +when one of these rules has been recognized, +then user code supplied for this rule, an +.I action , +is invoked; actions have the ability to return values and +make use of the values of other actions. +.PP +Yacc is written in a portable dialect of C +.[ +Ritchie Kernighan Language Prentice +.] +and the actions, and output subroutine, are in C as well. +Moreover, many of the syntactic conventions of Yacc follow C. +.PP +The heart of the input specification is a collection of grammar rules. +Each rule describes an allowable structure and gives it a name. +For example, one grammar rule might be +.DS +date : month\_name day \',\' year ; +.DE +Here, +.I date , +.I month\_name , +.I day , +and +.I year +represent structures of interest in the input process; +presumably, +.I month\_name , +.I day , +and +.I year +are defined elsewhere. +The comma ``,'' is enclosed in single quotes; this implies that the +comma is to appear literally in the input. +The colon and semicolon merely serve as punctuation in the rule, and have +no significance in controlling the input. +Thus, with proper definitions, the input +.DS +July 4, 1776 +.DE +might be matched by the above rule. +.PP +An important part of the input process is carried out by the +lexical analyzer. +This user routine reads the input stream, recognizing the lower level structures, +and communicates these tokens +to the parser. +For historical reasons, a structure recognized by the lexical analyzer is called a +.I "terminal symbol" , +while the structure recognized by the parser is called a +.I "nonterminal symbol" . +To avoid confusion, terminal symbols will usually be referred to as +.I tokens . +.PP +There is considerable leeway in deciding whether to recognize structures using the lexical +analyzer or grammar rules. +For example, the rules +.DS +month\_name : \'J\' \'a\' \'n\' ; +month\_name : \'F\' \'e\' \'b\' ; + + . . . + +month\_name : \'D\' \'e\' \'c\' ; +.DE +might be used in the above example. +The lexical analyzer would only need to recognize individual letters, and +.I month\_name +would be a nonterminal symbol. +Such low-level rules tend to waste time and space, and may +complicate the specification beyond Yacc's ability to deal with it. +Usually, the lexical analyzer would +recognize the month names, +and return an indication that a +.I month\_name +was seen; in this case, +.I month\_name +would be a token. +.PP +Literal characters such as ``,'' must also be passed through the lexical +analyzer, and are also considered tokens. +.PP +Specification files are very flexible. +It is realively easy to add to the above example the rule +.DS +date : month \'/\' day \'/\' year ; +.DE +allowing +.DS +7 / 4 / 1776 +.DE +as a synonym for +.DS +July 4, 1776 +.DE +In most cases, this new rule could be ``slipped in'' to a working system with minimal effort, +and little danger of disrupting existing input. +.PP +The input being read may not conform to the +specifications. +These input errors are detected as early as is theoretically possible with a +left-to-right scan; +thus, not only is the chance of reading and computing with bad +input data substantially reduced, but the bad data can usually be quickly found. +Error handling, +provided as part of the input specifications, +permits the reentry of bad data, +or the continuation of the input process after skipping over the bad data. +.PP +In some cases, Yacc fails to produce a parser when given a set of +specifications. +For example, the specifications may be self contradictory, or they may +require a more powerful recognition mechanism than that available to Yacc. +The former cases represent design errors; +the latter cases +can often be corrected +by making +the lexical analyzer +more powerful, or by rewriting some of the grammar rules. +While Yacc cannot handle all possible specifications, its power +compares favorably with similar systems; +moreover, the +constructions which are difficult for Yacc to handle are +also frequently difficult for human beings to handle. +Some users have reported that the discipline of formulating valid +Yacc specifications for their input revealed errors of +conception or design early in the program development. +.PP +The theory underlying Yacc has been described elsewhere. +.[ +Aho Johnson Surveys LR Parsing +.] +.[ +Aho Johnson Ullman Ambiguous Grammars +.] +.[ +Aho Ullman Principles Compiler Design +.] +Yacc has been extensively used in numerous practical applications, +including +.I lint , +.[ +Johnson Lint +.] +the Portable C Compiler, +.[ +Johnson Portable Compiler Theory +.] +and a system for typesetting mathematics. +.[ +Kernighan Cherry typesetting system CACM +.] +.PP +The next several sections describe the +basic process of preparing a Yacc specification; +Section 1 describes the preparation of grammar rules, +Section 2 the preparation of the user supplied actions associated with these rules, +and Section 3 the preparation of lexical analyzers. +Section 4 describes the operation of the parser. +Section 5 discusses various reasons why Yacc may be unable to produce a +parser from a specification, and what to do about it. +Section 6 describes a simple mechanism for +handling operator precedences in arithmetic expressions. +Section 7 discusses error detection and recovery. +Section 8 discusses the operating environment and special features +of the parsers Yacc produces. +Section 9 gives some suggestions which should improve the +style and efficiency of the specifications. +Section 10 discusses some advanced topics, and Section 11 gives +acknowledgements. +Appendix A has a brief example, and Appendix B gives a +summary of the Yacc input syntax. +Appendix C gives an example using some of the more advanced +features of Yacc, and, finally, +Appendix D describes mechanisms and syntax +no longer actively supported, but +provided for historical continuity with older versions of Yacc. diff --git a/share/doc/psd/15.yacc/ss1 b/share/doc/psd/15.yacc/ss1 new file mode 100644 index 0000000..d1c9832 --- /dev/null +++ b/share/doc/psd/15.yacc/ss1 @@ -0,0 +1,143 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ss1 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.tr *\(** +.tr |\(or +.SH +1: Basic Specifications +.PP +Names refer to either tokens or nonterminal symbols. +Yacc requires +token names to be declared as such. +In addition, for reasons discussed in Section 3, it is often desirable +to include the lexical analyzer as part of the specification file; +it may be useful to include other programs as well. +Thus, every specification file consists of three sections: +the +.I declarations , +.I "(grammar) rules" , +and +.I programs . +The sections are separated by double percent ``%%'' marks. +(The percent ``%'' is generally used in Yacc specifications as an escape character.) +.PP +In other words, a full specification file looks like +.DS +declarations +%% +rules +%% +programs +.DE +.PP +The declaration section may be empty. +Moreover, if the programs section is omitted, the second %% mark may be omitted also; +thus, the smallest legal Yacc specification is +.DS +%% +rules +.DE +.PP +Blanks, tabs, and newlines are ignored except +that they may not appear in names or multi-character reserved symbols. +Comments may appear wherever a name is legal; they are enclosed +in /* . . . */, as in C and PL/I. +.PP +The rules section is made up of one or more grammar rules. +A grammar rule has the form: +.DS +A : BODY ; +.DE +A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals. +The colon and the semicolon are Yacc punctuation. +.PP +Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``\_'', and +non-initial digits. +Upper and lower case letters are distinct. +The names used in the body of a grammar rule may represent tokens or nonterminal symbols. +.PP +A literal consists of a character enclosed in single quotes ``\'''. +As in C, the backslash ``\e'' is an escape character within literals, and all the C escapes +are recognized. +Thus +.DS +\'\en\' newline +\'\er\' return +\'\e\'\' single quote ``\''' +\'\e\e\' backslash ``\e'' +\'\et\' tab +\'\eb\' backspace +\'\ef\' form feed +\'\exxx\' ``xxx'' in octal +.DE +For a number of technical reasons, the +\s-2NUL\s0 +character (\'\e0\' or 0) should never +be used in grammar rules. +.PP +If there are several grammar rules with the same left hand side, the vertical bar ``|'' +can be used to avoid rewriting the left hand side. +In addition, +the semicolon at the end of a rule can be dropped before a vertical bar. +Thus the grammar rules +.DS +A : B C D ; +A : E F ; +A : G ; +.DE +can be given to Yacc as +.DS +A : B C D + | E F + | G + ; +.DE +It is not necessary that all grammar rules with the same left side appear together in the grammar rules section, +although it makes the input much more readable, and easier to change. +.PP +If a nonterminal symbol matches the empty string, this can be indicated in the obvious way: +.DS +empty : ; +.DE +.PP +Names representing tokens must be declared; this is most simply done by writing +.DS +%token name1 name2 . . . +.DE +in the declarations section. +(See Sections 3 , 5, and 6 for much more discussion). +Every name not defined in the declarations section is assumed to represent a nonterminal symbol. +Every nonterminal symbol must appear on the left side of at least one rule. +.PP +Of all the nonterminal symbols, one, called the +.I "start symbol" , +has particular importance. +The parser is designed to recognize the start symbol; thus, +this symbol represents the largest, +most general structure described by the grammar rules. +By default, +the start symbol is taken to be the left hand side of the first +grammar rule in the rules section. +It is possible, and in fact desirable, to declare the start +symbol explicitly in the declarations section using the %start keyword: +.DS +%start symbol +.DE +.PP +The end of the input to the parser is signaled by a special token, called the +.I endmarker . +If the tokens up to, but not including, the endmarker form a structure +which matches the start symbol, the parser function returns to its caller +after the endmarker is seen; it +.I accepts +the input. +If the endmarker is seen in any other context, it is an error. +.PP +It is the job of the user-supplied lexical analyzer +to return the endmarker when appropriate; see section 3, below. +Usually the endmarker represents some reasonably obvious +I/O status, such as ``end-of-file'' or ``end-of-record''. diff --git a/share/doc/psd/15.yacc/ss2 b/share/doc/psd/15.yacc/ss2 new file mode 100644 index 0000000..929e4a6 --- /dev/null +++ b/share/doc/psd/15.yacc/ss2 @@ -0,0 +1,156 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ss2 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +2: Actions +.PP +With each grammar rule, the user may associate actions to be performed each time +the rule is recognized in the input process. +These actions may return values, and may obtain the values returned by previous +actions. +Moreover, the lexical analyzer can return values +for tokens, if desired. +.PP +An action is an arbitrary C statement, and as such can do +input and output, call subprograms, and alter +external vectors and variables. +An action is specified by +one or more statements, enclosed in curly braces ``{'' and ``}''. +For example, +.DS +A : \'(\' B \')\' + { hello( 1, "abc" ); } +.DE +and +.DS +XXX : YYY ZZZ + { printf("a message\en"); + flag = 25; } +.DE +are grammar rules with actions. +.PP +To facilitate easy communication between the actions and the parser, the action statements are altered +slightly. +The symbol ``dollar sign'' ``$'' is used as a signal to Yacc in this context. +.PP +To return a value, the action normally sets the +pseudo-variable ``$$'' to some value. +For example, an action that does nothing but return the value 1 is +.DS + { $$ = 1; } +.DE +.PP +To obtain the values returned by previous actions and the lexical analyzer, the +action may use the pseudo-variables $1, $2, . . ., +which refer to the values returned by the +components of the right side of a rule, reading from left to right. +Thus, if the rule is +.DS +A : B C D ; +.DE +for example, then $2 has the value returned by C, and $3 the value returned by D. +.PP +As a more concrete example, consider the rule +.DS +expr : \'(\' expr \')\' ; +.DE +The value returned by this rule is usually the value of the +.I expr +in parentheses. +This can be indicated by +.DS +expr : \'(\' expr \')\' { $$ = $2 ; } +.DE +.PP +By default, the value of a rule is the value of the first element in it ($1). +Thus, grammar rules of the form +.DS +A : B ; +.DE +frequently need not have an explicit action. +.PP +In the examples above, all the actions came at the end of their rules. +Sometimes, it is desirable to get control before a rule is fully parsed. +Yacc permits an action to be written in the middle of a rule as well +as at the end. +This rule is assumed to return a value, accessible +through the usual \$ mechanism by the actions to +the right of it. +In turn, it may access the values +returned by the symbols to its left. +Thus, in the rule +.DS +A : B + { $$ = 1; } + C + { x = $2; y = $3; } + ; +.DE +the effect is to set +.I x +to 1, and +.I y +to the value returned by C. +.PP +Actions that do not terminate a rule are actually +handled by Yacc by manufacturing a new nonterminal +symbol name, and a new rule matching this +name to the empty string. +The interior action is the action triggered off by recognizing +this added rule. +Yacc actually treats the above example as if +it had been written: +.DS +$ACT : /* empty */ + { $$ = 1; } + ; + +A : B $ACT C + { x = $2; y = $3; } + ; +.DE +.PP +In many applications, output is not done directly by the actions; +rather, a data structure, such as a parse tree, is constructed in memory, +and transformations are applied to it before output is generated. +Parse trees are particularly easy to +construct, given routines to build and maintain the tree +structure desired. +For example, suppose there is a C function +.I node , +written so that the call +.DS +node( L, n1, n2 ) +.DE +creates a node with label L, and descendants n1 and n2, and returns the index of +the newly created node. +Then parse tree can be built by supplying actions such as: +.DS +expr : expr \'+\' expr + { $$ = node( \'+\', $1, $3 ); } +.DE +in the specification. +.PP +The user may define other variables to be used by the actions. +Declarations and definitions can appear in +the declarations section, +enclosed in the marks ``%{'' and ``%}''. +These declarations and definitions have global scope, +so they are known to the action statements and the lexical analyzer. +For example, +.DS +%{ int variable = 0; %} +.DE +could be placed in the declarations section, +making +.I variable +accessible to all of the actions. +The Yacc parser uses only names beginning in ``yy''; +the user should avoid such names. +.PP +In these examples, all the values are integers: a discussion of +values of other types will be found in Section 10. diff --git a/share/doc/psd/15.yacc/ss3 b/share/doc/psd/15.yacc/ss3 new file mode 100644 index 0000000..d520a34 --- /dev/null +++ b/share/doc/psd/15.yacc/ss3 @@ -0,0 +1,109 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ss3 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +3: Lexical Analysis +.PP +The user must supply a lexical analyzer to read the input stream and communicate tokens +(with values, if desired) to the parser. +The lexical analyzer is an integer-valued function called +.I yylex . +The function returns an integer, the +.I "token number" , +representing the kind of token read. +If there is a value associated with that token, it should be assigned +to the external variable +.I yylval . +.PP +The parser and the lexical analyzer must agree on these token numbers in order for +communication between them to take place. +The numbers may be chosen by Yacc, or chosen by the user. +In either case, the ``# define'' mechanism of C is used to allow the lexical analyzer +to return these numbers symbolically. +For example, suppose that the token name DIGIT has been defined in the declarations section of the +Yacc specification file. +The relevant portion of the lexical analyzer might look like: +.DS +yylex(){ + extern int yylval; + int c; + . . . + c = getchar(); + . . . + switch( c ) { + . . . + case \'0\': + case \'1\': + . . . + case \'9\': + yylval = c\-\'0\'; + return( DIGIT ); + . . . + } + . . . +.DE +.PP +The intent is to return a token number of DIGIT, and a value equal to the numerical value of the +digit. +Provided that the lexical analyzer code is placed in the programs section of the specification file, +the identifier DIGIT will be defined as the token number associated +with the token DIGIT. +.PP +This mechanism leads to clear, +easily modified lexical analyzers; the only pitfall is the need +to avoid using any token names in the grammar that are reserved +or significant in C or the parser; for example, the use of +token names +.I if +or +.I while +will almost certainly cause severe +difficulties when the lexical analyzer is compiled. +The token name +.I error +is reserved for error handling, and should not be used naively +(see Section 7). +.PP +As mentioned above, the token numbers may be chosen by Yacc or by the user. +In the default situation, the numbers are chosen by Yacc. +The default token number for a literal +character is the numerical value of the character in the local character set. +Other names are assigned token numbers +starting at 257. +.PP +To assign a token number to a token (including literals), +the first appearance of the token name or literal +.I +in the declarations section +.R +can be immediately followed by +a nonnegative integer. +This integer is taken to be the token number of the name or literal. +Names and literals not defined by this mechanism retain their default definition. +It is important that all token numbers be distinct. +.PP +For historical reasons, the endmarker must have token +number 0 or negative. +This token number cannot be redefined by the user; thus, all +lexical analyzers should be prepared to return 0 or negative as a token number +upon reaching the end of their input. +.PP +A very useful tool for constructing lexical analyzers is +the +.I Lex +program developed by Mike Lesk. +.[ +Lesk Lex +.] +These lexical analyzers are designed to work in close +harmony with Yacc parsers. +The specifications for these lexical analyzers +use regular expressions instead of grammar rules. +Lex can be easily used to produce quite complicated lexical analyzers, +but there remain some languages (such as FORTRAN) which do not +fit any theoretical framework, and whose lexical analyzers +must be crafted by hand. diff --git a/share/doc/psd/15.yacc/ss4 b/share/doc/psd/15.yacc/ss4 new file mode 100644 index 0000000..3d6066e --- /dev/null +++ b/share/doc/psd/15.yacc/ss4 @@ -0,0 +1,335 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ss4 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +4: How the Parser Works +.PP +Yacc turns the specification file into a C program, which +parses the input according to the specification given. +The algorithm used to go from the +specification to the parser is complex, and will not be discussed +here (see +the references for more information). +The parser itself, however, is relatively simple, +and understanding how it works, while +not strictly necessary, will nevertheless make +treatment of error recovery and ambiguities much more +comprehensible. +.PP +The parser produced by Yacc consists +of a finite state machine with a stack. +The parser is also capable of reading and remembering the next +input token (called the +.I lookahead +token). +The +.I "current state" +is always the one on the top of the stack. +The states of the finite state machine are given +small integer labels; initially, the machine is in state 0, +the stack contains only state 0, and no lookahead token has been read. +.PP +The machine has only four actions available to it, called +.I shift , +.I reduce , +.I accept , +and +.I error . +A move of the parser is done as follows: +.IP 1. +Based on its current state, the parser decides +whether it needs a lookahead token to decide +what action should be done; if it needs one, and does +not have one, it calls +.I yylex +to obtain the next token. +.IP 2. +Using the current state, and the lookahead token +if needed, the parser decides on its next action, and +carries it out. +This may result in states being pushed onto the stack, or popped off of +the stack, and in the lookahead token being processed +or left alone. +.PP +The +.I shift +action is the most common action the parser takes. +Whenever a shift action is taken, there is always +a lookahead token. +For example, in state 56 there may be +an action: +.DS + IF shift 34 +.DE +which says, in state 56, if the lookahead token is IF, +the current state (56) is pushed down on the stack, +and state 34 becomes the current state (on the +top of the stack). +The lookahead token is cleared. +.PP +The +.I reduce +action keeps the stack from growing without +bounds. +Reduce actions are appropriate when the parser has seen +the right hand side of a grammar rule, +and is prepared to announce that it has seen +an instance of the rule, replacing the right hand side +by the left hand side. +It may be necessary to consult the lookahead token +to decide whether to reduce, but +usually it is not; in fact, the default +action (represented by a ``.'') is often a reduce action. +.PP +Reduce actions are associated with individual grammar rules. +Grammar rules are also given small integer +numbers, leading to some confusion. +The action +.DS + \fB.\fR reduce 18 +.DE +refers to +.I "grammar rule" +18, while the action +.DS + IF shift 34 +.DE +refers to +.I state +34. +.PP +Suppose the rule being reduced is +.DS +A \fB:\fR x y z ; +.DE +The reduce action depends on the +left hand symbol (A in this case), and the number of +symbols on the right hand side (three in this case). +To reduce, first pop off the top three states +from the stack +(In general, the number of states popped equals the number of symbols on the +right side of the rule). +In effect, these states were the ones +put on the stack while recognizing +.I x , +.I y , +and +.I z , +and no longer serve any useful purpose. +After popping these states, a state is uncovered +which was the state the parser was in before beginning to +process the rule. +Using this uncovered state, and the symbol +on the left side of the rule, perform what is in +effect a shift of A. +A new state is obtained, pushed onto the stack, and parsing continues. +There are significant differences between the processing of +the left hand symbol and an ordinary shift of a token, +however, so this action is called a +.I goto +action. +In particular, the lookahead token is cleared by a shift, and +is not affected by a goto. +In any case, the uncovered state contains an entry such as: +.DS + A goto 20 +.DE +causing state 20 to be pushed +onto the stack, and become the current state. +.PP +In effect, the reduce action ``turns back the clock'' in the parse, +popping the states off the stack to go back to the +state where the right hand side of the rule was first seen. +The parser then behaves as if it had seen the left side at that time. +If the right hand side of the rule is empty, +no states are popped off of the stack: the uncovered state +is in fact the current state. +.PP +The reduce action is also important in the treatment of user-supplied +actions and values. +When a rule is reduced, the code supplied with the rule is executed +before the stack is adjusted. +In addition to the stack holding the states, another stack, +running in parallel with it, holds the values returned +from the lexical analyzer and the actions. +When a shift takes place, the external variable +.I yylval +is copied onto the value stack. +After the return from the user code, the reduction is carried out. +When the +.I goto +action is done, the external variable +.I yyval +is copied onto the value stack. +The pseudo-variables $1, $2, etc., refer to the value stack. +.PP +The other two parser actions are conceptually much simpler. +The +.I accept +action indicates that the entire input has been seen and +that it matches the specification. +This action appears only when the lookahead token is +the endmarker, and indicates that the parser has successfully +done its job. +The +.I error +action, on the other hand, represents a place where the parser +can no longer continue parsing according to the specification. +The input tokens it has seen, together with the lookahead token, +cannot be followed by anything that would result +in a legal input. +The parser reports an error, and attempts to recover the situation and +resume parsing: the error recovery (as opposed to the detection of error) +will be covered in Section 7. +.PP +It is time for an example! +Consider the specification +.DS +%token DING DONG DELL +%% +rhyme : sound place + ; +sound : DING DONG + ; +place : DELL + ; +.DE +.PP +When Yacc is invoked with the +.B \-v +option, a file called +.I y.output +is produced, with a human-readable description of the parser. +The +.I y.output +file corresponding to the above grammar (with some statistics +stripped off the end) is: +.DS +state 0 + $accept : \_rhyme $end + + DING shift 3 + . error + + rhyme goto 1 + sound goto 2 + +state 1 + $accept : rhyme\_$end + + $end accept + . error + +state 2 + rhyme : sound\_place + + DELL shift 5 + . error + + place goto 4 + +state 3 + sound : DING\_DONG + + DONG shift 6 + . error + +state 4 + rhyme : sound place\_ (1) + + . reduce 1 + +state 5 + place : DELL\_ (3) + + . reduce 3 + +state 6 + sound : DING DONG\_ (2) + + . reduce 2 +.DE +Notice that, in addition to the actions for each state, there is a +description of the parsing rules being processed in each +state. The \_ character is used to indicate +what has been seen, and what is yet to come, in each rule. +Suppose the input is +.DS +DING DONG DELL +.DE +It is instructive to follow the steps of the parser while +processing this input. +.PP +Initially, the current state is state 0. +The parser needs to refer to the input in order to decide +between the actions available in state 0, so +the first token, +.I DING , +is read, becoming the lookahead token. +The action in state 0 on +.I DING +is +is ``shift 3'', so state 3 is pushed onto the stack, +and the lookahead token is cleared. +State 3 becomes the current state. +The next token, +.I DONG , +is read, becoming the lookahead token. +The action in state 3 on the token +.I DONG +is ``shift 6'', +so state 6 is pushed onto the stack, and the lookahead is cleared. +The stack now contains 0, 3, and 6. +In state 6, without even consulting the lookahead, +the parser reduces by rule 2. +.DS + sound : DING DONG +.DE +This rule has two symbols on the right hand side, so +two states, 6 and 3, are popped off of the stack, uncovering state 0. +Consulting the description of state 0, looking for a goto on +.I sound , +.DS + sound goto 2 +.DE +is obtained; thus state 2 is pushed onto the stack, +becoming the current state. +.PP +In state 2, the next token, +.I DELL , +must be read. +The action is ``shift 5'', so state 5 is pushed onto the stack, which now has +0, 2, and 5 on it, and the lookahead token is cleared. +In state 5, the only action is to reduce by rule 3. +This has one symbol on the right hand side, so one state, 5, +is popped off, and state 2 is uncovered. +The goto in state 2 on +.I place , +the left side of rule 3, +is state 4. +Now, the stack contains 0, 2, and 4. +In state 4, the only action is to reduce by rule 1. +There are two symbols on the right, so the top two states are popped off, +uncovering state 0 again. +In state 0, there is a goto on +.I rhyme +causing the parser to enter state 1. +In state 1, the input is read; the endmarker is obtained, +indicated by ``$end'' in the +.I y.output +file. +The action in state 1 when the endmarker is seen is to accept, +successfully ending the parse. +.PP +The reader is urged to consider how the parser works +when confronted with such incorrect strings as +.I "DING DONG DONG" , +.I "DING DONG" , +.I "DING DONG DELL DELL" , +etc. +A few minutes spend with this and other simple examples will +probably be repaid when problems arise in more complicated contexts. diff --git a/share/doc/psd/15.yacc/ss5 b/share/doc/psd/15.yacc/ss5 new file mode 100644 index 0000000..df9cae0 --- /dev/null +++ b/share/doc/psd/15.yacc/ss5 @@ -0,0 +1,307 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ss5 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +5: Ambiguity and Conflicts +.PP +A set of grammar rules is +.I ambiguous +if there is some input string that can be structured in two or more different ways. +For example, the grammar rule +.DS +expr : expr \'\-\' expr +.DE +is a natural way of expressing the fact that one way of forming an arithmetic expression +is to put two other expressions together with a minus sign between them. +Unfortunately, this grammar rule does not +completely specify the way that all complex inputs +should be structured. +For example, if the input is +.DS +expr \- expr \- expr +.DE +the rule allows this input to be structured as either +.DS +( expr \- expr ) \- expr +.DE +or as +.DS +expr \- ( expr \- expr ) +.DE +(The first is called +.I "left association" , +the second +.I "right association" ). +.PP +Yacc detects such ambiguities when it is attempting to build the parser. +It is instructive to consider the problem that confronts the parser when it is +given an input such as +.DS +expr \- expr \- expr +.DE +When the parser has read the second expr, the input that it has seen: +.DS +expr \- expr +.DE +matches the right side of the grammar rule above. +The parser could +.I reduce +the input by applying this rule; +after applying the rule; +the input is reduced to +.I expr (the +left side of the rule). +The parser would then read the final part of the input: +.DS +\- expr +.DE +and again reduce. +The effect of this is to take the left associative interpretation. +.PP +Alternatively, when the parser has seen +.DS +expr \- expr +.DE +it could defer the immediate application of the rule, and continue reading +the input until it had seen +.DS +expr \- expr \- expr +.DE +It could then apply the rule to the rightmost three symbols, reducing them to +.I expr +and leaving +.DS +expr \- expr +.DE +Now the rule can be reduced once more; the effect is to +take the right associative interpretation. +Thus, having read +.DS +expr \- expr +.DE +the parser can do two legal things, a shift or a reduction, and has no way of +deciding between them. +This is called a +.I "shift / reduce conflict" . +It may also happen that the parser has a choice of two legal reductions; +this is called a +.I "reduce / reduce conflict" . +Note that there are never any ``Shift/shift'' conflicts. +.PP +When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser. +It does this by selecting one of the valid steps wherever it has a choice. +A rule describing which choice to make in a given situation is called +a +.I "disambiguating rule" . +.PP +Yacc invokes two disambiguating rules by default: +.IP 1. +In a shift/reduce conflict, the default is to do the shift. +.IP 2. +In a reduce/reduce conflict, the default is to reduce by the +.I earlier +grammar rule (in the input sequence). +.PP +Rule 1 implies that reductions are deferred whenever there is a choice, +in favor of shifts. +Rule 2 gives the user rather crude control over the behavior of the parser +in this situation, but reduce/reduce conflicts should be avoided whenever possible. +.PP +Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent, +require a more complex parser than Yacc can construct. +The use of actions within rules can also cause conflicts, if the action must +be done before the parser can be sure which rule is being recognized. +In these cases, the application of disambiguating rules is inappropriate, +and leads to an incorrect parser. +For this reason, Yacc +always reports the number of shift/reduce and reduce/reduce conflicts resolved by Rule 1 and Rule 2. +.PP +In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also +possible to rewrite the grammar rules so that the same inputs are read but there are no +conflicts. +For this reason, most previous parser generators +have considered conflicts to be fatal errors. +Our experience has suggested that this rewriting is somewhat unnatural, +and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts. +.PP +As an example of the power of disambiguating rules, consider a fragment from a programming +language involving an ``if-then-else'' construction: +.DS +stat : IF \'(\' cond \')\' stat + | IF \'(\' cond \')\' stat ELSE stat + ; +.DE +In these rules, +.I IF +and +.I ELSE +are tokens, +.I cond +is a nonterminal symbol describing +conditional (logical) expressions, and +.I stat +is a nonterminal symbol describing statements. +The first rule will be called the +.ul +simple-if +rule, and the +second the +.ul +if-else +rule. +.PP +These two rules form an ambiguous construction, since input of the form +.DS +IF ( C1 ) IF ( C2 ) S1 ELSE S2 +.DE +can be structured according to these rules in two ways: +.DS +IF ( C1 ) { + IF ( C2 ) S1 + } +ELSE S2 +.DE +or +.DS +IF ( C1 ) { + IF ( C2 ) S1 + ELSE S2 + } +.DE +The second interpretation is the one given in most programming languages +having this construct. +Each +.I ELSE +is associated with the last preceding +``un-\fIELSE'\fRd'' +.I IF . +In this example, consider the situation where the parser has seen +.DS +IF ( C1 ) IF ( C2 ) S1 +.DE +and is looking at the +.I ELSE . +It can immediately +reduce +by the simple-if rule to get +.DS +IF ( C1 ) stat +.DE +and then read the remaining input, +.DS +ELSE S2 +.DE +and reduce +.DS +IF ( C1 ) stat ELSE S2 +.DE +by the if-else rule. +This leads to the first of the above groupings of the input. +.PP +On the other hand, the +.I ELSE +may be shifted, +.I S2 +read, and then the right hand portion of +.DS +IF ( C1 ) IF ( C2 ) S1 ELSE S2 +.DE +can be reduced by the if-else rule to get +.DS +IF ( C1 ) stat +.DE +which can be reduced by the simple-if rule. +This leads to the second of the above groupings of the input, which +is usually desired. +.PP +Once again the parser can do two valid things \- there is a shift/reduce conflict. +The application of disambiguating rule 1 tells the parser to shift in this case, +which leads to the desired grouping. +.PP +This shift/reduce conflict arises only when there is a particular current input symbol, +.I ELSE , +and particular inputs already seen, such as +.DS +IF ( C1 ) IF ( C2 ) S1 +.DE +In general, there may be many conflicts, and each one +will be associated with an input symbol and +a set of previously read inputs. +The previously read inputs are characterized by the +state +of the parser. +.PP +The conflict messages of Yacc are best understood +by examining the verbose (\fB\-v\fR) option output file. +For example, the output corresponding to the above +conflict state might be: +.DS L +23: shift/reduce conflict (shift 45, reduce 18) on ELSE + +state 23 + + stat : IF ( cond ) stat\_ (18) + stat : IF ( cond ) stat\_ELSE stat + + ELSE shift 45 + . reduce 18 + +.DE +The first line describes the conflict, giving the state and the input symbol. +The ordinary state description follows, giving +the grammar rules active in the state, and the parser actions. +Recall that the underline marks the +portion of the grammar rules which has been seen. +Thus in the example, in state 23 the parser has seen input corresponding +to +.DS +IF ( cond ) stat +.DE +and the two grammar rules shown are active at this time. +The parser can do two possible things. +If the input symbol is +.I ELSE , +it is possible to shift into state +45. +State 45 will have, as part of its description, the line +.DS +stat : IF ( cond ) stat ELSE\_stat +.DE +since the +.I ELSE +will have been shifted in this state. +Back in state 23, the alternative action, described by ``\fB.\fR'', +is to be done if the input symbol is not mentioned explicitly in the above actions; thus, +in this case, if the input symbol is not +.I ELSE , +the parser reduces by grammar rule 18: +.DS +stat : IF \'(\' cond \')\' stat +.DE +Once again, notice that the numbers following ``shift'' commands refer to other states, +while the numbers following ``reduce'' commands refer to grammar +rule numbers. +In the +.I y.output +file, the rule numbers are printed after those rules which can be reduced. +In most one states, there will be at most reduce action possible in the +state, and this will be the default command. +The user who encounters unexpected shift/reduce conflicts will probably want to +look at the verbose output to decide whether the default actions are appropriate. +In really tough cases, the user might need to know more about +the behavior and construction of the parser than can be covered here. +In this case, one of the theoretical references +.[ +Aho Johnson Surveys Parsing +.] +.[ +Aho Johnson Ullman Deterministic Ambiguous +.] +.[ +Aho Ullman Principles Design +.] +might be consulted; the services of a local guru might also be appropriate. diff --git a/share/doc/psd/15.yacc/ss6 b/share/doc/psd/15.yacc/ss6 new file mode 100644 index 0000000..1000034 --- /dev/null +++ b/share/doc/psd/15.yacc/ss6 @@ -0,0 +1,151 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ss6 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +6: Precedence +.PP +There is one common situation +where the rules given above for resolving conflicts are not sufficient; +this is in the parsing of arithmetic expressions. +Most of the commonly used constructions for arithmetic expressions can be naturally +described by the notion of +.I precedence +levels for operators, together with information about left +or right associativity. +It turns out that ambiguous grammars with appropriate disambiguating rules +can be used to create parsers that are faster and easier to +write than parsers constructed from unambiguous grammars. +The basic notion is to write grammar rules +of the form +.DS +expr : expr OP expr +.DE +and +.DS +expr : UNARY expr +.DE +for all binary and unary operators desired. +This creates a very ambiguous grammar, with many parsing conflicts. +As disambiguating rules, the user specifies the precedence, or binding +strength, of all the operators, and the associativity +of the binary operators. +This information is sufficient to allow Yacc to resolve the parsing conflicts +in accordance with these rules, and construct a parser that realizes the desired +precedences and associativities. +.PP +The precedences and associativities are attached to tokens in the declarations section. +This is done by a series of lines beginning with a Yacc keyword: %left, %right, +or %nonassoc, followed by a list of tokens. +All of the tokens on the same line are assumed to have the same precedence level +and associativity; the lines are listed in +order of increasing precedence or binding strength. +Thus, +.DS +%left \'+\' \'\-\' +%left \'*\' \'/\' +.DE +describes the precedence and associativity of the four arithmetic operators. +Plus and minus are left associative, and have lower precedence than +star and slash, which are also left associative. +The keyword %right is used to describe right associative operators, +and the keyword %nonassoc is used to describe operators, like +the operator .LT. in Fortran, that may not associate with themselves; thus, +.DS +A .LT. B .LT. C +.DE +is illegal in Fortran, and such an operator would be described with the keyword +%nonassoc in Yacc. +As an example of the behavior of these declarations, the description +.DS +%right \'=\' +%left \'+\' \'\-\' +%left \'*\' \'/\' + +%% + +expr : expr \'=\' expr + | expr \'+\' expr + | expr \'\-\' expr + | expr \'*\' expr + | expr \'/\' expr + | NAME + ; +.DE +might be used to structure the input +.DS +a = b = c*d \- e \- f*g +.DE +as follows: +.DS +a = ( b = ( ((c*d)\-e) \- (f*g) ) ) +.DE +When this mechanism is used, +unary operators must, in general, be given a precedence. +Sometimes a unary operator and a binary operator +have the same symbolic representation, but different precedences. +An example is unary and binary \'\-\'; unary minus may be given the same +strength as multiplication, or even higher, while binary minus has a lower strength than +multiplication. +The keyword, %prec, changes the precedence level associated with a particular grammar rule. +%prec appears immediately after the body of the grammar rule, before the action or closing semicolon, +and is followed by a token name or literal. +It +causes the precedence of the grammar rule to become that of the following token name or literal. +For example, to make unary minus have the same precedence as multiplication the rules might resemble: +.DS +%left \'+\' \'\-\' +%left \'*\' \'/\' + +%% + +expr : expr \'+\' expr + | expr \'\-\' expr + | expr \'*\' expr + | expr \'/\' expr + | \'\-\' expr %prec \'*\' + | NAME + ; +.DE +.PP +A token declared +by %left, %right, and %nonassoc need not be, but may be, declared by %token as well. +.PP +The precedences and associativities are used by Yacc to +resolve parsing conflicts; they give rise to disambiguating rules. +Formally, the rules work as follows: +.IP 1. +The precedences and associativities are recorded for those tokens and literals +that have them. +.IP 2. +A precedence and associativity is associated with each grammar rule; it is the precedence +and associativity of the last token or literal in the body of the rule. +If the %prec construction is used, it overrides this default. +Some grammar rules may have no precedence and associativity associated with them. +.IP 3. +When there is a reduce/reduce conflict, or there is a shift/reduce conflict +and either the input symbol or the grammar rule has no precedence and associativity, +then the two disambiguating rules given at the beginning of the section are used, +and the conflicts are reported. +.IP 4. +If there is a shift/reduce conflict, and both the grammar rule and the input character +have precedence and associativity associated with them, then the conflict is resolved +in favor of the action (shift or reduce) associated with the higher precedence. +If the precedences are the same, then the associativity is used; left +associative implies reduce, right associative implies shift, and nonassociating +implies error. +.PP +Conflicts resolved by precedence are not counted in the number of shift/reduce and reduce/reduce +conflicts reported by Yacc. +This means that mistakes in the specification of precedences may +disguise errors in the input grammar; it is a good idea to be sparing +with precedences, and use them in an essentially ``cookbook'' fashion, +until some experience has been gained. +The +.I y.output +file +is very useful in deciding whether the parser is actually doing +what was intended. diff --git a/share/doc/psd/15.yacc/ss7 b/share/doc/psd/15.yacc/ss7 new file mode 100644 index 0000000..8876ece --- /dev/null +++ b/share/doc/psd/15.yacc/ss7 @@ -0,0 +1,129 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ss7 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +7: Error Handling +.PP +Error handling is an extremely difficult area, and many of the problems are semantic ones. +When an error is found, for example, it may be necessary to reclaim parse tree storage, +delete or alter symbol table entries, and, typically, set switches to avoid generating any further output. +.PP +It is seldom acceptable to stop all processing when an error is found; it is more useful to continue +scanning the input to find further syntax errors. +This leads to the problem of getting the parser ``restarted'' after an error. +A general class of algorithms to do this involves discarding a number of tokens +from the input string, and attempting to adjust the parser so that input can continue. +.PP +To allow the user some control over this process, +Yacc provides a simple, but reasonably general, feature. +The token name ``error'' is reserved for error handling. +This name can be used in grammar rules; +in effect, it suggests places where errors are expected, and recovery might take place. +The parser pops its stack until it enters a state where the token ``error'' is legal. +It then behaves as if the token ``error'' were the current lookahead token, +and performs the action encountered. +The lookahead token is then reset to the token that caused the error. +If no special error rules have been specified, the processing halts when an error is detected. +.PP +In order to prevent a cascade of error messages, the parser, after +detecting an error, remains in error state until three tokens have been successfully +read and shifted. +If an error is detected when the parser is already in error state, +no message is given, and the input token is quietly deleted. +.PP +As an example, a rule of the form +.DS +stat : error +.DE +would, in effect, mean that on a syntax error the parser would attempt to skip over the statement +in which the error was seen. +More precisely, the parser will +scan ahead, looking for three tokens that might legally follow +a statement, and start processing at the first of these; if +the beginnings of statements are not sufficiently distinctive, it may make a +false start in the middle of a statement, and end up reporting a +second error where there is in fact no error. +.PP +Actions may be used with these special error rules. +These actions might attempt to reinitialize tables, reclaim symbol table space, etc. +.PP +Error rules such as the above are very general, but difficult to control. +Somewhat easier are rules such as +.DS +stat : error \';\' +.DE +Here, when there is an error, the parser attempts to skip over the statement, but +will do so by skipping to the next \';\'. +All tokens after the error and before the next \';\' cannot be shifted, and are discarded. +When the \';\' is seen, this rule will be reduced, and any ``cleanup'' +action associated with it performed. +.PP +Another form of error rule arises in interactive applications, where +it may be desirable to permit a line to be reentered after an error. +A possible error rule might be +.DS +input : error \'\en\' { printf( "Reenter last line: " ); } input + { $$ = $4; } +.DE +There is one potential difficulty with this approach; +the parser must correctly process three input tokens before it +admits that it has correctly resynchronized after the error. +If the reentered line contains an error +in the first two tokens, the parser deletes the offending tokens, +and gives no message; this is clearly unacceptable. +For this reason, there is a mechanism that +can be used to force the parser +to believe that an error has been fully recovered from. +The statement +.DS +yyerrok ; +.DE +in an action +resets the parser to its normal mode. +The last example is better written +.DS +input : error \'\en\' + { yyerrok; + printf( "Reenter last line: " ); } + input + { $$ = $4; } + ; +.DE +.PP +As mentioned above, the token seen immediately +after the ``error'' symbol is the input token at which the +error was discovered. +Sometimes, this is inappropriate; for example, an +error recovery action might +take upon itself the job of finding the correct place to resume input. +In this case, +the previous lookahead token must be cleared. +The statement +.DS +yyclearin ; +.DE +in an action will have this effect. +For example, suppose the action after error +were to call some sophisticated resynchronization routine, +supplied by the user, that attempted to advance the input to the +beginning of the next valid statement. +After this routine was called, the next token returned by yylex would presumably +be the first token in a legal statement; +the old, illegal token must be discarded, and the error state reset. +This could be done by a rule like +.DS +stat : error + { resynch(); + yyerrok ; + yyclearin ; } + ; +.DE +.PP +These mechanisms are admittedly crude, but do allow for a simple, fairly effective recovery of the parser +from many errors; +moreover, the user can get control to deal with +the error actions required by other portions of the program. diff --git a/share/doc/psd/15.yacc/ss8 b/share/doc/psd/15.yacc/ss8 new file mode 100644 index 0000000..cf1b892 --- /dev/null +++ b/share/doc/psd/15.yacc/ss8 @@ -0,0 +1,98 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ss8 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +8: The Yacc Environment +.PP +When the user inputs a specification +to Yacc, the output is a file of C programs, called +.I y.tab.c +on most +systems +(due to local file system conventions, the names may differ from +installation to installation). +The function produced by Yacc is called +.I yyparse \|; +it is an integer valued function. +When it is called, it in turn repeatedly calls +.I yylex , +the lexical analyzer +supplied by the user (see Section 3) +to obtain input tokens. +Eventually, either an error is detected, in which case +(if no error recovery is possible) +.I yyparse +returns the value 1, +or the lexical analyzer returns the endmarker token +and the parser accepts. +In this case, +.I yyparse +returns the value 0. +.PP +The user must provide a certain amount of environment for this +parser in order to obtain a working program. +For example, as with every C program, a program called +.I main +must be defined, that eventually calls +.I yyparse . +In addition, a routine called +.I yyerror +prints a message +when a syntax error is detected. +.PP +These two routines must be supplied in one form or another by the +user. +To ease the initial effort of using Yacc, a library has been +provided with default versions of +.I main +and +.I yyerror . +The name of this library is system dependent; +on many systems the library is accessed by a +.B \-ly +argument to the loader. +To show the triviality of these default programs, the source is +given below: +.DS +main(){ + return( yyparse() ); + } +.DE +and +.DS +# include + +yyerror(s) char *s; { + fprintf( stderr, "%s\en", s ); + } +.DE +The argument to +.I yyerror +is a string containing an error message, usually +the string ``syntax error''. +The average application will want to do better than this. +Ordinarily, the program should keep track of the input line number, and print it +along with the message when a syntax error is detected. +The external integer variable +.I yychar +contains the lookahead token number at the time the error was detected; +this may be of some interest in giving better diagnostics. +Since the +.I main +program is probably supplied by the user (to read arguments, etc.) +the Yacc library is useful only in small +projects, or in the earliest stages of larger ones. +.PP +The external integer variable +.I yydebug +is normally set to 0. +If it is set to a nonzero value, the parser will output a +verbose description of its actions, including +a discussion of which input symbols have been read, and +what the parser actions are. +Depending on the operating environment, +it may be possible to set this variable by using a debugging system. diff --git a/share/doc/psd/15.yacc/ss9 b/share/doc/psd/15.yacc/ss9 new file mode 100644 index 0000000..7fc5b68 --- /dev/null +++ b/share/doc/psd/15.yacc/ss9 @@ -0,0 +1,174 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ss9 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +9: Hints for Preparing Specifications +.PP +This section contains miscellaneous hints on preparing efficient, easy to change, +and clear specifications. +The individual subsections are more or less +independent. +.SH +Input Style +.PP +It is difficult to +provide rules with substantial actions +and still have a readable specification file. +The following style hints owe much to Brian Kernighan. +.IP a. +Use all capital letters for token names, all lower case letters for +nonterminal names. +This rule comes under the heading of ``knowing who to blame when +things go wrong.'' +.IP b. +Put grammar rules and actions on separate lines. +This allows either to be changed without +an automatic need to change the other. +.IP c. +Put all rules with the same left hand side together. +Put the left hand side in only once, and let all +following rules begin with a vertical bar. +.IP d. +Put a semicolon only after the last rule with a given left hand side, +and put the semicolon on a separate line. +This allows new rules to be easily added. +.IP e. +Indent rule bodies by two tab stops, and action bodies by three +tab stops. +.PP +The example in Appendix A is written following this style, as are +the examples in the text of this paper (where space permits). +The user must make up his own mind about these stylistic questions; +the central problem, however, is to make the rules visible through +the morass of action code. +.SH +Left Recursion +.PP +The algorithm used by the Yacc parser encourages so called ``left recursive'' +grammar rules: rules of the form +.DS +name : name rest_of_rule ; +.DE +These rules frequently arise when +writing specifications of sequences and lists: +.DS +list : item + | list \',\' item + ; +.DE +and +.DS +seq : item + | seq item + ; +.DE +In each of these cases, the first rule +will be reduced for the first item only, and the second rule +will be reduced for the second and all succeeding items. +.PP +With right recursive rules, such as +.DS +seq : item + | item seq + ; +.DE +the parser would be a bit bigger, and the items would be seen, and reduced, +from right to left. +More seriously, an internal stack in the parser +would be in danger of overflowing if a very long sequence were read. +Thus, the user should use left recursion wherever reasonable. +.PP +It is worth considering whether a sequence with zero +elements has any meaning, and if so, consider writing +the sequence specification with an empty rule: +.DS +seq : /* empty */ + | seq item + ; +.DE +Once again, the first rule would always be reduced exactly once, before the +first item was read, +and then the second rule would be reduced once for each item read. +Permitting empty sequences +often leads to increased generality. +However, conflicts might arise if Yacc is asked to decide +which empty sequence it has seen, when it hasn't seen enough to +know! +.SH +Lexical Tie-ins +.PP +Some lexical decisions depend on context. +For example, the lexical analyzer might want to +delete blanks normally, but not within quoted strings. +Or names might be entered into a symbol table in declarations, +but not in expressions. +.PP +One way of handling this situation is +to create a global flag that is +examined by the lexical analyzer, and set by actions. +For example, suppose a program +consists of 0 or more declarations, followed by 0 or more statements. +Consider: +.DS +%{ + int dflag; +%} + ... other declarations ... + +%% + +prog : decls stats + ; + +decls : /* empty */ + { dflag = 1; } + | decls declaration + ; + +stats : /* empty */ + { dflag = 0; } + | stats statement + ; + + ... other rules ... +.DE +The flag +.I dflag +is now 0 when reading statements, and 1 when reading declarations, +.ul +except for the first token in the first statement. +This token must be seen by the parser before it can tell that +the declaration section has ended and the statements have +begun. +In many cases, this single token exception does not +affect the lexical scan. +.PP +This kind of ``backdoor'' approach can be elaborated +to a noxious degree. +Nevertheless, it represents a way of doing some things +that are difficult, if not impossible, to +do otherwise. +.SH +Reserved Words +.PP +Some programming languages +permit the user to +use words like ``if'', which are normally reserved, +as label or variable names, provided that such use does not +conflict with the legal use of these names in the programming language. +This is extremely hard to do in the framework of Yacc; +it is difficult to pass information to the lexical analyzer +telling it ``this instance of `if' is a keyword, and that instance is a variable''. +The user can make a stab at it, using the +mechanism described in the last subsection, +but it is difficult. +.PP +A number of ways of making this easier are under advisement. +Until then, it is better that the keywords be +.I reserved \|; +that is, be forbidden for use as variable names. +There are powerful stylistic reasons for preferring this, anyway. diff --git a/share/doc/psd/15.yacc/ssA b/share/doc/psd/15.yacc/ssA new file mode 100644 index 0000000..027a06a --- /dev/null +++ b/share/doc/psd/15.yacc/ssA @@ -0,0 +1,189 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ssA 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +10: Advanced Topics +.PP +This section discusses a number of advanced features +of Yacc. +.SH +Simulating Error and Accept in Actions +.PP +The parsing actions of error and accept can be simulated +in an action by use of macros YYACCEPT and YYERROR. +YYACCEPT causes +.I yyparse +to return the value 0; +YYERROR causes +the parser to behave as if the current input symbol +had been a syntax error; +.I yyerror +is called, and error recovery takes place. +These mechanisms can be used to simulate parsers +with multiple endmarkers or context-sensitive syntax checking. +.SH +Accessing Values in Enclosing Rules. +.PP +An action may refer to values +returned by actions to the left of the current rule. +The mechanism is simply the same as with ordinary actions, +a dollar sign followed by a digit, but in this case the +digit may be 0 or negative. +Consider +.DS +sent : adj noun verb adj noun + { \fIlook at the sentence\fR . . . } + ; + +adj : THE { $$ = THE; } + | YOUNG { $$ = YOUNG; } + . . . + ; + +noun : DOG + { $$ = DOG; } + | CRONE + { if( $0 == YOUNG ){ + printf( "what?\en" ); + } + $$ = CRONE; + } + ; + . . . +.DE +In the action following the word CRONE, a check is made that the +preceding token shifted was not YOUNG. +Obviously, this is only possible when a great deal is known about +what might precede the symbol +.I noun +in the input. +There is also a distinctly unstructured flavor about this. +Nevertheless, at times this mechanism will save a great +deal of trouble, especially when a few combinations are to +be excluded from an otherwise regular structure. +.SH +Support for Arbitrary Value Types +.PP +By default, the values returned by actions and the lexical analyzer are integers. +Yacc can also support +values of other types, including structures. +In addition, Yacc keeps track of the types, and inserts +appropriate union member names so that the resulting parser will +be strictly type checked. +The Yacc value stack (see Section 4) +is declared to be a +.I union +of the various types of values desired. +The user declares the union, and associates union member names +to each token and nonterminal symbol having a value. +When the value is referenced through a $$ or $n construction, +Yacc will automatically insert the appropriate union name, so that +no unwanted conversions will take place. +In addition, type checking commands such as +.I Lint\| +.[ +Johnson Lint Checker 1273 +.] +will be far more silent. +.PP +There are three mechanisms used to provide for this typing. +First, there is a way of defining the union; this must be +done by the user since other programs, notably the lexical analyzer, +must know about the union member names. +Second, there is a way of associating a union member name with tokens +and nonterminals. +Finally, there is a mechanism for describing the type of those +few values where Yacc can not easily determine the type. +.PP +To declare the union, the user includes in the declaration section: +.DS +%union { + body of union ... + } +.DE +This declares the Yacc value stack, +and the external variables +.I yylval +and +.I yyval , +to have type equal to this union. +If Yacc was invoked with the +.B \-d +option, the union declaration +is copied onto the +.I y.tab.h +file. +Alternatively, +the union may be declared in a header file, and a typedef +used to define the variable YYSTYPE to represent +this union. +Thus, the header file might also have said: +.DS +typedef union { + body of union ... + } YYSTYPE; +.DE +The header file must be included in the declarations +section, by use of %{ and %}. +.PP +Once YYSTYPE is defined, +the union member names must be associated +with the various terminal and nonterminal names. +The construction +.DS +< name > +.DE +is used to indicate a union member name. +If this follows +one of the +keywords %token, +%left, %right, and %nonassoc, +the union member name is associated with the tokens listed. +Thus, saying +.DS +%left \'+\' \'\-\' +.DE +will cause any reference to values returned by these two tokens to be +tagged with +the union member name +.I optype . +Another keyword, %type, is +used similarly to associate +union member names with nonterminals. +Thus, one might say +.DS +%type expr stat +.DE +.PP +There remain a couple of cases where these mechanisms are insufficient. +If there is an action within a rule, the value returned +by this action has no +.I "a priori" +type. +Similarly, reference to left context values (such as $0 \- see the +previous subsection ) leaves Yacc with no easy way of knowing the type. +In this case, a type can be imposed on the reference by inserting +a union member name, between < and >, immediately after +the first $. +An example of this usage is +.DS +rule : aaa { $$ = 3; } bbb + { fun( $2, $0 ); } + ; +.DE +This syntax has little to recommend it, but the situation arises rarely. +.PP +A sample specification is given in Appendix C. +The facilities in this subsection are not triggered until they are used: +in particular, the use of %type will turn on these mechanisms. +When they are used, there is a fairly strict level of checking. +For example, use of $n or $$ to refer to something with no defined type +is diagnosed. +If these facilities are not triggered, the Yacc value stack is used to +hold +.I int' s, +as was true historically. diff --git a/share/doc/psd/15.yacc/ssB b/share/doc/psd/15.yacc/ssB new file mode 100644 index 0000000..82d32ae --- /dev/null +++ b/share/doc/psd/15.yacc/ssB @@ -0,0 +1,31 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ssB 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +11: Acknowledgements +.PP +Yacc owes much to a +most stimulating collection of users, who have goaded +me beyond my inclination, and frequently beyond my +ability, in their endless search for ``one more feature''. +Their irritating unwillingness to learn how to +do things my way has usually led to my doing things their way; +most of the time, they have been right. +B. W. Kernighan, P. J. Plauger, S. I. Feldman, C. Imagna, +M. E. Lesk, +and A. Snyder will recognize some of their ideas in the current version +of Yacc. +C. B. Haley contributed to the error recovery algorithm. +D. M. Ritchie, B. W. Kernighan, and M. O. Harris helped translate this document into English. +Al Aho also deserves special credit for bringing +the mountain to Mohammed, and other favors. +.SG "MH-1273-SCJ-unix" +.bp +.[ +$LIST$ +.] +.bp diff --git a/share/doc/psd/15.yacc/ssa b/share/doc/psd/15.yacc/ssa new file mode 100644 index 0000000..fb604f9 --- /dev/null +++ b/share/doc/psd/15.yacc/ssa @@ -0,0 +1,118 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ssa 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +Appendix A: A Simple Example +.PP +This example gives the complete Yacc specification for a small desk calculator; +the desk calculator has 26 registers, labeled ``a'' through ``z'', and accepts +arithmetic expressions made up of the operators +, \-, *, /, +% (mod operator), & (bitwise and), | (bitwise or), and assignment. +If an expression at the top level is an assignment, the value is not +printed; otherwise it is. +As in C, an integer that begins with 0 (zero) is assumed to be octal; +otherwise, it is assumed to be decimal. +.PP +As an example of a Yacc specification, the desk calculator +does a reasonable job of showing how precedences and ambiguities +are used, and demonstrating simple error recovery. +The major oversimplifications are that the +lexical analysis phase is much simpler than for most applications, and the +output is produced immediately, line by line. +Note the way that decimal and octal integers are read in by the grammar rules; +This job is probably better done by the lexical analyzer. +.sp +.nf +.ta .5i 1i 1.5i 2i 2.5i + +%{ +# include +# include + +int regs[26]; +int base; + +%} + +%start list + +%token DIGIT LETTER + +%left \'|\' +%left \'&\' +%left \'+\' \'\-\' +%left \'*\' \'/\' \'%\' +%left UMINUS /* supplies precedence for unary minus */ + +%% /* beginning of rules section */ + +list : /* empty */ + | list stat \'\en\' + | list error \'\en\' + { yyerrok; } + ; + +stat : expr + { printf( "%d\en", $1 ); } + | LETTER \'=\' expr + { regs[$1] = $3; } + ; + +expr : \'(\' expr \')\' + { $$ = $2; } + | expr \'+\' expr + { $$ = $1 + $3; } + | expr \'\-\' expr + { $$ = $1 \- $3; } + | expr \'*\' expr + { $$ = $1 * $3; } + | expr \'/\' expr + { $$ = $1 / $3; } + | expr \'%\' expr + { $$ = $1 % $3; } + | expr \'&\' expr + { $$ = $1 & $3; } + | expr \'|\' expr + { $$ = $1 | $3; } + | \'\-\' expr %prec UMINUS + { $$ = \- $2; } + | LETTER + { $$ = regs[$1]; } + | number + ; + +number : DIGIT + { $$ = $1; base = ($1==0) ? 8 : 10; } + | number DIGIT + { $$ = base * $1 + $2; } + ; + +%% /* start of programs */ + +yylex() { /* lexical analysis routine */ + /* returns LETTER for a lower case letter, yylval = 0 through 25 */ + /* return DIGIT for a digit, yylval = 0 through 9 */ + /* all other characters are returned immediately */ + + int c; + + while( (c=getchar()) == \' \' ) { /* skip blanks */ } + + /* c is now nonblank */ + + if( islower( c ) ) { + yylval = c \- \'a\'; + return ( LETTER ); + } + if( isdigit( c ) ) { + yylval = c \- \'0\'; + return( DIGIT ); + } + return( c ); + } +.fi +.bp diff --git a/share/doc/psd/15.yacc/ssb b/share/doc/psd/15.yacc/ssb new file mode 100644 index 0000000..4b4df03 --- /dev/null +++ b/share/doc/psd/15.yacc/ssb @@ -0,0 +1,115 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ssb 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +Appendix B: Yacc Input Syntax +.PP +This Appendix has a description of the Yacc input syntax, as a Yacc specification. +Context dependencies, etc., are not considered. +Ironically, the Yacc input specification language +is most naturally specified as an LR(2) grammar; the sticky +part comes when an identifier is seen in a rule, immediately +following an action. +If this identifier is followed by a colon, it is the start of the +next rule; otherwise +it is a continuation of the current rule, which just happens to have +an action embedded in it. +As implemented, the lexical analyzer looks +ahead after seeing an identifier, and +decide whether the next token (skipping blanks, newlines, comments, etc.) +is a colon. +If so, it returns the token C_IDENTIFIER. +Otherwise, it returns IDENTIFIER. +Literals (quoted strings) are also returned as IDENTIFIERS, +but never as part of C_IDENTIFIERs. +.sp +.nf +.ta .6i 1.2i 1.8i 2.4i 3i 3.6i + + /* grammar for the input to Yacc */ + + /* basic entities */ +%token IDENTIFIER /* includes identifiers and literals */ +%token C_IDENTIFIER /* identifier (but not literal) followed by colon */ +%token NUMBER /* [0-9]+ */ + + /* reserved words: %type => TYPE, %left => LEFT, etc. */ + +%token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION + +%token MARK /* the %% mark */ +%token LCURL /* the %{ mark */ +%token RCURL /* the %} mark */ + + /* ascii character literals stand for themselves */ + +%start spec + +%% + +spec : defs MARK rules tail + ; + +tail : MARK { \fIIn this action, eat up the rest of the file\fR } + | /* empty: the second MARK is optional */ + ; + +defs : /* empty */ + | defs def + ; + +def : START IDENTIFIER + | UNION { \fICopy union definition to output\fR } + | LCURL { \fICopy C code to output file\fR } RCURL + | ndefs rword tag nlist + ; + +rword : TOKEN + | LEFT + | RIGHT + | NONASSOC + | TYPE + ; + +tag : /* empty: union tag is optional */ + | \'<\' IDENTIFIER \'>\' + ; + +nlist : nmno + | nlist nmno + | nlist \',\' nmno + ; + +nmno : IDENTIFIER /* NOTE: literal illegal with %type */ + | IDENTIFIER NUMBER /* NOTE: illegal with %type */ + ; + + /* rules section */ + +rules : C_IDENTIFIER rbody prec + | rules rule + ; + +rule : C_IDENTIFIER rbody prec + | '|' rbody prec + ; + +rbody : /* empty */ + | rbody IDENTIFIER + | rbody act + ; + +act : \'{\' { \fICopy action, translate $$, etc.\fR } \'}\' + ; + +prec : /* empty */ + | PREC IDENTIFIER + | PREC IDENTIFIER act + | prec \';\' + ; +.fi +.bp diff --git a/share/doc/psd/15.yacc/ssc b/share/doc/psd/15.yacc/ssc new file mode 100644 index 0000000..29b38be --- /dev/null +++ b/share/doc/psd/15.yacc/ssc @@ -0,0 +1,316 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ssc 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.SH +Appendix C: An Advanced Example +.PP +This Appendix gives an example of a grammar using some +of the advanced features discussed in Section 10. +The desk calculator example in Appendix A is +modified to provide a desk calculator that +does floating point interval arithmetic. +The calculator understands floating point +constants, the arithmetic operations +, \-, *, /, +unary \-, and = (assignment), and has 26 floating +point variables, ``a'' through ``z''. +Moreover, it also understands +.I intervals , +written +.DS + ( x , y ) +.DE +where +.I x +is less than or equal to +.I y . +There are 26 interval valued variables ``A'' through ``Z'' +that may also be used. +The usage is similar to that in Appendix A; assignments +return no value, and print nothing, while expressions print +the (floating or interval) value. +.PP +This example explores a number of interesting features +of Yacc and C. +Intervals are represented by a structure, consisting of the +left and right endpoint values, stored as +.I double 's. +This structure is given a type name, INTERVAL, by using +.I typedef . +The Yacc value stack can also contain floating point scalars, and +integers (used to index into the arrays holding the variable values). +Notice that this entire strategy depends strongly on being able to +assign structures and unions in C. +In fact, many of the actions call functions that return structures +as well. +.PP +It is also worth noting the use of YYERROR to handle error conditions: +division by an interval containing 0, and an interval presented in +the wrong order. +In effect, the error recovery mechanism of Yacc is used to throw away the +rest of the offending line. +.PP +In addition to the mixing of types on the value stack, +this grammar also demonstrates an interesting use of syntax to +keep track of the type (e.g. scalar or interval) of intermediate +expressions. +Note that a scalar can be automatically promoted to an interval if +the context demands an interval value. +This causes a large number of conflicts when the grammar is run through +Yacc: 18 Shift/Reduce and 26 Reduce/Reduce. +The problem can be seen by looking at the two input lines: +.DS + 2.5 + ( 3.5 \- 4. ) +.DE +and +.DS + 2.5 + ( 3.5 , 4. ) +.DE +Notice that the 2.5 is to be used in an interval valued expression +in the second example, but this fact is not known until +the ``,'' is read; by this time, 2.5 is finished, and the parser cannot go back +and change its mind. +More generally, it might be necessary to look ahead an arbitrary number of +tokens to decide whether to convert a scalar to an interval. +This problem is evaded by having two rules for each binary interval +valued operator: one when the left operand is a scalar, and one when +the left operand is an interval. +In the second case, the right operand must be an interval, +so the conversion will be applied automatically. +Despite this evasion, there are still many cases where the +conversion may be applied or not, leading to the above +conflicts. +They are resolved by listing the rules that yield scalars first +in the specification file; in this way, the conflicts will +be resolved in the direction of keeping scalar +valued expressions scalar valued until they are forced to become +intervals. +.PP +This way of handling multiple types is very instructive, but not very general. +If there were many kinds of expression types, instead of just two, +the number of rules needed would increase dramatically, and the conflicts +even more dramatically. +Thus, while this example is instructive, it is better practice in a +more normal programming language environment to +keep the type information as part of the value, and not as part +of the grammar. +.PP +Finally, a word about the lexical analysis. +The only unusual feature is the treatment of floating point constants. +The C library routine +.I atof +is used to do the actual conversion from a character string +to a double precision value. +If the lexical analyzer detects an error, +it responds by returning a token that +is illegal in the grammar, provoking a syntax error +in the parser, and thence error recovery. +.LD + +%{ + +# include +# include + +typedef struct interval { + double lo, hi; + } INTERVAL; + +INTERVAL vmul(), vdiv(); + +double atof(); + +double dreg[ 26 ]; +INTERVAL vreg[ 26 ]; + +%} + +%start lines + +%union { + int ival; + double dval; + INTERVAL vval; + } + +%token DREG VREG /* indices into dreg, vreg arrays */ + +%token CONST /* floating point constant */ + +%type dexp /* expression */ + +%type vexp /* interval expression */ + + /* precedence information about the operators */ + +%left \'+\' \'\-\' +%left \'*\' \'/\' +%left UMINUS /* precedence for unary minus */ + +%% + +lines : /* empty */ + | lines line + ; + +line : dexp \'\en\' + { printf( "%15.8f\en", $1 ); } + | vexp \'\en\' + { printf( "(%15.8f , %15.8f )\en", $1.lo, $1.hi ); } + | DREG \'=\' dexp \'\en\' + { dreg[$1] = $3; } + | VREG \'=\' vexp \'\en\' + { vreg[$1] = $3; } + | error \'\en\' + { yyerrok; } + ; + +dexp : CONST + | DREG + { $$ = dreg[$1]; } + | dexp \'+\' dexp + { $$ = $1 + $3; } + | dexp \'\-\' dexp + { $$ = $1 \- $3; } + | dexp \'*\' dexp + { $$ = $1 * $3; } + | dexp \'/\' dexp + { $$ = $1 / $3; } + | \'\-\' dexp %prec UMINUS + { $$ = \- $2; } + | \'(\' dexp \')\' + { $$ = $2; } + ; + +vexp : dexp + { $$.hi = $$.lo = $1; } + | \'(\' dexp \',\' dexp \')\' + { + $$.lo = $2; + $$.hi = $4; + if( $$.lo > $$.hi ){ + printf( "interval out of order\en" ); + YYERROR; + } + } + | VREG + { $$ = vreg[$1]; } + | vexp \'+\' vexp + { $$.hi = $1.hi + $3.hi; + $$.lo = $1.lo + $3.lo; } + | dexp \'+\' vexp + { $$.hi = $1 + $3.hi; + $$.lo = $1 + $3.lo; } + | vexp \'\-\' vexp + { $$.hi = $1.hi \- $3.lo; + $$.lo = $1.lo \- $3.hi; } + | dexp \'\-\' vexp + { $$.hi = $1 \- $3.lo; + $$.lo = $1 \- $3.hi; } + | vexp \'*\' vexp + { $$ = vmul( $1.lo, $1.hi, $3 ); } + | dexp \'*\' vexp + { $$ = vmul( $1, $1, $3 ); } + | vexp \'/\' vexp + { if( dcheck( $3 ) ) YYERROR; + $$ = vdiv( $1.lo, $1.hi, $3 ); } + | dexp \'/\' vexp + { if( dcheck( $3 ) ) YYERROR; + $$ = vdiv( $1, $1, $3 ); } + | \'\-\' vexp %prec UMINUS + { $$.hi = \-$2.lo; $$.lo = \-$2.hi; } + | \'(\' vexp \')\' + { $$ = $2; } + ; + +%% + +# define BSZ 50 /* buffer size for floating point numbers */ + + /* lexical analysis */ + +yylex(){ + register c; + + while( (c=getchar()) == \' \' ){ /* skip over blanks */ } + + if( isupper( c ) ){ + yylval.ival = c \- \'A\'; + return( VREG ); + } + if( islower( c ) ){ + yylval.ival = c \- \'a\'; + return( DREG ); + } + + if( isdigit( c ) || c==\'.\' ){ + /* gobble up digits, points, exponents */ + + char buf[BSZ+1], *cp = buf; + int dot = 0, exp = 0; + + for( ; (cp\-buf)= BSZ ) printf( "constant too long: truncated\en" ); + else ungetc( c, stdin ); /* push back last char read */ + yylval.dval = atof( buf ); + return( CONST ); + } + return( c ); + } + +INTERVAL hilo( a, b, c, d ) double a, b, c, d; { + /* returns the smallest interval containing a, b, c, and d */ + /* used by *, / routines */ + INTERVAL v; + + if( a>b ) { v.hi = a; v.lo = b; } + else { v.hi = b; v.lo = a; } + + if( c>d ) { + if( c>v.hi ) v.hi = c; + if( dv.hi ) v.hi = d; + if( c= 0. && v.lo <= 0. ){ + printf( "divisor interval contains 0.\en" ); + return( 1 ); + } + return( 0 ); + } + +INTERVAL vdiv( a, b, v ) double a, b; INTERVAL v; { + return( hilo( a/v.hi, a/v.lo, b/v.hi, b/v.lo ) ); + } +.DE +.bp diff --git a/share/doc/psd/15.yacc/ssd b/share/doc/psd/15.yacc/ssd new file mode 100644 index 0000000..872152f --- /dev/null +++ b/share/doc/psd/15.yacc/ssd @@ -0,0 +1,45 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ssd 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +Appendix D: Old Features Supported but not Encouraged +.PP +This Appendix mentions synonyms and features which are supported for historical +continuity, but, for various reasons, are not encouraged. +.IP 1. +Literals may also be delimited by double quotes ``"''. +.IP 2. +Literals may be more than one character long. +If all the characters are alphabetic, numeric, or \_, the type number of the literal is defined, +just as if the literal did not have the quotes around it. +Otherwise, it is difficult to find the value for such literals. +.IP +The use of multi-character literals is likely to mislead those unfamiliar with +Yacc, since it suggests that Yacc is doing a job which must be actually done by the lexical analyzer. +.IP 3. +Most places where % is legal, backslash ``\e'' may be used. +In particular, \e\e is the same as %%, \eleft the same as %left, etc. +.IP 4. +There are a number of other synonyms: +.DS +%< is the same as %left +%> is the same as %right +%binary and %2 are the same as %nonassoc +%0 and %term are the same as %token +%= is the same as %prec +.DE +.IP 5. +Actions may also have the form +.DS +={ . . . } +.DE +and the curly braces can be dropped if the action is a +single C statement. +.IP 6. +C code between %{ and %} used to be permitted at the +head of the rules section, as well as in the +declaration section. -- cgit v1.1