summaryrefslogtreecommitdiffstats
path: root/usr.bin/lex
diff options
context:
space:
mode:
authorpeter <peter@FreeBSD.org>1995-12-30 19:02:48 +0000
committerpeter <peter@FreeBSD.org>1995-12-30 19:02:48 +0000
commite9243aa42348dfb93efdf470f5feedf6d53fab58 (patch)
treec30cce5879822c9af94a82d89d96e73eef98e58f /usr.bin/lex
parent44852ef08668a7c6fb78323d12f3234aadfff3db (diff)
downloadFreeBSD-src-e9243aa42348dfb93efdf470f5feedf6d53fab58.zip
FreeBSD-src-e9243aa42348dfb93efdf470f5feedf6d53fab58.tar.gz
recording cvs-1.6 file death
Diffstat (limited to 'usr.bin/lex')
-rw-r--r--usr.bin/lex/flex.11001
-rw-r--r--usr.bin/lex/flexdoc.13045
2 files changed, 0 insertions, 4046 deletions
diff --git a/usr.bin/lex/flex.1 b/usr.bin/lex/flex.1
deleted file mode 100644
index 6aba4d6..0000000
--- a/usr.bin/lex/flex.1
+++ /dev/null
@@ -1,1001 +0,0 @@
-.TH FLEX 1 "November 1993" "Version 2.4"
-.SH NAME
-flex \- fast lexical analyzer generator
-.SH SYNOPSIS
-.B flex
-.B [\-bcdfhilnpstvwBFILTV78+ \-C[aefFmr] \-Pprefix \-Sskeleton]
-.I [filename ...]
-.SH DESCRIPTION
-.I flex
-is a tool for generating
-.I scanners:
-programs which recognized lexical patterns in text.
-.I flex
-reads
-the given input files, or its standard input if no file names are given,
-for a description of a scanner to generate. The description is in
-the form of pairs
-of regular expressions and C code, called
-.I rules. flex
-generates as output a C source file,
-.B lex.yy.c,
-which defines a routine
-.B yylex().
-This file is compiled and linked with the
-.B \-lfl
-library to produce an executable. When the executable is run,
-it analyzes its input for occurrences
-of the regular expressions. Whenever it finds one, it executes
-the corresponding C code.
-.PP
-For full documentation, see
-.B flexdoc(1).
-This manual entry is intended for use as a quick reference.
-.SH OPTIONS
-.I flex
-has the following options:
-.TP
-.B \-b
-generate backing-up information to
-.I lex.backup.
-This is a list of scanner states which require backing up and the input
-characters on which they do so. By adding rules one can remove
-backing-up states. If all backing-up states are eliminated and
-.B \-Cf
-or
-.B \-CF
-is used, the generated scanner will run faster.
-.TP
-.B \-c
-is a do-nothing, deprecated option included for POSIX compliance.
-.IP
-.B NOTE:
-in previous releases of
-.I flex
-.B \-c
-specified table-compression options. This functionality is
-now given by the
-.B \-C
-flag. To ease the the impact of this change, when
-.I flex
-encounters
-.B \-c,
-it currently issues a warning message and assumes that
-.B \-C
-was desired instead. In the future this "promotion" of
-.B \-c
-to
-.B \-C
-will go away in the name of full POSIX compliance (unless
-the POSIX meaning is removed first).
-.TP
-.B \-d
-makes the generated scanner run in
-.I debug
-mode. Whenever a pattern is recognized and the global
-.B yy_flex_debug
-is non-zero (which is the default), the scanner will
-write to
-.I stderr
-a line of the form:
-.nf
-
- --accepting rule at line 53 ("the matched text")
-
-.fi
-The line number refers to the location of the rule in the file
-defining the scanner (i.e., the file that was fed to flex). Messages
-are also generated when the scanner backs up, accepts the
-default rule, reaches the end of its input buffer (or encounters
-a NUL; the two look the same as far as the scanner's concerned),
-or reaches an end-of-file.
-.TP
-.B \-f
-specifies
-.I fast scanner.
-No table compression is done and stdio is bypassed.
-The result is large but fast. This option is equivalent to
-.B \-Cfr
-(see below).
-.TP
-.B \-h
-generates a "help" summary of
-.I flex's
-options to
-.I stderr
-and then exits.
-.TP
-.B \-i
-instructs
-.I flex
-to generate a
-.I case-insensitive
-scanner. The case of letters given in the
-.I flex
-input patterns will
-be ignored, and tokens in the input will be matched regardless of case. The
-matched text given in
-.I yytext
-will have the preserved case (i.e., it will not be folded).
-.TP
-.B \-l
-turns on maximum compatibility with the original AT&T lex implementation,
-at a considerable performance cost. This option is incompatible with
-.B \-+, \-f, \-F, \-Cf,
-or
-.B \-CF.
-See
-.I flexdoc(1)
-for details.
-.TP
-.B \-n
-is another do-nothing, deprecated option included only for
-POSIX compliance.
-.TP
-.B \-p
-generates a performance report to stderr. The report
-consists of comments regarding features of the
-.I flex
-input file which will cause a loss of performance in the resulting scanner.
-If you give the flag twice, you will also get comments regarding
-features that lead to minor performance losses.
-.TP
-.B \-s
-causes the
-.I default rule
-(that unmatched scanner input is echoed to
-.I stdout)
-to be suppressed. If the scanner encounters input that does not
-match any of its rules, it aborts with an error.
-.TP
-.B \-t
-instructs
-.I flex
-to write the scanner it generates to standard output instead
-of
-.B lex.yy.c.
-.TP
-.B \-v
-specifies that
-.I flex
-should write to
-.I stderr
-a summary of statistics regarding the scanner it generates.
-.TP
-.B \-w
-suppresses warning messages.
-.TP
-.B \-B
-instructs
-.I flex
-to generate a
-.I batch
-scanner instead of an
-.I interactive
-scanner (see
-.B \-I
-below). See
-.I flexdoc(1)
-for details. Scanners using
-.B \-Cf
-or
-.B \-CF
-compression options automatically specify this option, too.
-.TP
-.B \-F
-specifies that the
-.ul
-fast
-scanner table representation should be used (and stdio bypassed).
-This representation is about as fast as the full table representation
-.B (-f),
-and for some sets of patterns will be considerably smaller (and for
-others, larger). It cannot be used with the
-.B \-+
-option. See
-.B flexdoc(1)
-for more details.
-.IP
-This option is equivalent to
-.B \-CFr
-(see below).
-.TP
-.B \-I
-instructs
-.I flex
-to generate an
-.I interactive
-scanner, that is, a scanner which stops immediately rather than
-looking ahead if it knows
-that the currently scanned text cannot be part of a longer rule's match.
-This is the opposite of
-.I batch
-scanners (see
-.B \-B
-above). See
-.B flexdoc(1)
-for details.
-.IP
-Note,
-.B \-I
-cannot be used in conjunction with
-.I full
-or
-.I fast tables,
-i.e., the
-.B \-f, \-F, \-Cf,
-or
-.B \-CF
-flags. For other table compression options,
-.B \-I
-is the default.
-.TP
-.B \-L
-instructs
-.I flex
-not to generate
-.B #line
-directives in
-.B lex.yy.c.
-The default is to generate such directives so error
-messages in the actions will be correctly
-located with respect to the original
-.I flex
-input file, and not to
-the fairly meaningless line numbers of
-.B lex.yy.c.
-.TP
-.B \-T
-makes
-.I flex
-run in
-.I trace
-mode. It will generate a lot of messages to
-.I stderr
-concerning
-the form of the input and the resultant non-deterministic and deterministic
-finite automata. This option is mostly for use in maintaining
-.I flex.
-.TP
-.B \-V
-prints the version number to
-.I stderr
-and exits.
-.TP
-.B \-7
-instructs
-.I flex
-to generate a 7-bit scanner, which can save considerable table space,
-especially when using
-.B \-Cf
-or
-.B \-CF
-(and, at most sites,
-.B \-7
-is on by default for these options. To see if this is the case, use the
-.B -v
-verbose flag and check the flag summary it reports).
-.TP
-.B \-8
-instructs
-.I flex
-to generate an 8-bit scanner. This is the default except for the
-.B \-Cf
-and
-.B \-CF
-compression options, for which the default is site-dependent, and
-can be checked by inspecting the flag summary generated by the
-.B \-v
-option.
-.TP
-.B \-+
-specifies that you want flex to generate a C++
-scanner class. See the section on Generating C++ Scanners in
-.I flexdoc(1)
-for details.
-.TP
-.B \-C[aefFmr]
-controls the degree of table compression and scanner optimization.
-.IP
-.B \-Ca
-trade off larger tables in the generated scanner for faster performance
-because the elements of the tables are better aligned for memory access
-and computation. This option can double the size of the tables used by
-your scanner.
-.IP
-.B \-Ce
-directs
-.I flex
-to construct
-.I equivalence classes,
-i.e., sets of characters
-which have identical lexical properties.
-Equivalence classes usually give
-dramatic reductions in the final table/object file sizes (typically
-a factor of 2-5) and are pretty cheap performance-wise (one array
-look-up per character scanned).
-.IP
-.B \-Cf
-specifies that the
-.I full
-scanner tables should be generated -
-.I flex
-should not compress the
-tables by taking advantages of similar transition functions for
-different states.
-.IP
-.B \-CF
-specifies that the alternate fast scanner representation (described in
-.B flexdoc(1))
-should be used. This option cannot be used with
-.B \-+.
-.IP
-.B \-Cm
-directs
-.I flex
-to construct
-.I meta-equivalence classes,
-which are sets of equivalence classes (or characters, if equivalence
-classes are not being used) that are commonly used together. Meta-equivalence
-classes are often a big win when using compressed tables, but they
-have a moderate performance impact (one or two "if" tests and one
-array look-up per character scanned).
-.IP
-.B \-Cr
-causes the generated scanner to
-.I bypass
-using stdio for input. In general this option results in a minor
-performance gain only worthwhile if used in conjunction with
-.B \-Cf
-or
-.B \-CF.
-It can cause surprising behavior if you use stdio yourself to
-read from
-.I yyin
-prior to calling the scanner.
-.IP
-A lone
-.B \-C
-specifies that the scanner tables should be compressed but neither
-equivalence classes nor meta-equivalence classes should be used.
-.IP
-The options
-.B \-Cf
-or
-.B \-CF
-and
-.B \-Cm
-do not make sense together - there is no opportunity for meta-equivalence
-classes if the table is not being compressed. Otherwise the options
-may be freely mixed.
-.IP
-The default setting is
-.B \-Cem,
-which specifies that
-.I flex
-should generate equivalence classes
-and meta-equivalence classes. This setting provides the highest
-degree of table compression. You can trade off
-faster-executing scanners at the cost of larger tables with
-the following generally being true:
-.nf
-
- slowest & smallest
- -Cem
- -Cm
- -Ce
- -C
- -C{f,F}e
- -C{f,F}
- -C{f,F}a
- fastest & largest
-
-.fi
-.IP
-.B \-C
-options are cumulative.
-.TP
-.B \-Pprefix
-changes the default
-.I "yy"
-prefix used by
-.I flex
-to be
-.I prefix
-instead. See
-.I flexdoc(1)
-for a description of all the global variables and file names that
-this affects.
-.TP
-.B \-Sskeleton_file
-overrides the default skeleton file from which
-.I flex
-constructs its scanners. You'll never need this option unless you are doing
-.I flex
-maintenance or development.
-.SH SUMMARY OF FLEX REGULAR EXPRESSIONS
-The patterns in the input are written using an extended set of regular
-expressions. These are:
-.nf
-
- x match the character 'x'
- . any character except newline
- [xyz] a "character class"; in this case, the pattern
- matches either an 'x', a 'y', or a 'z'
- [abj-oZ] a "character class" with a range in it; matches
- an 'a', a 'b', any letter from 'j' through 'o',
- or a 'Z'
- [^A-Z] a "negated character class", i.e., any character
- but those in the class. In this case, any
- character EXCEPT an uppercase letter.
- [^A-Z\\n] any character EXCEPT an uppercase letter or
- a newline
- r* zero or more r's, where r is any regular expression
- r+ one or more r's
- r? zero or one r's (that is, "an optional r")
- r{2,5} anywhere from two to five r's
- r{2,} two or more r's
- r{4} exactly 4 r's
- {name} the expansion of the "name" definition
- (see above)
- "[xyz]\\"foo"
- the literal string: [xyz]"foo
- \\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
- then the ANSI-C interpretation of \\x.
- Otherwise, a literal 'X' (used to escape
- operators such as '*')
- \\123 the character with octal value 123
- \\x2a the character with hexadecimal value 2a
- (r) match an r; parentheses are used to override
- precedence (see below)
-
-
- rs the regular expression r followed by the
- regular expression s; called "concatenation"
-
-
- r|s either an r or an s
-
-
- r/s an r but only if it is followed by an s. The
- s is not part of the matched text. This type
- of pattern is called as "trailing context".
- ^r an r, but only at the beginning of a line
- r$ an r, but only at the end of a line. Equivalent
- to "r/\\n".
-
-
- <s>r an r, but only in start condition s (see
- below for discussion of start conditions)
- <s1,s2,s3>r
- same, but in any of start conditions s1,
- s2, or s3
- <*>r an r in any start condition, even an exclusive one.
-
-
- <<EOF>> an end-of-file
- <s1,s2><<EOF>>
- an end-of-file when in start condition s1 or s2
-
-.fi
-The regular expressions listed above are grouped according to
-precedence, from highest precedence at the top to lowest at the bottom.
-Those grouped together have equal precedence.
-.PP
-Some notes on patterns:
-.IP -
-Negated character classes
-.I match newlines
-unless "\\n" (or an equivalent escape sequence) is one of the
-characters explicitly present in the negated character class
-(e.g., "[^A-Z\\n]").
-.IP -
-A rule can have at most one instance of trailing context (the '/' operator
-or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
-can only occur at the beginning of a pattern, and, as well as with '/' and '$',
-cannot be grouped inside parentheses. The following are all illegal:
-.nf
-
- foo/bar$
- foo|(bar$)
- foo|^bar
- <sc1>foo<sc2>bar
-
-.fi
-.SH SUMMARY OF SPECIAL ACTIONS
-In addition to arbitrary C code, the following can appear in actions:
-.IP -
-.B ECHO
-copies yytext to the scanner's output.
-.IP -
-.B BEGIN
-followed by the name of a start condition places the scanner in the
-corresponding start condition.
-.IP -
-.B REJECT
-directs the scanner to proceed on to the "second best" rule which matched the
-input (or a prefix of the input).
-.B yytext
-and
-.B yyleng
-are set up appropriately. Note that
-.B REJECT
-is a particularly expensive feature in terms scanner performance;
-if it is used in
-.I any
-of the scanner's actions it will slow down
-.I all
-of the scanner's matching. Furthermore,
-.B REJECT
-cannot be used with the
-.B \-f
-or
-.B \-F
-options.
-.IP
-Note also that unlike the other special actions,
-.B REJECT
-is a
-.I branch;
-code immediately following it in the action will
-.I not
-be executed.
-.IP -
-.B yymore()
-tells the scanner that the next time it matches a rule, the corresponding
-token should be
-.I appended
-onto the current value of
-.B yytext
-rather than replacing it.
-.IP -
-.B yyless(n)
-returns all but the first
-.I n
-characters of the current token back to the input stream, where they
-will be rescanned when the scanner looks for the next match.
-.B yytext
-and
-.B yyleng
-are adjusted appropriately (e.g.,
-.B yyleng
-will now be equal to
-.I n
-).
-.IP -
-.B unput(c)
-puts the character
-.I c
-back onto the input stream. It will be the next character scanned.
-.IP -
-.B input()
-reads the next character from the input stream (this routine is called
-.B yyinput()
-if the scanner is compiled using
-.B C++).
-.IP -
-.B yyterminate()
-can be used in lieu of a return statement in an action. It terminates
-the scanner and returns a 0 to the scanner's caller, indicating "all done".
-.IP
-By default,
-.B yyterminate()
-is also called when an end-of-file is encountered. It is a macro and
-may be redefined.
-.IP -
-.B YY_NEW_FILE
-is an action available only in <<EOF>> rules. It means "Okay, I've
-set up a new input file, continue scanning". It is no longer required;
-you can just assign
-.I yyin
-to point to a new file in the <<EOF>> action.
-.IP -
-.B yy_create_buffer( file, size )
-takes a
-.I FILE
-pointer and an integer
-.I size.
-It returns a YY_BUFFER_STATE
-handle to a new input buffer large enough to accomodate
-.I size
-characters and associated with the given file. When in doubt, use
-.B YY_BUF_SIZE
-for the size.
-.IP -
-.B yy_switch_to_buffer( new_buffer )
-switches the scanner's processing to scan for tokens from
-the given buffer, which must be a YY_BUFFER_STATE.
-.IP -
-.B yy_delete_buffer( buffer )
-deletes the given buffer.
-.SH VALUES AVAILABLE TO THE USER
-.IP -
-.B char *yytext
-holds the text of the current token. It may be modified but not lengthened
-(you cannot append characters to the end). Modifying the last character
-may affect the activity of rules anchored using '^' during the next scan;
-see
-.B flexdoc(1)
-for details.
-.IP
-If the special directive
-.B %array
-appears in the first section of the scanner description, then
-.B yytext
-is instead declared
-.B char yytext[YYLMAX],
-where
-.B YYLMAX
-is a macro definition that you can redefine in the first section
-if you don't like the default value (generally 8KB). Using
-.B %array
-results in somewhat slower scanners, but the value of
-.B yytext
-becomes immune to calls to
-.I input()
-and
-.I unput(),
-which potentially destroy its value when
-.B yytext
-is a character pointer. The opposite of
-.B %array
-is
-.B %pointer,
-which is the default.
-.IP
-You cannot use
-.B %array
-when generating C++ scanner classes
-(the
-.B \-+
-flag).
-.IP -
-.B int yyleng
-holds the length of the current token.
-.IP -
-.B FILE *yyin
-is the file which by default
-.I flex
-reads from. It may be redefined but doing so only makes sense before
-scanning begins or after an EOF has been encountered. Changing it in
-the midst of scanning will have unexpected results since
-.I flex
-buffers its input; use
-.B yyrestart()
-instead.
-Once scanning terminates because an end-of-file
-has been seen,
-.B
-you can assign
-.I yyin
-at the new input file and then call the scanner again to continue scanning.
-.IP -
-.B void yyrestart( FILE *new_file )
-may be called to point
-.I yyin
-at the new input file. The switch-over to the new file is immediate
-(any previously buffered-up input is lost). Note that calling
-.B yyrestart()
-with
-.I yyin
-as an argument thus throws away the current input buffer and continues
-scanning the same input file.
-.IP -
-.B FILE *yyout
-is the file to which
-.B ECHO
-actions are done. It can be reassigned by the user.
-.IP -
-.B YY_CURRENT_BUFFER
-returns a
-.B YY_BUFFER_STATE
-handle to the current buffer.
-.IP -
-.B YY_START
-returns an integer value corresponding to the current start
-condition. You can subsequently use this value with
-.B BEGIN
-to return to that start condition.
-.SH MACROS AND FUNCTIONS YOU CAN REDEFINE
-.IP -
-.B YY_DECL
-controls how the scanning routine is declared.
-By default, it is "int yylex()", or, if prototypes are being
-used, "int yylex(void)". This definition may be changed by redefining
-the "YY_DECL" macro. Note that
-if you give arguments to the scanning routine using a
-K&R-style/non-prototyped function declaration, you must terminate
-the definition with a semi-colon (;).
-.IP -
-The nature of how the scanner
-gets its input can be controlled by redefining the
-.B YY_INPUT
-macro.
-YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
-action is to place up to
-.I max_size
-characters in the character array
-.I buf
-and return in the integer variable
-.I result
-either the
-number of characters read or the constant YY_NULL (0 on Unix systems)
-to indicate EOF. The default YY_INPUT reads from the
-global file-pointer "yyin".
-A sample redefinition of YY_INPUT (in the definitions
-section of the input file):
-.nf
-
- %{
- #undef YY_INPUT
- #define YY_INPUT(buf,result,max_size) \\
- { \\
- int c = getchar(); \\
- result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
- }
- %}
-
-.fi
-.IP -
-When the scanner receives an end-of-file indication from YY_INPUT,
-it then checks the function
-.B yywrap()
-function. If
-.B yywrap()
-returns false (zero), then it is assumed that the
-function has gone ahead and set up
-.I yyin
-to point to another input file, and scanning continues. If it returns
-true (non-zero), then the scanner terminates, returning 0 to its
-caller.
-.IP
-The default
-.B yywrap()
-always returns 1.
-.IP -
-YY_USER_ACTION
-can be redefined to provide an action
-which is always executed prior to the matched rule's action.
-.IP -
-The macro
-.B YY_USER_INIT
-may be redefined to provide an action which is always executed before
-the first scan.
-.IP -
-In the generated scanner, the actions are all gathered in one large
-switch statement and separated using
-.B YY_BREAK,
-which may be redefined. By default, it is simply a "break", to separate
-each rule's action from the following rule's.
-.SH FILES
-.TP
-.B \-lfl
-library with which to link scanners to obtain the default versions
-of
-.I yywrap()
-and/or
-.I main().
-.TP
-.I lex.yy.c
-generated scanner (called
-.I lexyy.c
-on some systems).
-.TP
-.I lex.yy.cc
-generated C++ scanner class, when using
-.B -+.
-.TP
-.I <FlexLexer.h>
-header file defining the C++ scanner base class,
-.B FlexLexer,
-and its derived class,
-.B yyFlexLexer.
-.TP
-.I flex.skl
-skeleton scanner. This file is only used when building flex, not when
-flex executes.
-.TP
-.I lex.backup
-backing-up information for
-.B \-b
-flag (called
-.I lex.bck
-on some systems).
-.SH "SEE ALSO"
-.PP
-flexdoc(1), lex(1), yacc(1), sed(1), awk(1).
-.PP
-M. E. Lesk and E. Schmidt,
-.I LEX \- Lexical Analyzer Generator
-.SH DIAGNOSTICS
-.PP
-.I reject_used_but_not_detected undefined
-or
-.PP
-.I yymore_used_but_not_detected undefined -
-These errors can occur at compile time. They indicate that the
-scanner uses
-.B REJECT
-or
-.B yymore()
-but that
-.I flex
-failed to notice the fact, meaning that
-.I flex
-scanned the first two sections looking for occurrences of these actions
-and failed to find any, but somehow you snuck some in (via a #include
-file, for example). Make an explicit reference to the action in your
-.I flex
-input file. (Note that previously
-.I flex
-supported a
-.B %used/%unused
-mechanism for dealing with this problem; this feature is still supported
-but now deprecated, and will go away soon unless the author hears from
-people who can argue compellingly that they need it.)
-.PP
-.I flex scanner jammed -
-a scanner compiled with
-.B \-s
-has encountered an input string which wasn't matched by
-any of its rules.
-.PP
-.I warning, rule cannot be matched
-indicates that the given rule
-cannot be matched because it follows other rules that will
-always match the same text as it. See
-.I flexdoc(1)
-for an example.
-.PP
-.I warning,
-.B \-s
-.I
-option given but default rule can be matched
-means that it is possible (perhaps only in a particular start condition)
-that the default rule (match any single character) is the only one
-that will match a particular input. Since
-.PP
-.I scanner input buffer overflowed -
-a scanner rule matched more text than the available dynamic memory.
-.PP
-.I token too large, exceeds YYLMAX -
-your scanner uses
-.B %array
-and one of its rules matched a string longer than the
-.B YYLMAX
-constant (8K bytes by default). You can increase the value by
-#define'ing
-.B YYLMAX
-in the definitions section of your
-.I flex
-input.
-.PP
-.I scanner requires \-8 flag to
-.I use the character 'x' -
-Your scanner specification includes recognizing the 8-bit character
-.I 'x'
-and you did not specify the \-8 flag, and your scanner defaulted to 7-bit
-because you used the
-.B \-Cf
-or
-.B \-CF
-table compression options.
-.PP
-.I flex scanner push-back overflow -
-you used
-.B unput()
-to push back so much text that the scanner's buffer could not hold
-both the pushed-back text and the current token in
-.B yytext.
-Ideally the scanner should dynamically resize the buffer in this case, but at
-present it does not.
-.PP
-.I
-input buffer overflow, can't enlarge buffer because scanner uses REJECT -
-the scanner was working on matching an extremely large token and needed
-to expand the input buffer. This doesn't work with scanners that use
-.B
-REJECT.
-.PP
-.I
-fatal flex scanner internal error--end of buffer missed -
-This can occur in an scanner which is reentered after a long-jump
-has jumped out (or over) the scanner's activation frame. Before
-reentering the scanner, use:
-.nf
-
- yyrestart( yyin );
-
-.fi
-or use C++ scanner classes (the
-.B \-+
-option), which are fully reentrant.
-.SH AUTHOR
-Vern Paxson, with the help of many ideas and much inspiration from
-Van Jacobson. Original version by Jef Poskanzer.
-.PP
-See flexdoc(1) for additional credits and the address to send comments to.
-.SH DEFICIENCIES / BUGS
-.PP
-Some trailing context
-patterns cannot be properly matched and generate
-warning messages ("dangerous trailing context"). These are
-patterns where the ending of the
-first part of the rule matches the beginning of the second
-part, such as "zx*/xy*", where the 'x*' matches the 'x' at
-the beginning of the trailing context. (Note that the POSIX draft
-states that the text matched by such patterns is undefined.)
-.PP
-For some trailing context rules, parts which are actually fixed-length are
-not recognized as such, leading to the abovementioned performance loss.
-In particular, parts using '|' or {n} (such as "foo{3}") are always
-considered variable-length.
-.PP
-Combining trailing context with the special '|' action can result in
-.I fixed
-trailing context being turned into the more expensive
-.I variable
-trailing context. For example, in the following:
-.nf
-
- %%
- abc |
- xyz/def
-
-.fi
-.PP
-Use of
-.B unput()
-or
-.B input()
-invalidates yytext and yyleng, unless the
-.B %array
-directive
-or the
-.B \-l
-option has been used.
-.PP
-Use of unput() to push back more text than was matched can
-result in the pushed-back text matching a beginning-of-line ('^')
-rule even though it didn't come at the beginning of the line
-(though this is rare!).
-.PP
-Pattern-matching of NUL's is substantially slower than matching other
-characters.
-.PP
-Dynamic resizing of the input buffer is slow, as it entails rescanning
-all the text matched so far by the current (generally huge) token.
-.PP
-.I flex
-does not generate correct #line directives for code internal
-to the scanner; thus, bugs in
-.I flex.skl
-yield bogus line numbers.
-.PP
-Due to both buffering of input and read-ahead, you cannot intermix
-calls to <stdio.h> routines, such as, for example,
-.B getchar(),
-with
-.I flex
-rules and expect it to work. Call
-.B input()
-instead.
-.PP
-The total table entries listed by the
-.B \-v
-flag excludes the number of table entries needed to determine
-what rule has been matched. The number of entries is equal
-to the number of DFA states if the scanner does not use
-.B REJECT,
-and somewhat greater than the number of states if it does.
-.PP
-.B REJECT
-cannot be used with the
-.B \-f
-or
-.B \-F
-options.
-.PP
-The
-.I flex
-internal algorithms need documentation.
diff --git a/usr.bin/lex/flexdoc.1 b/usr.bin/lex/flexdoc.1
deleted file mode 100644
index b80d569..0000000
--- a/usr.bin/lex/flexdoc.1
+++ /dev/null
@@ -1,3045 +0,0 @@
-.TH FLEXDOC 1 "November 1993" "Version 2.4"
-.SH NAME
-flexdoc \- documentation for flex, fast lexical analyzer generator
-.SH SYNOPSIS
-.B flex
-.B [\-bcdfhilnpstvwBFILTV78+ \-C[aefFmr] \-Pprefix \-Sskeleton]
-.I [filename ...]
-.SH DESCRIPTION
-.I flex
-is a tool for generating
-.I scanners:
-programs which recognized lexical patterns in text.
-.I flex
-reads
-the given input files, or its standard input if no file names are given,
-for a description of a scanner to generate. The description is in
-the form of pairs
-of regular expressions and C code, called
-.I rules. flex
-generates as output a C source file,
-.B lex.yy.c,
-which defines a routine
-.B yylex().
-This file is compiled and linked with the
-.B \-lfl
-library to produce an executable. When the executable is run,
-it analyzes its input for occurrences
-of the regular expressions. Whenever it finds one, it executes
-the corresponding C code.
-.SH SOME SIMPLE EXAMPLES
-.PP
-First some simple examples to get the flavor of how one uses
-.I flex.
-The following
-.I flex
-input specifies a scanner which whenever it encounters the string
-"username" will replace it with the user's login name:
-.nf
-
- %%
- username printf( "%s", getlogin() );
-
-.fi
-By default, any text not matched by a
-.I flex
-scanner
-is copied to the output, so the net effect of this scanner is
-to copy its input file to its output with each occurrence
-of "username" expanded.
-In this input, there is just one rule. "username" is the
-.I pattern
-and the "printf" is the
-.I action.
-The "%%" marks the beginning of the rules.
-.PP
-Here's another simple example:
-.nf
-
- int num_lines = 0, num_chars = 0;
-
- %%
- \\n ++num_lines; ++num_chars;
- . ++num_chars;
-
- %%
- main()
- {
- yylex();
- printf( "# of lines = %d, # of chars = %d\\n",
- num_lines, num_chars );
- }
-
-.fi
-This scanner counts the number of characters and the number
-of lines in its input (it produces no output other than the
-final report on the counts). The first line
-declares two globals, "num_lines" and "num_chars", which are accessible
-both inside
-.B yylex()
-and in the
-.B main()
-routine declared after the second "%%". There are two rules, one
-which matches a newline ("\\n") and increments both the line count and
-the character count, and one which matches any character other than
-a newline (indicated by the "." regular expression).
-.PP
-A somewhat more complicated example:
-.nf
-
- /* scanner for a toy Pascal-like language */
-
- %{
- /* need this for the call to atof() below */
- #include <math.h>
- %}
-
- DIGIT [0-9]
- ID [a-z][a-z0-9]*
-
- %%
-
- {DIGIT}+ {
- printf( "An integer: %s (%d)\\n", yytext,
- atoi( yytext ) );
- }
-
- {DIGIT}+"."{DIGIT}* {
- printf( "A float: %s (%g)\\n", yytext,
- atof( yytext ) );
- }
-
- if|then|begin|end|procedure|function {
- printf( "A keyword: %s\\n", yytext );
- }
-
- {ID} printf( "An identifier: %s\\n", yytext );
-
- "+"|"-"|"*"|"/" printf( "An operator: %s\\n", yytext );
-
- "{"[^}\\n]*"}" /* eat up one-line comments */
-
- [ \\t\\n]+ /* eat up whitespace */
-
- . printf( "Unrecognized character: %s\\n", yytext );
-
- %%
-
- main( argc, argv )
- int argc;
- char **argv;
- {
- ++argv, --argc; /* skip over program name */
- if ( argc > 0 )
- yyin = fopen( argv[0], "r" );
- else
- yyin = stdin;
-
- yylex();
- }
-
-.fi
-This is the beginnings of a simple scanner for a language like
-Pascal. It identifies different types of
-.I tokens
-and reports on what it has seen.
-.PP
-The details of this example will be explained in the following
-sections.
-.SH FORMAT OF THE INPUT FILE
-The
-.I flex
-input file consists of three sections, separated by a line with just
-.B %%
-in it:
-.nf
-
- definitions
- %%
- rules
- %%
- user code
-
-.fi
-The
-.I definitions
-section contains declarations of simple
-.I name
-definitions to simplify the scanner specification, and declarations of
-.I start conditions,
-which are explained in a later section.
-.PP
-Name definitions have the form:
-.nf
-
- name definition
-
-.fi
-The "name" is a word beginning with a letter or an underscore ('_')
-followed by zero or more letters, digits, '_', or '-' (dash).
-The definition is taken to begin at the first non-white-space character
-following the name and continuing to the end of the line.
-The definition can subsequently be referred to using "{name}", which
-will expand to "(definition)". For example,
-.nf
-
- DIGIT [0-9]
- ID [a-z][a-z0-9]*
-
-.fi
-defines "DIGIT" to be a regular expression which matches a
-single digit, and
-"ID" to be a regular expression which matches a letter
-followed by zero-or-more letters-or-digits.
-A subsequent reference to
-.nf
-
- {DIGIT}+"."{DIGIT}*
-
-.fi
-is identical to
-.nf
-
- ([0-9])+"."([0-9])*
-
-.fi
-and matches one-or-more digits followed by a '.' followed
-by zero-or-more digits.
-.PP
-The
-.I rules
-section of the
-.I flex
-input contains a series of rules of the form:
-.nf
-
- pattern action
-
-.fi
-where the pattern must be unindented and the action must begin
-on the same line.
-.PP
-See below for a further description of patterns and actions.
-.PP
-Finally, the user code section is simply copied to
-.B lex.yy.c
-verbatim.
-It is used for companion routines which call or are called
-by the scanner. The presence of this section is optional;
-if it is missing, the second
-.B %%
-in the input file may be skipped, too.
-.PP
-In the definitions and rules sections, any
-.I indented
-text or text enclosed in
-.B %{
-and
-.B %}
-is copied verbatim to the output (with the %{}'s removed).
-The %{}'s must appear unindented on lines by themselves.
-.PP
-In the rules section,
-any indented or %{} text appearing before the
-first rule may be used to declare variables
-which are local to the scanning routine and (after the declarations)
-code which is to be executed whenever the scanning routine is entered.
-Other indented or %{} text in the rule section is still copied to the output,
-but its meaning is not well-defined and it may well cause compile-time
-errors (this feature is present for
-.I POSIX
-compliance; see below for other such features).
-.PP
-In the definitions section (but not in the rules section),
-an unindented comment (i.e., a line
-beginning with "/*") is also copied verbatim to the output up
-to the next "*/".
-.SH PATTERNS
-The patterns in the input are written using an extended set of regular
-expressions. These are:
-.nf
-
- x match the character 'x'
- . any character except newline
- [xyz] a "character class"; in this case, the pattern
- matches either an 'x', a 'y', or a 'z'
- [abj-oZ] a "character class" with a range in it; matches
- an 'a', a 'b', any letter from 'j' through 'o',
- or a 'Z'
- [^A-Z] a "negated character class", i.e., any character
- but those in the class. In this case, any
- character EXCEPT an uppercase letter.
- [^A-Z\\n] any character EXCEPT an uppercase letter or
- a newline
- r* zero or more r's, where r is any regular expression
- r+ one or more r's
- r? zero or one r's (that is, "an optional r")
- r{2,5} anywhere from two to five r's
- r{2,} two or more r's
- r{4} exactly 4 r's
- {name} the expansion of the "name" definition
- (see above)
- "[xyz]\\"foo"
- the literal string: [xyz]"foo
- \\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
- then the ANSI-C interpretation of \\x.
- Otherwise, a literal 'X' (used to escape
- operators such as '*')
- \\123 the character with octal value 123
- \\x2a the character with hexadecimal value 2a
- (r) match an r; parentheses are used to override
- precedence (see below)
-
-
- rs the regular expression r followed by the
- regular expression s; called "concatenation"
-
-
- r|s either an r or an s
-
-
- r/s an r but only if it is followed by an s. The
- s is not part of the matched text. This type
- of pattern is called as "trailing context".
- ^r an r, but only at the beginning of a line
- r$ an r, but only at the end of a line. Equivalent
- to "r/\\n".
-
-
- <s>r an r, but only in start condition s (see
- below for discussion of start conditions)
- <s1,s2,s3>r
- same, but in any of start conditions s1,
- s2, or s3
- <*>r an r in any start condition, even an exclusive one.
-
-
- <<EOF>> an end-of-file
- <s1,s2><<EOF>>
- an end-of-file when in start condition s1 or s2
-
-.fi
-Note that inside of a character class, all regular expression operators
-lose their special meaning except escape ('\\') and the character class
-operators, '-', ']', and, at the beginning of the class, '^'.
-.PP
-The regular expressions listed above are grouped according to
-precedence, from highest precedence at the top to lowest at the bottom.
-Those grouped together have equal precedence. For example,
-.nf
-
- foo|bar*
-
-.fi
-is the same as
-.nf
-
- (foo)|(ba(r*))
-
-.fi
-since the '*' operator has higher precedence than concatenation,
-and concatenation higher than alternation ('|'). This pattern
-therefore matches
-.I either
-the string "foo"
-.I or
-the string "ba" followed by zero-or-more r's.
-To match "foo" or zero-or-more "bar"'s, use:
-.nf
-
- foo|(bar)*
-
-.fi
-and to match zero-or-more "foo"'s-or-"bar"'s:
-.nf
-
- (foo|bar)*
-
-.fi
-.PP
-Some notes on patterns:
-.IP -
-A negated character class such as the example "[^A-Z]"
-above
-.I will match a newline
-unless "\\n" (or an equivalent escape sequence) is one of the
-characters explicitly present in the negated character class
-(e.g., "[^A-Z\\n]"). This is unlike how many other regular
-expression tools treat negated character classes, but unfortunately
-the inconsistency is historically entrenched.
-Matching newlines means that a pattern like [^"]* can match the entire
-input unless there's another quote in the input.
-.IP -
-A rule can have at most one instance of trailing context (the '/' operator
-or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
-can only occur at the beginning of a pattern, and, as well as with '/' and '$',
-cannot be grouped inside parentheses. A '^' which does not occur at
-the beginning of a rule or a '$' which does not occur at the end of
-a rule loses its special properties and is treated as a normal character.
-.IP
-The following are illegal:
-.nf
-
- foo/bar$
- <sc1>foo<sc2>bar
-
-.fi
-Note that the first of these, can be written "foo/bar\\n".
-.IP
-The following will result in '$' or '^' being treated as a normal character:
-.nf
-
- foo|(bar$)
- foo|^bar
-
-.fi
-If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
-could be used (the special '|' action is explained below):
-.nf
-
- foo |
- bar$ /* action goes here */
-
-.fi
-A similar trick will work for matching a foo or a
-bar-at-the-beginning-of-a-line.
-.SH HOW THE INPUT IS MATCHED
-When the generated scanner is run, it analyzes its input looking
-for strings which match any of its patterns. If it finds more than
-one match, it takes the one matching the most text (for trailing
-context rules, this includes the length of the trailing part, even
-though it will then be returned to the input). If it finds two
-or more matches of the same length, the
-rule listed first in the
-.I flex
-input file is chosen.
-.PP
-Once the match is determined, the text corresponding to the match
-(called the
-.I token)
-is made available in the global character pointer
-.B yytext,
-and its length in the global integer
-.B yyleng.
-The
-.I action
-corresponding to the matched pattern is then executed (a more
-detailed description of actions follows), and then the remaining
-input is scanned for another match.
-.PP
-If no match is found, then the
-.I default rule
-is executed: the next character in the input is considered matched and
-copied to the standard output. Thus, the simplest legal
-.I flex
-input is:
-.nf
-
- %%
-
-.fi
-which generates a scanner that simply copies its input (one character
-at a time) to its output.
-.PP
-Note that
-.B yytext
-can be defined in two different ways: either as a character
-.I pointer
-or as a character
-.I array.
-You can control which definition
-.I flex
-uses by including one of the special directives
-.B %pointer
-or
-.B %array
-in the first (definitions) section of your flex input. The default is
-.B %pointer,
-unless you use the
-.B -l
-lex compatibility option, in which case
-.B yytext
-will be an array.
-The advantage of using
-.B %pointer
-is substantially faster scanning and no buffer overflow when matching
-very large tokens (unless you run out of dynamic memory). The disadvantage
-is that you are restricted in how your actions can modify
-.B yytext
-(see the next section), and calls to the
-.B input()
-and
-.B unput()
-functions destroy the present contents of
-.B yytext,
-which can be a considerable porting headache when moving between different
-.I lex
-versions.
-.PP
-The advantage of
-.B %array
-is that you can then modify
-.B yytext
-to your heart's content, and calls to
-.B input()
-and
-.B unput()
-do not destroy
-.B yytext
-(see below). Furthermore, existing
-.I lex
-programs sometimes access
-.B yytext
-externally using declarations of the form:
-.nf
- extern char yytext[];
-.fi
-This definition is erroneous when used with
-.B %pointer,
-but correct for
-.B %array.
-.PP
-.B %array
-defines
-.B yytext
-to be an array of
-.B YYLMAX
-characters, which defaults to a fairly large value. You can change
-the size by simply #define'ing
-.B YYLMAX
-to a different value in the first section of your
-.I flex
-input. As mentioned above, with
-.B %pointer
-yytext grows dynamically to accomodate large tokens. While this means your
-.B %pointer
-scanner can accomodate very large tokens (such as matching entire blocks
-of comments), bear in mind that each time the scanner must resize
-.B yytext
-it also must rescan the entire token from the beginning, so matching such
-tokens can prove slow.
-.B yytext
-presently does
-.I not
-dynamically grow if a call to
-.B unput()
-results in too much text being pushed back; instead, a run-time error results.
-.PP
-Also note that you cannot use
-.B %array
-with C++ scanner classes
-(the
-.B \-+
-option; see below).
-.SH ACTIONS
-Each pattern in a rule has a corresponding action, which can be any
-arbitrary C statement. The pattern ends at the first non-escaped
-whitespace character; the remainder of the line is its action. If the
-action is empty, then when the pattern is matched the input token
-is simply discarded. For example, here is the specification for a program
-which deletes all occurrences of "zap me" from its input:
-.nf
-
- %%
- "zap me"
-
-.fi
-(It will copy all other characters in the input to the output since
-they will be matched by the default rule.)
-.PP
-Here is a program which compresses multiple blanks and tabs down to
-a single blank, and throws away whitespace found at the end of a line:
-.nf
-
- %%
- [ \\t]+ putchar( ' ' );
- [ \\t]+$ /* ignore this token */
-
-.fi
-.PP
-If the action contains a '{', then the action spans till the balancing '}'
-is found, and the action may cross multiple lines.
-.I flex
-knows about C strings and comments and won't be fooled by braces found
-within them, but also allows actions to begin with
-.B %{
-and will consider the action to be all the text up to the next
-.B %}
-(regardless of ordinary braces inside the action).
-.PP
-An action consisting solely of a vertical bar ('|') means "same as
-the action for the next rule." See below for an illustration.
-.PP
-Actions can include arbitrary C code, including
-.B return
-statements to return a value to whatever routine called
-.B yylex().
-Each time
-.B yylex()
-is called it continues processing tokens from where it last left
-off until it either reaches
-the end of the file or executes a return.
-.PP
-Actions are free to modify
-.B yytext
-except for lengthening it (adding
-characters to its end--these will overwrite later characters in the
-input stream). Modifying the final character of yytext may alter
-whether when scanning resumes rules anchored with '^' are active.
-Specifically, changing the final character of yytext to a newline will
-activate such rules on the next scan, and changing it to anything else
-will deactivate the rules. Users should not rely on this behavior being
-present in future releases. Finally, note that none of this paragraph
-applies when using
-.B %array
-(see above).
-.PP
-Actions are free to modify
-.B yyleng
-except they should not do so if the action also includes use of
-.B yymore()
-(see below).
-.PP
-There are a number of special directives which can be included within
-an action:
-.IP -
-.B ECHO
-copies yytext to the scanner's output.
-.IP -
-.B BEGIN
-followed by the name of a start condition places the scanner in the
-corresponding start condition (see below).
-.IP -
-.B REJECT
-directs the scanner to proceed on to the "second best" rule which matched the
-input (or a prefix of the input). The rule is chosen as described
-above in "How the Input is Matched", and
-.B yytext
-and
-.B yyleng
-set up appropriately.
-It may either be one which matched as much text
-as the originally chosen rule but came later in the
-.I flex
-input file, or one which matched less text.
-For example, the following will both count the
-words in the input and call the routine special() whenever "frob" is seen:
-.nf
-
- int word_count = 0;
- %%
-
- frob special(); REJECT;
- [^ \\t\\n]+ ++word_count;
-
-.fi
-Without the
-.B REJECT,
-any "frob"'s in the input would not be counted as words, since the
-scanner normally executes only one action per token.
-Multiple
-.B REJECT's
-are allowed, each one finding the next best choice to the currently
-active rule. For example, when the following scanner scans the token
-"abcd", it will write "abcdabcaba" to the output:
-.nf
-
- %%
- a |
- ab |
- abc |
- abcd ECHO; REJECT;
- .|\\n /* eat up any unmatched character */
-
-.fi
-(The first three rules share the fourth's action since they use
-the special '|' action.)
-.B REJECT
-is a particularly expensive feature in terms scanner performance;
-if it is used in
-.I any
-of the scanner's actions it will slow down
-.I all
-of the scanner's matching. Furthermore,
-.B REJECT
-cannot be used with the
-.I -Cf
-or
-.I -CF
-options (see below).
-.IP
-Note also that unlike the other special actions,
-.B REJECT
-is a
-.I branch;
-code immediately following it in the action will
-.I not
-be executed.
-.IP -
-.B yymore()
-tells the scanner that the next time it matches a rule, the corresponding
-token should be
-.I appended
-onto the current value of
-.B yytext
-rather than replacing it. For example, given the input "mega-kludge"
-the following will write "mega-mega-kludge" to the output:
-.nf
-
- %%
- mega- ECHO; yymore();
- kludge ECHO;
-
-.fi
-First "mega-" is matched and echoed to the output. Then "kludge"
-is matched, but the previous "mega-" is still hanging around at the
-beginning of
-.B yytext
-so the
-.B ECHO
-for the "kludge" rule will actually write "mega-kludge".
-The presence of
-.B yymore()
-in the scanner's action entails a minor performance penalty in the
-scanner's matching speed.
-.IP -
-.B yyless(n)
-returns all but the first
-.I n
-characters of the current token back to the input stream, where they
-will be rescanned when the scanner looks for the next match.
-.B yytext
-and
-.B yyleng
-are adjusted appropriately (e.g.,
-.B yyleng
-will now be equal to
-.I n
-). For example, on the input "foobar" the following will write out
-"foobarbar":
-.nf
-
- %%
- foobar ECHO; yyless(3);
- [a-z]+ ECHO;
-
-.fi
-An argument of 0 to
-.B yyless
-will cause the entire current input string to be scanned again. Unless you've
-changed how the scanner will subsequently process its input (using
-.B BEGIN,
-for example), this will result in an endless loop.
-.PP
-Note that
-.B yyless
-is a macro and can only be used in the flex input file, not from
-other source files.
-.IP -
-.B unput(c)
-puts the character
-.I c
-back onto the input stream. It will be the next character scanned.
-The following action will take the current token and cause it
-to be rescanned enclosed in parentheses.
-.nf
-
- {
- int i;
- unput( ')' );
- for ( i = yyleng - 1; i >= 0; --i )
- unput( yytext[i] );
- unput( '(' );
- }
-
-.fi
-Note that since each
-.B unput()
-puts the given character back at the
-.I beginning
-of the input stream, pushing back strings must be done back-to-front.
-Also note that you cannot put back
-.B EOF
-to attempt to mark the input stream with an end-of-file.
-.IP -
-.B input()
-reads the next character from the input stream. For example,
-the following is one way to eat up C comments:
-.nf
-
- %%
- "/*" {
- register int c;
-
- for ( ; ; )
- {
- while ( (c = input()) != '*' &&
- c != EOF )
- ; /* eat up text of comment */
-
- if ( c == '*' )
- {
- while ( (c = input()) == '*' )
- ;
- if ( c == '/' )
- break; /* found the end */
- }
-
- if ( c == EOF )
- {
- error( "EOF in comment" );
- break;
- }
- }
- }
-
-.fi
-(Note that if the scanner is compiled using
-.B C++,
-then
-.B input()
-is instead referred to as
-.B yyinput(),
-in order to avoid a name clash with the
-.B C++
-stream by the name of
-.I input.)
-.IP -
-.B yyterminate()
-can be used in lieu of a return statement in an action. It terminates
-the scanner and returns a 0 to the scanner's caller, indicating "all done".
-By default,
-.B yyterminate()
-is also called when an end-of-file is encountered. It is a macro and
-may be redefined.
-.SH THE GENERATED SCANNER
-The output of
-.I flex
-is the file
-.B lex.yy.c,
-which contains the scanning routine
-.B yylex(),
-a number of tables used by it for matching tokens, and a number
-of auxiliary routines and macros. By default,
-.B yylex()
-is declared as follows:
-.nf
-
- int yylex()
- {
- ... various definitions and the actions in here ...
- }
-
-.fi
-(If your environment supports function prototypes, then it will
-be "int yylex( void )".) This definition may be changed by defining
-the "YY_DECL" macro. For example, you could use:
-.nf
-
- #define YY_DECL float lexscan( a, b ) float a, b;
-
-.fi
-to give the scanning routine the name
-.I lexscan,
-returning a float, and taking two floats as arguments. Note that
-if you give arguments to the scanning routine using a
-K&R-style/non-prototyped function declaration, you must terminate
-the definition with a semi-colon (;).
-.PP
-Whenever
-.B yylex()
-is called, it scans tokens from the global input file
-.I yyin
-(which defaults to stdin). It continues until it either reaches
-an end-of-file (at which point it returns the value 0) or
-one of its actions executes a
-.I return
-statement.
-.PP
-If the scanner reaches an end-of-file, subsequent calls are undefined
-unless either
-.I yyin
-is pointed at a new input file (in which case scanning continues from
-that file), or
-.B yyrestart()
-is called.
-.B yyrestart()
-takes one argument, a
-.B FILE *
-pointer, and initializes
-.I yyin
-for scanning from that file. Essentially there is no difference between
-just assigning
-.I yyin
-to a new input file or using
-.B yyrestart()
-to do so; the latter is available for compatibility with previous versions
-of
-.I flex,
-and because it can be used to switch input files in the middle of scanning.
-It can also be used to throw away the current input buffer, by calling
-it with an argument of
-.I yyin.
-.PP
-If
-.B yylex()
-stops scanning due to executing a
-.I return
-statement in one of the actions, the scanner may then be called again and it
-will resume scanning where it left off.
-.PP
-By default (and for purposes of efficiency), the scanner uses
-block-reads rather than simple
-.I getc()
-calls to read characters from
-.I yyin.
-The nature of how it gets its input can be controlled by defining the
-.B YY_INPUT
-macro.
-YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
-action is to place up to
-.I max_size
-characters in the character array
-.I buf
-and return in the integer variable
-.I result
-either the
-number of characters read or the constant YY_NULL (0 on Unix systems)
-to indicate EOF. The default YY_INPUT reads from the
-global file-pointer "yyin".
-.PP
-A sample definition of YY_INPUT (in the definitions
-section of the input file):
-.nf
-
- %{
- #define YY_INPUT(buf,result,max_size) \\
- { \\
- int c = getchar(); \\
- result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
- }
- %}
-
-.fi
-This definition will change the input processing to occur
-one character at a time.
-.PP
-You also can add in things like keeping track of the
-input line number this way; but don't expect your scanner to
-go very fast.
-.PP
-When the scanner receives an end-of-file indication from YY_INPUT,
-it then checks the
-.B yywrap()
-function. If
-.B yywrap()
-returns false (zero), then it is assumed that the
-function has gone ahead and set up
-.I yyin
-to point to another input file, and scanning continues. If it returns
-true (non-zero), then the scanner terminates, returning 0 to its
-caller.
-.PP
-The default
-.B yywrap()
-always returns 1.
-.PP
-The scanner writes its
-.B ECHO
-output to the
-.I yyout
-global (default, stdout), which may be redefined by the user simply
-by assigning it to some other
-.B FILE
-pointer.
-.SH START CONDITIONS
-.I flex
-provides a mechanism for conditionally activating rules. Any rule
-whose pattern is prefixed with "<sc>" will only be active when
-the scanner is in the start condition named "sc". For example,
-.nf
-
- <STRING>[^"]* { /* eat up the string body ... */
- ...
- }
-
-.fi
-will be active only when the scanner is in the "STRING" start
-condition, and
-.nf
-
- <INITIAL,STRING,QUOTE>\\. { /* handle an escape ... */
- ...
- }
-
-.fi
-will be active only when the current start condition is
-either "INITIAL", "STRING", or "QUOTE".
-.PP
-Start conditions
-are declared in the definitions (first) section of the input
-using unindented lines beginning with either
-.B %s
-or
-.B %x
-followed by a list of names.
-The former declares
-.I inclusive
-start conditions, the latter
-.I exclusive
-start conditions. A start condition is activated using the
-.B BEGIN
-action. Until the next
-.B BEGIN
-action is executed, rules with the given start
-condition will be active and
-rules with other start conditions will be inactive.
-If the start condition is
-.I inclusive,
-then rules with no start conditions at all will also be active.
-If it is
-.I exclusive,
-then
-.I only
-rules qualified with the start condition will be active.
-A set of rules contingent on the same exclusive start condition
-describe a scanner which is independent of any of the other rules in the
-.I flex
-input. Because of this,
-exclusive start conditions make it easy to specify "mini-scanners"
-which scan portions of the input that are syntactically different
-from the rest (e.g., comments).
-.PP
-If the distinction between inclusive and exclusive start conditions
-is still a little vague, here's a simple example illustrating the
-connection between the two. The set of rules:
-.nf
-
- %s example
- %%
- <example>foo /* do something */
-
-.fi
-is equivalent to
-.nf
-
- %x example
- %%
- <INITIAL,example>foo /* do something */
-
-.fi
-.PP
-Also note that the special start-condition specifier
-.B <*>
-matches every start condition. Thus, the above example could also
-have been written;
-.nf
-
- %x example
- %%
- <*>foo /* do something */
-
-.fi
-.PP
-The default rule (to
-.B ECHO
-any unmatched character) remains active in start conditions.
-.PP
-.B BEGIN(0)
-returns to the original state where only the rules with
-no start conditions are active. This state can also be
-referred to as the start-condition "INITIAL", so
-.B BEGIN(INITIAL)
-is equivalent to
-.B BEGIN(0).
-(The parentheses around the start condition name are not required but
-are considered good style.)
-.PP
-.B BEGIN
-actions can also be given as indented code at the beginning
-of the rules section. For example, the following will cause
-the scanner to enter the "SPECIAL" start condition whenever
-.I yylex()
-is called and the global variable
-.I enter_special
-is true:
-.nf
-
- int enter_special;
-
- %x SPECIAL
- %%
- if ( enter_special )
- BEGIN(SPECIAL);
-
- <SPECIAL>blahblahblah
- ...more rules follow...
-
-.fi
-.PP
-To illustrate the uses of start conditions,
-here is a scanner which provides two different interpretations
-of a string like "123.456". By default it will treat it as
-as three tokens, the integer "123", a dot ('.'), and the integer "456".
-But if the string is preceded earlier in the line by the string
-"expect-floats"
-it will treat it as a single token, the floating-point number
-123.456:
-.nf
-
- %{
- #include <math.h>
- %}
- %s expect
-
- %%
- expect-floats BEGIN(expect);
-
- <expect>[0-9]+"."[0-9]+ {
- printf( "found a float, = %f\\n",
- atof( yytext ) );
- }
- <expect>\\n {
- /* that's the end of the line, so
- * we need another "expect-number"
- * before we'll recognize any more
- * numbers
- */
- BEGIN(INITIAL);
- }
-
- [0-9]+ {
- printf( "found an integer, = %d\\n",
- atoi( yytext ) );
- }
-
- "." printf( "found a dot\\n" );
-
-.fi
-Here is a scanner which recognizes (and discards) C comments while
-maintaining a count of the current input line.
-.nf
-
- %x comment
- %%
- int line_num = 1;
-
- "/*" BEGIN(comment);
-
- <comment>[^*\\n]* /* eat anything that's not a '*' */
- <comment>"*"+[^*/\\n]* /* eat up '*'s not followed by '/'s */
- <comment>\\n ++line_num;
- <comment>"*"+"/" BEGIN(INITIAL);
-
-.fi
-This scanner goes to a bit of trouble to match as much
-text as possible with each rule. In general, when attempting to write
-a high-speed scanner try to match as much possible in each rule, as
-it's a big win.
-.PP
-Note that start-conditions names are really integer values and
-can be stored as such. Thus, the above could be extended in the
-following fashion:
-.nf
-
- %x comment foo
- %%
- int line_num = 1;
- int comment_caller;
-
- "/*" {
- comment_caller = INITIAL;
- BEGIN(comment);
- }
-
- ...
-
- <foo>"/*" {
- comment_caller = foo;
- BEGIN(comment);
- }
-
- <comment>[^*\\n]* /* eat anything that's not a '*' */
- <comment>"*"+[^*/\\n]* /* eat up '*'s not followed by '/'s */
- <comment>\\n ++line_num;
- <comment>"*"+"/" BEGIN(comment_caller);
-
-.fi
-Furthermore, you can access the current start condition using
-the integer-valued
-.B YY_START
-macro. For example, the above assignments to
-.I comment_caller
-could instead be written
-.nf
-
- comment_caller = YY_START;
-.fi
-.PP
-Note that start conditions do not have their own name-space; %s's and %x's
-declare names in the same fashion as #define's.
-.PP
-Finally, here's an example of how to match C-style quoted strings using
-exclusive start conditions, including expanded escape sequences (but
-not including checking for a string that's too long):
-.nf
-
- %x str
-
- %%
- char string_buf[MAX_STR_CONST];
- char *string_buf_ptr;
-
-
- \\" string_buf_ptr = string_buf; BEGIN(str);
-
- <str>\\" { /* saw closing quote - all done */
- BEGIN(INITIAL);
- *string_buf_ptr = '\\0';
- /* return string constant token type and
- * value to parser
- */
- }
-
- <str>\\n {
- /* error - unterminated string constant */
- /* generate error message */
- }
-
- <str>\\\\[0-7]{1,3} {
- /* octal escape sequence */
- int result;
-
- (void) sscanf( yytext + 1, "%o", &result );
-
- if ( result > 0xff )
- /* error, constant is out-of-bounds */
-
- *string_buf_ptr++ = result;
- }
-
- <str>\\\\[0-9]+ {
- /* generate error - bad escape sequence; something
- * like '\\48' or '\\0777777'
- */
- }
-
- <str>\\\\n *string_buf_ptr++ = '\\n';
- <str>\\\\t *string_buf_ptr++ = '\\t';
- <str>\\\\r *string_buf_ptr++ = '\\r';
- <str>\\\\b *string_buf_ptr++ = '\\b';
- <str>\\\\f *string_buf_ptr++ = '\\f';
-
- <str>\\\\(.|\\n) *string_buf_ptr++ = yytext[1];
-
- <str>[^\\\\\\n\\"]+ {
- char *yytext_ptr = yytext;
-
- while ( *yytext_ptr )
- *string_buf_ptr++ = *yytext_ptr++;
- }
-
-.fi
-.SH MULTIPLE INPUT BUFFERS
-Some scanners (such as those which support "include" files)
-require reading from several input streams. As
-.I flex
-scanners do a large amount of buffering, one cannot control
-where the next input will be read from by simply writing a
-.B YY_INPUT
-which is sensitive to the scanning context.
-.B YY_INPUT
-is only called when the scanner reaches the end of its buffer, which
-may be a long time after scanning a statement such as an "include"
-which requires switching the input source.
-.PP
-To negotiate these sorts of problems,
-.I flex
-provides a mechanism for creating and switching between multiple
-input buffers. An input buffer is created by using:
-.nf
-
- YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
-
-.fi
-which takes a
-.I FILE
-pointer and a size and creates a buffer associated with the given
-file and large enough to hold
-.I size
-characters (when in doubt, use
-.B YY_BUF_SIZE
-for the size). It returns a
-.B YY_BUFFER_STATE
-handle, which may then be passed to other routines:
-.nf
-
- void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
-
-.fi
-switches the scanner's input buffer so subsequent tokens will
-come from
-.I new_buffer.
-Note that
-.B yy_switch_to_buffer()
-may be used by yywrap() to set things up for continued scanning, instead
-of opening a new file and pointing
-.I yyin
-at it.
-.nf
-
- void yy_delete_buffer( YY_BUFFER_STATE buffer )
-
-.fi
-is used to reclaim the storage associated with a buffer.
-.PP
-.B yy_new_buffer()
-is an alias for
-.B yy_create_buffer(),
-provided for compatibility with the C++ use of
-.I new
-and
-.I delete
-for creating and destroying dynamic objects.
-.PP
-Finally, the
-.B YY_CURRENT_BUFFER
-macro returns a
-.B YY_BUFFER_STATE
-handle to the current buffer.
-.PP
-Here is an example of using these features for writing a scanner
-which expands include files (the
-.B <<EOF>>
-feature is discussed below):
-.nf
-
- /* the "incl" state is used for picking up the name
- * of an include file
- */
- %x incl
-
- %{
- #define MAX_INCLUDE_DEPTH 10
- YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
- int include_stack_ptr = 0;
- %}
-
- %%
- include BEGIN(incl);
-
- [a-z]+ ECHO;
- [^a-z\\n]*\\n? ECHO;
-
- <incl>[ \\t]* /* eat the whitespace */
- <incl>[^ \\t\\n]+ { /* got the include file name */
- if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
- {
- fprintf( stderr, "Includes nested too deeply" );
- exit( 1 );
- }
-
- include_stack[include_stack_ptr++] =
- YY_CURRENT_BUFFER;
-
- yyin = fopen( yytext, "r" );
-
- if ( ! yyin )
- error( ... );
-
- yy_switch_to_buffer(
- yy_create_buffer( yyin, YY_BUF_SIZE ) );
-
- BEGIN(INITIAL);
- }
-
- <<EOF>> {
- if ( --include_stack_ptr < 0 )
- {
- yyterminate();
- }
-
- else
- {
- yy_delete_buffer( YY_CURRENT_BUFFER );
- yy_switch_to_buffer(
- include_stack[include_stack_ptr] );
- }
- }
-
-.fi
-.SH END-OF-FILE RULES
-The special rule "<<EOF>>" indicates
-actions which are to be taken when an end-of-file is
-encountered and yywrap() returns non-zero (i.e., indicates
-no further files to process). The action must finish
-by doing one of four things:
-.IP -
-assigning
-.I yyin
-to a new input file (in previous versions of flex, after doing the
-assignment you had to call the special action
-.B YY_NEW_FILE;
-this is no longer necessary);
-.IP -
-executing a
-.I return
-statement;
-.IP -
-executing the special
-.B yyterminate()
-action;
-.IP -
-or, switching to a new buffer using
-.B yy_switch_to_buffer()
-as shown in the example above.
-.PP
-<<EOF>> rules may not be used with other
-patterns; they may only be qualified with a list of start
-conditions. If an unqualified <<EOF>> rule is given, it
-applies to
-.I all
-start conditions which do not already have <<EOF>> actions. To
-specify an <<EOF>> rule for only the initial start condition, use
-.nf
-
- <INITIAL><<EOF>>
-
-.fi
-.PP
-These rules are useful for catching things like unclosed comments.
-An example:
-.nf
-
- %x quote
- %%
-
- ...other rules for dealing with quotes...
-
- <quote><<EOF>> {
- error( "unterminated quote" );
- yyterminate();
- }
- <<EOF>> {
- if ( *++filelist )
- yyin = fopen( *filelist, "r" );
- else
- yyterminate();
- }
-
-.fi
-.SH MISCELLANEOUS MACROS
-The macro
-.bd
-YY_USER_ACTION
-can be defined to provide an action
-which is always executed prior to the matched rule's action. For example,
-it could be #define'd to call a routine to convert yytext to lower-case.
-.PP
-The macro
-.B YY_USER_INIT
-may be defined to provide an action which is always executed before
-the first scan (and before the scanner's internal initializations are done).
-For example, it could be used to call a routine to read
-in a data table or open a logging file.
-.PP
-In the generated scanner, the actions are all gathered in one large
-switch statement and separated using
-.B YY_BREAK,
-which may be redefined. By default, it is simply a "break", to separate
-each rule's action from the following rule's.
-Redefining
-.B YY_BREAK
-allows, for example, C++ users to
-#define YY_BREAK to do nothing (while being very careful that every
-rule ends with a "break" or a "return"!) to avoid suffering from
-unreachable statement warnings where because a rule's action ends with
-"return", the
-.B YY_BREAK
-is inaccessible.
-.SH INTERFACING WITH YACC
-One of the main uses of
-.I flex
-is as a companion to the
-.I yacc
-parser-generator.
-.I yacc
-parsers expect to call a routine named
-.B yylex()
-to find the next input token. The routine is supposed to
-return the type of the next token as well as putting any associated
-value in the global
-.B yylval.
-To use
-.I flex
-with
-.I yacc,
-one specifies the
-.B \-d
-option to
-.I yacc
-to instruct it to generate the file
-.B y.tab.h
-containing definitions of all the
-.B %tokens
-appearing in the
-.I yacc
-input. This file is then included in the
-.I flex
-scanner. For example, if one of the tokens is "TOK_NUMBER",
-part of the scanner might look like:
-.nf
-
- %{
- #include "y.tab.h"
- %}
-
- %%
-
- [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
-
-.fi
-.SH OPTIONS
-.I flex
-has the following options:
-.TP
-.B \-b
-Generate backing-up information to
-.I lex.backup.
-This is a list of scanner states which require backing up
-and the input characters on which they do so. By adding rules one
-can remove backing-up states. If all backing-up states
-are eliminated and
-.B \-Cf
-or
-.B \-CF
-is used, the generated scanner will run faster (see the
-.B \-p
-flag). Only users who wish to squeeze every last cycle out of their
-scanners need worry about this option. (See the section on Performance
-Considerations below.)
-.TP
-.B \-c
-is a do-nothing, deprecated option included for POSIX compliance.
-.IP
-.B NOTE:
-in previous releases of
-.I flex
-.B \-c
-specified table-compression options. This functionality is
-now given by the
-.B \-C
-flag. To ease the the impact of this change, when
-.I flex
-encounters
-.B \-c,
-it currently issues a warning message and assumes that
-.B \-C
-was desired instead. In the future this "promotion" of
-.B \-c
-to
-.B \-C
-will go away in the name of full POSIX compliance (unless
-the POSIX meaning is removed first).
-.TP
-.B \-d
-makes the generated scanner run in
-.I debug
-mode. Whenever a pattern is recognized and the global
-.B yy_flex_debug
-is non-zero (which is the default),
-the scanner will write to
-.I stderr
-a line of the form:
-.nf
-
- --accepting rule at line 53 ("the matched text")
-
-.fi
-The line number refers to the location of the rule in the file
-defining the scanner (i.e., the file that was fed to flex). Messages
-are also generated when the scanner backs up, accepts the
-default rule, reaches the end of its input buffer (or encounters
-a NUL; at this point, the two look the same as far as the scanner's concerned),
-or reaches an end-of-file.
-.TP
-.B \-f
-specifies
-.I fast scanner.
-No table compression is done and stdio is bypassed.
-The result is large but fast. This option is equivalent to
-.B \-Cfr
-(see below).
-.TP
-.B \-h
-generates a "help" summary of
-.I flex's
-options to
-.I stderr
-and then exits.
-.TP
-.B \-i
-instructs
-.I flex
-to generate a
-.I case-insensitive
-scanner. The case of letters given in the
-.I flex
-input patterns will
-be ignored, and tokens in the input will be matched regardless of case. The
-matched text given in
-.I yytext
-will have the preserved case (i.e., it will not be folded).
-.TP
-.B \-l
-turns on maximum compatibility with the original AT&T
-.I lex
-implementation. Note that this does not mean
-.I full
-compatibility. Use of this option costs a considerable amount of
-performance, and it cannot be used with the
-.B \-+, -f, -F, -Cf,
-or
-.B -CF
-options. For details on the compatibilities it provides, see the section
-"Incompatibilities With Lex And POSIX" below.
-.TP
-.B \-n
-is another do-nothing, deprecated option included only for
-POSIX compliance.
-.TP
-.B \-p
-generates a performance report to stderr. The report
-consists of comments regarding features of the
-.I flex
-input file which will cause a serious loss of performance in the resulting
-scanner. If you give the flag twice, you will also get comments regarding
-features that lead to minor performance losses.
-.IP
-Note that the use of
-.B REJECT
-and variable trailing context (see the Bugs section in flex(1))
-entails a substantial performance penalty; use of
-.I yymore(),
-the
-.B ^
-operator,
-and the
-.B \-I
-flag entail minor performance penalties.
-.TP
-.B \-s
-causes the
-.I default rule
-(that unmatched scanner input is echoed to
-.I stdout)
-to be suppressed. If the scanner encounters input that does not
-match any of its rules, it aborts with an error. This option is
-useful for finding holes in a scanner's rule set.
-.TP
-.B \-t
-instructs
-.I flex
-to write the scanner it generates to standard output instead
-of
-.B lex.yy.c.
-.TP
-.B \-v
-specifies that
-.I flex
-should write to
-.I stderr
-a summary of statistics regarding the scanner it generates.
-Most of the statistics are meaningless to the casual
-.I flex
-user, but the first line identifies the version of
-.I flex
-(same as reported by
-.B \-V),
-and the next line the flags used when generating the scanner, including
-those that are on by default.
-.TP
-.B \-w
-suppresses warning messages.
-.TP
-.B \-B
-instructs
-.I flex
-to generate a
-.I batch
-scanner, the opposite of
-.I interactive
-scanners generated by
-.B \-I
-(see below). In general, you use
-.B \-B
-when you are
-.I certain
-that your scanner will never be used interactively, and you want to
-squeeze a
-.I little
-more performance out of it. If your goal is instead to squeeze out a
-.I lot
-more performance, you should be using the
-.B \-Cf
-or
-.B \-CF
-options (discussed below), which turn on
-.B \-B
-automatically anyway.
-.TP
-.B \-F
-specifies that the
-.ul
-fast
-scanner table representation should be used (and stdio
-bypassed). This representation is
-about as fast as the full table representation
-.B (-f),
-and for some sets of patterns will be considerably smaller (and for
-others, larger). In general, if the pattern set contains both "keywords"
-and a catch-all, "identifier" rule, such as in the set:
-.nf
-
- "case" return TOK_CASE;
- "switch" return TOK_SWITCH;
- ...
- "default" return TOK_DEFAULT;
- [a-z]+ return TOK_ID;
-
-.fi
-then you're better off using the full table representation. If only
-the "identifier" rule is present and you then use a hash table or some such
-to detect the keywords, you're better off using
-.B -F.
-.IP
-This option is equivalent to
-.B \-CFr
-(see below). It cannot be used with
-.B \-+.
-.TP
-.B \-I
-instructs
-.I flex
-to generate an
-.I interactive
-scanner. An interactive scanner is one that only looks ahead to decide
-what token has been matched if it absolutely must. It turns out that
-always looking one extra character ahead, even if the scanner has already
-seen enough text to disambiguate the current token, is a bit faster than
-only looking ahead when necessary. But scanners that always look ahead
-give dreadful interactive performance; for example, when a user types
-a newline, it is not recognized as a newline token until they enter
-.I another
-token, which often means typing in another whole line.
-.IP
-.I Flex
-scanners default to
-.I interactive
-unless you use the
-.B \-Cf
-or
-.B \-CF
-table-compression options (see below). That's because if you're looking
-for high-performance you should be using one of these options, so if you
-didn't,
-.I flex
-assumes you'd rather trade off a bit of run-time performance for intuitive
-interactive behavior. Note also that you
-.I cannot
-use
-.B \-I
-in conjunction with
-.B \-Cf
-or
-.B \-CF.
-Thus, this option is not really needed; it is on by default for all those
-cases in which it is allowed.
-.IP
-You can force a scanner to
-.I not
-be interactive by using
-.B \-B
-(see above).
-.TP
-.B \-L
-instructs
-.I flex
-not to generate
-.B #line
-directives. Without this option,
-.I flex
-peppers the generated scanner
-with #line directives so error messages in the actions will be correctly
-located with respect to the original
-.I flex
-input file, and not to
-the fairly meaningless line numbers of
-.B lex.yy.c.
-(Unfortunately
-.I flex
-does not presently generate the necessary directives
-to "retarget" the line numbers for those parts of
-.B lex.yy.c
-which it generated. So if there is an error in the generated code,
-a meaningless line number is reported.)
-.TP
-.B \-T
-makes
-.I flex
-run in
-.I trace
-mode. It will generate a lot of messages to
-.I stderr
-concerning
-the form of the input and the resultant non-deterministic and deterministic
-finite automata. This option is mostly for use in maintaining
-.I flex.
-.TP
-.B \-V
-prints the version number to
-.I stderr
-and exits.
-.TP
-.B \-7
-instructs
-.I flex
-to generate a 7-bit scanner, i.e., one which can only recognized 7-bit
-characters in its input. The advantage of using
-.B \-7
-is that the scanner's tables can be up to half the size of those generated
-using the
-.B \-8
-option (see below). The disadvantage is that such scanners often hang
-or crash if their input contains an 8-bit character.
-.IP
-Note, however, that unless you generate your scanner using the
-.B \-Cf
-or
-.B \-CF
-table compression options, use of
-.B \-7
-will save only a small amount of table space, and make your scanner
-considerably less portable.
-.I Flex's
-default behavior is to generate an 8-bit scanner unless you use the
-.B \-Cf
-or
-.B \-CF,
-in which case
-.I flex
-defaults to generating 7-bit scanners unless your site was always
-configured to generate 8-bit scanners (as will often be the case
-with non-USA sites). You can tell whether flex generated a 7-bit
-or an 8-bit scanner by inspecting the flag summary in the
-.B \-v
-output as described above.
-.IP
-Note that if you use
-.B \-Cfe
-or
-.B \-CFe
-(those table compression options, but also using equivalence classes as
-discussed see below), flex still defaults to generating an 8-bit
-scanner, since usually with these compression options full 8-bit tables
-are not much more expensive than 7-bit tables.
-.TP
-.B \-8
-instructs
-.I flex
-to generate an 8-bit scanner, i.e., one which can recognize 8-bit
-characters. This flag is only needed for scanners generated using
-.B \-Cf
-or
-.B \-CF,
-as otherwise flex defaults to generating an 8-bit scanner anyway.
-.IP
-See the discussion of
-.B \-7
-above for flex's default behavior and the tradeoffs between 7-bit
-and 8-bit scanners.
-.TP
-.B \-+
-specifies that you want flex to generate a C++
-scanner class. See the section on Generating C++ Scanners below for
-details.
-.TP
-.B \-C[aefFmr]
-controls the degree of table compression and, more generally, trade-offs
-between small scanners and fast scanners.
-.IP
-.B \-Ca
-("align") instructs flex to trade off larger tables in the
-generated scanner for faster performance because the elements of
-the tables are better aligned for memory access and computation. On some
-RISC architectures, fetching and manipulating longwords is more efficient
-than with smaller-sized datums such as shortwords. This option can
-double the size of the tables used by your scanner.
-.IP
-.B \-Ce
-directs
-.I flex
-to construct
-.I equivalence classes,
-i.e., sets of characters
-which have identical lexical properties (for example, if the only
-appearance of digits in the
-.I flex
-input is in the character class
-"[0-9]" then the digits '0', '1', ..., '9' will all be put
-in the same equivalence class). Equivalence classes usually give
-dramatic reductions in the final table/object file sizes (typically
-a factor of 2-5) and are pretty cheap performance-wise (one array
-look-up per character scanned).
-.IP
-.B \-Cf
-specifies that the
-.I full
-scanner tables should be generated -
-.I flex
-should not compress the
-tables by taking advantages of similar transition functions for
-different states.
-.IP
-.B \-CF
-specifies that the alternate fast scanner representation (described
-above under the
-.B \-F
-flag)
-should be used. This option cannot be used with
-.B \-+.
-.IP
-.B \-Cm
-directs
-.I flex
-to construct
-.I meta-equivalence classes,
-which are sets of equivalence classes (or characters, if equivalence
-classes are not being used) that are commonly used together. Meta-equivalence
-classes are often a big win when using compressed tables, but they
-have a moderate performance impact (one or two "if" tests and one
-array look-up per character scanned).
-.IP
-.B \-Cr
-causes the generated scanner to
-.I bypass
-use of the standard I/O library (stdio) for input. Instead of calling
-.B fread()
-or
-.B getc(),
-the scanner will use the
-.B read()
-system call, resulting in a performance gain which varies from system
-to system, but in general is probably negligible unless you are also using
-.B \-Cf
-or
-.B \-CF.
-Using
-.B \-Cr
-can cause strange behavior if, for example, you read from
-.I yyin
-using stdio prior to calling the scanner (because the scanner will miss
-whatever text your previous reads left in the stdio input buffer).
-.IP
-.B \-Cr
-has no effect if you define
-.B YY_INPUT
-(see The Generated Scanner above).
-.IP
-A lone
-.B \-C
-specifies that the scanner tables should be compressed but neither
-equivalence classes nor meta-equivalence classes should be used.
-.IP
-The options
-.B \-Cf
-or
-.B \-CF
-and
-.B \-Cm
-do not make sense together - there is no opportunity for meta-equivalence
-classes if the table is not being compressed. Otherwise the options
-may be freely mixed, and are cumulative.
-.IP
-The default setting is
-.B \-Cem,
-which specifies that
-.I flex
-should generate equivalence classes
-and meta-equivalence classes. This setting provides the highest
-degree of table compression. You can trade off
-faster-executing scanners at the cost of larger tables with
-the following generally being true:
-.nf
-
- slowest & smallest
- -Cem
- -Cm
- -Ce
- -C
- -C{f,F}e
- -C{f,F}
- -C{f,F}a
- fastest & largest
-
-.fi
-Note that scanners with the smallest tables are usually generated and
-compiled the quickest, so
-during development you will usually want to use the default, maximal
-compression.
-.IP
-.B \-Cfe
-is often a good compromise between speed and size for production
-scanners.
-.TP
-.B \-Pprefix
-changes the default
-.I "yy"
-prefix used by
-.I flex
-for all globally-visible variable and function names to instead be
-.I prefix.
-For example,
-.B \-Pfoo
-changes the name of
-.B yytext
-to
-.B footext.
-It also changes the name of the default output file from
-.B lex.yy.c
-to
-.B lex.foo.c.
-Here are all of the names affected:
-.nf
-
- yyFlexLexer
- yy_create_buffer
- yy_delete_buffer
- yy_flex_debug
- yy_init_buffer
- yy_load_buffer_state
- yy_switch_to_buffer
- yyin
- yyleng
- yylex
- yyout
- yyrestart
- yytext
- yywrap
-
-.fi
-Within your scanner itself, you can still refer to the global variables
-and functions using either version of their name; but eternally, they
-have the modified name.
-.IP
-This option lets you easily link together multiple
-.I flex
-programs into the same executable. Note, though, that using this
-option also renames
-.B yywrap(),
-so you now
-.I must
-provide your own (appropriately-named) version of the routine for your
-scanner, as linking with
-.B \-lfl
-no longer provides one for you by default.
-.TP
-.B \-Sskeleton_file
-overrides the default skeleton file from which
-.I flex
-constructs its scanners. You'll never need this option unless you are doing
-.I flex
-maintenance or development.
-.SH PERFORMANCE CONSIDERATIONS
-The main design goal of
-.I flex
-is that it generate high-performance scanners. It has been optimized
-for dealing well with large sets of rules. Aside from the effects on
-scanner speed of the table compression
-.B \-C
-options outlined above,
-there are a number of options/actions which degrade performance. These
-are, from most expensive to least:
-.nf
-
- REJECT
-
- pattern sets that require backing up
- arbitrary trailing context
-
- yymore()
- '^' beginning-of-line operator
-
-.fi
-with the first three all being quite expensive and the last two
-being quite cheap. Note also that
-.B unput()
-is implemented as a routine call that potentially does quite a bit of
-work, while
-.B yyless()
-is a quite-cheap macro; so if just putting back some excess text you
-scanned, use
-.B yyless().
-.PP
-.B REJECT
-should be avoided at all costs when performance is important.
-It is a particularly expensive option.
-.PP
-Getting rid of backing up is messy and often may be an enormous
-amount of work for a complicated scanner. In principal, one begins
-by using the
-.B \-b
-flag to generate a
-.I lex.backup
-file. For example, on the input
-.nf
-
- %%
- foo return TOK_KEYWORD;
- foobar return TOK_KEYWORD;
-
-.fi
-the file looks like:
-.nf
-
- State #6 is non-accepting -
- associated rule line numbers:
- 2 3
- out-transitions: [ o ]
- jam-transitions: EOF [ \\001-n p-\\177 ]
-
- State #8 is non-accepting -
- associated rule line numbers:
- 3
- out-transitions: [ a ]
- jam-transitions: EOF [ \\001-` b-\\177 ]
-
- State #9 is non-accepting -
- associated rule line numbers:
- 3
- out-transitions: [ r ]
- jam-transitions: EOF [ \\001-q s-\\177 ]
-
- Compressed tables always back up.
-
-.fi
-The first few lines tell us that there's a scanner state in
-which it can make a transition on an 'o' but not on any other
-character, and that in that state the currently scanned text does not match
-any rule. The state occurs when trying to match the rules found
-at lines 2 and 3 in the input file.
-If the scanner is in that state and then reads
-something other than an 'o', it will have to back up to find
-a rule which is matched. With
-a bit of headscratching one can see that this must be the
-state it's in when it has seen "fo". When this has happened,
-if anything other than another 'o' is seen, the scanner will
-have to back up to simply match the 'f' (by the default rule).
-.PP
-The comment regarding State #8 indicates there's a problem
-when "foob" has been scanned. Indeed, on any character other
-than an 'a', the scanner will have to back up to accept "foo".
-Similarly, the comment for State #9 concerns when "fooba" has
-been scanned and an 'r' does not follow.
-.PP
-The final comment reminds us that there's no point going to
-all the trouble of removing backing up from the rules unless
-we're using
-.B \-Cf
-or
-.B \-CF,
-since there's no performance gain doing so with compressed scanners.
-.PP
-The way to remove the backing up is to add "error" rules:
-.nf
-
- %%
- foo return TOK_KEYWORD;
- foobar return TOK_KEYWORD;
-
- fooba |
- foob |
- fo {
- /* false alarm, not really a keyword */
- return TOK_ID;
- }
-
-.fi
-.PP
-Eliminating backing up among a list of keywords can also be
-done using a "catch-all" rule:
-.nf
-
- %%
- foo return TOK_KEYWORD;
- foobar return TOK_KEYWORD;
-
- [a-z]+ return TOK_ID;
-
-.fi
-This is usually the best solution when appropriate.
-.PP
-Backing up messages tend to cascade.
-With a complicated set of rules it's not uncommon to get hundreds
-of messages. If one can decipher them, though, it often
-only takes a dozen or so rules to eliminate the backing up (though
-it's easy to make a mistake and have an error rule accidentally match
-a valid token. A possible future
-.I flex
-feature will be to automatically add rules to eliminate backing up).
-.PP
-.I Variable
-trailing context (where both the leading and trailing parts do not have
-a fixed length) entails almost the same performance loss as
-.B REJECT
-(i.e., substantial). So when possible a rule like:
-.nf
-
- %%
- mouse|rat/(cat|dog) run();
-
-.fi
-is better written:
-.nf
-
- %%
- mouse/cat|dog run();
- rat/cat|dog run();
-
-.fi
-or as
-.nf
-
- %%
- mouse|rat/cat run();
- mouse|rat/dog run();
-
-.fi
-Note that here the special '|' action does
-.I not
-provide any savings, and can even make things worse (see
-.PP
-A final note regarding performance: as mentioned above in the section
-How the Input is Matched, dynamically resizing
-.B yytext
-to accomodate huge tokens is a slow process because it presently requires that
-the (huge) token be rescanned from the beginning. Thus if performance is
-vital, you should attempt to match "large" quantities of text but not
-"huge" quantities, where the cutoff between the two is at about 8K
-characters/token.
-.PP
-Another area where the user can increase a scanner's performance
-(and one that's easier to implement) arises from the fact that
-the longer the tokens matched, the faster the scanner will run.
-This is because with long tokens the processing of most input
-characters takes place in the (short) inner scanning loop, and
-does not often have to go through the additional work of setting up
-the scanning environment (e.g.,
-.B yytext)
-for the action. Recall the scanner for C comments:
-.nf
-
- %x comment
- %%
- int line_num = 1;
-
- "/*" BEGIN(comment);
-
- <comment>[^*\\n]*
- <comment>"*"+[^*/\\n]*
- <comment>\\n ++line_num;
- <comment>"*"+"/" BEGIN(INITIAL);
-
-.fi
-This could be sped up by writing it as:
-.nf
-
- %x comment
- %%
- int line_num = 1;
-
- "/*" BEGIN(comment);
-
- <comment>[^*\\n]*
- <comment>[^*\\n]*\\n ++line_num;
- <comment>"*"+[^*/\\n]*
- <comment>"*"+[^*/\\n]*\\n ++line_num;
- <comment>"*"+"/" BEGIN(INITIAL);
-
-.fi
-Now instead of each newline requiring the processing of another
-action, recognizing the newlines is "distributed" over the other rules
-to keep the matched text as long as possible. Note that
-.I adding
-rules does
-.I not
-slow down the scanner! The speed of the scanner is independent
-of the number of rules or (modulo the considerations given at the
-beginning of this section) how complicated the rules are with
-regard to operators such as '*' and '|'.
-.PP
-A final example in speeding up a scanner: suppose you want to scan
-through a file containing identifiers and keywords, one per line
-and with no other extraneous characters, and recognize all the
-keywords. A natural first approach is:
-.nf
-
- %%
- asm |
- auto |
- break |
- ... etc ...
- volatile |
- while /* it's a keyword */
-
- .|\\n /* it's not a keyword */
-
-.fi
-To eliminate the back-tracking, introduce a catch-all rule:
-.nf
-
- %%
- asm |
- auto |
- break |
- ... etc ...
- volatile |
- while /* it's a keyword */
-
- [a-z]+ |
- .|\\n /* it's not a keyword */
-
-.fi
-Now, if it's guaranteed that there's exactly one word per line,
-then we can reduce the total number of matches by a half by
-merging in the recognition of newlines with that of the other
-tokens:
-.nf
-
- %%
- asm\\n |
- auto\\n |
- break\\n |
- ... etc ...
- volatile\\n |
- while\\n /* it's a keyword */
-
- [a-z]+\\n |
- .|\\n /* it's not a keyword */
-
-.fi
-One has to be careful here, as we have now reintroduced backing up
-into the scanner. In particular, while
-.I we
-know that there will never be any characters in the input stream
-other than letters or newlines,
-.I flex
-can't figure this out, and it will plan for possibly needing to back up
-when it has scanned a token like "auto" and then the next character
-is something other than a newline or a letter. Previously it would
-then just match the "auto" rule and be done, but now it has no "auto"
-rule, only a "auto\\n" rule. To eliminate the possibility of backing up,
-we could either duplicate all rules but without final newlines, or,
-since we never expect to encounter such an input and therefore don't
-how it's classified, we can introduce one more catch-all rule, this
-one which doesn't include a newline:
-.nf
-
- %%
- asm\\n |
- auto\\n |
- break\\n |
- ... etc ...
- volatile\\n |
- while\\n /* it's a keyword */
-
- [a-z]+\\n |
- [a-z]+ |
- .|\\n /* it's not a keyword */
-
-.fi
-Compiled with
-.B \-Cf,
-this is about as fast as one can get a
-.I flex
-scanner to go for this particular problem.
-.PP
-A final note:
-.I flex
-is slow when matching NUL's, particularly when a token contains
-multiple NUL's.
-It's best to write rules which match
-.I short
-amounts of text if it's anticipated that the text will often include NUL's.
-.SH GENERATING C++ SCANNERS
-.I flex
-provides two different ways to generate scanners for use with C++. The
-first way is to simply compile a scanner generated by
-.I flex
-using a C++ compiler instead of a C compiler. You should not encounter
-any compilations errors (please report any you find to the email address
-given in the Author section below). You can then use C++ code in your
-rule actions instead of C code. Note that the default input source for
-your scanner remains
-.I yyin,
-and default echoing is still done to
-.I yyout.
-Both of these remain
-.I FILE *
-variables and not C++
-.I streams.
-.PP
-You can also use
-.I flex
-to generate a C++ scanner class, using the
-.B \-+
-option, which is automatically specified if the name of the flex
-executable ends in a '+', such as
-.I flex++.
-When using this option, flex defaults to generating the scanner to the file
-.B lex.yy.cc
-instead of
-.B lex.yy.c.
-The generated scanner includes the header file
-.I FlexLexer.h,
-which defines the interface to two C++ classes.
-.PP
-The first class,
-.B FlexLexer,
-provides an abstract base class defining the general scanner class
-interface. It provides the following member functions:
-.TP
-.B const char* YYText()
-returns the text of the most recently matched token, the equivalent of
-.B yytext.
-.TP
-.B int YYLeng()
-returns the length of the most recently matched token, the equivalent of
-.B yyleng.
-.PP
-Also provided are member functions equivalent to
-.B yy_switch_to_buffer(),
-.B yy_create_buffer()
-(though the first argument is an
-.B istream*
-object pointer and not a
-.B FILE*),
-.B yy_delete_buffer(),
-and
-.B yyrestart()
-(again, the first argument is a
-.B istream*
-object pointer).
-.PP
-The second class defined in
-.I FlexLexer.h
-is
-.B yyFlexLexer,
-which is derived from
-.B FlexLexer.
-It defines the following additional member functions:
-.TP
-.B
-yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
-constructs a
-.B yyFlexLexer
-object using the given streams for input and output. If not specified,
-the streams default to
-.B cin
-and
-.B cout,
-respectively.
-.TP
-.B virtual int yylex()
-performs the same role is
-.B yylex()
-does for ordinary flex scanners: it scans the input stream, consuming
-tokens, until a rule's action returns a value.
-.PP
-In addition,
-.B yyFlexLexer
-defines the following protected virtual functions which you can redefine
-in derived classes to tailor the scanner:
-.TP
-.B
-virtual int LexerInput( char* buf, int max_size )
-reads up to
-.B max_size
-characters into
-.B buf
-and returns the number of characters read. To indicate end-of-input,
-return 0 characters. Note that "interactive" scanners (see the
-.B \-B
-and
-.B \-I
-flags) define the macro
-.B YY_INTERACTIVE.
-If you redefine
-.B LexerInput()
-and need to take different actions depending on whether or not
-the scanner might be scanning an interactive input source, you can
-test for the presence of this name via
-.B #ifdef.
-.TP
-.B
-virtual void LexerOutput( const char* buf, int size )
-writes out
-.B size
-characters from the buffer
-.B buf,
-which, while NUL-terminated, may also contain "internal" NUL's if
-the scanner's rules can match text with NUL's in them.
-.TP
-.B
-virtual void LexerError( const char* msg )
-reports a fatal error message. The default version of this function
-writes the message to the stream
-.B cerr
-and exits.
-.PP
-Note that a
-.B yyFlexLexer
-object contains its
-.I entire
-scanning state. Thus you can use such objects to create reentrant
-scanners. You can instantiate multiple instances of the same
-.B yyFlexLexer
-class, and you can also combine multiple C++ scanner classes together
-in the same program using the
-.B \-P
-option discussed above.
-.PP
-Finally, note that the
-.B %array
-feature is not available to C++ scanner classes; you must use
-.B %pointer
-(the default).
-.PP
-Here is an example of a simple C++ scanner:
-.nf
-
- // An example of using the flex C++ scanner class.
-
- %{
- int mylineno = 0;
- %}
-
- string \\"[^\\n"]+\\"
-
- ws [ \\t]+
-
- alpha [A-Za-z]
- dig [0-9]
- name ({alpha}|{dig}|\\$)({alpha}|{dig}|[_.\\-/$])*
- num1 [-+]?{dig}+\\.?([eE][-+]?{dig}+)?
- num2 [-+]?{dig}*\\.{dig}+([eE][-+]?{dig}+)?
- number {num1}|{num2}
-
- %%
-
- {ws} /* skip blanks and tabs */
-
- "/*" {
- int c;
-
- while((c = yyinput()) != 0)
- {
- if(c == '\\n')
- ++mylineno;
-
- else if(c == '*')
- {
- if((c = yyinput()) == '/')
- break;
- else
- unput(c);
- }
- }
- }
-
- {number} cout << "number " << YYText() << '\\n';
-
- \\n mylineno++;
-
- {name} cout << "name " << YYText() << '\\n';
-
- {string} cout << "string " << YYText() << '\\n';
-
- %%
-
- int main( int /* argc */, char** /* argv */ )
- {
- FlexLexer* lexer = new yyFlexLexer;
- while(lexer->yylex() != 0)
- ;
- return 0;
- }
-.fi
-IMPORTANT: the present form of the scanning class is
-.I experimental
-and may change considerably between major releases.
-.SH INCOMPATIBILITIES WITH LEX AND POSIX
-.I flex
-is a rewrite of the AT&T Unix
-.I lex
-tool (the two implementations do not share any code, though),
-with some extensions and incompatibilities, both of which
-are of concern to those who wish to write scanners acceptable
-to either implementation. The POSIX
-.I lex
-specification is closer to
-.I flex's
-behavior than that of the original
-.I lex
-implementation, but there also remain some incompatibilities between
-.I flex
-and POSIX. The intent is that ultimately
-.I flex
-will be fully POSIX-conformant. In this section we discuss all of
-the known areas of incompatibility.
-.PP
-.I flex's
-.B \-l
-option turns on maximum compatibility with the original AT&T
-.I lex
-implementation, at the cost of a major loss in the generated scanner's
-performance. We note below which incompatibilities can be overcome
-using the
-.B \-l
-option.
-.PP
-.I flex
-is fully compatible with
-.I lex
-with the following exceptions:
-.IP -
-The undocumented
-.I lex
-scanner internal variable
-.B yylineno
-is not supported unless
-.B \-l
-is used.
-.IP
-yylineno is not part of the POSIX specification.
-.IP -
-The
-.B input()
-routine is not redefinable, though it may be called to read characters
-following whatever has been matched by a rule. If
-.B input()
-encounters an end-of-file the normal
-.B yywrap()
-processing is done. A ``real'' end-of-file is returned by
-.B input()
-as
-.I EOF.
-.IP
-Input is instead controlled by defining the
-.B YY_INPUT
-macro.
-.IP
-The
-.I flex
-restriction that
-.B input()
-cannot be redefined is in accordance with the POSIX specification,
-which simply does not specify any way of controlling the
-scanner's input other than by making an initial assignment to
-.I yyin.
-.IP -
-.I flex
-scanners are not as reentrant as
-.I lex
-scanners. In particular, if you have an interactive scanner and
-an interrupt handler which long-jumps out of the scanner, and
-the scanner is subsequently called again, you may get the following
-message:
-.nf
-
- fatal flex scanner internal error--end of buffer missed
-
-.fi
-To reenter the scanner, first use
-.nf
-
- yyrestart( yyin );
-
-.fi
-Note that this call will throw away any buffered input; usually this
-isn't a problem with an interactive scanner.
-.IP
-Also note that flex C++ scanner classes
-.I are
-reentrant, so if using C++ is an option for you, you should use
-them instead. See "Generating C++ Scanners" above for details.
-.IP -
-.B output()
-is not supported.
-Output from the
-.B ECHO
-macro is done to the file-pointer
-.I yyout
-(default
-.I stdout).
-.IP
-.B output()
-is not part of the POSIX specification.
-.IP -
-.I lex
-does not support exclusive start conditions (%x), though they
-are in the POSIX specification.
-.IP -
-When definitions are expanded,
-.I flex
-encloses them in parentheses.
-With lex, the following:
-.nf
-
- NAME [A-Z][A-Z0-9]*
- %%
- foo{NAME}? printf( "Found it\\n" );
- %%
-
-.fi
-will not match the string "foo" because when the macro
-is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
-and the precedence is such that the '?' is associated with
-"[A-Z0-9]*". With
-.I flex,
-the rule will be expanded to
-"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
-.IP
-Note that if the definition begins with
-.B ^
-or ends with
-.B $
-then it is
-.I not
-expanded with parentheses, to allow these operators to appear in
-definitions without losing their special meanings. But the
-.B <s>, /,
-and
-.B <<EOF>>
-operators cannot be used in a
-.I flex
-definition.
-.IP
-Using
-.B \-l
-results in the
-.I lex
-behavior of no parentheses around the definition.
-.IP
-The POSIX specification is that the definition be enclosed in parentheses.
-.IP -
-The
-.I lex
-.B %r
-(generate a Ratfor scanner) option is not supported. It is not part
-of the POSIX specification.
-.IP -
-After a call to
-.B unput(),
-.I yytext
-and
-.I yyleng
-are undefined until the next token is matched, unless the scanner
-was built using
-.B %array.
-This is not the case with
-.I lex
-or the POSIX specification. The
-.B \-l
-option does away with this incompatibility.
-.IP -
-The precedence of the
-.B {}
-(numeric range) operator is different.
-.I lex
-interprets "abc{1,3}" as "match one, two, or
-three occurrences of 'abc'", whereas
-.I flex
-interprets it as "match 'ab'
-followed by one, two, or three occurrences of 'c'". The latter is
-in agreement with the POSIX specification.
-.IP -
-The precedence of the
-.B ^
-operator is different.
-.I lex
-interprets "^foo|bar" as "match either 'foo' at the beginning of a line,
-or 'bar' anywhere", whereas
-.I flex
-interprets it as "match either 'foo' or 'bar' if they come at the beginning
-of a line". The latter is in agreement with the POSIX specification.
-.IP -
-.I yyin
-is
-.I initialized
-by
-.I lex
-to be
-.I stdin;
-.I flex,
-on the other hand,
-initializes
-.I yyin
-to NULL
-and then
-.I assigns
-it to
-.I stdin
-the first time the scanner is called, providing
-.I yyin
-has not already been assigned to a non-NULL value. The difference is
-subtle, but the net effect is that with
-.I flex
-scanners,
-.I yyin
-does not have a valid value until the scanner has been called.
-.IP
-The
-.B \-l
-option does away with this incompatibility.
-.IP -
-The special table-size declarations such as
-.B %a
-supported by
-.I lex
-are not required by
-.I flex
-scanners;
-.I flex
-ignores them.
-.IP -
-The name
-.bd
-FLEX_SCANNER
-is #define'd so scanners may be written for use with either
-.I flex
-or
-.I lex.
-.PP
-The following
-.I flex
-features are not included in
-.I lex
-or the POSIX specification:
-.nf
-
- yyterminate()
- <<EOF>>
- <*>
- YY_DECL
- YY_START
- YY_USER_ACTION
- #line directives
- %{}'s around actions
- multiple actions on a line
-
-.fi
-plus almost all of the flex flags.
-The last feature in the list refers to the fact that with
-.I flex
-you can put multiple actions on the same line, separated with
-semi-colons, while with
-.I lex,
-the following
-.nf
-
- foo handle_foo(); ++num_foos_seen;
-
-.fi
-is (rather surprisingly) truncated to
-.nf
-
- foo handle_foo();
-
-.fi
-.I flex
-does not truncate the action. Actions that are not enclosed in
-braces are simply terminated at the end of the line.
-.SH DIAGNOSTICS
-.PP
-.I warning, rule cannot be matched
-indicates that the given rule
-cannot be matched because it follows other rules that will
-always match the same text as it. For
-example, in the following "foo" cannot be matched because it comes after
-an identifier "catch-all" rule:
-.nf
-
- [a-z]+ got_identifier();
- foo got_foo();
-
-.fi
-Using
-.B REJECT
-in a scanner suppresses this warning.
-.PP
-.I warning,
-.B \-s
-.I
-option given but default rule can be matched
-means that it is possible (perhaps only in a particular start condition)
-that the default rule (match any single character) is the only one
-that will match a particular input. Since
-.B \-s
-was given, presumably this is not intended.
-.PP
-.I reject_used_but_not_detected undefined
-or
-.I yymore_used_but_not_detected undefined -
-These errors can occur at compile time. They indicate that the
-scanner uses
-.B REJECT
-or
-.B yymore()
-but that
-.I flex
-failed to notice the fact, meaning that
-.I flex
-scanned the first two sections looking for occurrences of these actions
-and failed to find any, but somehow you snuck some in (via a #include
-file, for example). Make an explicit reference to the action in your
-.I flex
-input file. (Note that previously
-.I flex
-supported a
-.B %used/%unused
-mechanism for dealing with this problem; this feature is still supported
-but now deprecated, and will go away soon unless the author hears from
-people who can argue compellingly that they need it.)
-.PP
-.I flex scanner jammed -
-a scanner compiled with
-.B \-s
-has encountered an input string which wasn't matched by
-any of its rules. This error can also occur due to internal problems.
-.PP
-.I token too large, exceeds YYLMAX -
-your scanner uses
-.B %array
-and one of its rules matched a string longer than the
-.B YYLMAX
-constant (8K bytes by default). You can increase the value by
-#define'ing
-.B YYLMAX
-in the definitions section of your
-.I flex
-input.
-.PP
-.I scanner requires \-8 flag to
-.I use the character 'x' -
-Your scanner specification includes recognizing the 8-bit character
-.I 'x'
-and you did not specify the \-8 flag, and your scanner defaulted to 7-bit
-because you used the
-.B \-Cf
-or
-.B \-CF
-table compression options. See the discussion of the
-.B \-7
-flag for details.
-.PP
-.I flex scanner push-back overflow -
-you used
-.B unput()
-to push back so much text that the scanner's buffer could not hold
-both the pushed-back text and the current token in
-.B yytext.
-Ideally the scanner should dynamically resize the buffer in this case, but at
-present it does not.
-.PP
-.I
-input buffer overflow, can't enlarge buffer because scanner uses REJECT -
-the scanner was working on matching an extremely large token and needed
-to expand the input buffer. This doesn't work with scanners that use
-.B
-REJECT.
-.PP
-.I
-fatal flex scanner internal error--end of buffer missed -
-This can occur in an scanner which is reentered after a long-jump
-has jumped out (or over) the scanner's activation frame. Before
-reentering the scanner, use:
-.nf
-
- yyrestart( yyin );
-
-.fi
-or, as noted above, switch to using the C++ scanner class.
-.PP
-.I too many start conditions in <> construct! -
-you listed more start conditions in a <> construct than exist (so
-you must have listed at least one of them twice).
-.SH FILES
-See flex(1).
-.SH DEFICIENCIES / BUGS
-Again, see flex(1).
-.SH "SEE ALSO"
-.PP
-flex(1), lex(1), yacc(1), sed(1), awk(1).
-.PP
-M. E. Lesk and E. Schmidt,
-.I LEX \- Lexical Analyzer Generator
-.SH AUTHOR
-Vern Paxson, with the help of many ideas and much inspiration from
-Van Jacobson. Original version by Jef Poskanzer. The fast table
-representation is a partial implementation of a design done by Van
-Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
-.PP
-Thanks to the many
-.I flex
-beta-testers, feedbackers, and contributors, especially Francois Pinard,
-Casey Leedom,
-Nelson H.F. Beebe, benson@odi.com, Peter A. Bigot, Keith Bostic, Frederic
-Brehm, Nick Christopher, Jason Coughlin, Bill Cox, Dave Curtis, Scott David
-Daniels, Chris G. Demetriou, Mike Donahue, Chuck Doucette, Tom Epperly, Leo
-Eskin, Chris Faylor, Jon Forrest, Kaveh R. Ghazi,
-Eric Goldman, Ulrich Grepel, Jan Hajic,
-Jarkko Hietaniemi, Eric Hughes, John Interrante,
-Ceriel Jacobs, Jeffrey R. Jones, Henry
-Juengst, Amir Katz, ken@ken.hilco.com, Kevin B. Kenny, Marq Kole, Ronald
-Lamprecht, Greg Lee, Craig Leres, John Levine, Steve Liddle,
-Mohamed el Lozy, Brian Madsen, Chris
-Metcalf, Luke Mewburn, Jim Meyering, G.T. Nicol, Landon Noll, Marc Nozell,
-Richard Ohnemus, Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
-Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Frederic Raimbault,
-Rick Richardson,
-Kevin Rodgers, Jim Roskind,
-Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
-Alex Siegel, Mike Stump, Paul Stuart, Dave Tallman, Chris Thewalt,
-Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
-Yap, Nathan Zelle, David Zuhn, and those whose names have slipped my marginal
-mail-archiving skills but whose contributions are appreciated all the
-same.
-.PP
-Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
-John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.
-Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
-distribution headaches.
-.PP
-Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to
-Benson Margulies and Fred Burke for C++ support; to Kent Williams and Tom
-Epperly for C++ class support; to Ove Ewerlid for support of NUL's; and to
-Eric Hughes for support of multiple buffers.
-.PP
-This work was primarily done when I was with the Real Time Systems Group
-at the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks to all there
-for the support I received.
-.PP
-Send comments to:
-.nf
-
- Vern Paxson
- Systems Engineering
- Bldg. 46A, Room 1123
- Lawrence Berkeley Laboratory
- University of California
- Berkeley, CA 94720
-
- vern@ee.lbl.gov
-
-.fi
OpenPOWER on IntegriCloud