author     jraynard <jraynard@FreeBSD.org>  1997-10-14 18:29:32 +0000
committer  jraynard <jraynard@FreeBSD.org>  1997-10-14 18:29:32 +0000
commit     6db12e8fe98fa53c73801a8d9c6dc559eb2a65fa
tree       cf24aefca65bf86802f8fd598c150cd794a0ef01 /gnu
parent     66a0e8d6b0be9bddd17a134bb709154357040eb4
download   FreeBSD-src-6db12e8fe98fa53c73801a8d9c6dc559eb2a65fa.zip
           FreeBSD-src-6db12e8fe98fa53c73801a8d9c6dc559eb2a65fa.tar.gz
Remove old version of awk.
Diffstat (limited to 'gnu')
-rw-r--r--  gnu/usr.bin/awk/ACKNOWLEDGMENT  |    25
-rw-r--r--  gnu/usr.bin/awk/COPYING         |   340
-rw-r--r--  gnu/usr.bin/awk/FUTURES         |   117
-rw-r--r--  gnu/usr.bin/awk/LIMITATIONS     |    16
-rw-r--r--  gnu/usr.bin/awk/NEWS            |  1480
-rw-r--r--  gnu/usr.bin/awk/PORTS           |    35
-rw-r--r--  gnu/usr.bin/awk/POSIX           |    95
-rw-r--r--  gnu/usr.bin/awk/PROBLEMS        |    10
-rw-r--r--  gnu/usr.bin/awk/README          |   125
-rw-r--r--  gnu/usr.bin/awk/array.c         |   515
-rw-r--r--  gnu/usr.bin/awk/awk.1           |  1969
-rw-r--r--  gnu/usr.bin/awk/awk.h           |   790
-rw-r--r--  gnu/usr.bin/awk/awk.y           |  1868
-rw-r--r--  gnu/usr.bin/awk/builtin.c       |  1239
-rw-r--r--  gnu/usr.bin/awk/config.h        |   306
-rw-r--r--  gnu/usr.bin/awk/dfa.c           |  2613
-rw-r--r--  gnu/usr.bin/awk/dfa.h           |   360
-rw-r--r--  gnu/usr.bin/awk/doc/gawk.texi   | 11270
-rw-r--r--  gnu/usr.bin/awk/eval.c          |  1260
-rw-r--r--  gnu/usr.bin/awk/field.c         |   678
-rw-r--r--  gnu/usr.bin/awk/getopt.c        |   757
-rw-r--r--  gnu/usr.bin/awk/getopt.h        |   129
-rw-r--r--  gnu/usr.bin/awk/getopt1.c       |   187
-rw-r--r--  gnu/usr.bin/awk/io.c            |  1283
-rw-r--r--  gnu/usr.bin/awk/iop.c           |   321
-rw-r--r--  gnu/usr.bin/awk/main.c          |   823
-rw-r--r--  gnu/usr.bin/awk/msg.c           |   107
-rw-r--r--  gnu/usr.bin/awk/node.c          |   463
-rw-r--r--  gnu/usr.bin/awk/patchlevel.h    |     1
-rw-r--r--  gnu/usr.bin/awk/protos.h        |   122
-rw-r--r--  gnu/usr.bin/awk/re.c            |   212
-rw-r--r--  gnu/usr.bin/awk/version.c       |    47
32 files changed, 0 insertions, 29563 deletions
diff --git a/gnu/usr.bin/awk/ACKNOWLEDGMENT b/gnu/usr.bin/awk/ACKNOWLEDGMENT
deleted file mode 100644
index cb4021f..0000000
--- a/gnu/usr.bin/awk/ACKNOWLEDGMENT
+++ /dev/null
@@ -1,25 +0,0 @@
-The current developers of Gawk would like to thank and acknowledge the
-many people who have contributed to the development through bug reports
-and fixes and suggestions. Unfortunately, we have not been organized
-enough to keep track of all the names -- for that we apologize.
-
-Another group of people have assisted even more by porting Gawk to new
-platforms and providing a great deal of feedback. They are:
-
- Hal Peterson <hrp@pecan.cray.com> (Cray)
- Pat Rankin <gawk.rankin@EQL.Caltech.Edu> (VMS)
- Michal Jaegermann <michal@gortel.phys.UAlberta.CA> (Atari, NeXT, DEC 3100)
- Mike Lijewski <mjlx@eagle.cnsf.cornell.edu> (IBM RS6000)
- Scott Deifik <scottd@amgen.com> (MSDOS 2.14 and 2.15)
- Kent Williams (MSDOS 2.11)
- Conrad Kwok (MSDOS earlier versions)
- Scott Garfinkle (MSDOS earlier versions)
- Kai Uwe Rommel <rommel@ars.muc.de> (OS/2)
- Darrel Hankerson <hankedr@mail.auburn.edu> (OS/2)
- Mark Moraes <Mark-Moraes@deshaw.com> (Code Center, Purify)
- Kaveh Ghazi <ghazi@noc.rutgers.edu> (Lots of Unix variants)
-
-Last, but far from least, we would like to thank Brian Kernighan who
-has helped to clear up many dark corners of the language and provided a
-restraining touch when we have been overly tempted by "feeping
-creaturism".
diff --git a/gnu/usr.bin/awk/COPYING b/gnu/usr.bin/awk/COPYING
deleted file mode 100644
index 3358a7b..0000000
--- a/gnu/usr.bin/awk/COPYING
+++ /dev/null
@@ -1,340 +0,0 @@
- GNU GENERAL PUBLIC LICENSE
- Version 2, June 1991
-
- Copyright (C) 1989, 1991 Free Software Foundation, Inc.
- 675 Mass Ave, Cambridge, MA 02139, USA
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
-
- Preamble
-
- The licenses for most software are designed to take away your
-freedom to share and change it. By contrast, the GNU General Public
-License is intended to guarantee your freedom to share and change free
-software--to make sure the software is free for all its users. This
-General Public License applies to most of the Free Software
-Foundation's software and to any other program whose authors commit to
-using it. (Some other Free Software Foundation software is covered by
-the GNU Library General Public License instead.) You can apply it to
-your programs, too.
-
- When we speak of free software, we are referring to freedom, not
-price. Our General Public Licenses are designed to make sure that you
-have the freedom to distribute copies of free software (and charge for
-this service if you wish), that you receive source code or can get it
-if you want it, that you can change the software or use pieces of it
-in new free programs; and that you know you can do these things.
-
- To protect your rights, we need to make restrictions that forbid
-anyone to deny you these rights or to ask you to surrender the rights.
-These restrictions translate to certain responsibilities for you if you
-distribute copies of the software, or if you modify it.
-
- For example, if you distribute copies of such a program, whether
-gratis or for a fee, you must give the recipients all the rights that
-you have. You must make sure that they, too, receive or can get the
-source code. And you must show them these terms so they know their
-rights.
-
- We protect your rights with two steps: (1) copyright the software, and
-(2) offer you this license which gives you legal permission to copy,
-distribute and/or modify the software.
-
- Also, for each author's protection and ours, we want to make certain
-that everyone understands that there is no warranty for this free
-software. If the software is modified by someone else and passed on, we
-want its recipients to know that what they have is not the original, so
-that any problems introduced by others will not reflect on the original
-authors' reputations.
-
- Finally, any free program is threatened constantly by software
-patents. We wish to avoid the danger that redistributors of a free
-program will individually obtain patent licenses, in effect making the
-program proprietary. To prevent this, we have made it clear that any
-patent must be licensed for everyone's free use or not licensed at all.
-
- The precise terms and conditions for copying, distribution and
-modification follow.
-
- GNU GENERAL PUBLIC LICENSE
- TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-
- 0. This License applies to any program or other work which contains
-a notice placed by the copyright holder saying it may be distributed
-under the terms of this General Public License. The "Program", below,
-refers to any such program or work, and a "work based on the Program"
-means either the Program or any derivative work under copyright law:
-that is to say, a work containing the Program or a portion of it,
-either verbatim or with modifications and/or translated into another
-language. (Hereinafter, translation is included without limitation in
-the term "modification".) Each licensee is addressed as "you".
-
-Activities other than copying, distribution and modification are not
-covered by this License; they are outside its scope. The act of
-running the Program is not restricted, and the output from the Program
-is covered only if its contents constitute a work based on the
-Program (independent of having been made by running the Program).
-Whether that is true depends on what the Program does.
-
- 1. You may copy and distribute verbatim copies of the Program's
-source code as you receive it, in any medium, provided that you
-conspicuously and appropriately publish on each copy an appropriate
-copyright notice and disclaimer of warranty; keep intact all the
-notices that refer to this License and to the absence of any warranty;
-and give any other recipients of the Program a copy of this License
-along with the Program.
-
-You may charge a fee for the physical act of transferring a copy, and
-you may at your option offer warranty protection in exchange for a fee.
-
- 2. You may modify your copy or copies of the Program or any portion
-of it, thus forming a work based on the Program, and copy and
-distribute such modifications or work under the terms of Section 1
-above, provided that you also meet all of these conditions:
-
- a) You must cause the modified files to carry prominent notices
- stating that you changed the files and the date of any change.
-
- b) You must cause any work that you distribute or publish, that in
- whole or in part contains or is derived from the Program or any
- part thereof, to be licensed as a whole at no charge to all third
- parties under the terms of this License.
-
- c) If the modified program normally reads commands interactively
- when run, you must cause it, when started running for such
- interactive use in the most ordinary way, to print or display an
- announcement including an appropriate copyright notice and a
- notice that there is no warranty (or else, saying that you provide
- a warranty) and that users may redistribute the program under
- these conditions, and telling the user how to view a copy of this
- License. (Exception: if the Program itself is interactive but
- does not normally print such an announcement, your work based on
- the Program is not required to print an announcement.)
-
-These requirements apply to the modified work as a whole. If
-identifiable sections of that work are not derived from the Program,
-and can be reasonably considered independent and separate works in
-themselves, then this License, and its terms, do not apply to those
-sections when you distribute them as separate works. But when you
-distribute the same sections as part of a whole which is a work based
-on the Program, the distribution of the whole must be on the terms of
-this License, whose permissions for other licensees extend to the
-entire whole, and thus to each and every part regardless of who wrote it.
-
-Thus, it is not the intent of this section to claim rights or contest
-your rights to work written entirely by you; rather, the intent is to
-exercise the right to control the distribution of derivative or
-collective works based on the Program.
-
-In addition, mere aggregation of another work not based on the Program
-with the Program (or with a work based on the Program) on a volume of
-a storage or distribution medium does not bring the other work under
-the scope of this License.
-
- 3. You may copy and distribute the Program (or a work based on it,
-under Section 2) in object code or executable form under the terms of
-Sections 1 and 2 above provided that you also do one of the following:
-
- a) Accompany it with the complete corresponding machine-readable
- source code, which must be distributed under the terms of Sections
- 1 and 2 above on a medium customarily used for software interchange; or,
-
- b) Accompany it with a written offer, valid for at least three
- years, to give any third party, for a charge no more than your
- cost of physically performing source distribution, a complete
- machine-readable copy of the corresponding source code, to be
- distributed under the terms of Sections 1 and 2 above on a medium
- customarily used for software interchange; or,
-
- c) Accompany it with the information you received as to the offer
- to distribute corresponding source code. (This alternative is
- allowed only for noncommercial distribution and only if you
- received the program in object code or executable form with such
- an offer, in accord with Subsection b above.)
-
-The source code for a work means the preferred form of the work for
-making modifications to it. For an executable work, complete source
-code means all the source code for all modules it contains, plus any
-associated interface definition files, plus the scripts used to
-control compilation and installation of the executable. However, as a
-special exception, the source code distributed need not include
-anything that is normally distributed (in either source or binary
-form) with the major components (compiler, kernel, and so on) of the
-operating system on which the executable runs, unless that component
-itself accompanies the executable.
-
-If distribution of executable or object code is made by offering
-access to copy from a designated place, then offering equivalent
-access to copy the source code from the same place counts as
-distribution of the source code, even though third parties are not
-compelled to copy the source along with the object code.
-
- 4. You may not copy, modify, sublicense, or distribute the Program
-except as expressly provided under this License. Any attempt
-otherwise to copy, modify, sublicense or distribute the Program is
-void, and will automatically terminate your rights under this License.
-However, parties who have received copies, or rights, from you under
-this License will not have their licenses terminated so long as such
-parties remain in full compliance.
-
- 5. You are not required to accept this License, since you have not
-signed it. However, nothing else grants you permission to modify or
-distribute the Program or its derivative works. These actions are
-prohibited by law if you do not accept this License. Therefore, by
-modifying or distributing the Program (or any work based on the
-Program), you indicate your acceptance of this License to do so, and
-all its terms and conditions for copying, distributing or modifying
-the Program or works based on it.
-
- 6. Each time you redistribute the Program (or any work based on the
-Program), the recipient automatically receives a license from the
-original licensor to copy, distribute or modify the Program subject to
-these terms and conditions. You may not impose any further
-restrictions on the recipients' exercise of the rights granted herein.
-You are not responsible for enforcing compliance by third parties to
-this License.
-
- 7. If, as a consequence of a court judgment or allegation of patent
-infringement or for any other reason (not limited to patent issues),
-conditions are imposed on you (whether by court order, agreement or
-otherwise) that contradict the conditions of this License, they do not
-excuse you from the conditions of this License. If you cannot
-distribute so as to satisfy simultaneously your obligations under this
-License and any other pertinent obligations, then as a consequence you
-may not distribute the Program at all. For example, if a patent
-license would not permit royalty-free redistribution of the Program by
-all those who receive copies directly or indirectly through you, then
-the only way you could satisfy both it and this License would be to
-refrain entirely from distribution of the Program.
-
-If any portion of this section is held invalid or unenforceable under
-any particular circumstance, the balance of the section is intended to
-apply and the section as a whole is intended to apply in other
-circumstances.
-
-It is not the purpose of this section to induce you to infringe any
-patents or other property right claims or to contest validity of any
-such claims; this section has the sole purpose of protecting the
-integrity of the free software distribution system, which is
-implemented by public license practices. Many people have made
-generous contributions to the wide range of software distributed
-through that system in reliance on consistent application of that
-system; it is up to the author/donor to decide if he or she is willing
-to distribute software through any other system and a licensee cannot
-impose that choice.
-
-This section is intended to make thoroughly clear what is believed to
-be a consequence of the rest of this License.
-
- 8. If the distribution and/or use of the Program is restricted in
-certain countries either by patents or by copyrighted interfaces, the
-original copyright holder who places the Program under this License
-may add an explicit geographical distribution limitation excluding
-those countries, so that distribution is permitted only in or among
-countries not thus excluded. In such case, this License incorporates
-the limitation as if written in the body of this License.
-
- 9. The Free Software Foundation may publish revised and/or new versions
-of the General Public License from time to time. Such new versions will
-be similar in spirit to the present version, but may differ in detail to
-address new problems or concerns.
-
-Each version is given a distinguishing version number. If the Program
-specifies a version number of this License which applies to it and "any
-later version", you have the option of following the terms and conditions
-either of that version or of any later version published by the Free
-Software Foundation. If the Program does not specify a version number of
-this License, you may choose any version ever published by the Free Software
-Foundation.
-
- 10. If you wish to incorporate parts of the Program into other free
-programs whose distribution conditions are different, write to the author
-to ask for permission. For software which is copyrighted by the Free
-Software Foundation, write to the Free Software Foundation; we sometimes
-make exceptions for this. Our decision will be guided by the two goals
-of preserving the free status of all derivatives of our free software and
-of promoting the sharing and reuse of software generally.
-
- NO WARRANTY
-
- 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
-FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
-OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
-PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
-OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
-MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
-TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
-PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
-REPAIR OR CORRECTION.
-
- 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
-WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
-REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
-INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
-OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
-TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
-YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
-PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGES.
-
- END OF TERMS AND CONDITIONS
-
- Appendix: How to Apply These Terms to Your New Programs
-
- If you develop a new program, and you want it to be of the greatest
-possible use to the public, the best way to achieve this is to make it
-free software which everyone can redistribute and change under these terms.
-
- To do so, attach the following notices to the program. It is safest
-to attach them to the start of each source file to most effectively
-convey the exclusion of warranty; and each file should have at least
-the "copyright" line and a pointer to where the full notice is found.
-
- <one line to give the program's name and a brief idea of what it does.>
- Copyright (C) 19yy <name of author>
-
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-
-Also add information on how to contact you by electronic and paper mail.
-
-If the program is interactive, make it output a short notice like this
-when it starts in an interactive mode:
-
- Gnomovision version 69, Copyright (C) 19yy name of author
- Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
- This is free software, and you are welcome to redistribute it
- under certain conditions; type `show c' for details.
-
-The hypothetical commands `show w' and `show c' should show the appropriate
-parts of the General Public License. Of course, the commands you use may
-be called something other than `show w' and `show c'; they could even be
-mouse-clicks or menu items--whatever suits your program.
-
-You should also get your employer (if you work as a programmer) or your
-school, if any, to sign a "copyright disclaimer" for the program, if
-necessary. Here is a sample; alter the names:
-
- Yoyodyne, Inc., hereby disclaims all copyright interest in the program
- `Gnomovision' (which makes passes at compilers) written by James Hacker.
-
- <signature of Ty Coon>, 1 April 1989
- Ty Coon, President of Vice
-
-This General Public License does not permit incorporating your program into
-proprietary programs. If your program is a subroutine library, you may
-consider it more useful to permit linking proprietary applications with the
-library. If this is what you want to do, use the GNU Library General
-Public License instead of this License.
-
diff --git a/gnu/usr.bin/awk/FUTURES b/gnu/usr.bin/awk/FUTURES
deleted file mode 100644
index 6250584..0000000
--- a/gnu/usr.bin/awk/FUTURES
+++ /dev/null
@@ -1,117 +0,0 @@
-This file lists future projects and enhancements for gawk. Items are listed
-in roughly the order they will be done for a given release. This file is
-mainly for use by the developer(s) to help keep themselves on track, please
-don't bug us too much about schedules or what all this really means.
-
-(An `x' indicates that some progress has been made, but that the feature is
-not complete yet.)
-
-For 2.16
-========
-x Move to autoconf-based configure system.
-
-x Research awk `fflush' function.
-
-x Generalize IGNORECASE
- any value makes it work, not just numeric non-zero
- make it apply to *all* string comparisons
-
-x Fix FILENAME to have an initial value of "", not "-"
-
-In 2.17
-=======
-x Allow RS to be a regexp.
-
- RT variable to hold text of record terminator
-
- RECLEN variable for fixed length records
-
- Feedback alloca.s changes to FSF
-
-x Split() with null string as third arg to split up strings
-
-x Analogously, setting FS="" would split the input record into individual
- characters.
-
-x Clean up code by isolating system-specific functions in separate files.
-
- Undertake significant directory reorganization.
-
-x Extensive manual cleanup:
- Use of texinfo 2.0 features
- Lots more examples
- Document all of the above.
-
-x Go to POSIX regexps
-
-	Make regex + dfa less dependent on gawk header file includes
-
- Additional manual features:
- Document posix regexps
- Document use of dbm arrays
- ? Add an error messages section to the manual
- ? A section on where gawk is bounded
- regex
- i/o
- sun fp conversions
-
-For 2.18
-========
- DBM storage of awk arrays. Try to allow multiple dbm packages
-
- General sub functions:
- edit(line, pat, sub) and gedit(line, pat, sub)
- that return the substituted strings and allow \1 etc. in the sub
- string.
-
- ? Have strftime() pay attention to the value of ENVIRON["TZ"]
-
-For 2.19
-========
- Add chdir and stat built-in functions.
-
- Add function pointers as valid variable types.
-
- Add an `ftw' built-in function that takes a function pointer.
-
- Do an optimization pass over parse tree?
-
-For 2.20 or later:
-==================
-Add variables similar to C's __FILE__ and __LINE__ for better diagnostics
-from within awk programs.
-
-Add an explicit concatenation operator and assignment version.
-
-? Add a switch statement
-
-Add the ability to seek on an open file and retrieve the current file position.
-
-Add lint checking everywhere, including check for use of builtin vars.
-only in new awk.
-
-"restart" keyword
-
-Add |&
-
-Make awk '/foo/' files... run at egrep speeds
-
-Do a reference card
-
-Allow OFMT and CONVFMT to be other than a floating point format.
-
-Allow redefining of builtin functions?
-
-Make it faster and smaller.
-
-For 3.x:
-========
-
-Create a gawk compiler?
-
-Create a gawk-to-C translator? (or C++??)
-
-Provide awk profiling and debugging.
-
-
-
diff --git a/gnu/usr.bin/awk/LIMITATIONS b/gnu/usr.bin/awk/LIMITATIONS
deleted file mode 100644
index 64eab85..0000000
--- a/gnu/usr.bin/awk/LIMITATIONS
+++ /dev/null
@@ -1,16 +0,0 @@
-This file describes limits of gawk on a Unix system (although it
-is variable even then). Non-Unix systems may have other limits.
-
-# of fields in a record: MAX_INT
-Length of input record: MAX_INT
-Length of output record: unlimited
-Size of a field: MAX_INT
-Size of a printf string: MAX_INT
-Size of a literal string: MAX_INT
-Characters in a character class: 2^(# of bits per byte)
-# of file redirections: unlimited
-# of pipe redirections: min(# of processes per user, # of open files)
-double-precision floating point
-Length of source line: unlimited
-Number of input records in one file: MAX_LONG
-Number of input records total: MAX_LONG
diff --git a/gnu/usr.bin/awk/NEWS b/gnu/usr.bin/awk/NEWS
deleted file mode 100644
index 4df69e7..0000000
--- a/gnu/usr.bin/awk/NEWS
+++ /dev/null
@@ -1,1480 +0,0 @@
-Changes from 2.15.4 to 2.15.5
------------------------------
-
-FUTURES file updated and re-arranged some with more rational schedule.
-
-Many prototypes handled better for ANSI C in protos.h.
-
-getopt.c updated somewhat.
-
-test/Makefile now removes junk directory, `bardargtest' renamed `badargs.'
-
-Bug fix in iop.c for RS = "". Eat trailing newlines off of record separator.
-
-Bug fix in Makefile.bsd44, use leading tab in actions.
-
-Fix in field.c:set_FS for FS == "\\" and IGNORECASE != 0.
-
-Config files updated or added:
- cray60, DEC OSF/1 2.0, Utek, sgi405, next21, next30, atari/config.h,
- sco.
-
-Fix in io.c for ENFILE as well as EMFILE, update decl of groupset to
-include OSF/1.
-
-Rationalized printing as integers if numbers are outside the range of a long.
-Changes to node.c:force_string and builtin.c.
-
-Made internal NF, NR, and FNR variables longs instead of ints.
-
-Add LIMITS_H_MISSING stuff to config.in and awk.h, and default defs for
-INT_MAX and LONG_MAX, if no limits.h file. Add a standard decl of
-the time() function for __STDC__. From ghazi@noc.rutgers.edu.
-
-Fix tree_eval in awk.h and r_tree_eval in eval.c to deal better with
-function parameters, particularly ones that are arrays.
-
-Fix eval.c to print out array names of arrays used in scalar contexts.
-
-Fix eval.c in interpret to zero out source and sourceline initially. This
-does a better job of providing source file and line number information.
-
-Fix to re_parse_field in field.c to not use isspace when RS = "", but rather
-to explicitly look for blank and tab.
-
-Fix to sc_parse_field in field.c to catch the case of the FS character at the
-end of a record.
-
-Lots of miscellaneous bug fixes for memory leaks, courtesy Mark Moraes,
-also fixes for arrays.
-
-io.c fixed to warn about lack of explicit closes if --lint.
-
-Updated missing/strftime.c to match posted strftime 6.2.
-
-Bug fix in builtin.c, in case of non-match in sub_common.
-
-Updated constant used for division in builtin.c:do_rand for DEC Alpha
-and CRAY Y-MP.
-
-POSIXLY_CORRECT in the environment turns on --posix (fixed in main.c).
-
-Updated srandom prototype and calls in builtin.c.
-
-Fix awk.y to enforce posix semantics of unary +: result is numeric.
-
-Fix array.c to not rearrange the hash chain upon finding an index in
-the array. This messed things up in cases like:
- for (index1 in array) {
- blah
- if (index2 in array) # blew away the for
- stuff
- }
-
-Fixed spelling errors in the man page.
-
-Fixes in awk.y so that
- gawk '' /path/to/file
-will work without core dumping or finding parse errors.
-
-Fix main.c so that --lint will fuss about an empty program.
-Yet another fix for argument parsing in the case of unrecognized options.
-
-Bug fix in dfa.c to not attempt to free null pointers.
-
-Bug fix in builtin.c to only use DEFAULT_G_PRECISION for %g or %G.
-
-Bug fix in field.c to achieve call by value semantics for split.
-
-Changes from 2.15.3 to 2.15.4
------------------------------
-
-Lots of lint fixes, and do_sprintf made mostly ANSI C compatible.
-
-Man page updated and edited.
-
-Copyrights updated.
-
-Arrays now grow dynamically, initially scaling up by an order of magnitude
- and then doubling, up to ~ 64K. This should keep gawk's performance
- graceful under heavy load.
-
-New `delete array' feature added. Only documented in the man page.
-
-Switched to dfa and regex suites from grep-2.0. These offer the ability to
- move to POSIX regexps in the next release.
-
-Disabled GNU regex ops.
-
-Research awk -m option now recognized. It does nothing in gawk, since gawk
- has no static limits. Only documented in the man page.
-
-New bionic (faster, better, stronger than before) hashing function.
-
-Bug fix in argument handling. `gawk -X' now notices there was no program.
- Additional bug fixes to make --compat and --lint work again.
-
-Many changes for systems where sizeof(int) != sizeof(void *).
-
-Add explicit alloca(0) in io.c to recover space from C alloca.
-
-Fixed file descriptor leak in io.c.
-
-The --version option now follows the GNU coding standards and exits.
-
-Fixed several prototypes in protos.h.
-
-Several tests updated. On Solaris, warn that the out? tests will fail.
-
-Configuration files for SunOS with cc and Solaris 2.x added.
-
-Improved error messages in awk.y on gawk extensions if do_unix or do_compat.
-
-INSTALL file added.
-
-Fixed Atari Makefile and several VMS specific changes.
-
-Better conversion of numbers to strings on systems with broken sprintfs.
-
-Changes from 2.15.2 to 2.15.3
------------------------------
-
-Increased HASHSIZE to a decent number, 127 was way too small.
-
-FILENAME is now the null string in a BEGIN rule.
-
-Argument processing fixed for invalid options and missing arguments.
-
-This version will build on VMS. This included a fix to close all files
- and pipes opened with redirections before closing stdout and stderr.
-
-More getpgrp() defines.
-
-Changes for BSD44: <sys/param.h> in io.c and Makefile.bsd44.
-
-All directories in the distribution are now writable.
-
-Separated LDFLAGS and CFLAGS in Makefile. CFLAGS can now be overridden by
- user.
-
-Make dist now builds compressed archives ending in .gz and runs doschk.
-
-Amiga port.
-
-New getopt.c fixes Alpha OSF/1 problem.
-
-Make clean now removes possible test output.
-
-Improved algorithm for multiple adjacent string concatenations leads to
- performance improvements.
-
-Fix nasty bug whereby command-line assignments, both with -v and at run time,
- could create variables with syntactically illegal names.
-
-Fix obscure bug in printf with %0 flag and filling.
-
-Add a lint check for substr if provided length exceeds remaining characters
- in string.
-
-Update atari support.
-
-PC support enhanced to include support for both DOS and OS/2. (Lots more
- #ifdefs. Sigh.)
-
-Config files for Hitachi Unix and OSF/1, courtesy of Yoko Morishita
- (morisita@sra.co.jp)
-
-Changes from 2.15.1 to 2.15.2
------------------------------
-
-Additions to the FUTURES file.
-
-Document undefined order of output when using both standard output
- and /dev/stdout or any of the /dev output files that gawk emulates in
- the absence of OS support.
-
-Clean up the distribution generation in Makefile.in: the info files are
- now included, the distributed files are marked read-only and patched
- distributions are now unpacked in a directory named with the patch level.
-
-Changes from 2.15 to 2.15.1
----------------------------
-
-Close stdout and stderr before all redirections on program exit. This allows
- detection of write errors and also fixes the messages test on Solaris 2.x.
-
-Removed YYMAXDEPTH define in awk.y which was limiting the parser stack depth.
-
-Changes to config/bsd44, Makefile.bsd44 and configure to bring it into line
- with the BSD4.4 release.
-
-Changed Makefile to use prefix, exec_prefix, bindir etc.
-
-make install now installs info files.
-
-make install now sets permissions on installed files.
-
-Make targets added: uninstall, distclean, mostlyclean and realclean.
-
-Added config.h to cleaner and clobber make targets.
-
-Changes to config/{hpux8x,sysv3,sysv4,ultrix41} to deal with alloca().
-
-Change to getopt.h for portability.
-
-Added more special cases to the getpgrp() call.
-
-Added README.ibmrt-aos and config/ibmrt-aos.
-
-Changes from 2.14 to 2.15
----------------------------
-
-Command-line source can now be mixed with library functions.
-
-ARGIND variable tracks index in ARGV of FILENAME.
-
-GNU style long options in addition to short options.
-
-Plan 9 style special files interpreted by gawk:
- /dev/pid
- /dev/ppid
- /dev/pgrpid
- /dev/user
- $1 = getuid
- $2 = geteuid
- $3 = getgid
- $4 = getegid
- $5 ... $NF = getgroups if supported
-
-ERRNO variable contains error string if getline or close fails.
-
-Very old options -a and -e have gone away.
-
-Inftest has been removed from the default target in test/Makefile -- the
- results were too machine specific and resulted in too many false alarms.
-
-A README.amiga has been added.
-
-The "too many arguments supplied for format string" warning message is only
- in effect under the lint option.
-
-Code improvements in dfa.c.
-
-Fixed all reported bugs:
-
- Writes are checked for failure (such as full filesystem).
-
- Stopped (at least some) runaway error messages.
-
- gsub(/^/, "x") does the right thing for $0 of 0, 1, or more length.
-
- close() on a command being piped to a getline now works properly.
-
- The input record will no longer be freed upon an explicit close()
- of the input file.
-
- A NUL character in FS now works.
-
- In a substitute, \\& now means a literal backslash followed by what
- was matched.
-
- Integer overflow of substring length in substr() is caught.
-
- An input record without a newline termination is handled properly.
-
- In io.c, check is against only EMFILE so that system file table
- is not filled.
-
- Renamed all files with names longer than 14 characters.
-
- Escaped characters in regular expressions were being lost when
- IGNORECASE was used.
-
- Long source lines were not being handled properly.
-
- Sourcefiles that ended in a tab but no newline were bombing.
-
- Patterns that could match zero characters in split() were not working
- properly.
-
- The parsedebug option was not working.
-
- The grammar was being a bit too lenient, allowing some very dubious
- programs to pass.
-
- Compilation with DEBUG defined now works.
-
- A variable read in with getline was not being treated as a potential
- number.
-
- Array subscripts were not always of string type.
-
-
-Changes from 2.13.2 to 2.14
----------------------------
-
-Updated manual!
-
-Added "next file" to skip efficiently to the next input file.
-
-Fixed potential of overflowing buffer in do_sprintf().
-
-Plugged small memory leak in sub_common().
-
-EOF on a redirect is now "sticky" -- it can only be cleared by close()ing
- the pipe or file.
-
-Now works if used via a #! /bin/gawk line at the top of an executable file
- when that line ends with whitespace.
-
-Added some checks to the grammar to catch redefinition of builtin functions.
- This could eventually be the basis for an extension to allow redefining
- functions, but in the mean time it's a good error catching facility.
-
-Negative integer exponents now work.
-
-Modified do_system() to make sure it had a non-null string to be passed
- to system(3). Thus, system("") will flush any pending output but not go
- through the overhead of forking an un-needed shell.
-
-A fix to floating point comparisons so that NaNs compare right on IEEE systems.
-
-Added code to make sure we're not opening directories for reading and such.
-
-Added code to do better diagnoses of weird or null file names.
-
-Allow continue outside of a loop, unless in strict posix mode. Lint option
- will issue warning.
-
-New missing/strftime.c. There has been one change that affects gawk. Posix
- now defines a %V conversion so the vms conversion has been changed to %v.
- If this version is used with gawk -Wlint and they use %V in a call to
- strftime, they'll get a warning.
-
-Error messages now conform to GNU standard (I hope).
-
-Changed comparisons to conform to the description found in the file POSIX.
- This is inconsistent with the current POSIX draft, but that is broken.
- Hopefully the final POSIX standard will conform to this version.
- (Alas, this will have to wait for 1003.2b, which will be a revision to
- the 1003.2 standard. That standard has been frozen with the broken
- comparison rules.)
-
-The length of a string was a short and now is a size_t.
-
-Updated VMS help.
-
-Added quite a few new tests to the test suite and deleted many due to lack of
- written releases. Test output is only removed if it is identical to the
- "good" output.
-
-Fixed a couple of bugs for reference to $0 when $0 is "" -- particularly in
- a BEGIN block.
-
-Fixed premature freeing in construct "$0 = $0".
-
-Removed the call to wait_any() in gawk_popen(), since on at least some systems,
- if gawk's input was from a pipe, the predecessor process in the pipe was a
- child of gawk and this caused a deadlock.
-
-Regexp can (once again) match a newline, if given explicitly.
-
-nextopen() makes sure file name is null terminated.
-
-Fixed VMS pipe simulation. Improved VMS I/O performance.
-
-Catch . used in variable names.
-
-Fixed bug in getline without redirect from a file -- it was quitting after the
- first EOF, rather than trying the next file.
-
-Fixed bug in treatment of backslash at the end of a string -- it was bombing
- rather than doing something sensible. It is not clear what this should mean,
- but for now I issue a warning and take it as a literal backslash.
-
-Moved setting of regexp syntax to before the option parsing in main(), to
- handle things like -v FS='[.,;]'
-
-Fixed bug when NF is set by user -- fields_arr must be expanded if necessary
- and "new" fields must be initialized.
-
-Fixed several bugs in [g]sub() for no match found or the match is 0-length.
-
-Fixed bug where in gsub() a pattern anchored at the beginning would still
- substitute throughout the string.
-
-make test does not assume the . is in PATH.
-
-Fixed bug when a field beyond the end of the record was requested after
- $0 was altered (directly or indirectly).
-
-Fixed bug for assignment to field beyond end of record -- the assigned value
- was not found on subsequent reference to that field.
-
-Fixed bug for FS a regexp and it matches at the end of a record.
-
-Fixed memory leak for an array local to a function.
-
-Fixed hanging of pipe redirection to getline
-
-Fixed coredump on access to $0 inside BEGIN block.
-
-Fixed treatment of RS = "". It now parses the fields correctly and strips
- leading whitespace from a record if FS is a space.
-
-Fixed faking of /dev/stdin.
-
-Fixed problem with x += x
-
-Use of scalar as array and vice versa is now detected.
-
-IGNORECASE now obeyed for FS (even if FS is a single alphabetic character).
-
-Switch to GPL version 2.
-
-Renamed awk.tab.c to awktab.c for MSDOS and VMS tar programs.
-
-Renamed this file (CHANGES) to NEWS.
-
-Use fmod() instead of modf() and provide FMOD_MISSING #define to undo
- this change.
-
-Correct the volatile declarations in eval.c.
-
-Avoid errant closing of the file descriptors for stdin, stdout and stderr.
-
-Be more flexible about where semi-colons can occur in programs.
-
-Check for write errors on all output, not just on close().
-
-Eliminate the need for missing/{strtol.c,vprintf.c}.
-
-Use GNU getopt and eliminate missing/getopt.c.
-
-More "lint" checking.
-
-
-Changes from 2.13.1 to 2.13.2
------------------------------
-
-Toward conformity with GNU standards, configure is a link to mkconf, the latter
- to disappear in the next major release.
-
-Update to config/bsd43.
-
-Added config/apollo, config/msc60, config/cray2-50, config/interactive2.2
-
-sgi33.cc added for compilation using cc rather than gcc.
-
-Ultrix41 now propagates to config.h properly -- as part of a general
- mechanism in configure for kludges -- #define anything from a config file
- just gets tacked onto the end of config.h -- to be used sparingly.
-
-Got rid of an unnecessary and troublesome declaration of vprintf().
-
-Small improvement in locality of error messages.
-
-Try to diagnose use of array as scalar and vice versa -- to be improved in
- the future.
-
-Fix for last bug fix for Cray division code--sigh.
-
-More changes to test suite to explicitly use sh. Also get rid of
- a few generated files.
-
-Fixed off-by-one bug in string concatenation code.
-
-Fix for use of array that is passed in from a previous function parameter.
- Addition to test suite for above.
-
-A number of changes associated with changing NF and access to fields
- beyond the end of the current record.
-
-Change to missing/memcmp.c to avoid seg. fault on zero length input.
-
-Updates to test suite (including some inadvertently left out of the last patch)
- to invoke sh explicitly (rather than rely on #!/bin/sh) and remove some
- junk files. test/chem/good updated to correspond to bug fixes.
-
-Changes from 2.13.0 to 2.13.1
------------------------------
-
-More configs and PORTS.
-
-Fixed bug wherein a simple division produced an erroneous FPE, caused by
- the Cray division workaround -- that code is now #ifdef'd only for
- Cray *and* fixed.
-
-Fixed bug in modulus implementation -- it was very close to the above
- code, so I noticed it.
-
-Fixed portability problem with limits.h in missing.c
-
-Fixed portability problem with tzname and daylight -- define TZNAME_MISSING
- if strftime() is missing and tzname is also.
-
-Better support for Latin-1 character set.
-
-Fixed portability problem in test Makefile.
-
-Updated PROBLEMS file.
-
-=============================== gawk-2.13 released =========================
-Changes from 2.12.42 to 2.12.43
--------------------------------
-
-Typo in awk.y
-
-Fixed up strftime.3 and added doc. for %V.
-
-Changes from 2.12.41 to 2.12.42
--------------------------------
-
-Fixed bug in devopen() -- if you had write permission in /dev,
- it would just create /dev/stdout etc.!!
-
-Final (?) VMS update.
-
-Make NeXT use GFMT_WORKAROUND
-
-Fixed bug in sub_common() for substitute on zero-length match. Improved the
- code a bit while I was at it.
-
-Fixed grammar so that $i++ parses as ($i)++
-
-Put support/* back in the distribution (didn't I already do this?!)
-
-Changes from 2.12.40 to 2.12.41
--------------------------------
-
-VMS workaround for broken %g format.
-
-Changes from 2.12.39 to 2.12.40
--------------------------------
-
-Minor man page update.
-
-Fixed latent bug in redirect().
-
-Changes from 2.12.38 to 2.12.39
--------------------------------
-
-Updates to test suite -- remove dependence on changing gawk.1 man page.
-
-Changes from 2.12.37 to 2.12.38
--------------------------------
-
-Fixed bug in use of *= without whitespace following.
-
-VMS update.
-
-Updates to man page.
-
-Option handling updates in main.c
-
-test/manyfiles redone and added to bigtest.
-
-Fixed latent (on Sun) bug in handling of save_fs.
-
-Changes from 2.12.36 to 2.12.37
--------------------------------
-
-Update REL in Makefile-dist. Incorporate test suite into main distribution.
-
-Minor fix in regtest.
-
-Changes from 2.12.35 to 2.12.36
--------------------------------
-
-Release takes on dual personality -- 2.12.36 and 2.13.0 -- any further
- patches before public release won't count for 2.13, although they will for
- 2.12 -- be careful to avoid confusion! patchlevel.h will be the last thing
- to change.
-
-Cray updates to deal with arithmetic problems.
-
-Minor test suite updates.
-
-Fixed latent bug in parser (freeing memory).
-
-Changes from 2.12.34 to 2.12.35
--------------------------------
-
-VMS updates.
-
-Flush stdout at top of err() and stderr at bottom.
-
-Fixed bug in eval_condition() -- it wasn't testing for MAYBE_NUM and
- doing the force_number().
-
-Included the missing manyfiles.awk and a new test to catch the above bug which
- I am amazed wasn't already caught by the test suite -- it's pretty basic.
-
-Changes from 2.12.33 to 2.12.34
--------------------------------
-
-Atari updates -- including bug fix.
-
-More VMS updates -- also nuke vms/version.com.
-
-Fixed bug in handling of large numbers of redirections -- it was probably never
- tested before (blush!).
-
-Minor rearrangement of code in r_force_number().
-
-Made chem and regtest tests a bit more portable (Ultrix again).
-
-Added another test -- manyfiles -- not invoked under any other test -- very Unix
- specific.
-
-Rough beginning of LIMITATIONS file -- need my AWK book to complete it.
-
-Changes from 2.12.32 to 2.12.33
--------------------------------
-
-Expunge debug.? from various files.
-
-Remove vestiges of Floor and Ceil kludge.
-
-Special case integer division -- mainly for Cray, but maybe someone else
- will benefit.
-
-Workaround for iop_close closing an output pipe descriptor on Cray --
- not conditional since I think it may fix a bug on SGI as well and I don't
- think it can hurt elsewhere.
-
-Fixed memory leak in assoc_lookup().
-
-Small cleanup in test suite.
-
-Changes from 2.12.31 to 2.12.32
--------------------------------
-
-Nuked debug.c and debugging flag -- there are better ways.
-
-Nuked version.sh and version.c in subdirectories.
-
-Fixed bug in handling of IGNORECASE.
-
-Fixed bug when FIELDWIDTHS was set via -v option.
-
-Fixed (obscure) bug when $0 is assigned a numerical value.
-
-Fixed so that escape sequences in command-line assignments work (as it already
- said in the comment).
-
-Added a few cases to test suite.
-
-Moved support/* back into distribution.
-
-VMS updates.
-
-Changes from 2.12.30 to 2.12.31
--------------------------------
-
-Cosmetic manual page changes.
-
-Updated sunos3 config.
-
-Small changes in test suite including renaming files over 14 chars. in length.
-
-Changes from 2.12.29 to 2.12.30
--------------------------------
-
-Bug fix for many string concatenations in a row.
-
-Changes from 2.12.28 to 2.12.29
--------------------------------
-
-Minor cleanup in awk.y
-
-Minor VMS update.
-
-Minor atari update.
-
-Changes from 2.12.27 to 2.12.28
--------------------------------
-
-Got rid of the debugging goop in eval.c -- there are better ways.
-
-Sequent port.
-
-VMS changes left out of the last patch -- sigh! config/vms.h renamed
- to config/vms-conf.h.
-
-Fixed missing/tzset.c
-
-Removed use of gcvt() and GCVT_MISSING -- turns out it was no faster than
- sprintf("%g") and caused all sorts of portability headaches.
-
-Tuned get_field() -- it was unnecessarily parsing the whole record on reference
- to $0.
-
-Tuned interpret() a bit in the rule_node loop.
-
-In r_force_number(), worked around bug in Uglix strtod() and got rid of
- ugly do{}while(0) at Michal's urging.
-
-Replaced do_deref() and deref with unref(node) -- much cleaner and a bit faster.
-
-Got rid of assign_number() -- contrary to comment, it was no faster than
- just making a new node and freeing the old one.
-
-Replaced make_number() and tmp_number() with macros that call mk_number().
-
-Changed freenode() and newnode() into macros -- the latter is getnode()
- which calls more_nodes() as necessary.
-
-Changes from 2.12.26 to 2.12.27
--------------------------------
-
-Completion of Cray 2 port (includes a kludge for floor() and ceil()
- that may go or be changed -- I think that it may just be working around
- a bug in chem that is being tweaked on the Cray).
-
-More VMS updates.
-
-Moved kludge over yacc's insertion of malloc and realloc declarations
- from protos.h to the Makefile.
-
-Added a lisp interpreter in awk to the test suite. (Invoked under
- bigtest.)
-
-Cleanup in r_force_number() -- I had never gotten around to a thorough
- profile of the cache code and it turns out to be not worth it.
-
-Performance boost -- do lazy force_number()'ing for fields etc. i.e.
- flag them (MAYBE_NUM) and call force_number only as necessary.
-
-Changes from 2.12.25 to 2.12.26
--------------------------------
-
-Rework of regexp stuff so that dynamic regexps have reasonable
- performance -- string used for compiled regexp is stored and
- compared to new string -- if same, no recompilation is necessary.
- Also, very dynamic regexps cause dfa-based searching to be turned
- off.
-
-Code in dev_open() is back to returning fileno(std*) rather than
- dup()ing it. This will be documented. Sorry for the run-around
- on this.
-
-Minor atari updates.
-
-Minor vms update.
-
-Missing file from MSDOS port.
-
-Added warning (under lint) if third arg. of [g]sub is a constant and
- handle it properly in the code (i.e. return how many matches).
-
-Changes from 2.12.24 to 2.12.25
--------------------------------
-
-MSDOS port.
-
-Non-consequential changes to regexp variables in preparation for
- a more serious change to fix a serious performance problem.
-
-Changes from 2.12.23 to 2.12.24
--------------------------------
-
-Fixed bug in output flushing introduced a few patches back. This caused
- serious performance losses.
-
-Changes from 2.12.22 to 2.12.23
--------------------------------
-
-Accidentally left config/cray2-60 out of last patch.
-
-Added some missing dependencies to Makefile.
-
-Cleaned up mkconf a bit; made yacc the default parser (no alloca needed,
- right?); added rs6000 hook for signed characters.
-
-Made regex.c with NO_ALLOCA undefined work.
-
-Fixed bug in dfa.c for systems where free(NULL) bombs.
-
-Deleted a few cant_happen()'s that *really* can't happen.
-
-Changes from 2.12.21 to 2.12.22
--------------------------------
-
-Added to config stuff the ability to choose YACC rather than bison.
-
-Fixed CHAR_UNSIGNED in config.h-dist.
-
-Second arg. of strtod() is char ** rather than const char **.
-
-stackb is now initially malloc()'ed since it may be realloc()'ed.
-
-VMS updates.
-
-Added SIZE_T_MISSING to config stuff and a default typedef to awk.h.
- (Maybe it is not needed on any current systems??)
-
-re_compile_pattern()'s size is now size_t unconditionally.
-
-Changes from 2.12.20 to 2.12.21
--------------------------------
-
-Corrected missing/gcvt.c.
-
-Got rid of use of dup2() and thus DUP_MISSING.
-
-Updated config/sgi33.
-
-Turned on (and fixed) in cmp_nodes() the behaviour that I *hope* will be in
- POSIX 1003.2 for relational comparisons.
-
-Small updates to test suite.
-
-Changes from 2.12.19 to 2.12.20
--------------------------------
-
-Sloppy, sloppy, sloppy!! I didn't even try to compile the last two
- patches. This one fixes goofs in regex.c.
-
-Changes from 2.12.18 to 2.12.19
--------------------------------
-
-Cleanup of last patch.
-
-Changes from 2.12.17 to 2.12.18
--------------------------------
-
-Makefile renamed to Makefile-dist.
-
-Added alloca() configuration to mkconf. (A bit kludgey.) Just
- add a single line containing ALLOCA_PW, ALLOCA_S or ALLOCA_C
- to the appropriate config file to have Makefile-dist edited
- accordingly.
-
-Reorganized output flushing to correspond with new semantics of
- devopen() on "/dev/std*" etc.
-
-Fixed rest of last goof!!
-
-Save and restore errno in do_pathopen().
-
-Miscellaneous atari updates.
-
-Get rid of the trailing comma in the NODETYPE definition (Cray
- compiler won't take it).
-
-Try to make the use of `const' consistent since Cray compiler is
- fussy about that. See the changes to `basename' and `myname'.
-
-It turns out that, according to section 3.8.3 (Macro Replacement)
- of the ANSI Standard: ``If there are sequences of preprocessing
- tokens within the list of arguments that would otherwise act as
- preprocessing directives, the behavior is undefined.'' That means
- that you cannot count on the behavior of the declaration of
- re_compile_pattern in awk.h, and indeed the Cray compiler chokes on it.
-
-Replaced alloca with malloc/realloc/free in regex.c. It was much simpler
- than expected. (Inside NO_ALLOCA for now -- by default no alloca.)
-
-Added a configuration file, config/cray60, for Unicos-6.0.
-
-Changes from 2.12.16 to 2.12.17
--------------------------------
-
-Ooops. Goofed signal use in last patch.
-
-Changes from 2.12.15 to 2.12.16
--------------------------------
-
-RENAMED *_dir to just * (e.g. missing_dir).
-
-Numerous VMS changes.
-
-Proper inclusion of atari and vms files.
-
-Added experimental (ifdef'd out) RELAXED_CONTINUATION and DEFAULT_FILETYPE
- -- please comment on these!
-
-Moved pathopen() to io.c (sigh).
-
-Put local directory ahead in default AWKPATH.
-
-Added facility in mkconf to echo comments on stdout: lines beginning
- with "#echo " will have the remainder of the line echoed when mkconf is run.
- Any lines starting with "#" will otherwise be treated as comments. The
- intent is to be able to say:
- "#echo Make sure you uncomment alloca.c in the Makefile"
- or the like.
-
-Prototype fix for V.4
-
-Fixed version_string to not print leading @(#).
-
-Fixed FIELDWIDTHS to work with strict (turned out to be easy).
-
-Fixed conf for V.2.
-
-Changed semantics of /dev/fd/n to be like on real /dev/fd.
-
-Several configuration and updates in the makefile.
-
-Updated manpage.
-
-Include tzset.c and system.c from missing_dir that were accidentally left out of
- the last patch.
-
-Fixed bug in cmdline variable assignment -- arg was getting freed(!) in
- call to variable.
-
-Backed out of parse-time constant folding for now, until I can figure out
- how to do it right.
-
-Fixed devopen() so that getline <"-" works.
-
-Changes from 2.12.14 to 2.12.15
--------------------------------
-
-Changed config/* to a condensed form that can be used with mkconf to generate
- a config.h from config.h-dist -- much easier to maintain. Please check
- carefully against what you had before for a particular system and report
- any problems. vms.h remains separate since the stuff at the bottom
- didn't quite fit the mkconf model -- hopefully cleared up later.
-
-Fixed bug in grammar -- didn't allow function definition to be separated from
- other rules by a semi-colon.
-
-VMS fix to #includes in missing.c -- should we just be including awk.h?
-
-Updated README for texinfo.tex version.
-
-Updating of copyright in all .[chy] files.
-
-Added but commented out Michal's fix to strftime.
-
-Added tzset() emulation based on Rick Adams' code. Added TZSET_MISSING to
- config.h-dist.
-
-Added strftime.3 man page for missing_dir
-
-More posix: func, **, **= don't work in -W posix
-
-More lint: ^, ^= not in old awk
-
-gawk.1: removed ref to -DNO_DEV_FD, other minor updating.
-
-Style change: pushbak becomes pushback() in yylex().
-
-Changes from 2.12.13 to 2.12.14
--------------------------------
-
-Better (?) organization of awk.h -- attempt to keep all system dependencies
- near the top and move some of the non-general things out of the config.h
- files.
-
-Change to handling of SYSTEM_MISSING.
-
-Small change to ultrix config.
-
-Do "/dev/fd/*" etc. checking at runtime.
-
-First pass at VMS port.
-
-Improvements to error handling (when lexeme spans buffers).
-
-Fixed backslash handling -- why didn't I notice this sooner?
-
-Added programs from book to test suite and new target "bigtest" to Makefile.
-
-Changes from 2.12.12 to 2.12.13
--------------------------------
-
-Recognize OFS and ORS specially so that OFS = 9 works without efficiency hit.
- Took advantage of opportunity to tune do_print*() for about 10% win on a
- print with 5 args (i.e. small but significant).
-
-Somewhat pervasive changes to reconcile CONVFMT vs. OFMT.
-
-Better initialization of builtin vars.
-
-Make config/* consistent wrt STRTOL_MISSING.
-
-Small portability improvement to alloca.s
-
-Improvements to lint code in awk.y
-
-Replaced strtol() with a better one by Chris Torek.
-
-Changes from 2.12.11 to 2.12.12
--------------------------------
-
-Added PORTS file to record successful ports.
-
-Added #define const to nothing if not STDC and added const to strtod() header.
-
-Added * to printf capabilities and partially implemented ' ' and '+' (has an
- effect for %d only, silently ignored for other formats). I'm afraid that's
- as far as I want to go before I look at a complete replacement for
- do_sprintf().
-
-Added warning for /regexp/ on LHS of MATCHOP.
-
-Changes from 2.12.10 to 2.12.11
--------------------------------
-
-Small Makefile improvements.
-
-Some remaining nits from the NeXT port.
-
-Got rid of bcopy() define in awk.h -- not needed anymore (??)
-
-Changed private in builtin.c -- it is special on Sequent.
-
-Added subset implementation of strtol() and STRTOL_MISSING.
-
-A little bit of cleanup in debug.c, dfa.c.
-
-Changes from 2.12.9 to 2.12.10
-------------------------------
-
-Redid compatibility checking and checking for # of args.
-
-Removed all references to variables[] from outside awk.y, in preparation
- for a more abstract interface to the symbol table.
-
-Got rid of a remaining use of bcopy() in regex.c.
-
-Changes from 2.12.8 to 2.12.9
------------------------------
-
-Portability improvements for atari, next and decstation.
-
-Bug fix in substr() -- wasn't handling 3rd arg. of -1 properly.
-
-Manpage updates.
-
-Moved support from src release to doc release.
-
-Updated FUTURES file.
-
-Added some "lint" warnings.
-
-Changes from 2.12.7 to 2.12.8
------------------------------
-
-Changed time() to systime().
-
-Changed warning() in snode() to fatal().
-
-strftime() now defaults second arg. to current time.
-
-Changes from 2.12.6 to 2.12.7
------------------------------
-
-Fixed bug in sub_common() involving inadequate allocation of a buffer.
-
-Added some missing files to the Makefile.
-
-Changes from 2.12.5 to 2.12.6
------------------------------
-
-Fixed bug wherein non-redirected getline could call iop_close() just
- prior to a call from do_input().
-
-Fixed bug in handling of /dev/stdout and /dev/stderr.
-
-Changes from 2.12.4 to 2.12.5
------------------------------
-
-Updated README and support directory.
-
-Changes from 2.12.3 to 2.12.4
------------------------------
-
-Updated CHANGES and TODO (should have been done in previous 2 patches).
-
-Changes from 2.12.2 to 2.12.3
------------------------------
-
-Brought regex.c and alloca.s into line with current FSF versions.
-
-Changes from 2.12.1 to 2.12.2
------------------------------
-
-Portability improvements; mostly moving system prototypes out of awk.h
-
-Introduction of strftime.
-
-Use of CONVFMT.
-
-Changes from 2.12 to 2.12.1
------------------------------
-
-Consolidated treatment of command-line assignments (thus correcting the
--v treatment).
-
-Rationalized builtin-variable handling into a table-driven process, thus
-simplifying variable() and eliminating spc_var().
-
-Fixed bug in handling of command-line source that ended in a newline.
-
-Simplified install() and lookup().
-
-Did away with double-mallocing of identifiers and now free second and later
-instances of a name, after the first gets installed into the symbol table.
-
-Treat IGNORECASE specially, simplifying a lot of code, and allowing
-checking against strict conformance only on setting it, rather than on each
-pattern match.
-
-Fixed regexp matching when IGNORECASE is non-zero (broken when dfa.c was
-added).
-
-Fixed bug where $0 was not being marked as valid, even after it was rebuilt.
-This caused mangling of $0.
-
-
-Changes from 2.11.1 to 2.12
------------------------------
-
-Makefile:
-
-Portability improvements in Makefile.
-Move configuration stuff into config.h
-
-FSF files:
-
-Synchronized alloca.[cs] and regex.[ch] with FSF.
-
-array.c:
-
-Rationalized hash routines into one with a different algorithm.
-delete() now works if the array is a local variable.
-Changed interface of assoc_next() and avoided dereferencing past the end of the
- array.
-
-awk.h:
-
-Merged non-prototype and prototype declarations in awk.h.
-Expanded tree_eval #define to short-circuit more calls of r_tree_eval().
-
-awk.y:
-
-Delinted some of the code in the grammar.
-Fixed and improved some of the error message printing.
-Changed to accommodate unlimited length source lines.
-Line continuation now works as advertised.
-Source lines can be arbitrarily long.
-Refined grammar hacks so that /= assignment works. Regular expressions
- starting with /= are recognized at the beginning of a line, after && or ||
- and after ~ or !~. More contexts can be added if necessary.
-Fixed IGNORECASE (multiple scans for backslash).
-Condensed expression_lists in array references.
-Detect and warn about the wrong # of args to builtin functions -- call most of them
- with a fixed number (i.e. fill in defaults at parse-time rather than at
- run-time).
-Load ENVIRON only if it is referenced (detected at parse-time).
-Treat NF, FS, RS, NR, FNR specially at parse time, to improve run time.
-Fold constant expressions at parse time.
-Do make_regexp() on third arg. of split() at parse time if it is a constant.
-
-builtin.c:
-
-srand() returns 0 the first time called.
-Replaced alloca() with malloc() in do_sprintf().
-Fixed setting of RSTART and RLENGTH in do_match().
-Got rid of get_{one,two,three} and allowance for variable # of args. at
- run-time -- this is now done at parse-time.
-Fixed latent bug in [g]sub whereby changes to $0 would never get made.
-Rewrote much of sub_common() for simplicity and performance.
-Added ctime() and time() builtin functions (unless -DSTRICT). ctime() returns
- a time string like the C function, given the number of seconds since the epoch,
- and time() returns the current time in seconds.
-do_sprintf() now checks for mismatch between format string and number of
- arguments supplied.
-
-dfa.c
-
-This is borrowed (almost unmodified) from GNU grep to provide faster searches.
-
-eval.c
-
-Node_var, Node_var_array and Node_param_list handled from macro rather
- than in r_tree_eval().
-Changed cmp_nodes() to not do a force_number() -- this, combined with a
- force_number() on ARGV[] and ENVIRON[], brings it into line with other awks.
-Greatly simplified cmp_nodes().
-Separated out Node_NF, Node_FS, Node_RS, Node_NR and Node_FNR in get_lhs().
-All adjacent string concatenations now done at once.
-
-field.c
-
-Added support for FIELDWIDTHS.
-Fixed bug in get_field() whereby changes to a field were not always
- properly reflected in $0.
-Reordered tests in parse_field() so that reference off the end of the buffer
- doesn't happen.
-set_FS() now sets *parse_field, i.e., the routine to call depending on the type of FS.
-It also does make_regexp() for FS if needed. get_field() passes FS_regexp
- to re_parse_field(), as does do_split().
-Changes to set_field() and set_record() to avoid malloc'ing and free'ing the
- field nodes repeatedly. The fields now just point into $0 unless they are
- assigned to another variable or changed. force_number() on the field is
- *only* done when the field is needed.
-
-gawk.1
-
-Fixed troff formatting problem on .TP lines.
-
-io.c
-
-Moved some code out into iop.c.
-Output from pipes and system() calls is properly synchronized.
-Status from pipe close properly returned.
-Bug in getline with no redirect fixed.
-
-iop.c
-
-This file contains a totally revamped get_a_record and associated code.
-
-main.c
-
-Command line programs no longer use a temporary file.
-Therefore, tmpnam() no longer required.
-Deprecated -a and -e options -- they will go away in the next release,
- but for now they cause a warning.
-Moved -C, -V, -c options to -W ala posix.
-Added -W posix option: throw out \x
-Added -W lint option.
-
-
-node.c
-
-force_number() now allows pure numerics to have leading whitespace.
-Added make_string facility to optimize case of adding an already malloc'd
- string.
-Cleaned up and simplified do_deref().
-Fixed bug in handling of stref==255 in do_deref().
-
-re.c
-
-Contains the interface to the regexp code.
-
-Changes from 2.11.1 to FSF version of same
-------------------------------------------
-Thu Jan 4 14:19:30 1990 Jim Kingdon (kingdon at albert)
-
- * Makefile (YACC): Add -y to bison part.
-
- * missing.c: Add #include <stdio.h>.
-
-Sun Dec 24 16:16:05 1989 David J. MacKenzie (djm at hobbes.ai.mit.edu)
-
- * Makefile: Add (commented out) default defines for Sony News.
-
- * awk.h: Move declaration of vprintf so it will compile when
- -DVPRINTF_MISSING is defined.
-
-Mon Nov 13 18:54:08 1989 Robert J. Chassell (bob at apple-gunkies.ai.mit.edu)
-
- * gawk.texinfo: changed @-commands that are not part of the
- standard, currently released texinfmt.el to those that are.
- Otherwise, only people with the as-yet unreleased makeinfo.c can
- format this file.
-
-Changes from 2.11beta to 2.11.1 (production)
---------------------------------------------
-
-Went from "beta" to production status!!!
-
-Now flushes stdout before closing pipes or redirected files to
-synchronize output.
-
-MS-DOS changes added in.
-
-Signal handler return type parameterized in Makefile and awk.h and
-some lint removed. debug.c cleaned up.
-
-Fixed FS splitting to never match null strings, per book.
-
-Correction to the manual's description of FS.
-
-Some compilers break on char *foo = "string" + 4, so version.sh and
-main.c were fixed.
-
-Changes from 2.10beta to 2.11beta
----------------------------------
-
-This release fixes all reported bugs that we could reproduce. Probably
-some of the changes are not documented here.
-
-The next release will probably not be a beta release!
-
-The most important change is the addition of the -nostalgia option. :-)
-
-The documentation has been improved and brought up-to-date.
-
-There has been a lot of general cleaning up of the code that is not otherwise
-documented here. There has been a movement toward using standard-conforming
-library routines and providing them (in missing.d) for systems lacking them.
-Improved (hopefully) configuration through Makefile modifications and missing.c.
-In particular, straightened out confusion over vprintf #defines, declarations
-etc.
-
-Deleted RCS log comments from source, to reduce source size by about one third.
-Most of them were horribly out-of-date, anyway.
-
-Renamed source files to reflect (for the most part) their contents.
-
-More and improved error messages. Cleanup and fixes to yyerror().
-String constants are not altered in input buffer, so error messages come out
-better. Fixed usage message. Make use of ANSI C strerror() function
-(provided).
-
-Plugged many more memory leaks. The memory consumption is now quite
-reasonable over a wide range of programs.
-
-Uses volatile declaration if STDC > 0 to avoid problems due to longjmp.
-
-New -a and -e options to use awk or egrep style regexps, respectively,
-since POSIX says awk should use egrep regexps. Default is -a.
-
-Added -v option for setting variables before the first file is encountered.
-Version information now uses -V and copyleft uses -C.
-
-Added a patchlevel.h file and its use for -V and -C.
-
-Append_right() optimized for major improvement to programs with a *lot*
-of statements.
-
-Operator precedence has been corrected to match draft Posix.
-
-Tightened up grammar for builtin functions so that only length
-may be called without arguments or parentheses.
-
-/regex/ is now a normal expression that can appear in any expression
-context.
-
-Allow /= to begin a regexp. Allow ..[../..].. in a regexp.
-
-Allow empty compound statements ({}).
-
-Made return and next illegal outside a function and in BEGIN/END respectively.
-
-Division by zero is now illegal and causes a fatal error.
-
-Fixed exponentiation so that x ^ 0 and x ^= 0 both return 1.
-
-Fixed do_sqrt, do_log, and do_exp to do argument/return checking and
-print an error message, per the manual.
-
-Fixed main to catch SIGSEGV to get source and data file line numbers.
-
-Fixed yyerror to print the ^ at the beginning of the bad token, not the end.
-
-Fix to substr() builtin: it was failing if the arguments
-weren't already strings.
-
-Added new node value flag NUMERIC to indicate that a variable is
-purely a number as opposed to type NUM which indicates that
-the node's numeric value is valid. This is set in make_number(),
-tmp_number and r_force_number() when appropriate and used in
-cmp_nodes(). This fixed a bug in comparison of variables that had
-numeric prefixes. The new code uses strtod() and eliminates is_a_number().
-A simple strtod() is provided for systems lacking one. It does no
-overflow checking, so could be improved.
-
-Simplification and efficiency improvement in force_string.
-
-Added performance tweak in r_force_number().
-
-Fixed a bug with nested loops and break/continue in functions.
-
-Fixed inconsistency in handling of empty fields when $0 has to be rebuilt.
-Happens to simplify rebuild_record().
-
-Cleaned up the code associated with opening a pipe for reading. Gawk
-now has its own popen routine (gawk_popen) that allocates an IOBUF
-and keeps track of the pid of the child process. gawk_pclose
-marks the appropriate child as defunct in the right struct redirect.
-
-Cleaned up and fixed close_redir().
-
-Fixed an obscure bug to do with redirection. Intermingled ">" and ">>"
-redirects did not output in a predictable order.
-
-Improved handling of output buffering: now all print[f]s redirected to a tty
-or pipe are flushed immediately and non-redirected output to a tty is flushed
-before the next input record is read.
-
-Fixed a bug in get_a_record() where bcopy() could have copied over
-a random pointer.
-
-Fixed a bug when RS="" and records separated by multiple blank lines.
-
-Got rid of SLOWIO code which was out-of-date anyway.
-
-Fix in get_field() for case where $0 is changed and then $(n) are
-changed and then $0 is used.
-
-Fixed infinite loop on failure to open file for reading from getline.
-Now handles redirect file open failures properly.
-
-Filenames such as /dev/stdin now allowed on the command line as well as
-in redirects.
-
-Fixed so that gawk '$1' where $1 is a zero tests false.
-
-Fixed parsing so that `RLENGTH -1' parses the same as `RLENGTH - 1',
-for example.
-
-The return from a user-defined function now defaults to the Null node.
-This fixes a core-dump-causing bug when the return value of a function
-is used and that function returns no value.
-
-Now catches floating point exceptions to avoid core dumps.
-
-Bug fix for deleting elements of an array -- under some conditions, it was
-deleting more than one element at a time.
-
-Fix in AWKPATH code for running off the end of the string.
-
-Fixed handling of precision in *printf calls. %0.2d now works properly,
-as does %c. [s]printf now recognizes %i and %X.
-
-Fixed a bug in printing of very large (>240) strings.
-
-Cleaned up erroneous behaviour for RS == "".
-
-Added IGNORECASE support to index().
-
-Simplified and fixed newnode/freenode.
-
-Fixed reference to $(anything) in a BEGIN block.
-
-Eliminated use of USG rand48().
-
-Bug fix in force_string for machines with 16-bit ints.
-
-Replaced use of mktemp() with tmpnam() and provided a partial implementation of
-the latter for systems that don't have it.
-
-Added a portability check for includes in io.c.
-
-Minor portability fix in alloc.c plus addition of xmalloc().
-
-Portability fix: on UMAX4.2, st_blksize is zero for a pipe, thus breaking
-iop_alloc() -- fixed.
-
-Workaround for compiler bug on Sun386i in do_sprintf.
-
-More and improved prototypes in awk.h.
-
-Consolidated C escape parsing code into one place.
-
-strict flag is now turned on only when invoked with the compatibility option.
-It now applies to fewer things.
-
-Changed cast of f._ptr in vprintf.c from (unsigned char *) to (char *).
-Hopefully this is right for the systems that use this code (I don't).
-
-Support for pipes under MSDOS added.
diff --git a/gnu/usr.bin/awk/PORTS b/gnu/usr.bin/awk/PORTS
deleted file mode 100644
index 5087a43..0000000
--- a/gnu/usr.bin/awk/PORTS
+++ /dev/null
@@ -1,35 +0,0 @@
-A recent version of gawk has been successfully compiled, and "make test"
-run, on the following:
-
-Sun 4/490 running 4.1
-NeXT running 2.0
-DECstation 3100 running Ultrix 4.0 or Ultrix 3.1 (different config)
-AtariST (16-bit ints, gcc compiler, byacc, running under TOS)
-ESIX V.3.2 Rev D (== System V Release 3.2) on the 386; compiler was gcc + bison
-IBM RS/6000 (see README.rs6000)
-486 running SVR4, using cc and bison
-SGI running IRIX 3.3 using gcc (fails with cc)
-Sequent Balance running Dynix V3.1
-Cray Y-MP8 running Unicos 6.0.11
-Cray 2 running Unicos 6.1 (modulo trailing zeroes in chem)
-VAX/VMS V5.x (should also work on 4.6 and 4.7)
-VMS POSIX V1.0, V1.1
-OpenVMS AXP V1.0
-MSDOS - Microsoft C 5.1, compiles and runs very simple testing
-BSD 4.4alpha
-
-From: ghazi@noc.rutgers.edu (Kaveh R. Ghazi):
-
-arch configured as:
----- --------------
-Dec Alpha OSF 1.3 osf1
-Hpux 9.0 hpux8x
-NeXTStep 2.0 next20
-Sgi Irix 4.0.5 (/bin/cc) sgi405.cc
-Stardent Titan 1500 OSv2.5 sysv3
-Stardent Vistra (i860) SVR4 sysv4
-Solaris 2.3 solaris2.cc
-SunOS 4.1.3 sunos41
-Tektronix XD88 (UTekV 3.2e) sysv3
-Tektronix 4300 (UTek 4.0) utek
-Ultrix 4.2 ultrix41
diff --git a/gnu/usr.bin/awk/POSIX b/gnu/usr.bin/awk/POSIX
deleted file mode 100644
index f240542..0000000
--- a/gnu/usr.bin/awk/POSIX
+++ /dev/null
@@ -1,95 +0,0 @@
-Right now, the numeric vs. string comparisons are screwed up in draft
-11.2. What prompted me to check it out was the note in gnu.bug.utils
-which observed that gawk was doing the comparison $1 == "000"
-numerically. I think that we can agree that intuitively, this should
-be done as a string comparison. Version 2.13.2 of gawk follows the
-current POSIX draft. Following is how I (now) think this
-stuff should be done.
-
-1. A numeric literal or the result of a numeric operation has the NUMERIC
- attribute.
-
-2. A string literal or the result of a string operation has the STRING
- attribute.
-
-3. Fields, getline input, FILENAME, ARGV elements, ENVIRON elements and the
- elements of an array created by split() that are numeric strings
- have the STRNUM attribute. Otherwise, they have the STRING attribute.
- Uninitialized variables also have the STRNUM attribute.
-
-4. Attributes propagate across assignments, but are not changed by
- any use. (Although a use may cause the entity to acquire an additional
- value such that it has both a numeric and string value -- this leaves the
- attribute unchanged.)
-
-When two operands are compared, either string comparison or numeric comparison
-may be used, depending on the attributes of the operands, according to the
-following (symmetric) matrix:
-
- +----------------------------------------------
- | STRING NUMERIC STRNUM
---------+----------------------------------------------
- |
-STRING | string string string
- |
-NUMERIC | string numeric numeric
- |
-STRNUM | string numeric numeric
---------+----------------------------------------------
-
-So, the following program should print all OKs.
-
-echo '0e2 0a 0 0b
-0e2 0a 0 0b' |
-$AWK '
-NR == 1 {
- num = 0
- str = "0e2"
-
- print ++test ": " ( (str == "0e2") ? "OK" : "OOPS" )
- print ++test ": " ( ("0e2" != 0) ? "OK" : "OOPS" )
- print ++test ": " ( ("0" != $2) ? "OK" : "OOPS" )
- print ++test ": " ( ("0e2" == $1) ? "OK" : "OOPS" )
-
- print ++test ": " ( (0 == "0") ? "OK" : "OOPS" )
- print ++test ": " ( (0 == num) ? "OK" : "OOPS" )
- print ++test ": " ( (0 != $2) ? "OK" : "OOPS" )
- print ++test ": " ( (0 == $1) ? "OK" : "OOPS" )
-
- print ++test ": " ( ($1 != "0") ? "OK" : "OOPS" )
- print ++test ": " ( ($1 == num) ? "OK" : "OOPS" )
- print ++test ": " ( ($2 != 0) ? "OK" : "OOPS" )
- print ++test ": " ( ($2 != $1) ? "OK" : "OOPS" )
- print ++test ": " ( ($3 == 0) ? "OK" : "OOPS" )
- print ++test ": " ( ($3 == $1) ? "OK" : "OOPS" )
- print ++test ": " ( ($2 != $4) ? "OK" : "OOPS" ) # 15
-}
-{
- a = "+2"
- b = 2
- if (NR % 2)
- c = a + b
- print ++test ": " ( (a != b) ? "OK" : "OOPS" ) # 16 and 22
-
- d = "2a"
- b = 2
- if (NR % 2)
- c = d + b
- print ++test ": " ( (d != b) ? "OK" : "OOPS" )
-
- print ++test ": " ( (d + 0 == b) ? "OK" : "OOPS" )
-
- e = "2"
- print ++test ": " ( (e == b "") ? "OK" : "OOPS" )
-
- a = "2.13"
- print ++test ": " ( (a == 2.13) ? "OK" : "OOPS" )
-
- a = "2.130000"
- print ++test ": " ( (a != 2.13) ? "OK" : "OOPS" )
-
- if (NR == 2) {
- CONVFMT = "%.6f"
- print ++test ": " ( (a == 2.13) ? "OK" : "OOPS" )
- }
-}'
diff --git a/gnu/usr.bin/awk/PROBLEMS b/gnu/usr.bin/awk/PROBLEMS
deleted file mode 100644
index a436180..0000000
--- a/gnu/usr.bin/awk/PROBLEMS
+++ /dev/null
@@ -1,10 +0,0 @@
-This is a list of known problems in gawk 2.15.
-Hopefully they will all be fixed in the next major release of gawk.
-
-Please keep in mind that the code is still undergoing significant evolution.
-
-1. The interactions with the lexer and yyerror need reworking. It is possible
- to get line numbers that are one line off if --compat or --posix is
-	true and either `next file' or `delete array' is used.
-
- Really the whole lexical analysis stuff needs reworking.
diff --git a/gnu/usr.bin/awk/README b/gnu/usr.bin/awk/README
deleted file mode 100644
index 90ed9c2..0000000
--- a/gnu/usr.bin/awk/README
+++ /dev/null
@@ -1,125 +0,0 @@
-README:
-
-This is GNU Awk 2.15. It should be upwardly compatible with the System
-V Release 4 awk. It is almost completely compliant with POSIX 1003.2.
-
-This release adds new features -- see NEWS for details.
-
-See the installation instructions, below.
-
-Known problems are given in the PROBLEMS file. Work to be done is
-described briefly in the FUTURES file. Verified ports are listed in
-the PORTS file. Changes in this version are summarized in the NEWS file.
-Please read the LIMITATIONS and ACKNOWLEDGMENT files.
-
-Read the file POSIX for a discussion of how the standard says comparisons
-should be done vs. how they really should be done and how gawk does them.
-
-To format the documentation with TeX, you must use texinfo.tex 2.53
-or later. Otherwise footnotes look unacceptable.
-
-If you wish to remake the Info files, you should use makeinfo. The 2.15
-version of makeinfo works with no errors.
-
-The man page is up to date.
-
-INSTALLATION:
-
-Check whether there is a system-specific README file for your system.
-
-A quick overview of the installation process is in the file INSTALL.
-
-Makefile.in may need some tailoring. The only changes necessary should
-be to change installation targets or to change compiler flags.
-The changes to make in Makefile.in are commented and should be obvious.
-
-All other changes should be made in a config file. Samples for
-various systems are included in the config directory. Starting with
-2.11, our intent has been to make the code conform to standards (ANSI,
-POSIX, SVID, in that order) whenever possible, and to not penalize
-standard conforming systems. We have included substitute versions of
-routines not universally available. Simply add the appropriate define
-for the missing feature(s) on your system.
-
-If you have neither bison nor yacc, use the awktab.c file here. It was
-generated with bison, and should have no AT&T code in it. (Note that
-modifying awk.y without bison or yacc will be difficult, at best. You might
-want to get a copy of bison from the FSF too.)
-
-If no config file is included for your system, start by copying one
-for a similar system. One way of determining the defines needed is to
-try to load gawk with nothing defined and see what routines are
-unresolved by the loader. This should give you a good idea of how to
-proceed.
-
-The next release will use the FSF autoconfig program, so we are no longer
-soliciting new config files.
-
-If you have an MS-DOS or OS/2 system, use the stuff in the pc directory.
-For an Atari there is an atari directory and similarly one for VMS.
-
-Chapter 16 of The GAWK Manual discusses configuration in detail.
-(However, it does not discuss OS/2 configuration; see README.pc for
-the details. The manual is being massively revised for 2.16.)
-
-After successful compilation, do 'make test' to run a small test
-suite. There should be no output from the 'cmp' invocations except in
-the cases where there are small differences in floating point values.
-If there are other differences, please investigate and report the
-problem.
-
-PRINTING THE MANUAL
-
-The 'support' directory contains texinfo.tex 2.115 and the texindex.c
-program from the texinfo distribution, both of which are needed for
-printing the manual.  See the makefile for the steps needed to get a
-DVI file from the manual.
-
-CAVEATS
-
-The existence of a patchlevel.h file does *N*O*T* imply a commitment on
-our part to issue bug fixes or patches. It is there in case we should
-decide to do so.
-
-BUG REPORTS AND FIXES (Un*x systems):
-
-Please coordinate changes through David Trueman and/or Arnold Robbins.
-
-David Trueman
-Department of Mathematics, Statistics and Computing Science,
-Dalhousie University, Halifax, Nova Scotia, Canada
-
-UUCP: {uunet utai watmath}!dalcs!david
-INTERNET: david@cs.dal.ca
-
-Arnold Robbins
-1736 Reindeer Drive
-Atlanta, GA, 30329-3528, USA
-
-INTERNET: arnold@skeeve.atl.ga.us
-UUCP: { gatech, emory, emoryu1 }!skeeve!arnold
-
-BUG REPORTS AND FIXES (non-Unix ports):
-
-MS-DOS:
- Scott Deifik
- AMGEN Inc.
- Amgen Center, Bldg.17-Dept.393
- Thousand Oaks, CA 91320-1789
- Tel-805-499-5725 ext.4677
- Fax-805-498-0358
- scottd@amgen.com
-
-VMS:
- Pat Rankin
- rankin@eql.caltech.edu (e-mail only)
-
-Atari ST:
- Michal Jaegermann
- michal@gortel.phys.ualberta.ca (e-mail only)
-
-OS/2:
- Kai Uwe Rommel
- rommel@ars.muc.de (e-mail only)
- Darrel Hankerson
- hankedr@mail.auburn.edu (e-mail only)
diff --git a/gnu/usr.bin/awk/array.c b/gnu/usr.bin/awk/array.c
deleted file mode 100644
index 9166c4e..0000000
--- a/gnu/usr.bin/awk/array.c
+++ /dev/null
@@ -1,515 +0,0 @@
-/*
- * array.c - routines for associative arrays.
- */
-
-/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-/*
- * Tree walks (``for (iggy in foo)'') and array deletions use expensive
- * linear searching. So what we do is start out with small arrays and
- * grow them as needed, so that our arrays are hopefully small enough,
- * most of the time, that they're pretty full and we're not looking at
- * wasted space.
- *
- * The decision is made to grow the array if the average chain length is
- * ``too big''. This is defined as the total number of entries in the table
- * divided by the size of the array being greater than some constant.
- */
-
-#define AVG_CHAIN_MAX 10 /* don't want to linear search more than this */
-
-#include "awk.h"
-
-static NODE *assoc_find P((NODE *symbol, NODE *subs, int hash1));
-static void grow_table P((NODE *symbol));
-
-NODE *
-concat_exp(tree)
-register NODE *tree;
-{
- register NODE *r;
- char *str;
- char *s;
- size_t len;
- int offset;
- size_t subseplen;
- char *subsep;
-
- if (tree->type != Node_expression_list)
- return force_string(tree_eval(tree));
- r = force_string(tree_eval(tree->lnode));
- if (tree->rnode == NULL)
- return r;
- subseplen = SUBSEP_node->lnode->stlen;
- subsep = SUBSEP_node->lnode->stptr;
- len = r->stlen + subseplen + 2;
- emalloc(str, char *, len, "concat_exp");
- memcpy(str, r->stptr, r->stlen+1);
- s = str + r->stlen;
- free_temp(r);
- tree = tree->rnode;
- while (tree) {
- if (subseplen == 1)
- *s++ = *subsep;
- else {
- memcpy(s, subsep, subseplen+1);
- s += subseplen;
- }
- r = force_string(tree_eval(tree->lnode));
- len += r->stlen + subseplen;
- offset = s - str;
- erealloc(str, char *, len, "concat_exp");
- s = str + offset;
- memcpy(s, r->stptr, r->stlen+1);
- s += r->stlen;
- free_temp(r);
- tree = tree->rnode;
- }
- r = make_str_node(str, s - str, ALREADY_MALLOCED);
- r->flags |= TEMP;
- return r;
-}
-
-/* Flush all the values in symbol[] before doing a split() */
-void
-assoc_clear(symbol)
-NODE *symbol;
-{
- int i;
- NODE *bucket, *next;
-
- if (symbol->var_array == 0)
- return;
- for (i = 0; i < symbol->array_size; i++) {
- for (bucket = symbol->var_array[i]; bucket; bucket = next) {
- next = bucket->ahnext;
- unref(bucket->ahname);
- unref(bucket->ahvalue);
- freenode(bucket);
- }
- symbol->var_array[i] = 0;
- }
- free(symbol->var_array);
- symbol->var_array = NULL;
- symbol->array_size = symbol->table_size = 0;
- symbol->flags &= ~ARRAYMAXED;
-}
-
-/*
- * calculate the hash function of the string in subs
- */
-unsigned int
-hash(s, len, hsize)
-register const char *s;
-register size_t len;
-unsigned long hsize;
-{
- register unsigned long h = 0;
-
-#ifdef this_is_really_slow
-
- register unsigned long g;
-
- while (len--) {
- h = (h << 4) + *s++;
- g = (h & 0xf0000000);
- if (g) {
- h = h ^ (g >> 24);
- h = h ^ g;
- }
- }
-
-#else /* this_is_really_slow */
-/*
- * This is INCREDIBLY ugly, but fast. We break the string up into 8 byte
- * units. On the first time through the loop we get the "leftover bytes"
- * (strlen % 8). On every other iteration, we perform 8 HASHC's so we handle
- * all 8 bytes. Essentially, this saves us 7 cmp & branch instructions. If
- * this routine is heavily used enough, it's worth the ugly coding.
- *
- * OZ's original sdbm hash, copied from Margo Seltzer's db package.
- *
- */
-
-/* Even more speed: */
-/* #define HASHC h = *s++ + 65599 * h */
-/* Because 65599 = pow(2,6) + pow(2,16) - 1 we multiply by shifts */
-#define HASHC htmp = (h << 6); \
- h = *s++ + htmp + (htmp << 10) - h
-
- unsigned long htmp;
-
- h = 0;
-
-#if defined(VAXC)
-/*
- * [This was an implementation of "Duff's Device", but it has been
- * redone, separating the switch for extra iterations from the loop.
- * This is necessary because the DEC VAX-C compiler is STOOPID.]
- */
- switch (len & (8 - 1)) {
- case 7: HASHC;
- case 6: HASHC;
- case 5: HASHC;
- case 4: HASHC;
- case 3: HASHC;
- case 2: HASHC;
- case 1: HASHC;
- default: break;
- }
-
- if (len > (8 - 1)) {
- register size_t loop = len >> 3;
- do {
- HASHC;
- HASHC;
- HASHC;
- HASHC;
- HASHC;
- HASHC;
- HASHC;
- HASHC;
- } while (--loop);
- }
-#else /* !VAXC */
- /* "Duff's Device" for those who can handle it */
- if (len > 0) {
- register size_t loop = (len + 8 - 1) >> 3;
-
- switch (len & (8 - 1)) {
- case 0:
- do { /* All fall throughs */
- HASHC;
- case 7: HASHC;
- case 6: HASHC;
- case 5: HASHC;
- case 4: HASHC;
- case 3: HASHC;
- case 2: HASHC;
- case 1: HASHC;
- } while (--loop);
- }
- }
-#endif /* !VAXC */
-#endif /* this_is_really_slow - not */
-
- if (h >= hsize)
- h %= hsize;
- return h;
-}
-
-/*
- * locate symbol[subs]
- */
-static NODE * /* NULL if not found */
-assoc_find(symbol, subs, hash1)
-NODE *symbol;
-register NODE *subs;
-int hash1;
-{
- register NODE *bucket, *prev = 0;
-
- for (bucket = symbol->var_array[hash1]; bucket; bucket = bucket->ahnext) {
- if (cmp_nodes(bucket->ahname, subs) == 0) {
-#if 0
- /*
- * Disable this code for now. It screws things up if we have
- * a ``for (iggy in foo)'' in progress. Interestingly enough,
- * this was not a problem in 2.15.3, only in 2.15.4. I'm not
- * sure why it works in 2.15.3.
- */
- if (prev) { /* move found to front of chain */
- prev->ahnext = bucket->ahnext;
- bucket->ahnext = symbol->var_array[hash1];
- symbol->var_array[hash1] = bucket;
- }
-#endif
- return bucket;
- } else
- prev = bucket; /* save previous list entry */
- }
- return NULL;
-}
-
-/*
- * test whether the array element symbol[subs] exists or not
- */
-int
-in_array(symbol, subs)
-NODE *symbol, *subs;
-{
- register int hash1;
-
- if (symbol->type == Node_param_list)
- symbol = stack_ptr[symbol->param_cnt];
- if (symbol->var_array == 0)
- return 0;
- subs = concat_exp(subs); /* concat_exp returns a string node */
- hash1 = hash(subs->stptr, subs->stlen, (unsigned long) symbol->array_size);
- if (assoc_find(symbol, subs, hash1) == NULL) {
- free_temp(subs);
- return 0;
- } else {
- free_temp(subs);
- return 1;
- }
-}
-
-/*
- * SYMBOL is the address of the node (or other pointer) being dereferenced.
- * SUBS is a number or string used as the subscript.
- *
- * Find SYMBOL[SUBS] in the assoc array. Install it with value "" if it
- * isn't there. Returns a pointer ala get_lhs to where its value is stored
- */
-NODE **
-assoc_lookup(symbol, subs)
-NODE *symbol, *subs;
-{
- register int hash1;
- register NODE *bucket;
-
- (void) force_string(subs);
-
- if (symbol->var_array == 0) {
- symbol->type = Node_var_array;
- symbol->array_size = symbol->table_size = 0; /* sanity */
- symbol->flags &= ~ARRAYMAXED;
- grow_table(symbol);
- hash1 = hash(subs->stptr, subs->stlen,
- (unsigned long) symbol->array_size);
- } else {
- hash1 = hash(subs->stptr, subs->stlen,
- (unsigned long) symbol->array_size);
- bucket = assoc_find(symbol, subs, hash1);
- if (bucket != NULL) {
- free_temp(subs);
- return &(bucket->ahvalue);
- }
- }
-
- /* It's not there, install it. */
- if (do_lint && subs->stlen == 0)
- warning("subscript of array `%s' is null string",
- symbol->vname);
-
- /* first see if we would need to grow the array, before installing */
- symbol->table_size++;
- if ((symbol->flags & ARRAYMAXED) == 0
- && symbol->table_size/symbol->array_size > AVG_CHAIN_MAX) {
- grow_table(symbol);
- /* have to recompute hash value for new size */
- hash1 = hash(subs->stptr, subs->stlen,
- (unsigned long) symbol->array_size);
- }
-
- getnode(bucket);
- bucket->type = Node_ahash;
- if (subs->flags & TEMP)
- bucket->ahname = dupnode(subs);
- else {
- unsigned int saveflags = subs->flags;
-
- subs->flags &= ~MALLOC;
- bucket->ahname = dupnode(subs);
- subs->flags = saveflags;
- }
- free_temp(subs);
-
- /* array subscripts are strings */
- bucket->ahname->flags &= ~NUMBER;
- bucket->ahname->flags |= STRING;
- bucket->ahvalue = Nnull_string;
- bucket->ahnext = symbol->var_array[hash1];
- symbol->var_array[hash1] = bucket;
- return &(bucket->ahvalue);
-}
-
-void
-do_delete(symbol, tree)
-NODE *symbol, *tree;
-{
- register int hash1;
- register NODE *bucket, *last;
- NODE *subs;
-
- if (symbol->type == Node_param_list)
- symbol = stack_ptr[symbol->param_cnt];
- if (symbol->var_array == 0)
- return;
- subs = concat_exp(tree); /* concat_exp returns string node */
- hash1 = hash(subs->stptr, subs->stlen, (unsigned long) symbol->array_size);
-
- last = NULL;
- for (bucket = symbol->var_array[hash1]; bucket; last = bucket, bucket = bucket->ahnext)
- if (cmp_nodes(bucket->ahname, subs) == 0)
- break;
- free_temp(subs);
- if (bucket == NULL)
- return;
- if (last)
- last->ahnext = bucket->ahnext;
- else
- symbol->var_array[hash1] = bucket->ahnext;
- unref(bucket->ahname);
- unref(bucket->ahvalue);
- freenode(bucket);
- symbol->table_size--;
- if (symbol->table_size <= 0) {
- memset(symbol->var_array, '\0',
- sizeof(NODE *) * symbol->array_size);
- symbol->table_size = symbol->array_size = 0;
- symbol->flags &= ~ARRAYMAXED;
- free((char *) symbol->var_array);
- symbol->var_array = NULL;
- }
-}
-
-void
-assoc_scan(symbol, lookat)
-NODE *symbol;
-struct search *lookat;
-{
- lookat->sym = symbol;
- lookat->idx = 0;
- lookat->bucket = NULL;
- lookat->retval = NULL;
- if (symbol->var_array != NULL)
- assoc_next(lookat);
-}
-
-void
-assoc_next(lookat)
-struct search *lookat;
-{
- register NODE *symbol = lookat->sym;
-
- if (symbol == NULL)
- fatal("null symbol in assoc_next");
- if (symbol->var_array == NULL || lookat->idx > symbol->array_size) {
- lookat->retval = NULL;
- return;
- }
- /*
- * This is theoretically unsafe. The element bucket might have
- * been freed if the body of the scan did a delete on the next
- * element of the bucket. The only way to do that is by array
- * reference, which is unlikely. Basically, if the user is doing
- * anything other than an operation on the current element of an
- * assoc array while walking through it sequentially, all bets are
- * off. (The safe way is to register all search structs on an
- * array with the array, and update all of them on a delete or
- * insert)
- */
- if (lookat->bucket != NULL) {
- lookat->retval = lookat->bucket->ahname;
- lookat->bucket = lookat->bucket->ahnext;
- return;
- }
- for (; lookat->idx < symbol->array_size; lookat->idx++) {
- NODE *bucket;
-
- if ((bucket = symbol->var_array[lookat->idx]) != NULL) {
- lookat->retval = bucket->ahname;
- lookat->bucket = bucket->ahnext;
- lookat->idx++;
- return;
- }
- }
- lookat->retval = NULL;
- lookat->bucket = NULL;
- return;
-}
-
-/* grow_table --- grow a hash table */
-
-static void
-grow_table(symbol)
-NODE *symbol;
-{
- NODE **old, **new, *chain, *next;
- int i, j;
- unsigned long hash1;
- unsigned long oldsize, newsize;
- /*
- * This is an array of primes. We grow the table by an order of
- * magnitude each time (not just doubling) so that growing is a
- * rare operation. We expect, on average, that it won't happen
- * more than twice. The final size is also chosen to be small
- * enough so that MS-DOG mallocs can handle it. When things are
- * very large (> 8K), we just double more or less, instead of
- * just jumping from 8K to 64K.
- */
- static long sizes[] = { 13, 127, 1021, 8191, 16381, 32749, 65497 };
-
- /* find next biggest hash size */
- oldsize = symbol->array_size;
- newsize = 0;
- for (i = 0, j = sizeof(sizes)/sizeof(sizes[0]); i < j; i++) {
- if (oldsize < sizes[i]) {
- newsize = sizes[i];
- break;
- }
- }
-
- if (newsize == oldsize) { /* table already at max (!) */
- symbol->flags |= ARRAYMAXED;
- return;
- }
-
- /* allocate new table */
- emalloc(new, NODE **, newsize * sizeof(NODE *), "grow_table");
- memset(new, '\0', newsize * sizeof(NODE *));
-
- /* brand new hash table, set things up and return */
- if (symbol->var_array == NULL) {
- symbol->table_size = 0;
- goto done;
- }
-
- /* old hash table there, move stuff to new, free old */
- old = symbol->var_array;
- for (i = 0; i < oldsize; i++) {
- if (old[i] == NULL)
- continue;
-
- for (chain = old[i]; chain != NULL; chain = next) {
- next = chain->ahnext;
- hash1 = hash(chain->ahname->stptr,
- chain->ahname->stlen, newsize);
-
- /* remove from old list, add to new */
- chain->ahnext = new[hash1];
- new[hash1] = chain;
-
- }
- }
- free(old);
-
-done:
- /*
- * note that symbol->table_size does not change if an old array,
- * and is explicitly set to 0 if a new one.
- */
- symbol->var_array = new;
- symbol->array_size = newsize;
-}
diff --git a/gnu/usr.bin/awk/awk.1 b/gnu/usr.bin/awk/awk.1
deleted file mode 100644
index 1b58bec..0000000
--- a/gnu/usr.bin/awk/awk.1
+++ /dev/null
@@ -1,1969 +0,0 @@
-.ds PX \s-1POSIX\s+1
-.ds UX \s-1UNIX\s+1
-.ds AN \s-1ANSI\s+1
-.TH AWK 1 "Apr 18 1994" "Free Software Foundation" "Utility Commands"
-.SH NAME
-awk \- GNU awk pattern scanning and processing language
-.SH SYNOPSIS
-.B awk
-[ POSIX or GNU style options ]
-.B \-f
-.I program-file
-[
-.B \-\^\-
-] file .\^.\^.
-.br
-.B awk
-[ POSIX or GNU style options ]
-[
-.B \-\^\-
-]
-.I program-text
-file .\^.\^.
-.SH DESCRIPTION
-.I Gawk
-is the GNU Project's implementation of the AWK programming language.
-It conforms to the definition of the language in
-the \*(PX 1003.2 Command Language And Utilities Standard.
-This version in turn is based on the description in
-.IR "The AWK Programming Language" ,
-by Aho, Kernighan, and Weinberger,
-with the additional features defined in the System V Release 4 version
-of \*(UX
-.IR awk .
-.I Gawk
-also provides some GNU-specific extensions.
-.PP
-The command line consists of options to
-.I awk
-itself, the AWK program text (if not supplied via the
-.B \-f
-or
-.B \-\^\-file
-options), and values to be made
-available in the
-.B ARGC
-and
-.B ARGV
-pre-defined AWK variables.
-.SH OPTIONS
-.PP
-.I Gawk
-options may be either the traditional \*(PX one letter options,
-or the GNU style long options. \*(PX style options start with a single ``\-'',
-while GNU long options start with ``\-\^\-''.
-GNU style long options are provided for both GNU-specific features and
-for \*(PX mandated features. Other implementations of the AWK language
-are likely to only accept the traditional one letter options.
-.PP
-Following the \*(PX standard,
-.IR awk -specific
-options are supplied via arguments to the
-.B \-W
-option. Multiple
-.B \-W
-options may be supplied, or multiple arguments may be supplied together
-if they are separated by commas, or enclosed in quotes and separated
-by white space.
-Case is ignored in arguments to the
-.B \-W
-option.
-Each
-.B \-W
-option has a corresponding GNU style long option, as detailed below.
-Arguments to GNU style long options are either joined with the option
-by an
-.B =
-sign, with no intervening spaces, or they may be provided in the
-next command line argument.
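-.PP
-For example, assuming an AWK program in a (hypothetical) file
-.BR prog.awk ,
-the following invocations are sketches of equivalent ways to supply
-two such options:
-.RS
-.ft B
-.nf
-awk \-W lint \-W compat \-f prog.awk file
-awk \-W lint,compat \-f prog.awk file
-awk \-\^\-lint \-\^\-compat \-f prog.awk file
-.fi
-.ft R
-.RE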
-.PP
-.I Gawk
-accepts the following options.
-.TP
-.PD 0
-.BI \-F " fs"
-.TP
-.PD
-.BI \-\^\-field-separator= fs
-Use
-.I fs
-for the input field separator (the value of the
-.B FS
-predefined
-variable).
-.TP
-.PD 0
-\fB\-v\fI var\fB\^=\^\fIval\fR
-.TP
-.PD
-\fB\-\^\-assign=\fIvar\fB\^=\^\fIval\fR
-Assign the value
-.IR val ,
-to the variable
-.IR var ,
-before execution of the program begins.
-Such variable values are available to the
-.B BEGIN
-block of an AWK program.
-.TP
-.PD 0
-.BI \-f " program-file"
-.TP
-.PD
-.BI \-\^\-file= program-file
-Read the AWK program source from the file
-.IR program-file ,
-instead of from the first command line argument.
-Multiple
-.B \-f
-(or
-.BR \-\^\-file )
-options may be used.
-.TP
-.PD 0
-.BI \-mf= NNN
-.TP
-.BI \-mr= NNN
-Set various memory limits to the value
-.IR NNN .
-The
-.B f
-flag sets the maximum number of fields, and the
-.B r
-flag sets the maximum record size. These two flags and the
-.B \-m
-option are from the AT&T Bell Labs research version of \*(UX
-.IR awk .
-They are ignored by
-.IR awk ,
-since
-.I awk
-has no pre-defined limits.
-.TP \w'\fB\-\^\-copyright\fR'u+1n
-.PD 0
-.B "\-W compat"
-.TP
-.PD
-.B \-\^\-compat
-Run in
-.I compatibility
-mode. In compatibility mode,
-.I awk
-behaves identically to \*(UX
-.IR awk ;
-none of the GNU-specific extensions are recognized.
-See
-.BR "GNU EXTENSIONS" ,
-below, for more information.
-.TP
-.PD 0
-.B "\-W copyleft"
-.TP
-.PD 0
-.B "\-W copyright"
-.TP
-.PD 0
-.B \-\^\-copyleft
-.TP
-.PD
-.B \-\^\-copyright
-Print the short version of the GNU copyright information message on
-the error output.
-.TP
-.PD 0
-.B "\-W help"
-.TP
-.PD 0
-.B "\-W usage"
-.TP
-.PD 0
-.B \-\^\-help
-.TP
-.PD
-.B \-\^\-usage
-Print a relatively short summary of the available options on
-the error output.
-Per the GNU Coding Standards, these options cause an immediate,
-successful exit.
-.TP
-.PD 0
-.B "\-W lint"
-.TP
-.PD 0
-.B \-\^\-lint
-Provide warnings about constructs that are
-dubious or non-portable to other AWK implementations.
-.ig
-.\" This option is left undocumented, on purpose.
-.TP
-.PD 0
-.B "\-W nostalgia"
-.TP
-.PD
-.B \-\^\-nostalgia
-Provide a moment of nostalgia for long time
-.I awk
-users.
-..
-.TP
-.PD 0
-.B "\-W posix"
-.TP
-.PD
-.B \-\^\-posix
-This turns on
-.I compatibility
-mode, with the following additional restrictions:
-.RS
-.TP \w'\(bu'u+1n
-\(bu
-.B \ex
-escape sequences are not recognized.
-.TP
-\(bu
-The synonym
-.B func
-for the keyword
-.B function
-is not recognized.
-.TP
-\(bu
-The operators
-.B **
-and
-.B **=
-cannot be used in place of
-.B ^
-and
-.BR ^= .
-.RE
-.TP
-.PD 0
-.BI "\-W source=" program-text
-.TP
-.PD
-.BI \-\^\-source= program-text
-Use
-.I program-text
-as AWK program source code.
-This option allows the easy intermixing of library functions (used via the
-.B \-f
-and
-.B \-\^\-file
-options) with source code entered on the command line.
-It is intended primarily for medium to large size AWK programs used
-in shell scripts.
-.sp .5
-The
-.B "\-W source="
-form of this option uses the rest of the command line argument for
-.IR program-text ;
-no other options to
-.B \-W
-will be recognized in the same argument.
-.TP
-.PD 0
-.B "\-W version"
-.TP
-.PD
-.B \-\^\-version
-Print version information for this particular copy of
-.I awk
-on the error output.
-This is useful mainly for knowing if the current copy of
-.I awk
-on your system
-is up to date with respect to whatever the Free Software Foundation
-is distributing.
-Per the GNU Coding Standards, these options cause an immediate,
-successful exit.
-.TP
-.B \-\^\-
-Signal the end of options. This is useful to allow further arguments to the
-AWK program itself to start with a ``\-''.
-This is mainly for consistency with the argument parsing convention used
-by most other \*(PX programs.
-.PP
-In compatibility mode,
-any other options are flagged as illegal, but are otherwise ignored.
-In normal operation, as long as program text has been supplied, unknown
-options are passed on to the AWK program in the
-.B ARGV
-array for processing. This is particularly useful for running AWK
-programs via the ``#!'' executable interpreter mechanism.
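-.PP
-For example, a minimal sketch of a self-contained AWK script using this
-mechanism (the interpreter path shown is only illustrative; use the path
-where
-.I awk
-is installed on your system):
-.RS
-.ft B
-.nf
-#! /usr/local/bin/awk \-f
-# print the first field of every input line
-{ print $1 }
-.fi
-.ft R
-.RE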
-.SH AWK PROGRAM EXECUTION
-.PP
-An AWK program consists of a sequence of pattern-action statements
-and optional function definitions.
-.RS
-.PP
-\fIpattern\fB { \fIaction statements\fB }\fR
-.br
-\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR
-.RE
-.PP
-.I Gawk
-first reads the program source from the
-.IR program-file (s)
-if specified,
-from arguments to
-.BR "\-W source=" ,
-or from the first non-option argument on the command line.
-The
-.B \-f
-and
-.B "\-W source="
-options may be used multiple times on the command line.
-.I Gawk
-will read the program text as if all the
-.IR program-file s
-and command line source texts
-had been concatenated together. This is useful for building libraries
-of AWK functions, without having to include them in each new AWK
-program that uses them. It also provides the ability to mix library
-functions with command line programs.
-.PP
-The environment variable
-.B AWKPATH
-specifies a search path to use when finding source files named with
-the
-.B \-f
-option. If this variable does not exist, the default path is
-\fB".:/usr/lib/awk:/usr/local/lib/awk"\fR.
-If a file name given to the
-.B \-f
-option contains a ``/'' character, no path search is performed.
-.PP
-.I Gawk
-executes AWK programs in the following order.
-First,
-all variable assignments specified via the
-.B \-v
-option are performed.
-Next,
-.I awk
-compiles the program into an internal form.
-Then,
-.I awk
-executes the code in the
-.B BEGIN
-block(s) (if any),
-and then proceeds to read
-each file named in the
-.B ARGV
-array.
-If there are no files named on the command line,
-.I awk
-reads the standard input.
-.PP
-If a filename on the command line has the form
-.IB var = val
-it is treated as a variable assignment. The variable
-.I var
-will be assigned the value
-.IR val .
-(This happens after any
-.B BEGIN
-block(s) have been run.)
-Command line variable assignment
-is most useful for dynamically assigning values to the variables
-AWK uses to control how input is broken into fields and records. It
-is also useful for controlling state if multiple passes are needed over
-a single data file.
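-.PP
-As a short sketch (the file names are hypothetical), the following command
-splits the first file on commas and the second on colons, by assigning to
-.B FS
-between the file arguments:
-.RS
-.ft B
-.nf
-awk '{ print $1 }' FS=, data.csv FS=: /etc/passwd
-.fi
-.ft R
-.RE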
-.PP
-If the value of a particular element of
-.B ARGV
-is empty (\fB""\fR),
-.I awk
-skips over it.
-.PP
-For each line in the input,
-.I awk
-tests to see if it matches any
-.I pattern
-in the AWK program.
-For each pattern that the line matches, the associated
-.I action
-is executed.
-The patterns are tested in the order they occur in the program.
-.PP
-Finally, after all the input is exhausted,
-.I awk
-executes the code in the
-.B END
-block(s) (if any).
-.SH VARIABLES AND FIELDS
-AWK variables are dynamic; they come into existence when they are
-first used. Their values are either floating-point numbers or strings,
-or both,
-depending upon how they are used. AWK also has one dimensional
-arrays; arrays with multiple dimensions may be simulated.
-Several pre-defined variables are set as a program
-runs; these will be described as needed and summarized below.
-.SS Fields
-.PP
-As each input line is read,
-.I awk
-splits the line into
-.IR fields ,
-using the value of the
-.B FS
-variable as the field separator.
-If
-.B FS
-is a single character, fields are separated by that character.
-Otherwise,
-.B FS
-is expected to be a full regular expression.
-In the special case that
-.B FS
-is a single blank, fields are separated
-by runs of blanks and/or tabs.
-Note that the value of
-.B IGNORECASE
-(see below) will also affect how fields are split when
-.B FS
-is a regular expression.
-.PP
-If the
-.B FIELDWIDTHS
-variable is set to a space separated list of numbers, each field is
-expected to have fixed width, and
-.I awk
-will split up the record using the specified widths. The value of
-.B FS
-is ignored.
-Assigning a new value to
-.B FS
-overrides the use of
-.BR FIELDWIDTHS ,
-and restores the default behavior.
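-.PP
-For example (a brief sketch; the widths shown are arbitrary), the following
-program splits each record into three fixed-width fields of 5, 3, and 8
-characters, and prints the second one:
-.RS
-.ft B
-.nf
-BEGIN { FIELDWIDTHS = "5 3 8" }
-{ print $2 }
-.fi
-.ft R
-.RE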
-.PP
-Each field in the input line may be referenced by its position,
-.BR $1 ,
-.BR $2 ,
-and so on.
-.B $0
-is the whole line. The value of a field may be assigned to as well.
-Fields need not be referenced by constants:
-.RS
-.PP
-.ft B
-n = 5
-.br
-print $n
-.ft R
-.RE
-.PP
-prints the fifth field in the input line.
-The variable
-.B NF
-is set to the total number of fields in the input line.
-.PP
-References to non-existent fields (i.e. fields after
-.BR $NF )
-produce the null-string. However, assigning to a non-existent field
-(e.g.,
-.BR "$(NF+2) = 5" )
-will increase the value of
-.BR NF ,
-create any intervening fields with the null string as their value, and
-cause the value of
-.B $0
-to be recomputed, with the fields being separated by the value of
-.BR OFS .
-References to negative numbered fields cause a fatal error.
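-.PP
-For example, on an input line containing just \fB"a b"\fR, the sketch
-.RS
-.ft B
-.nf
-{ $4 = "d"; print NF }
-.fi
-.ft R
-.RE
-.PP
-prints 4;
-.B $3
-is created with the null string as its value, and
-.B $0
-is rebuilt with the fields separated by the value of
-.BR OFS .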
-.SS Built-in Variables
-.PP
-AWK's built-in variables are:
-.PP
-.TP \w'\fBFIELDWIDTHS\fR'u+1n
-.B ARGC
-The number of command line arguments (does not include options to
-.IR awk ,
-or the program source).
-.TP
-.B ARGIND
-The index in
-.B ARGV
-of the current file being processed.
-.TP
-.B ARGV
-Array of command line arguments. The array is indexed from
-0 to
-.B ARGC
-\- 1.
-Dynamically changing the contents of
-.B ARGV
-can control the files used for data.
-.TP
-.B CONVFMT
-The conversion format for numbers, \fB"%.6g"\fR, by default.
-.TP
-.B ENVIRON
-An array containing the values of the current environment.
-The array is indexed by the environment variables, each element being
-the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be
-.BR /u/arnold ).
-Changing this array does not affect the environment seen by programs which
-.I awk
-spawns via redirection or the
-.B system()
-function.
-(This may change in a future version of
-.IR awk .)
-.\" but don't hold your breath...
-.TP
-.B ERRNO
-If a system error occurs either doing a redirection for
-.BR getline ,
-during a read for
-.BR getline ,
-or during a
-.BR close() ,
-then
-.B ERRNO
-will contain
-a string describing the error.
-.TP
-.B FIELDWIDTHS
-A white-space separated list of fieldwidths. When set,
-.I awk
-parses the input into fields of fixed width, instead of using the
-value of the
-.B FS
-variable as the field separator.
-The fixed field width facility is still experimental; expect the
-semantics to change as
-.I awk
-evolves over time.
-.TP
-.B FILENAME
-The name of the current input file.
-If no files are specified on the command line, the value of
-.B FILENAME
-is ``\-''.
-However,
-.B FILENAME
-is undefined inside the
-.B BEGIN
-block.
-.TP
-.B FNR
-The input record number in the current input file.
-.TP
-.B FS
-The input field separator, a blank by default.
-.TP
-.B IGNORECASE
-Controls the case-sensitivity of all regular expression operations. If
-.B IGNORECASE
-has a non-zero value, then pattern matching in rules,
-field splitting with
-.BR FS ,
-regular expression
-matching with
-.B ~
-and
-.BR !~ ,
-and the
-.BR gsub() ,
-.BR index() ,
-.BR match() ,
-.BR split() ,
-and
-.B sub()
-pre-defined functions will all ignore case when doing regular expression
-operations. Thus, if
-.B IGNORECASE
-is not equal to zero,
-.B /aB/
-matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP,
-and \fB"AB"\fP.
-As with all AWK variables, the initial value of
-.B IGNORECASE
-is zero, so all regular expression operations are normally case-sensitive.
-.TP
-.B NF
-The number of fields in the current input record.
-.TP
-.B NR
-The total number of input records seen so far.
-.TP
-.B OFMT
-The output format for numbers, \fB"%.6g"\fR, by default.
-.TP
-.B OFS
-The output field separator, a blank by default.
-.TP
-.B ORS
-The output record separator, by default a newline.
-.TP
-.B RS
-The input record separator, by default a newline.
-.B RS
-is exceptional in that only the first character of its string
-value is used for separating records.
-(This will probably change in a future release of
-.IR awk .)
-If
-.B RS
-is set to the null string, then records are separated by
-blank lines.
-When
-.B RS
-is set to the null string, then the newline character always acts as
-a field separator, in addition to whatever value
-.B FS
-may have.
-.TP
-.B RSTART
-The index of the first character matched by
-.BR match() ;
-0 if no match.
-.TP
-.B RLENGTH
-The length of the string matched by
-.BR match() ;
-\-1 if no match.
-.TP
-.B SUBSEP
-The character used to separate multiple subscripts in array
-elements, by default \fB"\e034"\fR.
-.SS Arrays
-.PP
-Arrays are subscripted with an expression between square brackets
-.RB ( [ " and " ] ).
-If the expression is an expression list
-.RI ( expr ", " expr " ...)"
-then the array subscript is a string consisting of the
-concatenation of the (string) value of each expression,
-separated by the value of the
-.B SUBSEP
-variable.
-This facility is used to simulate multiply dimensioned
-arrays. For example:
-.PP
-.RS
-.ft B
-i = "A" ;\^ j = "B" ;\^ k = "C"
-.br
-x[i, j, k] = "hello, world\en"
-.ft R
-.RE
-.PP
-assigns the string \fB"hello, world\en"\fR to the element of the array
-.B x
-which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in AWK
-are associative, i.e. indexed by string values.
-.PP
-The special operator
-.B in
-may be used in an
-.B if
-or
-.B while
-statement to see if an array has an index consisting of a particular
-value.
-.PP
-.RS
-.ft B
-.nf
-if (val in array)
- print array[val]
-.fi
-.ft
-.RE
-.PP
-If the array has multiple subscripts, use
-.BR "(i, j) in array" .
-.PP
-The
-.B in
-construct may also be used in a
-.B for
-loop to iterate over all the elements of an array.
-.PP
-An element may be deleted from an array using the
-.B delete
-statement.
-The
-.B delete
-statement may also be used to delete the entire contents of an array.
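-.PP
-For example, inside an action (a sketch):
-.RS
-.ft B
-.nf
-delete array[val]    # remove the element indexed by val
-delete array         # remove every element of the array
-.fi
-.ft R
-.RE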
-.SS Variable Typing And Conversion
-.PP
-Variables and fields
-may be (floating point) numbers, or strings, or both. How the
-value of a variable is interpreted depends upon its context. If used in
-a numeric expression, it will be treated as a number; if used as a string,
-it will be treated as a string.
-.PP
-To force a variable to be treated as a number, add 0 to it; to force it
-to be treated as a string, concatenate it with the null string.
-.PP
-When a string must be converted to a number, the conversion is accomplished
-using
-.IR atof (3).
-A number is converted to a string by using the value of
-.B CONVFMT
-as a format string for
-.IR sprintf (3),
-with the numeric value of the variable as the argument.
-However, even though all numbers in AWK are floating-point,
-integral values are
-.I always
-converted as integers. Thus, given
-.PP
-.RS
-.ft B
-.nf
-CONVFMT = "%2.2f"
-a = 12
-b = a ""
-.fi
-.ft R
-.RE
-.PP
-the variable
-.B b
-has a string value of \fB"12"\fR and not \fB"12.00"\fR.
-.PP
-.I Gawk
-performs comparisons as follows:
-If two variables are numeric, they are compared numerically.
-If one value is numeric and the other has a string value that is a
-``numeric string,'' then comparisons are also done numerically.
-Otherwise, the numeric value is converted to a string and a string
-comparison is performed.
-Two strings are compared, of course, as strings.
-According to the \*(PX standard, even if two strings are
-numeric strings, a numeric comparison is performed. However, this is
-clearly incorrect, and
-.I awk
-does not do this.
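-.PP
-As a short sketch of these rules, on the input line \fB"10 9"\fR the program
-.RS
-.ft B
-.nf
-{ print ($1 > $2), ($1 "" > $2 "") }
-.fi
-.ft R
-.RE
-.PP
-prints \fB1 0\fR:
-the fields are numeric strings, so the first comparison is done numerically,
-while concatenating the null string forces the second comparison to be done
-as strings, where \fB"10"\fR sorts before \fB"9"\fR.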
-.PP
-Uninitialized variables have the numeric value 0 and the string value ""
-(the null, or empty, string).
-.SH PATTERNS AND ACTIONS
-AWK is a line oriented language. The pattern comes first, and then the
-action. Action statements are enclosed in
-.B {
-and
-.BR } .
-Either the pattern may be missing, or the action may be missing, but,
-of course, not both. If the pattern is missing, the action will be
-executed for every single line of input.
-A missing action is equivalent to
-.RS
-.PP
-.B "{ print }"
-.RE
-.PP
-which prints the entire line.
-.PP
-Comments begin with the ``#'' character, and continue until the
-end of the line.
-Blank lines may be used to separate statements.
-Normally, a statement ends with a newline; however, this is not the
-case for lines ending in
-a ``,'', ``{'', ``?'', ``:'', ``&&'', or ``||''.
-Lines ending in
-.B do
-or
-.B else
-also have their statements automatically continued on the following line.
-In other cases, a line can be continued by ending it with a ``\e'',
-in which case the newline will be ignored.
-.PP
-Multiple statements may
-be put on one line by separating them with a ``;''.
-This applies to both the statements within the action part of a
-pattern-action pair (the usual case),
-and to the pattern-action statements themselves.
-.SS Patterns
-AWK patterns may be one of the following:
-.PP
-.RS
-.nf
-.B BEGIN
-.B END
-.BI / "regular expression" /
-.I "relational expression"
-.IB pattern " && " pattern
-.IB pattern " || " pattern
-.IB pattern " ? " pattern " : " pattern
-.BI ( pattern )
-.BI ! " pattern"
-.IB pattern1 ", " pattern2
-.fi
-.RE
-.PP
-.B BEGIN
-and
-.B END
-are two special kinds of patterns which are not tested against
-the input.
-The action parts of all
-.B BEGIN
-patterns are merged as if all the statements had
-been written in a single
-.B BEGIN
-block. They are executed before any
-of the input is read. Similarly, all the
-.B END
-blocks are merged,
-and executed when all the input is exhausted (or when an
-.B exit
-statement is executed).
-.B BEGIN
-and
-.B END
-patterns cannot be combined with other patterns in pattern expressions.
-.B BEGIN
-and
-.B END
-patterns cannot have missing action parts.
-.PP
-For
-.BI / "regular expression" /
-patterns, the associated statement is executed for each input line that matches
-the regular expression.
-Regular expressions are the same as those in
-.IR egrep (1),
-and are summarized below.
-.PP
-A
-.I "relational expression"
-may use any of the operators defined below in the section on actions.
-These generally test whether certain fields match certain regular expressions.
-.PP
-The
-.BR && ,
-.BR || ,
-and
-.B !
-operators are logical AND, logical OR, and logical NOT, respectively, as in C.
-They do short-circuit evaluation, also as in C, and are used for combining
-more primitive pattern expressions. As in most languages, parentheses
-may be used to change the order of evaluation.
-.PP
-The
-.B ?\^:
-operator is like the same operator in C. If the first pattern is true
-then the pattern used for testing is the second pattern, otherwise it is
-the third. Only one of the second and third patterns is evaluated.
-.PP
-The
-.IB pattern1 ", " pattern2
-form of an expression is called a
-.IR "range pattern" .
-It matches all input records starting with a line that matches
-.IR pattern1 ,
-and continuing until a record that matches
-.IR pattern2 ,
-inclusive. It does not combine with any other sort of pattern expression.
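-.PP
-As an illustrative sketch (the marker expressions are arbitrary), the
-following prints every record from a start marker through an end marker,
-inclusive:
-.PP
-.RS
-.ft B
-.nf
-/^START$/, /^END$/	{ print }
-.fi
-.ft R
-.RE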
-.SS Regular Expressions
-Regular expressions are the extended kind found in
-.IR egrep .
-They are composed of characters as follows:
-.TP \w'\fB[^\fIabc...\fB]\fR'u+2n
-.I c
-matches the non-metacharacter
-.IR c .
-.TP
-.I \ec
-matches the literal character
-.IR c .
-.TP
-.B .
-matches any character except newline.
-.TP
-.B ^
-matches the beginning of a line or a string.
-.TP
-.B $
-matches the end of a line or a string.
-.TP
-.BI [ abc... ]
-character class, matches any of the characters
-.IR abc... .
-.TP
-.BI [^ abc... ]
-negated character class, matches any character except
-.I abc...
-and newline.
-.TP
-.IB r1 | r2
-alternation: matches either
-.I r1
-or
-.IR r2 .
-.TP
-.I r1r2
-concatenation: matches
-.IR r1 ,
-and then
-.IR r2 .
-.TP
-.IB r +
-matches one or more
-.IR r 's.
-.TP
-.IB r *
-matches zero or more
-.IR r 's.
-.TP
-.IB r ?
-matches zero or one
-.IR r 's.
-.TP
-.BI ( r )
-grouping: matches
-.IR r .
-.PP
-The escape sequences that are valid in string constants (see below)
-are also legal in regular expressions.
-.SS Actions
-Action statements are enclosed in braces,
-.B {
-and
-.BR } .
-Action statements consist of the usual assignment, conditional, and looping
-statements found in most languages. The operators, control statements,
-and input/output statements
-available are patterned after those in C.
-.SS Operators
-.PP
-The operators in AWK, in order of increasing precedence, are
-.PP
-.TP "\w'\fB*= /= %= ^=\fR'u+1n"
-.PD 0
-.B "= += \-="
-.TP
-.PD
-.B "*= /= %= ^="
-Assignment. Both absolute assignment
-.BI ( var " = " value )
-and operator-assignment (the other forms) are supported.
-.TP
-.B ?:
-The C conditional expression. This has the form
-.IB expr1 " ? " expr2 " : " expr3\c
-\&. If
-.I expr1
-is true, the value of the expression is
-.IR expr2 ,
-otherwise it is
-.IR expr3 .
-Only one of
-.I expr2
-and
-.I expr3
-is evaluated.
-.TP
-.B ||
-Logical OR.
-.TP
-.B &&
-Logical AND.
-.TP
-.B "~ !~"
-Regular expression match, negated match.
-.B NOTE:
-Do not use a constant regular expression
-.RB ( /foo/ )
-on the left-hand side of a
-.B ~
-or
-.BR !~ .
-Only use one on the right-hand side. The expression
-.BI "/foo/ ~ " exp
-has the same meaning as \fB(($0 ~ /foo/) ~ \fIexp\fB)\fR.
-This is usually
-.I not
-what was intended.
-.TP
-.PD 0
-.B "< >"
-.TP
-.PD 0
-.B "<= >="
-.TP
-.PD
-.B "!= =="
-The regular relational operators.
-.TP
-.I blank
-String concatenation.
-.TP
-.B "+ \-"
-Addition and subtraction.
-.TP
-.B "* / %"
-Multiplication, division, and modulus.
-.TP
-.B "+ \- !"
-Unary plus, unary minus, and logical negation.
-.TP
-.B ^
-Exponentiation (\fB**\fR may also be used, and \fB**=\fR for
-the assignment operator).
-.TP
-.B "++ \-\^\-"
-Increment and decrement, both prefix and postfix.
-.TP
-.B $
-Field reference.
-.SS Control Statements
-.PP
-The control statements are
-as follows:
-.PP
-.RS
-.nf
-\fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR]
-\fBwhile (\fIcondition\fB) \fIstatement \fR
-\fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR
-\fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR
-\fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR
-\fBbreak\fR
-\fBcontinue\fR
-\fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
-\fBdelete \fIarray\^\fR
-\fBexit\fR [ \fIexpression\fR ]
-\fB{ \fIstatements \fB}
-.fi
-.RE
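-.PP
-As an illustrative sketch combining several of these statements, the
-following sums the fields that look like unsigned integers:
-.PP
-.RS
-.ft B
-.nf
-{
-	for (i = 1; i <= NF; i++)
-		if ($i ~ /^[0-9]+$/)
-			sum += $i
-}
-END	{ print sum }
-.fi
-.ft R
-.RE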
-.SS "I/O Statements"
-.PP
-The input/output statements are as follows:
-.PP
-.TP "\w'\fBprintf \fIfmt, expr-list\fR'u+1n"
-.BI close( filename )
-Close file (or pipe, see below).
-.TP
-.B getline
-Set
-.B $0
-from next input record; set
-.BR NF ,
-.BR NR ,
-.BR FNR .
-.TP
-.BI "getline <" file
-Set
-.B $0
-from next record of
-.IR file ;
-set
-.BR NF .
-.TP
-.BI getline " var"
-Set
-.I var
-from next input record; set
-.BR NF ,
-.BR FNR .
-.TP
-.BI getline " var" " <" file
-Set
-.I var
-from next record of
-.IR file .
-.TP
-.B next
-Stop processing the current input record. The next input record
-is read and processing starts over with the first pattern in the
-AWK program. If the end of the input data is reached, the
-.B END
-block(s), if any, are executed.
-.TP
-.B "next file"
-Stop processing the current input file. The next input record read
-comes from the next input file.
-.B FILENAME
-is updated,
-.B FNR
-is reset to 1, and processing starts over with the first pattern in the
-AWK program. If the end of the input data is reached, the
-.B END
-block(s), if any, are executed.
-.TP
-.B print
-Prints the current record.
-.TP
-.BI print " expr-list"
-Prints expressions.
-Each expression is separated by the value of the
-.B OFS
-variable. The output record is terminated with the value of the
-.B ORS
-variable.
-.TP
-.BI print " expr-list" " >" file
-Prints expressions on
-.IR file .
-Each expression is separated by the value of the
-.B OFS
-variable. The output record is terminated with the value of the
-.B ORS
-variable.
-.TP
-.BI printf " fmt, expr-list"
-Format and print.
-.TP
-.BI printf " fmt, expr-list" " >" file
-Format and print on
-.IR file .
-.TP
-.BI system( cmd-line )
-Execute the command
-.IR cmd-line ,
-and return the exit status.
-(This may not be available on non-\*(PX systems.)
-.PP
-Other input/output redirections are also allowed. For
-.B print
-and
-.BR printf ,
-.BI >> file
-appends output to the
-.IR file ,
-while
-.BI | " command"
-writes on a pipe.
-In a similar fashion,
-.IB command " | getline"
-pipes into
-.BR getline .
-The
-.BR getline
-command will return 0 on end of file, and \-1 on an error.
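-.PP
-For example, the output of a command (the command shown is arbitrary)
-might be read a line at a time, checking the return value of
-.BR getline :
-.PP
-.RS
-.ft B
-.nf
-while (("ls -l" | getline line) > 0)
-	print line
-close("ls -l")
-.fi
-.ft R
-.RE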
-.SS The \fIprintf\fP\^ Statement
-.PP
-The AWK versions of the
-.B printf
-statement and
-.B sprintf()
-function
-(see below)
-accept the following conversion specification formats:
-.TP
-.B %c
-An \s-1ASCII\s+1 character.
-If the argument used for
-.B %c
-is numeric, it is treated as a character and printed.
-Otherwise, the argument is assumed to be a string, and only the first
-character of that string is printed.
-.TP
-.B %d
-A decimal number (the integer part).
-.TP
-.B %i
-Just like
-.BR %d .
-.TP
-.B %e
-A floating point number of the form
-.BR [\-]d.ddddddE[+\^\-]dd .
-.TP
-.B %f
-A floating point number of the form
-.BR [\-]ddd.dddddd .
-.TP
-.B %g
-Use
-.B e
-or
-.B f
-conversion, whichever is shorter, with nonsignificant zeros suppressed.
-.TP
-.B %o
-An unsigned octal number (again, an integer).
-.TP
-.B %s
-A character string.
-.TP
-.B %x
-An unsigned hexadecimal number (an integer).
-.TP
-.B %X
-Like
-.BR %x ,
-but using
-.B ABCDEF
-instead of
-.BR abcdef .
-.TP
-.B %%
-A single
-.B %
-character; no argument is converted.
-.PP
-There are optional, additional parameters that may lie between the
-.B %
-and the control letter:
-.TP
-.B \-
-The expression should be left-justified within its field.
-.TP
-.I width
-The field should be padded to this width. If the width has a leading
-zero, then the field will be padded with zeros.
-Otherwise it is padded with blanks.
-This applies even to the non-numeric output formats.
-.TP
-.BI . prec
-A number indicating the maximum width of strings or digits to the right
-of the decimal point.
-.PP
-The dynamic
-.I width
-and
-.I prec
-capabilities of the \*(AN C
-.B printf()
-routines are supported.
-A
-.B *
-in place of either the
-.B width
-or
-.B prec
-specifications will cause their values to be taken from
-the argument list to
-.B printf
-or
-.BR sprintf() .
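-.PP
-For example, the following illustrative statement prints the value 3.14159
-right-justified in a field ten characters wide, with two digits after the
-decimal point:
-.PP
-.RS
-.ft B
-printf "%*.*f\en", 10, 2, 3.14159
-.ft R
-.RE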
-.SS Special File Names
-.PP
-When doing I/O redirection from either
-.B print
-or
-.B printf
-into a file,
-or via
-.B getline
-from a file,
-.I awk
-recognizes certain special filenames internally. These filenames
-allow access to open file descriptors inherited from
-.IR awk 's
-parent process (usually the shell).
-Other special filenames provide access to information about the running
-.B awk
-process.
-The filenames are:
-.TP \w'\fB/dev/stdout\fR'u+1n
-.B /dev/pid
-Reading this file returns the process ID of the current process,
-in decimal, terminated with a newline.
-.TP
-.B /dev/ppid
-Reading this file returns the parent process ID of the current process,
-in decimal, terminated with a newline.
-.TP
-.B /dev/pgrpid
-Reading this file returns the process group ID of the current process,
-in decimal, terminated with a newline.
-.TP
-.B /dev/user
-Reading this file returns a single record terminated with a newline.
-The fields are separated with blanks.
-.B $1
-is the value of the
-.IR getuid (2)
-system call,
-.B $2
-is the value of the
-.IR geteuid (2)
-system call,
-.B $3
-is the value of the
-.IR getgid (2)
-system call, and
-.B $4
-is the value of the
-.IR getegid (2)
-system call.
-If there are any additional fields, they are the group IDs returned by
-.IR getgroups (2).
-Multiple groups may not be supported on all systems.
-.TP
-.B /dev/stdin
-The standard input.
-.TP
-.B /dev/stdout
-The standard output.
-.TP
-.B /dev/stderr
-The standard error output.
-.TP
-.BI /dev/fd/\^ n
-The file associated with the open file descriptor
-.IR n .
-.PP
-These are particularly useful for error messages. For example:
-.PP
-.RS
-.ft B
-print "You blew it!" > "/dev/stderr"
-.ft R
-.RE
-.PP
-whereas you would otherwise have to use
-.PP
-.RS
-.ft B
-print "You blew it!" | "cat 1>&2"
-.ft R
-.RE
-.PP
-These file names may also be used on the command line to name data files.
-.SS Numeric Functions
-.PP
-AWK has the following pre-defined arithmetic functions:
-.PP
-.TP \w'\fBsrand(\^\fIexpr\^\fB)\fR'u+1n
-.BI atan2( y , " x" )
-returns the arctangent of
-.I y/x
-in radians.
-.TP
-.BI cos( expr )
-returns the cosine of
-.IR expr ,
-which is in radians.
-.TP
-.BI exp( expr )
-the exponential function.
-.TP
-.BI int( expr )
-truncates to integer.
-.TP
-.BI log( expr )
-the natural logarithm function.
-.TP
-.B rand()
-returns a random number between 0 and 1.
-.TP
-.BI sin( expr )
-returns the sine of
-.IR expr ,
-which is in radians.
-.TP
-.BI sqrt( expr )
-the square root function.
-.TP
-.BI srand( expr )
-use
-.I expr
-as a new seed for the random number generator. If no
-.I expr
-is provided, the time of day will be used.
-The return value is the previous seed for the random
-number generator.
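-.PP
-As a small illustrative sketch using these functions, the following
-simulates one roll of a six-sided die:
-.PP
-.RS
-.ft B
-.nf
-BEGIN {
-	srand()
-	print int(rand() * 6) + 1
-}
-.fi
-.ft R
-.RE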
-.SS String Functions
-.PP
-AWK has the following pre-defined string functions:
-.PP
-.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
-\fBgsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR
-for each substring matching the regular expression
-.I r
-in the string
-.IR t ,
-substitute the string
-.IR s ,
-and return the number of substitutions.
-If
-.I t
-is not supplied, use
-.BR $0 .
-.TP
-.BI index( s , " t" )
-returns the index of the string
-.I t
-in the string
-.IR s ,
-or 0 if
-.I t
-is not present.
-.TP
-.BI length( s )
-returns the length of the string
-.IR s ,
-or the length of
-.B $0
-if
-.I s
-is not supplied.
-.TP
-.BI match( s , " r" )
-returns the position in
-.I s
-where the regular expression
-.I r
-occurs, or 0 if
-.I r
-is not present, and sets the values of
-.B RSTART
-and
-.BR RLENGTH .
-.TP
-\fBsplit(\fIs\fB, \fIa\fB, \fIr\fB)\fR
-splits the string
-.I s
-into the array
-.I a
-on the regular expression
-.IR r ,
-and returns the number of fields. If
-.I r
-is omitted,
-.B FS
-is used instead.
-The array
-.I a
-is cleared first.
-.TP
-.BI sprintf( fmt , " expr-list" )
-prints
-.I expr-list
-according to
-.IR fmt ,
-and returns the resulting string.
-.TP
-\fBsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR
-just like
-.BR gsub() ,
-but only the first matching substring is replaced.
-.TP
-\fBsubstr(\fIs\fB, \fIi\fB, \fIn\fB)\fR
-returns the
-.IR n -character
-substring of
-.I s
-starting at
-.IR i .
-If
-.I n
-is omitted, the rest of
-.I s
-is used.
-.TP
-.BI tolower( str )
-returns a copy of the string
-.IR str ,
-with all the upper-case characters in
-.I str
-translated to their corresponding lower-case counterparts.
-Non-alphabetic characters are left unchanged.
-.TP
-.BI toupper( str )
-returns a copy of the string
-.IR str ,
-with all the lower-case characters in
-.I str
-translated to their corresponding upper-case counterparts.
-Non-alphabetic characters are left unchanged.
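-.PP
-As an illustrative sketch combining several of these functions (the sample
-string is arbitrary):
-.PP
-.RS
-.ft B
-.nf
-line = "alpha:beta:gamma"
-n = split(line, parts, ":")
-print n, toupper(parts[1]), substr(line, 7, 4)	# prints ``3 ALPHA beta''
-.fi
-.ft R
-.RE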
-.SS Time Functions
-.PP
-Since one of the primary uses of AWK programs is processing log files
-that contain time stamp information,
-.I awk
-provides the following two functions for obtaining time stamps and
-formatting them.
-.PP
-.TP "\w'\fBsystime()\fR'u+1n"
-.B systime()
-returns the current time of day as the number of seconds since the Epoch
-(Midnight UTC, January 1, 1970 on \*(PX systems).
-.TP
-\fBstrftime(\fIformat\fR, \fItimestamp\fB)\fR
-formats
-.I timestamp
-according to the specification in
-.IR format .
-The
-.I timestamp
-should be of the same form as returned by
-.BR systime() .
-If
-.I timestamp
-is missing, the current time of day is used.
-See the specification for the
-.B strftime()
-function in \*(AN C for the format conversions that are
-guaranteed to be available.
-A public-domain version of
-.IR strftime (3)
-and a man page for it are shipped with
-.IR awk ;
-if that version was used to build
-.IR awk ,
-then all of the conversions described in that man page are available to
-.IR awk .
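-.PP
-For example, an illustrative time stamp in a fixed format might be
-produced with:
-.PP
-.RS
-.ft B
-BEGIN { print strftime("%Y-%m-%d %H:%M:%S", systime()) }
-.ft R
-.RE
-.PP
-The conversions used here are among those guaranteed by \*(AN C.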
-.SS String Constants
-.PP
-String constants in AWK are sequences of characters enclosed
-between double quotes (\fB"\fR). Within strings, certain
-.I "escape sequences"
-are recognized, as in C. These are:
-.PP
-.TP \w'\fB\e\^\fIddd\fR'u+1n
-.B \e\e
-A literal backslash.
-.TP
-.B \ea
-The ``alert'' character; usually the \s-1ASCII\s+1 \s-1BEL\s+1 character.
-.TP
-.B \eb
-backspace.
-.TP
-.B \ef
-form-feed.
-.TP
-.B \en
-new line.
-.TP
-.B \er
-carriage return.
-.TP
-.B \et
-horizontal tab.
-.TP
-.B \ev
-vertical tab.
-.TP
-.BI \ex "\^hex digits"
-The character represented by the string of hexadecimal digits following
-the
-.BR \ex .
-As in \*(AN C, all following hexadecimal digits are considered part of
-the escape sequence.
-(This feature should tell us something about language design by committee.)
-E.g., \fB"\ex1B"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
-.TP
-.BI \e ddd
-The character represented by the 1-, 2-, or 3-digit sequence of octal
-digits. E.g. \fB"\e033"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
-.TP
-.BI \e c
-The literal character
-.IR c\^ .
-.PP
-The escape sequences may also be used inside constant regular expressions
-(e.g.,
-.B "/[\ \et\ef\en\er\ev]/"
-matches whitespace characters).
-.SH FUNCTIONS
-Functions in AWK are defined as follows:
-.PP
-.RS
-\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR
-.RE
-.PP
-Functions are executed when called from within the action parts of regular
-pattern-action statements. Actual parameters supplied in the function
-call are used to instantiate the formal parameters declared in the function.
-Arrays are passed by reference; other variables are passed by value.
-.PP
-Since functions were not originally part of the AWK language, the provision
-for local variables is rather clumsy: They are declared as extra parameters
-in the parameter list. The convention is to separate local variables from
-real parameters by extra spaces in the parameter list. For example:
-.PP
-.RS
-.ft B
-.nf
-function f(p, q, a, b) { # a & b are local
- ..... }
-
-/abc/ { ... ; f(1, 2) ; ... }
-.fi
-.ft R
-.RE
-.PP
-The left parenthesis in a function call is required
-to immediately follow the function name,
-without any intervening white space.
-This is to avoid a syntactic ambiguity with the concatenation operator.
-This restriction does not apply to the built-in functions listed above.
-.PP
-Functions may call each other and may be recursive.
-Function parameters used as local variables are initialized
-to the null string and the number zero upon function invocation.
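-.PP
-As an illustrative sketch of a recursive user-defined function:
-.PP
-.RS
-.ft B
-.nf
-function fact(n)
-{
-	return n <= 1 ? 1 : n * fact(n \- 1)
-}
-
-BEGIN	{ print fact(5) }	# prints 120
-.fi
-.ft R
-.RE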
-.PP
-The word
-.B func
-may be used in place of
-.BR function .
-.SH EXAMPLES
-.nf
-Print and sort the login names of all users:
-
-.ft B
- BEGIN { FS = ":" }
- { print $1 | "sort" }
-
-.ft R
-Count lines in a file:
-
-.ft B
- { nlines++ }
- END { print nlines }
-
-.ft R
-Precede each line by its number in the file:
-
-.ft B
- { print FNR, $0 }
-
-.ft R
-Concatenate and line number (a variation on a theme):
-
-.ft B
- { print NR, $0 }
-.ft R
-.fi
-.SH SEE ALSO
-.IR egrep (1),
-.IR getpid (2),
-.IR getppid (2),
-.IR getpgrp (2),
-.IR getuid (2),
-.IR geteuid (2),
-.IR getgid (2),
-.IR getegid (2),
-.IR getgroups (2)
-.PP
-.IR "The AWK Programming Language" ,
-Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
-Addison-Wesley, 1988. ISBN 0-201-07981-X.
-.PP
-.IR "The GAWK Manual" ,
-Edition 0.15, published by the Free Software Foundation, 1993.
-.SH POSIX COMPATIBILITY
-A primary goal for
-.I awk
-is compatibility with the \*(PX standard, as well as with the
-latest version of \*(UX
-.IR awk .
-To this end,
-.I awk
-incorporates the following user-visible
-features which are not described in the AWK book,
-but are part of
-.I awk
-in System V Release 4, and are in the \*(PX standard.
-.PP
-The
-.B \-v
-option for assigning variables before program execution starts is new.
-The book indicates that command line variable assignment happens when
-.I awk
-would otherwise open the argument as a file, which is after the
-.B BEGIN
-block is executed. However, in earlier implementations, when such an
-assignment appeared before any file names, the assignment would happen
-.I before
-the
-.B BEGIN
-block was run. Applications came to depend on this ``feature.''
-When
-.I awk
-was changed to match its documentation, this option was added to
-accommodate applications that depended upon the old behavior.
-(This feature was agreed upon by both the AT&T and GNU developers.)
-.PP
-The
-.B \-W
-option for implementation specific features is from the \*(PX standard.
-.PP
-When processing arguments,
-.I awk
-uses the special option ``\fB\-\^\-\fP'' to signal the end of
-arguments.
-In compatibility mode, it will warn about, but otherwise ignore,
-undefined options.
-In normal operation, such arguments are passed on to the AWK program for
-it to process.
-.PP
-The AWK book does not define the return value of
-.BR srand() .
-The System V Release 4 version of \*(UX
-.I awk
-(and the \*(PX standard)
-has it return the seed it was using, to allow keeping track
-of random number sequences. Therefore
-.B srand()
-in
-.I awk
-also returns its current seed.
-.PP
-Other new features are:
-The use of multiple
-.B \-f
-options (from MKS
-.IR awk );
-the
-.B ENVIRON
-array; the
-.BR \ea ,
-and
-.BR \ev
-escape sequences (done originally in
-.I awk
-and fed back into AT&T's); the
-.B tolower()
-and
-.B toupper()
-built-in functions (from AT&T); and the \*(AN C conversion specifications in
-.B printf
-(done first in AT&T's version).
-.SH GNU EXTENSIONS
-.I Gawk
-has some extensions to \*(PX
-.IR awk .
-They are described in this section. All the extensions described here
-can be disabled by
-invoking
-.I awk
-with the
-.B "\-W compat"
-option.
-.PP
-The following features of
-.I awk
-are not available in
-\*(PX
-.IR awk .
-.RS
-.TP \w'\(bu'u+1n
-\(bu
-The
-.B \ex
-escape sequence.
-.TP
-\(bu
-The
-.B systime()
-and
-.B strftime()
-functions.
-.TP
-\(bu
-The special file names available for I/O redirection are not recognized.
-.TP
-\(bu
-The
-.B ARGIND
-and
-.B ERRNO
-variables are not special.
-.TP
-\(bu
-The
-.B IGNORECASE
-variable and its side-effects are not available.
-.TP
-\(bu
-The
-.B FIELDWIDTHS
-variable and fixed width field splitting.
-.TP
-\(bu
-No path search is performed for files named via the
-.B \-f
-option. Therefore the
-.B AWKPATH
-environment variable is not special.
-.TP
-\(bu
-The use of
-.B "next file"
-to abandon processing of the current input file.
-.TP
-\(bu
-The use of
-.BI delete " array"
-to delete the entire contents of an array.
-.RE
-.PP
-The AWK book does not define the return value of the
-.B close()
-function.
-.IR Gawk\^ 's
-.B close()
-returns the value from
-.IR fclose (3),
-or
-.IR pclose (3),
-when closing a file or pipe, respectively.
-.PP
-When
-.I awk
-is invoked with the
-.B "\-W compat"
-option,
-if the
-.I fs
-argument to the
-.B \-F
-option is ``t'', then
-.B FS
-will be set to the tab character.
-Since this is a rather ugly special case, it is not the default behavior.
-This behavior also does not occur if
-.B "\-W posix"
-has been specified.
-.ig
-.PP
-If
-.I awk
-was compiled for debugging, it will
-accept the following additional options:
-.TP
-.PD 0
-.B \-Wparsedebug
-.TP
-.PD
-.B \-\^\-parsedebug
-Turn on
-.IR yacc (1)
-or
-.IR bison (1)
-debugging output during program parsing.
-This option should only be of interest to the
-.I awk
-maintainers, and may not even be compiled into
-.IR awk .
-..
-.SH HISTORICAL FEATURES
-There are two features of historical AWK implementations that
-.I awk
-supports.
-First, it is possible to call the
-.B length()
-built-in function not only with no argument, but even without parentheses!
-Thus,
-.RS
-.PP
-.ft B
-a = length
-.ft R
-.RE
-.PP
-is the same as either of
-.RS
-.PP
-.ft B
-a = length()
-.br
-a = length($0)
-.ft R
-.RE
-.PP
-This feature is marked as ``deprecated'' in the \*(PX standard, and
-.I awk
-will issue a warning about its use if
-.B "\-W lint"
-is specified on the command line.
-.PP
-The other feature is the use of the
-.B continue
-statement outside the body of a
-.BR while ,
-.BR for ,
-or
-.B do
-loop. Traditional AWK implementations have treated such usage as
-equivalent to the
-.B next
-statement.
-.I Gawk
-will support this usage if
-.B "\-W posix"
-has not been specified.
-.SH ENVIRONMENT VARIABLES
-If
-.B POSIXLY_CORRECT
-exists in the environment, then
-.I awk
-behaves exactly as if
-.B \-\-posix
-had been specified on the command line.
-If
-.B \-\-lint
-has been specified,
-.I awk
-will issue a warning message to this effect.
-.SH BUGS
-The
-.B \-F
-option is not necessary given the command line variable assignment feature;
-it remains only for backwards compatibility.
-.PP
-If your system actually has support for
-.B /dev/fd
-and the associated
-.BR /dev/stdin ,
-.BR /dev/stdout ,
-and
-.B /dev/stderr
-files, you may get different output from
-.I awk
-than you would get on a system without those files. When
-.I awk
-interprets these files internally, it synchronizes output to the standard
-output with output to
-.BR /dev/stdout ,
-while on a system with those files, the output is actually to different
-open files.
-Caveat Emptor.
-.SH VERSION INFORMATION
-This man page documents
-.IR awk ,
-version 2.15.
-.PP
-Starting with the 2.15 version of
-.IR awk ,
-the
-.BR \-c ,
-.BR \-V ,
-.BR \-C ,
-.ig
-.BR \-D ,
-..
-.BR \-a ,
-and
-.B \-e
-options of the 2.11 version are no longer recognized.
-This fact will not even be documented in the manual page for version 2.16.
-.SH AUTHORS
-The original version of \*(UX
-.I awk
-was designed and implemented by Alfred Aho,
-Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan
-continues to maintain and enhance it.
-.PP
-Paul Rubin and Jay Fenlason,
-of the Free Software Foundation, wrote
-.IR gawk ,
-to be compatible with the original version of
-.I awk
-distributed in Seventh Edition \*(UX.
-John Woods contributed a number of bug fixes.
-David Trueman, with contributions
-from Arnold Robbins, made
-.I gawk
-compatible with the new version of \*(UX
-.IR awk .
-.PP
-The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
-Scott Deifik is the current DOS maintainer. Pat Rankin did the
-port to VMS, and Michal Jaegermann did the port to the Atari ST.
-The port to OS/2 was done by Kai Uwe Rommel, with contributions and
-help from Darrel Hankerson.
-.SH ACKNOWLEDGEMENTS
-Brian Kernighan of Bell Labs
-provided valuable assistance during testing and debugging.
-We thank him.
diff --git a/gnu/usr.bin/awk/awk.h b/gnu/usr.bin/awk/awk.h
deleted file mode 100644
index 453a724..0000000
--- a/gnu/usr.bin/awk/awk.h
+++ /dev/null
@@ -1,790 +0,0 @@
-/*
- * awk.h -- Definitions for gawk.
- */
-
-/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-/* ------------------------------ Includes ------------------------------ */
-#include "config.h"
-
-#include <stdio.h>
-#ifndef LIMITS_H_MISSING
-#include <limits.h>
-#endif
-#include <ctype.h>
-#include <setjmp.h>
-#include <varargs.h>
-#include <time.h>
-#include <errno.h>
-#if !defined(errno) && !defined(MSDOS) && !defined(OS2)
-extern int errno;
-#endif
-#ifdef __GNU_LIBRARY__
-#ifndef linux
-#include <signum.h>
-#endif
-#endif
-
-/* ----------------- System dependencies (with more includes) -----------*/
-
-#if defined(__FreeBSD__)
-# include <floatingpoint.h>
-#endif
-
-#if !defined(VMS) || (!defined(VAXC) && !defined(__DECC))
-#include <sys/types.h>
-#include <sys/stat.h>
-#else /* VMS w/ VAXC or DECC */
-#include <types.h>
-#include <stat.h>
-#include <file.h> /* avoid <fcntl.h> in io.c */
-#endif
-
-#include <signal.h>
-
-#ifdef __STDC__
-#define P(s) s
-#define MALLOC_ARG_T size_t
-#else
-#define P(s) ()
-#define MALLOC_ARG_T unsigned
-#define volatile
-#define const
-#endif
-
-#ifndef SIGTYPE
-#define SIGTYPE void
-#endif
-
-#ifdef SIZE_T_MISSING
-typedef unsigned int size_t;
-#endif
-
-#ifndef SZTC
-#define SZTC
-#define INTC
-#endif
-
-#ifdef STDC_HEADERS
-#include <stdlib.h>
-#include <string.h>
-#ifdef NeXT
-#include <libc.h>
-#undef atof
-#else
-#if defined(atarist) || defined(VMS)
-#include <unixlib.h>
-#else /* atarist || VMS */
-#if !defined(MSDOS) && !defined(_MSC_VER)
-#include <unistd.h>
-#endif /* MSDOS */
-#endif /* atarist || VMS */
-#endif /* Next */
-#else /* STDC_HEADERS */
-#include "protos.h"
-#endif /* STDC_HEADERS */
-
-#if defined(ultrix) && !defined(Ultrix41)
-extern char * getenv P((char *name));
-extern double atof P((char *s));
-#endif
-
-#ifndef __GNUC__
-#ifdef sparc
-/* nasty nasty SunOS-ism */
-#include <alloca.h>
-#ifdef lint
-extern char *alloca();
-#endif
-#else /* not sparc */
-#if !defined(alloca) && !defined(ALLOCA_PROTO)
-#if defined(_MSC_VER)
-#include <malloc.h>
-#else
-extern char *alloca();
-#endif /* _MSC_VER */
-#endif
-#endif /* sparc */
-#endif /* __GNUC__ */
-
-#ifdef HAVE_UNDERSCORE_SETJMP
-/* nasty nasty berkelixm */
-#define setjmp _setjmp
-#define longjmp _longjmp
-#endif
-
-/*
- * if you don't have vprintf, try this and cross your fingers.
- */
-#if defined(VPRINTF_MISSING)
-#define vfprintf(fp,fmt,arg) _doprnt((fmt), (arg), (fp))
-#endif
-
-#ifdef VMS
-/* some macros to redirect to code in vms/vms_misc.c */
-#define exit vms_exit
-#define open vms_open
-#define strerror vms_strerror
-#define strdup vms_strdup
-extern void exit P((int));
-extern int open P((const char *,int,...));
-extern char *strerror P((int));
-extern char *strdup P((const char *str));
-extern int vms_devopen P((const char *,int));
-# ifndef NO_TTY_FWRITE
-#define fwrite tty_fwrite
-#define fclose tty_fclose
-extern size_t fwrite P((const void *,size_t,size_t,FILE *));
-extern int fclose P((FILE *));
-# endif
-extern FILE *popen P((const char *,const char *));
-extern int pclose P((FILE *));
-extern void vms_arg_fixup P((int *,char ***));
-/* some things not in STDC_HEADERS */
-extern size_t gnu_strftime P((char *,size_t,const char *,const struct tm *));
-extern int unlink P((const char *));
-extern int getopt P((int,char **,char *));
-extern int isatty P((int));
-#ifndef fileno
-extern int fileno P((FILE *));
-#endif
-extern int close(), dup(), dup2(), fstat(), read(), stat();
-extern int getpgrp P((void));
-#endif /*VMS*/
-
-#define GNU_REGEX
-#ifdef GNU_REGEX
-#include "gnuregex.h"
-#include "dfa.h"
-typedef struct Regexp {
- struct re_pattern_buffer pat;
- struct re_registers regs;
- struct dfa dfareg;
- int dfa;
-} Regexp;
-#define RESTART(rp,s) (rp)->regs.start[0]
-#define REEND(rp,s) (rp)->regs.end[0]
-#else /* GNU_REGEX */
-#endif /* GNU_REGEX */
-
-#ifdef atarist
-#define read _text_read /* we do not want all these CR's to mess our input */
-extern int _text_read (int, char *, int);
-#ifndef __MINT__
-#undef NGROUPS_MAX
-#endif /* __MINT__ */
-#endif
-
-#ifndef DEFPATH
-#define DEFPATH ".:/usr/local/lib/awk:/usr/lib/awk"
-#endif
-
-#ifndef ENVSEP
-#define ENVSEP ':'
-#endif
-
-extern double double_to_int P((double d));
-
-/* ------------------ Constants, Structures, Typedefs ------------------ */
-#define AWKNUM double
-
-typedef enum {
- /* illegal entry == 0 */
- Node_illegal,
-
- /* binary operators lnode and rnode are the expressions to work on */
- Node_times,
- Node_quotient,
- Node_mod,
- Node_plus,
- Node_minus,
- Node_cond_pair, /* conditional pair (see Node_line_range) */
- Node_subscript,
- Node_concat,
- Node_exp,
-
- /* unary operators subnode is the expression to work on */
-/*10*/ Node_preincrement,
- Node_predecrement,
- Node_postincrement,
- Node_postdecrement,
- Node_unary_minus,
- Node_field_spec,
-
- /* assignments lnode is the var to assign to, rnode is the exp */
- Node_assign,
- Node_assign_times,
- Node_assign_quotient,
- Node_assign_mod,
-/*20*/ Node_assign_plus,
- Node_assign_minus,
- Node_assign_exp,
-
- /* boolean binaries lnode and rnode are expressions */
- Node_and,
- Node_or,
-
- /* binary relationals compares lnode and rnode */
- Node_equal,
- Node_notequal,
- Node_less,
- Node_greater,
- Node_leq,
-/*30*/ Node_geq,
- Node_match,
- Node_nomatch,
-
- /* unary relationals works on subnode */
- Node_not,
-
- /* program structures */
- Node_rule_list, /* lnode is a rule, rnode is rest of list */
- Node_rule_node, /* lnode is pattern, rnode is statement */
- Node_statement_list, /* lnode is statement, rnode is more list */
- Node_if_branches, /* lnode is to run on true, rnode on false */
- Node_expression_list, /* lnode is an exp, rnode is more list */
- Node_param_list, /* lnode is a variable, rnode is more list */
-
- /* keywords */
-/*40*/ Node_K_if, /* lnode is conditional, rnode is if_branches */
- Node_K_while, /* lnode is conditional, rnode is stuff to run */
- Node_K_for, /* lnode is for_struct, rnode is stuff to run */
- Node_K_arrayfor, /* lnode is for_struct, rnode is stuff to run */
- Node_K_break, /* no subs */
- Node_K_continue, /* no stuff */
- Node_K_print, /* lnode is exp_list, rnode is redirect */
- Node_K_printf, /* lnode is exp_list, rnode is redirect */
- Node_K_next, /* no subs */
- Node_K_exit, /* subnode is return value, or NULL */
-/*50*/ Node_K_do, /* lnode is conditional, rnode stuff to run */
- Node_K_return,
- Node_K_delete,
- Node_K_getline,
- Node_K_function, /* lnode is statement list, rnode is params */
-
- /* I/O redirection for print statements */
- Node_redirect_output, /* subnode is where to redirect */
- Node_redirect_append, /* subnode is where to redirect */
- Node_redirect_pipe, /* subnode is where to redirect */
- Node_redirect_pipein, /* subnode is where to redirect */
- Node_redirect_input, /* subnode is where to redirect */
-
- /* Variables */
-/*60*/ Node_var, /* rnode is value, lnode is array stuff */
- Node_var_array, /* array is ptr to elements, asize num of
- * eles */
- Node_val, /* node is a value - type in flags */
-
- /* Builtins subnode is explist to work on, proc is func to call */
- Node_builtin,
-
- /*
- * pattern: conditional ',' conditional ; lnode of Node_line_range
- * is the two conditionals (Node_cond_pair), other word (rnode place)
- * is a flag indicating whether or not this range has been entered.
- */
- Node_line_range,
-
- /*
- * boolean test of membership in array lnode is string-valued
- * expression rnode is array name
- */
- Node_in_array,
-
- Node_func, /* lnode is param. list, rnode is body */
- Node_func_call, /* lnode is name, rnode is argument list */
-
- Node_cond_exp, /* lnode is conditional, rnode is if_branches */
- Node_regex,
-/*70*/ Node_hashnode,
- Node_ahash,
- Node_NF,
- Node_NR,
- Node_FNR,
- Node_FS,
- Node_RS,
- Node_FIELDWIDTHS,
- Node_IGNORECASE,
- Node_OFS,
- Node_ORS,
- Node_OFMT,
- Node_CONVFMT,
- Node_K_nextfile
-} NODETYPE;
-
-/*
- * NOTE - this struct is rather kludgey -- it is packed to minimize
- * space usage, at the expense of cleanliness. Alter at own risk.
- */
-typedef struct exp_node {
- union {
- struct {
- union {
- struct exp_node *lptr;
- char *param_name;
- long ll;
- } l;
- union {
- struct exp_node *rptr;
- struct exp_node *(*pptr) ();
- Regexp *preg;
- struct for_loop_header *hd;
- struct exp_node **av;
- int r_ent; /* range entered */
- } r;
- union {
- char *name;
- struct exp_node *extra;
- long xl;
- } x;
- short number;
- unsigned char reflags;
-# define CASE 1
-# define CONST 2
-# define FS_DFLT 4
- } nodep;
- struct {
- AWKNUM fltnum; /* this is here for optimal packing of
- * the structure on many machines
- */
- char *sp;
- size_t slen;
- unsigned char sref;
- char idx;
- } val;
- struct {
- struct exp_node *next;
- char *name;
- size_t length;
- struct exp_node *value;
- } hash;
-#define hnext sub.hash.next
-#define hname sub.hash.name
-#define hlength sub.hash.length
-#define hvalue sub.hash.value
- struct {
- struct exp_node *next;
- struct exp_node *name;
- struct exp_node *value;
- } ahash;
-#define ahnext sub.ahash.next
-#define ahname sub.ahash.name
-#define ahvalue sub.ahash.value
- } sub;
- NODETYPE type;
- unsigned short flags;
-# define MALLOC 1 /* can be free'd */
-# define TEMP 2 /* should be free'd */
-# define PERM 4 /* can't be free'd */
-# define STRING 8 /* assigned as string */
-# define STR 16 /* string value is current */
-# define NUM 32 /* numeric value is current */
-# define NUMBER 64 /* assigned as number */
-# define MAYBE_NUM 128 /* user input: if NUMERIC then
- * a NUMBER */
-# define ARRAYMAXED 256 /* array is at max size */
- char *vname; /* variable's name */
-} NODE;
-
-#define lnode sub.nodep.l.lptr
-#define nextp sub.nodep.l.lptr
-#define rnode sub.nodep.r.rptr
-#define source_file sub.nodep.x.name
-#define source_line sub.nodep.number
-#define param_cnt sub.nodep.number
-#define param sub.nodep.l.param_name
-
-#define subnode lnode
-#define proc sub.nodep.r.pptr
-
-#define re_reg sub.nodep.r.preg
-#define re_flags sub.nodep.reflags
-#define re_text lnode
-#define re_exp sub.nodep.x.extra
-#define re_cnt sub.nodep.number
-
-#define forsub lnode
-#define forloop rnode->sub.nodep.r.hd
-
-#define stptr sub.val.sp
-#define stlen sub.val.slen
-#define stref sub.val.sref
-#define stfmt sub.val.idx
-
-#define numbr sub.val.fltnum
-
-#define var_value lnode
-#define var_array sub.nodep.r.av
-#define array_size sub.nodep.l.ll
-#define table_size sub.nodep.x.xl
-
-#define condpair lnode
-#define triggered sub.nodep.r.r_ent
-
-#ifdef DONTDEF
-int primes[] = {31, 61, 127, 257, 509, 1021, 2053, 4099, 8191, 16381};
-#endif
-
-typedef struct for_loop_header {
- NODE *init;
- NODE *cond;
- NODE *incr;
-} FOR_LOOP_HEADER;
-
-/* for "for(iggy in foo) {" */
-struct search {
- NODE *sym;
- size_t idx;
- NODE *bucket;
- NODE *retval;
-};
-
-/* for faster input, bypass stdio */
-typedef struct iobuf {
- int fd;
- char *buf;
- char *off;
- char *end;
- size_t size; /* this will be determined by an fstat() call */
- int cnt;
- long secsiz;
- int flag;
-# define IOP_IS_TTY 1
-# define IOP_IS_INTERNAL 2
-# define IOP_NO_FREE 4
-} IOBUF;
-
-typedef void (*Func_ptr)();
-
-/*
- * structure used to dynamically maintain a linked-list of open files/pipes
- */
-struct redirect {
- unsigned int flag;
-# define RED_FILE 1
-# define RED_PIPE 2
-# define RED_READ 4
-# define RED_WRITE 8
-# define RED_APPEND 16
-# define RED_NOBUF 32
-# define RED_USED 64
-# define RED_EOF 128
- char *value;
- FILE *fp;
- IOBUF *iop;
- int pid;
- int status;
- struct redirect *prev;
- struct redirect *next;
-};
-
-/* structure for our source, either a command line string or a source file */
-struct src {
- enum srctype { CMDLINE = 1, SOURCEFILE } stype;
- char *val;
-};
-
-/* longjmp return codes, must be nonzero */
-/* Continue means either for loop/while continue, or next input record */
-#define TAG_CONTINUE 1
-/* Break means either for/while break, or stop reading input */
-#define TAG_BREAK 2
-/* Return means return from a function call; leave value in ret_node */
-#define TAG_RETURN 3
-
-#ifndef INT_MAX
-#define INT_MAX (~(1 << (sizeof (int) * 8 - 1)))
-#endif
-#ifndef LONG_MAX
-#define LONG_MAX (~(1 << (sizeof (long) * 8 - 1)))
-#endif
-#ifndef ULONG_MAX
-#define ULONG_MAX (~(unsigned long)0)
-#endif
-#ifndef LONG_MIN
-#define LONG_MIN (-LONG_MAX - 1)
-#endif
-#define HUGE INT_MAX
-
-/* -------------------------- External variables -------------------------- */
-/* gawk builtin variables */
-extern long NF;
-extern long NR;
-extern long FNR;
-extern int IGNORECASE;
-extern char *RS;
-extern char *OFS;
-extern int OFSlen;
-extern char *ORS;
-extern int ORSlen;
-extern char *OFMT;
-extern char *CONVFMT;
-extern int CONVFMTidx;
-extern int OFMTidx;
-extern NODE *FS_node, *NF_node, *RS_node, *NR_node;
-extern NODE *FILENAME_node, *OFS_node, *ORS_node, *OFMT_node;
-extern NODE *CONVFMT_node;
-extern NODE *FNR_node, *RLENGTH_node, *RSTART_node, *SUBSEP_node;
-extern NODE *IGNORECASE_node;
-extern NODE *FIELDWIDTHS_node;
-
-extern NODE **stack_ptr;
-extern NODE *Nnull_string;
-extern NODE **fields_arr;
-extern int sourceline;
-extern char *source;
-extern NODE *expression_value;
-
-extern NODE *_t; /* used as temporary in tree_eval */
-
-extern const char *myname;
-
-extern NODE *nextfree;
-extern int field0_valid;
-extern int do_unix;
-extern int do_posix;
-extern int do_lint;
-extern int in_begin_rule;
-extern int in_end_rule;
-
-/* ------------------------- Pseudo-functions ------------------------- */
-
-#define is_identchar(c) (isalnum(c) || (c) == '_')
-
-
-#ifndef MPROF
-#define getnode(n) if (nextfree) n = nextfree, nextfree = nextfree->nextp;\
- else n = more_nodes()
-#define freenode(n) ((n)->nextp = nextfree, nextfree = (n))
-#else
-#define getnode(n) emalloc(n, NODE *, sizeof(NODE), "getnode")
-#define freenode(n) free(n)
-#endif
-
-#ifdef DEBUG
-#define tree_eval(t) r_tree_eval(t)
-#define get_lhs(p, a) r_get_lhs((p), (a))
-#undef freenode
-#else
-#define get_lhs(p, a) ((p)->type == Node_var ? (&(p)->var_value) : \
- r_get_lhs((p), (a)))
-#define tree_eval(t) (_t = (t),_t == NULL ? Nnull_string : \
- (_t->type == Node_param_list ? r_tree_eval(_t) : \
- (_t->type == Node_val ? _t : \
- (_t->type == Node_var ? _t->var_value : \
- r_tree_eval(_t)))))
-#endif
-
-#define make_number(x) mk_number((x), (unsigned int)(MALLOC|NUM|NUMBER))
-#define tmp_number(x) mk_number((x), (unsigned int)(MALLOC|TEMP|NUM|NUMBER))
-
-#define free_temp(n) do {if ((n)->flags&TEMP) { unref(n); }} while (0)
-#define make_string(s,l) make_str_node((s), SZTC (l),0)
-#define SCAN 1
-#define ALREADY_MALLOCED 2
-
-#define cant_happen() fatal("internal error line %d, file: %s", \
- __LINE__, __FILE__);
-
-#if defined(__STDC__) && !defined(NO_TOKEN_PASTING)
-#define emalloc(var,ty,x,str) (void)((var=(ty)malloc((MALLOC_ARG_T)(x))) ||\
- (fatal("%s: %s: can't allocate memory (%s)",\
- (str), #var, strerror(errno)),0))
-#define erealloc(var,ty,x,str) (void)((var=(ty)realloc((char *)var,\
- (MALLOC_ARG_T)(x))) ||\
- (fatal("%s: %s: can't allocate memory (%s)",\
- (str), #var, strerror(errno)),0))
-#else /* __STDC__ */
-#define emalloc(var,ty,x,str) (void)((var=(ty)malloc((MALLOC_ARG_T)(x))) ||\
- (fatal("%s: %s: can't allocate memory (%s)",\
- (str), "var", strerror(errno)),0))
-#define erealloc(var,ty,x,str) (void)((var=(ty)realloc((char *)var,\
- (MALLOC_ARG_T)(x))) ||\
- (fatal("%s: %s: can't allocate memory (%s)",\
- (str), "var", strerror(errno)),0))
-#endif /* __STDC__ */
-
-#ifdef DEBUG
-#define force_number r_force_number
-#define force_string r_force_string
-#else /* not DEBUG */
-#ifdef lint
-extern AWKNUM force_number();
-#endif
-#ifdef MSDOS
-extern double _msc51bug;
-#define force_number(n) (_msc51bug=(_t = (n),(_t->flags & NUM) ? _t->numbr : r_force_number(_t)))
-#else /* not MSDOS */
-#define force_number(n) (_t = (n),(_t->flags & NUM) ? _t->numbr : r_force_number(_t))
-#endif /* MSDOS */
-#define force_string(s) (_t = (s),(_t->flags & STR) ? _t : r_force_string(_t))
-#endif /* not DEBUG */
-
-#define STREQ(a,b) (*(a) == *(b) && strcmp((a), (b)) == 0)
-#define STREQN(a,b,n) ((n)&& *(a)== *(b) && strncmp((a), (b), SZTC (n)) == 0)
-
-/* ------------- Function prototypes or defs (as appropriate) ------------- */
-
-/* array.c */
-extern NODE *concat_exp P((NODE *tree));
-extern void assoc_clear P((NODE *symbol));
-extern unsigned int hash P((const char *s, size_t len, unsigned long hsize));
-extern int in_array P((NODE *symbol, NODE *subs));
-extern NODE **assoc_lookup P((NODE *symbol, NODE *subs));
-extern void do_delete P((NODE *symbol, NODE *tree));
-extern void assoc_scan P((NODE *symbol, struct search *lookat));
-extern void assoc_next P((struct search *lookat));
-/* awk.tab.c */
-extern char *tokexpand P((void));
-extern char nextc P((void));
-extern NODE *node P((NODE *left, NODETYPE op, NODE *right));
-extern NODE *install P((char *name, NODE *value));
-extern NODE *lookup P((const char *name));
-extern NODE *variable P((char *name, int can_free));
-extern int yyparse P((void));
-/* builtin.c */
-extern NODE *do_exp P((NODE *tree));
-extern NODE *do_index P((NODE *tree));
-extern NODE *do_int P((NODE *tree));
-extern NODE *do_length P((NODE *tree));
-extern NODE *do_log P((NODE *tree));
-extern NODE *do_sprintf P((NODE *tree));
-extern void do_printf P((NODE *tree));
-extern void print_simple P((NODE *tree, FILE *fp));
-extern NODE *do_sqrt P((NODE *tree));
-extern NODE *do_substr P((NODE *tree));
-extern NODE *do_strftime P((NODE *tree));
-extern NODE *do_systime P((NODE *tree));
-extern NODE *do_system P((NODE *tree));
-extern void do_print P((NODE *tree));
-extern NODE *do_tolower P((NODE *tree));
-extern NODE *do_toupper P((NODE *tree));
-extern NODE *do_atan2 P((NODE *tree));
-extern NODE *do_sin P((NODE *tree));
-extern NODE *do_cos P((NODE *tree));
-extern NODE *do_rand P((NODE *tree));
-extern NODE *do_srand P((NODE *tree));
-extern NODE *do_match P((NODE *tree));
-extern NODE *do_gsub P((NODE *tree));
-extern NODE *do_sub P((NODE *tree));
-/* eval.c */
-extern int interpret P((NODE *volatile tree));
-extern NODE *r_tree_eval P((NODE *tree));
-extern int cmp_nodes P((NODE *t1, NODE *t2));
-extern NODE **r_get_lhs P((NODE *ptr, Func_ptr *assign));
-extern void set_IGNORECASE P((void));
-void set_OFS P((void));
-void set_ORS P((void));
-void set_OFMT P((void));
-void set_CONVFMT P((void));
-/* field.c */
-extern void init_fields P((void));
-extern void set_record P((char *buf, int cnt, int freeold));
-extern void reset_record P((void));
-extern void set_NF P((void));
-extern NODE **get_field P((int num, Func_ptr *assign));
-extern NODE *do_split P((NODE *tree));
-extern void set_FS P((void));
-extern void set_RS P((void));
-extern void set_FIELDWIDTHS P((void));
-/* io.c */
-extern void set_FNR P((void));
-extern void set_NR P((void));
-extern void do_input P((void));
-extern struct redirect *redirect P((NODE *tree, int *errflg));
-extern NODE *do_close P((NODE *tree));
-extern int flush_io P((void));
-extern int close_io P((void));
-extern int devopen P((const char *name, const char *mode));
-extern int pathopen P((const char *file));
-extern NODE *do_getline P((NODE *tree));
-extern void do_nextfile P((void));
-/* iop.c */
-extern int optimal_bufsize P((int fd));
-extern IOBUF *iop_alloc P((int fd));
-extern int get_a_record P((char **out, IOBUF *iop, int rs, int *errcode));
-/* main.c */
-extern int main P((int argc, char **argv));
-extern Regexp *mk_re_parse P((char *s, int ignorecase));
-extern void load_environ P((void));
-extern char *arg_assign P((char *arg));
-extern SIGTYPE catchsig P((int sig, int code));
-/* msg.c */
-extern void err P((const char *s, const char *emsg, va_list argp));
-#if _MSC_VER == 510
-extern void msg P((va_list va_alist, ...));
-extern void warning P((va_list va_alist, ...));
-extern void fatal P((va_list va_alist, ...));
-#else
-extern void msg ();
-extern void warning ();
-extern void fatal ();
-#endif
-/* node.c */
-extern AWKNUM r_force_number P((NODE *n));
-extern NODE *r_force_string P((NODE *s));
-extern NODE *dupnode P((NODE *n));
-extern NODE *mk_number P((AWKNUM x, unsigned int flags));
-extern NODE *make_str_node P((char *s, size_t len, int scan ));
-extern NODE *tmp_string P((char *s, size_t len ));
-extern NODE *more_nodes P((void));
-#ifdef DEBUG
-extern void freenode P((NODE *it));
-#endif
-extern void unref P((NODE *tmp));
-extern int parse_escape P((char **string_ptr));
-/* re.c */
-extern Regexp *make_regexp P((char *s, size_t len, int ignorecase, int dfa));
-extern int research P((Regexp *rp, char *str, int start,
- size_t len, int need_start));
-extern void refree P((Regexp *rp));
-extern void reg_error P((const char *s));
-extern Regexp *re_update P((NODE *t));
-extern void resyntax P((int syntax));
-extern void resetup P((void));
-
-/* strcase.c */
-extern int strcasecmp P((const char *s1, const char *s2));
-extern int strncasecmp P((const char *s1, const char *s2, register size_t n));
-
-#ifdef atarist
-/* atari/tmpnam.c */
-extern char *tmpnam P((char *buf));
-extern char *tempnam P((const char *path, const char *base));
-#endif
-
-/* Figure out what '\a' really is. */
-#ifdef __STDC__
-#define BELL '\a' /* sure makes life easy, don't it? */
-#else
-# if 'z' - 'a' == 25 /* ascii */
-# if 'a' != 97 /* machine is dumb enough to use mark parity */
-# define BELL '\207'
-# else
-# define BELL '\07'
-# endif
-# else
-# define BELL '\057'
-# endif
-#endif
-
-extern char casetable[]; /* for case-independent regexp matching */
diff --git a/gnu/usr.bin/awk/awk.y b/gnu/usr.bin/awk/awk.y
deleted file mode 100644
index 175cea9..0000000
--- a/gnu/usr.bin/awk/awk.y
+++ /dev/null
@@ -1,1868 +0,0 @@
-/*
- * awk.y --- yacc/bison parser
- */
-
-/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-%{
-#ifdef DEBUG
-#define YYDEBUG 12
-#endif
-
-#include "awk.h"
-
-static void yyerror (); /* va_alist */
-static char *get_src_buf P((void));
-static int yylex P((void));
-static NODE *node_common P((NODETYPE op));
-static NODE *snode P((NODE *subn, NODETYPE op, int sindex));
-static NODE *mkrangenode P((NODE *cpair));
-static NODE *make_for_loop P((NODE *init, NODE *cond, NODE *incr));
-static NODE *append_right P((NODE *list, NODE *new));
-static void func_install P((NODE *params, NODE *def));
-static void pop_var P((NODE *np, int freeit));
-static void pop_params P((NODE *params));
-static NODE *make_param P((char *name));
-static NODE *mk_rexp P((NODE *exp));
-
-static int want_assign; /* lexical scanning kludge */
-static int want_regexp; /* lexical scanning kludge */
-static int can_return; /* lexical scanning kludge */
-static int io_allowed = 1; /* lexical scanning kludge */
-static char *lexptr; /* pointer to next char during parsing */
-static char *lexend;
-static char *lexptr_begin; /* keep track of where we were for error msgs */
-static char *lexeme; /* beginning of lexeme for debugging */
-static char *thisline = NULL;
-#define YYDEBUG_LEXER_TEXT (lexeme)
-static int param_counter;
-static char *tokstart = NULL;
-static char *tok = NULL;
-static char *tokend;
-
-#define HASHSIZE 1021 /* this constant only used here */
-NODE *variables[HASHSIZE];
-
-extern char *source;
-extern int sourceline;
-extern struct src *srcfiles;
-extern int numfiles;
-extern int errcount;
-extern NODE *begin_block;
-extern NODE *end_block;
-%}
-
-%union {
- long lval;
- AWKNUM fval;
- NODE *nodeval;
- NODETYPE nodetypeval;
- char *sval;
- NODE *(*ptrval)();
-}
-
-%type <nodeval> function_prologue function_body
-%type <nodeval> rexp exp start program rule simp_exp
-%type <nodeval> non_post_simp_exp
-%type <nodeval> pattern
-%type <nodeval> action variable param_list
-%type <nodeval> rexpression_list opt_rexpression_list
-%type <nodeval> expression_list opt_expression_list
-%type <nodeval> statements statement if_statement opt_param_list
-%type <nodeval> opt_exp opt_variable regexp
-%type <nodeval> input_redir output_redir
-%type <nodetypeval> print
-%type <sval> func_name
-%type <lval> lex_builtin
-
-%token <sval> FUNC_CALL NAME REGEXP
-%token <lval> ERROR
-%token <nodeval> YNUMBER YSTRING
-%token <nodetypeval> RELOP APPEND_OP
-%token <nodetypeval> ASSIGNOP MATCHOP NEWLINE CONCAT_OP
-%token <nodetypeval> LEX_BEGIN LEX_END LEX_IF LEX_ELSE LEX_RETURN LEX_DELETE
-%token <nodetypeval> LEX_WHILE LEX_DO LEX_FOR LEX_BREAK LEX_CONTINUE
-%token <nodetypeval> LEX_PRINT LEX_PRINTF LEX_NEXT LEX_EXIT LEX_FUNCTION
-%token <nodetypeval> LEX_GETLINE
-%token <nodetypeval> LEX_IN
-%token <lval> LEX_AND LEX_OR INCREMENT DECREMENT
-%token <lval> LEX_BUILTIN LEX_LENGTH
-
-/* these are just yylval numbers */
-
-/* Lowest to highest */
-%right ASSIGNOP
-%right '?' ':'
-%left LEX_OR
-%left LEX_AND
-%left LEX_GETLINE
-%nonassoc LEX_IN
-%left FUNC_CALL LEX_BUILTIN LEX_LENGTH
-%nonassoc ','
-%nonassoc MATCHOP
-%nonassoc RELOP '<' '>' '|' APPEND_OP
-%left CONCAT_OP
-%left YSTRING YNUMBER
-%left '+' '-'
-%left '*' '/' '%'
-%right '!' UNARY
-%right '^'
-%left INCREMENT DECREMENT
-%left '$'
-%left '(' ')'
-%%
-
-start
- : opt_nls program opt_nls
- { expression_value = $2; }
- ;
-
-program
- : rule
- {
- if ($1 != NULL)
- $$ = $1;
- else
- $$ = NULL;
- yyerrok;
- }
- | program rule
- /* add the rule to the tail of list */
- {
- if ($2 == NULL)
- $$ = $1;
- else if ($1 == NULL)
- $$ = $2;
- else {
- if ($1->type != Node_rule_list)
- $1 = node($1, Node_rule_list,
- (NODE*)NULL);
- $$ = append_right ($1,
- node($2, Node_rule_list,(NODE *) NULL));
- }
- yyerrok;
- }
- | error { $$ = NULL; }
- | program error { $$ = NULL; }
- | /* empty */ { $$ = NULL; }
- ;
-
-rule
- : LEX_BEGIN { io_allowed = 0; }
- action
- {
- if (begin_block) {
- if (begin_block->type != Node_rule_list)
- begin_block = node(begin_block, Node_rule_list,
- (NODE *)NULL);
- (void) append_right (begin_block, node(
- node((NODE *)NULL, Node_rule_node, $3),
- Node_rule_list, (NODE *)NULL) );
- } else
- begin_block = node((NODE *)NULL, Node_rule_node, $3);
- $$ = NULL;
- io_allowed = 1;
- yyerrok;
- }
- | LEX_END { io_allowed = 0; }
- action
- {
- if (end_block) {
- if (end_block->type != Node_rule_list)
- end_block = node(end_block, Node_rule_list,
- (NODE *)NULL);
- (void) append_right (end_block, node(
- node((NODE *)NULL, Node_rule_node, $3),
- Node_rule_list, (NODE *)NULL));
- } else
- end_block = node((NODE *)NULL, Node_rule_node, $3);
- $$ = NULL;
- io_allowed = 1;
- yyerrok;
- }
- | LEX_BEGIN statement_term
- {
- warning("BEGIN blocks must have an action part");
- errcount++;
- yyerrok;
- }
- | LEX_END statement_term
- {
- warning("END blocks must have an action part");
- errcount++;
- yyerrok;
- }
- | pattern action
- { $$ = node ($1, Node_rule_node, $2); yyerrok; }
- | action
- { $$ = node ((NODE *)NULL, Node_rule_node, $1); yyerrok; }
- | pattern statement_term
- {
- $$ = node ($1,
- Node_rule_node,
- node(node(node(make_number(0.0),
- Node_field_spec,
- (NODE *) NULL),
- Node_expression_list,
- (NODE *) NULL),
- Node_K_print,
- (NODE *) NULL));
- yyerrok;
- }
- | function_prologue function_body
- {
- func_install($1, $2);
- $$ = NULL;
- yyerrok;
- }
- ;
-
-func_name
- : NAME
- { $$ = $1; }
- | FUNC_CALL
- { $$ = $1; }
- | lex_builtin
- {
- yyerror("%s() is a built-in function, it cannot be redefined",
- tokstart);
- errcount++;
- /* yyerrok; */
- }
- ;
-
-lex_builtin
- : LEX_BUILTIN
- | LEX_LENGTH
- ;
-
-function_prologue
- : LEX_FUNCTION
- {
- param_counter = 0;
- }
- func_name '(' opt_param_list r_paren opt_nls
- {
- $$ = append_right(make_param($3), $5);
- can_return = 1;
- }
- ;
-
-function_body
- : l_brace statements r_brace opt_semi
- {
- $$ = $2;
- can_return = 0;
- }
- ;
-
-
-pattern
- : exp
- { $$ = $1; }
- | exp ',' exp
- { $$ = mkrangenode ( node($1, Node_cond_pair, $3) ); }
- ;
-
-regexp
- /*
- * In this rule, want_regexp tells yylex that the next thing
- * is a regexp so it should read up to the closing slash.
- */
- : '/'
- { ++want_regexp; }
- REGEXP '/'
- {
- NODE *n;
- size_t len;
-
- getnode(n);
- n->type = Node_regex;
- len = strlen($3);
- n->re_exp = make_string($3, len);
- n->re_reg = make_regexp($3, len, 0, 1);
- n->re_text = NULL;
- n->re_flags = CONST;
- n->re_cnt = 1;
- $$ = n;
- }
- ;
-
-action
- : l_brace statements r_brace opt_semi opt_nls
- { $$ = $2 ; }
- | l_brace r_brace opt_semi opt_nls
- { $$ = NULL; }
- ;
-
-statements
- : statement
- { $$ = $1; }
- | statements statement
- {
- if ($1 == NULL || $1->type != Node_statement_list)
- $1 = node($1, Node_statement_list,(NODE *)NULL);
- $$ = append_right($1,
- node( $2, Node_statement_list, (NODE *)NULL));
- yyerrok;
- }
- | error
- { $$ = NULL; }
- | statements error
- { $$ = NULL; }
- ;
-
-statement_term
- : nls
- | semi opt_nls
- ;
-
-statement
- : semi opt_nls
- { $$ = NULL; }
- | l_brace r_brace
- { $$ = NULL; }
- | l_brace statements r_brace
- { $$ = $2; }
- | if_statement
- { $$ = $1; }
- | LEX_WHILE '(' exp r_paren opt_nls statement
- { $$ = node ($3, Node_K_while, $6); }
- | LEX_DO opt_nls statement LEX_WHILE '(' exp r_paren opt_nls
- { $$ = node ($6, Node_K_do, $3); }
- | LEX_FOR '(' NAME LEX_IN NAME r_paren opt_nls statement
- {
- $$ = node ($8, Node_K_arrayfor, make_for_loop(variable($3,1),
- (NODE *)NULL, variable($5,1)));
- }
- | LEX_FOR '(' opt_exp semi exp semi opt_exp r_paren opt_nls statement
- {
- $$ = node($10, Node_K_for, (NODE *)make_for_loop($3, $5, $7));
- }
- | LEX_FOR '(' opt_exp semi semi opt_exp r_paren opt_nls statement
- {
- $$ = node ($9, Node_K_for,
- (NODE *)make_for_loop($3, (NODE *)NULL, $6));
- }
- | LEX_BREAK statement_term
- /* for break, maybe we'll have to remember where to break to */
- { $$ = node ((NODE *)NULL, Node_K_break, (NODE *)NULL); }
- | LEX_CONTINUE statement_term
- /* similarly */
- { $$ = node ((NODE *)NULL, Node_K_continue, (NODE *)NULL); }
- | print '(' expression_list r_paren output_redir statement_term
- { $$ = node ($3, $1, $5); }
- | print opt_rexpression_list output_redir statement_term
- {
- if ($1 == Node_K_print && $2 == NULL)
- $2 = node(node(make_number(0.0),
- Node_field_spec,
- (NODE *) NULL),
- Node_expression_list,
- (NODE *) NULL);
-
- $$ = node ($2, $1, $3);
- }
- | LEX_NEXT opt_exp statement_term
- { NODETYPE type;
-
- if ($2 && $2 == lookup("file")) {
- if (do_lint)
- warning("`next file' is a gawk extension");
- if (do_unix || do_posix) {
- /*
- * can't use yyerror, since may have overshot
- * the source line
- */
- errcount++;
- msg("`next file' is a gawk extension");
- }
- if (! io_allowed) {
- /* same thing */
- errcount++;
- msg("`next file' used in BEGIN or END action");
- }
- type = Node_K_nextfile;
- } else {
- if (! io_allowed)
- yyerror("next used in BEGIN or END action");
- type = Node_K_next;
- }
- $$ = node ((NODE *)NULL, type, (NODE *)NULL);
- }
- | LEX_EXIT opt_exp statement_term
- { $$ = node ($2, Node_K_exit, (NODE *)NULL); }
- | LEX_RETURN
- { if (! can_return) yyerror("return used outside function context"); }
- opt_exp statement_term
- { $$ = node ($3, Node_K_return, (NODE *)NULL); }
- | LEX_DELETE NAME '[' expression_list ']' statement_term
- { $$ = node (variable($2,1), Node_K_delete, $4); }
- | LEX_DELETE NAME statement_term
- {
- if (do_lint)
- warning("`delete array' is a gawk extension");
- if (do_unix || do_posix) {
- /*
- * can't use yyerror, since may have overshot
- * the source line
- */
- errcount++;
- msg("`delete array' is a gawk extension");
- }
- $$ = node (variable($2,1), Node_K_delete, (NODE *) NULL);
- }
- | exp statement_term
- { $$ = $1; }
- ;
-
-print
- : LEX_PRINT
- { $$ = $1; }
- | LEX_PRINTF
- { $$ = $1; }
- ;
-
-if_statement
- : LEX_IF '(' exp r_paren opt_nls statement
- {
- $$ = node($3, Node_K_if,
- node($6, Node_if_branches, (NODE *)NULL));
- }
- | LEX_IF '(' exp r_paren opt_nls statement
- LEX_ELSE opt_nls statement
- { $$ = node ($3, Node_K_if,
- node ($6, Node_if_branches, $9)); }
- ;
-
-nls
- : NEWLINE
- { want_assign = 0; }
- | nls NEWLINE
- ;
-
-opt_nls
- : /* empty */
- | nls
- ;
-
-input_redir
- : /* empty */
- { $$ = NULL; }
- | '<' simp_exp
- { $$ = node ($2, Node_redirect_input, (NODE *)NULL); }
- ;
-
-output_redir
- : /* empty */
- { $$ = NULL; }
- | '>' exp
- { $$ = node ($2, Node_redirect_output, (NODE *)NULL); }
- | APPEND_OP exp
- { $$ = node ($2, Node_redirect_append, (NODE *)NULL); }
- | '|' exp
- { $$ = node ($2, Node_redirect_pipe, (NODE *)NULL); }
- ;
-
-opt_param_list
- : /* empty */
- { $$ = NULL; }
- | param_list
- { $$ = $1; }
- ;
-
-param_list
- : NAME
- { $$ = make_param($1); }
- | param_list comma NAME
- { $$ = append_right($1, make_param($3)); yyerrok; }
- | error
- { $$ = NULL; }
- | param_list error
- { $$ = NULL; }
- | param_list comma error
- { $$ = NULL; }
- ;
-
-/* optional expression, as in for loop */
-opt_exp
- : /* empty */
- { $$ = NULL; }
- | exp
- { $$ = $1; }
- ;
-
-opt_rexpression_list
- : /* empty */
- { $$ = NULL; }
- | rexpression_list
- { $$ = $1; }
- ;
-
-rexpression_list
- : rexp
- { $$ = node ($1, Node_expression_list, (NODE *)NULL); }
- | rexpression_list comma rexp
- {
- $$ = append_right($1,
- node( $3, Node_expression_list, (NODE *)NULL));
- yyerrok;
- }
- | error
- { $$ = NULL; }
- | rexpression_list error
- { $$ = NULL; }
- | rexpression_list error rexp
- { $$ = NULL; }
- | rexpression_list comma error
- { $$ = NULL; }
- ;
-
-opt_expression_list
- : /* empty */
- { $$ = NULL; }
- | expression_list
- { $$ = $1; }
- ;
-
-expression_list
- : exp
- { $$ = node ($1, Node_expression_list, (NODE *)NULL); }
- | expression_list comma exp
- {
- $$ = append_right($1,
- node( $3, Node_expression_list, (NODE *)NULL));
- yyerrok;
- }
- | error
- { $$ = NULL; }
- | expression_list error
- { $$ = NULL; }
- | expression_list error exp
- { $$ = NULL; }
- | expression_list comma error
- { $$ = NULL; }
- ;
-
-/* Expressions, not including the comma operator. */
-exp : variable ASSIGNOP
- { want_assign = 0; }
- exp
- {
- if (do_lint && $4->type == Node_regex)
- warning("Regular expression on left of assignment.");
- $$ = node ($1, $2, $4);
- }
- | '(' expression_list r_paren LEX_IN NAME
- { $$ = node (variable($5,1), Node_in_array, $2); }
- | exp '|' LEX_GETLINE opt_variable
- {
- $$ = node ($4, Node_K_getline,
- node ($1, Node_redirect_pipein, (NODE *)NULL));
- }
- | LEX_GETLINE opt_variable input_redir
- {
- if (do_lint && ! io_allowed && $3 == NULL)
- warning("non-redirected getline undefined inside BEGIN or END action");
- $$ = node ($2, Node_K_getline, $3);
- }
- | exp LEX_AND exp
- { $$ = node ($1, Node_and, $3); }
- | exp LEX_OR exp
- { $$ = node ($1, Node_or, $3); }
- | exp MATCHOP exp
- {
- if ($1->type == Node_regex)
- warning("Regular expression on left of MATCH operator.");
- $$ = node ($1, $2, mk_rexp($3));
- }
- | regexp
- { $$ = $1; }
- | '!' regexp %prec UNARY
- {
- $$ = node(node(make_number(0.0),
- Node_field_spec,
- (NODE *) NULL),
- Node_nomatch,
- $2);
- }
- | exp LEX_IN NAME
- { $$ = node (variable($3,1), Node_in_array, $1); }
- | exp RELOP exp
- {
- if (do_lint && $3->type == Node_regex)
- warning("Regular expression on left of comparison.");
- $$ = node ($1, $2, $3);
- }
- | exp '<' exp
- { $$ = node ($1, Node_less, $3); }
- | exp '>' exp
- { $$ = node ($1, Node_greater, $3); }
- | exp '?' exp ':' exp
- { $$ = node($1, Node_cond_exp, node($3, Node_if_branches, $5));}
- | simp_exp
- { $$ = $1; }
- | exp simp_exp %prec CONCAT_OP
- { $$ = node ($1, Node_concat, $2); }
- ;
-
-rexp
- : variable ASSIGNOP
- { want_assign = 0; }
- rexp
- { $$ = node ($1, $2, $4); }
- | rexp LEX_AND rexp
- { $$ = node ($1, Node_and, $3); }
- | rexp LEX_OR rexp
- { $$ = node ($1, Node_or, $3); }
- | LEX_GETLINE opt_variable input_redir
- {
- if (do_lint && ! io_allowed && $3 == NULL)
- warning("non-redirected getline undefined inside BEGIN or END action");
- $$ = node ($2, Node_K_getline, $3);
- }
- | regexp
- { $$ = $1; }
- | '!' regexp %prec UNARY
- { $$ = node((NODE *) NULL, Node_nomatch, $2); }
- | rexp MATCHOP rexp
- { $$ = node ($1, $2, mk_rexp($3)); }
- | rexp LEX_IN NAME
- { $$ = node (variable($3,1), Node_in_array, $1); }
- | rexp RELOP rexp
- { $$ = node ($1, $2, $3); }
- | rexp '?' rexp ':' rexp
- { $$ = node($1, Node_cond_exp, node($3, Node_if_branches, $5));}
- | simp_exp
- { $$ = $1; }
- | rexp simp_exp %prec CONCAT_OP
- { $$ = node ($1, Node_concat, $2); }
- ;
-
-simp_exp
- : non_post_simp_exp
- /* Binary operators in order of decreasing precedence. */
- | simp_exp '^' simp_exp
- { $$ = node ($1, Node_exp, $3); }
- | simp_exp '*' simp_exp
- { $$ = node ($1, Node_times, $3); }
- | simp_exp '/' simp_exp
- { $$ = node ($1, Node_quotient, $3); }
- | simp_exp '%' simp_exp
- { $$ = node ($1, Node_mod, $3); }
- | simp_exp '+' simp_exp
- { $$ = node ($1, Node_plus, $3); }
- | simp_exp '-' simp_exp
- { $$ = node ($1, Node_minus, $3); }
- | variable INCREMENT
- { $$ = node ($1, Node_postincrement, (NODE *)NULL); }
- | variable DECREMENT
- { $$ = node ($1, Node_postdecrement, (NODE *)NULL); }
- ;
-
-non_post_simp_exp
- : '!' simp_exp %prec UNARY
- { $$ = node ($2, Node_not,(NODE *) NULL); }
- | '(' exp r_paren
- { $$ = $2; }
- | LEX_BUILTIN
- '(' opt_expression_list r_paren
- { $$ = snode ($3, Node_builtin, (int) $1); }
- | LEX_LENGTH '(' opt_expression_list r_paren
- { $$ = snode ($3, Node_builtin, (int) $1); }
- | LEX_LENGTH
- {
- if (do_lint)
- warning("call of `length' without parentheses is not portable");
- $$ = snode ((NODE *)NULL, Node_builtin, (int) $1);
- if (do_posix)
- warning( "call of `length' without parentheses is deprecated by POSIX");
- }
- | FUNC_CALL '(' opt_expression_list r_paren
- {
- $$ = node ($3, Node_func_call, make_string($1, strlen($1)));
- }
- | variable
- | INCREMENT variable
- { $$ = node ($2, Node_preincrement, (NODE *)NULL); }
- | DECREMENT variable
- { $$ = node ($2, Node_predecrement, (NODE *)NULL); }
- | YNUMBER
- { $$ = $1; }
- | YSTRING
- { $$ = $1; }
-
- | '-' simp_exp %prec UNARY
- { if ($2->type == Node_val) {
- $2->numbr = -(force_number($2));
- $$ = $2;
- } else
- $$ = node ($2, Node_unary_minus, (NODE *)NULL);
- }
- | '+' simp_exp %prec UNARY
- {
- /* was: $$ = $2 */
- /* POSIX semantics: force a conversion to numeric type */
- $$ = node (make_number(0.0), Node_plus, $2);
- }
- ;
-
-opt_variable
- : /* empty */
- { $$ = NULL; }
- | variable
- { $$ = $1; }
- ;
-
-variable
- : NAME
- { $$ = variable($1,1); }
- | NAME '[' expression_list ']'
- {
- if ($3->rnode == NULL) {
- $$ = node (variable($1,1), Node_subscript, $3->lnode);
- freenode($3);
- } else
- $$ = node (variable($1,1), Node_subscript, $3);
- }
- | '$' non_post_simp_exp
- { $$ = node ($2, Node_field_spec, (NODE *)NULL); }
- ;
-
-l_brace
- : '{' opt_nls
- ;
-
-r_brace
- : '}' opt_nls { yyerrok; }
- ;
-
-r_paren
- : ')' { yyerrok; }
- ;
-
-opt_semi
- : /* empty */
- | semi
- ;
-
-semi
- : ';' { yyerrok; want_assign = 0; }
- ;
-
-comma : ',' opt_nls { yyerrok; }
- ;
-
-%%
-
-struct token {
- const char *operator; /* text to match */
- NODETYPE value; /* node type */
- int class; /* lexical class */
-	unsigned flags;		/* # of args. allowed and compatibility */
-#	define	ARGS	0xFF	/* 0, 1, 2, 3 args allowed (any combination) */
-# define A(n) (1<<(n))
-# define VERSION 0xFF00 /* old awk is zero */
-# define NOT_OLD 0x0100 /* feature not in old awk */
-# define NOT_POSIX 0x0200 /* feature not in POSIX */
-# define GAWKX 0x0400 /* gawk extension */
- NODE *(*ptr) (); /* function that implements this keyword */
-};
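-
-/*
- * The ARGS bits record which argument counts a builtin accepts: A(n) is
- * (1 << n), so a flags value such as NOT_OLD|A(2)|A(3) marks a builtin
- * that takes two or three arguments and is not in old awk.  snode()
- * checks the A() bits; yylex() uses NOT_OLD, NOT_POSIX and GAWKX for
- * lint warnings and compatibility checks.
- */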
-
-extern NODE
- *do_exp(), *do_getline(), *do_index(), *do_length(),
- *do_sqrt(), *do_log(), *do_sprintf(), *do_substr(),
- *do_split(), *do_system(), *do_int(), *do_close(),
- *do_atan2(), *do_sin(), *do_cos(), *do_rand(),
- *do_srand(), *do_match(), *do_tolower(), *do_toupper(),
- *do_sub(), *do_gsub(), *do_strftime(), *do_systime();
-
-/* Tokentab is sorted in ASCII ascending order, so it can be binary searched. */
-
-static struct token tokentab[] = {
-{"BEGIN", Node_illegal, LEX_BEGIN, 0, 0},
-{"END", Node_illegal, LEX_END, 0, 0},
-{"atan2", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2), do_atan2},
-{"break", Node_K_break, LEX_BREAK, 0, 0},
-{"close", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_close},
-{"continue", Node_K_continue, LEX_CONTINUE, 0, 0},
-{"cos", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_cos},
-{"delete", Node_K_delete, LEX_DELETE, NOT_OLD, 0},
-{"do", Node_K_do, LEX_DO, NOT_OLD, 0},
-{"else", Node_illegal, LEX_ELSE, 0, 0},
-{"exit", Node_K_exit, LEX_EXIT, 0, 0},
-{"exp", Node_builtin, LEX_BUILTIN, A(1), do_exp},
-{"for", Node_K_for, LEX_FOR, 0, 0},
-{"func", Node_K_function, LEX_FUNCTION, NOT_POSIX|NOT_OLD, 0},
-{"function", Node_K_function, LEX_FUNCTION, NOT_OLD, 0},
-{"getline", Node_K_getline, LEX_GETLINE, NOT_OLD, 0},
-{"gsub", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2)|A(3), do_gsub},
-{"if", Node_K_if, LEX_IF, 0, 0},
-{"in", Node_illegal, LEX_IN, 0, 0},
-{"index", Node_builtin, LEX_BUILTIN, A(2), do_index},
-{"int", Node_builtin, LEX_BUILTIN, A(1), do_int},
-{"length", Node_builtin, LEX_LENGTH, A(0)|A(1), do_length},
-{"log", Node_builtin, LEX_BUILTIN, A(1), do_log},
-{"match", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2), do_match},
-{"next", Node_K_next, LEX_NEXT, 0, 0},
-{"print", Node_K_print, LEX_PRINT, 0, 0},
-{"printf", Node_K_printf, LEX_PRINTF, 0, 0},
-{"rand", Node_builtin, LEX_BUILTIN, NOT_OLD|A(0), do_rand},
-{"return", Node_K_return, LEX_RETURN, NOT_OLD, 0},
-{"sin", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_sin},
-{"split", Node_builtin, LEX_BUILTIN, A(2)|A(3), do_split},
-{"sprintf", Node_builtin, LEX_BUILTIN, 0, do_sprintf},
-{"sqrt", Node_builtin, LEX_BUILTIN, A(1), do_sqrt},
-{"srand", Node_builtin, LEX_BUILTIN, NOT_OLD|A(0)|A(1), do_srand},
-{"strftime", Node_builtin, LEX_BUILTIN, GAWKX|A(1)|A(2), do_strftime},
-{"sub", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2)|A(3), do_sub},
-{"substr", Node_builtin, LEX_BUILTIN, A(2)|A(3), do_substr},
-{"system", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_system},
-{"systime", Node_builtin, LEX_BUILTIN, GAWKX|A(0), do_systime},
-{"tolower", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_tolower},
-{"toupper", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_toupper},
-{"while", Node_K_while, LEX_WHILE, 0, 0},
-};
-
-/* VARARGS0 */
-static void
-yyerror(va_alist)
-va_dcl
-{
- va_list args;
- const char *mesg = NULL;
- register char *bp, *cp;
- char *scan;
- char buf[120];
- static char end_of_file_line[] = "(END OF FILE)";
-
- errcount++;
- /* Find the current line in the input file */
- if (lexptr && lexeme) {
- if (!thisline) {
- cp = lexeme;
- if (*cp == '\n') {
- cp--;
- mesg = "unexpected newline";
- }
- for ( ; cp != lexptr_begin && *cp != '\n'; --cp)
- continue;
- if (*cp == '\n')
- cp++;
- thisline = cp;
- }
- /* NL isn't guaranteed */
- bp = lexeme;
- while (bp < lexend && *bp && *bp != '\n')
- bp++;
- } else {
- thisline = end_of_file_line;
- bp = thisline + strlen(thisline);
- }
- msg("%.*s", (int) (bp - thisline), thisline);
- bp = buf;
- cp = buf + sizeof(buf) - 24; /* 24 more than longest msg. input */
- if (lexptr) {
- scan = thisline;
- while (bp < cp && scan < lexeme)
- if (*scan++ == '\t')
- *bp++ = '\t';
- else
- *bp++ = ' ';
- *bp++ = '^';
- *bp++ = ' ';
- }
- va_start(args);
- if (mesg == NULL)
- mesg = va_arg(args, char *);
- strcpy(bp, mesg);
- err("", buf, args);
- va_end(args);
- exit(2);
-}
-
-static char *
-get_src_buf()
-{
- static int samefile = 0;
- static int nextfile = 0;
- static char *buf = NULL;
- static int fd;
- int n;
- register char *scan;
- static int len = 0;
- static int did_newline = 0;
-# define SLOP 128 /* enough space to hold most source lines */
-
-again:
- if (nextfile > numfiles)
- return NULL;
-
- if (srcfiles[nextfile].stype == CMDLINE) {
- if (len == 0) {
- len = strlen(srcfiles[nextfile].val);
- if (len == 0) {
- /*
- * Yet Another Special case:
- * gawk '' /path/name
- * Sigh.
- */
- ++nextfile;
- goto again;
- }
- sourceline = 1;
- lexptr = lexptr_begin = srcfiles[nextfile].val;
- lexend = lexptr + len;
- } else if (!did_newline && *(lexptr-1) != '\n') {
- /*
- * The following goop is to ensure that the source
- * ends with a newline and that the entire current
- * line is available for error messages.
- */
- int offset;
-
- did_newline = 1;
- offset = lexptr - lexeme;
- for (scan = lexeme; scan > lexptr_begin; scan--)
- if (*scan == '\n') {
- scan++;
- break;
- }
- len = lexptr - scan;
- emalloc(buf, char *, len+1, "get_src_buf");
- memcpy(buf, scan, len);
- thisline = buf;
- lexptr = buf + len;
- *lexptr = '\n';
- lexeme = lexptr - offset;
- lexptr_begin = buf;
- lexend = lexptr + 1;
- } else {
- len = 0;
- lexeme = lexptr = lexptr_begin = NULL;
- }
- if (lexptr == NULL && ++nextfile <= numfiles)
- return get_src_buf();
- return lexptr;
- }
- if (!samefile) {
- source = srcfiles[nextfile].val;
- if (source == NULL) {
- if (buf) {
- free(buf);
- buf = NULL;
- }
- len = 0;
- return lexeme = lexptr = lexptr_begin = NULL;
- }
- fd = pathopen(source);
- if (fd == -1)
- fatal("can't open source file \"%s\" for reading (%s)",
- source, strerror(errno));
- len = optimal_bufsize(fd);
- if (buf)
- free(buf);
- emalloc(buf, char *, len + SLOP, "get_src_buf");
- lexptr_begin = buf + SLOP;
- samefile = 1;
- sourceline = 1;
- } else {
- /*
- * Here, we retain the current source line (up to length SLOP)
- * in the beginning of the buffer that was overallocated above
- */
- int offset;
- int linelen;
-
- offset = lexptr - lexeme;
- for (scan = lexeme; scan > lexptr_begin; scan--)
- if (*scan == '\n') {
- scan++;
- break;
- }
- linelen = lexptr - scan;
- if (linelen > SLOP)
- linelen = SLOP;
- thisline = buf + SLOP - linelen;
- memcpy(thisline, scan, linelen);
- lexeme = buf + SLOP - offset;
- lexptr_begin = thisline;
- }
- n = read(fd, buf + SLOP, len);
- if (n == -1)
- fatal("can't read sourcefile \"%s\" (%s)",
- source, strerror(errno));
- if (n == 0) {
- samefile = 0;
- nextfile++;
- *lexeme = '\0';
- len = 0;
- return get_src_buf();
- }
- lexptr = buf + SLOP;
- lexend = lexptr + n;
- return buf;
-}
-
-#define tokadd(x) (*tok++ = (x), tok == tokend ? tokexpand() : tok)
-
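-/* Grow the token buffer, preserving the partially collected token. */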
-char *
-tokexpand()
-{
- static int toksize = 60;
- int tokoffset;
-
- tokoffset = tok - tokstart;
- toksize *= 2;
- if (tokstart)
- erealloc(tokstart, char *, toksize, "tokexpand");
- else
- emalloc(tokstart, char *, toksize, "tokexpand");
- tokend = tokstart + toksize;
- tok = tokstart + tokoffset;
- return tok;
-}
-
-#if DEBUG
-char
-nextc() {
- if (lexptr && lexptr < lexend)
- return *lexptr++;
- else if (get_src_buf())
- return *lexptr++;
- else
- return '\0';
-}
-#else
-#define nextc() ((lexptr && lexptr < lexend) ? \
- *lexptr++ : \
- (get_src_buf() ? *lexptr++ : '\0') \
- )
-#endif
-#define pushback() (lexptr && lexptr > lexptr_begin ? lexptr-- : lexptr)
-
-/*
- * Read the input and turn it into tokens.
- */
-
-static int
-yylex()
-{
- register int c;
- int seen_e = 0; /* These are for numbers */
- int seen_point = 0;
- int esc_seen; /* for literal strings */
- int low, mid, high;
- static int did_newline = 0;
- char *tokkey;
-
- if (!nextc())
- return 0;
- pushback();
-#ifdef OS2
- /*
- * added for OS/2's extproc feature of cmd.exe
- * (like #! in BSD sh)
- */
- if (strncasecmp(lexptr, "extproc ", 8) == 0) {
- while (*lexptr && *lexptr != '\n')
- lexptr++;
- }
-#endif
- lexeme = lexptr;
- thisline = NULL;
- if (want_regexp) {
- int in_brack = 0;
-
- want_regexp = 0;
- tok = tokstart;
- while ((c = nextc()) != 0) {
- switch (c) {
- case '[':
- in_brack = 1;
- break;
- case ']':
- in_brack = 0;
- break;
- case '\\':
- if ((c = nextc()) == '\0') {
- yyerror("unterminated regexp ends with \\ at end of file");
- } else if (c == '\n') {
- sourceline++;
- continue;
- } else
- tokadd('\\');
- break;
- case '/': /* end of the regexp */
- if (in_brack)
- break;
-
- pushback();
- tokadd('\0');
- yylval.sval = tokstart;
- return REGEXP;
- case '\n':
- pushback();
- yyerror("unterminated regexp");
- case '\0':
- yyerror("unterminated regexp at end of file");
- }
- tokadd(c);
- }
- }
-retry:
- while ((c = nextc()) == ' ' || c == '\t')
- continue;
-
- lexeme = lexptr ? lexptr - 1 : lexptr;
- thisline = NULL;
- tok = tokstart;
- yylval.nodetypeval = Node_illegal;
-
- switch (c) {
- case 0:
- return 0;
-
- case '\n':
- sourceline++;
- return NEWLINE;
-
- case '#': /* it's a comment */
- while ((c = nextc()) != '\n') {
- if (c == '\0')
- return 0;
- }
- sourceline++;
- return NEWLINE;
-
- case '\\':
-#ifdef RELAXED_CONTINUATION
- /*
-		 * This code purports to allow comments and/or whitespace
- * after the `\' at the end of a line used for continuation.
- * Use it at your own risk. We think it's a bad idea, which
- * is why it's not on by default.
- */
- if (!do_unix) {
- /* strip trailing white-space and/or comment */
- while ((c = nextc()) == ' ' || c == '\t')
- continue;
- if (c == '#')
- while ((c = nextc()) != '\n')
- if (c == '\0')
- break;
- pushback();
- }
-#endif /* RELAXED_CONTINUATION */
- if (nextc() == '\n') {
- sourceline++;
- goto retry;
- } else
- yyerror("backslash not last character on line");
- break;
-
- case '$':
- want_assign = 1;
- return '$';
-
- case ')':
- case ']':
- case '(':
- case '[':
- case ';':
- case ':':
- case '?':
- case '{':
- case ',':
- return c;
-
- case '*':
- if ((c = nextc()) == '=') {
- yylval.nodetypeval = Node_assign_times;
- return ASSIGNOP;
- } else if (do_posix) {
- pushback();
- return '*';
- } else if (c == '*') {
- /* make ** and **= aliases for ^ and ^= */
- static int did_warn_op = 0, did_warn_assgn = 0;
-
- if (nextc() == '=') {
- if (do_lint && ! did_warn_assgn) {
- did_warn_assgn = 1;
- warning("**= is not allowed by POSIX");
- }
- yylval.nodetypeval = Node_assign_exp;
- return ASSIGNOP;
- } else {
- pushback();
- if (do_lint && ! did_warn_op) {
- did_warn_op = 1;
- warning("** is not allowed by POSIX");
- }
- return '^';
- }
- }
- pushback();
- return '*';
-
- case '/':
- if (want_assign) {
- if (nextc() == '=') {
- yylval.nodetypeval = Node_assign_quotient;
- return ASSIGNOP;
- }
- pushback();
- }
- return '/';
-
- case '%':
- if (nextc() == '=') {
- yylval.nodetypeval = Node_assign_mod;
- return ASSIGNOP;
- }
- pushback();
- return '%';
-
- case '^':
- {
- static int did_warn_op = 0, did_warn_assgn = 0;
-
- if (nextc() == '=') {
-
- if (do_lint && ! did_warn_assgn) {
- did_warn_assgn = 1;
- warning("operator `^=' is not supported in old awk");
- }
- yylval.nodetypeval = Node_assign_exp;
- return ASSIGNOP;
- }
- pushback();
- if (do_lint && ! did_warn_op) {
- did_warn_op = 1;
- warning("operator `^' is not supported in old awk");
- }
- return '^';
- }
-
- case '+':
- if ((c = nextc()) == '=') {
- yylval.nodetypeval = Node_assign_plus;
- return ASSIGNOP;
- }
- if (c == '+')
- return INCREMENT;
- pushback();
- return '+';
-
- case '!':
- if ((c = nextc()) == '=') {
- yylval.nodetypeval = Node_notequal;
- return RELOP;
- }
- if (c == '~') {
- yylval.nodetypeval = Node_nomatch;
- want_assign = 0;
- return MATCHOP;
- }
- pushback();
- return '!';
-
- case '<':
- if (nextc() == '=') {
- yylval.nodetypeval = Node_leq;
- return RELOP;
- }
- yylval.nodetypeval = Node_less;
- pushback();
- return '<';
-
- case '=':
- if (nextc() == '=') {
- yylval.nodetypeval = Node_equal;
- return RELOP;
- }
- yylval.nodetypeval = Node_assign;
- pushback();
- return ASSIGNOP;
-
- case '>':
- if ((c = nextc()) == '=') {
- yylval.nodetypeval = Node_geq;
- return RELOP;
- } else if (c == '>') {
- yylval.nodetypeval = Node_redirect_append;
- return APPEND_OP;
- }
- yylval.nodetypeval = Node_greater;
- pushback();
- return '>';
-
- case '~':
- yylval.nodetypeval = Node_match;
- want_assign = 0;
- return MATCHOP;
-
- case '}':
- /*
-		 * Added the did_newline handling.  Easier than
-		 * hacking the grammar.
- */
- if (did_newline) {
- did_newline = 0;
- return c;
- }
- did_newline++;
- --lexptr; /* pick up } next time */
- return NEWLINE;
-
- case '"':
- esc_seen = 0;
- while ((c = nextc()) != '"') {
- if (c == '\n') {
- pushback();
- yyerror("unterminated string");
- }
- if (c == '\\') {
- c = nextc();
- if (c == '\n') {
- sourceline++;
- continue;
- }
- esc_seen = 1;
- tokadd('\\');
- }
- if (c == '\0') {
- pushback();
- yyerror("unterminated string");
- }
- tokadd(c);
- }
- yylval.nodeval = make_str_node(tokstart,
- tok - tokstart, esc_seen ? SCAN : 0);
- yylval.nodeval->flags |= PERM;
- return YSTRING;
-
- case '-':
- if ((c = nextc()) == '=') {
- yylval.nodetypeval = Node_assign_minus;
- return ASSIGNOP;
- }
- if (c == '-')
- return DECREMENT;
- pushback();
- return '-';
-
- case '.':
- c = nextc();
- pushback();
- if (!isdigit(c))
- return '.';
- else
- c = '.'; /* FALL THROUGH */
- case '0':
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- case '8':
- case '9':
- /* It's a number */
- for (;;) {
- int gotnumber = 0;
-
- tokadd(c);
- switch (c) {
- case '.':
- if (seen_point) {
- gotnumber++;
- break;
- }
- ++seen_point;
- break;
- case 'e':
- case 'E':
- if (seen_e) {
- gotnumber++;
- break;
- }
- ++seen_e;
- if ((c = nextc()) == '-' || c == '+')
- tokadd(c);
- else
- pushback();
- break;
- case '0':
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- case '8':
- case '9':
- break;
- default:
- gotnumber++;
- }
- if (gotnumber)
- break;
- c = nextc();
- }
- pushback();
- yylval.nodeval = make_number(atof(tokstart));
- yylval.nodeval->flags |= PERM;
- return YNUMBER;
-
- case '&':
- if ((c = nextc()) == '&') {
- yylval.nodetypeval = Node_and;
- for (;;) {
- c = nextc();
- if (c == '\0')
- break;
- if (c == '#') {
- while ((c = nextc()) != '\n' && c != '\0')
- continue;
- if (c == '\0')
- break;
- }
- if (c == '\n')
- sourceline++;
- if (! isspace(c)) {
- pushback();
- break;
- }
- }
- want_assign = 0;
- return LEX_AND;
- }
- pushback();
- return '&';
-
- case '|':
- if ((c = nextc()) == '|') {
- yylval.nodetypeval = Node_or;
- for (;;) {
- c = nextc();
- if (c == '\0')
- break;
- if (c == '#') {
- while ((c = nextc()) != '\n' && c != '\0')
- continue;
- if (c == '\0')
- break;
- }
- if (c == '\n')
- sourceline++;
- if (! isspace(c)) {
- pushback();
- break;
- }
- }
- want_assign = 0;
- return LEX_OR;
- }
- pushback();
- return '|';
- }
-
- if (c != '_' && ! isalpha(c))
- yyerror("Invalid char '%c' in expression\n", c);
-
- /* it's some type of name-type-thing. Find its length */
- tok = tokstart;
- while (is_identchar(c)) {
- tokadd(c);
- c = nextc();
- }
- tokadd('\0');
- emalloc(tokkey, char *, tok - tokstart, "yylex");
- memcpy(tokkey, tokstart, tok - tokstart);
- pushback();
-
- /* See if it is a special token. */
- low = 0;
- high = (sizeof (tokentab) / sizeof (tokentab[0])) - 1;
- while (low <= high) {
- int i/* , c */;
-
- mid = (low + high) / 2;
- c = *tokstart - tokentab[mid].operator[0];
- i = c ? c : strcmp (tokstart, tokentab[mid].operator);
-
- if (i < 0) { /* token < mid */
- high = mid - 1;
- } else if (i > 0) { /* token > mid */
- low = mid + 1;
- } else {
- if (do_lint) {
- if (tokentab[mid].flags & GAWKX)
- warning("%s() is a gawk extension",
- tokentab[mid].operator);
- if (tokentab[mid].flags & NOT_POSIX)
- warning("POSIX does not allow %s",
- tokentab[mid].operator);
- if (tokentab[mid].flags & NOT_OLD)
- warning("%s is not supported in old awk",
- tokentab[mid].operator);
- }
- if ((do_unix && (tokentab[mid].flags & GAWKX))
- || (do_posix && (tokentab[mid].flags & NOT_POSIX)))
- break;
- if (tokentab[mid].class == LEX_BUILTIN
- || tokentab[mid].class == LEX_LENGTH
- )
- yylval.lval = mid;
- else
- yylval.nodetypeval = tokentab[mid].value;
-
- free(tokkey);
- return tokentab[mid].class;
- }
- }
-
- yylval.sval = tokkey;
- if (*lexptr == '(')
- return FUNC_CALL;
- else {
- want_assign = 1;
- return NAME;
- }
-}
-
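-/* Allocate a node and fill in the fields common to all node types. */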
-static NODE *
-node_common(op)
-NODETYPE op;
-{
- register NODE *r;
-
- getnode(r);
- r->type = op;
- r->flags = MALLOC;
- /* if lookahead is NL, lineno is 1 too high */
- if (lexeme && *lexeme == '\n')
- r->source_line = sourceline - 1;
- else
- r->source_line = sourceline;
- r->source_file = source;
- return r;
-}
-
-/*
- * This allocates a node with defined lnode and rnode.
- */
-NODE *
-node(left, op, right)
-NODE *left, *right;
-NODETYPE op;
-{
- register NODE *r;
-
- r = node_common(op);
- r->lnode = left;
- r->rnode = right;
- return r;
-}
-
-/*
- * This allocates a node with defined subnode and proc for builtin functions.
- * It checks the arg. count and supplies defaults where possible.
- */
-static NODE *
-snode(subn, op, idx)
-NODETYPE op;
-int idx;
-NODE *subn;
-{
- register NODE *r;
- register NODE *n;
- int nexp = 0;
- int args_allowed;
-
- r = node_common(op);
-
- /* traverse expression list to see how many args. given */
- for (n= subn; n; n= n->rnode) {
- nexp++;
- if (nexp > 3)
- break;
- }
-
- /* check against how many args. are allowed for this builtin */
- args_allowed = tokentab[idx].flags & ARGS;
- if (args_allowed && !(args_allowed & A(nexp)))
- fatal("%s() cannot have %d argument%c",
- tokentab[idx].operator, nexp, nexp == 1 ? ' ' : 's');
-
- r->proc = tokentab[idx].ptr;
-
- /* special case processing for a few builtins */
- if (nexp == 0 && r->proc == do_length) {
- subn = node(node(make_number(0.0),Node_field_spec,(NODE *)NULL),
- Node_expression_list,
- (NODE *) NULL);
- } else if (r->proc == do_match) {
- if (subn->rnode->lnode->type != Node_regex)
- subn->rnode->lnode = mk_rexp(subn->rnode->lnode);
- } else if (r->proc == do_sub || r->proc == do_gsub) {
- if (subn->lnode->type != Node_regex)
- subn->lnode = mk_rexp(subn->lnode);
- if (nexp == 2)
- append_right(subn, node(node(make_number(0.0),
- Node_field_spec,
- (NODE *) NULL),
- Node_expression_list,
- (NODE *) NULL));
- else if (do_lint && subn->rnode->rnode->lnode->type == Node_val)
- warning("string literal as last arg of substitute");
- } else if (r->proc == do_split) {
- if (nexp == 2)
- append_right(subn,
- node(FS_node, Node_expression_list, (NODE *) NULL));
- n = subn->rnode->rnode->lnode;
- if (n->type != Node_regex)
- subn->rnode->rnode->lnode = mk_rexp(n);
- if (nexp == 2)
- subn->rnode->rnode->lnode->re_flags |= FS_DFLT;
- }
-
- r->subnode = subn;
- return r;
-}
-
-/*
- * This allocates a Node_line_range node with defined condpair and
- * zeroes the trigger word to avoid the temptation of assuming that calling
- * 'node( foo, Node_line_range, 0)' will properly initialize 'triggered'.
- */
-/* Otherwise like node() */
-static NODE *
-mkrangenode(cpair)
-NODE *cpair;
-{
- register NODE *r;
-
- getnode(r);
- r->type = Node_line_range;
- r->condpair = cpair;
- r->triggered = 0;
- return r;
-}
-
-/* Build a for loop */
-static NODE *
-make_for_loop(init, cond, incr)
-NODE *init, *cond, *incr;
-{
- register FOR_LOOP_HEADER *r;
- NODE *n;
-
- emalloc(r, FOR_LOOP_HEADER *, sizeof(FOR_LOOP_HEADER), "make_for_loop");
- getnode(n);
- n->type = Node_illegal;
- r->init = init;
- r->cond = cond;
- r->incr = incr;
- n->sub.nodep.r.hd = r;
- return n;
-}
-
-/*
- * Install a name in the symbol table, even if it is already there.
- * Caller must check against redefinition if that is desired.
- */
-NODE *
-install(name, value)
-char *name;
-NODE *value;
-{
- register NODE *hp;
- register size_t len;
- register int bucket;
-
- len = strlen(name);
- bucket = hash(name, len, (unsigned long) HASHSIZE);
- getnode(hp);
- hp->type = Node_hashnode;
- hp->hnext = variables[bucket];
- variables[bucket] = hp;
- hp->hlength = len;
- hp->hvalue = value;
- hp->hname = name;
- hp->hvalue->vname = name;
- return hp->hvalue;
-}
-
-/* Find the most recent hash node for `name' installed by install(). */
-NODE *
-lookup(name)
-const char *name;
-{
- register NODE *bucket;
- register size_t len;
-
- len = strlen(name);
- bucket = variables[hash(name, len, (unsigned long) HASHSIZE)];
- while (bucket) {
- if (bucket->hlength == len && STREQN(bucket->hname, name, len))
- return bucket->hvalue;
- bucket = bucket->hnext;
- }
- return NULL;
-}
-
-/*
- * Add new to the rightmost branch of LIST. This uses n^2 time, so we make
- * a simple attempt at optimizing it.
- */
-static NODE *
-append_right(list, new)
-NODE *list, *new;
-{
- register NODE *oldlist;
- static NODE *savefront = NULL, *savetail = NULL;
-
- oldlist = list;
- if (savefront == oldlist) {
- savetail = savetail->rnode = new;
- return oldlist;
- } else
- savefront = oldlist;
- while (list->rnode != NULL)
- list = list->rnode;
- savetail = list->rnode = new;
- return oldlist;
-}
-
-/*
- * Check whether the function name is already installed; if so, it is a
- * redefinition and we report a fatal error.  Otherwise, install name with
- * def as its value.
- */
-static void
-func_install(params, def)
-NODE *params;
-NODE *def;
-{
- NODE *r;
-
- pop_params(params->rnode);
- pop_var(params, 0);
- r = lookup(params->param);
- if (r != NULL) {
- fatal("function name `%s' previously defined", params->param);
- } else
- (void) install(params->param, node(params, Node_func, def));
-}
-
-static void
-pop_var(np, freeit)
-NODE *np;
-int freeit;
-{
- register NODE *bucket, **save;
- register size_t len;
- char *name;
-
- name = np->param;
- len = strlen(name);
- save = &(variables[hash(name, len, (unsigned long) HASHSIZE)]);
- for (bucket = *save; bucket; bucket = bucket->hnext) {
- if (len == bucket->hlength && STREQN(bucket->hname, name, len)) {
- *save = bucket->hnext;
- freenode(bucket);
- if (freeit)
- free(np->param);
- return;
- }
- save = &(bucket->hnext);
- }
-}
-
-static void
-pop_params(params)
-NODE *params;
-{
- register NODE *np;
-
- for (np = params; np != NULL; np = np->rnode)
- pop_var(np, 1);
-}
-
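-/*
- * Allocate a Node_param_list entry for a function parameter, number it
- * with param_counter, and install it in the symbol table.
- */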
-static NODE *
-make_param(name)
-char *name;
-{
- NODE *r;
-
- getnode(r);
- r->type = Node_param_list;
- r->rnode = NULL;
- r->param = name;
- r->param_cnt = param_counter++;
- return (install(name, r));
-}
-
-/* Name points to a variable name.  Make sure it's in the symbol table. */
-NODE *
-variable(name, can_free)
-char *name;
-int can_free;
-{
- register NODE *r;
- static int env_loaded = 0;
-
- if (!env_loaded && STREQ(name, "ENVIRON")) {
- load_environ();
- env_loaded = 1;
- }
- if ((r = lookup(name)) == NULL)
- r = install(name, node(Nnull_string, Node_var, (NODE *) NULL));
- else if (can_free)
- free(name);
- return r;
-}
-
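-/* Wrap exp in a Node_regex node unless it is already one. */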
-static NODE *
-mk_rexp(exp)
-NODE *exp;
-{
- if (exp->type == Node_regex)
- return exp;
- else {
- NODE *n;
-
- getnode(n);
- n->type = Node_regex;
- n->re_exp = exp;
- n->re_text = NULL;
- n->re_reg = NULL;
- n->re_flags = 0;
- n->re_cnt = 1;
- return n;
- }
-}
diff --git a/gnu/usr.bin/awk/builtin.c b/gnu/usr.bin/awk/builtin.c
deleted file mode 100644
index 00c52e7..0000000
--- a/gnu/usr.bin/awk/builtin.c
+++ /dev/null
@@ -1,1239 +0,0 @@
-/*
- * builtin.c - Builtin functions and various utility procedures
- */
-
-/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-
-#include "awk.h"
-
-#ifndef SRANDOM_PROTO
-extern void srandom P((unsigned int seed));
-#endif
-#if !defined(linux) && !defined(__FreeBSD__)
-extern char *initstate P((unsigned seed, char *state, int n));
-extern char *setstate P((char *state));
-extern long random P((void));
-#endif
-
-extern NODE **fields_arr;
-extern int output_is_tty;
-
-static NODE *sub_common P((NODE *tree, int global));
-NODE *format_tree P((const char *, int, NODE *));
-
-#ifdef _CRAY
-/* Work around a problem in conversion of doubles to exact integers. */
-#include <float.h>
-#define Floor(n) floor((n) * (1.0 + DBL_EPSILON))
-#define Ceil(n) ceil((n) * (1.0 + DBL_EPSILON))
-
-/* Force the standard C compiler to use the library math functions. */
-extern double exp(double);
-double (*Exp)() = exp;
-#define exp(x) (*Exp)(x)
-extern double log(double);
-double (*Log)() = log;
-#define log(x) (*Log)(x)
-#else
-#define Floor(n) floor(n)
-#define Ceil(n) ceil(n)
-#endif
-
-#define DEFAULT_G_PRECISION 6
-
-#ifdef GFMT_WORKAROUND
-/* semi-temporary hack, mostly to gracefully handle VMS */
-static void sgfmt P((char *buf, const char *format, int alt,
- int fwidth, int precision, double value));
-#endif /* GFMT_WORKAROUND */
-
-/*
- * On the alpha, LONG_MAX is too big for doing rand().
- * On the Cray (Y-MP, anyway), ints and longs are 64 bits, but
- * random() does things in terms of 32 bits. So we have to chop
- * LONG_MAX down.
- */
-#if (defined(__alpha) && defined(__osf__)) || defined(_CRAY)
-#define GAWK_RANDOM_MAX (LONG_MAX & 0x7fffffff)
-#else
-#define GAWK_RANDOM_MAX LONG_MAX
-#endif
-
-static void efwrite P((const void *ptr, size_t size, size_t count, FILE *fp,
- const char *from, struct redirect *rp,int flush));
-
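-/* Write to fp, checking for errors and flushing interactive or unbuffered output. */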
-static void
-efwrite(ptr, size, count, fp, from, rp, flush)
-const void *ptr;
-size_t size, count;
-FILE *fp;
-const char *from;
-struct redirect *rp;
-int flush;
-{
- errno = 0;
- if (fwrite(ptr, size, count, fp) != count)
- goto wrerror;
- if (flush
- && ((fp == stdout && output_is_tty)
- || (rp && (rp->flag & RED_NOBUF)))) {
- fflush(fp);
- if (ferror(fp))
- goto wrerror;
- }
- return;
-
- wrerror:
- fatal("%s to \"%s\" failed (%s)", from,
- rp ? rp->value : "standard output",
- errno ? strerror(errno) : "reason unknown");
-}
-
-/* Builtin functions */
-NODE *
-do_exp(tree)
-NODE *tree;
-{
- NODE *tmp;
- double d, res;
-#ifndef exp
- double exp P((double));
-#endif
-
- tmp= tree_eval(tree->lnode);
- d = force_number(tmp);
- free_temp(tmp);
- errno = 0;
- res = exp(d);
- if (errno == ERANGE)
- warning("exp argument %g is out of range", d);
- return tmp_number((AWKNUM) res);
-}
-
-NODE *
-do_index(tree)
-NODE *tree;
-{
- NODE *s1, *s2;
- register char *p1, *p2;
- register size_t l1, l2;
- long ret;
-
-
- s1 = tree_eval(tree->lnode);
- s2 = tree_eval(tree->rnode->lnode);
- force_string(s1);
- force_string(s2);
- p1 = s1->stptr;
- p2 = s2->stptr;
- l1 = s1->stlen;
- l2 = s2->stlen;
- ret = 0;
- if (IGNORECASE) {
- while (l1) {
- if (l2 > l1)
- break;
- if (casetable[(int)*p1] == casetable[(int)*p2]
- && (l2 == 1 || strncasecmp(p1, p2, l2) == 0)) {
- ret = 1 + s1->stlen - l1;
- break;
- }
- l1--;
- p1++;
- }
- } else {
- while (l1) {
- if (l2 > l1)
- break;
- if (*p1 == *p2
- && (l2 == 1 || STREQN(p1, p2, l2))) {
- ret = 1 + s1->stlen - l1;
- break;
- }
- l1--;
- p1++;
- }
- }
- free_temp(s1);
- free_temp(s2);
- return tmp_number((AWKNUM) ret);
-}
-
-double
-double_to_int(d)
-double d;
-{
- double floor P((double));
- double ceil P((double));
-
- if (d >= 0)
- d = Floor(d);
- else
- d = Ceil(d);
- return d;
-}
-
-NODE *
-do_int(tree)
-NODE *tree;
-{
- NODE *tmp;
- double d;
-
- tmp = tree_eval(tree->lnode);
- d = force_number(tmp);
- d = double_to_int(d);
- free_temp(tmp);
- return tmp_number((AWKNUM) d);
-}
-
-NODE *
-do_length(tree)
-NODE *tree;
-{
- NODE *tmp;
- size_t len;
-
- tmp = tree_eval(tree->lnode);
- len = force_string(tmp)->stlen;
- free_temp(tmp);
- return tmp_number((AWKNUM) len);
-}
-
-NODE *
-do_log(tree)
-NODE *tree;
-{
- NODE *tmp;
-#ifndef log
- double log P((double));
-#endif
- double d, arg;
-
- tmp = tree_eval(tree->lnode);
- arg = (double) force_number(tmp);
- if (arg < 0.0)
- warning("log called with negative argument %g", arg);
- d = log(arg);
- free_temp(tmp);
- return tmp_number((AWKNUM) d);
-}
-
-/*
- * format_tree() formats nodes of a tree, starting with a left node,
- * according to a fmt_string providing a format like that of the
- * printf family in the C library.  Returns a string node whose value
- * is the formatted string.  Called by the sprintf function.
- *
- * It is one of the uglier parts of gawk. Thanks to Michal Jaegermann
- * for taming this beast and making it compatible with ANSI C.
- */
-
-NODE *
-format_tree(fmt_string, n0, carg)
-const char *fmt_string;
-int n0;
-register NODE *carg;
-{
-/* copy 'l' bytes from 's' to 'obufout' checking for space in the process */
-/* difference of pointers should be of ptrdiff_t type, but let us be kind */
-#define bchunk(s,l) if(l) {\
- while((l)>ofre) {\
- long olen = obufout - obuf;\
- erealloc(obuf, char *, osiz*2, "format_tree");\
- ofre+=osiz;\
- osiz*=2;\
- obufout = obuf + olen;\
- }\
- memcpy(obufout,s,(size_t)(l));\
- obufout+=(l);\
- ofre-=(l);\
- }
-/* copy one byte from 's' to 'obufout' checking for space in the process */
-#define bchunk_one(s) {\
- if(ofre <= 0) {\
- long olen = obufout - obuf;\
- erealloc(obuf, char *, osiz*2, "format_tree");\
- ofre+=osiz;\
- osiz*=2;\
- obufout = obuf + olen;\
- }\
- *obufout++ = *s;\
- --ofre;\
- }
-
-	/* Is there space for something `l' bytes big in the buffer? */
-#define chksize(l) if((l)>ofre) {\
- long olen = obufout - obuf;\
- erealloc(obuf, char *, osiz*2, "format_tree");\
- obufout = obuf + olen;\
- ofre+=osiz;\
- osiz*=2;\
- }
-
- /*
- * Get the next arg to be formatted. If we've run out of args,
- * return "" (Null string)
- */
-#define parse_next_arg() {\
- if(!carg) { toofew = 1; break; }\
- else {\
- arg=tree_eval(carg->lnode);\
- carg=carg->rnode;\
- }\
- }
-
- NODE *r;
- int toofew = 0;
- char *obuf, *obufout;
- size_t osiz, ofre;
- char *chbuf;
- const char *s0, *s1;
- int cs1;
- NODE *arg;
- long fw, prec;
- int lj, alt, big, have_prec;
- long *cur;
- long val;
-#ifdef sun386 /* Can't cast unsigned (int/long) from ptr->value */
- long tmp_uval; /* on 386i 4.0.1 C compiler -- it just hangs */
-#endif
- unsigned long uval;
- int sgn;
- int base = 0;
- char cpbuf[30]; /* if we have numbers bigger than 30 */
- char *cend = &cpbuf[30];/* chars, we lose, but seems unlikely */
- char *cp;
- char *fill;
- double tmpval;
- char signchar = 0;
- size_t len;
- static char sp[] = " ";
- static char zero_string[] = "0";
- static char lchbuf[] = "0123456789abcdef";
- static char Uchbuf[] = "0123456789ABCDEF";
-
- emalloc(obuf, char *, 120, "format_tree");
- obufout = obuf;
- osiz = 120;
- ofre = osiz - 1;
-
- s0 = s1 = fmt_string;
- while (n0-- > 0) {
- if (*s1 != '%') {
- s1++;
- continue;
- }
- bchunk(s0, s1 - s0);
- s0 = s1;
- cur = &fw;
- fw = 0;
- prec = 0;
- have_prec = 0;
- lj = alt = big = 0;
- fill = sp;
- cp = cend;
- chbuf = lchbuf;
- s1++;
-
-retry:
- --n0;
- switch (cs1 = *s1++) {
- case (-1): /* dummy case to allow for checking */
-check_pos:
- if (cur != &fw)
- break; /* reject as a valid format */
- goto retry;
- case '%':
- bchunk_one("%");
- s0 = s1;
- break;
-
- case '0':
- if (lj)
- goto retry;
- if (cur == &fw)
-				fill = zero_string; /* FALL THROUGH */
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- case '8':
- case '9':
- if (cur == 0)
- /* goto lose; */
- break;
- if (prec >= 0)
- *cur = cs1 - '0';
- /* with a negative precision *cur is already set */
- /* to -1, so it will remain negative, but we have */
- /* to "eat" precision digits in any case */
- while (n0 > 0 && *s1 >= '0' && *s1 <= '9') {
- --n0;
- *cur = *cur * 10 + *s1++ - '0';
- }
- if (prec < 0) /* negative precision is discarded */
- have_prec = 0;
- if (cur == &prec)
- cur = 0;
- goto retry;
- case '*':
- if (cur == 0)
- /* goto lose; */
- break;
- parse_next_arg();
- *cur = force_number(arg);
- free_temp(arg);
- if (cur == &prec)
- cur = 0;
- goto retry;
- case ' ': /* print ' ' or '-' */
- /* 'space' flag is ignored */
- /* if '+' already present */
- if (signchar != 0)
- goto check_pos;
- /* FALL THROUGH */
- case '+': /* print '+' or '-' */
- signchar = cs1;
- goto check_pos;
- case '-':
- if (prec < 0)
- break;
- if (cur == &prec) {
- prec = -1;
- goto retry;
- }
- fill = sp; /* if left justified then other */
- lj++; /* filling is ignored */
- goto check_pos;
- case '.':
- if (cur != &fw)
- break;
- cur = &prec;
- have_prec++;
- goto retry;
- case '#':
- alt++;
- goto check_pos;
- case 'l':
- if (big)
- break;
- big++;
- goto check_pos;
- case 'c':
- parse_next_arg();
- if (arg->flags & NUMBER) {
-#ifdef sun386
- tmp_uval = arg->numbr;
- uval= (unsigned long) tmp_uval;
-#else
- uval = (unsigned long) arg->numbr;
-#endif
- cpbuf[0] = uval;
- prec = 1;
- cp = cpbuf;
- goto pr_tail;
- }
- if (have_prec == 0)
- prec = 1;
- else if (prec > arg->stlen)
- prec = arg->stlen;
- cp = arg->stptr;
- goto pr_tail;
- case 's':
- parse_next_arg();
- arg = force_string(arg);
- if (have_prec == 0 || prec > arg->stlen)
- prec = arg->stlen;
- cp = arg->stptr;
- goto pr_tail;
- case 'd':
- case 'i':
- parse_next_arg();
- tmpval = force_number(arg);
- if (tmpval > LONG_MAX || tmpval < LONG_MIN) {
- /* out of range - emergency use of %g format */
- cs1 = 'g';
- goto format_float;
- }
- val = (long) tmpval;
-
- if (val < 0) {
- sgn = 1;
- if (val > LONG_MIN)
- uval = (unsigned long) -val;
- else
- uval = (unsigned long)(-(LONG_MIN + 1))
- + (unsigned long)1;
- } else {
- sgn = 0;
- uval = (unsigned long) val;
- }
- do {
- *--cp = (char) ('0' + uval % 10);
- uval /= 10;
- } while (uval);
- if (sgn)
- *--cp = '-';
- else if (signchar)
- *--cp = signchar;
- if (have_prec != 0) /* ignore '0' flag if */
- fill = sp; /* precision given */
- if (prec > fw)
- fw = prec;
- prec = cend - cp;
- if (fw > prec && ! lj && fill != sp
- && (*cp == '-' || signchar)) {
- bchunk_one(cp);
- cp++;
- prec--;
- fw--;
- }
- goto pr_tail;
- case 'X':
- chbuf = Uchbuf; /* FALL THROUGH */
- case 'x':
- base += 6; /* FALL THROUGH */
- case 'u':
- base += 2; /* FALL THROUGH */
- case 'o':
- base += 8;
- parse_next_arg();
- tmpval = force_number(arg);
- if (tmpval > ULONG_MAX || tmpval < LONG_MIN) {
- /* out of range - emergency use of %g format */
- cs1 = 'g';
- goto format_float;
- }
- uval = (unsigned long)tmpval;
- if (have_prec != 0) /* ignore '0' flag if */
- fill = sp; /* precision given */
- do {
- *--cp = chbuf[uval % base];
- uval /= base;
- } while (uval);
- if (alt) {
- if (base == 16) {
- *--cp = cs1;
- *--cp = '0';
- if (fill != sp) {
- bchunk(cp, 2);
- cp += 2;
- fw -= 2;
- }
- } else if (base == 8)
- *--cp = '0';
- }
- base = 0;
- prec = cend - cp;
- pr_tail:
- if (! lj) {
- while (fw > prec) {
- bchunk_one(fill);
- fw--;
- }
- }
- bchunk(cp, (int) prec);
- while (fw > prec) {
- bchunk_one(fill);
- fw--;
- }
- s0 = s1;
- free_temp(arg);
- break;
- case 'g':
- case 'G':
- case 'e':
- case 'f':
- case 'E':
- parse_next_arg();
- tmpval = force_number(arg);
- format_float:
- free_temp(arg);
- if (have_prec == 0)
- prec = DEFAULT_G_PRECISION;
- chksize(fw + prec + 9); /* 9==slop */
-
- cp = cpbuf;
- *cp++ = '%';
- if (lj)
- *cp++ = '-';
- if (signchar)
- *cp++ = signchar;
- if (alt)
- *cp++ = '#';
- if (fill != sp)
- *cp++ = '0';
- cp = strcpy(cp, "*.*") + 3;
- *cp++ = cs1;
- *cp = '\0';
-#ifndef GFMT_WORKAROUND
- (void) sprintf(obufout, cpbuf,
- (int) fw, (int) prec, (double) tmpval);
-#else /* GFMT_WORKAROUND */
- if (cs1 == 'g' || cs1 == 'G')
- sgfmt(obufout, cpbuf, (int) alt,
- (int) fw, (int) prec, (double) tmpval);
- else
- (void) sprintf(obufout, cpbuf,
- (int) fw, (int) prec, (double) tmpval);
-#endif /* GFMT_WORKAROUND */
- len = strlen(obufout);
- ofre -= len;
- obufout += len;
- s0 = s1;
- break;
- default:
- break;
- }
- if (toofew)
- fatal("%s\n\t%s\n\t%*s%s",
- "not enough arguments to satisfy format string",
- fmt_string, s1 - fmt_string - 2, "",
- "^ ran out for this one"
- );
- }
- if (do_lint && carg != NULL)
- warning("too many arguments supplied for format string");
- bchunk(s0, s1 - s0);
- r = make_str_node(obuf, obufout - obuf, ALREADY_MALLOCED);
- r->flags |= TEMP;
- return r;
-}
-
-NODE *
-do_sprintf(tree)
-NODE *tree;
-{
- NODE *r;
- NODE *sfmt = force_string(tree_eval(tree->lnode));
-
- r = format_tree(sfmt->stptr, sfmt->stlen, tree->rnode);
- free_temp(sfmt);
- return r;
-}
-
-
-void
-do_printf(tree)
-register NODE *tree;
-{
- struct redirect *rp = NULL;
- register FILE *fp;
-
- if (tree->rnode) {
- int errflg; /* not used, sigh */
-
- rp = redirect(tree->rnode, &errflg);
- if (rp) {
- fp = rp->fp;
- if (!fp)
- return;
- } else
- return;
- } else
- fp = stdout;
- tree = do_sprintf(tree->lnode);
- efwrite(tree->stptr, sizeof(char), tree->stlen, fp, "printf", rp , 1);
- free_temp(tree);
-}
-
-NODE *
-do_sqrt(tree)
-NODE *tree;
-{
- NODE *tmp;
- double arg;
- extern double sqrt P((double));
-
- tmp = tree_eval(tree->lnode);
- arg = (double) force_number(tmp);
- free_temp(tmp);
- if (arg < 0.0)
- warning("sqrt called with negative argument %g", arg);
- return tmp_number((AWKNUM) sqrt(arg));
-}
-
-NODE *
-do_substr(tree)
-NODE *tree;
-{
- NODE *t1, *t2, *t3;
- NODE *r;
- register int indx;
- size_t length;
- int is_long;
-
- t1 = tree_eval(tree->lnode);
- t2 = tree_eval(tree->rnode->lnode);
- if (tree->rnode->rnode == NULL) /* third arg. missing */
- length = t1->stlen;
- else {
- t3 = tree_eval(tree->rnode->rnode->lnode);
- length = (size_t) force_number(t3);
- free_temp(t3);
- }
- indx = (int) force_number(t2) - 1;
- free_temp(t2);
- t1 = force_string(t1);
- if (indx < 0)
- indx = 0;
- if (indx >= t1->stlen || (long) length <= 0) {
- free_temp(t1);
- return Nnull_string;
- }
- if ((is_long = (indx + length > t1->stlen)) || LONG_MAX - indx < length) {
- length = t1->stlen - indx;
- if (do_lint && is_long)
- warning("substr: length %d at position %d exceeds length of first argument",
- length, indx+1);
- }
- r = tmp_string(t1->stptr + indx, length);
- free_temp(t1);
- return r;
-}
-
-NODE *
-do_strftime(tree)
-NODE *tree;
-{
- NODE *t1, *t2;
- struct tm *tm;
- time_t fclock;
- char buf[100];
-
- t1 = force_string(tree_eval(tree->lnode));
-
- if (tree->rnode == NULL) /* second arg. missing, default */
- (void) time(&fclock);
- else {
- t2 = tree_eval(tree->rnode->lnode);
- fclock = (time_t) force_number(t2);
- free_temp(t2);
- }
- tm = localtime(&fclock);
-
- return tmp_string(buf, strftime(buf, 100, t1->stptr, tm));
-}
-
-NODE *
-do_systime(tree)
-NODE *tree;
-{
- time_t lclock;
-
- (void) time(&lclock);
- return tmp_number((AWKNUM) lclock);
-}
-
-NODE *
-do_system(tree)
-NODE *tree;
-{
- NODE *tmp;
- int ret = 0;
- char *cmd;
- char save;
-
- (void) flush_io (); /* so output is synchronous with gawk's */
- tmp = tree_eval(tree->lnode);
- cmd = force_string(tmp)->stptr;
-
- if (cmd && *cmd) {
-		/* ensure arg to system is zero-terminated */
-
- /*
- * From: David Trueman <emory!cs.dal.ca!david>
- * To: arnold@cc.gatech.edu (Arnold Robbins)
- * Date: Wed, 3 Nov 1993 12:49:41 -0400
- *
- * It may not be necessary to save the character, but
- * I'm not sure. It would normally be the field
- * separator. If the parse has not yet gone beyond
- * that, it could mess up (although I doubt it). If
- * FIELDWIDTHS is being used, it might be the first
- * character of the next field. Unless someone wants
- * to check it out exhaustively, I suggest saving it
- * for now...
- */
- save = cmd[tmp->stlen];
- cmd[tmp->stlen] = '\0';
-
- ret = system(cmd);
- ret = (ret >> 8) & 0xff;
-
- cmd[tmp->stlen] = save;
- }
- free_temp(tmp);
- return tmp_number((AWKNUM) ret);
-}
-
-extern NODE **fmt_list; /* declared in eval.c */
-
-void
-do_print(tree)
-register NODE *tree;
-{
- register NODE *t1;
- struct redirect *rp = NULL;
- register FILE *fp;
- register char *s;
-
- if (tree->rnode) {
- int errflg; /* not used, sigh */
-
- rp = redirect(tree->rnode, &errflg);
- if (rp) {
- fp = rp->fp;
- if (!fp)
- return;
- } else
- return;
- } else
- fp = stdout;
- tree = tree->lnode;
- while (tree) {
- t1 = tree_eval(tree->lnode);
- if (t1->flags & NUMBER) {
- if (OFMTidx == CONVFMTidx)
- (void) force_string(t1);
- else {
-#ifndef GFMT_WORKAROUND
- char buf[100];
-
- (void) sprintf(buf, OFMT, t1->numbr);
- free_temp(t1);
- t1 = tmp_string(buf, strlen(buf));
-#else /* GFMT_WORKAROUND */
- free_temp(t1);
- t1 = format_tree(OFMT,
- fmt_list[OFMTidx]->stlen,
- tree);
-#endif /* GFMT_WORKAROUND */
- }
- }
- efwrite(t1->stptr, sizeof(char), t1->stlen, fp, "print", rp, 0);
- free_temp(t1);
- tree = tree->rnode;
- if (tree) {
- s = OFS;
- if (OFSlen)
- efwrite(s, sizeof(char), (size_t)OFSlen,
- fp, "print", rp, 0);
- }
- }
- s = ORS;
- if (ORSlen)
- efwrite(s, sizeof(char), (size_t)ORSlen, fp, "print", rp, 1);
-}
-
-NODE *
-do_tolower(tree)
-NODE *tree;
-{
- NODE *t1, *t2;
- register char *cp, *cp2;
-
- t1 = tree_eval(tree->lnode);
- t1 = force_string(t1);
- t2 = tmp_string(t1->stptr, t1->stlen);
- for (cp = t2->stptr, cp2 = t2->stptr + t2->stlen; cp < cp2; cp++)
- if (isupper(*cp))
- *cp = tolower(*cp);
- free_temp(t1);
- return t2;
-}
-
-NODE *
-do_toupper(tree)
-NODE *tree;
-{
- NODE *t1, *t2;
- register char *cp;
-
- t1 = tree_eval(tree->lnode);
- t1 = force_string(t1);
- t2 = tmp_string(t1->stptr, t1->stlen);
- for (cp = t2->stptr; cp < t2->stptr + t2->stlen; cp++)
- if (islower(*cp))
- *cp = toupper(*cp);
- free_temp(t1);
- return t2;
-}
-
-NODE *
-do_atan2(tree)
-NODE *tree;
-{
- NODE *t1, *t2;
- extern double atan2 P((double, double));
- double d1, d2;
-
- t1 = tree_eval(tree->lnode);
- t2 = tree_eval(tree->rnode->lnode);
- d1 = force_number(t1);
- d2 = force_number(t2);
- free_temp(t1);
- free_temp(t2);
- return tmp_number((AWKNUM) atan2(d1, d2));
-}
-
-NODE *
-do_sin(tree)
-NODE *tree;
-{
- NODE *tmp;
- extern double sin P((double));
- double d;
-
- tmp = tree_eval(tree->lnode);
- d = sin((double)force_number(tmp));
- free_temp(tmp);
- return tmp_number((AWKNUM) d);
-}
-
-NODE *
-do_cos(tree)
-NODE *tree;
-{
- NODE *tmp;
- extern double cos P((double));
- double d;
-
- tmp = tree_eval(tree->lnode);
- d = cos((double)force_number(tmp));
- free_temp(tmp);
- return tmp_number((AWKNUM) d);
-}
-
-static int firstrand = 1;
-static char state[512];
-
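-/*
- * Return a random number between 0 and 1; random() yields a long in
- * [0, GAWK_RANDOM_MAX] and the cast to AWKNUM is applied before the
- * division, so the result is computed in floating point.
- */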
-/* ARGSUSED */
-NODE *
-do_rand(tree)
-NODE *tree;
-{
- if (firstrand) {
- (void) initstate((unsigned long) 1, state, sizeof state);
- srandom(1);
- firstrand = 0;
- }
- return tmp_number((AWKNUM) random() / GAWK_RANDOM_MAX);
-}
-
-NODE *
-do_srand(tree)
-NODE *tree;
-{
- NODE *tmp;
- static long save_seed = 0;
- long ret = save_seed; /* SVR4 awk srand returns previous seed */
-
- if (firstrand)
- (void) initstate((unsigned long) 1, state, sizeof state);
- else
- (void) setstate(state);
-
- if (!tree)
- srandom((unsigned long) (save_seed = (long) time((time_t *) 0)));
- else {
- tmp = tree_eval(tree->lnode);
- srandom((unsigned long) (save_seed = (long) force_number(tmp)));
- free_temp(tmp);
- }
- firstrand = 0;
- return tmp_number((AWKNUM) ret);
-}
-
-NODE *
-do_match(tree)
-NODE *tree;
-{
- NODE *t1;
- int rstart;
- AWKNUM rlength;
- Regexp *rp;
-
- t1 = force_string(tree_eval(tree->lnode));
- tree = tree->rnode->lnode;
- rp = re_update(tree);
- rstart = research(rp, t1->stptr, 0, t1->stlen, 1);
-	if (rstart >= 0) {	/* match succeeded */
- rstart++; /* 1-based indexing */
- rlength = REEND(rp, t1->stptr) - RESTART(rp, t1->stptr);
- } else { /* match failed */
- rstart = 0;
- rlength = -1.0;
- }
- free_temp(t1);
- unref(RSTART_node->var_value);
- RSTART_node->var_value = make_number((AWKNUM) rstart);
- unref(RLENGTH_node->var_value);
- RLENGTH_node->var_value = make_number(rlength);
- return tmp_number((AWKNUM) rstart);
-}
-
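-/*
- * sub_common --- shared implementation of sub() and gsub(); `global'
- * selects whether every match or only the first one is replaced.
- */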
-static NODE *
-sub_common(tree, global)
-NODE *tree;
-int global;
-{
- register char *scan;
- register char *bp, *cp;
- char *buf;
- size_t buflen;
- register char *matchend;
- register size_t len;
- char *matchstart;
- char *text;
- size_t textlen;
- char *repl;
- char *replend;
- size_t repllen;
- int sofar;
- int ampersands;
- int matches = 0;
- Regexp *rp;
- NODE *s; /* subst. pattern */
- NODE *t; /* string to make sub. in; $0 if none given */
- NODE *tmp;
- NODE **lhs = &tree; /* value not used -- just different from NULL */
- int priv = 0;
- Func_ptr after_assign = NULL;
-
- tmp = tree->lnode;
- rp = re_update(tmp);
-
- tree = tree->rnode;
- s = tree->lnode;
-
- tree = tree->rnode;
- tmp = tree->lnode;
- t = force_string(tree_eval(tmp));
-
- /* do the search early to avoid work on non-match */
- if (research(rp, t->stptr, 0, t->stlen, 1) == -1 ||
- RESTART(rp, t->stptr) > t->stlen) {
- free_temp(t);
- return tmp_number((AWKNUM) 0.0);
- }
-
- if (tmp->type == Node_val)
- lhs = NULL;
- else
- lhs = get_lhs(tmp, &after_assign);
- t->flags |= STRING;
- /*
- * create a private copy of the string
- */
- if (t->stref > 1 || (t->flags & PERM)) {
- unsigned int saveflags;
-
- saveflags = t->flags;
- t->flags &= ~MALLOC;
- tmp = dupnode(t);
- t->flags = saveflags;
- t = tmp;
- priv = 1;
- }
- text = t->stptr;
- textlen = t->stlen;
- buflen = textlen + 2;
-
- s = force_string(tree_eval(s));
- repl = s->stptr;
- replend = repl + s->stlen;
- repllen = replend - repl;
- emalloc(buf, char *, buflen + 2, "do_sub");
- buf[buflen] = '\0';
- buf[buflen + 1] = '\0';
- ampersands = 0;
- for (scan = repl; scan < replend; scan++) {
- if (*scan == '&') {
- repllen--;
- ampersands++;
- } else if (*scan == '\\' && *(scan+1) == '&') {
- repllen--;
- scan++;
- }
- }
-
- bp = buf;
- for (;;) {
- matches++;
- matchstart = t->stptr + RESTART(rp, t->stptr);
- matchend = t->stptr + REEND(rp, t->stptr);
-
- /*
- * create the result, copying in parts of the original
- * string
- */
- len = matchstart - text + repllen
- + ampersands * (matchend - matchstart);
- sofar = bp - buf;
- while ((long)(buflen - sofar - len - 1) < 0) {
- buflen *= 2;
- erealloc(buf, char *, buflen, "do_sub");
- bp = buf + sofar;
- }
- for (scan = text; scan < matchstart; scan++)
- *bp++ = *scan;
- for (scan = repl; scan < replend; scan++)
- if (*scan == '&')
- for (cp = matchstart; cp < matchend; cp++)
- *bp++ = *cp;
- else if (*scan == '\\' && *(scan+1) == '&') {
- scan++;
- *bp++ = *scan;
- } else
- *bp++ = *scan;
-
- /* catch the case of gsub(//, "blah", whatever), i.e. empty regexp */
- if (global && matchstart == matchend && matchend < text + textlen) {
- *bp++ = *matchend;
- matchend++;
- }
- textlen = text + textlen - matchend;
- text = matchend;
- if (!global || (long)textlen <= 0 ||
- research(rp, t->stptr, text-t->stptr, textlen, 1) == -1)
- break;
- }
- sofar = bp - buf;
- if (buflen - sofar - textlen - 1) {
- buflen = sofar + textlen + 2;
- erealloc(buf, char *, buflen, "do_sub");
- bp = buf + sofar;
- }
- for (scan = matchend; scan < text + textlen; scan++)
- *bp++ = *scan;
- *bp = '\0';
- textlen = bp - buf;
- free(t->stptr);
- t->stptr = buf;
- t->stlen = textlen;
-
- free_temp(s);
- if (matches > 0 && lhs) {
- if (priv) {
- unref(*lhs);
- *lhs = t;
- }
- if (after_assign)
- (*after_assign)();
- t->flags &= ~(NUM|NUMBER);
- }
- return tmp_number((AWKNUM) matches);
-}
-
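Before copying anything, sub_common() above prices out the replacement text: each unescaped '&' expands to the matched text, while "\&" shrinks to a literal '&', so one substitution contributes repllen + ampersands * (matchend - matchstart) characters to the output. A simplified, single-substitution sketch of that accounting (the function name is invented here):

#include <stddef.h>
#include <stdio.h>

/* Length that REPL expands to when '&' stands for a match of MATCHLEN
 * characters and "\&" stands for a literal '&'. */
static size_t
repl_expanded_len(const char *repl, size_t matchlen)
{
	size_t len = 0;
	const char *p;

	for (p = repl; *p != '\0'; p++) {
		if (*p == '&')
			len += matchlen;	/* '&' -> matched text */
		else if (*p == '\\' && p[1] == '&') {
			len++;			/* "\&" -> literal '&' */
			p++;
		} else
			len++;			/* ordinary character */
	}
	return len;
}

int
main(void)
{
	/* The replacement "<&>" applied to a 3-character match needs 5 chars. */
	printf("%zu\n", repl_expanded_len("<&>", 3));
	return 0;
}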
-NODE *
-do_gsub(tree)
-NODE *tree;
-{
- return sub_common(tree, 1);
-}
-
-NODE *
-do_sub(tree)
-NODE *tree;
-{
- return sub_common(tree, 0);
-}
-
-#ifdef GFMT_WORKAROUND
-/*
- * printf's %g format [can't rely on gcvt()]
- * caveat: don't use as argument to *printf()!
- * 'format' string HAS to be of "<flags>*.*g" kind, or we bomb!
- */
-static void
-sgfmt(buf, format, alt, fwidth, prec, g)
-char *buf; /* return buffer; assumed big enough to hold result */
-const char *format;
-int alt; /* use alternate form flag */
-int fwidth; /* field width in a format */
-int prec; /* indicates desired significant digits, not decimal places */
-double g; /* value to format */
-{
- char dform[40];
- register char *gpos;
- register char *d, *e, *p;
- int again = 0;
-
- strncpy(dform, format, sizeof dform - 1);
- dform[sizeof dform - 1] = '\0';
- gpos = strrchr(dform, '.');
-
- if (g == 0.0 && alt == 0) { /* easy special case */
- *gpos++ = 'd';
- *gpos = '\0';
- (void) sprintf(buf, dform, fwidth, 0);
- return;
- }
- gpos += 2; /* advance to location of 'g' in the format */
-
- if (prec <= 0) /* negative precision is ignored */
- prec = (prec < 0 ? DEFAULT_G_PRECISION : 1);
-
- if (*gpos == 'G')
- again = 1;
- /* start with 'e' format (it'll provide nice exponent) */
- *gpos = 'e';
- prec -= 1;
- (void) sprintf(buf, dform, fwidth, prec, g);
- if ((e = strrchr(buf, 'e')) != NULL) { /* find exponent */
- int exp = atoi(e+1); /* fetch exponent */
- if (exp >= -4 && exp <= prec) { /* per K&R2, B1.2 */
- /* switch to 'f' format and re-do */
- *gpos = 'f';
- prec -= exp; /* decimal precision */
- (void) sprintf(buf, dform, fwidth, prec, g);
- e = buf + strlen(buf);
- while (*--e == ' ')
- continue;
- e += 1;
- }
- else if (again != 0)
- *gpos = 'E';
-
-	/* if 'alt' is in force, then trailing zeros are not removed */
- if (alt == 0 && (d = strrchr(buf, '.')) != NULL) {
- /* throw away an excess of precision */
- for (p = e; p > d && *--p == '0'; )
- prec -= 1;
- if (d == p)
- prec -= 1;
- if (prec < 0)
- prec = 0;
- /* and do that once again */
- again = 1;
- }
- if (again != 0)
- (void) sprintf(buf, dform, fwidth, prec, g);
- }
-}
-#endif /* GFMT_WORKAROUND */
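The workaround above builds %g by hand: format with %e first, read back the exponent, and refit with %f when that exponent falls in the range K&R2 section B1.2 assigns to fixed notation (at least -4 and less than the precision). A small sketch of that selection rule in isolation (illustrative only; link with -lm where needed):

#include <math.h>
#include <stdio.h>

/* The %g rule: use %e if the decimal exponent is less than -4 or at
 * least the precision, otherwise use %f. */
static const char *
g_style(double v, int prec)
{
	int exp10;

	if (v == 0.0)
		return "f";
	exp10 = (int) floor(log10(fabs(v)));
	return (exp10 < -4 || exp10 >= prec) ? "e" : "f";
}

int
main(void)
{
	printf("%s %s %s\n",
	    g_style(0.0001234, 6),	/* f: exponent -4 */
	    g_style(0.00001234, 6),	/* e: exponent -5 */
	    g_style(1234567.0, 6));	/* e: exponent 6 >= precision */
	printf("%g %g %g\n", 0.0001234, 0.00001234, 1234567.0);
	return 0;
}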
diff --git a/gnu/usr.bin/awk/config.h b/gnu/usr.bin/awk/config.h
deleted file mode 100644
index 601f483..0000000
--- a/gnu/usr.bin/awk/config.h
+++ /dev/null
@@ -1,306 +0,0 @@
-/*
- * config.h -- configuration definitions for gawk.
- *
- * For generic 4.4 alpha
- */
-
-/*
- * Copyright (C) 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2, or (at your option)
- * any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-/*
- * This file isolates configuration dependencies for gnu awk.
- * You should know something about your system, perhaps by having
- * a manual handy, when you edit this file. You should copy config.h-dist
- * to config.h, and edit config.h. Do not modify config.h-dist, so that
- * it will be easy to apply any patches that may be distributed.
- *
- * The general idea is that systems conforming to the various standards
- * should need to do the least amount of changing. Defining the various
- * items in this file usually means that your system is missing that
- * particular feature.
- *
- * The order of preference in standard conformance is ANSI C, POSIX,
- * and the SVID.
- *
- * If you have no clue as to what's going on with your system, try
- * compiling gawk without editing this file and see what shows up
- * missing in the link stage. From there, you can probably figure out
- * which defines to turn on.
- */
-
-/**************************/
-/* Miscellaneous features */
-/**************************/
-
-/*
- * BLKSIZE_MISSING
- *
- * Check your /usr/include/sys/stat.h file. If the stat structure
- * does not have a member named st_blksize, define this. (This will
- * most likely be the case on most System V systems prior to V.4.)
- */
-/* #define BLKSIZE_MISSING 1 */
-
-/*
- * SIGTYPE
- *
- * The return type of the routines passed to the signal function.
- * Modern systems use `void', older systems use `int'.
- * If left undefined, it will default to void.
- */
-/* #define SIGTYPE int */
-
-/*
- * SIZE_T_MISSING
- *
- * If your system has no typedef for size_t, define this to get a default
- */
-/* #define SIZE_T_MISSING 1 */
-
-/*
- * CHAR_UNSIGNED
- *
- * If your machine uses unsigned characters (IBM RT and RS/6000 and others)
- * then define this for use in regex.c
- */
-/* #define CHAR_UNSIGNED 1 */
-
-/*
- * HAVE_UNDERSCORE_SETJMP
- *
- * Check in your /usr/include/setjmp.h file. If there are routines
- * there named _setjmp and _longjmp, then you should define this.
- * Typically only systems derived from Berkeley Unix have this.
- */
-#define HAVE_UNDERSCORE_SETJMP 1
-
-/*
- * LIMITS_H_MISSING
- *
- * You don't have a <limits.h> include file.
- */
-/* #define LIMITS_H_MISSING 1 */
-
-/***********************************************/
-/* Missing library subroutines or system calls */
-/***********************************************/
-
-/*
- * MEMCMP_MISSING
- * MEMCPY_MISSING
- * MEMSET_MISSING
- *
- * These three routines are for manipulating blocks of memory. Most
- * likely they will either all three be present or all three be missing,
- * so they're grouped together.
- */
-/* #define MEMCMP_MISSING 1 */
-/* #define MEMCPY_MISSING 1 */
-/* #define MEMSET_MISSING 1 */
-
-/*
- * RANDOM_MISSING
- *
- * Your system does not have the random(3) suite of random number
- * generating routines. These are different than the old rand(3)
- * routines!
- */
-/* #define RANDOM_MISSING 1 */
-
-/*
- * STRCASE_MISSING
- *
- * Your system does not have the strcasecmp() and strncasecmp()
- * routines that originated in Berkeley Unix.
- */
-/* #define STRCASE_MISSING 1 */
-
-/*
- * STRCHR_MISSING
- *
- * Your system does not have the strchr() and strrchr() functions.
- */
-/* #define STRCHR_MISSING 1 */
-
-/*
- * STRERROR_MISSING
- *
- * Your system lacks the ANSI C strerror() routine for returning the
- * strings associated with errno values.
- */
-/* #define STRERROR_MISSING 1 */
-
-/*
- * STRTOD_MISSING
- *
- * Your system does not have the strtod() routine for converting
- * strings to double precision floating point values.
- */
-/* #define STRTOD_MISSING 1 */
-
-/*
- * STRFTIME_MISSING
- *
- * Your system lacks the ANSI C strftime() routine for formatting
- * broken down time values.
- */
-/* #define STRFTIME_MISSING 1 */
-
-/*
- * TZSET_MISSING
- *
- * If you have a 4.2 BSD vintage system, then the strftime() routine
- * supplied in the missing directory won't be enough, because it relies on the
- * tzset() routine from System V / Posix. Fortunately, there is an
- * emulation for tzset() too that should do the trick. If you don't
- * have tzset(), define this.
- */
-/* #define TZSET_MISSING 1 */
-
-/*
- * TZNAME_MISSING
- *
- * Some systems do not support the external variables tzname and daylight.
- * If this is the case *and* strftime() is missing, define this.
- */
-/* #define TZNAME_MISSING 1 */
-
-/*
- * TM_ZONE_MISSING
- *
- * Your "struct tm" is missing the tm_zone field.
- * If this is the case *and* strftime() is missing *and* tzname is missing,
- * define this.
- */
-/* #define TM_ZONE_MISSING 1 */
-
-/*
- * STDC_HEADERS
- *
- * If your system does have ANSI compliant header files that
- * provide prototypes for library routines, then define this.
- */
-#define STDC_HEADERS 1
-
-/*
- * NO_TOKEN_PASTING
- *
- * If your compiler defines __STDC__ but does not support token
- * pasting (tok##tok), then define this.
- */
-/* #define NO_TOKEN_PASTING 1 */
-
-/*****************************************************************/
-/* Stuff related to the Standard I/O Library. */
-/*****************************************************************/
-/* Much of this is (still, unfortunately) black magic in nature. */
-/* You may have to use some or all of these together to get gawk */
-/* to work correctly. */
-/*****************************************************************/
-
-/*
- * NON_STD_SPRINTF
- *
- * Look in your /usr/include/stdio.h file. If the return type of the
- * sprintf() function is NOT `int', define this.
- */
-/* #define NON_STD_SPRINTF 1 */
-
-/*
- * VPRINTF_MISSING
- *
- * Define this if your system lacks vprintf() and the other routines
- * that go with it. This will trigger an attempt to use _doprnt().
- * If you don't have that, this attempt will fail and you are on your own.
- */
-/* #define VPRINTF_MISSING 1 */
-
-/*
- * Casts from size_t to int and back. These will become unnecessary
- * at some point in the future, but for now are required where the
- * two types have different representations.
- */
-/* #define SZTC */
-/* #define INTC */
-
-/*
- * SYSTEM_MISSING
- *
- * Define this if your library does not provide a system function
- * or you are not entirely happy with it and would rather use
- * a provided replacement (atari only).
- */
-/* #define SYSTEM_MISSING 1 */
-
-/*
- * FMOD_MISSING
- *
- * Define this if your system lacks the fmod() function and modf() will
- * be used instead.
- */
-/* #define FMOD_MISSING 1 */
-
-
-/*******************************/
-/* Gawk configuration options. */
-/*******************************/
-
-/*
- * DEFPATH
- *
- * The default search path for the -f option of gawk. It is used
- * if the AWKPATH environment variable is undefined. The default
- * definition is provided here. Most likely you should not change
- * this.
- */
-
-/* #define DEFPATH ".:/usr/lib/awk:/usr/local/lib/awk" */
-/* #define ENVSEP ':' */
-
-/*
- * alloca already has a prototype defined - don't redefine it
- */
-#define ALLOCA_PROTO 1
-
-/*
- * srandom already has a prototype defined - don't redefine it
- */
-#define SRANDOM_PROTO 1
-
-/*
- * getpgrp() in sysvr4 and POSIX takes no argument
- */
-/* #define GETPGRP_NOARG 0 */
-
-/*
- * define const to nothing if not __STDC__
- */
-#ifndef __STDC__
-#define const
-#endif
-
-/* If svr4 and not gcc */
-/* #define SVR4 0 */
-#ifdef SVR4
-#define __svr4__ 1
-#endif
-
-/* anything that follows is for system-specific short-term kludges */
diff --git a/gnu/usr.bin/awk/dfa.c b/gnu/usr.bin/awk/dfa.c
deleted file mode 100644
index e9c832b..0000000
--- a/gnu/usr.bin/awk/dfa.c
+++ /dev/null
@@ -1,2613 +0,0 @@
-/* dfa.c - deterministic extended regexp routines for GNU
- Copyright (C) 1988 Free Software Foundation, Inc.
-
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2, or (at your option)
- any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
-
-/* Written June, 1988 by Mike Haertel
- Modified July, 1988 by Arthur David Olson to assist BMG speedups */
-
-#include <assert.h>
-#include <ctype.h>
-#include <stdio.h>
-
-#ifdef HAVE_CONFIG_H
-#include "config.h"
-#endif
-
-#ifdef STDC_HEADERS
-#include <stdlib.h>
-#else
-#include <sys/types.h>
-extern char *calloc(), *malloc(), *realloc();
-extern void free();
-#endif
-
-#if defined(HAVE_STRING_H) || defined(STDC_HEADERS)
-#include <string.h>
-#undef index
-#define index strchr
-#else
-#include <strings.h>
-#endif
-
-#ifndef DEBUG /* use the same approach as regex.c */
-#undef assert
-#define assert(e)
-#endif /* DEBUG */
-
-#ifndef isgraph
-#define isgraph(C) (isprint(C) && !isspace(C))
-#endif
-
-#define ISALPHA(C) isalpha(C)
-#define ISUPPER(C) isupper(C)
-#define ISLOWER(C) islower(C)
-#define ISDIGIT(C) isdigit(C)
-#define ISXDIGIT(C) isxdigit(C)
-#define ISSPACE(C) isspace(C)
-#define ISPUNCT(C) ispunct(C)
-#define ISALNUM(C) isalnum(C)
-#define ISPRINT(C) isprint(C)
-#define ISGRAPH(C) isgraph(C)
-#define ISCNTRL(C) iscntrl(C)
-
-#include "gnuregex.h"
-#include "dfa.h"
-
-#ifdef __STDC__
-typedef void *ptr_t;
-#else
-typedef char *ptr_t;
-#ifndef const
-#define const
-#endif
-#endif
-
-static void dfamust _RE_ARGS((struct dfa *dfa));
-
-static ptr_t xcalloc _RE_ARGS((size_t n, size_t s));
-static ptr_t xmalloc _RE_ARGS((size_t n));
-static ptr_t xrealloc _RE_ARGS((ptr_t p, size_t n));
-#ifdef DEBUG
-static void prtok _RE_ARGS((token t));
-#endif
-static int tstbit _RE_ARGS((int b, charclass c));
-static void setbit _RE_ARGS((int b, charclass c));
-static void clrbit _RE_ARGS((int b, charclass c));
-static void copyset _RE_ARGS((charclass src, charclass dst));
-static void zeroset _RE_ARGS((charclass s));
-static void notset _RE_ARGS((charclass s));
-static int equal _RE_ARGS((charclass s1, charclass s2));
-static int charclass_index _RE_ARGS((charclass s));
-static int looking_at _RE_ARGS((const char *s));
-static token lex _RE_ARGS((void));
-static void addtok _RE_ARGS((token t));
-static void atom _RE_ARGS((void));
-static int nsubtoks _RE_ARGS((int tindex));
-static void copytoks _RE_ARGS((int tindex, int ntokens));
-static void closure _RE_ARGS((void));
-static void branch _RE_ARGS((void));
-static void regexp _RE_ARGS((int toplevel));
-static void copy _RE_ARGS((position_set *src, position_set *dst));
-static void insert _RE_ARGS((position p, position_set *s));
-static void merge _RE_ARGS((position_set *s1, position_set *s2, position_set *m));
-static void delete _RE_ARGS((position p, position_set *s));
-static int state_index _RE_ARGS((struct dfa *d, position_set *s,
- int newline, int letter));
-static void build_state _RE_ARGS((int s, struct dfa *d));
-static void build_state_zero _RE_ARGS((struct dfa *d));
-static char *icatalloc _RE_ARGS((char *old, char *new));
-static char *icpyalloc _RE_ARGS((char *string));
-static char *istrstr _RE_ARGS((char *lookin, char *lookfor));
-static void ifree _RE_ARGS((char *cp));
-static void freelist _RE_ARGS((char **cpp));
-static char **enlist _RE_ARGS((char **cpp, char *new, size_t len));
-static char **comsubs _RE_ARGS((char *left, char *right));
-static char **addlists _RE_ARGS((char **old, char **new));
-static char **inboth _RE_ARGS((char **left, char **right));
-
-#ifdef __FreeBSD__
-static int collate_range_cmp (a, b)
- int a, b;
-{
- int r;
- static char s[2][2];
-
- if ((unsigned char)a == (unsigned char)b)
- return 0;
- s[0][0] = a;
- s[1][0] = b;
- if ((r = strcoll(s[0], s[1])) == 0)
- r = (unsigned char)a - (unsigned char)b;
- return r;
-}
-#endif
-
-static ptr_t
-xcalloc(n, s)
- size_t n;
- size_t s;
-{
- ptr_t r = calloc(n, s);
-
- if (!r)
- dfaerror("Memory exhausted");
- return r;
-}
-
-static ptr_t
-xmalloc(n)
- size_t n;
-{
- ptr_t r = malloc(n);
-
- assert(n != 0);
- if (!r)
- dfaerror("Memory exhausted");
- return r;
-}
-
-static ptr_t
-xrealloc(p, n)
- ptr_t p;
- size_t n;
-{
- ptr_t r = realloc(p, n);
-
- assert(n != 0);
- if (!r)
- dfaerror("Memory exhausted");
- return r;
-}
-
-#define CALLOC(p, t, n) ((p) = (t *) xcalloc((size_t)(n), sizeof (t)))
-#define MALLOC(p, t, n) ((p) = (t *) xmalloc((n) * sizeof (t)))
-#define REALLOC(p, t, n) ((p) = (t *) xrealloc((ptr_t) (p), (n) * sizeof (t)))
-
-/* Reallocate an array of type t if nalloc is too small for index. */
-#define REALLOC_IF_NECESSARY(p, t, nalloc, index) \
- if ((index) >= (nalloc)) \
- { \
- while ((index) >= (nalloc)) \
- (nalloc) *= 2; \
- REALLOC(p, t, nalloc); \
- }
-
-#ifdef DEBUG
-
-static void
-prtok(t)
- token t;
-{
- char *s;
-
- if (t < 0)
- fprintf(stderr, "END");
- else if (t < NOTCHAR)
- fprintf(stderr, "%c", t);
- else
- {
- switch (t)
- {
- case EMPTY: s = "EMPTY"; break;
- case BACKREF: s = "BACKREF"; break;
- case BEGLINE: s = "BEGLINE"; break;
- case ENDLINE: s = "ENDLINE"; break;
- case BEGWORD: s = "BEGWORD"; break;
- case ENDWORD: s = "ENDWORD"; break;
- case LIMWORD: s = "LIMWORD"; break;
- case NOTLIMWORD: s = "NOTLIMWORD"; break;
- case QMARK: s = "QMARK"; break;
- case STAR: s = "STAR"; break;
- case PLUS: s = "PLUS"; break;
- case CAT: s = "CAT"; break;
- case OR: s = "OR"; break;
- case ORTOP: s = "ORTOP"; break;
- case LPAREN: s = "LPAREN"; break;
- case RPAREN: s = "RPAREN"; break;
- default: s = "CSET"; break;
- }
- fprintf(stderr, "%s", s);
- }
-}
-#endif /* DEBUG */
-
-/* Stuff pertaining to charclasses. */
-
-static int
-tstbit(b, c)
- int b;
- charclass c;
-{
- return c[b / INTBITS] & 1 << b % INTBITS;
-}
-
-static void
-setbit(b, c)
- int b;
- charclass c;
-{
- c[b / INTBITS] |= 1 << b % INTBITS;
-}
-
-static void
-clrbit(b, c)
- int b;
- charclass c;
-{
- c[b / INTBITS] &= ~(1 << b % INTBITS);
-}
-
-static void
-copyset(src, dst)
- charclass src;
- charclass dst;
-{
- int i;
-
- for (i = 0; i < CHARCLASS_INTS; ++i)
- dst[i] = src[i];
-}
-
-static void
-zeroset(s)
- charclass s;
-{
- int i;
-
- for (i = 0; i < CHARCLASS_INTS; ++i)
- s[i] = 0;
-}
-
-static void
-notset(s)
- charclass s;
-{
- int i;
-
- for (i = 0; i < CHARCLASS_INTS; ++i)
- s[i] = ~s[i];
-}
-
-static int
-equal(s1, s2)
- charclass s1;
- charclass s2;
-{
- int i;
-
- for (i = 0; i < CHARCLASS_INTS; ++i)
- if (s1[i] != s2[i])
- return 0;
- return 1;
-}
-
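The charclass primitives above pack a 256-character set into an array of ints, one bit per character, so membership tests and set operations reduce to shifts and masks. A self-contained sketch of the same bitset layout (the macro and function names here are invented; the real code uses the charclass type, INTBITS and CHARCLASS_INTS from dfa.h):

#include <limits.h>
#include <stdio.h>

#define NOTCHAR		256
#define INTBITS		(sizeof(unsigned int) * CHAR_BIT)
#define CCL_INTS	((NOTCHAR + INTBITS - 1) / INTBITS)

typedef unsigned int ccl[CCL_INTS];

static void ccl_set(int b, ccl c) { c[b / INTBITS] |= 1u << (b % INTBITS); }
static void ccl_clr(int b, ccl c) { c[b / INTBITS] &= ~(1u << (b % INTBITS)); }
static int  ccl_tst(int b, ccl c) { return (c[b / INTBITS] >> (b % INTBITS)) & 1u; }

int
main(void)
{
	ccl digits = { 0 };
	int ch;

	for (ch = '0'; ch <= '9'; ch++)
		ccl_set(ch, digits);
	ccl_clr('5', digits);
	printf("%d %d %d\n", ccl_tst('3', digits), ccl_tst('5', digits),
	    ccl_tst('a', digits));	/* 1 0 0 */
	return 0;
}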
-/* A pointer to the current dfa is kept here during parsing. */
-static struct dfa *dfa;
-
-/* Find the index of charclass s in dfa->charclasses, or allocate a new charclass. */
-static int
-charclass_index(s)
- charclass s;
-{
- int i;
-
- for (i = 0; i < dfa->cindex; ++i)
- if (equal(s, dfa->charclasses[i]))
- return i;
- REALLOC_IF_NECESSARY(dfa->charclasses, charclass, dfa->calloc, dfa->cindex);
- ++dfa->cindex;
- copyset(s, dfa->charclasses[i]);
- return i;
-}
-
-/* Syntax bits controlling the behavior of the lexical analyzer. */
-static reg_syntax_t syntax_bits, syntax_bits_set;
-
-/* Flag for case-folding letters into sets. */
-static int case_fold;
-
-/* Entry point to set syntax options. */
-void
-dfasyntax(bits, fold)
- reg_syntax_t bits;
- int fold;
-{
- syntax_bits_set = 1;
- syntax_bits = bits;
- case_fold = fold;
-}
-
-/* Lexical analyzer. All the dross that deals with the obnoxious
- GNU Regex syntax bits is located here. The poor, suffering
- reader is referred to the GNU Regex documentation for the
- meaning of the @#%!@#%^!@ syntax bits. */
-
-static char *lexstart; /* Pointer to beginning of input string. */
-static char *lexptr; /* Pointer to next input character. */
-static lexleft; /* Number of characters remaining. */
-static token lasttok; /* Previous token returned; initially END. */
-static int laststart; /* True if we're separated from beginning or (, |
- only by zero-width characters. */
-static int parens; /* Count of outstanding left parens. */
-static int minrep, maxrep; /* Repeat counts for {m,n}. */
-
-/* Note that characters become unsigned here. */
-#define FETCH(c, eoferr) \
- { \
- if (! lexleft) \
- if (eoferr != 0) \
- dfaerror(eoferr); \
- else \
- return lasttok = END; \
- (c) = (unsigned char) *lexptr++; \
- --lexleft; \
- }
-
-#ifdef __STDC__
-#define FUNC(F, P) static int F(int c) { return P(c); }
-#else
-#define FUNC(F, P) static int F(c) int c; { return P(c); }
-#endif
-
-FUNC(is_alpha, ISALPHA)
-FUNC(is_upper, ISUPPER)
-FUNC(is_lower, ISLOWER)
-FUNC(is_digit, ISDIGIT)
-FUNC(is_xdigit, ISXDIGIT)
-FUNC(is_space, ISSPACE)
-FUNC(is_punct, ISPUNCT)
-FUNC(is_alnum, ISALNUM)
-FUNC(is_print, ISPRINT)
-FUNC(is_graph, ISGRAPH)
-FUNC(is_cntrl, ISCNTRL)
-
-/* The following list maps the names of the Posix named character classes
- to predicate functions that determine whether a given character is in
- the class. The leading [ has already been eaten by the lexical analyzer. */
-static struct {
- const char *name;
- int (*pred) _RE_ARGS((int));
-} prednames[] = {
- { ":alpha:]", is_alpha },
- { ":upper:]", is_upper },
- { ":lower:]", is_lower },
- { ":digit:]", is_digit },
- { ":xdigit:]", is_xdigit },
- { ":space:]", is_space },
- { ":punct:]", is_punct },
- { ":alnum:]", is_alnum },
- { ":print:]", is_print },
- { ":graph:]", is_graph },
- { ":cntrl:]", is_cntrl },
- { 0 }
-};
-
-static int
-looking_at(s)
- const char *s;
-{
- size_t len;
-
- len = strlen(s);
- if (lexleft < len)
- return 0;
- return strncmp(s, lexptr, len) == 0;
-}
-
-static token
-lex()
-{
- token c, c1, c2;
- int backslash = 0, invert;
- charclass ccl;
- int i;
-
- /* Basic plan: We fetch a character. If it's a backslash,
- we set the backslash flag and go through the loop again.
- On the plus side, this avoids having a duplicate of the
- main switch inside the backslash case. On the minus side,
- it means that just about every case begins with
- "if (backslash) ...". */
- for (i = 0; i < 2; ++i)
- {
- FETCH(c, 0);
- switch (c)
- {
- case '\\':
- if (backslash)
- goto normal_char;
- if (lexleft == 0)
- dfaerror("Unfinished \\ escape");
- backslash = 1;
- break;
-
- case '^':
- if (backslash)
- goto normal_char;
- if (syntax_bits & RE_CONTEXT_INDEP_ANCHORS
- || lasttok == END
- || lasttok == LPAREN
- || lasttok == OR)
- return lasttok = BEGLINE;
- goto normal_char;
-
- case '$':
- if (backslash)
- goto normal_char;
- if (syntax_bits & RE_CONTEXT_INDEP_ANCHORS
- || lexleft == 0
- || (syntax_bits & RE_NO_BK_PARENS
- ? lexleft > 0 && *lexptr == ')'
- : lexleft > 1 && lexptr[0] == '\\' && lexptr[1] == ')')
- || (syntax_bits & RE_NO_BK_VBAR
- ? lexleft > 0 && *lexptr == '|'
- : lexleft > 1 && lexptr[0] == '\\' && lexptr[1] == '|')
- || ((syntax_bits & RE_NEWLINE_ALT)
- && lexleft > 0 && *lexptr == '\n'))
- return lasttok = ENDLINE;
- goto normal_char;
-
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- case '8':
- case '9':
- if (backslash && !(syntax_bits & RE_NO_BK_REFS))
- {
- laststart = 0;
- return lasttok = BACKREF;
- }
- goto normal_char;
-
- case '<':
- if (syntax_bits & RE_NO_GNU_OPS)
- goto normal_char;
- if (backslash)
- return lasttok = BEGWORD;
- goto normal_char;
-
- case '>':
- if (syntax_bits & RE_NO_GNU_OPS)
- goto normal_char;
- if (backslash)
- return lasttok = ENDWORD;
- goto normal_char;
-
- case 'b':
- if (syntax_bits & RE_NO_GNU_OPS)
- goto normal_char;
- if (backslash)
- return lasttok = LIMWORD;
- goto normal_char;
-
- case 'B':
- if (syntax_bits & RE_NO_GNU_OPS)
- goto normal_char;
- if (backslash)
- return lasttok = NOTLIMWORD;
- goto normal_char;
-
- case '?':
- if (syntax_bits & RE_LIMITED_OPS)
- goto normal_char;
- if (backslash != ((syntax_bits & RE_BK_PLUS_QM) != 0))
- goto normal_char;
- if (!(syntax_bits & RE_CONTEXT_INDEP_OPS) && laststart)
- goto normal_char;
- return lasttok = QMARK;
-
- case '*':
- if (backslash)
- goto normal_char;
- if (!(syntax_bits & RE_CONTEXT_INDEP_OPS) && laststart)
- goto normal_char;
- return lasttok = STAR;
-
- case '+':
- if (syntax_bits & RE_LIMITED_OPS)
- goto normal_char;
- if (backslash != ((syntax_bits & RE_BK_PLUS_QM) != 0))
- goto normal_char;
- if (!(syntax_bits & RE_CONTEXT_INDEP_OPS) && laststart)
- goto normal_char;
- return lasttok = PLUS;
-
- case '{':
- if (!(syntax_bits & RE_INTERVALS))
- goto normal_char;
- if (backslash != ((syntax_bits & RE_NO_BK_BRACES) == 0))
- goto normal_char;
- minrep = maxrep = 0;
- /* Cases:
- {M} - exact count
- {M,} - minimum count, maximum is infinity
- {,M} - 0 through M
- {M,N} - M through N */
- FETCH(c, "unfinished repeat count");
- if (ISDIGIT(c))
- {
- minrep = c - '0';
- for (;;)
- {
- FETCH(c, "unfinished repeat count");
- if (!ISDIGIT(c))
- break;
- minrep = 10 * minrep + c - '0';
- }
- }
- else if (c != ',')
- dfaerror("malformed repeat count");
- if (c == ',')
- for (;;)
- {
- FETCH(c, "unfinished repeat count");
- if (!ISDIGIT(c))
- break;
- maxrep = 10 * maxrep + c - '0';
- }
- else
- maxrep = minrep;
- if (!(syntax_bits & RE_NO_BK_BRACES))
- {
- if (c != '\\')
- dfaerror("malformed repeat count");
- FETCH(c, "unfinished repeat count");
- }
- if (c != '}')
- dfaerror("malformed repeat count");
- laststart = 0;
- return lasttok = REPMN;
-
- case '|':
- if (syntax_bits & RE_LIMITED_OPS)
- goto normal_char;
- if (backslash != ((syntax_bits & RE_NO_BK_VBAR) == 0))
- goto normal_char;
- laststart = 1;
- return lasttok = OR;
-
- case '\n':
- if (syntax_bits & RE_LIMITED_OPS
- || backslash
- || !(syntax_bits & RE_NEWLINE_ALT))
- goto normal_char;
- laststart = 1;
- return lasttok = OR;
-
- case '(':
- if (backslash != ((syntax_bits & RE_NO_BK_PARENS) == 0))
- goto normal_char;
- ++parens;
- laststart = 1;
- return lasttok = LPAREN;
-
- case ')':
- if (backslash != ((syntax_bits & RE_NO_BK_PARENS) == 0))
- goto normal_char;
- if (parens == 0 && syntax_bits & RE_UNMATCHED_RIGHT_PAREN_ORD)
- goto normal_char;
- --parens;
- laststart = 0;
- return lasttok = RPAREN;
-
- case '.':
- if (backslash)
- goto normal_char;
- zeroset(ccl);
- notset(ccl);
- if (!(syntax_bits & RE_DOT_NEWLINE))
- clrbit('\n', ccl);
- if (syntax_bits & RE_DOT_NOT_NULL)
- clrbit('\0', ccl);
- laststart = 0;
- return lasttok = CSET + charclass_index(ccl);
-
- case 'w':
- case 'W':
- if (!backslash || (syntax_bits & RE_NO_GNU_OPS))
- goto normal_char;
- zeroset(ccl);
- for (c2 = 0; c2 < NOTCHAR; ++c2)
- if (ISALNUM(c2))
- setbit(c2, ccl);
- if (c == 'W')
- notset(ccl);
- laststart = 0;
- return lasttok = CSET + charclass_index(ccl);
-
- case '[':
- if (backslash)
- goto normal_char;
- zeroset(ccl);
- FETCH(c, "Unbalanced [");
- if (c == '^')
- {
- FETCH(c, "Unbalanced [");
- invert = 1;
- }
- else
- invert = 0;
- do
- {
- /* Nobody ever said this had to be fast. :-)
- Note that if we're looking at some other [:...:]
- construct, we just treat it as a bunch of ordinary
- characters. We can do this because we assume
- regex has checked for syntax errors before
- dfa is ever called. */
- if (c == '[' && (syntax_bits & RE_CHAR_CLASSES))
- for (c1 = 0; prednames[c1].name; ++c1)
- if (looking_at(prednames[c1].name))
- {
- for (c2 = 0; c2 < NOTCHAR; ++c2)
- if ((*prednames[c1].pred)(c2))
- setbit(c2, ccl);
- lexptr += strlen(prednames[c1].name);
- lexleft -= strlen(prednames[c1].name);
- FETCH(c1, "Unbalanced [");
- goto skip;
- }
- if (c == '\\' && (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS))
- FETCH(c, "Unbalanced [");
- FETCH(c1, "Unbalanced [");
- if (c1 == '-')
- {
- FETCH(c2, "Unbalanced [");
- if (c2 == ']')
- {
- /* In the case [x-], the - is an ordinary hyphen,
- which is left in c1, the lookahead character. */
- --lexptr;
- ++lexleft;
- c2 = c;
- }
- else
- {
- if (c2 == '\\'
- && (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS))
- FETCH(c2, "Unbalanced [");
- FETCH(c1, "Unbalanced [");
- }
- }
- else
- c2 = c;
-#ifdef __FreeBSD__
- { token c3;
-
- if (collate_range_cmp(c, c2) > 0) {
- FETCH(c2, "Invalid range");
- goto skip;
- }
-
- for (c3 = 0; c3 < NOTCHAR; ++c3)
- if ( collate_range_cmp(c, c3) <= 0
- && collate_range_cmp(c3, c2) <= 0
- ) {
- setbit(c3, ccl);
- if (case_fold)
- if (ISUPPER(c3))
- setbit(tolower(c3), ccl);
- else if (ISLOWER(c3))
- setbit(toupper(c3), ccl);
- }
- }
-#else
- while (c <= c2)
- {
- setbit(c, ccl);
- if (case_fold)
- if (ISUPPER(c))
- setbit(tolower(c), ccl);
- else if (ISLOWER(c))
- setbit(toupper(c), ccl);
- ++c;
- }
-#endif
- skip:
- ;
- }
- while ((c = c1) != ']');
- if (invert)
- {
- notset(ccl);
- if (syntax_bits & RE_HAT_LISTS_NOT_NEWLINE)
- clrbit('\n', ccl);
- }
- laststart = 0;
- return lasttok = CSET + charclass_index(ccl);
-
- default:
- normal_char:
- laststart = 0;
- if (case_fold && ISALPHA(c))
- {
- zeroset(ccl);
- setbit(c, ccl);
- if (isupper(c))
- setbit(tolower(c), ccl);
- else
- setbit(toupper(c), ccl);
- return lasttok = CSET + charclass_index(ccl);
- }
- return c;
- }
- }
-
- /* The above loop should consume at most a backslash
- and some other character. */
- abort();
-}
-
-/* Recursive descent parser for regular expressions. */
-
-static token tok; /* Lookahead token. */
-static depth; /* Current depth of a hypothetical stack
- holding deferred productions. This is
- used to determine the depth that will be
- required of the real stack later on in
- dfaanalyze(). */
-
-/* Add the given token to the parse tree, maintaining the depth count and
- updating the maximum depth if necessary. */
-static void
-addtok(t)
- token t;
-{
- REALLOC_IF_NECESSARY(dfa->tokens, token, dfa->talloc, dfa->tindex);
- dfa->tokens[dfa->tindex++] = t;
-
- switch (t)
- {
- case QMARK:
- case STAR:
- case PLUS:
- break;
-
- case CAT:
- case OR:
- case ORTOP:
- --depth;
- break;
-
- default:
- ++dfa->nleaves;
- case EMPTY:
- ++depth;
- break;
- }
- if (depth > dfa->depth)
- dfa->depth = depth;
-}
-
-/* The grammar understood by the parser is as follows.
-
- regexp:
- regexp OR branch
- branch
-
- branch:
- branch closure
- closure
-
- closure:
- closure QMARK
- closure STAR
- closure PLUS
- atom
-
- atom:
- <normal character>
- CSET
- BACKREF
- BEGLINE
- ENDLINE
- BEGWORD
- ENDWORD
- LIMWORD
- NOTLIMWORD
- <empty>
-
- The parser builds a parse tree in postfix form in an array of tokens. */
-
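The parse tree comes out of this parser in postfix: a(b|c)*, for instance, becomes the token stream a b c OR STAR CAT, with every operator following its operands. addtok() tracks how deep an evaluation stack such a stream will need; the sketch below redoes that bookkeeping over a toy token encoding (the enum is invented and unrelated to dfa.h's token values):

#include <stdio.h>

enum { T_STAR = 256, T_QMARK, T_PLUS, T_OR, T_CAT };	/* toy operators */

/* Maximum stack depth needed to evaluate a postfix token stream --
 * the quantity addtok() maintains in dfa->depth. */
static int
max_depth(const int *toks, int n)
{
	int depth = 0, maxd = 0, i;

	for (i = 0; i < n; i++) {
		if (toks[i] == T_STAR || toks[i] == T_QMARK || toks[i] == T_PLUS)
			;		/* unary: pop one, push one */
		else if (toks[i] == T_OR || toks[i] == T_CAT)
			depth--;	/* binary: pop two, push one */
		else
			depth++;	/* leaf: push one */
		if (depth > maxd)
			maxd = depth;
	}
	return maxd;
}

int
main(void)
{
	/* a(b|c)*  ==>  a b c OR STAR CAT */
	int toks[] = { 'a', 'b', 'c', T_OR, T_STAR, T_CAT };

	printf("max depth = %d\n", max_depth(toks, 6));	/* 3 */
	return 0;
}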
-static void
-atom()
-{
- if ((tok >= 0 && tok < NOTCHAR) || tok >= CSET || tok == BACKREF
- || tok == BEGLINE || tok == ENDLINE || tok == BEGWORD
- || tok == ENDWORD || tok == LIMWORD || tok == NOTLIMWORD)
- {
- addtok(tok);
- tok = lex();
- }
- else if (tok == LPAREN)
- {
- tok = lex();
- regexp(0);
- if (tok != RPAREN)
- dfaerror("Unbalanced (");
- tok = lex();
- }
- else
- addtok(EMPTY);
-}
-
-/* Return the number of tokens in the given subexpression. */
-static int
-nsubtoks(tindex)
-int tindex;
-{
- int ntoks1;
-
- switch (dfa->tokens[tindex - 1])
- {
- default:
- return 1;
- case QMARK:
- case STAR:
- case PLUS:
- return 1 + nsubtoks(tindex - 1);
- case CAT:
- case OR:
- case ORTOP:
- ntoks1 = nsubtoks(tindex - 1);
- return 1 + ntoks1 + nsubtoks(tindex - 1 - ntoks1);
- }
-}
-
-/* Copy the given subexpression to the top of the tree. */
-static void
-copytoks(tindex, ntokens)
- int tindex, ntokens;
-{
- int i;
-
- for (i = 0; i < ntokens; ++i)
- addtok(dfa->tokens[tindex + i]);
-}
-
-static void
-closure()
-{
- int tindex, ntokens, i;
-
- atom();
- while (tok == QMARK || tok == STAR || tok == PLUS || tok == REPMN)
- if (tok == REPMN)
- {
- ntokens = nsubtoks(dfa->tindex);
- tindex = dfa->tindex - ntokens;
- if (maxrep == 0)
- addtok(PLUS);
- if (minrep == 0)
- addtok(QMARK);
- for (i = 1; i < minrep; ++i)
- {
- copytoks(tindex, ntokens);
- addtok(CAT);
- }
- for (; i < maxrep; ++i)
- {
- copytoks(tindex, ntokens);
- addtok(QMARK);
- addtok(CAT);
- }
- tok = lex();
- }
- else
- {
- addtok(tok);
- tok = lex();
- }
-}
-
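closure() above expands an interval {m,n} by copying the preceding subexpression: the original plus copies 2..m are concatenated as-is, and copies m+1..n are each wrapped in QMARK before concatenation, so a{2,4} becomes a a CAT a QMARK CAT a QMARK CAT in postfix, i.e. aaa?a? in ordinary notation. A sketch of the same expansion applied to a single-character atom (strings here instead of token subtrees, purely for illustration):

#include <stddef.h>
#include <stdio.h>

/* Expand c{m,n} (n >= m >= 1) into an equivalent regexp without
 * intervals: m mandatory copies followed by n - m optional ones. */
static void
expand_interval(char c, int m, int n, char *out, size_t outsz)
{
	size_t k = 0;
	int i;

	for (i = 0; i < m && k + 1 < outsz; i++)
		out[k++] = c;		/* mandatory copies */
	for (; i < n && k + 2 < outsz; i++) {
		out[k++] = c;		/* optional copies ... */
		out[k++] = '?';		/* ... each wrapped in '?' */
	}
	out[k] = '\0';
}

int
main(void)
{
	char buf[32];

	expand_interval('a', 2, 4, buf, sizeof buf);
	printf("a{2,4} == %s\n", buf);	/* aaa?a? */
	return 0;
}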
-static void
-branch()
-{
- closure();
- while (tok != RPAREN && tok != OR && tok >= 0)
- {
- closure();
- addtok(CAT);
- }
-}
-
-static void
-regexp(toplevel)
- int toplevel;
-{
- branch();
- while (tok == OR)
- {
- tok = lex();
- branch();
- if (toplevel)
- addtok(ORTOP);
- else
- addtok(OR);
- }
-}
-
-/* Main entry point for the parser. S is a string to be parsed, len is the
- length of the string, so s can include NUL characters. D is a pointer to
- the struct dfa to parse into. */
-void
-dfaparse(s, len, d)
- char *s;
- size_t len;
- struct dfa *d;
-
-{
- dfa = d;
- lexstart = lexptr = s;
- lexleft = len;
- lasttok = END;
- laststart = 1;
- parens = 0;
-
- if (! syntax_bits_set)
- dfaerror("No syntax specified");
-
- tok = lex();
- depth = d->depth;
-
- regexp(1);
-
- if (tok != END)
- dfaerror("Unbalanced )");
-
- addtok(END - d->nregexps);
- addtok(CAT);
-
- if (d->nregexps)
- addtok(ORTOP);
-
- ++d->nregexps;
-}
-
-/* Some primitives for operating on sets of positions. */
-
-/* Copy one set to another; the destination must be large enough. */
-static void
-copy(src, dst)
- position_set *src;
- position_set *dst;
-{
- int i;
-
- for (i = 0; i < src->nelem; ++i)
- dst->elems[i] = src->elems[i];
- dst->nelem = src->nelem;
-}
-
-/* Insert a position in a set. Position sets are maintained in sorted
-   order according to index. If a position with the same index already
-   exists in the set, the two constraints are logically or'd together.
- S->elems must point to an array large enough to hold the resulting set. */
-static void
-insert(p, s)
- position p;
- position_set *s;
-{
- int i;
- position t1, t2;
-
- for (i = 0; i < s->nelem && p.index < s->elems[i].index; ++i)
- continue;
- if (i < s->nelem && p.index == s->elems[i].index)
- s->elems[i].constraint |= p.constraint;
- else
- {
- t1 = p;
- ++s->nelem;
- while (i < s->nelem)
- {
- t2 = s->elems[i];
- s->elems[i++] = t1;
- t1 = t2;
- }
- }
-}
-
-/* Merge two sets of positions into a third. The result is exactly as if
- the positions of both sets were inserted into an initially empty set. */
-static void
-merge(s1, s2, m)
- position_set *s1;
- position_set *s2;
- position_set *m;
-{
- int i = 0, j = 0;
-
- m->nelem = 0;
- while (i < s1->nelem && j < s2->nelem)
- if (s1->elems[i].index > s2->elems[j].index)
- m->elems[m->nelem++] = s1->elems[i++];
- else if (s1->elems[i].index < s2->elems[j].index)
- m->elems[m->nelem++] = s2->elems[j++];
- else
- {
- m->elems[m->nelem] = s1->elems[i++];
- m->elems[m->nelem++].constraint |= s2->elems[j++].constraint;
- }
- while (i < s1->nelem)
- m->elems[m->nelem++] = s1->elems[i++];
- while (j < s2->nelem)
- m->elems[m->nelem++] = s2->elems[j++];
-}
-
-/* Delete a position from a set. */
-static void
-delete(p, s)
- position p;
- position_set *s;
-{
- int i;
-
- for (i = 0; i < s->nelem; ++i)
- if (p.index == s->elems[i].index)
- break;
- if (i < s->nelem)
- for (--s->nelem; i < s->nelem; ++i)
- s->elems[i] = s->elems[i + 1];
-}
-
-/* Find the index of the state corresponding to the given position set with
- the given preceding context, or create a new state if there is no such
- state. Newline and letter tell whether we got here on a newline or
- letter, respectively. */
-static int
-state_index(d, s, newline, letter)
- struct dfa *d;
- position_set *s;
- int newline;
- int letter;
-{
- int hash = 0;
- int constraint;
- int i, j;
-
- newline = newline ? 1 : 0;
- letter = letter ? 1 : 0;
-
- for (i = 0; i < s->nelem; ++i)
- hash ^= s->elems[i].index + s->elems[i].constraint;
-
- /* Try to find a state that exactly matches the proposed one. */
- for (i = 0; i < d->sindex; ++i)
- {
- if (hash != d->states[i].hash || s->nelem != d->states[i].elems.nelem
- || newline != d->states[i].newline || letter != d->states[i].letter)
- continue;
- for (j = 0; j < s->nelem; ++j)
- if (s->elems[j].constraint
- != d->states[i].elems.elems[j].constraint
- || s->elems[j].index != d->states[i].elems.elems[j].index)
- break;
- if (j == s->nelem)
- return i;
- }
-
- /* We'll have to create a new state. */
- REALLOC_IF_NECESSARY(d->states, dfa_state, d->salloc, d->sindex);
- d->states[i].hash = hash;
- MALLOC(d->states[i].elems.elems, position, s->nelem);
- copy(s, &d->states[i].elems);
- d->states[i].newline = newline;
- d->states[i].letter = letter;
- d->states[i].backref = 0;
- d->states[i].constraint = 0;
- d->states[i].first_end = 0;
- for (j = 0; j < s->nelem; ++j)
- if (d->tokens[s->elems[j].index] < 0)
- {
- constraint = s->elems[j].constraint;
- if (SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 0)
- || SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 1)
- || SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 0)
- || SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 1))
- d->states[i].constraint |= constraint;
- if (! d->states[i].first_end)
- d->states[i].first_end = d->tokens[s->elems[j].index];
- }
- else if (d->tokens[s->elems[j].index] == BACKREF)
- {
- d->states[i].constraint = NO_CONSTRAINT;
- d->states[i].backref = 1;
- }
-
- ++d->sindex;
-
- return i;
-}
-
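state_index() above is a find-or-create lookup: hash the position set, scan the existing states, and only fall through to the element-by-element comparison when the cheap fields (hash, element count, context flags) already agree; otherwise a fresh state is appended. The same pattern in miniature over plain integer sets (all names invented for this sketch):

#include <stdio.h>
#include <string.h>

#define MAXSTATES	64
#define MAXELEMS	16

struct tstate { int hash, nelem, elems[MAXELEMS]; };

static struct tstate states[MAXSTATES];
static int nstates;

/* Return the index of an existing state with exactly these elements,
 * or create a new one.  No overflow checks: MAXSTATES/MAXELEMS are
 * ample for the demonstration. */
static int
state_lookup(const int *elems, int nelem)
{
	int hash = 0, i;

	for (i = 0; i < nelem; i++)
		hash ^= elems[i];
	for (i = 0; i < nstates; i++)
		if (states[i].hash == hash && states[i].nelem == nelem &&
		    memcmp(states[i].elems, elems, nelem * sizeof(int)) == 0)
			return i;
	states[nstates].hash = hash;
	states[nstates].nelem = nelem;
	memcpy(states[nstates].elems, elems, nelem * sizeof(int));
	return nstates++;
}

int
main(void)
{
	int a[] = { 1, 4, 9 }, b[] = { 2, 7 };

	printf("%d %d %d\n", state_lookup(a, 3), state_lookup(b, 2),
	    state_lookup(a, 3));	/* 0 1 0 */
	return 0;
}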
-/* Find the epsilon closure of a set of positions. If any position of the set
- contains a symbol that matches the empty string in some context, replace
- that position with the elements of its follow labeled with an appropriate
- constraint. Repeat exhaustively until no funny positions are left.
- S->elems must be large enough to hold the result. */
-static void epsclosure _RE_ARGS((position_set *s, struct dfa *d));
-
-static void
-epsclosure(s, d)
- position_set *s;
- struct dfa *d;
-{
- int i, j;
- int *visited;
- position p, old;
-
- MALLOC(visited, int, d->tindex);
- for (i = 0; i < d->tindex; ++i)
- visited[i] = 0;
-
- for (i = 0; i < s->nelem; ++i)
- if (d->tokens[s->elems[i].index] >= NOTCHAR
- && d->tokens[s->elems[i].index] != BACKREF
- && d->tokens[s->elems[i].index] < CSET)
- {
- old = s->elems[i];
- p.constraint = old.constraint;
- delete(s->elems[i], s);
- if (visited[old.index])
- {
- --i;
- continue;
- }
- visited[old.index] = 1;
- switch (d->tokens[old.index])
- {
- case BEGLINE:
- p.constraint &= BEGLINE_CONSTRAINT;
- break;
- case ENDLINE:
- p.constraint &= ENDLINE_CONSTRAINT;
- break;
- case BEGWORD:
- p.constraint &= BEGWORD_CONSTRAINT;
- break;
- case ENDWORD:
- p.constraint &= ENDWORD_CONSTRAINT;
- break;
- case LIMWORD:
- p.constraint &= LIMWORD_CONSTRAINT;
- break;
- case NOTLIMWORD:
- p.constraint &= NOTLIMWORD_CONSTRAINT;
- break;
- default:
- break;
- }
- for (j = 0; j < d->follows[old.index].nelem; ++j)
- {
- p.index = d->follows[old.index].elems[j].index;
- insert(p, s);
- }
- /* Force rescan to start at the beginning. */
- i = -1;
- }
-
- free(visited);
-}
-
-/* Perform bottom-up analysis on the parse tree, computing various functions.
- Note that at this point, we're pretending constructs like \< are real
- characters rather than constraints on what can follow them.
-
- Nullable: A node is nullable if it is at the root of a regexp that can
- match the empty string.
- * EMPTY leaves are nullable.
- * No other leaf is nullable.
- * A QMARK or STAR node is nullable.
- * A PLUS node is nullable if its argument is nullable.
- * A CAT node is nullable if both its arguments are nullable.
- * An OR node is nullable if either argument is nullable.
-
- Firstpos: The firstpos of a node is the set of positions (nonempty leaves)
- that could correspond to the first character of a string matching the
- regexp rooted at the given node.
- * EMPTY leaves have empty firstpos.
- * The firstpos of a nonempty leaf is that leaf itself.
- * The firstpos of a QMARK, STAR, or PLUS node is the firstpos of its
- argument.
- * The firstpos of a CAT node is the firstpos of the left argument, union
- the firstpos of the right if the left argument is nullable.
- * The firstpos of an OR node is the union of firstpos of each argument.
-
- Lastpos: The lastpos of a node is the set of positions that could
- correspond to the last character of a string matching the regexp at
- the given node.
- * EMPTY leaves have empty lastpos.
- * The lastpos of a nonempty leaf is that leaf itself.
- * The lastpos of a QMARK, STAR, or PLUS node is the lastpos of its
- argument.
- * The lastpos of a CAT node is the lastpos of its right argument, union
- the lastpos of the left if the right argument is nullable.
- * The lastpos of an OR node is the union of the lastpos of each argument.
-
- Follow: The follow of a position is the set of positions that could
- correspond to the character following a character matching the node in
- a string matching the regexp. At this point we consider special symbols
- that match the empty string in some context to be just normal characters.
- Later, if we find that a special symbol is in a follow set, we will
- replace it with the elements of its follow, labeled with an appropriate
- constraint.
- * Every node in the firstpos of the argument of a STAR or PLUS node is in
- the follow of every node in the lastpos.
- * Every node in the firstpos of the second argument of a CAT node is in
- the follow of every node in the lastpos of the first argument.
-
- Because of the postfix representation of the parse tree, the depth-first
- analysis is conveniently done by a linear scan with the aid of a stack.
- Sets are stored as arrays of the elements, obeying a stack-like allocation
- scheme; the number of elements in each set deeper in the stack can be
- used to determine the address of a particular set's array. */
-void
-dfaanalyze(d, searchflag)
- struct dfa *d;
- int searchflag;
-{
- int *nullable; /* Nullable stack. */
- int *nfirstpos; /* Element count stack for firstpos sets. */
- position *firstpos; /* Array where firstpos elements are stored. */
- int *nlastpos; /* Element count stack for lastpos sets. */
- position *lastpos; /* Array where lastpos elements are stored. */
- int *nalloc; /* Sizes of arrays allocated to follow sets. */
- position_set tmp; /* Temporary set for merging sets. */
- position_set merged; /* Result of merging sets. */
- int wants_newline; /* True if some position wants newline info. */
- int *o_nullable;
- int *o_nfirst, *o_nlast;
- position *o_firstpos, *o_lastpos;
- int i, j;
- position *pos;
-
-#ifdef DEBUG
- fprintf(stderr, "dfaanalyze:\n");
- for (i = 0; i < d->tindex; ++i)
- {
- fprintf(stderr, " %d:", i);
- prtok(d->tokens[i]);
- }
- putc('\n', stderr);
-#endif
-
- d->searchflag = searchflag;
-
- MALLOC(nullable, int, d->depth);
- o_nullable = nullable;
- MALLOC(nfirstpos, int, d->depth);
- o_nfirst = nfirstpos;
- MALLOC(firstpos, position, d->nleaves);
- o_firstpos = firstpos, firstpos += d->nleaves;
- MALLOC(nlastpos, int, d->depth);
- o_nlast = nlastpos;
- MALLOC(lastpos, position, d->nleaves);
- o_lastpos = lastpos, lastpos += d->nleaves;
- MALLOC(nalloc, int, d->tindex);
- for (i = 0; i < d->tindex; ++i)
- nalloc[i] = 0;
- MALLOC(merged.elems, position, d->nleaves);
-
- CALLOC(d->follows, position_set, d->tindex);
-
- for (i = 0; i < d->tindex; ++i)
-#ifdef DEBUG
- { /* Nonsyntactic #ifdef goo... */
-#endif
- switch (d->tokens[i])
- {
- case EMPTY:
- /* The empty set is nullable. */
- *nullable++ = 1;
-
- /* The firstpos and lastpos of the empty leaf are both empty. */
- *nfirstpos++ = *nlastpos++ = 0;
- break;
-
- case STAR:
- case PLUS:
- /* Every element in the firstpos of the argument is in the follow
- of every element in the lastpos. */
- tmp.nelem = nfirstpos[-1];
- tmp.elems = firstpos;
- pos = lastpos;
- for (j = 0; j < nlastpos[-1]; ++j)
- {
- merge(&tmp, &d->follows[pos[j].index], &merged);
- REALLOC_IF_NECESSARY(d->follows[pos[j].index].elems, position,
- nalloc[pos[j].index], merged.nelem - 1);
- copy(&merged, &d->follows[pos[j].index]);
- }
-
- case QMARK:
- /* A QMARK or STAR node is automatically nullable. */
- if (d->tokens[i] != PLUS)
- nullable[-1] = 1;
- break;
-
- case CAT:
- /* Every element in the firstpos of the second argument is in the
- follow of every element in the lastpos of the first argument. */
- tmp.nelem = nfirstpos[-1];
- tmp.elems = firstpos;
- pos = lastpos + nlastpos[-1];
- for (j = 0; j < nlastpos[-2]; ++j)
- {
- merge(&tmp, &d->follows[pos[j].index], &merged);
- REALLOC_IF_NECESSARY(d->follows[pos[j].index].elems, position,
- nalloc[pos[j].index], merged.nelem - 1);
- copy(&merged, &d->follows[pos[j].index]);
- }
-
- /* The firstpos of a CAT node is the firstpos of the first argument,
- union that of the second argument if the first is nullable. */
- if (nullable[-2])
- nfirstpos[-2] += nfirstpos[-1];
- else
- firstpos += nfirstpos[-1];
- --nfirstpos;
-
- /* The lastpos of a CAT node is the lastpos of the second argument,
- union that of the first argument if the second is nullable. */
- if (nullable[-1])
- nlastpos[-2] += nlastpos[-1];
- else
- {
- pos = lastpos + nlastpos[-2];
- for (j = nlastpos[-1] - 1; j >= 0; --j)
- pos[j] = lastpos[j];
- lastpos += nlastpos[-2];
- nlastpos[-2] = nlastpos[-1];
- }
- --nlastpos;
-
- /* A CAT node is nullable if both arguments are nullable. */
- nullable[-2] = nullable[-1] && nullable[-2];
- --nullable;
- break;
-
- case OR:
- case ORTOP:
- /* The firstpos is the union of the firstpos of each argument. */
- nfirstpos[-2] += nfirstpos[-1];
- --nfirstpos;
-
- /* The lastpos is the union of the lastpos of each argument. */
- nlastpos[-2] += nlastpos[-1];
- --nlastpos;
-
- /* An OR node is nullable if either argument is nullable. */
- nullable[-2] = nullable[-1] || nullable[-2];
- --nullable;
- break;
-
- default:
- /* Anything else is a nonempty position. (Note that special
- constructs like \< are treated as nonempty strings here;
- an "epsilon closure" effectively makes them nullable later.
- Backreferences have to get a real position so we can detect
-	   transitions on them later. But they are nullable.) */
- *nullable++ = d->tokens[i] == BACKREF;
-
- /* This position is in its own firstpos and lastpos. */
- *nfirstpos++ = *nlastpos++ = 1;
- --firstpos, --lastpos;
- firstpos->index = lastpos->index = i;
- firstpos->constraint = lastpos->constraint = NO_CONSTRAINT;
-
- /* Allocate the follow set for this position. */
- nalloc[i] = 1;
- MALLOC(d->follows[i].elems, position, nalloc[i]);
- break;
- }
-#ifdef DEBUG
- /* ... balance the above nonsyntactic #ifdef goo... */
- fprintf(stderr, "node %d:", i);
- prtok(d->tokens[i]);
- putc('\n', stderr);
- fprintf(stderr, nullable[-1] ? " nullable: yes\n" : " nullable: no\n");
- fprintf(stderr, " firstpos:");
- for (j = nfirstpos[-1] - 1; j >= 0; --j)
- {
- fprintf(stderr, " %d:", firstpos[j].index);
- prtok(d->tokens[firstpos[j].index]);
- }
- fprintf(stderr, "\n lastpos:");
- for (j = nlastpos[-1] - 1; j >= 0; --j)
- {
- fprintf(stderr, " %d:", lastpos[j].index);
- prtok(d->tokens[lastpos[j].index]);
- }
- putc('\n', stderr);
- }
-#endif
-
- /* For each follow set that is the follow set of a real position, replace
- it with its epsilon closure. */
- for (i = 0; i < d->tindex; ++i)
- if (d->tokens[i] < NOTCHAR || d->tokens[i] == BACKREF
- || d->tokens[i] >= CSET)
- {
-#ifdef DEBUG
- fprintf(stderr, "follows(%d:", i);
- prtok(d->tokens[i]);
- fprintf(stderr, "):");
- for (j = d->follows[i].nelem - 1; j >= 0; --j)
- {
- fprintf(stderr, " %d:", d->follows[i].elems[j].index);
- prtok(d->tokens[d->follows[i].elems[j].index]);
- }
- putc('\n', stderr);
-#endif
- copy(&d->follows[i], &merged);
- epsclosure(&merged, d);
- if (d->follows[i].nelem < merged.nelem)
- REALLOC(d->follows[i].elems, position, merged.nelem);
- copy(&merged, &d->follows[i]);
- }
-
- /* Get the epsilon closure of the firstpos of the regexp. The result will
- be the set of positions of state 0. */
- merged.nelem = 0;
- for (i = 0; i < nfirstpos[-1]; ++i)
- insert(firstpos[i], &merged);
- epsclosure(&merged, d);
-
- /* Check if any of the positions of state 0 will want newline context. */
- wants_newline = 0;
- for (i = 0; i < merged.nelem; ++i)
- if (PREV_NEWLINE_DEPENDENT(merged.elems[i].constraint))
- wants_newline = 1;
-
- /* Build the initial state. */
- d->salloc = 1;
- d->sindex = 0;
- MALLOC(d->states, dfa_state, d->salloc);
- state_index(d, &merged, wants_newline, 0);
-
- free(o_nullable);
- free(o_nfirst);
- free(o_firstpos);
- free(o_nlast);
- free(o_lastpos);
- free(nalloc);
- free(merged.elems);
-}
-
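As a worked instance of the nullable/firstpos/lastpos rules documented before dfaanalyze(): for (a|b)*a, i.e. the postfix stream a b OR STAR a CAT with the leaves numbered 1, 2 and 3, the STAR node is nullable with firstpos = lastpos = {1,2}; the CAT node is not nullable, its firstpos is {1,2,3} (because its left side is nullable) and its lastpos is {3}; and the follow of positions 1 and 2 is {1,2,3}, while position 3 has an empty follow. The sketch below computes just the nullable bit for such a stream, using the stack discipline the analysis relies on (toy token encoding, unrelated to dfa.h's values):

#include <stdio.h>

enum { T_EMPTY = 256, T_STAR, T_QMARK, T_PLUS, T_OR, T_CAT };

/* Whether a postfix regexp can match the empty string, by the
 * nullable rules in the comment before dfaanalyze(). */
static int
nullable(const int *toks, int n)
{
	int stack[64], sp = 0, i;

	for (i = 0; i < n; i++)
		switch (toks[i]) {
		case T_EMPTY:	stack[sp++] = 1; break;		/* empty leaf */
		case T_STAR:
		case T_QMARK:	stack[sp - 1] = 1; break;	/* always nullable */
		case T_PLUS:	break;				/* nullable iff its argument is */
		case T_CAT:	sp--; stack[sp - 1] = stack[sp - 1] && stack[sp]; break;
		case T_OR:	sp--; stack[sp - 1] = stack[sp - 1] || stack[sp]; break;
		default:	stack[sp++] = 0; break;		/* ordinary leaf */
		}
	return stack[0];
}

int
main(void)
{
	/* (a|b)*a  ==>  a b OR STAR a CAT */
	int r1[] = { 'a', 'b', T_OR, T_STAR, 'a', T_CAT };
	/* (a|b)*   ==>  a b OR STAR */
	int r2[] = { 'a', 'b', T_OR, T_STAR };

	printf("%d %d\n", nullable(r1, 6), nullable(r2, 4));	/* 0 1 */
	return 0;
}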
-/* Find, for each character, the transition out of state s of d, and store
- it in the appropriate slot of trans.
-
- We divide the positions of s into groups (positions can appear in more
- than one group). Each group is labeled with a set of characters that
- every position in the group matches (taking into account, if necessary,
- preceding context information of s). For each group, find the union
-   of its elements' follows. This set is the set of positions of the
- new state. For each character in the group's label, set the transition
- on this character to be to a state corresponding to the set's positions,
- and its associated backward context information, if necessary.
-
- If we are building a searching matcher, we include the positions of state
- 0 in every state.
-
- The collection of groups is constructed by building an equivalence-class
- partition of the positions of s.
-
- For each position, find the set of characters C that it matches. Eliminate
- any characters from C that fail on grounds of backward context.
-
- Search through the groups, looking for a group whose label L has nonempty
- intersection with C. If L - C is nonempty, create a new group labeled
- L - C and having the same positions as the current group, and set L to
- the intersection of L and C. Insert the position in this group, set
- C = C - L, and resume scanning.
-
- If after comparing with every group there are characters remaining in C,
- create a new group labeled with the characters of C and insert this
- position in that group. */
-void
-dfastate(s, d, trans)
- int s;
- struct dfa *d;
- int trans[];
-{
- position_set grps[NOTCHAR]; /* As many as will ever be needed. */
- charclass labels[NOTCHAR]; /* Labels corresponding to the groups. */
- int ngrps = 0; /* Number of groups actually used. */
- position pos; /* Current position being considered. */
- charclass matches; /* Set of matching characters. */
- int matchesf; /* True if matches is nonempty. */
- charclass intersect; /* Intersection with some label set. */
- int intersectf; /* True if intersect is nonempty. */
- charclass leftovers; /* Stuff in the label that didn't match. */
- int leftoversf; /* True if leftovers is nonempty. */
- static charclass letters; /* Set of characters considered letters. */
- static charclass newline; /* Set of characters that aren't newline. */
- position_set follows; /* Union of the follows of some group. */
- position_set tmp; /* Temporary space for merging sets. */
- int state; /* New state. */
- int wants_newline; /* New state wants to know newline context. */
- int state_newline; /* New state on a newline transition. */
- int wants_letter; /* New state wants to know letter context. */
- int state_letter; /* New state on a letter transition. */
- static initialized; /* Flag for static initialization. */
- int i, j, k;
-
- /* Initialize the set of letters, if necessary. */
- if (! initialized)
- {
- initialized = 1;
- for (i = 0; i < NOTCHAR; ++i)
- if (ISALNUM(i))
- setbit(i, letters);
- setbit('\n', newline);
- }
-
- zeroset(matches);
-
- for (i = 0; i < d->states[s].elems.nelem; ++i)
- {
- pos = d->states[s].elems.elems[i];
- if (d->tokens[pos.index] >= 0 && d->tokens[pos.index] < NOTCHAR)
- setbit(d->tokens[pos.index], matches);
- else if (d->tokens[pos.index] >= CSET)
- copyset(d->charclasses[d->tokens[pos.index] - CSET], matches);
- else
- continue;
-
- /* Some characters may need to be eliminated from matches because
- they fail in the current context. */
- if (pos.constraint != 0xFF)
- {
- if (! MATCHES_NEWLINE_CONTEXT(pos.constraint,
- d->states[s].newline, 1))
- clrbit('\n', matches);
- if (! MATCHES_NEWLINE_CONTEXT(pos.constraint,
- d->states[s].newline, 0))
- for (j = 0; j < CHARCLASS_INTS; ++j)
- matches[j] &= newline[j];
- if (! MATCHES_LETTER_CONTEXT(pos.constraint,
- d->states[s].letter, 1))
- for (j = 0; j < CHARCLASS_INTS; ++j)
- matches[j] &= ~letters[j];
- if (! MATCHES_LETTER_CONTEXT(pos.constraint,
- d->states[s].letter, 0))
- for (j = 0; j < CHARCLASS_INTS; ++j)
- matches[j] &= letters[j];
-
- /* If there are no characters left, there's no point in going on. */
- for (j = 0; j < CHARCLASS_INTS && !matches[j]; ++j)
- continue;
- if (j == CHARCLASS_INTS)
- continue;
- }
-
- for (j = 0; j < ngrps; ++j)
- {
- /* If matches contains a single character only, and the current
- group's label doesn't contain that character, go on to the
- next group. */
- if (d->tokens[pos.index] >= 0 && d->tokens[pos.index] < NOTCHAR
- && !tstbit(d->tokens[pos.index], labels[j]))
- continue;
-
- /* Check if this group's label has a nonempty intersection with
- matches. */
- intersectf = 0;
- for (k = 0; k < CHARCLASS_INTS; ++k)
- (intersect[k] = matches[k] & labels[j][k]) ? (intersectf = 1) : 0;
- if (! intersectf)
- continue;
-
- /* It does; now find the set differences both ways. */
- leftoversf = matchesf = 0;
- for (k = 0; k < CHARCLASS_INTS; ++k)
- {
- /* Even an optimizing compiler can't know this for sure. */
- int match = matches[k], label = labels[j][k];
-
- (leftovers[k] = ~match & label) ? (leftoversf = 1) : 0;
- (matches[k] = match & ~label) ? (matchesf = 1) : 0;
- }
-
- /* If there were leftovers, create a new group labeled with them. */
- if (leftoversf)
- {
- copyset(leftovers, labels[ngrps]);
- copyset(intersect, labels[j]);
- MALLOC(grps[ngrps].elems, position, d->nleaves);
- copy(&grps[j], &grps[ngrps]);
- ++ngrps;
- }
-
- /* Put the position in the current group. Note that there is no
- reason to call insert() here. */
- grps[j].elems[grps[j].nelem++] = pos;
-
- /* If every character matching the current position has been
- accounted for, we're done. */
- if (! matchesf)
- break;
- }
-
- /* If we've passed the last group, and there are still characters
- unaccounted for, then we'll have to create a new group. */
- if (j == ngrps)
- {
- copyset(matches, labels[ngrps]);
- zeroset(matches);
- MALLOC(grps[ngrps].elems, position, d->nleaves);
- grps[ngrps].nelem = 1;
- grps[ngrps].elems[0] = pos;
- ++ngrps;
- }
- }
-
- MALLOC(follows.elems, position, d->nleaves);
- MALLOC(tmp.elems, position, d->nleaves);
-
- /* If we are a searching matcher, the default transition is to a state
- containing the positions of state 0, otherwise the default transition
- is to fail miserably. */
- if (d->searchflag)
- {
- wants_newline = 0;
- wants_letter = 0;
- for (i = 0; i < d->states[0].elems.nelem; ++i)
- {
- if (PREV_NEWLINE_DEPENDENT(d->states[0].elems.elems[i].constraint))
- wants_newline = 1;
- if (PREV_LETTER_DEPENDENT(d->states[0].elems.elems[i].constraint))
- wants_letter = 1;
- }
- copy(&d->states[0].elems, &follows);
- state = state_index(d, &follows, 0, 0);
- if (wants_newline)
- state_newline = state_index(d, &follows, 1, 0);
- else
- state_newline = state;
- if (wants_letter)
- state_letter = state_index(d, &follows, 0, 1);
- else
- state_letter = state;
- for (i = 0; i < NOTCHAR; ++i)
- if (i == '\n')
- trans[i] = state_newline;
- else if (ISALNUM(i))
- trans[i] = state_letter;
- else
- trans[i] = state;
- }
- else
- for (i = 0; i < NOTCHAR; ++i)
- trans[i] = -1;
-
- for (i = 0; i < ngrps; ++i)
- {
- follows.nelem = 0;
-
- /* Find the union of the follows of the positions of the group.
- This is a hideously inefficient loop. Fix it someday. */
- for (j = 0; j < grps[i].nelem; ++j)
- for (k = 0; k < d->follows[grps[i].elems[j].index].nelem; ++k)
- insert(d->follows[grps[i].elems[j].index].elems[k], &follows);
-
- /* If we are building a searching matcher, throw in the positions
- of state 0 as well. */
- if (d->searchflag)
- for (j = 0; j < d->states[0].elems.nelem; ++j)
- insert(d->states[0].elems.elems[j], &follows);
-
- /* Find out if the new state will want any context information. */
- wants_newline = 0;
- if (tstbit('\n', labels[i]))
- for (j = 0; j < follows.nelem; ++j)
- if (PREV_NEWLINE_DEPENDENT(follows.elems[j].constraint))
- wants_newline = 1;
-
- wants_letter = 0;
- for (j = 0; j < CHARCLASS_INTS; ++j)
- if (labels[i][j] & letters[j])
- break;
- if (j < CHARCLASS_INTS)
- for (j = 0; j < follows.nelem; ++j)
- if (PREV_LETTER_DEPENDENT(follows.elems[j].constraint))
- wants_letter = 1;
-
- /* Find the state(s) corresponding to the union of the follows. */
- state = state_index(d, &follows, 0, 0);
- if (wants_newline)
- state_newline = state_index(d, &follows, 1, 0);
- else
- state_newline = state;
- if (wants_letter)
- state_letter = state_index(d, &follows, 0, 1);
- else
- state_letter = state;
-
- /* Set the transitions for each character in the current label. */
- for (j = 0; j < CHARCLASS_INTS; ++j)
- for (k = 0; k < INTBITS; ++k)
- if (labels[i][j] & 1 << k)
- {
- int c = j * INTBITS + k;
-
- if (c == '\n')
- trans[c] = state_newline;
- else if (ISALNUM(c))
- trans[c] = state_letter;
- else if (c < NOTCHAR)
- trans[c] = state;
- }
- }
-
- for (i = 0; i < ngrps; ++i)
- free(grps[i].elems);
- free(follows.elems);
- free(tmp.elems);
-}
-
-/* Some routines for manipulating a compiled dfa's transition tables.
-   Each state may or may not have a transition table; if it does,
-   d->trans[state] points to it for a non-accepting state and
-   d->fails[state] points to it for an accepting state.  A state with
-   no table at all has NULL in both slots. */
-
-static void
-build_state(s, d)
- int s;
- struct dfa *d;
-{
- int *trans; /* The new transition table. */
- int i;
-
- /* Set an upper limit on the number of transition tables that will ever
- exist at once. 1024 is arbitrary. The idea is that the frequently
- used transition tables will be quickly rebuilt, whereas the ones that
- were only needed once or twice will be cleared away. */
- if (d->trcount >= 1024)
- {
- for (i = 0; i < d->tralloc; ++i)
- if (d->trans[i])
- {
- free((ptr_t) d->trans[i]);
- d->trans[i] = NULL;
- }
- else if (d->fails[i])
- {
- free((ptr_t) d->fails[i]);
- d->fails[i] = NULL;
- }
- d->trcount = 0;
- }
-
- ++d->trcount;
-
- /* Set up the success bits for this state. */
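-  /* (A value of 4 means the state accepts when the lookahead character
-     is a newline, 2 when it is a word-constituent, 1 otherwise;
-     dfaexec() ands these bits against its sbit[] table.) */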
- d->success[s] = 0;
- if (ACCEPTS_IN_CONTEXT(d->states[s].newline, 1, d->states[s].letter, 0,
- s, *d))
- d->success[s] |= 4;
- if (ACCEPTS_IN_CONTEXT(d->states[s].newline, 0, d->states[s].letter, 1,
- s, *d))
- d->success[s] |= 2;
- if (ACCEPTS_IN_CONTEXT(d->states[s].newline, 0, d->states[s].letter, 0,
- s, *d))
- d->success[s] |= 1;
-
- MALLOC(trans, int, NOTCHAR);
- dfastate(s, d, trans);
-
- /* Now go through the new transition table, and make sure that the trans
- and fail arrays are allocated large enough to hold a pointer for the
- largest state mentioned in the table. */
- for (i = 0; i < NOTCHAR; ++i)
- if (trans[i] >= d->tralloc)
- {
- int oldalloc = d->tralloc;
-
- while (trans[i] >= d->tralloc)
- d->tralloc *= 2;
- REALLOC(d->realtrans, int *, d->tralloc + 1);
- d->trans = d->realtrans + 1;
- REALLOC(d->fails, int *, d->tralloc);
- REALLOC(d->success, int, d->tralloc);
- REALLOC(d->newlines, int, d->tralloc);
- while (oldalloc < d->tralloc)
- {
- d->trans[oldalloc] = NULL;
- d->fails[oldalloc++] = NULL;
- }
- }
-
- /* Keep the newline transition in a special place so we can use it as
- a sentinel. */
- d->newlines[s] = trans['\n'];
- trans['\n'] = -1;
-
- if (ACCEPTING(s, *d))
- d->fails[s] = trans;
- else
- d->trans[s] = trans;
-}
-
-static void
-build_state_zero(d)
- struct dfa *d;
-{
- d->tralloc = 1;
- d->trcount = 0;
- CALLOC(d->realtrans, int *, d->tralloc + 1);
- d->trans = d->realtrans + 1;
- CALLOC(d->fails, int *, d->tralloc);
- MALLOC(d->success, int, d->tralloc);
- MALLOC(d->newlines, int, d->tralloc);
- build_state(0, d);
-}
-
-/* Search through a buffer looking for a match to the given struct dfa.
- Find the first occurrence of a string matching the regexp in the buffer,
- and the shortest possible version thereof. Return a pointer to the first
- character after the match, or NULL if none is found. Begin points to
- the beginning of the buffer, and end points to the first character after
- its end. We store a newline in *end to act as a sentinel, so end had
- better point somewhere valid. Newline is a flag indicating whether to
- allow newlines to be in the matching string. If count is non-
- NULL it points to a place we're supposed to increment every time we
- see a newline. Finally, if backref is non-NULL it points to a place
- where we're supposed to store a 1 if backreferencing happened and the
- match needs to be verified by a backtracking matcher. Otherwise
- we store a 0 in *backref. */
-char *
-dfaexec(d, begin, end, newline, count, backref)
- struct dfa *d;
- char *begin;
- char *end;
- int newline;
- int *count;
- int *backref;
-{
-  register int s, s1, tmp;	/* Current state. */
-  register unsigned char *p;	/* Current input character. */
-  register int **trans, *t;	/* Copy of d->trans so it can be optimized
-				   into a register. */
-  static int sbit[NOTCHAR];	/* Table for anding with d->success. */
-  static int sbit_init;
-
- if (! sbit_init)
- {
- int i;
-
- sbit_init = 1;
- for (i = 0; i < NOTCHAR; ++i)
- if (i == '\n')
- sbit[i] = 4;
- else if (ISALNUM(i))
- sbit[i] = 2;
- else
- sbit[i] = 1;
- }
-
- if (! d->tralloc)
- build_state_zero(d);
-
- s = s1 = 0;
- p = (unsigned char *) begin;
- trans = d->trans;
- *end = '\n';
-
- for (;;)
- {
- /* The dreaded inner loop. */
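-      /* The loop below is unrolled by two: successive states go
-	 alternately into s1 and s.  It exits as soon as the next state
-	 has no entry in trans[] -- because that state is accepting (its
-	 table lives in d->fails), has not been built yet, or is the -1
-	 sentinel.  The swap at last_was_s restores the invariant that s
-	 holds the newest state and s1 the one before it. */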
- if ((t = trans[s]) != 0)
- do
- {
- s1 = t[*p++];
- if (! (t = trans[s1]))
- goto last_was_s;
- s = t[*p++];
- }
- while ((t = trans[s]) != 0);
- goto last_was_s1;
- last_was_s:
- tmp = s, s = s1, s1 = tmp;
- last_was_s1:
-
- if (s >= 0 && p <= (unsigned char *) end && d->fails[s])
- {
- if (d->success[s] & sbit[*p])
- {
- if (backref)
- if (d->states[s].backref)
- *backref = 1;
- else
- *backref = 0;
- return (char *) p;
- }
-
- s1 = s;
- s = d->fails[s][*p++];
- continue;
- }
-
- /* If the previous character was a newline, count it. */
- if (count && (char *) p <= end && p[-1] == '\n')
- ++*count;
-
- /* Check if we've run off the end of the buffer. */
- if ((char *) p > end)
- return NULL;
-
- if (s >= 0)
- {
- build_state(s, d);
- trans = d->trans;
- continue;
- }
-
- if (p[-1] == '\n' && newline)
- {
- s = d->newlines[s1];
- continue;
- }
-
- s = 0;
- }
-}
-
-/* Initialize the components of a dfa that the other routines don't
- initialize for themselves. */
-void
-dfainit(d)
- struct dfa *d;
-{
- d->calloc = 1;
- MALLOC(d->charclasses, charclass, d->calloc);
- d->cindex = 0;
-
- d->talloc = 1;
- MALLOC(d->tokens, token, d->talloc);
- d->tindex = d->depth = d->nleaves = d->nregexps = 0;
-
- d->searchflag = 0;
- d->tralloc = 0;
-
- d->musts = 0;
-}
-
-/* Parse and analyze a single string of the given length. */
-void
-dfacomp(s, len, d, searchflag)
- char *s;
- size_t len;
- struct dfa *d;
- int searchflag;
-{
- if (case_fold) /* dummy folding in service of dfamust() */
- {
- char *lcopy;
- int i;
-
- lcopy = malloc(len);
- if (!lcopy)
- dfaerror("out of memory");
-
- /* This is a kludge. */
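-      /* dfamust() knows nothing about case folding, so compute the must
-	 strings from a lowercased copy with folding turned off, then
-	 discard that parse and re-parse the original pattern with
-	 case_fold restored. */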
- case_fold = 0;
- for (i = 0; i < len; ++i)
- if (ISUPPER(s[i]))
- lcopy[i] = tolower(s[i]);
- else
- lcopy[i] = s[i];
-
- dfainit(d);
- dfaparse(lcopy, len, d);
- free(lcopy);
- dfamust(d);
- d->cindex = d->tindex = d->depth = d->nleaves = d->nregexps = 0;
- case_fold = 1;
- dfaparse(s, len, d);
- dfaanalyze(d, searchflag);
- }
- else
- {
- dfainit(d);
- dfaparse(s, len, d);
- dfamust(d);
- dfaanalyze(d, searchflag);
- }
-}
-
-/* Free the storage held by the components of a dfa. */
-void
-dfafree(d)
- struct dfa *d;
-{
- int i;
- struct dfamust *dm, *ndm;
-
- free((ptr_t) d->charclasses);
- free((ptr_t) d->tokens);
- for (i = 0; i < d->sindex; ++i)
- free((ptr_t) d->states[i].elems.elems);
- free((ptr_t) d->states);
- for (i = 0; i < d->tindex; ++i)
- if (d->follows[i].elems)
- free((ptr_t) d->follows[i].elems);
- free((ptr_t) d->follows);
- for (i = 0; i < d->tralloc; ++i)
- if (d->trans[i])
- free((ptr_t) d->trans[i]);
- else if (d->fails[i])
- free((ptr_t) d->fails[i]);
- if (d->realtrans) free((ptr_t) d->realtrans);
- if (d->fails) free((ptr_t) d->fails);
- if (d->newlines) free((ptr_t) d->newlines);
- for (dm = d->musts; dm; dm = ndm)
- {
- ndm = dm->next;
- free(dm->must);
- free((ptr_t) dm);
- }
-}
-
-/* Having found the postfix representation of the regular expression,
- try to find a long sequence of characters that must appear in any line
- containing the r.e.
- Finding a "longest" sequence is beyond the scope here;
- we take an easy way out and hope for the best.
- (Take "(ab|a)b"--please.)
-
- We do a bottom-up calculation of sequences of characters that must appear
- in matches of r.e.'s represented by trees rooted at the nodes of the postfix
- representation:
- sequences that must appear at the left of the match ("left")
- sequences that must appear at the right of the match ("right")
- lists of sequences that must appear somewhere in the match ("in")
- sequences that must constitute the match ("is")
-
- When we get to the root of the tree, we use one of the longest of its
- calculated "in" sequences as our answer. The sequence we find is returned in
- d->must (where "d" is the single argument passed to "dfamust");
- the length of the sequence is returned in d->mustn.
-
- The sequences calculated for the various types of node (in pseudo ANSI c)
- are shown below. "p" is the operand of unary operators (and the left-hand
- operand of binary operators); "q" is the right-hand operand of binary
- operators.
-
- "ZERO" means "a zero-length sequence" below.
-
- Type left right is in
- ---- ---- ----- -- --
- char c # c # c # c # c
-
- CSET ZERO ZERO ZERO ZERO
-
- STAR ZERO ZERO ZERO ZERO
-
- QMARK ZERO ZERO ZERO ZERO
-
- PLUS p->left p->right ZERO p->in
-
- CAT (p->is==ZERO)? (q->is==ZERO)? (p->is!=ZERO && p->in plus
- p->left : q->right : q->is!=ZERO) ? q->in plus
- p->is##q->left p->right##q->is p->is##q->is : p->right##q->left
- ZERO
-
- OR longest common longest common (do p->is and substrings common to
- leading trailing q->is have same p->in and q->in
- (sub)sequence (sub)sequence length and
- of p->left of p->right content) ?
- and q->left and q->right p->is : NULL
-
- If there's anything else we recognize in the tree, all four sequences get set
- to zero-length sequences. If there's something we don't recognize in the tree,
- we just return a zero-length sequence.
-
- Break ties in favor of infrequent letters (choosing 'zzz' in preference to
- 'aaa')?
-
- And. . .is it here or someplace that we might ponder "optimizations" such as
- egrep 'psi|epsilon' -> egrep 'psi'
- egrep 'pepsi|epsilon' -> egrep 'epsi'
- (Yes, we now find "epsi" as a "string
- that must occur", but we might also
- simplify the *entire* r.e. being sought)
- grep '[c]' -> grep 'c'
- grep '(ab|a)b' -> grep 'ab'
- grep 'ab*' -> grep 'a'
- grep 'a*b' -> grep 'b'
-
- There are several issues:
-
- Is optimization easy (enough)?
-
- Does optimization actually accomplish anything,
- or is the automaton you get from "psi|epsilon" (for example)
- the same as the one you get from "psi" (for example)?
-
- Are optimizable r.e.'s likely to be used in real-life situations
-     (something like 'ab*' is probably unlikely; something like
- 'psi|epsilon' is likelier)? */
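-
-/* A small worked example (illustrative only), applying the table above
-   to "ab*", whose postfix form is 'a' 'b' STAR CAT END:
-
-	'a':	left = "a"	right = "a"	is = "a"	in = { "a" }
-	'b':	left = "b"	right = "b"	is = "b"	in = { "b" }
-	STAR:	left = ZERO	right = ZERO	is = ZERO	in = { }
-	CAT:	left = "a"	right = ZERO	is = ZERO	in = { "a" }
-
-   so the longest "in" sequence at the root is "a" -- the 'ab*' -> 'a'
-   case mentioned above -- and it is not exact, since it differs from
-   the root's "is". */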
-
-static char *
-icatalloc(old, new)
- char *old;
- char *new;
-{
- char *result;
- size_t oldsize, newsize;
-
- newsize = (new == NULL) ? 0 : strlen(new);
- if (old == NULL)
- oldsize = 0;
- else if (newsize == 0)
- return old;
- else oldsize = strlen(old);
- if (old == NULL)
- result = (char *) malloc(newsize + 1);
- else
- result = (char *) realloc((void *) old, oldsize + newsize + 1);
- if (result != NULL && new != NULL)
- (void) strcpy(result + oldsize, new);
- return result;
-}
-
-static char *
-icpyalloc(string)
- char *string;
-{
- return icatalloc((char *) NULL, string);
-}
-
-static char *
-istrstr(lookin, lookfor)
- char *lookin;
- char *lookfor;
-{
- char *cp;
- size_t len;
-
- len = strlen(lookfor);
- for (cp = lookin; *cp != '\0'; ++cp)
- if (strncmp(cp, lookfor, len) == 0)
- return cp;
- return NULL;
-}
-
-static void
-ifree(cp)
- char *cp;
-{
- if (cp != NULL)
- free(cp);
-}
-
-static void
-freelist(cpp)
- char **cpp;
-{
- int i;
-
- if (cpp == NULL)
- return;
- for (i = 0; cpp[i] != NULL; ++i)
- {
- free(cpp[i]);
- cpp[i] = NULL;
- }
-}
-
-static char **
-enlist(cpp, new, len)
- char **cpp;
- char *new;
- size_t len;
-{
- int i, j;
-
- if (cpp == NULL)
- return NULL;
- if ((new = icpyalloc(new)) == NULL)
- {
- freelist(cpp);
- return NULL;
- }
- new[len] = '\0';
-  /* Is new already contained in (or equal to) something in the list? */
- for (i = 0; cpp[i] != NULL; ++i)
- if (istrstr(cpp[i], new) != NULL)
- {
- free(new);
- return cpp;
- }
- /* Eliminate any obsoleted strings. */
- j = 0;
- while (cpp[j] != NULL)
- if (istrstr(new, cpp[j]) == NULL)
- ++j;
- else
- {
- free(cpp[j]);
- if (--i == j)
- break;
- cpp[j] = cpp[i];
- cpp[i] = NULL;
- }
- /* Add the new string. */
- cpp = (char **) realloc((char *) cpp, (i + 2) * sizeof *cpp);
- if (cpp == NULL)
- return NULL;
- cpp[i] = new;
- cpp[i + 1] = NULL;
- return cpp;
-}
-
-/* Given pointers to two strings, return a pointer to an allocated
- list of their distinct common substrings. Return NULL if something
- seems wild. */
-static char **
-comsubs(left, right)
- char *left;
- char *right;
-{
- char **cpp;
- char *lcp;
- char *rcp;
- size_t i, len;
-
- if (left == NULL || right == NULL)
- return NULL;
- cpp = (char **) malloc(sizeof *cpp);
- if (cpp == NULL)
- return NULL;
- cpp[0] = NULL;
- for (lcp = left; *lcp != '\0'; ++lcp)
- {
- len = 0;
- rcp = index(right, *lcp);
- while (rcp != NULL)
- {
- for (i = 1; lcp[i] != '\0' && lcp[i] == rcp[i]; ++i)
- continue;
- if (i > len)
- len = i;
- rcp = index(rcp + 1, *lcp);
- }
- if (len == 0)
- continue;
- if ((cpp = enlist(cpp, lcp, len)) == NULL)
- break;
- }
- return cpp;
-}
-
-static char **
-addlists(old, new)
-char **old;
-char **new;
-{
- int i;
-
- if (old == NULL || new == NULL)
- return NULL;
- for (i = 0; new[i] != NULL; ++i)
- {
- old = enlist(old, new[i], strlen(new[i]));
- if (old == NULL)
- break;
- }
- return old;
-}
-
-/* Given two lists of substrings, return a new list giving substrings
- common to both. */
-static char **
-inboth(left, right)
- char **left;
- char **right;
-{
- char **both;
- char **temp;
- int lnum, rnum;
-
- if (left == NULL || right == NULL)
- return NULL;
- both = (char **) malloc(sizeof *both);
- if (both == NULL)
- return NULL;
- both[0] = NULL;
- for (lnum = 0; left[lnum] != NULL; ++lnum)
- {
- for (rnum = 0; right[rnum] != NULL; ++rnum)
- {
- temp = comsubs(left[lnum], right[rnum]);
- if (temp == NULL)
- {
- freelist(both);
- return NULL;
- }
- both = addlists(both, temp);
- freelist(temp);
- if (both == NULL)
- return NULL;
- }
- }
- return both;
-}
-
-typedef struct
-{
- char **in;
- char *left;
- char *right;
- char *is;
-} must;
-
-static void
-resetmust(mp)
-must *mp;
-{
- mp->left[0] = mp->right[0] = mp->is[0] = '\0';
- freelist(mp->in);
-}
-
-static void
-dfamust(dfa)
-struct dfa *dfa;
-{
- must *musts;
- must *mp;
- char *result;
- int ri;
- int i;
- int exact;
- token t;
- static must must0;
- struct dfamust *dm;
- static char empty_string[] = "";
-
- result = empty_string;
- exact = 0;
- musts = (must *) malloc((dfa->tindex + 1) * sizeof *musts);
- if (musts == NULL)
- return;
- mp = musts;
- for (i = 0; i <= dfa->tindex; ++i)
- mp[i] = must0;
- for (i = 0; i <= dfa->tindex; ++i)
- {
- mp[i].in = (char **) malloc(sizeof *mp[i].in);
- mp[i].left = malloc(2);
- mp[i].right = malloc(2);
- mp[i].is = malloc(2);
- if (mp[i].in == NULL || mp[i].left == NULL ||
- mp[i].right == NULL || mp[i].is == NULL)
- goto done;
- mp[i].left[0] = mp[i].right[0] = mp[i].is[0] = '\0';
- mp[i].in[0] = NULL;
- }
-#ifdef DEBUG
- fprintf(stderr, "dfamust:\n");
- for (i = 0; i < dfa->tindex; ++i)
- {
- fprintf(stderr, " %d:", i);
- prtok(dfa->tokens[i]);
- }
- putc('\n', stderr);
-#endif
- for (ri = 0; ri < dfa->tindex; ++ri)
- {
- switch (t = dfa->tokens[ri])
- {
- case LPAREN:
- case RPAREN:
- goto done; /* "cannot happen" */
- case EMPTY:
- case BEGLINE:
- case ENDLINE:
- case BEGWORD:
- case ENDWORD:
- case LIMWORD:
- case NOTLIMWORD:
- case BACKREF:
- resetmust(mp);
- break;
- case STAR:
- case QMARK:
- if (mp <= musts)
- goto done; /* "cannot happen" */
- --mp;
- resetmust(mp);
- break;
- case OR:
- case ORTOP:
- if (mp < &musts[2])
- goto done; /* "cannot happen" */
- {
- char **new;
- must *lmp;
- must *rmp;
- int j, ln, rn, n;
-
- rmp = --mp;
- lmp = --mp;
- /* Guaranteed to be. Unlikely, but. . . */
- if (strcmp(lmp->is, rmp->is) != 0)
- lmp->is[0] = '\0';
- /* Left side--easy */
- i = 0;
- while (lmp->left[i] != '\0' && lmp->left[i] == rmp->left[i])
- ++i;
- lmp->left[i] = '\0';
- /* Right side */
- ln = strlen(lmp->right);
- rn = strlen(rmp->right);
- n = ln;
- if (n > rn)
- n = rn;
- for (i = 0; i < n; ++i)
- if (lmp->right[ln - i - 1] != rmp->right[rn - i - 1])
- break;
- for (j = 0; j < i; ++j)
- lmp->right[j] = lmp->right[(ln - i) + j];
- lmp->right[j] = '\0';
- new = inboth(lmp->in, rmp->in);
- if (new == NULL)
- goto done;
- freelist(lmp->in);
- free((char *) lmp->in);
- lmp->in = new;
- }
- break;
- case PLUS:
- if (mp <= musts)
- goto done; /* "cannot happen" */
- --mp;
- mp->is[0] = '\0';
- break;
- case END:
- if (mp != &musts[1])
- goto done; /* "cannot happen" */
- for (i = 0; musts[0].in[i] != NULL; ++i)
- if (strlen(musts[0].in[i]) > strlen(result))
- result = musts[0].in[i];
- if (strcmp(result, musts[0].is) == 0)
- exact = 1;
- goto done;
- case CAT:
- if (mp < &musts[2])
- goto done; /* "cannot happen" */
- {
- must *lmp;
- must *rmp;
-
- rmp = --mp;
- lmp = --mp;
- /* In. Everything in left, plus everything in
- right, plus catenation of
- left's right and right's left. */
- lmp->in = addlists(lmp->in, rmp->in);
- if (lmp->in == NULL)
- goto done;
- if (lmp->right[0] != '\0' &&
- rmp->left[0] != '\0')
- {
- char *tp;
-
- tp = icpyalloc(lmp->right);
- if (tp == NULL)
- goto done;
- tp = icatalloc(tp, rmp->left);
- if (tp == NULL)
- goto done;
- lmp->in = enlist(lmp->in, tp,
- strlen(tp));
- free(tp);
- if (lmp->in == NULL)
- goto done;
- }
- /* Left-hand */
- if (lmp->is[0] != '\0')
- {
- lmp->left = icatalloc(lmp->left,
- rmp->left);
- if (lmp->left == NULL)
- goto done;
- }
- /* Right-hand */
- if (rmp->is[0] == '\0')
- lmp->right[0] = '\0';
- lmp->right = icatalloc(lmp->right, rmp->right);
- if (lmp->right == NULL)
- goto done;
- /* Guaranteed to be */
- if (lmp->is[0] != '\0' && rmp->is[0] != '\0')
- {
- lmp->is = icatalloc(lmp->is, rmp->is);
- if (lmp->is == NULL)
- goto done;
- }
- else
- lmp->is[0] = '\0';
- }
- break;
- default:
- if (t < END)
- {
- /* "cannot happen" */
- goto done;
- }
- else if (t == '\0')
- {
- /* not on *my* shift */
- goto done;
- }
- else if (t >= CSET)
- {
- /* easy enough */
- resetmust(mp);
- }
- else
- {
- /* plain character */
- resetmust(mp);
- mp->is[0] = mp->left[0] = mp->right[0] = t;
- mp->is[1] = mp->left[1] = mp->right[1] = '\0';
- mp->in = enlist(mp->in, mp->is, (size_t)1);
- if (mp->in == NULL)
- goto done;
- }
- break;
- }
-#ifdef DEBUG
- fprintf(stderr, " node: %d:", ri);
- prtok(dfa->tokens[ri]);
- fprintf(stderr, "\n in:");
- for (i = 0; mp->in[i]; ++i)
- fprintf(stderr, " \"%s\"", mp->in[i]);
- fprintf(stderr, "\n is: \"%s\"\n", mp->is);
- fprintf(stderr, " left: \"%s\"\n", mp->left);
- fprintf(stderr, " right: \"%s\"\n", mp->right);
-#endif
- ++mp;
- }
- done:
- if (strlen(result))
- {
- dm = (struct dfamust *) malloc(sizeof (struct dfamust));
- dm->exact = exact;
- dm->must = malloc(strlen(result) + 1);
- strcpy(dm->must, result);
- dm->next = dfa->musts;
- dfa->musts = dm;
- }
- mp = musts;
- for (i = 0; i <= dfa->tindex; ++i)
- {
- freelist(mp[i].in);
- ifree((char *) mp[i].in);
- ifree(mp[i].left);
- ifree(mp[i].right);
- ifree(mp[i].is);
- }
- free((char *) mp);
-}
diff --git a/gnu/usr.bin/awk/dfa.h b/gnu/usr.bin/awk/dfa.h
deleted file mode 100644
index cc27d7a..0000000
--- a/gnu/usr.bin/awk/dfa.h
+++ /dev/null
@@ -1,360 +0,0 @@
-/* dfa.h - declarations for GNU deterministic regexp compiler
- Copyright (C) 1988 Free Software Foundation, Inc.
-
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2, or (at your option)
- any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
-
-/* Written June, 1988 by Mike Haertel */
-
-/* FIXME:
- 2. We should not export so much of the DFA internals.
- In addition to clobbering modularity, we eat up valuable
- name space. */
-
-/* Number of bits in an unsigned char. */
-#define CHARBITS 8
-
-/* First integer value that is greater than any character code. */
-#define NOTCHAR (1 << CHARBITS)
-
-/* INTBITS need not be exact, just a lower bound. */
-#define INTBITS (CHARBITS * sizeof (int))
-
-/* Number of ints required to hold a bit for every character. */
-#define CHARCLASS_INTS ((NOTCHAR + INTBITS - 1) / INTBITS)
-
-/* Sets of unsigned characters are stored as bit vectors in arrays of ints. */
-typedef int charclass[CHARCLASS_INTS];
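-
-/* For example (illustrative): with 8-bit chars and 32-bit ints,
-   NOTCHAR is 256 and INTBITS is 32, so CHARCLASS_INTS is 8 and a
-   charclass occupies 8 ints = 256 bits, one bit per unsigned char. */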
-
-/* The regexp is parsed into an array of tokens in postfix form. Some tokens
- are operators and others are terminal symbols. Most (but not all) of these
- codes are returned by the lexical analyzer. */
-
-typedef enum
-{
- END = -1, /* END is a terminal symbol that matches the
- end of input; any value of END or less in
- the parse tree is such a symbol. Accepting
- states of the DFA are those that would have
- a transition on END. */
-
- /* Ordinary character values are terminal symbols that match themselves. */
-
- EMPTY = NOTCHAR, /* EMPTY is a terminal symbol that matches
- the empty string. */
-
- BACKREF, /* BACKREF is generated by \<digit>; it
-				is not completely handled. If the scanner
- detects a transition on backref, it returns
- a kind of "semi-success" indicating that
- the match will have to be verified with
- a backtracking matcher. */
-
- BEGLINE, /* BEGLINE is a terminal symbol that matches
- the empty string if it is at the beginning
- of a line. */
-
- ENDLINE, /* ENDLINE is a terminal symbol that matches
- the empty string if it is at the end of
- a line. */
-
- BEGWORD, /* BEGWORD is a terminal symbol that matches
- the empty string if it is at the beginning
- of a word. */
-
- ENDWORD, /* ENDWORD is a terminal symbol that matches
- the empty string if it is at the end of
- a word. */
-
- LIMWORD, /* LIMWORD is a terminal symbol that matches
- the empty string if it is at the beginning
- or the end of a word. */
-
- NOTLIMWORD, /* NOTLIMWORD is a terminal symbol that
- matches the empty string if it is not at
- the beginning or end of a word. */
-
- QMARK, /* QMARK is an operator of one argument that
-				matches zero or one occurrences of its
- argument. */
-
- STAR, /* STAR is an operator of one argument that
- matches the Kleene closure (zero or more
- occurrences) of its argument. */
-
- PLUS, /* PLUS is an operator of one argument that
- matches the positive closure (one or more
- occurrences) of its argument. */
-
- REPMN, /* REPMN is a lexical token corresponding
- to the {m,n} construct. REPMN never
- appears in the compiled token vector. */
-
- CAT, /* CAT is an operator of two arguments that
- matches the concatenation of its
- arguments. CAT is never returned by the
- lexical analyzer. */
-
- OR, /* OR is an operator of two arguments that
- matches either of its arguments. */
-
- ORTOP, /* OR at the toplevel in the parse tree.
-				This is used for a Boyer-Moore heuristic. */
-
- LPAREN, /* LPAREN never appears in the parse tree,
- it is only a lexeme. */
-
- RPAREN, /* RPAREN never appears in the parse tree. */
-
-  CSET				/* CSET (and any value greater) is a
- terminal symbol that matches any of a
- class of characters. */
-} token;
-
-/* Sets are stored in an array in the compiled dfa; the index of the
- array corresponding to a given set token is given by SET_INDEX(t). */
-#define SET_INDEX(t) ((t) - CSET)
-
-/* Sometimes characters can only be matched depending on the surrounding
- context. Such context decisions depend on what the previous character
- was, and the value of the current (lookahead) character. Context
- dependent constraints are encoded as 8 bit integers. Each bit that
- is set indicates that the constraint succeeds in the corresponding
- context.
-
- bit 7 - previous and current are newlines
- bit 6 - previous was newline, current isn't
- bit 5 - previous wasn't newline, current is
- bit 4 - neither previous nor current is a newline
- bit 3 - previous and current are word-constituents
- bit 2 - previous was word-constituent, current isn't
- bit 1 - previous wasn't word-constituent, current is
- bit 0 - neither previous nor current is word-constituent
-
- Word-constituent characters are those that satisfy isalnum().
-
-   The macro SUCCEEDS_IN_CONTEXT determines whether a given constraint
- succeeds in a particular context. Prevn is true if the previous character
- was a newline, currn is true if the lookahead character is a newline.
- Prevl and currl similarly depend upon whether the previous and current
- characters are word-constituent letters. */
-#define MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
- ((constraint) & 1 << (((prevn) ? 2 : 0) + ((currn) ? 1 : 0) + 4))
-#define MATCHES_LETTER_CONTEXT(constraint, prevl, currl) \
- ((constraint) & 1 << (((prevl) ? 2 : 0) + ((currl) ? 1 : 0)))
-#define SUCCEEDS_IN_CONTEXT(constraint, prevn, currn, prevl, currl) \
- (MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
- && MATCHES_LETTER_CONTEXT(constraint, prevl, currl))
-
-/* The following macros give information about what a constraint depends on. */
-#define PREV_NEWLINE_DEPENDENT(constraint) \
- (((constraint) & 0xc0) >> 2 != ((constraint) & 0x30))
-#define PREV_LETTER_DEPENDENT(constraint) \
- (((constraint) & 0x0c) >> 2 != ((constraint) & 0x03))
-
-/* Tokens that match the empty string subject to some constraint actually
- work by applying that constraint to determine what may follow them,
- taking into account what has gone before. The following values are
- the constraints corresponding to the special tokens previously defined. */
-#define NO_CONSTRAINT 0xff
-#define BEGLINE_CONSTRAINT 0xcf
-#define ENDLINE_CONSTRAINT 0xaf
-#define BEGWORD_CONSTRAINT 0xf2
-#define ENDWORD_CONSTRAINT 0xf4
-#define LIMWORD_CONSTRAINT 0xf6
-#define NOTLIMWORD_CONSTRAINT 0xf9
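-
-/* A worked example (illustrative only): BEGLINE_CONSTRAINT is 0xcf,
-   i.e. 1100 1111.  Bits 7 and 6 are set, so the newline half succeeds
-   exactly when the previous character was a newline; bits 3-0 are all
-   set, so any letter context is accepted.  Thus
-	SUCCEEDS_IN_CONTEXT(BEGLINE_CONSTRAINT, 1, 0, 0, 1)
-   is nonzero (start of line, word-constituent lookahead), while any
-   call with prevn == 0 selects bit 4 or 5, which is clear, and fails.
-   Likewise PREV_NEWLINE_DEPENDENT(BEGLINE_CONSTRAINT) is nonzero,
-   since (0xcf & 0xc0) >> 2 == 0x30 differs from 0xcf & 0x30 == 0. */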
-
-/* States of the recognizer correspond to sets of positions in the parse
- tree, together with the constraints under which they may be matched.
- So a position is encoded as an index into the parse tree together with
- a constraint. */
-typedef struct
-{
- unsigned index; /* Index into the parse array. */
- unsigned constraint; /* Constraint for matching this position. */
-} position;
-
-/* Sets of positions are stored as arrays. */
-typedef struct
-{
- position *elems; /* Elements of this position set. */
- int nelem; /* Number of elements in this set. */
-} position_set;
-
-/* A state of the dfa consists of a set of positions, some flags,
- and the token value of the lowest-numbered position of the state that
- contains an END token. */
-typedef struct
-{
- int hash; /* Hash of the positions of this state. */
- position_set elems; /* Positions this state could match. */
- char newline; /* True if previous state matched newline. */
- char letter; /* True if previous state matched a letter. */
- char backref; /* True if this state matches a \<digit>. */
- unsigned char constraint; /* Constraint for this state to accept. */
- int first_end; /* Token value of the first END in elems. */
-} dfa_state;
-
-/* Element of a list of strings, at least one of which is known to
- appear in any R.E. matching the DFA. */
-struct dfamust
-{
- int exact;
- char *must;
- struct dfamust *next;
-};
-
-/* A compiled regular expression. */
-struct dfa
-{
- /* Stuff built by the scanner. */
- charclass *charclasses; /* Array of character sets for CSET tokens. */
- int cindex; /* Index for adding new charclasses. */
- int calloc; /* Number of charclasses currently allocated. */
-
- /* Stuff built by the parser. */
- token *tokens; /* Postfix parse array. */
- int tindex; /* Index for adding new tokens. */
- int talloc; /* Number of tokens currently allocated. */
- int depth; /* Depth required of an evaluation stack
- used for depth-first traversal of the
- parse tree. */
- int nleaves; /* Number of leaves on the parse tree. */
- int nregexps; /* Count of parallel regexps being built
- with dfaparse(). */
-
- /* Stuff owned by the state builder. */
- dfa_state *states; /* States of the dfa. */
- int sindex; /* Index for adding new states. */
- int salloc; /* Number of states currently allocated. */
-
- /* Stuff built by the structure analyzer. */
- position_set *follows; /* Array of follow sets, indexed by position
- index. The follow of a position is the set
- of positions containing characters that
- could conceivably follow a character
- matching the given position in a string
- matching the regexp. Allocated to the
- maximum possible position index. */
- int searchflag; /* True if we are supposed to build a searching
- as opposed to an exact matcher. A searching
- matcher finds the first and shortest string
- matching a regexp anywhere in the buffer,
- whereas an exact matcher finds the longest
- string matching, but anchored to the
- beginning of the buffer. */
-
- /* Stuff owned by the executor. */
- int tralloc; /* Number of transition tables that have
- slots so far. */
- int trcount; /* Number of transition tables that have
- actually been built. */
- int **trans; /* Transition tables for states that can
- never accept. If the transitions for a
- state have not yet been computed, or the
- state could possibly accept, its entry in
- this table is NULL. */
- int **realtrans; /* Trans always points to realtrans + 1; this
- is so trans[-1] can contain NULL. */
- int **fails; /* Transition tables after failing to accept
- on a state that potentially could do so. */
- int *success; /* Table of acceptance conditions used in
- dfaexec and computed in build_state. */
- int *newlines; /* Transitions on newlines. The entry for a
- newline in any transition table is always
- -1 so we can count lines without wasting
- too many cycles. The transition for a
- newline is stored separately and handled
- as a special case. Newline is also used
- as a sentinel at the end of the buffer. */
- struct dfamust *musts; /* List of strings, at least one of which
- is known to appear in any r.e. matching
- the dfa. */
-};
-
-/* Some macros for user access to dfa internals. */
-
-/* ACCEPTING returns true if s could possibly be an accepting state of r. */
-#define ACCEPTING(s, r) ((r).states[s].constraint)
-
-/* ACCEPTS_IN_CONTEXT returns true if the given state accepts in the
- specified context. */
-#define ACCEPTS_IN_CONTEXT(prevn, currn, prevl, currl, state, dfa) \
- SUCCEEDS_IN_CONTEXT((dfa).states[state].constraint, \
- prevn, currn, prevl, currl)
-
-/* FIRST_MATCHING_REGEXP returns the index number of the first of parallel
- regexps that a given state could accept. Parallel regexps are numbered
- starting at 1. */
-#define FIRST_MATCHING_REGEXP(state, dfa) (-(dfa).states[state].first_end)
-
-/* Entry points. */
-
-#ifdef __STDC__
-
-/* dfasyntax() takes two arguments; the first sets the syntax bits described
- earlier in this file, and the second sets the case-folding flag. */
-extern void dfasyntax(reg_syntax_t, int);
-
-/* Compile the given string of the given length into the given struct dfa.
- Final argument is a flag specifying whether to build a searching or an
- exact matcher. */
-extern void dfacomp(char *, size_t, struct dfa *, int);
-
-/* Execute the given struct dfa on the buffer of characters. The
- first char * points to the beginning, and the second points to the
- first character after the end of the buffer, which must be a writable
- place so a sentinel end-of-buffer marker can be stored there. The
- second-to-last argument is a flag telling whether to allow newlines to
- be part of a string matching the regexp. The next-to-last argument,
- if non-NULL, points to a place to increment every time we see a
- newline. The final argument, if non-NULL, points to a flag that will
- be set if further examination by a backtracking matcher is needed in
- order to verify backreferencing; otherwise the flag will be cleared.
- Returns NULL if no match is found, or a pointer to the first
- character after the first & shortest matching string in the buffer. */
-extern char *dfaexec(struct dfa *, char *, char *, int, int *, int *);
-
-/* Free the storage held by the components of a struct dfa. */
-extern void dfafree(struct dfa *);
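-
-/* Illustrative sketch (not part of the original interface): a minimal
-   calling sequence for a searching match.  It assumes <string.h>, a GNU
-   regex syntax constant such as RE_SYNTAX_AWK from <regex.h>, and a
-   caller-supplied dfaerror() as described below; it is deliberately
-   compiled out. */
-#if 0
-static char *
-example_search(char *pattern, char *buf, size_t buflen)
-{
-  struct dfa d;
-  int newlines = 0, backref = 0;
-  char *match_end;
-
-  dfasyntax(RE_SYNTAX_AWK, 0);		/* choose syntax bits, no case folding */
-  dfacomp(pattern, strlen(pattern), &d, 1);	/* 1 = build a searching matcher */
-  /* buf[buflen] must be writable; dfaexec() stores a sentinel '\n' there. */
-  match_end = dfaexec(&d, buf, buf + buflen, 1, &newlines, &backref);
-  dfafree(&d);
-  return match_end;	/* NULL if no match; nonzero backref means the match
-			   still needs checking by a backtracking matcher. */
-}
-#endif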
-
-/* Entry points for people who know what they're doing. */
-
-/* Initialize the components of a struct dfa. */
-extern void dfainit(struct dfa *);
-
-/* Incrementally parse a string of given length into a struct dfa. */
-extern void dfaparse(char *, size_t, struct dfa *);
-
-/* Analyze a parsed regexp; second argument tells whether to build a searching
- or an exact matcher. */
-extern void dfaanalyze(struct dfa *, int);
-
-/* Compute, for each possible character, the transitions out of a given
- state, storing them in an array of integers. */
-extern void dfastate(int, struct dfa *, int []);
-
-/* Error handling. */
-
-/* dfaerror() is called by the regexp routines whenever an error occurs. It
- takes a single argument, a NUL-terminated string describing the error.
- The default dfaerror() prints the error message to stderr and exits.
-   The user can provide a different dfaerror() if so desired. */
-extern void dfaerror(const char *);
-
-#else /* ! __STDC__ */
-extern void dfasyntax(), dfacomp(), dfafree(), dfainit(), dfaparse();
-extern void dfaanalyze(), dfastate(), dfaerror();
-extern char *dfaexec();
-#endif /* ! __STDC__ */
diff --git a/gnu/usr.bin/awk/doc/gawk.texi b/gnu/usr.bin/awk/doc/gawk.texi
deleted file mode 100644
index b280262..0000000
--- a/gnu/usr.bin/awk/doc/gawk.texi
+++ /dev/null
@@ -1,11270 +0,0 @@
-\input texinfo @c -*-texinfo-*-
-@c %**start of header (This is for running Texinfo on a region.)
-@setfilename gawk.info
-@settitle The GAWK Manual
-@c @smallbook
-@c %**end of header (This is for running Texinfo on a region.)
-
-@ifinfo
-@synindex fn cp
-@synindex vr cp
-@end ifinfo
-@iftex
-@syncodeindex fn cp
-@syncodeindex vr cp
-@end iftex
-
-@c If "finalout" is commented out, the printed output will show
-@c black boxes that mark lines that are too long. Thus, it is
-@c unwise to comment it out when running a master in case there are
-@c overfulls which are deemed okay.
-
-@iftex
-@finalout
-@end iftex
-
-@c ===> NOTE! <==
-@c Determine the edition number in *four* places by hand:
-@c 1. First ifinfo section 2. title page 3. copyright page 4. top node
-@c To find the locations, search for !!set
-
-@ifinfo
-This file documents @code{awk}, a program that you can use to select
-particular records in a file and perform operations upon them.
-
-This is Edition 0.15 of @cite{The GAWK Manual}, @*
-for the 2.15 version of the GNU implementation @*
-of AWK.
-
-Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc.
-
-Permission is granted to make and distribute verbatim copies of
-this manual provided the copyright notice and this permission notice
-are preserved on all copies.
-
-@ignore
-Permission is granted to process this file through TeX and print the
-results, provided the printed document carries copying permission
-notice identical to this one except for the removal of this paragraph
-(this paragraph not being relevant to the printed manual).
-
-@end ignore
-Permission is granted to copy and distribute modified versions of this
-manual under the conditions for verbatim copying, provided that the entire
-resulting derived work is distributed under the terms of a permission
-notice identical to this one.
-
-Permission is granted to copy and distribute translations of this manual
-into another language, under the above conditions for modified versions,
-except that this permission notice may be stated in a translation approved
-by the Foundation.
-@end ifinfo
-
-@setchapternewpage odd
-
-@c !!set edition, date, version
-@titlepage
-@title The GAWK Manual
-@subtitle Edition 0.15
-@subtitle April 1993
-@author Diane Barlow Close
-@author Arnold D. Robbins
-@author Paul H. Rubin
-@author Richard Stallman
-
-@c Include the Distribution inside the titlepage environment so
-@c that headings are turned off. Headings on and off do not work.
-
-@page
-@vskip 0pt plus 1filll
-Copyright @copyright{} 1989, 1991, 1992, 1993 Free Software Foundation, Inc.
-@sp 2
-
-@c !!set edition, date, version
-This is Edition 0.15 of @cite{The GAWK Manual}, @*
-for the 2.15 version of the GNU implementation @*
-of AWK.
-
-@sp 2
-Published by the Free Software Foundation @*
-675 Massachusetts Avenue @*
-Cambridge, MA 02139 USA @*
-Printed copies are available for $20 each.
-
-Permission is granted to make and distribute verbatim copies of
-this manual provided the copyright notice and this permission notice
-are preserved on all copies.
-
-Permission is granted to copy and distribute modified versions of this
-manual under the conditions for verbatim copying, provided that the entire
-resulting derived work is distributed under the terms of a permission
-notice identical to this one.
-
-Permission is granted to copy and distribute translations of this manual
-into another language, under the above conditions for modified versions,
-except that this permission notice may be stated in a translation approved
-by the Foundation.
-@end titlepage
-
-@ifinfo
-@node Top, Preface, (dir), (dir)
-@comment node-name, next, previous, up
-@top General Introduction
-@c Preface or Licensing nodes should come right after the Top
-@c node, in `unnumbered' sections, then the chapter, `What is gawk'.
-
-This file documents @code{awk}, a program that you can use to select
-particular records in a file and perform operations upon them.
-
-@c !!set edition, date, version
-This is Edition 0.15 of @cite{The GAWK Manual}, @*
-for the 2.15 version of the GNU implementation @*
-of AWK.
-
-@end ifinfo
-
-@menu
-* Preface:: What you can do with @code{awk}; brief history
- and acknowledgements.
-* Copying:: Your right to copy and distribute @code{gawk}.
-* This Manual:: Using this manual.
- Includes sample input files that you can use.
-* Getting Started:: A basic introduction to using @code{awk}.
- How to run an @code{awk} program.
- Command line syntax.
-* Reading Files:: How to read files and manipulate fields.
-* Printing:: How to print using @code{awk}. Describes the
- @code{print} and @code{printf} statements.
- Also describes redirection of output.
-* One-liners:: Short, sample @code{awk} programs.
-* Patterns:: The various types of patterns
- explained in detail.
-* Actions:: The various types of actions are
- introduced here. Describes
- expressions and the various operators in
- detail. Also describes comparison expressions.
-* Expressions:: Expressions are the basic building
- blocks of statements.
-* Statements:: The various control statements are
- described in detail.
-* Arrays:: The description and use of arrays.
- Also includes array-oriented control
- statements.
-* Built-in:: The built-in functions are summarized here.
-* User-defined:: User-defined functions are described in detail.
-* Built-in Variables:: Built-in Variables
-* Command Line:: How to run @code{gawk}.
-* Language History:: The evolution of the @code{awk} language.
-* Installation:: Installing @code{gawk} under
- various operating systems.
-* Gawk Summary:: @code{gawk} Options and Language Summary.
-* Sample Program:: A sample @code{awk} program with a
- complete explanation.
-* Bugs:: Reporting Problems and Bugs.
-* Notes:: Something about the
- implementation of @code{gawk}.
-* Glossary:: An explanation of some unfamiliar terms.
-* Index::
-@end menu
-
-@node Preface, Copying, Top, Top
-@comment node-name, next, previous, up
-@unnumbered Preface
-
-@iftex
-@cindex what is @code{awk}
-@end iftex
-If you are like many computer users, you would frequently like to make
-changes in various text files wherever certain patterns appear, or
-extract data from parts of certain lines while discarding the rest. To
-write a program to do this in a language such as C or Pascal is a
-time-consuming inconvenience that may take many lines of code. The job
-may be easier with @code{awk}.
-
-The @code{awk} utility interprets a special-purpose programming language
-that makes it possible to handle simple data-reformatting jobs easily
-with just a few lines of code.
-
-The GNU implementation of @code{awk} is called @code{gawk}; it is fully
-upward compatible with the System V Release 4 version of
-@code{awk}. @code{gawk} is also upward compatible with the @sc{posix}
-(draft) specification of the @code{awk} language. This means that all
-properly written @code{awk} programs should work with @code{gawk}.
-Thus, we usually don't distinguish between @code{gawk} and other @code{awk}
-implementations in this manual.@refill
-
-@cindex uses of @code{awk}
-This manual teaches you what @code{awk} does and how you can use
-@code{awk} effectively. You should already be familiar with basic
-system commands such as @code{ls}. Using @code{awk} you can: @refill
-
-@itemize @bullet
-@item
-manage small, personal databases
-
-@item
-generate reports
-
-@item
-validate data
-@item
-produce indexes, and perform other document preparation tasks
-
-@item
-even experiment with algorithms that can be adapted later to other computer
-languages
-@end itemize
-
-@iftex
-This manual has the difficult task of being both tutorial and reference.
-If you are a novice, feel free to skip over details that seem too complex.
-You should also ignore the many cross references; they are for the
-expert user, and for the on-line Info version of the manual.
-@end iftex
-
-@menu
-* History:: The history of @code{gawk} and
- @code{awk}. Acknowledgements.
-@end menu
-
-@node History, , Preface, Preface
-@comment node-name, next, previous, up
-@unnumberedsec History of @code{awk} and @code{gawk}
-
-@cindex acronym
-@cindex history of @code{awk}
-The name @code{awk} comes from the initials of its designers: Alfred V.
-Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version of
-@code{awk} was written in 1977. In 1985 a new version made the programming
-language more powerful, introducing user-defined functions, multiple input
-streams, and computed regular expressions.
-This new version became generally available with System V Release 3.1.
-The version in System V Release 4 added some new features and also cleaned
-up the behavior in some of the ``dark corners'' of the language.
-The specification for @code{awk} in the @sc{posix} Command Language
-and Utilities standard further clarified the language based on feedback
-from both the @code{gawk} designers, and the original @code{awk}
-designers.@refill
-
-The GNU implementation, @code{gawk}, was written in 1986 by Paul Rubin
-and Jay Fenlason, with advice from Richard Stallman. John Woods
-contributed parts of the code as well. In 1988 and 1989, David Trueman, with
-help from Arnold Robbins, thoroughly reworked @code{gawk} for compatibility
-with the newer @code{awk}. Current development (1992) focuses on bug fixes,
-performance improvements, and standards compliance.
-
-We need to thank many people for their assistance in producing this
-manual. Jay Fenlason contributed many ideas and sample programs. Richard
-Mlynarik and Robert J. Chassell gave helpful comments on early drafts of this
-manual. The paper @cite{A Supplemental Document for @code{awk}} by John W.
-Pierce of the Chemistry Department at UC San Diego, pinpointed several
-issues relevant both to @code{awk} implementation and to this manual, that
-would otherwise have escaped us. David Trueman, Pat Rankin, and Michal
-Jaegermann also contributed sections of the manual.@refill
-
-The following people provided many helpful comments on this edition of
-the manual: Rick Adams, Michael Brennan, Rich Burridge, Diane Close,
-Christopher (``Topher'') Eliot, Michael Lijewski, Pat Rankin, Miriam Robbins,
-and Michal Jaegermann. Robert J. Chassell provided much valuable advice on
-the use of Texinfo.
-
-Finally, we would like to thank Brian Kernighan of Bell Labs for invaluable
-assistance during the testing and debugging of @code{gawk}, and for
-help in clarifying numerous points about the language.@refill
-
-@node Copying, This Manual, Preface, Top
-@unnumbered GNU GENERAL PUBLIC LICENSE
-@center Version 2, June 1991
-
-@display
-Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
-675 Mass Ave, Cambridge, MA 02139, USA
-
-Everyone is permitted to copy and distribute verbatim copies
-of this license document, but changing it is not allowed.
-@end display
-
-@c fakenode --- for prepinfo
-@unnumberedsec Preamble
-
- The licenses for most software are designed to take away your
-freedom to share and change it. By contrast, the GNU General Public
-License is intended to guarantee your freedom to share and change free
-software---to make sure the software is free for all its users. This
-General Public License applies to most of the Free Software
-Foundation's software and to any other program whose authors commit to
-using it. (Some other Free Software Foundation software is covered by
-the GNU Library General Public License instead.) You can apply it to
-your programs, too.
-
- When we speak of free software, we are referring to freedom, not
-price. Our General Public Licenses are designed to make sure that you
-have the freedom to distribute copies of free software (and charge for
-this service if you wish), that you receive source code or can get it
-if you want it, that you can change the software or use pieces of it
-in new free programs; and that you know you can do these things.
-
- To protect your rights, we need to make restrictions that forbid
-anyone to deny you these rights or to ask you to surrender the rights.
-These restrictions translate to certain responsibilities for you if you
-distribute copies of the software, or if you modify it.
-
- For example, if you distribute copies of such a program, whether
-gratis or for a fee, you must give the recipients all the rights that
-you have. You must make sure that they, too, receive or can get the
-source code. And you must show them these terms so they know their
-rights.
-
- We protect your rights with two steps: (1) copyright the software, and
-(2) offer you this license which gives you legal permission to copy,
-distribute and/or modify the software.
-
- Also, for each author's protection and ours, we want to make certain
-that everyone understands that there is no warranty for this free
-software. If the software is modified by someone else and passed on, we
-want its recipients to know that what they have is not the original, so
-that any problems introduced by others will not reflect on the original
-authors' reputations.
-
- Finally, any free program is threatened constantly by software
-patents. We wish to avoid the danger that redistributors of a free
-program will individually obtain patent licenses, in effect making the
-program proprietary. To prevent this, we have made it clear that any
-patent must be licensed for everyone's free use or not licensed at all.
-
- The precise terms and conditions for copying, distribution and
-modification follow.
-
-@iftex
-@c fakenode --- for prepinfo
-@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-@end iftex
-@ifinfo
-@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-@end ifinfo
-
-@enumerate
-@item
-This License applies to any program or other work which contains
-a notice placed by the copyright holder saying it may be distributed
-under the terms of this General Public License. The ``Program'', below,
-refers to any such program or work, and a ``work based on the Program''
-means either the Program or any derivative work under copyright law:
-that is to say, a work containing the Program or a portion of it,
-either verbatim or with modifications and/or translated into another
-language. (Hereinafter, translation is included without limitation in
-the term ``modification''.) Each licensee is addressed as ``you''.
-
-Activities other than copying, distribution and modification are not
-covered by this License; they are outside its scope. The act of
-running the Program is not restricted, and the output from the Program
-is covered only if its contents constitute a work based on the
-Program (independent of having been made by running the Program).
-Whether that is true depends on what the Program does.
-
-@item
-You may copy and distribute verbatim copies of the Program's
-source code as you receive it, in any medium, provided that you
-conspicuously and appropriately publish on each copy an appropriate
-copyright notice and disclaimer of warranty; keep intact all the
-notices that refer to this License and to the absence of any warranty;
-and give any other recipients of the Program a copy of this License
-along with the Program.
-
-You may charge a fee for the physical act of transferring a copy, and
-you may at your option offer warranty protection in exchange for a fee.
-
-@item
-You may modify your copy or copies of the Program or any portion
-of it, thus forming a work based on the Program, and copy and
-distribute such modifications or work under the terms of Section 1
-above, provided that you also meet all of these conditions:
-
-@enumerate a
-@item
-You must cause the modified files to carry prominent notices
-stating that you changed the files and the date of any change.
-
-@item
-You must cause any work that you distribute or publish, that in
-whole or in part contains or is derived from the Program or any
-part thereof, to be licensed as a whole at no charge to all third
-parties under the terms of this License.
-
-@item
-If the modified program normally reads commands interactively
-when run, you must cause it, when started running for such
-interactive use in the most ordinary way, to print or display an
-announcement including an appropriate copyright notice and a
-notice that there is no warranty (or else, saying that you provide
-a warranty) and that users may redistribute the program under
-these conditions, and telling the user how to view a copy of this
-License. (Exception: if the Program itself is interactive but
-does not normally print such an announcement, your work based on
-the Program is not required to print an announcement.)
-@end enumerate
-
-These requirements apply to the modified work as a whole. If
-identifiable sections of that work are not derived from the Program,
-and can be reasonably considered independent and separate works in
-themselves, then this License, and its terms, do not apply to those
-sections when you distribute them as separate works. But when you
-distribute the same sections as part of a whole which is a work based
-on the Program, the distribution of the whole must be on the terms of
-this License, whose permissions for other licensees extend to the
-entire whole, and thus to each and every part regardless of who wrote it.
-
-Thus, it is not the intent of this section to claim rights or contest
-your rights to work written entirely by you; rather, the intent is to
-exercise the right to control the distribution of derivative or
-collective works based on the Program.
-
-In addition, mere aggregation of another work not based on the Program
-with the Program (or with a work based on the Program) on a volume of
-a storage or distribution medium does not bring the other work under
-the scope of this License.
-
-@item
-You may copy and distribute the Program (or a work based on it,
-under Section 2) in object code or executable form under the terms of
-Sections 1 and 2 above provided that you also do one of the following:
-
-@enumerate a
-@item
-Accompany it with the complete corresponding machine-readable
-source code, which must be distributed under the terms of Sections
-1 and 2 above on a medium customarily used for software interchange; or,
-
-@item
-Accompany it with a written offer, valid for at least three
-years, to give any third party, for a charge no more than your
-cost of physically performing source distribution, a complete
-machine-readable copy of the corresponding source code, to be
-distributed under the terms of Sections 1 and 2 above on a medium
-customarily used for software interchange; or,
-
-@item
-Accompany it with the information you received as to the offer
-to distribute corresponding source code. (This alternative is
-allowed only for noncommercial distribution and only if you
-received the program in object code or executable form with such
-an offer, in accord with Subsection b above.)
-@end enumerate
-
-The source code for a work means the preferred form of the work for
-making modifications to it. For an executable work, complete source
-code means all the source code for all modules it contains, plus any
-associated interface definition files, plus the scripts used to
-control compilation and installation of the executable. However, as a
-special exception, the source code distributed need not include
-anything that is normally distributed (in either source or binary
-form) with the major components (compiler, kernel, and so on) of the
-operating system on which the executable runs, unless that component
-itself accompanies the executable.
-
-If distribution of executable or object code is made by offering
-access to copy from a designated place, then offering equivalent
-access to copy the source code from the same place counts as
-distribution of the source code, even though third parties are not
-compelled to copy the source along with the object code.
-
-@item
-You may not copy, modify, sublicense, or distribute the Program
-except as expressly provided under this License. Any attempt
-otherwise to copy, modify, sublicense or distribute the Program is
-void, and will automatically terminate your rights under this License.
-However, parties who have received copies, or rights, from you under
-this License will not have their licenses terminated so long as such
-parties remain in full compliance.
-
-@item
-You are not required to accept this License, since you have not
-signed it. However, nothing else grants you permission to modify or
-distribute the Program or its derivative works. These actions are
-prohibited by law if you do not accept this License. Therefore, by
-modifying or distributing the Program (or any work based on the
-Program), you indicate your acceptance of this License to do so, and
-all its terms and conditions for copying, distributing or modifying
-the Program or works based on it.
-
-@item
-Each time you redistribute the Program (or any work based on the
-Program), the recipient automatically receives a license from the
-original licensor to copy, distribute or modify the Program subject to
-these terms and conditions. You may not impose any further
-restrictions on the recipients' exercise of the rights granted herein.
-You are not responsible for enforcing compliance by third parties to
-this License.
-
-@item
-If, as a consequence of a court judgment or allegation of patent
-infringement or for any other reason (not limited to patent issues),
-conditions are imposed on you (whether by court order, agreement or
-otherwise) that contradict the conditions of this License, they do not
-excuse you from the conditions of this License. If you cannot
-distribute so as to satisfy simultaneously your obligations under this
-License and any other pertinent obligations, then as a consequence you
-may not distribute the Program at all. For example, if a patent
-license would not permit royalty-free redistribution of the Program by
-all those who receive copies directly or indirectly through you, then
-the only way you could satisfy both it and this License would be to
-refrain entirely from distribution of the Program.
-
-If any portion of this section is held invalid or unenforceable under
-any particular circumstance, the balance of the section is intended to
-apply and the section as a whole is intended to apply in other
-circumstances.
-
-It is not the purpose of this section to induce you to infringe any
-patents or other property right claims or to contest validity of any
-such claims; this section has the sole purpose of protecting the
-integrity of the free software distribution system, which is
-implemented by public license practices. Many people have made
-generous contributions to the wide range of software distributed
-through that system in reliance on consistent application of that
-system; it is up to the author/donor to decide if he or she is willing
-to distribute software through any other system and a licensee cannot
-impose that choice.
-
-This section is intended to make thoroughly clear what is believed to
-be a consequence of the rest of this License.
-
-@item
-If the distribution and/or use of the Program is restricted in
-certain countries either by patents or by copyrighted interfaces, the
-original copyright holder who places the Program under this License
-may add an explicit geographical distribution limitation excluding
-those countries, so that distribution is permitted only in or among
-countries not thus excluded. In such case, this License incorporates
-the limitation as if written in the body of this License.
-
-@item
-The Free Software Foundation may publish revised and/or new versions
-of the General Public License from time to time. Such new versions will
-be similar in spirit to the present version, but may differ in detail to
-address new problems or concerns.
-
-Each version is given a distinguishing version number. If the Program
-specifies a version number of this License which applies to it and ``any
-later version'', you have the option of following the terms and conditions
-either of that version or of any later version published by the Free
-Software Foundation. If the Program does not specify a version number of
-this License, you may choose any version ever published by the Free Software
-Foundation.
-
-@item
-If you wish to incorporate parts of the Program into other free
-programs whose distribution conditions are different, write to the author
-to ask for permission. For software which is copyrighted by the Free
-Software Foundation, write to the Free Software Foundation; we sometimes
-make exceptions for this. Our decision will be guided by the two goals
-of preserving the free status of all derivatives of our free software and
-of promoting the sharing and reuse of software generally.
-
-@iftex
-@c fakenode --- for prepinfo
-@heading NO WARRANTY
-@end iftex
-@ifinfo
-@center NO WARRANTY
-@end ifinfo
-
-@item
-BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
-FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
-OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
-PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
-OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
-MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
-TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
-PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
-REPAIR OR CORRECTION.
-
-@item
-IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
-WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
-REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
-INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
-OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
-TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
-YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
-PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGES.
-@end enumerate
-
-@iftex
-@c fakenode --- for prepinfo
-@heading END OF TERMS AND CONDITIONS
-@end iftex
-@ifinfo
-@center END OF TERMS AND CONDITIONS
-@end ifinfo
-
-@page
-@c fakenode --- for prepinfo
-@unnumberedsec How to Apply These Terms to Your New Programs
-
- If you develop a new program, and you want it to be of the greatest
-possible use to the public, the best way to achieve this is to make it
-free software which everyone can redistribute and change under these terms.
-
- To do so, attach the following notices to the program. It is safest
-to attach them to the start of each source file to most effectively
-convey the exclusion of warranty; and each file should have at least
-the ``copyright'' line and a pointer to where the full notice is found.
-
-@smallexample
-@var{one line to give the program's name and a brief idea of what it does.}
-Copyright (C) 19@var{yy} @var{name of author}
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2 of the License, or
-(at your option) any later version.
-
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-@end smallexample
-
-Also add information on how to contact you by electronic and paper mail.
-
-If the program is interactive, make it output a short notice like this
-when it starts in an interactive mode:
-
-@smallexample
-Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
-Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
-type `show w'.
-This is free software, and you are welcome to redistribute it
-under certain conditions; type `show c' for details.
-@end smallexample
-
-The hypothetical commands @samp{show w} and @samp{show c} should show
-the appropriate parts of the General Public License. Of course, the
-commands you use may be called something other than @samp{show w} and
-@samp{show c}; they could even be mouse-clicks or menu items---whatever
-suits your program.
-
-You should also get your employer (if you work as a programmer) or your
-school, if any, to sign a ``copyright disclaimer'' for the program, if
-necessary. Here is a sample; alter the names:
-
-@smallexample
-Yoyodyne, Inc., hereby disclaims all copyright interest in the program
-`Gnomovision' (which makes passes at compilers) written by James Hacker.
-
-@var{signature of Ty Coon}, 1 April 1989
-Ty Coon, President of Vice
-@end smallexample
-
-This General Public License does not permit incorporating your program into
-proprietary programs. If your program is a subroutine library, you may
-consider it more useful to permit linking proprietary applications with the
-library. If this is what you want to do, use the GNU Library General
-Public License instead of this License.
-
-@node This Manual, Getting Started, Copying, Top
-@chapter Using this Manual
-@cindex manual, using this
-@cindex using this manual
-@cindex language, @code{awk}
-@cindex program, @code{awk}
-@cindex @code{awk} language
-@cindex @code{awk} program
-
-The term @code{awk} refers to a particular program, and to the language you
-use to tell this program what to do. When we need to be careful, we call
-the program ``the @code{awk} utility'' and the language ``the @code{awk}
-language.'' The term @code{gawk} refers to a version of @code{awk} developed
-as part of the GNU project. The purpose of this manual is to explain
-both the
-@code{awk} language and how to run the @code{awk} utility.@refill
-
-While concentrating on the features of @code{gawk}, the manual will also
-attempt to describe important differences between @code{gawk} and other
-@code{awk} implementations. In particular, any features that are not
-in the @sc{posix} standard for @code{awk} will be noted. @refill
-
-The term @dfn{@code{awk} program} refers to a program written by you in
-the @code{awk} programming language.@refill
-
-@xref{Getting Started, ,Getting Started with @code{awk}}, for the bare
-essentials you need to know to start using @code{awk}.
-
-Some useful ``one-liners'' are included to give you a feel for the
-@code{awk} language (@pxref{One-liners, ,Useful ``One-liners''}).
-
-@ignore
-@strong{I deleted four paragraphs here because they would confuse the
-beginner more than help him. They mention terms such as ``field,''
-``pattern,'' ``action,'' ``built-in function'' which the beginner
-doesn't know.}
-
-@strong{If you can find a way to introduce several of these concepts here,
-enough to give the reader a map of what is to follow, that might
-be useful. I'm not sure that can be done without taking up more
-space than ought to be used here. There may be no way to win.}
-
-@strong{ADR: I'd like to tackle this in phase 2 of my editing.}
-@end ignore
-
-A sample @code{awk} program has been provided for you
-(@pxref{Sample Program}).@refill
-
-If you find terms that you aren't familiar with, try looking them
-up in the glossary (@pxref{Glossary}).@refill
-
-The entire @code{awk} language is summarized for quick reference in
-@ref{Gawk Summary, ,@code{gawk} Summary}. Look there if you just need
-to refresh your memory about a particular feature.@refill
-
-Most of the time complete @code{awk} programs are used as examples, but in
-some of the more advanced sections, only the part of the @code{awk} program
-that illustrates the concept being described is shown.@refill
-
-@menu
-* Sample Data Files:: Sample data files for use in the @code{awk}
- programs illustrated in this manual.
-@end menu
-
-@node Sample Data Files, , This Manual, This Manual
-@section Data Files for the Examples
-
-@cindex input file, sample
-@cindex sample input file
-@cindex @file{BBS-list} file
-Many of the examples in this manual take their input from two sample
-data files. The first, called @file{BBS-list}, represents a list of
-computer bulletin board systems together with information about those systems.
-The second data file, called @file{inventory-shipped}, contains
-information about shipments on a monthly basis. Each line of these
-files is one @dfn{record}.
-
-In the file @file{BBS-list}, each record contains the name of a computer
-bulletin board, its phone number, the board's baud rate, and a code for
-the number of hours it is operational. An @samp{A} in the last column
-means the board operates 24 hours a day. A @samp{B} in the last
-column means the board operates evening and weekend hours only. A
-@samp{C} means the board operates only on weekends.
-
-@example
-aardvark 555-5553 1200/300 B
-alpo-net 555-3412 2400/1200/300 A
-barfly 555-7685 1200/300 A
-bites 555-1675 2400/1200/300 A
-camelot 555-0542 300 C
-core 555-2912 1200/300 C
-fooey 555-1234 2400/1200/300 B
-foot 555-6699 1200/300 B
-macfoo 555-6480 1200/300 A
-sdace 555-3430 2400/1200/300 A
-sabafoo 555-2127 1200/300 C
-@end example
-
-@cindex @file{inventory-shipped} file
-The second data file, called @file{inventory-shipped}, represents
-information about shipments during the year.
-Each record contains the month of the year, the number
-of green crates shipped, the number of red boxes shipped, the number of
-orange bags shipped, and the number of blue packages shipped,
-respectively. There are 16 entries, covering the 12 months of one year
-and 4 months of the next year.@refill
-
-@example
-Jan 13 25 15 115
-Feb 15 32 24 226
-Mar 15 24 34 228
-Apr 31 52 63 420
-May 16 34 29 208
-Jun 31 42 75 492
-Jul 24 34 67 436
-Aug 15 34 47 316
-Sep 13 55 37 277
-Oct 29 54 68 525
-Nov 20 87 82 577
-Dec 17 35 61 401
-
-Jan 21 36 64 620
-Feb 26 58 80 652
-Mar 24 75 70 495
-Apr 21 70 74 514
-@end example
-
-@ifinfo
-If you are reading this in GNU Emacs using Info, you can copy the regions
-of text showing these sample files into your own test files. This way you
-can try out the examples shown in the remainder of this document. You do
-this by using the command @kbd{M-x write-region} to copy text from the Info
-file into a file for use with @code{awk}
-(@xref{Misc File Ops, , , emacs, GNU Emacs Manual},
-for more information). Using this information, create your own
-@file{BBS-list} and @file{inventory-shipped} files, and practice what you
-learn in this manual.
-@end ifinfo
-
-@node Getting Started, Reading Files, This Manual, Top
-@chapter Getting Started with @code{awk}
-@cindex script, definition of
-@cindex rule, definition of
-@cindex program, definition of
-@cindex basic function of @code{gawk}
-
-The basic function of @code{awk} is to search files for lines (or other
-units of text) that contain certain patterns. When a line matches one
-of the patterns, @code{awk} performs specified actions on that line.
-@code{awk} keeps processing input lines in this way until the end of the
-input file is reached.@refill
-
-When you run @code{awk}, you specify an @code{awk} @dfn{program} which
-tells @code{awk} what to do. The program consists of a series of
-@dfn{rules}. (It may also contain @dfn{function definitions}, but that
-is an advanced feature, so we will ignore it for now.
-@xref{User-defined, ,User-defined Functions}.) Each rule specifies one
-pattern to search for, and one action to perform when that pattern is found.
-
-Syntactically, a rule consists of a pattern followed by an action. The
-action is enclosed in curly braces to separate it from the pattern.
-Rules are usually separated by newlines. Therefore, an @code{awk}
-program looks like this:
-
-@example
-@var{pattern} @{ @var{action} @}
-@var{pattern} @{ @var{action} @}
-@dots{}
-@end example
-
-@menu
-* Very Simple:: A very simple example.
-* Two Rules:: A less simple one-line example with two rules.
-* More Complex:: A more complex example.
-* Running gawk:: How to run @code{gawk} programs;
- includes command line syntax.
-* Comments:: Adding documentation to @code{gawk} programs.
-* Statements/Lines:: Subdividing or combining statements into lines.
-* When:: When to use @code{gawk} and
- when to use other things.
-@end menu
-
-@node Very Simple, Two Rules, Getting Started, Getting Started
-@section A Very Simple Example
-
-@cindex @samp{print $0}
-The following command runs a simple @code{awk} program that searches the
-input file @file{BBS-list} for the string of characters: @samp{foo}. (A
-string of characters is usually called a @dfn{string}.
-The term @dfn{string} is perhaps based on similar usage in English, such
-as ``a string of pearls,'' or ``a string of cars in a train.'')
-
-@example
-awk '/foo/ @{ print $0 @}' BBS-list
-@end example
-
-@noindent
-When lines containing @samp{foo} are found, they are printed, because
-@w{@samp{print $0}} means print the current line. (Just @samp{print} by
-itself means the same thing, so we could have written that
-instead.)
-
-You will notice that slashes, @samp{/}, surround the string @samp{foo}
-in the actual @code{awk} program. The slashes indicate that @samp{foo}
-is a pattern to search for. This type of pattern is called a
-@dfn{regular expression}, and is covered in more detail later
-(@pxref{Regexp, ,Regular Expressions as Patterns}). There are
-single-quotes around the @code{awk} program so that the shell won't
-interpret any of it as special shell characters.@refill
-
-Here is what this program prints:
-
-@example
-@group
-fooey 555-1234 2400/1200/300 B
-foot 555-6699 1200/300 B
-macfoo 555-6480 1200/300 A
-sabafoo 555-2127 1200/300 C
-@end group
-@end example
-
-@cindex action, default
-@cindex pattern, default
-@cindex default action
-@cindex default pattern
-In an @code{awk} rule, either the pattern or the action can be omitted,
-but not both. If the pattern is omitted, then the action is performed
-for @emph{every} input line. If the action is omitted, the default
-action is to print all lines that match the pattern.
-
-Thus, we could leave out the action (the @code{print} statement and the curly
-braces) in the above example, and the result would be the same: all
-lines matching the pattern @samp{foo} would be printed. By comparison,
-omitting the @code{print} statement but retaining the curly braces makes an
-empty action that does nothing; then no lines would be printed.
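-
-As a quick sketch of both shortened forms, run on the same
-@file{BBS-list} file (the trailing @samp{#} comments are for the shell,
-not part of the @code{awk} programs):
-
-@example
-awk '/foo/' BBS-list            # pattern only: prints matching lines
-awk '@{ print $1 @}' BBS-list     # action only: runs on every input line
-@end example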
-
-@node Two Rules, More Complex, Very Simple, Getting Started
-@section An Example with Two Rules
-@cindex how @code{awk} works
-
-The @code{awk} utility reads the input files one line at a
-time. For each line, @code{awk} tries the patterns of each of the rules.
-If several patterns match then several actions are run, in the order in
-which they appear in the @code{awk} program. If no patterns match, then
-no actions are run.
-
-After processing all the rules (perhaps none) that match the line,
-@code{awk} reads the next line (however,
-@pxref{Next Statement, ,The @code{next} Statement}). This continues
-until the end of the file is reached.@refill
-
-For example, the @code{awk} program:
-
-@example
-/12/ @{ print $0 @}
-/21/ @{ print $0 @}
-@end example
-
-@noindent
-contains two rules. The first rule has the string @samp{12} as the
-pattern and @samp{print $0} as the action. The second rule has the
-string @samp{21} as the pattern and also has @samp{print $0} as the
-action. Each rule's action is enclosed in its own pair of braces.
-
-This @code{awk} program prints every line that contains the string
-@samp{12} @emph{or} the string @samp{21}. If a line contains both
-strings, it is printed twice, once by each rule.
-
-If we run this program on our two sample data files, @file{BBS-list} and
-@file{inventory-shipped}, as shown here:
-
-@example
-awk '/12/ @{ print $0 @}
- /21/ @{ print $0 @}' BBS-list inventory-shipped
-@end example
-
-@noindent
-we get the following output:
-
-@example
-aardvark 555-5553 1200/300 B
-alpo-net 555-3412 2400/1200/300 A
-barfly 555-7685 1200/300 A
-bites 555-1675 2400/1200/300 A
-core 555-2912 1200/300 C
-fooey 555-1234 2400/1200/300 B
-foot 555-6699 1200/300 B
-macfoo 555-6480 1200/300 A
-sdace 555-3430 2400/1200/300 A
-sabafoo 555-2127 1200/300 C
-sabafoo 555-2127 1200/300 C
-Jan 21 36 64 620
-Apr 21 70 74 514
-@end example
-
-@noindent
-Note how the line in @file{BBS-list} beginning with @samp{sabafoo}
-was printed twice, once for each rule.
-
-@node More Complex, Running gawk, Two Rules, Getting Started
-@comment node-name, next, previous, up
-@section A More Complex Example
-
-Here is an example to give you an idea of what typical @code{awk}
-programs do. This example shows how @code{awk} can be used to
-summarize, select, and rearrange the output of another utility. It uses
-features that haven't been covered yet, so don't worry if you don't
-understand all the details.
-
-@example
-ls -l | awk '$5 == "Nov" @{ sum += $4 @}
- END @{ print sum @}'
-@end example
-
-This command prints the total number of bytes in all the files in the
-current directory that were last modified in November (of any year).
-(In the C shell you would need to type a semicolon and then a backslash
-at the end of the first line; in a @sc{posix}-compliant shell, such as the
-Bourne shell or the Bourne-Again shell, you can type the example as shown.)
-
-The @w{@samp{ls -l}} part of this example is a command that gives you a
-listing of the files in a directory, including file size and date.
-Its output looks like this:@refill
-
-@example
--rw-r--r-- 1 close 1933 Nov 7 13:05 Makefile
--rw-r--r-- 1 close 10809 Nov 7 13:03 gawk.h
--rw-r--r-- 1 close 983 Apr 13 12:14 gawk.tab.h
--rw-r--r-- 1 close 31869 Jun 15 12:20 gawk.y
--rw-r--r-- 1 close 22414 Nov 7 13:03 gawk1.c
--rw-r--r-- 1 close 37455 Nov 7 13:03 gawk2.c
--rw-r--r-- 1 close 27511 Dec 9 13:07 gawk3.c
--rw-r--r-- 1 close 7989 Nov 7 13:03 gawk4.c
-@end example
-
-@noindent
-The first field contains read-write permissions, the second field contains
-the number of links to the file, and the third field identifies the owner of
-the file. The fourth field contains the size of the file in bytes. The
-fifth, sixth, and seventh fields contain the month, day, and time,
-respectively, that the file was last modified. Finally, the eighth field
-contains the name of the file.
-
-The @code{$5 == "Nov"} in our @code{awk} program is an expression that
-tests whether the fifth field of the output from @w{@samp{ls -l}}
-matches the string @samp{Nov}. Each time a line has the string
-@samp{Nov} in its fifth field, the action @samp{@{ sum += $4 @}} is
-performed. This adds the fourth field (the file size) to the variable
-@code{sum}. As a result, when @code{awk} has finished reading all the
-input lines, @code{sum} is the sum of the sizes of files whose
-lines matched the pattern. (This works because @code{awk} variables
-are automatically initialized to zero.)@refill
-
-After the last line of output from @code{ls} has been processed, the
-@code{END} rule is executed, and the value of @code{sum} is
-printed. In this example, the value of @code{sum} would be 80600.@refill
-
-These more advanced @code{awk} techniques are covered in later sections
-(@pxref{Actions, ,Overview of Actions}). Before you can move on to more
-advanced @code{awk} programming, you have to know how @code{awk} interprets
-your input and displays your output. By manipulating fields and using
-@code{print} statements, you can produce some very useful and spectacular
-looking reports.@refill
-
-@node Running gawk, Comments, More Complex, Getting Started
-@section How to Run @code{awk} Programs
-
-@ignore
-Date: Mon, 26 Aug 91 09:48:10 +0200
-From: gatech!vsoc07.cern.ch!matheys (Jean-Pol Matheys (CERN - ECP Division))
-To: uunet.UU.NET!skeeve!arnold
-Subject: RE: status check
-
-The introduction of Chapter 2 (i.e. before 2.1) should include
-the whole of section 2.4 - it's better to tell people how to run awk programs
-before giving any examples
-
-ADR --- he's right. but for now, don't do this because the rest of the
-chapter would need some rewriting.
-@end ignore
-
-@cindex command line formats
-@cindex running @code{awk} programs
-There are several ways to run an @code{awk} program. If the program is
-short, it is easiest to include it in the command that runs @code{awk},
-like this:
-
-@example
-awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
-@end example
-
-@noindent
-where @var{program} consists of a series of patterns and actions, as
-described earlier.
-
-When the program is long, it is usually more convenient to put it in a file
-and run it with a command like this:
-
-@example
-awk -f @var{program-file} @var{input-file1} @var{input-file2} @dots{}
-@end example
-
-@menu
-* One-shot:: Running a short throw-away @code{awk} program.
-* Read Terminal:: Using no input files (input from
- terminal instead).
-* Long:: Putting permanent @code{awk} programs in files.
-* Executable Scripts:: Making self-contained @code{awk} programs.
-@end menu
-
-@node One-shot, Read Terminal, Running gawk, Running gawk
-@subsection One-shot Throw-away @code{awk} Programs
-
-Once you are familiar with @code{awk}, you will often type simple
-programs at the moment you want to use them. Then you can write the
-program as the first argument of the @code{awk} command, like this:
-
-@example
-awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
-@end example
-
-@noindent
-where @var{program} consists of a series of @var{patterns} and
-@var{actions}, as described earlier.
-
-@cindex single quotes, why needed
-This command format instructs the shell to start @code{awk} and use the
-@var{program} to process records in the input file(s). There are single
-quotes around @var{program} so that the shell doesn't interpret any
-@code{awk} characters as special shell characters. They also cause the
-shell to treat all of @var{program} as a single argument for
-@code{awk} and allow @var{program} to be more than one line long.@refill
-
-This format is also useful for running short or medium-sized @code{awk}
-programs from shell scripts, because it avoids the need for a separate
-file for the @code{awk} program. A self-contained shell script is more
-reliable since there are no other files to misplace.
-
-@node Read Terminal, Long, One-shot, Running gawk
-@subsection Running @code{awk} without Input Files
-
-@cindex standard input
-@cindex input, standard
-You can also run @code{awk} without any input files. If you type the
-command line:@refill
-
-@example
-awk '@var{program}'
-@end example
-
-@noindent
-then @code{awk} applies the @var{program} to the @dfn{standard input},
-which usually means whatever you type on the terminal. This continues
-until you indicate end-of-file by typing @kbd{Control-d}.
-
-For example, if you execute this command:
-
-@example
-awk '/th/'
-@end example
-
-@noindent
-whatever you type next is taken as data for that @code{awk}
-program. If you go on to type the following data:
-
-@example
-Kathy
-Ben
-Tom
-Beth
-Seth
-Karen
-Thomas
-@kbd{Control-d}
-@end example
-
-@noindent
-then @code{awk} prints this output:
-
-@example
-Kathy
-Beth
-Seth
-@end example
-
-@noindent
-@cindex case sensitivity
-@cindex pattern, case sensitive
-as matching the pattern @samp{th}. Notice that it did not recognize
-@samp{Thomas} as matching the pattern. The @code{awk} language is
-@dfn{case sensitive}, and matches patterns exactly. (However, you can
-override this with the variable @code{IGNORECASE}.
-@xref{Case-sensitivity, ,Case-sensitivity in Matching}.)
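-
-As a brief @code{gawk}-specific sketch, here is the same run with case
-ignored, so that @samp{Thomas} would also be printed:
-
-@example
-awk 'BEGIN @{ IGNORECASE = 1 @} /th/'
-@end example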
-
-@node Long, Executable Scripts, Read Terminal, Running gawk
-@subsection Running Long Programs
-
-@cindex running long programs
-@cindex @samp{-f} option
-@cindex program file
-@cindex file, @code{awk} program
-Sometimes your @code{awk} programs can be very long. In this case it is
-more convenient to put the program into a separate file. To tell
-@code{awk} to use that file for its program, you type:@refill
-
-@example
-awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{}
-@end example
-
-The @samp{-f} instructs the @code{awk} utility to get the @code{awk} program
-from the file @var{source-file}. Any file name can be used for
-@var{source-file}. For example, you could put the program:@refill
-
-@example
-/th/
-@end example
-
-@noindent
-into the file @file{th-prog}. Then this command:
-
-@example
-awk -f th-prog
-@end example
-
-@noindent
-does the same thing as this one:
-
-@example
-awk '/th/'
-@end example
-
-@noindent
-which was explained earlier (@pxref{Read Terminal, ,Running @code{awk} without Input Files}).
-Note that you don't usually need single quotes around the file name that you
-specify with @samp{-f}, because most file names don't contain any of the shell's
-special characters. Notice that in @file{th-prog}, the @code{awk}
-program did not have single quotes around it. The quotes are only needed
-for programs that are provided on the @code{awk} command line.
-
-If you want to identify your @code{awk} program files clearly as such,
-you can add the extension @file{.awk} to the file name. This doesn't
-affect the execution of the @code{awk} program, but it does make
-``housekeeping'' easier.
-
-@node Executable Scripts, , Long, Running gawk
-@c node-name, next, previous, up
-@subsection Executable @code{awk} Programs
-@cindex executable scripts
-@cindex scripts, executable
-@cindex self contained programs
-@cindex program, self contained
-@cindex @samp{#!}
-
-Once you have learned @code{awk}, you may want to write self-contained
-@code{awk} scripts, using the @samp{#!} script mechanism. You can do
-this on many Unix systems @footnote{The @samp{#!} mechanism works on
-Unix systems derived from Berkeley Unix, System V Release 4, and some System
-V Release 3 systems.} (and someday on GNU).@refill
-
-For example, you could create a text file named @file{hello}, containing
-the following (where @samp{BEGIN} is a feature we have not yet
-discussed):
-
-@example
-#! /bin/awk -f
-
-# a sample awk program
-BEGIN @{ print "hello, world" @}
-@end example
-
-@noindent
-After making this file executable (with the @code{chmod} command), you
-can simply type:
-
-@example
-hello
-@end example
-
-@noindent
-at the shell, and the system will arrange to run @code{awk} @footnote{The
-line beginning with @samp{#!} lists the full pathname of an interpreter
-to be run, and an optional initial command line argument to pass to that
-interpreter. The operating system then runs the interpreter with the given
-argument and the full argument list of the executed program. The first argument
-in the list is the full pathname of the @code{awk} program. The rest of the
-argument list will either be options to @code{awk}, or data files,
-or both.} as if you had typed:@refill
-
-@example
-awk -f hello
-@end example
-
-@noindent
-Self-contained @code{awk} scripts are useful when you want to write a
-program which users can invoke without knowing that the program is
-written in @code{awk}.
-
-@cindex shell scripts
-@cindex scripts, shell
-If your system does not support the @samp{#!} mechanism, you can get a
-similar effect using a regular shell script. It would look something
-like this:
-
-@example
-: The colon makes sure this script is executed by the Bourne shell.
-awk '@var{program}' "$@@"
-@end example
-
-Using this technique, it is @emph{vital} to enclose the @var{program} in
-single quotes to protect it from interpretation by the shell. If you
-omit the quotes, only a shell wizard can predict the results.
-
-The @samp{"$@@"} causes the shell to forward all the command line
-arguments to the @code{awk} program, without interpretation. The first
-line, which starts with a colon, is used so that this shell script will
-work even if invoked by a user who uses the C shell.
-@c Someday: (See @cite{The Bourne Again Shell}, by ??.)
-
-@node Comments, Statements/Lines, Running gawk, Getting Started
-@section Comments in @code{awk} Programs
-@cindex @samp{#}
-@cindex comments
-@cindex use of comments
-@cindex documenting @code{awk} programs
-@cindex programs, documenting
-
-A @dfn{comment} is some text that is included in a program for the sake
-of human readers, and that is not really part of the program. Comments
-can explain what the program does, and how it works. Nearly all
-programming languages have provisions for comments, because programs are
-typically hard to understand without their extra help.
-
-In the @code{awk} language, a comment starts with the sharp sign
-character, @samp{#}, and continues to the end of the line. The
-@code{awk} language ignores the rest of a line following a sharp sign.
-For example, we could have put the following into @file{th-prog}:@refill
-
-@smallexample
-# This program finds records containing the pattern @samp{th}. This is how
-# you continue comments on additional lines.
-/th/
-@end smallexample
-
-You can put comment lines into keyboard-composed throw-away @code{awk}
-programs also, but this usually isn't very useful; the purpose of a
-comment is to help you or another person understand the program at
-a later time.@refill
-
-@node Statements/Lines, When, Comments, Getting Started
-@section @code{awk} Statements versus Lines
-
-Most often, each line in an @code{awk} program is a separate statement or
-separate rule, like this:
-
-@example
-awk '/12/ @{ print $0 @}
- /21/ @{ print $0 @}' BBS-list inventory-shipped
-@end example
-
-But sometimes statements can be more than one line, and lines can
-contain several statements. You can split a statement into multiple
-lines by inserting a newline after any of the following:@refill
-
-@example
-, @{ ? : || && do else
-@end example
-
-@noindent
-A newline at any other point is considered the end of the statement.
-(Splitting lines after @samp{?} and @samp{:} is a minor @code{gawk}
-extension. The @samp{?} and @samp{:} referred to here is the
-three operand conditional expression described in
-@ref{Conditional Exp, ,Conditional Expressions}.)@refill
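-
-As a small sketch, here is one rule split across several lines, breaking
-only after @samp{&&} and after the opening brace (using the
-@file{inventory-shipped} data):
-
-@example
-$1 == "Jan" &&
-$2 > 20 @{
-    print $0
-@}
-@end example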
-
-@cindex backslash continuation
-@cindex continuation of lines
-If you would like to split a single statement into two lines at a point
-where a newline would terminate it, you can @dfn{continue} it by ending the
-first line with a backslash character, @samp{\}. This is allowed
-absolutely anywhere in the statement, even in the middle of a string or
-regular expression. For example:
-
-@example
-awk '/This program is too long, so continue it\
- on the next line/ @{ print $1 @}'
-@end example
-
-@noindent
-We have generally not used backslash continuation in the sample programs in
-this manual. Since in @code{gawk} there is no limit on the length of a line,
-it is never strictly necessary; it just makes programs prettier. We have
-preferred to make them even prettier by keeping the statements short.
-Backslash continuation is most useful when your @code{awk} program is in a
-separate source file, instead of typed in on the command line. You should
-also note that many @code{awk} implementations are more picky about where
-you may use backslash continuation. For maximal portability of your @code{awk}
-programs, it is best not to split your lines in the middle of a regular
-expression or a string.@refill
-
-@strong{Warning: backslash continuation does not work as described above
-with the C shell.} Continuation with backslash works for @code{awk}
-programs in files, and also for one-shot programs @emph{provided} you
-are using a @sc{posix}-compliant shell, such as the Bourne shell or the
-Bourne-again shell. But the C shell used on Berkeley Unix behaves
-differently! There, you must use two backslashes in a row, followed by
-a newline.@refill
-
-@cindex multiple statements on one line
-When @code{awk} statements within one rule are short, you might want to put
-more than one of them on a line. You do this by separating the statements
-with a semicolon, @samp{;}.
-This also applies to the rules themselves.
-Thus, the previous program could have been written:@refill
-
-@example
-/12/ @{ print $0 @} ; /21/ @{ print $0 @}
-@end example
-
-@noindent
-@strong{Note:} the requirement that rules on the same line must be
-separated with a semicolon is a recent change in the @code{awk}
-language; it was done for consistency with the treatment of statements
-within an action.
-
-@node When, , Statements/Lines, Getting Started
-@section When to Use @code{awk}
-
-@cindex when to use @code{awk}
-@cindex applications of @code{awk}
-You might wonder how @code{awk} might be useful for you. Using additional
-utility programs, more advanced patterns, field separators, arithmetic
-statements, and other selection criteria, you can produce much more
-complex output. The @code{awk} language is very useful for producing
-reports from large amounts of raw data, such as summarizing information
-from the output of other utility programs like @code{ls}.
-(@xref{More Complex, ,A More Complex Example}.)
-
-Programs written with @code{awk} are usually much smaller than they would
-be in other languages. This makes @code{awk} programs easy to compose and
-use. Often @code{awk} programs can be quickly composed at your terminal,
-used once, and thrown away. Since @code{awk} programs are interpreted, you
-can avoid the usually lengthy edit-compile-test-debug cycle of software
-development.
-
-Complex programs have been written in @code{awk}, including a complete
-retargetable assembler for 8-bit microprocessors (@pxref{Glossary}, for
-more information) and a microcode assembler for a special purpose Prolog
-computer. However, @code{awk}'s capabilities are strained by tasks of
-such complexity.
-
-If you find yourself writing @code{awk} scripts of more than, say, a few
-hundred lines, you might consider using a different programming
-language. Emacs Lisp is a good choice if you need sophisticated string
-or pattern matching capabilities. The shell is also good at string and
-pattern matching; in addition, it allows powerful use of the system
-utilities. More conventional languages, such as C, C++, and Lisp, offer
-better facilities for system programming and for managing the complexity
-of large programs. Programs in these languages may require more lines
-of source code than the equivalent @code{awk} programs, but they are
-easier to maintain and usually run more efficiently.@refill
-
-@node Reading Files, Printing, Getting Started, Top
-@chapter Reading Input Files
-
-@cindex reading files
-@cindex input
-@cindex standard input
-@vindex FILENAME
-In the typical @code{awk} program, all input is read either from the
-standard input (by default the keyboard, but often a pipe from another
-command) or from files whose names you specify on the @code{awk} command
-line. If you specify input files, @code{awk} reads them in order, reading
-all the data from one before going on to the next. The name of the current
-input file can be found in the built-in variable @code{FILENAME}
-(@pxref{Built-in Variables}).@refill
-
-The input is read in units called records, and processed by the
-rules one record at a time. By default, each record is one line. Each
-record is split automatically into fields, to make it more
-convenient for a rule to work on its parts.
-
-On rare occasions you will need to use the @code{getline} command,
-which can do explicit input from any number of files
-(@pxref{Getline, ,Explicit Input with @code{getline}}).@refill
-
-@menu
-* Records:: Controlling how data is split into records.
-* Fields:: An introduction to fields.
-* Non-Constant Fields:: Non-constant Field Numbers.
-* Changing Fields:: Changing the Contents of a Field.
-* Field Separators:: The field separator and how to change it.
-* Constant Size:: Reading constant width data.
-* Multiple Line:: Reading multi-line records.
-* Getline:: Reading files under explicit program control
- using the @code{getline} function.
-* Close Input:: Closing an input file (so you can read from
- the beginning once more).
-@end menu
-
-@node Records, Fields, Reading Files, Reading Files
-@section How Input is Split into Records
-
-@cindex record separator
-The @code{awk} language divides its input into records and fields.
-Records are separated by a character called the @dfn{record separator}.
-By default, the record separator is the newline character, defining
-a record to be a single line of text.@refill
-
-@iftex
-@cindex changing the record separator
-@end iftex
-@vindex RS
-Sometimes you may want to use a different character to separate your
-records. You can use a different character by changing the built-in
-variable @code{RS}. The value of @code{RS} is a string that says how
-to separate records; the default value is @code{"\n"}, the string containing
-just a newline character. This is why records are, by default, single lines.
-
-@code{RS} can have any string as its value, but only the first character
-of the string is used as the record separator. The other characters are
-ignored. @code{RS} is exceptional in this regard; @code{awk} uses the
-full value of all its other built-in variables.@refill
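-
-A one-line sketch of this behavior: with @code{RS} set to @code{"xy"},
-only the @samp{x} actually separates records, and the @samp{y} in the
-input does not:
-
-@example
-printf 'axbyc' | awk 'BEGIN @{ RS = "xy" @} @{ print "<" $0 ">" @}'
-@end example
-
-@noindent
-This prints @samp{<a>} and then @samp{<byc>}.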
-
-@ignore
-Someday this should be true!
-
-The value of @code{RS} is not limited to a one-character string. It can
-be any regular expression (@pxref{Regexp, ,Regular Expressions as Patterns}).
-In general, each record
-ends at the next string that matches the regular expression; the next
-record starts at the end of the matching string. This general rule is
-actually at work in the usual case, where @code{RS} contains just a
-newline: a record ends at the beginning of the next matching string (the
-next newline in the input) and the following record starts just after
-the end of this string (at the first character of the following line).
-The newline, since it matches @code{RS}, is not part of either record.@refill
-@end ignore
-
-You can change the value of @code{RS} in the @code{awk} program with the
-assignment operator, @samp{=} (@pxref{Assignment Ops, ,Assignment Expressions}).
-The new record-separator character should be enclosed in quotation marks to make
-a string constant. Often the right time to do this is at the beginning
-of execution, before any input has been processed, so that the very
-first record will be read with the proper separator. To do this, use
-the special @code{BEGIN} pattern
-(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}). For
-example:@refill
-
-@example
-awk 'BEGIN @{ RS = "/" @} ; @{ print $0 @}' BBS-list
-@end example
-
-@noindent
-changes the value of @code{RS} to @code{"/"}, before reading any input.
-This is a string whose first character is a slash; as a result, records
-are separated by slashes. Then the input file is read, and the second
-rule in the @code{awk} program (the action with no pattern) prints each
-record. Since each @code{print} statement adds a newline at the end of
-its output, the effect of this @code{awk} program is to copy the input
-with each slash changed to a newline.
-
-Another way to change the record separator is on the command line,
-using the variable-assignment feature
-(@pxref{Command Line, ,Invoking @code{awk}}).@refill
-
-@example
-awk '@{ print $0 @}' RS="/" BBS-list
-@end example
-
-@noindent
-This sets @code{RS} to @samp{/} before processing @file{BBS-list}.
-
-Reaching the end of an input file terminates the current input record,
-even if the last character in the file is not the character in @code{RS}.
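-
-A tiny sketch of this: the input below has no trailing @samp{/}, yet the
-final record is still produced when the end of the input is reached:
-
-@example
-printf 'one/two' | awk 'BEGIN @{ RS = "/" @} @{ print $0 @}'
-@end example
-
-@noindent
-This prints @samp{one} and then @samp{two}.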
-
-@ignore
-@c merge the preceding paragraph and this stuff into one paragraph
-@c and put it in an `expert info' section.
-This produces correct behavior in the vast majority of cases, although
-the following (extreme) pipeline prints a surprising @samp{1}. (There
-is one field, consisting of a newline.)
-
-@example
-echo | awk 'BEGIN @{ RS = "a" @} ; @{ print NF @}'
-@end example
-
-@end ignore
-
-The empty string, @code{""} (a string of no characters), has a special meaning
-as the value of @code{RS}: it means that records are separated only
-by blank lines. @xref{Multiple Line, ,Multiple-Line Records}, for more details.
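-
-As a short sketch of what this means, the following input contains two
-``paragraphs'' separated by a blank line, and each becomes one record:
-
-@example
-printf 'a b\nc d\n\ne f\n' | awk 'BEGIN @{ RS = "" @} @{ print $1 @}'
-@end example
-
-@noindent
-This prints @samp{a} and then @samp{e}, one line of output per record.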
-
-@cindex number of records, @code{NR} or @code{FNR}
-@vindex NR
-@vindex FNR
-The @code{awk} utility keeps track of the number of records that have
-been read so far from the current input file. This value is stored in a
-built-in variable called @code{FNR}. It is reset to zero when a new
-file is started. Another built-in variable, @code{NR}, is the total
-number of input records read so far from all files. It starts at zero
-but is never automatically reset to zero.
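-
-A one-line sketch that makes the difference visible when reading the two
-sample files in a row:
-
-@example
-awk '@{ print FILENAME, FNR, NR @}' BBS-list inventory-shipped
-@end example
-
-@noindent
-For the lines of @file{inventory-shipped}, @code{FNR} starts over at one,
-while @code{NR} keeps counting from where @file{BBS-list} left off.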
-
-If you change the value of @code{RS} in the middle of an @code{awk} run,
-the new value is used to delimit subsequent records, but the record
-currently being processed (and records already processed) are not
-affected.
-
-@node Fields, Non-Constant Fields, Records, Reading Files
-@section Examining Fields
-
-@cindex examining fields
-@cindex fields
-@cindex accessing fields
-When @code{awk} reads an input record, the record is
-automatically separated or @dfn{parsed} by the interpreter into chunks
-called @dfn{fields}. By default, fields are separated by whitespace,
-like words in a line.
-Whitespace in @code{awk} means any string of one or more spaces and/or
-tabs; other characters such as newline, formfeed, and so on, that are
-considered whitespace by other languages are @emph{not} considered
-whitespace by @code{awk}.@refill
-
-The purpose of fields is to make it more convenient for you to refer to
-these pieces of the record. You don't have to use them---you can
-operate on the whole record if you wish---but fields are what make
-simple @code{awk} programs so powerful.
-
-@cindex @code{$} (field operator)
-@cindex operators, @code{$}
-To refer to a field in an @code{awk} program, you use a dollar-sign,
-@samp{$}, followed by the number of the field you want. Thus, @code{$1}
-refers to the first field, @code{$2} to the second, and so on. For
-example, suppose the following is a line of input:@refill
-
-@example
-This seems like a pretty nice example.
-@end example
-
-@noindent
-Here the first field, or @code{$1}, is @samp{This}; the second field, or
-@code{$2}, is @samp{seems}; and so on. Note that the last field,
-@code{$7}, is @samp{example.}. Because there is no space between the
-@samp{e} and the @samp{.}, the period is considered part of the seventh
-field.@refill
-
-No matter how many fields there are, the last field in a record can be
-represented by @code{$NF}. So, in the example above, @code{$NF} would
-be the same as @code{$7}, which is @samp{example.}. Why this works is
-explained below (@pxref{Non-Constant Fields, ,Non-constant Field Numbers}).
-If you try to refer to a field beyond the last one, such as @code{$8}
-when the record has only 7 fields, you get the empty string.@refill
-
-@vindex NF
-@cindex number of fields, @code{NF}
-Plain @code{NF}, with no @samp{$}, is a built-in variable whose value
-is the number of fields in the current record.
-
-@code{$0}, which looks like an attempt to refer to the zeroth field, is
-a special case: it represents the whole input record. This is what you
-would use if you weren't interested in fields.
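-
-A small sketch on the sample line above, printing the number of fields,
-the last field, and a nonexistent eighth field (which comes out empty):
-
-@example
-echo 'This seems like a pretty nice example.' |
-  awk '@{ print NF, $NF, "[" $8 "]" @}'
-@end example
-
-@noindent
-This prints @samp{7 example. []}.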
-
-Here are some more examples:
-
-@example
-awk '$1 ~ /foo/ @{ print $0 @}' BBS-list
-@end example
-
-@noindent
-This example prints each record in the file @file{BBS-list} whose first
-field contains the string @samp{foo}. The operator @samp{~} is called a
-@dfn{matching operator} (@pxref{Comparison Ops, ,Comparison Expressions});
-it tests whether a string (here, the field @code{$1}) matches a given regular
-expression.@refill
-
-By contrast, the following example:
-
-@example
-awk '/foo/ @{ print $1, $NF @}' BBS-list
-@end example
-
-@noindent
-looks for @samp{foo} in @emph{the entire record} and prints the first
-field and the last field for each input record containing a
-match.@refill
-
-@node Non-Constant Fields, Changing Fields, Fields, Reading Files
-@section Non-constant Field Numbers
-
-The number of a field does not need to be a constant. Any expression in
-the @code{awk} language can be used after a @samp{$} to refer to a
-field. The value of the expression specifies the field number. If the
-value is a string, rather than a number, it is converted to a number.
-Consider this example:@refill
-
-@example
-awk '@{ print $NR @}'
-@end example
-
-@noindent
-Recall that @code{NR} is the number of records read so far: 1 in the
-first record, 2 in the second, etc. So this example prints the first
-field of the first record, the second field of the second record, and so
-on. For the twentieth record, field number 20 is printed; most likely,
-the record has fewer than 20 fields, so this prints a blank line.
-
-Here is another example of using expressions as field numbers:
-
-@example
-awk '@{ print $(2*2) @}' BBS-list
-@end example
-
-The @code{awk} language must evaluate the expression @code{(2*2)} and use
-its value as the number of the field to print. The @samp{*} sign
-represents multiplication, so the expression @code{2*2} evaluates to 4.
-The parentheses are used so that the multiplication is done before the
-@samp{$} operation; they are necessary whenever there is a binary
-operator in the field-number expression. This example, then, prints the
-hours of operation (the fourth field) for every line of the file
-@file{BBS-list}.@refill
-
-If the field number you compute is zero, you get the entire record.
-Thus, @code{$(2-2)} has the same value as @code{$0}. Negative field
-numbers are not allowed.
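-
-A tiny sketch of a computed field number of zero:
-
-@example
-echo 'hello world' | awk '@{ print $(2-2) @}'
-@end example
-
-@noindent
-This prints @samp{hello world}, the entire record.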
-
-The number of fields in the current record is stored in the built-in
-variable @code{NF} (@pxref{Built-in Variables}). The expression
-@code{$NF} is not a special feature: it is the direct consequence of
-evaluating @code{NF} and using its value as a field number.
-
-@node Changing Fields, Field Separators, Non-Constant Fields, Reading Files
-@section Changing the Contents of a Field
-
-@cindex field, changing contents of
-@cindex changing contents of a field
-@cindex assignment to fields
-You can change the contents of a field as seen by @code{awk} within an
-@code{awk} program; this changes what @code{awk} perceives as the
-current input record. (The actual input is untouched: @code{awk} never
-modifies the input file.)
-
-Consider this example:
-
-@smallexample
-awk '@{ $3 = $2 - 10; print $2, $3 @}' inventory-shipped
-@end smallexample
-
-@noindent
-The @samp{-} sign represents subtraction, so this program reassigns
-field three, @code{$3}, to be the value of field two minus ten,
-@code{$2 - 10}. (@xref{Arithmetic Ops, ,Arithmetic Operators}.)
-Then field two, and the new value for field three, are printed.
-
-In order for this to work, the text in field @code{$2} must make sense
-as a number; the string of characters must be converted to a number in
-order for the computer to do arithmetic on it. The number resulting
-from the subtraction is converted back to a string of characters which
-then becomes field three.
-@xref{Conversion, ,Conversion of Strings and Numbers}.@refill
-
-When you change the value of a field (as perceived by @code{awk}), the
-text of the input record is recalculated to contain the new field where
-the old one was. Therefore, @code{$0} changes to reflect the altered
-field. Thus,
-
-@smallexample
-awk '@{ $2 = $2 - 10; print $0 @}' inventory-shipped
-@end smallexample
-
-@noindent
-prints a copy of the input file, with 10 subtracted from the second
-field of each line.
-
-You can also assign contents to fields that are out of range. For
-example:
-
-@smallexample
-awk '@{ $6 = ($5 + $4 + $3 + $2) ; print $6 @}' inventory-shipped
-@end smallexample
-
-@noindent
-We've just created @code{$6}, whose value is the sum of fields
-@code{$2}, @code{$3}, @code{$4}, and @code{$5}. The @samp{+} sign
-represents addition. For the file @file{inventory-shipped}, @code{$6}
-represents the total number of parcels shipped for a particular month.
-
-Creating a new field changes the internal @code{awk} copy of the current
-input record---the value of @code{$0}. Thus, if you do @samp{print $0}
-after adding a field, the record printed includes the new field, with
-the appropriate number of field separators between it and the previously
-existing fields.
-
-This recomputation affects and is affected by several features not yet
-discussed, in particular, the @dfn{output field separator}, @code{OFS},
-which is used to separate the fields (@pxref{Output Separators}), and
-@code{NF} (the number of fields; @pxref{Fields, ,Examining Fields}).
-For example, the value of @code{NF} is set to the number of the highest
-field you create.@refill
-
-Note, however, that merely @emph{referencing} an out-of-range field
-does @emph{not} change the value of either @code{$0} or @code{NF}.
-Referencing an out-of-range field merely produces a null string. For
-example:@refill
-
-@smallexample
-if ($(NF+1) != "")
- print "can't happen"
-else
- print "everything is normal"
-@end smallexample
-
-@noindent
-should print @samp{everything is normal}, because @code{NF+1} is certain
-to be out of range. (@xref{If Statement, ,The @code{if} Statement},
-for more information about @code{awk}'s @code{if-else} statements.)@refill
-
-It is important to note that assigning to a field will change the
-value of @code{$0}, but will not change the value of @code{NF},
-even when you assign the null string to a field. For example:
-
-@smallexample
-echo a b c d | awk '@{ OFS = ":"; $2 = "" ; print ; print NF @}'
-@end smallexample
-
-@noindent
-prints
-
-@smallexample
-a::c:d
-4
-@end smallexample
-
-@noindent
-The field is still there; it just has an empty value. You can tell
-because there are two colons in a row.
-
-@node Field Separators, Constant Size, Changing Fields, Reading Files
-@section Specifying how Fields are Separated
-@vindex FS
-@cindex fields, separating
-@cindex field separator, @code{FS}
-@cindex @samp{-F} option
-
-(This section is rather long; it describes one of the most fundamental
-operations in @code{awk}. If you are a novice with @code{awk}, we
-recommend that you re-read this section after you have studied the
-section on regular expressions, @ref{Regexp, ,Regular Expressions as Patterns}.)
-
-The way @code{awk} splits an input record into fields is controlled by
-the @dfn{field separator}, which is a single character or a regular
-expression. @code{awk} scans the input record for matches for the
-separator; the fields themselves are the text between the matches. For
-example, if the field separator is @samp{oo}, then the following line:
-
-@smallexample
-moo goo gai pan
-@end smallexample
-
-@noindent
-would be split into three fields: @samp{m}, @samp{@ g} and @samp{@ gai@
-pan}.
-
-The field separator is represented by the built-in variable @code{FS}.
-Shell programmers take note! @code{awk} does not use the name @code{IFS}
-which is used by the shell.@refill
-
-You can change the value of @code{FS} in the @code{awk} program with the
-assignment operator, @samp{=} (@pxref{Assignment Ops, ,Assignment Expressions}).
-Often the right time to do this is at the beginning of execution,
-before any input has been processed, so that the very first record
-will be read with the proper separator. To do this, use the special
-@code{BEGIN} pattern
-(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}).
-For example, here we set the value of @code{FS} to the string
-@code{","}:@refill
-
-@smallexample
-awk 'BEGIN @{ FS = "," @} ; @{ print $2 @}'
-@end smallexample
-
-@noindent
-Given the input line,
-
-@smallexample
-John Q. Smith, 29 Oak St., Walamazoo, MI 42139
-@end smallexample
-
-@noindent
-this @code{awk} program extracts the string @samp{@ 29 Oak St.}.
-
-@cindex field separator, choice of
-@cindex regular expressions as field separators
-Sometimes your input data will contain separator characters that don't
-separate fields the way you thought they would. For instance, the
-person's name in the example we've been using might have a title or
-suffix attached, such as @samp{John Q. Smith, LXIX}. From input
-containing such a name:
-
-@smallexample
-John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
-@end smallexample
-
-@noindent
-the previous sample program would extract @samp{@ LXIX}, instead of
-@samp{@ 29 Oak St.}. If you were expecting the program to print the
-address, you would be surprised. So choose your data layout and
-separator characters carefully to prevent such problems.
-
-As you know, by default, fields are separated by whitespace sequences
-(spaces and tabs), not by single spaces: two spaces in a row do not
-delimit an empty field. The default value of the field separator is a
-string @w{@code{" "}} containing a single space. If this value were
-interpreted in the usual way, each space character would separate
-fields, so two spaces in a row would make an empty field between them.
-The reason this does not happen is that a single space as the value of
-@code{FS} is a special case: it is taken to specify the default manner
-of delimiting fields.
-
-If @code{FS} is any other single character, such as @code{","}, then
-each occurrence of that character separates two fields. Two consecutive
-occurrences delimit an empty field. If the character occurs at the
-beginning or the end of the line, that too delimits an empty field. The
-space character is the only single character which does not follow these
-rules.
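-
-A short sketch of the empty fields this creates: the input below has a
-leading comma, a doubled comma, and a trailing comma, so three of the
-five fields are empty:
-
-@example
-echo ',a,,b,' | awk 'BEGIN @{ FS = "," @} @{ print NF @}'
-@end example
-
-@noindent
-This prints @samp{5}.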
-
-More generally, the value of @code{FS} may be a string containing any
-regular expression. Then each match in the record for the regular
-expression separates fields. For example, the assignment:@refill
-
-@smallexample
-FS = ", \t"
-@end smallexample
-
-@noindent
-makes every area of an input line that consists of a comma followed by a
-space and a tab, into a field separator. (@samp{\t} stands for a
-tab.)@refill
-
-For a less trivial example of a regular expression, suppose you want
-single spaces to separate fields the way single commas were used above.
-You can set @code{FS} to @w{@code{"[@ ]"}}. This regular expression
-matches a single space and nothing else.
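-
-A sketch contrasting the two: the input line below contains two spaces in
-a row between @samp{a} and @samp{b}:
-
-@example
-echo 'a  b' | awk '@{ print NF @}'
-echo 'a  b' | awk 'BEGIN @{ FS = "[ ]" @} @{ print NF @}'
-@end example
-
-@noindent
-The first command prints @samp{2}; the second prints @samp{3}, because
-the two adjacent spaces now delimit an empty field between them.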
-
-@c the following index entry is an overfull hbox. --mew 30jan1992
-@cindex field separator: on command line
-@cindex command line, setting @code{FS} on
-@code{FS} can be set on the command line. You use the @samp{-F} argument to
-do so. For example:
-
-@smallexample
-awk -F, '@var{program}' @var{input-files}
-@end smallexample
-
-@noindent
-sets @code{FS} to be the @samp{,} character. Notice that the argument uses
-a capital @samp{F}. Contrast this with @samp{-f}, which specifies a file
-containing an @code{awk} program. Case is significant in command options:
-the @samp{-F} and @samp{-f} options have nothing to do with each other.
-You can use both options at the same time to set the @code{FS} variable
-@emph{and} get an @code{awk} program from a file.@refill
-
-@c begin expert info
-The value used for the argument to @samp{-F} is processed in exactly the
-same way as assignments to the built-in variable @code{FS}. This means that
-if the field separator contains special characters, they must be escaped
-appropriately. For example, to use a @samp{\} as the field separator, you
-would have to type:
-
-@smallexample
-# same as FS = "\\"
-awk -F\\\\ '@dots{}' files @dots{}
-@end smallexample
-
-@noindent
-Since @samp{\} is used for quoting in the shell, @code{awk} will see
-@samp{-F\\}. Then @code{awk} processes the @samp{\\} for escape
-characters (@pxref{Constants, ,Constant Expressions}), finally yielding
-a single @samp{\} to be used for the field separator.
-@c end expert info
-
-As a special case, in compatibility mode
-(@pxref{Command Line, ,Invoking @code{awk}}), if the
-argument to @samp{-F} is @samp{t}, then @code{FS} is set to the tab
-character. (This is because if you type @samp{-F\t}, without the quotes,
-at the shell, the @samp{\} gets deleted, so @code{awk} figures that you
-really want your fields to be separated with tabs, and not @samp{t}s.
-Use @samp{-v FS="t"} on the command line if you really do want to separate
-your fields with @samp{t}s.)@refill
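-
-For instance (a small, hypothetical illustration), this command really
-does split fields at each letter @samp{t}:
-
-@smallexample
-echo 'letter' | awk -v FS="t" '@{ print NF @}'
-@end smallexample
-
-@noindent
-It prints @samp{3}, since the two adjacent @samp{t}s in @samp{letter}
-delimit an empty second field.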
-
-For example, let's use an @code{awk} program file called @file{baud.awk}
-that contains the pattern @code{/300/}, and the action @samp{print $1}.
-Here is the program:
-
-@smallexample
-/300/ @{ print $1 @}
-@end smallexample
-
-Let's also set @code{FS} to be the @samp{-} character, and run the
-program on the file @file{BBS-list}. The following command prints a
-list of the names of the bulletin boards that operate at 300 baud and
-the first three digits of their phone numbers:@refill
-
-@smallexample
-awk -F- -f baud.awk BBS-list
-@end smallexample
-
-@noindent
-It produces this output:
-
-@smallexample
-aardvark 555
-alpo
-barfly 555
-bites 555
-camelot 555
-core 555
-fooey 555
-foot 555
-macfoo 555
-sdace 555
-sabafoo 555
-@end smallexample
-
-@noindent
-Note the second line of output. If you check the original file, you will
-see that the second line looked like this:
-
-@smallexample
-alpo-net 555-3412 2400/1200/300 A
-@end smallexample
-
-The @samp{-} as part of the system's name was used as the field
-separator, instead of the @samp{-} in the phone number that was
-originally intended. This demonstrates why you have to be careful in
-choosing your field and record separators.
-
-The following program searches the system password file, and prints
-the entries for users who have no password:
-
-@smallexample
-awk -F: '$2 == ""' /etc/passwd
-@end smallexample
-
-@noindent
-Here we use the @samp{-F} option on the command line to set the field
-separator. Note that fields in @file{/etc/passwd} are separated by
-colons. The second field represents a user's encrypted password, but if
-the field is empty, that user has no password.
-
-@c begin expert info
-According to the @sc{posix} standard, @code{awk} is supposed to behave
-as if each record is split into fields at the time that it is read.
-In particular, this means that you can change the value of @code{FS}
-after a record is read, but before any of the fields are referenced.
-The value of the fields (i.e. how they were split) should reflect the
-old value of @code{FS}, not the new one.
-
-However, many implementations of @code{awk} do not do this. Instead,
-they defer splitting the fields until a field reference actually happens,
-using the @emph{current} value of @code{FS}! This behavior can be difficult
-to diagnose. The following example illustrates the results of the two methods.
-(The @code{sed} command prints just the first line of @file{/etc/passwd}.)
-
-@smallexample
-sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}'
-@end smallexample
-
-@noindent
-will usually print
-
-@smallexample
-root
-@end smallexample
-
-@noindent
-on an incorrect implementation of @code{awk}, while @code{gawk}
-will print something like
-
-@smallexample
-root:nSijPlPhZZwgE:0:0:Root:/:
-@end smallexample
-@c end expert info
-
-@c begin expert info
-There is an important difference between the two cases of @samp{FS = @w{" "}}
-(a single blank) and @samp{FS = @w{"[ \t]+"}} (which is a regular expression
-matching one or more blanks or tabs). For both values of @code{FS}, fields
-are separated by runs of blanks and/or tabs. However, when the value of
-@code{FS} is @code{" "}, @code{awk} will strip leading and trailing whitespace
-from the record, and then decide where the fields are.
-
-For example, the following expression prints @samp{b}:
-
-@smallexample
-echo ' a b c d ' | awk '@{ print $2 @}'
-@end smallexample
-
-@noindent
-However, the following prints @samp{a}:
-
-@smallexample
-echo ' a b c d ' | awk 'BEGIN @{ FS = "[ \t]+" @} ; @{ print $2 @}'
-@end smallexample
-
-@noindent
-In this case, the first field is null.
-
-The stripping of leading and trailing whitespace also comes into
-play whenever @code{$0} is recomputed. For instance, this pipeline
-
-@smallexample
-echo ' a b c d' | awk '@{ print; $2 = $2; print @}'
-@end smallexample
-
-@noindent
-produces this output:
-
-@smallexample
- a b c d
-a b c d
-@end smallexample
-
-@noindent
-The first @code{print} statement prints the record as it was read,
-with leading whitespace intact. The assignment to @code{$2} rebuilds
-@code{$0} by concatenating @code{$1} through @code{$NF} together,
-separated by the value of @code{OFS}. Since the leading whitespace
-was ignored when finding @code{$1}, it is not part of the new @code{$0}.
-Finally, the last @code{print} statement prints the new @code{$0}.
-@c end expert info
-
-The following table summarizes how fields are split, based on the
-value of @code{FS}.
-
-@table @code
-@item FS == " "
-Fields are separated by runs of whitespace. Leading and trailing
-whitespace are ignored. This is the default.
-
-@item FS == @var{any single character}
-Fields are separated by each occurrence of the character. Multiple
-successive occurrences delimit empty fields, as do leading and
-trailing occurrences.
-
-@item FS == @var{regexp}
-Fields are separated by occurrences of characters that match @var{regexp}.
-Leading and trailing matches of @var{regexp} delimit empty fields.
-@end table
-
-@node Constant Size, Multiple Line, Field Separators, Reading Files
-@section Reading Fixed-width Data
-
-(This section discusses an advanced, experimental feature. If you are
-a novice @code{awk} user, you may wish to skip it on the first reading.)
-
-@code{gawk} 2.13 introduced a new facility for dealing with fixed-width fields
-with no distinctive field separator. Data of this nature typically
-arises in at least two ways: as input for old FORTRAN programs, where
-numbers are run together, and as the output of programs that did not
-anticipate their output being used as input for other programs.
-
-An example of the latter is a table where all the columns are lined up by
-the use of a variable number of spaces and @emph{empty fields are just
-spaces}. Clearly, @code{awk}'s normal field splitting based on @code{FS}
-will not work well in this case. (Although a portable @code{awk} program
-can use a series of @code{substr} calls on @code{$0}, this is awkward and
-inefficient for a large number of fields.)@refill
-
-The splitting of an input record into fixed-width fields is specified by
-assigning a string containing space-separated numbers to the built-in
-variable @code{FIELDWIDTHS}. Each number specifies the width of the field
-@emph{including} columns between fields. If you want to ignore the columns
-between fields, you can specify the width as a separate field that is
-subsequently ignored.
-
-The following data is the output of the @code{w} utility. It is useful
-to illustrate the use of @code{FIELDWIDTHS}.
-
-@smallexample
- 10:06pm up 21 days, 14:04, 23 users
-User tty login@ idle JCPU PCPU what
-hzuo ttyV0 8:58pm 9 5 vi p24.tex
-hzang ttyV3 6:37pm 50 -csh
-eklye ttyV5 9:53pm 7 1 em thes.tex
-dportein ttyV6 8:17pm 1:47 -csh
-gierd ttyD3 10:00pm 1 elm
-dave ttyD4 9:47pm 4 4 w
-brent ttyp0 26Jun91 4:46 26:46 4:41 bash
-dave ttyq4 26Jun9115days 46 46 wnewmail
-@end smallexample
-
-The following program takes the above input, converts the idle time to
-a number of seconds, and prints out the first two fields and the calculated
-idle time. (This program uses a number of @code{awk} features that
-haven't been introduced yet.)@refill
-
-@smallexample
-BEGIN @{ FIELDWIDTHS = "9 6 10 6 7 7 35" @}
-NR > 2 @{
- idle = $4
- sub(/^ */, "", idle) # strip leading spaces
- if (idle == "") idle = 0
- if (idle ~ /:/) @{ split(idle, t, ":"); idle = t[1] * 60 + t[2] @}
- if (idle ~ /days/) @{ idle *= 24 * 60 * 60 @}
-
- print $1, $2, idle
-@}
-@end smallexample
-
-Here is the result of running the program on the data:
-
-@smallexample
-hzuo ttyV0 0
-hzang ttyV3 50
-eklye ttyV5 0
-dportein ttyV6 107
-gierd ttyD3 1
-dave ttyD4 0
-brent ttyp0 286
-dave ttyq4 1296000
-@end smallexample
-
-Another (possibly more practical) example of fixed-width input data
-would be the input from a deck of balloting cards. In some parts of
-the United States, voters make their choices by punching holes in computer
-cards. These cards are then processed to count the votes for any particular
-candidate or on any particular issue. Since a voter may choose not to
-vote on some issue, any column on the card may be empty. An @code{awk}
-program for processing such data could use the @code{FIELDWIDTHS} feature
-to simplify reading the data.@refill
-
-@c of course, getting gawk to run on a system with card readers is
-@c another story!
-
-This feature is still experimental, and will likely evolve over time.
-
-@node Multiple Line, Getline, Constant Size, Reading Files
-@section Multiple-Line Records
-
-@cindex multiple line records
-@cindex input, multiple line records
-@cindex reading files, multiple line records
-@cindex records, multiple line
-In some data bases, a single line cannot conveniently hold all the
-information in one entry. In such cases, you can use multi-line
-records.
-
-The first step in doing this is to choose your data format: when records
-are not defined as single lines, how do you want to define them?
-What should separate records?
-
-One technique is to use an unusual character or string to separate
-records. For example, you could use the formfeed character (written
-@code{\f} in @code{awk}, as in C) to separate them, making each record
-a page of the file. To do this, just set the variable @code{RS} to
-@code{"\f"} (a string containing the formfeed character). Any
-other character could equally well be used, as long as it won't be part
-of the data in a record.@refill
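-
-For example, a sketch of this technique (assuming a hypothetical file
-@file{report} whose pages are separated by formfeeds) might look like
-this:
-
-@smallexample
-awk 'BEGIN @{ RS = "\f" @}
-     @{ print "Page", NR, "has", NF, "fields" @}' report
-@end smallexample
-
-@noindent
-Each ``record'' here is an entire page of the file.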
-
-@ignore
-Another technique is to have blank lines separate records. The string
-@code{"^\n+"} is a regular expression that matches any sequence of
-newlines starting at the beginning of a line---in other words, it
-matches a sequence of blank lines. If you set @code{RS} to this string,
-a record always ends at the first blank line encountered. In
-addition, a regular expression always matches the longest possible
-sequence when there is a choice. So the next record doesn't start until
-the first nonblank line that follows---no matter how many blank lines
-appear in a row, they are considered one record-separator.
-@end ignore
-
-Another technique is to have blank lines separate records. By a special
-dispensation, a null string as the value of @code{RS} indicates that
-records are separated by one or more blank lines. If you set @code{RS}
-to the null string, a record always ends at the first blank line
-encountered. And the next record doesn't start until the first nonblank
-line that follows---no matter how many blank lines appear in a row, they
-are considered one record-separator. (The end of the file also
-terminates the final record, even if it is not followed by a blank
-line.)@refill
-
-The second step is to separate the fields in the record. One way to do
-this is to put each field on a separate line: to do this, just set the
-variable @code{FS} to the string @code{"\n"}. (This simple regular
-expression matches a single newline.)
-
-Another way to separate fields is to divide each of the lines into fields
-in the normal manner. This happens by default as a result of a special
-feature: when @code{RS} is set to the null string, the newline character
-@emph{always} acts as a field separator. This is in addition to whatever
-field separations result from @code{FS}.
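-
-As a brief sketch of this technique (assuming a hypothetical file
-@file{addresses} whose entries are separated by blank lines, with the
-name on the first line of each entry), the following program prints
-just the names:
-
-@smallexample
-awk 'BEGIN @{ RS = ""; FS = "\n" @}
-     @{ print $1 @}' addresses
-@end smallexample
-
-@noindent
-Here each blank-line-separated entry is one record, and each line of
-the entry is one field.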
-
-The original motivation for this special exception was probably so that
-you get useful behavior in the default case (i.e., @w{@code{FS == " "}}).
-This feature can be a problem if you really don't want the
-newline character to separate fields, since there is no way to
-prevent it. However, you can work around this by using the @code{split}
-function to break up the record manually
-(@pxref{String Functions, ,Built-in Functions for String Manipulation}).@refill
-
-@ignore
-Here are two ways to use records separated by blank lines and break each
-line into fields normally:
-
-@example
-awk 'BEGIN @{ RS = ""; FS = "[ \t\n]+" @} @{ print $1 @}' BBS-list
-
-@exdent @r{or}
-
-awk 'BEGIN @{ RS = "^\n+"; FS = "[ \t\n]+" @} @{ print $1 @}' BBS-list
-@end example
-@end ignore
-
-@ignore
-Here is how to use records separated by blank lines and break each
-line into fields normally:
-
-@example
-awk 'BEGIN @{ RS = ""; FS = "[ \t\n]+" @} ; @{ print $1 @}' BBS-list
-@end example
-@end ignore
-
-@node Getline, Close Input, Multiple Line, Reading Files
-@section Explicit Input with @code{getline}
-
-@findex getline
-@cindex input, explicit
-@cindex explicit input
-@cindex input, @code{getline} command
-@cindex reading files, @code{getline} command
-So far we have been getting our input files from @code{awk}'s main
-input stream---either the standard input (usually your terminal) or the
-files specified on the command line. The @code{awk} language has a
-special built-in command called @code{getline} that
-can be used to read input under your explicit control.@refill
-
-This command is quite complex and should @emph{not} be used by
-beginners. It is covered here because this is the chapter on input.
-The examples that follow the explanation of the @code{getline} command
-include material that has not been covered yet. Therefore, come back
-and study the @code{getline} command @emph{after} you have reviewed the
-rest of this manual and have a good knowledge of how @code{awk} works.
-
-@vindex ERRNO
-@cindex differences: @code{gawk} and @code{awk}
-@code{getline} returns 1 if it finds a record, and 0 if the end of the
-file is encountered. If there is some error in getting a record, such
-as a file that cannot be opened, then @code{getline} returns @minus{}1.
-In this case, @code{gawk} sets the variable @code{ERRNO} to a string
-describing the error that occurred.
-
-In the following examples, @var{command} stands for a string value that
-represents a shell command.
-
-@table @code
-@item getline
-The @code{getline} command can be used without arguments to read input
-from the current input file. All it does in this case is read the next
-input record and split it up into fields. This is useful if you've
-finished processing the current record, but you want to do some special
-processing @emph{right now} on the next record. Here's an
-example:@refill
-
-@example
-awk '@{
- if (t = index($0, "/*")) @{
- if (t > 1)
- tmp = substr($0, 1, t - 1)
- else
- tmp = ""
- u = index(substr($0, t + 2), "*/")
- while (u == 0) @{
- getline
- t = -1
- u = index($0, "*/")
- @}
- if (u <= length($0) - 2)
- $0 = tmp substr($0, t + u + 3)
- else
- $0 = tmp
- @}
- print $0
-@}'
-@end example
-
-This @code{awk} program deletes all C-style comments, @samp{/* @dots{}
-*/}, from the input. By replacing the @samp{print $0} with other
-statements, you could perform more complicated processing on the
-decommented input, like searching for matches of a regular
-expression. (This program has a subtle problem---can you spot it?)
-
-@c the program to remove comments doesn't work if one
-@c comment ends and another begins on the same line. (Your
-@c idea for restart would be useful here). --- brennan@boeing.com
-
-This form of the @code{getline} command sets @code{NF} (the number of
-fields; @pxref{Fields, ,Examining Fields}), @code{NR} (the number of
-records read so far; @pxref{Records, ,How Input is Split into Records}),
-@code{FNR} (the number of records read from this input file), and the
-value of @code{$0}.
-
-@strong{Note:} the new value of @code{$0} is used in testing
-the patterns of any subsequent rules. The original value
-of @code{$0} that triggered the rule which executed @code{getline}
-is lost. By contrast, the @code{next} statement reads a new record
-but immediately begins processing it normally, starting with the first
-rule in the program. @xref{Next Statement, ,The @code{next} Statement}.
-
-@item getline @var{var}
-This form of @code{getline} reads a record into the variable @var{var}.
-This is useful when you want your program to read the next record from
-the current input file, but you don't want to subject the record to the
-normal input processing.
-
-For example, suppose the next line is a comment, or a special string,
-and you want to read it, but you must make certain that it won't trigger
-any rules. This version of @code{getline} allows you to read that line
-and store it in a variable so that the main
-read-a-line-and-check-each-rule loop of @code{awk} never sees it.
-
-The following example swaps every two lines of input. For example, given:
-
-@example
-wan
-tew
-free
-phore
-@end example
-
-@noindent
-it outputs:
-
-@example
-tew
-wan
-phore
-free
-@end example
-
-@noindent
-Here's the program:
-
-@example
-@group
-awk '@{
- if ((getline tmp) > 0) @{
- print tmp
- print $0
- @} else
- print $0
-@}'
-@end group
-@end example
-
-The @code{getline} function used in this way sets only the variables
-@code{NR} and @code{FNR} (and of course, @var{var}). The record is not
-split into fields, so the values of the fields (including @code{$0}) and
-the value of @code{NF} do not change.@refill
-
-@item getline < @var{file}
-@cindex input redirection
-@cindex redirection of input
-This form of the @code{getline} function takes its input from the file
-@var{file}. Here @var{file} is a string-valued expression that
-specifies the file name. @samp{< @var{file}} is called a @dfn{redirection}
-since it directs input to come from a different place.
-
-This form is useful if you want to read your input from a particular
-file, instead of from the main input stream. For example, the following
-program reads its input record from the file @file{foo.input} when it
-encounters a first field with a value equal to 10 in the current input
-file.@refill
-
-@example
-awk '@{
- if ($1 == 10) @{
- getline < "foo.input"
- print
- @} else
- print
-@}'
-@end example
-
-Since the main input stream is not used, the values of @code{NR} and
-@code{FNR} are not changed. But the record read is split into fields in
-the normal manner, so the values of @code{$0} and other fields are
-changed. So is the value of @code{NF}.
-
-This does not cause the record to be tested against all the patterns
-in the @code{awk} program, in the way that would happen if the record
-were read normally by the main processing loop of @code{awk}. However
-the new record is tested against any subsequent rules, just as when
-@code{getline} is used without a redirection.
-
-@item getline @var{var} < @var{file}
-This form of the @code{getline} function takes its input from the file
-@var{file} and puts it in the variable @var{var}. As above, @var{file}
-is a string-valued expression that specifies the file from which to read.
-
-In this version of @code{getline}, none of the built-in variables are
-changed, and the record is not split into fields. The only variable
-changed is @var{var}.
-
-For example, the following program copies all the input files to the
-output, except for records that say @w{@samp{@@include @var{filename}}}.
-Such a record is replaced by the contents of the file
-@var{filename}.@refill
-
-@example
-awk '@{
- if (NF == 2 && $1 == "@@include") @{
- while ((getline line < $2) > 0)
- print line
- close($2)
- @} else
- print
-@}'
-@end example
-
-Note here how the name of the extra input file is not built into
-the program; it is taken from the data, from the second field on
-the @samp{@@include} line.@refill
-
-The @code{close} function is called to ensure that if two identical
-@samp{@@include} lines appear in the input, the entire specified file is
-included twice. @xref{Close Input, ,Closing Input Files and Pipes}.@refill
-
-One deficiency of this program is that it does not process nested
-@samp{@@include} statements the way a true macro preprocessor would.
-
-@item @var{command} | getline
-You can @dfn{pipe} the output of a command into @code{getline}. A pipe is
-simply a way to link the output of one program to the input of another. In
-this case, the string @var{command} is run as a shell command and its output
-is piped into @code{awk} to be used as input. This form of @code{getline}
-reads one record from the pipe.
-
-For example, the following program copies input to output, except for lines
-that begin with @samp{@@execute}, which are replaced by the output produced by
-running the rest of the line as a shell command:
-
-@example
-awk '@{
- if ($1 == "@@execute") @{
- tmp = substr($0, 10)
- while ((tmp | getline) > 0)
- print
- close(tmp)
- @} else
- print
-@}'
-@end example
-
-@noindent
-The @code{close} function is called to ensure that if two identical
-@samp{@@execute} lines appear in the input, the command is run for
-each one. @xref{Close Input, ,Closing Input Files and Pipes}.
-
-Given the input:
-
-@example
-foo
-bar
-baz
-@@execute who
-bletch
-@end example
-
-@noindent
-the program might produce:
-
-@example
-foo
-bar
-baz
-hack ttyv0 Jul 13 14:22
-hack ttyp0 Jul 13 14:23 (gnu:0)
-hack ttyp1 Jul 13 14:23 (gnu:0)
-hack ttyp2 Jul 13 14:23 (gnu:0)
-hack ttyp3 Jul 13 14:23 (gnu:0)
-bletch
-@end example
-
-@noindent
-Notice that this program ran the command @code{who} and printed the result.
-(If you try this program yourself, you will get different results, showing
-you who is logged in on your system.)
-
-This variation of @code{getline} splits the record into fields, sets the
-value of @code{NF} and recomputes the value of @code{$0}. The values of
-@code{NR} and @code{FNR} are not changed.
-
-@item @var{command} | getline @var{var}
-The output of the command @var{command} is sent through a pipe to
-@code{getline} and into the variable @var{var}. For example, the
-following program reads the current date and time into the variable
-@code{current_time}, using the @code{date} utility, and then
-prints it.@refill
-
-@example
-awk 'BEGIN @{
- "date" | getline current_time
- close("date")
- print "Report printed on " current_time
-@}'
-@end example
-
-In this version of @code{getline}, none of the built-in variables are
-changed, and the record is not split into fields.
-@end table
-
-@node Close Input, , Getline, Reading Files
-@section Closing Input Files and Pipes
-@cindex closing input files and pipes
-@findex close
-
-If the same file name or the same shell command is used with
-@code{getline} more than once during the execution of an @code{awk}
-program, the file is opened (or the command is executed) only the first time.
-At that time, the first record of input is read from that file or command.
-The next time the same file or command is used in @code{getline}, another
-record is read from it, and so on.
-
-This implies that if you want to start reading the same file again from
-the beginning, or if you want to rerun a shell command (rather than
-reading more output from the command), you must take special steps.
-What you must do is use the @code{close} function, as follows:
-
-@example
-close(@var{filename})
-@end example
-
-@noindent
-or
-
-@example
-close(@var{command})
-@end example
-
-The argument @var{filename} or @var{command} can be any expression. Its
-value must exactly equal the string that was used to open the file or
-start the command---for example, if you open a pipe with this:
-
-@example
-"sort -r names" | getline foo
-@end example
-
-@noindent
-then you must close it with this:
-
-@example
-close("sort -r names")
-@end example
-
-Once this function call is executed, the next @code{getline} from that
-file or command will reopen the file or rerun the command.
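-
-Here is a small sketch of re-reading a file (assuming a hypothetical
-file named @file{data}); both @code{print} statements show the
-@emph{first} record of the file, because the @code{close} call makes
-the second @code{getline} start over from the beginning:
-
-@example
-awk 'BEGIN @{
-    getline line < "data"; print line
-    close("data")
-    getline line < "data"; print line
-@}'
-@end example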
-
-@iftex
-@vindex ERRNO
-@cindex differences: @code{gawk} and @code{awk}
-@end iftex
-@code{close} returns a value of zero if the close succeeded.
-Otherwise, the value will be non-zero.
-In this case, @code{gawk} sets the variable @code{ERRNO} to a string
-describing the error that occurred.
-
-@node Printing, One-liners, Reading Files, Top
-@chapter Printing Output
-
-@cindex printing
-@cindex output
-One of the most common things that actions do is to output or @dfn{print}
-some or all of the input. For simple output, use the @code{print}
-statement. For fancier formatting use the @code{printf} statement.
-Both are described in this chapter.
-
-@menu
-* Print:: The @code{print} statement.
-* Print Examples:: Simple examples of @code{print} statements.
-* Output Separators:: The output separators and how to change them.
-* OFMT:: Controlling Numeric Output With @code{print}.
-* Printf:: The @code{printf} statement.
-* Redirection:: How to redirect output to multiple
- files and pipes.
-* Special Files:: File name interpretation in @code{gawk}.
- @code{gawk} allows access to
- inherited file descriptors.
-@end menu
-
-@node Print, Print Examples, Printing, Printing
-@section The @code{print} Statement
-@cindex @code{print} statement
-
-The @code{print} statement does output with simple, standardized
-formatting. You specify only the strings or numbers to be printed, in a
-list separated by commas. They are output, separated by single spaces,
-followed by a newline. The statement looks like this:
-
-@example
-print @var{item1}, @var{item2}, @dots{}
-@end example
-
-@noindent
-The entire list of items may optionally be enclosed in parentheses. The
-parentheses are necessary if any of the item expressions uses a
-relational operator; otherwise it could be confused with a redirection
-(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}).
-The relational operators are @samp{==},
-@samp{!=}, @samp{<}, @samp{>}, @samp{>=}, @samp{<=}, @samp{~} and
-@samp{!~} (@pxref{Comparison Ops, ,Comparison Expressions}).@refill
-
-The items printed can be constant strings or numbers, fields of the
-current record (such as @code{$1}), variables, or any @code{awk}
-expressions. The @code{print} statement is completely general for
-computing @emph{what} values to print. With two exceptions,
-you cannot specify @emph{how} to print them---how many
-columns, whether to use exponential notation or not, and so on.
-(@xref{Output Separators}, and
-@ref{OFMT, ,Controlling Numeric Output with @code{print}}.)
-For that, you need the @code{printf} statement
-(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).@refill
-
-The simple statement @samp{print} with no items is equivalent to
-@samp{print $0}: it prints the entire current record. To print a blank
-line, use @samp{print ""}, where @code{""} is the null, or empty,
-string.
-
-To print a fixed piece of text, use a string constant such as
-@w{@code{"Hello there"}} as one item. If you forget to use the
-double-quote characters, your text will be taken as an @code{awk}
-expression, and you will probably get an error. Keep in mind that a
-space is printed between any two items.
-
-Most often, each @code{print} statement makes one line of output. But it
-isn't limited to one line. If an item value is a string that contains a
-newline, the newline is output along with the rest of the string. A
-single @code{print} can make any number of lines this way.
-
-@node Print Examples, Output Separators, Print, Printing
-@section Examples of @code{print} Statements
-
-Here is an example of printing a string that contains embedded newlines:
-
-@example
-awk 'BEGIN @{ print "line one\nline two\nline three" @}'
-@end example
-
-@noindent
-produces output like this:
-
-@example
-line one
-line two
-line three
-@end example
-
-Here is an example that prints the first two fields of each input record,
-with a space between them:
-
-@example
-awk '@{ print $1, $2 @}' inventory-shipped
-@end example
-
-@noindent
-Its output looks like this:
-
-@example
-Jan 13
-Feb 15
-Mar 15
-@dots{}
-@end example
-
-A common mistake in using the @code{print} statement is to omit the comma
-between two items. This often has the effect of making the items run
-together in the output, with no space. The reason for this is that
-juxtaposing two string expressions in @code{awk} means to concatenate
-them. For example, without the comma:
-
-@example
-awk '@{ print $1 $2 @}' inventory-shipped
-@end example
-
-@noindent
-prints:
-
-@example
-@group
-Jan13
-Feb15
-Mar15
-@dots{}
-@end group
-@end example
-
-Neither example's output makes much sense to someone unfamiliar with the
-file @file{inventory-shipped}. A heading line at the beginning would make
-it clearer. Let's add some headings to our table of months (@code{$1}) and
-green crates shipped (@code{$2}). We do this using the @code{BEGIN} pattern
-(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}) to force the headings to be printed only once:
-
-@example
-awk 'BEGIN @{ print "Month Crates"
- print "----- ------" @}
- @{ print $1, $2 @}' inventory-shipped
-@end example
-
-@noindent
-Did you already guess what happens? This program prints the following:
-
-@example
-@group
-Month Crates
------ ------
-Jan 13
-Feb 15
-Mar 15
-@dots{}
-@end group
-@end example
-
-@noindent
-The headings and the table data don't line up! We can fix this by printing
-some spaces between the two fields:
-
-@example
-awk 'BEGIN @{ print "Month Crates"
- print "----- ------" @}
- @{ print $1, " ", $2 @}' inventory-shipped
-@end example
-
-You can imagine that this way of lining up columns can get pretty
-complicated when you have many columns to fix. Counting spaces for two
-or three columns can be simple, but with any more than that you can
-easily get ``lost''. This is why the @code{printf} statement was
-created (@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing});
-one of its specialties is lining up columns of data.@refill
-
-@node Output Separators, OFMT, Print Examples, Printing
-@section Output Separators
-
-@cindex output field separator, @code{OFS}
-@vindex OFS
-@vindex ORS
-@cindex output record separator, @code{ORS}
-As mentioned previously, a @code{print} statement contains a list
-of items, separated by commas. In the output, the items are normally
-separated by single spaces. But they do not have to be spaces; a
-single space is only the default. You can specify any string of
-characters to use as the @dfn{output field separator} by setting the
-built-in variable @code{OFS}. The initial value of this variable
-is the string @w{@code{" "}}, that is, just a single space.@refill
-
-The output from an entire @code{print} statement is called an
-@dfn{output record}. Each @code{print} statement outputs one output
-record and then outputs a string called the @dfn{output record separator}.
-The built-in variable @code{ORS} specifies this string. The initial
-value of the variable is the string @code{"\n"} containing a newline
-character; thus, normally each @code{print} statement makes a separate line.
-
-You can change how output fields and records are separated by assigning
-new values to the variables @code{OFS} and/or @code{ORS}. The usual
-place to do this is in the @code{BEGIN} rule
-(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}), so
-that it happens before any input is processed. You may also do this
-with assignments on the command line, before the names of your input
-files.@refill
-
-The following example prints the first and second fields of each input
-record separated by a semicolon, with a blank line added after each
-line:@refill
-
-@example
-@group
-awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @}
- @{ print $1, $2 @}' BBS-list
-@end group
-@end example
-
-If the value of @code{ORS} does not contain a newline, all your output
-will be run together on a single line, unless you output newlines some
-other way.
-
-@node OFMT, Printf, Output Separators, Printing
-@section Controlling Numeric Output with @code{print}
-@vindex OFMT
-When you use the @code{print} statement to print numeric values,
-@code{awk} internally converts the number to a string of characters,
-and prints that string. @code{awk} uses the @code{sprintf} function
-to do this conversion. For now, it suffices to say that the @code{sprintf}
-function accepts a @dfn{format specification} that tells it how to format
-numbers (or strings), and that there are a number of different ways that
-numbers can be formatted. The different format specifications are discussed
-more fully in
-@ref{Printf, ,Using @code{printf} Statements for Fancier Printing}.@refill
-
-The built-in variable @code{OFMT} contains the default format specification
-that @code{print} uses with @code{sprintf} when it wants to convert a
-number to a string for printing. By supplying different format specifications
-as the value of @code{OFMT}, you can change how @code{print} will print
-your numbers. As a brief example:
-
-@example
-@group
-awk 'BEGIN @{ OFMT = "%d" # print numbers as integers
- print 17.23 @}'
-@end group
-@end example
-
-@noindent
-will print @samp{17}.
-
-@node Printf, Redirection, OFMT, Printing
-@section Using @code{printf} Statements for Fancier Printing
-@cindex formatted output
-@cindex output, formatted
-
-If you want more precise control over the output format than
-@code{print} gives you, use @code{printf}. With @code{printf} you can
-specify the width to use for each item, and you can specify various
-stylistic choices for numbers (such as what radix to use, whether to
-print an exponent, whether to print a sign, and how many digits to print
-after the decimal point). You do this by specifying a string, called
-the @dfn{format string}, which controls how and where to print the other
-arguments.
-
-@menu
-* Basic Printf:: Syntax of the @code{printf} statement.
-* Control Letters:: Format-control letters.
-* Format Modifiers:: Format-specification modifiers.
-* Printf Examples:: Several examples.
-@end menu
-
-@node Basic Printf, Control Letters, Printf, Printf
-@subsection Introduction to the @code{printf} Statement
-
-@cindex @code{printf} statement, syntax of
-The @code{printf} statement looks like this:@refill
-
-@example
-printf @var{format}, @var{item1}, @var{item2}, @dots{}
-@end example
-
-@noindent
-The entire list of arguments may optionally be enclosed in parentheses. The
-parentheses are necessary if any of the item expressions uses a
-relational operator; otherwise it could be confused with a redirection
-(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}).
-The relational operators are @samp{==},
-@samp{!=}, @samp{<}, @samp{>}, @samp{>=}, @samp{<=}, @samp{~} and
-@samp{!~} (@pxref{Comparison Ops, ,Comparison Expressions}).@refill
-
-@cindex format string
-The difference between @code{printf} and @code{print} is the argument
-@var{format}. This is an expression whose value is taken as a string; it
-specifies how to output each of the other arguments. It is called
-the @dfn{format string}.
-
-The format string is the same as in the @sc{ansi} C library function
-@code{printf}. Most of @var{format} is text to be output verbatim.
-Scattered among this text are @dfn{format specifiers}, one per item.
-Each format specifier says to output the next item at that place in the
-format.@refill
-
-The @code{printf} statement does not automatically append a newline to its
-output. It outputs only what the format specifies. So if you want
-a newline, you must include one in the format. The output separator
-variables @code{OFS} and @code{ORS} have no effect on @code{printf}
-statements.@refill
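-
-For example, this program (a minimal illustration) prints
-@samp{one two} followed by a single newline; the values of @code{OFS}
-and @code{ORS} are simply ignored, and the two @code{printf} statements
-run together on one output line:
-
-@example
-awk 'BEGIN @{ OFS = "-"; ORS = "!"
-             printf "%s %s", "one", "two"
-             printf "\n" @}'
-@end example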
-
-@node Control Letters, Format Modifiers, Basic Printf, Printf
-@subsection Format-Control Letters
-@cindex @code{printf}, format-control characters
-@cindex format specifier
-
-A format specifier starts with the character @samp{%} and ends with a
-@dfn{format-control letter}; it tells the @code{printf} statement how
-to output one item. (If you actually want to output a @samp{%}, write
-@samp{%%}.) The format-control letter specifies what kind of value to
-print. The rest of the format specifier is made up of optional
-@dfn{modifiers} which are parameters such as the field width to use.@refill
-
-Here is a list of the format-control letters:
-
-@table @samp
-@item c
-This prints a number as an ASCII character. Thus, @samp{printf "%c",
-65} outputs the letter @samp{A}. The output for a string value is
-the first character of the string.
-
-@item d
-This prints a decimal integer.
-
-@item i
-This also prints a decimal integer.
-
-@item e
-This prints a number in scientific (exponential) notation.
-For example,
-
-@example
-printf "%4.3e", 1950
-@end example
-
-@noindent
-prints @samp{1.950e+03}, with a total of four significant figures of
-which three follow the decimal point. The @samp{4} and the @samp{3} are
-@dfn{modifiers}, discussed below.
-
-@item f
-This prints a number in floating point notation.
-
-@item g
-This prints a number in either scientific notation or floating point
-notation, whichever uses fewer characters. (This is a simplification;
-for @samp{g} format the precision specifies the number of significant
-digits, rather than the number of digits after the decimal point.)
-@ignore
-From: gatech!ames!elroy!cit-vax!EQL.Caltech.Edu!rankin (Pat Rankin)
-
-In the description of printf formats (p.43), the information for %g
-is incorrect (mainly, it's too much of an oversimplification). It's
-wrong in the AWK book too, and in the gawk man page. I suggested to
-David Trueman before 2.13 was released that the latter be revised, so
-that it matched gawk's behavior (rather than trying to change gawk to
-match the docs ;-). The documented description is nice and simple, but
-it doesn't match the actual underlying behavior of %g in the various C
-run-time libraries that gawk relies on. The precision value for g format
-is different than for f and e formats, so it's inaccurate to say 'g' is
-the shorter of 'e' or 'f'. For 'g', precision represents the number of
-significant digits rather than the number of decimal places, and it has
-special rules about how to format numbers with range between 10E-1 and
-10E-4. All in all, it's pretty messy, and I had to add that clumsy
-GFMT_WORKAROUND code because the VMS run-time library doesn't conform to
-the ANSI-C specifications.
-@end ignore
-
-@item o
-This prints an unsigned octal integer.
-
-@item s
-This prints a string.
-
-@item x
-This prints an unsigned hexadecimal integer.
-
-@item X
-This prints an unsigned hexadecimal integer. However, for the values 10
-through 15, it uses the letters @samp{A} through @samp{F} instead of
-@samp{a} through @samp{f}.
-
-@item %
-This isn't really a format-control letter, but it does have a meaning
-when used after a @samp{%}: the sequence @samp{%%} outputs one
-@samp{%}. It does not consume an argument.
-@end table
-
-@node Format Modifiers, Printf Examples, Control Letters, Printf
-@subsection Modifiers for @code{printf} Formats
-
-@cindex @code{printf}, modifiers
-@cindex modifiers (in format specifiers)
-A format specification can also include @dfn{modifiers} that can control
-how much of the item's value is printed and how much space it gets. The
-modifiers come between the @samp{%} and the format-control letter. Here
-are the possible modifiers, in the order in which they may appear:
-
-@table @samp
-@item -
-The minus sign, used before the width modifier, says to left-justify
-the argument within its specified width. Normally the argument
-is printed right-justified in the specified width. Thus,
-
-@example
-printf "%-4s", "foo"
-@end example
-
-@noindent
-prints @samp{foo }.
-
-@item @var{width}
-This is a number representing the desired width of a field. Inserting any
-number between the @samp{%} sign and the format control character forces the
-field to be expanded to this width. The default way to do this is to
-pad with spaces on the left. For example,
-
-@example
-printf "%4s", "foo"
-@end example
-
-@noindent
-prints @samp{ foo}.
-
-The value of @var{width} is a minimum width, not a maximum. If the item
-value requires more than @var{width} characters, it can be as wide as
-necessary. Thus,
-
-@example
-printf "%4s", "foobar"
-@end example
-
-@noindent
-prints @samp{foobar}.
-
-Preceding the @var{width} with a minus sign causes the output to be
-padded with spaces on the right, instead of on the left.
-
-@item .@var{prec}
-This is a number that specifies the precision to use when printing.
-This specifies the number of digits you want printed to the right of the
-decimal point. For a string, it specifies the maximum number of
-characters from the string that should be printed.
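-For example (to briefly illustrate both cases),
-
-@example
-printf "%.3s", "foobar"
-@end example
-
-@noindent
-prints @samp{foo}, and
-
-@example
-printf "%.2f", 3.14159
-@end example
-
-@noindent
-prints @samp{3.14}.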
-@end table
-
-The C library @code{printf}'s dynamic @var{width} and @var{prec}
-capability (for example, @code{"%*.*s"}) is supported. Instead of
-supplying explicit @var{width} and/or @var{prec} values in the format
-string, you pass them in the argument list. For example:@refill
-
-@example
-w = 5
-p = 3
-s = "abcdefg"
-printf "<%*.*s>\n", w, p, s
-@end example
-
-@noindent
-is exactly equivalent to
-
-@example
-s = "abcdefg"
-printf "<%5.3s>\n", s
-@end example
-
-@noindent
-Both programs output @samp{@w{<@bullet{}@bullet{}abc>}}. (We have
-used the bullet symbol ``@bullet{}'' to represent a space, to clearly
-show you that there are two spaces in the output.)@refill
-
-Earlier versions of @code{awk} did not support this capability. You may
-simulate it by using concatenation to build up the format string,
-like so:@refill
-
-@example
-w = 5
-p = 3
-s = "abcdefg"
-printf "<%" w "." p "s>\n", s
-@end example
-
-@noindent
-This is not particularly easy to read, however.
-
-@node Printf Examples, , Format Modifiers, Printf
-@subsection Examples of Using @code{printf}
-
-Here is how to use @code{printf} to make an aligned table:
-
-@example
-awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list
-@end example
-
-@noindent
-prints the names of bulletin boards (@code{$1}) of the file
-@file{BBS-list} as a string of 10 characters, left justified. It also
-prints the phone numbers (@code{$2}) afterward on the line. This
-produces an aligned two-column table of names and phone numbers:@refill
-
-@example
-@group
-aardvark 555-5553
-alpo-net 555-3412
-barfly 555-7685
-bites 555-1675
-camelot 555-0542
-core 555-2912
-fooey 555-1234
-foot 555-6699
-macfoo 555-6480
-sdace 555-3430
-sabafoo 555-2127
-@end group
-@end example
-
-Did you notice that we did not specify that the phone numbers be printed
-as numbers? They had to be printed as strings because the numbers are
-separated by a dash. This dash would be interpreted as a minus sign if
-we had tried to print the phone numbers as numbers. This would have led
-to some pretty confusing results.
-
-We did not specify a width for the phone numbers because they are the
-last things on their lines. We don't need to put spaces after them.
-
-We could make our table look even nicer by adding headings to the tops
-of the columns. To do this, use the @code{BEGIN} pattern
-(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns})
-to force the header to be printed only once, at the beginning of
-the @code{awk} program:@refill
-
-@example
-@group
-awk 'BEGIN @{ print "Name Number"
- print "---- ------" @}
- @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
-@end group
-@end example
-
-Did you notice that we mixed @code{print} and @code{printf} statements in
-the above example? We could have used just @code{printf} statements to get
-the same results:
-
-@example
-@group
-awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number"
- printf "%-10s %s\n", "----", "------" @}
- @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
-@end group
-@end example
-
-@noindent
-By outputting each column heading with the same format specification
-used for the elements of the column, we have made sure that the headings
-are aligned just like the columns.
-
-The fact that the same format specification is used three times can be
-emphasized by storing it in a variable, like this:
-
-@example
-awk 'BEGIN @{ format = "%-10s %s\n"
- printf format, "Name", "Number"
- printf format, "----", "------" @}
- @{ printf format, $1, $2 @}' BBS-list
-@end example
-
-See if you can use the @code{printf} statement to line up the headings and
-table data for our @file{inventory-shipped} example covered earlier in the
-section on the @code{print} statement
-(@pxref{Print, ,The @code{print} Statement}).@refill
-
-@node Redirection, Special Files, Printf, Printing
-@section Redirecting Output of @code{print} and @code{printf}
-
-@cindex output redirection
-@cindex redirection of output
-So far we have been dealing only with output that prints to the standard
-output, usually your terminal. Both @code{print} and @code{printf} can
-also send their output to other places.
-This is called @dfn{redirection}.@refill
-
-A redirection appears after the @code{print} or @code{printf} statement.
-Redirections in @code{awk} are written just like redirections in shell
-commands, except that they are written inside the @code{awk} program.
-
-@menu
-* File/Pipe Redirection:: Redirecting Output to Files and Pipes.
-* Close Output:: How to close output files and pipes.
-@end menu
-
-@node File/Pipe Redirection, Close Output, Redirection, Redirection
-@subsection Redirecting Output to Files and Pipes
-
-Here are the three forms of output redirection. They are all shown for
-the @code{print} statement, but they work identically for @code{printf}
-also.@refill
-
-@table @code
-@item print @var{items} > @var{output-file}
-This type of redirection prints the items onto the output file
-@var{output-file}. The file name @var{output-file} can be any
-expression. Its value is changed to a string and then used as a
-file name (@pxref{Expressions, ,Expressions as Action Statements}).@refill
-
-When this type of redirection is used, the @var{output-file} is erased
-before the first output is written to it. Subsequent writes do not
-erase @var{output-file}, but append to it. If @var{output-file} does
-not exist, then it is created.@refill
-
-For example, here is how one @code{awk} program can write a list of
-BBS names to a file @file{name-list} and a list of phone numbers to a
-file @file{phone-list}. Each output file contains one name or number
-per line.
-
-@smallexample
-awk '@{ print $2 > "phone-list"
- print $1 > "name-list" @}' BBS-list
-@end smallexample
-
-@item print @var{items} >> @var{output-file}
-This type of redirection prints the items onto the output file
-@var{output-file}. The difference between this and the
-single-@samp{>} redirection is that the old contents (if any) of
-@var{output-file} are not erased. Instead, the @code{awk} output is
-appended to the file.
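-
-For example, this sketch appends each first field to a running log
-file (here, a hypothetical file named @file{bbs-log}), preserving
-whatever the file already contained:
-
-@smallexample
-awk '@{ print $1 >> "bbs-log" @}' BBS-list
-@end smallexample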
-
-@cindex pipes for output
-@cindex output, piping
-@item print @var{items} | @var{command}
-It is also possible to send output through a @dfn{pipe} instead of into a
-file. This type of redirection opens a pipe to @var{command} and writes
-the values of @var{items} through this pipe, to another process created
-to execute @var{command}.@refill
-
-The redirection argument @var{command} is actually an @code{awk}
-expression. Its value is converted to a string, whose contents give the
-shell command to be run.
-
-For example, this produces two files, one unsorted list of BBS names
-and one list sorted in reverse alphabetical order:
-
-@smallexample
-awk '@{ print $1 > "names.unsorted"
- print $1 | "sort -r > names.sorted" @}' BBS-list
-@end smallexample
-
-Here the unsorted list is written with an ordinary redirection while
-the sorted list is written by piping through the @code{sort} utility.
-
-Here is an example that uses redirection to mail a message to a mailing
-list @samp{bug-system}. This might be useful when trouble is encountered
-in an @code{awk} script run periodically for system maintenance.
-
-@smallexample
-report = "mail bug-system"
-print "Awk script failed:", $0 | report
-print "at record number", FNR, "of", FILENAME | report
-close(report)
-@end smallexample
-
-We call the @code{close} function here because it's a good idea to close
-the pipe as soon as all the intended output has been sent to it.
-@xref{Close Output, ,Closing Output Files and Pipes}, for more information
-on this. This example also illustrates the use of a variable to represent
-a @var{file} or @var{command}: it is not necessary to always
-use a string constant. Using a variable is generally a good idea,
-since @code{awk} requires you to spell the string value identically
-every time.
-@end table
-
-Redirecting output using @samp{>}, @samp{>>}, or @samp{|} asks the system
-to open a file or pipe only if the particular @var{file} or @var{command}
-you've specified has not already been written to by your program, or if
-it has been closed since it was last written to.@refill
-
-@node Close Output, , File/Pipe Redirection, Redirection
-@subsection Closing Output Files and Pipes
-@cindex closing output files and pipes
-@findex close
-
-When a file or pipe is opened, the file name or command associated with
-it is remembered by @code{awk} and subsequent writes to the same file or
-command are appended to the previous writes. The file or pipe stays
-open until @code{awk} exits. This is usually convenient.
-
-Sometimes there is a reason to close an output file or pipe earlier
-than that. To do this, use the @code{close} function, as follows:
-
-@example
-close(@var{filename})
-@end example
-
-@noindent
-or
-
-@example
-close(@var{command})
-@end example
-
-The argument @var{filename} or @var{command} can be any expression.
-Its value must exactly equal the string used to open the file or pipe
-to begin with---for example, if you open a pipe with this:
-
-@example
-print $1 | "sort -r > names.sorted"
-@end example
-
-@noindent
-then you must close it with this:
-
-@example
-close("sort -r > names.sorted")
-@end example
-
-Here are some reasons why you might need to close an output file:
-
-@itemize @bullet
-@item
-To write a file and read it back later on in the same @code{awk}
-program. Close the file when you are finished writing it; then
-you can start reading it with @code{getline}
-(@pxref{Getline, ,Explicit Input with @code{getline}}).@refill
-
-@item
-To write numerous files, successively, in the same @code{awk}
-program. If you don't close the files, eventually you may exceed a
-system limit on the number of open files in one process. So close
-each one when you are finished writing it.
-
-@item
-To make a command finish. When you redirect output through a pipe,
-the command reading the pipe normally continues to try to read input
-as long as the pipe is open. Often this means the command cannot
-really do its work until the pipe is closed. For example, if you
-redirect output to the @code{mail} program, the message is not
-actually sent until the pipe is closed.
-
-@item
-To run the same program a second time, with the same arguments.
-This is not the same thing as giving more input to the first run!
-
-For example, suppose you pipe output to the @code{mail} program. If you
-output several lines redirected to this pipe without closing it, they make
-a single message of several lines. By contrast, if you close the pipe
-after each line of output, then each line makes a separate message.
-@end itemize
-
-@iftex
-@vindex ERRNO
-@cindex differences: @code{gawk} and @code{awk}
-@end iftex
-@code{close} returns a value of zero if the close succeeded.
-Otherwise, the value will be non-zero.
-In this case, @code{gawk} sets the variable @code{ERRNO} to a string
-describing the error that occurred.
-
-@node Special Files, , Redirection, Printing
-@section Standard I/O Streams
-@cindex standard input
-@cindex standard output
-@cindex standard error output
-@cindex file descriptors
-
-Running programs conventionally have three input and output streams
-already available to them for reading and writing. These are known as
-the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error
-output}. These streams are, by default, terminal input and output, but
-they are often redirected with the shell, via the @samp{<}, @samp{<<},
-@samp{>}, @samp{>>}, @samp{>&} and @samp{|} operators. Standard error
-is used only for writing error messages; the reason we have two separate
-streams, standard output and standard error, is so that they can be
-redirected separately.
-
-@iftex
-@cindex differences: @code{gawk} and @code{awk}
-@end iftex
-In other implementations of @code{awk}, the only way to write an error
-message to standard error in an @code{awk} program is as follows:
-
-@smallexample
-print "Serious error detected!\n" | "cat 1>&2"
-@end smallexample
-
-@noindent
-This works by opening a pipeline to a shell command which can access the
-standard error stream which it inherits from the @code{awk} process.
-This is far from elegant, and is also inefficient, since it requires a
-separate process. So people writing @code{awk} programs have often
-neglected to do this. Instead, they have sent the error messages to the
-terminal, like this:
-
-@smallexample
-@group
-NF != 4 @{
- printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/tty"
-@}
-@end group
-@end smallexample
-
-@noindent
-This has the same effect most of the time, but not always: although the
-standard error stream is usually the terminal, it can be redirected, and
-when that happens, writing to the terminal is not correct. In fact, if
-@code{awk} is run from a background job, it may not have a terminal at all.
-Then opening @file{/dev/tty} will fail.
-
-@code{gawk} provides special file names for accessing the three standard
-streams. When you redirect input or output in @code{gawk}, if the file name
-matches one of these special names, then @code{gawk} directly uses the
-stream it stands for.
-
-@cindex @file{/dev/stdin}
-@cindex @file{/dev/stdout}
-@cindex @file{/dev/stderr}
-@cindex @file{/dev/fd/}
-@table @file
-@item /dev/stdin
-The standard input (file descriptor 0).
-
-@item /dev/stdout
-The standard output (file descriptor 1).
-
-@item /dev/stderr
-The standard error output (file descriptor 2).
-
-@item /dev/fd/@var{N}
-The file associated with file descriptor @var{N}. Such a file must have
-been opened by the program initiating the @code{awk} execution (typically
-the shell). Unless you take special pains, only descriptors 0, 1 and 2
-are available.
-@end table
-
-The file names @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
-are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2},
-respectively, but they are more self-explanatory.
-
-The proper way to write an error message in a @code{gawk} program
-is to use @file{/dev/stderr}, like this:
-
-@smallexample
-NF != 4 @{
- printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
-@}
-@end smallexample
-
-@code{gawk} also provides special file names that give access to information
-about the running @code{gawk} process. Each of these ``files'' provides
-a single record of information. To read them more than once, you must
-first close them with the @code{close} function
-(@pxref{Close Input, ,Closing Input Files and Pipes}).
-The filenames are:
-
-@cindex @file{/dev/pid}
-@cindex @file{/dev/pgrpid}
-@cindex @file{/dev/ppid}
-@cindex @file{/dev/user}
-@table @file
-@item /dev/pid
-Reading this file returns the process ID of the current process,
-in decimal, terminated with a newline.
-
-@item /dev/ppid
-Reading this file returns the parent process ID of the current process,
-in decimal, terminated with a newline.
-
-@item /dev/pgrpid
-Reading this file returns the process group ID of the current process,
-in decimal, terminated with a newline.
-
-@item /dev/user
-Reading this file returns a single record terminated with a newline.
-The fields are separated with blanks. The fields represent the
-following information:
-
-@table @code
-@item $1
-The value of the @code{getuid} system call.
-
-@item $2
-The value of the @code{geteuid} system call.
-
-@item $3
-The value of the @code{getgid} system call.
-
-@item $4
-The value of the @code{getegid} system call.
-@end table
-
-If there are any additional fields, they are the group IDs returned by
-the @code{getgroups} system call.
-(Multiple groups may not be supported on all systems.)@refill
-@end table
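-
-For instance, here is a small sketch (using @code{gawk}, since these
-file names are a @code{gawk} extension) that reports the process ID of
-the running program, closing @file{/dev/pid} so that it could be read
-again later if necessary:
-
-@smallexample
-gawk 'BEGIN @{
-    getline pid < "/dev/pid"
-    close("/dev/pid")
-    print "running as process", pid
-@}'
-@end smallexample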
-
-These special file names may be used on the command line as data
-files, as well as for I/O redirections within an @code{awk} program.
-They may not be used as source files with the @samp{-f} option.
-
-Recognition of these special file names is disabled if @code{gawk} is in
-compatibility mode (@pxref{Command Line, ,Invoking @code{awk}}).
-
-@quotation
-@strong{Caution}: Unless your system actually has a @file{/dev/fd} directory
-(or any of the other above listed special files),
-the interpretation of these file names is done by @code{gawk} itself.
-For example, using @samp{/dev/fd/4} for output will actually write on
-file descriptor 4, and not on a new file descriptor that was @code{dup}'ed
-from file descriptor 4. Most of the time this does not matter; however, it
-is important to @emph{not} close any of the files related to file descriptors
-0, 1, and 2. If you do close one of these files, unpredictable behavior
-will result.
-@end quotation
-
-@node One-liners, Patterns, Printing, Top
-@chapter Useful ``One-liners''
-
-@cindex one-liners
-Useful @code{awk} programs are often short, just a line or two. Here is a
-collection of useful, short programs to get you started. Some of these
-programs contain constructs that haven't been covered yet. The description
-of the program will give you a good idea of what is going on, but please
-read the rest of the manual to become an @code{awk} expert!
-
-@c Per suggestions from Michal Jaegermann
-@ifinfo
-Since you are reading this in Info, each line of the example code is
-enclosed in quotes, to represent text that you would type literally.
-The examples themselves represent shell commands that use single quotes
-to keep the shell from interpreting the contents of the program.
-When reading the examples, focus on the text between the open and close
-quotes.
-@end ifinfo
-
-@table @code
-@item awk '@{ if (NF > max) max = NF @}
-@itemx @ @ @ @ @ END @{ print max @}'
-This program prints the maximum number of fields on any input line.
-
-@item awk 'length($0) > 80'
-This program prints every line longer than 80 characters. The sole
-rule has a relational expression as its pattern, and has no action (so the
-default action, printing the record, is used).
-
-@item awk 'NF > 0'
-This program prints every line that has at least one field. This is an
-easy way to delete blank lines from a file (or rather, to create a new
-file similar to the old file but from which the blank lines have been
-deleted).
-
-@item awk '@{ if (NF > 0) print @}'
-This program also prints every line that has at least one field. Here we
-allow the rule to match every line, then decide in the action whether
-to print.
-
-@item awk@ 'BEGIN@ @{@ for (i = 1; i <= 7; i++)
-@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ print int(101 * rand()) @}'
-This program prints 7 random numbers from 0 to 100, inclusive.
-
-@item ls -l @var{files} | awk '@{ x += $4 @} ; END @{ print "total bytes: " x @}'
-This program prints the total number of bytes used by @var{files}.
-
-@item expand@ @var{file}@ |@ awk@ '@{ if (x < length()) x = length() @}
-@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ END @{ print "maximum line length is " x @}'
-This program prints the maximum line length of @var{file}. The input
-is piped through the @code{expand} program to change tabs into spaces,
-so the widths compared are actually the right-margin columns.
-
-@item awk 'BEGIN @{ FS = ":" @}
-@itemx @ @ @ @ @ @{ print $1 | "sort" @}' /etc/passwd
-This program prints a sorted list of the login names of all users.
-
-@item awk '@{ nlines++ @}
-@itemx @ @ @ @ @ END@ @{ print nlines @}'
-This program counts lines in a file.
-
-@item awk 'END @{ print NR @}'
-This program also counts lines in a file, but lets @code{awk} do the work.
-
-@item awk '@{ print NR, $0 @}'
-This program adds line numbers to all its input files,
-similar to @samp{cat -n}.
-@end table
-
-@node Patterns, Actions, One-liners, Top
-@chapter Patterns
-@cindex pattern, definition of
-
-Patterns in @code{awk} control the execution of rules: a rule is
-executed when its pattern matches the current input record. This
-chapter tells all about how to write patterns.
-
-@menu
-* Kinds of Patterns:: A list of all kinds of patterns.
- The following subsections describe
- them in detail.
-* Regexp:: Regular expressions such as @samp{/foo/}.
-* Comparison Patterns:: Comparison expressions such as @code{$1 > 10}.
-* Boolean Patterns:: Combining comparison expressions.
-* Expression Patterns:: Any expression can be used as a pattern.
-* Ranges:: Pairs of patterns specify record ranges.
-* BEGIN/END:: Specifying initialization and cleanup rules.
-* Empty:: The empty pattern, which matches every record.
-@end menu
-
-@node Kinds of Patterns, Regexp, Patterns, Patterns
-@section Kinds of Patterns
-@cindex patterns, types of
-
-Here is a summary of the types of patterns supported in @code{awk}.
-@c At the next rewrite, check to see that this order matches the
-@c order in the text. It might not matter to a reader, but it's good
-@c style. Also, it might be nice to mention all the topics of sections
-@c that follow in this list; that way people can scan and know when to
-@c expect a specific topic. Specifically please also make an entry
-@c for Boolean operators as patterns in the right place. --mew
-
-@table @code
-@item /@var{regular expression}/
-A regular expression as a pattern. It matches when the text of the
-input record fits the regular expression.
-(@xref{Regexp, ,Regular Expressions as Patterns}.)@refill
-
-@item @var{expression}
-A single expression. It matches when its value, converted to a number,
-is nonzero (if a number) or nonnull (if a string).
-(@xref{Expression Patterns, ,Expressions as Patterns}.)@refill
-
-@item @var{pat1}, @var{pat2}
-A pair of patterns separated by a comma, specifying a range of records.
-(@xref{Ranges, ,Specifying Record Ranges with Patterns}.)
-
-@item BEGIN
-@itemx END
-Special patterns to supply start-up or clean-up information to
-@code{awk}. (@xref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}.)
-
-@item @var{null}
-The empty pattern matches every input record.
-(@xref{Empty, ,The Empty Pattern}.)@refill
-@end table
-
-
-@node Regexp, Comparison Patterns, Kinds of Patterns, Patterns
-@section Regular Expressions as Patterns
-@cindex pattern, regular expressions
-@cindex regexp
-@cindex regular expressions as patterns
-
-A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a
-class of strings. A regular expression enclosed in slashes (@samp{/})
-is an @code{awk} pattern that matches every input record whose text
-belongs to that class.
-
-The simplest regular expression is a sequence of letters, numbers, or
-both. Such a regexp matches any string that contains that sequence.
-Thus, the regexp @samp{foo} matches any string containing @samp{foo}.
-Therefore, the pattern @code{/foo/} matches any input record containing
-@samp{foo}. Other kinds of regexps let you specify more complicated
-classes of strings.
-
-@menu
-* Regexp Usage:: How to Use Regular Expressions
-* Regexp Operators:: Regular Expression Operators
-* Case-sensitivity:: How to do case-insensitive matching.
-@end menu
-
-@node Regexp Usage, Regexp Operators, Regexp, Regexp
-@subsection How to Use Regular Expressions
-
-A regular expression can be used as a pattern by enclosing it in
-slashes. Then the regular expression is matched against the
-entire text of each record. (Normally, it only needs
-to match some part of the text in order to succeed.) For example, this
-prints the second field of each record that contains @samp{foo} anywhere:
-
-@example
-awk '/foo/ @{ print $2 @}' BBS-list
-@end example
-
-@cindex regular expression matching operators
-@cindex string-matching operators
-@cindex operators, string-matching
-@cindex operators, regexp matching
-@cindex regexp search operators
-Regular expressions can also be used in comparison expressions. Then
-you can specify the string to match against; it need not be the entire
-current input record. These comparison expressions can be used as
-patterns or in @code{if}, @code{while}, @code{for}, and @code{do} statements.
-
-@table @code
-@item @var{exp} ~ /@var{regexp}/
-This is true if the expression @var{exp} (taken as a character string)
-is matched by @var{regexp}. The following example matches, or selects,
-all input records with the upper-case letter @samp{J} somewhere in the
-first field:@refill
-
-@example
-awk '$1 ~ /J/' inventory-shipped
-@end example
-
-So does this:
-
-@example
-awk '@{ if ($1 ~ /J/) print @}' inventory-shipped
-@end example
-
-@item @var{exp} !~ /@var{regexp}/
-This is true if the expression @var{exp} (taken as a character string)
-is @emph{not} matched by @var{regexp}. The following example matches,
-or selects, all input records whose first field @emph{does not} contain
-the upper-case letter @samp{J}:@refill
-
-@example
-awk '$1 !~ /J/' inventory-shipped
-@end example
-@end table
-
-@cindex computed regular expressions
-@cindex regular expressions, computed
-@cindex dynamic regular expressions
-The right hand side of a @samp{~} or @samp{!~} operator need not be a
-constant regexp (i.e., a string of characters between slashes). It may
-be any expression. The expression is evaluated, and converted if
-necessary to a string; the contents of the string are used as the
-regexp. A regexp that is computed in this way is called a @dfn{dynamic
-regexp}. For example:
-
-@example
-identifier_regexp = "[A-Za-z_][A-Za-z_0-9]+"
-$0 ~ identifier_regexp
-@end example
-
-@noindent
-sets @code{identifier_regexp} to a regexp that describes @code{awk}
-variable names, and tests if the input record matches this regexp.
-
-@node Regexp Operators, Case-sensitivity, Regexp Usage, Regexp
-@subsection Regular Expression Operators
-@cindex metacharacters
-@cindex regular expression metacharacters
-
-You can combine regular expressions with the following characters,
-called @dfn{regular expression operators}, or @dfn{metacharacters}, to
-increase the power and versatility of regular expressions.
-
-Here is a table of metacharacters. All characters not listed in the
-table stand for themselves.
-
-@table @code
-@item ^
-This matches the beginning of the string or the beginning of a line
-within the string. For example:
-
-@example
-^@@chapter
-@end example
-
-@noindent
-matches the @samp{@@chapter} at the beginning of a string, and can be used
-to identify chapter beginnings in Texinfo source files.
-
-@item $
-This is similar to @samp{^}, but it matches only at the end of a string
-or the end of a line within the string. For example:
-
-@example
-p$
-@end example
-
-@noindent
-matches a record that ends with a @samp{p}.
-
-@item .
-This matches any single character except a newline. For example:
-
-@example
-.P
-@end example
-
-@noindent
-matches any single character followed by a @samp{P} in a string. Using
-concatenation we can make regular expressions like @samp{U.A}, which
-matches any three-character sequence that begins with @samp{U} and ends
-with @samp{A}.
-
-@item [@dots{}]
-This is called a @dfn{character set}. It matches any one of the
-characters that are enclosed in the square brackets. For example:
-
-@example
-[MVX]
-@end example
-
-@noindent
-matches any one of the characters @samp{M}, @samp{V}, or @samp{X} in a
-string.@refill
-
-Ranges of characters are indicated by using a hyphen between the beginning
-and ending characters, and enclosing the whole thing in brackets. For
-example:@refill
-
-@example
-[0-9]
-@end example
-
-@noindent
-matches any digit.
-
-To include the character @samp{\}, @samp{]}, @samp{-} or @samp{^} in a
-character set, put a @samp{\} in front of it. For example:
-
-@example
-[d\]]
-@end example
-
-@noindent
-matches either @samp{d}, or @samp{]}.@refill
-
-This treatment of @samp{\} is compatible with other @code{awk}
-implementations, and is also mandated by the @sc{posix} Command Language
-and Utilities standard. The regular expressions in @code{awk} are a superset
-of the @sc{posix} specification for Extended Regular Expressions (EREs).
-@sc{posix} EREs are based on the regular expressions accepted by the
-traditional @code{egrep} utility.
-
-In @code{egrep} syntax, backslash is not syntactically special within
-square brackets. This means that special tricks have to be used to
-represent the characters @samp{]}, @samp{-} and @samp{^} as members of a
-character set.
-
-In @code{egrep} syntax, to match @samp{-}, write it as @samp{---},
-which is a range containing only @w{@samp{-}.} You may also give @samp{-}
-as the first or last character in the set. To match @samp{^}, put it
-anywhere except as the first character of a set. To match a @samp{]},
-make it the first character in the set. For example:@refill
-
-@example
-[]d^]
-@end example
-
-@noindent
-matches either @samp{]}, @samp{d} or @samp{^}.@refill
-
-@item [^ @dots{}]
-This is a @dfn{complemented character set}. The first character after
-the @samp{[} @emph{must} be a @samp{^}. It matches any characters
-@emph{except} those in the square brackets (or newline). For example:
-
-@example
-[^0-9]
-@end example
-
-@noindent
-matches any character that is not a digit.
-
-@item |
-This is the @dfn{alternation operator} and it is used to specify
-alternatives. For example:
-
-@example
-^P|[0-9]
-@end example
-
-@noindent
-matches any string that matches either @samp{^P} or @samp{[0-9]}. This
-means it matches any string that contains a digit or starts with @samp{P}.
-
-The alternation applies to the largest possible regexps on either side.
-
-@item (@dots{})
-Parentheses are used for grouping in regular expressions as in
-arithmetic. They can be used to concatenate regular expressions
-containing the alternation operator, @samp{|}.
-
-@item *
-This symbol means that the preceding regular expression is to be
-repeated as many times as possible to find a match. For example:
-
-@example
-ph*
-@end example
-
-@noindent
-applies the @samp{*} symbol to the preceding @samp{h} and looks for matches
-to one @samp{p} followed by any number of @samp{h}s. This will also match
-just @samp{p} if no @samp{h}s are present.
-
-The @samp{*} repeats the @emph{smallest} possible preceding expression.
-(Use parentheses if you wish to repeat a larger expression.) It finds
-as many repetitions as possible. For example:
-
-@example
-awk '/\(c[ad][ad]*r x\)/ @{ print @}' sample
-@end example
-
-@noindent
-prints every record in the input containing a string of the form
-@samp{(car x)}, @samp{(cdr x)}, @samp{(cadr x)}, and so on.@refill
-
-@item +
-This symbol is similar to @samp{*}, but the preceding expression must be
-matched at least once. This means that:
-
-@example
-wh+y
-@end example
-
-@noindent
-would match @samp{why} and @samp{whhy} but not @samp{wy}, whereas
-@samp{wh*y} would match all three of these strings. This is a simpler
-way of writing the last @samp{*} example:
-
-@example
-awk '/\(c[ad]+r x\)/ @{ print @}' sample
-@end example
-
-@item ?
-This symbol is similar to @samp{*}, but the preceding expression can be
-matched once or not at all. For example:
-
-@example
-fe?d
-@end example
-
-@noindent
-will match @samp{fed} and @samp{fd}, but nothing else.@refill
-
-@item \
-This is used to suppress the special meaning of a character when
-matching. For example:
-
-@example
-\$
-@end example
-
-@noindent
-matches the character @samp{$}.
-
-The escape sequences used for string constants
-(@pxref{Constants, ,Constant Expressions}) are
-valid in regular expressions as well; they are also introduced by a
-@samp{\}.@refill
-@end table
-
-In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators have
-the highest precedence, followed by concatenation, and finally by @samp{|}.
-As in arithmetic, parentheses can change how operators are grouped.@refill
-
-@node Case-sensitivity, , Regexp Operators, Regexp
-@subsection Case-sensitivity in Matching
-
-Case is normally significant in regular expressions, both when matching
-ordinary characters (i.e., not metacharacters), and inside character
-sets. Thus a @samp{w} in a regular expression matches only a lower case
-@samp{w} and not an upper case @samp{W}.
-
-The simplest way to do a case-independent match is to use a character
-set: @samp{[Ww]}. However, this can be cumbersome if you need to use it
-often; and it can make the regular expressions harder for humans to
-read. There are two other alternatives that you might prefer.
-
-One way to do a case-insensitive match at a particular point in the
-program is to convert the data to a single case, using the
-@code{tolower} or @code{toupper} built-in string functions (which we
-haven't discussed yet;
-@pxref{String Functions, ,Built-in Functions for String Manipulation}).
-For example:@refill
-
-@example
-tolower($1) ~ /foo/ @{ @dots{} @}
-@end example
-
-@noindent
-converts the first field to lower case before matching against it.
-
-Another method is to set the variable @code{IGNORECASE} to a nonzero
-value (@pxref{Built-in Variables}). When @code{IGNORECASE} is not zero,
-@emph{all} regexp operations ignore case. Changing the value of
-@code{IGNORECASE} dynamically controls the case sensitivity of your
-program as it runs. Case is significant by default because
-@code{IGNORECASE} (like most variables) is initialized to zero.
-
-@example
-x = "aB"
-if (x ~ /ab/) @dots{} # this test will fail
-
-IGNORECASE = 1
-if (x ~ /ab/) @dots{} # now it will succeed
-@end example
-
-In general, you cannot use @code{IGNORECASE} to make certain rules
-case-insensitive and other rules case-sensitive, because there is no way
-to set @code{IGNORECASE} just for the pattern of a particular rule. To
-do this, you must use character sets or @code{tolower}. However, one
-thing you can do only with @code{IGNORECASE} is turn case-sensitivity on
-or off dynamically for all the rules at once.@refill
-
-@code{IGNORECASE} can be set on the command line, or in a @code{BEGIN}
-rule. Setting @code{IGNORECASE} from the command line is a way to make
-a program case-insensitive without having to edit it.
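-
-For example (a small sketch; @file{BBS-list} is the sample data file
-used throughout this manual), the following command matches @samp{foo},
-@samp{Foo}, @samp{FOO}, and so on, without any change to the program
-text itself:
-
-@example
-awk -v IGNORECASE=1 '/foo/' BBS-list
-@end example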
-
-The value of @code{IGNORECASE} has no effect if @code{gawk} is in
-compatibility mode (@pxref{Command Line, ,Invoking @code{awk}}).
-Case is always significant in compatibility mode.@refill
-
-@node Comparison Patterns, Boolean Patterns, Regexp, Patterns
-@section Comparison Expressions as Patterns
-@cindex comparison expressions as patterns
-@cindex pattern, comparison expressions
-@cindex relational operators
-@cindex operators, relational
-
-@dfn{Comparison patterns} test relationships such as equality between
-two strings or numbers. They are a special case of expression patterns
-(@pxref{Expression Patterns, ,Expressions as Patterns}). They are written
-with @dfn{relational operators}, which are a superset of those in C.
-Here is a table of them:@refill
-
-@table @code
-@item @var{x} < @var{y}
-True if @var{x} is less than @var{y}.
-
-@item @var{x} <= @var{y}
-True if @var{x} is less than or equal to @var{y}.
-
-@item @var{x} > @var{y}
-True if @var{x} is greater than @var{y}.
-
-@item @var{x} >= @var{y}
-True if @var{x} is greater than or equal to @var{y}.
-
-@item @var{x} == @var{y}
-True if @var{x} is equal to @var{y}.
-
-@item @var{x} != @var{y}
-True if @var{x} is not equal to @var{y}.
-
-@item @var{x} ~ @var{y}
-True if @var{x} matches the regular expression described by @var{y}.
-
-@item @var{x} !~ @var{y}
-True if @var{x} does not match the regular expression described by @var{y}.
-@end table
-
-The operands of a relational operator are compared as numbers if they
-are both numbers. Otherwise they are converted to, and compared as,
-strings (@pxref{Conversion, ,Conversion of Strings and Numbers},
-for the detailed rules). Strings are compared by comparing the first
-character of each, then the second character of each,
-and so on, until there is a difference. If the two strings are equal until
-the shorter one runs out, the shorter one is considered to be less than the
-longer one. Thus, @code{"10"} is less than @code{"9"}, and @code{"abc"}
-is less than @code{"abcd"}.@refill
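-
-You can see the string ordering rule in action with a one-line program
-(a small sketch you can run from the shell):
-
-@example
-awk 'BEGIN @{ print ("10" < "9") @}'
-@end example
-
-@noindent
-This prints @samp{1} (true), because both operands are string constants
-and so a string comparison is performed.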
-
-The left operand of the @samp{~} and @samp{!~} operators is a string.
-The right operand is either a constant regular expression enclosed in
-slashes (@code{/@var{regexp}/}), or any expression, whose string value
-is used as a dynamic regular expression
-(@pxref{Regexp Usage, ,How to Use Regular Expressions}).@refill
-
-The following example prints the second field of each input record
-whose first field is precisely @samp{foo}.
-
-@example
-awk '$1 == "foo" @{ print $2 @}' BBS-list
-@end example
-
-@noindent
-Contrast this with the following regular expression match, which would
-accept any record with a first field that contains @samp{foo}:
-
-@example
-awk '$1 ~ "foo" @{ print $2 @}' BBS-list
-@end example
-
-@noindent
-or, equivalently, this one:
-
-@example
-awk '$1 ~ /foo/ @{ print $2 @}' BBS-list
-@end example
-
-@node Boolean Patterns, Expression Patterns, Comparison Patterns, Patterns
-@section Boolean Operators and Patterns
-@cindex patterns, boolean
-@cindex boolean patterns
-
-A @dfn{boolean pattern} is an expression which combines other patterns
-using the @dfn{boolean operators} ``or'' (@samp{||}), ``and''
-(@samp{&&}), and ``not'' (@samp{!}). Whether the boolean pattern
-matches an input record depends on whether its subpatterns match.
-
-For example, the following command prints all records in the input file
-@file{BBS-list} that contain both @samp{2400} and @samp{foo}.@refill
-
-@example
-awk '/2400/ && /foo/' BBS-list
-@end example
-
-The following command prints all records in the input file
-@file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo}, or
-both.@refill
-
-@example
-awk '/2400/ || /foo/' BBS-list
-@end example
-
-The following command prints all records in the input file
-@file{BBS-list} that do @emph{not} contain the string @samp{foo}.
-
-@example
-awk '! /foo/' BBS-list
-@end example
-
-Note that boolean patterns are a special case of expression patterns
-(@pxref{Expression Patterns, ,Expressions as Patterns}); they are
-expressions that use the boolean operators.
-@xref{Boolean Ops, ,Boolean Expressions}, for complete information
-on the boolean operators.@refill
-
-The subpatterns of a boolean pattern can be constant regular
-expressions, comparisons, or any other @code{awk} expressions. Range
-patterns are not expressions, so they cannot appear inside boolean
-patterns. Likewise, the special patterns @code{BEGIN} and @code{END},
-which never match any input record, are not expressions and cannot
-appear inside boolean patterns.
-
-@node Expression Patterns, Ranges, Boolean Patterns, Patterns
-@section Expressions as Patterns
-
-Any @code{awk} expression is also valid as an @code{awk} pattern.
-Then the pattern ``matches'' if the expression's value is nonzero (if a
-number) or nonnull (if a string).
-
-The expression is reevaluated each time the rule is tested against a new
-input record. If the expression uses fields such as @code{$1}, the
-value depends directly on the new input record's text; otherwise, it
-depends only on what has happened so far in the execution of the
-@code{awk} program, but that may still be useful.
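-
-For example, the following program (a small sketch) uses @code{NR % 2}
-as its pattern; the expression is 1 (nonzero) on odd-numbered records
-and 0 on even-numbered ones, so only the odd-numbered records are
-printed:
-
-@example
-awk 'NR % 2' BBS-list
-@end example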
-
-Comparison patterns are actually a special case of this. For
-example, the expression @code{$5 == "foo"} has the value 1 when the
-value of @code{$5} equals @code{"foo"}, and 0 otherwise; therefore, this
-expression as a pattern matches when the two values are equal.
-
-Boolean patterns are also special cases of expression patterns.
-
-A constant regexp as a pattern is also a special case of an expression
-pattern. @code{/foo/} as an expression has the value 1 if @samp{foo}
-appears in the current input record; thus, as a pattern, @code{/foo/}
-matches any record containing @samp{foo}.
-
-Other implementations of @code{awk} that are not yet @sc{posix} compliant
-are less general than @code{gawk}: they allow comparison expressions, and
-boolean combinations thereof (optionally with parentheses), but not
-necessarily other kinds of expressions.
-
-@node Ranges, BEGIN/END, Expression Patterns, Patterns
-@section Specifying Record Ranges with Patterns
-
-@cindex range pattern
-@cindex patterns, range
-A @dfn{range pattern} is made of two patterns separated by a comma, of
-the form @code{@var{begpat}, @var{endpat}}. It matches ranges of
-consecutive input records. The first pattern @var{begpat} controls
-where the range begins, and the second one @var{endpat} controls where
-it ends. For example,@refill
-
-@example
-awk '$1 == "on", $1 == "off"'
-@end example
-
-@noindent
-prints every record between @samp{on}/@samp{off} pairs, inclusive.
-
-A range pattern starts out by matching @var{begpat}
-against every input record; when a record matches @var{begpat}, the
-range pattern becomes @dfn{turned on}. The range pattern matches this
-record. As long as it stays turned on, it automatically matches every
-input record read. It also matches @var{endpat} against
-every input record; when that succeeds, the range pattern is turned
-off again for the following record. Now it goes back to checking
-@var{begpat} against each record.
-
-The record that turns on the range pattern and the one that turns it
-off both match the range pattern. If you don't want to operate on
-these records, you can write @code{if} statements in the rule's action
-to distinguish them.
-
-It is possible for a pattern to be turned both on and off by the same
-record, if both conditions are satisfied by that record. Then the action is
-executed for just that record.
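-
-For example, the following variation (a sketch) uses an @code{if}
-statement in the action to print only the records strictly between the
-@samp{on} and @samp{off} lines, skipping the delimiter records
-themselves:
-
-@example
-awk '$1 == "on", $1 == "off" @{
-    if ($1 != "on" && $1 != "off")
-        print
-@}'
-@end example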
-
-@node BEGIN/END, Empty, Ranges, Patterns
-@section @code{BEGIN} and @code{END} Special Patterns
-
-@cindex @code{BEGIN} special pattern
-@cindex patterns, @code{BEGIN}
-@cindex @code{END} special pattern
-@cindex patterns, @code{END}
-@code{BEGIN} and @code{END} are special patterns. They are not used to
-match input records. Rather, they are used for supplying start-up or
-clean-up information to your @code{awk} script. A @code{BEGIN} rule is
-executed, once, before the first input record has been read. An @code{END}
-rule is executed, once, after all the input has been read. For
-example:@refill
-
-@example
-awk 'BEGIN @{ print "Analysis of `foo'" @}
- /foo/ @{ ++foobar @}
- END @{ print "`foo' appears " foobar " times." @}' BBS-list
-@end example
-
-This program finds the number of records in the input file @file{BBS-list}
-that contain the string @samp{foo}. The @code{BEGIN} rule prints a title
-for the report. There is no need to use the @code{BEGIN} rule to
-initialize the counter @code{foobar} to zero, as @code{awk} does this
-for us automatically (@pxref{Variables}).
-
-The second rule increments the variable @code{foobar} every time a
-record containing the pattern @samp{foo} is read. The @code{END} rule
-prints the value of @code{foobar} at the end of the run.@refill
-
-The special patterns @code{BEGIN} and @code{END} cannot be used in ranges
-or with boolean operators (indeed, they cannot be used with any operators).
-
-An @code{awk} program may have multiple @code{BEGIN} and/or @code{END}
-rules. They are executed in the order they appear, all the @code{BEGIN}
-rules at start-up and all the @code{END} rules at termination.
-
-Multiple @code{BEGIN} and @code{END} sections are useful for writing
-library functions, since each library can have its own @code{BEGIN} or
-@code{END} rule to do its own initialization and/or cleanup. Note that
-the order in which library functions are named on the command line
-controls the order in which their @code{BEGIN} and @code{END} rules are
-executed. Therefore you have to be careful to write such rules in
-library files so that the order in which they are executed doesn't matter.
-@xref{Command Line, ,Invoking @code{awk}}, for more information on
-using library functions.
-
-If an @code{awk} program only has a @code{BEGIN} rule, and no other
-rules, then the program exits after the @code{BEGIN} rule has been run.
-(Older versions of @code{awk} used to keep reading and ignoring input
-until end of file was seen.) However, if an @code{END} rule exists as
-well, then the input will be read, even if there are no other rules in
-the program. This is necessary in case the @code{END} rule checks the
-@code{NR} variable.
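-
-For example, this program (a small sketch, run with no input files)
-prints its message and exits immediately, without waiting for input:
-
-@example
-awk 'BEGIN @{ print "hello, world" @}'
-@end example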
-
-@code{BEGIN} and @code{END} rules must have actions; there is no default
-action for these rules since there is no current record when they run.
-
-@node Empty, , BEGIN/END, Patterns
-@comment node-name, next, previous, up
-@section The Empty Pattern
-
-@cindex empty pattern
-@cindex pattern, empty
-An empty pattern is considered to match @emph{every} input record. For
-example, the program:@refill
-
-@example
-awk '@{ print $1 @}' BBS-list
-@end example
-
-@noindent
-prints the first field of every record.
-
-@node Actions, Expressions, Patterns, Top
-@chapter Overview of Actions
-@cindex action, definition of
-@cindex curly braces
-@cindex action, curly braces
-@cindex action, separating statements
-
-An @code{awk} program or script consists of a series of
-rules and function definitions, interspersed. (Functions are
-described later. @xref{User-defined, ,User-defined Functions}.)
-
-A rule contains a pattern and an action, either of which may be
-omitted. The purpose of the @dfn{action} is to tell @code{awk} what to do
-once a match for the pattern is found. Thus, the entire program
-looks somewhat like this:
-
-@example
-@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
-@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
-@dots{}
-function @var{name} (@var{args}) @{ @dots{} @}
-@dots{}
-@end example
-
-An action consists of one or more @code{awk} @dfn{statements}, enclosed
-in curly braces (@samp{@{} and @samp{@}}). Each statement specifies one
-thing to be done. The statements are separated by newlines or
-semicolons.
-
-The curly braces around an action must be used even if the action
-contains only one statement, or even if it contains no statements at
-all. However, if you omit the action entirely, omit the curly braces as
-well. (An omitted action is equivalent to @samp{@{ print $0 @}}.)
-
-Here are the kinds of statements supported in @code{awk}:
-
-@itemize @bullet
-@item
-Expressions, which can call functions or assign values to variables
-(@pxref{Expressions, ,Expressions as Action Statements}). Executing
-this kind of statement simply computes the value of the expression and
-then ignores it. This is useful when the expression has side effects
-(@pxref{Assignment Ops, ,Assignment Expressions}).@refill
-
-@item
-Control statements, which specify the control flow of @code{awk}
-programs. The @code{awk} language gives you C-like constructs
-(@code{if}, @code{for}, @code{while}, and so on) as well as a few
-special ones (@pxref{Statements, ,Control Statements in Actions}).@refill
-
-@item
-Compound statements, which consist of one or more statements enclosed in
-curly braces. A compound statement is used in order to put several
-statements together in the body of an @code{if}, @code{while}, @code{do}
-or @code{for} statement.
-
-@item
-Input control, using the @code{getline} command
-(@pxref{Getline, ,Explicit Input with @code{getline}}), and the @code{next}
-statement (@pxref{Next Statement, ,The @code{next} Statement}).
-
-@item
-Output statements, @code{print} and @code{printf}.
-@xref{Printing, ,Printing Output}.@refill
-
-@item
-Deletion statements, for deleting array elements.
-@xref{Delete, ,The @code{delete} Statement}.@refill
-@end itemize
-
-@iftex
-The next two chapters cover in detail expressions and control
-statements, respectively. We go on to treat arrays and built-in
-functions, both of which are used in expressions. Then we proceed
-to discuss how to define your own functions.
-@end iftex
-
-@node Expressions, Statements, Actions, Top
-@chapter Expressions as Action Statements
-@cindex expression
-
-Expressions are the basic building block of @code{awk} actions. An
-expression evaluates to a value, which you can print, test, store in a
-variable or pass to a function. But beyond that, an expression can
-assign a new value to a variable or a field, with an assignment operator.
-
-An expression can serve as a statement on its own. Most other kinds of
-statements contain one or more expressions which specify data to be
-operated on. As in other languages, expressions in @code{awk} include
-variables, array references, constants, and function calls, as well as
-combinations of these with various operators.
-
-@menu
-* Constants:: String, numeric, and regexp constants.
-* Variables:: Variables give names to values for later use.
-* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-}, etc.)
-* Concatenation:: Concatenating strings.
-* Comparison Ops:: Comparison of numbers and strings
- with @samp{<}, etc.
-* Boolean Ops:: Combining comparison expressions
- using boolean operators
- @samp{||} (``or''), @samp{&&} (``and'') and @samp{!} (``not'').
-
-* Assignment Ops:: Changing the value of a variable or a field.
-* Increment Ops:: Incrementing the numeric value of a variable.
-
-* Conversion:: The conversion of strings to numbers
- and vice versa.
-* Values:: The whole truth about numbers and strings.
-* Conditional Exp:: Conditional expressions select
- between two subexpressions under control
- of a third subexpression.
-* Function Calls:: A function call is an expression.
-* Precedence:: How various operators nest.
-@end menu
-
-@node Constants, Variables, Expressions, Expressions
-@section Constant Expressions
-@cindex constants, types of
-@cindex string constants
-
-The simplest type of expression is the @dfn{constant}, which always has
-the same value. There are three types of constants: numeric constants,
-string constants, and regular expression constants.
-
-@cindex numeric constant
-@cindex numeric value
-A @dfn{numeric constant} stands for a number. This number can be an
-integer, a decimal fraction, or a number in scientific (exponential)
-notation. Note that all numeric values are represented within
-@code{awk} in double-precision floating point. Here are some examples
-of numeric constants, which all have the same value:
-
-@example
-105
-1.05e+2
-1050e-1
-@end example
-
-A string constant consists of a sequence of characters enclosed in
-double-quote marks. For example:
-
-@example
-"parrot"
-@end example
-
-@noindent
-@iftex
-@cindex differences between @code{gawk} and @code{awk}
-@end iftex
-represents the string whose contents are @samp{parrot}. Strings in
-@code{gawk} can be of any length and they can contain all the possible
-8-bit ASCII characters including ASCII NUL. Other @code{awk}
-implementations may have difficulty with some character codes.@refill
-
-@cindex escape sequence notation
-Some characters cannot be included literally in a string constant. You
-represent them instead with @dfn{escape sequences}, which are character
-sequences beginning with a backslash (@samp{\}).
-
-One use of an escape sequence is to include a double-quote character in
-a string constant. Since a plain double-quote would end the string, you
-must use @samp{\"} to represent a single double-quote character as a
-part of the string.
-The
-backslash character itself is another character that cannot be
-included normally; you write @samp{\\} to put one backslash in the
-string. Thus, the string whose contents are the two characters
-@samp{"\} must be written @code{"\"\\"}.
-
-Another use of backslash is to represent unprintable characters
-such as newline. While there is nothing to stop you from writing most
-of these characters directly in a string constant, they may look ugly.
-
-Here is a table of all the escape sequences used in @code{awk}:
-
-@table @code
-@item \\
-Represents a literal backslash, @samp{\}.
-
-@item \a
-Represents the ``alert'' character, control-g, ASCII code 7.
-
-@item \b
-Represents a backspace, control-h, ASCII code 8.
-
-@item \f
-Represents a formfeed, control-l, ASCII code 12.
-
-@item \n
-Represents a newline, control-j, ASCII code 10.
-
-@item \r
-Represents a carriage return, control-m, ASCII code 13.
-
-@item \t
-Represents a horizontal tab, control-i, ASCII code 9.
-
-@item \v
-Represents a vertical tab, control-k, ASCII code 11.
-
-@item \@var{nnn}
-Represents the octal value @var{nnn}, where @var{nnn} are one to three
-digits between 0 and 7. For example, the code for the ASCII ESC
-(escape) character is @samp{\033}.@refill
-
-@item \x@var{hh}@dots{}
-Represents the hexadecimal value @var{hh}, where @var{hh} are hexadecimal
-digits (@samp{0} through @samp{9} and either @samp{A} through @samp{F} or
-@samp{a} through @samp{f}). Like the same construct in @sc{ansi} C, the escape
-sequence continues until the first non-hexadecimal digit is seen. However,
-using more than two hexadecimal digits produces undefined results. (The
-@samp{\x} escape sequence is not allowed in @sc{posix} @code{awk}.)@refill
-@end table
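-
-As a short illustration (a sketch), the following program uses the
-@samp{\t} and @samp{\n} escape sequences to print two tab-separated
-lines:
-
-@example
-awk 'BEGIN @{ print "a\tb\nc\td" @}'
-@end example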
-
-A @dfn{constant regexp} is a regular expression description enclosed in
-slashes, such as @code{/^beginning and end$/}. Most regexps used in
-@code{awk} programs are constant, but the @samp{~} and @samp{!~}
-operators can also match computed or ``dynamic'' regexps
-(@pxref{Regexp Usage, ,How to Use Regular Expressions}).@refill
-
-Constant regexps may be used like simple expressions. When a
-constant regexp is not on the right hand side of the @samp{~} or
-@samp{!~} operators, it has the same meaning as if it appeared
-in a pattern, i.e. @samp{($0 ~ /foo/)}
-(@pxref{Expression Patterns, ,Expressions as Patterns}).
-This means that the two code segments,@refill
-
-@example
-if ($0 ~ /barfly/ || $0 ~ /camelot/)
- print "found"
-@end example
-
-@noindent
-and
-
-@example
-if (/barfly/ || /camelot/)
- print "found"
-@end example
-
-@noindent
-are exactly equivalent. One rather bizarre consequence of this rule is
-that the following boolean expression is legal, but does not do what the user
-intended:@refill
-
-@example
-if (/foo/ ~ $1) print "found foo"
-@end example
-
-This code is ``obviously'' testing @code{$1} for a match against the regexp
-@code{/foo/}. But in fact, the expression @code{(/foo/ ~ $1)} actually means
-@code{(($0 ~ /foo/) ~ $1)}. In other words, first match the input record
-against the regexp @code{/foo/}. The result will be either a 0 or a 1,
-depending upon the success or failure of the match. Then match that result
-against the first field in the record.@refill
-
-Since it is unlikely that you would ever really wish to make this kind of
-test, @code{gawk} will issue a warning when it sees this construct in
-a program.@refill
-
-Another consequence of this rule is that the assignment statement
-
-@example
-matches = /foo/
-@end example
-
-@noindent
-will assign either 0 or 1 to the variable @code{matches}, depending
-upon the contents of the current input record.
-
-Constant regular expressions are also used as the first argument for
-the @code{sub} and @code{gsub} functions
-(@pxref{String Functions, ,Built-in Functions for String Manipulation}).@refill
-
-This feature of the language was never well documented until the
-@sc{posix} specification.
-
-You may be wondering, when is
-
-@example
-$1 ~ /foo/ @{ @dots{} @}
-@end example
-
-@noindent
-preferable to
-
-@example
-$1 ~ "foo" @{ @dots{} @}
-@end example
-
-Since the right-hand sides of both @samp{~} operators are constants,
-it is more efficient to use the @samp{/foo/} form: @code{awk} can note
-that you have supplied a regexp and store it internally in a form that
-makes pattern matching more efficient. In the second form, @code{awk}
-must first convert the string into this internal form, and then perform
-the pattern matching. The first form is also better style; it shows
-clearly that you intend a regexp match.
-
-@node Variables, Arithmetic Ops, Constants, Expressions
-@section Variables
-@cindex variables, user-defined
-@cindex user-defined variables
-@c there should be more than one subsection, ideally. Not a big deal.
-@c But usually there are supposed to be at least two. One way to get
-@c around this is to write the info in the subsection as the info in the
-@c section itself and not have any subsections.. --mew
-
-Variables let you give names to values and refer to them later. You have
-already seen variables in many of the examples. The name of a variable
-must be a sequence of letters, digits and underscores, but it may not begin
-with a digit. Case is significant in variable names; @code{a} and @code{A}
-are distinct variables.
-
-A variable name is a valid expression by itself; it represents the
-variable's current value. Variables are given new values with
-@dfn{assignment operators} and @dfn{increment operators}.
-@xref{Assignment Ops, ,Assignment Expressions}.
-
-A few variables have special built-in meanings, such as @code{FS}, the
-field separator, and @code{NF}, the number of fields in the current
-input record. @xref{Built-in Variables}, for a list of them. These
-built-in variables can be used and assigned just like all other
-variables, but their values are also used or changed automatically by
-@code{awk}. Each built-in variable's name is made entirely of upper case
-letters.
-
-Variables in @code{awk} can be assigned either numeric or string
-values. By default, variables are initialized to the null string, which
-is effectively zero if converted to a number. There is no need to
-``initialize'' each variable explicitly in @code{awk}, the way you
-would in C or most other traditional languages.
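-
-For example (a small sketch), an unassigned variable acts as zero in
-arithmetic and as the empty string in concatenation:
-
-@example
-awk 'BEGIN @{ print x + 0, "<" x ">" @}'
-@end example
-
-@noindent
-This prints @samp{0 <>}.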
-
-@menu
-* Assignment Options:: Setting variables on the command line
- and a summary of command line syntax.
- This is an advanced method of input.
-@end menu
-
-@node Assignment Options, , Variables, Variables
-@subsection Assigning Variables on the Command Line
-
-You can set any @code{awk} variable by including a @dfn{variable assignment}
-among the arguments on the command line when you invoke @code{awk}
-(@pxref{Command Line, ,Invoking @code{awk}}). Such an assignment has
-this form:@refill
-
-@example
-@var{variable}=@var{text}
-@end example
-
-@noindent
-With it, you can set a variable either at the beginning of the
-@code{awk} run or in between input files.
-
-If you precede the assignment with the @samp{-v} option, like this:
-
-@example
--v @var{variable}=@var{text}
-@end example
-
-@noindent
-then the variable is set at the very beginning, before even the
-@code{BEGIN} rules are run. The @samp{-v} option and its assignment
-must precede all the file name arguments, as well as the program text.
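-
-For example (a sketch; the variable name @code{greeting} is arbitrary):
-
-@example
-awk -v greeting="hello" 'BEGIN @{ print greeting @}'
-@end example
-
-@noindent
-Here the assignment takes effect before the @code{BEGIN} rule runs, so
-the program prints @samp{hello}.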
-
-Otherwise, the variable assignment is performed at a time determined by
-its position among the input file arguments: after the processing of the
-preceding input file argument. For example:
-
-@example
-awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list
-@end example
-
-@noindent
-prints the value of field number @code{n} for all input records. Before
-the first file is read, the command line sets the variable @code{n}
-equal to 4. This causes the fourth field to be printed in lines from
-the file @file{inventory-shipped}. After the first file has finished,
-but before the second file is started, @code{n} is set to 2, so that the
-second field is printed in lines from @file{BBS-list}.
-
-Command line arguments are made available for explicit examination by
-the @code{awk} program in an array named @code{ARGV}
-(@pxref{Built-in Variables}).@refill
-
-@code{awk} processes the values of command line assignments for escape
-sequences (@pxref{Constants, ,Constant Expressions}).
-
-@node Arithmetic Ops, Concatenation, Variables, Expressions
-@section Arithmetic Operators
-@cindex arithmetic operators
-@cindex operators, arithmetic
-@cindex addition
-@cindex subtraction
-@cindex multiplication
-@cindex division
-@cindex remainder
-@cindex quotient
-@cindex exponentiation
-
-The @code{awk} language uses the common arithmetic operators when
-evaluating expressions. All of these arithmetic operators follow normal
-precedence rules, and work as you would expect them to. This example
-divides field three by field four, adds field two, stores the result
-into field one, and prints the resulting altered input record:
-
-@example
-awk '@{ $1 = $2 + $3 / $4; print @}' inventory-shipped
-@end example
-
-The arithmetic operators in @code{awk} are:
-
-@table @code
-@item @var{x} + @var{y}
-Addition.
-
-@item @var{x} - @var{y}
-Subtraction.
-
-@item - @var{x}
-Negation.
-
-@item + @var{x}
-Unary plus. No real effect on the expression.
-
-@item @var{x} * @var{y}
-Multiplication.
-
-@item @var{x} / @var{y}
-Division. Since all numbers in @code{awk} are double-precision
-floating point, the result is not rounded to an integer: @code{3 / 4}
-has the value 0.75.
-
-@item @var{x} % @var{y}
-@iftex
-@cindex differences between @code{gawk} and @code{awk}
-@end iftex
-Remainder. The quotient is rounded toward zero to an integer,
-multiplied by @var{y} and this result is subtracted from @var{x}.
-This operation is sometimes known as ``trunc-mod.'' The following
-relation always holds:
-
-@example
-b * int(a / b) + (a % b) == a
-@end example
-
-One possibly undesirable effect of this definition of remainder is that
-@code{@var{x} % @var{y}} is negative if @var{x} is negative. Thus,
-
-@example
--17 % 8 = -1
-@end example
-
-In other @code{awk} implementations, the signedness of the remainder
-may be machine dependent.
-
-@item @var{x} ^ @var{y}
-@itemx @var{x} ** @var{y}
-Exponentiation: @var{x} raised to the @var{y} power. @code{2 ^ 3} has
-the value 8. The character sequence @samp{**} is equivalent to
-@samp{^}. (The @sc{posix} standard only specifies the use of @samp{^}
-for exponentiation.)
-@end table
-
-@node Concatenation, Comparison Ops, Arithmetic Ops, Expressions
-@section String Concatenation
-
-@cindex string operators
-@cindex operators, string
-@cindex concatenation
-There is only one string operation: concatenation. It does not have a
-specific operator to represent it. Instead, concatenation is performed by
-writing expressions next to one another, with no operator. For example:
-
-@example
-awk '@{ print "Field number one: " $1 @}' BBS-list
-@end example
-
-@noindent
-produces, for the first record in @file{BBS-list}:
-
-@example
-Field number one: aardvark
-@end example
-
-Without the space in the string constant after the @samp{:}, the line
-would run together. For example:
-
-@example
-awk '@{ print "Field number one:" $1 @}' BBS-list
-@end example
-
-@noindent
-produces, for the first record in @file{BBS-list}:
-
-@example
-Field number one:aardvark
-@end example
-
-Since string concatenation does not have an explicit operator, it is
-often necessary to ensure that it happens where you want it to by
-enclosing the items to be concatenated in parentheses. For example, the
-following code fragment does not concatenate @code{file} and @code{name}
-as you might expect:
-
-@example
-file = "file"
-name = "name"
-print "something meaningful" > file name
-@end example
-
-@noindent
-It is necessary to use the following:
-
-@example
-print "something meaningful" > (file name)
-@end example
-
-We recommend you use parentheses around concatenation in all but the
-most common contexts (such as in the right-hand operand of @samp{=}).
-
-@ignore
-@code{gawk} actually now allows a concatenation on the right hand
-side of a @code{>} redirection, but other @code{awk}s don't. So for
-now we won't mention that fact.
-@end ignore
-
-@node Comparison Ops, Boolean Ops, Concatenation, Expressions
-@section Comparison Expressions
-@cindex comparison expressions
-@cindex expressions, comparison
-@cindex relational operators
-@cindex operators, relational
-@cindex regexp operators
-
-@dfn{Comparison expressions} compare strings or numbers for
-relationships such as equality. They are written using @dfn{relational
-operators}, which are a superset of those in C. Here is a table of
-them:
-
-@table @code
-@item @var{x} < @var{y}
-True if @var{x} is less than @var{y}.
-
-@item @var{x} <= @var{y}
-True if @var{x} is less than or equal to @var{y}.
-
-@item @var{x} > @var{y}
-True if @var{x} is greater than @var{y}.
-
-@item @var{x} >= @var{y}
-True if @var{x} is greater than or equal to @var{y}.
-
-@item @var{x} == @var{y}
-True if @var{x} is equal to @var{y}.
-
-@item @var{x} != @var{y}
-True if @var{x} is not equal to @var{y}.
-
-@item @var{x} ~ @var{y}
-True if the string @var{x} matches the regexp denoted by @var{y}.
-
-@item @var{x} !~ @var{y}
-True if the string @var{x} does not match the regexp denoted by @var{y}.
-
-@item @var{subscript} in @var{array}
-True if array @var{array} has an element with the subscript @var{subscript}.
-@end table
-
-Comparison expressions have the value 1 if true and 0 if false.
-
-The rules @code{gawk} uses for performing comparisons are based on those
-in draft 11.2 of the @sc{posix} standard. The @sc{posix} standard introduced
-the concept of a @dfn{numeric string}, which is simply a string that looks
-like a number, for example, @code{@w{" +2"}}.
-
-@vindex CONVFMT
-When performing a relational operation, @code{gawk} considers the type of an
-operand to be the type it received on its last @emph{assignment}, rather
-than the type of its last @emph{use}
-(@pxref{Values, ,Numeric and String Values}).
-This type is @emph{unknown} when the operand is from an ``external'' source:
-field variables, command line arguments, array elements resulting from a
-@code{split} operation, and the value of an @code{ENVIRON} element.
-In this case only, if the operand is a numeric string, then it is
-considered to be of both string type and numeric type. If at least one
-operand of a comparison is of string type only, then a string
-comparison is performed. Any numeric operand will be converted to a
-string using the value of @code{CONVFMT}
-(@pxref{Conversion, ,Conversion of Strings and Numbers}).
-If one operand of a comparison is numeric, and the other operand is
-either numeric or both numeric and string, then @code{gawk} does a
-numeric comparison. If both operands have both types, then the
-comparison is numeric. Strings are compared
-by comparing the first character of each, then the second character of each,
-and so on. Thus @code{"10"} is less than @code{"9"}. If there are two
-strings where one is a prefix of the other, the shorter string is less than
-the longer one. Thus @code{"abc"} is less than @code{"abcd"}.@refill
-
-Here are some sample expressions, how @code{gawk} compares them, and what
-the result of the comparison is.
-
-@table @code
-@item 1.5 <= 2.0
-numeric comparison (true)
-
-@item "abc" >= "xyz"
-string comparison (false)
-
-@item 1.5 != " +2"
-string comparison (true)
-
-@item "1e2" < "3"
-string comparison (true)
-
-@item a = 2; b = "2"
-@itemx a == b
-string comparison (true)
-@end table
-
-@example
-echo 1e2 3 | awk '@{ print ($1 < $2) ? "true" : "false" @}'
-@end example
-
-@noindent
-prints @samp{false} since both @code{$1} and @code{$2} are numeric
-strings and thus have both string and numeric types, which dictates
-a numeric comparison.
-
-The purpose of the comparison rules and the use of numeric strings is
-to attempt to produce the behavior that is ``least surprising,'' while
-still ``doing the right thing.''
-
-String comparisons and regular expression comparisons are very different.
-For example,
-
-@example
-$1 == "foo"
-@end example
-
-@noindent
-has the value of 1, or is true, if the first field of the current input
-record is precisely @samp{foo}. By contrast,
-
-@example
-$1 ~ /foo/
-@end example
-
-@noindent
-has the value 1 if the first field contains @samp{foo}, such as @samp{foobar}.
-
-The right hand operand of the @samp{~} and @samp{!~} operators may be
-either a constant regexp (@code{/@dots{}/}), or it may be an ordinary
-expression, in which case the value of the expression as a string is a
-dynamic regexp (@pxref{Regexp Usage, ,How to Use Regular Expressions}).
-
-@cindex regexp as expression
-In very recent implementations of @code{awk}, a constant regular
-expression in slashes by itself is also an expression. The regexp
-@code{/@var{regexp}/} is an abbreviation for this comparison expression:
-
-@example
-$0 ~ /@var{regexp}/
-@end example
-
-In some contexts it may be necessary to write parentheses around the
-regexp to avoid confusing the @code{gawk} parser. For example,
-@code{(/x/ - /y/) > threshold} is not allowed, but @code{((/x/) - (/y/))
-> threshold} parses properly.
-
-One special place where @code{/foo/} is @emph{not} an abbreviation for
-@code{$0 ~ /foo/} is when it is the right-hand operand of @samp{~} or
-@samp{!~}! @xref{Constants, ,Constant Expressions}, where this is
-discussed in more detail.
-
-@node Boolean Ops, Assignment Ops, Comparison Ops, Expressions
-@section Boolean Expressions
-@cindex expressions, boolean
-@cindex boolean expressions
-@cindex operators, boolean
-@cindex boolean operators
-@cindex logical operations
-@cindex and operator
-@cindex or operator
-@cindex not operator
-
-A @dfn{boolean expression} is a combination of comparison expressions or
-matching expressions, using the boolean operators ``or''
-(@samp{||}), ``and'' (@samp{&&}), and ``not'' (@samp{!}), along with
-parentheses to control nesting. The truth of the boolean expression is
-computed by combining the truth values of the component expressions.
-
-Boolean expressions can be used wherever comparison and matching
-expressions can be used. They can be used in @code{if}, @code{while},
-@code{do} and @code{for} statements. They have numeric values (1 if true,
-0 if false), which come into play if the result of the boolean expression
-is stored in a variable, or used in arithmetic.@refill
-
-In addition, every boolean expression is also a valid boolean pattern, so
-you can use it as a pattern to control the execution of rules.
-
-Here are descriptions of the three boolean operators, with an example of
-each. It may be instructive to compare these examples with the
-analogous examples of boolean patterns
-(@pxref{Boolean Patterns, ,Boolean Operators and Patterns}), which
-use the same boolean operators in patterns instead of expressions.@refill
-
-@table @code
-@item @var{boolean1} && @var{boolean2}
-True if both @var{boolean1} and @var{boolean2} are true. For example,
-the following statement prints the current input record if it contains
-both @samp{2400} and @samp{foo}.@refill
-
-@smallexample
-if ($0 ~ /2400/ && $0 ~ /foo/) print
-@end smallexample
-
-The subexpression @var{boolean2} is evaluated only if @var{boolean1}
-is true. This can make a difference when @var{boolean2} contains
-expressions that have side effects: in the case of @code{$0 ~ /foo/ &&
-($2 == bar++)}, the variable @code{bar} is not incremented if there is
-no @samp{foo} in the record.
-
-@item @var{boolean1} || @var{boolean2}
-True if at least one of @var{boolean1} or @var{boolean2} is true.
-For example, the following command prints all records in the input
-file @file{BBS-list} that contain @emph{either} @samp{2400} or
-@samp{foo}, or both.@refill
-
-@smallexample
-awk '@{ if ($0 ~ /2400/ || $0 ~ /foo/) print @}' BBS-list
-@end smallexample
-
-The subexpression @var{boolean2} is evaluated only if @var{boolean1}
-is false. This can make a difference when @var{boolean2} contains
-expressions that have side effects.
-
-@item !@var{boolean}
-True if @var{boolean} is false. For example, the following program prints
-all records in the input file @file{BBS-list} that do @emph{not} contain the
-string @samp{foo}.
-
-@smallexample
-awk '@{ if (! ($0 ~ /foo/)) print @}' BBS-list
-@end smallexample
-@end table
-
-@node Assignment Ops, Increment Ops, Boolean Ops, Expressions
-@section Assignment Expressions
-@cindex assignment operators
-@cindex operators, assignment
-@cindex expressions, assignment
-
-An @dfn{assignment} is an expression that stores a new value into a
-variable. For example, let's assign the value 1 to the variable
-@code{z}:@refill
-
-@example
-z = 1
-@end example
-
-After this expression is executed, the variable @code{z} has the value 1.
-Whatever old value @code{z} had before the assignment is forgotten.
-
-Assignments can store string values also. For example, this would store
-the value @code{"this food is good"} in the variable @code{message}:
-
-@example
-thing = "food"
-predicate = "good"
-message = "this " thing " is " predicate
-@end example
-
-@noindent
-(This also illustrates concatenation of strings.)
-
-The @samp{=} sign is called an @dfn{assignment operator}. It is the
-simplest assignment operator because the value of the right-hand
-operand is stored unchanged.
-
-@cindex side effect
-Most operators (addition, concatenation, and so on) have no effect
-except to compute a value. If you ignore the value, you might as well
-not use the operator. An assignment operator is different; it does
-produce a value, but even if you ignore the value, the assignment still
-makes itself felt through the alteration of the variable. We call this
-a @dfn{side effect}.
-
-@cindex lvalue
-The left-hand operand of an assignment need not be a variable
-(@pxref{Variables}); it can also be a field
-(@pxref{Changing Fields, ,Changing the Contents of a Field}) or
-an array element (@pxref{Arrays, ,Arrays in @code{awk}}).
-These are all called @dfn{lvalues},
-which means they can appear on the left-hand side of an assignment operator.
-The right-hand operand may be any expression; it produces the new value
-which the assignment stores in the specified variable, field or array
-element.@refill
-
-It is important to note that variables do @emph{not} have permanent types.
-The type of a variable is simply the type of whatever value it happens
-to hold at the moment. In the following program fragment, the variable
-@code{foo} has a numeric value at first, and a string value later on:
-
-@example
-foo = 1
-print foo
-foo = "bar"
-print foo
-@end example
-
-@noindent
-When the second assignment gives @code{foo} a string value, the fact that
-it previously had a numeric value is forgotten.
-
-An assignment is an expression, so it has a value: the same value that
-is assigned. Thus, @code{z = 1} as an expression has the value 1.
-One consequence of this is that you can write multiple assignments together:
-
-@example
-x = y = z = 0
-@end example
-
-@noindent
-stores the value 0 in all three variables. It does this because the
-value of @code{z = 0}, which is 0, is stored into @code{y}, and then
-the value of @code{y = z = 0}, which is 0, is stored into @code{x}.
-
-You can use an assignment anywhere an expression is called for. For
-example, it is valid to write @code{x != (y = 1)} to set @code{y} to 1
-and then test whether @code{x} differs from 1. But this style tends to make
-programs hard to read; except in a one-shot program, you should
-rewrite it to get rid of such nesting of assignments. This is never very
-hard.
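-
-For instance, in a quick one-shot rule you might fold an assignment into
-a test, as in this illustrative fragment (the variable @code{n} is made
-up):
-
-@example
-if ((n = length($0)) > 72)
-    print "long line of", n, "characters"
-@end example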
-
-Aside from @samp{=}, there are several other assignment operators that
-do arithmetic with the old value of the variable. For example, the
-operator @samp{+=} computes a new value by adding the right-hand value
-to the old value of the variable. Thus, the following assignment adds
-5 to the value of @code{foo}:
-
-@example
-foo += 5
-@end example
-
-@noindent
-When the left-hand operand is a simple variable, this is precisely
-equivalent to the following:
-
-@example
-foo = foo + 5
-@end example
-
-@noindent
-Use whichever one makes the meaning of your program clearer.
-
-Here is a table of the arithmetic assignment operators. In each
-case, the right-hand operand is an expression whose value is converted
-to a number.
-
-@table @code
-@item @var{lvalue} += @var{increment}
-Adds @var{increment} to the value of @var{lvalue} to make the new value
-of @var{lvalue}.
-
-@item @var{lvalue} -= @var{decrement}
-Subtracts @var{decrement} from the value of @var{lvalue}.
-
-@item @var{lvalue} *= @var{coefficient}
-Multiplies the value of @var{lvalue} by @var{coefficient}.
-
-@item @var{lvalue} /= @var{quotient}
-Divides the value of @var{lvalue} by @var{quotient}.
-
-@item @var{lvalue} %= @var{modulus}
-Sets @var{lvalue} to its remainder by @var{modulus}.
-
-@item @var{lvalue} ^= @var{power}
-@itemx @var{lvalue} **= @var{power}
-Raises @var{lvalue} to the power @var{power}.
-(Only the @code{^=} operator is specified by @sc{posix}.)
-@end table
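-
-As a small illustration (using a made-up variable @code{x}), the
-following fragment applies several of these operators in turn:
-
-@example
-x = 2
-x += 4     # x is now 6
-x -= 1     # x is now 5
-x *= 3     # x is now 15
-x %= 4     # x is now 3
-x ^= 2     # x is now 9
-@end example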
-
-@ignore
-From: gatech!ames!elroy!cit-vax!EQL.Caltech.Edu!rankin (Pat Rankin)
- In the discussion of assignment operators, it states that
-``foo += 5'' "is precisely equivalent to" ``foo = foo + 5'' (p.77). That
-may be true for simple variables, but it's not true for expressions with
-side effects, like array references. For proof, try
- BEGIN {
- foo[rand()] += 5; for (x in foo) print x, foo[x]
- bar[rand()] = bar[rand()] + 5; for (x in bar) print x, bar[x]
- }
-I suspect that the original statement is simply untrue--that '+=' is more
-efficient in all cases.
-
-ADR --- Try to add something about this here for the next go 'round.
-@end ignore
-
-@node Increment Ops, Conversion, Assignment Ops, Expressions
-@section Increment Operators
-
-@cindex increment operators
-@cindex operators, increment
-@dfn{Increment operators} increase or decrease the value of a variable
-by 1. You could do the same thing with an assignment operator, so
-the increment operators add no power to the @code{awk} language; but they
-are convenient abbreviations for something very common.
-
-The operator to add 1 is written @samp{++}. It can be used to increment
-a variable either before or after taking its value.
-
-To pre-increment a variable @var{v}, write @code{++@var{v}}. This adds
-1 to the value of @var{v} and that new value is also the value of this
-expression. The assignment expression @code{@var{v} += 1} is completely
-equivalent.
-
-Writing the @samp{++} after the variable specifies post-increment. This
-increments the variable value just the same; the difference is that the
-value of the increment expression itself is the variable's @emph{old}
-value. Thus, if @code{foo} has the value 4, then the expression @code{foo++}
-has the value 4, but it changes the value of @code{foo} to 5.
-
-The post-increment @code{foo++} is nearly equivalent to writing @code{(foo
-+= 1) - 1}. It is not perfectly equivalent because all numbers in
-@code{awk} are floating point: in floating point, @code{foo + 1 - 1} does
-not necessarily equal @code{foo}. But the difference is minute as
-long as you stick to numbers that are fairly small (less than a trillion).
-
-Any lvalue can be incremented. Fields and array elements are incremented
-just like variables. (Use @samp{$(i++)} when you wish to do a field reference
-and a variable increment at the same time. The parentheses are necessary
-because of the precedence of the field reference operator, @samp{$}.)
-@c expert information in the last parenthetical remark
-
-The decrement operator @samp{--} works just like @samp{++} except that
-it subtracts 1 instead of adding. Like @samp{++}, it can be used before
-the lvalue to pre-decrement or after it to post-decrement.
-
-Here is a summary of increment and decrement expressions.
-
-@table @code
-@item ++@var{lvalue}
-This expression increments @var{lvalue} and the new value becomes the
-value of this expression.
-
-@item @var{lvalue}++
-This expression causes the contents of @var{lvalue} to be incremented.
-The value of the expression is the @emph{old} value of @var{lvalue}.
-
-@item --@var{lvalue}
-Like @code{++@var{lvalue}}, but instead of adding, it subtracts. It
-decrements @var{lvalue} and delivers the value that results.
-
-@item @var{lvalue}--
-Like @code{@var{lvalue}++}, but instead of adding, it subtracts. It
-decrements @var{lvalue}. The value of the expression is the @emph{old}
-value of @var{lvalue}.
-@end table
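-
-For instance, a fragment along these lines (the variable @code{foo} is
-made up) shows the difference between the two forms:
-
-@example
-awk 'BEGIN @{
-    foo = 4
-    print foo++    # prints 4; foo is now 5
-    print ++foo    # prints 6; foo is now 6
-@}'
-@end example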
-
-@node Conversion, Values, Increment Ops, Expressions
-@section Conversion of Strings and Numbers
-
-@cindex conversion of strings and numbers
-Strings are converted to numbers, and numbers to strings, if the context
-of the @code{awk} program demands it. For example, if the value of
-either @code{foo} or @code{bar} in the expression @code{foo + bar}
-happens to be a string, it is converted to a number before the addition
-is performed. If numeric values appear in string concatenation, they
-are converted to strings. Consider this:@refill
-
-@example
-two = 2; three = 3
-print (two three) + 4
-@end example
-
-@noindent
-This eventually prints the (numeric) value 27. The numeric values of
-the variables @code{two} and @code{three} are converted to strings and
-concatenated together, and the resulting string is converted back to the
-number 23, to which 4 is then added.
-
-If, for some reason, you need to force a number to be converted to a
-string, concatenate the null string with that number. To force a string
-to be converted to a number, add zero to that string.
-
-A string is converted to a number by interpreting a numeric prefix
-of the string as numerals:
-@code{"2.5"} converts to 2.5, @code{"1e3"} converts to 1000, and @code{"25fix"}
-has a numeric value of 25.
-Strings that can't be interpreted as valid numbers are converted to
-zero.
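-
-For instance, a fragment along these lines (with made-up variable names)
-shows both kinds of forced conversion:
-
-@example
-awk 'BEGIN @{
-    n = 3.14
-    s = n ""           # force a number to a string: "3.14"
-    x = "25fix" + 0    # force a string to a number: 25
-    y = "abc" + 0      # no numeric prefix, so y is 0
-    print s, x, y
-@}'
-@end example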
-
-@vindex CONVFMT
-The exact manner in which numbers are converted into strings is controlled
-by the @code{awk} built-in variable @code{CONVFMT} (@pxref{Built-in Variables}).
-Numbers are converted using a special version of the @code{sprintf} function
-(@pxref{Built-in, ,Built-in Functions}) with @code{CONVFMT} as the format
-specifier.@refill
-
-@code{CONVFMT}'s default value is @code{"%.6g"}, which prints a value with
-at least six significant digits. For some applications you will want to
-change it to specify more precision. Double precision on most modern
-machines gives you 16 or 17 decimal digits of precision.
-
-Strange results can happen if you set @code{CONVFMT} to a string that doesn't
-tell @code{sprintf} how to format floating point numbers in a useful way.
-For example, if you forget the @samp{%} in the format, all numbers will be
-converted to the same constant string.@refill
-
-As a special case, if a number is an integer, then the result of converting
-it to a string is @emph{always} an integer, no matter what the value of
-@code{CONVFMT} may be. Given the following code fragment:
-
-@example
-CONVFMT = "%2.2f"
-a = 12
-b = a ""
-@end example
-
-@noindent
-@code{b} has the value @code{"12"}, not @code{"12.00"}.
-
-@ignore
-For the 2.14 version, describe the ``stickyness'' of conversions. Right now
-the manual assumes everywhere that variables are either numbers or strings;
-in fact both kinds of values may be valid. If both happen to be valid, a
-conversion isn't necessary and isn't done. Revising the manual to be
-consistent with this, though, is too big a job to tackle at the moment.
-
-7/92: This has sort of been done, only the section isn't completely right!
- What to do?
-7/92: Pretty much fixed, at least for the short term, thanks to text
- from David.
-@end ignore
-
-@vindex OFMT
-Prior to the @sc{posix} standard, @code{awk} specified that the value
-of @code{OFMT} was used for converting numbers to strings. @code{OFMT}
-specifies the output format to use when printing numbers with @code{print}.
-@code{CONVFMT} was introduced in order to separate the semantics of
-conversions from the semantics of printing. Both @code{CONVFMT} and
-@code{OFMT} have the same default value: @code{"%.6g"}. In the vast majority
-of cases, old @code{awk} programs will not change their behavior.
-However, this use of @code{OFMT} is something to keep in mind if you must
-port your program to other implementations of @code{awk}; we recommend
-that instead of changing your programs, you just port @code{gawk} itself!@refill
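-
-As an illustration (not taken from a real program), the following
-fragment shows the two variables affecting different operations:
-
-@smallexample
-awk 'BEGIN @{
-    OFMT = "%.2f"      # used when printing numbers
-    CONVFMT = "%.4f"   # used when converting numbers to strings
-    x = 3.14159265
-    print x            # prints 3.14
-    print (x "")       # prints 3.1416
-@}'
-@end smallexample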
-
-@node Values, Conditional Exp, Conversion, Expressions
-@section Numeric and String Values
-@cindex conversion of strings and numbers
-
-Through most of this manual, we present @code{awk} values (such as constants,
-fields, or variables) as @emph{either} numbers @emph{or} strings. This is
-a convenient way to think about them, since typically they are used in only
-one way, or the other.
-
-In truth though, @code{awk} values can be @emph{both} string and
-numeric, at the same time. Internally, @code{awk} represents values
-with a string, a (floating point) number, and an indication that one,
-the other, or both representations of the value are valid.
-
-Keeping track of both kinds of values is important for execution
-efficiency: a variable can acquire a string value the first time it
-is used as a string, and then that string value can be used until the
-variable is assigned a new value. Thus, if a variable with only a numeric
-value is used in several concatenations in a row, it only has to be given
-a string representation once. The numeric value remains valid, so that
-no conversion back to a number is necessary if the variable is later used
-in an arithmetic expression.
-
-Tracking both kinds of values is also important for precise numerical
-calculations. Consider the following:
-
-@smallexample
-a = 123.321
-CONVFMT = "%3.1f"
-b = a " is a number"
-c = a + 1.654
-@end smallexample
-
-@noindent
-The variable @code{a} receives a string value in the concatenation and
-assignment to @code{b}. The string value of @code{a} is @code{"123.3"}.
-If the numeric value was lost when it was converted to a string, then the
-numeric use of @code{a} in the last statement would lose information.
-@code{c} would be assigned the value 124.954 instead of 124.975.
-Such errors accumulate rapidly, and very adversely affect numeric
-computations.@refill
-
-Once a numeric value acquires a corresponding string value, it stays valid
-until a new assignment is made. If @code{CONVFMT}
-(@pxref{Conversion, ,Conversion of Strings and Numbers}) changes in the
-meantime, the old string value will still be used. For example:@refill
-
-@smallexample
-BEGIN @{
- CONVFMT = "%2.2f"
- a = 123.456
- b = a "" # force `a' to have string value too
- printf "a = %s\n", a
- CONVFMT = "%.6g"
- printf "a = %s\n", a
- a += 0 # make `a' numeric only again
- printf "a = %s\n", a # use `a' as string
-@}
-@end smallexample
-
-@noindent
-This program prints @samp{a = 123.46} twice, and then prints
-@samp{a = 123.456}.
-
-@xref{Conversion, ,Conversion of Strings and Numbers}, for the rules that
-specify how string values are made from numeric values.
-
-@node Conditional Exp, Function Calls, Values, Expressions
-@section Conditional Expressions
-@cindex conditional expression
-@cindex expression, conditional
-
-A @dfn{conditional expression} is a special kind of expression with
-three operands. It allows you to use one expression's value to select
-one of two other expressions.
-
-The conditional expression looks the same as in the C language:
-
-@example
-@var{selector} ? @var{if-true-exp} : @var{if-false-exp}
-@end example
-
-@noindent
-There are three subexpressions. The first, @var{selector}, is always
-computed first. If it is ``true'' (not zero and not null) then
-@var{if-true-exp} is computed next and its value becomes the value of
-the whole expression. Otherwise, @var{if-false-exp} is computed next
-and its value becomes the value of the whole expression.@refill
-
-For example, this expression produces the absolute value of @code{x}:
-
-@example
-x > 0 ? x : -x
-@end example
-
-Each time the conditional expression is computed, exactly one of
-@var{if-true-exp} and @var{if-false-exp} is computed; the other is ignored.
-This is important when the expressions contain side effects. For example,
-this conditional expression examines element @code{i} of either array
-@code{a} or array @code{b}, and increments @code{i}.
-
-@example
-x == y ? a[i++] : b[i++]
-@end example
-
-@noindent
-This is guaranteed to increment @code{i} exactly once, because each time
-the conditional expression is computed, one or the other of the two
-increment expressions is executed, and the other is not.
-
-@node Function Calls, Precedence, Conditional Exp, Expressions
-@section Function Calls
-@cindex function call
-@cindex calling a function
-
-A @dfn{function} is a name for a particular calculation. Because it has
-a name, you can ask for it by name at any point in the program. For
-example, the function @code{sqrt} computes the square root of a number.
-
-A fixed set of functions are @dfn{built-in}, which means they are
-available in every @code{awk} program. The @code{sqrt} function is one
-of these. @xref{Built-in, ,Built-in Functions}, for a list of built-in
-functions and their descriptions. In addition, you can define your own
-functions in the program for use elsewhere in the same program.
-@xref{User-defined, ,User-defined Functions}, for how to do this.@refill
-
-@cindex arguments in function call
-The way to use a function is with a @dfn{function call} expression,
-which consists of the function name followed by a list of
-@dfn{arguments} in parentheses. The arguments are expressions which
-give the raw materials for the calculation that the function will do.
-When there is more than one argument, they are separated by commas. If
-there are no arguments, write just @samp{()} after the function name.
-Here are some examples:
-
-@example
-sqrt(x^2 + y^2) # @r{One argument}
-atan2(y, x) # @r{Two arguments}
-rand() # @r{No arguments}
-@end example
-
-@strong{Do not put any space between the function name and the
-open-parenthesis!} A user-defined function name looks just like the name of
-a variable, and space would make the expression look like concatenation
-of a variable with an expression inside parentheses. Space before the
-parenthesis is harmless with built-in functions, but it is best not to get
-into the habit of using space to avoid mistakes with user-defined
-functions.
-
-Each function expects a particular number of arguments. For example, the
-@code{sqrt} function must be called with a single argument, the number
-to take the square root of:
-
-@example
-sqrt(@var{argument})
-@end example
-
-Some of the built-in functions allow you to omit the final argument.
-If you do so, they use a reasonable default.
-@xref{Built-in, ,Built-in Functions}, for full details. If arguments
-are omitted in calls to user-defined functions, then those arguments are
-treated as local variables, initialized to the null string
-(@pxref{User-defined, ,User-defined Functions}).@refill
-
-Like every other expression, the function call has a value, which is
-computed by the function based on the arguments you give it. In this
-example, the value of @code{sqrt(@var{argument})} is the square root of the
-argument. A function can also have side effects, such as assigning the
-values of certain variables or doing I/O.
-
-Here is a command to read numbers, one number per line, and print the
-square root of each one:
-
-@example
-awk '@{ print "The square root of", $1, "is", sqrt($1) @}'
-@end example
-
-@node Precedence, , Function Calls, Expressions
-@section Operator Precedence (How Operators Nest)
-@cindex precedence
-@cindex operator precedence
-
-@dfn{Operator precedence} determines how operators are grouped, when
-different operators appear close by in one expression. For example,
-@samp{*} has higher precedence than @samp{+}; thus, @code{a + b * c}
-means to multiply @code{b} and @code{c}, and then add @code{a} to the
-product (i.e., @code{a + (b * c)}).
-
-You can overrule the precedence of the operators by using parentheses.
-You can think of the precedence rules as saying where the
-parentheses are assumed if you do not write parentheses yourself. In
-fact, it is wise to always use parentheses whenever you have an unusual
-combination of operators, because other people who read the program may
-not remember what the precedence is in this case. You might forget,
-too; then you could make a mistake. Explicit parentheses will help prevent
-any such mistake.
-
-When operators of equal precedence are used together, the leftmost
-operator groups first, except for the assignment, conditional and
-exponentiation operators, which group in the opposite order.
-Thus, @code{a - b + c} groups as @code{(a - b) + c};
-@code{a = b = c} groups as @code{a = (b = c)}.@refill
-
-The precedence of prefix unary operators does not matter as long as only
-unary operators are involved, because there is only one way to parse
-them---innermost first. Thus, @code{$++i} means @code{$(++i)} and
-@code{++$x} means @code{++($x)}. However, when another operator follows
-the operand, then the precedence of the unary operators can matter.
-Thus, @code{$x^2} means @code{($x)^2}, but @code{-x^2} means
-@code{-(x^2)}, because @samp{-} has lower precedence than @samp{^}
-while @samp{$} has higher precedence.
-
-Here is a table of the operators of @code{awk}, in order of increasing
-precedence:
-
-@table @asis
-@item assignment
-@samp{=}, @samp{+=}, @samp{-=}, @samp{*=}, @samp{/=}, @samp{%=},
-@samp{^=}, @samp{**=}. These operators group right-to-left.
-(The @samp{**=} operator is not specified by @sc{posix}.)
-
-@item conditional
-@samp{?:}. This operator groups right-to-left.
-
-@item logical ``or''.
-@samp{||}.
-
-@item logical ``and''.
-@samp{&&}.
-
-@item array membership
-@samp{in}.
-
-@item matching
-@samp{~}, @samp{!~}.
-
-@item relational, and redirection
-The relational operators and the redirections have the same precedence
-level. Characters such as @samp{>} serve both as relationals and as
-redirections; the context distinguishes between the two meanings.
-
-The relational operators are @samp{<}, @samp{<=}, @samp{==}, @samp{!=},
-@samp{>=} and @samp{>}.
-
-The I/O redirection operators are @samp{<}, @samp{>}, @samp{>>} and
-@samp{|}.
-
-Note that I/O redirection operators in @code{print} and @code{printf}
-statements belong to the statement level, not to expressions. The
-redirection does not produce an expression which could be the operand of
-another operator. As a result, it does not make sense to use a
-redirection operator near another operator of lower precedence, without
-parentheses. Such combinations, for example @samp{print foo > a ? b :
-c}, result in syntax errors.
-
-@item concatenation
-No special token is used to indicate concatenation.
-The operands are simply written side by side.
-
-@item add, subtract
-@samp{+}, @samp{-}.
-
-@item multiply, divide, mod
-@samp{*}, @samp{/}, @samp{%}.
-
-@item unary plus, minus, ``not''
-@samp{+}, @samp{-}, @samp{!}.
-
-@item exponentiation
-@samp{^}, @samp{**}. These operators group right-to-left.
-(The @samp{**} operator is not specified by @sc{posix}.)
-
-@item increment, decrement
-@samp{++}, @samp{--}.
-
-@item field
-@samp{$}.
-@end table
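-
-For instance, a fragment along these lines illustrates a few of these
-rules:
-
-@example
-awk 'BEGIN @{
-    x = 3
-    print 1 + 2 * x     # prints 7:  1 + (2 * x)
-    print -2 ^ 2        # prints -4: ^ binds tighter than unary minus
-    print 2 " " 3 + 4   # prints "2 7": + binds tighter than concatenation
-@}'
-@end example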
-
-@node Statements, Arrays, Expressions, Top
-@chapter Control Statements in Actions
-@cindex control statement
-
-@dfn{Control statements} such as @code{if}, @code{while}, and so on
-control the flow of execution in @code{awk} programs. Most of the
-control statements in @code{awk} are patterned on similar statements in
-C.
-
-All the control statements start with special keywords such as @code{if}
-and @code{while}, to distinguish them from simple expressions.
-
-Many control statements contain other statements; for example, the
-@code{if} statement contains another statement which may or may not be
-executed. The contained statement is called the @dfn{body}. If you
-want to include more than one statement in the body, group them into a
-single compound statement with curly braces, separating them with
-newlines or semicolons.
-
-@menu
-* If Statement:: Conditionally execute
- some @code{awk} statements.
-* While Statement:: Loop until some condition is satisfied.
-* Do Statement:: Do specified action while looping until some
- condition is satisfied.
-* For Statement:: Another looping statement, that provides
- initialization and increment clauses.
-* Break Statement:: Immediately exit the innermost enclosing loop.
-* Continue Statement:: Skip to the end of the innermost
- enclosing loop.
-* Next Statement:: Stop processing the current input record.
-* Next File Statement:: Stop processing the current file.
-* Exit Statement:: Stop execution of @code{awk}.
-@end menu
-
-@node If Statement, While Statement, Statements, Statements
-@section The @code{if} Statement
-
-@cindex @code{if} statement
-The @code{if}-@code{else} statement is @code{awk}'s decision-making
-statement. It looks like this:@refill
-
-@example
-if (@var{condition}) @var{then-body} @r{[}else @var{else-body}@r{]}
-@end example
-
-@noindent
-@var{condition} is an expression that controls what the rest of the
-statement will do. If @var{condition} is true, @var{then-body} is
-executed; otherwise, @var{else-body} is executed (assuming that the
-@code{else} clause is present). The @code{else} part of the statement is
-optional. The condition is considered false if its value is zero or
-the null string, and true otherwise.@refill
-
-Here is an example:
-
-@example
-if (x % 2 == 0)
- print "x is even"
-else
- print "x is odd"
-@end example
-
-In this example, if the expression @code{x % 2 == 0} is true (that is,
-the value of @code{x} is divisible by 2), then the first @code{print}
-statement is executed, otherwise the second @code{print} statement is
-performed.@refill
-
-If the @code{else} appears on the same line as @var{then-body}, and
-@var{then-body} is not a compound statement (i.e., not surrounded by
-curly braces), then a semicolon must separate @var{then-body} from
-@code{else}. To illustrate this, let's rewrite the previous example:
-
-@example
-awk '@{ if (x % 2 == 0) print "x is even"; else
- print "x is odd" @}'
-@end example
-
-@noindent
-If you forget the @samp{;}, @code{awk} won't be able to parse the
-statement, and you will get a syntax error.
-
-We would not actually write this example this way, because a human
-reader might fail to see the @code{else} if it were not the first thing
-on its line.
-
-@node While Statement, Do Statement, If Statement, Statements
-@section The @code{while} Statement
-@cindex @code{while} statement
-@cindex loop
-@cindex body of a loop
-
-In programming, a @dfn{loop} means a part of a program that is (or at least can
-be) executed two or more times in succession.
-
-The @code{while} statement is the simplest looping statement in
-@code{awk}. It repeatedly executes a statement as long as a condition is
-true. It looks like this:
-
-@example
-while (@var{condition})
- @var{body}
-@end example
-
-@noindent
-Here @var{body} is a statement that we call the @dfn{body} of the loop,
-and @var{condition} is an expression that controls how long the loop
-keeps running.
-
-The first thing the @code{while} statement does is test @var{condition}.
-If @var{condition} is true, it executes the statement @var{body}.
-(@var{condition} is true when the value
-is not zero and not a null string.) After @var{body} has been executed,
-@var{condition} is tested again, and if it is still true, @var{body} is
-executed again. This process repeats until @var{condition} is no longer
-true. If @var{condition} is initially false, the body of the loop is
-never executed.@refill
-
-This example prints the first three fields of each record, one per line.
-
-@example
-awk '@{ i = 1
- while (i <= 3) @{
- print $i
- i++
- @}
-@}'
-@end example
-
-@noindent
-Here the body of the loop is a compound statement enclosed in braces,
-containing two statements.
-
-The loop works like this: first, the value of @code{i} is set to 1.
-Then, the @code{while} tests whether @code{i} is less than or equal to
-three. This is the case when @code{i} equals one, so the @code{i}-th
-field is printed. Then the @code{i++} increments the value of @code{i}
-and the loop repeats. The loop terminates when @code{i} reaches 4.
-
-As you can see, a newline is not required between the condition and the
-body; but using one makes the program clearer unless the body is a
-compound statement or is very simple. The newline after the open-brace
-that begins the compound statement is not required either, but the
-program would be hard to read without it.
-
-@node Do Statement, For Statement, While Statement, Statements
-@section The @code{do}-@code{while} Statement
-
-The @code{do} loop is a variation of the @code{while} looping statement.
-The @code{do} loop executes the @var{body} once, then repeats @var{body}
-as long as @var{condition} is true. It looks like this:
-
-@example
-do
- @var{body}
-while (@var{condition})
-@end example
-
-Even if @var{condition} is false at the start, @var{body} is executed at
-least once (and only once, unless executing @var{body} makes
-@var{condition} true). Contrast this with the corresponding
-@code{while} statement:
-
-@example
-while (@var{condition})
- @var{body}
-@end example
-
-@noindent
-This statement does not execute @var{body} even once if @var{condition}
-is false to begin with.
-
-Here is an example of a @code{do} statement:
-
-@example
-awk '@{ i = 1
- do @{
- print $0
- i++
- @} while (i <= 10)
-@}'
-@end example
-
-@noindent
-prints each input record ten times. It isn't a very realistic example,
-since in this case an ordinary @code{while} would do just as well. But
-this reflects actual experience; there is only occasionally a real use
-for a @code{do} statement.@refill
-
-@node For Statement, Break Statement, Do Statement, Statements
-@section The @code{for} Statement
-@cindex @code{for} statement
-
-The @code{for} statement makes it more convenient to count iterations of a
-loop. The general form of the @code{for} statement looks like this:@refill
-
-@example
-for (@var{initialization}; @var{condition}; @var{increment})
- @var{body}
-@end example
-
-@noindent
-This statement starts by executing @var{initialization}. Then, as long
-as @var{condition} is true, it repeatedly executes @var{body} and then
-@var{increment}. Typically @var{initialization} sets a variable to
-either zero or one, @var{increment} adds 1 to it, and @var{condition}
-compares it against the desired number of iterations.
-
-Here is an example of a @code{for} statement:
-
-@example
-@group
-awk '@{ for (i = 1; i <= 3; i++)
- print $i
-@}'
-@end group
-@end example
-
-@noindent
-This prints the first three fields of each input record, one field per
-line.
-
-In the @code{for} statement, @var{body} stands for any statement, but
-@var{initialization}, @var{condition} and @var{increment} are just
-expressions. You cannot set more than one variable in the
-@var{initialization} part unless you use a multiple assignment statement
-such as @code{x = y = 0}, which is possible only if all the initial values
-are equal. (But you can initialize additional variables by writing
-their assignments as separate statements preceding the @code{for} loop.)
-
-The same is true of the @var{increment} part; to increment additional
-variables, you must write separate statements at the end of the loop.
-The C compound expression, using C's comma operator, would be useful in
-this context, but it is not supported in @code{awk}.
-
-Most often, @var{increment} is an increment expression, as in the
-example above. But this is not required; it can be any expression
-whatever. For example, this statement prints all the powers of 2
-between 1 and 100:
-
-@example
-for (i = 1; i <= 100; i *= 2)
- print i
-@end example
-
-Any of the three expressions in the parentheses following the @code{for} may
-be omitted if there is nothing to be done there. Thus, @w{@samp{for (;x
-> 0;)}} is equivalent to @w{@samp{while (x > 0)}}. If the
-@var{condition} is omitted, it is treated as @var{true}, effectively
-yielding an @dfn{infinite loop} (i.e., a loop that will never
-terminate).@refill
-
-In most cases, a @code{for} loop is an abbreviation for a @code{while}
-loop, as shown here:
-
-@example
-@var{initialization}
-while (@var{condition}) @{
- @var{body}
- @var{increment}
-@}
-@end example
-
-@noindent
-The only exception is when the @code{continue} statement
-(@pxref{Continue Statement, ,The @code{continue} Statement}) is used
-inside the loop; changing a @code{for} statement to a @code{while}
-statement in this way can change the effect of the @code{continue}
-statement inside the loop.@refill
-
-There is an alternate version of the @code{for} loop, for iterating over
-all the indices of an array:
-
-@example
-for (i in array)
- @var{do something with} array[i]
-@end example
-
-@noindent
-@xref{Arrays, ,Arrays in @code{awk}}, for more information on this
-version of the @code{for} loop.
-
-The @code{awk} language has a @code{for} statement in addition to a
-@code{while} statement because often a @code{for} loop is both less work to
-type and more natural to think of. Counting the number of iterations is
-very common in loops. It can be easier to think of this counting as part
-of looping rather than as something to do inside the loop.
-
-The next section has more complicated examples of @code{for} loops.
-
-@node Break Statement, Continue Statement, For Statement, Statements
-@section The @code{break} Statement
-@cindex @code{break} statement
-@cindex loops, exiting
-
-The @code{break} statement jumps out of the innermost @code{for},
-@code{while}, or @code{do}-@code{while} loop that encloses it. The
-following example finds the smallest divisor of any integer, and also
-identifies prime numbers:@refill
-
-@smallexample
-awk '# find smallest divisor of num
- @{ num = $1
- for (div = 2; div*div <= num; div++)
- if (num % div == 0)
- break
- if (num % div == 0)
- printf "Smallest divisor of %d is %d\n", num, div
- else
- printf "%d is prime\n", num @}'
-@end smallexample
-
-When the remainder is zero in the first @code{if} statement, @code{awk}
-immediately @dfn{breaks out} of the containing @code{for} loop. This means
-that @code{awk} proceeds immediately to the statement following the loop
-and continues processing. (This is very different from the @code{exit}
-statement, which stops the entire @code{awk} program.
-@xref{Exit Statement, ,The @code{exit} Statement}.)@refill
-
-Here is another program equivalent to the previous one. It illustrates how
-the @var{condition} of a @code{for} or @code{while} could just as well be
-replaced with a @code{break} inside an @code{if}:
-
-@smallexample
-@group
-awk '# find smallest divisor of num
- @{ num = $1
- for (div = 2; ; div++) @{
- if (num % div == 0) @{
- printf "Smallest divisor of %d is %d\n", num, div
- break
- @}
- if (div*div > num) @{
- printf "%d is prime\n", num
- break
- @}
- @}
-@}'
-@end group
-@end smallexample
-
-@node Continue Statement, Next Statement, Break Statement, Statements
-@section The @code{continue} Statement
-
-@cindex @code{continue} statement
-The @code{continue} statement, like @code{break}, is used only inside
-@code{for}, @code{while}, and @code{do}-@code{while} loops. It skips
-over the rest of the loop body, causing the next cycle around the loop
-to begin immediately. Contrast this with @code{break}, which jumps out
-of the loop altogether. Here is an example:@refill
-
-@example
-# print names that don't contain the string "ignore"
-
-# first, save the text of each line
-@{ names[NR] = $0 @}
-
-# print what we're interested in
-END @{
- for (x in names) @{
- if (names[x] ~ /ignore/)
- continue
- print names[x]
- @}
-@}
-@end example
-
-If one of the input records contains the string @samp{ignore}, this
-example skips the print statement for that record, and continues back to
-the first statement in the loop.
-
-This is not a practical example of @code{continue}, since it would be
-just as easy to write the loop like this:
-
-@example
-for (x in names)
- if (names[x] !~ /ignore/)
- print names[x]
-@end example
-
-@ignore
-from brennan@boeing.com:
-
-page 90, section 9.6. The example is too artificial as
-the one line program
-
- !/ignore/
-
-does the same thing.
-@end ignore
-@c ADR --- he's right, but don't worry about this for now
-
-The @code{continue} statement in a @code{for} loop directs @code{awk} to
-skip the rest of the body of the loop, and resume execution with the
-increment-expression of the @code{for} statement. The following program
-illustrates this fact:@refill
-
-@example
-awk 'BEGIN @{
- for (x = 0; x <= 20; x++) @{
- if (x == 5)
- continue
- printf ("%d ", x)
- @}
- print ""
-@}'
-@end example
-
-@noindent
-This program prints all the numbers from 0 to 20, except for 5, for
-which the @code{printf} is skipped. Since the increment @code{x++}
-is not skipped, @code{x} does not remain stuck at 5. Contrast the
-@code{for} loop above with the @code{while} loop:
-
-@example
-awk 'BEGIN @{
- x = 0
- while (x <= 20) @{
- if (x == 5)
- continue
- printf ("%d ", x)
- x++
- @}
- print ""
-@}'
-@end example
-
-@noindent
-This program loops forever once @code{x} gets to 5.
-
-As described above, the @code{continue} statement has no meaning when
-used outside the body of a loop. However, although it was never documented,
-historical implementations of @code{awk} have treated the @code{continue}
-statement outside of a loop as if it were a @code{next} statement
-(@pxref{Next Statement, ,The @code{next} Statement}).
-By default, @code{gawk} silently supports this usage. However, if
-@samp{-W posix} has been specified on the command line
-(@pxref{Command Line, ,Invoking @code{awk}}),
-it will be treated as an error, since the @sc{posix} standard specifies
-that @code{continue} should only be used inside the body of a loop.@refill
-
-@node Next Statement, Next File Statement, Continue Statement, Statements
-@section The @code{next} Statement
-@cindex @code{next} statement
-
-The @code{next} statement forces @code{awk} to immediately stop processing
-the current record and go on to the next record. This means that no
-further rules are executed for the current record. The rest of the
-current rule's action is not executed either.
-
-Contrast this with the effect of the @code{getline} function
-(@pxref{Getline, ,Explicit Input with @code{getline}}). That too causes
-@code{awk} to read the next record immediately, but it does not alter the
-flow of control in any way. So the rest of the current action executes
-with a new input record.
-
-At the highest level, @code{awk} program execution is a loop that reads
-an input record and then tests each rule's pattern against it. If you
-think of this loop as a @code{for} statement whose body contains the
-rules, then the @code{next} statement is analogous to a @code{continue}
-statement: it skips to the end of the body of this implicit loop, and
-executes the increment (which reads another record).
-
-For example, if your @code{awk} program works only on records with four
-fields, and you don't want it to fail when given bad input, you might
-use this rule near the beginning of the program:
-
-@smallexample
-NF != 4 @{
-    printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
- next
-@}
-@end smallexample
-
-@noindent
-so that the following rules will not see the bad record. The error
-message is redirected to the standard error output stream, as error
-messages should be. @xref{Special Files, ,Standard I/O Streams}.
-
-According to the @sc{posix} standard, the behavior is undefined if
-the @code{next} statement is used in a @code{BEGIN} or @code{END} rule.
-@code{gawk} will treat it as a syntax error.
-
-If the @code{next} statement causes the end of the input to be reached,
-then the code in the @code{END} rules, if any, will be executed.
-@xref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}.
-
-@node Next File Statement, Exit Statement, Next Statement, Statements
-@section The @code{next file} Statement
-
-@cindex @code{next file} statement
-The @code{next file} statement is similar to the @code{next} statement.
-However, instead of abandoning processing of the current record, the
-@code{next file} statement instructs @code{awk} to stop processing the
-current data file.
-
-Upon execution of the @code{next file} statement, @code{FILENAME} is
-updated to the name of the next data file listed on the command line,
-@code{FNR} is reset to 1, and processing starts over with the first
-rule in the program. @xref{Built-in Variables}.
-
-If the @code{next file} statement causes the end of the input to be reached,
-then the code in the @code{END} rules, if any, will be executed.
-@xref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}.
-
-The @code{next file} statement is a @code{gawk} extension; it is not
-(currently) available in any other @code{awk} implementation. You can
-simulate its behavior by creating a library file named @file{nextfile.awk},
-with the following contents. (This sample program uses user-defined
-functions, a feature that has not been presented yet.
-@xref{User-defined, ,User-defined Functions},
-for more information.)@refill
-
-@smallexample
-# nextfile --- function to skip remaining records in current file
-
-# this should be read in before the "main" awk program
-
-function nextfile() @{ _abandon_ = FILENAME; next @}
-
-_abandon_ == FILENAME && FNR > 1 @{ next @}
-_abandon_ == FILENAME && FNR == 1 @{ _abandon_ = "" @}
-@end smallexample
-
-The @code{nextfile} function simply sets a ``private'' variable@footnote{Since
-all variables in @code{awk} are global, this program uses the common
-practice of prefixing the variable name with an underscore. In fact, it
-also suffixes the variable name with an underscore, as extra insurance
-against using a variable name that might be used in some other library
-file.} to the name of the current data file, and then retrieves the next
-record. Since this file is read before the main @code{awk} program,
-the rules that follow the function definition will be executed before the
-rules in the main program. The first rule continues to skip records as long as
-the name of the input file has not changed, and this is not the first
-record in the file. This rule is sufficient most of the time. But what if
-the @emph{same} data file is named twice in a row on the command line?
-This rule would not process the data file the second time. The second rule
-catches this case: If the data file name is what was being skipped, but
-@code{FNR} is 1, then this is the second time the file is being processed,
-and it should not be skipped.
-
-The @code{next file} statement would be useful if you have many data
-files to process, and due to the nature of the data, you expect that you
-would not want to process every record in the file. In order to move on to
-the next data file, you would have to continue scanning the unwanted
-records (as described above). The @code{next file} statement accomplishes
-this much more efficiently.
-
-@ignore
-Would it make sense down the road to nuke `next file' in favor of
-semantics that would make this work?
-
- function nextfile() { ARGIND++ ; next }
-@end ignore
-
-@node Exit Statement, , Next File Statement, Statements
-@section The @code{exit} Statement
-
-@cindex @code{exit} statement
-The @code{exit} statement causes @code{awk} to immediately stop
-executing the current rule and to stop processing input; any remaining input
-is ignored.@refill
-
-If an @code{exit} statement is executed from a @code{BEGIN} rule the
-program stops processing everything immediately. No input records are
-read. However, if an @code{END} rule is present, it is executed
-(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}).
-
-If @code{exit} is used as part of an @code{END} rule, it causes
-the program to stop immediately.
-
-An @code{exit} statement that is part of an ordinary rule (that is, not part
-of a @code{BEGIN} or @code{END} rule) stops the execution of any further
-automatic rules, but the @code{END} rule is executed if there is one.
-If you do not want the @code{END} rule to do its job in this case, you
-can set a variable to nonzero before the @code{exit} statement, and check
-that variable in the @code{END} rule.
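-
-A sketch of this technique, using a made-up flag variable named
-@code{saw_error}, might look like this:
-
-@example
-/fatal error/ @{
-    saw_error = 1      # remember that we quit early
-    exit
-@}
-
-END @{
-    if (saw_error)     # skip the usual end-of-run summary
-        exit
-    print NR, "records processed"
-@}
-@end example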
-
-If an argument is supplied to @code{exit}, its value is used as the exit
-status code for the @code{awk} process. If no argument is supplied,
-@code{exit} returns status zero (success).@refill
-
-For example, let's say you've discovered an error condition you really
-don't know how to handle. Conventionally, programs report this by
-exiting with a nonzero status. Your @code{awk} program can do this
-using an @code{exit} statement with a nonzero argument. Here's an
-example of this:@refill
-
-@example
-@group
-BEGIN @{
- if (("date" | getline date_now) < 0) @{
- print "Can't get system date" > "/dev/stderr"
- exit 4
- @}
-@}
-@end group
-@end example
-
-@node Arrays, Built-in, Statements, Top
-@chapter Arrays in @code{awk}
-
-An @dfn{array} is a table of values, called @dfn{elements}. The
-elements of an array are distinguished by their indices. @dfn{Indices}
-may be either numbers or strings. Each array has a name, which looks
-like a variable name, but must not be in use as a variable name in the
-same @code{awk} program.
-
-@menu
-* Array Intro:: Introduction to Arrays
-* Reference to Elements:: How to examine one element of an array.
-* Assigning Elements:: How to change an element of an array.
-* Array Example:: Basic Example of an Array
-* Scanning an Array:: A variation of the @code{for} statement.
- It loops through the indices of
- an array's existing elements.
-* Delete:: The @code{delete} statement removes
- an element from an array.
-* Numeric Array Subscripts:: How to use numbers as subscripts in @code{awk}.
-* Multi-dimensional:: Emulating multi-dimensional arrays in @code{awk}.
-* Multi-scanning:: Scanning multi-dimensional arrays.
-@end menu
-
-@node Array Intro, Reference to Elements, Arrays, Arrays
-@section Introduction to Arrays
-
-@cindex arrays
-The @code{awk} language has one-dimensional @dfn{arrays} for storing groups
-of related strings or numbers.
-
-Every @code{awk} array must have a name. Array names have the same
-syntax as variable names; any valid variable name would also be a valid
-array name. But you cannot use one name in both ways (as an array and
-as a variable) in one @code{awk} program.
-
-Arrays in @code{awk} superficially resemble arrays in other programming
-languages; but there are fundamental differences. In @code{awk}, you
-don't need to specify the size of an array before you start to use it.
-Additionally, any number or string in @code{awk} may be used as an
-array index.
-
-In most other languages, you have to @dfn{declare} an array and specify
-how many elements or components it contains. In such languages, the
-declaration causes a contiguous block of memory to be allocated for that
-many elements. An index in the array must be a positive integer; for
-example, the index 0 specifies the first element in the array, which is
-actually stored at the beginning of the block of memory. Index 1
-specifies the second element, which is stored in memory right after the
-first element, and so on. It is impossible to add more elements to the
-array, because it has room for only as many elements as you declared.
-
-A contiguous array of four elements might look like this,
-conceptually, if the element values are @code{8}, @code{"foo"},
-@code{""} and @code{30}:@refill
-
-@example
-+---------+---------+--------+---------+
-| 8 | "foo" | "" | 30 | @r{value}
-+---------+---------+--------+---------+
- 0 1 2 3 @r{index}
-@end example
-
-@noindent
-Only the values are stored; the indices are implicit from the order of
-the values. @code{8} is the value at index 0, because @code{8} appears in the
-position with 0 elements before it.
-
-@cindex arrays, definition of
-@cindex associative arrays
-Arrays in @code{awk} are different: they are @dfn{associative}. This means
-that each array is a collection of pairs: an index, and its corresponding
-array element value:
-
-@example
-@r{Element} 4 @r{Value} 30
-@r{Element} 2 @r{Value} "foo"
-@r{Element} 1 @r{Value} 8
-@r{Element} 3 @r{Value} ""
-@end example
-
-@noindent
-We have shown the pairs in jumbled order because their order is irrelevant.
-
-One advantage of an associative array is that new pairs can be added
-at any time. For example, suppose we add to the above array a tenth element
-whose value is @w{@code{"number ten"}}. The result is this:
-
-@example
-@r{Element} 10 @r{Value} "number ten"
-@r{Element} 4 @r{Value} 30
-@r{Element} 2 @r{Value} "foo"
-@r{Element} 1 @r{Value} 8
-@r{Element} 3 @r{Value} ""
-@end example
-
-@noindent
-Now the array is @dfn{sparse} (i.e., some indices are missing): it has
-elements 1--4 and 10, but doesn't have elements 5, 6, 7, 8, or 9.@refill
-
-Another consequence of associative arrays is that the indices don't
-have to be positive integers. Any number, or even a string, can be
-an index. For example, here is an array which translates words from
-English into French:
-
-@example
-@r{Element} "dog" @r{Value} "chien"
-@r{Element} "cat" @r{Value} "chat"
-@r{Element} "one" @r{Value} "un"
-@r{Element} 1 @r{Value} "un"
-@end example
-
-@noindent
-Here we decided to translate the number 1 in both spelled-out and
-numeric form---thus illustrating that a single array can have both
-numbers and strings as indices.
-
-When @code{awk} creates an array for you, e.g., with the @code{split}
-built-in function,
-that array's indices are consecutive integers starting at 1.
-(@xref{String Functions, ,Built-in Functions for String Manipulation}.)
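-
-For instance, a fragment along these lines splits a made-up string and
-then walks the resulting indices in order:
-
-@example
-awk 'BEGIN @{
-    n = split("dog cat one", words)
-    for (i = 1; i <= n; i++)
-        print i, words[i]
-@}'
-@end example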
-
-@node Reference to Elements, Assigning Elements, Array Intro, Arrays
-@section Referring to an Array Element
-@cindex array reference
-@cindex element of array
-@cindex reference to array
-
-The principal way of using an array is to refer to one of its elements.
-An array reference is an expression which looks like this:
-
-@example
-@var{array}[@var{index}]
-@end example
-
-@noindent
-Here, @var{array} is the name of an array. The expression @var{index} is
-the index of the element of the array that you want.
-
-The value of the array reference is the current value of that array
-element. For example, @code{foo[4.3]} is an expression for the element
-of array @code{foo} at index 4.3.
-
-If you refer to an array element that has no recorded value, the value
-of the reference is @code{""}, the null string. This includes elements
-to which you have not assigned any value, and elements that have been
-deleted (@pxref{Delete, ,The @code{delete} Statement}). Such a reference
-automatically creates that array element, with the null string as its value.
-(In some cases, this is unfortunate, because it might waste memory inside
-@code{awk}.)
-
-@cindex arrays, presence of elements
-You can find out if an element exists in an array at a certain index with
-the expression:
-
-@example
-@var{index} in @var{array}
-@end example
-
-@noindent
-This expression tests whether or not the particular index exists,
-without the side effect of creating that element if it is not present.
-The expression has the value 1 (true) if @code{@var{array}[@var{index}]}
-exists, and 0 (false) if it does not exist.@refill
-
-For example, to test whether the array @code{frequencies} contains the
-index @code{"2"}, you could write this statement:@refill
-
-@smallexample
-if ("2" in frequencies) print "Subscript \"2\" is present."
-@end smallexample
-
-Note that this is @emph{not} a test of whether or not the array
-@code{frequencies} contains an element whose @emph{value} is @code{"2"}.
-(There is no way to do that except to scan all the elements.) Also, this
-@emph{does not} create @code{frequencies["2"]}, while the following
-(incorrect) alternative would do so:@refill
-
-@smallexample
-if (frequencies["2"] != "") print "Subscript \"2\" is present."
-@end smallexample
-
-@node Assigning Elements, Array Example, Reference to Elements, Arrays
-@section Assigning Array Elements
-@cindex array assignment
-@cindex element assignment
-
-Array elements are lvalues: they can be assigned values just like
-@code{awk} variables:
-
-@example
-@var{array}[@var{subscript}] = @var{value}
-@end example
-
-@noindent
-Here @var{array} is the name of your array. The expression
-@var{subscript} is the index of the element of the array that you want
-to assign a value. The expression @var{value} is the value you are
-assigning to that element of the array.@refill
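-
-For example, an illustrative fragment (the array name @code{capital} is
-made up) might be:
-
-@example
-capital["France"] = "Paris"
-capital["Japan"] = "Tokyo"
-print capital["Japan"]     # prints "Tokyo"
-@end example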
-
-@node Array Example, Scanning an Array, Assigning Elements, Arrays
-@section Basic Example of an Array
-
-The following program takes a list of lines, each beginning with a line
-number, and prints them out in order of line number. The line numbers are
-not in order, however, when they are first read: they are scrambled. This
-program sorts the lines by making an array using the line numbers as
-subscripts. It then prints out the lines in sorted order of their numbers.
-It is a very simple program, and gets confused if it encounters repeated
-numbers, gaps, or lines that don't begin with a number.@refill
-
-@example
-@{
- if ($1 > max)
- max = $1
- arr[$1] = $0
-@}
-
-END @{
- for (x = 1; x <= max; x++)
- print arr[x]
-@}
-@end example
-
-The first rule keeps track of the largest line number seen so far;
-it also stores each line into the array @code{arr}, at an index that
-is the line's number.
-
-The second rule runs after all the input has been read, to print out
-all the lines.
-
-When this program is run with the following input:
-
-@example
-5 I am the Five man
-2 Who are you? The new number two!
-4 . . . And four on the floor
-1 Who is number one?
-3 I three you.
-@end example
-
-@noindent
-its output is this:
-
-@example
-1 Who is number one?
-2 Who are you? The new number two!
-3 I three you.
-4 . . . And four on the floor
-5 I am the Five man
-@end example
-
-If a line number is repeated, the last line with a given number overrides
-the others.
-
-Gaps in the line numbers can be handled with an easy improvement to the
-program's @code{END} rule:
-
-@example
-END @{
- for (x = 1; x <= max; x++)
- if (x in arr)
- print arr[x]
-@}
-@end example
-
-@node Scanning an Array, Delete, Array Example, Arrays
-@section Scanning all Elements of an Array
-@cindex @code{for (x in @dots{})}
-@cindex arrays, special @code{for} statement
-@cindex scanning an array
-
-In programs that use arrays, often you need a loop that executes
-once for each element of an array. In other languages, where arrays are
-contiguous and indices are limited to positive integers, this is
-easy: the largest index is one less than the length of the array, and you can
-find all the valid indices by counting from zero up to that value. This
-technique won't do the job in @code{awk}, since any number or string
-may be an array index. So @code{awk} has a special kind of @code{for}
-statement for scanning an array:
-
-@example
-for (@var{var} in @var{array})
- @var{body}
-@end example
-
-@noindent
-This loop executes @var{body} once for each different value that your
-program has previously used as an index in @var{array}, with the
-variable @var{var} set to that index.@refill
-
-Here is a program that uses this form of the @code{for} statement. The
-first rule scans the input records and notes which words appear (at
-least once) in the input, by storing a 1 into the array @code{used} with
-the word as index. The second rule scans the elements of @code{used} to
-find all the distinct words that appear in the input. It prints each
-word that is more than 10 characters long, and also prints the number of
-such words. @xref{Built-in, ,Built-in Functions}, for more information
-on the built-in function @code{length}.
-
-@smallexample
-# Record a 1 for each word that is used at least once.
-@{
- for (i = 1; i <= NF; i++)
- used[$i] = 1
-@}
-
-# Find number of distinct words more than 10 characters long.
-END @{
- for (x in used)
- if (length(x) > 10) @{
- ++num_long_words
- print x
- @}
- print num_long_words, "words longer than 10 characters"
-@}
-@end smallexample
-
-@noindent
-@xref{Sample Program}, for a more detailed example of this type.
-
-The order in which elements of the array are accessed by this statement
-is determined by the internal arrangement of the array elements within
-@code{awk} and cannot be controlled or changed. This can lead to
-problems if new elements are added to @var{array} by statements in
-@var{body}; you cannot predict whether or not the @code{for} loop will
-reach them. Similarly, changing @var{var} inside the loop can produce
-strange results. It is best to avoid such things.@refill
-
-@node Delete, Numeric Array Subscripts, Scanning an Array, Arrays
-@section The @code{delete} Statement
-@cindex @code{delete} statement
-@cindex deleting elements of arrays
-@cindex removing elements of arrays
-@cindex arrays, deleting an element
-
-You can remove an individual element of an array using the @code{delete}
-statement:
-
-@example
-delete @var{array}[@var{index}]
-@end example
-
-You cannot refer to an array element after it has been deleted;
-it is as if you had never referred
-to it and had never given it any value. You can no longer obtain any
-value the element once had.
-
-Here is an example of deleting elements in an array:
-
-@example
-for (i in frequencies)
- delete frequencies[i]
-@end example
-
-@noindent
-This example removes all the elements from the array @code{frequencies}.
-
-If you delete an element, a subsequent @code{for} statement to scan the array
-will not report that element, and the @code{in} operator to check for
-the presence of that element will return 0:
-
-@example
-delete foo[4]
-if (4 in foo)
- print "This will never be printed"
-@end example
-
-It is not an error to delete an element which does not exist.
-
-@node Numeric Array Subscripts, Multi-dimensional, Delete, Arrays
-@section Using Numbers to Subscript Arrays
-
-An important aspect of arrays to remember is that array subscripts
-are @emph{always} strings. If you use a numeric value as a subscript,
-it will be converted to a string value before it is used for subscripting
-(@pxref{Conversion, ,Conversion of Strings and Numbers}).
-
-@cindex conversions, during subscripting
-@cindex numbers, used as subscripts
-@vindex CONVFMT
-This means that the value of @code{CONVFMT} can potentially
-affect how your program accesses elements of an array. For example:
-
-@example
-a = b = 12.153
-data[a] = 1
-CONVFMT = "%2.2f"
-if (b in data)
- printf "%s is in data", b
-else
- printf "%s is not in data", b
-@end example
-
-@noindent
-should print @samp{12.15 is not in data}. The first statement gives
-both @code{a} and @code{b} the same numeric value. Assigning to
-@code{data[a]} first gives @code{a} the string value @code{"12.153"}
-(using the default conversion value of @code{CONVFMT}, @code{"%.6g"}),
-and then assigns 1 to @code{data["12.153"]}. The program then changes
-the value of @code{CONVFMT}. The test @samp{(b in data)} forces @code{b}
-to be converted to a string, this time @code{"12.15"}, since the value of
-@code{CONVFMT} only allows two significant digits. This test fails,
-since @code{"12.15"} is a different string from @code{"12.153"}.@refill
-
-According to the rules for conversions
-(@pxref{Conversion, ,Conversion of Strings and Numbers}), integer
-values are always converted to strings as integers, no matter what the
-value of @code{CONVFMT} may happen to be. So the usual case of@refill
-
-@example
-for (i = 1; i <= maxsub; i++)
- @i{do something with} array[i]
-@end example
-
-@noindent
-will work, no matter what the value of @code{CONVFMT}.
-
-Like many things in @code{awk}, the majority of the time things work
-as you would expect them to work. But it is useful to have a precise
-knowledge of the actual rules, since sometimes they can have a subtle
-effect on your programs.
-
-@node Multi-dimensional, Multi-scanning, Numeric Array Subscripts, Arrays
-@section Multi-dimensional Arrays
-
-@c the following index entry is an overfull hbox. --mew 30jan1992
-@cindex subscripts in arrays
-@cindex arrays, multi-dimensional subscripts
-@cindex multi-dimensional subscripts
-A multi-dimensional array is an array in which an element is identified
-by a sequence of indices, not a single index. For example, a
-two-dimensional array requires two indices. The usual way (in most
-languages, including @code{awk}) to refer to an element of a
-two-dimensional array named @code{grid} is with
-@code{grid[@var{x},@var{y}]}.
-
-@vindex SUBSEP
-Multi-dimensional arrays are supported in @code{awk} through
-concatenation of indices into one string. What happens is that
-@code{awk} converts the indices into strings
-(@pxref{Conversion, ,Conversion of Strings and Numbers}) and
-concatenates them together, with a separator between them. This creates
-a single string that describes the values of the separate indices. The
-combined string is used as a single index into an ordinary,
-one-dimensional array. The separator used is the value of the built-in
-variable @code{SUBSEP}.@refill
-
-For example, suppose we evaluate the expression @code{foo[5,12]="value"}
-when the value of @code{SUBSEP} is @code{"@@"}. The numbers 5 and 12 are
-converted to strings and
-concatenated with an @samp{@@} between them, yielding @code{"5@@12"}; thus,
-the array element @code{foo["5@@12"]} is set to @code{"value"}.@refill
-
-Once the element's value is stored, @code{awk} has no record of whether
-it was stored with a single index or a sequence of indices. The two
-expressions @code{foo[5,12]} and @w{@code{foo[5 SUBSEP 12]}} always have
-the same value.
-
-The default value of @code{SUBSEP} is the string @code{"\034"},
-which contains a nonprinting character that is unlikely to appear in an
-@code{awk} program or in the input data.
-
-The usefulness of choosing an unlikely character comes from the fact
-that index values that contain a string matching @code{SUBSEP} lead to
-combined strings that are ambiguous. Suppose that @code{SUBSEP} were
-@code{"@@"}; then @w{@code{foo["a@@b", "c"]}} and @w{@code{foo["a",
-"b@@c"]}} would be indistinguishable because both would actually be
-stored as @code{foo["a@@b@@c"]}. Because @code{SUBSEP} is
-@code{"\034"}, such confusion can arise only when an index
-contains the character with ASCII code 034, which is a rare
-event.@refill
-
-You can test whether a particular index-sequence exists in a
-``multi-dimensional'' array with the same operator @code{in} used for single
-dimensional arrays. Instead of a single index as the left-hand operand,
-write the whole sequence of indices, separated by commas, in
-parentheses:@refill
-
-@example
-(@var{subscript1}, @var{subscript2}, @dots{}) in @var{array}
-@end example
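-
-For example, to test for the element @code{foo[5, 12]} assigned above,
-you could write something like this:
-
-@example
-if ((5, 12) in foo)
-    print "foo[5, 12] is", foo[5, 12]
-else
-    print "foo[5, 12] does not exist"
-@end example
-
-@noindent
-Note that the parentheses around the list of subscripts are part of
-this form of the @code{in} operator.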
-
-The following example treats its input as a two-dimensional array of
-fields; it rotates this array 90 degrees clockwise and prints the
-result. It assumes that all lines have the same number of
-elements.
-
-@example
-awk '@{
- if (max_nf < NF)
- max_nf = NF
- max_nr = NR
- for (x = 1; x <= NF; x++)
- vector[x, NR] = $x
-@}
-
-END @{
- for (x = 1; x <= max_nf; x++) @{
- for (y = max_nr; y >= 1; --y)
- printf("%s ", vector[x, y])
- printf("\n")
- @}
-@}'
-@end example
-
-@noindent
-When given the input:
-
-@example
-@group
-1 2 3 4 5 6
-2 3 4 5 6 1
-3 4 5 6 1 2
-4 5 6 1 2 3
-@end group
-@end example
-
-@noindent
-it produces:
-
-@example
-@group
-4 3 2 1
-5 4 3 2
-6 5 4 3
-1 6 5 4
-2 1 6 5
-3 2 1 6
-@end group
-@end example
-
-@node Multi-scanning, , Multi-dimensional, Arrays
-@section Scanning Multi-dimensional Arrays
-
-There is no special @code{for} statement for scanning a
-``multi-dimensional'' array; there cannot be one, because in truth there
-are no multi-dimensional arrays or elements; there is only a
-multi-dimensional @emph{way of accessing} an array.
-
-However, if your program has an array that is always accessed as
-multi-dimensional, you can get the effect of scanning it by combining
-the scanning @code{for} statement
-(@pxref{Scanning an Array, ,Scanning all Elements of an Array}) with the
-@code{split} built-in function
-(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
-It works like this:@refill
-
-@example
-for (combined in @var{array}) @{
- split(combined, separate, SUBSEP)
- @dots{}
-@}
-@end example
-
-@noindent
-This finds each concatenated, combined index in the array, and splits it
-into the individual indices by breaking it apart where the value of
-@code{SUBSEP} appears. The split-out indices become the elements of
-the array @code{separate}.
-
-Thus, suppose you have previously stored in @code{@var{array}[1,
-"foo"]}; then an element with index @code{"1\034foo"} exists in
-@var{array}. (Recall that the default value of @code{SUBSEP} contains
-the character with code 034.) Sooner or later the @code{for} statement
-will find that index and do an iteration with @code{combined} set to
-@code{"1\034foo"}. Then the @code{split} function is called as
-follows:
-
-@example
-split("1\034foo", separate, "\034")
-@end example
-
-@noindent
-The result of this is to set @code{separate[1]} to 1 and @code{separate[2]}
-to @code{"foo"}. Presto, the original sequence of separate indices has
-been recovered.
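-
-Putting these pieces together, here is a small sketch that prints each
-element of such an array along with its recovered indices; the array
-name @code{array} is a placeholder, and the sketch assumes each element
-was stored with exactly two subscripts:
-
-@smallexample
-for (combined in array) @{
-    split(combined, separate, SUBSEP)
-    printf "array[%s, %s] = %s\n",
-            separate[1], separate[2], array[combined]
-@}
-@end smallexample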
-
-@node Built-in, User-defined, Arrays, Top
-@chapter Built-in Functions
-
-@cindex built-in functions
-@dfn{Built-in} functions are functions that are always available for
-your @code{awk} program to call. This chapter defines all the built-in
-functions in @code{awk}; some of them are mentioned in other sections,
-but they are summarized here for your convenience. (You can also define
-new functions yourself. @xref{User-defined, ,User-defined Functions}.)
-
-@menu
-* Calling Built-in:: How to call built-in functions.
-* Numeric Functions:: Functions that work with numbers,
- including @code{int}, @code{sin} and @code{rand}.
-* String Functions:: Functions for string manipulation,
- such as @code{split}, @code{match}, and @code{sprintf}.
-* I/O Functions:: Functions for files and shell commands.
-* Time Functions:: Functions for dealing with time stamps.
-@end menu
-
-@node Calling Built-in, Numeric Functions, Built-in, Built-in
-@section Calling Built-in Functions
-
-To call a built-in function, write the name of the function followed
-by arguments in parentheses. For example, @code{atan2(y + z, 1)}
-is a call to the function @code{atan2}, with two arguments.
-
-Whitespace is ignored between the built-in function name and the
-open-parenthesis, but we recommend that you avoid using whitespace
-there. User-defined functions do not permit whitespace in this way, and
-you will find it easier to avoid mistakes by following a simple
-convention which always works: no whitespace after a function name.
-
-Each built-in function accepts a certain number of arguments. In most
-cases, any extra arguments given to built-in functions are ignored. The
-defaults for omitted arguments vary from function to function and are
-described under the individual functions.
-
-When a function is called, expressions that create the function's actual
-parameters are evaluated completely before the function call is performed.
-For example, in the code fragment:
-
-@example
-i = 4
-j = sqrt(i++)
-@end example
-
-@noindent
-the variable @code{i} is set to 5 before @code{sqrt} is called
-with a value of 4 for its actual parameter.
-
-@node Numeric Functions, String Functions, Calling Built-in, Built-in
-@section Numeric Built-in Functions
-@c I didn't make all the examples small because a couple of them were
-@c short already. --mew 29jan1992
-
-Here is a full list of built-in functions that work with numbers:
-
-@table @code
-@item int(@var{x})
-This gives you the integer part of @var{x}, truncated toward 0. This
-produces the nearest integer to @var{x}, located between @var{x} and 0.
-
-For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)}
-is @minus{}3, and @code{int(-3)} is @minus{}3 as well.@refill
-
-@item sqrt(@var{x})
-This gives you the positive square root of @var{x}. It reports an error
-if @var{x} is negative. Thus, @code{sqrt(4)} is 2.@refill
-
-@item exp(@var{x})
-This gives you the exponential of @var{x}, or reports an error if
-@var{x} is out of range. The range of values @var{x} can have depends
-on your machine's floating point representation.@refill
-
-@item log(@var{x})
-This gives you the natural logarithm of @var{x}, if @var{x} is positive;
-otherwise, it reports an error.@refill
-
-@item sin(@var{x})
-This gives you the sine of @var{x}, with @var{x} in radians.
-
-@item cos(@var{x})
-This gives you the cosine of @var{x}, with @var{x} in radians.
-
-@item atan2(@var{y}, @var{x})
-This gives you the arctangent of @code{@var{y} / @var{x}} in radians.
-
-@item rand()
-This gives you a random number. The values of @code{rand} are
-uniformly-distributed between 0 and 1. The value is never 0 and never
-1.
-
-Often you want random integers instead. Here is a user-defined function
-you can use to obtain a random nonnegative integer less than @var{n}:
-
-@example
-function randint(n) @{
- return int(n * rand())
-@}
-@end example
-
-@noindent
-The multiplication produces a random real number greater than 0 and less
-than @var{n}. We then make it an integer (using @code{int}) between 0
-and @code{@var{n} @minus{} 1}.
-
-Here is an example where a similar function is used to produce
-random integers between 1 and @var{n}. Note that this program will
-print a new random number for each input record.
-
-@smallexample
-awk '
-# Function to roll a simulated die.
-function roll(n) @{ return 1 + int(rand() * n) @}
-
-# Roll 3 six-sided dice and print total number of points.
-@{
- printf("%d points\n", roll(6)+roll(6)+roll(6))
-@}'
-@end smallexample
-
-@strong{Note:} @code{rand} starts generating numbers from the same
-point, or @dfn{seed}, each time you run @code{awk}. This means that
-a program will produce the same results each time you run it.
-The numbers are random within one @code{awk} run, but predictable
-from run to run. This is convenient for debugging, but if you want
-a program to do different things each time it is used, you must change
-the seed to a value that will be different in each run. To do this,
-use @code{srand}.
-
-@item srand(@var{x})
-The function @code{srand} sets the starting point, or @dfn{seed},
-for generating random numbers to the value @var{x}.
-
-Each seed value leads to a particular sequence of ``random'' numbers.
-Thus, if you set the seed to the same value a second time, you will get
-the same sequence of ``random'' numbers again.
-
-If you omit the argument @var{x}, as in @code{srand()}, then the current
-date and time of day are used for a seed. This is the way to get random
-numbers that are truly unpredictable.
-
-The return value of @code{srand} is the previous seed. This makes it
-easy to keep track of the seeds for use in consistently reproducing
-sequences of random numbers.
-@end table
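-
-As a small illustration of the relationship between @code{srand} and
-@code{rand}, the following sketch seeds the generator with a fixed
-(and arbitrary) value, so it prints the same three numbers every time
-it is run; replacing @code{srand(42)} with @code{srand()} would instead
-give different numbers on each run:
-
-@smallexample
-awk 'BEGIN @{
-    srand(42)
-    for (i = 1; i <= 3; i++)
-        print rand()
-@}'
-@end smallexample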
-
-@node String Functions, I/O Functions, Numeric Functions, Built-in
-@section Built-in Functions for String Manipulation
-
-The functions in this section look at or change the text of one or more
-strings.
-
-@table @code
-@item index(@var{in}, @var{find})
-@findex match
-This searches the string @var{in} for the first occurrence of the string
-@var{find}, and returns the position in characters where that occurrence
-begins in the string @var{in}. For example:@refill
-
-@smallexample
-awk 'BEGIN @{ print index("peanut", "an") @}'
-@end smallexample
-
-@noindent
-prints @samp{3}. If @var{find} is not found, @code{index} returns 0.
-(Remember that string indices in @code{awk} start at 1.)
-
-@item length(@var{string})
-@findex length
-This gives you the number of characters in @var{string}. If
-@var{string} is a number, the length of the digit string representing
-that number is returned. For example, @code{length("abcde")} is 5. By
-contrast, @code{length(15 * 35)} works out to 3. How? Well, 15 * 35 =
-525, and 525 is then converted to the string @samp{"525"}, which has
-three characters.
-
-If no argument is supplied, @code{length} returns the length of @code{$0}.
-
-In older versions of @code{awk}, you could call the @code{length} function
-without any parentheses. Doing so is marked as ``deprecated'' in the
-@sc{posix} standard. This means that while you can do this in your
-programs, it is a feature that can eventually be removed from a future
-version of the standard. Therefore, for maximal portability of your
-@code{awk} programs you should always supply the parentheses.
-
-@item match(@var{string}, @var{regexp})
-@findex match
-The @code{match} function searches the string, @var{string}, for the
-longest, leftmost substring matched by the regular expression,
-@var{regexp}. It returns the character position, or @dfn{index}, of
-where that substring begins (1, if it starts at the beginning of
-@var{string}). If no match is found, it returns 0.
-
-@vindex RSTART
-@vindex RLENGTH
-The @code{match} function sets the built-in variable @code{RSTART} to
-the index. It also sets the built-in variable @code{RLENGTH} to the
-length in characters of the matched substring. If no match is found,
-@code{RSTART} is set to 0, and @code{RLENGTH} to @minus{}1.
-
-For example:
-
-@smallexample
-awk '@{
- if ($1 == "FIND")
- regex = $2
- else @{
- where = match($0, regex)
- if (where)
- print "Match of", regex, "found at", where, "in", $0
- @}
-@}'
-@end smallexample
-
-@noindent
-This program looks for lines that match the regular expression stored in
-the variable @code{regex}. This regular expression can be changed. If the
-first word on a line is @samp{FIND}, @code{regex} is changed to be the
-second word on that line. Therefore, given:
-
-@smallexample
-FIND fo*bar
-My program was a foobar
-But none of it would doobar
-FIND Melvin
-JF+KM
-This line is property of The Reality Engineering Co.
-This file created by Melvin.
-@end smallexample
-
-@noindent
-@code{awk} prints:
-
-@smallexample
-Match of fo*bar found at 18 in My program was a foobar
-Match of Melvin found at 26 in This file created by Melvin.
-@end smallexample
-
-@item split(@var{string}, @var{array}, @var{fieldsep})
-@findex split
-This divides @var{string} into pieces separated by @var{fieldsep},
-and stores the pieces in @var{array}. The first piece is stored in
-@code{@var{array}[1]}, the second piece in @code{@var{array}[2]}, and so
-forth. The string value of the third argument, @var{fieldsep}, is
-a regexp describing where to split @var{string} (much as @code{FS} can
-be a regexp describing where to split input records). If
-the @var{fieldsep} is omitted, the value of @code{FS} is used.
-@code{split} returns the number of elements created.@refill
-
-The @code{split} function, then, splits strings into pieces in a
-manner similar to the way input lines are split into fields. For example:
-
-@smallexample
-split("auto-da-fe", a, "-")
-@end smallexample
-
-@noindent
-splits the string @samp{auto-da-fe} into three fields using @samp{-} as the
-separator. It sets the contents of the array @code{a} as follows:
-
-@smallexample
-a[1] = "auto"
-a[2] = "da"
-a[3] = "fe"
-@end smallexample
-
-@noindent
-The value returned by this call to @code{split} is 3.
-
-As with input field-splitting, when the value of @var{fieldsep} is
-@code{" "}, leading and trailing whitespace is ignored, and the elements
-are separated by runs of whitespace.
-
-@item sprintf(@var{format}, @var{expression1},@dots{})
-@findex sprintf
-This returns (without printing) the string that @code{printf} would
-have printed out with the same arguments
-(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).
-For example:@refill
-
-@smallexample
-sprintf("pi = %.2f (approx.)", 22/7)
-@end smallexample
-
-@noindent
-returns the string @w{@code{"pi = 3.14 (approx.)"}}.
-
-@item sub(@var{regexp}, @var{replacement}, @var{target})
-@findex sub
-The @code{sub} function alters the value of @var{target}.
-It searches this value, which should be a string, for the
-leftmost substring matched by the regular expression, @var{regexp},
-extending this match as far as possible. Then the entire string is
-changed by replacing the matched text with @var{replacement}.
-The modified string becomes the new value of @var{target}.
-
-This function is peculiar because @var{target} is not simply
-used to compute a value, and not just any expression will do: it
-must be a variable, field or array reference, so that @code{sub} can
-store a modified value there. If this argument is omitted, then the
-default is to use and alter @code{$0}.
-
-For example:@refill
-
-@smallexample
-str = "water, water, everywhere"
-sub(/at/, "ith", str)
-@end smallexample
-
-@noindent
-sets @code{str} to @w{@code{"wither, water, everywhere"}}, by replacing the
-leftmost, longest occurrence of @samp{at} with @samp{ith}.
-
-The @code{sub} function returns the number of substitutions made (either
-one or zero).
-
-If the special character @samp{&} appears in @var{replacement}, it
-stands for the precise substring that was matched by @var{regexp}. (If
-the regexp can match more than one string, then this precise substring
-may vary.) For example:@refill
-
-@smallexample
-awk '@{ sub(/candidate/, "& and his wife"); print @}'
-@end smallexample
-
-@noindent
-changes the first occurrence of @samp{candidate} to @samp{candidate
-and his wife} on each input line.
-
-Here is another example:
-
-@smallexample
-awk 'BEGIN @{
- str = "daabaaa"
- sub(/a*/, "c&c", str)
- print str
-@}'
-@end smallexample
-
-@noindent
-prints @samp{dcaacbaaa}. This shows how @samp{&} can represent a non-constant
-string, and also illustrates the ``leftmost, longest'' rule.
-
-The effect of this special character (@samp{&}) can be turned off by putting a
-backslash before it in the string. As usual, to insert one backslash in
-the string, you must write two backslashes. Therefore, write @samp{\\&}
-in a string constant to include a literal @samp{&} in the replacement.
-For example, here is how to replace the first @samp{|} on each line with
-an @samp{&}:@refill
-
-@smallexample
-awk '@{ sub(/\|/, "\\&"); print @}'
-@end smallexample
-
-@strong{Note:} as mentioned above, the third argument to @code{sub} must
-be an lvalue. Some versions of @code{awk} allow the third argument to
-be an expression which is not an lvalue. In such a case, @code{sub}
-would still search for the pattern and return 0 or 1, but the result of
-the substitution (if any) would be thrown away because there is no place
-to put it. Such versions of @code{awk} accept expressions like
-this:@refill
-
-@smallexample
-sub(/USA/, "United States", "the USA and Canada")
-@end smallexample
-
-@noindent
-But that is considered erroneous in @code{gawk}.
-
-@item gsub(@var{regexp}, @var{replacement}, @var{target})
-@findex gsub
-This is similar to the @code{sub} function, except @code{gsub} replaces
-@emph{all} of the longest, leftmost, @emph{nonoverlapping} matching
-substrings it can find. The @samp{g} in @code{gsub} stands for
-``global,'' which means replace everywhere. For example:@refill
-
-@smallexample
-awk '@{ gsub(/Britain/, "United Kingdom"); print @}'
-@end smallexample
-
-@noindent
-replaces all occurrences of the string @samp{Britain} with @samp{United
-Kingdom} for all input records.@refill
-
-The @code{gsub} function returns the number of substitutions made. If
-the variable to be searched and altered, @var{target}, is
-omitted, then the entire input record, @code{$0}, is used.@refill
-
-As in @code{sub}, the characters @samp{&} and @samp{\} are special, and
-the third argument must be an lvalue.
-
-@item substr(@var{string}, @var{start}, @var{length})
-@findex substr
-This returns a @var{length}-character-long substring of @var{string},
-starting at character number @var{start}. The first character of a
-string is character number one. For example,
-@code{substr("washington", 5, 3)} returns @code{"ing"}.@refill
-
-If @var{length} is not present, this function returns the whole suffix of
-@var{string} that begins at character number @var{start}. For example,
-@code{substr("washington", 5)} returns @code{"ington"}. This is also
-the case if @var{length} is greater than the number of characters remaining
-in the string, counting from character number @var{start}.
-
-@item tolower(@var{string})
-@findex tolower
-This returns a copy of @var{string}, with each upper-case character
-in the string replaced with its corresponding lower-case character.
-Nonalphabetic characters are left unchanged. For example,
-@code{tolower("MiXeD cAsE 123")} returns @code{"mixed case 123"}.
-
-@item toupper(@var{string})
-@findex toupper
-This returns a copy of @var{string}, with each lower-case character
-in the string replaced with its corresponding upper-case character.
-Nonalphabetic characters are left unchanged. For example,
-@code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}.
-@end table
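-
-These functions are often used in combination. For example, since
-@code{match} sets @code{RSTART} and @code{RLENGTH}, the text that
-actually matched a regexp can be extracted with @code{substr}. Here is
-a small sketch of that idiom; the regexp is only illustrative:
-
-@smallexample
-awk '@{
-    if (match($0, /[0-9]+/))
-        print "first number on this line:", substr($0, RSTART, RLENGTH)
-@}'
-@end smallexample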
-
-@node I/O Functions, Time Functions, String Functions, Built-in
-@section Built-in Functions for Input/Output
-
-@table @code
-@item close(@var{filename})
-Close the file @var{filename}, for input or output. The argument may
-alternatively be a shell command that was used for redirecting to or
-from a pipe; then the pipe is closed.
-
-@xref{Close Input, ,Closing Input Files and Pipes}, regarding closing
-input files and pipes. @xref{Close Output, ,Closing Output Files and Pipes},
-regarding closing output files and pipes.@refill
-
-@item system(@var{command})
-@findex system
-@c the following index entry is an overfull hbox. --mew 30jan1992
-@cindex interaction, @code{awk} and other programs
-The @code{system} function allows the user to execute operating system commands
-and then return to the @code{awk} program. The @code{system} function
-executes the command given by the string @var{command}. It returns, as
-its value, the status returned by the command that was executed.
-
-For example, if the following fragment of code is put in your @code{awk}
-program:
-
-@smallexample
-END @{
- system("mail -s 'awk run done' operator < /dev/null")
-@}
-@end smallexample
-
-@noindent
-the system operator will be sent mail when the @code{awk} program
-finishes processing input and begins its end-of-input processing.
-
-Note that much the same result can be obtained by redirecting
-@code{print} or @code{printf} into a pipe. However, if your @code{awk}
-program is interactive, @code{system} is useful for cranking up large
-self-contained programs, such as a shell or an editor.@refill
-
-Some operating systems cannot implement the @code{system} function.
-@code{system} causes a fatal error if it is not supported.
-@end table
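-
-As a small illustration of @code{close}, here is a sketch that sends
-output through a @code{sort} pipeline, then closes the pipe in the
-@code{END} rule so that all of @code{sort}'s output has been written
-before the final line is printed:
-
-@smallexample
-@{ print $1 | "sort" @}
-
-END @{
-    close("sort")
-    print "--- end of sorted first fields ---"
-@}
-@end smallexample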
-
-@c fakenode --- for prepinfo
-@subheading Controlling Output Buffering with @code{system}
-@cindex flushing buffers
-@cindex buffers, flushing
-@cindex buffering output
-@cindex output, buffering
-
-Many utility programs will @dfn{buffer} their output; they save information
-to be written to a disk file or terminal in memory, until there is enough
-to be written in one operation. This is often more efficient than writing
-every little bit of information as soon as it is ready. However, sometimes
-it is necessary to force a program to @dfn{flush} its buffers; that is,
-write the information to its destination, even if a buffer is not full.
-You can do this from your @code{awk} program by calling @code{system}
-with a null string as its argument:
-
-@example
-system("") # flush output
-@end example
-
-@noindent
-@code{gawk} treats this use of the @code{system} function as a special
-case, and is smart enough not to run a shell (or other command
-interpreter) with the empty command. Therefore, with @code{gawk}, this
-idiom is not only useful, it is efficient. While this idiom should work
-with other @code{awk} implementations, it will not necessarily avoid
-starting an unnecessary shell.
-@ignore
-Need a better explanation, perhaps in a separate paragraph. Explain that
-for
-
-awk 'BEGIN { print "hi"
- system("echo hello")
- print "howdy" }'
-
-that the output had better be
-
- hi
- hello
- howdy
-
-and not
-
- hello
- hi
- howdy
-
-which it would be if awk did not flush its buffers before calling system.
-@end ignore
-
-@node Time Functions, , I/O Functions, Built-in
-@section Functions for Dealing with Time Stamps
-
-@cindex time stamps
-@cindex time of day
-A common use for @code{awk} programs is the processing of log files.
-Log files often contain time stamp information, indicating when a
-particular log record was written. Many programs log their time stamp
-in the form returned by the @code{time} system call, which is the
-number of seconds since a particular epoch. On @sc{posix} systems,
-it is the number of seconds since Midnight, January 1, 1970, @sc{utc}.
-
-In order to make it easier to process such log files, and to easily produce
-useful reports, @code{gawk} provides two functions for working with time
-stamps. Both of these are @code{gawk} extensions; they are not specified
-in the @sc{posix} standard, nor are they in any other known version
-of @code{awk}.
-
-@table @code
-@item systime()
-@findex systime
-This function returns the current time as the number of seconds since
-the system epoch. On @sc{posix} systems, this is the number of seconds
-since Midnight, January 1, 1970, @sc{utc}. It may be a different number on
-other systems.
-
-@item strftime(@var{format}, @var{timestamp})
-@findex strftime
-This function returns a string. It is similar to the function of the
-same name in the @sc{ansi} C standard library. The time specified by
-@var{timestamp} is used to produce a string, based on the contents
-of the @var{format} string.
-@end table
-
-The @code{systime} function allows you to compare a time stamp from a
-log file with the current time of day. In particular, it is easy to
-determine how long ago a particular record was logged. It also allows
-you to produce log records using the ``seconds since the epoch'' format.
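-
-For example, assuming that the first field of each input record holds a
-time stamp in this ``seconds since the epoch'' form, the following
-sketch reports how long ago each record was logged:
-
-@smallexample
-@{ print "logged", systime() - $1, "seconds ago:", $0 @}
-@end smallexample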
-
-The @code{strftime} function allows you to easily turn a time stamp
-into human-readable information. It is similar in nature to the @code{sprintf}
-function, copying non-format specification characters verbatim to the
-returned string, and substituting date and time values for format
-specifications in the @var{format} string. If no @var{timestamp} argument
-is supplied, @code{gawk} will use the current time of day as the
-time stamp.@refill
-
-@code{strftime} is guaranteed by the @sc{ansi} C standard to support
-the following date format specifications:
-
-@table @code
-@item %a
-The locale's abbreviated weekday name.
-
-@item %A
-The locale's full weekday name.
-
-@item %b
-The locale's abbreviated month name.
-
-@item %B
-The locale's full month name.
-
-@item %c
-The locale's ``appropriate'' date and time representation.
-
-@item %d
-The day of the month as a decimal number (01--31).
-
-@item %H
-The hour (24-hour clock) as a decimal number (00--23).
-
-@item %I
-The hour (12-hour clock) as a decimal number (01--12).
-
-@item %j
-The day of the year as a decimal number (001--366).
-
-@item %m
-The month as a decimal number (01--12).
-
-@item %M
-The minute as a decimal number (00--59).
-
-@item %p
-The locale's equivalent of the AM/PM designations associated
-with a 12-hour clock.
-
-@item %S
-The second as a decimal number (00--61). (Occasionally there are
-minutes in a year with one or two leap seconds, which is why the
-seconds can go from 0 all the way to 61.)
-
-@item %U
-The week number of the year (the first Sunday as the first day of week 1)
-as a decimal number (00--53).
-
-@item %w
-The weekday as a decimal number (0--6). Sunday is day 0.
-
-@item %W
-The week number of the year (the first Monday as the first day of week 1)
-as a decimal number (00--53).
-
-@item %x
-The locale's ``appropriate'' date representation.
-
-@item %X
-The locale's ``appropriate'' time representation.
-
-@item %y
-The year without century as a decimal number (00--99).
-
-@item %Y
-The year with century as a decimal number.
-
-@item %Z
-The time zone name or abbreviation, or no characters if
-no time zone is determinable.
-
-@item %%
-A literal @samp{%}.
-@end table
-
-@c The parenthetical remark here should really be a footnote, but
-@c it gave formatting problems at the FSF. So for now put it in
-@c parentheses.
-If a conversion specifier is not one of the above, the behavior is
-undefined. (This is because the @sc{ansi} standard for C leaves the
-behavior of the C version of @code{strftime} undefined, and @code{gawk}
-will use the system's version of @code{strftime} if it's there.
-Typically, the conversion specifier will either not appear in the
-returned string, or it will appear literally.)
-
-Informally, a @dfn{locale} is the geographic place in which a program
-is meant to run. For example, a common way to abbreviate the date
-September 4, 1991 in the United States would be ``9/4/91''.
-In many countries in Europe, however, it would be abbreviated ``4.9.91''.
-Thus, the @samp{%x} specification in a @code{"US"} locale might produce
-@samp{9/4/91}, while in a @code{"EUROPE"} locale, it might produce
-@samp{4.9.91}. The @sc{ansi} C standard defines a default @code{"C"}
-locale, which is an environment that is typical of what most C programmers
-are used to.
-
-A public-domain C version of @code{strftime} is shipped with @code{gawk}
-for systems that are not yet fully @sc{ansi}-compliant. If that version is
-used to compile @code{gawk} (@pxref{Installation, ,Installing @code{gawk}}),
-then the following additional format specifications are available:@refill
-
-@table @code
-@item %D
-Equivalent to specifying @samp{%m/%d/%y}.
-
-@item %e
-The day of the month, padded with a blank if it is only one digit.
-
-@item %h
-Equivalent to @samp{%b}, above.
-
-@item %n
-A newline character (ASCII LF).
-
-@item %r
-Equivalent to specifying @samp{%I:%M:%S %p}.
-
-@item %R
-Equivalent to specifying @samp{%H:%M}.
-
-@item %T
-Equivalent to specifying @samp{%H:%M:%S}.
-
-@item %t
-A TAB character.
-
-@item %k
-The hour (24-hour clock) as a decimal number (0--23).
-Single digit numbers are padded with a blank.
-
-@item %l
-The hour (12-hour clock) as a decimal number (1--12).
-Single digit numbers are padded with a blank.
-
-@item %C
-The century, as a number between 00 and 99.
-
-@item %u
-The weekday as a decimal number [1 (Monday)--7].
-
-@item %V
-The week number of the year (the first Monday as the first
-day of week 1) as a decimal number (01--53).
-The method for determining the week number is as specified by ISO 8601
-(to wit: if the week containing January 1 has four or more days in the
-new year, then it is week 1, otherwise it is week 53 of the previous year
-and the next week is week 1).@refill
-
-@item %Ec %EC %Ex %Ey %EY %Od %Oe %OH %OI
-@itemx %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy
-These are ``alternate representations'' for the specifications
-that use only the second letter (@samp{%c}, @samp{%C}, and so on).
-They are recognized, but their normal representations are used.
-(These facilitate compliance with the @sc{posix} @code{date}
-utility.)@refill
-
-@item %v
-The date in VMS format (e.g. 20-JUN-1991).
-@end table
-
-Here are two examples that use @code{strftime}. The first is an
-@code{awk} version of the C @code{ctime} function. (This is a
-user-defined function, which we have not discussed yet.
-@xref{User-defined, ,User-defined Functions}, for more information.)
-
-@smallexample
-# ctime.awk
-#
-# awk version of C ctime(3) function
-
-function ctime(ts, format)
-@{
- format = "%a %b %e %H:%M:%S %Z %Y"
- if (ts == 0)
- ts = systime() # use current time as default
- return strftime(format, ts)
-@}
-@end smallexample
-
-This next example is an @code{awk} implementation of the @sc{posix}
-@code{date} utility. Normally, the @code{date} utility prints the
-current date and time of day in a well known format. However, if you
-provide an argument to it that begins with a @samp{+}, @code{date}
-will copy non-format specifier characters to the standard output, and
-will interpret the current time according to the format specifiers in
-the string. For example:
-
-@smallexample
-date '+Today is %A, %B %d, %Y.'
-@end smallexample
-
-@noindent
-might print
-
-@smallexample
-Today is Thursday, July 11, 1991.
-@end smallexample
-
-Here is the @code{awk} version of the @code{date} utility.
-
-@smallexample
-#! /usr/bin/gawk -f
-#
-# date --- implement the P1003.2 Draft 11 'date' command
-#
-# Bug: does not recognize the -u argument.
-
-BEGIN \
-@{
- format = "%a %b %e %H:%M:%S %Z %Y"
- exitval = 0
-
- if (ARGC > 2)
- exitval = 1
- else if (ARGC == 2) @{
- format = ARGV[1]
- if (format ~ /^\+/)
- format = substr(format, 2) # remove leading +
- @}
- print strftime(format)
- exit exitval
-@}
-@end smallexample
-
-@node User-defined, Built-in Variables, Built-in, Top
-@chapter User-defined Functions
-
-@cindex user-defined functions
-@cindex functions, user-defined
-Complicated @code{awk} programs can often be simplified by defining
-your own functions. User-defined functions can be called just like
-built-in ones (@pxref{Function Calls}), but it is up to you to define
-them---to tell @code{awk} what they should do.
-
-@menu
-* Definition Syntax:: How to write definitions and what they mean.
-* Function Example:: An example function definition and
- what it does.
-* Function Caveats:: Things to watch out for.
-* Return Statement:: Specifying the value a function returns.
-@end menu
-
-@node Definition Syntax, Function Example, User-defined, User-defined
-@section Syntax of Function Definitions
-@cindex defining functions
-@cindex function definition
-
-Definitions of functions can appear anywhere between the rules of the
-@code{awk} program. Thus, the general form of an @code{awk} program is
-extended to include sequences of rules @emph{and} user-defined function
-definitions.
-
-The definition of a function named @var{name} looks like this:
-
-@example
-function @var{name} (@var{parameter-list}) @{
- @var{body-of-function}
-@}
-@end example
-
-@noindent
-@var{name} is the name of the function to be defined. A valid function
-name is like a valid variable name: a sequence of letters, digits and
-underscores, not starting with a digit. Functions share the same pool
-of names as variables and arrays.
-
-@var{parameter-list} is a list of the function's arguments and local
-variable names, separated by commas. When the function is called,
-the argument names are used to hold the argument values given in
-the call. The local variables are initialized to the null string.
-
-The @var{body-of-function} consists of @code{awk} statements. It is the
-most important part of the definition, because it says what the function
-should actually @emph{do}. The argument names exist to give the body a
-way to talk about the arguments; local variables, to give the body
-places to keep temporary values.
-
-Argument names are not distinguished syntactically from local variable
-names; instead, the number of arguments supplied when the function is
-called determines how many argument variables there are. Thus, if three
-argument values are given, the first three names in @var{parameter-list}
-are arguments, and the rest are local variables.
-
-It follows that if the number of arguments is not the same in all calls
-to the function, some of the names in @var{parameter-list} may be
-arguments on some occasions and local variables on others. Another
-way to think of this is that omitted arguments default to the
-null string.
-
-Usually when you write a function you know how many names you intend to
-use for arguments and how many you intend to use as locals. By
-convention, you should write an extra space between the arguments and
-the locals, so other people can follow how your function is
-supposed to be used.
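-
-For example, a function intended to be called with two arguments, and
-which uses two extra names as local variables, might be declared like
-this sketch (the names are only illustrative):
-
-@example
-function count_matches(str, regex,    n, i)
-@{
-    @dots{}
-@}
-@end example
-
-@noindent
-The extra space before @code{n} shows where the arguments end and the
-local variables begin.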
-
-During execution of the function body, the arguments and local variable
-values hide or @dfn{shadow} any variables of the same names used in the
-rest of the program. The shadowed variables are not accessible in the
-function definition, because there is no way to name them while their
-names have been taken away for the local variables. All other variables
-used in the @code{awk} program can be referenced or set normally in the
-function definition.
-
-The arguments and local variables last only as long as the function body
-is executing. Once the body finishes, the shadowed variables come back.
-
-The function body can contain expressions which call functions. They
-can even call this function, either directly or by way of another
-function. When this happens, we say the function is @dfn{recursive}.
-
-There is no need in @code{awk} to put the definition of a function
-before all uses of the function. This is because @code{awk} reads the
-entire program before starting to execute any of it.
-
-In many @code{awk} implementations, the keyword @code{function} may be
-abbreviated @code{func}. However, @sc{posix} only specifies the use of
-the keyword @code{function}. This actually has some practical implications.
-If @code{gawk} is in @sc{posix}-compatibility mode
-(@pxref{Command Line, ,Invoking @code{awk}}), then the following
-statement will @emph{not} define a function:@refill
-
-@example
-func foo() @{ a = sqrt($1) ; print a @}
-@end example
-
-@noindent
-Instead it defines a rule that, for each record, concatenates the value
-of the variable @samp{func} with the return value of the function @samp{foo},
-and based on the truth value of the result, executes the corresponding action.
-This is probably not what was desired. (@code{awk} accepts this input as
-syntactically valid, since functions may be used before they are defined
-in @code{awk} programs.)
-
-@node Function Example, Function Caveats, Definition Syntax, User-defined
-@section Function Definition Example
-
-Here is an example of a user-defined function, called @code{myprint}, that
-takes a number and prints it in a specific format.
-
-@example
-function myprint(num)
-@{
- printf "%6.3g\n", num
-@}
-@end example
-
-@noindent
-To illustrate, here is an @code{awk} rule which uses our @code{myprint}
-function:
-
-@example
-$3 > 0 @{ myprint($3) @}
-@end example
-
-@noindent
-This program prints, in our special format, all the third fields that
-contain a positive number in our input. Therefore, when given:
-
-@example
- 1.2 3.4 5.6 7.8
- 9.10 11.12 -13.14 15.16
-17.18 19.20 21.22 23.24
-@end example
-
-@noindent
-this program, using our function to format the results, prints:
-
-@example
- 5.6
- 21.2
-@end example
-
-Here is a rather contrived example of a recursive function. It prints a
-string backwards:
-
-@example
-function rev (str, len) @{
- if (len == 0) @{
- printf "\n"
- return
- @}
- printf "%c", substr(str, len, 1)
- rev(str, len - 1)
-@}
-@end example
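-
-To print each input line backwards using this function, a rule such as
-the following sketch would do:
-
-@example
-@{ rev($0, length($0)) @}
-@end example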
-
-@node Function Caveats, Return Statement, Function Example, User-defined
-@section Calling User-defined Functions
-
-@dfn{Calling a function} means causing the function to run and do its job.
-A function call is an expression, and its value is the value returned by
-the function.
-
-A function call consists of the function name followed by the arguments
-in parentheses. What you write in the call for the arguments are
-@code{awk} expressions; each time the call is executed, these
-expressions are evaluated, and the values are the actual arguments. For
-example, here is a call to @code{foo} with three arguments (the first
-being a string concatenation):
-
-@example
-foo(x y, "lose", 4 * z)
-@end example
-
-@quotation
-@strong{Caution:} whitespace characters (spaces and tabs) are not allowed
-between the function name and the open-parenthesis of the argument list.
-If you write whitespace by mistake, @code{awk} might think that you mean
-to concatenate a variable with an expression in parentheses. However, it
-notices that you used a function name and not a variable name, and reports
-an error.
-@end quotation
-
-@cindex call by value
-When a function is called, it is given a @emph{copy} of the values of
-its arguments. This is called @dfn{call by value}. The caller may use
-a variable as the expression for the argument, but the called function
-does not know this: it only knows what value the argument had. For
-example, if you write this code:
-
-@example
-foo = "bar"
-z = myfunc(foo)
-@end example
-
-@noindent
-then you should not think of the argument to @code{myfunc} as being
-``the variable @code{foo}.'' Instead, think of the argument as the
-string value, @code{"bar"}.
-
-If the function @code{myfunc} alters the values of its local variables,
-this has no effect on any other variables. In particular, if @code{myfunc}
-does this:
-
-@example
-function myfunc (win) @{
- print win
- win = "zzz"
- print win
-@}
-@end example
-
-@noindent
-to change its first argument variable @code{win}, this @emph{does not}
-change the value of @code{foo} in the caller. The role of @code{foo} in
-calling @code{myfunc} ended when its value, @code{"bar"}, was computed.
-If @code{win} also exists outside of @code{myfunc}, the function body
-cannot alter this outer value, because it is shadowed during the
-execution of @code{myfunc} and cannot be seen or changed from there.
-
-@cindex call by reference
-However, when arrays are the parameters to functions, they are @emph{not}
-copied. Instead, the array itself is made available for direct manipulation
-by the function. This is usually called @dfn{call by reference}.
-Changes made to an array parameter inside the body of a function @emph{are}
-visible outside that function.
-@ifinfo
-This can be @strong{very} dangerous if you do not watch what you are
-doing. For example:@refill
-@end ifinfo
-@iftex
-@emph{This can be very dangerous if you do not watch what you are
-doing.} For example:@refill
-@end iftex
-
-@example
-function changeit (array, ind, nvalue) @{
- array[ind] = nvalue
-@}
-
-BEGIN @{
- a[1] = 1 ; a[2] = 2 ; a[3] = 3
- changeit(a, 2, "two")
- printf "a[1] = %s, a[2] = %s, a[3] = %s\n", a[1], a[2], a[3]
- @}
-@end example
-
-@noindent
-prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because calling
-@code{changeit} stores @code{"two"} in the second element of @code{a}.
-
-@node Return Statement, , Function Caveats, User-defined
-@section The @code{return} Statement
-@cindex @code{return} statement
-
-The body of a user-defined function can contain a @code{return} statement.
-This statement returns control to the rest of the @code{awk} program. It
-can also be used to return a value for use in the rest of the @code{awk}
-program. It looks like this:@refill
-
-@example
-return @var{expression}
-@end example
-
-The @var{expression} part is optional. If it is omitted, then the returned
-value is undefined and, therefore, unpredictable.
-
-A @code{return} statement with no value expression is assumed at the end of
-every function definition. So if control reaches the end of the function
-body, then the function returns an unpredictable value. @code{awk}
-will not warn you if you use the return value of such a function; you will
-simply get unpredictable or unexpected results.
-
-Here is an example of a user-defined function that returns the value
-of the largest number among the elements of an array:@refill
-
-@example
-@group
-function maxelt (vec, i, ret) @{
- for (i in vec) @{
- if (ret == "" || vec[i] > ret)
- ret = vec[i]
- @}
- return ret
-@}
-@end group
-@end example
-
-@noindent
-You call @code{maxelt} with one argument, which is an array name. The local
-variables @code{i} and @code{ret} are not intended to be arguments;
-while there is nothing to stop you from passing two or three arguments
-to @code{maxelt}, the results would be strange. The extra space before
-@code{i} in the function parameter list is to indicate that @code{i} and
-@code{ret} are not supposed to be arguments. This is a convention which
-you should follow when you define functions.
-
-Here is a program that uses our @code{maxelt} function. It loads an
-array, calls @code{maxelt}, and then reports the maximum number in that
-array:@refill
-
-@example
-@group
-awk '
-function maxelt (vec, i, ret) @{
- for (i in vec) @{
- if (ret == "" || vec[i] > ret)
- ret = vec[i]
- @}
- return ret
-@}
-@end group
-
-@group
-# Load all fields of each record into nums.
-@{
- for(i = 1; i <= NF; i++)
- nums[NR, i] = $i
-@}
-
-END @{
- print maxelt(nums)
-@}'
-@end group
-@end example
-
-Given the following input:
-
-@example
-@group
- 1 5 23 8 16
-44 3 5 2 8 26
-256 291 1396 2962 100
--6 467 998 1101
-99385 11 0 225
-@end group
-@end example
-
-@noindent
-our program tells us (predictably) that:
-
-@example
-99385
-@end example
-
-@noindent
-is the largest number in our array.
-
-@node Built-in Variables, Command Line, User-defined, Top
-@chapter Built-in Variables
-@cindex built-in variables
-
-Most @code{awk} variables are available for you to use for your own
-purposes; they never change except when your program assigns values to
-them, and never affect anything except when your program examines them.
-
-A few variables have special built-in meanings. Some of them @code{awk}
-examines automatically, so that they enable you to tell @code{awk} how
-to do certain things. Others are set automatically by @code{awk}, so
-that they carry information from the internal workings of @code{awk} to
-your program.
-
-This chapter documents all the built-in variables of @code{gawk}. Most
-of them are also documented in the chapters where their areas of
-activity are described.
-
-@menu
-* User-modified:: Built-in variables that you change
- to control @code{awk}.
-* Auto-set:: Built-in variables where @code{awk}
- gives you information.
-@end menu
-
-@node User-modified, Auto-set, Built-in Variables, Built-in Variables
-@section Built-in Variables that Control @code{awk}
-@cindex built-in variables, user modifiable
-
-This is a list of the variables which you can change to control how
-@code{awk} does certain things.
-
-@table @code
-@iftex
-@vindex CONVFMT
-@end iftex
-@item CONVFMT
-This string is used by @code{awk} to control conversion of numbers to
-strings (@pxref{Conversion, ,Conversion of Strings and Numbers}).
-It works by being passed, in effect, as the first argument to the
-@code{sprintf} function. Its default value is @code{"%.6g"}.
-@code{CONVFMT} was introduced by the @sc{posix} standard.@refill
-
-@iftex
-@vindex FIELDWIDTHS
-@end iftex
-@item FIELDWIDTHS
-This is a space separated list of columns that tells @code{gawk}
-how to manage input with fixed, columnar boundaries. It is an
-experimental feature that is still evolving. Assigning to @code{FIELDWIDTHS}
-overrides the use of @code{FS} for field splitting.
-@xref{Constant Size, ,Reading Fixed-width Data}, for more information.@refill
-
-If @code{gawk} is in compatibility mode
-(@pxref{Command Line, ,Invoking @code{awk}}), then @code{FIELDWIDTHS}
-has no special meaning, and field splitting operations are done based
-exclusively on the value of @code{FS}.@refill
-
-@iftex
-@vindex FS
-@end iftex
-@item FS
-@code{FS} is the input field separator
-(@pxref{Field Separators, ,Specifying how Fields are Separated}).
-The value is a single-character string or a multi-character regular
-expression that matches the separations between fields in an input
-record.@refill
-
-The default value is @w{@code{" "}}, a string consisting of a single
-space. As a special exception, this value actually means that any
-sequence of spaces and tabs is a single separator. It also causes
-spaces and tabs at the beginning or end of a line to be ignored.
-
-You can set the value of @code{FS} on the command line using the
-@samp{-F} option:
-
-@example
-awk -F, '@var{program}' @var{input-files}
-@end example
-
-If @code{gawk} is using @code{FIELDWIDTHS} for field-splitting,
-assigning a value to @code{FS} will cause @code{gawk} to return to
-the normal, regexp-based, field splitting.
-
-@item IGNORECASE
-@iftex
-@vindex IGNORECASE
-@end iftex
-If @code{IGNORECASE} is nonzero, then @emph{all} regular expression
-matching is done in a case-independent fashion. In particular, regexp
-matching with @samp{~} and @samp{!~}, and the @code{gsub}, @code{index},
-@code{match}, @code{split} and @code{sub} functions all ignore case when
-doing their particular regexp operations. @strong{Note:} since field
-splitting with the value of the @code{FS} variable is also a regular
-expression operation, that too is done with case ignored.
-@xref{Case-sensitivity, ,Case-sensitivity in Matching}.
-
-If @code{gawk} is in compatibility mode
-(@pxref{Command Line, ,Invoking @code{awk}}), then @code{IGNORECASE} has
-no special meaning, and regexp operations are always case-sensitive.@refill
-
-@item OFMT
-@iftex
-@vindex OFMT
-@end iftex
-This string is used by @code{awk} to control conversion of numbers to
-strings (@pxref{Conversion, ,Conversion of Strings and Numbers}) for
-printing with the @code{print} statement.
-It works by being passed, in effect, as the first argument to the
-@code{sprintf} function. Its default value is @code{"%.6g"}.
-Earlier versions of @code{awk} also used @code{OFMT} to specify the
-format for converting numbers to strings in general expressions; this
-has been taken over by @code{CONVFMT}.@refill
-
-@item OFS
-@iftex
-@vindex OFS
-@end iftex
-This is the output field separator (@pxref{Output Separators}). It is
-output between the fields output by a @code{print} statement. Its
-default value is @w{@code{" "}}, a string consisting of a single space.
-
-@item ORS
-@iftex
-@vindex ORS
-@end iftex
-This is the output record separator. It is output at the end of every
-@code{print} statement. Its default value is a string containing a
-single newline character, which could be written as @code{"\n"}.
-(@xref{Output Separators}.)@refill
-
-@item RS
-@iftex
-@vindex RS
-@end iftex
-This is @code{awk}'s input record separator. Its default value is a string
-containing a single newline character, which means that an input record
-consists of a single line of text.
-(@xref{Records, ,How Input is Split into Records}.)@refill
-
-@item SUBSEP
-@iftex
-@vindex SUBSEP
-@end iftex
-@code{SUBSEP} is the subscript separator. It has the default value of
-@code{"\034"}, and is used to separate the parts of the name of a
-multi-dimensional array. Thus, if you access @code{foo[12,3]}, it
-really accesses @code{foo["12\0343"]}
-(@pxref{Multi-dimensional, ,Multi-dimensional Arrays}).@refill
-@end table
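-
-As a small illustration of how these variables interact, here is a
-sketch that joins the first two fields of each record with a colon and
-separates output records with a blank line:
-
-@smallexample
-awk 'BEGIN @{ OFS = ":"; ORS = "\n\n" @}
-     @{ print $1, $2 @}' @var{input-files}
-@end smallexample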
-
-@node Auto-set, , User-modified, Built-in Variables
-@section Built-in Variables that Convey Information
-
-This is a list of the variables that are set automatically by @code{awk}
-on certain occasions so as to provide information to your program.
-
-@table @code
-@item ARGC
-@itemx ARGV
-@iftex
-@vindex ARGC
-@vindex ARGV
-@end iftex
-The command-line arguments available to @code{awk} programs are stored in
-an array called @code{ARGV}. @code{ARGC} is the number of command-line
-arguments present. @xref{Command Line, ,Invoking @code{awk}}.
-@code{ARGV} is indexed from zero to @w{@code{ARGC - 1}}. For example:@refill
-
-@example
-awk 'BEGIN @{
- for (i = 0; i < ARGC; i++)
- print ARGV[i]
- @}' inventory-shipped BBS-list
-@end example
-
-@noindent
-In this example, @code{ARGV[0]} contains @code{"awk"}, @code{ARGV[1]}
-contains @code{"inventory-shipped"}, and @code{ARGV[2]} contains
-@code{"BBS-list"}. The value of @code{ARGC} is 3, one more than the
-index of the last element in @code{ARGV} since the elements are numbered
-from zero.@refill
-
-The names @code{ARGC} and @code{ARGV}, as well as the convention of indexing
-the array from 0 to @w{@code{ARGC - 1}}, are derived from the C language's
-method of accessing command line arguments.@refill
-
-Notice that the @code{awk} program is not entered in @code{ARGV}. The
-other special command line options, with their arguments, are also not
-entered. But variable assignments on the command line @emph{are}
-treated as arguments, and do show up in the @code{ARGV} array.
-
-Your program can alter @code{ARGC} and the elements of @code{ARGV}.
-Each time @code{awk} reaches the end of an input file, it uses the next
-element of @code{ARGV} as the name of the next input file. By storing a
-different string there, your program can change which files are read.
-You can use @code{"-"} to represent the standard input. By storing
-additional elements and incrementing @code{ARGC} you can cause
-additional files to be read.
-
-If you decrease the value of @code{ARGC}, that eliminates input files
-from the end of the list. By recording the old value of @code{ARGC}
-elsewhere, your program can treat the eliminated arguments as
-something other than file names.
-
-To eliminate a file from the middle of the list, store the null string
-(@code{""}) into @code{ARGV} in place of the file's name. As a
-special feature, @code{awk} ignores file names that have been
-replaced with the null string.
-
-@ignore
-see getopt.awk in the examples...
-@end ignore
-
-@item ARGIND
-@vindex ARGIND
-The index in @code{ARGV} of the current file being processed.
-Every time @code{gawk} opens a new data file for processing, it sets
-@code{ARGIND} to the index in @code{ARGV} of the file name. Thus, the
-condition @samp{FILENAME == ARGV[ARGIND]} is always true.
-
-This variable is useful in file processing; it allows you to tell how far
-along you are in the list of data files, and to distinguish between
-multiple successive instances of the same filename on the command line.
-
-While you can change the value of @code{ARGIND} within your @code{awk}
-program, @code{gawk} will automatically set it to a new value when the
-next file is opened.
-
-This variable is a @code{gawk} extension; in other @code{awk} implementations
-it is not special.
-
-@item ENVIRON
-@vindex ENVIRON
-This is an array that contains the values of the environment. The array
-indices are the environment variable names; the values are the values of
-the particular environment variables. For example,
-@code{ENVIRON["HOME"]} might be @file{/u/close}. Changing this array
-does not affect the environment passed on to any programs that
-@code{awk} may spawn via redirection or the @code{system} function.
-(In a future version of @code{gawk}, it may do so.)
-
-Some operating systems may not have environment variables.
-On such systems, the array @code{ENVIRON} is empty.
-
-@item ERRNO
-@iftex
-@vindex ERRNO
-@end iftex
-If a system error occurs either doing a redirection for @code{getline},
-during a read for @code{getline}, or during a @code{close} operation,
-then @code{ERRNO} will contain a string describing the error.
-
-This variable is a @code{gawk} extension; in other @code{awk} implementations
-it is not special.
-
-@item FILENAME
-@iftex
-@vindex FILENAME
-@end iftex
-This is the name of the file that @code{awk} is currently reading.
-If @code{awk} is reading from the standard input (in other words,
-there are no files listed on the command line),
-@code{FILENAME} is set to @code{"-"}.
-@code{FILENAME} is changed each time a new file is read
-(@pxref{Reading Files, ,Reading Input Files}).@refill
-
-@item FNR
-@iftex
-@vindex FNR
-@end iftex
-@code{FNR} is the current record number in the current file. @code{FNR} is
-incremented each time a new record is read
-(@pxref{Getline, ,Explicit Input with @code{getline}}). It is reinitialized
-to 0 each time a new input file is started.@refill
-
-@item NF
-@iftex
-@vindex NF
-@end iftex
-@code{NF} is the number of fields in the current input record.
-@code{NF} is set each time a new record is read, when a new field is
-created, or when @code{$0} changes (@pxref{Fields, ,Examining Fields}).@refill
-
-@item NR
-@iftex
-@vindex NR
-@end iftex
-This is the number of input records @code{awk} has processed since
-the beginning of the program's execution.
-(@pxref{Records, ,How Input is Split into Records}).
-@code{NR} is set each time a new record is read.@refill
-
-@item RLENGTH
-@iftex
-@vindex RLENGTH
-@end iftex
-@code{RLENGTH} is the length of the substring matched by the
-@code{match} function
-(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
-@code{RLENGTH} is set by invoking the @code{match} function. Its value
-is the length of the matched string, or @minus{}1 if no match was found.@refill
-
-@item RSTART
-@iftex
-@vindex RSTART
-@end iftex
-@code{RSTART} is the start-index in characters of the substring matched by the
-@code{match} function
-(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
-@code{RSTART} is set by invoking the @code{match} function. Its value
-is the position of the string where the matched substring starts, or 0
-if no match was found.@refill
-@end table
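-
-As a small sketch of how @code{ARGC}, @code{ARGV}, and @code{ARGIND} can
-be used together (the data file names here are only placeholders), the
-following program removes @file{skip.dat} from the list of input files
-by storing the null string into its @code{ARGV} slot, and then reports
-the @code{ARGV} index of each file it actually reads:
-
-@example
-gawk 'BEGIN @{
-          for (i = 1; i < ARGC; i++)
-              if (ARGV[i] == "skip.dat")
-                  ARGV[i] = ""   # ignored when input files are opened
-      @}
-      FNR == 1 @{ print ARGIND, FILENAME @}' a.dat skip.dat b.dat
-@end example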
-
-@node Command Line, Language History, Built-in Variables, Top
-@c node-name, next, previous, up
-@chapter Invoking @code{awk}
-@cindex command line
-@cindex invocation of @code{gawk}
-@cindex arguments, command line
-@cindex options, command line
-@cindex long options
-@cindex options, long
-
-There are two ways to run @code{awk}: with an explicit program, or with
-one or more program files. Here are templates for both of them; items
-enclosed in @samp{@r{[}@dots{}@r{]}} in these templates are optional.
-
-Besides traditional one-letter @sc{posix}-style options, @code{gawk} also
-supports GNU long named options.
-
-@example
-awk @r{[@var{POSIX or GNU style options}]} -f progfile @r{[@code{--}]} @var{file} @dots{}
-awk @r{[@var{POSIX or GNU style options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{}
-@end example
-
-@menu
-* Options:: Command line options and their meanings.
-* Other Arguments:: Input file names and variable assignments.
-* AWKPATH Variable:: Searching directories for @code{awk} programs.
-* Obsolete:: Obsolete Options and/or features.
-* Undocumented:: Undocumented Options and Features.
-@end menu
-
-@node Options, Other Arguments, Command Line, Command Line
-@section Command Line Options
-
-Options begin with a minus sign, and consist of a single character.
-GNU style long named options consist of two minus signs and
-a keyword that can be abbreviated if the abbreviation allows the option
-to be uniquely identified. If the option takes an argument, then the
-keyword is immediately followed by an equals sign (@samp{=}) and the
-argument's value. For brevity, the discussion below only refers to the
-traditional short options; however, the long and short options are
-interchangeable in all contexts.
-
-Each long named option for @code{gawk} has a corresponding
-@sc{posix}-style option. The options and their meanings are as follows:
-
-@table @code
-@item -F @var{fs}
-@itemx --field-separator=@var{fs}
-@iftex
-@cindex @code{-F} option
-@end iftex
-@cindex @code{--field-separator} option
-Sets the @code{FS} variable to @var{fs}
-(@pxref{Field Separators, ,Specifying how Fields are Separated}).@refill
-
-@item -f @var{source-file}
-@itemx --file=@var{source-file}
-@iftex
-@cindex @code{-f} option
-@end iftex
-@cindex @code{--file} option
-Indicates that the @code{awk} program is to be found in @var{source-file}
-instead of in the first non-option argument.
-
-@item -v @var{var}=@var{val}
-@itemx --assign=@var{var}=@var{val}
-@cindex @samp{-v} option
-@cindex @code{--assign} option
-Sets the variable @var{var} to the value @var{val} @emph{before}
-execution of the program begins. Such variable values are available
-inside the @code{BEGIN} rule (see below for a fuller explanation).
-
-The @samp{-v} option can only set one variable, but you can use
-it more than once, setting another variable each time, like this:
-@samp{@w{-v foo=1} @w{-v bar=2}}.
-
-@item -W @var{gawk-opt}
-@cindex @samp{-W} option
-Following the @sc{posix} standard, options that are implementation
-specific are supplied as arguments to the @samp{-W} option. With @code{gawk},
-these arguments may be separated by commas, or quoted and separated by
-whitespace. Case is ignored when processing these options. These options
-also have corresponding GNU style long named options. The following
-@code{gawk}-specific options are available:
-
-@table @code
-@item -W compat
-@itemx --compat
-@cindex @code{--compat} option
-Specifies @dfn{compatibility mode}, in which the GNU extensions in
-@code{gawk} are disabled, so that @code{gawk} behaves just like Unix
-@code{awk}.
-@xref{POSIX/GNU, ,Extensions in @code{gawk} not in POSIX @code{awk}},
-which summarizes the extensions. Also see
-@ref{Compatibility Mode, ,Downward Compatibility and Debugging}.@refill
-
-@item -W copyleft
-@itemx -W copyright
-@itemx --copyleft
-@itemx --copyright
-@cindex @code{--copyleft} option
-@cindex @code{--copyright} option
-Print the short version of the General Public License.
-This option may disappear in a future version of @code{gawk}.
-
-@item -W help
-@itemx -W usage
-@itemx --help
-@itemx --usage
-@cindex @code{--help} option
-@cindex @code{--usage} option
-Print a ``usage'' message summarizing the short and long style options
-that @code{gawk} accepts, and then exit.
-
-@item -W lint
-@itemx --lint
-@cindex @code{--lint} option
-Provide warnings about constructs that are dubious or non-portable to
-other @code{awk} implementations.
-Some warnings are issued when @code{gawk} first reads your program. Others
-are issued at run-time, as your program executes.
-
-@item -W posix
-@itemx --posix
-@cindex @code{--posix} option
-Operate in strict @sc{posix} mode. This disables all @code{gawk}
-extensions (just like @code{-W compat}), and adds the following additional
-restrictions:
-
-@itemize @bullet{}
-@item
-@code{\x} escape sequences are not recognized
-(@pxref{Constants, ,Constant Expressions}).@refill
-
-@item
-The synonym @code{func} for the keyword @code{function} is not
-recognized (@pxref{Definition Syntax, ,Syntax of Function Definitions}).
-
-@item
-The operators @samp{**} and @samp{**=} cannot be used in
-place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops, ,Arithmetic Operators},
-and also @pxref{Assignment Ops, ,Assignment Expressions}).@refill
-
-@item
-Specifying @samp{-Ft} on the command line does not set the value
-of @code{FS} to be a single tab character
-(@pxref{Field Separators, ,Specifying how Fields are Separated}).@refill
-@end itemize
-
-Although you can supply both @samp{-W compat} and @samp{-W posix} on the
-command line, @samp{-W posix} will take precedence.
-
-@item -W source=@var{program-text}
-@itemx --source=@var{program-text}
-@cindex @code{--source} option
-Program source code is taken from the @var{program-text}. This option
-allows you to mix @code{awk} source code in files with program source
-code that you would enter on the command line. This is particularly useful
-when you have library functions that you wish to use from your command line
-programs (@pxref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}).
-
-@item -W version
-@itemx --version
-@cindex @code{--version} option
-Prints version information for this particular copy of @code{gawk}.
-This is so you can determine if your copy of @code{gawk} is up to date
-with respect to whatever the Free Software Foundation is currently
-distributing. This option may disappear in a future version of @code{gawk}.
-@end table
-
-@item --
-Signals the end of the command line options. The following arguments
-are not treated as options even if they begin with @samp{-}. This
-interpretation of @samp{--} follows the @sc{posix} argument parsing
-conventions.
-
-This is useful if you have file names that start with @samp{-},
-or in shell scripts, if you have file names that will be specified
-by the user which could start with @samp{-}.
-@end table
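-
-As a brief sketch of how these options are used together (the program
-file @file{prog.awk} and the data file @file{data} are only
-placeholders), the following two command lines are equivalent; the
-first uses the traditional short options, the second the GNU long
-named options:
-
-@example
-gawk -W lint -F : -v n=5 -f prog.awk data
-gawk --lint --field-separator=: --assign=n=5 --file=prog.awk data
-@end example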
-
-Any other options are flagged as invalid with a warning message, but
-are otherwise ignored.
-
-In compatibility mode, as a special case, if the value of @var{fs} supplied
-to the @samp{-F} option is @samp{t}, then @code{FS} is set to the tab
-character (@code{"\t"}). This is only true for @samp{-W compat}, and not
-for @samp{-W posix}
-(@pxref{Field Separators, ,Specifying how Fields are Separated}).@refill
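-
-For instance, given a hypothetical tab-separated file @file{tabs.dat},
-these two commands behave differently:
-
-@example
-gawk -W compat -F t '@{ print $2 @}' tabs.dat   # FS is a tab character
-gawk -W posix -F t '@{ print $2 @}' tabs.dat    # FS is the letter `t'
-@end example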
-
-If the @samp{-f} option is @emph{not} used, then the first non-option
-command line argument is expected to be the program text.
-
-The @samp{-f} option may be used more than once on the command line.
-If it is, @code{awk} reads its program source from all of the named files, as
-if they had been concatenated together into one big file. This is
-useful for creating libraries of @code{awk} functions. Useful functions
-can be written once, and then retrieved from a standard place, instead
-of having to be included into each individual program. You can still
-type in a program at the terminal and use library functions, by specifying
-@samp{-f /dev/tty}. @code{awk} will read a file from the terminal
-to use as part of the @code{awk} program. After typing your program,
-type @kbd{Control-d} (the end-of-file character) to terminate it.
-(You may also use @samp{-f -} to read program source from the standard
-input, but then you will not be able to also use the standard input as a
-source of data.)
-
-Because it is clumsy using the standard @code{awk} mechanisms to mix source
-file and command line @code{awk} programs, @code{gawk} provides the
-@samp{--source} option. This does not require you to pre-empt the standard
-input for your source code, and allows you to easily mix command line
-and library source code
-(@pxref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}).
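-
-For example (a minimal sketch; the library file @file{mylib.awk} and
-the function @code{myfunc} it defines are only placeholders):
-
-@example
-gawk --source='BEGIN @{ myfunc("hello") @}' -f mylib.awk
-@end example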
-
-If no @samp{-f} or @samp{--source} option is specified, then @code{gawk}
-will use the first non-option command line argument as the text of the
-program source code.
-
-@node Other Arguments, AWKPATH Variable, Options, Command Line
-@section Other Command Line Arguments
-
-Any additional arguments on the command line are normally treated as
-input files to be processed in the order specified. However, an
-argument that has the form @code{@var{var}=@var{value}}, means to assign
-the value @var{value} to the variable @var{var}---it does not specify a
-file at all.
-
-@vindex ARGV
-All these arguments are made available to your @code{awk} program in the
-@code{ARGV} array (@pxref{Built-in Variables}). Command line options
-and the program text (if present) are omitted from the @code{ARGV}
-array. All other arguments, including variable assignments, are
-included.
-
-The distinction between file name arguments and variable-assignment
-arguments is made when @code{awk} is about to open the next input file.
-At that point in execution, it checks the ``file name'' to see whether
-it is really a variable assignment; if so, @code{awk} sets the variable
-instead of reading a file.
-
-Therefore, the variables actually receive the specified values after all
-previously specified files have been read. In particular, the values of
-variables assigned in this fashion are @emph{not} available inside a
-@code{BEGIN} rule
-(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}),
-since such rules are run before @code{awk} begins scanning the argument list.
-The values given on the command line are processed for escape sequences
-(@pxref{Constants, ,Constant Expressions}).@refill
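-
-As a small illustration of this timing, the first of the following
-commands prints an empty line, because @code{x} has not yet been
-assigned when the @code{BEGIN} rule runs, while the second, using
-@samp{-v}, prints @samp{1}:
-
-@example
-gawk 'BEGIN @{ print x @}' x=1
-gawk -v x=1 'BEGIN @{ print x @}'
-@end example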
-
-In some earlier implementations of @code{awk}, when a variable assignment
-occurred before any file names, the assignment would happen @emph{before}
-the @code{BEGIN} rule was executed. Some applications came to depend
-upon this ``feature.'' When @code{awk} was changed to be more consistent,
-the @samp{-v} option was added to accommodate applications that depended
-upon this old behavior.
-
-The variable assignment feature is most useful for assigning to variables
-such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and
-output formats, before scanning the data files. It is also useful for
-controlling state if multiple passes are needed over a data file. For
-example:@refill
-
-@cindex multiple passes over data
-@cindex passes, multiple
-@smallexample
-awk 'pass == 1 @{ @var{pass 1 stuff} @}
- pass == 2 @{ @var{pass 2 stuff} @}' pass=1 datafile pass=2 datafile
-@end smallexample
-
-Given the variable assignment feature, the @samp{-F} option is not
-strictly necessary. It remains for historical compatibility.
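-
-For example, these two commands are equivalent (the data file
-@file{inventory} is only a placeholder):
-
-@example
-gawk -F : '@{ print $1 @}' inventory
-gawk '@{ print $1 @}' FS=: inventory
-@end example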
-
-@node AWKPATH Variable, Obsolete, Other Arguments, Command Line
-@section The @code{AWKPATH} Environment Variable
-@cindex @code{AWKPATH} environment variable
-@cindex search path
-@cindex directory search
-@cindex path, search
-@iftex
-@cindex differences between @code{gawk} and @code{awk}
-@end iftex
-
-The previous section described how @code{awk} program files can be named
-on the command line with the @samp{-f} option. In some @code{awk}
-implementations, you must supply a precise path name for each program
-file, unless the file is in the current directory.
-
-But in @code{gawk}, if the file name supplied in the @samp{-f} option
-does not contain a @samp{/}, then @code{gawk} searches a list of
-directories (called the @dfn{search path}), one by one, looking for a
-file with the specified name.
-
-The search path is actually a string consisting of directory names
-separated by colons. @code{gawk} gets its search path from the
-@code{AWKPATH} environment variable. If that variable does not exist,
-@code{gawk} uses the default path, which is
-@samp{.:/usr/lib/awk:/usr/local/lib/awk}. (Programs written by
-system administrators should use an @code{AWKPATH} variable that
-does not include the current directory, @samp{.}.)@refill
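-
-For example, a Bourne shell user might arrange for a private library
-directory to be searched like this (the directory name is only a
-placeholder; the leading @samp{.} keeps the current directory in the
-path):
-
-@example
-AWKPATH=".:/usr/local/share/awk"
-export AWKPATH
-gawk -f myprog data
-@end example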
-
-The search path feature is particularly useful for building up libraries
-of useful @code{awk} functions. The library files can be placed in a
-standard directory that is in the default path, and then specified on
-the command line with a short file name. Otherwise, the full file name
-would have to be typed for each file.
-
-By combining the @samp{--source} and @samp{-f} options, your command line
-@code{awk} programs can use facilities in @code{awk} library files.
-
-Path searching is not done if @code{gawk} is in compatibility mode.
-This is true for both @samp{-W compat} and @samp{-W posix}.
-@xref{Options, ,Command Line Options}.
-
-@strong{Note:} if you want files in the current directory to be found,
-you must include the current directory in the path, either by writing
-@file{.} as an entry in the path, or by writing a null entry in the
-path. (A null entry is indicated by starting or ending the path with a
-colon, or by placing two colons next to each other (@samp{::}).) If the
-current directory is not included in the path, then files cannot be
-found in the current directory. This path search mechanism is identical
-to the shell's.
-@c someday, @cite{The Bourne Again Shell}....
-
-@node Obsolete, Undocumented, AWKPATH Variable, Command Line
-@section Obsolete Options and/or Features
-
-@cindex deprecated options
-@cindex obsolete options
-@cindex deprecated features
-@cindex obsolete features
-This section describes features and/or command line options from the
-previous release of @code{gawk} that are either not available in the
-current version, or that are still supported but deprecated (meaning that
-they will @emph{not} be in the next release).
-
-@c update this section for each release!
-
-For version 2.15 of @code{gawk}, the following command line options
-from version 2.11.1 are no longer recognized.
-
-@table @samp
-@ignore
-@item -nostalgia
-Use @samp{-W nostalgia} instead.
-@end ignore
-
-@item -c
-Use @samp{-W compat} instead.
-
-@item -V
-Use @samp{-W version} instead.
-
-@item -C
-Use @samp{-W copyright} instead.
-
-@item -a
-@itemx -e
-These options produce an ``unrecognized option'' error message but have
-no effect on the execution of @code{gawk}. The @sc{posix} standard now
-specifies traditional @code{awk} regular expressions for the @code{awk} utility.
-@end table
-
-The public-domain version of @code{strftime} that is distributed with
-@code{gawk} changed for the 2.14 release. The @samp{%V} conversion specifier
-that used to generate the date in VMS format was changed to @samp{%v}.
-This is because the @sc{posix} standard for the @code{date} utility now
-specifies a @samp{%V} conversion specifier.
-@xref{Time Functions, ,Functions for Dealing with Time Stamps}, for details.
-
-@node Undocumented, , Obsolete, Command Line
-@section Undocumented Options and Features
-
-This section intentionally left blank.
-
-@c Read The Source, Luke!
-
-@ignore
-@c If these came out in the Info file or TeX manual, then they wouldn't
-@c be undocumented, would they?
-
-@code{gawk} has one undocumented option:
-
-@table @samp
-@item -W nostalgia
-Print the message @code{"awk: bailing out near line 1"} and dump core.
-This option was inspired by the common behavior of very early versions of
-Unix @code{awk}, and by a t-shirt.
-@end table
-
-Early versions of @code{awk} used to not require any separator (either
-a newline or @samp{;}) between the rules in @code{awk} programs. Thus,
-it was common to see one-line programs like:
-
-@example
-awk '@{ sum += $1 @} END @{ print sum @}'
-@end example
-
-@code{gawk} actually supports this, but it is purposely undocumented
-since it is considered bad style. The correct way to write such a program
-is either
-
-@example
-awk '@{ sum += $1 @} ; END @{ print sum @}'
-@end example
-
-@noindent
-or
-
-@example
-awk '@{ sum += $1 @}
- END @{ print sum @}' data
-@end example
-
-@noindent
-@xref{Statements/Lines, ,@code{awk} Statements versus Lines}, for a fuller
-explanation.@refill
-
-As an accident of the implementation of the original Unix @code{awk}, if
-a built-in function used @code{$0} as its default argument, it was possible
-to call that function without the parentheses. In particular, it was
-common practice to use the @code{length} function in this fashion.
-For example, the pipeline:
-
-@example
-echo abcdef | awk '@{ print length @}'
-@end example
-
-@noindent
-would print @samp{6}.
-
-For backwards compatibility with old programs, @code{gawk} supports
-this usage, but only for the @code{length} function. New programs should
-@emph{not} call the @code{length} function this way. In particular,
-this usage will not be portable to other @sc{posix} compliant versions
-of @code{awk}. It is also poor style.
-
-@end ignore
-
-@node Language History, Installation, Command Line, Top
-@chapter The Evolution of the @code{awk} Language
-
-This manual describes the GNU implementation of @code{awk}, which is patterned
-after the @sc{posix} specification. Many @code{awk} users are only familiar
-with the original @code{awk} implementation in Version 7 Unix, which is also
-the basis for the version in Berkeley Unix (through 4.3--Reno). This chapter
-briefly describes the evolution of the @code{awk} language.
-
-@menu
-* V7/S5R3.1:: The major changes between V7 and
- System V Release 3.1.
-* S5R4:: Minor changes between System V
- Releases 3.1 and 4.
-* POSIX:: New features from the @sc{posix} standard.
-* POSIX/GNU:: The extensions in @code{gawk}
- not in @sc{posix} @code{awk}.
-@end menu
-
-@node V7/S5R3.1, S5R4, Language History, Language History
-@section Major Changes between V7 and S5R3.1
-
-The @code{awk} language evolved considerably between the release of
-Version 7 Unix (1978) and the new version first made widely available in
-System V Release 3.1 (1987). This section summarizes the changes, with
-cross-references to further details.
-
-@itemize @bullet
-@item
-The requirement for @samp{;} to separate rules on a line
-(@pxref{Statements/Lines, ,@code{awk} Statements versus Lines}).
-
-@item
-User-defined functions, and the @code{return} statement
-(@pxref{User-defined, ,User-defined Functions}).
-
-@item
-The @code{delete} statement (@pxref{Delete, ,The @code{delete} Statement}).
-
-@item
-The @code{do}-@code{while} statement
-(@pxref{Do Statement, ,The @code{do}-@code{while} Statement}).@refill
-
-@item
-The built-in functions @code{atan2}, @code{cos}, @code{sin}, @code{rand} and
-@code{srand} (@pxref{Numeric Functions, ,Numeric Built-in Functions}).
-
-@item
-The built-in functions @code{gsub}, @code{sub}, and @code{match}
-(@pxref{String Functions, ,Built-in Functions for String Manipulation}).
-
-@item
-The built-in functions @code{close}, which closes an open file, and
-@code{system}, which allows the user to execute operating system
-commands (@pxref{I/O Functions, ,Built-in Functions for Input/Output}).@refill
-@c Does the above verbiage prevents an overfull hbox? --mew, rjc 24jan1992
-
-@item
-The @code{ARGC}, @code{ARGV}, @code{FNR}, @code{RLENGTH}, @code{RSTART},
-and @code{SUBSEP} built-in variables (@pxref{Built-in Variables}).
-
-@item
-The conditional expression using the operators @samp{?} and @samp{:}
-(@pxref{Conditional Exp, ,Conditional Expressions}).@refill
-
-@item
-The exponentiation operator @samp{^}
-(@pxref{Arithmetic Ops, ,Arithmetic Operators}) and its assignment operator
-form @samp{^=} (@pxref{Assignment Ops, ,Assignment Expressions}).@refill
-
-@item
-C-compatible operator precedence, which breaks some old @code{awk}
-programs (@pxref{Precedence, ,Operator Precedence (How Operators Nest)}).
-
-@item
-Regexps as the value of @code{FS}
-(@pxref{Field Separators, ,Specifying how Fields are Separated}), and as the
-third argument to the @code{split} function
-(@pxref{String Functions, ,Built-in Functions for String Manipulation}).@refill
-
-@item
-Dynamic regexps as operands of the @samp{~} and @samp{!~} operators
-(@pxref{Regexp Usage, ,How to Use Regular Expressions}).
-
-@item
-Escape sequences (@pxref{Constants, ,Constant Expressions}) in regexps.@refill
-
-@item
-The escape sequences @samp{\b}, @samp{\f}, and @samp{\r}
-(@pxref{Constants, ,Constant Expressions}).
-
-@item
-Redirection of input for the @code{getline} function
-(@pxref{Getline, ,Explicit Input with @code{getline}}).@refill
-
-@item
-Multiple @code{BEGIN} and @code{END} rules
-(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}).@refill
-
-@item
-Simulated multi-dimensional arrays
-(@pxref{Multi-dimensional, ,Multi-dimensional Arrays}).@refill
-@end itemize
-
-@node S5R4, POSIX, V7/S5R3.1, Language History
-@section Changes between S5R3.1 and S5R4
-
-The System V Release 4 version of Unix @code{awk} added these features
-(some of which originated in @code{gawk}):
-
-@itemize @bullet
-@item
-The @code{ENVIRON} variable (@pxref{Built-in Variables}).
-
-@item
-Multiple @samp{-f} options on the command line
-(@pxref{Command Line, ,Invoking @code{awk}}).@refill
-
-@item
-The @samp{-v} option for assigning variables before program execution begins
-(@pxref{Command Line, ,Invoking @code{awk}}).@refill
-
-@item
-The @samp{--} option for terminating command line options.
-
-@item
-The @samp{\a}, @samp{\v}, and @samp{\x} escape sequences
-(@pxref{Constants, ,Constant Expressions}).@refill
-
-@item
-A defined return value for the @code{srand} built-in function
-(@pxref{Numeric Functions, ,Numeric Built-in Functions}).
-
-@item
-The @code{toupper} and @code{tolower} built-in string functions
-for case translation
-(@pxref{String Functions, ,Built-in Functions for String Manipulation}).@refill
-
-@item
-A cleaner specification for the @samp{%c} format-control letter in the
-@code{printf} function
-(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).@refill
-
-@item
-The ability to dynamically pass the field width and precision (@code{"%*.*d"})
-in the argument list of the @code{printf} function
-(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).@refill
-
-@item
-The use of constant regexps such as @code{/foo/} as expressions, where
-they are equivalent to use of the matching operator, as in @code{$0 ~
-/foo/} (@pxref{Constants, ,Constant Expressions}).
-@end itemize
-
-@node POSIX, POSIX/GNU, S5R4, Language History
-@section Changes between S5R4 and POSIX @code{awk}
-
-The @sc{posix} Command Language and Utilities standard for @code{awk}
-introduced the following changes into the language:
-
-@itemize @bullet{}
-@item
-The use of @samp{-W} for implementation-specific options.
-
-@item
-The use of @code{CONVFMT} for controlling the conversion of numbers
-to strings (@pxref{Conversion, ,Conversion of Strings and Numbers}).
-
-@item
-The concept of a numeric string, and tighter comparison rules to go
-with it (@pxref{Comparison Ops, ,Comparison Expressions}).
-
-@item
-More complete documentation of many of the previously undocumented
-features of the language.
-@end itemize
-
-@node POSIX/GNU, , POSIX, Language History
-@section Extensions in @code{gawk} not in POSIX @code{awk}
-
-The GNU implementation, @code{gawk}, adds these features:
-
-@itemize @bullet
-@item
-The @code{AWKPATH} environment variable for specifying a path search for
-the @samp{-f} command line option
-(@pxref{Command Line, ,Invoking @code{awk}}).@refill
-
-@item
-The various @code{gawk} specific features available via the @samp{-W}
-command line option (@pxref{Command Line, ,Invoking @code{awk}}).
-
-@item
-The @code{ARGIND} variable, which tracks the movement of @code{FILENAME}
-through @code{ARGV} (@pxref{Built-in Variables}).
-
-@item
-The @code{ERRNO} variable, which contains the system error message when
-@code{getline} returns @minus{}1, or when @code{close} fails
-(@pxref{Built-in Variables}).
-
-@item
-The @code{IGNORECASE} variable and its effects
-(@pxref{Case-sensitivity, ,Case-sensitivity in Matching}).@refill
-
-@item
-The @code{FIELDWIDTHS} variable and its effects
-(@pxref{Constant Size, ,Reading Fixed-width Data}).@refill
-
-@item
-The @code{next file} statement for skipping to the next data file
-(@pxref{Next File Statement, ,The @code{next file} Statement}).@refill
-
-@item
-The @code{systime} and @code{strftime} built-in functions for obtaining
-and printing time stamps
-(@pxref{Time Functions, ,Functions for Dealing with Time Stamps}).@refill
-
-@item
-The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr}, and
-@file{/dev/fd/@var{n}} file name interpretation
-(@pxref{Special Files, ,Standard I/O Streams}).@refill
-
-@item
-The @samp{-W compat} option to turn off these extensions
-(@pxref{Command Line, ,Invoking @code{awk}}).@refill
-
-@item
-The @samp{-W posix} option for full @sc{posix} compliance
-(@pxref{Command Line, ,Invoking @code{awk}}).@refill
-
-@end itemize
-
-@node Installation, Gawk Summary, Language History, Top
-@chapter Installing @code{gawk}
-
-This chapter provides instructions for installing @code{gawk} on the
-various platforms that are supported by the developers. The primary
-developers support Unix (and one day, GNU), while the other ports were
-contributed. The file @file{ACKNOWLEDGMENT} in the @code{gawk}
-distribution lists the electronic mail addresses of the people who did
-the respective ports.@refill
-
-@menu
-* Gawk Distribution:: What is in the @code{gawk} distribution.
-* Unix Installation:: Installing @code{gawk} under various versions
- of Unix.
-* VMS Installation:: Installing @code{gawk} on VMS.
-* MS-DOS Installation:: Installing @code{gawk} on MS-DOS.
-* Atari Installation:: Installing @code{gawk} on the Atari ST.
-@end menu
-
-@node Gawk Distribution, Unix Installation, Installation, Installation
-@section The @code{gawk} Distribution
-
-This section first describes how to get and extract the @code{gawk}
-distribution, and then discusses what is in the various files and
-subdirectories.
-
-@menu
-* Extracting:: How to get and extract the distribution.
-* Distribution contents:: What is in the distribution.
-@end menu
-
-@node Extracting, Distribution contents, Gawk Distribution, Gawk Distribution
-@subsection Getting the @code{gawk} Distribution
-
-@cindex getting gawk
-@cindex anonymous ftp
-@cindex anonymous uucp
-@cindex ftp, anonymous
-@cindex uucp, anonymous
-@code{gawk} is distributed as a @code{tar} file compressed with the
-GNU Zip program, @code{gzip}. You can
-get it via anonymous @code{ftp} to the Internet host @code{prep.ai.mit.edu}.
-Like all GNU software, it will be archived at other well known systems,
-from which it will be possible to use some sort of anonymous @code{uucp} to
-obtain the distribution as well.
-You can also order @code{gawk} on tape or CD-ROM directly from the
-Free Software Foundation. (The address is on the copyright page.)
-Doing so directly contributes to the support of the foundation and to
-the production of more free software.
-
-Once you have the distribution (for example,
-@file{gawk-2.15.0.tar.z}), first use @code{gzip} to expand the
-file, and then use @code{tar} to extract it. You can use the following
-pipeline to produce the @code{gawk} distribution:
-
-@example
-# Under System V, add 'o' to the tar flags
-gzip -d -c gawk-2.15.0.tar.z | tar -xvpf -
-@end example
-
-@noindent
-This will create a directory named @file{gawk-2.15} in the current
-directory.
-
-The distribution file name is of the form @file{gawk-2.15.@var{n}.tar.z}.
-The @var{n} represents a @dfn{patchlevel}, meaning that minor bugs have
-been fixed in the major release. The current patchlevel is 0, but when
-retrieving distributions, you should get the version with the highest
-patchlevel.@refill
-
-If you are not on a Unix system, you will need to make other arrangements
-for getting and extracting the @code{gawk} distribution. You should consult
-a local expert.
-
-@node Distribution contents, , Extracting, Gawk Distribution
-@subsection Contents of the @code{gawk} Distribution
-
-@code{gawk} has a number of C source files, documentation files,
-subdirectories and files related to the configuration process
-(@pxref{Unix Installation, ,Compiling and Installing @code{gawk} on Unix}),
-and several subdirectories related to different, non-Unix,
-operating systems.@refill
-
-@table @asis
-@item various @samp{.c}, @samp{.y}, and @samp{.h} files
-
-The C and YACC source files are the actual @code{gawk} source code.
-@end table
-
-@table @file
-@item README
-@itemx README.VMS
-@itemx README.dos
-@itemx README.rs6000
-@itemx README.ultrix
-Descriptive files: @file{README} for @code{gawk} under Unix, and the
-rest for the various hardware and software combinations.
-
-@item PORTS
-A list of systems to which @code{gawk} has been ported, and which
-have successfully run the test suite.
-
-@item ACKNOWLEDGMENT
-A list of the people who contributed major parts of the code or documentation.
-
-@item NEWS
-A list of changes to @code{gawk} since the last release or patch.
-
-@item COPYING
-The GNU General Public License.
-
-@item FUTURES
-A brief list of features and/or changes being contemplated for future
-releases, with some indication of the time frame for the feature, based
-on its difficulty.
-
-@item LIMITATIONS
-A list of those factors that limit @code{gawk}'s performance.
-Most of these depend on the hardware or operating system software, and
-are not limits in @code{gawk} itself.@refill
-
-@item PROBLEMS
-A file describing known problems with the current release.
-
-@item gawk.1
-The @code{troff} source for a manual page describing @code{gawk}.
-
-@item gawk.texinfo
-@ifinfo
-The @code{texinfo} source file for this Info file.
-It should be processed with @TeX{} to produce a printed manual, and
-with @code{makeinfo} to produce the Info file.@refill
-@end ifinfo
-@iftex
-The @code{texinfo} source file for this manual.
-It should be processed with @TeX{} to produce a printed manual, and
-with @code{makeinfo} to produce the Info file.@refill
-@end iftex
-
-@item Makefile.in
-@itemx config
-@itemx config.in
-@itemx configure
-@itemx missing
-@itemx mungeconf
-These files and subdirectories are used when configuring @code{gawk}
-for various Unix systems. They are explained in detail in
-@ref{Unix Installation, ,Compiling and Installing @code{gawk} on Unix}.@refill
-
-@item atari
-Files needed for building @code{gawk} on an Atari ST.
-@xref{Atari Installation, ,Installing @code{gawk} on the Atari ST}, for details.
-
-@item pc
-Files needed for building @code{gawk} under MS-DOS.
-@xref{MS-DOS Installation, ,Installing @code{gawk} on MS-DOS}, for details.
-
-@item vms
-Files needed for building @code{gawk} under VMS.
-@xref{VMS Installation, ,Compiling, Installing, and Running @code{gawk} on VMS}, for details.
-
-@item test
-Many interesting @code{awk} programs, provided as a test suite for
-@code{gawk}. You can use @samp{make test} from the top level @code{gawk}
-directory to run your version of @code{gawk} against the test suite.
-@c There are many programs here that are useful in their own right.
-If @code{gawk} successfully passes @samp{make test} then you can
-be confident of a successful port.@refill
-@end table
-
-@node Unix Installation, VMS Installation, Gawk Distribution, Installation
-@section Compiling and Installing @code{gawk} on Unix
-
-Often, you can compile and install @code{gawk} by typing only two
-commands. However, if you do not use a supported system, you may need
-to configure @code{gawk} for your system yourself.
-
-@menu
-* Quick Installation:: Compiling @code{gawk} on a
- supported Unix version.
-* Configuration Philosophy:: How it's all supposed to work.
-* New Configurations:: What to do if there is no supplied
- configuration for your system.
-@end menu
-
-@node Quick Installation, Configuration Philosophy, Unix Installation, Unix Installation
-@subsection Compiling @code{gawk} for a Supported Unix Version
-
-@cindex installation, unix
-After you have extracted the @code{gawk} distribution, @code{cd}
-to @file{gawk-2.15}. Look in the @file{config} subdirectory for a
-file that matches your hardware/software combination. In general,
-only the software is relevant; for example @code{sunos41} is used
-for SunOS 4.1, on both Sun 3 and Sun 4 hardware.@refill
-
-If you find such a file, run the command:
-
-@example
-# assume you have SunOS 4.1
-./configure sunos41
-@end example
-
-This produces a @file{Makefile} and @file{config.h} tailored to your
-system. You may wish to edit the @file{Makefile} to use a different
-C compiler, such as @code{gcc}, the GNU C compiler, if you have it.
-You may also wish to change the @code{CFLAGS} variable, which controls
-the command line options that are passed to the C compiler (such as
-optimization levels, or compiling for debugging).@refill
-
-After you have configured @file{Makefile} and @file{config.h}, type:
-
-@example
-make
-@end example
-
-@noindent
-and shortly thereafter, you should have an executable version of @code{gawk}.
-That's all there is to it!
-
-@node Configuration Philosophy, New Configurations, Quick Installation, Unix Installation
-@subsection The Configuration Process
-
-(This section is of interest only if you know something about using the
-C language and the Unix operating system.)
-
-The source code for @code{gawk} generally attempts to adhere to industry
-standards wherever possible. This means that @code{gawk} uses library
-routines that are specified by the @sc{ansi} C standard and by the @sc{posix}
-operating system interface standard. When using an @sc{ansi} C compiler,
-function prototypes are provided to help improve the compile-time checking.
-
-Many older Unix systems do not support all of either the @sc{ansi} or the
-@sc{posix} standards. The @file{missing} subdirectory in the @code{gawk}
-distribution contains replacement versions of those subroutines that are
-most likely to be missing.
-
-The @file{config.h} file that is created by the @code{configure} program
-contains definitions that describe features of the particular operating
-system where you are attempting to compile @code{gawk}. For the most
-part, it lists which standard subroutines are @emph{not} available.
-For example, if your system lacks the @samp{getopt} routine, then
-@samp{GETOPT_MISSING} would be defined.
-
-@file{config.h} also defines constants that describe facts about your
-variant of Unix. For example, there may not be an @samp{st_blksize}
-element in the @code{stat} structure. In this case @samp{BLKSIZE_MISSING}
-would be defined.
-
-Based on the list in @file{config.h} of standard subroutines that are
-missing, @file{missing.c} will do a @samp{#include} of the appropriate
-file(s) from the @file{missing} subdirectory.@refill
-
-Conditionally compiled code in the other source files relies on the
-other definitions in the @file{config.h} file.
-
-Besides creating @file{config.h}, @code{configure} produces a @file{Makefile}
-from @file{Makefile.in}. There are a number of lines in @file{Makefile.in}
-that are system or feature specific. For example, there is a line that begins
-with @samp{##MAKE_ALLOCA_C##}. This is normally a comment line, since
-it starts with @samp{#}. If a configuration file has @samp{MAKE_ALLOCA_C}
-in it, then @code{configure} will delete the @samp{##MAKE_ALLOCA_C##}
-from the beginning of the line. This will enable the rules in the
-@file{Makefile} that use a C version of @samp{alloca}. There are several
-similar features that work in this fashion.@refill
-
-@node New Configurations, , Configuration Philosophy, Unix Installation
-@subsection Configuring @code{gawk} for a New System
-
-(This section is of interest only if you know something about using the
-C language and the Unix operating system, and if you have to install
-@code{gawk} on a system that is not supported by the @code{gawk} distribution.
-If you are a C or Unix novice, get help from a local expert.)
-
-If you need to configure @code{gawk} for a Unix system that is not
-supported in the distribution, first see
-@ref{Configuration Philosophy, ,The Configuration Process}.
-Then, copy @file{config.in} to @file{config.h}, and copy
-@file{Makefile.in} to @file{Makefile}.@refill
-
-Next, edit both files. Both files are liberally commented, and the
-necessary changes should be straightforward.
-
-While editing @file{config.h}, you need to determine what library
-routines you do or do not have by consulting your system documentation, or
-by perusing your actual libraries using the @code{ar} or @code{nm} utilities.
-In the worst case, simply do not define @emph{any} of the macros for missing
-subroutines. When you compile @code{gawk}, the final link-editing step
-will fail. The link editor will provide you with a list of unresolved external
-references---these are the missing subroutines. Edit @file{config.h} again
-and recompile, and you should be set.@refill
-
-Editing the @file{Makefile} should also be straightforward. Enable or
-disable the lines that begin with @samp{##MAKE_@var{whatever}##}, as
-appropriate. Select the correct C compiler and @code{CFLAGS} for it.
-Then run @code{make}.
-
-Getting a correct configuration is likely to be an iterative process.
-Do not be discouraged if it takes you several tries. If you have no
-luck whatsoever, please report your system type, and the steps you took.
-Once you do have a working configuration, please send it to the maintainers
-so that support for your system can be added to the official release.
-
-@xref{Bugs, ,Reporting Problems and Bugs}, for information on how to report
-problems in configuring @code{gawk}. You may also use the same mechanisms
-for sending in new configurations.@refill
-
-@node VMS Installation, MS-DOS Installation, Unix Installation, Installation
-@section Compiling, Installing, and Running @code{gawk} on VMS
-
-@c based on material from
-@c Pat Rankin <rankin@eql.caltech.edu>
-
-@cindex installation, vms
-This section describes how to compile and install @code{gawk} under VMS.
-
-@menu
-* VMS Compilation:: How to compile @code{gawk} under VMS.
-* VMS Installation Details:: How to install @code{gawk} under VMS.
-* VMS Running:: How to run @code{gawk} under VMS.
-* VMS POSIX:: Alternate instructions for VMS POSIX.
-@end menu
-
-@node VMS Compilation, VMS Installation Details, VMS Installation, VMS Installation
-@subsection Compiling @code{gawk} under VMS
-
-To compile @code{gawk} under VMS, there is a @code{DCL} command procedure that
-will issue all the necessary @code{CC} and @code{LINK} commands, and there is
-also a @file{Makefile} for use with the @code{MMS} utility. From the source
-directory, use either
-
-@smallexample
-$ @@[.VMS]VMSBUILD.COM
-@end smallexample
-
-@noindent
-or
-
-@smallexample
-$ MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK
-@end smallexample
-
-Depending upon which C compiler you are using, follow one of the sets
-of instructions in this table:
-
-@table @asis
-@item VAX C V3.x
-Use either @file{vmsbuild.com} or @file{descrip.mms} as is. These use
-@code{CC/OPTIMIZE=NOLINE}, which is essential for Version 3.0.
-
-@item VAX C V2.x
-You must have Version 2.3 or 2.4; older ones won't work. Edit either
-@file{vmsbuild.com} or @file{descrip.mms} according to the comments in them.
-For @file{vmsbuild.com}, this just entails removing two @samp{!} delimiters.
-Also edit @file{config.h} (which is a copy of file @file{[.config]vms-conf.h})
-and comment out or delete the two lines @samp{#define __STDC__ 0} and
-@samp{#define VAXC_BUILTINS} near the end.@refill
-
-@item GNU C
-Edit @file{vmsbuild.com} or @file{descrip.mms}; the changes are different
-from those for VAX C V2.x, but equally straightforward. No changes to
-@file{config.h} should be needed.
-
-@item DEC C
-Edit @file{vmsbuild.com} or @file{descrip.mms} according to their comments.
-No changes to @file{config.h} should be needed.
-@end table
-
-@code{gawk} 2.15 has been tested under VAX/VMS 5.5-1 using VAX C V3.2,
-GNU C 1.40 and 2.3. It should work without modifications for VMS V4.6 and up.
-
-@node VMS Installation Details, VMS Running, VMS Compilation, VMS Installation
-@subsection Installing @code{gawk} on VMS
-
-To install @code{gawk}, all you need is a ``foreign'' command, which is
-a @code{DCL} symbol whose value begins with a dollar sign.
-
-@smallexample
-$ GAWK :== $device:[directory]GAWK
-@end smallexample
-
-@noindent
-(Substitute the actual location of @code{gawk.exe} for
-@samp{device:[directory]}.) The symbol should be placed in the
-@file{login.com} of any user who wishes to run @code{gawk},
-so that it will be defined every time the user logs on.
-Alternatively, the symbol may be placed in the system-wide
-@file{sylogin.com} procedure, which will allow all users
-to run @code{gawk}.@refill
-
-Optionally, the help entry can be loaded into a VMS help library:
-
-@smallexample
-$ LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP
-@end smallexample
-
-@noindent
-(You may want to substitute a site-specific help library rather than
-the standard VMS library @samp{HELPLIB}.) After loading the help text,
-
-@c this is so tiny, but `should' be smallexample for consistency sake...
-@c I didn't because it was so short. --mew 29jan1992
-@example
-$ HELP GAWK
-@end example
-
-@noindent
-will provide information about both the @code{gawk} implementation and the
-@code{awk} programming language.
-
-The logical name @samp{AWK_LIBRARY} can designate a default location
-for @code{awk} program files. For the @samp{-f} option, if the specified
-filename has no device or directory path information in it, @code{gawk}
-will look in the current directory first, and then in the directory
-specified by the translation of @samp{AWK_LIBRARY}.
-If the file is still not found after searching both directories,
-@code{gawk} appends the suffix @samp{.awk} to the file name and
-retries the search. If @samp{AWK_LIBRARY} is not defined, that
-portion of the file search will fail benignly.@refill
-
-@node VMS Running, VMS POSIX, VMS Installation Details, VMS Installation
-@subsection Running @code{gawk} on VMS
-
-Command line parsing and quoting conventions are significantly different
-on VMS, so examples in this manual or from other sources often need minor
-changes. They @emph{are} minor though, and all @code{awk} programs
-should run correctly.
-
-Here are a couple of trivial tests:
-
-@smallexample
-$ gawk -- "BEGIN @{print ""Hello, World!""@}"
-$ gawk -"W" version ! could also be -"W version" or "-W version"
-@end smallexample
-
-@noindent
-Note that upper-case and mixed-case text must be quoted.
-
-The VMS port of @code{gawk} includes a @code{DCL}-style interface in addition
-to the original shell-style interface (see the help entry for details).
-One side-effect of dual command line parsing is that if there is only a
-single parameter (as in the quoted string program above), the command
-becomes ambiguous. To work around this, the normally optional @samp{--}
-flag is required to force Unix style rather than @code{DCL} parsing. If any
-other dash-type options (or multiple parameters such as data files to be
-processed) are present, there is no ambiguity and @samp{--} can be omitted.
-
-The default search path when looking for @code{awk} program files specified
-by the @samp{-f} option is @code{"SYS$DISK:[],AWK_LIBRARY:"}. The logical
-name @samp{AWKPATH} can be used to override this default. The format
-of @samp{AWKPATH} is a comma-separated list of directory specifications.
-When defining it, the value should be quoted so that it retains a single
-translation, and not a multi-translation @code{RMS} searchlist.
-
-@node VMS POSIX, , VMS Running, VMS Installation
-@subsection Building and using @code{gawk} under VMS POSIX
-
-Ignore the instructions above, although @file{vms/gawk.hlp} should still
-be made available in a help library. Make sure that the two scripts,
-@file{configure} and @file{mungeconf}, are executable; use @samp{chmod +x}
-on them if necessary. Then execute the following commands:
-
-@smallexample
-$ POSIX
-psx> configure vms-posix
-psx> make awktab.c gawk
-@end smallexample
-
-@noindent
-The first command will construct files @file{config.h} and @file{Makefile}
-out of templates. The second command will compile and link @code{gawk}.
-Due to a @code{make} bug in VMS POSIX V1.0 and V1.1,
-the file @file{awktab.c} must be given as an explicit target or it will
-not be built and the final link step will fail. Ignore the warning
-@samp{"Could not find lib m in lib list"}; it is harmless, caused by the
-explicit use of @samp{-lm} as a linker option which is not needed
-under VMS POSIX. Under V1.1 (but not V1.0) a problem with the @code{yacc}
-skeleton @file{/etc/yyparse.c} will cause a compiler warning for
-@file{awktab.c}, followed by a linker warning about compilation warnings
-in the resulting object module. These warnings can be ignored.@refill
-
-Once built, @code{gawk} will work like any other shell utility. Unlike
-the normal VMS port of @code{gawk}, no special command line manipulation is
-needed in the VMS POSIX environment.
-
-@node MS-DOS Installation, Atari Installation, VMS Installation, Installation
-@section Installing @code{gawk} on MS-DOS
-
-@cindex installation, ms-dos
-The first step is to get all the files in the @code{gawk} distribution
-onto your PC. Move all the files from the @file{pc} directory into
-the main directory where the other files are. Edit the file
-@file{make.bat} so that it will be an acceptable MS-DOS batch file.
-This means making sure that all lines are terminated with the ASCII
-carriage return and line feed characters.
-
-@code{gawk} has only been compiled with version 5.1 of the Microsoft
-C compiler. The file @file{make.bat} from the @file{pc} directory
-assumes that you have this compiler.
-
-Copy the file @file{setargv.obj} from the library directory where it
-resides to the @code{gawk} source code directory.
-
-Run @file{make.bat}. This will compile @code{gawk} for you, and link it.
-That's all there is to it!
-
-@node Atari Installation, , MS-DOS Installation, Installation
-@section Installing @code{gawk} on the Atari ST
-
-@c based on material from
-@c Michal Jaegermann <ntomczak@vm.ucs.ualberta.ca>
-
-@cindex installation, atari
-This section assumes that you are running TOS. It applies to other Atari
-models (STe, TT) as well.
-
-In order to use @code{gawk}, you need to have a shell, either text or
-graphics, that does not map all the characters of a command line to
-upper case. Maintaining case distinction in option flags is very
-important (@pxref{Command Line, ,Invoking @code{awk}}). Popular shells
-like @code{gulam} or @code{gemini} will work, as will newer versions of
-@code{desktop}. Support for I/O redirection is necessary to make it easy
-to import @code{awk} programs from other environments. Pipes are nice to have,
-but not vital.
-
-If you have received an executable version of @code{gawk}, place it,
-as usual, anywhere in your @code{PATH} where your shell will find it.
-
-While executing, @code{gawk} creates a number of temporary files.
-@code{gawk} looks for either of the environment variables @code{TEMP}
-or @code{TMPDIR}, in that order. If either one is found, its value
-is assumed to be a directory for temporary files. This directory
-must exist, and if you can spare the memory, it is a good idea to
-put it on a @sc{ram} drive. If neither @code{TEMP} nor @code{TMPDIR}
-are found, then @code{gawk} uses the current directory for its
-temporary files.
-
-The ST version of @code{gawk} searches for its program files as
-described in @ref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}.
-On the ST, the default value for the @code{AWKPATH} variable is
-@code{@w{".,c:\lib\awk,c:\gnu\lib\awk"}}.
-The search path can be modified by explicitly setting @code{AWKPATH} to
-whatever you wish. Note that colons cannot be used on the ST to separate
-elements in the @code{AWKPATH} variable, since they have another, reserved,
-meaning. Instead, you must use a comma to separate elements in the path.
-If you are recompiling @code{gawk} on the ST, then you can choose a new
-default search path, by setting the value of @samp{DEFPATH} in the file
-@file{...\config\atari}. You may choose a different separator character
-by setting the value of @samp{ENVSEP} in the same file. The new values will
-be used when creating the header file @file{config.h}.@refill
-
-@ignore
-As a last resort, small
-adjustments can be made directly on the executable version of @code{gawk}
-using a binary editor.@refill
-@end ignore
-
-Although @code{awk} allows great flexibility in doing I/O redirections
-from within a program, this facility should be used with care on the ST.
-In some circumstances the OS routines for file handle pool processing
-lose track of certain events, causing the computer to crash, and requiring
-a reboot. Often a warm reboot is sufficient. Fortunately, this happens
-infrequently, and in rather esoteric situations. In particular, avoid
-having one part of an @code{awk} program using @code{print}
-statements explicitly redirected to @code{"/dev/stdout"}, while other
-@code{print} statements use the default standard output, and a
-calling shell has redirected standard output to a file.@refill
-@c whew!
-
-When @code{gawk} is compiled with the ST version of @code{gcc} and its
-usual libraries, it will accept both @samp{/} and @samp{\} as path separators.
-While this is convenient, it should be remembered that this removes one,
-technically legal, character (@samp{/}) from your file names, and that
-it may create problems for external programs, called via the @code{system()}
-function, which may not support this convention. Whenever it is possible
-that a file created by @code{gawk} will be used by some other program,
-use only backslashes. Also remember that in @code{awk}, backslashes in
-strings have to be doubled in order to get literal backslashes.
-
-The initial port of @code{gawk} to the ST was done with @code{gcc}.
-If you wish to recompile @code{gawk} from scratch, you will need to use
-a compiler that accepts @sc{ansi} standard C (such as @code{gcc}, Turbo C,
-or Prospero C). If @code{sizeof(int) != @w{sizeof(int *)}}, the correctness
-of the generated code depends heavily on the fact that all function calls
-have function prototypes in the current scope. If your compiler does
-not accept function prototypes, you will probably have to add a
-number of casts to the code.@refill
-
-If you are using @code{gcc}, make sure that you have up-to-date libraries.
-Older versions have problems with some library functions (@code{atan2()},
-@code{strftime()}, the @samp{%g} conversion in @code{sprintf()}) which
-may affect the operation of @code{gawk}.
-
-In the @file{atari} subdirectory of the @code{gawk} distribution is
-a version of the @code{system()} function that has been tested with
-@code{gulam} and @code{msh}; it should work with other shells as well.
-With @code{gulam}, it passes the string to be executed without spawning
-an extra copy of a shell. It is possible to replace this version of
-@code{system()} with a similar function from a library or from some other
-source if that version would be a better choice for the shell you prefer.
-
-The files needed to recompile @code{gawk} on the ST can be found in
-the @file{atari} directory. The provided files and instructions below
-assume that you have the GNU C compiler (@code{gcc}), the @code{gulam} shell,
-and an ST version of @code{sed}. The @file{Makefile} is set up to use
-@file{byacc} as a @file{yacc} replacement. With a different set of tools some
-adjustments and/or editing will be needed.@refill
-
-@code{cd} to the @file{atari} directory. Copy @file{Makefile.st} to
-@file{makefile} in the source (parent) directory. Possibly adjust
-@file{../config/atari} to suit your system. Execute the script @file{mkconf.g}
-which will create the header file @file{../config.h}. Go back to the source
-directory. If you are not using @code{gcc}, check the file @file{missing.c}.
-It may be necessary to change forward slashes in the references to files
-from the @file{atari} subdirectory into backslashes. Type @code{make} and
-enjoy.@refill
-
-Compilation with @code{gcc} of some of the bigger modules, like
-@file{awk_tab.c}, may require a full four megabytes of memory. On smaller
-machines you would need to cut down on optimizations, or you would have to
-switch to another, less memory hungry, compiler.@refill
-
-@node Gawk Summary, Sample Program, Installation, Top
-@appendix @code{gawk} Summary
-
-This appendix provides a brief summary of the @code{gawk} command line and the
-@code{awk} language. It is designed to serve as a ``quick reference.'' It is
-therefore terse, but complete.
-
-@menu
-* Command Line Summary:: Recapitulation of the command line.
-* Language Summary:: A terse review of the language.
-* Variables/Fields:: Variables, fields, and arrays.
-* Rules Summary:: Patterns and Actions, and their
- component parts.
-* Functions Summary:: Defining and calling functions.
-* Historical Features:: Some undocumented but supported ``features''.
-@end menu
-
-@node Command Line Summary, Language Summary, Gawk Summary, Gawk Summary
-@appendixsec Command Line Options Summary
-
-The command line consists of options to @code{gawk} itself, the
-@code{awk} program text (if not supplied via the @samp{-f} option), and
-values to be made available in the @code{ARGC} and @code{ARGV}
-predefined @code{awk} variables:
-
-@example
-awk @r{[@var{POSIX or GNU style options}]} -f source-file @r{[@code{--}]} @var{file} @dots{}
-awk @r{[@var{POSIX or GNU style options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{}
-@end example
-
-The options that @code{gawk} accepts are:
-
-@table @code
-@item -F @var{fs}
-@itemx --field-separator=@var{fs}
-Use @var{fs} for the input field separator (the value of the @code{FS}
-predefined variable).
-
-@item -f @var{program-file}
-@itemx --file=@var{program-file}
-Read the @code{awk} program source from the file @var{program-file}, instead
-of from the first command line argument.
-
-@item -v @var{var}=@var{val}
-@itemx --assign=@var{var}=@var{val}
-Assign the variable @var{var} the value @var{val} before program execution
-begins.
-
-@item -W compat
-@itemx --compat
-Specifies compatibility mode, in which @code{gawk} extensions are turned
-off.
-
-@item -W copyleft
-@itemx -W copyright
-@itemx --copyleft
-@itemx --copyright
-Print the short version of the General Public License on the error
-output. This option may disappear in a future version of @code{gawk}.
-
-@item -W help
-@itemx -W usage
-@itemx --help
-@itemx --usage
-Print a relatively short summary of the available options on the error output.
-
-@item -W lint
-@itemx --lint
-Give warnings about dubious or non-portable @code{awk} constructs.
-
-@item -W posix
-@itemx --posix
-Specifies @sc{posix} compatibility mode, in which @code{gawk} extensions
-are turned off and additional restrictions apply.
-
-@item -W source=@var{program-text}
-@itemx --source=@var{program-text}
-Use @var{program-text} as @code{awk} program source code. This option allows
-mixing command line source code with source code from files, and is
-particularly useful for mixing command line programs with library functions.
-
-@item -W version
-@itemx --version
-Print version information for this particular copy of @code{gawk} on the error
-output. This option may disappear in a future version of @code{gawk}.
-
-@item --
-Signal the end of options. This is useful to allow further arguments to the
-@code{awk} program itself to start with a @samp{-}. This is mainly for
-consistency with the argument parsing conventions of @sc{posix}.
-@end table
-
-Any other options are flagged as invalid, but are otherwise ignored.
-@xref{Command Line, ,Invoking @code{awk}}, for more details.
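-
-For example, assuming a (hypothetical) program file @file{check.awk}, a data
-file @file{data}, and a variable @code{limit} that the program makes use of,
-the following command combines several of these options, mixing library code
-from a file with source given on the command line:
-
-@example
-gawk -W lint -v limit=10 -f check.awk -W source='END @{ print "done" @}' data
-@end example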
-
-@node Language Summary, Variables/Fields, Command Line Summary, Gawk Summary
-@appendixsec Language Summary
-
-An @code{awk} program consists of a sequence of pattern-action statements
-and optional function definitions.
-
-@example
-@var{pattern} @{ @var{action statements} @}
-
-function @var{name}(@var{parameter list}) @{ @var{action statements} @}
-@end example
-
-@code{gawk} first reads the program source from the
-@var{program-file}(s) if specified, or from the first non-option
-argument on the command line. The @samp{-f} option may be used multiple
-times on the command line. @code{gawk} reads the program text from all
-the @var{program-file} files, effectively concatenating them in the
-order they are specified. This is useful for building libraries of
-@code{awk} functions, without having to include them in each new
-@code{awk} program that uses them. To use a library function in a file
-from a program typed in on the command line, specify @samp{-f /dev/tty};
-then type your program, and end it with a @kbd{Control-d}.
-@xref{Command Line, ,Invoking @code{awk}}.@refill
-
-The environment variable @code{AWKPATH} specifies a search path to use
-when finding source files named with the @samp{-f} option. The default
-path, which is
-@samp{.:/usr/lib/awk:/usr/local/lib/awk} is used if @code{AWKPATH} is not set.
-If a file name given to the @samp{-f} option contains a @samp{/} character,
-no path search is performed.
-@xref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable},
-for a full description of the @code{AWKPATH} environment variable.@refill
-
-@code{gawk} compiles the program into an internal form, and then proceeds to
-read each file named in the @code{ARGV} array. If there are no files named
-on the command line, @code{gawk} reads the standard input.
-
-If a ``file'' named on the command line has the form
-@samp{@var{var}=@var{val}}, it is treated as a variable assignment: the
-variable @var{var} is assigned the value @var{val}.
-If any of the files have a value that is the null string, that
-element in the list is skipped.@refill
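-
-For example, with two (hypothetical) data files @file{file1} and
-@file{file2}, the following command sets @code{class} to @samp{A} before the
-first file is read, and to @samp{B} before the second:
-
-@example
-awk '@{ print class, $1 @}' class=A file1 class=B file2
-@end example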
-
-For each line in the input, @code{gawk} tests to see if it matches any
-@var{pattern} in the @code{awk} program. For each pattern that the line
-matches, the associated @var{action} is executed.
-
-@node Variables/Fields, Rules Summary, Language Summary, Gawk Summary
-@appendixsec Variables and Fields
-
-@code{awk} variables are dynamic; they come into existence when they are
-first used. Their values are either floating-point numbers or strings.
-@code{awk} also has one-dimensional arrays; multidimensional arrays
-may be simulated. There are several predefined variables that
-@code{awk} sets as a program runs; these are summarized below.
-
-@menu
-* Fields Summary:: Input field splitting.
-* Built-in Summary:: @code{awk}'s built-in variables.
-* Arrays Summary:: Using arrays.
-* Data Type Summary:: Values in @code{awk} are numbers or strings.
-@end menu
-
-@node Fields Summary, Built-in Summary, Variables/Fields, Variables/Fields
-@appendixsubsec Fields
-
-As each input line is read, @code{gawk} splits the line into
-@var{fields}, using the value of the @code{FS} variable as the field
-separator. If @code{FS} is a single character, fields are separated by
-that character. Otherwise, @code{FS} is expected to be a full regular
-expression. In the special case that @code{FS} is a single blank,
-fields are separated by runs of blanks and/or tabs. Note that the value
-of @code{IGNORECASE} (@pxref{Case-sensitivity, ,Case-sensitivity in Matching})
-also affects how fields are split when @code{FS} is a regular expression.@refill
-
-Each field in the input line may be referenced by its position, @code{$1},
-@code{$2}, and so on. @code{$0} is the whole line. The value of a field may
-be assigned to as well. Field numbers need not be constants:
-
-@example
-n = 5
-print $n
-@end example
-
-@noindent
-prints the fifth field in the input line. The variable @code{NF} is set to
-the total number of fields in the input line.
-
-References to nonexistent fields (i.e., fields after @code{$NF}) return
-the null-string. However, assigning to a nonexistent field (e.g.,
-@code{$(NF+2) = 5}) increases the value of @code{NF}, creates any
-intervening fields with the null string as their value, and causes the
-value of @code{$0} to be recomputed, with the fields being separated by
-the value of @code{OFS}.@refill
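-
-For instance, given an input line consisting of @samp{a b}, the following
-rule sets @code{NF} to 4 and rebuilds @code{$0} as @samp{a b  x}, with an
-empty @code{$3}:
-
-@example
-@{ $(NF+2) = "x"; print NF, $0 @}
-@end example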
-
-@xref{Reading Files, ,Reading Input Files}, for a full description of the
-way @code{awk} defines and uses fields.
-
-@node Built-in Summary, Arrays Summary, Fields Summary, Variables/Fields
-@appendixsubsec Built-in Variables
-
-@code{awk}'s built-in variables are:
-
-@table @code
-@item ARGC
-The number of command line arguments (not including options or the
-@code{awk} program itself).
-
-@item ARGIND
-The index in @code{ARGV} of the current file being processed.
-It is always true that @samp{FILENAME == ARGV[ARGIND]}.
-
-@item ARGV
-The array of command line arguments. The array is indexed from 0 to
-@code{ARGC} @minus{} 1. Dynamically changing the contents of @code{ARGV}
-can control the files used for data.@refill
-
-@item CONVFMT
-The conversion format to use when converting numbers to strings.
-
-@item FIELDWIDTHS
-A space-separated list of numbers describing the fixed-width input data.
-
-@item ENVIRON
-An array containing the values of the environment variables. The array
-is indexed by variable name, each element being the value of that
-variable. Thus, the environment variable @code{HOME} would be in
-@code{ENVIRON["HOME"]}. Its value might be @file{/u/close}.
-
-Changing this array does not affect the environment seen by programs
-which @code{gawk} spawns via redirection or the @code{system} function.
-(This may change in a future version of @code{gawk}.)
-
-Some operating systems do not have environment variables.
-The array @code{ENVIRON} is empty when running on these systems.
-
-@item ERRNO
-The system error message when an error occurs using @code{getline}
-or @code{close}.
-
-@item FILENAME
-The name of the current input file. If no files are specified on the command
-line, the value of @code{FILENAME} is @samp{-}.
-
-@item FNR
-The input record number in the current input file.
-
-@item FS
-The input field separator, a blank by default.
-
-@item IGNORECASE
-The case-sensitivity flag for regular expression operations. If
-@code{IGNORECASE} has a nonzero value, then pattern matching in rules,
-field splitting with @code{FS}, regular expression matching with
-@samp{~} and @samp{!~}, and the @code{gsub}, @code{index}, @code{match},
-@code{split} and @code{sub} predefined functions all ignore case
-when doing regular expression operations.@refill
-
-@item NF
-The number of fields in the current input record.
-
-@item NR
-The total number of input records seen so far.
-
-@item OFMT
-The output format for numbers for the @code{print} statement,
-@code{"%.6g"} by default.
-
-@item OFS
-The output field separator, a blank by default.
-
-@item ORS
-The output record separator, by default a newline.
-
-@item RS
-The input record separator, by default a newline. @code{RS} is exceptional
-in that only the first character of its string value is used for separating
-records. If @code{RS} is set to the null string, then records are separated by
-blank lines. When @code{RS} is set to the null string, then the newline
-character always acts as a field separator, in addition to whatever value
-@code{FS} may have.@refill
-
-@item RSTART
-The index of the first character matched by @code{match}; 0 if no match.
-
-@item RLENGTH
-The length of the string matched by @code{match}; @minus{}1 if no match.
-
-@item SUBSEP
-The string used to separate multiple subscripts in array elements, by
-default @code{"\034"}.
-@end table
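-
-As a small example of several of these variables in use, the following rule
-prefixes every input line with the name of the file it came from and its
-line number within that file:
-
-@example
-@{ printf "%s:%d: %s\n", FILENAME, FNR, $0 @}
-@end example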
-
-@xref{Built-in Variables}, for more information.
-
-@node Arrays Summary, Data Type Summary, Built-in Summary, Variables/Fields
-@appendixsubsec Arrays
-
-Arrays are subscripted with an expression between square brackets
-(@samp{[} and @samp{]}). Array subscripts are @emph{always} strings;
-numbers are converted to strings as necessary, following the standard
-conversion rules
-(@pxref{Conversion, ,Conversion of Strings and Numbers}).@refill
-
-If you use multiple expressions separated by commas inside the square
-brackets, then the array subscript is a string consisting of the
-concatenation of the individual subscript values, converted to strings,
-separated by the subscript separator (the value of @code{SUBSEP}).
-
-The special operator @code{in} may be used in an @code{if} or
-@code{while} statement to see if an array has an index consisting of a
-particular value.
-
-@example
-if (val in array)
- print array[val]
-@end example
-
-If the array has multiple subscripts, use @code{(i, j, @dots{}) in array}
-to test for existence of an element.
-
-The @code{in} construct may also be used in a @code{for} loop to iterate
-over all the elements of an array.
-@xref{Scanning an Array, ,Scanning all Elements of an Array}.@refill
-
-An element may be deleted from an array using the @code{delete} statement.
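-
-The following sketch combines these facilities: it counts pairs of first and
-second fields, then scans the array, splitting each combined subscript back
-apart on @code{SUBSEP}:
-
-@example
-@{ hits[$1, $2]++ @}
-
-END @{
-    for (key in hits) @{
-        split(key, part, SUBSEP)
-        print part[1], part[2], hits[key]
-    @}
-@}
-@end example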
-
-@xref{Arrays, ,Arrays in @code{awk}}, for more detailed information.
-
-@node Data Type Summary, , Arrays Summary, Variables/Fields
-@appendixsubsec Data Types
-
-The value of an @code{awk} expression is always either a number
-or a string.
-
-Certain contexts (such as arithmetic operators) require numeric
-values. They convert strings to numbers by interpreting the text
-of the string as a numeral. If the string does not look like a
-numeral, it converts to 0.
-
-Certain contexts (such as concatenation) require string values.
-They convert numbers to strings by effectively printing them
-with @code{sprintf}.
-@xref{Conversion, ,Conversion of Strings and Numbers}, for the details.@refill
-
-To force conversion of a string value to a number, simply add 0
-to it. If the value you start with is already a number, this
-does not change it.
-
-To force conversion of a numeric value to a string, concatenate it with
-the null string.
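-
-For instance, this rule forces both conversions explicitly:
-
-@example
-@{
-    num = $1 + 0      # force the first field to a numeric value
-    str = $1 ""       # force the first field to a string value
-    print num, str
-@}
-@end example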
-
-The @code{awk} language defines comparisons as being done numerically if
-both operands are numeric, or if one is numeric and the other is a numeric
-string. Otherwise one or both operands are converted to strings and a
-string comparison is performed.
-
-Uninitialized variables have the string value @code{""} (the null, or
-empty, string). In contexts where a number is required, this is
-equivalent to 0.
-
-@xref{Variables}, for more information on variable naming and initialization;
-@pxref{Conversion, ,Conversion of Strings and Numbers}, for more information
-on how variable values are interpreted.@refill
-
-@node Rules Summary, Functions Summary, Variables/Fields, Gawk Summary
-@appendixsec Patterns and Actions
-
-@menu
-* Pattern Summary:: Quick overview of patterns.
-* Regexp Summary:: Quick overview of regular expressions.
-* Actions Summary:: Quick overview of actions.
-@end menu
-
-An @code{awk} program is mostly composed of rules, each consisting of a
-pattern followed by an action. The action is enclosed in @samp{@{} and
-@samp{@}}. Either the pattern may be missing, or the action may be
-missing, but, of course, not both. If the pattern is missing, the
-action is executed for every single line of input. A missing action is
-equivalent to this action,
-
-@example
-@{ print @}
-@end example
-
-@noindent
-which prints the entire line.
-
-Comments begin with the @samp{#} character, and continue until the end of the
-line. Blank lines may be used to separate statements. Normally, a statement
-ends with a newline; however, this is not the case for lines ending in a
-@samp{,}, @samp{@{}, @samp{?}, @samp{:}, @samp{&&}, or @samp{||}. Lines
-ending in @code{do} or @code{else} also have their statements automatically
-continued on the following line. In other cases, a line can be continued by
-ending it with a @samp{\}, in which case the newline is ignored.@refill
-
-Multiple statements may be put on one line by separating them with a @samp{;}.
-This applies to both the statements within the action part of a rule (the
-usual case), and to the rule statements.
-
-@xref{Comments, ,Comments in @code{awk} Programs}, for information on
-@code{awk}'s commenting convention;
-@pxref{Statements/Lines, ,@code{awk} Statements versus Lines}, for a
-description of the line continuation mechanism in @code{awk}.@refill
-
-@node Pattern Summary, Regexp Summary, Rules Summary, Rules Summary
-@appendixsubsec Patterns
-
-@code{awk} patterns may be one of the following:
-
-@example
-/@var{regular expression}/
-@var{relational expression}
-@var{pattern} && @var{pattern}
-@var{pattern} || @var{pattern}
-@var{pattern} ? @var{pattern} : @var{pattern}
-(@var{pattern})
-! @var{pattern}
-@var{pattern1}, @var{pattern2}
-BEGIN
-END
-@end example
-
-@code{BEGIN} and @code{END} are two special kinds of patterns that are not
-tested against the input. The action parts of all @code{BEGIN} rules are
-merged as if all the statements had been written in a single @code{BEGIN}
-rule. They are executed before any of the input is read. Similarly, all the
-@code{END} rules are merged, and executed when all the input is exhausted (or
-when an @code{exit} statement is executed). @code{BEGIN} and @code{END}
-patterns cannot be combined with other patterns in pattern expressions.
-@code{BEGIN} and @code{END} rules cannot have missing action parts.@refill
-
-For @samp{/@var{regular-expression}/} patterns, the associated statement is
-executed for each input line that matches the regular expression. Regular
-expressions are extensions of those in @code{egrep}, and are summarized below.
-
-A @var{relational expression} may use any of the operators defined below in
-the section on actions. These generally test whether certain fields match
-certain regular expressions.
-
-The @samp{&&}, @samp{||}, and @samp{!} operators are logical ``and,''
-logical ``or,'' and logical ``not,'' respectively, as in C. They do
-short-circuit evaluation, also as in C, and are used for combining more
-primitive pattern expressions. As in most languages, parentheses may be
-used to change the order of evaluation.
-
-The @samp{?:} operator is like the same operator in C. If the first
-pattern matches, then the second pattern is matched against the input
-record; otherwise, the third is matched. Only one of the second and
-third patterns is matched.
-
-The @samp{@var{pattern1}, @var{pattern2}} form of a pattern is called a
-range pattern. It matches all input lines starting with a line that
-matches @var{pattern1}, and continuing until a line that matches
-@var{pattern2}, inclusive. A range pattern cannot be used as an operand
-to any of the pattern operators.
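-
-For example, the following range pattern prints every group of lines that
-begins with a line containing @samp{START} and ends with a line containing
-@samp{STOP}, including both delimiter lines:
-
-@example
-/START/, /STOP/ @{ print @}
-@end example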
-
-@xref{Patterns}, for a full description of the pattern part of @code{awk}
-rules.
-
-@node Regexp Summary, Actions Summary, Pattern Summary, Rules Summary
-@appendixsubsec Regular Expressions
-
-Regular expressions are the extended kind found in @code{egrep}.
-They are composed of characters as follows:
-
-@table @code
-@item @var{c}
-matches the character @var{c} (assuming @var{c} is a character with no
-special meaning in regexps).
-
-@item \@var{c}
-matches the literal character @var{c}.
-
-@item .
-matches any character except newline.
-
-@item ^
-matches the beginning of a line or a string.
-
-@item $
-matches the end of a line or a string.
-
-@item [@var{abc}@dots{}]
-matches any of the characters @var{abc}@dots{} (character class).
-
-@item [^@var{abc}@dots{}]
-matches any character except @var{abc}@dots{} and newline (negated
-character class).
-
-@item @var{r1}|@var{r2}
-matches either @var{r1} or @var{r2} (alternation).
-
-@item @var{r1r2}
-matches @var{r1}, and then @var{r2} (concatenation).
-
-@item @var{r}+
-matches one or more @var{r}'s.
-
-@item @var{r}*
-matches zero or more @var{r}'s.
-
-@item @var{r}?
-matches zero or one @var{r}.
-
-@item (@var{r})
-matches @var{r} (grouping).
-@end table
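-
-As a brief example, the following rule uses anchoring, grouping, and
-alternation to print the lines that consist entirely of the word @samp{yes}
-or the word @samp{no}:
-
-@example
-/^(yes|no)$/ @{ print NR ": " $0 @}
-@end example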
-
-@xref{Regexp, ,Regular Expressions as Patterns}, for a more detailed
-explanation of regular expressions.
-
-The escape sequences allowed in string constants are also valid in
-regular expressions (@pxref{Constants, ,Constant Expressions}).
-
-@node Actions Summary, , Regexp Summary, Rules Summary
-@appendixsubsec Actions
-
-Action statements are enclosed in braces, @samp{@{} and @samp{@}}.
-Action statements consist of the usual assignment, conditional, and looping
-statements found in most languages. The operators, control statements,
-and input/output statements available are patterned after those in C.
-
-@menu
-* Operator Summary:: @code{awk} operators.
-* Control Flow Summary:: The control statements.
-* I/O Summary:: The I/O statements.
-* Printf Summary:: A summary of @code{printf}.
-* Special File Summary:: Special file names interpreted internally.
-* Numeric Functions Summary:: Built-in numeric functions.
-* String Functions Summary:: Built-in string functions.
-* Time Functions Summary:: Built-in time functions.
-* String Constants Summary:: Escape sequences in strings.
-@end menu
-
-@node Operator Summary, Control Flow Summary, Actions Summary, Actions Summary
-@appendixsubsubsec Operators
-
-The operators in @code{awk}, in order of increasing precedence, are:
-
-@table @code
-@item = += -= *= /= %= ^=
-Assignment. Both absolute assignment (@code{@var{var}=@var{value}})
-and operator assignment (the other forms) are supported.
-
-@item ?:
-A conditional expression, as in C. This has the form @code{@var{expr1} ?
-@var{expr2} : @var{expr3}}. If @var{expr1} is true, the value of the
-expression is @var{expr2}; otherwise it is @var{expr3}. Only one of
-@var{expr2} and @var{expr3} is evaluated.@refill
-
-@item ||
-Logical ``or''.
-
-@item &&
-Logical ``and''.
-
-@item ~ !~
-Regular expression match, negated match.
-
-@item < <= > >= != ==
-The usual relational operators.
-
-@item @var{blank}
-String concatenation.
-
-@item + -
-Addition and subtraction.
-
-@item * / %
-Multiplication, division, and modulus.
-
-@item + - !
-Unary plus, unary minus, and logical negation.
-
-@item ^
-Exponentiation (@samp{**} may also be used, and @samp{**=} for the assignment
-operator, but they are not specified in the @sc{posix} standard).
-
-@item ++ --
-Increment and decrement, both prefix and postfix.
-
-@item $
-Field reference.
-@end table
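-
-The following rule exercises several of these operators: a conditional
-expression guards against dividing by zero, and the result is concatenated
-with the original record:
-
-@example
-@{
-    avg = NF > 0 ? ($1 + $NF) / 2 : 0    # conditional expression
-    print avg ": " $0                    # the blank operator: concatenation
-@}
-@end example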
-
-@xref{Expressions, ,Expressions as Action Statements}, for a full
-description of all the operators listed above.
-@xref{Fields, ,Examining Fields}, for a description of the field
-reference operator.@refill
-
-@node Control Flow Summary, I/O Summary, Operator Summary, Actions Summary
-@appendixsubsubsec Control Statements
-
-The control statements are as follows:
-
-@example
-if (@var{condition}) @var{statement} @r{[} else @var{statement} @r{]}
-while (@var{condition}) @var{statement}
-do @var{statement} while (@var{condition})
-for (@var{expr1}; @var{expr2}; @var{expr3}) @var{statement}
-for (@var{var} in @var{array}) @var{statement}
-break
-continue
-delete @var{array}[@var{index}]
-exit @r{[} @var{expression} @r{]}
-@{ @var{statements} @}
-@end example
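-
-For instance, assuming comma-separated input (hence the @code{BEGIN} rule
-setting @code{FS}), this program counts empty fields on each line with a
-@code{for} loop and an @code{if} statement:
-
-@example
-BEGIN @{ FS = "," @}
-
-@{
-    empty = 0
-    for (i = 1; i <= NF; i++)
-        if ($i == "")
-            empty++
-    if (empty > 0)
-        print NR ": " empty " empty fields"
-@}
-@end example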
-
-@xref{Statements, ,Control Statements in Actions}, for a full description
-of all the control statements listed above.
-
-@node I/O Summary, Printf Summary, Control Flow Summary, Actions Summary
-@appendixsubsubsec I/O Statements
-
-The input/output statements are as follows:
-
-@table @code
-@item getline
-Set @code{$0} from next input record; set @code{NF}, @code{NR}, @code{FNR}.
-
-@item getline <@var{file}
-Set @code{$0} from next record of @var{file}; set @code{NF}.
-
-@item getline @var{var}
-Set @var{var} from next input record; set @code{NF}, @code{FNR}.
-
-@item getline @var{var} <@var{file}
-Set @var{var} from next record of @var{file}.
-
-@item next
-Stop processing the current input record. The next input record is read and
-processing starts over with the first pattern in the @code{awk} program.
-If the end of the input data is reached, the @code{END} rule(s), if any,
-are executed.
-
-@item next file
-Stop processing the current input file. The next input record read comes
-from the next input file. @code{FILENAME} is updated, @code{FNR} is set to 1,
-and processing starts over with the first pattern in the @code{awk} program.
-If the end of the input data is reached, the @code{END} rule(s), if any,
-are executed.
-
-@item print
-Prints the current record.
-
-@item print @var{expr-list}
-Prints expressions.
-
-@item print @var{expr-list} > @var{file}
-Prints expressions on @var{file}.
-
-@item printf @var{fmt, expr-list}
-Format and print.
-
-@item printf @var{fmt, expr-list} > file
-Format and print on @var{file}.
-@end table
-
-Other input/output redirections are also allowed. For @code{print} and
-@code{printf}, @samp{>> @var{file}} appends output to the @var{file},
-and @samp{| @var{command}} writes on a pipe. In a similar fashion,
-@samp{@var{command} | getline} pipes input into @code{getline}.
-@code{getline} returns 0 on end of file, and @minus{}1 on an error.@refill
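-
-For example, this @code{BEGIN} rule reads one line of output from the
-@code{date} command (assumed to be available on your system) into a
-variable, then closes the pipe:
-
-@example
-BEGIN @{
-    "date" | getline now      # read one record from the command
-    close("date")
-    print "Run started:", now
-@}
-@end example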
-
-@xref{Getline, ,Explicit Input with @code{getline}}, for a full description
-of the @code{getline} statement.
-@xref{Printing, ,Printing Output}, for a full description of @code{print} and
-@code{printf}. Finally, @pxref{Next Statement, ,The @code{next} Statement},
-for a description of how the @code{next} statement works.@refill
-
-@node Printf Summary, Special File Summary, I/O Summary, Actions Summary
-@appendixsubsubsec @code{printf} Summary
-
-The @code{awk} @code{printf} statement and @code{sprintf} function
-accept the following conversion specification formats:
-
-@table @code
-@item %c
-An ASCII character. If the argument used for @samp{%c} is numeric, it is
-treated as a character and printed. Otherwise, the argument is assumed to
-be a string, and only the first character of that string is printed.
-
-@item %d
-@itemx %i
-A decimal number (the integer part).
-
-@item %e
-A floating point number of the form
-@samp{@r{[}-@r{]}d.ddddddE@r{[}+-@r{]}dd}.@refill
-
-@item %f
-A floating point number of the form
-@r{[}@code{-}@r{]}@code{ddd.dddddd}.
-
-@item %g
-Use @samp{%e} or @samp{%f} conversion, whichever produces a shorter string,
-with nonsignificant zeros suppressed.
-
-@item %o
-An unsigned octal number (again, an integer).
-
-@item %s
-A character string.
-
-@item %x
-An unsigned hexadecimal number (an integer).
-
-@item %X
-Like @samp{%x}, except use @samp{A} through @samp{F} instead of @samp{a}
-through @samp{f} for decimal 10 through 15.@refill
-
-@item %%
-A single @samp{%} character; no argument is converted.
-@end table
-
-There are optional, additional parameters that may lie between the @samp{%}
-and the control letter:
-
-@table @code
-@item -
-The expression should be left-justified within its field.
-
-@item @var{width}
-The field should be padded to this width. If @var{width} has a leading zero,
-then the field is padded with zeros. Otherwise it is padded with blanks.
-
-@item .@var{prec}
-A number specifying the maximum width for strings, or the number of digits
-to print to the right of the decimal point for numeric conversions.
-@end table
-
-Either or both of the @var{width} and @var{prec} values may be specified
-as @samp{*}. In that case, the particular value is taken from the argument
-list.
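-
-For example, the first statement below left-justifies the first field in a
-ten-character column and prints the second field with two digits after the
-decimal point; the second statement takes its field width from the argument
-list:
-
-@example
-@{
-    printf "%-10s %8.2f\n", $1, $2
-    printf "%*d\n", 6, NR
-@}
-@end example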
-
-@xref{Printf, ,Using @code{printf} Statements for Fancier Printing}, for
-examples and for a more detailed description.
-
-@node Special File Summary, Numeric Functions Summary, Printf Summary, Actions Summary
-@appendixsubsubsec Special File Names
-
-When doing I/O redirection from either @code{print} or @code{printf} into a
-file, or via @code{getline} from a file, @code{gawk} recognizes certain special
-file names internally. These file names allow access to open file descriptors
-inherited from @code{gawk}'s parent process (usually the shell). The
-file names are:
-
-@table @file
-@item /dev/stdin
-The standard input.
-
-@item /dev/stdout
-The standard output.
-
-@item /dev/stderr
-The standard error output.
-
-@item /dev/fd/@var{n}
-The file denoted by the open file descriptor @var{n}.
-@end table
-
-In addition, the following files provide process-related information
-about the running @code{gawk} program.
-
-@table @file
-@item /dev/pid
-Reading this file returns the process ID of the current process,
-in decimal, terminated with a newline.
-
-@item /dev/ppid
-Reading this file returns the parent process ID of the current process,
-in decimal, terminated with a newline.
-
-@item /dev/pgrpid
-Reading this file returns the process group ID of the current process,
-in decimal, terminated with a newline.
-
-@item /dev/user
-Reading this file returns a single record terminated with a newline.
-The fields are separated with blanks. The fields represent the
-following information:
-
-@table @code
-@item $1
-The value of the @code{getuid} system call.
-
-@item $2
-The value of the @code{geteuid} system call.
-
-@item $3
-The value of the @code{getgid} system call.
-
-@item $4
-The value of the @code{getegid} system call.
-@end table
-
-If there are any additional fields, they are the group IDs returned by
-the @code{getgroups} system call.
-(Multiple groups may not be supported on all systems.)@refill
-@end table
-
-@noindent
-These file names may also be used on the command line to name data files.
-These file names are only recognized internally if you do not
-actually have files by these names on your system.
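-
-For example, this rule sends a diagnostic for blank input lines to the
-standard error output without disturbing the normal output stream:
-
-@example
-NF == 0 @{ print "warning: blank line " NR > "/dev/stderr" @}
-@end example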
-
-@xref{Special Files, ,Standard I/O Streams}, for a longer description that
-provides the motivation for this feature.
-
-@node Numeric Functions Summary, String Functions Summary, Special File Summary, Actions Summary
-@appendixsubsubsec Numeric Functions
-
-@code{awk} has the following predefined arithmetic functions:
-
-@table @code
-@item atan2(@var{y}, @var{x})
-returns the arctangent of @var{y/x} in radians.
-
-@item cos(@var{expr})
-returns the cosine of @var{expr}, which is in radians.
-
-@item exp(@var{expr})
-the exponential function.
-
-@item int(@var{expr})
-truncates to integer.
-
-@item log(@var{expr})
-the natural logarithm function.
-
-@item rand()
-returns a random number between 0 and 1.
-
-@item sin(@var{expr})
-returns the sine of @var{expr}, which is in radians.
-
-@item sqrt(@var{expr})
-the square root function.
-
-@item srand(@var{expr})
-use @var{expr} as a new seed for the random number generator. If no @var{expr}
-is provided, the time of day is used. The return value is the previous
-seed for the random number generator.
-@end table
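-
-For example, the following program seeds the random number generator from
-the time of day and simulates one roll of a six-sided die:
-
-@example
-BEGIN @{
-    srand()                       # seed from the time of day
-    roll = int(6 * rand()) + 1    # integer from 1 to 6
-    print "rolled a", roll
-@}
-@end example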
-
-@node String Functions Summary, Time Functions Summary, Numeric Functions Summary, Actions Summary
-@appendixsubsubsec String Functions
-
-@code{awk} has the following predefined string functions:
-
-@table @code
-@item gsub(@var{r}, @var{s}, @var{t})
-for each substring matching the regular expression @var{r} in the string
-@var{t}, substitute the string @var{s}, and return the number of substitutions.
-If @var{t} is not supplied, use @code{$0}.
-
-@item index(@var{s}, @var{t})
-returns the index of the string @var{t} in the string @var{s}, or 0 if
-@var{t} is not present.
-
-@item length(@var{s})
-returns the length of the string @var{s}. The length of @code{$0}
-is returned if no argument is supplied.
-
-@item match(@var{s}, @var{r})
-returns the position in @var{s} where the regular expression @var{r}
-occurs, or 0 if @var{r} is not present, and sets the values of @code{RSTART}
-and @code{RLENGTH}.
-
-@item split(@var{s}, @var{a}, @var{r})
-splits the string @var{s} into the array @var{a} on the regular expression
-@var{r}, and returns the number of fields. If @var{r} is omitted, @code{FS}
-is used instead.
-
-@item sprintf(@var{fmt}, @var{expr-list})
-formats @var{expr-list} according to @var{fmt}, and returns the resulting string.
-
-@item sub(@var{r}, @var{s}, @var{t})
-this is just like @code{gsub}, but only the first matching substring is
-replaced.
-
-@item substr(@var{s}, @var{i}, @var{n})
-returns the @var{n}-character substring of @var{s} starting at @var{i}.
-If @var{n} is omitted, the rest of @var{s} is used.
-
-@item tolower(@var{str})
-returns a copy of the string @var{str}, with all the upper-case characters in
-@var{str} translated to their corresponding lower-case counterparts.
-Nonalphabetic characters are left unchanged.
-
-@item toupper(@var{str})
-returns a copy of the string @var{str}, with all the lower-case characters in
-@var{str} translated to their corresponding upper-case counterparts.
-Nonalphabetic characters are left unchanged.
-
-@item system(@var{cmd-line})
-executes the command @var{cmd-line}, and returns the exit status.
-@end table
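-
-The following rule shows several of these functions working together: it
-lower-cases the record, strips punctuation with @code{gsub}, and reports
-where @samp{awk} appears in the cleaned-up line using @code{match} and
-@code{RSTART}:
-
-@example
-@{
-    line = tolower($0)
-    removed = gsub(/[^a-z0-9 \t]/, "", line)
-    if (match(line, /awk/))
-        printf "awk found at position %d after removing %d characters\n",
-               RSTART, removed
-@}
-@end example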
-
-@node Time Functions Summary, String Constants Summary, String Functions Summary, Actions Summary
-@appendixsubsubsec Built-in time functions
-
-The following two functions are available for getting the current
-time of day, and for formatting time stamps.
-
-@table @code
-@item systime()
-returns the current time of day as the number of seconds since a particular
-epoch (Midnight, January 1, 1970 @sc{utc}, on @sc{posix} systems).
-
-@item strftime(@var{format}, @var{timestamp})
-formats @var{timestamp} according to the specification in @var{format}.
-The current time of day is used if no @var{timestamp} is supplied.
-@xref{Time Functions, ,Functions for Dealing with Time Stamps}, for the
-details on the conversion specifiers that @code{strftime} accepts.@refill
-@end table
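-
-For example, assuming the common @code{strftime} conversion specifiers such
-as @samp{%Y} and @samp{%H} are available on your system, this rule prints a
-time stamp for the current run:
-
-@example
-BEGIN @{ print strftime("run at %Y-%m-%d %H:%M:%S", systime()) @}
-@end example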
-
-@iftex
-@xref{Built-in, ,Built-in Functions}, for a description of all of
-@code{awk}'s built-in functions.
-@end iftex
-
-@node String Constants Summary, , Time Functions Summary, Actions Summary
-@appendixsubsubsec String Constants
-
-String constants in @code{awk} are sequences of characters enclosed
-between double quotes (@code{"}). Within strings, certain @dfn{escape sequences}
-are recognized, as in C. These are:
-
-@table @code
-@item \\
-A literal backslash.
-
-@item \a
-The ``alert'' character; usually the ASCII BEL character.
-
-@item \b
-Backspace.
-
-@item \f
-Formfeed.
-
-@item \n
-Newline.
-
-@item \r
-Carriage return.
-
-@item \t
-Horizontal tab.
-
-@item \v
-Vertical tab.
-
-@item \x@var{hex digits}
-The character represented by the string of hexadecimal digits following
-the @samp{\x}. As in @sc{ansi} C, all following hexadecimal digits are
-considered part of the escape sequence. (This feature should tell us
-something about language design by committee.) E.g., @code{"\x1B"} is a
-string containing the ASCII ESC (escape) character. (The @samp{\x}
-escape sequence is not in @sc{posix} @code{awk}.)
-
-@item \@var{ddd}
-The character represented by the 1-, 2-, or 3-digit sequence of octal
-digits. Thus, @code{"\033"} is also a string containing the ASCII ESC
-(escape) character.
-
-@item \@var{c}
-The literal character @var{c}.
-@end table
-
-The escape sequences may also be used inside constant regular expressions
-(e.g., the regexp @code{@w{/[@ \t\f\n\r\v]/}} matches whitespace
-characters).@refill
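-
-For example, this statement uses several escape sequences to print a small
-tab-separated report containing embedded double quotes:
-
-@example
-BEGIN @{ printf "name\tstatus\n\"gawk\"\tok\n" @}
-@end example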
-
-@xref{Constants, ,Constant Expressions}.
-
-@node Functions Summary, Historical Features, Rules Summary, Gawk Summary
-@appendixsec Functions
-
-Functions in @code{awk} are defined as follows:
-
-@example
-function @var{name}(@var{parameter list}) @{ @var{statements} @}
-@end example
-
-Actual parameters supplied in the function call are used to instantiate
-the formal parameters declared in the function. Arrays are passed by
-reference; other variables are passed by value.
-
-If there are fewer arguments passed than there are names in @var{parameter-list},
-the extra names are given the null string as value. Extra names have the
-effect of local variables.
-
-The open-parenthesis in a function call of a user-defined function must
-immediately follow the function name, without any intervening white space.
-This is to avoid a syntactic ambiguity with the concatenation operator.
-
-The word @code{func} may be used in place of @code{function} (but not in
-@sc{posix} @code{awk}).
-
-Use the @code{return} statement to return a value from a function.
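-
-For example, the following function computes the area of a rectangle; the
-extra parameter @code{tmp} is not supplied by the caller and therefore acts
-as a local variable:
-
-@example
-function area(width, height,    tmp) @{
-    tmp = width * height
-    return tmp
-@}
-
-@{ print area($1, $2) @}
-@end example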
-
-@xref{User-defined, ,User-defined Functions}, for a more complete description.
-
-@node Historical Features, , Functions Summary, Gawk Summary
-@appendixsec Historical Features
-
-There are two features of historical @code{awk} implementations that
-@code{gawk} supports. First, it is possible to call the @code{length}
-built-in function not only with no arguments, but even without parentheses!
-
-@example
-a = length
-@end example
-
-@noindent
-is the same as either of
-
-@example
-a = length()
-a = length($0)
-@end example
-
-@noindent
-This feature is marked as ``deprecated'' in the @sc{posix} standard, and
-@code{gawk} will issue a warning about its use if @samp{-W lint} is
-specified on the command line.
-
-The other feature is the use of the @code{continue} statement outside the
-body of a @code{while}, @code{for}, or @code{do} loop. Traditional
-@code{awk} implementations have treated such usage as equivalent to the
-@code{next} statement. @code{gawk} will support this usage if @samp{-W posix}
-has not been specified.
-
-@node Sample Program, Bugs, Gawk Summary, Top
-@appendix Sample Program
-
-The following example is a complete @code{awk} program, which prints
-the number of occurrences of each word in its input. It illustrates the
-associative nature of @code{awk} arrays by using strings as subscripts. It
-also demonstrates the @samp{for @var{x} in @var{array}} construction.
-Finally, it shows how @code{awk} can be used in conjunction with other
-utility programs to do a useful task of some complexity with a minimum of
-effort. Some explanations follow the program listing.@refill
-
-@example
-awk '
-# Print list of word frequencies
-@{
- for (i = 1; i <= NF; i++)
- freq[$i]++
-@}
-
-END @{
- for (word in freq)
- printf "%s\t%d\n", word, freq[word]
-@}'
-@end example
-
-The first thing to notice about this program is that it has two rules. The
-first rule, because it has an empty pattern, is executed on every line of
-the input. It uses @code{awk}'s field-accessing mechanism
-(@pxref{Fields, ,Examining Fields}) to pick out the individual words from
-the line, and the built-in variable @code{NF} (@pxref{Built-in Variables})
-to know how many fields are available.@refill
-
-For each input word, an element of the array @code{freq} is incremented to
-reflect that the word has been seen an additional time.@refill
-
-The second rule, because it has the pattern @code{END}, is not executed
-until the input has been exhausted. It prints out the contents of the
-@code{freq} table that has been built up inside the first action.@refill
-
-Note that this program has several problems that would prevent it from being
-useful by itself on real text files:@refill
-
-@itemize @bullet
-@item
-Words are detected using the @code{awk} convention that fields are
-separated by whitespace and that other characters in the input (except
-newlines) don't have any special meaning to @code{awk}. This means that
-punctuation characters count as part of words.@refill
-
-@item
-The @code{awk} language considers upper and lower case characters to be
-distinct. Therefore, @samp{foo} and @samp{Foo} are not treated by this
-program as the same word. This is undesirable since in normal text, words
-are capitalized if they begin sentences, and a frequency analyzer should not
-be sensitive to that.@refill
-
-@item
-The output does not come out in any useful order. You're more likely to be
-interested in which words occur most frequently, or having an alphabetized
-table of how frequently each word occurs.@refill
-@end itemize
-
-The way to solve these problems is to use some of the more advanced
-features of the @code{awk} language. First, we use @code{tolower} to remove
-case distinctions. Next, we use @code{gsub} to remove punctuation
-characters. Finally, we use the system @code{sort} utility to process the
-output of the @code{awk} script. First, here is the new version of
-the program:@refill
-
-@example
-awk '
-# Print list of word frequencies
-@{
- $0 = tolower($0) # remove case distinctions
- gsub(/[^a-z0-9_ \t]/, "", $0) # remove punctuation
- for (i = 1; i <= NF; i++)
- freq[$i]++
-@}
-
-END @{
- for (word in freq)
- printf "%s\t%d\n", word, freq[word]
-@}'
-@end example
-
-Assuming we have saved this program in a file named @file{frequency.awk},
-and that the data is in @file{file1}, the following pipeline
-
-@example
-awk -f frequency.awk file1 | sort +1 -nr
-@end example
-
-@noindent
-produces a table of the words appearing in @file{file1} in order of
-decreasing frequency.
-
-The @code{awk} program suitably massages the data and produces a word
-frequency table, which is not ordered.
-
-The @code{awk} script's output is then sorted by the @code{sort} command and
-printed on the terminal. The options given to @code{sort} in this example
-specify to sort using the second field of each input line (skipping one field),
-that the sort keys should be treated as numeric quantities (otherwise
-@samp{15} would come before @samp{5}), and that the sorting should be done
-in descending (reverse) order.@refill
-
-We could have even done the @code{sort} from within the program, by
-changing the @code{END} action to:
-
-@example
-END @{
- sort = "sort +1 -nr"
- for (word in freq)
- printf "%s\t%d\n", word, freq[word] | sort
- close(sort)
-@}
-@end example
-
-See the general operating system documentation for more information on how
-to use the @code{sort} command.@refill
-
-@ignore
-@strong{ADR: I have some more substantial programs courtesy of Rick Adams
-at UUNET. I am planning on incorporating those either in addition to or
-instead of this program.}
-
-@strong{I would also like to incorporate the general @code{translate}
-function that I have written.}
-
-@strong{I have a ton of other sample programs to include too.}
-@end ignore
-
-@node Bugs, Notes, Sample Program, Top
-@appendix Reporting Problems and Bugs
-
-@c This chapter stolen shamelessly from the GNU m4 manual.
-@c This chapter has been unshamelessly altered to emulate changes made to
-@c make.texi from whence it was originally shamelessly stolen! :-} --mew
-
-If you have problems with @code{gawk} or think that you have found a bug,
-please report it to the developers; we cannot promise to do anything
-but we might well want to fix it.
-
-Before reporting a bug, make sure you have actually found a real bug.
-Carefully reread the documentation and see if it really says you can do
-what you're trying to do. If it's not clear whether you should be able
-to do something or not, report that too; it's a bug in the documentation!
-
-Before reporting a bug or trying to fix it yourself, try to isolate it
-to the smallest possible @code{awk} program and input data file that
-reproduces the problem. Then send us the program and data file,
-some idea of what kind of Unix system you're using, and the exact results
-@code{gawk} gave you. Also say what you expected to occur; this will help
-us decide whether the problem was really in the documentation.
-
-Once you have a precise problem, send e-mail to (Internet)
-@samp{bug-gnu-utils@@prep.ai.mit.edu} or (UUCP)
-@samp{mit-eddie!prep.ai.mit.edu!bug-gnu-utils}. Please include the
-version number of @code{gawk} you are using. You can get this information
-with the command @samp{gawk -W version '@{@}' /dev/null}.
-You should send carbon copies of your mail to David Trueman at
-@samp{david@@cs.dal.ca}, and to Arnold Robbins, who can be reached at
-@samp{arnold@@skeeve.atl.ga.us}. David is most likely to fix code
-problems, while Arnold is most likely to fix documentation problems.@refill
-
-Non-bug suggestions are always welcome as well. If you have questions
-about things that are unclear in the documentation or are just obscure
-features, ask Arnold Robbins; he will try to help you out, although he
-may not have the time to fix the problem. You can send him electronic
-mail at the Internet address above.
-
-If you find bugs in one of the non-Unix ports of @code{gawk}, please send
-an electronic mail message to the person who maintains that port. They
-are listed below, and also in the @file{README} file in the @code{gawk}
-distribution. Information in the @code{README} file should be considered
-authoritative if it conflicts with this manual.
-
-The people maintaining the non-Unix ports of @code{gawk} are:
-
-@table @asis
-@item MS-DOS
-The port to MS-DOS is maintained by Scott Deifik.
-His electronic mail address is @samp{scottd@@amgen.com}.
-
-@item VMS
-The port to VAX VMS is maintained by Pat Rankin.
-His electronic mail address is @samp{rankin@@eql.caltech.edu}.
-
-@item Atari ST
-The port to the Atari ST is maintained by Michal Jaegermann.
-His electronic mail address is @samp{ntomczak@@vm.ucs.ualberta.ca}.
-
-@end table
-
-If your bug is also reproducible under Unix, please send copies of your
-report to the general GNU bug list, as well as to Arnold Robbins and David
-Trueman, at the addresses listed above.
-
-@node Notes, Glossary, Bugs, Top
-@appendix Implementation Notes
-
-This appendix contains information mainly of interest to implementors and
-maintainers of @code{gawk}. Everything in it applies specifically to
-@code{gawk}, and not to other implementations.
-
-@menu
-* Compatibility Mode:: How to disable certain @code{gawk} extensions.
-* Future Extensions:: New features we may implement soon.
-* Improvements:: Suggestions for improvements by volunteers.
-@end menu
-
-@node Compatibility Mode, Future Extensions, Notes, Notes
-@appendixsec Downward Compatibility and Debugging
-
-@xref{POSIX/GNU, ,Extensions in @code{gawk} not in POSIX @code{awk}},
-for a summary of the GNU extensions to the @code{awk} language and program.
-All of these features can be turned off by invoking @code{gawk} with the
-@samp{-W compat} option, or with the @samp{-W posix} option.@refill
-
-If @code{gawk} is compiled for debugging with @samp{-DDEBUG}, then there
-is one more option available on the command line:
-
-@table @samp
-@item -W parsedebug
-Print out the parse stack information as the program is being parsed.
-@end table
-
-This option is intended only for serious @code{gawk} developers,
-and not for the casual user. It probably has not even been compiled into
-your version of @code{gawk}, since it slows down execution.
-
-@node Future Extensions, Improvements, Compatibility Mode, Notes
-@appendixsec Probable Future Extensions
-
-This section briefly lists extensions that indicate the directions we are
-currently considering for @code{gawk}. The file @file{FUTURES} in the
-@code{gawk} distributions lists these extensions, as well as several others.
-
-@table @asis
-@item @code{RS} as a regexp
-The meaning of @code{RS} may be generalized along the lines of @code{FS}.
-
-@item Control of subprocess environment
-Changes made in @code{gawk} to the array @code{ENVIRON} may be
-propagated to subprocesses run by @code{gawk}.
-
-@item Databases
-It may be possible to map a GDBM/NDBM/SDBM file into an @code{awk} array.
-
-@item Single-character fields
-The null string, @code{""}, as a field separator, will cause field
-splitting and the @code{split} function to separate individual characters.
-Thus, @code{split("abcd", a, "")} would yield @code{a[1] == "a"},
-@code{a[2] == "b"}, and so on.
-
-@item More @code{lint} warnings
-There are more things that could be checked for portability.
-
-@item @code{RECLEN} variable for fixed length records
-Along with @code{FIELDWIDTHS}, this would speed up the processing of
-fixed-length records.
-
-@item @code{RT} variable to hold the record terminator
-It is occasionally useful to have access to the actual string of
-characters that matched the @code{RS} variable. The @code{RT}
-variable would hold these characters.
-
-@item A @code{restart} keyword
-After modifying @code{$0}, @code{restart} would restart the pattern
-matching loop, without reading a new record from the input.
-
-@item A @samp{|&} redirection
-The @samp{|&} redirection, in place of @samp{|}, would open a two-way
-pipeline for communication with a sub-process (via @code{getline} and
-@code{print} and @code{printf}).
-
-@item @code{IGNORECASE} affecting all comparisons
-The effects of the @code{IGNORECASE} variable may be generalized to
-all string comparisons, and not just regular expression operations.
-
-@item A way to mix command line source code and library files
-There may be a new option that would make it possible to easily use library
-functions from a program entered on the command line.
-@c probably a @samp{-s} option...
-
-@item GNU-style long options
-We will add GNU-style long options
-to @code{gawk} for compatibility with other GNU programs.
-(For example, @samp{--field-separator=:} would be equivalent to
-@samp{-F:}.)@refill
-
-@c this is @emph{very} long term --- not worth including right now.
-@ignore
-@item The C Comma Operator
-We may add the C comma operator, which takes the form
-@code{@var{expr1},@var{expr2}}. The first expression is evaluated, and the
-result is thrown away. The value of the full expression is the value of
-@var{expr2}.@refill
-@end ignore
-@end table
-
-@node Improvements, , Future Extensions, Notes
-@appendixsec Suggestions for Improvements
-
-Here are some projects that would-be @code{gawk} hackers might like to take
-on. They vary in size from a few days to a few weeks of programming,
-depending on which one you choose and how fast a programmer you are. Please
-send any improvements you write to the maintainers at the GNU
-project.@refill
-
-@enumerate
-@item
-Compilation of @code{awk} programs: @code{gawk} uses a Bison (YACC-like)
-parser to convert the script given it into a syntax tree; the syntax
-tree is then executed by a simple recursive evaluator. This method incurs
-a lot of overhead, since the recursive evaluator performs many procedure
-calls to do even the simplest things.@refill
-
-It should be possible for @code{gawk} to convert the script's parse tree
-into a C program which the user would then compile, using the normal
-C compiler and a special @code{gawk} library to provide all the needed
-functions (regexps, fields, associative arrays, type coercion, and so
-on).@refill
-
-An easier possibility might be for an intermediate phase of @code{awk} to
-convert the parse tree into a linear byte code form like the one used
-in GNU Emacs Lisp. The recursive evaluator would then be replaced by
-a straight line byte code interpreter that would be intermediate in speed
-between running a compiled program and doing what @code{gawk} does
-now.@refill
-
-This may actually happen for the 3.0 version of @code{gawk}.
-
-@item
-An error message section has not been included in this version of the
-manual. Perhaps some nice beta testers will document some of the messages
-for the future.
-
-@item
-The programs in the test suite could use documenting in this manual.
-
-@item
-The programs and data files in the manual should be available in
-separate files to facilitate experimentation.
-
-@item
-See the @file{FUTURES} file for more ideas. Contact us if you would
-seriously like to tackle any of the items listed there.
-@end enumerate
-
-@node Glossary, Index, Notes, Top
-@appendix Glossary
-
-@table @asis
-@item Action
-A series of @code{awk} statements attached to a rule. If the rule's
-pattern matches an input record, the @code{awk} language executes the
-rule's action. Actions are always enclosed in curly braces.
-@xref{Actions, ,Overview of Actions}.@refill
-
-@item Amazing @code{awk} Assembler
-Henry Spencer at the University of Toronto wrote a retargetable assembler
-completely as @code{awk} scripts. It is thousands of lines long, including
-machine descriptions for several 8-bit microcomputers.
-@c It is distributed with @code{gawk} (as part of the test suite) and
-It is a good example of a
-program that would have been better written in another language.@refill
-
-@item @sc{ansi}
-The American National Standards Institute. This organization produces
-many standards, among them the standard for the C programming language.
-
-@item Assignment
-An @code{awk} expression that changes the value of some @code{awk}
-variable or data object. An object that you can assign to is called an
-@dfn{lvalue}. @xref{Assignment Ops, ,Assignment Expressions}.@refill
-
-@item @code{awk} Language
-The language in which @code{awk} programs are written.
-
-@item @code{awk} Program
-An @code{awk} program consists of a series of @dfn{patterns} and
-@dfn{actions}, collectively known as @dfn{rules}. For each input record
-given to the program, the program's rules are all processed in turn.
-@code{awk} programs may also contain function definitions.@refill
-
-@item @code{awk} Script
-Another name for an @code{awk} program.
-
-@item Built-in Function
-The @code{awk} language provides built-in functions that perform various
-numerical, time stamp related, and string computations. Examples are
-@code{sqrt} (for the square root of a number) and @code{substr} (for a
-substring of a string). @xref{Built-in, ,Built-in Functions}.@refill
-
-@item Built-in Variable
-@code{ARGC}, @code{ARGIND}, @code{ARGV}, @code{CONVFMT}, @code{ENVIRON},
-@code{ERRNO}, @code{FIELDWIDTHS}, @code{FILENAME}, @code{FNR}, @code{FS},
-@code{IGNORECASE}, @code{NF}, @code{NR}, @code{OFMT}, @code{OFS}, @code{ORS},
-@code{RLENGTH}, @code{RSTART}, @code{RS}, and @code{SUBSEP},
-are the variables that have special
-meaning to @code{awk}. Changing some of them affects @code{awk}'s running
-environment. @xref{Built-in Variables}.@refill
-
-@item Braces
-See ``Curly Braces.''
-
-@item C
-The system programming language that most GNU software is written in. The
-@code{awk} programming language has C-like syntax, and this manual
-points out similarities between @code{awk} and C when appropriate.@refill
-
-@item CHEM
-A preprocessor for @code{pic} that reads descriptions of molecules
-and produces @code{pic} input for drawing them. It was written by
-Brian Kernighan, and is available from @code{netlib@@research.att.com}.@refill
-
-@item Compound Statement
-A series of @code{awk} statements, enclosed in curly braces. Compound
-statements may be nested.
-@xref{Statements, ,Control Statements in Actions}.@refill
-
-@item Concatenation
-Concatenating two strings means sticking them together, one after another,
-giving a new string. For example, the string @samp{foo} concatenated with
-the string @samp{bar} gives the string @samp{foobar}.
-@xref{Concatenation, ,String Concatenation}.@refill
-
-@item Conditional Expression
-An expression using the @samp{?:} ternary operator, such as
-@code{@var{expr1} ? @var{expr2} : @var{expr3}}. The expression
-@var{expr1} is evaluated; if the result is true, the value of the whole
-expression is the value of @var{expr2}; otherwise the value is
-@var{expr3}. In either case, only one of @var{expr2} and @var{expr3}
-is evaluated. @xref{Conditional Exp, ,Conditional Expressions}.@refill
-
-@item Constant Regular Expression
-A constant regular expression is a regular expression written within
-slashes, such as @samp{/foo/}. This regular expression is chosen
-when you write the @code{awk} program, and cannot be changed during
-its execution. @xref{Regexp Usage, ,How to Use Regular Expressions}.
-
-@item Comparison Expression
-A relation that is either true or false, such as @code{(a < b)}.
-Comparison expressions are used in @code{if}, @code{while}, and @code{for}
-statements, and in patterns to select which input records to process.
-@xref{Comparison Ops, ,Comparison Expressions}.@refill
-
-@item Curly Braces
-The characters @samp{@{} and @samp{@}}. Curly braces are used in
-@code{awk} for delimiting actions, compound statements, and function
-bodies.@refill
-
-@item Data Objects
-These are numbers and strings of characters. Numbers are converted into
-strings and vice versa, as needed.
-@xref{Conversion, ,Conversion of Strings and Numbers}.@refill
-
-@item Dynamic Regular Expression
-A dynamic regular expression is a regular expression written as an
-ordinary expression. It could be a string constant, such as
-@code{"foo"}, but it may also be an expression whose value may vary.
-@xref{Regexp Usage, ,How to Use Regular Expressions}.
-
-@item Escape Sequences
-A special sequence of characters used for describing nonprinting
-characters, such as @samp{\n} for newline, or @samp{\033} for the ASCII
-ESC (escape) character. @xref{Constants, ,Constant Expressions}.
-
-@item Field
-When @code{awk} reads an input record, it splits the record into pieces
-separated by whitespace (or by a separator regexp which you can
-change by setting the built-in variable @code{FS}). Such pieces are
-called fields. If the pieces are of fixed length, you can use the built-in
-variable @code{FIELDWIDTHS} to describe their lengths.
-@xref{Records, ,How Input is Split into Records}.@refill
-
-@item Format
-Format strings are used to control the appearance of output in the
-@code{printf} statement. Also, data conversions from numbers to strings
-are controlled by the format string contained in the built-in variable
-@code{CONVFMT}. @xref{Control Letters, ,Format-Control Letters}.@refill
-
-@item Function
-A specialized group of statements often used to encapsulate general
-or program-specific tasks. @code{awk} has a number of built-in
-functions, and also allows you to define your own.
-@xref{Built-in, ,Built-in Functions}.
-Also, see @ref{User-defined, ,User-defined Functions}.@refill
-
-@item @code{gawk}
-The GNU implementation of @code{awk}.
-
-@item GNU
-``GNU's not Unix''. An on-going project of the Free Software Foundation
-to create a complete, freely distributable, @sc{posix}-compliant computing
-environment.
-
-@item Input Record
-A single chunk of data read in by @code{awk}. Usually, an @code{awk} input
-record consists of one line of text.
-@xref{Records, ,How Input is Split into Records}.@refill
-
-@item Keyword
-In the @code{awk} language, a keyword is a word that has special
-meaning. Keywords are reserved and may not be used as variable names.
-
-@code{awk}'s keywords are:
-@code{if},
-@code{else},
-@code{while},
-@code{do@dots{}while},
-@code{for},
-@code{for@dots{}in},
-@code{break},
-@code{continue},
-@code{delete},
-@code{next},
-@code{function},
-@code{func},
-and @code{exit}.@refill
-
-@item Lvalue
-An expression that can appear on the left side of an assignment
-operator. In most languages, lvalues can be variables or array
-elements. In @code{awk}, a field designator can also be used as an
-lvalue.@refill
-
-@item Number
-A numeric valued data object. The @code{gawk} implementation uses double
-precision floating point to represent numbers.@refill
-
-@item Pattern
-Patterns tell @code{awk} which input records are interesting to which
-rules.
-
-A pattern is an arbitrary conditional expression against which input is
-tested. If the condition is satisfied, the pattern is said to @dfn{match}
-the input record. A typical pattern might compare the input record against
-a regular expression. @xref{Patterns}.@refill
-
-@item @sc{posix}
-The name for a series of standards being developed by the @sc{ieee}
-that specify a Portable Operating System interface. The ``IX'' denotes
-the Unix heritage of these standards. The main standard of interest for
-@code{awk} users is P1003.2, the Command Language and Utilities standard.
-
-@item Range (of input lines)
-A sequence of consecutive lines from the input file. A pattern
-can specify ranges of input lines for @code{awk} to process, or it can
-specify single lines. @xref{Patterns}.@refill
-
-@item Recursion
-When a function calls itself, either directly or indirectly.
-If this isn't clear, refer to the entry for ``recursion.''
-
-@item Redirection
-Redirection means performing input from other than the standard input
-stream, or output to other than the standard output stream.
-
-You can redirect the output of the @code{print} and @code{printf} statements
-to a file or a system command, using the @samp{>}, @samp{>>}, and @samp{|}
-operators. You can redirect input to the @code{getline} statement using
-the @samp{<} and @samp{|} operators.
-@xref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}.@refill
-
-@item Regular Expression
-See ``regexp.''
-
-@item Regexp
-Short for @dfn{regular expression}. A regexp is a pattern that denotes a
-set of strings, possibly an infinite set. For example, the regexp
-@samp{R.*xp} matches any string starting with the letter @samp{R}
-and ending with the letters @samp{xp}. In @code{awk}, regexps are
-used in patterns and in conditional expressions. Regexps may contain
-escape sequences. @xref{Regexp, ,Regular Expressions as Patterns}.@refill
-
-@item Rule
-A segment of an @code{awk} program that specifies how to process single
-input records. A rule consists of a @dfn{pattern} and an @dfn{action}.
-@code{awk} reads an input record; then, for each rule, if the input record
-satisfies the rule's pattern, @code{awk} executes the rule's action.
-Otherwise, the rule does nothing for that input record.@refill
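-
-For example, in the following rule (the regexp and field number are
-arbitrary) the pattern is @samp{/widget/} and the action sums the
-second field:
-
-@example
-/widget/   @{ total += $2 @}
-@end example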
-
-@item Side Effect
-A side effect occurs when an expression has an effect aside from merely
-producing a value. Assignment expressions, increment expressions and
-function calls have side effects. @xref{Assignment Ops, ,Assignment Expressions}.
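-
-For instance, both statements in this fragment have side effects: the
-first increments a variable, and the call to @code{sub} modifies
-@code{$0}:
-
-@example
-@{ lines++; sub(/^[ \t]+/, "") @}
-@end example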
-
-@item Special File
-A file name interpreted internally by @code{gawk}, instead of being handed
-directly to the underlying operating system. For example, @file{/dev/stdin}.
-@xref{Special Files, ,Standard I/O Streams}.
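-
-For example, diagnostics can be sent to the standard error stream this
-way (note that the file name is a quoted string):
-
-@example
-@{ print "something is wrong" > "/dev/stderr" @}
-@end example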
-
-@item Stream Editor
-A program that reads records from an input stream and processes them one
-or more at a time. This is in contrast with batch programs, which may
-expect to read their input files in their entirety before starting to do
-anything, and with interactive programs, which require input from the
-user.@refill
-
-@item String
-A datum consisting of a sequence of characters, such as @samp{I am a
-string}. Constant strings are written with double-quotes in the
-@code{awk} language, and may contain escape sequences.
-@xref{Constants, ,Constant Expressions}.
-
-@item Whitespace
-A sequence of blank or tab characters occurring inside an input record or a
-string.@refill
-@end table
-
-@node Index, , Glossary, Top
-@unnumbered Index
-@printindex cp
-
-@summarycontents
-@contents
-@bye
-
-Unresolved Issues:
-------------------
-1. From: ntomczak@vm.ucs.ualberta.ca (Michal Jaegermann)
- Examples of usage tend to suggest that /../ and ".." delimiters
- can be used for regular expressions, even if definition is consistently
- using /../. I am not sure what the real rules are and in particular
- what of the following is a bug and what is a feature:
- # This program matches everything
- '"\(" { print }'
- # This one complains about mismatched parenthesis
- '$0 ~ "\(" { print }'
- # This one behaves in an expected manner
- '/\(/ { print }'
- You may also try to use "\(" as an argument to match() to see what
- will happen.
-
-2. From ADR.
-
- The posix (and original Unix!) notion of awk values as both number
- and string values needs to be put into the manual. This involves
- major and minor rewrites of most of the manual, but should help in
- clarifying many of the weirder points of the language.
-
-3. From ADR.
-
- The manual should be reorganized. Expressions should be introduced
- early, building up to regexps as expressions, and from there to their
- use as patterns and then in actions. Built-in vars should come earlier
- in the manual too. The 'expert info' sections marked with comments
- should get their own sections or subsections with nodes and titles.
- The manual should be gone over thoroughly for indexing.
-
-4. From ADR.
-
- Robert J. Chassell points out that awk programs should have some indication
- of how to use them. It would be useful to perhaps have a "programming
- style" section of the manual that would include this and other tips.
-
-5. From ADR in response to moraes@uunet.ca
- (This would make the beginnings of a good "puzzles" section...)
-
- Date: Mon, 2 Dec 91 10:08:05 EST
- From: gatech!cc!arnold (Arnold Robbins)
- To: cs.dal.ca!david, uunet.ca!moraes
- Subject: redirecting to /dev/stderr
- Cc: skeeve!arnold, boeing.com!brennan, research.att.com!bwk
-
- In 2.13.3 the following program no longer dumps core:
-
- BEGIN { print "hello" > /dev/stderr ; exit(1) }
-
- Instead, it creates a file named `0' with the word `hello' in it. AWK
- semantics strikes again. The meaning of the statement is
-
- print "hello" > (($0 ~ /dev/) stderr)
-
- /dev/ tests $0 for the pattern `dev'. This yields a 0. The variable stderr,
- having never been used, has a null string in it. The concatenation yields
- a string value of "0" which is used as the file name. Sigh.
-
- I think with some more time I can come up with a decent fix, but it will
- probably only print a diagnostic with -Wlint.
-
- Arnold
-
diff --git a/gnu/usr.bin/awk/eval.c b/gnu/usr.bin/awk/eval.c
deleted file mode 100644
index ccf4671..0000000
--- a/gnu/usr.bin/awk/eval.c
+++ /dev/null
@@ -1,1260 +0,0 @@
-/*
- * eval.c - gawk parse tree interpreter
- */
-
-/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include "awk.h"
-
-extern double pow P((double x, double y));
-extern double modf P((double x, double *yp));
-extern double fmod P((double x, double y));
-
-static int eval_condition P((NODE *tree));
-static NODE *op_assign P((NODE *tree));
-static NODE *func_call P((NODE *name, NODE *arg_list));
-static NODE *match_op P((NODE *tree));
-
-NODE *_t; /* used as a temporary in macros */
-#ifdef MSDOS
-double _msc51bug; /* to get around a bug in MSC 5.1 */
-#endif
-NODE *ret_node;
-int OFSlen;
-int ORSlen;
-int OFMTidx;
-int CONVFMTidx;
-
-/* Macros and variables to save and restore function and loop bindings */
-/*
- * the val variable allows return/continue/break-out-of-context to be
- * caught and diagnosed
- */
-#define PUSH_BINDING(stack, x, val) (memcpy ((char *)(stack), (char *)(x), sizeof (jmp_buf)), val++)
-#define RESTORE_BINDING(stack, x, val) (memcpy ((char *)(x), (char *)(stack), sizeof (jmp_buf)), val--)
-
-static jmp_buf loop_tag; /* always the current binding */
-static int loop_tag_valid = 0; /* nonzero when loop_tag valid */
-static int func_tag_valid = 0;
-static jmp_buf func_tag;
-extern int exiting, exit_val;
-
-/*
- * This table is used by the regexp routines to do case independent
- * matching. Basically, every ascii character maps to itself, except
- * uppercase letters map to lower case ones. This table has 256
- * entries, which may be overkill. Note also that if the system this
- * is compiled on doesn't use 7-bit ascii, casetable[] should not be
- * defined to the linker, so gawk should not load.
- *
- * Do NOT make this array static, it is used in several spots, not
- * just in this file.
- */
-#if 'a' == 97 /* it's ascii */
-char casetable[] = {
- '\000', '\001', '\002', '\003', '\004', '\005', '\006', '\007',
- '\010', '\011', '\012', '\013', '\014', '\015', '\016', '\017',
- '\020', '\021', '\022', '\023', '\024', '\025', '\026', '\027',
- '\030', '\031', '\032', '\033', '\034', '\035', '\036', '\037',
- /* ' ' '!' '"' '#' '$' '%' '&' ''' */
- '\040', '\041', '\042', '\043', '\044', '\045', '\046', '\047',
- /* '(' ')' '*' '+' ',' '-' '.' '/' */
- '\050', '\051', '\052', '\053', '\054', '\055', '\056', '\057',
- /* '0' '1' '2' '3' '4' '5' '6' '7' */
- '\060', '\061', '\062', '\063', '\064', '\065', '\066', '\067',
- /* '8' '9' ':' ';' '<' '=' '>' '?' */
- '\070', '\071', '\072', '\073', '\074', '\075', '\076', '\077',
- /* '@' 'A' 'B' 'C' 'D' 'E' 'F' 'G' */
- '\100', '\141', '\142', '\143', '\144', '\145', '\146', '\147',
- /* 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' */
- '\150', '\151', '\152', '\153', '\154', '\155', '\156', '\157',
- /* 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' */
- '\160', '\161', '\162', '\163', '\164', '\165', '\166', '\167',
- /* 'X' 'Y' 'Z' '[' '\' ']' '^' '_' */
- '\170', '\171', '\172', '\133', '\134', '\135', '\136', '\137',
- /* '`' 'a' 'b' 'c' 'd' 'e' 'f' 'g' */
- '\140', '\141', '\142', '\143', '\144', '\145', '\146', '\147',
- /* 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o' */
- '\150', '\151', '\152', '\153', '\154', '\155', '\156', '\157',
- /* 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' */
- '\160', '\161', '\162', '\163', '\164', '\165', '\166', '\167',
- /* 'x' 'y' 'z' '{' '|' '}' '~' */
- '\170', '\171', '\172', '\173', '\174', '\175', '\176', '\177',
- '\200', '\201', '\202', '\203', '\204', '\205', '\206', '\207',
- '\210', '\211', '\212', '\213', '\214', '\215', '\216', '\217',
- '\220', '\221', '\222', '\223', '\224', '\225', '\226', '\227',
- '\230', '\231', '\232', '\233', '\234', '\235', '\236', '\237',
- '\240', '\241', '\242', '\243', '\244', '\245', '\246', '\247',
- '\250', '\251', '\252', '\253', '\254', '\255', '\256', '\257',
- '\260', '\261', '\262', '\263', '\264', '\265', '\266', '\267',
- '\270', '\271', '\272', '\273', '\274', '\275', '\276', '\277',
- '\300', '\301', '\302', '\303', '\304', '\305', '\306', '\307',
- '\310', '\311', '\312', '\313', '\314', '\315', '\316', '\317',
- '\320', '\321', '\322', '\323', '\324', '\325', '\326', '\327',
- '\330', '\331', '\332', '\333', '\334', '\335', '\336', '\337',
- '\340', '\341', '\342', '\343', '\344', '\345', '\346', '\347',
- '\350', '\351', '\352', '\353', '\354', '\355', '\356', '\357',
- '\360', '\361', '\362', '\363', '\364', '\365', '\366', '\367',
- '\370', '\371', '\372', '\373', '\374', '\375', '\376', '\377',
-};
-#else
-#include "You lose. You will need a translation table for your character set."
-#endif
-
-/*
- * Tree is a bunch of rules to run. Returns zero if it hit an exit()
- * statement
- */
-int
-interpret(tree)
-register NODE *volatile tree;
-{
- jmp_buf volatile loop_tag_stack; /* shallow binding stack for loop_tag */
- static jmp_buf rule_tag; /* tag the rule currently being run, for NEXT
- * and EXIT statements. It is static because
- * there are no nested rules */
- register NODE *volatile t = NULL; /* temporary */
- NODE **volatile lhs; /* lhs == Left Hand Side for assigns, etc */
- NODE *volatile stable_tree;
- int volatile traverse = 1; /* True => loop thru tree (Node_rule_list) */
-
- /* avoid false source indications */
- source = NULL;
- sourceline = 0;
-
- if (tree == NULL)
- return 1;
- sourceline = tree->source_line;
- source = tree->source_file;
- switch (tree->type) {
- case Node_rule_node:
- traverse = 0; /* False => one for-loop iteration only */
- /* FALL THROUGH */
- case Node_rule_list:
- for (t = tree; t != NULL; t = t->rnode) {
- if (traverse)
- tree = t->lnode;
- sourceline = tree->source_line;
- source = tree->source_file;
- switch (setjmp(rule_tag)) {
- case 0: /* normal non-jump */
- /* test pattern, if any */
- if (tree->lnode == NULL ||
- eval_condition(tree->lnode))
- (void) interpret(tree->rnode);
- break;
- case TAG_CONTINUE: /* NEXT statement */
- return 1;
- case TAG_BREAK:
- return 0;
- default:
- cant_happen();
- }
- if (!traverse) /* case Node_rule_node */
- break; /* don't loop */
- }
- break;
-
- case Node_statement_list:
- for (t = tree; t != NULL; t = t->rnode)
- (void) interpret(t->lnode);
- break;
-
- case Node_K_if:
- if (eval_condition(tree->lnode)) {
- (void) interpret(tree->rnode->lnode);
- } else {
- (void) interpret(tree->rnode->rnode);
- }
- break;
-
- case Node_K_while:
- PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
-
- stable_tree = tree;
- while (eval_condition(stable_tree->lnode)) {
- switch (setjmp(loop_tag)) {
- case 0: /* normal non-jump */
- (void) interpret(stable_tree->rnode);
- break;
- case TAG_CONTINUE: /* continue statement */
- break;
- case TAG_BREAK: /* break statement */
- RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
- return 1;
- default:
- cant_happen();
- }
- }
- RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
- break;
-
- case Node_K_do:
- PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
- stable_tree = tree;
- do {
- switch (setjmp(loop_tag)) {
- case 0: /* normal non-jump */
- (void) interpret(stable_tree->rnode);
- break;
- case TAG_CONTINUE: /* continue statement */
- break;
- case TAG_BREAK: /* break statement */
- RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
- return 1;
- default:
- cant_happen();
- }
- } while (eval_condition(stable_tree->lnode));
- RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
- break;
-
- case Node_K_for:
- PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
- (void) interpret(tree->forloop->init);
- stable_tree = tree;
- while (eval_condition(stable_tree->forloop->cond)) {
- switch (setjmp(loop_tag)) {
- case 0: /* normal non-jump */
- (void) interpret(stable_tree->lnode);
- /* fall through */
- case TAG_CONTINUE: /* continue statement */
- (void) interpret(stable_tree->forloop->incr);
- break;
- case TAG_BREAK: /* break statement */
- RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
- return 1;
- default:
- cant_happen();
- }
- }
- RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
- break;
-
- case Node_K_arrayfor:
- {
- volatile struct search l; /* For array_for */
- Func_ptr after_assign = NULL;
-
-#define hakvar forloop->init
-#define arrvar forloop->incr
- PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
- lhs = get_lhs(tree->hakvar, &after_assign);
- t = tree->arrvar;
- if (t->type == Node_param_list)
- t = stack_ptr[t->param_cnt];
- stable_tree = tree;
- for (assoc_scan(t, (struct search *)&l);
- l.retval;
- assoc_next((struct search *)&l)) {
- unref(*((NODE **) lhs));
- *lhs = dupnode(l.retval);
- if (after_assign)
- (*after_assign)();
- switch (setjmp(loop_tag)) {
- case 0:
- (void) interpret(stable_tree->lnode);
- case TAG_CONTINUE:
- break;
-
- case TAG_BREAK:
- RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
- return 1;
- default:
- cant_happen();
- }
- }
- RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
- break;
- }
-
- case Node_K_break:
- if (loop_tag_valid == 0)
- fatal("unexpected break");
- longjmp(loop_tag, TAG_BREAK);
- break;
-
- case Node_K_continue:
- if (loop_tag_valid == 0) {
- /*
- * AT&T nawk treats continue outside of loops like
- * next. Allow it if not posix, and complain if
- * lint.
- */
- static int warned = 0;
-
- if (do_lint && ! warned) {
- warning("use of `continue' outside of loop is not portable");
- warned = 1;
- }
- if (do_posix)
- fatal("use of `continue' outside of loop is not allowed");
- longjmp(rule_tag, TAG_CONTINUE);
- } else
- longjmp(loop_tag, TAG_CONTINUE);
- break;
-
- case Node_K_print:
- do_print(tree);
- break;
-
- case Node_K_printf:
- do_printf(tree);
- break;
-
- case Node_K_delete:
- if (tree->rnode != NULL)
- do_delete(tree->lnode, tree->rnode);
- else
- assoc_clear(tree->lnode);
- break;
-
- case Node_K_next:
- longjmp(rule_tag, TAG_CONTINUE);
- break;
-
- case Node_K_nextfile:
- do_nextfile();
- break;
-
- case Node_K_exit:
- /*
- * In A,K,&W, p. 49, it says that an exit statement "...
- * causes the program to behave as if the end of input had
- * occurred; no more input is read, and the END actions, if
- * any are executed." This implies that the rest of the rules
- * are not done. So we immediately break out of the main loop.
- */
- exiting = 1;
- if (tree) {
- t = tree_eval(tree->lnode);
- exit_val = (int) force_number(t);
- }
- free_temp(t);
- longjmp(rule_tag, TAG_BREAK);
- break;
-
- case Node_K_return:
- t = tree_eval(tree->lnode);
- ret_node = dupnode(t);
- free_temp(t);
- longjmp(func_tag, TAG_RETURN);
- break;
-
- default:
- /*
- * Appears to be an expression statement. Throw away the
- * value.
- */
- if (do_lint && tree->type == Node_var)
- warning("statement has no effect");
- t = tree_eval(tree);
- free_temp(t);
- break;
- }
- return 1;
-}
-
-/* evaluate a subtree */
-
-NODE *
-r_tree_eval(tree)
-register NODE *tree;
-{
- register NODE *r, *t1, *t2; /* return value & temporary subtrees */
- register NODE **lhs;
- register int di;
- AWKNUM x, x1, x2;
- long lx;
-#ifdef _CRAY
- long lx2;
-#endif
-
-#ifdef DEBUG
- if (tree == NULL)
- return Nnull_string;
- if (tree->type == Node_val) {
- if ((char)tree->stref <= 0) cant_happen();
- return tree;
- }
- if (tree->type == Node_var) {
- if ((char)tree->var_value->stref <= 0) cant_happen();
- return tree->var_value;
- }
-#endif
-
- if (tree->type == Node_param_list) {
- tree = stack_ptr[tree->param_cnt];
- if (tree == NULL)
- return Nnull_string;
- }
-
- switch (tree->type) {
- case Node_var:
- return tree->var_value;
-
- case Node_and:
- return tmp_number((AWKNUM) (eval_condition(tree->lnode)
- && eval_condition(tree->rnode)));
-
- case Node_or:
- return tmp_number((AWKNUM) (eval_condition(tree->lnode)
- || eval_condition(tree->rnode)));
-
- case Node_not:
- return tmp_number((AWKNUM) ! eval_condition(tree->lnode));
-
- /* Builtins */
- case Node_builtin:
- return ((*tree->proc) (tree->subnode));
-
- case Node_K_getline:
- return (do_getline(tree));
-
- case Node_in_array:
- return tmp_number((AWKNUM) in_array(tree->lnode, tree->rnode));
-
- case Node_func_call:
- return func_call(tree->rnode, tree->lnode);
-
- /* unary operations */
- case Node_NR:
- case Node_FNR:
- case Node_NF:
- case Node_FIELDWIDTHS:
- case Node_FS:
- case Node_RS:
- case Node_field_spec:
- case Node_subscript:
- case Node_IGNORECASE:
- case Node_OFS:
- case Node_ORS:
- case Node_OFMT:
- case Node_CONVFMT:
- lhs = get_lhs(tree, (Func_ptr *)0);
- return *lhs;
-
- case Node_var_array:
- fatal("attempt to use array `%s' in a scalar context", tree->vname);
-
- case Node_unary_minus:
- t1 = tree_eval(tree->subnode);
- x = -force_number(t1);
- free_temp(t1);
- return tmp_number(x);
-
- case Node_cond_exp:
- if (eval_condition(tree->lnode))
- return tree_eval(tree->rnode->lnode);
- return tree_eval(tree->rnode->rnode);
-
- case Node_match:
- case Node_nomatch:
- case Node_regex:
- return match_op(tree);
-
- case Node_func:
- fatal("function `%s' called with space between name and (,\n%s",
- tree->lnode->param,
- "or used in other expression context");
-
- /* assignments */
- case Node_assign:
- {
- Func_ptr after_assign = NULL;
-
- r = tree_eval(tree->rnode);
- lhs = get_lhs(tree->lnode, &after_assign);
- if (r != *lhs) {
- NODE *save;
-
- save = *lhs;
- *lhs = dupnode(r);
- unref(save);
- }
- free_temp(r);
- if (after_assign)
- (*after_assign)();
- return *lhs;
- }
-
- case Node_concat:
- {
-#define STACKSIZE 10
- NODE *treelist[STACKSIZE+1];
- NODE *strlist[STACKSIZE+1];
- register NODE **treep;
- register NODE **strp;
- register size_t len;
- char *str;
- register char *dest;
-
- /*
- * This is an efficiency hack for multiple adjacent string
- * concatenations, to avoid recursion and string copies.
- *
- * Node_concat trees grow downward to the left, so
- * descend to lowest (first) node, accumulating nodes
- * to evaluate to strings as we go.
- */
- treep = treelist;
- while (tree->type == Node_concat) {
- *treep++ = tree->rnode;
- tree = tree->lnode;
- if (treep == &treelist[STACKSIZE])
- break;
- }
- *treep = tree;
- /*
- * Now, evaluate to strings in LIFO order, accumulating
- * the string length, so we can do a single malloc at the
- * end.
- */
- strp = strlist;
- len = 0;
- while (treep >= treelist) {
- *strp = force_string(tree_eval(*treep--));
- len += (*strp)->stlen;
- strp++;
- }
- *strp = NULL;
- emalloc(str, char *, len+2, "tree_eval");
- str[len] = str[len+1] = '\0'; /* for good measure */
- dest = str;
- strp = strlist;
- while (*strp) {
- memcpy(dest, (*strp)->stptr, (*strp)->stlen);
- dest += (*strp)->stlen;
- free_temp(*strp);
- strp++;
- }
- r = make_str_node(str, len, ALREADY_MALLOCED);
- r->flags |= TEMP;
- }
- return r;
-
- /* other assignment types are easier because they are numeric */
- case Node_preincrement:
- case Node_predecrement:
- case Node_postincrement:
- case Node_postdecrement:
- case Node_assign_exp:
- case Node_assign_times:
- case Node_assign_quotient:
- case Node_assign_mod:
- case Node_assign_plus:
- case Node_assign_minus:
- return op_assign(tree);
- default:
- break; /* handled below */
- }
-
- /* evaluate subtrees in order to do binary operation, then keep going */
- t1 = tree_eval(tree->lnode);
- t2 = tree_eval(tree->rnode);
-
- switch (tree->type) {
- case Node_geq:
- case Node_leq:
- case Node_greater:
- case Node_less:
- case Node_notequal:
- case Node_equal:
- di = cmp_nodes(t1, t2);
- free_temp(t1);
- free_temp(t2);
- switch (tree->type) {
- case Node_equal:
- return tmp_number((AWKNUM) (di == 0));
- case Node_notequal:
- return tmp_number((AWKNUM) (di != 0));
- case Node_less:
- return tmp_number((AWKNUM) (di < 0));
- case Node_greater:
- return tmp_number((AWKNUM) (di > 0));
- case Node_leq:
- return tmp_number((AWKNUM) (di <= 0));
- case Node_geq:
- return tmp_number((AWKNUM) (di >= 0));
- default:
- cant_happen();
- }
- break;
- default:
- break; /* handled below */
- }
-
- x1 = force_number(t1);
- free_temp(t1);
- x2 = force_number(t2);
- free_temp(t2);
- switch (tree->type) {
- case Node_exp:
- if ((lx = x2) == x2 && lx >= 0) { /* integer exponent */
- if (lx == 0)
- x = 1;
- else if (lx == 1)
- x = x1;
- else {
- /* doing it this way should be more precise */
- for (x = x1; --lx; )
- x *= x1;
- }
- } else
- x = pow((double) x1, (double) x2);
- return tmp_number(x);
-
- case Node_times:
- return tmp_number(x1 * x2);
-
- case Node_quotient:
- if (x2 == 0)
- fatal("division by zero attempted");
-#ifdef _CRAY
- /*
- * special case for integer division, put in for Cray
- */
- lx2 = x2;
- if (lx2 == 0)
- return tmp_number(x1 / x2);
- lx = (long) x1 / lx2;
- if (lx * x2 == x1)
- return tmp_number((AWKNUM) lx);
- else
-#endif
- return tmp_number(x1 / x2);
-
- case Node_mod:
- if (x2 == 0)
- fatal("division by zero attempted in mod");
-#ifndef FMOD_MISSING
- return tmp_number(fmod (x1, x2));
-#else
- (void) modf(x1 / x2, &x);
- return tmp_number(x1 - x * x2);
-#endif
-
- case Node_plus:
- return tmp_number(x1 + x2);
-
- case Node_minus:
- return tmp_number(x1 - x2);
-
- case Node_var_array:
- fatal("attempt to use array `%s' in a scalar context", tree->vname);
-
- default:
- fatal("illegal type (%d) in tree_eval", tree->type);
- }
- return 0;
-}
-
-/* Is TREE true or false? Returns 0==false, non-zero==true */
-static int
-eval_condition(tree)
-register NODE *tree;
-{
- register NODE *t1;
- register int ret;
-
- if (tree == NULL) /* Null trees are the easiest kinds */
- return 1;
- if (tree->type == Node_line_range) {
- /*
- * Node_line_range is kind of like Node_match, EXCEPT: the
- * lnode field (more properly, the condpair field) is a node
- * of a Node_cond_pair; whether we evaluate the lnode of that
- * node or the rnode depends on the triggered word. More
- * precisely: if we are not yet triggered, we tree_eval the
- * lnode; if that returns true, we set the triggered word.
- * If we are triggered (not ELSE IF, note), we tree_eval the
- * rnode, clear triggered if it succeeds, and perform our
- * action (regardless of success or failure). We want to be
- * able to begin and end on a single input record, so this
- * isn't an ELSE IF, as noted above.
- */
- if (!tree->triggered)
- if (!eval_condition(tree->condpair->lnode))
- return 0;
- else
- tree->triggered = 1;
- /* Else we are triggered */
- if (eval_condition(tree->condpair->rnode))
- tree->triggered = 0;
- return 1;
- }
-
- /*
-	 * Could just be J. Random expression, in which case null and 0 are
-	 * false, and anything else is true.
- */
-
- t1 = tree_eval(tree);
- if (t1->flags & MAYBE_NUM)
- (void) force_number(t1);
- if (t1->flags & NUMBER)
- ret = t1->numbr != 0.0;
- else
- ret = t1->stlen != 0;
- free_temp(t1);
- return ret;
-}
-
-/*
- * compare two nodes, returning negative, 0, positive
- */
-int
-cmp_nodes(t1, t2)
-register NODE *t1, *t2;
-{
- register int ret;
- register size_t len1, len2;
-
- if (t1 == t2)
- return 0;
- if (t1->flags & MAYBE_NUM)
- (void) force_number(t1);
- if (t2->flags & MAYBE_NUM)
- (void) force_number(t2);
- if ((t1->flags & NUMBER) && (t2->flags & NUMBER)) {
- if (t1->numbr == t2->numbr) return 0;
- else if (t1->numbr - t2->numbr < 0) return -1;
- else return 1;
- }
- (void) force_string(t1);
- (void) force_string(t2);
- len1 = t1->stlen;
- len2 = t2->stlen;
- if (len1 == 0 || len2 == 0)
- return len1 - len2;
- ret = memcmp(t1->stptr, t2->stptr, len1 <= len2 ? len1 : len2);
- return ret == 0 ? len1-len2 : ret;
-}
-
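-/* handle the ++ and -- operators and the op= assignment operators */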
-static NODE *
-op_assign(tree)
-register NODE *tree;
-{
- AWKNUM rval, lval;
- NODE **lhs;
- AWKNUM t1, t2;
- long ltemp;
- NODE *tmp;
- Func_ptr after_assign = NULL;
-
- lhs = get_lhs(tree->lnode, &after_assign);
- lval = force_number(*lhs);
-
- /*
- * Can't unref *lhs until we know the type; doing so
- * too early breaks x += x sorts of things.
- */
- switch(tree->type) {
- case Node_preincrement:
- case Node_predecrement:
- unref(*lhs);
- *lhs = make_number(lval +
- (tree->type == Node_preincrement ? 1.0 : -1.0));
- if (after_assign)
- (*after_assign)();
- return *lhs;
-
- case Node_postincrement:
- case Node_postdecrement:
- unref(*lhs);
- *lhs = make_number(lval +
- (tree->type == Node_postincrement ? 1.0 : -1.0));
- if (after_assign)
- (*after_assign)();
- return tmp_number(lval);
- default:
- break; /* handled below */
- }
-
- tmp = tree_eval(tree->rnode);
- rval = force_number(tmp);
- free_temp(tmp);
- unref(*lhs);
- switch(tree->type) {
- case Node_assign_exp:
- if ((ltemp = rval) == rval) { /* integer exponent */
- if (ltemp == 0)
- *lhs = make_number((AWKNUM) 1);
- else if (ltemp == 1)
- *lhs = make_number(lval);
- else {
- /* doing it this way should be more precise */
- for (t1 = t2 = lval; --ltemp; )
- t1 *= t2;
- *lhs = make_number(t1);
- }
- } else
- *lhs = make_number((AWKNUM) pow((double) lval, (double) rval));
- break;
-
- case Node_assign_times:
- *lhs = make_number(lval * rval);
- break;
-
- case Node_assign_quotient:
- if (rval == (AWKNUM) 0)
- fatal("division by zero attempted in /=");
-#ifdef _CRAY
- /*
- * special case for integer division, put in for Cray
- */
- ltemp = rval;
- if (ltemp == 0) {
- *lhs = make_number(lval / rval);
- break;
- }
- ltemp = (long) lval / ltemp;
-		if (ltemp * rval == lval)
- *lhs = make_number((AWKNUM) ltemp);
- else
-#endif
- *lhs = make_number(lval / rval);
- break;
-
- case Node_assign_mod:
- if (rval == (AWKNUM) 0)
- fatal("division by zero attempted in %=");
-#ifndef FMOD_MISSING
- *lhs = make_number(fmod(lval, rval));
-#else
- (void) modf(lval / rval, &t1);
- t2 = lval - rval * t1;
- *lhs = make_number(t2);
-#endif
- break;
-
- case Node_assign_plus:
- *lhs = make_number(lval + rval);
- break;
-
- case Node_assign_minus:
- *lhs = make_number(lval - rval);
- break;
- default:
- cant_happen();
- }
- if (after_assign)
- (*after_assign)();
- return *lhs;
-}
-
-NODE **stack_ptr;
-
-static NODE *
-func_call(name, arg_list)
-NODE *name; /* name is a Node_val giving function name */
-NODE *arg_list; /* Node_expression_list of calling args. */
-{
- register NODE *arg, *argp, *r;
- NODE *n, *f;
- jmp_buf volatile func_tag_stack;
- jmp_buf volatile loop_tag_stack;
- int volatile save_loop_tag_valid = 0;
- NODE **volatile save_stack, *save_ret_node;
- NODE **volatile local_stack = NULL, **sp;
- int count;
- extern NODE *ret_node;
-
- /*
- * retrieve function definition node
- */
- f = lookup(name->stptr);
- if (!f || f->type != Node_func)
- fatal("function `%s' not defined", name->stptr);
-#ifdef FUNC_TRACE
- fprintf(stderr, "function %s called\n", name->stptr);
-#endif
- count = f->lnode->param_cnt;
- if (count)
- emalloc(local_stack, NODE **, count*sizeof(NODE *), "func_call");
- sp = local_stack;
-
- /*
- * for each calling arg. add NODE * on stack
- */
- for (argp = arg_list; count && argp != NULL; argp = argp->rnode) {
- arg = argp->lnode;
- getnode(r);
- r->type = Node_var;
- /*
- * call by reference for arrays; see below also
- */
- if (arg->type == Node_param_list)
- arg = stack_ptr[arg->param_cnt];
- if (arg->type == Node_var_array)
- *r = *arg;
- else {
- n = tree_eval(arg);
- r->lnode = dupnode(n);
- r->rnode = (NODE *) NULL;
- free_temp(n);
- }
- *sp++ = r;
- count--;
- }
- if (argp != NULL) /* left over calling args. */
- warning(
- "function `%s' called with more arguments than declared",
- name->stptr);
- /*
- * add remaining params. on stack with null value
- */
- while (count-- > 0) {
- getnode(r);
- r->type = Node_var;
- r->lnode = Nnull_string;
- r->rnode = (NODE *) NULL;
- *sp++ = r;
- }
-
- /*
- * Execute function body, saving context, as a return statement
- * will longjmp back here.
- *
- * Have to save and restore the loop_tag stuff so that a return
- * inside a loop in a function body doesn't scrog any loops going
- * on in the main program. We save the necessary info in variables
- * local to this function so that function nesting works OK.
- * We also only bother to save the loop stuff if we're in a loop
- * when the function is called.
- */
- if (loop_tag_valid) {
- int junk = 0;
-
- save_loop_tag_valid = (volatile int) loop_tag_valid;
- PUSH_BINDING(loop_tag_stack, loop_tag, junk);
- loop_tag_valid = 0;
- }
- save_stack = stack_ptr;
- stack_ptr = local_stack;
- PUSH_BINDING(func_tag_stack, func_tag, func_tag_valid);
- save_ret_node = ret_node;
- ret_node = Nnull_string; /* default return value */
- if (setjmp(func_tag) == 0)
- (void) interpret(f->rnode);
-
- r = ret_node;
- ret_node = (NODE *) save_ret_node;
- RESTORE_BINDING(func_tag_stack, func_tag, func_tag_valid);
- stack_ptr = (NODE **) save_stack;
-
- /*
- * here, we pop each parameter and check whether
- * it was an array. If so, and if the arg. passed in was
- * a simple variable, then the value should be copied back.
- * This achieves "call-by-reference" for arrays.
- */
- sp = local_stack;
- count = f->lnode->param_cnt;
- for (argp = arg_list; count > 0 && argp != NULL; argp = argp->rnode) {
- arg = argp->lnode;
- if (arg->type == Node_param_list)
- arg = stack_ptr[arg->param_cnt];
- n = *sp++;
- if ((arg->type == Node_var || arg->type == Node_var_array)
- && n->type == Node_var_array) {
- /* should we free arg->var_value ? */
- arg->var_array = n->var_array;
- arg->type = Node_var_array;
- arg->array_size = n->array_size;
- arg->table_size = n->table_size;
- arg->flags = n->flags;
- }
- /* n->lnode overlays the array size, don't unref it if array */
- if (n->type != Node_var_array)
- unref(n->lnode);
- freenode(n);
- count--;
- }
- while (count-- > 0) {
- n = *sp++;
- /* if n is an (local) array, all the elements should be freed */
- if (n->type == Node_var_array)
- assoc_clear(n);
- unref(n->lnode);
- freenode(n);
- }
- if (local_stack)
- free((char *) local_stack);
-
- /* Restore the loop_tag stuff if necessary. */
- if (save_loop_tag_valid) {
- int junk = 0;
-
- loop_tag_valid = (int) save_loop_tag_valid;
- RESTORE_BINDING(loop_tag_stack, loop_tag, junk);
- }
-
- if (!(r->flags & PERM))
- r->flags |= TEMP;
- return r;
-}
-
-/*
- * This returns a POINTER to a node pointer. get_lhs(ptr) is the current
- * value of the var, or where to store the var's new value
- */
-
-NODE **
-r_get_lhs(ptr, assign)
-register NODE *ptr;
-Func_ptr *assign;
-{
- register NODE **aptr = NULL;
- register NODE *n;
-
- switch (ptr->type) {
- case Node_var_array:
- fatal("attempt to use array `%s' in a scalar context", ptr->vname);
- case Node_var:
- aptr = &(ptr->var_value);
-#ifdef DEBUG
- if ((char)ptr->var_value->stref <= 0)
- cant_happen();
-#endif
- break;
-
- case Node_FIELDWIDTHS:
- aptr = &(FIELDWIDTHS_node->var_value);
- if (assign)
- *assign = set_FIELDWIDTHS;
- break;
-
- case Node_RS:
- aptr = &(RS_node->var_value);
- if (assign)
- *assign = set_RS;
- break;
-
- case Node_FS:
- aptr = &(FS_node->var_value);
- if (assign)
- *assign = set_FS;
- break;
-
- case Node_FNR:
- unref(FNR_node->var_value);
- FNR_node->var_value = make_number((AWKNUM) FNR);
- aptr = &(FNR_node->var_value);
- if (assign)
- *assign = set_FNR;
- break;
-
- case Node_NR:
- unref(NR_node->var_value);
- NR_node->var_value = make_number((AWKNUM) NR);
- aptr = &(NR_node->var_value);
- if (assign)
- *assign = set_NR;
- break;
-
- case Node_NF:
- if (NF == -1)
- (void) get_field(HUGE-1, assign); /* parse record */
- unref(NF_node->var_value);
- NF_node->var_value = make_number((AWKNUM) NF);
- aptr = &(NF_node->var_value);
- if (assign)
- *assign = set_NF;
- break;
-
- case Node_IGNORECASE:
- unref(IGNORECASE_node->var_value);
- IGNORECASE_node->var_value = make_number((AWKNUM) IGNORECASE);
- aptr = &(IGNORECASE_node->var_value);
- if (assign)
- *assign = set_IGNORECASE;
- break;
-
- case Node_OFMT:
- aptr = &(OFMT_node->var_value);
- if (assign)
- *assign = set_OFMT;
- break;
-
- case Node_CONVFMT:
- aptr = &(CONVFMT_node->var_value);
- if (assign)
- *assign = set_CONVFMT;
- break;
-
- case Node_ORS:
- aptr = &(ORS_node->var_value);
- if (assign)
- *assign = set_ORS;
- break;
-
- case Node_OFS:
- aptr = &(OFS_node->var_value);
- if (assign)
- *assign = set_OFS;
- break;
-
- case Node_param_list:
- aptr = &(stack_ptr[ptr->param_cnt]->var_value);
- break;
-
- case Node_field_spec:
- {
- int field_num;
-
- n = tree_eval(ptr->lnode);
- field_num = (int) force_number(n);
- free_temp(n);
- if (field_num < 0)
- fatal("attempt to access field %d", field_num);
- if (field_num == 0 && field0_valid) { /* short circuit */
- aptr = &fields_arr[0];
- if (assign)
- *assign = reset_record;
- break;
- }
- aptr = get_field(field_num, assign);
- break;
- }
- case Node_subscript:
- n = ptr->lnode;
- if (n->type == Node_param_list)
- n = stack_ptr[n->param_cnt];
- aptr = assoc_lookup(n, concat_exp(ptr->rnode));
- break;
-
- case Node_func:
- fatal ("`%s' is a function, assignment is not allowed",
- ptr->lnode->param);
- default:
- cant_happen();
- }
- return aptr;
-}
-
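-/* handle ~, !~, and a bare /regexp/, which is matched against $0 */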
-static NODE *
-match_op(tree)
-register NODE *tree;
-{
- register NODE *t1;
- register Regexp *rp;
- int i;
- int match = 1;
-
- if (tree->type == Node_nomatch)
- match = 0;
- if (tree->type == Node_regex)
- t1 = *get_field(0, (Func_ptr *) 0);
- else {
- t1 = force_string(tree_eval(tree->lnode));
- tree = tree->rnode;
- }
- rp = re_update(tree);
- i = research(rp, t1->stptr, 0, t1->stlen, 0);
- i = (i == -1) ^ (match == 1);
- free_temp(t1);
- return tmp_number((AWKNUM) i);
-}
-
-void
-set_IGNORECASE()
-{
- static int warned = 0;
-
- if ((do_lint || do_unix) && ! warned) {
- warned = 1;
- warning("IGNORECASE not supported in compatibility mode");
- }
- IGNORECASE = (force_number(IGNORECASE_node->var_value) != 0.0);
- set_FS();
-}
-
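-/* refresh the cached output field separator when OFS is assigned */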
-void
-set_OFS()
-{
- OFS = force_string(OFS_node->var_value)->stptr;
- OFSlen = OFS_node->var_value->stlen;
- OFS[OFSlen] = '\0';
-}
-
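-/* refresh the cached output record separator when ORS is assigned */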
-void
-set_ORS()
-{
- ORS = force_string(ORS_node->var_value)->stptr;
- ORSlen = ORS_node->var_value->stlen;
- ORS[ORSlen] = '\0';
-}
-
-NODE **fmt_list = NULL;
-static int fmt_ok P((NODE *n));
-static int fmt_index P((NODE *n));
-
-static int
-fmt_ok(n)
-NODE *n;
-{
- /* to be done later */
- return 1;
-}
-
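-/* look up a conversion format in fmt_list, adding it if new; return its index */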
-static int
-fmt_index(n)
-NODE *n;
-{
- register int ix = 0;
- static int fmt_num = 4;
- static int fmt_hiwater = 0;
-
- if (fmt_list == NULL)
- emalloc(fmt_list, NODE **, fmt_num*sizeof(*fmt_list), "fmt_index");
- (void) force_string(n);
- while (ix < fmt_hiwater) {
- if (cmp_nodes(fmt_list[ix], n) == 0)
- return ix;
- ix++;
- }
- /* not found */
- n->stptr[n->stlen] = '\0';
- if (!fmt_ok(n))
- warning("bad FMT specification");
- if (fmt_hiwater >= fmt_num) {
- fmt_num *= 2;
-		erealloc(fmt_list, NODE **, fmt_num*sizeof(*fmt_list), "fmt_index");
- }
- fmt_list[fmt_hiwater] = dupnode(n);
- return fmt_hiwater++;
-}
-
-void
-set_OFMT()
-{
- OFMTidx = fmt_index(OFMT_node->var_value);
- OFMT = fmt_list[OFMTidx]->stptr;
-}
-
-void
-set_CONVFMT()
-{
- CONVFMTidx = fmt_index(CONVFMT_node->var_value);
- CONVFMT = fmt_list[CONVFMTidx]->stptr;
-}
diff --git a/gnu/usr.bin/awk/field.c b/gnu/usr.bin/awk/field.c
deleted file mode 100644
index b1a709e..0000000
--- a/gnu/usr.bin/awk/field.c
+++ /dev/null
@@ -1,678 +0,0 @@
-/*
- * field.c - routines for dealing with fields and record parsing
- */
-
-/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include "awk.h"
-
-typedef void (* Setfunc) P((int, char*, int, NODE *));
-
-static long (*parse_field) P((int, char **, int, NODE *,
- Regexp *, Setfunc, NODE *));
-static void rebuild_record P((void));
-static long re_parse_field P((int, char **, int, NODE *,
- Regexp *, Setfunc, NODE *));
-static long def_parse_field P((int, char **, int, NODE *,
- Regexp *, Setfunc, NODE *));
-static long sc_parse_field P((int, char **, int, NODE *,
- Regexp *, Setfunc, NODE *));
-static long fw_parse_field P((int, char **, int, NODE *,
- Regexp *, Setfunc, NODE *));
-static void set_element P((int, char *, int, NODE *));
-static void grow_fields_arr P((long num));
-static void set_field P((int num, char *str, int len, NODE *dummy));
-
-
-static Regexp *FS_regexp = NULL;
-static char *parse_extent; /* marks where to restart parse of record */
-static long parse_high_water=0; /* field number that we have parsed so far */
-static long nf_high_water = 0; /* size of fields_arr */
-static int resave_fs;
-static NODE *save_FS; /* save current value of FS when line is read,
- * to be used in deferred parsing
- */
-
-NODE **fields_arr; /* array of pointers to the field nodes */
-int field0_valid; /* $(>0) has not been changed yet */
-int default_FS;
-static NODE **nodes; /* permanent repository of field nodes */
-static int *FIELDWIDTHS = NULL;
-
-void
-init_fields()
-{
- NODE *n;
-
- emalloc(fields_arr, NODE **, sizeof(NODE *), "init_fields");
- emalloc(nodes, NODE **, sizeof(NODE *), "init_fields");
- getnode(n);
- *n = *Nnull_string;
- fields_arr[0] = nodes[0] = n;
- parse_extent = fields_arr[0]->stptr;
- save_FS = dupnode(FS_node->var_value);
- field0_valid = 1;
-}
-
-
-static void
-grow_fields_arr(num)
-long num;
-{
- register int t;
- register NODE *n;
-
- erealloc(fields_arr, NODE **, (num + 1) * sizeof(NODE *), "set_field");
- erealloc(nodes, NODE **, (num+1) * sizeof(NODE *), "set_field");
- for (t = nf_high_water+1; t <= num; t++) {
- getnode(n);
- *n = *Nnull_string;
- fields_arr[t] = nodes[t] = n;
- }
- nf_high_water = num;
-}
-
-/*ARGSUSED*/
-static void
-set_field(num, str, len, dummy)
-int num;
-char *str;
-int len;
-NODE *dummy; /* not used -- just to make interface same as set_element */
-{
- register NODE *n;
-
- if (num > nf_high_water)
- grow_fields_arr(num);
- n = nodes[num];
- n->stptr = str;
- n->stlen = len;
- n->flags = (PERM|STR|STRING|MAYBE_NUM);
- fields_arr[num] = n;
-}
-
-/* Someone assigned a value to $(something). Fix up $0 to be right */
-static void
-rebuild_record()
-{
- register size_t tlen;
- register NODE *tmp;
- NODE *ofs;
- char *ops;
- register char *cops;
- register NODE **ptr;
- register size_t ofslen;
-
- tlen = 0;
- ofs = force_string(OFS_node->var_value);
- ofslen = ofs->stlen;
- ptr = &fields_arr[NF];
- while (ptr > &fields_arr[0]) {
- tmp = force_string(*ptr);
- tlen += tmp->stlen;
- ptr--;
- }
- tlen += (NF - 1) * ofslen;
- if ((long)tlen < 0)
- tlen = 0;
- emalloc(ops, char *, tlen + 2, "rebuild_record");
- cops = ops;
- ops[0] = '\0';
- for (ptr = &fields_arr[1]; ptr <= &fields_arr[NF]; ptr++) {
- tmp = *ptr;
- if (tmp->stlen == 1)
- *cops++ = tmp->stptr[0];
- else if (tmp->stlen != 0) {
- memcpy(cops, tmp->stptr, tmp->stlen);
- cops += tmp->stlen;
- }
- if (ptr != &fields_arr[NF]) {
- if (ofslen == 1)
- *cops++ = ofs->stptr[0];
- else if (ofslen != 0) {
- memcpy(cops, ofs->stptr, ofslen);
- cops += ofslen;
- }
- }
- }
- tmp = make_str_node(ops, tlen, ALREADY_MALLOCED);
- unref(fields_arr[0]);
- fields_arr[0] = tmp;
- field0_valid = 1;
-}
-
-/*
- * setup $0, but defer parsing rest of line until reference is made to $(>0)
- * or to NF. At that point, parse only as much as necessary.
- */
-void
-set_record(buf, cnt, freeold)
-char *buf;
-int cnt;
-int freeold;
-{
- register int i;
-
- NF = -1;
- for (i = 1; i <= parse_high_water; i++) {
- unref(fields_arr[i]);
- }
- parse_high_water = 0;
- if (freeold) {
- unref(fields_arr[0]);
- if (resave_fs) {
- resave_fs = 0;
- unref(save_FS);
- save_FS = dupnode(FS_node->var_value);
- }
- nodes[0]->stptr = buf;
- nodes[0]->stlen = cnt;
- nodes[0]->stref = 1;
- nodes[0]->flags = (STRING|STR|PERM|MAYBE_NUM);
- fields_arr[0] = nodes[0];
- }
- fields_arr[0]->flags |= MAYBE_NUM;
- field0_valid = 1;
-}
-
-void
-reset_record()
-{
- (void) force_string(fields_arr[0]);
- set_record(fields_arr[0]->stptr, fields_arr[0]->stlen, 0);
-}
-
-void
-set_NF()
-{
- register int i;
-
- NF = (long) force_number(NF_node->var_value);
- if (NF > nf_high_water)
- grow_fields_arr(NF);
- for (i = parse_high_water + 1; i <= NF; i++) {
- unref(fields_arr[i]);
- fields_arr[i] = Nnull_string;
- }
- field0_valid = 0;
-}
-
-/*
- * this is called both from get_field() and from do_split()
- * via (*parse_field)(). This variation is for when FS is a regular
- * expression -- either user-defined or because RS=="" and FS==" "
- */
-static long
-re_parse_field(up_to, buf, len, fs, rp, set, n)
-int up_to; /* parse only up to this field number */
-char **buf; /* on input: string to parse; on output: point to start next */
-int len;
-NODE *fs;
-Regexp *rp;
-Setfunc set; /* routine to set the value of the parsed field */
-NODE *n;
-{
- register char *scan = *buf;
- register int nf = parse_high_water;
- register char *field;
- register char *end = scan + len;
-
- if (up_to == HUGE)
- nf = 0;
- if (len == 0)
- return nf;
-
- if (*RS == 0 && default_FS)
- while (scan < end && (*scan == ' ' || *scan == '\t' || *scan == '\n'))
- scan++;
- field = scan;
- while (scan < end
- && research(rp, scan, 0, (end - scan), 1) != -1
- && nf < up_to) {
- if (REEND(rp, scan) == RESTART(rp, scan)) { /* null match */
- scan++;
- if (scan == end) {
- (*set)(++nf, field, (int)(scan - field), n);
- up_to = nf;
- break;
- }
- continue;
- }
- (*set)(++nf, field,
- (int)(scan + RESTART(rp, scan) - field), n);
- scan += REEND(rp, scan);
- field = scan;
- if (scan == end) /* FS at end of record */
- (*set)(++nf, field, 0, n);
- }
- if (nf != up_to && scan < end) {
- (*set)(++nf, scan, (int)(end - scan), n);
- scan = end;
- }
- *buf = scan;
- return (nf);
-}
-
-/*
- * this is called both from get_field() and from do_split()
- * via (*parse_field)(). This variation is for when FS is a single space
- * character.
- */
-static long
-def_parse_field(up_to, buf, len, fs, rp, set, n)
-int up_to; /* parse only up to this field number */
-char **buf; /* on input: string to parse; on output: point to start next */
-int len;
-NODE *fs;
-Regexp *rp;
-Setfunc set; /* routine to set the value of the parsed field */
-NODE *n;
-{
- register char *scan = *buf;
- register int nf = parse_high_water;
- register char *field;
- register char *end = scan + len;
- char sav;
-
- if (up_to == HUGE)
- nf = 0;
- if (len == 0)
- return nf;
-
- /* before doing anything save the char at *end */
- sav = *end;
- /* because it will be destroyed now: */
-
- *end = ' '; /* sentinel character */
- for (; nf < up_to; scan++) {
- /*
- * special case: fs is single space, strip leading whitespace
- */
- while (scan < end && (*scan == ' ' || *scan == '\t'))
- scan++;
- if (scan >= end)
- break;
- field = scan;
- while (*scan != ' ' && *scan != '\t')
- scan++;
- (*set)(++nf, field, (int)(scan - field), n);
- if (scan == end)
- break;
- }
-
- /* everything done, restore original char at *end */
- *end = sav;
-
- *buf = scan;
- return nf;
-}
-
-/*
- * this is called both from get_field() and from do_split()
- * via (*parse_field)(). This variation is for when FS is a single character
- * other than space.
- */
-static long
-sc_parse_field(up_to, buf, len, fs, rp, set, n)
-int up_to; /* parse only up to this field number */
-char **buf; /* on input: string to parse; on output: point to start next */
-int len;
-NODE *fs;
-Regexp *rp;
-Setfunc set; /* routine to set the value of the parsed field */
-NODE *n;
-{
- register char *scan = *buf;
- register char fschar;
- register int nf = parse_high_water;
- register char *field;
- register char *end = scan + len;
- char sav;
-
- if (up_to == HUGE)
- nf = 0;
- if (len == 0)
- return nf;
-
- if (*RS == 0 && fs->stlen == 0)
- fschar = '\n';
- else
- fschar = fs->stptr[0];
-
- /* before doing anything save the char at *end */
- sav = *end;
- /* because it will be destroyed now: */
- *end = fschar; /* sentinel character */
-
- for (; nf < up_to;) {
- field = scan;
- while (*scan != fschar)
- scan++;
- (*set)(++nf, field, (int)(scan - field), n);
- if (scan == end)
- break;
- scan++;
- if (scan == end) { /* FS at end of record */
- (*set)(++nf, field, 0, n);
- break;
- }
- }
-
- /* everything done, restore original char at *end */
- *end = sav;
-
- *buf = scan;
- return nf;
-}
-
-/*
- * this is called both from get_field() and from do_split()
- * via (*parse_field)(). This variation is for when the fields have fixed widths.
- */
-static long
-fw_parse_field(up_to, buf, len, fs, rp, set, n)
-int up_to; /* parse only up to this field number */
-char **buf; /* on input: string to parse; on output: point to start next */
-int len;
-NODE *fs;
-Regexp *rp;
-Setfunc set; /* routine to set the value of the parsed field */
-NODE *n;
-{
- register char *scan = *buf;
- register long nf = parse_high_water;
- register char *end = scan + len;
-
- if (up_to == HUGE)
- nf = 0;
- if (len == 0)
- return nf;
- for (; nf < up_to && (len = FIELDWIDTHS[nf+1]) != -1; ) {
- if (len > end - scan)
- len = end - scan;
- (*set)(++nf, scan, len, n);
- scan += len;
- }
- if (len == -1)
- *buf = end;
- else
- *buf = scan;
- return nf;
-}
-
-NODE **
-get_field(requested, assign)
-register int requested;
-Func_ptr *assign; /* this field is on the LHS of an assign */
-{
- /*
- * if requesting whole line but some other field has been altered,
- * then the whole line must be rebuilt
- */
- if (requested == 0) {
- if (!field0_valid) {
- /* first, parse remainder of input record */
- if (NF == -1) {
- NF = (*parse_field)(HUGE-1, &parse_extent,
- fields_arr[0]->stlen -
- (parse_extent - fields_arr[0]->stptr),
- save_FS, FS_regexp, set_field,
- (NODE *)NULL);
- parse_high_water = NF;
- }
- rebuild_record();
- }
- if (assign)
- *assign = reset_record;
- return &fields_arr[0];
- }
-
- /* assert(requested > 0); */
-
- if (assign)
- field0_valid = 0; /* $0 needs reconstruction */
-
- if (requested <= parse_high_water) /* already parsed this field */
- return &fields_arr[requested];
-
- if (NF == -1) { /* have not yet parsed to end of record */
- /*
- * parse up to requested fields, calling set_field() for each,
- * saving in parse_extent the point where the parse left off
- */
- if (parse_high_water == 0) /* starting at the beginning */
- parse_extent = fields_arr[0]->stptr;
- parse_high_water = (*parse_field)(requested, &parse_extent,
- fields_arr[0]->stlen - (parse_extent-fields_arr[0]->stptr),
- save_FS, FS_regexp, set_field, (NODE *)NULL);
-
- /*
- * if we reached the end of the record, set NF to the number of
- * fields so far. Note that requested might actually refer to
- * a field that is beyond the end of the record, but we won't
- * set NF to that value at this point, since this is only a
- * reference to the field and NF only gets set if the field
- * is assigned to -- this case is handled below
- */
- if (parse_extent == fields_arr[0]->stptr + fields_arr[0]->stlen)
- NF = parse_high_water;
- if (requested == HUGE-1) /* HUGE-1 means set NF */
- requested = parse_high_water;
- }
- if (parse_high_water < requested) { /* requested beyond end of record */
- if (assign) { /* expand record */
- register int i;
-
- if (requested > nf_high_water)
- grow_fields_arr(requested);
-
- /* fill in fields that don't exist */
- for (i = parse_high_water + 1; i <= requested; i++)
- fields_arr[i] = Nnull_string;
-
- NF = requested;
- parse_high_water = requested;
- } else
- return &Nnull_string;
- }
-
- return &fields_arr[requested];
-}
-
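-/* set array element n[num] from a field; used as the Setfunc callback by do_split() */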
-static void
-set_element(num, s, len, n)
-int num;
-char *s;
-int len;
-NODE *n;
-{
- register NODE *it;
-
- it = make_string(s, len);
- it->flags |= MAYBE_NUM;
- *assoc_lookup(n, tmp_number((AWKNUM) (num))) = it;
-}
-
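-/* implement the split(string, array [, fieldsep]) built-in */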
-NODE *
-do_split(tree)
-NODE *tree;
-{
- NODE *t1, *t2, *t3, *tmp;
- NODE *fs;
- char *s;
- long (*parseit)P((int, char **, int, NODE *,
- Regexp *, Setfunc, NODE *));
- Regexp *rp = NULL;
-
-
- /*
- * do dupnode(), to avoid problems like
- * x = split(a[1], a, "blah")
- * since we assoc_clear the array. gack.
- * this also gives up complete call by value semantics.
- */
- tmp = tree_eval(tree->lnode);
- t1 = dupnode(tmp);
- free_temp(tmp);
-
- t2 = tree->rnode->lnode;
- t3 = tree->rnode->rnode->lnode;
-
- (void) force_string(t1);
-
- if (t2->type == Node_param_list)
- t2 = stack_ptr[t2->param_cnt];
- if (t2->type != Node_var && t2->type != Node_var_array)
- fatal("second argument of split is not a variable");
- assoc_clear(t2);
-
- if (t3->re_flags & FS_DFLT) {
- parseit = parse_field;
- fs = force_string(FS_node->var_value);
- rp = FS_regexp;
- } else {
- tmp = force_string(tree_eval(t3->re_exp));
- if (tmp->stlen == 1) {
- if (tmp->stptr[0] == ' ')
- parseit = def_parse_field;
- else
- parseit = sc_parse_field;
- } else {
- parseit = re_parse_field;
- rp = re_update(t3);
- }
- fs = tmp;
- }
-
- s = t1->stptr;
- tmp = tmp_number((AWKNUM) (*parseit)(HUGE, &s, (int)t1->stlen,
- fs, rp, set_element, t2));
- unref(t1);
- free_temp(t3);
- return tmp;
-}
-
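-/* choose the field-parsing routine (and FS_regexp) to match the new value of FS */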
-void
-set_FS()
-{
- char buf[10];
- NODE *fs;
-
- /*
-	 * If changing the way fields are split, obey least-surprise
- * semantics, and force $0 to be split totally.
- */
- if (fields_arr != NULL)
- (void) get_field(HUGE - 1, 0);
-
- buf[0] = '\0';
- default_FS = 0;
- if (FS_regexp) {
- refree(FS_regexp);
- FS_regexp = NULL;
- }
- fs = force_string(FS_node->var_value);
- if (fs->stlen > 1)
- parse_field = re_parse_field;
- else if (*RS == 0) {
- parse_field = sc_parse_field;
- if (fs->stlen == 1) {
- if (fs->stptr[0] == ' ') {
- default_FS = 1;
- strcpy(buf, "[ \t\n]+");
- } else if (fs->stptr[0] != '\n')
- sprintf(buf, "[%c\n]", fs->stptr[0]);
- }
- } else {
- parse_field = def_parse_field;
- if (fs->stptr[0] == ' ' && fs->stlen == 1)
- default_FS = 1;
- else if (fs->stptr[0] != ' ' && fs->stlen == 1) {
- if (IGNORECASE == 0)
- parse_field = sc_parse_field;
- else if (fs->stptr[0] == '\\')
- /* yet another special case */
- strcpy(buf, "[\\\\]");
- else
- sprintf(buf, "[%c]", fs->stptr[0]);
- }
- }
- if (buf[0]) {
- FS_regexp = make_regexp(buf, strlen(buf), IGNORECASE, 1);
- parse_field = re_parse_field;
- } else if (parse_field == re_parse_field) {
- FS_regexp = make_regexp(fs->stptr, fs->stlen, IGNORECASE, 1);
- } else
- FS_regexp = NULL;
- resave_fs = 1;
-}
-
-void
-set_RS()
-{
- (void) force_string(RS_node->var_value);
- RS = RS_node->var_value->stptr;
- set_FS();
-}
-
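-/* parse FIELDWIDTHS into an array of column widths and switch to fixed-width parsing */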
-void
-set_FIELDWIDTHS()
-{
- register char *scan;
- char *end;
- register int i;
- static int fw_alloc = 1;
- static int warned = 0;
- extern double strtod();
-
- if (do_lint && ! warned) {
- warned = 1;
- warning("use of FIELDWIDTHS is a gawk extension");
- }
- if (do_unix) /* quick and dirty, does the trick */
- return;
-
- /*
-	 * If changing the way fields are split, obey least-surprise
- * semantics, and force $0 to be split totally.
- */
- if (fields_arr != NULL)
- (void) get_field(HUGE - 1, 0);
-
- parse_field = fw_parse_field;
- scan = force_string(FIELDWIDTHS_node->var_value)->stptr;
- end = scan + 1;
- if (FIELDWIDTHS == NULL)
- emalloc(FIELDWIDTHS, int *, fw_alloc * sizeof(int), "set_FIELDWIDTHS");
- FIELDWIDTHS[0] = 0;
- for (i = 1; ; i++) {
- if (i >= fw_alloc) {
- fw_alloc *= 2;
- erealloc(FIELDWIDTHS, int *, fw_alloc * sizeof(int), "set_FIELDWIDTHS");
- }
- FIELDWIDTHS[i] = (int) strtod(scan, &end);
- if (end == scan)
- break;
- scan = end;
- }
- FIELDWIDTHS[i] = -1;
-}
diff --git a/gnu/usr.bin/awk/getopt.c b/gnu/usr.bin/awk/getopt.c
deleted file mode 100644
index fd142f5..0000000
--- a/gnu/usr.bin/awk/getopt.c
+++ /dev/null
@@ -1,757 +0,0 @@
-/* Getopt for GNU.
- NOTE: getopt is now part of the C library, so if you don't know what
- "Keep this file name-space clean" means, talk to roland@gnu.ai.mit.edu
- before changing it!
-
- Copyright (C) 1987, 88, 89, 90, 91, 92, 93, 1994
- Free Software Foundation, Inc.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License as published by the
- Free Software Foundation; either version 2, or (at your option) any
- later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
-
-#ifdef HAVE_CONFIG_H
-#if defined (emacs) || defined (CONFIG_BROKETS)
-/* We use <config.h> instead of "config.h" so that a compilation
- using -I. -I$srcdir will use ./config.h rather than $srcdir/config.h
- (which it would do because it found this file in $srcdir). */
-#include <config.h>
-#else
-#include "config.h"
-#endif
-#endif
-
-#ifndef __STDC__
-/* This is a separate conditional since some stdc systems
- reject `defined (const)'. */
-#ifndef const
-#define const
-#endif
-#endif
-
-/* This tells Alpha OSF/1 not to define a getopt prototype in <stdio.h>. */
-#ifndef _NO_PROTO
-#define _NO_PROTO
-#endif
-
-#include <stdio.h>
-
-/* Comment out all this code if we are using the GNU C Library, and are not
- actually compiling the library itself. This code is part of the GNU C
- Library, but also included in many other GNU distributions. Compiling
- and linking in this code is a waste when using the GNU C library
- (especially if it is a shared library). Rather than having every GNU
- program understand `configure --with-gnu-libc' and omit the object files,
- it is simpler to just do this in the source for each such file. */
-
-#if defined (_LIBC) || !defined (__GNU_LIBRARY__)
-
-
-/* This needs to come after some library #include
- to get __GNU_LIBRARY__ defined. */
-#if defined(__GNU_LIBRARY__) || defined(STDC_HEADERS)
-/* Don't include stdlib.h for non-GNU C libraries because some of them
- contain conflicting prototypes for getopt. */
-#include <stdlib.h>
-#else
-extern char *getenv ();
-#endif /* __GNU_LIBRARY || STDC_HEADERS */
-
-/* If GETOPT_COMPAT is defined, `+' as well as `--' can introduce a
- long-named option. Because this is not POSIX.2 compliant, it is
- being phased out. */
-/* #define GETOPT_COMPAT */
-
-/* This version of `getopt' appears to the caller like standard Unix `getopt'
- but it behaves differently for the user, since it allows the user
- to intersperse the options with the other arguments.
-
- As `getopt' works, it permutes the elements of ARGV so that,
- when it is done, all the options precede everything else. Thus
- all application programs are extended to handle flexible argument order.
-
- Setting the environment variable POSIXLY_CORRECT disables permutation.
- Then the behavior is completely standard.
-
- GNU application programs can use a third alternative mode in which
- they can distinguish the relative order of options and other arguments. */
-
-#include "getopt.h"
-
-/* For communication from `getopt' to the caller.
- When `getopt' finds an option that takes an argument,
- the argument value is returned here.
- Also, when `ordering' is RETURN_IN_ORDER,
- each non-option ARGV-element is returned here. */
-
-char *optarg = 0;
-
-/* Index in ARGV of the next element to be scanned.
- This is used for communication to and from the caller
- and for communication between successive calls to `getopt'.
-
- On entry to `getopt', zero means this is the first call; initialize.
-
- When `getopt' returns EOF, this is the index of the first of the
- non-option elements that the caller should itself scan.
-
- Otherwise, `optind' communicates from one call to the next
- how much of ARGV has been scanned so far. */
-
-/* XXX 1003.2 says this must be 1 before any call. */
-int optind = 0;
-
-/* The next char to be scanned in the option-element
- in which the last option character we returned was found.
- This allows us to pick up the scan where we left off.
-
- If this is zero, or a null string, it means resume the scan
- by advancing to the next ARGV-element. */
-
-static char *nextchar;
-
-/* Callers store zero here to inhibit the error message
- for unrecognized options. */
-
-int opterr = 1;
-
-/* Set to an option character which was unrecognized.
- This must be initialized on some systems to avoid linking in the
- system's own getopt implementation. */
-
-int optopt = '?';
-
-/* Describe how to deal with options that follow non-option ARGV-elements.
-
- If the caller did not specify anything,
- the default is REQUIRE_ORDER if the environment variable
- POSIXLY_CORRECT is defined, PERMUTE otherwise.
-
- REQUIRE_ORDER means don't recognize them as options;
- stop option processing when the first non-option is seen.
- This is what Unix does.
- This mode of operation is selected by either setting the environment
- variable POSIXLY_CORRECT, or using `+' as the first character
- of the list of option characters.
-
- PERMUTE is the default. We permute the contents of ARGV as we scan,
- so that eventually all the non-options are at the end. This allows options
- to be given in any order, even with programs that were not written to
- expect this.
-
- RETURN_IN_ORDER is an option available to programs that were written
- to expect options and other ARGV-elements in any order and that care about
- the ordering of the two. We describe each non-option ARGV-element
- as if it were the argument of an option with character code 1.
- Using `-' as the first character of the list of option characters
- selects this mode of operation.
-
- The special argument `--' forces an end of option-scanning regardless
- of the value of `ordering'. In the case of RETURN_IN_ORDER, only
- `--' can cause `getopt' to return EOF with `optind' != ARGC. */
-
-static enum
-{
- REQUIRE_ORDER, PERMUTE, RETURN_IN_ORDER
-} ordering;
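As a minimal sketch of how the optstring prefix selects one of these orderings (the `demo' program, its option letters and its arguments are hypothetical, and the sketch assumes this getopt implementation, which returns EOF at the end of the options):

#include <stdio.h>
#include "getopt.h"

int
main (argc, argv)
     int argc;
     char **argv;
{
  int c;

  /* With a hypothetical invocation  ./demo -a file1 -b file2 :
       "ab"   PERMUTE:         returns 'a', 'b'; file1/file2 are moved last.
       "+ab"  REQUIRE_ORDER:   returns 'a', then stops at "file1".
       "-ab"  RETURN_IN_ORDER: returns 'a', 1 ("file1"), 'b', 1 ("file2").  */
  while ((c = getopt (argc, argv, "ab")) != EOF)
    {
      if (c == 1)            /* only happens with a leading '-' in optstring */
        printf ("non-option \"%s\"\n", optarg);
      else
        printf ("option -%c\n", c);
    }
  printf ("first non-option at optind = %d\n", optind);
  return 0;
}

In the default PERMUTE case the loop leaves argv permuted to `./demo -a -b file1 file2', with optind pointing at "file1".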
-
-#if defined(__GNU_LIBRARY__) || defined(STDC_HEADERS)
-/* We want to avoid inclusion of string.h with non-GNU libraries
- because there are many ways it can cause trouble.
- On some systems, it contains special magic macros that don't work
- in GCC. */
-#include <string.h>
-#define my_index strchr
-#else
-
-/* Avoid depending on library functions or files
- whose names are inconsistent. */
-
-static char *
-my_index (str, chr)
- const char *str;
- int chr;
-{
- while (*str)
- {
- if (*str == chr)
- return (char *) str;
- str++;
- }
- return 0;
-}
-
-/* If using GCC, we can safely declare strlen this way.
- If not using GCC, it is ok not to declare it.
- (Supposedly there are some machines where it might get a warning,
- but changing this conditional to __STDC__ is too risky.) */
-#ifdef __GNUC__
-#ifdef IN_GCC
-#include "gstddef.h"
-#else
-#include <stddef.h>
-#endif
-extern size_t strlen (const char *);
-#endif
-
-#endif /* __GNU_LIBRARY__ || STDC_HEADERS */
-
-/* Handle permutation of arguments. */
-
-/* Describe the part of ARGV that contains non-options that have
- been skipped. `first_nonopt' is the index in ARGV of the first of them;
- `last_nonopt' is the index after the last of them. */
-
-static int first_nonopt;
-static int last_nonopt;
-
-/* Exchange two adjacent subsequences of ARGV.
- One subsequence is elements [first_nonopt,last_nonopt)
- which contains all the non-options that have been skipped so far.
- The other is elements [last_nonopt,optind), which contains all
- the options processed since those non-options were skipped.
-
- `first_nonopt' and `last_nonopt' are relocated so that they describe
- the new indices of the non-options in ARGV after they are moved. */
-
-static void
-exchange (argv)
- char **argv;
-{
- int bottom = first_nonopt;
- int middle = last_nonopt;
- int top = optind;
- char *tem;
-
- /* Exchange the shorter segment with the far end of the longer segment.
- That puts the shorter segment into the right place.
- It leaves the longer segment in the right place overall,
- but it consists of two parts that need to be swapped next. */
-
- while (top > middle && middle > bottom)
- {
- if (top - middle > middle - bottom)
- {
- /* Bottom segment is the short one. */
- int len = middle - bottom;
- register int i;
-
- /* Swap it with the top part of the top segment. */
- for (i = 0; i < len; i++)
- {
- tem = argv[bottom + i];
- argv[bottom + i] = argv[top - (middle - bottom) + i];
- argv[top - (middle - bottom) + i] = tem;
- }
- /* Exclude the moved bottom segment from further swapping. */
- top -= len;
- }
- else
- {
- /* Top segment is the short one. */
- int len = top - middle;
- register int i;
-
- /* Swap it with the bottom part of the bottom segment. */
- for (i = 0; i < len; i++)
- {
- tem = argv[bottom + i];
- argv[bottom + i] = argv[middle + i];
- argv[middle + i] = tem;
- }
- /* Exclude the moved top segment from further swapping. */
- bottom += len;
- }
- }
-
- /* Update records for the slots the non-options now occupy. */
-
- first_nonopt += (optind - last_nonopt);
- last_nonopt = optind;
-}
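To make the swap concrete, consider a hypothetical argv of { "prog", "f1", "f2", "-a", "-b" } with first_nonopt = 1, last_nonopt = 3 and optind = 5 (a worked trace, not taken from the original sources):

    bottom = 1, middle = 3, top = 5; the two segments have equal length,
    so the "top segment is the short one" branch runs with len = 2 and
    swaps argv[1] <-> argv[3] and argv[2] <-> argv[4]:

        before:  prog  f1  f2  -a  -b
        after:   prog  -a  -b  f1  f2

    bottom becomes 3, the loop exits, and the final bookkeeping sets
    first_nonopt = 3 and last_nonopt = 5, so the skipped non-options now
    sit directly below optind.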
-
-/* Scan elements of ARGV (whose length is ARGC) for option characters
- given in OPTSTRING.
-
- If an element of ARGV starts with '-', and is not exactly "-" or "--",
- then it is an option element. The characters of this element
- (aside from the initial '-') are option characters. If `getopt'
- is called repeatedly, it returns successively each of the option characters
- from each of the option elements.
-
- If `getopt' finds another option character, it returns that character,
- updating `optind' and `nextchar' so that the next call to `getopt' can
- resume the scan with the following option character or ARGV-element.
-
- If there are no more option characters, `getopt' returns `EOF'.
- Then `optind' is the index in ARGV of the first ARGV-element
- that is not an option. (The ARGV-elements have been permuted
- so that those that are not options now come last.)
-
- OPTSTRING is a string containing the legitimate option characters.
- If an option character is seen that is not listed in OPTSTRING,
- return '?' after printing an error message. If you set `opterr' to
- zero, the error message is suppressed but we still return '?'.
-
- If a char in OPTSTRING is followed by a colon, that means it wants an arg,
- so the following text in the same ARGV-element, or the text of the following
- ARGV-element, is returned in `optarg'. Two colons mean an option that
- wants an optional arg; if there is text in the current ARGV-element,
- it is returned in `optarg', otherwise `optarg' is set to zero.
-
- If OPTSTRING starts with `-' or `+', it requests different methods of
- handling the non-option ARGV-elements.
- See the comments about RETURN_IN_ORDER and REQUIRE_ORDER, above.
-
- Long-named options begin with `--' instead of `-'.
- Their names may be abbreviated as long as the abbreviation is unique
- or is an exact match for some defined option. If they have an
- argument, it follows the option name in the same ARGV-element, separated
-   from the option name by a `=', or else in the next ARGV-element.
- When `getopt' finds a long-named option, it returns 0 if that option's
- `flag' field is nonzero, the value of the option's `val' field
- if the `flag' field is zero.
-
- The elements of ARGV aren't really const, because we permute them.
- But we pretend they're const in the prototype to be compatible
- with other systems.
-
- LONGOPTS is a vector of `struct option' terminated by an
- element containing a name which is zero.
-
-   LONGIND returns the index in LONGOPTS of the long-named option found.
- It is only valid when a long-named option has been found by the most
- recent call.
-
- If LONG_ONLY is nonzero, '-' as well as '--' can introduce
- long-named options. */
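As a concrete reading of the OPTSTRING rules above, a hypothetical optstring of "ab:c::" behaves as follows:

    -a        returns 'a'; no argument is taken.
    -b foo    returns 'b', optarg = "foo"  (argument from the next ARGV-element).
    -bfoo     returns 'b', optarg = "foo"  (argument from the same ARGV-element).
    -c        returns 'c', optarg = 0      (optional argument omitted).
    -cbar     returns 'c', optarg = "bar"  (an optional argument must share the element).
    -x        returns '?', printing an error message unless `opterr' is zero.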
-
-int
-_getopt_internal (argc, argv, optstring, longopts, longind, long_only)
- int argc;
- char *const *argv;
- const char *optstring;
- const struct option *longopts;
- int *longind;
- int long_only;
-{
- int option_index;
-
- optarg = 0;
-
- /* Initialize the internal data when the first call is made.
- Start processing options with ARGV-element 1 (since ARGV-element 0
- is the program name); the sequence of previously skipped
- non-option ARGV-elements is empty. */
-
- if (optind == 0)
- {
- first_nonopt = last_nonopt = optind = 1;
-
- nextchar = NULL;
-
- /* Determine how to handle the ordering of options and nonoptions. */
-
- if (optstring[0] == '-')
- {
- ordering = RETURN_IN_ORDER;
- ++optstring;
- }
- else if (optstring[0] == '+')
- {
- ordering = REQUIRE_ORDER;
- ++optstring;
- }
- else if (getenv ("POSIXLY_CORRECT") != NULL)
- ordering = REQUIRE_ORDER;
- else
- ordering = PERMUTE;
- }
-
- if (nextchar == NULL || *nextchar == '\0')
- {
- if (ordering == PERMUTE)
- {
- /* If we have just processed some options following some non-options,
- exchange them so that the options come first. */
-
- if (first_nonopt != last_nonopt && last_nonopt != optind)
- exchange ((char **) argv);
- else if (last_nonopt != optind)
- first_nonopt = optind;
-
- /* Now skip any additional non-options
- and extend the range of non-options previously skipped. */
-
- while (optind < argc
- && (argv[optind][0] != '-' || argv[optind][1] == '\0')
-#ifdef GETOPT_COMPAT
- && (longopts == NULL
- || argv[optind][0] != '+' || argv[optind][1] == '\0')
-#endif /* GETOPT_COMPAT */
- )
- optind++;
- last_nonopt = optind;
- }
-
- /* Special ARGV-element `--' means premature end of options.
- Skip it like a null option,
- then exchange with previous non-options as if it were an option,
- then skip everything else like a non-option. */
-
- if (optind != argc && !strcmp (argv[optind], "--"))
- {
- optind++;
-
- if (first_nonopt != last_nonopt && last_nonopt != optind)
- exchange ((char **) argv);
- else if (first_nonopt == last_nonopt)
- first_nonopt = optind;
- last_nonopt = argc;
-
- optind = argc;
- }
-
- /* If we have done all the ARGV-elements, stop the scan
- and back over any non-options that we skipped and permuted. */
-
- if (optind == argc)
- {
- /* Set the next-arg-index to point at the non-options
- that we previously skipped, so the caller will digest them. */
- if (first_nonopt != last_nonopt)
- optind = first_nonopt;
- return EOF;
- }
-
- /* If we have come to a non-option and did not permute it,
- either stop the scan or describe it to the caller and pass it by. */
-
- if ((argv[optind][0] != '-' || argv[optind][1] == '\0')
-#ifdef GETOPT_COMPAT
- && (longopts == NULL
- || argv[optind][0] != '+' || argv[optind][1] == '\0')
-#endif /* GETOPT_COMPAT */
- )
- {
- if (ordering == REQUIRE_ORDER)
- return EOF;
- optarg = argv[optind++];
- return 1;
- }
-
- /* We have found another option-ARGV-element.
- Start decoding its characters. */
-
- nextchar = (argv[optind] + 1
- + (longopts != NULL && argv[optind][1] == '-'));
- }
-
- if (longopts != NULL
- && ((argv[optind][0] == '-'
- && (argv[optind][1] == '-' || long_only))
-#ifdef GETOPT_COMPAT
- || argv[optind][0] == '+'
-#endif /* GETOPT_COMPAT */
- ))
- {
- const struct option *p;
- char *s = nextchar;
- int exact = 0;
- int ambig = 0;
- const struct option *pfound = NULL;
- int indfound;
-
- while (*s && *s != '=')
- s++;
-
- /* Test all options for either exact match or abbreviated matches. */
- for (p = longopts, option_index = 0; p->name;
- p++, option_index++)
- if (!strncmp (p->name, nextchar, s - nextchar))
- {
- if (s - nextchar == strlen (p->name))
- {
- /* Exact match found. */
- pfound = p;
- indfound = option_index;
- exact = 1;
- break;
- }
- else if (pfound == NULL)
- {
- /* First nonexact match found. */
- pfound = p;
- indfound = option_index;
- }
- else
- /* Second nonexact match found. */
- ambig = 1;
- }
-
- if (ambig && !exact)
- {
- if (opterr)
- fprintf (stderr, "%s: option `%s' is ambiguous\n",
- argv[0], argv[optind]);
- nextchar += strlen (nextchar);
- optind++;
- return '?';
- }
-
- if (pfound != NULL)
- {
- option_index = indfound;
- optind++;
- if (*s)
- {
- /* Don't test has_arg with >, because some C compilers don't
- allow it to be used on enums. */
- if (pfound->has_arg)
- optarg = s + 1;
- else
- {
- if (opterr)
- {
- if (argv[optind - 1][1] == '-')
- /* --option */
- fprintf (stderr,
- "%s: option `--%s' doesn't allow an argument\n",
- argv[0], pfound->name);
- else
- /* +option or -option */
- fprintf (stderr,
- "%s: option `%c%s' doesn't allow an argument\n",
- argv[0], argv[optind - 1][0], pfound->name);
- }
- nextchar += strlen (nextchar);
- return '?';
- }
- }
- else if (pfound->has_arg == 1)
- {
- if (optind < argc)
- optarg = argv[optind++];
- else
- {
- if (opterr)
- fprintf (stderr, "%s: option `%s' requires an argument\n",
- argv[0], argv[optind - 1]);
- nextchar += strlen (nextchar);
- return optstring[0] == ':' ? ':' : '?';
- }
- }
- nextchar += strlen (nextchar);
- if (longind != NULL)
- *longind = option_index;
- if (pfound->flag)
- {
- *(pfound->flag) = pfound->val;
- return 0;
- }
- return pfound->val;
- }
- /* Can't find it as a long option. If this is not getopt_long_only,
- or the option starts with '--' or is not a valid short
- option, then it's an error.
- Otherwise interpret it as a short option. */
- if (!long_only || argv[optind][1] == '-'
-#ifdef GETOPT_COMPAT
- || argv[optind][0] == '+'
-#endif /* GETOPT_COMPAT */
- || my_index (optstring, *nextchar) == NULL)
- {
- if (opterr)
- {
- if (argv[optind][1] == '-')
- /* --option */
- fprintf (stderr, "%s: unrecognized option `--%s'\n",
- argv[0], nextchar);
- else
- /* +option or -option */
- fprintf (stderr, "%s: unrecognized option `%c%s'\n",
- argv[0], argv[optind][0], nextchar);
- }
- nextchar = (char *) "";
- optind++;
- return '?';
- }
- }
-
- /* Look at and handle the next option-character. */
-
- {
- char c = *nextchar++;
- char *temp = my_index (optstring, c);
-
- /* Increment `optind' when we start to process its last character. */
- if (*nextchar == '\0')
- ++optind;
-
- if (temp == NULL || c == ':')
- {
- if (opterr)
- {
-#if 0
- if (c < 040 || c >= 0177)
- fprintf (stderr, "%s: unrecognized option, character code 0%o\n",
- argv[0], c);
- else
- fprintf (stderr, "%s: unrecognized option `-%c'\n", argv[0], c);
-#else
- /* 1003.2 specifies the format of this message. */
- fprintf (stderr, "%s: illegal option -- %c\n", argv[0], c);
-#endif
- }
- optopt = c;
- return '?';
- }
- if (temp[1] == ':')
- {
- if (temp[2] == ':')
- {
- /* This is an option that accepts an argument optionally. */
- if (*nextchar != '\0')
- {
- optarg = nextchar;
- optind++;
- }
- else
- optarg = 0;
- nextchar = NULL;
- }
- else
- {
- /* This is an option that requires an argument. */
- if (*nextchar != '\0')
- {
- optarg = nextchar;
- /* If we end this ARGV-element by taking the rest as an arg,
- we must advance to the next element now. */
- optind++;
- }
- else if (optind == argc)
- {
- if (opterr)
- {
-#if 0
- fprintf (stderr, "%s: option `-%c' requires an argument\n",
- argv[0], c);
-#else
- /* 1003.2 specifies the format of this message. */
- fprintf (stderr, "%s: option requires an argument -- %c\n",
- argv[0], c);
-#endif
- }
- optopt = c;
- if (optstring[0] == ':')
- c = ':';
- else
- c = '?';
- }
- else
- /* We already incremented `optind' once;
- increment it again when taking next ARGV-elt as argument. */
- optarg = argv[optind++];
- nextchar = NULL;
- }
- }
- return c;
- }
-}
-
-int
-getopt (argc, argv, optstring)
- int argc;
- char *const *argv;
- const char *optstring;
-{
- return _getopt_internal (argc, argv, optstring,
- (const struct option *) 0,
- (int *) 0,
- 0);
-}
-
-#endif /* _LIBC or not __GNU_LIBRARY__. */
-
-#ifdef TEST
-
-/* Compile with -DTEST to make an executable for use in testing
- the above definition of `getopt'. */
-
-int
-main (argc, argv)
- int argc;
- char **argv;
-{
- int c;
- int digit_optind = 0;
-
- while (1)
- {
- int this_option_optind = optind ? optind : 1;
-
- c = getopt (argc, argv, "abc:d:0123456789");
- if (c == EOF)
- break;
-
- switch (c)
- {
- case '0':
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- case '8':
- case '9':
- if (digit_optind != 0 && digit_optind != this_option_optind)
- printf ("digits occur in two different argv-elements.\n");
- digit_optind = this_option_optind;
- printf ("option %c\n", c);
- break;
-
- case 'a':
- printf ("option a\n");
- break;
-
- case 'b':
- printf ("option b\n");
- break;
-
- case 'c':
- printf ("option c with value `%s'\n", optarg);
- break;
-
- case '?':
- break;
-
- default:
- printf ("?? getopt returned character code 0%o ??\n", c);
- }
- }
-
- if (optind < argc)
- {
- printf ("non-option ARGV-elements: ");
- while (optind < argc)
- printf ("%s ", argv[optind++]);
- printf ("\n");
- }
-
- exit (0);
-}
-
-#endif /* TEST */
diff --git a/gnu/usr.bin/awk/getopt.h b/gnu/usr.bin/awk/getopt.h
deleted file mode 100644
index b0fc4ff..0000000
--- a/gnu/usr.bin/awk/getopt.h
+++ /dev/null
@@ -1,129 +0,0 @@
-/* Declarations for getopt.
- Copyright (C) 1989, 1990, 1991, 1992, 1993 Free Software Foundation, Inc.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License as published by the
- Free Software Foundation; either version 2, or (at your option) any
- later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
-
-#ifndef _GETOPT_H
-#define _GETOPT_H 1
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/* For communication from `getopt' to the caller.
- When `getopt' finds an option that takes an argument,
- the argument value is returned here.
- Also, when `ordering' is RETURN_IN_ORDER,
- each non-option ARGV-element is returned here. */
-
-extern char *optarg;
-
-/* Index in ARGV of the next element to be scanned.
- This is used for communication to and from the caller
- and for communication between successive calls to `getopt'.
-
- On entry to `getopt', zero means this is the first call; initialize.
-
- When `getopt' returns EOF, this is the index of the first of the
- non-option elements that the caller should itself scan.
-
- Otherwise, `optind' communicates from one call to the next
- how much of ARGV has been scanned so far. */
-
-extern int optind;
-
-/* Callers store zero here to inhibit the error message `getopt' prints
- for unrecognized options. */
-
-extern int opterr;
-
-/* Set to an option character which was unrecognized. */
-
-extern int optopt;
-
-/* Describe the long-named options requested by the application.
- The LONG_OPTIONS argument to getopt_long or getopt_long_only is a vector
- of `struct option' terminated by an element containing a name which is
- zero.
-
- The field `has_arg' is:
- no_argument (or 0) if the option does not take an argument,
- required_argument (or 1) if the option requires an argument,
- optional_argument (or 2) if the option takes an optional argument.
-
- If the field `flag' is not NULL, it points to a variable that is set
- to the value given in the field `val' when the option is found, but
- left unchanged if the option is not found.
-
- To have a long-named option do something other than set an `int' to
- a compiled-in constant, such as set a value from `optarg', set the
- option's `flag' field to zero and its `val' field to a nonzero
- value (the equivalent single-letter option character, if there is
- one). For long options that have a zero `flag' field, `getopt'
- returns the contents of the `val' field. */
-
-struct option
-{
-#ifdef __STDC__
- const char *name;
-#else
- char *name;
-#endif
- /* has_arg can't be an enum because some compilers complain about
- type mismatches in all the code that assumes it is an int. */
- int has_arg;
- int *flag;
- int val;
-};
-
-/* Names for the values of the `has_arg' field of `struct option'. */
-
-#define no_argument 0
-#define required_argument 1
-#define optional_argument 2
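The two styles described above -- a long option that sets a variable through `flag', and one that returns `val' with its argument in `optarg' -- can be combined as in the following sketch (the option names and the program around them are hypothetical, and it assumes this getopt implementation, which returns EOF at the end of the options):

#include <stdio.h>
#include "getopt.h"

static int verbose_flag;              /* set through the `flag' pointer */

static struct option longopts[] =
{
  /* name        has_arg             flag            val */
  { "verbose",   no_argument,        &verbose_flag,  1   },  /* getopt_long returns 0 */
  { "output",    required_argument,  0,              'o' },  /* returns 'o', argument in optarg */
  { 0, 0, 0, 0 }
};

int
main (int argc, char **argv)
{
  int c, longind;

  while ((c = getopt_long (argc, argv, "o:", longopts, &longind)) != EOF)
    switch (c)
      {
      case 0:          /* --verbose; verbose_flag has already been set to 1 */
        break;
      case 'o':        /* --output=FILE, --output FILE or -o FILE */
        printf ("output file `%s'\n", optarg);
        break;
      default:
        return 1;
      }
  printf ("verbose = %d\n", verbose_flag);
  return 0;
}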
-
-#ifdef __STDC__
-#if defined(__GNU_LIBRARY__)
-/* Many other libraries have conflicting prototypes for getopt, with
- differences in the consts, in stdlib.h. To avoid compilation
- errors, only prototype getopt for the GNU C library. */
-extern int getopt (int argc, char *const *argv, const char *shortopts);
-#else /* not __GNU_LIBRARY__ */
-extern int getopt ();
-#endif /* not __GNU_LIBRARY__ */
-extern int getopt_long (int argc, char *const *argv, const char *shortopts,
- const struct option *longopts, int *longind);
-extern int getopt_long_only (int argc, char *const *argv,
- const char *shortopts,
- const struct option *longopts, int *longind);
-
-/* Internal only. Users should not call this directly. */
-extern int _getopt_internal (int argc, char *const *argv,
- const char *shortopts,
- const struct option *longopts, int *longind,
- int long_only);
-#else /* not __STDC__ */
-extern int getopt ();
-extern int getopt_long ();
-extern int getopt_long_only ();
-
-extern int _getopt_internal ();
-#endif /* not __STDC__ */
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _GETOPT_H */
diff --git a/gnu/usr.bin/awk/getopt1.c b/gnu/usr.bin/awk/getopt1.c
deleted file mode 100644
index 7739b51..0000000
--- a/gnu/usr.bin/awk/getopt1.c
+++ /dev/null
@@ -1,187 +0,0 @@
-/* getopt_long and getopt_long_only entry points for GNU getopt.
- Copyright (C) 1987, 88, 89, 90, 91, 92, 1993
- Free Software Foundation, Inc.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License as published by the
- Free Software Foundation; either version 2, or (at your option) any
- later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
-
-#ifdef HAVE_CONFIG_H
-#if defined (emacs) || defined (CONFIG_BROKETS)
-/* We use <config.h> instead of "config.h" so that a compilation
- using -I. -I$srcdir will use ./config.h rather than $srcdir/config.h
- (which it would do because it found this file in $srcdir). */
-#include <config.h>
-#else
-#include "config.h"
-#endif
-#endif
-
-#include "getopt.h"
-
-#ifndef __STDC__
-/* This is a separate conditional since some stdc systems
- reject `defined (const)'. */
-#ifndef const
-#define const
-#endif
-#endif
-
-#include <stdio.h>
-
-/* Comment out all this code if we are using the GNU C Library, and are not
- actually compiling the library itself. This code is part of the GNU C
- Library, but also included in many other GNU distributions. Compiling
- and linking in this code is a waste when using the GNU C library
- (especially if it is a shared library). Rather than having every GNU
- program understand `configure --with-gnu-libc' and omit the object files,
- it is simpler to just do this in the source for each such file. */
-
-#if defined (_LIBC) || !defined (__GNU_LIBRARY__)
-
-
-/* This needs to come after some library #include
- to get __GNU_LIBRARY__ defined. */
-#if defined(__GNU_LIBRARY__) || defined(OS2) || defined(MSDOS) || defined(atarist)
-#include <stdlib.h>
-#else
-char *getenv ();
-#endif
-
-#ifndef NULL
-#define NULL 0
-#endif
-
-int
-getopt_long (argc, argv, options, long_options, opt_index)
- int argc;
- char *const *argv;
- const char *options;
- const struct option *long_options;
- int *opt_index;
-{
- return _getopt_internal (argc, argv, options, long_options, opt_index, 0);
-}
-
-/* Like getopt_long, but '-' as well as '--' can indicate a long option.
- If an option that starts with '-' (not '--') doesn't match a long option,
- but does match a short option, it is parsed as a short option
- instead. */
-
-int
-getopt_long_only (argc, argv, options, long_options, opt_index)
- int argc;
- char *const *argv;
- const char *options;
- const struct option *long_options;
- int *opt_index;
-{
- return _getopt_internal (argc, argv, options, long_options, opt_index, 1);
-}
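A small worked example of the difference, assuming hypothetical long options "verbose" and "output" together with the short optstring "vxo:":

    -verb     unique abbreviation of --verbose, so it is handled as that long option.
    -v        also matches --verbose as an abbreviation, so the long option wins
              over the short option 'v'.
    -x        matches no long-option name, but 'x' is a valid short option,
              so it is parsed as the short option -x.
    -q        matches neither and is reported as an unrecognized option.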
-
-
-#endif /* _LIBC or not __GNU_LIBRARY__. */
-
-#ifdef TEST
-
-#include <stdio.h>
-
-int
-main (argc, argv)
- int argc;
- char **argv;
-{
- int c;
- int digit_optind = 0;
-
- while (1)
- {
- int this_option_optind = optind ? optind : 1;
- int option_index = 0;
- static struct option long_options[] =
- {
- {"add", 1, 0, 0},
- {"append", 0, 0, 0},
- {"delete", 1, 0, 0},
- {"verbose", 0, 0, 0},
- {"create", 0, 0, 0},
- {"file", 1, 0, 0},
- {0, 0, 0, 0}
- };
-
- c = getopt_long (argc, argv, "abc:d:0123456789",
- long_options, &option_index);
- if (c == EOF)
- break;
-
- switch (c)
- {
- case 0:
- printf ("option %s", long_options[option_index].name);
- if (optarg)
- printf (" with arg %s", optarg);
- printf ("\n");
- break;
-
- case '0':
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- case '8':
- case '9':
- if (digit_optind != 0 && digit_optind != this_option_optind)
- printf ("digits occur in two different argv-elements.\n");
- digit_optind = this_option_optind;
- printf ("option %c\n", c);
- break;
-
- case 'a':
- printf ("option a\n");
- break;
-
- case 'b':
- printf ("option b\n");
- break;
-
- case 'c':
- printf ("option c with value `%s'\n", optarg);
- break;
-
- case 'd':
- printf ("option d with value `%s'\n", optarg);
- break;
-
- case '?':
- break;
-
- default:
- printf ("?? getopt returned character code 0%o ??\n", c);
- }
- }
-
- if (optind < argc)
- {
- printf ("non-option ARGV-elements: ");
- while (optind < argc)
- printf ("%s ", argv[optind++]);
- printf ("\n");
- }
-
- exit (0);
-}
-
-#endif /* TEST */
diff --git a/gnu/usr.bin/awk/io.c b/gnu/usr.bin/awk/io.c
deleted file mode 100644
index 7f92556..0000000
--- a/gnu/usr.bin/awk/io.c
+++ /dev/null
@@ -1,1283 +0,0 @@
-/*
- * io.c --- routines for dealing with input and output and records
- */
-
-/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#if !defined(VMS) && !defined(VMS_POSIX) && !defined(_MSC_VER)
-#include <sys/param.h>
-#endif
-#include "awk.h"
-
-#ifndef O_RDONLY
-#include <fcntl.h>
-#endif
-
-#if !defined(S_ISDIR) && defined(S_IFDIR)
-#define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR)
-#endif
-
-#ifndef ENFILE
-#define ENFILE EMFILE
-#endif
-
-#ifndef atarist
-#define INVALID_HANDLE (-1)
-#else
-#define INVALID_HANDLE (__SMALLEST_VALID_HANDLE - 1)
-#endif
-
-#if defined(MSDOS) || defined(OS2) || defined(atarist)
-#define PIPES_SIMULATED
-#endif
-
-static IOBUF *nextfile P((int skipping));
-static int inrec P((IOBUF *iop));
-static int iop_close P((IOBUF *iop));
-struct redirect *redirect P((NODE *tree, int *errflg));
-static void close_one P((void));
-static int close_redir P((struct redirect *rp, int exitwarn));
-#ifndef PIPES_SIMULATED
-static int wait_any P((int interesting));
-#endif
-static IOBUF *gawk_popen P((char *cmd, struct redirect *rp));
-static IOBUF *iop_open P((const char *file, const char *how));
-static int gawk_pclose P((struct redirect *rp));
-static int do_pathopen P((const char *file));
-static int str2mode P((const char *mode));
-static void spec_setup P((IOBUF *iop, int len, int allocate));
-static int specfdopen P((IOBUF *iop, const char *name, const char *mode));
-static int pidopen P((IOBUF *iop, const char *name, const char *mode));
-static int useropen P((IOBUF *iop, const char *name, const char *mode));
-
-extern FILE *fdopen();
-
-#if defined (MSDOS)
-#include "popen.h"
-#define popen(c,m) os_popen(c,m)
-#define pclose(f) os_pclose(f)
-#elif defined (OS2) /* OS/2, but not family mode */
-#if defined (_MSC_VER)
-#define popen(c,m) _popen(c,m)
-#define pclose(f) _pclose(f)
-#endif
-#else
-extern FILE *popen();
-#endif
-
-static struct redirect *red_head = NULL;
-
-extern int output_is_tty;
-extern NODE *ARGC_node;
-extern NODE *ARGV_node;
-extern NODE *ARGIND_node;
-extern NODE *ERRNO_node;
-extern NODE **fields_arr;
-
-static jmp_buf filebuf; /* for do_nextfile() */
-
-/* do_nextfile --- implement gawk "next file" extension */
-
-void
-do_nextfile()
-{
- (void) nextfile(1);
- longjmp(filebuf, 1);
-}
-
-static IOBUF *
-nextfile(skipping)
-int skipping;
-{
- static int i = 1;
- static int files = 0;
- NODE *arg;
- static IOBUF *curfile = NULL;
-
- if (skipping) {
- if (curfile != NULL)
- iop_close(curfile);
- curfile = NULL;
- return NULL;
- }
- if (curfile != NULL) {
- if (curfile->cnt == EOF) {
- (void) iop_close(curfile);
- curfile = NULL;
- } else
- return curfile;
- }
- for (; i < (int) (ARGC_node->lnode->numbr); i++) {
- arg = *assoc_lookup(ARGV_node, tmp_number((AWKNUM) i));
- if (arg->stptr[0] == '\0')
- continue;
- arg->stptr[arg->stlen] = '\0';
- if (! do_unix) {
- ARGIND_node->var_value->numbr = i;
- ARGIND_node->var_value->flags = NUM|NUMBER;
- }
- if (!arg_assign(arg->stptr)) {
- files++;
- curfile = iop_open(arg->stptr, "r");
- if (curfile == NULL)
- fatal("cannot open file `%s' for reading (%s)",
- arg->stptr, strerror(errno));
- /* NOTREACHED */
- /* This is a kludge. */
- unref(FILENAME_node->var_value);
- FILENAME_node->var_value = dupnode(arg);
- FNR = 0;
- i++;
- break;
- }
- }
- if (files == 0) {
- files++;
- /* no args. -- use stdin */
- /* FNR is init'ed to 0 */
- FILENAME_node->var_value = make_string("-", 1);
- curfile = iop_alloc(fileno(stdin));
- }
- return curfile;
-}
-
-void
-set_FNR()
-{
- FNR = (long) FNR_node->var_value->numbr;
-}
-
-void
-set_NR()
-{
- NR = (long) NR_node->var_value->numbr;
-}
-
-/*
- * This reads in a record from the input file
- */
-static int
-inrec(iop)
-IOBUF *iop;
-{
- char *begin;
- register int cnt;
- int retval = 0;
-
- cnt = get_a_record(&begin, iop, *RS, NULL);
- if (cnt == EOF) {
- cnt = 0;
- retval = 1;
- } else {
- NR += 1;
- FNR += 1;
- }
- set_record(begin, cnt, 1);
-
- return retval;
-}
-
-static int
-iop_close(iop)
-IOBUF *iop;
-{
- int ret;
-
- if (iop == NULL)
- return 0;
- errno = 0;
-
-#ifdef _CRAY
- /* Work around bug in UNICOS popen */
- if (iop->fd < 3)
- ret = 0;
- else
-#endif
- /* save these for re-use; don't free the storage */
- if ((iop->flag & IOP_IS_INTERNAL) != 0) {
- iop->off = iop->buf;
- iop->end = iop->buf + strlen(iop->buf);
- iop->cnt = 0;
- iop->secsiz = 0;
- return 0;
- }
-
- /* Don't close standard files or else crufty code elsewhere will lose */
- if (iop->fd == fileno(stdin) ||
- iop->fd == fileno(stdout) ||
- iop->fd == fileno(stderr))
- ret = 0;
- else
- ret = close(iop->fd);
- if (ret == -1)
- warning("close of fd %d failed (%s)", iop->fd, strerror(errno));
- if ((iop->flag & IOP_NO_FREE) == 0) {
- /*
- * be careful -- $0 may still reference the buffer even though
- * an explicit close is being done; in the future, maybe we
- * can do this a bit better
- */
- if (iop->buf) {
- if ((fields_arr[0]->stptr >= iop->buf)
- && (fields_arr[0]->stptr < iop->end)) {
- NODE *t;
-
- t = make_string(fields_arr[0]->stptr,
- fields_arr[0]->stlen);
- unref(fields_arr[0]);
- fields_arr [0] = t;
- reset_record ();
- }
- free(iop->buf);
- }
- free((char *)iop);
- }
- return ret == -1 ? 1 : 0;
-}
-
-void
-do_input()
-{
- IOBUF *iop;
- extern int exiting;
-
- (void) setjmp(filebuf);
-
- while ((iop = nextfile(0)) != NULL) {
- if (inrec(iop) == 0)
- while (interpret(expression_value) && inrec(iop) == 0)
- continue;
- /* recover any space from C based alloca */
- (void) alloca(0);
-
- if (exiting)
- break;
- }
-}
-
-/* Redirection for printf and print commands */
-struct redirect *
-redirect(tree, errflg)
-NODE *tree;
-int *errflg;
-{
- register NODE *tmp;
- register struct redirect *rp;
- register char *str;
- int tflag = 0;
- int outflag = 0;
- const char *direction = "to";
- const char *mode;
- int fd;
- const char *what = NULL;
-
- switch (tree->type) {
- case Node_redirect_append:
- tflag = RED_APPEND;
- /* FALL THROUGH */
- case Node_redirect_output:
- outflag = (RED_FILE|RED_WRITE);
- tflag |= outflag;
- if (tree->type == Node_redirect_output)
- what = ">";
- else
- what = ">>";
- break;
- case Node_redirect_pipe:
- tflag = (RED_PIPE|RED_WRITE);
- what = "|";
- break;
- case Node_redirect_pipein:
- tflag = (RED_PIPE|RED_READ);
- what = "|";
- break;
- case Node_redirect_input:
- tflag = (RED_FILE|RED_READ);
- what = "<";
- break;
- default:
- fatal ("invalid tree type %d in redirect()", tree->type);
- break;
- }
- tmp = tree_eval(tree->subnode);
- if (do_lint && ! (tmp->flags & STR))
- warning("expression in `%s' redirection only has numeric value",
- what);
- tmp = force_string(tmp);
- str = tmp->stptr;
- if (str == NULL || *str == '\0')
- fatal("expression for `%s' redirection has null string value",
- what);
- if (do_lint
- && (STREQN(str, "0", tmp->stlen) || STREQN(str, "1", tmp->stlen)))
- warning("filename `%s' for `%s' redirection may be result of logical expression", str, what);
- for (rp = red_head; rp != NULL; rp = rp->next)
- if (strlen(rp->value) == tmp->stlen
- && STREQN(rp->value, str, tmp->stlen)
- && ((rp->flag & ~(RED_NOBUF|RED_EOF)) == tflag
- || (outflag
- && (rp->flag & (RED_FILE|RED_WRITE)) == outflag)))
- break;
- if (rp == NULL) {
- emalloc(rp, struct redirect *, sizeof(struct redirect),
- "redirect");
- emalloc(str, char *, tmp->stlen+1, "redirect");
- memcpy(str, tmp->stptr, tmp->stlen);
- str[tmp->stlen] = '\0';
- rp->value = str;
- rp->flag = tflag;
- rp->fp = NULL;
- rp->iop = NULL;
- rp->pid = 0; /* unlikely that we're worried about init */
- rp->status = 0;
- /* maintain list in most-recently-used first order */
- if (red_head)
- red_head->prev = rp;
- rp->prev = NULL;
- rp->next = red_head;
- red_head = rp;
- }
- while (rp->fp == NULL && rp->iop == NULL) {
- if (rp->flag & RED_EOF)
- /* encountered EOF on file or pipe -- must be cleared
- * by explicit close() before reading more
- */
- return rp;
- mode = NULL;
- errno = 0;
- switch (tree->type) {
- case Node_redirect_output:
- mode = "w";
- if (rp->flag & RED_USED)
- mode = "a";
- break;
- case Node_redirect_append:
- mode = "a";
- break;
- case Node_redirect_pipe:
- if ((rp->fp = popen(str, "w")) == NULL)
- fatal("can't open pipe (\"%s\") for output (%s)",
- str, strerror(errno));
- rp->flag |= RED_NOBUF;
- break;
- case Node_redirect_pipein:
- direction = "from";
- if (gawk_popen(str, rp) == NULL)
- fatal("can't open pipe (\"%s\") for input (%s)",
- str, strerror(errno));
- break;
- case Node_redirect_input:
- direction = "from";
- rp->iop = iop_open(str, "r");
- break;
- default:
- cant_happen();
- }
- if (mode != NULL) {
- fd = devopen(str, mode);
- if (fd > INVALID_HANDLE) {
- if (fd == fileno(stdin))
- rp->fp = stdin;
- else if (fd == fileno(stdout))
- rp->fp = stdout;
- else if (fd == fileno(stderr))
- rp->fp = stderr;
- else {
- rp->fp = fdopen(fd, (char *) mode);
- /* don't leak file descriptors */
- if (rp->fp == NULL)
- close(fd);
- }
- if (rp->fp != NULL && isatty(fd))
- rp->flag |= RED_NOBUF;
- }
- }
- if (rp->fp == NULL && rp->iop == NULL) {
- /* too many files open -- close one and try again */
- if (errno == EMFILE || errno == ENFILE)
- close_one();
- else {
- /*
- * Some other reason for failure.
- *
- * On redirection of input from a file,
- * just return an error, so e.g. getline
- * can return -1. For output to file,
- * complain. The shell will complain on
- * a bad command to a pipe.
- */
- *errflg = errno;
- if (tree->type == Node_redirect_output
- || tree->type == Node_redirect_append)
- fatal("can't redirect %s `%s' (%s)",
- direction, str, strerror(errno));
- else {
- free_temp(tmp);
- return NULL;
- }
- }
- }
- }
- free_temp(tmp);
- return rp;
-}
-
-static void
-close_one()
-{
- register struct redirect *rp;
- register struct redirect *rplast = NULL;
-
- /* go to end of list first, to pick up least recently used entry */
- for (rp = red_head; rp != NULL; rp = rp->next)
- rplast = rp;
- /* now work back up through the list */
- for (rp = rplast; rp != NULL; rp = rp->prev)
- if (rp->fp && (rp->flag & RED_FILE)) {
- rp->flag |= RED_USED;
- errno = 0;
- if (fclose(rp->fp))
- warning("close of \"%s\" failed (%s).",
- rp->value, strerror(errno));
- rp->fp = NULL;
- break;
- }
- if (rp == NULL)
- /* surely this is the only reason ??? */
- fatal("too many pipes or input files open");
-}
-
-NODE *
-do_close(tree)
-NODE *tree;
-{
- NODE *tmp;
- register struct redirect *rp;
-
- tmp = force_string(tree_eval(tree->subnode));
- for (rp = red_head; rp != NULL; rp = rp->next) {
- if (strlen(rp->value) == tmp->stlen
- && STREQN(rp->value, tmp->stptr, tmp->stlen))
- break;
- }
- free_temp(tmp);
- if (rp == NULL) /* no match */
- return tmp_number((AWKNUM) 0.0);
- fflush(stdout); /* synchronize regular output */
- tmp = tmp_number((AWKNUM)close_redir(rp, 0));
- rp = NULL;
- return tmp;
-}
-
-static int
-close_redir(rp, exitwarn)
-register struct redirect *rp;
-int exitwarn;
-{
- int status = 0;
- char *what;
-
- if (rp == NULL)
- return 0;
- if (rp->fp == stdout || rp->fp == stderr)
- return 0;
- errno = 0;
- if ((rp->flag & (RED_PIPE|RED_WRITE)) == (RED_PIPE|RED_WRITE))
- status = pclose(rp->fp);
- else if (rp->fp)
- status = fclose(rp->fp);
- else if (rp->iop) {
- if (rp->flag & RED_PIPE)
- status = gawk_pclose(rp);
- else {
- status = iop_close(rp->iop);
- rp->iop = NULL;
- }
- }
-
- what = (rp->flag & RED_PIPE) ? "pipe" : "file";
-
- if (exitwarn)
- warning("no explicit close of %s \"%s\" provided",
- what, rp->value);
-
- /* SVR4 awk checks and warns about status of close */
- if (status) {
- char *s = strerror(errno);
-
- warning("failure status (%d) on %s close of \"%s\" (%s)",
- status, what, rp->value, s);
-
- if (! do_unix) {
- /* set ERRNO too so that program can get at it */
- unref(ERRNO_node->var_value);
- ERRNO_node->var_value = make_string(s, strlen(s));
- }
- }
- if (rp->next)
- rp->next->prev = rp->prev;
- if (rp->prev)
- rp->prev->next = rp->next;
- else
- red_head = rp->next;
- free(rp->value);
- free((char *)rp);
- return status;
-}
-
-int
-flush_io ()
-{
- register struct redirect *rp;
- int status = 0;
-
- errno = 0;
- if (fflush(stdout)) {
- warning("error writing standard output (%s).", strerror(errno));
- status++;
- }
- if (fflush(stderr)) {
- warning("error writing standard error (%s).", strerror(errno));
- status++;
- }
- for (rp = red_head; rp != NULL; rp = rp->next)
- /* flush both files and pipes, what the heck */
- if ((rp->flag & RED_WRITE) && rp->fp != NULL) {
- if (fflush(rp->fp)) {
- warning("%s flush of \"%s\" failed (%s).",
- (rp->flag & RED_PIPE) ? "pipe" :
- "file", rp->value, strerror(errno));
- status++;
- }
- }
- return status;
-}
-
-int
-close_io ()
-{
- register struct redirect *rp;
- register struct redirect *next;
- int status = 0;
-
- errno = 0;
- for (rp = red_head; rp != NULL; rp = next) {
- next = rp->next;
- /* close_redir() will print a message if needed */
- /* if do_lint, warn about lack of explicit close */
- if (close_redir(rp, do_lint))
- status++;
- rp = NULL;
- }
- /*
- * Some of the non-Unix os's have problems doing an fclose
- * on stdout and stderr. Since we don't really need to close
- * them, we just flush them, and do that across the board.
- */
- if (fflush(stdout)) {
- warning("error writing standard output (%s).", strerror(errno));
- status++;
- }
- if (fflush(stderr)) {
- warning("error writing standard error (%s).", strerror(errno));
- status++;
- }
- return status;
-}
-
-/* str2mode --- convert a string mode to an integer mode */
-
-static int
-str2mode(mode)
-const char *mode;
-{
- int ret;
-
- switch(mode[0]) {
- case 'r':
- ret = O_RDONLY;
- break;
-
- case 'w':
- ret = O_WRONLY|O_CREAT|O_TRUNC;
- break;
-
- case 'a':
- ret = O_WRONLY|O_APPEND|O_CREAT;
- break;
-
- default:
- ret = 0; /* lint */
- cant_happen();
- }
- return ret;
-}
-
-/* devopen --- handle /dev/std{in,out,err}, /dev/fd/N, regular files */
-
-/*
- * This separate version is still needed for output, since file and pipe
- * output is done with stdio. iop_open() handles input with IOBUFs of
- * more "special" files. Those files are not handled here since it makes
- * no sense to use them for output.
- */
-
-int
-devopen(name, mode)
-const char *name, *mode;
-{
- int openfd = INVALID_HANDLE;
- const char *cp;
- char *ptr;
- int flag = 0;
- struct stat buf;
- extern double strtod();
-
- flag = str2mode(mode);
-
- if (do_unix)
- goto strictopen;
-
-#ifdef VMS
- if ((openfd = vms_devopen(name, flag)) >= 0)
- return openfd;
-#endif /* VMS */
-
- if (STREQ(name, "-"))
- openfd = fileno(stdin);
- else if (STREQN(name, "/dev/", 5) && stat((char *) name, &buf) == -1) {
- cp = name + 5;
-
- if (STREQ(cp, "stdin") && (flag & O_RDONLY) == O_RDONLY)
- openfd = fileno(stdin);
- else if (STREQ(cp, "stdout") && (flag & O_WRONLY) == O_WRONLY)
- openfd = fileno(stdout);
- else if (STREQ(cp, "stderr") && (flag & O_WRONLY) == O_WRONLY)
- openfd = fileno(stderr);
- else if (STREQN(cp, "fd/", 3)) {
- cp += 3;
- openfd = (int)strtod(cp, &ptr);
- if (openfd <= INVALID_HANDLE || ptr == cp)
- openfd = INVALID_HANDLE;
- }
- }
-
-strictopen:
- if (openfd == INVALID_HANDLE)
- openfd = open(name, flag, 0666);
- if (openfd != INVALID_HANDLE && fstat(openfd, &buf) > 0)
- if (S_ISDIR(buf.st_mode))
- fatal("file `%s' is a directory", name);
- return openfd;
-}
-
-
-/* spec_setup --- setup an IOBUF for a special internal file */
-
-static void
-spec_setup(iop, len, allocate)
-IOBUF *iop;
-int len;
-int allocate;
-{
- char *cp;
-
- if (allocate) {
- emalloc(cp, char *, len+2, "spec_setup");
- iop->buf = cp;
- } else {
- len = strlen(iop->buf);
- iop->buf[len++] = '\n'; /* get_a_record clobbered it */
- iop->buf[len] = '\0'; /* just in case */
- }
- iop->off = iop->buf;
- iop->cnt = 0;
- iop->secsiz = 0;
- iop->size = len;
- iop->end = iop->buf + len;
- iop->fd = -1;
- iop->flag = IOP_IS_INTERNAL;
-}
-
-/* specfdopen --- open a fd special file */
-
-static int
-specfdopen(iop, name, mode)
-IOBUF *iop;
-const char *name, *mode;
-{
- int fd;
- IOBUF *tp;
-
- fd = devopen(name, mode);
- if (fd == INVALID_HANDLE)
- return INVALID_HANDLE;
- tp = iop_alloc(fd);
- if (tp == NULL)
- return INVALID_HANDLE;
- *iop = *tp;
- iop->flag |= IOP_NO_FREE;
- free(tp);
- return 0;
-}
-
-/*
- * Following mess will improve in 2.16; this is written to avoid
- * long lines, avoid splitting #if with backslash, and avoid #elif
- * to maximize portability.
- */
-#ifndef GETPGRP_NOARG
-#if defined(__svr4__) || defined(BSD4_4) || defined(_POSIX_SOURCE)
-#define GETPGRP_NOARG
-#else
-#if defined(i860) || defined(_AIX) || defined(hpux) || defined(VMS)
-#define GETPGRP_NOARG
-#else
-#if defined(OS2) || defined(MSDOS) || defined(AMIGA) || defined(atarist)
-#define GETPGRP_NOARG
-#endif
-#endif
-#endif
-#endif
-
-#ifdef GETPGRP_NOARG
-#define getpgrp_ARG /* nothing */
-#else
-#define getpgrp_ARG getpid()
-#endif
-
-/* pidopen --- "open" /dev/pid, /dev/ppid, and /dev/pgrpid */
-
-static int
-pidopen(iop, name, mode)
-IOBUF *iop;
-const char *name, *mode;
-{
- char tbuf[BUFSIZ];
- int i;
-
- if (name[6] == 'g')
- sprintf(tbuf, "%d\n", getpgrp( getpgrp_ARG ));
- else if (name[6] == 'i')
- sprintf(tbuf, "%d\n", getpid());
- else
- sprintf(tbuf, "%d\n", getppid());
- i = strlen(tbuf);
- spec_setup(iop, i, 1);
- strcpy(iop->buf, tbuf);
- return 0;
-}
-
-/* useropen --- "open" /dev/user */
-
-/*
- * /dev/user creates a record as follows:
- * $1 = getuid()
- * $2 = geteuid()
- * $3 = getgid()
- * $4 = getegid()
- * If multiple groups are supported, then $5 through $NF are the
- * supplementary group set.
- */
-
-static int
-useropen(iop, name, mode)
-IOBUF *iop;
-const char *name, *mode;
-{
- char tbuf[BUFSIZ], *cp;
- int i;
-#if defined(NGROUPS_MAX) && NGROUPS_MAX > 0
-#if defined(atarist) || defined(__svr4__) || defined(__osf__) || defined(__FreeBSD__)
- gid_t groupset[NGROUPS_MAX];
-#else
- int groupset[NGROUPS_MAX];
-#endif
- int ngroups;
-#endif
-
- sprintf(tbuf, "%d %d %d %d", getuid(), geteuid(), getgid(), getegid());
-
- cp = tbuf + strlen(tbuf);
-#if defined(NGROUPS_MAX) && NGROUPS_MAX > 0
- ngroups = getgroups(NGROUPS_MAX, groupset);
- if (ngroups == -1)
- fatal("could not find groups: %s", strerror(errno));
-
- for (i = 0; i < ngroups; i++) {
- *cp++ = ' ';
- sprintf(cp, "%d", (int)groupset[i]);
- cp += strlen(cp);
- }
-#endif
- *cp++ = '\n';
- *cp++ = '\0';
-
-
- i = strlen(tbuf);
- spec_setup(iop, i, 1);
- strcpy(iop->buf, tbuf);
- return 0;
-}
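For example, a hypothetical process with real and effective uid 201, real and effective gid 50, and supplementary groups 50 and 100 would read this single record from /dev/user, so $1 is the real uid and $5 onward are the supplementary groups:

    201 201 50 50 50 100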
-
-/* iop_open --- handle special and regular files for input */
-
-static IOBUF *
-iop_open(name, mode)
-const char *name, *mode;
-{
- int openfd = INVALID_HANDLE;
- int flag = 0;
- struct stat buf;
- IOBUF *iop;
- static struct internal {
- const char *name;
- int compare;
- int (*fp) P((IOBUF*,const char *,const char *));
- IOBUF iob;
- } table[] = {
- { "/dev/fd/", 8, specfdopen },
- { "/dev/stdin", 10, specfdopen },
- { "/dev/stdout", 11, specfdopen },
- { "/dev/stderr", 11, specfdopen },
- { "/dev/pid", 8, pidopen },
- { "/dev/ppid", 9, pidopen },
- { "/dev/pgrpid", 11, pidopen },
- { "/dev/user", 9, useropen },
- };
- int devcount = sizeof(table) / sizeof(table[0]);
-
- flag = str2mode(mode);
-
- if (do_unix)
- goto strictopen;
-
- if (STREQ(name, "-"))
- openfd = fileno(stdin);
- else if (STREQN(name, "/dev/", 5) && stat((char *) name, &buf) == -1) {
- int i;
-
- for (i = 0; i < devcount; i++) {
- if (STREQN(name, table[i].name, table[i].compare)) {
- iop = & table[i].iob;
-
- if (iop->buf != NULL) {
- spec_setup(iop, 0, 0);
- return iop;
- } else if ((*table[i].fp)(iop, name, mode) == 0)
- return iop;
- else {
- warning("could not open %s, mode `%s'",
- name, mode);
- return NULL;
- }
- }
- }
- }
-
-strictopen:
- if (openfd == INVALID_HANDLE)
- openfd = open(name, flag, 0666);
- if (openfd != INVALID_HANDLE && fstat(openfd, &buf) > 0)
- if ((buf.st_mode & S_IFMT) == S_IFDIR)
- fatal("file `%s' is a directory", name);
- iop = iop_alloc(openfd);
- return iop;
-}
-
-#ifndef PIPES_SIMULATED
- /* real pipes */
-static int
-wait_any(interesting)
-int interesting; /* pid of interest, if any */
-{
- SIGTYPE (*hstat)(), (*istat)(), (*qstat)();
- int pid;
- int status = 0;
- struct redirect *redp;
- extern int errno;
-
- hstat = signal(SIGHUP, SIG_IGN);
- istat = signal(SIGINT, SIG_IGN);
- qstat = signal(SIGQUIT, SIG_IGN);
- for (;;) {
-#ifdef NeXT
- pid = wait((union wait *)&status);
-#else
- pid = wait(&status);
-#endif /* NeXT */
- if (interesting && pid == interesting) {
- break;
- } else if (pid != -1) {
- for (redp = red_head; redp != NULL; redp = redp->next)
- if (pid == redp->pid) {
- redp->pid = -1;
- redp->status = status;
- if (redp->fp) {
- pclose(redp->fp);
- redp->fp = 0;
- }
- if (redp->iop) {
- (void) iop_close(redp->iop);
- redp->iop = 0;
- }
- break;
- }
- }
- if (pid == -1 && errno == ECHILD)
- break;
- }
- signal(SIGHUP, hstat);
- signal(SIGINT, istat);
- signal(SIGQUIT, qstat);
- return(status);
-}
-
-static IOBUF *
-gawk_popen(cmd, rp)
-char *cmd;
-struct redirect *rp;
-{
- int p[2];
- register int pid;
-
- /* used to wait for any children to synchronize input and output,
- * but this could cause gawk to hang when it is started in a pipeline
-	 * and thus has a child process feeding it input (shell dependent)
- */
- /*(void) wait_any(0);*/ /* wait for outstanding processes */
-
- if (pipe(p) < 0)
- fatal("cannot open pipe \"%s\" (%s)", cmd, strerror(errno));
- if ((pid = fork()) == 0) {
- if (close(1) == -1)
- fatal("close of stdout in child failed (%s)",
- strerror(errno));
- if (dup(p[1]) != 1)
- fatal("dup of pipe failed (%s)", strerror(errno));
- if (close(p[0]) == -1 || close(p[1]) == -1)
- fatal("close of pipe failed (%s)", strerror(errno));
- if (close(0) == -1)
- fatal("close of stdin in child failed (%s)",
- strerror(errno));
- execl("/bin/sh", "sh", "-c", cmd, 0);
- _exit(127);
- }
- if (pid == -1)
- fatal("cannot fork for \"%s\" (%s)", cmd, strerror(errno));
- rp->pid = pid;
- if (close(p[1]) == -1)
- fatal("close of pipe failed (%s)", strerror(errno));
- return (rp->iop = iop_alloc(p[0]));
-}
-
-static int
-gawk_pclose(rp)
-struct redirect *rp;
-{
- (void) iop_close(rp->iop);
- rp->iop = NULL;
-
- /* process previously found, return stored status */
- if (rp->pid == -1)
- return (rp->status >> 8) & 0xFF;
- rp->status = wait_any(rp->pid);
- rp->pid = -1;
- return (rp->status >> 8) & 0xFF;
-}
-
-#else /* PIPES_SIMULATED */
- /* use temporary file rather than pipe */
- /* except if popen() provides real pipes too */
-
-#if defined(VMS) || defined(OS2) || defined (MSDOS)
-static IOBUF *
-gawk_popen(cmd, rp)
-char *cmd;
-struct redirect *rp;
-{
- FILE *current;
-
- if ((current = popen(cmd, "r")) == NULL)
- return NULL;
- return (rp->iop = iop_alloc(fileno(current)));
-}
-
-static int
-gawk_pclose(rp)
-struct redirect *rp;
-{
- int rval, aval, fd = rp->iop->fd;
- FILE *kludge = fdopen(fd, (char *) "r"); /* pclose needs FILE* w/ right fileno */
-
- rp->iop->fd = dup(fd); /* kludge to allow close() + pclose() */
- rval = iop_close(rp->iop);
- rp->iop = NULL;
- aval = pclose(kludge);
- return (rval < 0 ? rval : aval);
-}
-#else /* VMS || OS2 || MSDOS */
-
-static
-struct {
- char *command;
- char *name;
-} pipes[_NFILE];
-
-static IOBUF *
-gawk_popen(cmd, rp)
-char *cmd;
-struct redirect *rp;
-{
- extern char *strdup(const char *);
- int current;
- char *name;
- static char cmdbuf[256];
-
- /* get a name to use. */
- if ((name = tempnam(".", "pip")) == NULL)
- return NULL;
- sprintf(cmdbuf,"%s > %s", cmd, name);
- system(cmdbuf);
- if ((current = open(name,O_RDONLY)) == INVALID_HANDLE)
- return NULL;
- pipes[current].name = name;
- pipes[current].command = strdup(cmd);
- rp->iop = iop_alloc(current);
- return (rp->iop = iop_alloc(current));
-}
-
-static int
-gawk_pclose(rp)
-struct redirect *rp;
-{
- int cur = rp->iop->fd;
- int rval;
-
- rval = iop_close(rp->iop);
- rp->iop = NULL;
-
- /* check for an open file */
- if (pipes[cur].name == NULL)
- return -1;
- unlink(pipes[cur].name);
- free(pipes[cur].name);
- pipes[cur].name = NULL;
- free(pipes[cur].command);
- return rval;
-}
-#endif /* VMS || OS2 || MSDOS */
-
-#endif /* PIPES_SIMULATED */
-
-NODE *
-do_getline(tree)
-NODE *tree;
-{
- struct redirect *rp = NULL;
- IOBUF *iop;
- int cnt = EOF;
- char *s = NULL;
- int errcode;
-
- while (cnt == EOF) {
- if (tree->rnode == NULL) { /* no redirection */
- iop = nextfile(0);
- if (iop == NULL) /* end of input */
- return tmp_number((AWKNUM) 0.0);
- } else {
- int redir_error = 0;
-
- rp = redirect(tree->rnode, &redir_error);
- if (rp == NULL && redir_error) { /* failed redirect */
- if (! do_unix) {
- s = strerror(redir_error);
-
- unref(ERRNO_node->var_value);
- ERRNO_node->var_value =
- make_string(s, strlen(s));
- }
- return tmp_number((AWKNUM) -1.0);
- }
- iop = rp->iop;
- if (iop == NULL) /* end of input */
- return tmp_number((AWKNUM) 0.0);
- }
- errcode = 0;
- cnt = get_a_record(&s, iop, *RS, & errcode);
- if (! do_unix && errcode != 0) {
- s = strerror(errcode);
-
- unref(ERRNO_node->var_value);
- ERRNO_node->var_value = make_string(s, strlen(s));
- return tmp_number((AWKNUM) -1.0);
- }
- if (cnt == EOF) {
- if (rp) {
- /*
- * Don't do iop_close() here if we are
- * reading from a pipe; otherwise
- * gawk_pclose will not be called.
- */
- if (!(rp->flag & RED_PIPE)) {
- (void) iop_close(iop);
- rp->iop = NULL;
- }
- rp->flag |= RED_EOF; /* sticky EOF */
- return tmp_number((AWKNUM) 0.0);
- } else
- continue; /* try another file */
- }
- if (!rp) {
- NR += 1;
- FNR += 1;
- }
- if (tree->lnode == NULL) /* no optional var. */
- set_record(s, cnt, 1);
- else { /* assignment to variable */
- Func_ptr after_assign = NULL;
- NODE **lhs;
-
- lhs = get_lhs(tree->lnode, &after_assign);
- unref(*lhs);
- *lhs = make_string(s, strlen(s));
- (*lhs)->flags |= MAYBE_NUM;
- /* we may have to regenerate $0 here! */
- if (after_assign)
- (*after_assign)();
- }
- }
- return tmp_number((AWKNUM) 1.0);
-}
-
-int
-pathopen (file)
-const char *file;
-{
- int fd = do_pathopen(file);
-
-#ifdef DEFAULT_FILETYPE
- if (! do_unix && fd <= INVALID_HANDLE) {
- char *file_awk;
- int save = errno;
-#ifdef VMS
- int vms_save = vaxc$errno;
-#endif
-
- /* append ".awk" and try again */
- emalloc(file_awk, char *, strlen(file) +
- sizeof(DEFAULT_FILETYPE) + 1, "pathopen");
- sprintf(file_awk, "%s%s", file, DEFAULT_FILETYPE);
- fd = do_pathopen(file_awk);
- free(file_awk);
- if (fd <= INVALID_HANDLE) {
- errno = save;
-#ifdef VMS
- vaxc$errno = vms_save;
-#endif
- }
- }
-#endif /*DEFAULT_FILETYPE*/
-
- return fd;
-}
-
-static int
-do_pathopen (file)
-const char *file;
-{
- static const char *savepath = DEFPATH; /* defined in config.h */
- static int first = 1;
- const char *awkpath;
- char *cp, trypath[BUFSIZ];
- int fd;
-
- if (STREQ(file, "-"))
- return (0);
-
- if (do_unix)
- return (devopen(file, "r"));
-
- if (first) {
- first = 0;
- if ((awkpath = getenv ("AWKPATH")) != NULL && *awkpath)
- savepath = awkpath; /* used for restarting */
- }
- awkpath = savepath;
-
- /* some kind of path name, no search */
-#ifdef VMS /* (strchr not equal implies either or both not NULL) */
- if (strchr(file, ':') != strchr(file, ']')
- || strchr(file, '>') != strchr(file, '/'))
-#else /*!VMS*/
-#if defined(MSDOS) || defined(OS2)
- if (strchr(file, '/') != strchr(file, '\\')
- || strchr(file, ':') != NULL)
-#else
- if (strchr(file, '/') != NULL)
-#endif /*MSDOS*/
-#endif /*VMS*/
- return (devopen(file, "r"));
-
-#if defined(MSDOS) || defined(OS2)
- _searchenv(file, "AWKPATH", trypath);
- if (trypath[0] == '\0')
- _searchenv(file, "PATH", trypath);
- return (trypath[0] == '\0') ? 0 : devopen(trypath, "r");
-#else
- do {
- trypath[0] = '\0';
- /* this should take into account limits on size of trypath */
- for (cp = trypath; *awkpath && *awkpath != ENVSEP; )
- *cp++ = *awkpath++;
-
-		if (cp != trypath) {	/* non-null element in path */
- /* add directory punctuation only if needed */
-#ifdef VMS
- if (strchr(":]>/", *(cp-1)) == NULL)
-#else
-#if defined(MSDOS) || defined(OS2)
- if (strchr(":\\/", *(cp-1)) == NULL)
-#else
- if (*(cp-1) != '/')
-#endif
-#endif
- *cp++ = '/';
- /* append filename */
- strcpy (cp, file);
- } else
- strcpy (trypath, file);
- if ((fd = devopen(trypath, "r")) >= 0)
- return (fd);
-
- /* no luck, keep going */
- if(*awkpath == ENVSEP && awkpath[1] != '\0')
- awkpath++; /* skip colon */
- } while (*awkpath);
- /*
- * You might have one of the awk
- * paths defined, WITHOUT the current working directory in it.
- * Therefore try to open the file in the current directory.
- */
- return (devopen(file, "r"));
-#endif
-}
diff --git a/gnu/usr.bin/awk/iop.c b/gnu/usr.bin/awk/iop.c
deleted file mode 100644
index 6b6a03b..0000000
--- a/gnu/usr.bin/awk/iop.c
+++ /dev/null
@@ -1,321 +0,0 @@
-/*
- * iop.c - do i/o related things.
- */
-
-/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include "awk.h"
-
-#ifndef atarist
-#define INVALID_HANDLE (-1)
-#else
-#include <stddef.h>
-#include <fcntl.h>
-#define INVALID_HANDLE (__SMALLEST_VALID_HANDLE - 1)
-#endif /* atarist */
-
-
-#ifdef TEST
-int bufsize = 8192;
-
-void
-fatal(s)
-char *s;
-{
- printf("%s\n", s);
- exit(1);
-}
-#endif
-
-int
-optimal_bufsize(fd)
-int fd;
-{
- struct stat stb;
-
-#ifdef VMS
- /*
- * These values correspond with the RMS multi-block count used by
- * vms_open() in vms/vms_misc.c.
- */
- if (isatty(fd) > 0)
- return BUFSIZ;
- else if (fstat(fd, &stb) < 0)
- return 8*512; /* conservative in case of DECnet access */
- else
- return 32*512;
-
-#else
- /*
- * System V doesn't have the file system block size in the
- * stat structure. So we have to make some sort of reasonable
- * guess. We use stdio's BUFSIZ, since that is what it was
- * meant for in the first place.
- */
-#ifdef BLKSIZE_MISSING
-#define DEFBLKSIZE BUFSIZ
-#else
-#define DEFBLKSIZE (stb.st_blksize ? stb.st_blksize : BUFSIZ)
-#endif
-
-#ifdef TEST
- return bufsize;
-#else
-#ifndef atarist
- if (isatty(fd))
-#else
- /*
- * On ST redirected stdin does not have a name attached
- * (this could be hard to do, too) and fstat would fail
- */
- if (0 == fd || isatty(fd))
-#endif /*atarist */
- return BUFSIZ;
-#ifndef BLKSIZE_MISSING
- /* VMS POSIX 1.0: st_blksize is never assigned a value, so zero it */
- stb.st_blksize = 0;
-#endif
- if (fstat(fd, &stb) == -1)
- fatal("can't stat fd %d (%s)", fd, strerror(errno));
- if (lseek(fd, (off_t)0, 0) == -1)
- return DEFBLKSIZE;
- return ((int) (stb.st_size < DEFBLKSIZE ? stb.st_size : DEFBLKSIZE));
-#endif /*! TEST */
-#endif /*! VMS */
-}
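
Stripped of its platform ifdefs, optimal_bufsize() above boils down to: use stdio's BUFSIZ for terminals or when nothing better is known, otherwise trust the file system's reported block size. A condensed sketch of that policy under the assumption of a plain POSIX system; the name pick_bufsize and the simplifications are this sketch's, not gawk's:

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* sketch: choose a read-buffer size for a descriptor */
static size_t
pick_bufsize(int fd)
{
	struct stat st;

	if (isatty(fd))
		return BUFSIZ;			/* interactive: small reads are fine */
	if (fstat(fd, &st) == -1)
		return BUFSIZ;			/* can't tell, use stdio's guess */
	if (lseek(fd, (off_t)0, SEEK_CUR) == -1)
		return BUFSIZ;			/* pipe or socket: not seekable */
	return st.st_blksize > 0 ? (size_t)st.st_blksize : BUFSIZ;
}
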
-
-IOBUF *
-iop_alloc(fd)
-int fd;
-{
- IOBUF *iop;
-
- if (fd == INVALID_HANDLE)
- return NULL;
- emalloc(iop, IOBUF *, sizeof(IOBUF), "iop_alloc");
- iop->flag = 0;
- if (isatty(fd))
- iop->flag |= IOP_IS_TTY;
- iop->size = optimal_bufsize(fd);
- iop->secsiz = -2;
- errno = 0;
- iop->fd = fd;
- iop->off = iop->buf = NULL;
- iop->cnt = 0;
- return iop;
-}
-
-/*
- * Get the next record. Uses a "split buffer" where the latter part is
- * the normal read buffer and the head part is an "overflow" area that is used
- * when a record spans the end of the normal buffer, in which case the first
- * part of the record is copied into the overflow area just before the
- * normal buffer. Thus, the eventual full record can be returned as a
- * contiguous area of memory with a minimum of copying. The overflow area
- * is expanded as needed, so that records are unlimited in length.
- * We also mark both the end of the buffer and the end of the read() with
- * a sentinel character (the current record separator) so that the inside
- * loop can run as a single test.
- */
-int
-get_a_record(out, iop, grRS, errcode)
-char **out;
-IOBUF *iop;
-register int grRS;
-int *errcode;
-{
- register char *bp = iop->off;
- char *bufend;
- char *start = iop->off; /* beginning of record */
- char rs;
- int saw_newline = 0, eat_whitespace = 0; /* used iff grRS==0 */
-
- if (iop->cnt == EOF) { /* previous read hit EOF */
- *out = NULL;
- return EOF;
- }
-
- if (grRS == 0) { /* special case: grRS == "" */
- rs = '\n';
- } else
- rs = (char) grRS;
-
- /* set up sentinel */
- if (iop->buf) {
- bufend = iop->buf + iop->size + iop->secsiz;
- *bufend = rs;
- } else
- bufend = NULL;
-
- for (;;) { /* break on end of record, read error or EOF */
-
- /* Following code is entered on the first call of this routine
- * for a new iop, or when we scan to the end of the buffer.
- * In the latter case, we copy the current partial record to
- * the space preceding the normal read buffer. If necessary,
- * we expand this space. This is done so that we can return
- * the record as a contiguous area of memory.
- */
- if ((iop->flag & IOP_IS_INTERNAL) == 0 && bp >= bufend) {
- char *oldbuf = NULL;
- char *oldsplit = iop->buf + iop->secsiz;
- long len; /* record length so far */
-
- len = bp - start;
- if (len > iop->secsiz) {
- /* expand secondary buffer */
- if (iop->secsiz == -2)
- iop->secsiz = 256;
- while (len > iop->secsiz)
- iop->secsiz *= 2;
- oldbuf = iop->buf;
- emalloc(iop->buf, char *,
- iop->size+iop->secsiz+2, "get_a_record");
- bufend = iop->buf + iop->size + iop->secsiz;
- *bufend = rs;
- }
- if (len > 0) {
- char *newsplit = iop->buf + iop->secsiz;
-
- if (start < oldsplit) {
- memcpy(newsplit - len, start,
- oldsplit - start);
- memcpy(newsplit - (bp - oldsplit),
- oldsplit, bp - oldsplit);
- } else
- memcpy(newsplit - len, start, len);
- }
- bp = iop->end = iop->off = iop->buf + iop->secsiz;
- start = bp - len;
- if (oldbuf) {
- free(oldbuf);
- oldbuf = NULL;
- }
- }
- /* Following code is entered whenever we have no more data to
- * scan. In most cases this will read into the beginning of
- * the main buffer, but in some cases (terminal, pipe etc.)
- * we may be doing smallish reads into more advanced positions.
- */
- if (bp >= iop->end) {
- if ((iop->flag & IOP_IS_INTERNAL) != 0) {
- iop->cnt = EOF;
- break;
- }
- iop->cnt = read(iop->fd, iop->end, bufend - iop->end);
- if (iop->cnt == -1) {
- if (! do_unix && errcode != NULL) {
- *errcode = errno;
- iop->cnt = EOF;
- break;
- } else
- fatal("error reading input: %s",
- strerror(errno));
- } else if (iop->cnt == 0) {
- iop->cnt = EOF;
- break;
- }
- iop->end += iop->cnt;
- *iop->end = rs;
- }
- if (grRS == 0) {
- extern int default_FS;
-
- if (default_FS && (bp == start || eat_whitespace)) {
- while (bp < iop->end
- && (*bp == ' ' || *bp == '\t' || *bp == '\n'))
- bp++;
- if (bp == iop->end) {
- eat_whitespace = 1;
- continue;
- } else
- eat_whitespace = 0;
- }
- if (saw_newline && *bp == rs) {
- bp++;
- break;
- }
- saw_newline = 0;
- }
-
- while (*bp++ != rs)
- ;
-
- if (bp <= iop->end) {
- if (grRS == 0)
- saw_newline = 1;
- else
- break;
- } else
- bp--;
-
- if ((iop->flag & IOP_IS_INTERNAL) != 0)
- iop->cnt = bp - start;
- }
- if (iop->cnt == EOF
- && (((iop->flag & IOP_IS_INTERNAL) != 0) || start == bp)) {
- *out = NULL;
- return EOF;
- }
-
- iop->off = bp;
- bp--;
- if (*bp != rs)
- bp++;
- *bp = '\0';
- if (grRS == 0) {
- /* there could be more newlines left, clean 'em out now */
- while (*(iop->off) == rs && iop->off <= iop->end)
- (iop->off)++;
-
- if (*--bp == rs)
- *bp = '\0';
- else
- bp++;
- }
-
- *out = start;
- return bp - start;
-}
-
-#ifdef TEST
-main(argc, argv)
-int argc;
-char *argv[];
-{
- IOBUF *iop;
- char *out;
- int cnt;
- char rs[2];
-
- rs[0] = 0;
- if (argc > 1)
- bufsize = atoi(argv[1]);
- if (argc > 2)
- rs[0] = *argv[2];
- iop = iop_alloc(0);
- while ((cnt = get_a_record(&out, iop, rs[0], NULL)) > 0) {
- fwrite(out, 1, cnt, stdout);
- fwrite(rs, 1, 1, stdout);
- }
-}
-#endif
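
The comment before get_a_record() describes the sentinel trick: the record separator is stored one byte past the valid data, so the inner scan loop runs as a single test with no bounds check. A toy illustration of just that trick, outside gawk's split-buffer machinery; everything here is illustrative, and the real code refills the buffer or reports EOF when the scan stops on the sentinel:

#include <stdio.h>
#include <string.h>

int
main(void)
{
	char buf[64 + 1];			/* one spare byte for the sentinel */
	const char *data = "alpha\nbeta\ngamma";	/* note: no trailing separator */
	size_t n = strlen(data);
	char rs = '\n';
	char *start, *p;

	memcpy(buf, data, n);
	buf[n] = rs;				/* sentinel just past the valid data */

	for (start = p = buf; start < buf + n; start = p) {
		while (*p++ != rs)		/* single-test loop, no bounds check */
			;
		/* real code would refill or report EOF when p lands on the sentinel */
		printf("record: %.*s\n", (int)(p - start - 1), start);
	}
	return 0;
}
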
diff --git a/gnu/usr.bin/awk/main.c b/gnu/usr.bin/awk/main.c
deleted file mode 100644
index 2276460..0000000
--- a/gnu/usr.bin/awk/main.c
+++ /dev/null
@@ -1,823 +0,0 @@
-/*
- * main.c -- Expression tree constructors and main program for gawk.
- */
-
-/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#ifdef __FreeBSD__
-#include <locale.h>
-#endif
-#include "getopt.h"
-#include "awk.h"
-#include "patchlevel.h"
-
-static void usage P((int exitval));
-static void copyleft P((void));
-static void cmdline_fs P((char *str));
-static void init_args P((int argc0, int argc, char *argv0, char **argv));
-static void init_vars P((void));
-static void pre_assign P((char *v));
-SIGTYPE catchsig P((int sig, int code));
-static void gawk_option P((char *optstr));
-static void nostalgia P((void));
-static void version P((void));
-char *gawk_name P((char *filespec));
-
-#ifdef MSDOS
-extern int isatty P((int));
-#endif
-
-extern void resetup P((void));
-
-/* These nodes store all the special variables AWK uses */
-NODE *FS_node, *NF_node, *RS_node, *NR_node;
-NODE *FILENAME_node, *OFS_node, *ORS_node, *OFMT_node;
-NODE *CONVFMT_node;
-NODE *ERRNO_node;
-NODE *FNR_node, *RLENGTH_node, *RSTART_node, *SUBSEP_node;
-NODE *ENVIRON_node, *IGNORECASE_node;
-NODE *ARGC_node, *ARGV_node, *ARGIND_node;
-NODE *FIELDWIDTHS_node;
-
-long NF;
-long NR;
-long FNR;
-int IGNORECASE;
-char *RS;
-char *OFS;
-char *ORS;
-char *OFMT;
-char *CONVFMT;
-
-/*
- * The parse tree and field nodes are stored here. Parse_end is a dummy item
- * used to free up unneeded fields without freeing the program being run
- */
-int errcount = 0; /* error counter, used by yyerror() */
-
-/* The global null string */
-NODE *Nnull_string;
-
-/* The name the program was invoked under, for error messages */
-const char *myname;
-
-/* A block of AWK code to be run before running the program */
-NODE *begin_block = 0;
-
-/* A block of AWK code to be run after the last input file */
-NODE *end_block = 0;
-
-int exiting = 0; /* Was an "exit" statement executed? */
-int exit_val = 0; /* optional exit value */
-
-#if defined(YYDEBUG) || defined(DEBUG)
-extern int yydebug;
-#endif
-
-struct src *srcfiles = NULL; /* source file name(s) */
-int numfiles = -1; /* how many source files */
-
-int do_unix = 0; /* turn off gnu extensions */
-int do_posix = 0; /* turn off gnu and unix extensions */
-int do_lint = 0; /* provide warnings about questionable stuff */
-int do_nostalgia = 0; /* provide a blast from the past */
-
-int in_begin_rule = 0; /* we're in a BEGIN rule */
-int in_end_rule = 0; /* we're in an END rule */
-
-int output_is_tty = 0; /* control flushing of output */
-
-extern char *version_string; /* current version, for printing */
-
-NODE *expression_value;
-
-static struct option optab[] = {
- { "compat", no_argument, & do_unix, 1 },
- { "lint", no_argument, & do_lint, 1 },
- { "posix", no_argument, & do_posix, 1 },
- { "nostalgia", no_argument, & do_nostalgia, 1 },
- { "copyleft", no_argument, NULL, 'C' },
- { "copyright", no_argument, NULL, 'C' },
- { "field-separator", required_argument, NULL, 'F' },
- { "file", required_argument, NULL, 'f' },
- { "assign", required_argument, NULL, 'v' },
- { "version", no_argument, NULL, 'V' },
- { "usage", no_argument, NULL, 'u' },
- { "help", no_argument, NULL, 'u' },
- { "source", required_argument, NULL, 's' },
-#ifdef DEBUG
- { "parsedebug", no_argument, NULL, 'D' },
-#endif
- { 0, 0, 0, 0 }
-};
-
-int
-main(argc, argv)
-int argc;
-char **argv;
-{
- int c;
- char *scan;
- /* the + on the front tells GNU getopt not to rearrange argv */
- const char *optlist = "+F:f:v:W:m:";
- int stopped_early = 0;
- int old_optind;
- extern int optind;
- extern int opterr;
- extern char *optarg;
-
-#ifdef __FreeBSD__
- (void) setlocale(LC_ALL, "");
-#endif
-#ifdef __EMX__
- _response(&argc, &argv);
- _wildcard(&argc, &argv);
- setvbuf(stdout, NULL, _IOLBF, BUFSIZ);
-#endif
-
- (void) signal(SIGFPE, (SIGTYPE (*) P((int))) catchsig);
- (void) signal(SIGSEGV, (SIGTYPE (*) P((int))) catchsig);
-#ifdef SIGBUS
- (void) signal(SIGBUS, (SIGTYPE (*) P((int))) catchsig);
-#endif
-
- myname = gawk_name(argv[0]);
- argv[0] = (char *)myname;
-#ifdef VMS
- vms_arg_fixup(&argc, &argv); /* emulate redirection, expand wildcards */
-#endif
-
- /* remove sccs gunk */
- if (strncmp(version_string, "@(#)", 4) == 0)
- version_string += 4;
-
- if (argc < 2)
- usage(1);
-
- /* initialize the null string */
- Nnull_string = make_string("", 0);
- Nnull_string->numbr = 0.0;
- Nnull_string->type = Node_val;
- Nnull_string->flags = (PERM|STR|STRING|NUM|NUMBER);
-
- /* Set up the special variables */
- /*
- * Note that this must be done BEFORE arg parsing else -F
- * breaks horribly
- */
- init_vars();
-
- /* worst case */
- emalloc(srcfiles, struct src *, argc * sizeof(struct src), "main");
- memset(srcfiles, '\0', argc * sizeof(struct src));
-
- /* Tell the regex routines how they should work. . . */
- resetup();
-
-#ifdef fpsetmask
- fpsetmask(~0xff);
-#endif
- /* we do error messages ourselves on invalid options */
- opterr = 0;
-
- /* option processing. ready, set, go! */
- for (optopt = 0, old_optind = 1;
- (c = getopt_long(argc, argv, optlist, optab, NULL)) != EOF;
- optopt = 0, old_optind = optind) {
- if (do_posix)
- opterr = 1;
- switch (c) {
- case 'F':
- cmdline_fs(optarg);
- break;
-
- case 'f':
- /*
- * a la MKS awk, allow multiple -f options.
- * this makes function libraries real easy.
- * most of the magic is in the scanner.
- */
- /* The following is to allow for whitespace at the end
- * of a #! /bin/gawk line in an executable file
- */
- scan = optarg;
- while (isspace(*scan))
- scan++;
- ++numfiles;
- srcfiles[numfiles].stype = SOURCEFILE;
- if (*scan == '\0')
- srcfiles[numfiles].val = argv[optind++];
- else
- srcfiles[numfiles].val = optarg;
- break;
-
- case 'v':
- pre_assign(optarg);
- break;
-
- case 'm':
- /*
- * Research awk extension.
- * -mf=nnn set # fields, gawk ignores
- * -mr=nnn set record length, ditto
- */
- if (do_lint)
- warning("-m[fr] option irrelevant");
- if ((optarg[0] != 'r' && optarg[0] != 'f')
- || optarg[1] != '=')
- warning("-m option usage: -m[fn]=nnn");
- break;
-
- case 'W': /* gawk specific options */
- gawk_option(optarg);
- break;
-
- /* These can only come from long form options */
- case 'V':
- version();
- break;
-
- case 'C':
- copyleft();
- break;
-
- case 'u':
- usage(0);
- break;
-
- case 's':
- if (optarg[0] == '\0')
- warning("empty argument to --source ignored");
- else {
- srcfiles[++numfiles].stype = CMDLINE;
- srcfiles[numfiles].val = optarg;
- }
- break;
-
-#ifdef DEBUG
- case 'D':
- yydebug = 2;
- break;
-#endif
-
- case 0:
- /*
- * getopt_long found an option that sets a variable
- * instead of returning a letter. Do nothing, just
- * cycle around for the next one.
- */
- break;
-
- case '?':
- default:
- /*
- * New behavior. If not posix, an unrecognized
- * option stops argument processing so that it can
- * go into ARGV for the awk program to see. This
- * makes use of ``#! /bin/gawk -f'' easier.
- *
- * However, it's never simple. If optopt is set,
- * an option that requires an argument didn't get the
- * argument. We care because if opterr is 0, then
- * getopt_long won't print the error message for us.
- */
- if (! do_posix
- && (optopt == 0 || strchr(optlist, optopt) == NULL)) {
- /*
- * can't just do optind--. In case of an
- * option with >=2 letters, getopt_long
- * won't have incremented optind.
- */
- optind = old_optind;
- stopped_early = 1;
- goto out;
- } else if (optopt)
- /* Use 1003.2 required message format */
- fprintf (stderr,
- "%s: option requires an argument -- %c\n",
- myname, optopt);
- /* else
- let getopt print error message for us */
- break;
- }
- }
-out:
-
- if (do_nostalgia)
- nostalgia();
-
- /* check for POSIXLY_CORRECT environment variable */
- if (! do_posix && getenv("POSIXLY_CORRECT") != NULL) {
- do_posix = 1;
- if (do_lint)
- warning(
- "environment variable `POSIXLY_CORRECT' set: turning on --posix");
- }
-
- /* POSIX compliance also implies no Unix extensions either */
- if (do_posix)
- do_unix = 1;
-
-#ifdef DEBUG
- setbuf(stdout, (char *) NULL); /* make debugging easier */
-#endif
- if (isatty(fileno(stdout)))
- output_is_tty = 1;
- /* No -f or --source options, use next arg */
- if (numfiles == -1) {
- if (optind > argc - 1 || stopped_early) /* no args left or no program */
- usage(1);
- srcfiles[++numfiles].stype = CMDLINE;
- srcfiles[numfiles].val = argv[optind];
- optind++;
- }
- init_args(optind, argc, (char *) myname, argv);
- (void) tokexpand();
-
- /* Read in the program */
- if (yyparse() || errcount)
- exit(1);
-
- /* Set up the field variables */
- init_fields();
-
- if (do_lint && begin_block == NULL && expression_value == NULL
- && end_block == NULL)
- warning("no program");
-
- if (begin_block) {
- in_begin_rule = 1;
- (void) interpret(begin_block);
- }
- in_begin_rule = 0;
- if (!exiting && (expression_value || end_block))
- do_input();
- if (end_block) {
- in_end_rule = 1;
- (void) interpret(end_block);
- }
- in_end_rule = 0;
- if (close_io() != 0 && exit_val == 0)
- exit_val = 1;
- exit(exit_val); /* more portable */
- return exit_val; /* to suppress warnings */
-}
-
-/* usage --- print usage information and exit */
-
-static void
-usage(exitval)
-int exitval;
-{
- const char *opt1 = " -f progfile [--]";
-#if defined(MSDOS) || defined(OS2) || defined(VMS)
- const char *opt2 = " [--] \"program\"";
-#else
- const char *opt2 = " [--] 'program'";
-#endif
- const char *regops = " [POSIX or GNU style options]";
-
- fprintf(stderr, "Usage:\t%s%s%s file ...\n\t%s%s%s file ...\n",
- myname, regops, opt1, myname, regops, opt2);
-
- /* GNU long options info. Gack. */
- fputs("POSIX options:\t\tGNU long options:\n", stderr);
- fputs("\t-f progfile\t\t--file=progfile\n", stderr);
- fputs("\t-F fs\t\t\t--field-separator=fs\n", stderr);
- fputs("\t-v var=val\t\t--assign=var=val\n", stderr);
- fputs("\t-m[fr]=val\n", stderr);
- fputs("\t-W compat\t\t--compat\n", stderr);
- fputs("\t-W copyleft\t\t--copyleft\n", stderr);
- fputs("\t-W copyright\t\t--copyright\n", stderr);
- fputs("\t-W help\t\t\t--help\n", stderr);
- fputs("\t-W lint\t\t\t--lint\n", stderr);
-#ifdef NOSTALGIA
- fputs("\t-W nostalgia\t\t--nostalgia\n", stderr);
-#endif
-#ifdef DEBUG
- fputs("\t-W parsedebug\t\t--parsedebug\n", stderr);
-#endif
- fputs("\t-W posix\t\t--posix\n", stderr);
- fputs("\t-W source=program-text\t--source=program-text\n", stderr);
- fputs("\t-W usage\t\t--usage\n", stderr);
- fputs("\t-W version\t\t--version\n", stderr);
- exit(exitval);
-}
-
-static void
-copyleft ()
-{
- static char blurb_part1[] =
-"Copyright (C) 1989, 1991, 1992, Free Software Foundation.\n\
-\n\
-This program is free software; you can redistribute it and/or modify\n\
-it under the terms of the GNU General Public License as published by\n\
-the Free Software Foundation; either version 2 of the License, or\n\
-(at your option) any later version.\n\
-\n";
- static char blurb_part2[] =
-"This program is distributed in the hope that it will be useful,\n\
-but WITHOUT ANY WARRANTY; without even the implied warranty of\n\
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n\
-GNU General Public License for more details.\n\
-\n";
- static char blurb_part3[] =
-"You should have received a copy of the GNU General Public License\n\
-along with this program; if not, write to the Free Software\n\
-Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.\n";
-
- fputs(blurb_part1, stderr);
- fputs(blurb_part2, stderr);
- fputs(blurb_part3, stderr);
- fflush(stderr);
-}
-
-static void
-cmdline_fs(str)
-char *str;
-{
- register NODE **tmp;
- /* int len = strlen(str); *//* don't do that - we want to
- avoid mismatched types */
-
- tmp = get_lhs(FS_node, (Func_ptr *) 0);
- unref(*tmp);
- /*
- * Only if in full compatibility mode check for the stupid special
- * case so -F\t works as documented in awk even though the shell
- * hands us -Ft. Bleah!
- *
- * Thankfully, Posix didn't propagate this "feature".
- */
- if (str[0] == 't' && str[1] == '\0') {
- if (do_lint)
- warning("-Ft does not set FS to tab in POSIX awk");
- if (do_unix && ! do_posix)
- str[0] = '\t';
- }
- *tmp = make_str_node(str, strlen(str), SCAN); /* do process escapes */
- set_FS();
-}
-
-static void
-init_args(argc0, argc, argv0, argv)
-int argc0, argc;
-char *argv0;
-char **argv;
-{
- int i, j;
- NODE **aptr;
-
- ARGV_node = install("ARGV", node(Nnull_string, Node_var, (NODE *)NULL));
- aptr = assoc_lookup(ARGV_node, tmp_number(0.0));
- *aptr = make_string(argv0, strlen(argv0));
- (*aptr)->flags |= MAYBE_NUM;
- for (i = argc0, j = 1; i < argc; i++) {
- aptr = assoc_lookup(ARGV_node, tmp_number((AWKNUM) j));
- *aptr = make_string(argv[i], strlen(argv[i]));
- (*aptr)->flags |= MAYBE_NUM;
- j++;
- }
- ARGC_node = install("ARGC",
- node(make_number((AWKNUM) j), Node_var, (NODE *) NULL));
-}
-
-/*
- * Set all the special variables to their initial values.
- */
-struct varinit {
- NODE **spec;
- const char *name;
- NODETYPE type;
- const char *strval;
- AWKNUM numval;
- Func_ptr assign;
-};
-static struct varinit varinit[] = {
-{&NF_node, "NF", Node_NF, 0, -1, set_NF },
-{&FIELDWIDTHS_node, "FIELDWIDTHS", Node_FIELDWIDTHS, "", 0, 0 },
-{&NR_node, "NR", Node_NR, 0, 0, set_NR },
-{&FNR_node, "FNR", Node_FNR, 0, 0, set_FNR },
-{&FS_node, "FS", Node_FS, " ", 0, 0 },
-{&RS_node, "RS", Node_RS, "\n", 0, set_RS },
-{&IGNORECASE_node, "IGNORECASE", Node_IGNORECASE, 0, 0, set_IGNORECASE },
-{&FILENAME_node, "FILENAME", Node_var, "", 0, 0 },
-{&OFS_node, "OFS", Node_OFS, " ", 0, set_OFS },
-{&ORS_node, "ORS", Node_ORS, "\n", 0, set_ORS },
-{&OFMT_node, "OFMT", Node_OFMT, "%.6g", 0, set_OFMT },
-{&CONVFMT_node, "CONVFMT", Node_CONVFMT, "%.6g", 0, set_CONVFMT },
-{&RLENGTH_node, "RLENGTH", Node_var, 0, 0, 0 },
-{&RSTART_node, "RSTART", Node_var, 0, 0, 0 },
-{&SUBSEP_node, "SUBSEP", Node_var, "\034", 0, 0 },
-{&ARGIND_node, "ARGIND", Node_var, 0, 0, 0 },
-{&ERRNO_node, "ERRNO", Node_var, 0, 0, 0 },
-{0, 0, Node_illegal, 0, 0, 0 },
-};
-
-static void
-init_vars()
-{
- register struct varinit *vp;
-
- for (vp = varinit; vp->name; vp++) {
- *(vp->spec) = install((char *) vp->name,
- node(vp->strval == 0 ? make_number(vp->numval)
- : make_string((char *) vp->strval,
- strlen(vp->strval)),
- vp->type, (NODE *) NULL));
- if (vp->assign)
- (*(vp->assign))();
- }
-}
-
-void
-load_environ()
-{
-#if !defined(MSDOS) && !defined(OS2) && !(defined(VMS) && defined(__DECC))
- extern char **environ;
-#endif
- register char *var, *val;
- NODE **aptr;
- register int i;
-
- ENVIRON_node = install("ENVIRON",
- node(Nnull_string, Node_var, (NODE *) NULL));
- for (i = 0; environ[i]; i++) {
- static char nullstr[] = "";
-
- var = environ[i];
- val = strchr(var, '=');
- if (val)
- *val++ = '\0';
- else
- val = nullstr;
- aptr = assoc_lookup(ENVIRON_node, tmp_string(var, strlen (var)));
- *aptr = make_string(val, strlen (val));
- (*aptr)->flags |= MAYBE_NUM;
-
- /* restore '=' so that system() gets a valid environment */
- if (val != nullstr)
- *--val = '=';
- }
-}
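
load_environ() splits each "NAME=value" entry in place, installs the two halves into ENVIRON, and then restores the '=' so a later system() or exec still sees a valid environment. The same split-and-restore idiom in isolation; only the printing is this sketch's addition:

#include <stdio.h>
#include <string.h>

extern char **environ;

static void
walk_environ(void)
{
	int i;

	for (i = 0; environ[i] != NULL; i++) {
		char *var = environ[i];
		char *val = strchr(var, '=');

		if (val == NULL)
			continue;		/* malformed entry, ignore it */
		*val++ = '\0';			/* split: terminate the name */
		printf("%s -> %s\n", var, val);	/* sketch: just print the pair */
		*--val = '=';			/* restore the original string */
	}
}
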
-
-/* Process a command-line assignment */
-char *
-arg_assign(arg)
-char *arg;
-{
- char *cp, *cp2;
- int badvar;
- Func_ptr after_assign = NULL;
- NODE *var;
- NODE *it;
- NODE **lhs;
-
- cp = strchr(arg, '=');
- if (cp != NULL) {
- *cp++ = '\0';
- /* first check that the variable name has valid syntax */
- badvar = 0;
- if (! isalpha(arg[0]) && arg[0] != '_')
- badvar = 1;
- else
- for (cp2 = arg+1; *cp2; cp2++)
- if (! isalnum(*cp2) && *cp2 != '_') {
- badvar = 1;
- break;
- }
- if (badvar)
- fatal("illegal name `%s' in variable assignment", arg);
-
- /*
- * Recent versions of nawk expand escapes inside assignments.
- * This makes sense, so we do it too.
- */
- it = make_str_node(cp, strlen(cp), SCAN);
- it->flags |= MAYBE_NUM;
- var = variable(arg, 0);
- lhs = get_lhs(var, &after_assign);
- unref(*lhs);
- *lhs = it;
- if (after_assign)
- (*after_assign)();
- *--cp = '='; /* restore original text of ARGV */
- }
- return cp;
-}
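
arg_assign() accepts an assignment only when the text left of '=' is a legal awk identifier: a letter or underscore followed by letters, digits or underscores. That check on its own, free of gawk's node handling; the function name is this sketch's:

#include <ctype.h>
#include <string.h>

/* sketch: is arg of the form name=value with a legal awk identifier? */
static int
valid_assignment(const char *arg)
{
	const char *eq = strchr(arg, '=');
	const char *p;

	if (eq == NULL || eq == arg)
		return 0;			/* no '=' at all, or empty name */
	if (!isalpha((unsigned char)arg[0]) && arg[0] != '_')
		return 0;			/* must start with a letter or '_' */
	for (p = arg + 1; p < eq; p++)
		if (!isalnum((unsigned char)*p) && *p != '_')
			return 0;		/* letters, digits, '_' only */
	return 1;
}
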
-
-static void
-pre_assign(v)
-char *v;
-{
- if (!arg_assign(v)) {
- fprintf (stderr,
- "%s: '%s' argument to -v not in 'var=value' form\n",
- myname, v);
- usage(1);
- }
-}
-
-SIGTYPE
-catchsig(sig, code)
-int sig, code;
-{
-#ifdef lint
- code = 0; sig = code; code = sig;
-#endif
- if (sig == SIGFPE) {
- fatal("floating point exception");
- } else if (sig == SIGSEGV
-#ifdef SIGBUS
- || sig == SIGBUS
-#endif
- ) {
- msg("fatal error: internal error");
- /* fatal won't abort() if not compiled for debugging */
- abort();
- } else
- cant_happen();
- /* NOTREACHED */
-}
-
-/* gawk_option --- do gawk specific things */
-
-static void
-gawk_option(optstr)
-char *optstr;
-{
- char *cp;
-
- for (cp = optstr; *cp; cp++) {
- switch (*cp) {
- case ' ':
- case '\t':
- case ',':
- break;
- case 'v':
- case 'V':
- /* print version */
- if (strncasecmp(cp, "version", 7) != 0)
- goto unknown;
- else
- cp += 6;
- version();
- break;
- case 'c':
- case 'C':
- if (strncasecmp(cp, "copyright", 9) == 0) {
- cp += 8;
- copyleft();
- } else if (strncasecmp(cp, "copyleft", 8) == 0) {
- cp += 7;
- copyleft();
- } else if (strncasecmp(cp, "compat", 6) == 0) {
- cp += 5;
- do_unix = 1;
- } else
- goto unknown;
- break;
- case 'n':
- case 'N':
- /*
- * Undocumented feature,
- * inspired by nostalgia, and a T-shirt
- */
- if (strncasecmp(cp, "nostalgia", 9) != 0)
- goto unknown;
- nostalgia();
- break;
- case 'p':
- case 'P':
-#ifdef DEBUG
- if (strncasecmp(cp, "parsedebug", 10) == 0) {
- cp += 9;
- yydebug = 2;
- break;
- }
-#endif
- if (strncasecmp(cp, "posix", 5) != 0)
- goto unknown;
- cp += 4;
- do_posix = do_unix = 1;
- break;
- case 'l':
- case 'L':
- if (strncasecmp(cp, "lint", 4) != 0)
- goto unknown;
- cp += 3;
- do_lint = 1;
- break;
- case 'H':
- case 'h':
- if (strncasecmp(cp, "help", 4) != 0)
- goto unknown;
- cp += 3;
- usage(0);
- break;
- case 'U':
- case 'u':
- if (strncasecmp(cp, "usage", 5) != 0)
- goto unknown;
- cp += 4;
- usage(0);
- break;
- case 's':
- case 'S':
- if (strncasecmp(cp, "source=", 7) != 0)
- goto unknown;
- cp += 7;
- if (cp[0] == '\0')
- warning("empty argument to -Wsource ignored");
- else {
- srcfiles[++numfiles].stype = CMDLINE;
- srcfiles[numfiles].val = cp;
- return;
- }
- break;
- default:
- unknown:
- fprintf(stderr, "'%c' -- unknown option, ignored\n",
- *cp);
- break;
- }
- }
-}
-
-/* nostalgia --- print the famous error message and die */
-
-static void
-nostalgia()
-{
- fprintf(stderr, "awk: bailing out near line 1\n");
- abort();
-}
-
-/* version --- print version message */
-
-static void
-version()
-{
- fprintf(stderr, "%s, patchlevel %d\n", version_string, PATCHLEVEL);
- /* per GNU coding standards, exit successfully, do nothing else */
- exit(0);
-}
-
-/* this mess will improve in 2.16 */
-char *
-gawk_name(filespec)
-char *filespec;
-{
- char *p;
-
-#ifdef VMS /* "device:[root.][directory.subdir]GAWK.EXE;n" -> "GAWK" */
- char *q;
-
- p = strrchr(filespec, ']'); /* directory punctuation */
- q = strrchr(filespec, '>'); /* alternate <international> punct */
-
- if (p == NULL || q > p) p = q;
- p = strdup(p == NULL ? filespec : (p + 1));
- if ((q = strrchr(p, '.')) != NULL) *q = '\0'; /* strip .typ;vers */
-
- return p;
-#endif /*VMS*/
-
-#if defined(MSDOS) || defined(OS2) || defined(atarist)
- char *q;
-
- for (p = filespec; (p = strchr(p, '\\')); *p = '/')
- ;
- p = filespec;
- if ((q = strrchr(p, '/')))
- p = q + 1;
- if ((q = strchr(p, '.')))
- *q = '\0';
- strlwr(p);
-
- return (p == NULL ? filespec : p);
-#endif /* MSDOS || atarist */
-
- /* "path/name" -> "name" */
- p = strrchr(filespec, '/');
- return (p == NULL ? filespec : p + 1);
-}
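
Much of main() above is the getopt_long() loop driven by the optab[] table, with the leading '+' in the option string keeping GNU getopt from permuting argv. A minimal, self-contained sketch of that pattern; the option names and behaviour below are invented for illustration and are not gawk's:

#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	static int be_strict = 0;		/* set directly by a long option */
	static struct option longopts[] = {
		{ "strict",  no_argument,       &be_strict, 1 },
		{ "file",    required_argument, NULL,       'f' },
		{ "version", no_argument,       NULL,       'V' },
		{ 0, 0, 0, 0 }
	};
	int c;

	/* the leading '+' stops scanning at the first non-option argument */
	while ((c = getopt_long(argc, argv, "+f:V", longopts, NULL)) != -1) {
		switch (c) {
		case 'f':
			printf("program file: %s\n", optarg);
			break;
		case 'V':
			printf("version 0.0\n");
			exit(0);
		case 0:				/* a flag-setting long option */
			break;
		default:
			fprintf(stderr, "usage: %s [--strict] [-f file]\n", argv[0]);
			exit(1);
		}
	}
	printf("strict mode: %d\n", be_strict);
	return 0;
}
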
diff --git a/gnu/usr.bin/awk/msg.c b/gnu/usr.bin/awk/msg.c
deleted file mode 100644
index 4244fd3..0000000
--- a/gnu/usr.bin/awk/msg.c
+++ /dev/null
@@ -1,107 +0,0 @@
-/*
- * msg.c - routines for error messages
- */
-
-/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2, or (at your option)
- * any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include "awk.h"
-
-int sourceline = 0;
-char *source = NULL;
-
-/* VARARGS2 */
-void
-err(s, emsg, argp)
-const char *s;
-const char *emsg;
-va_list argp;
-{
- char *file;
-
- (void) fflush(stdout);
- (void) fprintf(stderr, "%s: ", myname);
- if (sourceline) {
- if (source)
- (void) fprintf(stderr, "%s:", source);
- else
- (void) fprintf(stderr, "cmd. line:");
-
- (void) fprintf(stderr, "%d: ", sourceline);
- }
- if (FNR) {
- file = FILENAME_node->var_value->stptr;
- (void) putc('(', stderr);
- if (file)
- (void) fprintf(stderr, "FILENAME=%s ", file);
- (void) fprintf(stderr, "FNR=%ld) ", FNR);
- }
- (void) fprintf(stderr, s);
- vfprintf(stderr, emsg, argp);
- (void) fprintf(stderr, "\n");
- (void) fflush(stderr);
-}
-
-/*VARARGS0*/
-void
-msg(va_alist)
-va_dcl
-{
- va_list args;
- char *mesg;
-
- va_start(args);
- mesg = va_arg(args, char *);
- err("", mesg, args);
- va_end(args);
-}
-
-/*VARARGS0*/
-void
-warning(va_alist)
-va_dcl
-{
- va_list args;
- char *mesg;
-
- va_start(args);
- mesg = va_arg(args, char *);
- err("warning: ", mesg, args);
- va_end(args);
-}
-
-/*VARARGS0*/
-void
-fatal(va_alist)
-va_dcl
-{
- va_list args;
- char *mesg;
-
- va_start(args);
- mesg = va_arg(args, char *);
- err("fatal: ", mesg, args);
- va_end(args);
-#ifdef DEBUG
- abort();
-#endif
- exit(2);
-}
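
msg.c is written against the old <varargs.h> interface (va_dcl, va_alist), which predates ANSI C. For comparison, the same prefix-plus-vfprintf pattern using C89's <stdarg.h> might look roughly like this; the names report, warning2 and fatal2 are this sketch's, not gawk's:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* sketch: print "prefix" then the formatted message on stderr */
static void
report(const char *prefix, const char *fmt, va_list ap)
{
	fflush(stdout);				/* keep messages ordered with output */
	fprintf(stderr, "%s", prefix);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
}

static void
warning2(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	report("warning: ", fmt, ap);
	va_end(ap);
}

static void
fatal2(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	report("fatal: ", fmt, ap);
	va_end(ap);
	exit(2);
}
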
diff --git a/gnu/usr.bin/awk/node.c b/gnu/usr.bin/awk/node.c
deleted file mode 100644
index 748028b..0000000
--- a/gnu/usr.bin/awk/node.c
+++ /dev/null
@@ -1,463 +0,0 @@
-/*
- * node.c -- routines for node management
- */
-
-/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include "awk.h"
-
-extern double strtod();
-
-AWKNUM
-r_force_number(n)
-register NODE *n;
-{
- register char *cp;
- register char *cpend;
- char save;
- char *ptr;
- unsigned int newflags;
-
-#ifdef DEBUG
- if (n == NULL)
- cant_happen();
- if (n->type != Node_val)
- cant_happen();
- if(n->flags == 0)
- cant_happen();
- if (n->flags & NUM)
- return n->numbr;
-#endif
-
- /* all the conditionals are an attempt to avoid the expensive strtod */
-
- n->numbr = 0.0;
- n->flags |= NUM;
-
- if (n->stlen == 0)
- return 0.0;
-
- cp = n->stptr;
- if (isalpha(*cp))
- return 0.0;
-
- cpend = cp + n->stlen;
- while (cp < cpend && isspace(*cp))
- cp++;
- if (cp == cpend || isalpha(*cp))
- return 0.0;
-
- if (n->flags & MAYBE_NUM) {
- newflags = NUMBER;
- n->flags &= ~MAYBE_NUM;
- } else
- newflags = 0;
- if (cpend - cp == 1) {
- if (isdigit(*cp)) {
- n->numbr = (AWKNUM)(*cp - '0');
- n->flags |= newflags;
- }
- return n->numbr;
- }
-
- errno = 0;
- save = *cpend;
- *cpend = '\0';
- n->numbr = (AWKNUM) strtod((const char *)cp, &ptr);
-
- /* POSIX says trailing space is OK for NUMBER */
- while (isspace(*ptr))
- ptr++;
- *cpend = save;
- /* the >= should be ==, but for SunOS 3.5 strtod() */
- if (errno == 0 && ptr >= cpend)
- n->flags |= newflags;
- else
- errno = 0;
-
- return n->numbr;
-}
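
The conditionals in r_force_number() are mostly there to avoid strtod(): empty strings, leading letters and single digits are all decided without the library call. The same idea in miniature; a sketch only, which ignores gawk's flag bookkeeping and assumes a NUL-terminated string:

#include <ctype.h>
#include <stdlib.h>

/* sketch: string to number, dodging strtod when a cheap test settles it */
static double
cheap_number(const char *s, size_t len)
{
	char *end;

	if (len == 0 || isalpha((unsigned char)s[0]))
		return 0.0;			/* clearly not a number */
	if (len == 1 && isdigit((unsigned char)s[0]))
		return (double)(s[0] - '0');	/* lone digit: skip strtod */
	return strtod(s, &end);			/* general case */
}
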
-
-/*
- * the following lookup table is used as an optimization in force_string
- * (more complicated) variations on this theme didn't seem to pay off, but
- * systematic testing might be in order at some point
- */
-static const char *values[] = {
- "0",
- "1",
- "2",
- "3",
- "4",
- "5",
- "6",
- "7",
- "8",
- "9",
-};
-#define NVAL (sizeof(values)/sizeof(values[0]))
-
-NODE *
-r_force_string(s)
-register NODE *s;
-{
- char buf[128];
- register char *sp = buf;
- double val;
-
-#ifdef DEBUG
- if (s == NULL) cant_happen();
- if (s->type != Node_val) cant_happen();
- if (s->flags & STR) return s;
- if (!(s->flags & NUM)) cant_happen();
- if (s->stref != 0) ; /*cant_happen();*/
-#endif
-
- /* not an integral value, or out of range */
- if ((val = double_to_int(s->numbr)) != s->numbr
- || val < LONG_MIN || val > LONG_MAX) {
-#ifdef GFMT_WORKAROUND
- NODE *dummy, *r;
- unsigned short oflags;
- extern NODE *format_tree P((const char *, int, NODE *));
- extern NODE **fmt_list; /* declared in eval.c */
-
- /* create dummy node for a sole use of format_tree */
- getnode(dummy);
- dummy->lnode = s;
- dummy->rnode = NULL;
- oflags = s->flags;
- s->flags |= PERM; /* prevent from freeing by format_tree() */
- r = format_tree(CONVFMT, fmt_list[CONVFMTidx]->stlen, dummy);
- s->flags = oflags;
- s->stfmt = (char)CONVFMTidx;
- s->stlen = r->stlen;
- s->stptr = r->stptr;
- freenode(r); /* Do not free_temp(r)! We want */
-	freenode(dummy);	/* to keep s->stptr == r->stptr. */
-
- goto no_malloc;
-#else
- /*
- * no need for a "replacement" formatting by gawk,
- * just use sprintf
- */
- sprintf(sp, CONVFMT, s->numbr);
- s->stlen = strlen(sp);
- s->stfmt = (char)CONVFMTidx;
-#endif /* GFMT_WORKAROUND */
- } else {
- /* integral value */
- /* force conversion to long only once */
- register long num = (long) val;
- if (num < NVAL && num >= 0) {
- sp = (char *) values[num];
- s->stlen = 1;
- } else {
- (void) sprintf(sp, "%ld", num);
- s->stlen = strlen(sp);
- }
- s->stfmt = -1;
- }
- emalloc(s->stptr, char *, s->stlen + 2, "force_string");
- memcpy(s->stptr, sp, s->stlen+1);
-no_malloc:
- s->stref = 1;
- s->flags |= STR;
- return s;
-}
-
-/*
- * Duplicate a node. (For strings, "duplicate" means crank up the
- * reference count.)
- */
-NODE *
-dupnode(n)
-NODE *n;
-{
- register NODE *r;
-
- if (n->flags & TEMP) {
- n->flags &= ~TEMP;
- n->flags |= MALLOC;
- return n;
- }
- if ((n->flags & (MALLOC|STR)) == (MALLOC|STR)) {
- if (n->stref < 255)
- n->stref++;
- return n;
- }
- getnode(r);
- *r = *n;
- r->flags &= ~(PERM|TEMP);
- r->flags |= MALLOC;
- if (n->type == Node_val && (n->flags & STR)) {
- r->stref = 1;
- emalloc(r->stptr, char *, r->stlen + 2, "dupnode");
- memcpy(r->stptr, n->stptr, r->stlen);
- r->stptr[r->stlen] = '\0';
- }
- return r;
-}
-
-/* this allocates a node with defined numbr */
-NODE *
-mk_number(x, flags)
-AWKNUM x;
-unsigned int flags;
-{
- register NODE *r;
-
- getnode(r);
- r->type = Node_val;
- r->numbr = x;
- r->flags = flags;
-#ifdef DEBUG
- r->stref = 1;
- r->stptr = 0;
- r->stlen = 0;
-#endif
- return r;
-}
-
-/*
- * Make a string node.
- */
-NODE *
-make_str_node(s, len, flags)
-char *s;
-size_t len;
-int flags;
-{
- register NODE *r;
-
- getnode(r);
- r->type = Node_val;
- r->flags = (STRING|STR|MALLOC);
- if (flags & ALREADY_MALLOCED)
- r->stptr = s;
- else {
- emalloc(r->stptr, char *, len + 2, s);
- memcpy(r->stptr, s, len);
- }
- r->stptr[len] = '\0';
-
- if (flags & SCAN) { /* scan for escape sequences */
- char *pf;
- register char *ptm;
- register int c;
- register char *end;
-
- end = &(r->stptr[len]);
- for (pf = ptm = r->stptr; pf < end;) {
- c = *pf++;
- if (c == '\\') {
- c = parse_escape(&pf);
- if (c < 0) {
- if (do_lint)
- warning("backslash at end of string");
- c = '\\';
- }
- *ptm++ = c;
- } else
- *ptm++ = c;
- }
- len = ptm - r->stptr;
- erealloc(r->stptr, char *, len + 1, "make_str_node");
- r->stptr[len] = '\0';
- r->flags |= PERM;
- }
- r->stlen = len;
- r->stref = 1;
- r->stfmt = -1;
-
- return r;
-}
-
-NODE *
-tmp_string(s, len)
-char *s;
-size_t len;
-{
- register NODE *r;
-
- r = make_string(s, len);
- r->flags |= TEMP;
- return r;
-}
-
-
-#define NODECHUNK 100
-
-NODE *nextfree = NULL;
-
-NODE *
-more_nodes()
-{
- register NODE *np;
-
- /* get more nodes and initialize list */
- emalloc(nextfree, NODE *, NODECHUNK * sizeof(NODE), "newnode");
- for (np = nextfree; np < &nextfree[NODECHUNK - 1]; np++) {
- np->flags = 0;
- np->nextp = np + 1;
- }
- np->nextp = NULL;
- np = nextfree;
- nextfree = nextfree->nextp;
- return np;
-}
-
-#ifdef DEBUG
-void
-freenode(it)
-NODE *it;
-{
-#ifdef MPROF
- it->stref = 0;
- free((char *) it);
-#else /* not MPROF */
- /* add it to head of freelist */
- it->nextp = nextfree;
- nextfree = it;
-#endif /* not MPROF */
-}
-#endif /* DEBUG */
-
-void
-unref(tmp)
-register NODE *tmp;
-{
- if (tmp == NULL)
- return;
- if (tmp->flags & PERM)
- return;
- if (tmp->flags & (MALLOC|TEMP)) {
- tmp->flags &= ~TEMP;
- if (tmp->flags & STR) {
- if (tmp->stref > 1) {
- if (tmp->stref != 255)
- tmp->stref--;
- return;
- }
- free(tmp->stptr);
- }
- freenode(tmp);
- }
-}
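
more_nodes() and freenode() above implement a chunked free list: NODECHUNK nodes are allocated at once, threaded together through nextp, and freed nodes are pushed back onto the head of the list. A stripped-down version of that allocator for a generic node type; all names here are invented for illustration:

#include <stdlib.h>

#define CHUNK 100				/* how many nodes to grab at once */

struct fnode {
	struct fnode *next;
	/* payload fields would go here */
};

static struct fnode *freelist = NULL;

static struct fnode *
node_alloc(void)
{
	struct fnode *np;

	if (freelist == NULL) {
		struct fnode *chunk = malloc(CHUNK * sizeof(*chunk));
		int i;

		if (chunk == NULL)
			return NULL;
		for (i = 0; i < CHUNK - 1; i++)	/* thread the chunk together */
			chunk[i].next = &chunk[i + 1];
		chunk[CHUNK - 1].next = NULL;
		freelist = chunk;
	}
	np = freelist;
	freelist = freelist->next;
	return np;
}

static void
node_free(struct fnode *np)
{
	np->next = freelist;			/* push onto the head of the list */
	freelist = np;
}
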
-
-/*
- * Parse a C escape sequence. STRING_PTR points to a variable containing a
- * pointer to the string to parse. That pointer is updated past the
- * characters we use. The value of the escape sequence is returned.
- *
- * A negative value means the sequence \ newline was seen, which is supposed to
- * be equivalent to nothing at all.
- *
- * If \ is followed by a null character, we return a negative value and leave
- * the string pointer pointing at the null character.
- *
- * If \ is followed by 000, we return 0 and leave the string pointer after the
- * zeros. A value of 0 does not mean end of string.
- *
- * Posix doesn't allow \x.
- */
-
-int
-parse_escape(string_ptr)
-char **string_ptr;
-{
- register int c = *(*string_ptr)++;
- register int i;
- register int count;
-
- switch (c) {
- case 'a':
- return BELL;
- case 'b':
- return '\b';
- case 'f':
- return '\f';
- case 'n':
- return '\n';
- case 'r':
- return '\r';
- case 't':
- return '\t';
- case 'v':
- return '\v';
- case '\n':
- return -2;
- case 0:
- (*string_ptr)--;
- return -1;
- case '0':
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- i = c - '0';
- count = 0;
- while (++count < 3) {
- if ((c = *(*string_ptr)++) >= '0' && c <= '7') {
- i *= 8;
- i += c - '0';
- } else {
- (*string_ptr)--;
- break;
- }
- }
- return i;
- case 'x':
- if (do_lint) {
- static int didwarn;
-
- if (! didwarn) {
- didwarn = 1;
- warning("Posix does not allow \"\\x\" escapes");
- }
- }
- if (do_posix)
- return ('x');
- i = 0;
- while (1) {
- if (isxdigit((c = *(*string_ptr)++))) {
- i *= 16;
- if (isdigit(c))
- i += c - '0';
- else if (isupper(c))
- i += c - 'A' + 10;
- else
- i += c - 'a' + 10;
- } else {
- (*string_ptr)--;
- break;
- }
- }
- return i;
- default:
- return c;
- }
-}
diff --git a/gnu/usr.bin/awk/patchlevel.h b/gnu/usr.bin/awk/patchlevel.h
deleted file mode 100644
index c80ca15..0000000
--- a/gnu/usr.bin/awk/patchlevel.h
+++ /dev/null
@@ -1 +0,0 @@
-#define PATCHLEVEL 5
diff --git a/gnu/usr.bin/awk/protos.h b/gnu/usr.bin/awk/protos.h
deleted file mode 100644
index 62e933a..0000000
--- a/gnu/usr.bin/awk/protos.h
+++ /dev/null
@@ -1,122 +0,0 @@
-/*
- * protos.h -- function prototypes for when the headers don't have them.
- */
-
-/*
- * Copyright (C) 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#ifdef __STDC__
-#define aptr_t void * /* arbitrary pointer type */
-#else
-#define aptr_t char *
-#endif
-extern aptr_t malloc P((MALLOC_ARG_T));
-extern aptr_t realloc P((aptr_t, MALLOC_ARG_T));
-extern aptr_t calloc P((MALLOC_ARG_T, MALLOC_ARG_T));
-
-#if !defined(sun) && !defined(__sun__)
-extern void free P((aptr_t));
-#endif
-extern char *getenv P((const char *));
-
-extern char *strcpy P((char *, const char *));
-extern char *strcat P((char *, const char *));
-extern int strcmp P((const char *, const char *));
-extern char *strncpy P((char *, const char *, size_t));
-extern int strncmp P((const char *, const char *, size_t));
-#ifndef VMS
-extern char *strerror P((int));
-#else
-extern char *strerror P((int,...));
-#endif
-extern char *strchr P((const char *, int));
-extern char *strrchr P((const char *, int));
-extern char *strstr P((const char *s1, const char *s2));
-extern size_t strlen P((const char *));
-extern long strtol P((const char *, char **, int));
-#if !defined(_MSC_VER) && !defined(__GNU_LIBRARY__)
-extern size_t strftime P((char *, size_t, const char *, const struct tm *));
-#endif
-#ifdef __STDC__
-extern time_t time P((time_t *));
-#else
-extern long time();
-#endif
-extern aptr_t memset P((aptr_t, int, size_t));
-extern aptr_t memcpy P((aptr_t, const aptr_t, size_t));
-extern aptr_t memmove P((aptr_t, const aptr_t, size_t));
-extern aptr_t memchr P((const aptr_t, int, size_t));
-extern int memcmp P((const aptr_t, const aptr_t, size_t));
-
-extern int fprintf P((FILE *, const char *, ...));
-#if !defined(MSDOS) && !defined(__GNU_LIBRARY__)
-#ifdef __STDC__
-extern size_t fwrite P((const aptr_t, size_t, size_t, FILE *));
-#else
-extern int fwrite();
-#endif
-extern int fputs P((const char *, FILE *));
-extern int unlink P((const char *));
-#endif
-extern int fflush P((FILE *));
-extern int fclose P((FILE *));
-extern FILE *popen P((const char *, const char *));
-extern int pclose P((FILE *));
-extern void abort P(());
-extern int isatty P((int));
-extern void exit P((int));
-extern int system P((const char *));
-extern int sscanf P((const char *, const char *, ...));
-#ifndef toupper
-extern int toupper P((int));
-#endif
-#ifndef tolower
-extern int tolower P((int));
-#endif
-
-extern double pow P((double x, double y));
-extern double atof P((const char *));
-extern double strtod P((const char *, char **));
-extern int fstat P((int, struct stat *));
-extern int stat P((const char *, struct stat *));
-extern off_t lseek P((int, off_t, int));
-extern int fseek P((FILE *, long, int));
-extern int close P((int));
-extern int creat P((const char *, mode_t));
-extern int open P((const char *, int, ...));
-extern int pipe P((int *));
-extern int dup P((int));
-extern int dup2 P((int,int));
-extern int fork P(());
-extern int execl P((/* char *, char *, ... */));
-#ifndef __STDC__
-extern int read P((int, char *, int));
-#endif
-extern int wait P((int *));
-extern void _exit P((int));
-
-#ifdef NON_STD_SPRINTF
-extern char *sprintf P((char *, const char*, ...));
-#else
-extern int sprintf P((char *, const char*, ...));
-#endif /* SPRINTF_INT */
-
-#undef aptr_t
diff --git a/gnu/usr.bin/awk/re.c b/gnu/usr.bin/awk/re.c
deleted file mode 100644
index 5ee1c43..0000000
--- a/gnu/usr.bin/awk/re.c
+++ /dev/null
@@ -1,212 +0,0 @@
-/*
- * re.c - compile regular expressions.
- */
-
-/*
- * Copyright (C) 1991, 1992, 1993 the Free Software Foundation, Inc.
- *
- * This file is part of GAWK, the GNU implementation of the
- * AWK Programming Language.
- *
- * GAWK is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * GAWK is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with GAWK; see the file COPYING. If not, write to
- * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include "awk.h"
-
-/* Generate compiled regular expressions */
-
-Regexp *
-make_regexp(s, len, ignorecase, dfa)
-char *s;
-size_t len;
-int ignorecase;
-int dfa;
-{
- Regexp *rp;
- const char *rerr;
- char *src = s;
- char *temp;
- char *end = s + len;
- register char *dest;
- register int c;
-
- /* Handle escaped characters first. */
-
- /* Build a copy of the string (in dest) with the
- escaped characters translated, and generate the regex
- from that.
- */
- emalloc(dest, char *, len + 2, "make_regexp");
- temp = dest;
-
- while (src < end) {
- if (*src == '\\') {
- c = *++src;
- switch (c) {
- case 'a':
- case 'b':
- case 'f':
- case 'n':
- case 'r':
- case 't':
- case 'v':
- case 'x':
- case '0':
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- c = parse_escape(&src);
- if (c < 0)
- cant_happen();
- *dest++ = (char)c;
- break;
- default:
- *dest++ = '\\';
- *dest++ = (char)c;
- src++;
- break;
- } /* switch */
- } else {
- *dest++ = *src++; /* not '\\' */
- }
- } /* for */
-
- *dest = '\0' ; /* Only necessary if we print dest ? */
- emalloc(rp, Regexp *, sizeof(*rp), "make_regexp");
- memset((char *) rp, 0, sizeof(*rp));
- emalloc(rp->pat.buffer, unsigned char *, 16, "make_regexp");
- rp->pat.allocated = 16;
- emalloc(rp->pat.fastmap, char *, 256, "make_regexp");
-
- if (ignorecase)
- rp->pat.translate = casetable;
- else
- rp->pat.translate = NULL;
- len = dest - temp;
- if ((rerr = re_compile_pattern(temp, len, &(rp->pat))) != NULL)
- fatal("%s: /%s/", rerr, temp);
- if (dfa && !ignorecase) {
- dfacomp(temp, len, &(rp->dfareg), 1);
- rp->dfa = 1;
- } else
- rp->dfa = 0;
-
- free(temp);
- return rp;
-}
-
-int
-research(rp, str, start, len, need_start)
-Regexp *rp;
-register char *str;
-int start;
-register size_t len;
-int need_start;
-{
- char *ret = str;
-
- if (rp->dfa) {
- char save;
- int count = 0;
- int try_backref;
-
- /*
- * dfa likes to stick a '\n' right after the matched
- * text. So we just save and restore the character.
- */
- save = str[start+len];
- ret = dfaexec(&(rp->dfareg), str+start, str+start+len, 1,
- &count, &try_backref);
- str[start+len] = save;
- }
- if (ret) {
- if (need_start || rp->dfa == 0)
- return re_search(&(rp->pat), str, start+len, start,
- len, &(rp->regs));
- else
- return 1;
- } else
- return -1;
-}
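
research() is a two-stage matcher: a cheap DFA pass decides whether there is a match at all, and the slower re_search() runs only when the caller needs subexpression positions or when no DFA was built. The same fast-reject, slow-confirm shape, sketched with a literal-string prefilter and POSIX regexec() standing in for gawk's two engines; this is an analogy only, and it assumes the needle is a literal that every match must contain:

#include <regex.h>
#include <string.h>

/* sketch: fast-reject with strstr(), confirm and locate with regexec() */
static int
match_with_prefilter(const regex_t *re, const char *needle,
    const char *text, regmatch_t *m)
{
	if (needle[0] != '\0' && strstr(text, needle) == NULL)
		return 0;			/* cheap pass says: no match here */
	return regexec(re, text, 1, m, 0) == 0;	/* full engine reports position */
}
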
-
-void
-refree(rp)
-Regexp *rp;
-{
- free(rp->pat.buffer);
- free(rp->pat.fastmap);
- if (rp->dfa)
- dfafree(&(rp->dfareg));
- free(rp);
-}
-
-void
-dfaerror(s)
-const char *s;
-{
- fatal(s);
-}
-
-Regexp *
-re_update(t)
-NODE *t;
-{
- NODE *t1;
-
-# define CASE 1
- if ((t->re_flags & CASE) == IGNORECASE) {
- if (t->re_flags & CONST)
- return t->re_reg;
- t1 = force_string(tree_eval(t->re_exp));
- if (t->re_text) {
- if (cmp_nodes(t->re_text, t1) == 0) {
- free_temp(t1);
- return t->re_reg;
- }
- unref(t->re_text);
- }
- t->re_text = dupnode(t1);
- free_temp(t1);
- }
- if (t->re_reg)
- refree(t->re_reg);
- if (t->re_cnt)
- t->re_cnt++;
- if (t->re_cnt > 10)
- t->re_cnt = 0;
- if (!t->re_text) {
- t1 = force_string(tree_eval(t->re_exp));
- t->re_text = dupnode(t1);
- free_temp(t1);
- }
- t->re_reg = make_regexp(t->re_text->stptr, t->re_text->stlen,
- IGNORECASE, t->re_cnt);
- t->re_flags &= ~CASE;
- t->re_flags |= IGNORECASE;
- return t->re_reg;
-}
-
-void
-resetup()
-{
- reg_syntax_t syn = RE_SYNTAX_AWK;
-
- (void) re_set_syntax(syn);
- dfasyntax(syn, 0);
-}
diff --git a/gnu/usr.bin/awk/version.c b/gnu/usr.bin/awk/version.c
deleted file mode 100644
index ffcd8e1..0000000
--- a/gnu/usr.bin/awk/version.c
+++ /dev/null
@@ -1,47 +0,0 @@
-char *version_string = "@(#)Gnu Awk (gawk) 2.15";
-
-/* 1.02 fixed /= += *= etc to return the new Left Hand Side instead
- of the Right Hand Side */
-
-/* 1.03 Fixed split() to treat strings of space and tab as FS if
- the split char is ' '.
-
- Added -v option to print version number
-
- Fixed bug that caused rounding when printing large numbers */
-
-/* 2.00beta Incorporated the functionality of the "new" awk as described
- the book (reference not handy). Extensively tested, but no
- doubt still buggy. Badly needs tuning and cleanup, in
- particular in memory management which is currently almost
- non-existent. */
-
-/* 2.01 JF: Modified to compile under GCC, and fixed a few
- bugs while I was at it. I hope I didn't add any more.
- I modified parse.y to reduce the number of reduce/reduce
- conflicts. There are still a few left. */
-
-/* 2.02 Fixed JF's bugs; improved memory management, still needs
- lots of work. */
-
-/* 2.10 Major grammar rework and lots of bug fixes from David.
- Major changes for performance enhancements from David.
- A number of minor bug fixes and new features from Arnold.
- Changes for MSDOS from Conrad Kwok and Scott Garfinkle.
- The gawk.texinfo and info files included! */
-
-/* 2.11 Bug fix release to 2.10. Lots of changes for portability,
- speed, and configurability. */
-
-/* 2.12 Lots of changes for portability, speed, and configurability.
- Several bugs fixed. POSIX compliance. Removal of last set
- of hard-wired limits. Atari and VMS ports added. */
-
-/* 2.13 Public release of 2.12 */
-
-/* 2.14 Mostly bug fixes. */
-
-/* 2.15 Bug fixes plus intermixing of command-line source and files,
- GNU long options, ARGIND, ERRNO and Plan 9 style /dev/ files.
- `delete array'. OS/2 port added. */
-