| author | jraynard <jraynard@FreeBSD.org> | 1997-10-14 18:29:32 +0000 |
|---|---|---|
| committer | jraynard <jraynard@FreeBSD.org> | 1997-10-14 18:29:32 +0000 |
| commit | 6db12e8fe98fa53c73801a8d9c6dc559eb2a65fa (patch) | |
| tree | cf24aefca65bf86802f8fd598c150cd794a0ef01 | /gnu/usr.bin/awk |
| parent | 66a0e8d6b0be9bddd17a134bb709154357040eb4 (diff) | |
| download | FreeBSD-src-6db12e8fe98fa53c73801a8d9c6dc559eb2a65fa.zip | FreeBSD-src-6db12e8fe98fa53c73801a8d9c6dc559eb2a65fa.tar.gz |
Remove old version of awk.
Diffstat (limited to 'gnu/usr.bin/awk')
32 files changed, 0 insertions, 29563 deletions
diff --git a/gnu/usr.bin/awk/ACKNOWLEDGMENT b/gnu/usr.bin/awk/ACKNOWLEDGMENT deleted file mode 100644 index cb4021f..0000000 --- a/gnu/usr.bin/awk/ACKNOWLEDGMENT +++ /dev/null @@ -1,25 +0,0 @@ -The current developers of Gawk would like to thank and acknowledge the -many people who have contributed to the development through bug reports -and fixes and suggestions. Unfortunately, we have not been organized -enough to keep track of all the names -- for that we apologize. - -Another group of people have assisted even more by porting Gawk to new -platforms and providing a great deal of feedback. They are: - - Hal Peterson <hrp@pecan.cray.com> (Cray) - Pat Rankin <gawk.rankin@EQL.Caltech.Edu> (VMS) - Michal Jaegermann <michal@gortel.phys.UAlberta.CA> (Atari, NeXT, DEC 3100) - Mike Lijewski <mjlx@eagle.cnsf.cornell.edu> (IBM RS6000) - Scott Deifik <scottd@amgen.com> (MSDOS 2.14 and 2.15) - Kent Williams (MSDOS 2.11) - Conrad Kwok (MSDOS earlier versions) - Scott Garfinkle (MSDOS earlier versions) - Kai Uwe Rommel <rommel@ars.muc.de> (OS/2) - Darrel Hankerson <hankedr@mail.auburn.edu> (OS/2) - Mark Moraes <Mark-Moraes@deshaw.com> (Code Center, Purify) - Kaveh Ghazi <ghazi@noc.rutgers.edu> (Lots of Unix variants) - -Last, but far from least, we would like to thank Brian Kernighan who -has helped to clear up many dark corners of the language and provided a -restraining touch when we have been overly tempted by "feeping -creaturism". diff --git a/gnu/usr.bin/awk/COPYING b/gnu/usr.bin/awk/COPYING deleted file mode 100644 index 3358a7b..0000000 --- a/gnu/usr.bin/awk/COPYING +++ /dev/null @@ -1,340 +0,0 @@ - GNU GENERAL PUBLIC LICENSE - Version 2, June 1991 - - Copyright (C) 1989, 1991 Free Software Foundation, Inc. - 675 Mass Ave, Cambridge, MA 02139, USA - Everyone is permitted to copy and distribute verbatim copies - of this license document, but changing it is not allowed. - - Preamble - - The licenses for most software are designed to take away your -freedom to share and change it. By contrast, the GNU General Public -License is intended to guarantee your freedom to share and change free -software--to make sure the software is free for all its users. This -General Public License applies to most of the Free Software -Foundation's software and to any other program whose authors commit to -using it. (Some other Free Software Foundation software is covered by -the GNU Library General Public License instead.) You can apply it to -your programs, too. - - When we speak of free software, we are referring to freedom, not -price. Our General Public Licenses are designed to make sure that you -have the freedom to distribute copies of free software (and charge for -this service if you wish), that you receive source code or can get it -if you want it, that you can change the software or use pieces of it -in new free programs; and that you know you can do these things. - - To protect your rights, we need to make restrictions that forbid -anyone to deny you these rights or to ask you to surrender the rights. -These restrictions translate to certain responsibilities for you if you -distribute copies of the software, or if you modify it. - - For example, if you distribute copies of such a program, whether -gratis or for a fee, you must give the recipients all the rights that -you have. You must make sure that they, too, receive or can get the -source code. And you must show them these terms so they know their -rights. 
- - We protect your rights with two steps: (1) copyright the software, and -(2) offer you this license which gives you legal permission to copy, -distribute and/or modify the software. - - Also, for each author's protection and ours, we want to make certain -that everyone understands that there is no warranty for this free -software. If the software is modified by someone else and passed on, we -want its recipients to know that what they have is not the original, so -that any problems introduced by others will not reflect on the original -authors' reputations. - - Finally, any free program is threatened constantly by software -patents. We wish to avoid the danger that redistributors of a free -program will individually obtain patent licenses, in effect making the -program proprietary. To prevent this, we have made it clear that any -patent must be licensed for everyone's free use or not licensed at all. - - The precise terms and conditions for copying, distribution and -modification follow. - - GNU GENERAL PUBLIC LICENSE - TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION - - 0. This License applies to any program or other work which contains -a notice placed by the copyright holder saying it may be distributed -under the terms of this General Public License. The "Program", below, -refers to any such program or work, and a "work based on the Program" -means either the Program or any derivative work under copyright law: -that is to say, a work containing the Program or a portion of it, -either verbatim or with modifications and/or translated into another -language. (Hereinafter, translation is included without limitation in -the term "modification".) Each licensee is addressed as "you". - -Activities other than copying, distribution and modification are not -covered by this License; they are outside its scope. The act of -running the Program is not restricted, and the output from the Program -is covered only if its contents constitute a work based on the -Program (independent of having been made by running the Program). -Whether that is true depends on what the Program does. - - 1. You may copy and distribute verbatim copies of the Program's -source code as you receive it, in any medium, provided that you -conspicuously and appropriately publish on each copy an appropriate -copyright notice and disclaimer of warranty; keep intact all the -notices that refer to this License and to the absence of any warranty; -and give any other recipients of the Program a copy of this License -along with the Program. - -You may charge a fee for the physical act of transferring a copy, and -you may at your option offer warranty protection in exchange for a fee. - - 2. You may modify your copy or copies of the Program or any portion -of it, thus forming a work based on the Program, and copy and -distribute such modifications or work under the terms of Section 1 -above, provided that you also meet all of these conditions: - - a) You must cause the modified files to carry prominent notices - stating that you changed the files and the date of any change. - - b) You must cause any work that you distribute or publish, that in - whole or in part contains or is derived from the Program or any - part thereof, to be licensed as a whole at no charge to all third - parties under the terms of this License. 
- - c) If the modified program normally reads commands interactively - when run, you must cause it, when started running for such - interactive use in the most ordinary way, to print or display an - announcement including an appropriate copyright notice and a - notice that there is no warranty (or else, saying that you provide - a warranty) and that users may redistribute the program under - these conditions, and telling the user how to view a copy of this - License. (Exception: if the Program itself is interactive but - does not normally print such an announcement, your work based on - the Program is not required to print an announcement.) - -These requirements apply to the modified work as a whole. If -identifiable sections of that work are not derived from the Program, -and can be reasonably considered independent and separate works in -themselves, then this License, and its terms, do not apply to those -sections when you distribute them as separate works. But when you -distribute the same sections as part of a whole which is a work based -on the Program, the distribution of the whole must be on the terms of -this License, whose permissions for other licensees extend to the -entire whole, and thus to each and every part regardless of who wrote it. - -Thus, it is not the intent of this section to claim rights or contest -your rights to work written entirely by you; rather, the intent is to -exercise the right to control the distribution of derivative or -collective works based on the Program. - -In addition, mere aggregation of another work not based on the Program -with the Program (or with a work based on the Program) on a volume of -a storage or distribution medium does not bring the other work under -the scope of this License. - - 3. You may copy and distribute the Program (or a work based on it, -under Section 2) in object code or executable form under the terms of -Sections 1 and 2 above provided that you also do one of the following: - - a) Accompany it with the complete corresponding machine-readable - source code, which must be distributed under the terms of Sections - 1 and 2 above on a medium customarily used for software interchange; or, - - b) Accompany it with a written offer, valid for at least three - years, to give any third party, for a charge no more than your - cost of physically performing source distribution, a complete - machine-readable copy of the corresponding source code, to be - distributed under the terms of Sections 1 and 2 above on a medium - customarily used for software interchange; or, - - c) Accompany it with the information you received as to the offer - to distribute corresponding source code. (This alternative is - allowed only for noncommercial distribution and only if you - received the program in object code or executable form with such - an offer, in accord with Subsection b above.) - -The source code for a work means the preferred form of the work for -making modifications to it. For an executable work, complete source -code means all the source code for all modules it contains, plus any -associated interface definition files, plus the scripts used to -control compilation and installation of the executable. However, as a -special exception, the source code distributed need not include -anything that is normally distributed (in either source or binary -form) with the major components (compiler, kernel, and so on) of the -operating system on which the executable runs, unless that component -itself accompanies the executable. 
- -If distribution of executable or object code is made by offering -access to copy from a designated place, then offering equivalent -access to copy the source code from the same place counts as -distribution of the source code, even though third parties are not -compelled to copy the source along with the object code. - - 4. You may not copy, modify, sublicense, or distribute the Program -except as expressly provided under this License. Any attempt -otherwise to copy, modify, sublicense or distribute the Program is -void, and will automatically terminate your rights under this License. -However, parties who have received copies, or rights, from you under -this License will not have their licenses terminated so long as such -parties remain in full compliance. - - 5. You are not required to accept this License, since you have not -signed it. However, nothing else grants you permission to modify or -distribute the Program or its derivative works. These actions are -prohibited by law if you do not accept this License. Therefore, by -modifying or distributing the Program (or any work based on the -Program), you indicate your acceptance of this License to do so, and -all its terms and conditions for copying, distributing or modifying -the Program or works based on it. - - 6. Each time you redistribute the Program (or any work based on the -Program), the recipient automatically receives a license from the -original licensor to copy, distribute or modify the Program subject to -these terms and conditions. You may not impose any further -restrictions on the recipients' exercise of the rights granted herein. -You are not responsible for enforcing compliance by third parties to -this License. - - 7. If, as a consequence of a court judgment or allegation of patent -infringement or for any other reason (not limited to patent issues), -conditions are imposed on you (whether by court order, agreement or -otherwise) that contradict the conditions of this License, they do not -excuse you from the conditions of this License. If you cannot -distribute so as to satisfy simultaneously your obligations under this -License and any other pertinent obligations, then as a consequence you -may not distribute the Program at all. For example, if a patent -license would not permit royalty-free redistribution of the Program by -all those who receive copies directly or indirectly through you, then -the only way you could satisfy both it and this License would be to -refrain entirely from distribution of the Program. - -If any portion of this section is held invalid or unenforceable under -any particular circumstance, the balance of the section is intended to -apply and the section as a whole is intended to apply in other -circumstances. - -It is not the purpose of this section to induce you to infringe any -patents or other property right claims or to contest validity of any -such claims; this section has the sole purpose of protecting the -integrity of the free software distribution system, which is -implemented by public license practices. Many people have made -generous contributions to the wide range of software distributed -through that system in reliance on consistent application of that -system; it is up to the author/donor to decide if he or she is willing -to distribute software through any other system and a licensee cannot -impose that choice. - -This section is intended to make thoroughly clear what is believed to -be a consequence of the rest of this License. - - 8. 
If the distribution and/or use of the Program is restricted in -certain countries either by patents or by copyrighted interfaces, the -original copyright holder who places the Program under this License -may add an explicit geographical distribution limitation excluding -those countries, so that distribution is permitted only in or among -countries not thus excluded. In such case, this License incorporates -the limitation as if written in the body of this License. - - 9. The Free Software Foundation may publish revised and/or new versions -of the General Public License from time to time. Such new versions will -be similar in spirit to the present version, but may differ in detail to -address new problems or concerns. - -Each version is given a distinguishing version number. If the Program -specifies a version number of this License which applies to it and "any -later version", you have the option of following the terms and conditions -either of that version or of any later version published by the Free -Software Foundation. If the Program does not specify a version number of -this License, you may choose any version ever published by the Free Software -Foundation. - - 10. If you wish to incorporate parts of the Program into other free -programs whose distribution conditions are different, write to the author -to ask for permission. For software which is copyrighted by the Free -Software Foundation, write to the Free Software Foundation; we sometimes -make exceptions for this. Our decision will be guided by the two goals -of preserving the free status of all derivatives of our free software and -of promoting the sharing and reuse of software generally. - - NO WARRANTY - - 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY -FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN -OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES -PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED -OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS -TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE -PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, -REPAIR OR CORRECTION. - - 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING -WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR -REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, -INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING -OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED -TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY -YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER -PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE -POSSIBILITY OF SUCH DAMAGES. - - END OF TERMS AND CONDITIONS - - Appendix: How to Apply These Terms to Your New Programs - - If you develop a new program, and you want it to be of the greatest -possible use to the public, the best way to achieve this is to make it -free software which everyone can redistribute and change under these terms. - - To do so, attach the following notices to the program. It is safest -to attach them to the start of each source file to most effectively -convey the exclusion of warranty; and each file should have at least -the "copyright" line and a pointer to where the full notice is found. 
- - <one line to give the program's name and a brief idea of what it does.> - Copyright (C) 19yy <name of author> - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software - Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - -Also add information on how to contact you by electronic and paper mail. - -If the program is interactive, make it output a short notice like this -when it starts in an interactive mode: - - Gnomovision version 69, Copyright (C) 19yy name of author - Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. - This is free software, and you are welcome to redistribute it - under certain conditions; type `show c' for details. - -The hypothetical commands `show w' and `show c' should show the appropriate -parts of the General Public License. Of course, the commands you use may -be called something other than `show w' and `show c'; they could even be -mouse-clicks or menu items--whatever suits your program. - -You should also get your employer (if you work as a programmer) or your -school, if any, to sign a "copyright disclaimer" for the program, if -necessary. Here is a sample; alter the names: - - Yoyodyne, Inc., hereby disclaims all copyright interest in the program - `Gnomovision' (which makes passes at compilers) written by James Hacker. - - <signature of Ty Coon>, 1 April 1989 - Ty Coon, President of Vice - -This General Public License does not permit incorporating your program into -proprietary programs. If your program is a subroutine library, you may -consider it more useful to permit linking proprietary applications with the -library. If this is what you want to do, use the GNU Library General -Public License instead of this License. - diff --git a/gnu/usr.bin/awk/FUTURES b/gnu/usr.bin/awk/FUTURES deleted file mode 100644 index 6250584..0000000 --- a/gnu/usr.bin/awk/FUTURES +++ /dev/null @@ -1,117 +0,0 @@ -This file lists future projects and enhancements for gawk. Items are listed -in roughly the order they will be done for a given release. This file is -mainly for use by the developer(s) to help keep themselves on track, please -don't bug us too much about schedules or what all this really means. - -(An `x' indicates that some progress has been made, but that the feature is -not complete yet.) - -For 2.16 -======== -x Move to autoconf-based configure system. - -x Research awk `fflush' function. - -x Generalize IGNORECASE - any value makes it work, not just numeric non-zero - make it apply to *all* string comparisons - -x Fix FILENAME to have an initial value of "", not "-" - -In 2.17 -======= -x Allow RS to be a regexp. - - RT variable to hold text of record terminator - - RECLEN variable for fixed length records - - Feedback alloca.s changes to FSF - -x Split() with null string as third arg to split up strings - -x Analogously, setting FS="" would split the input record into individual - characters. - -x Clean up code by isolating system-specific functions in separate files. 
- - Undertake significant directory reorganization. - -x Extensive manual cleanup: - Use of texinfo 2.0 features - Lots more examples - Document all of the above. - -x Go to POSIX regexps - - Make regex + dfa less dependant on gawk header file includes - - Additional manual features: - Document posix regexps - Document use of dbm arrays - ? Add an error messages section to the manual - ? A section on where gawk is bounded - regex - i/o - sun fp conversions - -For 2.18 -======== - DBM storage of awk arrays. Try to allow multiple dbm packages - - General sub functions: - edit(line, pat, sub) and gedit(line, pat, sub) - that return the substituted strings and allow \1 etc. in the sub - string. - - ? Have strftime() pay attention to the value of ENVIRON["TZ"] - -For 2.19 -======== - Add chdir and stat built-in functions. - - Add function pointers as valid variable types. - - Add an `ftw' built-in function that takes a function pointer. - - Do an optimization pass over parse tree? - -For 2.20 or later: -================== -Add variables similar to C's __FILE__ and __LINE__ for better diagnostics -from within awk programs. - -Add an explicit concatenation operator and assignment version. - -? Add a switch statement - -Add the ability to seek on an open file and retrieve the current file position. - -Add lint checking everywhere, including check for use of builtin vars. -only in new awk. - -"restart" keyword - -Add |& - -Make awk '/foo/' files... run at egrep speeds - -Do a reference card - -Allow OFMT and CONVFMT to be other than a floating point format. - -Allow redefining of builtin functions? - -Make it faster and smaller. - -For 3.x: -======== - -Create a gawk compiler? - -Create a gawk-to-C translator? (or C++??) - -Provide awk profiling and debugging. - - - diff --git a/gnu/usr.bin/awk/LIMITATIONS b/gnu/usr.bin/awk/LIMITATIONS deleted file mode 100644 index 64eab85..0000000 --- a/gnu/usr.bin/awk/LIMITATIONS +++ /dev/null @@ -1,16 +0,0 @@ -This file describes limits of gawk on a Unix system (although it -is variable even then). Non-Unix systems may have other limits. - -# of fields in a record: MAX_INT -Length of input record: MAX_INT -Length of output record: unlimited -Size of a field: MAX_INT -Size of a printf string: MAX_INT -Size of a literal string: MAX_INT -Characters in a character class: 2^(# of bits per byte) -# of file redirections: unlimited -# of pipe redirections: min(# of processes per user, # of open files) -double-precision floating point -Length of source line: unlimited -Number of input records in one file: MAX_LONG -Number of input records total: MAX_LONG diff --git a/gnu/usr.bin/awk/NEWS b/gnu/usr.bin/awk/NEWS deleted file mode 100644 index 4df69e7..0000000 --- a/gnu/usr.bin/awk/NEWS +++ /dev/null @@ -1,1480 +0,0 @@ -Changes from 2.15.4 to 2.15.5 ------------------------------ - -FUTURES file updated and re-arranged some with more rational schedule. - -Many prototypes handled better for ANSI C in protos.h. - -getopt.c updated somewhat. - -test/Makefile now removes junk directory, `bardargtest' renamed `badargs.' - -Bug fix in iop.c for RS = "". Eat trailing newlines off of record separator. - -Bug fix in Makefile.bsd44, use leading tab in actions. - -Fix in field.c:set_FS for FS == "\\" and IGNORECASE != 0. - -Config files updated or added: - cray60, DEC OSF/1 2.0, Utek, sgi405, next21, next30, atari/config.h, - sco. - -Fix in io.c for ENFILE as well as EMFILE, update decl of groupset to -include OSF/1. 
- -Rationalized printing as integers if numbers are outside the range of a long. -Changes to node.c:force_string and builtin.c. - -Made internal NF, NR, and FNR variables longs instead of ints. - -Add LIMITS_H_MISSING stuff to config.in and awk.h, and default defs for -INT_MAX and LONG_MAX, if no limits.h file. Add a standard decl of -the time() function for __STDC__. From ghazi@noc.rutgers.edu. - -Fix tree_eval in awk.h and r_tree_eval in eval.c to deal better with -function parameters, particularly ones that are arrays. - -Fix eval.c to print out array names of arrays used in scalar contexts. - -Fix eval.c in interpret to zero out source and sourceline initially. This -does a better job of providing source file and line number information. - -Fix to re_parse_field in field.c to not use isspace when RS = "", but rather -to explicitly look for blank and tab. - -Fix to sc_parse_field in field.c to catch the case of the FS character at the -end of a record. - -Lots of miscellanious bug fixes for memory leaks, courtesy Mark Moraes, -also fixes for arrays. - -io.c fixed to warn about lack of explicit closes if --lint. - -Updated missing/strftime.c to match posted strftime 6.2. - -Bug fix in builtin.c, in case of non-match in sub_common. - -Updated constant used for division in builtin.c:do_rand for DEC Alpha -and CRAY Y-MP. - -POSIXLY_CORRECT in the environment turns on --posix (fixed in main.c). - -Updated srandom prototype and calls in builtin.c. - -Fix awk.y to enforce posix semantics of unary +: result is numeric. - -Fix array.c to not rearrange the hash chain upon finding an index in -the array. This messed things up in cases like: - for (index1 in array) { - blah - if (index2 in array) # blew away the for - stuff - } - -Fixed spelling errors in the man page. - -Fixes in awk.y so that - gawk '' /path/to/file -will work without core dumping or finding parse errors. - -Fix main.c so that --lint will fuss about an empty program. -Yet another fix for argument parsing in the case of unrecognized options. - -Bug fix in dfa.c to not attempt to free null pointers. - -Bug fix in builtin.c to only use DEFAULT_G_PRECISION for %g or %G. - -Bug fix in field.c to achieve call by value semantics for split. - -Changes from 2.15.3 to 2.15.4 ------------------------------ - -Lots of lint fixes, and do_sprintf made mostly ANSI C compatible. - -Man page updated and edited. - -Copyrights updated. - -Arrays now grow dynamically, initially scaling up by an order of magnitude - and then doubling, up to ~ 64K. This should keep gawk's performance - graceful under heavy load. - -New `delete array' feature added. Only documented in the man page. - -Switched to dfa and regex suites from grep-2.0. These offer the ability to - move to POSIX regexps in the next release. - -Disabled GNU regex ops. - -Research awk -m option now recognized. It does nothing in gawk, since gawk - has no static limits. Only documented in the man page. - -New bionic (faster, better, stronger than before) hashing function. - -Bug fix in argument handling. `gawk -X' now notices there was no program. - Additional bug fixes to make --compat and --lint work again. - -Many changes for systems where sizeof(int) != sizeof(void *). - -Add explicit alloca(0) in io.c to recover space from C alloca. - -Fixed file descriptor leak in io.c. - -The --version option now follows the GNU coding standards and exits. - -Fixed several prototypes in protos.h. - -Several tests updated. On Solaris, warn that the out? tests will fail. 
- -Configuration files for SunOS with cc and Solaris 2.x added. - -Improved error messages in awk.y on gawk extensions if do_unix or do_compat. - -INSTALL file added. - -Fixed Atari Makefile and several VMS specific changes. - -Better conversion of numbers to strings on systems with broken sprintfs. - -Changes from 2.15.2 to 2.15.3 ------------------------------ - -Increased HASHSIZE to a decent number, 127 was way too small. - -FILENAME is now the null string in a BEGIN rule. - -Argument processing fixed for invalid options and missing arguments. - -This version will build on VMS. This included a fix to close all files - and pipes opened with redirections before closing stdout and stderr. - -More getpgrp() defines. - -Changes for BSD44: <sys/param.h> in io.c and Makefile.bsd44. - -All directories in the distribution are now writable. - -Separated LDFLAGS and CFLAGS in Makefile. CFLAGS can now be overridden by - user. - -Make dist now builds compressed archives ending in .gz and runs doschk. - -Amiga port. - -New getopt.c fixes Alpha OSF/1 problem. - -Make clean now removes possible test output. - -Improved algorithm for multiple adjacent string concatenations leads to - performance improvements. - -Fix nasty bug whereby command-line assignments, both with -v and at run time, - could create variables with syntactically illegal names. - -Fix obscure bug in printf with %0 flag and filling. - -Add a lint check for substr if provided length exceeds remaining characters - in string. - -Update atari support. - -PC support enhanced to include support for both DOS and OS/2. (Lots more - #ifdefs. Sigh.) - -Config files for Hitachi Unix and OSF/1, courtesy of Yoko Morishita - (morisita@sra.co.jp) - -Changes from 2.15.1 to 2.15.2 ------------------------------ - -Additions to the FUTURES file. - -Document undefined order of output when using both standard output - and /dev/stdout or any of the /dev output files that gawk emulates in - the absence of OS support. - -Clean up the distribution generation in Makefile.in: the info files are - now included, the distributed files are marked read-only and patched - distributions are now unpacked in a directory named with the patch level. - -Changes from 2.15 to 2.15.1 ---------------------------- - -Close stdout and stderr before all redirections on program exit. This allows - detection of write errors and also fixes the messages test on Solaris 2.x. - -Removed YYMAXDEPTH define in awk.y which was limiting the parser stack depth. - -Changes to config/bsd44, Makefile.bsd44 and configure to bring it into line - with the BSD4.4 release. - -Changed Makefile to use prefix, exec_prefix, bindir etc. - -make install now installs info files. - -make install now sets permissions on installed files. - -Make targets added: uninstall, distclean, mostlyclean and realclean. - -Added config.h to cleaner and clobber make targets. - -Changes to config/{hpux8x,sysv3,sysv4,ultrix41} to deal with alloca(). - -Change to getopt.h for portability. - -Added more special cases to the getpgrp() call. - -Added README.ibmrt-aos and config/ibmrt-aos. - -Changes from 2.14 to 2.15 ---------------------------- - -Command-line source can now be mixed with library functions. - -ARGIND variable tracks index in ARGV of FILENAME. - -GNU style long options in addition to short options. - -Plan 9 style special files interpreted by gawk: - /dev/pid - /dev/ppid - /dev/pgrpid - /dev/user - $1 = getuid - $2 = geteuid - $3 = getgid - $4 = getegid - $5 ... 
$NF = getgroups if supported - -ERRNO variable contains error string if getline or close fails. - -Very old options -a and -e have gone away. - -Inftest has been removed from the default target in test/Makefile -- the - results were too machine specific and resulted in too many false alarms. - -A README.amiga has been added. - -The "too many arguments supplied for format string" warning message is only - in effect under the lint option. - -Code improvements in dfa.c. - -Fixed all reported bugs: - - Writes are checked for failure (such as full filesystem). - - Stopped (at least some) runaway error messages. - - gsub(/^/, "x") does the right thing for $0 of 0, 1, or more length. - - close() on a command being piped to a getline now works properly. - - The input record will no longer be freed upon an explicit close() - of the input file. - - A NUL character in FS now works. - - In a substitute, \\& now means a literal backslash followed by what - was matched. - - Integer overflow of substring length in substr() is caught. - - An input record without a newline termination is handled properly. - - In io.c, check is against only EMFILE so that system file table - is not filled. - - Renamed all files with names longer than 14 characters. - - Escaped characters in regular expressions were being lost when - IGNORECASE was used. - - Long source lines were not being handled properly. - - Sourcefiles that ended in a tab but no newline were bombing. - - Patterns that could match zero characters in split() were not working - properly. - - The parsedebug option was not working. - - The grammar was being a bit too lenient, allowing some very dubious - programs to pass. - - Compilation with DEBUG defined now works. - - A variable read in with getline was not being treated as a potential - number. - - Array subscripts were not always of string type. - - -Changes from 2.13.2 to 2.14 ---------------------------- - -Updated manual! - -Added "next file" to skip efficiently to the next input file. - -Fixed potential of overflowing buffer in do_sprintf(). - -Plugged small memory leak in sub_common(). - -EOF on a redirect is now "sticky" -- it can only be cleared by close()ing - the pipe or file. - -Now works if used via a #! /bin/gawk line at the top of an executable file - when that line ends with whitespace. - -Added some checks to the grammar to catch redefinition of builtin functions. - This could eventually be the basis for an extension to allow redefining - functions, but in the mean time it's a good error catching facility. - -Negative integer exponents now work. - -Modified do_system() to make sure it had a non-null string to be passed - to system(3). Thus, system("") will flush any pending output but not go - through the overhead of forking an un-needed shell. - -A fix to floating point comparisons so that NaNs compare right on IEEE systems. - -Added code to make sure we're not opening directories for reading and such. - -Added code to do better diagnoses of weird or null file names. - -Allow continue outside of a loop, unless in strict posix mode. Lint option - will issue warning. - -New missing/strftime.c. There has been one change that affects gawk. Posix - now defines a %V conversion so the vms conversion has been changed to %v. - If this version is used with gawk -Wlint and they use %V in a call to - strftime, they'll get a warning. - -Error messages now conform to GNU standard (I hope). - -Changed comparisons to conform to the description found in the file POSIX. 
- This is inconsistent with the current POSIX draft, but that is broken. - Hopefully the final POSIX standard will conform to this version. - (Alas, this will have to wait for 1003.2b, which will be a revision to - the 1003.2 standard. That standard has been frozen with the broken - comparison rules.) - -The length of a string was a short and now is a size_t. - -Updated VMS help. - -Added quite a few new tests to the test suite and deleted many due to lack of - written releases. Test output is only removed if it is identical to the - "good" output. - -Fixed a couple of bugs for reference to $0 when $0 is "" -- particularly in - a BEGIN block. - -Fixed premature freeing in construct "$0 = $0". - -Removed the call to wait_any() in gawk_popen(), since on at least some systems, - if gawk's input was from a pipe, the predecessor process in the pipe was a - child of gawk and this caused a deadlock. - -Regexp can (once again) match a newline, if given explicitly. - -nextopen() makes sure file name is null terminated. - -Fixed VMS pipe simulation. Improved VMS I/O performance. - -Catch . used in variable names. - -Fixed bug in getline without redirect from a file -- it was quitting after the - first EOF, rather than trying the next file. - -Fixed bug in treatment of backslash at the end of a string -- it was bombing - rather than doing something sensible. It is not clear what this should mean, - but for now I issue a warning and take it as a literal backslash. - -Moved setting of regexp syntax to before the option parsing in main(), to - handle things like -v FS='[.,;]' - -Fixed bug when NF is set by user -- fields_arr must be expanded if necessary - and "new" fields must be initialized. - -Fixed several bugs in [g]sub() for no match found or the match is 0-length. - -Fixed bug where in gsub() a pattern anchored at the beginning would still - substitute throughout the string. - -make test does not assume the . is in PATH. - -Fixed bug when a field beyond the end of the record was requested after - $0 was altered (directly or indirectly). - -Fixed bug for assignment to field beyond end of record -- the assigned value - was not found on subsequent reference to that field. - -Fixed bug for FS a regexp and it matches at the end of a record. - -Fixed memory leak for an array local to a function. - -Fixed hanging of pipe redirection to getline - -Fixed coredump on access to $0 inside BEGIN block. - -Fixed treatment of RS = "". It now parses the fields correctly and strips - leading whitespace from a record if FS is a space. - -Fixed faking of /dev/stdin. - -Fixed problem with x += x - -Use of scalar as array and vice versa is now detected. - -IGNORECASE now obeyed for FS (even if FS is a single alphabetic character). - -Switch to GPL version 2. - -Renamed awk.tab.c to awktab.c for MSDOS and VMS tar programs. - -Renamed this file (CHANGES) to NEWS. - -Use fmod() instead of modf() and provide FMOD_MISSING #define to undo - this change. - -Correct the volatile declarations in eval.c. - -Avoid errant closing of the file descriptors for stdin, stdout and stderr. - -Be more flexible about where semi-colons can occur in programs. - -Check for write errors on all output, not just on close(). - -Eliminate the need for missing/{strtol.c,vprintf.c}. - -Use GNU getopt and eliminate missing/getopt.c. - -More "lint" checking. - - -Changes from 2.13.1 to 2.13.2 ------------------------------ - -Toward conformity with GNU standards, configure is a link to mkconf, the latter - to disappear in the next major release. 
- -Update to config/bsd43. - -Added config/apollo, config/msc60, config/cray2-50, config/interactive2.2 - -sgi33.cc added for compilation using cc rather than gcc. - -Ultrix41 now propagates to config.h properly -- as part of a general - mechanism in configure for kludges -- #define anything from a config file - just gets tacked onto the end of config.h -- to be used sparingly. - -Got rid of an unnecessary and troublesome declaration of vprintf(). - -Small improvement in locality of error messages. - -Try to diagnose use of array as scalar and vice versa -- to be improved in - the future. - -Fix for last bug fix for Cray division code--sigh. - -More changes to test suite to explicitly use sh. Also get rid of - a few generated files. - -Fixed off-by-one bug in string concatenation code. - -Fix for use of array that is passed in from a previous function parameter. - Addition to test suite for above. - -A number of changes associated with changing NF and access to fields - beyond the end of the current record. - -Change to missing/memcmp.c to avoid seg. fault on zero length input. - -Updates to test suite (including some inadvertently left out of the last patch) - to invoke sh explicitly (rather than rely on #!/bin/sh) and remove some - junk files. test/chem/good updated to correspond to bug fixes. - -Changes from 2.13.0 to 2.13.1 ------------------------------ - -More configs and PORTS. - -Fixed bug wherein a simple division produced an erroneous FPE, caused by - the Cray division workaround -- that code is now #ifdef'd only for - Cray *and* fixed. - -Fixed bug in modulus implementation -- it was very close to the above - code, so I noticed it. - -Fixed portability problem with limits.h in missing.c - -Fixed portability problem with tzname and daylight -- define TZNAME_MISSING - if strftime() is missing and tzname is also. - -Better support for Latin-1 character set. - -Fixed portability problem in test Makefile. - -Updated PROBLEMS file. - -=============================== gawk-2.13 released ========================= -Changes from 2.12.42 to 2.12.43 -------------------------------- - -Typo in awk.y - -Fixed up strftime.3 and added doc. for %V. - -Changes from 2.12.41 to 2.12.42 -------------------------------- - -Fixed bug in devopen() -- if you had write permission in /dev, - it would just create /dev/stdout etc.!! - -Final (?) VMS update. - -Make NeXT use GFMT_WORKAROUND - -Fixed bug in sub_common() for substitute on zero-length match. Improved the - code a bit while I was at it. - -Fixed grammar so that $i++ parses as ($i)++ - -Put support/* back in the distribution (didn't I already do this?!) - -Changes from 2.12.40 to 2.12.41 -------------------------------- - -VMS workaround for broken %g format. - -Changes from 2.12.39 to 2.12.40 -------------------------------- - -Minor man page update. - -Fixed latent bug in redirect(). - -Changes from 2.12.38 to 2.12.39 -------------------------------- - -Updates to test suite -- remove dependence on changing gawk.1 man page. - -Changes from 2.12.37 to 2.12.38 -------------------------------- - -Fixed bug in use of *= without whitespace following. - -VMS update. - -Updates to man page. - -Option handling updates in main.c - -test/manyfiles redone and added to bigtest. - -Fixed latent (on Sun) bug in handling of save_fs. - -Changes from 2.12.36 to 2.12.37 -------------------------------- - -Update REL in Makefile-dist. Incorporate test suite into main distribution. - -Minor fix in regtest. 
- -Changes from 2.12.35 to 2.12.36 -------------------------------- - -Release takes on dual personality -- 2.12.36 and 2.13.0 -- any further - patches before public release won't count for 2.13, although they will for - 2.12 -- be careful to avoid confusion! patchlevel.h will be the last thing - to change. - -Cray updates to deal with arithmetic problems. - -Minor test suite updates. - -Fixed latent bug in parser (freeing memory). - -Changes from 2.12.34 to 2.12.35 -------------------------------- - -VMS updates. - -Flush stdout at top of err() and stderr at bottom. - -Fixed bug in eval_condition() -- it wasn't testing for MAYBE_NUM and - doing the force_number(). - -Included the missing manyfiles.awk and a new test to catch the above bug which - I am amazed wasn't already caught by the test suite -- it's pretty basic. - -Changes from 2.12.33 to 2.12.34 -------------------------------- - -Atari updates -- including bug fix. - -More VMS updates -- also nuke vms/version.com. - -Fixed bug in handling of large numbers of redirections -- it was probably never - tested before (blush!). - -Minor rearrangement of code in r_force_number(). - -Made chem and regtest tests a bit more portable (Ultrix again). - -Added another test -- manyfiles -- not invoked under any other test -- very Unix - specific. - -Rough beginning of LIMITATIONS file -- need my AWK book to complete it. - -Changes from 2.12.32 to 2.12.33 -------------------------------- - -Expunge debug.? from various files. - -Remove vestiges of Floor and Ceil kludge. - -Special case integer division -- mainly for Cray, but maybe someone else - will benefit. - -Workaround for iop_close closing an output pipe descriptor on Cray -- - not conditional since I think it may fix a bug on SGI as well and I don't - think it can hurt elsewhere. - -Fixed memory leak in assoc_lookup(). - -Small cleanup in test suite. - -Changes from 2.12.31 to 2.12.32 -------------------------------- - -Nuked debug.c and debugging flag -- there are better ways. - -Nuked version.sh and version.c in subdirectories. - -Fixed bug in handling of IGNORECASE. - -Fixed bug when FIELDWIDTHS was set via -v option. - -Fixed (obscure) bug when $0 is assigned a numerical value. - -Fixed so that escape sequences in command-line assignments work (as it already - said in the comment). - -Added a few cases to test suite. - -Moved support/* back into distribution. - -VMS updates. - -Changes from 2.12.30 to 2.12.31 -------------------------------- - -Cosmetic manual page changes. - -Updated sunos3 config. - -Small changes in test suite including renaming files over 14 chars. in length. - -Changes from 2.12.29 to 2.12.30 -------------------------------- - -Bug fix for many string concatenations in a row. - -Changes from 2.12.28 to 2.12.29 -------------------------------- - -Minor cleanup in awk.y - -Minor VMS update. - -Minor atari update. - -Changes from 2.12.27 to 2.12.28 -------------------------------- - -Got rid of the debugging goop in eval.c -- there are better ways. - -Sequent port. - -VMS changes left out of the last patch -- sigh! config/vms.h renamed - to config/vms-conf.h. - -Fixed missing/tzset.c - -Removed use of gcvt() and GCVT_MISSING -- turns out it was no faster than - sprintf("%g") and caused all sorts of portability headaches. - -Tuned get_field() -- it was unnecessarily parsing the whole record on reference - to $0. - -Tuned interpret() a bit in the rule_node loop. 
- -In r_force_number(), worked around bug in Uglix strtod() and got rid of - ugly do{}while(0) at Michal's urging. - -Replaced do_deref() and deref with unref(node) -- much cleaner and a bit faster. - -Got rid of assign_number() -- contrary to comment, it was no faster than - just making a new node and freeing the old one. - -Replaced make_number() and tmp_number() with macros that call mk_number(). - -Changed freenode() and newnode() into macros -- the latter is getnode() - which calls more_nodes() as necessary. - -Changes from 2.12.26 to 2.12.27 -------------------------------- - -Completion of Cray 2 port (includes a kludge for floor() and ceil() - that may go or be changed -- I think that it may just be working around - a bug in chem that is being tweaked on the Cray). - -More VMS updates. - -Moved kludge over yacc's insertion of malloc and realloc declarations - from protos.h to the Makefile. - -Added a lisp interpreter in awk to the test suite. (Invoked under - bigtest.) - -Cleanup in r_force_number() -- I had never gotten around to a thorough - profile of the cache code and it turns out to be not worth it. - -Performance boost -- do lazy force_number()'ing for fields etc. i.e. - flag them (MAYBE_NUM) and call force_number only as necessary. - -Changes from 2.12.25 to 2.12.26 -------------------------------- - -Rework of regexp stuff so that dynamic regexps have reasonable - performance -- string used for compiled regexp is stored and - compared to new string -- if same, no recompilation is necessary. - Also, very dynamic regexps cause dfa-based searching to be turned - off. - -Code in dev_open() is back to returning fileno(std*) rather than - dup()ing it. This will be documented. Sorry for the run-around - on this. - -Minor atari updates. - -Minor vms update. - -Missing file from MSDOS port. - -Added warning (under lint) if third arg. of [g]sub is a constant and - handle it properly in the code (i.e. return how many matches). - -Changes from 2.12.24 to 2.12.25 -------------------------------- - -MSDOS port. - -Non-consequential changes to regexp variables in preparation for - a more serious change to fix a serious performance problem. - -Changes from 2.12.23 to 2.12.24 -------------------------------- - -Fixed bug in output flushing introduced a few patches back. This caused - serious performance losses. - -Changes from 2.12.22 to 2.12.23 -------------------------------- - -Accidentally left config/cray2-60 out of last patch. - -Added some missing dependencies to Makefile. - -Cleaned up mkconf a bit; made yacc the default parser (no alloca needed, - right?); added rs6000 hook for signed characters. - -Made regex.c with NO_ALLOCA undefined work. - -Fixed bug in dfa.c for systems where free(NULL) bombs. - -Deleted a few cant_happen()'s that *really* can't hapen. - -Changes from 2.12.21 to 2.12.22 -------------------------------- - -Added to config stuff the ability to choose YACC rather than bison. - -Fixed CHAR_UNSIGNED in config.h-dist. - -Second arg. of strtod() is char ** rather than const char **. - -stackb is now initially malloc()'ed since it may be realloc()'ed. - -VMS updates. - -Added SIZE_T_MISSING to config stuff and a default typedef to awk.h. - (Maybe it is not needed on any current systems??) - -re_compile_pattern()'s size is now size_t unconditionally. - -Changes from 2.12.20 to 2.12.21 -------------------------------- - -Corrected missing/gcvt.c. - -Got rid of use of dup2() and thus DUP_MISSING. - -Updated config/sgi33. 
- -Turned on (and fixed) in cmp_nodes() the behaviour that I *hope* will be in - POSIX 1003.2 for relational comparisons. - -Small updates to test suite. - -Changes from 2.12.19 to 2.12.20 -------------------------------- - -Sloppy, sloppy, sloppy!! I didn't even try to compile the last two - patches. This one fixes goofs in regex.c. - -Changes from 2.12.18 to 2.12.19 -------------------------------- - -Cleanup of last patch. - -Changes from 2.12.17 to 2.12.18 -------------------------------- - -Makefile renamed to Makefile-dist. - -Added alloca() configuration to mkconf. (A bit kludgey.) Just - add a single line containing ALLOCA_PW, ALLOCA_S or ALLOCA_C - to the appropriate config file to have Makefile-dist edited - accordingly. - -Reorganized output flushing to correspond with new semantics of - devopen() on "/dev/std*" etc. - -Fixed rest of last goof!! - -Save and restore errno in do_pathopen(). - -Miscellaneous atari updates. - -Get rid of the trailing comma in the NODETYPE definition (Cray - compiler won't take it). - -Try to make the use of `const' consistent since Cray compiler is - fussy about that. See the changes to `basename' and `myname'. - -It turns out that, according to section 3.8.3 (Macro Replacement) - of the ANSI Standard: ``If there are sequences of preprocessing - tokens within the list of arguments that would otherwise act as - preprocessing directives, the behavior is undefined.'' That means - that you cannot count on the behavior of the declaration of - re_compile_pattern in awk.h, and indeed the Cray compiler chokes on it. - -Replaced alloca with malloc/realloc/free in regex.c. It was much simpler - than expected. (Inside NO_ALLOCA for now -- by default no alloca.) - -Added a configuration file, config/cray60, for Unicos-6.0. - -Changes from 2.12.16 to 2.12.17 -------------------------------- - -Ooops. Goofed signal use in last patch. - -Changes from 2.12.15 to 2.12.16 -------------------------------- - -RENAMED *_dir to just * (e.g. missing_dir). - -Numerous VMS changes. - -Proper inclusion of atari and vms files. - -Added experimental (ifdef'd out) RELAXED_CONTINUATION and DEFAULT_FILETYPE - -- please comment on these! - -Moved pathopen() to io.c (sigh). - -Put local directory ahead in default AWKPATH. - -Added facility in mkconf to echo comments on stdout: lines beginning - with "#echo " will have the remainder of the line echoed when mkconf is run. - Any lines starting with "#" will otherwise be treated as comments. The - intent is to be able to say: - "#echo Make sure you uncomment alloca.c in the Makefile" - or the like. - -Prototype fix for V.4 - -Fixed version_string to not print leading @(#). - -Fixed FIELDWIDTHS to work with strict (turned out to be easy). - -Fixed conf for V.2. - -Changed semantics of /dev/fd/n to be like on real /dev/fd. - -Several configuration and updates in the makefile. - -Updated manpage. - -Include tzset.c and system.c from missing_dir that were accidently left out of - the last patch. - -Fixed bug in cmdline variable assignment -- arg was getting freed(!) in - call to variable. - -Backed out of parse-time constant folding for now, until I can figure out - how to do it right. - -Fixed devopen() so that getline <"-" works. - -Changes from 2.12.14 to 2.12.15 -------------------------------- - -Changed config/* to a condensed form that can be used with mkconf to generate - a config.h from config.h-dist -- much easier to maintain. Please check - carefully against what you had before for a particular system and report - any problems. 
vms.h remains separate since the stuff at the bottom - didn't quite fit the mkconf model -- hopefully cleared up later. - -Fixed bug in grammar -- didn't allow function definition to be separated from - other rules by a semi-colon. - -VMS fix to #includes in missing.c -- should we just be including awk.h? - -Updated README for texinfo.tex version. - -Updating of copyright in all .[chy] files. - -Added but commented out Michal's fix to strftime. - -Added tzset() emulation based on Rick Adams' code. Added TZSET_MISSING to - config.h-dist. - -Added strftime.3 man page for missing_dir - -More posix: func, **, **= don't work in -W posix - -More lint: ^, ^= not in old awk - -gawk.1: removed ref to -DNO_DEV_FD, other minor updating. - -Style change: pushbak becomes pushback() in yylex(). - -Changes from 2.12.13 to 2.12.14 -------------------------------- - -Better (?) organization of awk.h -- attempt to keep all system dependencies - near the top and move some of the non-general things out of the config.h - files. - -Change to handling of SYSTEM_MISSING. - -Small change to ultrix config. - -Do "/dev/fd/*" etc. checking at runtime. - -First pass at VMS port. - -Improvements to error handling (when lexeme spans buffers). - -Fixed backslash handling -- why didn't I notice this sooner? - -Added programs from book to test suite and new target "bigtest" to Makefile. - -Changes from 2.12.12 to 2.12.13 -------------------------------- - -Recognize OFS and ORS specially so that OFS = 9 works without efficiency hit. - Took advantage of opportunity to tune do_print*() for about 10% win on a - print with 5 args (i.e. small but significant). - -Somewhat pervasive changes to reconcile CONVFMT vs. OFMT. - -Better initialization of builtin vars. - -Make config/* consistent wrt STRTOL_MISSING. - -Small portability improvement to alloca.s - -Improvements to lint code in awk.y - -Replaced strtol() with a better one by Chris Torek. - -Changes from 2.12.11 to 2.12.12 -------------------------------- - -Added PORTS file to record successful ports. - -Added #define const to nothing if not STDC and added const to strtod() header. - -Added * to printf capabilities and partially implemented ' ' and '+' (has an - effect for %d only, silently ignored for other formats). I'm afraid that's - as far as I want to go before I look at a complete replacement for - do_sprintf(). - -Added warning for /regexp/ on LHS of MATCHOP. - -Changes from 2.12.10 to 2.12.11 -------------------------------- - -Small Makefile improvements. - -Some remaining nits from the NeXT port. - -Got rid of bcopy() define in awk.h -- not needed anymore (??) - -Changed private in builtin.c -- it is special on Sequent. - -Added subset implementation of strtol() and STRTOL_MISSING. - -A little bit of cleanup in debug.c, dfa.c. - -Changes from 2.12.9 to 2.12.10 ------------------------------- - -Redid compatability checking and checking for # of args. - -Removed all references to variables[] from outside awk.y, in preparation - for a more abstract interface to the symbol table. - -Got rid of a remaining use of bcopy() in regex.c. - -Changes from 2.12.8 to 2.12.9 ------------------------------ - -Portability improvements for atari, next and decstation. - -Bug fix in substr() -- wasn't handling 3rd arg. of -1 properly. - -Manpage updates. - -Moved support from src release to doc release. - -Updated FUTURES file. - -Added some "lint" warnings. - -Changes from 2.12.7 to 2.12.8 ------------------------------ - -Changed time() to systime(). 
- -Changed warning() in snode() to fatal(). - -strftime() now defaults second arg. to current time. - -Changes from 2.12.6 to 2.12.7 ------------------------------ - -Fixed bug in sub_common() involving inadequate allocation of a buffer. - -Added some missing files to the Makefile. - -Changes from 2.12.5 to 2.12.6 ------------------------------ - -Fixed bug wherein non-redirected getline could call iop_close() just - prior to a call from do_input(). - -Fixed bug in handling of /dev/stdout and /dev/stderr. - -Changes from 2.12.4 to 2.12.5 ------------------------------ - -Updated README and support directory. - -Changes from 2.12.3 to 2.12.4 ------------------------------ - -Updated CHANGES and TODO (should have been done in previous 2 patches). - -Changes from 2.12.2 to 2.12.3 ------------------------------ - -Brought regex.c and alloca.s into line with current FSF versions. - -Changes from 2.12.1 to 2.12.2 ------------------------------ - -Portability improvements; mostly moving system prototypes out of awk.h - -Introduction of strftime. - -Use of CONVFMT. - -Changes from 2.12 to 2.12.1 ------------------------------ - -Consolidated treatment of command-line assignments (thus correcting the --v treatment). - -Rationalized builtin-variable handling into a table-driven process, thus -simplifying variable() and eliminating spc_var(). - -Fixed bug in handling of command-line source that ended in a newline. - -Simplified install() and lookup(). - -Did away with double-mallocing of identifiers and now free second and later -instances of a name, after the first gets installed into the symbol table. - -Treat IGNORECASE specially, simplifying a lot of code, and allowing -checking against strict conformance only on setting it, rather than on each -pattern match. - -Fixed regexp matching when IGNORECASE is non-zero (broken when dfa.c was -added). - -Fixed bug where $0 was not being marked as valid, even after it was rebuilt. -This caused mangling of $0. - - -Changes from 2.11.1 to 2.12 ------------------------------ - -Makefile: - -Portability improvements in Makefile. -Move configuration stuff into config.h - -FSF files: - -Synchronized alloca.[cs] and regex.[ch] with FSF. - -array.c: - -Rationalized hash routines into one with a different algorithm. -delete() now works if the array is a local variable. -Changed interface of assoc_next() and avoided dereferencing past the end of the - array. - -awk.h: - -Merged non-prototype and prototype declarations in awk.h. -Expanded tree_eval #define to short-circuit more calls of r_tree_eval(). - -awk.y: - -Delinted some of the code in the grammar. -Fixed and improved some of the error message printing. -Changed to accomodate unlimited length source lines. -Line continuation now works as advertised. -Source lines can be arbitrarily long. -Refined grammar hacks so that /= assignment works. Regular expressions - starting with /= are recognized at the beginning of a line, after && or || - and after ~ or !~. More contexts can be added if necessary. -Fixed IGNORECASE (multiple scans for backslash). -Condensed expression_lists in array references. -Detect and warn for correct # args in builtin functions -- call most of them - with a fixed number (i.e. fill in defaults at parse-time rather than at - run-time). -Load ENVIRON only if it is referenced (detected at parse-time). -Treat NF, FS, RS, NR, FNR specially at parse time, to improve run time. -Fold constant expressions at parse time. -Do make_regexp() on third arg. of split() at parse tiem if it is a constant. 
- -builtin.c: - -srand() returns 0 the first time called. -Replaced alloca() with malloc() in do_sprintf(). -Fixed setting of RSTART and RLENGTH in do_match(). -Got rid of get_{one,two,three} and allowance for variable # of args. at - run-time -- this is now done at parse-time. -Fixed latent bug in [g]sub whereby changes to $0 would never get made. -Rewrote much of sub_common() for simplicity and performance. -Added ctime() and time() builtin functions (unless -DSTRICT). ctime() returns - a time string like the C function, given the number of seconds since the epoch - and time() returns the current time in seconds. -do_sprintf() now checks for mismatch between format string and number of - arguments supplied. - -dfa.c - -This is borrowed (almost unmodified) from GNU grep to provide faster searches. - -eval.c - -Node_var, Node_var_array and Node_param_list handled from macro rather - than in r_tree_eval(). -Changed cmp_nodes() to not do a force_number() -- this, combined with a - force_number() on ARGV[] and ENVIRON[] brings it into line with other awks -Greatly simplified cmp_nodes(). -Separated out Node_NF, Node_FS, Node_RS, Node_NR and Node_FNR in get_lhs(). -All adjacent string concatenations now done at once. - -field.c - -Added support for FIELDWIDTHS. -Fixed bug in get_field() whereby changes to a field were not always - properly reflected in $0. -Reordered tests in parse_field() so that reference off the end of the buffer - doesn't happen. -set_FS() now sets *parse_field i.e. routine to call depending on type of FS. -It also does make_regexp() for FS if needed. get_field() passes FS_regexp - to re_parse_field(), as does do_split(). -Changes to set_field() and set_record() to avoid malloc'ing and free'ing the - field nodes repeatedly. The fields now just point into $0 unless they are - assigned to another variable or changed. force_number() on the field is - *only* done when the field is needed. - -gawk.1 - -Fixed troff formatting problem on .TP lines. - -io.c - -Moved some code out into iop.c. -Output from pipes and system() calls is properly synchronized. -Status from pipe close properly returned. -Bug in getline with no redirect fixed. - -iop.c - -This file contains a totally revamped get_a_record and associated code. - -main.c - -Command line programs no longer use a temporary file. -Therefore, tmpnam() no longer required. -Deprecated -a and -e options -- they will go away in the next release, - but for now they cause a warning. -Moved -C, -V, -c options to -W ala posix. -Added -W posix option: throw out \x -Added -W lint option. - - -node.c - -force_number() now allows pure numerics to have leading whitespace. -Added make_string facility to optimize case of adding an already malloc'd - string. -Cleaned up and simplified do_deref(). -Fixed bug in handling of stref==255 in do_deref(). - -re.c - -contains the interface to regexp code - -Changes from 2.11.1 to FSF version of same ------------------------------------------- -Thu Jan 4 14:19:30 1990 Jim Kingdon (kingdon at albert) - - * Makefile (YACC): Add -y to bison part. - - * missing.c: Add #include <stdio.h>. - -Sun Dec 24 16:16:05 1989 David J. MacKenzie (djm at hobbes.ai.mit.edu) - - * * Makefile: Add (commented out) default defines for Sony News. - - * awk.h: Move declaration of vprintf so it will compile when - -DVPRINTF_MISSING is defined. - -Mon Nov 13 18:54:08 1989 Robert J. 
Chassell (bob at apple-gunkies.ai.mit.edu) - - * gawk.texinfo: changed @-commands that are not part of the - standard, currently released texinfmt.el to those that are. - Otherwise, only people with the as-yet unreleased makeinfo.c can - format this file. - -Changes from 2.11beta to 2.11.1 (production) --------------------------------------------- - -Went from "beta" to production status!!! - -Now flushes stdout before closing pipes or redirected files to -synchronize output. - -MS-DOS changes added in. - -Signal handler return type parameterized in Makefile and awk.h and -some lint removed. debug.c cleaned up. - -Fixed FS splitting to never match null strings, per book. - -Correction to the manual's description of FS. - -Some compilers break on char *foo = "string" + 4 so fixed version.sh and -main.c. - -Changes from 2.10beta to 2.11beta ---------------------------------- - -This release fixes all reported bugs that we could reproduce. Probably -some of the changes are not documented here. - -The next release will probably not be a beta release! - -The most important change is the addition of the -nostalgia option. :-) - -The documentation has been improved and brought up-to-date. - -There has been a lot of general cleaning up of the code that is not otherwise -documented here. There has been a movement toward using standard-conforming -library routines and providing them (in missing.d) for systems lacking them. -Improved (hopefully) configuration through Makfile modifications and missing.c. -In particular, straightened out confusion over vprintf #defines, declarations -etc. - -Deleted RCS log comments from source, to reduce source size by about one third. -Most of them were horribly out-of-date, anyway. - -Renamed source files to reflect (for the most part) their contents. - -More and improved error messages. Cleanup and fixes to yyerror(). -String constants are not altered in input buffer, so error messages come out -better. Fixed usage message. Make use of ANSI C strerror() function -(provided). - -Plugged many more memory leaks. The memory consumption is now quite -reasonable over a wide range of programs. - -Uses volatile declaration if STDC > 0 to avoid problems due to longjmp. - -New -a and -e options to use awk or egrep style regexps, respectively, -since POSIX says awk should use egrep regexps. Default is -a. - -Added -v option for setting variables before the first file is encountered. -Version information now uses -V and copyleft uses -C. - -Added a patchlevel.h file and its use for -V and -C. - -Append_right() optimized for major improvement to programs with a *lot* -of statements. - -Operator precedence has been corrected to match draft Posix. - -Tightened up grammar for builtin functions so that only length -may be called without arguments or parentheses. - -/regex/ is now a normal expression that can appear in any expression -context. - -Allow /= to begin a regexp. Allow ..[../..].. in a regexp. - -Allow empty compound statements ({}). - -Made return and next illegal outside a function and in BEGIN/END respectively. - -Division by zero is now illegal and causes a fatal error. - -Fixed exponentiation so that x ^ 0 and x ^= 0 both return 1. - -Fixed do_sqrt, do_log, and do_exp to do argument/return checking and -print an error message, per the manual. - -Fixed main to catch SIGSEGV to get source and data file line numbers. - -Fixed yyerror to print the ^ at the beginning of the bad token, not the end. 
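[Editor's note: a small sketch tying together a few of the user-visible changes listed above -- the new -v option, x ^ 0 evaluating to 1, and /regex/ usable as an ordinary expression. The file names and the pattern are invented for illustration.]

    # invoked as:  gawk -v x=3 -f prog.awk datafile
    BEGIN { print x ^ 0 }        # prints 1 for any x
    { matched += /error/ }       # /error/ in expression context yields 1 or 0 per line
    END { print matched " matching lines" }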
- -Fix to substr() builtin: it was failing if the arguments -weren't already strings. - -Added new node value flag NUMERIC to indicate that a variable is -purely a number as opposed to type NUM which indicates that -the node's numeric value is valid. This is set in make_number(), -tmp_number and r_force_number() when appropriate and used in -cmp_nodes(). This fixed a bug in comparison of variables that had -numeric prefixes. The new code uses strtod() and eliminates is_a_number(). -A simple strtod() is provided for systems lacking one. It does no -overflow checking, so could be improved. - -Simplification and efficiency improvement in force_string. - -Added performance tweak in r_force_number(). - -Fixed a bug with nested loops and break/continue in functions. - -Fixed inconsistency in handling of empty fields when $0 has to be rebuilt. -Happens to simplify rebuild_record(). - -Cleaned up the code associated with opening a pipe for reading. Gawk -now has its own popen routine (gawk_popen) that allocates an IOBUF -and keeps track of the pid of the child process. gawk_pclose -marks the appropriate child as defunct in the right struct redirect. - -Cleaned up and fixed close_redir(). - -Fixed an obscure bug to do with redirection. Intermingled ">" and ">>" -redirects did not output in a predictable order. - -Improved handling of output buffering: now all print[f]s redirected to a tty -or pipe are flushed immediately and non-redirected output to a tty is flushed -before the next input record is read. - -Fixed a bug in get_a_record() where bcopy() could have copied over -a random pointer. - -Fixed a bug when RS="" and records separated by multiple blank lines. - -Got rid of SLOWIO code which was out-of-date anyway. - -Fix in get_field() for case where $0 is changed and then $(n) are -changed and then $0 is used. - -Fixed infinite loop on failure to open file for reading from getline. -Now handles redirect file open failures properly. - -Filenames such as /dev/stdin now allowed on the command line as well as -in redirects. - -Fixed so that gawk '$1' where $1 is a zero tests false. - -Fixed parsing so that `RLENGTH -1' parses the same as `RLENGTH - 1', -for example. - -The return from a user-defined function now defaults to the Null node. -This fixes a core-dump-causing bug when the return value of a function -is used and that function returns no value. - -Now catches floating point exceptions to avoid core dumps. - -Bug fix for deleting elements of an array -- under some conditions, it was -deleting more than one element at a time. - -Fix in AWKPATH code for running off the end of the string. - -Fixed handling of precision in *printf calls. %0.2d now works properly, -as does %c. [s]printf now recognizes %i and %X. - -Fixed a bug in printing of very large (>240) strings. - -Cleaned up erroneous behaviour for RS == "". - -Added IGNORECASE support to index(). - -Simplified and fixed newnode/freenode. - -Fixed reference to $(anything) in a BEGIN block. - -Eliminated use of USG rand48(). - -Bug fix in force_string for machines with 16-bit ints. - -Replaced use of mktemp() with tmpnam() and provided a partial implementation of -the latter for systems that don't have it. - -Added a portability check for includes in io.c. - -Minor portability fix in alloc.c plus addition of xmalloc(). - -Portability fix: on UMAX4.2, st_blksize is zero for a pipe, thus breaking -iop_alloc() -- fixed. - -Workaround for compiler bug on Sun386i in do_sprintf. - -More and improved prototypes in awk.h. 
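[Editor's note: to make the *printf entry above concrete, a sketch whose commented output assumes the fixed behaviour described there.]

    BEGIN {
        printf "%0.2d\n", 7          # "07"  -- precision now handled properly
        printf "%c\n", 65            # "A"   -- numeric argument printed as a character
        printf "%i %X\n", 255, 255   # "255 FF" -- %i and %X now recognized
    }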
- -Consolidated C escape parsing code into one place. - -strict flag is now turned on only when invoked with compatability option. -It now applies to fewer things. - -Changed cast of f._ptr in vprintf.c from (unsigned char *) to (char *). -Hopefully this is right for the systems that use this code (I don't). - -Support for pipes under MSDOS added. diff --git a/gnu/usr.bin/awk/PORTS b/gnu/usr.bin/awk/PORTS deleted file mode 100644 index 5087a43..0000000 --- a/gnu/usr.bin/awk/PORTS +++ /dev/null @@ -1,35 +0,0 @@ -A recent version of gawk has been successfully compiled and run "make test" -on the following: - -Sun 4/490 running 4.1 -NeXT running 2.0 -DECstation 3100 running Ultrix 4.0 or Ultrix 3.1 (different config) -AtariST (16-bit ints, gcc compiler, byacc, running under TOS) -ESIX V.3.2 Rev D (== System V Release 3.2), the 386. compiler was gcc + bison -IBM RS/6000 (see README.rs6000) -486 running SVR4, using cc and bison -SGI running IRIX 3.3 using gcc (fails with cc) -Sequent Balance running Dynix V3.1 -Cray Y-MP8 running Unicos 6.0.11 -Cray 2 running Unicos 6.1 (modulo trailing zeroes in chem) -VAX/VMS V5.x (should also work on 4.6 and 4.7) -VMS POSIX V1.0, V1.1 -OpenVMS AXP V1.0 -MSDOS - Microsoft C 5.1, compiles and runs very simple testing -BSD 4.4alpha - -From: ghazi@noc.rutgers.edu (Kaveh R. Ghazi): - -arch configured as: ----- -------------- -Dec Alpha OSF 1.3 osf1 -Hpux 9.0 hpux8x -NeXTStep 2.0 next20 -Sgi Irix 4.0.5 (/bin/cc) sgi405.cc -Stardent Titan 1500 OSv2.5 sysv3 -Stardent Vistra (i860) SVR4 sysv4 -Solaris 2.3 solaris2.cc -SunOS 4.1.3 sunos41 -Tektronix XD88 (UTekV 3.2e) sysv3 -Tektronix 4300 (UTek 4.0) utek -Ultrix 4.2 ultrix41 diff --git a/gnu/usr.bin/awk/POSIX b/gnu/usr.bin/awk/POSIX deleted file mode 100644 index f240542..0000000 --- a/gnu/usr.bin/awk/POSIX +++ /dev/null @@ -1,95 +0,0 @@ -Right now, the numeric vs. string comparisons are screwed up in draft -11.2. What prompted me to check it out was the note in gnu.bug.utils -which observed that gawk was doing the comparison $1 == "000" -numerically. I think that we can agree that intuitively, this should -be done as a string comparison. Version 2.13.2 of gawk follows the -current POSIX draft. Following is how I (now) think this -stuff should be done. - -1. A numeric literal or the result of a numeric operation has the NUMERIC - attribute. - -2. A string literal or the result of a string operation has the STRING - attribute. - -3. Fields, getline input, FILENAME, ARGV elements, ENVIRON elements and the - elements of an array created by split() that are numeric strings - have the STRNUM attribute. Otherwise, they have the STRING attribute. - Uninitialized variables also have the STRNUM attribute. - -4. Attributes propagate across assignments, but are not changed by - any use. (Although a use may cause the entity to acquire an additional - value such that it has both a numeric and string value -- this leaves the - attribute unchanged.) - -When two operands are compared, either string comparison or numeric comparison -may be used, depending on the attributes of the operands, according to the -following (symmetric) matrix: - - +---------------------------------------------- - | STRING NUMERIC STRNUM ---------+---------------------------------------------- - | -STRING | string string string - | -NUMERIC | string numeric numeric - | -STRNUM | string numeric numeric ---------+---------------------------------------------- - -So, the following program should print all OKs. 
- -echo '0e2 0a 0 0b -0e2 0a 0 0b' | -$AWK ' -NR == 1 { - num = 0 - str = "0e2" - - print ++test ": " ( (str == "0e2") ? "OK" : "OOPS" ) - print ++test ": " ( ("0e2" != 0) ? "OK" : "OOPS" ) - print ++test ": " ( ("0" != $2) ? "OK" : "OOPS" ) - print ++test ": " ( ("0e2" == $1) ? "OK" : "OOPS" ) - - print ++test ": " ( (0 == "0") ? "OK" : "OOPS" ) - print ++test ": " ( (0 == num) ? "OK" : "OOPS" ) - print ++test ": " ( (0 != $2) ? "OK" : "OOPS" ) - print ++test ": " ( (0 == $1) ? "OK" : "OOPS" ) - - print ++test ": " ( ($1 != "0") ? "OK" : "OOPS" ) - print ++test ": " ( ($1 == num) ? "OK" : "OOPS" ) - print ++test ": " ( ($2 != 0) ? "OK" : "OOPS" ) - print ++test ": " ( ($2 != $1) ? "OK" : "OOPS" ) - print ++test ": " ( ($3 == 0) ? "OK" : "OOPS" ) - print ++test ": " ( ($3 == $1) ? "OK" : "OOPS" ) - print ++test ": " ( ($2 != $4) ? "OK" : "OOPS" ) # 15 -} -{ - a = "+2" - b = 2 - if (NR % 2) - c = a + b - print ++test ": " ( (a != b) ? "OK" : "OOPS" ) # 16 and 22 - - d = "2a" - b = 2 - if (NR % 2) - c = d + b - print ++test ": " ( (d != b) ? "OK" : "OOPS" ) - - print ++test ": " ( (d + 0 == b) ? "OK" : "OOPS" ) - - e = "2" - print ++test ": " ( (e == b "") ? "OK" : "OOPS" ) - - a = "2.13" - print ++test ": " ( (a == 2.13) ? "OK" : "OOPS" ) - - a = "2.130000" - print ++test ": " ( (a != 2.13) ? "OK" : "OOPS" ) - - if (NR == 2) { - CONVFMT = "%.6f" - print ++test ": " ( (a == 2.13) ? "OK" : "OOPS" ) - } -}' diff --git a/gnu/usr.bin/awk/PROBLEMS b/gnu/usr.bin/awk/PROBLEMS deleted file mode 100644 index a436180..0000000 --- a/gnu/usr.bin/awk/PROBLEMS +++ /dev/null @@ -1,10 +0,0 @@ -This is a list of known problems in gawk 2.15. -Hopefully they will all be fixed in the next major release of gawk. - -Please keep in mind that the code is still undergoing significant evolution. - -1. The interactions with the lexer and yyerror need reworking. It is possible - to get line numbers that are one line off if --compat or --posix is - true and either `next file' or `delete array' are used. - - Really the whole lexical analysis stuff needs reworking. diff --git a/gnu/usr.bin/awk/README b/gnu/usr.bin/awk/README deleted file mode 100644 index 90ed9c2..0000000 --- a/gnu/usr.bin/awk/README +++ /dev/null @@ -1,125 +0,0 @@ -README: - -This is GNU Awk 2.15. It should be upwardly compatible with the System -V Release 4 awk. It is almost completely compliant with POSIX 1003.2. - -This release adds new features -- see NEWS for details. - -See the installation instructions, below. - -Known problems are given in the PROBLEMS file. Work to be done is -described briefly in the FUTURES file. Verified ports are listed in -the PORTS file. Changes in this version are summarized in the NEWS file. -Please read the LIMITATIONS and ACKNOWLEDGMENT files. - -Read the file POSIX for a discussion of how the standard says comparisons -should be done vs. how they really should be done and how gawk does them. - -To format the documentation with TeX, you must use texinfo.tex 2.53 -or later. Otherwise footnotes look unacceptable. - -If you wish to remake the Info files, you should use makeinfo. The 2.15 -version of makeinfo works with no errors. - -The man page is up to date. - -INSTALLATION: - -Check whether there is a system-specific README file for your system. - -A quick overview of the installation process is in the file INSTALL. - -Makefile.in may need some tailoring. The only changes necessary should -be to change installation targets or to change compiler flags. 
-The changes to make in Makefile.in are commented and should be obvious. - -All other changes should be made in a config file. Samples for -various systems are included in the config directory. Starting with -2.11, our intent has been to make the code conform to standards (ANSI, -POSIX, SVID, in that order) whenever possible, and to not penalize -standard conforming systems. We have included substitute versions of -routines not universally available. Simply add the appropriate define -for the missing feature(s) on your system. - -If you have neither bison nor yacc, use the awktab.c file here. It was -generated with bison, and should have no AT&T code in it. (Note that -modifying awk.y without bison or yacc will be difficult, at best. You might -want to get a copy of bison from the FSF too.) - -If no config file is included for your system, start by copying one -for a similar system. One way of determining the defines needed is to -try to load gawk with nothing defined and see what routines are -unresolved by the loader. This should give you a good idea of how to -proceed. - -The next release will use the FSF autoconfig program, so we are no longer -soliciting new config files. - -If you have an MS-DOS or OS/2 system, use the stuff in the pc directory. -For an Atari there is an atari directory and similarly one for VMS. - -Chapter 16 of The GAWK Manual discusses configuration in detail. -(However, it does not discuss OS/2 configuration, see README.pc for -the details. The manual is being massively revised for 2.16.) - -After successful compilation, do 'make test' to run a small test -suite. There should be no output from the 'cmp' invocations except in -the cases where there are small differences in floating point values. -If there are other differences, please investigate and report the -problem. - -PRINTING THE MANUAL - -The 'support' directory contains texinfo.tex 2.115, which will be necessary -for printing the manual, and the texindex.c program from the texinfo -distribution which is also necessary. See the makefile for the steps needed -to get a DVI file from the manual. - -CAVEATS - -The existence of a patchlevel.h file does *N*O*T* imply a commitment on -our part to issue bug fixes or patches. It is there in case we should -decide to do so. - -BUG REPORTS AND FIXES (Un*x systems): - -Please coordinate changes through David Trueman and/or Arnold Robbins. - -David Trueman -Department of Mathematics, Statistics and Computing Science, -Dalhousie University, Halifax, Nova Scotia, Canada - -UUCP: {uunet utai watmath}!dalcs!david -INTERNET: david@cs.dal.ca - -Arnold Robbins -1736 Reindeer Drive -Atlanta, GA, 30329-3528, USA - -INTERNET: arnold@skeeve.atl.ga.us -UUCP: { gatech, emory, emoryu1 }!skeeve!arnold - -BUG REPORTS AND FIXES (non-Unix ports): - -MS-DOS: - Scott Deifik - AMGEN Inc. - Amgen Center, Bldg.17-Dept.393 - Thousand Oaks, CA 91320-1789 - Tel-805-499-5725 ext.4677 - Fax-805-498-0358 - scottd@amgen.com - -VMS: - Pat Rankin - rankin@eql.caltech.edu (e-mail only) - -Atari ST: - Michal Jaegermann - michal@gortel.phys.ualberta.ca (e-mail only) - -OS/2: - Kai Uwe Rommel - rommel@ars.muc.de (e-mail only) - Darrel Hankerson - hankedr@mail.auburn.edu (e-mail only) diff --git a/gnu/usr.bin/awk/array.c b/gnu/usr.bin/awk/array.c deleted file mode 100644 index 9166c4e..0000000 --- a/gnu/usr.bin/awk/array.c +++ /dev/null @@ -1,515 +0,0 @@ -/* - * array.c - routines for associative arrays. - */ - -/* - * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc. 
- * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. - * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -/* - * Tree walks (``for (iggy in foo)'') and array deletions use expensive - * linear searching. So what we do is start out with small arrays and - * grow them as needed, so that our arrays are hopefully small enough, - * most of the time, that they're pretty full and we're not looking at - * wasted space. - * - * The decision is made to grow the array if the average chain length is - * ``too big''. This is defined as the total number of entries in the table - * divided by the size of the array being greater than some constant. - */ - -#define AVG_CHAIN_MAX 10 /* don't want to linear search more than this */ - -#include "awk.h" - -static NODE *assoc_find P((NODE *symbol, NODE *subs, int hash1)); -static void grow_table P((NODE *symbol)); - -NODE * -concat_exp(tree) -register NODE *tree; -{ - register NODE *r; - char *str; - char *s; - size_t len; - int offset; - size_t subseplen; - char *subsep; - - if (tree->type != Node_expression_list) - return force_string(tree_eval(tree)); - r = force_string(tree_eval(tree->lnode)); - if (tree->rnode == NULL) - return r; - subseplen = SUBSEP_node->lnode->stlen; - subsep = SUBSEP_node->lnode->stptr; - len = r->stlen + subseplen + 2; - emalloc(str, char *, len, "concat_exp"); - memcpy(str, r->stptr, r->stlen+1); - s = str + r->stlen; - free_temp(r); - tree = tree->rnode; - while (tree) { - if (subseplen == 1) - *s++ = *subsep; - else { - memcpy(s, subsep, subseplen+1); - s += subseplen; - } - r = force_string(tree_eval(tree->lnode)); - len += r->stlen + subseplen; - offset = s - str; - erealloc(str, char *, len, "concat_exp"); - s = str + offset; - memcpy(s, r->stptr, r->stlen+1); - s += r->stlen; - free_temp(r); - tree = tree->rnode; - } - r = make_str_node(str, s - str, ALREADY_MALLOCED); - r->flags |= TEMP; - return r; -} - -/* Flush all the values in symbol[] before doing a split() */ -void -assoc_clear(symbol) -NODE *symbol; -{ - int i; - NODE *bucket, *next; - - if (symbol->var_array == 0) - return; - for (i = 0; i < symbol->array_size; i++) { - for (bucket = symbol->var_array[i]; bucket; bucket = next) { - next = bucket->ahnext; - unref(bucket->ahname); - unref(bucket->ahvalue); - freenode(bucket); - } - symbol->var_array[i] = 0; - } - free(symbol->var_array); - symbol->var_array = NULL; - symbol->array_size = symbol->table_size = 0; - symbol->flags &= ~ARRAYMAXED; -} - -/* - * calculate the hash function of the string in subs - */ -unsigned int -hash(s, len, hsize) -register const char *s; -register size_t len; -unsigned long hsize; -{ - register unsigned long h = 0; - -#ifdef this_is_really_slow - - register unsigned long g; - - while (len--) { - h = (h << 4) + *s++; - g = (h & 0xf0000000); - if (g) { - h = h ^ (g >> 24); - h = h ^ g; - } - } - 
-#else /* this_is_really_slow */ -/* - * This is INCREDIBLY ugly, but fast. We break the string up into 8 byte - * units. On the first time through the loop we get the "leftover bytes" - * (strlen % 8). On every other iteration, we perform 8 HASHC's so we handle - * all 8 bytes. Essentially, this saves us 7 cmp & branch instructions. If - * this routine is heavily used enough, it's worth the ugly coding. - * - * OZ's original sdbm hash, copied from Margo Seltzers db package. - * - */ - -/* Even more speed: */ -/* #define HASHC h = *s++ + 65599 * h */ -/* Because 65599 = pow(2,6) + pow(2,16) - 1 we multiply by shifts */ -#define HASHC htmp = (h << 6); \ - h = *s++ + htmp + (htmp << 10) - h - - unsigned long htmp; - - h = 0; - -#if defined(VAXC) -/* - * [This was an implementation of "Duff's Device", but it has been - * redone, separating the switch for extra iterations from the loop. - * This is necessary because the DEC VAX-C compiler is STOOPID.] - */ - switch (len & (8 - 1)) { - case 7: HASHC; - case 6: HASHC; - case 5: HASHC; - case 4: HASHC; - case 3: HASHC; - case 2: HASHC; - case 1: HASHC; - default: break; - } - - if (len > (8 - 1)) { - register size_t loop = len >> 3; - do { - HASHC; - HASHC; - HASHC; - HASHC; - HASHC; - HASHC; - HASHC; - HASHC; - } while (--loop); - } -#else /* !VAXC */ - /* "Duff's Device" for those who can handle it */ - if (len > 0) { - register size_t loop = (len + 8 - 1) >> 3; - - switch (len & (8 - 1)) { - case 0: - do { /* All fall throughs */ - HASHC; - case 7: HASHC; - case 6: HASHC; - case 5: HASHC; - case 4: HASHC; - case 3: HASHC; - case 2: HASHC; - case 1: HASHC; - } while (--loop); - } - } -#endif /* !VAXC */ -#endif /* this_is_really_slow - not */ - - if (h >= hsize) - h %= hsize; - return h; -} - -/* - * locate symbol[subs] - */ -static NODE * /* NULL if not found */ -assoc_find(symbol, subs, hash1) -NODE *symbol; -register NODE *subs; -int hash1; -{ - register NODE *bucket, *prev = 0; - - for (bucket = symbol->var_array[hash1]; bucket; bucket = bucket->ahnext) { - if (cmp_nodes(bucket->ahname, subs) == 0) { -#if 0 - /* - * Disable this code for now. It screws things up if we have - * a ``for (iggy in foo)'' in progress. Interestingly enough, - * this was not a problem in 2.15.3, only in 2.15.4. I'm not - * sure why it works in 2.15.3. - */ - if (prev) { /* move found to front of chain */ - prev->ahnext = bucket->ahnext; - bucket->ahnext = symbol->var_array[hash1]; - symbol->var_array[hash1] = bucket; - } -#endif - return bucket; - } else - prev = bucket; /* save previous list entry */ - } - return NULL; -} - -/* - * test whether the array element symbol[subs] exists or not - */ -int -in_array(symbol, subs) -NODE *symbol, *subs; -{ - register int hash1; - - if (symbol->type == Node_param_list) - symbol = stack_ptr[symbol->param_cnt]; - if (symbol->var_array == 0) - return 0; - subs = concat_exp(subs); /* concat_exp returns a string node */ - hash1 = hash(subs->stptr, subs->stlen, (unsigned long) symbol->array_size); - if (assoc_find(symbol, subs, hash1) == NULL) { - free_temp(subs); - return 0; - } else { - free_temp(subs); - return 1; - } -} - -/* - * SYMBOL is the address of the node (or other pointer) being dereferenced. - * SUBS is a number or string used as the subscript. - * - * Find SYMBOL[SUBS] in the assoc array. Install it with value "" if it - * isn't there. 
Returns a pointer ala get_lhs to where its value is stored - */ -NODE ** -assoc_lookup(symbol, subs) -NODE *symbol, *subs; -{ - register int hash1; - register NODE *bucket; - - (void) force_string(subs); - - if (symbol->var_array == 0) { - symbol->type = Node_var_array; - symbol->array_size = symbol->table_size = 0; /* sanity */ - symbol->flags &= ~ARRAYMAXED; - grow_table(symbol); - hash1 = hash(subs->stptr, subs->stlen, - (unsigned long) symbol->array_size); - } else { - hash1 = hash(subs->stptr, subs->stlen, - (unsigned long) symbol->array_size); - bucket = assoc_find(symbol, subs, hash1); - if (bucket != NULL) { - free_temp(subs); - return &(bucket->ahvalue); - } - } - - /* It's not there, install it. */ - if (do_lint && subs->stlen == 0) - warning("subscript of array `%s' is null string", - symbol->vname); - - /* first see if we would need to grow the array, before installing */ - symbol->table_size++; - if ((symbol->flags & ARRAYMAXED) == 0 - && symbol->table_size/symbol->array_size > AVG_CHAIN_MAX) { - grow_table(symbol); - /* have to recompute hash value for new size */ - hash1 = hash(subs->stptr, subs->stlen, - (unsigned long) symbol->array_size); - } - - getnode(bucket); - bucket->type = Node_ahash; - if (subs->flags & TEMP) - bucket->ahname = dupnode(subs); - else { - unsigned int saveflags = subs->flags; - - subs->flags &= ~MALLOC; - bucket->ahname = dupnode(subs); - subs->flags = saveflags; - } - free_temp(subs); - - /* array subscripts are strings */ - bucket->ahname->flags &= ~NUMBER; - bucket->ahname->flags |= STRING; - bucket->ahvalue = Nnull_string; - bucket->ahnext = symbol->var_array[hash1]; - symbol->var_array[hash1] = bucket; - return &(bucket->ahvalue); -} - -void -do_delete(symbol, tree) -NODE *symbol, *tree; -{ - register int hash1; - register NODE *bucket, *last; - NODE *subs; - - if (symbol->type == Node_param_list) - symbol = stack_ptr[symbol->param_cnt]; - if (symbol->var_array == 0) - return; - subs = concat_exp(tree); /* concat_exp returns string node */ - hash1 = hash(subs->stptr, subs->stlen, (unsigned long) symbol->array_size); - - last = NULL; - for (bucket = symbol->var_array[hash1]; bucket; last = bucket, bucket = bucket->ahnext) - if (cmp_nodes(bucket->ahname, subs) == 0) - break; - free_temp(subs); - if (bucket == NULL) - return; - if (last) - last->ahnext = bucket->ahnext; - else - symbol->var_array[hash1] = bucket->ahnext; - unref(bucket->ahname); - unref(bucket->ahvalue); - freenode(bucket); - symbol->table_size--; - if (symbol->table_size <= 0) { - memset(symbol->var_array, '\0', - sizeof(NODE *) * symbol->array_size); - symbol->table_size = symbol->array_size = 0; - symbol->flags &= ~ARRAYMAXED; - free((char *) symbol->var_array); - symbol->var_array = NULL; - } -} - -void -assoc_scan(symbol, lookat) -NODE *symbol; -struct search *lookat; -{ - lookat->sym = symbol; - lookat->idx = 0; - lookat->bucket = NULL; - lookat->retval = NULL; - if (symbol->var_array != NULL) - assoc_next(lookat); -} - -void -assoc_next(lookat) -struct search *lookat; -{ - register NODE *symbol = lookat->sym; - - if (symbol == NULL) - fatal("null symbol in assoc_next"); - if (symbol->var_array == NULL || lookat->idx > symbol->array_size) { - lookat->retval = NULL; - return; - } - /* - * This is theoretically unsafe. The element bucket might have - * been freed if the body of the scan did a delete on the next - * element of the bucket. The only way to do that is by array - * reference, which is unlikely. 
Basically, if the user is doing - * anything other than an operation on the current element of an - * assoc array while walking through it sequentially, all bets are - * off. (The safe way is to register all search structs on an - * array with the array, and update all of them on a delete or - * insert) - */ - if (lookat->bucket != NULL) { - lookat->retval = lookat->bucket->ahname; - lookat->bucket = lookat->bucket->ahnext; - return; - } - for (; lookat->idx < symbol->array_size; lookat->idx++) { - NODE *bucket; - - if ((bucket = symbol->var_array[lookat->idx]) != NULL) { - lookat->retval = bucket->ahname; - lookat->bucket = bucket->ahnext; - lookat->idx++; - return; - } - } - lookat->retval = NULL; - lookat->bucket = NULL; - return; -} - -/* grow_table --- grow a hash table */ - -static void -grow_table(symbol) -NODE *symbol; -{ - NODE **old, **new, *chain, *next; - int i, j; - unsigned long hash1; - unsigned long oldsize, newsize; - /* - * This is an array of primes. We grow the table by an order of - * magnitude each time (not just doubling) so that growing is a - * rare operation. We expect, on average, that it won't happen - * more than twice. The final size is also chosen to be small - * enough so that MS-DOG mallocs can handle it. When things are - * very large (> 8K), we just double more or less, instead of - * just jumping from 8K to 64K. - */ - static long sizes[] = { 13, 127, 1021, 8191, 16381, 32749, 65497 }; - - /* find next biggest hash size */ - oldsize = symbol->array_size; - newsize = 0; - for (i = 0, j = sizeof(sizes)/sizeof(sizes[0]); i < j; i++) { - if (oldsize < sizes[i]) { - newsize = sizes[i]; - break; - } - } - - if (newsize == oldsize) { /* table already at max (!) */ - symbol->flags |= ARRAYMAXED; - return; - } - - /* allocate new table */ - emalloc(new, NODE **, newsize * sizeof(NODE *), "grow_table"); - memset(new, '\0', newsize * sizeof(NODE *)); - - /* brand new hash table, set things up and return */ - if (symbol->var_array == NULL) { - symbol->table_size = 0; - goto done; - } - - /* old hash table there, move stuff to new, free old */ - old = symbol->var_array; - for (i = 0; i < oldsize; i++) { - if (old[i] == NULL) - continue; - - for (chain = old[i]; chain != NULL; chain = next) { - next = chain->ahnext; - hash1 = hash(chain->ahname->stptr, - chain->ahname->stlen, newsize); - - /* remove from old list, add to new */ - chain->ahnext = new[hash1]; - new[hash1] = chain; - - } - } - free(old); - -done: - /* - * note that symbol->table_size does not change if an old array, - * and is explicitly set to 0 if a new one. - */ - symbol->var_array = new; - symbol->array_size = newsize; -} diff --git a/gnu/usr.bin/awk/awk.1 b/gnu/usr.bin/awk/awk.1 deleted file mode 100644 index 1b58bec..0000000 --- a/gnu/usr.bin/awk/awk.1 +++ /dev/null @@ -1,1969 +0,0 @@ -.ds PX \s-1POSIX\s+1 -.ds UX \s-1UNIX\s+1 -.ds AN \s-1ANSI\s+1 -.TH AWK 1 "Apr 18 1994" "Free Software Foundation" "Utility Commands" -.SH NAME -awk \- GNU awk pattern scanning and processing language -.SH SYNOPSIS -.B awk -[ POSIX or GNU style options ] -.B \-f -.I program-file -[ -.B \-\^\- -] file .\^.\^. -.br -.B awk -[ POSIX or GNU style options ] -[ -.B \-\^\- -] -.I program-text -file .\^.\^. -.SH DESCRIPTION -.I Gawk -is the GNU Project's implementation of the AWK programming language. -It conforms to the definition of the language in -the \*(PX 1003.2 Command Language And Utilities Standard. 
-This version in turn is based on the description in -.IR "The AWK Programming Language" , -by Aho, Kernighan, and Weinberger, -with the additional features defined in the System V Release 4 version -of \*(UX -.IR awk . -.I Gawk -also provides some GNU-specific extensions. -.PP -The command line consists of options to -.I awk -itself, the AWK program text (if not supplied via the -.B \-f -or -.B \-\^\-file -options), and values to be made -available in the -.B ARGC -and -.B ARGV -pre-defined AWK variables. -.SH OPTIONS -.PP -.I Gawk -options may be either the traditional \*(PX one letter options, -or the GNU style long options. \*(PX style options start with a single ``\-'', -while GNU long options start with ``\-\^\-''. -GNU style long options are provided for both GNU-specific features and -for \*(PX mandated features. Other implementations of the AWK language -are likely to only accept the traditional one letter options. -.PP -Following the \*(PX standard, -.IR awk -specific -options are supplied via arguments to the -.B \-W -option. Multiple -.B \-W -options may be supplied, or multiple arguments may be supplied together -if they are separated by commas, or enclosed in quotes and separated -by white space. -Case is ignored in arguments to the -.B \-W -option. -Each -.B \-W -option has a corresponding GNU style long option, as detailed below. -Arguments to GNU style long options are either joined with the option -by an -.B = -sign, with no intervening spaces, or they may be provided in the -next command line argument. -.PP -.I Gawk -accepts the following options. -.TP -.PD 0 -.BI \-F " fs" -.TP -.PD -.BI \-\^\-field-separator= fs -Use -.I fs -for the input field separator (the value of the -.B FS -predefined -variable). -.TP -.PD 0 -\fB\-v\fI var\fB\^=\^\fIval\fR -.TP -.PD -\fB\-\^\-assign=\fIvar\fB\^=\^\fIval\fR -Assign the value -.IR val , -to the variable -.IR var , -before execution of the program begins. -Such variable values are available to the -.B BEGIN -block of an AWK program. -.TP -.PD 0 -.BI \-f " program-file" -.TP -.PD -.BI \-\^\-file= program-file -Read the AWK program source from the file -.IR program-file , -instead of from the first command line argument. -Multiple -.B \-f -(or -.BR \-\^\-file ) -options may be used. -.TP -.PD 0 -.BI \-mf= NNN -.TP -.BI \-mr= NNN -Set various memory limits to the value -.IR NNN . -The -.B f -flag sets the maximum number of fields, and the -.B r -flag sets the maximum record size. These two flags and the -.B \-m -option are from the AT&T Bell Labs research version of \*(UX -.IR awk . -They are ignored by -.IR awk , -since -.I awk -has no pre-defined limits. -.TP \w'\fB\-\^\-copyright\fR'u+1n -.PD 0 -.B "\-W compat" -.TP -.PD -.B \-\^\-compat -Run in -.I compatibility -mode. In compatibility mode, -.I awk -behaves identically to \*(UX -.IR awk ; -none of the GNU-specific extensions are recognized. -See -.BR "GNU EXTENSIONS" , -below, for more information. -.TP -.PD 0 -.B "\-W copyleft" -.TP -.PD 0 -.B "\-W copyright" -.TP -.PD 0 -.B \-\^\-copyleft -.TP -.PD -.B \-\^\-copyright -Print the short version of the GNU copyright information message on -the error output. -.TP -.PD 0 -.B "\-W help" -.TP -.PD 0 -.B "\-W usage" -.TP -.PD 0 -.B \-\^\-help -.TP -.PD -.B \-\^\-usage -Print a relatively short summary of the available options on -the error output. -Per the GNU Coding Standards, these options cause an immediate, -successful exit. 
-.TP -.PD 0 -.B "\-W lint" -.TP -.PD 0 -.B \-\^\-lint -Provide warnings about constructs that are -dubious or non-portable to other AWK implementations. -.ig -.\" This option is left undocumented, on purpose. -.TP -.PD 0 -.B "\-W nostalgia" -.TP -.PD -.B \-\^\-nostalgia -Provide a moment of nostalgia for long time -.I awk -users. -.. -.TP -.PD 0 -.B "\-W posix" -.TP -.PD -.B \-\^\-posix -This turns on -.I compatibility -mode, with the following additional restrictions: -.RS -.TP \w'\(bu'u+1n -\(bu -.B \ex -escape sequences are not recognized. -.TP -\(bu -The synonym -.B func -for the keyword -.B function -is not recognized. -.TP -\(bu -The operators -.B ** -and -.B **= -cannot be used in place of -.B ^ -and -.BR ^= . -.RE -.TP -.PD 0 -.BI "\-W source=" program-text -.TP -.PD -.BI \-\^\-source= program-text -Use -.I program-text -as AWK program source code. -This option allows the easy intermixing of library functions (used via the -.B \-f -and -.B \-\^\-file -options) with source code entered on the command line. -It is intended primarily for medium to large size AWK programs used -in shell scripts. -.sp .5 -The -.B "\-W source=" -form of this option uses the rest of the command line argument for -.IR program-text ; -no other options to -.B \-W -will be recognized in the same argument. -.TP -.PD 0 -.B "\-W version" -.TP -.PD -.B \-\^\-version -Print version information for this particular copy of -.I awk -on the error output. -This is useful mainly for knowing if the current copy of -.I awk -on your system -is up to date with respect to whatever the Free Software Foundation -is distributing. -Per the GNU Coding Standards, these options cause an immediate, -successful exit. -.TP -.B \-\^\- -Signal the end of options. This is useful to allow further arguments to the -AWK program itself to start with a ``\-''. -This is mainly for consistency with the argument parsing convention used -by most other \*(PX programs. -.PP -In compatibility mode, -any other options are flagged as illegal, but are otherwise ignored. -In normal operation, as long as program text has been supplied, unknown -options are passed on to the AWK program in the -.B ARGV -array for processing. This is particularly useful for running AWK -programs via the ``#!'' executable interpreter mechanism. -.SH AWK PROGRAM EXECUTION -.PP -An AWK program consists of a sequence of pattern-action statements -and optional function definitions. -.RS -.PP -\fIpattern\fB { \fIaction statements\fB }\fR -.br -\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR -.RE -.PP -.I Gawk -first reads the program source from the -.IR program-file (s) -if specified, -from arguments to -.BR "\-W source=" , -or from the first non-option argument on the command line. -The -.B \-f -and -.B "\-W source=" -options may be used multiple times on the command line. -.I Gawk -will read the program text as if all the -.IR program-file s -and command line source texts -had been concatenated together. This is useful for building libraries -of AWK functions, without having to include them in each new AWK -program that uses them. It also provides the ability to mix library -functions with command line programs. -.PP -The environment variable -.B AWKPATH -specifies a search path to use when finding source files named with -the -.B \-f -option. If this variable does not exist, the default path is -\fB".:/usr/lib/awk:/usr/local/lib/awk"\fR. -If a file name given to the -.B \-f -option contains a ``/'' character, no path search is performed. 
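[Editor's note: a hedged illustration of mixing a library file with command-line source as described above; the file names and the max() function are invented for the example.]

    # contents of libmax.awk, found via AWKPATH:
    function max(a, b) { return a > b ? a : b }

    # command-line use, mixing the library with inline source:
    #   gawk -f libmax.awk -W source='{ print max($1, $2) }' data
    # or equivalently:
    #   gawk -f libmax.awk --source='{ print max($1, $2) }' data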
-.PP -.I Gawk -executes AWK programs in the following order. -First, -all variable assignments specified via the -.B \-v -option are performed. -Next, -.I awk -compiles the program into an internal form. -Then, -.I awk -executes the code in the -.B BEGIN -block(s) (if any), -and then proceeds to read -each file named in the -.B ARGV -array. -If there are no files named on the command line, -.I awk -reads the standard input. -.PP -If a filename on the command line has the form -.IB var = val -it is treated as a variable assignment. The variable -.I var -will be assigned the value -.IR val . -(This happens after any -.B BEGIN -block(s) have been run.) -Command line variable assignment -is most useful for dynamically assigning values to the variables -AWK uses to control how input is broken into fields and records. It -is also useful for controlling state if multiple passes are needed over -a single data file. -.PP -If the value of a particular element of -.B ARGV -is empty (\fB""\fR), -.I awk -skips over it. -.PP -For each line in the input, -.I awk -tests to see if it matches any -.I pattern -in the AWK program. -For each pattern that the line matches, the associated -.I action -is executed. -The patterns are tested in the order they occur in the program. -.PP -Finally, after all the input is exhausted, -.I awk -executes the code in the -.B END -block(s) (if any). -.SH VARIABLES AND FIELDS -AWK variables are dynamic; they come into existence when they are -first used. Their values are either floating-point numbers or strings, -or both, -depending upon how they are used. AWK also has one dimensional -arrays; arrays with multiple dimensions may be simulated. -Several pre-defined variables are set as a program -runs; these will be described as needed and summarized below. -.SS Fields -.PP -As each input line is read, -.I awk -splits the line into -.IR fields , -using the value of the -.B FS -variable as the field separator. -If -.B FS -is a single character, fields are separated by that character. -Otherwise, -.B FS -is expected to be a full regular expression. -In the special case that -.B FS -is a single blank, fields are separated -by runs of blanks and/or tabs. -Note that the value of -.B IGNORECASE -(see below) will also affect how fields are split when -.B FS -is a regular expression. -.PP -If the -.B FIELDWIDTHS -variable is set to a space separated list of numbers, each field is -expected to have fixed width, and -.I awk -will split up the record using the specified widths. The value of -.B FS -is ignored. -Assigning a new value to -.B FS -overrides the use of -.BR FIELDWIDTHS , -and restores the default behavior. -.PP -Each field in the input line may be referenced by its position, -.BR $1 , -.BR $2 , -and so on. -.B $0 -is the whole line. The value of a field may be assigned to as well. -Fields need not be referenced by constants: -.RS -.PP -.ft B -n = 5 -.br -print $n -.ft R -.RE -.PP -prints the fifth field in the input line. -The variable -.B NF -is set to the total number of fields in the input line. -.PP -References to non-existent fields (i.e. fields after -.BR $NF ) -produce the null-string. However, assigning to a non-existent field -(e.g., -.BR "$(NF+2) = 5" ) -will increase the value of -.BR NF , -create any intervening fields with the null string as their value, and -cause the value of -.B $0 -to be recomputed, with the fields being separated by the value of -.BR OFS . -References to negative numbered fields cause a fatal error. 
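[Editor's note: a short sketch of the field behaviour just described; the input line "a b c" is assumed.]

    # With the input line "a b c" (NF is 3):
    {
        OFS = "-"
        $(NF+2) = "x"     # creates $4 as the null string and sets $5
        print NF          # 5
        print $0          # a-b-c--x  ($0 recomputed using OFS)
    }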
-.SS Built-in Variables -.PP -AWK's built-in variables are: -.PP -.TP \w'\fBFIELDWIDTHS\fR'u+1n -.B ARGC -The number of command line arguments (does not include options to -.IR awk , -or the program source). -.TP -.B ARGIND -The index in -.B ARGV -of the current file being processed. -.TP -.B ARGV -Array of command line arguments. The array is indexed from -0 to -.B ARGC -\- 1. -Dynamically changing the contents of -.B ARGV -can control the files used for data. -.TP -.B CONVFMT -The conversion format for numbers, \fB"%.6g"\fR, by default. -.TP -.B ENVIRON -An array containing the values of the current environment. -The array is indexed by the environment variables, each element being -the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be -.BR /u/arnold ). -Changing this array does not affect the environment seen by programs which -.I awk -spawns via redirection or the -.B system() -function. -(This may change in a future version of -.IR awk .) -.\" but don't hold your breath... -.TP -.B ERRNO -If a system error occurs either doing a redirection for -.BR getline , -during a read for -.BR getline , -or during a -.BR close() , -then -.B ERRNO -will contain -a string describing the error. -.TP -.B FIELDWIDTHS -A white-space separated list of fieldwidths. When set, -.I awk -parses the input into fields of fixed width, instead of using the -value of the -.B FS -variable as the field separator. -The fixed field width facility is still experimental; expect the -semantics to change as -.I awk -evolves over time. -.TP -.B FILENAME -The name of the current input file. -If no files are specified on the command line, the value of -.B FILENAME -is ``\-''. -However, -.B FILENAME -is undefined inside the -.B BEGIN -block. -.TP -.B FNR -The input record number in the current input file. -.TP -.B FS -The input field separator, a blank by default. -.TP -.B IGNORECASE -Controls the case-sensitivity of all regular expression operations. If -.B IGNORECASE -has a non-zero value, then pattern matching in rules, -field splitting with -.BR FS , -regular expression -matching with -.B ~ -and -.BR !~ , -and the -.BR gsub() , -.BR index() , -.BR match() , -.BR split() , -and -.B sub() -pre-defined functions will all ignore case when doing regular expression -operations. Thus, if -.B IGNORECASE -is not equal to zero, -.B /aB/ -matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP, -and \fB"AB"\fP. -As with all AWK variables, the initial value of -.B IGNORECASE -is zero, so all regular expression operations are normally case-sensitive. -.TP -.B NF -The number of fields in the current input record. -.TP -.B NR -The total number of input records seen so far. -.TP -.B OFMT -The output format for numbers, \fB"%.6g"\fR, by default. -.TP -.B OFS -The output field separator, a blank by default. -.TP -.B ORS -The output record separator, by default a newline. -.TP -.B RS -The input record separator, by default a newline. -.B RS -is exceptional in that only the first character of its string -value is used for separating records. -(This will probably change in a future release of -.IR awk .) -If -.B RS -is set to the null string, then records are separated by -blank lines. -When -.B RS -is set to the null string, then the newline character always acts as -a field separator, in addition to whatever value -.B FS -may have. -.TP -.B RSTART -The index of the first character matched by -.BR match() ; -0 if no match. -.TP -.B RLENGTH -The length of the string matched by -.BR match() ; -\-1 if no match. 
-.TP -.B SUBSEP -The character used to separate multiple subscripts in array -elements, by default \fB"\e034"\fR. -.SS Arrays -.PP -Arrays are subscripted with an expression between square brackets -.RB ( [ " and " ] ). -If the expression is an expression list -.RI ( expr ", " expr " ...)" -then the array subscript is a string consisting of the -concatenation of the (string) value of each expression, -separated by the value of the -.B SUBSEP -variable. -This facility is used to simulate multiply dimensioned -arrays. For example: -.PP -.RS -.ft B -i = "A" ;\^ j = "B" ;\^ k = "C" -.br -x[i, j, k] = "hello, world\en" -.ft R -.RE -.PP -assigns the string \fB"hello, world\en"\fR to the element of the array -.B x -which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in AWK -are associative, i.e. indexed by string values. -.PP -The special operator -.B in -may be used in an -.B if -or -.B while -statement to see if an array has an index consisting of a particular -value. -.PP -.RS -.ft B -.nf -if (val in array) - print array[val] -.fi -.ft -.RE -.PP -If the array has multiple subscripts, use -.BR "(i, j) in array" . -.PP -The -.B in -construct may also be used in a -.B for -loop to iterate over all the elements of an array. -.PP -An element may be deleted from an array using the -.B delete -statement. -The -.B delete -statement may also be used to delete the entire contents of an array. -.SS Variable Typing And Conversion -.PP -Variables and fields -may be (floating point) numbers, or strings, or both. How the -value of a variable is interpreted depends upon its context. If used in -a numeric expression, it will be treated as a number, if used as a string -it will be treated as a string. -.PP -To force a variable to be treated as a number, add 0 to it; to force it -to be treated as a string, concatenate it with the null string. -.PP -When a string must be converted to a number, the conversion is accomplished -using -.IR atof (3). -A number is converted to a string by using the value of -.B CONVFMT -as a format string for -.IR sprintf (3), -with the numeric value of the variable as the argument. -However, even though all numbers in AWK are floating-point, -integral values are -.I always -converted as integers. Thus, given -.PP -.RS -.ft B -.nf -CONVFMT = "%2.2f" -a = 12 -b = a "" -.fi -.ft R -.RE -.PP -the variable -.B b -has a string value of \fB"12"\fR and not \fB"12.00"\fR. -.PP -.I Gawk -performs comparisons as follows: -If two variables are numeric, they are compared numerically. -If one value is numeric and the other has a string value that is a -``numeric string,'' then comparisons are also done numerically. -Otherwise, the numeric value is converted to a string and a string -comparison is performed. -Two strings are compared, of course, as strings. -According to the \*(PX standard, even if two strings are -numeric strings, a numeric comparison is performed. However, this is -clearly incorrect, and -.I awk -does not do this. -.PP -Uninitialized variables have the numeric value 0 and the string value "" -(the null, or empty, string). -.SH PATTERNS AND ACTIONS -AWK is a line oriented language. The pattern comes first, and then the -action. Action statements are enclosed in -.B { -and -.BR } . -Either the pattern may be missing, or the action may be missing, but, -of course, not both. If the pattern is missing, the action will be -executed for every single line of input. -A missing action is equivalent to -.RS -.PP -.B "{ print }" -.RE -.PP -which prints the entire line. 
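[Editor's note: a brief sketch of the pattern/action combinations just described; the patterns themselves are invented for illustration.]

    /error/                  # pattern only: the missing action defaults to { print }
    NF > 3   { print $1 }    # pattern with an explicit action
             { count++ }     # missing pattern: runs for every input line
    END      { print count " lines seen" }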
-.PP -Comments begin with the ``#'' character, and continue until the -end of the line. -Blank lines may be used to separate statements. -Normally, a statement ends with a newline, however, this is not the -case for lines ending in -a ``,'', ``{'', ``?'', ``:'', ``&&'', or ``||''. -Lines ending in -.B do -or -.B else -also have their statements automatically continued on the following line. -In other cases, a line can be continued by ending it with a ``\e'', -in which case the newline will be ignored. -.PP -Multiple statements may -be put on one line by separating them with a ``;''. -This applies to both the statements within the action part of a -pattern-action pair (the usual case), -and to the pattern-action statements themselves. -.SS Patterns -AWK patterns may be one of the following: -.PP -.RS -.nf -.B BEGIN -.B END -.BI / "regular expression" / -.I "relational expression" -.IB pattern " && " pattern -.IB pattern " || " pattern -.IB pattern " ? " pattern " : " pattern -.BI ( pattern ) -.BI ! " pattern" -.IB pattern1 ", " pattern2 -.fi -.RE -.PP -.B BEGIN -and -.B END -are two special kinds of patterns which are not tested against -the input. -The action parts of all -.B BEGIN -patterns are merged as if all the statements had -been written in a single -.B BEGIN -block. They are executed before any -of the input is read. Similarly, all the -.B END -blocks are merged, -and executed when all the input is exhausted (or when an -.B exit -statement is executed). -.B BEGIN -and -.B END -patterns cannot be combined with other patterns in pattern expressions. -.B BEGIN -and -.B END -patterns cannot have missing action parts. -.PP -For -.BI / "regular expression" / -patterns, the associated statement is executed for each input line that matches -the regular expression. -Regular expressions are the same as those in -.IR egrep (1), -and are summarized below. -.PP -A -.I "relational expression" -may use any of the operators defined below in the section on actions. -These generally test whether certain fields match certain regular expressions. -.PP -The -.BR && , -.BR || , -and -.B ! -operators are logical AND, logical OR, and logical NOT, respectively, as in C. -They do short-circuit evaluation, also as in C, and are used for combining -more primitive pattern expressions. As in most languages, parentheses -may be used to change the order of evaluation. -.PP -The -.B ?\^: -operator is like the same operator in C. If the first pattern is true -then the pattern used for testing is the second pattern, otherwise it is -the third. Only one of the second and third patterns is evaluated. -.PP -The -.IB pattern1 ", " pattern2 -form of an expression is called a -.IR "range pattern" . -It matches all input records starting with a line that matches -.IR pattern1 , -and continuing until a record that matches -.IR pattern2 , -inclusive. It does not combine with any other sort of pattern expression. -.SS Regular Expressions -Regular expressions are the extended kind found in -.IR egrep . -They are composed of characters as follows: -.TP \w'\fB[^\fIabc...\fB]\fR'u+2n -.I c -matches the non-metacharacter -.IR c . -.TP -.I \ec -matches the literal character -.IR c . -.TP -.B . -matches any character except newline. -.TP -.B ^ -matches the beginning of a line or a string. -.TP -.B $ -matches the end of a line or a string. -.TP -.BI [ abc... ] -character class, matches any of the characters -.IR abc... . -.TP -.BI [^ abc... ] -negated character class, matches any character except -.I abc... -and newline. 
-.TP -.IB r1 | r2 -alternation: matches either -.I r1 -or -.IR r2 . -.TP -.I r1r2 -concatenation: matches -.IR r1 , -and then -.IR r2 . -.TP -.IB r + -matches one or more -.IR r 's. -.TP -.IB r * -matches zero or more -.IR r 's. -.TP -.IB r ? -matches zero or one -.IR r 's. -.TP -.BI ( r ) -grouping: matches -.IR r . -.PP -The escape sequences that are valid in string constants (see below) -are also legal in regular expressions. -.SS Actions -Action statements are enclosed in braces, -.B { -and -.BR } . -Action statements consist of the usual assignment, conditional, and looping -statements found in most languages. The operators, control statements, -and input/output statements -available are patterned after those in C. -.SS Operators -.PP -The operators in AWK, in order of increasing precedence, are -.PP -.TP "\w'\fB*= /= %= ^=\fR'u+1n" -.PD 0 -.B "= += \-=" -.TP -.PD -.B "*= /= %= ^=" -Assignment. Both absolute assignment -.BI ( var " = " value ) -and operator-assignment (the other forms) are supported. -.TP -.B ?: -The C conditional expression. This has the form -.IB expr1 " ? " expr2 " : " expr3\c -\&. If -.I expr1 -is true, the value of the expression is -.IR expr2 , -otherwise it is -.IR expr3 . -Only one of -.I expr2 -and -.I expr3 -is evaluated. -.TP -.B || -Logical OR. -.TP -.B && -Logical AND. -.TP -.B "~ !~" -Regular expression match, negated match. -.B NOTE: -Do not use a constant regular expression -.RB ( /foo/ ) -on the left-hand side of a -.B ~ -or -.BR !~ . -Only use one on the right-hand side. The expression -.BI "/foo/ ~ " exp -has the same meaning as \fB(($0 ~ /foo/) ~ \fIexp\fB)\fR. -This is usually -.I not -what was intended. -.TP -.PD 0 -.B "< >" -.TP -.PD 0 -.B "<= >=" -.TP -.PD -.B "!= ==" -The regular relational operators. -.TP -.I blank -String concatenation. -.TP -.B "+ \-" -Addition and subtraction. -.TP -.B "* / %" -Multiplication, division, and modulus. -.TP -.B "+ \- !" -Unary plus, unary minus, and logical negation. -.TP -.B ^ -Exponentiation (\fB**\fR may also be used, and \fB**=\fR for -the assignment operator). -.TP -.B "++ \-\^\-" -Increment and decrement, both prefix and postfix. -.TP -.B $ -Field reference. -.SS Control Statements -.PP -The control statements are -as follows: -.PP -.RS -.nf -\fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR] -\fBwhile (\fIcondition\fB) \fIstatement \fR -\fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR -\fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR -\fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR -\fBbreak\fR -\fBcontinue\fR -\fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR -\fBdelete \fIarray\^\fR -\fBexit\fR [ \fIexpression\fR ] -\fB{ \fIstatements \fB} -.fi -.RE -.SS "I/O Statements" -.PP -The input/output statements are as follows: -.PP -.TP "\w'\fBprintf \fIfmt, expr-list\fR'u+1n" -.BI close( filename ) -Close file (or pipe, see below). -.TP -.B getline -Set -.B $0 -from next input record; set -.BR NF , -.BR NR , -.BR FNR . -.TP -.BI "getline <" file -Set -.B $0 -from next record of -.IR file ; -set -.BR NF . -.TP -.BI getline " var" -Set -.I var -from next input record; set -.BR NF , -.BR FNR . -.TP -.BI getline " var" " <" file -Set -.I var -from next record of -.IR file . -.TP -.B next -Stop processing the current input record. The next input record -is read and processing starts over with the first pattern in the -AWK program. If the end of the input data is reached, the -.B END -block(s), if any, are executed. -.TP -.B "next file" -Stop processing the current input file. 
The next input record read -comes from the next input file. -.B FILENAME -is updated, -.B FNR -is reset to 1, and processing starts over with the first pattern in the -AWK program. If the end of the input data is reached, the -.B END -block(s), if any, are executed. -.TP -.B print -Prints the current record. -.TP -.BI print " expr-list" -Prints expressions. -Each expression is separated by the value of the -.B OFS -variable. The output record is terminated with the value of the -.B ORS -variable. -.TP -.BI print " expr-list" " >" file -Prints expressions on -.IR file . -Each expression is separated by the value of the -.B OFS -variable. The output record is terminated with the value of the -.B ORS -variable. -.TP -.BI printf " fmt, expr-list" -Format and print. -.TP -.BI printf " fmt, expr-list" " >" file -Format and print on -.IR file . -.TP -.BI system( cmd-line ) -Execute the command -.IR cmd-line , -and return the exit status. -(This may not be available on non-\*(PX systems.) -.PP -Other input/output redirections are also allowed. For -.B print -and -.BR printf , -.BI >> file -appends output to the -.IR file , -while -.BI | " command" -writes on a pipe. -In a similar fashion, -.IB command " | getline" -pipes into -.BR getline . -The -.BR getline -command will return 0 on end of file, and \-1 on an error. -.SS The \fIprintf\fP\^ Statement -.PP -The AWK versions of the -.B printf -statement and -.B sprintf() -function -(see below) -accept the following conversion specification formats: -.TP -.B %c -An \s-1ASCII\s+1 character. -If the argument used for -.B %c -is numeric, it is treated as a character and printed. -Otherwise, the argument is assumed to be a string, and the only first -character of that string is printed. -.TP -.B %d -A decimal number (the integer part). -.TP -.B %i -Just like -.BR %d . -.TP -.B %e -A floating point number of the form -.BR [\-]d.ddddddE[+\^\-]dd . -.TP -.B %f -A floating point number of the form -.BR [\-]ddd.dddddd . -.TP -.B %g -Use -.B e -or -.B f -conversion, whichever is shorter, with nonsignificant zeros suppressed. -.TP -.B %o -An unsigned octal number (again, an integer). -.TP -.B %s -A character string. -.TP -.B %x -An unsigned hexadecimal number (an integer). -.TP -.B %X -Like -.BR %x , -but using -.B ABCDEF -instead of -.BR abcdef . -.TP -.B %% -A single -.B % -character; no argument is converted. -.PP -There are optional, additional parameters that may lie between the -.B % -and the control letter: -.TP -.B \- -The expression should be left-justified within its field. -.TP -.I width -The field should be padded to this width. If the number has a leading -zero, then the field will be padded with zeros. -Otherwise it is padded with blanks. -This applies even to the non-numeric output formats. -.TP -.BI . prec -A number indicating the maximum width of strings or digits to the right -of the decimal point. -.PP -The dynamic -.I width -and -.I prec -capabilities of the \*(AN C -.B printf() -routines are supported. -A -.B * -in place of either the -.B width -or -.B prec -specifications will cause their values to be taken from -the argument list to -.B printf -or -.BR sprintf() . -.SS Special File Names -.PP -When doing I/O redirection from either -.B print -or -.B printf -into a file, -or via -.B getline -from a file, -.I awk -recognizes certain special filenames internally. These filenames -allow access to open file descriptors inherited from -.IR awk 's -parent process (usually the shell). 
-Other special filenames provide access information about the running -.B awk -process. -The filenames are: -.TP \w'\fB/dev/stdout\fR'u+1n -.B /dev/pid -Reading this file returns the process ID of the current process, -in decimal, terminated with a newline. -.TP -.B /dev/ppid -Reading this file returns the parent process ID of the current process, -in decimal, terminated with a newline. -.TP -.B /dev/pgrpid -Reading this file returns the process group ID of the current process, -in decimal, terminated with a newline. -.TP -.B /dev/user -Reading this file returns a single record terminated with a newline. -The fields are separated with blanks. -.B $1 -is the value of the -.IR getuid (2) -system call, -.B $2 -is the value of the -.IR geteuid (2) -system call, -.B $3 -is the value of the -.IR getgid (2) -system call, and -.B $4 -is the value of the -.IR getegid (2) -system call. -If there are any additional fields, they are the group IDs returned by -.IR getgroups (2). -Multiple groups may not be supported on all systems. -.TP -.B /dev/stdin -The standard input. -.TP -.B /dev/stdout -The standard output. -.TP -.B /dev/stderr -The standard error output. -.TP -.BI /dev/fd/\^ n -The file associated with the open file descriptor -.IR n . -.PP -These are particularly useful for error messages. For example: -.PP -.RS -.ft B -print "You blew it!" > "/dev/stderr" -.ft R -.RE -.PP -whereas you would otherwise have to use -.PP -.RS -.ft B -print "You blew it!" | "cat 1>&2" -.ft R -.RE -.PP -These file names may also be used on the command line to name data files. -.SS Numeric Functions -.PP -AWK has the following pre-defined arithmetic functions: -.PP -.TP \w'\fBsrand(\^\fIexpr\^\fB)\fR'u+1n -.BI atan2( y , " x" ) -returns the arctangent of -.I y/x -in radians. -.TP -.BI cos( expr ) -returns the cosine in radians. -.TP -.BI exp( expr ) -the exponential function. -.TP -.BI int( expr ) -truncates to integer. -.TP -.BI log( expr ) -the natural logarithm function. -.TP -.B rand() -returns a random number between 0 and 1. -.TP -.BI sin( expr ) -returns the sine in radians. -.TP -.BI sqrt( expr ) -the square root function. -.TP -.BI srand( expr ) -use -.I expr -as a new seed for the random number generator. If no -.I expr -is provided, the time of day will be used. -The return value is the previous seed for the random -number generator. -.SS String Functions -.PP -AWK has the following pre-defined string functions: -.PP -.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n" -\fBgsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR -for each substring matching the regular expression -.I r -in the string -.IR t , -substitute the string -.IR s , -and return the number of substitutions. -If -.I t -is not supplied, use -.BR $0 . -.TP -.BI index( s , " t" ) -returns the index of the string -.I t -in the string -.IR s , -or 0 if -.I t -is not present. -.TP -.BI length( s ) -returns the length of the string -.IR s , -or the length of -.B $0 -if -.I s -is not supplied. -.TP -.BI match( s , " r" ) -returns the position in -.I s -where the regular expression -.I r -occurs, or 0 if -.I r -is not present, and sets the values of -.B RSTART -and -.BR RLENGTH . -.TP -\fBsplit(\fIs\fB, \fIa\fB, \fIr\fB)\fR -splits the string -.I s -into the array -.I a -on the regular expression -.IR r , -and returns the number of fields. If -.I r -is omitted, -.B FS -is used instead. -The array -.I a -is cleared first. -.TP -.BI sprintf( fmt , " expr-list" ) -prints -.I expr-list -according to -.IR fmt , -and returns the resulting string. 
-.TP -\fBsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR -just like -.BR gsub() , -but only the first matching substring is replaced. -.TP -\fBsubstr(\fIs\fB, \fIi\fB, \fIn\fB)\fR -returns the -.IR n -character -substring of -.I s -starting at -.IR i . -If -.I n -is omitted, the rest of -.I s -is used. -.TP -.BI tolower( str ) -returns a copy of the string -.IR str , -with all the upper-case characters in -.I str -translated to their corresponding lower-case counterparts. -Non-alphabetic characters are left unchanged. -.TP -.BI toupper( str ) -returns a copy of the string -.IR str , -with all the lower-case characters in -.I str -translated to their corresponding upper-case counterparts. -Non-alphabetic characters are left unchanged. -.SS Time Functions -.PP -Since one of the primary uses of AWK programs is processing log files -that contain time stamp information, -.I awk -provides the following two functions for obtaining time stamps and -formatting them. -.PP -.TP "\w'\fBsystime()\fR'u+1n" -.B systime() -returns the current time of day as the number of seconds since the Epoch -(Midnight UTC, January 1, 1970 on \*(PX systems). -.TP -\fBstrftime(\fIformat\fR, \fItimestamp\fB)\fR -formats -.I timestamp -according to the specification in -.IR format. -The -.I timestamp -should be of the same form as returned by -.BR systime() . -If -.I timestamp -is missing, the current time of day is used. -See the specification for the -.B strftime() -function in \*(AN C for the format conversions that are -guaranteed to be available. -A public-domain version of -.IR strftime (3) -and a man page for it are shipped with -.IR awk ; -if that version was used to build -.IR awk , -then all of the conversions described in that man page are available to -.IR awk. -.SS String Constants -.PP -String constants in AWK are sequences of characters enclosed -between double quotes (\fB"\fR). Within strings, certain -.I "escape sequences" -are recognized, as in C. These are: -.PP -.TP \w'\fB\e\^\fIddd\fR'u+1n -.B \e\e -A literal backslash. -.TP -.B \ea -The ``alert'' character; usually the \s-1ASCII\s+1 \s-1BEL\s+1 character. -.TP -.B \eb -backspace. -.TP -.B \ef -form-feed. -.TP -.B \en -new line. -.TP -.B \er -carriage return. -.TP -.B \et -horizontal tab. -.TP -.B \ev -vertical tab. -.TP -.BI \ex "\^hex digits" -The character represented by the string of hexadecimal digits following -the -.BR \ex . -As in \*(AN C, all following hexadecimal digits are considered part of -the escape sequence. -(This feature should tell us something about language design by committee.) -E.g., \fB"\ex1B"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character. -.TP -.BI \e ddd -The character represented by the 1-, 2-, or 3-digit sequence of octal -digits. E.g. \fB"\e033"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character. -.TP -.BI \e c -The literal character -.IR c\^ . -.PP -The escape sequences may also be used inside constant regular expressions -(e.g., -.B "/[\ \et\ef\en\er\ev]/" -matches whitespace characters). -.SH FUNCTIONS -Functions in AWK are defined as follows: -.PP -.RS -\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR -.RE -.PP -Functions are executed when called from within the action parts of regular -pattern-action statements. Actual parameters supplied in the function -call are used to instantiate the formal parameters declared in the function. -Arrays are passed by reference, other variables are passed by value. 
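-.PP
-The following fragment is an illustrative sketch (it is not part of the
-original manual text, and the function name \fBsetvals\fP is invented
-for the example); it shows the parameter passing rules just described:
-the array argument is modified in the caller, while the scalar argument
-is not.
-.PP
-.RS
-.ft B
-.nf
-function setvals(arr, x) {
-    arr["key"] = "changed"   # arrays are passed by reference
-    x = 99                   # scalars are passed by value
-}
-
-BEGIN {
-    a["key"] = "original"; n = 1
-    setvals(a, n)
-    print a["key"], n        # prints: changed 1
-}
-.fi
-.ft R
-.RE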
-.PP -Since functions were not originally part of the AWK language, the provision -for local variables is rather clumsy: They are declared as extra parameters -in the parameter list. The convention is to separate local variables from -real parameters by extra spaces in the parameter list. For example: -.PP -.RS -.ft B -.nf -function f(p, q, a, b) { # a & b are local - ..... } - -/abc/ { ... ; f(1, 2) ; ... } -.fi -.ft R -.RE -.PP -The left parenthesis in a function call is required -to immediately follow the function name, -without any intervening white space. -This is to avoid a syntactic ambiguity with the concatenation operator. -This restriction does not apply to the built-in functions listed above. -.PP -Functions may call each other and may be recursive. -Function parameters used as local variables are initialized -to the null string and the number zero upon function invocation. -.PP -The word -.B func -may be used in place of -.BR function . -.SH EXAMPLES -.nf -Print and sort the login names of all users: - -.ft B - BEGIN { FS = ":" } - { print $1 | "sort" } - -.ft R -Count lines in a file: - -.ft B - { nlines++ } - END { print nlines } - -.ft R -Precede each line by its number in the file: - -.ft B - { print FNR, $0 } - -.ft R -Concatenate and line number (a variation on a theme): - -.ft B - { print NR, $0 } -.ft R -.fi -.SH SEE ALSO -.IR egrep (1), -.IR getpid (2), -.IR getppid (2), -.IR getpgrp (2), -.IR getuid (2), -.IR geteuid (2), -.IR getgid (2), -.IR getegid (2), -.IR getgroups (2) -.PP -.IR "The AWK Programming Language" , -Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger, -Addison-Wesley, 1988. ISBN 0-201-07981-X. -.PP -.IR "The GAWK Manual" , -Edition 0.15, published by the Free Software Foundation, 1993. -.SH POSIX COMPATIBILITY -A primary goal for -.I awk -is compatibility with the \*(PX standard, as well as with the -latest version of \*(UX -.IR awk . -To this end, -.I awk -incorporates the following user visible -features which are not described in the AWK book, -but are part of -.I awk -in System V Release 4, and are in the \*(PX standard. -.PP -The -.B \-v -option for assigning variables before program execution starts is new. -The book indicates that command line variable assignment happens when -.I awk -would otherwise open the argument as a file, which is after the -.B BEGIN -block is executed. However, in earlier implementations, when such an -assignment appeared before any file names, the assignment would happen -.I before -the -.B BEGIN -block was run. Applications came to depend on this ``feature.'' -When -.I awk -was changed to match its documentation, this option was added to -accommodate applications that depended upon the old behavior. -(This feature was agreed upon by both the AT&T and GNU developers.) -.PP -The -.B \-W -option for implementation specific features is from the \*(PX standard. -.PP -When processing arguments, -.I awk -uses the special option ``\fB\-\^\-\fP'' to signal the end of -arguments. -In compatibility mode, it will warn about, but otherwise ignore, -undefined options. -In normal operation, such arguments are passed on to the AWK program for -it to process. -.PP -The AWK book does not define the return value of -.BR srand() . -The System V Release 4 version of \*(UX -.I awk -(and the \*(PX standard) -has it return the seed it was using, to allow keeping track -of random number sequences. Therefore -.B srand() -in -.I awk -also returns its current seed. 
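-.PP
-As an illustrative sketch (not part of the original manual text), the
-return value of
-.B srand()
-can be used to make a run repeatable; reseeding with the same value
-restarts the same sequence of
-.B rand()
-values:
-.PP
-.RS
-.ft B
-.nf
-BEGIN {
-    old = srand(42)    # seed the generator; old holds the previous seed
-    r1 = rand()
-    srand(42)          # reseed with the same value
-    r2 = rand()
-    same = (r1 == r2)
-    print same, old    # same is 1; old is whatever seed was in use before
-}
-.fi
-.ft R
-.RE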
-.PP -Other new features are: -The use of multiple -.B \-f -options (from MKS -.IR awk ); -the -.B ENVIRON -array; the -.BR \ea , -and -.BR \ev -escape sequences (done originally in -.I awk -and fed back into AT&T's); the -.B tolower() -and -.B toupper() -built-in functions (from AT&T); and the \*(AN C conversion specifications in -.B printf -(done first in AT&T's version). -.SH GNU EXTENSIONS -.I Gawk -has some extensions to \*(PX -.IR awk . -They are described in this section. All the extensions described here -can be disabled by -invoking -.I awk -with the -.B "\-W compat" -option. -.PP -The following features of -.I awk -are not available in -\*(PX -.IR awk . -.RS -.TP \w'\(bu'u+1n -\(bu -The -.B \ex -escape sequence. -.TP -\(bu -The -.B systime() -and -.B strftime() -functions. -.TP -\(bu -The special file names available for I/O redirection are not recognized. -.TP -\(bu -The -.B ARGIND -and -.B ERRNO -variables are not special. -.TP -\(bu -The -.B IGNORECASE -variable and its side-effects are not available. -.TP -\(bu -The -.B FIELDWIDTHS -variable and fixed width field splitting. -.TP -\(bu -No path search is performed for files named via the -.B \-f -option. Therefore the -.B AWKPATH -environment variable is not special. -.TP -\(bu -The use of -.B "next file" -to abandon processing of the current input file. -.TP -\(bu -The use of -.BI delete " array" -to delete the entire contents of an array. -.RE -.PP -The AWK book does not define the return value of the -.B close() -function. -.IR Gawk\^ 's -.B close() -returns the value from -.IR fclose (3), -or -.IR pclose (3), -when closing a file or pipe, respectively. -.PP -When -.I awk -is invoked with the -.B "\-W compat" -option, -if the -.I fs -argument to the -.B \-F -option is ``t'', then -.B FS -will be set to the tab character. -Since this is a rather ugly special case, it is not the default behavior. -This behavior also does not occur if -.B "\-W posix" -has been specified. -.ig -.PP -If -.I awk -was compiled for debugging, it will -accept the following additional options: -.TP -.PD 0 -.B \-Wparsedebug -.TP -.PD -.B \-\^\-parsedebug -Turn on -.IR yacc (1) -or -.IR bison (1) -debugging output during program parsing. -This option should only be of interest to the -.I awk -maintainers, and may not even be compiled into -.IR awk . -.. -.SH HISTORICAL FEATURES -There are two features of historical AWK implementations that -.I awk -supports. -First, it is possible to call the -.B length() -built-in function not only with no argument, but even without parentheses! -Thus, -.RS -.PP -.ft B -a = length -.ft R -.RE -.PP -is the same as either of -.RS -.PP -.ft B -a = length() -.br -a = length($0) -.ft R -.RE -.PP -This feature is marked as ``deprecated'' in the \*(PX standard, and -.I awk -will issue a warning about its use if -.B "\-W lint" -is specified on the command line. -.PP -The other feature is the use of the -.B continue -statement outside the body of a -.BR while , -.BR for , -or -.B do -loop. Traditional AWK implementations have treated such usage as -equivalent to the -.B next -statement. -.I Gawk -will support this usage if -.B "\-W posix" -has not been specified. -.SH ENVIRONMENT VARIABLES -If -.B POSIXLY_CORRECT -exists in the environment, then -.I awk -behaves exactly as if -.B \-\-posix -had been specified on the command line. -If -.B \-\-lint -has been specified, -.I awk -will issue a warning message to this effect. 
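-.PP
-As an illustrative sketch (not part of the original manual text), the
-.BI delete " array"
-extension listed above under
-.B "GNU EXTENSIONS"
-clears an entire array at once; a portable program must delete the
-elements one at a time:
-.PP
-.RS
-.ft B
-.nf
-BEGIN {
-    count["a"] = 1; count["b"] = 2
-
-    # gawk extension: empty the whole array in one statement
-    delete count
-
-    # portable equivalent (POSIX awk):
-    # for (i in count) delete count[i]
-}
-.fi
-.ft R
-.RE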
-.SH BUGS -The -.B \-F -option is not necessary given the command line variable assignment feature; -it remains only for backwards compatibility. -.PP -If your system actually has support for -.B /dev/fd -and the associated -.BR /dev/stdin , -.BR /dev/stdout , -and -.B /dev/stderr -files, you may get different output from -.I awk -than you would get on a system without those files. When -.I awk -interprets these files internally, it synchronizes output to the standard -output with output to -.BR /dev/stdout , -while on a system with those files, the output is actually to different -open files. -Caveat Emptor. -.SH VERSION INFORMATION -This man page documents -.IR awk , -version 2.15. -.PP -Starting with the 2.15 version of -.IR awk , -the -.BR \-c , -.BR \-V , -.BR \-C , -.ig -.BR \-D , -.. -.BR \-a , -and -.B \-e -options of the 2.11 version are no longer recognized. -This fact will not even be documented in the manual page for version 2.16. -.SH AUTHORS -The original version of \*(UX -.I awk -was designed and implemented by Alfred Aho, -Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan -continues to maintain and enhance it. -.PP -Paul Rubin and Jay Fenlason, -of the Free Software Foundation, wrote -.IR gawk , -to be compatible with the original version of -.I awk -distributed in Seventh Edition \*(UX. -John Woods contributed a number of bug fixes. -David Trueman, with contributions -from Arnold Robbins, made -.I gawk -compatible with the new version of \*(UX -.IR awk . -.PP -The initial DOS port was done by Conrad Kwok and Scott Garfinkle. -Scott Deifik is the current DOS maintainer. Pat Rankin did the -port to VMS, and Michal Jaegermann did the port to the Atari ST. -The port to OS/2 was done by Kai Uwe Rommel, with contributions and -help from Darrel Hankerson. -.SH ACKNOWLEDGEMENTS -Brian Kernighan of Bell Labs -provided valuable assistance during testing and debugging. -We thank him. diff --git a/gnu/usr.bin/awk/awk.h b/gnu/usr.bin/awk/awk.h deleted file mode 100644 index 453a724..0000000 --- a/gnu/usr.bin/awk/awk.h +++ /dev/null @@ -1,790 +0,0 @@ -/* - * awk.h -- Definitions for gawk. - */ - -/* - * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. - * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
- */ - -/* ------------------------------ Includes ------------------------------ */ -#include "config.h" - -#include <stdio.h> -#ifndef LIMITS_H_MISSING -#include <limits.h> -#endif -#include <ctype.h> -#include <setjmp.h> -#include <varargs.h> -#include <time.h> -#include <errno.h> -#if !defined(errno) && !defined(MSDOS) && !defined(OS2) -extern int errno; -#endif -#ifdef __GNU_LIBRARY__ -#ifndef linux -#include <signum.h> -#endif -#endif - -/* ----------------- System dependencies (with more includes) -----------*/ - -#if defined(__FreeBSD__) -# include <floatingpoint.h> -#endif - -#if !defined(VMS) || (!defined(VAXC) && !defined(__DECC)) -#include <sys/types.h> -#include <sys/stat.h> -#else /* VMS w/ VAXC or DECC */ -#include <types.h> -#include <stat.h> -#include <file.h> /* avoid <fcntl.h> in io.c */ -#endif - -#include <signal.h> - -#ifdef __STDC__ -#define P(s) s -#define MALLOC_ARG_T size_t -#else -#define P(s) () -#define MALLOC_ARG_T unsigned -#define volatile -#define const -#endif - -#ifndef SIGTYPE -#define SIGTYPE void -#endif - -#ifdef SIZE_T_MISSING -typedef unsigned int size_t; -#endif - -#ifndef SZTC -#define SZTC -#define INTC -#endif - -#ifdef STDC_HEADERS -#include <stdlib.h> -#include <string.h> -#ifdef NeXT -#include <libc.h> -#undef atof -#else -#if defined(atarist) || defined(VMS) -#include <unixlib.h> -#else /* atarist || VMS */ -#if !defined(MSDOS) && !defined(_MSC_VER) -#include <unistd.h> -#endif /* MSDOS */ -#endif /* atarist || VMS */ -#endif /* Next */ -#else /* STDC_HEADERS */ -#include "protos.h" -#endif /* STDC_HEADERS */ - -#if defined(ultrix) && !defined(Ultrix41) -extern char * getenv P((char *name)); -extern double atof P((char *s)); -#endif - -#ifndef __GNUC__ -#ifdef sparc -/* nasty nasty SunOS-ism */ -#include <alloca.h> -#ifdef lint -extern char *alloca(); -#endif -#else /* not sparc */ -#if !defined(alloca) && !defined(ALLOCA_PROTO) -#if defined(_MSC_VER) -#include <malloc.h> -#else -extern char *alloca(); -#endif /* _MSC_VER */ -#endif -#endif /* sparc */ -#endif /* __GNUC__ */ - -#ifdef HAVE_UNDERSCORE_SETJMP -/* nasty nasty berkelixm */ -#define setjmp _setjmp -#define longjmp _longjmp -#endif - -/* - * if you don't have vprintf, try this and cross your fingers. 
- */ -#if defined(VPRINTF_MISSING) -#define vfprintf(fp,fmt,arg) _doprnt((fmt), (arg), (fp)) -#endif - -#ifdef VMS -/* some macros to redirect to code in vms/vms_misc.c */ -#define exit vms_exit -#define open vms_open -#define strerror vms_strerror -#define strdup vms_strdup -extern void exit P((int)); -extern int open P((const char *,int,...)); -extern char *strerror P((int)); -extern char *strdup P((const char *str)); -extern int vms_devopen P((const char *,int)); -# ifndef NO_TTY_FWRITE -#define fwrite tty_fwrite -#define fclose tty_fclose -extern size_t fwrite P((const void *,size_t,size_t,FILE *)); -extern int fclose P((FILE *)); -# endif -extern FILE *popen P((const char *,const char *)); -extern int pclose P((FILE *)); -extern void vms_arg_fixup P((int *,char ***)); -/* some things not in STDC_HEADERS */ -extern size_t gnu_strftime P((char *,size_t,const char *,const struct tm *)); -extern int unlink P((const char *)); -extern int getopt P((int,char **,char *)); -extern int isatty P((int)); -#ifndef fileno -extern int fileno P((FILE *)); -#endif -extern int close(), dup(), dup2(), fstat(), read(), stat(); -extern int getpgrp P((void)); -#endif /*VMS*/ - -#define GNU_REGEX -#ifdef GNU_REGEX -#include "gnuregex.h" -#include "dfa.h" -typedef struct Regexp { - struct re_pattern_buffer pat; - struct re_registers regs; - struct dfa dfareg; - int dfa; -} Regexp; -#define RESTART(rp,s) (rp)->regs.start[0] -#define REEND(rp,s) (rp)->regs.end[0] -#else /* GNU_REGEX */ -#endif /* GNU_REGEX */ - -#ifdef atarist -#define read _text_read /* we do not want all these CR's to mess our input */ -extern int _text_read (int, char *, int); -#ifndef __MINT__ -#undef NGROUPS_MAX -#endif /* __MINT__ */ -#endif - -#ifndef DEFPATH -#define DEFPATH ".:/usr/local/lib/awk:/usr/lib/awk" -#endif - -#ifndef ENVSEP -#define ENVSEP ':' -#endif - -extern double double_to_int P((double d)); - -/* ------------------ Constants, Structures, Typedefs ------------------ */ -#define AWKNUM double - -typedef enum { - /* illegal entry == 0 */ - Node_illegal, - - /* binary operators lnode and rnode are the expressions to work on */ - Node_times, - Node_quotient, - Node_mod, - Node_plus, - Node_minus, - Node_cond_pair, /* conditional pair (see Node_line_range) */ - Node_subscript, - Node_concat, - Node_exp, - - /* unary operators subnode is the expression to work on */ -/*10*/ Node_preincrement, - Node_predecrement, - Node_postincrement, - Node_postdecrement, - Node_unary_minus, - Node_field_spec, - - /* assignments lnode is the var to assign to, rnode is the exp */ - Node_assign, - Node_assign_times, - Node_assign_quotient, - Node_assign_mod, -/*20*/ Node_assign_plus, - Node_assign_minus, - Node_assign_exp, - - /* boolean binaries lnode and rnode are expressions */ - Node_and, - Node_or, - - /* binary relationals compares lnode and rnode */ - Node_equal, - Node_notequal, - Node_less, - Node_greater, - Node_leq, -/*30*/ Node_geq, - Node_match, - Node_nomatch, - - /* unary relationals works on subnode */ - Node_not, - - /* program structures */ - Node_rule_list, /* lnode is a rule, rnode is rest of list */ - Node_rule_node, /* lnode is pattern, rnode is statement */ - Node_statement_list, /* lnode is statement, rnode is more list */ - Node_if_branches, /* lnode is to run on true, rnode on false */ - Node_expression_list, /* lnode is an exp, rnode is more list */ - Node_param_list, /* lnode is a variable, rnode is more list */ - - /* keywords */ -/*40*/ Node_K_if, /* lnode is conditonal, rnode is if_branches */ - Node_K_while, /* 
lnode is condtional, rnode is stuff to run */ - Node_K_for, /* lnode is for_struct, rnode is stuff to run */ - Node_K_arrayfor, /* lnode is for_struct, rnode is stuff to run */ - Node_K_break, /* no subs */ - Node_K_continue, /* no stuff */ - Node_K_print, /* lnode is exp_list, rnode is redirect */ - Node_K_printf, /* lnode is exp_list, rnode is redirect */ - Node_K_next, /* no subs */ - Node_K_exit, /* subnode is return value, or NULL */ -/*50*/ Node_K_do, /* lnode is conditional, rnode stuff to run */ - Node_K_return, - Node_K_delete, - Node_K_getline, - Node_K_function, /* lnode is statement list, rnode is params */ - - /* I/O redirection for print statements */ - Node_redirect_output, /* subnode is where to redirect */ - Node_redirect_append, /* subnode is where to redirect */ - Node_redirect_pipe, /* subnode is where to redirect */ - Node_redirect_pipein, /* subnode is where to redirect */ - Node_redirect_input, /* subnode is where to redirect */ - - /* Variables */ -/*60*/ Node_var, /* rnode is value, lnode is array stuff */ - Node_var_array, /* array is ptr to elements, asize num of - * eles */ - Node_val, /* node is a value - type in flags */ - - /* Builtins subnode is explist to work on, proc is func to call */ - Node_builtin, - - /* - * pattern: conditional ',' conditional ; lnode of Node_line_range - * is the two conditionals (Node_cond_pair), other word (rnode place) - * is a flag indicating whether or not this range has been entered. - */ - Node_line_range, - - /* - * boolean test of membership in array lnode is string-valued - * expression rnode is array name - */ - Node_in_array, - - Node_func, /* lnode is param. list, rnode is body */ - Node_func_call, /* lnode is name, rnode is argument list */ - - Node_cond_exp, /* lnode is conditonal, rnode is if_branches */ - Node_regex, -/*70*/ Node_hashnode, - Node_ahash, - Node_NF, - Node_NR, - Node_FNR, - Node_FS, - Node_RS, - Node_FIELDWIDTHS, - Node_IGNORECASE, - Node_OFS, - Node_ORS, - Node_OFMT, - Node_CONVFMT, - Node_K_nextfile -} NODETYPE; - -/* - * NOTE - this struct is a rather kludgey -- it is packed to minimize - * space usage, at the expense of cleanliness. Alter at own risk. 
- */ -typedef struct exp_node { - union { - struct { - union { - struct exp_node *lptr; - char *param_name; - long ll; - } l; - union { - struct exp_node *rptr; - struct exp_node *(*pptr) (); - Regexp *preg; - struct for_loop_header *hd; - struct exp_node **av; - int r_ent; /* range entered */ - } r; - union { - char *name; - struct exp_node *extra; - long xl; - } x; - short number; - unsigned char reflags; -# define CASE 1 -# define CONST 2 -# define FS_DFLT 4 - } nodep; - struct { - AWKNUM fltnum; /* this is here for optimal packing of - * the structure on many machines - */ - char *sp; - size_t slen; - unsigned char sref; - char idx; - } val; - struct { - struct exp_node *next; - char *name; - size_t length; - struct exp_node *value; - } hash; -#define hnext sub.hash.next -#define hname sub.hash.name -#define hlength sub.hash.length -#define hvalue sub.hash.value - struct { - struct exp_node *next; - struct exp_node *name; - struct exp_node *value; - } ahash; -#define ahnext sub.ahash.next -#define ahname sub.ahash.name -#define ahvalue sub.ahash.value - } sub; - NODETYPE type; - unsigned short flags; -# define MALLOC 1 /* can be free'd */ -# define TEMP 2 /* should be free'd */ -# define PERM 4 /* can't be free'd */ -# define STRING 8 /* assigned as string */ -# define STR 16 /* string value is current */ -# define NUM 32 /* numeric value is current */ -# define NUMBER 64 /* assigned as number */ -# define MAYBE_NUM 128 /* user input: if NUMERIC then - * a NUMBER */ -# define ARRAYMAXED 256 /* array is at max size */ - char *vname; /* variable's name */ -} NODE; - -#define lnode sub.nodep.l.lptr -#define nextp sub.nodep.l.lptr -#define rnode sub.nodep.r.rptr -#define source_file sub.nodep.x.name -#define source_line sub.nodep.number -#define param_cnt sub.nodep.number -#define param sub.nodep.l.param_name - -#define subnode lnode -#define proc sub.nodep.r.pptr - -#define re_reg sub.nodep.r.preg -#define re_flags sub.nodep.reflags -#define re_text lnode -#define re_exp sub.nodep.x.extra -#define re_cnt sub.nodep.number - -#define forsub lnode -#define forloop rnode->sub.nodep.r.hd - -#define stptr sub.val.sp -#define stlen sub.val.slen -#define stref sub.val.sref -#define stfmt sub.val.idx - -#define numbr sub.val.fltnum - -#define var_value lnode -#define var_array sub.nodep.r.av -#define array_size sub.nodep.l.ll -#define table_size sub.nodep.x.xl - -#define condpair lnode -#define triggered sub.nodep.r.r_ent - -#ifdef DONTDEF -int primes[] = {31, 61, 127, 257, 509, 1021, 2053, 4099, 8191, 16381}; -#endif - -typedef struct for_loop_header { - NODE *init; - NODE *cond; - NODE *incr; -} FOR_LOOP_HEADER; - -/* for "for(iggy in foo) {" */ -struct search { - NODE *sym; - size_t idx; - NODE *bucket; - NODE *retval; -}; - -/* for faster input, bypass stdio */ -typedef struct iobuf { - int fd; - char *buf; - char *off; - char *end; - size_t size; /* this will be determined by an fstat() call */ - int cnt; - long secsiz; - int flag; -# define IOP_IS_TTY 1 -# define IOP_IS_INTERNAL 2 -# define IOP_NO_FREE 4 -} IOBUF; - -typedef void (*Func_ptr)(); - -/* - * structure used to dynamically maintain a linked-list of open files/pipes - */ -struct redirect { - unsigned int flag; -# define RED_FILE 1 -# define RED_PIPE 2 -# define RED_READ 4 -# define RED_WRITE 8 -# define RED_APPEND 16 -# define RED_NOBUF 32 -# define RED_USED 64 -# define RED_EOF 128 - char *value; - FILE *fp; - IOBUF *iop; - int pid; - int status; - struct redirect *prev; - struct redirect *next; -}; - -/* structure for our source, 
either a command line string or a source file */ -struct src { - enum srctype { CMDLINE = 1, SOURCEFILE } stype; - char *val; -}; - -/* longjmp return codes, must be nonzero */ -/* Continue means either for loop/while continue, or next input record */ -#define TAG_CONTINUE 1 -/* Break means either for/while break, or stop reading input */ -#define TAG_BREAK 2 -/* Return means return from a function call; leave value in ret_node */ -#define TAG_RETURN 3 - -#ifndef INT_MAX -#define INT_MAX (~(1 << (sizeof (int) * 8 - 1))) -#endif -#ifndef LONG_MAX -#define LONG_MAX (~(1 << (sizeof (long) * 8 - 1))) -#endif -#ifndef ULONG_MAX -#define ULONG_MAX (~(unsigned long)0) -#endif -#ifndef LONG_MIN -#define LONG_MIN (-LONG_MAX - 1) -#endif -#define HUGE INT_MAX - -/* -------------------------- External variables -------------------------- */ -/* gawk builtin variables */ -extern long NF; -extern long NR; -extern long FNR; -extern int IGNORECASE; -extern char *RS; -extern char *OFS; -extern int OFSlen; -extern char *ORS; -extern int ORSlen; -extern char *OFMT; -extern char *CONVFMT; -extern int CONVFMTidx; -extern int OFMTidx; -extern NODE *FS_node, *NF_node, *RS_node, *NR_node; -extern NODE *FILENAME_node, *OFS_node, *ORS_node, *OFMT_node; -extern NODE *CONVFMT_node; -extern NODE *FNR_node, *RLENGTH_node, *RSTART_node, *SUBSEP_node; -extern NODE *IGNORECASE_node; -extern NODE *FIELDWIDTHS_node; - -extern NODE **stack_ptr; -extern NODE *Nnull_string; -extern NODE **fields_arr; -extern int sourceline; -extern char *source; -extern NODE *expression_value; - -extern NODE *_t; /* used as temporary in tree_eval */ - -extern const char *myname; - -extern NODE *nextfree; -extern int field0_valid; -extern int do_unix; -extern int do_posix; -extern int do_lint; -extern int in_begin_rule; -extern int in_end_rule; - -/* ------------------------- Pseudo-functions ------------------------- */ - -#define is_identchar(c) (isalnum(c) || (c) == '_') - - -#ifndef MPROF -#define getnode(n) if (nextfree) n = nextfree, nextfree = nextfree->nextp;\ - else n = more_nodes() -#define freenode(n) ((n)->nextp = nextfree, nextfree = (n)) -#else -#define getnode(n) emalloc(n, NODE *, sizeof(NODE), "getnode") -#define freenode(n) free(n) -#endif - -#ifdef DEBUG -#define tree_eval(t) r_tree_eval(t) -#define get_lhs(p, a) r_get_lhs((p), (a)) -#undef freenode -#else -#define get_lhs(p, a) ((p)->type == Node_var ? (&(p)->var_value) : \ - r_get_lhs((p), (a))) -#define tree_eval(t) (_t = (t),_t == NULL ? Nnull_string : \ - (_t->type == Node_param_list ? r_tree_eval(_t) : \ - (_t->type == Node_val ? _t : \ - (_t->type == Node_var ? 
_t->var_value : \ - r_tree_eval(_t))))) -#endif - -#define make_number(x) mk_number((x), (unsigned int)(MALLOC|NUM|NUMBER)) -#define tmp_number(x) mk_number((x), (unsigned int)(MALLOC|TEMP|NUM|NUMBER)) - -#define free_temp(n) do {if ((n)->flags&TEMP) { unref(n); }} while (0) -#define make_string(s,l) make_str_node((s), SZTC (l),0) -#define SCAN 1 -#define ALREADY_MALLOCED 2 - -#define cant_happen() fatal("internal error line %d, file: %s", \ - __LINE__, __FILE__); - -#if defined(__STDC__) && !defined(NO_TOKEN_PASTING) -#define emalloc(var,ty,x,str) (void)((var=(ty)malloc((MALLOC_ARG_T)(x))) ||\ - (fatal("%s: %s: can't allocate memory (%s)",\ - (str), #var, strerror(errno)),0)) -#define erealloc(var,ty,x,str) (void)((var=(ty)realloc((char *)var,\ - (MALLOC_ARG_T)(x))) ||\ - (fatal("%s: %s: can't allocate memory (%s)",\ - (str), #var, strerror(errno)),0)) -#else /* __STDC__ */ -#define emalloc(var,ty,x,str) (void)((var=(ty)malloc((MALLOC_ARG_T)(x))) ||\ - (fatal("%s: %s: can't allocate memory (%s)",\ - (str), "var", strerror(errno)),0)) -#define erealloc(var,ty,x,str) (void)((var=(ty)realloc((char *)var,\ - (MALLOC_ARG_T)(x))) ||\ - (fatal("%s: %s: can't allocate memory (%s)",\ - (str), "var", strerror(errno)),0)) -#endif /* __STDC__ */ - -#ifdef DEBUG -#define force_number r_force_number -#define force_string r_force_string -#else /* not DEBUG */ -#ifdef lint -extern AWKNUM force_number(); -#endif -#ifdef MSDOS -extern double _msc51bug; -#define force_number(n) (_msc51bug=(_t = (n),(_t->flags & NUM) ? _t->numbr : r_force_number(_t))) -#else /* not MSDOS */ -#define force_number(n) (_t = (n),(_t->flags & NUM) ? _t->numbr : r_force_number(_t)) -#endif /* MSDOS */ -#define force_string(s) (_t = (s),(_t->flags & STR) ? _t : r_force_string(_t)) -#endif /* not DEBUG */ - -#define STREQ(a,b) (*(a) == *(b) && strcmp((a), (b)) == 0) -#define STREQN(a,b,n) ((n)&& *(a)== *(b) && strncmp((a), (b), SZTC (n)) == 0) - -/* ------------- Function prototypes or defs (as appropriate) ------------- */ - -/* array.c */ -extern NODE *concat_exp P((NODE *tree)); -extern void assoc_clear P((NODE *symbol)); -extern unsigned int hash P((const char *s, size_t len, unsigned long hsize)); -extern int in_array P((NODE *symbol, NODE *subs)); -extern NODE **assoc_lookup P((NODE *symbol, NODE *subs)); -extern void do_delete P((NODE *symbol, NODE *tree)); -extern void assoc_scan P((NODE *symbol, struct search *lookat)); -extern void assoc_next P((struct search *lookat)); -/* awk.tab.c */ -extern char *tokexpand P((void)); -extern char nextc P((void)); -extern NODE *node P((NODE *left, NODETYPE op, NODE *right)); -extern NODE *install P((char *name, NODE *value)); -extern NODE *lookup P((const char *name)); -extern NODE *variable P((char *name, int can_free)); -extern int yyparse P((void)); -/* builtin.c */ -extern NODE *do_exp P((NODE *tree)); -extern NODE *do_index P((NODE *tree)); -extern NODE *do_int P((NODE *tree)); -extern NODE *do_length P((NODE *tree)); -extern NODE *do_log P((NODE *tree)); -extern NODE *do_sprintf P((NODE *tree)); -extern void do_printf P((NODE *tree)); -extern void print_simple P((NODE *tree, FILE *fp)); -extern NODE *do_sqrt P((NODE *tree)); -extern NODE *do_substr P((NODE *tree)); -extern NODE *do_strftime P((NODE *tree)); -extern NODE *do_systime P((NODE *tree)); -extern NODE *do_system P((NODE *tree)); -extern void do_print P((NODE *tree)); -extern NODE *do_tolower P((NODE *tree)); -extern NODE *do_toupper P((NODE *tree)); -extern NODE *do_atan2 P((NODE *tree)); -extern NODE *do_sin P((NODE 
*tree)); -extern NODE *do_cos P((NODE *tree)); -extern NODE *do_rand P((NODE *tree)); -extern NODE *do_srand P((NODE *tree)); -extern NODE *do_match P((NODE *tree)); -extern NODE *do_gsub P((NODE *tree)); -extern NODE *do_sub P((NODE *tree)); -/* eval.c */ -extern int interpret P((NODE *volatile tree)); -extern NODE *r_tree_eval P((NODE *tree)); -extern int cmp_nodes P((NODE *t1, NODE *t2)); -extern NODE **r_get_lhs P((NODE *ptr, Func_ptr *assign)); -extern void set_IGNORECASE P((void)); -void set_OFS P((void)); -void set_ORS P((void)); -void set_OFMT P((void)); -void set_CONVFMT P((void)); -/* field.c */ -extern void init_fields P((void)); -extern void set_record P((char *buf, int cnt, int freeold)); -extern void reset_record P((void)); -extern void set_NF P((void)); -extern NODE **get_field P((int num, Func_ptr *assign)); -extern NODE *do_split P((NODE *tree)); -extern void set_FS P((void)); -extern void set_RS P((void)); -extern void set_FIELDWIDTHS P((void)); -/* io.c */ -extern void set_FNR P((void)); -extern void set_NR P((void)); -extern void do_input P((void)); -extern struct redirect *redirect P((NODE *tree, int *errflg)); -extern NODE *do_close P((NODE *tree)); -extern int flush_io P((void)); -extern int close_io P((void)); -extern int devopen P((const char *name, const char *mode)); -extern int pathopen P((const char *file)); -extern NODE *do_getline P((NODE *tree)); -extern void do_nextfile P((void)); -/* iop.c */ -extern int optimal_bufsize P((int fd)); -extern IOBUF *iop_alloc P((int fd)); -extern int get_a_record P((char **out, IOBUF *iop, int rs, int *errcode)); -/* main.c */ -extern int main P((int argc, char **argv)); -extern Regexp *mk_re_parse P((char *s, int ignorecase)); -extern void load_environ P((void)); -extern char *arg_assign P((char *arg)); -extern SIGTYPE catchsig P((int sig, int code)); -/* msg.c */ -extern void err P((const char *s, const char *emsg, va_list argp)); -#if _MSC_VER == 510 -extern void msg P((va_list va_alist, ...)); -extern void warning P((va_list va_alist, ...)); -extern void fatal P((va_list va_alist, ...)); -#else -extern void msg (); -extern void warning (); -extern void fatal (); -#endif -/* node.c */ -extern AWKNUM r_force_number P((NODE *n)); -extern NODE *r_force_string P((NODE *s)); -extern NODE *dupnode P((NODE *n)); -extern NODE *mk_number P((AWKNUM x, unsigned int flags)); -extern NODE *make_str_node P((char *s, size_t len, int scan )); -extern NODE *tmp_string P((char *s, size_t len )); -extern NODE *more_nodes P((void)); -#ifdef DEBUG -extern void freenode P((NODE *it)); -#endif -extern void unref P((NODE *tmp)); -extern int parse_escape P((char **string_ptr)); -/* re.c */ -extern Regexp *make_regexp P((char *s, size_t len, int ignorecase, int dfa)); -extern int research P((Regexp *rp, char *str, int start, - size_t len, int need_start)); -extern void refree P((Regexp *rp)); -extern void reg_error P((const char *s)); -extern Regexp *re_update P((NODE *t)); -extern void resyntax P((int syntax)); -extern void resetup P((void)); - -/* strcase.c */ -extern int strcasecmp P((const char *s1, const char *s2)); -extern int strncasecmp P((const char *s1, const char *s2, register size_t n)); - -#ifdef atarist -/* atari/tmpnam.c */ -extern char *tmpnam P((char *buf)); -extern char *tempnam P((const char *path, const char *base)); -#endif - -/* Figure out what '\a' really is. */ -#ifdef __STDC__ -#define BELL '\a' /* sure makes life easy, don't it? 
*/ -#else -# if 'z' - 'a' == 25 /* ascii */ -# if 'a' != 97 /* machine is dumb enough to use mark parity */ -# define BELL '\207' -# else -# define BELL '\07' -# endif -# else -# define BELL '\057' -# endif -#endif - -extern char casetable[]; /* for case-independent regexp matching */ diff --git a/gnu/usr.bin/awk/awk.y b/gnu/usr.bin/awk/awk.y deleted file mode 100644 index 175cea9..0000000 --- a/gnu/usr.bin/awk/awk.y +++ /dev/null @@ -1,1868 +0,0 @@ -/* - * awk.y --- yacc/bison parser - */ - -/* - * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. - * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -%{ -#ifdef DEBUG -#define YYDEBUG 12 -#endif - -#include "awk.h" - -static void yyerror (); /* va_alist */ -static char *get_src_buf P((void)); -static int yylex P((void)); -static NODE *node_common P((NODETYPE op)); -static NODE *snode P((NODE *subn, NODETYPE op, int sindex)); -static NODE *mkrangenode P((NODE *cpair)); -static NODE *make_for_loop P((NODE *init, NODE *cond, NODE *incr)); -static NODE *append_right P((NODE *list, NODE *new)); -static void func_install P((NODE *params, NODE *def)); -static void pop_var P((NODE *np, int freeit)); -static void pop_params P((NODE *params)); -static NODE *make_param P((char *name)); -static NODE *mk_rexp P((NODE *exp)); - -static int want_assign; /* lexical scanning kludge */ -static int want_regexp; /* lexical scanning kludge */ -static int can_return; /* lexical scanning kludge */ -static int io_allowed = 1; /* lexical scanning kludge */ -static char *lexptr; /* pointer to next char during parsing */ -static char *lexend; -static char *lexptr_begin; /* keep track of where we were for error msgs */ -static char *lexeme; /* beginning of lexeme for debugging */ -static char *thisline = NULL; -#define YYDEBUG_LEXER_TEXT (lexeme) -static int param_counter; -static char *tokstart = NULL; -static char *tok = NULL; -static char *tokend; - -#define HASHSIZE 1021 /* this constant only used here */ -NODE *variables[HASHSIZE]; - -extern char *source; -extern int sourceline; -extern struct src *srcfiles; -extern int numfiles; -extern int errcount; -extern NODE *begin_block; -extern NODE *end_block; -%} - -%union { - long lval; - AWKNUM fval; - NODE *nodeval; - NODETYPE nodetypeval; - char *sval; - NODE *(*ptrval)(); -} - -%type <nodeval> function_prologue function_body -%type <nodeval> rexp exp start program rule simp_exp -%type <nodeval> non_post_simp_exp -%type <nodeval> pattern -%type <nodeval> action variable param_list -%type <nodeval> rexpression_list opt_rexpression_list -%type <nodeval> expression_list opt_expression_list -%type <nodeval> statements statement if_statement opt_param_list -%type <nodeval> opt_exp opt_variable regexp -%type <nodeval> input_redir output_redir 
-%type <nodetypeval> print -%type <sval> func_name -%type <lval> lex_builtin - -%token <sval> FUNC_CALL NAME REGEXP -%token <lval> ERROR -%token <nodeval> YNUMBER YSTRING -%token <nodetypeval> RELOP APPEND_OP -%token <nodetypeval> ASSIGNOP MATCHOP NEWLINE CONCAT_OP -%token <nodetypeval> LEX_BEGIN LEX_END LEX_IF LEX_ELSE LEX_RETURN LEX_DELETE -%token <nodetypeval> LEX_WHILE LEX_DO LEX_FOR LEX_BREAK LEX_CONTINUE -%token <nodetypeval> LEX_PRINT LEX_PRINTF LEX_NEXT LEX_EXIT LEX_FUNCTION -%token <nodetypeval> LEX_GETLINE -%token <nodetypeval> LEX_IN -%token <lval> LEX_AND LEX_OR INCREMENT DECREMENT -%token <lval> LEX_BUILTIN LEX_LENGTH - -/* these are just yylval numbers */ - -/* Lowest to highest */ -%right ASSIGNOP -%right '?' ':' -%left LEX_OR -%left LEX_AND -%left LEX_GETLINE -%nonassoc LEX_IN -%left FUNC_CALL LEX_BUILTIN LEX_LENGTH -%nonassoc ',' -%nonassoc MATCHOP -%nonassoc RELOP '<' '>' '|' APPEND_OP -%left CONCAT_OP -%left YSTRING YNUMBER -%left '+' '-' -%left '*' '/' '%' -%right '!' UNARY -%right '^' -%left INCREMENT DECREMENT -%left '$' -%left '(' ')' -%% - -start - : opt_nls program opt_nls - { expression_value = $2; } - ; - -program - : rule - { - if ($1 != NULL) - $$ = $1; - else - $$ = NULL; - yyerrok; - } - | program rule - /* add the rule to the tail of list */ - { - if ($2 == NULL) - $$ = $1; - else if ($1 == NULL) - $$ = $2; - else { - if ($1->type != Node_rule_list) - $1 = node($1, Node_rule_list, - (NODE*)NULL); - $$ = append_right ($1, - node($2, Node_rule_list,(NODE *) NULL)); - } - yyerrok; - } - | error { $$ = NULL; } - | program error { $$ = NULL; } - | /* empty */ { $$ = NULL; } - ; - -rule - : LEX_BEGIN { io_allowed = 0; } - action - { - if (begin_block) { - if (begin_block->type != Node_rule_list) - begin_block = node(begin_block, Node_rule_list, - (NODE *)NULL); - (void) append_right (begin_block, node( - node((NODE *)NULL, Node_rule_node, $3), - Node_rule_list, (NODE *)NULL) ); - } else - begin_block = node((NODE *)NULL, Node_rule_node, $3); - $$ = NULL; - io_allowed = 1; - yyerrok; - } - | LEX_END { io_allowed = 0; } - action - { - if (end_block) { - if (end_block->type != Node_rule_list) - end_block = node(end_block, Node_rule_list, - (NODE *)NULL); - (void) append_right (end_block, node( - node((NODE *)NULL, Node_rule_node, $3), - Node_rule_list, (NODE *)NULL)); - } else - end_block = node((NODE *)NULL, Node_rule_node, $3); - $$ = NULL; - io_allowed = 1; - yyerrok; - } - | LEX_BEGIN statement_term - { - warning("BEGIN blocks must have an action part"); - errcount++; - yyerrok; - } - | LEX_END statement_term - { - warning("END blocks must have an action part"); - errcount++; - yyerrok; - } - | pattern action - { $$ = node ($1, Node_rule_node, $2); yyerrok; } - | action - { $$ = node ((NODE *)NULL, Node_rule_node, $1); yyerrok; } - | pattern statement_term - { - $$ = node ($1, - Node_rule_node, - node(node(node(make_number(0.0), - Node_field_spec, - (NODE *) NULL), - Node_expression_list, - (NODE *) NULL), - Node_K_print, - (NODE *) NULL)); - yyerrok; - } - | function_prologue function_body - { - func_install($1, $2); - $$ = NULL; - yyerrok; - } - ; - -func_name - : NAME - { $$ = $1; } - | FUNC_CALL - { $$ = $1; } - | lex_builtin - { - yyerror("%s() is a built-in function, it cannot be redefined", - tokstart); - errcount++; - /* yyerrok; */ - } - ; - -lex_builtin - : LEX_BUILTIN - | LEX_LENGTH - ; - -function_prologue - : LEX_FUNCTION - { - param_counter = 0; - } - func_name '(' opt_param_list r_paren opt_nls - { - $$ = append_right(make_param($3), $5); - 
can_return = 1; - } - ; - -function_body - : l_brace statements r_brace opt_semi - { - $$ = $2; - can_return = 0; - } - ; - - -pattern - : exp - { $$ = $1; } - | exp ',' exp - { $$ = mkrangenode ( node($1, Node_cond_pair, $3) ); } - ; - -regexp - /* - * In this rule, want_regexp tells yylex that the next thing - * is a regexp so it should read up to the closing slash. - */ - : '/' - { ++want_regexp; } - REGEXP '/' - { - NODE *n; - size_t len; - - getnode(n); - n->type = Node_regex; - len = strlen($3); - n->re_exp = make_string($3, len); - n->re_reg = make_regexp($3, len, 0, 1); - n->re_text = NULL; - n->re_flags = CONST; - n->re_cnt = 1; - $$ = n; - } - ; - -action - : l_brace statements r_brace opt_semi opt_nls - { $$ = $2 ; } - | l_brace r_brace opt_semi opt_nls - { $$ = NULL; } - ; - -statements - : statement - { $$ = $1; } - | statements statement - { - if ($1 == NULL || $1->type != Node_statement_list) - $1 = node($1, Node_statement_list,(NODE *)NULL); - $$ = append_right($1, - node( $2, Node_statement_list, (NODE *)NULL)); - yyerrok; - } - | error - { $$ = NULL; } - | statements error - { $$ = NULL; } - ; - -statement_term - : nls - | semi opt_nls - ; - -statement - : semi opt_nls - { $$ = NULL; } - | l_brace r_brace - { $$ = NULL; } - | l_brace statements r_brace - { $$ = $2; } - | if_statement - { $$ = $1; } - | LEX_WHILE '(' exp r_paren opt_nls statement - { $$ = node ($3, Node_K_while, $6); } - | LEX_DO opt_nls statement LEX_WHILE '(' exp r_paren opt_nls - { $$ = node ($6, Node_K_do, $3); } - | LEX_FOR '(' NAME LEX_IN NAME r_paren opt_nls statement - { - $$ = node ($8, Node_K_arrayfor, make_for_loop(variable($3,1), - (NODE *)NULL, variable($5,1))); - } - | LEX_FOR '(' opt_exp semi exp semi opt_exp r_paren opt_nls statement - { - $$ = node($10, Node_K_for, (NODE *)make_for_loop($3, $5, $7)); - } - | LEX_FOR '(' opt_exp semi semi opt_exp r_paren opt_nls statement - { - $$ = node ($9, Node_K_for, - (NODE *)make_for_loop($3, (NODE *)NULL, $6)); - } - | LEX_BREAK statement_term - /* for break, maybe we'll have to remember where to break to */ - { $$ = node ((NODE *)NULL, Node_K_break, (NODE *)NULL); } - | LEX_CONTINUE statement_term - /* similarly */ - { $$ = node ((NODE *)NULL, Node_K_continue, (NODE *)NULL); } - | print '(' expression_list r_paren output_redir statement_term - { $$ = node ($3, $1, $5); } - | print opt_rexpression_list output_redir statement_term - { - if ($1 == Node_K_print && $2 == NULL) - $2 = node(node(make_number(0.0), - Node_field_spec, - (NODE *) NULL), - Node_expression_list, - (NODE *) NULL); - - $$ = node ($2, $1, $3); - } - | LEX_NEXT opt_exp statement_term - { NODETYPE type; - - if ($2 && $2 == lookup("file")) { - if (do_lint) - warning("`next file' is a gawk extension"); - if (do_unix || do_posix) { - /* - * can't use yyerror, since may have overshot - * the source line - */ - errcount++; - msg("`next file' is a gawk extension"); - } - if (! io_allowed) { - /* same thing */ - errcount++; - msg("`next file' used in BEGIN or END action"); - } - type = Node_K_nextfile; - } else { - if (! io_allowed) - yyerror("next used in BEGIN or END action"); - type = Node_K_next; - } - $$ = node ((NODE *)NULL, type, (NODE *)NULL); - } - | LEX_EXIT opt_exp statement_term - { $$ = node ($2, Node_K_exit, (NODE *)NULL); } - | LEX_RETURN - { if (! 
can_return) yyerror("return used outside function context"); } - opt_exp statement_term - { $$ = node ($3, Node_K_return, (NODE *)NULL); } - | LEX_DELETE NAME '[' expression_list ']' statement_term - { $$ = node (variable($2,1), Node_K_delete, $4); } - | LEX_DELETE NAME statement_term - { - if (do_lint) - warning("`delete array' is a gawk extension"); - if (do_unix || do_posix) { - /* - * can't use yyerror, since may have overshot - * the source line - */ - errcount++; - msg("`delete array' is a gawk extension"); - } - $$ = node (variable($2,1), Node_K_delete, (NODE *) NULL); - } - | exp statement_term - { $$ = $1; } - ; - -print - : LEX_PRINT - { $$ = $1; } - | LEX_PRINTF - { $$ = $1; } - ; - -if_statement - : LEX_IF '(' exp r_paren opt_nls statement - { - $$ = node($3, Node_K_if, - node($6, Node_if_branches, (NODE *)NULL)); - } - | LEX_IF '(' exp r_paren opt_nls statement - LEX_ELSE opt_nls statement - { $$ = node ($3, Node_K_if, - node ($6, Node_if_branches, $9)); } - ; - -nls - : NEWLINE - { want_assign = 0; } - | nls NEWLINE - ; - -opt_nls - : /* empty */ - | nls - ; - -input_redir - : /* empty */ - { $$ = NULL; } - | '<' simp_exp - { $$ = node ($2, Node_redirect_input, (NODE *)NULL); } - ; - -output_redir - : /* empty */ - { $$ = NULL; } - | '>' exp - { $$ = node ($2, Node_redirect_output, (NODE *)NULL); } - | APPEND_OP exp - { $$ = node ($2, Node_redirect_append, (NODE *)NULL); } - | '|' exp - { $$ = node ($2, Node_redirect_pipe, (NODE *)NULL); } - ; - -opt_param_list - : /* empty */ - { $$ = NULL; } - | param_list - { $$ = $1; } - ; - -param_list - : NAME - { $$ = make_param($1); } - | param_list comma NAME - { $$ = append_right($1, make_param($3)); yyerrok; } - | error - { $$ = NULL; } - | param_list error - { $$ = NULL; } - | param_list comma error - { $$ = NULL; } - ; - -/* optional expression, as in for loop */ -opt_exp - : /* empty */ - { $$ = NULL; } - | exp - { $$ = $1; } - ; - -opt_rexpression_list - : /* empty */ - { $$ = NULL; } - | rexpression_list - { $$ = $1; } - ; - -rexpression_list - : rexp - { $$ = node ($1, Node_expression_list, (NODE *)NULL); } - | rexpression_list comma rexp - { - $$ = append_right($1, - node( $3, Node_expression_list, (NODE *)NULL)); - yyerrok; - } - | error - { $$ = NULL; } - | rexpression_list error - { $$ = NULL; } - | rexpression_list error rexp - { $$ = NULL; } - | rexpression_list comma error - { $$ = NULL; } - ; - -opt_expression_list - : /* empty */ - { $$ = NULL; } - | expression_list - { $$ = $1; } - ; - -expression_list - : exp - { $$ = node ($1, Node_expression_list, (NODE *)NULL); } - | expression_list comma exp - { - $$ = append_right($1, - node( $3, Node_expression_list, (NODE *)NULL)); - yyerrok; - } - | error - { $$ = NULL; } - | expression_list error - { $$ = NULL; } - | expression_list error exp - { $$ = NULL; } - | expression_list comma error - { $$ = NULL; } - ; - -/* Expressions, not including the comma operator. */ -exp : variable ASSIGNOP - { want_assign = 0; } - exp - { - if (do_lint && $4->type == Node_regex) - warning("Regular expression on left of assignment."); - $$ = node ($1, $2, $4); - } - | '(' expression_list r_paren LEX_IN NAME - { $$ = node (variable($5,1), Node_in_array, $2); } - | exp '|' LEX_GETLINE opt_variable - { - $$ = node ($4, Node_K_getline, - node ($1, Node_redirect_pipein, (NODE *)NULL)); - } - | LEX_GETLINE opt_variable input_redir - { - if (do_lint && ! 
io_allowed && $3 == NULL) - warning("non-redirected getline undefined inside BEGIN or END action"); - $$ = node ($2, Node_K_getline, $3); - } - | exp LEX_AND exp - { $$ = node ($1, Node_and, $3); } - | exp LEX_OR exp - { $$ = node ($1, Node_or, $3); } - | exp MATCHOP exp - { - if ($1->type == Node_regex) - warning("Regular expression on left of MATCH operator."); - $$ = node ($1, $2, mk_rexp($3)); - } - | regexp - { $$ = $1; } - | '!' regexp %prec UNARY - { - $$ = node(node(make_number(0.0), - Node_field_spec, - (NODE *) NULL), - Node_nomatch, - $2); - } - | exp LEX_IN NAME - { $$ = node (variable($3,1), Node_in_array, $1); } - | exp RELOP exp - { - if (do_lint && $3->type == Node_regex) - warning("Regular expression on left of comparison."); - $$ = node ($1, $2, $3); - } - | exp '<' exp - { $$ = node ($1, Node_less, $3); } - | exp '>' exp - { $$ = node ($1, Node_greater, $3); } - | exp '?' exp ':' exp - { $$ = node($1, Node_cond_exp, node($3, Node_if_branches, $5));} - | simp_exp - { $$ = $1; } - | exp simp_exp %prec CONCAT_OP - { $$ = node ($1, Node_concat, $2); } - ; - -rexp - : variable ASSIGNOP - { want_assign = 0; } - rexp - { $$ = node ($1, $2, $4); } - | rexp LEX_AND rexp - { $$ = node ($1, Node_and, $3); } - | rexp LEX_OR rexp - { $$ = node ($1, Node_or, $3); } - | LEX_GETLINE opt_variable input_redir - { - if (do_lint && ! io_allowed && $3 == NULL) - warning("non-redirected getline undefined inside BEGIN or END action"); - $$ = node ($2, Node_K_getline, $3); - } - | regexp - { $$ = $1; } - | '!' regexp %prec UNARY - { $$ = node((NODE *) NULL, Node_nomatch, $2); } - | rexp MATCHOP rexp - { $$ = node ($1, $2, mk_rexp($3)); } - | rexp LEX_IN NAME - { $$ = node (variable($3,1), Node_in_array, $1); } - | rexp RELOP rexp - { $$ = node ($1, $2, $3); } - | rexp '?' rexp ':' rexp - { $$ = node($1, Node_cond_exp, node($3, Node_if_branches, $5));} - | simp_exp - { $$ = $1; } - | rexp simp_exp %prec CONCAT_OP - { $$ = node ($1, Node_concat, $2); } - ; - -simp_exp - : non_post_simp_exp - /* Binary operators in order of decreasing precedence. */ - | simp_exp '^' simp_exp - { $$ = node ($1, Node_exp, $3); } - | simp_exp '*' simp_exp - { $$ = node ($1, Node_times, $3); } - | simp_exp '/' simp_exp - { $$ = node ($1, Node_quotient, $3); } - | simp_exp '%' simp_exp - { $$ = node ($1, Node_mod, $3); } - | simp_exp '+' simp_exp - { $$ = node ($1, Node_plus, $3); } - | simp_exp '-' simp_exp - { $$ = node ($1, Node_minus, $3); } - | variable INCREMENT - { $$ = node ($1, Node_postincrement, (NODE *)NULL); } - | variable DECREMENT - { $$ = node ($1, Node_postdecrement, (NODE *)NULL); } - ; - -non_post_simp_exp - : '!' 
simp_exp %prec UNARY - { $$ = node ($2, Node_not,(NODE *) NULL); } - | '(' exp r_paren - { $$ = $2; } - | LEX_BUILTIN - '(' opt_expression_list r_paren - { $$ = snode ($3, Node_builtin, (int) $1); } - | LEX_LENGTH '(' opt_expression_list r_paren - { $$ = snode ($3, Node_builtin, (int) $1); } - | LEX_LENGTH - { - if (do_lint) - warning("call of `length' without parentheses is not portable"); - $$ = snode ((NODE *)NULL, Node_builtin, (int) $1); - if (do_posix) - warning( "call of `length' without parentheses is deprecated by POSIX"); - } - | FUNC_CALL '(' opt_expression_list r_paren - { - $$ = node ($3, Node_func_call, make_string($1, strlen($1))); - } - | variable - | INCREMENT variable - { $$ = node ($2, Node_preincrement, (NODE *)NULL); } - | DECREMENT variable - { $$ = node ($2, Node_predecrement, (NODE *)NULL); } - | YNUMBER - { $$ = $1; } - | YSTRING - { $$ = $1; } - - | '-' simp_exp %prec UNARY - { if ($2->type == Node_val) { - $2->numbr = -(force_number($2)); - $$ = $2; - } else - $$ = node ($2, Node_unary_minus, (NODE *)NULL); - } - | '+' simp_exp %prec UNARY - { - /* was: $$ = $2 */ - /* POSIX semantics: force a conversion to numeric type */ - $$ = node (make_number(0.0), Node_plus, $2); - } - ; - -opt_variable - : /* empty */ - { $$ = NULL; } - | variable - { $$ = $1; } - ; - -variable - : NAME - { $$ = variable($1,1); } - | NAME '[' expression_list ']' - { - if ($3->rnode == NULL) { - $$ = node (variable($1,1), Node_subscript, $3->lnode); - freenode($3); - } else - $$ = node (variable($1,1), Node_subscript, $3); - } - | '$' non_post_simp_exp - { $$ = node ($2, Node_field_spec, (NODE *)NULL); } - ; - -l_brace - : '{' opt_nls - ; - -r_brace - : '}' opt_nls { yyerrok; } - ; - -r_paren - : ')' { yyerrok; } - ; - -opt_semi - : /* empty */ - | semi - ; - -semi - : ';' { yyerrok; want_assign = 0; } - ; - -comma : ',' opt_nls { yyerrok; } - ; - -%% - -struct token { - const char *operator; /* text to match */ - NODETYPE value; /* node type */ - int class; /* lexical class */ - unsigned flags; /* # of args. allowed and compatability */ -# define ARGS 0xFF /* 0, 1, 2, 3 args allowed (any combination */ -# define A(n) (1<<(n)) -# define VERSION 0xFF00 /* old awk is zero */ -# define NOT_OLD 0x0100 /* feature not in old awk */ -# define NOT_POSIX 0x0200 /* feature not in POSIX */ -# define GAWKX 0x0400 /* gawk extension */ - NODE *(*ptr) (); /* function that implements this keyword */ -}; - -extern NODE - *do_exp(), *do_getline(), *do_index(), *do_length(), - *do_sqrt(), *do_log(), *do_sprintf(), *do_substr(), - *do_split(), *do_system(), *do_int(), *do_close(), - *do_atan2(), *do_sin(), *do_cos(), *do_rand(), - *do_srand(), *do_match(), *do_tolower(), *do_toupper(), - *do_sub(), *do_gsub(), *do_strftime(), *do_systime(); - -/* Tokentab is sorted ascii ascending order, so it can be binary searched. 
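The keyword table that follows is kept in ascending ASCII order precisely so that the scanner can find keywords by binary search instead of a linear scan. A minimal sketch of the same lookup, using the standard bsearch() over a plain sorted array of names (gawk's real table also carries a node type, a lexical class and a flags word per entry, which this sketch leaves out):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    // sorted in ascending ASCII order, as the real table must be
    static const char *keywords[] = {
        "BEGIN", "END", "break", "delete", "for", "getline",
        "if", "print", "printf", "while"
    };

    static int
    cmp_keyword(const void *key, const void *elem)
    {
        return strcmp((const char *) key, *(const char * const *) elem);
    }

    int
    main(void)
    {
        const char *probe = "getline";
        const char **hit = bsearch(probe, keywords,
            sizeof(keywords) / sizeof(keywords[0]),
            sizeof(keywords[0]), cmp_keyword);

        printf("%s: %s\n", probe, hit ? "keyword" : "ordinary name");
        return 0;
    }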
*/ - -static struct token tokentab[] = { -{"BEGIN", Node_illegal, LEX_BEGIN, 0, 0}, -{"END", Node_illegal, LEX_END, 0, 0}, -{"atan2", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2), do_atan2}, -{"break", Node_K_break, LEX_BREAK, 0, 0}, -{"close", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_close}, -{"continue", Node_K_continue, LEX_CONTINUE, 0, 0}, -{"cos", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_cos}, -{"delete", Node_K_delete, LEX_DELETE, NOT_OLD, 0}, -{"do", Node_K_do, LEX_DO, NOT_OLD, 0}, -{"else", Node_illegal, LEX_ELSE, 0, 0}, -{"exit", Node_K_exit, LEX_EXIT, 0, 0}, -{"exp", Node_builtin, LEX_BUILTIN, A(1), do_exp}, -{"for", Node_K_for, LEX_FOR, 0, 0}, -{"func", Node_K_function, LEX_FUNCTION, NOT_POSIX|NOT_OLD, 0}, -{"function", Node_K_function, LEX_FUNCTION, NOT_OLD, 0}, -{"getline", Node_K_getline, LEX_GETLINE, NOT_OLD, 0}, -{"gsub", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2)|A(3), do_gsub}, -{"if", Node_K_if, LEX_IF, 0, 0}, -{"in", Node_illegal, LEX_IN, 0, 0}, -{"index", Node_builtin, LEX_BUILTIN, A(2), do_index}, -{"int", Node_builtin, LEX_BUILTIN, A(1), do_int}, -{"length", Node_builtin, LEX_LENGTH, A(0)|A(1), do_length}, -{"log", Node_builtin, LEX_BUILTIN, A(1), do_log}, -{"match", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2), do_match}, -{"next", Node_K_next, LEX_NEXT, 0, 0}, -{"print", Node_K_print, LEX_PRINT, 0, 0}, -{"printf", Node_K_printf, LEX_PRINTF, 0, 0}, -{"rand", Node_builtin, LEX_BUILTIN, NOT_OLD|A(0), do_rand}, -{"return", Node_K_return, LEX_RETURN, NOT_OLD, 0}, -{"sin", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_sin}, -{"split", Node_builtin, LEX_BUILTIN, A(2)|A(3), do_split}, -{"sprintf", Node_builtin, LEX_BUILTIN, 0, do_sprintf}, -{"sqrt", Node_builtin, LEX_BUILTIN, A(1), do_sqrt}, -{"srand", Node_builtin, LEX_BUILTIN, NOT_OLD|A(0)|A(1), do_srand}, -{"strftime", Node_builtin, LEX_BUILTIN, GAWKX|A(1)|A(2), do_strftime}, -{"sub", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2)|A(3), do_sub}, -{"substr", Node_builtin, LEX_BUILTIN, A(2)|A(3), do_substr}, -{"system", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_system}, -{"systime", Node_builtin, LEX_BUILTIN, GAWKX|A(0), do_systime}, -{"tolower", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_tolower}, -{"toupper", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_toupper}, -{"while", Node_K_while, LEX_WHILE, 0, 0}, -}; - -/* VARARGS0 */ -static void -yyerror(va_alist) -va_dcl -{ - va_list args; - const char *mesg = NULL; - register char *bp, *cp; - char *scan; - char buf[120]; - static char end_of_file_line[] = "(END OF FILE)"; - - errcount++; - /* Find the current line in the input file */ - if (lexptr && lexeme) { - if (!thisline) { - cp = lexeme; - if (*cp == '\n') { - cp--; - mesg = "unexpected newline"; - } - for ( ; cp != lexptr_begin && *cp != '\n'; --cp) - continue; - if (*cp == '\n') - cp++; - thisline = cp; - } - /* NL isn't guaranteed */ - bp = lexeme; - while (bp < lexend && *bp && *bp != '\n') - bp++; - } else { - thisline = end_of_file_line; - bp = thisline + strlen(thisline); - } - msg("%.*s", (int) (bp - thisline), thisline); - bp = buf; - cp = buf + sizeof(buf) - 24; /* 24 more than longest msg. 
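The error routine above goes to some trouble to point at the offending token: it prints the current source line, then emits a second line of blanks, copying tabs as tabs so the columns still line up, and finally puts a caret under the lexeme that caused the trouble. The caret-positioning idea in isolation, free of gawk's lexer state (the function name and sample offset are invented for the illustration):

    #include <stdio.h>

    // Print `line`, then a caret under byte offset `at`, copying tabs
    // so the marker stays aligned even when the line contains tabs.
    static void
    show_caret(const char *line, int at)
    {
        int i;

        printf("%s\n", line);
        for (i = 0; i < at && line[i] != '\0'; i++)
            putchar(line[i] == '\t' ? '\t' : ' ');
        printf("^ parse error here\n");
    }

    int
    main(void)
    {
        show_caret("x = length \"abc\"", 11);
        return 0;
    }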
input */ - if (lexptr) { - scan = thisline; - while (bp < cp && scan < lexeme) - if (*scan++ == '\t') - *bp++ = '\t'; - else - *bp++ = ' '; - *bp++ = '^'; - *bp++ = ' '; - } - va_start(args); - if (mesg == NULL) - mesg = va_arg(args, char *); - strcpy(bp, mesg); - err("", buf, args); - va_end(args); - exit(2); -} - -static char * -get_src_buf() -{ - static int samefile = 0; - static int nextfile = 0; - static char *buf = NULL; - static int fd; - int n; - register char *scan; - static int len = 0; - static int did_newline = 0; -# define SLOP 128 /* enough space to hold most source lines */ - -again: - if (nextfile > numfiles) - return NULL; - - if (srcfiles[nextfile].stype == CMDLINE) { - if (len == 0) { - len = strlen(srcfiles[nextfile].val); - if (len == 0) { - /* - * Yet Another Special case: - * gawk '' /path/name - * Sigh. - */ - ++nextfile; - goto again; - } - sourceline = 1; - lexptr = lexptr_begin = srcfiles[nextfile].val; - lexend = lexptr + len; - } else if (!did_newline && *(lexptr-1) != '\n') { - /* - * The following goop is to ensure that the source - * ends with a newline and that the entire current - * line is available for error messages. - */ - int offset; - - did_newline = 1; - offset = lexptr - lexeme; - for (scan = lexeme; scan > lexptr_begin; scan--) - if (*scan == '\n') { - scan++; - break; - } - len = lexptr - scan; - emalloc(buf, char *, len+1, "get_src_buf"); - memcpy(buf, scan, len); - thisline = buf; - lexptr = buf + len; - *lexptr = '\n'; - lexeme = lexptr - offset; - lexptr_begin = buf; - lexend = lexptr + 1; - } else { - len = 0; - lexeme = lexptr = lexptr_begin = NULL; - } - if (lexptr == NULL && ++nextfile <= numfiles) - return get_src_buf(); - return lexptr; - } - if (!samefile) { - source = srcfiles[nextfile].val; - if (source == NULL) { - if (buf) { - free(buf); - buf = NULL; - } - len = 0; - return lexeme = lexptr = lexptr_begin = NULL; - } - fd = pathopen(source); - if (fd == -1) - fatal("can't open source file \"%s\" for reading (%s)", - source, strerror(errno)); - len = optimal_bufsize(fd); - if (buf) - free(buf); - emalloc(buf, char *, len + SLOP, "get_src_buf"); - lexptr_begin = buf + SLOP; - samefile = 1; - sourceline = 1; - } else { - /* - * Here, we retain the current source line (up to length SLOP) - * in the beginning of the buffer that was overallocated above - */ - int offset; - int linelen; - - offset = lexptr - lexeme; - for (scan = lexeme; scan > lexptr_begin; scan--) - if (*scan == '\n') { - scan++; - break; - } - linelen = lexptr - scan; - if (linelen > SLOP) - linelen = SLOP; - thisline = buf + SLOP - linelen; - memcpy(thisline, scan, linelen); - lexeme = buf + SLOP - offset; - lexptr_begin = thisline; - } - n = read(fd, buf + SLOP, len); - if (n == -1) - fatal("can't read sourcefile \"%s\" (%s)", - source, strerror(errno)); - if (n == 0) { - samefile = 0; - nextfile++; - *lexeme = '\0'; - len = 0; - return get_src_buf(); - } - lexptr = buf + SLOP; - lexend = lexptr + n; - return buf; -} - -#define tokadd(x) (*tok++ = (x), tok == tokend ? 
tokexpand() : tok) - -char * -tokexpand() -{ - static int toksize = 60; - int tokoffset; - - tokoffset = tok - tokstart; - toksize *= 2; - if (tokstart) - erealloc(tokstart, char *, toksize, "tokexpand"); - else - emalloc(tokstart, char *, toksize, "tokexpand"); - tokend = tokstart + toksize; - tok = tokstart + tokoffset; - return tok; -} - -#if DEBUG -char -nextc() { - if (lexptr && lexptr < lexend) - return *lexptr++; - else if (get_src_buf()) - return *lexptr++; - else - return '\0'; -} -#else -#define nextc() ((lexptr && lexptr < lexend) ? \ - *lexptr++ : \ - (get_src_buf() ? *lexptr++ : '\0') \ - ) -#endif -#define pushback() (lexptr && lexptr > lexptr_begin ? lexptr-- : lexptr) - -/* - * Read the input and turn it into tokens. - */ - -static int -yylex() -{ - register int c; - int seen_e = 0; /* These are for numbers */ - int seen_point = 0; - int esc_seen; /* for literal strings */ - int low, mid, high; - static int did_newline = 0; - char *tokkey; - - if (!nextc()) - return 0; - pushback(); -#ifdef OS2 - /* - * added for OS/2's extproc feature of cmd.exe - * (like #! in BSD sh) - */ - if (strncasecmp(lexptr, "extproc ", 8) == 0) { - while (*lexptr && *lexptr != '\n') - lexptr++; - } -#endif - lexeme = lexptr; - thisline = NULL; - if (want_regexp) { - int in_brack = 0; - - want_regexp = 0; - tok = tokstart; - while ((c = nextc()) != 0) { - switch (c) { - case '[': - in_brack = 1; - break; - case ']': - in_brack = 0; - break; - case '\\': - if ((c = nextc()) == '\0') { - yyerror("unterminated regexp ends with \\ at end of file"); - } else if (c == '\n') { - sourceline++; - continue; - } else - tokadd('\\'); - break; - case '/': /* end of the regexp */ - if (in_brack) - break; - - pushback(); - tokadd('\0'); - yylval.sval = tokstart; - return REGEXP; - case '\n': - pushback(); - yyerror("unterminated regexp"); - case '\0': - yyerror("unterminated regexp at end of file"); - } - tokadd(c); - } - } -retry: - while ((c = nextc()) == ' ' || c == '\t') - continue; - - lexeme = lexptr ? lexptr - 1 : lexptr; - thisline = NULL; - tok = tokstart; - yylval.nodetypeval = Node_illegal; - - switch (c) { - case 0: - return 0; - - case '\n': - sourceline++; - return NEWLINE; - - case '#': /* it's a comment */ - while ((c = nextc()) != '\n') { - if (c == '\0') - return 0; - } - sourceline++; - return NEWLINE; - - case '\\': -#ifdef RELAXED_CONTINUATION - /* - * This code puports to allow comments and/or whitespace - * after the `\' at the end of a line used for continuation. - * Use it at your own risk. We think it's a bad idea, which - * is why it's not on by default. - */ - if (!do_unix) { - /* strip trailing white-space and/or comment */ - while ((c = nextc()) == ' ' || c == '\t') - continue; - if (c == '#') - while ((c = nextc()) != '\n') - if (c == '\0') - break; - pushback(); - } -#endif /* RELAXED_CONTINUATION */ - if (nextc() == '\n') { - sourceline++; - goto retry; - } else - yyerror("backslash not last character on line"); - break; - - case '$': - want_assign = 1; - return '$'; - - case ')': - case ']': - case '(': - case '[': - case ';': - case ':': - case '?': - case '{': - case ',': - return c; - - case '*': - if ((c = nextc()) == '=') { - yylval.nodetypeval = Node_assign_times; - return ASSIGNOP; - } else if (do_posix) { - pushback(); - return '*'; - } else if (c == '*') { - /* make ** and **= aliases for ^ and ^= */ - static int did_warn_op = 0, did_warn_assgn = 0; - - if (nextc() == '=') { - if (do_lint && ! 
did_warn_assgn) { - did_warn_assgn = 1; - warning("**= is not allowed by POSIX"); - } - yylval.nodetypeval = Node_assign_exp; - return ASSIGNOP; - } else { - pushback(); - if (do_lint && ! did_warn_op) { - did_warn_op = 1; - warning("** is not allowed by POSIX"); - } - return '^'; - } - } - pushback(); - return '*'; - - case '/': - if (want_assign) { - if (nextc() == '=') { - yylval.nodetypeval = Node_assign_quotient; - return ASSIGNOP; - } - pushback(); - } - return '/'; - - case '%': - if (nextc() == '=') { - yylval.nodetypeval = Node_assign_mod; - return ASSIGNOP; - } - pushback(); - return '%'; - - case '^': - { - static int did_warn_op = 0, did_warn_assgn = 0; - - if (nextc() == '=') { - - if (do_lint && ! did_warn_assgn) { - did_warn_assgn = 1; - warning("operator `^=' is not supported in old awk"); - } - yylval.nodetypeval = Node_assign_exp; - return ASSIGNOP; - } - pushback(); - if (do_lint && ! did_warn_op) { - did_warn_op = 1; - warning("operator `^' is not supported in old awk"); - } - return '^'; - } - - case '+': - if ((c = nextc()) == '=') { - yylval.nodetypeval = Node_assign_plus; - return ASSIGNOP; - } - if (c == '+') - return INCREMENT; - pushback(); - return '+'; - - case '!': - if ((c = nextc()) == '=') { - yylval.nodetypeval = Node_notequal; - return RELOP; - } - if (c == '~') { - yylval.nodetypeval = Node_nomatch; - want_assign = 0; - return MATCHOP; - } - pushback(); - return '!'; - - case '<': - if (nextc() == '=') { - yylval.nodetypeval = Node_leq; - return RELOP; - } - yylval.nodetypeval = Node_less; - pushback(); - return '<'; - - case '=': - if (nextc() == '=') { - yylval.nodetypeval = Node_equal; - return RELOP; - } - yylval.nodetypeval = Node_assign; - pushback(); - return ASSIGNOP; - - case '>': - if ((c = nextc()) == '=') { - yylval.nodetypeval = Node_geq; - return RELOP; - } else if (c == '>') { - yylval.nodetypeval = Node_redirect_append; - return APPEND_OP; - } - yylval.nodetypeval = Node_greater; - pushback(); - return '>'; - - case '~': - yylval.nodetypeval = Node_match; - want_assign = 0; - return MATCHOP; - - case '}': - /* - * Added did newline stuff. Easier than - * hacking the grammar - */ - if (did_newline) { - did_newline = 0; - return c; - } - did_newline++; - --lexptr; /* pick up } next time */ - return NEWLINE; - - case '"': - esc_seen = 0; - while ((c = nextc()) != '"') { - if (c == '\n') { - pushback(); - yyerror("unterminated string"); - } - if (c == '\\') { - c = nextc(); - if (c == '\n') { - sourceline++; - continue; - } - esc_seen = 1; - tokadd('\\'); - } - if (c == '\0') { - pushback(); - yyerror("unterminated string"); - } - tokadd(c); - } - yylval.nodeval = make_str_node(tokstart, - tok - tokstart, esc_seen ? 
SCAN : 0); - yylval.nodeval->flags |= PERM; - return YSTRING; - - case '-': - if ((c = nextc()) == '=') { - yylval.nodetypeval = Node_assign_minus; - return ASSIGNOP; - } - if (c == '-') - return DECREMENT; - pushback(); - return '-'; - - case '.': - c = nextc(); - pushback(); - if (!isdigit(c)) - return '.'; - else - c = '.'; /* FALL THROUGH */ - case '0': - case '1': - case '2': - case '3': - case '4': - case '5': - case '6': - case '7': - case '8': - case '9': - /* It's a number */ - for (;;) { - int gotnumber = 0; - - tokadd(c); - switch (c) { - case '.': - if (seen_point) { - gotnumber++; - break; - } - ++seen_point; - break; - case 'e': - case 'E': - if (seen_e) { - gotnumber++; - break; - } - ++seen_e; - if ((c = nextc()) == '-' || c == '+') - tokadd(c); - else - pushback(); - break; - case '0': - case '1': - case '2': - case '3': - case '4': - case '5': - case '6': - case '7': - case '8': - case '9': - break; - default: - gotnumber++; - } - if (gotnumber) - break; - c = nextc(); - } - pushback(); - yylval.nodeval = make_number(atof(tokstart)); - yylval.nodeval->flags |= PERM; - return YNUMBER; - - case '&': - if ((c = nextc()) == '&') { - yylval.nodetypeval = Node_and; - for (;;) { - c = nextc(); - if (c == '\0') - break; - if (c == '#') { - while ((c = nextc()) != '\n' && c != '\0') - continue; - if (c == '\0') - break; - } - if (c == '\n') - sourceline++; - if (! isspace(c)) { - pushback(); - break; - } - } - want_assign = 0; - return LEX_AND; - } - pushback(); - return '&'; - - case '|': - if ((c = nextc()) == '|') { - yylval.nodetypeval = Node_or; - for (;;) { - c = nextc(); - if (c == '\0') - break; - if (c == '#') { - while ((c = nextc()) != '\n' && c != '\0') - continue; - if (c == '\0') - break; - } - if (c == '\n') - sourceline++; - if (! isspace(c)) { - pushback(); - break; - } - } - want_assign = 0; - return LEX_OR; - } - pushback(); - return '|'; - } - - if (c != '_' && ! isalpha(c)) - yyerror("Invalid char '%c' in expression\n", c); - - /* it's some type of name-type-thing. Find its length */ - tok = tokstart; - while (is_identchar(c)) { - tokadd(c); - c = nextc(); - } - tokadd('\0'); - emalloc(tokkey, char *, tok - tokstart, "yylex"); - memcpy(tokkey, tokstart, tok - tokstart); - pushback(); - - /* See if it is a special token. */ - low = 0; - high = (sizeof (tokentab) / sizeof (tokentab[0])) - 1; - while (low <= high) { - int i/* , c */; - - mid = (low + high) / 2; - c = *tokstart - tokentab[mid].operator[0]; - i = c ? 
c : strcmp (tokstart, tokentab[mid].operator); - - if (i < 0) { /* token < mid */ - high = mid - 1; - } else if (i > 0) { /* token > mid */ - low = mid + 1; - } else { - if (do_lint) { - if (tokentab[mid].flags & GAWKX) - warning("%s() is a gawk extension", - tokentab[mid].operator); - if (tokentab[mid].flags & NOT_POSIX) - warning("POSIX does not allow %s", - tokentab[mid].operator); - if (tokentab[mid].flags & NOT_OLD) - warning("%s is not supported in old awk", - tokentab[mid].operator); - } - if ((do_unix && (tokentab[mid].flags & GAWKX)) - || (do_posix && (tokentab[mid].flags & NOT_POSIX))) - break; - if (tokentab[mid].class == LEX_BUILTIN - || tokentab[mid].class == LEX_LENGTH - ) - yylval.lval = mid; - else - yylval.nodetypeval = tokentab[mid].value; - - free(tokkey); - return tokentab[mid].class; - } - } - - yylval.sval = tokkey; - if (*lexptr == '(') - return FUNC_CALL; - else { - want_assign = 1; - return NAME; - } -} - -static NODE * -node_common(op) -NODETYPE op; -{ - register NODE *r; - - getnode(r); - r->type = op; - r->flags = MALLOC; - /* if lookahead is NL, lineno is 1 too high */ - if (lexeme && *lexeme == '\n') - r->source_line = sourceline - 1; - else - r->source_line = sourceline; - r->source_file = source; - return r; -} - -/* - * This allocates a node with defined lnode and rnode. - */ -NODE * -node(left, op, right) -NODE *left, *right; -NODETYPE op; -{ - register NODE *r; - - r = node_common(op); - r->lnode = left; - r->rnode = right; - return r; -} - -/* - * This allocates a node with defined subnode and proc for builtin functions - * Checks for arg. count and supplies defaults where possible. - */ -static NODE * -snode(subn, op, idx) -NODETYPE op; -int idx; -NODE *subn; -{ - register NODE *r; - register NODE *n; - int nexp = 0; - int args_allowed; - - r = node_common(op); - - /* traverse expression list to see how many args. given */ - for (n= subn; n; n= n->rnode) { - nexp++; - if (nexp > 3) - break; - } - - /* check against how many args. are allowed for this builtin */ - args_allowed = tokentab[idx].flags & ARGS; - if (args_allowed && !(args_allowed & A(nexp))) - fatal("%s() cannot have %d argument%c", - tokentab[idx].operator, nexp, nexp == 1 ? 
' ' : 's'); - - r->proc = tokentab[idx].ptr; - - /* special case processing for a few builtins */ - if (nexp == 0 && r->proc == do_length) { - subn = node(node(make_number(0.0),Node_field_spec,(NODE *)NULL), - Node_expression_list, - (NODE *) NULL); - } else if (r->proc == do_match) { - if (subn->rnode->lnode->type != Node_regex) - subn->rnode->lnode = mk_rexp(subn->rnode->lnode); - } else if (r->proc == do_sub || r->proc == do_gsub) { - if (subn->lnode->type != Node_regex) - subn->lnode = mk_rexp(subn->lnode); - if (nexp == 2) - append_right(subn, node(node(make_number(0.0), - Node_field_spec, - (NODE *) NULL), - Node_expression_list, - (NODE *) NULL)); - else if (do_lint && subn->rnode->rnode->lnode->type == Node_val) - warning("string literal as last arg of substitute"); - } else if (r->proc == do_split) { - if (nexp == 2) - append_right(subn, - node(FS_node, Node_expression_list, (NODE *) NULL)); - n = subn->rnode->rnode->lnode; - if (n->type != Node_regex) - subn->rnode->rnode->lnode = mk_rexp(n); - if (nexp == 2) - subn->rnode->rnode->lnode->re_flags |= FS_DFLT; - } - - r->subnode = subn; - return r; -} - -/* - * This allocates a Node_line_range node with defined condpair and - * zeroes the trigger word to avoid the temptation of assuming that calling - * 'node( foo, Node_line_range, 0)' will properly initialize 'triggered'. - */ -/* Otherwise like node() */ -static NODE * -mkrangenode(cpair) -NODE *cpair; -{ - register NODE *r; - - getnode(r); - r->type = Node_line_range; - r->condpair = cpair; - r->triggered = 0; - return r; -} - -/* Build a for loop */ -static NODE * -make_for_loop(init, cond, incr) -NODE *init, *cond, *incr; -{ - register FOR_LOOP_HEADER *r; - NODE *n; - - emalloc(r, FOR_LOOP_HEADER *, sizeof(FOR_LOOP_HEADER), "make_for_loop"); - getnode(n); - n->type = Node_illegal; - r->init = init; - r->cond = cond; - r->incr = incr; - n->sub.nodep.r.hd = r; - return n; -} - -/* - * Install a name in the symbol table, even if it is already there. - * Caller must check against redefinition if that is desired. - */ -NODE * -install(name, value) -char *name; -NODE *value; -{ - register NODE *hp; - register size_t len; - register int bucket; - - len = strlen(name); - bucket = hash(name, len, (unsigned long) HASHSIZE); - getnode(hp); - hp->type = Node_hashnode; - hp->hnext = variables[bucket]; - variables[bucket] = hp; - hp->hlength = len; - hp->hvalue = value; - hp->hname = name; - hp->hvalue->vname = name; - return hp->hvalue; -} - -/* find the most recent hash node for name installed by install */ -NODE * -lookup(name) -const char *name; -{ - register NODE *bucket; - register size_t len; - - len = strlen(name); - bucket = variables[hash(name, len, (unsigned long) HASHSIZE)]; - while (bucket) { - if (bucket->hlength == len && STREQN(bucket->hname, name, len)) - return bucket->hvalue; - bucket = bucket->hnext; - } - return NULL; -} - -/* - * Add new to the rightmost branch of LIST. This uses n^2 time, so we make - * a simple attempt at optimizing it. - */ -static NODE * -append_right(list, new) -NODE *list, *new; -{ - register NODE *oldlist; - static NODE *savefront = NULL, *savetail = NULL; - - oldlist = list; - if (savefront == oldlist) { - savetail = savetail->rnode = new; - return oldlist; - } else - savefront = oldlist; - while (list->rnode != NULL) - list = list->rnode; - savetail = list->rnode = new; - return oldlist; -} - -/* - * check if name is already installed; if so, it had better have Null value, - * in which case def is added as the value. 
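The append_right() routine above would otherwise walk to the end of the expression list on every call, which is quadratic while a long list is built one element at a time, so it remembers the head and tail of the list it extended most recently and appends in constant time when the same list comes back. The same caching trick on a bare singly linked list (types and names invented for the sketch):

    #include <stdio.h>
    #include <stdlib.h>

    struct cell { int val; struct cell *next; };

    // Append `n` to `list`; remember the tail of the last list touched
    // so a run of appends to the same list costs O(1) per call.
    static struct cell *
    append_cached(struct cell *list, struct cell *n)
    {
        static struct cell *savefront = NULL, *savetail = NULL;

        if (list == savefront) {
            savetail = savetail->next = n;
            return list;
        }
        savefront = list;
        savetail = list;
        while (savetail->next != NULL)
            savetail = savetail->next;
        savetail = savetail->next = n;
        return list;
    }

    int
    main(void)
    {
        struct cell *head = calloc(1, sizeof *head);   // dummy head node
        struct cell *p;
        int i;

        for (i = 1; i <= 5; i++) {
            struct cell *n = calloc(1, sizeof *n);
            n->val = i;
            append_cached(head, n);
        }
        for (p = head->next; p != NULL; p = p->next)
            printf("%d ", p->val);
        printf("\n");
        return 0;
    }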
Otherwise, install name with def - * as value. - */ -static void -func_install(params, def) -NODE *params; -NODE *def; -{ - NODE *r; - - pop_params(params->rnode); - pop_var(params, 0); - r = lookup(params->param); - if (r != NULL) { - fatal("function name `%s' previously defined", params->param); - } else - (void) install(params->param, node(params, Node_func, def)); -} - -static void -pop_var(np, freeit) -NODE *np; -int freeit; -{ - register NODE *bucket, **save; - register size_t len; - char *name; - - name = np->param; - len = strlen(name); - save = &(variables[hash(name, len, (unsigned long) HASHSIZE)]); - for (bucket = *save; bucket; bucket = bucket->hnext) { - if (len == bucket->hlength && STREQN(bucket->hname, name, len)) { - *save = bucket->hnext; - freenode(bucket); - if (freeit) - free(np->param); - return; - } - save = &(bucket->hnext); - } -} - -static void -pop_params(params) -NODE *params; -{ - register NODE *np; - - for (np = params; np != NULL; np = np->rnode) - pop_var(np, 1); -} - -static NODE * -make_param(name) -char *name; -{ - NODE *r; - - getnode(r); - r->type = Node_param_list; - r->rnode = NULL; - r->param = name; - r->param_cnt = param_counter++; - return (install(name, r)); -} - -/* Name points to a variable name. Make sure its in the symbol table */ -NODE * -variable(name, can_free) -char *name; -int can_free; -{ - register NODE *r; - static int env_loaded = 0; - - if (!env_loaded && STREQ(name, "ENVIRON")) { - load_environ(); - env_loaded = 1; - } - if ((r = lookup(name)) == NULL) - r = install(name, node(Nnull_string, Node_var, (NODE *) NULL)); - else if (can_free) - free(name); - return r; -} - -static NODE * -mk_rexp(exp) -NODE *exp; -{ - if (exp->type == Node_regex) - return exp; - else { - NODE *n; - - getnode(n); - n->type = Node_regex; - n->re_exp = exp; - n->re_text = NULL; - n->re_reg = NULL; - n->re_flags = 0; - n->re_cnt = 1; - return n; - } -} diff --git a/gnu/usr.bin/awk/builtin.c b/gnu/usr.bin/awk/builtin.c deleted file mode 100644 index 00c52e7..0000000 --- a/gnu/usr.bin/awk/builtin.c +++ /dev/null @@ -1,1239 +0,0 @@ -/* - * builtin.c - Builtin functions and various utility procedures - */ - -/* - * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. - * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
- */ - - -#include "awk.h" - -#ifndef SRANDOM_PROTO -extern void srandom P((unsigned int seed)); -#endif -#if !defined(linux) && !defined(__FreeBSD__) -extern char *initstate P((unsigned seed, char *state, int n)); -extern char *setstate P((char *state)); -extern long random P((void)); -#endif - -extern NODE **fields_arr; -extern int output_is_tty; - -static NODE *sub_common P((NODE *tree, int global)); -NODE *format_tree P((const char *, int, NODE *)); - -#ifdef _CRAY -/* Work around a problem in conversion of doubles to exact integers. */ -#include <float.h> -#define Floor(n) floor((n) * (1.0 + DBL_EPSILON)) -#define Ceil(n) ceil((n) * (1.0 + DBL_EPSILON)) - -/* Force the standard C compiler to use the library math functions. */ -extern double exp(double); -double (*Exp)() = exp; -#define exp(x) (*Exp)(x) -extern double log(double); -double (*Log)() = log; -#define log(x) (*Log)(x) -#else -#define Floor(n) floor(n) -#define Ceil(n) ceil(n) -#endif - -#define DEFAULT_G_PRECISION 6 - -#ifdef GFMT_WORKAROUND -/* semi-temporary hack, mostly to gracefully handle VMS */ -static void sgfmt P((char *buf, const char *format, int alt, - int fwidth, int precision, double value)); -#endif /* GFMT_WORKAROUND */ - -/* - * On the alpha, LONG_MAX is too big for doing rand(). - * On the Cray (Y-MP, anyway), ints and longs are 64 bits, but - * random() does things in terms of 32 bits. So we have to chop - * LONG_MAX down. - */ -#if (defined(__alpha) && defined(__osf__)) || defined(_CRAY) -#define GAWK_RANDOM_MAX (LONG_MAX & 0x7fffffff) -#else -#define GAWK_RANDOM_MAX LONG_MAX -#endif - -static void efwrite P((const void *ptr, size_t size, size_t count, FILE *fp, - const char *from, struct redirect *rp,int flush)); - -static void -efwrite(ptr, size, count, fp, from, rp, flush) -const void *ptr; -size_t size, count; -FILE *fp; -const char *from; -struct redirect *rp; -int flush; -{ - errno = 0; - if (fwrite(ptr, size, count, fp) != count) - goto wrerror; - if (flush - && ((fp == stdout && output_is_tty) - || (rp && (rp->flag & RED_NOBUF)))) { - fflush(fp); - if (ferror(fp)) - goto wrerror; - } - return; - - wrerror: - fatal("%s to \"%s\" failed (%s)", from, - rp ? rp->value : "standard output", - errno ? 
strerror(errno) : "reason unknown"); -} - -/* Builtin functions */ -NODE * -do_exp(tree) -NODE *tree; -{ - NODE *tmp; - double d, res; -#ifndef exp - double exp P((double)); -#endif - - tmp= tree_eval(tree->lnode); - d = force_number(tmp); - free_temp(tmp); - errno = 0; - res = exp(d); - if (errno == ERANGE) - warning("exp argument %g is out of range", d); - return tmp_number((AWKNUM) res); -} - -NODE * -do_index(tree) -NODE *tree; -{ - NODE *s1, *s2; - register char *p1, *p2; - register size_t l1, l2; - long ret; - - - s1 = tree_eval(tree->lnode); - s2 = tree_eval(tree->rnode->lnode); - force_string(s1); - force_string(s2); - p1 = s1->stptr; - p2 = s2->stptr; - l1 = s1->stlen; - l2 = s2->stlen; - ret = 0; - if (IGNORECASE) { - while (l1) { - if (l2 > l1) - break; - if (casetable[(int)*p1] == casetable[(int)*p2] - && (l2 == 1 || strncasecmp(p1, p2, l2) == 0)) { - ret = 1 + s1->stlen - l1; - break; - } - l1--; - p1++; - } - } else { - while (l1) { - if (l2 > l1) - break; - if (*p1 == *p2 - && (l2 == 1 || STREQN(p1, p2, l2))) { - ret = 1 + s1->stlen - l1; - break; - } - l1--; - p1++; - } - } - free_temp(s1); - free_temp(s2); - return tmp_number((AWKNUM) ret); -} - -double -double_to_int(d) -double d; -{ - double floor P((double)); - double ceil P((double)); - - if (d >= 0) - d = Floor(d); - else - d = Ceil(d); - return d; -} - -NODE * -do_int(tree) -NODE *tree; -{ - NODE *tmp; - double d; - - tmp = tree_eval(tree->lnode); - d = force_number(tmp); - d = double_to_int(d); - free_temp(tmp); - return tmp_number((AWKNUM) d); -} - -NODE * -do_length(tree) -NODE *tree; -{ - NODE *tmp; - size_t len; - - tmp = tree_eval(tree->lnode); - len = force_string(tmp)->stlen; - free_temp(tmp); - return tmp_number((AWKNUM) len); -} - -NODE * -do_log(tree) -NODE *tree; -{ - NODE *tmp; -#ifndef log - double log P((double)); -#endif - double d, arg; - - tmp = tree_eval(tree->lnode); - arg = (double) force_number(tmp); - if (arg < 0.0) - warning("log called with negative argument %g", arg); - d = log(arg); - free_temp(tmp); - return tmp_number((AWKNUM) d); -} - -/* - * format_tree() formats nodes of a tree, starting with a left node, - * and accordingly to a fmt_string providing a format like in - * printf family from C library. Returns a string node which value - * is a formatted string. Called by sprintf function. - * - * It is one of the uglier parts of gawk. Thanks to Michal Jaegermann - * for taming this beast and making it compatible with ANSI C. - */ - -NODE * -format_tree(fmt_string, n0, carg) -const char *fmt_string; -int n0; -register NODE *carg; -{ -/* copy 'l' bytes from 's' to 'obufout' checking for space in the process */ -/* difference of pointers should be of ptrdiff_t type, but let us be kind */ -#define bchunk(s,l) if(l) {\ - while((l)>ofre) {\ - long olen = obufout - obuf;\ - erealloc(obuf, char *, osiz*2, "format_tree");\ - ofre+=osiz;\ - osiz*=2;\ - obufout = obuf + olen;\ - }\ - memcpy(obufout,s,(size_t)(l));\ - obufout+=(l);\ - ofre-=(l);\ - } -/* copy one byte from 's' to 'obufout' checking for space in the process */ -#define bchunk_one(s) {\ - if(ofre <= 0) {\ - long olen = obufout - obuf;\ - erealloc(obuf, char *, osiz*2, "format_tree");\ - ofre+=osiz;\ - osiz*=2;\ - obufout = obuf + olen;\ - }\ - *obufout++ = *s;\ - --ofre;\ - } - - /* Is there space for something L big in the buffer? 
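The bchunk/bchunk_one/chksize macros here all implement one idea: keep a single heap buffer for the formatted result, and whenever the next piece will not fit, double the buffer and patch up the output pointer and free-space counters after the realloc. The same growth strategy written as plain functions (struct and function names are made up for this sketch):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct obuf {
        char *buf;      // start of the buffer
        size_t used;    // bytes written so far
        size_t size;    // current capacity
    };

    // Guarantee room for `need` more bytes, doubling as required.
    static void
    obuf_reserve(struct obuf *o, size_t need)
    {
        while (o->size - o->used < need) {
            o->size *= 2;
            o->buf = realloc(o->buf, o->size);
            if (o->buf == NULL)
                exit(1);
        }
    }

    static void
    obuf_append(struct obuf *o, const char *s, size_t len)
    {
        obuf_reserve(o, len);
        memcpy(o->buf + o->used, s, len);
        o->used += len;
    }

    int
    main(void)
    {
        struct obuf o = { malloc(8), 0, 8 };
        int i;

        if (o.buf == NULL)
            return 1;
        for (i = 0; i < 100; i++)
            obuf_append(&o, "chunk ", 6);
        printf("%zu bytes held in a %zu-byte buffer\n", o.used, o.size);
        free(o.buf);
        return 0;
    }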
*/ -#define chksize(l) if((l)>ofre) {\ - long olen = obufout - obuf;\ - erealloc(obuf, char *, osiz*2, "format_tree");\ - obufout = obuf + olen;\ - ofre+=osiz;\ - osiz*=2;\ - } - - /* - * Get the next arg to be formatted. If we've run out of args, - * return "" (Null string) - */ -#define parse_next_arg() {\ - if(!carg) { toofew = 1; break; }\ - else {\ - arg=tree_eval(carg->lnode);\ - carg=carg->rnode;\ - }\ - } - - NODE *r; - int toofew = 0; - char *obuf, *obufout; - size_t osiz, ofre; - char *chbuf; - const char *s0, *s1; - int cs1; - NODE *arg; - long fw, prec; - int lj, alt, big, have_prec; - long *cur; - long val; -#ifdef sun386 /* Can't cast unsigned (int/long) from ptr->value */ - long tmp_uval; /* on 386i 4.0.1 C compiler -- it just hangs */ -#endif - unsigned long uval; - int sgn; - int base = 0; - char cpbuf[30]; /* if we have numbers bigger than 30 */ - char *cend = &cpbuf[30];/* chars, we lose, but seems unlikely */ - char *cp; - char *fill; - double tmpval; - char signchar = 0; - size_t len; - static char sp[] = " "; - static char zero_string[] = "0"; - static char lchbuf[] = "0123456789abcdef"; - static char Uchbuf[] = "0123456789ABCDEF"; - - emalloc(obuf, char *, 120, "format_tree"); - obufout = obuf; - osiz = 120; - ofre = osiz - 1; - - s0 = s1 = fmt_string; - while (n0-- > 0) { - if (*s1 != '%') { - s1++; - continue; - } - bchunk(s0, s1 - s0); - s0 = s1; - cur = &fw; - fw = 0; - prec = 0; - have_prec = 0; - lj = alt = big = 0; - fill = sp; - cp = cend; - chbuf = lchbuf; - s1++; - -retry: - --n0; - switch (cs1 = *s1++) { - case (-1): /* dummy case to allow for checking */ -check_pos: - if (cur != &fw) - break; /* reject as a valid format */ - goto retry; - case '%': - bchunk_one("%"); - s0 = s1; - break; - - case '0': - if (lj) - goto retry; - if (cur == &fw) - fill = zero_string; /* FALL through */ - case '1': - case '2': - case '3': - case '4': - case '5': - case '6': - case '7': - case '8': - case '9': - if (cur == 0) - /* goto lose; */ - break; - if (prec >= 0) - *cur = cs1 - '0'; - /* with a negative precision *cur is already set */ - /* to -1, so it will remain negative, but we have */ - /* to "eat" precision digits in any case */ - while (n0 > 0 && *s1 >= '0' && *s1 <= '9') { - --n0; - *cur = *cur * 10 + *s1++ - '0'; - } - if (prec < 0) /* negative precision is discarded */ - have_prec = 0; - if (cur == &prec) - cur = 0; - goto retry; - case '*': - if (cur == 0) - /* goto lose; */ - break; - parse_next_arg(); - *cur = force_number(arg); - free_temp(arg); - if (cur == &prec) - cur = 0; - goto retry; - case ' ': /* print ' ' or '-' */ - /* 'space' flag is ignored */ - /* if '+' already present */ - if (signchar != 0) - goto check_pos; - /* FALL THROUGH */ - case '+': /* print '+' or '-' */ - signchar = cs1; - goto check_pos; - case '-': - if (prec < 0) - break; - if (cur == &prec) { - prec = -1; - goto retry; - } - fill = sp; /* if left justified then other */ - lj++; /* filling is ignored */ - goto check_pos; - case '.': - if (cur != &fw) - break; - cur = ≺ - have_prec++; - goto retry; - case '#': - alt++; - goto check_pos; - case 'l': - if (big) - break; - big++; - goto check_pos; - case 'c': - parse_next_arg(); - if (arg->flags & NUMBER) { -#ifdef sun386 - tmp_uval = arg->numbr; - uval= (unsigned long) tmp_uval; -#else - uval = (unsigned long) arg->numbr; -#endif - cpbuf[0] = uval; - prec = 1; - cp = cpbuf; - goto pr_tail; - } - if (have_prec == 0) - prec = 1; - else if (prec > arg->stlen) - prec = arg->stlen; - cp = arg->stptr; - goto pr_tail; - case 's': - 
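For %s the case below only has to clamp the precision to the string's own length (or take the whole string when no precision was given) and then jump to the shared pr_tail code, which pads to the field width on whichever side the '-' flag dictates. The observable behaviour matches the C library's own handling of width and precision for strings, for example:

    #include <stdio.h>

    int
    main(void)
    {
        // width 10, precision 3: keep at most 3 chars, pad to 10
        printf("[%10.3s]\n", "gawk");    // prints [       gaw]
        printf("[%-10.3s]\n", "gawk");   // prints [gaw       ]
        return 0;
    }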
parse_next_arg(); - arg = force_string(arg); - if (have_prec == 0 || prec > arg->stlen) - prec = arg->stlen; - cp = arg->stptr; - goto pr_tail; - case 'd': - case 'i': - parse_next_arg(); - tmpval = force_number(arg); - if (tmpval > LONG_MAX || tmpval < LONG_MIN) { - /* out of range - emergency use of %g format */ - cs1 = 'g'; - goto format_float; - } - val = (long) tmpval; - - if (val < 0) { - sgn = 1; - if (val > LONG_MIN) - uval = (unsigned long) -val; - else - uval = (unsigned long)(-(LONG_MIN + 1)) - + (unsigned long)1; - } else { - sgn = 0; - uval = (unsigned long) val; - } - do { - *--cp = (char) ('0' + uval % 10); - uval /= 10; - } while (uval); - if (sgn) - *--cp = '-'; - else if (signchar) - *--cp = signchar; - if (have_prec != 0) /* ignore '0' flag if */ - fill = sp; /* precision given */ - if (prec > fw) - fw = prec; - prec = cend - cp; - if (fw > prec && ! lj && fill != sp - && (*cp == '-' || signchar)) { - bchunk_one(cp); - cp++; - prec--; - fw--; - } - goto pr_tail; - case 'X': - chbuf = Uchbuf; /* FALL THROUGH */ - case 'x': - base += 6; /* FALL THROUGH */ - case 'u': - base += 2; /* FALL THROUGH */ - case 'o': - base += 8; - parse_next_arg(); - tmpval = force_number(arg); - if (tmpval > ULONG_MAX || tmpval < LONG_MIN) { - /* out of range - emergency use of %g format */ - cs1 = 'g'; - goto format_float; - } - uval = (unsigned long)tmpval; - if (have_prec != 0) /* ignore '0' flag if */ - fill = sp; /* precision given */ - do { - *--cp = chbuf[uval % base]; - uval /= base; - } while (uval); - if (alt) { - if (base == 16) { - *--cp = cs1; - *--cp = '0'; - if (fill != sp) { - bchunk(cp, 2); - cp += 2; - fw -= 2; - } - } else if (base == 8) - *--cp = '0'; - } - base = 0; - prec = cend - cp; - pr_tail: - if (! lj) { - while (fw > prec) { - bchunk_one(fill); - fw--; - } - } - bchunk(cp, (int) prec); - while (fw > prec) { - bchunk_one(fill); - fw--; - } - s0 = s1; - free_temp(arg); - break; - case 'g': - case 'G': - case 'e': - case 'f': - case 'E': - parse_next_arg(); - tmpval = force_number(arg); - format_float: - free_temp(arg); - if (have_prec == 0) - prec = DEFAULT_G_PRECISION; - chksize(fw + prec + 9); /* 9==slop */ - - cp = cpbuf; - *cp++ = '%'; - if (lj) - *cp++ = '-'; - if (signchar) - *cp++ = signchar; - if (alt) - *cp++ = '#'; - if (fill != sp) - *cp++ = '0'; - cp = strcpy(cp, "*.*") + 3; - *cp++ = cs1; - *cp = '\0'; -#ifndef GFMT_WORKAROUND - (void) sprintf(obufout, cpbuf, - (int) fw, (int) prec, (double) tmpval); -#else /* GFMT_WORKAROUND */ - if (cs1 == 'g' || cs1 == 'G') - sgfmt(obufout, cpbuf, (int) alt, - (int) fw, (int) prec, (double) tmpval); - else - (void) sprintf(obufout, cpbuf, - (int) fw, (int) prec, (double) tmpval); -#endif /* GFMT_WORKAROUND */ - len = strlen(obufout); - ofre -= len; - obufout += len; - s0 = s1; - break; - default: - break; - } - if (toofew) - fatal("%s\n\t%s\n\t%*s%s", - "not enough arguments to satisfy format string", - fmt_string, s1 - fmt_string - 2, "", - "^ ran out for this one" - ); - } - if (do_lint && carg != NULL) - warning("too many arguments supplied for format string"); - bchunk(s0, s1 - s0); - r = make_str_node(obuf, obufout - obuf, ALREADY_MALLOCED); - r->flags |= TEMP; - return r; -} - -NODE * -do_sprintf(tree) -NODE *tree; -{ - NODE *r; - NODE *sfmt = force_string(tree_eval(tree->lnode)); - - r = format_tree(sfmt->stptr, sfmt->stlen, tree->rnode); - free_temp(sfmt); - return r; -} - - -void -do_printf(tree) -register NODE *tree; -{ - struct redirect *rp = NULL; - register FILE *fp; - - if (tree->rnode) { - int errflg; /* 
not used, sigh */ - - rp = redirect(tree->rnode, &errflg); - if (rp) { - fp = rp->fp; - if (!fp) - return; - } else - return; - } else - fp = stdout; - tree = do_sprintf(tree->lnode); - efwrite(tree->stptr, sizeof(char), tree->stlen, fp, "printf", rp , 1); - free_temp(tree); -} - -NODE * -do_sqrt(tree) -NODE *tree; -{ - NODE *tmp; - double arg; - extern double sqrt P((double)); - - tmp = tree_eval(tree->lnode); - arg = (double) force_number(tmp); - free_temp(tmp); - if (arg < 0.0) - warning("sqrt called with negative argument %g", arg); - return tmp_number((AWKNUM) sqrt(arg)); -} - -NODE * -do_substr(tree) -NODE *tree; -{ - NODE *t1, *t2, *t3; - NODE *r; - register int indx; - size_t length; - int is_long; - - t1 = tree_eval(tree->lnode); - t2 = tree_eval(tree->rnode->lnode); - if (tree->rnode->rnode == NULL) /* third arg. missing */ - length = t1->stlen; - else { - t3 = tree_eval(tree->rnode->rnode->lnode); - length = (size_t) force_number(t3); - free_temp(t3); - } - indx = (int) force_number(t2) - 1; - free_temp(t2); - t1 = force_string(t1); - if (indx < 0) - indx = 0; - if (indx >= t1->stlen || (long) length <= 0) { - free_temp(t1); - return Nnull_string; - } - if ((is_long = (indx + length > t1->stlen)) || LONG_MAX - indx < length) { - length = t1->stlen - indx; - if (do_lint && is_long) - warning("substr: length %d at position %d exceeds length of first argument", - length, indx+1); - } - r = tmp_string(t1->stptr + indx, length); - free_temp(t1); - return r; -} - -NODE * -do_strftime(tree) -NODE *tree; -{ - NODE *t1, *t2; - struct tm *tm; - time_t fclock; - char buf[100]; - - t1 = force_string(tree_eval(tree->lnode)); - - if (tree->rnode == NULL) /* second arg. missing, default */ - (void) time(&fclock); - else { - t2 = tree_eval(tree->rnode->lnode); - fclock = (time_t) force_number(t2); - free_temp(t2); - } - tm = localtime(&fclock); - - return tmp_string(buf, strftime(buf, 100, t1->stptr, tm)); -} - -NODE * -do_systime(tree) -NODE *tree; -{ - time_t lclock; - - (void) time(&lclock); - return tmp_number((AWKNUM) lclock); -} - -NODE * -do_system(tree) -NODE *tree; -{ - NODE *tmp; - int ret = 0; - char *cmd; - char save; - - (void) flush_io (); /* so output is synchronous with gawk's */ - tmp = tree_eval(tree->lnode); - cmd = force_string(tmp)->stptr; - - if (cmd && *cmd) { - /* insure arg to system is zero-terminated */ - - /* - * From: David Trueman <emory!cs.dal.ca!david> - * To: arnold@cc.gatech.edu (Arnold Robbins) - * Date: Wed, 3 Nov 1993 12:49:41 -0400 - * - * It may not be necessary to save the character, but - * I'm not sure. It would normally be the field - * separator. If the parse has not yet gone beyond - * that, it could mess up (although I doubt it). If - * FIELDWIDTHS is being used, it might be the first - * character of the next field. Unless someone wants - * to check it out exhaustively, I suggest saving it - * for now... 
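The long quoted exchange above is about one detail: gawk strings carry an explicit length and are not guaranteed to be NUL-terminated, so before handing the command to system() the code saves the byte just past the string, drops a '\0' there, and restores the saved byte afterwards. The save/terminate/restore pattern by itself (with a harmless stand-in for system()):

    #include <stdio.h>

    // Stand-in for any routine that insists on a NUL-terminated string.
    static void
    use_cstring(const char *s)
    {
        printf("callee saw \"%s\"\n", s);
    }

    int
    main(void)
    {
        char buf[] = "echo hi; trailing junk";
        size_t len = 7;          // only the first 7 bytes are the value
        char save = buf[len];    // remember the byte we overwrite

        buf[len] = '\0';         // temporarily terminate
        use_cstring(buf);        // prints: callee saw "echo hi"
        buf[len] = save;         // put the original byte back

        printf("buffer restored: \"%s\"\n", buf);
        return 0;
    }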
- */ - save = cmd[tmp->stlen]; - cmd[tmp->stlen] = '\0'; - - ret = system(cmd); - ret = (ret >> 8) & 0xff; - - cmd[tmp->stlen] = save; - } - free_temp(tmp); - return tmp_number((AWKNUM) ret); -} - -extern NODE **fmt_list; /* declared in eval.c */ - -void -do_print(tree) -register NODE *tree; -{ - register NODE *t1; - struct redirect *rp = NULL; - register FILE *fp; - register char *s; - - if (tree->rnode) { - int errflg; /* not used, sigh */ - - rp = redirect(tree->rnode, &errflg); - if (rp) { - fp = rp->fp; - if (!fp) - return; - } else - return; - } else - fp = stdout; - tree = tree->lnode; - while (tree) { - t1 = tree_eval(tree->lnode); - if (t1->flags & NUMBER) { - if (OFMTidx == CONVFMTidx) - (void) force_string(t1); - else { -#ifndef GFMT_WORKAROUND - char buf[100]; - - (void) sprintf(buf, OFMT, t1->numbr); - free_temp(t1); - t1 = tmp_string(buf, strlen(buf)); -#else /* GFMT_WORKAROUND */ - free_temp(t1); - t1 = format_tree(OFMT, - fmt_list[OFMTidx]->stlen, - tree); -#endif /* GFMT_WORKAROUND */ - } - } - efwrite(t1->stptr, sizeof(char), t1->stlen, fp, "print", rp, 0); - free_temp(t1); - tree = tree->rnode; - if (tree) { - s = OFS; - if (OFSlen) - efwrite(s, sizeof(char), (size_t)OFSlen, - fp, "print", rp, 0); - } - } - s = ORS; - if (ORSlen) - efwrite(s, sizeof(char), (size_t)ORSlen, fp, "print", rp, 1); -} - -NODE * -do_tolower(tree) -NODE *tree; -{ - NODE *t1, *t2; - register char *cp, *cp2; - - t1 = tree_eval(tree->lnode); - t1 = force_string(t1); - t2 = tmp_string(t1->stptr, t1->stlen); - for (cp = t2->stptr, cp2 = t2->stptr + t2->stlen; cp < cp2; cp++) - if (isupper(*cp)) - *cp = tolower(*cp); - free_temp(t1); - return t2; -} - -NODE * -do_toupper(tree) -NODE *tree; -{ - NODE *t1, *t2; - register char *cp; - - t1 = tree_eval(tree->lnode); - t1 = force_string(t1); - t2 = tmp_string(t1->stptr, t1->stlen); - for (cp = t2->stptr; cp < t2->stptr + t2->stlen; cp++) - if (islower(*cp)) - *cp = toupper(*cp); - free_temp(t1); - return t2; -} - -NODE * -do_atan2(tree) -NODE *tree; -{ - NODE *t1, *t2; - extern double atan2 P((double, double)); - double d1, d2; - - t1 = tree_eval(tree->lnode); - t2 = tree_eval(tree->rnode->lnode); - d1 = force_number(t1); - d2 = force_number(t2); - free_temp(t1); - free_temp(t2); - return tmp_number((AWKNUM) atan2(d1, d2)); -} - -NODE * -do_sin(tree) -NODE *tree; -{ - NODE *tmp; - extern double sin P((double)); - double d; - - tmp = tree_eval(tree->lnode); - d = sin((double)force_number(tmp)); - free_temp(tmp); - return tmp_number((AWKNUM) d); -} - -NODE * -do_cos(tree) -NODE *tree; -{ - NODE *tmp; - extern double cos P((double)); - double d; - - tmp = tree_eval(tree->lnode); - d = cos((double)force_number(tmp)); - free_temp(tmp); - return tmp_number((AWKNUM) d); -} - -static int firstrand = 1; -static char state[512]; - -/* ARGSUSED */ -NODE * -do_rand(tree) -NODE *tree; -{ - if (firstrand) { - (void) initstate((unsigned long) 1, state, sizeof state); - srandom(1); - firstrand = 0; - } - return tmp_number((AWKNUM) random() / GAWK_RANDOM_MAX); -} - -NODE * -do_srand(tree) -NODE *tree; -{ - NODE *tmp; - static long save_seed = 0; - long ret = save_seed; /* SVR4 awk srand returns previous seed */ - - if (firstrand) - (void) initstate((unsigned long) 1, state, sizeof state); - else - (void) setstate(state); - - if (!tree) - srandom((unsigned long) (save_seed = (long) time((time_t *) 0))); - else { - tmp = tree_eval(tree->lnode); - srandom((unsigned long) (save_seed = (long) force_number(tmp))); - free_temp(tmp); - } - firstrand = 0; - return 
tmp_number((AWKNUM) ret); -} - -NODE * -do_match(tree) -NODE *tree; -{ - NODE *t1; - int rstart; - AWKNUM rlength; - Regexp *rp; - - t1 = force_string(tree_eval(tree->lnode)); - tree = tree->rnode->lnode; - rp = re_update(tree); - rstart = research(rp, t1->stptr, 0, t1->stlen, 1); - if (rstart >= 0) { /* match succeded */ - rstart++; /* 1-based indexing */ - rlength = REEND(rp, t1->stptr) - RESTART(rp, t1->stptr); - } else { /* match failed */ - rstart = 0; - rlength = -1.0; - } - free_temp(t1); - unref(RSTART_node->var_value); - RSTART_node->var_value = make_number((AWKNUM) rstart); - unref(RLENGTH_node->var_value); - RLENGTH_node->var_value = make_number(rlength); - return tmp_number((AWKNUM) rstart); -} - -static NODE * -sub_common(tree, global) -NODE *tree; -int global; -{ - register char *scan; - register char *bp, *cp; - char *buf; - size_t buflen; - register char *matchend; - register size_t len; - char *matchstart; - char *text; - size_t textlen; - char *repl; - char *replend; - size_t repllen; - int sofar; - int ampersands; - int matches = 0; - Regexp *rp; - NODE *s; /* subst. pattern */ - NODE *t; /* string to make sub. in; $0 if none given */ - NODE *tmp; - NODE **lhs = &tree; /* value not used -- just different from NULL */ - int priv = 0; - Func_ptr after_assign = NULL; - - tmp = tree->lnode; - rp = re_update(tmp); - - tree = tree->rnode; - s = tree->lnode; - - tree = tree->rnode; - tmp = tree->lnode; - t = force_string(tree_eval(tmp)); - - /* do the search early to avoid work on non-match */ - if (research(rp, t->stptr, 0, t->stlen, 1) == -1 || - RESTART(rp, t->stptr) > t->stlen) { - free_temp(t); - return tmp_number((AWKNUM) 0.0); - } - - if (tmp->type == Node_val) - lhs = NULL; - else - lhs = get_lhs(tmp, &after_assign); - t->flags |= STRING; - /* - * create a private copy of the string - */ - if (t->stref > 1 || (t->flags & PERM)) { - unsigned int saveflags; - - saveflags = t->flags; - t->flags &= ~MALLOC; - tmp = dupnode(t); - t->flags = saveflags; - t = tmp; - priv = 1; - } - text = t->stptr; - textlen = t->stlen; - buflen = textlen + 2; - - s = force_string(tree_eval(s)); - repl = s->stptr; - replend = repl + s->stlen; - repllen = replend - repl; - emalloc(buf, char *, buflen + 2, "do_sub"); - buf[buflen] = '\0'; - buf[buflen + 1] = '\0'; - ampersands = 0; - for (scan = repl; scan < replend; scan++) { - if (*scan == '&') { - repllen--; - ampersands++; - } else if (*scan == '\\' && *(scan+1) == '&') { - repllen--; - scan++; - } - } - - bp = buf; - for (;;) { - matches++; - matchstart = t->stptr + RESTART(rp, t->stptr); - matchend = t->stptr + REEND(rp, t->stptr); - - /* - * create the result, copying in parts of the original - * string - */ - len = matchstart - text + repllen - + ampersands * (matchend - matchstart); - sofar = bp - buf; - while ((long)(buflen - sofar - len - 1) < 0) { - buflen *= 2; - erealloc(buf, char *, buflen, "do_sub"); - bp = buf + sofar; - } - for (scan = text; scan < matchstart; scan++) - *bp++ = *scan; - for (scan = repl; scan < replend; scan++) - if (*scan == '&') - for (cp = matchstart; cp < matchend; cp++) - *bp++ = *cp; - else if (*scan == '\\' && *(scan+1) == '&') { - scan++; - *bp++ = *scan; - } else - *bp++ = *scan; - - /* catch the case of gsub(//, "blah", whatever), i.e. 
empty regexp */ - if (global && matchstart == matchend && matchend < text + textlen) { - *bp++ = *matchend; - matchend++; - } - textlen = text + textlen - matchend; - text = matchend; - if (!global || (long)textlen <= 0 || - research(rp, t->stptr, text-t->stptr, textlen, 1) == -1) - break; - } - sofar = bp - buf; - if (buflen - sofar - textlen - 1) { - buflen = sofar + textlen + 2; - erealloc(buf, char *, buflen, "do_sub"); - bp = buf + sofar; - } - for (scan = matchend; scan < text + textlen; scan++) - *bp++ = *scan; - *bp = '\0'; - textlen = bp - buf; - free(t->stptr); - t->stptr = buf; - t->stlen = textlen; - - free_temp(s); - if (matches > 0 && lhs) { - if (priv) { - unref(*lhs); - *lhs = t; - } - if (after_assign) - (*after_assign)(); - t->flags &= ~(NUM|NUMBER); - } - return tmp_number((AWKNUM) matches); -} - -NODE * -do_gsub(tree) -NODE *tree; -{ - return sub_common(tree, 1); -} - -NODE * -do_sub(tree) -NODE *tree; -{ - return sub_common(tree, 0); -} - -#ifdef GFMT_WORKAROUND -/* - * printf's %g format [can't rely on gcvt()] - * caveat: don't use as argument to *printf()! - * 'format' string HAS to be of "<flags>*.*g" kind, or we bomb! - */ -static void -sgfmt(buf, format, alt, fwidth, prec, g) -char *buf; /* return buffer; assumed big enough to hold result */ -const char *format; -int alt; /* use alternate form flag */ -int fwidth; /* field width in a format */ -int prec; /* indicates desired significant digits, not decimal places */ -double g; /* value to format */ -{ - char dform[40]; - register char *gpos; - register char *d, *e, *p; - int again = 0; - - strncpy(dform, format, sizeof dform - 1); - dform[sizeof dform - 1] = '\0'; - gpos = strrchr(dform, '.'); - - if (g == 0.0 && alt == 0) { /* easy special case */ - *gpos++ = 'd'; - *gpos = '\0'; - (void) sprintf(buf, dform, fwidth, 0); - return; - } - gpos += 2; /* advance to location of 'g' in the format */ - - if (prec <= 0) /* negative precision is ignored */ - prec = (prec < 0 ? DEFAULT_G_PRECISION : 1); - - if (*gpos == 'G') - again = 1; - /* start with 'e' format (it'll provide nice exponent) */ - *gpos = 'e'; - prec -= 1; - (void) sprintf(buf, dform, fwidth, prec, g); - if ((e = strrchr(buf, 'e')) != NULL) { /* find exponent */ - int exp = atoi(e+1); /* fetch exponent */ - if (exp >= -4 && exp <= prec) { /* per K&R2, B1.2 */ - /* switch to 'f' format and re-do */ - *gpos = 'f'; - prec -= exp; /* decimal precision */ - (void) sprintf(buf, dform, fwidth, prec, g); - e = buf + strlen(buf); - while (*--e == ' ') - continue; - e += 1; - } - else if (again != 0) - *gpos = 'E'; - - /* if 'alt' in force, then trailing zeros are not removed */ - if (alt == 0 && (d = strrchr(buf, '.')) != NULL) { - /* throw away an excess of precision */ - for (p = e; p > d && *--p == '0'; ) - prec -= 1; - if (d == p) - prec -= 1; - if (prec < 0) - prec = 0; - /* and do that once again */ - again = 1; - } - if (again != 0) - (void) sprintf(buf, dform, fwidth, prec, g); - } -} -#endif /* GFMT_WORKAROUND */ diff --git a/gnu/usr.bin/awk/config.h b/gnu/usr.bin/awk/config.h deleted file mode 100644 index 601f483..0000000 --- a/gnu/usr.bin/awk/config.h +++ /dev/null @@ -1,306 +0,0 @@ -/* - * config.h -- configuration definitions for gawk. - * - * For generic 4.4 alpha - */ - -/* - * Copyright (C) 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. 
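The sgfmt() workaround above hand-builds what %g is supposed to do: the precision counts significant digits rather than decimal places, the value is printed %e-style unless the exponent falls in the range [-4, precision) (the K&R2 B1.2 rule the code cites), and trailing zeros are trimmed unless the '#' flag is present. Where the host printf's %g can be trusted, those rules look like this:

    #include <stdio.h>

    int
    main(void)
    {
        // precision is significant digits, not decimal places
        printf("%.3g\n", 123456.0);    // 1.23e+05  (exponent 5 >= 3, e-style)
        printf("%.3g\n", 12.3456);     // 12.3      (exponent 1 < 3, f-style)
        printf("%.3g\n", 12.0);        // 12        (trailing zeros trimmed)
        printf("%.3g\n", 0.00001234);  // 1.23e-05  (exponent -5 < -4, e-style)
        return 0;
    }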
- * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2, or (at your option) - * any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -/* - * This file isolates configuration dependencies for gnu awk. - * You should know something about your system, perhaps by having - * a manual handy, when you edit this file. You should copy config.h-dist - * to config.h, and edit config.h. Do not modify config.h-dist, so that - * it will be easy to apply any patches that may be distributed. - * - * The general idea is that systems conforming to the various standards - * should need to do the least amount of changing. Definining the various - * items in ths file usually means that your system is missing that - * particular feature. - * - * The order of preference in standard conformance is ANSI C, POSIX, - * and the SVID. - * - * If you have no clue as to what's going on with your system, try - * compiling gawk without editing this file and see what shows up - * missing in the link stage. From there, you can probably figure out - * which defines to turn on. - */ - -/**************************/ -/* Miscellanious features */ -/**************************/ - -/* - * BLKSIZE_MISSING - * - * Check your /usr/include/sys/stat.h file. If the stat structure - * does not have a member named st_blksize, define this. (This will - * most likely be the case on most System V systems prior to V.4.) - */ -/* #define BLKSIZE_MISSING 1 */ - -/* - * SIGTYPE - * - * The return type of the routines passed to the signal function. - * Modern systems use `void', older systems use `int'. - * If left undefined, it will default to void. - */ -/* #define SIGTYPE int */ - -/* - * SIZE_T_MISSING - * - * If your system has no typedef for size_t, define this to get a default - */ -/* #define SIZE_T_MISSING 1 */ - -/* - * CHAR_UNSIGNED - * - * If your machine uses unsigned characters (IBM RT and RS/6000 and others) - * then define this for use in regex.c - */ -/* #define CHAR_UNSIGNED 1 */ - -/* - * HAVE_UNDERSCORE_SETJMP - * - * Check in your /usr/include/setjmp.h file. If there are routines - * there named _setjmp and _longjmp, then you should define this. - * Typically only systems derived from Berkeley Unix have this. - */ -#define HAVE_UNDERSCORE_SETJMP 1 - -/* - * LIMITS_H_MISSING - * - * You don't have a <limits.h> include file. - */ -/* #define LIMITS_H_MISSING 1 */ - -/***********************************************/ -/* Missing library subroutines or system calls */ -/***********************************************/ - -/* - * MEMCMP_MISSING - * MEMCPY_MISSING - * MEMSET_MISSING - * - * These three routines are for manipulating blocks of memory. Most - * likely they will either all three be present or all three be missing, - * so they're grouped together. 
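On a system old enough to be missing these three, the usual cure is a few lines of portable C that move or compare one byte at a time; replacements of roughly this shape (a sketch, not the actual files gawk ships for such systems):

    #include <stdio.h>
    #include <stddef.h>

    // Byte-at-a-time stand-ins for memcpy() and memcmp().
    static void *
    my_memcpy(void *dst, const void *src, size_t n)
    {
        char *d = dst;
        const char *s = src;

        while (n-- > 0)
            *d++ = *s++;
        return dst;
    }

    static int
    my_memcmp(const void *a, const void *b, size_t n)
    {
        const unsigned char *p = a, *q = b;

        for (; n > 0; n--, p++, q++)
            if (*p != *q)
                return *p - *q;
        return 0;
    }

    int
    main(void)
    {
        char out[6];

        my_memcpy(out, "hello", 6);
        printf("%s %d\n", out, my_memcmp(out, "hello", 6));
        return 0;
    }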
- */ -/* #define MEMCMP_MISSING 1 */ -/* #define MEMCPY_MISSING 1 */ -/* #define MEMSET_MISSING 1 */ - -/* - * RANDOM_MISSING - * - * Your system does not have the random(3) suite of random number - * generating routines. These are different than the old rand(3) - * routines! - */ -/* #define RANDOM_MISSING 1 */ - -/* - * STRCASE_MISSING - * - * Your system does not have the strcasemp() and strncasecmp() - * routines that originated in Berkeley Unix. - */ -/* #define STRCASE_MISSING 1 */ - -/* - * STRCHR_MISSING - * - * Your system does not have the strchr() and strrchr() functions. - */ -/* #define STRCHR_MISSING 1 */ - -/* - * STRERROR_MISSING - * - * Your system lacks the ANSI C strerror() routine for returning the - * strings associated with errno values. - */ -/* #define STRERROR_MISSING 1 */ - -/* - * STRTOD_MISSING - * - * Your system does not have the strtod() routine for converting - * strings to double precision floating point values. - */ -/* #define STRTOD_MISSING 1 */ - -/* - * STRFTIME_MISSING - * - * Your system lacks the ANSI C strftime() routine for formatting - * broken down time values. - */ -/* #define STRFTIME_MISSING 1 */ - -/* - * TZSET_MISSING - * - * If you have a 4.2 BSD vintage system, then the strftime() routine - * supplied in the missing directory won't be enough, because it relies on the - * tzset() routine from System V / Posix. Fortunately, there is an - * emulation for tzset() too that should do the trick. If you don't - * have tzset(), define this. - */ -/* #define TZSET_MISSING 1 */ - -/* - * TZNAME_MISSING - * - * Some systems do not support the external variables tzname and daylight. - * If this is the case *and* strftime() is missing, define this. - */ -/* #define TZNAME_MISSING 1 */ - -/* - * TM_ZONE_MISSING - * - * Your "struct tm" is missing the tm_zone field. - * If this is the case *and* strftime() is missing *and* tzname is missing, - * define this. - */ -/* #define TM_ZONE_MISSING 1 */ - -/* - * STDC_HEADERS - * - * If your system does have ANSI compliant header files that - * provide prototypes for library routines, then define this. - */ -#define STDC_HEADERS 1 - -/* - * NO_TOKEN_PASTING - * - * If your compiler define's __STDC__ but does not support token - * pasting (tok##tok), then define this. - */ -/* #define NO_TOKEN_PASTING 1 */ - -/*****************************************************************/ -/* Stuff related to the Standard I/O Library. */ -/*****************************************************************/ -/* Much of this is (still, unfortunately) black magic in nature. */ -/* You may have to use some or all of these together to get gawk */ -/* to work correctly. */ -/*****************************************************************/ - -/* - * NON_STD_SPRINTF - * - * Look in your /usr/include/stdio.h file. If the return type of the - * sprintf() function is NOT `int', define this. - */ -/* #define NON_STD_SPRINTF 1 */ - -/* - * VPRINTF_MISSING - * - * Define this if your system lacks vprintf() and the other routines - * that go with it. This will trigger an attempt to use _doprnt(). - * If you don't have that, this attempt will fail and you are on your own. - */ -/* #define VPRINTF_MISSING 1 */ - -/* - * Casts from size_t to int and back. These will become unnecessary - * at some point in the future, but for now are required where the - * two types are a different representation. 
- */ -/* #define SZTC */ -/* #define INTC */ - -/* - * SYSTEM_MISSING - * - * Define this if your library does not provide a system function - * or you are not entirely happy with it and would rather use - * a provided replacement (atari only). - */ -/* #define SYSTEM_MISSING 1 */ - -/* - * FMOD_MISSING - * - * Define this if your system lacks the fmod() function and modf() will - * be used instead. - */ -/* #define FMOD_MISSING 1 */ - - -/*******************************/ -/* Gawk configuration options. */ -/*******************************/ - -/* - * DEFPATH - * - * The default search path for the -f option of gawk. It is used - * if the AWKPATH environment variable is undefined. The default - * definition is provided here. Most likely you should not change - * this. - */ - -/* #define DEFPATH ".:/usr/lib/awk:/usr/local/lib/awk" */ -/* #define ENVSEP ':' */ - -/* - * alloca already has a prototype defined - don't redefine it - */ -#define ALLOCA_PROTO 1 - -/* - * srandom already has a prototype defined - don't redefine it - */ -#define SRANDOM_PROTO 1 - -/* - * getpgrp() in sysvr4 and POSIX takes no argument - */ -/* #define GETPGRP_NOARG 0 */ - -/* - * define const to nothing if not __STDC__ - */ -#ifndef __STDC__ -#define const -#endif - -/* If svr4 and not gcc */ -/* #define SVR4 0 */ -#ifdef SVR4 -#define __svr4__ 1 -#endif - -/* anything that follows is for system-specific short-term kludges */ diff --git a/gnu/usr.bin/awk/dfa.c b/gnu/usr.bin/awk/dfa.c deleted file mode 100644 index e9c832b..0000000 --- a/gnu/usr.bin/awk/dfa.c +++ /dev/null @@ -1,2613 +0,0 @@ -/* dfa.c - deterministic extended regexp routines for GNU - Copyright (C) 1988 Free Software Foundation, Inc. - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2, or (at your option) - any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software - Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ - -/* Written June, 1988 by Mike Haertel - Modified July, 1988 by Arthur David Olson to assist BMG speedups */ - -#include <assert.h> -#include <ctype.h> -#include <stdio.h> - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#ifdef STDC_HEADERS -#include <stdlib.h> -#else -#include <sys/types.h> -extern char *calloc(), *malloc(), *realloc(); -extern void free(); -#endif - -#if defined(HAVE_STRING_H) || defined(STDC_HEADERS) -#include <string.h> -#undef index -#define index strchr -#else -#include <strings.h> -#endif - -#ifndef DEBUG /* use the same approach as regex.c */ -#undef assert -#define assert(e) -#endif /* DEBUG */ - -#ifndef isgraph -#define isgraph(C) (isprint(C) && !isspace(C)) -#endif - -#define ISALPHA(C) isalpha(C) -#define ISUPPER(C) isupper(C) -#define ISLOWER(C) islower(C) -#define ISDIGIT(C) isdigit(C) -#define ISXDIGIT(C) isxdigit(C) -#define ISSPACE(C) isspace(C) -#define ISPUNCT(C) ispunct(C) -#define ISALNUM(C) isalnum(C) -#define ISPRINT(C) isprint(C) -#define ISGRAPH(C) isgraph(C) -#define ISCNTRL(C) iscntrl(C) - -#include "gnuregex.h" -#include "dfa.h" - -#ifdef __STDC__ -typedef void *ptr_t; -#else -typedef char *ptr_t; -#ifndef const -#define const -#endif -#endif - -static void dfamust _RE_ARGS((struct dfa *dfa)); - -static ptr_t xcalloc _RE_ARGS((size_t n, size_t s)); -static ptr_t xmalloc _RE_ARGS((size_t n)); -static ptr_t xrealloc _RE_ARGS((ptr_t p, size_t n)); -#ifdef DEBUG -static void prtok _RE_ARGS((token t)); -#endif -static int tstbit _RE_ARGS((int b, charclass c)); -static void setbit _RE_ARGS((int b, charclass c)); -static void clrbit _RE_ARGS((int b, charclass c)); -static void copyset _RE_ARGS((charclass src, charclass dst)); -static void zeroset _RE_ARGS((charclass s)); -static void notset _RE_ARGS((charclass s)); -static int equal _RE_ARGS((charclass s1, charclass s2)); -static int charclass_index _RE_ARGS((charclass s)); -static int looking_at _RE_ARGS((const char *s)); -static token lex _RE_ARGS((void)); -static void addtok _RE_ARGS((token t)); -static void atom _RE_ARGS((void)); -static int nsubtoks _RE_ARGS((int tindex)); -static void copytoks _RE_ARGS((int tindex, int ntokens)); -static void closure _RE_ARGS((void)); -static void branch _RE_ARGS((void)); -static void regexp _RE_ARGS((int toplevel)); -static void copy _RE_ARGS((position_set *src, position_set *dst)); -static void insert _RE_ARGS((position p, position_set *s)); -static void merge _RE_ARGS((position_set *s1, position_set *s2, position_set *m)); -static void delete _RE_ARGS((position p, position_set *s)); -static int state_index _RE_ARGS((struct dfa *d, position_set *s, - int newline, int letter)); -static void build_state _RE_ARGS((int s, struct dfa *d)); -static void build_state_zero _RE_ARGS((struct dfa *d)); -static char *icatalloc _RE_ARGS((char *old, char *new)); -static char *icpyalloc _RE_ARGS((char *string)); -static char *istrstr _RE_ARGS((char *lookin, char *lookfor)); -static void ifree _RE_ARGS((char *cp)); -static void freelist _RE_ARGS((char **cpp)); -static char **enlist _RE_ARGS((char **cpp, char *new, size_t len)); -static char **comsubs _RE_ARGS((char *left, char *right)); -static char **addlists _RE_ARGS((char **old, char **new)); -static char **inboth _RE_ARGS((char **left, char **right)); - -#ifdef __FreeBSD__ -static int collate_range_cmp (a, b) - int a, b; -{ - int r; - static char s[2][2]; - - if ((unsigned char)a == (unsigned char)b) - return 0; - s[0][0] = a; - s[1][0] = b; - if ((r = strcoll(s[0], s[1])) == 0) - r = (unsigned char)a - 
(unsigned char)b; - return r; -} -#endif - -static ptr_t -xcalloc(n, s) - size_t n; - size_t s; -{ - ptr_t r = calloc(n, s); - - if (!r) - dfaerror("Memory exhausted"); - return r; -} - -static ptr_t -xmalloc(n) - size_t n; -{ - ptr_t r = malloc(n); - - assert(n != 0); - if (!r) - dfaerror("Memory exhausted"); - return r; -} - -static ptr_t -xrealloc(p, n) - ptr_t p; - size_t n; -{ - ptr_t r = realloc(p, n); - - assert(n != 0); - if (!r) - dfaerror("Memory exhausted"); - return r; -} - -#define CALLOC(p, t, n) ((p) = (t *) xcalloc((size_t)(n), sizeof (t))) -#define MALLOC(p, t, n) ((p) = (t *) xmalloc((n) * sizeof (t))) -#define REALLOC(p, t, n) ((p) = (t *) xrealloc((ptr_t) (p), (n) * sizeof (t))) - -/* Reallocate an array of type t if nalloc is too small for index. */ -#define REALLOC_IF_NECESSARY(p, t, nalloc, index) \ - if ((index) >= (nalloc)) \ - { \ - while ((index) >= (nalloc)) \ - (nalloc) *= 2; \ - REALLOC(p, t, nalloc); \ - } - -#ifdef DEBUG - -static void -prtok(t) - token t; -{ - char *s; - - if (t < 0) - fprintf(stderr, "END"); - else if (t < NOTCHAR) - fprintf(stderr, "%c", t); - else - { - switch (t) - { - case EMPTY: s = "EMPTY"; break; - case BACKREF: s = "BACKREF"; break; - case BEGLINE: s = "BEGLINE"; break; - case ENDLINE: s = "ENDLINE"; break; - case BEGWORD: s = "BEGWORD"; break; - case ENDWORD: s = "ENDWORD"; break; - case LIMWORD: s = "LIMWORD"; break; - case NOTLIMWORD: s = "NOTLIMWORD"; break; - case QMARK: s = "QMARK"; break; - case STAR: s = "STAR"; break; - case PLUS: s = "PLUS"; break; - case CAT: s = "CAT"; break; - case OR: s = "OR"; break; - case ORTOP: s = "ORTOP"; break; - case LPAREN: s = "LPAREN"; break; - case RPAREN: s = "RPAREN"; break; - default: s = "CSET"; break; - } - fprintf(stderr, "%s", s); - } -} -#endif /* DEBUG */ - -/* Stuff pertaining to charclasses. */ - -static int -tstbit(b, c) - int b; - charclass c; -{ - return c[b / INTBITS] & 1 << b % INTBITS; -} - -static void -setbit(b, c) - int b; - charclass c; -{ - c[b / INTBITS] |= 1 << b % INTBITS; -} - -static void -clrbit(b, c) - int b; - charclass c; -{ - c[b / INTBITS] &= ~(1 << b % INTBITS); -} - -static void -copyset(src, dst) - charclass src; - charclass dst; -{ - int i; - - for (i = 0; i < CHARCLASS_INTS; ++i) - dst[i] = src[i]; -} - -static void -zeroset(s) - charclass s; -{ - int i; - - for (i = 0; i < CHARCLASS_INTS; ++i) - s[i] = 0; -} - -static void -notset(s) - charclass s; -{ - int i; - - for (i = 0; i < CHARCLASS_INTS; ++i) - s[i] = ~s[i]; -} - -static int -equal(s1, s2) - charclass s1; - charclass s2; -{ - int i; - - for (i = 0; i < CHARCLASS_INTS; ++i) - if (s1[i] != s2[i]) - return 0; - return 1; -} - -/* A pointer to the current dfa is kept here during parsing. */ -static struct dfa *dfa; - -/* Find the index of charclass s in dfa->charclasses, or allocate a new charclass. */ -static int -charclass_index(s) - charclass s; -{ - int i; - - for (i = 0; i < dfa->cindex; ++i) - if (equal(s, dfa->charclasses[i])) - return i; - REALLOC_IF_NECESSARY(dfa->charclasses, charclass, dfa->calloc, dfa->cindex); - ++dfa->cindex; - copyset(s, dfa->charclasses[i]); - return i; -} - -/* Syntax bits controlling the behavior of the lexical analyzer. */ -static reg_syntax_t syntax_bits, syntax_bits_set; - -/* Flag for case-folding letters into sets. */ -static int case_fold; - -/* Entry point to set syntax options. */ -void -dfasyntax(bits, fold) - reg_syntax_t bits; - int fold; -{ - syntax_bits_set = 1; - syntax_bits = bits; - case_fold = fold; -} - -/* Lexical analyzer. 
All the dross that deals with the obnoxious - GNU Regex syntax bits is located here. The poor, suffering - reader is referred to the GNU Regex documentation for the - meaning of the @#%!@#%^!@ syntax bits. */ - -static char *lexstart; /* Pointer to beginning of input string. */ -static char *lexptr; /* Pointer to next input character. */ -static lexleft; /* Number of characters remaining. */ -static token lasttok; /* Previous token returned; initially END. */ -static int laststart; /* True if we're separated from beginning or (, | - only by zero-width characters. */ -static int parens; /* Count of outstanding left parens. */ -static int minrep, maxrep; /* Repeat counts for {m,n}. */ - -/* Note that characters become unsigned here. */ -#define FETCH(c, eoferr) \ - { \ - if (! lexleft) \ - if (eoferr != 0) \ - dfaerror(eoferr); \ - else \ - return lasttok = END; \ - (c) = (unsigned char) *lexptr++; \ - --lexleft; \ - } - -#ifdef __STDC__ -#define FUNC(F, P) static int F(int c) { return P(c); } -#else -#define FUNC(F, P) static int F(c) int c; { return P(c); } -#endif - -FUNC(is_alpha, ISALPHA) -FUNC(is_upper, ISUPPER) -FUNC(is_lower, ISLOWER) -FUNC(is_digit, ISDIGIT) -FUNC(is_xdigit, ISXDIGIT) -FUNC(is_space, ISSPACE) -FUNC(is_punct, ISPUNCT) -FUNC(is_alnum, ISALNUM) -FUNC(is_print, ISPRINT) -FUNC(is_graph, ISGRAPH) -FUNC(is_cntrl, ISCNTRL) - -/* The following list maps the names of the Posix named character classes - to predicate functions that determine whether a given character is in - the class. The leading [ has already been eaten by the lexical analyzer. */ -static struct { - const char *name; - int (*pred) _RE_ARGS((int)); -} prednames[] = { - { ":alpha:]", is_alpha }, - { ":upper:]", is_upper }, - { ":lower:]", is_lower }, - { ":digit:]", is_digit }, - { ":xdigit:]", is_xdigit }, - { ":space:]", is_space }, - { ":punct:]", is_punct }, - { ":alnum:]", is_alnum }, - { ":print:]", is_print }, - { ":graph:]", is_graph }, - { ":cntrl:]", is_cntrl }, - { 0 } -}; - -static int -looking_at(s) - const char *s; -{ - size_t len; - - len = strlen(s); - if (lexleft < len) - return 0; - return strncmp(s, lexptr, len) == 0; -} - -static token -lex() -{ - token c, c1, c2; - int backslash = 0, invert; - charclass ccl; - int i; - - /* Basic plan: We fetch a character. If it's a backslash, - we set the backslash flag and go through the loop again. - On the plus side, this avoids having a duplicate of the - main switch inside the backslash case. On the minus side, - it means that just about every case begins with - "if (backslash) ...". */ - for (i = 0; i < 2; ++i) - { - FETCH(c, 0); - switch (c) - { - case '\\': - if (backslash) - goto normal_char; - if (lexleft == 0) - dfaerror("Unfinished \\ escape"); - backslash = 1; - break; - - case '^': - if (backslash) - goto normal_char; - if (syntax_bits & RE_CONTEXT_INDEP_ANCHORS - || lasttok == END - || lasttok == LPAREN - || lasttok == OR) - return lasttok = BEGLINE; - goto normal_char; - - case '$': - if (backslash) - goto normal_char; - if (syntax_bits & RE_CONTEXT_INDEP_ANCHORS - || lexleft == 0 - || (syntax_bits & RE_NO_BK_PARENS - ? lexleft > 0 && *lexptr == ')' - : lexleft > 1 && lexptr[0] == '\\' && lexptr[1] == ')') - || (syntax_bits & RE_NO_BK_VBAR - ? 
lexleft > 0 && *lexptr == '|' - : lexleft > 1 && lexptr[0] == '\\' && lexptr[1] == '|') - || ((syntax_bits & RE_NEWLINE_ALT) - && lexleft > 0 && *lexptr == '\n')) - return lasttok = ENDLINE; - goto normal_char; - - case '1': - case '2': - case '3': - case '4': - case '5': - case '6': - case '7': - case '8': - case '9': - if (backslash && !(syntax_bits & RE_NO_BK_REFS)) - { - laststart = 0; - return lasttok = BACKREF; - } - goto normal_char; - - case '<': - if (syntax_bits & RE_NO_GNU_OPS) - goto normal_char; - if (backslash) - return lasttok = BEGWORD; - goto normal_char; - - case '>': - if (syntax_bits & RE_NO_GNU_OPS) - goto normal_char; - if (backslash) - return lasttok = ENDWORD; - goto normal_char; - - case 'b': - if (syntax_bits & RE_NO_GNU_OPS) - goto normal_char; - if (backslash) - return lasttok = LIMWORD; - goto normal_char; - - case 'B': - if (syntax_bits & RE_NO_GNU_OPS) - goto normal_char; - if (backslash) - return lasttok = NOTLIMWORD; - goto normal_char; - - case '?': - if (syntax_bits & RE_LIMITED_OPS) - goto normal_char; - if (backslash != ((syntax_bits & RE_BK_PLUS_QM) != 0)) - goto normal_char; - if (!(syntax_bits & RE_CONTEXT_INDEP_OPS) && laststart) - goto normal_char; - return lasttok = QMARK; - - case '*': - if (backslash) - goto normal_char; - if (!(syntax_bits & RE_CONTEXT_INDEP_OPS) && laststart) - goto normal_char; - return lasttok = STAR; - - case '+': - if (syntax_bits & RE_LIMITED_OPS) - goto normal_char; - if (backslash != ((syntax_bits & RE_BK_PLUS_QM) != 0)) - goto normal_char; - if (!(syntax_bits & RE_CONTEXT_INDEP_OPS) && laststart) - goto normal_char; - return lasttok = PLUS; - - case '{': - if (!(syntax_bits & RE_INTERVALS)) - goto normal_char; - if (backslash != ((syntax_bits & RE_NO_BK_BRACES) == 0)) - goto normal_char; - minrep = maxrep = 0; - /* Cases: - {M} - exact count - {M,} - minimum count, maximum is infinity - {,M} - 0 through M - {M,N} - M through N */ - FETCH(c, "unfinished repeat count"); - if (ISDIGIT(c)) - { - minrep = c - '0'; - for (;;) - { - FETCH(c, "unfinished repeat count"); - if (!ISDIGIT(c)) - break; - minrep = 10 * minrep + c - '0'; - } - } - else if (c != ',') - dfaerror("malformed repeat count"); - if (c == ',') - for (;;) - { - FETCH(c, "unfinished repeat count"); - if (!ISDIGIT(c)) - break; - maxrep = 10 * maxrep + c - '0'; - } - else - maxrep = minrep; - if (!(syntax_bits & RE_NO_BK_BRACES)) - { - if (c != '\\') - dfaerror("malformed repeat count"); - FETCH(c, "unfinished repeat count"); - } - if (c != '}') - dfaerror("malformed repeat count"); - laststart = 0; - return lasttok = REPMN; - - case '|': - if (syntax_bits & RE_LIMITED_OPS) - goto normal_char; - if (backslash != ((syntax_bits & RE_NO_BK_VBAR) == 0)) - goto normal_char; - laststart = 1; - return lasttok = OR; - - case '\n': - if (syntax_bits & RE_LIMITED_OPS - || backslash - || !(syntax_bits & RE_NEWLINE_ALT)) - goto normal_char; - laststart = 1; - return lasttok = OR; - - case '(': - if (backslash != ((syntax_bits & RE_NO_BK_PARENS) == 0)) - goto normal_char; - ++parens; - laststart = 1; - return lasttok = LPAREN; - - case ')': - if (backslash != ((syntax_bits & RE_NO_BK_PARENS) == 0)) - goto normal_char; - if (parens == 0 && syntax_bits & RE_UNMATCHED_RIGHT_PAREN_ORD) - goto normal_char; - --parens; - laststart = 0; - return lasttok = RPAREN; - - case '.': - if (backslash) - goto normal_char; - zeroset(ccl); - notset(ccl); - if (!(syntax_bits & RE_DOT_NEWLINE)) - clrbit('\n', ccl); - if (syntax_bits & RE_DOT_NOT_NULL) - clrbit('\0', ccl); - laststart = 0; - 
return lasttok = CSET + charclass_index(ccl); - - case 'w': - case 'W': - if (!backslash || (syntax_bits & RE_NO_GNU_OPS)) - goto normal_char; - zeroset(ccl); - for (c2 = 0; c2 < NOTCHAR; ++c2) - if (ISALNUM(c2)) - setbit(c2, ccl); - if (c == 'W') - notset(ccl); - laststart = 0; - return lasttok = CSET + charclass_index(ccl); - - case '[': - if (backslash) - goto normal_char; - zeroset(ccl); - FETCH(c, "Unbalanced ["); - if (c == '^') - { - FETCH(c, "Unbalanced ["); - invert = 1; - } - else - invert = 0; - do - { - /* Nobody ever said this had to be fast. :-) - Note that if we're looking at some other [:...:] - construct, we just treat it as a bunch of ordinary - characters. We can do this because we assume - regex has checked for syntax errors before - dfa is ever called. */ - if (c == '[' && (syntax_bits & RE_CHAR_CLASSES)) - for (c1 = 0; prednames[c1].name; ++c1) - if (looking_at(prednames[c1].name)) - { - for (c2 = 0; c2 < NOTCHAR; ++c2) - if ((*prednames[c1].pred)(c2)) - setbit(c2, ccl); - lexptr += strlen(prednames[c1].name); - lexleft -= strlen(prednames[c1].name); - FETCH(c1, "Unbalanced ["); - goto skip; - } - if (c == '\\' && (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS)) - FETCH(c, "Unbalanced ["); - FETCH(c1, "Unbalanced ["); - if (c1 == '-') - { - FETCH(c2, "Unbalanced ["); - if (c2 == ']') - { - /* In the case [x-], the - is an ordinary hyphen, - which is left in c1, the lookahead character. */ - --lexptr; - ++lexleft; - c2 = c; - } - else - { - if (c2 == '\\' - && (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS)) - FETCH(c2, "Unbalanced ["); - FETCH(c1, "Unbalanced ["); - } - } - else - c2 = c; -#ifdef __FreeBSD__ - { token c3; - - if (collate_range_cmp(c, c2) > 0) { - FETCH(c2, "Invalid range"); - goto skip; - } - - for (c3 = 0; c3 < NOTCHAR; ++c3) - if ( collate_range_cmp(c, c3) <= 0 - && collate_range_cmp(c3, c2) <= 0 - ) { - setbit(c3, ccl); - if (case_fold) - if (ISUPPER(c3)) - setbit(tolower(c3), ccl); - else if (ISLOWER(c3)) - setbit(toupper(c3), ccl); - } - } -#else - while (c <= c2) - { - setbit(c, ccl); - if (case_fold) - if (ISUPPER(c)) - setbit(tolower(c), ccl); - else if (ISLOWER(c)) - setbit(toupper(c), ccl); - ++c; - } -#endif - skip: - ; - } - while ((c = c1) != ']'); - if (invert) - { - notset(ccl); - if (syntax_bits & RE_HAT_LISTS_NOT_NEWLINE) - clrbit('\n', ccl); - } - laststart = 0; - return lasttok = CSET + charclass_index(ccl); - - default: - normal_char: - laststart = 0; - if (case_fold && ISALPHA(c)) - { - zeroset(ccl); - setbit(c, ccl); - if (isupper(c)) - setbit(tolower(c), ccl); - else - setbit(toupper(c), ccl); - return lasttok = CSET + charclass_index(ccl); - } - return c; - } - } - - /* The above loop should consume at most a backslash - and some other character. */ - abort(); -} - -/* Recursive descent parser for regular expressions. */ - -static token tok; /* Lookahead token. */ -static depth; /* Current depth of a hypothetical stack - holding deferred productions. This is - used to determine the depth that will be - required of the real stack later on in - dfaanalyze(). */ - -/* Add the given token to the parse tree, maintaining the depth count and - updating the maximum depth if necessary. 
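
The depth bookkeeping described here works like evaluating reverse Polish notation: every leaf (or EMPTY) pushes one entry, each binary CAT/OR/ORTOP pops one, and the unary QMARK/STAR/PLUS operators leave the count unchanged. For the regexp a(b|c), for example, the tokens emitted are a b c OR CAT (before dfaparse() appends its END marker), the running depth goes 1, 2, 3, 2, 1, and dfa->depth ends up 3; dfaanalyze() later uses that figure to size its per-node stacks.
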
*/ -static void -addtok(t) - token t; -{ - REALLOC_IF_NECESSARY(dfa->tokens, token, dfa->talloc, dfa->tindex); - dfa->tokens[dfa->tindex++] = t; - - switch (t) - { - case QMARK: - case STAR: - case PLUS: - break; - - case CAT: - case OR: - case ORTOP: - --depth; - break; - - default: - ++dfa->nleaves; - case EMPTY: - ++depth; - break; - } - if (depth > dfa->depth) - dfa->depth = depth; -} - -/* The grammar understood by the parser is as follows. - - regexp: - regexp OR branch - branch - - branch: - branch closure - closure - - closure: - closure QMARK - closure STAR - closure PLUS - atom - - atom: - <normal character> - CSET - BACKREF - BEGLINE - ENDLINE - BEGWORD - ENDWORD - LIMWORD - NOTLIMWORD - <empty> - - The parser builds a parse tree in postfix form in an array of tokens. */ - -static void -atom() -{ - if ((tok >= 0 && tok < NOTCHAR) || tok >= CSET || tok == BACKREF - || tok == BEGLINE || tok == ENDLINE || tok == BEGWORD - || tok == ENDWORD || tok == LIMWORD || tok == NOTLIMWORD) - { - addtok(tok); - tok = lex(); - } - else if (tok == LPAREN) - { - tok = lex(); - regexp(0); - if (tok != RPAREN) - dfaerror("Unbalanced ("); - tok = lex(); - } - else - addtok(EMPTY); -} - -/* Return the number of tokens in the given subexpression. */ -static int -nsubtoks(tindex) -int tindex; -{ - int ntoks1; - - switch (dfa->tokens[tindex - 1]) - { - default: - return 1; - case QMARK: - case STAR: - case PLUS: - return 1 + nsubtoks(tindex - 1); - case CAT: - case OR: - case ORTOP: - ntoks1 = nsubtoks(tindex - 1); - return 1 + ntoks1 + nsubtoks(tindex - 1 - ntoks1); - } -} - -/* Copy the given subexpression to the top of the tree. */ -static void -copytoks(tindex, ntokens) - int tindex, ntokens; -{ - int i; - - for (i = 0; i < ntokens; ++i) - addtok(dfa->tokens[tindex + i]); -} - -static void -closure() -{ - int tindex, ntokens, i; - - atom(); - while (tok == QMARK || tok == STAR || tok == PLUS || tok == REPMN) - if (tok == REPMN) - { - ntokens = nsubtoks(dfa->tindex); - tindex = dfa->tindex - ntokens; - if (maxrep == 0) - addtok(PLUS); - if (minrep == 0) - addtok(QMARK); - for (i = 1; i < minrep; ++i) - { - copytoks(tindex, ntokens); - addtok(CAT); - } - for (; i < maxrep; ++i) - { - copytoks(tindex, ntokens); - addtok(QMARK); - addtok(CAT); - } - tok = lex(); - } - else - { - addtok(tok); - tok = lex(); - } -} - -static void -branch() -{ - closure(); - while (tok != RPAREN && tok != OR && tok >= 0) - { - closure(); - addtok(CAT); - } -} - -static void -regexp(toplevel) - int toplevel; -{ - branch(); - while (tok == OR) - { - tok = lex(); - branch(); - if (toplevel) - addtok(ORTOP); - else - addtok(OR); - } -} - -/* Main entry point for the parser. S is a string to be parsed, len is the - length of the string, so s can include NUL characters. D is a pointer to - the struct dfa to parse into. */ -void -dfaparse(s, len, d) - char *s; - size_t len; - struct dfa *d; - -{ - dfa = d; - lexstart = lexptr = s; - lexleft = len; - lasttok = END; - laststart = 1; - parens = 0; - - if (! syntax_bits_set) - dfaerror("No syntax specified"); - - tok = lex(); - depth = d->depth; - - regexp(1); - - if (tok != END) - dfaerror("Unbalanced )"); - - addtok(END - d->nregexps); - addtok(CAT); - - if (d->nregexps) - addtok(ORTOP); - - ++d->nregexps; -} - -/* Some primitives for operating on sets of positions. */ - -/* Copy one set to another; the destination must be large enough. 
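
A concrete trace of the REPMN rewriting done in closure() above: for a{2,4} (minrep 2, maxrep 4) the operand's single token is copied repeatedly, so the postfix stream becomes a a CAT a QMARK CAT a QMARK CAT, the same tree as for aaa?a?. The degenerate a{0,} leaves maxrep at zero, so closure() emits PLUS and then QMARK, giving a PLUS QMARK, i.e. (a+)? which matches the same strings as a*.
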
*/ -static void -copy(src, dst) - position_set *src; - position_set *dst; -{ - int i; - - for (i = 0; i < src->nelem; ++i) - dst->elems[i] = src->elems[i]; - dst->nelem = src->nelem; -} - -/* Insert a position in a set. Position sets are maintained in sorted - order according to index. If position already exists in the set with - the same index then their constraints are logically or'd together. - S->elems must point to an array large enough to hold the resulting set. */ -static void -insert(p, s) - position p; - position_set *s; -{ - int i; - position t1, t2; - - for (i = 0; i < s->nelem && p.index < s->elems[i].index; ++i) - continue; - if (i < s->nelem && p.index == s->elems[i].index) - s->elems[i].constraint |= p.constraint; - else - { - t1 = p; - ++s->nelem; - while (i < s->nelem) - { - t2 = s->elems[i]; - s->elems[i++] = t1; - t1 = t2; - } - } -} - -/* Merge two sets of positions into a third. The result is exactly as if - the positions of both sets were inserted into an initially empty set. */ -static void -merge(s1, s2, m) - position_set *s1; - position_set *s2; - position_set *m; -{ - int i = 0, j = 0; - - m->nelem = 0; - while (i < s1->nelem && j < s2->nelem) - if (s1->elems[i].index > s2->elems[j].index) - m->elems[m->nelem++] = s1->elems[i++]; - else if (s1->elems[i].index < s2->elems[j].index) - m->elems[m->nelem++] = s2->elems[j++]; - else - { - m->elems[m->nelem] = s1->elems[i++]; - m->elems[m->nelem++].constraint |= s2->elems[j++].constraint; - } - while (i < s1->nelem) - m->elems[m->nelem++] = s1->elems[i++]; - while (j < s2->nelem) - m->elems[m->nelem++] = s2->elems[j++]; -} - -/* Delete a position from a set. */ -static void -delete(p, s) - position p; - position_set *s; -{ - int i; - - for (i = 0; i < s->nelem; ++i) - if (p.index == s->elems[i].index) - break; - if (i < s->nelem) - for (--s->nelem; i < s->nelem; ++i) - s->elems[i] = s->elems[i + 1]; -} - -/* Find the index of the state corresponding to the given position set with - the given preceding context, or create a new state if there is no such - state. Newline and letter tell whether we got here on a newline or - letter, respectively. */ -static int -state_index(d, s, newline, letter) - struct dfa *d; - position_set *s; - int newline; - int letter; -{ - int hash = 0; - int constraint; - int i, j; - - newline = newline ? 1 : 0; - letter = letter ? 1 : 0; - - for (i = 0; i < s->nelem; ++i) - hash ^= s->elems[i].index + s->elems[i].constraint; - - /* Try to find a state that exactly matches the proposed one. */ - for (i = 0; i < d->sindex; ++i) - { - if (hash != d->states[i].hash || s->nelem != d->states[i].elems.nelem - || newline != d->states[i].newline || letter != d->states[i].letter) - continue; - for (j = 0; j < s->nelem; ++j) - if (s->elems[j].constraint - != d->states[i].elems.elems[j].constraint - || s->elems[j].index != d->states[i].elems.elems[j].index) - break; - if (j == s->nelem) - return i; - } - - /* We'll have to create a new state. 
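
insert() and merge() keep position sets sorted by decreasing index, and a position inserted with an index already present has its constraint OR'd into the existing element rather than being duplicated. Merging {(5,C1), (2,C2)} with {(5,C3), (1,C4)}, for instance, yields {(5,C1|C3), (2,C2), (1,C4)}. state_index() depends on this canonical ordering: it can decide whether a proposed set matches an existing state by comparing elements pairwise in order.
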
*/ - REALLOC_IF_NECESSARY(d->states, dfa_state, d->salloc, d->sindex); - d->states[i].hash = hash; - MALLOC(d->states[i].elems.elems, position, s->nelem); - copy(s, &d->states[i].elems); - d->states[i].newline = newline; - d->states[i].letter = letter; - d->states[i].backref = 0; - d->states[i].constraint = 0; - d->states[i].first_end = 0; - for (j = 0; j < s->nelem; ++j) - if (d->tokens[s->elems[j].index] < 0) - { - constraint = s->elems[j].constraint; - if (SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 0) - || SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 1) - || SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 0) - || SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 1)) - d->states[i].constraint |= constraint; - if (! d->states[i].first_end) - d->states[i].first_end = d->tokens[s->elems[j].index]; - } - else if (d->tokens[s->elems[j].index] == BACKREF) - { - d->states[i].constraint = NO_CONSTRAINT; - d->states[i].backref = 1; - } - - ++d->sindex; - - return i; -} - -/* Find the epsilon closure of a set of positions. If any position of the set - contains a symbol that matches the empty string in some context, replace - that position with the elements of its follow labeled with an appropriate - constraint. Repeat exhaustively until no funny positions are left. - S->elems must be large enough to hold the result. */ -static void epsclosure _RE_ARGS((position_set *s, struct dfa *d)); - -static void -epsclosure(s, d) - position_set *s; - struct dfa *d; -{ - int i, j; - int *visited; - position p, old; - - MALLOC(visited, int, d->tindex); - for (i = 0; i < d->tindex; ++i) - visited[i] = 0; - - for (i = 0; i < s->nelem; ++i) - if (d->tokens[s->elems[i].index] >= NOTCHAR - && d->tokens[s->elems[i].index] != BACKREF - && d->tokens[s->elems[i].index] < CSET) - { - old = s->elems[i]; - p.constraint = old.constraint; - delete(s->elems[i], s); - if (visited[old.index]) - { - --i; - continue; - } - visited[old.index] = 1; - switch (d->tokens[old.index]) - { - case BEGLINE: - p.constraint &= BEGLINE_CONSTRAINT; - break; - case ENDLINE: - p.constraint &= ENDLINE_CONSTRAINT; - break; - case BEGWORD: - p.constraint &= BEGWORD_CONSTRAINT; - break; - case ENDWORD: - p.constraint &= ENDWORD_CONSTRAINT; - break; - case LIMWORD: - p.constraint &= LIMWORD_CONSTRAINT; - break; - case NOTLIMWORD: - p.constraint &= NOTLIMWORD_CONSTRAINT; - break; - default: - break; - } - for (j = 0; j < d->follows[old.index].nelem; ++j) - { - p.index = d->follows[old.index].elems[j].index; - insert(p, s); - } - /* Force rescan to start at the beginning. */ - i = -1; - } - - free(visited); -} - -/* Perform bottom-up analysis on the parse tree, computing various functions. - Note that at this point, we're pretending constructs like \< are real - characters rather than constraints on what can follow them. - - Nullable: A node is nullable if it is at the root of a regexp that can - match the empty string. - * EMPTY leaves are nullable. - * No other leaf is nullable. - * A QMARK or STAR node is nullable. - * A PLUS node is nullable if its argument is nullable. - * A CAT node is nullable if both its arguments are nullable. - * An OR node is nullable if either argument is nullable. - - Firstpos: The firstpos of a node is the set of positions (nonempty leaves) - that could correspond to the first character of a string matching the - regexp rooted at the given node. - * EMPTY leaves have empty firstpos. - * The firstpos of a nonempty leaf is that leaf itself. 
- * The firstpos of a QMARK, STAR, or PLUS node is the firstpos of its - argument. - * The firstpos of a CAT node is the firstpos of the left argument, union - the firstpos of the right if the left argument is nullable. - * The firstpos of an OR node is the union of firstpos of each argument. - - Lastpos: The lastpos of a node is the set of positions that could - correspond to the last character of a string matching the regexp at - the given node. - * EMPTY leaves have empty lastpos. - * The lastpos of a nonempty leaf is that leaf itself. - * The lastpos of a QMARK, STAR, or PLUS node is the lastpos of its - argument. - * The lastpos of a CAT node is the lastpos of its right argument, union - the lastpos of the left if the right argument is nullable. - * The lastpos of an OR node is the union of the lastpos of each argument. - - Follow: The follow of a position is the set of positions that could - correspond to the character following a character matching the node in - a string matching the regexp. At this point we consider special symbols - that match the empty string in some context to be just normal characters. - Later, if we find that a special symbol is in a follow set, we will - replace it with the elements of its follow, labeled with an appropriate - constraint. - * Every node in the firstpos of the argument of a STAR or PLUS node is in - the follow of every node in the lastpos. - * Every node in the firstpos of the second argument of a CAT node is in - the follow of every node in the lastpos of the first argument. - - Because of the postfix representation of the parse tree, the depth-first - analysis is conveniently done by a linear scan with the aid of a stack. - Sets are stored as arrays of the elements, obeying a stack-like allocation - scheme; the number of elements in each set deeper in the stack can be - used to determine the address of a particular set's array. */ -void -dfaanalyze(d, searchflag) - struct dfa *d; - int searchflag; -{ - int *nullable; /* Nullable stack. */ - int *nfirstpos; /* Element count stack for firstpos sets. */ - position *firstpos; /* Array where firstpos elements are stored. */ - int *nlastpos; /* Element count stack for lastpos sets. */ - position *lastpos; /* Array where lastpos elements are stored. */ - int *nalloc; /* Sizes of arrays allocated to follow sets. */ - position_set tmp; /* Temporary set for merging sets. */ - position_set merged; /* Result of merging sets. */ - int wants_newline; /* True if some position wants newline info. */ - int *o_nullable; - int *o_nfirst, *o_nlast; - position *o_firstpos, *o_lastpos; - int i, j; - position *pos; - -#ifdef DEBUG - fprintf(stderr, "dfaanalyze:\n"); - for (i = 0; i < d->tindex; ++i) - { - fprintf(stderr, " %d:", i); - prtok(d->tokens[i]); - } - putc('\n', stderr); -#endif - - d->searchflag = searchflag; - - MALLOC(nullable, int, d->depth); - o_nullable = nullable; - MALLOC(nfirstpos, int, d->depth); - o_nfirst = nfirstpos; - MALLOC(firstpos, position, d->nleaves); - o_firstpos = firstpos, firstpos += d->nleaves; - MALLOC(nlastpos, int, d->depth); - o_nlast = nlastpos; - MALLOC(lastpos, position, d->nleaves); - o_lastpos = lastpos, lastpos += d->nleaves; - MALLOC(nalloc, int, d->tindex); - for (i = 0; i < d->tindex; ++i) - nalloc[i] = 0; - MALLOC(merged.elems, position, d->nleaves); - - CALLOC(d->follows, position_set, d->tindex); - - for (i = 0; i < d->tindex; ++i) -#ifdef DEBUG - { /* Nonsyntactic #ifdef goo... */ -#endif - switch (d->tokens[i]) - { - case EMPTY: - /* The empty set is nullable. 
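
A worked instance of these rules, for ab* (postfix a b STAR CAT): neither leaf is nullable, b* is nullable, and the CAT is not, because a is not. firstpos of the whole regexp is {a}, since the left argument of the CAT is not nullable; lastpos is {a, b}, because the right argument b* is nullable, so lastpos of the left argument is unioned in. The STAR rule puts b into follow(b), and the CAT rule puts firstpos(b*) = {b} into follow(a), so follow(a) = follow(b) = {b}. The switch in dfaanalyze() computes exactly these sets bottom-up over the postfix array using the stack discipline described above.
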
*/ - *nullable++ = 1; - - /* The firstpos and lastpos of the empty leaf are both empty. */ - *nfirstpos++ = *nlastpos++ = 0; - break; - - case STAR: - case PLUS: - /* Every element in the firstpos of the argument is in the follow - of every element in the lastpos. */ - tmp.nelem = nfirstpos[-1]; - tmp.elems = firstpos; - pos = lastpos; - for (j = 0; j < nlastpos[-1]; ++j) - { - merge(&tmp, &d->follows[pos[j].index], &merged); - REALLOC_IF_NECESSARY(d->follows[pos[j].index].elems, position, - nalloc[pos[j].index], merged.nelem - 1); - copy(&merged, &d->follows[pos[j].index]); - } - - case QMARK: - /* A QMARK or STAR node is automatically nullable. */ - if (d->tokens[i] != PLUS) - nullable[-1] = 1; - break; - - case CAT: - /* Every element in the firstpos of the second argument is in the - follow of every element in the lastpos of the first argument. */ - tmp.nelem = nfirstpos[-1]; - tmp.elems = firstpos; - pos = lastpos + nlastpos[-1]; - for (j = 0; j < nlastpos[-2]; ++j) - { - merge(&tmp, &d->follows[pos[j].index], &merged); - REALLOC_IF_NECESSARY(d->follows[pos[j].index].elems, position, - nalloc[pos[j].index], merged.nelem - 1); - copy(&merged, &d->follows[pos[j].index]); - } - - /* The firstpos of a CAT node is the firstpos of the first argument, - union that of the second argument if the first is nullable. */ - if (nullable[-2]) - nfirstpos[-2] += nfirstpos[-1]; - else - firstpos += nfirstpos[-1]; - --nfirstpos; - - /* The lastpos of a CAT node is the lastpos of the second argument, - union that of the first argument if the second is nullable. */ - if (nullable[-1]) - nlastpos[-2] += nlastpos[-1]; - else - { - pos = lastpos + nlastpos[-2]; - for (j = nlastpos[-1] - 1; j >= 0; --j) - pos[j] = lastpos[j]; - lastpos += nlastpos[-2]; - nlastpos[-2] = nlastpos[-1]; - } - --nlastpos; - - /* A CAT node is nullable if both arguments are nullable. */ - nullable[-2] = nullable[-1] && nullable[-2]; - --nullable; - break; - - case OR: - case ORTOP: - /* The firstpos is the union of the firstpos of each argument. */ - nfirstpos[-2] += nfirstpos[-1]; - --nfirstpos; - - /* The lastpos is the union of the lastpos of each argument. */ - nlastpos[-2] += nlastpos[-1]; - --nlastpos; - - /* An OR node is nullable if either argument is nullable. */ - nullable[-2] = nullable[-1] || nullable[-2]; - --nullable; - break; - - default: - /* Anything else is a nonempty position. (Note that special - constructs like \< are treated as nonempty strings here; - an "epsilon closure" effectively makes them nullable later. - Backreferences have to get a real position so we can detect - transitions on them later. But they are nullable. */ - *nullable++ = d->tokens[i] == BACKREF; - - /* This position is in its own firstpos and lastpos. */ - *nfirstpos++ = *nlastpos++ = 1; - --firstpos, --lastpos; - firstpos->index = lastpos->index = i; - firstpos->constraint = lastpos->constraint = NO_CONSTRAINT; - - /* Allocate the follow set for this position. */ - nalloc[i] = 1; - MALLOC(d->follows[i].elems, position, nalloc[i]); - break; - } -#ifdef DEBUG - /* ... balance the above nonsyntactic #ifdef goo... */ - fprintf(stderr, "node %d:", i); - prtok(d->tokens[i]); - putc('\n', stderr); - fprintf(stderr, nullable[-1] ? 
" nullable: yes\n" : " nullable: no\n"); - fprintf(stderr, " firstpos:"); - for (j = nfirstpos[-1] - 1; j >= 0; --j) - { - fprintf(stderr, " %d:", firstpos[j].index); - prtok(d->tokens[firstpos[j].index]); - } - fprintf(stderr, "\n lastpos:"); - for (j = nlastpos[-1] - 1; j >= 0; --j) - { - fprintf(stderr, " %d:", lastpos[j].index); - prtok(d->tokens[lastpos[j].index]); - } - putc('\n', stderr); - } -#endif - - /* For each follow set that is the follow set of a real position, replace - it with its epsilon closure. */ - for (i = 0; i < d->tindex; ++i) - if (d->tokens[i] < NOTCHAR || d->tokens[i] == BACKREF - || d->tokens[i] >= CSET) - { -#ifdef DEBUG - fprintf(stderr, "follows(%d:", i); - prtok(d->tokens[i]); - fprintf(stderr, "):"); - for (j = d->follows[i].nelem - 1; j >= 0; --j) - { - fprintf(stderr, " %d:", d->follows[i].elems[j].index); - prtok(d->tokens[d->follows[i].elems[j].index]); - } - putc('\n', stderr); -#endif - copy(&d->follows[i], &merged); - epsclosure(&merged, d); - if (d->follows[i].nelem < merged.nelem) - REALLOC(d->follows[i].elems, position, merged.nelem); - copy(&merged, &d->follows[i]); - } - - /* Get the epsilon closure of the firstpos of the regexp. The result will - be the set of positions of state 0. */ - merged.nelem = 0; - for (i = 0; i < nfirstpos[-1]; ++i) - insert(firstpos[i], &merged); - epsclosure(&merged, d); - - /* Check if any of the positions of state 0 will want newline context. */ - wants_newline = 0; - for (i = 0; i < merged.nelem; ++i) - if (PREV_NEWLINE_DEPENDENT(merged.elems[i].constraint)) - wants_newline = 1; - - /* Build the initial state. */ - d->salloc = 1; - d->sindex = 0; - MALLOC(d->states, dfa_state, d->salloc); - state_index(d, &merged, wants_newline, 0); - - free(o_nullable); - free(o_nfirst); - free(o_firstpos); - free(o_nlast); - free(o_lastpos); - free(nalloc); - free(merged.elems); -} - -/* Find, for each character, the transition out of state s of d, and store - it in the appropriate slot of trans. - - We divide the positions of s into groups (positions can appear in more - than one group). Each group is labeled with a set of characters that - every position in the group matches (taking into account, if necessary, - preceding context information of s). For each group, find the union - of the its elements' follows. This set is the set of positions of the - new state. For each character in the group's label, set the transition - on this character to be to a state corresponding to the set's positions, - and its associated backward context information, if necessary. - - If we are building a searching matcher, we include the positions of state - 0 in every state. - - The collection of groups is constructed by building an equivalence-class - partition of the positions of s. - - For each position, find the set of characters C that it matches. Eliminate - any characters from C that fail on grounds of backward context. - - Search through the groups, looking for a group whose label L has nonempty - intersection with C. If L - C is nonempty, create a new group labeled - L - C and having the same positions as the current group, and set L to - the intersection of L and C. Insert the position in this group, set - C = C - L, and resume scanning. - - If after comparing with every group there are characters remaining in C, - create a new group labeled with the characters of C and insert this - position in that group. 
*/ -void -dfastate(s, d, trans) - int s; - struct dfa *d; - int trans[]; -{ - position_set grps[NOTCHAR]; /* As many as will ever be needed. */ - charclass labels[NOTCHAR]; /* Labels corresponding to the groups. */ - int ngrps = 0; /* Number of groups actually used. */ - position pos; /* Current position being considered. */ - charclass matches; /* Set of matching characters. */ - int matchesf; /* True if matches is nonempty. */ - charclass intersect; /* Intersection with some label set. */ - int intersectf; /* True if intersect is nonempty. */ - charclass leftovers; /* Stuff in the label that didn't match. */ - int leftoversf; /* True if leftovers is nonempty. */ - static charclass letters; /* Set of characters considered letters. */ - static charclass newline; /* Set of characters that aren't newline. */ - position_set follows; /* Union of the follows of some group. */ - position_set tmp; /* Temporary space for merging sets. */ - int state; /* New state. */ - int wants_newline; /* New state wants to know newline context. */ - int state_newline; /* New state on a newline transition. */ - int wants_letter; /* New state wants to know letter context. */ - int state_letter; /* New state on a letter transition. */ - static initialized; /* Flag for static initialization. */ - int i, j, k; - - /* Initialize the set of letters, if necessary. */ - if (! initialized) - { - initialized = 1; - for (i = 0; i < NOTCHAR; ++i) - if (ISALNUM(i)) - setbit(i, letters); - setbit('\n', newline); - } - - zeroset(matches); - - for (i = 0; i < d->states[s].elems.nelem; ++i) - { - pos = d->states[s].elems.elems[i]; - if (d->tokens[pos.index] >= 0 && d->tokens[pos.index] < NOTCHAR) - setbit(d->tokens[pos.index], matches); - else if (d->tokens[pos.index] >= CSET) - copyset(d->charclasses[d->tokens[pos.index] - CSET], matches); - else - continue; - - /* Some characters may need to be eliminated from matches because - they fail in the current context. */ - if (pos.constraint != 0xFF) - { - if (! MATCHES_NEWLINE_CONTEXT(pos.constraint, - d->states[s].newline, 1)) - clrbit('\n', matches); - if (! MATCHES_NEWLINE_CONTEXT(pos.constraint, - d->states[s].newline, 0)) - for (j = 0; j < CHARCLASS_INTS; ++j) - matches[j] &= newline[j]; - if (! MATCHES_LETTER_CONTEXT(pos.constraint, - d->states[s].letter, 1)) - for (j = 0; j < CHARCLASS_INTS; ++j) - matches[j] &= ~letters[j]; - if (! MATCHES_LETTER_CONTEXT(pos.constraint, - d->states[s].letter, 0)) - for (j = 0; j < CHARCLASS_INTS; ++j) - matches[j] &= letters[j]; - - /* If there are no characters left, there's no point in going on. */ - for (j = 0; j < CHARCLASS_INTS && !matches[j]; ++j) - continue; - if (j == CHARCLASS_INTS) - continue; - } - - for (j = 0; j < ngrps; ++j) - { - /* If matches contains a single character only, and the current - group's label doesn't contain that character, go on to the - next group. */ - if (d->tokens[pos.index] >= 0 && d->tokens[pos.index] < NOTCHAR - && !tstbit(d->tokens[pos.index], labels[j])) - continue; - - /* Check if this group's label has a nonempty intersection with - matches. */ - intersectf = 0; - for (k = 0; k < CHARCLASS_INTS; ++k) - (intersect[k] = matches[k] & labels[j][k]) ? (intersectf = 1) : 0; - if (! intersectf) - continue; - - /* It does; now find the set differences both ways. */ - leftoversf = matchesf = 0; - for (k = 0; k < CHARCLASS_INTS; ++k) - { - /* Even an optimizing compiler can't know this for sure. */ - int match = matches[k], label = labels[j][k]; - - (leftovers[k] = ~match & label) ? 
(leftoversf = 1) : 0; - (matches[k] = match & ~label) ? (matchesf = 1) : 0; - } - - /* If there were leftovers, create a new group labeled with them. */ - if (leftoversf) - { - copyset(leftovers, labels[ngrps]); - copyset(intersect, labels[j]); - MALLOC(grps[ngrps].elems, position, d->nleaves); - copy(&grps[j], &grps[ngrps]); - ++ngrps; - } - - /* Put the position in the current group. Note that there is no - reason to call insert() here. */ - grps[j].elems[grps[j].nelem++] = pos; - - /* If every character matching the current position has been - accounted for, we're done. */ - if (! matchesf) - break; - } - - /* If we've passed the last group, and there are still characters - unaccounted for, then we'll have to create a new group. */ - if (j == ngrps) - { - copyset(matches, labels[ngrps]); - zeroset(matches); - MALLOC(grps[ngrps].elems, position, d->nleaves); - grps[ngrps].nelem = 1; - grps[ngrps].elems[0] = pos; - ++ngrps; - } - } - - MALLOC(follows.elems, position, d->nleaves); - MALLOC(tmp.elems, position, d->nleaves); - - /* If we are a searching matcher, the default transition is to a state - containing the positions of state 0, otherwise the default transition - is to fail miserably. */ - if (d->searchflag) - { - wants_newline = 0; - wants_letter = 0; - for (i = 0; i < d->states[0].elems.nelem; ++i) - { - if (PREV_NEWLINE_DEPENDENT(d->states[0].elems.elems[i].constraint)) - wants_newline = 1; - if (PREV_LETTER_DEPENDENT(d->states[0].elems.elems[i].constraint)) - wants_letter = 1; - } - copy(&d->states[0].elems, &follows); - state = state_index(d, &follows, 0, 0); - if (wants_newline) - state_newline = state_index(d, &follows, 1, 0); - else - state_newline = state; - if (wants_letter) - state_letter = state_index(d, &follows, 0, 1); - else - state_letter = state; - for (i = 0; i < NOTCHAR; ++i) - if (i == '\n') - trans[i] = state_newline; - else if (ISALNUM(i)) - trans[i] = state_letter; - else - trans[i] = state; - } - else - for (i = 0; i < NOTCHAR; ++i) - trans[i] = -1; - - for (i = 0; i < ngrps; ++i) - { - follows.nelem = 0; - - /* Find the union of the follows of the positions of the group. - This is a hideously inefficient loop. Fix it someday. */ - for (j = 0; j < grps[i].nelem; ++j) - for (k = 0; k < d->follows[grps[i].elems[j].index].nelem; ++k) - insert(d->follows[grps[i].elems[j].index].elems[k], &follows); - - /* If we are building a searching matcher, throw in the positions - of state 0 as well. */ - if (d->searchflag) - for (j = 0; j < d->states[0].elems.nelem; ++j) - insert(d->states[0].elems.elems[j], &follows); - - /* Find out if the new state will want any context information. */ - wants_newline = 0; - if (tstbit('\n', labels[i])) - for (j = 0; j < follows.nelem; ++j) - if (PREV_NEWLINE_DEPENDENT(follows.elems[j].constraint)) - wants_newline = 1; - - wants_letter = 0; - for (j = 0; j < CHARCLASS_INTS; ++j) - if (labels[i][j] & letters[j]) - break; - if (j < CHARCLASS_INTS) - for (j = 0; j < follows.nelem; ++j) - if (PREV_LETTER_DEPENDENT(follows.elems[j].constraint)) - wants_letter = 1; - - /* Find the state(s) corresponding to the union of the follows. */ - state = state_index(d, &follows, 0, 0); - if (wants_newline) - state_newline = state_index(d, &follows, 1, 0); - else - state_newline = state; - if (wants_letter) - state_letter = state_index(d, &follows, 0, 1); - else - state_letter = state; - - /* Set the transitions for each character in the current label. 
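
Once trans[] is filled in, matching is a straight table walk indexed by raw byte value. A toy sketch of that core idea follows; the real dfaexec() further below also handles lazy table construction, accepting states and the newline sentinel, so this only shows the shape of the inner loop, not a drop-in replacement:

    /* toy walk over an already built table: trans[state][byte] -> next state */
    static int
    walk(trans, start, p, lim)
        int **trans;
        int start;
        unsigned char *p, *lim;
    {
        int s = start;

        while (p < lim && s >= 0)
            s = trans[s][*p++];        /* one array lookup per input byte */
        return s;
    }
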
*/ - for (j = 0; j < CHARCLASS_INTS; ++j) - for (k = 0; k < INTBITS; ++k) - if (labels[i][j] & 1 << k) - { - int c = j * INTBITS + k; - - if (c == '\n') - trans[c] = state_newline; - else if (ISALNUM(c)) - trans[c] = state_letter; - else if (c < NOTCHAR) - trans[c] = state; - } - } - - for (i = 0; i < ngrps; ++i) - free(grps[i].elems); - free(follows.elems); - free(tmp.elems); -} - -/* Some routines for manipulating a compiled dfa's transition tables. - Each state may or may not have a transition table; if it does, and it - is a non-accepting state, then d->trans[state] points to its table. - If it is an accepting state then d->fails[state] points to its table. - If it has no table at all, then d->trans[state] is NULL. - TODO: Improve this comment, get rid of the unnecessary redundancy. */ - -static void -build_state(s, d) - int s; - struct dfa *d; -{ - int *trans; /* The new transition table. */ - int i; - - /* Set an upper limit on the number of transition tables that will ever - exist at once. 1024 is arbitrary. The idea is that the frequently - used transition tables will be quickly rebuilt, whereas the ones that - were only needed once or twice will be cleared away. */ - if (d->trcount >= 1024) - { - for (i = 0; i < d->tralloc; ++i) - if (d->trans[i]) - { - free((ptr_t) d->trans[i]); - d->trans[i] = NULL; - } - else if (d->fails[i]) - { - free((ptr_t) d->fails[i]); - d->fails[i] = NULL; - } - d->trcount = 0; - } - - ++d->trcount; - - /* Set up the success bits for this state. */ - d->success[s] = 0; - if (ACCEPTS_IN_CONTEXT(d->states[s].newline, 1, d->states[s].letter, 0, - s, *d)) - d->success[s] |= 4; - if (ACCEPTS_IN_CONTEXT(d->states[s].newline, 0, d->states[s].letter, 1, - s, *d)) - d->success[s] |= 2; - if (ACCEPTS_IN_CONTEXT(d->states[s].newline, 0, d->states[s].letter, 0, - s, *d)) - d->success[s] |= 1; - - MALLOC(trans, int, NOTCHAR); - dfastate(s, d, trans); - - /* Now go through the new transition table, and make sure that the trans - and fail arrays are allocated large enough to hold a pointer for the - largest state mentioned in the table. */ - for (i = 0; i < NOTCHAR; ++i) - if (trans[i] >= d->tralloc) - { - int oldalloc = d->tralloc; - - while (trans[i] >= d->tralloc) - d->tralloc *= 2; - REALLOC(d->realtrans, int *, d->tralloc + 1); - d->trans = d->realtrans + 1; - REALLOC(d->fails, int *, d->tralloc); - REALLOC(d->success, int, d->tralloc); - REALLOC(d->newlines, int, d->tralloc); - while (oldalloc < d->tralloc) - { - d->trans[oldalloc] = NULL; - d->fails[oldalloc++] = NULL; - } - } - - /* Keep the newline transition in a special place so we can use it as - a sentinel. */ - d->newlines[s] = trans['\n']; - trans['\n'] = -1; - - if (ACCEPTING(s, *d)) - d->fails[s] = trans; - else - d->trans[s] = trans; -} - -static void -build_state_zero(d) - struct dfa *d; -{ - d->tralloc = 1; - d->trcount = 0; - CALLOC(d->realtrans, int *, d->tralloc + 1); - d->trans = d->realtrans + 1; - CALLOC(d->fails, int *, d->tralloc); - MALLOC(d->success, int, d->tralloc); - MALLOC(d->newlines, int, d->tralloc); - build_state(0, d); -} - -/* Search through a buffer looking for a match to the given struct dfa. - Find the first occurrence of a string matching the regexp in the buffer, - and the shortest possible version thereof. Return a pointer to the first - character after the match, or NULL if none is found. Begin points to - the beginning of the buffer, and end points to the first character after - its end. 
We store a newline in *end to act as a sentinel, so end had - better point somewhere valid. Newline is a flag indicating whether to - allow newlines to be in the matching string. If count is non- - NULL it points to a place we're supposed to increment every time we - see a newline. Finally, if backref is non-NULL it points to a place - where we're supposed to store a 1 if backreferencing happened and the - match needs to be verified by a backtracking matcher. Otherwise - we store a 0 in *backref. */ -char * -dfaexec(d, begin, end, newline, count, backref) - struct dfa *d; - char *begin; - char *end; - int newline; - int *count; - int *backref; -{ - register s, s1, tmp; /* Current state. */ - register unsigned char *p; /* Current input character. */ - register **trans, *t; /* Copy of d->trans so it can be optimized - into a register. */ - static sbit[NOTCHAR]; /* Table for anding with d->success. */ - static sbit_init; - - if (! sbit_init) - { - int i; - - sbit_init = 1; - for (i = 0; i < NOTCHAR; ++i) - if (i == '\n') - sbit[i] = 4; - else if (ISALNUM(i)) - sbit[i] = 2; - else - sbit[i] = 1; - } - - if (! d->tralloc) - build_state_zero(d); - - s = s1 = 0; - p = (unsigned char *) begin; - trans = d->trans; - *end = '\n'; - - for (;;) - { - /* The dreaded inner loop. */ - if ((t = trans[s]) != 0) - do - { - s1 = t[*p++]; - if (! (t = trans[s1])) - goto last_was_s; - s = t[*p++]; - } - while ((t = trans[s]) != 0); - goto last_was_s1; - last_was_s: - tmp = s, s = s1, s1 = tmp; - last_was_s1: - - if (s >= 0 && p <= (unsigned char *) end && d->fails[s]) - { - if (d->success[s] & sbit[*p]) - { - if (backref) - if (d->states[s].backref) - *backref = 1; - else - *backref = 0; - return (char *) p; - } - - s1 = s; - s = d->fails[s][*p++]; - continue; - } - - /* If the previous character was a newline, count it. */ - if (count && (char *) p <= end && p[-1] == '\n') - ++*count; - - /* Check if we've run off the end of the buffer. */ - if ((char *) p > end) - return NULL; - - if (s >= 0) - { - build_state(s, d); - trans = d->trans; - continue; - } - - if (p[-1] == '\n' && newline) - { - s = d->newlines[s1]; - continue; - } - - s = 0; - } -} - -/* Initialize the components of a dfa that the other routines don't - initialize for themselves. */ -void -dfainit(d) - struct dfa *d; -{ - d->calloc = 1; - MALLOC(d->charclasses, charclass, d->calloc); - d->cindex = 0; - - d->talloc = 1; - MALLOC(d->tokens, token, d->talloc); - d->tindex = d->depth = d->nleaves = d->nregexps = 0; - - d->searchflag = 0; - d->tralloc = 0; - - d->musts = 0; -} - -/* Parse and analyze a single string of the given length. */ -void -dfacomp(s, len, d, searchflag) - char *s; - size_t len; - struct dfa *d; - int searchflag; -{ - if (case_fold) /* dummy folding in service of dfamust() */ - { - char *lcopy; - int i; - - lcopy = malloc(len); - if (!lcopy) - dfaerror("out of memory"); - - /* This is a kludge. */ - case_fold = 0; - for (i = 0; i < len; ++i) - if (ISUPPER(s[i])) - lcopy[i] = tolower(s[i]); - else - lcopy[i] = s[i]; - - dfainit(d); - dfaparse(lcopy, len, d); - free(lcopy); - dfamust(d); - d->cindex = d->tindex = d->depth = d->nleaves = d->nregexps = 0; - case_fold = 1; - dfaparse(s, len, d); - dfaanalyze(d, searchflag); - } - else - { - dfainit(d); - dfaparse(s, len, d); - dfamust(d); - dfaanalyze(d, searchflag); - } -} - -/* Free the storage held by the components of a dfa. 
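
dfasyntax(), dfacomp(), dfaexec() and dfafree() make up the usual compile/run/free cycle for callers of this file. The driver below is only a sketch of that cycle: the dfaerror() handler is something the host program has to supply (its exact declaration is assumed here), dfaexec() needs one writable byte past the end of the buffer for its newline sentinel, and RE_SYNTAX_AWK is one of the syntax-bit sets from the GNU regex header included above.

    /* Hypothetical driver: compile a pattern, scan one line, clean up. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "gnuregex.h"
    #include "dfa.h"

    void
    dfaerror(msg)               /* supplied by the host; assumed not to return */
        char *msg;
    {
        fprintf(stderr, "dfa: %s\n", msg);
        exit(2);
    }

    int
    line_matches(pat, line, len)
        char *pat, *line;
        size_t len;             /* line[len] must be writable: dfaexec() stores '\n' there */
    {
        struct dfa d;
        int backref;
        char *hit;

        dfasyntax(RE_SYNTAX_AWK, 0);            /* choose syntax bits, no case folding */
        dfacomp(pat, strlen(pat), &d, 1);       /* 1 = build a searching matcher */
        hit = dfaexec(&d, line, line + len, 1, (int *) 0, &backref);
        dfafree(&d);
        /* if backref came back nonzero, the match would still need verifying
           with the full backtracking regex matcher */
        return hit != 0;
    }
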
*/ -void -dfafree(d) - struct dfa *d; -{ - int i; - struct dfamust *dm, *ndm; - - free((ptr_t) d->charclasses); - free((ptr_t) d->tokens); - for (i = 0; i < d->sindex; ++i) - free((ptr_t) d->states[i].elems.elems); - free((ptr_t) d->states); - for (i = 0; i < d->tindex; ++i) - if (d->follows[i].elems) - free((ptr_t) d->follows[i].elems); - free((ptr_t) d->follows); - for (i = 0; i < d->tralloc; ++i) - if (d->trans[i]) - free((ptr_t) d->trans[i]); - else if (d->fails[i]) - free((ptr_t) d->fails[i]); - if (d->realtrans) free((ptr_t) d->realtrans); - if (d->fails) free((ptr_t) d->fails); - if (d->newlines) free((ptr_t) d->newlines); - for (dm = d->musts; dm; dm = ndm) - { - ndm = dm->next; - free(dm->must); - free((ptr_t) dm); - } -} - -/* Having found the postfix representation of the regular expression, - try to find a long sequence of characters that must appear in any line - containing the r.e. - Finding a "longest" sequence is beyond the scope here; - we take an easy way out and hope for the best. - (Take "(ab|a)b"--please.) - - We do a bottom-up calculation of sequences of characters that must appear - in matches of r.e.'s represented by trees rooted at the nodes of the postfix - representation: - sequences that must appear at the left of the match ("left") - sequences that must appear at the right of the match ("right") - lists of sequences that must appear somewhere in the match ("in") - sequences that must constitute the match ("is") - - When we get to the root of the tree, we use one of the longest of its - calculated "in" sequences as our answer. The sequence we find is returned in - d->must (where "d" is the single argument passed to "dfamust"); - the length of the sequence is returned in d->mustn. - - The sequences calculated for the various types of node (in pseudo ANSI c) - are shown below. "p" is the operand of unary operators (and the left-hand - operand of binary operators); "q" is the right-hand operand of binary - operators. - - "ZERO" means "a zero-length sequence" below. - - Type left right is in - ---- ---- ----- -- -- - char c # c # c # c # c - - CSET ZERO ZERO ZERO ZERO - - STAR ZERO ZERO ZERO ZERO - - QMARK ZERO ZERO ZERO ZERO - - PLUS p->left p->right ZERO p->in - - CAT (p->is==ZERO)? (q->is==ZERO)? (p->is!=ZERO && p->in plus - p->left : q->right : q->is!=ZERO) ? q->in plus - p->is##q->left p->right##q->is p->is##q->is : p->right##q->left - ZERO - - OR longest common longest common (do p->is and substrings common to - leading trailing q->is have same p->in and q->in - (sub)sequence (sub)sequence length and - of p->left of p->right content) ? - and q->left and q->right p->is : NULL - - If there's anything else we recognize in the tree, all four sequences get set - to zero-length sequences. If there's something we don't recognize in the tree, - we just return a zero-length sequence. - - Break ties in favor of infrequent letters (choosing 'zzz' in preference to - 'aaa')? - - And. . .is it here or someplace that we might ponder "optimizations" such as - egrep 'psi|epsilon' -> egrep 'psi' - egrep 'pepsi|epsilon' -> egrep 'epsi' - (Yes, we now find "epsi" as a "string - that must occur", but we might also - simplify the *entire* r.e. being sought) - grep '[c]' -> grep 'c' - grep '(ab|a)b' -> grep 'ab' - grep 'ab*' -> grep 'a' - grep 'a*b' -> grep 'b' - - There are several issues: - - Is optimization easy (enough)? 
- - Does optimization actually accomplish anything, - or is the automaton you get from "psi|epsilon" (for example) - the same as the one you get from "psi" (for example)? - - Are optimizable r.e.'s likely to be used in real-life situations - (something like 'ab*' is probably unlikely; something like is - 'psi|epsilon' is likelier)? */ - -static char * -icatalloc(old, new) - char *old; - char *new; -{ - char *result; - size_t oldsize, newsize; - - newsize = (new == NULL) ? 0 : strlen(new); - if (old == NULL) - oldsize = 0; - else if (newsize == 0) - return old; - else oldsize = strlen(old); - if (old == NULL) - result = (char *) malloc(newsize + 1); - else - result = (char *) realloc((void *) old, oldsize + newsize + 1); - if (result != NULL && new != NULL) - (void) strcpy(result + oldsize, new); - return result; -} - -static char * -icpyalloc(string) - char *string; -{ - return icatalloc((char *) NULL, string); -} - -static char * -istrstr(lookin, lookfor) - char *lookin; - char *lookfor; -{ - char *cp; - size_t len; - - len = strlen(lookfor); - for (cp = lookin; *cp != '\0'; ++cp) - if (strncmp(cp, lookfor, len) == 0) - return cp; - return NULL; -} - -static void -ifree(cp) - char *cp; -{ - if (cp != NULL) - free(cp); -} - -static void -freelist(cpp) - char **cpp; -{ - int i; - - if (cpp == NULL) - return; - for (i = 0; cpp[i] != NULL; ++i) - { - free(cpp[i]); - cpp[i] = NULL; - } -} - -static char ** -enlist(cpp, new, len) - char **cpp; - char *new; - size_t len; -{ - int i, j; - - if (cpp == NULL) - return NULL; - if ((new = icpyalloc(new)) == NULL) - { - freelist(cpp); - return NULL; - } - new[len] = '\0'; - /* Is there already something in the list that's new (or longer)? */ - for (i = 0; cpp[i] != NULL; ++i) - if (istrstr(cpp[i], new) != NULL) - { - free(new); - return cpp; - } - /* Eliminate any obsoleted strings. */ - j = 0; - while (cpp[j] != NULL) - if (istrstr(new, cpp[j]) == NULL) - ++j; - else - { - free(cpp[j]); - if (--i == j) - break; - cpp[j] = cpp[i]; - cpp[i] = NULL; - } - /* Add the new string. */ - cpp = (char **) realloc((char *) cpp, (i + 2) * sizeof *cpp); - if (cpp == NULL) - return NULL; - cpp[i] = new; - cpp[i + 1] = NULL; - return cpp; -} - -/* Given pointers to two strings, return a pointer to an allocated - list of their distinct common substrings. Return NULL if something - seems wild. */ -static char ** -comsubs(left, right) - char *left; - char *right; -{ - char **cpp; - char *lcp; - char *rcp; - size_t i, len; - - if (left == NULL || right == NULL) - return NULL; - cpp = (char **) malloc(sizeof *cpp); - if (cpp == NULL) - return NULL; - cpp[0] = NULL; - for (lcp = left; *lcp != '\0'; ++lcp) - { - len = 0; - rcp = index(right, *lcp); - while (rcp != NULL) - { - for (i = 1; lcp[i] != '\0' && lcp[i] == rcp[i]; ++i) - continue; - if (i > len) - len = i; - rcp = index(rcp + 1, *lcp); - } - if (len == 0) - continue; - if ((cpp = enlist(cpp, lcp, len)) == NULL) - break; - } - return cpp; -} - -static char ** -addlists(old, new) -char **old; -char **new; -{ - int i; - - if (old == NULL || new == NULL) - return NULL; - for (i = 0; new[i] != NULL; ++i) - { - old = enlist(old, new[i], strlen(new[i])); - if (old == NULL) - break; - } - return old; -} - -/* Given two lists of substrings, return a new list giving substrings - common to both. 
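
These helpers do the set-of-substrings bookkeeping that dfamust() needs. enlist() keeps only maximal strings: a candidate already contained in some list entry is discarded, and existing entries contained in the candidate are removed. comsubs("abcd", "bcx"), for example, first records "bc" (the longest common run starting at 'b') and then drops the shorter common substring "c" because "bc" already covers it, returning just {"bc"}; inboth() applies comsubs() to every pair drawn from two lists to get the substrings common to both.
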
*/ -static char ** -inboth(left, right) - char **left; - char **right; -{ - char **both; - char **temp; - int lnum, rnum; - - if (left == NULL || right == NULL) - return NULL; - both = (char **) malloc(sizeof *both); - if (both == NULL) - return NULL; - both[0] = NULL; - for (lnum = 0; left[lnum] != NULL; ++lnum) - { - for (rnum = 0; right[rnum] != NULL; ++rnum) - { - temp = comsubs(left[lnum], right[rnum]); - if (temp == NULL) - { - freelist(both); - return NULL; - } - both = addlists(both, temp); - freelist(temp); - if (both == NULL) - return NULL; - } - } - return both; -} - -typedef struct -{ - char **in; - char *left; - char *right; - char *is; -} must; - -static void -resetmust(mp) -must *mp; -{ - mp->left[0] = mp->right[0] = mp->is[0] = '\0'; - freelist(mp->in); -} - -static void -dfamust(dfa) -struct dfa *dfa; -{ - must *musts; - must *mp; - char *result; - int ri; - int i; - int exact; - token t; - static must must0; - struct dfamust *dm; - static char empty_string[] = ""; - - result = empty_string; - exact = 0; - musts = (must *) malloc((dfa->tindex + 1) * sizeof *musts); - if (musts == NULL) - return; - mp = musts; - for (i = 0; i <= dfa->tindex; ++i) - mp[i] = must0; - for (i = 0; i <= dfa->tindex; ++i) - { - mp[i].in = (char **) malloc(sizeof *mp[i].in); - mp[i].left = malloc(2); - mp[i].right = malloc(2); - mp[i].is = malloc(2); - if (mp[i].in == NULL || mp[i].left == NULL || - mp[i].right == NULL || mp[i].is == NULL) - goto done; - mp[i].left[0] = mp[i].right[0] = mp[i].is[0] = '\0'; - mp[i].in[0] = NULL; - } -#ifdef DEBUG - fprintf(stderr, "dfamust:\n"); - for (i = 0; i < dfa->tindex; ++i) - { - fprintf(stderr, " %d:", i); - prtok(dfa->tokens[i]); - } - putc('\n', stderr); -#endif - for (ri = 0; ri < dfa->tindex; ++ri) - { - switch (t = dfa->tokens[ri]) - { - case LPAREN: - case RPAREN: - goto done; /* "cannot happen" */ - case EMPTY: - case BEGLINE: - case ENDLINE: - case BEGWORD: - case ENDWORD: - case LIMWORD: - case NOTLIMWORD: - case BACKREF: - resetmust(mp); - break; - case STAR: - case QMARK: - if (mp <= musts) - goto done; /* "cannot happen" */ - --mp; - resetmust(mp); - break; - case OR: - case ORTOP: - if (mp < &musts[2]) - goto done; /* "cannot happen" */ - { - char **new; - must *lmp; - must *rmp; - int j, ln, rn, n; - - rmp = --mp; - lmp = --mp; - /* Guaranteed to be. Unlikely, but. . . */ - if (strcmp(lmp->is, rmp->is) != 0) - lmp->is[0] = '\0'; - /* Left side--easy */ - i = 0; - while (lmp->left[i] != '\0' && lmp->left[i] == rmp->left[i]) - ++i; - lmp->left[i] = '\0'; - /* Right side */ - ln = strlen(lmp->right); - rn = strlen(rmp->right); - n = ln; - if (n > rn) - n = rn; - for (i = 0; i < n; ++i) - if (lmp->right[ln - i - 1] != rmp->right[rn - i - 1]) - break; - for (j = 0; j < i; ++j) - lmp->right[j] = lmp->right[(ln - i) + j]; - lmp->right[j] = '\0'; - new = inboth(lmp->in, rmp->in); - if (new == NULL) - goto done; - freelist(lmp->in); - free((char *) lmp->in); - lmp->in = new; - } - break; - case PLUS: - if (mp <= musts) - goto done; /* "cannot happen" */ - --mp; - mp->is[0] = '\0'; - break; - case END: - if (mp != &musts[1]) - goto done; /* "cannot happen" */ - for (i = 0; musts[0].in[i] != NULL; ++i) - if (strlen(musts[0].in[i]) > strlen(result)) - result = musts[0].in[i]; - if (strcmp(result, musts[0].is) == 0) - exact = 1; - goto done; - case CAT: - if (mp < &musts[2]) - goto done; /* "cannot happen" */ - { - must *lmp; - must *rmp; - - rmp = --mp; - lmp = --mp; - /* In. 
Everything in left, plus everything in - right, plus catenation of - left's right and right's left. */ - lmp->in = addlists(lmp->in, rmp->in); - if (lmp->in == NULL) - goto done; - if (lmp->right[0] != '\0' && - rmp->left[0] != '\0') - { - char *tp; - - tp = icpyalloc(lmp->right); - if (tp == NULL) - goto done; - tp = icatalloc(tp, rmp->left); - if (tp == NULL) - goto done; - lmp->in = enlist(lmp->in, tp, - strlen(tp)); - free(tp); - if (lmp->in == NULL) - goto done; - } - /* Left-hand */ - if (lmp->is[0] != '\0') - { - lmp->left = icatalloc(lmp->left, - rmp->left); - if (lmp->left == NULL) - goto done; - } - /* Right-hand */ - if (rmp->is[0] == '\0') - lmp->right[0] = '\0'; - lmp->right = icatalloc(lmp->right, rmp->right); - if (lmp->right == NULL) - goto done; - /* Guaranteed to be */ - if (lmp->is[0] != '\0' && rmp->is[0] != '\0') - { - lmp->is = icatalloc(lmp->is, rmp->is); - if (lmp->is == NULL) - goto done; - } - else - lmp->is[0] = '\0'; - } - break; - default: - if (t < END) - { - /* "cannot happen" */ - goto done; - } - else if (t == '\0') - { - /* not on *my* shift */ - goto done; - } - else if (t >= CSET) - { - /* easy enough */ - resetmust(mp); - } - else - { - /* plain character */ - resetmust(mp); - mp->is[0] = mp->left[0] = mp->right[0] = t; - mp->is[1] = mp->left[1] = mp->right[1] = '\0'; - mp->in = enlist(mp->in, mp->is, (size_t)1); - if (mp->in == NULL) - goto done; - } - break; - } -#ifdef DEBUG - fprintf(stderr, " node: %d:", ri); - prtok(dfa->tokens[ri]); - fprintf(stderr, "\n in:"); - for (i = 0; mp->in[i]; ++i) - fprintf(stderr, " \"%s\"", mp->in[i]); - fprintf(stderr, "\n is: \"%s\"\n", mp->is); - fprintf(stderr, " left: \"%s\"\n", mp->left); - fprintf(stderr, " right: \"%s\"\n", mp->right); -#endif - ++mp; - } - done: - if (strlen(result)) - { - dm = (struct dfamust *) malloc(sizeof (struct dfamust)); - dm->exact = exact; - dm->must = malloc(strlen(result) + 1); - strcpy(dm->must, result); - dm->next = dfa->musts; - dfa->musts = dm; - } - mp = musts; - for (i = 0; i <= dfa->tindex; ++i) - { - freelist(mp[i].in); - ifree((char *) mp[i].in); - ifree(mp[i].left); - ifree(mp[i].right); - ifree(mp[i].is); - } - free((char *) mp); -} diff --git a/gnu/usr.bin/awk/dfa.h b/gnu/usr.bin/awk/dfa.h deleted file mode 100644 index cc27d7a..0000000 --- a/gnu/usr.bin/awk/dfa.h +++ /dev/null @@ -1,360 +0,0 @@ -/* dfa.h - declarations for GNU deterministic regexp compiler - Copyright (C) 1988 Free Software Foundation, Inc. - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2, or (at your option) - any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software - Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ - -/* Written June, 1988 by Mike Haertel */ - -/* FIXME: - 2. We should not export so much of the DFA internals. - In addition to clobbering modularity, we eat up valuable - name space. */ - -/* Number of bits in an unsigned char. */ -#define CHARBITS 8 - -/* First integer value that is greater than any character code. 
*/ -#define NOTCHAR (1 << CHARBITS) - -/* INTBITS need not be exact, just a lower bound. */ -#define INTBITS (CHARBITS * sizeof (int)) - -/* Number of ints required to hold a bit for every character. */ -#define CHARCLASS_INTS ((NOTCHAR + INTBITS - 1) / INTBITS) - -/* Sets of unsigned characters are stored as bit vectors in arrays of ints. */ -typedef int charclass[CHARCLASS_INTS]; - -/* The regexp is parsed into an array of tokens in postfix form. Some tokens - are operators and others are terminal symbols. Most (but not all) of these - codes are returned by the lexical analyzer. */ - -typedef enum -{ - END = -1, /* END is a terminal symbol that matches the - end of input; any value of END or less in - the parse tree is such a symbol. Accepting - states of the DFA are those that would have - a transition on END. */ - - /* Ordinary character values are terminal symbols that match themselves. */ - - EMPTY = NOTCHAR, /* EMPTY is a terminal symbol that matches - the empty string. */ - - BACKREF, /* BACKREF is generated by \<digit>; it - it not completely handled. If the scanner - detects a transition on backref, it returns - a kind of "semi-success" indicating that - the match will have to be verified with - a backtracking matcher. */ - - BEGLINE, /* BEGLINE is a terminal symbol that matches - the empty string if it is at the beginning - of a line. */ - - ENDLINE, /* ENDLINE is a terminal symbol that matches - the empty string if it is at the end of - a line. */ - - BEGWORD, /* BEGWORD is a terminal symbol that matches - the empty string if it is at the beginning - of a word. */ - - ENDWORD, /* ENDWORD is a terminal symbol that matches - the empty string if it is at the end of - a word. */ - - LIMWORD, /* LIMWORD is a terminal symbol that matches - the empty string if it is at the beginning - or the end of a word. */ - - NOTLIMWORD, /* NOTLIMWORD is a terminal symbol that - matches the empty string if it is not at - the beginning or end of a word. */ - - QMARK, /* QMARK is an operator of one argument that - matches zero or one occurences of its - argument. */ - - STAR, /* STAR is an operator of one argument that - matches the Kleene closure (zero or more - occurrences) of its argument. */ - - PLUS, /* PLUS is an operator of one argument that - matches the positive closure (one or more - occurrences) of its argument. */ - - REPMN, /* REPMN is a lexical token corresponding - to the {m,n} construct. REPMN never - appears in the compiled token vector. */ - - CAT, /* CAT is an operator of two arguments that - matches the concatenation of its - arguments. CAT is never returned by the - lexical analyzer. */ - - OR, /* OR is an operator of two arguments that - matches either of its arguments. */ - - ORTOP, /* OR at the toplevel in the parse tree. - This is used for a boyer-moore heuristic. */ - - LPAREN, /* LPAREN never appears in the parse tree, - it is only a lexeme. */ - - RPAREN, /* RPAREN never appears in the parse tree. */ - - CSET /* CSET and (and any value greater) is a - terminal symbol that matches any of a - class of characters. */ -} token; - -/* Sets are stored in an array in the compiled dfa; the index of the - array corresponding to a given set token is given by SET_INDEX(t). */ -#define SET_INDEX(t) ((t) - CSET) - -/* Sometimes characters can only be matched depending on the surrounding - context. Such context decisions depend on what the previous character - was, and the value of the current (lookahead) character. Context - dependent constraints are encoded as 8 bit integers. 
Each bit that - is set indicates that the constraint succeeds in the corresponding - context. - - bit 7 - previous and current are newlines - bit 6 - previous was newline, current isn't - bit 5 - previous wasn't newline, current is - bit 4 - neither previous nor current is a newline - bit 3 - previous and current are word-constituents - bit 2 - previous was word-constituent, current isn't - bit 1 - previous wasn't word-constituent, current is - bit 0 - neither previous nor current is word-constituent - - Word-constituent characters are those that satisfy isalnum(). - - The macro SUCCEEDS_IN_CONTEXT determines whether a a given constraint - succeeds in a particular context. Prevn is true if the previous character - was a newline, currn is true if the lookahead character is a newline. - Prevl and currl similarly depend upon whether the previous and current - characters are word-constituent letters. */ -#define MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \ - ((constraint) & 1 << (((prevn) ? 2 : 0) + ((currn) ? 1 : 0) + 4)) -#define MATCHES_LETTER_CONTEXT(constraint, prevl, currl) \ - ((constraint) & 1 << (((prevl) ? 2 : 0) + ((currl) ? 1 : 0))) -#define SUCCEEDS_IN_CONTEXT(constraint, prevn, currn, prevl, currl) \ - (MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \ - && MATCHES_LETTER_CONTEXT(constraint, prevl, currl)) - -/* The following macros give information about what a constraint depends on. */ -#define PREV_NEWLINE_DEPENDENT(constraint) \ - (((constraint) & 0xc0) >> 2 != ((constraint) & 0x30)) -#define PREV_LETTER_DEPENDENT(constraint) \ - (((constraint) & 0x0c) >> 2 != ((constraint) & 0x03)) - -/* Tokens that match the empty string subject to some constraint actually - work by applying that constraint to determine what may follow them, - taking into account what has gone before. The following values are - the constraints corresponding to the special tokens previously defined. */ -#define NO_CONSTRAINT 0xff -#define BEGLINE_CONSTRAINT 0xcf -#define ENDLINE_CONSTRAINT 0xaf -#define BEGWORD_CONSTRAINT 0xf2 -#define ENDWORD_CONSTRAINT 0xf4 -#define LIMWORD_CONSTRAINT 0xf6 -#define NOTLIMWORD_CONSTRAINT 0xf9 - -/* States of the recognizer correspond to sets of positions in the parse - tree, together with the constraints under which they may be matched. - So a position is encoded as an index into the parse tree together with - a constraint. */ -typedef struct -{ - unsigned index; /* Index into the parse array. */ - unsigned constraint; /* Constraint for matching this position. */ -} position; - -/* Sets of positions are stored as arrays. */ -typedef struct -{ - position *elems; /* Elements of this position set. */ - int nelem; /* Number of elements in this set. */ -} position_set; - -/* A state of the dfa consists of a set of positions, some flags, - and the token value of the lowest-numbered position of the state that - contains an END token. */ -typedef struct -{ - int hash; /* Hash of the positions of this state. */ - position_set elems; /* Positions this state could match. */ - char newline; /* True if previous state matched newline. */ - char letter; /* True if previous state matched a letter. */ - char backref; /* True if this state matches a \<digit>. */ - unsigned char constraint; /* Constraint for this state to accept. */ - int first_end; /* Token value of the first END in elems. */ -} dfa_state; - -/* Element of a list of strings, at least one of which is known to - appear in any R.E. matching the DFA. 
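
As a quick illustration of the constraint encoding described above, the sketch below restates the newline-context macro and one constant so it compiles on its own, and shows that BEGLINE_CONSTRAINT (0xcf) succeeds exactly when the previous character was a newline -- that is, at the beginning of a line.

#include <stdio.h>

/* Restated from the header above so this sketch is self-contained. */
#define MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
  ((constraint) & 1 << (((prevn) ? 2 : 0) + ((currn) ? 1 : 0) + 4))
#define BEGLINE_CONSTRAINT	0xcf

int
main(void)
{
	int prevn, currn;

	/* 0xcf sets bits 7 and 6 (previous character was a newline) and
	   clears bits 5 and 4, so only the "previous was a newline" rows
	   succeed; the low nibble (0xf) places no word constraint. */
	for (prevn = 0; prevn <= 1; ++prevn)
		for (currn = 0; currn <= 1; ++currn)
			printf("prevn=%d currn=%d -> %s\n", prevn, currn,
			    MATCHES_NEWLINE_CONTEXT(BEGLINE_CONSTRAINT, prevn, currn)
			    ? "succeeds" : "fails");
	return 0;
}
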
*/ -struct dfamust -{ - int exact; - char *must; - struct dfamust *next; -}; - -/* A compiled regular expression. */ -struct dfa -{ - /* Stuff built by the scanner. */ - charclass *charclasses; /* Array of character sets for CSET tokens. */ - int cindex; /* Index for adding new charclasses. */ - int calloc; /* Number of charclasses currently allocated. */ - - /* Stuff built by the parser. */ - token *tokens; /* Postfix parse array. */ - int tindex; /* Index for adding new tokens. */ - int talloc; /* Number of tokens currently allocated. */ - int depth; /* Depth required of an evaluation stack - used for depth-first traversal of the - parse tree. */ - int nleaves; /* Number of leaves on the parse tree. */ - int nregexps; /* Count of parallel regexps being built - with dfaparse(). */ - - /* Stuff owned by the state builder. */ - dfa_state *states; /* States of the dfa. */ - int sindex; /* Index for adding new states. */ - int salloc; /* Number of states currently allocated. */ - - /* Stuff built by the structure analyzer. */ - position_set *follows; /* Array of follow sets, indexed by position - index. The follow of a position is the set - of positions containing characters that - could conceivably follow a character - matching the given position in a string - matching the regexp. Allocated to the - maximum possible position index. */ - int searchflag; /* True if we are supposed to build a searching - as opposed to an exact matcher. A searching - matcher finds the first and shortest string - matching a regexp anywhere in the buffer, - whereas an exact matcher finds the longest - string matching, but anchored to the - beginning of the buffer. */ - - /* Stuff owned by the executor. */ - int tralloc; /* Number of transition tables that have - slots so far. */ - int trcount; /* Number of transition tables that have - actually been built. */ - int **trans; /* Transition tables for states that can - never accept. If the transitions for a - state have not yet been computed, or the - state could possibly accept, its entry in - this table is NULL. */ - int **realtrans; /* Trans always points to realtrans + 1; this - is so trans[-1] can contain NULL. */ - int **fails; /* Transition tables after failing to accept - on a state that potentially could do so. */ - int *success; /* Table of acceptance conditions used in - dfaexec and computed in build_state. */ - int *newlines; /* Transitions on newlines. The entry for a - newline in any transition table is always - -1 so we can count lines without wasting - too many cycles. The transition for a - newline is stored separately and handled - as a special case. Newline is also used - as a sentinel at the end of the buffer. */ - struct dfamust *musts; /* List of strings, at least one of which - is known to appear in any r.e. matching - the dfa. */ -}; - -/* Some macros for user access to dfa internals. */ - -/* ACCEPTING returns true if s could possibly be an accepting state of r. */ -#define ACCEPTING(s, r) ((r).states[s].constraint) - -/* ACCEPTS_IN_CONTEXT returns true if the given state accepts in the - specified context. */ -#define ACCEPTS_IN_CONTEXT(prevn, currn, prevl, currl, state, dfa) \ - SUCCEEDS_IN_CONTEXT((dfa).states[state].constraint, \ - prevn, currn, prevl, currl) - -/* FIRST_MATCHING_REGEXP returns the index number of the first of parallel - regexps that a given state could accept. Parallel regexps are numbered - starting at 1. */ -#define FIRST_MATCHING_REGEXP(state, dfa) (-(dfa).states[state].first_end) - -/* Entry points. 
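
A rough sketch of how a caller might drive the entry points declared just below (dfasyntax, dfacomp, dfaexec, dfafree), following the calling conventions documented there. The "regex.h" include path, the RE_SYNTAX_EGREP syntax bits, and the need to supply dfaerror() yourself are assumptions about the surrounding build, not guarantees made by this header.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "regex.h"	/* assumed: GNU regex, for reg_syntax_t and RE_SYNTAX_EGREP */
#include "dfa.h"

/* Error hook used by the dfa routines; supply one if no default is linked in. */
void
dfaerror(const char *mesg)
{
	fprintf(stderr, "dfa: %s\n", mesg);
	exit(2);
}

int
main(void)
{
	/* The end-of-buffer position must be writable, since dfaexec()
	   stores a sentinel newline there; a static array suffices. */
	static char buf[] = "alpha\nbeta gamma\n";
	struct dfa d;
	int backref = 0;
	char *after;

	dfasyntax(RE_SYNTAX_EGREP, 0);	/* syntax bits, no case folding */
	dfacomp("beta|gamma", strlen("beta|gamma"), &d, 1);	/* 1: searching matcher */
	after = dfaexec(&d, buf, buf + strlen(buf), 0, (int *) NULL, &backref);
	if (after == NULL)
		printf("no match\n");
	else
		printf("first match ends at offset %ld%s\n",
		    (long) (after - buf),
		    backref ? " (needs backtracking verification)" : "");
	dfafree(&d);
	return 0;
}
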
*/ - -#ifdef __STDC__ - -/* dfasyntax() takes two arguments; the first sets the syntax bits described - earlier in this file, and the second sets the case-folding flag. */ -extern void dfasyntax(reg_syntax_t, int); - -/* Compile the given string of the given length into the given struct dfa. - Final argument is a flag specifying whether to build a searching or an - exact matcher. */ -extern void dfacomp(char *, size_t, struct dfa *, int); - -/* Execute the given struct dfa on the buffer of characters. The - first char * points to the beginning, and the second points to the - first character after the end of the buffer, which must be a writable - place so a sentinel end-of-buffer marker can be stored there. The - second-to-last argument is a flag telling whether to allow newlines to - be part of a string matching the regexp. The next-to-last argument, - if non-NULL, points to a place to increment every time we see a - newline. The final argument, if non-NULL, points to a flag that will - be set if further examination by a backtracking matcher is needed in - order to verify backreferencing; otherwise the flag will be cleared. - Returns NULL if no match is found, or a pointer to the first - character after the first & shortest matching string in the buffer. */ -extern char *dfaexec(struct dfa *, char *, char *, int, int *, int *); - -/* Free the storage held by the components of a struct dfa. */ -extern void dfafree(struct dfa *); - -/* Entry points for people who know what they're doing. */ - -/* Initialize the components of a struct dfa. */ -extern void dfainit(struct dfa *); - -/* Incrementally parse a string of given length into a struct dfa. */ -extern void dfaparse(char *, size_t, struct dfa *); - -/* Analyze a parsed regexp; second argument tells whether to build a searching - or an exact matcher. */ -extern void dfaanalyze(struct dfa *, int); - -/* Compute, for each possible character, the transitions out of a given - state, storing them in an array of integers. */ -extern void dfastate(int, struct dfa *, int []); - -/* Error handling. */ - -/* dfaerror() is called by the regexp routines whenever an error occurs. It - takes a single argument, a NUL-terminated string describing the error. - The default dfaerror() prints the error message to stderr and exits. - The user can provide a different dfafree() if so desired. */ -extern void dfaerror(const char *); - -#else /* ! __STDC__ */ -extern void dfasyntax(), dfacomp(), dfafree(), dfainit(), dfaparse(); -extern void dfaanalyze(), dfastate(), dfaerror(); -extern char *dfaexec(); -#endif /* ! __STDC__ */ diff --git a/gnu/usr.bin/awk/doc/gawk.texi b/gnu/usr.bin/awk/doc/gawk.texi deleted file mode 100644 index b280262..0000000 --- a/gnu/usr.bin/awk/doc/gawk.texi +++ /dev/null @@ -1,11270 +0,0 @@ -\input texinfo @c -*-texinfo-*- -@c %**start of header (This is for running Texinfo on a region.) -@setfilename gawk.info -@settitle The GAWK Manual -@c @smallbook -@c %**end of header (This is for running Texinfo on a region.) - -@ifinfo -@synindex fn cp -@synindex vr cp -@end ifinfo -@iftex -@syncodeindex fn cp -@syncodeindex vr cp -@end iftex - -@c If "finalout" is commented out, the printed output will show -@c black boxes that mark lines that are too long. Thus, it is -@c unwise to comment it out when running a master in case there are -@c overfulls which are deemed okay. - -@iftex -@finalout -@end iftex - -@c ===> NOTE! <== -@c Determine the edition number in *four* places by hand: -@c 1. First ifinfo section 2. title page 3. copyright page 4. 
top node -@c To find the locations, search for !!set - -@ifinfo -This file documents @code{awk}, a program that you can use to select -particular records in a file and perform operations upon them. - -This is Edition 0.15 of @cite{The GAWK Manual}, @* -for the 2.15 version of the GNU implementation @* -of AWK. - -Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc. - -Permission is granted to make and distribute verbatim copies of -this manual provided the copyright notice and this permission notice -are preserved on all copies. - -@ignore -Permission is granted to process this file through TeX and print the -results, provided the printed document carries copying permission -notice identical to this one except for the removal of this paragraph -(this paragraph not being relevant to the printed manual). - -@end ignore -Permission is granted to copy and distribute modified versions of this -manual under the conditions for verbatim copying, provided that the entire -resulting derived work is distributed under the terms of a permission -notice identical to this one. - -Permission is granted to copy and distribute translations of this manual -into another language, under the above conditions for modified versions, -except that this permission notice may be stated in a translation approved -by the Foundation. -@end ifinfo - -@setchapternewpage odd - -@c !!set edition, date, version -@titlepage -@title The GAWK Manual -@subtitle Edition 0.15 -@subtitle April 1993 -@author Diane Barlow Close -@author Arnold D. Robbins -@author Paul H. Rubin -@author Richard Stallman - -@c Include the Distribution inside the titlepage environment so -@c that headings are turned off. Headings on and off do not work. - -@page -@vskip 0pt plus 1filll -Copyright @copyright{} 1989, 1991, 1992, 1993 Free Software Foundation, Inc. -@sp 2 - -@c !!set edition, date, version -This is Edition 0.15 of @cite{The GAWK Manual}, @* -for the 2.15 version of the GNU implementation @* -of AWK. - -@sp 2 -Published by the Free Software Foundation @* -675 Massachusetts Avenue @* -Cambridge, MA 02139 USA @* -Printed copies are available for $20 each. - -Permission is granted to make and distribute verbatim copies of -this manual provided the copyright notice and this permission notice -are preserved on all copies. - -Permission is granted to copy and distribute modified versions of this -manual under the conditions for verbatim copying, provided that the entire -resulting derived work is distributed under the terms of a permission -notice identical to this one. - -Permission is granted to copy and distribute translations of this manual -into another language, under the above conditions for modified versions, -except that this permission notice may be stated in a translation approved -by the Foundation. -@end titlepage - -@ifinfo -@node Top, Preface, (dir), (dir) -@comment node-name, next, previous, up -@top General Introduction -@c Preface or Licensing nodes should come right after the Top -@c node, in `unnumbered' sections, then the chapter, `What is gawk'. - -This file documents @code{awk}, a program that you can use to select -particular records in a file and perform operations upon them. - -@c !!set edition, date, version -This is Edition 0.15 of @cite{The GAWK Manual}, @* -for the 2.15 version of the GNU implementation @* -of AWK. - -@end ifinfo - -@menu -* Preface:: What you can do with @code{awk}; brief history - and acknowledgements. -* Copying:: Your right to copy and distribute @code{gawk}. 
-* This Manual:: Using this manual. - Includes sample input files that you can use. -* Getting Started:: A basic introduction to using @code{awk}. - How to run an @code{awk} program. - Command line syntax. -* Reading Files:: How to read files and manipulate fields. -* Printing:: How to print using @code{awk}. Describes the - @code{print} and @code{printf} statements. - Also describes redirection of output. -* One-liners:: Short, sample @code{awk} programs. -* Patterns:: The various types of patterns - explained in detail. -* Actions:: The various types of actions are - introduced here. Describes - expressions and the various operators in - detail. Also describes comparison expressions. -* Expressions:: Expressions are the basic building - blocks of statements. -* Statements:: The various control statements are - described in detail. -* Arrays:: The description and use of arrays. - Also includes array-oriented control - statements. -* Built-in:: The built-in functions are summarized here. -* User-defined:: User-defined functions are described in detail. -* Built-in Variables:: Built-in Variables -* Command Line:: How to run @code{gawk}. -* Language History:: The evolution of the @code{awk} language. -* Installation:: Installing @code{gawk} under - various operating systems. -* Gawk Summary:: @code{gawk} Options and Language Summary. -* Sample Program:: A sample @code{awk} program with a - complete explanation. -* Bugs:: Reporting Problems and Bugs. -* Notes:: Something about the - implementation of @code{gawk}. -* Glossary:: An explanation of some unfamiliar terms. -* Index:: -@end menu - -@node Preface, Copying, Top, Top -@comment node-name, next, previous, up -@unnumbered Preface - -@iftex -@cindex what is @code{awk} -@end iftex -If you are like many computer users, you would frequently like to make -changes in various text files wherever certain patterns appear, or -extract data from parts of certain lines while discarding the rest. To -write a program to do this in a language such as C or Pascal is a -time-consuming inconvenience that may take many lines of code. The job -may be easier with @code{awk}. - -The @code{awk} utility interprets a special-purpose programming language -that makes it possible to handle simple data-reformatting jobs easily -with just a few lines of code. - -The GNU implementation of @code{awk} is called @code{gawk}; it is fully -upward compatible with the System V Release 4 version of -@code{awk}. @code{gawk} is also upward compatible with the @sc{posix} -(draft) specification of the @code{awk} language. This means that all -properly written @code{awk} programs should work with @code{gawk}. -Thus, we usually don't distinguish between @code{gawk} and other @code{awk} -implementations in this manual.@refill - -@cindex uses of @code{awk} -This manual teaches you what @code{awk} does and how you can use -@code{awk} effectively. You should already be familiar with basic -system commands such as @code{ls}. Using @code{awk} you can: @refill - -@itemize @bullet -@item -manage small, personal databases - -@item -generate reports - -@item -validate data -@item -produce indexes, and perform other document preparation tasks - -@item -even experiment with algorithms that can be adapted later to other computer -languages -@end itemize - -@iftex -This manual has the difficult task of being both tutorial and reference. -If you are a novice, feel free to skip over details that seem too complex. 
-You should also ignore the many cross references; they are for the -expert user, and for the on-line Info version of the manual. -@end iftex - -@menu -* History:: The history of @code{gawk} and - @code{awk}. Acknowledgements. -@end menu - -@node History, , Preface, Preface -@comment node-name, next, previous, up -@unnumberedsec History of @code{awk} and @code{gawk} - -@cindex acronym -@cindex history of @code{awk} -The name @code{awk} comes from the initials of its designers: Alfred V. -Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version of -@code{awk} was written in 1977. In 1985 a new version made the programming -language more powerful, introducing user-defined functions, multiple input -streams, and computed regular expressions. -This new version became generally available with System V Release 3.1. -The version in System V Release 4 added some new features and also cleaned -up the behavior in some of the ``dark corners'' of the language. -The specification for @code{awk} in the @sc{posix} Command Language -and Utilities standard further clarified the language based on feedback -from both the @code{gawk} designers, and the original @code{awk} -designers.@refill - -The GNU implementation, @code{gawk}, was written in 1986 by Paul Rubin -and Jay Fenlason, with advice from Richard Stallman. John Woods -contributed parts of the code as well. In 1988 and 1989, David Trueman, with -help from Arnold Robbins, thoroughly reworked @code{gawk} for compatibility -with the newer @code{awk}. Current development (1992) focuses on bug fixes, -performance improvements, and standards compliance. - -We need to thank many people for their assistance in producing this -manual. Jay Fenlason contributed many ideas and sample programs. Richard -Mlynarik and Robert J. Chassell gave helpful comments on early drafts of this -manual. The paper @cite{A Supplemental Document for @code{awk}} by John W. -Pierce of the Chemistry Department at UC San Diego, pinpointed several -issues relevant both to @code{awk} implementation and to this manual, that -would otherwise have escaped us. David Trueman, Pat Rankin, and Michal -Jaegermann also contributed sections of the manual.@refill - -The following people provided many helpful comments on this edition of -the manual: Rick Adams, Michael Brennan, Rich Burridge, Diane Close, -Christopher (``Topher'') Eliot, Michael Lijewski, Pat Rankin, Miriam Robbins, -and Michal Jaegermann. Robert J. Chassell provided much valuable advice on -the use of Texinfo. - -Finally, we would like to thank Brian Kernighan of Bell Labs for invaluable -assistance during the testing and debugging of @code{gawk}, and for -help in clarifying numerous points about the language.@refill - -@node Copying, This Manual, Preface, Top -@unnumbered GNU GENERAL PUBLIC LICENSE -@center Version 2, June 1991 - -@display -Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc. -675 Mass Ave, Cambridge, MA 02139, USA - -Everyone is permitted to copy and distribute verbatim copies -of this license document, but changing it is not allowed. -@end display - -@c fakenode --- for prepinfo -@unnumberedsec Preamble - - The licenses for most software are designed to take away your -freedom to share and change it. By contrast, the GNU General Public -License is intended to guarantee your freedom to share and change free -software---to make sure the software is free for all its users. 
This -General Public License applies to most of the Free Software -Foundation's software and to any other program whose authors commit to -using it. (Some other Free Software Foundation software is covered by -the GNU Library General Public License instead.) You can apply it to -your programs, too. - - When we speak of free software, we are referring to freedom, not -price. Our General Public Licenses are designed to make sure that you -have the freedom to distribute copies of free software (and charge for -this service if you wish), that you receive source code or can get it -if you want it, that you can change the software or use pieces of it -in new free programs; and that you know you can do these things. - - To protect your rights, we need to make restrictions that forbid -anyone to deny you these rights or to ask you to surrender the rights. -These restrictions translate to certain responsibilities for you if you -distribute copies of the software, or if you modify it. - - For example, if you distribute copies of such a program, whether -gratis or for a fee, you must give the recipients all the rights that -you have. You must make sure that they, too, receive or can get the -source code. And you must show them these terms so they know their -rights. - - We protect your rights with two steps: (1) copyright the software, and -(2) offer you this license which gives you legal permission to copy, -distribute and/or modify the software. - - Also, for each author's protection and ours, we want to make certain -that everyone understands that there is no warranty for this free -software. If the software is modified by someone else and passed on, we -want its recipients to know that what they have is not the original, so -that any problems introduced by others will not reflect on the original -authors' reputations. - - Finally, any free program is threatened constantly by software -patents. We wish to avoid the danger that redistributors of a free -program will individually obtain patent licenses, in effect making the -program proprietary. To prevent this, we have made it clear that any -patent must be licensed for everyone's free use or not licensed at all. - - The precise terms and conditions for copying, distribution and -modification follow. - -@iftex -@c fakenode --- for prepinfo -@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION -@end iftex -@ifinfo -@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION -@end ifinfo - -@enumerate -@item -This License applies to any program or other work which contains -a notice placed by the copyright holder saying it may be distributed -under the terms of this General Public License. The ``Program'', below, -refers to any such program or work, and a ``work based on the Program'' -means either the Program or any derivative work under copyright law: -that is to say, a work containing the Program or a portion of it, -either verbatim or with modifications and/or translated into another -language. (Hereinafter, translation is included without limitation in -the term ``modification''.) Each licensee is addressed as ``you''. - -Activities other than copying, distribution and modification are not -covered by this License; they are outside its scope. The act of -running the Program is not restricted, and the output from the Program -is covered only if its contents constitute a work based on the -Program (independent of having been made by running the Program). -Whether that is true depends on what the Program does. 
- -@item -You may copy and distribute verbatim copies of the Program's -source code as you receive it, in any medium, provided that you -conspicuously and appropriately publish on each copy an appropriate -copyright notice and disclaimer of warranty; keep intact all the -notices that refer to this License and to the absence of any warranty; -and give any other recipients of the Program a copy of this License -along with the Program. - -You may charge a fee for the physical act of transferring a copy, and -you may at your option offer warranty protection in exchange for a fee. - -@item -You may modify your copy or copies of the Program or any portion -of it, thus forming a work based on the Program, and copy and -distribute such modifications or work under the terms of Section 1 -above, provided that you also meet all of these conditions: - -@enumerate a -@item -You must cause the modified files to carry prominent notices -stating that you changed the files and the date of any change. - -@item -You must cause any work that you distribute or publish, that in -whole or in part contains or is derived from the Program or any -part thereof, to be licensed as a whole at no charge to all third -parties under the terms of this License. - -@item -If the modified program normally reads commands interactively -when run, you must cause it, when started running for such -interactive use in the most ordinary way, to print or display an -announcement including an appropriate copyright notice and a -notice that there is no warranty (or else, saying that you provide -a warranty) and that users may redistribute the program under -these conditions, and telling the user how to view a copy of this -License. (Exception: if the Program itself is interactive but -does not normally print such an announcement, your work based on -the Program is not required to print an announcement.) -@end enumerate - -These requirements apply to the modified work as a whole. If -identifiable sections of that work are not derived from the Program, -and can be reasonably considered independent and separate works in -themselves, then this License, and its terms, do not apply to those -sections when you distribute them as separate works. But when you -distribute the same sections as part of a whole which is a work based -on the Program, the distribution of the whole must be on the terms of -this License, whose permissions for other licensees extend to the -entire whole, and thus to each and every part regardless of who wrote it. - -Thus, it is not the intent of this section to claim rights or contest -your rights to work written entirely by you; rather, the intent is to -exercise the right to control the distribution of derivative or -collective works based on the Program. - -In addition, mere aggregation of another work not based on the Program -with the Program (or with a work based on the Program) on a volume of -a storage or distribution medium does not bring the other work under -the scope of this License. 
- -@item -You may copy and distribute the Program (or a work based on it, -under Section 2) in object code or executable form under the terms of -Sections 1 and 2 above provided that you also do one of the following: - -@enumerate a -@item -Accompany it with the complete corresponding machine-readable -source code, which must be distributed under the terms of Sections -1 and 2 above on a medium customarily used for software interchange; or, - -@item -Accompany it with a written offer, valid for at least three -years, to give any third party, for a charge no more than your -cost of physically performing source distribution, a complete -machine-readable copy of the corresponding source code, to be -distributed under the terms of Sections 1 and 2 above on a medium -customarily used for software interchange; or, - -@item -Accompany it with the information you received as to the offer -to distribute corresponding source code. (This alternative is -allowed only for noncommercial distribution and only if you -received the program in object code or executable form with such -an offer, in accord with Subsection b above.) -@end enumerate - -The source code for a work means the preferred form of the work for -making modifications to it. For an executable work, complete source -code means all the source code for all modules it contains, plus any -associated interface definition files, plus the scripts used to -control compilation and installation of the executable. However, as a -special exception, the source code distributed need not include -anything that is normally distributed (in either source or binary -form) with the major components (compiler, kernel, and so on) of the -operating system on which the executable runs, unless that component -itself accompanies the executable. - -If distribution of executable or object code is made by offering -access to copy from a designated place, then offering equivalent -access to copy the source code from the same place counts as -distribution of the source code, even though third parties are not -compelled to copy the source along with the object code. - -@item -You may not copy, modify, sublicense, or distribute the Program -except as expressly provided under this License. Any attempt -otherwise to copy, modify, sublicense or distribute the Program is -void, and will automatically terminate your rights under this License. -However, parties who have received copies, or rights, from you under -this License will not have their licenses terminated so long as such -parties remain in full compliance. - -@item -You are not required to accept this License, since you have not -signed it. However, nothing else grants you permission to modify or -distribute the Program or its derivative works. These actions are -prohibited by law if you do not accept this License. Therefore, by -modifying or distributing the Program (or any work based on the -Program), you indicate your acceptance of this License to do so, and -all its terms and conditions for copying, distributing or modifying -the Program or works based on it. - -@item -Each time you redistribute the Program (or any work based on the -Program), the recipient automatically receives a license from the -original licensor to copy, distribute or modify the Program subject to -these terms and conditions. You may not impose any further -restrictions on the recipients' exercise of the rights granted herein. -You are not responsible for enforcing compliance by third parties to -this License. 
- -@item -If, as a consequence of a court judgment or allegation of patent -infringement or for any other reason (not limited to patent issues), -conditions are imposed on you (whether by court order, agreement or -otherwise) that contradict the conditions of this License, they do not -excuse you from the conditions of this License. If you cannot -distribute so as to satisfy simultaneously your obligations under this -License and any other pertinent obligations, then as a consequence you -may not distribute the Program at all. For example, if a patent -license would not permit royalty-free redistribution of the Program by -all those who receive copies directly or indirectly through you, then -the only way you could satisfy both it and this License would be to -refrain entirely from distribution of the Program. - -If any portion of this section is held invalid or unenforceable under -any particular circumstance, the balance of the section is intended to -apply and the section as a whole is intended to apply in other -circumstances. - -It is not the purpose of this section to induce you to infringe any -patents or other property right claims or to contest validity of any -such claims; this section has the sole purpose of protecting the -integrity of the free software distribution system, which is -implemented by public license practices. Many people have made -generous contributions to the wide range of software distributed -through that system in reliance on consistent application of that -system; it is up to the author/donor to decide if he or she is willing -to distribute software through any other system and a licensee cannot -impose that choice. - -This section is intended to make thoroughly clear what is believed to -be a consequence of the rest of this License. - -@item -If the distribution and/or use of the Program is restricted in -certain countries either by patents or by copyrighted interfaces, the -original copyright holder who places the Program under this License -may add an explicit geographical distribution limitation excluding -those countries, so that distribution is permitted only in or among -countries not thus excluded. In such case, this License incorporates -the limitation as if written in the body of this License. - -@item -The Free Software Foundation may publish revised and/or new versions -of the General Public License from time to time. Such new versions will -be similar in spirit to the present version, but may differ in detail to -address new problems or concerns. - -Each version is given a distinguishing version number. If the Program -specifies a version number of this License which applies to it and ``any -later version'', you have the option of following the terms and conditions -either of that version or of any later version published by the Free -Software Foundation. If the Program does not specify a version number of -this License, you may choose any version ever published by the Free Software -Foundation. - -@item -If you wish to incorporate parts of the Program into other free -programs whose distribution conditions are different, write to the author -to ask for permission. For software which is copyrighted by the Free -Software Foundation, write to the Free Software Foundation; we sometimes -make exceptions for this. Our decision will be guided by the two goals -of preserving the free status of all derivatives of our free software and -of promoting the sharing and reuse of software generally. 
- -@iftex -@c fakenode --- for prepinfo -@heading NO WARRANTY -@end iftex -@ifinfo -@center NO WARRANTY -@end ifinfo - -@item -BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY -FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN -OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES -PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED -OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS -TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE -PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, -REPAIR OR CORRECTION. - -@item -IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING -WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR -REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, -INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING -OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED -TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY -YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER -PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE -POSSIBILITY OF SUCH DAMAGES. -@end enumerate - -@iftex -@c fakenode --- for prepinfo -@heading END OF TERMS AND CONDITIONS -@end iftex -@ifinfo -@center END OF TERMS AND CONDITIONS -@end ifinfo - -@page -@c fakenode --- for prepinfo -@unnumberedsec How to Apply These Terms to Your New Programs - - If you develop a new program, and you want it to be of the greatest -possible use to the public, the best way to achieve this is to make it -free software which everyone can redistribute and change under these terms. - - To do so, attach the following notices to the program. It is safest -to attach them to the start of each source file to most effectively -convey the exclusion of warranty; and each file should have at least -the ``copyright'' line and a pointer to where the full notice is found. - -@smallexample -@var{one line to give the program's name and a brief idea of what it does.} -Copyright (C) 19@var{yy} @var{name of author} - -This program is free software; you can redistribute it and/or modify -it under the terms of the GNU General Public License as published by -the Free Software Foundation; either version 2 of the License, or -(at your option) any later version. - -This program is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -GNU General Public License for more details. - -You should have received a copy of the GNU General Public License -along with this program; if not, write to the Free Software -Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. -@end smallexample - -Also add information on how to contact you by electronic and paper mail. - -If the program is interactive, make it output a short notice like this -when it starts in an interactive mode: - -@smallexample -Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author} -Gnomovision comes with ABSOLUTELY NO WARRANTY; for details -type `show w'. -This is free software, and you are welcome to redistribute it -under certain conditions; type `show c' for details. 
-@end smallexample - -The hypothetical commands @samp{show w} and @samp{show c} should show -the appropriate parts of the General Public License. Of course, the -commands you use may be called something other than @samp{show w} and -@samp{show c}; they could even be mouse-clicks or menu items---whatever -suits your program. - -You should also get your employer (if you work as a programmer) or your -school, if any, to sign a ``copyright disclaimer'' for the program, if -necessary. Here is a sample; alter the names: - -@smallexample -Yoyodyne, Inc., hereby disclaims all copyright interest in the program -`Gnomovision' (which makes passes at compilers) written by James Hacker. - -@var{signature of Ty Coon}, 1 April 1989 -Ty Coon, President of Vice -@end smallexample - -This General Public License does not permit incorporating your program into -proprietary programs. If your program is a subroutine library, you may -consider it more useful to permit linking proprietary applications with the -library. If this is what you want to do, use the GNU Library General -Public License instead of this License. - -@node This Manual, Getting Started, Copying, Top -@chapter Using this Manual -@cindex manual, using this -@cindex using this manual -@cindex language, @code{awk} -@cindex program, @code{awk} -@cindex @code{awk} language -@cindex @code{awk} program - -The term @code{awk} refers to a particular program, and to the language you -use to tell this program what to do. When we need to be careful, we call -the program ``the @code{awk} utility'' and the language ``the @code{awk} -language.'' The term @code{gawk} refers to a version of @code{awk} developed -as part the GNU project. The purpose of this manual is to explain -both the -@code{awk} language and how to run the @code{awk} utility.@refill - -While concentrating on the features of @code{gawk}, the manual will also -attempt to describe important differences between @code{gawk} and other -@code{awk} implementations. In particular, any features that are not -in the @sc{posix} standard for @code{awk} will be noted. @refill - -The term @dfn{@code{awk} program} refers to a program written by you in -the @code{awk} programming language.@refill - -@xref{Getting Started, ,Getting Started with @code{awk}}, for the bare -essentials you need to know to start using @code{awk}. - -Some useful ``one-liners'' are included to give you a feel for the -@code{awk} language (@pxref{One-liners, ,Useful ``One-liners''}). - -@ignore -@strong{I deleted four paragraphs here because they would confuse the -beginner more than help him. They mention terms such as ``field,'' -``pattern,'' ``action,'' ``built-in function'' which the beginner -doesn't know.} - -@strong{If you can find a way to introduce several of these concepts here, -enough to give the reader a map of what is to follow, that might -be useful. I'm not sure that can be done without taking up more -space than ought to be used here. There may be no way to win.} - -@strong{ADR: I'd like to tackle this in phase 2 of my editing.} -@end ignore - -A sample @code{awk} program has been provided for you -(@pxref{Sample Program}).@refill - -If you find terms that you aren't familiar with, try looking them -up in the glossary (@pxref{Glossary}).@refill - -The entire @code{awk} language is summarized for quick reference in -@ref{Gawk Summary, ,@code{gawk} Summary}. 
Look there if you just need -to refresh your memory about a particular feature.@refill - -Most of the time complete @code{awk} programs are used as examples, but in -some of the more advanced sections, only the part of the @code{awk} program -that illustrates the concept being described is shown.@refill - -@menu -* Sample Data Files:: Sample data files for use in the @code{awk} - programs illustrated in this manual. -@end menu - -@node Sample Data Files, , This Manual, This Manual -@section Data Files for the Examples - -@cindex input file, sample -@cindex sample input file -@cindex @file{BBS-list} file -Many of the examples in this manual take their input from two sample -data files. The first, called @file{BBS-list}, represents a list of -computer bulletin board systems together with information about those systems. -The second data file, called @file{inventory-shipped}, contains -information about shipments on a monthly basis. Each line of these -files is one @dfn{record}. - -In the file @file{BBS-list}, each record contains the name of a computer -bulletin board, its phone number, the board's baud rate, and a code for -the number of hours it is operational. An @samp{A} in the last column -means the board operates 24 hours a day. A @samp{B} in the last -column means the board operates evening and weekend hours, only. A -@samp{C} means the board operates only on weekends. - -@example -aardvark 555-5553 1200/300 B -alpo-net 555-3412 2400/1200/300 A -barfly 555-7685 1200/300 A -bites 555-1675 2400/1200/300 A -camelot 555-0542 300 C -core 555-2912 1200/300 C -fooey 555-1234 2400/1200/300 B -foot 555-6699 1200/300 B -macfoo 555-6480 1200/300 A -sdace 555-3430 2400/1200/300 A -sabafoo 555-2127 1200/300 C -@end example - -@cindex @file{inventory-shipped} file -The second data file, called @file{inventory-shipped}, represents -information about shipments during the year. -Each record contains the month of the year, the number -of green crates shipped, the number of red boxes shipped, the number of -orange bags shipped, and the number of blue packages shipped, -respectively. There are 16 entries, covering the 12 months of one year -and 4 months of the next year.@refill - -@example -Jan 13 25 15 115 -Feb 15 32 24 226 -Mar 15 24 34 228 -Apr 31 52 63 420 -May 16 34 29 208 -Jun 31 42 75 492 -Jul 24 34 67 436 -Aug 15 34 47 316 -Sep 13 55 37 277 -Oct 29 54 68 525 -Nov 20 87 82 577 -Dec 17 35 61 401 - -Jan 21 36 64 620 -Feb 26 58 80 652 -Mar 24 75 70 495 -Apr 21 70 74 514 -@end example - -@ifinfo -If you are reading this in GNU Emacs using Info, you can copy the regions -of text showing these sample files into your own test files. This way you -can try out the examples shown in the remainder of this document. You do -this by using the command @kbd{M-x write-region} to copy text from the Info -file into a file for use with @code{awk} -(@xref{Misc File Ops, , , emacs, GNU Emacs Manual}, -for more information). Using this information, create your own -@file{BBS-list} and @file{inventory-shipped} files, and practice what you -learn in this manual. -@end ifinfo - -@node Getting Started, Reading Files, This Manual, Top -@chapter Getting Started with @code{awk} -@cindex script, definition of -@cindex rule, definition of -@cindex program, definition of -@cindex basic function of @code{gawk} - -The basic function of @code{awk} is to search files for lines (or other -units of text) that contain certain patterns. When a line matches one -of the patterns, @code{awk} performs specified actions on that line. 
-@code{awk} keeps processing input lines in this way until the end of the -input file is reached.@refill - -When you run @code{awk}, you specify an @code{awk} @dfn{program} which -tells @code{awk} what to do. The program consists of a series of -@dfn{rules}. (It may also contain @dfn{function definitions}, but that -is an advanced feature, so we will ignore it for now. -@xref{User-defined, ,User-defined Functions}.) Each rule specifies one -pattern to search for, and one action to perform when that pattern is found. - -Syntactically, a rule consists of a pattern followed by an action. The -action is enclosed in curly braces to separate it from the pattern. -Rules are usually separated by newlines. Therefore, an @code{awk} -program looks like this: - -@example -@var{pattern} @{ @var{action} @} -@var{pattern} @{ @var{action} @} -@dots{} -@end example - -@menu -* Very Simple:: A very simple example. -* Two Rules:: A less simple one-line example with two rules. -* More Complex:: A more complex example. -* Running gawk:: How to run @code{gawk} programs; - includes command line syntax. -* Comments:: Adding documentation to @code{gawk} programs. -* Statements/Lines:: Subdividing or combining statements into lines. -* When:: When to use @code{gawk} and - when to use other things. -@end menu - -@node Very Simple, Two Rules, Getting Started, Getting Started -@section A Very Simple Example - -@cindex @samp{print $0} -The following command runs a simple @code{awk} program that searches the -input file @file{BBS-list} for the string of characters: @samp{foo}. (A -string of characters is usually called, a @dfn{string}. -The term @dfn{string} is perhaps based on similar usage in English, such -as ``a string of pearls,'' or, ``a string of cars in a train.'') - -@example -awk '/foo/ @{ print $0 @}' BBS-list -@end example - -@noindent -When lines containing @samp{foo} are found, they are printed, because -@w{@samp{print $0}} means print the current line. (Just @samp{print} by -itself means the same thing, so we could have written that -instead.) - -You will notice that slashes, @samp{/}, surround the string @samp{foo} -in the actual @code{awk} program. The slashes indicate that @samp{foo} -is a pattern to search for. This type of pattern is called a -@dfn{regular expression}, and is covered in more detail later -(@pxref{Regexp, ,Regular Expressions as Patterns}). There are -single-quotes around the @code{awk} program so that the shell won't -interpret any of it as special shell characters.@refill - -Here is what this program prints: - -@example -@group -fooey 555-1234 2400/1200/300 B -foot 555-6699 1200/300 B -macfoo 555-6480 1200/300 A -sabafoo 555-2127 1200/300 C -@end group -@end example - -@cindex action, default -@cindex pattern, default -@cindex default action -@cindex default pattern -In an @code{awk} rule, either the pattern or the action can be omitted, -but not both. If the pattern is omitted, then the action is performed -for @emph{every} input line. If the action is omitted, the default -action is to print all lines that match the pattern. - -Thus, we could leave out the action (the @code{print} statement and the curly -braces) in the above example, and the result would be the same: all -lines matching the pattern @samp{foo} would be printed. By comparison, -omitting the @code{print} statement but retaining the curly braces makes an -empty action that does nothing; then no lines would be printed. 
- -@node Two Rules, More Complex, Very Simple, Getting Started -@section An Example with Two Rules -@cindex how @code{awk} works - -The @code{awk} utility reads the input files one line at a -time. For each line, @code{awk} tries the patterns of each of the rules. -If several patterns match then several actions are run, in the order in -which they appear in the @code{awk} program. If no patterns match, then -no actions are run. - -After processing all the rules (perhaps none) that match the line, -@code{awk} reads the next line (however, -@pxref{Next Statement, ,The @code{next} Statement}). This continues -until the end of the file is reached.@refill - -For example, the @code{awk} program: - -@example -/12/ @{ print $0 @} -/21/ @{ print $0 @} -@end example - -@noindent -contains two rules. The first rule has the string @samp{12} as the -pattern and @samp{print $0} as the action. The second rule has the -string @samp{21} as the pattern and also has @samp{print $0} as the -action. Each rule's action is enclosed in its own pair of braces. - -This @code{awk} program prints every line that contains the string -@samp{12} @emph{or} the string @samp{21}. If a line contains both -strings, it is printed twice, once by each rule. - -If we run this program on our two sample data files, @file{BBS-list} and -@file{inventory-shipped}, as shown here: - -@example -awk '/12/ @{ print $0 @} - /21/ @{ print $0 @}' BBS-list inventory-shipped -@end example - -@noindent -we get the following output: - -@example -aardvark 555-5553 1200/300 B -alpo-net 555-3412 2400/1200/300 A -barfly 555-7685 1200/300 A -bites 555-1675 2400/1200/300 A -core 555-2912 1200/300 C -fooey 555-1234 2400/1200/300 B -foot 555-6699 1200/300 B -macfoo 555-6480 1200/300 A -sdace 555-3430 2400/1200/300 A -sabafoo 555-2127 1200/300 C -sabafoo 555-2127 1200/300 C -Jan 21 36 64 620 -Apr 21 70 74 514 -@end example - -@noindent -Note how the line in @file{BBS-list} beginning with @samp{sabafoo} -was printed twice, once for each rule. - -@node More Complex, Running gawk, Two Rules, Getting Started -@comment node-name, next, previous, up -@section A More Complex Example - -Here is an example to give you an idea of what typical @code{awk} -programs do. This example shows how @code{awk} can be used to -summarize, select, and rearrange the output of another utility. It uses -features that haven't been covered yet, so don't worry if you don't -understand all the details. - -@example -ls -l | awk '$5 == "Nov" @{ sum += $4 @} - END @{ print sum @}' -@end example - -This command prints the total number of bytes in all the files in the -current directory that were last modified in November (of any year). -(In the C shell you would need to type a semicolon and then a backslash -at the end of the first line; in a @sc{posix}-compliant shell, such as the -Bourne shell or the Bourne-Again shell, you can type the example as shown.) - -The @w{@samp{ls -l}} part of this example is a command that gives you a -listing of the files in a directory, including file size and date. 
-Its output looks like this:@refill - -@example --rw-r--r-- 1 close 1933 Nov 7 13:05 Makefile --rw-r--r-- 1 close 10809 Nov 7 13:03 gawk.h --rw-r--r-- 1 close 983 Apr 13 12:14 gawk.tab.h --rw-r--r-- 1 close 31869 Jun 15 12:20 gawk.y --rw-r--r-- 1 close 22414 Nov 7 13:03 gawk1.c --rw-r--r-- 1 close 37455 Nov 7 13:03 gawk2.c --rw-r--r-- 1 close 27511 Dec 9 13:07 gawk3.c --rw-r--r-- 1 close 7989 Nov 7 13:03 gawk4.c -@end example - -@noindent -The first field contains read-write permissions, the second field contains -the number of links to the file, and the third field identifies the owner of -the file. The fourth field contains the size of the file in bytes. The -fifth, sixth, and seventh fields contain the month, day, and time, -respectively, that the file was last modified. Finally, the eighth field -contains the name of the file. - -The @code{$5 == "Nov"} in our @code{awk} program is an expression that -tests whether the fifth field of the output from @w{@samp{ls -l}} -matches the string @samp{Nov}. Each time a line has the string -@samp{Nov} in its fifth field, the action @samp{@{ sum += $4 @}} is -performed. This adds the fourth field (the file size) to the variable -@code{sum}. As a result, when @code{awk} has finished reading all the -input lines, @code{sum} is the sum of the sizes of files whose -lines matched the pattern. (This works because @code{awk} variables -are automatically initialized to zero.)@refill - -After the last line of output from @code{ls} has been processed, the -@code{END} rule is executed, and the value of @code{sum} is -printed. In this example, the value of @code{sum} would be 80600.@refill - -These more advanced @code{awk} techniques are covered in later sections -(@pxref{Actions, ,Overview of Actions}). Before you can move on to more -advanced @code{awk} programming, you have to know how @code{awk} interprets -your input and displays your output. By manipulating fields and using -@code{print} statements, you can produce some very useful and spectacular -looking reports.@refill - -@node Running gawk, Comments, More Complex, Getting Started -@section How to Run @code{awk} Programs - -@ignore -Date: Mon, 26 Aug 91 09:48:10 +0200 -From: gatech!vsoc07.cern.ch!matheys (Jean-Pol Matheys (CERN - ECP Division)) -To: uunet.UU.NET!skeeve!arnold -Subject: RE: status check - -The introduction of Chapter 2 (i.e. before 2.1) should include -the whole of section 2.4 - it's better to tell people how to run awk programs -before giving any examples - -ADR --- he's right. but for now, don't do this because the rest of the -chapter would need some rewriting. -@end ignore - -@cindex command line formats -@cindex running @code{awk} programs -There are several ways to run an @code{awk} program. If the program is -short, it is easiest to include it in the command that runs @code{awk}, -like this: - -@example -awk '@var{program}' @var{input-file1} @var{input-file2} @dots{} -@end example - -@noindent -where @var{program} consists of a series of patterns and actions, as -described earlier. - -When the program is long, it is usually more convenient to put it in a file -and run it with a command like this: - -@example -awk -f @var{program-file} @var{input-file1} @var{input-file2} @dots{} -@end example - -@menu -* One-shot:: Running a short throw-away @code{awk} program. -* Read Terminal:: Using no input files (input from - terminal instead). -* Long:: Putting permanent @code{awk} programs in files. -* Executable Scripts:: Making self-contained @code{awk} programs. 
-@end menu - -@node One-shot, Read Terminal, Running gawk, Running gawk -@subsection One-shot Throw-away @code{awk} Programs - -Once you are familiar with @code{awk}, you will often type simple -programs at the moment you want to use them. Then you can write the -program as the first argument of the @code{awk} command, like this: - -@example -awk '@var{program}' @var{input-file1} @var{input-file2} @dots{} -@end example - -@noindent -where @var{program} consists of a series of @var{patterns} and -@var{actions}, as described earlier. - -@cindex single quotes, why needed -This command format instructs the shell to start @code{awk} and use the -@var{program} to process records in the input file(s). There are single -quotes around @var{program} so that the shell doesn't interpret any -@code{awk} characters as special shell characters. They also cause the -shell to treat all of @var{program} as a single argument for -@code{awk} and allow @var{program} to be more than one line long.@refill - -This format is also useful for running short or medium-sized @code{awk} -programs from shell scripts, because it avoids the need for a separate -file for the @code{awk} program. A self-contained shell script is more -reliable since there are no other files to misplace. - -@node Read Terminal, Long, One-shot, Running gawk -@subsection Running @code{awk} without Input Files - -@cindex standard input -@cindex input, standard -You can also run @code{awk} without any input files. If you type the -command line:@refill - -@example -awk '@var{program}' -@end example - -@noindent -then @code{awk} applies the @var{program} to the @dfn{standard input}, -which usually means whatever you type on the terminal. This continues -until you indicate end-of-file by typing @kbd{Control-d}. - -For example, if you execute this command: - -@example -awk '/th/' -@end example - -@noindent -whatever you type next is taken as data for that @code{awk} -program. If you go on to type the following data: - -@example -Kathy -Ben -Tom -Beth -Seth -Karen -Thomas -@kbd{Control-d} -@end example - -@noindent -then @code{awk} prints this output: - -@example -Kathy -Beth -Seth -@end example - -@noindent -@cindex case sensitivity -@cindex pattern, case sensitive -as matching the pattern @samp{th}. Notice that it did not recognize -@samp{Thomas} as matching the pattern. The @code{awk} language is -@dfn{case sensitive}, and matches patterns exactly. (However, you can -override this with the variable @code{IGNORECASE}. -@xref{Case-sensitivity, ,Case-sensitivity in Matching}.) - -@node Long, Executable Scripts, Read Terminal, Running gawk -@subsection Running Long Programs - -@cindex running long programs -@cindex @samp{-f} option -@cindex program file -@cindex file, @code{awk} program -Sometimes your @code{awk} programs can be very long. In this case it is -more convenient to put the program into a separate file. To tell -@code{awk} to use that file for its program, you type:@refill - -@example -awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{} -@end example - -The @samp{-f} instructs the @code{awk} utility to get the @code{awk} program -from the file @var{source-file}. Any file name can be used for -@var{source-file}. For example, you could put the program:@refill - -@example -/th/ -@end example - -@noindent -into the file @file{th-prog}. 
Then this command: - -@example -awk -f th-prog -@end example - -@noindent -does the same thing as this one: - -@example -awk '/th/' -@end example - -@noindent -which was explained earlier (@pxref{Read Terminal, ,Running @code{awk} without Input Files}). -Note that you don't usually need single quotes around the file name that you -specify with @samp{-f}, because most file names don't contain any of the shell's -special characters. Notice that in @file{th-prog}, the @code{awk} -program did not have single quotes around it. The quotes are only needed -for programs that are provided on the @code{awk} command line. - -If you want to identify your @code{awk} program files clearly as such, -you can add the extension @file{.awk} to the file name. This doesn't -affect the execution of the @code{awk} program, but it does make -``housekeeping'' easier. - -@node Executable Scripts, , Long, Running gawk -@c node-name, next, previous, up -@subsection Executable @code{awk} Programs -@cindex executable scripts -@cindex scripts, executable -@cindex self contained programs -@cindex program, self contained -@cindex @samp{#!} - -Once you have learned @code{awk}, you may want to write self-contained -@code{awk} scripts, using the @samp{#!} script mechanism. You can do -this on many Unix systems @footnote{The @samp{#!} mechanism works on -Unix systems derived from Berkeley Unix, System V Release 4, and some System -V Release 3 systems.} (and someday on GNU).@refill - -For example, you could create a text file named @file{hello}, containing -the following (where @samp{BEGIN} is a feature we have not yet -discussed): - -@example -#! /bin/awk -f - -# a sample awk program -BEGIN @{ print "hello, world" @} -@end example - -@noindent -After making this file executable (with the @code{chmod} command), you -can simply type: - -@example -hello -@end example - -@noindent -at the shell, and the system will arrange to run @code{awk} @footnote{The -line beginning with @samp{#!} lists the full pathname of an interpreter -to be run, and an optional initial command line argument to pass to that -interpreter. The operating system then runs the interpreter with the given -argument and the full argument list of the executed program. The first argument -in the list is the full pathname of the @code{awk} program. The rest of the -argument list will either be options to @code{awk}, or data files, -or both.} as if you had typed:@refill - -@example -awk -f hello -@end example - -@noindent -Self-contained @code{awk} scripts are useful when you want to write a -program which users can invoke without knowing that the program is -written in @code{awk}. - -@cindex shell scripts -@cindex scripts, shell -If your system does not support the @samp{#!} mechanism, you can get a -similar effect using a regular shell script. It would look something -like this: - -@example -: The colon makes sure this script is executed by the Bourne shell. -awk '@var{program}' "$@@" -@end example - -Using this technique, it is @emph{vital} to enclose the @var{program} in -single quotes to protect it from interpretation by the shell. If you -omit the quotes, only a shell wizard can predict the results. - -The @samp{"$@@"} causes the shell to forward all the command line -arguments to the @code{awk} program, without interpretation. The first -line, which starts with a colon, is used so that this shell script will -work even if invoked by a user who uses the C shell. -@c Someday: (See @cite{The Bourne Again Shell}, by ??.) 
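-
-As a concrete sketch of this technique, the @samp{/th/} program used
-earlier in this chapter could be packaged as an ordinary shell script
-(made executable with @code{chmod}, just like the @samp{#!} version):
-
-@example
-: The colon makes sure this script is executed by the Bourne shell.
-awk '/th/' "$@@"
-@end example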
- -@node Comments, Statements/Lines, Running gawk, Getting Started -@section Comments in @code{awk} Programs -@cindex @samp{#} -@cindex comments -@cindex use of comments -@cindex documenting @code{awk} programs -@cindex programs, documenting - -A @dfn{comment} is some text that is included in a program for the sake -of human readers, and that is not really part of the program. Comments -can explain what the program does, and how it works. Nearly all -programming languages have provisions for comments, because programs are -typically hard to understand without their extra help. - -In the @code{awk} language, a comment starts with the sharp sign -character, @samp{#}, and continues to the end of the line. The -@code{awk} language ignores the rest of a line following a sharp sign. -For example, we could have put the following into @file{th-prog}:@refill - -@smallexample -# This program finds records containing the pattern @samp{th}. This is how -# you continue comments on additional lines. -/th/ -@end smallexample - -You can put comment lines into keyboard-composed throw-away @code{awk} -programs also, but this usually isn't very useful; the purpose of a -comment is to help you or another person understand the program at -a later time.@refill - -@node Statements/Lines, When, Comments, Getting Started -@section @code{awk} Statements versus Lines - -Most often, each line in an @code{awk} program is a separate statement or -separate rule, like this: - -@example -awk '/12/ @{ print $0 @} - /21/ @{ print $0 @}' BBS-list inventory-shipped -@end example - -But sometimes statements can be more than one line, and lines can -contain several statements. You can split a statement into multiple -lines by inserting a newline after any of the following:@refill - -@example -, @{ ? : || && do else -@end example - -@noindent -A newline at any other point is considered the end of the statement. -(Splitting lines after @samp{?} and @samp{:} is a minor @code{gawk} -extension. The @samp{?} and @samp{:} referred to here is the -three operand conditional expression described in -@ref{Conditional Exp, ,Conditional Expressions}.)@refill - -@cindex backslash continuation -@cindex continuation of lines -If you would like to split a single statement into two lines at a point -where a newline would terminate it, you can @dfn{continue} it by ending the -first line with a backslash character, @samp{\}. This is allowed -absolutely anywhere in the statement, even in the middle of a string or -regular expression. For example: - -@example -awk '/This program is too long, so continue it\ - on the next line/ @{ print $1 @}' -@end example - -@noindent -We have generally not used backslash continuation in the sample programs in -this manual. Since in @code{gawk} there is no limit on the length of a line, -it is never strictly necessary; it just makes programs prettier. We have -preferred to make them even more pretty by keeping the statements short. -Backslash continuation is most useful when your @code{awk} program is in a -separate source file, instead of typed in on the command line. You should -also note that many @code{awk} implementations are more picky about where -you may use backslash continuation. 
For maximal portability of your @code{awk} -programs, it is best not to split your lines in the middle of a regular -expression or a string.@refill - -@strong{Warning: backslash continuation does not work as described above -with the C shell.} Continuation with backslash works for @code{awk} -programs in files, and also for one-shot programs @emph{provided} you -are using a @sc{posix}-compliant shell, such as the Bourne shell or the -Bourne-again shell. But the C shell used on Berkeley Unix behaves -differently! There, you must use two backslashes in a row, followed by -a newline.@refill - -@cindex multiple statements on one line -When @code{awk} statements within one rule are short, you might want to put -more than one of them on a line. You do this by separating the statements -with a semicolon, @samp{;}. -This also applies to the rules themselves. -Thus, the previous program could have been written:@refill - -@example -/12/ @{ print $0 @} ; /21/ @{ print $0 @} -@end example - -@noindent -@strong{Note:} the requirement that rules on the same line must be -separated with a semicolon is a recent change in the @code{awk} -language; it was done for consistency with the treatment of statements -within an action. - -@node When, , Statements/Lines, Getting Started -@section When to Use @code{awk} - -@cindex when to use @code{awk} -@cindex applications of @code{awk} -You might wonder how @code{awk} might be useful for you. Using additional -utility programs, more advanced patterns, field separators, arithmetic -statements, and other selection criteria, you can produce much more -complex output. The @code{awk} language is very useful for producing -reports from large amounts of raw data, such as summarizing information -from the output of other utility programs like @code{ls}. -(@xref{More Complex, ,A More Complex Example}.) - -Programs written with @code{awk} are usually much smaller than they would -be in other languages. This makes @code{awk} programs easy to compose and -use. Often @code{awk} programs can be quickly composed at your terminal, -used once, and thrown away. Since @code{awk} programs are interpreted, you -can avoid the usually lengthy edit-compile-test-debug cycle of software -development. - -Complex programs have been written in @code{awk}, including a complete -retargetable assembler for 8-bit microprocessors (@pxref{Glossary}, for -more information) and a microcode assembler for a special purpose Prolog -computer. However, @code{awk}'s capabilities are strained by tasks of -such complexity. - -If you find yourself writing @code{awk} scripts of more than, say, a few -hundred lines, you might consider using a different programming -language. Emacs Lisp is a good choice if you need sophisticated string -or pattern matching capabilities. The shell is also good at string and -pattern matching; in addition, it allows powerful use of the system -utilities. More conventional languages, such as C, C++, and Lisp, offer -better facilities for system programming and for managing the complexity -of large programs. 
Programs in these languages may require more lines -of source code than the equivalent @code{awk} programs, but they are -easier to maintain and usually run more efficiently.@refill - -@node Reading Files, Printing, Getting Started, Top -@chapter Reading Input Files - -@cindex reading files -@cindex input -@cindex standard input -@vindex FILENAME -In the typical @code{awk} program, all input is read either from the -standard input (by default the keyboard, but often a pipe from another -command) or from files whose names you specify on the @code{awk} command -line. If you specify input files, @code{awk} reads them in order, reading -all the data from one before going on to the next. The name of the current -input file can be found in the built-in variable @code{FILENAME} -(@pxref{Built-in Variables}).@refill - -The input is read in units called records, and processed by the -rules one record at a time. By default, each record is one line. Each -record is split automatically into fields, to make it more -convenient for a rule to work on its parts. - -On rare occasions you will need to use the @code{getline} command, -which can do explicit input from any number of files -(@pxref{Getline, ,Explicit Input with @code{getline}}).@refill - -@menu -* Records:: Controlling how data is split into records. -* Fields:: An introduction to fields. -* Non-Constant Fields:: Non-constant Field Numbers. -* Changing Fields:: Changing the Contents of a Field. -* Field Separators:: The field separator and how to change it. -* Constant Size:: Reading constant width data. -* Multiple Line:: Reading multi-line records. -* Getline:: Reading files under explicit program control - using the @code{getline} function. -* Close Input:: Closing an input file (so you can read from - the beginning once more). -@end menu - -@node Records, Fields, Reading Files, Reading Files -@section How Input is Split into Records - -@cindex record separator -The @code{awk} language divides its input into records and fields. -Records are separated by a character called the @dfn{record separator}. -By default, the record separator is the newline character, defining -a record to be a single line of text.@refill - -@iftex -@cindex changing the record separator -@end iftex -@vindex RS -Sometimes you may want to use a different character to separate your -records. You can use a different character by changing the built-in -variable @code{RS}. The value of @code{RS} is a string that says how -to separate records; the default value is @code{"\n"}, the string containing -just a newline character. This is why records are, by default, single lines. - -@code{RS} can have any string as its value, but only the first character -of the string is used as the record separator. The other characters are -ignored. @code{RS} is exceptional in this regard; @code{awk} uses the -full value of all its other built-in variables.@refill - -@ignore -Someday this should be true! - -The value of @code{RS} is not limited to a one-character string. It can -be any regular expression (@pxref{Regexp, ,Regular Expressions as Patterns}). -In general, each record -ends at the next string that matches the regular expression; the next -record starts at the end of the matching string. 
This general rule is -actually at work in the usual case, where @code{RS} contains just a -newline: a record ends at the beginning of the next matching string (the -next newline in the input) and the following record starts just after -the end of this string (at the first character of the following line). -The newline, since it matches @code{RS}, is not part of either record.@refill -@end ignore - -You can change the value of @code{RS} in the @code{awk} program with the -assignment operator, @samp{=} (@pxref{Assignment Ops, ,Assignment Expressions}). -The new record-separator character should be enclosed in quotation marks to make -a string constant. Often the right time to do this is at the beginning -of execution, before any input has been processed, so that the very -first record will be read with the proper separator. To do this, use -the special @code{BEGIN} pattern -(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}). For -example:@refill - -@example -awk 'BEGIN @{ RS = "/" @} ; @{ print $0 @}' BBS-list -@end example - -@noindent -changes the value of @code{RS} to @code{"/"}, before reading any input. -This is a string whose first character is a slash; as a result, records -are separated by slashes. Then the input file is read, and the second -rule in the @code{awk} program (the action with no pattern) prints each -record. Since each @code{print} statement adds a newline at the end of -its output, the effect of this @code{awk} program is to copy the input -with each slash changed to a newline. - -Another way to change the record separator is on the command line, -using the variable-assignment feature -(@pxref{Command Line, ,Invoking @code{awk}}).@refill - -@example -awk '@{ print $0 @}' RS="/" BBS-list -@end example - -@noindent -This sets @code{RS} to @samp{/} before processing @file{BBS-list}. - -Reaching the end of an input file terminates the current input record, -even if the last character in the file is not the character in @code{RS}. - -@ignore -@c merge the preceding paragraph and this stuff into one paragraph -@c and put it in an `expert info' section. -This produces correct behavior in the vast majority of cases, although -the following (extreme) pipeline prints a surprising @samp{1}. (There -is one field, consisting of a newline.) - -@example -echo | awk 'BEGIN @{ RS = "a" @} ; @{ print NF @}' -@end example - -@end ignore - -The empty string, @code{""} (a string of no characters), has a special meaning -as the value of @code{RS}: it means that records are separated only -by blank lines. @xref{Multiple Line, ,Multiple-Line Records}, for more details. - -@cindex number of records, @code{NR} or @code{FNR} -@vindex NR -@vindex FNR -The @code{awk} utility keeps track of the number of records that have -been read so far from the current input file. This value is stored in a -built-in variable called @code{FNR}. It is reset to zero when a new -file is started. Another built-in variable, @code{NR}, is the total -number of input records read so far from all files. It starts at zero -but is never automatically reset to zero. - -If you change the value of @code{RS} in the middle of an @code{awk} run, -the new value is used to delimit subsequent records, but the record -currently being processed (and records already processed) are not -affected. 
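-
-As a small illustration of the record counters described above, the
-following sketch prints the name of the current input file, the
-per-file record number, and the cumulative record number for every
-record read from the two sample files:
-
-@example
-awk '@{ print FILENAME, FNR, NR @}' BBS-list inventory-shipped
-@end example
-
-@noindent
-When the second file starts, @code{FNR} drops back to one, while
-@code{NR} keeps counting across both files.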
- -@node Fields, Non-Constant Fields, Records, Reading Files -@section Examining Fields - -@cindex examining fields -@cindex fields -@cindex accessing fields -When @code{awk} reads an input record, the record is -automatically separated or @dfn{parsed} by the interpreter into chunks -called @dfn{fields}. By default, fields are separated by whitespace, -like words in a line. -Whitespace in @code{awk} means any string of one or more spaces and/or -tabs; other characters such as newline, formfeed, and so on, that are -considered whitespace by other languages are @emph{not} considered -whitespace by @code{awk}.@refill - -The purpose of fields is to make it more convenient for you to refer to -these pieces of the record. You don't have to use them---you can -operate on the whole record if you wish---but fields are what make -simple @code{awk} programs so powerful. - -@cindex @code{$} (field operator) -@cindex operators, @code{$} -To refer to a field in an @code{awk} program, you use a dollar-sign, -@samp{$}, followed by the number of the field you want. Thus, @code{$1} -refers to the first field, @code{$2} to the second, and so on. For -example, suppose the following is a line of input:@refill - -@example -This seems like a pretty nice example. -@end example - -@noindent -Here the first field, or @code{$1}, is @samp{This}; the second field, or -@code{$2}, is @samp{seems}; and so on. Note that the last field, -@code{$7}, is @samp{example.}. Because there is no space between the -@samp{e} and the @samp{.}, the period is considered part of the seventh -field.@refill - -No matter how many fields there are, the last field in a record can be -represented by @code{$NF}. So, in the example above, @code{$NF} would -be the same as @code{$7}, which is @samp{example.}. Why this works is -explained below (@pxref{Non-Constant Fields, ,Non-constant Field Numbers}). -If you try to refer to a field beyond the last one, such as @code{$8} -when the record has only 7 fields, you get the empty string.@refill - -@vindex NF -@cindex number of fields, @code{NF} -Plain @code{NF}, with no @samp{$}, is a built-in variable whose value -is the number of fields in the current record. - -@code{$0}, which looks like an attempt to refer to the zeroth field, is -a special case: it represents the whole input record. This is what you -would use if you weren't interested in fields. - -Here are some more examples: - -@example -awk '$1 ~ /foo/ @{ print $0 @}' BBS-list -@end example - -@noindent -This example prints each record in the file @file{BBS-list} whose first -field contains the string @samp{foo}. The operator @samp{~} is called a -@dfn{matching operator} (@pxref{Comparison Ops, ,Comparison Expressions}); -it tests whether a string (here, the field @code{$1}) matches a given regular -expression.@refill - -By contrast, the following example: - -@example -awk '/foo/ @{ print $1, $NF @}' BBS-list -@end example - -@noindent -looks for @samp{foo} in @emph{the entire record} and prints the first -field and the last field for each input record containing a -match.@refill - -@node Non-Constant Fields, Changing Fields, Fields, Reading Files -@section Non-constant Field Numbers - -The number of a field does not need to be a constant. Any expression in -the @code{awk} language can be used after a @samp{$} to refer to a -field. The value of the expression specifies the field number. If the -value is a string, rather than a number, it is converted to a number. 
-Consider this example:@refill - -@example -awk '@{ print $NR @}' -@end example - -@noindent -Recall that @code{NR} is the number of records read so far: 1 in the -first record, 2 in the second, etc. So this example prints the first -field of the first record, the second field of the second record, and so -on. For the twentieth record, field number 20 is printed; most likely, -the record has fewer than 20 fields, so this prints a blank line. - -Here is another example of using expressions as field numbers: - -@example -awk '@{ print $(2*2) @}' BBS-list -@end example - -The @code{awk} language must evaluate the expression @code{(2*2)} and use -its value as the number of the field to print. The @samp{*} sign -represents multiplication, so the expression @code{2*2} evaluates to 4. -The parentheses are used so that the multiplication is done before the -@samp{$} operation; they are necessary whenever there is a binary -operator in the field-number expression. This example, then, prints the -hours of operation (the fourth field) for every line of the file -@file{BBS-list}.@refill - -If the field number you compute is zero, you get the entire record. -Thus, @code{$(2-2)} has the same value as @code{$0}. Negative field -numbers are not allowed. - -The number of fields in the current record is stored in the built-in -variable @code{NF} (@pxref{Built-in Variables}). The expression -@code{$NF} is not a special feature: it is the direct consequence of -evaluating @code{NF} and using its value as a field number. - -@node Changing Fields, Field Separators, Non-Constant Fields, Reading Files -@section Changing the Contents of a Field - -@cindex field, changing contents of -@cindex changing contents of a field -@cindex assignment to fields -You can change the contents of a field as seen by @code{awk} within an -@code{awk} program; this changes what @code{awk} perceives as the -current input record. (The actual input is untouched: @code{awk} never -modifies the input file.) - -Consider this example: - -@smallexample -awk '@{ $3 = $2 - 10; print $2, $3 @}' inventory-shipped -@end smallexample - -@noindent -The @samp{-} sign represents subtraction, so this program reassigns -field three, @code{$3}, to be the value of field two minus ten, -@code{$2 - 10}. (@xref{Arithmetic Ops, ,Arithmetic Operators}.) -Then field two, and the new value for field three, are printed. - -In order for this to work, the text in field @code{$2} must make sense -as a number; the string of characters must be converted to a number in -order for the computer to do arithmetic on it. The number resulting -from the subtraction is converted back to a string of characters which -then becomes field three. -@xref{Conversion, ,Conversion of Strings and Numbers}.@refill - -When you change the value of a field (as perceived by @code{awk}), the -text of the input record is recalculated to contain the new field where -the old one was. Therefore, @code{$0} changes to reflect the altered -field. Thus, - -@smallexample -awk '@{ $2 = $2 - 10; print $0 @}' inventory-shipped -@end smallexample - -@noindent -prints a copy of the input file, with 10 subtracted from the second -field of each line. - -You can also assign contents to fields that are out of range. For -example: - -@smallexample -awk '@{ $6 = ($5 + $4 + $3 + $2) ; print $6 @}' inventory-shipped -@end smallexample - -@noindent -We've just created @code{$6}, whose value is the sum of fields -@code{$2}, @code{$3}, @code{$4}, and @code{$5}. The @samp{+} sign -represents addition. 
For the file @file{inventory-shipped}, @code{$6} -represents the total number of parcels shipped for a particular month. - -Creating a new field changes the internal @code{awk} copy of the current -input record---the value of @code{$0}. Thus, if you do @samp{print $0} -after adding a field, the record printed includes the new field, with -the appropriate number of field separators between it and the previously -existing fields. - -This recomputation affects and is affected by several features not yet -discussed, in particular, the @dfn{output field separator}, @code{OFS}, -which is used to separate the fields (@pxref{Output Separators}), and -@code{NF} (the number of fields; @pxref{Fields, ,Examining Fields}). -For example, the value of @code{NF} is set to the number of the highest -field you create.@refill - -Note, however, that merely @emph{referencing} an out-of-range field -does @emph{not} change the value of either @code{$0} or @code{NF}. -Referencing an out-of-range field merely produces a null string. For -example:@refill - -@smallexample -if ($(NF+1) != "") - print "can't happen" -else - print "everything is normal" -@end smallexample - -@noindent -should print @samp{everything is normal}, because @code{NF+1} is certain -to be out of range. (@xref{If Statement, ,The @code{if} Statement}, -for more information about @code{awk}'s @code{if-else} statements.)@refill - -It is important to note that assigning to a field will change the -value of @code{$0}, but will not change the value of @code{NF}, -even when you assign the null string to a field. For example: - -@smallexample -echo a b c d | awk '@{ OFS = ":"; $2 = "" ; print ; print NF @}' -@end smallexample - -@noindent -prints - -@smallexample -a::c:d -4 -@end smallexample - -@noindent -The field is still there, it just has an empty value. You can tell -because there are two colons in a row. - -@node Field Separators, Constant Size, Changing Fields, Reading Files -@section Specifying how Fields are Separated -@vindex FS -@cindex fields, separating -@cindex field separator, @code{FS} -@cindex @samp{-F} option - -(This section is rather long; it describes one of the most fundamental -operations in @code{awk}. If you are a novice with @code{awk}, we -recommend that you re-read this section after you have studied the -section on regular expressions, @ref{Regexp, ,Regular Expressions as Patterns}.) - -The way @code{awk} splits an input record into fields is controlled by -the @dfn{field separator}, which is a single character or a regular -expression. @code{awk} scans the input record for matches for the -separator; the fields themselves are the text between the matches. For -example, if the field separator is @samp{oo}, then the following line: - -@smallexample -moo goo gai pan -@end smallexample - -@noindent -would be split into three fields: @samp{m}, @samp{@ g} and @samp{@ gai@ -pan}. - -The field separator is represented by the built-in variable @code{FS}. -Shell programmers take note! @code{awk} does not use the name @code{IFS} -which is used by the shell.@refill - -You can change the value of @code{FS} in the @code{awk} program with the -assignment operator, @samp{=} (@pxref{Assignment Ops, ,Assignment Expressions}). -Often the right time to do this is at the beginning of execution, -before any input has been processed, so that the very first record -will be read with the proper separator. To do this, use the special -@code{BEGIN} pattern -(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}). 
-For example, here we set the value of @code{FS} to the string -@code{","}:@refill - -@smallexample -awk 'BEGIN @{ FS = "," @} ; @{ print $2 @}' -@end smallexample - -@noindent -Given the input line, - -@smallexample -John Q. Smith, 29 Oak St., Walamazoo, MI 42139 -@end smallexample - -@noindent -this @code{awk} program extracts the string @samp{@ 29 Oak St.}. - -@cindex field separator, choice of -@cindex regular expressions as field separators -Sometimes your input data will contain separator characters that don't -separate fields the way you thought they would. For instance, the -person's name in the example we've been using might have a title or -suffix attached, such as @samp{John Q. Smith, LXIX}. From input -containing such a name: - -@smallexample -John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139 -@end smallexample - -@noindent -the previous sample program would extract @samp{@ LXIX}, instead of -@samp{@ 29 Oak St.}. If you were expecting the program to print the -address, you would be surprised. So choose your data layout and -separator characters carefully to prevent such problems. - -As you know, by default, fields are separated by whitespace sequences -(spaces and tabs), not by single spaces: two spaces in a row do not -delimit an empty field. The default value of the field separator is a -string @w{@code{" "}} containing a single space. If this value were -interpreted in the usual way, each space character would separate -fields, so two spaces in a row would make an empty field between them. -The reason this does not happen is that a single space as the value of -@code{FS} is a special case: it is taken to specify the default manner -of delimiting fields. - -If @code{FS} is any other single character, such as @code{","}, then -each occurrence of that character separates two fields. Two consecutive -occurrences delimit an empty field. If the character occurs at the -beginning or the end of the line, that too delimits an empty field. The -space character is the only single character which does not follow these -rules. - -More generally, the value of @code{FS} may be a string containing any -regular expression. Then each match in the record for the regular -expression separates fields. For example, the assignment:@refill - -@smallexample -FS = ", \t" -@end smallexample - -@noindent -makes every area of an input line that consists of a comma followed by a -space and a tab, into a field separator. (@samp{\t} stands for a -tab.)@refill - -For a less trivial example of a regular expression, suppose you want -single spaces to separate fields the way single commas were used above. -You can set @code{FS} to @w{@code{"[@ ]"}}. This regular expression -matches a single space and nothing else. - -@c the following index entry is an overfull hbox. --mew 30jan1992 -@cindex field separator: on command line -@cindex command line, setting @code{FS} on -@code{FS} can be set on the command line. You use the @samp{-F} argument to -do so. For example: - -@smallexample -awk -F, '@var{program}' @var{input-files} -@end smallexample - -@noindent -sets @code{FS} to be the @samp{,} character. Notice that the argument uses -a capital @samp{F}. Contrast this with @samp{-f}, which specifies a file -containing an @code{awk} program. Case is significant in command options: -the @samp{-F} and @samp{-f} options have nothing to do with each other. 
-You can use both options at the same time to set the @code{FS} argument -@emph{and} get an @code{awk} program from a file.@refill - -@c begin expert info -The value used for the argument to @samp{-F} is processed in exactly the -same way as assignments to the built-in variable @code{FS}. This means that -if the field separator contains special characters, they must be escaped -appropriately. For example, to use a @samp{\} as the field separator, you -would have to type: - -@smallexample -# same as FS = "\\" -awk -F\\\\ '@dots{}' files @dots{} -@end smallexample - -@noindent -Since @samp{\} is used for quoting in the shell, @code{awk} will see -@samp{-F\\}. Then @code{awk} processes the @samp{\\} for escape -characters (@pxref{Constants, ,Constant Expressions}), finally yielding -a single @samp{\} to be used for the field separator. -@c end expert info - -As a special case, in compatibility mode -(@pxref{Command Line, ,Invoking @code{awk}}), if the -argument to @samp{-F} is @samp{t}, then @code{FS} is set to the tab -character. (This is because if you type @samp{-F\t}, without the quotes, -at the shell, the @samp{\} gets deleted, so @code{awk} figures that you -really want your fields to be separated with tabs, and not @samp{t}s. -Use @samp{-v FS="t"} on the command line if you really do want to separate -your fields with @samp{t}s.)@refill - -For example, let's use an @code{awk} program file called @file{baud.awk} -that contains the pattern @code{/300/}, and the action @samp{print $1}. -Here is the program: - -@smallexample -/300/ @{ print $1 @} -@end smallexample - -Let's also set @code{FS} to be the @samp{-} character, and run the -program on the file @file{BBS-list}. The following command prints a -list of the names of the bulletin boards that operate at 300 baud and -the first three digits of their phone numbers:@refill - -@smallexample -awk -F- -f baud.awk BBS-list -@end smallexample - -@noindent -It produces this output: - -@smallexample -aardvark 555 -alpo -barfly 555 -bites 555 -camelot 555 -core 555 -fooey 555 -foot 555 -macfoo 555 -sdace 555 -sabafoo 555 -@end smallexample - -@noindent -Note the second line of output. If you check the original file, you will -see that the second line looked like this: - -@smallexample -alpo-net 555-3412 2400/1200/300 A -@end smallexample - -The @samp{-} as part of the system's name was used as the field -separator, instead of the @samp{-} in the phone number that was -originally intended. This demonstrates why you have to be careful in -choosing your field and record separators. - -The following program searches the system password file, and prints -the entries for users who have no password: - -@smallexample -awk -F: '$2 == ""' /etc/passwd -@end smallexample - -@noindent -Here we use the @samp{-F} option on the command line to set the field -separator. Note that fields in @file{/etc/passwd} are separated by -colons. The second field represents a user's encrypted password, but if -the field is empty, that user has no password. - -@c begin expert info -According to the @sc{posix} standard, @code{awk} is supposed to behave -as if each record is split into fields at the time that it is read. -In particular, this means that you can change the value of @code{FS} -after a record is read, but before any of the fields are referenced. -The value of the fields (i.e. how they were split) should reflect the -old value of @code{FS}, not the new one. - -However, many implementations of @code{awk} do not do this. 
Instead, -they defer splitting the fields until a field reference actually happens, -using the @emph{current} value of @code{FS}! This behavior can be difficult -to diagnose. The following example illustrates the results of the two methods. -(The @code{sed} command prints just the first line of @file{/etc/passwd}.) - -@smallexample -sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}' -@end smallexample - -@noindent -will usually print - -@smallexample -root -@end smallexample - -@noindent -on an incorrect implementation of @code{awk}, while @code{gawk} -will print something like - -@smallexample -root:nSijPlPhZZwgE:0:0:Root:/: -@end smallexample -@c end expert info - -@c begin expert info -There is an important difference between the two cases of @samp{FS = @w{" "}} -(a single blank) and @samp{FS = @w{"[ \t]+"}} (which is a regular expression -matching one or more blanks or tabs). For both values of @code{FS}, fields -are separated by runs of blanks and/or tabs. However, when the value of -@code{FS} is @code{" "}, @code{awk} will strip leading and trailing whitespace -from the record, and then decide where the fields are. - -For example, the following expression prints @samp{b}: - -@smallexample -echo ' a b c d ' | awk '@{ print $2 @}' -@end smallexample - -@noindent -However, the following prints @samp{a}: - -@smallexample -echo ' a b c d ' | awk 'BEGIN @{ FS = "[ \t]+" @} ; @{ print $2 @}' -@end smallexample - -@noindent -In this case, the first field is null. - -The stripping of leading and trailing whitespace also comes into -play whenever @code{$0} is recomputed. For instance, this pipeline - -@smallexample -echo ' a b c d' | awk '@{ print; $2 = $2; print @}' -@end smallexample - -@noindent -produces this output: - -@smallexample - a b c d -a b c d -@end smallexample - -@noindent -The first @code{print} statement prints the record as it was read, -with leading whitespace intact. The assignment to @code{$2} rebuilds -@code{$0} by concatenating @code{$1} through @code{$NF} together, -separated by the value of @code{OFS}. Since the leading whitespace -was ignored when finding @code{$1}, it is not part of the new @code{$0}. -Finally, the last @code{print} statement prints the new @code{$0}. -@c end expert info - -The following table summarizes how fields are split, based on the -value of @code{FS}. - -@table @code -@item FS == " " -Fields are separated by runs of whitespace. Leading and trailing -whitespace are ignored. This is the default. - -@item FS == @var{any single character} -Fields are separated by each occurrence of the character. Multiple -successive occurrences delimit empty fields, as do leading and -trailing occurrences. - -@item FS == @var{regexp} -Fields are separated by occurrences of characters that match @var{regexp}. -Leading and trailing matches of @var{regexp} delimit empty fields. -@end table - -@node Constant Size, Multiple Line, Field Separators, Reading Files -@section Reading Fixed-width Data - -(This section discusses an advanced, experimental feature. If you are -a novice @code{awk} user, you may wish to skip it on the first reading.) - -@code{gawk} 2.13 introduced a new facility for dealing with fixed-width fields -with no distinctive field separator. Data of this nature arises typically -in one of at least two ways: the input for old FORTRAN programs where -numbers are run together, and the output of programs that did not anticipate -the use of their output as input for other programs. 
- -An example of the latter is a table where all the columns are lined up by -the use of a variable number of spaces and @emph{empty fields are just -spaces}. Clearly, @code{awk}'s normal field splitting based on @code{FS} -will not work well in this case. (Although a portable @code{awk} program -can use a series of @code{substr} calls on @code{$0}, this is awkward and -inefficient for a large number of fields.)@refill - -The splitting of an input record into fixed-width fields is specified by -assigning a string containing space-separated numbers to the built-in -variable @code{FIELDWIDTHS}. Each number specifies the width of the field -@emph{including} columns between fields. If you want to ignore the columns -between fields, you can specify the width as a separate field that is -subsequently ignored. - -The following data is the output of the @code{w} utility. It is useful -to illustrate the use of @code{FIELDWIDTHS}. - -@smallexample - 10:06pm up 21 days, 14:04, 23 users -User tty login@ idle JCPU PCPU what -hzuo ttyV0 8:58pm 9 5 vi p24.tex -hzang ttyV3 6:37pm 50 -csh -eklye ttyV5 9:53pm 7 1 em thes.tex -dportein ttyV6 8:17pm 1:47 -csh -gierd ttyD3 10:00pm 1 elm -dave ttyD4 9:47pm 4 4 w -brent ttyp0 26Jun91 4:46 26:46 4:41 bash -dave ttyq4 26Jun9115days 46 46 wnewmail -@end smallexample - -The following program takes the above input, converts the idle time to -number of seconds and prints out the first two fields and the calculated -idle time. (This program uses a number of @code{awk} features that -haven't been introduced yet.)@refill - -@smallexample -BEGIN @{ FIELDWIDTHS = "9 6 10 6 7 7 35" @} -NR > 2 @{ - idle = $4 - sub(/^ */, "", idle) # strip leading spaces - if (idle == "") idle = 0 - if (idle ~ /:/) @{ split(idle, t, ":"); idle = t[1] * 60 + t[2] @} - if (idle ~ /days/) @{ idle *= 24 * 60 * 60 @} - - print $1, $2, idle -@} -@end smallexample - -Here is the result of running the program on the data: - -@smallexample -hzuo ttyV0 0 -hzang ttyV3 50 -eklye ttyV5 0 -dportein ttyV6 107 -gierd ttyD3 1 -dave ttyD4 0 -brent ttyp0 286 -dave ttyq4 1296000 -@end smallexample - -Another (possibly more practical) example of fixed-width input data -would be the input from a deck of balloting cards. In some parts of -the United States, voters make their choices by punching holes in computer -cards. These cards are then processed to count the votes for any particular -candidate or on any particular issue. Since a voter may choose not to -vote on some issue, any column on the card may be empty. An @code{awk} -program for processing such data could use the @code{FIELDWIDTHS} feature -to simplify reading the data.@refill - -@c of course, getting gawk to run on a system with card readers is -@c another story! - -This feature is still experimental, and will likely evolve over time. - -@node Multiple Line, Getline, Constant Size, Reading Files -@section Multiple-Line Records - -@cindex multiple line records -@cindex input, multiple line records -@cindex reading files, multiple line records -@cindex records, multiple line -In some data bases, a single line cannot conveniently hold all the -information in one entry. In such cases, you can use multi-line -records. - -The first step in doing this is to choose your data format: when records -are not defined as single lines, how do you want to define them? -What should separate records? - -One technique is to use an unusual character or string to separate -records. 
For example, you could use the formfeed character (written -@code{\f} in @code{awk}, as in C) to separate them, making each record -a page of the file. To do this, just set the variable @code{RS} to -@code{"\f"} (a string containing the formfeed character). Any -other character could equally well be used, as long as it won't be part -of the data in a record.@refill - -@ignore -Another technique is to have blank lines separate records. The string -@code{"^\n+"} is a regular expression that matches any sequence of -newlines starting at the beginning of a line---in other words, it -matches a sequence of blank lines. If you set @code{RS} to this string, -a record always ends at the first blank line encountered. In -addition, a regular expression always matches the longest possible -sequence when there is a choice. So the next record doesn't start until -the first nonblank line that follows---no matter how many blank lines -appear in a row, they are considered one record-separator. -@end ignore - -Another technique is to have blank lines separate records. By a special -dispensation, a null string as the value of @code{RS} indicates that -records are separated by one or more blank lines. If you set @code{RS} -to the null string, a record always ends at the first blank line -encountered. And the next record doesn't start until the first nonblank -line that follows---no matter how many blank lines appear in a row, they -are considered one record-separator. (End of file is also considered -a record separator.)@refill -@c !!! This use of `end of file' is confusing. Needs to be clarified. - -The second step is to separate the fields in the record. One way to do -this is to put each field on a separate line: to do this, just set the -variable @code{FS} to the string @code{"\n"}. (This simple regular -expression matches a single newline.) - -Another way to separate fields is to divide each of the lines into fields -in the normal manner. This happens by default as a result of a special -feature: when @code{RS} is set to the null string, the newline character -@emph{always} acts as a field separator. This is in addition to whatever -field separations result from @code{FS}. - -The original motivation for this special exception was probably so that -you get useful behavior in the default case (i.e., @w{@code{FS == " "}}). -This feature can be a problem if you really don't want the -newline character to separate fields, since there is no way to -prevent it. 
However, you can work around this by using the @code{split} -function to break up the record manually -(@pxref{String Functions, ,Built-in Functions for String Manipulation}).@refill - -@ignore -Here are two ways to use records separated by blank lines and break each -line into fields normally: - -@example -awk 'BEGIN @{ RS = ""; FS = "[ \t\n]+" @} @{ print $1 @}' BBS-list - -@exdent @r{or} - -awk 'BEGIN @{ RS = "^\n+"; FS = "[ \t\n]+" @} @{ print $1 @}' BBS-list -@end example -@end ignore - -@ignore -Here is how to use records separated by blank lines and break each -line into fields normally: - -@example -awk 'BEGIN @{ RS = ""; FS = "[ \t\n]+" @} ; @{ print $1 @}' BBS-list -@end example -@end ignore - -@node Getline, Close Input, Multiple Line, Reading Files -@section Explicit Input with @code{getline} - -@findex getline -@cindex input, explicit -@cindex explicit input -@cindex input, @code{getline} command -@cindex reading files, @code{getline} command -So far we have been getting our input files from @code{awk}'s main -input stream---either the standard input (usually your terminal) or the -files specified on the command line. The @code{awk} language has a -special built-in command called @code{getline} that -can be used to read input under your explicit control.@refill - -This command is quite complex and should @emph{not} be used by -beginners. It is covered here because this is the chapter on input. -The examples that follow the explanation of the @code{getline} command -include material that has not been covered yet. Therefore, come back -and study the @code{getline} command @emph{after} you have reviewed the -rest of this manual and have a good knowledge of how @code{awk} works. - -@vindex ERRNO -@cindex differences: @code{gawk} and @code{awk} -@code{getline} returns 1 if it finds a record, and 0 if the end of the -file is encountered. If there is some error in getting a record, such -as a file that cannot be opened, then @code{getline} returns @minus{}1. -In this case, @code{gawk} sets the variable @code{ERRNO} to a string -describing the error that occurred. - -In the following examples, @var{command} stands for a string value that -represents a shell command. - -@table @code -@item getline -The @code{getline} command can be used without arguments to read input -from the current input file. All it does in this case is read the next -input record and split it up into fields. This is useful if you've -finished processing the current record, but you want to do some special -processing @emph{right now} on the next record. Here's an -example:@refill - -@example -awk '@{ - if (t = index($0, "/*")) @{ - if (t > 1) - tmp = substr($0, 1, t - 1) - else - tmp = "" - u = index(substr($0, t + 2), "*/") - while (u == 0) @{ - getline - t = -1 - u = index($0, "*/") - @} - if (u <= length($0) - 2) - $0 = tmp substr($0, t + u + 3) - else - $0 = tmp - @} - print $0 -@}' -@end example - -This @code{awk} program deletes all C-style comments, @samp{/* @dots{} -*/}, from the input. By replacing the @samp{print $0} with other -statements, you could perform more complicated processing on the -decommented input, like searching for matches of a regular -expression. (This program has a subtle problem---can you spot it?) - -@c the program to remove comments doesn't work if one -@c comment ends and another begins on the same line. (Your -@c idea for restart would be useful here). 
--- brennan@boeing.com - -This form of the @code{getline} command sets @code{NF} (the number of -fields; @pxref{Fields, ,Examining Fields}), @code{NR} (the number of -records read so far; @pxref{Records, ,How Input is Split into Records}), -@code{FNR} (the number of records read from this input file), and the -value of @code{$0}. - -@strong{Note:} the new value of @code{$0} is used in testing -the patterns of any subsequent rules. The original value -of @code{$0} that triggered the rule which executed @code{getline} -is lost. By contrast, the @code{next} statement reads a new record -but immediately begins processing it normally, starting with the first -rule in the program. @xref{Next Statement, ,The @code{next} Statement}. - -@item getline @var{var} -This form of @code{getline} reads a record into the variable @var{var}. -This is useful when you want your program to read the next record from -the current input file, but you don't want to subject the record to the -normal input processing. - -For example, suppose the next line is a comment, or a special string, -and you want to read it, but you must make certain that it won't trigger -any rules. This version of @code{getline} allows you to read that line -and store it in a variable so that the main -read-a-line-and-check-each-rule loop of @code{awk} never sees it. - -The following example swaps every two lines of input. For example, given: - -@example -wan -tew -free -phore -@end example - -@noindent -it outputs: - -@example -tew -wan -phore -free -@end example - -@noindent -Here's the program: - -@example -@group -awk '@{ - if ((getline tmp) > 0) @{ - print tmp - print $0 - @} else - print $0 -@}' -@end group -@end example - -The @code{getline} function used in this way sets only the variables -@code{NR} and @code{FNR} (and of course, @var{var}). The record is not -split into fields, so the values of the fields (including @code{$0}) and -the value of @code{NF} do not change.@refill - -@item getline < @var{file} -@cindex input redirection -@cindex redirection of input -This form of the @code{getline} function takes its input from the file -@var{file}. Here @var{file} is a string-valued expression that -specifies the file name. @samp{< @var{file}} is called a @dfn{redirection} -since it directs input to come from a different place. - -This form is useful if you want to read your input from a particular -file, instead of from the main input stream. For example, the following -program reads its input record from the file @file{foo.input} when it -encounters a first field with a value equal to 10 in the current input -file.@refill - -@example -awk '@{ - if ($1 == 10) @{ - getline < "foo.input" - print - @} else - print -@}' -@end example - -Since the main input stream is not used, the values of @code{NR} and -@code{FNR} are not changed. But the record read is split into fields in -the normal manner, so the values of @code{$0} and other fields are -changed. So is the value of @code{NF}. - -This does not cause the record to be tested against all the patterns -in the @code{awk} program, in the way that would happen if the record -were read normally by the main processing loop of @code{awk}. However -the new record is tested against any subsequent rules, just as when -@code{getline} is used without a redirection. - -@item getline @var{var} < @var{file} -This form of the @code{getline} function takes its input from the file -@var{file} and puts it in the variable @var{var}. 
As above, @var{file} -is a string-valued expression that specifies the file from which to read. - -In this version of @code{getline}, none of the built-in variables are -changed, and the record is not split into fields. The only variable -changed is @var{var}. - -For example, the following program copies all the input files to the -output, except for records that say @w{@samp{@@include @var{filename}}}. -Such a record is replaced by the contents of the file -@var{filename}.@refill - -@example -awk '@{ - if (NF == 2 && $1 == "@@include") @{ - while ((getline line < $2) > 0) - print line - close($2) - @} else - print -@}' -@end example - -Note here how the name of the extra input file is not built into -the program; it is taken from the data, from the second field on -the @samp{@@include} line.@refill - -The @code{close} function is called to ensure that if two identical -@samp{@@include} lines appear in the input, the entire specified file is -included twice. @xref{Close Input, ,Closing Input Files and Pipes}.@refill - -One deficiency of this program is that it does not process nested -@samp{@@include} statements the way a true macro preprocessor would. - -@item @var{command} | getline -You can @dfn{pipe} the output of a command into @code{getline}. A pipe is -simply a way to link the output of one program to the input of another. In -this case, the string @var{command} is run as a shell command and its output -is piped into @code{awk} to be used as input. This form of @code{getline} -reads one record from the pipe. - -For example, the following program copies input to output, except for lines -that begin with @samp{@@execute}, which are replaced by the output produced by -running the rest of the line as a shell command: - -@example -awk '@{ - if ($1 == "@@execute") @{ - tmp = substr($0, 10) - while ((tmp | getline) > 0) - print - close(tmp) - @} else - print -@}' -@end example - -@noindent -The @code{close} function is called to ensure that if two identical -@samp{@@execute} lines appear in the input, the command is run for -each one. @xref{Close Input, ,Closing Input Files and Pipes}. - -Given the input: - -@example -foo -bar -baz -@@execute who -bletch -@end example - -@noindent -the program might produce: - -@example -foo -bar -baz -hack ttyv0 Jul 13 14:22 -hack ttyp0 Jul 13 14:23 (gnu:0) -hack ttyp1 Jul 13 14:23 (gnu:0) -hack ttyp2 Jul 13 14:23 (gnu:0) -hack ttyp3 Jul 13 14:23 (gnu:0) -bletch -@end example - -@noindent -Notice that this program ran the command @code{who} and printed the result. -(If you try this program yourself, you will get different results, showing -you who is logged in on your system.) - -This variation of @code{getline} splits the record into fields, sets the -value of @code{NF} and recomputes the value of @code{$0}. The values of -@code{NR} and @code{FNR} are not changed. - -@item @var{command} | getline @var{var} -The output of the command @var{command} is sent through a pipe to -@code{getline} and into the variable @var{var}. For example, the -following program reads the current date and time into the variable -@code{current_time}, using the @code{date} utility, and then -prints it.@refill - -@example -awk 'BEGIN @{ - "date" | getline current_time - close("date") - print "Report printed on " current_time -@}' -@end example - -In this version of @code{getline}, none of the built-in variables are -changed, and the record is not split into fields. 
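-
-Because only @var{var} is changed, this form is convenient when you want to
-consult a command's output from inside a rule without disturbing @code{$0},
-the fields, or @code{NR}. The following sketch is not one of the original
-examples; it assumes the standard @code{who} utility is available, and it
-simply counts the lines that @code{who} produces while leaving the current
-record untouched:
-
-@example
-@group
-awk '@{
-    n = 0
-    while (("who" | getline line) > 0)
-        n++                  # each line of output is read into `line'
-    close("who")             # so `who' is rerun for the next record
-    print $0 " [" n " users logged in]"
-@}'
-@end group
-@end example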
-@end table - -@node Close Input, , Getline, Reading Files -@section Closing Input Files and Pipes -@cindex closing input files and pipes -@findex close - -If the same file name or the same shell command is used with -@code{getline} more than once during the execution of an @code{awk} -program, the file is opened (or the command is executed) only the first time. -At that time, the first record of input is read from that file or command. -The next time the same file or command is used in @code{getline}, another -record is read from it, and so on. - -This implies that if you want to start reading the same file again from -the beginning, or if you want to rerun a shell command (rather than -reading more output from the command), you must take special steps. -What you must do is use the @code{close} function, as follows: - -@example -close(@var{filename}) -@end example - -@noindent -or - -@example -close(@var{command}) -@end example - -The argument @var{filename} or @var{command} can be any expression. Its -value must exactly equal the string that was used to open the file or -start the command---for example, if you open a pipe with this: - -@example -"sort -r names" | getline foo -@end example - -@noindent -then you must close it with this: - -@example -close("sort -r names") -@end example - -Once this function call is executed, the next @code{getline} from that -file or command will reopen the file or rerun the command. - -@iftex -@vindex ERRNO -@cindex differences: @code{gawk} and @code{awk} -@end iftex -@code{close} returns a value of zero if the close succeeded. -Otherwise, the value will be non-zero. -In this case, @code{gawk} sets the variable @code{ERRNO} to a string -describing the error that occurred. - -@node Printing, One-liners, Reading Files, Top -@chapter Printing Output - -@cindex printing -@cindex output -One of the most common things that actions do is to output or @dfn{print} -some or all of the input. For simple output, use the @code{print} -statement. For fancier formatting use the @code{printf} statement. -Both are described in this chapter. - -@menu -* Print:: The @code{print} statement. -* Print Examples:: Simple examples of @code{print} statements. -* Output Separators:: The output separators and how to change them. -* OFMT:: Controlling Numeric Output With @code{print}. -* Printf:: The @code{printf} statement. -* Redirection:: How to redirect output to multiple - files and pipes. -* Special Files:: File name interpretation in @code{gawk}. - @code{gawk} allows access to - inherited file descriptors. -@end menu - -@node Print, Print Examples, Printing, Printing -@section The @code{print} Statement -@cindex @code{print} statement - -The @code{print} statement does output with simple, standardized -formatting. You specify only the strings or numbers to be printed, in a -list separated by commas. They are output, separated by single spaces, -followed by a newline. The statement looks like this: - -@example -print @var{item1}, @var{item2}, @dots{} -@end example - -@noindent -The entire list of items may optionally be enclosed in parentheses. The -parentheses are necessary if any of the item expressions uses a -relational operator; otherwise it could be confused with a redirection -(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}). 
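-
-For instance (a small sketch that is not part of the original text), the
-statement @samp{print (1 > 2)} prints @samp{0}, the value of the comparison,
-while @samp{print 1 > 2} is treated as a redirection that sends @samp{1} to
-a file named @file{2}:
-
-@example
-awk 'BEGIN @{ print (1 > 2) @}'    # prints 0
-awk 'BEGIN @{ print 1 > 2 @}'      # writes 1 to a file named `2'
-@end example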
-The relational operators are @samp{==}, -@samp{!=}, @samp{<}, @samp{>}, @samp{>=}, @samp{<=}, @samp{~} and -@samp{!~} (@pxref{Comparison Ops, ,Comparison Expressions}).@refill - -The items printed can be constant strings or numbers, fields of the -current record (such as @code{$1}), variables, or any @code{awk} -expressions. The @code{print} statement is completely general for -computing @emph{what} values to print. With two exceptions, -you cannot specify @emph{how} to print them---how many -columns, whether to use exponential notation or not, and so on. -(@xref{Output Separators}, and -@ref{OFMT, ,Controlling Numeric Output with @code{print}}.) -For that, you need the @code{printf} statement -(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).@refill - -The simple statement @samp{print} with no items is equivalent to -@samp{print $0}: it prints the entire current record. To print a blank -line, use @samp{print ""}, where @code{""} is the null, or empty, -string. - -To print a fixed piece of text, use a string constant such as -@w{@code{"Hello there"}} as one item. If you forget to use the -double-quote characters, your text will be taken as an @code{awk} -expression, and you will probably get an error. Keep in mind that a -space is printed between any two items. - -Most often, each @code{print} statement makes one line of output. But it -isn't limited to one line. If an item value is a string that contains a -newline, the newline is output along with the rest of the string. A -single @code{print} can make any number of lines this way. - -@node Print Examples, Output Separators, Print, Printing -@section Examples of @code{print} Statements - -Here is an example of printing a string that contains embedded newlines: - -@example -awk 'BEGIN @{ print "line one\nline two\nline three" @}' -@end example - -@noindent -produces output like this: - -@example -line one -line two -line three -@end example - -Here is an example that prints the first two fields of each input record, -with a space between them: - -@example -awk '@{ print $1, $2 @}' inventory-shipped -@end example - -@noindent -Its output looks like this: - -@example -Jan 13 -Feb 15 -Mar 15 -@dots{} -@end example - -A common mistake in using the @code{print} statement is to omit the comma -between two items. This often has the effect of making the items run -together in the output, with no space. The reason for this is that -juxtaposing two string expressions in @code{awk} means to concatenate -them. For example, without the comma: - -@example -awk '@{ print $1 $2 @}' inventory-shipped -@end example - -@noindent -prints: - -@example -@group -Jan13 -Feb15 -Mar15 -@dots{} -@end group -@end example - -Neither example's output makes much sense to someone unfamiliar with the -file @file{inventory-shipped}. A heading line at the beginning would make -it clearer. Let's add some headings to our table of months (@code{$1}) and -green crates shipped (@code{$2}). We do this using the @code{BEGIN} pattern -(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}) to force the headings to be printed only once: - -@example -awk 'BEGIN @{ print "Month Crates" - print "----- ------" @} - @{ print $1, $2 @}' inventory-shipped -@end example - -@noindent -Did you already guess what happens? This program prints the following: - -@example -@group -Month Crates ------ ------ -Jan 13 -Feb 15 -Mar 15 -@dots{} -@end group -@end example - -@noindent -The headings and the table data don't line up! 
We can fix this by printing -some spaces between the two fields: - -@example -awk 'BEGIN @{ print "Month Crates" - print "----- ------" @} - @{ print $1, " ", $2 @}' inventory-shipped -@end example - -You can imagine that this way of lining up columns can get pretty -complicated when you have many columns to fix. Counting spaces for two -or three columns can be simple, but more than this and you can get -``lost'' quite easily. This is why the @code{printf} statement was -created (@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}); -one of its specialties is lining up columns of data.@refill - -@node Output Separators, OFMT, Print Examples, Printing -@section Output Separators - -@cindex output field separator, @code{OFS} -@vindex OFS -@vindex ORS -@cindex output record separator, @code{ORS} -As mentioned previously, a @code{print} statement contains a list -of items, separated by commas. In the output, the items are normally -separated by single spaces. But they do not have to be spaces; a -single space is only the default. You can specify any string of -characters to use as the @dfn{output field separator} by setting the -built-in variable @code{OFS}. The initial value of this variable -is the string @w{@code{" "}}, that is, just a single space.@refill - -The output from an entire @code{print} statement is called an -@dfn{output record}. Each @code{print} statement outputs one output -record and then outputs a string called the @dfn{output record separator}. -The built-in variable @code{ORS} specifies this string. The initial -value of the variable is the string @code{"\n"} containing a newline -character; thus, normally each @code{print} statement makes a separate line. - -You can change how output fields and records are separated by assigning -new values to the variables @code{OFS} and/or @code{ORS}. The usual -place to do this is in the @code{BEGIN} rule -(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}), so -that it happens before any input is processed. You may also do this -with assignments on the command line, before the names of your input -files.@refill - -The following example prints the first and second fields of each input -record separated by a semicolon, with a blank line added after each -line:@refill - -@example -@group -awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @} - @{ print $1, $2 @}' BBS-list -@end group -@end example - -If the value of @code{ORS} does not contain a newline, all your output -will be run together on a single line, unless you output newlines some -other way. - -@node OFMT, Printf, Output Separators, Printing -@section Controlling Numeric Output with @code{print} -@vindex OFMT -When you use the @code{print} statement to print numeric values, -@code{awk} internally converts the number to a string of characters, -and prints that string. @code{awk} uses the @code{sprintf} function -to do this conversion. For now, it suffices to say that the @code{sprintf} -function accepts a @dfn{format specification} that tells it how to format -numbers (or strings), and that there are a number of different ways that -numbers can be formatted. The different format specifications are discussed -more fully in -@ref{Printf, ,Using @code{printf} Statements for Fancier Printing}.@refill - -The built-in variable @code{OFMT} contains the default format specification -that @code{print} uses with @code{sprintf} when it wants to convert a -number to a string for printing. 
By supplying different format specifications -as the value of @code{OFMT}, you can change how @code{print} will print -your numbers. As a brief example: - -@example -@group -awk 'BEGIN @{ OFMT = "%d" # print numbers as integers - print 17.23 @}' -@end group -@end example - -@noindent -will print @samp{17}. - -@node Printf, Redirection, OFMT, Printing -@section Using @code{printf} Statements for Fancier Printing -@cindex formatted output -@cindex output, formatted - -If you want more precise control over the output format than -@code{print} gives you, use @code{printf}. With @code{printf} you can -specify the width to use for each item, and you can specify various -stylistic choices for numbers (such as what radix to use, whether to -print an exponent, whether to print a sign, and how many digits to print -after the decimal point). You do this by specifying a string, called -the @dfn{format string}, which controls how and where to print the other -arguments. - -@menu -* Basic Printf:: Syntax of the @code{printf} statement. -* Control Letters:: Format-control letters. -* Format Modifiers:: Format-specification modifiers. -* Printf Examples:: Several examples. -@end menu - -@node Basic Printf, Control Letters, Printf, Printf -@subsection Introduction to the @code{printf} Statement - -@cindex @code{printf} statement, syntax of -The @code{printf} statement looks like this:@refill - -@example -printf @var{format}, @var{item1}, @var{item2}, @dots{} -@end example - -@noindent -The entire list of arguments may optionally be enclosed in parentheses. The -parentheses are necessary if any of the item expressions uses a -relational operator; otherwise it could be confused with a redirection -(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}). -The relational operators are @samp{==}, -@samp{!=}, @samp{<}, @samp{>}, @samp{>=}, @samp{<=}, @samp{~} and -@samp{!~} (@pxref{Comparison Ops, ,Comparison Expressions}).@refill - -@cindex format string -The difference between @code{printf} and @code{print} is the argument -@var{format}. This is an expression whose value is taken as a string; it -specifies how to output each of the other arguments. It is called -the @dfn{format string}. - -The format string is the same as in the @sc{ansi} C library function -@code{printf}. Most of @var{format} is text to be output verbatim. -Scattered among this text are @dfn{format specifiers}, one per item. -Each format specifier says to output the next item at that place in the -format.@refill - -The @code{printf} statement does not automatically append a newline to its -output. It outputs only what the format specifies. So if you want -a newline, you must include one in the format. The output separator -variables @code{OFS} and @code{ORS} have no effect on @code{printf} -statements.@refill - -@node Control Letters, Format Modifiers, Basic Printf, Printf -@subsection Format-Control Letters -@cindex @code{printf}, format-control characters -@cindex format specifier - -A format specifier starts with the character @samp{%} and ends with a -@dfn{format-control letter}; it tells the @code{printf} statement how -to output one item. (If you actually want to output a @samp{%}, write -@samp{%%}.) The format-control letter specifies what kind of value to -print. The rest of the format specifier is made up of optional -@dfn{modifiers} which are parameters such as the field width to use.@refill - -Here is a list of the format-control letters: - -@table @samp -@item c -This prints a number as an ASCII character. 
Thus, @samp{printf "%c", -65} outputs the letter @samp{A}. The output for a string value is -the first character of the string. - -@item d -This prints a decimal integer. - -@item i -This also prints a decimal integer. - -@item e -This prints a number in scientific (exponential) notation. -For example, - -@example -printf "%4.3e", 1950 -@end example - -@noindent -prints @samp{1.950e+03}, with a total of four significant figures of -which three follow the decimal point. The @samp{4.3} are @dfn{modifiers}, -discussed below. - -@item f -This prints a number in floating point notation. - -@item g -This prints a number in either scientific notation or floating point -notation, whichever uses fewer characters. -@ignore -From: gatech!ames!elroy!cit-vax!EQL.Caltech.Edu!rankin (Pat Rankin) - -In the description of printf formats (p.43), the information for %g -is incorrect (mainly, it's too much of an oversimplification). It's -wrong in the AWK book too, and in the gawk man page. I suggested to -David Trueman before 2.13 was released that the latter be revised, so -that it matched gawk's behavior (rather than trying to change gawk to -match the docs ;-). The documented description is nice and simple, but -it doesn't match the actual underlying behavior of %g in the various C -run-time libraries that gawk relies on. The precision value for g format -is different than for f and e formats, so it's inaccurate to say 'g' is -the shorter of 'e' or 'f'. For 'g', precision represents the number of -significant digits rather than the number of decimal places, and it has -special rules about how to format numbers with range between 10E-1 and -10E-4. All in all, it's pretty messy, and I had to add that clumsy -GFMT_WORKAROUND code because the VMS run-time library doesn't conform to -the ANSI-C specifications. -@end ignore - -@item o -This prints an unsigned octal integer. - -@item s -This prints a string. - -@item x -This prints an unsigned hexadecimal integer. - -@item X -This prints an unsigned hexadecimal integer. However, for the values 10 -through 15, it uses the letters @samp{A} through @samp{F} instead of -@samp{a} through @samp{f}. - -@item % -This isn't really a format-control letter, but it does have a meaning -when used after a @samp{%}: the sequence @samp{%%} outputs one -@samp{%}. It does not consume an argument. -@end table - -@node Format Modifiers, Printf Examples, Control Letters, Printf -@subsection Modifiers for @code{printf} Formats - -@cindex @code{printf}, modifiers -@cindex modifiers (in format specifiers) -A format specification can also include @dfn{modifiers} that can control -how much of the item's value is printed and how much space it gets. The -modifiers come between the @samp{%} and the format-control letter. Here -are the possible modifiers, in the order in which they may appear: - -@table @samp -@item - -The minus sign, used before the width modifier, says to left-justify -the argument within its specified width. Normally the argument -is printed right-justified in the specified width. Thus, - -@example -printf "%-4s", "foo" -@end example - -@noindent -prints @samp{foo }. - -@item @var{width} -This is a number representing the desired width of a field. Inserting any -number between the @samp{%} sign and the format control character forces the -field to be expanded to this width. The default way to do this is to -pad with spaces on the left. For example, - -@example -printf "%4s", "foo" -@end example - -@noindent -prints @samp{ foo}. 
- -The value of @var{width} is a minimum width, not a maximum. If the item -value requires more than @var{width} characters, it can be as wide as -necessary. Thus, - -@example -printf "%4s", "foobar" -@end example - -@noindent -prints @samp{foobar}. - -Preceding the @var{width} with a minus sign causes the output to be -padded with spaces on the right, instead of on the left. - -@item .@var{prec} -This is a number that specifies the precision to use when printing. -This specifies the number of digits you want printed to the right of the -decimal point. For a string, it specifies the maximum number of -characters from the string that should be printed. -@end table - -The C library @code{printf}'s dynamic @var{width} and @var{prec} -capability (for example, @code{"%*.*s"}) is supported. Instead of -supplying explicit @var{width} and/or @var{prec} values in the format -string, you pass them in the argument list. For example:@refill - -@example -w = 5 -p = 3 -s = "abcdefg" -printf "<%*.*s>\n", w, p, s -@end example - -@noindent -is exactly equivalent to - -@example -s = "abcdefg" -printf "<%5.3s>\n", s -@end example - -@noindent -Both programs output @samp{@w{<@bullet{}@bullet{}abc>}}. (We have -used the bullet symbol ``@bullet{}'' to represent a space, to clearly -show you that there are two spaces in the output.)@refill - -Earlier versions of @code{awk} did not support this capability. You may -simulate it by using concatenation to build up the format string, -like so:@refill - -@example -w = 5 -p = 3 -s = "abcdefg" -printf "<%" w "." p "s>\n", s -@end example - -@noindent -This is not particularly easy to read, however. - -@node Printf Examples, , Format Modifiers, Printf -@subsection Examples of Using @code{printf} - -Here is how to use @code{printf} to make an aligned table: - -@example -awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list -@end example - -@noindent -prints the names of bulletin boards (@code{$1}) of the file -@file{BBS-list} as a string of 10 characters, left justified. It also -prints the phone numbers (@code{$2}) afterward on the line. This -produces an aligned two-column table of names and phone numbers:@refill - -@example -@group -aardvark 555-5553 -alpo-net 555-3412 -barfly 555-7685 -bites 555-1675 -camelot 555-0542 -core 555-2912 -fooey 555-1234 -foot 555-6699 -macfoo 555-6480 -sdace 555-3430 -sabafoo 555-2127 -@end group -@end example - -Did you notice that we did not specify that the phone numbers be printed -as numbers? They had to be printed as strings because the numbers are -separated by a dash. This dash would be interpreted as a minus sign if -we had tried to print the phone numbers as numbers. This would have led -to some pretty confusing results. - -We did not specify a width for the phone numbers because they are the -last things on their lines. We don't need to put spaces after them. - -We could make our table look even nicer by adding headings to the tops -of the columns. To do this, use the @code{BEGIN} pattern -(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}) -to force the header to be printed only once, at the beginning of -the @code{awk} program:@refill - -@example -@group -awk 'BEGIN @{ print "Name Number" - print "---- ------" @} - @{ printf "%-10s %s\n", $1, $2 @}' BBS-list -@end group -@end example - -Did you notice that we mixed @code{print} and @code{printf} statements in -the above example? 
We could have used just @code{printf} statements to get -the same results: - -@example -@group -awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number" - printf "%-10s %s\n", "----", "------" @} - @{ printf "%-10s %s\n", $1, $2 @}' BBS-list -@end group -@end example - -@noindent -By outputting each column heading with the same format specification -used for the elements of the column, we have made sure that the headings -are aligned just like the columns. - -The fact that the same format specification is used three times can be -emphasized by storing it in a variable, like this: - -@example -awk 'BEGIN @{ format = "%-10s %s\n" - printf format, "Name", "Number" - printf format, "----", "------" @} - @{ printf format, $1, $2 @}' BBS-list -@end example - -See if you can use the @code{printf} statement to line up the headings and -table data for our @file{inventory-shipped} example covered earlier in the -section on the @code{print} statement -(@pxref{Print, ,The @code{print} Statement}).@refill - -@node Redirection, Special Files, Printf, Printing -@section Redirecting Output of @code{print} and @code{printf} - -@cindex output redirection -@cindex redirection of output -So far we have been dealing only with output that prints to the standard -output, usually your terminal. Both @code{print} and @code{printf} can -also send their output to other places. -This is called @dfn{redirection}.@refill - -A redirection appears after the @code{print} or @code{printf} statement. -Redirections in @code{awk} are written just like redirections in shell -commands, except that they are written inside the @code{awk} program. - -@menu -* File/Pipe Redirection:: Redirecting Output to Files and Pipes. -* Close Output:: How to close output files and pipes. -@end menu - -@node File/Pipe Redirection, Close Output, Redirection, Redirection -@subsection Redirecting Output to Files and Pipes - -Here are the three forms of output redirection. They are all shown for -the @code{print} statement, but they work identically for @code{printf} -also.@refill - -@table @code -@item print @var{items} > @var{output-file} -This type of redirection prints the items onto the output file -@var{output-file}. The file name @var{output-file} can be any -expression. Its value is changed to a string and then used as a -file name (@pxref{Expressions, ,Expressions as Action Statements}).@refill - -When this type of redirection is used, the @var{output-file} is erased -before the first output is written to it. Subsequent writes do not -erase @var{output-file}, but append to it. If @var{output-file} does -not exist, then it is created.@refill - -For example, here is how one @code{awk} program can write a list of -BBS names to a file @file{name-list} and a list of phone numbers to a -file @file{phone-list}. Each output file contains one name or number -per line. - -@smallexample -awk '@{ print $2 > "phone-list" - print $1 > "name-list" @}' BBS-list -@end smallexample - -@item print @var{items} >> @var{output-file} -This type of redirection prints the items onto the output file -@var{output-file}. The difference between this and the -single-@samp{>} redirection is that the old contents (if any) of -@var{output-file} are not erased. Instead, the @code{awk} output is -appended to the file. - -@cindex pipes for output -@cindex output, piping -@item print @var{items} | @var{command} -It is also possible to send output through a @dfn{pipe} instead of into a -file. 
This type of redirection opens a pipe to @var{command} and writes -the values of @var{items} through this pipe, to another process created -to execute @var{command}.@refill - -The redirection argument @var{command} is actually an @code{awk} -expression. Its value is converted to a string, whose contents give the -shell command to be run. - -For example, this produces two files, one unsorted list of BBS names -and one list sorted in reverse alphabetical order: - -@smallexample -awk '@{ print $1 > "names.unsorted" - print $1 | "sort -r > names.sorted" @}' BBS-list -@end smallexample - -Here the unsorted list is written with an ordinary redirection while -the sorted list is written by piping through the @code{sort} utility. - -Here is an example that uses redirection to mail a message to a mailing -list @samp{bug-system}. This might be useful when trouble is encountered -in an @code{awk} script run periodically for system maintenance. - -@smallexample -report = "mail bug-system" -print "Awk script failed:", $0 | report -print "at record number", FNR, "of", FILENAME | report -close(report) -@end smallexample - -We call the @code{close} function here because it's a good idea to close -the pipe as soon as all the intended output has been sent to it. -@xref{Close Output, ,Closing Output Files and Pipes}, for more information -on this. This example also illustrates the use of a variable to represent -a @var{file} or @var{command}: it is not necessary to always -use a string constant. Using a variable is generally a good idea, -since @code{awk} requires you to spell the string value identically -every time. -@end table - -Redirecting output using @samp{>}, @samp{>>}, or @samp{|} asks the system -to open a file or pipe only if the particular @var{file} or @var{command} -you've specified has not already been written to by your program, or if -it has been closed since it was last written to.@refill - -@node Close Output, , File/Pipe Redirection, Redirection -@subsection Closing Output Files and Pipes -@cindex closing output files and pipes -@findex close - -When a file or pipe is opened, the file name or command associated with -it is remembered by @code{awk} and subsequent writes to the same file or -command are appended to the previous writes. The file or pipe stays -open until @code{awk} exits. This is usually convenient. - -Sometimes there is a reason to close an output file or pipe earlier -than that. To do this, use the @code{close} function, as follows: - -@example -close(@var{filename}) -@end example - -@noindent -or - -@example -close(@var{command}) -@end example - -The argument @var{filename} or @var{command} can be any expression. -Its value must exactly equal the string used to open the file or pipe -to begin with---for example, if you open a pipe with this: - -@example -print $1 | "sort -r > names.sorted" -@end example - -@noindent -then you must close it with this: - -@example -close("sort -r > names.sorted") -@end example - -Here are some reasons why you might need to close an output file: - -@itemize @bullet -@item -To write a file and read it back later on in the same @code{awk} -program. Close the file when you are finished writing it; then -you can start reading it with @code{getline} -(@pxref{Getline, ,Explicit Input with @code{getline}}).@refill - -@item -To write numerous files, successively, in the same @code{awk} -program. If you don't close the files, eventually you may exceed a -system limit on the number of open files in one process. 
So close -each one when you are finished writing it. - -@item -To make a command finish. When you redirect output through a pipe, -the command reading the pipe normally continues to try to read input -as long as the pipe is open. Often this means the command cannot -really do its work until the pipe is closed. For example, if you -redirect output to the @code{mail} program, the message is not -actually sent until the pipe is closed. - -@item -To run the same program a second time, with the same arguments. -This is not the same thing as giving more input to the first run! - -For example, suppose you pipe output to the @code{mail} program. If you -output several lines redirected to this pipe without closing it, they make -a single message of several lines. By contrast, if you close the pipe -after each line of output, then each line makes a separate message. -@end itemize - -@iftex -@vindex ERRNO -@cindex differences: @code{gawk} and @code{awk} -@end iftex -@code{close} returns a value of zero if the close succeeded. -Otherwise, the value will be non-zero. -In this case, @code{gawk} sets the variable @code{ERRNO} to a string -describing the error that occurred. - -@node Special Files, , Redirection, Printing -@section Standard I/O Streams -@cindex standard input -@cindex standard output -@cindex standard error output -@cindex file descriptors - -Running programs conventionally have three input and output streams -already available to them for reading and writing. These are known as -the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error -output}. These streams are, by default, terminal input and output, but -they are often redirected with the shell, via the @samp{<}, @samp{<<}, -@samp{>}, @samp{>>}, @samp{>&} and @samp{|} operators. Standard error -is used only for writing error messages; the reason we have two separate -streams, standard output and standard error, is so that they can be -redirected separately. - -@iftex -@cindex differences: @code{gawk} and @code{awk} -@end iftex -In other implementations of @code{awk}, the only way to write an error -message to standard error in an @code{awk} program is as follows: - -@smallexample -print "Serious error detected!\n" | "cat 1>&2" -@end smallexample - -@noindent -This works by opening a pipeline to a shell command which can access the -standard error stream which it inherits from the @code{awk} process. -This is far from elegant, and is also inefficient, since it requires a -separate process. So people writing @code{awk} programs have often -neglected to do this. Instead, they have sent the error messages to the -terminal, like this: - -@smallexample -@group -NF != 4 @{ - printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/tty" -@} -@end group -@end smallexample - -@noindent -This has the same effect most of the time, but not always: although the -standard error stream is usually the terminal, it can be redirected, and -when that happens, writing to the terminal is not correct. In fact, if -@code{awk} is run from a background job, it may not have a terminal at all. -Then opening @file{/dev/tty} will fail. - -@code{gawk} provides special file names for accessing the three standard -streams. When you redirect input or output in @code{gawk}, if the file name -matches one of these special names, then @code{gawk} directly uses the -stream it stands for. 
- -@cindex @file{/dev/stdin} -@cindex @file{/dev/stdout} -@cindex @file{/dev/stderr} -@cindex @file{/dev/fd/} -@table @file -@item /dev/stdin -The standard input (file descriptor 0). - -@item /dev/stdout -The standard output (file descriptor 1). - -@item /dev/stderr -The standard error output (file descriptor 2). - -@item /dev/fd/@var{N} -The file associated with file descriptor @var{N}. Such a file must have -been opened by the program initiating the @code{awk} execution (typically -the shell). Unless you take special pains, only descriptors 0, 1 and 2 -are available. -@end table - -The file names @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr} -are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2}, -respectively, but they are more self-explanatory. - -The proper way to write an error message in a @code{gawk} program -is to use @file{/dev/stderr}, like this: - -@smallexample -NF != 4 @{ - printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr" -@} -@end smallexample - -@code{gawk} also provides special file names that give access to information -about the running @code{gawk} process. Each of these ``files'' provides -a single record of information. To read them more than once, you must -first close them with the @code{close} function -(@pxref{Close Input, ,Closing Input Files and Pipes}). -The filenames are: - -@cindex @file{/dev/pid} -@cindex @file{/dev/pgrpid} -@cindex @file{/dev/ppid} -@cindex @file{/dev/user} -@table @file -@item /dev/pid -Reading this file returns the process ID of the current process, -in decimal, terminated with a newline. - -@item /dev/ppid -Reading this file returns the parent process ID of the current process, -in decimal, terminated with a newline. - -@item /dev/pgrpid -Reading this file returns the process group ID of the current process, -in decimal, terminated with a newline. - -@item /dev/user -Reading this file returns a single record terminated with a newline. -The fields are separated with blanks. The fields represent the -following information: - -@table @code -@item $1 -The value of the @code{getuid} system call. - -@item $2 -The value of the @code{geteuid} system call. - -@item $3 -The value of the @code{getgid} system call. - -@item $4 -The value of the @code{getegid} system call. -@end table - -If there are any additional fields, they are the group IDs returned by -@code{getgroups} system call. -(Multiple groups may not be supported on all systems.)@refill -@end table - -These special file names may be used on the command line as data -files, as well as for I/O redirections within an @code{awk} program. -They may not be used as source files with the @samp{-f} option. - -Recognition of these special file names is disabled if @code{gawk} is in -compatibility mode (@pxref{Command Line, ,Invoking @code{awk}}). - -@quotation -@strong{Caution}: Unless your system actually has a @file{/dev/fd} directory -(or any of the other above listed special files), -the interpretation of these file names is done by @code{gawk} itself. -For example, using @samp{/dev/fd/4} for output will actually write on -file descriptor 4, and not on a new file descriptor that was @code{dup}'ed -from file descriptor 4. Most of the time this does not matter; however, it -is important to @emph{not} close any of the files related to file descriptors -0, 1, and 2. If you do close one of these files, unpredictable behavior -will result. 
-
-@end quotation
-
-@node One-liners, Patterns, Printing, Top
-@chapter Useful ``One-liners''
-
-@cindex one-liners
-Useful @code{awk} programs are often short, just a line or two. Here is a
-collection of useful, short programs to get you started. Some of these
-programs contain constructs that haven't been covered yet. The description
-of the program will give you a good idea of what is going on, but please
-read the rest of the manual to become an @code{awk} expert!
-
-@c Per suggestions from Michal Jaegermann
-@ifinfo
-Since you are reading this in Info, each line of the example code is
-enclosed in quotes, to represent text that you would type literally.
-The examples themselves represent shell commands that use single quotes
-to keep the shell from interpreting the contents of the program.
-When reading the examples, focus on the text between the open and close
-quotes.
-@end ifinfo
-
-@table @code
-@item awk '@{ if (NF > max) max = NF @}
-@itemx @ @ @ @ @ END @{ print max @}'
-This program prints the maximum number of fields on any input line.
-
-@item awk 'length($0) > 80'
-This program prints every line longer than 80 characters. The sole
-rule has a relational expression as its pattern, and has no action (so the
-default action, printing the record, is used).
-
-@item awk 'NF > 0'
-This program prints every line that has at least one field. This is an
-easy way to delete blank lines from a file (or rather, to create a new
-file similar to the old file but from which the blank lines have been
-deleted).
-
-@item awk '@{ if (NF > 0) print @}'
-This program also prints every line that has at least one field. Here we
-allow the rule to match every line, then decide in the action whether
-to print.
-
-@item awk@ 'BEGIN@ @{@ for (i = 1; i <= 7; i++)
-@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ print int(101 * rand()) @}'
-This program prints 7 random numbers from 0 to 100, inclusive.
-
-@item ls -l @var{files} | awk '@{ x += $4 @} ; END @{ print "total bytes: " x @}'
-This program prints the total number of bytes used by @var{files}.
-
-@item expand@ @var{file}@ |@ awk@ '@{ if (x < length()) x = length() @}
-@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ END @{ print "maximum line length is " x @}'
-This program prints the maximum line length of @var{file}. The input
-is piped through the @code{expand} program to change tabs into spaces,
-so the widths compared are actually the right-margin columns.
-
-@item awk 'BEGIN @{ FS = ":" @}
-@itemx @ @ @ @ @ @{ print $1 | "sort" @}' /etc/passwd
-This program prints a sorted list of the login names of all users.
-
-@item awk '@{ nlines++ @}
-@itemx @ @ @ @ @ END@ @{ print nlines @}'
-This program counts lines in a file.
-
-@item awk 'END @{ print NR @}'
-This program also counts lines in a file, but lets @code{awk} do the work.
-
-@item awk '@{ print NR, $0 @}'
-This program adds line numbers to all its input files,
-similar to @samp{cat -n}.
-@end table
-
-@node Patterns, Actions, One-liners, Top
-@chapter Patterns
-@cindex pattern, definition of
-
-Patterns in @code{awk} control the execution of rules: a rule is
-executed when its pattern matches the current input record. This
-chapter tells all about how to write patterns.
-
-@menu
-* Kinds of Patterns:: A list of all kinds of patterns.
- The following subsections describe
- them in detail.
-* Regexp:: Regular expressions such as @samp{/foo/}.
-* Comparison Patterns:: Comparison expressions such as @code{$1 > 10}.
-* Boolean Patterns:: Combining comparison expressions.
-* Expression Patterns:: Any expression can be used as a pattern. -* Ranges:: Pairs of patterns specify record ranges. -* BEGIN/END:: Specifying initialization and cleanup rules. -* Empty:: The empty pattern, which matches every record. -@end menu - -@node Kinds of Patterns, Regexp, Patterns, Patterns -@section Kinds of Patterns -@cindex patterns, types of - -Here is a summary of the types of patterns supported in @code{awk}. -@c At the next rewrite, check to see that this order matches the -@c order in the text. It might not matter to a reader, but it's good -@c style. Also, it might be nice to mention all the topics of sections -@c that follow in this list; that way people can scan and know when to -@c expect a specific topic. Specifically please also make an entry -@c for Boolean operators as patterns in the right place. --mew - -@table @code -@item /@var{regular expression}/ -A regular expression as a pattern. It matches when the text of the -input record fits the regular expression. -(@xref{Regexp, ,Regular Expressions as Patterns}.)@refill - -@item @var{expression} -A single expression. It matches when its value, converted to a number, -is nonzero (if a number) or nonnull (if a string). -(@xref{Expression Patterns, ,Expressions as Patterns}.)@refill - -@item @var{pat1}, @var{pat2} -A pair of patterns separated by a comma, specifying a range of records. -(@xref{Ranges, ,Specifying Record Ranges with Patterns}.) - -@item BEGIN -@itemx END -Special patterns to supply start-up or clean-up information to -@code{awk}. (@xref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}.) - -@item @var{null} -The empty pattern matches every input record. -(@xref{Empty, ,The Empty Pattern}.)@refill -@end table - - -@node Regexp, Comparison Patterns, Kinds of Patterns, Patterns -@section Regular Expressions as Patterns -@cindex pattern, regular expressions -@cindex regexp -@cindex regular expressions as patterns - -A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a -class of strings. A regular expression enclosed in slashes (@samp{/}) -is an @code{awk} pattern that matches every input record whose text -belongs to that class. - -The simplest regular expression is a sequence of letters, numbers, or -both. Such a regexp matches any string that contains that sequence. -Thus, the regexp @samp{foo} matches any string containing @samp{foo}. -Therefore, the pattern @code{/foo/} matches any input record containing -@samp{foo}. Other kinds of regexps let you specify more complicated -classes of strings. - -@menu -* Regexp Usage:: How to Use Regular Expressions -* Regexp Operators:: Regular Expression Operators -* Case-sensitivity:: How to do case-insensitive matching. -@end menu - -@node Regexp Usage, Regexp Operators, Regexp, Regexp -@subsection How to Use Regular Expressions - -A regular expression can be used as a pattern by enclosing it in -slashes. Then the regular expression is matched against the -entire text of each record. (Normally, it only needs -to match some part of the text in order to succeed.) For example, this -prints the second field of each record that contains @samp{foo} anywhere: - -@example -awk '/foo/ @{ print $2 @}' BBS-list -@end example - -@cindex regular expression matching operators -@cindex string-matching operators -@cindex operators, string-matching -@cindex operators, regexp matching -@cindex regexp search operators -Regular expressions can also be used in comparison expressions. 
Then -you can specify the string to match against; it need not be the entire -current input record. These comparison expressions can be used as -patterns or in @code{if}, @code{while}, @code{for}, and @code{do} statements. - -@table @code -@item @var{exp} ~ /@var{regexp}/ -This is true if the expression @var{exp} (taken as a character string) -is matched by @var{regexp}. The following example matches, or selects, -all input records with the upper-case letter @samp{J} somewhere in the -first field:@refill - -@example -awk '$1 ~ /J/' inventory-shipped -@end example - -So does this: - -@example -awk '@{ if ($1 ~ /J/) print @}' inventory-shipped -@end example - -@item @var{exp} !~ /@var{regexp}/ -This is true if the expression @var{exp} (taken as a character string) -is @emph{not} matched by @var{regexp}. The following example matches, -or selects, all input records whose first field @emph{does not} contain -the upper-case letter @samp{J}:@refill - -@example -awk '$1 !~ /J/' inventory-shipped -@end example -@end table - -@cindex computed regular expressions -@cindex regular expressions, computed -@cindex dynamic regular expressions -The right hand side of a @samp{~} or @samp{!~} operator need not be a -constant regexp (i.e., a string of characters between slashes). It may -be any expression. The expression is evaluated, and converted if -necessary to a string; the contents of the string are used as the -regexp. A regexp that is computed in this way is called a @dfn{dynamic -regexp}. For example: - -@example -identifier_regexp = "[A-Za-z_][A-Za-z_0-9]+" -$0 ~ identifier_regexp -@end example - -@noindent -sets @code{identifier_regexp} to a regexp that describes @code{awk} -variable names, and tests if the input record matches this regexp. - -@node Regexp Operators, Case-sensitivity, Regexp Usage, Regexp -@subsection Regular Expression Operators -@cindex metacharacters -@cindex regular expression metacharacters - -You can combine regular expressions with the following characters, -called @dfn{regular expression operators}, or @dfn{metacharacters}, to -increase the power and versatility of regular expressions. - -Here is a table of metacharacters. All characters not listed in the -table stand for themselves. - -@table @code -@item ^ -This matches the beginning of the string or the beginning of a line -within the string. For example: - -@example -^@@chapter -@end example - -@noindent -matches the @samp{@@chapter} at the beginning of a string, and can be used -to identify chapter beginnings in Texinfo source files. - -@item $ -This is similar to @samp{^}, but it matches only at the end of a string -or the end of a line within the string. For example: - -@example -p$ -@end example - -@noindent -matches a record that ends with a @samp{p}. - -@item . -This matches any single character except a newline. For example: - -@example -.P -@end example - -@noindent -matches any single character followed by a @samp{P} in a string. Using -concatenation we can make regular expressions like @samp{U.A}, which -matches any three-character sequence that begins with @samp{U} and ends -with @samp{A}. - -@item [@dots{}] -This is called a @dfn{character set}. It matches any one of the -characters that are enclosed in the square brackets. 
For example: - -@example -[MVX] -@end example - -@noindent -matches any one of the characters @samp{M}, @samp{V}, or @samp{X} in a -string.@refill - -Ranges of characters are indicated by using a hyphen between the beginning -and ending characters, and enclosing the whole thing in brackets. For -example:@refill - -@example -[0-9] -@end example - -@noindent -matches any digit. - -To include the character @samp{\}, @samp{]}, @samp{-} or @samp{^} in a -character set, put a @samp{\} in front of it. For example: - -@example -[d\]] -@end example - -@noindent -matches either @samp{d}, or @samp{]}.@refill - -This treatment of @samp{\} is compatible with other @code{awk} -implementations, and is also mandated by the @sc{posix} Command Language -and Utilities standard. The regular expressions in @code{awk} are a superset -of the @sc{posix} specification for Extended Regular Expressions (EREs). -@sc{posix} EREs are based on the regular expressions accepted by the -traditional @code{egrep} utility. - -In @code{egrep} syntax, backslash is not syntactically special within -square brackets. This means that special tricks have to be used to -represent the characters @samp{]}, @samp{-} and @samp{^} as members of a -character set. - -In @code{egrep} syntax, to match @samp{-}, write it as @samp{---}, -which is a range containing only @w{@samp{-}.} You may also give @samp{-} -as the first or last character in the set. To match @samp{^}, put it -anywhere except as the first character of a set. To match a @samp{]}, -make it the first character in the set. For example:@refill - -@example -[]d^] -@end example - -@noindent -matches either @samp{]}, @samp{d} or @samp{^}.@refill - -@item [^ @dots{}] -This is a @dfn{complemented character set}. The first character after -the @samp{[} @emph{must} be a @samp{^}. It matches any characters -@emph{except} those in the square brackets (or newline). For example: - -@example -[^0-9] -@end example - -@noindent -matches any character that is not a digit. - -@item | -This is the @dfn{alternation operator} and it is used to specify -alternatives. For example: - -@example -^P|[0-9] -@end example - -@noindent -matches any string that matches either @samp{^P} or @samp{[0-9]}. This -means it matches any string that contains a digit or starts with @samp{P}. - -The alternation applies to the largest possible regexps on either side. -@item (@dots{}) -Parentheses are used for grouping in regular expressions as in -arithmetic. They can be used to concatenate regular expressions -containing the alternation operator, @samp{|}. - -@item * -This symbol means that the preceding regular expression is to be -repeated as many times as possible to find a match. For example: - -@example -ph* -@end example - -@noindent -applies the @samp{*} symbol to the preceding @samp{h} and looks for matches -to one @samp{p} followed by any number of @samp{h}s. This will also match -just @samp{p} if no @samp{h}s are present. - -The @samp{*} repeats the @emph{smallest} possible preceding expression. -(Use parentheses if you wish to repeat a larger expression.) It finds -as many repetitions as possible. For example: - -@example -awk '/\(c[ad][ad]*r x\)/ @{ print @}' sample -@end example - -@noindent -prints every record in the input containing a string of the form -@samp{(car x)}, @samp{(cdr x)}, @samp{(cadr x)}, and so on.@refill - -@item + -This symbol is similar to @samp{*}, but the preceding expression must be -matched at least once. 
This means that: - -@example -wh+y -@end example - -@noindent -would match @samp{why} and @samp{whhy} but not @samp{wy}, whereas -@samp{wh*y} would match all three of these strings. This is a simpler -way of writing the last @samp{*} example: - -@example -awk '/\(c[ad]+r x\)/ @{ print @}' sample -@end example - -@item ? -This symbol is similar to @samp{*}, but the preceding expression can be -matched once or not at all. For example: - -@example -fe?d -@end example - -@noindent -will match @samp{fed} and @samp{fd}, but nothing else.@refill - -@item \ -This is used to suppress the special meaning of a character when -matching. For example: - -@example -\$ -@end example - -@noindent -matches the character @samp{$}. - -The escape sequences used for string constants -(@pxref{Constants, ,Constant Expressions}) are -valid in regular expressions as well; they are also introduced by a -@samp{\}.@refill -@end table - -In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators have -the highest precedence, followed by concatenation, and finally by @samp{|}. -As in arithmetic, parentheses can change how operators are grouped.@refill - -@node Case-sensitivity, , Regexp Operators, Regexp -@subsection Case-sensitivity in Matching - -Case is normally significant in regular expressions, both when matching -ordinary characters (i.e., not metacharacters), and inside character -sets. Thus a @samp{w} in a regular expression matches only a lower case -@samp{w} and not an upper case @samp{W}. - -The simplest way to do a case-independent match is to use a character -set: @samp{[Ww]}. However, this can be cumbersome if you need to use it -often; and it can make the regular expressions harder for humans to -read. There are two other alternatives that you might prefer. - -One way to do a case-insensitive match at a particular point in the -program is to convert the data to a single case, using the -@code{tolower} or @code{toupper} built-in string functions (which we -haven't discussed yet; -@pxref{String Functions, ,Built-in Functions for String Manipulation}). -For example:@refill - -@example -tolower($1) ~ /foo/ @{ @dots{} @} -@end example - -@noindent -converts the first field to lower case before matching against it. - -Another method is to set the variable @code{IGNORECASE} to a nonzero -value (@pxref{Built-in Variables}). When @code{IGNORECASE} is not zero, -@emph{all} regexp operations ignore case. Changing the value of -@code{IGNORECASE} dynamically controls the case sensitivity of your -program as it runs. Case is significant by default because -@code{IGNORECASE} (like most variables) is initialized to zero. - -@example -x = "aB" -if (x ~ /ab/) @dots{} # this test will fail - -IGNORECASE = 1 -if (x ~ /ab/) @dots{} # now it will succeed -@end example - -In general, you cannot use @code{IGNORECASE} to make certain rules -case-insensitive and other rules case-sensitive, because there is no way -to set @code{IGNORECASE} just for the pattern of a particular rule. To -do this, you must use character sets or @code{tolower}. However, one -thing you can do only with @code{IGNORECASE} is turn case-sensitivity on -or off dynamically for all the rules at once.@refill - -@code{IGNORECASE} can be set on the command line, or in a @code{BEGIN} -rule. Setting @code{IGNORECASE} from the command line is a way to make -a program case-insensitive without having to edit it. - -The value of @code{IGNORECASE} has no effect if @code{gawk} is in -compatibility mode (@pxref{Command Line, ,Invoking @code{awk}}). 
-Case is always significant in compatibility mode.@refill - -@node Comparison Patterns, Boolean Patterns, Regexp, Patterns -@section Comparison Expressions as Patterns -@cindex comparison expressions as patterns -@cindex pattern, comparison expressions -@cindex relational operators -@cindex operators, relational - -@dfn{Comparison patterns} test relationships such as equality between -two strings or numbers. They are a special case of expression patterns -(@pxref{Expression Patterns, ,Expressions as Patterns}). They are written -with @dfn{relational operators}, which are a superset of those in C. -Here is a table of them:@refill - -@table @code -@item @var{x} < @var{y} -True if @var{x} is less than @var{y}. - -@item @var{x} <= @var{y} -True if @var{x} is less than or equal to @var{y}. - -@item @var{x} > @var{y} -True if @var{x} is greater than @var{y}. - -@item @var{x} >= @var{y} -True if @var{x} is greater than or equal to @var{y}. - -@item @var{x} == @var{y} -True if @var{x} is equal to @var{y}. - -@item @var{x} != @var{y} -True if @var{x} is not equal to @var{y}. - -@item @var{x} ~ @var{y} -True if @var{x} matches the regular expression described by @var{y}. - -@item @var{x} !~ @var{y} -True if @var{x} does not match the regular expression described by @var{y}. -@end table - -The operands of a relational operator are compared as numbers if they -are both numbers. Otherwise they are converted to, and compared as, -strings (@pxref{Conversion, ,Conversion of Strings and Numbers}, -for the detailed rules). Strings are compared by comparing the first -character of each, then the second character of each, -and so on, until there is a difference. If the two strings are equal until -the shorter one runs out, the shorter one is considered to be less than the -longer one. Thus, @code{"10"} is less than @code{"9"}, and @code{"abc"} -is less than @code{"abcd"}.@refill - -The left operand of the @samp{~} and @samp{!~} operators is a string. -The right operand is either a constant regular expression enclosed in -slashes (@code{/@var{regexp}/}), or any expression, whose string value -is used as a dynamic regular expression -(@pxref{Regexp Usage, ,How to Use Regular Expressions}).@refill - -The following example prints the second field of each input record -whose first field is precisely @samp{foo}. - -@example -awk '$1 == "foo" @{ print $2 @}' BBS-list -@end example - -@noindent -Contrast this with the following regular expression match, which would -accept any record with a first field that contains @samp{foo}: - -@example -awk '$1 ~ "foo" @{ print $2 @}' BBS-list -@end example - -@noindent -or, equivalently, this one: - -@example -awk '$1 ~ /foo/ @{ print $2 @}' BBS-list -@end example - -@node Boolean Patterns, Expression Patterns, Comparison Patterns, Patterns -@section Boolean Operators and Patterns -@cindex patterns, boolean -@cindex boolean patterns - -A @dfn{boolean pattern} is an expression which combines other patterns -using the @dfn{boolean operators} ``or'' (@samp{||}), ``and'' -(@samp{&&}), and ``not'' (@samp{!}). Whether the boolean pattern -matches an input record depends on whether its subpatterns match. 
- -For example, the following command prints all records in the input file -@file{BBS-list} that contain both @samp{2400} and @samp{foo}.@refill - -@example -awk '/2400/ && /foo/' BBS-list -@end example - -The following command prints all records in the input file -@file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo}, or -both.@refill - -@example -awk '/2400/ || /foo/' BBS-list -@end example - -The following command prints all records in the input file -@file{BBS-list} that do @emph{not} contain the string @samp{foo}. - -@example -awk '! /foo/' BBS-list -@end example - -Note that boolean patterns are a special case of expression patterns -(@pxref{Expression Patterns, ,Expressions as Patterns}); they are -expressions that use the boolean operators. -@xref{Boolean Ops, ,Boolean Expressions}, for complete information -on the boolean operators.@refill - -The subpatterns of a boolean pattern can be constant regular -expressions, comparisons, or any other @code{awk} expressions. Range -patterns are not expressions, so they cannot appear inside boolean -patterns. Likewise, the special patterns @code{BEGIN} and @code{END}, -which never match any input record, are not expressions and cannot -appear inside boolean patterns. - -@node Expression Patterns, Ranges, Boolean Patterns, Patterns -@section Expressions as Patterns - -Any @code{awk} expression is also valid as an @code{awk} pattern. -Then the pattern ``matches'' if the expression's value is nonzero (if a -number) or nonnull (if a string). - -The expression is reevaluated each time the rule is tested against a new -input record. If the expression uses fields such as @code{$1}, the -value depends directly on the new input record's text; otherwise, it -depends only on what has happened so far in the execution of the -@code{awk} program, but that may still be useful. - -Comparison patterns are actually a special case of this. For -example, the expression @code{$5 == "foo"} has the value 1 when the -value of @code{$5} equals @code{"foo"}, and 0 otherwise; therefore, this -expression as a pattern matches when the two values are equal. - -Boolean patterns are also special cases of expression patterns. - -A constant regexp as a pattern is also a special case of an expression -pattern. @code{/foo/} as an expression has the value 1 if @samp{foo} -appears in the current input record; thus, as a pattern, @code{/foo/} -matches any record containing @samp{foo}. - -Other implementations of @code{awk} that are not yet @sc{posix} compliant -are less general than @code{gawk}: they allow comparison expressions, and -boolean combinations thereof (optionally with parentheses), but not -necessarily other kinds of expressions. - -@node Ranges, BEGIN/END, Expression Patterns, Patterns -@section Specifying Record Ranges with Patterns - -@cindex range pattern -@cindex patterns, range -A @dfn{range pattern} is made of two patterns separated by a comma, of -the form @code{@var{begpat}, @var{endpat}}. It matches ranges of -consecutive input records. The first pattern @var{begpat} controls -where the range begins, and the second one @var{endpat} controls where -it ends. For example,@refill - -@example -awk '$1 == "on", $1 == "off"' -@end example - -@noindent -prints every record between @samp{on}/@samp{off} pairs, inclusive. - -A range pattern starts out by matching @var{begpat} -against every input record; when a record matches @var{begpat}, the -range pattern becomes @dfn{turned on}. The range pattern matches this -record. 
As long as it stays turned on, it automatically matches every -input record read. It also matches @var{endpat} against -every input record; when that succeeds, the range pattern is turned -off again for the following record. Now it goes back to checking -@var{begpat} against each record. - -The record that turns on the range pattern and the one that turns it -off both match the range pattern. If you don't want to operate on -these records, you can write @code{if} statements in the rule's action -to distinguish them. - -It is possible for a pattern to be turned both on and off by the same -record, if both conditions are satisfied by that record. Then the action is -executed for just that record. - -@node BEGIN/END, Empty, Ranges, Patterns -@section @code{BEGIN} and @code{END} Special Patterns - -@cindex @code{BEGIN} special pattern -@cindex patterns, @code{BEGIN} -@cindex @code{END} special pattern -@cindex patterns, @code{END} -@code{BEGIN} and @code{END} are special patterns. They are not used to -match input records. Rather, they are used for supplying start-up or -clean-up information to your @code{awk} script. A @code{BEGIN} rule is -executed, once, before the first input record has been read. An @code{END} -rule is executed, once, after all the input has been read. For -example:@refill - -@example -awk 'BEGIN @{ print "Analysis of `foo'" @} - /foo/ @{ ++foobar @} - END @{ print "`foo' appears " foobar " times." @}' BBS-list -@end example - -This program finds the number of records in the input file @file{BBS-list} -that contain the string @samp{foo}. The @code{BEGIN} rule prints a title -for the report. There is no need to use the @code{BEGIN} rule to -initialize the counter @code{foobar} to zero, as @code{awk} does this -for us automatically (@pxref{Variables}). - -The second rule increments the variable @code{foobar} every time a -record containing the pattern @samp{foo} is read. The @code{END} rule -prints the value of @code{foobar} at the end of the run.@refill - -The special patterns @code{BEGIN} and @code{END} cannot be used in ranges -or with boolean operators (indeed, they cannot be used with any operators). - -An @code{awk} program may have multiple @code{BEGIN} and/or @code{END} -rules. They are executed in the order they appear, all the @code{BEGIN} -rules at start-up and all the @code{END} rules at termination. - -Multiple @code{BEGIN} and @code{END} sections are useful for writing -library functions, since each library can have its own @code{BEGIN} or -@code{END} rule to do its own initialization and/or cleanup. Note that -the order in which library functions are named on the command line -controls the order in which their @code{BEGIN} and @code{END} rules are -executed. Therefore you have to be careful to write such rules in -library files so that the order in which they are executed doesn't matter. -@xref{Command Line, ,Invoking @code{awk}}, for more information on -using library functions. - -If an @code{awk} program only has a @code{BEGIN} rule, and no other -rules, then the program exits after the @code{BEGIN} rule has been run. -(Older versions of @code{awk} used to keep reading and ignoring input -until end of file was seen.) However, if an @code{END} rule exists as -well, then the input will be read, even if there are no other rules in -the program. This is necessary in case the @code{END} rule checks the -@code{NR} variable. 
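Here is a short sketch of these two behaviors; the messages printed are
arbitrary, and @file{BBS-list} is just the sample file used throughout
this manual:

@example
awk 'BEGIN @{ print "no input is read" @}' BBS-list
awk 'END @{ print NR, "records were read" @}' BBS-list
@end example

@noindent
The first program prints its message and exits without reading
@file{BBS-list} at all. The second must read the whole file, so that
@code{NR} holds the final record count when the @code{END} rule runs.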
- -@code{BEGIN} and @code{END} rules must have actions; there is no default -action for these rules since there is no current record when they run. - -@node Empty, , BEGIN/END, Patterns -@comment node-name, next, previous, up -@section The Empty Pattern - -@cindex empty pattern -@cindex pattern, empty -An empty pattern is considered to match @emph{every} input record. For -example, the program:@refill - -@example -awk '@{ print $1 @}' BBS-list -@end example - -@noindent -prints the first field of every record. - -@node Actions, Expressions, Patterns, Top -@chapter Overview of Actions -@cindex action, definition of -@cindex curly braces -@cindex action, curly braces -@cindex action, separating statements - -An @code{awk} program or script consists of a series of -rules and function definitions, interspersed. (Functions are -described later. @xref{User-defined, ,User-defined Functions}.) - -A rule contains a pattern and an action, either of which may be -omitted. The purpose of the @dfn{action} is to tell @code{awk} what to do -once a match for the pattern is found. Thus, the entire program -looks somewhat like this: - -@example -@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]} -@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]} -@dots{} -function @var{name} (@var{args}) @{ @dots{} @} -@dots{} -@end example - -An action consists of one or more @code{awk} @dfn{statements}, enclosed -in curly braces (@samp{@{} and @samp{@}}). Each statement specifies one -thing to be done. The statements are separated by newlines or -semicolons. - -The curly braces around an action must be used even if the action -contains only one statement, or even if it contains no statements at -all. However, if you omit the action entirely, omit the curly braces as -well. (An omitted action is equivalent to @samp{@{ print $0 @}}.) - -Here are the kinds of statements supported in @code{awk}: - -@itemize @bullet -@item -Expressions, which can call functions or assign values to variables -(@pxref{Expressions, ,Expressions as Action Statements}). Executing -this kind of statement simply computes the value of the expression and -then ignores it. This is useful when the expression has side effects -(@pxref{Assignment Ops, ,Assignment Expressions}).@refill - -@item -Control statements, which specify the control flow of @code{awk} -programs. The @code{awk} language gives you C-like constructs -(@code{if}, @code{for}, @code{while}, and so on) as well as a few -special ones (@pxref{Statements, ,Control Statements in Actions}).@refill - -@item -Compound statements, which consist of one or more statements enclosed in -curly braces. A compound statement is used in order to put several -statements together in the body of an @code{if}, @code{while}, @code{do} -or @code{for} statement. - -@item -Input control, using the @code{getline} command -(@pxref{Getline, ,Explicit Input with @code{getline}}), and the @code{next} -statement (@pxref{Next Statement, ,The @code{next} Statement}). - -@item -Output statements, @code{print} and @code{printf}. -@xref{Printing, ,Printing Output}.@refill - -@item -Deletion statements, for deleting array elements. -@xref{Delete, ,The @code{delete} Statement}.@refill -@end itemize - -@iftex -The next two chapters cover in detail expressions and control -statements, respectively. We go on to treat arrays and built-in -functions, both of which are used in expressions. Then we proceed -to discuss how to define your own functions. 
-@end iftex - -@node Expressions, Statements, Actions, Top -@chapter Expressions as Action Statements -@cindex expression - -Expressions are the basic building block of @code{awk} actions. An -expression evaluates to a value, which you can print, test, store in a -variable or pass to a function. But beyond that, an expression can assign a new value to a variable -or a field, with an assignment operator. - -An expression can serve as a statement on its own. Most other kinds of -statements contain one or more expressions which specify data to be -operated on. As in other languages, expressions in @code{awk} include -variables, array references, constants, and function calls, as well as -combinations of these with various operators. - -@menu -* Constants:: String, numeric, and regexp constants. -* Variables:: Variables give names to values for later use. -* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-}, etc.) -* Concatenation:: Concatenating strings. -* Comparison Ops:: Comparison of numbers and strings - with @samp{<}, etc. -* Boolean Ops:: Combining comparison expressions - using boolean operators - @samp{||} (``or''), @samp{&&} (``and'') and @samp{!} (``not''). - -* Assignment Ops:: Changing the value of a variable or a field. -* Increment Ops:: Incrementing the numeric value of a variable. - -* Conversion:: The conversion of strings to numbers - and vice versa. -* Values:: The whole truth about numbers and strings. -* Conditional Exp:: Conditional expressions select - between two subexpressions under control - of a third subexpression. -* Function Calls:: A function call is an expression. -* Precedence:: How various operators nest. -@end menu - -@node Constants, Variables, Expressions, Expressions -@section Constant Expressions -@cindex constants, types of -@cindex string constants - -The simplest type of expression is the @dfn{constant}, which always has -the same value. There are three types of constants: numeric constants, -string constants, and regular expression constants. - -@cindex numeric constant -@cindex numeric value -A @dfn{numeric constant} stands for a number. This number can be an -integer, a decimal fraction, or a number in scientific (exponential) -notation. Note that all numeric values are represented within -@code{awk} in double-precision floating point. Here are some examples -of numeric constants, which all have the same value: - -@example -105 -1.05e+2 -1050e-1 -@end example - -A string constant consists of a sequence of characters enclosed in -double-quote marks. For example: - -@example -"parrot" -@end example - -@noindent -@iftex -@cindex differences between @code{gawk} and @code{awk} -@end iftex -represents the string whose contents are @samp{parrot}. Strings in -@code{gawk} can be of any length and they can contain all the possible -8-bit ASCII characters including ASCII NUL. Other @code{awk} -implementations may have difficulty with some character codes.@refill - -@cindex escape sequence notation -Some characters cannot be included literally in a string constant. You -represent them instead with @dfn{escape sequences}, which are character -sequences beginning with a backslash (@samp{\}). - -One use of an escape sequence is to include a double-quote character in -a string constant. Since a plain double-quote would end the string, you -must use @samp{\"} to represent a single double-quote character as a -part of the string. 
-The -backslash character itself is another character that cannot be -included normally; you write @samp{\\} to put one backslash in the -string. Thus, the string whose contents are the two characters -@samp{"\} must be written @code{"\"\\"}. - -Another use of backslash is to represent unprintable characters -such as newline. While there is nothing to stop you from writing most -of these characters directly in a string constant, they may look ugly. - -Here is a table of all the escape sequences used in @code{awk}: - -@table @code -@item \\ -Represents a literal backslash, @samp{\}. - -@item \a -Represents the ``alert'' character, control-g, ASCII code 7. - -@item \b -Represents a backspace, control-h, ASCII code 8. - -@item \f -Represents a formfeed, control-l, ASCII code 12. - -@item \n -Represents a newline, control-j, ASCII code 10. - -@item \r -Represents a carriage return, control-m, ASCII code 13. - -@item \t -Represents a horizontal tab, control-i, ASCII code 9. - -@item \v -Represents a vertical tab, control-k, ASCII code 11. - -@item \@var{nnn} -Represents the octal value @var{nnn}, where @var{nnn} are one to three -digits between 0 and 7. For example, the code for the ASCII ESC -(escape) character is @samp{\033}.@refill - -@item \x@var{hh}@dots{} -Represents the hexadecimal value @var{hh}, where @var{hh} are hexadecimal -digits (@samp{0} through @samp{9} and either @samp{A} through @samp{F} or -@samp{a} through @samp{f}). Like the same construct in @sc{ansi} C, the escape -sequence continues until the first non-hexadecimal digit is seen. However, -using more than two hexadecimal digits produces undefined results. (The -@samp{\x} escape sequence is not allowed in @sc{posix} @code{awk}.)@refill -@end table - -A @dfn{constant regexp} is a regular expression description enclosed in -slashes, such as @code{/^beginning and end$/}. Most regexps used in -@code{awk} programs are constant, but the @samp{~} and @samp{!~} -operators can also match computed or ``dynamic'' regexps -(@pxref{Regexp Usage, ,How to Use Regular Expressions}).@refill - -Constant regexps may be used like simple expressions. When a -constant regexp is not on the right hand side of the @samp{~} or -@samp{!~} operators, it has the same meaning as if it appeared -in a pattern, i.e. @samp{($0 ~ /foo/)} -(@pxref{Expression Patterns, ,Expressions as Patterns}). -This means that the two code segments,@refill - -@example -if ($0 ~ /barfly/ || $0 ~ /camelot/) - print "found" -@end example - -@noindent -and - -@example -if (/barfly/ || /camelot/) - print "found" -@end example - -@noindent -are exactly equivalent. One rather bizarre consequence of this rule is -that the following boolean expression is legal, but does not do what the user -intended:@refill - -@example -if (/foo/ ~ $1) print "found foo" -@end example - -This code is ``obviously'' testing @code{$1} for a match against the regexp -@code{/foo/}. But in fact, the expression @code{(/foo/ ~ $1)} actually means -@code{(($0 ~ /foo/) ~ $1)}. In other words, first match the input record -against the regexp @code{/foo/}. The result will be either a 0 or a 1, -depending upon the success or failure of the match. 
Then match that result -against the first field in the record.@refill - -Since it is unlikely that you would ever really wish to make this kind of -test, @code{gawk} will issue a warning when it sees this construct in -a program.@refill - -Another consequence of this rule is that the assignment statement - -@example -matches = /foo/ -@end example - -@noindent -will assign either 0 or 1 to the variable @code{matches}, depending -upon the contents of the current input record. - -Constant regular expressions are also used as the first argument for -the @code{sub} and @code{gsub} functions -(@pxref{String Functions, ,Built-in Functions for String Manipulation}).@refill - -This feature of the language was never well documented until the -@sc{posix} specification. - -You may be wondering, when is - -@example -$1 ~ /foo/ @{ @dots{} @} -@end example - -@noindent -preferable to - -@example -$1 ~ "foo" @{ @dots{} @} -@end example - -Since the right-hand sides of both @samp{~} operators are constants, -it is more efficient to use the @samp{/foo/} form: @code{awk} can note -that you have supplied a regexp and store it internally in a form that -makes pattern matching more efficient. In the second form, @code{awk} -must first convert the string into this internal form, and then perform -the pattern matching. The first form is also better style; it shows -clearly that you intend a regexp match. - -@node Variables, Arithmetic Ops, Constants, Expressions -@section Variables -@cindex variables, user-defined -@cindex user-defined variables -@c there should be more than one subsection, ideally. Not a big deal. -@c But usually there are supposed to be at least two. One way to get -@c around this is to write the info in the subsection as the info in the -@c section itself and not have any subsections.. --mew - -Variables let you give names to values and refer to them later. You have -already seen variables in many of the examples. The name of a variable -must be a sequence of letters, digits and underscores, but it may not begin -with a digit. Case is significant in variable names; @code{a} and @code{A} -are distinct variables. - -A variable name is a valid expression by itself; it represents the -variable's current value. Variables are given new values with -@dfn{assignment operators} and @dfn{increment operators}. -@xref{Assignment Ops, ,Assignment Expressions}. - -A few variables have special built-in meanings, such as @code{FS}, the -field separator, and @code{NF}, the number of fields in the current -input record. @xref{Built-in Variables}, for a list of them. These -built-in variables can be used and assigned just like all other -variables, but their values are also used or changed automatically by -@code{awk}. Each built-in variable's name is made entirely of upper case -letters. - -Variables in @code{awk} can be assigned either numeric or string -values. By default, variables are initialized to the null string, which -is effectively zero if converted to a number. There is no need to -``initialize'' each variable explicitly in @code{awk}, the way you would in C or most other traditional languages. - -@menu -* Assignment Options:: Setting variables on the command line - and a summary of command line syntax. - This is an advanced method of input. 
-@end menu - -@node Assignment Options, , Variables, Variables -@subsection Assigning Variables on the Command Line - -You can set any @code{awk} variable by including a @dfn{variable assignment} -among the arguments on the command line when you invoke @code{awk} -(@pxref{Command Line, ,Invoking @code{awk}}). Such an assignment has -this form:@refill - -@example -@var{variable}=@var{text} -@end example - -@noindent -With it, you can set a variable either at the beginning of the -@code{awk} run or in between input files. - -If you precede the assignment with the @samp{-v} option, like this: - -@example --v @var{variable}=@var{text} -@end example - -@noindent -then the variable is set at the very beginning, before even the -@code{BEGIN} rules are run. The @samp{-v} option and its assignment -must precede all the file name arguments, as well as the program text. - -Otherwise, the variable assignment is performed at a time determined by -its position among the input file arguments: after the processing of the -preceding input file argument. For example: - -@example -awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list -@end example - -@noindent -prints the value of field number @code{n} for all input records. Before -the first file is read, the command line sets the variable @code{n} -equal to 4. This causes the fourth field to be printed in lines from -the file @file{inventory-shipped}. After the first file has finished, -but before the second file is started, @code{n} is set to 2, so that the -second field is printed in lines from @file{BBS-list}. - -Command line arguments are made available for explicit examination by -the @code{awk} program in an array named @code{ARGV} -(@pxref{Built-in Variables}).@refill - -@code{awk} processes the values of command line assignments for escape -sequences (@pxref{Constants, ,Constant Expressions}). - -@node Arithmetic Ops, Concatenation, Variables, Expressions -@section Arithmetic Operators -@cindex arithmetic operators -@cindex operators, arithmetic -@cindex addition -@cindex subtraction -@cindex multiplication -@cindex division -@cindex remainder -@cindex quotient -@cindex exponentiation - -The @code{awk} language uses the common arithmetic operators when -evaluating expressions. All of these arithmetic operators follow normal -precedence rules, and work as you would expect them to. This example -divides field three by field four, adds field two, stores the result -into field one, and prints the resulting altered input record: - -@example -awk '@{ $1 = $2 + $3 / $4; print @}' inventory-shipped -@end example - -The arithmetic operators in @code{awk} are: - -@table @code -@item @var{x} + @var{y} -Addition. - -@item @var{x} - @var{y} -Subtraction. - -@item - @var{x} -Negation. - -@item + @var{x} -Unary plus. No real effect on the expression. - -@item @var{x} * @var{y} -Multiplication. - -@item @var{x} / @var{y} -Division. Since all numbers in @code{awk} are double-precision -floating point, the result is not rounded to an integer: @code{3 / 4} -has the value 0.75. - -@item @var{x} % @var{y} -@iftex -@cindex differences between @code{gawk} and @code{awk} -@end iftex -Remainder. The quotient is rounded toward zero to an integer, -multiplied by @var{y} and this result is subtracted from @var{x}. 
-This operation is sometimes known as ``trunc-mod.'' The following -relation always holds: - -@example -b * int(a / b) + (a % b) == a -@end example - -One possibly undesirable effect of this definition of remainder is that -@code{@var{x} % @var{y}} is negative if @var{x} is negative. Thus, - -@example --17 % 8 = -1 -@end example - -In other @code{awk} implementations, the signedness of the remainder -may be machine dependent. - -@item @var{x} ^ @var{y} -@itemx @var{x} ** @var{y} -Exponentiation: @var{x} raised to the @var{y} power. @code{2 ^ 3} has -the value 8. The character sequence @samp{**} is equivalent to -@samp{^}. (The @sc{posix} standard only specifies the use of @samp{^} -for exponentiation.) -@end table - -@node Concatenation, Comparison Ops, Arithmetic Ops, Expressions -@section String Concatenation - -@cindex string operators -@cindex operators, string -@cindex concatenation -There is only one string operation: concatenation. It does not have a -specific operator to represent it. Instead, concatenation is performed by -writing expressions next to one another, with no operator. For example: - -@example -awk '@{ print "Field number one: " $1 @}' BBS-list -@end example - -@noindent -produces, for the first record in @file{BBS-list}: - -@example -Field number one: aardvark -@end example - -Without the space in the string constant after the @samp{:}, the line -would run together. For example: - -@example -awk '@{ print "Field number one:" $1 @}' BBS-list -@end example - -@noindent -produces, for the first record in @file{BBS-list}: - -@example -Field number one:aardvark -@end example - -Since string concatenation does not have an explicit operator, it is -often necessary to insure that it happens where you want it to by -enclosing the items to be concatenated in parentheses. For example, the -following code fragment does not concatenate @code{file} and @code{name} -as you might expect: - -@example -file = "file" -name = "name" -print "something meaningful" > file name -@end example - -@noindent -It is necessary to use the following: - -@example -print "something meaningful" > (file name) -@end example - -We recommend you use parentheses around concatenation in all but the -most common contexts (such as in the right-hand operand of @samp{=}). - -@ignore -@code{gawk} actually now allows a concatenation on the right hand -side of a @code{>} redirection, but other @code{awk}s don't. So for -now we won't mention that fact. -@end ignore - -@node Comparison Ops, Boolean Ops, Concatenation, Expressions -@section Comparison Expressions -@cindex comparison expressions -@cindex expressions, comparison -@cindex relational operators -@cindex operators, relational -@cindex regexp operators - -@dfn{Comparison expressions} compare strings or numbers for -relationships such as equality. They are written using @dfn{relational -operators}, which are a superset of those in C. Here is a table of -them: - -@table @code -@item @var{x} < @var{y} -True if @var{x} is less than @var{y}. - -@item @var{x} <= @var{y} -True if @var{x} is less than or equal to @var{y}. - -@item @var{x} > @var{y} -True if @var{x} is greater than @var{y}. - -@item @var{x} >= @var{y} -True if @var{x} is greater than or equal to @var{y}. - -@item @var{x} == @var{y} -True if @var{x} is equal to @var{y}. - -@item @var{x} != @var{y} -True if @var{x} is not equal to @var{y}. - -@item @var{x} ~ @var{y} -True if the string @var{x} matches the regexp denoted by @var{y}. 
- -@item @var{x} !~ @var{y} -True if the string @var{x} does not match the regexp denoted by @var{y}. - -@item @var{subscript} in @var{array} -True if array @var{array} has an element with the subscript @var{subscript}. -@end table - -Comparison expressions have the value 1 if true and 0 if false. - -The rules @code{gawk} uses for performing comparisons are based on those -in draft 11.2 of the @sc{posix} standard. The @sc{posix} standard introduced -the concept of a @dfn{numeric string}, which is simply a string that looks -like a number, for example, @code{@w{" +2"}}. - -@vindex CONVFMT -When performing a relational operation, @code{gawk} considers the type of an -operand to be the type it received on its last @emph{assignment}, rather -than the type of its last @emph{use} -(@pxref{Values, ,Numeric and String Values}). -This type is @emph{unknown} when the operand is from an ``external'' source: -field variables, command line arguments, array elements resulting from a -@code{split} operation, and the value of an @code{ENVIRON} element. -In this case only, if the operand is a numeric string, then it is -considered to be of both string type and numeric type. If at least one -operand of a comparison is of string type only, then a string -comparison is performed. Any numeric operand will be converted to a -string using the value of @code{CONVFMT} -(@pxref{Conversion, ,Conversion of Strings and Numbers}). -If one operand of a comparison is numeric, and the other operand is -either numeric or both numeric and string, then @code{gawk} does a -numeric comparison. If both operands have both types, then the -comparison is numeric. Strings are compared -by comparing the first character of each, then the second character of each, -and so on. Thus @code{"10"} is less than @code{"9"}. If there are two -strings where one is a prefix of the other, the shorter string is less than -the longer one. Thus @code{"abc"} is less than @code{"abcd"}.@refill - -Here are some sample expressions, how @code{gawk} compares them, and what -the result of the comparison is. - -@table @code -@item 1.5 <= 2.0 -numeric comparison (true) - -@item "abc" >= "xyz" -string comparison (false) - -@item 1.5 != " +2" -string comparison (true) - -@item "1e2" < "3" -string comparison (true) - -@item a = 2; b = "2" -@itemx a == b -string comparison (true) -@end table - -@example -echo 1e2 3 | awk '@{ print ($1 < $2) ? "true" : "false" @}' -@end example - -@noindent -prints @samp{false} since both @code{$1} and @code{$2} are numeric -strings and thus have both string and numeric types, thus dictating -a numeric comparison. - -The purpose of the comparison rules and the use of numeric strings is -to attempt to produce the behavior that is ``least surprising,'' while -still ``doing the right thing.'' - -String comparisons and regular expression comparisons are very different. -For example, - -@example -$1 == "foo" -@end example - -@noindent -has the value of 1, or is true, if the first field of the current input -record is precisely @samp{foo}. By contrast, - -@example -$1 ~ /foo/ -@end example - -@noindent -has the value 1 if the first field contains @samp{foo}, such as @samp{foobar}. - -The right hand operand of the @samp{~} and @samp{!~} operators may be -either a constant regexp (@code{/@dots{}/}), or it may be an ordinary -expression, in which case the value of the expression as a string is a -dynamic regexp (@pxref{Regexp Usage, ,How to Use Regular Expressions}). 
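For instance, here is a minimal sketch of a dynamic regexp; the
variable name @code{pat} is arbitrary:

@example
awk 'BEGIN @{ pat = "^foo" @}
     $1 ~ pat @{ print $2 @}' BBS-list
@end example

@noindent
The string value of @code{pat} is used as the regular expression, so
this prints the second field of every record whose first field begins
with @samp{foo}.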
- -@cindex regexp as expression -In very recent implementations of @code{awk}, a constant regular -expression in slashes by itself is also an expression. The regexp -@code{/@var{regexp}/} is an abbreviation for this comparison expression: - -@example -$0 ~ /@var{regexp}/ -@end example - -In some contexts it may be necessary to write parentheses around the -regexp to avoid confusing the @code{gawk} parser. For example, -@code{(/x/ - /y/) > threshold} is not allowed, but @code{((/x/) - (/y/)) -> threshold} parses properly. - -One special place where @code{/foo/} is @emph{not} an abbreviation for -@code{$0 ~ /foo/} is when it is the right-hand operand of @samp{~} or -@samp{!~}! @xref{Constants, ,Constant Expressions}, where this is -discussed in more detail. - -@node Boolean Ops, Assignment Ops, Comparison Ops, Expressions -@section Boolean Expressions -@cindex expressions, boolean -@cindex boolean expressions -@cindex operators, boolean -@cindex boolean operators -@cindex logical operations -@cindex and operator -@cindex or operator -@cindex not operator - -A @dfn{boolean expression} is a combination of comparison expressions or -matching expressions, using the boolean operators ``or'' -(@samp{||}), ``and'' (@samp{&&}), and ``not'' (@samp{!}), along with -parentheses to control nesting. The truth of the boolean expression is -computed by combining the truth values of the component expressions. - -Boolean expressions can be used wherever comparison and matching -expressions can be used. They can be used in @code{if}, @code{while} -@code{do} and @code{for} statements. They have numeric values (1 if true, -0 if false), which come into play if the result of the boolean expression -is stored in a variable, or used in arithmetic.@refill - -In addition, every boolean expression is also a valid boolean pattern, so -you can use it as a pattern to control the execution of rules. - -Here are descriptions of the three boolean operators, with an example of -each. It may be instructive to compare these examples with the -analogous examples of boolean patterns -(@pxref{Boolean Patterns, ,Boolean Operators and Patterns}), which -use the same boolean operators in patterns instead of expressions.@refill - -@table @code -@item @var{boolean1} && @var{boolean2} -True if both @var{boolean1} and @var{boolean2} are true. For example, -the following statement prints the current input record if it contains -both @samp{2400} and @samp{foo}.@refill - -@smallexample -if ($0 ~ /2400/ && $0 ~ /foo/) print -@end smallexample - -The subexpression @var{boolean2} is evaluated only if @var{boolean1} -is true. This can make a difference when @var{boolean2} contains -expressions that have side effects: in the case of @code{$0 ~ /foo/ && -($2 == bar++)}, the variable @code{bar} is not incremented if there is -no @samp{foo} in the record. - -@item @var{boolean1} || @var{boolean2} -True if at least one of @var{boolean1} or @var{boolean2} is true. -For example, the following command prints all records in the input -file @file{BBS-list} that contain @emph{either} @samp{2400} or -@samp{foo}, or both.@refill - -@smallexample -awk '@{ if ($0 ~ /2400/ || $0 ~ /foo/) print @}' BBS-list -@end smallexample - -The subexpression @var{boolean2} is evaluated only if @var{boolean1} -is false. This can make a difference when @var{boolean2} contains -expressions that have side effects. - -@item !@var{boolean} -True if @var{boolean} is false. 
For example, the following program prints -all records in the input file @file{BBS-list} that do @emph{not} contain the -string @samp{foo}. - -@smallexample -awk '@{ if (! ($0 ~ /foo/)) print @}' BBS-list -@end smallexample -@end table - -@node Assignment Ops, Increment Ops, Boolean Ops, Expressions -@section Assignment Expressions -@cindex assignment operators -@cindex operators, assignment -@cindex expressions, assignment - -An @dfn{assignment} is an expression that stores a new value into a -variable. For example, let's assign the value 1 to the variable -@code{z}:@refill - -@example -z = 1 -@end example - -After this expression is executed, the variable @code{z} has the value 1. -Whatever old value @code{z} had before the assignment is forgotten. - -Assignments can store string values also. For example, this would store -the value @code{"this food is good"} in the variable @code{message}: - -@example -thing = "food" -predicate = "good" -message = "this " thing " is " predicate -@end example - -@noindent -(This also illustrates concatenation of strings.) - -The @samp{=} sign is called an @dfn{assignment operator}. It is the -simplest assignment operator because the value of the right-hand -operand is stored unchanged. - -@cindex side effect -Most operators (addition, concatenation, and so on) have no effect -except to compute a value. If you ignore the value, you might as well -not use the operator. An assignment operator is different; it does -produce a value, but even if you ignore the value, the assignment still -makes itself felt through the alteration of the variable. We call this -a @dfn{side effect}. - -@cindex lvalue -The left-hand operand of an assignment need not be a variable -(@pxref{Variables}); it can also be a field -(@pxref{Changing Fields, ,Changing the Contents of a Field}) or -an array element (@pxref{Arrays, ,Arrays in @code{awk}}). -These are all called @dfn{lvalues}, -which means they can appear on the left-hand side of an assignment operator. -The right-hand operand may be any expression; it produces the new value -which the assignment stores in the specified variable, field or array -element.@refill - -It is important to note that variables do @emph{not} have permanent types. -The type of a variable is simply the type of whatever value it happens -to hold at the moment. In the following program fragment, the variable -@code{foo} has a numeric value at first, and a string value later on: - -@example -foo = 1 -print foo -foo = "bar" -print foo -@end example - -@noindent -When the second assignment gives @code{foo} a string value, the fact that -it previously had a numeric value is forgotten. - -An assignment is an expression, so it has a value: the same value that -is assigned. Thus, @code{z = 1} as an expression has the value 1. -One consequence of this is that you can write multiple assignments together: - -@example -x = y = z = 0 -@end example - -@noindent -stores the value 0 in all three variables. It does this because the -value of @code{z = 0}, which is 0, is stored into @code{y}, and then -the value of @code{y = z = 0}, which is 0, is stored into @code{x}. - -You can use an assignment anywhere an expression is called for. For -example, it is valid to write @code{x != (y = 1)} to set @code{y} to 1 -and then test whether @code{x} equals 1. But this style tends to make -programs hard to read; except in a one-shot program, you should -rewrite it to get rid of such nesting of assignments. This is never very -hard. 
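For instance, the nested assignment above might be unwound into two
plain statements; this is only a sketch of one possible rewriting:

@example
y = 1           # do the assignment first
if (x != y)     # then make the test
    print "x is not 1"
@end example

@noindent
This does the same work as @code{if (x != (y = 1)) @dots{}}, but the
assignment to @code{y} is now impossible to overlook.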
- -Aside from @samp{=}, there are several other assignment operators that -do arithmetic with the old value of the variable. For example, the -operator @samp{+=} computes a new value by adding the right-hand value -to the old value of the variable. Thus, the following assignment adds -5 to the value of @code{foo}: - -@example -foo += 5 -@end example - -@noindent -This is precisely equivalent to the following: - -@example -foo = foo + 5 -@end example - -@noindent -Use whichever one makes the meaning of your program clearer. - -Here is a table of the arithmetic assignment operators. In each -case, the right-hand operand is an expression whose value is converted -to a number. - -@table @code -@item @var{lvalue} += @var{increment} -Adds @var{increment} to the value of @var{lvalue} to make the new value -of @var{lvalue}. - -@item @var{lvalue} -= @var{decrement} -Subtracts @var{decrement} from the value of @var{lvalue}. - -@item @var{lvalue} *= @var{coefficient} -Multiplies the value of @var{lvalue} by @var{coefficient}. - -@item @var{lvalue} /= @var{quotient} -Divides the value of @var{lvalue} by @var{quotient}. - -@item @var{lvalue} %= @var{modulus} -Sets @var{lvalue} to its remainder by @var{modulus}. - -@item @var{lvalue} ^= @var{power} -@itemx @var{lvalue} **= @var{power} -Raises @var{lvalue} to the power @var{power}. -(Only the @code{^=} operator is specified by @sc{posix}.) -@end table - -@ignore -From: gatech!ames!elroy!cit-vax!EQL.Caltech.Edu!rankin (Pat Rankin) - In the discussion of assignment operators, it states that -``foo += 5'' "is precisely equivalent to" ``foo = foo + 5'' (p.77). That -may be true for simple variables, but it's not true for expressions with -side effects, like array references. For proof, try - BEGIN { - foo[rand()] += 5; for (x in foo) print x, foo[x] - bar[rand()] = bar[rand()] + 5; for (x in bar) print x, bar[x] - } -I suspect that the original statement is simply untrue--that '+=' is more -efficient in all cases. - -ADR --- Try to add something about this here for the next go 'round. -@end ignore - -@node Increment Ops, Conversion, Assignment Ops, Expressions -@section Increment Operators - -@cindex increment operators -@cindex operators, increment -@dfn{Increment operators} increase or decrease the value of a variable -by 1. You could do the same thing with an assignment operator, so -the increment operators add no power to the @code{awk} language; but they -are convenient abbreviations for something very common. - -The operator to add 1 is written @samp{++}. It can be used to increment -a variable either before or after taking its value. - -To pre-increment a variable @var{v}, write @code{++@var{v}}. This adds -1 to the value of @var{v} and that new value is also the value of this -expression. The assignment expression @code{@var{v} += 1} is completely -equivalent. - -Writing the @samp{++} after the variable specifies post-increment. This -increments the variable value just the same; the difference is that the -value of the increment expression itself is the variable's @emph{old} -value. Thus, if @code{foo} has the value 4, then the expression @code{foo++} -has the value 4, but it changes the value of @code{foo} to 5. - -The post-increment @code{foo++} is nearly equivalent to writing @code{(foo -+= 1) - 1}. It is not perfectly equivalent because all numbers in -@code{awk} are floating point: in floating point, @code{foo + 1 - 1} does -not necessarily equal @code{foo}. 
But the difference is minute as -long as you stick to numbers that are fairly small (less than a trillion). - -Any lvalue can be incremented. Fields and array elements are incremented -just like variables. (Use @samp{$(i++)} when you wish to do a field reference -and a variable increment at the same time. The parentheses are necessary -because of the precedence of the field reference operator, @samp{$}.) -@c expert information in the last parenthetical remark - -The decrement operator @samp{--} works just like @samp{++} except that -it subtracts 1 instead of adding. Like @samp{++}, it can be used before -the lvalue to pre-decrement or after it to post-decrement. - -Here is a summary of increment and decrement expressions. - -@table @code -@item ++@var{lvalue} -This expression increments @var{lvalue} and the new value becomes the -value of this expression. - -@item @var{lvalue}++ -This expression causes the contents of @var{lvalue} to be incremented. -The value of the expression is the @emph{old} value of @var{lvalue}. - -@item --@var{lvalue} -Like @code{++@var{lvalue}}, but instead of adding, it subtracts. It -decrements @var{lvalue} and delivers the value that results. - -@item @var{lvalue}-- -Like @code{@var{lvalue}++}, but instead of adding, it subtracts. It -decrements @var{lvalue}. The value of the expression is the @emph{old} -value of @var{lvalue}. -@end table - -@node Conversion, Values, Increment Ops, Expressions -@section Conversion of Strings and Numbers - -@cindex conversion of strings and numbers -Strings are converted to numbers, and numbers to strings, if the context -of the @code{awk} program demands it. For example, if the value of -either @code{foo} or @code{bar} in the expression @code{foo + bar} -happens to be a string, it is converted to a number before the addition -is performed. If numeric values appear in string concatenation, they -are converted to strings. Consider this:@refill - -@example -two = 2; three = 3 -print (two three) + 4 -@end example - -@noindent -This eventually prints the (numeric) value 27. The numeric values of -the variables @code{two} and @code{three} are converted to strings and -concatenated together, and the resulting string is converted back to the -number 23, to which 4 is then added. - -If, for some reason, you need to force a number to be converted to a -string, concatenate the null string with that number. To force a string -to be converted to a number, add zero to that string. - -A string is converted to a number by interpreting a numeric prefix -of the string as numerals: -@code{"2.5"} converts to 2.5, @code{"1e3"} converts to 1000, and @code{"25fix"} -has a numeric value of 25. -Strings that can't be interpreted as valid numbers are converted to -zero. - -@vindex CONVFMT -The exact manner in which numbers are converted into strings is controlled -by the @code{awk} built-in variable @code{CONVFMT} (@pxref{Built-in Variables}). -Numbers are converted using a special version of the @code{sprintf} function -(@pxref{Built-in, ,Built-in Functions}) with @code{CONVFMT} as the format -specifier.@refill - -@code{CONVFMT}'s default value is @code{"%.6g"}, which prints a value with -at least six significant digits. For some applications you will want to -change it to specify more precision. Double precision on most modern -machines gives you 16 or 17 decimal digits of precision. - -Strange results can happen if you set @code{CONVFMT} to a string that doesn't -tell @code{sprintf} how to format floating point numbers in a useful way. 
-For example, if you forget the @samp{%} in the format, all numbers will be -converted to the same constant string.@refill - -As a special case, if a number is an integer, then the result of converting -it to a string is @emph{always} an integer, no matter what the value of -@code{CONVFMT} may be. Given the following code fragment: - -@example -CONVFMT = "%2.2f" -a = 12 -b = a "" -@end example - -@noindent -@code{b} has the value @code{"12"}, not @code{"12.00"}. - -@ignore -For the 2.14 version, describe the ``stickyness'' of conversions. Right now -the manual assumes everywhere that variables are either numbers or strings; -in fact both kinds of values may be valid. If both happen to be valid, a -conversion isn't necessary and isn't done. Revising the manual to be -consistent with this, though, is too big a job to tackle at the moment. - -7/92: This has sort of been done, only the section isn't completely right! - What to do? -7/92: Pretty much fixed, at least for the short term, thanks to text - from David. -@end ignore - -@vindex OFMT -Prior to the @sc{posix} standard, @code{awk} specified that the value -of @code{OFMT} was used for converting numbers to strings. @code{OFMT} -specifies the output format to use when printing numbers with @code{print}. -@code{CONVFMT} was introduced in order to separate the semantics of -conversions from the semantics of printing. Both @code{CONVFMT} and -@code{OFMT} have the same default value: @code{"%.6g"}. In the vast majority -of cases, old @code{awk} programs will not change their behavior. -However, this use of @code{OFMT} is something to keep in mind if you must -port your program to other implementations of @code{awk}; we recommend -that instead of changing your programs, you just port @code{gawk} itself!@refill - -@node Values, Conditional Exp, Conversion, Expressions -@section Numeric and String Values -@cindex conversion of strings and numbers - -Through most of this manual, we present @code{awk} values (such as constants, -fields, or variables) as @emph{either} numbers @emph{or} strings. This is -a convenient way to think about them, since typically they are used in only -one way, or the other. - -In truth though, @code{awk} values can be @emph{both} string and -numeric, at the same time. Internally, @code{awk} represents values -with a string, a (floating point) number, and an indication that one, -the other, or both representations of the value are valid. - -Keeping track of both kinds of values is important for execution -efficiency: a variable can acquire a string value the first time it -is used as a string, and then that string value can be used until the -variable is assigned a new value. Thus, if a variable with only a numeric -value is used in several concatenations in a row, it only has to be given -a string representation once. The numeric value remains valid, so that -no conversion back to a number is necessary if the variable is later used -in an arithmetic expression. - -Tracking both kinds of values is also important for precise numerical -calculations. Consider the following: - -@smallexample -a = 123.321 -CONVFMT = "%3.1f" -b = a " is a number" -c = a + 1.654 -@end smallexample - -@noindent -The variable @code{a} receives a string value in the concatenation and -assignment to @code{b}. The string value of @code{a} is @code{"123.3"}. -If the numeric value was lost when it was converted to a string, then the -numeric use of @code{a} in the last statement would lose information. 
-@code{c} would be assigned the value 124.954 instead of 124.975. -Such errors accumulate rapidly, and very adversely affect numeric -computations.@refill - -Once a numeric value acquires a corresponding string value, it stays valid -until a new assignment is made. If @code{CONVFMT} -(@pxref{Conversion, ,Conversion of Strings and Numbers}) changes in the -meantime, the old string value will still be used. For example:@refill - -@smallexample -BEGIN @{ - CONVFMT = "%2.2f" - a = 123.456 - b = a "" # force `a' to have string value too - printf "a = %s\n", a - CONVFMT = "%.6g" - printf "a = %s\n", a - a += 0 # make `a' numeric only again - printf "a = %s\n", a # use `a' as string -@} -@end smallexample - -@noindent -This program prints @samp{a = 123.46} twice, and then prints -@samp{a = 123.456}. - -@xref{Conversion, ,Conversion of Strings and Numbers}, for the rules that -specify how string values are made from numeric values. - -@node Conditional Exp, Function Calls, Values, Expressions -@section Conditional Expressions -@cindex conditional expression -@cindex expression, conditional - -A @dfn{conditional expression} is a special kind of expression with -three operands. It allows you to use one expression's value to select -one of two other expressions. - -The conditional expression looks the same as in the C language: - -@example -@var{selector} ? @var{if-true-exp} : @var{if-false-exp} -@end example - -@noindent -There are three subexpressions. The first, @var{selector}, is always -computed first. If it is ``true'' (not zero and not null) then -@var{if-true-exp} is computed next and its value becomes the value of -the whole expression. Otherwise, @var{if-false-exp} is computed next -and its value becomes the value of the whole expression.@refill - -For example, this expression produces the absolute value of @code{x}: - -@example -x > 0 ? x : -x -@end example - -Each time the conditional expression is computed, exactly one of -@var{if-true-exp} and @var{if-false-exp} is computed; the other is ignored. -This is important when the expressions contain side effects. For example, -this conditional expression examines element @code{i} of either array -@code{a} or array @code{b}, and increments @code{i}. - -@example -x == y ? a[i++] : b[i++] -@end example - -@noindent -This is guaranteed to increment @code{i} exactly once, because each time -one or the other of the two increment expressions is executed, -and the other is not. - -@node Function Calls, Precedence, Conditional Exp, Expressions -@section Function Calls -@cindex function call -@cindex calling a function - -A @dfn{function} is a name for a particular calculation. Because it has -a name, you can ask for it by name at any point in the program. For -example, the function @code{sqrt} computes the square root of a number. - -A fixed set of functions are @dfn{built-in}, which means they are -available in every @code{awk} program. The @code{sqrt} function is one -of these. @xref{Built-in, ,Built-in Functions}, for a list of built-in -functions and their descriptions. In addition, you can define your own -functions in the program for use elsewhere in the same program. -@xref{User-defined, ,User-defined Functions}, for how to do this.@refill - -@cindex arguments in function call -The way to use a function is with a @dfn{function call} expression, -which consists of the function name followed by a list of -@dfn{arguments} in parentheses. The arguments are expressions which -give the raw materials for the calculation that the function will do. 
-When there is more than one argument, they are separated by commas. If -there are no arguments, write just @samp{()} after the function name. -Here are some examples: - -@example -sqrt(x^2 + y^2) # @r{One argument} -atan2(y, x) # @r{Two arguments} -rand() # @r{No arguments} -@end example - -@strong{Do not put any space between the function name and the -open-parenthesis!} A user-defined function name looks just like the name of -a variable, and space would make the expression look like concatenation -of a variable with an expression inside parentheses. Space before the -parenthesis is harmless with built-in functions, but it is best not to get -into the habit of using space to avoid mistakes with user-defined -functions. - -Each function expects a particular number of arguments. For example, the -@code{sqrt} function must be called with a single argument, the number -to take the square root of: - -@example -sqrt(@var{argument}) -@end example - -Some of the built-in functions allow you to omit the final argument. -If you do so, they use a reasonable default. -@xref{Built-in, ,Built-in Functions}, for full details. If arguments -are omitted in calls to user-defined functions, then those arguments are -treated as local variables, initialized to the null string -(@pxref{User-defined, ,User-defined Functions}).@refill - -Like every other expression, the function call has a value, which is -computed by the function based on the arguments you give it. In this -example, the value of @code{sqrt(@var{argument})} is the square root of the -argument. A function can also have side effects, such as assigning the -values of certain variables or doing I/O. - -Here is a command to read numbers, one number per line, and print the -square root of each one: - -@example -awk '@{ print "The square root of", $1, "is", sqrt($1) @}' -@end example - -@node Precedence, , Function Calls, Expressions -@section Operator Precedence (How Operators Nest) -@cindex precedence -@cindex operator precedence - -@dfn{Operator precedence} determines how operators are grouped, when -different operators appear close by in one expression. For example, -@samp{*} has higher precedence than @samp{+}; thus, @code{a + b * c} -means to multiply @code{b} and @code{c}, and then add @code{a} to the -product (i.e., @code{a + (b * c)}). - -You can overrule the precedence of the operators by using parentheses. -You can think of the precedence rules as saying where the -parentheses are assumed if you do not write parentheses yourself. In -fact, it is wise to always use parentheses whenever you have an unusual -combination of operators, because other people who read the program may -not remember what the precedence is in this case. You might forget, -too; then you could make a mistake. Explicit parentheses will help prevent -any such mistake. - -When operators of equal precedence are used together, the leftmost -operator groups first, except for the assignment, conditional and -exponentiation operators, which group in the opposite order. -Thus, @code{a - b + c} groups as @code{(a - b) + c}; -@code{a = b = c} groups as @code{a = (b = c)}.@refill - -The precedence of prefix unary operators does not matter as long as only -unary operators are involved, because there is only one way to parse -them---innermost first. Thus, @code{$++i} means @code{$(++i)} and -@code{++$x} means @code{++($x)}. However, when another operator follows -the operand, then the precedence of the unary operators can matter. 
-Thus, @code{$x^2} means @code{($x)^2}, but @code{-x^2} means -@code{-(x^2)}, because @samp{-} has lower precedence than @samp{^} -while @samp{$} has higher precedence. - -Here is a table of the operators of @code{awk}, in order of increasing -precedence: - -@table @asis -@item assignment -@samp{=}, @samp{+=}, @samp{-=}, @samp{*=}, @samp{/=}, @samp{%=}, -@samp{^=}, @samp{**=}. These operators group right-to-left. -(The @samp{**=} operator is not specified by @sc{posix}.) - -@item conditional -@samp{?:}. This operator groups right-to-left. - -@item logical ``or''. -@samp{||}. - -@item logical ``and''. -@samp{&&}. - -@item array membership -@samp{in}. - -@item matching -@samp{~}, @samp{!~}. - -@item relational, and redirection -The relational operators and the redirections have the same precedence -level. Characters such as @samp{>} serve both as relationals and as -redirections; the context distinguishes between the two meanings. - -The relational operators are @samp{<}, @samp{<=}, @samp{==}, @samp{!=}, -@samp{>=} and @samp{>}. - -The I/O redirection operators are @samp{<}, @samp{>}, @samp{>>} and -@samp{|}. - -Note that I/O redirection operators in @code{print} and @code{printf} -statements belong to the statement level, not to expressions. The -redirection does not produce an expression which could be the operand of -another operator. As a result, it does not make sense to use a -redirection operator near another operator of lower precedence, without -parentheses. Such combinations, for example @samp{print foo > a ? b : -c}, result in syntax errors. - -@item concatenation -No special token is used to indicate concatenation. -The operands are simply written side by side. - -@item add, subtract -@samp{+}, @samp{-}. - -@item multiply, divide, mod -@samp{*}, @samp{/}, @samp{%}. - -@item unary plus, minus, ``not'' -@samp{+}, @samp{-}, @samp{!}. - -@item exponentiation -@samp{^}, @samp{**}. These operators group right-to-left. -(The @samp{**} operator is not specified by @sc{posix}.) - -@item increment, decrement -@samp{++}, @samp{--}. - -@item field -@samp{$}. -@end table - -@node Statements, Arrays, Expressions, Top -@chapter Control Statements in Actions -@cindex control statement - -@dfn{Control statements} such as @code{if}, @code{while}, and so on -control the flow of execution in @code{awk} programs. Most of the -control statements in @code{awk} are patterned on similar statements in -C. - -All the control statements start with special keywords such as @code{if} -and @code{while}, to distinguish them from simple expressions. - -Many control statements contain other statements; for example, the -@code{if} statement contains another statement which may or may not be -executed. The contained statement is called the @dfn{body}. If you -want to include more than one statement in the body, group them into a -single compound statement with curly braces, separating them with -newlines or semicolons. - -@menu -* If Statement:: Conditionally execute - some @code{awk} statements. -* While Statement:: Loop until some condition is satisfied. -* Do Statement:: Do specified action while looping until some - condition is satisfied. -* For Statement:: Another looping statement, that provides - initialization and increment clauses. -* Break Statement:: Immediately exit the innermost enclosing loop. -* Continue Statement:: Skip to the end of the innermost - enclosing loop. -* Next Statement:: Stop processing the current input record. -* Next File Statement:: Stop processing the current file. 
-* Exit Statement:: Stop execution of @code{awk}. -@end menu - -@node If Statement, While Statement, Statements, Statements -@section The @code{if} Statement - -@cindex @code{if} statement -The @code{if}-@code{else} statement is @code{awk}'s decision-making -statement. It looks like this:@refill - -@example -if (@var{condition}) @var{then-body} @r{[}else @var{else-body}@r{]} -@end example - -@noindent -@var{condition} is an expression that controls what the rest of the -statement will do. If @var{condition} is true, @var{then-body} is -executed; otherwise, @var{else-body} is executed (assuming that the -@code{else} clause is present). The @code{else} part of the statement is -optional. The condition is considered false if its value is zero or -the null string, and true otherwise.@refill - -Here is an example: - -@example -if (x % 2 == 0) - print "x is even" -else - print "x is odd" -@end example - -In this example, if the expression @code{x % 2 == 0} is true (that is, -the value of @code{x} is divisible by 2), then the first @code{print} -statement is executed, otherwise the second @code{print} statement is -performed.@refill - -If the @code{else} appears on the same line as @var{then-body}, and -@var{then-body} is not a compound statement (i.e., not surrounded by -curly braces), then a semicolon must separate @var{then-body} from -@code{else}. To illustrate this, let's rewrite the previous example: - -@example -awk '@{ if (x % 2 == 0) print "x is even"; else - print "x is odd" @}' -@end example - -@noindent -If you forget the @samp{;}, @code{awk} won't be able to parse the -statement, and you will get a syntax error. - -We would not actually write this example this way, because a human -reader might fail to see the @code{else} if it were not the first thing -on its line. - -@node While Statement, Do Statement, If Statement, Statements -@section The @code{while} Statement -@cindex @code{while} statement -@cindex loop -@cindex body of a loop - -In programming, a @dfn{loop} means a part of a program that is (or at least can -be) executed two or more times in succession. - -The @code{while} statement is the simplest looping statement in -@code{awk}. It repeatedly executes a statement as long as a condition is -true. It looks like this: - -@example -while (@var{condition}) - @var{body} -@end example - -@noindent -Here @var{body} is a statement that we call the @dfn{body} of the loop, -and @var{condition} is an expression that controls how long the loop -keeps running. - -The first thing the @code{while} statement does is test @var{condition}. -If @var{condition} is true, it executes the statement @var{body}. -(@var{condition} is true when the value -is not zero and not a null string.) After @var{body} has been executed, -@var{condition} is tested again, and if it is still true, @var{body} is -executed again. This process repeats until @var{condition} is no longer -true. If @var{condition} is initially false, the body of the loop is -never executed.@refill - -This example prints the first three fields of each record, one per line. - -@example -awk '@{ i = 1 - while (i <= 3) @{ - print $i - i++ - @} -@}' -@end example - -@noindent -Here the body of the loop is a compound statement enclosed in braces, -containing two statements. - -The loop works like this: first, the value of @code{i} is set to 1. -Then, the @code{while} tests whether @code{i} is less than or equal to -three. This is the case when @code{i} equals one, so the @code{i}-th -field is printed. 
Then the @code{i++} increments the value of @code{i} -and the loop repeats. The loop terminates when @code{i} reaches 4. - -As you can see, a newline is not required between the condition and the -body; but using one makes the program clearer unless the body is a -compound statement or is very simple. The newline after the open-brace -that begins the compound statement is not required either, but the -program would be hard to read without it. - -@node Do Statement, For Statement, While Statement, Statements -@section The @code{do}-@code{while} Statement - -The @code{do} loop is a variation of the @code{while} looping statement. -The @code{do} loop executes the @var{body} once, then repeats @var{body} -as long as @var{condition} is true. It looks like this: - -@example -do - @var{body} -while (@var{condition}) -@end example - -Even if @var{condition} is false at the start, @var{body} is executed at -least once (and only once, unless executing @var{body} makes -@var{condition} true). Contrast this with the corresponding -@code{while} statement: - -@example -while (@var{condition}) - @var{body} -@end example - -@noindent -This statement does not execute @var{body} even once if @var{condition} -is false to begin with. - -Here is an example of a @code{do} statement: - -@example -awk '@{ i = 1 - do @{ - print $0 - i++ - @} while (i <= 10) -@}' -@end example - -@noindent -prints each input record ten times. It isn't a very realistic example, -since in this case an ordinary @code{while} would do just as well. But -this reflects actual experience; there is only occasionally a real use -for a @code{do} statement.@refill - -@node For Statement, Break Statement, Do Statement, Statements -@section The @code{for} Statement -@cindex @code{for} statement - -The @code{for} statement makes it more convenient to count iterations of a -loop. The general form of the @code{for} statement looks like this:@refill - -@example -for (@var{initialization}; @var{condition}; @var{increment}) - @var{body} -@end example - -@noindent -This statement starts by executing @var{initialization}. Then, as long -as @var{condition} is true, it repeatedly executes @var{body} and then -@var{increment}. Typically @var{initialization} sets a variable to -either zero or one, @var{increment} adds 1 to it, and @var{condition} -compares it against the desired number of iterations. - -Here is an example of a @code{for} statement: - -@example -@group -awk '@{ for (i = 1; i <= 3; i++) - print $i -@}' -@end group -@end example - -@noindent -This prints the first three fields of each input record, one field per -line. - -In the @code{for} statement, @var{body} stands for any statement, but -@var{initialization}, @var{condition} and @var{increment} are just -expressions. You cannot set more than one variable in the -@var{initialization} part unless you use a multiple assignment statement -such as @code{x = y = 0}, which is possible only if all the initial values -are equal. (But you can initialize additional variables by writing -their assignments as separate statements preceding the @code{for} loop.) - -The same is true of the @var{increment} part; to increment additional -variables, you must write separate statements at the end of the loop. -The C compound expression, using C's comma operator, would be useful in -this context, but it is not supported in @code{awk}. - -Most often, @var{increment} is an increment expression, as in the -example above. But this is not required; it can be any expression -whatever. 
For example, this statement prints all the powers of 2 -between 1 and 100: - -@example -for (i = 1; i <= 100; i *= 2) - print i -@end example - -Any of the three expressions in the parentheses following the @code{for} may -be omitted if there is nothing to be done there. Thus, @w{@samp{for (;x -> 0;)}} is equivalent to @w{@samp{while (x > 0)}}. If the -@var{condition} is omitted, it is treated as @var{true}, effectively -yielding an @dfn{infinite loop} (i.e., a loop that will never -terminate).@refill - -In most cases, a @code{for} loop is an abbreviation for a @code{while} -loop, as shown here: - -@example -@var{initialization} -while (@var{condition}) @{ - @var{body} - @var{increment} -@} -@end example - -@noindent -The only exception is when the @code{continue} statement -(@pxref{Continue Statement, ,The @code{continue} Statement}) is used -inside the loop; changing a @code{for} statement to a @code{while} -statement in this way can change the effect of the @code{continue} -statement inside the loop.@refill - -There is an alternate version of the @code{for} loop, for iterating over -all the indices of an array: - -@example -for (i in array) - @var{do something with} array[i] -@end example - -@noindent -@xref{Arrays, ,Arrays in @code{awk}}, for more information on this -version of the @code{for} loop. - -The @code{awk} language has a @code{for} statement in addition to a -@code{while} statement because often a @code{for} loop is both less work to -type and more natural to think of. Counting the number of iterations is -very common in loops. It can be easier to think of this counting as part -of looping rather than as something to do inside the loop. - -The next section has more complicated examples of @code{for} loops. - -@node Break Statement, Continue Statement, For Statement, Statements -@section The @code{break} Statement -@cindex @code{break} statement -@cindex loops, exiting - -The @code{break} statement jumps out of the innermost @code{for}, -@code{while}, or @code{do}-@code{while} loop that encloses it. The -following example finds the smallest divisor of any integer, and also -identifies prime numbers:@refill - -@smallexample -awk '# find smallest divisor of num - @{ num = $1 - for (div = 2; div*div <= num; div++) - if (num % div == 0) - break - if (num % div == 0) - printf "Smallest divisor of %d is %d\n", num, div - else - printf "%d is prime\n", num @}' -@end smallexample - -When the remainder is zero in the first @code{if} statement, @code{awk} -immediately @dfn{breaks out} of the containing @code{for} loop. This means -that @code{awk} proceeds immediately to the statement following the loop -and continues processing. (This is very different from the @code{exit} -statement which stops the entire @code{awk} program. -@xref{Exit Statement, ,The @code{exit} Statement}.)@refill - -Here is another program equivalent to the previous one. 
It illustrates how -the @var{condition} of a @code{for} or @code{while} could just as well be -replaced with a @code{break} inside an @code{if}: - -@smallexample -@group -awk '# find smallest divisor of num - @{ num = $1 - for (div = 2; ; div++) @{ - if (num % div == 0) @{ - printf "Smallest divisor of %d is %d\n", num, div - break - @} - if (div*div > num) @{ - printf "%d is prime\n", num - break - @} - @} -@}' -@end group -@end smallexample - -@node Continue Statement, Next Statement, Break Statement, Statements -@section The @code{continue} Statement - -@cindex @code{continue} statement -The @code{continue} statement, like @code{break}, is used only inside -@code{for}, @code{while}, and @code{do}-@code{while} loops. It skips -over the rest of the loop body, causing the next cycle around the loop -to begin immediately. Contrast this with @code{break}, which jumps out -of the loop altogether. Here is an example:@refill - -@example -# print names that don't contain the string "ignore" - -# first, save the text of each line -@{ names[NR] = $0 @} - -# print what we're interested in -END @{ - for (x in names) @{ - if (names[x] ~ /ignore/) - continue - print names[x] - @} -@} -@end example - -If one of the input records contains the string @samp{ignore}, this -example skips the print statement for that record, and continues back to -the first statement in the loop. - -This is not a practical example of @code{continue}, since it would be -just as easy to write the loop like this: - -@example -for (x in names) - if (names[x] !~ /ignore/) - print names[x] -@end example - -@ignore -from brennan@boeing.com: - -page 90, section 9.6. The example is too artificial as -the one line program - - !/ignore/ - -does the same thing. -@end ignore -@c ADR --- he's right, but don't worry about this for now - -The @code{continue} statement in a @code{for} loop directs @code{awk} to -skip the rest of the body of the loop, and resume execution with the -increment-expression of the @code{for} statement. The following program -illustrates this fact:@refill - -@example -awk 'BEGIN @{ - for (x = 0; x <= 20; x++) @{ - if (x == 5) - continue - printf ("%d ", x) - @} - print "" -@}' -@end example - -@noindent -This program prints all the numbers from 0 to 20, except for 5, for -which the @code{printf} is skipped. Since the increment @code{x++} -is not skipped, @code{x} does not remain stuck at 5. Contrast the -@code{for} loop above with the @code{while} loop: - -@example -awk 'BEGIN @{ - x = 0 - while (x <= 20) @{ - if (x == 5) - continue - printf ("%d ", x) - x++ - @} - print "" -@}' -@end example - -@noindent -This program loops forever once @code{x} gets to 5. - -As described above, the @code{continue} statement has no meaning when -used outside the body of a loop. However, although it was never documented, -historical implementations of @code{awk} have treated the @code{continue} -statement outside of a loop as if it were a @code{next} statement -(@pxref{Next Statement, ,The @code{next} Statement}). -By default, @code{gawk} silently supports this usage. 
However, if
-@samp{-W posix} has been specified on the command line
-(@pxref{Command Line, ,Invoking @code{awk}}),
-it will be treated as an error, since the @sc{posix} standard specifies
-that @code{continue} should only be used inside the body of a loop.@refill
-
-@node Next Statement, Next File Statement, Continue Statement, Statements
-@section The @code{next} Statement
-@cindex @code{next} statement
-
-The @code{next} statement forces @code{awk} to immediately stop processing
-the current record and go on to the next record. This means that no
-further rules are executed for the current record. The rest of the
-current rule's action is not executed either.
-
-Contrast this with the effect of the @code{getline} function
-(@pxref{Getline, ,Explicit Input with @code{getline}}). That too causes
-@code{awk} to read the next record immediately, but it does not alter the
-flow of control in any way. So the rest of the current action executes
-with a new input record.
-
-At the highest level, @code{awk} program execution is a loop that reads
-an input record and then tests each rule's pattern against it. If you
-think of this loop as a @code{for} statement whose body contains the
-rules, then the @code{next} statement is analogous to a @code{continue}
-statement: it skips to the end of the body of this implicit loop, and
-executes the increment (which reads another record).
-
-For example, if your @code{awk} program works only on records with four
-fields, and you don't want it to fail when given bad input, you might
-use this rule near the beginning of the program:
-
-@smallexample
-NF != 4 @{
-       printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
-       next
-@}
-@end smallexample
-
-@noindent
-so that the following rules will not see the bad record. The error
-message is redirected to the standard error output stream, as error
-messages should be. @xref{Special Files, ,Standard I/O Streams}.
-
-According to the @sc{posix} standard, the behavior is undefined if
-the @code{next} statement is used in a @code{BEGIN} or @code{END} rule.
-@code{gawk} will treat it as a syntax error.
-
-If the @code{next} statement causes the end of the input to be reached,
-then the code in the @code{END} rules, if any, will be executed.
-@xref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}.
-
-@node Next File Statement, Exit Statement, Next Statement, Statements
-@section The @code{next file} Statement
-
-@cindex @code{next file} statement
-The @code{next file} statement is similar to the @code{next} statement.
-However, instead of abandoning processing of the current record, the
-@code{next file} statement instructs @code{awk} to stop processing the
-current data file.
-
-Upon execution of the @code{next file} statement, @code{FILENAME} is
-updated to the name of the next data file listed on the command line,
-@code{FNR} is reset to 1, and processing starts over with the first
-rule in the program. @xref{Built-in Variables}.
-
-If the @code{next file} statement causes the end of the input to be reached,
-then the code in the @code{END} rules, if any, will be executed.
-@xref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}.
-
-The @code{next file} statement is a @code{gawk} extension; it is not
-(currently) available in any other @code{awk} implementation. You can
-simulate its behavior by creating a library file named @file{nextfile.awk},
-with the following contents. (This sample program uses user-defined
-functions, a feature that has not been presented yet.
-@xref{User-defined, ,User-defined Functions},
-for more information.)@refill
-
-@smallexample
-# nextfile --- function to skip remaining records in current file
-
-# this should be read in before the "main" awk program
-
-function nextfile() @{ _abandon_ = FILENAME; next @}
-
-_abandon_ == FILENAME && FNR > 1 @{ next @}
-_abandon_ == FILENAME && FNR == 1 @{ _abandon_ = "" @}
-@end smallexample
-
-The @code{nextfile} function simply sets a ``private'' variable@footnote{Since
-all variables in @code{awk} are global, this program uses the common
-practice of prefixing the variable name with an underscore. In fact, it
-also suffixes the variable name with an underscore, as extra insurance
-against using a variable name that might be used in some other library
-file.} to the name of the current data file, and then retrieves the next
-record. Since this file is read before the main @code{awk} program,
-the rules that follow the function definition will be executed before the
-rules in the main program. The first rule continues to skip records as long as
-the name of the input file has not changed, and this is not the first
-record in the file. This rule is sufficient most of the time. But what if
-the @emph{same} data file is named twice in a row on the command line?
-This rule would not process the data file the second time. The second rule
-catches this case: If the data file name is what was being skipped, but
-@code{FNR} is 1, then this is the second time the file is being processed,
-and it should not be skipped.
-
-The @code{next file} statement would be useful if you have many data
-files to process, and due to the nature of the data, you expect that you
-would not want to process every record in the file. In order to move on to
-the next data file, you would have to continue scanning the unwanted
-records (as described above). The @code{next file} statement accomplishes
-this much more efficiently.
-
-@ignore
-Would it make sense down the road to nuke `next file' in favor of
-semantics that would make this work?
-
-        function nextfile() { ARGIND++ ; next }
-@end ignore
-
-@node Exit Statement, , Next File Statement, Statements
-@section The @code{exit} Statement
-
-@cindex @code{exit} statement
-The @code{exit} statement causes @code{awk} to immediately stop
-executing the current rule and to stop processing input; any remaining input
-is ignored.@refill
-
-If an @code{exit} statement is executed from a @code{BEGIN} rule, the
-program stops processing everything immediately. No input records are
-read. However, if an @code{END} rule is present, it is executed
-(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}).
-
-If @code{exit} is used as part of an @code{END} rule, it causes
-the program to stop immediately.
-
-An @code{exit} statement that is part of an ordinary rule (that is, not part
-of a @code{BEGIN} or @code{END} rule) stops the execution of any further
-automatic rules, but the @code{END} rule is executed if there is one.
-If you do not want the @code{END} rule to do its job in this case, you
-can set a variable to nonzero before the @code{exit} statement, and check
-that variable in the @code{END} rule.
-
-If an argument is supplied to @code{exit}, its value is used as the exit
-status code for the @code{awk} process. If no argument is supplied,
-@code{exit} returns status zero (success).@refill
-
-For example, let's say you've discovered an error condition you really
-don't know how to handle.
Conventionally, programs report this by -exiting with a nonzero status. Your @code{awk} program can do this -using an @code{exit} statement with a nonzero argument. Here's an -example of this:@refill - -@example -@group -BEGIN @{ - if (("date" | getline date_now) < 0) @{ - print "Can't get system date" > "/dev/stderr" - exit 4 - @} -@} -@end group -@end example - -@node Arrays, Built-in, Statements, Top -@chapter Arrays in @code{awk} - -An @dfn{array} is a table of values, called @dfn{elements}. The -elements of an array are distinguished by their indices. @dfn{Indices} -may be either numbers or strings. Each array has a name, which looks -like a variable name, but must not be in use as a variable name in the -same @code{awk} program. - -@menu -* Array Intro:: Introduction to Arrays -* Reference to Elements:: How to examine one element of an array. -* Assigning Elements:: How to change an element of an array. -* Array Example:: Basic Example of an Array -* Scanning an Array:: A variation of the @code{for} statement. - It loops through the indices of - an array's existing elements. -* Delete:: The @code{delete} statement removes - an element from an array. -* Numeric Array Subscripts:: How to use numbers as subscripts in @code{awk}. -* Multi-dimensional:: Emulating multi-dimensional arrays in @code{awk}. -* Multi-scanning:: Scanning multi-dimensional arrays. -@end menu - -@node Array Intro, Reference to Elements, Arrays, Arrays -@section Introduction to Arrays - -@cindex arrays -The @code{awk} language has one-dimensional @dfn{arrays} for storing groups -of related strings or numbers. - -Every @code{awk} array must have a name. Array names have the same -syntax as variable names; any valid variable name would also be a valid -array name. But you cannot use one name in both ways (as an array and -as a variable) in one @code{awk} program. - -Arrays in @code{awk} superficially resemble arrays in other programming -languages; but there are fundamental differences. In @code{awk}, you -don't need to specify the size of an array before you start to use it. -Additionally, any number or string in @code{awk} may be used as an -array index. - -In most other languages, you have to @dfn{declare} an array and specify -how many elements or components it contains. In such languages, the -declaration causes a contiguous block of memory to be allocated for that -many elements. An index in the array must be a positive integer; for -example, the index 0 specifies the first element in the array, which is -actually stored at the beginning of the block of memory. Index 1 -specifies the second element, which is stored in memory right after the -first element, and so on. It is impossible to add more elements to the -array, because it has room for only as many elements as you declared. - -A contiguous array of four elements might look like this, -conceptually, if the element values are @code{8}, @code{"foo"}, -@code{""} and @code{30}:@refill - -@example -+---------+---------+--------+---------+ -| 8 | "foo" | "" | 30 | @r{value} -+---------+---------+--------+---------+ - 0 1 2 3 @r{index} -@end example - -@noindent -Only the values are stored; the indices are implicit from the order of -the values. @code{8} is the value at index 0, because @code{8} appears in the -position with 0 elements before it. - -@cindex arrays, definition of -@cindex associative arrays -Arrays in @code{awk} are different: they are @dfn{associative}. 
This means -that each array is a collection of pairs: an index, and its corresponding -array element value: - -@example -@r{Element} 4 @r{Value} 30 -@r{Element} 2 @r{Value} "foo" -@r{Element} 1 @r{Value} 8 -@r{Element} 3 @r{Value} "" -@end example - -@noindent -We have shown the pairs in jumbled order because their order is irrelevant. - -One advantage of an associative array is that new pairs can be added -at any time. For example, suppose we add to the above array a tenth element -whose value is @w{@code{"number ten"}}. The result is this: - -@example -@r{Element} 10 @r{Value} "number ten" -@r{Element} 4 @r{Value} 30 -@r{Element} 2 @r{Value} "foo" -@r{Element} 1 @r{Value} 8 -@r{Element} 3 @r{Value} "" -@end example - -@noindent -Now the array is @dfn{sparse} (i.e., some indices are missing): it has -elements 1--4 and 10, but doesn't have elements 5, 6, 7, 8, or 9.@refill - -Another consequence of associative arrays is that the indices don't -have to be positive integers. Any number, or even a string, can be -an index. For example, here is an array which translates words from -English into French: - -@example -@r{Element} "dog" @r{Value} "chien" -@r{Element} "cat" @r{Value} "chat" -@r{Element} "one" @r{Value} "un" -@r{Element} 1 @r{Value} "un" -@end example - -@noindent -Here we decided to translate the number 1 in both spelled-out and -numeric form---thus illustrating that a single array can have both -numbers and strings as indices. - -When @code{awk} creates an array for you, e.g., with the @code{split} -built-in function, -that array's indices are consecutive integers starting at 1. -(@xref{String Functions, ,Built-in Functions for String Manipulation}.) - -@node Reference to Elements, Assigning Elements, Array Intro, Arrays -@section Referring to an Array Element -@cindex array reference -@cindex element of array -@cindex reference to array - -The principal way of using an array is to refer to one of its elements. -An array reference is an expression which looks like this: - -@example -@var{array}[@var{index}] -@end example - -@noindent -Here, @var{array} is the name of an array. The expression @var{index} is -the index of the element of the array that you want. - -The value of the array reference is the current value of that array -element. For example, @code{foo[4.3]} is an expression for the element -of array @code{foo} at index 4.3. - -If you refer to an array element that has no recorded value, the value -of the reference is @code{""}, the null string. This includes elements -to which you have not assigned any value, and elements that have been -deleted (@pxref{Delete, ,The @code{delete} Statement}). Such a reference -automatically creates that array element, with the null string as its value. -(In some cases, this is unfortunate, because it might waste memory inside -@code{awk}). - -@cindex arrays, presence of elements -You can find out if an element exists in an array at a certain index with -the expression: - -@example -@var{index} in @var{array} -@end example - -@noindent -This expression tests whether or not the particular index exists, -without the side effect of creating that element if it is not present. -The expression has the value 1 (true) if @code{@var{array}[@var{index}]} -exists, and 0 (false) if it does not exist.@refill - -For example, to test whether the array @code{frequencies} contains the -index @code{"2"}, you could write this statement:@refill - -@smallexample -if ("2" in frequencies) print "Subscript \"2\" is present." 
-@end smallexample - -Note that this is @emph{not} a test of whether or not the array -@code{frequencies} contains an element whose @emph{value} is @code{"2"}. -(There is no way to do that except to scan all the elements.) Also, this -@emph{does not} create @code{frequencies["2"]}, while the following -(incorrect) alternative would do so:@refill - -@smallexample -if (frequencies["2"] != "") print "Subscript \"2\" is present." -@end smallexample - -@node Assigning Elements, Array Example, Reference to Elements, Arrays -@section Assigning Array Elements -@cindex array assignment -@cindex element assignment - -Array elements are lvalues: they can be assigned values just like -@code{awk} variables: - -@example -@var{array}[@var{subscript}] = @var{value} -@end example - -@noindent -Here @var{array} is the name of your array. The expression -@var{subscript} is the index of the element of the array that you want -to assign a value. The expression @var{value} is the value you are -assigning to that element of the array.@refill - -@node Array Example, Scanning an Array, Assigning Elements, Arrays -@section Basic Example of an Array - -The following program takes a list of lines, each beginning with a line -number, and prints them out in order of line number. The line numbers are -not in order, however, when they are first read: they are scrambled. This -program sorts the lines by making an array using the line numbers as -subscripts. It then prints out the lines in sorted order of their numbers. -It is a very simple program, and gets confused if it encounters repeated -numbers, gaps, or lines that don't begin with a number.@refill - -@example -@{ - if ($1 > max) - max = $1 - arr[$1] = $0 -@} - -END @{ - for (x = 1; x <= max; x++) - print arr[x] -@} -@end example - -The first rule keeps track of the largest line number seen so far; -it also stores each line into the array @code{arr}, at an index that -is the line's number. - -The second rule runs after all the input has been read, to print out -all the lines. - -When this program is run with the following input: - -@example -5 I am the Five man -2 Who are you? The new number two! -4 . . . And four on the floor -1 Who is number one? -3 I three you. -@end example - -@noindent -its output is this: - -@example -1 Who is number one? -2 Who are you? The new number two! -3 I three you. -4 . . . And four on the floor -5 I am the Five man -@end example - -If a line number is repeated, the last line with a given number overrides -the others. - -Gaps in the line numbers can be handled with an easy improvement to the -program's @code{END} rule: - -@example -END @{ - for (x = 1; x <= max; x++) - if (x in arr) - print arr[x] -@} -@end example - -@node Scanning an Array, Delete, Array Example, Arrays -@section Scanning all Elements of an Array -@cindex @code{for (x in @dots{})} -@cindex arrays, special @code{for} statement -@cindex scanning an array - -In programs that use arrays, often you need a loop that executes -once for each element of an array. In other languages, where arrays are -contiguous and indices are limited to positive integers, this is -easy: the largest index is one less than the length of the array, and you can -find all the valid indices by counting from zero up to that value. This -technique won't do the job in @code{awk}, since any number or string -may be an array index. 
So @code{awk} has a special kind of @code{for} -statement for scanning an array: - -@example -for (@var{var} in @var{array}) - @var{body} -@end example - -@noindent -This loop executes @var{body} once for each different value that your -program has previously used as an index in @var{array}, with the -variable @var{var} set to that index.@refill - -Here is a program that uses this form of the @code{for} statement. The -first rule scans the input records and notes which words appear (at -least once) in the input, by storing a 1 into the array @code{used} with -the word as index. The second rule scans the elements of @code{used} to -find all the distinct words that appear in the input. It prints each -word that is more than 10 characters long, and also prints the number of -such words. @xref{Built-in, ,Built-in Functions}, for more information -on the built-in function @code{length}. - -@smallexample -# Record a 1 for each word that is used at least once. -@{ - for (i = 1; i <= NF; i++) - used[$i] = 1 -@} - -# Find number of distinct words more than 10 characters long. -END @{ - for (x in used) - if (length(x) > 10) @{ - ++num_long_words - print x - @} - print num_long_words, "words longer than 10 characters" -@} -@end smallexample - -@noindent -@xref{Sample Program}, for a more detailed example of this type. - -The order in which elements of the array are accessed by this statement -is determined by the internal arrangement of the array elements within -@code{awk} and cannot be controlled or changed. This can lead to -problems if new elements are added to @var{array} by statements in -@var{body}; you cannot predict whether or not the @code{for} loop will -reach them. Similarly, changing @var{var} inside the loop can produce -strange results. It is best to avoid such things.@refill - -@node Delete, Numeric Array Subscripts, Scanning an Array, Arrays -@section The @code{delete} Statement -@cindex @code{delete} statement -@cindex deleting elements of arrays -@cindex removing elements of arrays -@cindex arrays, deleting an element - -You can remove an individual element of an array using the @code{delete} -statement: - -@example -delete @var{array}[@var{index}] -@end example - -You can not refer to an array element after it has been deleted; -it is as if you had never referred -to it and had never given it any value. You can no longer obtain any -value the element once had. - -Here is an example of deleting elements in an array: - -@example -for (i in frequencies) - delete frequencies[i] -@end example - -@noindent -This example removes all the elements from the array @code{frequencies}. - -If you delete an element, a subsequent @code{for} statement to scan the array -will not report that element, and the @code{in} operator to check for -the presence of that element will return 0: - -@example -delete foo[4] -if (4 in foo) - print "This will never be printed" -@end example - -It is not an error to delete an element which does not exist. - -@node Numeric Array Subscripts, Multi-dimensional, Delete, Arrays -@section Using Numbers to Subscript Arrays - -An important aspect of arrays to remember is that array subscripts -are @emph{always} strings. If you use a numeric value as a subscript, -it will be converted to a string value before it is used for subscripting -(@pxref{Conversion, ,Conversion of Strings and Numbers}). 
- -@cindex conversions, during subscripting -@cindex numbers, used as subscripts -@vindex CONVFMT -This means that the value of the @code{CONVFMT} can potentially -affect how your program accesses elements of an array. For example: - -@example -a = b = 12.153 -data[a] = 1 -CONVFMT = "%2.2f" -if (b in data) - printf "%s is in data", b -else - printf "%s is not in data", b -@end example - -@noindent -should print @samp{12.15 is not in data}. The first statement gives -both @code{a} and @code{b} the same numeric value. Assigning to -@code{data[a]} first gives @code{a} the string value @code{"12.153"} -(using the default conversion value of @code{CONVFMT}, @code{"%.6g"}), -and then assigns 1 to @code{data["12.153"]}. The program then changes -the value of @code{CONVFMT}. The test @samp{(b in data)} forces @code{b} -to be converted to a string, this time @code{"12.15"}, since the value of -@code{CONVFMT} only allows two significant digits. This test fails, -since @code{"12.15"} is a different string from @code{"12.153"}.@refill - -According to the rules for conversions -(@pxref{Conversion, ,Conversion of Strings and Numbers}), integer -values are always converted to strings as integers, no matter what the -value of @code{CONVFMT} may happen to be. So the usual case of@refill - -@example -for (i = 1; i <= maxsub; i++) - @i{do something with} array[i] -@end example - -@noindent -will work, no matter what the value of @code{CONVFMT}. - -Like many things in @code{awk}, the majority of the time things work -as you would expect them to work. But it is useful to have a precise -knowledge of the actual rules, since sometimes they can have a subtle -effect on your programs. - -@node Multi-dimensional, Multi-scanning, Numeric Array Subscripts, Arrays -@section Multi-dimensional Arrays - -@c the following index entry is an overfull hbox. --mew 30jan1992 -@cindex subscripts in arrays -@cindex arrays, multi-dimensional subscripts -@cindex multi-dimensional subscripts -A multi-dimensional array is an array in which an element is identified -by a sequence of indices, not a single index. For example, a -two-dimensional array requires two indices. The usual way (in most -languages, including @code{awk}) to refer to an element of a -two-dimensional array named @code{grid} is with -@code{grid[@var{x},@var{y}]}. - -@vindex SUBSEP -Multi-dimensional arrays are supported in @code{awk} through -concatenation of indices into one string. What happens is that -@code{awk} converts the indices into strings -(@pxref{Conversion, ,Conversion of Strings and Numbers}) and -concatenates them together, with a separator between them. This creates -a single string that describes the values of the separate indices. The -combined string is used as a single index into an ordinary, -one-dimensional array. The separator used is the value of the built-in -variable @code{SUBSEP}.@refill - -For example, suppose we evaluate the expression @code{foo[5,12]="value"} -when the value of @code{SUBSEP} is @code{"@@"}. The numbers 5 and 12 are -converted to strings and -concatenated with an @samp{@@} between them, yielding @code{"5@@12"}; thus, -the array element @code{foo["5@@12"]} is set to @code{"value"}.@refill - -Once the element's value is stored, @code{awk} has no record of whether -it was stored with a single index or a sequence of indices. The two -expressions @code{foo[5,12]} and @w{@code{foo[5 SUBSEP 12]}} always have -the same value. 
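-
-For example, the following short program (a minimal sketch of the idea)
-stores one element using the comma notation and then retrieves it both
-ways; both @code{print} statements should output @samp{value}:
-
-@example
-awk 'BEGIN @{
-  foo[5,12] = "value"
-  print foo[5,12]          # comma notation
-  print foo[5 SUBSEP 12]   # explicit concatenation with SUBSEP
-@}'
-@end example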
- -The default value of @code{SUBSEP} is the string @code{"\034"}, -which contains a nonprinting character that is unlikely to appear in an -@code{awk} program or in the input data. - -The usefulness of choosing an unlikely character comes from the fact -that index values that contain a string matching @code{SUBSEP} lead to -combined strings that are ambiguous. Suppose that @code{SUBSEP} were -@code{"@@"}; then @w{@code{foo["a@@b", "c"]}} and @w{@code{foo["a", -"b@@c"]}} would be indistinguishable because both would actually be -stored as @code{foo["a@@b@@c"]}. Because @code{SUBSEP} is -@code{"\034"}, such confusion can arise only when an index -contains the character with ASCII code 034, which is a rare -event.@refill - -You can test whether a particular index-sequence exists in a -``multi-dimensional'' array with the same operator @code{in} used for single -dimensional arrays. Instead of a single index as the left-hand operand, -write the whole sequence of indices, separated by commas, in -parentheses:@refill - -@example -(@var{subscript1}, @var{subscript2}, @dots{}) in @var{array} -@end example - -The following example treats its input as a two-dimensional array of -fields; it rotates this array 90 degrees clockwise and prints the -result. It assumes that all lines have the same number of -elements. - -@example -awk '@{ - if (max_nf < NF) - max_nf = NF - max_nr = NR - for (x = 1; x <= NF; x++) - vector[x, NR] = $x -@} - -END @{ - for (x = 1; x <= max_nf; x++) @{ - for (y = max_nr; y >= 1; --y) - printf("%s ", vector[x, y]) - printf("\n") - @} -@}' -@end example - -@noindent -When given the input: - -@example -@group -1 2 3 4 5 6 -2 3 4 5 6 1 -3 4 5 6 1 2 -4 5 6 1 2 3 -@end group -@end example - -@noindent -it produces: - -@example -@group -4 3 2 1 -5 4 3 2 -6 5 4 3 -1 6 5 4 -2 1 6 5 -3 2 1 6 -@end group -@end example - -@node Multi-scanning, , Multi-dimensional, Arrays -@section Scanning Multi-dimensional Arrays - -There is no special @code{for} statement for scanning a -``multi-dimensional'' array; there cannot be one, because in truth there -are no multi-dimensional arrays or elements; there is only a -multi-dimensional @emph{way of accessing} an array. - -However, if your program has an array that is always accessed as -multi-dimensional, you can get the effect of scanning it by combining -the scanning @code{for} statement -(@pxref{Scanning an Array, ,Scanning all Elements of an Array}) with the -@code{split} built-in function -(@pxref{String Functions, ,Built-in Functions for String Manipulation}). -It works like this:@refill - -@example -for (combined in @var{array}) @{ - split(combined, separate, SUBSEP) - @dots{} -@} -@end example - -@noindent -This finds each concatenated, combined index in the array, and splits it -into the individual indices by breaking it apart where the value of -@code{SUBSEP} appears. The split-out indices become the elements of -the array @code{separate}. - -Thus, suppose you have previously stored in @code{@var{array}[1, -"foo"]}; then an element with index @code{"1\034foo"} exists in -@var{array}. (Recall that the default value of @code{SUBSEP} contains -the character with code 034.) Sooner or later the @code{for} statement -will find that index and do an iteration with @code{combined} set to -@code{"1\034foo"}. Then the @code{split} function is called as -follows: - -@example -split("1\034foo", separate, "\034") -@end example - -@noindent -The result of this is to set @code{separate[1]} to 1 and @code{separate[2]} -to @code{"foo"}. 
Presto, the original sequence of separate indices has -been recovered. - -@node Built-in, User-defined, Arrays, Top -@chapter Built-in Functions - -@cindex built-in functions -@dfn{Built-in} functions are functions that are always available for -your @code{awk} program to call. This chapter defines all the built-in -functions in @code{awk}; some of them are mentioned in other sections, -but they are summarized here for your convenience. (You can also define -new functions yourself. @xref{User-defined, ,User-defined Functions}.) - -@menu -* Calling Built-in:: How to call built-in functions. -* Numeric Functions:: Functions that work with numbers, - including @code{int}, @code{sin} and @code{rand}. -* String Functions:: Functions for string manipulation, - such as @code{split}, @code{match}, and @code{sprintf}. -* I/O Functions:: Functions for files and shell commands. -* Time Functions:: Functions for dealing with time stamps. -@end menu - -@node Calling Built-in, Numeric Functions, Built-in, Built-in -@section Calling Built-in Functions - -To call a built-in function, write the name of the function followed -by arguments in parentheses. For example, @code{atan2(y + z, 1)} -is a call to the function @code{atan2}, with two arguments. - -Whitespace is ignored between the built-in function name and the -open-parenthesis, but we recommend that you avoid using whitespace -there. User-defined functions do not permit whitespace in this way, and -you will find it easier to avoid mistakes by following a simple -convention which always works: no whitespace after a function name. - -Each built-in function accepts a certain number of arguments. In most -cases, any extra arguments given to built-in functions are ignored. The -defaults for omitted arguments vary from function to function and are -described under the individual functions. - -When a function is called, expressions that create the function's actual -parameters are evaluated completely before the function call is performed. -For example, in the code fragment: - -@example -i = 4 -j = sqrt(i++) -@end example - -@noindent -the variable @code{i} is set to 5 before @code{sqrt} is called -with a value of 4 for its actual parameter. - -@node Numeric Functions, String Functions, Calling Built-in, Built-in -@section Numeric Built-in Functions -@c I didn't make all the examples small because a couple of them were -@c short already. --mew 29jan1992 - -Here is a full list of built-in functions that work with numbers: - -@table @code -@item int(@var{x}) -This gives you the integer part of @var{x}, truncated toward 0. This -produces the nearest integer to @var{x}, located between @var{x} and 0. - -For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)} -is @minus{}3, and @code{int(-3)} is @minus{}3 as well.@refill - -@item sqrt(@var{x}) -This gives you the positive square root of @var{x}. It reports an error -if @var{x} is negative. Thus, @code{sqrt(4)} is 2.@refill - -@item exp(@var{x}) -This gives you the exponential of @var{x}, or reports an error if -@var{x} is out of range. The range of values @var{x} can have depends -on your machine's floating point representation.@refill - -@item log(@var{x}) -This gives you the natural logarithm of @var{x}, if @var{x} is positive; -otherwise, it reports an error.@refill - -@item sin(@var{x}) -This gives you the sine of @var{x}, with @var{x} in radians. - -@item cos(@var{x}) -This gives you the cosine of @var{x}, with @var{x} in radians. 
- -@item atan2(@var{y}, @var{x}) -This gives you the arctangent of @code{@var{y} / @var{x}} in radians. - -@item rand() -This gives you a random number. The values of @code{rand} are -uniformly-distributed between 0 and 1. The value is never 0 and never -1. - -Often you want random integers instead. Here is a user-defined function -you can use to obtain a random nonnegative integer less than @var{n}: - -@example -function randint(n) @{ - return int(n * rand()) -@} -@end example - -@noindent -The multiplication produces a random real number greater than 0 and less -than @var{n}. We then make it an integer (using @code{int}) between 0 -and @code{@var{n} @minus{} 1}. - -Here is an example where a similar function is used to produce -random integers between 1 and @var{n}. Note that this program will -print a new random number for each input record. - -@smallexample -awk ' -# Function to roll a simulated die. -function roll(n) @{ return 1 + int(rand() * n) @} - -# Roll 3 six-sided dice and print total number of points. -@{ - printf("%d points\n", roll(6)+roll(6)+roll(6)) -@}' -@end smallexample - -@strong{Note:} @code{rand} starts generating numbers from the same -point, or @dfn{seed}, each time you run @code{awk}. This means that -a program will produce the same results each time you run it. -The numbers are random within one @code{awk} run, but predictable -from run to run. This is convenient for debugging, but if you want -a program to do different things each time it is used, you must change -the seed to a value that will be different in each run. To do this, -use @code{srand}. - -@item srand(@var{x}) -The function @code{srand} sets the starting point, or @dfn{seed}, -for generating random numbers to the value @var{x}. - -Each seed value leads to a particular sequence of ``random'' numbers. -Thus, if you set the seed to the same value a second time, you will get -the same sequence of ``random'' numbers again. - -If you omit the argument @var{x}, as in @code{srand()}, then the current -date and time of day are used for a seed. This is the way to get random -numbers that are truly unpredictable. - -The return value of @code{srand} is the previous seed. This makes it -easy to keep track of the seeds for use in consistently reproducing -sequences of random numbers. -@end table - -@node String Functions, I/O Functions, Numeric Functions, Built-in -@section Built-in Functions for String Manipulation - -The functions in this section look at or change the text of one or more -strings. - -@table @code -@item index(@var{in}, @var{find}) -@findex match -This searches the string @var{in} for the first occurrence of the string -@var{find}, and returns the position in characters where that occurrence -begins in the string @var{in}. For example:@refill - -@smallexample -awk 'BEGIN @{ print index("peanut", "an") @}' -@end smallexample - -@noindent -prints @samp{3}. If @var{find} is not found, @code{index} returns 0. -(Remember that string indices in @code{awk} start at 1.) - -@item length(@var{string}) -@findex length -This gives you the number of characters in @var{string}. If -@var{string} is a number, the length of the digit string representing -that number is returned. For example, @code{length("abcde")} is 5. By -contrast, @code{length(15 * 35)} works out to 3. How? Well, 15 * 35 = -525, and 525 is then converted to the string @samp{"525"}, which has -three characters. - -If no argument is supplied, @code{length} returns the length of @code{$0}. 
-
-In older versions of @code{awk}, you could call the @code{length} function
-without any parentheses. Doing so is marked as ``deprecated'' in the
-@sc{posix} standard. This means that while you can do this in your
-programs, it is a feature that can eventually be removed from a future
-version of the standard. Therefore, for maximal portability of your
-@code{awk} programs, you should always supply the parentheses.
-
-@item match(@var{string}, @var{regexp})
-@findex match
-The @code{match} function searches the string, @var{string}, for the
-longest, leftmost substring matched by the regular expression,
-@var{regexp}. It returns the character position, or @dfn{index}, of
-where that substring begins (1, if it starts at the beginning of
-@var{string}). If no match is found, it returns 0.
-
-@vindex RSTART
-@vindex RLENGTH
-The @code{match} function sets the built-in variable @code{RSTART} to
-the index. It also sets the built-in variable @code{RLENGTH} to the
-length in characters of the matched substring. If no match is found,
-@code{RSTART} is set to 0, and @code{RLENGTH} to @minus{}1.
-
-For example:
-
-@smallexample
-awk '@{
-       if ($1 == "FIND")
-         regex = $2
-       else @{
-         where = match($0, regex)
-         if (where)
-           print "Match of", regex, "found at", where, "in", $0
-       @}
-@}'
-@end smallexample
-
-@noindent
-This program looks for lines that match the regular expression stored in
-the variable @code{regex}. This regular expression can be changed. If the
-first word on a line is @samp{FIND}, @code{regex} is changed to be the
-second word on that line. Therefore, given:
-
-@smallexample
-FIND fo*bar
-My program was a foobar
-But none of it would doobar
-FIND Melvin
-JF+KM
-This line is property of The Reality Engineering Co.
-This file created by Melvin.
-@end smallexample
-
-@noindent
-@code{awk} prints:
-
-@smallexample
-Match of fo*bar found at 18 in My program was a foobar
-Match of Melvin found at 22 in This file created by Melvin.
-@end smallexample
-
-@item split(@var{string}, @var{array}, @var{fieldsep})
-@findex split
-This divides @var{string} into pieces separated by @var{fieldsep},
-and stores the pieces in @var{array}. The first piece is stored in
-@code{@var{array}[1]}, the second piece in @code{@var{array}[2]}, and so
-forth. The string value of the third argument, @var{fieldsep}, is
-a regexp describing where to split @var{string} (much as @code{FS} can
-be a regexp describing where to split input records). If
-the @var{fieldsep} is omitted, the value of @code{FS} is used.
-@code{split} returns the number of elements created.@refill
-
-The @code{split} function, then, splits strings into pieces in a
-manner similar to the way input lines are split into fields. For example:
-
-@smallexample
-split("auto-da-fe", a, "-")
-@end smallexample
-
-@noindent
-splits the string @samp{auto-da-fe} into three fields using @samp{-} as the
-separator. It sets the contents of the array @code{a} as follows:
-
-@smallexample
-a[1] = "auto"
-a[2] = "da"
-a[3] = "fe"
-@end smallexample
-
-@noindent
-The value returned by this call to @code{split} is 3.
-
-As with input field-splitting, when the value of @var{fieldsep} is
-@code{" "}, leading and trailing whitespace is ignored, and the elements
-are separated by runs of whitespace.
-
-@item sprintf(@var{format}, @var{expression1},@dots{})
-@findex sprintf
-This returns (without printing) the string that @code{printf} would
-have printed out with the same arguments
-(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).
-For example:@refill
-
-@smallexample
-sprintf("pi = %.2f (approx.)", 22/7)
-@end smallexample
-
-@noindent
-returns the string @w{@code{"pi = 3.14 (approx.)"}}.
-
-@item sub(@var{regexp}, @var{replacement}, @var{target})
-@findex sub
-The @code{sub} function alters the value of @var{target}.
-It searches this value, which should be a string, for the
-leftmost substring matched by the regular expression, @var{regexp},
-extending this match as far as possible. Then the entire string is
-changed by replacing the matched text with @var{replacement}.
-The modified string becomes the new value of @var{target}.
-
-This function is peculiar because @var{target} is not simply
-used to compute a value, and not just any expression will do: it
-must be a variable, field or array reference, so that @code{sub} can
-store a modified value there. If this argument is omitted, then the
-default is to use and alter @code{$0}.
-
-For example:@refill
-
-@smallexample
-str = "water, water, everywhere"
-sub(/at/, "ith", str)
-@end smallexample
-
-@noindent
-sets @code{str} to @w{@code{"wither, water, everywhere"}}, by replacing the
-leftmost, longest occurrence of @samp{at} with @samp{ith}.
-
-The @code{sub} function returns the number of substitutions made (either
-one or zero).
-
-If the special character @samp{&} appears in @var{replacement}, it
-stands for the precise substring that was matched by @var{regexp}. (If
-the regexp can match more than one string, then this precise substring
-may vary.) For example:@refill
-
-@smallexample
-awk '@{ sub(/candidate/, "& and his wife"); print @}'
-@end smallexample
-
-@noindent
-changes the first occurrence of @samp{candidate} to @samp{candidate
-and his wife} on each input line.
-
-Here is another example:
-
-@smallexample
-awk 'BEGIN @{
-       str = "daabaaa"
-       sub(/a+/, "c&c", str)
-       print str
-@}'
-@end smallexample
-
-@noindent
-prints @samp{dcaacbaaa}. This shows how @samp{&} can represent a non-constant
-string, and also illustrates the ``leftmost, longest'' rule.
-
-The effect of this special character (@samp{&}) can be turned off by putting a
-backslash before it in the string. As usual, to insert one backslash in
-the string, you must write two backslashes. Therefore, write @samp{\\&}
-in a string constant to include a literal @samp{&} in the replacement.
-For example, here is how to replace the first @samp{|} on each line with
-an @samp{&}:@refill
-
-@smallexample
-awk '@{ sub(/\|/, "\\&"); print @}'
-@end smallexample
-
-@strong{Note:} as mentioned above, the third argument to @code{sub} must
-be an lvalue. Some versions of @code{awk} allow the third argument to
-be an expression which is not an lvalue. In such a case, @code{sub}
-would still search for the pattern and return 0 or 1, but the result of
-the substitution (if any) would be thrown away because there is no place
-to put it. Such versions of @code{awk} accept expressions like
-this:@refill
-
-@smallexample
-sub(/USA/, "United States", "the USA and Canada")
-@end smallexample
-
-@noindent
-But that is considered erroneous in @code{gawk}.
-
-@item gsub(@var{regexp}, @var{replacement}, @var{target})
-@findex gsub
-This is similar to the @code{sub} function, except @code{gsub} replaces
-@emph{all} of the longest, leftmost, @emph{nonoverlapping} matching
-substrings it can find. The @samp{g} in @code{gsub} stands for
-``global,'' which means replace everywhere.
For example:@refill - -@smallexample -awk '@{ gsub(/Britain/, "United Kingdom"); print @}' -@end smallexample - -@noindent -replaces all occurrences of the string @samp{Britain} with @samp{United -Kingdom} for all input records.@refill - -The @code{gsub} function returns the number of substitutions made. If -the variable to be searched and altered, @var{target}, is -omitted, then the entire input record, @code{$0}, is used.@refill - -As in @code{sub}, the characters @samp{&} and @samp{\} are special, and -the third argument must be an lvalue. - -@item substr(@var{string}, @var{start}, @var{length}) -@findex substr -This returns a @var{length}-character-long substring of @var{string}, -starting at character number @var{start}. The first character of a -string is character number one. For example, -@code{substr("washington", 5, 3)} returns @code{"ing"}.@refill - -If @var{length} is not present, this function returns the whole suffix of -@var{string} that begins at character number @var{start}. For example, -@code{substr("washington", 5)} returns @code{"ington"}. This is also -the case if @var{length} is greater than the number of characters remaining -in the string, counting from character number @var{start}. - -@item tolower(@var{string}) -@findex tolower -This returns a copy of @var{string}, with each upper-case character -in the string replaced with its corresponding lower-case character. -Nonalphabetic characters are left unchanged. For example, -@code{tolower("MiXeD cAsE 123")} returns @code{"mixed case 123"}. - -@item toupper(@var{string}) -@findex toupper -This returns a copy of @var{string}, with each lower-case character -in the string replaced with its corresponding upper-case character. -Nonalphabetic characters are left unchanged. For example, -@code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}. -@end table - -@node I/O Functions, Time Functions, String Functions, Built-in -@section Built-in Functions for Input/Output - -@table @code -@item close(@var{filename}) -Close the file @var{filename}, for input or output. The argument may -alternatively be a shell command that was used for redirecting to or -from a pipe; then the pipe is closed. - -@xref{Close Input, ,Closing Input Files and Pipes}, regarding closing -input files and pipes. @xref{Close Output, ,Closing Output Files and Pipes}, -regarding closing output files and pipes.@refill - -@item system(@var{command}) -@findex system -@c the following index entry is an overfull hbox. --mew 30jan1992 -@cindex interaction, @code{awk} and other programs -The system function allows the user to execute operating system commands -and then return to the @code{awk} program. The @code{system} function -executes the command given by the string @var{command}. It returns, as -its value, the status returned by the command that was executed. - -For example, if the following fragment of code is put in your @code{awk} -program: - -@smallexample -END @{ - system("mail -s 'awk run done' operator < /dev/null") -@} -@end smallexample - -@noindent -the system operator will be sent mail when the @code{awk} program -finishes processing input and begins its end-of-input processing. - -Note that much the same result can be obtained by redirecting -@code{print} or @code{printf} into a pipe. However, if your @code{awk} -program is interactive, @code{system} is useful for cranking up large -self-contained programs, such as a shell or an editor.@refill - -Some operating systems cannot implement the @code{system} function. 
-@code{system} causes a fatal error if it is not supported. -@end table - -@c fakenode --- for prepinfo -@subheading Controlling Output Buffering with @code{system} -@cindex flushing buffers -@cindex buffers, flushing -@cindex buffering output -@cindex output, buffering - -Many utility programs will @dfn{buffer} their output; they save information -to be written to a disk file or terminal in memory, until there is enough -to be written in one operation. This is often more efficient than writing -every little bit of information as soon as it is ready. However, sometimes -it is necessary to force a program to @dfn{flush} its buffers; that is, -write the information to its destination, even if a buffer is not full. -You can do this from your @code{awk} program by calling @code{system} -with a null string as its argument: - -@example -system("") # flush output -@end example - -@noindent -@code{gawk} treats this use of the @code{system} function as a special -case, and is smart enough not to run a shell (or other command -interpreter) with the empty command. Therefore, with @code{gawk}, this -idiom is not only useful, it is efficient. While this idiom should work -with other @code{awk} implementations, it will not necessarily avoid -starting an unnecessary shell. -@ignore -Need a better explanation, perhaps in a separate paragraph. Explain that -for - -awk 'BEGIN { print "hi" - system("echo hello") - print "howdy" }' - -that the output had better be - - hi - hello - howdy - -and not - - hello - hi - howdy - -which it would be if awk did not flush its buffers before calling system. -@end ignore - -@node Time Functions, , I/O Functions, Built-in -@section Functions for Dealing with Time Stamps - -@cindex time stamps -@cindex time of day -A common use for @code{awk} programs is the processing of log files. -Log files often contain time stamp information, indicating when a -particular log record was written. Many programs log their time stamp -in the form returned by the @code{time} system call, which is the -number of seconds since a particular epoch. On @sc{posix} systems, -it is the number of seconds since Midnight, January 1, 1970, @sc{utc}. - -In order to make it easier to process such log files, and to easily produce -useful reports, @code{gawk} provides two functions for working with time -stamps. Both of these are @code{gawk} extensions; they are not specified -in the @sc{posix} standard, nor are they in any other known version -of @code{awk}. - -@table @code -@item systime() -@findex systime -This function returns the current time as the number of seconds since -the system epoch. On @sc{posix} systems, this is the number of seconds -since Midnight, January 1, 1970, @sc{utc}. It may be a different number on -other systems. - -@item strftime(@var{format}, @var{timestamp}) -@findex strftime -This function returns a string. It is similar to the function of the -same name in the @sc{ansi} C standard library. The time specified by -@var{timestamp} is used to produce a string, based on the contents -of the @var{format} string. -@end table - -The @code{systime} function allows you to compare a time stamp from a -log file with the current time of day. In particular, it is easy to -determine how long ago a particular record was logged. It also allows -you to produce log records using the ``seconds since the epoch'' format. - -The @code{strftime} function allows you to easily turn a time stamp -into human-readable information. 
It is similar in nature to the @code{sprintf} -function, copying non-format specification characters verbatim to the -returned string, and substituting date and time values for format -specifications in the @var{format} string. If no @var{timestamp} argument -is supplied, @code{gawk} will use the current time of day as the -time stamp.@refill - -@code{strftime} is guaranteed by the @sc{ansi} C standard to support -the following date format specifications: - -@table @code -@item %a -The locale's abbreviated weekday name. - -@item %A -The locale's full weekday name. - -@item %b -The locale's abbreviated month name. - -@item %B -The locale's full month name. - -@item %c -The locale's ``appropriate'' date and time representation. - -@item %d -The day of the month as a decimal number (01--31). - -@item %H -The hour (24-hour clock) as a decimal number (00--23). - -@item %I -The hour (12-hour clock) as a decimal number (01--12). - -@item %j -The day of the year as a decimal number (001--366). - -@item %m -The month as a decimal number (01--12). - -@item %M -The minute as a decimal number (00--59). - -@item %p -The locale's equivalent of the AM/PM designations associated -with a 12-hour clock. - -@item %S -The second as a decimal number (00--61). (Occasionally there are -minutes in a year with one or two leap seconds, which is why the -seconds can go from 0 all the way to 61.) - -@item %U -The week number of the year (the first Sunday as the first day of week 1) -as a decimal number (00--53). - -@item %w -The weekday as a decimal number (0--6). Sunday is day 0. - -@item %W -The week number of the year (the first Monday as the first day of week 1) -as a decimal number (00--53). - -@item %x -The locale's ``appropriate'' date representation. - -@item %X -The locale's ``appropriate'' time representation. - -@item %y -The year without century as a decimal number (00--99). - -@item %Y -The year with century as a decimal number. - -@item %Z -The time zone name or abbreviation, or no characters if -no time zone is determinable. - -@item %% -A literal @samp{%}. -@end table - -@c The parenthetical remark here should really be a footnote, but -@c it gave formatting problems at the FSF. So for now put it in -@c parentheses. -If a conversion specifier is not one of the above, the behavior is -undefined. (This is because the @sc{ansi} standard for C leaves the -behavior of the C version of @code{strftime} undefined, and @code{gawk} -will use the system's version of @code{strftime} if it's there. -Typically, the conversion specifier will either not appear in the -returned string, or it will appear literally.) - -Informally, a @dfn{locale} is the geographic place in which a program -is meant to run. For example, a common way to abbreviate the date -September 4, 1991 in the United States would be ``9/4/91''. -In many countries in Europe, however, it would be abbreviated ``4.9.91''. -Thus, the @samp{%x} specification in a @code{"US"} locale might produce -@samp{9/4/91}, while in a @code{"EUROPE"} locale, it might produce -@samp{4.9.91}. The @sc{ansi} C standard defines a default @code{"C"} -locale, which is an environment that is typical of what most C programmers -are used to. - -A public-domain C version of @code{strftime} is shipped with @code{gawk} -for systems that are not yet fully @sc{ansi}-compliant. 
If that version is -used to compile @code{gawk} (@pxref{Installation, ,Installing @code{gawk}}), -then the following additional format specifications are available:@refill - -@table @code -@item %D -Equivalent to specifying @samp{%m/%d/%y}. - -@item %e -The day of the month, padded with a blank if it is only one digit. - -@item %h -Equivalent to @samp{%b}, above. - -@item %n -A newline character (ASCII LF). - -@item %r -Equivalent to specifying @samp{%I:%M:%S %p}. - -@item %R -Equivalent to specifying @samp{%H:%M}. - -@item %T -Equivalent to specifying @samp{%H:%M:%S}. - -@item %t -A TAB character. - -@item %k -is replaced by the hour (24-hour clock) as a decimal number (0-23). -Single digit numbers are padded with a blank. - -@item %l -is replaced by the hour (12-hour clock) as a decimal number (1-12). -Single digit numbers are padded with a blank. - -@item %C -The century, as a number between 00 and 99. - -@item %u -is replaced by the weekday as a decimal number -[1 (Monday)--7]. - -@item %V -is replaced by the week number of the year (the first Monday as the first -day of week 1) as a decimal number (01--53). -The method for determining the week number is as specified by ISO 8601 -(to wit: if the week containing January 1 has four or more days in the -new year, then it is week 1, otherwise it is week 53 of the previous year -and the next week is week 1).@refill - -@item %Ec %EC %Ex %Ey %EY %Od %Oe %OH %OI -@itemx %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy -These are ``alternate representations'' for the specifications -that use only the second letter (@samp{%c}, @samp{%C}, and so on). -They are recognized, but their normal representations are used. -(These facilitate compliance with the @sc{posix} @code{date} -utility.)@refill - -@item %v -The date in VMS format (e.g. 20-JUN-1991). -@end table - -Here are two examples that use @code{strftime}. The first is an -@code{awk} version of the C @code{ctime} function. (This is a -user defined function, which we have not discussed yet. -@xref{User-defined, ,User-defined Functions}, for more information.) - -@smallexample -# ctime.awk -# -# awk version of C ctime(3) function - -function ctime(ts, format) -@{ - format = "%a %b %e %H:%M:%S %Z %Y" - if (ts == 0) - ts = systime() # use current time as default - return strftime(format, ts) -@} -@end smallexample - -This next example is an @code{awk} implementation of the @sc{posix} -@code{date} utility. Normally, the @code{date} utility prints the -current date and time of day in a well known format. However, if you -provide an argument to it that begins with a @samp{+}, @code{date} -will copy non-format specifier characters to the standard output, and -will interpret the current time according to the format specifiers in -the string. For example: - -@smallexample -date '+Today is %A, %B %d, %Y.' -@end smallexample - -@noindent -might print - -@smallexample -Today is Thursday, July 11, 1991. -@end smallexample - -Here is the @code{awk} version of the @code{date} utility. - -@smallexample -#! /usr/bin/gawk -f -# -# date --- implement the P1003.2 Draft 11 'date' command -# -# Bug: does not recognize the -u argument. 
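-#
-# Usage sketch (assuming the script is installed, executable, and
-# named "date"; the format string below is only an illustration):
-#   date
-#   date '+Today is %A, %B %d, %Y.'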
- -BEGIN \ -@{ - format = "%a %b %e %H:%M:%S %Z %Y" - exitval = 0 - - if (ARGC > 2) - exitval = 1 - else if (ARGC == 2) @{ - format = ARGV[1] - if (format ~ /^\+/) - format = substr(format, 2) # remove leading + - @} - print strftime(format) - exit exitval -@} -@end smallexample - -@node User-defined, Built-in Variables, Built-in, Top -@chapter User-defined Functions - -@cindex user-defined functions -@cindex functions, user-defined -Complicated @code{awk} programs can often be simplified by defining -your own functions. User-defined functions can be called just like -built-in ones (@pxref{Function Calls}), but it is up to you to define -them---to tell @code{awk} what they should do. - -@menu -* Definition Syntax:: How to write definitions and what they mean. -* Function Example:: An example function definition and - what it does. -* Function Caveats:: Things to watch out for. -* Return Statement:: Specifying the value a function returns. -@end menu - -@node Definition Syntax, Function Example, User-defined, User-defined -@section Syntax of Function Definitions -@cindex defining functions -@cindex function definition - -Definitions of functions can appear anywhere between the rules of the -@code{awk} program. Thus, the general form of an @code{awk} program is -extended to include sequences of rules @emph{and} user-defined function -definitions. - -The definition of a function named @var{name} looks like this: - -@example -function @var{name} (@var{parameter-list}) @{ - @var{body-of-function} -@} -@end example - -@noindent -@var{name} is the name of the function to be defined. A valid function -name is like a valid variable name: a sequence of letters, digits and -underscores, not starting with a digit. Functions share the same pool -of names as variables and arrays. - -@var{parameter-list} is a list of the function's arguments and local -variable names, separated by commas. When the function is called, -the argument names are used to hold the argument values given in -the call. The local variables are initialized to the null string. - -The @var{body-of-function} consists of @code{awk} statements. It is the -most important part of the definition, because it says what the function -should actually @emph{do}. The argument names exist to give the body a -way to talk about the arguments; local variables, to give the body -places to keep temporary values. - -Argument names are not distinguished syntactically from local variable -names; instead, the number of arguments supplied when the function is -called determines how many argument variables there are. Thus, if three -argument values are given, the first three names in @var{parameter-list} -are arguments, and the rest are local variables. - -It follows that if the number of arguments is not the same in all calls -to the function, some of the names in @var{parameter-list} may be -arguments on some occasions and local variables on others. Another -way to think of this is that omitted arguments default to the -null string. - -Usually when you write a function you know how many names you intend to -use for arguments and how many you intend to use as locals. By -convention, you should write an extra space between the arguments and -the locals, so other people can follow how your function is -supposed to be used. - -During execution of the function body, the arguments and local variable -values hide or @dfn{shadow} any variables of the same names used in the -rest of the program. 
The shadowed variables are not accessible in the -function definition, because there is no way to name them while their -names have been taken away for the local variables. All other variables -used in the @code{awk} program can be referenced or set normally in the -function definition. - -The arguments and local variables last only as long as the function body -is executing. Once the body finishes, the shadowed variables come back. - -The function body can contain expressions which call functions. They -can even call this function, either directly or by way of another -function. When this happens, we say the function is @dfn{recursive}. - -There is no need in @code{awk} to put the definition of a function -before all uses of the function. This is because @code{awk} reads the -entire program before starting to execute any of it. - -In many @code{awk} implementations, the keyword @code{function} may be -abbreviated @code{func}. However, @sc{posix} only specifies the use of -the keyword @code{function}. This actually has some practical implications. -If @code{gawk} is in @sc{posix}-compatibility mode -(@pxref{Command Line, ,Invoking @code{awk}}), then the following -statement will @emph{not} define a function:@refill - -@example -func foo() @{ a = sqrt($1) ; print a @} -@end example - -@noindent -Instead it defines a rule that, for each record, concatenates the value -of the variable @samp{func} with the return value of the function @samp{foo}, -and based on the truth value of the result, executes the corresponding action. -This is probably not what was desired. (@code{awk} accepts this input as -syntactically valid, since functions may be used before they are defined -in @code{awk} programs.) - -@node Function Example, Function Caveats, Definition Syntax, User-defined -@section Function Definition Example - -Here is an example of a user-defined function, called @code{myprint}, that -takes a number and prints it in a specific format. - -@example -function myprint(num) -@{ - printf "%6.3g\n", num -@} -@end example - -@noindent -To illustrate, here is an @code{awk} rule which uses our @code{myprint} -function: - -@example -$3 > 0 @{ myprint($3) @} -@end example - -@noindent -This program prints, in our special format, all the third fields that -contain a positive number in our input. Therefore, when given: - -@example - 1.2 3.4 5.6 7.8 - 9.10 11.12 -13.14 15.16 -17.18 19.20 21.22 23.24 -@end example - -@noindent -this program, using our function to format the results, prints: - -@example - 5.6 - 21.2 -@end example - -Here is a rather contrived example of a recursive function. It prints a -string backwards: - -@example -function rev (str, len) @{ - if (len == 0) @{ - printf "\n" - return - @} - printf "%c", substr(str, len, 1) - rev(str, len - 1) -@} -@end example - -@node Function Caveats, Return Statement, Function Example, User-defined -@section Calling User-defined Functions - -@dfn{Calling a function} means causing the function to run and do its job. -A function call is an expression, and its value is the value returned by -the function. - -A function call consists of the function name followed by the arguments -in parentheses. What you write in the call for the arguments are -@code{awk} expressions; each time the call is executed, these -expressions are evaluated, and the values are the actual arguments. 
For -example, here is a call to @code{foo} with three arguments (the first -being a string concatenation): - -@example -foo(x y, "lose", 4 * z) -@end example - -@quotation -@strong{Caution:} whitespace characters (spaces and tabs) are not allowed -between the function name and the open-parenthesis of the argument list. -If you write whitespace by mistake, @code{awk} might think that you mean -to concatenate a variable with an expression in parentheses. However, it -notices that you used a function name and not a variable name, and reports -an error. -@end quotation - -@cindex call by value -When a function is called, it is given a @emph{copy} of the values of -its arguments. This is called @dfn{call by value}. The caller may use -a variable as the expression for the argument, but the called function -does not know this: it only knows what value the argument had. For -example, if you write this code: - -@example -foo = "bar" -z = myfunc(foo) -@end example - -@noindent -then you should not think of the argument to @code{myfunc} as being -``the variable @code{foo}.'' Instead, think of the argument as the -string value, @code{"bar"}. - -If the function @code{myfunc} alters the values of its local variables, -this has no effect on any other variables. In particular, if @code{myfunc} -does this: - -@example -function myfunc (win) @{ - print win - win = "zzz" - print win -@} -@end example - -@noindent -to change its first argument variable @code{win}, this @emph{does not} -change the value of @code{foo} in the caller. The role of @code{foo} in -calling @code{myfunc} ended when its value, @code{"bar"}, was computed. -If @code{win} also exists outside of @code{myfunc}, the function body -cannot alter this outer value, because it is shadowed during the -execution of @code{myfunc} and cannot be seen or changed from there. - -@cindex call by reference -However, when arrays are the parameters to functions, they are @emph{not} -copied. Instead, the array itself is made available for direct manipulation -by the function. This is usually called @dfn{call by reference}. -Changes made to an array parameter inside the body of a function @emph{are} -visible outside that function. -@ifinfo -This can be @strong{very} dangerous if you do not watch what you are -doing. For example:@refill -@end ifinfo -@iftex -@emph{This can be very dangerous if you do not watch what you are -doing.} For example:@refill -@end iftex - -@example -function changeit (array, ind, nvalue) @{ - array[ind] = nvalue -@} - -BEGIN @{ - a[1] = 1 ; a[2] = 2 ; a[3] = 3 - changeit(a, 2, "two") - printf "a[1] = %s, a[2] = %s, a[3] = %s\n", a[1], a[2], a[3] - @} -@end example - -@noindent -prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because calling -@code{changeit} stores @code{"two"} in the second element of @code{a}. - -@node Return Statement, , Function Caveats, User-defined -@section The @code{return} Statement -@cindex @code{return} statement - -The body of a user-defined function can contain a @code{return} statement. -This statement returns control to the rest of the @code{awk} program. It -can also be used to return a value for use in the rest of the @code{awk} -program. It looks like this:@refill - -@example -return @var{expression} -@end example - -The @var{expression} part is optional. If it is omitted, then the returned -value is undefined and, therefore, unpredictable. - -A @code{return} statement with no value expression is assumed at the end of -every function definition. 
So if control reaches the end of the function -body, then the function returns an unpredictable value. @code{awk} -will not warn you if you use the return value of such a function; you will -simply get unpredictable or unexpected results. - -Here is an example of a user-defined function that returns a value -for the largest number among the elements of an array:@refill - -@example -@group -function maxelt (vec, i, ret) @{ - for (i in vec) @{ - if (ret == "" || vec[i] > ret) - ret = vec[i] - @} - return ret -@} -@end group -@end example - -@noindent -You call @code{maxelt} with one argument, which is an array name. The local -variables @code{i} and @code{ret} are not intended to be arguments; -while there is nothing to stop you from passing two or three arguments -to @code{maxelt}, the results would be strange. The extra space before -@code{i} in the function parameter list is to indicate that @code{i} and -@code{ret} are not supposed to be arguments. This is a convention which -you should follow when you define functions. - -Here is a program that uses our @code{maxelt} function. It loads an -array, calls @code{maxelt}, and then reports the maximum number in that -array:@refill - -@example -@group -awk ' -function maxelt (vec, i, ret) @{ - for (i in vec) @{ - if (ret == "" || vec[i] > ret) - ret = vec[i] - @} - return ret -@} -@end group - -@group -# Load all fields of each record into nums. -@{ - for(i = 1; i <= NF; i++) - nums[NR, i] = $i -@} - -END @{ - print maxelt(nums) -@}' -@end group -@end example - -Given the following input: - -@example -@group - 1 5 23 8 16 -44 3 5 2 8 26 -256 291 1396 2962 100 --6 467 998 1101 -99385 11 0 225 -@end group -@end example - -@noindent -our program tells us (predictably) that: - -@example -99385 -@end example - -@noindent -is the largest number in our array. - -@node Built-in Variables, Command Line, User-defined, Top -@chapter Built-in Variables -@cindex built-in variables - -Most @code{awk} variables are available for you to use for your own -purposes; they never change except when your program assigns values to -them, and never affect anything except when your program examines them. - -A few variables have special built-in meanings. Some of them @code{awk} -examines automatically, so that they enable you to tell @code{awk} how -to do certain things. Others are set automatically by @code{awk}, so -that they carry information from the internal workings of @code{awk} to -your program. - -This chapter documents all the built-in variables of @code{gawk}. Most -of them are also documented in the chapters where their areas of -activity are described. - -@menu -* User-modified:: Built-in variables that you change - to control @code{awk}. -* Auto-set:: Built-in variables where @code{awk} - gives you information. -@end menu - -@node User-modified, Auto-set, Built-in Variables, Built-in Variables -@section Built-in Variables that Control @code{awk} -@cindex built-in variables, user modifiable - -This is a list of the variables which you can change to control how -@code{awk} does certain things. - -@table @code -@iftex -@vindex CONVFMT -@end iftex -@item CONVFMT -This string is used by @code{awk} to control conversion of numbers to -strings (@pxref{Conversion, ,Conversion of Strings and Numbers}). -It works by being passed, in effect, as the first argument to the -@code{sprintf} function. Its default value is @code{"%.6g"}. 
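-For example, here is a minimal sketch of its effect (the value
-@code{3.14159} is just an illustration):
-
-@smallexample
-BEGIN @{
-    CONVFMT = "%.2f"
-    pi = 3.14159
-    s = pi ""     # concatenation converts the number to a string
-    print s
-@}
-@end smallexample
-
-@noindent
-prints @samp{3.14}; with the default @code{CONVFMT} it would print
-@samp{3.14159}.
-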
-@code{CONVFMT} was introduced by the @sc{posix} standard.@refill - -@iftex -@vindex FIELDWIDTHS -@end iftex -@item FIELDWIDTHS -This is a space separated list of columns that tells @code{gawk} -how to manage input with fixed, columnar boundaries. It is an -experimental feature that is still evolving. Assigning to @code{FIELDWIDTHS} -overrides the use of @code{FS} for field splitting. -@xref{Constant Size, ,Reading Fixed-width Data}, for more information.@refill - -If @code{gawk} is in compatibility mode -(@pxref{Command Line, ,Invoking @code{awk}}), then @code{FIELDWIDTHS} -has no special meaning, and field splitting operations are done based -exclusively on the value of @code{FS}.@refill - -@iftex -@vindex FS -@end iftex -@item FS -@code{FS} is the input field separator -(@pxref{Field Separators, ,Specifying how Fields are Separated}). -The value is a single-character string or a multi-character regular -expression that matches the separations between fields in an input -record.@refill - -The default value is @w{@code{" "}}, a string consisting of a single -space. As a special exception, this value actually means that any -sequence of spaces and tabs is a single separator. It also causes -spaces and tabs at the beginning or end of a line to be ignored. - -You can set the value of @code{FS} on the command line using the -@samp{-F} option: - -@example -awk -F, '@var{program}' @var{input-files} -@end example - -If @code{gawk} is using @code{FIELDWIDTHS} for field-splitting, -assigning a value to @code{FS} will cause @code{gawk} to return to -the normal, regexp-based, field splitting. - -@item IGNORECASE -@iftex -@vindex IGNORECASE -@end iftex -If @code{IGNORECASE} is nonzero, then @emph{all} regular expression -matching is done in a case-independent fashion. In particular, regexp -matching with @samp{~} and @samp{!~}, and the @code{gsub} @code{index}, -@code{match}, @code{split} and @code{sub} functions all ignore case when -doing their particular regexp operations. @strong{Note:} since field -splitting with the value of the @code{FS} variable is also a regular -expression operation, that too is done with case ignored. -@xref{Case-sensitivity, ,Case-sensitivity in Matching}. - -If @code{gawk} is in compatibility mode -(@pxref{Command Line, ,Invoking @code{awk}}), then @code{IGNORECASE} has -no special meaning, and regexp operations are always case-sensitive.@refill - -@item OFMT -@iftex -@vindex OFMT -@end iftex -This string is used by @code{awk} to control conversion of numbers to -strings (@pxref{Conversion, ,Conversion of Strings and Numbers}) for -printing with the @code{print} statement. -It works by being passed, in effect, as the first argument to the -@code{sprintf} function. Its default value is @code{"%.6g"}. -Earlier versions of @code{awk} also used @code{OFMT} to specify the -format for converting numbers to strings in general expressions; this -has been taken over by @code{CONVFMT}.@refill - -@item OFS -@iftex -@vindex OFS -@end iftex -This is the output field separator (@pxref{Output Separators}). It is -output between the fields output by a @code{print} statement. Its -default value is @w{@code{" "}}, a string consisting of a single space. - -@item ORS -@iftex -@vindex ORS -@end iftex -This is the output record separator. It is output at the end of every -@code{print} statement. Its default value is a string containing a -single newline character, which could be written as @code{"\n"}. 
-(@xref{Output Separators}.)@refill - -@item RS -@iftex -@vindex RS -@end iftex -This is @code{awk}'s input record separator. Its default value is a string -containing a single newline character, which means that an input record -consists of a single line of text. -(@xref{Records, ,How Input is Split into Records}.)@refill - -@item SUBSEP -@iftex -@vindex SUBSEP -@end iftex -@code{SUBSEP} is the subscript separator. It has the default value of -@code{"\034"}, and is used to separate the parts of the name of a -multi-dimensional array. Thus, if you access @code{foo[12,3]}, it -really accesses @code{foo["12\0343"]} -(@pxref{Multi-dimensional, ,Multi-dimensional Arrays}).@refill -@end table - -@node Auto-set, , User-modified, Built-in Variables -@section Built-in Variables that Convey Information - -This is a list of the variables that are set automatically by @code{awk} -on certain occasions so as to provide information to your program. - -@table @code -@item ARGC -@itemx ARGV -@iftex -@vindex ARGC -@vindex ARGV -@end iftex -The command-line arguments available to @code{awk} programs are stored in -an array called @code{ARGV}. @code{ARGC} is the number of command-line -arguments present. @xref{Command Line, ,Invoking @code{awk}}. -@code{ARGV} is indexed from zero to @w{@code{ARGC - 1}}. For example:@refill - -@example -awk 'BEGIN @{ - for (i = 0; i < ARGC; i++) - print ARGV[i] - @}' inventory-shipped BBS-list -@end example - -@noindent -In this example, @code{ARGV[0]} contains @code{"awk"}, @code{ARGV[1]} -contains @code{"inventory-shipped"}, and @code{ARGV[2]} contains -@code{"BBS-list"}. The value of @code{ARGC} is 3, one more than the -index of the last element in @code{ARGV} since the elements are numbered -from zero.@refill - -The names @code{ARGC} and @code{ARGV}, as well the convention of indexing -the array from 0 to @w{@code{ARGC - 1}}, are derived from the C language's -method of accessing command line arguments.@refill - -Notice that the @code{awk} program is not entered in @code{ARGV}. The -other special command line options, with their arguments, are also not -entered. But variable assignments on the command line @emph{are} -treated as arguments, and do show up in the @code{ARGV} array. - -Your program can alter @code{ARGC} and the elements of @code{ARGV}. -Each time @code{awk} reaches the end of an input file, it uses the next -element of @code{ARGV} as the name of the next input file. By storing a -different string there, your program can change which files are read. -You can use @code{"-"} to represent the standard input. By storing -additional elements and incrementing @code{ARGC} you can cause -additional files to be read. - -If you decrease the value of @code{ARGC}, that eliminates input files -from the end of the list. By recording the old value of @code{ARGC} -elsewhere, your program can treat the eliminated arguments as -something other than file names. - -To eliminate a file from the middle of the list, store the null string -(@code{""}) into @code{ARGV} in place of the file's name. As a -special feature, @code{awk} ignores file names that have been -replaced with the null string. - -@ignore -see getopt.awk in the examples... -@end ignore - -@item ARGIND -@vindex ARGIND -The index in @code{ARGV} of the current file being processed. -Every time @code{gawk} opens a new data file for processing, it sets -@code{ARGIND} to the index in @code{ARGV} of the file name. Thus, the -condition @samp{FILENAME == ARGV[ARGIND]} is always true. 
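-
-For example, a rule along the following lines (a sketch, assuming that
-every command line argument is a data file) reports progress through
-the input files:
-
-@smallexample
-FNR == 1 @{
-    printf "now reading %s (file %d of %d)\n", FILENAME, ARGIND, ARGC - 1
-@}
-@end smallexample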
- -This variable is useful in file processing; it allows you to tell how far -along you are in the list of data files, and to distinguish between -multiple successive instances of the same filename on the command line. - -While you can change the value of @code{ARGIND} within your @code{awk} -program, @code{gawk} will automatically set it to a new value when the -next file is opened. - -This variable is a @code{gawk} extension; in other @code{awk} implementations -it is not special. - -@item ENVIRON -@vindex ENVIRON -This is an array that contains the values of the environment. The array -indices are the environment variable names; the values are the values of -the particular environment variables. For example, -@code{ENVIRON["HOME"]} might be @file{/u/close}. Changing this array -does not affect the environment passed on to any programs that -@code{awk} may spawn via redirection or the @code{system} function. -(In a future version of @code{gawk}, it may do so.) - -Some operating systems may not have environment variables. -On such systems, the array @code{ENVIRON} is empty. - -@item ERRNO -@iftex -@vindex ERRNO -@end iftex -If a system error occurs either doing a redirection for @code{getline}, -during a read for @code{getline}, or during a @code{close} operation, -then @code{ERRNO} will contain a string describing the error. - -This variable is a @code{gawk} extension; in other @code{awk} implementations -it is not special. - -@item FILENAME -@iftex -@vindex FILENAME -@end iftex -This is the name of the file that @code{awk} is currently reading. -If @code{awk} is reading from the standard input (in other words, -there are no files listed on the command line), -@code{FILENAME} is set to @code{"-"}. -@code{FILENAME} is changed each time a new file is read -(@pxref{Reading Files, ,Reading Input Files}).@refill - -@item FNR -@iftex -@vindex FNR -@end iftex -@code{FNR} is the current record number in the current file. @code{FNR} is -incremented each time a new record is read -(@pxref{Getline, ,Explicit Input with @code{getline}}). It is reinitialized -to 0 each time a new input file is started.@refill - -@item NF -@iftex -@vindex NF -@end iftex -@code{NF} is the number of fields in the current input record. -@code{NF} is set each time a new record is read, when a new field is -created, or when @code{$0} changes (@pxref{Fields, ,Examining Fields}).@refill - -@item NR -@iftex -@vindex NR -@end iftex -This is the number of input records @code{awk} has processed since -the beginning of the program's execution. -(@pxref{Records, ,How Input is Split into Records}). -@code{NR} is set each time a new record is read.@refill - -@item RLENGTH -@iftex -@vindex RLENGTH -@end iftex -@code{RLENGTH} is the length of the substring matched by the -@code{match} function -(@pxref{String Functions, ,Built-in Functions for String Manipulation}). -@code{RLENGTH} is set by invoking the @code{match} function. Its value -is the length of the matched string, or @minus{}1 if no match was found.@refill - -@item RSTART -@iftex -@vindex RSTART -@end iftex -@code{RSTART} is the start-index in characters of the substring matched by the -@code{match} function -(@pxref{String Functions, ,Built-in Functions for String Manipulation}). -@code{RSTART} is set by invoking the @code{match} function. 
Its value -is the position of the string where the matched substring starts, or 0 -if no match was found.@refill -@end table - -@node Command Line, Language History, Built-in Variables, Top -@c node-name, next, previous, up -@chapter Invoking @code{awk} -@cindex command line -@cindex invocation of @code{gawk} -@cindex arguments, command line -@cindex options, command line -@cindex long options -@cindex options, long - -There are two ways to run @code{awk}: with an explicit program, or with -one or more program files. Here are templates for both of them; items -enclosed in @samp{@r{[}@dots{}@r{]}} in these templates are optional. - -Besides traditional one-letter @sc{posix}-style options, @code{gawk} also -supports GNU long named options. - -@example -awk @r{[@var{POSIX or GNU style options}]} -f progfile @r{[@code{--}]} @var{file} @dots{} -awk @r{[@var{POSIX or GNU style options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{} -@end example - -@menu -* Options:: Command line options and their meanings. -* Other Arguments:: Input file names and variable assignments. -* AWKPATH Variable:: Searching directories for @code{awk} programs. -* Obsolete:: Obsolete Options and/or features. -* Undocumented:: Undocumented Options and Features. -@end menu - -@node Options, Other Arguments, Command Line, Command Line -@section Command Line Options - -Options begin with a minus sign, and consist of a single character. -GNU style long named options consist of two minus signs and -a keyword that can be abbreviated if the abbreviation allows the option -to be uniquely identified. If the option takes an argument, then the -keyword is immediately followed by an equals sign (@samp{=}) and the -argument's value. For brevity, the discussion below only refers to the -traditional short options; however the long and short options are -interchangeable in all contexts. - -Each long named option for @code{gawk} has a corresponding -@sc{posix}-style option. The options and their meanings are as follows: - -@table @code -@item -F @var{fs} -@itemx --field-separator=@var{fs} -@iftex -@cindex @code{-F} option -@end iftex -@cindex @code{--field-separator} option -Sets the @code{FS} variable to @var{fs} -(@pxref{Field Separators, ,Specifying how Fields are Separated}).@refill - -@item -f @var{source-file} -@itemx --file=@var{source-file} -@iftex -@cindex @code{-f} option -@end iftex -@cindex @code{--file} option -Indicates that the @code{awk} program is to be found in @var{source-file} -instead of in the first non-option argument. - -@item -v @var{var}=@var{val} -@itemx --assign=@var{var}=@var{val} -@cindex @samp{-v} option -@cindex @code{--assign} option -Sets the variable @var{var} to the value @var{val} @emph{before} -execution of the program begins. Such variable values are available -inside the @code{BEGIN} rule (see below for a fuller explanation). - -The @samp{-v} option can only set one variable, but you can use -it more than once, setting another variable each time, like this: -@samp{@w{-v foo=1} @w{-v bar=2}}. - -@item -W @var{gawk-opt} -@cindex @samp{-W} option -Following the @sc{posix} standard, options that are implementation -specific are supplied as arguments to the @samp{-W} option. With @code{gawk}, -these arguments may be separated by commas, or quoted and separated by -whitespace. Case is ignored when processing these options. These options -also have corresponding GNU style long named options. 
The following -@code{gawk}-specific options are available: - -@table @code -@item -W compat -@itemx --compat -@cindex @code{--compat} option -Specifies @dfn{compatibility mode}, in which the GNU extensions in -@code{gawk} are disabled, so that @code{gawk} behaves just like Unix -@code{awk}. -@xref{POSIX/GNU, ,Extensions in @code{gawk} not in POSIX @code{awk}}, -which summarizes the extensions. Also see -@ref{Compatibility Mode, ,Downward Compatibility and Debugging}.@refill - -@item -W copyleft -@itemx -W copyright -@itemx --copyleft -@itemx --copyright -@cindex @code{--copyleft} option -@cindex @code{--copyright} option -Print the short version of the General Public License. -This option may disappear in a future version of @code{gawk}. - -@item -W help -@itemx -W usage -@itemx --help -@itemx --usage -@cindex @code{--help} option -@cindex @code{--usage} option -Print a ``usage'' message summarizing the short and long style options -that @code{gawk} accepts, and then exit. - -@item -W lint -@itemx --lint -@cindex @code{--lint} option -Provide warnings about constructs that are dubious or non-portable to -other @code{awk} implementations. -Some warnings are issued when @code{gawk} first reads your program. Others -are issued at run-time, as your program executes. - -@item -W posix -@itemx --posix -@cindex @code{--posix} option -Operate in strict @sc{posix} mode. This disables all @code{gawk} -extensions (just like @code{-W compat}), and adds the following additional -restrictions: - -@itemize @bullet{} -@item -@code{\x} escape sequences are not recognized -(@pxref{Constants, ,Constant Expressions}).@refill - -@item -The synonym @code{func} for the keyword @code{function} is not -recognized (@pxref{Definition Syntax, ,Syntax of Function Definitions}). - -@item -The operators @samp{**} and @samp{**=} cannot be used in -place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops, ,Arithmetic Operators}, -and also @pxref{Assignment Ops, ,Assignment Expressions}).@refill - -@item -Specifying @samp{-Ft} on the command line does not set the value -of @code{FS} to be a single tab character -(@pxref{Field Separators, ,Specifying how Fields are Separated}).@refill -@end itemize - -Although you can supply both @samp{-W compat} and @samp{-W posix} on the -command line, @samp{-W posix} will take precedence. - -@item -W source=@var{program-text} -@itemx --source=@var{program-text} -@cindex @code{--source} option -Program source code is taken from the @var{program-text}. This option -allows you to mix @code{awk} source code in files with program source -code that you would enter on the command line. This is particularly useful -when you have library functions that you wish to use from your command line -programs (@pxref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}). - -@item -W version -@itemx --version -@cindex @code{--version} option -Prints version information for this particular copy of @code{gawk}. -This is so you can determine if your copy of @code{gawk} is up to date -with respect to whatever the Free Software Foundation is currently -distributing. This option may disappear in a future version of @code{gawk}. -@end table - -@item -- -Signals the end of the command line options. The following arguments -are not treated as options even if they begin with @samp{-}. This -interpretation of @samp{--} follows the @sc{posix} argument parsing -conventions. 
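-
-For example (a sketch; the file name @file{-totals} is hypothetical):
-
-@smallexample
-awk -- '@{ print $1 @}' -totals
-@end smallexample
-
-@noindent
-Here @samp{-totals} is treated as the name of a data file, not as
-options.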
- -This is useful if you have file names that start with @samp{-}, -or in shell scripts, if you have file names that will be specified -by the user which could start with @samp{-}. -@end table - -Any other options are flagged as invalid with a warning message, but -are otherwise ignored. - -In compatibility mode, as a special case, if the value of @var{fs} supplied -to the @samp{-F} option is @samp{t}, then @code{FS} is set to the tab -character (@code{"\t"}). This is only true for @samp{-W compat}, and not -for @samp{-W posix} -(@pxref{Field Separators, ,Specifying how Fields are Separated}).@refill - -If the @samp{-f} option is @emph{not} used, then the first non-option -command line argument is expected to be the program text. - -The @samp{-f} option may be used more than once on the command line. -If it is, @code{awk} reads its program source from all of the named files, as -if they had been concatenated together into one big file. This is -useful for creating libraries of @code{awk} functions. Useful functions -can be written once, and then retrieved from a standard place, instead -of having to be included into each individual program. You can still -type in a program at the terminal and use library functions, by specifying -@samp{-f /dev/tty}. @code{awk} will read a file from the terminal -to use as part of the @code{awk} program. After typing your program, -type @kbd{Control-d} (the end-of-file character) to terminate it. -(You may also use @samp{-f -} to read program source from the standard -input, but then you will not be able to also use the standard input as a -source of data.) - -Because it is clumsy using the standard @code{awk} mechanisms to mix source -file and command line @code{awk} programs, @code{gawk} provides the -@samp{--source} option. This does not require you to pre-empt the standard -input for your source code, and allows you to easily mix command line -and library source code -(@pxref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}). - -If no @samp{-f} or @samp{--source} option is specified, then @code{gawk} -will use the first non-option command line argument as the text of the -program source code. - -@node Other Arguments, AWKPATH Variable, Options, Command Line -@section Other Command Line Arguments - -Any additional arguments on the command line are normally treated as -input files to be processed in the order specified. However, an -argument that has the form @code{@var{var}=@var{value}}, means to assign -the value @var{value} to the variable @var{var}---it does not specify a -file at all. - -@vindex ARGV -All these arguments are made available to your @code{awk} program in the -@code{ARGV} array (@pxref{Built-in Variables}). Command line options -and the program text (if present) are omitted from the @code{ARGV} -array. All other arguments, including variable assignments, are -included. - -The distinction between file name arguments and variable-assignment -arguments is made when @code{awk} is about to open the next input file. -At that point in execution, it checks the ``file name'' to see whether -it is really a variable assignment; if so, @code{awk} sets the variable -instead of reading a file. - -Therefore, the variables actually receive the specified values after all -previously specified files have been read. 
In particular, the values of -variables assigned in this fashion are @emph{not} available inside a -@code{BEGIN} rule -(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}), -since such rules are run before @code{awk} begins scanning the argument list. -The values given on the command line are processed for escape sequences -(@pxref{Constants, ,Constant Expressions}).@refill - -In some earlier implementations of @code{awk}, when a variable assignment -occurred before any file names, the assignment would happen @emph{before} -the @code{BEGIN} rule was executed. Some applications came to depend -upon this ``feature.'' When @code{awk} was changed to be more consistent, -the @samp{-v} option was added to accommodate applications that depended -upon this old behavior. - -The variable assignment feature is most useful for assigning to variables -such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and -output formats, before scanning the data files. It is also useful for -controlling state if multiple passes are needed over a data file. For -example:@refill - -@cindex multiple passes over data -@cindex passes, multiple -@smallexample -awk 'pass == 1 @{ @var{pass 1 stuff} @} - pass == 2 @{ @var{pass 2 stuff} @}' pass=1 datafile pass=2 datafile -@end smallexample - -Given the variable assignment feature, the @samp{-F} option is not -strictly necessary. It remains for historical compatibility. - -@node AWKPATH Variable, Obsolete, Other Arguments, Command Line -@section The @code{AWKPATH} Environment Variable -@cindex @code{AWKPATH} environment variable -@cindex search path -@cindex directory search -@cindex path, search -@iftex -@cindex differences between @code{gawk} and @code{awk} -@end iftex - -The previous section described how @code{awk} program files can be named -on the command line with the @samp{-f} option. In some @code{awk} -implementations, you must supply a precise path name for each program -file, unless the file is in the current directory. - -But in @code{gawk}, if the file name supplied in the @samp{-f} option -does not contain a @samp{/}, then @code{gawk} searches a list of -directories (called the @dfn{search path}), one by one, looking for a -file with the specified name. - -The search path is actually a string consisting of directory names -separated by colons. @code{gawk} gets its search path from the -@code{AWKPATH} environment variable. If that variable does not exist, -@code{gawk} uses the default path, which is -@samp{.:/usr/lib/awk:/usr/local/lib/awk}. (Programs written by -system administrators should use an @code{AWKPATH} variable that -does not include the current directory, @samp{.}.)@refill - -The search path feature is particularly useful for building up libraries -of useful @code{awk} functions. The library files can be placed in a -standard directory that is in the default path, and then specified on -the command line with a short file name. Otherwise, the full file name -would have to be typed for each file. - -By combining the @samp{--source} and @samp{-f} options, your command line -@code{awk} programs can use facilities in @code{awk} library files. - -Path searching is not done if @code{gawk} is in compatibility mode. -This is true for both @samp{-W compat} and @samp{-W posix}. -@xref{Options, ,Command Line Options}. - -@strong{Note:} if you want files in the current directory to be found, -you must include the current directory in the path, either by writing -@file{.} as an entry in the path, or by writing a null entry in the -path. 
(A null entry is indicated by starting or ending the path with a -colon, or by placing two colons next to each other (@samp{::}).) If the -current directory is not included in the path, then files cannot be -found in the current directory. This path search mechanism is identical -to the shell's. -@c someday, @cite{The Bourne Again Shell}.... - -@node Obsolete, Undocumented, AWKPATH Variable, Command Line -@section Obsolete Options and/or Features - -@cindex deprecated options -@cindex obsolete options -@cindex deprecated features -@cindex obsolete features -This section describes features and/or command line options from the -previous release of @code{gawk} that are either not available in the -current version, or that are still supported but deprecated (meaning that -they will @emph{not} be in the next release). - -@c update this section for each release! - -For version 2.15 of @code{gawk}, the following command line options -from version 2.11.1 are no longer recognized. - -@table @samp -@ignore -@item -nostalgia -Use @samp{-W nostalgia} instead. -@end ignore - -@item -c -Use @samp{-W compat} instead. - -@item -V -Use @samp{-W version} instead. - -@item -C -Use @samp{-W copyright} instead. - -@item -a -@itemx -e -These options produce an ``unrecognized option'' error message but have -no effect on the execution of @code{gawk}. The @sc{posix} standard now -specifies traditional @code{awk} regular expressions for the @code{awk} utility. -@end table - -The public-domain version of @code{strftime} that is distributed with -@code{gawk} changed for the 2.14 release. The @samp{%V} conversion specifier -that used to generate the date in VMS format was changed to @samp{%v}. -This is because the @sc{posix} standard for the @code{date} utility now -specifies a @samp{%V} conversion specifier. -@xref{Time Functions, ,Functions for Dealing with Time Stamps}, for details. - -@node Undocumented, , Obsolete, Command Line -@section Undocumented Options and Features - -This section intentionally left blank. - -@c Read The Source, Luke! - -@ignore -@c If these came out in the Info file or TeX manual, then they wouldn't -@c be undocumented, would they? - -@code{gawk} has one undocumented option: - -@table @samp -@item -W nostalgia -Print the message @code{"awk: bailing out near line 1"} and dump core. -This option was inspired by the common behavior of very early versions of -Unix @code{awk}, and by a t--shirt. -@end table - -Early versions of @code{awk} used to not require any separator (either -a newline or @samp{;}) between the rules in @code{awk} programs. Thus, -it was common to see one-line programs like: - -@example -awk '@{ sum += $1 @} END @{ print sum @}' -@end example - -@code{gawk} actually supports this, but it is purposely undocumented -since it is considered bad style. The correct way to write such a program -is either - -@example -awk '@{ sum += $1 @} ; END @{ print sum @}' -@end example - -@noindent -or - -@example -awk '@{ sum += $1 @} - END @{ print sum @}' data -@end example - -@noindent -@xref{Statements/Lines, ,@code{awk} Statements versus Lines}, for a fuller -explanation.@refill - -As an accident of the implementation of the original Unix @code{awk}, if -a built-in function used @code{$0} as its default argument, it was possible -to call that function without the parentheses. In particular, it was -common practice to use the @code{length} function in this fashion. 
-For example, the pipeline: - -@example -echo abcdef | awk '@{ print length @}' -@end example - -@noindent -would print @samp{6}. - -For backwards compatibility with old programs, @code{gawk} supports -this usage, but only for the @code{length} function. New programs should -@emph{not} call the @code{length} function this way. In particular, -this usage will not be portable to other @sc{posix} compliant versions -of @code{awk}. It is also poor style. - -@end ignore - -@node Language History, Installation, Command Line, Top -@chapter The Evolution of the @code{awk} Language - -This manual describes the GNU implementation of @code{awk}, which is patterned -after the @sc{posix} specification. Many @code{awk} users are only familiar -with the original @code{awk} implementation in Version 7 Unix, which is also -the basis for the version in Berkeley Unix (through 4.3--Reno). This chapter -briefly describes the evolution of the @code{awk} language. - -@menu -* V7/S5R3.1:: The major changes between V7 and - System V Release 3.1. -* S5R4:: Minor changes between System V - Releases 3.1 and 4. -* POSIX:: New features from the @sc{posix} standard. -* POSIX/GNU:: The extensions in @code{gawk} - not in @sc{posix} @code{awk}. -@end menu - -@node V7/S5R3.1, S5R4, Language History, Language History -@section Major Changes between V7 and S5R3.1 - -The @code{awk} language evolved considerably between the release of -Version 7 Unix (1978) and the new version first made widely available in -System V Release 3.1 (1987). This section summarizes the changes, with -cross-references to further details. - -@itemize @bullet -@item -The requirement for @samp{;} to separate rules on a line -(@pxref{Statements/Lines, ,@code{awk} Statements versus Lines}). - -@item -User-defined functions, and the @code{return} statement -(@pxref{User-defined, ,User-defined Functions}). - -@item -The @code{delete} statement (@pxref{Delete, ,The @code{delete} Statement}). - -@item -The @code{do}-@code{while} statement -(@pxref{Do Statement, ,The @code{do}-@code{while} Statement}).@refill - -@item -The built-in functions @code{atan2}, @code{cos}, @code{sin}, @code{rand} and -@code{srand} (@pxref{Numeric Functions, ,Numeric Built-in Functions}). - -@item -The built-in functions @code{gsub}, @code{sub}, and @code{match} -(@pxref{String Functions, ,Built-in Functions for String Manipulation}). - -@item -The built-in functions @code{close}, which closes an open file, and -@code{system}, which allows the user to execute operating system -commands (@pxref{I/O Functions, ,Built-in Functions for Input/Output}).@refill -@c Does the above verbiage prevents an overfull hbox? --mew, rjc 24jan1992 - -@item -The @code{ARGC}, @code{ARGV}, @code{FNR}, @code{RLENGTH}, @code{RSTART}, -and @code{SUBSEP} built-in variables (@pxref{Built-in Variables}). - -@item -The conditional expression using the operators @samp{?} and @samp{:} -(@pxref{Conditional Exp, ,Conditional Expressions}).@refill - -@item -The exponentiation operator @samp{^} -(@pxref{Arithmetic Ops, ,Arithmetic Operators}) and its assignment operator -form @samp{^=} (@pxref{Assignment Ops, ,Assignment Expressions}).@refill - -@item -C-compatible operator precedence, which breaks some old @code{awk} -programs (@pxref{Precedence, ,Operator Precedence (How Operators Nest)}). 
- -@item -Regexps as the value of @code{FS} -(@pxref{Field Separators, ,Specifying how Fields are Separated}), and as the -third argument to the @code{split} function -(@pxref{String Functions, ,Built-in Functions for String Manipulation}).@refill - -@item -Dynamic regexps as operands of the @samp{~} and @samp{!~} operators -(@pxref{Regexp Usage, ,How to Use Regular Expressions}). - -@item -Escape sequences (@pxref{Constants, ,Constant Expressions}) in regexps.@refill - -@item -The escape sequences @samp{\b}, @samp{\f}, and @samp{\r} -(@pxref{Constants, ,Constant Expressions}). - -@item -Redirection of input for the @code{getline} function -(@pxref{Getline, ,Explicit Input with @code{getline}}).@refill - -@item -Multiple @code{BEGIN} and @code{END} rules -(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}).@refill - -@item -Simulated multi-dimensional arrays -(@pxref{Multi-dimensional, ,Multi-dimensional Arrays}).@refill -@end itemize - -@node S5R4, POSIX, V7/S5R3.1, Language History -@section Changes between S5R3.1 and S5R4 - -The System V Release 4 version of Unix @code{awk} added these features -(some of which originated in @code{gawk}): - -@itemize @bullet -@item -The @code{ENVIRON} variable (@pxref{Built-in Variables}). - -@item -Multiple @samp{-f} options on the command line -(@pxref{Command Line, ,Invoking @code{awk}}).@refill - -@item -The @samp{-v} option for assigning variables before program execution begins -(@pxref{Command Line, ,Invoking @code{awk}}).@refill - -@item -The @samp{--} option for terminating command line options. - -@item -The @samp{\a}, @samp{\v}, and @samp{\x} escape sequences -(@pxref{Constants, ,Constant Expressions}).@refill - -@item -A defined return value for the @code{srand} built-in function -(@pxref{Numeric Functions, ,Numeric Built-in Functions}). - -@item -The @code{toupper} and @code{tolower} built-in string functions -for case translation -(@pxref{String Functions, ,Built-in Functions for String Manipulation}).@refill - -@item -A cleaner specification for the @samp{%c} format-control letter in the -@code{printf} function -(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).@refill - -@item -The ability to dynamically pass the field width and precision (@code{"%*.*d"}) -in the argument list of the @code{printf} function -(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).@refill - -@item -The use of constant regexps such as @code{/foo/} as expressions, where -they are equivalent to use of the matching operator, as in @code{$0 ~ -/foo/} (@pxref{Constants, ,Constant Expressions}). -@end itemize - -@node POSIX, POSIX/GNU, S5R4, Language History -@section Changes between S5R4 and POSIX @code{awk} - -The @sc{posix} Command Language and Utilities standard for @code{awk} -introduced the following changes into the language: - -@itemize @bullet{} -@item -The use of @samp{-W} for implementation-specific options. - -@item -The use of @code{CONVFMT} for controlling the conversion of numbers -to strings (@pxref{Conversion, ,Conversion of Strings and Numbers}). - -@item -The concept of a numeric string, and tighter comparison rules to go -with it (@pxref{Comparison Ops, ,Comparison Expressions}). - -@item -More complete documentation of many of the previously undocumented -features of the language. 
-@end itemize - -@node POSIX/GNU, , POSIX, Language History -@section Extensions in @code{gawk} not in POSIX @code{awk} - -The GNU implementation, @code{gawk}, adds these features: - -@itemize @bullet -@item -The @code{AWKPATH} environment variable for specifying a path search for -the @samp{-f} command line option -(@pxref{Command Line, ,Invoking @code{awk}}).@refill - -@item -The various @code{gawk} specific features available via the @samp{-W} -command line option (@pxref{Command Line, ,Invoking @code{awk}}). - -@item -The @code{ARGIND} variable, that tracks the movement of @code{FILENAME} -through @code{ARGV}. (@pxref{Built-in Variables}). - -@item -The @code{ERRNO} variable, that contains the system error message when -@code{getline} returns @minus{}1, or when @code{close} fails. -(@pxref{Built-in Variables}). - -@item -The @code{IGNORECASE} variable and its effects -(@pxref{Case-sensitivity, ,Case-sensitivity in Matching}).@refill - -@item -The @code{FIELDWIDTHS} variable and its effects -(@pxref{Constant Size, ,Reading Fixed-width Data}).@refill - -@item -The @code{next file} statement for skipping to the next data file -(@pxref{Next File Statement, ,The @code{next file} Statement}).@refill - -@item -The @code{systime} and @code{strftime} built-in functions for obtaining -and printing time stamps -(@pxref{Time Functions, ,Functions for Dealing with Time Stamps}).@refill - -@item -The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr}, and -@file{/dev/fd/@var{n}} file name interpretation -(@pxref{Special Files, ,Standard I/O Streams}).@refill - -@item -The @samp{-W compat} option to turn off these extensions -(@pxref{Command Line, ,Invoking @code{awk}}).@refill - -@item -The @samp{-W posix} option for full @sc{posix} compliance -(@pxref{Command Line, ,Invoking @code{awk}}).@refill - -@end itemize - -@node Installation, Gawk Summary, Language History, Top -@chapter Installing @code{gawk} - -This chapter provides instructions for installing @code{gawk} on the -various platforms that are supported by the developers. The primary -developers support Unix (and one day, GNU), while the other ports were -contributed. The file @file{ACKNOWLEDGMENT} in the @code{gawk} -distribution lists the electronic mail addresses of the people who did -the respective ports.@refill - -@menu -* Gawk Distribution:: What is in the @code{gawk} distribution. -* Unix Installation:: Installing @code{gawk} under various versions - of Unix. -* VMS Installation:: Installing @code{gawk} on VMS. -* MS-DOS Installation:: Installing @code{gawk} on MS-DOS. -* Atari Installation:: Installing @code{gawk} on the Atari ST. -@end menu - -@node Gawk Distribution, Unix Installation, Installation, Installation -@section The @code{gawk} Distribution - -This section first describes how to get and extract the @code{gawk} -distribution, and then discusses what is in the various files and -subdirectories. - -@menu -* Extracting:: How to get and extract the distribution. -* Distribution contents:: What is in the distribution. -@end menu - -@node Extracting, Distribution contents, Gawk Distribution, Gawk Distribution -@subsection Getting the @code{gawk} Distribution - -@cindex getting gawk -@cindex anonymous ftp -@cindex anonymous uucp -@cindex ftp, anonymous -@cindex uucp, anonymous -@code{gawk} is distributed as a @code{tar} file compressed with the -GNU Zip program, @code{gzip}. You can -get it via anonymous @code{ftp} to the Internet host @code{prep.ai.mit.edu}. 
-Like all GNU software, it will be archived at other well known systems, -from which it will be possible to use some sort of anonymous @code{uucp} to -obtain the distribution as well. -You can also order @code{gawk} on tape or CD-ROM directly from the -Free Software Foundation. (The address is on the copyright page.) -Doing so directly contributes to the support of the foundation and to -the production of more free software. - -Once you have the distribution (for example, -@file{gawk-2.15.0.tar.z}), first use @code{gzip} to expand the -file, and then use @code{tar} to extract it. You can use the following -pipeline to produce the @code{gawk} distribution: - -@example -# Under System V, add 'o' to the tar flags -gzip -d -c gawk-2.15.0.tar.z | tar -xvpf - -@end example - -@noindent -This will create a directory named @file{gawk-2.15} in the current -directory. - -The distribution file name is of the form @file{gawk-2.15.@var{n}.tar.Z}. -The @var{n} represents a @dfn{patchlevel}, meaning that minor bugs have -been fixed in the major release. The current patchlevel is 0, but when -retrieving distributions, you should get the version with the highest -patchlevel.@refill - -If you are not on a Unix system, you will need to make other arrangements -for getting and extracting the @code{gawk} distribution. You should consult -a local expert. - -@node Distribution contents, , Extracting, Gawk Distribution -@subsection Contents of the @code{gawk} Distribution - -@code{gawk} has a number of C source files, documentation files, -subdirectories and files related to the configuration process -(@pxref{Unix Installation, ,Compiling and Installing @code{gawk} on Unix}), -and several subdirectories related to different, non-Unix, -operating systems.@refill - -@table @asis -@item various @samp{.c}, @samp{.y}, and @samp{.h} files - -The C and YACC source files are the actual @code{gawk} source code. -@end table - -@table @file -@item README -@itemx README.VMS -@itemx README.dos -@itemx README.rs6000 -@itemx README.ultrix -Descriptive files: @file{README} for @code{gawk} under Unix, and the -rest for the various hardware and software combinations. - -@item PORTS -A list of systems to which @code{gawk} has been ported, and which -have successfully run the test suite. - -@item ACKNOWLEDGMENT -A list of the people who contributed major parts of the code or documentation. - -@item NEWS -A list of changes to @code{gawk} since the last release or patch. - -@item COPYING -The GNU General Public License. - -@item FUTURES -A brief list of features and/or changes being contemplated for future -releases, with some indication of the time frame for the feature, based -on its difficulty. - -@item LIMITATIONS -A list of those factors that limit @code{gawk}'s performance. -Most of these depend on the hardware or operating system software, and -are not limits in @code{gawk} itself.@refill - -@item PROBLEMS -A file describing known problems with the current release. - -@item gawk.1 -The @code{troff} source for a manual page describing @code{gawk}. - -@item gawk.texinfo -@ifinfo -The @code{texinfo} source file for this Info file. -It should be processed with @TeX{} to produce a printed manual, and -with @code{makeinfo} to produce the Info file.@refill -@end ifinfo -@iftex -The @code{texinfo} source file for this manual. 
-It should be processed with @TeX{} to produce a printed manual, and -with @code{makeinfo} to produce the Info file.@refill -@end iftex - -@item Makefile.in -@itemx config -@itemx config.in -@itemx configure -@itemx missing -@itemx mungeconf -These files and subdirectories are used when configuring @code{gawk} -for various Unix systems. They are explained in detail in -@ref{Unix Installation, ,Compiling and Installing @code{gawk} on Unix}.@refill - -@item atari -Files needed for building @code{gawk} on an Atari ST. -@xref{Atari Installation, ,Installing @code{gawk} on the Atari ST}, for details. - -@item pc -Files needed for building @code{gawk} under MS-DOS. -@xref{MS-DOS Installation, ,Installing @code{gawk} on MS-DOS}, for details. - -@item vms -Files needed for building @code{gawk} under VMS. -@xref{VMS Installation, ,Compiling Installing and Running @code{gawk} on VMS}, for details. - -@item test -Many interesting @code{awk} programs, provided as a test suite for -@code{gawk}. You can use @samp{make test} from the top level @code{gawk} -directory to run your version of @code{gawk} against the test suite. -@c There are many programs here that are useful in their own right. -If @code{gawk} successfully passes @samp{make test} then you can -be confident of a successful port.@refill -@end table - -@node Unix Installation, VMS Installation, Gawk Distribution, Installation -@section Compiling and Installing @code{gawk} on Unix - -Often, you can compile and install @code{gawk} by typing only two -commands. However, if you do not use a supported system, you may need -to configure @code{gawk} for your system yourself. - -@menu -* Quick Installation:: Compiling @code{gawk} on a - supported Unix version. -* Configuration Philosophy:: How it's all supposed to work. -* New Configurations:: What to do if there is no supplied - configuration for your system. -@end menu - -@node Quick Installation, Configuration Philosophy, Unix Installation, Unix Installation -@subsection Compiling @code{gawk} for a Supported Unix Version - -@cindex installation, unix -After you have extracted the @code{gawk} distribution, @code{cd} -to @file{gawk-2.15}. Look in the @file{config} subdirectory for a -file that matches your hardware/software combination. In general, -only the software is relevant; for example @code{sunos41} is used -for SunOS 4.1, on both Sun 3 and Sun 4 hardware.@refill - -If you find such a file, run the command: - -@example -# assume you have SunOS 4.1 -./configure sunos41 -@end example - -This produces a @file{Makefile} and @file{config.h} tailored to your -system. You may wish to edit the @file{Makefile} to use a different -C compiler, such as @code{gcc}, the GNU C compiler, if you have it. -You may also wish to change the @code{CFLAGS} variable, which controls -the command line options that are passed to the C compiler (such as -optimization levels, or compiling for debugging).@refill - -After you have configured @file{Makefile} and @file{config.h}, type: - -@example -make -@end example - -@noindent -and shortly thereafter, you should have an executable version of @code{gawk}. -That's all there is to it! - -@node Configuration Philosophy, New Configurations, Quick Installation, Unix Installation -@subsection The Configuration Process - -(This section is of interest only if you know something about using the -C language and the Unix operating system.) - -The source code for @code{gawk} generally attempts to adhere to industry -standards wherever possible. 
This means that @code{gawk} uses library
-routines that are specified by the @sc{ansi} C standard and by the @sc{posix}
-operating system interface standard.  When using an @sc{ansi} C compiler,
-function prototypes are provided to help improve the compile-time checking.
-
-Many older Unix systems do not support all of either the @sc{ansi} or the
-@sc{posix} standards.  The @file{missing} subdirectory in the @code{gawk}
-distribution contains replacement versions of those subroutines that are
-most likely to be missing.
-
-The @file{config.h} file that is created by the @code{configure} program
-contains definitions that describe features of the particular operating
-system where you are attempting to compile @code{gawk}.  For the most
-part, it lists which standard subroutines are @emph{not} available.
-For example, if your system lacks the @samp{getopt} routine, then
-@samp{GETOPT_MISSING} would be defined.
-
-@file{config.h} also defines constants that describe facts about your
-variant of Unix.  For example, there may not be an @samp{st_blksize}
-element in the @code{stat} structure.  In this case @samp{BLKSIZE_MISSING}
-would be defined.
-
-Based on the list in @file{config.h} of standard subroutines that are
-missing, @file{missing.c} will do a @samp{#include} of the appropriate
-file(s) from the @file{missing} subdirectory.@refill
-
-Conditionally compiled code in the other source files relies on the
-other definitions in the @file{config.h} file.
-
-Besides creating @file{config.h}, @code{configure} produces a @file{Makefile}
-from @file{Makefile.in}.  There are a number of lines in @file{Makefile.in}
-that are system or feature specific.  For example, there is a line that begins
-with @samp{##MAKE_ALLOCA_C##}.  This is normally a comment line, since
-it starts with @samp{#}.  If a configuration file has @samp{MAKE_ALLOCA_C}
-in it, then @code{configure} will delete the @samp{##MAKE_ALLOCA_C##}
-from the beginning of the line.  This will enable the rules in the
-@file{Makefile} that use a C version of @samp{alloca}.  There are several
-similar features that work in this fashion.@refill
-
-@node New Configurations, , Configuration Philosophy, Unix Installation
-@subsection Configuring @code{gawk} for a New System
-
-(This section is of interest only if you know something about using the
-C language and the Unix operating system, and if you have to install
-@code{gawk} on a system that is not supported by the @code{gawk} distribution.
-If you are a C or Unix novice, get help from a local expert.)
-
-If you need to configure @code{gawk} for a Unix system that is not
-supported in the distribution, first see
-@ref{Configuration Philosophy, ,The Configuration Process}.
-Then, copy @file{config.in} to @file{config.h}, and copy
-@file{Makefile.in} to @file{Makefile}.@refill
-
-Next, edit both files.  Both files are liberally commented, and the
-necessary changes should be straightforward.
-
-While editing @file{config.h}, you need to determine what library
-routines you do or do not have by consulting your system documentation, or
-by perusing your actual libraries using the @code{ar} or @code{nm} utilities.
-In the worst case, simply do not define @emph{any} of the macros for missing
-subroutines.  When you compile @code{gawk}, the final link-editing step
-will fail.  The link editor will provide you with a list of unresolved external
-references---these are the missing subroutines.
Edit @file{config.h} again
-and recompile, and you should be set.@refill
-
-Editing the @file{Makefile} should also be straightforward.  Enable or
-disable the lines that begin with @samp{##MAKE_@var{whatever}##}, as
-appropriate.  Select the correct C compiler and @code{CFLAGS} for it.
-Then run @code{make}.
-
-Getting a correct configuration is likely to be an iterative process.
-Do not be discouraged if it takes you several tries.  If you have no
-luck whatsoever, please report your system type, and the steps you took.
-Once you do have a working configuration, please send it to the maintainers
-so that support for your system can be added to the official release.
-
-@xref{Bugs, ,Reporting Problems and Bugs}, for information on how to report
-problems in configuring @code{gawk}.  You may also use the same mechanisms
-for sending in new configurations.@refill
-
-@node VMS Installation, MS-DOS Installation, Unix Installation, Installation
-@section Compiling, Installing, and Running @code{gawk} on VMS
-
-@c based on material from
-@c Pat Rankin <rankin@eql.caltech.edu>
-
-@cindex installation, vms
-This section describes how to compile and install @code{gawk} under VMS.
-
-@menu
-* VMS Compilation::             How to compile @code{gawk} under VMS.
-* VMS Installation Details::    How to install @code{gawk} under VMS.
-* VMS Running::                 How to run @code{gawk} under VMS.
-* VMS POSIX::                   Alternate instructions for VMS POSIX.
-@end menu
-
-@node VMS Compilation, VMS Installation Details, VMS Installation, VMS Installation
-@subsection Compiling @code{gawk} under VMS
-
-To compile @code{gawk} under VMS, there is a @code{DCL} command procedure that
-will issue all the necessary @code{CC} and @code{LINK} commands, and there is
-also a @file{Makefile} for use with the @code{MMS} utility.  From the source
-directory, use either
-
-@smallexample
-$ @@[.VMS]VMSBUILD.COM
-@end smallexample
-
-@noindent
-or
-
-@smallexample
-$ MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK
-@end smallexample
-
-Depending upon which C compiler you are using, follow one of the sets
-of instructions in this table:
-
-@table @asis
-@item VAX C V3.x
-Use either @file{vmsbuild.com} or @file{descrip.mms} as is.  These use
-@code{CC/OPTIMIZE=NOLINE}, which is essential for Version 3.0.
-
-@item VAX C V2.x
-You must have Version 2.3 or 2.4; older ones won't work.  Edit either
-@file{vmsbuild.com} or @file{descrip.mms} according to the comments in them.
-For @file{vmsbuild.com}, this just entails removing two @samp{!} delimiters.
-Also edit @file{config.h} (which is a copy of file @file{[.config]vms-conf.h})
-and comment out or delete the two lines @samp{#define __STDC__ 0} and
-@samp{#define VAXC_BUILTINS} near the end.@refill
-
-@item GNU C
-Edit @file{vmsbuild.com} or @file{descrip.mms}; the changes are different
-from those for VAX C V2.x, but equally straightforward.  No changes to
-@file{config.h} should be needed.
-
-@item DEC C
-Edit @file{vmsbuild.com} or @file{descrip.mms} according to their comments.
-No changes to @file{config.h} should be needed.
-@end table
-
-@code{gawk} 2.15 has been tested under VAX/VMS 5.5-1 using VAX C V3.2,
-GNU C 1.40 and 2.3.  It should work without modifications for VMS V4.6 and up.
-
-@node VMS Installation Details, VMS Running, VMS Compilation, VMS Installation
-@subsection Installing @code{gawk} on VMS
-
-To install @code{gawk}, all you need is a ``foreign'' command, which is
-a @code{DCL} symbol whose value begins with a dollar sign.
- -@smallexample -$ GAWK :== $device:[directory]GAWK -@end smallexample - -@noindent -(Substitute the actual location of @code{gawk.exe} for -@samp{device:[directory]}.) The symbol should be placed in the -@file{login.com} of any user who wishes to run @code{gawk}, -so that it will be defined every time the user logs on. -Alternatively, the symbol may be placed in the system-wide -@file{sylogin.com} procedure, which will allow all users -to run @code{gawk}.@refill - -Optionally, the help entry can be loaded into a VMS help library: - -@smallexample -$ LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP -@end smallexample - -@noindent -(You may want to substitute a site-specific help library rather than -the standard VMS library @samp{HELPLIB}.) After loading the help text, - -@c this is so tiny, but `should' be smallexample for consistency sake... -@c I didn't because it was so short. --mew 29jan1992 -@example -$ HELP GAWK -@end example - -@noindent -will provide information about both the @code{gawk} implementation and the -@code{awk} programming language. - -The logical name @samp{AWK_LIBRARY} can designate a default location -for @code{awk} program files. For the @samp{-f} option, if the specified -filename has no device or directory path information in it, @code{gawk} -will look in the current directory first, then in the directory specified -by the translation of @samp{AWK_LIBRARY} if the file was not found. -If after searching in both directories, the file still is not found, -then @code{gawk} appends the suffix @samp{.awk} to the filename and the -file search will be re-tried. If @samp{AWK_LIBRARY} is not defined, that -portion of the file search will fail benignly.@refill - -@node VMS Running, VMS POSIX, VMS Installation Details, VMS Installation -@subsection Running @code{gawk} on VMS - -Command line parsing and quoting conventions are significantly different -on VMS, so examples in this manual or from other sources often need minor -changes. They @emph{are} minor though, and all @code{awk} programs -should run correctly. - -Here are a couple of trivial tests: - -@smallexample -$ gawk -- "BEGIN @{print ""Hello, World!""@}" -$ gawk -"W" version ! could also be -"W version" or "-W version" -@end smallexample - -@noindent -Note that upper-case and mixed-case text must be quoted. - -The VMS port of @code{gawk} includes a @code{DCL}-style interface in addition -to the original shell-style interface (see the help entry for details). -One side-effect of dual command line parsing is that if there is only a -single parameter (as in the quoted string program above), the command -becomes ambiguous. To work around this, the normally optional @samp{--} -flag is required to force Unix style rather than @code{DCL} parsing. If any -other dash-type options (or multiple parameters such as data files to be -processed) are present, there is no ambiguity and @samp{--} can be omitted. - -The default search path when looking for @code{awk} program files specified -by the @samp{-f} option is @code{"SYS$DISK:[],AWK_LIBRARY:"}. The logical -name @samp{AWKPATH} can be used to override this default. The format -of @samp{AWKPATH} is a comma-separated list of directory specifications. -When defining it, the value should be quoted so that it retains a single -translation, and not a multi-translation @code{RMS} searchlist. 
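-
-For example, a definition along the following lines (the last directory
-name is only a placeholder) keeps the entire value as a single
-translation:
-
-@smallexample
-$ DEFINE AWKPATH "SYS$DISK:[],AWK_LIBRARY:,USR$DISK:[SMITH.AWKLIB]"
-@end smallexample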
-
-@node VMS POSIX, , VMS Running, VMS Installation
-@subsection Building and using @code{gawk} under VMS POSIX
-
-Ignore the instructions above, although @file{vms/gawk.hlp} should still
-be made available in a help library.  Make sure that the two scripts,
-@file{configure} and @file{mungeconf}, are executable; use @samp{chmod +x}
-on them if necessary.  Then execute the following commands:
-
-@smallexample
-$ POSIX
-psx> configure vms-posix
-psx> make awktab.c gawk
-@end smallexample
-
-@noindent
-The first command will construct files @file{config.h} and @file{Makefile}
-out of templates.  The second command will compile and link @code{gawk}.
-Due to a @code{make} bug in VMS POSIX V1.0 and V1.1,
-the file @file{awktab.c} must be given as an explicit target or it will
-not be built and the final link step will fail.  Ignore the warning
-@samp{"Could not find lib m in lib list"}; it is harmless, caused by the
-explicit use of @samp{-lm} as a linker option which is not needed
-under VMS POSIX.  Under V1.1 (but not V1.0) a problem with the @code{yacc}
-skeleton @file{/etc/yyparse.c} will cause a compiler warning for
-@file{awktab.c}, followed by a linker warning about compilation warnings
-in the resulting object module.  These warnings can be ignored.@refill
-
-Once built, @code{gawk} will work like any other shell utility.  Unlike
-the normal VMS port of @code{gawk}, no special command line manipulation is
-needed in the VMS POSIX environment.
-
-@node MS-DOS Installation, Atari Installation, VMS Installation, Installation
-@section Installing @code{gawk} on MS-DOS
-
-@cindex installation, ms-dos
-The first step is to get all the files in the @code{gawk} distribution
-onto your PC.  Move all the files from the @file{pc} directory into
-the main directory where the other files are.  Edit the file
-@file{make.bat} so that it will be an acceptable MS-DOS batch file.
-This means making sure that all lines are terminated with the ASCII
-carriage return and line feed characters.
-
-@code{gawk} has only been compiled with version 5.1 of the Microsoft
-C compiler.  The file @file{make.bat} from the @file{pc} directory
-assumes that you have this compiler.
-
-Copy the file @file{setargv.obj} from the library directory where it
-resides to the @code{gawk} source code directory.
-
-Run @file{make.bat}.  This will compile @code{gawk} for you, and link it.
-That's all there is to it!
-
-@node Atari Installation, , MS-DOS Installation, Installation
-@section Installing @code{gawk} on the Atari ST
-
-@c based on material from
-@c Michal Jaegermann <ntomczak@vm.ucs.ualberta.ca>
-
-@cindex installation, atari
-This section assumes that you are running TOS.  It applies to other Atari
-models (STe, TT) as well.
-
-In order to use @code{gawk}, you need to have a shell, either text or
-graphics, that does not map all the characters of a command line to
-upper case.  Maintaining case distinction in option flags is very
-important (@pxref{Command Line, ,Invoking @code{awk}}).  Popular shells
-like @code{gulam} or @code{gemini} will work, as will newer versions of
-@code{desktop}.  Support for I/O redirection is necessary to make it easy
-to import @code{awk} programs from other environments.  Pipes are nice to have,
-but not vital.
-
-If you have received an executable version of @code{gawk}, place it,
-as usual, anywhere in your @code{PATH} where your shell will find it.
-
-While executing, @code{gawk} creates a number of temporary files.
-@code{gawk} looks for either of the environment variables @code{TEMP} -or @code{TMPDIR}, in that order. If either one is found, its value -is assumed to be a directory for temporary files. This directory -must exist, and if you can spare the memory, it is a good idea to -put it on a @sc{ram} drive. If neither @code{TEMP} nor @code{TMPDIR} -are found, then @code{gawk} uses the current directory for its -temporary files. - -The ST version of @code{gawk} searches for its program files as -described in @ref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}. -On the ST, the default value for the @code{AWKPATH} variable is -@code{@w{".,c:\lib\awk,c:\gnu\lib\awk"}}. -The search path can be modified by explicitly setting @code{AWKPATH} to -whatever you wish. Note that colons cannot be used on the ST to separate -elements in the @code{AWKPATH} variable, since they have another, reserved, -meaning. Instead, you must use a comma to separate elements in the path. -If you are recompiling @code{gawk} on the ST, then you can choose a new -default search path, by setting the value of @samp{DEFPATH} in the file -@file{...\config\atari}. You may choose a different separator character -by setting the value of @samp{ENVSEP} in the same file. The new values will -be used when creating the header file @file{config.h}.@refill - -@ignore -As a last resort, small -adjustments can be made directly on the executable version of @code{gawk} -using a binary editor.@refill -@end ignore - -Although @code{awk} allows great flexibility in doing I/O redirections -from within a program, this facility should be used with care on the ST. -In some circumstances the OS routines for file handle pool processing -lose track of certain events, causing the computer to crash, and requiring -a reboot. Often a warm reboot is sufficient. Fortunately, this happens -infrequently, and in rather esoteric situations. In particular, avoid -having one part of an @code{awk} program using @code{print} -statements explicitly redirected to @code{"/dev/stdout"}, while other -@code{print} statements use the default standard output, and a -calling shell has redirected standard output to a file.@refill -@c whew! - -When @code{gawk} is compiled with the ST version of @code{gcc} and its -usual libraries, it will accept both @samp{/} and @samp{\} as path separators. -While this is convenient, it should be remembered that this removes one, -technically legal, character (@samp{/}) from your file names, and that -it may create problems for external programs, called via the @code{system()} -function, which may not support this convention. Whenever it is possible -that a file created by @code{gawk} will be used by some other program, -use only backslashes. Also remember that in @code{awk}, backslashes in -strings have to be doubled in order to get literal backslashes. - -The initial port of @code{gawk} to the ST was done with @code{gcc}. -If you wish to recompile @code{gawk} from scratch, you will need to use -a compiler that accepts @sc{ansi} standard C (such as @code{gcc}, Turbo C, -or Prospero C). If @code{sizeof(int) != @w{sizeof(int *)}}, the correctness -of the generated code depends heavily on the fact that all function calls -have function prototypes in the current scope. If your compiler does -not accept function prototypes, you will probably have to add a -number of casts to the code.@refill - -If you are using @code{gcc}, make sure that you have up-to-date libraries. 
-Older versions have problems with some library functions (@code{atan2()}, -@code{strftime()}, the @samp{%g} conversion in @code{sprintf()}) which -may affect the operation of @code{gawk}. - -In the @file{atari} subdirectory of the @code{gawk} distribution is -a version of the @code{system()} function that has been tested with -@code{gulam} and @code{msh}; it should work with other shells as well. -With @code{gulam}, it passes the string to be executed without spawning -an extra copy of a shell. It is possible to replace this version of -@code{system()} with a similar function from a library or from some other -source if that version would be a better choice for the shell you prefer. - -The files needed to recompile @code{gawk} on the ST can be found in -the @file{atari} directory. The provided files and instructions below -assume that you have the GNU C compiler (@code{gcc}), the @code{gulam} shell, -and an ST version of @code{sed}. The @file{Makefile} is set up to use -@file{byacc} as a @file{yacc} replacement. With a different set of tools some -adjustments and/or editing will be needed.@refill - -@code{cd} to the @file{atari} directory. Copy @file{Makefile.st} to -@file{makefile} in the source (parent) directory. Possibly adjust -@file{../config/atari} to suit your system. Execute the script @file{mkconf.g} -which will create the header file @file{../config.h}. Go back to the source -directory. If you are not using @code{gcc}, check the file @file{missing.c}. -It may be necessary to change forward slashes in the references to files -from the @file{atari} subdirectory into backslashes. Type @code{make} and -enjoy.@refill - -Compilation with @code{gcc} of some of the bigger modules, like -@file{awk_tab.c}, may require a full four megabytes of memory. On smaller -machines you would need to cut down on optimizations, or you would have to -switch to another, less memory hungry, compiler.@refill - -@node Gawk Summary, Sample Program, Installation, Top -@appendix @code{gawk} Summary - -This appendix provides a brief summary of the @code{gawk} command line and the -@code{awk} language. It is designed to serve as ``quick reference.'' It is -therefore terse, but complete. - -@menu -* Command Line Summary:: Recapitulation of the command line. -* Language Summary:: A terse review of the language. -* Variables/Fields:: Variables, fields, and arrays. -* Rules Summary:: Patterns and Actions, and their - component parts. -* Functions Summary:: Defining and calling functions. -* Historical Features:: Some undocumented but supported ``features''. -@end menu - -@node Command Line Summary, Language Summary, Gawk Summary, Gawk Summary -@appendixsec Command Line Options Summary - -The command line consists of options to @code{gawk} itself, the -@code{awk} program text (if not supplied via the @samp{-f} option), and -values to be made available in the @code{ARGC} and @code{ARGV} -predefined @code{awk} variables: - -@example -awk @r{[@var{POSIX or GNU style options}]} -f source-file @r{[@code{--}]} @var{file} @dots{} -awk @r{[@var{POSIX or GNU style options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{} -@end example - -The options that @code{gawk} accepts are: - -@table @code -@item -F @var{fs} -@itemx --field-separator=@var{fs} -Use @var{fs} for the input field separator (the value of the @code{FS} -predefined variable). 
- -@item -f @var{program-file} -@itemx --file=@var{program-file} -Read the @code{awk} program source from the file @var{program-file}, instead -of from the first command line argument. - -@item -v @var{var}=@var{val} -@itemx --assign=@var{var}=@var{val} -Assign the variable @var{var} the value @var{val} before program execution -begins. - -@item -W compat -@itemx --compat -Specifies compatibility mode, in which @code{gawk} extensions are turned -off. - -@item -W copyleft -@itemx -W copyright -@itemx --copyleft -@itemx --copyright -Print the short version of the General Public License on the error -output. This option may disappear in a future version of @code{gawk}. - -@item -W help -@itemx -W usage -@itemx --help -@itemx --usage -Print a relatively short summary of the available options on the error output. - -@item -W lint -@itemx --lint -Give warnings about dubious or non-portable @code{awk} constructs. - -@item -W posix -@itemx --posix -Specifies @sc{posix} compatibility mode, in which @code{gawk} extensions -are turned off and additional restrictions apply. - -@item -W source=@var{program-text} -@itemx --source=@var{program-text} -Use @var{program-text} as @code{awk} program source code. This option allows -mixing command line source code with source code from files, and is -particularly useful for mixing command line programs with library functions. - -@item -W version -@itemx --version -Print version information for this particular copy of @code{gawk} on the error -output. This option may disappear in a future version of @code{gawk}. - -@item -- -Signal the end of options. This is useful to allow further arguments to the -@code{awk} program itself to start with a @samp{-}. This is mainly for -consistency with the argument parsing conventions of @sc{posix}. -@end table - -Any other options are flagged as invalid, but are otherwise ignored. -@xref{Command Line, ,Invoking @code{awk}}, for more details. - -@node Language Summary, Variables/Fields, Command Line Summary, Gawk Summary -@appendixsec Language Summary - -An @code{awk} program consists of a sequence of pattern-action statements -and optional function definitions. - -@example -@var{pattern} @{ @var{action statements} @} - -function @var{name}(@var{parameter list}) @{ @var{action statements} @} -@end example - -@code{gawk} first reads the program source from the -@var{program-file}(s) if specified, or from the first non-option -argument on the command line. The @samp{-f} option may be used multiple -times on the command line. @code{gawk} reads the program text from all -the @var{program-file} files, effectively concatenating them in the -order they are specified. This is useful for building libraries of -@code{awk} functions, without having to include them in each new -@code{awk} program that uses them. To use a library function in a file -from a program typed in on the command line, specify @samp{-f /dev/tty}; -then type your program, and end it with a @kbd{Control-d}. -@xref{Command Line, ,Invoking @code{awk}}.@refill - -The environment variable @code{AWKPATH} specifies a search path to use -when finding source files named with the @samp{-f} option. The default -path, which is -@samp{.:/usr/lib/awk:/usr/local/lib/awk} is used if @code{AWKPATH} is not set. -If a file name given to the @samp{-f} option contains a @samp{/} character, -no path search is performed. 
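-
-For instance (the file names here are only placeholders), a program can
-be split between a library of functions and a main file, with both
-located via the search path just described:
-
-@example
-gawk -f getopt.awk -f myprog.awk datafile
-@end example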
-@xref{AWKPATH Variable, ,The @code{AWKPATH} Environment Variable}, -for a full description of the @code{AWKPATH} environment variable.@refill - -@code{gawk} compiles the program into an internal form, and then proceeds to -read each file named in the @code{ARGV} array. If there are no files named -on the command line, @code{gawk} reads the standard input. - -If a ``file'' named on the command line has the form -@samp{@var{var}=@var{val}}, it is treated as a variable assignment: the -variable @var{var} is assigned the value @var{val}. -If any of the files have a value that is the null string, that -element in the list is skipped.@refill - -For each line in the input, @code{gawk} tests to see if it matches any -@var{pattern} in the @code{awk} program. For each pattern that the line -matches, the associated @var{action} is executed. - -@node Variables/Fields, Rules Summary, Language Summary, Gawk Summary -@appendixsec Variables and Fields - -@code{awk} variables are dynamic; they come into existence when they are -first used. Their values are either floating-point numbers or strings. -@code{awk} also has one-dimension arrays; multiple-dimensional arrays -may be simulated. There are several predefined variables that -@code{awk} sets as a program runs; these are summarized below. - -@menu -* Fields Summary:: Input field splitting. -* Built-in Summary:: @code{awk}'s built-in variables. -* Arrays Summary:: Using arrays. -* Data Type Summary:: Values in @code{awk} are numbers or strings. -@end menu - -@node Fields Summary, Built-in Summary, Variables/Fields, Variables/Fields -@appendixsubsec Fields - -As each input line is read, @code{gawk} splits the line into -@var{fields}, using the value of the @code{FS} variable as the field -separator. If @code{FS} is a single character, fields are separated by -that character. Otherwise, @code{FS} is expected to be a full regular -expression. In the special case that @code{FS} is a single blank, -fields are separated by runs of blanks and/or tabs. Note that the value -of @code{IGNORECASE} (@pxref{Case-sensitivity, ,Case-sensitivity in Matching}) -also affects how fields are split when @code{FS} is a regular expression.@refill - -Each field in the input line may be referenced by its position, @code{$1}, -@code{$2}, and so on. @code{$0} is the whole line. The value of a field may -be assigned to as well. Field numbers need not be constants: - -@example -n = 5 -print $n -@end example - -@noindent -prints the fifth field in the input line. The variable @code{NF} is set to -the total number of fields in the input line. - -References to nonexistent fields (i.e., fields after @code{$NF}) return -the null-string. However, assigning to a nonexistent field (e.g., -@code{$(NF+2) = 5}) increases the value of @code{NF}, creates any -intervening fields with the null string as their value, and causes the -value of @code{$0} to be recomputed, with the fields being separated by -the value of @code{OFS}.@refill - -@xref{Reading Files, ,Reading Input Files}, for a full description of the -way @code{awk} defines and uses fields. - -@node Built-in Summary, Arrays Summary, Fields Summary, Variables/Fields -@appendixsubsec Built-in Variables - -@code{awk}'s built-in variables are: - -@table @code -@item ARGC -The number of command line arguments (not including options or the -@code{awk} program itself). - -@item ARGIND -The index in @code{ARGV} of the current file being processed. -It is always true that @samp{FILENAME == ARGV[ARGIND]}. 
- -@item ARGV -The array of command line arguments. The array is indexed from 0 to -@code{ARGC} @minus{} 1. Dynamically changing the contents of @code{ARGV} -can control the files used for data.@refill - -@item CONVFMT -The conversion format to use when converting numbers to strings. - -@item FIELDWIDTHS -A space separated list of numbers describing the fixed-width input data. - -@item ENVIRON -An array containing the values of the environment variables. The array -is indexed by variable name, each element being the value of that -variable. Thus, the environment variable @code{HOME} would be in -@code{ENVIRON["HOME"]}. Its value might be @file{/u/close}. - -Changing this array does not affect the environment seen by programs -which @code{gawk} spawns via redirection or the @code{system} function. -(This may change in a future version of @code{gawk}.) - -Some operating systems do not have environment variables. -The array @code{ENVIRON} is empty when running on these systems. - -@item ERRNO -The system error message when an error occurs using @code{getline} -or @code{close}. - -@item FILENAME -The name of the current input file. If no files are specified on the command -line, the value of @code{FILENAME} is @samp{-}. - -@item FNR -The input record number in the current input file. - -@item FS -The input field separator, a blank by default. - -@item IGNORECASE -The case-sensitivity flag for regular expression operations. If -@code{IGNORECASE} has a nonzero value, then pattern matching in rules, -field splitting with @code{FS}, regular expression matching with -@samp{~} and @samp{!~}, and the @code{gsub}, @code{index}, @code{match}, -@code{split} and @code{sub} predefined functions all ignore case -when doing regular expression operations.@refill - -@item NF -The number of fields in the current input record. - -@item NR -The total number of input records seen so far. - -@item OFMT -The output format for numbers for the @code{print} statement, -@code{"%.6g"} by default. - -@item OFS -The output field separator, a blank by default. - -@item ORS -The output record separator, by default a newline. - -@item RS -The input record separator, by default a newline. @code{RS} is exceptional -in that only the first character of its string value is used for separating -records. If @code{RS} is set to the null string, then records are separated by -blank lines. When @code{RS} is set to the null string, then the newline -character always acts as a field separator, in addition to whatever value -@code{FS} may have.@refill - -@item RSTART -The index of the first character matched by @code{match}; 0 if no match. - -@item RLENGTH -The length of the string matched by @code{match}; @minus{}1 if no match. - -@item SUBSEP -The string used to separate multiple subscripts in array elements, by -default @code{"\034"}. -@end table - -@xref{Built-in Variables}, for more information. - -@node Arrays Summary, Data Type Summary, Built-in Summary, Variables/Fields -@appendixsubsec Arrays - -Arrays are subscripted with an expression between square brackets -(@samp{[} and @samp{]}). 
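-For instance (the names here are arbitrary):
-
-@example
-population["Alaska"] = 550000
-count[NR] = $1
-@end example
-
-@noindent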
Array subscripts are @emph{always} strings; -numbers are converted to strings as necessary, following the standard -conversion rules -(@pxref{Conversion, ,Conversion of Strings and Numbers}).@refill - -If you use multiple expressions separated by commas inside the square -brackets, then the array subscript is a string consisting of the -concatenation of the individual subscript values, converted to strings, -separated by the subscript separator (the value of @code{SUBSEP}). - -The special operator @code{in} may be used in an @code{if} or -@code{while} statement to see if an array has an index consisting of a -particular value. - -@example -if (val in array) - print array[val] -@end example - -If the array has multiple subscripts, use @code{(i, j, @dots{}) in array} -to test for existence of an element. - -The @code{in} construct may also be used in a @code{for} loop to iterate -over all the elements of an array. -@xref{Scanning an Array, ,Scanning all Elements of an Array}.@refill - -An element may be deleted from an array using the @code{delete} statement. - -@xref{Arrays, ,Arrays in @code{awk}}, for more detailed information. - -@node Data Type Summary, , Arrays Summary, Variables/Fields -@appendixsubsec Data Types - -The value of an @code{awk} expression is always either a number -or a string. - -Certain contexts (such as arithmetic operators) require numeric -values. They convert strings to numbers by interpreting the text -of the string as a numeral. If the string does not look like a -numeral, it converts to 0. - -Certain contexts (such as concatenation) require string values. -They convert numbers to strings by effectively printing them -with @code{sprintf}. -@xref{Conversion, ,Conversion of Strings and Numbers}, for the details.@refill - -To force conversion of a string value to a number, simply add 0 -to it. If the value you start with is already a number, this -does not change it. - -To force conversion of a numeric value to a string, concatenate it with -the null string. - -The @code{awk} language defines comparisons as being done numerically if -both operands are numeric, or if one is numeric and the other is a numeric -string. Otherwise one or both operands are converted to strings and a -string comparison is performed. - -Uninitialized variables have the string value @code{""} (the null, or -empty, string). In contexts where a number is required, this is -equivalent to 0. - -@xref{Variables}, for more information on variable naming and initialization; -@pxref{Conversion, ,Conversion of Strings and Numbers}, for more information -on how variable values are interpreted.@refill - -@node Rules Summary, Functions Summary, Variables/Fields, Gawk Summary -@appendixsec Patterns and Actions - -@menu -* Pattern Summary:: Quick overview of patterns. -* Regexp Summary:: Quick overview of regular expressions. -* Actions Summary:: Quick overview of actions. -@end menu - -An @code{awk} program is mostly composed of rules, each consisting of a -pattern followed by an action. The action is enclosed in @samp{@{} and -@samp{@}}. Either the pattern may be missing, or the action may be -missing, but, of course, not both. If the pattern is missing, the -action is executed for every single line of input. A missing action is -equivalent to this action, - -@example -@{ print @} -@end example - -@noindent -which prints the entire line. - -Comments begin with the @samp{#} character, and continue until the end of the -line. Blank lines may be used to separate statements. 
Normally, a statement -ends with a newline, however, this is not the case for lines ending in a -@samp{,}, @samp{@{}, @samp{?}, @samp{:}, @samp{&&}, or @samp{||}. Lines -ending in @code{do} or @code{else} also have their statements automatically -continued on the following line. In other cases, a line can be continued by -ending it with a @samp{\}, in which case the newline is ignored.@refill - -Multiple statements may be put on one line by separating them with a @samp{;}. -This applies to both the statements within the action part of a rule (the -usual case), and to the rule statements. - -@xref{Comments, ,Comments in @code{awk} Programs}, for information on -@code{awk}'s commenting convention; -@pxref{Statements/Lines, ,@code{awk} Statements versus Lines}, for a -description of the line continuation mechanism in @code{awk}.@refill - -@node Pattern Summary, Regexp Summary, Rules Summary, Rules Summary -@appendixsubsec Patterns - -@code{awk} patterns may be one of the following: - -@example -/@var{regular expression}/ -@var{relational expression} -@var{pattern} && @var{pattern} -@var{pattern} || @var{pattern} -@var{pattern} ? @var{pattern} : @var{pattern} -(@var{pattern}) -! @var{pattern} -@var{pattern1}, @var{pattern2} -BEGIN -END -@end example - -@code{BEGIN} and @code{END} are two special kinds of patterns that are not -tested against the input. The action parts of all @code{BEGIN} rules are -merged as if all the statements had been written in a single @code{BEGIN} -rule. They are executed before any of the input is read. Similarly, all the -@code{END} rules are merged, and executed when all the input is exhausted (or -when an @code{exit} statement is executed). @code{BEGIN} and @code{END} -patterns cannot be combined with other patterns in pattern expressions. -@code{BEGIN} and @code{END} rules cannot have missing action parts.@refill - -For @samp{/@var{regular-expression}/} patterns, the associated statement is -executed for each input line that matches the regular expression. Regular -expressions are extensions of those in @code{egrep}, and are summarized below. - -A @var{relational expression} may use any of the operators defined below in -the section on actions. These generally test whether certain fields match -certain regular expressions. - -The @samp{&&}, @samp{||}, and @samp{!} operators are logical ``and,'' -logical ``or,'' and logical ``not,'' respectively, as in C. They do -short-circuit evaluation, also as in C, and are used for combining more -primitive pattern expressions. As in most languages, parentheses may be -used to change the order of evaluation. - -The @samp{?:} operator is like the same operator in C. If the first -pattern matches, then the second pattern is matched against the input -record; otherwise, the third is matched. Only one of the second and -third patterns is matched. - -The @samp{@var{pattern1}, @var{pattern2}} form of a pattern is called a -range pattern. It matches all input lines starting with a line that -matches @var{pattern1}, and continuing until a line that matches -@var{pattern2}, inclusive. A range pattern cannot be used as an operand -to any of the pattern operators. - -@xref{Patterns}, for a full description of the pattern part of @code{awk} -rules. - -@node Regexp Summary, Actions Summary, Pattern Summary, Rules Summary -@appendixsubsec Regular Expressions - -Regular expressions are the extended kind found in @code{egrep}. 
-They are composed of characters as follows: - -@table @code -@item @var{c} -matches the character @var{c} (assuming @var{c} is a character with no -special meaning in regexps). - -@item \@var{c} -matches the literal character @var{c}. - -@item . -matches any character except newline. - -@item ^ -matches the beginning of a line or a string. - -@item $ -matches the end of a line or a string. - -@item [@var{abc}@dots{}] -matches any of the characters @var{abc}@dots{} (character class). - -@item [^@var{abc}@dots{}] -matches any character except @var{abc}@dots{} and newline (negated -character class). - -@item @var{r1}|@var{r2} -matches either @var{r1} or @var{r2} (alternation). - -@item @var{r1r2} -matches @var{r1}, and then @var{r2} (concatenation). - -@item @var{r}+ -matches one or more @var{r}'s. - -@item @var{r}* -matches zero or more @var{r}'s. - -@item @var{r}? -matches zero or one @var{r}'s. - -@item (@var{r}) -matches @var{r} (grouping). -@end table - -@xref{Regexp, ,Regular Expressions as Patterns}, for a more detailed -explanation of regular expressions. - -The escape sequences allowed in string constants are also valid in -regular expressions (@pxref{Constants, ,Constant Expressions}). - -@node Actions Summary, , Regexp Summary, Rules Summary -@appendixsubsec Actions - -Action statements are enclosed in braces, @samp{@{} and @samp{@}}. -Action statements consist of the usual assignment, conditional, and looping -statements found in most languages. The operators, control statements, -and input/output statements available are patterned after those in C. - -@menu -* Operator Summary:: @code{awk} operators. -* Control Flow Summary:: The control statements. -* I/O Summary:: The I/O statements. -* Printf Summary:: A summary of @code{printf}. -* Special File Summary:: Special file names interpreted internally. -* Numeric Functions Summary:: Built-in numeric functions. -* String Functions Summary:: Built-in string functions. -* Time Functions Summary:: Built-in time functions. -* String Constants Summary:: Escape sequences in strings. -@end menu - -@node Operator Summary, Control Flow Summary, Actions Summary, Actions Summary -@appendixsubsubsec Operators - -The operators in @code{awk}, in order of increasing precedence, are: - -@table @code -@item = += -= *= /= %= ^= -Assignment. Both absolute assignment (@code{@var{var}=@var{value}}) -and operator assignment (the other forms) are supported. - -@item ?: -A conditional expression, as in C. This has the form @code{@var{expr1} ? -@var{expr2} : @var{expr3}}. If @var{expr1} is true, the value of the -expression is @var{expr2}; otherwise it is @var{expr3}. Only one of -@var{expr2} and @var{expr3} is evaluated.@refill - -@item || -Logical ``or''. - -@item && -Logical ``and''. - -@item ~ !~ -Regular expression match, negated match. - -@item < <= > >= != == -The usual relational operators. - -@item @var{blank} -String concatenation. - -@item + - -Addition and subtraction. - -@item * / % -Multiplication, division, and modulus. - -@item + - ! -Unary plus, unary minus, and logical negation. - -@item ^ -Exponentiation (@samp{**} may also be used, and @samp{**=} for the assignment -operator, but they are not specified in the @sc{posix} standard). - -@item ++ -- -Increment and decrement, both prefix and postfix. - -@item $ -Field reference. -@end table - -@xref{Expressions, ,Expressions as Action Statements}, for a full -description of all the operators listed above. 
-
-@xref{Fields, ,Examining Fields}, for a description of the field
-reference operator.@refill
-
-@node Control Flow Summary, I/O Summary, Operator Summary, Actions Summary
-@appendixsubsubsec Control Statements
-
-The control statements are as follows:
-
-@example
-if (@var{condition}) @var{statement} @r{[} else @var{statement} @r{]}
-while (@var{condition}) @var{statement}
-do @var{statement} while (@var{condition})
-for (@var{expr1}; @var{expr2}; @var{expr3}) @var{statement}
-for (@var{var} in @var{array}) @var{statement}
-break
-continue
-delete @var{array}[@var{index}]
-exit @r{[} @var{expression} @r{]}
-@{ @var{statements} @}
-@end example
-
-@xref{Statements, ,Control Statements in Actions}, for a full description
-of all the control statements listed above.
-
-@node I/O Summary, Printf Summary, Control Flow Summary, Actions Summary
-@appendixsubsubsec I/O Statements
-
-The input/output statements are as follows:
-
-@table @code
-@item getline
-Set @code{$0} from next input record; set @code{NF}, @code{NR}, @code{FNR}.
-
-@item getline <@var{file}
-Set @code{$0} from next record of @var{file}; set @code{NF}.
-
-@item getline @var{var}
-Set @var{var} from next input record; set @code{NF}, @code{FNR}.
-
-@item getline @var{var} <@var{file}
-Set @var{var} from next record of @var{file}.
-
-@item next
-Stop processing the current input record.  The next input record is read and
-processing starts over with the first pattern in the @code{awk} program.
-If the end of the input data is reached, the @code{END} rule(s), if any,
-are executed.
-
-@item next file
-Stop processing the current input file.  The next input record read comes
-from the next input file.  @code{FILENAME} is updated, @code{FNR} is set to 1,
-and processing starts over with the first pattern in the @code{awk} program.
-If the end of the input data is reached, the @code{END} rule(s), if any,
-are executed.
-
-@item print
-Prints the current record.
-
-@item print @var{expr-list}
-Prints expressions.
-
-@item print @var{expr-list} > @var{file}
-Prints expressions on @var{file}.
-
-@item printf @var{fmt, expr-list}
-Format and print.
-
-@item printf @var{fmt, expr-list} > @var{file}
-Format and print on @var{file}.
-@end table
-
-Other input/output redirections are also allowed.  For @code{print} and
-@code{printf}, @samp{>> @var{file}} appends output to the @var{file},
-and @samp{| @var{command}} writes on a pipe.  In a similar fashion,
-@samp{@var{command} | getline} pipes input into @code{getline}.
-@code{getline} returns 0 on end of file, and @minus{}1 on an error.@refill
-
-@xref{Getline, ,Explicit Input with @code{getline}}, for a full description
-of the @code{getline} statement.
-@xref{Printing, ,Printing Output}, for a full description of @code{print} and
-@code{printf}.  Finally, @pxref{Next Statement, ,The @code{next} Statement},
-for a description of how the @code{next} statement works.@refill
-
-@node Printf Summary, Special File Summary, I/O Summary, Actions Summary
-@appendixsubsubsec @code{printf} Summary
-
-The @code{awk} @code{printf} statement and @code{sprintf} function
-accept the following conversion specification formats:
-
-@table @code
-@item %c
-An ASCII character.  If the argument used for @samp{%c} is numeric, it is
-treated as a character and printed.  Otherwise, the argument is assumed to
-be a string, and only the first character of that string is printed.
-
-@item %d
-@itemx %i
-A decimal number (the integer part).
- -@item %e -A floating point number of the form -@samp{@r{[}-@r{]}d.ddddddE@r{[}+-@r{]}dd}.@refill - -@item %f -A floating point number of the form -@r{[}@code{-}@r{]}@code{ddd.dddddd}. - -@item %g -Use @samp{%e} or @samp{%f} conversion, whichever produces a shorter string, -with nonsignificant zeros suppressed. - -@item %o -An unsigned octal number (again, an integer). - -@item %s -A character string. - -@item %x -An unsigned hexadecimal number (an integer). - -@item %X -Like @samp{%x}, except use @samp{A} through @samp{F} instead of @samp{a} -through @samp{f} for decimal 10 through 15.@refill - -@item %% -A single @samp{%} character; no argument is converted. -@end table - -There are optional, additional parameters that may lie between the @samp{%} -and the control letter: - -@table @code -@item - -The expression should be left-justified within its field. - -@item @var{width} -The field should be padded to this width. If @var{width} has a leading zero, -then the field is padded with zeros. Otherwise it is padded with blanks. - -@item .@var{prec} -A number indicating the maximum width of strings or digits to the right -of the decimal point. -@end table - -Either or both of the @var{width} and @var{prec} values may be specified -as @samp{*}. In that case, the particular value is taken from the argument -list. - -@xref{Printf, ,Using @code{printf} Statements for Fancier Printing}, for -examples and for a more detailed description. - -@node Special File Summary, Numeric Functions Summary, Printf Summary, Actions Summary -@appendixsubsubsec Special File Names - -When doing I/O redirection from either @code{print} or @code{printf} into a -file, or via @code{getline} from a file, @code{gawk} recognizes certain special -file names internally. These file names allow access to open file descriptors -inherited from @code{gawk}'s parent process (usually the shell). The -file names are: - -@table @file -@item /dev/stdin -The standard input. - -@item /dev/stdout -The standard output. - -@item /dev/stderr -The standard error output. - -@item /dev/fd/@var{n} -The file denoted by the open file descriptor @var{n}. -@end table - -In addition the following files provide process related information -about the running @code{gawk} program. - -@table @file -@item /dev/pid -Reading this file returns the process ID of the current process, -in decimal, terminated with a newline. - -@item /dev/ppid -Reading this file returns the parent process ID of the current process, -in decimal, terminated with a newline. - -@item /dev/pgrpid -Reading this file returns the process group ID of the current process, -in decimal, terminated with a newline. - -@item /dev/user -Reading this file returns a single record terminated with a newline. -The fields are separated with blanks. The fields represent the -following information: - -@table @code -@item $1 -The value of the @code{getuid} system call. - -@item $2 -The value of the @code{geteuid} system call. - -@item $3 -The value of the @code{getgid} system call. - -@item $4 -The value of the @code{getegid} system call. -@end table - -If there are any additional fields, they are the group IDs returned by -@code{getgroups} system call. -(Multiple groups may not be supported on all systems.)@refill -@end table - -@noindent -These file names may also be used on the command line to name data files. -These file names are only recognized internally if you do not -actually have files by these names on your system. 
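-
-As a minimal illustration (the message text is of no significance), the
-following one-liner reads the standard input and reports empty records
-on the standard error output:
-
-@example
-gawk '@{ if ($0 == "") print NR ": empty record" > "/dev/stderr" @}' /dev/stdin
-@end example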
- -@xref{Special Files, ,Standard I/O Streams}, for a longer description that -provides the motivation for this feature. - -@node Numeric Functions Summary, String Functions Summary, Special File Summary, Actions Summary -@appendixsubsubsec Numeric Functions - -@code{awk} has the following predefined arithmetic functions: - -@table @code -@item atan2(@var{y}, @var{x}) -returns the arctangent of @var{y/x} in radians. - -@item cos(@var{expr}) -returns the cosine in radians. - -@item exp(@var{expr}) -the exponential function. - -@item int(@var{expr}) -truncates to integer. - -@item log(@var{expr}) -the natural logarithm function. - -@item rand() -returns a random number between 0 and 1. - -@item sin(@var{expr}) -returns the sine in radians. - -@item sqrt(@var{expr}) -the square root function. - -@item srand(@var{expr}) -use @var{expr} as a new seed for the random number generator. If no @var{expr} -is provided, the time of day is used. The return value is the previous -seed for the random number generator. -@end table - -@node String Functions Summary, Time Functions Summary, Numeric Functions Summary, Actions Summary -@appendixsubsubsec String Functions - -@code{awk} has the following predefined string functions: - -@table @code -@item gsub(@var{r}, @var{s}, @var{t}) -for each substring matching the regular expression @var{r} in the string -@var{t}, substitute the string @var{s}, and return the number of substitutions. -If @var{t} is not supplied, use @code{$0}. - -@item index(@var{s}, @var{t}) -returns the index of the string @var{t} in the string @var{s}, or 0 if -@var{t} is not present. - -@item length(@var{s}) -returns the length of the string @var{s}. The length of @code{$0} -is returned if no argument is supplied. - -@item match(@var{s}, @var{r}) -returns the position in @var{s} where the regular expression @var{r} -occurs, or 0 if @var{r} is not present, and sets the values of @code{RSTART} -and @code{RLENGTH}. - -@item split(@var{s}, @var{a}, @var{r}) -splits the string @var{s} into the array @var{a} on the regular expression -@var{r}, and returns the number of fields. If @var{r} is omitted, @code{FS} -is used instead. - -@item sprintf(@var{fmt}, @var{expr-list}) -prints @var{expr-list} according to @var{fmt}, and returns the resulting string. - -@item sub(@var{r}, @var{s}, @var{t}) -this is just like @code{gsub}, but only the first matching substring is -replaced. - -@item substr(@var{s}, @var{i}, @var{n}) -returns the @var{n}-character substring of @var{s} starting at @var{i}. -If @var{n} is omitted, the rest of @var{s} is used. - -@item tolower(@var{str}) -returns a copy of the string @var{str}, with all the upper-case characters in -@var{str} translated to their corresponding lower-case counterparts. -Nonalphabetic characters are left unchanged. - -@item toupper(@var{str}) -returns a copy of the string @var{str}, with all the lower-case characters in -@var{str} translated to their corresponding upper-case counterparts. -Nonalphabetic characters are left unchanged. - -@item system(@var{cmd-line}) -Execute the command @var{cmd-line}, and return the exit status. -@end table - -@node Time Functions Summary, String Constants Summary, String Functions Summary, Actions Summary -@appendixsubsubsec Built-in time functions - -The following two functions are available for getting the current -time of day, and for formatting time stamps. 
- -@table @code -@item systime() -returns the current time of day as the number of seconds since a particular -epoch (Midnight, January 1, 1970 @sc{utc}, on @sc{posix} systems). - -@item strftime(@var{format}, @var{timestamp}) -formats @var{timestamp} according to the specification in @var{format}. -The current time of day is used if no @var{timestamp} is supplied. -@xref{Time Functions, ,Functions for Dealing with Time Stamps}, for the -details on the conversion specifiers that @code{strftime} accepts.@refill -@end table - -@iftex -@xref{Built-in, ,Built-in Functions}, for a description of all of -@code{awk}'s built-in functions. -@end iftex - -@node String Constants Summary, , Time Functions Summary, Actions Summary -@appendixsubsubsec String Constants - -String constants in @code{awk} are sequences of characters enclosed -between double quotes (@code{"}). Within strings, certain @dfn{escape sequences} -are recognized, as in C. These are: - -@table @code -@item \\ -A literal backslash. - -@item \a -The ``alert'' character; usually the ASCII BEL character. - -@item \b -Backspace. - -@item \f -Formfeed. - -@item \n -Newline. - -@item \r -Carriage return. - -@item \t -Horizontal tab. - -@item \v -Vertical tab. - -@item \x@var{hex digits} -The character represented by the string of hexadecimal digits following -the @samp{\x}. As in @sc{ansi} C, all following hexadecimal digits are -considered part of the escape sequence. (This feature should tell us -something about language design by committee.) E.g., @code{"\x1B"} is a -string containing the ASCII ESC (escape) character. (The @samp{\x} -escape sequence is not in @sc{posix} @code{awk}.) - -@item \@var{ddd} -The character represented by the 1-, 2-, or 3-digit sequence of octal -digits. Thus, @code{"\033"} is also a string containing the ASCII ESC -(escape) character. - -@item \@var{c} -The literal character @var{c}. -@end table - -The escape sequences may also be used inside constant regular expressions -(e.g., the regexp @code{@w{/[@ \t\f\n\r\v]/}} matches whitespace -characters).@refill - -@xref{Constants, ,Constant Expressions}. - -@node Functions Summary, Historical Features, Rules Summary, Gawk Summary -@appendixsec Functions - -Functions in @code{awk} are defined as follows: - -@example -function @var{name}(@var{parameter list}) @{ @var{statements} @} -@end example - -Actual parameters supplied in the function call are used to instantiate -the formal parameters declared in the function. Arrays are passed by -reference, other variables are passed by value. - -If there are fewer arguments passed than there are names in @var{parameter-list}, -the extra names are given the null string as value. Extra names have the -effect of local variables. - -The open-parenthesis in a function call of a user-defined function must -immediately follow the function name, without any intervening white space. -This is to avoid a syntactic ambiguity with the concatenation operator. - -The word @code{func} may be used in place of @code{function} (but not in -@sc{posix} @code{awk}). - -Use the @code{return} statement to return a value from a function. - -@xref{User-defined, ,User-defined Functions}, for a more complete description. - -@node Historical Features, , Functions Summary, Gawk Summary -@appendixsec Historical Features - -There are two features of historical @code{awk} implementations that -@code{gawk} supports. First, it is possible to call the @code{length} -built-in function not only with no arguments, but even without parentheses! 
- -@example -a = length -@end example - -@noindent -is the same as either of - -@example -a = length() -a = length($0) -@end example - -@noindent -This feature is marked as ``deprecated'' in the @sc{posix} standard, and -@code{gawk} will issue a warning about its use if @samp{-W lint} is -specified on the command line. - -The other feature is the use of the @code{continue} statement outside the -body of a @code{while}, @code{for}, or @code{do} loop. Traditional -@code{awk} implementations have treated such usage as equivalent to the -@code{next} statement. @code{gawk} will support this usage if @samp{-W posix} -has not been specified. - -@node Sample Program, Bugs, Gawk Summary, Top -@appendix Sample Program - -The following example is a complete @code{awk} program, which prints -the number of occurrences of each word in its input. It illustrates the -associative nature of @code{awk} arrays by using strings as subscripts. It -also demonstrates the @samp{for @var{x} in @var{array}} construction. -Finally, it shows how @code{awk} can be used in conjunction with other -utility programs to do a useful task of some complexity with a minimum of -effort. Some explanations follow the program listing.@refill - -@example -awk ' -# Print list of word frequencies -@{ - for (i = 1; i <= NF; i++) - freq[$i]++ -@} - -END @{ - for (word in freq) - printf "%s\t%d\n", word, freq[word] -@}' -@end example - -The first thing to notice about this program is that it has two rules. The -first rule, because it has an empty pattern, is executed on every line of -the input. It uses @code{awk}'s field-accessing mechanism -(@pxref{Fields, ,Examining Fields}) to pick out the individual words from -the line, and the built-in variable @code{NF} (@pxref{Built-in Variables}) -to know how many fields are available.@refill - -For each input word, an element of the array @code{freq} is incremented to -reflect that the word has been seen an additional time.@refill - -The second rule, because it has the pattern @code{END}, is not executed -until the input has been exhausted. It prints out the contents of the -@code{freq} table that has been built up inside the first action.@refill - -Note that this program has several problems that would prevent it from being -useful by itself on real text files:@refill - -@itemize @bullet -@item -Words are detected using the @code{awk} convention that fields are -separated by whitespace and that other characters in the input (except -newlines) don't have any special meaning to @code{awk}. This means that -punctuation characters count as part of words.@refill - -@item -The @code{awk} language considers upper and lower case characters to be -distinct. Therefore, @samp{foo} and @samp{Foo} are not treated by this -program as the same word. This is undesirable since in normal text, words -are capitalized if they begin sentences, and a frequency analyzer should not -be sensitive to that.@refill - -@item -The output does not come out in any useful order. You're more likely to be -interested in which words occur most frequently, or having an alphabetized -table of how frequently each word occurs.@refill -@end itemize - -The way to solve these problems is to use some of the more advanced -features of the @code{awk} language. First, we use @code{tolower} to remove -case distinctions. Next, we use @code{gsub} to remove punctuation -characters. Finally, we use the system @code{sort} utility to process the -output of the @code{awk} script. 
First, here is the new version of -the program:@refill - -@example -awk ' -# Print list of word frequencies -@{ - $0 = tolower($0) # remove case distinctions - gsub(/[^a-z0-9_ \t]/, "", $0) # remove punctuation - for (i = 1; i <= NF; i++) - freq[$i]++ -@} - -END @{ - for (word in freq) - printf "%s\t%d\n", word, freq[word] -@}' -@end example - -Assuming we have saved this program in a file named @file{frequency.awk}, -and that the data is in @file{file1}, the following pipeline - -@example -awk -f frequency.awk file1 | sort +1 -nr -@end example - -@noindent -produces a table of the words appearing in @file{file1} in order of -decreasing frequency. - -The @code{awk} program suitably massages the data and produces a word -frequency table, which is not ordered. - -The @code{awk} script's output is then sorted by the @code{sort} command and -printed on the terminal. The options given to @code{sort} in this example -specify to sort using the second field of each input line (skipping one field), -that the sort keys should be treated as numeric quantities (otherwise -@samp{15} would come before @samp{5}), and that the sorting should be done -in descending (reverse) order.@refill - -We could have even done the @code{sort} from within the program, by -changing the @code{END} action to: - -@example -END @{ - sort = "sort +1 -nr" - for (word in freq) - printf "%s\t%d\n", word, freq[word] | sort - close(sort) -@}' -@end example - -See the general operating system documentation for more information on how -to use the @code{sort} command.@refill - -@ignore -@strong{ADR: I have some more substantial programs courtesy of Rick Adams -at UUNET. I am planning on incorporating those either in addition to or -instead of this program.} - -@strong{I would also like to incorporate the general @code{translate} -function that I have written.} - -@strong{I have a ton of other sample programs to include too.} -@end ignore - -@node Bugs, Notes, Sample Program, Top -@appendix Reporting Problems and Bugs - -@c This chapter stolen shamelessly from the GNU m4 manual. -@c This chapter has been unshamelessly altered to emulate changes made to -@c make.texi from whence it was originally shamelessly stolen! :-} --mew - -If you have problems with @code{gawk} or think that you have found a bug, -please report it to the developers; we cannot promise to do anything -but we might well want to fix it. - -Before reporting a bug, make sure you have actually found a real bug. -Carefully reread the documentation and see if it really says you can do -what you're trying to do. If it's not clear whether you should be able -to do something or not, report that too; it's a bug in the documentation! - -Before reporting a bug or trying to fix it yourself, try to isolate it -to the smallest possible @code{awk} program and input data file that -reproduces the problem. Then send us the program and data file, -some idea of what kind of Unix system you're using, and the exact results -@code{gawk} gave you. Also say what you expected to occur; this will help -us decide whether the problem was really in the documentation. - -Once you have a precise problem, send e-mail to (Internet) -@samp{bug-gnu-utils@@prep.ai.mit.edu} or (UUCP) -@samp{mit-eddie!prep.ai.mit.edu!bug-gnu-utils}. Please include the -version number of @code{gawk} you are using. You can get this information -with the command @samp{gawk -W version '@{@}' /dev/null}. 
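Purely as an illustration of the ``smallest possible @code{awk} program''
mentioned above (the program and file names here are invented for this
sketch, not taken from a real report), a self-contained test case can be as
small as:

@example
# sum.awk --- three-line program that shows the problem
@{ n += $1 @}
END @{ print n @}
@end example

Run it as @samp{gawk -f sum.awk sum.data}, where @file{sum.data} holds a
handful of numbers, and include both the output you got and the output you
expected in your report.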
-You should send carbon copies of your mail to David Trueman at -@samp{david@@cs.dal.ca}, and to Arnold Robbins, who can be reached at -@samp{arnold@@skeeve.atl.ga.us}. David is most likely to fix code -problems, while Arnold is most likely to fix documentation problems.@refill - -Non-bug suggestions are always welcome as well. If you have questions -about things that are unclear in the documentation or are just obscure -features, ask Arnold Robbins; he will try to help you out, although he -may not have the time to fix the problem. You can send him electronic mail at the Internet address -above. - -If you find bugs in one of the non-Unix ports of @code{gawk}, please send -an electronic mail message to the person who maintains that port. They -are listed below, and also in the @file{README} file in the @code{gawk} -distribution. Information in the @code{README} file should be considered -authoritative if it conflicts with this manual. - -The people maintaining the non-Unix ports of @code{gawk} are: - -@table @asis -@item MS-DOS -The port to MS-DOS is maintained by Scott Deifik. -His electronic mail address is @samp{scottd@@amgen.com}. - -@item VMS -The port to VAX VMS is maintained by Pat Rankin. -His electronic mail address is @samp{rankin@@eql.caltech.edu}. - -@item Atari ST -The port to the Atari ST is maintained by Michal Jaegermann. -His electronic mail address is @samp{ntomczak@@vm.ucs.ualberta.ca}. - -@end table - -If your bug is also reproducible under Unix, please send copies of your -report to the general GNU bug list, as well as to Arnold Robbins and David -Trueman, at the addresses listed above. - -@node Notes, Glossary, Bugs, Top -@appendix Implementation Notes - -This appendix contains information mainly of interest to implementors and -maintainers of @code{gawk}. Everything in it applies specifically to -@code{gawk}, and not to other implementations. - -@menu -* Compatibility Mode:: How to disable certain @code{gawk} extensions. -* Future Extensions:: New features we may implement soon. -* Improvements:: Suggestions for improvements by volunteers. -@end menu - -@node Compatibility Mode, Future Extensions, Notes, Notes -@appendixsec Downward Compatibility and Debugging - -@xref{POSIX/GNU, ,Extensions in @code{gawk} not in POSIX @code{awk}}, -for a summary of the GNU extensions to the @code{awk} language and program. -All of these features can be turned off by invoking @code{gawk} with the -@samp{-W compat} option, or with the @samp{-W posix} option.@refill - -If @code{gawk} is compiled for debugging with @samp{-DDEBUG}, then there -is one more option available on the command line: - -@table @samp -@item -W parsedebug -Print out the parse stack information as the program is being parsed. -@end table - -This option is intended only for serious @code{gawk} developers, -and not for the casual user. It probably has not even been compiled into -your version of @code{gawk}, since it slows down execution. - -@node Future Extensions, Improvements, Compatibility Mode, Notes -@appendixsec Probable Future Extensions - -This section briefly lists extensions that indicate the directions we are -currently considering for @code{gawk}. The file @file{FUTURES} in the -@code{gawk} distributions lists these extensions, as well as several others. - -@table @asis -@item @code{RS} as a regexp -The meaning of @code{RS} may be generalized along the lines of @code{FS}. 
-
-@item Control of subprocess environment
-Changes made in @code{gawk} to the array @code{ENVIRON} may be
-propagated to subprocesses run by @code{gawk}.
-
-@item Databases
-It may be possible to map a GDBM/NDBM/SDBM file into an @code{awk} array.
-
-@item Single-character fields
-The null string, @code{""}, as a field separator, will cause field
-splitting and the @code{split} function to separate individual characters.
-Thus, @code{split("abcd", a, "")} would yield @code{a[1] == "a"},
-@code{a[2] == "b"}, and so on.
-
-@item More @code{lint} warnings
-There are more things that could be checked for portability.
-
-@item @code{RECLEN} variable for fixed length records
-Along with @code{FIELDWIDTHS}, this would speed up the processing of
-fixed-length records.
-
-@item @code{RT} variable to hold the record terminator
-It is occasionally useful to have access to the actual string of
-characters that matched the @code{RS} variable. The @code{RT}
-variable would hold these characters.
-
-@item A @code{restart} keyword
-After modifying @code{$0}, @code{restart} would restart the pattern
-matching loop, without reading a new record from the input.
-
-@item A @samp{|&} redirection
-The @samp{|&} redirection, in place of @samp{|}, would open a two-way
-pipeline for communication with a sub-process (via @code{getline} and
-@code{print} and @code{printf}).
-
-@item @code{IGNORECASE} affecting all comparisons
-The effects of the @code{IGNORECASE} variable may be generalized to
-all string comparisons, and not just regular expression operations.
-
-@item A way to mix command line source code and library files
-There may be a new option that would make it possible to easily use library
-functions from a program entered on the command line.
-@c probably a @samp{-s} option...
-
-@item GNU-style long options
-We will add GNU-style long options
-to @code{gawk} for compatibility with other GNU programs.
-(For example, @samp{--field-separator=:} would be equivalent to
-@samp{-F:}.)@refill
-
-@c this is @emph{very} long term --- not worth including right now.
-@ignore
-@item The C Comma Operator
-We may add the C comma operator, which takes the form
-@code{@var{expr1},@var{expr2}}. The first expression is evaluated, and the
-result is thrown away. The value of the full expression is the value of
-@var{expr2}.@refill
-@end ignore
-@end table
-
-@node Improvements, , Future Extensions, Notes
-@appendixsec Suggestions for Improvements
-
-Here are some projects that would-be @code{gawk} hackers might like to take
-on. They vary in size from a few days to a few weeks of programming,
-depending on which one you choose and how fast a programmer you are. Please
-send any improvements you write to the maintainers at the GNU
-project.@refill
-
-@enumerate
-@item
-Compilation of @code{awk} programs: @code{gawk} uses a Bison (YACC-like)
-parser to convert the script given it into a syntax tree; the syntax
-tree is then executed by a simple recursive evaluator.
This method incurs -a lot of overhead, since the recursive evaluator performs many procedure -calls to do even the simplest things.@refill - -It should be possible for @code{gawk} to convert the script's parse tree -into a C program which the user would then compile, using the normal -C compiler and a special @code{gawk} library to provide all the needed -functions (regexps, fields, associative arrays, type coercion, and so -on).@refill - -An easier possibility might be for an intermediate phase of @code{awk} to -convert the parse tree into a linear byte code form like the one used -in GNU Emacs Lisp. The recursive evaluator would then be replaced by -a straight line byte code interpreter that would be intermediate in speed -between running a compiled program and doing what @code{gawk} does -now.@refill - -This may actually happen for the 3.0 version of @code{gawk}. - -@item -An error message section has not been included in this version of the -manual. Perhaps some nice beta testers will document some of the messages -for the future. - -@item -The programs in the test suite could use documenting in this manual. - -@item -The programs and data files in the manual should be available in -separate files to facilitate experimentation. - -@item -See the @file{FUTURES} file for more ideas. Contact us if you would -seriously like to tackle any of the items listed there. -@end enumerate - -@node Glossary, Index, Notes, Top -@appendix Glossary - -@table @asis -@item Action -A series of @code{awk} statements attached to a rule. If the rule's -pattern matches an input record, the @code{awk} language executes the -rule's action. Actions are always enclosed in curly braces. -@xref{Actions, ,Overview of Actions}.@refill - -@item Amazing @code{awk} Assembler -Henry Spencer at the University of Toronto wrote a retargetable assembler -completely as @code{awk} scripts. It is thousands of lines long, including -machine descriptions for several 8-bit microcomputers. -@c It is distributed with @code{gawk} (as part of the test suite) and -It is a good example of a -program that would have been better written in another language.@refill - -@item @sc{ansi} -The American National Standards Institute. This organization produces -many standards, among them the standard for the C programming language. - -@item Assignment -An @code{awk} expression that changes the value of some @code{awk} -variable or data object. An object that you can assign to is called an -@dfn{lvalue}. @xref{Assignment Ops, ,Assignment Expressions}.@refill - -@item @code{awk} Language -The language in which @code{awk} programs are written. - -@item @code{awk} Program -An @code{awk} program consists of a series of @dfn{patterns} and -@dfn{actions}, collectively known as @dfn{rules}. For each input record -given to the program, the program's rules are all processed in turn. -@code{awk} programs may also contain function definitions.@refill - -@item @code{awk} Script -Another name for an @code{awk} program. - -@item Built-in Function -The @code{awk} language provides built-in functions that perform various -numerical, time stamp related, and string computations. Examples are -@code{sqrt} (for the square root of a number) and @code{substr} (for a -substring of a string). 
@xref{Built-in, ,Built-in Functions}.@refill
-
-@item Built-in Variable
-@code{ARGC}, @code{ARGIND}, @code{ARGV}, @code{CONVFMT}, @code{ENVIRON},
-@code{ERRNO}, @code{FIELDWIDTHS}, @code{FILENAME}, @code{FNR}, @code{FS},
-@code{IGNORECASE}, @code{NF}, @code{NR}, @code{OFMT}, @code{OFS}, @code{ORS},
-@code{RLENGTH}, @code{RSTART}, @code{RS}, and @code{SUBSEP},
-are the variables that have special
-meaning to @code{awk}. Changing some of them affects @code{awk}'s running
-environment. @xref{Built-in Variables}.@refill
-
-@item Braces
-See ``Curly Braces.''
-
-@item C
-The system programming language that most GNU software is written in. The
-@code{awk} programming language has C-like syntax, and this manual
-points out similarities between @code{awk} and C when appropriate.@refill
-
-@item CHEM
-A preprocessor for @code{pic} that reads descriptions of molecules
-and produces @code{pic} input for drawing them. It was written by
-Brian Kernighan, and is available from @code{netlib@@research.att.com}.@refill
-
-@item Compound Statement
-A series of @code{awk} statements, enclosed in curly braces. Compound
-statements may be nested.
-@xref{Statements, ,Control Statements in Actions}.@refill
-
-@item Concatenation
-Concatenating two strings means sticking them together, one after another,
-giving a new string. For example, the string @samp{foo} concatenated with
-the string @samp{bar} gives the string @samp{foobar}.
-@xref{Concatenation, ,String Concatenation}.@refill
-
-@item Conditional Expression
-An expression using the @samp{?:} ternary operator, such as
-@code{@var{expr1} ? @var{expr2} : @var{expr3}}. The expression
-@var{expr1} is evaluated; if the result is true, the value of the whole
-expression is the value of @var{expr2}; otherwise the value is
-@var{expr3}. In either case, only one of @var{expr2} and @var{expr3}
-is evaluated. @xref{Conditional Exp, ,Conditional Expressions}.@refill
-
-@item Constant Regular Expression
-A constant regular expression is a regular expression written within
-slashes, such as @samp{/foo/}. This regular expression is chosen
-when you write the @code{awk} program, and cannot be changed during
-its execution. @xref{Regexp Usage, ,How to Use Regular Expressions}.
-
-@item Comparison Expression
-A relation that is either true or false, such as @code{(a < b)}.
-Comparison expressions are used in @code{if}, @code{while}, and @code{for}
-statements, and in patterns to select which input records to process.
-@xref{Comparison Ops, ,Comparison Expressions}.@refill
-
-@item Curly Braces
-The characters @samp{@{} and @samp{@}}. Curly braces are used in
-@code{awk} for delimiting actions, compound statements, and function
-bodies.@refill
-
-@item Data Objects
-These are numbers and strings of characters. Numbers are converted into
-strings and vice versa, as needed.
-@xref{Conversion, ,Conversion of Strings and Numbers}.@refill
-
-@item Dynamic Regular Expression
-A dynamic regular expression is a regular expression written as an
-ordinary expression. It could be a string constant, such as
-@code{"foo"}, but it may also be an expression whose value may vary.
-@xref{Regexp Usage, ,How to Use Regular Expressions}.
-
-@item Escape Sequences
-A special sequence of characters used for describing nonprinting
-characters, such as @samp{\n} for newline, or @samp{\033} for the ASCII
-ESC (escape) character. @xref{Constants, ,Constant Expressions}.
- -@item Field -When @code{awk} reads an input record, it splits the record into pieces -separated by whitespace (or by a separator regexp which you can -change by setting the built-in variable @code{FS}). Such pieces are -called fields. If the pieces are of fixed length, you can use the built-in -variable @code{FIELDWIDTHS} to describe their lengths. -@xref{Records, ,How Input is Split into Records}.@refill - -@item Format -Format strings are used to control the appearance of output in the -@code{printf} statement. Also, data conversions from numbers to strings -are controlled by the format string contained in the built-in variable -@code{CONVFMT}. @xref{Control Letters, ,Format-Control Letters}.@refill - -@item Function -A specialized group of statements often used to encapsulate general -or program-specific tasks. @code{awk} has a number of built-in -functions, and also allows you to define your own. -@xref{Built-in, ,Built-in Functions}. -Also, see @ref{User-defined, ,User-defined Functions}.@refill - -@item @code{gawk} -The GNU implementation of @code{awk}. - -@item GNU -``GNU's not Unix''. An on-going project of the Free Software Foundation -to create a complete, freely distributable, @sc{posix}-compliant computing -environment. - -@item Input Record -A single chunk of data read in by @code{awk}. Usually, an @code{awk} input -record consists of one line of text. -@xref{Records, ,How Input is Split into Records}.@refill - -@item Keyword -In the @code{awk} language, a keyword is a word that has special -meaning. Keywords are reserved and may not be used as variable names. - -@code{awk}'s keywords are: -@code{if}, -@code{else}, -@code{while}, -@code{do@dots{}while}, -@code{for}, -@code{for@dots{}in}, -@code{break}, -@code{continue}, -@code{delete}, -@code{next}, -@code{function}, -@code{func}, -and @code{exit}.@refill - -@item Lvalue -An expression that can appear on the left side of an assignment -operator. In most languages, lvalues can be variables or array -elements. In @code{awk}, a field designator can also be used as an -lvalue.@refill - -@item Number -A numeric valued data object. The @code{gawk} implementation uses double -precision floating point to represent numbers.@refill - -@item Pattern -Patterns tell @code{awk} which input records are interesting to which -rules. - -A pattern is an arbitrary conditional expression against which input is -tested. If the condition is satisfied, the pattern is said to @dfn{match} -the input record. A typical pattern might compare the input record against -a regular expression. @xref{Patterns}.@refill - -@item @sc{posix} -The name for a series of standards being developed by the @sc{ieee} -that specify a Portable Operating System interface. The ``IX'' denotes -the Unix heritage of these standards. The main standard of interest for -@code{awk} users is P1003.2, the Command Language and Utilities standard. - -@item Range (of input lines) -A sequence of consecutive lines from the input file. A pattern -can specify ranges of input lines for @code{awk} to process, or it can -specify single lines. @xref{Patterns}.@refill - -@item Recursion -When a function calls itself, either directly or indirectly. -If this isn't clear, refer to the entry for ``recursion.'' - -@item Redirection -Redirection means performing input from other than the standard input -stream, or output to other than the standard output stream. 
- -You can redirect the output of the @code{print} and @code{printf} statements -to a file or a system command, using the @samp{>}, @samp{>>}, and @samp{|} -operators. You can redirect input to the @code{getline} statement using -the @samp{<} and @samp{|} operators. -@xref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}.@refill - -@item Regular Expression -See ``regexp.'' - -@item Regexp -Short for @dfn{regular expression}. A regexp is a pattern that denotes a -set of strings, possibly an infinite set. For example, the regexp -@samp{R.*xp} matches any string starting with the letter @samp{R} -and ending with the letters @samp{xp}. In @code{awk}, regexps are -used in patterns and in conditional expressions. Regexps may contain -escape sequences. @xref{Regexp, ,Regular Expressions as Patterns}.@refill - -@item Rule -A segment of an @code{awk} program, that specifies how to process single -input records. A rule consists of a @dfn{pattern} and an @dfn{action}. -@code{awk} reads an input record; then, for each rule, if the input record -satisfies the rule's pattern, @code{awk} executes the rule's action. -Otherwise, the rule does nothing for that input record.@refill - -@item Side Effect -A side effect occurs when an expression has an effect aside from merely -producing a value. Assignment expressions, increment expressions and -function calls have side effects. @xref{Assignment Ops, ,Assignment Expressions}. - -@item Special File -A file name interpreted internally by @code{gawk}, instead of being handed -directly to the underlying operating system. For example, @file{/dev/stdin}. -@xref{Special Files, ,Standard I/O Streams}. - -@item Stream Editor -A program that reads records from an input stream and processes them one -or more at a time. This is in contrast with batch programs, which may -expect to read their input files in entirety before starting to do -anything, and with interactive programs, which require input from the -user.@refill - -@item String -A datum consisting of a sequence of characters, such as @samp{I am a -string}. Constant strings are written with double-quotes in the -@code{awk} language, and may contain escape sequences. -@xref{Constants, ,Constant Expressions}. - -@item Whitespace -A sequence of blank or tab characters occurring inside an input record or a -string.@refill -@end table - -@node Index, , Glossary, Top -@unnumbered Index -@printindex cp - -@summarycontents -@contents -@bye - -Unresolved Issues: ------------------- -1. From: ntomczak@vm.ucs.ualberta.ca (Michal Jaegermann) - Examples of usage tend to suggest that /../ and ".." delimiters - can be used for regular expressions, even if definition is consistently - using /../. I am not sure what the real rules are and in particular - what of the following is a bug and what is a feature: - # This program matches everything - '"\(" { print }' - # This one complains about mismatched parenthesis - '$0 ~ "\(" { print }' - # This one behaves in an expected manner - '/\(/ { print }' - You may also try to use "\(" as an argument to match() to see what - will happen. - -2. From ADR. - - The posix (and original Unix!) notion of awk values as both number - and string values needs to be put into the manual. This involves - major and minor rewrites of most of the manual, but should help in - clarifying many of the weirder points of the language. - -3. From ADR. - - The manual should be reorganized. 
Expressions should be introduced - early, building up to regexps as expressions, and from there to their - use as patterns and then in actions. Built-in vars should come earlier - in the manual too. The 'expert info' sections marked with comments - should get their own sections or subsections with nodes and titles. - The manual should be gone over thoroughly for indexing. - -4. From ADR. - - Robert J. Chassell points out that awk programs should have some indication - of how to use them. It would be useful to perhaps have a "programming - style" section of the manual that would include this and other tips. - -5. From ADR in response to moraes@uunet.ca - (This would make the beginnings of a good "puzzles" section...) - - Date: Mon, 2 Dec 91 10:08:05 EST - From: gatech!cc!arnold (Arnold Robbins) - To: cs.dal.ca!david, uunet.ca!moraes - Subject: redirecting to /dev/stderr - Cc: skeeve!arnold, boeing.com!brennan, research.att.com!bwk - - In 2.13.3 the following program no longer dumps core: - - BEGIN { print "hello" > /dev/stderr ; exit(1) } - - Instead, it creates a file named `0' with the word `hello' in it. AWK - semantics strikes again. The meaning of the statement is - - print "hello" > (($0 ~ /dev/) stderr) - - /dev/ tests $0 for the pattern `dev'. This yields a 0. The variable stderr, - having never been used, has a null string in it. The concatenation yields - a string value of "0" which is used as the file name. Sigh. - - I think with some more time I can come up with a decent fix, but it will - probably only print a diagnostic with -Wlint. - - Arnold - diff --git a/gnu/usr.bin/awk/eval.c b/gnu/usr.bin/awk/eval.c deleted file mode 100644 index ccf4671..0000000 --- a/gnu/usr.bin/awk/eval.c +++ /dev/null @@ -1,1260 +0,0 @@ -/* - * eval.c - gawk parse tree interpreter - */ - -/* - * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. - * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
- */ - -#include "awk.h" - -extern double pow P((double x, double y)); -extern double modf P((double x, double *yp)); -extern double fmod P((double x, double y)); - -static int eval_condition P((NODE *tree)); -static NODE *op_assign P((NODE *tree)); -static NODE *func_call P((NODE *name, NODE *arg_list)); -static NODE *match_op P((NODE *tree)); - -NODE *_t; /* used as a temporary in macros */ -#ifdef MSDOS -double _msc51bug; /* to get around a bug in MSC 5.1 */ -#endif -NODE *ret_node; -int OFSlen; -int ORSlen; -int OFMTidx; -int CONVFMTidx; - -/* Macros and variables to save and restore function and loop bindings */ -/* - * the val variable allows return/continue/break-out-of-context to be - * caught and diagnosed - */ -#define PUSH_BINDING(stack, x, val) (memcpy ((char *)(stack), (char *)(x), sizeof (jmp_buf)), val++) -#define RESTORE_BINDING(stack, x, val) (memcpy ((char *)(x), (char *)(stack), sizeof (jmp_buf)), val--) - -static jmp_buf loop_tag; /* always the current binding */ -static int loop_tag_valid = 0; /* nonzero when loop_tag valid */ -static int func_tag_valid = 0; -static jmp_buf func_tag; -extern int exiting, exit_val; - -/* - * This table is used by the regexp routines to do case independant - * matching. Basically, every ascii character maps to itself, except - * uppercase letters map to lower case ones. This table has 256 - * entries, which may be overkill. Note also that if the system this - * is compiled on doesn't use 7-bit ascii, casetable[] should not be - * defined to the linker, so gawk should not load. - * - * Do NOT make this array static, it is used in several spots, not - * just in this file. - */ -#if 'a' == 97 /* it's ascii */ -char casetable[] = { - '\000', '\001', '\002', '\003', '\004', '\005', '\006', '\007', - '\010', '\011', '\012', '\013', '\014', '\015', '\016', '\017', - '\020', '\021', '\022', '\023', '\024', '\025', '\026', '\027', - '\030', '\031', '\032', '\033', '\034', '\035', '\036', '\037', - /* ' ' '!' '"' '#' '$' '%' '&' ''' */ - '\040', '\041', '\042', '\043', '\044', '\045', '\046', '\047', - /* '(' ')' '*' '+' ',' '-' '.' '/' */ - '\050', '\051', '\052', '\053', '\054', '\055', '\056', '\057', - /* '0' '1' '2' '3' '4' '5' '6' '7' */ - '\060', '\061', '\062', '\063', '\064', '\065', '\066', '\067', - /* '8' '9' ':' ';' '<' '=' '>' '?' 
*/ - '\070', '\071', '\072', '\073', '\074', '\075', '\076', '\077', - /* '@' 'A' 'B' 'C' 'D' 'E' 'F' 'G' */ - '\100', '\141', '\142', '\143', '\144', '\145', '\146', '\147', - /* 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' */ - '\150', '\151', '\152', '\153', '\154', '\155', '\156', '\157', - /* 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' */ - '\160', '\161', '\162', '\163', '\164', '\165', '\166', '\167', - /* 'X' 'Y' 'Z' '[' '\' ']' '^' '_' */ - '\170', '\171', '\172', '\133', '\134', '\135', '\136', '\137', - /* '`' 'a' 'b' 'c' 'd' 'e' 'f' 'g' */ - '\140', '\141', '\142', '\143', '\144', '\145', '\146', '\147', - /* 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o' */ - '\150', '\151', '\152', '\153', '\154', '\155', '\156', '\157', - /* 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' */ - '\160', '\161', '\162', '\163', '\164', '\165', '\166', '\167', - /* 'x' 'y' 'z' '{' '|' '}' '~' */ - '\170', '\171', '\172', '\173', '\174', '\175', '\176', '\177', - '\200', '\201', '\202', '\203', '\204', '\205', '\206', '\207', - '\210', '\211', '\212', '\213', '\214', '\215', '\216', '\217', - '\220', '\221', '\222', '\223', '\224', '\225', '\226', '\227', - '\230', '\231', '\232', '\233', '\234', '\235', '\236', '\237', - '\240', '\241', '\242', '\243', '\244', '\245', '\246', '\247', - '\250', '\251', '\252', '\253', '\254', '\255', '\256', '\257', - '\260', '\261', '\262', '\263', '\264', '\265', '\266', '\267', - '\270', '\271', '\272', '\273', '\274', '\275', '\276', '\277', - '\300', '\301', '\302', '\303', '\304', '\305', '\306', '\307', - '\310', '\311', '\312', '\313', '\314', '\315', '\316', '\317', - '\320', '\321', '\322', '\323', '\324', '\325', '\326', '\327', - '\330', '\331', '\332', '\333', '\334', '\335', '\336', '\337', - '\340', '\341', '\342', '\343', '\344', '\345', '\346', '\347', - '\350', '\351', '\352', '\353', '\354', '\355', '\356', '\357', - '\360', '\361', '\362', '\363', '\364', '\365', '\366', '\367', - '\370', '\371', '\372', '\373', '\374', '\375', '\376', '\377', -}; -#else -#include "You lose. You will need a translation table for your character set." -#endif - -/* - * Tree is a bunch of rules to run. Returns zero if it hit an exit() - * statement - */ -int -interpret(tree) -register NODE *volatile tree; -{ - jmp_buf volatile loop_tag_stack; /* shallow binding stack for loop_tag */ - static jmp_buf rule_tag; /* tag the rule currently being run, for NEXT - * and EXIT statements. 
It is static because - * there are no nested rules */ - register NODE *volatile t = NULL; /* temporary */ - NODE **volatile lhs; /* lhs == Left Hand Side for assigns, etc */ - NODE *volatile stable_tree; - int volatile traverse = 1; /* True => loop thru tree (Node_rule_list) */ - - /* avoid false source indications */ - source = NULL; - sourceline = 0; - - if (tree == NULL) - return 1; - sourceline = tree->source_line; - source = tree->source_file; - switch (tree->type) { - case Node_rule_node: - traverse = 0; /* False => one for-loop iteration only */ - /* FALL THROUGH */ - case Node_rule_list: - for (t = tree; t != NULL; t = t->rnode) { - if (traverse) - tree = t->lnode; - sourceline = tree->source_line; - source = tree->source_file; - switch (setjmp(rule_tag)) { - case 0: /* normal non-jump */ - /* test pattern, if any */ - if (tree->lnode == NULL || - eval_condition(tree->lnode)) - (void) interpret(tree->rnode); - break; - case TAG_CONTINUE: /* NEXT statement */ - return 1; - case TAG_BREAK: - return 0; - default: - cant_happen(); - } - if (!traverse) /* case Node_rule_node */ - break; /* don't loop */ - } - break; - - case Node_statement_list: - for (t = tree; t != NULL; t = t->rnode) - (void) interpret(t->lnode); - break; - - case Node_K_if: - if (eval_condition(tree->lnode)) { - (void) interpret(tree->rnode->lnode); - } else { - (void) interpret(tree->rnode->rnode); - } - break; - - case Node_K_while: - PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - - stable_tree = tree; - while (eval_condition(stable_tree->lnode)) { - switch (setjmp(loop_tag)) { - case 0: /* normal non-jump */ - (void) interpret(stable_tree->rnode); - break; - case TAG_CONTINUE: /* continue statement */ - break; - case TAG_BREAK: /* break statement */ - RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - return 1; - default: - cant_happen(); - } - } - RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - break; - - case Node_K_do: - PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - stable_tree = tree; - do { - switch (setjmp(loop_tag)) { - case 0: /* normal non-jump */ - (void) interpret(stable_tree->rnode); - break; - case TAG_CONTINUE: /* continue statement */ - break; - case TAG_BREAK: /* break statement */ - RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - return 1; - default: - cant_happen(); - } - } while (eval_condition(stable_tree->lnode)); - RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - break; - - case Node_K_for: - PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - (void) interpret(tree->forloop->init); - stable_tree = tree; - while (eval_condition(stable_tree->forloop->cond)) { - switch (setjmp(loop_tag)) { - case 0: /* normal non-jump */ - (void) interpret(stable_tree->lnode); - /* fall through */ - case TAG_CONTINUE: /* continue statement */ - (void) interpret(stable_tree->forloop->incr); - break; - case TAG_BREAK: /* break statement */ - RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - return 1; - default: - cant_happen(); - } - } - RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - break; - - case Node_K_arrayfor: - { - volatile struct search l; /* For array_for */ - Func_ptr after_assign = NULL; - -#define hakvar forloop->init -#define arrvar forloop->incr - PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - lhs = get_lhs(tree->hakvar, &after_assign); - t = tree->arrvar; - if (t->type == Node_param_list) - t = stack_ptr[t->param_cnt]; - stable_tree = tree; - for (assoc_scan(t, (struct search *)&l); - 
l.retval; - assoc_next((struct search *)&l)) { - unref(*((NODE **) lhs)); - *lhs = dupnode(l.retval); - if (after_assign) - (*after_assign)(); - switch (setjmp(loop_tag)) { - case 0: - (void) interpret(stable_tree->lnode); - case TAG_CONTINUE: - break; - - case TAG_BREAK: - RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - return 1; - default: - cant_happen(); - } - } - RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid); - break; - } - - case Node_K_break: - if (loop_tag_valid == 0) - fatal("unexpected break"); - longjmp(loop_tag, TAG_BREAK); - break; - - case Node_K_continue: - if (loop_tag_valid == 0) { - /* - * AT&T nawk treats continue outside of loops like - * next. Allow it if not posix, and complain if - * lint. - */ - static int warned = 0; - - if (do_lint && ! warned) { - warning("use of `continue' outside of loop is not portable"); - warned = 1; - } - if (do_posix) - fatal("use of `continue' outside of loop is not allowed"); - longjmp(rule_tag, TAG_CONTINUE); - } else - longjmp(loop_tag, TAG_CONTINUE); - break; - - case Node_K_print: - do_print(tree); - break; - - case Node_K_printf: - do_printf(tree); - break; - - case Node_K_delete: - if (tree->rnode != NULL) - do_delete(tree->lnode, tree->rnode); - else - assoc_clear(tree->lnode); - break; - - case Node_K_next: - longjmp(rule_tag, TAG_CONTINUE); - break; - - case Node_K_nextfile: - do_nextfile(); - break; - - case Node_K_exit: - /* - * In A,K,&W, p. 49, it says that an exit statement "... - * causes the program to behave as if the end of input had - * occurred; no more input is read, and the END actions, if - * any are executed." This implies that the rest of the rules - * are not done. So we immediately break out of the main loop. - */ - exiting = 1; - if (tree) { - t = tree_eval(tree->lnode); - exit_val = (int) force_number(t); - } - free_temp(t); - longjmp(rule_tag, TAG_BREAK); - break; - - case Node_K_return: - t = tree_eval(tree->lnode); - ret_node = dupnode(t); - free_temp(t); - longjmp(func_tag, TAG_RETURN); - break; - - default: - /* - * Appears to be an expression statement. Throw away the - * value. - */ - if (do_lint && tree->type == Node_var) - warning("statement has no effect"); - t = tree_eval(tree); - free_temp(t); - break; - } - return 1; -} - -/* evaluate a subtree */ - -NODE * -r_tree_eval(tree) -register NODE *tree; -{ - register NODE *r, *t1, *t2; /* return value & temporary subtrees */ - register NODE **lhs; - register int di; - AWKNUM x, x1, x2; - long lx; -#ifdef _CRAY - long lx2; -#endif - -#ifdef DEBUG - if (tree == NULL) - return Nnull_string; - if (tree->type == Node_val) { - if ((char)tree->stref <= 0) cant_happen(); - return tree; - } - if (tree->type == Node_var) { - if ((char)tree->var_value->stref <= 0) cant_happen(); - return tree->var_value; - } -#endif - - if (tree->type == Node_param_list) { - tree = stack_ptr[tree->param_cnt]; - if (tree == NULL) - return Nnull_string; - } - - switch (tree->type) { - case Node_var: - return tree->var_value; - - case Node_and: - return tmp_number((AWKNUM) (eval_condition(tree->lnode) - && eval_condition(tree->rnode))); - - case Node_or: - return tmp_number((AWKNUM) (eval_condition(tree->lnode) - || eval_condition(tree->rnode))); - - case Node_not: - return tmp_number((AWKNUM) ! 
eval_condition(tree->lnode)); - - /* Builtins */ - case Node_builtin: - return ((*tree->proc) (tree->subnode)); - - case Node_K_getline: - return (do_getline(tree)); - - case Node_in_array: - return tmp_number((AWKNUM) in_array(tree->lnode, tree->rnode)); - - case Node_func_call: - return func_call(tree->rnode, tree->lnode); - - /* unary operations */ - case Node_NR: - case Node_FNR: - case Node_NF: - case Node_FIELDWIDTHS: - case Node_FS: - case Node_RS: - case Node_field_spec: - case Node_subscript: - case Node_IGNORECASE: - case Node_OFS: - case Node_ORS: - case Node_OFMT: - case Node_CONVFMT: - lhs = get_lhs(tree, (Func_ptr *)0); - return *lhs; - - case Node_var_array: - fatal("attempt to use array `%s' in a scalar context", tree->vname); - - case Node_unary_minus: - t1 = tree_eval(tree->subnode); - x = -force_number(t1); - free_temp(t1); - return tmp_number(x); - - case Node_cond_exp: - if (eval_condition(tree->lnode)) - return tree_eval(tree->rnode->lnode); - return tree_eval(tree->rnode->rnode); - - case Node_match: - case Node_nomatch: - case Node_regex: - return match_op(tree); - - case Node_func: - fatal("function `%s' called with space between name and (,\n%s", - tree->lnode->param, - "or used in other expression context"); - - /* assignments */ - case Node_assign: - { - Func_ptr after_assign = NULL; - - r = tree_eval(tree->rnode); - lhs = get_lhs(tree->lnode, &after_assign); - if (r != *lhs) { - NODE *save; - - save = *lhs; - *lhs = dupnode(r); - unref(save); - } - free_temp(r); - if (after_assign) - (*after_assign)(); - return *lhs; - } - - case Node_concat: - { -#define STACKSIZE 10 - NODE *treelist[STACKSIZE+1]; - NODE *strlist[STACKSIZE+1]; - register NODE **treep; - register NODE **strp; - register size_t len; - char *str; - register char *dest; - - /* - * This is an efficiency hack for multiple adjacent string - * concatenations, to avoid recursion and string copies. - * - * Node_concat trees grow downward to the left, so - * descend to lowest (first) node, accumulating nodes - * to evaluate to strings as we go. - */ - treep = treelist; - while (tree->type == Node_concat) { - *treep++ = tree->rnode; - tree = tree->lnode; - if (treep == &treelist[STACKSIZE]) - break; - } - *treep = tree; - /* - * Now, evaluate to strings in LIFO order, accumulating - * the string length, so we can do a single malloc at the - * end. 
- */ - strp = strlist; - len = 0; - while (treep >= treelist) { - *strp = force_string(tree_eval(*treep--)); - len += (*strp)->stlen; - strp++; - } - *strp = NULL; - emalloc(str, char *, len+2, "tree_eval"); - str[len] = str[len+1] = '\0'; /* for good measure */ - dest = str; - strp = strlist; - while (*strp) { - memcpy(dest, (*strp)->stptr, (*strp)->stlen); - dest += (*strp)->stlen; - free_temp(*strp); - strp++; - } - r = make_str_node(str, len, ALREADY_MALLOCED); - r->flags |= TEMP; - } - return r; - - /* other assignment types are easier because they are numeric */ - case Node_preincrement: - case Node_predecrement: - case Node_postincrement: - case Node_postdecrement: - case Node_assign_exp: - case Node_assign_times: - case Node_assign_quotient: - case Node_assign_mod: - case Node_assign_plus: - case Node_assign_minus: - return op_assign(tree); - default: - break; /* handled below */ - } - - /* evaluate subtrees in order to do binary operation, then keep going */ - t1 = tree_eval(tree->lnode); - t2 = tree_eval(tree->rnode); - - switch (tree->type) { - case Node_geq: - case Node_leq: - case Node_greater: - case Node_less: - case Node_notequal: - case Node_equal: - di = cmp_nodes(t1, t2); - free_temp(t1); - free_temp(t2); - switch (tree->type) { - case Node_equal: - return tmp_number((AWKNUM) (di == 0)); - case Node_notequal: - return tmp_number((AWKNUM) (di != 0)); - case Node_less: - return tmp_number((AWKNUM) (di < 0)); - case Node_greater: - return tmp_number((AWKNUM) (di > 0)); - case Node_leq: - return tmp_number((AWKNUM) (di <= 0)); - case Node_geq: - return tmp_number((AWKNUM) (di >= 0)); - default: - cant_happen(); - } - break; - default: - break; /* handled below */ - } - - x1 = force_number(t1); - free_temp(t1); - x2 = force_number(t2); - free_temp(t2); - switch (tree->type) { - case Node_exp: - if ((lx = x2) == x2 && lx >= 0) { /* integer exponent */ - if (lx == 0) - x = 1; - else if (lx == 1) - x = x1; - else { - /* doing it this way should be more precise */ - for (x = x1; --lx; ) - x *= x1; - } - } else - x = pow((double) x1, (double) x2); - return tmp_number(x); - - case Node_times: - return tmp_number(x1 * x2); - - case Node_quotient: - if (x2 == 0) - fatal("division by zero attempted"); -#ifdef _CRAY - /* - * special case for integer division, put in for Cray - */ - lx2 = x2; - if (lx2 == 0) - return tmp_number(x1 / x2); - lx = (long) x1 / lx2; - if (lx * x2 == x1) - return tmp_number((AWKNUM) lx); - else -#endif - return tmp_number(x1 / x2); - - case Node_mod: - if (x2 == 0) - fatal("division by zero attempted in mod"); -#ifndef FMOD_MISSING - return tmp_number(fmod (x1, x2)); -#else - (void) modf(x1 / x2, &x); - return tmp_number(x1 - x * x2); -#endif - - case Node_plus: - return tmp_number(x1 + x2); - - case Node_minus: - return tmp_number(x1 - x2); - - case Node_var_array: - fatal("attempt to use array `%s' in a scalar context", tree->vname); - - default: - fatal("illegal type (%d) in tree_eval", tree->type); - } - return 0; -} - -/* Is TREE true or false? Returns 0==false, non-zero==true */ -static int -eval_condition(tree) -register NODE *tree; -{ - register NODE *t1; - register int ret; - - if (tree == NULL) /* Null trees are the easiest kinds */ - return 1; - if (tree->type == Node_line_range) { - /* - * Node_line_range is kind of like Node_match, EXCEPT: the - * lnode field (more properly, the condpair field) is a node - * of a Node_cond_pair; whether we evaluate the lnode of that - * node or the rnode depends on the triggered word. 
More - * precisely: if we are not yet triggered, we tree_eval the - * lnode; if that returns true, we set the triggered word. - * If we are triggered (not ELSE IF, note), we tree_eval the - * rnode, clear triggered if it succeeds, and perform our - * action (regardless of success or failure). We want to be - * able to begin and end on a single input record, so this - * isn't an ELSE IF, as noted above. - */ - if (!tree->triggered) - if (!eval_condition(tree->condpair->lnode)) - return 0; - else - tree->triggered = 1; - /* Else we are triggered */ - if (eval_condition(tree->condpair->rnode)) - tree->triggered = 0; - return 1; - } - - /* - * Could just be J.random expression. in which case, null and 0 are - * false, anything else is true - */ - - t1 = tree_eval(tree); - if (t1->flags & MAYBE_NUM) - (void) force_number(t1); - if (t1->flags & NUMBER) - ret = t1->numbr != 0.0; - else - ret = t1->stlen != 0; - free_temp(t1); - return ret; -} - -/* - * compare two nodes, returning negative, 0, positive - */ -int -cmp_nodes(t1, t2) -register NODE *t1, *t2; -{ - register int ret; - register size_t len1, len2; - - if (t1 == t2) - return 0; - if (t1->flags & MAYBE_NUM) - (void) force_number(t1); - if (t2->flags & MAYBE_NUM) - (void) force_number(t2); - if ((t1->flags & NUMBER) && (t2->flags & NUMBER)) { - if (t1->numbr == t2->numbr) return 0; - else if (t1->numbr - t2->numbr < 0) return -1; - else return 1; - } - (void) force_string(t1); - (void) force_string(t2); - len1 = t1->stlen; - len2 = t2->stlen; - if (len1 == 0 || len2 == 0) - return len1 - len2; - ret = memcmp(t1->stptr, t2->stptr, len1 <= len2 ? len1 : len2); - return ret == 0 ? len1-len2 : ret; -} - -static NODE * -op_assign(tree) -register NODE *tree; -{ - AWKNUM rval, lval; - NODE **lhs; - AWKNUM t1, t2; - long ltemp; - NODE *tmp; - Func_ptr after_assign = NULL; - - lhs = get_lhs(tree->lnode, &after_assign); - lval = force_number(*lhs); - - /* - * Can't unref *lhs until we know the type; doing so - * too early breaks x += x sorts of things. - */ - switch(tree->type) { - case Node_preincrement: - case Node_predecrement: - unref(*lhs); - *lhs = make_number(lval + - (tree->type == Node_preincrement ? 1.0 : -1.0)); - if (after_assign) - (*after_assign)(); - return *lhs; - - case Node_postincrement: - case Node_postdecrement: - unref(*lhs); - *lhs = make_number(lval + - (tree->type == Node_postincrement ? 
1.0 : -1.0)); - if (after_assign) - (*after_assign)(); - return tmp_number(lval); - default: - break; /* handled below */ - } - - tmp = tree_eval(tree->rnode); - rval = force_number(tmp); - free_temp(tmp); - unref(*lhs); - switch(tree->type) { - case Node_assign_exp: - if ((ltemp = rval) == rval) { /* integer exponent */ - if (ltemp == 0) - *lhs = make_number((AWKNUM) 1); - else if (ltemp == 1) - *lhs = make_number(lval); - else { - /* doing it this way should be more precise */ - for (t1 = t2 = lval; --ltemp; ) - t1 *= t2; - *lhs = make_number(t1); - } - } else - *lhs = make_number((AWKNUM) pow((double) lval, (double) rval)); - break; - - case Node_assign_times: - *lhs = make_number(lval * rval); - break; - - case Node_assign_quotient: - if (rval == (AWKNUM) 0) - fatal("division by zero attempted in /="); -#ifdef _CRAY - /* - * special case for integer division, put in for Cray - */ - ltemp = rval; - if (ltemp == 0) { - *lhs = make_number(lval / rval); - break; - } - ltemp = (long) lval / ltemp; - if (ltemp * lval == rval) - *lhs = make_number((AWKNUM) ltemp); - else -#endif - *lhs = make_number(lval / rval); - break; - - case Node_assign_mod: - if (rval == (AWKNUM) 0) - fatal("division by zero attempted in %="); -#ifndef FMOD_MISSING - *lhs = make_number(fmod(lval, rval)); -#else - (void) modf(lval / rval, &t1); - t2 = lval - rval * t1; - *lhs = make_number(t2); -#endif - break; - - case Node_assign_plus: - *lhs = make_number(lval + rval); - break; - - case Node_assign_minus: - *lhs = make_number(lval - rval); - break; - default: - cant_happen(); - } - if (after_assign) - (*after_assign)(); - return *lhs; -} - -NODE **stack_ptr; - -static NODE * -func_call(name, arg_list) -NODE *name; /* name is a Node_val giving function name */ -NODE *arg_list; /* Node_expression_list of calling args. */ -{ - register NODE *arg, *argp, *r; - NODE *n, *f; - jmp_buf volatile func_tag_stack; - jmp_buf volatile loop_tag_stack; - int volatile save_loop_tag_valid = 0; - NODE **volatile save_stack, *save_ret_node; - NODE **volatile local_stack = NULL, **sp; - int count; - extern NODE *ret_node; - - /* - * retrieve function definition node - */ - f = lookup(name->stptr); - if (!f || f->type != Node_func) - fatal("function `%s' not defined", name->stptr); -#ifdef FUNC_TRACE - fprintf(stderr, "function %s called\n", name->stptr); -#endif - count = f->lnode->param_cnt; - if (count) - emalloc(local_stack, NODE **, count*sizeof(NODE *), "func_call"); - sp = local_stack; - - /* - * for each calling arg. add NODE * on stack - */ - for (argp = arg_list; count && argp != NULL; argp = argp->rnode) { - arg = argp->lnode; - getnode(r); - r->type = Node_var; - /* - * call by reference for arrays; see below also - */ - if (arg->type == Node_param_list) - arg = stack_ptr[arg->param_cnt]; - if (arg->type == Node_var_array) - *r = *arg; - else { - n = tree_eval(arg); - r->lnode = dupnode(n); - r->rnode = (NODE *) NULL; - free_temp(n); - } - *sp++ = r; - count--; - } - if (argp != NULL) /* left over calling args. */ - warning( - "function `%s' called with more arguments than declared", - name->stptr); - /* - * add remaining params. on stack with null value - */ - while (count-- > 0) { - getnode(r); - r->type = Node_var; - r->lnode = Nnull_string; - r->rnode = (NODE *) NULL; - *sp++ = r; - } - - /* - * Execute function body, saving context, as a return statement - * will longjmp back here. 
- * - * Have to save and restore the loop_tag stuff so that a return - * inside a loop in a function body doesn't scrog any loops going - * on in the main program. We save the necessary info in variables - * local to this function so that function nesting works OK. - * We also only bother to save the loop stuff if we're in a loop - * when the function is called. - */ - if (loop_tag_valid) { - int junk = 0; - - save_loop_tag_valid = (volatile int) loop_tag_valid; - PUSH_BINDING(loop_tag_stack, loop_tag, junk); - loop_tag_valid = 0; - } - save_stack = stack_ptr; - stack_ptr = local_stack; - PUSH_BINDING(func_tag_stack, func_tag, func_tag_valid); - save_ret_node = ret_node; - ret_node = Nnull_string; /* default return value */ - if (setjmp(func_tag) == 0) - (void) interpret(f->rnode); - - r = ret_node; - ret_node = (NODE *) save_ret_node; - RESTORE_BINDING(func_tag_stack, func_tag, func_tag_valid); - stack_ptr = (NODE **) save_stack; - - /* - * here, we pop each parameter and check whether - * it was an array. If so, and if the arg. passed in was - * a simple variable, then the value should be copied back. - * This achieves "call-by-reference" for arrays. - */ - sp = local_stack; - count = f->lnode->param_cnt; - for (argp = arg_list; count > 0 && argp != NULL; argp = argp->rnode) { - arg = argp->lnode; - if (arg->type == Node_param_list) - arg = stack_ptr[arg->param_cnt]; - n = *sp++; - if ((arg->type == Node_var || arg->type == Node_var_array) - && n->type == Node_var_array) { - /* should we free arg->var_value ? */ - arg->var_array = n->var_array; - arg->type = Node_var_array; - arg->array_size = n->array_size; - arg->table_size = n->table_size; - arg->flags = n->flags; - } - /* n->lnode overlays the array size, don't unref it if array */ - if (n->type != Node_var_array) - unref(n->lnode); - freenode(n); - count--; - } - while (count-- > 0) { - n = *sp++; - /* if n is an (local) array, all the elements should be freed */ - if (n->type == Node_var_array) - assoc_clear(n); - unref(n->lnode); - freenode(n); - } - if (local_stack) - free((char *) local_stack); - - /* Restore the loop_tag stuff if necessary. */ - if (save_loop_tag_valid) { - int junk = 0; - - loop_tag_valid = (int) save_loop_tag_valid; - RESTORE_BINDING(loop_tag_stack, loop_tag, junk); - } - - if (!(r->flags & PERM)) - r->flags |= TEMP; - return r; -} - -/* - * This returns a POINTER to a node pointer. 
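
The copy-back loop at the end of func_call() above is what gives awk arrays call-by-reference semantics: the parameter receives a copy of the array descriptor, and after the call the (possibly updated) descriptor is copied back into the caller's variable. A compressed, illustrative sketch of that copy-in/copy-back idea, with an invented Array type and error checking omitted:

#include <stdio.h>
#include <stdlib.h>

/* Sketch only: copy an array descriptor into the callee and copy it back
 * afterwards, roughly how func_call() treats Node_var_array parameters. */
typedef struct {
    int   *elems;       /* shared element storage */
    size_t size;
} Array;

static void grow(Array *a, size_t n)    /* callee may replace the storage */
{
    a->elems = realloc(a->elems, n * sizeof *a->elems);
    a->size = n;
}

int main(void)
{
    Array caller_arr = { NULL, 0 };
    Array param = caller_arr;           /* descriptor copied in  (*r = *arg) */

    grow(&param, 8);                    /* callee changes the array          */
    caller_arr = param;                 /* descriptor copied back out        */
    printf("caller now sees %zu elements\n", caller_arr.size);
    free(caller_arr.elems);
    return 0;
}
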
get_lhs(ptr) is the current - * value of the var, or where to store the var's new value - */ - -NODE ** -r_get_lhs(ptr, assign) -register NODE *ptr; -Func_ptr *assign; -{ - register NODE **aptr = NULL; - register NODE *n; - - switch (ptr->type) { - case Node_var_array: - fatal("attempt to use array `%s' in a scalar context", ptr->vname); - case Node_var: - aptr = &(ptr->var_value); -#ifdef DEBUG - if ((char)ptr->var_value->stref <= 0) - cant_happen(); -#endif - break; - - case Node_FIELDWIDTHS: - aptr = &(FIELDWIDTHS_node->var_value); - if (assign) - *assign = set_FIELDWIDTHS; - break; - - case Node_RS: - aptr = &(RS_node->var_value); - if (assign) - *assign = set_RS; - break; - - case Node_FS: - aptr = &(FS_node->var_value); - if (assign) - *assign = set_FS; - break; - - case Node_FNR: - unref(FNR_node->var_value); - FNR_node->var_value = make_number((AWKNUM) FNR); - aptr = &(FNR_node->var_value); - if (assign) - *assign = set_FNR; - break; - - case Node_NR: - unref(NR_node->var_value); - NR_node->var_value = make_number((AWKNUM) NR); - aptr = &(NR_node->var_value); - if (assign) - *assign = set_NR; - break; - - case Node_NF: - if (NF == -1) - (void) get_field(HUGE-1, assign); /* parse record */ - unref(NF_node->var_value); - NF_node->var_value = make_number((AWKNUM) NF); - aptr = &(NF_node->var_value); - if (assign) - *assign = set_NF; - break; - - case Node_IGNORECASE: - unref(IGNORECASE_node->var_value); - IGNORECASE_node->var_value = make_number((AWKNUM) IGNORECASE); - aptr = &(IGNORECASE_node->var_value); - if (assign) - *assign = set_IGNORECASE; - break; - - case Node_OFMT: - aptr = &(OFMT_node->var_value); - if (assign) - *assign = set_OFMT; - break; - - case Node_CONVFMT: - aptr = &(CONVFMT_node->var_value); - if (assign) - *assign = set_CONVFMT; - break; - - case Node_ORS: - aptr = &(ORS_node->var_value); - if (assign) - *assign = set_ORS; - break; - - case Node_OFS: - aptr = &(OFS_node->var_value); - if (assign) - *assign = set_OFS; - break; - - case Node_param_list: - aptr = &(stack_ptr[ptr->param_cnt]->var_value); - break; - - case Node_field_spec: - { - int field_num; - - n = tree_eval(ptr->lnode); - field_num = (int) force_number(n); - free_temp(n); - if (field_num < 0) - fatal("attempt to access field %d", field_num); - if (field_num == 0 && field0_valid) { /* short circuit */ - aptr = &fields_arr[0]; - if (assign) - *assign = reset_record; - break; - } - aptr = get_field(field_num, assign); - break; - } - case Node_subscript: - n = ptr->lnode; - if (n->type == Node_param_list) - n = stack_ptr[n->param_cnt]; - aptr = assoc_lookup(n, concat_exp(ptr->rnode)); - break; - - case Node_func: - fatal ("`%s' is a function, assignment is not allowed", - ptr->lnode->param); - default: - cant_happen(); - } - return aptr; -} - -static NODE * -match_op(tree) -register NODE *tree; -{ - register NODE *t1; - register Regexp *rp; - int i; - int match = 1; - - if (tree->type == Node_nomatch) - match = 0; - if (tree->type == Node_regex) - t1 = *get_field(0, (Func_ptr *) 0); - else { - t1 = force_string(tree_eval(tree->lnode)); - tree = tree->rnode; - } - rp = re_update(tree); - i = research(rp, t1->stptr, 0, t1->stlen, 0); - i = (i == -1) ^ (match == 1); - free_temp(t1); - return tmp_number((AWKNUM) i); -} - -void -set_IGNORECASE() -{ - static int warned = 0; - - if ((do_lint || do_unix) && ! 
warned) { - warned = 1; - warning("IGNORECASE not supported in compatibility mode"); - } - IGNORECASE = (force_number(IGNORECASE_node->var_value) != 0.0); - set_FS(); -} - -void -set_OFS() -{ - OFS = force_string(OFS_node->var_value)->stptr; - OFSlen = OFS_node->var_value->stlen; - OFS[OFSlen] = '\0'; -} - -void -set_ORS() -{ - ORS = force_string(ORS_node->var_value)->stptr; - ORSlen = ORS_node->var_value->stlen; - ORS[ORSlen] = '\0'; -} - -NODE **fmt_list = NULL; -static int fmt_ok P((NODE *n)); -static int fmt_index P((NODE *n)); - -static int -fmt_ok(n) -NODE *n; -{ - /* to be done later */ - return 1; -} - -static int -fmt_index(n) -NODE *n; -{ - register int ix = 0; - static int fmt_num = 4; - static int fmt_hiwater = 0; - - if (fmt_list == NULL) - emalloc(fmt_list, NODE **, fmt_num*sizeof(*fmt_list), "fmt_index"); - (void) force_string(n); - while (ix < fmt_hiwater) { - if (cmp_nodes(fmt_list[ix], n) == 0) - return ix; - ix++; - } - /* not found */ - n->stptr[n->stlen] = '\0'; - if (!fmt_ok(n)) - warning("bad FMT specification"); - if (fmt_hiwater >= fmt_num) { - fmt_num *= 2; - emalloc(fmt_list, NODE **, fmt_num, "fmt_index"); - } - fmt_list[fmt_hiwater] = dupnode(n); - return fmt_hiwater++; -} - -void -set_OFMT() -{ - OFMTidx = fmt_index(OFMT_node->var_value); - OFMT = fmt_list[OFMTidx]->stptr; -} - -void -set_CONVFMT() -{ - CONVFMTidx = fmt_index(CONVFMT_node->var_value); - CONVFMT = fmt_list[CONVFMTidx]->stptr; -} diff --git a/gnu/usr.bin/awk/field.c b/gnu/usr.bin/awk/field.c deleted file mode 100644 index b1a709e..0000000 --- a/gnu/usr.bin/awk/field.c +++ /dev/null @@ -1,678 +0,0 @@ -/* - * field.c - routines for dealing with fields and record parsing - */ - -/* - * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. - * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
- */ - -#include "awk.h" - -typedef void (* Setfunc) P((int, char*, int, NODE *)); - -static long (*parse_field) P((int, char **, int, NODE *, - Regexp *, Setfunc, NODE *)); -static void rebuild_record P((void)); -static long re_parse_field P((int, char **, int, NODE *, - Regexp *, Setfunc, NODE *)); -static long def_parse_field P((int, char **, int, NODE *, - Regexp *, Setfunc, NODE *)); -static long sc_parse_field P((int, char **, int, NODE *, - Regexp *, Setfunc, NODE *)); -static long fw_parse_field P((int, char **, int, NODE *, - Regexp *, Setfunc, NODE *)); -static void set_element P((int, char *, int, NODE *)); -static void grow_fields_arr P((long num)); -static void set_field P((int num, char *str, int len, NODE *dummy)); - - -static Regexp *FS_regexp = NULL; -static char *parse_extent; /* marks where to restart parse of record */ -static long parse_high_water=0; /* field number that we have parsed so far */ -static long nf_high_water = 0; /* size of fields_arr */ -static int resave_fs; -static NODE *save_FS; /* save current value of FS when line is read, - * to be used in deferred parsing - */ - -NODE **fields_arr; /* array of pointers to the field nodes */ -int field0_valid; /* $(>0) has not been changed yet */ -int default_FS; -static NODE **nodes; /* permanent repository of field nodes */ -static int *FIELDWIDTHS = NULL; - -void -init_fields() -{ - NODE *n; - - emalloc(fields_arr, NODE **, sizeof(NODE *), "init_fields"); - emalloc(nodes, NODE **, sizeof(NODE *), "init_fields"); - getnode(n); - *n = *Nnull_string; - fields_arr[0] = nodes[0] = n; - parse_extent = fields_arr[0]->stptr; - save_FS = dupnode(FS_node->var_value); - field0_valid = 1; -} - - -static void -grow_fields_arr(num) -long num; -{ - register int t; - register NODE *n; - - erealloc(fields_arr, NODE **, (num + 1) * sizeof(NODE *), "set_field"); - erealloc(nodes, NODE **, (num+1) * sizeof(NODE *), "set_field"); - for (t = nf_high_water+1; t <= num; t++) { - getnode(n); - *n = *Nnull_string; - fields_arr[t] = nodes[t] = n; - } - nf_high_water = num; -} - -/*ARGSUSED*/ -static void -set_field(num, str, len, dummy) -int num; -char *str; -int len; -NODE *dummy; /* not used -- just to make interface same as set_element */ -{ - register NODE *n; - - if (num > nf_high_water) - grow_fields_arr(num); - n = nodes[num]; - n->stptr = str; - n->stlen = len; - n->flags = (PERM|STR|STRING|MAYBE_NUM); - fields_arr[num] = n; -} - -/* Someone assigned a value to $(something). 
Fix up $0 to be right */ -static void -rebuild_record() -{ - register size_t tlen; - register NODE *tmp; - NODE *ofs; - char *ops; - register char *cops; - register NODE **ptr; - register size_t ofslen; - - tlen = 0; - ofs = force_string(OFS_node->var_value); - ofslen = ofs->stlen; - ptr = &fields_arr[NF]; - while (ptr > &fields_arr[0]) { - tmp = force_string(*ptr); - tlen += tmp->stlen; - ptr--; - } - tlen += (NF - 1) * ofslen; - if ((long)tlen < 0) - tlen = 0; - emalloc(ops, char *, tlen + 2, "rebuild_record"); - cops = ops; - ops[0] = '\0'; - for (ptr = &fields_arr[1]; ptr <= &fields_arr[NF]; ptr++) { - tmp = *ptr; - if (tmp->stlen == 1) - *cops++ = tmp->stptr[0]; - else if (tmp->stlen != 0) { - memcpy(cops, tmp->stptr, tmp->stlen); - cops += tmp->stlen; - } - if (ptr != &fields_arr[NF]) { - if (ofslen == 1) - *cops++ = ofs->stptr[0]; - else if (ofslen != 0) { - memcpy(cops, ofs->stptr, ofslen); - cops += ofslen; - } - } - } - tmp = make_str_node(ops, tlen, ALREADY_MALLOCED); - unref(fields_arr[0]); - fields_arr[0] = tmp; - field0_valid = 1; -} - -/* - * setup $0, but defer parsing rest of line until reference is made to $(>0) - * or to NF. At that point, parse only as much as necessary. - */ -void -set_record(buf, cnt, freeold) -char *buf; -int cnt; -int freeold; -{ - register int i; - - NF = -1; - for (i = 1; i <= parse_high_water; i++) { - unref(fields_arr[i]); - } - parse_high_water = 0; - if (freeold) { - unref(fields_arr[0]); - if (resave_fs) { - resave_fs = 0; - unref(save_FS); - save_FS = dupnode(FS_node->var_value); - } - nodes[0]->stptr = buf; - nodes[0]->stlen = cnt; - nodes[0]->stref = 1; - nodes[0]->flags = (STRING|STR|PERM|MAYBE_NUM); - fields_arr[0] = nodes[0]; - } - fields_arr[0]->flags |= MAYBE_NUM; - field0_valid = 1; -} - -void -reset_record() -{ - (void) force_string(fields_arr[0]); - set_record(fields_arr[0]->stptr, fields_arr[0]->stlen, 0); -} - -void -set_NF() -{ - register int i; - - NF = (long) force_number(NF_node->var_value); - if (NF > nf_high_water) - grow_fields_arr(NF); - for (i = parse_high_water + 1; i <= NF; i++) { - unref(fields_arr[i]); - fields_arr[i] = Nnull_string; - } - field0_valid = 0; -} - -/* - * this is called both from get_field() and from do_split() - * via (*parse_field)(). 
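
rebuild_record() above reassembles $0 by totalling the field lengths, adding (NF - 1) * OFSlen for the separators, and memcpy()ing the pieces into one freshly allocated buffer. The same join in isolation, as an illustrative sketch (join_fields and its arguments are invented names):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: join `nf' counted strings with a separator, the way
 * rebuild_record() reassembles $0 from the fields and OFS. */
static char *join_fields(char **fld, size_t *len, size_t nf,
                         const char *ofs, size_t ofslen, size_t *outlen)
{
    size_t total = 0, i;
    char *buf, *p;

    for (i = 0; i < nf; i++)
        total += len[i];
    if (nf > 1)
        total += (nf - 1) * ofslen;
    buf = malloc(total + 1);
    if (buf == NULL)
        return NULL;
    p = buf;
    for (i = 0; i < nf; i++) {
        memcpy(p, fld[i], len[i]);
        p += len[i];
        if (i + 1 < nf) {
            memcpy(p, ofs, ofslen);
            p += ofslen;
        }
    }
    *p = '\0';
    *outlen = total;
    return buf;
}

int main(void)
{
    char *fld[] = { "alpha", "beta", "gamma" };
    size_t len[] = { 5, 4, 5 };
    size_t n;
    char *rec = join_fields(fld, len, 3, ":", 1, &n);

    if (rec != NULL) {
        printf("$0 = \"%s\" (%zu bytes)\n", rec, n);
        free(rec);
    }
    return 0;
}
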
This variation is for when FS is a regular - * expression -- either user-defined or because RS=="" and FS==" " - */ -static long -re_parse_field(up_to, buf, len, fs, rp, set, n) -int up_to; /* parse only up to this field number */ -char **buf; /* on input: string to parse; on output: point to start next */ -int len; -NODE *fs; -Regexp *rp; -Setfunc set; /* routine to set the value of the parsed field */ -NODE *n; -{ - register char *scan = *buf; - register int nf = parse_high_water; - register char *field; - register char *end = scan + len; - - if (up_to == HUGE) - nf = 0; - if (len == 0) - return nf; - - if (*RS == 0 && default_FS) - while (scan < end && (*scan == ' ' || *scan == '\t' || *scan == '\n')) - scan++; - field = scan; - while (scan < end - && research(rp, scan, 0, (end - scan), 1) != -1 - && nf < up_to) { - if (REEND(rp, scan) == RESTART(rp, scan)) { /* null match */ - scan++; - if (scan == end) { - (*set)(++nf, field, (int)(scan - field), n); - up_to = nf; - break; - } - continue; - } - (*set)(++nf, field, - (int)(scan + RESTART(rp, scan) - field), n); - scan += REEND(rp, scan); - field = scan; - if (scan == end) /* FS at end of record */ - (*set)(++nf, field, 0, n); - } - if (nf != up_to && scan < end) { - (*set)(++nf, scan, (int)(end - scan), n); - scan = end; - } - *buf = scan; - return (nf); -} - -/* - * this is called both from get_field() and from do_split() - * via (*parse_field)(). This variation is for when FS is a single space - * character. - */ -static long -def_parse_field(up_to, buf, len, fs, rp, set, n) -int up_to; /* parse only up to this field number */ -char **buf; /* on input: string to parse; on output: point to start next */ -int len; -NODE *fs; -Regexp *rp; -Setfunc set; /* routine to set the value of the parsed field */ -NODE *n; -{ - register char *scan = *buf; - register int nf = parse_high_water; - register char *field; - register char *end = scan + len; - char sav; - - if (up_to == HUGE) - nf = 0; - if (len == 0) - return nf; - - /* before doing anything save the char at *end */ - sav = *end; - /* because it will be destroyed now: */ - - *end = ' '; /* sentinel character */ - for (; nf < up_to; scan++) { - /* - * special case: fs is single space, strip leading whitespace - */ - while (scan < end && (*scan == ' ' || *scan == '\t')) - scan++; - if (scan >= end) - break; - field = scan; - while (*scan != ' ' && *scan != '\t') - scan++; - (*set)(++nf, field, (int)(scan - field), n); - if (scan == end) - break; - } - - /* everything done, restore original char at *end */ - *end = sav; - - *buf = scan; - return nf; -} - -/* - * this is called both from get_field() and from do_split() - * via (*parse_field)(). This variation is for when FS is a single character - * other than space. 
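
def_parse_field() above relies on a sentinel: it saves the byte just past the record, overwrites it with a blank so the inner scanning loops need no end-of-buffer test, and restores it afterwards; sc_parse_field(), which follows, plays the same trick with the single-character FS. A standalone sketch of the technique, assuming (as gawk's record buffer does) one spare writable byte after the data:

#include <stdio.h>
#include <string.h>

/* Sketch only: split a blank-separated record in place by planting a
 * sentinel ' ' at buf[len] so the inner scan needs no end-of-buffer test.
 * The buffer must have at least one spare byte after the record. */
static int split_blanks(char *buf, size_t len)
{
    char *scan = buf, *end = buf + len, *field;
    char sav = *end;
    int nf = 0;

    *end = ' ';                               /* sentinel */
    while (scan < end) {
        while (scan < end && (*scan == ' ' || *scan == '\t'))
            scan++;                           /* skip leading blanks */
        if (scan >= end)
            break;
        field = scan;
        while (*scan != ' ' && *scan != '\t') /* sentinel guarantees termination */
            scan++;
        printf("field %d: \"%.*s\"\n", ++nf, (int)(scan - field), field);
    }
    *end = sav;                               /* restore the clobbered byte */
    return nf;
}

int main(void)
{
    char rec[64] = "  one  two\tthree ";

    printf("%d fields\n", split_blanks(rec, strlen(rec)));
    return 0;
}
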
- */ -static long -sc_parse_field(up_to, buf, len, fs, rp, set, n) -int up_to; /* parse only up to this field number */ -char **buf; /* on input: string to parse; on output: point to start next */ -int len; -NODE *fs; -Regexp *rp; -Setfunc set; /* routine to set the value of the parsed field */ -NODE *n; -{ - register char *scan = *buf; - register char fschar; - register int nf = parse_high_water; - register char *field; - register char *end = scan + len; - char sav; - - if (up_to == HUGE) - nf = 0; - if (len == 0) - return nf; - - if (*RS == 0 && fs->stlen == 0) - fschar = '\n'; - else - fschar = fs->stptr[0]; - - /* before doing anything save the char at *end */ - sav = *end; - /* because it will be destroyed now: */ - *end = fschar; /* sentinel character */ - - for (; nf < up_to;) { - field = scan; - while (*scan != fschar) - scan++; - (*set)(++nf, field, (int)(scan - field), n); - if (scan == end) - break; - scan++; - if (scan == end) { /* FS at end of record */ - (*set)(++nf, field, 0, n); - break; - } - } - - /* everything done, restore original char at *end */ - *end = sav; - - *buf = scan; - return nf; -} - -/* - * this is called both from get_field() and from do_split() - * via (*parse_field)(). This variation is for fields are fixed widths. - */ -static long -fw_parse_field(up_to, buf, len, fs, rp, set, n) -int up_to; /* parse only up to this field number */ -char **buf; /* on input: string to parse; on output: point to start next */ -int len; -NODE *fs; -Regexp *rp; -Setfunc set; /* routine to set the value of the parsed field */ -NODE *n; -{ - register char *scan = *buf; - register long nf = parse_high_water; - register char *end = scan + len; - - if (up_to == HUGE) - nf = 0; - if (len == 0) - return nf; - for (; nf < up_to && (len = FIELDWIDTHS[nf+1]) != -1; ) { - if (len > end - scan) - len = end - scan; - (*set)(++nf, scan, len, n); - scan += len; - } - if (len == -1) - *buf = end; - else - *buf = scan; - return nf; -} - -NODE ** -get_field(requested, assign) -register int requested; -Func_ptr *assign; /* this field is on the LHS of an assign */ -{ - /* - * if requesting whole line but some other field has been altered, - * then the whole line must be rebuilt - */ - if (requested == 0) { - if (!field0_valid) { - /* first, parse remainder of input record */ - if (NF == -1) { - NF = (*parse_field)(HUGE-1, &parse_extent, - fields_arr[0]->stlen - - (parse_extent - fields_arr[0]->stptr), - save_FS, FS_regexp, set_field, - (NODE *)NULL); - parse_high_water = NF; - } - rebuild_record(); - } - if (assign) - *assign = reset_record; - return &fields_arr[0]; - } - - /* assert(requested > 0); */ - - if (assign) - field0_valid = 0; /* $0 needs reconstruction */ - - if (requested <= parse_high_water) /* already parsed this field */ - return &fields_arr[requested]; - - if (NF == -1) { /* have not yet parsed to end of record */ - /* - * parse up to requested fields, calling set_field() for each, - * saving in parse_extent the point where the parse left off - */ - if (parse_high_water == 0) /* starting at the beginning */ - parse_extent = fields_arr[0]->stptr; - parse_high_water = (*parse_field)(requested, &parse_extent, - fields_arr[0]->stlen - (parse_extent-fields_arr[0]->stptr), - save_FS, FS_regexp, set_field, (NODE *)NULL); - - /* - * if we reached the end of the record, set NF to the number of - * fields so far. 
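
fw_parse_field() above cuts the record at fixed byte counts taken from the FIELDWIDTHS array, which is terminated by -1, clamping the last width to whatever is left of the record. Roughly the same splitter in isolation (split_fixed is an invented name for this sketch):

#include <stdio.h>
#include <string.h>

/* Sketch only: fixed-width splitting in the style of fw_parse_field().
 * widths[] is terminated by -1, as gawk's FIELDWIDTHS array is. */
static int split_fixed(const char *rec, size_t len, const int *widths)
{
    const char *scan = rec, *end = rec + len;
    int nf = 0, w;

    while ((w = widths[nf]) != -1 && scan < end) {
        if (w > end - scan)
            w = (int)(end - scan);        /* clamp last field to what's left */
        printf("field %d: \"%.*s\"\n", ++nf, w, scan);
        scan += w;
    }
    return nf;
}

int main(void)
{
    static const int widths[] = { 3, 5, 2, -1 };
    const char *rec = "abcdefghij";

    printf("%d fields\n", split_fixed(rec, strlen(rec), widths));
    return 0;
}
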
Note that requested might actually refer to - * a field that is beyond the end of the record, but we won't - * set NF to that value at this point, since this is only a - * reference to the field and NF only gets set if the field - * is assigned to -- this case is handled below - */ - if (parse_extent == fields_arr[0]->stptr + fields_arr[0]->stlen) - NF = parse_high_water; - if (requested == HUGE-1) /* HUGE-1 means set NF */ - requested = parse_high_water; - } - if (parse_high_water < requested) { /* requested beyond end of record */ - if (assign) { /* expand record */ - register int i; - - if (requested > nf_high_water) - grow_fields_arr(requested); - - /* fill in fields that don't exist */ - for (i = parse_high_water + 1; i <= requested; i++) - fields_arr[i] = Nnull_string; - - NF = requested; - parse_high_water = requested; - } else - return &Nnull_string; - } - - return &fields_arr[requested]; -} - -static void -set_element(num, s, len, n) -int num; -char *s; -int len; -NODE *n; -{ - register NODE *it; - - it = make_string(s, len); - it->flags |= MAYBE_NUM; - *assoc_lookup(n, tmp_number((AWKNUM) (num))) = it; -} - -NODE * -do_split(tree) -NODE *tree; -{ - NODE *t1, *t2, *t3, *tmp; - NODE *fs; - char *s; - long (*parseit)P((int, char **, int, NODE *, - Regexp *, Setfunc, NODE *)); - Regexp *rp = NULL; - - - /* - * do dupnode(), to avoid problems like - * x = split(a[1], a, "blah") - * since we assoc_clear the array. gack. - * this also gives up complete call by value semantics. - */ - tmp = tree_eval(tree->lnode); - t1 = dupnode(tmp); - free_temp(tmp); - - t2 = tree->rnode->lnode; - t3 = tree->rnode->rnode->lnode; - - (void) force_string(t1); - - if (t2->type == Node_param_list) - t2 = stack_ptr[t2->param_cnt]; - if (t2->type != Node_var && t2->type != Node_var_array) - fatal("second argument of split is not a variable"); - assoc_clear(t2); - - if (t3->re_flags & FS_DFLT) { - parseit = parse_field; - fs = force_string(FS_node->var_value); - rp = FS_regexp; - } else { - tmp = force_string(tree_eval(t3->re_exp)); - if (tmp->stlen == 1) { - if (tmp->stptr[0] == ' ') - parseit = def_parse_field; - else - parseit = sc_parse_field; - } else { - parseit = re_parse_field; - rp = re_update(t3); - } - fs = tmp; - } - - s = t1->stptr; - tmp = tmp_number((AWKNUM) (*parseit)(HUGE, &s, (int)t1->stlen, - fs, rp, set_element, t2)); - unref(t1); - free_temp(t3); - return tmp; -} - -void -set_FS() -{ - char buf[10]; - NODE *fs; - - /* - * If changing the way fields are split, obey least-suprise - * semantics, and force $0 to be split totally. 
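
get_field() above is deliberately lazy: it splits the record only as far as the requested field and records its progress in parse_high_water, so a reference to $1 never pays for splitting the whole of a long record. A much simplified sketch of that bookkeeping, with an invented split_one() standing in for the (*parse_field)() routines:

#include <stdio.h>
#include <string.h>

/* Sketch only: parse fields on demand and remember how far we got, in the
 * spirit of get_field() and parse_high_water. */
#define MAXFIELDS 32

static const char *record = "one two three four five";
static const char *parse_extent;        /* where the last parse left off */
static int high_water = 0;              /* number of fields parsed so far */
static char fields[MAXFIELDS][32];

static int split_one(void)              /* parse exactly one more field */
{
    const char *s = parse_extent;
    size_t n = 0;

    if (high_water >= MAXFIELDS)
        return 0;
    while (*s == ' ')
        s++;
    if (*s == '\0')
        return 0;
    while (s[n] != ' ' && s[n] != '\0')
        n++;
    memcpy(fields[high_water], s, n < 31 ? n : 31);
    fields[high_water][n < 31 ? n : 31] = '\0';
    high_water++;
    parse_extent = s + n;
    return 1;
}

static const char *get_field(int requested)
{
    if (high_water == 0)
        parse_extent = record;          /* starting at the beginning */
    while (high_water < requested && split_one())
        ;                               /* split no further than necessary */
    return requested <= high_water ? fields[requested - 1] : "";
}

int main(void)
{
    const char *f = get_field(2);

    printf("$2 = \"%s\", %d fields parsed so far\n", f, high_water);
    f = get_field(5);
    printf("$5 = \"%s\", %d fields parsed so far\n", f, high_water);
    return 0;
}
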
- */ - if (fields_arr != NULL) - (void) get_field(HUGE - 1, 0); - - buf[0] = '\0'; - default_FS = 0; - if (FS_regexp) { - refree(FS_regexp); - FS_regexp = NULL; - } - fs = force_string(FS_node->var_value); - if (fs->stlen > 1) - parse_field = re_parse_field; - else if (*RS == 0) { - parse_field = sc_parse_field; - if (fs->stlen == 1) { - if (fs->stptr[0] == ' ') { - default_FS = 1; - strcpy(buf, "[ \t\n]+"); - } else if (fs->stptr[0] != '\n') - sprintf(buf, "[%c\n]", fs->stptr[0]); - } - } else { - parse_field = def_parse_field; - if (fs->stptr[0] == ' ' && fs->stlen == 1) - default_FS = 1; - else if (fs->stptr[0] != ' ' && fs->stlen == 1) { - if (IGNORECASE == 0) - parse_field = sc_parse_field; - else if (fs->stptr[0] == '\\') - /* yet another special case */ - strcpy(buf, "[\\\\]"); - else - sprintf(buf, "[%c]", fs->stptr[0]); - } - } - if (buf[0]) { - FS_regexp = make_regexp(buf, strlen(buf), IGNORECASE, 1); - parse_field = re_parse_field; - } else if (parse_field == re_parse_field) { - FS_regexp = make_regexp(fs->stptr, fs->stlen, IGNORECASE, 1); - } else - FS_regexp = NULL; - resave_fs = 1; -} - -void -set_RS() -{ - (void) force_string(RS_node->var_value); - RS = RS_node->var_value->stptr; - set_FS(); -} - -void -set_FIELDWIDTHS() -{ - register char *scan; - char *end; - register int i; - static int fw_alloc = 1; - static int warned = 0; - extern double strtod(); - - if (do_lint && ! warned) { - warned = 1; - warning("use of FIELDWIDTHS is a gawk extension"); - } - if (do_unix) /* quick and dirty, does the trick */ - return; - - /* - * If changing the way fields are split, obey least-suprise - * semantics, and force $0 to be split totally. - */ - if (fields_arr != NULL) - (void) get_field(HUGE - 1, 0); - - parse_field = fw_parse_field; - scan = force_string(FIELDWIDTHS_node->var_value)->stptr; - end = scan + 1; - if (FIELDWIDTHS == NULL) - emalloc(FIELDWIDTHS, int *, fw_alloc * sizeof(int), "set_FIELDWIDTHS"); - FIELDWIDTHS[0] = 0; - for (i = 1; ; i++) { - if (i >= fw_alloc) { - fw_alloc *= 2; - erealloc(FIELDWIDTHS, int *, fw_alloc * sizeof(int), "set_FIELDWIDTHS"); - } - FIELDWIDTHS[i] = (int) strtod(scan, &end); - if (end == scan) - break; - scan = end; - } - FIELDWIDTHS[i] = -1; -} diff --git a/gnu/usr.bin/awk/getopt.c b/gnu/usr.bin/awk/getopt.c deleted file mode 100644 index fd142f5..0000000 --- a/gnu/usr.bin/awk/getopt.c +++ /dev/null @@ -1,757 +0,0 @@ -/* Getopt for GNU. - NOTE: getopt is now part of the C library, so if you don't know what - "Keep this file name-space clean" means, talk to roland@gnu.ai.mit.edu - before changing it! - - Copyright (C) 1987, 88, 89, 90, 91, 92, 93, 1994 - Free Software Foundation, Inc. - - This program is free software; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by the - Free Software Foundation; either version 2, or (at your option) any - later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software - Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ - -#ifdef HAVE_CONFIG_H -#if defined (emacs) || defined (CONFIG_BROKETS) -/* We use <config.h> instead of "config.h" so that a compilation - using -I. 
-I$srcdir will use ./config.h rather than $srcdir/config.h - (which it would do because it found this file in $srcdir). */ -#include <config.h> -#else -#include "config.h" -#endif -#endif - -#ifndef __STDC__ -/* This is a separate conditional since some stdc systems - reject `defined (const)'. */ -#ifndef const -#define const -#endif -#endif - -/* This tells Alpha OSF/1 not to define a getopt prototype in <stdio.h>. */ -#ifndef _NO_PROTO -#define _NO_PROTO -#endif - -#include <stdio.h> - -/* Comment out all this code if we are using the GNU C Library, and are not - actually compiling the library itself. This code is part of the GNU C - Library, but also included in many other GNU distributions. Compiling - and linking in this code is a waste when using the GNU C library - (especially if it is a shared library). Rather than having every GNU - program understand `configure --with-gnu-libc' and omit the object files, - it is simpler to just do this in the source for each such file. */ - -#if defined (_LIBC) || !defined (__GNU_LIBRARY__) - - -/* This needs to come after some library #include - to get __GNU_LIBRARY__ defined. */ -#if defined(__GNU_LIBRARY__) || defined(STDC_HEADERS) -/* Don't include stdlib.h for non-GNU C libraries because some of them - contain conflicting prototypes for getopt. */ -#include <stdlib.h> -#else -extern char *getenv (); -#endif /* __GNU_LIBRARY || STDC_HEADERS */ - -/* If GETOPT_COMPAT is defined, `+' as well as `--' can introduce a - long-named option. Because this is not POSIX.2 compliant, it is - being phased out. */ -/* #define GETOPT_COMPAT */ - -/* This version of `getopt' appears to the caller like standard Unix `getopt' - but it behaves differently for the user, since it allows the user - to intersperse the options with the other arguments. - - As `getopt' works, it permutes the elements of ARGV so that, - when it is done, all the options precede everything else. Thus - all application programs are extended to handle flexible argument order. - - Setting the environment variable POSIXLY_CORRECT disables permutation. - Then the behavior is completely standard. - - GNU application programs can use a third alternative mode in which - they can distinguish the relative order of options and other arguments. */ - -#include "getopt.h" - -/* For communication from `getopt' to the caller. - When `getopt' finds an option that takes an argument, - the argument value is returned here. - Also, when `ordering' is RETURN_IN_ORDER, - each non-option ARGV-element is returned here. */ - -char *optarg = 0; - -/* Index in ARGV of the next element to be scanned. - This is used for communication to and from the caller - and for communication between successive calls to `getopt'. - - On entry to `getopt', zero means this is the first call; initialize. - - When `getopt' returns EOF, this is the index of the first of the - non-option elements that the caller should itself scan. - - Otherwise, `optind' communicates from one call to the next - how much of ARGV has been scanned so far. */ - -/* XXX 1003.2 says this must be 1 before any call. */ -int optind = 0; - -/* The next char to be scanned in the option-element - in which the last option character we returned was found. - This allows us to pick up the scan where we left off. - - If this is zero, or a null string, it means resume the scan - by advancing to the next ARGV-element. */ - -static char *nextchar; - -/* Callers store zero here to inhibit the error message - for unrecognized options. 
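
The globals documented above -- optarg, optind, opterr and optopt -- are the whole interface between getopt() and its caller. A minimal caller, shown here against the POSIX getopt() from <unistd.h> rather than this bundled copy, might look like:

#include <stdio.h>
#include <unistd.h>

/* Sketch only: a minimal getopt() caller showing how optarg, optind,
 * opterr and optopt are used. */
int main(int argc, char **argv)
{
    int c;

    opterr = 0;                 /* we print our own diagnostics */
    while ((c = getopt(argc, argv, "vo:")) != -1) {
        switch (c) {
        case 'v':
            printf("verbose\n");
            break;
        case 'o':
            printf("output file: %s\n", optarg);
            break;
        case '?':
            fprintf(stderr, "unknown or incomplete option -%c\n", optopt);
            return 2;
        }
    }
    for (; optind < argc; optind++)     /* remaining operands */
        printf("operand: %s\n", argv[optind]);
    return 0;
}
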
*/ - -int opterr = 1; - -/* Set to an option character which was unrecognized. - This must be initialized on some systems to avoid linking in the - system's own getopt implementation. */ - -int optopt = '?'; - -/* Describe how to deal with options that follow non-option ARGV-elements. - - If the caller did not specify anything, - the default is REQUIRE_ORDER if the environment variable - POSIXLY_CORRECT is defined, PERMUTE otherwise. - - REQUIRE_ORDER means don't recognize them as options; - stop option processing when the first non-option is seen. - This is what Unix does. - This mode of operation is selected by either setting the environment - variable POSIXLY_CORRECT, or using `+' as the first character - of the list of option characters. - - PERMUTE is the default. We permute the contents of ARGV as we scan, - so that eventually all the non-options are at the end. This allows options - to be given in any order, even with programs that were not written to - expect this. - - RETURN_IN_ORDER is an option available to programs that were written - to expect options and other ARGV-elements in any order and that care about - the ordering of the two. We describe each non-option ARGV-element - as if it were the argument of an option with character code 1. - Using `-' as the first character of the list of option characters - selects this mode of operation. - - The special argument `--' forces an end of option-scanning regardless - of the value of `ordering'. In the case of RETURN_IN_ORDER, only - `--' can cause `getopt' to return EOF with `optind' != ARGC. */ - -static enum -{ - REQUIRE_ORDER, PERMUTE, RETURN_IN_ORDER -} ordering; - -#if defined(__GNU_LIBRARY__) || defined(STDC_HEADERS) -/* We want to avoid inclusion of string.h with non-GNU libraries - because there are many ways it can cause trouble. - On some systems, it contains special magic macros that don't work - in GCC. */ -#include <string.h> -#define my_index strchr -#else - -/* Avoid depending on library functions or files - whose names are inconsistent. */ - -static char * -my_index (str, chr) - const char *str; - int chr; -{ - while (*str) - { - if (*str == chr) - return (char *) str; - str++; - } - return 0; -} - -/* If using GCC, we can safely declare strlen this way. - If not using GCC, it is ok not to declare it. - (Supposedly there are some machines where it might get a warning, - but changing this conditional to __STDC__ is too risky.) */ -#ifdef __GNUC__ -#ifdef IN_GCC -#include "gstddef.h" -#else -#include <stddef.h> -#endif -extern size_t strlen (const char *); -#endif - -#endif /* __GNU_LIBRARY__ || STDC_HEADERS */ - -/* Handle permutation of arguments. */ - -/* Describe the part of ARGV that contains non-options that have - been skipped. `first_nonopt' is the index in ARGV of the first of them; - `last_nonopt' is the index after the last of them. */ - -static int first_nonopt; -static int last_nonopt; - -/* Exchange two adjacent subsequences of ARGV. - One subsequence is elements [first_nonopt,last_nonopt) - which contains all the non-options that have been skipped so far. - The other is elements [last_nonopt,optind), which contains all - the options processed since those non-options were skipped. - - `first_nonopt' and `last_nonopt' are relocated so that they describe - the new indices of the non-options in ARGV after they are moved. 
*/ - -static void -exchange (argv) - char **argv; -{ - int bottom = first_nonopt; - int middle = last_nonopt; - int top = optind; - char *tem; - - /* Exchange the shorter segment with the far end of the longer segment. - That puts the shorter segment into the right place. - It leaves the longer segment in the right place overall, - but it consists of two parts that need to be swapped next. */ - - while (top > middle && middle > bottom) - { - if (top - middle > middle - bottom) - { - /* Bottom segment is the short one. */ - int len = middle - bottom; - register int i; - - /* Swap it with the top part of the top segment. */ - for (i = 0; i < len; i++) - { - tem = argv[bottom + i]; - argv[bottom + i] = argv[top - (middle - bottom) + i]; - argv[top - (middle - bottom) + i] = tem; - } - /* Exclude the moved bottom segment from further swapping. */ - top -= len; - } - else - { - /* Top segment is the short one. */ - int len = top - middle; - register int i; - - /* Swap it with the bottom part of the bottom segment. */ - for (i = 0; i < len; i++) - { - tem = argv[bottom + i]; - argv[bottom + i] = argv[middle + i]; - argv[middle + i] = tem; - } - /* Exclude the moved top segment from further swapping. */ - bottom += len; - } - } - - /* Update records for the slots the non-options now occupy. */ - - first_nonopt += (optind - last_nonopt); - last_nonopt = optind; -} - -/* Scan elements of ARGV (whose length is ARGC) for option characters - given in OPTSTRING. - - If an element of ARGV starts with '-', and is not exactly "-" or "--", - then it is an option element. The characters of this element - (aside from the initial '-') are option characters. If `getopt' - is called repeatedly, it returns successively each of the option characters - from each of the option elements. - - If `getopt' finds another option character, it returns that character, - updating `optind' and `nextchar' so that the next call to `getopt' can - resume the scan with the following option character or ARGV-element. - - If there are no more option characters, `getopt' returns `EOF'. - Then `optind' is the index in ARGV of the first ARGV-element - that is not an option. (The ARGV-elements have been permuted - so that those that are not options now come last.) - - OPTSTRING is a string containing the legitimate option characters. - If an option character is seen that is not listed in OPTSTRING, - return '?' after printing an error message. If you set `opterr' to - zero, the error message is suppressed but we still return '?'. - - If a char in OPTSTRING is followed by a colon, that means it wants an arg, - so the following text in the same ARGV-element, or the text of the following - ARGV-element, is returned in `optarg'. Two colons mean an option that - wants an optional arg; if there is text in the current ARGV-element, - it is returned in `optarg', otherwise `optarg' is set to zero. - - If OPTSTRING starts with `-' or `+', it requests different methods of - handling the non-option ARGV-elements. - See the comments about RETURN_IN_ORDER and REQUIRE_ORDER, above. - - Long-named options begin with `--' instead of `-'. - Their names may be abbreviated as long as the abbreviation is unique - or is an exact match for some defined option. If they have an - argument, it follows the option name in the same ARGV-element, separated - from the option name by a `=', or else the in next ARGV-element. 
- When `getopt' finds a long-named option, it returns 0 if that option's - `flag' field is nonzero, the value of the option's `val' field - if the `flag' field is zero. - - The elements of ARGV aren't really const, because we permute them. - But we pretend they're const in the prototype to be compatible - with other systems. - - LONGOPTS is a vector of `struct option' terminated by an - element containing a name which is zero. - - LONGIND returns the index in LONGOPT of the long-named option found. - It is only valid when a long-named option has been found by the most - recent call. - - If LONG_ONLY is nonzero, '-' as well as '--' can introduce - long-named options. */ - -int -_getopt_internal (argc, argv, optstring, longopts, longind, long_only) - int argc; - char *const *argv; - const char *optstring; - const struct option *longopts; - int *longind; - int long_only; -{ - int option_index; - - optarg = 0; - - /* Initialize the internal data when the first call is made. - Start processing options with ARGV-element 1 (since ARGV-element 0 - is the program name); the sequence of previously skipped - non-option ARGV-elements is empty. */ - - if (optind == 0) - { - first_nonopt = last_nonopt = optind = 1; - - nextchar = NULL; - - /* Determine how to handle the ordering of options and nonoptions. */ - - if (optstring[0] == '-') - { - ordering = RETURN_IN_ORDER; - ++optstring; - } - else if (optstring[0] == '+') - { - ordering = REQUIRE_ORDER; - ++optstring; - } - else if (getenv ("POSIXLY_CORRECT") != NULL) - ordering = REQUIRE_ORDER; - else - ordering = PERMUTE; - } - - if (nextchar == NULL || *nextchar == '\0') - { - if (ordering == PERMUTE) - { - /* If we have just processed some options following some non-options, - exchange them so that the options come first. */ - - if (first_nonopt != last_nonopt && last_nonopt != optind) - exchange ((char **) argv); - else if (last_nonopt != optind) - first_nonopt = optind; - - /* Now skip any additional non-options - and extend the range of non-options previously skipped. */ - - while (optind < argc - && (argv[optind][0] != '-' || argv[optind][1] == '\0') -#ifdef GETOPT_COMPAT - && (longopts == NULL - || argv[optind][0] != '+' || argv[optind][1] == '\0') -#endif /* GETOPT_COMPAT */ - ) - optind++; - last_nonopt = optind; - } - - /* Special ARGV-element `--' means premature end of options. - Skip it like a null option, - then exchange with previous non-options as if it were an option, - then skip everything else like a non-option. */ - - if (optind != argc && !strcmp (argv[optind], "--")) - { - optind++; - - if (first_nonopt != last_nonopt && last_nonopt != optind) - exchange ((char **) argv); - else if (first_nonopt == last_nonopt) - first_nonopt = optind; - last_nonopt = argc; - - optind = argc; - } - - /* If we have done all the ARGV-elements, stop the scan - and back over any non-options that we skipped and permuted. */ - - if (optind == argc) - { - /* Set the next-arg-index to point at the non-options - that we previously skipped, so the caller will digest them. */ - if (first_nonopt != last_nonopt) - optind = first_nonopt; - return EOF; - } - - /* If we have come to a non-option and did not permute it, - either stop the scan or describe it to the caller and pass it by. 
*/ - - if ((argv[optind][0] != '-' || argv[optind][1] == '\0') -#ifdef GETOPT_COMPAT - && (longopts == NULL - || argv[optind][0] != '+' || argv[optind][1] == '\0') -#endif /* GETOPT_COMPAT */ - ) - { - if (ordering == REQUIRE_ORDER) - return EOF; - optarg = argv[optind++]; - return 1; - } - - /* We have found another option-ARGV-element. - Start decoding its characters. */ - - nextchar = (argv[optind] + 1 - + (longopts != NULL && argv[optind][1] == '-')); - } - - if (longopts != NULL - && ((argv[optind][0] == '-' - && (argv[optind][1] == '-' || long_only)) -#ifdef GETOPT_COMPAT - || argv[optind][0] == '+' -#endif /* GETOPT_COMPAT */ - )) - { - const struct option *p; - char *s = nextchar; - int exact = 0; - int ambig = 0; - const struct option *pfound = NULL; - int indfound; - - while (*s && *s != '=') - s++; - - /* Test all options for either exact match or abbreviated matches. */ - for (p = longopts, option_index = 0; p->name; - p++, option_index++) - if (!strncmp (p->name, nextchar, s - nextchar)) - { - if (s - nextchar == strlen (p->name)) - { - /* Exact match found. */ - pfound = p; - indfound = option_index; - exact = 1; - break; - } - else if (pfound == NULL) - { - /* First nonexact match found. */ - pfound = p; - indfound = option_index; - } - else - /* Second nonexact match found. */ - ambig = 1; - } - - if (ambig && !exact) - { - if (opterr) - fprintf (stderr, "%s: option `%s' is ambiguous\n", - argv[0], argv[optind]); - nextchar += strlen (nextchar); - optind++; - return '?'; - } - - if (pfound != NULL) - { - option_index = indfound; - optind++; - if (*s) - { - /* Don't test has_arg with >, because some C compilers don't - allow it to be used on enums. */ - if (pfound->has_arg) - optarg = s + 1; - else - { - if (opterr) - { - if (argv[optind - 1][1] == '-') - /* --option */ - fprintf (stderr, - "%s: option `--%s' doesn't allow an argument\n", - argv[0], pfound->name); - else - /* +option or -option */ - fprintf (stderr, - "%s: option `%c%s' doesn't allow an argument\n", - argv[0], argv[optind - 1][0], pfound->name); - } - nextchar += strlen (nextchar); - return '?'; - } - } - else if (pfound->has_arg == 1) - { - if (optind < argc) - optarg = argv[optind++]; - else - { - if (opterr) - fprintf (stderr, "%s: option `%s' requires an argument\n", - argv[0], argv[optind - 1]); - nextchar += strlen (nextchar); - return optstring[0] == ':' ? ':' : '?'; - } - } - nextchar += strlen (nextchar); - if (longind != NULL) - *longind = option_index; - if (pfound->flag) - { - *(pfound->flag) = pfound->val; - return 0; - } - return pfound->val; - } - /* Can't find it as a long option. If this is not getopt_long_only, - or the option starts with '--' or is not a valid short - option, then it's an error. - Otherwise interpret it as a short option. */ - if (!long_only || argv[optind][1] == '-' -#ifdef GETOPT_COMPAT - || argv[optind][0] == '+' -#endif /* GETOPT_COMPAT */ - || my_index (optstring, *nextchar) == NULL) - { - if (opterr) - { - if (argv[optind][1] == '-') - /* --option */ - fprintf (stderr, "%s: unrecognized option `--%s'\n", - argv[0], nextchar); - else - /* +option or -option */ - fprintf (stderr, "%s: unrecognized option `%c%s'\n", - argv[0], argv[optind][0], nextchar); - } - nextchar = (char *) ""; - optind++; - return '?'; - } - } - - /* Look at and handle the next option-character. */ - - { - char c = *nextchar++; - char *temp = my_index (optstring, c); - - /* Increment `optind' when we start to process its last character. 
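
The long-option matching loop above accepts any unambiguous abbreviation of an option name by counting exact and non-exact prefix matches and flagging ambiguity. The same idea reduced to one small function (match_long and its return conventions are invented for this sketch):

#include <stdio.h>
#include <string.h>

/* Sketch only: match a possibly abbreviated long-option name against a
 * table, the way _getopt_internal() does.  Returns the table index,
 * -1 if nothing matches, -2 if the abbreviation is ambiguous. */
static int match_long(const char *arg, const char *const names[], int n)
{
    size_t len = strcspn(arg, "=");     /* stop at "--name=value" */
    int i, found = -1, exact = 0, ambig = 0;

    for (i = 0; i < n; i++) {
        if (strncmp(names[i], arg, len) != 0)
            continue;
        if (strlen(names[i]) == len) {  /* exact match wins outright */
            found = i;
            exact = 1;
            break;
        }
        if (found < 0)
            found = i;                  /* first non-exact match */
        else
            ambig = 1;                  /* second non-exact match */
    }
    if (ambig && !exact)
        return -2;
    return found;
}

int main(void)
{
    static const char *const names[] = { "verbose", "version", "help" };

    printf("%d %d %d %d\n",
           match_long("help", names, 3),      /* 2: exact match    */
           match_long("verb", names, 3),      /* 0: unique abbrev  */
           match_long("ver", names, 3),       /* -2: ambiguous     */
           match_long("colour", names, 3));   /* -1: no match      */
    return 0;
}
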
*/ - if (*nextchar == '\0') - ++optind; - - if (temp == NULL || c == ':') - { - if (opterr) - { -#if 0 - if (c < 040 || c >= 0177) - fprintf (stderr, "%s: unrecognized option, character code 0%o\n", - argv[0], c); - else - fprintf (stderr, "%s: unrecognized option `-%c'\n", argv[0], c); -#else - /* 1003.2 specifies the format of this message. */ - fprintf (stderr, "%s: illegal option -- %c\n", argv[0], c); -#endif - } - optopt = c; - return '?'; - } - if (temp[1] == ':') - { - if (temp[2] == ':') - { - /* This is an option that accepts an argument optionally. */ - if (*nextchar != '\0') - { - optarg = nextchar; - optind++; - } - else - optarg = 0; - nextchar = NULL; - } - else - { - /* This is an option that requires an argument. */ - if (*nextchar != '\0') - { - optarg = nextchar; - /* If we end this ARGV-element by taking the rest as an arg, - we must advance to the next element now. */ - optind++; - } - else if (optind == argc) - { - if (opterr) - { -#if 0 - fprintf (stderr, "%s: option `-%c' requires an argument\n", - argv[0], c); -#else - /* 1003.2 specifies the format of this message. */ - fprintf (stderr, "%s: option requires an argument -- %c\n", - argv[0], c); -#endif - } - optopt = c; - if (optstring[0] == ':') - c = ':'; - else - c = '?'; - } - else - /* We already incremented `optind' once; - increment it again when taking next ARGV-elt as argument. */ - optarg = argv[optind++]; - nextchar = NULL; - } - } - return c; - } -} - -int -getopt (argc, argv, optstring) - int argc; - char *const *argv; - const char *optstring; -{ - return _getopt_internal (argc, argv, optstring, - (const struct option *) 0, - (int *) 0, - 0); -} - -#endif /* _LIBC or not __GNU_LIBRARY__. */ - -#ifdef TEST - -/* Compile with -DTEST to make an executable for use in testing - the above definition of `getopt'. */ - -int -main (argc, argv) - int argc; - char **argv; -{ - int c; - int digit_optind = 0; - - while (1) - { - int this_option_optind = optind ? optind : 1; - - c = getopt (argc, argv, "abc:d:0123456789"); - if (c == EOF) - break; - - switch (c) - { - case '0': - case '1': - case '2': - case '3': - case '4': - case '5': - case '6': - case '7': - case '8': - case '9': - if (digit_optind != 0 && digit_optind != this_option_optind) - printf ("digits occur in two different argv-elements.\n"); - digit_optind = this_option_optind; - printf ("option %c\n", c); - break; - - case 'a': - printf ("option a\n"); - break; - - case 'b': - printf ("option b\n"); - break; - - case 'c': - printf ("option c with value `%s'\n", optarg); - break; - - case '?': - break; - - default: - printf ("?? getopt returned character code 0%o ??\n", c); - } - } - - if (optind < argc) - { - printf ("non-option ARGV-elements: "); - while (optind < argc) - printf ("%s ", argv[optind++]); - printf ("\n"); - } - - exit (0); -} - -#endif /* TEST */ diff --git a/gnu/usr.bin/awk/getopt.h b/gnu/usr.bin/awk/getopt.h deleted file mode 100644 index b0fc4ff..0000000 --- a/gnu/usr.bin/awk/getopt.h +++ /dev/null @@ -1,129 +0,0 @@ -/* Declarations for getopt. - Copyright (C) 1989, 1990, 1991, 1992, 1993 Free Software Foundation, Inc. - - This program is free software; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by the - Free Software Foundation; either version 2, or (at your option) any - later version. 
- - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software - Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ - -#ifndef _GETOPT_H -#define _GETOPT_H 1 - -#ifdef __cplusplus -extern "C" { -#endif - -/* For communication from `getopt' to the caller. - When `getopt' finds an option that takes an argument, - the argument value is returned here. - Also, when `ordering' is RETURN_IN_ORDER, - each non-option ARGV-element is returned here. */ - -extern char *optarg; - -/* Index in ARGV of the next element to be scanned. - This is used for communication to and from the caller - and for communication between successive calls to `getopt'. - - On entry to `getopt', zero means this is the first call; initialize. - - When `getopt' returns EOF, this is the index of the first of the - non-option elements that the caller should itself scan. - - Otherwise, `optind' communicates from one call to the next - how much of ARGV has been scanned so far. */ - -extern int optind; - -/* Callers store zero here to inhibit the error message `getopt' prints - for unrecognized options. */ - -extern int opterr; - -/* Set to an option character which was unrecognized. */ - -extern int optopt; - -/* Describe the long-named options requested by the application. - The LONG_OPTIONS argument to getopt_long or getopt_long_only is a vector - of `struct option' terminated by an element containing a name which is - zero. - - The field `has_arg' is: - no_argument (or 0) if the option does not take an argument, - required_argument (or 1) if the option requires an argument, - optional_argument (or 2) if the option takes an optional argument. - - If the field `flag' is not NULL, it points to a variable that is set - to the value given in the field `val' when the option is found, but - left unchanged if the option is not found. - - To have a long-named option do something other than set an `int' to - a compiled-in constant, such as set a value from `optarg', set the - option's `flag' field to zero and its `val' field to a nonzero - value (the equivalent single-letter option character, if there is - one). For long options that have a zero `flag' field, `getopt' - returns the contents of the `val' field. */ - -struct option -{ -#ifdef __STDC__ - const char *name; -#else - char *name; -#endif - /* has_arg can't be an enum because some compilers complain about - type mismatches in all the code that assumes it is an int. */ - int has_arg; - int *flag; - int val; -}; - -/* Names for the values of the `has_arg' field of `struct option'. */ - -#define no_argument 0 -#define required_argument 1 -#define optional_argument 2 - -#ifdef __STDC__ -#if defined(__GNU_LIBRARY__) -/* Many other libraries have conflicting prototypes for getopt, with - differences in the consts, in stdlib.h. To avoid compilation - errors, only prototype getopt for the GNU C library. 
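
The struct option documentation above describes two styles of entry: flag == NULL, in which case getopt_long() returns val, and flag pointing at an int that is set to val while 0 is returned. A short illustrative table and loop (the option names here are invented):

#include <getopt.h>
#include <stdio.h>

/* Sketch only: the two kinds of `struct option' entries described above.
 * "--verbose" sets a flag variable and getopt_long() returns 0;
 * "--output FILE" has flag == NULL, so getopt_long() returns 'o'. */
static int verbose_flag = 0;

int main(int argc, char **argv)
{
    static const struct option longopts[] = {
        { "verbose", no_argument,       &verbose_flag, 1   },
        { "output",  required_argument, NULL,          'o' },
        { NULL, 0, NULL, 0 }
    };
    int c;

    while ((c = getopt_long(argc, argv, "o:", longopts, NULL)) != -1) {
        if (c == 0)
            continue;                    /* flag-style option already handled */
        if (c == 'o')
            printf("output file: %s\n", optarg);
    }
    printf("verbose = %d\n", verbose_flag);
    return 0;
}
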
*/ -extern int getopt (int argc, char *const *argv, const char *shortopts); -#else /* not __GNU_LIBRARY__ */ -extern int getopt (); -#endif /* not __GNU_LIBRARY__ */ -extern int getopt_long (int argc, char *const *argv, const char *shortopts, - const struct option *longopts, int *longind); -extern int getopt_long_only (int argc, char *const *argv, - const char *shortopts, - const struct option *longopts, int *longind); - -/* Internal only. Users should not call this directly. */ -extern int _getopt_internal (int argc, char *const *argv, - const char *shortopts, - const struct option *longopts, int *longind, - int long_only); -#else /* not __STDC__ */ -extern int getopt (); -extern int getopt_long (); -extern int getopt_long_only (); - -extern int _getopt_internal (); -#endif /* not __STDC__ */ - -#ifdef __cplusplus -} -#endif - -#endif /* _GETOPT_H */ diff --git a/gnu/usr.bin/awk/getopt1.c b/gnu/usr.bin/awk/getopt1.c deleted file mode 100644 index 7739b51..0000000 --- a/gnu/usr.bin/awk/getopt1.c +++ /dev/null @@ -1,187 +0,0 @@ -/* getopt_long and getopt_long_only entry points for GNU getopt. - Copyright (C) 1987, 88, 89, 90, 91, 92, 1993 - Free Software Foundation, Inc. - - This program is free software; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by the - Free Software Foundation; either version 2, or (at your option) any - later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software - Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ - -#ifdef HAVE_CONFIG_H -#if defined (emacs) || defined (CONFIG_BROKETS) -/* We use <config.h> instead of "config.h" so that a compilation - using -I. -I$srcdir will use ./config.h rather than $srcdir/config.h - (which it would do because it found this file in $srcdir). */ -#include <config.h> -#else -#include "config.h" -#endif -#endif - -#include "getopt.h" - -#ifndef __STDC__ -/* This is a separate conditional since some stdc systems - reject `defined (const)'. */ -#ifndef const -#define const -#endif -#endif - -#include <stdio.h> - -/* Comment out all this code if we are using the GNU C Library, and are not - actually compiling the library itself. This code is part of the GNU C - Library, but also included in many other GNU distributions. Compiling - and linking in this code is a waste when using the GNU C library - (especially if it is a shared library). Rather than having every GNU - program understand `configure --with-gnu-libc' and omit the object files, - it is simpler to just do this in the source for each such file. */ - -#if defined (_LIBC) || !defined (__GNU_LIBRARY__) - - -/* This needs to come after some library #include - to get __GNU_LIBRARY__ defined. 
*/ -#if defined(__GNU_LIBRARY__) || defined(OS2) || defined(MSDOS) || defined(atarist) -#include <stdlib.h> -#else -char *getenv (); -#endif - -#ifndef NULL -#define NULL 0 -#endif - -int -getopt_long (argc, argv, options, long_options, opt_index) - int argc; - char *const *argv; - const char *options; - const struct option *long_options; - int *opt_index; -{ - return _getopt_internal (argc, argv, options, long_options, opt_index, 0); -} - -/* Like getopt_long, but '-' as well as '--' can indicate a long option. - If an option that starts with '-' (not '--') doesn't match a long option, - but does match a short option, it is parsed as a short option - instead. */ - -int -getopt_long_only (argc, argv, options, long_options, opt_index) - int argc; - char *const *argv; - const char *options; - const struct option *long_options; - int *opt_index; -{ - return _getopt_internal (argc, argv, options, long_options, opt_index, 1); -} - - -#endif /* _LIBC or not __GNU_LIBRARY__. */ - -#ifdef TEST - -#include <stdio.h> - -int -main (argc, argv) - int argc; - char **argv; -{ - int c; - int digit_optind = 0; - - while (1) - { - int this_option_optind = optind ? optind : 1; - int option_index = 0; - static struct option long_options[] = - { - {"add", 1, 0, 0}, - {"append", 0, 0, 0}, - {"delete", 1, 0, 0}, - {"verbose", 0, 0, 0}, - {"create", 0, 0, 0}, - {"file", 1, 0, 0}, - {0, 0, 0, 0} - }; - - c = getopt_long (argc, argv, "abc:d:0123456789", - long_options, &option_index); - if (c == EOF) - break; - - switch (c) - { - case 0: - printf ("option %s", long_options[option_index].name); - if (optarg) - printf (" with arg %s", optarg); - printf ("\n"); - break; - - case '0': - case '1': - case '2': - case '3': - case '4': - case '5': - case '6': - case '7': - case '8': - case '9': - if (digit_optind != 0 && digit_optind != this_option_optind) - printf ("digits occur in two different argv-elements.\n"); - digit_optind = this_option_optind; - printf ("option %c\n", c); - break; - - case 'a': - printf ("option a\n"); - break; - - case 'b': - printf ("option b\n"); - break; - - case 'c': - printf ("option c with value `%s'\n", optarg); - break; - - case 'd': - printf ("option d with value `%s'\n", optarg); - break; - - case '?': - break; - - default: - printf ("?? getopt returned character code 0%o ??\n", c); - } - } - - if (optind < argc) - { - printf ("non-option ARGV-elements: "); - while (optind < argc) - printf ("%s ", argv[optind++]); - printf ("\n"); - } - - exit (0); -} - -#endif /* TEST */ diff --git a/gnu/usr.bin/awk/io.c b/gnu/usr.bin/awk/io.c deleted file mode 100644 index 7f92556..0000000 --- a/gnu/usr.bin/awk/io.c +++ /dev/null @@ -1,1283 +0,0 @@ -/* - * io.c --- routines for dealing with input and output and records - */ - -/* - * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. - * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#if !defined(VMS) && !defined(VMS_POSIX) && !defined(_MSC_VER) -#include <sys/param.h> -#endif -#include "awk.h" - -#ifndef O_RDONLY -#include <fcntl.h> -#endif - -#if !defined(S_ISDIR) && defined(S_IFDIR) -#define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR) -#endif - -#ifndef ENFILE -#define ENFILE EMFILE -#endif - -#ifndef atarist -#define INVALID_HANDLE (-1) -#else -#define INVALID_HANDLE (__SMALLEST_VALID_HANDLE - 1) -#endif - -#if defined(MSDOS) || defined(OS2) || defined(atarist) -#define PIPES_SIMULATED -#endif - -static IOBUF *nextfile P((int skipping)); -static int inrec P((IOBUF *iop)); -static int iop_close P((IOBUF *iop)); -struct redirect *redirect P((NODE *tree, int *errflg)); -static void close_one P((void)); -static int close_redir P((struct redirect *rp, int exitwarn)); -#ifndef PIPES_SIMULATED -static int wait_any P((int interesting)); -#endif -static IOBUF *gawk_popen P((char *cmd, struct redirect *rp)); -static IOBUF *iop_open P((const char *file, const char *how)); -static int gawk_pclose P((struct redirect *rp)); -static int do_pathopen P((const char *file)); -static int str2mode P((const char *mode)); -static void spec_setup P((IOBUF *iop, int len, int allocate)); -static int specfdopen P((IOBUF *iop, const char *name, const char *mode)); -static int pidopen P((IOBUF *iop, const char *name, const char *mode)); -static int useropen P((IOBUF *iop, const char *name, const char *mode)); - -extern FILE *fdopen(); - -#if defined (MSDOS) -#include "popen.h" -#define popen(c,m) os_popen(c,m) -#define pclose(f) os_pclose(f) -#elif defined (OS2) /* OS/2, but not family mode */ -#if defined (_MSC_VER) -#define popen(c,m) _popen(c,m) -#define pclose(f) _pclose(f) -#endif -#else -extern FILE *popen(); -#endif - -static struct redirect *red_head = NULL; - -extern int output_is_tty; -extern NODE *ARGC_node; -extern NODE *ARGV_node; -extern NODE *ARGIND_node; -extern NODE *ERRNO_node; -extern NODE **fields_arr; - -static jmp_buf filebuf; /* for do_nextfile() */ - -/* do_nextfile --- implement gawk "next file" extension */ - -void -do_nextfile() -{ - (void) nextfile(1); - longjmp(filebuf, 1); -} - -static IOBUF * -nextfile(skipping) -int skipping; -{ - static int i = 1; - static int files = 0; - NODE *arg; - static IOBUF *curfile = NULL; - - if (skipping) { - if (curfile != NULL) - iop_close(curfile); - curfile = NULL; - return NULL; - } - if (curfile != NULL) { - if (curfile->cnt == EOF) { - (void) iop_close(curfile); - curfile = NULL; - } else - return curfile; - } - for (; i < (int) (ARGC_node->lnode->numbr); i++) { - arg = *assoc_lookup(ARGV_node, tmp_number((AWKNUM) i)); - if (arg->stptr[0] == '\0') - continue; - arg->stptr[arg->stlen] = '\0'; - if (! do_unix) { - ARGIND_node->var_value->numbr = i; - ARGIND_node->var_value->flags = NUM|NUMBER; - } - if (!arg_assign(arg->stptr)) { - files++; - curfile = iop_open(arg->stptr, "r"); - if (curfile == NULL) - fatal("cannot open file `%s' for reading (%s)", - arg->stptr, strerror(errno)); - /* NOTREACHED */ - /* This is a kludge. */ - unref(FILENAME_node->var_value); - FILENAME_node->var_value = dupnode(arg); - FNR = 0; - i++; - break; - } - } - if (files == 0) { - files++; - /* no args. 
-- use stdin */ - /* FNR is init'ed to 0 */ - FILENAME_node->var_value = make_string("-", 1); - curfile = iop_alloc(fileno(stdin)); - } - return curfile; -} - -void -set_FNR() -{ - FNR = (long) FNR_node->var_value->numbr; -} - -void -set_NR() -{ - NR = (long) NR_node->var_value->numbr; -} - -/* - * This reads in a record from the input file - */ -static int -inrec(iop) -IOBUF *iop; -{ - char *begin; - register int cnt; - int retval = 0; - - cnt = get_a_record(&begin, iop, *RS, NULL); - if (cnt == EOF) { - cnt = 0; - retval = 1; - } else { - NR += 1; - FNR += 1; - } - set_record(begin, cnt, 1); - - return retval; -} - -static int -iop_close(iop) -IOBUF *iop; -{ - int ret; - - if (iop == NULL) - return 0; - errno = 0; - -#ifdef _CRAY - /* Work around bug in UNICOS popen */ - if (iop->fd < 3) - ret = 0; - else -#endif - /* save these for re-use; don't free the storage */ - if ((iop->flag & IOP_IS_INTERNAL) != 0) { - iop->off = iop->buf; - iop->end = iop->buf + strlen(iop->buf); - iop->cnt = 0; - iop->secsiz = 0; - return 0; - } - - /* Don't close standard files or else crufty code elsewhere will lose */ - if (iop->fd == fileno(stdin) || - iop->fd == fileno(stdout) || - iop->fd == fileno(stderr)) - ret = 0; - else - ret = close(iop->fd); - if (ret == -1) - warning("close of fd %d failed (%s)", iop->fd, strerror(errno)); - if ((iop->flag & IOP_NO_FREE) == 0) { - /* - * be careful -- $0 may still reference the buffer even though - * an explicit close is being done; in the future, maybe we - * can do this a bit better - */ - if (iop->buf) { - if ((fields_arr[0]->stptr >= iop->buf) - && (fields_arr[0]->stptr < iop->end)) { - NODE *t; - - t = make_string(fields_arr[0]->stptr, - fields_arr[0]->stlen); - unref(fields_arr[0]); - fields_arr [0] = t; - reset_record (); - } - free(iop->buf); - } - free((char *)iop); - } - return ret == -1 ? 1 : 0; -} - -void -do_input() -{ - IOBUF *iop; - extern int exiting; - - (void) setjmp(filebuf); - - while ((iop = nextfile(0)) != NULL) { - if (inrec(iop) == 0) - while (interpret(expression_value) && inrec(iop) == 0) - continue; - /* recover any space from C based alloca */ - (void) alloca(0); - - if (exiting) - break; - } -} - -/* Redirection for printf and print commands */ -struct redirect * -redirect(tree, errflg) -NODE *tree; -int *errflg; -{ - register NODE *tmp; - register struct redirect *rp; - register char *str; - int tflag = 0; - int outflag = 0; - const char *direction = "to"; - const char *mode; - int fd; - const char *what = NULL; - - switch (tree->type) { - case Node_redirect_append: - tflag = RED_APPEND; - /* FALL THROUGH */ - case Node_redirect_output: - outflag = (RED_FILE|RED_WRITE); - tflag |= outflag; - if (tree->type == Node_redirect_output) - what = ">"; - else - what = ">>"; - break; - case Node_redirect_pipe: - tflag = (RED_PIPE|RED_WRITE); - what = "|"; - break; - case Node_redirect_pipein: - tflag = (RED_PIPE|RED_READ); - what = "|"; - break; - case Node_redirect_input: - tflag = (RED_FILE|RED_READ); - what = "<"; - break; - default: - fatal ("invalid tree type %d in redirect()", tree->type); - break; - } - tmp = tree_eval(tree->subnode); - if (do_lint && ! 
(tmp->flags & STR)) - warning("expression in `%s' redirection only has numeric value", - what); - tmp = force_string(tmp); - str = tmp->stptr; - if (str == NULL || *str == '\0') - fatal("expression for `%s' redirection has null string value", - what); - if (do_lint - && (STREQN(str, "0", tmp->stlen) || STREQN(str, "1", tmp->stlen))) - warning("filename `%s' for `%s' redirection may be result of logical expression", str, what); - for (rp = red_head; rp != NULL; rp = rp->next) - if (strlen(rp->value) == tmp->stlen - && STREQN(rp->value, str, tmp->stlen) - && ((rp->flag & ~(RED_NOBUF|RED_EOF)) == tflag - || (outflag - && (rp->flag & (RED_FILE|RED_WRITE)) == outflag))) - break; - if (rp == NULL) { - emalloc(rp, struct redirect *, sizeof(struct redirect), - "redirect"); - emalloc(str, char *, tmp->stlen+1, "redirect"); - memcpy(str, tmp->stptr, tmp->stlen); - str[tmp->stlen] = '\0'; - rp->value = str; - rp->flag = tflag; - rp->fp = NULL; - rp->iop = NULL; - rp->pid = 0; /* unlikely that we're worried about init */ - rp->status = 0; - /* maintain list in most-recently-used first order */ - if (red_head) - red_head->prev = rp; - rp->prev = NULL; - rp->next = red_head; - red_head = rp; - } - while (rp->fp == NULL && rp->iop == NULL) { - if (rp->flag & RED_EOF) - /* encountered EOF on file or pipe -- must be cleared - * by explicit close() before reading more - */ - return rp; - mode = NULL; - errno = 0; - switch (tree->type) { - case Node_redirect_output: - mode = "w"; - if (rp->flag & RED_USED) - mode = "a"; - break; - case Node_redirect_append: - mode = "a"; - break; - case Node_redirect_pipe: - if ((rp->fp = popen(str, "w")) == NULL) - fatal("can't open pipe (\"%s\") for output (%s)", - str, strerror(errno)); - rp->flag |= RED_NOBUF; - break; - case Node_redirect_pipein: - direction = "from"; - if (gawk_popen(str, rp) == NULL) - fatal("can't open pipe (\"%s\") for input (%s)", - str, strerror(errno)); - break; - case Node_redirect_input: - direction = "from"; - rp->iop = iop_open(str, "r"); - break; - default: - cant_happen(); - } - if (mode != NULL) { - fd = devopen(str, mode); - if (fd > INVALID_HANDLE) { - if (fd == fileno(stdin)) - rp->fp = stdin; - else if (fd == fileno(stdout)) - rp->fp = stdout; - else if (fd == fileno(stderr)) - rp->fp = stderr; - else { - rp->fp = fdopen(fd, (char *) mode); - /* don't leak file descriptors */ - if (rp->fp == NULL) - close(fd); - } - if (rp->fp != NULL && isatty(fd)) - rp->flag |= RED_NOBUF; - } - } - if (rp->fp == NULL && rp->iop == NULL) { - /* too many files open -- close one and try again */ - if (errno == EMFILE || errno == ENFILE) - close_one(); - else { - /* - * Some other reason for failure. - * - * On redirection of input from a file, - * just return an error, so e.g. getline - * can return -1. For output to file, - * complain. The shell will complain on - * a bad command to a pipe. 
- */ - *errflg = errno; - if (tree->type == Node_redirect_output - || tree->type == Node_redirect_append) - fatal("can't redirect %s `%s' (%s)", - direction, str, strerror(errno)); - else { - free_temp(tmp); - return NULL; - } - } - } - } - free_temp(tmp); - return rp; -} - -static void -close_one() -{ - register struct redirect *rp; - register struct redirect *rplast = NULL; - - /* go to end of list first, to pick up least recently used entry */ - for (rp = red_head; rp != NULL; rp = rp->next) - rplast = rp; - /* now work back up through the list */ - for (rp = rplast; rp != NULL; rp = rp->prev) - if (rp->fp && (rp->flag & RED_FILE)) { - rp->flag |= RED_USED; - errno = 0; - if (fclose(rp->fp)) - warning("close of \"%s\" failed (%s).", - rp->value, strerror(errno)); - rp->fp = NULL; - break; - } - if (rp == NULL) - /* surely this is the only reason ??? */ - fatal("too many pipes or input files open"); -} - -NODE * -do_close(tree) -NODE *tree; -{ - NODE *tmp; - register struct redirect *rp; - - tmp = force_string(tree_eval(tree->subnode)); - for (rp = red_head; rp != NULL; rp = rp->next) { - if (strlen(rp->value) == tmp->stlen - && STREQN(rp->value, tmp->stptr, tmp->stlen)) - break; - } - free_temp(tmp); - if (rp == NULL) /* no match */ - return tmp_number((AWKNUM) 0.0); - fflush(stdout); /* synchronize regular output */ - tmp = tmp_number((AWKNUM)close_redir(rp, 0)); - rp = NULL; - return tmp; -} - -static int -close_redir(rp, exitwarn) -register struct redirect *rp; -int exitwarn; -{ - int status = 0; - char *what; - - if (rp == NULL) - return 0; - if (rp->fp == stdout || rp->fp == stderr) - return 0; - errno = 0; - if ((rp->flag & (RED_PIPE|RED_WRITE)) == (RED_PIPE|RED_WRITE)) - status = pclose(rp->fp); - else if (rp->fp) - status = fclose(rp->fp); - else if (rp->iop) { - if (rp->flag & RED_PIPE) - status = gawk_pclose(rp); - else { - status = iop_close(rp->iop); - rp->iop = NULL; - } - } - - what = (rp->flag & RED_PIPE) ? "pipe" : "file"; - - if (exitwarn) - warning("no explicit close of %s \"%s\" provided", - what, rp->value); - - /* SVR4 awk checks and warns about status of close */ - if (status) { - char *s = strerror(errno); - - warning("failure status (%d) on %s close of \"%s\" (%s)", - status, what, rp->value, s); - - if (! do_unix) { - /* set ERRNO too so that program can get at it */ - unref(ERRNO_node->var_value); - ERRNO_node->var_value = make_string(s, strlen(s)); - } - } - if (rp->next) - rp->next->prev = rp->prev; - if (rp->prev) - rp->prev->next = rp->next; - else - red_head = rp->next; - free(rp->value); - free((char *)rp); - return status; -} - -int -flush_io () -{ - register struct redirect *rp; - int status = 0; - - errno = 0; - if (fflush(stdout)) { - warning("error writing standard output (%s).", strerror(errno)); - status++; - } - if (fflush(stderr)) { - warning("error writing standard error (%s).", strerror(errno)); - status++; - } - for (rp = red_head; rp != NULL; rp = rp->next) - /* flush both files and pipes, what the heck */ - if ((rp->flag & RED_WRITE) && rp->fp != NULL) { - if (fflush(rp->fp)) { - warning("%s flush of \"%s\" failed (%s).", - (rp->flag & RED_PIPE) ? 
"pipe" : - "file", rp->value, strerror(errno)); - status++; - } - } - return status; -} - -int -close_io () -{ - register struct redirect *rp; - register struct redirect *next; - int status = 0; - - errno = 0; - for (rp = red_head; rp != NULL; rp = next) { - next = rp->next; - /* close_redir() will print a message if needed */ - /* if do_lint, warn about lack of explicit close */ - if (close_redir(rp, do_lint)) - status++; - rp = NULL; - } - /* - * Some of the non-Unix os's have problems doing an fclose - * on stdout and stderr. Since we don't really need to close - * them, we just flush them, and do that across the board. - */ - if (fflush(stdout)) { - warning("error writing standard output (%s).", strerror(errno)); - status++; - } - if (fflush(stderr)) { - warning("error writing standard error (%s).", strerror(errno)); - status++; - } - return status; -} - -/* str2mode --- convert a string mode to an integer mode */ - -static int -str2mode(mode) -const char *mode; -{ - int ret; - - switch(mode[0]) { - case 'r': - ret = O_RDONLY; - break; - - case 'w': - ret = O_WRONLY|O_CREAT|O_TRUNC; - break; - - case 'a': - ret = O_WRONLY|O_APPEND|O_CREAT; - break; - - default: - ret = 0; /* lint */ - cant_happen(); - } - return ret; -} - -/* devopen --- handle /dev/std{in,out,err}, /dev/fd/N, regular files */ - -/* - * This separate version is still needed for output, since file and pipe - * output is done with stdio. iop_open() handles input with IOBUFs of - * more "special" files. Those files are not handled here since it makes - * no sense to use them for output. - */ - -int -devopen(name, mode) -const char *name, *mode; -{ - int openfd = INVALID_HANDLE; - const char *cp; - char *ptr; - int flag = 0; - struct stat buf; - extern double strtod(); - - flag = str2mode(mode); - - if (do_unix) - goto strictopen; - -#ifdef VMS - if ((openfd = vms_devopen(name, flag)) >= 0) - return openfd; -#endif /* VMS */ - - if (STREQ(name, "-")) - openfd = fileno(stdin); - else if (STREQN(name, "/dev/", 5) && stat((char *) name, &buf) == -1) { - cp = name + 5; - - if (STREQ(cp, "stdin") && (flag & O_RDONLY) == O_RDONLY) - openfd = fileno(stdin); - else if (STREQ(cp, "stdout") && (flag & O_WRONLY) == O_WRONLY) - openfd = fileno(stdout); - else if (STREQ(cp, "stderr") && (flag & O_WRONLY) == O_WRONLY) - openfd = fileno(stderr); - else if (STREQN(cp, "fd/", 3)) { - cp += 3; - openfd = (int)strtod(cp, &ptr); - if (openfd <= INVALID_HANDLE || ptr == cp) - openfd = INVALID_HANDLE; - } - } - -strictopen: - if (openfd == INVALID_HANDLE) - openfd = open(name, flag, 0666); - if (openfd != INVALID_HANDLE && fstat(openfd, &buf) > 0) - if (S_ISDIR(buf.st_mode)) - fatal("file `%s' is a directory", name); - return openfd; -} - - -/* spec_setup --- setup an IOBUF for a special internal file */ - -static void -spec_setup(iop, len, allocate) -IOBUF *iop; -int len; -int allocate; -{ - char *cp; - - if (allocate) { - emalloc(cp, char *, len+2, "spec_setup"); - iop->buf = cp; - } else { - len = strlen(iop->buf); - iop->buf[len++] = '\n'; /* get_a_record clobbered it */ - iop->buf[len] = '\0'; /* just in case */ - } - iop->off = iop->buf; - iop->cnt = 0; - iop->secsiz = 0; - iop->size = len; - iop->end = iop->buf + len; - iop->fd = -1; - iop->flag = IOP_IS_INTERNAL; -} - -/* specfdopen --- open a fd special file */ - -static int -specfdopen(iop, name, mode) -IOBUF *iop; -const char *name, *mode; -{ - int fd; - IOBUF *tp; - - fd = devopen(name, mode); - if (fd == INVALID_HANDLE) - return INVALID_HANDLE; - tp = iop_alloc(fd); - if (tp == 
NULL) - return INVALID_HANDLE; - *iop = *tp; - iop->flag |= IOP_NO_FREE; - free(tp); - return 0; -} - -/* - * Following mess will improve in 2.16; this is written to avoid - * long lines, avoid splitting #if with backslash, and avoid #elif - * to maximize portability. - */ -#ifndef GETPGRP_NOARG -#if defined(__svr4__) || defined(BSD4_4) || defined(_POSIX_SOURCE) -#define GETPGRP_NOARG -#else -#if defined(i860) || defined(_AIX) || defined(hpux) || defined(VMS) -#define GETPGRP_NOARG -#else -#if defined(OS2) || defined(MSDOS) || defined(AMIGA) || defined(atarist) -#define GETPGRP_NOARG -#endif -#endif -#endif -#endif - -#ifdef GETPGRP_NOARG -#define getpgrp_ARG /* nothing */ -#else -#define getpgrp_ARG getpid() -#endif - -/* pidopen --- "open" /dev/pid, /dev/ppid, and /dev/pgrpid */ - -static int -pidopen(iop, name, mode) -IOBUF *iop; -const char *name, *mode; -{ - char tbuf[BUFSIZ]; - int i; - - if (name[6] == 'g') - sprintf(tbuf, "%d\n", getpgrp( getpgrp_ARG )); - else if (name[6] == 'i') - sprintf(tbuf, "%d\n", getpid()); - else - sprintf(tbuf, "%d\n", getppid()); - i = strlen(tbuf); - spec_setup(iop, i, 1); - strcpy(iop->buf, tbuf); - return 0; -} - -/* useropen --- "open" /dev/user */ - -/* - * /dev/user creates a record as follows: - * $1 = getuid() - * $2 = geteuid() - * $3 = getgid() - * $4 = getegid() - * If multiple groups are supported, the $5 through $NF are the - * supplementary group set. - */ - -static int -useropen(iop, name, mode) -IOBUF *iop; -const char *name, *mode; -{ - char tbuf[BUFSIZ], *cp; - int i; -#if defined(NGROUPS_MAX) && NGROUPS_MAX > 0 -#if defined(atarist) || defined(__svr4__) || defined(__osf__) || defined(__FreeBSD__) - gid_t groupset[NGROUPS_MAX]; -#else - int groupset[NGROUPS_MAX]; -#endif - int ngroups; -#endif - - sprintf(tbuf, "%d %d %d %d", getuid(), geteuid(), getgid(), getegid()); - - cp = tbuf + strlen(tbuf); -#if defined(NGROUPS_MAX) && NGROUPS_MAX > 0 - ngroups = getgroups(NGROUPS_MAX, groupset); - if (ngroups == -1) - fatal("could not find groups: %s", strerror(errno)); - - for (i = 0; i < ngroups; i++) { - *cp++ = ' '; - sprintf(cp, "%d", (int)groupset[i]); - cp += strlen(cp); - } -#endif - *cp++ = '\n'; - *cp++ = '\0'; - - - i = strlen(tbuf); - spec_setup(iop, i, 1); - strcpy(iop->buf, tbuf); - return 0; -} - -/* iop_open --- handle special and regular files for input */ - -static IOBUF * -iop_open(name, mode) -const char *name, *mode; -{ - int openfd = INVALID_HANDLE; - int flag = 0; - struct stat buf; - IOBUF *iop; - static struct internal { - const char *name; - int compare; - int (*fp) P((IOBUF*,const char *,const char *)); - IOBUF iob; - } table[] = { - { "/dev/fd/", 8, specfdopen }, - { "/dev/stdin", 10, specfdopen }, - { "/dev/stdout", 11, specfdopen }, - { "/dev/stderr", 11, specfdopen }, - { "/dev/pid", 8, pidopen }, - { "/dev/ppid", 9, pidopen }, - { "/dev/pgrpid", 11, pidopen }, - { "/dev/user", 9, useropen }, - }; - int devcount = sizeof(table) / sizeof(table[0]); - - flag = str2mode(mode); - - if (do_unix) - goto strictopen; - - if (STREQ(name, "-")) - openfd = fileno(stdin); - else if (STREQN(name, "/dev/", 5) && stat((char *) name, &buf) == -1) { - int i; - - for (i = 0; i < devcount; i++) { - if (STREQN(name, table[i].name, table[i].compare)) { - iop = & table[i].iob; - - if (iop->buf != NULL) { - spec_setup(iop, 0, 0); - return iop; - } else if ((*table[i].fp)(iop, name, mode) == 0) - return iop; - else { - warning("could not open %s, mode `%s'", - name, mode); - return NULL; - } - } - } - } - -strictopen: - if (openfd == 
INVALID_HANDLE) - openfd = open(name, flag, 0666); - if (openfd != INVALID_HANDLE && fstat(openfd, &buf) > 0) - if ((buf.st_mode & S_IFMT) == S_IFDIR) - fatal("file `%s' is a directory", name); - iop = iop_alloc(openfd); - return iop; -} - -#ifndef PIPES_SIMULATED - /* real pipes */ -static int -wait_any(interesting) -int interesting; /* pid of interest, if any */ -{ - SIGTYPE (*hstat)(), (*istat)(), (*qstat)(); - int pid; - int status = 0; - struct redirect *redp; - extern int errno; - - hstat = signal(SIGHUP, SIG_IGN); - istat = signal(SIGINT, SIG_IGN); - qstat = signal(SIGQUIT, SIG_IGN); - for (;;) { -#ifdef NeXT - pid = wait((union wait *)&status); -#else - pid = wait(&status); -#endif /* NeXT */ - if (interesting && pid == interesting) { - break; - } else if (pid != -1) { - for (redp = red_head; redp != NULL; redp = redp->next) - if (pid == redp->pid) { - redp->pid = -1; - redp->status = status; - if (redp->fp) { - pclose(redp->fp); - redp->fp = 0; - } - if (redp->iop) { - (void) iop_close(redp->iop); - redp->iop = 0; - } - break; - } - } - if (pid == -1 && errno == ECHILD) - break; - } - signal(SIGHUP, hstat); - signal(SIGINT, istat); - signal(SIGQUIT, qstat); - return(status); -} - -static IOBUF * -gawk_popen(cmd, rp) -char *cmd; -struct redirect *rp; -{ - int p[2]; - register int pid; - - /* used to wait for any children to synchronize input and output, - * but this could cause gawk to hang when it is started in a pipeline - * and thus has a child process feeding it input (shell dependant) - */ - /*(void) wait_any(0);*/ /* wait for outstanding processes */ - - if (pipe(p) < 0) - fatal("cannot open pipe \"%s\" (%s)", cmd, strerror(errno)); - if ((pid = fork()) == 0) { - if (close(1) == -1) - fatal("close of stdout in child failed (%s)", - strerror(errno)); - if (dup(p[1]) != 1) - fatal("dup of pipe failed (%s)", strerror(errno)); - if (close(p[0]) == -1 || close(p[1]) == -1) - fatal("close of pipe failed (%s)", strerror(errno)); - if (close(0) == -1) - fatal("close of stdin in child failed (%s)", - strerror(errno)); - execl("/bin/sh", "sh", "-c", cmd, 0); - _exit(127); - } - if (pid == -1) - fatal("cannot fork for \"%s\" (%s)", cmd, strerror(errno)); - rp->pid = pid; - if (close(p[1]) == -1) - fatal("close of pipe failed (%s)", strerror(errno)); - return (rp->iop = iop_alloc(p[0])); -} - -static int -gawk_pclose(rp) -struct redirect *rp; -{ - (void) iop_close(rp->iop); - rp->iop = NULL; - - /* process previously found, return stored status */ - if (rp->pid == -1) - return (rp->status >> 8) & 0xFF; - rp->status = wait_any(rp->pid); - rp->pid = -1; - return (rp->status >> 8) & 0xFF; -} - -#else /* PIPES_SIMULATED */ - /* use temporary file rather than pipe */ - /* except if popen() provides real pipes too */ - -#if defined(VMS) || defined(OS2) || defined (MSDOS) -static IOBUF * -gawk_popen(cmd, rp) -char *cmd; -struct redirect *rp; -{ - FILE *current; - - if ((current = popen(cmd, "r")) == NULL) - return NULL; - return (rp->iop = iop_alloc(fileno(current))); -} - -static int -gawk_pclose(rp) -struct redirect *rp; -{ - int rval, aval, fd = rp->iop->fd; - FILE *kludge = fdopen(fd, (char *) "r"); /* pclose needs FILE* w/ right fileno */ - - rp->iop->fd = dup(fd); /* kludge to allow close() + pclose() */ - rval = iop_close(rp->iop); - rp->iop = NULL; - aval = pclose(kludge); - return (rval < 0 ? 
rval : aval); -} -#else /* VMS || OS2 || MSDOS */ - -static -struct { - char *command; - char *name; -} pipes[_NFILE]; - -static IOBUF * -gawk_popen(cmd, rp) -char *cmd; -struct redirect *rp; -{ - extern char *strdup(const char *); - int current; - char *name; - static char cmdbuf[256]; - - /* get a name to use. */ - if ((name = tempnam(".", "pip")) == NULL) - return NULL; - sprintf(cmdbuf,"%s > %s", cmd, name); - system(cmdbuf); - if ((current = open(name,O_RDONLY)) == INVALID_HANDLE) - return NULL; - pipes[current].name = name; - pipes[current].command = strdup(cmd); - rp->iop = iop_alloc(current); - return (rp->iop = iop_alloc(current)); -} - -static int -gawk_pclose(rp) -struct redirect *rp; -{ - int cur = rp->iop->fd; - int rval; - - rval = iop_close(rp->iop); - rp->iop = NULL; - - /* check for an open file */ - if (pipes[cur].name == NULL) - return -1; - unlink(pipes[cur].name); - free(pipes[cur].name); - pipes[cur].name = NULL; - free(pipes[cur].command); - return rval; -} -#endif /* VMS || OS2 || MSDOS */ - -#endif /* PIPES_SIMULATED */ - -NODE * -do_getline(tree) -NODE *tree; -{ - struct redirect *rp = NULL; - IOBUF *iop; - int cnt = EOF; - char *s = NULL; - int errcode; - - while (cnt == EOF) { - if (tree->rnode == NULL) { /* no redirection */ - iop = nextfile(0); - if (iop == NULL) /* end of input */ - return tmp_number((AWKNUM) 0.0); - } else { - int redir_error = 0; - - rp = redirect(tree->rnode, &redir_error); - if (rp == NULL && redir_error) { /* failed redirect */ - if (! do_unix) { - s = strerror(redir_error); - - unref(ERRNO_node->var_value); - ERRNO_node->var_value = - make_string(s, strlen(s)); - } - return tmp_number((AWKNUM) -1.0); - } - iop = rp->iop; - if (iop == NULL) /* end of input */ - return tmp_number((AWKNUM) 0.0); - } - errcode = 0; - cnt = get_a_record(&s, iop, *RS, & errcode); - if (! do_unix && errcode != 0) { - s = strerror(errcode); - - unref(ERRNO_node->var_value); - ERRNO_node->var_value = make_string(s, strlen(s)); - return tmp_number((AWKNUM) -1.0); - } - if (cnt == EOF) { - if (rp) { - /* - * Don't do iop_close() here if we are - * reading from a pipe; otherwise - * gawk_pclose will not be called. - */ - if (!(rp->flag & RED_PIPE)) { - (void) iop_close(iop); - rp->iop = NULL; - } - rp->flag |= RED_EOF; /* sticky EOF */ - return tmp_number((AWKNUM) 0.0); - } else - continue; /* try another file */ - } - if (!rp) { - NR += 1; - FNR += 1; - } - if (tree->lnode == NULL) /* no optional var. */ - set_record(s, cnt, 1); - else { /* assignment to variable */ - Func_ptr after_assign = NULL; - NODE **lhs; - - lhs = get_lhs(tree->lnode, &after_assign); - unref(*lhs); - *lhs = make_string(s, strlen(s)); - (*lhs)->flags |= MAYBE_NUM; - /* we may have to regenerate $0 here! */ - if (after_assign) - (*after_assign)(); - } - } - return tmp_number((AWKNUM) 1.0); -} - -int -pathopen (file) -const char *file; -{ - int fd = do_pathopen(file); - -#ifdef DEFAULT_FILETYPE - if (! 
do_unix && fd <= INVALID_HANDLE) { - char *file_awk; - int save = errno; -#ifdef VMS - int vms_save = vaxc$errno; -#endif - - /* append ".awk" and try again */ - emalloc(file_awk, char *, strlen(file) + - sizeof(DEFAULT_FILETYPE) + 1, "pathopen"); - sprintf(file_awk, "%s%s", file, DEFAULT_FILETYPE); - fd = do_pathopen(file_awk); - free(file_awk); - if (fd <= INVALID_HANDLE) { - errno = save; -#ifdef VMS - vaxc$errno = vms_save; -#endif - } - } -#endif /*DEFAULT_FILETYPE*/ - - return fd; -} - -static int -do_pathopen (file) -const char *file; -{ - static const char *savepath = DEFPATH; /* defined in config.h */ - static int first = 1; - const char *awkpath; - char *cp, trypath[BUFSIZ]; - int fd; - - if (STREQ(file, "-")) - return (0); - - if (do_unix) - return (devopen(file, "r")); - - if (first) { - first = 0; - if ((awkpath = getenv ("AWKPATH")) != NULL && *awkpath) - savepath = awkpath; /* used for restarting */ - } - awkpath = savepath; - - /* some kind of path name, no search */ -#ifdef VMS /* (strchr not equal implies either or both not NULL) */ - if (strchr(file, ':') != strchr(file, ']') - || strchr(file, '>') != strchr(file, '/')) -#else /*!VMS*/ -#if defined(MSDOS) || defined(OS2) - if (strchr(file, '/') != strchr(file, '\\') - || strchr(file, ':') != NULL) -#else - if (strchr(file, '/') != NULL) -#endif /*MSDOS*/ -#endif /*VMS*/ - return (devopen(file, "r")); - -#if defined(MSDOS) || defined(OS2) - _searchenv(file, "AWKPATH", trypath); - if (trypath[0] == '\0') - _searchenv(file, "PATH", trypath); - return (trypath[0] == '\0') ? 0 : devopen(trypath, "r"); -#else - do { - trypath[0] = '\0'; - /* this should take into account limits on size of trypath */ - for (cp = trypath; *awkpath && *awkpath != ENVSEP; ) - *cp++ = *awkpath++; - - if (cp != trypath) { /* nun-null element in path */ - /* add directory punctuation only if needed */ -#ifdef VMS - if (strchr(":]>/", *(cp-1)) == NULL) -#else -#if defined(MSDOS) || defined(OS2) - if (strchr(":\\/", *(cp-1)) == NULL) -#else - if (*(cp-1) != '/') -#endif -#endif - *cp++ = '/'; - /* append filename */ - strcpy (cp, file); - } else - strcpy (trypath, file); - if ((fd = devopen(trypath, "r")) >= 0) - return (fd); - - /* no luck, keep going */ - if(*awkpath == ENVSEP && awkpath[1] != '\0') - awkpath++; /* skip colon */ - } while (*awkpath); - /* - * You might have one of the awk - * paths defined, WITHOUT the current working directory in it. - * Therefore try to open the file in the current directory. - */ - return (devopen(file, "r")); -#endif -} diff --git a/gnu/usr.bin/awk/iop.c b/gnu/usr.bin/awk/iop.c deleted file mode 100644 index 6b6a03b..0000000 --- a/gnu/usr.bin/awk/iop.c +++ /dev/null @@ -1,321 +0,0 @@ -/* - * iop.c - do i/o related things. - */ - -/* - * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. - * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. 
If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#include "awk.h" - -#ifndef atarist -#define INVALID_HANDLE (-1) -#else -#include <stddef.h> -#include <fcntl.h> -#define INVALID_HANDLE (__SMALLEST_VALID_HANDLE - 1) -#endif /* atarist */ - - -#ifdef TEST -int bufsize = 8192; - -void -fatal(s) -char *s; -{ - printf("%s\n", s); - exit(1); -} -#endif - -int -optimal_bufsize(fd) -int fd; -{ - struct stat stb; - -#ifdef VMS - /* - * These values correspond with the RMS multi-block count used by - * vms_open() in vms/vms_misc.c. - */ - if (isatty(fd) > 0) - return BUFSIZ; - else if (fstat(fd, &stb) < 0) - return 8*512; /* conservative in case of DECnet access */ - else - return 32*512; - -#else - /* - * System V doesn't have the file system block size in the - * stat structure. So we have to make some sort of reasonable - * guess. We use stdio's BUFSIZ, since that is what it was - * meant for in the first place. - */ -#ifdef BLKSIZE_MISSING -#define DEFBLKSIZE BUFSIZ -#else -#define DEFBLKSIZE (stb.st_blksize ? stb.st_blksize : BUFSIZ) -#endif - -#ifdef TEST - return bufsize; -#else -#ifndef atarist - if (isatty(fd)) -#else - /* - * On ST redirected stdin does not have a name attached - * (this could be hard to do to) and fstat would fail - */ - if (0 == fd || isatty(fd)) -#endif /*atarist */ - return BUFSIZ; -#ifndef BLKSIZE_MISSING - /* VMS POSIX 1.0: st_blksize is never assigned a value, so zero it */ - stb.st_blksize = 0; -#endif - if (fstat(fd, &stb) == -1) - fatal("can't stat fd %d (%s)", fd, strerror(errno)); - if (lseek(fd, (off_t)0, 0) == -1) - return DEFBLKSIZE; - return ((int) (stb.st_size < DEFBLKSIZE ? stb.st_size : DEFBLKSIZE)); -#endif /*! TEST */ -#endif /*! VMS */ -} - -IOBUF * -iop_alloc(fd) -int fd; -{ - IOBUF *iop; - - if (fd == INVALID_HANDLE) - return NULL; - emalloc(iop, IOBUF *, sizeof(IOBUF), "iop_alloc"); - iop->flag = 0; - if (isatty(fd)) - iop->flag |= IOP_IS_TTY; - iop->size = optimal_bufsize(fd); - iop->secsiz = -2; - errno = 0; - iop->fd = fd; - iop->off = iop->buf = NULL; - iop->cnt = 0; - return iop; -} - -/* - * Get the next record. Uses a "split buffer" where the latter part is - * the normal read buffer and the head part is an "overflow" area that is used - * when a record spans the end of the normal buffer, in which case the first - * part of the record is copied into the overflow area just before the - * normal buffer. Thus, the eventual full record can be returned as a - * contiguous area of memory with a minimum of copying. The overflow area - * is expanded as needed, so that records are unlimited in length. - * We also mark both the end of the buffer and the end of the read() with - * a sentinel character (the current record separator) so that the inside - * loop can run as a single test. 
- */ -int -get_a_record(out, iop, grRS, errcode) -char **out; -IOBUF *iop; -register int grRS; -int *errcode; -{ - register char *bp = iop->off; - char *bufend; - char *start = iop->off; /* beginning of record */ - char rs; - int saw_newline = 0, eat_whitespace = 0; /* used iff grRS==0 */ - - if (iop->cnt == EOF) { /* previous read hit EOF */ - *out = NULL; - return EOF; - } - - if (grRS == 0) { /* special case: grRS == "" */ - rs = '\n'; - } else - rs = (char) grRS; - - /* set up sentinel */ - if (iop->buf) { - bufend = iop->buf + iop->size + iop->secsiz; - *bufend = rs; - } else - bufend = NULL; - - for (;;) { /* break on end of record, read error or EOF */ - - /* Following code is entered on the first call of this routine - * for a new iop, or when we scan to the end of the buffer. - * In the latter case, we copy the current partial record to - * the space preceding the normal read buffer. If necessary, - * we expand this space. This is done so that we can return - * the record as a contiguous area of memory. - */ - if ((iop->flag & IOP_IS_INTERNAL) == 0 && bp >= bufend) { - char *oldbuf = NULL; - char *oldsplit = iop->buf + iop->secsiz; - long len; /* record length so far */ - - len = bp - start; - if (len > iop->secsiz) { - /* expand secondary buffer */ - if (iop->secsiz == -2) - iop->secsiz = 256; - while (len > iop->secsiz) - iop->secsiz *= 2; - oldbuf = iop->buf; - emalloc(iop->buf, char *, - iop->size+iop->secsiz+2, "get_a_record"); - bufend = iop->buf + iop->size + iop->secsiz; - *bufend = rs; - } - if (len > 0) { - char *newsplit = iop->buf + iop->secsiz; - - if (start < oldsplit) { - memcpy(newsplit - len, start, - oldsplit - start); - memcpy(newsplit - (bp - oldsplit), - oldsplit, bp - oldsplit); - } else - memcpy(newsplit - len, start, len); - } - bp = iop->end = iop->off = iop->buf + iop->secsiz; - start = bp - len; - if (oldbuf) { - free(oldbuf); - oldbuf = NULL; - } - } - /* Following code is entered whenever we have no more data to - * scan. In most cases this will read into the beginning of - * the main buffer, but in some cases (terminal, pipe etc.) - * we may be doing smallish reads into more advanced positions. - */ - if (bp >= iop->end) { - if ((iop->flag & IOP_IS_INTERNAL) != 0) { - iop->cnt = EOF; - break; - } - iop->cnt = read(iop->fd, iop->end, bufend - iop->end); - if (iop->cnt == -1) { - if (! 
do_unix && errcode != NULL) { - *errcode = errno; - iop->cnt = EOF; - break; - } else - fatal("error reading input: %s", - strerror(errno)); - } else if (iop->cnt == 0) { - iop->cnt = EOF; - break; - } - iop->end += iop->cnt; - *iop->end = rs; - } - if (grRS == 0) { - extern int default_FS; - - if (default_FS && (bp == start || eat_whitespace)) { - while (bp < iop->end - && (*bp == ' ' || *bp == '\t' || *bp == '\n')) - bp++; - if (bp == iop->end) { - eat_whitespace = 1; - continue; - } else - eat_whitespace = 0; - } - if (saw_newline && *bp == rs) { - bp++; - break; - } - saw_newline = 0; - } - - while (*bp++ != rs) - ; - - if (bp <= iop->end) { - if (grRS == 0) - saw_newline = 1; - else - break; - } else - bp--; - - if ((iop->flag & IOP_IS_INTERNAL) != 0) - iop->cnt = bp - start; - } - if (iop->cnt == EOF - && (((iop->flag & IOP_IS_INTERNAL) != 0) || start == bp)) { - *out = NULL; - return EOF; - } - - iop->off = bp; - bp--; - if (*bp != rs) - bp++; - *bp = '\0'; - if (grRS == 0) { - /* there could be more newlines left, clean 'em out now */ - while (*(iop->off) == rs && iop->off <= iop->end) - (iop->off)++; - - if (*--bp == rs) - *bp = '\0'; - else - bp++; - } - - *out = start; - return bp - start; -} - -#ifdef TEST -main(argc, argv) -int argc; -char *argv[]; -{ - IOBUF *iop; - char *out; - int cnt; - char rs[2]; - - rs[0] = 0; - if (argc > 1) - bufsize = atoi(argv[1]); - if (argc > 2) - rs[0] = *argv[2]; - iop = iop_alloc(0); - while ((cnt = get_a_record(&out, iop, rs[0], NULL)) > 0) { - fwrite(out, 1, cnt, stdout); - fwrite(rs, 1, 1, stdout); - } -} -#endif diff --git a/gnu/usr.bin/awk/main.c b/gnu/usr.bin/awk/main.c deleted file mode 100644 index 2276460..0000000 --- a/gnu/usr.bin/awk/main.c +++ /dev/null @@ -1,823 +0,0 @@ -/* - * main.c -- Expression tree constructors and main program for gawk. - */ - -/* - * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. - * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
- */ - -#ifdef __FreeBSD__ -#include <locale.h> -#endif -#include "getopt.h" -#include "awk.h" -#include "patchlevel.h" - -static void usage P((int exitval)); -static void copyleft P((void)); -static void cmdline_fs P((char *str)); -static void init_args P((int argc0, int argc, char *argv0, char **argv)); -static void init_vars P((void)); -static void pre_assign P((char *v)); -SIGTYPE catchsig P((int sig, int code)); -static void gawk_option P((char *optstr)); -static void nostalgia P((void)); -static void version P((void)); -char *gawk_name P((char *filespec)); - -#ifdef MSDOS -extern int isatty P((int)); -#endif - -extern void resetup P((void)); - -/* These nodes store all the special variables AWK uses */ -NODE *FS_node, *NF_node, *RS_node, *NR_node; -NODE *FILENAME_node, *OFS_node, *ORS_node, *OFMT_node; -NODE *CONVFMT_node; -NODE *ERRNO_node; -NODE *FNR_node, *RLENGTH_node, *RSTART_node, *SUBSEP_node; -NODE *ENVIRON_node, *IGNORECASE_node; -NODE *ARGC_node, *ARGV_node, *ARGIND_node; -NODE *FIELDWIDTHS_node; - -long NF; -long NR; -long FNR; -int IGNORECASE; -char *RS; -char *OFS; -char *ORS; -char *OFMT; -char *CONVFMT; - -/* - * The parse tree and field nodes are stored here. Parse_end is a dummy item - * used to free up unneeded fields without freeing the program being run - */ -int errcount = 0; /* error counter, used by yyerror() */ - -/* The global null string */ -NODE *Nnull_string; - -/* The name the program was invoked under, for error messages */ -const char *myname; - -/* A block of AWK code to be run before running the program */ -NODE *begin_block = 0; - -/* A block of AWK code to be run after the last input file */ -NODE *end_block = 0; - -int exiting = 0; /* Was an "exit" statement executed? */ -int exit_val = 0; /* optional exit value */ - -#if defined(YYDEBUG) || defined(DEBUG) -extern int yydebug; -#endif - -struct src *srcfiles = NULL; /* source file name(s) */ -int numfiles = -1; /* how many source files */ - -int do_unix = 0; /* turn off gnu extensions */ -int do_posix = 0; /* turn off gnu and unix extensions */ -int do_lint = 0; /* provide warnings about questionable stuff */ -int do_nostalgia = 0; /* provide a blast from the past */ - -int in_begin_rule = 0; /* we're in a BEGIN rule */ -int in_end_rule = 0; /* we're in a END rule */ - -int output_is_tty = 0; /* control flushing of output */ - -extern char *version_string; /* current version, for printing */ - -NODE *expression_value; - -static struct option optab[] = { - { "compat", no_argument, & do_unix, 1 }, - { "lint", no_argument, & do_lint, 1 }, - { "posix", no_argument, & do_posix, 1 }, - { "nostalgia", no_argument, & do_nostalgia, 1 }, - { "copyleft", no_argument, NULL, 'C' }, - { "copyright", no_argument, NULL, 'C' }, - { "field-separator", required_argument, NULL, 'F' }, - { "file", required_argument, NULL, 'f' }, - { "assign", required_argument, NULL, 'v' }, - { "version", no_argument, NULL, 'V' }, - { "usage", no_argument, NULL, 'u' }, - { "help", no_argument, NULL, 'u' }, - { "source", required_argument, NULL, 's' }, -#ifdef DEBUG - { "parsedebug", no_argument, NULL, 'D' }, -#endif - { 0, 0, 0, 0 } -}; - -int -main(argc, argv) -int argc; -char **argv; -{ - int c; - char *scan; - /* the + on the front tells GNU getopt not to rearrange argv */ - const char *optlist = "+F:f:v:W:m:"; - int stopped_early = 0; - int old_optind; - extern int optind; - extern int opterr; - extern char *optarg; - -#ifdef __FreeBSD__ - (void) setlocale(LC_ALL, ""); -#endif -#ifdef __EMX__ - _response(&argc, &argv); - 
_wildcard(&argc, &argv); - setvbuf(stdout, NULL, _IOLBF, BUFSIZ); -#endif - - (void) signal(SIGFPE, (SIGTYPE (*) P((int))) catchsig); - (void) signal(SIGSEGV, (SIGTYPE (*) P((int))) catchsig); -#ifdef SIGBUS - (void) signal(SIGBUS, (SIGTYPE (*) P((int))) catchsig); -#endif - - myname = gawk_name(argv[0]); - argv[0] = (char *)myname; -#ifdef VMS - vms_arg_fixup(&argc, &argv); /* emulate redirection, expand wildcards */ -#endif - - /* remove sccs gunk */ - if (strncmp(version_string, "@(#)", 4) == 0) - version_string += 4; - - if (argc < 2) - usage(1); - - /* initialize the null string */ - Nnull_string = make_string("", 0); - Nnull_string->numbr = 0.0; - Nnull_string->type = Node_val; - Nnull_string->flags = (PERM|STR|STRING|NUM|NUMBER); - - /* Set up the special variables */ - /* - * Note that this must be done BEFORE arg parsing else -F - * breaks horribly - */ - init_vars(); - - /* worst case */ - emalloc(srcfiles, struct src *, argc * sizeof(struct src), "main"); - memset(srcfiles, '\0', argc * sizeof(struct src)); - - /* Tell the regex routines how they should work. . . */ - resetup(); - -#ifdef fpsetmask - fpsetmask(~0xff); -#endif - /* we do error messages ourselves on invalid options */ - opterr = 0; - - /* option processing. ready, set, go! */ - for (optopt = 0, old_optind = 1; - (c = getopt_long(argc, argv, optlist, optab, NULL)) != EOF; - optopt = 0, old_optind = optind) { - if (do_posix) - opterr = 1; - switch (c) { - case 'F': - cmdline_fs(optarg); - break; - - case 'f': - /* - * a la MKS awk, allow multiple -f options. - * this makes function libraries real easy. - * most of the magic is in the scanner. - */ - /* The following is to allow for whitespace at the end - * of a #! /bin/gawk line in an executable file - */ - scan = optarg; - while (isspace(*scan)) - scan++; - ++numfiles; - srcfiles[numfiles].stype = SOURCEFILE; - if (*scan == '\0') - srcfiles[numfiles].val = argv[optind++]; - else - srcfiles[numfiles].val = optarg; - break; - - case 'v': - pre_assign(optarg); - break; - - case 'm': - /* - * Research awk extension. - * -mf=nnn set # fields, gawk ignores - * -mr=nnn set record length, ditto - */ - if (do_lint) - warning("-m[fr] option irrelevant"); - if ((optarg[0] != 'r' && optarg[0] != 'f') - || optarg[1] != '=') - warning("-m option usage: -m[fn]=nnn"); - break; - - case 'W': /* gawk specific options */ - gawk_option(optarg); - break; - - /* These can only come from long form options */ - case 'V': - version(); - break; - - case 'C': - copyleft(); - break; - - case 'u': - usage(0); - break; - - case 's': - if (optarg[0] == '\0') - warning("empty argument to --source ignored"); - else { - srcfiles[++numfiles].stype = CMDLINE; - srcfiles[numfiles].val = optarg; - } - break; - -#ifdef DEBUG - case 'D': - yydebug = 2; - break; -#endif - - case 0: - /* - * getopt_long found an option that sets a variable - * instead of returning a letter. Do nothing, just - * cycle around for the next one. - */ - break; - - case '?': - default: - /* - * New behavior. If not posix, an unrecognized - * option stops argument processing so that it can - * go into ARGV for the awk program to see. This - * makes use of ``#! /bin/gawk -f'' easier. - * - * However, it's never simple. If optopt is set, - * an option that requires an argument didn't get the - * argument. We care because if opterr is 0, then - * getopt_long won't print the error message for us. - */ - if (! do_posix - && (optopt == 0 || strchr(optlist, optopt) == NULL)) { - /* - * can't just do optind--. 
In case of an - * option with >=2 letters, getopt_long - * won't have incremented optind. - */ - optind = old_optind; - stopped_early = 1; - goto out; - } else if (optopt) - /* Use 1003.2 required message format */ - fprintf (stderr, - "%s: option requires an argument -- %c\n", - myname, optopt); - /* else - let getopt print error message for us */ - break; - } - } -out: - - if (do_nostalgia) - nostalgia(); - - /* check for POSIXLY_CORRECT environment variable */ - if (! do_posix && getenv("POSIXLY_CORRECT") != NULL) { - do_posix = 1; - if (do_lint) - warning( - "environment variable `POSIXLY_CORRECT' set: turning on --posix"); - } - - /* POSIX compliance also implies no Unix extensions either */ - if (do_posix) - do_unix = 1; - -#ifdef DEBUG - setbuf(stdout, (char *) NULL); /* make debugging easier */ -#endif - if (isatty(fileno(stdout))) - output_is_tty = 1; - /* No -f or --source options, use next arg */ - if (numfiles == -1) { - if (optind > argc - 1 || stopped_early) /* no args left or no program */ - usage(1); - srcfiles[++numfiles].stype = CMDLINE; - srcfiles[numfiles].val = argv[optind]; - optind++; - } - init_args(optind, argc, (char *) myname, argv); - (void) tokexpand(); - - /* Read in the program */ - if (yyparse() || errcount) - exit(1); - - /* Set up the field variables */ - init_fields(); - - if (do_lint && begin_block == NULL && expression_value == NULL - && end_block == NULL) - warning("no program"); - - if (begin_block) { - in_begin_rule = 1; - (void) interpret(begin_block); - } - in_begin_rule = 0; - if (!exiting && (expression_value || end_block)) - do_input(); - if (end_block) { - in_end_rule = 1; - (void) interpret(end_block); - } - in_end_rule = 0; - if (close_io() != 0 && exit_val == 0) - exit_val = 1; - exit(exit_val); /* more portable */ - return exit_val; /* to suppress warnings */ -} - -/* usage --- print usage information and exit */ - -static void -usage(exitval) -int exitval; -{ - const char *opt1 = " -f progfile [--]"; -#if defined(MSDOS) || defined(OS2) || defined(VMS) - const char *opt2 = " [--] \"program\""; -#else - const char *opt2 = " [--] 'program'"; -#endif - const char *regops = " [POSIX or GNU style options]"; - - fprintf(stderr, "Usage:\t%s%s%s file ...\n\t%s%s%s file ...\n", - myname, regops, opt1, myname, regops, opt2); - - /* GNU long options info. Gack. 
*/ - fputs("POSIX options:\t\tGNU long options:\n", stderr); - fputs("\t-f progfile\t\t--file=progfile\n", stderr); - fputs("\t-F fs\t\t\t--field-separator=fs\n", stderr); - fputs("\t-v var=val\t\t--assign=var=val\n", stderr); - fputs("\t-m[fr]=val\n", stderr); - fputs("\t-W compat\t\t--compat\n", stderr); - fputs("\t-W copyleft\t\t--copyleft\n", stderr); - fputs("\t-W copyright\t\t--copyright\n", stderr); - fputs("\t-W help\t\t\t--help\n", stderr); - fputs("\t-W lint\t\t\t--lint\n", stderr); -#ifdef NOSTALGIA - fputs("\t-W nostalgia\t\t--nostalgia\n", stderr); -#endif -#ifdef DEBUG - fputs("\t-W parsedebug\t\t--parsedebug\n", stderr); -#endif - fputs("\t-W posix\t\t--posix\n", stderr); - fputs("\t-W source=program-text\t--source=program-text\n", stderr); - fputs("\t-W usage\t\t--usage\n", stderr); - fputs("\t-W version\t\t--version\n", stderr); - exit(exitval); -} - -static void -copyleft () -{ - static char blurb_part1[] = -"Copyright (C) 1989, 1991, 1992, Free Software Foundation.\n\ -\n\ -This program is free software; you can redistribute it and/or modify\n\ -it under the terms of the GNU General Public License as published by\n\ -the Free Software Foundation; either version 2 of the License, or\n\ -(at your option) any later version.\n\ -\n"; - static char blurb_part2[] = -"This program is distributed in the hope that it will be useful,\n\ -but WITHOUT ANY WARRANTY; without even the implied warranty of\n\ -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n\ -GNU General Public License for more details.\n\ -\n"; - static char blurb_part3[] = -"You should have received a copy of the GNU General Public License\n\ -along with this program; if not, write to the Free Software\n\ -Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.\n"; - - fputs(blurb_part1, stderr); - fputs(blurb_part2, stderr); - fputs(blurb_part3, stderr); - fflush(stderr); -} - -static void -cmdline_fs(str) -char *str; -{ - register NODE **tmp; - /* int len = strlen(str); *//* don't do that - we want to - avoid mismatched types */ - - tmp = get_lhs(FS_node, (Func_ptr *) 0); - unref(*tmp); - /* - * Only if in full compatibility mode check for the stupid special - * case so -F\t works as documented in awk even though the shell - * hands us -Ft. Bleah! - * - * Thankfully, Posix didn't propogate this "feature". - */ - if (str[0] == 't' && str[1] == '\0') { - if (do_lint) - warning("-Ft does not set FS to tab in POSIX awk"); - if (do_unix && ! do_posix) - str[0] = '\t'; - } - *tmp = make_str_node(str, strlen(str), SCAN); /* do process escapes */ - set_FS(); -} - -static void -init_args(argc0, argc, argv0, argv) -int argc0, argc; -char *argv0; -char **argv; -{ - int i, j; - NODE **aptr; - - ARGV_node = install("ARGV", node(Nnull_string, Node_var, (NODE *)NULL)); - aptr = assoc_lookup(ARGV_node, tmp_number(0.0)); - *aptr = make_string(argv0, strlen(argv0)); - (*aptr)->flags |= MAYBE_NUM; - for (i = argc0, j = 1; i < argc; i++) { - aptr = assoc_lookup(ARGV_node, tmp_number((AWKNUM) j)); - *aptr = make_string(argv[i], strlen(argv[i])); - (*aptr)->flags |= MAYBE_NUM; - j++; - } - ARGC_node = install("ARGC", - node(make_number((AWKNUM) j), Node_var, (NODE *) NULL)); -} - -/* - * Set all the special variables to their initial values. 
- */ -struct varinit { - NODE **spec; - const char *name; - NODETYPE type; - const char *strval; - AWKNUM numval; - Func_ptr assign; -}; -static struct varinit varinit[] = { -{&NF_node, "NF", Node_NF, 0, -1, set_NF }, -{&FIELDWIDTHS_node, "FIELDWIDTHS", Node_FIELDWIDTHS, "", 0, 0 }, -{&NR_node, "NR", Node_NR, 0, 0, set_NR }, -{&FNR_node, "FNR", Node_FNR, 0, 0, set_FNR }, -{&FS_node, "FS", Node_FS, " ", 0, 0 }, -{&RS_node, "RS", Node_RS, "\n", 0, set_RS }, -{&IGNORECASE_node, "IGNORECASE", Node_IGNORECASE, 0, 0, set_IGNORECASE }, -{&FILENAME_node, "FILENAME", Node_var, "", 0, 0 }, -{&OFS_node, "OFS", Node_OFS, " ", 0, set_OFS }, -{&ORS_node, "ORS", Node_ORS, "\n", 0, set_ORS }, -{&OFMT_node, "OFMT", Node_OFMT, "%.6g", 0, set_OFMT }, -{&CONVFMT_node, "CONVFMT", Node_CONVFMT, "%.6g", 0, set_CONVFMT }, -{&RLENGTH_node, "RLENGTH", Node_var, 0, 0, 0 }, -{&RSTART_node, "RSTART", Node_var, 0, 0, 0 }, -{&SUBSEP_node, "SUBSEP", Node_var, "\034", 0, 0 }, -{&ARGIND_node, "ARGIND", Node_var, 0, 0, 0 }, -{&ERRNO_node, "ERRNO", Node_var, 0, 0, 0 }, -{0, 0, Node_illegal, 0, 0, 0 }, -}; - -static void -init_vars() -{ - register struct varinit *vp; - - for (vp = varinit; vp->name; vp++) { - *(vp->spec) = install((char *) vp->name, - node(vp->strval == 0 ? make_number(vp->numval) - : make_string((char *) vp->strval, - strlen(vp->strval)), - vp->type, (NODE *) NULL)); - if (vp->assign) - (*(vp->assign))(); - } -} - -void -load_environ() -{ -#if !defined(MSDOS) && !defined(OS2) && !(defined(VMS) && defined(__DECC)) - extern char **environ; -#endif - register char *var, *val; - NODE **aptr; - register int i; - - ENVIRON_node = install("ENVIRON", - node(Nnull_string, Node_var, (NODE *) NULL)); - for (i = 0; environ[i]; i++) { - static char nullstr[] = ""; - - var = environ[i]; - val = strchr(var, '='); - if (val) - *val++ = '\0'; - else - val = nullstr; - aptr = assoc_lookup(ENVIRON_node, tmp_string(var, strlen (var))); - *aptr = make_string(val, strlen (val)); - (*aptr)->flags |= MAYBE_NUM; - - /* restore '=' so that system() gets a valid environment */ - if (val != nullstr) - *--val = '='; - } -} - -/* Process a command-line assignment */ -char * -arg_assign(arg) -char *arg; -{ - char *cp, *cp2; - int badvar; - Func_ptr after_assign = NULL; - NODE *var; - NODE *it; - NODE **lhs; - - cp = strchr(arg, '='); - if (cp != NULL) { - *cp++ = '\0'; - /* first check that the variable name has valid syntax */ - badvar = 0; - if (! isalpha(arg[0]) && arg[0] != '_') - badvar = 1; - else - for (cp2 = arg+1; *cp2; cp2++) - if (! isalnum(*cp2) && *cp2 != '_') { - badvar = 1; - break; - } - if (badvar) - fatal("illegal name `%s' in variable assignment", arg); - - /* - * Recent versions of nawk expand escapes inside assignments. - * This makes sense, so we do it too. 
- */ - it = make_str_node(cp, strlen(cp), SCAN); - it->flags |= MAYBE_NUM; - var = variable(arg, 0); - lhs = get_lhs(var, &after_assign); - unref(*lhs); - *lhs = it; - if (after_assign) - (*after_assign)(); - *--cp = '='; /* restore original text of ARGV */ - } - return cp; -} - -static void -pre_assign(v) -char *v; -{ - if (!arg_assign(v)) { - fprintf (stderr, - "%s: '%s' argument to -v not in 'var=value' form\n", - myname, v); - usage(1); - } -} - -SIGTYPE -catchsig(sig, code) -int sig, code; -{ -#ifdef lint - code = 0; sig = code; code = sig; -#endif - if (sig == SIGFPE) { - fatal("floating point exception"); - } else if (sig == SIGSEGV -#ifdef SIGBUS - || sig == SIGBUS -#endif - ) { - msg("fatal error: internal error"); - /* fatal won't abort() if not compiled for debugging */ - abort(); - } else - cant_happen(); - /* NOTREACHED */ -} - -/* gawk_option --- do gawk specific things */ - -static void -gawk_option(optstr) -char *optstr; -{ - char *cp; - - for (cp = optstr; *cp; cp++) { - switch (*cp) { - case ' ': - case '\t': - case ',': - break; - case 'v': - case 'V': - /* print version */ - if (strncasecmp(cp, "version", 7) != 0) - goto unknown; - else - cp += 6; - version(); - break; - case 'c': - case 'C': - if (strncasecmp(cp, "copyright", 9) == 0) { - cp += 8; - copyleft(); - } else if (strncasecmp(cp, "copyleft", 8) == 0) { - cp += 7; - copyleft(); - } else if (strncasecmp(cp, "compat", 6) == 0) { - cp += 5; - do_unix = 1; - } else - goto unknown; - break; - case 'n': - case 'N': - /* - * Undocumented feature, - * inspired by nostalgia, and a T-shirt - */ - if (strncasecmp(cp, "nostalgia", 9) != 0) - goto unknown; - nostalgia(); - break; - case 'p': - case 'P': -#ifdef DEBUG - if (strncasecmp(cp, "parsedebug", 10) == 0) { - cp += 9; - yydebug = 2; - break; - } -#endif - if (strncasecmp(cp, "posix", 5) != 0) - goto unknown; - cp += 4; - do_posix = do_unix = 1; - break; - case 'l': - case 'L': - if (strncasecmp(cp, "lint", 4) != 0) - goto unknown; - cp += 3; - do_lint = 1; - break; - case 'H': - case 'h': - if (strncasecmp(cp, "help", 4) != 0) - goto unknown; - cp += 3; - usage(0); - break; - case 'U': - case 'u': - if (strncasecmp(cp, "usage", 5) != 0) - goto unknown; - cp += 4; - usage(0); - break; - case 's': - case 'S': - if (strncasecmp(cp, "source=", 7) != 0) - goto unknown; - cp += 7; - if (cp[0] == '\0') - warning("empty argument to -Wsource ignored"); - else { - srcfiles[++numfiles].stype = CMDLINE; - srcfiles[numfiles].val = cp; - return; - } - break; - default: - unknown: - fprintf(stderr, "'%c' -- unknown option, ignored\n", - *cp); - break; - } - } -} - -/* nostalgia --- print the famous error message and die */ - -static void -nostalgia() -{ - fprintf(stderr, "awk: bailing out near line 1\n"); - abort(); -} - -/* version --- print version message */ - -static void -version() -{ - fprintf(stderr, "%s, patchlevel %d\n", version_string, PATCHLEVEL); - /* per GNU coding standards, exit successfully, do nothing else */ - exit(0); -} - -/* this mess will improve in 2.16 */ -char * -gawk_name(filespec) -char *filespec; -{ - char *p; - -#ifdef VMS /* "device:[root.][directory.subdir]GAWK.EXE;n" -> "GAWK" */ - char *q; - - p = strrchr(filespec, ']'); /* directory punctuation */ - q = strrchr(filespec, '>'); /* alternate <international> punct */ - - if (p == NULL || q > p) p = q; - p = strdup(p == NULL ? 
filespec : (p + 1)); - if ((q = strrchr(p, '.')) != NULL) *q = '\0'; /* strip .typ;vers */ - - return p; -#endif /*VMS*/ - -#if defined(MSDOS) || defined(OS2) || defined(atarist) - char *q; - - for (p = filespec; (p = strchr(p, '\\')); *p = '/') - ; - p = filespec; - if ((q = strrchr(p, '/'))) - p = q + 1; - if ((q = strchr(p, '.'))) - *q = '\0'; - strlwr(p); - - return (p == NULL ? filespec : p); -#endif /* MSDOS || atarist */ - - /* "path/name" -> "name" */ - p = strrchr(filespec, '/'); - return (p == NULL ? filespec : p + 1); -} diff --git a/gnu/usr.bin/awk/msg.c b/gnu/usr.bin/awk/msg.c deleted file mode 100644 index 4244fd3..0000000 --- a/gnu/usr.bin/awk/msg.c +++ /dev/null @@ -1,107 +0,0 @@ -/* - * msg.c - routines for error messages - */ - -/* - * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. - * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2, or (at your option) - * any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#include "awk.h" - -int sourceline = 0; -char *source = NULL; - -/* VARARGS2 */ -void -err(s, emsg, argp) -const char *s; -const char *emsg; -va_list argp; -{ - char *file; - - (void) fflush(stdout); - (void) fprintf(stderr, "%s: ", myname); - if (sourceline) { - if (source) - (void) fprintf(stderr, "%s:", source); - else - (void) fprintf(stderr, "cmd. line:"); - - (void) fprintf(stderr, "%d: ", sourceline); - } - if (FNR) { - file = FILENAME_node->var_value->stptr; - (void) putc('(', stderr); - if (file) - (void) fprintf(stderr, "FILENAME=%s ", file); - (void) fprintf(stderr, "FNR=%ld) ", FNR); - } - (void) fprintf(stderr, s); - vfprintf(stderr, emsg, argp); - (void) fprintf(stderr, "\n"); - (void) fflush(stderr); -} - -/*VARARGS0*/ -void -msg(va_alist) -va_dcl -{ - va_list args; - char *mesg; - - va_start(args); - mesg = va_arg(args, char *); - err("", mesg, args); - va_end(args); -} - -/*VARARGS0*/ -void -warning(va_alist) -va_dcl -{ - va_list args; - char *mesg; - - va_start(args); - mesg = va_arg(args, char *); - err("warning: ", mesg, args); - va_end(args); -} - -/*VARARGS0*/ -void -fatal(va_alist) -va_dcl -{ - va_list args; - char *mesg; - - va_start(args); - mesg = va_arg(args, char *); - err("fatal: ", mesg, args); - va_end(args); -#ifdef DEBUG - abort(); -#endif - exit(2); -} diff --git a/gnu/usr.bin/awk/node.c b/gnu/usr.bin/awk/node.c deleted file mode 100644 index 748028b..0000000 --- a/gnu/usr.bin/awk/node.c +++ /dev/null @@ -1,463 +0,0 @@ -/* - * node.c -- routines for node management - */ - -/* - * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. 
- * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#include "awk.h" - -extern double strtod(); - -AWKNUM -r_force_number(n) -register NODE *n; -{ - register char *cp; - register char *cpend; - char save; - char *ptr; - unsigned int newflags; - -#ifdef DEBUG - if (n == NULL) - cant_happen(); - if (n->type != Node_val) - cant_happen(); - if(n->flags == 0) - cant_happen(); - if (n->flags & NUM) - return n->numbr; -#endif - - /* all the conditionals are an attempt to avoid the expensive strtod */ - - n->numbr = 0.0; - n->flags |= NUM; - - if (n->stlen == 0) - return 0.0; - - cp = n->stptr; - if (isalpha(*cp)) - return 0.0; - - cpend = cp + n->stlen; - while (cp < cpend && isspace(*cp)) - cp++; - if (cp == cpend || isalpha(*cp)) - return 0.0; - - if (n->flags & MAYBE_NUM) { - newflags = NUMBER; - n->flags &= ~MAYBE_NUM; - } else - newflags = 0; - if (cpend - cp == 1) { - if (isdigit(*cp)) { - n->numbr = (AWKNUM)(*cp - '0'); - n->flags |= newflags; - } - return n->numbr; - } - - errno = 0; - save = *cpend; - *cpend = '\0'; - n->numbr = (AWKNUM) strtod((const char *)cp, &ptr); - - /* POSIX says trailing space is OK for NUMBER */ - while (isspace(*ptr)) - ptr++; - *cpend = save; - /* the >= should be ==, but for SunOS 3.5 strtod() */ - if (errno == 0 && ptr >= cpend) - n->flags |= newflags; - else - errno = 0; - - return n->numbr; -} - -/* - * the following lookup table is used as an optimization in force_string - * (more complicated) variations on this theme didn't seem to pay off, but - * systematic testing might be in order at some point - */ -static const char *values[] = { - "0", - "1", - "2", - "3", - "4", - "5", - "6", - "7", - "8", - "9", -}; -#define NVAL (sizeof(values)/sizeof(values[0])) - -NODE * -r_force_string(s) -register NODE *s; -{ - char buf[128]; - register char *sp = buf; - double val; - -#ifdef DEBUG - if (s == NULL) cant_happen(); - if (s->type != Node_val) cant_happen(); - if (s->flags & STR) return s; - if (!(s->flags & NUM)) cant_happen(); - if (s->stref != 0) ; /*cant_happen();*/ -#endif - - /* not an integral value, or out of range */ - if ((val = double_to_int(s->numbr)) != s->numbr - || val < LONG_MIN || val > LONG_MAX) { -#ifdef GFMT_WORKAROUND - NODE *dummy, *r; - unsigned short oflags; - extern NODE *format_tree P((const char *, int, NODE *)); - extern NODE **fmt_list; /* declared in eval.c */ - - /* create dummy node for a sole use of format_tree */ - getnode(dummy); - dummy->lnode = s; - dummy->rnode = NULL; - oflags = s->flags; - s->flags |= PERM; /* prevent from freeing by format_tree() */ - r = format_tree(CONVFMT, fmt_list[CONVFMTidx]->stlen, dummy); - s->flags = oflags; - s->stfmt = (char)CONVFMTidx; - s->stlen = r->stlen; - s->stptr = r->stptr; - freenode(r); /* Do not free_temp(r)! We want */ - freenode(dummy); /* to keep s->stptr == r->stpr. 
*/ - - goto no_malloc; -#else - /* - * no need for a "replacement" formatting by gawk, - * just use sprintf - */ - sprintf(sp, CONVFMT, s->numbr); - s->stlen = strlen(sp); - s->stfmt = (char)CONVFMTidx; -#endif /* GFMT_WORKAROUND */ - } else { - /* integral value */ - /* force conversion to long only once */ - register long num = (long) val; - if (num < NVAL && num >= 0) { - sp = (char *) values[num]; - s->stlen = 1; - } else { - (void) sprintf(sp, "%ld", num); - s->stlen = strlen(sp); - } - s->stfmt = -1; - } - emalloc(s->stptr, char *, s->stlen + 2, "force_string"); - memcpy(s->stptr, sp, s->stlen+1); -no_malloc: - s->stref = 1; - s->flags |= STR; - return s; -} - -/* - * Duplicate a node. (For strings, "duplicate" means crank up the - * reference count.) - */ -NODE * -dupnode(n) -NODE *n; -{ - register NODE *r; - - if (n->flags & TEMP) { - n->flags &= ~TEMP; - n->flags |= MALLOC; - return n; - } - if ((n->flags & (MALLOC|STR)) == (MALLOC|STR)) { - if (n->stref < 255) - n->stref++; - return n; - } - getnode(r); - *r = *n; - r->flags &= ~(PERM|TEMP); - r->flags |= MALLOC; - if (n->type == Node_val && (n->flags & STR)) { - r->stref = 1; - emalloc(r->stptr, char *, r->stlen + 2, "dupnode"); - memcpy(r->stptr, n->stptr, r->stlen); - r->stptr[r->stlen] = '\0'; - } - return r; -} - -/* this allocates a node with defined numbr */ -NODE * -mk_number(x, flags) -AWKNUM x; -unsigned int flags; -{ - register NODE *r; - - getnode(r); - r->type = Node_val; - r->numbr = x; - r->flags = flags; -#ifdef DEBUG - r->stref = 1; - r->stptr = 0; - r->stlen = 0; -#endif - return r; -} - -/* - * Make a string node. - */ -NODE * -make_str_node(s, len, flags) -char *s; -size_t len; -int flags; -{ - register NODE *r; - - getnode(r); - r->type = Node_val; - r->flags = (STRING|STR|MALLOC); - if (flags & ALREADY_MALLOCED) - r->stptr = s; - else { - emalloc(r->stptr, char *, len + 2, s); - memcpy(r->stptr, s, len); - } - r->stptr[len] = '\0'; - - if (flags & SCAN) { /* scan for escape sequences */ - char *pf; - register char *ptm; - register int c; - register char *end; - - end = &(r->stptr[len]); - for (pf = ptm = r->stptr; pf < end;) { - c = *pf++; - if (c == '\\') { - c = parse_escape(&pf); - if (c < 0) { - if (do_lint) - warning("backslash at end of string"); - c = '\\'; - } - *ptm++ = c; - } else - *ptm++ = c; - } - len = ptm - r->stptr; - erealloc(r->stptr, char *, len + 1, "make_str_node"); - r->stptr[len] = '\0'; - r->flags |= PERM; - } - r->stlen = len; - r->stref = 1; - r->stfmt = -1; - - return r; -} - -NODE * -tmp_string(s, len) -char *s; -size_t len; -{ - register NODE *r; - - r = make_string(s, len); - r->flags |= TEMP; - return r; -} - - -#define NODECHUNK 100 - -NODE *nextfree = NULL; - -NODE * -more_nodes() -{ - register NODE *np; - - /* get more nodes and initialize list */ - emalloc(nextfree, NODE *, NODECHUNK * sizeof(NODE), "newnode"); - for (np = nextfree; np < &nextfree[NODECHUNK - 1]; np++) { - np->flags = 0; - np->nextp = np + 1; - } - np->nextp = NULL; - np = nextfree; - nextfree = nextfree->nextp; - return np; -} - -#ifdef DEBUG -void -freenode(it) -NODE *it; -{ -#ifdef MPROF - it->stref = 0; - free((char *) it); -#else /* not MPROF */ - /* add it to head of freelist */ - it->nextp = nextfree; - nextfree = it; -#endif /* not MPROF */ -} -#endif /* DEBUG */ - -void -unref(tmp) -register NODE *tmp; -{ - if (tmp == NULL) - return; - if (tmp->flags & PERM) - return; - if (tmp->flags & (MALLOC|TEMP)) { - tmp->flags &= ~TEMP; - if (tmp->flags & STR) { - if (tmp->stref > 1) { - if (tmp->stref != 
255) - tmp->stref--; - return; - } - free(tmp->stptr); - } - freenode(tmp); - } -} - -/* - * Parse a C escape sequence. STRING_PTR points to a variable containing a - * pointer to the string to parse. That pointer is updated past the - * characters we use. The value of the escape sequence is returned. - * - * A negative value means the sequence \ newline was seen, which is supposed to - * be equivalent to nothing at all. - * - * If \ is followed by a null character, we return a negative value and leave - * the string pointer pointing at the null character. - * - * If \ is followed by 000, we return 0 and leave the string pointer after the - * zeros. A value of 0 does not mean end of string. - * - * Posix doesn't allow \x. - */ - -int -parse_escape(string_ptr) -char **string_ptr; -{ - register int c = *(*string_ptr)++; - register int i; - register int count; - - switch (c) { - case 'a': - return BELL; - case 'b': - return '\b'; - case 'f': - return '\f'; - case 'n': - return '\n'; - case 'r': - return '\r'; - case 't': - return '\t'; - case 'v': - return '\v'; - case '\n': - return -2; - case 0: - (*string_ptr)--; - return -1; - case '0': - case '1': - case '2': - case '3': - case '4': - case '5': - case '6': - case '7': - i = c - '0'; - count = 0; - while (++count < 3) { - if ((c = *(*string_ptr)++) >= '0' && c <= '7') { - i *= 8; - i += c - '0'; - } else { - (*string_ptr)--; - break; - } - } - return i; - case 'x': - if (do_lint) { - static int didwarn; - - if (! didwarn) { - didwarn = 1; - warning("Posix does not allow \"\\x\" escapes"); - } - } - if (do_posix) - return ('x'); - i = 0; - while (1) { - if (isxdigit((c = *(*string_ptr)++))) { - i *= 16; - if (isdigit(c)) - i += c - '0'; - else if (isupper(c)) - i += c - 'A' + 10; - else - i += c - 'a' + 10; - } else { - (*string_ptr)--; - break; - } - } - return i; - default: - return c; - } -} diff --git a/gnu/usr.bin/awk/patchlevel.h b/gnu/usr.bin/awk/patchlevel.h deleted file mode 100644 index c80ca15..0000000 --- a/gnu/usr.bin/awk/patchlevel.h +++ /dev/null @@ -1 +0,0 @@ -#define PATCHLEVEL 5 diff --git a/gnu/usr.bin/awk/protos.h b/gnu/usr.bin/awk/protos.h deleted file mode 100644 index 62e933a..0000000 --- a/gnu/usr.bin/awk/protos.h +++ /dev/null @@ -1,122 +0,0 @@ -/* - * protos.h -- function prototypes for when the headers don't have them. - */ - -/* - * Copyright (C) 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. - * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
- */ - -#ifdef __STDC__ -#define aptr_t void * /* arbitrary pointer type */ -#else -#define aptr_t char * -#endif -extern aptr_t malloc P((MALLOC_ARG_T)); -extern aptr_t realloc P((aptr_t, MALLOC_ARG_T)); -extern aptr_t calloc P((MALLOC_ARG_T, MALLOC_ARG_T)); - -#if !defined(sun) && !defined(__sun__) -extern void free P((aptr_t)); -#endif -extern char *getenv P((const char *)); - -extern char *strcpy P((char *, const char *)); -extern char *strcat P((char *, const char *)); -extern int strcmp P((const char *, const char *)); -extern char *strncpy P((char *, const char *, size_t)); -extern int strncmp P((const char *, const char *, size_t)); -#ifndef VMS -extern char *strerror P((int)); -#else -extern char *strerror P((int,...)); -#endif -extern char *strchr P((const char *, int)); -extern char *strrchr P((const char *, int)); -extern char *strstr P((const char *s1, const char *s2)); -extern size_t strlen P((const char *)); -extern long strtol P((const char *, char **, int)); -#if !defined(_MSC_VER) && !defined(__GNU_LIBRARY__) -extern size_t strftime P((char *, size_t, const char *, const struct tm *)); -#endif -#ifdef __STDC__ -extern time_t time P((time_t *)); -#else -extern long time(); -#endif -extern aptr_t memset P((aptr_t, int, size_t)); -extern aptr_t memcpy P((aptr_t, const aptr_t, size_t)); -extern aptr_t memmove P((aptr_t, const aptr_t, size_t)); -extern aptr_t memchr P((const aptr_t, int, size_t)); -extern int memcmp P((const aptr_t, const aptr_t, size_t)); - -extern int fprintf P((FILE *, const char *, ...)); -#if !defined(MSDOS) && !defined(__GNU_LIBRARY__) -#ifdef __STDC__ -extern size_t fwrite P((const aptr_t, size_t, size_t, FILE *)); -#else -extern int fwrite(); -#endif -extern int fputs P((const char *, FILE *)); -extern int unlink P((const char *)); -#endif -extern int fflush P((FILE *)); -extern int fclose P((FILE *)); -extern FILE *popen P((const char *, const char *)); -extern int pclose P((FILE *)); -extern void abort P(()); -extern int isatty P((int)); -extern void exit P((int)); -extern int system P((const char *)); -extern int sscanf P((const char *, const char *, ...)); -#ifndef toupper -extern int toupper P((int)); -#endif -#ifndef tolower -extern int tolower P((int)); -#endif - -extern double pow P((double x, double y)); -extern double atof P((const char *)); -extern double strtod P((const char *, char **)); -extern int fstat P((int, struct stat *)); -extern int stat P((const char *, struct stat *)); -extern off_t lseek P((int, off_t, int)); -extern int fseek P((FILE *, long, int)); -extern int close P((int)); -extern int creat P((const char *, mode_t)); -extern int open P((const char *, int, ...)); -extern int pipe P((int *)); -extern int dup P((int)); -extern int dup2 P((int,int)); -extern int fork P(()); -extern int execl P((/* char *, char *, ... */)); -#ifndef __STDC__ -extern int read P((int, char *, int)); -#endif -extern int wait P((int *)); -extern void _exit P((int)); - -#ifdef NON_STD_SPRINTF -extern char *sprintf P((char *, const char*, ...)); -#else -extern int sprintf P((char *, const char*, ...)); -#endif /* SPRINTF_INT */ - -#undef aptr_t diff --git a/gnu/usr.bin/awk/re.c b/gnu/usr.bin/awk/re.c deleted file mode 100644 index 5ee1c43..0000000 --- a/gnu/usr.bin/awk/re.c +++ /dev/null @@ -1,212 +0,0 @@ -/* - * re.c - compile regular expressions. - */ - -/* - * Copyright (C) 1991, 1992, 1993 the Free Software Foundation, Inc. - * - * This file is part of GAWK, the GNU implementation of the - * AWK Progamming Language. 
- * - * GAWK is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * GAWK is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with GAWK; see the file COPYING. If not, write to - * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - */ - -#include "awk.h" - -/* Generate compiled regular expressions */ - -Regexp * -make_regexp(s, len, ignorecase, dfa) -char *s; -size_t len; -int ignorecase; -int dfa; -{ - Regexp *rp; - const char *rerr; - char *src = s; - char *temp; - char *end = s + len; - register char *dest; - register int c; - - /* Handle escaped characters first. */ - - /* Build a copy of the string (in dest) with the - escaped characters translated, and generate the regex - from that. - */ - emalloc(dest, char *, len + 2, "make_regexp"); - temp = dest; - - while (src < end) { - if (*src == '\\') { - c = *++src; - switch (c) { - case 'a': - case 'b': - case 'f': - case 'n': - case 'r': - case 't': - case 'v': - case 'x': - case '0': - case '1': - case '2': - case '3': - case '4': - case '5': - case '6': - case '7': - c = parse_escape(&src); - if (c < 0) - cant_happen(); - *dest++ = (char)c; - break; - default: - *dest++ = '\\'; - *dest++ = (char)c; - src++; - break; - } /* switch */ - } else { - *dest++ = *src++; /* not '\\' */ - } - } /* for */ - - *dest = '\0' ; /* Only necessary if we print dest ? */ - emalloc(rp, Regexp *, sizeof(*rp), "make_regexp"); - memset((char *) rp, 0, sizeof(*rp)); - emalloc(rp->pat.buffer, unsigned char *, 16, "make_regexp"); - rp->pat.allocated = 16; - emalloc(rp->pat.fastmap, char *, 256, "make_regexp"); - - if (ignorecase) - rp->pat.translate = casetable; - else - rp->pat.translate = NULL; - len = dest - temp; - if ((rerr = re_compile_pattern(temp, len, &(rp->pat))) != NULL) - fatal("%s: /%s/", rerr, temp); - if (dfa && !ignorecase) { - dfacomp(temp, len, &(rp->dfareg), 1); - rp->dfa = 1; - } else - rp->dfa = 0; - - free(temp); - return rp; -} - -int -research(rp, str, start, len, need_start) -Regexp *rp; -register char *str; -int start; -register size_t len; -int need_start; -{ - char *ret = str; - - if (rp->dfa) { - char save; - int count = 0; - int try_backref; - - /* - * dfa likes to stick a '\n' right after the matched - * text. So we just save and restore the character. 
- */
- save = str[start+len];
- ret = dfaexec(&(rp->dfareg), str+start, str+start+len, 1,
- &count, &try_backref);
- str[start+len] = save;
- }
- if (ret) {
- if (need_start || rp->dfa == 0)
- return re_search(&(rp->pat), str, start+len, start,
- len, &(rp->regs));
- else
- return 1;
- } else
- return -1;
-}
-
-void
-refree(rp)
-Regexp *rp;
-{
- free(rp->pat.buffer);
- free(rp->pat.fastmap);
- if (rp->dfa)
- dfafree(&(rp->dfareg));
- free(rp);
-}
-
-void
-dfaerror(s)
-const char *s;
-{
- fatal(s);
-}
-
-Regexp *
-re_update(t)
-NODE *t;
-{
- NODE *t1;
-
-# define CASE 1
- if ((t->re_flags & CASE) == IGNORECASE) {
- if (t->re_flags & CONST)
- return t->re_reg;
- t1 = force_string(tree_eval(t->re_exp));
- if (t->re_text) {
- if (cmp_nodes(t->re_text, t1) == 0) {
- free_temp(t1);
- return t->re_reg;
- }
- unref(t->re_text);
- }
- t->re_text = dupnode(t1);
- free_temp(t1);
- }
- if (t->re_reg)
- refree(t->re_reg);
- if (t->re_cnt)
- t->re_cnt++;
- if (t->re_cnt > 10)
- t->re_cnt = 0;
- if (!t->re_text) {
- t1 = force_string(tree_eval(t->re_exp));
- t->re_text = dupnode(t1);
- free_temp(t1);
- }
- t->re_reg = make_regexp(t->re_text->stptr, t->re_text->stlen,
- IGNORECASE, t->re_cnt);
- t->re_flags &= ~CASE;
- t->re_flags |= IGNORECASE;
- return t->re_reg;
-}
-
-void
-resetup()
-{
- reg_syntax_t syn = RE_SYNTAX_AWK;
-
- (void) re_set_syntax(syn);
- dfasyntax(syn, 0);
-}
diff --git a/gnu/usr.bin/awk/version.c b/gnu/usr.bin/awk/version.c
deleted file mode 100644
index ffcd8e1..0000000
--- a/gnu/usr.bin/awk/version.c
+++ /dev/null
@@ -1,47 +0,0 @@
-char *version_string = "@(#)Gnu Awk (gawk) 2.15";
-
-/* 1.02 fixed /= += *= etc to return the new Left Hand Side instead
- of the Right Hand Side */
-
-/* 1.03 Fixed split() to treat strings of space and tab as FS if
- the split char is ' '.
-
- Added -v option to print version number
-
- Fixed bug that caused rounding when printing large numbers */
-
-/* 2.00beta Incorporated the functionality of the "new" awk as described
- the book (reference not handy). Extensively tested, but no
- doubt still buggy. Badly needs tuning and cleanup, in
- particular in memory management which is currently almost
- non-existent. */
-
-/* 2.01 JF: Modified to compile under GCC, and fixed a few
- bugs while I was at it. I hope I didn't add any more.
- I modified parse.y to reduce the number of reduce/reduce
- conflicts. There are still a few left. */
-
-/* 2.02 Fixed JF's bugs; improved memory management, still needs
- lots of work. */
-
-/* 2.10 Major grammar rework and lots of bug fixes from David.
- Major changes for performance enhancements from David.
- A number of minor bug fixes and new features from Arnold.
- Changes for MSDOS from Conrad Kwok and Scott Garfinkle.
- The gawk.texinfo and info files included! */
-
-/* 2.11 Bug fix release to 2.10. Lots of changes for portability,
- speed, and configurability. */
-
-/* 2.12 Lots of changes for portability, speed, and configurability.
- Several bugs fixed. POSIX compliance. Removal of last set
- of hard-wired limits. Atari and VMS ports added. */
-
-/* 2.13 Public release of 2.12 */
-
-/* 2.14 Mostly bug fixes. */
-
-/* 2.15 Bug fixes plus intermixing of command-line source and files,
- GNU long options, ARGIND, ERRNO and Plan 9 style /dev/ files.
- `delete array'. OS/2 port added. */
-
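
Editor's note: the removed msg.c above builds its err()/msg()/warning()/fatal() reporting on the pre-ANSI <varargs.h> va_alist/va_dcl interface. The sketch below is not part of the removed sources; it is a minimal modern equivalent using <stdarg.h>, kept here only to show the pattern. The function names mirror msg.c, but the bodies are illustrative: the gawk-specific context (myname, sourceline, FILENAME/FNR) is deliberately omitted.

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* shared worker: print a prefix, then the caller's formatted message */
static void
err(const char *prefix, const char *fmt, va_list args)
{
        fflush(stdout);
        fprintf(stderr, "%s", prefix);
        vfprintf(stderr, fmt, args);
        putc('\n', stderr);
}

void
warning(const char *fmt, ...)
{
        va_list args;

        va_start(args, fmt);    /* fmt is the last fixed parameter */
        err("warning: ", fmt, args);
        va_end(args);
}

void
fatal(const char *fmt, ...)
{
        va_list args;

        va_start(args, fmt);
        err("fatal: ", fmt, args);
        va_end(args);
        exit(2);                /* same exit status the original fatal() uses */
}

int
main(void)
{
        /* usage example: formats like printf */
        warning("unexpected value %d in %s", 42, "demo");
        return 0;
}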
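Editor's note: the comment block before parse_escape() in the removed node.c explains that an octal escape consumes at most three octal digits and leaves the string pointer just past them. The fragment below is a stand-alone sketch of only that step, not code taken from gawk; the name parse_octal_escape and the demo input are made up for illustration.

#include <stdio.h>

/* consume up to three octal digits at *sp, advance *sp past them,
 * and return their value (0 is a legal result, not end of string) */
static int
parse_octal_escape(const char **sp)
{
        int val = 0;
        int count = 0;

        while (count < 3 && **sp >= '0' && **sp <= '7') {
                val = val * 8 + (*(*sp)++ - '0');
                count++;
        }
        return val;
}

int
main(void)
{
        const char *s = "101rest";      /* as if the input contained \101 */
        const char *p = s;

        printf("%c %s\n", parse_octal_escape(&p), p);   /* prints "A rest" */
        return 0;
}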