Diffstat (limited to 'contrib/awk/doc/gawk.texi')
-rw-r--r-- | contrib/awk/doc/gawk.texi | 26169 |
1 files changed, 0 insertions, 26169 deletions
diff --git a/contrib/awk/doc/gawk.texi b/contrib/awk/doc/gawk.texi deleted file mode 100644 index 808ef6e..0000000 --- a/contrib/awk/doc/gawk.texi +++ /dev/null @@ -1,26169 +0,0 @@ -\input texinfo @c -*-texinfo-*- -@c %**start of header (This is for running Texinfo on a region.) -@setfilename gawk.info -@settitle The GNU Awk User's Guide -@c %**end of header (This is for running Texinfo on a region.) - -@dircategory GNU Packages -@direntry -* Gawk: (gawk). A text scanning and processing language. -@end direntry -@dircategory Individual utilities -@direntry -* awk: (gawk)Invoking gawk. Text scanning and processing. -@end direntry - -@c @set xref-automatic-section-title - -@c The following information should be updated here only! -@c This sets the edition of the document, the version of gawk it -@c applies to and all the info about who's publishing this edition - -@c These apply across the board. -@set UPDATE-MONTH March, 2001 -@set VERSION 3.1 -@set PATCHLEVEL 0 - -@set FSF - -@set TITLE GAWK: Effective AWK Programming -@set SUBTITLE A User's Guide for GNU Awk -@set EDITION 3 - -@iftex -@set DOCUMENT book -@set CHAPTER chapter -@set APPENDIX appendix -@set SECTION section -@set SUBSECTION subsection -@set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}} -@end iftex -@ifinfo -@set DOCUMENT Info file -@set CHAPTER major node -@set APPENDIX major node -@set SECTION minor node -@set SUBSECTION node -@set DARKCORNER (d.c.) -@end ifinfo -@ifhtml -@set DOCUMENT Web page -@set CHAPTER chapter -@set APPENDIX appendix -@set SECTION section -@set SUBSECTION subsection -@set DARKCORNER (d.c.) -@end ifhtml - -@c some special symbols -@iftex -@set LEQ @math{@leq} -@end iftex -@ifnottex -@set LEQ <= -@end ifnottex - -@set FN file name -@set FFN File Name -@set DF data file -@set DDF Data File -@set PVERSION version - -@ignore -Some comments on the layout for TeX. -1. Use at least texinfo.tex 2000-09-06.09 -2. I have done A LOT of work to make this look good. 
There are `@page' commands - and use of `@group ... @end group' in a number of places. If you muck - with anything, it's your responsibility not to break the layout. -@end ignore - -@c merge the function and variable indexes into the concept index -@ifinfo -@synindex fn cp -@synindex vr cp -@end ifinfo -@iftex -@syncodeindex fn cp -@syncodeindex vr cp -@end iftex - -@c If "finalout" is commented out, the printed output will show -@c black boxes that mark lines that are too long. Thus, it is -@c unwise to comment it out when running a master in case there are -@c overfulls which are deemed okay. - -@iftex -@finalout -@end iftex - -@c Comment out the "smallbook" for technical review. Saves -@c considerable paper. Remember to turn it back on *before* -@c starting the page-breaking work. -@smallbook - -@ifinfo -This file documents @command{awk}, a program that you can use to select -particular records in a file and perform operations upon them. - -This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}}, -for the @value{VERSION}.@value{PATCHLEVEL} version of the GNU implementation of AWK. - -Copyright (C) 1989, 1991, 1992, 1993, 1996-2001 Free Software Foundation, Inc. - -Permission is granted to copy, distribute and/or modify this document -under the terms of the GNU Free Documentation License, Version 1.1 or -any later version published by the Free Software Foundation; with the -Invariant Sections being ``GNU General Public License'', the Front-Cover -texts being (a) (see below), and with the Back-Cover Texts being (b) -(see below). A copy of the license is included in the section entitled -``GNU Free Documentation License''. - -@enumerate a -@item -``A GNU Manual'' - -@item -``You have freedom to copy and modify this GNU Manual, like GNU -software. Copies published by the Free Software Foundation raise -funds for GNU development.'' -@end enumerate -@end ifinfo - -@c Uncomment this for the release. 
Leaving it off saves paper -@c during editing and review. -@setchapternewpage odd - -@titlepage -@title @value{TITLE} -@subtitle @value{SUBTITLE} -@subtitle Edition @value{EDITION} -@subtitle @value{UPDATE-MONTH} -@author Arnold D. Robbins - -@c Include the Distribution inside the titlepage environment so -@c that headings are turned off. Headings on and off do not work. - -@page -@vskip 0pt plus 1filll -@ignore -The programs and applications presented in this book have been -included for their instructional value. They have been tested with care -but are not guaranteed for any particular purpose. The publisher does not -offer any warranties or representations, nor does it accept any -liabilities with respect to the programs or applications. -So there. -@sp 2 -UNIX is a registered trademark of The Open Group in the United States and other countries. @* -Microsoft, MS and MS-DOS are registered trademarks, and Windows is a -trademark of Microsoft Corporation in the United States and other -countries. @* -Atari, 520ST, 1040ST, TT, STE, Mega and Falcon are registered trademarks -or trademarks of Atari Corporation. @* -DEC, Digital, OpenVMS, ULTRIX and VMS are trademarks of Digital Equipment -Corporation. @* -@end ignore -``To boldly go where no man has gone before'' is a -Registered Trademark of Paramount Pictures Corporation. @* -@c sorry, i couldn't resist -@sp 3 -Copyright @copyright{} 1989, 1991, 1992, 1993, 1996-2001 Free Software Foundation, Inc. -@sp 2 - -This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}}, -for the @value{VERSION}.@value{PATCHLEVEL} (or later) version of the GNU -implementation of AWK. 
- -@sp 2 -Published by: -@sp 1 - -Free Software Foundation @* -59 Temple Place --- Suite 330 @* -Boston, MA 02111-1307 USA @* -Phone: +1-617-542-5942 @* -Fax: +1-617-542-2652 @* -Email: @email{gnu@@gnu.org} @* -URL: @uref{http://www.gnu.org/} @* - -@c This one is correct for gawk 3.1.0 from the FSF -ISBN 1-882114-28-0 @* - -Permission is granted to copy, distribute and/or modify this document -under the terms of the GNU Free Documentation License, Version 1.1 or -any later version published by the Free Software Foundation; with the -Invariant Sections being ``GNU General Public License'', the Front-Cover -texts being (a) (see below), and with the Back-Cover Texts being (b) -(see below). A copy of the license is included in the section entitled -``GNU Free Documentation License''. - -@enumerate a -@item -``A GNU Manual'' - -@item -``You have freedom to copy and modify this GNU Manual, like GNU -software. Copies published by the Free Software Foundation raise -funds for GNU development.'' -@end enumerate -@sp 2 -Cover art by Etienne Suvasa. -@end titlepage - -@c Thanks to Bob Chassell for directions on doing dedications. -@iftex -@headings off -@page -@w{ } -@sp 9 -@center @i{To Miriam, for making me complete.} -@sp 1 -@center @i{To Chana, for the joy you bring us.} -@sp 1 -@center @i{To Rivka, for the exponential increase.} -@sp 1 -@center @i{To Nachum, for the added dimension.} -@sp 1 -@center @i{To Malka, for the new beginning.} -@w{ } -@page -@w{ } -@page -@headings on -@end iftex - -@iftex -@headings off -@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @| -@oddheading @| @| @strong{@thischapter}@ @ @ @thispage -@end iftex - -@ifinfo -@node Top, Foreword, (dir), (dir) -@top General Introduction -@c Preface node should come right after the Top -@c node, in `unnumbered' sections, then the chapter, `What is gawk'. -@c Licensing nodes are appendices, they're not central to AWK. 
- -This file documents @command{awk}, a program that you can use to select -particular records in a file and perform operations upon them. - -This is Edition @value{EDITION} of @cite{@value{TITLE}: @value{SUBTITLE}}, -for the @value{VERSION}.@value{PATCHLEVEL} version of the GNU implementation -of AWK. - -@end ifinfo - -@menu -* Foreword:: Some nice words about this - @value{DOCUMENT}. -* Preface:: What this @value{DOCUMENT} is about; brief - history and acknowledgments. -* Getting Started:: A basic introduction to using - @command{awk}. How to run an @command{awk} - program. Command-line syntax. -* Regexp:: All about matching things using regular - expressions. -* Reading Files:: How to read files and manipulate fields. -* Printing:: How to print using @command{awk}. Describes - the @code{print} and @code{printf} - statements. Also describes redirection of - output. -* Expressions:: Expressions are the basic building blocks - of statements. -* Patterns and Actions:: Overviews of patterns and actions. -* Arrays:: The description and use of arrays. Also - includes array-oriented control statements. -* Functions:: Built-in and user-defined functions. -* Internationalization:: Getting @command{gawk} to speak your - language. -* Advanced Features:: Stuff for advanced users, specific to - @command{gawk}. -* Invoking Gawk:: How to run @command{gawk}. -* Library Functions:: A Library of @command{awk} Functions. -* Sample Programs:: Many @command{awk} programs with complete - explanations. -* Language History:: The evolution of the @command{awk} - language. -* Installation:: Installing @command{gawk} under various - operating systems. -* Notes:: Notes about @command{gawk} extensions and - possible future work. -* Basic Concepts:: A very quick intoduction to programming - concepts. -* Glossary:: An explanation of some unfamiliar terms. -* Copying:: Your right to copy and distribute - @command{gawk}. -* GNU Free Documentation License:: The license for this @value{DOCUMENT}. 
-* Index:: Concept and Variable Index. - -@detailmenu -* History:: The history of @command{gawk} and - @command{awk}. -* Names:: What name to use to find @command{awk}. -* This Manual:: Using this @value{DOCUMENT}. Includes - sample input files that you can use. -* Conventions:: Typographical Conventions. -* Manual History:: Brief history of the GNU project and this - @value{DOCUMENT}. -* How To Contribute:: Helping to save the world. -* Acknowledgments:: Acknowledgments. -* Running gawk:: How to run @command{gawk} programs; - includes command-line syntax. -* One-shot:: Running a short throw-away @command{awk} - program. -* Read Terminal:: Using no input files (input from terminal - instead). -* Long:: Putting permanent @command{awk} programs in - files. -* Executable Scripts:: Making self-contained @command{awk} - programs. -* Comments:: Adding documentation to @command{gawk} - programs. -* Quoting:: More discussion of shell quoting issues. -* Sample Data Files:: Sample data files for use in the - @command{awk} programs illustrated in this - @value{DOCUMENT}. -* Very Simple:: A very simple example. -* Two Rules:: A less simple one-line example using two - rules. -* More Complex:: A more complex example. -* Statements/Lines:: Subdividing or combining statements into - lines. -* Other Features:: Other Features of @command{awk}. -* When:: When to use @command{gawk} and when to use - other things. -* Regexp Usage:: How to Use Regular Expressions. -* Escape Sequences:: How to write non-printing characters. -* Regexp Operators:: Regular Expression Operators. -* Character Lists:: What can go between @samp{[...]}. -* GNU Regexp Operators:: Operators specific to GNU software. -* Case-sensitivity:: How to do case-insensitive matching. -* Leftmost Longest:: How much text matches. -* Computed Regexps:: Using Dynamic Regexps. -* Records:: Controlling how data is split into records. -* Fields:: An introduction to fields. -* Non-Constant Fields:: Non-constant Field Numbers. 
-* Changing Fields:: Changing the Contents of a Field. -* Field Separators:: The field separator and how to change it. -* Regexp Field Splitting:: Using regexps as the field separator. -* Single Character Fields:: Making each character a separate field. -* Command Line Field Separator:: Setting @code{FS} from the command-line. -* Field Splitting Summary:: Some final points and a summary table. -* Constant Size:: Reading constant width data. -* Multiple Line:: Reading multi-line records. -* Getline:: Reading files under explicit program - control using the @code{getline} function. -* Plain Getline:: Using @code{getline} with no arguments. -* Getline/Variable:: Using @code{getline} into a variable. -* Getline/File:: Using @code{getline} from a file. -* Getline/Variable/File:: Using @code{getline} into a variable from a - file. -* Getline/Pipe:: Using @code{getline} from a pipe. -* Getline/Variable/Pipe:: Using @code{getline} into a variable from a - pipe. -* Getline/Coprocess:: Using @code{getline} from a coprocess. -* Getline/Variable/Coprocess:: Using @code{getline} into a variable from a - coprocess. -* Getline Notes:: Important things to know about - @code{getline}. -* Getline Summary:: Summary of @code{getline} Variants. -* Print:: The @code{print} statement. -* Print Examples:: Simple examples of @code{print} statements. -* Output Separators:: The output separators and how to change - them. -* OFMT:: Controlling Numeric Output With - @code{print}. -* Printf:: The @code{printf} statement. -* Basic Printf:: Syntax of the @code{printf} statement. -* Control Letters:: Format-control letters. -* Format Modifiers:: Format-specification modifiers. -* Printf Examples:: Several examples. -* Redirection:: How to redirect output to multiple files - and pipes. -* Special Files:: File name interpretation in @command{gawk}. - @command{gawk} allows access to inherited - file descriptors. -* Special FD:: Special files for I/O. 
-* Special Process:: Special files for process information. -* Special Network:: Special files for network communications. -* Special Caveats:: Things to watch out for. -* Close Files And Pipes:: Closing Input and Output Files and Pipes. -* Constants:: String, numeric and regexp constants. -* Scalar Constants:: Numeric and string constants. -* Non-decimal-numbers:: What are octal and hex numbers. -* Regexp Constants:: Regular Expression constants. -* Using Constant Regexps:: When and how to use a regexp constant. -* Variables:: Variables give names to values for later - use. -* Using Variables:: Using variables in your programs. -* Assignment Options:: Setting variables on the command-line and a - summary of command-line syntax. This is an - advanced method of input. -* Conversion:: The conversion of strings to numbers and - vice versa. -* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-}, - etc.) -* Concatenation:: Concatenating strings. -* Assignment Ops:: Changing the value of a variable or a - field. -* Increment Ops:: Incrementing the numeric value of a - variable. -* Truth Values:: What is ``true'' and what is ``false''. -* Typing and Comparison:: How variables acquire types and how this - affects comparison of numbers and strings - with @samp{<}, etc. -* Boolean Ops:: Combining comparison expressions using - boolean operators @samp{||} (``or''), - @samp{&&} (``and'') and @samp{!} (``not''). -* Conditional Exp:: Conditional expressions select between two - subexpressions under control of a third - subexpression. -* Function Calls:: A function call is an expression. -* Precedence:: How various operators nest. -* Pattern Overview:: What goes into a pattern. -* Regexp Patterns:: Using regexps as patterns. -* Expression Patterns:: Any expression can be used as a pattern. -* Ranges:: Pairs of patterns specify record ranges. -* BEGIN/END:: Specifying initialization and cleanup - rules. -* Using BEGIN/END:: How and why to use BEGIN/END rules. 
-* I/O And BEGIN/END:: I/O issues in BEGIN/END rules. -* Empty:: The empty pattern, which matches every - record. -* Using Shell Variables:: How to use shell variables with - @command{awk}. -* Action Overview:: What goes into an action. -* Statements:: Describes the various control statements in - detail. -* If Statement:: Conditionally execute some @command{awk} - statements. -* While Statement:: Loop until some condition is satisfied. -* Do Statement:: Do specified action while looping until - some condition is satisfied. -* For Statement:: Another looping statement, that provides - initialization and increment clauses. -* Break Statement:: Immediately exit the innermost enclosing - loop. -* Continue Statement:: Skip to the end of the innermost enclosing - loop. -* Next Statement:: Stop processing the current input record. -* Nextfile Statement:: Stop processing the current file. -* Exit Statement:: Stop execution of @command{awk}. -* Built-in Variables:: Summarizes the built-in variables. -* User-modified:: Built-in variables that you change to - control @command{awk}. -* Auto-set:: Built-in variables where @command{awk} - gives you information. -* ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}. -* Array Intro:: Introduction to Arrays -* Reference to Elements:: How to examine one element of an array. -* Assigning Elements:: How to change an element of an array. -* Array Example:: Basic Example of an Array -* Scanning an Array:: A variation of the @code{for} statement. It - loops through the indices of an array's - existing elements. -* Delete:: The @code{delete} statement removes an - element from an array. -* Numeric Array Subscripts:: How to use numbers as subscripts in - @command{awk}. -* Uninitialized Subscripts:: Using Uninitialized variables as - subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in - @command{awk}. -* Multi-scanning:: Scanning multidimensional arrays. -* Array Sorting:: Sorting array values and indices. 
-* Built-in:: Summarizes the built-in functions. -* Calling Built-in:: How to call built-in functions. -* Numeric Functions:: Functions that work with numbers, including - @code{int}, @code{sin} and @code{rand}. -* String Functions:: Functions for string manipulation, such as - @code{split}, @code{match} and - @code{sprintf}. -* Gory Details:: More than you want to know about @samp{\} - and @samp{&} with @code{sub}, @code{gsub}, - and @code{gensub}. -* I/O Functions:: Functions for files and shell commands. -* Time Functions:: Functions for dealing with timestamps. -* Bitwise Functions:: Functions for bitwise operations. -* I18N Functions:: Functions for string translation. -* User-defined:: Describes User-defined functions in detail. -* Definition Syntax:: How to write definitions and what they - mean. -* Function Example:: An example function definition and what it - does. -* Function Caveats:: Things to watch out for. -* Return Statement:: Specifying the value a function returns. -* Dynamic Typing:: How variable types can change at runtime. -* I18N and L10N:: Internationalization and Localization. -* Explaining gettext:: How GNU @code{gettext} works. -* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging @code{printf} arguments. -* I18N Portability:: @command{awk}-level portability issues. -* I18N Example:: A simple i18n example. -* Gawk I18N:: @command{gawk} is also internationalized. -* Non-decimal Data:: Allowing non-decimal input data. -* Two-way I/O:: Two-way communications with another - process. -* TCP/IP Networking:: Using @command{gawk} for network - programming. -* Portal Files:: Using @command{gawk} with BSD portals. -* Profiling:: Profiling your @command{awk} programs. -* Command Line:: How to run @command{awk}. -* Options:: Command-line options and their meanings. 
-* Other Arguments:: Input file names and variable assignments. -* AWKPATH Variable:: Searching directories for @command{awk} - programs. -* Obsolete:: Obsolete Options and/or features. -* Undocumented:: Undocumented Options and Features. -* Known Bugs:: Known Bugs in @command{gawk}. -* Library Names:: How to best name private global variables - in library functions. -* General Functions:: Functions that are of general use. -* Nextfile Function:: Two implementations of a @code{nextfile} - function. -* Assert Function:: A function for assertions in @command{awk} - programs. -* Round Function:: A function for rounding if @code{sprintf} - does not do it correctly. -* Cliff Random Function:: The Cliff Random Number Generator. -* Ordinal Functions:: Functions for using characters as numbers - and vice versa. -* Join Function:: A function to join an array into a string. -* Gettimeofday Function:: A function to get formatted times. -* Data File Management:: Functions for managing command-line data - files. -* Filetrans Function:: A function for handling data file - transitions. -* Rewind Function:: A function for rereading the current file. -* File Checking:: Checking that data files are readable. -* Ignoring Assigns:: Treating assignments as file names. -* Getopt Function:: A function for processing command-line - arguments. -* Passwd Functions:: Functions for getting user information. -* Group Functions:: Functions for getting group information. -* Running Examples:: How to run these examples. -* Clones:: Clones of common utilities. -* Cut Program:: The @command{cut} utility. -* Egrep Program:: The @command{egrep} utility. -* Id Program:: The @command{id} utility. -* Split Program:: The @command{split} utility. -* Tee Program:: The @command{tee} utility. -* Uniq Program:: The @command{uniq} utility. -* Wc Program:: The @command{wc} utility. -* Miscellaneous Programs:: Some interesting @command{awk} programs. -* Dupword Program:: Finding duplicated words in a document. 
-* Alarm Program:: An alarm clock. -* Translate Program:: A program similar to the @command{tr} - utility. -* Labels Program:: Printing mailing labels. -* Word Sorting:: A program to produce a word usage count. -* History Sorting:: Eliminating duplicate entries from a - history file. -* Extract Program:: Pulling out programs from Texinfo source - files. -* Simple Sed:: A Simple Stream Editor. -* Igawk Program:: A wrapper for @command{awk} that includes - files. -* V7/SVR3.1:: The major changes between V7 and System V - Release 3.1. -* SVR4:: Minor changes between System V Releases 3.1 - and 4. -* POSIX:: New features from the POSIX standard. -* BTL:: New features from the Bell Laboratories - version of @command{awk}. -* POSIX/GNU:: The extensions in @command{gawk} not in - POSIX @command{awk}. -* Contributors:: The major contributors to @command{gawk}. -* Gawk Distribution:: What is in the @command{gawk} distribution. -* Getting:: How to get the distribution. -* Extracting:: How to extract the distribution. -* Distribution contents:: What is in the distribution. -* Unix Installation:: Installing @command{gawk} under various - versions of Unix. -* Quick Installation:: Compiling @command{gawk} under Unix. -* Additional Configuration Options:: Other compile-time options. -* Configuration Philosophy:: How it's all supposed to work. -* Non-Unix Installation:: Installation on Other Operating Systems. -* Amiga Installation:: Installing @command{gawk} on an Amiga. -* BeOS Installation:: Installing @command{gawk} on BeOS. -* PC Installation:: Installing and Compiling @command{gawk} on - MS-DOS and OS/2. -* PC Binary Installation:: Installing a prepared distribution. -* PC Compiling:: Compiling @command{gawk} for MS-DOS, Win32, - and OS/2. -* PC Using:: Running @command{gawk} on MS-DOS, Win32 and - OS/2. -* VMS Installation:: Installing @command{gawk} on VMS. -* VMS Compilation:: How to compile @command{gawk} under VMS. 
-* VMS Installation Details:: How to install @command{gawk} under VMS. -* VMS Running:: How to run @command{gawk} under VMS. -* VMS POSIX:: Alternate instructions for VMS POSIX. -* Unsupported:: Systems whose ports are no longer - supported. -* Atari Installation:: Installing @command{gawk} on the Atari ST. -* Atari Compiling:: Compiling @command{gawk} on Atari. -* Atari Using:: Running @command{gawk} on Atari. -* Tandem Installation:: Installing @command{gawk} on a Tandem. -* Bugs:: Reporting Problems and Bugs. -* Other Versions:: Other freely available @command{awk} - implementations. -* Compatibility Mode:: How to disable certain @command{gawk} - extensions. -* Additions:: Making Additions To @command{gawk}. -* Adding Code:: Adding code to the main body of - @command{gawk}. -* New Ports:: Porting @command{gawk} to a new operating - system. -* Dynamic Extensions:: Adding new built-in functions to - @command{gawk}. -* Internals:: A brief look at some @command{gawk} - internals. -* Sample Library:: A example of new functions. -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. -* Future Extensions:: New features that may be implemented one - day. -* Basic High Level:: The high level view. -* Basic Data Typing:: A very quick intro to data types. -* Floating Point Issues:: Stuff to know about floating-point numbers. -@end detailmenu -@end menu - -@c dedication for Info file -@ifinfo -@center To Miriam, for making me complete. -@sp 1 -@center To Chana, for the joy you bring us. -@sp 1 -@center To Rivka, for the exponential increase. -@sp 1 -@center To Nachum, for the added dimension. -@sp 1 -@center To Malka, for the new beginning. -@end ifinfo - -@summarycontents -@contents - -@node Foreword, Preface, Top, Top -@unnumbered Foreword - -Arnold Robbins and I are good friends. 
We were introduced 11 years ago -by circumstances---and our favorite programming language, AWK. -The circumstances started a couple of years -earlier. I was working at a new job and noticed an unplugged -Unix computer sitting in the corner. No one knew how to use it, -and neither did I. However, -a couple of days later it was running, and -I was @code{root} and the one-and-only user. -That day, I began the transition from statistician to Unix programmer. - -On one of many trips to the library or bookstore in search of -books on Unix, I found the gray AWK book, a.k.a. Aho, Kernighan and -Weinberger, @cite{The AWK Programming Language}, Addison-Wesley, -1988. AWK's simple programming paradigm---find a pattern in the -input and then perform an action---often reduced complex or tedious -data manipulations to few lines of code. I was excited to try my -hand at programming in AWK. - -Alas, the @command{awk} on my computer was a limited version of the -language described in the AWK book. I discovered that my computer -had ``old @command{awk}'' and the AWK book described ``new @command{awk}.'' -I learned that this was typical; the old version refused to step -aside or relinquish its name. If a system had a new @command{awk}, it was -invariably called @command{nawk}, and few systems had it. -The best way to get a new @command{awk} was to @command{ftp} the source code for -@command{gawk} from @code{prep.ai.mit.edu}. @command{gawk} was a version of -new @command{awk} written by David Trueman and Arnold, and available under -the GNU General Public License. - -(Incidentally, -it's no longer difficult to find a new @command{awk}. @command{gawk} ships with -Linux, and you can download binaries or source code for almost -any system; my wife uses @command{gawk} on her VMS box.) - -My Unix system started out unplugged from the wall; it certainly was not -plugged into a network. 
So, oblivious to the existence of @command{gawk} -and the Unix community in general, and desiring a new @command{awk}, I wrote -my own, called @command{mawk}. -Before I was finished I knew about @command{gawk}, -but it was too late to stop, so I eventually posted -to a @code{comp.sources} newsgroup. - -A few days after my posting, I got a friendly email -from Arnold introducing -himself. He suggested we share design and algorithms and -attached a draft of the POSIX standard so -that I could update @command{mawk} to support language extensions added -after publication of the AWK book. - -Frankly, if our roles had -been reversed, I would not have been so open and we probably would -have never met. I'm glad we did meet. -He is an AWK expert's AWK expert and a genuinely nice person. -Arnold contributes significant amounts of his -expertise and time to the Free Software Foundation. - -This book is the @command{gawk} reference manual, but at its core it -is a book about AWK programming that -will appeal to a wide audience. -It is a definitive reference to the AWK language as defined by the -1987 Bell Labs release and codified in the 1992 POSIX Utilities -standard. - -On the other hand, the novice AWK programmer can study -a wealth of practical programs that emphasize -the power of AWK's basic idioms: -data driven control-flow, pattern matching with regular expressions, -and associative arrays. -Those looking for something new can try out @command{gawk}'s -interface to network protocols via special @file{/inet} files. - -The programs in this book make clear that an AWK program is -typically much smaller and faster to develop than -a counterpart written in C. -Consequently, there is often a payoff to prototype an -algorithm or design in AWK to get it running quickly and expose -problems early. Often, the interpreted performance is adequate -and the AWK prototype becomes the product. - -The new @command{pgawk} (profiling @command{gawk}), produces -program execution counts. 
-I recently experimented with an algorithm that for -@math{n} lines of input, exhibited -@tex -$\sim\! Cn^2$ -@end tex -@ifnottex -~ C n^2 -@end ifnottex -performance, while -theory predicted -@tex -$\sim\! Cn\log n$ -@end tex -@ifnottex -~ C n log n -@end ifnottex -behavior. A few minutes poring -over the @file{awkprof.out} profile pinpointed the problem to -a single line of code. @command{pgawk} is a welcome addition to -my programmer's toolbox. - -Arnold has distilled over a decade of experience writing and -using AWK programs, and developing @command{gawk}, into this book. If you use -AWK or want to learn how, then read this book. - -@display -Michael Brennan -Author of @command{mawk} -@end display - -@node Preface, Getting Started, Foreword, Top -@unnumbered Preface -@c I saw a comment somewhere that the preface should describe the book itself, -@c and the introduction should describe what the book covers. -@c -@c 12/2000: Chuck wants the preface & intro combined. - -Several kinds of tasks occur repeatedly -when working with text files. -You might want to extract certain lines and discard the rest. -Or you may need to make changes wherever certain patterns appear, -but leave the rest of the file alone. -Writing single-use programs for these tasks in languages such as C, C++ or Pascal -is time-consuming and inconvenient. -Such jobs are often easier with @command{awk}. -The @command{awk} utility interprets a special-purpose programming language -that makes it easy to handle simple data-reformatting jobs. - -The GNU implementation of @command{awk} is called @command{gawk}; it is fully -compatible with the System V Release 4 version of -@command{awk}. @command{gawk} is also compatible with the POSIX -specification of the @command{awk} language. This means that all -properly written @command{awk} programs should work with @command{gawk}. -Thus, we usually don't distinguish between @command{gawk} and other -@command{awk} implementations. 
- -@cindex uses of @command{awk} -@cindex applications of @command{awk} -Using @command{awk} allows you to: - -@itemize @bullet -@item -Manage small, personal databases - -@item -Generate reports - -@item -Validate data - -@item -Produce indexes and perform other document preparation tasks - -@item -Experiment with algorithms that you can adapt later to other computer -languages. -@end itemize - -@cindex uses of @command{gawk} -In addition, -@command{gawk} -provides facilities that make it easy to: - -@itemize @bullet -@item -Extract bits and pieces of data for processing - -@item -Sort data - -@item -Perform simple network communications. -@end itemize - -This @value{DOCUMENT} teaches you about the @command{awk} language and -how you can use it effectively. You should already be familiar with basic -system commands, such as @command{cat} and @command{ls},@footnote{These commands -are available on POSIX-compliant systems, as well as on traditional Unix -based systems. If you are using some other operating system, you still need to -be familiar with the ideas of I/O redirection and pipes.} as well as basic shell -facilities, such as Input/Output (I/O) redirection and pipes. - -Implementations of the @command{awk} language are available for many -different computing environments. This @value{DOCUMENT}, while describing -the @command{awk} language in general, also describes the particular -implementation of @command{awk} called @command{gawk} (which stands for -``GNU awk''). @command{gawk} runs on a broad range of Unix systems, -ranging from 80386 PC-based computers, up through large-scale systems, -such as Crays. @command{gawk} has also been ported to Mac OS X, -MS-DOS, Microsoft Windows (all versions) and OS/2 PC's, Atari and Amiga -micro-computers, BeOS, Tandem D20, and VMS. - -@menu -* History:: The history of @command{gawk} and - @command{awk}. -* Names:: What name to use to find @command{awk}. -* This Manual:: Using this @value{DOCUMENT}. 
Includes sample - input files that you can use. -* Conventions:: Typographical Conventions. -* Manual History:: Brief history of the GNU project and this - @value{DOCUMENT}. -* How To Contribute:: Helping to save the world. -* Acknowledgments:: Acknowledgments. -@end menu - -@node History, Names, Preface, Preface -@unnumberedsec History of @command{awk} and @command{gawk} -@cindex recipe for a programming language -@cindex programming language, recipe for -@center Recipe For A Programming Language - -@multitable {2 parts} {1 part @code{egrep}} {1 part @code{snobol}} -@item @tab 1 part @code{egrep} @tab 1 part @code{snobol} -@item @tab 2 parts @code{ed} @tab 3 parts C -@end multitable - -@quotation -Blend all parts well using @code{lex} and @code{yacc}. -Document minimally and release. - -After eight years, add another part @code{egrep} and two -more parts C. Document very well and release. -@end quotation - -@cindex acronym -@cindex history of @command{awk} -@cindex Aho, Alfred -@cindex Weinberger, Peter -@cindex Kernighan, Brian -@cindex old @command{awk} -@cindex new @command{awk} -The name @command{awk} comes from the initials of its designers: Alfred V.@: -Aho, Peter J.@: Weinberger and Brian W.@: Kernighan. The original version of -@command{awk} was written in 1977 at AT&T Bell Laboratories. -In 1985, a new version made the programming -language more powerful, introducing user-defined functions, multiple input -streams, and computed regular expressions. -This new version became widely available with Unix System V -Release 3.1 (SVR3.1). -The version in SVR4 added some new features and cleaned -up the behavior in some of the ``dark corners'' of the language. -The specification for @command{awk} in the POSIX Command Language -and Utilities standard further clarified the language. -Both the @command{gawk} designers and the original Bell Laboratories @command{awk} -designers provided feedback for the POSIX specification. 
- -@cindex Rubin, Paul -@cindex Fenlason, Jay -@cindex Trueman, David -Paul Rubin wrote the GNU implementation, @command{gawk}, in 1986. -Jay Fenlason completed it, with advice from Richard Stallman. John Woods -contributed parts of the code as well. In 1988 and 1989, David Trueman, with -help from me, thoroughly reworked @command{gawk} for compatibility -with the newer @command{awk}. -Circa 1995, I became the primary maintainer. -Current development focuses on bug fixes, -performance improvements, standards compliance, and occasionally, new features. - -In May of 1997, J@"urgen Kahrs felt the need for network access -from @command{awk}, and with a little help from me, set about adding -features to do this for @command{gawk}. At that time, he also -wrote the bulk of -@cite{TCP/IP Internetworking with @command{gawk}} -(a separate document, available as part of the @command{gawk} distribution). -His code finally became part of the main @command{gawk} distribution -with @command{gawk} @value{PVERSION} 3.1. - -@xref{Contributors, ,Major Contributors to @command{gawk}}, -for a complete list of those who made important contributions to @command{gawk}. - -@node Names, This Manual, History, Preface -@section A Rose by Any Other Name - -@cindex old @command{awk} vs. new @command{awk} -@cindex new @command{awk} vs. old @command{awk} -The @command{awk} language has evolved over the years. Full details are -provided in @ref{Language History, ,The Evolution of the @command{awk} Language}. -The language described in this @value{DOCUMENT} -is often referred to as ``new @command{awk}'' (@command{nawk}). - -Because of this, many systems have multiple -versions of @command{awk}. -Some systems have an @command{awk} utility that implements the -original version of the @command{awk} language and a @command{nawk} utility -for the new -version. -Others have an @command{oawk} for the ``old @command{awk}'' -language and plain @command{awk} for the new one. 
Still others only -have one version, which is usually the new one.@footnote{Often, these systems -use @command{gawk} for their @command{awk} implementation!} - -All in all, this makes it difficult for you to know which version of -@command{awk} you should run when writing your programs. The best advice -I can give here is to check your local documentation. Look for @command{awk}, -@command{oawk}, and @command{nawk}, as well as for @command{gawk}. -It is likely that you already -have some version of new @command{awk} on your system, which is what -you should use when running your programs. (Of course, if you're reading -this @value{DOCUMENT}, chances are good that you have @command{gawk}!) - -Throughout this @value{DOCUMENT}, whenever we refer to a language feature -that should be available in any complete implementation of POSIX @command{awk}, -we simply use the term @command{awk}. When referring to a feature that is -specific to the GNU implementation, we use the term @command{gawk}. - -@node This Manual, Conventions, Names, Preface -@section Using This Book -@cindex book, using this -@cindex using this book -@cindex language, @command{awk} -@cindex program, @command{awk} -@ignore -@cindex @command{awk} language -@cindex @command{awk} program -@end ignore -@cindex Brandon, Dick -@cindex sex, comparisons with -@quotation -@i{Documentation is like sex: when it is good, it is very, very good; and -when it is bad, it is better than nothing.}@* -Dick Brandon -@end quotation - -The term @command{awk} refers to a particular program as well as to the language you -use to tell this program what to do. When we need to be careful, we call -the program ``the @command{awk} utility'' and the language ``the @command{awk} -language.'' -This @value{DOCUMENT} explains -both the @command{awk} language and how to run the @command{awk} utility. -The term @dfn{@command{awk} program} refers to a program written by you in -the @command{awk} programming language. 
- -Primarily, this @value{DOCUMENT} explains the features of @command{awk}, -as defined in the POSIX standard. It does so in the context of the -@command{gawk} implementation. While doing so, it also -attempts to describe important differences between @command{gawk} -and other @command{awk} implementations.@footnote{All such differences -appear in the index under the heading ``differences between @command{gawk} and -@command{awk}.''} Finally, any @command{gawk} features that are not in -the POSIX standard for @command{awk} are noted. - -@ifnotinfo -This @value{DOCUMENT} has the difficult task of being both a tutorial and a reference. -If you are a novice, feel free to skip over details that seem too complex. -You should also ignore the many cross references; they are for the -expert user and for the online Info version of the document. -@end ifnotinfo - -There are -subsections labelled -as @strong{Advanced Notes} -scattered throughout the @value{DOCUMENT}. -They add a more complete explanation of points that are relevant, but not likely -to be of interest on first reading. -All appear in the index, under the heading ``advanced notes.'' - -Most of the time, the examples use complete @command{awk} programs. -In some of the more advanced sections, only the part of the @command{awk} -program that illustrates the concept currently being described is shown. - -While this @value{DOCUMENT} is aimed principally at people who have not been -exposed -to @command{awk}, there is a lot of information here that even the @command{awk} -expert should find useful. In particular, the description of POSIX -@command{awk} and the example programs in -@ref{Library Functions, ,A Library of @command{awk} Functions}, and in -@ref{Sample Programs, ,Practical @command{awk} Programs}, -should be of interest. - -@ref{Getting Started, ,Getting Started with @command{awk}}, -provides the essentials you need to know to begin using @command{awk}. 
- -@ref{Regexp, ,Regular Expressions}, -introduces regular expressions in general, and in particular the flavors -supported by POSIX @command{awk} and @command{gawk}. - -@ref{Reading Files, , Reading Input Files}, -describes how @command{awk} reads your data. -It introduces the concepts of records and fields, as well -as the @code{getline} command. -I/O redirection is first described here. - -@ref{Printing, , Printing Output}, -describes how @command{awk} programs can produce output with -@code{print} and @code{printf}. - -@ref{Expressions}, -describes expressions, which are the basic building blocks -for getting most things done in a program. - -@ref{Patterns and Actions, ,Patterns Actions and Variables}, -describes how to write patterns for matching records, actions for -doing something when a record is matched, and the built-in variables -@command{awk} and @command{gawk} use. - -@ref{Arrays, ,Arrays in @command{awk}}, -covers @command{awk}'s one-and-only data structure: associative arrays. -Deleting array elements and whole arrays is also described, as well as -sorting arrays in @command{gawk}. - -@ref{Functions}, -describes the built-in functions @command{awk} and -@command{gawk} provide for you, as well as how to define -your own functions. - -@ref{Internationalization, ,Internationalization with @command{gawk}}, -describes special features in @command{gawk} for translating program -messages into different languages at runtime. - -@ref{Advanced Features, ,Advanced Features of @command{gawk}}, -describes a number of @command{gawk}-specific advanced features. -Of particular note -are the abilities to have two-way communications with another process, -perform TCP/IP networking, and -profile your @command{awk} programs. - -@ref{Invoking Gawk, ,Running @command{awk} and @command{gawk}}, -describes how to run @command{gawk}, the meaning of its -command-line options, and how it finds @command{awk} -program source files. 
- -@ref{Library Functions, ,A Library of @command{awk} Functions}, and -@ref{Sample Programs, ,Practical @command{awk} Programs}, -provide many sample @command{awk} programs. -Reading them allows you to see @command{awk} being used -for solving real problems. - -@ref{Language History, ,The Evolution of the @command{awk} Language}, -describes how the @command{awk} language has evolved from its -first release to the present. It also describes how @command{gawk} -has acquired features over time. - -@ref{Installation, ,Installing @command{gawk}}, -describes how to get @command{gawk}, how to compile it -under Unix, and how to compile and use it on different -non-Unix systems. It also describes how to report bugs -in @command{gawk} and where to get three other freely -available implementations of @command{awk}. - -@ref{Notes, ,Implementation Notes}, -describes how to disable @command{gawk}'s extensions, as -well as how to contribute new code to @command{gawk}, -how to write extension libraries, and some possible -future directions for @command{gawk} development. - -@ref{Basic Concepts, ,Basic Programming Concepts}, -provides some very cursory background material for those who -are completely unfamiliar with computer programming. -It also collects in one place a discussion of some of the issues -involved in using floating-point numbers. - -The -@ref{Glossary}, -defines most, if not all, of the significant terms used -throughout the book. -If you find terms that you aren't familiar with, try looking them up. - -@ref{Copying, ,GNU General Public License}, and -@ref{GNU Free Documentation License}, -present the licenses that cover the @command{gawk} source code, -and this @value{DOCUMENT}, respectively. - -@node Conventions, Manual History, This Manual, Preface -@section Typographical Conventions - -@cindex Texinfo -This @value{DOCUMENT} is written using Texinfo, the GNU documentation -formatting language. 
-A single Texinfo source file is used to produce both the printed and online -versions of the documentation. -@iftex -Because of this, the typographical conventions -are slightly different than in other books you may have read. -@end iftex -@ifnottex -This @value{SECTION} briefly documents the typographical conventions used in Texinfo. -@end ifnottex - -Examples you would type at the command-line are preceded by the common -shell primary and secondary prompts, @samp{$} and @samp{>}. -Output from the command is preceded by the glyph ``@print{}''. -This typically represents the command's standard output. -Error messages, and other output on the command's standard error, are preceded -by the glyph ``@error{}''. For example: - -@example -$ echo hi on stdout -@print{} hi on stdout -$ echo hello on stderr 1>&2 -@error{} hello on stderr -@end example - -@iftex -In the text, command names appear in @code{this font}, while code segments -appear in the same font and quoted, @samp{like this}. Some things are -emphasized @emph{like this}, and if a point needs to be made -strongly, it is done @strong{like this}. The first occurrence of -a new term is usually its @dfn{definition} and appears in the same -font as the previous occurrence of ``definition'' in this sentence. -@value{FN}s are indicated like this: @file{/path/to/ourfile}. -@end iftex - -Characters that you type at the keyboard look @kbd{like this}. In particular, -there are special characters called ``control characters.'' These are -characters that you type by holding down both the @kbd{CONTROL} key and -another key, at the same time. For example, a @kbd{Ctrl-d} is typed -by first pressing and holding the @kbd{CONTROL} key, next -pressing the @kbd{d} key and finally releasing both keys. 
- -@c fakenode --- for prepinfo -@subsubheading Dark Corners -@cindex Kernighan, Brian -@quotation -@i{Dark corners are basically fractal --- no matter how much -you illuminate, there's always a smaller but darker one.}@* -Brian Kernighan -@end quotation - -@cindex d.c., see ``dark corner'' -@cindex dark corner -Until the POSIX standard (and @cite{The Gawk Manual}), -many features of @command{awk} were either poorly documented or not -documented at all. Descriptions of such features -(often called ``dark corners'') are noted in this @value{DOCUMENT} with -@iftex -the picture of a flashlight in the margin, as shown here. -@value{DARKCORNER} -@end iftex -@ifnottex -``(d.c.)''. -@end ifnottex -They also appear in the index under the heading ``dark corner.'' - -As noted by the opening quote, though, any -coverage of dark corners -is, by definition, something that is incomplete. - -@node Manual History, How To Contribute, Conventions, Preface -@unnumberedsec The GNU Project and This Book -@cindex Torvalds, Linus -@cindex sex, comparisons with -@quotation -@i{Software is like sex: it's better when it's free.}@* -Linus Torvalds -@end quotation - -@cindex FSF -@cindex Free Software Foundation -@cindex Stallman, Richard -The Free Software Foundation (FSF) is a non-profit organization dedicated -to the production and distribution of freely distributable software. -It was founded by Richard M.@: Stallman, the author of the original -Emacs editor. GNU Emacs is the most widely used version of Emacs today. - -@cindex GNU Project -@cindex GPL -@cindex General Public License -@cindex GNU General Public License -@cindex online documentation -@cindex documentation, online -The GNU@footnote{GNU stands for ``GNU's not Unix.''} -Project is an ongoing effort on the part of the Free Software -Foundation to create a complete, freely distributable, POSIX-compliant -computing environment. 
-The FSF uses the ``GNU General Public License'' (GPL) to ensure that -their software's -source code is always available to the end user. A -copy of the GPL is included -@ifnotinfo -in this @value{DOCUMENT} -@end ifnotinfo -for your reference -(@pxref{Copying, ,GNU General Public License}). -The GPL applies to the C language source code for @command{gawk}. -To find out more about the FSF and the GNU Project online, -see @uref{http://www.gnu.org, the GNU Project's home page}. -This @value{DOCUMENT} may also be read from -@uref{http://www.gnu.org/manual/gawk/, their web site}. - -A shell, an editor (Emacs), highly portable optimizing C, C++, and -Objective-C compilers, a symbolic debugger and dozens of large and -small utilities (such as @command{gawk}), have all been completed and are -freely available. The GNU operating -system kernel (the HURD), has been released but is still in an early -stage of development. - -@cindex Linux -@cindex GNU/Linux -@cindex BSD-based operating systems -@cindex NetBSD -@cindex FreeBSD -@cindex OpenBSD -Until the GNU operating system is more fully developed, you should -consider using GNU/Linux, a freely distributable, Unix-like operating -system for Intel 80386, DEC Alpha, Sun SPARC, IBM S/390, and other -systems.@footnote{The terminology ``GNU/Linux'' is explained -in the @ref{Glossary}.} -There are -many books on GNU/Linux. One that is freely available is @cite{Linux -Installation and Getting Started}, by Matt Welsh. -Many GNU/Linux distributions are often available in computer stores or -bundled on CD-ROMs with books about Linux. -(There are three other freely available, Unix-like operating systems for -80386 and other systems: NetBSD, FreeBSD, and OpenBSD. All are based on the -4.4-Lite Berkeley Software Distribution, and they use recent versions -of @command{gawk} for their versions of @command{awk}.) - -@ifnotinfo -The @value{DOCUMENT} you are reading now is actually free---at least, the -information in it is free to anyone. 
The machine readable -source code for the @value{DOCUMENT} comes with @command{gawk}; anyone -may take this @value{DOCUMENT} to a copying machine and make as many -copies of it as they like. (Take a moment to check the Free Documentation -License; see @ref{GNU Free Documentation License}.) - -Although you could just print it out yourself, bound books are much -easier to read and use. Furthermore, -the proceeds from sales of this book go back to the FSF -to help fund development of more free software. -@end ifnotinfo - -@ignore -@cindex Close, Diane -The @value{DOCUMENT} itself has gone through several previous, -preliminary editions. -Paul Rubin wrote the very first draft of @cite{The GAWK Manual}; -it was around 40 pages in size. -Diane Close and Richard Stallman improved it, yielding the -version which I started working with in the fall of 1988. -It was around 90 pages long and barely described the original, ``old'' -version of @command{awk}. After substantial revision, the first version of -the @cite{The GAWK Manual} to be released was Edition 0.11 Beta in -October of 1989. The manual then underwent more substantial revision -for Edition 0.13 of December 1991. -David Trueman, Pat Rankin and Michal Jaegermann contributed sections -of the manual for Edition 0.13. -That edition was published by the -FSF as a bound book early in 1992. Since then there were several -minor revisions, notably Edition 0.14 of November 1992 that was published -by the FSF in January of 1993 and Edition 0.16 of August 1993. - -Edition 1.0 of @cite{GAWK: The GNU Awk User's Guide} represented a significant re-working -of @cite{The GAWK Manual}, with much additional material. -The FSF and I agreed that I was now the primary author. -@c I also felt that the manual needed a more descriptive title. - -In January 1996, SSC published Edition 1.0 under the title @cite{Effective AWK Programming}. 
-In February 1997, they published Edition 1.0.3 which had minor changes -as a ``second edition.'' -In 1999, the FSF published this same version as Edition 2 -of @cite{GAWK: The GNU Awk User's Guide}. - -Edition @value{EDITION} maintains the basic structure of Edition 1.0, -but with significant additional material, reflecting the host of new features -in @command{gawk} @value{PVERSION} @value{VERSION}. -Of particular note is -@ref{Array Sorting, ,Sorting Array Values and Indices with @command{gawk}}, -@ref{Bitwise Functions, ,Using @command{gawk}'s Bit Manipulation Functions}, -@ref{Internationalization, ,Internationalization with @command{gawk}}, -@ref{Advanced Features, ,Advanced Features of @command{gawk}}, -and -@ref{Dynamic Extensions, ,Adding New Built-in Functions to @command{gawk}}. -@end ignore - -@cindex Close, Diane -The @value{DOCUMENT} itself has gone through a number of previous editions. -Paul Rubin wrote the very first draft of @cite{The GAWK Manual}; -it was around 40 pages in size. -Diane Close and Richard Stallman improved it, yielding a -version that was -around 90 pages long and barely described the original, ``old'' -version of @command{awk}. - -I started working with that version in the fall of 1988. -As work on it progressed, -the FSF published several preliminary versions (numbered 0.@var{x}). -In 1996, Edition 1.0 was released with @command{gawk} 3.0.0. -The FSF published the first two editions under -the title @cite{The GNU Awk User's Guide}. - -This edition maintains the basic structure of Edition 1.0, -but with significant additional material, reflecting the host of new features -in @command{gawk} @value{PVERSION} @value{VERSION}. 
-Of particular note is -@ref{Array Sorting, ,Sorting Array Values and Indices with @command{gawk}}, -as well as -@ref{Bitwise Functions, ,Using @command{gawk}'s Bit Manipulation Functions}, -@ref{Internationalization, ,Internationalization with @command{gawk}}, -and also -@ref{Advanced Features, ,Advanced Features of @command{gawk}}, -and -@ref{Dynamic Extensions, ,Adding New Built-in Functions to @command{gawk}}. - -@cite{@value{TITLE}} will undoubtedly continue to evolve. -An electronic version -comes with the @command{gawk} distribution from the FSF. -If you find an error in this @value{DOCUMENT}, please report it! -@xref{Bugs, ,Reporting Problems and Bugs}, for information on submitting -problem reports electronically, or write to me in care of the publisher. - -@node How To Contribute, Acknowledgments, Manual History, Preface -@unnumberedsec How to Contribute - -As the maintainer of GNU @command{awk}, -I am starting a collection of publicly available @command{awk} -programs. -For more information, -see @uref{ftp://ftp.freefriends.org/arnold/Awkstuff}. -If you have written an interesting @command{awk} program, or have written a -@command{gawk} extension that you would like to -share with the rest of the world, please contact me (@email{arnold@@gnu.org}). -Making things available on the Internet helps keep the -@command{gawk} distribution down to manageable size. - -@node Acknowledgments, , How To Contribute, Preface -@unnumberedsec Acknowledgments - -The initial draft of @cite{The GAWK Manual} had the following acknowledgments: - -@quotation -Many people need to be thanked for their assistance in producing this -manual. Jay Fenlason contributed many ideas and sample programs. Richard -Mlynarik and Robert Chassell gave helpful comments on drafts of this -manual. 
The paper @cite{A Supplemental Document for @command{awk}} by John W.@: -Pierce of the Chemistry Department at UC San Diego, pinpointed several -issues relevant both to @command{awk} implementation and to this manual, that -would otherwise have escaped us. -@end quotation - -@cindex Stallman, Richard -I would like to acknowledge Richard M.@: Stallman, for his vision of a -better world and for his courage in founding the FSF and starting the -GNU project. - -The following people (in alphabetical order) -provided helpful comments on various -versions of this book, up to and including this edition. -Rick Adams, -Nelson H.F. Beebe, -Karl Berry, -Dr.@: Michael Brennan, -Rich Burridge, -Claire Coutier, -Diane Close, -Scott Deifik, -Christopher (``Topher'') Eliot, -Jeffrey Friedl, -Dr.@: Darrel Hankerson, -Michal Jaegermann, -Dr.@: Richard J.@: LeBlanc, -Michael Lijewski, -Pat Rankin, -Miriam Robbins, -Mary Sheehan, -and -Chuck Toporek. - -@cindex Berry, Karl -@cindex Chassell, Robert J.@: -@cindex Texinfo -Robert J.@: Chassell provided much valuable advice on -the use of Texinfo. -He also deserves special thanks for -convincing me @emph{not} to title this @value{DOCUMENT} -@cite{How To Gawk Politely}. -Karl Berry helped significantly with the @TeX{} part of Texinfo. - -@cindex Hartholz, Marshall -@cindex Hartholz, Elaine -@cindex Schreiber, Bert -@cindex Schreiber, Rita -I would like to thank Marshall and Elaine Hartholz of Seattle and -Dr.@: Bert and Rita Schreiber of Detroit for large amounts of quiet vacation -time in their homes, which allowed me to make significant progress on -this @value{DOCUMENT} and on @command{gawk} itself. - -@cindex Hughes, Phil -Phil Hughes of SSC -contributed in a very important way by loaning me his laptop GNU/Linux -system, not once, but twice, which allowed me to do a lot of work while -away from home. 
- -@cindex Trueman, David -David Trueman deserves special credit; he has done a yeoman job -of evolving @command{gawk} so that it performs well and without bugs. -Although he is no longer involved with @command{gawk}, -working with him on this project was a significant pleasure. - -@cindex Drepper, Ulrich -@cindex GNITS mailing list -The intrepid members of the GNITS mailing list, and most notably Ulrich -Drepper, provided invaluable help and feedback for the design of the -internationalization features. - -@cindex Beebe, Nelson -@cindex Brown, Martin -@cindex Deifik, Scott -@cindex Hankerson, Darrel -@cindex Jaegermann, Michal -@cindex Kahrs, J@"urgen -@cindex Rankin, Pat -@cindex Rommel, Kai Uwe -@cindex Zaretskii, Eli -Nelson Beebe, -Martin Brown, -Scott Deifik, -Darrel Hankerson, -Michal Jaegermann, -J@"urgen Kahrs, -Pat Rankin, -Kai Uwe Rommel, -and Eli Zaretskii -(in alphabetical order) -are long-time members of the -@command{gawk} ``crack portability team.'' Without their hard work and -help, @command{gawk} would not be nearly the fine program it is today. It -has been and continues to be a pleasure working with this team of fine -people. - -@cindex Kernighan, Brian -David and I would like to thank Brian Kernighan of Bell Laboratories for -invaluable assistance during the testing and debugging of @command{gawk}, and for -help in clarifying numerous points about the language. We could not have -done nearly as good a job on either @command{gawk} or its documentation without -his help. - -Chuck Toporek, Mary Sheehan, and Claire Coutier of O'Reilly & Associates contributed -significant editorial help for this @value{DOCUMENT} for the -3.1 release of @command{gawk}. - -@cindex Robbins, Miriam -@cindex Robbins, Jean -@cindex Robbins, Harry -@cindex G-d -I must thank my wonderful wife, Miriam, for her patience through -the many versions of this project, for her proof-reading, -and for sharing me with the computer. 
-I would like to thank my parents for their love, and for the grace with -which they raised and educated me. -Finally, I also must acknowledge my gratitude to G-d, for the many opportunities -He has sent my way, as well as for the gifts He has given me with which to -take advantage of those opportunities. -@sp 2 -@noindent -Arnold Robbins @* -Nof Ayalon @* -ISRAEL @* -March, 2001 - -@ignore -@c Try this -@iftex -@page -@headings off -@majorheading I@ @ @ @ The @command{awk} Language and @command{gawk} -Part I describes the @command{awk} language and @command{gawk} program in detail. -It starts with the basics, and continues through all of the features of @command{awk} -and @command{gawk}. It contains the following chapters: - -@itemize @bullet -@item -@ref{Getting Started, ,Getting Started with @command{awk}}. - -@item -@ref{Regexp, ,Regular Expressions}. - -@item -@ref{Reading Files, , Reading Input Files}. - -@item -@ref{Printing, , Printing Output}. - -@item -@ref{Expressions}. - -@item -@ref{Patterns and Actions, ,Patterns Actions and Variables}. - -@item -@ref{Arrays, ,Arrays in @command{awk}}. - -@item -@ref{Functions}. - -@item -@ref{Internationalization, ,Internationalization with @command{gawk}}. - -@item -@ref{Advanced Features, ,Advanced Features of @command{gawk}}. - -@item -@ref{Invoking Gawk, ,Running @command{awk} and @command{gawk}}. -@end itemize - -@page -@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @| -@oddheading @| @| @strong{@thischapter}@ @ @ @thispage -@end iftex -@end ignore - -@node Getting Started, Regexp, Preface, Top -@chapter Getting Started with @command{awk} -@cindex script, definition of -@cindex rule, definition of -@cindex program, definition of -@cindex basic function of @command{awk} - -The basic function of @command{awk} is to search files for lines (or other -units of text) that contain certain patterns. When a line matches one -of the patterns, @command{awk} performs specified actions on that line. 
-@command{awk} keeps processing input lines in this way until it reaches -the end of the input files. - -@cindex data-driven languages -@cindex procedural languages -@cindex language, data-driven -@cindex language, procedural -Programs in @command{awk} are different from programs in most other languages, -because @command{awk} programs are @dfn{data-driven}; that is, you describe -the data you want to work with and then what to do when you find it. -Most other languages are @dfn{procedural}; you have to describe, in great -detail, every step the program is to take. When working with procedural -languages, it is usually much -harder to clearly describe the data your program will process. -For this reason, @command{awk} programs are often refreshingly easy to -write and read. - -@cindex program, definition of -@cindex rule, definition of -When you run @command{awk}, you specify an @command{awk} @dfn{program} that -tells @command{awk} what to do. The program consists of a series of -@dfn{rules}. (It may also contain @dfn{function definitions}, -an advanced feature that we will ignore for now. -@xref{User-defined, ,User-Defined Functions}.) Each rule specifies one -pattern to search for and one action to perform -upon finding the pattern. - -Syntactically, a rule consists of a pattern followed by an action. The -action is enclosed in curly braces to separate it from the pattern. -Newlines usually separate rules. Therefore, an @command{awk} -program looks like this: - -@example -@var{pattern} @{ @var{action} @} -@var{pattern} @{ @var{action} @} -@dots{} -@end example - -@menu -* Running gawk:: How to run @command{gawk} programs; includes - command-line syntax. -* Sample Data Files:: Sample data files for use in the @command{awk} - programs illustrated in this @value{DOCUMENT}. -* Very Simple:: A very simple example. -* Two Rules:: A less simple one-line example using two - rules. -* More Complex:: A more complex example. 
-* Statements/Lines:: Subdividing or combining statements into - lines. -* Other Features:: Other Features of @command{awk}. -* When:: When to use @command{gawk} and when to use - other things. -@end menu - -@node Running gawk, Sample Data Files, Getting Started, Getting Started -@section How to Run @command{awk} Programs - -@cindex command-line formats -@cindex running @command{awk} programs -There are several ways to run an @command{awk} program. If the program is -short, it is easiest to include it in the command that runs @command{awk}, -like this: - -@example -awk '@var{program}' @var{input-file1} @var{input-file2} @dots{} -@end example - -When the program is long, it is usually more convenient to put it in a file -and run it with a command like this: - -@example -awk -f @var{program-file} @var{input-file1} @var{input-file2} @dots{} -@end example - -This @value{SECTION} discusses both mechanisms, along with several -variations of each. - -@menu -* One-shot:: Running a short throw-away @command{awk} - program. -* Read Terminal:: Using no input files (input from terminal - instead). -* Long:: Putting permanent @command{awk} programs in - files. -* Executable Scripts:: Making self-contained @command{awk} programs. -* Comments:: Adding documentation to @command{gawk} - programs. -* Quoting:: More discussion of shell quoting issues. -@end menu - -@node One-shot, Read Terminal, Running gawk, Running gawk -@subsection One-Shot Throw-Away @command{awk} Programs - -Once you are familiar with @command{awk}, you will often type in simple -programs the moment you want to use them. Then you can write the -program as the first argument of the @command{awk} command, like this: - -@example -awk '@var{program}' @var{input-file1} @var{input-file2} @dots{} -@end example - -@noindent -where @var{program} consists of a series of @var{patterns} and -@var{actions}, as described earlier. 
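As a concrete illustration of the one-shot format just described, here is a minimal sketch. The sample data, the temporary file, and the shell variable names are assumptions made up for this example; they do not come from the sample data files used elsewhere in this @value{DOCUMENT}. The program has a single rule: the pattern selects lines whose second field is greater than 5, and the action prints the first field of each such line.

```shell
# Hypothetical sample data, written to a temporary file for illustration.
sample=$(mktemp)
printf 'apple 3\nbanana 7\ncherry 12\n' > "$sample"

# One-shot program given as the first argument to awk.
# Pattern: second field greater than 5.  Action: print the first field.
result=$(awk '$2 > 5 { print $1 }' "$sample")
echo "$result"

# Remove the temporary file again.
rm -f "$sample"
```

With the data shown, this should print `banana` and `cherry`, one per line. The whole program sits inside single quotes, so the shell passes it to @command{awk} as one argument and leaves `$2` and `$1` alone.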
- -@cindex single quotes, why needed -This command format instructs the @dfn{shell}, or command interpreter, -to start @command{awk} and use the @var{program} to process records in the -input file(s). There are single quotes around @var{program} so -the shell won't interpret any @command{awk} characters as special shell -characters. The quotes also cause the shell to treat all of @var{program} as -a single argument for @command{awk}, and allow @var{program} to be more -than one line long. - -This format is also useful for running short or medium-sized @command{awk} -programs from shell scripts, because it avoids the need for a separate -file for the @command{awk} program. A self-contained shell script is more -reliable because there are no other files to misplace. - -@ref{Very Simple, ,Some Simple Examples}, -@ifnotinfo -later in this @value{CHAPTER}, -@end ifnotinfo -presents several short, -self-contained programs. - -@c Removed for gawk 3.1, doesn't really add anything here. -@ignore -As an interesting side point, the command - -@example -awk '/foo/' @var{files} @dots{} -@end example - -@noindent -is essentially the same as - -@cindex @command{egrep} utility -@example -egrep foo @var{files} @dots{} -@end example -@end ignore - -@node Read Terminal, Long, One-shot, Running gawk -@subsection Running @command{awk} Without Input Files - -@cindex standard input -@cindex input, standard -You can also run @command{awk} without any input files. If you type the -following command line: - -@example -awk '@var{program}' -@end example - -@noindent -@command{awk} applies the @var{program} to the @dfn{standard input}, -which usually means whatever you type on the terminal. This continues -until you indicate end-of-file by typing @kbd{Ctrl-d}. -(On other operating systems, the end-of-file character may be different. -For example, on OS/2 and MS-DOS, it is @kbd{Ctrl-z}.) 
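Standard input does not have to be the keyboard. As a minimal sketch, assuming a POSIX-style shell with @command{printf} available, the following pipes two illustrative lines (@samp{alpha} and @samp{beta}, not part of the sample data files) into @command{awk} with no input files named:

```shell
# No input files are named, so awk reads its standard input (here, a pipe).
# The action runs once per input line and prefixes the line with "got: ".
out=$(printf 'alpha\nbeta\n' | awk '{ print "got: " $0 }')
printf '%s\n' "$out"
```

Because the pipe ends after the second line, @command{awk} sees end-of-file without anyone typing @kbd{Ctrl-d}, and the command prints @samp{got: alpha} followed by @samp{got: beta}.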
- -As an example, the following program prints a friendly piece of advice -(from Douglas Adams's @cite{The Hitchhiker's Guide to the Galaxy}), -to keep you from worrying about the complexities of computer programming. -(@code{BEGIN} is a feature we haven't discussed yet.): - -@example -$ awk "BEGIN @{ print \"Don't Panic!\" @}" -@print{} Don't Panic! -@end example - -@cindex quoting, shell -@cindex shell quoting -This program does not read any input. The @samp{\} before each of the -inner double quotes is necessary because of the shell's quoting -rules---in particular because it mixes both single quotes and -double quotes.@footnote{Although we generally recommend the use of single -quotes around the program text, double quotes are needed here in order to -put the single quote into the message.} - -This next simple @command{awk} program -emulates the @command{cat} utility; it copies whatever you type at the -keyboard to its standard output. (Why this works is explained shortly.) - -@example -$ awk '@{ print @}' -Now is the time for all good men -@print{} Now is the time for all good men -to come to the aid of their country. -@print{} to come to the aid of their country. -Four score and seven years ago, ... -@print{} Four score and seven years ago, ... -What, me worry? -@print{} What, me worry? -@kbd{Ctrl-d} -@end example - -@node Long, Executable Scripts, Read Terminal, Running gawk -@subsection Running Long Programs - -@cindex running long programs -@cindex @code{-f} option -@cindex command-line option, @code{-f} -@cindex program file -@cindex file, @command{awk} program -Sometimes your @command{awk} programs can be very long. In this case, it is -more convenient to put the program into a separate file. 
In order to tell -@command{awk} to use that file for its program, you type: - -@example -awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{} -@end example - -The @option{-f} instructs the @command{awk} utility to get the @command{awk} program -from the file @var{source-file}. Any @value{FN} can be used for -@var{source-file}. For example, you could put the program: - -@example -BEGIN @{ print "Don't Panic!" @} -@end example - -@noindent -into the file @file{advice}. Then this command: - -@example -awk -f advice -@end example - -@noindent -does the same thing as this one: - -@example -awk "BEGIN @{ print \"Don't Panic!\" @}" -@end example - -@cindex quoting, shell -@cindex shell quoting -@noindent -This was explained earlier -(@pxref{Read Terminal, ,Running @command{awk} Without Input Files}). -Note that you don't usually need single quotes around the @value{FN} that you -specify with @option{-f}, because most @value{FN}s don't contain any of the shell's -special characters. Notice that in @file{advice}, the @command{awk} -program did not have single quotes around it. The quotes are only needed -for programs that are provided on the @command{awk} command line. - -If you want to identify your @command{awk} program files clearly as such, -you can add the extension @file{.awk} to the @value{FN}. This doesn't -affect the execution of the @command{awk} program but it does make -``housekeeping'' easier. - -@node Executable Scripts, Comments, Long, Running gawk -@subsection Executable @command{awk} Programs -@cindex executable scripts -@cindex scripts, executable -@cindex self-contained programs -@cindex program, self-contained -@cindex @code{#!} (executable scripts) - -Once you have learned @command{awk}, you may want to write self-contained -@command{awk} scripts, using the @samp{#!} script mechanism. 
You can do -this on many Unix systems@footnote{The @samp{#!} mechanism works on -Linux systems, -systems derived from the 4.4-Lite Berkeley Software Distribution, -and most commercial Unix systems.} as well as on the GNU system. -For example, you could update the file @file{advice} to look like this: - -@example -#! /bin/awk -f - -BEGIN @{ print "Don't Panic!" @} -@end example - -@noindent -After making this file executable (with the @command{chmod} utility), -simply type @samp{advice} -at the shell and the system arranges to run @command{awk}@footnote{The -line beginning with @samp{#!} lists the full @value{FN} of an interpreter -to run and an optional initial command-line argument to pass to that -interpreter. The operating system then runs the interpreter with the given -argument and the full argument list of the executed program. The first argument -in the list is the full @value{FN} of the @command{awk} program. The rest of the -argument list is either options to @command{awk}, or @value{DF}s, -or both.} as if you had -typed @samp{awk -f advice}: - -@example -$ chmod +x advice -$ advice -@print{} Don't Panic! -@end example - -@noindent -Self-contained @command{awk} scripts are useful when you want to write a -program that users can invoke without their having to know that the program is -written in @command{awk}. - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Portability Issues with @samp{#!} -@cindex advanced notes - -Some systems limit the length of the interpreter name to 32 characters. -Often, this can be dealt with by using a symbolic link. - -You should not put more than one argument on the @samp{#!} -line after the path to @command{awk}. It does not work. The operating system -treats the rest of the line as a single argument and passes it to @command{awk}. -Doing this leads to confusing behavior---most likely a usage diagnostic -of some sort from @command{awk}. 
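The @samp{#!} mechanism can be tried without hard-coding the location of @command{awk}. The following is only a sketch under stated assumptions: it assumes a POSIX-style shell, a system that supports @samp{#!}, the @command{mktemp} utility, and a temporary directory from which programs may be executed. The variable names @code{script} and @code{out} are illustrative. Note that the generated @samp{#!} line carries exactly one argument, @option{-f}, after the interpreter name:

```shell
# Create a scratch file to hold the self-contained awk script.
script=$(mktemp)

# The here-document is unquoted, so $(command -v awk) expands to the
# full file name of whichever awk is found first in PATH.
cat > "$script" <<EOF
#!$(command -v awk) -f
BEGIN { print "Don't Panic!" }
EOF

chmod +x "$script"      # make the script executable
out=$("$script")        # the system runs: awk -f <script>
printf '%s\n' "$out"
rm -f "$script"         # remove the scratch file
```

Inside the script file, the apostrophe in the message needs no special treatment, because the shell never interprets the program text.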
- -@cindex portability issues -Finally, -the value of @code{ARGV[0]} -(@pxref{Built-in Variables}) -varies depending upon your operating system. -Some systems put @samp{awk} there, some put the full pathname -of @command{awk} (such as @file{/bin/awk}), and some put the name -of your script (@samp{advice}). Don't rely on the value of @code{ARGV[0]} -to provide your script name. - -@node Comments, Quoting, Executable Scripts, Running gawk -@subsection Comments in @command{awk} Programs -@cindex @code{#} (comment) -@cindex comments -@cindex use of comments -@cindex documenting @command{awk} programs -@cindex programs, documenting - -A @dfn{comment} is some text that is included in a program for the sake -of human readers; it is not really an executable part of the program. Comments -can explain what the program does and how it works. Nearly all -programming languages have provisions for comments, as programs are -typically hard to understand without them. - -In the @command{awk} language, a comment starts with the sharp sign -character (@samp{#}) and continues to the end of the line. -The @samp{#} does not have to be the first character on the line. The -@command{awk} language ignores the rest of a line following a sharp sign. -For example, we could have put the following into @file{advice}: - -@example -# This program prints a nice friendly message. It helps -# keep novice users from being afraid of the computer. -BEGIN @{ print "Don't Panic!" @} -@end example - -You can put comment lines into keyboard-composed throw-away @command{awk} -programs, but this usually isn't very useful; the purpose of a -comment is to help you or another person understand the program -when reading it at a later time. - -@cindex quoting, shell -@cindex shell quoting -@strong{Caution:} As mentioned in -@ref{One-shot, ,One-Shot Throw-Away @command{awk} Programs}, -you can enclose small to medium programs in single quotes, in order to keep -your shell scripts self-contained. 
When doing so, @emph{don't} put -an apostrophe (i.e., a single quote) into a comment (or anywhere else -in your program). The shell interprets the quote as the closing -quote for the entire program. As a result, usually the shell -prints a message about mismatched quotes, and if @command{awk} actually -runs, it will probably print strange messages about syntax errors. -For example, look at the following: - -@example -$ awk '@{ print "hello" @} # let's be cute' -> -@end example - -The shell sees that the first two quotes match, and that -a new quoted object begins at the end of the command-line. -It therefore prompts with the secondary prompt, waiting for more input. -With Unix @command{awk}, closing the quoted string produces this result: - -@example -$ awk '@{ print "hello" @} # let's be cute' -> ' -@error{} awk: can't open file be -@error{} source line number 1 -@end example - -Putting a backslash before the single quote in @samp{let's} wouldn't help, -since backslashes are not special inside single quotes. -The next @value{SUBSECTION} describes the shell's quoting rules. - -@node Quoting, , Comments, Running gawk -@subsection Shell Quoting Issues -@c the indexing here is purposely different, until we -@c get a way to mark the defining instance for an index entry -@cindex quoting rules, shell -@cindex shell quoting rules - -For short to medium length @command{awk} programs, it is most convenient -to enter the program on the @command{awk} command line. -This is best done by enclosing the entire program in single quotes. -This is true whether you are entering the program interactively at -the shell prompt, or writing it as part of a larger shell script: - -@example -awk '@var{program text}' @var{input-file1} @var{input-file2} @dots{} -@end example - -@cindex @command{csh} utility -Once you are working with the shell, it is helpful to have a basic -knowledge of shell quoting rules. 
The following rules apply only to -POSIX-compliant, Bourne-style shells (such as @command{bash}, the GNU Bourne-Again -Shell). If you use @command{csh}, you're on your own. - -@itemize @bullet -@item -Quoted items can be concatenated with nonquoted items as well as with other -quoted items. The shell turns everything into one argument for -the command. - -@item -Preceding any single character with a backslash (@samp{\}) quotes -that character. The shell removes the backslash and passes the quoted -character on to the command. - -@item -Single quotes protect everything between the opening and closing quotes. -The shell does no interpretation of the quoted text, passing it on verbatim -to the command. -It is @emph{impossible} to embed a single quote inside single-quoted text. -Refer back to -@ref{Comments, ,Comments in @command{awk} Programs}, -for an example showing what happens if you try. - -@item -Double quotes protect most things between the opening and closing quotes. -The shell does at least variable and command substitution on the quoted text. -Different shells may do additional kinds of processing on double-quoted text. - -Since certain characters within double-quoted text are processed by the shell, -they must be @dfn{escaped} within the text. Of note are the characters -@samp{$}, @samp{`}, @samp{\} and @samp{"}, all of which must be preceded by -a backslash within double-quoted text if they are to be passed on literally -to the program. (The leading backslash is stripped first.) -Thus, the example seen -@ifnotinfo -previously -@end ifnotinfo -in @ref{Read Terminal, ,Running @command{awk} Without Input Files}, -is applicable: - -@example -$ awk "BEGIN @{ print \"Don't Panic!\" @}" -@print{} Don't Panic! -@end example - -Note that the single quote is not special within double quotes. - -@item -Null strings are removed when they occur as part of a non-null -command-line argument, while explicit non-null objects are kept. 
-For example, to specify that the field separator @code{FS} should -be set to the null string, use: - -@example -awk -F "" '@var{program}' @var{files} # correct -@end example - -@noindent -Don't use this: - -@example -awk -F"" '@var{program}' @var{files} # wrong! -@end example - -@noindent -In the second case, @command{awk} will attempt to use the text of the program -as the value of @code{FS}, and the first @value{FN} as the text of the program! -This results in syntax errors at best, and confusing behavior at worst. -@end itemize - -@cindex shell quoting, tricks -Mixing single and double quotes is difficult. You have to resort -to shell quoting tricks, like this: - -@example -$ awk 'BEGIN @{ print "Here is a single quote <'"'"'>" @}' -@print{} Here is a single quote <'> -@end example - -@noindent -This program consists of three concatenated quoted strings. The first and the -third are single-quoted, the second is double-quoted. - -This can be ``simplified'' to: - -@example -$ awk 'BEGIN @{ print "Here is a single quote <'\''>" @}' -@print{} Here is a single quote <'> -@end example - -@noindent -Judge for yourself which of these two is the more readable. - -Another option is to use double quotes, escaping the embedded, @command{awk}-level -double quotes: - -@example -$ awk "BEGIN @{ print \"Here is a single quote <'>\" @}" -@print{} Here is a single quote <'> -@end example - -@noindent -This option is also painful, because double quotes, backslashes, and dollar signs -are very common in @command{awk} programs. - -If you really need both single and double quotes in your @command{awk} -program, it is probably best to move it into a separate file, where -the shell won't be part of the picture, and you can say what you mean. - -@node Sample Data Files, Very Simple, Running gawk, Getting Started -@section @value{DDF}s for the Examples -@c For gawk >= 3.2, update these data files. No-one has such slow modems! 
- -@cindex input file, sample -@cindex sample input files -@cindex @file{BBS-list} file -Many of the examples in this @value{DOCUMENT} take their input from two sample -@value{DF}s. The first, called @file{BBS-list}, represents a list of -computer bulletin board systems together with information about those systems. -The second @value{DF}, called @file{inventory-shipped}, contains -information about monthly shipments. In both files, -each line is considered to be one @dfn{record}. - -In the file @file{BBS-list}, each record contains the name of a computer -bulletin board, its phone number, the board's baud rate(s), and a code for -the number of hours it is operational. An @samp{A} in the last column -means the board operates 24 hours a day. A @samp{B} in the last -column means the board only operates on evening and weekend hours. -A @samp{C} means the board operates only on weekends: - -@c 2e: Update the baud rates to reflect today's faster modems -@example -@c system if test ! -d eg ; then mkdir eg ; fi -@c system if test ! -d eg/lib ; then mkdir eg/lib ; fi -@c system if test ! -d eg/data ; then mkdir eg/data ; fi -@c system if test ! -d eg/prog ; then mkdir eg/prog ; fi -@c system if test ! -d eg/misc ; then mkdir eg/misc ; fi -@c file eg/data/BBS-list -aardvark 555-5553 1200/300 B -alpo-net 555-3412 2400/1200/300 A -barfly 555-7685 1200/300 A -bites 555-1675 2400/1200/300 A -camelot 555-0542 300 C -core 555-2912 1200/300 C -fooey 555-1234 2400/1200/300 B -foot 555-6699 1200/300 B -macfoo 555-6480 1200/300 A -sdace 555-3430 2400/1200/300 A -sabafoo 555-2127 1200/300 C -@c endfile -@end example - -@cindex @file{inventory-shipped} file -The second @value{DF}, called @file{inventory-shipped}, represents -information about shipments during the year. -Each record contains the month, the number -of green crates shipped, the number of red boxes shipped, the number of -orange bags shipped, and the number of blue packages shipped, -respectively. 
There are 16 entries, covering the 12 months of last year -and the first four months of the current year. - -@example -@c file eg/data/inventory-shipped -Jan 13 25 15 115 -Feb 15 32 24 226 -Mar 15 24 34 228 -Apr 31 52 63 420 -May 16 34 29 208 -Jun 31 42 75 492 -Jul 24 34 67 436 -Aug 15 34 47 316 -Sep 13 55 37 277 -Oct 29 54 68 525 -Nov 20 87 82 577 -Dec 17 35 61 401 - -Jan 21 36 64 620 -Feb 26 58 80 652 -Mar 24 75 70 495 -Apr 21 70 74 514 -@c endfile -@end example - -@ifinfo -If you are reading this in GNU Emacs using Info, you can copy the regions -of text showing these sample files into your own test files. This way you -can try out the examples shown in the remainder of this document. You do -this by using the command @kbd{M-x write-region} to copy text from the Info -file into a file for use with @command{awk} -(@xref{Misc File Ops, , Miscellaneous File Operations, emacs, GNU Emacs Manual}, -for more information). Using this information, create your own -@file{BBS-list} and @file{inventory-shipped} files and practice what you -learn in this @value{DOCUMENT}. - -@cindex Texinfo -If you are using the stand-alone version of Info, -see @ref{Extract Program, ,Extracting Programs from Texinfo Source Files}, -for an @command{awk} program that extracts these @value{DF}s from -@file{gawk.texi}, the Texinfo source file for this Info file. -@end ifinfo - -@node Very Simple, Two Rules, Sample Data Files, Getting Started -@section Some Simple Examples - -The following command runs a simple @command{awk} program that searches the -input file @file{BBS-list} for the character string @samp{foo}. (A -string of characters is usually called a @dfn{string}. -The term @dfn{string} is based on similar usage in English, such -as ``a string of pearls,'' or, ``a string of cars in a train.''): - -@example -awk '/foo/ @{ print $0 @}' BBS-list -@end example - -@noindent -When lines containing @samp{foo} are found, they are printed because -@w{@samp{print $0}} means print the current line. 
(Just @samp{print} by -itself means the same thing, so we could have written that -instead.) - -You will notice that slashes (@samp{/}) surround the string @samp{foo} -in the @command{awk} program. The slashes indicate that @samp{foo} -is the pattern to search for. This type of pattern is called a -@dfn{regular expression}, which is covered in more detail later -(@pxref{Regexp, ,Regular Expressions}). -The pattern is allowed to match parts of words. -There are -single quotes around the @command{awk} program so that the shell won't -interpret any of it as special shell characters. - -Here is what this program prints: - -@example -$ awk '/foo/ @{ print $0 @}' BBS-list -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sabafoo 555-2127 1200/300 C -@end example - -@cindex action, default -@cindex pattern, default -@cindex default action -@cindex default pattern -In an @command{awk} rule, either the pattern or the action can be omitted, -but not both. If the pattern is omitted, then the action is performed -for @emph{every} input line. If the action is omitted, the default -action is to print all lines that match the pattern. - -@cindex empty action -@cindex action, empty -Thus, we could leave out the action (the @code{print} statement and the curly -braces) in the above example and the result would be the same: all -lines matching the pattern @samp{foo} are printed. By comparison, -omitting the @code{print} statement but retaining the curly braces makes an -empty action that does nothing (i.e., no lines are printed). - -@cindex one-liners -Many practical @command{awk} programs are just a line or two. Following is a -collection of useful, short programs to get you started. Some of these -programs contain constructs that haven't been covered yet. 
(The description -of the program will give you a good idea of what is going on, but please -read the rest of the @value{DOCUMENT} to become an @command{awk} expert!) -Most of the examples use a @value{DF} named @file{data}. This is just a -placeholder; if you use these programs yourself, substitute -your own @value{FN}s for @file{data}. -For future reference, note that there is often more than -one way to do things in @command{awk}. At some point, you may want -to look back at these examples and see if -you can come up with different ways to do the same things shown here: - -@itemize @bullet -@item -Print the length of the longest input line: - -@example -awk '@{ if (length($0) > max) max = length($0) @} - END @{ print max @}' data -@end example - -@item -Print every line that is longer than 80 characters: - -@example -awk 'length($0) > 80' data -@end example - -The sole rule has a relational expression as its pattern and it has no -action---so the default action, printing the record, is used. - -@cindex @command{expand} utility -@item -Print the length of the longest line in @file{data}: - -@example -expand data | awk '@{ if (x < length()) x = length() @} - END @{ print "maximum line length is " x @}' -@end example - -The input is processed by the @command{expand} utility to change tabs -into spaces, so the widths compared are actually the right-margin columns. - -@item -Print every line that has at least one field: - -@example -awk 'NF > 0' data -@end example - -This is an easy way to delete blank lines from a file (or rather, to -create a new file similar to the old file but from which the blank lines -have been removed). 
- -@item -Print seven random numbers from 0 to 100, inclusive: - -@example -awk 'BEGIN @{ for (i = 1; i <= 7; i++) - print int(101 * rand()) @}' -@end example - -@item -Print the total number of bytes used by @var{files}: - -@example -ls -l @var{files} | awk '@{ x += $5 @} - END @{ print "total bytes: " x @}' -@end example - -@item -Print the total number of kilobytes used by @var{files}: - -@c Don't use \ continuation, not discussed yet -@example -ls -l @var{files} | awk '@{ x += $5 @} - END @{ print "total K-bytes: " (x + 1023)/1024 @}' -@end example - -@item -Print a sorted list of the login names of all users: - -@example -awk -F: '@{ print $1 @}' /etc/passwd | sort -@end example - -@item -Count lines in a file: - -@example -awk 'END @{ print NR @}' data -@end example - -@item -Print the even-numbered lines in the @value{DF}: - -@example -awk 'NR % 2 == 0' data -@end example - -If you use the expression @samp{NR % 2 == 1} instead, -it would print the odd-numbered lines. -@end itemize - -@node Two Rules, More Complex, Very Simple, Getting Started -@section An Example with Two Rules -@cindex how @command{awk} works - -The @command{awk} utility reads the input files one line at a -time. For each line, @command{awk} tries the patterns of each of the rules. -If several patterns match, then several actions are run in the order in -which they appear in the @command{awk} program. If no patterns match, then -no actions are run. - -After processing all the rules that match the line (and perhaps there are none), -@command{awk} reads the next line. (However, -@pxref{Next Statement, ,The @code{next} Statement}, -and also @pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). -This continues until the end of the file is reached. 
-For example, the following @command{awk} program contains two rules: - -@example -/12/ @{ print $0 @} -/21/ @{ print $0 @} -@end example - -@noindent -The first rule has the string @samp{12} as the -pattern and @samp{print $0} as the action. The second rule has the -string @samp{21} as the pattern and also has @samp{print $0} as the -action. Each rule's action is enclosed in its own pair of braces. - -This program prints every line that contains the string -@samp{12} @emph{or} the string @samp{21}. If a line contains both -strings, it is printed twice, once by each rule. - -This is what happens if we run this program on our two sample @value{DF}s, -@file{BBS-list} and @file{inventory-shipped}, as shown here: - -@example -$ awk '/12/ @{ print $0 @} -> /21/ @{ print $0 @}' BBS-list inventory-shipped -@print{} aardvark 555-5553 1200/300 B -@print{} alpo-net 555-3412 2400/1200/300 A -@print{} barfly 555-7685 1200/300 A -@print{} bites 555-1675 2400/1200/300 A -@print{} core 555-2912 1200/300 C -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sdace 555-3430 2400/1200/300 A -@print{} sabafoo 555-2127 1200/300 C -@print{} sabafoo 555-2127 1200/300 C -@print{} Jan 21 36 64 620 -@print{} Apr 21 70 74 514 -@end example - -@noindent -Note how the line beginning with @samp{sabafoo} -in @file{BBS-list} was printed twice, once for each rule. - -@node More Complex, Statements/Lines, Two Rules, Getting Started -@section A More Complex Example - -Now that we've mastered some simple tasks, let's look at -what typical @command{awk} -programs do. This example shows how @command{awk} can be used to -summarize, select, and rearrange the output of another utility. 
It uses -features that haven't been covered yet, so don't worry if you don't -understand all the details: - -@example -ls -l | awk '$6 == "Nov" @{ sum += $5 @} - END @{ print sum @}' -@end example - -@cindex @command{csh} utility -@cindex @command{csh}, backslash continuation -@cindex backslash continuation, in @command{csh} -@cindex @command{ls} utility -This command prints the total number of bytes in all the files in the -current directory that were last modified in November (of any year). -@footnote{In the C shell (@command{csh}), you need to type -a semicolon and then a backslash at the end of the first line; see -@ref{Statements/Lines, ,@command{awk} Statements Versus Lines}, for an -explanation as to why. In a POSIX-compliant shell, such as the Bourne -shell or @command{bash}, you can type the example as shown. If the command -@samp{echo $path} produces an empty output line, you are most likely -using a POSIX-compliant shell. Otherwise, you are probably using the -C shell or a shell derived from it.} -The @w{@samp{ls -l}} part of this example is a system command that gives -you a listing of the files in a directory, including each file's size and the date -the file was last modified. Its output looks like this: - -@example --rw-r--r-- 1 arnold user 1933 Nov 7 13:05 Makefile --rw-r--r-- 1 arnold user 10809 Nov 7 13:03 awk.h --rw-r--r-- 1 arnold user 983 Apr 13 12:14 awk.tab.h --rw-r--r-- 1 arnold user 31869 Jun 15 12:20 awk.y --rw-r--r-- 1 arnold user 22414 Nov 7 13:03 awk1.c --rw-r--r-- 1 arnold user 37455 Nov 7 13:03 awk2.c --rw-r--r-- 1 arnold user 27511 Dec 9 13:07 awk3.c --rw-r--r-- 1 arnold user 7989 Nov 7 13:03 awk4.c -@end example - -@noindent -The first field contains read-write permissions, the second field contains -the number of links to the file, and the third field identifies the owner of -the file. The fourth field identifies the group of the file. -The fifth field contains the size of the file in bytes. 
The -sixth, seventh and eighth fields contain the month, day, and time, -respectively, that the file was last modified. Finally, the ninth field -contains the name of the file.@footnote{On some -very old systems, you may need to use @samp{ls -lg} to get this output.} - -@cindex automatic initialization -@cindex initialization, automatic -The @samp{$6 == "Nov"} in our @command{awk} program is an expression that -tests whether the sixth field of the output from @w{@samp{ls -l}} -matches the string @samp{Nov}. Each time a line has the string -@samp{Nov} for its sixth field, the action @samp{sum += $5} is -performed. This adds the fifth field (the file's size) to the variable -@code{sum}. As a result, when @command{awk} has finished reading all the -input lines, @code{sum} is the total of the sizes of the files whose -lines matched the pattern. (This works because @command{awk} variables -are automatically initialized to zero.) - -After the last line of output from @command{ls} has been processed, the -@code{END} rule executes and prints the value of @code{sum}. -In this example, the value of @code{sum} is 80600. - -These more advanced @command{awk} techniques are covered in later sections -(@pxref{Action Overview, ,Actions}). Before you can move on to more -advanced @command{awk} programming, you have to know how @command{awk} interprets -your input and displays your output. By manipulating fields and using -@code{print} statements, you can produce some very useful and -impressive-looking reports.
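To check the arithmetic independently of whatever happens to be in the current directory, the sample listing can be fed to the same program through a here-document. This is a sketch assuming a POSIX-style shell; the shell variable name @code{total} is illustrative:

```shell
# Run the November-summing program over the sample "ls -l" listing.
# The quoted 'EOF' delimiter keeps the shell from expanding anything
# inside the here-document.
total=$(awk '$6 == "Nov" { sum += $5 }
             END { print sum }' <<'EOF'
-rw-r--r-- 1 arnold user 1933 Nov 7 13:05 Makefile
-rw-r--r-- 1 arnold user 10809 Nov 7 13:03 awk.h
-rw-r--r-- 1 arnold user 983 Apr 13 12:14 awk.tab.h
-rw-r--r-- 1 arnold user 31869 Jun 15 12:20 awk.y
-rw-r--r-- 1 arnold user 22414 Nov 7 13:03 awk1.c
-rw-r--r-- 1 arnold user 37455 Nov 7 13:03 awk2.c
-rw-r--r-- 1 arnold user 27511 Dec 9 13:07 awk3.c
-rw-r--r-- 1 arnold user 7989 Nov 7 13:03 awk4.c
EOF
)
printf '%s\n' "$total"
```

Five lines have @samp{Nov} in the sixth field, and their sizes add up as 1933 + 10809 + 22414 + 37455 + 7989 = 80600, which is the number this command prints.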
- -@node Statements/Lines, Other Features, More Complex, Getting Started -@section @command{awk} Statements Versus Lines -@cindex line break -@cindex newline - -Most often, each line in an @command{awk} program is a separate statement or -separate rule, like this: - -@example -awk '/12/ @{ print $0 @} - /21/ @{ print $0 @}' BBS-list inventory-shipped -@end example - -However, @command{gawk} ignores newlines after any of the following -symbols and keywords: - -@example -, @{ ? : || && do else -@end example - -@noindent -A newline at any other point is considered the end of the -statement.@footnote{The @samp{?} and @samp{:} referred to here are the -operators of the three-operand conditional expression described in -@ref{Conditional Exp, ,Conditional Expressions}. -Splitting lines after @samp{?} and @samp{:} is a minor @command{gawk} -extension; if @option{--posix} is specified -(@pxref{Options, , Command-Line Options}), then this extension is disabled.} - -@cindex backslash continuation -@cindex continuation of lines -@cindex line continuation -If you would like to split a single statement into two lines at a point -where a newline would terminate it, you can @dfn{continue} it by ending the -first line with a backslash character (@samp{\}). The backslash must be -the final character on the line in order to be recognized as a continuation -character. A backslash is allowed anywhere in the statement, even -in the middle of a string or regular expression. For example: - -@example -awk '/This regular expression is too long, so continue it\ - on the next line/ @{ print $1 @}' -@end example - -@noindent -@cindex portability issues -We have generally not used backslash continuation in the sample programs -in this @value{DOCUMENT}. In @command{gawk}, there is no limit on the -length of a line, so backslash continuation is never strictly necessary; -it just makes programs more readable.
For this same reason, as well as -for clarity, we have kept most statements short in the sample programs -presented throughout the @value{DOCUMENT}. Backslash continuation is -most useful when your @command{awk} program is in a separate source file -instead of entered from the command line. You should also note that -many @command{awk} implementations are more particular about where you -may use backslash continuation. For example, they may not allow you to -split a string constant using backslash continuation. Thus, for maximum -portability of your @command{awk} programs, it is best not to split your -lines in the middle of a regular expression or a string. -@c 10/2000: gawk, mawk, and current bell labs awk allow it, -@c solaris 2.7 nawk does not. Solaris /usr/xpg4/bin/awk does though! sigh. - -@cindex @command{csh} utility -@cindex @command{csh}, backslash continuation -@cindex backslash continuation, in @command{csh} -@strong{Caution:} @emph{Backslash continuation does not work as described -above with the C shell.} It works for @command{awk} programs in files and -for one-shot programs, @emph{provided} you are using a POSIX-compliant -shell, such as the Unix Bourne shell or @command{bash}. But the C shell behaves -differently! There, you must use two backslashes in a row, followed by -a newline. Note also that when using the C shell, @emph{every} newline -in your awk program must be escaped with a backslash. To illustrate: - -@example -% awk 'BEGIN @{ \ -? print \\ -? "hello, world" \ -? @}' -@print{} hello, world -@end example - -@noindent -Here, the @samp{%} and @samp{?} are the C shell's primary and secondary -prompts, analogous to the standard shell's @samp{$} and @samp{>}. - -Compare the previous example to how it is done with a POSIX-compliant shell: - -@example -$ awk 'BEGIN @{ -> print \ -> "hello, world" -> @}' -@print{} hello, world -@end example - -@command{awk} is a line-oriented language. 
Each rule's action has to -begin on the same line as the pattern. To have the pattern and action -on separate lines, you @emph{must} use backslash continuation; there -is no other way. - -@cindex backslash continuation, and comments -@cindex comments and backslash continuation -Another thing to keep in mind is that backslash continuation and -comments do not mix. As soon as @command{awk} sees the @samp{#} that -starts a comment, it ignores @emph{everything} on the rest of the -line. For example: - -@example -$ gawk 'BEGIN @{ print "dont panic" # a friendly \ -> BEGIN rule -> @}' -@error{} gawk: cmd. line:2: BEGIN rule -@error{} gawk: cmd. line:2: ^ parse error -@end example - -@noindent -In this case, it looks like the backslash would continue the comment onto the -next line. However, the backslash-newline combination is never even -noticed because it is ``hidden'' inside the comment. Thus, the -@code{BEGIN} is noted as a syntax error. - -@cindex multiple statements on one line -When @command{awk} statements within one rule are short, you might want to put -more than one of them on a line. This is accomplished by separating the statements -with a semicolon (@samp{;}). -This also applies to the rules themselves. -Thus, the program shown at the start of this @value{SECTION} -could also be written this way: - -@example -/12/ @{ print $0 @} ; /21/ @{ print $0 @} -@end example - -@noindent -@strong{Note:} The requirement that states that rules on the same line must be -separated with a semicolon was not in the original @command{awk} -language; it was added for consistency with the treatment of statements -within an action. - -@node Other Features, When, Statements/Lines, Getting Started -@section Other Features of @command{awk} - -The @command{awk} language provides a number of predefined, or -@dfn{built-in}, variables that your programs can use to get information -from @command{awk}. 
There are other variables your program can set -as well to control how @command{awk} processes your data. - -In addition, @command{awk} provides a number of built-in functions for doing -common computational and string related operations. -@command{gawk} provides built-in functions for working with timestamps, -performing bit manipulation, and for runtime string translation. - -As we develop our presentation of the @command{awk} language, we introduce -most of the variables and many of the functions. They are defined -systematically in @ref{Built-in Variables}, and -@ref{Built-in, ,Built-in Functions}. - -@node When, , Other Features, Getting Started -@section When to Use @command{awk} - -@cindex uses of @command{awk} -@cindex applications of @command{awk} -Now that you've seen some of what @command{awk} can do, -you might wonder how @command{awk} could be useful for you. By using -utility programs, advanced patterns, field separators, arithmetic -statements, and other selection criteria, you can produce much more -complex output. The @command{awk} language is very useful for producing -reports from large amounts of raw data, such as summarizing information -from the output of other utility programs like @command{ls}. -(@xref{More Complex, ,A More Complex Example}.) - -Programs written with @command{awk} are usually much smaller than they would -be in other languages. This makes @command{awk} programs easy to compose and -use. Often, @command{awk} programs can be quickly composed at your terminal, -used once, and thrown away. Because @command{awk} programs are interpreted, you -can avoid the (usually lengthy) compilation part of the typical -edit-compile-test-debug cycle of software development. - -Complex programs have been written in @command{awk}, including a complete -retargetable assembler for eight-bit microprocessors (@pxref{Glossary}, for -more information), and a microcode assembler for a special purpose Prolog -computer. 
However, @command{awk}'s capabilities are strained by tasks of -such complexity. - -If you find yourself writing @command{awk} scripts of more than, say, a few -hundred lines, you might consider using a different programming -language. Emacs Lisp is a good choice if you need sophisticated string -or pattern matching capabilities. The shell is also good at string and -pattern matching; in addition, it allows powerful use of the system -utilities. More conventional languages, such as C, C++, and Java, offer -better facilities for system programming and for managing the complexity -of large programs. Programs in these languages may require more lines -of source code than the equivalent @command{awk} programs, but they are -easier to maintain and usually run more efficiently. - -@node Regexp, Reading Files, Getting Started, Top -@chapter Regular Expressions -@cindex pattern, regular expressions -@cindex regexp -@cindex regular expression -@cindex regular expressions as patterns - -A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a -set of strings. -Because regular expressions are such a fundamental part of @command{awk} -programming, their format and use deserve a separate @value{CHAPTER}. - -A regular expression enclosed in slashes (@samp{/}) -is an @command{awk} pattern that matches every input record whose text -belongs to that set. -The simplest regular expression is a sequence of letters, numbers, or -both. Such a regexp matches any string that contains that sequence. -Thus, the regexp @samp{foo} matches any string containing @samp{foo}. -Therefore, the pattern @code{/foo/} matches any input record containing -the three characters @samp{foo} @emph{anywhere} in the record. Other -kinds of regexps let you specify more complicated classes of strings. - -@ifnotinfo -Initially, the examples in this @value{CHAPTER} are simple. -As we explain more about how -regular expressions work, we will present more complicated instances. 
-@end ifnotinfo - -@menu -* Regexp Usage:: How to Use Regular Expressions. -* Escape Sequences:: How to write non-printing characters. -* Regexp Operators:: Regular Expression Operators. -* Character Lists:: What can go between @samp{[...]}. -* GNU Regexp Operators:: Operators specific to GNU software. -* Case-sensitivity:: How to do case-insensitive matching. -* Leftmost Longest:: How much text matches. -* Computed Regexps:: Using Dynamic Regexps. -@end menu - -@node Regexp Usage, Escape Sequences, Regexp, Regexp -@section How to Use Regular Expressions - -A regular expression can be used as a pattern by enclosing it in -slashes. Then the regular expression is tested against the -entire text of each record. (Normally, it only needs -to match some part of the text in order to succeed.) For example, the -following prints the second field of each record that contains the string -@samp{foo} anywhere in it: - -@example -$ awk '/foo/ @{ print $2 @}' BBS-list -@print{} 555-1234 -@print{} 555-6699 -@print{} 555-6480 -@print{} 555-2127 -@end example - -@cindex regexp operators -@cindex string-matching operators -@cindex operators, string-matching -@cindex operators, regexp matching -@cindex @code{~} operator -@cindex @code{!~} operator -Regular expressions can also be used in matching expressions. These -expressions allow you to specify the string to match against; it need -not be the entire current input record. The two operators @samp{~} -and @samp{!~} perform regular expression comparisons. Expressions -using these operators can be used as patterns, or in @code{if}, -@code{while}, @code{for}, and @code{do} statements. -(@xref{Statements, ,Control Statements in Actions}.) -For example: - -@example -@var{exp} ~ /@var{regexp}/ -@end example - -@noindent -is true if the expression @var{exp} (taken as a string) -matches @var{regexp}. 
The following example matches, or selects, -all input records with the uppercase letter @samp{J} somewhere in the -first field: - -@example -$ awk '$1 ~ /J/' inventory-shipped -@print{} Jan 13 25 15 115 -@print{} Jun 31 42 75 492 -@print{} Jul 24 34 67 436 -@print{} Jan 21 36 64 620 -@end example - -So does this: - -@example -awk '@{ if ($1 ~ /J/) print @}' inventory-shipped -@end example - -This next example is true if the expression @var{exp} -(taken as a character string) -does @emph{not} match @var{regexp}: - -@example -@var{exp} !~ /@var{regexp}/ -@end example - -The following example matches, -or selects, all input records whose first field @emph{does not} contain -the uppercase letter @samp{J}: - -@example -$ awk '$1 !~ /J/' inventory-shipped -@print{} Feb 15 32 24 226 -@print{} Mar 15 24 34 228 -@print{} Apr 31 52 63 420 -@print{} May 16 34 29 208 -@dots{} -@end example - -@cindex regexp constant -When a regexp is enclosed in slashes, such as @code{/foo/}, we call it -a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and -@code{"foo"} is a string constant. - -@node Escape Sequences, Regexp Operators, Regexp Usage, Regexp -@section Escape Sequences - -@cindex escape sequence notation -Some characters cannot be included literally in string constants -(@code{"foo"}) or regexp constants (@code{/foo/}). -Instead, they should be represented with @dfn{escape sequences}, -which are character sequences beginning with a backslash (@samp{\}). -One use of an escape sequence is to include a double quote character in -a string constant. Because a plain double quote ends the string, you -must use @samp{\"} to represent an actual double quote character as a -part of the string. For example: - -@example -$ awk 'BEGIN @{ print "He said \"hi!\" to her." @}' -@print{} He said "hi!" to her. 
-@end example - -The backslash character itself is another character that cannot be -included normally; you must write @samp{\\} to put one backslash in the -string or regexp. Thus, the string whose contents are the two characters -@samp{"} and @samp{\} must be written @code{"\"\\"}. - -Another use of backslash is to represent unprintable characters -such as tab or newline. While there is nothing to stop you from entering most -unprintable characters directly in a string constant or regexp constant, -they may look ugly. - -The following table lists -all the escape sequences used in @command{awk} and -what they represent. Unless noted otherwise, all these escape -sequences apply to both string constants and regexp constants: - -@table @code -@item \\ -A literal backslash, @samp{\}. - -@cindex @command{awk} language, V.4 version -@cindex @code{\a} escape sequence -@item \a -The ``alert'' character, @kbd{Ctrl-g}, ASCII code 7 (BEL). -(This usually makes some sort of audible noise.) - -@cindex @code{\b} escape sequence -@item \b -Backspace, @kbd{Ctrl-h}, ASCII code 8 (BS). - -@cindex @code{\f} escape sequence -@item \f -Formfeed, @kbd{Ctrl-l}, ASCII code 12 (FF). - -@cindex @code{\n} escape sequence -@item \n -Newline, @kbd{Ctrl-j}, ASCII code 10 (LF). - -@cindex @code{\r} escape sequence -@item \r -Carriage return, @kbd{Ctrl-m}, ASCII code 13 (CR). - -@cindex @code{\t} escape sequence -@item \t -Horizontal tab, @kbd{Ctrl-i}, ASCII code 9 (HT). - -@cindex @command{awk} language, V.4 version -@cindex @code{\v} escape sequence -@item \v -Vertical tab, @kbd{Ctrl-k}, ASCII code 11 (VT). - -@cindex @code{\}@var{nnn} escape sequence (octal) -@item \@var{nnn} -The octal value @var{nnn}, where @var{nnn} stands for 1 to 3 digits -between @samp{0} and @samp{7}. For example, the code for the ASCII ESC -(escape) character is @samp{\033}. 
- -@cindex @code{\x} escape sequence -@cindex @command{awk} language, V.4 version -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@item \x@var{hh}@dots{} -The hexadecimal value @var{hh}, where @var{hh} stands for a sequence -of hexadecimal digits (@samp{0} through @samp{9}, and either @samp{A} -through @samp{F} or @samp{a} through @samp{f}). Like the same construct -in ISO C, the escape sequence continues until the first non-hexadecimal -digit is seen. However, using more than two hexadecimal digits produces -undefined results. (The @samp{\x} escape sequence is not allowed in -POSIX @command{awk}.) - -@cindex @code{\/} escape sequence -@item \/ -A literal slash (necessary for regexp constants only). -This expression is used when you want to write a regexp -constant that contains a slash. Because the regexp is delimited by -slashes, you need to escape the slash that is part of the pattern, -in order to tell @command{awk} to keep processing the rest of the regexp. - -@cindex @code{\"} escape sequence -@item \" -A literal double quote (necessary for string constants only). -This expression is used when you want to write a string -constant that contains a double quote. Because the string is delimited by -double quotes, you need to escape the quote that is part of the string, -in order to tell @command{awk} to keep processing the rest of the string. -@end table - -In @command{gawk}, a number of additional two-character sequences that begin -with a backslash have special meaning in regexps. -@xref{GNU Regexp Operators, ,@command{gawk}-Specific Regexp Operators}. - -In a regexp, a backslash before any character that is not in the above table -and not listed in -@ref{GNU Regexp Operators, ,@command{gawk}-Specific Regexp Operators}, -means that the next character should be taken literally, even if it would -normally be a regexp operator. For example, @code{/a\+b/} matches the three -characters @samp{a+b}. 
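As a minimal sketch of the rule above (it is not one of the manual's own sample programs), the following shell commands feed three made-up lines to @command{awk} and compare the escaped pattern @code{/a\+b/} with the unescaped @code{/a+b/}. The variable names @code{input}, @code{literal}, and @code{operator} are illustrative assumptions, and a POSIX-compliant shell is assumed:

```shell
# Three sample input lines (made up for this illustration).
input=$(printf 'a+b\naab\nab\n')

# Escaped plus: matches only the literal three characters "a+b".
literal=$(printf '%s\n' "$input" | awk '/a\+b/')

# Unescaped plus: "one or more a, then b", so it matches "aab" and "ab".
operator=$(printf '%s\n' "$input" | awk '/a+b/')

printf 'literal:  %s\n' "$literal"
printf 'operator: %s\n' "$operator"
```

Only the line @samp{a+b} is selected by the escaped pattern, whereas the unescaped pattern selects @samp{aab} and @samp{ab} and skips @samp{a+b}.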
- -@cindex portability issues -For complete portability, do not use a backslash before any character not -shown in the table above. - -To summarize: - -@itemize @bullet -@item -The escape sequences in the table above are always processed first, -for both string constants and regexp constants. This happens very early, -as soon as @command{awk} reads your program. - -@item -@command{gawk} processes both regexp constants and dynamic regexps -(@pxref{Computed Regexps, ,Using Dynamic Regexps}), -for the special operators listed in -@ref{GNU Regexp Operators, ,@command{gawk}-Specific Regexp Operators}. - -@item -A backslash before any other character means to treat that character -literally. -@end itemize - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Backslash Before Regular Characters -@cindex advanced notes - -@cindex common mistakes -@cindex mistakes, common -@cindex errors, common -If you place a backslash in a string constant before something that is -not one of the characters listed above, POSIX @command{awk} purposely -leaves the behavior undefined. There are two choices: - -@cindex automatic warnings -@cindex warnings, automatic -@table @asis -@item Strip the backslash out -This is what Unix @command{awk} and @command{gawk} both do. -For example, @code{"a\qc"} is the same as @code{"aqc"}. -(Because this is such an easy bug both to introduce and to miss, -@command{gawk} warns you about it. -Consider @samp{FS = @w{"[ \t]+\|[ \t]+"}} to use vertical bars -surrounded by whitespace as the field separator. There should be -two backslashes in the string: @samp{FS = @w{"[ \t]+\\|[ \t]+"}}.) -@c I did this! This is why I added the warning. - -@item Leave the backslash alone -Some other @command{awk} implementations do this. -In such implementations, @code{"a\qc"} is the same as if you had typed -@code{"a\\qc"}. 
-@end table - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Escape Sequences for Metacharacters -@cindex advanced notes - -Suppose you use an octal or hexadecimal -escape to represent a regexp metacharacter -(@pxref{Regexp Operators, , Regular Expression Operators}). -Does @command{awk} treat the character as a literal character or as a regexp -operator? - -@cindex dark corner -Historically, such characters were taken literally. -@value{DARKCORNER} -However, the POSIX standard indicates that they should be treated -as real metacharacters, which is what @command{gawk} does. -In compatibility mode (@pxref{Options, ,Command-Line Options}), -@command{gawk} treats the characters represented by octal and hexadecimal -escape sequences literally when used in regexp constants. Thus, -@code{/a\52b/} is equivalent to @code{/a\*b/}. - -@node Regexp Operators, Character Lists, Escape Sequences, Regexp -@section Regular Expression Operators -@cindex metacharacters -@cindex regular expression metacharacters -@cindex regexp operators - -You can combine regular expressions with special characters, -called @dfn{regular expression operators} or @dfn{metacharacters}, to -increase the power and versatility of regular expressions. - -The escape sequences described -@ifnotinfo -earlier -@end ifnotinfo -in @ref{Escape Sequences}, -are valid inside a regexp. They are introduced by a @samp{\}, and -are recognized and converted into the corresponding real characters as -the very first step in processing regexps. - -Here is a list of metacharacters. All characters that are not escape -sequences and that are not listed in the table stand for themselves: - -@table @code -@item \ -This is used to suppress the special meaning of a character when -matching. For example, @samp{\$} -matches the character @samp{$}. - -@cindex anchors in regexps -@cindex regexp, anchors -@cindex Texinfo -@item ^ -This matches the beginning of a string. 
For example, @samp{^@@chapter} -matches @samp{@@chapter} at the beginning of a string, and can be used -to identify chapter beginnings in Texinfo source files. -The @samp{^} is known as an @dfn{anchor}, because it anchors the pattern to -match only at the beginning of the string. - -It is important to realize that @samp{^} does not match the beginning of -a line embedded in a string. -The condition is not true in the following example: - -@example -if ("line1\nLINE 2" ~ /^L/) @dots{} -@end example - -@item $ -This is similar to @samp{^} but it matches only at the end of a string. -For example, @samp{p$} -matches a record that ends with a @samp{p}. The @samp{$} is an anchor -and does not match the end of a line embedded in a string. -The condition is not true in the following example: - -@example -if ("line1\nLINE 2" ~ /1$/) @dots{} -@end example - -@item . -This matches any single character, -@emph{including} the newline character. For example, @samp{.P} -matches any single character followed by a @samp{P} in a string. Using -concatenation, we can make a regular expression such as @samp{U.A}, that -matches any three-character sequence that begins with @samp{U} and ends -with @samp{A}. - -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -In strict POSIX mode (@pxref{Options, ,Command-Line Options}), -@samp{.} does not match the @sc{nul} -character, which is a character with all bits equal to zero. -Otherwise, @sc{nul} is just another character. Other versions of @command{awk} -may not be able to match the @sc{nul} character. 
- -@cindex character list -@cindex character set (regexp component) -@cindex character class -@cindex bracket expression -@item [@dots{}] -This is called a @dfn{character list}.@footnote{In other literature, -you may see a character list referred to as either a -@dfn{character set}, a @dfn{character class} or a @dfn{bracket expression}.} -It matches any @emph{one} of the characters that are enclosed in -the square brackets. For example, @samp{[MVX]} matches any one of -the characters @samp{M}, @samp{V}, or @samp{X}, in a string. A full -discussion of what can be inside the square brackets of a character list -is given in -@ref{Character Lists, ,Using Character Lists}. - -@cindex complemented character list -@cindex character list, complemented -@item [^ @dots{}] -This is a @dfn{complemented character list}. The first character after -the @samp{[} @emph{must} be a @samp{^}. It matches any characters -@emph{except} those in the square brackets. For example, @samp{[^awk]} -matches any character that is not an @samp{a}, a @samp{w}, -or a @samp{k}. - -@item | -This is the @dfn{alternation operator} and it is used to specify -alternatives. -The @samp{|} has the lowest precedence of all the regular -expression operators. -For example, @samp{^P|[[:digit:]]} -matches any string that matches either @samp{^P} or @samp{[[:digit:]]}. This -means it matches any string that starts with @samp{P} or contains a digit. - -The alternation applies to the largest possible regexps on either side. - -@cindex Texinfo -@item (@dots{}) -Parentheses are used for grouping in regular expressions, similar to -arithmetic. They can be used to concatenate regular expressions -containing the alternation operator, @samp{|}. For example, -@samp{@@(samp|code)\@{[^@}]+\@}} matches both @samp{@@code@{foo@}} and -@samp{@@samp@{bar@}}. -(These are Texinfo formatting control sequences.) 
- -@item * -This symbol means that the preceding regular expression should be -repeated as many times as necessary to find a match. For example, @samp{ph*} -applies the @samp{*} symbol to the preceding @samp{h} and looks for matches -of one @samp{p} followed by any number of @samp{h}s. This also matches -just @samp{p} if no @samp{h}s are present. - -The @samp{*} repeats the @emph{smallest} possible preceding expression. -(Use parentheses if you want to repeat a larger expression.) It finds -as many repetitions as possible. For example, -@samp{awk '/\(c[ad][ad]*r x\)/ @{ print @}' sample} -prints every record in @file{sample} containing a string of the form -@samp{(car x)}, @samp{(cdr x)}, @samp{(cadr x)}, and so on. -Notice the escaping of the parentheses by preceding them -with backslashes. - -@item + -This symbol is similar to @samp{*} except that the preceding expression must be -matched at least once. This means that @samp{wh+y} -would match @samp{why} and @samp{whhy}, but not @samp{wy}, whereas -@samp{wh*y} would match all three of these strings. -The following is a simpler -way of writing the last @samp{*} example: - -@example -awk '/\(c[ad]+r x\)/ @{ print @}' sample -@end example - -@item ? -This symbol is similar to @samp{*} except that the preceding expression can be -matched either once or not at all. For example, @samp{fe?d} -matches @samp{fed} and @samp{fd}, but nothing else. - -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@cindex interval expressions -@item @{@var{n}@} -@itemx @{@var{n},@} -@itemx @{@var{n},@var{m}@} -One or two numbers inside braces denote an @dfn{interval expression}. -If there is one number in the braces, the preceding regexp is repeated -@var{n} times. -If there are two numbers separated by a comma, the preceding regexp is -repeated @var{n} to @var{m} times. 
-If there is one number followed by a comma, then the preceding regexp -is repeated at least @var{n} times: - -@table @code -@item wh@{3@}y -Matches @samp{whhhy}, but not @samp{why} or @samp{whhhhy}. - -@item wh@{3,5@}y -Matches @samp{whhhy}, @samp{whhhhy}, or @samp{whhhhhy}, only. - -@item wh@{2,@}y -Matches @samp{whhy} or @samp{whhhy}, and so on. -@end table - -Interval expressions were not traditionally available in @command{awk}. -They were added as part of the POSIX standard to make @command{awk} -and @command{egrep} consistent with each other. - -However, because old programs may use @samp{@{} and @samp{@}} in regexp -constants, by default @command{gawk} does @emph{not} match interval expressions -in regexps. If either @option{--posix} or @option{--re-interval} is specified -(@pxref{Options, , Command-Line Options}), then interval expressions -are allowed in regexps. - -For new programs that use @samp{@{} and @samp{@}} in regexp constants, -it is good practice to always escape them with a backslash. Then the -regexp constants are valid and work the way you want them to, using -any version of @command{awk}.@footnote{Use two backslashes if you're -using a string constant with a regexp operator or function.} -@end table - -@cindex precedence, regexp operators -@cindex regexp operators, precedence of -In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators, -as well as the braces @samp{@{} and @samp{@}}, -have -the highest precedence, followed by concatenation, and finally by @samp{|}. -As in arithmetic, parentheses can change how operators are grouped. - -In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and @samp{?} operators -stand for themselves when there is nothing in the regexp that precedes them. -For example, @samp{/+/} matches a literal plus sign. However, many other versions of -@command{awk} treat such a usage as a syntax error. 
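The precedence rules described above can be checked with a short experiment. The following is only a sketch: the test strings and the shell variable @code{result} are made up for illustration, and a POSIX-compliant shell is assumed. It deliberately avoids interval expressions and a leading @samp{+}, because, as noted above, those are not handled the same way by every @command{awk}:

```shell
# Each comparison prints 1 for a match and 0 for no match.
result=$(awk 'BEGIN {
    # Repetition binds tighter than concatenation: ab+ means a(b+).
    print ("abab" ~ /^ab+$/)
    print ("abbb" ~ /^ab+$/)
    # Parentheses make the repetition apply to the whole group.
    print ("abab" ~ /^(ab)+$/)
    # Alternation binds loosest: ^a|b$ means (^a)|(b$).
    print ("xb" ~ /^a|b$/)
    print ("xb" ~ /^(a|b)$/)
}')
echo "$result"
```

This prints 0, 1, 1, 1, and 0, one per line. The first and third lines differ only in the parentheses, as do the fourth and fifth.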
- -If @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), -POSIX character classes and interval expressions are not available in -regular expressions. - -@node Character Lists, GNU Regexp Operators, Regexp Operators, Regexp -@section Using Character Lists - -Within a character list, a @dfn{range expression} consists of two -characters separated by a hyphen. It matches any single character that -sorts between the two characters, using the locale's -collating sequence and character set. For example, in the default C -locale, @samp{[a-dx-z]} is equivalent to @samp{[abcdxyz]}. Many locales -sort characters in dictionary order, and in these locales, -@samp{[a-dx-z]} is typically not equivalent to @samp{[abcdxyz]}; instead it -might be equivalent to @samp{[aBbCcDdxXyYz]}, for example. To obtain -the traditional interpretation of bracket expressions, you can use the C -locale by setting the @env{LC_ALL} environment variable to the value -@samp{C}. - -To include one of the characters @samp{\}, @samp{]}, @samp{-}, or @samp{^} in a -character list, put a @samp{\} in front of it. For example: - -@example -[d\]] -@end example - -@noindent -matches either @samp{d} or @samp{]}. - -@cindex @command{egrep} utility -This treatment of @samp{\} in character lists -is compatible with other @command{awk} -implementations and is also mandated by POSIX. -The regular expressions in @command{awk} are a superset -of the POSIX specification for Extended Regular Expressions (EREs). -POSIX EREs are based on the regular expressions accepted by the -traditional @command{egrep} utility. - -@cindex character class -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@dfn{Character classes} are a new feature introduced in the POSIX standard. 
-A character class is a special notation for describing -lists of characters that have a specific attribute, but the -actual characters can vary from country to country and/or -from character set to character set. For example, the notion of what -is an alphabetic character differs between the United States and France. - -A character class is only valid in a regexp @emph{inside} the -brackets of a character list. Character classes consist of @samp{[:}, -a keyword denoting the class, and @samp{:]}. Here are the character -classes defined by the POSIX standard: - -@c the regular table is commented out while trying out the multitable. -@c leave it here in case we need to go back, but make sure the text -@c still corresponds! - -@ignore -@table @code -@item [:alnum:] -Alphanumeric characters. - -@item [:alpha:] -Alphabetic characters. - -@item [:blank:] -Space and tab characters. - -@item [:cntrl:] -Control characters. - -@item [:digit:] -Numeric characters. - -@item [:graph:] -Characters that are printable and visible. -(A space is printable but not visible, whereas an @samp{a} is both.) - -@item [:lower:] -Lowercase alphabetic characters. - -@item [:print:] -Printable characters (characters that are not control characters). - -@item [:punct:] -Punctuation characters (characters that are not letters, digits, -control characters, or space characters). - -@item [:space:] -Space characters (such as space, tab, and formfeed, to name a few). - -@item [:upper:] -Uppercase alphabetic characters. - -@item [:xdigit:] -Characters that are hexadecimal digits. -@end table -@end ignore - -@multitable {@code{[:xdigit:]}} {Characters that are both printable and visible. (A space is} -@item @code{[:alnum:]} @tab Alphanumeric characters. -@item @code{[:alpha:]} @tab Alphabetic characters. -@item @code{[:blank:]} @tab Space and tab characters. -@item @code{[:cntrl:]} @tab Control characters. -@item @code{[:digit:]} @tab Numeric characters. 
-@item @code{[:graph:]} @tab Characters that are both printable and visible. -(A space is printable but not visible, whereas an @samp{a} is both.) -@item @code{[:lower:]} @tab Lowercase alphabetic characters. -@item @code{[:print:]} @tab Printable characters (characters that are not control characters). -@item @code{[:punct:]} @tab Punctuation characters (characters that are not letters, digits, -control characters, or space characters). -@item @code{[:space:]} @tab Space characters (such as space, tab, and formfeed, to name a few). -@item @code{[:upper:]} @tab Uppercase alphabetic characters. -@item @code{[:xdigit:]} @tab Characters that are hexadecimal digits. -@end multitable - -For example, before the POSIX standard, you had to write @code{/[A-Za-z0-9]/} -to match alphanumeric characters. If your -character set had other alphabetic characters in it, this would not -match them, and if your character set collated differently from -ASCII, this might not even match the ASCII alphanumeric characters. -With the POSIX character classes, you can write -@code{/[[:alnum:]]/} to match the alphabetic -and numeric characters in your character set. - -@cindex collating elements -Two additional special sequences can appear in character lists. -These apply to non-ASCII character sets, which can have single symbols -(called @dfn{collating elements}) that are represented with more than one -character. They can also have several characters that are equivalent for -@dfn{collating}, or sorting, purposes. (For example, in French, a plain ``e'' -and a grave-accented ``@`e'' are equivalent.) - -@table @asis -@cindex collating symbols -@item Collating Symbols -A @dfn{collating symbol} is a multicharacter collating element enclosed between -@samp{[.} and @samp{.]}. For example, if @samp{ch} is a collating element, -then @code{[[.ch.]]} is a regexp that matches this collating element, whereas -@code{[ch]} is a regexp that matches either @samp{c} or @samp{h}. 
- -@cindex equivalence classes -@item Equivalence Classes -An @dfn{equivalence class} is a locale-specific name for a list of -characters that are equal. The name is enclosed between -@samp{[=} and @samp{=]}. -For example, the name @samp{e} might be used to represent all of -``e,'' ``@`e,'' and ``@'e.'' In this case, @code{[[=e=]]} is a regexp -that matches any of @samp{e}, @samp{@'e}, or @samp{@`e}. -@end table - -These features are very valuable in non-English speaking locales. - -@strong{Caution:} The library functions that @command{gawk} uses for regular -expression matching currently only recognize POSIX character classes; -they do not recognize collating symbols or equivalence classes. -@c maybe one day ... - -@node GNU Regexp Operators, Case-sensitivity, Character Lists, Regexp -@section @command{gawk}-Specific Regexp Operators - -@c This section adapted (long ago) from the regex-0.12 manual - -@cindex regexp operators, GNU specific -@cindex word, regexp definition of -GNU software that deals with regular expressions provides a number of -additional regexp operators. These operators are described in this -@value{SECTION} and are specific to @command{gawk}; -they are not available in other @command{awk} implementations. -Most of the additional operators deal with word matching. -For our purposes, a @dfn{word} is a sequence of one or more letters, digits, -or underscores (@samp{_}): - -@table @code -@cindex @code{\w} regexp operator -@item \w -Matches any word-constituent character---that is, it matches any -letter, digit, or underscore. Think of it as short-hand for -@w{@code{[[:alnum:]_]}}. - -@cindex @code{\W} regexp operator -@item \W -Matches any character that is not word-constituent. -Think of it as short-hand for -@w{@code{[^[:alnum:]_]}}. - -@cindex @code{\<} regexp operator -@item \< -Matches the empty string at the beginning of a word. -For example, @code{/\<away/} matches @samp{away} but not -@samp{stowaway}. 
- -@cindex @code{\>} regexp operator -@item \> -Matches the empty string at the end of a word. -For example, @code{/stow\>/} matches @samp{stow} but not @samp{stowaway}. - -@cindex @code{\y} regexp operator -@cindex word boundaries, matching -@item \y -Matches the empty string at either the beginning or the -end of a word (i.e., the word boundar@strong{y}). For example, @samp{\yballs?\y} -matches either @samp{ball} or @samp{balls}, as a separate word. - -@cindex @code{\B} regexp operator -@item \B -Matches the empty string that occurs between two -word-constituent characters. For example, -@code{/\Brat\B/} matches @samp{crate} but it does not match @samp{dirty rat}. -@samp{\B} is essentially the opposite of @samp{\y}. -@end table - -@cindex buffer matching operators -There are two other operators that work on buffers. In Emacs, a -@dfn{buffer} is, naturally, an Emacs buffer. For other programs, -@command{gawk}'s regexp library routines consider the entire -string to match as the buffer. - -@table @code -@item \` -@cindex @code{\`} regexp operator -Matches the empty string at the -beginning of a buffer (string). - -@cindex @code{\'} regexp operator -@item \' -Matches the empty string at the -end of a buffer (string). -@end table - -Because @samp{^} and @samp{$} always work in terms of the beginning -and end of strings, these operators don't add any new capabilities -for @command{awk}. They are provided for compatibility with other -GNU software. - -In other GNU software, the word-boundary operator is @samp{\b}. However, -that conflicts with the @command{awk} language's definition of @samp{\b} -as backspace, so @command{gawk} uses a different letter. -An alternative method would have been to require two backslashes in the -GNU operators, but this was deemed too confusing. The current -method of using @samp{\y} for the GNU @samp{\b} appears to be the -lesser of two evils. - -@c NOTE!!! Keep this in sync with the same table in the summary appendix! 
-@c -@c Should really do this with file inclusion. -@cindex regexp, effect of command-line options -The various command-line options -(@pxref{Options, ,Command-Line Options}) -control how @command{gawk} interprets characters in regexps: - -@table @asis -@item No options -In the default case, @command{gawk} provides all the facilities of -POSIX regexps and the -@ifnotinfo -previously described -GNU regexp operators. -@end ifnotinfo -@ifnottex -GNU regexp operators described -in @ref{Regexp Operators, ,Regular Expression Operators}. -@end ifnottex -However, interval expressions are not supported. - -@item @code{--posix} -Only POSIX regexps are supported; the GNU operators are not special -(e.g., @samp{\w} matches a literal @samp{w}). Interval expressions -are allowed. - -@item @code{--traditional} -Traditional Unix @command{awk} regexps are matched. The GNU operators -are not special, interval expressions are not available, nor -are the POSIX character classes (@code{[[:alnum:]]} and so on). -Characters described by octal and hexadecimal escape sequences are -treated literally, even if they represent regexp metacharacters. - -@item @code{--re-interval} -Allow interval expressions in regexps, even if @option{--traditional} -has been provided. -@end table - -@node Case-sensitivity, Leftmost Longest, GNU Regexp Operators, Regexp -@section Case Sensitivity in Matching - -@cindex case sensitivity -@cindex ignoring case -Case is normally significant in regular expressions, both when matching -ordinary characters (i.e., not metacharacters) and inside character -sets. Thus, a @samp{w} in a regular expression matches only a lowercase -@samp{w} and not an uppercase @samp{W}. - -The simplest way to do a case-independent match is to use a character -list---for example, @samp{[Ww]}. However, this can be cumbersome if -you need to use it often and it can make the regular expressions harder -to read. There are two alternatives that you might prefer. 
- -One way to perform a case-insensitive match at a particular point in the -program is to convert the data to a single case, using the -@code{tolower} or @code{toupper} built-in string functions (which we -haven't discussed yet; -@pxref{String Functions, ,String Manipulation Functions}). -For example: - -@example -tolower($1) ~ /foo/ @{ @dots{} @} -@end example - -@noindent -converts the first field to lowercase before matching against it. -This works in any POSIX-compliant @command{awk}. - -@cindex differences between @command{gawk} and @command{awk} -@cindex @code{~} operator -@cindex @code{!~} operator -@cindex @code{IGNORECASE} variable -Another method, specific to @command{gawk}, is to set the variable -@code{IGNORECASE} to a nonzero value (@pxref{Built-in Variables}). -When @code{IGNORECASE} is not zero, @emph{all} regexp and string -operations ignore case. Changing the value of -@code{IGNORECASE} dynamically controls the case sensitivity of the -program as it runs. Case is significant by default because -@code{IGNORECASE} (like most variables) is initialized to zero: - -@example -x = "aB" -if (x ~ /ab/) @dots{} # this test will fail - -IGNORECASE = 1 -if (x ~ /ab/) @dots{} # now it will succeed -@end example - -In general, you cannot use @code{IGNORECASE} to make certain rules -case-insensitive and other rules case-sensitive, because there is no -straightforward way -to set @code{IGNORECASE} just for the pattern of -a particular rule.@footnote{Experienced C and C++ programmers will note -that it is possible, using something like -@samp{IGNORECASE = 1 && /foObAr/ @{ @dots{} @}} -and -@samp{IGNORECASE = 0 || /foobar/ @{ @dots{} @}}. -However, this is somewhat obscure and we don't recommend it.} -To do this, use either character lists or @code{tolower}. However, one -thing you can do with @code{IGNORECASE} only is dynamically turn -case-sensitivity on or off for all the rules at once. 
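As a minimal sketch of the @code{tolower} approach just mentioned, the following shell function makes one rule case-insensitive while leaving a second rule case-sensitive. The function name @code{mixed_case_rules} and both patterns are illustrative assumptions, not part of the original examples. Only POSIX @command{awk} features are used:

```shell
# Hypothetical helper: the function name and both patterns are illustrative.
mixed_case_rules() {
    awk '
    # Case-insensitive rule: fold the field to lowercase before matching.
    tolower($1) ~ /foo/ { print "insensitive:", $1 }
    # Case-sensitive rule: an ordinary match against the field as read.
    $1 ~ /Bar/          { print "sensitive:", $1 }
    ' "$@"
}
```

Given the three input lines @samp{FOO}, @samp{bar}, and @samp{Bar}, the first rule fires only for @samp{FOO} and the second only for @samp{Bar}. The choice is made per rule, which a single global @code{IGNORECASE} setting cannot easily do.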
- -@code{IGNORECASE} can be set on the command line or in a @code{BEGIN} rule -(@pxref{Other Arguments, ,Other Command-Line Arguments}; also -@pxref{Using BEGIN/END, ,Startup and Cleanup Actions}). -Setting @code{IGNORECASE} from the command line is a way to make -a program case-insensitive without having to edit it. - -Prior to @command{gawk} 3.0, the value of @code{IGNORECASE} -affected regexp operations only. It did not affect string comparison -with @samp{==}, @samp{!=}, and so on. -Beginning with @value{PVERSION} 3.0, both regexp and string comparison -operations are also affected by @code{IGNORECASE}. - -@cindex ISO 8859-1 -@cindex ISO Latin-1 -Beginning with @command{gawk} 3.0, -the equivalences between upper- -and lowercase characters are based on the ISO-8859-1 (ISO Latin-1) -character set. This character set is a superset of the traditional 128 -ASCII characters, that also provides a number of characters suitable -for use with European languages. - -The value of @code{IGNORECASE} has no effect if @command{gawk} is in -compatibility mode (@pxref{Options, ,Command-Line Options}). -Case is always significant in compatibility mode. - -@node Leftmost Longest, Computed Regexps, Case-sensitivity, Regexp -@section How Much Text Matches? - -@cindex leftmost longest match -@cindex matching, leftmost longest -Consider the following: - -@example -echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}' -@end example - -This example uses the @code{sub} function (which we haven't discussed yet; -@pxref{String Functions, ,String Manipulation Functions}) -to make a change to the input record. Here, the regexp @code{/a+/} -indicates ``one or more @samp{a} characters,'' and the replacement -text is @samp{<A>}. - -The input contains four @samp{a} characters. -@command{awk} (and POSIX) regular expressions always match -the leftmost, @emph{longest} sequence of input characters that can -match. 
Thus, all four @samp{a} characters are -replaced with @samp{<A>} in this example: - -@example -$ echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}' -@print{} <A>bcd -@end example - -For simple match/no-match tests, this is not so important. But when doing -text matching and substitutions with the @code{match}, @code{sub}, @code{gsub}, -and @code{gensub} functions, it is very important. -@ifinfo -@xref{String Functions, ,String Manipulation Functions}, -for more information on these functions. -@end ifinfo -Understanding this principle is also important for regexp-based record -and field splitting (@pxref{Records, ,How Input Is Split into Records}, -and also @pxref{Field Separators, ,Specifying How Fields Are Separated}). - -@node Computed Regexps, , Leftmost Longest, Regexp -@section Using Dynamic Regexps - -@cindex computed regular expressions -@cindex regular expressions, computed -@cindex dynamic regular expressions -@cindex regexp, dynamic -@cindex @code{~} operator -@cindex @code{!~} operator -The righthand side of a @samp{~} or @samp{!~} operator need not be a -regexp constant (i.e., a string of characters between slashes). It may -be any expression. The expression is evaluated and converted to a string -if necessary; the contents of the string are used as the -regexp. A regexp that is computed in this way is called a @dfn{dynamic -regexp}: - -@example -BEGIN @{ digits_regexp = "[[:digit:]]+" @} -$0 ~ digits_regexp @{ print @} -@end example - -@noindent -This sets @code{digits_regexp} to a regexp that describes one or more digits, -and tests whether the input record matches this regexp. - -@strong{Caution:} When using the @samp{~} and @samp{!~} -operators, there is a difference between a regexp constant -enclosed in slashes and a string constant enclosed in double quotes. 
-If you are going to use a string constant, you have to understand that -the string is, in essence, scanned @emph{twice}: the first time when -@command{awk} reads your program, and the second time when it goes to -match the string on the lefthand side of the operator with the pattern -on the right. This is true of any string valued expression (such as -@code{digits_regexp} shown previously), not just string constants. - -@cindex regexp constants, difference between slashes and quotes -What difference does it make if the string is -scanned twice? The answer has to do with escape sequences, and particularly -with backslashes. To get a backslash into a regular expression inside a -string, you have to type two backslashes. - -For example, @code{/\*/} is a regexp constant for a literal @samp{*}. -Only one backslash is needed. To do the same thing with a string, -you have to type @code{"\\*"}. The first backslash escapes the -second one so that the string actually contains the -two characters @samp{\} and @samp{*}. - -@cindex common mistakes -@cindex mistakes, common -@cindex errors, common -Given that you can use both regexp and string constants to describe -regular expressions, which should you use? The answer is ``regexp -constants,'' for several reasons: - -@itemize @bullet -@item -String constants are more complicated to write and -more difficult to read. Using regexp constants makes your programs -less error-prone. Not understanding the difference between the two -kinds of constants is a common source of errors. - -@item -It is more efficient to use regexp constants. @command{awk} can note -that you have supplied a regexp, and store it internally in a form that -makes pattern matching more efficient. When using a string constant, -@command{awk} must first convert the string into this internal form and -then perform the pattern matching. - -@item -Using regexp constants is better form; it shows clearly that you -intend a regexp match. 
-@end itemize - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Using @code{\n} in Character Lists of Dynamic Regexps -@cindex advanced notes -@cindex dynamic regular expressions with embedded newlines -@cindex regexp, dynamic, with embedded newlines -@cindex newlines, embedded in dynamic regexps -@cindex embedded newlines, in dynamic regexps - -Some commercial versions of @command{awk} do not allow the newline -character to be used inside a character list for a dynamic regexp: - -@example -$ awk '$0 ~ "[ \t\n]"' -@error{} awk: newline in character class [ -@error{} ]... -@error{} source line number 1 -@error{} context is -@error{} >>> <<< -@end example - -But a newline in a regexp constant works with no problem: - -@example -$ awk '$0 ~ /[ \t\n]/' -here is a sample line -@print{} here is a sample line -@kbd{Ctrl-d} -@end example - -@command{gawk} does not have this problem, and it isn't likely to -occur often in practice, but it's worth noting for future reference. - -@node Reading Files, Printing, Regexp, Top -@chapter Reading Input Files - -@cindex reading files -@cindex input -@cindex standard input -@cindex @code{FILENAME} variable -In the typical @command{awk} program, all input is read either from the -standard input (by default, this is the keyboard but often it is a pipe from another -command), or from files whose names you specify on the @command{awk} -command line. If you specify input files, @command{awk} reads them -in order, processing all the data from one before going on to the next. -The name of the current input file can be found in the built-in variable -@code{FILENAME} -(@pxref{Built-in Variables}). - -The input is read in units called @dfn{records}, and is processed by the -rules of your program one record at a time. -By default, each record is one line. Each -record is automatically split into chunks called @dfn{fields}. -This makes it more convenient for programs to work on the parts of a record. 
- -On rare occasions, you may need to use the @code{getline} command. -The @code{getline} command is valuable, both because it -can do explicit input from any number of files, and because the files -used with it do not have to be named on the @command{awk} command line -(@pxref{Getline, ,Explicit Input with @code{getline}}). - -@menu -* Records:: Controlling how data is split into records. -* Fields:: An introduction to fields. -* Non-Constant Fields:: Non-constant Field Numbers. -* Changing Fields:: Changing the Contents of a Field. -* Field Separators:: The field separator and how to change it. -* Constant Size:: Reading constant width data. -* Multiple Line:: Reading multi-line records. -* Getline:: Reading files under explicit program control - using the @code{getline} function. -@end menu - -@node Records, Fields, Reading Files, Reading Files -@section How Input Is Split into Records - -@cindex number of records, @code{NR}, @code{FNR} -@cindex @code{NR} variable -@cindex @code{FNR} variable -The @command{awk} utility divides the input for your @command{awk} -program into records and fields. -@command{awk} keeps track of the number of records that have -been read -so far -from the current input file. This value is stored in a -built-in variable called @code{FNR}. It is reset to zero when a new -file is started. Another built-in variable, @code{NR}, is the total -number of input records read so far from all @value{DF}s. It starts at zero, -but is never automatically reset to zero. - -@cindex record separator, @code{RS} -@cindex changing the record separator -@cindex record, definition of -@cindex @code{RS} variable -Records are separated by a character called the @dfn{record separator}. -By default, the record separator is the newline character. -This is why records are, by default, single lines. -A different character can be used for the record separator by -assigning the character to the built-in variable @code{RS}. 
- -Like any other variable, -the value of @code{RS} can be changed in the @command{awk} program -with the assignment operator, @samp{=} -(@pxref{Assignment Ops, ,Assignment Expressions}). -The new record-separator character should be enclosed in quotation marks, -which indicate a string constant. Often the right time to do this is -at the beginning of execution, before any input is processed, -so that the very first record is read with the proper separator. -To do this, use the special @code{BEGIN} pattern -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). -For example: - -@example -awk 'BEGIN @{ RS = "/" @} - @{ print $0 @}' BBS-list -@end example - -@noindent -changes the value of @code{RS} to @code{"/"}, before reading any input. -This is a string whose first character is a slash; as a result, records -are separated by slashes. Then the input file is read, and the second -rule in the @command{awk} program (the action with no pattern) prints each -record. Because each @code{print} statement adds a newline at the end of -its output, the effect of this @command{awk} program is to copy the input -with each slash changed to a newline. Here are the results of running -the program on @file{BBS-list}: - -@example -$ awk 'BEGIN @{ RS = "/" @} -> @{ print $0 @}' BBS-list -@print{} aardvark 555-5553 1200 -@print{} 300 B -@print{} alpo-net 555-3412 2400 -@print{} 1200 -@print{} 300 A -@print{} barfly 555-7685 1200 -@print{} 300 A -@print{} bites 555-1675 2400 -@print{} 1200 -@print{} 300 A -@print{} camelot 555-0542 300 C -@print{} core 555-2912 1200 -@print{} 300 C -@print{} fooey 555-1234 2400 -@print{} 1200 -@print{} 300 B -@print{} foot 555-6699 1200 -@print{} 300 B -@print{} macfoo 555-6480 1200 -@print{} 300 A -@print{} sdace 555-3430 2400 -@print{} 1200 -@print{} 300 A -@print{} sabafoo 555-2127 1200 -@print{} 300 C -@print{} -@end example - -@noindent -Note that the entry for the @samp{camelot} BBS is not split. 
-In the original @value{DF} -(@pxref{Sample Data Files, ,@value{DDF}s for the Examples}), -the line looks like this: - -@example -camelot 555-0542 300 C -@end example - -@noindent -It has one baud rate only, so there are no slashes in the record, -unlike the others which have two or more baud rates. -In fact, this record is treated as part of the record -for the @samp{core} BBS; the newline separating them in the output -is the original newline in the @value{DF}, not the one added by -@command{awk} when it printed the record! - -Another way to change the record separator is on the command line, -using the variable-assignment feature -(@pxref{Other Arguments, ,Other Command-Line Arguments}): - -@example -awk '@{ print $0 @}' RS="/" BBS-list -@end example - -@noindent -This sets @code{RS} to @samp{/} before processing @file{BBS-list}. - -Using an unusual character such as @samp{/} for the record separator -produces correct behavior in the vast majority of cases. However, -the following (extreme) pipeline prints a surprising @samp{1}: - -@example -$ echo | awk 'BEGIN @{ RS = "a" @} ; @{ print NF @}' -@print{} 1 -@end example - -There is one field, consisting of a newline. The value of the built-in -variable @code{NF} is the number of fields in the current record. - -@cindex dark corner -Reaching the end of an input file terminates the current input record, -even if the last character in the file is not the character in @code{RS}. -@value{DARKCORNER} - -@cindex empty string -The empty string @code{""} (a string without any characters) -has a special meaning -as the value of @code{RS}. It means that records are separated -by one or more blank lines and nothing else. -@xref{Multiple Line, ,Multiple-Line Records}, for more details. - -If you change the value of @code{RS} in the middle of an @command{awk} run, -the new value is used to delimit subsequent records, but the record -currently being processed, as well as records already processed, are not -affected. 
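A small sketch of this behavior, using only POSIX @command{awk} features, may help. The function name @code{switch_rs} and the sample input are illustrative assumptions. The first record is read with the default newline separator, and the first rule then switches @code{RS} to a slash for the records that follow:

```shell
# Hypothetical helper: the function name is illustrative.
switch_rs() {
    awk '
    # After the first record has been read, change the separator.
    NR == 1 { RS = "/" }
    # Show how each record was delimited.
    { print NR ": " $0 }
    ' "$@"
}
```

With the input produced by @samp{printf 'a\nb/c'}, this should print @samp{1: a}, @samp{2: b}, and @samp{3: c}. The first record ends at the newline, the second ends at the slash, and the third is terminated by the end of the input.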
- -@cindex @code{RT} variable -@cindex record terminator, @code{RT} -@cindex terminator, record -@cindex differences between @command{gawk} and @command{awk} -@cindex regular expressions as record separators -After the end of the record has been determined, @command{gawk} -sets the variable @code{RT} to the text in the input that matched -@code{RS}. -When using @command{gawk}, -the value of @code{RS} is not limited to a one-character -string. It can be any regular expression -(@pxref{Regexp, ,Regular Expressions}). -In general, each record -ends at the next string that matches the regular expression; the next -record starts at the end of the matching string. This general rule is -actually at work in the usual case, where @code{RS} contains just a -newline: a record ends at the beginning of the next matching string (the -next newline in the input) and the following record starts just after -the end of this string (at the first character of the following line). -The newline, because it matches @code{RS}, is not part of either record. - -When @code{RS} is a single character, @code{RT} -contains the same single character. However, when @code{RS} is a -regular expression, @code{RT} contains -the actual input text that matched the regular expression. - -The following example illustrates both of these features. -It sets @code{RS} equal to a regular expression that -matches either a newline or a series of one or more uppercase letters -with optional leading and/or trailing whitespace: - -@example -$ echo record 1 AAAA record 2 BBBB record 3 | -> gawk 'BEGIN @{ RS = "\n|( *[[:upper:]]+ *)" @} -> @{ print "Record =", $0, "and RT =", RT @}' -@print{} Record = record 1 and RT = AAAA -@print{} Record = record 2 and RT = BBBB -@print{} Record = record 3 and RT = -@print{} -@end example - -@noindent -The final line of output has an extra blank line. This is because the -value of @code{RT} is a newline, and the @code{print} statement -supplies its own terminating newline. 
-@xref{Simple Sed, ,A Simple Stream Editor}, for a more useful example -of @code{RS} as a regexp and @code{RT}. - -@cindex differences between @command{gawk} and @command{awk} -The use of @code{RS} as a regular expression and the @code{RT} -variable are @command{gawk} extensions; they are not available in -compatibility mode -(@pxref{Options, ,Command-Line Options}). -In compatibility mode, only the first character of the value of -@code{RS} is used to determine the end of the record. - -@c fakenode --- for prepinfo -@subheading Advanced Notes: @code{RS = "\0"} Is Not Portable -@cindex advanced notes -@cindex portability issues - -There are times when you might want to treat an entire @value{DF} as a -single record. The only way to make this happen is to give @code{RS} -a value that you know doesn't occur in the input file. This is hard -to do in a general way, such that a program always works for arbitrary -input files. -@c can you say `understatement' boys and girls? - -You might think that for text files, the @sc{nul} character, which -consists of a character with all bits equal to zero, is a good -value to use for @code{RS} in this case: - -@example -BEGIN @{ RS = "\0" @} # whole file becomes one record? -@end example - -@cindex differences between @command{gawk} and @command{awk} -@command{gawk} in fact accepts this, and uses the @sc{nul} -character for the record separator. -However, this usage is @emph{not} portable -to other @command{awk} implementations. - -@cindex dark corner -All other @command{awk} implementations@footnote{At least that we know -about.} store strings internally as C-style strings. C strings use the -@sc{nul} character as the string terminator. In effect, this means that -@samp{RS = "\0"} is the same as @samp{RS = ""}. -@value{DARKCORNER} - -The best way to treat a whole file as a single record is to -simply read the file in, one record at a time, concatenating each -record onto the end of the previous ones. 
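The concatenation approach can be sketched as follows. This is one possible rendering, not the only one. The function name @code{whole_file_length} and the variable @code{contents} are assumptions made for illustration, and the @code{END} rule merely reports the length of the accumulated text to show that the whole file is available as one string:

```shell
# Hypothetical helper: the function and variable names are illustrative.
whole_file_length() {
    awk '
    # Append each record, restoring the newline that was removed
    # when the record was split off.
    { contents = contents $0 "\n" }
    # All input has been read; contents now holds the entire file.
    END { print length(contents) }
    ' "$@"
}
```

For the two-line input @samp{ab} and @samp{cd}, this prints @samp{6}. Note that this sketch always appends a newline, so a file whose last line lacks one gains an extra character in @code{contents}.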
- -@node Fields, Non-Constant Fields, Records, Reading Files -@section Examining Fields - -@cindex examining fields -@cindex fields -@cindex accessing fields -When @command{awk} reads an input record, the record is -automatically separated or @dfn{parsed} by the interpreter into chunks -called @dfn{fields}. By default, fields are separated by @dfn{whitespace}, -like words in a line. -Whitespace in @command{awk} means any string of one or more spaces, -tabs, or newlines;@footnote{In POSIX @command{awk}, newlines are not -considered whitespace for separating fields.} other characters, such as -formfeed, vertical tab, etc.@: that are -considered whitespace by other languages, are @emph{not} considered -whitespace by @command{awk}. - -The purpose of fields is to make it more convenient for you to refer to -these pieces of the record. You don't have to use them---you can -operate on the whole record if you want---but fields are what make -simple @command{awk} programs so powerful. - -@cindex @code{$} field operator -@cindex field operator @code{$} -A dollar-sign (@samp{$}) is used -to refer to a field in an @command{awk} program, -followed by the number of the field you want. Thus, @code{$1} -refers to the first field, @code{$2} to the second, and so on. -(Unlike the Unix shells, the field numbers are not limited to single digits. -@code{$127} is the one hundred and twenty-seventh field in the record.) -For example, suppose the following is a line of input: - -@example -This seems like a pretty nice example. -@end example - -@noindent -Here the first field, or @code{$1}, is @samp{This}, the second field, or -@code{$2}, is @samp{seems}, and so on. Note that the last field, -@code{$7}, is @samp{example.}. Because there is no space between the -@samp{e} and the @samp{.}, the period is considered part of the seventh -field. 
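To see this splitting in action, the following sketch prints three of the fields discussed above, one per line. The function name @code{show_fields} is hypothetical; the field references are exactly those from the text:

```shell
# Hypothetical helper: the function name is illustrative.
show_fields() {
    # Print the first, second, and seventh fields of each input record.
    awk '{ print $1; print $2; print $7 }' "$@"
}
```

Feeding it the sample line with @samp{echo 'This seems like a pretty nice example.' | show_fields} prints @samp{This}, @samp{seems}, and @samp{example.} on separate lines, with the period retained as part of the seventh field.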
- -@cindex @code{NF} variable -@cindex number of fields, @code{NF} -@code{NF} is a built-in variable whose value is the number of fields -in the current record. @command{awk} automatically updates the value -of @code{NF} each time it reads a record. No matter how many fields -there are, the last field in a record can be represented by @code{$NF}. -So, @code{$NF} is the same as @code{$7}, which is @samp{example.}. -If you try to reference a field beyond the last -one (such as @code{$8} when the record has only seven fields), you get -the empty string. (If used in a numeric operation, you get zero.) - -The use of @code{$0}, which looks like a reference to the ``zeroth'' field, is -a special case: it represents the whole input record -when you are not interested in specific fields. -Here are some more examples: - -@example -$ awk '$1 ~ /foo/ @{ print $0 @}' BBS-list -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sabafoo 555-2127 1200/300 C -@end example - -@noindent -This example prints each record in the file @file{BBS-list} whose first -field contains the string @samp{foo}. The operator @samp{~} is called a -@dfn{matching operator} -(@pxref{Regexp Usage, , How to Use Regular Expressions}); -it tests whether a string (here, the field @code{$1}) matches a given regular -expression. - -By contrast, the following example -looks for @samp{foo} in @emph{the entire record} and prints the first -field and the last field for each matching input record: - -@example -$ awk '/foo/ @{ print $1, $NF @}' BBS-list -@print{} fooey B -@print{} foot B -@print{} macfoo A -@print{} sabafoo C -@end example - -@node Non-Constant Fields, Changing Fields, Fields, Reading Files -@section Non-Constant Field Numbers - -The number of a field does not need to be a constant. Any expression in -the @command{awk} language can be used after a @samp{$} to refer to a -field. 
The value of the expression specifies the field number. If the -value is a string, rather than a number, it is converted to a number. -Consider this example: - -@example -awk '@{ print $NR @}' -@end example - -@noindent -Recall that @code{NR} is the number of records read so far: one in the -first record, two in the second, etc. So this example prints the first -field of the first record, the second field of the second record, and so -on. For the twentieth record, field number 20 is printed; most likely, -the record has fewer than 20 fields, so this prints a blank line. -Here is another example of using expressions as field numbers: - -@example -awk '@{ print $(2*2) @}' BBS-list -@end example - -@command{awk} evaluates the expression @samp{(2*2)} and uses -its value as the number of the field to print. The @samp{*} sign -represents multiplication, so the expression @samp{2*2} evaluates to four. -The parentheses are used so that the multiplication is done before the -@samp{$} operation; they are necessary whenever there is a binary -operator in the field-number expression. This example, then, prints the -hours of operation (the fourth field) for every line of the file -@file{BBS-list}. (All of the @command{awk} operators are listed, in -order of decreasing precedence, in -@ref{Precedence, , Operator Precedence (How Operators Nest)}.) - -If the field number you compute is zero, you get the entire record. -Thus, @samp{$(2-2)} has the same value as @code{$0}. Negative field -numbers are not allowed; trying to reference one usually terminates -the program. (The POSIX standard does not define -what happens when you reference a negative field number. @command{gawk} -notices this and terminates your program. Other @command{awk} -implementations may behave differently.) - -As mentioned in @ref{Fields, ,Examining Fields}, -@command{awk} stores the current record's number of fields in the built-in -variable @code{NF} (also @pxref{Built-in Variables}). 
The expression -@code{$NF} is not a special feature---it is the direct consequence of -evaluating @code{NF} and using its value as a field number. - -@node Changing Fields, Field Separators, Non-Constant Fields, Reading Files -@section Changing the Contents of a Field - -@cindex fields, changing contents of -@cindex changing contents of a field -@cindex assignment to fields -The contents of a field, as seen by @command{awk}, can be changed within an -@command{awk} program; this changes what @command{awk} perceives as the -current input record. (The actual input is untouched; @command{awk} @emph{never} -modifies the input file.) -Consider this example and its output: - -@example -$ awk '@{ nboxes = $3 ; $3 = $3 - 10 -> print nboxes, $3 @}' inventory-shipped -@print{} 25 15 -@print{} 32 22 -@print{} 24 14 -@dots{} -@end example - -@noindent -The program first saves the original value of field three in the variable -@code{nboxes}. -The @samp{-} sign represents subtraction, so this program reassigns -field three, @code{$3}, as the original value of field three minus ten: -@samp{$3 - 10}. (@xref{Arithmetic Ops, ,Arithmetic Operators}.) -Then it prints the original and new values for field three. -(Someone in the warehouse made a consistent mistake while inventorying -the red boxes.) - -For this to work, the text in field @code{$3} must make sense -as a number; the string of characters must be converted to a number -for the computer to do arithmetic on it. The number resulting -from the subtraction is converted back to a string of characters that -then becomes field three. -@xref{Conversion, ,Conversion of Strings and Numbers}. - -When the value of a field is changed (as perceived by @command{awk}), the -text of the input record is recalculated to contain the new field where -the old one was. In other words, @code{$0} changes to reflect the altered -field. 
Thus, this program -prints a copy of the input file, with 10 subtracted from the second -field of each line: - -@example -$ awk '@{ $2 = $2 - 10; print $0 @}' inventory-shipped -@print{} Jan 3 25 15 115 -@print{} Feb 5 32 24 226 -@print{} Mar 5 24 34 228 -@dots{} -@end example - -It is also possible to assign contents to fields that are out -of range. For example: - -@example -$ awk '@{ $6 = ($5 + $4 + $3 + $2) -> print $6 @}' inventory-shipped -@print{} 168 -@print{} 297 -@print{} 301 -@dots{} -@end example - -@noindent -We've just created @code{$6}, whose value is the sum of fields -@code{$2}, @code{$3}, @code{$4}, and @code{$5}. The @samp{+} sign -represents addition. For the file @file{inventory-shipped}, @code{$6} -represents the total number of parcels shipped for a particular month. - -Creating a new field changes @command{awk}'s internal copy of the current -input record, which is the value of @code{$0}. Thus, if you do @samp{print $0} -after adding a field, the record printed includes the new field, with -the appropriate number of field separators between it and the previously -existing fields. - -This recomputation affects and is affected by -@code{NF} (the number of fields; @pxref{Fields, ,Examining Fields}). -It is also affected by a feature that has not been discussed yet: -the @dfn{output field separator}, @code{OFS}, -used to separate the fields (@pxref{Output Separators}). -For example, the value of @code{NF} is set to the number of the highest -field you create. - -Note, however, that merely @emph{referencing} an out-of-range field -does @emph{not} change the value of either @code{$0} or @code{NF}. -Referencing an out-of-range field only produces an empty string. For -example: - -@example -if ($(NF+1) != "") - print "can't happen" -else - print "everything is normal" -@end example - -@noindent -should print @samp{everything is normal}, because @code{NF+1} is certain -to be out of range. 
(@xref{If Statement, ,The @code{if}-@code{else} Statement}, -for more information about @command{awk}'s @code{if-else} statements. -@xref{Typing and Comparison, ,Variable Typing and Comparison Expressions}, -for more information about the @samp{!=} operator.) - -It is important to note that making an assignment to an existing field -changes the -value of @code{$0} but does not change the value of @code{NF}, -even when you assign the empty string to a field. For example: - -@example -$ echo a b c d | awk '@{ OFS = ":"; $2 = "" -> print $0; print NF @}' -@print{} a::c:d -@print{} 4 -@end example - -@noindent -The field is still there; it just has an empty value, denoted by -the two colons between @samp{a} and @samp{c}. -This example shows what happens if you create a new field: - -@example -$ echo a b c d | awk '@{ OFS = ":"; $2 = ""; $6 = "new" -> print $0; print NF @}' -@print{} a::c:d::new -@print{} 6 -@end example - -@noindent -The intervening field, @code{$5}, is created with an empty value -(indicated by the second pair of adjacent colons), -and @code{NF} is updated with the value six. - -@c FIXME: Verify that this is in POSIX -@cindex dark corner -Decrementing @code{NF} throws away the values of the fields -after the new value of @code{NF} and recomputes @code{$0}. -@value{DARKCORNER} -Here is an example: - -@example -$ echo a b c d e f | awk '@{ print "NF =", NF; -> NF = 3; print $0 @}' -@print{} NF = 6 -@print{} a b c -@end example - -@cindex portability issues -@strong{Caution:} Some versions of @command{awk} don't -rebuild @code{$0} when @code{NF} is decremented. Caveat emptor. - -@node Field Separators, Constant Size, Changing Fields, Reading Files -@section Specifying How Fields Are Separated - -@menu -* Regexp Field Splitting:: Using regexps as the field separator. -* Single Character Fields:: Making each character a separate field. -* Command Line Field Separator:: Setting @code{FS} from the command-line. 
-* Field Splitting Summary:: Some final points and a summary table. -@end menu - -@cindex @code{FS} variable -@cindex fields, separating -@cindex field separator, @code{FS} -The @dfn{field separator}, which is either a single character or a regular -expression, controls the way @command{awk} splits an input record into fields. -@command{awk} scans the input record for character sequences that -match the separator; the fields themselves are the text between the matches. - -In the examples that follow, we use the bullet symbol (@bullet{}) to -represent spaces in the output. -If the field separator is @samp{oo}, then the following line: - -@example -moo goo gai pan -@end example - -@noindent -is split into three fields: @samp{m}, @samp{@bullet{}g}, and -@samp{@bullet{}gai@bullet{}pan}. -Note the leading spaces in the values of the second and third fields. - -@cindex common mistakes -@cindex mistakes, common -@cindex errors, common -The field separator is represented by the built-in variable @code{FS}. -Shell programmers take note: @command{awk} does @emph{not} use the -name @code{IFS} that is used by the POSIX-compliant shells (such as -the Unix Bourne shell, @command{sh}, or @command{bash}). - -The value of @code{FS} can be changed in the @command{awk} program with the -assignment operator, @samp{=} (@pxref{Assignment Ops, ,Assignment Expressions}). -Often the right time to do this is at the beginning of execution -before any input has been processed, so that the very first record -is read with the proper separator. To do this, use the special -@code{BEGIN} pattern -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). -For example, here we set the value of @code{FS} to the string -@code{","}: - -@example -awk 'BEGIN @{ FS = "," @} ; @{ print $2 @}' -@end example - -@noindent -Given the input line: - -@example -John Q. 
Smith, 29 Oak St., Walamazoo, MI 42139 -@end example - -@noindent -this @command{awk} program extracts and prints the string -@samp{@bullet{}29@bullet{}Oak@bullet{}St.}. - -@cindex field separator, choice of -@cindex regular expressions as field separators -Sometimes the input data contains separator characters that don't -separate fields the way you thought they would. For instance, the -person's name in the example we just used might have a title or -suffix attached, such as: - -@example -John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139 -@end example - -@noindent -The same program would extract @samp{@bullet{}LXIX}, instead of -@samp{@bullet{}29@bullet{}Oak@bullet{}St.}. -If you were expecting the program to print the -address, you would be surprised. The moral is to choose your data layout and -separator characters carefully to prevent such problems. -(If the data is not in a form that is easy to process, perhaps you -can massage it first with a separate @command{awk} program.) - -Fields are normally separated by whitespace sequences -(spaces, tabs, and newlines), not by single spaces. Two spaces in a row do not -delimit an empty field. The default value of the field separator @code{FS} -is a string containing a single space, @w{@code{" "}}. If @command{awk} -interpreted this value in the usual way, each space character would separate -fields, so two spaces in a row would make an empty field between them. -The reason this does not happen is that a single space as the value of -@code{FS} is a special case---it is taken to specify the default manner -of delimiting fields. - -If @code{FS} is any other single character, such as @code{","}, then -each occurrence of that character separates two fields. Two consecutive -occurrences delimit an empty field. If the character occurs at the -beginning or the end of the line, that too delimits an empty field. The -space character is the only single character that does not follow these -rules. 
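The contrast between the default separator and any other single character can be checked with a small sketch. The sample input strings are made up for illustration and do not come from the examples in this section:

```shell
# Default FS (a single space): runs of whitespace separate fields,
# and leading and trailing whitespace is ignored.
default_nf=$(printf '  a   b  \n' | awk '{ print NF }')

# FS is a comma: every comma separates two fields, so the leading,
# doubled, and trailing commas each delimit an empty field.
comma_nf=$(printf ',a,,b,\n' | awk -F, '{ print NF }')

echo "$default_nf $comma_nf"
```

The first record has two fields; the second has five, three of them empty.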
- -@node Regexp Field Splitting, Single Character Fields, Field Separators, Field Separators -@subsection Using Regular Expressions to Separate Fields - -The previous @value{SUBSECTION} -discussed the use of single characters or simple strings as the -value of @code{FS}. -More generally, the value of @code{FS} may be a string containing any -regular expression. In this case, each match in the record for the regular -expression separates fields. For example, the assignment: - -@example -FS = ", \t" -@end example - -@noindent -makes every area of an input line that consists of a comma followed by a -space and a tab into a field separator. -@ifinfo -(@samp{\t} -is an @dfn{escape sequence} that stands for a tab; -@pxref{Escape Sequences}, -for the complete list of similar escape sequences.) -@end ifinfo - -For a less trivial example of a regular expression, try using -single spaces to separate fields the way single commas are used. -@code{FS} can be set to @w{@code{"[@ ]"}} (left bracket, space, right -bracket). This regular expression matches a single space and nothing else -(@pxref{Regexp, ,Regular Expressions}). - -There is an important difference between the two cases of @samp{FS = @w{" "}} -(a single space) and @samp{FS = @w{"[ \t\n]+"}} -(a regular expression matching one or more spaces, tabs, or newlines). -For both values of @code{FS}, fields are separated by @dfn{runs} -(multiple adjacent occurrences) of spaces, tabs, -and/or newlines. However, when the value of @code{FS} is @w{@code{" "}}, -@command{awk} first strips leading and trailing whitespace from -the record and then decides where the fields are. 
-For example, the following pipeline prints @samp{b}: - -@example -$ echo ' a b c d ' | awk '@{ print $2 @}' -@print{} b -@end example - -@noindent -However, this pipeline prints @samp{a} (note the extra spaces around -each letter): - -@example -$ echo ' a b c d ' | awk 'BEGIN @{ FS = "[ \t\n]+" @} -> @{ print $2 @}' -@print{} a -@end example - -@noindent -@cindex null string -@cindex empty string -In this case, the first field is @dfn{null} or empty. - -The stripping of leading and trailing whitespace also comes into -play whenever @code{$0} is recomputed. For instance, study this pipeline: - -@example -$ echo ' a b c d' | awk '@{ print; $2 = $2; print @}' -@print{} a b c d -@print{} a b c d -@end example - -@noindent -The first @code{print} statement prints the record as it was read, -with leading whitespace intact. The assignment to @code{$2} rebuilds -@code{$0} by concatenating @code{$1} through @code{$NF} together, -separated by the value of @code{OFS}. Because the leading whitespace -was ignored when finding @code{$1}, it is not part of the new @code{$0}. -Finally, the last @code{print} statement prints the new @code{$0}. - -@node Single Character Fields, Command Line Field Separator, Regexp Field Splitting, Field Separators -@subsection Making Each Character a Separate Field - -@cindex differences between @command{gawk} and @command{awk} -@cindex single-character fields -There are times when you may want to examine each character -of a record separately. This can be done in @command{gawk} by -simply assigning the null string (@code{""}) to @code{FS}. In this case, -each individual character in the record becomes a separate field. -For example: - -@example -$ echo a b | gawk 'BEGIN @{ FS = "" @} -> @{ -> for (i = 1; i <= NF; i = i + 1) -> print "Field", i, "is", $i -> @}' -@print{} Field 1 is a -@print{} Field 2 is -@print{} Field 3 is b -@end example - -@cindex dark corner -Traditionally, the behavior of @code{FS} equal to @code{""} was not defined. 
-In this case, most versions of Unix @command{awk} simply treat the entire record -as only having one field. -@value{DARKCORNER} -In compatibility mode -(@pxref{Options, ,Command-Line Options}), -if @code{FS} is the null string, then @command{gawk} also -behaves this way. - -@node Command Line Field Separator, Field Splitting Summary, Single Character Fields, Field Separators -@subsection Setting @code{FS} from the Command Line -@cindex @code{-F} option -@cindex command-line option, @code{-F} -@cindex field separator, on command line -@cindex command line, setting @code{FS} on - -@code{FS} can be set on the command line. Use the @option{-F} option to -do so. For example: - -@example -awk -F, '@var{program}' @var{input-files} -@end example - -@noindent -sets @code{FS} to the @samp{,} character. Notice that the option uses -a capital @samp{F} instead of a lowercase @option{-f}, which specifies a file -containing an @command{awk} program. Case is significant in command-line -options: -the @option{-F} and @option{-f} options have nothing to do with each other. -You can use both options at the same time to set the @code{FS} variable -@emph{and} get an @command{awk} program from a file. - -The value used for the argument to @option{-F} is processed in exactly the -same way as assignments to the built-in variable @code{FS}. -Any special characters in the field separator must be escaped -appropriately. For example, to use a @samp{\} as the field separator -on the command line, you would have to type: - -@example -# same as FS = "\\" -awk -F\\\\ '@dots{}' files @dots{} -@end example - -@noindent -Because @samp{\} is used for quoting in the shell, @command{awk} sees -@samp{-F\\}. Then @command{awk} processes the @samp{\\} for escape -characters (@pxref{Escape Sequences}), finally yielding -a single @samp{\} to use for the field separator. 
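As a sketch of the two layers of escape processing, the following three forms should all hand awk the same one-character separator. The sample input a\b is an assumption made for illustration. Single quotes stop the shell from removing backslashes, so only two are needed in that form:

```shell
# Unquoted: the shell turns \\\\ into \\, and awk turns \\ into \.
unquoted=$(printf '%s\n' 'a\b' | awk -F\\\\ '{ print $2 }')

# Single-quoted: the shell passes \\ through untouched.
quoted=$(printf '%s\n' 'a\b' | awk -F'\\' '{ print $2 }')

# Assigned inside the program, where only awk's own escape
# processing applies.
assigned=$(printf '%s\n' 'a\b' | awk 'BEGIN { FS = "\\" } { print $2 }')

echo "$unquoted $quoted $assigned"
```

Each command should print the text after the backslash, so the final line shows the same letter three times.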
- -@cindex historical features -As a special case, in compatibility mode -(@pxref{Options, ,Command-Line Options}), -if the argument to @option{-F} is @samp{t}, then @code{FS} is set to -the tab character. If you type @samp{-F\t} at the -shell, without any quotes, the @samp{\} gets deleted, so @command{awk} -figures that you really want your fields to be separated with tabs and -not @samp{t}s. Use @samp{-v FS="t"} or @samp{-F"[t]"} on the command line -if you really do want to separate your fields with @samp{t}s. - -For example, let's use an @command{awk} program file called @file{baud.awk} -that contains the pattern @code{/300/} and the action @samp{print $1}: - -@example -/300/ @{ print $1 @} -@end example - -Let's also set @code{FS} to be the @samp{-} character and run the -program on the file @file{BBS-list}. The following command prints a -list of the names of the bulletin boards that operate at 300 baud and -the first three digits of their phone numbers: - -@c tweaked to make the tex output look better in @smallbook -@example -$ awk -F- -f baud.awk BBS-list -@print{} aardvark 555 -@print{} alpo -@print{} barfly 555 -@print{} bites 555 -@print{} camelot 555 -@print{} core 555 -@print{} fooey 555 -@print{} foot 555 -@print{} macfoo 555 -@print{} sdace 555 -@print{} sabafoo 555 -@end example - -@noindent -Note the second line of output. The second line -in the original file looked like this: - -@example -alpo-net 555-3412 2400/1200/300 A -@end example - -The @samp{-} as part of the system's name was used as the field -separator, instead of the @samp{-} in the phone number that was -originally intended. This demonstrates why you have to be careful in -choosing your field and record separators. - -Perhaps the most common use of a single character as the field -separator occurs when processing the Unix system password file. -On many Unix systems, each user has a separate entry in the system password -file, one line per user. 
The information in these lines is separated -by colons. The first field is the user's logon name and the second is -the user's (encrypted or shadow) password. A password file entry might look -like this: - -@cindex Robbins, Arnold -@example -arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/bash -@end example - -The following program searches the system password file and prints -the entries for users who have no password: - -@example -awk -F: '$2 == ""' /etc/passwd -@end example - -@node Field Splitting Summary, , Command Line Field Separator, Field Separators -@subsection Field Splitting Summary - -The following -table -summarizes how fields are split, based on the -value of @code{FS}. (@samp{==} means ``is equal to.'') - -@table @code -@item FS == " " -Fields are separated by runs of whitespace. Leading and trailing -whitespace are ignored. This is the default. - -@item FS == @var{any other single character} -Fields are separated by each occurrence of the character. Multiple -successive occurrences delimit empty fields, as do leading and -trailing occurrences. -The character can even be a regexp metacharacter; it does not need -to be escaped. - -@item FS == @var{regexp} -Fields are separated by occurrences of characters that match @var{regexp}. -Leading and trailing matches of @var{regexp} delimit empty fields. - -@item FS == "" -Each individual character in the record becomes a separate field. -(This is a @command{gawk} extension; it is not specified by the -POSIX standard.) -@end table - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Changing @code{FS} Does Not Affect the Fields - -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -According to the POSIX standard, @command{awk} is supposed to behave -as if each record is split into fields at the time it is read. 
-In particular, this means that if you change the value of @code{FS} -after a record is read, the value of the fields (i.e., how they were split) -should reflect the old value of @code{FS}, not the new one. - -@cindex dark corner -@cindex @command{sed} utility -@cindex stream editor -However, many implementations of @command{awk} do not work this way. Instead, -they defer splitting the fields until a field is actually -referenced. The fields are split -using the @emph{current} value of @code{FS}! -@value{DARKCORNER} -This behavior can be difficult -to diagnose. The following example illustrates the difference -between the two methods. -(The @command{sed}@footnote{The @command{sed} utility is a ``stream editor.'' -Its behavior is also defined by the POSIX standard.} -command prints just the first line of @file{/etc/passwd}.) - -@example -sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}' -@end example - -@noindent -which usually prints: - -@example -root -@end example - -@noindent -on an incorrect implementation of @command{awk}, while @command{gawk} -prints something like: - -@example -root:nSijPlPhZZwgE:0:0:Root:/: -@end example - -@node Constant Size, Multiple Line, Field Separators, Reading Files -@section Reading Fixed-Width Data - -@ifnotinfo -@strong{Note:} This @value{SECTION} discusses an advanced -feature of @command{gawk}. If you are a novice @command{awk} user, -you might want to skip it on the first reading. -@end ifnotinfo - -@ifinfo -(This @value{SECTION} discusses an advanced feature of @command{gawk}. -If you are a novice @command{awk} user, you might want to skip it on -the first reading.) -@end ifinfo - -@command{gawk} @value{PVERSION} 2.13 introduced a facility for dealing with -fixed-width fields with no distinctive field separator.
For example, -data of this nature arises in the input for old Fortran programs where -numbers are run together, or in the output of programs that did not -anticipate the use of their output as input for other programs. - -An example of the latter is a table where all the columns are lined up by -the use of a variable number of spaces and @emph{empty fields are just -spaces}. Clearly, @command{awk}'s normal field splitting based on @code{FS} -does not work well in this case. Although a portable @command{awk} program -can use a series of @code{substr} calls on @code{$0} -(@pxref{String Functions, ,String Manipulation Functions}), -this is awkward and inefficient for a large number of fields. - -@cindex fatal errors -@cindex @command{w} utility -The splitting of an input record into fixed-width fields is specified by -assigning a string containing space-separated numbers to the built-in -variable @code{FIELDWIDTHS}. Each number specifies the width of the field, -@emph{including} columns between fields. If you want to ignore the columns -between fields, you can specify the width as a separate field that is -subsequently ignored. -It is a fatal error to supply a field width that is not a positive number. -The following data is the output of the Unix @command{w} utility. It is useful -to illustrate the use of @code{FIELDWIDTHS}: - -@example -@group - 10:06pm up 21 days, 14:04, 23 users -User tty login@ idle JCPU PCPU what -hzuo ttyV0 8:58pm 9 5 vi p24.tex -hzang ttyV3 6:37pm 50 -csh -eklye ttyV5 9:53pm 7 1 em thes.tex -dportein ttyV6 8:17pm 1:47 -csh -gierd ttyD3 10:00pm 1 elm -dave ttyD4 9:47pm 4 4 w -brent ttyp0 26Jun91 4:46 26:46 4:41 bash -dave ttyq4 26Jun9115days 46 46 wnewmail -@end group -@end example - -The following program takes the above input, converts the idle time to -number of seconds, and prints out the first two fields and the calculated -idle time. - -@strong{Note:} -This program uses a number of @command{awk} features that -haven't been introduced yet. 
- -@example -BEGIN @{ FIELDWIDTHS = "9 6 10 6 7 7 35" @} -NR > 2 @{ - idle = $4 - sub(/^ */, "", idle) # strip leading spaces - if (idle == "") - idle = 0 - if (idle ~ /:/) @{ - split(idle, t, ":") - idle = t[1] * 60 + t[2] - @} - if (idle ~ /days/) - idle *= 24 * 60 * 60 - - print $1, $2, idle -@} -@end example - -Running the program on the data produces the following results: - -@example -hzuo ttyV0 0 -hzang ttyV3 50 -eklye ttyV5 0 -dportein ttyV6 107 -gierd ttyD3 1 -dave ttyD4 0 -brent ttyp0 286 -dave ttyq4 1296000 -@end example - -Another (possibly more practical) example of fixed-width input data -is the input from a deck of balloting cards. In some parts of -the United States, voters mark their choices by punching holes in computer -cards. These cards are then processed to count the votes for any particular -candidate or on any particular issue. Because a voter may choose not to -vote on some issue, any column on the card may be empty. An @command{awk} -program for processing such data could use the @code{FIELDWIDTHS} feature -to simplify reading the data. (Of course, getting @command{gawk} to run on -a system with card readers is another story!) - -@ignore -Exercise: Write a ballot card reading program -@end ignore - -Assigning a value to @code{FS} causes @command{gawk} to return to using -@code{FS} for field splitting. Use @samp{FS = FS} to make this happen, -without having to know the current value of @code{FS}. -In order to tell which kind of field splitting is in effect, -use @code{PROCINFO["FS"]} -(@pxref{Auto-set, ,Built-in Variables That Convey Information}). 
-The value is @code{"FS"} if regular field splitting is being used, -or it is @code{"FIELDWIDTHS"} if fixed-width field splitting is being used: - -@example -if (PROCINFO["FS"] == "FS") - @var{regular field splitting} @dots{} -else - @var{fixed-width field splitting} @dots{} -@end example - -This information is useful when writing a function -that needs to temporarily change @code{FS} or @code{FIELDWIDTHS}, -read some records, and then restore the original settings -(@pxref{Passwd Functions, ,Reading the User Database}, -for an example of such a function). - -@node Multiple Line, Getline, Constant Size, Reading Files -@section Multiple-Line Records - -@cindex multiple line records -@cindex input, multiple line records -@cindex reading files, multiple line records -@cindex records, multiple line -In some databases, a single line cannot conveniently hold all the -information in one entry. In such cases, you can use multiline -records. The first step in doing this is to choose your data format. - -One technique is to use an unusual character or string to separate -records. For example, you could use the formfeed character (written -@samp{\f} in @command{awk}, as in C) to separate them, making each record -a page of the file. To do this, just set the variable @code{RS} to -@code{"\f"} (a string containing the formfeed character). Any -other character could equally well be used, as long as it won't be part -of the data in a record. - -Another technique is to have blank lines separate records. By a special -dispensation, an empty string as the value of @code{RS} indicates that -records are separated by one or more blank lines. When @code{RS} is set -to the empty string, each record always ends at the first blank line -encountered. The next record doesn't start until the first non-blank -line that follows. No matter how many blank lines appear in a row, they -all act as one record separator. 
-(Blank lines must be completely empty; lines that contain only -whitespace do not count.) - -@cindex leftmost longest match -@cindex matching, leftmost longest -You can achieve the same effect as @samp{RS = ""} by assigning the -string @code{"\n\n+"} to @code{RS}. This regexp matches the newline -at the end of the record and one or more blank lines after the record. -In addition, a regular expression always matches the longest possible -sequence when there is a choice -(@pxref{Leftmost Longest, ,How Much Text Matches?}). -So the next record doesn't start until -the first non-blank line that follows---no matter how many blank lines -appear in a row, they are considered one record separator. - -@cindex dark corner -There is an important difference between @samp{RS = ""} and -@samp{RS = "\n\n+"}. In the first case, leading newlines in the input -@value{DF} are ignored, and if a file ends without extra blank lines -after the last record, the final newline is removed from the record. -In the second case, this special processing is not done. -@value{DARKCORNER} - -Now that the input is separated into records, the second step is to -separate the fields in the record. One way to do this is to divide each -of the lines into fields in the normal manner. This happens by default -as the result of a special feature. When @code{RS} is set to the empty -string, the newline character @emph{always} acts as a field separator. -This is in addition to whatever field separations result from @code{FS}. - -The original motivation for this special exception was probably to provide -useful behavior in the default case (i.e., @code{FS} is equal -to @w{@code{" "}}). This feature can be a problem if you really don't -want the newline character to separate fields, because there is no way to -prevent it. However, you can work around this by using the @code{split} -function to break up the record manually -(@pxref{String Functions, ,String Manipulation Functions}). 
- -Another way to separate fields is to -put each field on a separate line: to do this, just set the -variable @code{FS} to the string @code{"\n"}. (This simple regular -expression matches a single newline.) -A practical example of a @value{DF} organized this way might be a mailing -list, where each entry is separated by blank lines. Consider a mailing -list in a file named @file{addresses}, that looks like this: - -@example -Jane Doe -123 Main Street -Anywhere, SE 12345-6789 - -John Smith -456 Tree-lined Avenue -Smallville, MW 98765-4321 -@dots{} -@end example - -@noindent -A simple program to process this file is as follows: - -@example -# addrs.awk --- simple mailing list program - -# Records are separated by blank lines. -# Each line is one field. -BEGIN @{ RS = "" ; FS = "\n" @} - -@{ - print "Name is:", $1 - print "Address is:", $2 - print "City and State are:", $3 - print "" -@} -@end example - -Running the program produces the following output: - -@example -$ awk -f addrs.awk addresses -@print{} Name is: Jane Doe -@print{} Address is: 123 Main Street -@print{} City and State are: Anywhere, SE 12345-6789 -@print{} -@print{} Name is: John Smith -@print{} Address is: 456 Tree-lined Avenue -@print{} City and State are: Smallville, MW 98765-4321 -@print{} -@dots{} -@end example - -@xref{Labels Program, ,Printing Mailing Labels}, for a more realistic -program that deals with address lists. -The following -table -summarizes how records are split, based on the -value of -@ifinfo -@code{RS}. -(@samp{==} means ``is equal to.'') -@end ifinfo -@ifnotinfo -@code{RS}: -@end ifnotinfo - -@table @code -@item RS == "\n" -Records are separated by the newline character (@samp{\n}). In effect, -every line in the @value{DF} is a separate record, including blank lines. -This is the default. - -@item RS == @var{any single character} -Records are separated by each occurrence of the character. Multiple -successive occurrences delimit empty records. 
- -@item RS == "" -Records are separated by runs of blank lines. The newline character -always serves as a field separator, in addition to whatever value -@code{FS} may have. Leading and trailing newlines in a file are ignored. - -@item RS == @var{regexp} -Records are separated by occurrences of characters that match @var{regexp}. -Leading and trailing matches of @var{regexp} delimit empty records. -(This is a @command{gawk} extension; it is not specified by the -POSIX standard.) -@end table - -@cindex @code{RT} variable -In all cases, @command{gawk} sets @code{RT} to the input text that matched the -value specified by @code{RS}. - -@node Getline, , Multiple Line, Reading Files -@section Explicit Input with @code{getline} - -@cindex @code{getline} built-in function -@cindex input, explicit -@cindex explicit input -@cindex input, @code{getline} command -@cindex reading files, @code{getline} command -So far we have been getting our input data from @command{awk}'s main -input stream---either the standard input (usually your terminal, sometimes -the output from another program) or the -files specified on the command line. The @command{awk} language has a -special built-in command called @code{getline} that -can be used to read input under your explicit control. - -The @code{getline} command is used in several different ways and should -@emph{not} be used by beginners. -The examples that follow the explanation of the @code{getline} command -include material that has not been covered yet. Therefore, come back -and study the @code{getline} command @emph{after} you have reviewed the -rest of this @value{DOCUMENT} and have a good knowledge of how @command{awk} works. - -@cindex @code{ERRNO} variable -@cindex differences between @command{gawk} and @command{awk} -@cindex @code{getline}, return values -The @code{getline} command returns one if it finds a record and zero if -the end of the file is encountered.
If there is some error in getting -a record, such as a file that cannot be opened, then @code{getline} -returns @minus{}1. In this case, @command{gawk} sets the variable -@code{ERRNO} to a string describing the error that occurred. - -In the following examples, @var{command} stands for a string value that -represents a shell command. - -@menu -* Plain Getline:: Using @code{getline} with no arguments. -* Getline/Variable:: Using @code{getline} into a variable. -* Getline/File:: Using @code{getline} from a file. -* Getline/Variable/File:: Using @code{getline} into a variable from a - file. -* Getline/Pipe:: Using @code{getline} from a pipe. -* Getline/Variable/Pipe:: Using @code{getline} into a variable from a - pipe. -* Getline/Coprocess:: Using @code{getline} from a coprocess. -* Getline/Variable/Coprocess:: Using @code{getline} into a variable from a - coprocess. -* Getline Notes:: Important things to know about @code{getline}. -* Getline Summary:: Summary of @code{getline} Variants. -@end menu - -@node Plain Getline, Getline/Variable, Getline, Getline -@subsection Using @code{getline} with No Arguments - -The @code{getline} command can be used without arguments to read input -from the current input file. All it does in this case is read the next -input record and split it up into fields. This is useful if you've -finished processing the current record, but want to do some special -processing @emph{right now} on the next record. 
Here's an -example: - -@example -@{ - if ((t = index($0, "/*")) != 0) @{ - # value of `tmp' will be "" if t is 1 - tmp = substr($0, 1, t - 1) - u = index(substr($0, t + 2), "*/") - while (u == 0) @{ - if (getline <= 0) @{ - m = "unexpected EOF or error" - m = (m ": " ERRNO) - print m > "/dev/stderr" - exit - @} - t = -1 - u = index($0, "*/") - @} - # substr expression will be "" if */ - # occurred at end of line - $0 = tmp substr($0, t + u + 3) - @} - print $0 -@} -@end example - -This @command{awk} program deletes all C-style comments (@samp{/* @dots{} -*/}) from the input. By replacing the @samp{print $0} with other -statements, you could perform more complicated processing on the -decommented input, such as searching for matches of a regular -expression. (This program has a subtle problem---it does not work if one -comment ends and another begins on the same line.) - -@ignore -Exercise, -write a program that does handle multiple comments on the line. -@end ignore - -This form of the @code{getline} command sets @code{NF}, -@code{NR}, @code{FNR}, and the value of @code{$0}. - -@strong{Note:} The new value of @code{$0} is used to test -the patterns of any subsequent rules. The original value -of @code{$0} that triggered the rule that executed @code{getline} -is lost. -By contrast, the @code{next} statement reads a new record -but immediately begins processing it normally, starting with the first -rule in the program. @xref{Next Statement, ,The @code{next} Statement}. - -@node Getline/Variable, Getline/File, Plain Getline, Getline -@subsection Using @code{getline} into a Variable - -You can use @samp{getline @var{var}} to read the next record from -@command{awk}'s input into the variable @var{var}. No other processing is -done. -For example, suppose the next line is a comment or a special string, -and you want to read it without triggering -any rules.
This form of @code{getline} allows you to read that line -and store it in a variable so that the main -read-a-line-and-check-each-rule loop of @command{awk} never sees it. -The following example swaps every two lines of input. -The program is as follows: - -@example -@{ - if ((getline tmp) > 0) @{ - print tmp - print $0 - @} else - print $0 -@} -@end example - -@noindent -It takes the following list: - -@example -wan -tew -free -phore -@end example - -@noindent -and produces these results: - -@example -tew -wan -phore -free -@end example - -The @code{getline} command used in this way sets only the variables -@code{NR} and @code{FNR} (and of course, @var{var}). The record is not -split into fields, so the values of the fields (including @code{$0}) and -the value of @code{NF} do not change. - -@node Getline/File, Getline/Variable/File, Getline/Variable, Getline -@subsection Using @code{getline} from a File - -@cindex input redirection -@cindex redirection of input -@cindex @code{<} I/O operator -Use @samp{getline < @var{file}} to read the next record from @var{file}. -Here @var{file} is a string-valued expression that -specifies the @value{FN}. @samp{< @var{file}} is called a @dfn{redirection} -because it directs input to come from a different place. -For example, the following -program reads its input record from the file @file{secondary.input} when it -encounters a first field with a value equal to 10 in the current input -file: - -@example -@{ - if ($1 == 10) @{ - getline < "secondary.input" - print - @} else - print -@} -@end example - -Because the main input stream is not used, the values of @code{NR} and -@code{FNR} are not changed. However, the record it reads is split into fields in -the normal manner, so the values of @code{$0} and the other fields are -changed, resulting in a new value of @code{NF}. 
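The effect on the built-in variables can be observed with a short sketch. The temporary file stands in for secondary.input and its contents are invented for the example; mktemp is assumed to be available:

```shell
side=$(mktemp)                 # hypothetical secondary input file
printf 'x y z\n' > "$side"

# One record arrives on the main input; getline then replaces $0
# from the side file. NR stays at 1, but NF and $0 are new.
result=$(printf 'one\n' | awk -v f="$side" '{
    before = NR
    getline < f
    print before, NR, NF, $0
}')
rm -f "$side"
echo "$result"
```

The output shows NR as 1 both before and after the call, followed by the new field count and the new record.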
- -@c Thanks to Paul Eggert for initial wording here -According to POSIX, @samp{getline < @var{expression}} is ambiguous if -@var{expression} contains unparenthesized operators other than -@samp{$}; for example, @samp{getline < dir "/" file} is ambiguous -because the concatenation operator is not parenthesized. You should -write it as @samp{getline < (dir "/" file)} if you want your program -to be portable to other @command{awk} implementations. -(It happens that @command{gawk} gets it right, but you should not -rely on this. Parentheses make it easier to read.) - -@node Getline/Variable/File, Getline/Pipe, Getline/File, Getline -@subsection Using @code{getline} into a Variable from a File - -Use @samp{getline @var{var} < @var{file}} to read input -from the file -@var{file}, and put it in the variable @var{var}. As above, @var{file} -is a string-valued expression that specifies the file from which to read. - -In this version of @code{getline}, none of the built-in variables are -changed and the record is not split into fields. The only variable -changed is @var{var}. -For example, the following program copies all the input files to the -output, except for records that say @w{@samp{@@include @var{filename}}}. -Such a record is replaced by the contents of the file -@var{filename}: - -@example -@{ - if (NF == 2 && $1 == "@@include") @{ - while ((getline line < $2) > 0) - print line - close($2) - @} else - print -@} -@end example - -Note here how the name of the extra input file is not built into -the program; it is taken directly from the data, from the second field on -the @samp{@@include} line. - -The @code{close} function is called to ensure that if two identical -@samp{@@include} lines appear in the input, the entire specified file is -included twice. -@xref{Close Files And Pipes, ,Closing Input and Output Redirections}. 
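The role of @code{close} can be seen in isolation. This is a minimal sketch, assuming a POSIX shell with @command{mktemp} and any POSIX @command{awk}; the file contents and the names @code{inc}, @code{n1}, @code{n2}, @code{n3}, and @code{reread_result} are hypothetical. It counts how many records each of three read loops obtains from the same file:

```shell
# Hypothetical two-line file to be read repeatedly.
inc=$(mktemp)
printf 'one\ntwo\n' > "$inc"

reread_result=$(awk -v f="$inc" 'BEGIN {
    while ((getline line < f) > 0) n1++   # first pass reads both records
    while ((getline line < f) > 0) n2++   # file is still open at EOF: nothing to read
    close(f)                              # next getline reopens the file
    while ((getline line < f) > 0) n3++   # reads both records again
    print n1 + 0, n2 + 0, n3 + 0
}')
rm -f "$inc"
printf '%s\n' "$reread_result"
```

Only the loop that follows the call to @code{close} sees the file from the beginning again, which is why the @samp{@@include} program closes @code{$2} after each inclusion.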
- -One deficiency of this program is that it does not process nested -@samp{@@include} statements -(i.e., @samp{@@include} statements in included files) -the way a true macro preprocessor would. -@xref{Igawk Program, ,An Easy Way to Use Library Functions}, for a program -that does handle nested @samp{@@include} statements. - -@node Getline/Pipe, Getline/Variable/Pipe, Getline/Variable/File, Getline -@subsection Using @code{getline} from a Pipe - -@cindex @code{|} I/O operator -@cindex input pipeline -@cindex pipeline, input -The output of a command can also be piped into @code{getline}, using -@samp{@var{command} | getline}. In -this case, the string @var{command} is run as a shell command and its output -is piped into @command{awk} to be used as input. This form of @code{getline} -reads one record at a time from the pipe. -For example, the following program copies its input to its output, except for -lines that begin with @samp{@@execute}, which are replaced by the output -produced by running the rest of the line as a shell command: - -@example -@{ - if ($1 == "@@execute") @{ - tmp = substr($0, 10) - while ((tmp | getline) > 0) - print - close(tmp) - @} else - print -@} -@end example - -@noindent -The @code{close} function is called to ensure that if two identical -@samp{@@execute} lines appear in the input, the command is run for -each one. -@ifnottex -@xref{Close Files And Pipes, ,Closing Input and Output Redirections}. -@end ifnottex -@c Exercise!! -@c This example is unrealistic, since you could just use system -Given the input: - -@example -foo -bar -baz -@@execute who -bletch -@end example - -@noindent -the program might produce: - -@cindex Robbins, Bill -@cindex Robbins, Miriam -@cindex Robbins, Arnold -@example -foo -bar -baz -arnold ttyv0 Jul 13 14:22 -miriam ttyp0 Jul 13 14:23 (murphy:0) -bill ttyp1 Jul 13 14:23 (murphy:0) -bletch -@end example - -@noindent -Notice that this program ran the command @command{who} and printed the result. 
-(If you try this program yourself, you will of course get different results, -depending upon who is logged in on your system.) - -This variation of @code{getline} splits the record into fields, sets the -value of @code{NF} and recomputes the value of @code{$0}. The values of -@code{NR} and @code{FNR} are not changed. - -@c Thanks to Paul Eggert for initial wording here -According to POSIX, @samp{@var{expression} | getline} is ambiguous if -@var{expression} contains unparenthesized operators other than -@samp{$}---for example, @samp{@w{"echo "} "date" | getline} is ambiguous -because the concatenation operator is not parenthesized. You should -write it as @samp{(@w{"echo "} "date") | getline} if you want your program -to be portable to other @command{awk} implementations. -@ifinfo -(It happens that @command{gawk} gets it right, but you should not -rely on this. Parentheses make it easier to read, anyway.) -@end ifinfo - -@node Getline/Variable/Pipe, Getline/Coprocess, Getline/Pipe, Getline -@subsection Using @code{getline} into a Variable from a Pipe - -When you use @samp{@var{command} | getline @var{var}}, the -output of @var{command} is sent through a pipe to -@code{getline} and into the variable @var{var}. For example, the -following program reads the current date and time into the variable -@code{current_time}, using the @command{date} utility, and then -prints it: - -@example -BEGIN @{ - "date" | getline current_time - close("date") - print "Report printed on " current_time -@} -@end example - -In this version of @code{getline}, none of the built-in variables are -changed and the record is not split into fields. - -@ifinfo -@c Thanks to Paul Eggert for initial wording here -According to POSIX, @samp{@var{expression} | getline @var{var}} is ambiguous if -@var{expression} contains unparenthesized operators other than -@samp{$}; for example, @samp{@w{"echo "} "date" | getline @var{var}} is ambiguous -because the concatenation operator is not parenthesized. 
You should -write it as @samp{(@w{"echo "} "date") | getline @var{var}} if you want your -program to be portable to other @command{awk} implementations. -(It happens that @command{gawk} gets it right, but you should not -rely on this. Parentheses make it easier to read, anyway.) -@end ifinfo - -@node Getline/Coprocess, Getline/Variable/Coprocess, Getline/Variable/Pipe, Getline -@subsection Using @code{getline} from a Coprocess -@cindex coprocess -@cindex @code{|&} I/O operator -@cindex differences between @command{gawk} and @command{awk} - -Input into @code{getline} from a pipe is a one-way operation. -The command that is started with @samp{@var{command} | getline} only -sends data @emph{to} your @command{awk} program. - -On occasion, you might want to send data to another program -for processing and then read the results back. -@command{gawk} allows you to start a @dfn{coprocess}, with which two-way -communications are possible. This is done with the @samp{|&} -operator. -Typically, you write data to the coprocess first, and then -read results back, as shown in the following: - -@example -print "@var{some query}" |& "db_server" -"db_server" |& getline -@end example - -@noindent -which sends a query to @command{db_server} and then reads the results. - -The values of @code{NR} and -@code{FNR} are not changed, -because the main input stream is not used. -However, the record is split into fields in -the normal manner, thus changing the values of @code{$0}, the other fields, -and of @code{NF}. - -Coprocesses are an advanced feature. They are discussed here only because -this is the @value{SECTION} on @code{getline}. -@xref{Two-way I/O, ,Two-Way Communications with Another Process}, -where coprocesses are discussed in more detail.
- -@node Getline/Variable/Coprocess, Getline Notes, Getline/Coprocess, Getline -@subsection Using @code{getline} into a Variable from a Coprocess - -When you use @samp{@var{command} |& getline @var{var}}, the output from -the coprocess @var{command} is sent through a two-way pipe to @code{getline} -and into the variable @var{var}. - -In this version of @code{getline}, none of the built-in variables are -changed and the record is not split into fields. The only variable -changed is @var{var}. - -@ifinfo -Coprocesses are an advanced feature. They are discussed here only because -this is the @value{SECTION} on @code{getline}. -@xref{Two-way I/O, ,Two-Way Communications with Another Process}, -where coprocesses are discussed in more detail. -@end ifinfo - -@node Getline Notes, Getline Summary, Getline/Variable/Coprocess, Getline -@subsection Points About @code{getline} to Remember -Here are some miscellaneous points about @code{getline} that -you should bear in mind: - -@itemize @bullet -@item -When @code{getline} changes the value of @code{$0} and @code{NF}, -@command{awk} does @emph{not} automatically jump to the start of the -program and start testing the new record against every pattern. -However, the new record is tested against any subsequent rules. - -@cindex differences between @command{gawk} and @command{awk} -@cindex limitations -@cindex implementation limits -@item -Many @command{awk} implementations limit the number of pipelines that an @command{awk} -program may have open to just one. In @command{gawk}, there is no such limit. -You can open as many pipelines (and coprocesses) as the underlying operating -system permits. - -@cindex side effects -@cindex @code{FILENAME} variable -@cindex dark corner -@cindex @code{getline}, setting @code{FILENAME} -@cindex @code{FILENAME}, being set by @code{getline} -@item -An interesting side effect occurs if you use @code{getline} without a -redirection inside a @code{BEGIN} rule. 
Because an unredirected @code{getline} -reads from the command-line @value{DF}s, the first @code{getline} command -causes @command{awk} to set the value of @code{FILENAME}. Normally, -@code{FILENAME} does not have a value inside @code{BEGIN} rules, because you -have not yet started to process the command-line @value{DF}s. -@value{DARKCORNER} -(@xref{BEGIN/END, , The @code{BEGIN} and @code{END} Special Patterns}, -also @pxref{Auto-set, ,Built-in Variables That Convey Information}.) -@end itemize - -@node Getline Summary, , Getline Notes, Getline -@subsection Summary of @code{getline} Variants - -The following table summarizes the eight variants of @code{getline}, -listing which built-in variables are set by each one. - -@multitable {@var{command} @code{|& getline} @var{var}} {1234567890123456789012345678901234567890} -@item @code{getline} @tab Sets @code{$0}, @code{NF}, @code{FNR} and @code{NR} - -@item @code{getline} @var{var} @tab Sets @var{var}, @code{FNR} and @code{NR} - -@item @code{getline <} @var{file} @tab Sets @code{$0} and @code{NF} - -@item @code{getline @var{var} < @var{file}} @tab Sets @var{var} - -@item @var{command} @code{| getline} @tab Sets @code{$0} and @code{NF} - -@item @var{command} @code{| getline} @var{var} @tab Sets @var{var} - -@item @var{command} @code{|& getline} @tab Sets @code{$0} and @code{NF} -(this is a @command{gawk} extension) - -@item @var{command} @code{|& getline} @var{var} @tab Sets @var{var} -(this is a @command{gawk} extension) -@end multitable - -@node Printing, Expressions, Reading Files, Top -@chapter Printing Output - -@cindex printing -@cindex output -One of the most common programming actions is to @dfn{print}, or output, -some or all of the input. Use the @code{print} statement -for simple output, and the @code{printf} statement -for fancier formatting. -The @code{print} statement is not limited when -computing @emph{which} values to print.
However, with two exceptions, -you cannot specify @emph{how} to print them---how many -columns, whether to use exponential notation or not, and so on. -(For the exceptions, @pxref{Output Separators}, and -@ref{OFMT, ,Controlling Numeric Output with @code{print}}.) -For that, you need the @code{printf} statement -(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}). - -Besides basic and formatted printing, this @value{CHAPTER} -also covers I/O redirections to files and pipes, introduces -the special @value{FN}s that @command{gawk} processes internally, -and discusses the @code{close} built-in function. - -@menu -* Print:: The @code{print} statement. -* Print Examples:: Simple examples of @code{print} statements. -* Output Separators:: The output separators and how to change them. -* OFMT:: Controlling Numeric Output With @code{print}. -* Printf:: The @code{printf} statement. -* Redirection:: How to redirect output to multiple files and - pipes. -* Special Files:: File name interpretation in @command{gawk}. - @command{gawk} allows access to inherited file - descriptors. -* Close Files And Pipes:: Closing Input and Output Files and Pipes. -@end menu - -@node Print, Print Examples, Printing, Printing -@section The @code{print} Statement -@cindex @code{print} statement - -The @code{print} statement is used to produce output with simple, standardized -formatting. Specify only the strings or numbers to print, in a -list separated by commas. They are output, separated by single spaces, -followed by a newline. The statement looks like this: - -@example -print @var{item1}, @var{item2}, @dots{} -@end example - -@noindent -The entire list of items may be optionally enclosed in parentheses. The -parentheses are necessary if any of the item expressions uses the @samp{>} -relational operator; otherwise it could be confused with a redirection -(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}). 
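The difference between a comparison and a redirection is easy to observe. The following is a minimal sketch, assuming a POSIX shell with @command{mktemp} and any POSIX @command{awk}; the scratch directory and the names @code{scratch}, @code{compare_result}, and @code{redirect_result} are hypothetical:

```shell
# Work in a throwaway directory, because the second command creates a file.
scratch=$(mktemp -d)

# Parenthesized: '>' is the relational operator, so the truth value is printed.
compare_result=$(echo '5 3' | awk '{ print ($1 > $2) }')

# Unparenthesized: '>' is a redirection, so $1 is written to a file named by $2.
(cd "$scratch" && echo '5 3' | awk '{ print $1 > $2 }')
redirect_result=$(cat "$scratch/3")

rm -rf "$scratch"
printf 'comparison printed: %s\nfile "3" contained: %s\n' \
    "$compare_result" "$redirect_result"
```

With the parentheses, the statement prints the result of the comparison. Without them, nothing appears on standard output and a file named @file{3} receives the first field.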
- -The items to print can be constant strings or numbers, fields of the -current record (such as @code{$1}), variables, or any @command{awk} -expression. Numeric values are converted to strings and then printed. - -The simple statement @samp{print} with no items is equivalent to -@samp{print $0}: it prints the entire current record. To print a blank -line, use @samp{print ""}, where @code{""} is the empty string. -To print a fixed piece of text, use a string constant, such as -@w{@code{"Don't Panic"}}, as one item. If you forget to use the -double quote characters, your text is taken as an @command{awk} -expression and you will probably get an error. Keep in mind that a -space is printed between any two items. - -@node Print Examples, Output Separators, Print, Printing -@section Examples of @code{print} Statements - -Each @code{print} statement makes at least one line of output. However, it -isn't limited to only one line. If an item value is a string that contains a -newline, the newline is output along with the rest of the string. A -single @code{print} statement can make any number of lines this way. - -The following is an example of printing a string that contains embedded newlines -(the @samp{\n} is an escape sequence, used to represent the newline -character; @pxref{Escape Sequences}): - -@example -$ awk 'BEGIN @{ print "line one\nline two\nline three" @}' -@print{} line one -@print{} line two -@print{} line three -@end example - -The next example, which is run on the @file{inventory-shipped} file, -prints the first two fields of each input record, with a space between -them: - -@example -$ awk '@{ print $1, $2 @}' inventory-shipped -@print{} Jan 13 -@print{} Feb 15 -@print{} Mar 15 -@dots{} -@end example - -@cindex common mistakes -@cindex mistakes, common -@cindex errors, common -A common mistake in using the @code{print} statement is to omit the comma -between two items. 
This often has the effect of making the items run -together in the output, with no space. The reason for this is that -juxtaposing two string expressions in @command{awk} means to concatenate -them. Here is the same program, without the comma: - -@example -$ awk '@{ print $1 $2 @}' inventory-shipped -@print{} Jan13 -@print{} Feb15 -@print{} Mar15 -@dots{} -@end example - -To someone unfamiliar with the @file{inventory-shipped} file, neither -example's output makes much sense. A heading line at the beginning -would make it clearer. Let's add some headings to our table of months -(@code{$1}) and green crates shipped (@code{$2}). We do this using the -@code{BEGIN} pattern -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}) -so that the headings are only printed once: - -@example -awk 'BEGIN @{ print "Month Crates" - print "----- ------" @} - @{ print $1, $2 @}' inventory-shipped -@end example - -@noindent -When run, the program prints the following: - -@example -Month Crates ------ ------ -Jan 13 -Feb 15 -Mar 15 -@dots{} -@end example - -@noindent -The only problem, however, is that the headings and the table data -don't line up! We can fix this by printing some spaces between the -two fields: - -@example -@group -awk 'BEGIN @{ print "Month Crates" - print "----- ------" @} - @{ print $1, " ", $2 @}' inventory-shipped -@end group -@end example - -Lining up columns this way can get pretty -complicated when there are many columns to fix. Counting spaces for two -or three columns is simple, but any more than this can take up -a lot of time. This is why the @code{printf} statement was -created (@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}); -one of its specialties is lining up columns of data. - -@cindex line continuation -@strong{Note:} You can continue either a @code{print} or -@code{printf} statement simply by putting a newline after any comma -(@pxref{Statements/Lines, ,@command{awk} Statements Versus Lines}). 
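The note about continuation can be tried directly. This is a minimal sketch, assuming a POSIX shell and any POSIX @command{awk}; the variable name @code{continued_result} is hypothetical, and the values reuse the first row of the table above:

```shell
continued_result=$(awk 'BEGIN {
    # A newline after a comma continues the statement on the next line.
    print "Month",
          "Crates"
    printf "%-5s %s\n",
           "Jan", 13
}')
printf '%s\n' "$continued_result"
```

Each statement is read as a single @code{print} or @code{printf} even though it spans two lines.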
- -@node Output Separators, OFMT, Print Examples, Printing -@section Output Separators - -@cindex output field separator, @code{OFS} -@cindex output record separator, @code{ORS} -@cindex @code{OFS} variable -@cindex @code{ORS} variable -As mentioned previously, a @code{print} statement contains a list -of items separated by commas. In the output, the items are normally -separated by single spaces. However, this doesn't need to be the case; -a single space is only the default. Any string of -characters may be used as the @dfn{output field separator} by setting the -built-in variable @code{OFS}. The initial value of this variable -is the string @w{@code{" "}}---that is, a single space. - -The output from an entire @code{print} statement is called an -@dfn{output record}. Each @code{print} statement outputs one output -record, and then outputs a string called the @dfn{output record separator} -(or @code{ORS}). The initial -value of @code{ORS} is the string @code{"\n"}; i.e., a newline -character. Thus, each @code{print} statement normally makes a separate line. - -In order to change how output fields and records are separated, assign -new values to the variables @code{OFS} and @code{ORS}. The usual -place to do this is in the @code{BEGIN} rule -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}), so -that it happens before any input is processed. It can also be done -with assignments on the command line, before the names of the input -files, or using the @option{-v} command-line option -(@pxref{Options, ,Command-Line Options}). -The following example prints the first and second fields of each input -record, separated by a semicolon, with a blank line added after each -newline: - -@ignore -Exercise, -Rewrite the -@example -awk 'BEGIN @{ print "Month Crates" - print "----- ------" @} - @{ print $1, " ", $2 @}' inventory-shipped -@end example -program by using a new value of @code{OFS}. 
-@end ignore - -@example -$ awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @} -> @{ print $1, $2 @}' BBS-list -@print{} aardvark;555-5553 -@print{} -@print{} alpo-net;555-3412 -@print{} -@print{} barfly;555-7685 -@dots{} -@end example - -If the value of @code{ORS} does not contain a newline, the program's output -is run together on a single line. - -@node OFMT, Printf, Output Separators, Printing -@section Controlling Numeric Output with @code{print} -@cindex @code{OFMT} variable -@cindex numeric output format -@cindex format, numeric output -@cindex output format specifier, @code{OFMT} -When the @code{print} statement is used to print numeric values, -@command{awk} internally converts the number to a string of characters -and prints that string. @command{awk} uses the @code{sprintf} function -to do this conversion -(@pxref{String Functions, ,String Manipulation Functions}). -For now, it suffices to say that the @code{sprintf} -function accepts a @dfn{format specification} that tells it how to format -numbers (or strings), and that there are a number of different ways in which -numbers can be formatted. The different format specifications are discussed -more fully in -@ref{Control Letters, , Format-Control Letters}. - -The built-in variable @code{OFMT} contains the default format specification -that @code{print} uses with @code{sprintf} when it wants to convert a -number to a string for printing. -The default value of @code{OFMT} is @code{"%.6g"}. 
-The way @code{print} prints numbers can be changed -by supplying different format specifications -as the value of @code{OFMT}, as shown in the following example: - -@example -$ awk 'BEGIN @{ -> OFMT = "%.0f" # print numbers as integers (rounds) -> print 17.23, 17.54 @}' -@print{} 17 18 -@end example - -@noindent -@cindex dark corner -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -According to the POSIX standard, @command{awk}'s behavior is undefined -if @code{OFMT} contains anything but a floating-point conversion specification. -@value{DARKCORNER} - -@node Printf, Redirection, OFMT, Printing -@section Using @code{printf} Statements for Fancier Printing -@cindex formatted output -@cindex output, formatted -@cindex @code{printf} statement - -For more precise control over the output format than what is -normally provided by @code{print}, use @code{printf}. -@code{printf} can be used to -specify the width to use for each item, as well as various -formatting choices for numbers (such as what output base to use, whether to -print an exponent, whether to print a sign, and how many digits to print -after the decimal point). This is done by supplying a string, called -the @dfn{format string}, that controls how and where to print the other -arguments. - -@menu -* Basic Printf:: Syntax of the @code{printf} statement. -* Control Letters:: Format-control letters. -* Format Modifiers:: Format-specification modifiers. -* Printf Examples:: Several examples. -@end menu - -@node Basic Printf, Control Letters, Printf, Printf -@subsection Introduction to the @code{printf} Statement - -@cindex @code{printf} statement, syntax of -A simple @code{printf} statement looks like this: - -@example -printf @var{format}, @var{item1}, @var{item2}, @dots{} -@end example - -@noindent -The entire list of arguments may optionally be enclosed in parentheses. 
The -parentheses are necessary if any of the item expressions use the @samp{>} -relational operator; otherwise it can be confused with a redirection -(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}). - -@cindex format string -The difference between @code{printf} and @code{print} is the @var{format} -argument. This is an expression whose value is taken as a string; it -specifies how to output each of the other arguments. It is called the -@dfn{format string}. - -The format string is very similar to that in the ISO C library function -@code{printf}. Most of @var{format} is text to output verbatim. -Scattered among this text are @dfn{format specifiers}---one per item. -Each format specifier says to output the next item in the argument list -at that place in the format. - -The @code{printf} statement does not automatically append a newline -to its output. It outputs only what the format string specifies. -So if a newline is needed, you must include one in the format string. -The output separator variables @code{OFS} and @code{ORS} have no effect -on @code{printf} statements. For example: - -@example -$ awk 'BEGIN @{ -> ORS = "\nOUCH!\n"; OFS = "+" -> msg = "Dont Panic!" -> printf "%s\n", msg -> @}' -@print{} Dont Panic! -@end example - -@noindent -Here, neither the @samp{+} nor the @samp{OUCH} appears when -the message is printed. - -@node Control Letters, Format Modifiers, Basic Printf, Printf -@subsection Format-Control Letters -@cindex @code{printf}, format-control characters -@cindex format specifier, @code{printf} - -A format specifier starts with the character @samp{%} and ends with -a @dfn{format-control letter}---it tells the @code{printf} statement -how to output one item. The format-control letter specifies what @emph{kind} -of value to print. The rest of the format specifier is made up of -optional @dfn{modifiers} that control @emph{how} to print the value, such as -the field width.
Here is a list of the format-control letters: - -@table @code -@item %c -This prints a number as an ASCII character; thus, @samp{printf "%c", -65} outputs the letter @samp{A}. (The output for a string value is -the first character of the string.) - -@item %d@r{,} %i -These are equivalent; they both print a decimal integer. -(The @samp{%i} specification is for compatibility with ISO C.) - -@item %e@r{,} %E -These print a number in scientific (exponential) notation; -for example: - -@example -printf "%4.3e\n", 1950 -@end example - -@noindent -prints @samp{1.950e+03}, with a total of four significant figures, three of -which follow the decimal point. -(The @samp{4.3} represents two modifiers, -discussed in the next @value{SUBSECTION}.) -@samp{%E} uses @samp{E} instead of @samp{e} in the output. - -@item %f -This prints a number in floating-point notation. -For example: - -@example -printf "%4.3f", 1950 -@end example - -@noindent -prints @samp{1950.000}, with three digits following the decimal point. -The result is already wider than the minimum width of four, so no padding is added. -(The @samp{4.3} represents two modifiers, -discussed in the next @value{SUBSECTION}.) - -@item %g@r{,} %G -These print a number in either scientific notation or in floating-point -notation, whichever uses fewer characters; if the result is printed in -scientific notation, @samp{%G} uses @samp{E} instead of @samp{e}. - -@item %o -This prints an unsigned octal integer. - -@item %s -This prints a string. - -@item %u -This prints an unsigned decimal integer. -(This format is of marginal use, because all numbers in @command{awk} -are floating-point; it is provided primarily for compatibility with C.) - -@item %x@r{,} %X -These print an unsigned hexadecimal integer; -@samp{%X} uses the letters @samp{A} through @samp{F} -instead of @samp{a} through @samp{f}. - -@item %% -This isn't a format-control letter but it does have meaning---the -sequence @samp{%%} outputs one @samp{%}; it does not consume an -argument and it ignores any modifiers.
-@end table - -@cindex dark corner -@strong{Note:} -When using the integer format-control letters for values that are outside -the range of a C @code{long} integer, @command{gawk} switches to the -@samp{%g} format specifier. Other versions of @command{awk} may print -invalid values or do something else entirely. -@value{DARKCORNER} - -@node Format Modifiers, Printf Examples, Control Letters, Printf -@subsection Modifiers for @code{printf} Formats - -@cindex @code{printf}, modifiers -@cindex modifiers (in format specifiers) -A format specification can also include @dfn{modifiers} that can control -how much of the item's value is printed, as well as how much space it gets. -The modifiers come between the @samp{%} and the format-control letter. -We will use the bullet symbol ``@bullet{}'' in the following examples to -represent -spaces in the output. Here are the possible modifiers, in the order in -which they may appear: - -@table @code -@cindex differences between @command{gawk} and @command{awk} -@cindex @code{printf}, positional specifier -@cindex positional specifier, @code{printf} -@item @var{N}$ -An integer constant followed by a @samp{$} is a @dfn{positional specifier}. -Normally, format specifications are applied to arguments in the order -given in the format string. With a positional specifier, the format -specification is applied to a specific argument, instead of what -would be the next argument in the list. Positional specifiers begin -counting with one: - -@example -printf "%s %s\n", "don't", "panic" -printf "%2$s %1$s\n", "panic", "don't" -@end example - -@noindent -prints the famous friendly message twice. - -At first glance, this feature doesn't seem to be of much use. -It is in fact a @command{gawk} extension, intended for use in translating -messages at runtime. -@xref{Printf Ordering, , Rearranging @code{printf} Arguments}, -which describes how and why to use positional specifiers. -For now, we will not use them. 
- -@item - -The minus sign, used before the width modifier (see further on in -this table), -says to left-justify -the argument within its specified width. Normally, the argument -is printed right-justified in the specified width. Thus: - -@example -printf "%-4s", "foo" -@end example - -@noindent -prints @samp{foo@bullet{}}. - -@item @var{space} -For numeric conversions, prefix positive values with a space and -negative values with a minus sign. - -@item + -The plus sign, used before the width modifier (see further on in -this table), -says to always supply a sign for numeric conversions, even if the data -to format is positive. The @samp{+} overrides the space modifier. - -@item # -Use an ``alternate form'' for certain control letters. -For @samp{%o}, supply a leading zero. -For @samp{%x} and @samp{%X}, supply a leading @samp{0x} or @samp{0X} for -a nonzero result. -For @samp{%e}, @samp{%E}, and @samp{%f}, the result always contains a -decimal point. -For @samp{%g} and @samp{%G}, trailing zeros are not removed from the result. - -@cindex dark corner -@item 0 -A leading @samp{0} (zero) acts as a flag that indicates that output should be -padded with zeros instead of spaces. -This applies even to non-numeric output formats. -@value{DARKCORNER} -This flag only has an effect when the field width is wider than the -value to print. - -@item @var{width} -This is a number specifying the desired minimum width of a field. Inserting any -number between the @samp{%} sign and the format-control character forces the -field to expand to this width. The default way to do this is to -pad with spaces on the left. For example: - -@example -printf "%4s", "foo" -@end example - -@noindent -prints @samp{@bullet{}foo}. - -The value of @var{width} is a minimum width, not a maximum. If the item -value requires more than @var{width} characters, it can be as wide as -necessary. Thus, the following: - -@example -printf "%4s", "foobar" -@end example - -@noindent -prints @samp{foobar}. 
- -Preceding the @var{width} with a minus sign causes the output to be -padded with spaces on the right, instead of on the left. - -@item .@var{prec} -A period followed by an integer constant -specifies the precision to use when printing. -The meaning of the precision varies by control letter: - -@table @asis -@item @code{%e}, @code{%E}, @code{%f} -Number of digits to the right of the decimal point. - -@item @code{%g}, @code{%G} -Maximum number of significant digits. - -@item @code{%d}, @code{%i}, @code{%o}, @code{%u}, @code{%x}, @code{%X} -Minimum number of digits to print. - -@item @code{%s} -Maximum number of characters from the string that should print. -@end table - -Thus, the following: - -@example -printf "%.4s", "foobar" -@end example - -@noindent -prints @samp{foob}. -@end table - -The C library @code{printf}'s dynamic @var{width} and @var{prec} -capability (for example, @code{"%*.*s"}) is supported. Instead of -supplying explicit @var{width} and/or @var{prec} values in the format -string, they are passed in the argument list. For example: - -@example -w = 5 -p = 3 -s = "abcdefg" -printf "%*.*s\n", w, p, s -@end example - -@noindent -is exactly equivalent to: - -@example -s = "abcdefg" -printf "%5.3s\n", s -@end example - -@noindent -Both programs output @samp{@w{@bullet{}@bullet{}abc}}. -Earlier versions of @command{awk} did not support this capability. -If you must use such a version, you may simulate this feature by using -concatenation to build up the format string, like so: - -@example -w = 5 -p = 3 -s = "abcdefg" -printf "%" w "." p "s\n", s -@end example - -@noindent -This is not particularly easy to read but it does work. - -@cindex fatal errors -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@cindex lint checks -C programmers may be used to supplying additional -@samp{l}, @samp{L}, and @samp{h} -modifiers in @code{printf} format strings. These are not valid in @command{awk}. 
-Most @command{awk} implementations silently ignore these modifiers. -If @option{--lint} is provided on the command line -(@pxref{Options, ,Command-Line Options}), -@command{gawk} warns about their use. If @option{--posix} is supplied, -their use is a fatal error. - -@node Printf Examples, , Format Modifiers, Printf -@subsection Examples Using @code{printf} - -The following is a simple example of -how to use @code{printf} to make an aligned table: - -@example -awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list -@end example - -@noindent -This command -prints the names of the bulletin boards (@code{$1}) in the file -@file{BBS-list} as a string of 10 characters that are left-justified. It also -prints the phone numbers (@code{$2}) next on the line. This -produces an aligned two-column table of names and phone numbers, -as shown here: - -@example -$ awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list -@print{} aardvark 555-5553 -@print{} alpo-net 555-3412 -@print{} barfly 555-7685 -@print{} bites 555-1675 -@print{} camelot 555-0542 -@print{} core 555-2912 -@print{} fooey 555-1234 -@print{} foot 555-6699 -@print{} macfoo 555-6480 -@print{} sdace 555-3430 -@print{} sabafoo 555-2127 -@end example - -In this case, the phone numbers had to be printed as strings because -the numbers are separated by a dash. Printing the phone numbers as -numbers would have produced just the first three digits: @samp{555}. -This would have been pretty confusing. - -It wasn't necessary to specify a width for the phone numbers because -they are last on their lines. They don't need to have spaces -after them. - -The table could be made to look even nicer by adding headings to the -tops of the columns. 
This is done using the @code{BEGIN} pattern -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}) -so that the headers are only printed once, at the beginning of -the @command{awk} program: - -@example -awk 'BEGIN @{ print "Name Number" - print "---- ------" @} - @{ printf "%-10s %s\n", $1, $2 @}' BBS-list -@end example - -The above example mixed @code{print} and @code{printf} statements in -the same program. Using just @code{printf} statements can produce the -same results: - -@example -awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number" - printf "%-10s %s\n", "----", "------" @} - @{ printf "%-10s %s\n", $1, $2 @}' BBS-list -@end example - -@noindent -Printing each column heading with the same format specification -used for the column elements ensures that the headings -are aligned just like the columns. - -The fact that the same format specification is used three times can be -emphasized by storing it in a variable, like this: - -@example -awk 'BEGIN @{ format = "%-10s %s\n" - printf format, "Name", "Number" - printf format, "----", "------" @} - @{ printf format, $1, $2 @}' BBS-list -@end example - -@c !!! exercise -At this point, it would be a worthwhile exercise to use the -@code{printf} statement to line up the headings and table data for the -@file{inventory-shipped} example that was covered earlier in the @value{SECTION} -on the @code{print} statement -(@pxref{Print, ,The @code{print} Statement}). - -@node Redirection, Special Files, Printf, Printing -@section Redirecting Output of @code{print} and @code{printf} - -@cindex output redirection -@cindex redirection of output -So far, the output from @code{print} and @code{printf} has gone -to the standard -output, usually the terminal. Both @code{print} and @code{printf} can -also send their output to other places. -This is called @dfn{redirection}. - -A redirection appears after the @code{print} or @code{printf} statement. 
-Redirections in @command{awk} are written just like redirections in shell -commands, except that they are written inside the @command{awk} program. - -There are four forms of output redirection: output to a file, output -appended to a file, output through a pipe to another command, and output -to a coprocess. They are all shown for the @code{print} statement, -but they work identically for @code{printf}: - -@table @code -@cindex @code{>} I/O operator -@item print @var{items} > @var{output-file} -This type of redirection prints the items into the output file named -@var{output-file}. The @value{FN} @var{output-file} can be any -expression. Its value is changed to a string and then used as a -@value{FN} (@pxref{Expressions}). - -When this type of redirection is used, the @var{output-file} is erased -before the first output is written to it. Subsequent writes to the same -@var{output-file} do not erase @var{output-file}, but append to it. -(This is different from how you use redirections in shell scripts.) -If @var{output-file} does not exist, it is created. For example, here -is how an @command{awk} program can write a list of BBS names to one -file named @file{name-list}, and a list of phone numbers to another file -named @file{phone-list}: - -@example -$ awk '@{ print $2 > "phone-list" -> print $1 > "name-list" @}' BBS-list -$ cat phone-list -@print{} 555-5553 -@print{} 555-3412 -@dots{} -$ cat name-list -@print{} aardvark -@print{} alpo-net -@dots{} -@end example - -@noindent -Each output file contains one name or number per line. - -@cindex @code{>>} I/O operator -@item print @var{items} >> @var{output-file} -This type of redirection prints the items into the pre-existing output file -named @var{output-file}. The difference between this and the -single-@samp{>} redirection is that the old contents (if any) of -@var{output-file} are not erased. Instead, the @command{awk} output is -appended to the file. -If @var{output-file} does not exist, then it is created. 
- -@cindex @code{|} I/O operator -@cindex pipes for output -@cindex output, piping -@item print @var{items} | @var{command} -It is also possible to send output to another program through a pipe -instead of into a file. This type of redirection opens a pipe to -@var{command}, and writes the values of @var{items} through this pipe -to another process created to execute @var{command}. - -The redirection argument @var{command} is actually an @command{awk} -expression. Its value is converted to a string whose contents give -the shell command to be run. For example, the following produces two -files, one unsorted list of BBS names, and one list sorted in reverse -alphabetical order: - -@ignore -10/2000: -This isn't the best style, since COMMAND is assigned for each -record. It's done to avoid overfull hboxes in TeX. Leave it -alone for now and let's hope no-one notices. -@end ignore - -@example -awk '@{ print $1 > "names.unsorted" - command = "sort -r > names.sorted" - print $1 | command @}' BBS-list -@end example - -The unsorted list is written with an ordinary redirection, while -the sorted list is written by piping through the @command{sort} utility. - -The next example uses redirection to mail a message to the mailing -list @samp{bug-system}. This might be useful when trouble is encountered -in an @command{awk} script run periodically for system maintenance: - -@example -report = "mail bug-system" -print "Awk script failed:", $0 | report -m = ("at record number " FNR " of " FILENAME) -print m | report -close(report) -@end example - -The message is built using string concatenation and saved in the variable -@code{m}. It is then sent down the pipeline to the @command{mail} program. -(The parentheses group the items to concatenate---see -@ref{Concatenation, ,String Concatenation}.) - -The @code{close} function is called here because it's a good idea to close -the pipe as soon as all the intended output has been sent to it. 
-@xref{Close Files And Pipes, ,Closing Input and Output Redirections}, -for more information on this. - -This example also illustrates the use of a variable to represent -a @var{file} or @var{command}---it is not necessary to always -use a string constant. Using a variable is generally a good idea, -because @command{awk} requires that the string value be spelled identically -every time. - -@cindex coprocess -@cindex @code{|&} I/O operator -@cindex differences between @command{gawk} and @command{awk} -@item print @var{items} |& @var{command} -This type of redirection prints the items to the input of @var{command}. -The difference between this and the -single-@samp{|} redirection is that the output from @var{command} -can be read with @code{getline}. -Thus @var{command} is a @dfn{coprocess}, that works together with, -but subsidiary to, the @command{awk} program. - -This feature is a @command{gawk} extension, and is not available in -POSIX @command{awk}. -@xref{Two-way I/O, ,Two-Way Communications with Another Process}, -for a more complete discussion. -@end table - -Redirecting output using @samp{>}, @samp{>>}, @samp{|}, or @samp{|&} -asks the system to open a file, pipe, or coprocess, only if the particular -@var{file} or @var{command} you specify has not already been written -to by your program or if it has been closed since it was last written to. - -@cindex common mistakes -@cindex mistakes, common -@cindex errors, common -It is a common error to use @samp{>} redirection for the first @code{print} -to a file, and then to use @samp{>>} for subsequent output: - -@example -# clear the file -print "Don't panic" > "guide.txt" -@dots{} -# append -print "Avoid improbability generators" >> "guide.txt" -@end example - -@noindent -This is indeed how redirections must be used from the shell. But in -@command{awk}, it isn't necessary. In this kind of case, a program should -use @samp{>} for all the @code{print} statements, since the output file -is only opened once. 
- -@cindex differences between @command{gawk} and @command{awk} -@cindex limitations -@cindex implementation limits -@ifnotinfo -As mentioned earlier -(@pxref{Getline Notes, ,Points About @code{getline} to Remember}), -many -@end ifnotinfo -@ifnottex -Many -@end ifnottex -@command{awk} implementations limit the number of pipelines that an @command{awk} -program may have open to just one! In @command{gawk}, there is no such limit. -@command{gawk} allows a program to -open as many pipelines as the underlying operating system permits. - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Piping into @command{sh} -@cindex advanced notes -@cindex shell, piping commands into -@cindex piping commands into the shell - -A particularly powerful way to use redirection is to build command lines, -and pipe them into the shell, @command{sh}. For example, suppose you -have a list of files brought over from a system where all the @value{FN}s -are stored in uppercase, and you wish to rename them to have names in -all lowercase. The following program is both simple and efficient: - -@cindex @command{mv} utility -@example -@{ printf("mv %s %s\n", $0, tolower($0)) | "sh" @} - -END @{ close("sh") @} -@end example - -The @code{tolower} function returns its argument string with all -uppercase characters converted to lowercase -(@pxref{String Functions, ,String Manipulation Functions}). -The program builds up a list of command lines, -using the @command{mv} utility to rename the files. -It then sends the list to the shell for execution. - -@node Special Files, Close Files And Pipes, Redirection, Printing -@section Special @value{FFN}s in @command{gawk} - -@command{gawk} provides a number of special @value{FN}s that it interprets -internally. These @value{FN}s provide access to standard file descriptors, -process-related information, and TCP/IP networking. - -@menu -* Special FD:: Special files for I/O. -* Special Process:: Special files for process information. 
-* Special Network:: Special files for network communications. -* Special Caveats:: Things to watch out for. -@end menu - -@node Special FD, Special Process, Special Files, Special Files -@subsection Special Files for Standard Descriptors -@cindex standard input -@cindex standard output -@cindex standard error output -@cindex file descriptors - -Running programs conventionally have three input and output streams -already available to them for reading and writing. These are known as -the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error -output}. These streams are, by default, connected to your terminal, but -they are often redirected with the shell, via the @samp{<}, @samp{<<}, -@samp{>}, @samp{>>}, @samp{>&}, and @samp{|} operators. Standard error -is typically used for writing error messages; the reason there are two separate -streams, standard output, and standard error, is so that they can be -redirected separately. - -@cindex differences between @command{gawk} and @command{awk} -In other implementations of @command{awk}, the only way to write an error -message to standard error in an @command{awk} program is as follows: - -@example -print "Serious error detected!" | "cat 1>&2" -@end example - -@noindent -This works by opening a pipeline to a shell command that can access the -standard error stream that it inherits from the @command{awk} process. -This is far from elegant, and it is also inefficient, because it requires a -separate process. So people writing @command{awk} programs often -don't do this. Instead, they send the error messages to the -terminal, like this: - -@example -print "Serious error detected!" > "/dev/tty" -@end example - -@noindent -This usually has the same effect but not always: although the -standard error stream is usually the terminal, it can be redirected; when -that happens, writing to the terminal is not correct. In fact, if -@command{awk} is run from a background job, it may not have a terminal at all. 
-Then opening @file{/dev/tty} fails. - -@command{gawk} provides special @value{FN}s for accessing the three standard -streams, as well as any other inherited open files. If the @value{FN} matches -one of these special names when @command{gawk} redirects input or output, -then it directly uses the stream that the @value{FN} stands for. -(These special @value{FN}s work for all operating systems that @command{gawk} -has been ported to, not just those that are POSIX-compliant.): - -@cindex @file{/dev/stdin} special file -@cindex @file{/dev/stdout} special file -@cindex @file{/dev/stderr} special file -@cindex @file{/dev/fd} special files -@table @file -@item /dev/stdin -The standard input (file descriptor 0). - -@item /dev/stdout -The standard output (file descriptor 1). - -@item /dev/stderr -The standard error output (file descriptor 2). - -@item /dev/fd/@var{N} -The file associated with file descriptor @var{N}. Such a file must -be opened by the program initiating the @command{awk} execution (typically -the shell). Unless special pains are taken in the shell from which -@command{gawk} is invoked, only descriptors 0, 1, and 2 are available. -@end table - -The @value{FN}s @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr} -are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2}, -respectively. However, they are more self-explanatory. -The proper way to write an error message in a @command{gawk} program -is to use @file{/dev/stderr}, like this: - -@example -print "Serious error detected!" > "/dev/stderr" -@end example - -@cindex common mistakes -@cindex mistakes, common -@cindex errors, common -Note the use of quotes around the @value{FN}. -Like any other redirection, the value must be a string. -It is a common error to omit the quotes, which leads -to confusing results. -@c Exercise: What does it do? 
:-) - -@node Special Process, Special Network, Special FD, Special Files -@subsection Special Files for Process-Related Information - -@command{gawk} also provides special @value{FN}s that give access to information -about the running @command{gawk} process. Each of these ``files'' provides -a single record of information. To read them more than once, they must -first be closed with the @code{close} function -(@pxref{Close Files And Pipes, ,Closing Input and Output Redirections}). -The @value{FN}s are: - -@cindex process information -@cindex @file{/dev/pid} special file -@cindex @file{/dev/pgrpid} special file -@cindex @file{/dev/ppid} special file -@cindex @file{/dev/user} special file -@table @file -@item /dev/pid -Reading this file returns the process ID of the current process, -in decimal form, terminated with a newline. - -@item /dev/ppid -Reading this file returns the parent process ID of the current process, -in decimal form, terminated with a newline. - -@item /dev/pgrpid -Reading this file returns the process group ID of the current process, -in decimal form, terminated with a newline. - -@item /dev/user -Reading this file returns a single record terminated with a newline. -The fields are separated with spaces. The fields represent the -following information: - -@table @code -@item $1 -The return value of the @code{getuid} system call -(the real user ID number). - -@item $2 -The return value of the @code{geteuid} system call -(the effective user ID number). - -@item $3 -The return value of the @code{getgid} system call -(the real group ID number). - -@item $4 -The return value of the @code{getegid} system call -(the effective group ID number). -@end table - -If there are any additional fields, they are the group IDs returned by -the @code{getgroups} system call. -(Multiple groups may not be supported on all systems.) 
-@end table - -These special @value{FN}s may be used on the command line as @value{DF}s, -as well as for I/O redirections within an @command{awk} program. -They may not be used as source files with the @option{-f} option. - -@cindex automatic warnings -@cindex warnings, automatic -@strong{Note:} -The special files that provide process-related information are now considered -obsolete and will disappear entirely -in the next release of @command{gawk}. -@command{gawk} prints a warning message every time you use one of -these files. -To obtain process-related information, use the @code{PROCINFO} array. -@xref{Auto-set, ,Built-in Variables That Convey Information}. - -@node Special Network, Special Caveats, Special Process, Special Files -@subsection Special Files for Network Communications - -Starting with @value{PVERSION} 3.1 of @command{gawk}, @command{awk} programs -can open a two-way -TCP/IP connection, acting as either a client or server. -This is done using a special @value{FN} of the form: - -@example -@file{/inet/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}} -@end example - -The @var{protocol} is one of @samp{tcp}, @samp{udp}, or @samp{raw}, -and the other fields represent the other essential pieces of information -for making a networking connection. -These @value{FN}s are used with the @samp{|&} operator for communicating -with a coprocess -(@pxref{Two-way I/O, ,Two-Way Communications with Another Process}). -This is an advanced feature, mentioned here only for completeness. -Full discussion is delayed until -@ref{TCP/IP Networking, ,Using @command{gawk} for Network Programming}. - -@node Special Caveats, , Special Network, Special Files -@subsection Special @value{FFN} Caveats - -Here is a list of things to bear in mind when using the -special @value{FN}s that @command{gawk} provides. 
- -@itemize @bullet -@item -Recognition of these special @value{FN}s is disabled if @command{gawk} is in -compatibility mode (@pxref{Options, ,Command-Line Options}). - -@cindex automatic warnings -@cindex warnings, automatic -@item -@ifnottex -The -@end ifnottex -@ifnotinfo -As mentioned earlier, the -@end ifnotinfo -special files that provide process-related information are now considered -obsolete and will disappear entirely -in the next release of @command{gawk}. -@command{gawk} prints a warning message every time you use one of -these files. -@ifnottex -To obtain process-related information, use the @code{PROCINFO} array. -@xref{Built-in Variables}. -@end ifnottex - -@item -Starting with @value{PVERSION} 3.1, @command{gawk} @emph{always} -interprets these special @value{FN}s.@footnote{Older versions of -@command{gawk} would only interpret these names internally if the system -did not actually have a @file{/dev/fd} directory or any of the other -above listed special files. Usually this didn't make a difference, -but sometimes it did; thus, it was decided to make @command{gawk}'s -behavior consistent on all systems and to have it always interpret -the special @value{FN}s itself.} -For example, using @samp{/dev/fd/4} -for output actually writes on file descriptor 4, and not on a new -file descriptor that is @code{dup}'ed from file descriptor 4. Most of -the time this does not matter; however, it is important to @emph{not} -close any of the files related to file descriptors 0, 1, and 2. -Doing so results in unpredictable behavior. 
-@end itemize - -@node Close Files And Pipes, , Special Files, Printing -@section Closing Input and Output Redirections -@cindex closing input files and pipes -@cindex closing output files and pipes -@cindex closing coprocesses -@cindex coprocess -@cindex @code{close} built-in function - -If the same @value{FN} or the same shell command is used with @code{getline} -more than once during the execution of an @command{awk} program -(@pxref{Getline, ,Explicit Input with @code{getline}}), -the file is opened (or the command is executed) the first time only. -At that time, the first record of input is read from that file or command. -The next time the same file or command is used with @code{getline}, -another record is read from it, and so on. - -Similarly, when a file or pipe is opened for output, the @value{FN} or -command associated with it is remembered by @command{awk}, and subsequent -writes to the same file or command are appended to the previous writes. -The file or pipe stays open until @command{awk} exits. - -This implies that special steps are necessary in order to read the same -file again from the beginning, or to rerun a shell command (rather than -reading more output from the same command). The @code{close} function -makes these things possible: - -@example -close(@var{filename}) -@end example - -@noindent -or: - -@example -close(@var{command}) -@end example - -The argument @var{filename} or @var{command} can be any expression. Its -value must @emph{exactly} match the string that was used to open the file or -start the command (spaces and other ``irrelevant'' characters -included). 
For example, if you open a pipe with this: - -@example -"sort -r names" | getline foo -@end example - -@noindent -then you must close it with this: - -@example -close("sort -r names") -@end example - -Once this function call is executed, the next @code{getline} from that -file or command, or the next @code{print} or @code{printf} to that -file or command, reopens the file or reruns the command. -Because the expression that you use to close a file or pipeline must -exactly match the expression used to open the file or run the command, -it is good practice to use a variable to store the @value{FN} or command. -The previous example becomes the following: - -@example -sortcom = "sort -r names" -sortcom | getline foo -@dots{} -close(sortcom) -@end example - -@noindent -This helps avoid hard-to-find typographical errors in your @command{awk} -programs. Here are some of the reasons for closing an output file: - -@itemize @bullet -@item -To write a file and read it back later on in the same @command{awk} -program. Close the file after writing it, then -begin reading it with @code{getline}. - -@item -To write numerous files, successively, in the same @command{awk} -program. If the files aren't closed, eventually @command{awk} may exceed a -system limit on the number of open files in one process. It is best to -close each one when the program has finished writing it. - -@item -To make a command finish. When output is redirected through a pipe, -the command reading the pipe normally continues to try to read input -as long as the pipe is open. Often this means the command cannot -really do its work until the pipe is closed. For example, if -output is redirected to the @command{mail} program, the message is not -actually sent until the pipe is closed. - -@item -To run the same program a second time, with the same arguments. -This is not the same thing as giving more input to the first run! - -For example, suppose a program pipes output to the @command{mail} program. 
-If it outputs several lines redirected to this pipe without closing -it, they make a single message of several lines. By contrast, if the -program closes the pipe after each line of output, then each line makes -a separate message. -@end itemize - -@cindex differences between @command{gawk} and @command{awk} -@cindex portability issues -If you use more files than the system allows you to have open, -@command{gawk} attempts to multiplex the available open files among -your @value{DF}s. @command{gawk}'s ability to do this depends upon the -facilities of your operating system, so it may not always work. It is -therefore both good practice and good portability advice to always -use @code{close} on your files when you are done with them. -In fact, if you are using a lot of pipes, it is essential that -you close commands when done. For example, consider something like this: - -@example -@{ - @dots{} - command = ("grep " $1 " /some/file | my_prog -q " $3) - while ((command | getline) > 0) @{ - @var{process output of} command - @} - # need close(command) here -@} -@end example - -This example creates a new pipeline based on data in @emph{each} record. -Without the call to @code{close} indicated in the comment, @command{awk} -creates child processes to run the commands, until it eventually -runs out of file descriptors for more pipelines. - -Even though each command has finished (as indicated by the end-of-file -return status from @code{getline}), the child process is not -terminated;@footnote{The technical terminology is rather morbid. -The finished child is called a ``zombie,'' and cleaning up after -it is referred to as ``reaping.''} -@c Good old UNIX: give the marketing guys fits, that's the ticket -more importantly, the file descriptor for the pipe -is not closed and released until @code{close} is called or -@command{awk} exits. 
- -@code{close} will silently do nothing if given an argument that -does not represent a file, pipe or coprocess that was opened with -a redirection. - -When using the @samp{|&} operator to communicate with a coprocess, -it is occasionally useful to be able to close one end of the two-way -pipe without closing the other. -This is done by supplying a second argument to @code{close}. -As in any other call to @code{close}, -the first argument is the name of the command or special file used -to start the coprocess. -The second argument should be a string, with either of the values -@code{"to"} or @code{"from"}. Case does not matter. -As this is an advanced feature, a more complete discussion is -delayed until -@ref{Two-way I/O, ,Two-Way Communications with Another Process}, -which discusses it in more detail and gives an example. - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Using @code{close}'s Return Value -@cindex advanced notes -@cindex dark corner -@cindex differences between @command{gawk} and @command{awk} -@cindex @code{close}, return value -@cindex return value from @code{close} - -In many versions of Unix @command{awk}, the @code{close} function -is actually a statement. It is a syntax error to try and use the return -value from @code{close}: -@value{DARKCORNER} - -@example -command = "@dots{}" -command | getline info -retval = close(command) # syntax error in most Unix awks -@end example - -@command{gawk} treats @code{close} as a function. -The return value is @minus{}1 if the argument names something -that was never opened with a redirection, or if there is -a system problem closing the file or process. -In these cases, @command{gawk} sets the built-in variable -@code{ERRNO} to a string describing the problem. - -In @command{gawk}, -when closing a pipe or coprocess, -the return value is the exit status of the command. 
-Otherwise, it is the return value from the system's @code{close} or -@code{fclose} C functions when closing input or output -files, respectively. -This value is zero if the close succeeds, or @minus{}1 if -it fails. - -The return value for closing a pipeline is particularly useful. -It allows you to get the output from a command as well as its -exit status. - -For POSIX-compliant systems, -if the exit status is a number above 128, then the program -was terminated by a signal. Subtract 128 to get the signal number: - -@example -exit_val = close(command) -if (exit_val > 128) - print command, "died with signal", exit_val - 128 -else - print command, "exited with code", exit_val -@end example - -Currently, in @command{gawk}, this only works for commands -piping into @code{getline}. For commands piped into -from @code{print} or @code{printf}, the -return value from @code{close} is that of the library's -@code{pclose} function. - -@node Expressions, Patterns and Actions, Printing, Top -@chapter Expressions -@cindex expression - -Expressions are the basic building blocks of @command{awk} patterns -and actions. An expression evaluates to a value that you can print, test, -or pass to a function. Additionally, an expression -can assign a new value to a variable or a field by using an assignment operator. - -An expression can serve as a pattern or action statement on its own. -Most other kinds of -statements contain one or more expressions that specify the data on which to -operate. As in other languages, expressions in @command{awk} include -variables, array references, constants, and function calls, as well as -combinations of these with various operators. - -@menu -* Constants:: String, numeric and regexp constants. -* Using Constant Regexps:: When and how to use a regexp constant. -* Variables:: Variables give names to values for later use. -* Conversion:: The conversion of strings to numbers and vice - versa. 
-* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-}, - etc.) -* Concatenation:: Concatenating strings. -* Assignment Ops:: Changing the value of a variable or a field. -* Increment Ops:: Incrementing the numeric value of a variable. -* Truth Values:: What is ``true'' and what is ``false''. -* Typing and Comparison:: How variables acquire types and how this - affects comparison of numbers and strings with - @samp{<}, etc. -* Boolean Ops:: Combining comparison expressions using boolean - operators @samp{||} (``or''), @samp{&&} - (``and'') and @samp{!} (``not''). -* Conditional Exp:: Conditional expressions select between two - subexpressions under control of a third - subexpression. -* Function Calls:: A function call is an expression. -* Precedence:: How various operators nest. -@end menu - -@node Constants, Using Constant Regexps, Expressions, Expressions -@section Constant Expressions -@cindex constants, types of - -The simplest type of expression is the @dfn{constant}, which always has -the same value. There are three types of constants: numeric, -string, and regular expression. - -Each is used in the appropriate context when you need a data -value that isn't going to change. Numeric constants can -have different forms, but are stored identically internally. - -@menu -* Scalar Constants:: Numeric and string constants. -* Non-decimal-numbers:: What are octal and hex numbers. -* Regexp Constants:: Regular Expression constants. -@end menu - -@node Scalar Constants, Non-decimal-numbers, Constants, Constants -@subsection Numeric and String Constants - -@cindex numeric constant -@cindex numeric value -A @dfn{numeric constant} stands for a number. This number can be an -integer, a decimal fraction, or a number in scientific (exponential) -notation.@footnote{The internal representation of all numbers, -including integers, uses double-precision -floating-point numbers. 
-On most modern systems, these are in IEEE 754 standard format.} -Here are some examples of numeric constants that all -have the same value: - -@example -105 -1.05e+2 -1050e-1 -@end example - -@cindex string constants -A string constant consists of a sequence of characters enclosed in -double quote marks. For example: - -@example -"parrot" -@end example - -@noindent -@cindex differences between @command{gawk} and @command{awk} -represents the string whose contents are @samp{parrot}. Strings in -@command{gawk} can be of any length, and they can contain any of the possible -eight-bit ASCII characters including ASCII @sc{nul} (character code zero). -Other @command{awk} -implementations may have difficulty with some character codes. - -@node Non-decimal-numbers, Regexp Constants, Scalar Constants, Constants -@subsection Octal and Hexadecimal Numbers -@cindex octal numbers -@cindex hexadecimal numbers -@cindex numbers, octal -@cindex numbers, hexadecimal - -In @command{awk}, all numbers are in decimal; i.e., base 10. Many other -programming languages allow you to specify numbers in other bases, often -octal (base 8) and hexadecimal (base 16). -In octal, the numbers go 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, etc. -Just as @samp{11} in decimal is 1 times 10 plus 1, so -@samp{11} in octal is 1 times 8 plus 1. This equals nine in decimal. -In hexadecimal, there are 16 digits. Since the everyday decimal -number system only has ten digits (@samp{0}---@samp{9}), the letters -@samp{a} through @samp{f} are used to represent the rest. -(Case in the letters is usually irrelevant; hexadecimal @samp{a} and @samp{A} -have the same value.) -Thus, @samp{11} in -hexadecimal is 1 times 16 plus 1, which equals 17 in decimal. - -Just by looking at plain @samp{11}, you can't tell what base it's in. -So, in C, C++, and other languages derived from C, -@c such as PERL, but we won't mention that.... -there is a special notation to help signify the base. 
-Octal numbers start with a leading @samp{0}, -and hexadecimal numbers start with a leading @samp{0x} or @samp{0X}: - -@table @code -@item 11 -Decimal 11. - -@item 011 -Octal 11, decimal value 9. - -@item 0x11 -Hexadecimal 11, decimal value 17. -@end table - -This example shows the difference: - -@example -$ gawk 'BEGIN @{ printf "%d, %d, %d\n", 011, 11, 0x11 @}' -@print{} 9, 11, 17 -@end example - -Being able to use octal and hexadecimal constants in your programs is most -useful when working with data that cannot be represented conveniently as -characters or as regular numbers, such as binary data of various sorts. - -@command{gawk} allows the use of octal and hexadecimal -constants in your program text. However, such numbers in the input data -are not treated differently; doing so by default would break old -programs. -(If you really need to do this, use the @option{--non-decimal-data} -command-line option, -@pxref{Non-decimal Data, ,Allowing Non-Decimal Input Data}.) -If you have octal or hexadecimal data, -you can use the @code{strtonum} function -(@pxref{String Functions, ,String Manipulation Functions}) -to convert the data into a number. -Most of the time, you will want to use octal or hexadecimal constants -when working with the built-in bit manipulation functions; -see @ref{Bitwise Functions, ,Using @command{gawk}'s Bit Manipulation Functions}, -for more information. - -Unlike some early C implementations, @samp{8} and @samp{9} are not valid -in octal constants; e.g., @command{gawk} treats @samp{018} as decimal 18. - -@example -$ gawk 'BEGIN @{ print "021 is", 021 ; print 018 @}' -@print{} 021 is 17 -@print{} 18 -@end example - -Octal and hexadecimal source code constants are a @command{gawk} extension. -If @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), -they are not available. 
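Because @code{strtonum} is a @command{gawk} extension, a program that must also run under other @command{awk} implementations has to convert hexadecimal input itself. The following is a minimal sketch of such a conversion in portable @command{awk}, wrapped in a shell function. The name @code{hex2dec} is hypothetical and chosen only for this illustration; the sketch handles an optional @samp{0x} prefix and stops at the first character that is not a hexadecimal digit.

```shell
# hex2dec is a hypothetical helper name; it is not part of awk or gawk.
# It converts one hexadecimal string (with or without a 0x prefix) to decimal.
hex2dec() {
    awk -v s="$1" 'BEGIN {
        s = tolower(s)                    # treat "A" and "a" alike
        sub(/^0x/, "", s)                 # drop an optional 0x prefix
        n = 0
        for (i = 1; i <= length(s); i++) {
            # position in the digit string, minus one, is the digit value
            d = index("0123456789abcdef", substr(s, i, 1)) - 1
            if (d < 0)
                break                     # stop at the first non-hex character
            n = n * 16 + d
        }
        print n
    }'
}

hex2dec 0x11
hex2dec ff
```

With these two calls the sketch prints 17 and 255. When only @command{gawk} is targeted, @code{strtonum} remains the simpler choice.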
- -@c fakenode --- for prepinfo -@subheading Advanced Notes: A Constant's Base Does Not Affect Its Value -@cindex advanced notes - -Once a numeric constant has -been converted internally into a number, -@command{gawk} no longer remembers -what the original form of the constant was; the internal value is -always used. This has particular consequences for conversion of -numbers to strings: - -@example -$ gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}' -@print{} 0x11 is <17> -@end example - -@node Regexp Constants, , Non-decimal-numbers, Constants -@subsection Regular Expression Constants - -@cindex @code{~} operator -@cindex @code{!~} operator -A regexp constant is a regular expression description enclosed in -slashes, such as @code{@w{/^beginning and end$/}}. Most regexps used in -@command{awk} programs are constant, but the @samp{~} and @samp{!~} -matching operators can also match computed or ``dynamic'' regexps -(which are just ordinary strings or variables that contain a regexp). - -@node Using Constant Regexps, Variables, Constants, Expressions -@section Using Regular Expression Constants - -@cindex dark corner -When used on the righthand side of the @samp{~} or @samp{!~} -operators, a regexp constant merely stands for the regexp that is to be -matched. -However, regexp constants (such as @code{/foo/}) may be used like simple expressions. -When a -regexp constant appears by itself, it has the same meaning as if it appeared -in a pattern, i.e., @samp{($0 ~ /foo/)}. -@value{DARKCORNER} -@xref{Expression Patterns, ,Expressions as Patterns}. -This means that the following two code segments: - -@example -if ($0 ~ /barfly/ || $0 ~ /camelot/) - print "found" -@end example - -@noindent -and: - -@example -if (/barfly/ || /camelot/) - print "found" -@end example - -@noindent -are exactly equivalent. 
-One rather bizarre consequence of this rule is that the following -Boolean expression is valid, but does not do what the user probably -intended: - -@example -# note that /foo/ is on the left of the ~ -if (/foo/ ~ $1) print "found foo" -@end example - -@cindex automatic warnings -@cindex warnings, automatic -@noindent -This code is ``obviously'' testing @code{$1} for a match against the regexp -@code{/foo/}. But in fact, the expression @samp{/foo/ ~ $1} actually means -@samp{($0 ~ /foo/) ~ $1}. In other words, first match the input record -against the regexp @code{/foo/}. The result is either zero or one, -depending upon the success or failure of the match. That result -is then matched against the first field in the record. -Because it is unlikely that you would ever really want to make this kind of -test, @command{gawk} issues a warning when it sees this construct in -a program. -Another consequence of this rule is that the assignment statement: - -@example -matches = /foo/ -@end example - -@noindent -assigns either zero or one to the variable @code{matches}, depending -upon the contents of the current input record. -This feature of the language has never been well documented until the -POSIX specification. - -@cindex differences between @command{gawk} and @command{awk} -@cindex dark corner -Constant regular expressions are also used as the first argument for -the @code{gensub}, @code{sub}, and @code{gsub} functions, and as the -second argument of the @code{match} function -(@pxref{String Functions, ,String Manipulation Functions}). -Modern implementations of @command{awk}, including @command{gawk}, allow -the third argument of @code{split} to be a regexp constant, but some -older implementations do not. -@value{DARKCORNER} -This can lead to confusion when attempting to use regexp constants -as arguments to user defined functions -(@pxref{User-defined, ,User-Defined Functions}). 
-For example: - -@example -function mysub(pat, repl, str, global) -@{ - if (global) - gsub(pat, repl, str) - else - sub(pat, repl, str) - return str -@} - -@{ - @dots{} - text = "hi! hi yourself!" - mysub(/hi/, "howdy", text, 1) - @dots{} -@} -@end example - -@cindex automatic warnings -@cindex warnings, automatic -In this example, the programmer wants to pass a regexp constant to the -user-defined function @code{mysub}, which in turn passes it on to -either @code{sub} or @code{gsub}. However, what really happens is that -the @code{pat} parameter is either one or zero, depending upon whether -or not @code{$0} matches @code{/hi/}. -@command{gawk} issues a warning when it sees a regexp constant used as -a parameter to a user-defined function, since passing a truth value in -this way is probably not what was intended. - -@node Variables, Conversion, Using Constant Regexps, Expressions -@section Variables - -Variables are ways of storing values at one point in your program for -use later in another part of your program. They can be manipulated -entirely within the program text, and they can also be assigned values -on the @command{awk} command line. - -@menu -* Using Variables:: Using variables in your programs. -* Assignment Options:: Setting variables on the command-line and a - summary of command-line syntax. This is an - advanced method of input. -@end menu - -@node Using Variables, Assignment Options, Variables, Variables -@subsection Using Variables in a Program - -@cindex variables, user-defined -@cindex user-defined variables -Variables let you give names to values and refer to them later. Variables -have already been used in many of the examples. The name of a variable -must be a sequence of letters, digits, or underscores, and it may not begin -with a digit. Case is significant in variable names; @code{a} and @code{A} -are distinct variables. - -A variable name is a valid expression by itself; it represents the -variable's current value. 
Variables are given new values with -@dfn{assignment operators}, @dfn{increment operators}, and -@dfn{decrement operators}. -@xref{Assignment Ops, ,Assignment Expressions}. -@c NEXT ED: Can also be changed by sub, gsub, split - -A few variables have special built-in meanings, such as @code{FS} (the -field separator), and @code{NF} (the number of fields in the current input -record). @xref{Built-in Variables}, for a list of the built-in variables. -These built-in variables can be used and assigned just like all other -variables, but their values are also used or changed automatically by -@command{awk}. All built-in variables' names are entirely uppercase. - -Variables in @command{awk} can be assigned either numeric or string values. -The kind of value a variable holds can change over the life of a program. -By default, variables are initialized to the empty string, which -is zero if converted to a number. There is no need to -``initialize'' each variable explicitly in @command{awk}, -which is what you would do in C and in most other traditional languages. - -@node Assignment Options, , Using Variables, Variables -@subsection Assigning Variables on the Command Line - -Any @command{awk} variable can be set by including a @dfn{variable assignment} -among the arguments on the command line when @command{awk} is invoked -(@pxref{Other Arguments, ,Other Command-Line Arguments}). -Such an assignment has the following form: - -@example -@var{variable}=@var{text} -@end example - -@noindent -With it, a variable is set either at the beginning of the -@command{awk} run or in between input files. -When the assignment is preceded with the @option{-v} option, -as in the following: - -@example --v @var{variable}=@var{text} -@end example - -@noindent -the variable is set at the very beginning, even before the -@code{BEGIN} rules are run. The @option{-v} option and its assignment -must precede all the @value{FN} arguments, as well as the program text. 
-(@xref{Options, ,Command-Line Options}, for more information about -the @option{-v} option.) -Otherwise, the variable assignment is performed at a time determined by -its position among the input file arguments---after the processing of the -preceding input file argument. For example: - -@example -awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list -@end example - -@noindent -prints the value of field number @code{n} for all input records. Before -the first file is read, the command line sets the variable @code{n} -equal to four. This causes the fourth field to be printed in lines from -the file @file{inventory-shipped}. After the first file has finished, -but before the second file is started, @code{n} is set to two, so that the -second field is printed in lines from @file{BBS-list}: - -@example -$ awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list -@print{} 15 -@print{} 24 -@dots{} -@print{} 555-5553 -@print{} 555-3412 -@dots{} -@end example - -@cindex dark corner -Command-line arguments are made available for explicit examination by -the @command{awk} program in an array named @code{ARGV} -(@pxref{ARGC and ARGV, ,Using @code{ARGC} and @code{ARGV}}). -@command{awk} processes the values of command-line assignments for escape -sequences -@value{DARKCORNER} -(@pxref{Escape Sequences}). - -@node Conversion, Arithmetic Ops, Variables, Expressions -@section Conversion of Strings and Numbers - -@cindex conversion of strings and numbers -Strings are converted to numbers and numbers are converted to strings, if the context -of the @command{awk} program demands it. For example, if the value of -either @code{foo} or @code{bar} in the expression @samp{foo + bar} -happens to be a string, it is converted to a number before the addition -is performed. If numeric values appear in string concatenation, they -are converted to strings. 
Consider the following: - -@example -two = 2; three = 3 -print (two three) + 4 -@end example - -@noindent -This prints the (numeric) value 27. The numeric values of -the variables @code{two} and @code{three} are converted to strings and -concatenated together. The resulting string is converted back to the -number 23, to which four is then added. - -@cindex null string -@cindex empty string -@cindex type conversion -If, for some reason, you need to force a number to be converted to a -string, concatenate the empty string, @code{""}, with that number. -To force a string to be converted to a number, add zero to that string. -A string is converted to a number by interpreting any numeric prefix -of the string as numerals: -@code{"2.5"} converts to 2.5, @code{"1e3"} converts to 1000, and @code{"25fix"} -has a numeric value of 25. -Strings that can't be interpreted as valid numbers convert to zero. - -@cindex @code{CONVFMT} variable -The exact manner in which numbers are converted into strings is controlled -by the @command{awk} built-in variable @code{CONVFMT} (@pxref{Built-in Variables}). -Numbers are converted using the @code{sprintf} function -with @code{CONVFMT} as the format -specifier -(@pxref{String Functions, ,String Manipulation Functions}). - -@code{CONVFMT}'s default value is @code{"%.6g"}, which produces a value with -at most six significant digits. For some applications, you might want to -change it to specify more precision. -On most modern machines, -17 digits is enough to capture a floating-point number's -value exactly, -most of the time.@footnote{Pathological cases can require up to -752 digits (!), but we doubt that you need to worry about this.} - -@cindex dark corner -Strange results can occur if you set @code{CONVFMT} to a string that doesn't -tell @code{sprintf} how to format floating-point numbers in a useful way. -For example, if you forget the @samp{%} in the format, @command{awk} converts -all numbers to the same constant string. 
-As a special case, if a number is an integer, then the result of converting -it to a string is @emph{always} an integer, no matter what the value of -@code{CONVFMT} may be. Given the following code fragment: - -@example -CONVFMT = "%2.2f" -a = 12 -b = a "" -@end example - -@noindent -@code{b} has the value @code{"12"}, not @code{"12.00"}. -@value{DARKCORNER} - -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@cindex @code{OFMT} variable -Prior to the POSIX standard, @command{awk} used the value -of @code{OFMT} for converting numbers to strings. @code{OFMT} -specifies the output format to use when printing numbers with @code{print}. -@code{CONVFMT} was introduced in order to separate the semantics of -conversion from the semantics of printing. Both @code{CONVFMT} and -@code{OFMT} have the same default value: @code{"%.6g"}. In the vast majority -of cases, old @command{awk} programs do not change their behavior. -However, these semantics for @code{OFMT} are something to keep in mind if you must -port your new style program to older implementations of @command{awk}. -We recommend -that instead of changing your programs, just port @command{gawk} itself. -@xref{Print, ,The @code{print} Statement}, -for more information on the @code{print} statement. - -@node Arithmetic Ops, Concatenation, Conversion, Expressions -@section Arithmetic Operators -@cindex arithmetic operators -@cindex operators, arithmetic -@cindex addition -@cindex subtraction -@cindex multiplication -@cindex division -@cindex remainder -@cindex quotient -@cindex exponentiation - -The @command{awk} language uses the common arithmetic operators when -evaluating expressions. All of these arithmetic operators follow normal -precedence rules and work as you would expect them to. 
- -The following example uses a file named @file{grades}, which contains -a list of student names as well as three test scores per student (it's -a small class): - -@example -Pat 100 97 58 -Sandy 84 72 93 -Chris 72 92 89 -@end example - -@noindent -This program takes the file @file{grades} and prints the average -of the scores: - -@example -$ awk '@{ sum = $2 + $3 + $4 ; avg = sum / 3 -> print $1, avg @}' grades -@print{} Pat 85 -@print{} Sandy 83 -@print{} Chris 84.3333 -@end example - -The following list provides the arithmetic operators in @command{awk}, in order from -the highest precedence to the lowest: - -@table @code -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@item @var{x} ^ @var{y} -@itemx @var{x} ** @var{y} -Exponentiation; @var{x} raised to the @var{y} power. @samp{2 ^ 3} has -the value eight; the character sequence @samp{**} is equivalent to -@samp{^}. - -@item - @var{x} -Negation. - -@item + @var{x} -Unary plus; the expression is converted to a number. - -@item @var{x} * @var{y} -Multiplication. - -@cindex common mistakes -@cindex mistakes, common -@cindex errors, common -@item @var{x} / @var{y} -Division; because all numbers in @command{awk} are floating-point -numbers, the result is @emph{not} rounded to an integer---@samp{3 / 4} has -the value 0.75. (It is a common mistake, especially for C programmers, -to forget that @emph{all} numbers in @command{awk} are floating-point, -and that division of integer-looking constants produces a real number, -not an integer.) - -@item @var{x} % @var{y} -Remainder; further discussion is provided in the text, just -after this list. - -@item @var{x} + @var{y} -Addition. - -@item @var{x} - @var{y} -Subtraction. -@end table - -Unary plus and minus have the same precedence, -the multiplication operators all have the same precedence, and -addition and subtraction have the same precedence. 
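The point about division in the list above is easy to check from the command line. The following is a small sketch; @code{arith_demo} is a hypothetical wrapper name used only so the commands can be run as a unit, and @code{int} is the standard @command{awk} truncation function.

```shell
# arith_demo is a hypothetical wrapper name used only for this illustration.
arith_demo() {
    awk 'BEGIN {
        print 3 / 4        # floating-point division; the result is not truncated
        print int(7 / 2)   # int() truncates toward zero when an integer is wanted
        print 2 ^ 3        # exponentiation, using the portable ^ spelling
    }'
}

arith_demo
```

Run as shown, this prints 0.75, 3, and 8 on separate lines. A C programmer expecting @samp{3 / 4} to be zero needs the explicit @code{int} call.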
- -@cindex differences between @command{gawk} and @command{awk} -When computing the remainder of @code{@var{x} % @var{y}}, -the quotient is rounded toward zero to an integer and -multiplied by @var{y}. This result is subtracted from @var{x}; -this operation is sometimes known as ``trunc-mod.'' The following -relation always holds: - -@example -b * int(a / b) + (a % b) == a -@end example - -One possibly undesirable effect of this definition of remainder is that -@code{@var{x} % @var{y}} is negative if @var{x} is negative. Thus: - -@example --17 % 8 = -1 -@end example - -In other @command{awk} implementations, the signedness of the remainder -may be machine dependent. -@c !!! what does posix say? - -@cindex portability issues -@strong{Note:} -The POSIX standard only specifies the use of @samp{^} -for exponentiation. -For maximum portability, do not use the @samp{**} operator. - -@node Concatenation, Assignment Ops, Arithmetic Ops, Expressions -@section String Concatenation -@cindex Kernighan, Brian -@quotation -@i{It seemed like a good idea at the time.}@* -Brian Kernighan -@end quotation - -@cindex string operators -@cindex operators, string -@cindex concatenation -There is only one string operation: concatenation. It does not have a -specific operator to represent it. Instead, concatenation is performed by -writing expressions next to one another, with no operator. For example: - -@example -$ awk '@{ print "Field number one: " $1 @}' BBS-list -@print{} Field number one: aardvark -@print{} Field number one: alpo-net -@dots{} -@end example - -Without the space in the string constant after the @samp{:}, the line -runs together. 
For example: - -@example -$ awk '@{ print "Field number one:" $1 @}' BBS-list -@print{} Field number one:aardvark -@print{} Field number one:alpo-net -@dots{} -@end example - -@cindex common mistakes -@cindex mistakes, common -@cindex errors, common -Because string concatenation does not have an explicit operator, it is -often necessary to ensure that it happens at the right time by using -parentheses to enclose the items to concatenate. For example, the -following code fragment does not concatenate @code{file} and @code{name} -as you might expect: - -@example -file = "file" -name = "name" -print "something meaningful" > file name -@end example - -@noindent -It is necessary to use the following: - -@example -print "something meaningful" > (file name) -@end example - -@cindex order of evaluation, concatenation -@cindex concatenation evaluation order -@cindex evaluation, order of -@cindex side effects -Parentheses should be used around concatenation in all but the -most common contexts, such as on the righthand side of @samp{=}. -Be careful about the kinds of expressions used in string concatenation. -In particular, the order of evaluation of expressions used for concatenation -is undefined in the @command{awk} language. Consider this example: - -@example -BEGIN @{ - a = "don't" - print (a " " (a = "panic")) -@} -@end example - -@noindent -It is not defined whether the assignment to @code{a} happens -before or after the value of @code{a} is retrieved for producing the -concatenated value. The result could be either @samp{don't panic}, -or @samp{panic panic}. -@c see test/nasty.awk for a worse example -The precedence of concatenation, when mixed with other operators, is often -counter-intuitive.
Consider this example: - -@ignore -> To: bug-gnu-utils@@gnu.org -> CC: arnold@gnu.org -> Subject: gawk 3.0.4 bug with {print -12 " " -24} -> From: Russell Schulz <Russell_Schulz@locutus.ofB.ORG> -> Date: Tue, 8 Feb 2000 19:56:08 -0700 -> -> gawk 3.0.4 on NT gives me: -> -> prompt> cat bad.awk -> BEGIN { print -12 " " -24; } -> -> prompt> gawk -f bad.awk -> -12-24 -> -> when I would expect -> -> -12 -24 -> -> I have not investigated the source, or other implementations. The -> bug is there on my NT and DOS versions 2.15.6 . -@end ignore - -@example -$ awk 'BEGIN @{ print -12 " " -24 @}' -@print{} -12-24 -@end example - -This ``obviously'' is concatenating @minus{}12, a space, and @minus{}24. -But where did the space disappear to? -The answer lies in the combination of operator precedences and -@command{awk}'s automatic conversion rules. To get the desired result, -write the program in the following manner: - -@example -$ awk 'BEGIN @{ print -12 " " (-24) @}' -@print{} -12 -24 -@end example - -This forces @command{awk} to treat the @samp{-} on the @samp{-24} as unary. -Otherwise, it's parsed as follows: - -@display - @minus{}12 (@code{"@ "} @minus{} 24) -@result{} @minus{}12 (0 @minus{} 24) -@result{} @minus{}12 (@minus{}24) -@result{} @minus{}12@minus{}24 -@end display - -As mentioned earlier, -when doing concatenation, @emph{parenthesize}. Otherwise, -you're never quite sure what you'll get. - -@node Assignment Ops, Increment Ops, Concatenation, Expressions -@section Assignment Expressions -@cindex assignment operators -@cindex operators, assignment -@cindex expression, assignment - -@cindex @code{=} operator -An @dfn{assignment} is an expression that stores a (usually different) -value into a variable. For example, let's assign the value one to the variable -@code{z}: - -@example -z = 1 -@end example - -After this expression is executed, the variable @code{z} has the value one. -Whatever old value @code{z} had before the assignment is forgotten. 
- -Assignments can also store string values. For example, the -following stores -the value @code{"this food is good"} in the variable @code{message}: - -@example -thing = "food" -predicate = "good" -message = "this " thing " is " predicate -@end example - -@noindent -@cindex side effects -This also illustrates string concatenation. -The @samp{=} sign is called an @dfn{assignment operator}. It is the -simplest assignment operator because the value of the righthand -operand is stored unchanged. -Most operators (addition, concatenation, and so on) have no effect -except to compute a value. If the value isn't used, there's no reason to -use the operator. An assignment operator is different; it does -produce a value, but even if you ignore it, the assignment still -makes itself felt through the alteration of the variable. We call this -a @dfn{side effect}. - -@cindex lvalue -@cindex rvalue -The lefthand operand of an assignment need not be a variable -(@pxref{Variables}); it can also be a field -(@pxref{Changing Fields, ,Changing the Contents of a Field}) or -an array element (@pxref{Arrays, ,Arrays in @command{awk}}). -These are all called @dfn{lvalues}, -which means they can appear on the lefthand side of an assignment operator. -The righthand operand may be any expression; it produces the new value -that the assignment stores in the specified variable, field, or array -element. (Such values are called @dfn{rvalues}). - -@cindex types of variables -It is important to note that variables do @emph{not} have permanent types. -A variable's type is simply the type of whatever value it happens -to hold at the moment. In the following program fragment, the variable -@code{foo} has a numeric value at first, and a string value later on: - -@example -foo = 1 -print foo -foo = "bar" -print foo -@end example - -@noindent -When the second assignment gives @code{foo} a string value, the fact that -it previously had a numeric value is forgotten. 
- -String values that do not begin with a digit have a numeric value of -zero. After executing the following code, the value of @code{foo} is five: - -@example -foo = "a string" -foo = foo + 5 -@end example - -@noindent -@strong{Note:} Using a variable as a number and then later as a string -can be confusing and is poor programming style. The previous two examples -illustrate how @command{awk} works, @emph{not} how you should write your -own programs! - -An assignment is an expression, so it has a value---the same value that -is assigned. Thus, @samp{z = 1} is an expression with the value one. -One consequence of this is that you can write multiple assignments together, -such as: - -@example -x = y = z = 5 -@end example - -@noindent -This example stores the value five in all three variables -(@code{x}, @code{y}, and @code{z}). -It does so because the -value of @samp{z = 5}, which is five, is stored into @code{y} and then -the value of @samp{y = z = 5}, which is five, is stored into @code{x}. - -Assignments may be used anywhere an expression is called for. For -example, it is valid to write @samp{x != (y = 1)} to set @code{y} to one, -and then test whether @code{x} equals one. But this style tends to make -programs hard to read; such nesting of assignments should be avoided, -except perhaps in a one-shot program. - -Aside from @samp{=}, there are several other assignment operators that -do arithmetic with the old value of the variable. For example, the -operator @samp{+=} computes a new value by adding the righthand value -to the old value of the variable. Thus, the following assignment adds -five to the value of @code{foo}: - -@example -foo += 5 -@end example - -@noindent -This is equivalent to the following: - -@example -foo = foo + 5 -@end example - -@noindent -Use whichever makes the meaning of your program clearer. 
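The chained assignment and the @samp{+=} form described above can be tried together in one short program. This is only a sketch; @code{assign_demo} is a hypothetical wrapper name and is not part of @command{awk}.

```shell
# assign_demo is a hypothetical wrapper name used only for this illustration.
assign_demo() {
    awk 'BEGIN {
        x = y = z = 5      # z = 5 is evaluated first; its value is stored in y, then x
        print x, y, z
        foo = 1
        foo += 5           # for a plain variable, same effect as foo = foo + 5
        print foo
    }'
}

assign_demo
```

This prints @samp{5 5 5} followed by @samp{6}.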
- -There are situations where using @samp{+=} (or any assignment operator) -is @emph{not} the same as simply repeating the lefthand operand in the -righthand expression. For example: - -@cindex Rankin, Pat -@example -# Thanks to Pat Rankin for this example -BEGIN @{ - foo[rand()] += 5 - for (x in foo) - print x, foo[x] - - bar[rand()] = bar[rand()] + 5 - for (x in bar) - print x, bar[x] -@} -@end example - -@noindent -The indices of @code{bar} are practically guaranteed to be different, because -@code{rand} returns different values each time it is called. -(Arrays and the @code{rand} function haven't been covered yet. -@xref{Arrays, ,Arrays in @command{awk}}, -and see @ref{Numeric Functions}, for more information). -This example illustrates an important fact about assignment -operators: the lefthand expression is only evaluated @emph{once}. -It is up to the implementation as to which expression is evaluated -first, the lefthand or the righthand. -Consider this example: - -@example -i = 1 -a[i += 2] = i + 1 -@end example - -@noindent -The value of @code{a[3]} could be either two or four. - -Here is a table of the arithmetic assignment operators. In each -case, the righthand operand is an expression whose value is converted -to a number. - -@ignore -@table @code -@item @var{lvalue} += @var{increment} -Adds @var{increment} to the value of @var{lvalue}. - -@item @var{lvalue} -= @var{decrement} -Subtracts @var{decrement} from the value of @var{lvalue}. - -@item @var{lvalue} *= @var{coefficient} -Multiplies the value of @var{lvalue} by @var{coefficient}. - -@item @var{lvalue} /= @var{divisor} -Divides the value of @var{lvalue} by @var{divisor}. - -@item @var{lvalue} %= @var{modulus} -Sets @var{lvalue} to its remainder by @var{modulus}. - -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@item @var{lvalue} ^= @var{power} -@itemx @var{lvalue} **= @var{power} -Raises @var{lvalue} to the power @var{power}. 
-(Only the @samp{^=} operator is specified by POSIX.) -@end table -@end ignore - -@cindex @code{+=} operator -@cindex @code{-=} operator -@cindex @code{*=} operator -@cindex @code{/=} operator -@cindex @code{%=} operator -@cindex @code{^=} operator -@cindex @code{**=} operator -@multitable {@var{lvalue} *= @var{coefficient}} {Subtracts @var{decrement} from the value of @var{lvalue}.} -@item @var{lvalue} @code{+=} @var{increment} @tab Adds @var{increment} to the value of @var{lvalue}. - -@item @var{lvalue} @code{-=} @var{decrement} @tab Subtracts @var{decrement} from the value of @var{lvalue}. - -@item @var{lvalue} @code{*=} @var{coefficient} @tab Multiplies the value of @var{lvalue} by @var{coefficient}. - -@item @var{lvalue} @code{/=} @var{divisor} @tab Divides the value of @var{lvalue} by @var{divisor}. - -@item @var{lvalue} @code{%=} @var{modulus} @tab Sets @var{lvalue} to its remainder by @var{modulus}. - -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@item @var{lvalue} @code{^=} @var{power} @tab -@item @var{lvalue} @code{**=} @var{power} @tab Raises @var{lvalue} to the power @var{power}. -@end multitable - -@cindex portability issues -@strong{Note:} -Only the @samp{^=} operator is specified by POSIX. -For maximum portability, do not use the @samp{**=} operator. - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Syntactic Ambiguities Between @samp{/=} and Regular Expressions -@cindex advanced notes - -@c derived from email from "Nelson H. F. Beebe" <beebe@math.utah.edu> -@c Date: Mon, 1 Sep 1997 13:38:35 -0600 (MDT) - -@cindex dark corner -@cindex ambiguity, syntactic: @code{/=} operator vs. @code{/=@dots{}/} regexp constant -@cindex syntactic ambiguity: @code{/=} operator vs. @code{/=@dots{}/} regexp constant -@cindex @code{/=} operator vs. @code{/=@dots{}/} regexp constant -There is a syntactic ambiguity between the @samp{/=} assignment -operator and regexp constants whose first character is an @samp{=}. 
-@value{DARKCORNER} -This is most notable in commercial @command{awk} versions. -For example: - -@example -$ awk /==/ /dev/null -@error{} awk: syntax error at source line 1 -@error{} context is -@error{} >>> /= <<< -@error{} awk: bailing out at source line 1 -@end example - -@noindent -A workaround is: - -@example -awk '/[=]=/' /dev/null -@end example - -@command{gawk} does not have this problem, -nor do the other -freely-available versions described in -@ref{Other Versions, , Other Freely Available @command{awk} Implementations}. - -@node Increment Ops, Truth Values, Assignment Ops, Expressions -@section Increment and Decrement Operators - -@cindex increment operators -@cindex operators, increment -@dfn{Increment} and @dfn{decrement operators} increase or decrease the value of -a variable by one. An assignment operator can do the same thing, so -the increment operators add no power to the @command{awk} language; however they -are convenient abbreviations for very common operations. - -@cindex side effects -The operator used for adding one is written @samp{++}. It can be used to increment -a variable either before or after taking its value. -To pre-increment a variable @code{v}, write @samp{++v}. This adds -one to the value of @code{v}---that new value is also the value of the -expression. (The assignment expression @samp{v += 1} is completely -equivalent.) -Writing the @samp{++} after the variable specifies post-increment. This -increments the variable value just the same; the difference is that the -value of the increment expression itself is the variable's @emph{old} -value. Thus, if @code{foo} has the value four, then the expression @samp{foo++} -has the value four, but it changes the value of @code{foo} to five. -In other words, the operator returns the old value of the variable, -but with the side effect of incrementing it. - -The post-increment @samp{foo++} is nearly the same as writing @samp{(foo -+= 1) - 1}. 
It is not perfectly equivalent because all numbers in -@command{awk} are floating-point---in floating-point, @samp{foo + 1 - 1} does -not necessarily equal @code{foo}. But the difference is minute as -long as you stick to numbers that are fairly small (less than 10e12). - -Fields and array elements are incremented -just like variables. (Use @samp{$(i++)} when you want to do a field reference -and a variable increment at the same time. The parentheses are necessary -because of the precedence of the field reference operator @samp{$}.) - -@cindex decrement operators -@cindex operators, decrement -The decrement operator @samp{--} works just like @samp{++}, except that -it subtracts one instead of adding it. As with @samp{++}, it can be used before -the lvalue to pre-decrement or after it to post-decrement. -Following is a summary of increment and decrement expressions: - -@table @code -@cindex @code{++} operator -@item ++@var{lvalue} -This expression increments @var{lvalue}, and the new value becomes the -value of the expression. - -@item @var{lvalue}++ -This expression increments @var{lvalue}, but -the value of the expression is the @emph{old} value of @var{lvalue}. - -@cindex @code{--} operator -@item --@var{lvalue} -This expression is -like @samp{++@var{lvalue}}, but instead of adding, it subtracts. It -decrements @var{lvalue} and delivers the value that is the result. - -@item @var{lvalue}-- -This expression is -like @samp{@var{lvalue}++}, but instead of adding, it subtracts. It -decrements @var{lvalue}. The value of the expression is the @emph{old} -value of @var{lvalue}. -@end table - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Operator Evaluation Order -@cindex advanced notes -@cindex precedence -@cindex operator precedence -@cindex portability issues -@cindex evaluation, order of -@cindex Marx, Groucho -@quotation -@i{Doctor, doctor! 
It hurts when I do this!@* -So don't do that!}@* -Groucho Marx -@end quotation - -@noindent -What happens for something like the following? - -@example -b = 6 -print b += b++ -@end example - -@noindent -Or something even stranger? - -@example -b = 6 -b += ++b + b++ -print b -@end example - -@cindex side effects -In other words, when do the various side effects prescribed by the -postfix operators (@samp{b++}) take effect? -When side effects happen is @dfn{implementation defined}. -In other words, it is up to the particular version of @command{awk}. -The result for the first example may be 12 or 13, and for the second, it -may be 22 or 23. - -In short, doing things like this is not recommended and definitely -not anything that you can rely upon for portability. -You should avoid such things in your own programs. -@c You'll sleep better at night and be able to look at yourself -@c in the mirror in the morning. - -@node Truth Values, Typing and Comparison, Increment Ops, Expressions -@section True and False in @command{awk} -@cindex truth values -@cindex logical true -@cindex logical false - -@cindex null string -@cindex empty string -Many programming languages have a special representation for the concepts -of ``true'' and ``false.'' Such languages usually use the special -constants @code{true} and @code{false}, or perhaps their uppercase -equivalents. -However, @command{awk} is different. -It borrows a very simple concept of true and -false from C. In @command{awk}, any nonzero numeric value @emph{or} any -non-empty string value is true. Any other value (zero or the null -string @code{""}) is false. 
The following program prints @samp{A strange -truth value} three times: - -@example -BEGIN @{ - if (3.1415927) - print "A strange truth value" - if ("Four Score And Seven Years Ago") - print "A strange truth value" - if (j = 57) - print "A strange truth value" -@} -@end example - -@cindex dark corner -There is a surprising consequence of the ``nonzero or non-null'' rule: -the string constant @code{"0"} is actually true, because it is non-null. -@value{DARKCORNER} - -@node Typing and Comparison, Boolean Ops, Truth Values, Expressions -@section Variable Typing and Comparison Expressions -@cindex comparison expressions -@cindex expression, comparison -@cindex expression, matching -@cindex relational operators -@cindex operators, relational -@cindex regexp operators -@cindex variable typing -@cindex types of variables -@quotation -@i{The Guide is definitive. Reality is frequently inaccurate.}@* -The Hitchhiker's Guide to the Galaxy -@end quotation - -Unlike other programming languages, @command{awk} variables do not have a -fixed type. Instead, they can be either a number or a string, depending -upon the value that is assigned to them. - -@cindex numeric string -The 1992 POSIX standard introduced -the concept of a @dfn{numeric string}, which is simply a string that looks -like a number---for example, @code{@w{" +2"}}. This concept is used -for determining the type of a variable. -The type of the variable is important because the types of two variables -determine how they are compared. -In @command{gawk}, variable typing follows these rules: - -@itemize @bullet -@item -A numeric constant or the result of a numeric operation has the @var{numeric} -attribute. - -@item -A string constant or the result of a string operation has the @var{string} -attribute. - -@item -Fields, @code{getline} input, @code{FILENAME}, @code{ARGV} elements, -@code{ENVIRON} elements, and the -elements of an array created by @code{split} that are numeric strings -have the @var{strnum} attribute. 
Otherwise, they have the @var{string} -attribute. -Uninitialized variables also have the @var{strnum} attribute. - -@item -Attributes propagate across assignments but are not changed by -any use. -@c (Although a use may cause the entity to acquire an additional -@c value such that it has both a numeric and string value, this leaves the -@c attribute unchanged.) -@c This is important but not relevant -@end itemize - -The last rule is particularly important. In the following program, -@code{a} has numeric type, even though it is later used in a string -operation: - -@example -BEGIN @{ - a = 12.345 - b = a " is a cute number" - print b -@} -@end example - -When two operands are compared, either string comparison or numeric comparison -may be used. This depends upon the attributes of the operands, according to the -following symmetric matrix: - -@c thanks to Karl Berry, kb@cs.umb.edu, for major help with TeX tables -@tex -\centerline{ -\vbox{\bigskip % space above the table (about 1 linespace) -% Because we have vertical rules, we can't let TeX insert interline space -% in its usual way. -\offinterlineskip -% -% Define the table template. & separates columns, and \cr ends the -% template (and each row). # is replaced by the text of that entry on -% each row. The template for the first column breaks down like this: -% \strut -- a way to make each line have the height and depth -% of a normal line of type, since we turned off interline spacing. -% \hfil -- infinite glue; has the effect of right-justifying in this case. -% # -- replaced by the text (for instance, `STRNUM', in the last row). -% \quad -- about the width of an `M'. Just separates the columns. -% -% The second column (\vrule#) is what generates the vertical rule that -% spans table rows. -% -% The doubled && before the next entry means `repeat the following -% template as many times as necessary on each line' -- in our case, twice. 
-% -% The template itself, \quad#\hfil, left-justifies with a little space before. -% -\halign{\strut\hfil#\quad&\vrule#&&\quad#\hfil\cr - &&STRING &NUMERIC &STRNUM\cr -% The \omit tells TeX to skip inserting the template for this column on -% this particular row. In this case, we only want a little extra space -% to separate the heading row from the rule below it. the depth 2pt -- -% `\vrule depth 2pt' is that little space. -\omit &depth 2pt\cr -% This is the horizontal rule below the heading. Since it has nothing to -% do with the columns of the table, we use \noalign to get it in there. -\noalign{\hrule} -% Like above, this time a little more space. -\omit &depth 4pt\cr -% The remaining rows have nothing special about them. -STRING &&string &string &string\cr -NUMERIC &&string &numeric &numeric\cr -STRNUM &&string &numeric &numeric\cr -}}} -@end tex -@ifnottex -@display - +---------------------------------------------- - | STRING NUMERIC STRNUM ---------+---------------------------------------------- - | -STRING | string string string - | -NUMERIC | string numeric numeric - | -STRNUM | string numeric numeric ---------+---------------------------------------------- -@end display -@end ifnottex - -The basic idea is that user input that looks numeric---and @emph{only} -user input---should be treated as numeric, even though it is actually -made of characters and is therefore also a string. -Thus, for example, the string constant @w{@code{" +3.14"}} -is a string, even though it looks numeric, -and is @emph{never} treated as a number for comparison -purposes. - -In short, when one operand is a ``pure'' string, such as a string -constant, then a string comparison is performed. Otherwise, a -numeric comparison is performed.@footnote{The POSIX standard is under -revision. The revised standard's rules for typing and comparison are -the same as just described for @command{gawk}.} - -@dfn{Comparison expressions} compare strings or numbers for -relationships such as equality. 
They are written using @dfn{relational -operators}, which are a superset of those in C. Here is a table of -them: - -@cindex relational operators -@cindex operators, relational -@cindex @code{<} operator -@cindex @code{<=} operator -@cindex @code{>} operator -@cindex @code{>=} operator -@cindex @code{==} operator -@cindex @code{!=} operator -@cindex @code{~} operator -@cindex @code{!~} operator -@cindex @code{in} operator -@table @code -@item @var{x} < @var{y} -True if @var{x} is less than @var{y}. - -@item @var{x} <= @var{y} -True if @var{x} is less than or equal to @var{y}. - -@item @var{x} > @var{y} -True if @var{x} is greater than @var{y}. - -@item @var{x} >= @var{y} -True if @var{x} is greater than or equal to @var{y}. - -@item @var{x} == @var{y} -True if @var{x} is equal to @var{y}. - -@item @var{x} != @var{y} -True if @var{x} is not equal to @var{y}. - -@item @var{x} ~ @var{y} -True if the string @var{x} matches the regexp denoted by @var{y}. - -@item @var{x} !~ @var{y} -True if the string @var{x} does not match the regexp denoted by @var{y}. - -@item @var{subscript} in @var{array} -True if the array @var{array} has an element with the subscript @var{subscript}. -@end table - -Comparison expressions have the value one if true and zero if false. -When comparing operands of mixed types, numeric operands are converted -to strings using the value of @code{CONVFMT} -(@pxref{Conversion, ,Conversion of Strings and Numbers}). - -Strings are compared -by comparing the first character of each, then the second character of each, -and so on. Thus, @code{"10"} is less than @code{"9"}. If there are two -strings where one is a prefix of the other, the shorter string is less than -the longer one. Thus, @code{"abc"} is less than @code{"abcd"}. - -@cindex common mistakes -@cindex mistakes, common -@cindex errors, common -It is very easy to accidentally mistype the @samp{==} operator and -leave off one of the @samp{=} characters. 
The result is still valid @command{awk} -code, but the program does not do what is intended: - -@example -if (a = b) # oops! should be a == b - @dots{} -else - @dots{} -@end example - -@noindent -Unless @code{b} happens to be zero or the null string, the @code{if} -part of the test always succeeds. Because the operators are -so similar, this kind of error is very difficult to spot when -scanning the source code. - -The following table of expressions illustrates the kind of comparison -@command{gawk} performs, as well as what the result of the comparison is: - -@table @code -@item 1.5 <= 2.0 -numeric comparison (true) - -@item "abc" >= "xyz" -string comparison (false) - -@item 1.5 != " +2" -string comparison (true) - -@item "1e2" < "3" -string comparison (true) - -@item a = 2; b = "2" -@itemx a == b -string comparison (true) - -@item a = 2; b = " +2" -@itemx a == b -string comparison (false) -@end table - -In the next example: - -@example -$ echo 1e2 3 | awk '@{ print ($1 < $2) ? "true" : "false" @}' -@print{} false -@end example - -@cindex comparisons, string vs. regexp -@cindex string comparison vs. regexp comparison -@cindex regexp comparison vs. string comparison -@noindent -the result is @samp{false} because both @code{$1} and @code{$2} -are user input. They are numeric strings---therefore both have -the @var{strnum} attribute, dictating a numeric comparison. -The purpose of the comparison rules and the use of numeric strings is -to attempt to produce the behavior that is ``least surprising,'' while -still ``doing the right thing.'' -String comparisons and regular expression comparisons are very different. -For example: - -@example -x == "foo" -@end example - -@noindent -has the value one (i.e., is true) if the variable @code{x} -is precisely @samp{foo}. By contrast: - -@example -x ~ /foo/ -@end example - -@noindent -has the value one if @code{x} contains @samp{foo}, such as -@code{"Oh, what a fool am I!"}. 
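The difference between the two tests can be checked directly. The following is a minimal sketch, assuming a POSIX shell and any POSIX-compatible @command{awk} on the search path; the function name @code{compare_matches} is hypothetical and used only for illustration. It feeds three records to @command{awk} and prints, for each one, the value of the exact string comparison followed by the value of the regexp match:

```shell
# compare_matches is an illustrative name, not part of awk or gawk.
compare_matches() {
    printf '%s\n' 'foo' 'Oh, what a fool am I!' 'bar' |
    awk '{
        exact    = ($0 == "foo")   # 1 only when the record is exactly "foo"
        contains = ($0 ~ /foo/)    # 1 when "foo" appears anywhere in the record
        print exact, contains
    }'
}
compare_matches
```

Only the first record is exactly @samp{foo}, so it alone yields one for the @samp{==} test. The first two records both contain @samp{foo}, so both yield one for the @samp{~} test.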
- -The righthand operand of the @samp{~} and @samp{!~} operators may be -either a regexp constant (@code{/@dots{}/}) or an ordinary -expression. In the latter case, the value of the expression as a string is used as a -dynamic regexp (@pxref{Regexp Usage, ,How to Use Regular Expressions}; also -@pxref{Computed Regexps, ,Using Dynamic Regexps}). - -@cindex regexp as expression -In modern implementations of @command{awk}, a constant regular -expression in slashes by itself is also an expression. The regexp -@code{/@var{regexp}/} is an abbreviation for the following comparison expression: - -@example -$0 ~ /@var{regexp}/ -@end example - -One special place where @code{/foo/} is @emph{not} an abbreviation for -@samp{$0 ~ /foo/} is when it is the righthand operand of @samp{~} or -@samp{!~}. -@xref{Using Constant Regexps, ,Using Regular Expression Constants}, -where this is discussed in more detail. - -@node Boolean Ops, Conditional Exp, Typing and Comparison, Expressions -@section Boolean Expressions -@cindex expression, boolean -@cindex boolean expressions -@cindex operators, boolean -@cindex boolean operators -@cindex logical operators -@cindex operators, logical -@cindex short-circuit operators -@cindex operators, short-circuit -@cindex AND logical operator -@cindex OR logical operator -@cindex NOT logical operator -@cindex @code{&&} operator -@cindex @code{||} operator -@cindex @code{!} operator - -A @dfn{Boolean expression} is a combination of comparison expressions or -matching expressions, using the Boolean operators ``or'' -(@samp{||}), ``and'' (@samp{&&}), and ``not'' (@samp{!}), along with -parentheses to control nesting. The truth value of the Boolean expression is -computed by combining the truth values of the component expressions. -Boolean expressions are also referred to as @dfn{logical expressions}. -The terms are equivalent. - -Boolean expressions can be used wherever comparison and matching -expressions can be used. 
They can be used in @code{if}, @code{while}, -@code{do}, and @code{for} statements -(@pxref{Statements, ,Control Statements in Actions}). -They have numeric values (one if true, zero if false), that come into play -if the result of the Boolean expression is stored in a variable or -used in arithmetic. - -In addition, every Boolean expression is also a valid pattern, so -you can use one as a pattern to control the execution of rules. -The Boolean operators are: - -@table @code -@item @var{boolean1} && @var{boolean2} -True if both @var{boolean1} and @var{boolean2} are true. For example, -the following statement prints the current input record if it contains -both @samp{2400} and @samp{foo}: - -@example -if ($0 ~ /2400/ && $0 ~ /foo/) print -@end example - -@cindex side effects -The subexpression @var{boolean2} is evaluated only if @var{boolean1} -is true. This can make a difference when @var{boolean2} contains -expressions that have side effects. In the case of @samp{$0 ~ /foo/ && -($2 == bar++)}, the variable @code{bar} is not incremented if there is -no substring @samp{foo} in the record. - -@item @var{boolean1} || @var{boolean2} -True if at least one of @var{boolean1} or @var{boolean2} is true. -For example, the following statement prints all records in the input -that contain @emph{either} @samp{2400} or -@samp{foo} or both: - -@example -if ($0 ~ /2400/ || $0 ~ /foo/) print -@end example - -The subexpression @var{boolean2} is evaluated only if @var{boolean1} -is false. This can make a difference when @var{boolean2} contains -expressions that have side effects. - -@item ! @var{boolean} -True if @var{boolean} is false. For example, -the following program prints @samp{no home!} in -the unusual event that the @env{HOME} environment -variable is not defined: - -@example -BEGIN @{ if (! ("HOME" in ENVIRON)) - print "no home!" @} -@end example - -(The @code{in} operator is described in -@ref{Reference to Elements, ,Referring to an Array Element}.) 
-@end table - -The @samp{&&} and @samp{||} operators are called @dfn{short-circuit} -operators because of the way they work. Evaluation of the full expression -is ``short-circuited'' if the result can be determined part way through -its evaluation. - -@cindex line continuation -Statements that use @samp{&&} or @samp{||} can be continued simply -by putting a newline after them. But you cannot put a newline in front -of either of these operators without using backslash continuation -(@pxref{Statements/Lines, ,@command{awk} Statements Versus Lines}). - -@cindex flag variables -The actual value of an expression using the @samp{!} operator is -either one or zero, depending upon the truth value of the expression it -is applied to. -The @samp{!} operator is often useful for changing the sense of a flag -variable from false to true and back again. For example, the following -program is one way to print lines in between special bracketing lines: - -@example -$1 == "START" @{ interested = ! interested; next @} -interested == 1 @{ print @} -$1 == "END" @{ interested = ! interested; next @} -@end example - -@noindent -The variable @code{interested}, as with all @command{awk} variables, starts -out initialized to zero, which is also false. When a line is seen whose -first field is @samp{START}, the value of @code{interested} is toggled -to true, using @samp{!}. The next rule prints lines as long as -@code{interested} is true. When a line is seen whose first field is -@samp{END}, @code{interested} is toggled back to false. - -@ignore -Scott Deifik points out that this program isn't robust against -bogus input data, but the point is to illustrate the use of `!', -so we'll leave well enough alone. -@end ignore - -@strong{Note:} The @code{next} statement is discussed in -@ref{Next Statement, ,The @code{next} Statement}. -@code{next} tells @command{awk} to skip the rest of the rules, get the -next record, and start processing the rules over again at the top. 
-The reason it's there is to avoid printing the bracketing -@samp{START} and @samp{END} lines. - -@node Conditional Exp, Function Calls, Boolean Ops, Expressions -@section Conditional Expressions -@cindex conditional expression -@cindex expression, conditional - -A @dfn{conditional expression} is a special kind of expression that has -three operands. It allows you to use one expression's value to select -one of two other expressions. -The conditional expression is the same as in the C language, -as shown here: - -@example -@var{selector} ? @var{if-true-exp} : @var{if-false-exp} -@end example - -@noindent -There are three subexpressions. The first, @var{selector}, is always -computed first. If it is ``true'' (not zero or not null), then -@var{if-true-exp} is computed next and its value becomes the value of -the whole expression. Otherwise, @var{if-false-exp} is computed next -and its value becomes the value of the whole expression. -For example, the following expression produces the absolute value of @code{x}: - -@example -x >= 0 ? x : -x -@end example - -@cindex side effects -Each time the conditional expression is computed, only one of -@var{if-true-exp} and @var{if-false-exp} is used; the other is ignored. -This is important when the expressions have side effects. For example, -this conditional expression examines element @code{i} of either array -@code{a} or array @code{b}, and increments @code{i}: - -@example -x == y ? a[i++] : b[i++] -@end example - -@noindent -This is guaranteed to increment @code{i} exactly once, because each time -only one of the two increment expressions is executed -and the other is not. -@xref{Arrays, ,Arrays in @command{awk}}, -for more information about arrays. - -@cindex differences between @command{gawk} and @command{awk} -@cindex line continuation -As a minor @command{gawk} extension, -a statement that uses @samp{?:} can be continued simply -by putting a newline after either character. 
-However, putting a newline in front -of either character does not work without using backslash continuation -(@pxref{Statements/Lines, ,@command{awk} Statements Versus Lines}). -If @option{--posix} is specified -(@pxref{Options, , Command-Line Options}), then this extension is disabled. - -@node Function Calls, Precedence, Conditional Exp, Expressions -@section Function Calls -@cindex function call -@cindex calling a function - -A @dfn{function} is a name for a particular calculation. -This enables you to -ask for it by name at any point in the program. For -example, the function @code{sqrt} computes the square root of a number. - -A fixed set of functions are @dfn{built-in}, which means they are -available in every @command{awk} program. The @code{sqrt} function is one -of these. @xref{Built-in, ,Built-in Functions}, for a list of built-in -functions and their descriptions. In addition, you can define -functions for use in your program. -@xref{User-defined, ,User-Defined Functions}, -for instructions on how to do this. - -@cindex arguments in function call -The way to use a function is with a @dfn{function call} expression, -which consists of the function name followed immediately by a list of -@dfn{arguments} in parentheses. The arguments are expressions that -provide the raw materials for the function's calculations. -When there is more than one argument, they are separated by commas. If -there are no arguments, just write @samp{()} after the function name. -The following examples show function calls with and without arguments: - -@example -sqrt(x^2 + y^2) @i{one argument} -atan2(y, x) @i{two arguments} -rand() @i{no arguments} -@end example - -@strong{Caution:} -Do not put any space between the function name and the open-parenthesis! -A user-defined function name looks just like the name of a -variable---a space would make the expression look like concatenation of -a variable with an expression inside parentheses. 
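The call forms shown above can be exercised in a short script. This is a sketch under stated assumptions: a POSIX shell, a POSIX-compatible @command{awk}, and a hypothetical user-defined function @code{dist} that is not part of @command{awk}. Every call is written with the open-parenthesis immediately after the function name:

```shell
# call_forms is an illustrative wrapper name.
call_forms() {
    awk '
    # dist is a hypothetical user-defined function, not an awk built-in.
    function dist(x, y) { return sqrt(x^2 + y^2) }
    BEGIN {
        print dist(3, 4)               # user-defined, two arguments
        print atan2(0, 1)              # built-in, two arguments
        r = rand()                     # built-in, no arguments
        in_range = (r >= 0 && r < 1)   # rand() returns a value in [0, 1)
        print in_range
    }'
}
call_forms
```

Writing the user-defined call without a space keeps it from being read as a variable followed by a parenthesized expression.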
- -With built-in functions, space before the parenthesis is harmless, but -it is best not to get into the habit of using space, in order to avoid mistakes -with user-defined functions. Each function expects a particular number -of arguments. For example, the @code{sqrt} function must be called with -a single argument, the number whose square root is to be taken: - -@example -sqrt(@var{argument}) -@end example - -Some of the built-in functions have one or -more optional arguments. -If those arguments are not supplied, the functions -use a reasonable default value. -@xref{Built-in, ,Built-in Functions}, for full details. If arguments -are omitted in calls to user-defined functions, then those arguments are -treated as local variables and initialized to the empty string -(@pxref{User-defined, ,User-Defined Functions}). - -@cindex side effects -Like every other expression, the function call has a value, which is -computed by the function based on the arguments you give it. In this -example, the value of @samp{sqrt(@var{argument})} is the square root of -@var{argument}. A function can also have side effects, such as assigning -values to certain variables or doing I/O. -The following program reads numbers, one number per line, and prints the -square root of each one: - -@example -$ awk '@{ print "The square root of", $1, "is", sqrt($1) @}' -1 -@print{} The square root of 1 is 1 -3 -@print{} The square root of 3 is 1.73205 -5 -@print{} The square root of 5 is 2.23607 -@kbd{Ctrl-d} -@end example - -@node Precedence, , Function Calls, Expressions -@section Operator Precedence (How Operators Nest) -@cindex precedence -@cindex operator precedence - -@dfn{Operator precedence} determines how operators are grouped when -different operators appear close by in one expression. For example, -@samp{*} has higher precedence than @samp{+}; thus, @samp{a + b * c} -means to multiply @code{b} and @code{c}, and then add @code{a} to the -product (i.e., @samp{a + (b * c)}). 
- -The normal precedence of the operators can be overruled by using parentheses. -Think of the precedence rules as saying where the -parentheses are assumed to be. In -fact, it is wise to always use parentheses whenever there is an unusual -combination of operators, because other people who read the program may -not remember what the precedence is in this case. -Even experienced programmers occasionally forget the exact rules, -which leads to mistakes. -Explicit parentheses help prevent -any such mistakes. - -When operators of equal precedence are used together, the leftmost -operator groups first, except for the assignment, conditional, and -exponentiation operators, which group in the opposite order. -Thus, @samp{a - b + c} groups as @samp{(a - b) + c} and -@samp{a = b = c} groups as @samp{a = (b = c)}. - -The precedence of prefix unary operators does not matter as long as only -unary operators are involved, because there is only one way to interpret -them: innermost first. Thus, @samp{$++i} means @samp{$(++i)} and -@samp{++$x} means @samp{++($x)}. However, when another operator follows -the operand, then the precedence of the unary operators can matter. -@samp{$x^2} means @samp{($x)^2}, but @samp{-x^2} means -@samp{-(x^2)}, because @samp{-} has lower precedence than @samp{^}, -whereas @samp{$} has higher precedence. 
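The grouping rules described above can be confirmed with a small script. This is an illustrative sketch, assuming a POSIX shell and a POSIX-compatible @command{awk}; the function name @code{grouping_demo} and the sample record are made up for the example. Each result is stored in a variable before printing, so that the @code{print} statement itself raises no parsing questions:

```shell
# grouping_demo is an illustrative name; the input record is sample data.
grouping_demo() {
    echo "10 20 30 40" | awk '{
        a = 2; b = 3; c = 4
        r1 = a + b * c     # * binds tighter than +: 2 + (3 * 4)
        r2 = a - b + c     # equal precedence groups left to right: (2 - 3) + 4
        r3 = 2 ^ 3 ^ 2     # exponentiation groups right to left: 2 ^ (3 ^ 2)
        x = 2
        r4 = -x^2          # unary minus is lower than ^: -(2 ^ 2)
        r5 = $x^2          # $ is higher than ^: ($2) ^ 2, field 2 squared
        print r1, r2, r3, r4, r5
    }'
}
grouping_demo
```

With these inputs the script prints @samp{14 3 512 -4 400}. Adding explicit parentheses to any of the five expressions in the way the comments indicate leaves the result unchanged.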
-This table presents @command{awk}'s operators, in order of highest -precedence to lowest: - -@page - -@cindex @code{$} field operator -@cindex @code{+} operator -@cindex @code{-} operator -@cindex @code{!} operator -@cindex @code{*} operator -@cindex @code{/} operator -@cindex @code{%} operator -@cindex @code{^} operator -@cindex @code{**} operator -@cindex @code{++} operator -@cindex @code{--} operator -@cindex @code{<} operator -@cindex @code{<=} operator -@cindex @code{==} operator -@cindex @code{!=} operator -@cindex @code{>} operator -@cindex @code{>=} operator -@cindex @code{>>} I/O operator -@cindex @code{|} I/O operator -@cindex @code{|&} I/O operator -@cindex @code{~} operator -@cindex @code{!~} operator -@cindex @code{in} operator -@cindex @code{&&} operator -@cindex @code{||} operator -@cindex @code{?:} operator -@cindex @code{+=} operator -@cindex @code{-=} operator -@cindex @code{*=} operator -@cindex @code{/=} operator -@cindex @code{%=} operator -@cindex @code{^=} operator -@cindex @code{**=} operator -@c use @code in the items, looks better in TeX w/o all the quotes -@table @code -@item (@dots{}) -Grouping. - -@item $ -Field. - -@item ++ -- -Increment, decrement. - -@item ^ ** -Exponentiation. These operators group right-to-left. - -@item + - ! -Unary plus, minus, logical ``not.'' - -@item * / % -Multiplication, division, modulus. - -@item + - -Addition, subtraction. - -@item @r{String Concatenation} -No special symbol is used to indicate concatenation. -The operands are simply written side by side -(@pxref{Concatenation, ,String Concatenation}). - -@item < <= == != -@itemx > >= >> | |& -Relational and redirection. -The relational operators and the redirections have the same precedence -level. Characters such as @samp{>} serve both as relationals and as -redirections; the context distinguishes between the two meanings. 
- -Note that the I/O redirection operators in @code{print} and @code{printf} -statements belong to the statement level, not to expressions. The -redirection does not produce an expression that could be the operand of -another operator. As a result, it does not make sense to use a -redirection operator near another operator of lower precedence without -parentheses. Such combinations (for example, @samp{print foo > a ? b : c}) -result in syntax errors. -The correct way to write this statement is @samp{print foo > (a ? b : c)}. - -@item ~ !~ -Matching, non-matching. - -@item in -Array membership. - -@item && -Logical ``and''. - -@item || -Logical ``or''. - -@item ?: -Conditional. This operator groups right-to-left. - -@item = += -= *= -@itemx /= %= ^= **= -Assignment. These operators group right-to-left. -@end table - -@cindex portability issues -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@strong{Note:} -The @samp{|&}, @samp{**}, and @samp{**=} operators are not specified by POSIX. -For maximum portability, do not use them. - -@node Patterns and Actions, Arrays, Expressions, Top -@chapter Patterns, Actions, and Variables -@cindex pattern, definition of - -As you have already seen, each @command{awk} statement consists of -a pattern with an associated action. This @value{CHAPTER} describes how -you build patterns and actions, what kinds of things you can do within -actions, and @command{awk}'s built-in variables. - -The pattern-action rules and the statements available for use -within actions form the core of @command{awk} programming. -In a sense, everything covered -up to here has been the foundation -on which programs are built. Now it's time to start -building something useful. - -@menu -* Pattern Overview:: What goes into a pattern. -* Using Shell Variables:: How to use shell variables with @command{awk}. -* Action Overview:: What goes into an action. -* Statements:: Describes the various control statements in - detail. 
-* Built-in Variables:: Summarizes the built-in variables. -@end menu - -@node Pattern Overview, Using Shell Variables, Patterns and Actions, Patterns and Actions -@section Pattern Elements - -@menu -* Regexp Patterns:: Using regexps as patterns. -* Expression Patterns:: Any expression can be used as a pattern. -* Ranges:: Pairs of patterns specify record ranges. -* BEGIN/END:: Specifying initialization and cleanup rules. -* Empty:: The empty pattern, which matches every record. -@end menu - -@cindex patterns, types of -Patterns in @command{awk} control the execution of rules---a rule is -executed when its pattern matches the current input record. -The following is a summary of the types of patterns in @command{awk}: - -@table @code -@item /@var{regular expression}/ -A regular expression. It matches when the text of the -input record fits the regular expression. -(@xref{Regexp, ,Regular Expressions}.) - -@item @var{expression} -A single expression. It matches when its value -is nonzero (if a number) or non-null (if a string). -(@xref{Expression Patterns, ,Expressions as Patterns}.) - -@item @var{pat1}, @var{pat2} -A pair of patterns separated by a comma, specifying a range of records. -The range includes both the initial record that matches @var{pat1} and -the final record that matches @var{pat2}. -(@xref{Ranges, ,Specifying Record Ranges with Patterns}.) - -@item BEGIN -@itemx END -Special patterns for you to supply startup or cleanup actions for your -@command{awk} program. -(@xref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}.) - -@item @var{empty} -The empty pattern matches every input record. -(@xref{Empty, ,The Empty Pattern}.) -@end table - -@node Regexp Patterns, Expression Patterns, Pattern Overview, Pattern Overview -@subsection Regular Expressions as Patterns - -Regular expressions are one of the first kinds of patterns presented -in this book. -This kind of pattern is simply a regexp constant in the pattern part of -a rule. 
Its meaning is @samp{$0 ~ /@var{pattern}/}. -The pattern matches when the input record matches the regexp. -For example: - -@example -/foo|bar|baz/ @{ buzzwords++ @} -END @{ print buzzwords, "buzzwords seen" @} -@end example - -@node Expression Patterns, Ranges, Regexp Patterns, Pattern Overview -@subsection Expressions as Patterns - -Any @command{awk} expression is valid as an @command{awk} pattern. -The pattern matches if the expression's value is nonzero (if a -number) or non-null (if a string). -The expression is reevaluated each time the rule is tested against a new -input record. If the expression uses fields such as @code{$1}, the -value depends directly on the new input record's text; otherwise it -depends on only what has happened so far in the execution of the -@command{awk} program. - -Comparison expressions, using the comparison operators described in -@ref{Typing and Comparison, ,Variable Typing and Comparison Expressions}, -are a very common kind of pattern. -Regexp matching and non-matching are also very common expressions. -The left operand of the @samp{~} and @samp{!~} operators is a string. -The right operand is either a constant regular expression enclosed in -slashes (@code{/@var{regexp}/}), or any expression whose string value -is used as a dynamic regular expression -(@pxref{Computed Regexps, , Using Dynamic Regexps}). -The following example prints the second field of each input record -whose first field is precisely @samp{foo}: - -@example -$ awk '$1 == "foo" @{ print $2 @}' BBS-list -@end example - -@noindent -(There is no output, because there is no BBS site with the exact name @samp{foo}.) 
-Contrast this with the following regular expression match, which -accepts any record with a first field that contains @samp{foo}: - -@example -$ awk '$1 ~ /foo/ @{ print $2 @}' BBS-list -@print{} 555-1234 -@print{} 555-6699 -@print{} 555-6480 -@print{} 555-2127 -@end example - -A regexp constant as a pattern is also a special case of an expression -pattern. The expression @code{/foo/} has the value one if @samp{foo} -appears in the current input record. Thus, as a pattern, @code{/foo/} -matches any record containing @samp{foo}. - -Boolean expressions are also commonly used as patterns. -Whether the pattern -matches an input record depends on whether its subexpressions match. -For example, the following command prints all the records in -@file{BBS-list} that contain both @samp{2400} and @samp{foo}: - -@example -$ awk '/2400/ && /foo/' BBS-list -@print{} fooey 555-1234 2400/1200/300 B -@end example - -The following command prints all records in -@file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo} -(or both, of course): - -@example -$ awk '/2400/ || /foo/' BBS-list -@print{} alpo-net 555-3412 2400/1200/300 A -@print{} bites 555-1675 2400/1200/300 A -@print{} fooey 555-1234 2400/1200/300 B -@print{} foot 555-6699 1200/300 B -@print{} macfoo 555-6480 1200/300 A -@print{} sdace 555-3430 2400/1200/300 A -@print{} sabafoo 555-2127 1200/300 C -@end example - -The following command prints all records in -@file{BBS-list} that do @emph{not} contain the string @samp{foo}: - -@example -$ awk '! /foo/' BBS-list -@print{} aardvark 555-5553 1200/300 B -@print{} alpo-net 555-3412 2400/1200/300 A -@print{} barfly 555-7685 1200/300 A -@print{} bites 555-1675 2400/1200/300 A -@print{} camelot 555-0542 300 C -@print{} core 555-2912 1200/300 C -@print{} sdace 555-3430 2400/1200/300 A -@end example - -The subexpressions of a Boolean operator in a pattern can be constant regular -expressions, comparisons, or any other @command{awk} expressions. 
Range -patterns are not expressions, so they cannot appear inside Boolean -patterns. Likewise, the special patterns @code{BEGIN} and @code{END}, -which never match any input record, are not expressions and cannot -appear inside Boolean patterns. - -@node Ranges, BEGIN/END, Expression Patterns, Pattern Overview -@subsection Specifying Record Ranges with Patterns - -@cindex range pattern -@cindex pattern, range -@cindex matching ranges of lines -A @dfn{range pattern} is made of two patterns separated by a comma, in -the form @samp{@var{begpat}, @var{endpat}}. It is used to match ranges of -consecutive input records. The first pattern, @var{begpat}, controls -where the range begins, while @var{endpat} controls where -the range ends. For example, the following: - -@example -awk '$1 == "on", $1 == "off"' myfile -@end example - -@noindent -prints every record in @file{myfile} between @samp{on}/@samp{off} pairs, inclusive. - -A range pattern starts out by matching @var{begpat} against every -input record. When a record matches @var{begpat}, the range pattern is -@dfn{turned on} and the range pattern matches this record as well. As long as -the range pattern stays turned on, it automatically matches every input -record read. The range pattern also matches @var{endpat} against every -input record; when this succeeds, the range pattern is turned off again -for the following record. Then the range pattern goes back to checking -@var{begpat} against each record. - -The record that turns on the range pattern and the one that turns it -off both match the range pattern. If you don't want to operate on -these records, you can write @code{if} statements in the rule's action -to distinguish them from the records you are interested in. - -It is possible for a range pattern to be turned on and off by the same -record. If the record satisfies both conditions, then the action is -executed for just that record.
-For example, suppose there is text between two identical markers (say -the @samp{%} symbol), each on its own line, that should be ignored. -A first attempt would be to -combine a range pattern that describes the delimited text with the -@code{next} statement -(not discussed yet, @pxref{Next Statement, , The @code{next} Statement}). -This causes @command{awk} to skip any further processing of the current -record and start over again with the next input record. Such a program -looks like this: - -@example -/^%$/,/^%$/ @{ next @} - @{ print @} -@end example - -@noindent -@cindex skipping lines between markers -@cindex flag variables -This program fails because the range pattern is both turned on and turned off -by the first line, which just has a @samp{%} on it. To accomplish this task, -write the program in the following manner, using a flag: - -@cindex @code{!} operator -@example -/^%$/ @{ skip = ! skip; next @} -skip == 1 @{ next @} # skip lines with `skip' set -@end example - -In a range pattern, the comma (@samp{,}) has the lowest precedence of -all the operators (i.e., it is evaluated last). Thus, the following -program attempts to combine a range pattern with another simpler test: - -@example -echo Yes | awk '/1/,/2/ || /Yes/' -@end example - -The intent of this program is @samp{(/1/,/2/) || /Yes/}. -However, @command{awk} interprets this as @samp{/1/, (/2/ || /Yes/)}. -This cannot be changed or worked around; range patterns do not combine -with other patterns: - -@example -$ echo yes | gawk '(/1/,/2/) || /Yes/' -@error{} gawk: cmd. line:1: (/1/,/2/) || /Yes/ -@error{} gawk: cmd. line:1: ^ parse error -@error{} gawk: cmd. line:2: (/1/,/2/) || /Yes/ -@error{} gawk: cmd. 
line:2: ^ unexpected newline -@end example - -@node BEGIN/END, Empty, Ranges, Pattern Overview -@subsection The @code{BEGIN} and @code{END} Special Patterns - -@cindex @code{BEGIN} special pattern -@cindex pattern, @code{BEGIN} -@cindex @code{END} special pattern -@cindex pattern, @code{END} -@cindex blocks, @code{BEGIN} and @code{END} -All the patterns described so far are for matching input records. -The @code{BEGIN} and @code{END} special patterns are different. -They supply startup and cleanup actions for @command{awk} programs. -@code{BEGIN} and @code{END} rules must have actions; there is no default -action for these rules because there is no current record when they run. -@code{BEGIN} and @code{END} rules are often referred to as -``@code{BEGIN} and @code{END} blocks'' by long-time @command{awk} -programmers. - -@menu -* Using BEGIN/END:: How and why to use BEGIN/END rules. -* I/O And BEGIN/END:: I/O issues in BEGIN/END rules. -@end menu - -@node Using BEGIN/END, I/O And BEGIN/END, BEGIN/END, BEGIN/END -@subsubsection Startup and Cleanup Actions - -A @code{BEGIN} rule is executed once only, before the first input record -is read. Likewise, an @code{END} rule is executed once only, after all the -input is read. For example: - -@example -$ awk ' -> BEGIN @{ print "Analysis of \"foo\"" @} -> /foo/ @{ ++n @} -> END @{ print "\"foo\" appears", n, "times." @}' BBS-list -@print{} Analysis of "foo" -@print{} "foo" appears 4 times. -@end example - -This program finds the number of records in the input file @file{BBS-list} -that contain the string @samp{foo}. The @code{BEGIN} rule prints a title -for the report. There is no need to use the @code{BEGIN} rule to -initialize the counter @code{n} to zero, since @command{awk} does this -automatically (@pxref{Variables}). -The second rule increments the variable @code{n} every time a -record containing the pattern @samp{foo} is read. The @code{END} rule -prints the value of @code{n} at the end of the run. 
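The counting program above can be tried without @file{BBS-list}. The following is a minimal sketch, not part of the original example: the function name @code{count_foo} and the three sample records are made up for illustration. It also writes @samp{n + 0} in the @code{END} rule, on the assumption that a numeric zero is wanted when no record matches (an uninitialized variable prints as the null string otherwise).

```shell
# count_foo: hypothetical helper for this sketch; reads records on stdin.
count_foo() {
    awk '
    BEGIN { print "Analysis of \"foo\"" }     # runs once, before any input
    /foo/ { ++n }                             # n starts out as zero/null
    END   { print "\"foo\" appears", n + 0, "times." }'
}

# Sample data invented for the sketch; two of the three records match.
printf 'fooey\nbarfly\nmacfoo\n' | count_foo
```

With empty input (for instance @samp{count_foo < /dev/null}) the @code{BEGIN} and @code{END} rules still run, and the count is reported as 0.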
- -The special patterns @code{BEGIN} and @code{END} cannot be used in ranges -or with Boolean operators (indeed, they cannot be used with any operators). -An @command{awk} program may have multiple @code{BEGIN} and/or @code{END} -rules. They are executed in the order in which they appear: all the @code{BEGIN} -rules at startup and all the @code{END} rules at termination. -@code{BEGIN} and @code{END} rules may be intermixed with other rules. -This feature was added in the 1987 version of @command{awk} and is included -in the POSIX standard. -The original (1978) version of @command{awk} -required the @code{BEGIN} rule to be placed at the beginning of the -program, the @code{END} rule to be placed at the end, and only allowed one of -each. -This is no longer required, but it is a good idea to follow this template -in terms of program organization and readability. - -Multiple @code{BEGIN} and @code{END} rules are useful for writing -library functions, because each library file can have its own @code{BEGIN} and/or -@code{END} rule to do its own initialization and/or cleanup. -The order in which library functions are named on the command line -controls the order in which their @code{BEGIN} and @code{END} rules are -executed. Therefore you have to be careful when writing such rules in -library files so that the order in which they are executed doesn't matter. -@xref{Options, ,Command-Line Options}, for more information on -using library functions. -@xref{Library Functions, ,A Library of @command{awk} Functions}, -for a number of useful library functions. - -If an @command{awk} program only has a @code{BEGIN} rule and no -other rules, then the program exits after the @code{BEGIN} rule is -run.@footnote{The original version of @command{awk} used to keep -reading and ignoring input until end of file was seen.} However, if an -@code{END} rule exists, then the input is read, even if there are -no other rules in the program. 
This is necessary in case the @code{END} -rule checks the @code{FNR} and @code{NR} variables. - -@node I/O And BEGIN/END, , Using BEGIN/END, BEGIN/END -@subsubsection Input/Output from @code{BEGIN} and @code{END} Rules - -@cindex I/O, from @code{BEGIN} and @code{END} -There are several (sometimes subtle) points to remember when doing I/O -from a @code{BEGIN} or @code{END} rule. -The first has to do with the value of @code{$0} in a @code{BEGIN} -rule. Because @code{BEGIN} rules are executed before any input is read, -there simply is no input record, and therefore no fields, when -executing @code{BEGIN} rules. References to @code{$0} and the fields -yield a null string or zero, depending upon the context. One way -to give @code{$0} a real value is to execute a @code{getline} command -without a variable (@pxref{Getline, ,Explicit Input with @code{getline}}). -Another way is to simply assign a value to @code{$0}. - -@cindex differences between @command{gawk} and @command{awk} -The second point is similar to the first but from the other direction. -Traditionally, due largely to implementation issues, @code{$0} and -@code{NF} were @emph{undefined} inside an @code{END} rule. -The POSIX standard specifies that @code{NF} is available in an @code{END} -rule. It contains the number of fields from the last input record. -Most probably due to an oversight, the standard does not say that @code{$0} -is also preserved, although logically one would think that it should be. -In fact, @command{gawk} does preserve the value of @code{$0} for use in -@code{END} rules. Be aware, however, that Unix @command{awk}, and possibly -other implementations, do not. - -The third point follows from the first two. The meaning of @samp{print} -inside a @code{BEGIN} or @code{END} rule is the same as always: -@samp{print $0}. If @code{$0} is the null string, then this prints an -empty line. 
Many long-time @command{awk} programmers use an unadorned -@samp{print} in @code{BEGIN} and @code{END} rules, to mean @samp{@w{print ""}}, -relying on @code{$0} being null. Although one might generally get away with -this in @code{BEGIN} rules, it is a very bad idea in @code{END} rules, -at least in @command{gawk}. It is also poor style, since if an empty -line is needed in the output, the program should print one explicitly. - -Finally, the @code{next} and @code{nextfile} statements are not allowed -in a @code{BEGIN} rule, because the implicit -read-a-record-and-match-against-the-rules loop has not started yet. Similarly, those statements -are not valid in an @code{END} rule, since all the input has been read. -(@xref{Next Statement, ,The @code{next} Statement}, and see -@ref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}.) - -@node Empty, , BEGIN/END, Pattern Overview -@subsection The Empty Pattern - -@cindex empty pattern -@cindex pattern, empty -An empty (i.e., non-existent) pattern is considered to match @emph{every} -input record. For example, the program: - -@example -awk '@{ print $1 @}' BBS-list -@end example - -@noindent -prints the first field of every record. - -@node Using Shell Variables, Action Overview, Pattern Overview, Patterns and Actions -@section Using Shell Variables in Programs -@cindex shell variables, using in @command{awk} programs -@cindex using shell variables in @command{awk} programs -@cindex shell and @command{awk} interaction - -@command{awk} programs are often used as components in larger -programs written in shell. -For example, it is very common to use a shell variable to -hold a pattern that the @command{awk} program searches for. -There are two ways to get the value of the shell variable -into the body of the @command{awk} program. - -The most common method is to use shell quoting to substitute -the variable's value into the program inside the script.
-For example, in the following program: - -@example -echo -n "Enter search pattern: " -read pattern -awk "/$pattern/ "'@{ nmatches++ @} - END @{ print nmatches, "found" @}' /path/to/data -@end example - -@noindent -the @command{awk} program consists of two pieces of quoted text -that are concatenated together to form the program. -The first part is double-quoted, which allows substitution of -the @code{pattern} variable inside the quotes. -The second part is single-quoted. - -Variable substitution via quoting works, but can be potentially -messy. It requires a good understanding of the shell's quoting rules -(@pxref{Quoting, ,Shell Quoting Issues}), -and it's often difficult to correctly -match up the quotes when reading the program. - -A better method is to use @command{awk}'s variable assignment feature -(@pxref{Assignment Options, ,Assigning Variables on the Command Line}) -to assign the shell variable's value to an @command{awk} variable's -value. Then use dynamic regexps to match the pattern -(@pxref{Computed Regexps, ,Using Dynamic Regexps}). -The following shows how to redo the -previous example using this technique: - -@example -echo -n "Enter search pattern: " -read pattern -awk -v pat="$pattern" '$0 ~ pat @{ nmatches++ @} - END @{ print nmatches, "found" @}' /path/to/data -@end example - -@noindent -Now, the @command{awk} program is just one single-quoted string. -The assignment @samp{-v pat="$pattern"} still requires double quotes, -in case there is whitespace in the value of @code{$pattern}. -The @command{awk} variable @code{pat} could be named @code{pattern} -too, but that would be more confusing. Using a variable also -provides more flexibility, since the variable can be used anywhere inside -the program---for printing, as an array subscript, or for any other -use---without requiring the quoting tricks at every point in the program. 
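One practical difference between the two methods shows up when the pattern contains a slash. The sketch below is illustrative only: the function name @code{count_matches} and the sample records are assumptions, not part of the original text. Because the value arrives through @option{-v} and is used as a dynamic regexp, it never becomes part of the program text, so the slash needs no special treatment.

```shell
# count_matches: hypothetical helper; $1 is the search pattern,
# records are read from standard input.
count_matches() {
    awk -v pat="$1" '
    $0 ~ pat { nmatches++ }                     # pat is a dynamic regexp
    END      { print nmatches + 0, "found" }'   # + 0 prints 0 if none match
}

# Invented sample data; the pattern contains a slash.
printf 'a 2400/1200\nb 1200/300\nc 2400/1200\n' | count_matches '2400/1200'
```

Substituting the same value into @samp{/$pattern/} with shell quoting would end the regexp constant at the embedded slash. Note that @command{awk} processes escape sequences in @option{-v} assignments, so patterns containing backslashes need extra care with either method.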
- -@node Action Overview, Statements, Using Shell Variables, Patterns and Actions -@section Actions -@cindex action, definition of -@cindex curly braces -@cindex action, curly braces -@cindex action, separating statements - -An @command{awk} program or script consists of a series of -rules and function definitions interspersed. (Functions are -described later. @xref{User-defined, ,User-Defined Functions}.) -A rule contains a pattern and an action, either of which (but not -both) may be omitted. The purpose of the @dfn{action} is to tell -@command{awk} what to do once a match for the pattern is found. Thus, -in outline, an @command{awk} program generally looks like this: - -@example -@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]} -@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]} -@dots{} -function @var{name}(@var{args}) @{ @dots{} @} -@dots{} -@end example - -An action consists of one or more @command{awk} @dfn{statements}, enclosed -in curly braces (@samp{@{} and @samp{@}}). Each statement specifies one -thing to do. The statements are separated by newlines or semicolons. -The curly braces around an action must be used even if the action -contains only one statement, or if it contains no statements at -all. However, if you omit the action entirely, omit the curly braces as -well. An omitted action is equivalent to @samp{@{ print $0 @}}: - -@example -/foo/ @{ @} @i{match @code{foo}, do nothing --- empty action} -/foo/ @i{match @code{foo}, print the record --- omitted action} -@end example - -The following types of statements are supported in @command{awk}: - -@itemize @bullet -@cindex side effects -@item -Expressions, which can call functions or assign values to variables -(@pxref{Expressions}). Executing -this kind of statement simply computes the value of the expression. -This is useful when the expression has side effects -(@pxref{Assignment Ops, ,Assignment Expressions}). 
- -@item -Control statements, which specify the control flow of @command{awk} -programs. The @command{awk} language gives you C-like constructs -(@code{if}, @code{for}, @code{while}, and @code{do}) as well as a few -special ones (@pxref{Statements, ,Control Statements in Actions}). - -@item -Compound statements, which consist of one or more statements enclosed in -curly braces. A compound statement is used in order to put several -statements together in the body of an @code{if}, @code{while}, @code{do}, -or @code{for} statement. - -@item -Input statements using the @code{getline} command -(@pxref{Getline, ,Explicit Input with @code{getline}}), the @code{next} -statement (@pxref{Next Statement, ,The @code{next} Statement}), -and the @code{nextfile} statement -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). - -@item -Output statements, such as @code{print} and @code{printf}. -@xref{Printing, ,Printing Output}. - -@item -Deletion statements for deleting array elements. -@xref{Delete, ,The @code{delete} Statement}. -@end itemize - -@node Statements, Built-in Variables, Action Overview, Patterns and Actions -@section Control Statements in Actions -@cindex control statement - -@dfn{Control statements}, such as @code{if}, @code{while}, and so on, -control the flow of execution in @command{awk} programs. Most of the -control statements in @command{awk} are patterned on similar statements in C. - -@cindex compound statement -@cindex statement, compound -All the control statements start with special keywords, such as @code{if} -and @code{while}, to distinguish them from simple expressions. -Many control statements contain other statements. For example, the -@code{if} statement contains another statement that may or may not be -executed. The contained statement is called the @dfn{body}. 
-To include more than one statement in the body, group them into a -single @dfn{compound statement} with curly braces, separating them with -newlines or semicolons. - -@menu -* If Statement:: Conditionally execute some @command{awk} - statements. -* While Statement:: Loop until some condition is satisfied. -* Do Statement:: Do specified action while looping until some - condition is satisfied. -* For Statement:: Another looping statement, that provides - initialization and increment clauses. -* Break Statement:: Immediately exit the innermost enclosing loop. -* Continue Statement:: Skip to the end of the innermost enclosing - loop. -* Next Statement:: Stop processing the current input record. -* Nextfile Statement:: Stop processing the current file. -* Exit Statement:: Stop execution of @command{awk}. -@end menu - -@node If Statement, While Statement, Statements, Statements -@subsection The @code{if}-@code{else} Statement - -@cindex @code{if}-@code{else} statement -The @code{if}-@code{else} statement is @command{awk}'s decision-making -statement. It looks like this: - -@example -if (@var{condition}) @var{then-body} @r{[}else @var{else-body}@r{]} -@end example - -@noindent -The @var{condition} is an expression that controls what the rest of the -statement does. If the @var{condition} is true, @var{then-body} is -executed; otherwise, @var{else-body} is executed. -The @code{else} part of the statement is -optional. The condition is considered false if its value is zero or -the null string; otherwise the condition is true. -Refer to the following: - -@example -if (x % 2 == 0) - print "x is even" -else - print "x is odd" -@end example - -In this example, if the expression @samp{x % 2 == 0} is true (that is, -if the value of @code{x} is evenly divisible by two), then the first -@code{print} statement is executed; otherwise the second @code{print} -statement is executed. 
-If the @code{else} keyword appears on the same line as @var{then-body} and -@var{then-body} is not a compound statement (i.e., not surrounded by -curly braces), then a semicolon must separate @var{then-body} from -the @code{else}. -To illustrate this, the previous example can be rewritten as: - -@example -if (x % 2 == 0) print "x is even"; else - print "x is odd" -@end example - -@noindent -If the @samp{;} is left out, @command{awk} can't interpret the statement and -it produces a syntax error. Don't actually write programs this way, -because a human reader might fail to see the @code{else} if it is not -the first thing on its line. - -@node While Statement, Do Statement, If Statement, Statements -@subsection The @code{while} Statement -@cindex @code{while} statement -@cindex loop -@cindex body of a loop - -In programming, a @dfn{loop} is a part of a program that can -be executed two or more times in succession. -The @code{while} statement is the simplest looping statement in -@command{awk}. It repeatedly executes a statement as long as a condition is -true. For example: - -@example -while (@var{condition}) - @var{body} -@end example - -@noindent -@var{body} is a statement called the @dfn{body} of the loop, -and @var{condition} is an expression that controls how long the loop -keeps running. -The first thing the @code{while} statement does is test the @var{condition}. -If the @var{condition} is true, it executes the statement @var{body}. -@ifinfo -(The @var{condition} is true when the value -is not zero and not a null string.) -@end ifinfo -After @var{body} has been executed, -@var{condition} is tested again, and if it is still true, @var{body} is -executed again. This process repeats until the @var{condition} is no longer -true. If the @var{condition} is initially false, the body of the loop is -never executed and @command{awk} continues with the statement following -the loop. 
-This example prints the first three fields of each record, one per line: - -@example -awk '@{ i = 1 - while (i <= 3) @{ - print $i - i++ - @} -@}' inventory-shipped -@end example - -@noindent -The body of this loop is a compound statement enclosed in braces, -containing two statements. -The loop works in the following manner: first, the value of @code{i} is set to one. -Then, the @code{while} statement tests whether @code{i} is less than or equal to -three. This is true when @code{i} equals one, so the @code{i}-th -field is printed. Then the @samp{i++} increments the value of @code{i} -and the loop repeats. The loop terminates when @code{i} reaches four. - -A newline is not required between the condition and the -body; however using one makes the program clearer unless the body is a -compound statement or else is very simple. The newline after the open-brace -that begins the compound statement is not required either, but the -program is harder to read without it. - -@node Do Statement, For Statement, While Statement, Statements -@subsection The @code{do}-@code{while} Statement -@cindex @code{do}-@code{while} statement - -The @code{do} loop is a variation of the @code{while} looping statement. -The @code{do} loop executes the @var{body} once and then repeats the -@var{body} as long as the @var{condition} is true. It looks like this: - -@example -do - @var{body} -while (@var{condition}) -@end example - -Even if the @var{condition} is false at the start, the @var{body} is -executed at least once (and only once, unless executing @var{body} -makes @var{condition} true). Contrast this with the corresponding -@code{while} statement: - -@example -while (@var{condition}) - @var{body} -@end example - -@noindent -This statement does not execute @var{body} even once if the @var{condition} -is false to begin with. 
-The following is an example of a @code{do} statement: - -@example -@{ i = 1 - do @{ - print $0 - i++ - @} while (i <= 10) -@} -@end example - -@noindent -This program prints each input record ten times. However, it isn't a very -realistic example, since in this case an ordinary @code{while} would do -just as well. This situation reflects actual experience; only -occasionally is there a real use for a @code{do} statement. - -@node For Statement, Break Statement, Do Statement, Statements -@subsection The @code{for} Statement -@cindex @code{for} statement - -The @code{for} statement makes it more convenient to count iterations of a -loop. The general form of the @code{for} statement looks like this: - -@example -for (@var{initialization}; @var{condition}; @var{increment}) - @var{body} -@end example - -@noindent -The @var{initialization}, @var{condition}, and @var{increment} parts are -arbitrary @command{awk} expressions, and @var{body} stands for any -@command{awk} statement. - -The @code{for} statement starts by executing @var{initialization}. -Then, as long -as the @var{condition} is true, it repeatedly executes @var{body} and then -@var{increment}. Typically, @var{initialization} sets a variable to -either zero or one, @var{increment} adds one to it, and @var{condition} -compares it against the desired number of iterations. -For example: - -@example -awk '@{ for (i = 1; i <= 3; i++) - print $i -@}' inventory-shipped -@end example - -@noindent -This prints the first three fields of each input record, with one field per -line. - -It isn't possible to -set more than one variable in the -@var{initialization} part without using a multiple assignment statement -such as @samp{x = y = 0}. This makes sense only if all the initial values -are equal. (But it is possible to initialize additional variables by writing -their assignments as separate statements preceding the @code{for} loop.) - -@cindex comma operator, not supported -The same is true of the @var{increment} part. 
Incrementing additional -variables requires separate statements at the end of the loop. -The C compound expression, using C's comma operator, is useful in -this context but it is not supported in @command{awk}. - -Most often, @var{increment} is an increment expression, as in the previous -example. But this is not required; it can be any expression -whatsoever. For example, the following statement prints all the powers of two -between 1 and 100: - -@example -for (i = 1; i <= 100; i *= 2) - print i -@end example - -If there is nothing to be done, any of the three expressions in the -parentheses following the @code{for} keyword may be omitted. Thus, -@w{@samp{for (; x > 0;)}} is equivalent to @w{@samp{while (x > 0)}}. If the -@var{condition} is omitted, it is treated as true, effectively -yielding an @dfn{infinite loop} (i.e., a loop that never terminates). - -In most cases, a @code{for} loop is an abbreviation for a @code{while} -loop, as shown here: - -@example -@var{initialization} -while (@var{condition}) @{ - @var{body} - @var{increment} -@} -@end example - -@noindent -The only exception is when the @code{continue} statement -(@pxref{Continue Statement, ,The @code{continue} Statement}) is used -inside the loop. Changing a @code{for} statement to a @code{while} -statement in this way can change the effect of the @code{continue} -statement inside the loop. - -The @command{awk} language has a @code{for} statement in addition to a -@code{while} statement because a @code{for} loop is often both less work to -type and more natural to think of. Counting the number of iterations is -very common in loops. It can be easier to think of this counting as part -of looping rather than as something to do inside the loop. 
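The correspondence between @code{for} and @code{while} can be checked directly. This is a small sketch using the powers-of-two loop from above; the function names @code{powers_for} and @code{powers_while} are invented for the illustration. No @code{continue} statement is involved, so the two forms should behave identically.

```shell
# Both function names are hypothetical, used only in this sketch.
powers_for() {
    awk 'BEGIN { for (i = 1; i <= 100; i *= 2) printf "%d ", i; print "" }'
}

powers_while() {
    awk 'BEGIN {
        i = 1                    # initialization
        while (i <= 100) {       # condition
            printf "%d ", i      # body
            i *= 2               # increment
        }
        print ""
    }'
}

powers_for
powers_while
```

Each function prints the same line: the powers of two from 1 through 64, separated by spaces.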
- -@ifinfo -@cindex @code{in} operator -There is an alternate version of the @code{for} loop, for iterating over -all the indices of an array: - -@example -for (i in array) - @var{do something with} array[i] -@end example - -@noindent -@xref{Scanning an Array, ,Scanning All Elements of an Array}, -for more information on this version of the @code{for} loop. -@end ifinfo - -@node Break Statement, Continue Statement, For Statement, Statements -@subsection The @code{break} Statement -@cindex @code{break} statement -@cindex loops, exiting - -The @code{break} statement jumps out of the innermost @code{for}, -@code{while}, or @code{do} loop that encloses it. The following example -finds the smallest divisor of any integer, and also identifies prime -numbers: - -@example -# find smallest divisor of num -@{ - num = $1 - for (div = 2; div*div <= num; div++) - if (num % div == 0) - break - if (num % div == 0) - printf "Smallest divisor of %d is %d\n", num, div - else - printf "%d is prime\n", num -@} -@end example - -When the remainder is zero in the first @code{if} statement, @command{awk} -immediately @dfn{breaks out} of the containing @code{for} loop. This means -that @command{awk} proceeds immediately to the statement following the loop -and continues processing. (This is very different from the @code{exit} -statement, which stops the entire @command{awk} program. -@xref{Exit Statement, ,The @code{exit} Statement}.) 
- -The following program illustrates how the @var{condition} of a @code{for} -or @code{while} statement could be replaced with a @code{break} inside -an @code{if}: - -@example -# find smallest divisor of num -@{ - num = $1 - for (div = 2; ; div++) @{ - if (num % div == 0) @{ - printf "Smallest divisor of %d is %d\n", num, div - break - @} - if (div*div > num) @{ - printf "%d is prime\n", num - break - @} - @} -@} -@end example - -@cindex @code{break}, outside of loops -@cindex historical features -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@cindex dark corner -The @code{break} statement has no meaning when -used outside the body of a loop. However, although it was never documented, -historical implementations of @command{awk} treated the @code{break} -statement outside of a loop as if it were a @code{next} statement -(@pxref{Next Statement, ,The @code{next} Statement}). -Recent versions of Unix @command{awk} no longer allow this usage. -@command{gawk} supports this use of @code{break} only -if @option{--traditional} -has been specified on the command line -(@pxref{Options, ,Command-Line Options}). -Otherwise, it is treated as an error, since the POSIX standard -specifies that @code{break} should only be used inside the body of a -loop. -@value{DARKCORNER} - -@node Continue Statement, Next Statement, Break Statement, Statements -@subsection The @code{continue} Statement - -@cindex @code{continue} statement -As with @code{break}, the @code{continue} statement is used only inside -@code{for}, @code{while}, and @code{do} loops. It skips -over the rest of the loop body, causing the next cycle around the loop -to begin immediately. Contrast this with @code{break}, which jumps out -of the loop altogether. - -The @code{continue} statement in a @code{for} loop directs @command{awk} to -skip the rest of the body of the loop and resume execution with the -increment-expression of the @code{for} statement.
The following program -illustrates this fact: - -@example -BEGIN @{ - for (x = 0; x <= 20; x++) @{ - if (x == 5) - continue - printf "%d ", x - @} - print "" -@} -@end example - -@noindent -This program prints all the numbers from 0 to 20---except for five, for -which the @code{printf} is skipped. Because the increment @samp{x++} -is not skipped, @code{x} does not remain stuck at five. Contrast the -@code{for} loop from the previous example with the following @code{while} loop: - -@example -BEGIN @{ - x = 0 - while (x <= 20) @{ - if (x == 5) - continue - printf "%d ", x - x++ - @} - print "" -@} -@end example - -@noindent -This program loops forever once @code{x} reaches five. - -@cindex @code{continue}, outside of loops -@cindex historical features -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@cindex dark corner -The @code{continue} statement has no meaning when used outside the body of -a loop. Historical versions of @command{awk} treated a @code{continue} -statement outside a loop the same way they treated a @code{break} -statement outside a loop: as if it were a @code{next} -statement -(@pxref{Next Statement, ,The @code{next} Statement}). -Recent versions of Unix @command{awk} no longer work this way, and -@command{gawk} allows it only if @option{--traditional} is specified on -the command line (@pxref{Options, ,Command-Line Options}). Just like the -@code{break} statement, the POSIX standard specifies that @code{continue} -should only be used inside the body of a loop. -@value{DARKCORNER} - -@node Next Statement, Nextfile Statement, Continue Statement, Statements -@subsection The @code{next} Statement -@cindex @code{next} statement - -The @code{next} statement forces @command{awk} to immediately stop processing -the current record and go on to the next record. This means that no -further rules are executed for the current record, and the rest of the -current rule's action isn't executed. 
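A small sketch may help show how @code{next} keeps later rules from seeing a record. The helper name @code{skip_comments} and the sample input are assumptions made for this illustration only. The first rule discards lines that begin with @samp{#}, so the second rule never runs for them:

```shell
# skip_comments is a hypothetical helper name used only for this sketch.
skip_comments() {
    awk '
    /^#/ { next }              # comment line: no later rule sees it
         { print "data:", $0 } # runs only for records not skipped above
    '
}

printf '# header\nalpha\n# note\nbeta\n' | skip_comments
```

With the sample input shown, only the @samp{alpha} and @samp{beta} records reach the second rule.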
- -Contrast this with the effect of the @code{getline} function -(@pxref{Getline, ,Explicit Input with @code{getline}}). That also causes -@command{awk} to read the next record immediately, but it does not alter the -flow of control in any way (i.e., the rest of the current action executes -with a new input record). - -At the highest level, @command{awk} program execution is a loop that reads -an input record and then tests each rule's pattern against it. If you -think of this loop as a @code{for} statement whose body contains the -rules, then the @code{next} statement is analogous to a @code{continue} -statement. It skips to the end of the body of this implicit loop and -executes the increment (which reads another record). - -For example, suppose an @command{awk} program works only on records -with four fields, and it shouldn't fail when given bad input. To avoid -complicating the rest of the program, write a ``weed out'' rule near -the beginning, in the following manner: - -@example -NF != 4 @{ - err = sprintf("%s:%d: skipped: NF != 4\n", FILENAME, FNR) - print err > "/dev/stderr" - next -@} -@end example - -@noindent -Because of the @code{next} statement, -the program's subsequent rules won't see the bad record. The error -message is redirected to the standard error output stream, as error -messages should be. -@xref{Special Files, ,Special @value{FFN}s in @command{gawk}}. - -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@cindex @code{next}, inside a user-defined function -According to the POSIX standard, the behavior is undefined if -the @code{next} statement is used in a @code{BEGIN} or @code{END} rule. -@command{gawk} treats it as a syntax error. -Although POSIX permits it, -some other @command{awk} implementations don't allow the @code{next} -statement inside function bodies -(@pxref{User-defined, ,User-Defined Functions}). 
-Just as with any other @code{next} statement, a @code{next} statement inside a -function body reads the next record and starts processing it with the -first rule in the program. -If the @code{next} statement causes the end of the input to be reached, -then the code in any @code{END} rules is executed. -@xref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}. - -@node Nextfile Statement, Exit Statement, Next Statement, Statements -@subsection Using @command{gawk}'s @code{nextfile} Statement -@cindex @code{nextfile} statement -@cindex differences between @command{gawk} and @command{awk} - -@command{gawk} provides the @code{nextfile} statement, -which is similar to the @code{next} statement. -However, instead of abandoning processing of the current record, the -@code{nextfile} statement instructs @command{gawk} to stop processing the -current @value{DF}. - -The @code{nextfile} statement is a @command{gawk} extension. -In most other @command{awk} implementations, -or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), -@code{nextfile} is not special. - -Upon execution of the @code{nextfile} statement, @code{FILENAME} is -updated to the name of the next @value{DF} listed on the command line, -@code{FNR} is reset to one, @code{ARGIND} is incremented, and processing -starts over with the first rule in the program. -(@code{ARGIND} hasn't been introduced yet. @xref{Built-in Variables}.) -If the @code{nextfile} statement causes the end of the input to be reached, -then the code in any @code{END} rules is executed. -@xref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}. - -The @code{nextfile} statement is useful when there are many @value{DF}s -to process but it isn't necessary to process every record in every file. -Normally, in order to move on to the next @value{DF}, a program -has to continue scanning the unwanted records. The @code{nextfile} -statement accomplishes this much more efficiently. 
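The efficiency argument above can be sketched with a program that needs only the first record of each @value{DF}. The helper name @code{first_lines} is hypothetical, and the sketch assumes that the @command{awk} on your search path is @command{gawk} or another implementation that accepts @code{nextfile}:

```shell
# first_lines is a hypothetical helper name; it assumes that "awk" is gawk
# or another implementation that accepts the nextfile statement.
first_lines() {
    # Print the first record of each file, then abandon the rest of that
    # file instead of reading and discarding its remaining records.
    awk 'FNR == 1 { print FILENAME ": " $0; nextfile }' "$@"
}

# Usage (the file names are placeholders): first_lines file1 file2
```

Without @code{nextfile}, the same result requires a rule that is tested against every remaining record of every file.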
- -While one might think that @samp{close(FILENAME)} would accomplish -the same thing as @code{nextfile}, this isn't true. @code{close} is -reserved for closing files, pipes, and coprocesses that are -opened with redirections. It is not related to the main processing that -@command{awk} does with the files listed in @code{ARGV}. - -If it's necessary to use an @command{awk} version that doesn't support -@code{nextfile}, see -@ref{Nextfile Function, ,Implementing @code{nextfile} as a Function}, -for a user-defined function that simulates the @code{nextfile} -statement. - -@cindex @code{nextfile}, inside a user-defined function -The current version of the Bell Laboratories @command{awk} -(@pxref{Other Versions, ,Other Freely Available @command{awk} Implementations}) -also supports @code{nextfile}. However, it doesn't allow the @code{nextfile} -statement inside function bodies -(@pxref{User-defined, ,User-Defined Functions}). -@command{gawk} does; a @code{nextfile} inside a -function body reads the first record of the next @value{DF} and starts -processing it with the first rule in the program, just as any other -@code{nextfile} statement does. - -@cindex @code{next file} statement -@strong{Caution:} Versions of @command{gawk} prior to 3.0 used two -words (@samp{next file}) for the @code{nextfile} statement. -In @value{PVERSION} 3.0, this was changed -to one word, because the treatment of @samp{file} was -inconsistent. When it appeared after @code{next}, @samp{file} was a keyword; -otherwise, it was a regular identifier. The old usage is no longer -accepted; @samp{next file} generates a syntax error. - -@node Exit Statement, , Nextfile Statement, Statements -@subsection The @code{exit} Statement - -@cindex @code{exit} statement -The @code{exit} statement causes @command{awk} to immediately stop -executing the current rule and to stop processing input; any remaining input -is ignored. 
The @code{exit} statement is written as follows: - -@example -exit @r{[}@var{return code}@r{]} -@end example - -When an @code{exit} statement is executed from a @code{BEGIN} rule, the -program stops processing everything immediately. No input records are -read. However, if an @code{END} rule is present, -as part of executing the @code{exit} statement, -the @code{END} rule is executed -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). -If @code{exit} is used as part of an @code{END} rule, it causes -the program to stop immediately. - -An @code{exit} statement that is not part of a @code{BEGIN} or @code{END} -rule stops the execution of any further automatic rules for the current -record, skips reading any remaining input records, and executes the -@code{END} rule if there is one. - -In such a case, -if you don't want the @code{END} rule to do its job, set a variable -to nonzero before the @code{exit} statement and check that variable in -the @code{END} rule. -@xref{Assert Function, ,Assertions}, -for an example that does this. - -@cindex dark corner -If an argument is supplied to @code{exit}, its value is used as the exit -status code for the @command{awk} process. If no argument is supplied, -@code{exit} returns status zero (success). In the case where an argument -is supplied to a first @code{exit} statement, and then @code{exit} is -called a second time from an @code{END} rule with no argument, -@command{awk} uses the previously supplied exit value. -@value{DARKCORNER} - -@cindex conventions, programming -@cindex programming conventions -For example, suppose an error condition occurs that is difficult or -impossible to handle. Conventionally, programs report this by -exiting with a nonzero status. 
An @command{awk} program can do this -using an @code{exit} statement with a nonzero argument, as shown -in the following example: - -@example -BEGIN @{ - if (("date" | getline date_now) <= 0) @{ - print "Can't get system date" > "/dev/stderr" - exit 1 - @} - print "current date is", date_now - close("date") -@} -@end example - -@node Built-in Variables, , Statements, Patterns and Actions -@section Built-in Variables -@cindex built-in variables - -Most @command{awk} variables are available for you to use for your own -purposes; they never change unless your program assigns values to -them, and they never affect anything unless your program examines them. -However, a few variables in @command{awk} have special built-in meanings. -@command{awk} examines some of these automatically, so that they enable you -to tell @command{awk} how to do certain things. Others are set -automatically by @command{awk}, so that they carry information from the -internal workings of @command{awk} to your program. - -This @value{SECTION} documents all the built-in variables of -@command{gawk}, most of which are also documented in the chapters -describing their areas of activity. - -@menu -* User-modified:: Built-in variables that you change to control - @command{awk}. -* Auto-set:: Built-in variables where @command{awk} gives - you information. -* ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}. -@end menu - -@node User-modified, Auto-set, Built-in Variables, Built-in Variables -@subsection Built-in Variables That Control @command{awk} -@cindex built-in variables, user modifiable - -The following is an alphabetical list of variables that you can change to -control how @command{awk} does certain things. The variables that are -specific to @command{gawk} are marked with a pound sign (@samp{#}). 
- -@table @code -@cindex @code{BINMODE} variable -@cindex binary I/O -@cindex I/O, binary -@cindex differences between @command{gawk} and @command{awk} -@item BINMODE # -On non-POSIX systems, this variable specifies use of ``binary'' mode for all I/O. -Numeric values of one, two, or three, specify that input files, output files, or -all files, respectively, should use binary I/O. -Alternatively, -string values of @code{"r"} or @code{"w"} specify that input files and -output files, respectively, should use binary I/O. -A string value of @code{"rw"} or @code{"wr"} indicates that all -files should use binary I/O. -Any other string value is equivalent to @code{"rw"}, but @command{gawk} -generates a warning message. -@code{BINMODE} is described in more detail in -@ref{PC Using, ,Using @command{gawk} on PC Operating Systems}. - -This variable is a @command{gawk} extension. -In other @command{awk} implementations -(except @command{mawk}, -@pxref{Other Versions, , Other Freely Available @command{awk} Implementations}), -or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), -it is not special. - -@cindex @code{CONVFMT} variable -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@item CONVFMT -This string controls conversion of numbers to -strings (@pxref{Conversion, ,Conversion of Strings and Numbers}). -It works by being passed, in effect, as the first argument to the -@code{sprintf} function -(@pxref{String Functions, ,String Manipulation Functions}). -Its default value is @code{"%.6g"}. -@code{CONVFMT} was introduced by the POSIX standard. - -@cindex @code{FIELDWIDTHS} variable -@item FIELDWIDTHS # -This is a space-separated list of columns that tells @command{gawk} -how to split input with fixed columnar boundaries. -Assigning a value to @code{FIELDWIDTHS} -overrides the use of @code{FS} for field splitting. -@xref{Constant Size, ,Reading Fixed-Width Data}, for more information. 
- -If @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), then @code{FIELDWIDTHS} -has no special meaning, and field-splitting operations occur based -exclusively on the value of @code{FS}. - -@cindex @code{FS} variable -@item FS -This is the input field separator -(@pxref{Field Separators, ,Specifying How Fields Are Separated}). -The value is a single-character string or a multi-character regular -expression that matches the separations between fields in an input -record. If the value is the null string (@code{""}), then each -character in the record becomes a separate field. -(This behavior is a @command{gawk} extension. POSIX @command{awk} does not -specify the behavior when @code{FS} is the null string.) -@c NEXT ED: Mark as common extension - -The default value is @w{@code{" "}}, a string consisting of a single -space. As a special exception, this value means that any -sequence of spaces, tabs, and/or newlines is a single separator.@footnote{In -POSIX @command{awk}, newline does not count as whitespace.} It also causes -spaces, tabs, and newlines at the beginning and end of a record to be ignored. - -You can set the value of @code{FS} on the command line using the -@option{-F} option: - -@example -awk -F, '@var{program}' @var{input-files} -@end example - -If @command{gawk} is using @code{FIELDWIDTHS} for field splitting, -assigning a value to @code{FS} causes @command{gawk} to return to -the normal, @code{FS}-based field splitting. An easy way to do this -is to simply say @samp{FS = FS}, perhaps with an explanatory comment. - -@cindex @code{IGNORECASE} variable -@item IGNORECASE # -If @code{IGNORECASE} is nonzero or non-null, then all string comparisons -and all regular expression matching are case-independent. 
Thus, regexp -matching with @samp{~} and @samp{!~}, as well as the @code{gensub}, -@code{gsub}, @code{index}, @code{match}, @code{split}, and @code{sub} -functions, record termination with @code{RS}, and field splitting with -@code{FS}, all ignore case when doing their particular regexp operations. -However, the value of @code{IGNORECASE} does @emph{not} affect array subscripting. -@xref{Case-sensitivity, ,Case Sensitivity in Matching}. - -If @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), -then @code{IGNORECASE} has no special meaning. Thus, string -and regexp operations are always case-sensitive. - -@cindex @code{LINT} variable -@cindex differences between @command{gawk} and @command{awk} -@cindex lint checks -@item LINT # -When this variable is true (nonzero or non-null), @command{gawk} -behaves as if the @option{--lint} command-line option is in effect -(@pxref{Options, ,Command-Line Options}). -With a value of @code{"fatal"}, lint warnings become fatal errors. -Any other true value produces non-fatal warnings. -Assigning a false value to @code{LINT} turns off the lint warnings. - -This variable is a @command{gawk} extension. It is not special -in other @command{awk} implementations. Unlike the other special variables, -changing @code{LINT} does affect the production of lint warnings, -even if @command{gawk} is in compatibility mode. Much as -the @option{--lint} and @option{--traditional} options independently -control different aspects of @command{gawk}'s behavior, the control -of lint warnings during program execution is independent of the flavor -of @command{awk} being executed. - -@cindex @code{OFMT} variable -@item OFMT -This string controls conversion of numbers to -strings (@pxref{Conversion, ,Conversion of Strings and Numbers}) for -printing with the @code{print} statement. It works by being passed -as the first argument to the @code{sprintf} function -(@pxref{String Functions, ,String Manipulation Functions}). 
-Its default value is @code{"%.6g"}. Earlier versions of @command{awk} -also used @code{OFMT} to specify the format for converting numbers to -strings in general expressions; this is now done by @code{CONVFMT}. - -@cindex @code{OFS} variable -@item OFS -This is the output field separator (@pxref{Output Separators}). It is -output between the fields printed by a @code{print} statement. Its -default value is @w{@code{" "}}, a string consisting of a single space. - -@cindex @code{ORS} variable -@item ORS -This is the output record separator. It is output at the end of every -@code{print} statement. Its default value is @code{"\n"}, the newline -character. (@xref{Output Separators}.) - -@cindex @code{RS} variable -@item RS -This is @command{awk}'s input record separator. Its default value is a string -containing a single newline character, which means that an input record -consists of a single line of text. -It can also be the null string, in which case records are separated by -runs of blank lines. -If it is a regexp, records are separated by -matches of the regexp in the input text. -(@xref{Records, ,How Input Is Split into Records}.) - -The ability for @code{RS} to be a regular expression -is a @command{gawk} extension. -In most other @command{awk} implementations, -or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), -just the first character of @code{RS}'s value is used. - -@cindex @code{SUBSEP} variable -@item SUBSEP -This is the subscript separator. It has the default value of -@code{"\034"} and is used to separate the parts of the indices of a -multidimensional array. Thus, the expression @code{@w{foo["A", "B"]}} -really accesses @code{foo["A\034B"]} -(@pxref{Multi-dimensional, ,Multidimensional Arrays}). - -@cindex @code{TEXTDOMAIN} variable -@cindex internationalization -@item TEXTDOMAIN # -This variable is used for internationalization of programs at the -@command{awk} level. 
It sets the default text domain for specially -marked string constants in the source text, as well as for the -@code{dcgettext} and @code{bindtextdomain} functions -(@pxref{Internationalization, ,Internationalization with @command{gawk}}). -The default value of @code{TEXTDOMAIN} is @code{"messages"}. - -This variable is a @command{gawk} extension. -In other @command{awk} implementations, -or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), -it is not special. -@end table - -@node Auto-set, ARGC and ARGV, User-modified, Built-in Variables -@subsection Built-in Variables That Convey Information -@cindex built-in variables, convey information - -The following is an alphabetical list of variables that @command{awk} -sets automatically on certain occasions in order to provide -information to your program. The variables that are specific to -@command{gawk} are marked with a pound sign (@samp{#}). - -@table @code -@cindex @code{ARGC} variable -@cindex @code{ARGV} variable -@item ARGC@r{,} ARGV -The command-line arguments available to @command{awk} programs are stored in -an array called @code{ARGV}. @code{ARGC} is the number of command-line -arguments present. @xref{Other Arguments, ,Other Command-Line Arguments}. -Unlike most @command{awk} arrays, -@code{ARGV} is indexed from 0 to @code{ARGC} @minus{} 1. -In the following example: - -@example -$ awk 'BEGIN @{ -> for (i = 0; i < ARGC; i++) -> print ARGV[i] -> @}' inventory-shipped BBS-list -@print{} awk -@print{} inventory-shipped -@print{} BBS-list -@end example - -@noindent -@code{ARGV[0]} contains @code{"awk"}, @code{ARGV[1]} -contains @code{"inventory-shipped"} and @code{ARGV[2]} contains -@code{"BBS-list"}. The value of @code{ARGC} is three, one more than the -index of the last element in @code{ARGV}, because the elements are numbered -from zero. 
- -@cindex conventions, programming -@cindex programming conventions -The names @code{ARGC} and @code{ARGV}, as well as the convention of indexing -the array from 0 to @code{ARGC} @minus{} 1, are derived from the C language's -method of accessing command-line arguments. - -The value of @code{ARGV[0]} can vary from system to system. -Also, you should note that the program text is @emph{not} included in -@code{ARGV}, nor are any of @command{awk}'s command-line options. -@xref{ARGC and ARGV, , Using @code{ARGC} and @code{ARGV}}, for information -about how @command{awk} uses these variables. - -@cindex @code{ARGIND} variable -@item ARGIND # -This is the index in @code{ARGV} of the current file being processed. -Every time @command{gawk} opens a new @value{DF} for processing, it sets -@code{ARGIND} to the index in @code{ARGV} of the @value{FN}. -When @command{gawk} is processing the input files, -@samp{FILENAME == ARGV[ARGIND]} is always true. - -This variable is useful in file processing; it allows you to tell how far -along you are in the list of @value{DF}s as well as to distinguish between -successive instances of the same @value{FN} on the command line. - -While you can change the value of @code{ARGIND} within your @command{awk} -program, @command{gawk} automatically sets it to a new value when the -next file is opened. - -This variable is a @command{gawk} extension. -In other @command{awk} implementations, -or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), -it is not special. - -@cindex @code{ENVIRON} variable -@item ENVIRON -An associative array that contains the values of the environment. The array -indices are the environment variable names; the elements are the values of -the particular environment variables. For example, -@code{ENVIRON["HOME"]} might be @file{/home/arnold}. 
Changing this array -does not affect the environment passed on to any programs that -@command{awk} may spawn via redirection or the @code{system} function. -@c (In a future version of @command{gawk}, it may do so.) - -Some operating systems may not have environment variables. -On such systems, the @code{ENVIRON} array is empty (except for -@w{@code{ENVIRON["AWKPATH"]}}, -@pxref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}). - -@cindex @code{ERRNO} variable -@item ERRNO # -If a system error occurs during a redirection for @code{getline}, -during a read for @code{getline}, or during a @code{close} operation, -then @code{ERRNO} contains a string describing the error. - -This variable is a @command{gawk} extension. -In other @command{awk} implementations, -or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), -it is not special. - -@cindex dark corner -@cindex @code{FILENAME} variable -@item FILENAME -This is the name of the file that @command{awk} is currently reading. -When no @value{DF}s are listed on the command line, @command{awk} reads -from the standard input and @code{FILENAME} is set to @code{"-"}. -@code{FILENAME} is changed each time a new file is read -(@pxref{Reading Files, ,Reading Input Files}). -Inside a @code{BEGIN} rule, the value of @code{FILENAME} is -@code{""}, since there are no input files being processed -yet.@footnote{Some early implementations of Unix @command{awk} initialized -@code{FILENAME} to @code{"-"}, even if there were @value{DF}s to be -processed. This behavior was incorrect and should not be relied -upon in your programs.} -@value{DARKCORNER} -Note though, that using @code{getline} -(@pxref{Getline, ,Explicit Input with @code{getline}}) -inside a @code{BEGIN} rule can give -@code{FILENAME} a value. - -@cindex @code{FNR} variable -@item FNR -This is the current record number in the current file. 
@code{FNR} is -incremented each time a new record is read -(@pxref{Getline, ,Explicit Input with @code{getline}}). It is reinitialized -to zero each time a new input file is started. - -@cindex @code{NF} variable -@item NF -This is the number of fields in the current input record. -@code{NF} is set each time a new record is read, when a new field is -created or when @code{$0} changes (@pxref{Fields, ,Examining Fields}). - -@cindex @code{NR} variable -@item NR -This is the number of input records @command{awk} has processed since -the beginning of the program's execution -(@pxref{Records, ,How Input Is Split into Records}). -@code{NR} is incremented each time a new record is read. - -@cindex @code{PROCINFO} variable -@item PROCINFO # -The elements of this array provide access to information about the -running @command{awk} program. -The following elements (listed alphabetically) -are guaranteed to be available: - -@table @code -@item PROCINFO["egid"] -The value of the @code{getegid} system call. - -@item PROCINFO["euid"] -The value of the @code{geteuid} system call. - -@item PROCINFO["FS"] -This is -@code{"FS"} if field splitting with @code{FS} is in effect, or it is -@code{"FIELDWIDTHS"} if field splitting with @code{FIELDWIDTHS} is in effect. - -@item PROCINFO["gid"] -The value of the @code{getgid} system call. - -@item PROCINFO["pgrpid"] -The process group ID of the current process. - -@item PROCINFO["pid"] -The process ID of the current process. - -@item PROCINFO["ppid"] -The parent process ID of the current process. - -@item PROCINFO["uid"] -The value of the @code{getuid} system call. -@end table - -On some systems, there may be elements in the array, @code{"group1"} -through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of -supplementary groups that the process has. Use the @code{in} operator -to test for these elements -(@pxref{Reference to Elements, , Referring to an Array Element}). - -This array is a @command{gawk} extension. 
-In other @command{awk} implementations, -or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), -it is not special. - -@cindex @code{RLENGTH} variable -@item RLENGTH -This is the length of the substring matched by the -@code{match} function -(@pxref{String Functions, ,String Manipulation Functions}). -@code{RLENGTH} is set by invoking the @code{match} function. Its value -is the length of the matched string, or @minus{}1 if no match is found. - -@cindex @code{RSTART} variable -@item RSTART -This is the start-index in characters of the substring that is matched by the -@code{match} function -(@pxref{String Functions, ,String Manipulation Functions}). -@code{RSTART} is set by invoking the @code{match} function. Its value -is the position of the string where the matched substring starts, or zero -if no match was found. - -@cindex @code{RT} variable -@item RT # -This is set each time a record is read. It contains the input text -that matched the text denoted by @code{RS}, the record separator. - -This variable is a @command{gawk} extension. -In other @command{awk} implementations, -or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), -it is not special. -@end table - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Changing @code{NR} and @code{FNR} -@cindex advanced notes -@cindex dark corner -@command{awk} increments @code{NR} and @code{FNR} -each time it reads a record, instead of setting them to the absolute -value of the number of records read. This means that a program can -change these variables and their new values are incremented for -each record. 
-@value{DARKCORNER} -This is demonstrated in the following example: - -@example -$ echo '1 -> 2 -> 3 -> 4' | awk 'NR == 2 @{ NR = 17 @} -> @{ print NR @}' -@print{} 1 -@print{} 17 -@print{} 18 -@print{} 19 -@end example - -@noindent -Before @code{FNR} was added to the @command{awk} language -(@pxref{V7/SVR3.1, ,Major Changes Between V7 and SVR3.1}), -many @command{awk} programs used this feature to track the number of -records in a file by resetting @code{NR} to zero when @code{FILENAME} -changed. - -@node ARGC and ARGV, , Auto-set, Built-in Variables -@subsection Using @code{ARGC} and @code{ARGV} - -@ref{Auto-set, ,Built-in Variables That Convey Information}, -presented the following program describing the information contained in @code{ARGC} -and @code{ARGV}: - -@example -$ awk 'BEGIN @{ -> for (i = 0; i < ARGC; i++) -> print ARGV[i] -> @}' inventory-shipped BBS-list -@print{} awk -@print{} inventory-shipped -@print{} BBS-list -@end example - -@noindent -In this example, @code{ARGV[0]} contains @samp{awk}, @code{ARGV[1]} -contains @samp{inventory-shipped}, and @code{ARGV[2]} contains -@samp{BBS-list}. -Notice that the @command{awk} program is not entered in @code{ARGV}. The -other special command-line options, with their arguments, are also not -entered. This includes variable assignments done with the @option{-v} -option (@pxref{Options, ,Command-Line Options}). -Normal variable assignments on the command line @emph{are} -treated as arguments and do show up in the @code{ARGV} array: - -@example -$ cat showargs.awk -@print{} BEGIN @{ -@print{} printf "A=%d, B=%d\n", A, B -@print{} for (i = 0; i < ARGC; i++) -@print{} printf "\tARGV[%d] = %s\n", i, ARGV[i] -@print{} @} -@print{} END @{ printf "A=%d, B=%d\n", A, B @} -$ awk -v A=1 -f showargs.awk B=2 /dev/null -@print{} A=1, B=0 -@print{} ARGV[0] = awk -@print{} ARGV[1] = B=2 -@print{} ARGV[2] = /dev/null -@print{} A=1, B=2 -@end example - -A program can alter @code{ARGC} and the elements of @code{ARGV}. 
-Each time @command{awk} reaches the end of an input file, it uses the next -element of @code{ARGV} as the name of the next input file. By storing a -different string there, a program can change which files are read. -Use @code{"-"} to represent the standard input. Storing -additional elements and incrementing @code{ARGC} causes -additional files to be read. - -If the value of @code{ARGC} is decreased, that eliminates input files -from the end of the list. By recording the old value of @code{ARGC} -elsewhere, a program can treat the eliminated arguments as -something other than @value{FN}s. - -To eliminate a file from the middle of the list, store the null string -(@code{""}) into @code{ARGV} in place of the file's name. As a -special feature, @command{awk} ignores @value{FN}s that have been -replaced with the null string. -Another option is to -use the @code{delete} statement to remove elements from -@code{ARGV} (@pxref{Delete, ,The @code{delete} Statement}). - -All of these actions are typically done in the @code{BEGIN} rule, -before actual processing of the input begins. -@xref{Split Program, ,Splitting a Large File into Pieces}, and see -@ref{Tee Program, ,Duplicating Output into Multiple Files}, for examples -of each way of removing elements from @code{ARGV}. 
-The following fragment processes @code{ARGV} in order to examine, and -then remove, command-line options: -@c NEXT ED: Add xref to rewind() function - -@example -BEGIN @{ - for (i = 1; i < ARGC; i++) @{ - if (ARGV[i] == "-v") - verbose = 1 - else if (ARGV[i] == "-d") - debug = 1 - else if (ARGV[i] ~ /^-./) @{ - e = sprintf("%s: unrecognized option -- %c", - ARGV[0], substr(ARGV[i], 2, 1)) - print e > "/dev/stderr" - @} else - break - delete ARGV[i] - @} -@} -@end example - -To actually get the options into the @command{awk} program, -end the @command{awk} options with @option{--} and then supply -the @command{awk} program's options, in the following manner: - -@example -awk -f myprog -- -v -d file1 file2 @dots{} -@end example - -@cindex differences between @command{gawk} and @command{awk} -This is not necessary in @command{gawk}. Unless @option{--posix} has -been specified, @command{gawk} silently puts any unrecognized options -into @code{ARGV} for the @command{awk} program to deal with. As soon -as it sees an unknown option, @command{gawk} stops looking for other -options that it might otherwise recognize. The previous example with -@command{gawk} would be: - -@example -gawk -f myprog -d -v file1 file2 @dots{} -@end example - -@noindent -Because @option{-d} is not a valid @command{gawk} option, -it and the following @option{-v} -are passed on to the @command{awk} program. - -@node Arrays, Functions, Patterns and Actions, Top -@chapter Arrays in @command{awk} - -An @dfn{array} is a table of values called @dfn{elements}. The -elements of an array are distinguished by their indices. @dfn{Indices} -may be either numbers or strings. - -This @value{CHAPTER} describes how arrays work in @command{awk}, -how to use array elements, how to scan through every element in an array, -and how to remove array elements. -It also describes how @command{awk} simulates multidimensional -arrays, as well as some of the less obvious points about array usage. 
-The @value{CHAPTER} finishes with a discussion of @command{gawk}'s facility -for sorting an array based on its indices. - -@cindex names, use of -@cindex namespace issues in @command{awk} -@command{awk} maintains a single set -of names that may be used for naming variables, arrays, and functions -(@pxref{User-defined, ,User-Defined Functions}). -Thus, you cannot have a variable and an array with the same name in the -same @command{awk} program. - -@menu -* Array Intro:: Introduction to Arrays -* Reference to Elements:: How to examine one element of an array. -* Assigning Elements:: How to change an element of an array. -* Array Example:: Basic Example of an Array -* Scanning an Array:: A variation of the @code{for} statement. It - loops through the indices of an array's - existing elements. -* Delete:: The @code{delete} statement removes an element - from an array. -* Numeric Array Subscripts:: How to use numbers as subscripts in - @command{awk}. -* Uninitialized Subscripts:: Using Uninitialized variables as subscripts. -* Multi-dimensional:: Emulating multidimensional arrays in - @command{awk}. -* Multi-scanning:: Scanning multidimensional arrays. -* Array Sorting:: Sorting array values and indices. -@end menu - -@node Array Intro, Reference to Elements, Arrays, Arrays -@section Introduction to Arrays - -@cindex arrays -The @command{awk} language provides one-dimensional arrays -for storing groups of related strings or numbers. -Every @command{awk} array must have a name. Array names have the same -syntax as variable names; any valid variable name would also be a valid -array name. But one name cannot be used in both ways (as an array and -as a variable) in the same @command{awk} program. - -Arrays in @command{awk} superficially resemble arrays in other programming -languages, but there are fundamental differences. In @command{awk}, it -isn't necessary to specify the size of an array before starting to use it. 
-Additionally, any number or string in @command{awk}, not just consecutive integers, -may be used as an array index. - -In most other languages, arrays must be @dfn{declared} before use, -including a specification of -how many elements or components they contain. In such languages, the -declaration causes a contiguous block of memory to be allocated for that -many elements. Usually, an index in the array must be a non-negative integer. -For example, the index zero specifies the first element in the array, which is -actually stored at the beginning of the block of memory. Index one -specifies the second element, which is stored in memory right after the -first element, and so on. It is impossible to add more elements to the -array, because it has room only for as many elements as given in -the declaration. -(Some languages allow arbitrary starting and ending -indices---e.g., @samp{15 .. 27}---but the size of the array is still fixed when -the array is declared.) - -A contiguous array of four elements might look like the following example, -conceptually, if the element values are 8, @code{"foo"}, -@code{""}, and 30: - -@c NEXT ED: Use real images here -@iftex -@c from Karl Berry, much thanks for the help. 
-@tex -\bigskip % space above the table (about 1 linespace) -\offinterlineskip -\newdimen\width \width = 1.5cm -\newdimen\hwidth \hwidth = 4\width \advance\hwidth by 2pt % 5 * 0.4pt -\centerline{\vbox{ -\halign{\strut\hfil\ignorespaces#&&\vrule#&\hbox to\width{\hfil#\unskip\hfil}\cr -\noalign{\hrule width\hwidth} - &&{\tt 8} &&{\tt "foo"} &&{\tt ""} &&{\tt 30} &&\quad Value\cr -\noalign{\hrule width\hwidth} -\noalign{\smallskip} - &\omit&0&\omit &1 &\omit&2 &\omit&3 &\omit&\quad Index\cr -} -}} -@end tex -@end iftex -@ifinfo -@example -+---------+---------+--------+---------+ -| 8 | "foo" | "" | 30 | @r{Value} -+---------+---------+--------+---------+ - 0 1 2 3 @r{Index} -@end example -@end ifinfo - -@noindent -Only the values are stored; the indices are implicit from the order of -the values. 8 is the value at index zero, because 8 appears in the -position with zero elements before it. - -@cindex arrays, definition of -@cindex associative arrays -@cindex arrays, associative -Arrays in @command{awk} are different---they are @dfn{associative}. This means -that each array is a collection of pairs: an index, and its corresponding -array element value: - -@example -@r{Element} 3 @r{Value} 30 -@r{Element} 1 @r{Value} "foo" -@r{Element} 0 @r{Value} 8 -@r{Element} 2 @r{Value} "" -@end example - -@noindent -The pairs are shown in jumbled order because their order is irrelevant. - -One advantage of associative arrays is that new pairs can be added -at any time. For example, suppose a tenth element is added to the array -whose value is @w{@code{"number ten"}}. The result is: - -@example -@r{Element} 10 @r{Value} "number ten" -@r{Element} 3 @r{Value} 30 -@r{Element} 1 @r{Value} "foo" -@r{Element} 0 @r{Value} 8 -@r{Element} 2 @r{Value} "" -@end example - -@noindent -@cindex sparse arrays -@cindex arrays, sparse -Now the array is @dfn{sparse}, which just means some indices are missing. -It has elements 0--3 and 10, but doesn't have elements 4, 5, 6, 7, 8, or 9. 
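-
-As a minimal sketch of the sparse array just described, the following
-program stores the same five pairs and then reports which of the indices
-0 through 10 are present. The array name @code{arr} is an assumption made
-only for this illustration, and the @code{in} operator used for the test is
-described later in this @value{CHAPTER}
-(@pxref{Reference to Elements, ,Referring to an Array Element}):
-
-@example
-BEGIN @{
-    # Store the four original pairs, then add a tenth element.
-    arr[0] = 8
-    arr[1] = "foo"
-    arr[2] = ""
-    arr[3] = 30
-    arr[10] = "number ten"
-
-    # Test each index without creating the missing elements.
-    for (i = 0; i <= 10; i++)
-        if (i in arr)
-            print i, "is present"
-        else
-            print i, "is missing"
-@}
-@end example
-
-@noindent
-Indices 4 through 9 are reported as missing, because no value was
-ever stored for them.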
- -Another consequence of associative arrays is that the indices don't -have to be positive integers. Any number, or even a string, can be -an index. For example, the following is an array that translates words from -English into French: - -@example -@r{Element} "dog" @r{Value} "chien" -@r{Element} "cat" @r{Value} "chat" -@r{Element} "one" @r{Value} "un" -@r{Element} 1 @r{Value} "un" -@end example - -@noindent -Here we decided to translate the number one in both spelled-out and -numeric form---thus illustrating that a single array can have both -numbers and strings as indices. -In fact, array subscripts are always strings; this is discussed -in more detail in -@ref{Numeric Array Subscripts, ,Using Numbers to Subscript Arrays}. -Here, the number @code{1} isn't double-quoted, since @command{awk} -automatically converts it to a string. - -@cindex arrays, subscripts, and @code{IGNORECASE} -@cindex @code{IGNORECASE}, and array subscripts -@cindex @code{IGNORECASE} variable -The value of @code{IGNORECASE} has no effect upon array subscripting. -The identical string value used to store an array element must be used -to retrieve it. -When @command{awk} creates an array (e.g., with the @code{split} -built-in function), -that array's indices are consecutive integers starting at one. -(@xref{String Functions, ,String Manipulation Functions}.) - -@command{awk}'s arrays are efficient---the time to access an element -is independent of the number of elements in the array. - -@node Reference to Elements, Assigning Elements, Array Intro, Arrays -@section Referring to an Array Element -@cindex array reference -@cindex element of array -@cindex reference to array - -The principal way to use an array is to refer to one of its elements. -An array reference is an expression as follows: - -@example -@var{array}[@var{index}] -@end example - -@noindent -Here, @var{array} is the name of an array. The expression @var{index} is -the index of the desired element of the array. 
- -The value of the array reference is the current value of that array -element. For example, @code{foo[4.3]} is an expression for the element -of array @code{foo} at index @samp{4.3}. - -A reference to an array element that has no recorded value yields a value of -@code{""}, the null string. This includes elements -that have not been assigned any value as well as elements that have been -deleted (@pxref{Delete, ,The @code{delete} Statement}). Such a reference -automatically creates that array element, with the null string as its value. -(In some cases, this is unfortunate, because it might waste memory inside -@command{awk}.) - -@cindex arrays, presence of elements -@cindex arrays, the @code{in} operator -To determine whether an element exists in an array at a certain index, use -the following expression: - -@example -@var{index} in @var{array} -@end example - -@cindex side effects -@noindent -This expression tests whether or not the particular index exists, -without the side effect of creating that element if it is not present. -The expression has the value one (true) if @code{@var{array}[@var{index}]} -exists and zero (false) if it does not exist. -For example, this statement tests whether the array @code{frequencies} -contains the index @samp{2}: - -@example -if (2 in frequencies) - print "Subscript 2 is present." -@end example - -Note that this is @emph{not} a test of whether the array -@code{frequencies} contains an element whose @emph{value} is two. -There is no way to do that except to scan all the elements. Also, this -@emph{does not} create @code{frequencies[2]}, while the following -(incorrect) alternative does: - -@example -if (frequencies[2] != "") - print "Subscript 2 is present." 
-@end example - -@node Assigning Elements, Array Example, Reference to Elements, Arrays -@section Assigning Array Elements -@cindex array assignment -@cindex element assignment - -Array elements can be assigned values just like -@command{awk} variables: - -@example -@var{array}[@var{subscript}] = @var{value} -@end example - -@noindent -@var{array} is the name of an array. The expression -@var{subscript} is the index of the element of the array that is -assigned a value. The expression @var{value} is the value to -assign to that element of the array. - -@node Array Example, Scanning an Array, Assigning Elements, Arrays -@section Basic Array Example - -The following program takes a list of lines, each beginning with a line -number, and prints them out in order of line number. The line numbers -are not in order when they are first read---instead they -are scrambled. This program sorts the lines by making an array using -the line numbers as subscripts. The program then prints out the lines -in sorted order of their numbers. It is a very simple program and gets -confused upon encountering repeated numbers, gaps, or lines that don't -begin with a number: - -@example -@c file eg/misc/arraymax.awk -@{ - if ($1 > max) - max = $1 - arr[$1] = $0 -@} - -END @{ - for (x = 1; x <= max; x++) - print arr[x] -@} -@c endfile -@end example - -The first rule keeps track of the largest line number seen so far; -it also stores each line into the array @code{arr}, at an index that -is the line's number. -The second rule runs after all the input has been read, to print out -all the lines. -When this program is run with the following input: - -@example -@c file eg/misc/arraymax.data -5 I am the Five man -2 Who are you? The new number two! -4 . . . And four on the floor -1 Who is number one? -3 I three you. -@c endfile -@end example - -@noindent -its output is: - -@example -1 Who is number one? -2 Who are you? The new number two! -3 I three you. -4 . . . 
And four on the floor -5 I am the Five man -@end example - -If a line number is repeated, the last line with a given number overrides -the others. -Gaps in the line numbers can be handled with an easy improvement to the -program's @code{END} rule, as follows: - -@example -END @{ - for (x = 1; x <= max; x++) - if (x in arr) - print arr[x] -@} -@end example - -@node Scanning an Array, Delete, Array Example, Arrays -@section Scanning All Elements of an Array -@cindex @code{for (x in @dots{})} statement -@cindex arrays, special @code{for} statement -@cindex scanning an array -@cindex @code{in} operator - -In programs that use arrays, it is often necessary to use a loop that -executes once for each element of an array. In other languages, where -arrays are contiguous and indices are limited to positive integers, -this is easy: all the valid indices can be found by counting from -the lowest index up to the highest. This technique won't do the job -in @command{awk}, because any number or string can be an array index. -So @command{awk} has a special kind of @code{for} statement for scanning -an array: - -@example -for (@var{var} in @var{array}) - @var{body} -@end example - -@noindent -This loop executes @var{body} once for each index in @var{array} that the -program has previously used, with the variable @var{var} set to that index. - -The following program uses this form of the @code{for} statement. The -first rule scans the input records and notes which words appear (at -least once) in the input, by storing a one into the array @code{used} with -the word as index. The second rule scans the elements of @code{used} to -find all the distinct words that appear in the input. It prints each -word that is more than 10 characters long and also prints the number of -such words. -@xref{String Functions, ,String Manipulation Functions}, -for more information on the built-in function @code{length}. 
- -@example -# Record a 1 for each word that is used at least once -@{ - for (i = 1; i <= NF; i++) - used[$i] = 1 -@} - -# Find number of distinct words more than 10 characters long -END @{ - for (x in used) - if (length(x) > 10) @{ - ++num_long_words - print x - @} - print num_long_words, "words longer than 10 characters" -@} -@end example - -@noindent -@xref{Word Sorting, ,Generating Word Usage Counts}, -for a more detailed example of this type. - -The order in which elements of the array are accessed by this statement -is determined by the internal arrangement of the array elements within -@command{awk} and cannot be controlled or changed. This can lead to -problems if new elements are added to @var{array} by statements in -the loop body; it is not predictable whether or not the @code{for} loop will -reach them. Similarly, changing @var{var} inside the loop may produce -strange results. It is best to avoid such things. - -@node Delete, Numeric Array Subscripts, Scanning an Array, Arrays -@section The @code{delete} Statement -@cindex @code{delete} statement -@cindex deleting elements of arrays -@cindex removing elements of arrays -@cindex arrays, deleting an element - -To remove an individual element of an array, use the @code{delete} -statement: - -@example -delete @var{array}[@var{index}] -@end example - -Once an array element has been deleted, any value the element once -had is no longer available. It is as if the element had never -been referred to or been given a value. -The following is an example of deleting elements in an array: - -@example -for (i in frequencies) - delete frequencies[i] -@end example - -@noindent -This example removes all the elements from the array @code{frequencies}. 
-Once an element is deleted, a subsequent @code{for} statement to scan the array -does not report that element and the @code{in} operator to check for -the presence of that element returns zero (i.e., false): - -@example -delete foo[4] -if (4 in foo) - print "This will never be printed" -@end example - -It is important to note that deleting an element is @emph{not} the -same as assigning it a null value (the empty string, @code{""}). -For example: - -@example -foo[4] = "" -if (4 in foo) - print "This is printed, even though foo[4] is empty" -@end example - -@cindex lint checks -It is not an error to delete an element that does not exist. -If @option{--lint} is provided on the command line -(@pxref{Options, ,Command-Line Options}), -@command{gawk} issues a warning message when an element that -is not in the array is deleted. - -@cindex arrays, deleting entire contents -@cindex deleting entire arrays -@cindex differences between @command{gawk} and @command{awk} -All the elements of an array may be deleted with a single statement -by leaving off the subscript in the @code{delete} statement, -as follows: - -@example -delete @var{array} -@end example - -This ability is a @command{gawk} extension; it is not available in -compatibility mode (@pxref{Options, ,Command-Line Options}). - -Using this version of the @code{delete} statement is about three times -more efficient than the equivalent loop that deletes each element one -at a time. - -@cindex portability issues -@cindex Brennan, Michael -The following statement provides a portable but non-obvious way to clear -out an array:@footnote{Thanks to Michael Brennan for pointing this out.} - -@example -split("", array) -@end example - -The @code{split} function -(@pxref{String Functions, ,String Manipulation Functions}) -clears out the target array first. This call asks it to split -apart the null string. Because there is no data to split out, the -function simply clears the array and then returns. 
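-
-The following sketch illustrates the portable clearing idiom. The array
-name @code{data} and the helper function @code{count} are hypothetical
-names introduced only for this example; @code{count} simply scans the
-array to find out how many elements it holds:
-
-@example
-# count: hypothetical helper; returns the number of elements in arr.
-# k and n are local variables.
-function count(arr,    k, n) @{
-    n = 0
-    for (k in arr)
-        n++
-    return n
-@}
-
-BEGIN @{
-    data["a"] = 1; data["b"] = 2
-    print count(data)      # two elements are stored
-    split("", data)        # portable way to clear the array
-    print count(data)      # no elements remain
-    data["c"] = 3          # data is still an array and can be reused
-    print count(data)
-@}
-@end example
-
-@noindent
-This prints 2, 0, and 1 on separate lines. In @command{gawk},
-replacing the @code{split} call with @samp{delete data} should give the
-same result.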
- -@strong{Caution:} Deleting an array does not change its type; you cannot -delete an array and then use the array's name as a scalar -(i.e., a regular variable). For example, the following does not work: - -@example -a[1] = 3; delete a; a = 3 -@end example - -@node Numeric Array Subscripts, Uninitialized Subscripts, Delete, Arrays -@section Using Numbers to Subscript Arrays - -@cindex conversions, during subscripting -@cindex numbers, used as subscripts -@cindex @code{CONVFMT} variable -An important aspect of arrays to remember is that @emph{array subscripts -are always strings}. When a numeric value is used as a subscript, -it is converted to a string value before being used for subscripting -(@pxref{Conversion, ,Conversion of Strings and Numbers}). -This means that the value of the built-in variable @code{CONVFMT} can -affect how your program accesses elements of an array. For example: - -@example -xyz = 12.153 -data[xyz] = 1 -CONVFMT = "%2.2f" -if (xyz in data) - printf "%s is in data\n", xyz -else - printf "%s is not in data\n", xyz -@end example - -@noindent -This prints @samp{12.15 is not in data}. The first statement gives -@code{xyz} a numeric value. Assigning to -@code{data[xyz]} subscripts @code{data} with the string value @code{"12.153"} -(using the default conversion value of @code{CONVFMT}, @code{"%.6g"}). -Thus, the array element @code{data["12.153"]} is assigned the value one. -The program then changes -the value of @code{CONVFMT}. The test @samp{(xyz in data)} generates a new -string value from @code{xyz}---this time @code{"12.15"}---because the value of -@code{CONVFMT} only allows two digits after the decimal point. This test fails, -since @code{"12.15"} is a different string from @code{"12.153"}. - -According to the rules for conversions -(@pxref{Conversion, ,Conversion of Strings and Numbers}), integer -values are always converted to strings as integers, no matter what the -value of @code{CONVFMT} may happen to be. 
So the usual case of -the following works: - -@example -for (i = 1; i <= maxsub; i++) - @i{do something with} array[i] -@end example - -The ``integer values always convert to strings as integers'' rule -has an additional consequence for array indexing. -Octal and hexadecimal constants -(@pxref{Non-decimal-numbers, ,Octal and Hexadecimal Numbers}) -are converted internally into numbers and their original form -is forgotten. -This means, for example, that -@code{array[17]}, -@code{array[021]}, -and -@code{array[0x11]} -all refer to the same element! - -As with many things in @command{awk}, the majority of the time -things work as one would expect them to. But it is useful to have a precise -knowledge of the actual rules which sometimes can have a subtle -effect on your programs. - -@node Uninitialized Subscripts, Multi-dimensional, Numeric Array Subscripts, Arrays -@section Using Uninitialized Variables as Subscripts - -@cindex uninitialized variables, as array subscripts -@cindex arrays, subscripts, uninitialized variables -Suppose it's necessary to write a program -to print the input data in reverse order. -A reasonable attempt to do so (with some test -data) might look like this: - -@example -$ echo 'line 1 -> line 2 -> line 3' | awk '@{ l[lines] = $0; ++lines @} -> END @{ -> for (i = lines-1; i >= 0; --i) -> print l[i] -> @}' -@print{} line 3 -@print{} line 2 -@end example - -Unfortunately, the very first line of input data did not come out in the -output! - -At first glance, this program should have worked. The variable @code{lines} -is uninitialized, and uninitialized variables have the numeric value zero. -So, @command{awk} should have printed the value of @code{l[0]}. - -The issue here is that subscripts for @command{awk} arrays are @emph{always} -strings. Uninitialized variables, when used as strings, have the -value @code{""}, not zero. Thus, @samp{line 1} ends up stored in -@code{l[""]}. 
-The following version of the program works correctly: - -@example -@{ l[lines++] = $0 @} -END @{ - for (i = lines - 1; i >= 0; --i) - print l[i] -@} -@end example - -Here, the @samp{++} forces @code{lines} to be numeric, thus making -the ``old value'' numeric zero. This is then converted to @code{"0"} -as the array subscript. - -@cindex null string, as array subscript -@cindex dark corner -@cindex lint checks -Even though it is somewhat unusual, the null string -(@code{""}) is a valid array subscript. -@value{DARKCORNER} -@command{gawk} warns about the use of the null string as a subscript -if @option{--lint} is provided -on the command line (@pxref{Options, ,Command-Line Options}). - -@node Multi-dimensional, Multi-scanning, Uninitialized Subscripts, Arrays -@section Multidimensional Arrays - -@cindex subscripts in arrays -@cindex arrays, multidimensional subscripts -@cindex multidimensional subscripts -A multidimensional array is an array in which an element is identified -by a sequence of indices instead of a single index. For example, a -two-dimensional array requires two indices. The usual way (in most -languages, including @command{awk}) to refer to an element of a -two-dimensional array named @code{grid} is with -@code{grid[@var{x},@var{y}]}. - -@cindex @code{SUBSEP} variable -Multidimensional arrays are supported in @command{awk} through -concatenation of indices into one string. -@command{awk} converts the indices into strings -(@pxref{Conversion, ,Conversion of Strings and Numbers}) and -concatenates them together, with a separator between them. This creates -a single string that describes the values of the separate indices. The -combined string is used as a single index into an ordinary, -one-dimensional array. The separator used is the value of the built-in -variable @code{SUBSEP}. - -For example, suppose we evaluate the expression @samp{foo[5,12] = "value"} -when the value of @code{SUBSEP} is @code{"@@"}. 
The numbers 5 and 12 are -converted to strings and -concatenated with an @samp{@@} between them, yielding @code{"5@@12"}; thus, -the array element @code{foo["5@@12"]} is set to @code{"value"}. - -Once the element's value is stored, @command{awk} has no record of whether -it was stored with a single index or a sequence of indices. The two -expressions @samp{foo[5,12]} and @w{@samp{foo[5 SUBSEP 12]}} are always -equivalent. - -The default value of @code{SUBSEP} is the string @code{"\034"}, -which contains a non-printing character that is unlikely to appear in an -@command{awk} program or in most input data. -The usefulness of choosing an unlikely character comes from the fact -that index values that contain a string matching @code{SUBSEP} can lead to -combined strings that are ambiguous. Suppose that @code{SUBSEP} is -@code{"@@"}; then @w{@samp{foo["a@@b", "c"]}} and @w{@samp{foo["a", -"b@@c"]}} are indistinguishable because both are actually -stored as @samp{foo["a@@b@@c"]}. - -To test whether a particular index sequence exists in a -``multidimensional'' array, use the same operator (@samp{in}) that is -used for single dimensional arrays. Write the whole sequence of indices -in parentheses, separated by commas, as the left operand: - -@example -(@var{subscript1}, @var{subscript2}, @dots{}) in @var{array} -@end example - -The following example treats its input as a two-dimensional array of -fields; it rotates this array 90 degrees clockwise and prints the -result. It assumes that all lines have the same number of -elements. 
- -@example -@{ - if (max_nf < NF) - max_nf = NF - max_nr = NR - for (x = 1; x <= NF; x++) - vector[x, NR] = $x -@} - -END @{ - for (x = 1; x <= max_nf; x++) @{ - for (y = max_nr; y >= 1; --y) - printf("%s ", vector[x, y]) - printf("\n") - @} -@} -@end example - -@noindent -When given the input: - -@example -1 2 3 4 5 6 -2 3 4 5 6 1 -3 4 5 6 1 2 -4 5 6 1 2 3 -@end example - -@noindent -the program produces the following output: - -@example -4 3 2 1 -5 4 3 2 -6 5 4 3 -1 6 5 4 -2 1 6 5 -3 2 1 6 -@end example - -@node Multi-scanning, Array Sorting, Multi-dimensional, Arrays -@section Scanning Multidimensional Arrays - -There is no special @code{for} statement for scanning a -``multidimensional'' array. There cannot be one, because in truth there -are no multidimensional arrays or elements---there is only a -multidimensional @emph{way of accessing} an array. - -However, if your program has an array that is always accessed as -multidimensional, you can get the effect of scanning it by combining -the scanning @code{for} statement -(@pxref{Scanning an Array, ,Scanning All Elements of an Array}) with the -built-in @code{split} function -(@pxref{String Functions, ,String Manipulation Functions}). -It works in the following manner: - -@example -for (combined in array) @{ - split(combined, separate, SUBSEP) - @dots{} -@} -@end example - -@noindent -This sets the variable @code{combined} to -each concatenated combined index in the array, and splits it -into the individual indices by breaking it apart where the value of -@code{SUBSEP} appears. The individual indices then become the elements of -the array @code{separate}. - -Thus, if a value is previously stored in @code{array[1, "foo"]}; then -an element with index @code{"1\034foo"} exists in @code{array}. (Recall -that the default value of @code{SUBSEP} is the character with code 034.) -Sooner or later, the @code{for} statement finds that index and does an -iteration with the variable @code{combined} set to @code{"1\034foo"}. 
-Then the @code{split} function is called as follows: - -@example -split("1\034foo", separate, "\034") -@end example - -@noindent -The result is to set @code{separate[1]} to @code{"1"} and -@code{separate[2]} to @code{"foo"}. Presto! The original sequence of -separate indices is recovered. - -@node Array Sorting, , Multi-scanning, Arrays -@section Sorting Array Values and Indices with @command{gawk} - -@cindex arrays, sorting -@cindex @code{asort} built-in function -The order in which an array is scanned with a @samp{for (i in array)} -loop is essentially arbitrary. -In most @command{awk} implementations, sorting an array requires -writing a @code{sort} function. -While this can be educational for exploring different sorting algorithms, -usually that's not the point of the program. -@command{gawk} provides the built-in @code{asort} function -(@pxref{String Functions, ,String Manipulation Functions}) -that sorts an array. For example: - -@example -@var{populate the array} data -n = asort(data) -for (i = 1; i <= n; i++) - @var{do something with} data[i] -@end example - -After the call to @code{asort}, the array @code{data} is indexed from 1 -to some number @var{n}, the total number of elements in @code{data}. -(This count is @code{asort}'s return value.) -@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on. -The comparison of array elements is done -using @command{gawk}'s usual comparison rules -(@pxref{Typing and Comparison, ,Variable Typing and Comparison Expressions}). - -@cindex side effects -An important side effect of calling @code{asort} is that -@emph{the array's original indices are irrevocably lost}. 
-As this isn't always desirable, @code{asort} accepts a -second argument: - -@example -@var{populate the array} source -n = asort(source, dest) -for (i = 1; i <= n; i++) - @var{do something with} dest[i] -@end example - -In this case, @command{gawk} copies the @code{source} array into the -@code{dest} array and then sorts @code{dest}, destroying its indices. -However, the @code{source} array is not affected. - -Often, what's needed is to sort on the values of the @emph{indices} -instead of the values of the elements. To do this, use a helper array -to hold the sorted index values, and then access the original array's -elements. It works in the following way: - -@example -@var{populate the array} data -# copy indices -j = 1 -for (i in data) @{ - ind[j] = i # index value becomes element value - j++ -@} -n = asort(ind) # index values are now sorted -for (i = 1; i <= n; i++) - @var{do something with} data[ind[i]] -@end example - -Sorting the array by replacing the indices provides maximal flexibility. -To traverse the elements in decreasing order, use a loop that goes from -@var{n} down to 1, either over the elements or over the indices. - -@cindex reference counting -Copying array indices and elements isn't expensive in terms of memory. -Internally, @command{gawk} maintains @dfn{reference counts} to data. -For example, when @code{asort} copies the first array to the second one, -there is only one copy of the original array elements' data, even though -both arrays use the values. Similarly, when copying the indices from -@code{data} to @code{ind}, there is only one copy of the actual index -strings. - -@cindex arrays, sorting and @code{IGNORECASE} -@cindex @code{IGNORECASE}, and array sorting -@cindex @code{IGNORECASE} variable -As with array subscripts, the value of @code{IGNORECASE} -does not affect array sorting. 
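-
-The decreasing-order traversal mentioned earlier in this @value{SECTION}
-can be sketched as follows. This is only an illustration: the array
-@code{score} and its contents are hypothetical sample data, and the
-program requires @command{gawk} because it calls @code{asort}:
-
-@example
-BEGIN @{
-    # Hypothetical sample data, indexed by name.
-    score["carol"] = 17
-    score["alice"] = 42
-    score["bob"] = 8
-
-    # Copy the indices into a helper array, then sort them.
-    j = 0
-    for (name in score)
-        ind[++j] = name
-    n = asort(ind)
-
-    # Walk the sorted indices from n down to 1.
-    for (i = n; i >= 1; i--)
-        print ind[i], score[ind[i]]
-@}
-@end example
-
-@noindent
-The names come out in reverse alphabetical order (@samp{carol},
-@samp{bob}, @samp{alice}), each followed by its value from @code{score}.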
- -@node Functions, Internationalization, Arrays, Top -@chapter Functions - -This @value{CHAPTER} describes @command{awk}'s built-in functions, -which fall into three categories: numeric, string, and I/O. -@command{gawk} provides additional groups of functions -to work with values that represent time, do -bit manipulation, and to internationalize and localize programs. - -Besides the built-in functions, @command{awk} has provisions for -writing new functions that the rest of a program can use. -The second half of this @value{CHAPTER} describes these -@dfn{user-defined} functions. - -@menu -* Built-in:: Summarizes the built-in functions. -* User-defined:: Describes User-defined functions in detail. -@end menu - -@node Built-in, User-defined, Functions, Functions -@section Built-in Functions - -@c 2e: USE TEXINFO-2 FUNCTION DEFINITION STUFF!!!!!!!!!!!!! -@cindex built-in functions -@dfn{Built-in} functions are always available for -your @command{awk} program to call. This @value{SECTION} defines all -the built-in -functions in @command{awk}; some of these are mentioned in other sections -but are summarized here for your convenience. - -@menu -* Calling Built-in:: How to call built-in functions. -* Numeric Functions:: Functions that work with numbers, including - @code{int}, @code{sin} and @code{rand}. -* String Functions:: Functions for string manipulation, such as - @code{split}, @code{match} and @code{sprintf}. -* I/O Functions:: Functions for files and shell commands. -* Time Functions:: Functions for dealing with timestamps. -* Bitwise Functions:: Functions for bitwise operations. -* I18N Functions:: Functions for string translation. -@end menu - -@node Calling Built-in, Numeric Functions, Built-in, Built-in -@subsection Calling Built-in Functions - -To call one of @command{awk}'s built-in functions, write the name of -the function followed -by arguments in parentheses. 
For example, @samp{atan2(y + z, 1)} -is a call to the function @code{atan2}, and has two arguments. - -@cindex conventions, programming -@cindex programming conventions -Whitespace is ignored between the built-in function name and the -open parenthesis, but it is good practice to avoid using whitespace -there. User-defined functions do not permit whitespace in this way, and -it is easier to avoid mistakes by following a simple -convention that always works---no whitespace after a function name. - -@cindex fatal errors -@cindex differences between @command{gawk} and @command{awk} -Each built-in function accepts a certain number of arguments. -In some cases, arguments can be omitted. The defaults for omitted -arguments vary from function to function and are described under the -individual functions. In some @command{awk} implementations, extra -arguments given to built-in functions are ignored. However, in @command{gawk}, -it is a fatal error to give extra arguments to a built-in function. - -When a function is called, expressions that create the function's actual -parameters are evaluated completely before the call is performed. -For example, in the following code fragment: - -@example -i = 4 -j = sqrt(i++) -@end example - -@cindex evaluation, order of -@cindex order of evaluation -@noindent -the variable @code{i} is incremented to the value five before @code{sqrt} -is called with a value of four for its actual parameter. -The order of evaluation of the expressions used for the function's -parameters is undefined. Thus, avoid writing programs that -assume that parameters are evaluated from left to right or from -right to left. For example: - -@example -i = 5 -j = atan2(i++, i *= 2) -@end example - -If the order of evaluation is left to right, then @code{i++} yields -five and leaves @code{i} equal to six, @code{i} then becomes 12, and -@code{atan2} is called with the two arguments 5 and 12. 
But if the order of evaluation is right to left, @code{i} -first becomes 10, then @code{i++} yields 10 and leaves @code{i} equal to 11, -and @code{atan2} is called with the two arguments 10 and 10. - -@node Numeric Functions, String Functions, Calling Built-in, Built-in -@subsection Numeric Functions - -The following list describes all of -the built-in functions that work with numbers. -Optional parameters are enclosed in square brackets ([ and ]): - -@table @code -@item int(@var{x}) -@cindex @code{int} built-in function -This returns the nearest integer to @var{x}, located between @var{x} and zero and -truncated toward zero. - -For example, @code{int(3)} is three, @code{int(3.9)} is three, @code{int(-3.9)} -is @minus{}3, and @code{int(-3)} is @minus{}3 as well. - -@item sqrt(@var{x}) -@cindex @code{sqrt} built-in function -This returns the positive square root of @var{x}. -@command{gawk} reports an error -if @var{x} is negative. For example, @code{sqrt(4)} is two. - -@item exp(@var{x}) -@cindex @code{exp} built-in function -This returns the exponential of @var{x} (@code{e ^ @var{x}}) or reports -an error if @var{x} is out of range. The range of values @var{x} can have -depends on your machine's floating-point representation. - -@item log(@var{x}) -@cindex @code{log} built-in function -This returns the natural logarithm of @var{x}, if @var{x} is positive; -otherwise, it reports an error. - -@item sin(@var{x}) -@cindex @code{sin} built-in function -This returns the sine of @var{x}, with @var{x} in radians. - -@item cos(@var{x}) -@cindex @code{cos} built-in function -This returns the cosine of @var{x}, with @var{x} in radians. - -@item atan2(@var{y}, @var{x}) -@cindex @code{atan2} built-in function -This returns the arctangent of @code{@var{y} / @var{x}} in radians. - -@item rand() -@cindex @code{rand} built-in function -This returns a random number. The values of @code{rand} are -uniformly distributed between zero and one. 
-The value could be zero but is never one.@footnote{The C version of @code{rand} -is known to produce fairly poor sequences of random numbers. -However, nothing requires that an @command{awk} implementation use the C -@code{rand} to implement the @command{awk} version of @code{rand}. -In fact, @command{gawk} uses the BSD @code{random} function, which is -considerably better than @code{rand}, to produce random numbers.} - -Often random integers are needed instead. Following is a user-defined function -that can be used to obtain a random non-negative integer less than @var{n}: - -@example -function randint(n) @{ - return int(n * rand()) -@} -@end example - -@noindent -The multiplication produces a random number greater than or equal to zero and less -than @code{n}. Using @code{int}, this result is made into -an integer between zero and @code{n} @minus{} 1, inclusive. - -The following example uses a similar function to produce random integers -between one and @var{n}. This program prints a new random number for -each input record. - -@example -# Function to roll a simulated die. -function roll(n) @{ return 1 + int(rand() * n) @} - -# Roll 3 six-sided dice and -# print total number of points. -@{ - printf("%d points\n", - roll(6)+roll(6)+roll(6)) -@} -@end example - -@cindex seed for random numbers -@cindex random numbers, seed of -@c MAWK uses a different seed each time. -@strong{Caution:} In most @command{awk} implementations, including @command{gawk}, -@code{rand} starts generating numbers from the same -starting number, or @dfn{seed}, each time you run @command{awk}. Thus, -a program generates the same results each time you run it. -The numbers are random within one @command{awk} run but predictable -from run to run. This is convenient for debugging, but if you want -a program to do different things each time it is used, you must change -the seed to a value that is different in each run. To do this, -use @code{srand}. 
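The effect of the seed can be checked with a minimal sketch such as the following. The shell function name @code{replay_rolls} and the seed value 42 are assumptions made for illustration only; @code{roll} is the die-rolling function shown above. The sketch seeds the generator twice with the same value and compares the two series of rolls.

```shell
# replay_rolls is a hypothetical wrapper; 42 is an arbitrary seed.
replay_rolls() {
    awk '
    function roll(n) { return 1 + int(rand() * n) }
    BEGIN {
        srand(42); first  = roll(6) " " roll(6) " " roll(6)
        srand(42); second = roll(6) " " roll(6) " " roll(6)
        # Reusing the seed replays the same three rolls.
        print (first == second ? "same" : "different")
    }'
}
replay_rolls
```

The actual numbers rolled depend on the @command{awk} implementation, so the sketch compares the two series instead of printing them.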
- -@item srand(@r{[}@var{x}@r{]}) -@cindex @code{srand} built-in function -The function @code{srand} sets the starting point, or seed, -for generating random numbers to the value @var{x}. - -Each seed value leads to a particular sequence of random -numbers.@footnote{Computer generated random numbers really are not truly -random. They are technically known as ``pseudo-random.'' This means -that while the numbers in a sequence appear to be random, you can in -fact generate the same sequence of random numbers over and over again.} -Thus, if the seed is set to the same value a second time, -the same sequence of random numbers is produced again. - -Different @command{awk} implementations use different random number -generators internally. Don't expect the same @command{awk} program -to produce the same series of random numbers when executed by -different versions of @command{awk}. - -If the argument @var{x} is omitted, as in @samp{srand()}, then the current -date and time of day are used for a seed. This is the way to get random -numbers that are truly unpredictable. - -The return value of @code{srand} is the previous seed. This makes it -easy to keep track of the seeds in case you need to consistently reproduce -sequences of random numbers. -@end table - -@node String Functions, I/O Functions, Numeric Functions, Built-in -@subsection String Manipulation Functions - -The functions in this @value{SECTION} look at or change the text of one or more -strings. -Optional parameters are enclosed in square brackets ([ and ]). -Those functions that are -specific to @command{gawk} are marked with a pound sign (@samp{#}): - -@menu -* Gory Details:: More than you want to know about @samp{\} and - @samp{&} with @code{sub}, @code{gsub}, and - @code{gensub}. 
-@end menu - -@table @code -@item asort(@var{source} @r{[}, @var{dest}@r{]}) # -@cindex @code{asort} built-in function -@code{asort} is a @command{gawk}-specific extension, returning the number of -elements in the array @var{source}. The contents of @var{source} are -sorted using @command{gawk}'s normal rules for comparing values, and the indices -of the sorted values of @var{source} are replaced with sequential -integers starting with one. If the optional array @var{dest} is specified, -then @var{source} is duplicated into @var{dest}. @var{dest} is then -sorted, leaving the indices of @var{source} unchanged. -For example, if the contents of @code{a} are as follows: - -@example -a["last"] = "de" -a["first"] = "sac" -a["middle"] = "cul" -@end example - -@noindent -A call to @code{asort}: - -@example -asort(a) -@end example - -@noindent -results in the following contents of @code{a}: - -@example -a[1] = "cul" -a[2] = "de" -a[3] = "sac" -@end example - -@cindex differences between @command{gawk} and @command{awk} -The @code{asort} function is described in more detail in -@ref{Array Sorting, ,Sorting Array Values and Indices with @command{gawk}}. -@code{asort} is a @command{gawk} extension; it is not available -in compatibility mode (@pxref{Options, ,Command-Line Options}). - -@item index(@var{in}, @var{find}) -@cindex @code{index} built-in function -This searches the string @var{in} for the first occurrence of the string -@var{find}, and returns the position in characters where that occurrence -begins in the string @var{in}. Consider the following example: - -@example -$ awk 'BEGIN @{ print index("peanut", "an") @}' -@print{} 3 -@end example - -@noindent -If @var{find} is not found, @code{index} returns zero. -(Remember that string indices in @command{awk} start at one.) - -@item length(@r{[}@var{string}@r{]}) -@cindex @code{length} built-in function -This returns the number of characters in @var{string}. 
If -@var{string} is a number, the length of the digit string representing -that number is returned. For example, @code{length("abcde")} is 5. By -contrast, @code{length(15 * 35)} works out to 3. In this example, 15 * 35 = -525, and 525 is then converted to the string @code{"525"}, which has -three characters. - -If no argument is supplied, @code{length} returns the length of @code{$0}. - -@cindex historical features -@cindex portability issues -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -@strong{Note:} -In older versions of @command{awk}, the @code{length} function could -be called -without any parentheses. Doing so is marked as ``deprecated'' in the -POSIX standard. This means that while a program can do this, -it is a feature that can eventually be removed from a future -version of the standard. Therefore, for programs to be maximally portable, -always supply the parentheses. - -@item match(@var{string}, @var{regexp} @r{[}, @var{array}@r{]}) -@cindex @code{match} built-in function -The @code{match} function searches @var{string} for the -longest leftmost substring matched by the regular expression, -@var{regexp}. It returns the character position, or @dfn{index}, -where that substring begins (one, if it starts at the beginning of -@var{string}). If no match is found, it returns zero. - -The order of the first two arguments is backwards from most other string -functions that work with regular expressions, such as -@code{sub} and @code{gsub}. It might help to remember that -for @code{match}, the order is the same as for the @samp{~} operator: -@samp{@var{string} ~ @var{regexp}}. - -@cindex @code{RSTART} variable -@cindex @code{RLENGTH} variable -The @code{match} function sets the built-in variable @code{RSTART} to -the index. It also sets the built-in variable @code{RLENGTH} to the -length in characters of the matched substring. If no match is found, -@code{RSTART} is set to zero, and @code{RLENGTH} to @minus{}1. 
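As a hedged illustration of how @code{RSTART} and @code{RLENGTH} work together, the following sketch passes both to @code{substr} to extract the matched text. The shell function name @code{show_match} and the sample strings are assumptions chosen for this sketch.

```shell
# show_match is a hypothetical helper: it prints RSTART, RLENGTH, and the
# matched text for a string ($1) and a dynamic regexp ($2).
show_match() {
    awk -v s="$1" -v re="$2" 'BEGIN {
        if (match(s, re))
            print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
        else
            print RSTART, RLENGTH      # 0 and -1 when nothing matches
    }'
}
show_match "My program runs" "ru+n"
show_match "My program runs" "Melvin"
```

The first call reports where the match begins and how long it is, followed by the matched text; the second call shows the values left behind by a failed match.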
- -For example: - -@example -@c file eg/misc/findpat.awk -@{ - if ($1 == "FIND") - regex = $2 - else @{ - where = match($0, regex) - if (where != 0) - print "Match of", regex, "found at", - where, "in", $0 - @} -@} -@c endfile -@end example - -@noindent -This program looks for lines that match the regular expression stored in -the variable @code{regex}. This regular expression can be changed. If the -first word on a line is @samp{FIND}, @code{regex} is changed to be the -second word on that line. Therefore, if given: - -@example -@c file eg/misc/findpat.data -FIND ru+n -My program runs -but not very quickly -FIND Melvin -JF+KM -This line is property of Reality Engineering Co. -Melvin was here. -@c endfile -@end example - -@noindent -@command{awk} prints: - -@example -Match of ru+n found at 12 in My program runs -Match of Melvin found at 1 in Melvin was here. -@end example - -@cindex differences between @command{gawk} and @command{awk} -If @var{array} is present, it is cleared, and then the 0'th element -of @var{array} is set to the entire portion of @var{string} -matched by @var{regexp}. If @var{regexp} contains parentheses, -the integer-indexed elements of @var{array} are set to contain the -portion of @var{string} matching the corresponding parenthesized -sub-expression. -For example: - -@example -$ echo foooobazbarrrrr | -> gawk '@{ match($0, /(fo+).+(bar*)/, arr) -> print arr[1], arr[2] @}' -@print{} foooo barrrrr -@end example - -@cindex fatal errors -The @var{array} argument to @code{match} is a -@command{gawk} extension. In compatibility mode -(@pxref{Options, ,Command-Line Options}), -using a third argument is a fatal error. - -@item split(@var{string}, @var{array} @r{[}, @var{fieldsep}@r{]}) -@cindex @code{split} built-in function -This function divides @var{string} into pieces separated by @var{fieldsep}, -and stores the pieces in @var{array}. The first piece is stored in -@code{@var{array}[1]}, the second piece in @code{@var{array}[2]}, and so -forth. 
The string value of the third argument, @var{fieldsep}, is -a regexp describing where to split @var{string} (much as @code{FS} can -be a regexp describing where to split input records). If -@var{fieldsep} is omitted, the value of @code{FS} is used. -@code{split} returns the number of elements created. -If @var{string} is the null string, @var{array} is empty -and @code{split} returns zero. - -The @code{split} function splits strings into pieces in a -manner similar to the way input lines are split into fields. For example: - -@example -split("cul-de-sac", a, "-") -@end example - -@noindent -splits the string @samp{cul-de-sac} into three fields using @samp{-} as the -separator. It sets the contents of the array @code{a} as follows: - -@example -a[1] = "cul" -a[2] = "de" -a[3] = "sac" -@end example - -@noindent -The value returned by this call to @code{split} is three. - -@cindex differences between @command{gawk} and @command{awk} -As with input field-splitting, when the value of @var{fieldsep} is -@w{@code{" "}}, leading and trailing whitespace is ignored and the elements -are separated by runs of whitespace. -Also as with input field-splitting, if @var{fieldsep} is the null string, each -individual character in the string is split into its own array element. -(This is a @command{gawk}-specific extension.) - -@cindex dark corner -Modern implementations of @command{awk}, including @command{gawk}, allow -the third argument to be a regexp constant (@code{/abc/}) as well as a -string. -@value{DARKCORNER} -The POSIX standard allows this as well. - -Before splitting the string, @code{split} deletes any previously existing -elements in the array @var{array}. -If @var{string} does not match @var{fieldsep} at all (but is not null), @var{array} has -one element only. The value of that element is the original @var{string}. 
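The cases described for @code{split} can be compared side by side with a short sketch. The shell function name @code{count_pieces} and the sample strings are assumptions made for illustration; only standard @command{awk} features are used.

```shell
# count_pieces is a hypothetical helper: it splits $1 on the separator $2
# and prints the element count followed by the first element.
count_pieces() {
    awk -v s="$1" -v sep="$2" 'BEGIN {
        n = split(s, parts, sep)
        print n, parts[1]
    }'
}
count_pieces "cul-de-sac" "-"     # separator found twice: three elements
count_pieces "nodash" "-"         # separator absent: one element, the string
count_pieces "  a   b  " " "      # single space: runs of whitespace separate
```

In the last call, the leading and trailing whitespace is ignored, so the first element is @samp{a} and not an empty string.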
- -@item sprintf(@var{format}, @var{expression1}, @dots{}) -@cindex @code{sprintf} built-in function -This returns (without printing) the string that @code{printf} would -have printed out with the same arguments -(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}). -For example: - -@example -pival = sprintf("pi = %.2f (approx.)", 22/7) -@end example - -@noindent -assigns the string @w{@code{"pi = 3.14 (approx.)"}} to the variable @code{pival}. - -@cindex @code{strtonum} built-in function -@item strtonum(@var{str}) # -Examines @var{str} and returns its numeric value. If @var{str} -begins with a leading @samp{0}, @code{strtonum} assumes that @var{str} -is an octal number. If @var{str} begins with a leading @samp{0x} or -@samp{0X}, @code{strtonum} assumes that @var{str} is a hexadecimal number. -For example: - -@example -$ echo 0x11 | -> gawk '@{ printf "%d\n", strtonum($1) @}' -@print{} 17 -@end example - -Using the @code{strtonum} function is @emph{not} the same as adding zero -to a string value; the automatic coercion of strings to numbers -works only for decimal data, not for octal or hexadecimal.@footnote{Unless -you use the @option{--non-decimal-data} option, which isn't recommended. -@xref{Non-decimal Data, ,Allowing Non-Decimal Input Data}, for more information.} - -@cindex differences between @command{gawk} and @command{awk} -@code{strtonum} is a @command{gawk} extension; it is not available -in compatibility mode (@pxref{Options, ,Command-Line Options}). - -@item sub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]}) -@cindex @code{sub} built-in function -The @code{sub} function alters the value of @var{target}. -It searches this value, which is treated as a string, for the -leftmost longest substring matched by the regular expression @var{regexp}. -Then the entire string is -changed by replacing the matched text with @var{replacement}. -The modified string becomes the new value of @var{target}. 
- -This function is peculiar because @var{target} is not simply -used to compute a value, and not just any expression will do---it -must be a variable, field, or array element so that @code{sub} can -store a modified value there. If this argument is omitted, then the -default is to use and alter @code{$0}. -For example: - -@example -str = "water, water, everywhere" -sub(/at/, "ith", str) -@end example - -@noindent -sets @code{str} to @w{@code{"wither, water, everywhere"}}, by replacing the -leftmost longest occurrence of @samp{at} with @samp{ith}. - -The @code{sub} function returns the number of substitutions made (either -one or zero). - -If the special character @samp{&} appears in @var{replacement}, it -stands for the precise substring that was matched by @var{regexp}. (If -the regexp can match more than one string, then this precise substring -may vary.) For example: - -@example -@{ sub(/candidate/, "& and his wife"); print @} -@end example - -@noindent -changes the first occurrence of @samp{candidate} to @samp{candidate -and his wife} on each input line. -Here is another example: - -@example -$ awk 'BEGIN @{ -> str = "daabaaa" -> sub(/a+/, "C&C", str) -> print str -> @}' -@print{} dCaaCbaaa -@end example - -@noindent -This shows how @samp{&} can represent a non-constant string and also -illustrates the ``leftmost, longest'' rule in regexp matching -(@pxref{Leftmost Longest, ,How Much Text Matches?}). - -The effect of this special character (@samp{&}) can be turned off by putting a -backslash before it in the string. As usual, to insert one backslash in -the string, you must write two backslashes. Therefore, write @samp{\\&} -in a string constant to include a literal @samp{&} in the replacement. 
-For example, the following shows how to replace the first @samp{|} on each line with -an @samp{&}: - -@example -@{ sub(/\|/, "\\&"); print @} -@end example - -@cindex @code{sub}, third argument of -@cindex @code{gsub}, third argument of -As mentioned, the third argument to @code{sub} must -be a variable, field or array reference. -Some versions of @command{awk} allow the third argument to -be an expression that is not an lvalue. In such a case, @code{sub} -still searches for the pattern and returns zero or one, but the result of -the substitution (if any) is thrown away because there is no place -to put it. Such versions of @command{awk} accept expressions -such as the following: - -@example -sub(/USA/, "United States", "the USA and Canada") -@end example - -@noindent -@cindex fatal errors -For historical compatibility, @command{gawk} accepts erroneous code, -such as in the previous example. However, using any other non-changeable -object as the third parameter causes a fatal error and your program -will not run. - -Finally, if the @var{regexp} is not a regexp constant, it is converted into a -string, and then the value of that string is treated as the regexp to match. - -@item gsub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]}) -@cindex @code{gsub} built-in function -This is similar to the @code{sub} function, except @code{gsub} replaces -@emph{all} of the longest, leftmost, @emph{non-overlapping} matching -substrings it can find. The @samp{g} in @code{gsub} stands for -``global,'' which means replace everywhere. For example: - -@example -@{ gsub(/Britain/, "United Kingdom"); print @} -@end example - -@noindent -replaces all occurrences of the string @samp{Britain} with @samp{United -Kingdom} for all input records. - -The @code{gsub} function returns the number of substitutions made. If -the variable to search and alter (@var{target}) is -omitted, then the entire input record (@code{$0}) is used. 
-As in @code{sub}, the characters @samp{&} and @samp{\} are special, -and the third argument must be assignable. - -@item gensub(@var{regexp}, @var{replacement}, @var{how} @r{[}, @var{target}@r{]}) # -@cindex @code{gensub} built-in function -@code{gensub} is a general substitution function. Like @code{sub} and -@code{gsub}, it searches the target string @var{target} for matches of -the regular expression @var{regexp}. Unlike @code{sub} and @code{gsub}, -the modified string is returned as the result of the function and the -original target string is @emph{not} changed. If @var{how} is a string -beginning with @samp{g} or @samp{G}, then it replaces all matches of -@var{regexp} with @var{replacement}. Otherwise, @var{how} is treated -as a number that indicates which match of @var{regexp} to replace. If -no @var{target} is supplied, @code{$0} is used. - -@code{gensub} provides an additional feature that is not available -in @code{sub} or @code{gsub}: the ability to specify components of a -regexp in the replacement text. This is done by using parentheses in -the regexp to mark the components and then specifying @samp{\@var{N}} -in the replacement text, where @var{N} is a digit from 1 to 9. -For example: - -@example -$ gawk ' -> BEGIN @{ -> a = "abc def" -> b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a) -> print b -> @}' -@print{} def abc -@end example - -@noindent -As with @code{sub}, you must type two backslashes in order -to get one into the string. - -In the replacement text, the sequence @samp{\0} represents the entire -matched text, as does the character @samp{&}. - -The following example shows how you can use the third argument to control -which match of the regexp should be changed: - -@example -$ echo a b c a b c | -> gawk '@{ print gensub(/a/, "AA", 2) @}' -@print{} a b c AA b c -@end example - -In this case, @code{$0} is used as the default target string. -@code{gensub} returns the new string as its result, which is -passed directly to @code{print} for printing. 
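The point that @code{gensub} returns its result and leaves the target alone can be seen in a small sketch. The shell function name @code{swap_words} is an assumption made for illustration, and the sketch only runs where a @command{gawk} executable is installed, because @code{gensub} is a @command{gawk} extension.

```shell
# swap_words is a hypothetical helper; it needs gawk because gensub
# is a gawk extension.
swap_words() {
    gawk -v s="$1" 'BEGIN {
        t = gensub(/(.+) (.+)/, "\\2 \\1", 1, s)
        print t      # the rewritten copy returned by gensub
        print s      # the target itself is left unchanged
    }'
}
# Skip quietly on systems where gawk is not installed.
if command -v gawk >/dev/null 2>&1; then
    swap_words "abc def"
fi
```

Here the third argument is the number 1, so only the first match is replaced; the two backslashes in the program text become one at the lexical level, as with @code{sub}.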
- -@cindex automatic warnings -@cindex warnings, automatic -If the @var{how} argument is a string that does not begin with @samp{g} or -@samp{G}, or if it is a number that is less than or equal to zero, only one -substitution is performed. If @var{how} is zero, @command{gawk} issues -a warning message. - -If @var{regexp} does not match @var{target}, @code{gensub}'s return value -is the original unchanged value of @var{target}. - -@cindex differences between @command{gawk} and @command{awk} -@code{gensub} is a @command{gawk} extension; it is not available -in compatibility mode (@pxref{Options, ,Command-Line Options}). - -@item substr(@var{string}, @var{start} @r{[}, @var{length}@r{]}) -@cindex @code{substr} built-in function -This returns a @var{length}-character-long substring of @var{string}, -starting at character number @var{start}. The first character of a -string is character number one.@footnote{This is different from -C and C++, where the first character is number zero.} -For example, @code{substr("washington", 5, 3)} returns @code{"ing"}. - -If @var{length} is not present, this function returns the whole suffix of -@var{string} that begins at character number @var{start}. For example, -@code{substr("washington", 5)} returns @code{"ington"}. The whole -suffix is also returned -if @var{length} is greater than the number of characters remaining -in the string, counting from character number @var{start}. - -@cindex common mistakes -@cindex mistakes, common -@cindex errors, common -The string returned by @code{substr} @emph{cannot} be -assigned. 
Thus, it is a mistake to attempt to change a portion of -a string, as shown in the following example: - -@example -string = "abcdef" -# try to get "abCDEf", won't work -substr(string, 3, 3) = "CDE" -@end example - -@noindent -It is also a mistake to use @code{substr} as the third argument -of @code{sub} or @code{gsub}: - -@example -gsub(/xyz/, "pdq", substr($0, 5, 20)) # WRONG -@end example - -@cindex portability issues -(Some commercial versions of @command{awk} do in fact let you use -@code{substr} this way, but doing so is not portable.) - -If you need to replace bits and pieces of a string, combine @code{substr} -with string concatenation, in the following manner: - -@example -string = "abcdef" -@dots{} -string = substr(string, 1, 2) "CDE" substr(string, 6) -@end example - -@cindex case conversion -@cindex conversion of case -@item tolower(@var{string}) -@cindex @code{tolower} built-in function -This returns a copy of @var{string}, with each uppercase character -in the string replaced with its corresponding lowercase character. -Non-alphabetic characters are left unchanged. For example, -@code{tolower("MiXeD cAsE 123")} returns @code{"mixed case 123"}. - -@item toupper(@var{string}) -@cindex @code{toupper} built-in function -This returns a copy of @var{string}, with each lowercase character -in the string replaced with its corresponding uppercase character. -Non-alphabetic characters are left unchanged. For example, -@code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}. -@end table - -@node Gory Details, , String Functions, String Functions -@subsubsection More About @samp{\} and @samp{&} with @code{sub}, @code{gsub}, and @code{gensub} - -@cindex escape processing, @code{sub} et al. 
-@cindex @code{sub}, escape processing -@cindex @code{gsub}, escape processing -@cindex @code{gensub}, escape processing -When using @code{sub}, @code{gsub}, or @code{gensub}, and trying to get literal -backslashes and ampersands into the replacement text, you need to remember -that there are several levels of @dfn{escape processing} going on. - -First, there is the @dfn{lexical} level, which is when @command{awk} reads -your program -and builds an internal copy of it that can be executed. -Then there is the runtime level, which is when @command{awk} actually scans the -replacement string to determine what to generate. - -At both levels, @command{awk} looks for a defined set of characters that -can come after a backslash. At the lexical level, it looks for the -escape sequences listed in @ref{Escape Sequences}. -Thus, for every @samp{\} that @command{awk} processes at the runtime -level, type two backslashes at the lexical level. -When a character that is not valid for an escape sequence follows the -@samp{\}, Unix @command{awk} and @command{gawk} both simply remove the initial -@samp{\} and put the next character into the string. Thus, for -example, @code{"a\qb"} is treated as @code{"aqb"}. - -At the runtime level, the various functions handle sequences of -@samp{\} and @samp{&} differently. The situation is (sadly) somewhat complex. -Historically, the @code{sub} and @code{gsub} functions treated the two -character sequence @samp{\&} specially; this sequence was replaced in -the generated text with a single @samp{&}. Any other @samp{\} within -the @var{replacement} string that did not precede an @samp{&} was passed -through unchanged. To illustrate with a table: - -@c Thank to Karl Berry for help with the TeX stuff. -@tex -\vbox{\bigskip -% This table has lots of &'s and \'s, so unspecialize them. -\catcode`\& = \other \catcode`\\ = \other -% But then we need character for escape and tab. -@catcode`! 
= 4 -@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr - You type!@code{sub} sees!@code{sub} generates@cr -@hrulefill!@hrulefill!@hrulefill@cr - @code{\&}! @code{&}!the matched text@cr - @code{\\&}! @code{\&}!a literal @samp{&}@cr - @code{\\\&}! @code{\&}!a literal @samp{&}@cr -@code{\\\\&}! @code{\\&}!a literal @samp{\&}@cr -@code{\\\\\&}! @code{\\&}!a literal @samp{\&}@cr -@code{\\\\\\&}! @code{\\\&}!a literal @samp{\\&}@cr - @code{\\q}! @code{\q}!a literal @samp{\q}@cr -} -@bigskip} -@end tex -@ifnottex -@display - You type @code{sub} sees @code{sub} generates - -------- ---------- --------------- - @code{\&} @code{&} the matched text - @code{\\&} @code{\&} a literal @samp{&} - @code{\\\&} @code{\&} a literal @samp{&} - @code{\\\\&} @code{\\&} a literal @samp{\&} - @code{\\\\\&} @code{\\&} a literal @samp{\&} -@code{\\\\\\&} @code{\\\&} a literal @samp{\\&} - @code{\\q} @code{\q} a literal @samp{\q} -@end display -@end ifnottex - -@noindent -This table shows both the lexical-level processing, where -an odd number of backslashes becomes an even number at the runtime level, -as well as the runtime processing done by @code{sub}. -(For the sake of simplicity, the rest of the tables below only show the -case of even numbers of backslashes entered at the lexical level.) - -The problem with the historical approach is that there is no way to get -a literal @samp{\} followed by the matched text. - -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -The 1992 POSIX standard attempted to fix this problem. The standard -says that @code{sub} and @code{gsub} look for either a @samp{\} or an @samp{&} -after the @samp{\}. If either one follows a @samp{\}, that character is -output literally. The interpretation of @samp{\} and @samp{&} then becomes: - -@c thanks to Karl Berry for formatting this table -@tex -\vbox{\bigskip -% This table has lots of &'s and \'s, so unspecialize them. 
-\catcode`\& = \other \catcode`\\ = \other -% But then we need character for escape and tab. -@catcode`! = 4 -@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr - You type!@code{sub} sees!@code{sub} generates@cr -@hrulefill!@hrulefill!@hrulefill@cr - @code{&}! @code{&}!the matched text@cr - @code{\\&}! @code{\&}!a literal @samp{&}@cr -@code{\\\\&}! @code{\\&}!a literal @samp{\}, then the matched text@cr -@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr -} -@bigskip} -@end tex -@ifnottex -@display - You type @code{sub} sees @code{sub} generates - -------- ---------- --------------- - @code{&} @code{&} the matched text - @code{\\&} @code{\&} a literal @samp{&} - @code{\\\\&} @code{\\&} a literal @samp{\}, then the matched text -@code{\\\\\\&} @code{\\\&} a literal @samp{\&} -@end display -@end ifnottex - -@noindent -This appears to solve the problem. -Unfortunately, the phrasing of the standard is unusual. It -says, in effect, that @samp{\} turns off the special meaning of any -following character, but for anything other than @samp{\} and @samp{&}, -such special meaning is undefined. This wording leads to two problems: - -@itemize @bullet -@item -Backslashes must now be doubled in the @var{replacement} string, breaking -historical @command{awk} programs. - -@item -To make sure that an @command{awk} program is portable, @emph{every} character -in the @var{replacement} string must be preceded with a -backslash.@footnote{This consequence was certainly unintended.} -@c I can say that, 'cause I was involved in making this change -@end itemize - -The POSIX standard is under revision. -Because of the problems just listed, proposed text for the revised standard -reverts to rules that correspond more closely to the original existing -practice. The proposed rules have special cases that make it possible -to produce a @samp{\} preceding the matched text: - -@tex -\vbox{\bigskip -% This table has lots of &'s and \'s, so unspecialize them. 
-\catcode`\& = \other \catcode`\\ = \other -% But then we need character for escape and tab. -@catcode`! = 4 -@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr - You type!@code{sub} sees!@code{sub} generates@cr -@hrulefill!@hrulefill!@hrulefill@cr -@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr -@code{\\\\&}! @code{\\&}!a literal @samp{\}, followed by the matched text@cr - @code{\\&}! @code{\&}!a literal @samp{&}@cr - @code{\\q}! @code{\q}!a literal @samp{\q}@cr -} -@bigskip} -@end tex -@ifinfo -@display - You type @code{sub} sees @code{sub} generates - -------- ---------- --------------- -@code{\\\\\\&} @code{\\\&} a literal @samp{\&} - @code{\\\\&} @code{\\&} a literal @samp{\}, followed by the matched text - @code{\\&} @code{\&} a literal @samp{&} - @code{\\q} @code{\q} a literal @samp{\q} -@end display -@end ifinfo - -In a nutshell, at the runtime level, there are now three special sequences -of characters (@samp{\\\&}, @samp{\\&} and @samp{\&}) whereas historically -there was only one. However, as in the historical case, any @samp{\} that -is not part of one of these three sequences is not special and appears -in the output literally. - -@command{gawk} 3.0 and 3.1 follow these proposed POSIX rules for @code{sub} and -@code{gsub}. -@c As much as we think it's a lousy idea. You win some, you lose some. Sigh. -Whether these proposed rules will actually become codified into the -standard is unknown at this point. Subsequent @command{gawk} releases will -track the standard and implement whatever the final version specifies; -this @value{DOCUMENT} will be updated as -well.@footnote{As this @value{DOCUMENT} was being finalized, -we learned that the POSIX standard will not use these rules. -However, it was too late to change @command{gawk} for the 3.1 release. -@command{gawk} behaves as described here.} - -The rules for @code{gensub} are considerably simpler. 
At the runtime -level, whenever @command{gawk} sees a @samp{\}, if the following character -is a digit, then the text that matched the corresponding parenthesized -subexpression is placed in the generated output. Otherwise, -no matter what the character after the @samp{\} is, it -appears in the generated text and the @samp{\} does not: - -@tex -\vbox{\bigskip -% This table has lots of &'s and \'s, so unspecialize them. -\catcode`\& = \other \catcode`\\ = \other -% But then we need character for escape and tab. -@catcode`! = 4 -@halign{@hfil#!@qquad@hfil#!@qquad#@hfil@cr - You type!@code{gensub} sees!@code{gensub} generates@cr -@hrulefill!@hrulefill!@hrulefill@cr - @code{&}! @code{&}!the matched text@cr - @code{\\&}! @code{\&}!a literal @samp{&}@cr - @code{\\\\}! @code{\\}!a literal @samp{\}@cr - @code{\\\\&}! @code{\\&}!a literal @samp{\}, then the matched text@cr -@code{\\\\\\&}! @code{\\\&}!a literal @samp{\&}@cr - @code{\\q}! @code{\q}!a literal @samp{q}@cr -} -@bigskip} -@end tex -@ifnottex -@display - You type @code{gensub} sees @code{gensub} generates - -------- ------------- ------------------ - @code{&} @code{&} the matched text - @code{\\&} @code{\&} a literal @samp{&} - @code{\\\\} @code{\\} a literal @samp{\} - @code{\\\\&} @code{\\&} a literal @samp{\}, then the matched text -@code{\\\\\\&} @code{\\\&} a literal @samp{\&} - @code{\\q} @code{\q} a literal @samp{q} -@end display -@end ifnottex - -Because of the complexity of the lexical and runtime level processing -and the special cases for @code{sub} and @code{gsub}, -we recommend the use of @command{gawk} and @code{gensub} when you have -to do substitutions. - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Matching the Null String -@cindex advanced notes -@cindex matching, the null string - -In @command{awk}, the @samp{*} operator can match the null string. -This is particularly important for the @code{sub}, @code{gsub}, -and @code{gensub} functions. 
For example: - -@example -$ echo abc | awk '@{ gsub(/m*/, "X"); print @}' -@print{} XaXbXcX -@end example - -@noindent -Although this makes a certain amount of sense, it can be surprising. - -@node I/O Functions, Time Functions, String Functions, Built-in -@subsection Input/Output Functions - -The following functions relate to Input/Output (I/O). -Optional parameters are enclosed in square brackets ([ and ]): - -@table @code -@item close(@var{filename} @r{[}, @var{how}@r{]}) -@cindex @code{close} built-in function -Close the file @var{filename} for input or output. Alternatively, the -argument may be a shell command that was used for creating a coprocess, or -for redirecting to or from a pipe; then the coprocess or pipe is closed. -@xref{Close Files And Pipes, ,Closing Input and Output Redirections}, -for more information. - -When closing a coprocess, it is occasionally useful to first close -one end of the two-way pipe, and then to close the other. This is done -by providing a second argument to @code{close}. This second argument -should be one of the two string values @code{"to"} or @code{"from"}, -indicating which end of the pipe to close. Case in the string does -not matter. -@xref{Two-way I/O, ,Two-Way Communications with Another Process}, -which discusses this feature in more detail and gives an example. - -@item fflush(@r{[}@var{filename}@r{]}) -@cindex @code{fflush} built-in function -@cindex portability issues -@cindex flushing buffers -@cindex buffers, flushing -@cindex buffering output -@cindex output, buffering -Flush any buffered output associated with @var{filename}, which is either a -file opened for writing or a shell command for redirecting output to -a pipe or coprocess. - -Many utility programs @dfn{buffer} their output; i.e., they save information -to write to a disk file or terminal in memory, until there is enough -for it to be worthwhile to send the data to the output device. 
-This is often more efficient than writing -every little bit of information as soon as it is ready. However, sometimes -it is necessary to force a program to @dfn{flush} its buffers; that is, -write the information to its destination, even if a buffer is not full. -This is the purpose of the @code{fflush} function---@command{gawk} also -buffers its output and the @code{fflush} function forces -@command{gawk} to flush its buffers. - -@code{fflush} was added to the Bell Laboratories research -version of @command{awk} in 1994; it is not part of the POSIX standard and is -not available if @option{--posix} has been specified on the -command line (@pxref{Options, ,Command-Line Options}). - -@command{gawk} extends the @code{fflush} function in two ways. The first -is to allow no argument at all. In this case, the buffer for the -standard output is flushed. The second is to allow the null string -(@w{@code{""}}) as the argument. In this case, the buffers for -@emph{all} open output files and pipes are flushed. - -@cindex automatic warnings -@cindex warnings, automatic -@code{fflush} returns zero if the buffer is successfully flushed; -otherwise it returns @minus{}1. -In the case where all buffers are flushed, the return value is zero -only if all buffers were flushed successfully. Otherwise, it is -@minus{}1, and @command{gawk} warns about the @var{filename} that had the problem. - -@command{gawk} also issues a warning message if you attempt to flush -a file or pipe that was opened for reading (such as with @code{getline}), -or if @var{filename} is not an open file, pipe, or coprocess. -In such a case, @code{fflush} returns @minus{}1 as well. - -@item system(@var{command}) -@cindex @code{system} built-in function -@cindex interaction, @command{awk} and other programs -The @code{system} function allows the user to execute operating system -commands and then return to the @command{awk} program. 
The @code{system} -function executes the command given by the string @var{command}. -It returns the status returned by the command that was executed as -its value. - -For example, if the following fragment of code is put in your @command{awk} -program: - -@example -END @{ - system("date | mail -s 'awk run done' root") -@} -@end example - -@noindent -the system administrator is sent mail when the @command{awk} program -finishes processing input and begins its end-of-input processing. - -Note that redirecting @code{print} or @code{printf} into a pipe is often -enough to accomplish your task. If you need to run many commands, it -is more efficient to simply print them down a pipeline to the shell: - -@example -while (@var{more stuff to do}) - print @var{command} | "/bin/sh" -close("/bin/sh") -@end example - -@noindent -@cindex fatal errors -However, if your @command{awk} -program is interactive, @code{system} is useful for cranking up large -self-contained programs, such as a shell or an editor. -Some operating systems cannot implement the @code{system} function. -@code{system} causes a fatal error if it is not supported. -@end table - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Interactive Versus Non-Interactive Buffering -@cindex advanced notes -@cindex buffering, interactive vs. non-interactive -@cindex buffering, non-interactive vs. interactive -@cindex interactive buffering vs. non-interactive -@cindex non-interactive buffering vs. interactive - -As a side point, buffering issues can be even more confusing, depending -upon whether your program is @dfn{interactive}; i.e., communicating -with a user sitting at a keyboard.@footnote{A program is interactive -if the standard output is connected -to a terminal device.} - -@c Thanks to Walter.Mecky@dresdnerbank.de for this example, and for -@c motivating me to write this section. -Interactive programs generally @dfn{line buffer} their output; i.e., they -write out every line. 
Non-interactive programs wait until they have -a full buffer, which may be many lines of output. -Here is an example of the difference: - -@example -$ awk '@{ print $1 + $2 @}' -1 1 -@print{} 2 -2 3 -@print{} 5 -@kbd{Ctrl-d} -@end example - -@noindent -Each line of output is printed immediately. Compare that behavior -with this example: - -@example -$ awk '@{ print $1 + $2 @}' | cat -1 1 -2 3 -@kbd{Ctrl-d} -@print{} 2 -@print{} 5 -@end example - -@noindent -Here, no output is printed until after the @kbd{Ctrl-d} is typed, because -it is all buffered and sent down the pipe to @command{cat} in one shot. - -@c fakenode --- for prepinfo -@subheading Advanced Notes: Controlling Output Buffering with @code{system} -@cindex advanced notes -@cindex flushing buffers -@cindex buffers, flushing -@cindex buffering output -@cindex output, buffering - -The @code{fflush} function provides explicit control over output buffering for -individual files and pipes. However, its use is not portable to many other -@command{awk} implementations. An alternative method to flush output -buffers is to call @code{system} with a null string as its argument: - -@example -system("") # flush output -@end example - -@noindent -@command{gawk} treats this use of the @code{system} function as a special -case and is smart enough not to run a shell (or other command -interpreter) with the empty command. Therefore, with @command{gawk}, this -idiom is not only useful, it is also efficient. While this method should work -with other @command{awk} implementations, it does not necessarily avoid -starting an unnecessary shell. (Other implementations may only -flush the buffer associated with the standard output and not necessarily -all buffered output.) - -If you think about what a programmer expects, it makes sense that -@code{system} should flush any pending output. 
The following program: - -@example -BEGIN @{ - print "first print" - system("echo system echo") - print "second print" -@} -@end example - -@noindent -must print: - -@example -first print -system echo -second print -@end example - -@noindent -and not: - -@example -system echo -first print -second print -@end example - -If @command{awk} did not flush its buffers before calling @code{system}, the -latter (undesirable) output is what you see. - -@node Time Functions, Bitwise Functions, I/O Functions, Built-in -@subsection Using @command{gawk}'s Timestamp Functions - -@cindex timestamps -@cindex time of day -A common use for @command{awk} programs is the processing of log files -containing timestamp information, indicating when a -particular log record was written. Many programs log their timestamp -in the form returned by the @code{time} system call, which is the -number of seconds since a particular epoch. On POSIX-compliant systems, -it is the number of seconds since -1970-01-01 00:00:00 UTC, not counting leap seconds.@footnote{@xref{Glossary}, -especially the entries for ``Epoch'' and ``UTC.''} -All known POSIX-compliant systems support timestamps from 0 through -@math{2^31 - 1}, which is sufficient to represent times through -2038-01-19 03:14:07 UTC. Many systems support a wider range of timestamps, -including negative timestamps that represent times before the -epoch. - -In order to make it easier to process such log files and to produce -useful reports, @command{gawk} provides the following functions for -working with timestamps. They are @command{gawk} extensions; they are -not specified in the POSIX standard, nor are they in any other known -version of @command{awk}.@footnote{The GNU @command{date} utility can -also do many of the things described here. 
Its use may be preferable -for simple time-related operations in shell scripts.} -Optional parameters are enclosed in square brackets ([ and ]): - -@table @code -@item systime() -@cindex @code{systime} built-in function -This function returns the current time as the number of seconds since -the system epoch. On POSIX systems, this is the number of seconds -since 1970-01-01 00:00:00 UTC, not counting leap seconds. -It may be a different number on -other systems. - -@item mktime(@var{datespec}) -@cindex @code{mktime} built-in function -This function turns @var{datespec} into a timestamp in the same form -as is returned by @code{systime}. It is similar to the function of the -same name in ISO C. The argument, @var{datespec}, is a string of the form -@w{@code{"@var{YYYY} @var{MM} @var{DD} @var{HH} @var{MM} @var{SS} [@var{DST}]"}}. -The string consists of six or seven numbers representing, respectively, -the full year including century, the month from 1 to 12, the day of the month -from 1 to 31, the hour of the day from 0 to 23, the minute from 0 to -59, the second from 0 to 60,@footnote{Occasionally there are -minutes in a year with a leap second, which is why the -seconds can go up to 60.} -and an optional daylight savings flag. - -The values of these numbers need not be within the ranges specified; -for example, an hour of @minus{}1 means 1 hour before midnight. -The origin-zero Gregorian calendar is assumed, with year 0 preceding -year 1 and year @minus{}1 preceding year 0. -The time is assumed to be in the local timezone. -If the daylight savings flag is positive, the time is assumed to be -daylight savings time; if zero, the time is assumed to be standard -time; and if negative (the default), @code{mktime} attempts to determine -whether daylight savings time is in effect for the specified time. - -If @var{datespec} does not contain enough elements or if the resulting time -is out of range, @code{mktime} returns @minus{}1.
- -@item strftime(@r{[}@var{format} @r{[}, @var{timestamp}@r{]]}) -@cindex @code{strftime} built-in function -This function returns a string. It is similar to the function of the -same name in ISO C. The time specified by @var{timestamp} is used to -produce a string, based on the contents of the @var{format} string. -The @var{timestamp} is in the same format as the value returned by the -@code{systime} function. If no @var{timestamp} argument is supplied, -@command{gawk} uses the current time of day as the timestamp. -If no @var{format} argument is supplied, @code{strftime} uses -@code{@w{"%a %b %d %H:%M:%S %Z %Y"}}. This format string produces -output that is (almost) equivalent to that of the @command{date} utility. -(Versions of @command{gawk} prior to 3.0 require the @var{format} argument.) -@end table - -The @code{systime} function allows you to compare a timestamp from a -log file with the current time of day. In particular, it is easy to -determine how long ago a particular record was logged. It also allows -you to produce log records using the ``seconds since the epoch'' format. - -@cindex converting dates to timestamps -@cindex dates, converting to timestamps -@cindex timestamps, converting from dates -The @code{mktime} function allows you to convert a textual representation -of a date and time into a timestamp. This makes it easy to do before/after -comparisons of dates and times, particularly when dealing with date and -time data coming from an external source, such as a log file. - -The @code{strftime} function allows you to easily turn a timestamp -into human-readable information. It is similar in nature to the @code{sprintf} -function -(@pxref{String Functions, ,String Manipulation Functions}), -in that it copies non-format specification characters verbatim to the -returned string, while substituting date and time values for format -specifications in the @var{format} string. 
- -@code{strftime} is guaranteed by the 1999 ISO C standard@footnote{As this -is a recent standard, not every system's @code{strftime} necessarily -supports all of the conversions listed here.} -to support the following date format specifications: - -@cindex format specifier, @code{strftime} -@table @code -@item %a -The locale's abbreviated weekday name. - -@item %A -The locale's full weekday name. - -@item %b -The locale's abbreviated month name. - -@item %B -The locale's full month name. - -@item %c -The locale's ``appropriate'' date and time representation. -(This is @samp{%A %B %d %T %Y} in the @code{"C"} locale.) - -@item %C -The century. This is the year divided by 100 and truncated to the next -lower integer. - -@item %d -The day of the month as a decimal number (01--31). - -@item %D -Equivalent to specifying @samp{%m/%d/%y}. - -@item %e -The day of the month, padded with a space if it is only one digit. - -@item %F -Equivalent to specifying @samp{%Y-%m-%d}. -This is the ISO 8601 date format. - -@item %g -The year modulo 100 of the ISO week number, as a decimal number (00--99). -For example, January 1, 1993, is in week 53 of 1992. Thus, the year -of its ISO week number is 1992, even though its year is 1993. -Similarly, December 31, 1973, is in week 1 of 1974. Thus, the year -of its ISO week number is 1974, even though its year is 1973. - -@item %G -The full year of the ISO week number, as a decimal number. - -@item %h -Equivalent to @samp{%b}. - -@item %H -The hour (24-hour clock) as a decimal number (00--23). - -@item %I -The hour (12-hour clock) as a decimal number (01--12). - -@item %j -The day of the year as a decimal number (001--366). - -@item %m -The month as a decimal number (01--12). - -@item %M -The minute as a decimal number (00--59). - -@item %n -A newline character (ASCII LF). - -@item %p -The locale's equivalent of the AM/PM designations associated -with a 12-hour clock. - -@item %r -The locale's 12-hour clock time. 
-(This is @samp{%I:%M:%S %p} in the @code{"C"} locale.) - -@item %R -Equivalent to specifying @samp{%H:%M}. - -@item %S -The second as a decimal number (00--60). - -@item %t -A tab character. - -@item %T -Equivalent to specifying @samp{%H:%M:%S}. - -@item %u -The weekday as a decimal number (1--7). Monday is day one. - -@item %U -The week number of the year (the first Sunday as the first day of week one) -as a decimal number (00--53). - -@cindex ISO 8601 -@item %V -The week number of the year (the first Monday as the first -day of week one) as a decimal number (01--53). -The method for determining the week number is as specified by ISO 8601. -(To wit: if the week containing January 1 has four or more days in the -new year, then it is week one, otherwise it is week 53 of the previous year -and the next week is week one.) - -@item %w -The weekday as a decimal number (0--6). Sunday is day zero. - -@item %W -The week number of the year (the first Monday as the first day of week one) -as a decimal number (00--53). - -@item %x -The locale's ``appropriate'' date representation. -(This is @samp{%A %B %d %Y} in the @code{"C"} locale.) - -@item %X -The locale's ``appropriate'' time representation. -(This is @samp{%T} in the @code{"C"} locale.) - -@item %y -The year modulo 100 as a decimal number (00--99). - -@item %Y -The full year as a decimal number (e.g., 1995). - -@cindex RFC 822 -@cindex RFC 1036 -@item %z -The timezone offset in a +HHMM format (e.g., the format necessary to -produce RFC 822/RFC 1036 date headers). - -@item %Z -The time zone name or abbreviation; no characters if -no time zone is determinable. 
- -@item %Ec %EC %Ex %EX %Ey %EY %Od %Oe %OH -@itemx %OI %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy -These are ``alternate representations'' for the specifications -that use only the second letter (@samp{%c}, @samp{%C}, -and so on).@footnote{If you don't understand any of this, don't worry about -it; these facilities are meant to make it easier to ``internationalize'' -programs. -Other internationalization features are described in -@ref{Internationalization, ,Internationalization with @command{gawk}}.} -(These facilitate compliance with the POSIX @command{date} utility.) - -@item %% -A literal @samp{%}. -@end table - -If a conversion specifier is not one of the above, the behavior is -undefined.@footnote{This is because ISO C leaves the -behavior of the C version of @code{strftime} undefined and @command{gawk} -uses the system's version of @code{strftime} if it's there. -Typically, the conversion specifier either does not appear in the -returned string or it appears literally.} - -@cindex locale, definition of -Informally, a @dfn{locale} is the geographic place in which a program -is meant to run. For example, a common way to abbreviate the date -September 4, 1991 in the United States is ``9/4/91.'' -In many countries in Europe, however, it is abbreviated ``4.9.91.'' -Thus, the @samp{%x} specification in a @code{"US"} locale might produce -@samp{9/4/91}, while in a @code{"EUROPE"} locale, it might produce -@samp{4.9.91}. The ISO C standard defines a default @code{"C"} -locale, which is an environment that is typical of what most C programmers -are used to. - -A public-domain C version of @code{strftime} is supplied with @command{gawk} -for systems that are not yet fully standards-compliant. -It supports all of the just listed format specifications. 
-If that version is -used to compile @command{gawk} (@pxref{Installation, ,Installing @command{gawk}}), -then the following additional format specifications are available: - -@table @code -@item %k -The hour (24-hour clock) as a decimal number (0--23). -Single digit numbers are padded with a space. - -@item %l -The hour (12-hour clock) as a decimal number (1--12). -Single digit numbers are padded with a space. - -@item %N -The ``Emperor/Era'' name. -Equivalent to @code{%C}. - -@item %o -The ``Emperor/Era'' year. -Equivalent to @code{%y}. - -@item %s -The time as a decimal timestamp in seconds since the epoch. - -@item %v -The date in VMS format (e.g., @samp{20-JUN-1991}). -@end table - -Additionally, the alternate representations are recognized but their -normal representations are used. - -This example is an @command{awk} implementation of the POSIX -@command{date} utility. Normally, the @command{date} utility prints the -current date and time of day in a well-known format. However, if you -provide an argument to it that begins with a @samp{+}, @command{date} -copies non-format specifier characters to the standard output and -interprets the current time according to the format specifiers in -the string. For example: - -@example -$ date '+Today is %A, %B %d, %Y.' -@print{} Today is Thursday, September 14, 2000. -@end example - -Here is the @command{gawk} version of the @command{date} utility. -It has a shell ``wrapper'' to handle the @option{-u} option, -which requires that @command{date} run as if the time zone -is set to UTC: - -@example -#! /bin/sh -# -# date --- approximate the P1003.2 'date' command - -case $1 in --u) TZ=UTC0 # use UTC - export TZ - shift ;; -esac - -@c FIXME: One day, change %d to %e, when C 99 is common. 
-gawk 'BEGIN @{ - format = "%a %b %d %H:%M:%S %Z %Y" - exitval = 0 - - if (ARGC > 2) - exitval = 1 - else if (ARGC == 2) @{ - format = ARGV[1] - if (format ~ /^\+/) - format = substr(format, 2) # remove leading + - @} - print strftime(format) - exit exitval -@}' "$@@" -@end example - -@node Bitwise Functions, I18N Functions, Time Functions, Built-in -@subsection Using @command{gawk}'s Bit Manipulation Functions -@cindex bitwise operations -@quotation -@i{I can explain it for you, but I can't understand it for you.}@* -Anonymous -@end quotation - -@cindex AND bitwise operation -@cindex OR bitwise operation -@cindex XOR bitwise operation -Many languages provide the ability to perform @dfn{bitwise} operations -on two integer numbers. In other words, the operation is performed on -each successive pair of bits in the operands. -Three common operations are bitwise AND, OR, and XOR. -The operations are described by the following table: - -@ifnottex -@display - Bit Operator - | AND | OR | XOR - |---+---+---+---+---+--- -Operands | 0 | 1 | 0 | 1 | 0 | 1 -----------+---+---+---+---+---+--- - 0 | 0 0 | 0 1 | 0 1 - 1 | 0 1 | 1 1 | 1 0 -@end display -@end ifnottex -@tex -\centerline{ -\vbox{\bigskip % space above the table (about 1 linespace) -% Because we have vertical rules, we can't let TeX insert interline space -% in its usual way. 
-\offinterlineskip -\halign{\strut\hfil#\quad\hfil % operands - &\vrule#&\quad#\quad % rule, 0 (of and) - &\vrule#&\quad#\quad % rule, 1 (of and) - &\vrule# % rule between and and or - &\quad#\quad % 0 (of or) - &\vrule#&\quad#\quad % rule, 1 (of of) - &\vrule# % rule between or and xor - &\quad#\quad % 0 of xor - &\vrule#&\quad#\quad % rule, 1 of xor - \cr -&\omit&\multispan{11}\hfil\bf Bit operator\hfil\cr -\noalign{\smallskip} -& &\multispan3\hfil AND\hfil&&\multispan3\hfil OR\hfil - &&\multispan3\hfil XOR\hfil\cr -\bf Operands&&0&&1&&0&&1&&0&&1\cr -\noalign{\hrule} -\omit&height 2pt&&\omit&&&&\omit&&&&\omit\cr -\noalign{\hrule height0pt}% without this the rule does not extend; why? -0&&0&\omit&0&&0&\omit&1&&0&\omit&1\cr -1&&0&\omit&1&&1&\omit&1&&1&\omit&0\cr -}}} -@end tex - -@cindex bitwise complement -@cindex complement, bitwise -As you can see, the result of an AND operation is 1 only when @emph{both} -bits are 1. -The result of an OR operation is 1 if @emph{either} bit is 1. -The result of an XOR operation is 1 if either bit is 1, -but not both. -The next operation is the @dfn{complement}; the complement of 1 is 0 and -the complement of 0 is 1. Thus, this operation ``flips'' all the bits -of a given value. - -@cindex bitwise shift -@cindex left shift, bitwise -@cindex right shift, bitwise -@cindex shift, bitwise -Finally, two other common operations are to shift the bits left or right. -For example, if you have a bit string @samp{10111001} and you shift it -right by three bits, you end up with @samp{00010111}.@footnote{This example -shows that 0's come in on the left side. For @command{gawk}, this is -always true, but in some languages, it's possible to have the left side -fill with 1's. Caveat emptor.} -@c Purposely decided to use 0's and 1's here. 2/2001. -If you start over -again with @samp{10111001} and shift it left by three bits, you end up -with @samp{11001000}. 
-@command{gawk} provides built-in functions that implement the -bitwise operations just described. They are: - -@ignore -@table @code -@cindex @code{and} built-in function -@item and(@var{v1}, @var{v2}) -Return the bitwise AND of the values provided by @var{v1} and @var{v2}. - -@cindex @code{or} built-in function -@item or(@var{v1}, @var{v2}) -Return the bitwise OR of the values provided by @var{v1} and @var{v2}. - -@cindex @code{xor} built-in function -@item xor(@var{v1}, @var{v2}) -Return the bitwise XOR of the values provided by @var{v1} and @var{v2}. - -@cindex @code{compl} built-in function -@item compl(@var{val}) -Return the bitwise complement of @var{val}. - -@cindex @code{lshift} built-in function -@item lshift(@var{val}, @var{count}) -Return the value of @var{val}, shifted left by @var{count} bits. - -@cindex @code{rshift} built-in function -@item rshift(@var{val}, @var{count}) -Return the value of @var{val}, shifted right by @var{count} bits. -@end table -@end ignore - -@multitable {@code{rshift(@var{val}, @var{count})}} {Return the value of @var{val}, shifted right by @var{count} bits.} -@cindex @code{and} built-in function -@item @code{and(@var{v1}, @var{v2})} -@tab Return the bitwise AND of the values provided by @var{v1} and @var{v2}. - -@cindex @code{or} built-in function -@item @code{or(@var{v1}, @var{v2})} -@tab Return the bitwise OR of the values provided by @var{v1} and @var{v2}. - -@cindex @code{xor} built-in function -@item @code{xor(@var{v1}, @var{v2})} -@tab Return the bitwise XOR of the values provided by @var{v1} and @var{v2}. - -@cindex @code{compl} built-in function -@item @code{compl(@var{val})} -@tab Return the bitwise complement of @var{val}. - -@cindex @code{lshift} built-in function -@item @code{lshift(@var{val}, @var{count})} -@tab Return the value of @var{val}, shifted left by @var{count} bits. 
- -@cindex @code{rshift} built-in function -@item @code{rshift(@var{val}, @var{count})} -@tab Return the value of @var{val}, shifted right by @var{count} bits. -@end multitable - -For all of these functions, first the double-precision floating-point value is -converted to a C @code{unsigned long}, then the bitwise operation is -performed and then the result is converted back into a C @code{double}. (If -you don't understand this paragraph, don't worry about it.) - -Here is a user-defined function -(@pxref{User-defined, ,User-Defined Functions}) -that illustrates the use of these functions: - -@cindex @code{bits2str} user-defined function -@cindex @code{testbits.awk} program -@smallexample -@group -@c file eg/lib/bits2str.awk -# bits2str --- turn a byte into readable 1's and 0's - -function bits2str(bits, data, mask) -@{ - if (bits == 0) - return "0" - - mask = 1 - for (; bits != 0; bits = rshift(bits, 1)) - data = (and(bits, mask) ? "1" : "0") data - - while ((length(data) % 8) != 0) - data = "0" data - - return data -@} -@c endfile -@end group - -@c this is a hack to make testbits.awk self-contained -@ignore -@c file eg/prog/testbits.awk -# bits2str --- turn a byte into readable 1's and 0's - -function bits2str(bits, data, mask) -@{ - if (bits == 0) - return "0" - - mask = 1 - for (; bits != 0; bits = rshift(bits, 1)) - data = (and(bits, mask) ? 
"1" : "0") data - - while ((length(data) % 8) != 0) - data = "0" data - - return data -@} -@c endfile -@end ignore -@c file eg/prog/testbits.awk -BEGIN @{ - printf "123 = %s\n", bits2str(123) - printf "0123 = %s\n", bits2str(0123) - printf "0x99 = %s\n", bits2str(0x99) - comp = compl(0x99) - printf "compl(0x99) = %#x = %s\n", comp, bits2str(comp) - shift = lshift(0x99, 2) - printf "lshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift) - shift = rshift(0x99, 2) - printf "rshift(0x99, 2) = %#x = %s\n", shift, bits2str(shift) -@} -@c endfile -@end smallexample - -@noindent -This program produces the following output when run: - -@smallexample -$ gawk -f testbits.awk -@print{} 123 = 01111011 -@print{} 0123 = 01010011 -@print{} 0x99 = 10011001 -@print{} compl(0x99) = 0xffffff66 = 11111111111111111111111101100110 -@print{} lshift(0x99, 2) = 0x264 = 0000001001100100 -@print{} rshift(0x99, 2) = 0x26 = 00100110 -@end smallexample - -The @code{bits2str} function turns a binary number into a string. -The number @code{1} represents a binary value where the rightmost bit -is set to 1. Using this mask, -the function repeatedly checks the rightmost bit. -AND-ing the mask with the value indicates whether the -rightmost bit is 1 or not. If so, a @code{"1"} is concatenated onto the front -of the string. -Otherwise, a @code{"0"} is added. -The value is then shifted right by one bit and the loop continues -until there are no more 1 bits. - -If the initial value is zero it returns a simple @code{"0"}. -Otherwise, at the end, it pads the value with zeros to represent multiples -of eight-bit quantities. This is typical in modern computers. - -The main code in the @code{BEGIN} rule shows the difference between the -decimal and octal values for the same numbers -(@pxref{Non-decimal-numbers, ,Octal and Hexadecimal Numbers}), -and then demonstrates the -results of the @code{compl}, @code{lshift}, and @code{rshift} functions. 
- -@node I18N Functions, , Bitwise Functions, Built-in -@subsection Using @command{gawk}'s String Translation Functions - -@command{gawk} provides facilities for internationalizing @command{awk} programs. -These include the functions described in the following list. -The description here is purposely brief. -@xref{Internationalization, ,Internationalization with @command{gawk}}, -for the full story. -Optional parameters are enclosed in square brackets ([ and ]): - -@table @code -@cindex @code{dcgettext} built-in function -@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) -This function returns the translation of @var{string} in -text domain @var{domain} for locale category @var{category}. -The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. -The default value for @var{category} is @code{"LC_MESSAGES"}. - -@cindex @code{bindtextdomain} built-in function -@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) -This function allows you to specify the directory where -@command{gawk} will look for message translation files, in case they -will not or cannot be placed in the ``standard'' locations -(e.g., during testing). -It returns the directory where @var{domain} is ``bound.'' - -The default @var{domain} is the value of @code{TEXTDOMAIN}. -If @var{directory} is the null string (@code{""}), then -@code{bindtextdomain} returns the current binding for the -given @var{domain}. -@end table - -@node User-defined, , Built-in, Functions -@section User-Defined Functions - -@cindex user-defined functions -@cindex function, user-defined -Complicated @command{awk} programs can often be simplified by defining -your own functions. User-defined functions can be called just like -built-in ones (@pxref{Function Calls}), but it is up to you to define -them; i.e., to tell @command{awk} what they should do. - -@menu -* Definition Syntax:: How to write definitions and what they mean. 
-* Function Example:: An example function definition and what it - does. -* Function Caveats:: Things to watch out for. -* Return Statement:: Specifying the value a function returns. -* Dynamic Typing:: How variable types can change at runtime. -@end menu - -@node Definition Syntax, Function Example, User-defined, User-defined -@subsection Function Definition Syntax -@cindex defining functions -@cindex function definition - -Definitions of functions can appear anywhere between the rules of an -@command{awk} program. Thus, the general form of an @command{awk} program is -extended to include sequences of rules @emph{and} user-defined function -definitions. -There is no need to put the definition of a function -before all uses of the function. This is because @command{awk} reads the -entire program before starting to execute any of it. - -The definition of a function named @var{name} looks like this: -@c NEXT ED: put [ ] around parameter list - -@example -function @var{name}(@var{parameter-list}) -@{ - @var{body-of-function} -@} -@end example - -@cindex names, use of -@cindex namespace issues in @command{awk} -@noindent -@var{name} is the name of the function to define. A valid function -name is like a valid variable name: a sequence of letters, digits, and -underscores, that doesn't start with a digit. -Within a single @command{awk} program, any particular name can only be -used as a variable, array, or function. - -@c NEXT ED: parameter-list is an OPTIONAL list of ... -@var{parameter-list} is a list of the function's arguments and local -variable names, separated by commas. When the function is called, -the argument names are used to hold the argument values given in -the call. The local variables are initialized to the empty string. -A function cannot have two parameters with the same name, nor may it -have a parameter with the same name as the function itself. - -The @var{body-of-function} consists of @command{awk} statements. 
It is the -most important part of the definition, because it says what the function -should actually @emph{do}. The argument names exist to give the body a -way to talk about the arguments; local variables exist to give the body -places to keep temporary values. - -Argument names are not distinguished syntactically from local variable -names. Instead, the number of arguments supplied when the function is -called determines how many argument variables there are. Thus, if three -argument values are given, the first three names in @var{parameter-list} -are arguments and the rest are local variables. - -It follows that if the number of arguments is not the same in all calls -to the function, some of the names in @var{parameter-list} may be -arguments on some occasions and local variables on others. Another -way to think of this is that omitted arguments default to the -null string. - -@cindex conventions, programming -@cindex programming conventions -Usually when you write a function, you know how many names you intend to -use for arguments and how many you intend to use as local variables. It is -conventional to place some extra space between the arguments and -the local variables, in order to document how your function is supposed to be used. - -@cindex variable shadowing -During execution of the function body, the arguments and local variable -values hide or @dfn{shadow} any variables of the same names used in the -rest of the program. The shadowed variables are not accessible in the -function definition, because there is no way to name them while their -names have been taken away for the local variables. All other variables -used in the @command{awk} program can be referenced or set normally in the -function's body. - -The arguments and local variables last only as long as the function body -is executing. Once the body finishes, you can once again access the -variables that were shadowed while the function was running. 
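The rules above for arguments, local variables, and shadowing can be seen in a short program. This is only an illustrative sketch: the names @code{double}, @code{tmp}, @code{total}, and the shell variable @code{out} are invented for the example.

```shell
# Sketch: n is an argument, tmp is a local variable (note the extra
# spaces in the parameter list), and total is an ordinary global.
out=$(awk '
function double(n,    tmp)
{
    tmp = n * 2      # changes only the local tmp
    total += tmp     # total is not in the parameter list, so it is global
    return tmp
}
BEGIN {
    tmp = "outer"    # global tmp, shadowed while double() runs
    r = double(21)
    print r, tmp, total
}')
printf '%s\n' "$out"
```

The global @code{tmp} still holds @code{"outer"} after the call, because the function only ever touched its own local @code{tmp}, while the assignment to @code{total} is visible to the caller.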
- -@cindex recursive function -@cindex function, recursive -The function body can contain expressions that call functions. They -can even call this function, either directly or by way of another -function. When this happens, we say the function is @dfn{recursive}. -The act of a function calling itself is called @dfn{recursion}. - -@cindex @command{awk} language, POSIX version -@cindex POSIX @command{awk} -In many @command{awk} implementations, including @command{gawk}, -the keyword @code{function} may be -abbreviated @code{func}. However, POSIX only specifies the use of -the keyword @code{function}. This actually has some practical implications. -If @command{gawk} is in POSIX-compatibility mode -(@pxref{Options, ,Command-Line Options}), then the following -statement does @emph{not} define a function: - -@example -func foo() @{ a = sqrt($1) ; print a @} -@end example - -@noindent -Instead it defines a rule that, for each record, concatenates the value -of the variable @samp{func} with the return value of the function @samp{foo}. -If the resulting string is non-null, the action is executed. -This is probably not what is desired. (@command{awk} accepts this input as -syntactically valid, because functions may be used before they are defined -in @command{awk} programs.) -@c NEXT ED: This won't actually run, since foo() is undefined ... - -@cindex portability issues -To ensure that your @command{awk} programs are portable, always use the -keyword @code{function} when defining a function. 
- -@node Function Example, Function Caveats, Definition Syntax, User-defined -@subsection Function Definition Examples - -Here is an example of a user-defined function, called @code{myprint}, that -takes a number and prints it in a specific format: - -@example -function myprint(num) -@{ - printf "%6.3g\n", num -@} -@end example - -@noindent -To illustrate, here is an @command{awk} rule that uses our @code{myprint} -function: - -@example -$3 > 0 @{ myprint($3) @} -@end example - -@noindent -This program prints, in our special format, all the third fields that -contain a positive number in our input. Therefore, when given the following: - -@example - 1.2 3.4 5.6 7.8 - 9.10 11.12 -13.14 15.16 -17.18 19.20 21.22 23.24 -@end example - -@noindent -this program, using our function to format the results, prints: - -@example - 5.6 - 21.2 -@end example - -@page -This function deletes all the elements in an array: - -@example -function delarray(a, i) -@{ - for (i in a) - delete a[i] -@} -@end example - -When working with arrays, it is often necessary to delete all the elements -in an array and start over with a new list of elements -(@pxref{Delete, ,The @code{delete} Statement}). -Instead of having -to repeat this loop everywhere that you need to clear out -an array, your program can just call @code{delarray}. -(This guarantees portability. The use of @samp{delete @var{array}} to delete -the contents of an entire array is a non-standard extension.) - -The following is an example of a recursive function. It takes a string -as an input parameter and returns the string in backwards order. -Recursive functions must always have a test that stops the recursion. -In this case, the recursion terminates when the starting position -is zero; i.e., when there are no more characters left in the string. 
- -@example -function rev(str, start) -@{ - if (start == 0) - return "" - - return (substr(str, start, 1) rev(str, start - 1)) -@} -@end example - -If this function is in a file named @file{rev.awk}, it can be tested -this way: - -@example -$ echo "Don't Panic!" | -> gawk --source '@{ print rev($0, length($0)) @}' -f rev.awk -@print{} !cinaP t'noD -@end example - -The C @code{ctime} function takes a timestamp and returns it in a string, -formatted in a well-known fashion. -The following example uses the built-in @code{strftime} function -(@pxref{Time Functions, ,Using @command{gawk}'s Timestamp Functions}) -to create an @command{awk} version of @code{ctime}: - -@c FIXME: One day, change %d to %e, when C 99 is common. -@example -@c file eg/lib/ctime.awk -# ctime.awk -# -# awk version of C ctime(3) function - -function ctime(ts, format) -@{ - format = "%a %b %d %H:%M:%S %Z %Y" - if (ts == 0) - ts = systime() # use current time as default - return strftime(format, ts) -@} -@c endfile -@end example - -@node Function Caveats, Return Statement, Function Example, User-defined -@subsection Calling User-Defined Functions - -@cindex calling a function -@cindex function call -@dfn{Calling a function} means causing the function to run and do its job. -A function call is an expression and its value is the value returned by -the function. - -A function call consists of the function name followed by the arguments -in parentheses. @command{awk} expressions are what you write in the -call for the arguments. Each time the call is executed, these -expressions are evaluated, and the values are the actual arguments. For -example, here is a call to @code{foo} with three arguments (the first -being a string concatenation): - -@example -foo(x y, "lose", 4 * z) -@end example - -@strong{Caution:} Whitespace characters (spaces and tabs) are not allowed -between the function name and the open-parenthesis of the argument list. 
-If you write whitespace by mistake, @command{awk} might think that you mean -to concatenate a variable with an expression in parentheses. However, it -notices that you used a function name and not a variable name, and reports -an error. - -@cindex call by value -When a function is called, it is given a @emph{copy} of the values of -its arguments. This is known as @dfn{call by value}. The caller may use -a variable as the expression for the argument, but the called function -does not know this---it only knows what value the argument had. For -example, if you write the following code: - -@example -foo = "bar" -z = myfunc(foo) -@end example - -@noindent -then you should not think of the argument to @code{myfunc} as being -``the variable @code{foo}.'' Instead, think of the argument as the -string value @code{"bar"}. -If the function @code{myfunc} alters the values of its local variables, -this has no effect on any other variables. Thus, if @code{myfunc} -does this: - -@example -function myfunc(str) -@{ - print str - str = "zzz" - print str -@} -@end example - -@noindent -to change its first argument variable @code{str}, it @emph{does not} -change the value of @code{foo} in the caller. The role of @code{foo} in -calling @code{myfunc} ended when its value (@code{"bar"}) was computed. -If @code{str} also exists outside of @code{myfunc}, the function body -cannot alter this outer value, because it is shadowed during the -execution of @code{myfunc} and cannot be seen or changed from there. - -@cindex call by reference -However, when arrays are the parameters to functions, they are @emph{not} -copied. Instead, the array itself is made available for direct manipulation -by the function. This is usually called @dfn{call by reference}. -Changes made to an array parameter inside the body of a function @emph{are} -visible outside that function. - -@strong{Note:} Changing an array parameter inside a function -can be very dangerous if you do not watch what you are doing. 
-For example: - -@example -function changeit(array, ind, nvalue) -@{ - array[ind] = nvalue -@} - -BEGIN @{ - a[1] = 1; a[2] = 2; a[3] = 3 - changeit(a, 2, "two") - printf "a[1] = %s, a[2] = %s, a[3] = %s\n", - a[1], a[2], a[3] -@} -@end example - -@noindent -This program prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because -@code{changeit} stores @code{"two"} in the second element of @code{a}. - -@cindex undefined functions -@cindex functions, undefined -Some @command{awk} implementations allow you to call a function that -has not been defined. They only report a problem at runtime when the -program actually tries to call the function. For example: - -@example -BEGIN @{ - if (0) - foo() - else - bar() -@} -function bar() @{ @dots{} @} -# note that `foo' is not defined -@end example - -@noindent -Because the @samp{if} statement will never be true, it is not really a -problem that @code{foo} has not been defined. Usually though, it is a -problem if a program calls an undefined function. - -@cindex lint checks -If @option{--lint} is specified -(@pxref{Options, ,Command-Line Options}), -@command{gawk} reports calls to undefined functions. - -@cindex portability issues -Some @command{awk} implementations generate a runtime -error if you use the @code{next} statement -(@pxref{Next Statement, , The @code{next} Statement}) -inside a user-defined function. -@command{gawk} does not have this limitation. - -@node Return Statement, Dynamic Typing, Function Caveats, User-defined -@subsection The @code{return} Statement -@cindex @code{return} statement - -The body of a user-defined function can contain a @code{return} statement. -This statement returns control to the calling part of the @command{awk} program. It -can also be used to return a value for use in the rest of the @command{awk} -program. It looks like this: - -@example -return @r{[}@var{expression}@r{]} -@end example - -The @var{expression} part is optional. 
If it is omitted, then the returned -value is undefined, and therefore, unpredictable. - -A @code{return} statement with no value expression is assumed at the end of -every function definition. So if control reaches the end of the function -body, then the function returns an unpredictable value. @command{awk} -does @emph{not} warn you if you use the return value of such a function. - -Sometimes, you want to write a function for what it does, not for -what it returns. Such a function corresponds to a @code{void} function -in C or to a @code{procedure} in Pascal. Thus, it may be appropriate to not -return any value; simply bear in mind that if you use the return -value of such a function, you do so at your own risk. - -The following is an example of a user-defined function that returns a value -for the largest number among the elements of an array: - -@example -function maxelt(vec, i, ret) -@{ - for (i in vec) @{ - if (ret == "" || vec[i] > ret) - ret = vec[i] - @} - return ret -@} -@end example - -@cindex conventions, programming -@cindex programming conventions -@noindent -You call @code{maxelt} with one argument, which is an array name. The local -variables @code{i} and @code{ret} are not intended to be arguments; -while there is nothing to stop you from passing two or three arguments -to @code{maxelt}, the results would be strange. The extra space before -@code{i} in the function parameter list indicates that @code{i} and -@code{ret} are not supposed to be arguments. This is a convention that -you should follow when you define functions. - -The following program uses the @code{maxelt} function. It loads an -array, calls @code{maxelt}, and then reports the maximum number in that -array: - -@example -function maxelt(vec, i, ret) -@{ - for (i in vec) @{ - if (ret == "" || vec[i] > ret) - ret = vec[i] - @} - return ret -@} - -# Load all fields of each record into nums. 
-@{ - for(i = 1; i <= NF; i++) - nums[NR, i] = $i -@} - -END @{ - print maxelt(nums) -@} -@end example - -Given the following input: - -@example - 1 5 23 8 16 -44 3 5 2 8 26 -256 291 1396 2962 100 --6 467 998 1101 -99385 11 0 225 -@end example - -@noindent -the program reports (predictably) that @code{99385} is the largest number -in the array. - -@node Dynamic Typing, , Return Statement, User-defined -@subsection Functions and Their Effect on Variable Typing - -@command{awk} is a very fluid language. -It is possible that @command{awk} can't tell if an identifier -represents a regular variable or an array until runtime. -Here is an annotated sample program: - -@example -function foo(a) -@{ - a[1] = 1 # parameter is an array -@} - -BEGIN @{ - b = 1 - foo(b) # invalid: fatal type mismatch - - foo(x) # x uninitialized, becomes an array dynamically - x = 1 # now not allowed, runtime error -@} -@end example - -Usually, such things aren't a big issue, but it's worth -being aware of them. - -@node Internationalization, Advanced Features, Functions, Top -@chapter Internationalization with @command{gawk} - -Once upon a time, computer makers -wrote software that only worked in English. -Eventually, hardware and software vendors noticed that if their -systems worked in the native languages of non-English-speaking -countries, they were able to sell more systems. -As a result, internationalization and localization -of programs and software systems became a common practice. - -@cindex internationalization features in @command{gawk} -Until recently, the ability to provide internationalization -was largely restricted to programs written in C and C++. -This @value{CHAPTER} describes the underlying library @command{gawk} -uses for internationalization, as well as how -@command{gawk} makes internationalization -features available at the @command{awk} program level. 
-Having internationalization available at the @command{awk} level -gives software developers additional flexibility---they are no -longer required to write in C when internationalization is -a requirement. - -@menu -* I18N and L10N:: Internationalization and Localization. -* Explaining gettext:: How GNU @code{gettext} works. -* Programmer i18n:: Features for the programmer. -* Translator i18n:: Features for the translator. -* I18N Example:: A simple i18n example. -* Gawk I18N:: @command{gawk} is also internationalized. -@end menu - -@node I18N and L10N, Explaining gettext, Internationalization, Internationalization -@section Internationalization and Localization - -@cindex internationalization -@cindex localization -@dfn{Internationalization} means writing (or modifying) a program once, -in such a way that it can use multiple languages without requiring -further source code changes. -@dfn{Localization} means providing the data necessary for an -internationalized program to work in a particular language. -Most typically, these terms refer to features such as the language -used for printing error messages, the language used to read -responses, and information related to how numerical and -monetary values are printed and read. - -@node Explaining gettext, Programmer i18n, I18N and L10N, Internationalization -@section GNU @code{gettext} - -@cindex @code{gettext}, how it works -@cindex internationalizing a program -The facilities in GNU @code{gettext} focus on messages; strings printed -by a program, either directly or via formatting with @code{printf} or -@code{sprintf}.@footnote{For some operating systems, the @command{gawk} -port doesn't support GNU @code{gettext}. This applies most notably to -the PC operating systems. As such, these features are not available -if you are using one of those operating systems. Sorry.} - -When using GNU @code{gettext}, each application has its own -@dfn{text domain}. 
This is a unique name such as @samp{kpilot} or @samp{gawk}, -that identifies the application. -A complete application may have multiple components---programs written -in C or C++, as well as scripts written in @command{sh} or @command{awk}. -All of the components use the same text domain. - -To make the discussion concrete, assume we're writing an application -named @command{guide}. Internationalization consists of the -following steps, in this order: - -@enumerate -@item -The programmer goes -through the source for all of @command{guide}'s components -and marks each string that is a candidate for translation. -For example, @code{"`-F': option required"} is a good candidate for translation. -A table with strings of option names is not (e.g., @command{gawk}'s -@option{--profile} option should remain the same, no matter what the local -language). - -@cindex @code{textdomain} C library function -@item -The programmer indicates the application's text domain -(@code{"guide"}) to the @code{gettext} library, -by calling the @code{textdomain} function. - -@item -Messages from the application are extracted from the source code and -collected into a Portable Object file (@file{guide.po}), -which lists the strings and their translations. -The translations are initially empty. -The original (usually English) messages serve as the key for -lookup of the translations. - -@cindex portable object files (@code{gettext}) -@item -For each language with a translator, @file{guide.po} -is copied and translations are created and shipped with the application. - -@cindex message object files (@code{gettext}) -@item -Each language's @file{.po} file is converted into a binary -message object (@file{.mo}) file. -A message object file contains the original messages and their -translations in a binary format that allows fast lookup of translations -at runtime. - -@item -When @command{guide} is built and installed, the binary translation files -are installed in a standard place. 
- -@cindex @code{bindtextdomain} C library function -@item -For testing and development, it is possible to tell @code{gettext} -to use @file{.mo} files in a different directory than the standard -one by using the @code{bindtextdomain} function. - -@item -At runtime, @command{guide} looks up each string via a call -to @code{gettext}. The returned string is the translated string -if available, or the original string if not. - -@item -If necessary, it is possible to access messages from a different -text domain than the one belonging to the application, without -having to switch the application's default text domain back -and forth. -@end enumerate - -@cindex @code{gettext} C library function -In C (or C++), the string marking and dynamic translation lookup -are accomplished by wrapping each string in a call to @code{gettext}: - -@example -printf(gettext("Don't Panic!\n")); -@end example - -The tools that extract messages from source code pull out all -strings enclosed in calls to @code{gettext}. - -@cindex @code{_} C macro (@code{gettext}) -The GNU @code{gettext} developers, recognizing that typing -@samp{gettext} over and over again is both painful and ugly to look -at, use the macro @samp{_} (an underscore) to make things easier: - -@example -/* In the standard header file: */ -#define _(str) gettext(str) - -/* In the program text: */ -printf(_("Don't Panic!\n")); -@end example - -@cindex locale categories -@noindent -This reduces the typing overhead to just three extra characters per string -and is considerably easier to read as well. -There are locale @dfn{categories} -for different types of locale-related information. -The defined locale categories that @code{gettext} knows about are: - -@table @code -@cindex @code{LC_MESSAGES} locale category -@item LC_MESSAGES -Text messages. This is the default category for @code{gettext} -operations, but it is possible to supply a different one explicitly, -if necessary. 
(It is almost never necessary to supply a different category.) - -@cindex @code{LC_COLLATE} locale category -@item LC_COLLATE -Text collation information; i.e., how different characters -and/or groups of characters sort in a given language. - -@cindex @code{LC_CTYPE} locale category -@item LC_CTYPE -Character type information (alphabetic, digit, upper- or lowercase, and -so on). -This information is accessed via the -POSIX character classes in regular expressions, -such as @code{/[[:alnum:]]/} -(@pxref{Regexp Operators, ,Regular Expression Operators}). - -@cindex @code{LC_MONETARY} locale category -@item LC_MONETARY -Monetary information, such as the currency symbol, and whether the -symbol goes before or after a number. - -@cindex @code{LC_NUMERIC} locale category -@item LC_NUMERIC -Numeric information, such as which characters to use for the decimal -point and the thousands separator.@footnote{Americans -use a comma every three decimal places and a period for the decimal -point, while many Europeans do exactly the opposite: -@code{1,234.56} vs.@: @code{1.234,56}.} - -@cindex @code{LC_RESPONSE} locale category -@item LC_RESPONSE -Response information, such as how ``yes'' and ``no'' appear in the -local language, and possibly other information as well. - -@cindex @code{LC_TIME} locale category -@item LC_TIME -Time and date related information, such as 12- or 24-hour clock, month printed -before or after day in a date, local month abbreviations, and so on. - -@cindex @code{LC_ALL} locale category -@item LC_ALL -All of the above. (Not too useful in the context of @code{gettext}.) -@end table - -@node Programmer i18n, Translator i18n, Explaining gettext, Internationalization -@section Internationalizing @command{awk} Programs - -@command{gawk} provides the following variables and functions for -internationalization: - -@table @code -@cindex @code{TEXTDOMAIN} variable -@item TEXTDOMAIN -This variable indicates the application's text domain. 
-For compatibility with GNU @code{gettext}, the default -value is @code{"messages"}. - -@cindex internationalization, marked strings -@cindex marked strings for internationalization -@item _"your message here" -String constants marked with a leading underscore -are candidates for translation at runtime. -String constants without a leading underscore are not translated. - -@cindex @code{dcgettext} built-in function -@item dcgettext(@var{string} @r{[}, @var{domain} @r{[}, @var{category}@r{]]}) -This built-in function returns the translation of @var{string} in -text domain @var{domain} for locale category @var{category}. -The default value for @var{domain} is the current value of @code{TEXTDOMAIN}. -The default value for @var{category} is @code{"LC_MESSAGES"}. - -If you supply a value for @var{category}, it must be a string equal to -one of the known locale categories described in -@ifnotinfo -the previous @value{SECTION}. -@end ifnotinfo -@ifinfo -@ref{Explaining gettext, ,GNU @code{gettext}}. -@end ifinfo -You must also supply a text domain. Use @code{TEXTDOMAIN} if -you want to use the current domain. - -@strong{Caution:} The order of arguments to the @command{awk} version -of the @code{dcgettext} function is purposely different from the order for -the C version. The @command{awk} version's order was -chosen to be simple and to allow for reasonable @command{awk}-style -default arguments. - -@cindex @code{bindtextdomain} built-in function -@item bindtextdomain(@var{directory} @r{[}, @var{domain}@r{]}) -This built-in function allows you to specify the directory where -@code{gettext} looks for @file{.mo} files, in case they -will not or cannot be placed in the standard locations -(e.g., during testing). -It returns the directory where @var{domain} is ``bound.'' - -The default @var{domain} is the value of @code{TEXTDOMAIN}. -If @var{directory} is the null string (@code{""}), then -@code{bindtextdomain} returns the current binding for the -given @var{domain}. 
-@end table - -To use these facilities in your @command{awk} program, follow the steps -outlined in -@ifnotinfo -the previous @value{SECTION}, -@end ifnotinfo -@ifinfo -@ref{Explaining gettext, ,GNU @code{gettext}}, -@end ifinfo -like so: - -@enumerate -@item -Set the variable @code{TEXTDOMAIN} to the text domain of -your program. This is best done in a @code{BEGIN} rule -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}), -or it can also be done via the @option{-v} command-line -option (@pxref{Options, ,Command-Line Options}): - -@example -BEGIN @{ - TEXTDOMAIN = "guide" - @dots{} -@} -@end example - -@item -Mark all translatable strings with a leading underscore (@samp{_}) -character. It @emph{must} be adjacent to the opening -quote of the string. For example: - -@example -print _"hello, world" -x = _"you goofed" -printf(_"Number of users is %d\n", nusers) -@end example - -@item -If you are creating strings dynamically, you can -still translate them, using the @code{dcgettext} -built-in function. - -@example -message = nusers " users logged in" -message = dcgettext(message, "adminprog") -print message -@end example - -Here, the call to @code{dcgettext} supplies a different -text domain (@code{"adminprog"}) in which to find the -message, but it uses the default @code{"LC_MESSAGES"} category. - -@item -During development, you might want to put the @file{.mo} -file in a private directory for testing. This is done -with the @code{bindtextdomain} built-in function: - -@example -BEGIN @{ - TEXTDOMAIN = "guide" # our text domain - if (Testing) @{ - # where to find our files - bindtextdomain("testdir") - # joe is in charge of adminprog - bindtextdomain("../joe/testdir", "adminprog") - @} - @dots{} -@} -@end example - -@end enumerate - -@xref{I18N Example, ,A Simple Internationalization Example}, -for an example program showing the steps necessary to create -and use translations from @command{awk}. 
- -@node Translator i18n, I18N Example, Programmer i18n, Internationalization -@section Translating @command{awk} Programs - -Once a program's translatable strings have been marked, they must -be extracted to create the initial @file{.po} file. -As part of translation, it is often helpful to rearrange the order -in which arguments to @code{printf} are output. - -@command{gawk}'s @option{--gen-po} command-line option extracts -the messages and is discussed next. -After that, @code{printf}'s ability to -rearrange the order for @code{printf} arguments at runtime -is covered. - -@menu -* String Extraction:: Extracting marked strings. -* Printf Ordering:: Rearranging @code{printf} arguments. -* I18N Portability:: @command{awk}-level portability issues. -@end menu - -@node String Extraction, Printf Ordering, Translator i18n, Translator i18n -@subsection Extracting Marked Strings -@cindex string extraction (internationalization) -@cindex marked string extraction (internationalization) -@cindex extraction, of marked strings (internationalization) - -@cindex @code{--gen-po} option -@cindex command-line option, @code{--gen-po} -Once your @command{awk} program is working, and all the strings have -been marked and you've set (and perhaps bound) the text domain, -it is time to produce translations. -First, use the @option{--gen-po} command-line option to create -the initial @file{.po} file: - -@example -$ gawk --gen-po -f guide.awk > guide.po -@end example - -@cindex @code{xgettext} utility -When run with @option{--gen-po}, @command{gawk} does not execute your -program. Instead, it parses it as usual and prints all marked strings -to standard output in the format of a GNU @code{gettext} Portable Object -file. 
Also included in the output are any constant strings that -appear as the first argument to @code{dcgettext}.@footnote{Eventually, -the @command{xgettext} utility that comes with GNU @code{gettext} will be -taught to automatically run @samp{gawk --gen-po} for @file{.awk} files, -freeing the translator from having to do it manually.} -@xref{I18N Example, ,A Simple Internationalization Example}, -for the full list of steps to go through to create and test -translations for @command{guide}. - -@node Printf Ordering, I18N Portability, String Extraction, Translator i18n -@subsection Rearranging @code{printf} Arguments - -@cindex @code{printf}, positional specifier -@cindex positional specifier, @code{printf} -Format strings for @code{printf} and @code{sprintf} -(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}) -present a special problem for translation. -Consider the following:@footnote{This example is borrowed -from the GNU @code{gettext} manual.} - -@c line broken here only for smallbook format -@example -printf(_"String `%s' has %d characters\n", - string, length(string)) -@end example - -A possible German translation for this might be: - -@example -"%d Zeichen lang ist die Zeichenkette `%s'\n" -@end example - -The problem should be obvious: the order of the format -specifications is different from the original! -Even though @code{gettext} can return the translated string -at runtime, -it cannot change the argument order in the call to @code{printf}. - -To solve this problem, @code{printf} format specifiers may have -an additional optional element, which we call a @dfn{positional specifier}. -For example: - -@example -"%2$d Zeichen lang ist die Zeichenkette `%1$s'\n" -@end example - -Here, the positional specifier consists of an integer count, which indicates which -argument to use, and a @samp{$}. Counts are one-based, and the -format string itself is @emph{not} included.
Thus, in the following -example, @samp{string} is the first argument and @samp{length(string)} is the second. - -@example -$ gawk 'BEGIN @{ -> string = "Dont Panic" -> printf _"%2$d characters live in \"%1$s\"\n", -> string, length(string) -> @}' -@print{} 10 characters live in "Dont Panic" -@end example - -If present, positional specifiers come first in the format specification, -before the flags, the field width, and/or the precision. - -Positional specifiers can be used with the dynamic field width and -precision capability: - -@example -$ gawk 'BEGIN @{ -> printf("%*.*s\n", 10, 20, "hello") -> printf("%3$*2$.*1$s\n", 20, 10, "hello") -> @}' -@print{} hello -@print{} hello -@end example - -@noindent -@strong{Note:} When using @samp{*} with a positional specifier, the @samp{*} -comes first, then the integer position, and then the @samp{$}. -This is somewhat counter-intuitive. - -@cindex @code{printf}, mixing positional specifiers with regular formats -@cindex positional specifiers, mixing with regular formats (@code{printf}) -@cindex format specifiers, mixing regular with positional specifiers (@code{printf}) -@command{gawk} does not allow you to mix regular format specifiers -and those with positional specifiers in the same string: - -@smallexample -$ gawk 'BEGIN @{ printf _"%d %3$s\n", 1, 2, "hi" @}' -@error{} gawk: cmd. line:1: fatal: must use `count$' on all formats or none -@end smallexample - -@strong{Note:} There are some pathological cases that @command{gawk} may fail to -diagnose. In such cases, the output may not be what you expect. -It's still a bad idea to try mixing them, even if @command{gawk} -doesn't detect it. - -Although positional specifiers can be used directly in @command{awk} programs, -their primary purpose is to help in producing correct translations of -format strings into languages different from the one in which the program -is first written. 
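As a minimal sketch of why this matters for translators, the following shell session passes two different format strings to the same @code{printf} call. It assumes that @command{gawk} is installed. The names @code{show}, @code{fmt_en}, and @code{fmt_de} are illustrative only, and the format strings are handed over with @option{-v} rather than looked up through @code{gettext}.

```shell
# Illustrative only: the argument order in the printf call never changes;
# each format string picks its arguments by position (a gawk extension).
fmt_en='String "%1$s" has %2$d characters\n'
fmt_de='%2$d Zeichen lang ist die Zeichenkette "%1$s"\n'

# show FORMAT: print one line using FORMAT (hypothetical helper).
# The \n in FORMAT becomes a newline when -v processes the assignment.
show() {
    gawk -v fmt="$1" 'BEGIN { s = "Dont Panic"; printf fmt, s, length(s) }'
}

show "$fmt_en"
show "$fmt_de"
```

Both calls receive the arguments in the same order (string first, length second). Only the format string decides where each one appears in the output.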
- -@node I18N Portability, , Printf Ordering, Translator i18n -@subsection @command{awk} Portability Issues - -@cindex portability issues -@cindex portability issues, internationalization of @command{awk} programs -@cindex internationalization of @command{awk} programs, portability issues -@command{gawk}'s internationalization features were purposely chosen to -have as little impact as possible on the portability of @command{awk} -programs that use them to other versions of @command{awk}. -Consider this program: - -@example -BEGIN @{ - TEXTDOMAIN = "guide" - if (Test_Guide) # set with -v - bindtextdomain("/test/guide/messages") - print _"don't panic!" -@} -@end example - -@noindent -As written, it won't work on other versions of @command{awk}. -However, it is actually almost portable, requiring very little -change. - -@itemize @bullet -@item -Assignments to @code{TEXTDOMAIN} won't have any effect, -since @code{TEXTDOMAIN} is not special in other @command{awk} implementations. - -@item -Non-GNU versions of @command{awk} treat marked strings -as the concatenation of a variable named @code{_} with the string -following it.@footnote{This is good fodder for an ``Obfuscated -@command{awk}'' contest.} Typically, the variable @code{_} has -the null string (@code{""}) as its value, leaving the original string constant as -the result. - -@item -By defining ``dummy'' functions to replace @code{dcgettext} -and @code{bindtextdomain}, the @command{awk} program can be made to run, but -all the messages are output in the original language. -For example: - -@cindex @code{bindtextdomain} user-defined function -@cindex @code{dcgettext} user-defined function -@example -@c file eg/lib/libintl.awk -function bindtextdomain(dir, domain) -@{ - return dir -@} - -function dcgettext(string, domain, category) -@{ - return string -@} -@c endfile -@end example - -@item -The use of positional specifications in @code{printf} or -@code{sprintf} is @emph{not} portable. 
-To support @code{gettext} at the C level, many systems' C versions of -@code{sprintf} do support positional specifiers. But it works only if -enough arguments are supplied in the function call. Many versions of -@command{awk} pass @code{printf} formats and arguments unchanged to the -underlying C library version of @code{sprintf}, but only one format and -argument at a time. What happens if a positional specification is -used is anybody's guess. -However, since the positional specifications are primarily for use in -@emph{translated} format strings, and since non-GNU @command{awk}s never -retrieve the translated string, this should not be a problem in practice. -@end itemize - -@node I18N Example, Gawk I18N, Translator i18n, Internationalization -@section A Simple Internationalization Example - -Now let's look at a step-by-step example of how to internationalize and -localize a simple @command{awk} program, using @file{guide.awk} as our -original source: - -@example -@c file eg/prog/guide.awk -BEGIN @{ - TEXTDOMAIN = "guide" - bindtextdomain(".") # for testing - print _"Don't Panic" - print _"The Answer Is", 42 - print "Pardon me, Zaphod who?" -@} -@c endfile -@end example - -@noindent -Run @samp{gawk --gen-po} to create the @file{.po} file: - -@example -$ gawk --gen-po -f guide.awk > guide.po -@end example - -@noindent -This produces: - -@example -@c file eg/data/guide.po -#: guide.awk:4 -msgid "Don't Panic" -msgstr "" - -#: guide.awk:5 -msgid "The Answer Is" -msgstr "" - -@c endfile -@end example - -This original portable object file is saved and reused for each language -into which the application is translated. The @code{msgid} -is the original string and the @code{msgstr} is the translation. - -@strong{Note:} Strings not marked with a leading underscore do not -appear in the @file{guide.po} file. - -Next, the messages must be translated. 
-Here is a translation to a hypothetical dialect of English, -called ``Mellow'':@footnote{Perhaps it would be better if it were -called ``Hippy.'' Ah, well.} - -@example -@group -$ cp guide.po guide-mellow.po -@var{Add translations to} guide-mellow.po @dots{} -@end group -@end example - -@noindent -Following are the translations: - -@example -@c file eg/data/guide-mellow.po -#: guide.awk:4 -msgid "Don't Panic" -msgstr "Hey man, relax!" - -#: guide.awk:5 -msgid "The Answer Is" -msgstr "Like, the scoop is" - -@c endfile -@end example - -@cindex Linux -@cindex GNU/Linux -The next step is to make the directory to hold the binary message object -file and then to create the @file{guide.mo} file. -The directory layout shown here is standard for GNU @code{gettext} on -GNU/Linux systems. Other versions of @code{gettext} may use a different -layout: - -@example -$ mkdir en_US en_US/LC_MESSAGES -@end example - -@cindex @command{msgfmt} utility -The @command{msgfmt} utility does the conversion from human-readable -@file{.po} file to machine-readable @file{.mo} file. -By default, @command{msgfmt} creates a file named @file{messages}. -This file must be renamed and placed in the proper directory so that -@command{gawk} can find it: - -@example -$ msgfmt guide-mellow.po -$ mv messages en_US/LC_MESSAGES/guide.mo -@end example - -Finally, we run the program to test it: - -@example -$ gawk -f guide.awk -@print{} Hey man, relax! -@print{} Like, the scoop is 42 -@print{} Pardon me, Zaphod who? -@end example - -If the two replacement functions for @code{dcgettext} -and @code{bindtextdomain} -(@pxref{I18N Portability, ,@command{awk} Portability Issues}) -are in a file named @file{libintl.awk}, -then we can run @file{guide.awk} unchanged as follows: - -@example -$ gawk --posix -f guide.awk -f libintl.awk -@print{} Don't Panic -@print{} The Answer Is 42 -@print{} Pardon me, Zaphod who? 
-@end example - -@node Gawk I18N, , I18N Example, Internationalization -@section @command{gawk} Can Speak Your Language - -As of @value{PVERSION} 3.1, @command{gawk} itself has been internationalized -using the GNU @code{gettext} package. -@ifinfo -(GNU @code{gettext} is described in -complete detail in -@ref{Top}.) -@end ifinfo -@ifnotinfo -(GNU @code{gettext} is described in -complete detail in -@cite{GNU gettext tools}.) -@end ifnotinfo -As of this writing, the latest version of GNU @code{gettext} is -@uref{ftp://gnudist.gnu.org/gnu/gettext/gettext-0.10.37.tar.gz, @value{PVERSION} 0.10.37}. - -If a translation of @command{gawk}'s messages exists, -then @command{gawk} produces usage messages, warnings, -and fatal errors in the local language. - -@cindex @code{--with-included-gettext} configuration option -@cindex configuration option, @code{--with-included-gettext} -On systems that do not use @value{PVERSION} 2 (or later) of the GNU C library, you should -configure @command{gawk} with the @option{--with-included-gettext} option -before compiling and installing it. -@xref{Additional Configuration Options}, -for more information. - -@node Advanced Features, Invoking Gawk, Internationalization, Top -@chapter Advanced Features of @command{gawk} -@cindex advanced features -@cindex features, advanced -@ignore -Contributed by: Peter Langston <pud!psl@bellcore.bellcore.com> - - Found in Steve English's "signature" line: - -"Write documentation as if whoever reads it is a violent psychopath -who knows where you live." -@end ignore -@quotation -@i{Write documentation as if whoever reads it is -a violent psychopath who knows where you live.}@* -Steve English, as quoted by Peter Langston -@end quotation - -This @value{CHAPTER} discusses advanced features in @command{gawk}. -It's a bit of a ``grab bag'' of items that are otherwise unrelated -to each other. 
-First, a command-line option allows @command{gawk} to recognize -non-decimal numbers in input data, not just in @command{awk} -programs. Next, two-way I/O, discussed briefly in earlier parts of this -@value{DOCUMENT}, is described in full detail, along with the basics -of TCP/IP networking and BSD portal files. Finally, @command{gawk} -can @dfn{profile} an @command{awk} program, making it possible to tune -it for performance. - -@ref{Dynamic Extensions, ,Adding New Built-in Functions to @command{gawk}}, -discusses the ability to dynamically add new built-in functions to -@command{gawk}. As this feature is still immature and likely to change, -its description is relegated to an appendix. - -@menu -* Non-decimal Data:: Allowing non-decimal input data. -* Two-way I/O:: Two-way communications with another process. -* TCP/IP Networking:: Using @command{gawk} for network programming. -* Portal Files:: Using @command{gawk} with BSD portals. -* Profiling:: Profiling your @command{awk} programs. -@end menu - -@node Non-decimal Data, Two-way I/O, Advanced Features, Advanced Features -@section Allowing Non-Decimal Input Data -@cindex @code{--non-decimal-data} option -@cindex command-line option, @code{--non-decimal-data} - -If you run @command{gawk} with the @option{--non-decimal-data} option, -you can have non-decimal constants in your input data: - -@c line break here for small book format -@example -$ echo 0123 123 0x123 | -> gawk --non-decimal-data '@{ printf "%d, %d, %d\n", -> $1, $2, $3 @}' -@print{} 83, 123, 291 -@end example - -For this feature to work, write your program so that -@command{gawk} treats your data as numeric: - -@example -$ echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}' -@print{} 0123 123 0x123 -@end example - -@noindent -The @code{print} statement treats its expressions as strings. -Although the fields can act as numbers when necessary, -they are still strings, so @code{print} does not try to treat them -numerically. 
You may need to add zero to a field to force it to -be treated as a number. For example: - -@example -$ echo 0123 123 0x123 | gawk --non-decimal-data ' -> @{ print $1, $2, $3 -> print $1 + 0, $2 + 0, $3 + 0 @}' -@print{} 0123 123 0x123 -@print{} 83 123 291 -@end example - -Because it is common to have decimal data with leading zeros, and because -using it could lead to surprising results, the default is to leave this -facility disabled. If you want it, you must explicitly request it. - -@cindex conventions, programming -@cindex programming conventions -@strong{Caution:} -@emph{Use of this option is not recommended.} -It can break old programs very badly. -Instead, use the @code{strtonum} function to convert your data -(@pxref{Non-decimal-numbers, ,Octal and Hexadecimal Numbers}). -This makes your programs easier to write and easier to read, and -leads to less surprising results. - -@node Two-way I/O, TCP/IP Networking, Non-decimal Data, Advanced Features -@section Two-Way Communications with Another Process -@cindex Brennan, Michael -@cindex sex, programmer attractiveness -@smallexample -@c Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan -From: brennan@@whidbey.com (Mike Brennan) -Newsgroups: comp.lang.awk -Subject: Re: Learn the SECRET to Attract Women Easily -Date: 4 Aug 1997 17:34:46 GMT -@c Organization: WhidbeyNet -@c Lines: 12 -Message-ID: <5s53rm$eca@@news.whidbey.com> -@c References: <5s20dn$2e1@chronicle.concentric.net> -@c Reply-To: brennan@whidbey.com -@c NNTP-Posting-Host: asn202.whidbey.com -@c X-Newsreader: slrn (0.9.4.1 UNIX) -@c Xref: cssun.mathcs.emory.edu comp.lang.awk:5403 - -On 3 Aug 1997 13:17:43 GMT, Want More Dates??? 
-<tracy78@@kilgrona.com> wrote: ->Learn the SECRET to Attract Women Easily -> ->The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women - -The scent of awk programmers is a lot more attractive to women than -the scent of perl programmers. --- -Mike Brennan -@c brennan@@whidbey.com -@end smallexample - -It is often useful to be able to -send data to a separate program for -processing and then read the result. This can always be -done with temporary files: - -@example -# write the data for processing -tempfile = ("/tmp/mydata." PROCINFO["pid"]) -while (@var{not done with data}) - print @var{data} | ("subprogram > " tempfile) -close("subprogram > " tempfile) - -# read the results, remove tempfile when done -while ((getline newdata < tempfile) > 0) - @var{process} newdata @var{appropriately} -close(tempfile) -system("rm " tempfile) -@end example - -@noindent -This works, but not elegantly. - -@cindex coprocess -@cindex two-way I/O -@cindex I/O, two-way -@cindex @code{|&} I/O operator -@cindex @command{csh} utility -Starting with @value{PVERSION} 3.1 of @command{gawk}, it is possible to -open a @emph{two-way} pipe to another process. The second process is -termed a @dfn{coprocess}, since it runs in parallel with @command{gawk}. -The two-way connection is created using the new @samp{|&} operator -(borrowed from the Korn Shell, @command{ksh}):@footnote{This is very -different from the same operator in the C shell, @command{csh}.} - -@example -do @{ - print @var{data} |& "subprogram" - "subprogram" |& getline results -@} while (@var{data left to process}) -close("subprogram") -@end example - -The first time an I/O operation is executed using the @samp{|&} -operator, @command{gawk} creates a two-way pipeline to a child process -that runs the other program. Output created with @code{print} -or @code{printf} is written to the program's standard input, and -output from the program's standard output can be read by the @command{gawk} -program using @code{getline}. 
-As is the case with processes started by @samp{|}, the subprogram -can be any program, or pipeline of programs, that can be started by -the shell. - -There are some cautionary items to be aware of: - -@itemize @bullet -@item -As the code inside @command{gawk} currently stands, the coprocess's -standard error goes to the same place that the parent @command{gawk}'s -standard error goes. It is not possible to read the child's -standard error separately. - -@cindex deadlock -@item -I/O buffering may be a problem. @command{gawk} automatically -flushes all output down the pipe to the child process. -However, if the coprocess does not flush its output, -@command{gawk} may hang when doing a @code{getline} in order to read -the coprocess's results. This could lead to a situation -known as @dfn{deadlock}, where each process is waiting for the -other one to do something. -@end itemize - -It is possible to close just one end of the two-way pipe to -a coprocess, by supplying a second argument to the @code{close} -function of either @code{"to"} or @code{"from"} -(@pxref{Close Files And Pipes, ,Closing Input and Output Redirections}). -These strings tell @command{gawk} to close the end of the pipe -that sends data to the process or the end that reads from it, -respectively. - -This is particularly necessary in order to use -the system @command{sort} utility as part of a coprocess; -@command{sort} must read @emph{all} of its input -data before it can produce any output. -The @command{sort} program does not receive an end-of-file indication -until @command{gawk} closes the write end of the pipe. - -When you have finished writing data to the @command{sort} -utility, you can close the @code{"to"} end of the pipe, and -then start reading sorted data via @code{getline}. 
-For example: - -@example -BEGIN @{ - command = "LC_ALL=C sort" - n = split("abcdefghijklmnopqrstuvwxyz", a, "") - - for (i = n; i > 0; i--) - print a[i] |& command - close(command, "to") - - while ((command |& getline line) > 0) - print "got", line - close(command) -@} -@end example - -This program writes the letters of the alphabet in reverse order, one -per line, down the two-way pipe to @command{sort}. It then closes the -write end of the pipe, so that @command{sort} receives an end-of-file -indication. This causes @command{sort} to sort the data and write the -sorted data back to the @command{gawk} program. Once all of the data -has been read, @command{gawk} terminates the coprocess and exits. - -As a side note, the assignment @samp{LC_ALL=C} in the @command{sort} -command ensures traditional Unix (ASCII) sorting from @command{sort}. - -@node TCP/IP Networking, Portal Files, Two-way I/O, Advanced Features -@section Using @command{gawk} for Network Programming -@cindex networking, TCP/IP -@cindex TCP/IP networking -@cindex @file{/inet} special files -@cindex @code{EMISTERED} -@quotation -@code{EMISTERED}: @i{A host is a host from coast to coast,@* -and no-one can talk to host that's close,@* -unless the host that isn't close@* -is busy hung or dead.} -@end quotation - -In addition to being able to open a two-way pipeline to a coprocess -on the same system -(@pxref{Two-way I/O, ,Two-Way Communications with Another Process}), -it is possible to make a two-way connection to -another process on another system across an IP networking connection. - -You can think of this as just a @emph{very long} two-way pipeline to -a coprocess. -The way @command{gawk} decides that you want to use TCP/IP networking is -by recognizing special @value{FN}s that begin with @samp{/inet/}. - -The full syntax of the special @value{FN} is -@file{/inet/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}. 
-The meanings of the components are: - -@table @var -@item protocol -The protocol to use over IP. This must be one of @samp{tcp}, -@samp{udp}, or @samp{raw}, for a TCP, UDP, or raw IP connection, -respectively. The use of TCP is recommended for most applications. - -@strong{Caution:} The use of raw sockets is not currently supported -in @value{PVERSION} 3.1 of @command{gawk}. - -@item local-port -@cindex @code{getservbyname} C library function -The local TCP or UDP port number to use. Use a port number of @samp{0} -when you want the system to pick a port. This is what you should do -when writing a TCP or UDP client. -You may also use a well-known service name, such as @samp{smtp} -or @samp{http}, in which case @command{gawk} attempts to determine -the pre-defined port number using the C @code{getservbyname} function. - -@item remote-host -The IP address or fully-qualified domain name of the Internet -host to which you want to connect. - -@item remote-port -The TCP or UDP port number to use on the given @var{remote-host}. -Again, use @samp{0} if you don't care, or else a well-known -service name. -@end table - -Consider the following very simple example: - -@example -BEGIN @{ - Service = "/inet/tcp/0/localhost/daytime" - Service |& getline - print $0 - close(Service) -@} -@end example - -This program reads the current date and time from the local system's -TCP @samp{daytime} server. -It then prints the results and closes the connection. - -Because this topic is extensive, the use of @command{gawk} for -TCP/IP programming is documented separately. -@ifinfo -@xref{Top}, -@end ifinfo -@ifnotinfo -See @cite{TCP/IP Internetworking with @command{gawk}}, -which comes as part of the @command{gawk} distribution, -@end ifnotinfo -for a much more complete introduction and discussion, as well as -extensive examples. 
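The four components of an @file{/inet} special file name can be assembled mechanically. The sketch below uses a hypothetical shell function, @code{inet_name}, to build the name from its parts. It only constructs the string and does not open a connection, so it should run with any @command{awk}.

```shell
# Hypothetical helper: assemble a gawk /inet special file name.
# Arguments: protocol, local port, remote host, remote port.
inet_name() {
    awk -v proto="$1" -v lport="$2" -v host="$3" -v rport="$4" 'BEGIN {
        printf "/inet/%s/%s/%s/%s\n", proto, lport, host, rport
    }'
}

# A client lets the system pick the local port by passing 0.
inet_name tcp 0 localhost daytime
```

The resulting string is the same name used in the @samp{daytime} example in this section. Passing it to the @samp{|&} operator is what actually opens the connection.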
- -@node Portal Files, Profiling, TCP/IP Networking, Advanced Features -@section Using @command{gawk} with BSD Portals -@cindex portal files -@cindex BSD portal files -@cindex TCP/IP networking -@cindex @file{/p} special files -@cindex @code{--enable-portals} configuration option -@cindex configuration option, @code{--enable-portals} -@cindex BSD-based operating systems - -Similar to the @file{/inet} special files, if @command{gawk} -is configured with the @option{--enable-portals} option -(@pxref{Quick Installation, , Compiling @command{gawk} for Unix}), -then @command{gawk} treats -files whose pathnames begin with @code{/p} as 4.4 BSD-style portals. - -When used with the @samp{|&} operator, @command{gawk} opens the file -for two-way communications. The operating system's portal mechanism -then manages creating the process associated with the portal and -the corresponding communications with the portal's process. - -@node Profiling, , Portal Files, Advanced Features -@section Profiling Your @command{awk} Programs -@cindex profiling @command{awk} programs -@cindex @command{pgawk} program - -Beginning with @value{PVERSION} 3.1 of @command{gawk}, you may produce execution -traces of your @command{awk} programs. -This is done with a specially compiled version of @command{gawk}, -called @command{pgawk} (``profiling @command{gawk}''). - -@cindex @file{awkprof.out} profiling output file -@cindex profiling output file (@file{awkprof.out}) -@command{pgawk} is identical in every way to @command{gawk}, except that when -it has finished running, it creates a profile of your program in a file -named @file{awkprof.out}. -Because it is profiling, it also executes up to 45 percent slower than -@command{gawk} normally does. 
- -As shown in the following example, -the @option{--profile} option can be used to change the name of the file -where @command{pgawk} will write the profile: - -@example -$ pgawk --profile=myprog.prof -f myprog.awk data1 data2 -@end example - -@noindent -In the above example, @command{pgawk} places the profile in -@file{myprog.prof} instead of in @file{awkprof.out}. - -Regular @command{gawk} also accepts this option. When called with just -@option{--profile}, @command{gawk} ``pretty prints'' the program into -@file{awkprof.out}, without any execution counts. You may supply an -option to @option{--profile} to change the @value{FN}. Here is a sample -session showing a simple @command{awk} program, its input data, and the -results from running @command{pgawk}. First, the @command{awk} program: - -@example -BEGIN @{ print "First BEGIN rule" @} - -END @{ print "First END rule" @} - -/foo/ @{ - print "matched /foo/, gosh" - for (i = 1; i <= 3; i++) - sing() -@} - -@{ - if (/foo/) - print "if is true" - else - print "else is true" -@} - -BEGIN @{ print "Second BEGIN rule" @} - -END @{ print "Second END rule" @} - -function sing( dummy) -@{ - print "I gotta be me!" -@} -@end example - -Following is the input data: - -@example -foo -bar -baz -foo -junk -@end example - -Here is the @file{awkprof.out} that results from running @command{pgawk} -on this program and data. 
(This example also illustrates that @command{awk} -programmers sometimes have to work late.): - -@cindex blocks, @code{BEGIN} and @code{END} -@example - # gawk profile, created Sun Aug 13 00:00:15 2000 - - # BEGIN block(s) - - BEGIN @{ - 1 print "First BEGIN rule" - 1 print "Second BEGIN rule" - @} - - # Rule(s) - - 5 /foo/ @{ # 2 - 2 print "matched /foo/, gosh" - 6 for (i = 1; i <= 3; i++) @{ - 6 sing() - @} - @} - - 5 @{ - 5 if (/foo/) @{ # 2 - 2 print "if is true" - 3 @} else @{ - 3 print "else is true" - @} - @} - - # END block(s) - - END @{ - 1 print "First END rule" - 1 print "Second END rule" - @} - - # Functions, listed alphabetically - - 6 function sing(dummy) - @{ - 6 print "I gotta be me!" - @} -@end example - -The previous example illustrates many of the basic rules for profiling output. -The rules are as follows: - -@itemize @bullet -@item -The program is printed in the order @code{BEGIN} rule, -pattern/action rules, @code{END} rule and functions, listed -alphabetically. -Multiple @code{BEGIN} and @code{END} rules are merged together. - -@item -Pattern-action rules have two counts. -The first count, to the left of the rule, shows how many times -the rule's pattern was @emph{tested}. -The second count, to the right of the rule's opening left brace -in a comment, -shows how many times the rule's action was @emph{executed}. -The difference between the two indicates how many times the rule's -pattern evaluated to false. - -@item -Similarly, -the count for an @code{if}-@code{else} statement shows how many times -the condition was tested. -To the right of the opening left brace for the @code{if}'s body -is a count showing how many times the condition was true. -The count for the @code{else} -indicates how many times the test failed. - -@item -The count for a loop header (such as @code{for} -or @code{while}) shows how many times the loop test was executed. 
-(Because of this, you can't just look at the count on the first -statement in a rule to determine how many times the rule was executed. -If the first statement is a loop, the count is misleading.) - -@item -For user-defined functions, the count next to the @code{function} -keyword indicates how many times the function was called. -The counts next to the statements in the body show how many times -those statements were executed. - -@item -The layout uses ``K&R'' style using tabs. -Braces are used everywhere, even when -the body of an @code{if}, @code{else}, or loop is only a single statement. - -@item -Parentheses are used only where needed, as indicated by the structure -of the program and the precedence rules. -@c extra verbiage here satisfies the copyeditor. ugh. -For example, @samp{(3 + 5) * 4} means add three plus five, then multiply -the total by four. However, @samp{3 + 5 * 4} has no parentheses, and -means @samp{3 + (5 * 4)}. - -@item -All string concatenations are parenthesized too. -(This could be made a bit smarter.) - -@item -Parentheses are used around the arguments to @code{print} -and @code{printf} only when -the @code{print} or @code{printf} statement is followed by a redirection. -Similarly, if -the target of a redirection isn't a scalar, it gets parenthesized. - -@item -@command{pgawk} supplies leading comments in -front of the @code{BEGIN} and @code{END} rules, -the pattern/action rules, and the functions. - -@end itemize - -The profiled version of your program may not look exactly like what you -typed when you wrote it. This is because @command{pgawk} creates the -profiled version by ``pretty printing'' its internal representation of -the program. The advantage to this is that @command{pgawk} can produce -a standard representation. The disadvantage is that all source code -comments are lost, as are the distinctions among multiple @code{BEGIN} -and @code{END} rules. 
Also, things such as: - -@example -/foo/ -@end example - -@noindent -come out as: - -@example -/foo/ @{ - print $0 -@} -@end example - -@noindent -which is correct, but possibly surprising. - -@cindex dynamic profiling -@cindex profiling, dynamic -Besides creating profiles when a program has completed, -@command{pgawk} can produce a profile while it is running. -This is useful if your @command{awk} program goes into an -infinite loop and you want to see what has been executed. -To use this feature, run @command{pgawk} in the background: - -@example -$ pgawk -f myprog & -[1] 13992 -@end example - -@cindex @command{kill} command -@cindex @code{SIGUSR1} signal -@cindex @code{USR1} signal -@cindex signals, @code{SIGUSR1} -@noindent -The shell prints a job number and process ID number, in this case, 13992. -Use the @command{kill} command to send the @code{USR1} signal -to @command{pgawk}: - -@example -$ kill -USR1 13992 -@end example - -@noindent -As usual, the profiled version of the program is written to -@file{awkprof.out}, or to a different file if you use the @option{--profile} -option. - -Along with the regular profile, as shown earlier, the profile -includes a trace of any active functions: - -@example -# Function Call Stack: - -# 3. baz -# 2. bar -# 1. foo -# -- main -- -@end example - -You may send @command{pgawk} the @code{USR1} signal as many times as you like. -Each time, the profile and function call trace are appended to the output -profile file. - -@cindex @code{SIGHUP} signal -@cindex @code{HUP} signal -@cindex signals, @code{SIGHUP} -If you use the @code{HUP} signal instead of the @code{USR1} signal, -@command{pgawk} produces the profile and the function call trace, and then exits. 
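The two counts printed for a pattern-action rule, described earlier in this section, can be reproduced by hand, which may help when reading a profile. The following sketch keeps its own counters and feeds the five-line sample input through any @command{awk}. The names @code{count_rule}, @code{tested}, and @code{executed} are illustrative and are not part of @command{pgawk}.

```shell
# Hand-rolled counters mimicking the two counts on a pattern-action rule:
# "tested" is how often the pattern was evaluated, "executed" how often
# the action ran. This only imitates the bookkeeping; it is not pgawk.
count_rule() {
    awk '
        { tested++ }             # every record reaches the /foo/ test
        /foo/ { executed++ }     # the action runs only on a match
        END { printf "tested=%d executed=%d\n", tested, executed }
    '
}

printf 'foo\nbar\nbaz\nfoo\njunk\n' | count_rule
```

For the sample data this reports five tests and two executions, matching the @samp{5 /foo/ @{ # 2} line in the profile shown earlier.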
- -@node Invoking Gawk, Library Functions, Advanced Features, Top -@chapter Running @command{awk} and @command{gawk} - -This @value{CHAPTER} covers how to run awk, both POSIX-standard -and @command{gawk}-specific command-line options, and what -@command{awk} and -@command{gawk} do with non-option arguments. -It then proceeds to cover how @command{gawk} searches for source files, -obsolete options and/or features, and known bugs in @command{gawk}. -This @value{CHAPTER} rounds out the discussion of @command{awk} -as a program and as a language. - -While a number of the options and features described here were -discussed in passing earlier in the book, this @value{CHAPTER} provides the -full details. - -@menu -* Command Line:: How to run @command{awk}. -* Options:: Command-line options and their meanings. -* Other Arguments:: Input file names and variable assignments. -* AWKPATH Variable:: Searching directories for @command{awk} - programs. -* Obsolete:: Obsolete Options and/or features. -* Undocumented:: Undocumented Options and Features. -* Known Bugs:: Known Bugs in @command{gawk}. -@end menu - -@node Command Line, Options, Invoking Gawk, Invoking Gawk -@section Invoking @command{awk} -@cindex command line -@cindex invocation of @command{gawk} -@cindex arguments, command-line -@cindex options, command-line -@cindex long options -@cindex options, long - -There are two ways to run @command{awk}---with an explicit program or with -one or more program files. Here are templates for both of them; items -enclosed in [@dots{}] in these templates are optional: - -@example -awk @r{[@var{options}]} -f progfile @r{[@code{--}]} @var{file} @dots{} -awk @r{[@var{options}]} @r{[@code{--}]} '@var{program}' @var{file} @dots{} -@end example - -Besides traditional one-letter POSIX-style options, @command{gawk} also -supports GNU long options. 
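As a concrete sketch of the two templates, the following shell session runs the same one-line program both ways. The data and program files are created with @command{mktemp} purely for illustration.

```shell
# Create a small data file and a program file (temporary, illustrative names).
data=$(mktemp)
prog=$(mktemp)
printf 'alpha 1\nbeta 2\n' > "$data"
printf '{ print $2, $1 }\n' > "$prog"

# Form 1: program text given directly on the command line.
out1=$(awk '{ print $2, $1 }' "$data")

# Form 2: program text read from a file with -f; "--" ends the options.
out2=$(awk -f "$prog" -- "$data")

printf '%s\n' "$out1" "$out2"
rm -f "$data" "$prog"
```

Both forms produce the same output, since only the source of the program text differs.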
- -@cindex empty program -@cindex dark corner -@cindex lint checks -It is possible to invoke @command{awk} with an empty program: - -@example -awk '' datafile1 datafile2 -@end example - -@noindent -Doing so makes little sense though; @command{awk} exits -silently when given an empty program. -@value{DARKCORNER} -If @option{--lint} has -been specified on the command-line, @command{gawk} issues a -warning that the program is empty. - -@node Options, Other Arguments, Command Line, Invoking Gawk -@section Command-Line Options - -Options begin with a dash and consist of a single character. -GNU-style long options consist of two dashes and a keyword. -The keyword can be abbreviated, as long as the abbreviation allows the option -to be uniquely identified. If the option takes an argument, then the -keyword is either immediately followed by an equals sign (@samp{=}) and the -argument's value, or the keyword and the argument's value are separated -by whitespace. -If a particular option with a value is given more than once, it is the -last value that counts. - -Each long option for @command{gawk} has a corresponding -POSIX-style option. -The long and short options are -interchangeable in all contexts. -The options and their meanings are as follows: - -@table @code -@item -F @var{fs} -@itemx --field-separator @var{fs} -@cindex @code{-F} option -@cindex command-line option, @code{-F} -@cindex @code{--field-separator} option -@cindex command-line option, @code{--field-separator} -Sets the @code{FS} variable to @var{fs} -(@pxref{Field Separators, ,Specifying How Fields Are Separated}). - -@item -f @var{source-file} -@itemx --file @var{source-file} -@cindex @code{-f} option -@cindex command-line option, @code{-f} -@cindex @code{--file} option -@cindex command-line option, @code{--file} -Indicates that the @command{awk} program is to be found in @var{source-file} -instead of in the first non-option argument. 
- -@item -v @var{var}=@var{val} -@itemx --assign @var{var}=@var{val} -@cindex @code{-v} option -@cindex command-line option, @code{-v} -@cindex @code{--assign} option -@cindex command-line option, @code{--assign} -Sets the variable @var{var} to the value @var{val} @emph{before} -execution of the program begins. Such variable values are available -inside the @code{BEGIN} rule -(@pxref{Other Arguments, ,Other Command-Line Arguments}). - -The @option{-v} option can only set one variable, but it can be used -more than once, setting another variable each time, like this: -@samp{awk @w{-v foo=1} @w{-v bar=2} @dots{}}. - -@strong{Caution:} Using @option{-v} to set the values of the built-in -variables may lead to surprising results. @command{awk} will reset the -values of those variables as it needs to, possibly ignoring any -predefined value you may have given. - -@item -mf @var{N} -@itemx -mr @var{N} -@cindex @code{-mf} option -@cindex command-line option, @code{-mf} -@cindex @code{-mr} option -@cindex command-line option, @code{-mr} -Set various memory limits to the value @var{N}. The @samp{f} flag sets -the maximum number of fields and the @samp{r} flag sets the maximum -record size. These two flags and the @option{-m} option are from the -Bell Laboratories research version of Unix @command{awk}. They are provided -for compatibility but otherwise ignored by -@command{gawk}, since @command{gawk} has no predefined limits. -(The Bell Laboratories @command{awk} no longer needs these options; -it continues to accept them to avoid breaking old programs.) - -@item -W @var{gawk-opt} -@cindex @code{-W} option -@cindex command-line option, @code{-W} -Following the POSIX standard, implementation-specific -options are supplied as arguments to the @option{-W} option. These options -also have corresponding GNU-style long options. -Note that the long options may be abbreviated, as long as -the abbreviations remain unique. 
-The full list of @command{gawk}-specific options is provided next. - -@item -- -Signals the end of the command-line options. The following arguments -are not treated as options even if they begin with @samp{-}. This -interpretation of @option{--} follows the POSIX argument parsing -conventions. - -This is useful if you have @value{FN}s that start with @samp{-}, -or in shell scripts, if you have @value{FN}s that will be specified -by the user that could start with @samp{-}. -@end table - -The previous list described options mandated by the POSIX standard, -as well as options available in the Bell Laboratories version of @command{awk}. -The following list describes @command{gawk}-specific options: - -@table @code -@item -W compat -@itemx -W traditional -@itemx --compat -@itemx --traditional -@cindex @code{--compat} option -@cindex command-line option, @code{--compat} -@cindex @code{--traditional} option -@cindex command-line option, @code{--traditional} -@cindex compatibility mode -Specifies @dfn{compatibility mode}, in which the GNU extensions to -the @command{awk} language are disabled, so that @command{gawk} behaves just -like the Bell Laboratories research version of Unix @command{awk}. -@option{--traditional} is the preferred form of this option. -@xref{POSIX/GNU, ,Extensions in @command{gawk} Not in POSIX @command{awk}}, -which summarizes the extensions. Also see -@ref{Compatibility Mode, ,Downward Compatibility and Debugging}. - -@item -W copyright -@itemx --copyright -@cindex @code{--copyright} option -@cindex command-line option, @code{--copyright} -Print the short version of the General Public License and then exit. - -@item -W copyleft -@itemx --copyleft -@cindex @code{--copyleft} option -@cindex command-line option, @code{--copyleft} -Just like @option{--copyright}. -This option may disappear in a future version of @command{gawk}. 
- -@cindex @code{--dump-variables} option -@cindex command-line option, @code{--dump-variables} -@cindex @file{awkvars.out} global variable list output file -@item -W dump-variables@r{[}=@var{file}@r{]} -@itemx --dump-variables@r{[}=@var{file}@r{]} -Print a sorted list of global variables, their types, and final values -to @var{file}. If no @var{file} is provided, @command{gawk} prints this -list to a file named @file{awkvars.out} in the current directory. - -@cindex common mistakes -@cindex mistakes, common -@cindex errors, common -Having a list of all the global variables is a good way to look for -typographical errors in your programs. -You would also use this option if you have a large program with a lot of -functions, and you want to be sure that your functions don't -inadvertently use global variables that you meant to be local. -(This is a particularly easy mistake to make with simple variable -names like @code{i}, @code{j}, and so on.) - -@item -W gen-po -@itemx --gen-po -@cindex @code{--gen-po} option -@cindex command-line option, @code{--gen-po} -Analyze the source program and -generate a GNU @code{gettext} Portable Object file on standard -output for all string constants that have been marked for translation. -@xref{Internationalization, ,Internationalization with @command{gawk}}, -for information about this option. - -@item -W help -@itemx -W usage -@itemx --help -@itemx --usage -@cindex @code{--help} option -@cindex command-line option, @code{--help} -@cindex @code{--usage} option -@cindex command-line option, @code{--usage} -Print a ``usage'' message summarizing the short and long style options -that @command{gawk} accepts and then exit. - -@item -W lint@r{[}=fatal@r{]} -@itemx --lint@r{[}=fatal@r{]} -@cindex @code{--lint} option -@cindex command-line option, @code{--lint} -@cindex lint checks -@cindex fatal errors -Warn about constructs that are dubious or non-portable to -other @command{awk} implementations. 
-Some warnings are issued when @command{gawk} first reads your program. Others -are issued at runtime, as your program executes. -With an optional argument of @samp{fatal}, -lint warnings become fatal errors. -This may be drastic but its use will certainly encourage the -development of cleaner @command{awk} programs. - -@item -W lint-old -@itemx --lint-old -@cindex @code{--lint-old} option -@cindex command-line option, @code{--lint-old} -@cindex lint checks -Warn about constructs that are not available in the original version of -@command{awk} from Version 7 Unix -(@pxref{V7/SVR3.1, ,Major Changes Between V7 and SVR3.1}). - -@item -W non-decimal-data -@itemx --non-decimal-data -@cindex @code{--non-decimal-data} option -@cindex command-line option, @code{--non-decimal-data} -Enable automatic interpretation of octal and hexadecimal -values in input data -(@pxref{Non-decimal Data, ,Allowing Non-Decimal Input Data}). - -@strong{Caution:} This option can severely break old programs. -Use with care. - -@item -W posix -@itemx --posix -@cindex @code{--posix} option -@cindex command-line option, @code{--posix} -@cindex POSIX mode -Operate in strict POSIX mode. This disables all @command{gawk} -extensions (just like @option{--traditional}) and adds the following additional -restrictions: - -@c IMPORTANT! Keep this list in sync with the one in node POSIX - -@itemize @bullet -@item -@code{\x} escape sequences are not recognized -(@pxref{Escape Sequences}). - -@item -Newlines do not act as whitespace to separate fields when @code{FS} is -equal to a single space -(@pxref{Fields, , Examining Fields}). - -@item -Newlines are not allowed after @samp{?} or @samp{:} -(@pxref{Conditional Exp, ,Conditional Expressions}). - -@item -The synonym @code{func} for the keyword @code{function} is not -recognized (@pxref{Definition Syntax, ,Function Definition Syntax}). 
- -@item -The @samp{**} and @samp{**=} operators cannot be used in -place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops, ,Arithmetic Operators}, -and also @pxref{Assignment Ops, ,Assignment Expressions}). - -@item -Specifying @samp{-Ft} on the command-line does not set the value -of @code{FS} to be a single tab character -(@pxref{Field Separators, ,Specifying How Fields Are Separated}). - -@item -The @code{fflush} built-in function is not supported -(@pxref{I/O Functions, ,Input/Output Functions}). -@end itemize - -@cindex automatic warnings -@cindex warnings, automatic -If you supply both @option{--traditional} and @option{--posix} on the -command-line, @option{--posix} takes precedence. @command{gawk} -also issues a warning if both options are supplied. - -@item -W profile@r{[}=@var{file}@r{]} -@itemx --profile@r{[}=@var{file}@r{]} -@cindex @code{--profile} option -@cindex command-line option, @code{--profile} -Enable profiling of @command{awk} programs -(@pxref{Profiling, ,Profiling Your @command{awk} Programs}). -By default, profiles are created in a file named @file{awkprof.out}. -The optional @var{file} argument allows you to specify a different -@value{FN} for the profile file. - -When run with @command{gawk}, the profile is just a ``pretty printed'' version -of the program. When run with @command{pgawk}, the profile contains execution -counts for each statement in the program in the left margin, and function -call counts for each function. - -@item -W re-interval -@itemx --re-interval -@cindex @code{--re-interval} option -@cindex command-line option, @code{--re-interval} -Allow interval expressions -(@pxref{Regexp Operators, , Regular Expression Operators}) -in regexps. -Because interval expressions were traditionally not available in @command{awk}, -@command{gawk} does not provide them by default. This prevents old @command{awk} -programs from breaking. 
- -@item -W source @var{program-text} -@itemx --source @var{program-text} -@cindex @code{--source} option -@cindex command-line option, @code{--source} -Program source code is taken from the @var{program-text}. This option -allows you to mix source code in files with source -code that you enter on the command-line. This is particularly useful -when you have library functions that you want to use from your command-line -programs (@pxref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}). - -@item -W version -@itemx --version -@cindex @code{--version} option -@cindex command-line option, @code{--version} -Print version information for this particular copy of @command{gawk}. -This allows you to determine if your copy of @command{gawk} is up to date -with respect to whatever the Free Software Foundation is currently -distributing. -It is also useful for bug reports -(@pxref{Bugs, , Reporting Problems and Bugs}). -@end table - -As long as program text has been supplied, -any other options are flagged as invalid with a warning message but -are otherwise ignored. - -In compatibility mode, as a special case, if the value of @var{fs} supplied -to the @option{-F} option is @samp{t}, then @code{FS} is set to the tab -character (@code{"\t"}). This is only true for @option{--traditional} and not -for @option{--posix} -(@pxref{Field Separators, ,Specifying How Fields Are Separated}). - -The @option{-f} option may be used more than once on the command-line. -If it is, @command{awk} reads its program source from all of the named files, as -if they had been concatenated together into one big file. This is -useful for creating libraries of @command{awk} functions. These functions -can be written once and then retrieved from a standard place, instead -of having to be included into each individual program. -(As mentioned in -@ref{Definition Syntax, ,Function Definition Syntax}, -function names must be unique.) 
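The following fragment is a small sketch of using @option{-f} more than once, as just described. The file names @file{lib.awk} and @file{main.awk} and the function @code{twice} are hypothetical, invented for this illustration; any POSIX @command{awk} should accept the two @option{-f} options.

```shell
# Hypothetical scratch directory holding a tiny "library" and a main program.
tmpdir=$(mktemp -d)

# Library file: defines one function and has no rules of its own.
cat > "$tmpdir/lib.awk" <<'EOF'
function twice(x) { return 2 * x }
EOF

# Main program: calls the function defined in the library file.
cat > "$tmpdir/main.awk" <<'EOF'
{ print twice($1) }
EOF

# awk reads both files as if they were concatenated into one program.
doubled=$(printf '3\n5\n' | awk -f "$tmpdir/lib.awk" -f "$tmpdir/main.awk")

printf '%s\n' "$doubled"
rm -rf "$tmpdir"
```

Keeping the function in its own file means that other programs can reuse it by adding one more @option{-f} option.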
- -Library functions can still be used, even if the program is entered at the terminal, -by specifying @samp{-f /dev/tty}. After typing your program, -type @kbd{Ctrl-d} (the end-of-file character) to terminate it. -(You may also use @samp{-f -} to read program source from the standard -input but then you will not be able to also use the standard input as a -source of data.) - -Because it is clumsy using the standard @command{awk} mechanisms to mix source -file and command-line @command{awk} programs, @command{gawk} provides the -@option{--source} option. This does not require you to pre-empt the standard -input for your source code; it allows you to easily mix command-line -and library source code -(@pxref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}). - -If no @option{-f} or @option{--source} option is specified, then @command{gawk} -uses the first non-option command-line argument as the text of the -program source code. - -@cindex @code{POSIXLY_CORRECT} environment variable -@cindex environment variable, @code{POSIXLY_CORRECT} -@cindex lint checks -If the environment variable @env{POSIXLY_CORRECT} exists, -then @command{gawk} behaves in strict POSIX mode, exactly as if -you had supplied the @option{--posix} command-line option. -Many GNU programs look for this environment variable to turn on -strict POSIX mode. If @option{--lint} is supplied on the command-line -and @command{gawk} turns on POSIX mode because of @env{POSIXLY_CORRECT}, -then it issues a warning message indicating that POSIX -mode is in effect. -You would typically set this variable in your shell's startup file. 
-For a Bourne-compatible shell (such as @command{bash}), you would add these -lines to the @file{.profile} file in your home directory: - -@example -POSIXLY_CORRECT=true -export POSIXLY_CORRECT -@end example - -@cindex @command{csh} utility -For a @command{csh} compatible -shell,@footnote{Not recommended.} -you would add this line to the @file{.login} file in your home directory: - -@example -setenv POSIXLY_CORRECT true -@end example - -Having @env{POSIXLY_CORRECT} set is not recommended for daily use, -but it is good for testing the portability of your programs to other -environments. - -@node Other Arguments, AWKPATH Variable, Options, Invoking Gawk -@section Other Command-Line Arguments - -Any additional arguments on the command-line are normally treated as -input files to be processed in the order specified. However, an -argument that has the form @code{@var{var}=@var{value}}, assigns -the value @var{value} to the variable @var{var}---it does not specify a -file at all. -(This was discussed earlier in -@ref{Assignment Options, ,Assigning Variables on the Command Line}.) - -@cindex @code{ARGIND} variable -@cindex @code{ARGV} variable -All these arguments are made available to your @command{awk} program in the -@code{ARGV} array (@pxref{Built-in Variables}). Command-line options -and the program text (if present) are omitted from @code{ARGV}. -All other arguments, including variable assignments, are -included. As each element of @code{ARGV} is processed, @command{gawk} -sets the variable @code{ARGIND} to the index in @code{ARGV} of the -current element. - -The distinction between @value{FN} arguments and variable-assignment -arguments is made when @command{awk} is about to open the next input file. -At that point in execution, it checks the @value{FN} to see whether -it is really a variable assignment; if so, @command{awk} sets the variable -instead of reading a file. 
- -Therefore, the variables actually receive the given values after all -previously specified files have been read. In particular, the values of -variables assigned in this fashion are @emph{not} available inside a -@code{BEGIN} rule -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}), -because such rules are run before @command{awk} begins scanning the argument list. - -@cindex dark corner -The variable values given on the command-line are processed for escape -sequences (@pxref{Escape Sequences}). -@value{DARKCORNER} - -In some earlier implementations of @command{awk}, when a variable assignment -occurred before any @value{FN}s, the assignment would happen @emph{before} -the @code{BEGIN} rule was executed. @command{awk}'s behavior was thus -inconsistent; some command-line assignments were available inside the -@code{BEGIN} rule, while others were not. Unfortunately, -some applications came to depend -upon this ``feature.'' When @command{awk} was changed to be more consistent, -the @option{-v} option was added to accommodate applications that depended -upon the old behavior. - -The variable assignment feature is most useful for assigning to variables -such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and -output formats before scanning the @value{DF}s. It is also useful for -controlling state if multiple passes are needed over a @value{DF}. For -example: - -@cindex multiple passes over data -@cindex passes, multiple -@example -awk 'pass == 1 @{ @var{pass 1 stuff} @} - pass == 2 @{ @var{pass 2 stuff} @}' pass=1 mydata pass=2 mydata -@end example - -Given the variable assignment feature, the @option{-F} option for setting -the value of @code{FS} is not -strictly necessary. It remains for historical compatibility. 
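The difference in timing between @option{-v} and a variable-assignment argument can be seen with a short experiment. This is only a sketch; the variable @code{n} and the scratch file @file{data.txt} are hypothetical names chosen for the illustration.

```shell
# Hypothetical one-record data file.
tmpdir=$(mktemp -d)
printf 'one line\n' > "$tmpdir/data.txt"

# The same program reports the value of n in BEGIN and in a main rule.
prog='BEGIN { print "BEGIN sees [" n "]" } { print "rule sees [" n "]" }'

# -v assigns n before the BEGIN rule runs.
with_v=$(awk -v n=1 "$prog" "$tmpdir/data.txt")

# An assignment argument is processed only when awk reaches it
# in the argument list, which is after the BEGIN rule has run.
as_arg=$(awk "$prog" n=1 "$tmpdir/data.txt")

printf '%s\n---\n%s\n' "$with_v" "$as_arg"
rm -rf "$tmpdir"
```

With @option{-v}, both lines of output show the value 1. With the assignment given as an argument, the @code{BEGIN} rule sees an empty value and only the main rule sees 1.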
- -@node AWKPATH Variable, Obsolete, Other Arguments, Invoking Gawk -@section The @env{AWKPATH} Environment Variable -@cindex @env{AWKPATH} environment variable -@cindex environment variable, @env{AWKPATH} -@cindex search path -@cindex directory search -@cindex path, search -@cindex search path, for source files -@cindex differences between @command{gawk} and @command{awk} -@ifinfo -The previous @value{SECTION} described how @command{awk} program files can be named -on the command-line with the @option{-f} option. -@end ifinfo -In most @command{awk} -implementations, you must supply a precise path name for each program -file, unless the file is in the current directory. -But in @command{gawk}, if the @value{FN} supplied to the @option{-f} option -does not contain a @samp{/}, then @command{gawk} searches a list of -directories (called the @dfn{search path}), one by one, looking for a -file with the specified name. - -The search path is a string consisting of directory names -separated by colons. @command{gawk} gets its search path from the -@env{AWKPATH} environment variable. If that variable does not exist, -@command{gawk} uses a default path, which is -@samp{.:/usr/local/share/awk}.@footnote{Your version of @command{gawk} -may use a different directory; it -will depend upon how @command{gawk} was built and installed. The actual -directory is the value of @samp{$(datadir)} generated when -@command{gawk} was configured. You probably don't need to worry about this -though.} (Programs written for use by -system administrators should use an @env{AWKPATH} variable that -does not include the current directory, @file{.}.) - -The search path feature is particularly useful for building libraries -of useful @command{awk} functions. The library files can be placed in a -standard directory in the default path and then specified on -the command-line with a short @value{FN}. Otherwise, the full @value{FN} -would have to be typed for each file. 
- -By using both the @option{--source} and @option{-f} options, your command-line -@command{awk} programs can use facilities in @command{awk} library files. -@xref{Library Functions, , A Library of @command{awk} Functions}. -Path searching is not done if @command{gawk} is in compatibility mode. -This is true for both @option{--traditional} and @option{--posix}. -@xref{Options, ,Command-Line Options}. - -@strong{Note:} If you want files in the current directory to be found, -you must include the current directory in the path, either by including -@file{.} explicitly in the path or by writing a null entry in the -path. (A null entry is indicated by starting or ending the path with a -colon or by placing two colons next to each other (@samp{::}).) If the -current directory is not included in the path, then files cannot be -found in the current directory. This path search mechanism is identical -to the shell's. -@c someday, @cite{The Bourne Again Shell}.... - -Starting with @value{PVERSION} 3.0, if @env{AWKPATH} is not defined in the -environment, @command{gawk} places its default search path into -@code{ENVIRON["AWKPATH"]}. This makes it easy to determine -the actual search path that @command{gawk} will use -from within an @command{awk} program. - -While you can change @code{ENVIRON["AWKPATH"]} within your @command{awk} -program, this has no effect on the running program's behavior. This makes -sense: the @env{AWKPATH} environment variable is used to find the program -source files. Once your program is running, all the files have been -found, and @command{gawk} no longer needs to use @env{AWKPATH}. 
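As a rough sketch of the search path in use, the fragment below places a library file in a private directory, names that directory in @env{AWKPATH}, and then refers to the file by its short name. The directory, the file @file{greet.awk}, and the function @code{greet} are hypothetical. The example assumes that @command{gawk} is installed and does nothing otherwise, because path searching is specific to @command{gawk}.

```shell
if command -v gawk >/dev/null 2>&1; then
    # Hypothetical library directory with a single library file.
    libdir=$(mktemp -d)
    cat > "$libdir/greet.awk" <<'EOF'
function greet(who) { return "hello, " who }
EOF

    # The name given to -f contains no slash, so gawk looks for it
    # in the directories listed in AWKPATH. The --source option
    # supplies the rest of the program on the command line.
    found=$(AWKPATH="$libdir" gawk -f greet.awk \
                --source 'BEGIN { print greet("world") }')

    printf '%s\n' "$found"
    rm -rf "$libdir"
else
    echo "gawk is not installed; skipping the AWKPATH example" >&2
fi
```

Setting @env{AWKPATH} only for the one command, as done here, leaves the search path of the surrounding shell session unchanged.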
- -@node Obsolete, Undocumented, AWKPATH Variable, Invoking Gawk -@section Obsolete Options and/or Features - -@cindex deprecated options -@cindex obsolete options -@cindex deprecated features -@cindex obsolete features -This @value{SECTION} describes features and/or command-line options from -previous releases of @command{gawk} that are either not available in the -current version or that are still supported but deprecated (meaning that -they will @emph{not} be in the next release). - -@c update this section for each release! - -For @value{PVERSION} @value{VERSION} of @command{gawk}, there are no -deprecated command-line options -@c or other deprecated features -from the previous version of @command{gawk}. -The use of @samp{next file} (two words) for @code{nextfile} was deprecated -in @command{gawk} 3.0 but still worked. Starting with @value{PVERSION} 3.1, the -two word usage is no longer accepted. - -The process-related special files described in -@ref{Special Process, ,Special Files for Process-Related Information}, -work as described, but -are now considered deprecated. -@command{gawk} prints a warning message every time they are used. -(Use @code{PROCINFO} instead; see -@ref{Auto-set, ,Built-in Variables That Convey Information}.) -They will be removed from the next release of @command{gawk}. - -@ignore -This @value{SECTION} -is thus essentially a place holder, -in case some option becomes obsolete in a future version of @command{gawk}. -@end ignore - -@node Undocumented, Known Bugs, Obsolete, Invoking Gawk -@section Undocumented Options and Features -@cindex undocumented features -@cindex features, undocumented -@cindex Skywalker, Luke -@cindex Kenobi, Obi-Wan -@cindex Jedi knights -@cindex Knights, jedi -@quotation -@i{Use the Source, Luke!}@* -Obi-Wan -@end quotation - -This @value{SECTION} intentionally left -blank. - -@ignore -@c If these came out in the Info file or TeX document, then they wouldn't -@c be undocumented, would they? 
- -@command{gawk} has one undocumented option: - -@table @code -@item -W nostalgia -@itemx --nostalgia -Print the message @code{"awk: bailing out near line 1"} and dump core. -This option was inspired by the common behavior of very early versions of -Unix @command{awk} and by a t--shirt. -The message is @emph{not} subject to translation in non-English locales. -@c so there! nyah, nyah. -@end table - -Early versions of @command{awk} used to not require any separator (either -a newline or @samp{;}) between the rules in @command{awk} programs. Thus, -it was common to see one-line programs like: - -@example -awk '@{ sum += $1 @} END @{ print sum @}' -@end example - -@command{gawk} actually supports this but it is purposely undocumented -because it is considered bad style. The correct way to write such a program -is either - -@example -awk '@{ sum += $1 @} ; END @{ print sum @}' -@end example - -@noindent -or - -@example -awk '@{ sum += $1 @} - END @{ print sum @}' data -@end example - -@noindent -@xref{Statements/Lines, ,@command{awk} Statements Versus Lines}, for a fuller -explanation. - -You can insert newlines after the @samp{;} in @code{for} loops. -This seems to have been a long-undocumented feature in Unix @command{awk}. - -If the environment variable @env{WHINY_USERS} exists -when @command{gawk} is run, -then the associative @code{for} loop will go through the array -indices in sorted order. -The comparison used for sorting is simple string comparison; -any non-English or non-ASCII locales are not taken into account. -@code{IGNORECASE} does not affect the comparison either. 
- -@end ignore - -@node Known Bugs, , Undocumented, Invoking Gawk -@section Known Bugs in @command{gawk} -@cindex bugs, known in @command{gawk} -@cindex known bugs - -@itemize @bullet -@item -The @option{-F} option for changing the value of @code{FS} -(@pxref{Options, ,Command-Line Options}) -is not necessary given the command-line variable -assignment feature; it remains only for backwards compatibility. - -@item -Syntactically invalid single character programs tend to overflow -the parse stack, generating a rather unhelpful message. Such programs -are surprisingly difficult to diagnose in the completely general case -and the effort to do so really is not worth it. -@end itemize - -@ignore -@c Try this -@iftex -@page -@headings off -@majorheading II@ @ @ Using @command{awk} and @command{gawk} -Part II shows how to use @command{awk} and @command{gawk} for problem solving. -There is lots of code here for you to read and learn from. -It contains the following chapters: - -@itemize @bullet -@item -@ref{Library Functions, ,A Library of @command{awk} Functions}. - -@item -@ref{Sample Programs, ,Practical @command{awk} Programs}. - -@end itemize - -@page -@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @| -@oddheading @| @| @strong{@thischapter}@ @ @ @thispage -@end iftex -@end ignore - -@node Library Functions, Sample Programs, Invoking Gawk, Top -@chapter A Library of @command{awk} Functions - -@ref{User-defined, ,User-Defined Functions}, describes how to write -your own @command{awk} functions. Writing functions is important, because -it allows you to encapsulate algorithms and program tasks in a single -place. It simplifies programming, making program development more -manageable, and making programs more readable. - -One valuable way to learn a new programming language is to @emph{read} -programs in that language. 
To that end, this @value{CHAPTER} -and @ref{Sample Programs, ,Practical @command{awk} Programs}, -provide a good-sized body of code for you to read, -and hopefully, to learn from. - -@c 2e: USE TEXINFO-2 FUNCTION DEFINITION STUFF!!!!!!!!!!!!! -This @value{CHAPTER} presents a library of useful @command{awk} functions. -Many of the sample programs presented later in this @value{DOCUMENT} -use these functions. -The functions are presented here in a progression from simple to complex. - -@cindex Texinfo -@ref{Extract Program, ,Extracting Programs from Texinfo Source Files}, -presents a program that you can use to extract the source code for -these example library functions and programs from the Texinfo source -for this @value{DOCUMENT}. -(This has already been done as part of the @command{gawk} distribution.) - -If you have written one or more useful, general purpose @command{awk} functions -and would like to contribute them to the author's collection of @command{awk} -programs, see -@ref{How To Contribute, ,How to Contribute}, for more information. - -@cindex portability issues -The programs in this @value{CHAPTER} and in -@ref{Sample Programs, ,Practical @command{awk} Programs}, -freely use features that are @command{gawk}-specific. -It is straightforward to rewrite these programs for -different implementations of @command{awk}. - -Diagnostic error messages are sent to @file{/dev/stderr}. -Use @samp{| "cat 1>&2"} instead of @samp{> "/dev/stderr"}, if your system -does not have a @file{/dev/stderr} or if you cannot use @command{gawk}. - -A number of programs use @code{nextfile} -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}) -to skip any remaining input in the input file. -@ref{Nextfile Function, ,Implementing @code{nextfile} as a Function}, -shows you how to write a function that does the same thing. - -@c 12/2000: Thanks to Nelson Beebe for pointing out the output issue. 
-Finally, some of the programs choose to ignore upper- and lowercase -distinctions in their input. They do so by assigning one to @code{IGNORECASE}. -You can achieve almost the same effect@footnote{The effects are -not identical. Output of the transformed -record will be in all lowercase, while @code{IGNORECASE} preserves the original -contents of the input record.} by adding the following rule to the -beginning of the program: - -@example -# ignore case -@{ $0 = tolower($0) @} -@end example - -@noindent -Also, verify that all regexp and string constants used in -comparisons only use lowercase letters. - -@menu -* Library Names:: How to best name private global variables in - library functions. -* General Functions:: Functions that are of general use. -* Data File Management:: Functions for managing command-line data - files. -* Getopt Function:: A function for processing command-line - arguments. -* Passwd Functions:: Functions for getting user information. -* Group Functions:: Functions for getting group information. -@end menu - -@node Library Names, General Functions, Library Functions, Library Functions -@section Naming Library Function Global Variables - -@cindex names, use of -@cindex namespace issues in @command{awk} -@cindex documenting @command{awk} programs -@cindex programs, documenting -Due to the way the @command{awk} language evolved, variables are either -@dfn{global} (usable by the entire program) or @dfn{local} (usable just by -a specific function). There is no intermediate state analogous to -@code{static} variables in C. - -Library functions often need to have global variables that they can use to -preserve state information between calls to the function---for example, -@code{getopt}'s variable @code{_opti} -(@pxref{Getopt Function, ,Processing Command-Line Options}). -Such variables are called @dfn{private}, since the only functions that need to -use them are the ones in the library. 
- -When writing a library function, you should try to choose names for your -private variables that will not conflict with any variables used by -either another library function or a user's main program. For example, a -name like @samp{i} or @samp{j} is not a good choice, because user programs -often use variable names like these for their own purposes. - -@cindex conventions, programming -@cindex programming conventions -The example programs shown in this @value{CHAPTER} all start the names of their -private variables with an underscore (@samp{_}). Users generally don't use -leading underscores in their variable names, so this convention immediately -decreases the chances that the variable name will be accidentally shared -with the user's program. - -In addition, several of the library functions use a prefix that helps -indicate what function or set of functions use the variables---for example, -@code{_pw_byname} in the user database routines -(@pxref{Passwd Functions, ,Reading the User Database}). -This convention is recommended, since it even further decreases the -chance of inadvertent conflict among variable names. Note that this -convention is used equally well for variable names and for private -function names as well.@footnote{While all the library routines could have -been rewritten to use this convention, this was not done, in order to -show how my own @command{awk} programming style has evolved, and to -provide some basis for this discussion.} - -As a final note on variable naming, if a function makes global variables -available for use by a main program, it is a good convention to start that -variable's name with a capital letter---for -example, @code{getopt}'s @code{Opterr} and @code{Optind} variables -(@pxref{Getopt Function, ,Processing Command-Line Options}). 
-The leading capital letter indicates that it is global, while the fact that -the variable name is not all capital letters indicates that the variable is -not one of @command{awk}'s built-in variables, such as @code{FS}. - -It is also important that @emph{all} variables in library -functions that do not need to save state are, in fact, declared -local.@footnote{@command{gawk}'s @option{--dump-variables} command-line -option is useful for verifying this.} If this is not done, the variable -could accidentally be used in the user's program, leading to bugs that -are very difficult to track down: - -@example -function lib_func(x, y, l1, l2) -@{ - @dots{} - @var{use variable} some_var # some_var should be local - @dots{} # but is not by oversight -@} -@end example - -@cindex Tcl -A different convention, common in the Tcl community, is to use a single -associative array to hold the values needed by the library function(s), or -``package.'' This significantly decreases the number of actual global names -in use. For example, the functions described in -@ref{Passwd Functions, , Reading the User Database}, -might have used array elements @code{@w{PW_data["inited"]}}, @code{@w{PW_data["total"]}}, -@code{@w{PW_data["count"]}}, and @code{@w{PW_data["awklib"]}}, instead of -@code{@w{_pw_inited}}, @code{@w{_pw_awklib}}, @code{@w{_pw_total}}, -and @code{@w{_pw_count}}. - -The conventions presented in this @value{SECTION} are exactly -that: conventions. You are not required to write your programs this -way---we merely recommend that you do so. - -@node General Functions, Data File Management, Library Names, Library Functions -@section General Programming - -This @value{SECTION} presents a number of functions that are of general -programming use. - -@menu -* Nextfile Function:: Two implementations of a @code{nextfile} - function. -* Assert Function:: A function for assertions in @command{awk} - programs. 
-* Round Function:: A function for rounding if @code{sprintf} does - not do it correctly. -* Cliff Random Function:: The Cliff Random Number Generator. -* Ordinal Functions:: Functions for using characters as numbers and - vice versa. -* Join Function:: A function to join an array into a string. -* Gettimeofday Function:: A function to get formatted times. -@end menu - -@node Nextfile Function, Assert Function, General Functions, General Functions -@subsection Implementing @code{nextfile} as a Function - -@cindex skipping input files -@cindex input files, skipping -The @code{nextfile} statement presented in -@ref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}, -is a @command{gawk}-specific extension---it is not available in most other -implementations of @command{awk}. This @value{SECTION} shows two versions of a -@code{nextfile} function that you can use to simulate @command{gawk}'s -@code{nextfile} statement if you cannot use @command{gawk}. - -A first attempt at writing a @code{nextfile} function is as follows: - -@example -# nextfile --- skip remaining records in current file -# this should be read in before the "main" awk program - -function nextfile() @{ _abandon_ = FILENAME; next @} -_abandon_ == FILENAME @{ next @} -@end example - -@cindex conventions, programming -@cindex programming conventions -Because it supplies a rule that must be executed first, this file should -be included before the main program. This rule compares the current -@value{DF}'s name (which is always in the @code{FILENAME} variable) to -a private variable named @code{_abandon_}. If the @value{FN} matches, -then the action part of the rule executes a @code{next} statement to -go on to the next record. (The use of @samp{_} in the variable name is -a convention. It is discussed more fully in -@ref{Library Names, , Naming Library Function Global Variables}.) 
- -The use of the @code{next} statement effectively creates a loop that reads -all the records from the current @value{DF}. -The end of the file is eventually reached and -a new @value{DF} is opened, changing the value of @code{FILENAME}. -Once this happens, the comparison of @code{_abandon_} to @code{FILENAME} -fails and execution continues with the first rule of the ``real'' program. - -The @code{nextfile} function itself simply sets the value of @code{_abandon_} -and then executes a @code{next} statement to start the -loop. -@ignore -@c If the function can't be used on other versions of awk, this whole -@c section is pointless, no? Sigh. -@footnote{@command{gawk} is the only known @command{awk} implementation -that allows you to -execute @code{next} from within a function body. Some other workaround -is necessary if you are not using @command{gawk}.} -@end ignore - -@cindex @code{nextfile} user-defined function -This initial version has a subtle problem. -If the same @value{DF} is listed @emph{twice} on the command line, -one right after the other -or even with just a variable assignment between them, -this code skips right through the file a second time, even though -it should stop when it gets to the end of the first occurrence. -A second version of @code{nextfile} that remedies this problem -is shown here: - -@example -@c file eg/lib/nextfile.awk -# nextfile --- skip remaining records in current file -# correctly handle successive occurrences of the same file -@c endfile -@ignore -@c file eg/lib/nextfile.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May, 1993 - -@c endfile -@end ignore -@c file eg/lib/nextfile.awk -# this should be read in before the "main" awk program - -function nextfile() @{ _abandon_ = FILENAME; next @} - -_abandon_ == FILENAME @{ - if (FNR == 1) - _abandon_ = "" - else - next -@} -@c endfile -@end example - -The @code{nextfile} function has not changed.
It makes @code{_abandon_} -equal to the current @value{FN} and then executes a @code{next} statement. -The @code{next} statement reads the next record and increments @code{FNR} -so that @code{FNR} is guaranteed to have a value of at least two. -However, if @code{nextfile} is called for the last record in the file, -then @command{awk} closes the current @value{DF} and moves on to the next -one. Upon doing so, @code{FILENAME} is set to the name of the new file -and @code{FNR} is reset to one. If this next file is the same as -the previous one, @code{_abandon_} is still equal to @code{FILENAME}. -However, @code{FNR} is equal to one, telling us that this is a new -occurrence of the file and not the one we were reading when the -@code{nextfile} function was executed. In that case, @code{_abandon_} -is reset to the empty string, so that further executions of this rule -fail (until the next time that @code{nextfile} is called). - -If @code{FNR} is not one, then we are still in the original @value{DF} -and the program executes a @code{next} statement to skip through it. - -An important question to ask at this point is: given that the -functionality of @code{nextfile} can be provided with a library file, -why is it built into @command{gawk}? Adding -features for little reason leads to larger, slower programs that are -harder to maintain. -The answer is that building @code{nextfile} into @command{gawk} provides -significant gains in efficiency. If the @code{nextfile} function is executed -at the beginning of a large @value{DF}, @command{awk} still has to scan the entire -file, splitting it up into records, -@c at least conceptually -just to skip over it. The built-in -@code{nextfile} can simply close the file immediately and proceed to the -next one, which saves a lot of time. 
This is particularly important in -@command{awk}, because @command{awk} programs are generally I/O-bound (i.e., -they spend most of their time doing input and output, instead of performing -computations). - -@node Assert Function, Round Function, Nextfile Function, General Functions -@subsection Assertions - -@cindex assertions -@cindex @code{assert} C library function -When writing large programs, it is often useful to know -that a condition or set of conditions is true. Before proceeding with a -particular computation, you make a statement about what you believe to be -the case. Such a statement is known as an -@dfn{assertion}. The C language provides an @code{<assert.h>} header file -and corresponding @code{assert} macro that the programmer can use to make -assertions. If an assertion fails, the @code{assert} macro arranges to -print a diagnostic message describing the condition that should have -been true but was not, and then it kills the program. In C, using -@code{assert} looks like this: - -@example -#include <assert.h> - -int myfunc(int a, double b) -@{ - assert(a <= 5 && b >= 17.1); - @dots{} -@} -@end example - -If the assertion fails, the program prints a message similar to this: - -@example -prog.c:5: assertion failed: a <= 5 && b >= 17.1 -@end example - -@cindex @code{assert} user-defined function -The C language makes it possible to turn the condition into a string for use -in printing the diagnostic message. This is not possible in @command{awk}, so -this @code{assert} function also requires a string version of the condition -that is being tested. -Following is the function: - -@example -@c file eg/lib/assert.awk -# assert --- assert that a condition is true. Otherwise exit. -@c endfile -@ignore -@c file eg/lib/assert.awk - -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May, 1993 - -@c endfile -@end ignore -@c file eg/lib/assert.awk -function assert(condition, string) -@{ - if (!
condition) @{ - printf("%s:%d: assertion failed: %s\n", - FILENAME, FNR, string) > "/dev/stderr" - _assert_exit = 1 - exit 1 - @} -@} - -@group -END @{ - if (_assert_exit) - exit 1 -@} -@end group -@c endfile -@end example - -The @code{assert} function tests the @code{condition} parameter. If it -is false, it prints a message to standard error, using the @code{string} -parameter to describe the failed condition. It then sets the variable -@code{_assert_exit} to one and executes the @code{exit} statement. -The @code{exit} statement jumps to the @code{END} rule. If the @code{END} -rule finds @code{_assert_exit} to be true, it then exits immediately. - -The purpose of the test in the @code{END} rule is to -keep any other @code{END} rules from running. When an assertion fails, the -program should exit immediately. -If no assertions fail, then @code{_assert_exit} is still -false when the @code{END} rule is run normally, and the rest of the -program's @code{END} rules execute. -For all of this to work correctly, @file{assert.awk} must be the -first source file read by @command{awk}. -The function can be used in a program in the following way: - -@example -function myfunc(a, b) -@{ - assert(a <= 5 && b >= 17.1, "a <= 5 && b >= 17.1") - @dots{} -@} -@end example - -@noindent -If the assertion fails, you see a message similar to the following: - -@example -mydata:1357: assertion failed: a <= 5 && b >= 17.1 -@end example - -There is a small problem with this version of @code{assert}. -An @code{END} rule is automatically added -to the program calling @code{assert}. Normally, if a program consists -of just a @code{BEGIN} rule, the input files and/or standard input are -not read. However, now that the program has an @code{END} rule, @command{awk} -attempts to read the input @value{DF}s or standard input -(@pxref{Using BEGIN/END, , Startup and Cleanup Actions}), -most likely causing the program to hang as it waits for input.
- -There is a simple workaround to this: -make sure the @code{BEGIN} rule always ends -with an @code{exit} statement. - -@node Round Function, Cliff Random Function, Assert Function, General Functions -@subsection Rounding Numbers - -@cindex rounding -The way @code{printf} and @code{sprintf} -(@pxref{Printf, , Using @code{printf} Statements for Fancier Printing}) -perform rounding often depends upon the system's C @code{sprintf} -subroutine. On many machines, @code{sprintf} rounding is ``unbiased,'' -which means it doesn't always round a trailing @samp{.5} up, contrary -to naive expectations. In unbiased rounding, @samp{.5} rounds to even, -rather than always up, so 1.5 rounds to 2 but 4.5 rounds to 4. This means -that if you are using a format that does rounding (e.g., @code{"%.0f"}), -you should check what your system does. The following function does -traditional rounding; it might be useful if your awk's @code{printf} -does unbiased rounding: - -@cindex @code{round} user-defined function -@example -@c file eg/lib/round.awk -# round --- do normal rounding -@c endfile -@ignore -@c file eg/lib/round.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# August, 1996 - -@c endfile -@end ignore -@c file eg/lib/round.awk -function round(x, ival, aval, fraction) -@{ - ival = int(x) # integer part, int() truncates - - # see if fractional part - if (ival == x) # no fraction - return x - - if (x < 0) @{ - aval = -x # absolute value - ival = int(aval) - fraction = aval - ival - if (fraction >= .5) - return int(x) - 1 # -2.5 --> -3 - else - return int(x) # -2.3 --> -2 - @} else @{ - fraction = x - ival - if (fraction >= .5) - return ival + 1 - else - return ival - @} -@} - -# test harness -@{ print $0, round($0) @} -@c endfile -@end example - -@node Cliff Random Function, Ordinal Functions, Round Function, General Functions -@subsection The Cliff Random Number Generator -@cindex random numbers, Cliff -@cindex Cliff random numbers - -The Cliff random number 
-generator@footnote{@uref{http://mathworld.wolfram.com/CliffRandomNumberGenerator.html}} -is a very simple random number generator that ``passes the noise sphere test -for randomness by showing no structure.'' -It is easily programmed, in fewer than 10 lines of @command{awk} code: - -@cindex @code{cliff_rand} user-defined function -@example -@c file eg/lib/cliff_rand.awk -# cliff_rand.awk --- generate Cliff random numbers -@c endfile -@ignore -@c file eg/lib/cliff_rand.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# December 2000 - -@c endfile -@end ignore -@c file eg/lib/cliff_rand.awk -BEGIN @{ _cliff_seed = 0.1 @} - -function cliff_rand() -@{ - _cliff_seed = (100 * log(_cliff_seed)) % 1 - if (_cliff_seed < 0) - _cliff_seed = - _cliff_seed - return _cliff_seed -@} -@c endfile -@end example - -This algorithm requires an initial ``seed'' of 0.1. Each new value -uses the current seed as input for the calculation. -If the built-in @code{rand} function -(@pxref{Numeric Functions}) -isn't random enough, you might try using this function instead. - -@node Ordinal Functions, Join Function, Cliff Random Function, General Functions -@subsection Translating Between Characters and Numbers - -@cindex numeric character values -@cindex values of characters as numbers -One commercial implementation of @command{awk} supplies a built-in function, -@code{ord}, which takes a character and returns the numeric value for that -character in the machine's character set. If the string passed to -@code{ord} has more than one character, only the first one is used. - -The inverse of this function is @code{chr} (from the function of the same -name in Pascal), which takes a number and returns the corresponding character.
-Both functions are written very nicely in @command{awk}; there is no real -reason to build them into the @command{awk} interpreter: - -@cindex @code{ord} user-defined function -@cindex @code{chr} user-defined function -@example -@c file eg/lib/ord.awk -# ord.awk --- do ord and chr - -# Global identifiers: -# _ord_: numerical values indexed by characters -# _ord_init: function to initialize _ord_ -@c endfile -@ignore -@c file eg/lib/ord.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# 16 January, 1992 -# 20 July, 1992, revised - -@c endfile -@end ignore -@c file eg/lib/ord.awk -BEGIN @{ _ord_init() @} - -function _ord_init( low, high, i, t) -@{ - low = sprintf("%c", 7) # BEL is ascii 7 - if (low == "\a") @{ # regular ascii - low = 0 - high = 127 - @} else if (sprintf("%c", 128 + 7) == "\a") @{ - # ascii, mark parity - low = 128 - high = 255 - @} else @{ # ebcdic(!) - low = 0 - high = 255 - @} - - for (i = low; i <= high; i++) @{ - t = sprintf("%c", i) - _ord_[t] = i - @} -@} -@c endfile -@end example - -@cindex character sets (machine character encodings) -@cindex character encodings -@cindex ASCII -@cindex EBCDIC -@cindex mark parity -Some explanation of the numbers used by @code{chr} is worthwhile. -The most prominent character set in use today is ASCII. Although an -eight-bit byte can hold 256 distinct values (from 0 to 255), ASCII only -defines characters that use the values from 0 to 127.@footnote{ASCII -has been extended in many countries to use the values from 128 to 255 -for country-specific characters. If your system uses these extensions, -you can simplify @code{_ord_init} to simply loop from 0 to 255.} -In the now distant past, -at least one minicomputer manufacturer -@c Pr1me, blech -used ASCII, but with mark parity, meaning that the leftmost bit in the byte -is always 1. This means that on those systems, characters -have numeric values from 128 to 255. 
-Finally, large mainframe systems use the EBCDIC character set, which -uses all 256 values. -While there are other character sets in use on some older systems, -they are not really worth worrying about: - -@example -@c file eg/lib/ord.awk -function ord(str, c) -@{ - # only first character is of interest - c = substr(str, 1, 1) - return _ord_[c] -@} - -function chr(c) -@{ - # force c to be numeric by adding 0 - return sprintf("%c", c + 0) -@} -@c endfile - -#### test code #### -# BEGIN \ -# @{ -# for (;;) @{ -# printf("enter a character: ") -# if (getline var <= 0) -# break -# printf("ord(%s) = %d\n", var, ord(var)) -# @} -# @} -@c endfile -@end example - -An obvious improvement to these functions is to move the code for the -@code{@w{_ord_init}} function into the body of the @code{BEGIN} rule. It was -written this way initially for ease of development. -There is a ``test program'' in a @code{BEGIN} rule, to test the -function. It is commented out for production use. - -@node Join Function, Gettimeofday Function, Ordinal Functions, General Functions -@subsection Merging an Array into a String - -@cindex merging strings -When doing string processing, it is often useful to be able to join -all the strings in an array into one long string. The following function, -@code{join}, accomplishes this task. It is used later in several of -the application programs -(@pxref{Sample Programs, ,Practical @command{awk} Programs}). - -Good function design is important; this function needs to be general but it -should also have a reasonable default behavior. It is called with an array -as well as the beginning and ending indices of the elements in the array to be -merged. 
This assumes that the array indices are numeric---a reasonable -assumption since the array was likely created with @code{split} -(@pxref{String Functions, ,String Manipulation Functions}): - -@cindex @code{join} user-defined function -@example -@c file eg/lib/join.awk -# join.awk --- join an array into a string -@c endfile -@ignore -@c file eg/lib/join.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 - -@c endfile -@end ignore -@c file eg/lib/join.awk -function join(array, start, end, sep, result, i) -@{ - if (sep == "") - sep = " " - else if (sep == SUBSEP) # magic value - sep = "" - result = array[start] - for (i = start + 1; i <= end; i++) - result = result sep array[i] - return result -@} -@c endfile -@end example - -An optional additional argument is the separator to use when joining the -strings back together. If the caller supplies a non-empty value, -@code{join} uses it; if it is not supplied, it has a null -value. In this case, @code{join} uses a single blank as a default -separator for the strings. If the value is equal to @code{SUBSEP}, -then @code{join} joins the strings with no separator between them. -@code{SUBSEP} serves as a ``magic'' value to indicate that there should -be no separation between the component strings.@footnote{It would -be nice if @command{awk} had an assignment operator for concatenation. -The lack of an explicit operator for concatenation makes string operations -more difficult than they really need to be.} - -@node Gettimeofday Function, , Join Function, General Functions -@subsection Managing the Time of Day - -@cindex formatted timestamps -@cindex timestamps, formatted -The @code{systime} and @code{strftime} functions described in -@ref{Time Functions, ,Using @command{gawk}'s Timestamp Functions}, -provide the minimum functionality necessary for dealing with the time of day -in human readable form. 
While @code{strftime} is extensive, the control -formats are not necessarily easy to remember or intuitively obvious when -reading a program. - -The following function, @code{gettimeofday}, populates a user-supplied array -with preformatted time information. It returns a string with the current -time formatted in the same way as the @command{date} utility: - -@cindex @code{gettimeofday} user-defined function -@example -@c file eg/lib/gettime.awk -# gettimeofday.awk --- get the time of day in a usable format -@c endfile -@ignore -@c file eg/lib/gettime.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain, May 1993 -# -@c endfile -@end ignore -@c file eg/lib/gettime.awk - -# Returns a string in the format of output of date(1) -# Populates the array argument time with individual values: -# time["second"] -- seconds (0 - 59) -# time["minute"] -- minutes (0 - 59) -# time["hour"] -- hours (0 - 23) -# time["althour"] -- hours (1 - 12) -# time["monthday"] -- day of month (1 - 31) -# time["month"] -- month of year (1 - 12) -# time["monthname"] -- name of the month -# time["shortmonth"] -- short name of the month -# time["year"] -- year modulo 100 (0 - 99) -# time["fullyear"] -- full year -# time["weekday"] -- day of week (Sunday = 0) -# time["altweekday"] -- day of week (Monday = 1, Sunday = 7) -# time["dayname"] -- name of weekday -# time["shortdayname"] -- short name of weekday -# time["yearday"] -- day of year (1 - 366) -# time["timezone"] -- abbreviation of timezone name -# time["ampm"] -- AM or PM designation -# time["weeknum"] -- week number, Sunday first day -# time["altweeknum"] -- week number, Monday first day - -function gettimeofday(time, ret, now, i) -@{ - # get time once, avoids unnecessary system calls - now = systime() - - # return date(1)-style output - ret = strftime("%a %b %d %H:%M:%S %Z %Y", now) - - # clear out target array - delete time - - # fill in values, force numeric values to be - # numeric by adding 0 - time["second"] = strftime("%S", now) + 0 -
time["minute"] = strftime("%M", now) + 0 - time["hour"] = strftime("%H", now) + 0 - time["althour"] = strftime("%I", now) + 0 - time["monthday"] = strftime("%d", now) + 0 - time["month"] = strftime("%m", now) + 0 - time["monthname"] = strftime("%B", now) - time["shortmonth"] = strftime("%b", now) - time["year"] = strftime("%y", now) + 0 - time["fullyear"] = strftime("%Y", now) + 0 - time["weekday"] = strftime("%w", now) + 0 - time["altweekday"] = strftime("%u", now) + 0 - time["dayname"] = strftime("%A", now) - time["shortdayname"] = strftime("%a", now) - time["yearday"] = strftime("%j", now) + 0 - time["timezone"] = strftime("%Z", now) - time["ampm"] = strftime("%p", now) - time["weeknum"] = strftime("%U", now) + 0 - time["altweeknum"] = strftime("%W", now) + 0 - - return ret -@} -@c endfile -@end example - -The string indices are easier to use and read than the various formats -required by @code{strftime}. The @code{alarm} program presented in -@ref{Alarm Program, ,An Alarm Clock Program}, -uses this function. -A more general design for the @code{gettimeofday} function would have -allowed the user to supply an optional timestamp value to use instead -of the current time. - -@node Data File Management, Getopt Function, General Functions, Library Functions -@section @value{DDF} Management - -This @value{SECTION} presents functions that are useful for managing -command-line datafiles. - -@menu -* Filetrans Function:: A function for handling data file transitions. -* Rewind Function:: A function for rereading the current file. -* File Checking:: Checking that data files are readable. -* Ignoring Assigns:: Treating assignments as file names. 
-@end menu - -@node Filetrans Function, Rewind Function, Data File Management, Data File Management -@subsection Noting @value{DDF} Boundaries - -@cindex per file initialization and cleanup -The @code{BEGIN} and @code{END} rules are each executed exactly once, at -the beginning and end of your @command{awk} program, respectively -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). -We (the @command{gawk} authors) once had a user who mistakenly thought that the -@code{BEGIN} rule is executed at the beginning of each @value{DF} and the -@code{END} rule is executed at the end of each @value{DF}. When informed -that this was not the case, the user requested that we add new special -patterns to @command{gawk}, named @code{BEGIN_FILE} and @code{END_FILE}, that -would have the desired behavior. He even supplied us the code to do so. - -Adding these special patterns to @command{gawk} wasn't necessary; -the job can be done cleanly in @command{awk} itself, as illustrated -by the following library program. -It arranges to call two user-supplied functions, @code{beginfile} and -@code{endfile}, at the beginning and end of each @value{DF}. -Besides solving the problem in only nine(!) lines of code, it does so -@emph{portably}; this works with any implementation of @command{awk}: - -@example -# transfile.awk -# -# Give the user a hook for filename transitions -# -# The user must supply functions beginfile() and endfile() -# that each take the name of the file being started or -# finished, respectively. -@c # -@c # Arnold Robbins, arnold@@gnu.org, Public Domain -@c # January 1992 - -FILENAME != _oldfilename \ -@{ - if (_oldfilename != "") - endfile(_oldfilename) - _oldfilename = FILENAME - beginfile(FILENAME) -@} - -END @{ endfile(FILENAME) @} -@end example - -This file must be loaded before the user's ``main'' program, so that the -rule it supplies is executed first. 
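
The loading order described above can be tried with a short shell session. This is only a sketch under stated assumptions: the hook bodies, the variable @code{nlines}, and the file names @file{main.awk}, @file{one.txt}, and @file{two.txt} are hypothetical and are not part of the library. Only the rule saved as @file{transfile.awk} comes from the text.

```shell
# Hypothetical usage sketch: hook bodies and data files are illustrative
# assumptions; only the transfile.awk rule comes from the manual.
dir=$(mktemp -d)

# The library rule shown above, saved as transfile.awk.
cat > "$dir/transfile.awk" <<'EOF'
FILENAME != _oldfilename {
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END { endfile(FILENAME) }
EOF

# A main program that supplies the two required hooks.
cat > "$dir/main.awk" <<'EOF'
function beginfile(name) { nlines = 0 }                      # reset per file
function endfile(name)   { print name ": " nlines " lines" } # report per file
{ nlines++ }
EOF

printf 'a\nb\n'    > "$dir/one.txt"
printf 'c\nd\ne\n' > "$dir/two.txt"

# The library comes first so its rule runs before the main program's rule.
out=$(cd "$dir" && awk -f transfile.awk -f main.awk one.txt two.txt)
printf '%s\n' "$out"
rm -rf "$dir"
```

With these two data files the run prints @samp{one.txt: 2 lines} followed by @samp{two.txt: 3 lines}. If @file{main.awk} were listed first, its counting rule would run before @code{beginfile} reset the counter, and the first record of each file would be lost from the count.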
- -This rule relies on @command{awk}'s @code{FILENAME} variable that -automatically changes for each new @value{DF}. The current @value{FN} is -saved in a private variable, @code{_oldfilename}. If @code{FILENAME} does -not equal @code{_oldfilename}, then a new @value{DF} is being processed and -it is necessary to call @code{endfile} for the old file. Because -@code{endfile} should only be called if a file has been processed, the -program first checks to make sure that @code{_oldfilename} is not the null -string. The program then assigns the current @value{FN} to -@code{_oldfilename} and calls @code{beginfile} for the file. -Because, like all @command{awk} variables, @code{_oldfilename} is -initialized to the null string, this rule executes correctly even for the -first @value{DF}. - -The program also supplies an @code{END} rule to do the final processing for -the last file. Because this @code{END} rule comes before any @code{END} rules -supplied in the ``main'' program, @code{endfile} is called first. Once -again the value of multiple @code{BEGIN} and @code{END} rules should be clear. - -@cindex @code{beginfile} user-defined function -@cindex @code{endfile} user-defined function -This version has the same problem as the first version of @code{nextfile} -(@pxref{Nextfile Function, ,Implementing @code{nextfile} as a Function}). -If the same @value{DF} occurs twice in a row on the command line, then -@code{endfile} and @code{beginfile} are not executed at the end of the -first pass and at the beginning of the second pass.
-The following version solves the problem: - -@example -@c file eg/lib/ftrans.awk -# ftrans.awk --- handle data file transitions -# -# user supplies beginfile() and endfile() functions -@c endfile -@ignore -@c file eg/lib/ftrans.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# November 1992 - -@c endfile -@end ignore -@c file eg/lib/ftrans.awk -FNR == 1 @{ - if (_filename_ != "") - endfile(_filename_) - _filename_ = FILENAME - beginfile(FILENAME) -@} - -END @{ endfile(_filename_) @} -@c endfile -@end example - -@ref{Wc Program, ,Counting Things}, -shows how this library function can be used and -how it simplifies writing the main program. - -@node Rewind Function, File Checking, Filetrans Function, Data File Management -@subsection Rereading the Current File - -Another request for a new built-in function was for a @code{rewind} -function that would make it possible to reread the current file. -The requesting user didn't want to have to use @code{getline} -(@pxref{Getline, , Explicit Input with @code{getline}}) -inside a loop. - -However, as long as you are not in the @code{END} rule, it is -quite easy to arrange to immediately close the current input file -and then start over with it from the top. 
-For lack of a better name, we'll call it @code{rewind}: - -@cindex @code{rewind} user-defined function -@example -@c file eg/lib/rewind.awk -# rewind.awk --- rewind the current file and start over -@c endfile -@ignore -@c file eg/lib/rewind.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# September 2000 - -@c endfile -@end ignore -@c file eg/lib/rewind.awk -function rewind( i) -@{ - # shift remaining arguments up - for (i = ARGC; i > ARGIND; i--) - ARGV[i] = ARGV[i-1] - - # make sure gawk knows to keep going - ARGC++ - - # make current file next to get done - ARGV[ARGIND+1] = FILENAME - - # do it - nextfile -@} -@c endfile -@end example - -This code relies on the @code{ARGIND} variable -(@pxref{Auto-set, ,Built-in Variables That Convey Information}), -which is specific to @command{gawk}. -If you are not using -@command{gawk}, you can use ideas presented in -@iftex -the previous @value{SECTION} -@end iftex -@ifnottex -@ref{Filetrans Function, ,Noting @value{DDF} Boundaries}, -@end ifnottex -to either update @code{ARGIND} on your own -or modify this code as appropriate. - -The @code{rewind} function also relies on the @code{nextfile} keyword -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). -@xref{Nextfile Function, ,Implementing @code{nextfile} as a Function}, -for a function version of @code{nextfile}. - -@node File Checking, Ignoring Assigns, Rewind Function, Data File Management -@subsection Checking for Readable @value{DDF}s - -@cindex fatal errors -@cindex readable @value{DF}s, checking -@cindex non-readable @value{DF}s, skipping -@cindex @value{DF}s, non-readable, skipping -@cindex @value{DF}s, readable, checking -Normally, if you give @command{awk} a @value{DF} that isn't readable, -it stops with a fatal error. There are times when you -might want to just ignore such files and keep going. 
You can -do this by prepending the following program to your @command{awk} -program: - -@cindex @code{readable.awk} program -@example -@c file eg/lib/readable.awk -# readable.awk --- library file to skip over unreadable files -@c endfile -@ignore -@c file eg/lib/readable.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# October 2000 - -@c endfile -@end ignore -@c file eg/lib/readable.awk -BEGIN @{ - for (i = 1; i < ARGC; i++) @{ - if (ARGV[i] ~ /^[A-Za-z_][A-Za-z0-9_]*=.*/ \ - || ARGV[i] == "-") - continue # assignment or standard input - else if ((getline junk < ARGV[i]) < 0) # unreadable - delete ARGV[i] - else - close(ARGV[i]) - @} -@} -@c endfile -@end example - -@cindex fatal errors -In @command{gawk}, the @code{getline} won't be fatal (unless -@option{--posix} is in force). -Removing the element from @code{ARGV} with @code{delete} -skips the file (since it's no longer in the list). - -@c This doesn't handle /dev/stdin etc. Not worth the hassle to mention or fix. - -@node Ignoring Assigns, , File Checking, Data File Management -@subsection Treating Assignments as @value{FFN}s - -Occasionally, you might not want @command{awk} to process command-line -variable assignments -(@pxref{Assignment Options, ,Assigning Variables on the Command Line}). -In particular, if you have @value{FN}s that contain an @samp{=} character, -@command{awk} treats the @value{FN} as an assignment, and does not process it. - -Some users have suggested an additional command-line option for @command{gawk} -to disable command-line assignments. 
However, some simple programming with -a library file does the trick: - -@cindex @code{noassign.awk} program -@example -@c file eg/lib/noassign.awk -# noassign.awk --- library file to avoid the need for a -# special option that disables command-line assignments -@c endfile -@ignore -@c file eg/lib/noassign.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# October 1999 - -@c endfile -@end ignore -@c file eg/lib/noassign.awk -function disable_assigns(argc, argv, i) -@{ - for (i = 1; i < argc; i++) - if (argv[i] ~ /^[A-Za-z_][A-Za-z_0-9]*=.*/) - argv[i] = ("./" argv[i]) -@} - -BEGIN @{ - if (No_command_assign) - disable_assigns(ARGC, ARGV) -@} -@c endfile -@end example - -You then run your program this way: - -@example -awk -v No_command_assign=1 -f noassign.awk -f yourprog.awk * -@end example - -The function works by looping through the arguments. -It prepends @samp{./} to -any argument that matches the form -of a variable assignment, turning that argument into a @value{FN}. - -The use of @code{No_command_assign} allows you to disable command-line -assignments at invocation time, by giving the variable a true value. -When not set, it is initially zero (i.e., false), so the command-line arguments -are left alone. - -@node Getopt Function, Passwd Functions, Data File Management, Library Functions -@section Processing Command-Line Options - -@cindex @code{getopt} C library function -@cindex processing arguments -@cindex argument processing -Most utilities on POSIX compatible systems take options, or ``switches,'' on -the command line that can be used to change the way a program behaves. -@command{awk} is an example of such a program -(@pxref{Options, ,Command-Line Options}). -Often, options take @dfn{arguments}; i.e., data that the program needs to -correctly obey the command-line option. For example, @command{awk}'s -@option{-F} option requires a string to use as the field separator. 
-The first occurrence on the command line of either @option{--} or a -string that does not begin with @samp{-} ends the options. - -Modern Unix systems provide a C function named @code{getopt} for processing -command-line arguments. The programmer provides a string describing the -one-letter options. If an option requires an argument, it is followed in the -string with a colon. @code{getopt} is also passed the -count and values of the command-line arguments and is called in a loop. -@code{getopt} processes the command-line arguments for option letters. -Each time around the loop, it returns a single character representing the -next option letter that it finds, or @samp{?} if it finds an invalid option. -When it returns @minus{}1, there are no options left on the command line. - -When using @code{getopt}, options that do not take arguments can be -grouped together. Furthermore, options that take arguments require that the -argument is present. The argument can immediately follow the option letter -or it can be a separate command-line argument. - -Given a hypothetical program that takes -three command-line options, @option{-a}, @option{-b}, and @option{-c}, where -@option{-b} requires an argument, all of the following are valid ways of -invoking the program: - -@example -prog -a -b foo -c data1 data2 data3 -prog -ac -bfoo -- data1 data2 data3 -prog -acbfoo data1 data2 data3 -@end example - -Notice that when the argument is grouped with its option, the rest of -the argument is considered to be the option's argument. -In this example, @option{-acbfoo} indicates that all of the -@option{-a}, @option{-b}, and @option{-c} options were supplied, -and that @samp{foo} is the argument to the @option{-b} option. - -@code{getopt} provides four external variables that the programmer can use: - -@table @code -@item optind -The index in the argument value array (@code{argv}) where the first -non-option command-line argument can be found. 
- -@item optarg -The string value of the argument to an option. - -@item opterr -Usually @code{getopt} prints an error message when it finds an invalid -option. Setting @code{opterr} to zero disables this feature. (An -application might want to print its own error message.) - -@item optopt -The letter representing the command-line option. -@c While not usually documented, most versions supply this variable. -@end table - -The following C fragment shows how @code{getopt} might process command-line -arguments for @command{awk}: - -@example -int -main(int argc, char *argv[]) -@{ - @dots{} - /* print our own message */ - opterr = 0; - while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) @{ - switch (c) @{ - case 'f': /* file */ - @dots{} - break; - case 'F': /* field separator */ - @dots{} - break; - case 'v': /* variable assignment */ - @dots{} - break; - case 'W': /* extension */ - @dots{} - break; - case '?': - default: - usage(); - break; - @} - @} - @dots{} -@} -@end example - -As a side point, @command{gawk} actually uses the GNU @code{getopt_long} -function to process both normal and GNU-style long options -(@pxref{Options, ,Command-Line Options}). - -The abstraction provided by @code{getopt} is very useful and is quite -handy in @command{awk} programs as well. Following is an @command{awk} -version of @code{getopt}. This function highlights one of the -greatest weaknesses in @command{awk}, which is that it is very poor at -manipulating single characters. Repeated calls to @code{substr} are -necessary for accessing individual characters -(@pxref{String Functions, ,String Manipulation Functions}).@footnote{This -function was written before @command{gawk} acquired the ability to -split strings into single characters using @code{""} as the separator. 
-We have left it alone, since using @code{substr} is more portable.} - -The discussion that follows walks through the code a bit at a time: - -@example -@c file eg/lib/getopt.awk -# getopt.awk --- do C library getopt(3) function in awk -@c endfile -@ignore -@c file eg/lib/getopt.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# -# Initial version: March, 1991 -# Revised: May, 1993 - -@c endfile -@end ignore -@c file eg/lib/getopt.awk -# External variables: -# Optind -- index in ARGV of first non-option argument -# Optarg -- string value of argument to current option -# Opterr -- if nonzero, print our own diagnostic -# Optopt -- current option letter - -# Returns: -# -1 at end of options -# ? for unrecognized option -# <c> a character representing the current option - -# Private Data: -# _opti -- index in multi-flag option, e.g., -abc -@c endfile -@end example - -The function starts out with -a list of the global variables it uses, -what the return values are, what they mean, and any global variables that -are ``private'' to this library function. Such documentation is essential -for any program, and particularly for library functions. - -The @code{getopt} function first checks that it was indeed called with a string of options -(the @code{options} parameter). If @code{options} has a zero length, -@code{getopt} immediately returns @minus{}1: - -@cindex @code{getopt} user-defined function -@example -@c file eg/lib/getopt.awk -function getopt(argc, argv, options, thisopt, i) -@{ - if (length(options) == 0) # no options given - return -1 - -@group - if (argv[Optind] == "--") @{ # all done - Optind++ - _opti = 0 - return -1 -@end group - @} else if (argv[Optind] !~ /^-[^: \t\n\f\r\v\b]/) @{ - _opti = 0 - return -1 - @} -@c endfile -@end example - -The next thing to check for is the end of the options. A @option{--} -ends the command-line options, as does any command-line argument that -does not begin with a @samp{-}. 
@code{Optind} is used to step through -the array of command-line arguments; it retains its value across calls -to @code{getopt}, because it is a global variable. - -The regular expression that is used, @code{@w{/^-[^: \t\n\f\r\v\b]/}}, is -perhaps a bit of overkill; it checks for a @samp{-} followed by anything -that is not whitespace and not a colon. -If the current command-line argument does not match this pattern, -it is not an option, and it ends option processing. - -@example -@c file eg/lib/getopt.awk - if (_opti == 0) - _opti = 2 - thisopt = substr(argv[Optind], _opti, 1) - Optopt = thisopt - i = index(options, thisopt) - if (i == 0) @{ - if (Opterr) - printf("%c -- invalid option\n", - thisopt) > "/dev/stderr" - if (_opti >= length(argv[Optind])) @{ - Optind++ - _opti = 0 - @} else - _opti++ - return "?" - @} -@c endfile -@end example - -The @code{_opti} variable tracks the position in the current command-line -argument (@code{argv[Optind]}). If multiple options are -grouped together with one @samp{-} (e.g., @option{-abx}), it is necessary -to return them to the user one at a time. - -If @code{_opti} is equal to zero, it is set to two, which is the index in -the string of the next character to look at (we skip the @samp{-}, which -is at position one). The variable @code{thisopt} holds the character, -obtained with @code{substr}. It is saved in @code{Optopt} for the main -program to use. - -If @code{thisopt} is not in the @code{options} string, then it is an -invalid option. If @code{Opterr} is nonzero, @code{getopt} prints an error -message on the standard error that is similar to the message from the C -version of @code{getopt}. - -Because the option is invalid, it is necessary to skip it and move on to the -next option character. If @code{_opti} is greater than or equal to the -length of the current command-line argument, it is necessary to move on -to the next argument, so @code{Optind} is incremented and @code{_opti} is reset -to zero. 
Otherwise, @code{Optind} is left alone and @code{_opti} is merely -incremented. - -In any case, because the option is invalid, @code{getopt} returns @samp{?}. -The main program can examine @code{Optopt} if it needs to know what the -invalid option letter actually is. Continuing on: - -@example -@c file eg/lib/getopt.awk - if (substr(options, i + 1, 1) == ":") @{ - # get option argument - if (length(substr(argv[Optind], _opti + 1)) > 0) - Optarg = substr(argv[Optind], _opti + 1) - else - Optarg = argv[++Optind] - _opti = 0 - @} else - Optarg = "" -@c endfile -@end example - -If the option requires an argument, the option letter is followed by a colon -in the @code{options} string. If there are remaining characters in the -current command-line argument (@code{argv[Optind]}), then the rest of that -string is assigned to @code{Optarg}. Otherwise, the next command-line -argument is used (@samp{-xFOO} vs.@: @samp{@w{-x FOO}}). In either case, -@code{_opti} is reset to zero, because there are no more characters left to -examine in the current command-line argument. Continuing: - -@example -@c file eg/lib/getopt.awk - if (_opti == 0 || _opti >= length(argv[Optind])) @{ - Optind++ - _opti = 0 - @} else - _opti++ - return thisopt -@} -@c endfile -@end example - -Finally, if @code{_opti} is either zero or greater than or equal to the length of the -current command-line argument, it means this element in @code{argv} is -through being processed, so @code{Optind} is incremented to point to the -next element in @code{argv}. If neither condition is true, then only -@code{_opti} is incremented, so that the next option letter can be processed -on the next call to @code{getopt}. - -The @code{BEGIN} rule initializes both @code{Opterr} and @code{Optind} to one. -@code{Opterr} is set to one, since the default behavior is for @code{getopt} -to print a diagnostic message upon seeing an invalid option. 
@code{Optind} -is set to one, since there's no reason to look at the program name, which is -in @code{ARGV[0]}: - -@example -@c file eg/lib/getopt.awk -BEGIN @{ - Opterr = 1 # default is to diagnose - Optind = 1 # skip ARGV[0] - - # test program - if (_getopt_test) @{ - while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1) - printf("c = <%c>, optarg = <%s>\n", - _go_c, Optarg) - printf("non-option arguments:\n") - for (; Optind < ARGC; Optind++) - printf("\tARGV[%d] = <%s>\n", - Optind, ARGV[Optind]) - @} -@} -@c endfile -@end example - -The rest of the @code{BEGIN} rule is a simple test program. Here is the -result of two sample runs of the test program: - -@example -$ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x -@print{} c = <a>, optarg = <> -@print{} c = <c>, optarg = <> -@print{} c = <b>, optarg = <ARG> -@print{} non-option arguments: -@print{} ARGV[3] = <bax> -@print{} ARGV[4] = <-x> - -$ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc -@print{} c = <a>, optarg = <> -@error{} x -- invalid option -@print{} c = <?>, optarg = <> -@print{} non-option arguments: -@print{} ARGV[4] = <xyz> -@print{} ARGV[5] = <abc> -@end example - -In both runs, -the first @option{--} terminates the arguments to @command{awk}, so that it does -not try to interpret the @option{-a}, etc., as its own options. -Several of the sample programs presented in -@ref{Sample Programs, ,Practical @command{awk} Programs}, -use @code{getopt} to process their arguments. - -@node Passwd Functions, Group Functions, Getopt Function, Library Functions -@section Reading the User Database - -The @code{PROCINFO} array -(@pxref{Built-in Variables}) -provides access to the current user's real and effective user and group id -numbers, and if available, the user's supplementary group set. -However, because these are numbers, they do not provide very useful -information to the average user. There needs to be some way to find the -user information associated with the user and group numbers. 
This -@value{SECTION} presents a suite of functions for retrieving information from the -user database. @xref{Group Functions, ,Reading the Group Database}, -for a similar suite that retrieves information from the group database. - -@cindex @code{getpwent} C library function -@cindex user information -@cindex login information -@cindex account information -@cindex password file -The POSIX standard does not define the file where user information is -kept. Instead, it provides the @code{<pwd.h>} header file -and several C language subroutines for obtaining user information. -The primary function is @code{getpwent}, for ``get password entry.'' -The ``password'' comes from the original user database file, -@file{/etc/passwd}, which stores user information, along with the -encrypted passwords (hence the name). - -@cindex @command{pwcat} program -While an @command{awk} program could simply read @file{/etc/passwd} -directly, this file may not contain complete information about the -system's set of users.@footnote{It is often the case that password -information is stored in a network database.} To be sure you are able to -produce a readable and complete version of the user database, it is necessary -to write a small C program that calls @code{getpwent}. @code{getpwent} -is defined as returning a pointer to a @code{struct passwd}. Each time it -is called, it returns the next entry in the database. When there are -no more entries, it returns @code{NULL}, the null pointer. When this -happens, the C program should call @code{endpwent} to close the database. -Following is @command{pwcat}, a C program that ``cats'' the password database. - -@c Use old style function header for portability to old systems (SunOS, HP/UX). 
- -@example -@c file eg/lib/pwcat.c -/* - * pwcat.c - * - * Generate a printable version of the password database - */ -@c endfile -@ignore -@c file eg/lib/pwcat.c -/* - * Arnold Robbins, arnold@@gnu.org, May 1993 - * Public Domain - */ - -@c endfile -@end ignore -@c file eg/lib/pwcat.c -#include <stdio.h> -#include <pwd.h> - -int -main(argc, argv) -int argc; -char **argv; -@{ - struct passwd *p; - - /* the id types vary by system; print them as long */ - while ((p = getpwent()) != NULL) - printf("%s:%s:%ld:%ld:%s:%s:%s\n", - p->pw_name, p->pw_passwd, (long) p->pw_uid, - (long) p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell); - - endpwent(); - return 0; -@} -@c endfile -@end example - -If you don't understand C, don't worry about it. -The output from @command{pwcat} is the user database, in the traditional -@file{/etc/passwd} format of colon-separated fields. The fields are: - -@ignore -@table @asis -@item Login name -The user's login name. - -@item Encrypted password -The user's encrypted password. This may not be available on some systems. - -@item User-ID -The user's numeric user-id number. - -@item Group-ID -The user's numeric group-id number. - -@item Full name -The user's full name, and perhaps other information associated with the -user. - -@item Home directory -The user's login (or ``home'') directory (familiar to shell programmers as -@code{$HOME}). - -@item Login shell -The program that is run when the user logs in. This is usually a -shell, such as @command{bash}. -@end table -@end ignore - -@multitable {Encrypted password} {1234567890123456789012345678901234567890123456} -@item Login name @tab The user's login name. - -@item Encrypted password @tab The user's encrypted password. This may not be available on some systems. - -@item User-ID @tab The user's numeric user-id number. - -@item Group-ID @tab The user's numeric group-id number. - -@item Full name @tab The user's full name, and perhaps other information associated with the -user. 
- -@item Home directory @tab The user's login (or ``home'') directory (familiar to shell programmers as -@code{$HOME}). - -@item Login shell @tab The program that is run when the user logs in. This is usually a -shell, such as @command{bash}. -@end multitable - -A few lines representative of @command{pwcat}'s output are as follows: - -@cindex Jacobs, Andrew -@cindex Robbins, Arnold -@cindex Robbins, Miriam -@example -$ pwcat -@print{} root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh -@print{} nobody:*:65534:65534::/: -@print{} daemon:*:1:1::/: -@print{} sys:*:2:2::/:/bin/csh -@print{} bin:*:3:3::/bin: -@print{} arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh -@print{} miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh -@print{} andy:abcca2:113:10:Andy Jacobs:/home/andy:/bin/sh -@dots{} -@end example - -With that introduction, following is a group of functions for getting user -information. There are several functions here, corresponding to the C -functions of the same names: - -@c Exercise: simplify all these functions that return values. -@c Answer: return foo[key] returns "" if key not there, no need to check with `in'. 
- -@cindex @code{_pw_init} user-defined function -@example -@c file eg/lib/passwdawk.in -# passwd.awk --- access password file information -@c endfile -@ignore -@c file eg/lib/passwdawk.in -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 -# Revised October 2000 - -@c endfile -@end ignore -@c file eg/lib/passwdawk.in -BEGIN @{ - # tailor this to suit your system - _pw_awklib = "/usr/local/libexec/awk/" -@} - -function _pw_init( oldfs, oldrs, olddol0, pwcat, using_fw) -@{ - if (_pw_inited) - return - - oldfs = FS - oldrs = RS - olddol0 = $0 - using_fw = (PROCINFO["FS"] == "FIELDWIDTHS") - FS = ":" - RS = "\n" - - pwcat = _pw_awklib "pwcat" - while ((pwcat | getline) > 0) @{ - _pw_byname[$1] = $0 - _pw_byuid[$3] = $0 - _pw_bycount[++_pw_total] = $0 - @} - close(pwcat) - _pw_count = 0 - _pw_inited = 1 - FS = oldfs - if (using_fw) - FIELDWIDTHS = FIELDWIDTHS - RS = oldrs - $0 = olddol0 -@} -@c endfile -@end example - -The @code{BEGIN} rule sets a private variable to the directory where -@command{pwcat} is stored. Because it is used to help out an @command{awk} library -routine, we have chosen to put it in @file{/usr/local/libexec/awk}; -however, you might want it to be in a different directory on your system. - -The function @code{_pw_init} keeps three copies of the user information -in three associative arrays. The arrays are indexed by username -(@code{_pw_byname}), by user-id number (@code{_pw_byuid}), and by order of -occurrence (@code{_pw_bycount}). -The variable @code{_pw_inited} is used for efficiency; @code{_pw_init} -needs only to be called once. - -Because this function uses @code{getline} to read information from -@command{pwcat}, it first saves the values of @code{FS}, @code{RS}, and @code{$0}. -It notes in the variable @code{using_fw} whether field splitting -with @code{FIELDWIDTHS} is in effect or not. 
-Doing so is necessary, since these functions could be called -from anywhere within a user's program, and the user may have his -or her -own way of splitting records and fields. - -The @code{using_fw} variable checks @code{PROCINFO["FS"]}, which -is @code{"FIELDWIDTHS"} if field splitting is being done with -@code{FIELDWIDTHS}. This makes it possible to restore the correct -field-splitting mechanism later. The test can only be true for -@command{gawk}. It is false if using @code{FS} or on some other -@command{awk} implementation. - -The main part of the function uses a loop to read database lines, split -the line into fields, and then store the line into each array as necessary. -When the loop is done, @code{@w{_pw_init}} cleans up by closing the pipeline, -setting @code{@w{_pw_inited}} to one, and restoring @code{FS} (and @code{FIELDWIDTHS} -if necessary), @code{RS}, and @code{$0}. -The use of @code{@w{_pw_count}} is explained shortly. - -@c NEXT ED: All of these functions don't need the ... in ... test. Just -@c return the array element, which will be "" if not already there. Duh. -The @code{getpwnam} function takes a username as a string argument. If that -user is in the database, it returns the appropriate line. Otherwise it -returns the null string: - -@cindex @code{getpwnam} user-defined function -@example -@group -@c file eg/lib/passwdawk.in -function getpwnam(name) -@{ - _pw_init() - if (name in _pw_byname) - return _pw_byname[name] - return "" -@} -@c endfile -@end group -@end example - -Similarly, -the @code{getpwuid} function takes a user-id number argument. If that -user number is in the database, it returns the appropriate line. 
Otherwise it -returns the null string: - -@cindex @code{getpwuid} user-defined function -@example -@c file eg/lib/passwdawk.in -function getpwuid(uid) -@{ - _pw_init() - if (uid in _pw_byuid) - return _pw_byuid[uid] - return "" -@} -@c endfile -@end example - -The @code{getpwent} function simply steps through the database, one entry at -a time. It uses @code{_pw_count} to track its current position in the -@code{_pw_bycount} array: - -@cindex @code{getpwent} user-defined function -@example -@c file eg/lib/passwdawk.in -function getpwent() -@{ - _pw_init() - if (_pw_count < _pw_total) - return _pw_bycount[++_pw_count] - return "" -@} -@c endfile -@end example - -The @code{@w{endpwent}} function resets @code{@w{_pw_count}} to zero, so that -subsequent calls to @code{getpwent} start over again: - -@cindex @code{endpwent} user-defined function -@example -@c file eg/lib/passwdawk.in -function endpwent() -@{ - _pw_count = 0 -@} -@c endfile -@end example - -A conscious design decision in this suite is that each subroutine calls -@code{@w{_pw_init}} to initialize the database arrays. The overhead of running -a separate process to generate the user database, and the I/O to scan it, -are only incurred if the user's main program actually calls one of these -functions. If this library file is loaded along with a user's program, but -none of the routines are ever called, then there is no extra runtime overhead. -(The alternative is to move the body of @code{@w{_pw_init}} into a -@code{BEGIN} rule, which always runs @command{pwcat}. This simplifies the -code but runs an extra process that may never be needed.) - -In turn, calling @code{_pw_init} is not too expensive, because the -@code{_pw_inited} variable keeps the program from reading the data more than -once. If you are worried about squeezing every last cycle out of your -@command{awk} program, the check of @code{_pw_inited} could be moved out of -@code{_pw_init} and duplicated in all the other functions. 
In practice, -this is not necessary, since most @command{awk} programs are I/O-bound, and it -clutters up the code. - -The @command{id} program in @ref{Id Program, ,Printing out User Information}, -uses these functions. - -@node Group Functions, , Passwd Functions, Library Functions -@section Reading the Group Database - -@cindex @code{getgrent} C library function -@cindex group information -@cindex account information -@cindex group file -Much of the discussion presented in -@ref{Passwd Functions, ,Reading the User Database}, -applies to the group database as well. Although there has traditionally -been a well-known file (@file{/etc/group}) in a well-known format, the POSIX -standard only provides a set of C library routines -(@code{<grp.h>} and @code{getgrent}) -for accessing the information. -Even though this file may exist, it likely does not have -complete information. Therefore, as with the user database, it is necessary -to have a small C program that generates the group database as its output. - -@cindex @command{grcat} program -@command{grcat}, a C program that ``cats'' the group database, -is as follows: - -@example -@c file eg/lib/grcat.c -/* - * grcat.c - * - * Generate a printable version of the group database - */ -@c endfile -@ignore -@c file eg/lib/grcat.c -/* - * Arnold Robbins, arnold@@gnu.org, May 1993 - * Public Domain - */ - -@c endfile -@end ignore -@c file eg/lib/grcat.c -#include <stdio.h> -#include <grp.h> - -int -main(argc, argv) -int argc; -char **argv; -@{ - struct group *g; - int i; - - while ((g = getgrent()) != NULL) @{ - /* the id type varies by system; print it as long */ - printf("%s:%s:%ld:", g->gr_name, g->gr_passwd, - (long) g->gr_gid); - for (i = 0; g->gr_mem[i] != NULL; i++) @{ - printf("%s", g->gr_mem[i]); -@group - if (g->gr_mem[i+1] != NULL) - putchar(','); - @} -@end group - putchar('\n'); - @} - endgrent(); - return 0; -@} -@c endfile -@end example - -Each line in the group database represents one group. 
The fields are -separated with colons and represent the following information: - -@ignore -@table @asis -@item Group Name -The name of the group. - -@item Group Password -The encrypted group password. In practice, this field is never used. It is -usually empty or set to @samp{*}. - -@item Group ID Number -The numeric group-id number. This number should be unique within the file. - -@item Group Member List -A comma-separated list of usernames. These users are members of the group. -Modern Unix systems allow users to be members of several groups -simultaneously. If your system does, then there are elements -@code{"group1"} through @code{"group@var{N}"} in @code{PROCINFO} -for those group-id numbers. -(Note that @code{PROCINFO} is a @command{gawk} extension; -@pxref{Built-in Variables}.) -@end table -@end ignore - -@multitable {Encrypted password} {1234567890123456789012345678901234567890123456} -@item Group name @tab The group's name. - -@item Group password @tab The group's encrypted password. In practice, this field is never used; -it is usually empty or set to @samp{*}. - -@item Group-ID @tab -The group's numeric group-id number; this number should be unique within the file. - -@item Group member list @tab -A comma-separated list of usernames. These users are members of the group. -Modern Unix systems allow users to be members of several groups -simultaneously. If your system does, then there are elements -@code{"group1"} through @code{"group@var{N}"} in @code{PROCINFO} -for those group-id numbers. -(Note that @code{PROCINFO} is a @command{gawk} extension; -@pxref{Built-in Variables}.) -@end multitable - -Here is what running @command{grcat} might produce: - -@example -$ grcat -@print{} wheel:*:0:arnold -@print{} nogroup:*:65534: -@print{} daemon:*:1: -@print{} kmem:*:2: -@print{} staff:*:10:arnold,miriam,andy -@print{} other:*:20: -@dots{} -@end example - -Here are the functions for obtaining information from the group database. 
-There are several, modeled after the C library functions of the same names: - -@cindex @code{_gr_init} user-defined function -@example -@c file eg/lib/groupawk.in -# group.awk --- functions for dealing with the group file -@c endfile -@ignore -@c file eg/lib/groupawk.in -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 -# Revised October 2000 - -@c endfile -@end ignore -@c line break on _gr_init for smallbook -@c file eg/lib/groupawk.in -BEGIN \ -@{ - # Change to suit your system - _gr_awklib = "/usr/local/libexec/awk/" -@} - -function _gr_init( oldfs, oldrs, olddol0, grcat, - using_fw, n, a, i) -@{ - if (_gr_inited) - return - - oldfs = FS - oldrs = RS - olddol0 = $0 - using_fw = (PROCINFO["FS"] == "FIELDWIDTHS") - FS = ":" - RS = "\n" - - grcat = _gr_awklib "grcat" - while ((grcat | getline) > 0) @{ - if ($1 in _gr_byname) - _gr_byname[$1] = _gr_byname[$1] "," $4 - else - _gr_byname[$1] = $0 - if ($3 in _gr_bygid) - _gr_bygid[$3] = _gr_bygid[$3] "," $4 - else - _gr_bygid[$3] = $0 - - n = split($4, a, "[ \t]*,[ \t]*") - for (i = 1; i <= n; i++) - if (a[i] in _gr_groupsbyuser) - _gr_groupsbyuser[a[i]] = \ - _gr_groupsbyuser[a[i]] " " $1 - else - _gr_groupsbyuser[a[i]] = $1 - - _gr_bycount[++_gr_count] = $0 - @} - close(grcat) - _gr_count = 0 - _gr_inited++ - FS = oldfs - if (using_fw) - FIELDWIDTHS = FIELDWIDTHS - RS = oldrs - $0 = olddol0 -@} -@c endfile -@end example - -The @code{BEGIN} rule sets a private variable to the directory where -@command{grcat} is stored. Because it is used to help out an @command{awk} library -routine, we have chosen to put it in @file{/usr/local/libexec/awk}. You might -want it to be in a different directory on your system. - -These routines follow the same general outline as the user database routines -(@pxref{Passwd Functions, ,Reading the User Database}). -The @code{@w{_gr_inited}} variable is used to -ensure that the database is scanned no more than once. 
-The @code{@w{_gr_init}} function first saves @code{FS}, @code{FIELDWIDTHS}, @code{RS}, and -@code{$0}, and then sets @code{FS} and @code{RS} to the correct values for -scanning the group information. - -The group information is stored in several associative arrays. -The arrays are indexed by group name (@code{@w{_gr_byname}}), by group-id number -(@code{@w{_gr_bygid}}), and by position in the database (@code{@w{_gr_bycount}}). -There is an additional array indexed by username (@code{@w{_gr_groupsbyuser}}), -whose values are space-separated lists of the groups that each user belongs to. - -Unlike the user database, it is possible to have multiple records in the -database for the same group. This is common when a group has a large number -of members. A pair of such entries might look like the following: - -@example -tvpeople:*:101:johnny,jay,arsenio -tvpeople:*:101:david,conan,tom,joan -@end example - -For this reason, @code{_gr_init} looks to see if a group name or -group-id number has already been seen. If it has, then the usernames are -simply concatenated onto the previous list of users. (There is actually a -subtle problem with the code just presented. Suppose that -the first record for a group lists no members. This code then adds the names -from later records with a leading comma. It also doesn't check that there is a @code{$4}.) - -Finally, @code{_gr_init} closes the pipeline to @command{grcat}, restores -@code{FS} (and @code{FIELDWIDTHS} if necessary), @code{RS}, and @code{$0}, -initializes @code{_gr_count} to zero -(it is used later), and makes @code{_gr_inited} nonzero. - -The @code{getgrnam} function takes a group name as its argument, and if that -group exists, it is returned. 
Otherwise, @code{getgrnam} returns the null -string: - -@cindex @code{getgrnam} user-defined function -@example -@c file eg/lib/groupawk.in -function getgrnam(group) -@{ - _gr_init() - if (group in _gr_byname) - return _gr_byname[group] - return "" -@} -@c endfile -@end example - -The @code{getgrgid} function is similar; it takes a numeric group-id and -looks up the information associated with that group-id: - -@cindex @code{getgrgid} user-defined function -@example -@c file eg/lib/groupawk.in -function getgrgid(gid) -@{ - _gr_init() - if (gid in _gr_bygid) - return _gr_bygid[gid] - return "" -@} -@c endfile -@end example - -The @code{getgruser} function does not have a C counterpart. It takes a -username and returns the list of groups that have the user as a member: - -@cindex @code{getgruser} user-defined function -@example -@c file eg/lib/groupawk.in -function getgruser(user) -@{ - _gr_init() - if (user in _gr_groupsbyuser) - return _gr_groupsbyuser[user] - return "" -@} -@c endfile -@end example - -The @code{getgrent} function steps through the database one entry at a time. -It uses @code{_gr_count} to track its position in the list: - -@cindex @code{getgrent} user-defined function -@example -@c file eg/lib/groupawk.in -function getgrent() -@{ - _gr_init() - if (++_gr_count in _gr_bycount) - return _gr_bycount[_gr_count] - return "" -@} -@c endfile -@end example - -The @code{endgrent} function resets @code{_gr_count} to zero so that @code{getgrent} can -start over again: - -@cindex @code{endgrent} user-defined function -@example -@c file eg/lib/groupawk.in -function endgrent() -@{ - _gr_count = 0 -@} -@c endfile -@end example - -As with the user database routines, each function calls @code{_gr_init} to -initialize the arrays. Doing so incurs the extra overhead of running -@command{grcat} only if these functions are used (as opposed to moving the body of -@code{_gr_init} into a @code{BEGIN} rule). 
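The subtle problem noted earlier in the discussion of @code{_gr_init} (a leading comma when a group's first record lists no members, and no check that @code{$4} is nonempty) could be avoided along the following lines. This is only a sketch under stated assumptions: the shell function name @code{merge_groups}, the array names, and the sample records are hypothetical, and the merge logic is a simplified stand-in for, not a copy of, the code in @code{_gr_init}. It reads records from standard input instead of running @command{grcat}, so that it can be tried on its own:

```shell
# merge_groups: hypothetical helper, not part of group.awk.
# Reads group records (name:passwd:gid:members) on standard input and
# prints one merged record per group, in order of first appearance.
merge_groups() {
    awk -F: '
    {
        if (!($1 in members)) {       # first record for this group
            order[++n] = $1
            pw[$1] = $2
            gid[$1] = $3
            members[$1] = $4
        } else if ($4 != "") {        # later record that lists members
            if (members[$1] == "")
                members[$1] = $4      # nothing stored yet: no comma
            else
                members[$1] = members[$1] "," $4
        }
    }
    END {
        for (i = 1; i <= n; i++)
            printf("%s:%s:%s:%s\n", order[i], pw[order[i]],
                   gid[order[i]], members[order[i]])
    }'
}

# Sample input: the first tvpeople record has an empty member list.
printf '%s\n' 'tvpeople:*:101:' 'tvpeople:*:101:david,conan' \
    'staff:*:10:arnold' | merge_groups
```

With this sample input, the two @samp{tvpeople} records collapse into a single record whose member list is @samp{david,conan}, with no leading comma. The same two tests (is anything stored yet, and is @code{$4} nonempty) would have to be applied to each array that @code{_gr_init} concatenates onto.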
- -Most of the work is in scanning the database and building the various -associative arrays. The functions that the user calls are themselves very -simple, relying on @command{awk}'s associative arrays to do the work. - -The @command{id} program in @ref{Id Program, ,Printing out User Information}, -uses these functions. - -@node Sample Programs, Language History, Library Functions, Top -@chapter Practical @command{awk} Programs - -@ref{Library Functions, ,A Library of @command{awk} Functions}, -presents the idea that reading programs in a language contributes to -learning that language. This @value{CHAPTER} continues that theme, -presenting a potpourri of @command{awk} programs for your reading -enjoyment. -@ifnotinfo -There are three sections. -The first describes how to run the programs presented -in this @value{CHAPTER}. - -The second presents @command{awk} -versions of several common POSIX utilities. -These are programs that you are, we hope, already familiar with, -and whose problems are therefore understood. -By reimplementing these programs in @command{awk}, -you can focus on the @command{awk}-related aspects of solving -the programming problem. - -The third is a grab bag of interesting programs. -These solve a number of different data-manipulation and management -problems. Many of the programs are short, which emphasizes @command{awk}'s -ability to do a lot in just a few lines of code. -@end ifnotinfo - -Many of these programs use the library functions presented in -@ref{Library Functions, ,A Library of @command{awk} Functions}. - -@menu -* Running Examples:: How to run these examples. -* Clones:: Clones of common utilities. -* Miscellaneous Programs:: Some interesting @command{awk} programs. 
-@end menu - -@node Running Examples, Clones, Sample Programs, Sample Programs -@section Running the Example Programs - -To run a given program, you would typically do something like this: - -@example -awk -f @var{program} -- @var{options} @var{files} -@end example - -@noindent -Here, @var{program} is the name of the @command{awk} program (such as -@file{cut.awk}), @var{options} are any command-line options for the -program that start with a @samp{-}, and @var{files} are the actual @value{DF}s. - -If your system supports the @samp{#!} executable interpreter mechanism -(@pxref{Executable Scripts, , Executable @command{awk} Programs}), -you can instead run your program directly: - -@example -cut.awk -c1-8 myfiles > results -@end example - -If your @command{awk} is not @command{gawk}, you may instead need to use this: - -@example -cut.awk -- -c1-8 myfiles > results -@end example - -@node Clones, Miscellaneous Programs, Running Examples, Sample Programs -@section Reinventing Wheels for Fun and Profit - -This @value{SECTION} presents a number of POSIX utilities that are implemented in -@command{awk}. Reinventing these programs in @command{awk} is often enjoyable, -because the algorithms can be very clearly expressed, and the code is usually -very concise and simple. This is true because @command{awk} does so much for you. - -It should be noted that these programs are not necessarily intended to -replace the installed versions on your system. Instead, their -purpose is to illustrate @command{awk} language programming for ``real world'' -tasks. - -The programs are presented in alphabetical order. - -@menu -* Cut Program:: The @command{cut} utility. -* Egrep Program:: The @command{egrep} utility. -* Id Program:: The @command{id} utility. -* Split Program:: The @command{split} utility. -* Tee Program:: The @command{tee} utility. -* Uniq Program:: The @command{uniq} utility. -* Wc Program:: The @command{wc} utility. 
-@end menu - -@node Cut Program, Egrep Program, Clones, Clones -@subsection Cutting out Fields and Columns - -@cindex @command{cut} utility -The @command{cut} utility selects, or ``cuts,'' characters or fields -from its standard input and sends them to its standard output. -Fields are separated by tabs by default, -but you may supply a command-line option to change the field -@dfn{delimiter} (i.e., the field separator character). @command{cut}'s -definition of fields is less general than @command{awk}'s. - -A common use of @command{cut} might be to pull out just the login name of -logged-on users from the output of @command{who}. For example, the following -pipeline generates a sorted, unique list of the logged-on users: - -@example -who | cut -c1-8 | sort | uniq -@end example - -The options for @command{cut} are: - -@table @code -@item -c @var{list} -Use @var{list} as the list of characters to cut out. Items within the list -may be separated by commas, and ranges of characters can be separated with -dashes. The list @samp{1-8,15,22-35} specifies characters 1 through -8, 15, and 22 through 35. - -@item -f @var{list} -Use @var{list} as the list of fields to cut out. - -@item -d @var{delim} -Use @var{delim} as the field separator character instead of the tab -character. - -@item -s -Suppress printing of lines that do not contain the field delimiter. -@end table - -The @command{awk} implementation of @command{cut} uses the @code{getopt} library -function (@pxref{Getopt Function, ,Processing Command-Line Options}) -and the @code{join} library function -(@pxref{Join Function, ,Merging an Array into a String}). - -The program begins with a comment describing the options, the library -functions needed, and a @code{usage} function that prints out a usage -message and exits. 
@code{usage} is called if invalid arguments are -supplied: - -@cindex @code{cut.awk} program -@example -@c file eg/prog/cut.awk -# cut.awk --- implement cut in awk -@c endfile -@ignore -@c file eg/prog/cut.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 - -@c endfile -@end ignore -@c file eg/prog/cut.awk -# Options: -# -f list Cut fields -# -d c Field delimiter character -# -c list Cut characters -# -# -s Suppress lines without the delimiter -# -# Requires getopt and join library functions - -@group -function usage( e1, e2) -@{ - e1 = "usage: cut [-f list] [-d c] [-s] [files...]" - e2 = "usage: cut [-c list] [files...]" - print e1 > "/dev/stderr" - print e2 > "/dev/stderr" - exit 1 -@} -@end group -@c endfile -@end example - -@noindent -The variables @code{e1} and @code{e2} are used so that the function -fits nicely on the -@ifnotinfo -page. -@end ifnotinfo -@ifnottex -screen. -@end ifnottex - -Next comes a @code{BEGIN} rule that parses the command-line options. -It sets @code{FS} to a single tab character, because that is @command{cut}'s -default field separator. The output field separator is also set to be the -same as the input field separator. Then @code{getopt} is used to step -through the command-line options. One or the other of the variables -@code{by_fields} or @code{by_chars} is set to true, to indicate that -processing should be done by fields or by characters, respectively. -When cutting by characters, the output field separator is set to the null -string. 
- -@example -@c file eg/prog/cut.awk -BEGIN \ -@{ - FS = "\t" # default - OFS = FS - while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) @{ - if (c == "f") @{ - by_fields = 1 - fieldlist = Optarg - @} else if (c == "c") @{ - by_chars = 1 - fieldlist = Optarg - OFS = "" - @} else if (c == "d") @{ - if (length(Optarg) > 1) @{ - printf("Using first character of %s" \ - " for delimiter\n", Optarg) > "/dev/stderr" - Optarg = substr(Optarg, 1, 1) - @} - FS = Optarg - OFS = FS - if (FS == " ") # defeat awk semantics - FS = "[ ]" - @} else if (c == "s") - suppress++ - else - usage() - @} - - for (i = 1; i < Optind; i++) - ARGV[i] = "" -@c endfile -@end example - -Special care is taken when the field delimiter is a space. Using -a single space (@code{@w{" "}}) for the value of @code{FS} is -incorrect---@command{awk} would separate fields with runs of spaces, -tabs, and/or newlines, and we want them to be separated with individual -spaces. Also, note that after @code{getopt} is through, we have to -clear out all the elements of @code{ARGV} from 1 to @code{Optind}, -so that @command{awk} does not try to process the command-line options -as @value{FN}s. - -After dealing with the command-line options, the program verifies that the -options make sense. Only one or the other of @option{-c} and @option{-f} -should be used, and both require a field list. Then the program calls -either @code{set_fieldlist} or @code{set_charlist} to pull apart the -list of fields or characters: - -@example -@c file eg/prog/cut.awk - if (by_fields && by_chars) - usage() - - if (by_fields == 0 && by_chars == 0) - by_fields = 1 # default - - if (fieldlist == "") @{ - print "cut: needs list for -c or -f" > "/dev/stderr" - exit 1 - @} - - if (by_fields) - set_fieldlist() - else - set_charlist() -@} -@c endfile -@end example - -@code{set_fieldlist} is used to split the field list apart at the commas, -and into an array. 
Then, for each element of the array, it looks to -see if it is actually a range, and if so, splits it apart. The range -is verified to make sure the first number is smaller than the second. -Each number in the list is added to the @code{flist} array, which -simply lists the fields that will be printed. Normal field splitting -is used. The program lets @command{awk} handle the job of doing the -field splitting: - -@example -@c file eg/prog/cut.awk -function set_fieldlist( n, m, i, j, k, f, g) -@{ - n = split(fieldlist, f, ",") - j = 1 # index in flist - for (i = 1; i <= n; i++) @{ - if (index(f[i], "-") != 0) @{ # a range - m = split(f[i], g, "-") -@group - if (m != 2 || g[1] >= g[2]) @{ - printf("bad field list: %s\n", - f[i]) > "/dev/stderr" - exit 1 - @} -@end group - for (k = g[1]; k <= g[2]; k++) - flist[j++] = k - @} else - flist[j++] = f[i] - @} - nfields = j - 1 -@} -@c endfile -@end example - -The @code{set_charlist} function is more complicated than @code{set_fieldlist}. -The idea here is to use @command{gawk}'s @code{FIELDWIDTHS} variable -(@pxref{Constant Size, ,Reading Fixed-Width Data}), -which describes constant width input. When using a character list, that is -exactly what we have. - -Setting up @code{FIELDWIDTHS} is more complicated than simply listing the -fields that need to be printed. We have to keep track of the fields to -print and also the intervening characters that have to be skipped. -For example, suppose you wanted characters 1 through 8, 15, and -22 through 35. You would use @samp{-c 1-8,15,22-35}. The necessary value -for @code{FIELDWIDTHS} is @code{@w{"8 6 1 6 14"}}. This yields five -fields, and the fields to print -are @code{$1}, @code{$3}, and @code{$5}. -The intermediate fields are @dfn{filler}, -which is stuff in between the desired data. 
-@code{flist} lists the fields to print, and @code{t} tracks the -complete field list, including filler fields: - -@example -@c file eg/prog/cut.awk -function set_charlist( field, i, j, f, g, t, - filler, last, len) -@{ - field = 1 # count total fields - n = split(fieldlist, f, ",") - j = 1 # index in flist - for (i = 1; i <= n; i++) @{ - if (index(f[i], "-") != 0) @{ # range - m = split(f[i], g, "-") - if (m != 2 || g[1] >= g[2]) @{ - printf("bad character list: %s\n", - f[i]) > "/dev/stderr" - exit 1 - @} - len = g[2] - g[1] + 1 - if (g[1] > 1) # compute length of filler - filler = g[1] - last - 1 - else - filler = 0 -@group - if (filler) - t[field++] = filler -@end group - t[field++] = len # length of field - last = g[2] - flist[j++] = field - 1 - @} else @{ - if (f[i] > 1) - filler = f[i] - last - 1 - else - filler = 0 - if (filler) - t[field++] = filler - t[field++] = 1 - last = f[i] - flist[j++] = field - 1 - @} - @} - FIELDWIDTHS = join(t, 1, field - 1) - nfields = j - 1 -@} -@c endfile -@end example - -Next is the rule that actually processes the data. If the @option{-s} option -is given, then @code{suppress} is true. The first @code{if} statement -makes sure that the input record does have the field separator. If -@command{cut} is processing fields, @code{suppress} is true, and the field -separator character is not in the record, then the record is skipped. - -If the record is valid, then @command{gawk} has split the data -into fields, either using the character in @code{FS} or using fixed-length -fields and @code{FIELDWIDTHS}. The loop goes through the list of fields -that should be printed. The corresponding field is printed if it contains data. 
-If the next field also has data, then the separator character is -written out between the fields: - -@example -@c file eg/prog/cut.awk -@{ - if (by_fields && suppress && index($0, FS) == 0) - next - - for (i = 1; i <= nfields; i++) @{ - if ($flist[i] != "") @{ - printf "%s", $flist[i] - if (i < nfields && $flist[i+1] != "") - printf "%s", OFS - @} - @} - print "" -@} -@c endfile -@end example - -This version of @command{cut} relies on @command{gawk}'s @code{FIELDWIDTHS} -variable to do the character-based cutting. While it is possible in -other @command{awk} implementations to use @code{substr} -(@pxref{String Functions, ,String Manipulation Functions}), -it is also extremely painful. -The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem -of picking the input line apart by characters. - -@c Exercise: Rewrite using split with "". - -@node Egrep Program, Id Program, Cut Program, Clones -@subsection Searching for Regular Expressions in Files - -@cindex @command{egrep} utility -The @command{egrep} utility searches files for patterns. It uses regular -expressions that are almost identical to those available in @command{awk} -(@pxref{Regexp, ,Regular Expressions}). -It is used in the following manner: - -@example -egrep @r{[} @var{options} @r{]} '@var{pattern}' @var{files} @dots{} -@end example - -The @var{pattern} is a regular expression. In typical usage, the regular -expression is quoted to prevent the shell from expanding any of the -special characters as @value{FN} wildcards. Normally, @command{egrep} -prints the lines that matched. If multiple @value{FN}s are provided on -the command line, each output line is preceded by the name of the file -and a colon. - -The options to @command{egrep} are as follows: - -@table @code -@item -c -Print out a count of the lines that matched the pattern, instead of the -lines themselves. - -@item -s -Be silent. No output is produced and the exit value indicates whether -the pattern was matched. 
- -@item -v -Invert the sense of the test. @command{egrep} prints the lines that do -@emph{not} match the pattern and exits successfully if the pattern is not -matched. - -@item -i -Ignore case distinctions in both the pattern and the input data. - -@item -l -Only print (list) the names of the files that matched, not the lines that matched. - -@item -e @var{pattern} -Use @var{pattern} as the regexp to match. The purpose of the @option{-e} -option is to allow patterns that start with a @samp{-}. -@end table - -This version uses the @code{getopt} library function -(@pxref{Getopt Function, ,Processing Command-Line Options}) -and the file transition library program -(@pxref{Filetrans Function, ,Noting @value{DDF} Boundaries}). - -The program begins with a descriptive comment and then a @code{BEGIN} rule -that processes the command-line arguments with @code{getopt}. The @option{-i} -(ignore case) option is particularly easy with @command{gawk}; we just use the -@code{IGNORECASE} built-in variable -(@pxref{Built-in Variables}): - -@cindex @code{egrep.awk} program -@example -@c file eg/prog/egrep.awk -# egrep.awk --- simulate egrep in awk -@c endfile -@ignore -@c file eg/prog/egrep.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 - -@c endfile -@end ignore -@c file eg/prog/egrep.awk -# Options: -# -c count of lines -# -s silent - use exit value -# -v invert test, success if no match -# -i ignore case -# -l print filenames only -# -e argument is pattern -# -# Requires getopt and file transition library functions - -BEGIN @{ - while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) @{ - if (c == "c") - count_only++ - else if (c == "s") - no_print++ - else if (c == "v") - invert++ - else if (c == "i") - IGNORECASE = 1 - else if (c == "l") - filenames_only++ - else if (c == "e") - pattern = Optarg - else - usage() - @} -@c endfile -@end example - -Next comes the code that handles the @command{egrep}-specific behavior. 
If no -pattern is supplied with @option{-e}, the first non-option on the -command line is used. The @command{awk} command-line arguments up to @code{ARGV[Optind]} -are cleared, so that @command{awk} won't try to process them as files. If no -files are specified, the standard input is used, and if multiple files are -specified, we make sure to note this so that the @value{FN}s can precede the -matched lines in the output: - -@example -@c file eg/prog/egrep.awk - if (pattern == "") - pattern = ARGV[Optind++] - - for (i = 1; i < Optind; i++) - ARGV[i] = "" - if (Optind >= ARGC) @{ - ARGV[1] = "-" - ARGC = 2 - @} else if (ARGC - Optind > 1) - do_filenames++ - -# if (IGNORECASE) -# pattern = tolower(pattern) -@} -@c endfile -@end example - -The last two lines are commented out, since they are not needed in -@command{gawk}. They should be uncommented if you have to use another version -of @command{awk}. - -The next set of lines should be uncommented if you are not using -@command{gawk}. This rule translates all the characters in the input line -into lowercase if the @option{-i} option is specified.@footnote{It -also introduces a subtle bug; -if a match happens, we output the translated line, not the original.} -The rule is -commented out since it is not necessary with @command{gawk}: - -@c Exercise: Fix this, w/array and new line as key to original line - -@example -@c file eg/prog/egrep.awk -#@{ -# if (IGNORECASE) -# $0 = tolower($0) -#@} -@c endfile -@end example - -The @code{beginfile} function is called by the rule in @file{ftrans.awk} -when each new file is processed. In this case, it is very simple; all it -does is initialize a variable @code{fcount} to zero. @code{fcount} tracks -how many lines in the current file matched the pattern. 
-(Naming the parameter @code{junk} shows we know that @code{beginfile} -is called with a parameter, but that we're not interested in its value.): - -@example -@c file eg/prog/egrep.awk -function beginfile(junk) -@{ - fcount = 0 -@} -@c endfile -@end example - -The @code{endfile} function is called after each file has been processed. -It affects the output only when the user wants a count of the number of lines that -matched. @code{no_print} is true only if the exit status is desired. -@code{count_only} is true if line counts are desired. @command{egrep} -therefore only prints line counts if printing and counting are enabled. -The output format must be adjusted depending upon the number of files to -process. Finally, @code{fcount} is added to @code{total}, so that we -know how many lines altogether matched the pattern: - -@example -@c file eg/prog/egrep.awk -function endfile(file) -@{ - if (! no_print && count_only) - if (do_filenames) - print file ":" fcount - else - print fcount - - total += fcount -@} -@c endfile -@end example - -The following rule does most of the work of matching lines. The variable -@code{matches} is true if the line matched the pattern. If the user -wants lines that did not match, the sense of @code{matches} is inverted -using the @samp{!} operator. @code{fcount} is incremented with the value of -@code{matches}, which is either one or zero, depending upon a -successful or unsuccessful match. If the line does not match, the -@code{next} statement just moves on to the next record. - -A number of additional tests are made, but they are only done if we -are not counting lines. First, if the user only wants exit status -(@code{no_print} is true), then it is enough to know that @emph{one} -line in this file matched, and we can skip on to the next file with -@code{nextfile}. Similarly, if we are only printing @value{FN}s, we can -print the @value{FN}, and then skip to the next file with @code{nextfile}. 
-Finally, each line is printed, with a leading @value{FN} and colon -if necessary: - -@cindex @code{!} operator -@example -@c file eg/prog/egrep.awk -@{ - matches = ($0 ~ pattern) - if (invert) - matches = ! matches - - fcount += matches # 1 or 0 - - if (! matches) - next - - if (! count_only) @{ - if (no_print) - nextfile - - if (filenames_only) @{ - print FILENAME - nextfile - @} - - if (do_filenames) - print FILENAME ":" $0 - else - print - @} -@} -@c endfile -@end example - -The @code{END} rule takes care of producing the correct exit status. If -there are no matches, the exit status is one, otherwise it is zero: - -@example -@c file eg/prog/egrep.awk -END \ -@{ - if (total == 0) - exit 1 - exit 0 -@} -@c endfile -@end example - -The @code{usage} function prints a usage message in case of invalid options, -and then exits: - -@example -@c file eg/prog/egrep.awk -function usage( e) -@{ - e = "Usage: egrep [-csvil] [-e pat] [files ...]" - e = e "\n\tegrep [-csvil] pat [files ...]" - print e > "/dev/stderr" - exit 1 -@} -@c endfile -@end example - -The variable @code{e} is used so that the function fits nicely -on the printed page. - -@cindex backslash continuation -Just a note on programming style: you may have noticed that the @code{END} -rule uses backslash continuation, with the open brace on a line by -itself. This is so that it more closely resembles the way functions -are written. Many of the examples -in this @value{CHAPTER} -use this style. You can decide for yourself if you like writing -your @code{BEGIN} and @code{END} rules this way -or not. - -@node Id Program, Split Program, Egrep Program, Clones -@subsection Printing out User Information - -@cindex @command{id} utility -The @command{id} utility lists a user's real and effective user-id numbers, -real and effective group-id numbers, and the user's group set, if any. -@command{id} only prints the effective user-id and group-id if they are -different from the real ones. 
If possible, @command{id} also supplies the -corresponding user and group names. The output might look like this: - -@example -$ id -@print{} uid=2076(arnold) gid=10(staff) groups=10(staff),4(tty) -@end example - -This information is part of what is provided by @command{gawk}'s -@code{PROCINFO} array (@pxref{Built-in Variables}). -However, the @command{id} utility provides a more palatable output than just -individual numbers. - -Here is a simple version of @command{id} written in @command{awk}. -It uses the user database library functions -(@pxref{Passwd Functions, ,Reading the User Database}) -and the group database library functions -(@pxref{Group Functions, ,Reading the Group Database}): - -The program is fairly straightforward. All the work is done in the -@code{BEGIN} rule. The user and group ID numbers are obtained from -@code{PROCINFO}. -The code is repetitive. The entry in the user database for the real user-id -number is split into parts at the @samp{:}. The name is the first field. -Similar code is used for the effective user-id number and the group -numbers. 
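To make the splitting step concrete, here is a minimal sketch that takes one user database record apart at the @samp{:} characters. The record is made-up sample data in the colon-separated format that @code{getpwuid} returns, and the wrapper name @code{show_uid} is hypothetical; the real program obtains the record from the library function instead of a literal string.

```shell
show_uid() {
    awk '
    BEGIN {
        # Made-up record; the real program gets this from getpwuid(uid).
        pw = "arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh"
        split(pw, a, ":")
        # a[1] is the login name, a[3] is the numeric user-id.
        printf("uid=%d(%s)\n", a[3], a[1])
    }'
}
show_uid
```

The same pattern applies to group records returned by @code{getgrgid}, where the group name is also the first field.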
- -@cindex @code{id.awk} program -@example -@c file eg/prog/id.awk -# id.awk --- implement id in awk -# -# Requires user and group library functions -@c endfile -@ignore -@c file eg/prog/id.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 -# Revised February 1996 - -@c endfile -@end ignore -@c file eg/prog/id.awk -# output is: -# uid=12(foo) euid=34(bar) gid=3(baz) \ -# egid=5(blat) groups=9(nine),2(two),1(one) - -@group -BEGIN \ -@{ - uid = PROCINFO["uid"] - euid = PROCINFO["euid"] - gid = PROCINFO["gid"] - egid = PROCINFO["egid"] -@end group - - printf("uid=%d", uid) - pw = getpwuid(uid) - if (pw != "") @{ - split(pw, a, ":") - printf("(%s)", a[1]) - @} - - if (euid != uid) @{ - printf(" euid=%d", euid) - pw = getpwuid(euid) - if (pw != "") @{ - split(pw, a, ":") - printf("(%s)", a[1]) - @} - @} - - printf(" gid=%d", gid) - pw = getgrgid(gid) - if (pw != "") @{ - split(pw, a, ":") - printf("(%s)", a[1]) - @} - - if (egid != gid) @{ - printf(" egid=%d", egid) - pw = getgrgid(egid) - if (pw != "") @{ - split(pw, a, ":") - printf("(%s)", a[1]) - @} - @} - - for (i = 1; ("group" i) in PROCINFO; i++) @{ - if (i == 1) - printf(" groups=") - group = PROCINFO["group" i] - printf("%d", group) - pw = getgrgid(group) - if (pw != "") @{ - split(pw, a, ":") - printf("(%s)", a[1]) - @} - if (("group" (i+1)) in PROCINFO) - printf(",") - @} - - print "" -@} -@c endfile -@end example - -@cindex @code{in} operator -The test in the @code{for} loop is worth noting. -Any supplementary groups in the @code{PROCINFO} array have the -indices @code{"group1"} through @code{"group@var{N}"} for some -@var{N}; i.e., the total number of supplementary groups. -The problem is, we don't know in advance how many of these groups -there are. - -This loop works by starting at one, concatenating the value with -@code{"group"}, and then using @code{in} to see if that value is -in the array. Eventually, @code{i} is incremented past -the last group in the array and the loop exits. 
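The same loop idiom can be tried in any @command{awk} by using an ordinary array in place of @code{PROCINFO}. In this sketch the array @code{info}, its three values, and the wrapper name @code{group_loop} are all made up for illustration:

```shell
group_loop() {
    awk '
    BEGIN {
        # Made-up stand-in for the PROCINFO entries "group1" .. "groupN".
        info["group1"] = 10
        info["group2"] = 4
        info["group3"] = 27

        # The loop stops at the first index that is not in the array,
        # so the number of groups need not be known in advance.
        for (i = 1; ("group" i) in info; i++)
            printf("%s%d", (i == 1 ? "groups=" : ","), info["group" i])
        print ""
    }'
}
group_loop
```

Because @code{in} only tests for membership, probing for @code{"group4"} does not create a new element in the array.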
- -The loop is also correct if there are @emph{no} supplementary -groups; then the condition is false the first time it's -tested, and the loop body never executes. - -@c exercise!!! -@ignore -The POSIX version of @command{id} takes arguments that control which -information is printed. Modify this version to accept the same -arguments and perform in the same way. -@end ignore - -@node Split Program, Tee Program, Id Program, Clones -@subsection Splitting a Large File into Pieces - -@cindex @code{split} utility -The @code{split} program splits large text files into smaller pieces. -The usage is as follows: - -@example -split @r{[}-@var{count}@r{]} file @r{[} @var{prefix} @r{]} -@end example - -By default, -the output files are named @file{xaa}, @file{xab}, and so on. Each file has -1000 lines in it, with the likely exception of the last file. To change the -number of lines in each file, supply a number on the command line -preceded with a minus; e.g., @samp{-500} for files with 500 lines in them -instead of 1000. To change the name of the output files to something like -@file{myfileaa}, @file{myfileab}, and so on, supply an additional -argument that specifies the @value{FN} prefix. - -Here is a version of @code{split} in @command{awk}. It uses the @code{ord} and -@code{chr} functions presented in -@ref{Ordinal Functions, ,Translating Between Characters and Numbers}. - -The program first sets its defaults, and then tests to make sure there are -not too many arguments. It then looks at each argument in turn. The -first argument could be a minus followed by a number. If it is, this happens -to look like a negative number, so it is made positive, and that is the -count of lines. 
The data @value{FN} is skipped over and the final argument -is used as the prefix for the output @value{FN}s: - -@cindex @code{split.awk} program -@example -@c file eg/prog/split.awk -# split.awk --- do split in awk -# -# Requires ord and chr library functions -@c endfile -@ignore -@c file eg/prog/split.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 - -@c endfile -@end ignore -@c file eg/prog/split.awk -# usage: split [-num] [file] [outname] - -BEGIN @{ - outfile = "x" # default - count = 1000 - if (ARGC > 4) - usage() - - i = 1 - if (ARGV[i] ~ /^-[0-9]+$/) @{ - count = -ARGV[i] - ARGV[i] = "" - i++ - @} - # test argv in case reading from stdin instead of file - if (i in ARGV) - i++ # skip data file name - if (i in ARGV) @{ - outfile = ARGV[i] - ARGV[i] = "" - @} - - s1 = s2 = "a" - out = (outfile s1 s2) -@} -@c endfile -@end example - -The next rule does most of the work. @code{tcount} (temporary count) tracks -how many lines have been printed to the output file so far. If it is greater -than @code{count}, it is time to close the current file and start a new one. -@code{s1} and @code{s2} track the current suffixes for the @value{FN}. If -they are both @samp{z}, the file is just too big. Otherwise, @code{s1} -moves to the next letter in the alphabet and @code{s2} starts over again at -@samp{a}: - -@c else on separate line here for page breaking -@example -@c file eg/prog/split.awk -@{ - if (++tcount > count) @{ - close(out) - if (s2 == "z") @{ - if (s1 == "z") @{ - printf("split: %s is too large to split\n", - FILENAME) > "/dev/stderr" - exit 1 - @} - s1 = chr(ord(s1) + 1) - s2 = "a" - @} -@group - else - s2 = chr(ord(s2) + 1) -@end group - out = (outfile s1 s2) - tcount = 1 - @} - print > out -@} -@c endfile -@end example - -@c Exercise: do this with just awk builtin functions, index("abc..."), substr, etc. 
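One possible way to advance the suffix with only built-in functions is to look each letter up in a string that holds the alphabet. The following is a sketch under that assumption and is not part of @file{split.awk}; the helper @code{next_suffix} and the variable @code{letters} are hypothetical names:

```shell
next_suffix() {
    awk -v suffix="$1" '
    # Hypothetical helper: return the two-letter suffix that follows s,
    # or the null string when s is already "zz".
    function next_suffix(s,    letters, p1, p2)
    {
        letters = "abcdefghijklmnopqrstuvwxyz"
        p1 = index(letters, substr(s, 1, 1))
        p2 = index(letters, substr(s, 2, 1))
        if (p2 < 26)     # bump the second letter
            return substr(s, 1, 1) substr(letters, p2 + 1, 1)
        if (p1 < 26)     # second letter wraps, bump the first
            return substr(letters, p1 + 1, 1) "a"
        return ""        # no suffixes left
    }
    BEGIN { print next_suffix(suffix) }'
}
next_suffix aa
next_suffix az
```

Since the order of the letters comes from the string itself, this approach does not depend on the letters being contiguous in the character set.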
- -@noindent -The @code{usage} function simply prints an error message and exits: - -@example -@c file eg/prog/split.awk -function usage( e) -@{ - e = "usage: split [-num] [file] [outname]" - print e > "/dev/stderr" - exit 1 -@} -@c endfile -@end example - -@noindent -The variable @code{e} is used so that the function -fits nicely on the -@ifinfo -screen. -@end ifinfo -@ifnotinfo -page. -@end ifnotinfo - -This program is a bit sloppy; it relies on @command{awk} to close the last file -for it automatically, instead of doing it in an @code{END} rule. -It also assumes that letters are contiguous in the character set, -which isn't true for EBCDIC systems. -@c BFD... - -@node Tee Program, Uniq Program, Split Program, Clones -@subsection Duplicating Output into Multiple Files - -@cindex @code{tee} utility -The @code{tee} program is known as a ``pipe fitting.'' @code{tee} copies -its standard input to its standard output and also duplicates it to the -files named on the command line. Its usage is as follows: - -@example -tee @r{[}-a@r{]} file @dots{} -@end example - -The @option{-a} option tells @code{tee} to append to the named files, instead of -truncating them and starting over. - -The @code{BEGIN} rule first makes a copy of all the command-line arguments -into an array named @code{copy}. -@code{ARGV[0]} is not copied, since it is not needed. -@code{tee} cannot use @code{ARGV} directly, since @command{awk} attempts to -process each @value{FN} in @code{ARGV} as input data. - -@cindex flag variables -If the first argument is @option{-a}, then the flag variable -@code{append} is set to true, and both @code{ARGV[1]} and -@code{copy[1]} are deleted. If @code{ARGC} is less than two, then no -@value{FN}s were supplied and @code{tee} prints a usage message and exits. 
-Finally, @command{awk} is forced to read the standard input by setting -@code{ARGV[1]} to @code{"-"} and @code{ARGC} to two: - -@c NEXT ED: Add more leading commentary in this program -@cindex @code{tee.awk} program -@example -@c file eg/prog/tee.awk -# tee.awk --- tee in awk -@c endfile -@ignore -@c file eg/prog/tee.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 -# Revised December 1995 - -@c endfile -@end ignore -@c file eg/prog/tee.awk -BEGIN \ -@{ - for (i = 1; i < ARGC; i++) - copy[i] = ARGV[i] - - if (ARGV[1] == "-a") @{ - append = 1 - delete ARGV[1] - delete copy[1] - ARGC-- - @} - if (ARGC < 2) @{ - print "usage: tee [-a] file ..." > "/dev/stderr" - exit 1 - @} - ARGV[1] = "-" - ARGC = 2 -@} -@c endfile -@end example - -The single rule does all the work. Since there is no pattern, it is -executed for each line of input. The body of the rule simply prints the -line into each file on the command line, and then to the standard output: - -@example -@c file eg/prog/tee.awk -@{ - # moving the if outside the loop makes it run faster - if (append) - for (i in copy) - print >> copy[i] - else - for (i in copy) - print > copy[i] - print -@} -@c endfile -@end example - -@noindent -It is also possible to write the loop this way: - -@example -for (i in copy) - if (append) - print >> copy[i] - else - print > copy[i] -@end example - -@noindent -This is more concise but it is also less efficient. The @samp{if} is -tested for each record and for each output file. By duplicating the loop -body, the @samp{if} is only tested once for each input record. If there are -@var{N} input records and @var{M} output files, the first method only -executes @var{N} @samp{if} statements, while the second executes -@var{N}@code{*}@var{M} @samp{if} statements. 
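The difference in the number of tests can be checked with a small counting sketch. It does not write any files; it only counts how often the @samp{if} would be evaluated in each form. The three input records, the two output names, and the names @code{count_if_tests}, @code{hoisted}, and @code{inside} are assumptions made for the illustration:

```shell
count_if_tests() {
    printf 'one\ntwo\nthree\n' | awk '
    BEGIN {
        # Two made-up output names; nothing is actually written.
        copy[1] = "file1"
        copy[2] = "file2"
    }
    {
        # First method: the test sits outside the loops.
        hoisted++
        if (append) {
            for (i in copy)
                writes++
        } else {
            for (i in copy)
                writes++
        }

        # Second method: the test sits inside the loop.
        for (i in copy) {
            inside++
            if (append) {
                writes++
            } else {
                writes++
            }
        }
    }
    END { print hoisted, inside }'
}
count_if_tests
```

With three records and two files, the first count is @var{N} = 3 and the second is @var{N}@code{*}@var{M} = 6.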
- -Finally, the @code{END} rule cleans up by closing all the output files: - -@example -@c file eg/prog/tee.awk -END \ -@{ - for (i in copy) - close(copy[i]) -@} -@c endfile -@end example - -@node Uniq Program, Wc Program, Tee Program, Clones -@subsection Printing Non-Duplicated Lines of Text - -@cindex @command{uniq} utility -The @command{uniq} utility reads sorted lines of data on its standard -input, and by default removes duplicate lines. In other words, it only -prints unique lines---hence the name. @command{uniq} has a number of -options. The usage is as follows: - -@example -uniq @r{[}-udc @r{[}-@var{n}@r{]]} @r{[}+@var{n}@r{]} @r{[} @var{input file} @r{[} @var{output file} @r{]]} -@end example - -The option meanings are: - -@table @code -@item -d -Only print repeated lines. - -@item -u -Only print non-repeated lines. - -@item -c -Count lines. This option overrides @option{-d} and @option{-u}. Both repeated -and non-repeated lines are counted. - -@item -@var{n} -Skip @var{n} fields before comparing lines. The definition of fields -is similar to @command{awk}'s default: non-whitespace characters separated -by runs of spaces and/or tabs. - -@item +@var{n} -Skip @var{n} characters before comparing lines. Any fields specified with -@samp{-@var{n}} are skipped first. - -@item @var{input file} -Data is read from the input file named on the command line, instead of from -the standard input. - -@item @var{output file} -The generated output is sent to the named output file, instead of to the -standard output. -@end table - -Normally @command{uniq} behaves as if both the @option{-d} and -@option{-u} options are provided. - -@command{uniq} uses the -@code{getopt} library function -(@pxref{Getopt Function, ,Processing Command-Line Options}) -and the @code{join} library function -(@pxref{Join Function, ,Merging an Array into a String}). - -The program begins with a @code{usage} function and then a brief outline of -the options and their meanings in a comment. 
-The @code{BEGIN} rule deals with the command-line arguments and options. It -uses a trick to get @code{getopt} to handle options of the form @samp{-25}, -treating such an option as the option letter @samp{2} with an argument of -@samp{5}. If indeed two or more digits are supplied (@code{Optarg} looks -like a number), @code{Optarg} is -concatenated with the option digit and then the result is added to zero to make -it into a number. If there is only one digit in the option, then -@code{Optarg} is not needed. @code{Optind} must be decremented so that -@code{getopt} processes it next time. This code is admittedly a bit -tricky. - -If no options are supplied, then the default is taken, to print both -repeated and non-repeated lines. The output file, if provided, is assigned -to @code{outputfile}. Early on, @code{outputfile} is initialized to the -standard output, @file{/dev/stdout}: - -@cindex @code{uniq.awk} program -@example -@c file eg/prog/uniq.awk -@group -# uniq.awk --- do uniq in awk -# -# Requires getopt and join library functions -@end group -@c endfile -@ignore -@c file eg/prog/uniq.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 - -@c endfile -@end ignore -@c file eg/prog/uniq.awk -function usage( e) -@{ - e = "Usage: uniq [-udc [-n]] [+n] [ in [ out ]]" - print e > "/dev/stderr" - exit 1 -@} - -# -c count lines. 
overrides -d and -u -# -d only repeated lines -# -u only non-repeated lines -# -n skip n fields -# +n skip n characters, skip fields first - -BEGIN \ -@{ - count = 1 - outputfile = "/dev/stdout" - opts = "udc0:1:2:3:4:5:6:7:8:9:" - while ((c = getopt(ARGC, ARGV, opts)) != -1) @{ - if (c == "u") - non_repeated_only++ - else if (c == "d") - repeated_only++ - else if (c == "c") - do_count++ - else if (index("0123456789", c) != 0) @{ - # getopt requires args to options - # this messes us up for things like -5 - if (Optarg ~ /^[0-9]+$/) - fcount = (c Optarg) + 0 - else @{ - fcount = c + 0 - Optind-- - @} - @} else - usage() - @} - - if (ARGV[Optind] ~ /^\+[0-9]+$/) @{ - charcount = substr(ARGV[Optind], 2) + 0 - Optind++ - @} - - for (i = 1; i < Optind; i++) - ARGV[i] = "" - - if (repeated_only == 0 && non_repeated_only == 0) - repeated_only = non_repeated_only = 1 - - if (ARGC - Optind == 2) @{ - outputfile = ARGV[ARGC - 1] - ARGV[ARGC - 1] = "" - @} -@} -@c endfile -@end example - -The following function, @code{are_equal}, compares the current line, -@code{$0}, to the -previous line, @code{last}. It handles skipping fields and characters. -If no field count and no character count are specified, @code{are_equal} -simply returns one or zero depending upon the result of a simple string -comparison of @code{last} and @code{$0}. Otherwise, things get more -complicated. -If fields have to be skipped, each line is broken into an array using -@code{split} -(@pxref{String Functions, ,String Manipulation Functions}); -the desired fields are then joined back into a line using @code{join}. -The joined lines are stored in @code{clast} and @code{cline}. -If no fields are skipped, @code{clast} and @code{cline} are set to -@code{last} and @code{$0}, respectively. -Finally, if characters are skipped, @code{substr} is used to strip off the -leading @code{charcount} characters in @code{clast} and @code{cline}. 
The -two strings are then compared and @code{are_equal} returns the result: - -@example -@c file eg/prog/uniq.awk -function are_equal( n, m, clast, cline, alast, aline) -@{ - if (fcount == 0 && charcount == 0) - return (last == $0) - - if (fcount > 0) @{ - n = split(last, alast) - m = split($0, aline) - clast = join(alast, fcount+1, n) - cline = join(aline, fcount+1, m) - @} else @{ - clast = last - cline = $0 - @} - if (charcount) @{ - clast = substr(clast, charcount + 1) - cline = substr(cline, charcount + 1) - @} - - return (clast == cline) -@} -@c endfile -@end example - -The following two rules are the body of the program. The first one is -executed only for the very first line of data. It sets @code{last} equal to -@code{$0}, so that subsequent lines of text have something to be compared to. - -The second rule does the work. The variable @code{equal} is one or zero, -depending upon the results of @code{are_equal}'s comparison. If @command{uniq} -is counting repeated lines, and the lines are equal, then it increments the @code{count} variable. -Otherwise it prints the line and resets @code{count}, -since the two lines are not equal. - -If @command{uniq} is not counting, and if the lines are equal, @code{count} is incremented. -Nothing is printed, since the point is to remove duplicates. -Otherwise, if @command{uniq} is counting repeated lines and more than -one line is seen, or if @command{uniq} is counting non-repeated lines -and only one line is seen, then the line is printed, and @code{count} -is reset. 
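The decision logic just described can be tried in isolation. The following is a reduced sketch under stated assumptions, not the program itself: it compares whole lines only (no field or character skipping), takes a single hypothetical @code{mode} argument (@samp{c}, @samp{d}, @samp{u}, or empty for the default) instead of using @code{getopt}, and always writes to the standard output. The names @code{uniq_sketch} and @code{emit} are invented for this illustration:

```shell
# Sketch only: run-length logic of uniq with a one-letter mode argument.
uniq_sketch() {
    awk -v mode="$1" '
    function emit() {
        if (mode == "c")                       # counting overrides d and u
            printf "%4d %s\n", count, last
        else if ((mode != "u" && count > 1) || (mode != "d" && count == 1))
            print last
    }
    NR == 1    { last = $0; count = 1; next }  # nothing to compare with yet
    $0 == last { count++; next }               # same run: just count it
               { emit(); last = $0; count = 1 }
    END        { if (NR > 0) emit() }          # flush the final run
    '
}

printf 'a\na\nb\nc\nc\nc\n' | uniq_sketch d    # prints a, then c
```

An empty mode prints one copy of every run, which corresponds to the default of setting both @code{repeated_only} and @code{non_repeated_only}.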
- -Finally, similar logic is used in the @code{END} rule to print the final -line of input data: - -@example -@c file eg/prog/uniq.awk -NR == 1 @{ - last = $0 - next -@} - -@{ - equal = are_equal() - - if (do_count) @{ # overrides -d and -u - if (equal) - count++ - else @{ - printf("%4d %s\n", count, last) > outputfile - last = $0 - count = 1 # reset - @} - next - @} - - if (equal) - count++ - else @{ - if ((repeated_only && count > 1) || - (non_repeated_only && count == 1)) - print last > outputfile - last = $0 - count = 1 - @} -@} - -END @{ - if (do_count) - printf("%4d %s\n", count, last) > outputfile - else if ((repeated_only && count > 1) || - (non_repeated_only && count == 1)) - print last > outputfile -@} -@c endfile -@end example - -@node Wc Program, , Uniq Program, Clones -@subsection Counting Things - -@cindex @command{wc} utility -The @command{wc} (word count) utility counts lines, words, and characters in -one or more input files. Its usage is as follows: - -@example -wc @r{[}-lwc@r{]} @r{[} @var{files} @dots{} @r{]} -@end example - -If no files are specified on the command line, @command{wc} reads its standard -input. If there are multiple files, it also prints total counts for all -the files. The options and their meanings are shown in the following list: - -@table @code -@item -l -Only count lines. - -@item -w -Only count words. -A ``word'' is a contiguous sequence of non-whitespace characters, separated -by spaces and/or tabs. Happily, this is the normal way @command{awk} separates -fields in its input data. - -@item -c -Only count characters. -@end table - -Implementing @command{wc} in @command{awk} is particularly elegant, -since @command{awk} does a lot of the work for us; it splits lines into -words (i.e., fields) and counts them, it counts lines (i.e., records), -and it can easily tell us how long a line is. 
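To see how little code the counting itself needs, here is a minimal sketch that leaves out option handling, multiple files, and totals. The shell function name @code{wc_sketch} and the space-separated output format are assumptions made for illustration. Each record contributes its length plus one (for the newline that ended it) to the character count, one to the line count, and @code{NF} to the word count:

```shell
# Sketch only: count lines, words, and characters of the standard input.
wc_sketch() {
    awk '
    { chars += length($0) + 1; lines++; words += NF }
    END { printf "%d %d %d\n", lines, words, chars }'
}

printf 'hello world\nfoo\n' | wc_sketch    # prints: 2 3 16
```

The full program, presented next, adds the command-line options and the per-file and total counts.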
- -This uses the @code{getopt} library function -(@pxref{Getopt Function, ,Processing Command-Line Options}) -and the file transition functions -(@pxref{Filetrans Function, ,Noting @value{DDF} Boundaries}). - -This version has one notable difference from traditional versions of -@command{wc}: it always prints the counts in the order lines, words, -and characters. Traditional versions note the order of the @option{-l}, -@option{-w}, and @option{-c} options on the command line, and print the -counts in that order. - -The @code{BEGIN} rule does the argument processing. The variable -@code{print_total} is true if more than one file is named on the -command line: - -@cindex @code{wc.awk} program -@example -@c file eg/prog/wc.awk -# wc.awk --- count lines, words, characters -@c endfile -@ignore -@c file eg/prog/wc.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 -@c endfile -@end ignore -@c file eg/prog/wc.awk - -# Options: -# -l only count lines -# -w only count words -# -c only count characters -# -# Default is to count lines, words, characters -# -# Requires getopt and file transition library functions - -BEGIN @{ - # let getopt print a message about - # invalid options. we ignore them - while ((c = getopt(ARGC, ARGV, "lwc")) != -1) @{ - if (c == "l") - do_lines = 1 - else if (c == "w") - do_words = 1 - else if (c == "c") - do_chars = 1 - @} - for (i = 1; i < Optind; i++) - ARGV[i] = "" - - # if no options, do all - if (! do_lines && ! do_words && ! 
do_chars) - do_lines = do_words = do_chars = 1 - - print_total = (ARGC - i > 1) -@} -@c endfile -@end example - -The @code{beginfile} function is simple; it just resets the counts of lines, -words, and characters to zero, and saves the current @value{FN} in -@code{fname}: - -@c NEXT ED: make it lines = words = chars = 0 -@example -@c file eg/prog/wc.awk -function beginfile(file) -@{ - chars = lines = words = 0 - fname = FILENAME -@} -@c endfile -@end example - -The @code{endfile} function adds the current file's numbers to the running -totals of lines, words, and characters. It then prints out those numbers -for the file that was just read. It relies on @code{beginfile} to reset the -numbers for the following @value{DF}: - -@c NEXT ED: make order for += be lines, words, chars -@example -@c file eg/prog/wc.awk -function endfile(file) -@{ - tchars += chars - tlines += lines - twords += words - if (do_lines) - printf "\t%d", lines -@group - if (do_words) - printf "\t%d", words -@end group - if (do_chars) - printf "\t%d", chars - printf "\t%s\n", fname -@} -@c endfile -@end example - -There is one rule that is executed for each line. It adds the length of -the record, plus one, to @code{chars}. Adding one to the record length -is needed because the newline character separating records (the value -of @code{RS}) is not part of the record itself, and thus not included -in its length. Next, @code{lines} is incremented for each line read, -and @code{words} is incremented by the value of @code{NF}, which is the -number of ``words'' on this line:@footnote{@command{wc} can't just use -the value of @code{FNR} in @code{endfile}. If you examine the code in -@ref{Filetrans Function, ,Noting @value{DDF} Boundaries}, -you will see that @code{FNR} has already been reset by the time -@code{endfile} is called.} -@c ONE DAY: make the above an exercise, instead of giving away the answer. 
- -@example -@c file eg/prog/wc.awk -# do per line -@{ - chars += length($0) + 1 # get newline - lines++ - words += NF -@} -@c endfile -@end example - -Finally, the @code{END} rule simply prints the totals for all the files. - -@example -@c file eg/prog/wc.awk -END @{ - if (print_total) @{ - if (do_lines) - printf "\t%d", tlines - if (do_words) - printf "\t%d", twords - if (do_chars) - printf "\t%d", tchars - print "\ttotal" - @} -@} -@c endfile -@end example - -@node Miscellaneous Programs, , Clones, Sample Programs -@section A Grab Bag of @command{awk} Programs - -This @value{SECTION} is a large ``grab bag'' of miscellaneous programs. -We hope you find them both interesting and enjoyable. - -@menu -* Dupword Program:: Finding duplicated words in a document. -* Alarm Program:: An alarm clock. -* Translate Program:: A program similar to the @command{tr} utility. -* Labels Program:: Printing mailing labels. -* Word Sorting:: A program to produce a word usage count. -* History Sorting:: Eliminating duplicate entries from a history - file. -* Extract Program:: Pulling out programs from Texinfo source - files. -* Simple Sed:: A Simple Stream Editor. -* Igawk Program:: A wrapper for @command{awk} that includes - files. -@end menu - -@node Dupword Program, Alarm Program, Miscellaneous Programs, Miscellaneous Programs -@subsection Finding Duplicated Words in a Document - -A common error when writing large amounts of prose is to accidentally -duplicate words. Typically you will see this in text as something like ``the -the program does the following @dots{}.'' When the text is online, often -the duplicated words occur at the end of one line and the beginning of -another, making them very difficult to spot. -@c as here! - -This program, @file{dupword.awk}, scans through a file one line at a time -and looks for adjacent occurrences of the same word. It also saves the last -word on a line (in the variable @code{prev}) for comparison with the first -word on the next line. 
- -@cindex Texinfo -The first statement makes sure that the line is all lowercase, -so that, for example, ``The'' and ``the'' compare equal to each other. -The next statement replaces non-alphanumeric and non-whitespace characters -with spaces, so that punctuation does not affect the comparison either. -The characters are replaced with spaces so that formatting controls -don't create nonsense words (e.g., the Texinfo @samp{@@code@{NF@}} -becomes @samp{codeNF} if punctuation is simply deleted). The record is -then re-split into fields, yielding just the actual words on the line, -and ensuring that there are no empty fields. - -If there are no fields left after removing all the punctuation, the -current record is skipped. Otherwise, the program loops through each -word, comparing it to the previous one: - -@cindex @code{dupword.awk} program -@example -@c file eg/prog/dupword.awk -# dupword.awk --- find duplicate words in text -@c endfile -@ignore -@c file eg/prog/dupword.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# December 1991 -# Revised October 2000 - -@c endfile -@end ignore -@c file eg/prog/dupword.awk -@{ - $0 = tolower($0) - gsub(/[^[:alnum:][:blank:]]/, " "); - $0 = $0 # re-split - if (NF == 0) - next - if ($1 == prev) - printf("%s:%d: duplicate %s\n", - FILENAME, FNR, $1) - for (i = 2; i <= NF; i++) - if ($i == $(i-1)) - printf("%s:%d: duplicate %s\n", - FILENAME, FNR, $i) - prev = $NF -@} -@c endfile -@end example - -@node Alarm Program, Translate Program, Dupword Program, Miscellaneous Programs -@subsection An Alarm Clock Program -@cindex insomnia, cure for -@cindex Robbins, Arnold -@quotation -@i{Nothing cures insomnia like a ringing alarm clock.}@* -Arnold Robbins -@end quotation - -The following program is a simple ``alarm clock'' program. -You give it a time of day and an optional message. At the specified time, -it prints the message on the standard output. 
In addition, you can give it -the number of times to repeat the message as well as a delay between -repetitions. - -This program uses the @code{gettimeofday} function from -@ref{Gettimeofday Function, ,Managing the Time of Day}. - -All the work is done in the @code{BEGIN} rule. The first part is argument -checking and setting of defaults: the delay, the count, and the message to -print. If the user supplied a message without the ASCII BEL -character (known as the ``alert'' character, @code{"\a"}), then it is added to -the message. (On many systems, printing the ASCII BEL generates some sort -of audible alert. Thus when the alarm goes off, the system calls attention -to itself in case the user is not looking at their computer or terminal.): - -@cindex @code{alarm.awk} program -@example -@c file eg/prog/alarm.awk -# alarm.awk --- set an alarm -# -# Requires gettimeofday library function -@c endfile -@ignore -@c file eg/prog/alarm.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 - -@c endfile -@end ignore -@c file eg/prog/alarm.awk -# usage: alarm time [ "message" [ count [ delay ] ] ] - -BEGIN \ -@{ - # Initial argument sanity checking - usage1 = "usage: alarm time ['message' [count [delay]]]" - usage2 = sprintf("\t(%s) time ::= hh:mm", ARGV[1]) - - if (ARGC < 2) @{ - print usage1 > "/dev/stderr" - print usage2 > "/dev/stderr" - exit 1 - @} else if (ARGC == 5) @{ - delay = ARGV[4] + 0 - count = ARGV[3] + 0 - message = ARGV[2] - @} else if (ARGC == 4) @{ - count = ARGV[3] + 0 - message = ARGV[2] - @} else if (ARGC == 3) @{ - message = ARGV[2] - @} else if (ARGV[1] !~ /[0-9]?[0-9]:[0-9][0-9]/) @{ - print usage1 > "/dev/stderr" - print usage2 > "/dev/stderr" - exit 1 - @} - - # set defaults for once we reach the desired time - if (delay == 0) - delay = 180 # 3 minutes -@group - if (count == 0) - count = 5 -@end group - if (message == "") - message = sprintf("\aIt is now %s!\a", ARGV[1]) - else if (index(message, "\a") == 0) - message = "\a" message 
"\a" -@c endfile -@end example - -The next @value{SECTION} of code turns the alarm time into hours and minutes, -converts it (if necessary) to a 24-hour clock, and then turns that -time into a count of the seconds since midnight. Next it turns the current -time into a count of seconds since midnight. The difference between the two -is how long to wait before setting off the alarm: - -@example -@c file eg/prog/alarm.awk - # split up alarm time - split(ARGV[1], atime, ":") - hour = atime[1] + 0 # force numeric - minute = atime[2] + 0 # force numeric - - # get current broken down time - gettimeofday(now) - - # if time given is 12-hour hours and it's after that - # hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m., - # then add 12 to real hour - if (hour < 12 && now["hour"] > hour) - hour += 12 - - # set target time in seconds since midnight - target = (hour * 60 * 60) + (minute * 60) - - # get current time in seconds since midnight - current = (now["hour"] * 60 * 60) + \ - (now["minute"] * 60) + now["second"] - - # how long to sleep for - naptime = target - current - if (naptime <= 0) @{ - print "time is in the past!" > "/dev/stderr" - exit 1 - @} -@c endfile -@end example - -@cindex @command{sleep} utility -Finally, the program uses the @code{system} function -(@pxref{I/O Functions, ,Input/Output Functions}) -to call the @command{sleep} utility. The @command{sleep} utility simply pauses -for the given number of seconds. If the exit status is not zero, -the program assumes that @command{sleep} was interrupted and exits. If -@command{sleep} exited with an OK status (zero), then the program prints the -message in a loop, again using @command{sleep} to delay for however many -seconds are necessary: - -@example -@c file eg/prog/alarm.awk - # zzzzzz..... go away if interrupted - if (system(sprintf("sleep %d", naptime)) != 0) - exit 1 - - # time to notify! 
- command = sprintf("sleep %d", delay) - for (i = 1; i <= count; i++) @{ - print message - # if sleep command interrupted, go away - if (system(command) != 0) - break - @} - - exit 0 -@} -@c endfile -@end example - -@node Translate Program, Labels Program, Alarm Program, Miscellaneous Programs -@subsection Transliterating Characters - -@cindex @command{tr} utility -The system @command{tr} utility transliterates characters. For example, it is -often used to map uppercase letters into lowercase for further processing: - -@example -@var{generate data} | tr 'A-Z' 'a-z' | @var{process data} @dots{} -@end example - -@command{tr} requires two lists of characters.@footnote{On some older -System V systems, -@command{tr} may require that the lists be written as -range expressions enclosed in square brackets (@samp{[a-z]}) and quoted, -to prevent the shell from attempting a @value{FN} expansion. This is -not a feature.} When processing the input, the first character in the -first list is replaced with the first character in the second list, -the second character in the first list is replaced with the second -character in the second list, and so on. If there are more characters -in the ``from'' list than in the ``to'' list, the last character of the -``to'' list is used for the remaining characters in the ``from'' list. - -Some time ago, -@c early or mid-1989! -a user proposed that a transliteration function should -be added to @command{gawk}. -@c Wishing to avoid gratuitous new features, -@c at least theoretically -The following program was written to -prove that character transliteration could be done with a user-level -function. This program is not as complete as the system @command{tr} utility -but it does most of the job. 
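The mapping rule described above, including the reuse of the last ``to'' character when the ``to'' list is shorter, can be sketched as a lookup table. This is an illustration under stated assumptions and deliberately not the technique used by the program presented below: it rebuilds each record one character at a time instead of calling @code{gsub}. The names @code{tr_map_sketch}, @code{map}, and @code{out} are invented for the example, and ranges such as @samp{A-Z} are not expanded:

```shell
# Sketch only: character-by-character transliteration with a lookup table.
tr_map_sketch() {   # $1 = "from" list, $2 = "to" list
    awk -v from="$1" -v to="$2" '
    BEGIN {
        lf = length(from); lt = length(to)
        for (i = 1; i <= lf; i++) {
            j = (i <= lt) ? i : lt      # reuse the last "to" character
            map[substr(from, i, 1)] = substr(to, j, 1)
        }
    }
    {
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            out = out ((c in map) ? map[c] : c)   # unmapped characters pass through
        }
        print out
    }'
}

echo "abcde" | tr_map_sketch abcd xy    # prints: xyyye
```

Here @samp{a} maps to @samp{x}, @samp{b} maps to @samp{y}, and the remaining ``from'' characters @samp{c} and @samp{d} also map to @samp{y}, the last character of the ``to'' list; @samp{e} is not in the ``from'' list and is left alone.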
- -The @command{translate} program demonstrates one of the few weaknesses -of standard @command{awk}: dealing with individual characters is very -painful, requiring repeated use of the @code{substr}, @code{index}, -and @code{gsub} built-in functions -(@pxref{String Functions, ,String Manipulation Functions}).@footnote{This -program was written before @command{gawk} acquired the ability to -split each character in a string into separate array elements.} -@c Exercise: How might you use this new feature to simplify the program? - -There are two functions. The first, @code{stranslate}, takes three -arguments: - -@table @code -@item from -A list of characters to translate from. - -@item to -A list of characters to translate to. - -@item target -The string to do the translation on. -@end table - -Associative arrays make the translation part fairly easy. @code{t_ar} holds -the ``to'' characters, indexed by the ``from'' characters. Then a simple -loop goes through @code{from}, one character at a time. For each character -in @code{from}, if the character appears in @code{target}, @code{gsub} -is used to change it to the corresponding @code{to} character. - -The @code{translate} function simply calls @code{stranslate} using @code{$0} -as the target. The main program sets two global variables, @code{FROM} and -@code{TO}, from the command line, and then changes @code{ARGV} so that -@command{awk} reads from the standard input. - -Finally, the processing rule simply calls @code{translate} for each record: - -@cindex @code{translate.awk} program -@example -@c file eg/prog/translate.awk -# translate.awk --- do tr-like stuff -@c endfile -@ignore -@c file eg/prog/translate.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# August 1989 - -@c endfile -@end ignore -@c file eg/prog/translate.awk -# Bugs: does not handle things like: tr A-Z a-z, it has -# to be spelled out. However, if `to' is shorter than `from', -# the last character in `to' is used for the rest of `from'. 
- -function stranslate(from, to, target, lf, lt, t_ar, i, c) -@{ - lf = length(from) - lt = length(to) - for (i = 1; i <= lt; i++) - t_ar[substr(from, i, 1)] = substr(to, i, 1) - if (lt < lf) - for (; i <= lf; i++) - t_ar[substr(from, i, 1)] = substr(to, lt, 1) - for (i = 1; i <= lf; i++) @{ - c = substr(from, i, 1) - if (index(target, c) > 0) - gsub(c, t_ar[c], target) - @} - return target -@} - -function translate(from, to) -@{ - return $0 = stranslate(from, to, $0) -@} - -# main program -BEGIN @{ -@group - if (ARGC < 3) @{ - print "usage: translate from to" > "/dev/stderr" - exit - @} -@end group - FROM = ARGV[1] - TO = ARGV[2] - ARGC = 2 - ARGV[1] = "-" -@} - -@{ - translate(FROM, TO) - print -@} -@c endfile -@end example - -While it is possible to do character transliteration in a user-level -function, it is not necessarily efficient, and we (the @command{gawk} -authors) started to consider adding a built-in function. However, -shortly after writing this program, we learned that the System V Release 4 -@command{awk} had added the @code{toupper} and @code{tolower} functions -(@pxref{String Functions, ,String Manipulation Functions}). -These functions handle the vast majority of the -cases where character transliteration is necessary, and so we chose to -simply add those functions to @command{gawk} as well and then leave well -enough alone. - -An obvious improvement to this program would be to set up the -@code{t_ar} array only once, in a @code{BEGIN} rule. However, this -assumes that the ``from'' and ``to'' lists -will never change throughout the lifetime of the program. - -@node Labels Program, Word Sorting, Translate Program, Miscellaneous Programs -@subsection Printing Mailing Labels - -Here is a ``real world''@footnote{``Real world'' is defined as -``a program actually used to get something done.''} -program. This -script reads lists of names and -addresses and generates mailing labels. Each page of labels has 20 labels -on it, two across and ten down. 
The addresses are guaranteed to be no more -than five lines of data. Each address is separated from the next by a blank -line. - -The basic idea is to read 20 labels worth of data. Each line of each label -is stored in the @code{line} array. The single rule takes care of filling -the @code{line} array and printing the page when 20 labels have been read. - -The @code{BEGIN} rule simply sets @code{RS} to the empty string, so that -@command{awk} splits records at blank lines -(@pxref{Records, ,How Input Is Split into Records}). -It sets @code{MAXLINES} to 100, since 100 is the maximum number -of lines on the page (20 * 5 = 100). - -Most of the work is done in the @code{printpage} function. -The label lines are stored sequentially in the @code{line} array. But they -have to print horizontally; @code{line[1]} next to @code{line[6]}, -@code{line[2]} next to @code{line[7]}, and so on. Two loops are used to -accomplish this. The outer loop, controlled by @code{i}, steps through -every 10 lines of data; this is each row of labels. The inner loop, -controlled by @code{j}, goes through the lines within the row. -As @code{j} goes from 0 to 4, @samp{i+j} is the @code{j}'th line in -the row, and @samp{i+j+5} is the entry next to it. The output ends up -looking something like this: - -@example -line 1 line 6 -line 2 line 7 -line 3 line 8 -line 4 line 9 -line 5 line 10 -@dots{} -@end example - -As a final note, an extra blank line is printed at lines 21 and 61, to keep -the output lined up on the labels. This is dependent on the particular -brand of labels in use when the program was written. You will also note -that there are two blank lines at the top and two blank lines at the bottom. 
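The index arithmetic of the two loops is easier to follow when only the indices are printed. The following sketch assumes nothing about the label data; the function name @code{label_pairs} and the @samp{|} separator are invented for illustration. For a given number of stored lines it prints which entry of @code{line} lands in the left column and which in the right column of each output row:

```shell
# Sketch only: show which line[] indices are printed side by side.
label_pairs() {   # $1 = number of stored lines (5 per label)
    awk -v nlines="$1" '
    BEGIN {
        for (i = 1; i <= nlines; i += 10)      # one row of two labels
            for (j = 0; j < 5; j++)            # the five lines of that row
                printf "line %d | line %d\n", i + j, i + j + 5
    }'
}

label_pairs 20
```

For 20 stored lines (four labels) the first row pairs lines 1 through 5 with lines 6 through 10, and the second row pairs lines 11 through 15 with lines 16 through 20.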
- -The @code{END} rule arranges to flush the final page of labels; there may -not have been an even multiple of 20 labels in the data: - -@cindex @code{labels.awk} program -@example -@c file eg/prog/labels.awk -# labels.awk --- print mailing labels -@c endfile -@ignore -@c file eg/prog/labels.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# June 1992 -@c endfile -@end ignore -@c file eg/prog/labels.awk - -# Each label is 5 lines of data that may have blank lines. -# The label sheets have 2 blank lines at the top and 2 at -# the bottom. - -BEGIN @{ RS = "" ; MAXLINES = 100 @} - -function printpage( i, j) -@{ - if (Nlines <= 0) - return - - printf "\n\n" # header - - for (i = 1; i <= Nlines; i += 10) @{ - if (i == 21 || i == 61) - print "" - for (j = 0; j < 5; j++) @{ - if (i + j > MAXLINES) - break - printf " %-41s %s\n", line[i+j], line[i+j+5] - @} - print "" - @} - - printf "\n\n" # footer - - for (i in line) - line[i] = "" -@} - -# main rule -@{ - if (Count >= 20) @{ - printpage() - Count = 0 - Nlines = 0 - @} - n = split($0, a, "\n") - for (i = 1; i <= n; i++) - line[++Nlines] = a[i] - for (; i <= 5; i++) - line[++Nlines] = "" - Count++ -@} - -END \ -@{ - printpage() -@} -@c endfile -@end example - -@node Word Sorting, History Sorting, Labels Program, Miscellaneous Programs -@subsection Generating Word Usage Counts - -@c NEXT ED: Rewrite this whole section and example -The following @command{awk} program prints -the number of occurrences of each word in its input. It illustrates the -associative nature of @command{awk} arrays by using strings as subscripts. It -also demonstrates the @samp{for @var{index} in @var{array}} mechanism. -Finally, it shows how @command{awk} is used in conjunction with other -utility programs to do a useful task of some complexity with a minimum of -effort. 
Some explanations follow the program listing: - -@example -# Print list of word frequencies -@{ - for (i = 1; i <= NF; i++) - freq[$i]++ -@} - -END @{ - for (word in freq) - printf "%s\t%d\n", word, freq[word] -@} -@end example - -@c Exercise: Use asort() here - -This program has two rules. The -first rule, because it has an empty pattern, is executed for every input line. -It uses @command{awk}'s field-accessing mechanism -(@pxref{Fields, ,Examining Fields}) to pick out the individual words from -the line, and the built-in variable @code{NF} (@pxref{Built-in Variables}) -to know how many fields are available. -For each input word, it increments an element of the array @code{freq} to -reflect that the word has been seen an additional time. - -The second rule, because it has the pattern @code{END}, is not executed -until the input has been exhausted. It prints out the contents of the -@code{freq} table that has been built up inside the first action. -This program has several problems that would prevent it from being -useful by itself on real text files: - -@itemize @bullet -@item -Words are detected using the @command{awk} convention that fields are -separated just by whitespace. Other characters in the input (except -newlines) don't have any special meaning to @command{awk}. This means that -punctuation characters count as part of words. - -@item -The @command{awk} language considers upper- and lowercase characters to be -distinct. Therefore, ``bartender'' and ``Bartender'' are not treated -as the same word. This is undesirable, since in normal text, words -are capitalized if they begin sentences, and a frequency analyzer should not -be sensitive to capitalization. - -@item -The output does not come out in any useful order. You're more likely to be -interested in which words occur most frequently or in having an alphabetized -table of how frequently each word occurs. 
-@end itemize - -@cindex @command{sort} utility -The way to solve these problems is to use some of @command{awk}'s more advanced -features. First, we use @code{tolower} to remove -case distinctions. Next, we use @code{gsub} to remove punctuation -characters. Finally, we use the system @command{sort} utility to process the -output of the @command{awk} script. Here is the new version of -the program: - -@cindex @code{wordfreq.awk} program -@example -@c file eg/prog/wordfreq.awk -# wordfreq.awk --- print list of word frequencies - -@{ - $0 = tolower($0) # remove case distinctions - # remove punctuation - gsub(/[^[:alnum:]_[:blank:]]/, "", $0) - for (i = 1; i <= NF; i++) - freq[$i]++ -@} - -END @{ - for (word in freq) - printf "%s\t%d\n", word, freq[word] -@} -@c endfile -@end example - -Assuming we have saved this program in a file named @file{wordfreq.awk}, -and that the data is in @file{file1}, the following pipeline: - -@example -awk -f wordfreq.awk file1 | sort +1 -nr -@end example - -@noindent -produces a table of the words appearing in @file{file1} in order of -decreasing frequency. The @command{awk} program suitably massages the -data and produces a word frequency table, which is not ordered. - -The @command{awk} script's output is then sorted by the @command{sort} -utility and printed on the terminal. The options given to @command{sort} -specify a sort that uses the second field of each input line (skipping -one field), that the sort keys should be treated as numeric quantities -(otherwise @samp{15} would come before @samp{5}), and that the sorting -should be done in descending (reverse) order. 
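The @samp{+1} notation is the historical way of telling @command{sort} to skip one field, and some current @command{sort} implementations no longer accept it. The following sketch shows the same pipeline written with the POSIX @option{-k} key option instead. It is an illustration under assumptions, not a replacement for @file{wordfreq.awk}: the function name @code{wordfreq_sorted} is invented, and the bracket expression is an ASCII-only stand-in for the character classes used above, chosen so that the sketch also runs on @command{awk} versions without those classes:

```shell
# Sketch only: word frequencies sorted by decreasing count, using sort -k.
wordfreq_sorted() {
    awk '
    {
        $0 = tolower($0)                  # remove case distinctions
        gsub(/[^a-z0-9_ \t]/, "")         # ASCII-only stand-in: drop punctuation
        for (i = 1; i <= NF; i++)
            freq[$i]++
    }
    END {
        for (word in freq)
            printf "%s\t%d\n", word, freq[word]
    }' "$@" | sort -k 2,2nr               # key is field 2, numeric, reversed
}

printf 'The cat. The dog! the end\n' | wordfreq_sorted
```

The most frequent word, @samp{the} with a count of 3, comes out first. The relative order of words with equal counts is left to @command{sort}.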
- -The @command{sort} could even be done from within the program, by changing -the @code{END} action to: - -@example -@c file eg/prog/wordfreq.awk -END @{ - sort = "sort +1 -nr" - for (word in freq) - printf "%s\t%d\n", word, freq[word] | sort - close(sort) -@} -@c endfile -@end example - -This way of sorting must be used on systems that do not -have true pipes at the command-line (or batch-file) level. -See the general operating system documentation for more information on how -to use the @command{sort} program. - -@node History Sorting, Extract Program, Word Sorting, Miscellaneous Programs -@subsection Removing Duplicates from Unsorted Text - -The @command{uniq} program -(@pxref{Uniq Program, ,Printing Non-Duplicated Lines of Text}), -removes duplicate lines from @emph{sorted} data. - -Suppose, however, you need to remove duplicate lines from a @value{DF} but -that you want to preserve the order the lines are in. A good example of -this might be a shell history file. The history file keeps a copy of all -the commands you have entered, and it is not unusual to repeat a command -several times in a row. Occasionally you might want to compact the history -by removing duplicate entries. Yet it is desirable to maintain the order -of the original commands. - -This simple program does the job. It uses two arrays. The @code{data} -array is indexed by the text of each line. -For each line, @code{data[$0]} is incremented. -If a particular line has not -been seen before, then @code{data[$0]} is zero. -In this case, the text of the line is stored in @code{lines[count]}. -Each element of @code{lines} is a unique command, and the indices of -@code{lines} indicate the order in which those lines are encountered. 
-The @code{END} rule simply prints out the lines, in order: - -@cindex Rakitzis, Byron -@cindex @code{histsort.awk} program -@example -@c file eg/prog/histsort.awk -# histsort.awk --- compact a shell history file -# Thanks to Byron Rakitzis for the general idea -@c endfile -@ignore -@c file eg/prog/histsort.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 - -@c endfile -@end ignore -@c file eg/prog/histsort.awk -@group -@{ - if (data[$0]++ == 0) - lines[++count] = $0 -@} -@end group - -END @{ - for (i = 1; i <= count; i++) - print lines[i] -@} -@c endfile -@end example - -This program also provides a foundation for generating other useful -information. For example, using the following @code{print} statement in the -@code{END} rule indicates how often a particular command is used: - -@example -print data[lines[i]], lines[i] -@end example - -This works because @code{data[$0]} is incremented each time a line is -seen. - -@node Extract Program, Simple Sed, History Sorting, Miscellaneous Programs -@subsection Extracting Programs from Texinfo Source Files - -@ifnotinfo -Both this chapter and the previous chapter -(@ref{Library Functions, ,A Library of @command{awk} Functions}) -present a large number of @command{awk} programs. -@end ifnotinfo -@ifinfo -The nodes -@ref{Library Functions, ,A Library of @command{awk} Functions}, -and @ref{Sample Programs, ,Practical @command{awk} Programs}, -are the top level nodes for a large number of @command{awk} programs. -@end ifinfo -If you want to experiment with these programs, it is tedious to have to type -them in by hand. Here we present a program that can extract parts of a -Texinfo input file into separate files. - -@cindex Texinfo -This @value{DOCUMENT} is written in Texinfo, the GNU project's document -formatting -language. -A single Texinfo source file can be used to produce both -printed and online documentation. 
-@ifnotinfo -Texinfo is fully documented in the book -@cite{Texinfo---The GNU Documentation Format}, -available from the Free Software Foundation. -@end ifnotinfo -@ifinfo -The Texinfo language is described fully, starting with -@ref{Top}. -@end ifinfo - -For our purposes, it is enough to know three things about Texinfo input -files: - -@itemize @bullet -@item -The ``at'' symbol (@samp{@@}) is special in Texinfo, much as -the backslash (@samp{\}) is in C -or @command{awk}. Literal @samp{@@} symbols are represented in Texinfo source -files as @samp{@@@@}. - -@item -Comments start with either @samp{@@c} or @samp{@@comment}. -The file extraction program works by using special comments that start -at the beginning of a line. - -@item -Lines containing @samp{@@group} and @samp{@@end group} commands bracket -example text that should not be split across a page boundary. -(Unfortunately, @TeX{} isn't always smart enough to do things exactly right -and we have to give it some help.) -@end itemize - -The following program, @file{extract.awk}, reads through a Texinfo source -file and does two things, based on the special comments. -Upon seeing @samp{@w{@@c system @dots{}}}, -it runs a command, by extracting the command text from the -control line and passing it on to the @code{system} function -(@pxref{I/O Functions, ,Input/Output Functions}). -Upon seeing @samp{@@c file @var{filename}}, each subsequent line is sent to -the file @var{filename}, until @samp{@@c endfile} is encountered. -The rules in @file{extract.awk} match either @samp{@@c} or -@samp{@@comment} by letting the @samp{omment} part be optional. -Lines containing @samp{@@group} and @samp{@@end group} are simply removed. -@file{extract.awk} uses the @code{join} library function -(@pxref{Join Function, ,Merging an Array into a String}). - -The example programs in the online Texinfo source for @cite{@value{TITLE}} -(@file{gawk.texi}) have all been bracketed inside @samp{file} and -@samp{endfile} lines. 
The @command{gawk} distribution uses a copy of -@file{extract.awk} to extract the sample programs and install many -of them in a standard directory where @command{gawk} can find them. -The Texinfo file looks something like this: - -@example -@dots{} -This program has a @@code@{BEGIN@} rule, -that prints a nice message: - -@@example -@@c file examples/messages.awk -BEGIN @@@{ print "Don't panic!" @@@} -@@c endfile -@@end example - -It also prints some final advice: - -@@example -@@c file examples/messages.awk -END @@@{ print "Always avoid bored archeologists!" @@@} -@@c endfile -@@end example -@dots{} -@end example - -@file{extract.awk} begins by setting @code{IGNORECASE} to one, so that -mixed upper- and lowercase letters in the directives won't matter. - -The first rule handles calling @code{system}, checking that a command is -given (@code{NF} is at least three) and also checking that the command -exits with a zero exit status, signifying OK: - -@cindex @code{extract.awk} program -@example -@c file eg/prog/extract.awk -# extract.awk --- extract files and run programs -# from texinfo files -@c endfile -@ignore -@c file eg/prog/extract.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# May 1993 -# Revised September 2000 - -@c endfile -@end ignore -@c file eg/prog/extract.awk -BEGIN @{ IGNORECASE = 1 @} - -/^@@c(omment)?[ \t]+system/ \ -@{ - if (NF < 3) @{ - e = (FILENAME ":" FNR) - e = (e ": badly formed `system' line") - print e > "/dev/stderr" - next - @} - $1 = "" - $2 = "" - stat = system($0) - if (stat != 0) @{ - e = (FILENAME ":" FNR) - e = (e ": warning: system returned " stat) - print e > "/dev/stderr" - @} -@} -@c endfile -@end example - -@noindent -The variable @code{e} is used so that the rule -fits nicely on the -@ifnotinfo -page. -@end ifnotinfo -@ifnottex -screen. -@end ifnottex - -The second rule handles moving data into files. It verifies that a -@value{FN} is given in the directive.
If the file named is not the -current file, then the current file is closed. Keeping the current file -open until a new file is encountered allows the use of the @samp{>} -redirection for printing the contents, keeping open file management -simple. - -The @samp{for} loop does the work. It reads lines using @code{getline} -(@pxref{Getline, ,Explicit Input with @code{getline}}). -For an unexpected end of file, it calls the @code{@w{unexpected_eof}} -function. If the line is an ``endfile'' line, then it breaks out of -the loop. -If the line is an @samp{@@group} or @samp{@@end group} line, then it -ignores it and goes on to the next line. -Similarly, comments within examples are also ignored. - -Most of the work is in the following few lines. If the line has no @samp{@@} -symbols, the program can print it directly. -Otherwise, each leading @samp{@@} must be stripped off. -To remove the @samp{@@} symbols, the line is split into separate elements of -the array @code{a}, using the @code{split} function -(@pxref{String Functions, ,String Manipulation Functions}). -The @samp{@@} symbol is used as the separator character. -Each element of @code{a} that is empty indicates two successive @samp{@@} -symbols in the original line. For each two empty elements (@samp{@@@@} in -the original file), we have to add a single @samp{@@} symbol back in. - -When the processing of the array is finished, @code{join} is called with the -value of @code{SUBSEP}, to rejoin the pieces back into a single -line. 
That line is then printed to the output file: - -@example -@c file eg/prog/extract.awk -/^@@c(omment)?[ \t]+file/ \ -@{ - if (NF != 3) @{ - e = (FILENAME ":" FNR ": badly formed `file' line") - print e > "/dev/stderr" - next - @} - if ($3 != curfile) @{ - if (curfile != "") - close(curfile) - curfile = $3 - @} - - for (;;) @{ - if ((getline line) <= 0) - unexpected_eof() - if (line ~ /^@@c(omment)?[ \t]+endfile/) - break - else if (line ~ /^@@(end[ \t]+)?group/) - continue - else if (line ~ /^@@c(omment)?[ \t]+/) - continue - if (index(line, "@@") == 0) @{ - print line > curfile - continue - @} - n = split(line, a, "@@") - # if a[1] == "", means leading @@, - # don't add one back in. - for (i = 2; i <= n; i++) @{ - if (a[i] == "") @{ # was an @@@@ - a[i] = "@@" - if (a[i+1] == "") - i++ - @} - @} - print join(a, 1, n, SUBSEP) > curfile - @} -@} -@c endfile -@end example - -An important thing to note is the use of the @samp{>} redirection. -Output done with @samp{>} only opens the file once; it stays open and -subsequent output is appended to the file -(@pxref{Redirection, , Redirecting Output of @code{print} and @code{printf}}). -This makes it easy to mix program text and explanatory prose for the same -sample source file (as has been done here!) without any hassle. The file is -only closed when a new data @value{FN} is encountered or at the end of the -input file. - -Finally, the function @code{@w{unexpected_eof}} prints an appropriate -error message and then exits. -The @code{END} rule handles the final cleanup, closing the open file: - -@c function lb put on same line for page breaking.
sigh -@example -@c file eg/prog/extract.awk -@group -function unexpected_eof() @{ - printf("%s:%d: unexpected EOF or error\n", - FILENAME, FNR) > "/dev/stderr" - exit 1 -@} -@end group - -END @{ - if (curfile) - close(curfile) -@} -@c endfile -@end example - -@node Simple Sed, Igawk Program, Extract Program, Miscellaneous Programs -@subsection A Simple Stream Editor - -@cindex @command{sed} utility -@cindex stream editor -The @command{sed} utility is a ``stream editor,'' a program that reads a -stream of data, makes changes to it, and passes it on. -It is often used to make global changes to a large file or to a stream -of data generated by a pipeline of commands. -While @command{sed} is a complicated program in its own right, its most common -use is to perform global substitutions in the middle of a pipeline: - -@example -command1 < orig.data | sed 's/old/new/g' | command2 > result -@end example - -Here, @samp{s/old/new/g} tells @command{sed} to look for the regexp -@samp{old} on each input line and globally replace it with the text -@samp{new}, (i.e., all the occurrences on a line). This is similar to -@command{awk}'s @code{gsub} function -(@pxref{String Functions, ,String Manipulation Functions}). - -The following program, @file{awksed.awk}, accepts at least two command-line -arguments: the pattern to look for and the text to replace it with. Any -additional arguments are treated as data @value{FN}s to process. 
If none -are provided, the standard input is used: - -@cindex Brennan, Michael -@cindex @command{awksed.awk} program -@cindex simple stream editor -@cindex stream editor, simple -@example -@c file eg/prog/awksed.awk -# awksed.awk --- do s/foo/bar/g using just print -# Thanks to Michael Brennan for the idea -@c endfile -@ignore -@c file eg/prog/awksed.awk -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# August 1995 - -@c endfile -@end ignore -@c file eg/prog/awksed.awk -function usage() -@{ - print "usage: awksed pat repl [files...]" > "/dev/stderr" - exit 1 -@} - -BEGIN @{ - # validate arguments - if (ARGC < 3) - usage() - - RS = ARGV[1] - ORS = ARGV[2] - - # don't use arguments as files - ARGV[1] = ARGV[2] = "" -@} - -@group -# look ma, no hands! -@{ - if (RT == "") - printf "%s", $0 - else - print -@} -@end group -@c endfile -@end example - -The program relies on @command{gawk}'s ability to have @code{RS} be a regexp, -as well as on the setting of @code{RT} to the actual text that terminates the -record (@pxref{Records, ,How Input Is Split into Records}). - -The idea is to have @code{RS} be the pattern to look for. @command{gawk} -automatically sets @code{$0} to the text between matches of the pattern. -This is text that we want to keep, unmodified. Then, by setting @code{ORS} -to the replacement text, a simple @code{print} statement outputs the -text we want to keep, followed by the replacement text. - -There is one wrinkle to this scheme, which is what to do if the last record -doesn't end with text that matches @code{RS}. Using a @code{print} -statement unconditionally prints the replacement text, which is not correct. -However, if the file did not end in text that matches @code{RS}, @code{RT} -is set to the null string. In this case, we can print @code{$0} using -@code{printf} -(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}). 
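To make the @code{RS}/@code{ORS}/@code{RT} idea concrete, here is a small sketch of the same technique wrapped in a shell function. It is an illustration and not part of @file{awksed.awk}: the names @code{gawk_subst}, @code{pat}, and @code{repl} are hypothetical, and the pattern and replacement are passed with @option{-v} instead of through @code{ARGV}, which means escape sequences in them are processed. It assumes @command{gawk} is installed, since @code{RT} is a @command{gawk} extension.

```shell
# gawk_subst PAT REPL: copy standard input to standard output, replacing
# each match of the regexp PAT with REPL.
# Assumption: gawk is available; RT is not set by other awk versions.
gawk_subst() {
    gawk -v pat="$1" -v repl="$2" '
        BEGIN { RS = pat; ORS = repl }   # records are the text between matches
        {
            if (RT == "")                # last record: no terminator matched
                printf "%s", $0
            else
                print                    # kept text, then ORS (the replacement)
        }
    '
}

# Example use (commented out so the definition can be sourced on its own):
# printf 'one old gold\n' | gawk_subst old new
```

Note that when @code{RS} is a single character, @command{gawk} treats it as a literal record separator and not as a regexp, so the sketch is best tried with patterns longer than one character.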
- -The @code{BEGIN} rule handles the setup, checking for the right number -of arguments and calling @code{usage} if there is a problem. Then it sets -@code{RS} and @code{ORS} from the command-line arguments and sets -@code{ARGV[1]} and @code{ARGV[2]} to the null string, so that they are -not treated as @value{FN}s -(@pxref{ARGC and ARGV, , Using @code{ARGC} and @code{ARGV}}). - -The @code{usage} function prints an error message and exits. -Finally, the single rule handles the printing scheme outlined above, -using @code{print} or @code{printf} as appropriate, depending upon the -value of @code{RT}. - -@ignore -Exercise, compare the performance of this version with the more -straightforward: - -BEGIN { - pat = ARGV[1] - repl = ARGV[2] - ARGV[1] = ARGV[2] = "" -} - -{ gsub(pat, repl); print } - -Exercise: what are the advantages and disadvantages of this version vs. sed? - Advantage: egrep regexps - speed (?) - Disadvantage: no & in replacement text - -Others? -@end ignore - -@node Igawk Program, , Simple Sed, Miscellaneous Programs -@subsection An Easy Way to Use Library Functions - -Using library functions in @command{awk} can be very beneficial. It -encourages code reuse and the writing of general functions. Programs are -smaller and therefore clearer. -However, using library functions is only easy when writing @command{awk} -programs; it is painful when running them, requiring multiple @option{-f} -options. If @command{gawk} is unavailable, then so too is the @env{AWKPATH} -environment variable and the ability to put @command{awk} functions into a -library directory (@pxref{Options, ,Command-Line Options}). -It would be nice to be able to write programs in the following manner: - -@example -# library functions -@@include getopt.awk -@@include join.awk -@dots{} - -# main program -BEGIN @{ - while ((c = getopt(ARGC, ARGV, "a:b:cde")) != -1) - @dots{} - @dots{} -@} -@end example - -The following program, @file{igawk.sh}, provides this service. 
-It simulates @command{gawk}'s searching of the @env{AWKPATH} variable -and also allows @dfn{nested} includes; i.e., a file that is included -with @samp{@@include} can contain further @samp{@@include} statements. -@command{igawk} makes an effort to only include files once, so that nested -includes don't accidentally include a library function twice. - -@command{igawk} should behave just like @command{gawk} externally. This -means it should accept all of @command{gawk}'s command-line arguments, -including the ability to have multiple source files specified via -@option{-f}, and the ability to mix command-line and library source files. - -The program is written using the POSIX Shell (@command{sh}) command language. -The way the program works is as follows: - -@enumerate -@item -Loop through the arguments, saving anything that doesn't represent -@command{awk} source code for later, when the expanded program is run. - -@item -For any arguments that do represent @command{awk} text, put the arguments into -a temporary file that will be expanded. There are two cases: - -@enumerate a -@item -Literal text, provided with @option{--source} or @option{--source=}. This -text is just echoed directly. The @command{echo} program automatically -supplies a trailing newline. - -@item -Source @value{FN}s provided with @option{-f}. We use a neat trick and echo -@samp{@@include @var{filename}} into the temporary file. Since the file -inclusion program works the way @command{gawk} does, this gets the text -of the file included into the program at the correct point. -@end enumerate - -@item -Run an @command{awk} program (naturally) over the temporary file to expand -@samp{@@include} statements. The expanded program is placed in a second -temporary file. - -@item -Run the expanded program with @command{gawk} and any other original command-line -arguments that the user supplied (such as the data @value{FN}s). 
-@end enumerate - -The initial part of the program turns on shell tracing if the first -argument is @samp{debug}. Otherwise, a shell @code{trap} statement -arranges to clean up any temporary files on program exit or upon a -hangup, interrupt, quit, or termination signal. - -@c 2e: For the temp file handling, go with Darrel's ig=${TMP:-/tmp}/igs.$$ -@c 2e: or something as similar as possible. - -The next part loops through all the command-line arguments. -There are several cases of interest: - -@table @code -@item -- -This ends the arguments to @command{igawk}. Anything else should be passed on -to the user's @command{awk} program without being evaluated. - -@item -W -This indicates that the next option is specific to @command{gawk}. To make -argument processing easier, the @option{-W} is prepended to the -remaining arguments and the loop continues. (This is an @command{sh} -programming trick. Don't worry about it if you are not familiar with -@command{sh}.) - -@item -v@r{,} -F -These are saved and passed on to @command{gawk}. - -@item -f@r{,} --file@r{,} --file=@r{,} -Wfile= -The @value{FN} is saved to the temporary file @file{/tmp/ig.s.$$} with an -@samp{@@include} statement. -The @command{sed} utility is used to remove the leading option part of the -argument (e.g., @samp{--file=}). - -@item --source@r{,} --source=@r{,} -Wsource= -The source text is echoed into @file{/tmp/ig.s.$$}. - -@item --version@r{,} -Wversion -@command{igawk} prints its version number, runs @samp{gawk --version} -to get the @command{gawk} version information, and then exits. -@end table - -If none of the @option{-f}, @option{--file}, @option{-Wfile}, @option{--source}, -or @option{-Wsource} arguments are supplied, then the first non-option argument -should be the @command{awk} program. If there are no command-line -arguments left, @command{igawk} prints an error message and exits. -Otherwise, the first argument is echoed into @file{/tmp/ig.s.$$}.
-In any case, after the arguments have been processed, -@file{/tmp/ig.s.$$} contains the complete text of the original @command{awk} -program. - -@cindex @command{sed} utility -@cindex stream editor -The @samp{$$} in @command{sh} represents the current process ID number. -It is often used in shell programs to generate unique temporary @value{FN}s. -This allows multiple users to run @command{igawk} without worrying -that the temporary @value{FN}s will clash. -The program is as follows: - -@cindex @code{igawk.sh} program -@example -@c file eg/prog/igawk.sh -#! /bin/sh -# igawk --- like gawk but do @@include processing -@c endfile -@ignore -@c file eg/prog/igawk.sh -# -# Arnold Robbins, arnold@@gnu.org, Public Domain -# July 1993 - -@c endfile -@end ignore -@c file eg/prog/igawk.sh -if [ "$1" = debug ] -then - set -x - shift -else - # cleanup on exit, hangup, interrupt, quit, termination - trap 'rm -f /tmp/ig.[se].$$' 0 1 2 3 15 -fi - -while [ $# -ne 0 ] # loop over arguments -do - case $1 in - --) shift; break;; - - -W) shift - set -- -W"$@@" - continue;; - - -[vF]) opts="$opts $1 '$2'" - shift;; - - -[vF]*) opts="$opts '$1'" ;; - - -f) echo @@include "$2" >> /tmp/ig.s.$$ - shift;; - - -f*) f=`echo "$1" | sed 's/-f//'` - echo @@include "$f" >> /tmp/ig.s.$$ ;; - - -?file=*) # -Wfile or --file - f=`echo "$1" | sed 's/-.file=//'` - echo @@include "$f" >> /tmp/ig.s.$$ ;; - - -?file) # get arg, $2 - echo @@include "$2" >> /tmp/ig.s.$$ - shift;; - - -?source=*) # -Wsource or --source - t=`echo "$1" | sed 's/-.source=//'` - echo "$t" >> /tmp/ig.s.$$ ;; - - -?source) # get arg, $2 - echo "$2" >> /tmp/ig.s.$$ - shift;; - - -?version) - echo igawk: version 1.0 1>&2 - gawk --version - exit 0 ;; - - -[W-]*) opts="$opts '$1'" ;; - - *) break;; - esac - shift -done - -if [ ! -s /tmp/ig.s.$$ ] -then -@group - if [ -z "$1" ] - then - echo igawk: no program! 
1>&2 - exit 1 -@end group - else - echo "$1" > /tmp/ig.s.$$ - shift - fi -fi - -# at this point, /tmp/ig.s.$$ has the program -@c endfile -@end example - -The @command{awk} program to process @samp{@@include} directives -reads through the program, one line at a time, using @code{getline} -(@pxref{Getline, ,Explicit Input with @code{getline}}). The input -@value{FN}s and @samp{@@include} statements are managed using a stack. -As each @samp{@@include} is encountered, the current @value{FN} is -``pushed'' onto the stack and the file named in the @samp{@@include} -directive becomes the current @value{FN}. As each file is finished, -the stack is ``popped,'' and the previous input file becomes the current -input file again. The process is started by making the original file -the first one on the stack. - -The @code{pathto} function does the work of finding the full path to -a file. It simulates @command{gawk}'s behavior when searching the -@env{AWKPATH} environment variable -(@pxref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}). -If a @value{FN} has a @samp{/} in it, no path search is done. Otherwise, -the @value{FN} is concatenated with the name of each directory in -the path, and an attempt is made to open the generated @value{FN}. -The only way to test if a file can be read in @command{awk} is to go -ahead and try to read it with @code{getline}; this is what @code{pathto} -does.@footnote{On some very old versions of @command{awk}, the test -@samp{getline junk < t} can loop forever if the file exists but is empty. -Caveat emptor.} If the file can be read, it is closed and the @value{FN} -is returned: - -@ignore -An alternative way to test for the file's existence would be to call -@samp{system("test -r " t)}, which uses the @command{test} utility to -see if the file exists and is readable. The disadvantage to this method -is that it requires creating an extra process and can thus be slightly -slower. 
-@end ignore - -@example -@c file eg/prog/igawk.sh -gawk -- ' -# process @@include directives - -function pathto(file, i, t, junk) -@{ - if (index(file, "/") != 0) - return file - - for (i = 1; i <= ndirs; i++) @{ - t = (pathlist[i] "/" file) -@group - if ((getline junk < t) > 0) @{ - # found it - close(t) - return t - @} -@end group - @} - return "" -@} -@c endfile -@end example - -The main program is contained inside one @code{BEGIN} rule. The first thing it -does is set up the @code{pathlist} array that @code{pathto} uses. After -splitting the path on @samp{:}, null elements are replaced with @code{"."}, -which represents the current directory: - -@example -@c file eg/prog/igawk.sh -BEGIN @{ - path = ENVIRON["AWKPATH"] - ndirs = split(path, pathlist, ":") - for (i = 1; i <= ndirs; i++) @{ - if (pathlist[i] == "") - pathlist[i] = "." - @} -@c endfile -@end example - -The stack is initialized with @code{ARGV[1]}, which will be @file{/tmp/ig.s.$$}. -The main loop comes next. Input lines are read in succession. Lines that -do not start with @samp{@@include} are printed verbatim. -If the line does start with @samp{@@include}, the @value{FN} is in @code{$2}. -@code{pathto} is called to generate the full path. If it cannot, then we -print an error message and continue. - -The next thing to check is if the file is included already. The -@code{processed} array is indexed by the full @value{FN} of each included -file and it tracks this information for us. If the file is -seen again, a warning message is printed. Otherwise, the new @value{FN} is -pushed onto the stack and processing continues. - -Finally, when @code{getline} encounters the end of the input file, the file -is closed and the stack is popped. 
When @code{stackptr} is less than zero, -the program is done: - -@example -@c file eg/prog/igawk.sh - stackptr = 0 - input[stackptr] = ARGV[1] # ARGV[1] is first file - - for (; stackptr >= 0; stackptr--) @{ - while ((getline < input[stackptr]) > 0) @{ - if (tolower($1) != "@@include") @{ - print - continue - @} - fpath = pathto($2) -@group - if (fpath == "") @{ - printf("igawk:%s:%d: cannot find %s\n", - input[stackptr], FNR, $2) > "/dev/stderr" - continue - @} -@end group - if (! (fpath in processed)) @{ - processed[fpath] = input[stackptr] - input[++stackptr] = fpath # push onto stack - @} else - print $2, "included in", input[stackptr], - "already included in", - processed[fpath] > "/dev/stderr" - @} - close(input[stackptr]) - @} -@}' /tmp/ig.s.$$ > /tmp/ig.e.$$ -@c endfile -@end example - -The last step is to call @command{gawk} with the expanded program, -along with the original -options and command-line arguments that the user supplied. @command{gawk}'s -exit status is passed back on to @command{igawk}'s calling program: - -@c this causes more problems than it solves, so leave it out. -@ignore -The special file @file{/dev/null} is passed as a @value{DF} to @command{gawk} -to handle an interesting case. Suppose that the user's program only has -a @code{BEGIN} rule and there are no @value{DF}s to read. -The program should exit without reading any @value{DF}s. -However, suppose that an included library file defines an @code{END} -rule of its own. In this case, @command{gawk} will hang, reading standard -input. In order to avoid this, @file{/dev/null} is explicitly added to the -command-line. Reading from @file{/dev/null} always returns an immediate -end of file indication. - -@c Hmm. Add /dev/null if $# is 0? Still messes up ARGV. Sigh. -@end ignore - -@example -@c file eg/prog/igawk.sh -eval gawk -f /tmp/ig.e.$$ $opts -- "$@@" - -exit $? -@c endfile -@end example - -This version of @command{igawk} represents my third attempt at this program. 
-There are three key simplifications that make the program work better: - -@itemize @bullet -@item -Using @samp{@@include} even for the files named with @option{-f} makes building -the initial collected @command{awk} program much simpler; all the -@samp{@@include} processing can be done once. - -@item -The @code{pathto} function doesn't try to save the line read with -@code{getline} when testing for the file's accessibility. Trying to save -this line for use with the main program complicates things considerably. -@c what problem does this engender though - exercise -@c answer, reading from "-" or /dev/stdin - -@item -Using a @code{getline} loop in the @code{BEGIN} rule does it all in one -place. It is not necessary to call out to a separate loop for processing -nested @samp{@@include} statements. -@end itemize - -Also, this program illustrates that it is often worthwhile to combine -@command{sh} and @command{awk} programming together. You can usually -accomplish quite a lot, without having to resort to low-level programming -in C or C++, and it is frequently easier to do certain kinds of string -and argument manipulation using the shell than it is in @command{awk}. - -Finally, @command{igawk} shows that it is not always necessary to add new -features to a program; they can often be layered on top. With @command{igawk}, -there is no real reason to build @samp{@@include} processing into -@command{gawk} itself. - -@cindex search path -@cindex directory search -@cindex path, search -@cindex search path, for source files -As an additional example of this, consider the idea of having two -files in a directory in the search path: - -@table @file -@item default.awk -This file contains a set of default library functions, such -as @code{getopt} and @code{assert}. - -@item site.awk -This file contains library functions that are specific to a site or -installation; i.e., locally developed functions. 
-Having a separate file allows @file{default.awk} to change with -new @command{gawk} releases, without requiring the system administrator to -update it each time by adding the local functions. -@end table - -One user -@c Karl Berry, karl@ileaf.com, 10/95 -suggested that @command{gawk} be modified to automatically read these files -upon startup. Instead, it would be very simple to modify @command{igawk} -to do this. Since @command{igawk} can process nested @samp{@@include} -directives, @file{default.awk} could simply contain @samp{@@include} -statements for the desired library functions. - -@c Exercise: make this change - -@ignore -@c Try this -@iftex -@page -@headings off -@majorheading III@ @ @ Appendixes -Part III provides the appendixes, the Glossary, and two licenses that cover -the @command{gawk} source code and this @value{DOCUMENT}, respectively. -It contains the following appendixes: - -@itemize @bullet -@item -@ref{Language History, ,The Evolution of the @command{awk} Language}. - -@item -@ref{Installation, ,Installing @command{gawk}}. - -@item -@ref{Notes, ,Implementation Notes}. - -@item -@ref{Basic Concepts, ,Basic Programming Concepts}. - -@item -@ref{Glossary}. - -@item -@ref{Copying, ,GNU General Public License}. - -@item -@ref{GNU Free Documentation License}. -@end itemize - -@page -@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @| -@oddheading @| @| @strong{@thischapter}@ @ @ @thispage -@end iftex -@end ignore - -@node Language History, Installation, Sample Programs, Top -@appendix The Evolution of the @command{awk} Language - -This @value{DOCUMENT} describes the GNU implementation of @command{awk}, which follows -the POSIX specification. -Many long-time @command{awk} users learned @command{awk} programming -with the original @command{awk} implementation in Version 7 Unix. -(This implementation was the basis for @command{awk} in Berkeley Unix, -through 4.3--Reno. 
Subsequent versions of Berkeley Unix, and systems -derived from 4.4BSD--Lite, use various versions of @command{gawk} -for their @command{awk}.) -This @value{CHAPTER} briefly describes the -evolution of the @command{awk} language, with cross references to other parts -of the @value{DOCUMENT} where you can find more information. - -@menu -* V7/SVR3.1:: The major changes between V7 and System V - Release 3.1. -* SVR4:: Minor changes between System V Releases 3.1 - and 4. -* POSIX:: New features from the POSIX standard. -* BTL:: New features from the Bell Laboratories - version of @command{awk}. -* POSIX/GNU:: The extensions in @command{gawk} not in POSIX - @command{awk}. -* Contributors:: The major contributors to @command{gawk}. -@end menu - -@node V7/SVR3.1, SVR4, Language History, Language History -@appendixsec Major Changes Between V7 and SVR3.1 - -The @command{awk} language evolved considerably between the release of -Version 7 Unix (1978) and the new version that was first made generally available in -System V Release 3.1 (1987). This @value{SECTION} summarizes the changes, with -cross-references to further details: - -@itemize @bullet -@item -The requirement for @samp{;} to separate rules on a line -(@pxref{Statements/Lines, ,@command{awk} Statements Versus Lines}). - -@item -User-defined functions and the @code{return} statement -(@pxref{User-defined, ,User-Defined Functions}). - -@item -The @code{delete} statement (@pxref{Delete, ,The @code{delete} Statement}). - -@item -The @code{do}-@code{while} statement -(@pxref{Do Statement, ,The @code{do}-@code{while} Statement}). - -@item -The built-in functions @code{atan2}, @code{cos}, @code{sin}, @code{rand}, and -@code{srand} (@pxref{Numeric Functions}). - -@item -The built-in functions @code{gsub}, @code{sub}, and @code{match} -(@pxref{String Functions, ,String Manipulation Functions}). - -@item -The built-in functions @code{close} and @code{system} -(@pxref{I/O Functions, ,Input/Output Functions}). 
- -@item -The @code{ARGC}, @code{ARGV}, @code{FNR}, @code{RLENGTH}, @code{RSTART}, -and @code{SUBSEP} built-in variables (@pxref{Built-in Variables}). - -@item -The conditional expression using the ternary operator @samp{?:} -(@pxref{Conditional Exp, ,Conditional Expressions}). - -@item -The exponentiation operator @samp{^} -(@pxref{Arithmetic Ops, ,Arithmetic Operators}) and its assignment operator -form @samp{^=} (@pxref{Assignment Ops, ,Assignment Expressions}). - -@item -C-compatible operator precedence, which breaks some old @command{awk} -programs (@pxref{Precedence, ,Operator Precedence (How Operators Nest)}). - -@item -Regexps as the value of @code{FS} -(@pxref{Field Separators, ,Specifying How Fields Are Separated}) and as the -third argument to the @code{split} function -(@pxref{String Functions, ,String Manipulation Functions}). - -@item -Dynamic regexps as operands of the @samp{~} and @samp{!~} operators -(@pxref{Regexp Usage, ,How to Use Regular Expressions}). - -@item -The escape sequences @samp{\b}, @samp{\f}, and @samp{\r} -(@pxref{Escape Sequences}). -(Some vendors have updated their old versions of @command{awk} to -recognize @samp{\b}, @samp{\f}, and @samp{\r}, but this is not -something you can rely on.) - -@item -Redirection of input for the @code{getline} function -(@pxref{Getline, ,Explicit Input with @code{getline}}). - -@item -Multiple @code{BEGIN} and @code{END} rules -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). - -@item -Multidimensional arrays -(@pxref{Multi-dimensional, ,Multidimensional Arrays}). -@end itemize - -@node SVR4, POSIX, V7/SVR3.1, Language History -@appendixsec Changes Between SVR3.1 and SVR4 - -@cindex @command{awk} language, V.4 version -The System V Release 4 (1989) version of Unix @command{awk} added these features -(some of which originated in @command{gawk}): - -@itemize @bullet -@item -The @code{ENVIRON} variable (@pxref{Built-in Variables}). 
-@c gawk and MKS awk - -@item -Multiple @option{-f} options on the command line -(@pxref{Options, ,Command-Line Options}). -@c MKS awk - -@item -The @option{-v} option for assigning variables before program execution begins -(@pxref{Options, ,Command-Line Options}). -@c GNU, Bell Laboratories & MKS together - -@item -The @option{--} option for terminating command-line options. - -@item -The @samp{\a}, @samp{\v}, and @samp{\x} escape sequences -(@pxref{Escape Sequences}). -@c GNU, for ANSI C compat - -@item -A defined return value for the @code{srand} built-in function -(@pxref{Numeric Functions}). - -@item -The @code{toupper} and @code{tolower} built-in string functions -for case translation -(@pxref{String Functions, ,String Manipulation Functions}). - -@item -A cleaner specification for the @samp{%c} format-control letter in the -@code{printf} function -(@pxref{Control Letters, ,Format-Control Letters}). - -@item -The ability to dynamically pass the field width and precision (@code{"%*.*d"}) -in the argument list of the @code{printf} function -(@pxref{Control Letters, ,Format-Control Letters}). - -@item -The use of regexp constants, such as @code{/foo/}, as expressions, where -they are equivalent to using the matching operator, as in @samp{$0 ~ /foo/} -(@pxref{Using Constant Regexps, ,Using Regular Expression Constants}). - -@item -Processing of escape sequences inside command-line variable assignments -(@pxref{Assignment Options, ,Assigning Variables on the Command Line}). -@end itemize - -@node POSIX, BTL, SVR4, Language History -@appendixsec Changes Between SVR4 and POSIX @command{awk} - -The POSIX Command Language and Utilities standard for @command{awk} (1992) -introduced the following changes into the language: - -@itemize @bullet -@item -The use of @option{-W} for implementation-specific options -(@pxref{Options, ,Command-Line Options}). 
- -@item -The use of @code{CONVFMT} for controlling the conversion of numbers -to strings (@pxref{Conversion, ,Conversion of Strings and Numbers}). - -@item -The concept of a numeric string and tighter comparison rules to go -with it (@pxref{Typing and Comparison, ,Variable Typing and Comparison Expressions}). - -@item -More complete documentation of many of the previously undocumented -features of the language. -@end itemize - -The following common extensions are not permitted by the POSIX -standard: - -@c IMPORTANT! Keep this list in sync with the one in node Options - -@itemize @bullet -@item -@code{\x} escape sequences are not recognized -(@pxref{Escape Sequences}). - -@item -Newlines do not act as whitespace to separate fields when @code{FS} is -equal to a single space -(@pxref{Fields, ,Examining Fields}). - -@item -Newlines are not allowed after @samp{?} or @samp{:} -(@pxref{Conditional Exp, ,Conditional Expressions}). - -@item -The synonym @code{func} for the keyword @code{function} is not -recognized (@pxref{Definition Syntax, ,Function Definition Syntax}). - -@item -The operators @samp{**} and @samp{**=} cannot be used in -place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops, ,Arithmetic Operators}, -and @ref{Assignment Ops, ,Assignment Expressions}). - -@item -Specifying @samp{-Ft} on the command line does not set the value -of @code{FS} to be a single tab character -(@pxref{Field Separators, ,Specifying How Fields Are Separated}). - -@item -The @code{fflush} built-in function is not supported -(@pxref{I/O Functions, ,Input/Output Functions}). -@end itemize - -@node BTL, POSIX/GNU, POSIX, Language History -@appendixsec Extensions in the Bell Laboratories @command{awk} - -@cindex extensions, Bell Laboratories @command{awk} -@cindex Kernighan, Brian -Brian Kernighan, one of the original designers of Unix @command{awk}, -has made his version available via his home page -(@pxref{Other Versions, ,Other Freely Available @command{awk} Implementations}). 
-This @value{SECTION} describes extensions in his version of @command{awk} that are -not in POSIX @command{awk}. - -@itemize @bullet -@item -The @samp{-mf @var{N}} and @samp{-mr @var{N}} command-line options -to set the maximum number of fields and the maximum -record size, respectively -(@pxref{Options, ,Command-Line Options}). -As a side note, his @command{awk} no longer needs these options; -it continues to accept them to avoid breaking old programs. - -@item -The @code{fflush} built-in function for flushing buffered output -(@pxref{I/O Functions, ,Input/Output Functions}). - -@item -The @samp{**} and @samp{**=} operators -(@pxref{Arithmetic Ops, ,Arithmetic Operators} -and -@ref{Assignment Ops, ,Assignment Expressions}). - -@item -The use of @code{func} as an abbreviation for @code{function} -(@pxref{Definition Syntax, ,Function Definition Syntax}). - -@ignore -@item -The @code{SYMTAB} array, that allows access to @command{awk}'s internal symbol -table. This feature is not documented, largely because -it is somewhat shakily implemented. For instance, you cannot access arrays -or array elements through it. -@end ignore -@end itemize - -The Bell Laboratories @command{awk} also incorporates the following extensions, -originally developed for @command{gawk}: - -@itemize @bullet -@item -The @samp{\x} escape sequence -(@pxref{Escape Sequences}). - -@item -The @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr} -special files -(@pxref{Special Files, ,Special @value{FFN}s in @command{gawk}}). - -@item -The ability for @code{FS} and for the third -argument to @code{split} to be null strings -(@pxref{Single Character Fields, , Making Each Character a Separate Field}). - -@item -The @code{nextfile} statement -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). - -@item -The ability to delete all of an array at once with @samp{delete @var{array}} -(@pxref{Delete, ,The @code{delete} Statement}). 
-@end itemize - -@node POSIX/GNU, Contributors, BTL, Language History -@appendixsec Extensions in @command{gawk} Not in POSIX @command{awk} - -@ignore -I've tried to follow this general order, esp. for the 3.0 and 3.1 sections: - variables - special files - language changes (e.g., hex constants) - differences in standard awk functions - new gawk functions - new keywords - new command-line options - new ports -Within each category, be alphabetical. -@end ignore - -@cindex compatibility mode -The GNU implementation, @command{gawk}, adds a large number of features. -This @value{SECTION} lists them in the order they were added to @command{gawk}. -They can all be disabled with either the @option{--traditional} or -@option{--posix} options -(@pxref{Options, ,Command-Line Options}). - -Version 2.10 of @command{gawk} introduced the following features: - -@itemize @bullet -@item -The @env{AWKPATH} environment variable for specifying a path search for -the @option{-f} command-line option -(@pxref{Options, ,Command-Line Options}). - -@item -The @code{IGNORECASE} variable and its effects -(@pxref{Case-sensitivity, ,Case Sensitivity in Matching}). - -@item -The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr} and -@file{/dev/fd/@var{N}} special @value{FN}s -(@pxref{Special Files, ,Special @value{FFN}s in @command{gawk}}). -@end itemize - -Version 2.13 of @command{gawk} introduced the following features: - -@itemize @bullet -@item -The @code{FIELDWIDTHS} variable and its effects -(@pxref{Constant Size, ,Reading Fixed-Width Data}). - -@item -The @code{systime} and @code{strftime} built-in functions for obtaining -and printing timestamps -(@pxref{Time Functions, ,Using @command{gawk}'s Timestamp Functions}). - -@item -The @option{-W lint} option to provide error and portability checking -for both the source code and at runtime -(@pxref{Options, ,Command-Line Options}). 
- -@item -The @option{-W compat} option to turn off the GNU extensions -(@pxref{Options, ,Command-Line Options}). - -@item -The @option{-W posix} option for full POSIX compliance -(@pxref{Options, ,Command-Line Options}). -@end itemize - -Version 2.14 of @command{gawk} introduced the following feature: - -@itemize @bullet -@item -The @code{next file} statement for skipping to the next @value{DF} -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). -@end itemize - -Version 2.15 of @command{gawk} introduced the following features: - -@itemize @bullet -@item -The @code{ARGIND} variable, which tracks the movement of @code{FILENAME} -through @code{ARGV} (@pxref{Built-in Variables}). - -@item -The @code{ERRNO} variable, which contains the system error message when -@code{getline} returns @minus{}1 or when @code{close} fails -(@pxref{Built-in Variables}). - -@item -The @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid}, and -@file{/dev/user} @value{FN} interpretation -(@pxref{Special Files, ,Special @value{FFN}s in @command{gawk}}). - -@item -The ability to delete all of an array at once with @samp{delete @var{array}} -(@pxref{Delete, ,The @code{delete} Statement}). - -@item -The ability to use GNU-style long-named options that start with @option{--} -(@pxref{Options, ,Command-Line Options}). - -@item -The @option{--source} option for mixing command-line and library -file source code -(@pxref{Options, ,Command-Line Options}). -@end itemize - -Version 3.0 of @command{gawk} introduced the following features: - -@itemize @bullet -@item -@code{IGNORECASE} changed, now applying to string comparison as well -as regexp operations -(@pxref{Case-sensitivity, ,Case Sensitivity in Matching}). - -@item -The @code{RT} variable that contains the input text that -matched @code{RS} -(@pxref{Records, ,How Input Is Split into Records}). - -@item -Full support for both POSIX and GNU regexps -(@pxref{Regexp, , Regular Expressions}). 
- -@item -The @code{gensub} function for more powerful text manipulation -(@pxref{String Functions, ,String Manipulation Functions}). - -@item -The @code{strftime} function acquired a default time format, -allowing it to be called with no arguments -(@pxref{Time Functions, ,Using @command{gawk}'s Timestamp Functions}). - -@item -The ability for @code{FS} and for the third -argument to @code{split} to be null strings -(@pxref{Single Character Fields, , Making Each Character a Separate Field}). - -@item -The ability for @code{RS} to be a regexp -(@pxref{Records, ,How Input Is Split into Records}). - -@item -The @code{next file} statement became @code{nextfile} -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). - -@item -The @option{--lint-old} option to -warn about constructs that are not available in -the original Version 7 Unix version of @command{awk} -(@pxref{V7/SVR3.1, ,Major Changes Between V7 and SVR3.1}). - -@item -The @option{-m} option and the @code{fflush} function from the -Bell Laboratories research version of @command{awk} -(@pxref{Options, ,Command-Line Options}; also -@pxref{I/O Functions, ,Input/Output Functions}). - -@item -The @option{--re-interval} option to provide interval expressions in regexps -(@pxref{Regexp Operators, , Regular Expression Operators}). - -@item -The @option{--traditional} option was added as a better name for -@option{--compat} (@pxref{Options, ,Command-Line Options}). - -@item -The use of GNU Autoconf to control the configuration process -(@pxref{Quick Installation, , Compiling @command{gawk} for Unix}). - -@item -Amiga support -(@pxref{Amiga Installation, ,Installing @command{gawk} on an Amiga}). - -@end itemize - -Version 3.1 of @command{gawk} introduced the following features: - -@itemize @bullet -@item -The @code{BINMODE} special variable for non-POSIX systems, -which allows binary I/O for input and/or output files -(@pxref{PC Using, ,Using @command{gawk} on PC Operating Systems}). 
- -@item -The @code{LINT} special variable, which dynamically controls lint warnings -(@pxref{Built-in Variables}). - -@item -The @code{PROCINFO} array for providing process-related information -(@pxref{Built-in Variables}). - -@item -The @code{TEXTDOMAIN} special variable for setting an application's -internationalization text domain -(@pxref{Built-in Variables}, -and -@ref{Internationalization, ,Internationalization with @command{gawk}}). - -@item -The ability to use octal and hexadecimal constants in @command{awk} -program source code -(@pxref{Non-decimal-numbers, ,Octal and Hexadecimal Numbers}). - -@item -The @samp{|&} operator for two-way I/O to a coprocess -(@pxref{Two-way I/O, ,Two-Way Communications with Another Process}). - -@item -The @file{/inet} special files for TCP/IP networking using @samp{|&} -(@pxref{TCP/IP Networking, , Using @command{gawk} for Network Programming}). - -@item -The optional second argument to @code{close} that allows closing one end -of a two-way pipe to a coprocess -(@pxref{Two-way I/O, ,Two-Way Communications with Another Process}). - -@item -The optional third argument to the @code{match} function -for capturing text-matching subexpressions within a regexp -(@pxref{String Functions, , String Manipulation Functions}). - -@item -Positional specifiers in @code{printf} formats for -making translations easier -(@pxref{Printf Ordering, , Rearranging @code{printf} Arguments}). - -@item -The @code{asort} function for sorting arrays -(@pxref{Array Sorting, ,Sorting Array Values and Indices with @command{gawk}}). - -@item -The @code{bindtextdomain} and @code{dcgettext} functions -for internationalization -(@pxref{Programmer i18n, ,Internationalizing @command{awk} Programs}). - -@item -The @code{extension} built-in function and the ability to add -new built-in functions dynamically -(@pxref{Dynamic Extensions, , Adding New Built-in Functions to @command{gawk}}). 
- -@item -The @code{mktime} built-in function for creating timestamps -(@pxref{Time Functions, ,Using @command{gawk}'s Timestamp Functions}). - -@item -The -@code{and}, -@code{or}, -@code{xor}, -@code{compl}, -@code{lshift}, -@code{rshift}, -and -@code{strtonum} built-in -functions -(@pxref{Bitwise Functions, ,Using @command{gawk}'s Bit Manipulation Functions}). - -@item -@cindex @code{next file} statement -The support for @samp{next file} as two words was removed completely -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). - -@item -The @option{--dump-variables} option to print a list of all global variables -(@pxref{Options, ,Command-Line Options}). - -@item -The @option{--gen-po} command-line option and the use of a leading -underscore to mark strings that should be translated -(@pxref{String Extraction, ,Extracting Marked Strings}). - -@item -The @option{--non-decimal-data} option to allow non-decimal -input data -(@pxref{Non-decimal Data, ,Allowing Non-Decimal Input Data}). - -@item -The @option{--profile} option and @command{pgawk}, the -profiling version of @command{gawk}, for producing execution -profiles of @command{awk} programs -(@pxref{Profiling, ,Profiling Your @command{awk} Programs}). - -@item -The @option{--enable-portals} configuration option to enable special treatment of -pathnames that begin with @file{/p} as BSD portals -(@pxref{Portal Files, , Using @command{gawk} with BSD Portals}). - -@item -The use of GNU Automake to help in standardizing the configuration process -(@pxref{Quick Installation, , Compiling @command{gawk} for Unix}). - -@item -The use of GNU @code{gettext} for @command{gawk}'s own message output -(@pxref{Gawk I18N, ,@command{gawk} Can Speak Your Language}). - -@item -BeOS support -(@pxref{BeOS Installation, , Installing @command{gawk} on BeOS}). - -@item -Tandem support -(@pxref{Tandem Installation, ,Installing @command{gawk} on a Tandem}). 
- -@item -The Atari port became officially unsupported -(@pxref{Atari Installation, ,Installing @command{gawk} on the Atari ST}). - -@item -The source code now uses new-style function definitions, with -@command{ansi2knr} to convert the code on systems with old compilers. - -@end itemize - -@c XXX ADD MORE STUFF HERE - -@node Contributors, , POSIX/GNU, Language History -@appendixsec Major Contributors to @command{gawk} -@cindex contributors to @command{gawk} -@quotation -@i{Always give credit where credit is due.}@* -Anonymous -@end quotation - -This @value{SECTION} names the major contributors to @command{gawk} -and/or this @value{DOCUMENT}, in approximate chronological order: - -@itemize @bullet -@item -@cindex Aho, Alfred -@cindex Weinberger, Peter -@cindex Kernighan, Brian -Dr.@: Alfred V.@: Aho, -Dr.@: Peter J.@: Weinberger, and -Dr.@: Brian W.@: Kernighan, all of Bell Laboratories, -designed and implemented Unix @command{awk}, -from which @command{gawk} gets the majority of its feature set. - -@item -@cindex Rubin, Paul -Paul Rubin -did the initial design and implementation in 1986, and wrote -the first draft (around 40 pages) of this @value{DOCUMENT}. - -@item -@cindex Fenlason, Jay -Jay Fenlason -finished the initial implementation. - -@item -@cindex Close, Diane -Diane Close -revised the first draft of this @value{DOCUMENT}, bringing it -to around 90 pages. - -@item -@cindex Stallman, Richard -Richard Stallman -helped finish the implementation and the initial draft of this -@value{DOCUMENT}. -He is also the founder of the FSF and the GNU project. - -@item -@cindex Woods, John -John Woods -contributed parts of the code (mostly fixes) in -the initial version of @command{gawk}. - -@item -@cindex Trueman, David -In 1988, -David Trueman -took over primary maintenance of @command{gawk}, -making it compatible with ``new'' @command{awk}, and -greatly improving its performance. - -@item -@cindex Rankin, Pat -Pat Rankin -provided the VMS port and its documentation. 
- -@item -@cindex Kwok, Conrad -@cindex Garfinkle, Scott -@cindex Williams, Kent -Conrad Kwok, -Scott Garfinkle, -and -Kent Williams -did the initial ports to MS-DOS with various versions of MSC. - -@item -@cindex Peterson, Hal -Hal Peterson -provided help in porting @command{gawk} to Cray systems. - -@item -@cindex Rommel, Kai Uwe -Kai Uwe Rommel -provided the port to OS/2 and its documentation. - -@item -@cindex Jaegermann, Michal -Michal Jaegermann -provided the port to Atari systems and its documentation. -He continues to provide portability checking with DEC Alpha -systems, and has done a lot of work to make sure @command{gawk} -works on non-32-bit systems. - -@item -@cindex Fish, Fred -Fred Fish -provided the port to Amiga systems and its documentation. - -@item -@cindex Deifik, Scott -Scott Deifik -currently maintains the MS-DOS port. - -@item -@cindex Grigera, Juan -Juan Grigera -maintains the port to Win32 systems. - -@item -@cindex Hankerson, Darrel -Dr.@: Darrel Hankerson -acts as coordinator for the various ports to different PC platforms -and creates binary distributions for various PC operating systems. -He is also instrumental in keeping the documentation up to date for -the various PC platforms. - -@item -@cindex Zoulas, Christos -Christos Zoulas -provided the @code{extension} -built-in function for dynamically adding new modules. - -@item -@cindex Kahrs, J@"urgen -J@"urgen Kahrs -contributed the initial version of the TCP/IP networking -code and documentation, and motivated the inclusion of the @samp{|&} operator. - -@item -@cindex Davies, Stephen -Stephen Davies -provided the port to Tandem systems and its documentation. - -@item -@cindex Brown, Martin -Martin Brown -provided the port to BeOS and its documentation. - -@item -@cindex Peters, Arno -Arno Peters -did the initial work to convert @command{gawk} to use -GNU Automake and @code{gettext}. 
- -@item -@cindex Broder, Alan J.@: -Alan J.@: Broder -provided the initial version of the @code{asort} function -as well as the code for the new optional third argument to the @code{match} function. - -@item -@cindex Robbins, Arnold -Arnold Robbins -has been working on @command{gawk} since 1988, at first -helping David Trueman, and as the primary maintainer since around 1994. -@end itemize - -@node Installation, Notes, Language History, Top -@appendix Installing @command{gawk} - -@cindex Linux -@cindex GNU/Linux -This appendix provides instructions for installing @command{gawk} on the -various platforms that are supported by the developers. The primary -developer supports GNU/Linux (and Unix), whereas the other ports are -contributed. -@xref{Bugs, , Reporting Problems and Bugs}, -for the electronic mail addresses of the people who did -the respective ports. - -@menu -* Gawk Distribution:: What is in the @command{gawk} distribution. -* Unix Installation:: Installing @command{gawk} under various - versions of Unix. -* Non-Unix Installation:: Installation on Other Operating Systems. -* Unsupported:: Systems whose ports are no longer supported. -* Bugs:: Reporting Problems and Bugs. -* Other Versions:: Other freely available @command{awk} - implementations. -@end menu - -@node Gawk Distribution, Unix Installation, Installation, Installation -@appendixsec The @command{gawk} Distribution - -This @value{SECTION} describes how to get the @command{gawk} -distribution, how to extract it, and then what is in the various files and -subdirectories. - -@menu -* Getting:: How to get the distribution. -* Extracting:: How to extract the distribution. -* Distribution contents:: What is in the distribution. 
-@end menu - -@node Getting, Extracting, Gawk Distribution, Gawk Distribution -@appendixsubsec Getting the @command{gawk} Distribution -@cindex getting @command{gawk} -@cindex anonymous @command{ftp} -@cindex @command{ftp}, anonymous -@cindex source code, @command{gawk} -@cindex @command{gawk}, source code -There are three ways to get GNU software: - -@itemize @bullet -@item -Copy it from someone else who already has it. - -@cindex FSF -@cindex Free Software Foundation -@item -Order @command{gawk} directly from the Free Software Foundation. -Software distributions are available for Unix, MS-DOS, and VMS, on -tape and CD-ROM. Their address is: - -@display -Free Software Foundation -59 Temple Place, Suite 330 -Boston, MA 02111-1307 USA -Phone: +1-617-542-5942 -Fax (including Japan): +1-617-542-2652 -Email: @email{gnu@@gnu.org} -URL: @uref{http://www.gnu.org/} -@end display - -@noindent -Ordering from the FSF directly contributes to the support of the foundation -and to the production of more free software. - -@item -Retrieve @command{gawk} by using anonymous @command{ftp} to the Internet host -@code{gnudist.gnu.org}, in the directory @file{/gnu/gawk}. -@end itemize - -The GNU software archive is mirrored around the world. -The up-to-date list of mirror sites is available from -@uref{http://www.gnu.org/order/ftp.html, the main FSF web site}. -Try to use one of the mirrors; they -will be less busy, and you can usually find one closer to your site. - -@node Extracting, Distribution contents, Getting, Gawk Distribution -@appendixsubsec Extracting the Distribution -@command{gawk} is distributed as a @code{tar} file compressed with the -GNU Zip program, @code{gzip}. - -Once you have the distribution (for example, -@file{gawk-@value{VERSION}.@value{PATCHLEVEL}.tar.gz}), -use @code{gzip} to expand the -file and then use @code{tar} to extract it. 
You can use the following -pipeline to unpack the @command{gawk} distribution: - -@example -# Under System V, add 'o' to the tar options -gzip -d -c gawk-@value{VERSION}.@value{PATCHLEVEL}.tar.gz | tar -xvpf - -@end example - -@noindent -This creates a directory named @file{gawk-@value{VERSION}.@value{PATCHLEVEL}} -in the current directory. - -The distribution @value{FN} is of the form -@file{gawk-@var{V}.@var{R}.@var{P}.tar.gz}. -The @var{V} represents the major version of @command{gawk}, -the @var{R} represents the current release of version @var{V}, and -the @var{P} represents a @dfn{patch level}, meaning that minor bugs have -been fixed in the release. The current patch level is @value{PATCHLEVEL}, -but when retrieving distributions, you should get the distribution with the highest -version, release, and patch level. (Note, however, that patch levels greater than -or equal to 80 denote ``beta'' or non-production software; you might not want -to retrieve such a version unless you don't mind experimenting.) -If you are not on a Unix system, you need to make other arrangements -for getting and extracting the @command{gawk} distribution. You should consult -a local expert. - -@node Distribution contents, , Extracting, Gawk Distribution -@appendixsubsec Contents of the @command{gawk} Distribution - -The @command{gawk} distribution has a number of C source files, -documentation files, -subdirectories, and files related to the configuration process -(@pxref{Unix Installation, ,Compiling and Installing @command{gawk} on Unix}), -as well as several subdirectories related to different non-Unix -operating systems: - -@table @asis -@item Various @samp{.c}, @samp{.y}, and @samp{.h} files: -These files are the actual @command{gawk} source code. -@end table - -@table @file -@item README -@itemx README_d/README.* -Descriptive files: @file{README} for @command{gawk} under Unix and the -rest for the various hardware and software combinations.
- -@item INSTALL -A file providing an overview of the configuration and installation process. - -@item ChangeLog -A detailed list of source code changes as bugs are fixed or improvements made. - -@item NEWS -A list of changes to @command{gawk} since the last release or patch. - -@item COPYING -The GNU General Public License. - -@item FUTURES -A brief list of features and changes being contemplated for future -releases, with some indication of the time frame for the feature, based -on its difficulty. - -@item LIMITATIONS -A list of those factors that limit @command{gawk}'s performance. -Most of these depend on the hardware or operating system software, and -are not limits in @command{gawk} itself. - -@item POSIX.STD -A description of one area where the POSIX standard for @command{awk} is -incorrect as well as how @command{gawk} handles the problem. - -@cindex artificial intelligence, using @command{gawk} -@cindex AI programming, using @command{gawk} -@item doc/awkforai.txt -A short article describing why @command{gawk} is a good language for -AI (Artificial Intelligence) programming. - -@item doc/README.card -@itemx doc/ad.block -@itemx doc/awkcard.in -@itemx doc/cardfonts -@itemx doc/colors -@itemx doc/macros -@itemx doc/no.colors -@itemx doc/setter.outline -The @command{troff} source for a five-color @command{awk} reference card. -A modern version of @command{troff} such as GNU @command{troff} (@command{groff}) is -needed to produce the color version. See the file @file{README.card} -for instructions if you have an older @command{troff}. - -@item doc/gawk.1 -The @command{troff} source for a manual page describing @command{gawk}. -This is distributed for the convenience of Unix users. - -@cindex Texinfo -@item doc/gawk.texi -The Texinfo source file for this @value{DOCUMENT}. -It should be processed with @TeX{} to produce a printed document, and -with @command{makeinfo} to produce an Info or HTML file. 
- -@item doc/gawk.info -The generated Info file for this @value{DOCUMENT}. - -@item doc/gawkinet.texi -The Texinfo source file for -@ifinfo -@xref{Top}. -@end ifinfo -@ifnotinfo -@cite{TCP/IP Internetworking with @command{gawk}}. -@end ifnotinfo -It should be processed with @TeX{} to produce a printed document and -with @command{makeinfo} to produce an Info or HTML file. - -@item doc/gawkinet.info -The generated Info file for -@cite{TCP/IP Internetworking with @command{gawk}}. - -@item doc/igawk.1 -The @command{troff} source for a manual page describing the @command{igawk} -program presented in -@ref{Igawk Program, ,An Easy Way to Use Library Functions}. - -@item doc/Makefile.in -The input file used during the configuration process to generate the -actual @file{Makefile} for creating the documentation. - -@item Makefile.am -@itemx */Makefile.am -Files used by the GNU @command{automake} software for generating -the @file{Makefile.in} files used by @command{autoconf} and -@command{configure}. - -@item Makefile.in -@itemx acconfig.h -@itemx acinclude.m4 -@itemx aclocal.m4 -@itemx configh.in -@itemx configure.in -@itemx configure -@itemx custom.h -@itemx missing_d/* -@itemx m4/* -These files and subdirectories are used when configuring @command{gawk} -for various Unix systems. They are explained in -@ref{Unix Installation, ,Compiling and Installing @command{gawk} on Unix}. - -@item intl/* -@itemx po/* -The @file{intl} directory provides the GNU @code{gettext} library, which implements -@command{gawk}'s internationalization features, while the @file{po} directory -contains message translations. - -@item awklib/extract.awk -@itemx awklib/Makefile.am -@itemx awklib/Makefile.in -@itemx awklib/eg/* -The @file{awklib} directory contains a copy of @file{extract.awk} -(@pxref{Extract Program, ,Extracting Programs from Texinfo Source Files}), -which can be used to extract the sample programs from the Texinfo -source file for this @value{DOCUMENT}.
It also contains a @file{Makefile.in} file, which -@command{configure} uses to generate a @file{Makefile}. -@file{Makefile.am} is used by GNU Automake to create @file{Makefile.in}. -The library functions from -@ref{Library Functions, , A Library of @command{awk} Functions}, -and the @command{igawk} program from -@ref{Igawk Program, , An Easy Way to Use Library Functions}, -are included as ready-to-use files in the @command{gawk} distribution. -They are installed as part of the installation process. -The rest of the programs in this @value{DOCUMENT} are available in appropriate -subdirectories of @file{awklib/eg}. - -@item unsupported/atari/* -Files needed for building @command{gawk} on an Atari ST -(@pxref{Atari Installation, ,Installing @command{gawk} on the Atari ST}, for details). - -@item unsupported/tandem/* -Files needed for building @command{gawk} on a Tandem -(@pxref{Tandem Installation, ,Installing @command{gawk} on a Tandem}, for details). - -@item posix/* -Files needed for building @command{gawk} on POSIX-compliant systems. - -@item pc/* -Files needed for building @command{gawk} under MS-DOS, MS Windows and OS/2 -(@pxref{PC Installation, ,Installation on PC Operating Systems}, for details). - -@item vms/* -Files needed for building @command{gawk} under VMS -(@pxref{VMS Installation, ,How to Compile and Install @command{gawk} on VMS}, for details). - -@item test/* -A test suite for -@command{gawk}. You can use @samp{make check} from the top-level @command{gawk} -directory to run your version of @command{gawk} against the test suite. -If @command{gawk} successfully passes @samp{make check}, then you can -be confident of a successful port. -@end table - -@node Unix Installation, Non-Unix Installation, Gawk Distribution, Installation -@appendixsec Compiling and Installing @command{gawk} on Unix - -Usually, you can compile and install @command{gawk} by typing only two -commands. 
However, if you use an unusual system, you may need -to configure @command{gawk} for your system yourself. - -@menu -* Quick Installation:: Compiling @command{gawk} under Unix. -* Additional Configuration Options:: Other compile-time options. -* Configuration Philosophy:: How it's all supposed to work. -@end menu - -@node Quick Installation, Additional Configuration Options, Unix Installation, Unix Installation -@appendixsubsec Compiling @command{gawk} for Unix - -@cindex installation, unix -After you have extracted the @command{gawk} distribution, @command{cd} -to @file{gawk-@value{VERSION}.@value{PATCHLEVEL}}. Like most GNU software, -@command{gawk} is configured -automatically for your Unix system by running the @command{configure} program. -This program is a Bourne shell script that is generated automatically using -GNU @command{autoconf}. -@ifnotinfo -(The @command{autoconf} software is -described fully in -@cite{Autoconf---Generating Automatic Configuration Scripts}, -which is available from the Free Software Foundation.) -@end ifnotinfo -@ifinfo -(The @command{autoconf} software is described fully starting with -@ref{Top}.) -@end ifinfo - -To configure @command{gawk}, simply run @command{configure}: - -@example -sh ./configure -@end example - -This produces a @file{Makefile} and @file{config.h} tailored to your system. -The @file{config.h} file describes various facts about your system. -You might want to edit the @file{Makefile} to -change the @code{CFLAGS} variable, which controls -the command-line options that are passed to the C compiler (such as -optimization levels or compiling for debugging). - -Alternatively, you can add your own values for most @command{make} -variables on the command line, such as @code{CC} and @code{CFLAGS}, when -running @command{configure}: - -@example -CC=cc CFLAGS=-g sh ./configure -@end example - -@noindent -See the file @file{INSTALL} in the @command{gawk} distribution for -all the details. 
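
Variable assignments written in front of @samp{sh ./configure} reach that one command through its environment and do not persist in your own shell afterward. The following is only a sketch of that mechanism, not something shipped with @command{gawk}: the function name @code{show_configure_env} is made up for illustration, and @samp{sh -c} stands in for @command{configure} so that the effect is visible without a source tree.

```shell
# Sketch only: 'sh -c' is a stand-in for 'sh ./configure'.
# The hypothetical helper prints the CC and CFLAGS values that the
# child command sees in its environment.
show_configure_env() {
    CC=cc CFLAGS=-g sh -c 'printf "%s %s\n" "$CC" "$CFLAGS"'
}

show_configure_env    # prints: cc -g
```

Because the assignments apply to a single command, you must repeat them each time you rerun @command{configure}.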
- -After you have run @command{configure} and possibly edited the @file{Makefile}, -type: - -@example -make -@end example - -@noindent -Shortly thereafter, you should have an executable version of @command{gawk}. -That's all there is to it! -To verify that @command{gawk} is working properly, -run @samp{make check}. All of the tests should succeed. -If these steps do not work, or if any of the tests fail, -check the files in the @file{README_d} directory to see if you've -found a known problem. If the failure is not described there, -please send in a bug report -(@pxref{Bugs, ,Reporting Problems and Bugs}.) - -@node Additional Configuration Options, Configuration Philosophy, Quick Installation, Unix Installation -@appendixsubsec Additional Configuration Options - -There are several additional options you may use on the @command{configure} -command line when compiling @command{gawk} from scratch. - -@table @code -@cindex @code{--enable-portals} configuration option -@cindex configuration option, @code{--enable-portals} -@item --enable-portals -This option causes @command{gawk} to treat pathnames that begin -with @file{/p} as BSD portal files when doing two-way I/O with -the @samp{|&} operator -(@pxref{Portal Files, , Using @command{gawk} with BSD Portals}). - -@cindex Linux -@cindex GNU/Linux -@cindex @code{--with-included-gettext} configuration option -@cindex configuration option, @code{--with-included-gettext} -@item --with-included-gettext -Use the version of the @code{gettext} library that comes with @command{gawk}. -This option should be used on systems that do @emph{not} use @value{PVERSION} 2 (or later) -of the GNU C library. -All known modern GNU/Linux systems use Glibc 2. Use this option on any other system. - -@cindex @code{--disable-nls} configuration option -@cindex configuration option, @code{--disable-nls} -@item --disable-nls -Disable all message translation facilities. 
-This is usually not desirable, but it may bring you some slight performance -improvement. -You should also use this option if @option{--with-included-gettext} -doesn't work on your system. -@end table - -@node Configuration Philosophy, , Additional Configuration Options, Unix Installation -@appendixsubsec The Configuration Process - -@cindex configuring @command{gawk} -This @value{SECTION} is of interest only if you know something about using the -C language and the Unix operating system. - -The source code for @command{gawk} generally attempts to adhere to formal -standards wherever possible. This means that @command{gawk} uses library -routines that are specified by the ISO C standard and by the POSIX -operating system interface standard. When using an ISO C compiler, -function prototypes are used to help improve the compile-time checking. - -Many Unix systems do not support all of either the ISO or the -POSIX standards. The @file{missing_d} subdirectory in the @command{gawk} -distribution contains replacement versions of those functions that are -most likely to be missing. - -The @file{config.h} file that @command{configure} creates contains -definitions that describe features of the particular operating system -where you are attempting to compile @command{gawk}. The three things -described by this file are: what header files are available, so that -they can be correctly included, what (supposedly) standard functions -are actually available in your C libraries, and various miscellaneous -facts about your variant of Unix. For example, there may not be an -@code{st_blksize} element in the @code{stat} structure. In this case, -@samp{HAVE_ST_BLKSIZE} is undefined. - -@cindex @code{custom.h} configuration file -It is possible for your C compiler to lie to @command{configure}. It may -do so by not exiting with an error when a library function is not -available. To get around this, edit the file @file{custom.h}. 
-Use an @samp{#ifdef} that is appropriate for your system, and either -@code{#define} any constants that @command{configure} should have defined but -didn't, or @code{#undef} any constants that @command{configure} defined and -should not have. @file{custom.h} is automatically included by -@file{config.h}. - -It is also possible that the @command{configure} program generated by -@command{autoconf} will not work on your system in some other fashion. -If you do have a problem, the file @file{configure.in} is the input for -@command{autoconf}. You may be able to change this file and generate a -new version of @command{configure} that works on your system -(@pxref{Bugs, ,Reporting Problems and Bugs}, -for information on how to report problems in configuring @command{gawk}). -The same mechanism may be used to send in updates to @file{configure.in} -and/or @file{custom.h}. - -@node Non-Unix Installation, Unsupported, Unix Installation, Installation -@appendixsec Installation on Other Operating Systems - -This @value{SECTION} describes how to install @command{gawk} on -various non-Unix systems. - -@menu -* Amiga Installation:: Installing @command{gawk} on an Amiga. -* BeOS Installation:: Installing @command{gawk} on BeOS. -* PC Installation:: Installing and Compiling @command{gawk} on - MS-DOS and OS/2. -* VMS Installation:: Installing @command{gawk} on VMS. -@end menu - -@node Amiga Installation, BeOS Installation, Non-Unix Installation, Non-Unix Installation -@appendixsubsec Installing @command{gawk} on an Amiga - -@cindex amiga -@cindex installation, amiga -You can install @command{gawk} on an Amiga system using a Unix emulation -environment, available via anonymous @command{ftp} from -@code{ftp.ninemoons.com} in the directory @file{pub/ade/current}. -This includes a shell based on @command{pdksh}. The primary component of -this environment is a Unix emulation library, @file{ixemul.lib}. -@c could really use more background here, who wrote this, etc. 
- -A more complete distribution for the Amiga is available on -the Geek Gadgets CD-ROM, available from: - -@display -CRONUS -1840 E. Warner Road #105-265 -Tempe, AZ 85284 USA -US Toll Free: (800) 804-0833 -Phone: +1-602-491-0442 -FAX: +1-602-491-0048 -Email: @email{info@@ninemoons.com} -WWW: @uref{http://www.ninemoons.com} -Anonymous @command{ftp} site: @code{ftp.ninemoons.com} -@end display - -Once you have the distribution, you can configure @command{gawk} simply by -running @command{configure}: - -@example -configure -v m68k-amigaos -@end example - -Then run @command{make} and you should be all set! -If these steps do not work, please send in a bug report -(@pxref{Bugs, ,Reporting Problems and Bugs}). - -@node BeOS Installation, PC Installation, Amiga Installation, Non-Unix Installation -@appendixsubsec Installing @command{gawk} on BeOS -@cindex BeOS -@cindex installation, beos - -@c From email contributed by Martin Brown, mc@whoever.com -Since BeOS DR9, all the tools that you should need to build @code{gawk} are -included with BeOS. The process is basically identical to the Unix process -of running @command{configure} and then @command{make}. Full instructions are given below. - -You can compile @command{gawk} under BeOS by extracting the standard sources -and running @command{configure}. You @emph{must} specify the location -prefix for the installation directory. For BeOS DR9 and beyond, the best directory to -use is @file{/boot/home/config}, so the @command{configure} command is: - -@example -configure --prefix=/boot/home/config -@end example - -This installs the compiled application into @file{/boot/home/config/bin}, -which is already specified in the standard @env{PATH}. - -Once the configuration process is completed, you can run @command{make}, -and then @samp{make install}: - -@example -$ make -@dots{} -$ make install -@end example - -BeOS uses @command{bash} as its shell; thus, you use @command{gawk} the same way you would -under Unix. 
-If these steps do not work, please send in a bug report -(@pxref{Bugs, ,Reporting Problems and Bugs}). - -@c Rewritten by Scott Deifik <scottd@amgen.com> -@c and Darrel Hankerson <hankedr@mail.auburn.edu> - -@node PC Installation, VMS Installation, BeOS Installation, Non-Unix Installation -@appendixsubsec Installation on PC Operating Systems - -@cindex installation, pc operating systems -This @value{SECTION} covers installation and usage of @command{gawk} on x86 machines -running DOS, any version of Windows, or OS/2. -In this @value{SECTION}, the term ``Win32'' -refers to any of Windows-95/98/ME/NT/2000. - -The limitations of DOS (and DOS shells under Windows or OS/2) have meant -that various ``DOS extenders'' are often used with programs such as -@command{gawk}. The varying capabilities of Microsoft Windows 3.1 -and Win32 can add to the confusion. For an overview of the -considerations, please refer to @file{README_d/README.pc} in the -distribution. - -@menu -* PC Binary Installation:: Installing a prepared distribution. -* PC Compiling:: Compiling @command{gawk} for MS-DOS, Win32, - and OS/2. -* PC Using:: Running @command{gawk} on MS-DOS, Win32 and - OS/2. -@end menu - -@node PC Binary Installation, PC Compiling, PC Installation, PC Installation -@appendixsubsubsec Installing a Prepared Distribution for PC Systems - -If you have received a binary distribution prepared by the DOS -maintainers, then @command{gawk} and the necessary support files appear -under the @file{gnu} directory, with executables in @file{gnu/bin}, -libraries in @file{gnu/lib/awk}, and manual pages under @file{gnu/man}. -This is designed for easy installation to a @file{/gnu} directory on your -drive---however, the files can be installed anywhere provided @env{AWKPATH} is -set properly. Regardless of the installation directory, the first line of -@file{igawk.cmd} and @file{igawk.bat} (in @file{gnu/bin}) may need to be -edited.
- -The binary distribution contains a separate file describing the -contents. In particular, it may include more than one version of the -@command{gawk} executable. OS/2 binary distributions may have a -different arrangement, but installation is similar. - -@node PC Compiling, PC Using, PC Binary Installation, PC Installation -@appendixsubsubsec Compiling @command{gawk} for PC Operating Systems - -@command{gawk} can be compiled for MS-DOS, Win32, and OS/2 using the GNU -development tools from DJ Delorie (DJGPP; MS-DOS only) or Eberhard -Mattes (EMX; MS-DOS, Win32 and OS/2). Microsoft Visual C/C++ can be used -to build a Win32 version, and Microsoft C/C++ can be -used to build 16-bit versions for MS-DOS and OS/2. The file -@file{README_d/README.pc} in the @command{gawk} distribution contains -additional notes, and @file{pc/Makefile} contains important information on -compilation options. - -To build @command{gawk}, copy the files in the @file{pc} directory -(@emph{except} for @file{ChangeLog}) to the directory with the rest of -the @command{gawk} sources. The @file{Makefile} contains a configuration -section with comments and may need to be edited in order to work with -your @command{make} utility. - -The @file{Makefile} contains a number of targets for building various MS-DOS, -Win32, and OS/2 versions. A list of targets is printed if the @command{make} -command is given without a target. As an example, to build @command{gawk} -using the DJGPP tools, enter @samp{make djgpp}. - -Using @command{make} to run the standard tests and to install @command{gawk} -requires additional Unix-like tools, including @command{sh}, @command{sed}, and -@command{cp}. In order to run the tests, the @file{test/*.ok} files may need to -be converted so that they have the usual DOS-style end-of-line markers. Most -of the tests work properly with Stewartson's shell along with the -companion utilities or appropriate GNU utilities. However, some editing of -@file{test/Makefile} is required. 
It is recommended that you copy the file -@file{pc/Makefile.tst} over the file @file{test/Makefile} as a -replacement. Details can be found in @file{README_d/README.pc} -and in the file @file{pc/Makefile.tst}. - -@node PC Using, , PC Compiling, PC Installation -@appendixsubsubsec Using @command{gawk} on PC Operating Systems - -@cindex search path -@cindex directory search -@cindex path, search -@cindex search path, for source files -The OS/2 and MS-DOS versions of @command{gawk} search for program files as -described in @ref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}. -However, semicolons (rather than colons) separate elements -in the @env{AWKPATH} variable. If @env{AWKPATH} is not set or is empty, -then the default search path is @code{@w{".;c:/lib/awk;c:/gnu/lib/awk"}}. - -An @command{sh}-like shell (as opposed to @command{command.com} under MS-DOS -or @command{cmd.exe} under OS/2) may be useful for @command{awk} programming. -Ian Stewartson has written an excellent shell for MS-DOS and OS/2, -Daisuke Aoyama has ported GNU @command{bash} to MS-DOS using the DJGPP tools, -and several shells are available for OS/2, including @command{ksh}. The file -@file{README_d/README.pc} in the @command{gawk} distribution contains -information on these shells. Users of Stewartson's shell on DOS should -examine its documentation for handling command lines; in particular, -the setting for @command{gawk} in the shell configuration may need to be -changed and the @code{ignoretype} option may also be of interest. - -@cindex @code{BINMODE} variable -Under OS/2 and DOS, @command{gawk} (and many other text programs) silently -translate end-of-line @code{"\r\n"} to @code{"\n"} on input and @code{"\n"} -to @code{"\r\n"} on output. A special @code{BINMODE} variable allows -control over these translations and is interpreted as follows. 
- -@itemize @bullet -@item -If @code{BINMODE} is @samp{"r"}, or -@code{(BINMODE & 1)} is nonzero, then -binary mode is set on read (i.e., no translations on reads). - -@item -If @code{BINMODE} is @code{"w"}, or -@code{(BINMODE & 2)} is nonzero, then -binary mode is set on write (i.e., no translations on writes). - -@item -If @code{BINMODE} is @code{"rw"} or @code{"wr"}, -binary mode is set for both read and write -(same as @code{(BINMODE & 3)}). - -@item -@code{BINMODE=@var{non-null-string}} is -the same as @samp{BINMODE=3} (i.e., no translations on -reads or writes). However, @command{gawk} issues a warning -message if the string is not one of @code{"rw"} or @code{"wr"}. -@end itemize - -@noindent -The modes for standard input and standard output are set one time -only (after the -command line is read, but before processing any of the @command{awk} program). -Setting @code{BINMODE} for standard input or -standard output is accomplished by using an -appropriate @samp{-v BINMODE=@var{N}} option on the command line. -@code{BINMODE} is set at the time a file or pipe is opened and cannot be -changed mid-stream. - -The name @code{BINMODE} was chosen to match @command{mawk} -(@pxref{Other Versions, , Other Freely Available @command{awk} Implementations}). -Both @command{mawk} and @command{gawk} handle @code{BINMODE} similarly; however, -@command{mawk} adds a @samp{-W BINMODE=@var{N}} option and an environment -variable that can set @code{BINMODE}, @code{RS}, and @code{ORS}. The -files @file{binmode[1-3].awk} (under @file{gnu/lib/awk} in some of the -prepared distributions) have been chosen to match @command{mawk}'s @samp{-W -BINMODE=@var{N}} option. These can be changed or discarded; in particular, -the setting of @code{RS} giving the fewest ``surprises'' is open to debate. -@command{mawk} uses @samp{RS = "\r\n"} if binary mode is set on read, which is -appropriate for files with the DOS-style end-of-line. 
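The @code{BINMODE} rules listed above can be read as a small decision procedure. The following is only a sketch of those rules in Bourne shell; it is not code from @command{gawk}. The function name @code{binmode_decode} is hypothetical, and treating an empty value as ``no binary mode'' is an assumption, since the text above covers only non-null values.

```shell
# binmode_decode VALUE
# Hypothetical helper (not part of gawk): report which directions would use
# binary mode for a given BINMODE value, following the rules listed above.
binmode_decode() {
    value=$1
    case "$value" in
        "")     echo "none" ;;        # assumption: empty value leaves translations on
        r)      echo "read" ;;
        w)      echo "write" ;;
        rw|wr)  echo "read write" ;;
        *[!0-9]*)
            # Any other non-null string behaves like BINMODE=3, with a warning.
            echo "warning: BINMODE string \"$value\" treated as 3" >&2
            echo "read write" ;;
        *)
            # Numeric value: bit 1 selects reads, bit 2 selects writes.
            modes=""
            if [ $(( value & 1 )) -ne 0 ]; then modes="read"; fi
            if [ $(( value & 2 )) -ne 0 ]; then modes="${modes:+$modes }write"; fi
            echo "${modes:-none}" ;;
    esac
}
```

For instance, @samp{binmode_decode 2} and @samp{binmode_decode w} both report binary mode on writes only, which corresponds to the @samp{-v BINMODE=2} and @samp{-v BINMODE=w} command lines.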
- -To illustrate, the following examples set binary mode on writes for standard -output and other files, and set @code{ORS} as the ``usual'' DOS-style -end-of-line: - -@example -gawk -v BINMODE=2 -v ORS="\r\n" @dots{} -@end example - -@noindent -or: - -@example -gawk -v BINMODE=w -f binmode2.awk @dots{} -@end example - -@noindent -These give the same result as the @samp{-W BINMODE=2} option in -@command{mawk}. -The following changes the record separator to @code{"\r\n"} and sets binary -mode on reads, but does not affect the mode on standard input: - -@example -gawk -v RS="\r\n" --source "BEGIN @{ BINMODE = 1 @}" @dots{} -@end example - -@noindent -or: - -@example -gawk -f binmode1.awk @dots{} -@end example - -@noindent -With proper quoting, in the first example the setting of @code{RS} can be -moved into the @code{BEGIN} rule. - -@node VMS Installation, , PC Installation, Non-Unix Installation -@appendixsubsec How to Compile and Install @command{gawk} on VMS - -@c based on material from Pat Rankin <rankin@eql.caltech.edu> - -@cindex installation, vms -This @value{SUBSECTION} describes how to compile and install @command{gawk} under VMS. - -@menu -* VMS Compilation:: How to compile @command{gawk} under VMS. -* VMS Installation Details:: How to install @command{gawk} under VMS. -* VMS Running:: How to run @command{gawk} under VMS. -* VMS POSIX:: Alternate instructions for VMS POSIX. -@end menu - -@node VMS Compilation, VMS Installation Details, VMS Installation, VMS Installation -@appendixsubsubsec Compiling @command{gawk} on VMS - -To compile @command{gawk} under VMS, there is a @code{DCL} command procedure that -issues all the necessary @code{CC} and @code{LINK} commands. There is -also a @file{Makefile} for use with the @code{MMS} utility.
From the source -directory, use either: - -@example -$ @@[.VMS]VMSBUILD.COM -@end example - -@noindent -or: - -@example -$ MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK -@end example - -Depending upon which C compiler you are using, follow one of the sets -of instructions in this table: - -@table @asis -@item VAX C V3.x -Use either @file{vmsbuild.com} or @file{descrip.mms} as is. These use -@code{CC/OPTIMIZE=NOLINE}, which is essential for Version 3.0. - -@item VAX C V2.x -You must have Version 2.3 or 2.4; older ones won't work. Edit either -@file{vmsbuild.com} or @file{descrip.mms} according to the comments in them. -For @file{vmsbuild.com}, this just entails removing two @samp{!} delimiters. -Also edit @file{config.h} (which is a copy of file @file{[.config]vms-conf.h}) -and comment out or delete the two lines @samp{#define __STDC__ 0} and -@samp{#define VAXC_BUILTINS} near the end. - -@item GNU C -Edit @file{vmsbuild.com} or @file{descrip.mms}; the changes are different -from those for VAX C V2.x but equally straightforward. No changes to -@file{config.h} are needed. - -@item DEC C -Edit @file{vmsbuild.com} or @file{descrip.mms} according to their comments. -No changes to @file{config.h} are needed. -@end table - -@command{gawk} has been tested under VAX/VMS 5.5-1 using VAX C V3.2, and -GNU C 1.40 and 2.3. It should work without modifications for VMS V4.6 and up. - -@node VMS Installation Details, VMS Running, VMS Compilation, VMS Installation -@appendixsubsubsec Installing @command{gawk} on VMS - -To install @command{gawk}, all you need is a ``foreign'' command, which is -a @code{DCL} symbol whose value begins with a dollar sign. For example: - -@example -$ GAWK :== $disk1:[gnubin]GAWK -@end example - -@noindent -Substitute the actual location of @command{gawk.exe} for -@samp{$disk1:[gnubin]}. The symbol should be placed in the -@file{login.com} of any user who wants to run @command{gawk}, -so that it is defined every time the user logs on. 
-Alternatively, the symbol may be placed in the system-wide -@file{sylogin.com} procedure, which allows all users -to run @command{gawk}. - -Optionally, the help entry can be loaded into a VMS help library: - -@example -$ LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP -@end example - -@noindent -(You may want to substitute a site-specific help library rather than -the standard VMS library @samp{HELPLIB}.) After loading the help text, -the command: - -@example -$ HELP GAWK -@end example - -@noindent -provides information about both the @command{gawk} implementation and the -@command{awk} programming language. - -The logical name @samp{AWK_LIBRARY} can designate a default location -for @command{awk} program files. For the @option{-f} option, if the specified -@value{FN} has no device or directory path information in it, @command{gawk} -looks in the current directory first, then in the directory specified -by the translation of @samp{AWK_LIBRARY} if the file is not found. -If, after searching in both directories, the file still is not found, -@command{gawk} appends the suffix @samp{.awk} to the filename and retries -the file search. If @samp{AWK_LIBRARY} is not defined, that -portion of the file search fails benignly. - -@node VMS Running, VMS POSIX, VMS Installation Details, VMS Installation -@appendixsubsubsec Running @command{gawk} on VMS - -Command-line parsing and quoting conventions are significantly different -on VMS, so examples in this @value{DOCUMENT} or from other sources often need minor -changes. They @emph{are} minor though, and all @command{awk} programs -should run correctly. - -Here are a couple of trivial tests: - -@example -$ gawk -- "BEGIN @{print ""Hello, World!""@}" -$ gawk -"W" version -! could also be -"W version" or "-W version" -@end example - -@noindent -Note that uppercase and mixed-case text must be quoted. 
- -The VMS port of @command{gawk} includes a @code{DCL}-style interface in addition -to the original shell-style interface (see the help entry for details). -One side effect of dual command-line parsing is that if there is only a -single parameter (as in the quoted string program above), the command -becomes ambiguous. To work around this, the normally optional @option{--} -flag is required to force Unix style rather than @code{DCL} parsing. If any -other dash-type options (or multiple parameters such as @value{DF}s to -process) are present, there is no ambiguity and @option{--} can be omitted. - -@cindex search path -@cindex directory search -@cindex path, search -@cindex search path, for source files -The default search path, when looking for @command{awk} program files specified -by the @option{-f} option, is @code{"SYS$DISK:[],AWK_LIBRARY:"}. The logical -name @samp{AWKPATH} can be used to override this default. The format -of @samp{AWKPATH} is a comma-separated list of directory specifications. -When defining it, the value should be quoted so that it retains a single -translation and not a multitranslation @code{RMS} searchlist. - -@node VMS POSIX, , VMS Running, VMS Installation -@appendixsubsubsec Building and Using @command{gawk} on VMS POSIX - -Ignore the instructions above, although @file{vms/gawk.hlp} should still -be made available in a help library. The source tree should be unpacked -into a container file subsystem rather than into the ordinary VMS filesystem. -Make sure that the two scripts, @file{configure} and -@file{vms/posix-cc.sh}, are executable; use @samp{chmod +x} on them if -necessary. Then execute the following two commands: - -@example -psx> CC=vms/posix-cc.sh configure -psx> make CC=c89 gawk -@end example - -@noindent -The first command constructs files @file{config.h} and @file{Makefile} out -of templates, using a script to make the C compiler fit @command{configure}'s -expectations. 
The second command compiles and links @command{gawk} using -the C compiler directly; ignore any warnings from @command{make} about being -unable to redefine @code{CC}. @command{configure} takes a very long -time to execute, but at least it provides incremental feedback as it runs. - -This has been tested with VAX/VMS V6.2, VMS POSIX V2.0, and DEC C V5.2. - -Once built, @command{gawk} works like any other shell utility. Unlike -the normal VMS port of @command{gawk}, no special command-line manipulation is -needed in the VMS POSIX environment. - -@node Unsupported, Bugs, Non-Unix Installation, Installation -@appendixsec Unsupported Operating System Ports - -This section describes systems for which -the @command{gawk} port is no longer supported. - -@menu -* Atari Installation:: Installing @command{gawk} on the Atari ST. -* Tandem Installation:: Installing @command{gawk} on a Tandem. -@end menu - -@node Atari Installation, Tandem Installation, Unsupported, Unsupported -@appendixsubsec Installing @command{gawk} on the Atari ST - -The Atari port is no longer supported. It is -included for those who might want to use it but it is no longer being -actively maintained. - -@c based on material from Michal Jaegermann <michal@gortel.phys.ualberta.ca> -@cindex atari -@cindex installation, atari -There are no substantial differences when installing @command{gawk} on -various Atari models. Compiled @command{gawk} executables do not require -a large amount of memory with most @command{awk} programs, and should run on all -Motorola processor-based models (referred to below as ST, even if that is not -exactly right). - -In order to use @command{gawk}, you need to have a shell, either text or -graphics, that does not map all the characters of a command line to -uppercase. Maintaining case distinction in option flags is very -important (@pxref{Options, ,Command-Line Options}). -These days this is the default and it may only be a problem for some -very old machines.
If your system does not preserve the case of option -flags, you need to upgrade your tools. Support for I/O -redirection is necessary to make it easy to import @command{awk} programs -from other environments. Pipes are nice to have but not vital. - -@menu -* Atari Compiling:: Compiling @command{gawk} on Atari. -* Atari Using:: Running @command{gawk} on Atari. -@end menu - -@node Atari Compiling, Atari Using, Atari Installation, Atari Installation -@appendixsubsubsec Compiling @command{gawk} on the Atari ST - -A proper compilation of @command{gawk} sources when @code{sizeof(int)} -differs from @code{sizeof(void *)} requires an ISO C compiler. An initial -port was done with @command{gcc}. You may actually prefer executables -where @code{int}s are four bytes wide but the other variant works as well. - -You may need quite a bit of memory when trying to recompile the @command{gawk} -sources, as some source files (@file{regex.c} in particular) are quite -big. If you run out of memory compiling such a file, try reducing the -optimization level for this particular file, which may help. - -@cindex Linux -@cindex GNU/Linux -With a reasonable shell (@command{bash} will do), you have a pretty good chance -that the @command{configure} utility will succeed, and in particular if -you run GNU/Linux, MiNT or a similar operating system. Otherwise -sample versions of @file{config.h} and @file{Makefile.st} are given in the -@file{atari} subdirectory and can be edited and copied to the -corresponding files in the main source directory. Even if -@command{configure} produces something, it might be advisable to compare -its results with the sample versions and possibly make adjustments. - -Some @command{gawk} source code fragments depend on a preprocessor define -@samp{atarist}. This basically assumes the TOS environment with @command{gcc}. -Modify these sections as appropriate if they are not right for your -environment. 
Also see the remarks about @env{AWKPATH} and @code{envsep} in -@ref{Atari Using, ,Running @command{gawk} on the Atari ST}. - -As shipped, the sample @file{config.h} claims that the @code{system} -function is missing from the libraries, which is not true, and an -alternative implementation of this function is provided in -@file{unsupported/atari/system.c}. -Depending upon your particular combination of -shell and operating system, you might want to change the file to indicate -that @code{system} is available. - -@node Atari Using, , Atari Compiling, Atari Installation -@appendixsubsubsec Running @command{gawk} on the Atari ST - -An executable version of @command{gawk} should be placed, as usual, -anywhere in your @env{PATH} where your shell can find it. - -While executing, the Atari version of @command{gawk} creates a number of temporary files. When -using @command{gcc} libraries for TOS, @command{gawk} looks for either of -the environment variables, @env{TEMP} or @env{TMPDIR}, in that order. -If either one is found, its value is assumed to be a directory for -temporary files. This directory must exist, and if you can spare the -memory, it is a good idea to put it on a RAM drive. If neither -@env{TEMP} nor @env{TMPDIR} is found, then @command{gawk} uses the -current directory for its temporary files. - -The ST version of @command{gawk} searches for its program files, as described in -@ref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}. -The default value for the @env{AWKPATH} variable is taken from -@code{DEFPATH} defined in @file{Makefile}. The sample @command{gcc}/TOS -@file{Makefile} for the ST in the distribution sets @code{DEFPATH} to -@code{@w{".,c:\lib\awk,c:\gnu\lib\awk"}}. The search path can be -modified by explicitly setting @env{AWKPATH} to whatever you want. -Note that colons cannot be used on the ST to separate elements in the -@env{AWKPATH} variable, since they have another reserved meaning.
-Instead, you must use a comma to separate elements in the path. When -recompiling, the separating character can be modified by initializing -the @code{envsep} variable in @file{unsupported/atari/gawkmisc.atr} to another -value. - -Although @command{awk} allows great flexibility in doing I/O redirections -from within a program, this facility should be used with care on the ST -running under TOS. In some circumstances, the OS routines for file-handle -pool processing lose track of certain events, causing the -computer to crash and requiring a reboot. Often a warm reboot is -sufficient. Fortunately, this happens infrequently and in rather -esoteric situations. In particular, avoid having one part of an -@command{awk} program using @code{print} statements explicitly redirected -to @file{/dev/stdout}, while other @code{print} statements use the -default standard output, and a calling shell has redirected standard -output to a file. -@c 10/2000: Is this still true, now that gawk does /dev/stdout internally? - -When @command{gawk} is compiled with the ST version of @command{gcc} and its -usual libraries, it accepts both @samp{/} and @samp{\} as path separators. -While this is convenient, it should be remembered that this removes one -technically valid character (@samp{/}) from your @value{FN}. -It may also create problems for external programs called via the @code{system} -function, which may not support this convention. Whenever it is possible -that a file created by @command{gawk} will be used by some other program, -use only backslashes. Also remember that in @command{awk}, backslashes in -strings have to be doubled in order to get literal backslashes -(@pxref{Escape Sequences}). - -@node Tandem Installation, , Atari Installation, Unsupported -@appendixsubsec Installing @command{gawk} on a Tandem -@cindex tandem -@cindex installation, tandem - -The Tandem port is only minimally supported. -The port's contributor no longer has access to a Tandem system. 
- -@c This section based on README.Tandem by Stephen Davies (scldad@sdc.com.au) -The Tandem port was done on a Cyclone machine running D20. -The port is pretty clean and all facilities seem to work except for -the I/O piping facilities -(@pxref{Getline/Pipe, , Using @code{getline} from a Pipe}, -@ref{Getline/Variable/Pipe, ,Using @code{getline} into a Variable from a Pipe}, -and -@ref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}), -which is just too foreign a concept for Tandem. - -To build a Tandem executable from source, download all of the files so -that the @value{FN}s on the Tandem box conform to the restrictions of D20. -For example, @file{array.c} becomes @file{ARRAYC}, and @file{awk.h} -becomes @file{AWKH}. The totally Tandem-specific files are in the -@file{tandem} ``subvolume'' (@file{unsupported/tandem} in the @command{gawk} -distribution) and should be copied to the main source directory before -building @command{gawk}. - -The file @file{compit} can then be used to compile and bind an executable. -Alas, there is no @command{configure} or @command{make}. - -Usage is the same as for Unix, except that D20 requires all @samp{@{} and -@samp{@}} characters to be escaped with @samp{~} on the command line -(but @emph{not} in script files). Also, the standard Tandem syntax for -@samp{/in filename,out filename/} must be used instead of the usual -Unix @samp{<} and @samp{>} for file redirection. (Redirection options -on @code{getline}, @code{print} etc., are supported.) - -The @samp{-mr @var{val}} option -(@pxref{Options, ,Command-Line Options}) -has been ``stolen'' to enable Tandem users to process fixed-length -records with no ``end-of-line'' character. That is, @samp{-mr 74} tells -@command{gawk} to read the input file as fixed 74-byte records. 
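To make the idea of fixed-length records concrete, here is a rough shell sketch that cuts a byte stream into records of a given length using the POSIX @command{fold} utility. It only illustrates what ``fixed 74-byte records'' means; it is not how the Tandem port implements @samp{-mr}, and the function name @code{split_fixed} is hypothetical. It also assumes that the data contains no newline bytes, since @command{fold} starts a new line at each one.

```shell
# split_fixed N
# Hypothetical helper: copy standard input to standard output, starting a
# new line after every N bytes, so that each N-byte record becomes one line.
split_fixed() {
    fold -b -w "$1"
}

# Three 4-byte records with no end-of-line characters between them.
printf 'AAAABBBBCCCC' | split_fixed 4
```

The demonstration line should print the three records @samp{AAAA}, @samp{BBBB}, and @samp{CCCC} on separate lines. On a Unix system, the output of @samp{split_fixed 74} could then be piped into @command{gawk}, which reads each line as one record.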
- -@node Bugs, Other Versions, Unsupported, Installation -@appendixsec Reporting Problems and Bugs -@cindex archeologists -@quotation -@i{There is nothing more dangerous than a bored archeologist.}@* -The Hitchhiker's Guide to the Galaxy -@end quotation -@c the radio show, not the book. :-) - -@cindex bug reports -@cindex problem reports -@cindex reporting bugs -@cindex reporting problems -If you have problems with @command{gawk} or think that you have found a bug, -please report it to the developers; we cannot promise to do anything -but we might well want to fix it. - -Before reporting a bug, make sure you have actually found a real bug. -Carefully reread the documentation and see if it really says you can do -what you're trying to do. If it's not clear whether you should be able -to do something or not, report that too; it's a bug in the documentation! - -Before reporting a bug or trying to fix it yourself, try to isolate it -to the smallest possible @command{awk} program and input @value{DF} that -reproduces the problem. Then send us the program and @value{DF}, -some idea of what kind of Unix system you're using, -the compiler you used to compile @command{gawk}, and the exact results -@command{gawk} gave you. Also say what you expected to occur; this helps -us decide whether the problem is really in the documentation. - -@cindex @code{bug-gawk@@gnu.org} bug reporting address -@cindex email address for bug reports, @code{bug-gawk@@gnu.org} -@cindex bug reports, email address, @code{bug-gawk@@gnu.org} -Once you have a precise problem, send email to @email{bug-gawk@@gnu.org}. - -@cindex Robbins, Arnold -Please include the version number of @command{gawk} you are using. -You can get this information with the command @samp{gawk --version}. -Using this address automatically sends a carbon copy of your -mail to me. If necessary, I can be reached directly at -@email{arnold@@gnu.org}. 
The bug reporting address is preferred since the -email list is archived at the GNU Project. -@emph{All email should be in English, since that is my native language.} - -@cindex @code{comp.lang.awk} Usenet news group -@strong{Caution:} Do @emph{not} try to report bugs in @command{gawk} by -posting to the Usenet/Internet newsgroup @code{comp.lang.awk}. -While the @command{gawk} developers do occasionally read this newsgroup, -there is no guarantee that we will see your posting. The steps described -above are the officially recognized way to report bugs. - -Non-bug suggestions are always welcome as well. If you have questions -about things that are unclear in the documentation or are just obscure -features, ask me; I will try to help you out, although I -may not have the time to fix the problem. You can send me electronic -mail at the Internet address noted previously. - -If you find bugs in one of the non-Unix ports of @command{gawk}, please send -an electronic mail message to the person who maintains that port. They -are named in the following list, as well as in the @file{README} file in the @command{gawk} -distribution. Information in the @file{README} file should be considered -authoritative if it conflicts with this @value{DOCUMENT}. - -The people maintaining the non-Unix ports of @command{gawk} are -as follows: - -@ignore -@table @asis -@cindex Fish, Fred -@item Amiga -Fred Fish, @email{fnf@@ninemoons.com}. - -@cindex Brown, Martin -@item BeOS -Martin Brown, @email{mc@@whoever.com}. - -@cindex Deifik, Scott -@cindex Hankerson, Darrel -@item MS-DOS -Scott Deifik, @email{scottd@@amgen.com} and -Darrel Hankerson, @email{hankedr@@mail.auburn.edu}. - -@cindex Grigera, Juan -@item MS-Windows -Juan Grigera, @email{juan@@biophnet.unlp.edu.ar}. - -@cindex Rommel, Kai Uwe -@item OS/2 -Kai Uwe Rommel, @email{rommel@@ars.de}. - -@cindex Davies, Stephen -@item Tandem -Stephen Davies, @email{scldad@@sdc.com.au}. 
- -@cindex Rankin, Pat -@item VMS -Pat Rankin, @email{rankin@@eql.caltech.edu}. -@end table -@end ignore - -@multitable {MS-Windows} {123456789012345678901234567890123456789001234567890} -@cindex Fish, Fred -@item Amiga @tab Fred Fish, @email{fnf@@ninemoons.com}. - -@cindex Brown, Martin -@item BeOS @tab Martin Brown, @email{mc@@whoever.com}. - -@cindex Deifik, Scott -@cindex Hankerson, Darrel -@item MS-DOS @tab Scott Deifik, @email{scottd@@amgen.com} and -Darrel Hankerson, @email{hankedr@@mail.auburn.edu}. - -@cindex Grigera, Juan -@item MS-Windows @tab Juan Grigera, @email{juan@@biophnet.unlp.edu.ar}. - -@cindex Rommel, Kai Uwe -@item OS/2 @tab Kai Uwe Rommel, @email{rommel@@ars.de}. - -@cindex Davies, Stephen -@item Tandem @tab Stephen Davies, @email{scldad@@sdc.com.au}. - -@cindex Rankin, Pat -@item VMS @tab Pat Rankin, @email{rankin@@eql.caltech.edu}. -@end multitable - -If your bug is also reproducible under Unix, please send a copy of your -report to the @email{bug-gawk@@gnu.org} email list as well. - -@node Other Versions, , Bugs, Installation -@appendixsec Other Freely Available @command{awk} Implementations -@cindex other @command{awk} implementations -@ignore -From: emory!amc.com!brennan (Michael Brennan) -Subject: C++ comments in awk programs -To: arnold@gnu.ai.mit.edu (Arnold Robbins) -Date: Wed, 4 Sep 1996 08:11:48 -0700 (PDT) - -@end ignore -@cindex Brennan, Michael -@quotation -@i{It's kind of fun to put comments like this in your awk code.}@* -@ @ @ @ @ @ @code{// Do C++ comments work? answer: yes! of course}@* -Michael Brennan -@end quotation - -There are three other freely available @command{awk} implementations. -This @value{SECTION} briefly describes where to get them: - -@table @asis -@cindex Kernighan, Brian -@cindex Unix @command{awk}, source code -@cindex source code, Unix @command{awk} -@item Unix @command{awk} -Brian Kernighan has made his implementation of -@command{awk} freely available. 
-You can retrieve this version via the World Wide Web from -his home page.@footnote{@uref{http://cm.bell-labs.com/who/bwk}} -It is available in several archive formats: - -@table @asis -@item Shell archive -@uref{http://cm.bell-labs.com/who/bwk/awk.shar} - -@item Compressed @command{tar} file -@uref{http://cm.bell-labs.com/who/bwk/awk.tar.gz} - -@item Zip file -@uref{http://cm.bell-labs.com/who/bwk/awk.zip} -@end table - -This version requires an ISO C (1990 standard) compiler; -the C compiler from -GCC (the GNU Compiler Collection) -works quite nicely. - -@xref{BTL, ,Extensions in the Bell Laboratories @command{awk}}, -for a list of extensions in this @command{awk} that are not in POSIX @command{awk}. - -@cindex GPL -@cindex General Public License -@cindex GNU General Public License -@cindex Brennan, Michael -@cindex @command{mawk}, source code -@cindex source code, @command{mawk} -@item @command{mawk} -Michael Brennan has written an independent implementation of @command{awk}, -called @command{mawk}. It is available under the GPL -(@pxref{Copying, ,GNU General Public License}), -just as @command{gawk} is. - -You can get it via anonymous @command{ftp} to the host -@code{@w{ftp.whidbey.net}}. Change directory to @file{/pub/brennan}. -Use ``binary'' or ``image'' mode, and retrieve @file{mawk1.3.3.tar.gz} -(or the latest version that is there). - -@command{gunzip} may be used to decompress this file. Installation -is similar to @command{gawk}'s -(@pxref{Unix Installation, , Compiling and Installing @command{gawk} on Unix}). - -@cindex extensions, @command{mawk} -@command{mawk} has the following extensions that are not in POSIX @command{awk}: - -@itemize @bullet -@item -The @code{fflush} built-in function for flushing buffered output -(@pxref{I/O Functions, ,Input/Output Functions}). - -@item -The @samp{**} and @samp{**=} operators -(@pxref{Arithmetic Ops, ,Arithmetic Operators} -and also see -@ref{Assignment Ops, ,Assignment Expressions}). 
- -@item -The use of @code{func} as an abbreviation for @code{function} -(@pxref{Definition Syntax, ,Function Definition Syntax}). - -@item -The @samp{\x} escape sequence -(@pxref{Escape Sequences}). - -@item -The @file{/dev/stdout} and @file{/dev/stderr} -special files -(@pxref{Special Files, ,Special @value{FFN}s in @command{gawk}}). -Use @code{"-"} instead of @code{"/dev/stdin"} with @command{mawk}. - -@item -The ability for @code{FS} and for the third -argument to @code{split} to be null strings -(@pxref{Single Character Fields, , Making Each Character a Separate Field}). - -@item -The ability to delete all of an array at once with @samp{delete @var{array}} -(@pxref{Delete, ,The @code{delete} Statement}). - -@item -The ability for @code{RS} to be a regexp -(@pxref{Records, ,How Input Is Split into Records}). - -@item -The @code{BINMODE} special variable for non-Unix operating systems -(@pxref{PC Using, ,Using @command{gawk} on PC Operating Systems}). -@end itemize - -The next version of @command{mawk} will support @code{nextfile}. - -@cindex Sumner, Andrew -@cindex @command{awka} compiler for @command{awk} programs -@cindex @command{awka}, source code -@cindex source code, @command{awka} -@item @command{awka} -Written by Andrew Sumner, -@command{awka} translates @command{awk} programs into C, compiles them, -and links them with a library of functions that provides the core -@command{awk} functionality. -It also has a number of extensions. - -@cindex GPL -@cindex General Public License -@cindex GNU General Public License -@cindex LGPL -@cindex Lesser General Public License -@cindex GNU Lesser General Public License -The @command{awk} translator is released under the GPL, and the library -is under the LGPL. - -@ignore -To get @command{awka}, go to its home page at -Go to @uref{http://awka.sourceforge.net}. -@end ignore -To get @command{awka}, go to @uref{http://awka.sourceforge.net}. -You can reach Andrew Sumner at @email{andrew_sumner@@bigfoot.com}. 
-@end table - -@node Notes, Basic Concepts, Installation, Top -@appendix Implementation Notes - -This appendix contains information mainly of interest to implementors and -maintainers of @command{gawk}. Everything in it applies specifically to -@command{gawk} and not to other implementations. - -@menu -* Compatibility Mode:: How to disable certain @command{gawk} - extensions. -* Additions:: Making Additions To @command{gawk}. -* Dynamic Extensions:: Adding new built-in functions to - @command{gawk}. -* Future Extensions:: New features that may be implemented one day. -@end menu - -@node Compatibility Mode, Additions, Notes, Notes -@appendixsec Downward Compatibility and Debugging - -@xref{POSIX/GNU, ,Extensions in @command{gawk} Not in POSIX @command{awk}}, -for a summary of the GNU extensions to the @command{awk} language and program. -All of these features can be turned off by invoking @command{gawk} with the -@option{--traditional} option or with the @option{--posix} option. - -If @command{gawk} is compiled for debugging with @samp{-DDEBUG}, then there -is one more option available on the command line: - -@table @code -@item -W parsedebug -@itemx --parsedebug -Print out the parse stack information as the program is being parsed. -@end table - -This option is intended only for serious @command{gawk} developers -and not for the casual user. It probably has not even been compiled into -your version of @command{gawk}, since it slows down execution. - -@node Additions, Dynamic Extensions, Compatibility Mode, Notes -@appendixsec Making Additions to @command{gawk} - -If you find that you want to enhance @command{gawk} in a significant -fashion, you are perfectly free to do so. That is the point of having -free software; the source code is available and you are free to change -it as you want (@pxref{Copying, ,GNU General Public License}). 
- -This @value{SECTION} discusses the ways you might want to change @command{gawk} -as well as any considerations you should bear in mind. - -@menu -* Adding Code:: Adding code to the main body of - @command{gawk}. -* New Ports:: Porting @command{gawk} to a new operating - system. -@end menu - -@node Adding Code, New Ports, Additions, Additions -@appendixsubsec Adding New Features - -@cindex adding new features -@cindex features, adding to @command{gawk} -You are free to add any new features you like to @command{gawk}. -However, if you want your changes to be incorporated into the @command{gawk} -distribution, there are several steps that you need to take in order to -make it possible for me to include your changes: - -@enumerate 1 -@item -Before building the new feature into @command{gawk} itself, -consider writing it as an extension module -(@pxref{Dynamic Extensions, ,Adding New Built-in Functions to @command{gawk}}). -If that's not possible, continue with the rest of the steps in this list. - -@item -Get the latest version. -It is much easier for me to integrate changes if they are relative to -the most recent distributed version of @command{gawk}. If your version of -@command{gawk} is very old, I may not be able to integrate them at all. -(@xref{Getting, ,Getting the @command{gawk} Distribution}, -for information on getting the latest version of @command{gawk}.) - -@item -@ifnotinfo -Follow the @cite{GNU Coding Standards}. -@end ifnotinfo -@ifinfo -See @inforef{Top, , Version, standards, GNU Coding Standards}. -@end ifinfo -This document describes how GNU software should be written. If you haven't -read it, please do so, preferably @emph{before} starting to modify @command{gawk}. -(The @cite{GNU Coding Standards} are available from -the GNU Project's -@command{ftp} -site, at -@uref{ftp://gnudist.gnu.org/gnu/GNUInfo/standards.text}. -Texinfo, Info, and DVI versions are also available.) 
- -@cindex @command{gawk}, coding style -@cindex coding style used in @command{gawk} -@item -Use the @command{gawk} coding style. -The C code for @command{gawk} follows the instructions in the -@cite{GNU Coding Standards}, with minor exceptions. The code is formatted -using the traditional ``K&R'' style, particularly as regards the placement -of braces and the use of tabs. In brief, the coding rules for @command{gawk} -are as follows: - -@itemize @bullet -@item -Use ANSI/ISO style (prototype) function headers when defining functions. - -@item -Put the name of the function at the beginning of its own line. - -@item -Put the return type of the function, even if it is @code{int}, on the -line above the line with the name and arguments of the function. - -@item -Put spaces around parentheses used in control structures -(@code{if}, @code{while}, @code{for}, @code{do}, @code{switch}, -and @code{return}). - -@item -Do not put spaces in front of parentheses used in function calls. - -@item -Put spaces around all C operators and after commas in function calls. - -@item -Do not use the comma operator to produce multiple side effects, except -in @code{for} loop initialization and increment parts, and in macro bodies. - -@item -Use real tabs for indenting, not spaces. - -@item -Use the ``K&R'' brace layout style. - -@item -Use comparisons against @code{NULL} and @code{'\0'} in the conditions of -@code{if}, @code{while}, and @code{for} statements, as well as in the @code{case}s -of @code{switch} statements, instead of just the -plain pointer or character value. - -@item -Use the @code{TRUE}, @code{FALSE} and @code{NULL} symbolic constants -and the character constant @code{'\0'} where appropriate, instead of @code{1} -and @code{0}. - -@item -Use the @code{ISALPHA}, @code{ISDIGIT}, etc.@: macros, instead of the -traditional lowercase versions; these macros are better behaved for -non-ASCII character sets. - -@item -Provide one-line descriptive comments for each function. 
- -@item -Do not use @samp{#elif}. Many older Unix C compilers cannot handle it. - -@item -Do not use the @code{alloca} function for allocating memory off the stack. -Its use causes more portability trouble than the minor benefit of not having -to free the storage is worth. Instead, use @code{malloc} and @code{free}. -@end itemize - -@strong{Note:} -If I have to reformat your code to follow the coding style used in -@command{gawk}, I may not bother to integrate your changes at all. - -@item -Be prepared to sign the appropriate paperwork. -In order for the FSF to distribute your changes, you must either place -those changes in the public domain and submit a signed statement to that -effect, or assign the copyright in your changes to the FSF. -Both of these actions are easy to do and @emph{many} people have done so -already. If you have questions, please contact me -(@pxref{Bugs, , Reporting Problems and Bugs}), -or @email{gnu@@gnu.org}. - -@cindex Texinfo -@item -Update the documentation. -Along with your new code, please supply new sections and/or chapters -for this @value{DOCUMENT}. If at all possible, please use real -Texinfo, instead of just supplying unformatted ASCII text (although -even that is better than no documentation at all). -Conventions to be followed in @cite{@value{TITLE}} are provided -after the @samp{@@bye} at the end of the Texinfo source file. -If possible, please update the @command{man} page as well. - -You will also have to sign paperwork for your documentation changes. - -@item -Submit changes as context diffs or unified diffs. -Use @samp{diff -c -r -N} or @samp{diff -u -r -N} to compare -the original @command{gawk} source tree with your version. -(I find context diffs to be more readable but unified diffs are -more compact.) -I recommend using the GNU version of @command{diff}. -Send the output produced by either run of @command{diff} to me when you -submit your changes. 
-(@xref{Bugs, , Reporting Problems and Bugs}, for the electronic mail -information.) - -Using this format makes it easy for me to apply your changes to the -master version of the @command{gawk} source code (using @code{patch}). -If I have to apply the changes manually, using a text editor, I may -not do so, particularly if there are lots of changes. - -@item -Include an entry for the @file{ChangeLog} file with your submission. -This helps further minimize the amount of work I have to do, -making it easier for me to accept patches. -@end enumerate - -Although this sounds like a lot of work, please remember that while you -may write the new code, I have to maintain it and support it. If it -isn't possible for me to do that with a minimum of extra work, then I -probably will not. - -@node New Ports, , Adding Code, Additions -@appendixsubsec Porting @command{gawk} to a New Operating System - -@cindex porting @command{gawk} -If you want to port @command{gawk} to a new operating system, there are -several steps to follow: - -@enumerate 1 -@item -Follow the guidelines in -@ifinfo -@ref{Adding Code, ,Adding New Features}, -@end ifinfo -@ifnotinfo -the previous @value{SECTION} -@end ifnotinfo -concerning coding style, submission of diffs, and so on. - -@item -When doing a port, bear in mind that your code must co-exist peacefully -with the rest of @command{gawk} and the other ports. Avoid gratuitous -changes to the system-independent parts of the code. If at all possible, -avoid sprinkling @samp{#ifdef}s just for your port throughout the -code. - -@cindex GPL -@cindex General Public License -@cindex GNU General Public License -If the changes needed for a particular system affect too much of the -code, I probably will not accept them. In such a case, you can, of course, -distribute your changes on your own, as long as you comply -with the GPL -(@pxref{Copying, ,GNU General Public License}). 
- -@item -A number of the files that come with @command{gawk} are maintained by other -people at the Free Software Foundation. Thus, you should not change them -unless it is for a very good reason; i.e., changes are not out of the -question, but changes to these files are scrutinized extra carefully. -The files are @file{getopt.h}, @file{getopt.c}, -@file{getopt1.c}, @file{regex.h}, @file{regex.c}, @file{dfa.h}, -@file{dfa.c}, @file{install-sh}, and @file{mkinstalldirs}. - -@item -Be willing to continue to maintain the port. -Non-Unix operating systems are supported by volunteers who maintain -the code needed to compile and run @command{gawk} on their systems. If no one -volunteers to maintain a port, it becomes unsupported and it may -be necessary to remove it from the distribution. - -@item -Supply an appropriate @file{gawkmisc.???} file. -Each port has its own @file{gawkmisc.???} that implements certain -operating system specific functions. This is cleaner than a plethora of -@samp{#ifdef}s scattered throughout the code. The @file{gawkmisc.c} in -the main source directory includes the appropriate -@file{gawkmisc.???} file from each subdirectory. -Be sure to update it as well. - -Each port's @file{gawkmisc.???} file has a suffix reminiscent of the machine -or operating system for the port---for example, @file{pc/gawkmisc.pc} and -@file{vms/gawkmisc.vms}. The use of separate suffixes, instead of plain -@file{gawkmisc.c}, makes it possible to move files from a port's subdirectory -into the main subdirectory, without accidentally destroying the real -@file{gawkmisc.c} file. (Currently, this is only an issue for the -PC operating system ports.) - -@item -Supply a @file{Makefile} as well as any other C source and header files that are -necessary for your operating system. All your code should be in a -separate subdirectory, with a name that is the same as, or reminiscent -of, either your operating system or the computer system. 
If possible, -try to structure things so that it is not necessary to move files out -of the subdirectory into the main source directory. If that is not -possible, then be sure to avoid using names for your files that -duplicate the names of files in the main source directory. - -@item -Update the documentation. -Please write a section (or sections) for this @value{DOCUMENT} describing the -installation and compilation steps needed to compile and/or install -@command{gawk} for your system. - -@item -Be prepared to sign the appropriate paperwork. -In order for the FSF to distribute your code, you must either place -your code in the public domain and submit a signed statement to that -effect, or assign the copyright in your code to the FSF. -@ifinfo -Both of these actions are easy to do and @emph{many} people have done so -already. If you have questions, please contact me, or -@email{gnu@@gnu.org}. -@end ifinfo -@end enumerate - -Following these steps makes it much easier to integrate your changes -into @command{gawk} and have them co-exist happily with other -operating systems' code that is already there. - -In the code that you supply and maintain, feel free to use a -coding style and brace layout that suits your taste. - -@node Dynamic Extensions, Future Extensions, Additions, Notes -@appendixsec Adding New Built-in Functions to @command{gawk} -@cindex Robinson, Will -@cindex robot, the -@cindex Lost In Space -@quotation -@i{Danger Will Robinson! Danger!!@* -Warning! Warning!}@* -The Robot -@end quotation - -@cindex Linux -@cindex GNU/Linux -Beginning with @command{gawk} 3.1, it is possible to add new built-in -functions to @command{gawk} using dynamically loaded libraries. This -facility is available on systems (such as GNU/Linux) that support -the @code{dlopen} and @code{dlsym} functions. -This @value{SECTION} describes how to write and use dynamically -loaded extensions for @command{gawk}. 
-Experience with programming in -C or C++ is necessary when reading this @value{SECTION}. - -@strong{Caution:} The facilities described in this @value{SECTION} -are very much subject to change in the next @command{gawk} release. -Be aware that you may have to re-do everything, perhaps from scratch, -upon the next release. - -@menu -* Internals:: A brief look at some @command{gawk} internals. -* Sample Library:: An example of new functions. -@end menu - -@node Internals, Sample Library, Dynamic Extensions, Dynamic Extensions -@appendixsubsec A Minimal Introduction to @command{gawk} Internals - -The truth is that @command{gawk} was not designed for simple extensibility. -The facilities for adding functions using shared libraries work, but -are something of a ``bag on the side.'' Thus, this tour is -brief and simplistic; would-be @command{gawk} hackers are encouraged to -spend some time reading the source code before trying to write -extensions based on the material presented here. Of particular note -are the files @file{awk.h}, @file{builtin.c}, and @file{eval.c}. -Reading @file{awk.y} in order to see how the parse tree is built -would also be of use. - -With the disclaimers out of the way, the following types, structure -members, functions, and macros are declared in @file{awk.h} and are of -use when writing extensions. The next @value{SECTION} -shows how they are used: - -@table @code -@cindex @code{AWKNUM} internal type -@cindex internal type, @code{AWKNUM} -@item AWKNUM -An @code{AWKNUM} is the internal type of @command{awk} -floating-point numbers. Typically, it is a C @code{double}. - -@cindex @code{NODE} internal type -@cindex internal type, @code{NODE} -@item NODE -Just about everything is done using objects of type @code{NODE}. -These contain both strings and numbers, as well as variables and arrays. 
- -@cindex @code{force_number} internal function -@cindex internal function, @code{force_number} -@item AWKNUM force_number(NODE *n) -This macro forces a value to be numeric. It returns the actual -numeric value contained in the node. -It may end up calling an internal @command{gawk} function. - -@cindex @code{force_string} internal function -@cindex internal function, @code{force_string} -@item void force_string(NODE *n) -This macro guarantees that a @code{NODE}'s string value is current. -It may end up calling an internal @command{gawk} function. -It also guarantees that the string is zero-terminated. - -@cindex @code{param_cnt} internal variable -@cindex internal variable, @code{param_cnt} -@item n->param_cnt -The number of parameters actually passed in a function call at runtime. - -@cindex @code{stptr} internal variable -@cindex @code{stlen} internal variable -@cindex internal variable, @code{stptr} -@cindex internal variable, @code{stlen} -@item n->stptr -@itemx n->stlen -The data and length of a @code{NODE}'s string value, respectively. -The string is @emph{not} guaranteed to be zero-terminated. -If you need to pass the string value to a C library function, save -the value in @code{n->stptr[n->stlen]}, assign @code{'\0'} to it, -call the routine, and then restore the value. - -@cindex @code{type} internal variable -@cindex internal variable, @code{type} -@item n->type -The type of the @code{NODE}. This is a C @code{enum}. Values should -be either @code{Node_var} or @code{Node_var_array} for function -parameters. - -@cindex @code{vname} internal variable -@cindex internal variable, @code{vname} -@item n->vname -The ``variable name'' of a node. This is not of much use inside -externally written extensions. - -@cindex @code{assoc_clear} internal function -@cindex internal function, @code{assoc_clear} -@item void assoc_clear(NODE *n) -Clears the associative array pointed to by @code{n}. -Make sure that @samp{n->type == Node_var_array} first. 
- -@cindex @code{assoc_lookup} internal function -@cindex internal function, @code{assoc_lookup} -@item NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference) -Finds, and installs if necessary, array elements. -@code{symbol} is the array, @code{subs} is the subscript. -This is usually a value created with @code{tmp_string} (see below). -@code{reference} should be @code{TRUE} if it is an error to use the -value before it is created. Typically, @code{FALSE} is the -correct value to use from extension functions. - -@cindex @code{make_string} internal function -@cindex internal function, @code{make_string} -@item NODE *make_string(char *s, size_t len) -Take a C string and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is permanent storage; understanding -of @command{gawk} memory management is helpful. - -@cindex @code{make_number} internal function -@cindex internal function, @code{make_number} -@item NODE *make_number(AWKNUM val) -Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is permanent storage; understanding -of @command{gawk} memory management is helpful. - -@cindex @code{tmp_string} internal function -@item NODE *tmp_string(char *s, size_t len); -@cindex internal function, @code{tmp_string} -Take a C string and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is temporary storage; understanding -of @command{gawk} memory management is helpful. - -@cindex @code{tmp_number} internal function -@item NODE *tmp_number(AWKNUM val) -@cindex internal function, @code{tmp_number} -Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is temporary storage; -understanding of @command{gawk} memory management is helpful. - -@cindex @code{dupnode} internal function -@cindex internal function, @code{dupnode} -@item NODE *dupnode(NODE *n) -Duplicate a node. 
In most cases, this increments an internal -reference count instead of actually duplicating the entire @code{NODE}; -understanding of @command{gawk} memory management is helpful. - -@cindex @code{free_temp} internal macro -@cindex internal macro, @code{free_temp} -@item void free_temp(NODE *n) -This macro releases the memory associated with a @code{NODE} -allocated with @code{tmp_string} or @code{tmp_number}. -Understanding of @command{gawk} memory management is helpful. - -@cindex @code{make_builtin} internal function -@cindex internal function, @code{make_builtin} -@item void make_builtin(char *name, NODE *(*func)(NODE *), int count) -Register a C function pointed to by @code{func} as new built-in -function @code{name}. @code{name} is a regular C string. @code{count} -is the maximum number of arguments that the function takes. -The function should be written in the following manner: - -@example -/* do_xxx --- do xxx function for gawk */ - -NODE * -do_xxx(NODE *tree) -@{ - @dots{} -@} -@end example - -@cindex @code{get_argument} internal function -@cindex internal function, @code{get_argument} -@item NODE *get_argument(NODE *tree, int i) -This function is called from within a C extension function to get -the @code{i}'th argument from the function call. -The first argument is argument zero. - -@cindex @code{set_value} internal function -@item void set_value(NODE *tree) -@cindex internal function, @code{set_value} -This function is called from within a C extension function to set -the return value from the extension function. This value is -what the @command{awk} program sees as the return value from the -new @command{awk} function. - -@cindex @code{update_ERRNO} internal function -@item void update_ERRNO(void) -@cindex internal function, @code{update_ERRNO} -This function is called from within a C extension function to set -the value of @command{gawk}'s @code{ERRNO} variable, based on the current -value of the C @code{errno} variable. 
-It is provided as a convenience. -@end table - -An argument that is supposed to be an array needs to be handled with -some extra code, in case the array being passed in is actually -from a function parameter. -The following ``boiler plate'' code shows how to do this: - -@smallexample -NODE *the_arg; - -the_arg = get_argument(tree, 2); /* assume need 3rd arg, 0-based */ - -/* if a parameter, get it off the stack */ -if (the_arg->type == Node_param_list) - the_arg = stack_ptr[the_arg->param_cnt]; - -/* parameter referenced an array, get it */ -if (the_arg->type == Node_array_ref) - the_arg = the_arg->orig_array; - -/* check type */ -if (the_arg->type != Node_var && the_arg->type != Node_var_array) - fatal("newfunc: third argument is not an array"); - -/* force it to be an array, if necessary, clear it */ -the_arg->type = Node_var_array; -assoc_clear(the_arg); -@end smallexample - -Again, you should spend time studying the @command{gawk} internals; -don't just blindly copy this code. - -@node Sample Library, , Internals, Dynamic Extensions -@appendixsubsec Directory and File Operation Built-ins - -Two useful functions that are not in @command{awk} are @code{chdir} -(so that an @command{awk} program can change its directory) and -@code{stat} (so that an @command{awk} program can gather information about -a file). -This @value{SECTION} implements these functions for @command{gawk} in an -external extension library. - -@menu -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. -@end menu - -@node Internal File Description, Internal File Ops, Sample Library, Sample Library -@appendixsubsubsec Using @code{chdir} and @code{stat} - -This @value{SECTION} shows how to use the new functions at the @command{awk} -level once they've been integrated into the running @command{gawk} -interpreter. -Using @code{chdir} is very straightforward. 
It takes one argument, -the new directory to change to: - -@example -@dots{} -newdir = "/home/arnold/funstuff" -ret = chdir(newdir) -if (ret < 0) @{ - printf("could not change to %s: %s\n", - newdir, ERRNO) > "/dev/stderr" - exit 1 -@} -@dots{} -@end example - -The return value is negative if the @code{chdir} failed, -and @code{ERRNO} -(@pxref{Built-in Variables}) -is set to a string indicating the error. - -Using @code{stat} is a bit more complicated. -The C @code{stat} function fills in a structure that has a fair -amount of information. -The right way to model this in @command{awk} is to fill in an associative -array with the appropriate information: - -@c broke printf for page breaking -@example -file = "/home/arnold/.profile" -fdata[1] = "x" # force `fdata' to be an array -ret = stat(file, fdata) -if (ret < 0) @{ - printf("could not stat %s: %s\n", - file, ERRNO) > "/dev/stderr" - exit 1 -@} -printf("size of %s is %d bytes\n", file, fdata["size"]) -@end example - -The @code{stat} function always clears the data array, even if -the @code{stat} fails. It fills in the following elements: - -@table @code -@item "name" -The name of the file that was @code{stat}'ed. - -@item "dev" -@itemx "ino" -The file's device and inode numbers, respectively. - -@item "mode" -The file's mode, as a numeric value. This includes both the file's -type and its permissions. - -@item "nlink" -The number of hard links (directory entries) the file has. - -@item "uid" -@itemx "gid" -The numeric user and group ID numbers of the file's owner. - -@item "size" -The size in bytes of the file. - -@item "blocks" -The number of disk blocks the file actually occupies. This may not -be a function of the file's size if the file has holes. - -@item "atime" -@itemx "mtime" -@itemx "ctime" -The file's last access, modification, and inode update times, -respectively. These are numeric timestamps, suitable for formatting -with @code{strftime} -(@pxref{Built-in, ,Built-in Functions}). 
- -@item "pmode" -The file's ``printable mode.'' This is a string representation of -the file's type and permissions, such as what is produced by -@samp{ls -l}---for example, @code{"drwxr-xr-x"}. - -@item "type" -A printable string representation of the file's type. The value -is one of the following: - -@table @code -@item "blockdev" -@itemx "chardev" -The file is a block or character device (``special file''). - -@ignore -@item "door" -The file is a Solaris ``door'' (special file used for -interprocess communications). -@end ignore - -@item "directory" -The file is a directory. - -@item "fifo" -The file is a named-pipe (also known as a FIFO). - -@item "file" -The file is just a regular file. - -@item "socket" -The file is an @code{AF_UNIX} (``Unix domain'') socket in the -filesystem. - -@item "symlink" -The file is a symbolic link. -@end table -@end table - -Several additional elements may be present depending upon the operating -system and the type of the file. You can test for them in your @command{awk} -program by using the @code{in} operator -(@pxref{Reference to Elements, ,Referring to an Array Element}): - -@table @code -@item "blksize" -The preferred block size for I/O to the file. This field is not -present on all POSIX-like systems in the C @code{stat} structure. - -@item "linkval" -If the file is a symbolic link, this element is the name of the -file the link points to (i.e., the value of the link). - -@item "rdev" -@itemx "major" -@itemx "minor" -If the file is a block or character device file, then these values -represent the numeric device number and the major and minor components -of that number, respectively. -@end table - -@node Internal File Ops, Using Internal File Ops, Internal File Description, Sample Library -@appendixsubsubsec C Code for @code{chdir} and @code{stat} - -@cindex Linux -@cindex GNU/Linux -Here is the C code for these extensions. They were written for -GNU/Linux. 
The code needs some more work for complete portability -to other POSIX-compliant systems:@footnote{This version is edited -slightly for presentation. The complete version can be found in -@file{extension/filefuncs.c} in the @command{gawk} distribution.} - -@c break line for page breaking -@example -#include "awk.h" - -#include <sys/sysmacros.h> - -/* do_chdir --- provide dynamically loaded - chdir() builtin for gawk */ - -static NODE * -do_chdir(tree) -NODE *tree; -@{ - NODE *newdir; - int ret = -1; - - newdir = get_argument(tree, 0); -@end example - -The file includes the @code{"awk.h"} header file for definitions -for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>} -for access to the @code{major} and @code{minor} macros. - -@cindex conventions, programming -@cindex programming conventions -By convention, for an @command{awk} function @code{foo}, the function that -implements it is called @samp{do_foo}. The function should take -a @samp{NODE *} argument, usually called @code{tree}, that -represents the argument list to the function. The @code{newdir} -variable represents the new directory to change to, retrieved -with @code{get_argument}. Note that the first argument is -numbered zero. - -This code actually accomplishes the @code{chdir}. It first forces -the argument to be a string and passes the string value to the -@code{chdir} system call. If the @code{chdir} fails, @code{ERRNO} -is updated. -The result of @code{force_string} has to be freed with @code{free_temp}: - -@example - if (newdir != NULL) @{ - (void) force_string(newdir); - ret = chdir(newdir->stptr); - if (ret < 0) - update_ERRNO(); - - free_temp(newdir); - @} -@end example - -Finally, the function returns the return value to the @command{awk} level, -using @code{set_value}. 
Then it must return a value from the call to -the new built-in (this value is ignored by the interpreter): - -@example - /* Set the return value */ - set_value(tmp_number((AWKNUM) ret)); - - /* Just to make the interpreter happy */ - return tmp_number((AWKNUM) 0); -@} -@end example - -The @code{stat} built-in is more involved. First comes a function -that turns a numeric mode into a printable representation -(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity: - -@c break line for page breaking -@example -/* format_mode --- turn a stat mode field - into something readable */ - -static char * -format_mode(fmode) -unsigned long fmode; -@{ - @dots{} -@} -@end example - -Next comes the actual @code{do_stat} function itself. First come the -variable declarations and argument checking: - -@ignore -Changed message for page breaking. Used to be: - "stat: called with incorrect number of arguments (%d), should be 2", -@end ignore -@example -/* do_stat --- provide a stat() function for gawk */ - -static NODE * -do_stat(tree) -NODE *tree; -@{ - NODE *file, *array; - struct stat sbuf; - int ret; - char *msg; - NODE **aptr; - char *pmode; /* printable mode */ - char *type = "unknown"; - - /* check arg count */ - if (tree->param_cnt != 2) - fatal( - "stat: called with %d arguments, should be 2", - tree->param_cnt); -@end example - -Then comes the actual work. First, we get the arguments. -Then, we always clear the array. To get the file information, -we use @code{lstat}, in case the file is a symbolic link.
-If there's an error, we set @code{ERRNO} and return: - -@c comment made multiline for page breaking -@example - /* - * directory is first arg, - * array to hold results is second - */ - file = get_argument(tree, 0); - array = get_argument(tree, 1); - - /* empty out the array */ - assoc_clear(array); - - /* lstat the file, if error, set ERRNO and return */ - (void) force_string(file); - ret = lstat(file->stptr, & sbuf); - if (ret < 0) @{ - update_ERRNO(); - - set_value(tmp_number((AWKNUM) ret)); - - free_temp(file); - return tmp_number((AWKNUM) 0); - @} -@end example - -Now comes the tedious part: filling in the array. Only a few of the -calls are shown here, since they all follow the same pattern: - -@example - /* fill in the array */ - aptr = assoc_lookup(array, tmp_string("name", 4), FALSE); - *aptr = dupnode(file); - - aptr = assoc_lookup(array, tmp_string("mode", 4), FALSE); - *aptr = make_number((AWKNUM) sbuf.st_mode); - - aptr = assoc_lookup(array, tmp_string("pmode", 5), FALSE); - pmode = format_mode(sbuf.st_mode); - *aptr = make_string(pmode, strlen(pmode)); -@end example - -When done, we free the temporary value containing the @value{FN}, -set the return value, and return: - -@example - free_temp(file); - - /* Set the return value */ - set_value(tmp_number((AWKNUM) ret)); - - /* Just to make the interpreter happy */ - return tmp_number((AWKNUM) 0); -@} -@end example - -@cindex conventions, programming -@cindex programming conventions -Finally, it's necessary to provide the ``glue'' that loads the -new function(s) into @command{gawk}. By convention, each library has -a routine named @code{dlload} that does the job: - -@example -/* dlload --- load new builtins in this library */ - -NODE * -dlload(tree, dl) -NODE *tree; -void *dl; -@{ - make_builtin("chdir", do_chdir, 1); - make_builtin("stat", do_stat, 2); - return tmp_number((AWKNUM) 0); -@} -@end example - -And that's it! 
As an exercise, consider adding functions to -implement system calls such as @code{chown}, @code{chmod}, and @code{umask}. - -@node Using Internal File Ops, , Internal File Ops, Sample Library -@appendixsubsubsec Integrating the Extensions - -@cindex Linux -@cindex GNU/Linux -Now that the code is written, it must be possible to add it at -runtime to the running @command{gawk} interpreter. First, the -code must be compiled. Assuming that the functions are in -a file named @file{filefuncs.c}, and @var{idir} is the location -of the @command{gawk} include files, -the following steps create -a GNU/Linux shared library: - -@example -$ gcc -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c -$ ld -o filefuncs.so -shared filefuncs.o -@end example - -@cindex @code{extension} built-in function -Once the library exists, it is loaded by calling the @code{extension} -built-in function. -This function takes two arguments: the name of the -library to load and the name of a function to call when the library -is first loaded. This function adds the new functions to @command{gawk}. 
-It returns the value returned by the initialization function -within the shared library: - -@example -# file testff.awk -BEGIN @{ - extension("./filefuncs.so", "dlload") - - chdir(".") # no-op - - data[1] = 1 # force `data' to be an array - print "Info for testff.awk" - ret = stat("testff.awk", data) - print "ret =", ret - for (i in data) - printf "data[\"%s\"] = %s\n", i, data[i] - print "testff.awk modified:", - strftime("%m %d %y %H:%M:%S", data["mtime"]) -@} -@end example - -Here are the results of running the program: - -@example -$ gawk -f testff.awk -@print{} Info for testff.awk -@print{} ret = 0 -@print{} data["blksize"] = 4096 -@print{} data["mtime"] = 932361936 -@print{} data["mode"] = 33188 -@print{} data["type"] = file -@print{} data["dev"] = 2065 -@print{} data["gid"] = 10 -@print{} data["ino"] = 878597 -@print{} data["ctime"] = 971431797 -@print{} data["blocks"] = 2 -@print{} data["nlink"] = 1 -@print{} data["name"] = testff.awk -@print{} data["atime"] = 971608519 -@print{} data["pmode"] = -rw-r--r-- -@print{} data["size"] = 607 -@print{} data["uid"] = 2076 -@print{} testff.awk modified: 07 19 99 08:25:36 -@end example - -@node Future Extensions, , Dynamic Extensions, Notes -@appendixsec Probable Future Extensions -@ignore -From emory!scalpel.netlabs.com!lwall Tue Oct 31 12:43:17 1995 -Return-Path: <emory!scalpel.netlabs.com!lwall> -Message-Id: <9510311732.AA28472@scalpel.netlabs.com> -To: arnold@skeeve.atl.ga.us (Arnold D. Robbins) -Subject: Re: May I quote you? -In-Reply-To: Your message of "Tue, 31 Oct 95 09:11:00 EST." - <m0tAHPQ-00014MC@skeeve.atl.ga.us> -Date: Tue, 31 Oct 95 09:32:46 -0800 -From: Larry Wall <emory!scalpel.netlabs.com!lwall> - -: Greetings. I am working on the release of gawk 3.0. Part of it will be a -: thoroughly updated manual. One of the sections deals with planned future -: extensions and enhancements. 
I have the following at the beginning -: of it: -: -: @cindex PERL -: @cindex Wall, Larry -: @display -: @i{AWK is a language similar to PERL, only considerably more elegant.} @* -: Arnold Robbins -: @sp 1 -: @i{Hey!} @* -: Larry Wall -: @end display -: -: Before I actually release this for publication, I wanted to get your -: permission to quote you. (Hopefully, in the spirit of much of GNU, the -: implied humor is visible... :-) - -I think that would be fine. - -Larry -@end ignore -@cindex PERL -@cindex Wall, Larry -@cindex Robbins, Arnold -@quotation -@i{AWK is a language similar to PERL, only considerably more elegant.}@* -Arnold Robbins - -@i{Hey!}@* -Larry Wall -@end quotation - -This @value{SECTION} briefly lists extensions and possible improvements -that indicate the directions we are -currently considering for @command{gawk}. The file @file{FUTURES} in the -@command{gawk} distribution lists these extensions as well. - -Following is a list of probable future changes visible at the -@command{awk} language level: - -@c these are ordered by likelihood -@table @asis -@item Loadable Module Interface -It is not clear that the @command{awk}-level interface to the -modules facility is as good as it should be. The interface needs to be -redesigned, particularly taking namespace issues into account, as -well as possibly including issues such as library search path order -and versioning. - -@item @code{RECLEN} variable for fixed length records -Along with @code{FIELDWIDTHS}, this would speed up the processing of -fixed-length records. -@code{PROCINFO["RS"]} would be @code{"RS"} or @code{"RECLEN"}, -depending upon which kind of record processing is in effect. - -@item Additional @code{printf} specifiers -The 1999 ISO C standard added a number of additional @code{printf} -format specifiers. These should be evaluated for possible inclusion -in @command{gawk}. - -@ignore -@item A @samp{%'d} flag -Add @samp{%'d} for putting in commas in formatting numeric values. 
-@end ignore - -@item Databases -It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array. - -@item Large Character Sets -It would be nice if @command{gawk} could handle UTF-8 and other -character sets that are larger than eight bits. - -@item More @code{lint} warnings -There are more things that could be checked for portability. -@end table - -Following is a list of probable improvements that will make @command{gawk}'s -source code easier to work with: - -@table @asis -@item Loadable Module Mechanics -The current extension mechanism works -(@pxref{Dynamic Extensions, ,Adding New Built-in Functions to @command{gawk}}), -but is rather primitive. It requires a fair amount of manual work -to create and integrate a loadable module. -Nor is the current mechanism as portable as might be desired. -The GNU @command{libtool} package provides a number of features that -would make using loadable modules much easier. -@command{gawk} should be changed to use @command{libtool}. - -@item Loadable Module Internals -The API to its internals that @command{gawk} ``exports'' should be revised. -Too many things are needlessly exposed. A new API should be designed -and implemented to make module writing easier. - -@item Better Array Subscript Management -@command{gawk}'s management of array subscript storage could use revamping, -so that using the same value to index multiple arrays only -stores one copy of the index value. - -@item Integrating the DBUG Library -Integrating Fred Fish's DBUG library would be helpful during development, -but it's a lot of work to do. -@end table - -Following is a list of probable improvements that will make @command{gawk} -perform better: - -@table @asis -@item An Improved Version of @code{dfa} -The @code{dfa} pattern matcher from GNU @command{grep} has some -problems. Either a new version or a fixed one will deal with some -important regexp matching issues. - -@c NEXT ED: remove this item. 
awka and mawk do these respectively -@item Compilation of @command{awk} programs -@command{gawk} uses a Bison (YACC-like) -parser to convert the script given it into a syntax tree; the syntax -tree is then executed by a simple recursive evaluator. This method incurs -a lot of overhead, since the recursive evaluator performs many procedure -calls to do even the simplest things. - -It should be possible for @command{gawk} to convert the script's parse tree -into a C program which the user would then compile, using the normal -C compiler and a special @command{gawk} library to provide all the needed -functions (regexps, fields, associative arrays, type coercion, and so on). - -An easier possibility might be for an intermediate phase of @command{gawk} to -convert the parse tree into a linear byte code form like the one used -in GNU Emacs Lisp. The recursive evaluator would then be replaced by -a straight line byte code interpreter that would be intermediate in speed -between running a compiled program and doing what @command{gawk} does -now. -@end table - -Finally, -the programs in the test suite could use documenting in this @value{DOCUMENT}. - -@xref{Additions, ,Making Additions to @command{gawk}}, -if you are interested in tackling any of these projects. - -@node Basic Concepts, Glossary, Notes, Top -@appendix Basic Programming Concepts -@cindex basic programming concepts -@cindex programming concepts, basic - -This @value{APPENDIX} attempts to define some of the basic concepts -and terms that are used throughout the rest of this @value{DOCUMENT}. -As this @value{DOCUMENT} is specifically about @command{awk}, -and not about computer programming in general, the coverage here -is by necessity fairly cursory and simplistic. -(If you need more background, there are many -other introductory texts that you should refer to instead.) - -@menu -* Basic High Level:: The high level view. -* Basic Data Typing:: A very quick intro to data types. 
-* Floating Point Issues:: Stuff to know about floating-point numbers. -@end menu - -@node Basic High Level, Basic Data Typing, Basic Concepts, Basic Concepts -@appendixsec What a Program Does - -@cindex processing data -At the most basic level, the job of a program is to process -some input data and produce results. - -@c NEXT ED: Use real images here -@iftex -@tex -\expandafter\ifx\csname graph\endcsname\relax \csname newbox\endcsname\graph\fi -\expandafter\ifx\csname graphtemp\endcsname\relax \csname newdimen\endcsname\graphtemp\fi -\setbox\graph=\vtop{\vskip 0pt\hbox{% - \special{pn 20}% - \special{pa 2425 200}% - \special{pa 2850 200}% - \special{fp}% - \special{sh 1.000}% - \special{pn 20}% - \special{pa 2750 175}% - \special{pa 2850 200}% - \special{pa 2750 225}% - \special{pa 2750 175}% - \special{fp}% - \special{pn 20}% - \special{pa 850 200}% - \special{pa 1250 200}% - \special{fp}% - \special{sh 1.000}% - \special{pn 20}% - \special{pa 1150 175}% - \special{pa 1250 200}% - \special{pa 1150 225}% - \special{pa 1150 175}% - \special{fp}% - \special{pn 20}% - \special{pa 2950 400}% - \special{pa 3650 400}% - \special{pa 3650 0}% - \special{pa 2950 0}% - \special{pa 2950 400}% - \special{fp}% - \special{pn 10}% - \special{ar 1800 200 450 200 0 6.28319}% - \graphtemp=.5ex\advance\graphtemp by 0.200in - \rlap{\kern 3.300in\lower\graphtemp\hbox to 0pt{\hss Results\hss}}% - \graphtemp=.5ex\advance\graphtemp by 0.200in - \rlap{\kern 1.800in\lower\graphtemp\hbox to 0pt{\hss Program\hss}}% - \special{pn 10}% - \special{pa 0 400}% - \special{pa 700 400}% - \special{pa 700 0}% - \special{pa 0 0}% - \special{pa 0 400}% - \special{fp}% - \graphtemp=.5ex\advance\graphtemp by 0.200in - \rlap{\kern 0.350in\lower\graphtemp\hbox to 0pt{\hss Data\hss}}% - \hbox{\vrule depth0.400in width0pt height 0pt}% - \kern 3.650in - }% -}% -\centerline{\box\graph} -@end tex -@end iftex -@ifnottex -@example - _______ -+------+ / \ +---------+ -| Data | -----> < Program > -----> | Results 
| -+------+ \_______/ +---------+ -@end example -@end ifnottex - -@cindex compiled programs -@cindex programs, compiled -@cindex interpreted programs -@cindex programs, interpreted -The ``program'' in the figure can be either a compiled -program@footnote{Compiled programs are typically written -in lower-level languages such as C, C++, Fortran, or Ada, -and then translated, or @dfn{compiled}, into a form that -the computer can execute directly.} -(such as @command{ls}), -or it may be @dfn{interpreted}. In the latter case, a machine-executable -program such as @command{awk} reads your program, and then uses the -instructions in your program to process the data. - -@cindex programming, basic steps -When you write a program, it usually consists -of the following, very basic set of steps: - -@c NEXT ED: Use real images here -@iftex -@tex -\expandafter\ifx\csname graph\endcsname\relax \csname newbox\endcsname\graph\fi -\expandafter\ifx\csname graphtemp\endcsname\relax \csname newdimen\endcsname\graphtemp\fi -\setbox\graph=\vtop{\vskip 0pt\hbox{% - \graphtemp=.5ex\advance\graphtemp by 0.600in - \rlap{\kern 2.800in\lower\graphtemp\hbox to 0pt{\hss Yes\hss}}% - \graphtemp=.5ex\advance\graphtemp by 0.100in - \rlap{\kern 3.300in\lower\graphtemp\hbox to 0pt{\hss No\hss}}% - \special{pn 8}% - \special{pa 2100 1000}% - \special{pa 1600 1000}% - \special{pa 1600 1000}% - \special{pa 1600 300}% - \special{fp}% - \special{sh 1.000}% - \special{pn 8}% - \special{pa 1575 400}% - \special{pa 1600 300}% - \special{pa 1625 400}% - \special{pa 1575 400}% - \special{fp}% - \special{pn 8}% - \special{pa 2600 500}% - \special{pa 2600 900}% - \special{fp}% - \special{sh 1.000}% - \special{pn 8}% - \special{pa 2625 800}% - \special{pa 2600 900}% - \special{pa 2575 800}% - \special{pa 2625 800}% - \special{fp}% - \special{pn 8}% - \special{pa 3200 200}% - \special{pa 4000 200}% - \special{fp}% - \special{sh 1.000}% - \special{pn 8}% - \special{pa 3900 175}% - \special{pa 4000 200}% - 
\special{pa 3900 225}% - \special{pa 3900 175}% - \special{fp}% - \special{pn 8}% - \special{pa 1400 200}% - \special{pa 2100 200}% - \special{fp}% - \special{sh 1.000}% - \special{pn 8}% - \special{pa 2000 175}% - \special{pa 2100 200}% - \special{pa 2000 225}% - \special{pa 2000 175}% - \special{fp}% - \special{pn 8}% - \special{ar 2600 1000 400 100 0 6.28319}% - \graphtemp=.5ex\advance\graphtemp by 1.000in - \rlap{\kern 2.600in\lower\graphtemp\hbox to 0pt{\hss Process\hss}}% - \special{pn 8}% - \special{pa 2200 400}% - \special{pa 3100 400}% - \special{pa 3100 0}% - \special{pa 2200 0}% - \special{pa 2200 400}% - \special{fp}% - \graphtemp=.5ex\advance\graphtemp by 0.200in - \rlap{\kern 2.688in\lower\graphtemp\hbox to 0pt{\hss More Data?\hss}}% - \special{pn 8}% - \special{ar 650 200 650 200 0 6.28319}% - \graphtemp=.5ex\advance\graphtemp by 0.200in - \rlap{\kern 0.613in\lower\graphtemp\hbox to 0pt{\hss Initialization\hss}}% - \special{pn 8}% - \special{ar 0 200 0 0 0 6.28319}% - \special{pn 8}% - \special{ar 4550 200 450 100 0 6.28319}% - \graphtemp=.5ex\advance\graphtemp by 0.200in - \rlap{\kern 4.600in\lower\graphtemp\hbox to 0pt{\hss Clean Up\hss}}% - \hbox{\vrule depth1.100in width0pt height 0pt}% - \kern 5.000in - }% -}% -\centerline{\box\graph} -@end tex -@end iftex -@ifnottex -@example - ______ -+----------------+ / More \ No +----------+ -| Initialization | -------> < Data > -------> | Clean Up | -+----------------+ ^ \ ? / +----------+ - | +--+-+ - | | Yes - | | - | V - | +---------+ - +-----+ Process | - +---------+ -@end example -@end ifnottex - -@table @asis -@item Initialization -These are the things you do before actually starting to process -data, such as checking arguments, initializing any data you need -to work with, and so on. -This step corresponds to @command{awk}'s @code{BEGIN} rule -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). 
- -If you were baking a cake, this might consist of laying out all the -mixing bowls and the baking pan, and making sure you have all the -ingredients that you need. - -@item Processing -This is where the actual work is done. Your program reads data, -one logical chunk at a time, and processes it as appropriate. - -In most programming languages, you have to manually manage the reading -of data, checking to see if there is more each time you read a chunk. -@command{awk}'s pattern-action paradigm -(@pxref{Getting Started, ,Getting Started with @command{awk}}) -handles the mechanics of this for you. - -In baking a cake, the processing corresponds to the actual labor: -breaking eggs, mixing the flour, water, and other ingredients, and then putting the cake -into the oven. - -@item Clean Up -Once you've processed all the data, you may have things you need to -do before exiting. -This step corresponds to @command{awk}'s @code{END} rule -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). - -After the cake comes out of the oven, you still have to wrap it in -plastic wrap to keep anyone from tasting it, as well as wash -the mixing bowls and other utensils. -@end table - -@cindex algorithm, definition of -An @dfn{algorithm} is a detailed set of instructions necessary to accomplish -a task, or process data. It is much the same as a recipe for baking -a cake. Programs implement algorithms. Often, it is up to you to design -the algorithm and implement it, simultaneously. - -@cindex record, definition of -@cindex fields, definition of -The ``logical chunks'' we talked about previously are called @dfn{records}, -similar to the records a company keeps on employees, a school keeps for -students, or a doctor keeps for patients. -Each record has many component parts, such as first and last names, -date of birth, address, and so on. The component parts are referred -to as the @dfn{fields} of the record. 
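The relationship between a record and its fields can be sketched in a few lines of C. This is only an illustration of the idea, assuming fields are separated by spaces, tabs, or newlines; the names split_fields and MAX_FIELDS are hypothetical and do not come from the gawk sources, and this is not how awk itself is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 16   /* hypothetical limit, chosen only for this sketch */

/*
 * split_fields --- break one record into whitespace-separated fields.
 *
 * The record buffer is modified in place, since strtok() overwrites
 * each separator it finds with a '\0' byte.  Pointers to the start of
 * each field are stored in fields[].  Returns the number of fields
 * found, never more than max_fields.
 */
size_t
split_fields(char *record, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *tok;

    assert(record != NULL && fields != NULL);

    for (tok = strtok(record, " \t\n");
         tok != NULL && n < max_fields;
         tok = strtok(NULL, " \t\n"))
        fields[n++] = tok;

    return n;
}
```

Given a writable buffer holding "Jane Doe 1970-01-01 Springfield", the function stores pointers to four fields. An awk program never needs such code: awk performs the equivalent splitting on every record and makes the results available as $1 through $NF.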
- -The act of reading data is termed @dfn{input}, and that of -generating results, not too surprisingly, is termed @dfn{output}. -They are often referred to together as ``Input/Output,'' -and even more often, as ``I/O'' for short. -(You will also see ``input'' and ``output'' used as verbs.) - -@cindex data-driven languages -@cindex language, data-driven -@command{awk} manages the reading of data for you, as well as -breaking it up into records and fields. Your program's job is to -tell @command{awk} what to do with the data. You do this by describing -@dfn{patterns} in the data to look for, and @dfn{actions} to execute -when those patterns are seen. This @dfn{data-driven} nature of -@command{awk} programs usually makes them both easier to write -and easier to read. - -@node Basic Data Typing, Floating Point Issues, Basic High Level, Basic Concepts -@appendixsec Data Values in a Computer - -@cindex variable, definition of -In a program, -you keep track of information and values in things called @dfn{variables}. -A variable is just a name for a given value, such as @code{first_name}, -@code{last_name}, @code{address}, and so on. -@command{awk} has several pre-defined variables, and it has -special names to refer to the current input record -and the fields of the record. -You may also group multiple -associated values under one name, as an array. - -@cindex values, numeric -@cindex values, string -@cindex scalar, definition of -Data, particularly in @command{awk}, consists of either numeric -values, such as 42 or 3.1415927, or string values. -String values are essentially anything that's not a number, such as a name. -Strings are sometimes referred to as @dfn{character data}, since they -store the individual characters that comprise them. -Individual variables, as well as numeric and string variables, are -referred to as @dfn{scalar} values. -Groups of values, such as arrays, are not scalars.
- -@cindex integer, definition of -@cindex floating-point, definition of -Within computers, there are two kinds of numeric values: @dfn{integers}, -and @dfn{floating-point}. -In school, integer values were referred to as ``whole'' numbers---that is, -numbers without any fractional part, such as 1, 42, or @minus{}17. -The advantage to integer numbers is that they represent values exactly. -The disadvantage is that their range is limited. On most modern systems, -this range is @minus{}2,147,483,648 to 2,147,483,647. - -@cindex unsigned integers -@cindex integer, unsigned -Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}. -Signed values may be negative or positive, with the range of values just -described. -Unsigned values are never negative. On most modern systems, -the range is from 0 to 4,294,967,295. - -@cindex double-precision floating-point, definition of -@cindex single-precision floating-point, definition of -Floating-point numbers represent what are called ``real'' numbers; i.e., -those that do have a fractional part, such as 3.1415927. -The advantage to floating-point numbers is that they -can represent a much larger range of values. -The disadvantage is that there are numbers that they cannot represent -exactly. -@command{awk} uses @dfn{double-precision} floating-point numbers, which -can hold more digits than @dfn{single-precision} -floating-point numbers. -Floating-point issues are discussed more fully in -@ref{Floating Point Issues, ,Floating-Point Number Caveats}. - -At the very lowest level, computers store values as groups of binary digits, -or @dfn{bits}. Modern computers group bits into groups of eight, called @dfn{bytes}. -Advanced applications sometimes have to manipulate bits directly, -and @command{gawk} provides functions for doing so.
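The integer ranges, the difference between single and double precision, and the idea of testing individual bits can all be checked with a short C sketch. It assumes a C99 compiler that provides the exact-width types in stdint.h and a machine with IEEE 754 floating point; the function names are invented for this illustration and are not part of gawk.

```c
#include <assert.h>
#include <stdint.h>

/* Limits of 32-bit signed and unsigned integers, as quoted in the text. */
int32_t  min_signed32(void)   { return INT32_MIN; }
int32_t  max_signed32(void)   { return INT32_MAX; }
uint32_t max_unsigned32(void) { return UINT32_MAX; }

/*
 * single_differs_from_double --- store 0.1 in both precisions.
 *
 * Neither format can hold 0.1 exactly.  Double precision keeps more
 * digits, so on IEEE 754 systems the two approximations differ and
 * this function returns nonzero.
 */
int
single_differs_from_double(void)
{
    float  f = 0.1f;
    double d = 0.1;

    return (double) f != d;
}

/* bit_is_set --- test one bit of a value; bit 0 is the least significant */
int
bit_is_set(uint32_t value, unsigned int bit)
{
    assert(bit < 32);
    return (int) ((value >> bit) & 1u);
}
```

For the decimal value 10 (binary 1010), bit_is_set reports that bits 1 and 3 are set and bits 0 and 2 are clear. At the awk level, gawk's and, or, xor, compl, lshift, and rshift functions cover the same ground without any C code.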
- -@cindex null string, definition of -@cindex empty string, definition of -While you are probably used to the idea of a number without a value (i.e., zero), -it takes a bit more getting used to the idea of zero-length character data. -Nevertheless, such a thing exists. -It is called the @dfn{null string}. -The null string is character data that has no value. -In other words, it is empty. It is written in @command{awk} programs -like this: @code{""}. - -Humans are used to working in decimal; i.e., base 10. In base 10, -numbers go from 0 to 9, and then ``roll over'' into the next -column. (Remember grade school? 42 is 4 times 10 plus 2.) - -There are other number bases though. Computers commonly use base 2 -or @dfn{binary}, base 8 or @dfn{octal}, and base 16 or @dfn{hexadecimal}. -In binary, each column represents two times the value in the column to -its right. Each column may contain either a 0 or a 1. -Thus, binary 1010 represents 1 times 8, plus 0 times 4, plus 1 times 2, -plus 0 times 1, or decimal 10. -Octal and hexadecimal are discussed more in -@ref{Non-decimal-numbers, ,Octal and Hexadecimal Numbers}. - -Programs are written in programming languages. -Hundreds, if not thousands, of programming languages exist. -One of the most popular is the C programming language. -The C language had a very strong influence on the design of -the @command{awk} language. - -@cindex Kernighan, Brian -@cindex Ritchie, Dennis -There have been several versions of C. The first is often referred to -as ``K&R'' C, after the initials of Brian Kernighan and Dennis Ritchie, -the authors of the first book on C. (Dennis Ritchie created the language, -and Brian Kernighan was one of the creators of @command{awk}.) - -In the mid-1980's, an effort began to produce an international standard -for C. This work culminated in 1989, with the production of the ANSI -standard for C. This standard became an ISO standard in 1990. 
-Where it makes sense, POSIX @command{awk} is compatible with 1990 ISO C. - -In 1999, a revised ISO C standard was approved and released. -Future versions of @command{gawk} will be as compatible as possible -with this standard. - -@node Floating Point Issues, , Basic Data Typing, Basic Concepts -@appendixsec Floating-Point Number Caveats - -As mentioned earlier, floating-point numbers represent what are called -``real'' numbers; i.e., those that have a fractional part. @command{awk} -uses double-precision floating-point numbers to represent all -numeric values. This @value{SECTION} describes some of the issues -involved in using floating-point numbers. - -There is a very nice paper on floating-point arithmetic by -David Goldberg, @cite{What Every -Computer Scientist Should Know About Floating-point Arithmetic}, -@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), -5-48.@footnote{@uref{http://www.validgh.com/goldberg/paper.ps}} -This is worth reading if you are interested in the details, -but it does require a background in Computer Science. - -Internally, @command{awk} keeps both the numeric value -(double-precision floating-point) and the string value for a variable. -Separately, @command{awk} keeps -track of what type the variable has -(@pxref{Typing and Comparison, ,Variable Typing and Comparison Expressions}), -which plays a role in how variables are used in comparisons. - -It is important to note that the string value for a number may not -reflect the full value (all the digits) that the numeric value -actually contains. 
-The following program (@file{values.awk}) illustrates this: - -@example -@{ - $1 = $2 + $3 - # see it for what it is - printf("$1 = %.12g\n", $1) - # use CONVFMT - a = "<" $1 ">" - print "a =", a -@group - # use OFMT - print "$1 =", $1 -@end group -@} -@end example - -@noindent -This program shows the full value of the sum of @code{$2} and @code{$3} -using @code{printf}, and then prints the string values obtained -from both automatic conversion (via @code{CONVFMT}) and -from printing (via @code{OFMT}). - -Here is what happens when the program is run: - -@example -$ echo 2 3.654321 1.2345678 | awk -f values.awk -@print{} $1 = 4.8888888 -@print{} a = <4.88889> -@print{} $1 = 4.88889 -@end example - -This makes it clear that the full numeric value is different from -what the default string representations show. - -@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with -at most six significant digits. For some applications, you might want to -change it to specify more precision. -On most modern machines, most of the time, -17 digits is enough to capture a floating-point number's -value exactly.@footnote{Pathological cases can require up to -752 digits (!), but we doubt that you need to worry about this.} - -@cindex floating-point, precision issues -Unlike numbers in the abstract sense (such as what you studied in high school -or college math), numbers stored in computers are limited in certain ways. -They cannot represent an infinite number of digits, nor can they always -represent things exactly. -In particular, -floating-point numbers cannot -always represent values exactly. Here is an example: - -@example -$ awk '@{ printf("%010d\n", $1 * 100) @}' -515.79 -@print{} 0000051579 -515.80 -@print{} 0000051579 -515.81 -@print{} 0000051580 -515.82 -@print{} 0000051582 -@kbd{Ctrl-d} -@end example - -@noindent -This shows that some values can be represented exactly, -whereas others are only approximated. 
This is not a ``bug'' -in @command{awk}, but simply an artifact of how computers -represent numbers. - -@cindex negative zero -@cindex positive zero -@cindex zero, negative vs.@: positive -@cindex floating-point, positive and negative values for zero -Another peculiarity of floating-point numbers on modern systems -is that they often have more than one representation for the number zero! -In particular, it is possible to represent ``minus zero'' as well as -regular, or ``positive'' zero. - -This example shows that negative and positive zero are distinct values -when stored internally, but that they are in fact equal to each other, -as well as to ``regular'' zero: - -@smallexample -$ gawk 'BEGIN @{ mz = -0 ; pz = 0 -> printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz -> printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0 -> @}' -@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1 -@print{} mz == 0 -> 1, pz == 0 -> 1 -@end smallexample - -It helps to keep this in mind should you process numeric data -that contains negative zero values; the fact that the zero is negative -is noted and can affect comparisons. - -@node Glossary, Copying, Basic Concepts, Top -@unnumbered Glossary - -@table @asis -@item Action -A series of @command{awk} statements attached to a rule. If the rule's -pattern matches an input record, @command{awk} executes the -rule's action. Actions are always enclosed in curly braces. -(@xref{Action Overview, ,Actions}.) - -@cindex Spencer, Henry -@cindex @command{sed} utility -@cindex amazing @command{awk} assembler (@command{aaa}) -@item Amazing @command{awk} Assembler -Henry Spencer at the University of Toronto wrote a retargetable assembler -completely as @command{sed} and @command{awk} scripts. It is thousands -of lines long, including machine descriptions for several eight-bit -microcomputers. It is a good example of a program that would have been -better written in another language. 
-You can get it from @uref{ftp://ftp.freefriends.org/arnold/Awkstuff/aaa.tgz}. - -@cindex amazingly workable formatter (@command{awf}) -@cindex @command{awf} (amazingly workable formatter) program -@item Amazingly Workable Formatter (@command{awf}) -Henry Spencer at the University of Toronto wrote a formatter that accepts -a large subset of the @samp{nroff -ms} and @samp{nroff -man} formatting -commands, using @command{awk} and @command{sh}. -It is available over the Internet -from @uref{ftp://ftp.freefriends.org/arnold/Awkstuff/awf.tgz}. - -@item Anchor -The regexp metacharacters @samp{^} and @samp{$}, which force the match -to the beginning or end of the string, respectively. - -@cindex ANSI -@item ANSI -The American National Standards Institute. This organization produces -many standards, among them the standards for the C and C++ programming -languages. -These standards often become international standards as well. See also -``ISO.'' - -@item Array -A grouping of multiple values under the same name. -Most languages just provide sequential arrays. -@command{awk} provides associative arrays. - -@item Assertion -A statement in a program that a condition is true at this point in the program. -Useful for reasoning about how a program is supposed to behave. - -@item Assignment -An @command{awk} expression that changes the value of some @command{awk} -variable or data object. An object that you can assign to is called an -@dfn{lvalue}. The assigned values are called @dfn{rvalues}. -@xref{Assignment Ops, ,Assignment Expressions}. - -@item Associative Array -Arrays in which the indices may be numbers or strings, not just -sequential integers in a fixed range. - -@item @command{awk} Language -The language in which @command{awk} programs are written. - -@item @command{awk} Program -An @command{awk} program consists of a series of @dfn{patterns} and -@dfn{actions}, collectively known as @dfn{rules}. 
For each input record -given to the program, the program's rules are all processed in turn. -@command{awk} programs may also contain function definitions. - -@item @command{awk} Script -Another name for an @command{awk} program. - -@item Bash -The GNU version of the standard shell -@iftex -(the @b{B}ourne-@b{A}gain @b{SH}ell). -@end iftex -@ifnottex -(the Bourne-Again SHell). -@end ifnottex -See also ``Bourne Shell.'' - -@item BBS -See ``Bulletin Board System.'' - -@item Bit -Short for ``Binary Digit.'' -All values in computer memory ultimately reduce to binary digits: values -that are either zero or one. -Groups of bits may be interpreted differently---as integers, -floating-point numbers, character data, addresses of other -memory objects, or other data. -@command{awk} lets you work with floating-point numbers and strings. -@command{gawk} lets you manipulate bit values with the built-in -functions described in -@ref{Bitwise Functions, ,Using @command{gawk}'s Bit Manipulation Functions}. - -Computers are often defined by how many bits they use to represent integer -values. Typical systems are 32-bit systems, but 64-bit systems are -becoming increasingly popular, and 16-bit systems are waning in -popularity. - -@item Boolean Expression -Named after the English mathematician Boole. See also ``Logical Expression.'' - -@item Bourne Shell -The standard shell (@file{/bin/sh}) on Unix and Unix-like systems, -originally written by Stephen R.@: Bourne. -Many shells (@command{bash}, @command{ksh}, @command{pdksh}, @command{zsh}) are -generally upwardly compatible with the Bourne shell. - -@item Built-in Function -The @command{awk} language provides built-in functions that perform various -numerical, I/O-related, and string computations. Examples are -@code{sqrt} (for the square root of a number) and @code{substr} (for a -substring of a string). -@command{gawk} provides functions for timestamp management, bit manipulation, -and runtime string translation. 
-(@xref{Built-in, ,Built-in Functions}.) - -@item Built-in Variable -@code{ARGC}, -@code{ARGV}, -@code{CONVFMT}, -@code{ENVIRON}, -@code{FILENAME}, -@code{FNR}, -@code{FS}, -@code{NF}, -@code{NR}, -@code{OFMT}, -@code{OFS}, -@code{ORS}, -@code{RLENGTH}, -@code{RSTART}, -@code{RS}, -and -@code{SUBSEP} -are the variables that have special meaning to @command{awk}. -In addition, -@code{ARGIND}, -@code{BINMODE}, -@code{ERRNO}, -@code{FIELDWIDTHS}, -@code{IGNORECASE}, -@code{LINT}, -@code{PROCINFO}, -@code{RT}, -and -@code{TEXTDOMAIN} -are the variables that have special meaning to @command{gawk}. -Changing some of them affects @command{awk}'s running environment. -(@xref{Built-in Variables}.) - -@item Braces -See ``Curly Braces.'' - -@item Bulletin Board System -A computer system allowing users to log in and read and/or leave messages -for other users of the system, much like leaving paper notes on a bulletin -board. - -@item C -The system programming language that most GNU software is written in. The -@command{awk} programming language has C-like syntax, and this @value{DOCUMENT} -points out similarities between @command{awk} and C when appropriate. - -In general, @command{gawk} attempts to be as similar to the 1990 version -of ISO C as makes sense. Future versions of @command{gawk} may adopt features -from the newer 1999 standard, as appropriate. - -@item C++ -A popular object-oriented programming language derived from C. - -@cindex ISO 8859-1 -@cindex ISO Latin-1 -@cindex character sets (machine character encodings) -@item Character Set -The set of numeric codes used by a computer system to represent the -characters (letters, numbers, punctuation, etc.) of a particular country -or place. The most common character set in use today is ASCII (American -Standard Code for Information Interchange). Many European -countries use an extension of ASCII known as ISO-8859-1 (ISO Latin-1). 
- -@cindex @command{chem} utility -@item CHEM -A preprocessor for @command{pic} that reads descriptions of molecules -and produces @command{pic} input for drawing them. -It was written in @command{awk} -by Brian Kernighan and Jon Bentley, and is available from -@uref{http://cm.bell-labs.com/netlib/typesetting/chem.gz}. - -@item Coprocess -A subordinate program with which two-way communication is possible. - -@cindex compiled programs -@item Compiler -A program that translates human-readable source code into -machine-executable object code. The object code is then executed -directly by the computer. -See also ``Interpreter.'' - -@item Compound Statement -A series of @command{awk} statements, enclosed in curly braces. Compound -statements may be nested. -(@xref{Statements, ,Control Statements in Actions}.) - -@item Concatenation -Concatenating two strings means sticking them together, one after another, -producing a new string. For example, the string @samp{foo} concatenated with -the string @samp{bar} gives the string @samp{foobar}. -(@xref{Concatenation, ,String Concatenation}.) - -@item Conditional Expression -An expression using the @samp{?:} ternary operator, such as -@samp{@var{expr1} ? @var{expr2} : @var{expr3}}. The expression -@var{expr1} is evaluated; if the result is true, the value of the whole -expression is the value of @var{expr2}; otherwise the value is -@var{expr3}. In either case, only one of @var{expr2} and @var{expr3} -is evaluated. (@xref{Conditional Exp, ,Conditional Expressions}.) - -@item Comparison Expression -A relation that is either true or false, such as @samp{(a < b)}. -Comparison expressions are used in @code{if}, @code{while}, @code{do}, -and @code{for} -statements, and in patterns to select which input records to process. -(@xref{Typing and Comparison, ,Variable Typing and Comparison Expressions}.) - -@item Curly Braces -The characters @samp{@{} and @samp{@}}. 
Curly braces are used in -@command{awk} for delimiting actions, compound statements, and function -bodies. - -@cindex dark corner -@item Dark Corner -An area in the language where specifications often were (or still -are) not clear, leading to unexpected or undesirable behavior. -Such areas are marked in this @value{DOCUMENT} with -@iftex -the picture of a flashlight in the margin -@end iftex -@ifnottex -``(d.c.)'' in the text -@end ifnottex -and are indexed under the heading ``dark corner.'' - -@item Data Driven -A description of @command{awk} programs, where you specify the data you -are interested in processing, and what to do when that data is seen. - -@item Data Objects -These are numbers and strings of characters. Numbers are converted into -strings and vice versa, as needed. -(@xref{Conversion, ,Conversion of Strings and Numbers}.) - -@item Deadlock -The situation in which two communicating processes are each waiting -for the other to perform an action. - -@item Double-Precision -An internal representation of numbers that can have fractional parts. -Double-precision numbers keep track of more digits than do single-precision -numbers, but operations on them are sometimes more expensive. This is the way -@command{awk} stores numeric values. It is the C type @code{double}. - -@item Dynamic Regular Expression -A dynamic regular expression is a regular expression written as an -ordinary expression. It could be a string constant, such as -@code{"foo"}, but it may also be an expression whose value can vary. -(@xref{Computed Regexps, , Using Dynamic Regexps}.) - -@item Environment -A collection of strings, of the form @var{name@code{=}val}, that each -program has available to it. Users generally place values into the -environment in order to provide information to various programs. Typical -examples are the environment variables @env{HOME} and @env{PATH}. 
- -@item Empty String -See ``Null String.'' - -@cindex epoch, definition of -@item Epoch -The date used as the ``beginning of time'' for timestamps. -Time values in Unix systems are represented as seconds since the epoch, -with library functions available for converting these values into -standard date and time formats. - -The epoch on Unix and POSIX systems is 1970-01-01 00:00:00 UTC. -See also ``GMT'' and ``UTC.'' - -@item Escape Sequences -A special sequence of characters used for describing non-printing -characters, such as @samp{\n} for newline or @samp{\033} for the ASCII -ESC (Escape) character. (@xref{Escape Sequences}.) - -@item FDL -See ``Free Documentation License.'' - -@item Field -When @command{awk} reads an input record, it splits the record into pieces -separated by whitespace (or by a separator regexp that you can -change by setting the built-in variable @code{FS}). Such pieces are -called fields. If the pieces are of fixed length, you can use the built-in -variable @code{FIELDWIDTHS} to describe their lengths. -(@xref{Field Separators, ,Specifying How Fields Are Separated}, -and -@ref{Constant Size, ,Reading Fixed-Width Data}.) - -@item Flag -A variable whose truth value indicates the existence or non-existence -of some condition. - -@item Floating-Point Number -Often referred to in mathematical terms as a ``rational'' or real number, -this is just a number that can have a fractional part. -See also ``Double-Precision'' and ``Single-Precision.'' - -@item Format -Format strings are used to control the appearance of output in the -@code{strftime} and @code{sprintf} functions, and are used in the -@code{printf} statement as well. Also, data conversions from numbers to strings -are controlled by the format string contained in the built-in variable -@code{CONVFMT}. (@xref{Control Letters, ,Format-Control Letters}.) - -@item Free Documentation License -This document describes the terms under which this @value{DOCUMENT} -is published and may be copied. 
(@xref{GNU Free Documentation License}.) - -@item Function -A specialized group of statements used to encapsulate general -or program-specific tasks. @command{awk} has a number of built-in -functions, and also allows you to define your own. -(@xref{Functions}.) - -@item FSF -See ``Free Software Foundation.'' - -@cindex FSF -@cindex Free Software Foundation -@cindex Stallman, Richard -@item Free Software Foundation -A non-profit organization dedicated -to the production and distribution of freely distributable software. -It was founded by Richard M.@: Stallman, the author of the original -Emacs editor. GNU Emacs is the most widely used version of Emacs today. - -@item @command{gawk} -The GNU implementation of @command{awk}. - -@cindex GPL -@cindex General Public License -@cindex GNU General Public License -@item General Public License -This document describes the terms under which @command{gawk} and its source -code may be distributed. (@xref{Copying, ,GNU General Public License}.) - -@item GMT -``Greenwich Mean Time.'' -This is the old term for UTC. -It is the time of day used as the epoch for Unix and POSIX systems. -See also ``Epoch'' and ``UTC.'' - -@cindex FSF -@cindex Free Software Foundation -@cindex GNU Project -@item GNU -``GNU's not Unix''. An on-going project of the Free Software Foundation -to create a complete, freely distributable, POSIX-compliant computing -environment. - -@item GNU/Linux -A variant of the GNU system using the Linux kernel, instead of the -Free Software Foundation's Hurd kernel. -Linux is a stable, efficient, full-featured clone of Unix that has -been ported to a variety of architectures. -It is most popular on PC-class systems, but runs well on a variety of -other systems too. -The Linux kernel source code is available under the terms of the GNU General -Public License, which is perhaps its most important aspect. 
- -@item GPL -See ``General Public License.'' - -@item Hexadecimal -Base 16 notation, where the digits are @code{0}--@code{9} and -@code{A}--@code{F}, with @samp{A} -representing 10, @samp{B} representing 11, and so on, up to @samp{F} for 15. -Hexadecimal numbers are written in C using a leading @samp{0x}, -to indicate their base. Thus, @code{0x12} is 18 (1 times 16 plus 2). - -@item I/O -Abbreviation for ``Input/Output,'' the act of moving data into and/or -out of a running program. - -@item Input Record -A single chunk of data that is read in by @command{awk}. Usually, an @command{awk} input -record consists of one line of text. -(@xref{Records, ,How Input Is Split into Records}.) - -@item Integer -A whole number, i.e., a number that does not have a fractional part. - -@item Internationalization -The process of writing or modifying a program so -that it can use multiple languages without requiring -further source code changes. - -@cindex interpreted programs -@item Interpreter -A program that reads human-readable source code directly, and uses -the instructions in it to process data and produce results. -@command{awk} is typically (but not always) implemented as an interpreter. -See also ``Compiler.'' - -@item Interval Expression -A component of a regular expression that lets you specify repeated matches of -some part of the regexp. Interval expressions were not traditionally available -in @command{awk} programs. - -@cindex ISO -@item ISO -The International Organization for Standardization. -This organization produces international standards for many things, including -programming languages, such as C and C++. -In the computer arena, important standards like those for C, C++, and POSIX -become both American national and ISO international standards simultaneously. -This @value{DOCUMENT} refers to Standard C as ``ISO C'' throughout. - -@item Keyword -In the @command{awk} language, a keyword is a word that has special -meaning. 
Keywords are reserved and may not be used as variable names. - -@command{gawk}'s keywords are: -@code{BEGIN}, -@code{END}, -@code{if}, -@code{else}, -@code{while}, -@code{do@dots{}while}, -@code{for}, -@code{for@dots{}in}, -@code{break}, -@code{continue}, -@code{delete}, -@code{next}, -@code{nextfile}, -@code{function}, -@code{func}, -and -@code{exit}. - -@cindex LGPL -@cindex Lesser General Public License -@cindex GNU Lesser General Public License -@item Lesser General Public License -This document describes the terms under which binary library archives -or shared objects, -and their source code may be distributed. - -@item Linux -See ``GNU/Linux.'' - -@item LGPL -See ``Lesser General Public License.'' - -@item Localization -The process of providing the data necessary for an -internationalized program to work in a particular language. - -@item Logical Expression -An expression using the operators for logic, AND, OR, and NOT, written -@samp{&&}, @samp{||}, and @samp{!} in @command{awk}. Often called Boolean -expressions, after the mathematician who pioneered this kind of -mathematical logic. - -@item Lvalue -An expression that can appear on the left side of an assignment -operator. In most languages, lvalues can be variables or array -elements. In @command{awk}, a field designator can also be used as an -lvalue. - -@item Matching -The act of testing a string against a regular expression. If the -regexp describes the contents of the string, it is said to @dfn{match} it. - -@item Metacharacters -Characters used within a regexp that do not stand for themselves. -Instead, they denote regular expression operations, such as repetition, -grouping, or alternation. - -@item Null String -A string with no characters in it. It is represented explicitly in -@command{awk} programs by placing two double quote characters next to -each other (@code{""}). It can appear in input data by having two successive -occurrences of the field separator appear next to each other. 
- -@item Number -A numeric-valued data object. Modern @command{awk} implementations use -double-precision floating-point to represent numbers. -Very old @command{awk} implementations use single-precision floating-point. - -@item Octal -Base-eight notation, where the digits are @code{0}--@code{7}. -Octal numbers are written in C using a leading @samp{0}, -to indicate their base. Thus, @code{013} is 11 (one times 8 plus 3). - -@cindex P1003.2 POSIX standard -@item P1003.2 -See ``POSIX.'' - -@item Pattern -Patterns tell @command{awk} which input records are interesting to which -rules. - -A pattern is an arbitrary conditional expression against which input is -tested. If the condition is satisfied, the pattern is said to @dfn{match} -the input record. A typical pattern might compare the input record against -a regular expression. (@xref{Pattern Overview, ,Pattern Elements}.) - -@item POSIX -The name for a series of standards -@c being developed by the IEEE -that specify a Portable Operating System interface. The ``IX'' denotes -the Unix heritage of these standards. The main standard of interest for -@command{awk} users is -@cite{IEEE Standard for Information Technology, Standard 1003.2-1992, -Portable Operating System Interface (POSIX) Part 2: Shell and Utilities}. -Informally, this standard is often referred to as simply ``P1003.2.'' - -@item Precedence -The order in which operations are performed when operators are used -without explicit parentheses. - -@item Private -Variables and/or functions that are meant for use exclusively by library -functions and not for the main @command{awk} program. Special care must be -taken when naming such variables and functions. -(@xref{Library Names, , Naming Library Function Global Variables}.) - -@item Range (of input lines) -A sequence of consecutive lines from the input file(s). A pattern -can specify ranges of input lines for @command{awk} to process or it can -specify single lines. 
(@xref{Pattern Overview, ,Pattern Elements}.) - -@item Recursion -When a function calls itself, either directly or indirectly. -If this isn't clear, refer to the entry for ``recursion.'' - -@item Redirection -Redirection means performing input from something other than the standard input -stream, or performing output to something other than the standard output stream. - -You can redirect the output of the @code{print} and @code{printf} statements -to a file or a system command, using the @samp{>}, @samp{>>}, @samp{|}, and @samp{|&} -operators. You can redirect input to the @code{getline} statement using -the @samp{<}, @samp{|}, and @samp{|&} operators. -(@xref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}, -and @ref{Getline, ,Explicit Input with @code{getline}}.) - -@item Regexp -Short for @dfn{regular expression}. A regexp is a pattern that denotes a -set of strings, possibly an infinite set. For example, the regexp -@samp{R.*xp} matches any string starting with the letter @samp{R} -and ending with the letters @samp{xp}. In @command{awk}, regexps are -used in patterns and in conditional expressions. Regexps may contain -escape sequences. (@xref{Regexp, ,Regular Expressions}.) - -@item Regular Expression -See ``regexp.'' - -@item Regular Expression Constant -A regular expression constant is a regular expression written within -slashes, such as @code{/foo/}. This regular expression is chosen -when you write the @command{awk} program and cannot be changed during -its execution. (@xref{Regexp Usage, ,How to Use Regular Expressions}.) - -@item Rule -A segment of an @command{awk} program that specifies how to process single -input records. A rule consists of a @dfn{pattern} and an @dfn{action}. -@command{awk} reads an input record; then, for each rule, if the input record -satisfies the rule's pattern, @command{awk} executes the rule's action. -Otherwise, the rule does nothing for that input record. 
- -@item Rvalue -A value that can appear on the right side of an assignment operator. -In @command{awk}, essentially every expression has a value. These values -are rvalues. - -@item Scalar -A single value, be it a number or a string. -Regular variables are scalars; arrays and functions are not. - -@item Search Path -In @command{gawk}, a list of directories to search for @command{awk} program source files. -In the shell, a list of directories to search for executable programs. - -@item Seed -The initial value, or starting point, for a sequence of random numbers. - -@item @command{sed} -See ``Stream Editor.'' - -@item Shell -The command interpreter for Unix and POSIX-compliant systems. -The shell works both interactively, and as a programming language -for batch files, or shell scripts. - -@item Short-Circuit -The nature of the @command{awk} logical operators @samp{&&} and @samp{||}. -If the value of the entire expression is determinable from evaluating just -the lefthand side of these operators, the righthand side is not -evaluated. -(@xref{Boolean Ops, ,Boolean Expressions}.) - -@item Side Effect -A side effect occurs when an expression has an effect aside from merely -producing a value. Assignment expressions, increment and decrement -expressions, and function calls have side effects. -(@xref{Assignment Ops, ,Assignment Expressions}.) - -@item Single-Precision -An internal representation of numbers that can have fractional parts. -Single-precision numbers keep track of fewer digits than do double-precision -numbers, but operations on them are sometimes less expensive in terms of CPU time. -This is the type used by some very old versions of @command{awk} to store -numeric values. It is the C type @code{float}. - -@item Space -The character generated by hitting the space bar on the keyboard. 
- -@item Special File -A @value{FN} interpreted internally by @command{gawk}, instead of being handed -directly to the underlying operating system---for example, @file{/dev/stderr}. -(@xref{Special Files, ,Special @value{FFN}s in @command{gawk}}.) - -@item Stream Editor -A program that reads records from an input stream and processes them one -or more at a time. This is in contrast with batch programs, which may -expect to read their input files in entirety before starting to do -anything, as well as with interactive programs which require input from the -user. - -@item String -A datum consisting of a sequence of characters, such as @samp{I am a -string}. Constant strings are written with double quotes in the -@command{awk} language and may contain escape sequences. -(@xref{Escape Sequences}.) - -@item Tab -The character generated by hitting the @kbd{TAB} key on the keyboard. -It usually expands to up to eight spaces upon output. - -@item Text Domain -A unique name that identifies an application. -Used for grouping messages that are translated at runtime -into the local language. - -@item Timestamp -A value in the ``seconds since the epoch'' format used by Unix -and POSIX systems. Used for the @command{gawk} functions -@code{mktime}, @code{strftime}, and @code{systime}. -See also ``Epoch'' and ``UTC.'' - -@cindex Linux -@cindex GNU/Linux -@cindex Unix -@cindex BSD-based operating systems -@cindex NetBSD -@cindex FreeBSD -@cindex OpenBSD -@item Unix -A computer operating system originally developed in the early 1970's at -AT&T Bell Laboratories. It initially became popular in universities around -the world and later moved into commercial environments as a software -development system and network server system. There are many commercial -versions of Unix, as well as several work-alike systems whose source code -is freely available (such as GNU/Linux, NetBSD, FreeBSD, and OpenBSD). 
- -@item UTC -The accepted abbreviation for ``Coordinated Universal Time.'' -This is standard time in Greenwich, England, which is used as a -reference time for day and date calculations. -See also ``Epoch'' and ``GMT.'' - -@item Whitespace -A sequence of space, tab, or newline characters occurring inside an input -record or a string. -@end table - -@node Copying, GNU Free Documentation License, Glossary, Top -@unnumbered GNU General Public License -@center Version 2, June 1991 - -@display -Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc. -59 Temple Place, Suite 330, Boston, MA 02111, USA - -Everyone is permitted to copy and distribute verbatim copies -of this license document, but changing it is not allowed. -@end display - -@c fakenode --- for prepinfo -@unnumberedsec Preamble - - The licenses for most software are designed to take away your -freedom to share and change it. By contrast, the GNU General Public -License is intended to guarantee your freedom to share and change free -software---to make sure the software is free for all its users. This -General Public License applies to most of the Free Software -Foundation's software and to any other program whose authors commit to -using it. (Some other Free Software Foundation software is covered by -the GNU Library General Public License instead.) You can apply it to -your programs, too. - - When we speak of free software, we are referring to freedom, not -price. Our General Public Licenses are designed to make sure that you -have the freedom to distribute copies of free software (and charge for -this service if you wish), that you receive source code or can get it -if you want it, that you can change the software or use pieces of it -in new free programs; and that you know you can do these things. - - To protect your rights, we need to make restrictions that forbid -anyone to deny you these rights or to ask you to surrender the rights. 
-These restrictions translate to certain responsibilities for you if you -distribute copies of the software, or if you modify it. - - For example, if you distribute copies of such a program, whether -gratis or for a fee, you must give the recipients all the rights that -you have. You must make sure that they, too, receive or can get the -source code. And you must show them these terms so they know their -rights. - - We protect your rights with two steps: (1) copyright the software, and -(2) offer you this license which gives you legal permission to copy, -distribute and/or modify the software. - - Also, for each author's protection and ours, we want to make certain -that everyone understands that there is no warranty for this free -software. If the software is modified by someone else and passed on, we -want its recipients to know that what they have is not the original, so -that any problems introduced by others will not reflect on the original -authors' reputations. - - Finally, any free program is threatened constantly by software -patents. We wish to avoid the danger that redistributors of a free -program will individually obtain patent licenses, in effect making the -program proprietary. To prevent this, we have made it clear that any -patent must be licensed for everyone's free use or not licensed at all. - - The precise terms and conditions for copying, distribution and -modification follow. - -@ifnotinfo -@c fakenode --- for prepinfo -@unnumberedsec Terms and Conditions for Copying, Distribution and Modification -@end ifnotinfo -@ifinfo -@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION -@end ifinfo - -@enumerate 0 -@item -This License applies to any program or other work which contains -a notice placed by the copyright holder saying it may be distributed -under the terms of this General Public License. 
The ``Program'', below, -refers to any such program or work, and a ``work based on the Program'' -means either the Program or any derivative work under copyright law: -that is to say, a work containing the Program or a portion of it, -either verbatim or with modifications and/or translated into another -language. (Hereinafter, translation is included without limitation in -the term ``modification''.) Each licensee is addressed as ``you''. - -Activities other than copying, distribution and modification are not -covered by this License; they are outside its scope. The act of -running the Program is not restricted, and the output from the Program -is covered only if its contents constitute a work based on the -Program (independent of having been made by running the Program). -Whether that is true depends on what the Program does. - -@item -You may copy and distribute verbatim copies of the Program's -source code as you receive it, in any medium, provided that you -conspicuously and appropriately publish on each copy an appropriate -copyright notice and disclaimer of warranty; keep intact all the -notices that refer to this License and to the absence of any warranty; -and give any other recipients of the Program a copy of this License -along with the Program. - -You may charge a fee for the physical act of transferring a copy, and -you may at your option offer warranty protection in exchange for a fee. - -@item -You may modify your copy or copies of the Program or any portion -of it, thus forming a work based on the Program, and copy and -distribute such modifications or work under the terms of Section 1 -above, provided that you also meet all of these conditions: - -@enumerate a -@item -You must cause the modified files to carry prominent notices -stating that you changed the files and the date of any change. 
- -@item -You must cause any work that you distribute or publish, that in -whole or in part contains or is derived from the Program or any -part thereof, to be licensed as a whole at no charge to all third -parties under the terms of this License. - -@item -If the modified program normally reads commands interactively -when run, you must cause it, when started running for such -interactive use in the most ordinary way, to print or display an -announcement including an appropriate copyright notice and a -notice that there is no warranty (or else, saying that you provide -a warranty) and that users may redistribute the program under -these conditions, and telling the user how to view a copy of this -License. (Exception: if the Program itself is interactive but -does not normally print such an announcement, your work based on -the Program is not required to print an announcement.) -@end enumerate - -These requirements apply to the modified work as a whole. If -identifiable sections of that work are not derived from the Program, -and can be reasonably considered independent and separate works in -themselves, then this License, and its terms, do not apply to those -sections when you distribute them as separate works. But when you -distribute the same sections as part of a whole which is a work based -on the Program, the distribution of the whole must be on the terms of -this License, whose permissions for other licensees extend to the -entire whole, and thus to each and every part regardless of who wrote it. - -Thus, it is not the intent of this section to claim rights or contest -your rights to work written entirely by you; rather, the intent is to -exercise the right to control the distribution of derivative or -collective works based on the Program. 
- -In addition, mere aggregation of another work not based on the Program -with the Program (or with a work based on the Program) on a volume of -a storage or distribution medium does not bring the other work under -the scope of this License. - -@item -You may copy and distribute the Program (or a work based on it, -under Section 2) in object code or executable form under the terms of -Sections 1 and 2 above provided that you also do one of the following: - -@enumerate a -@item -Accompany it with the complete corresponding machine-readable -source code, which must be distributed under the terms of Sections -1 and 2 above on a medium customarily used for software interchange; or, - -@item -Accompany it with a written offer, valid for at least three -years, to give any third party, for a charge no more than your -cost of physically performing source distribution, a complete -machine-readable copy of the corresponding source code, to be -distributed under the terms of Sections 1 and 2 above on a medium -customarily used for software interchange; or, - -@item -Accompany it with the information you received as to the offer -to distribute corresponding source code. (This alternative is -allowed only for noncommercial distribution and only if you -received the program in object code or executable form with such -an offer, in accord with Subsection b above.) -@end enumerate - -The source code for a work means the preferred form of the work for -making modifications to it. For an executable work, complete source -code means all the source code for all modules it contains, plus any -associated interface definition files, plus the scripts used to -control compilation and installation of the executable. 
However, as a -special exception, the source code distributed need not include -anything that is normally distributed (in either source or binary -form) with the major components (compiler, kernel, and so on) of the -operating system on which the executable runs, unless that component -itself accompanies the executable. - -If distribution of executable or object code is made by offering -access to copy from a designated place, then offering equivalent -access to copy the source code from the same place counts as -distribution of the source code, even though third parties are not -compelled to copy the source along with the object code. - -@item -You may not copy, modify, sublicense, or distribute the Program -except as expressly provided under this License. Any attempt -otherwise to copy, modify, sublicense or distribute the Program is -void, and will automatically terminate your rights under this License. -However, parties who have received copies, or rights, from you under -this License will not have their licenses terminated so long as such -parties remain in full compliance. - -@item -You are not required to accept this License, since you have not -signed it. However, nothing else grants you permission to modify or -distribute the Program or its derivative works. These actions are -prohibited by law if you do not accept this License. Therefore, by -modifying or distributing the Program (or any work based on the -Program), you indicate your acceptance of this License to do so, and -all its terms and conditions for copying, distributing or modifying -the Program or works based on it. - -@item -Each time you redistribute the Program (or any work based on the -Program), the recipient automatically receives a license from the -original licensor to copy, distribute or modify the Program subject to -these terms and conditions. You may not impose any further -restrictions on the recipients' exercise of the rights granted herein. 
-You are not responsible for enforcing compliance by third parties to -this License. - -@item -If, as a consequence of a court judgment or allegation of patent -infringement or for any other reason (not limited to patent issues), -conditions are imposed on you (whether by court order, agreement or -otherwise) that contradict the conditions of this License, they do not -excuse you from the conditions of this License. If you cannot -distribute so as to satisfy simultaneously your obligations under this -License and any other pertinent obligations, then as a consequence you -may not distribute the Program at all. For example, if a patent -license would not permit royalty-free redistribution of the Program by -all those who receive copies directly or indirectly through you, then -the only way you could satisfy both it and this License would be to -refrain entirely from distribution of the Program. - -If any portion of this section is held invalid or unenforceable under -any particular circumstance, the balance of the section is intended to -apply and the section as a whole is intended to apply in other -circumstances. - -It is not the purpose of this section to induce you to infringe any -patents or other property right claims or to contest validity of any -such claims; this section has the sole purpose of protecting the -integrity of the free software distribution system, which is -implemented by public license practices. Many people have made -generous contributions to the wide range of software distributed -through that system in reliance on consistent application of that -system; it is up to the author/donor to decide if he or she is willing -to distribute software through any other system and a licensee cannot -impose that choice. - -This section is intended to make thoroughly clear what is believed to -be a consequence of the rest of this License. 
- -@item -If the distribution and/or use of the Program is restricted in -certain countries either by patents or by copyrighted interfaces, the -original copyright holder who places the Program under this License -may add an explicit geographical distribution limitation excluding -those countries, so that distribution is permitted only in or among -countries not thus excluded. In such case, this License incorporates -the limitation as if written in the body of this License. - -@item -The Free Software Foundation may publish revised and/or new versions -of the General Public License from time to time. Such new versions will -be similar in spirit to the present version, but may differ in detail to -address new problems or concerns. - -Each version is given a distinguishing version number. If the Program -specifies a version number of this License which applies to it and ``any -later version'', you have the option of following the terms and conditions -either of that version or of any later version published by the Free -Software Foundation. If the Program does not specify a version number of -this License, you may choose any version ever published by the Free Software -Foundation. - -@item -If you wish to incorporate parts of the Program into other free -programs whose distribution conditions are different, write to the author -to ask for permission. For software which is copyrighted by the Free -Software Foundation, write to the Free Software Foundation; we sometimes -make exceptions for this. Our decision will be guided by the two goals -of preserving the free status of all derivatives of our free software and -of promoting the sharing and reuse of software generally. - -@ifnotinfo -@c fakenode --- for prepinfo -@heading NO WARRANTY -@end ifnotinfo -@ifinfo -@center NO WARRANTY -@end ifinfo - -@item -BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY -FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW@. 
EXCEPT WHEN -OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES -PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED -OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE@. THE ENTIRE RISK AS -TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU@. SHOULD THE -PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, -REPAIR OR CORRECTION. - -@item -IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING -WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR -REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, -INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING -OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED -TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY -YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER -PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE -POSSIBILITY OF SUCH DAMAGES. -@end enumerate - -@ifnotinfo -@c fakenode --- for prepinfo -@heading END OF TERMS AND CONDITIONS -@end ifnotinfo -@ifinfo -@center END OF TERMS AND CONDITIONS -@end ifinfo - -@page -@c fakenode --- for prepinfo -@unnumberedsec How to Apply These Terms to Your New Programs - - If you develop a new program, and you want it to be of the greatest -possible use to the public, the best way to achieve this is to make it -free software which everyone can redistribute and change under these terms. - - To do so, attach the following notices to the program. It is safest -to attach them to the start of each source file to most effectively -convey the exclusion of warranty; and each file should have at least -the ``copyright'' line and a pointer to where the full notice is found. 
- -@smallexample -@var{one line to give the program's name and an idea of what it does.} -Copyright (C) @var{year} @var{name of author} - -This program is free software; you can redistribute it and/or -modify it under the terms of the GNU General Public License -as published by the Free Software Foundation; either version 2 -of the License, or (at your option) any later version. - -This program is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE@. See the -GNU General Public License for more details. - -You should have received a copy of the GNU General Public License -along with this program; if not, write to the Free Software -Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111, USA. -@end smallexample - -Also add information on how to contact you by electronic and paper mail. - -If the program is interactive, make it output a short notice like this -when it starts in an interactive mode: - -@smallexample -Gnomovision version 69, Copyright (C) @var{year} @var{name of author} -Gnomovision comes with ABSOLUTELY NO WARRANTY; for details -type `show w'. This is free software, and you are welcome -to redistribute it under certain conditions; type `show c' -for details. -@end smallexample - -The hypothetical commands @samp{show w} and @samp{show c} should show -the appropriate parts of the General Public License. Of course, the -commands you use may be called something other than @samp{show w} and -@samp{show c}; they could even be mouse-clicks or menu items---whatever -suits your program. - -You should also get your employer (if you work as a programmer) or your -school, if any, to sign a ``copyright disclaimer'' for the program, if -necessary. 
Here is a sample; alter the names: - -@smallexample -@group -Yoyodyne, Inc., hereby disclaims all copyright -interest in the program `Gnomovision' -(which makes passes at compilers) written -by James Hacker. - -@var{signature of Ty Coon}, 1 April 1989 -Ty Coon, President of Vice -@end group -@end smallexample - -This General Public License does not permit incorporating your program into -proprietary programs. If your program is a subroutine library, you may -consider it more useful to permit linking proprietary applications with the -library. If this is what you want to do, use the GNU Lesser General -Public License instead of this License. - -@node GNU Free Documentation License, Index, Copying, Top -@unnumbered GNU Free Documentation License -@center Version 1.1, March 2000 -@cindex FDL -@cindex Free Documentation License -@cindex GNU Free Documentation License - -@display -Copyright (C) 2000 Free Software Foundation, Inc. -59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - -Everyone is permitted to copy and distribute verbatim copies -of this license document, but changing it is not allowed. -@end display -@sp 1 -@enumerate 0 -@item -PREAMBLE - -The purpose of this License is to make a manual, textbook, or other -written document ``free'' in the sense of freedom: to assure everyone -the effective freedom to copy and redistribute it, with or without -modifying it, either commercially or noncommercially. Secondarily, -this License preserves for the author and publisher a way to get -credit for their work, while not being considered responsible for -modifications made by others. - -This License is a kind of ``copyleft'', which means that derivative -works of the document must themselves be free in the same sense. It -complements the GNU General Public License, which is a copyleft -license designed for free software. 
- -We have designed this License in order to use it for manuals for free -software, because free software needs free documentation: a free -program should come with manuals providing the same freedoms that the -software does. But this License is not limited to software manuals; -it can be used for any textual work, regardless of subject matter or -whether it is published as a printed book. We recommend this License -principally for works whose purpose is instruction or reference. - -@sp 1 -@item -APPLICABILITY AND DEFINITIONS - -This License applies to any manual or other work that contains a -notice placed by the copyright holder saying it can be distributed -under the terms of this License. The ``Document'', below, refers to any -such manual or work. Any member of the public is a licensee, and is -addressed as ``you''. - -A ``Modified Version'' of the Document means any work containing the -Document or a portion of it, either copied verbatim, or with -modifications and/or translated into another language. - -A ``Secondary Section'' is a named appendix or a front-matter section of -the Document that deals exclusively with the relationship of the -publishers or authors of the Document to the Document's overall subject -(or to related matters) and contains nothing that could fall directly -within that overall subject. (For example, if the Document is in part a -textbook of mathematics, a Secondary Section may not explain any -mathematics.) The relationship could be a matter of historical -connection with the subject or with related matters, or of legal, -commercial, philosophical, ethical or political position regarding -them. - -The ``Invariant Sections'' are certain Secondary Sections whose titles -are designated, as being those of Invariant Sections, in the notice -that says that the Document is released under this License. 
- -The ``Cover Texts'' are certain short passages of text that are listed, -as Front-Cover Texts or Back-Cover Texts, in the notice that says that -the Document is released under this License. - -A ``Transparent'' copy of the Document means a machine-readable copy, -represented in a format whose specification is available to the -general public, whose contents can be viewed and edited directly and -straightforwardly with generic text editors or (for images composed of -pixels) generic paint programs or (for drawings) some widely available -drawing editor, and that is suitable for input to text formatters or -for automatic translation to a variety of formats suitable for input -to text formatters. A copy made in an otherwise Transparent file -format whose markup has been designed to thwart or discourage -subsequent modification by readers is not Transparent. A copy that is -not ``Transparent'' is called ``Opaque''. - -Examples of suitable formats for Transparent copies include plain -ASCII without markup, Texinfo input format, LaTeX input format, SGML -or XML using a publicly available DTD, and standard-conforming simple -HTML designed for human modification. Opaque formats include -PostScript, PDF, proprietary formats that can be read and edited only -by proprietary word processors, SGML or XML for which the DTD and/or -processing tools are not generally available, and the -machine-generated HTML produced by some word processors for output -purposes only. - -The ``Title Page'' means, for a printed book, the title page itself, -plus such following pages as are needed to hold, legibly, the material -this License requires to appear in the title page. For works in -formats which do not have any title page as such, ``Title Page'' means -the text near the most prominent appearance of the work's title, -preceding the beginning of the body of the text. 
-@sp 1 -@item -VERBATIM COPYING - -You may copy and distribute the Document in any medium, either -commercially or noncommercially, provided that this License, the -copyright notices, and the license notice saying this License applies -to the Document are reproduced in all copies, and that you add no other -conditions whatsoever to those of this License. You may not use -technical measures to obstruct or control the reading or further -copying of the copies you make or distribute. However, you may accept -compensation in exchange for copies. If you distribute a large enough -number of copies you must also follow the conditions in section 3. - -You may also lend copies, under the same conditions stated above, and -you may publicly display copies. -@sp 1 -@item -COPYING IN QUANTITY - -If you publish printed copies of the Document numbering more than 100, -and the Document's license notice requires Cover Texts, you must enclose -the copies in covers that carry, clearly and legibly, all these Cover -Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on -the back cover. Both covers must also clearly and legibly identify -you as the publisher of these copies. The front cover must present -the full title with all words of the title equally prominent and -visible. You may add other material on the covers in addition. -Copying with changes limited to the covers, as long as they preserve -the title of the Document and satisfy these conditions, can be treated -as verbatim copying in other respects. - -If the required texts for either cover are too voluminous to fit -legibly, you should put the first ones listed (as many as fit -reasonably) on the actual cover, and continue the rest onto adjacent -pages. 
- -If you publish or distribute Opaque copies of the Document numbering -more than 100, you must either include a machine-readable Transparent -copy along with each Opaque copy, or state in or with each Opaque copy -a publicly-accessible computer-network location containing a complete -Transparent copy of the Document, free of added material, which the -general network-using public has access to download anonymously at no -charge using public-standard network protocols. If you use the latter -option, you must take reasonably prudent steps, when you begin -distribution of Opaque copies in quantity, to ensure that this -Transparent copy will remain thus accessible at the stated location -until at least one year after the last time you distribute an Opaque -copy (directly or through your agents or retailers) of that edition to -the public. - -It is requested, but not required, that you contact the authors of the -Document well before redistributing any large number of copies, to give -them a chance to provide you with an updated version of the Document. -@sp 1 -@item -MODIFICATIONS - -You may copy and distribute a Modified Version of the Document under -the conditions of sections 2 and 3 above, provided that you release -the Modified Version under precisely this License, with the Modified -Version filling the role of the Document, thus licensing distribution -and modification of the Modified Version to whoever possesses a copy -of it. In addition, you must do these things in the Modified Version: - -@enumerate A -@item -Use in the Title Page (and on the covers, if any) a title distinct -from that of the Document, and from those of previous versions -(which should, if there were any, be listed in the History section -of the Document). You may use the same title as a previous version -if the original publisher of that version gives permission. 
- -@item -List on the Title Page, as authors, one or more persons or entities -responsible for authorship of the modifications in the Modified -Version, together with at least five of the principal authors of the -Document (all of its principal authors, if it has less than five). - -@item -State on the Title page the name of the publisher of the -Modified Version, as the publisher. - -@item -Preserve all the copyright notices of the Document. - -@item -Add an appropriate copyright notice for your modifications -adjacent to the other copyright notices. - -@item -Include, immediately after the copyright notices, a license notice -giving the public permission to use the Modified Version under the -terms of this License, in the form shown in the Addendum below. - -@item -Preserve in that license notice the full lists of Invariant Sections -and required Cover Texts given in the Document's license notice. - -@item -Include an unaltered copy of this License. - -@item -Preserve the section entitled ``History'', and its title, and add to -it an item stating at least the title, year, new authors, and -publisher of the Modified Version as given on the Title Page. If -there is no section entitled ``History'' in the Document, create one -stating the title, year, authors, and publisher of the Document as -given on its Title Page, then add an item describing the Modified -Version as stated in the previous sentence. - -@item -Preserve the network location, if any, given in the Document for -public access to a Transparent copy of the Document, and likewise -the network locations given in the Document for previous versions -it was based on. These may be placed in the ``History'' section. -You may omit a network location for a work that was published at -least four years before the Document itself, or if the original -publisher of the version it refers to gives permission. 
- -@item -In any section entitled ``Acknowledgements'' or ``Dedications'', -preserve the section's title, and preserve in the section all the -substance and tone of each of the contributor acknowledgements -and/or dedications given therein. - -@item -Preserve all the Invariant Sections of the Document, -unaltered in their text and in their titles. Section numbers -or the equivalent are not considered part of the section titles. - -@item -Delete any section entitled ``Endorsements''. Such a section -may not be included in the Modified Version. - -@item -Do not retitle any existing section as ``Endorsements'' -or to conflict in title with any Invariant Section. -@end enumerate - -If the Modified Version includes new front-matter sections or -appendices that qualify as Secondary Sections and contain no material -copied from the Document, you may at your option designate some or all -of these sections as invariant. To do this, add their titles to the -list of Invariant Sections in the Modified Version's license notice. -These titles must be distinct from any other section titles. - -You may add a section entitled ``Endorsements'', provided it contains -nothing but endorsements of your Modified Version by various -parties--for example, statements of peer review or that the text has -been approved by an organization as the authoritative definition of a -standard. - -You may add a passage of up to five words as a Front-Cover Text, and a -passage of up to 25 words as a Back-Cover Text, to the end of the list -of Cover Texts in the Modified Version. Only one passage of -Front-Cover Text and one of Back-Cover Text may be added by (or -through arrangements made by) any one entity. If the Document already -includes a cover text for the same cover, previously added by you or -by arrangement made by the same entity you are acting on behalf of, -you may not add another; but you may replace the old one, on explicit -permission from the previous publisher that added the old one. 
- -The author(s) and publisher(s) of the Document do not by this License -give permission to use their names for publicity for or to assert or -imply endorsement of any Modified Version. -@sp 1 -@item -COMBINING DOCUMENTS - -You may combine the Document with other documents released under this -License, under the terms defined in section 4 above for modified -versions, provided that you include in the combination all of the -Invariant Sections of all of the original documents, unmodified, and -list them all as Invariant Sections of your combined work in its -license notice. - -The combined work need only contain one copy of this License, and -multiple identical Invariant Sections may be replaced with a single -copy. If there are multiple Invariant Sections with the same name but -different contents, make the title of each such section unique by -adding at the end of it, in parentheses, the name of the original -author or publisher of that section if known, or else a unique number. -Make the same adjustment to the section titles in the list of -Invariant Sections in the license notice of the combined work. - -In the combination, you must combine any sections entitled ``History'' -in the various original documents, forming one section entitled -``History''; likewise combine any sections entitled ``Acknowledgements'', -and any sections entitled ``Dedications''. You must delete all sections -entitled ``Endorsements.'' -@sp 1 -@item -COLLECTIONS OF DOCUMENTS - -You may make a collection consisting of the Document and other documents -released under this License, and replace the individual copies of this -License in the various documents with a single copy that is included in -the collection, provided that you follow the rules of this License for -verbatim copying of each of the documents in all other respects. 
- -You may extract a single document from such a collection, and distribute -it individually under this License, provided you insert a copy of this -License into the extracted document, and follow this License in all -other respects regarding verbatim copying of that document. -@sp 1 -@item -AGGREGATION WITH INDEPENDENT WORKS - -A compilation of the Document or its derivatives with other separate -and independent documents or works, in or on a volume of a storage or -distribution medium, does not as a whole count as a Modified Version -of the Document, provided no compilation copyright is claimed for the -compilation. Such a compilation is called an ``aggregate'', and this -License does not apply to the other self-contained works thus compiled -with the Document, on account of their being thus compiled, if they -are not themselves derivative works of the Document. - -If the Cover Text requirement of section 3 is applicable to these -copies of the Document, then if the Document is less than one quarter -of the entire aggregate, the Document's Cover Texts may be placed on -covers that surround only the Document within the aggregate. -Otherwise they must appear on covers around the whole aggregate. -@sp 1 -@item -TRANSLATION - -Translation is considered a kind of modification, so you may -distribute translations of the Document under the terms of section 4. -Replacing Invariant Sections with translations requires special -permission from their copyright holders, but you may include -translations of some or all Invariant Sections in addition to the -original versions of these Invariant Sections. You may include a -translation of this License provided that you also include the -original English version of this License. In case of a disagreement -between the translation and the original English version of this -License, the original English version will prevail. 
-@sp 1 -@item -TERMINATION - -You may not copy, modify, sublicense, or distribute the Document except -as expressly provided for under this License. Any other attempt to -copy, modify, sublicense or distribute the Document is void, and will -automatically terminate your rights under this License. However, -parties who have received copies, or rights, from you under this -License will not have their licenses terminated so long as such -parties remain in full compliance. -@sp 1 -@item -FUTURE REVISIONS OF THIS LICENSE - -The Free Software Foundation may publish new, revised versions -of the GNU Free Documentation License from time to time. Such new -versions will be similar in spirit to the present version, but may -differ in detail to address new problems or concerns. See -@uref{http://www.gnu.org/copyleft/}. - -Each version of the License is given a distinguishing version number. -If the Document specifies that a particular numbered version of this -License ``or any later version'' applies to it, you have the option of -following the terms and conditions either of that specified version or -of any later version that has been published (not as a draft) by the -Free Software Foundation. If the Document does not specify a version -number of this License, you may choose any version ever published (not -as a draft) by the Free Software Foundation. - -@end enumerate - -@c fakenode --- for prepinfo -@unnumberedsec ADDENDUM: How to use this License for your documents - -To use this License in a document you have written, include a copy of -the License in the document and put the following copyright and -license notices just after the title page: - -@smallexample -@group - - Copyright (C) @var{year} @var{your name}. 
- Permission is granted to copy, distribute and/or modify this document
- under the terms of the GNU Free Documentation License, Version 1.1
- or any later version published by the Free Software Foundation;
- with the Invariant Sections being @var{list their titles}, with the
- Front-Cover Texts being @var{list}, and with the Back-Cover Texts being @var{list}.
- A copy of the license is included in the section entitled ``GNU
- Free Documentation License''.
-@end group
-@end smallexample
-If you have no Invariant Sections, write ``with no Invariant Sections''
-instead of saying which ones are invariant. If you have no
-Front-Cover Texts, write ``no Front-Cover Texts'' instead of
-``Front-Cover Texts being @var{list}''; likewise for Back-Cover Texts.
-
-If your document contains nontrivial examples of program code, we
-recommend releasing these examples in parallel under your choice of
-free software license, such as the GNU General Public License,
-to permit their use in free software.
-
-@node Index, , GNU Free Documentation License, Top
-@unnumbered Index
-@printindex cp
-
-@bye
-
-Unresolved Issues:
-------------------
-1. From ADR.
-
- Robert J. Chassell points out that awk programs should have some indication
- of how to use them. It would be useful to perhaps have a "programming
- style" section of the manual that would include this and other tips.
-
-2. The default AWKPATH search path should be configurable via `configure'
- The default and how this changes needs to be documented.
-
-Consistency issues:
- /.../ regexps are in @code, not @samp
- ".." strings are in @code, not @samp
- no @print before @dots
- values of expressions in the text (@code{x} has the value 15),
- should be in roman, not @code
- Use tab and not TAB
- Use ESC and not ESCAPE
- Use space and not blank to describe the space bar's character
- The term "blank" is thus basically reserved for "blank lines" etc.
- To make dark corners work, the @value{DARKCORNER} has to be outside
- closing `.'
of a sentence and after (pxref{...}). This is
- a change from earlier versions.
- " " should have an @w{} around it
- Use "non-" everywhere
- Use @command{ftp} when talking about anonymous ftp
- Use uppercase and lowercase, not "upper-case" and "lower-case"
- or "upper case" and "lower case"
- Use "single precision" and "double precision", not "single-precision" or "double-precision"
- Use alphanumeric, not alpha-numeric
- Use POSIX-compliant, not POSIX compliant
- Use --foo, not -Wfoo when describing long options
- Use "Bell Laboratories", but not "Bell Labs".
- Use "behavior" instead of "behaviour".
- Use "zeros" instead of "zeroes".
- Use "nonzero" not "non-zero".
- Use "runtime" not "run time" or "run-time".
- Use "command-line" not "command line".
- Use "online" not "on-line".
- Use "whitespace" not "white space".
- Use "Input/Output", not "input/output". Also "I/O", not "i/o".
- Use "lefthand"/"righthand", not "left-hand"/"right-hand".
- Use "workaround", not "work-around".
- Use "startup"/"cleanup", not "start-up"/"clean-up"
- Use @code{do}, and not @code{do}-@code{while}, except where
- actually discussing the do-while.
- The words "a", "and", "as", "between", "for", "from", "in", "of",
- "on", "that", "the", "to", "with", and "without",
- should not be capitalized in @chapter, @section etc.
- "Into" and "How" should.
- Search for @dfn; make sure important items are also indexed.
- "e.g." should always be followed by a comma.
- "i.e." should always be followed by a comma.
- The numbers zero through ten should be spelled out, except when
- talking about file descriptor numbers. > 10 and < 0, it's
- ok to use numbers.
- In tables, put command-line options in @code, while in the text,
- put them in @option.
- When using @strong, use "Note:" or "Caution:" with colons and
- not exclamation points. Do not surround the paragraphs
- with @quotation ... @end quotation.
- For most cases, do NOT put a comma before "and", "or" or "but".
- But exercise taste with this rule.
- Don't show the awk command with a program in quotes when it's
- just the program. I.e.
-
- {
- ....
- }
-
- not
- awk '{
- ...
- }'
-
- Do show it when showing command-line arguments, data files, etc, even
- if there is no output shown.
-
- Use numbered lists only to show a sequential series of steps.
-
- Use @code{xxx} for the xxx operator in indexing statements, not @samp.
-
-Date: Wed, 13 Apr 94 15:20:52 -0400
-From: rms@gnu.org (Richard Stallman)
-To: gnu-prog@gnu.org
-Subject: A reminder: no pathnames in GNU
-
-It's a GNU convention to use the term "file name" for the name of a
-file, never "pathname". We use the term "path" for search paths,
-which are lists of file names. Using it for a single file name as
-well is potentially confusing to users.
-
-So please check any documentation you maintain, if you think you might
-have used "pathname".
-
-Note that "file name" should be two words when it appears as ordinary
-text. It's ok as one word when it's a metasyntactic variable, though.
-
-------------------------
-ORA uses filename, thus the macro.
-
-Suggestions:
-------------
-Enhance FIELDWIDTHS with some way to indicate "the rest of the record".
-E.g., a length of 0 or -1 or something. May be "n"?
-
-Make FIELDWIDTHS be an array?
-
-% Next edition:
-% 1. Talk about common extensions, those in nawk, gawk, mawk
-% 2. Use @code{foo} for variables and @code{foo()} for functions
-% 3. Standardize the error messages from the functions and programs
-% in Chapters 12 and 13.
-% 4. Nuke the BBS stuff and use something that won't be obsolete
-% 5.
Reorg chapters 5 & 7 like so:
-%Chapter 5:
-% - Constants, Variables, and Conversions
-% + Constant Expressions
-% + Using Regular Expression Constants
-% + Variables
-% + Conversion of Strings and Numbers
-% - Operators
-% + Arithmetic Operators
-% + String Concatenation
-% + Assignment Expressions
-% + Increment and Decrement Operators
-% - Truth Values and Conditions
-% + True and False in Awk
-% + Boolean Expressions
-% + Conditional Expressions
-% - Function Calls
-% - Operator Precedence
-%
-%Chapter 7:
-% - Array Basics
-% + Introduction to Arrays
-% + Referring to an Array Element
-% + Assigning Array Elements
-% + Basic Array Example
-% + Scanning All Elements of an Array
-% - The delete Statement
-% - Using Numbers to Subscript Arrays
-% - Using Uninitialized Variables as Subscripts
-% - Multidimensional Arrays
-% + Scanning Multidimensional Arrays
-% - Sorting Array Values and Indices with gawk |