diff options
Diffstat (limited to 'gnu/usr.bin/awk/awk.1')
-rw-r--r-- | gnu/usr.bin/awk/awk.1 | 1873 |
1 files changed, 1873 insertions, 0 deletions
diff --git a/gnu/usr.bin/awk/awk.1 b/gnu/usr.bin/awk/awk.1 new file mode 100644 index 0000000..0338485 --- /dev/null +++ b/gnu/usr.bin/awk/awk.1 @@ -0,0 +1,1873 @@ +.ds PX \s-1POSIX\s+1 +.ds UX \s-1UNIX\s+1 +.ds AN \s-1ANSI\s+1 +.TH GAWK 1 "Apr 15 1993" "Free Software Foundation" "Utility Commands" +.SH NAME +gawk \- pattern scanning and processing language +.SH SYNOPSIS +.B gawk +[ POSIX or GNU style options ] +.B \-f +.I program-file +[ +.B \-\^\- +] file .\^.\^. +.br +.B gawk +[ POSIX or GNU style options ] +[ +.B \-\^\- +] +.I program-text +file .\^.\^. +.SH DESCRIPTION +.I Gawk +is the GNU Project's implementation of the AWK programming language. +It conforms to the definition of the language in +the \*(PX 1003.2 Command Language And Utilities Standard. +This version in turn is based on the description in +.IR "The AWK Programming Language" , +by Aho, Kernighan, and Weinberger, +with the additional features defined in the System V Release 4 version +of \*(UX +.IR awk . +.I Gawk +also provides some GNU-specific extensions. +.PP +The command line consists of options to +.I gawk +itself, the AWK program text (if not supplied via the +.B \-f +or +.B \-\^\-file +options), and values to be made +available in the +.B ARGC +and +.B ARGV +pre-defined AWK variables. +.SH OPTIONS +.PP +.I Gawk +options may be either the traditional \*(PX one letter options, +or the GNU style long options. \*(PX style options start with a single ``\-'', +while GNU long options start with ``\-\^\-''. +GNU style long options are provided for both GNU-specific features and +for \*(PX mandated features. Other implementations of the AWK language +are likely to only accept the traditional one letter options. +.PP +Following the \*(PX standard, +.IR gawk -specific +options are supplied via arguments to the +.B \-W +option. Multiple +.B \-W +options may be supplied, or multiple arguments may be supplied together +if they are separated by commas, or enclosed in quotes and separated +by white space. +Case is ignored in arguments to the +.B \-W +option. +Each +.B \-W +option has a corresponding GNU style long option, as detailed below. +.PP +.I Gawk +accepts the following options. +.TP +.PD 0 +.BI \-F " fs" +.TP +.PD +.BI \-\^\-field-separator= fs +Use +.I fs +for the input field separator (the value of the +.B FS +predefined +variable). +.TP +.PD 0 +\fB\-v\fI var\fB\^=\^\fIval\fR +.TP +.PD +\fB\-\^\-assign=\fIvar\fB\^=\^\fIval\fR +Assign the value +.IR val , +to the variable +.IR var , +before execution of the program begins. +Such variable values are available to the +.B BEGIN +block of an AWK program. +.TP +.PD 0 +.BI \-f " program-file" +.TP +.PD +.BI \-\^\-file= program-file +Read the AWK program source from the file +.IR program-file , +instead of from the first command line argument. +Multiple +.B \-f +(or +.BR \-\^\-file ) +options may be used. +.TP \w'\fB\-\^\-copyright\fR'u+1n +.PD 0 +.B "\-W compat" +.TP +.PD +.B \-\^\-compat +Run in +.I compatibility +mode. In compatibility mode, +.I gawk +behaves identically to \*(UX +.IR awk ; +none of the GNU-specific extensions are recognized. +See +.BR "GNU EXTENSIONS" , +below, for more information. +.TP +.PD 0 +.B "\-W copyleft" +.TP +.PD 0 +.B "\-W copyright" +.TP +.PD 0 +.B \-\^\-copyleft +.TP +.PD +.B \-\^\-copyright +Print the short version of the GNU copyright information message on +the error output. +.TP +.PD 0 +.B "\-W help" +.TP +.PD 0 +.B "\-W usage" +.TP +.PD 0 +.B \-\^\-help +.TP +.PD +.B \-\^\-usage +Print a relatively short summary of the available options on +the error output. +.TP +.PD 0 +.B "\-W lint" +.TP +.PD 0 +.B \-\^\-lint +Provide warnings about constructs that are +dubious or non-portable to other AWK implementations. +.ig +.\" This option is left undocumented, on purpose. +.TP +.PD 0 +.B "\-W nostalgia" +.TP +.PD +.B \-\^\-nostalgia +Provide a moment of nostalgia for long time +.I awk +users. +.. +.TP +.PD 0 +.B "\-W posix" +.TP +.PD +.B \-\^\-posix +This turns on +.I compatibility +mode, with the following additional restrictions: +.RS +.TP \w'\(bu'u+1n +\(bu +.B \ex +escape sequences are not recognized. +.TP +\(bu +The synonym +.B func +for the keyword +.B function +is not recognized. +.TP +\(bu +The operators +.B ** +and +.B **= +cannot be used in place of +.B ^ +and +.BR ^= . +.RE +.TP +.PD 0 +.BI "\-W source=" program-text +.TP +.PD +.BI \-\^\-source= program-text +Use +.I program-text +as AWK program source code. +This option allows the easy intermixing of library functions (used via the +.B \-f +and +.B \-\^\-file +options) with source code entered on the command line. +It is intended primarily for medium to large size AWK programs used +in shell scripts. +.sp .5 +The +.B "\-W source=" +form of this option uses the rest of the command line argument for +.IR program-text ; +no other options to +.B \-W +will be recognized in the same argument. +.TP +.PD 0 +.B "\-W version" +.TP +.PD +.B \-\^\-version +Print version information for this particular copy of +.I gawk +on the error output. +This is useful mainly for knowing if the current copy of +.I gawk +on your system +is up to date with respect to whatever the Free Software Foundation +is distributing. +.TP +.B \-\^\- +Signal the end of options. This is useful to allow further arguments to the +AWK program itself to start with a ``\-''. +This is mainly for consistency with the argument parsing convention used +by most other \*(PX programs. +.PP +Any other options are flagged as illegal, but are otherwise ignored. +.SH AWK PROGRAM EXECUTION +.PP +An AWK program consists of a sequence of pattern-action statements +and optional function definitions. +.RS +.PP +\fIpattern\fB { \fIaction statements\fB }\fR +.br +\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR +.RE +.PP +.I Gawk +first reads the program source from the +.IR program-file (s) +if specified, or from the first non-option argument on the command line. +The +.B \-f +option may be used multiple times on the command line. +.I Gawk +will read the program text as if all the +.IR program-file s +had been concatenated together. This is useful for building libraries +of AWK functions, without having to include them in each new AWK +program that uses them. To use a library function in a file from a +program typed in on the command line, specify +.B /dev/tty +as one of the +.IR program-file s, +type your program, and end it with a +.B ^D +(control-d). +.PP +The environment variable +.B AWKPATH +specifies a search path to use when finding source files named with +the +.B \-f +option. If this variable does not exist, the default path is +\fB".:/usr/lib/awk:/usr/local/lib/awk"\fR. +If a file name given to the +.B \-f +option contains a ``/'' character, no path search is performed. +.PP +.I Gawk +executes AWK programs in the following order. +First, +.I gawk +compiles the program into an internal form. +Next, all variable assignments specified via the +.B \-v +option are performed. Then, +.I gawk +executes the code in the +.B BEGIN +block(s) (if any), +and then proceeds to read +each file named in the +.B ARGV +array. +If there are no files named on the command line, +.I gawk +reads the standard input. +.PP +If a filename on the command line has the form +.IB var = val +it is treated as a variable assignment. The variable +.I var +will be assigned the value +.IR val . +(This happens after any +.B BEGIN +block(s) have been run.) +Command line variable assignment +is most useful for dynamically assigning values to the variables +AWK uses to control how input is broken into fields and records. It +is also useful for controlling state if multiple passes are needed over +a single data file. +.PP +If the value of a particular element of +.B ARGV +is empty (\fB""\fR), +.I gawk +skips over it. +.PP +For each line in the input, +.I gawk +tests to see if it matches any +.I pattern +in the AWK program. +For each pattern that the line matches, the associated +.I action +is executed. +The patterns are tested in the order they occur in the program. +.PP +Finally, after all the input is exhausted, +.I gawk +executes the code in the +.B END +block(s) (if any). +.SH VARIABLES AND FIELDS +AWK variables are dynamic; they come into existence when they are +first used. Their values are either floating-point numbers or strings, +or both, +depending upon how they are used. AWK also has one dimension +arrays; multiply dimensioned arrays may be simulated. +Several pre-defined variables are set as a program +runs; these will be described as needed and summarized below. +.SS Fields +.PP +As each input line is read, +.I gawk +splits the line into +.IR fields , +using the value of the +.B FS +variable as the field separator. +If +.B FS +is a single character, fields are separated by that character. +Otherwise, +.B FS +is expected to be a full regular expression. +In the special case that +.B FS +is a single blank, fields are separated +by runs of blanks and/or tabs. +Note that the value of +.B IGNORECASE +(see below) will also affect how fields are split when +.B FS +is a regular expression. +.PP +If the +.B FIELDWIDTHS +variable is set to a space separated list of numbers, each field is +expected to have fixed width, and +.I gawk +will split up the record using the specified widths. The value of +.B FS +is ignored. +Assigning a new value to +.B FS +overrides the use of +.BR FIELDWIDTHS , +and restores the default behavior. +.PP +Each field in the input line may be referenced by its position, +.BR $1 , +.BR $2 , +and so on. +.B $0 +is the whole line. The value of a field may be assigned to as well. +Fields need not be referenced by constants: +.RS +.PP +.ft B +n = 5 +.br +print $n +.ft R +.RE +.PP +prints the fifth field in the input line. +The variable +.B NF +is set to the total number of fields in the input line. +.PP +References to non-existent fields (i.e. fields after +.BR $NF ) +produce the null-string. However, assigning to a non-existent field +(e.g., +.BR "$(NF+2) = 5" ) +will increase the value of +.BR NF , +create any intervening fields with the null string as their value, and +cause the value of +.B $0 +to be recomputed, with the fields being separated by the value of +.BR OFS . +.SS Built-in Variables +.PP +AWK's built-in variables are: +.PP +.TP \w'\fBFIELDWIDTHS\fR'u+1n +.B ARGC +The number of command line arguments (does not include options to +.IR gawk , +or the program source). +.TP +.B ARGIND +The index in +.B ARGV +of the current file being processed. +.TP +.B ARGV +Array of command line arguments. The array is indexed from +0 to +.B ARGC +\- 1. +Dynamically changing the contents of +.B ARGV +can control the files used for data. +.TP +.B CONVFMT +The conversion format for numbers, \fB"%.6g"\fR, by default. +.TP +.B ENVIRON +An array containing the values of the current environment. +The array is indexed by the environment variables, each element being +the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be +.BR /u/arnold ). +Changing this array does not affect the environment seen by programs which +.I gawk +spawns via redirection or the +.B system() +function. +(This may change in a future version of +.IR gawk .) +.\" but don't hold your breath... +.TP +.B ERRNO +If a system error occurs either doing a redirection for +.BR getline , +during a read for +.BR getline , +or during a +.BR close , +then +.B ERRNO +will contain +a string describing the error. +.TP +.B FIELDWIDTHS +A white-space separated list of fieldwidths. When set, +.I gawk +parses the input into fields of fixed width, instead of using the +value of the +.B FS +variable as the field separator. +The fixed field width facility is still experimental; expect the +semantics to change as +.I gawk +evolves over time. +.TP +.B FILENAME +The name of the current input file. +If no files are specified on the command line, the value of +.B FILENAME +is ``\-''. +.TP +.B FNR +The input record number in the current input file. +.TP +.B FS +The input field separator, a blank by default. +.TP +.B IGNORECASE +Controls the case-sensitivity of all regular expression operations. If +.B IGNORECASE +has a non-zero value, then pattern matching in rules, +field splitting with +.BR FS , +regular expression +matching with +.B ~ +and +.BR !~ , +and the +.BR gsub() , +.BR index() , +.BR match() , +.BR split() , +and +.B sub() +pre-defined functions will all ignore case when doing regular expression +operations. Thus, if +.B IGNORECASE +is not equal to zero, +.B /aB/ +matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP, +and \fB"AB"\fP. +As with all AWK variables, the initial value of +.B IGNORECASE +is zero, so all regular expression operations are normally case-sensitive. +.TP +.B NF +The number of fields in the current input record. +.TP +.B NR +The total number of input records seen so far. +.TP +.B OFMT +The output format for numbers, \fB"%.6g"\fR, by default. +.TP +.B OFS +The output field separator, a blank by default. +.TP +.B ORS +The output record separator, by default a newline. +.TP +.B RS +The input record separator, by default a newline. +.B RS +is exceptional in that only the first character of its string +value is used for separating records. +(This will probably change in a future release of +.IR gawk .) +If +.B RS +is set to the null string, then records are separated by +blank lines. +When +.B RS +is set to the null string, then the newline character always acts as +a field separator, in addition to whatever value +.B FS +may have. +.TP +.B RSTART +The index of the first character matched by +.BR match() ; +0 if no match. +.TP +.B RLENGTH +The length of the string matched by +.BR match() ; +\-1 if no match. +.TP +.B SUBSEP +The character used to separate multiple subscripts in array +elements, by default \fB"\e034"\fR. +.SS Arrays +.PP +Arrays are subscripted with an expression between square brackets +.RB ( [ " and " ] ). +If the expression is an expression list +.RI ( expr ", " expr " ...)" +then the array subscript is a string consisting of the +concatenation of the (string) value of each expression, +separated by the value of the +.B SUBSEP +variable. +This facility is used to simulate multiply dimensioned +arrays. For example: +.PP +.RS +.ft B +i = "A" ;\^ j = "B" ;\^ k = "C" +.br +x[i, j, k] = "hello, world\en" +.ft R +.RE +.PP +assigns the string \fB"hello, world\en"\fR to the element of the array +.B x +which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in AWK +are associative, i.e. indexed by string values. +.PP +The special operator +.B in +may be used in an +.B if +or +.B while +statement to see if an array has an index consisting of a particular +value. +.PP +.RS +.ft B +.nf +if (val in array) + print array[val] +.fi +.ft +.RE +.PP +If the array has multiple subscripts, use +.BR "(i, j) in array" . +.PP +The +.B in +construct may also be used in a +.B for +loop to iterate over all the elements of an array. +.PP +An element may be deleted from an array using the +.B delete +statement. +.SS Variable Typing And Conversion +.PP +Variables and fields +may be (floating point) numbers, or strings, or both. How the +value of a variable is interpreted depends upon its context. If used in +a numeric expression, it will be treated as a number, if used as a string +it will be treated as a string. +.PP +To force a variable to be treated as a number, add 0 to it; to force it +to be treated as a string, concatenate it with the null string. +.PP +When a string must be converted to a number, the conversion is accomplished +using +.IR atof (3). +A number is converted to a string by using the value of +.B CONVFMT +as a format string for +.IR sprintf (3), +with the numeric value of the variable as the argument. +However, even though all numbers in AWK are floating-point, +integral values are +.I always +converted as integers. Thus, given +.PP +.RS +.ft B +.nf +CONVFMT = "%2.2f" +a = 12 +b = a "" +.fi +.ft R +.RE +.PP +the variable +.B b +has a value of \fB"12"\fR and not \fB"12.00"\fR. +.PP +.I Gawk +performs comparisons as follows: +If two variables are numeric, they are compared numerically. +If one value is numeric and the other has a string value that is a +``numeric string,'' then comparisons are also done numerically. +Otherwise, the numeric value is converted to a string and a string +comparison is performed. +Two strings are compared, of course, as strings. +According to the \*(PX standard, even if two strings are +numeric strings, a numeric comparison is performed. However, this is +clearly incorrect, and +.I gawk +does not do this. +.PP +Uninitialized variables have the numeric value 0 and the string value "" +(the null, or empty, string). +.SH PATTERNS AND ACTIONS +AWK is a line oriented language. The pattern comes first, and then the +action. Action statements are enclosed in +.B { +and +.BR } . +Either the pattern may be missing, or the action may be missing, but, +of course, not both. If the pattern is missing, the action will be +executed for every single line of input. +A missing action is equivalent to +.RS +.PP +.B "{ print }" +.RE +.PP +which prints the entire line. +.PP +Comments begin with the ``#'' character, and continue until the +end of the line. +Blank lines may be used to separate statements. +Normally, a statement ends with a newline, however, this is not the +case for lines ending in +a ``,'', ``{'', ``?'', ``:'', ``&&'', or ``||''. +Lines ending in +.B do +or +.B else +also have their statements automatically continued on the following line. +In other cases, a line can be continued by ending it with a ``\e'', +in which case the newline will be ignored. +.PP +Multiple statements may +be put on one line by separating them with a ``;''. +This applies to both the statements within the action part of a +pattern-action pair (the usual case), +and to the pattern-action statements themselves. +.SS Patterns +AWK patterns may be one of the following: +.PP +.RS +.nf +.B BEGIN +.B END +.BI / "regular expression" / +.I "relational expression" +.IB pattern " && " pattern +.IB pattern " || " pattern +.IB pattern " ? " pattern " : " pattern +.BI ( pattern ) +.BI ! " pattern" +.IB pattern1 ", " pattern2 +.fi +.RE +.PP +.B BEGIN +and +.B END +are two special kinds of patterns which are not tested against +the input. +The action parts of all +.B BEGIN +patterns are merged as if all the statements had +been written in a single +.B BEGIN +block. They are executed before any +of the input is read. Similarly, all the +.B END +blocks are merged, +and executed when all the input is exhausted (or when an +.B exit +statement is executed). +.B BEGIN +and +.B END +patterns cannot be combined with other patterns in pattern expressions. +.B BEGIN +and +.B END +patterns cannot have missing action parts. +.PP +For +.BI / "regular expression" / +patterns, the associated statement is executed for each input line that matches +the regular expression. +Regular expressions are the same as those in +.IR egrep (1), +and are summarized below. +.PP +A +.I "relational expression" +may use any of the operators defined below in the section on actions. +These generally test whether certain fields match certain regular expressions. +.PP +The +.BR && , +.BR || , +and +.B ! +operators are logical AND, logical OR, and logical NOT, respectively, as in C. +They do short-circuit evaluation, also as in C, and are used for combining +more primitive pattern expressions. As in most languages, parentheses +may be used to change the order of evaluation. +.PP +The +.B ?\^: +operator is like the same operator in C. If the first pattern is true +then the pattern used for testing is the second pattern, otherwise it is +the third. Only one of the second and third patterns is evaluated. +.PP +The +.IB pattern1 ", " pattern2 +form of an expression is called a range pattern. +It matches all input records starting with a line that matches +.IR pattern1 , +and continuing until a record that matches +.IR pattern2 , +inclusive. It does not combine with any other sort of pattern expression. +.SS Regular Expressions +Regular expressions are the extended kind found in +.IR egrep . +They are composed of characters as follows: +.TP \w'\fB[^\fIabc...\fB]\fR'u+2n +.I c +matches the non-metacharacter +.IR c . +.TP +.I \ec +matches the literal character +.IR c . +.TP +.B . +matches any character except newline. +.TP +.B ^ +matches the beginning of a line or a string. +.TP +.B $ +matches the end of a line or a string. +.TP +.BI [ abc... ] +character class, matches any of the characters +.IR abc... . +.TP +.BI [^ abc... ] +negated character class, matches any character except +.I abc... +and newline. +.TP +.IB r1 | r2 +alternation: matches either +.I r1 +or +.IR r2 . +.TP +.I r1r2 +concatenation: matches +.IR r1 , +and then +.IR r2 . +.TP +.IB r + +matches one or more +.IR r 's. +.TP +.IB r * +matches zero or more +.IR r 's. +.TP +.IB r ? +matches zero or one +.IR r 's. +.TP +.BI ( r ) +grouping: matches +.IR r . +.PP +The escape sequences that are valid in string constants (see below) +are also legal in regular expressions. +.SS Actions +Action statements are enclosed in braces, +.B { +and +.BR } . +Action statements consist of the usual assignment, conditional, and looping +statements found in most languages. The operators, control statements, +and input/output statements +available are patterned after those in C. +.SS Operators +.PP +The operators in AWK, in order of increasing precedence, are +.PP +.TP "\w'\fB*= /= %= ^=\fR'u+1n" +.PD 0 +.B "= += \-=" +.TP +.PD +.B "*= /= %= ^=" +Assignment. Both absolute assignment +.BI ( var " = " value ) +and operator-assignment (the other forms) are supported. +.TP +.B ?: +The C conditional expression. This has the form +.IB expr1 " ? " expr2 " : " expr3\c +\&. If +.I expr1 +is true, the value of the expression is +.IR expr2 , +otherwise it is +.IR expr3 . +Only one of +.I expr2 +and +.I expr3 +is evaluated. +.TP +.B || +Logical OR. +.TP +.B && +Logical AND. +.TP +.B "~ !~" +Regular expression match, negated match. +.B NOTE: +Do not use a constant regular expression +.RB ( /foo/ ) +on the left-hand side of a +.B ~ +or +.BR !~ . +Only use one on the right-hand side. The expression +.BI "/foo/ ~ " exp +has the same meaning as \fB(($0 ~ /foo/) ~ \fIexp\fB)\fR. +This is usually +.I not +what was intended. +.TP +.PD 0 +.B "< >" +.TP +.PD 0 +.B "<= >=" +.TP +.PD +.B "!= ==" +The regular relational operators. +.TP +.I blank +String concatenation. +.TP +.B "+ \-" +Addition and subtraction. +.TP +.B "* / %" +Multiplication, division, and modulus. +.TP +.B "+ \- !" +Unary plus, unary minus, and logical negation. +.TP +.B ^ +Exponentiation (\fB**\fR may also be used, and \fB**=\fR for +the assignment operator). +.TP +.B "++ \-\^\-" +Increment and decrement, both prefix and postfix. +.TP +.B $ +Field reference. +.SS Control Statements +.PP +The control statements are +as follows: +.PP +.RS +.nf +\fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR] +\fBwhile (\fIcondition\fB) \fIstatement \fR +\fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR +\fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR +\fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR +\fBbreak\fR +\fBcontinue\fR +\fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR +\fBexit\fR [ \fIexpression\fR ] +\fB{ \fIstatements \fB} +.fi +.RE +.SS "I/O Statements" +.PP +The input/output statements are as follows: +.PP +.TP "\w'\fBprintf \fIfmt, expr-list\fR'u+1n" +.BI close( filename ) +Close file (or pipe, see below). +.TP +.B getline +Set +.B $0 +from next input record; set +.BR NF , +.BR NR , +.BR FNR . +.TP +.BI "getline <" file +Set +.B $0 +from next record of +.IR file ; +set +.BR NF . +.TP +.BI getline " var" +Set +.I var +from next input record; set +.BR NF , +.BR FNR . +.TP +.BI getline " var" " <" file +Set +.I var +from next record of +.IR file . +.TP +.B next +Stop processing the current input record. The next input record +is read and processing starts over with the first pattern in the +AWK program. If the end of the input data is reached, the +.B END +block(s), if any, are executed. +.TP +.B "next file" +Stop processing the current input file. The next input record read +comes from the next input file. +.B FILENAME +is updated, +.B FNR +is reset to 1, and processing starts over with the first pattern in the +AWK program. If the end of the input data is reached, the +.B END +block(s), if any, are executed. +.TP +.B print +Prints the current record. +.TP +.BI print " expr-list" +Prints expressions. +.TP +.BI print " expr-list" " >" file +Prints expressions on +.IR file . +.TP +.BI printf " fmt, expr-list" +Format and print. +.TP +.BI printf " fmt, expr-list" " >" file +Format and print on +.IR file . +.TP +.BI system( cmd-line ) +Execute the command +.IR cmd-line , +and return the exit status. +(This may not be available on non-\*(PX systems.) +.PP +Other input/output redirections are also allowed. For +.B print +and +.BR printf , +.BI >> file +appends output to the +.IR file , +while +.BI | " command" +writes on a pipe. +In a similar fashion, +.IB command " | getline" +pipes into +.BR getline . +.BR Getline +will return 0 on end of file, and \-1 on an error. +.SS The \fIprintf\fP\^ Statement +.PP +The AWK versions of the +.B printf +statement and +.B sprintf() +function +(see below) +accept the following conversion specification formats: +.TP +.B %c +An \s-1ASCII\s+1 character. +If the argument used for +.B %c +is numeric, it is treated as a character and printed. +Otherwise, the argument is assumed to be a string, and the only first +character of that string is printed. +.TP +.B %d +A decimal number (the integer part). +.TP +.B %i +Just like +.BR %d . +.TP +.B %e +A floating point number of the form +.BR [\-]d.ddddddE[+\^\-]dd . +.TP +.B %f +A floating point number of the form +.BR [\-]ddd.dddddd . +.TP +.B %g +Use +.B e +or +.B f +conversion, whichever is shorter, with nonsignificant zeros suppressed. +.TP +.B %o +An unsigned octal number (again, an integer). +.TP +.B %s +A character string. +.TP +.B %x +An unsigned hexadecimal number (an integer). +.TP +.B %X +Like +.BR %x , +but using +.B ABCDEF +instead of +.BR abcdef . +.TP +.B %% +A single +.B % +character; no argument is converted. +.PP +There are optional, additional parameters that may lie between the +.B % +and the control letter: +.TP +.B \- +The expression should be left-justified within its field. +.TP +.I width +The field should be padded to this width. If the number has a leading +zero, then the field will be padded with zeros. +Otherwise it is padded with blanks. +.TP +.BI . prec +A number indicating the maximum width of strings or digits to the right +of the decimal point. +.PP +The dynamic +.I width +and +.I prec +capabilities of the \*(AN C +.B printf() +routines are supported. +A +.B * +in place of either the +.B width +or +.B prec +specifications will cause their values to be taken from +the argument list to +.B printf +or +.BR sprintf() . +.SS Special File Names +.PP +When doing I/O redirection from either +.B print +or +.B printf +into a file, +or via +.B getline +from a file, +.I gawk +recognizes certain special filenames internally. These filenames +allow access to open file descriptors inherited from +.IR gawk 's +parent process (usually the shell). +Other special filenames provide access information about the running +.B gawk +process. +The filenames are: +.TP \w'\fB/dev/stdout\fR'u+1n +.B /dev/pid +Reading this file returns the process ID of the current process, +in decimal, terminated with a newline. +.TP +.B /dev/ppid +Reading this file returns the parent process ID of the current process, +in decimal, terminated with a newline. +.TP +.B /dev/pgrpid +Reading this file returns the process group ID of the current process, +in decimal, terminated with a newline. +.TP +.B /dev/user +Reading this file returns a single record terminated with a newline. +The fields are separated with blanks. +.B $1 +is the value of the +.IR getuid (2) +system call, +.B $2 +is the value of the +.IR geteuid (2) +system call, +.B $3 +is the value of the +.IR getgid (2) +system call, and +.B $4 +is the value of the +.IR getegid (2) +system call. +If there are any additional fields, they are the group IDs returned by +.IR getgroups (2). +(Multiple groups may not be supported on all systems.) +.TP +.B /dev/stdin +The standard input. +.TP +.B /dev/stdout +The standard output. +.TP +.B /dev/stderr +The standard error output. +.TP +.BI /dev/fd/\^ n +The file associated with the open file descriptor +.IR n . +.PP +These are particularly useful for error messages. For example: +.PP +.RS +.ft B +print "You blew it!" > "/dev/stderr" +.ft R +.RE +.PP +whereas you would otherwise have to use +.PP +.RS +.ft B +print "You blew it!" | "cat 1>&2" +.ft R +.RE +.PP +These file names may also be used on the command line to name data files. +.SS Numeric Functions +.PP +AWK has the following pre-defined arithmetic functions: +.PP +.TP \w'\fBsrand(\^\fIexpr\^\fB)\fR'u+1n +.BI atan2( y , " x" ) +returns the arctangent of +.I y/x +in radians. +.TP +.BI cos( expr ) +returns the cosine in radians. +.TP +.BI exp( expr ) +the exponential function. +.TP +.BI int( expr ) +truncates to integer. +.TP +.BI log( expr ) +the natural logarithm function. +.TP +.B rand() +returns a random number between 0 and 1. +.TP +.BI sin( expr ) +returns the sine in radians. +.TP +.BI sqrt( expr ) +the square root function. +.TP +.BI srand( expr ) +use +.I expr +as a new seed for the random number generator. If no +.I expr +is provided, the time of day will be used. +The return value is the previous seed for the random +number generator. +.SS String Functions +.PP +AWK has the following pre-defined string functions: +.PP +.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n" +\fBgsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR +for each substring matching the regular expression +.I r +in the string +.IR t , +substitute the string +.IR s , +and return the number of substitutions. +If +.I t +is not supplied, use +.BR $0 . +.TP +.BI index( s , " t" ) +returns the index of the string +.I t +in the string +.IR s , +or 0 if +.I t +is not present. +.TP +.BI length( s ) +returns the length of the string +.IR s , +or the length of +.B $0 +if +.I s +is not supplied. +.TP +.BI match( s , " r" ) +returns the position in +.I s +where the regular expression +.I r +occurs, or 0 if +.I r +is not present, and sets the values of +.B RSTART +and +.BR RLENGTH . +.TP +\fBsplit(\fIs\fB, \fIa\fB, \fIr\fB)\fR +splits the string +.I s +into the array +.I a +on the regular expression +.IR r , +and returns the number of fields. If +.I r +is omitted, +.B FS +is used instead. +.TP +.BI sprintf( fmt , " expr-list" ) +prints +.I expr-list +according to +.IR fmt , +and returns the resulting string. +.TP +\fBsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR +just like +.BR gsub() , +but only the first matching substring is replaced. +.TP +\fBsubstr(\fIs\fB, \fIi\fB, \fIn\fB)\fR +returns the +.IR n -character +substring of +.I s +starting at +.IR i . +If +.I n +is omitted, the rest of +.I s +is used. +.TP +.BI tolower( str ) +returns a copy of the string +.IR str , +with all the upper-case characters in +.I str +translated to their corresponding lower-case counterparts. +Non-alphabetic characters are left unchanged. +.TP +.BI toupper( str ) +returns a copy of the string +.IR str , +with all the lower-case characters in +.I str +translated to their corresponding upper-case counterparts. +Non-alphabetic characters are left unchanged. +.SS Time Functions +.PP +Since one of the primary uses of AWK programs is processing log files +that contain time stamp information, +.I gawk +provides the following two functions for obtaining time stamps and +formatting them. +.PP +.TP "\w'\fBsystime()\fR'u+1n" +.B systime() +returns the current time of day as the number of seconds since the Epoch +(Midnight UTC, January 1, 1970 on \*(PX systems). +.TP +\fBstrftime(\fIformat\fR, \fItimestamp\fB)\fR +formats +.I timestamp +according to the specification in +.IR format. +The +.I timestamp +should be of the same form as returned by +.BR systime() . +If +.I timestamp +is missing, the current time of day is used. +See the specification for the +.B strftime() +function in \*(AN C for the format conversions that are +guaranteed to be available. +A public-domain version of +.IR strftime (3) +and a man page for it are shipped with +.IR gawk ; +if that version was used to build +.IR gawk , +then all of the conversions described in that man page are available to +.IR gawk. +.SS String Constants +.PP +String constants in AWK are sequences of characters enclosed +between double quotes (\fB"\fR). Within strings, certain +.I "escape sequences" +are recognized, as in C. These are: +.PP +.TP \w'\fB\e\^\fIddd\fR'u+1n +.B \e\e +A literal backslash. +.TP +.B \ea +The ``alert'' character; usually the \s-1ASCII\s+1 \s-1BEL\s+1 character. +.TP +.B \eb +backspace. +.TP +.B \ef +form-feed. +.TP +.B \en +new line. +.TP +.B \er +carriage return. +.TP +.B \et +horizontal tab. +.TP +.B \ev +vertical tab. +.TP +.BI \ex "\^hex digits" +The character represented by the string of hexadecimal digits following +the +.BR \ex . +As in \*(AN C, all following hexadecimal digits are considered part of +the escape sequence. +(This feature should tell us something about language design by committee.) +E.g., "\ex1B" is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character. +.TP +.BI \e ddd +The character represented by the 1-, 2-, or 3-digit sequence of octal +digits. E.g. "\e033" is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character. +.TP +.BI \e c +The literal character +.IR c\^ . +.PP +The escape sequences may also be used inside constant regular expressions +(e.g., +.B "/[\ \et\ef\en\er\ev]/" +matches whitespace characters). +.SH FUNCTIONS +Functions in AWK are defined as follows: +.PP +.RS +\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR +.RE +.PP +Functions are executed when called from within the action parts of regular +pattern-action statements. Actual parameters supplied in the function +call are used to instantiate the formal parameters declared in the function. +Arrays are passed by reference, other variables are passed by value. +.PP +Since functions were not originally part of the AWK language, the provision +for local variables is rather clumsy: They are declared as extra parameters +in the parameter list. The convention is to separate local variables from +real parameters by extra spaces in the parameter list. For example: +.PP +.RS +.ft B +.nf +function f(p, q, a, b) { # a & b are local + ..... } + +/abc/ { ... ; f(1, 2) ; ... } +.fi +.ft R +.RE +.PP +The left parenthesis in a function call is required +to immediately follow the function name, +without any intervening white space. +This is to avoid a syntactic ambiguity with the concatenation operator. +This restriction does not apply to the built-in functions listed above. +.PP +Functions may call each other and may be recursive. +Function parameters used as local variables are initialized +to the null string and the number zero upon function invocation. +.PP +The word +.B func +may be used in place of +.BR function . +.SH EXAMPLES +.nf +Print and sort the login names of all users: + +.ft B + BEGIN { FS = ":" } + { print $1 | "sort" } + +.ft R +Count lines in a file: + +.ft B + { nlines++ } + END { print nlines } + +.ft R +Precede each line by its number in the file: + +.ft B + { print FNR, $0 } + +.ft R +Concatenate and line number (a variation on a theme): + +.ft B + { print NR, $0 } +.ft R +.fi +.SH SEE ALSO +.IR egrep (1) +.PP +.IR "The AWK Programming Language" , +Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger, +Addison-Wesley, 1988. ISBN 0-201-07981-X. +.PP +.IR "The GAWK Manual" , +Edition 0.15, published by the Free Software Foundation, 1993. +.SH POSIX COMPATIBILITY +A primary goal for +.I gawk +is compatibility with the \*(PX standard, as well as with the +latest version of \*(UX +.IR awk . +To this end, +.I gawk +incorporates the following user visible +features which are not described in the AWK book, +but are part of +.I awk +in System V Release 4, and are in the \*(PX standard. +.PP +The +.B \-v +option for assigning variables before program execution starts is new. +The book indicates that command line variable assignment happens when +.I awk +would otherwise open the argument as a file, which is after the +.B BEGIN +block is executed. However, in earlier implementations, when such an +assignment appeared before any file names, the assignment would happen +.I before +the +.B BEGIN +block was run. Applications came to depend on this ``feature.'' +When +.I awk +was changed to match its documentation, this option was added to +accomodate applications that depended upon the old behavior. +(This feature was agreed upon by both the AT&T and GNU developers.) +.PP +The +.B \-W +option for implementation specific features is from the \*(PX standard. +.PP +When processing arguments, +.I gawk +uses the special option ``\fB\-\^\-\fP'' to signal the end of +arguments, and warns about, but otherwise ignores, undefined options. +.PP +The AWK book does not define the return value of +.BR srand() . +The System V Release 4 version of \*(UX +.I awk +(and the \*(PX standard) +has it return the seed it was using, to allow keeping track +of random number sequences. Therefore +.B srand() +in +.I gawk +also returns its current seed. +.PP +Other new features are: +The use of multiple +.B \-f +options (from MKS +.IR awk ); +the +.B ENVIRON +array; the +.BR \ea , +and +.BR \ev +escape sequences (done originally in +.I gawk +and fed back into AT&T's); the +.B tolower() +and +.B toupper() +built-in functions (from AT&T); and the \*(AN C conversion specifications in +.B printf +(done first in AT&T's version). +.SH GNU EXTENSIONS +.I Gawk +has some extensions to \*(PX +.IR awk . +They are described in this section. All the extensions described here +can be disabled by +invoking +.I gawk +with the +.B "\-W compat" +option. +.PP +The following features of +.I gawk +are not available in +\*(PX +.IR awk . +.RS +.TP \w'\(bu'u+1n +\(bu +The +.B \ex +escape sequence. +.TP +\(bu +The +.B systime() +and +.B strftime() +functions. +.TP +\(bu +The special file names available for I/O redirection are not recognized. +.TP +\(bu +The +.B ARGIND +and +.B ERRNO +variables are not special. +.TP +\(bu +The +.B IGNORECASE +variable and its side-effects are not available. +.TP +\(bu +The +.B FIELDWIDTHS +variable and fixed width field splitting. +.TP +\(bu +No path search is performed for files named via the +.B \-f +option. Therefore the +.B AWKPATH +environment variable is not special. +.TP +\(bu +The use of +.B "next file" +to abandon processing of the current input file. +.RE +.PP +The AWK book does not define the return value of the +.B close() +function. +.IR Gawk\^ 's +.B close() +returns the value from +.IR fclose (3), +or +.IR pclose (3), +when closing a file or pipe, respectively. +.PP +When +.I gawk +is invoked with the +.B "\-W compat" +option, +if the +.I fs +argument to the +.B \-F +option is ``t'', then +.B FS +will be set to the tab character. +Since this is a rather ugly special case, it is not the default behavior. +This behavior also does not occur if +.B \-Wposix +has been specified. +.ig +.PP +If +.I gawk +was compiled for debugging, it will +accept the following additional options: +.TP +.PD 0 +.B \-Wparsedebug +.TP +.PD +.B \-\^\-parsedebug +Turn on +.IR yacc (1) +or +.IR bison (1) +debugging output during program parsing. +This option should only be of interest to the +.I gawk +maintainers, and may not even be compiled into +.IR gawk . +.. +.SH HISTORICAL FEATURES +There are two features of historical AWK implementations that +.I gawk +supports. +First, it is possible to call the +.B length() +built-in function not only with no argument, but even without parentheses! +Thus, +.RS +.PP +.ft B +a = length +.ft R +.RE +.PP +is the same as either of +.RS +.PP +.ft B +a = length() +.br +a = length($0) +.ft R +.RE +.PP +This feature is marked as ``deprecated'' in the \*(PX standard, and +.I gawk +will issue a warning about its use if +.B \-Wlint +is specified on the command line. +.PP +The other feature is the use of the +.B continue +statement outside the body of a +.BR while , +.BR for , +or +.B do +loop. Traditional AWK implementations have treated such usage as +equivalent to the +.B next +statement. +.I Gawk +will support this usage if +.B \-Wposix +has not been specified. +.SH BUGS +The +.B \-F +option is not necessary given the command line variable assignment feature; +it remains only for backwards compatibility. +.PP +If your system actually has support for +.B /dev/fd +and the associated +.BR /dev/stdin , +.BR /dev/stdout , +and +.B /dev/stderr +files, you may get different output from +.I gawk +than you would get on a system without those files. When +.I gawk +interprets these files internally, it synchronizes output to the standard +output with output to +.BR /dev/stdout , +while on a system with those files, the output is actually to different +open files. +Caveat Emptor. +.SH VERSION INFORMATION +This man page documents +.IR gawk , +version 2.15. +.PP +Starting with the 2.15 version of +.IR gawk , +the +.BR \-c , +.BR \-V , +.BR \-C , +.ig +.BR \-D , +.. +.BR \-a , +and +.B \-e +options of the 2.11 version are no longer recognized. +.SH AUTHORS +The original version of \*(UX +.I awk +was designed and implemented by Alfred Aho, +Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan +continues to maintain and enhance it. +.PP +Paul Rubin and Jay Fenlason, +of the Free Software Foundation, wrote +.IR gawk , +to be compatible with the original version of +.I awk +distributed in Seventh Edition \*(UX. +John Woods contributed a number of bug fixes. +David Trueman, with contributions +from Arnold Robbins, made +.I gawk +compatible with the new version of \*(UX +.IR awk . +.PP +The initial DOS port was done by Conrad Kwok and Scott Garfinkle. +Scott Deifik is the current DOS maintainer. Pat Rankin did the +port to VMS, and Michal Jaegermann did the port to the Atari ST. +.SH ACKNOWLEDGEMENTS +Brian Kernighan of Bell Labs +provided valuable assistance during testing and debugging. +We thank him. |