diff options
Diffstat (limited to 'contrib/perl5/pod/perlform.pod')
-rw-r--r-- | contrib/perl5/pod/perlform.pod | 337 |
1 files changed, 337 insertions, 0 deletions
diff --git a/contrib/perl5/pod/perlform.pod b/contrib/perl5/pod/perlform.pod new file mode 100644 index 0000000..6b65e04 --- /dev/null +++ b/contrib/perl5/pod/perlform.pod @@ -0,0 +1,337 @@ +=head1 NAME + +perlform - Perl formats + +=head1 DESCRIPTION + +Perl has a mechanism to help you generate simple reports and charts. To +facilitate this, Perl helps you code up your output page close to how it +will look when it's printed. It can keep track of things like how many +lines are on a page, what page you're on, when to print page headers, +etc. Keywords are borrowed from FORTRAN: format() to declare and write() +to execute; see their entries in L<perlfunc>. Fortunately, the layout is +much more legible, more like BASIC's PRINT USING statement. Think of it +as a poor man's nroff(1). + +Formats, like packages and subroutines, are declared rather than +executed, so they may occur at any point in your program. (Usually it's +best to keep them all together though.) They have their own namespace +apart from all the other "types" in Perl. This means that if you have a +function named "Foo", it is not the same thing as having a format named +"Foo". However, the default name for the format associated with a given +filehandle is the same as the name of the filehandle. Thus, the default +format for STDOUT is named "STDOUT", and the default format for filehandle +TEMP is named "TEMP". They just look the same. They aren't. + +Output record formats are declared as follows: + + format NAME = + FORMLIST + . + +If name is omitted, format "STDOUT" is defined. FORMLIST consists of +a sequence of lines, each of which may be one of three types: + +=over 4 + +=item 1. + +A comment, indicated by putting a '#' in the first column. + +=item 2. + +A "picture" line giving the format for one output line. + +=item 3. + +An argument line supplying values to plug into the previous picture line. + +=back + +Picture lines are printed exactly as they look, except for certain fields +that substitute values into the line. Each field in a picture line starts +with either "@" (at) or "^" (caret). These lines do not undergo any kind +of variable interpolation. The at field (not to be confused with the array +marker @) is the normal kind of field; the other kind, caret fields, are used +to do rudimentary multi-line text block filling. The length of the field +is supplied by padding out the field with multiple "E<lt>", "E<gt>", or "|" +characters to specify, respectively, left justification, right +justification, or centering. If the variable would exceed the width +specified, it is truncated. + +As an alternate form of right justification, you may also use "#" +characters (with an optional ".") to specify a numeric field. This way +you can line up the decimal points. If any value supplied for these +fields contains a newline, only the text up to the newline is printed. +Finally, the special field "@*" can be used for printing multi-line, +nontruncated values; it should appear by itself on a line. + +The values are specified on the following line in the same order as +the picture fields. The expressions providing the values should be +separated by commas. The expressions are all evaluated in a list context +before the line is processed, so a single list expression could produce +multiple list elements. The expressions may be spread out to more than +one line if enclosed in braces. If so, the opening brace must be the first +token on the first line. If an expression evaluates to a number with a +decimal part, and if the corresponding picture specifies that the decimal +part should appear in the output (that is, any picture except multiple "#" +characters B<without> an embedded "."), the character used for the decimal +point is B<always> determined by the current LC_NUMERIC locale. This +means that, if, for example, the run-time environment happens to specify a +German locale, "," will be used instead of the default ".". See +L<perllocale> and L<"WARNINGS"> for more information. + +Picture fields that begin with ^ rather than @ are treated specially. +With a # field, the field is blanked out if the value is undefined. For +other field types, the caret enables a kind of fill mode. Instead of an +arbitrary expression, the value supplied must be a scalar variable name +that contains a text string. Perl puts as much text as it can into the +field, and then chops off the front of the string so that the next time +the variable is referenced, more of the text can be printed. (Yes, this +means that the variable itself is altered during execution of the write() +call, and is not returned.) Normally you would use a sequence of fields +in a vertical stack to print out a block of text. You might wish to end +the final field with the text "...", which will appear in the output if +the text was too long to appear in its entirety. You can change which +characters are legal to break on by changing the variable C<$:> (that's +$FORMAT_LINE_BREAK_CHARACTERS if you're using the English module) to a +list of the desired characters. + +Using caret fields can produce variable length records. If the text +to be formatted is short, you can suppress blank lines by putting a +"~" (tilde) character anywhere in the line. The tilde will be translated +to a space upon output. If you put a second tilde contiguous to the +first, the line will be repeated until all the fields on the line are +exhausted. (If you use a field of the at variety, the expression you +supply had better not give the same value every time forever!) + +Top-of-form processing is by default handled by a format with the +same name as the current filehandle with "_TOP" concatenated to it. +It's triggered at the top of each page. See L<perlfunc/write>. + +Examples: + + # a report on the /etc/passwd file + format STDOUT_TOP = + Passwd File + Name Login Office Uid Gid Home + ------------------------------------------------------------------ + . + format STDOUT = + @<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<< + $name, $login, $office,$uid,$gid, $home + . + + + # a report from a bug report form + format STDOUT_TOP = + Bug Reports + @<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>> + $system, $%, $date + ------------------------------------------------------------------ + . + format STDOUT = + Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< + $subject + Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<< + $index, $description + Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<< + $priority, $date, $description + From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<< + $from, $description + Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<< + $programmer, $description + ~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<< + $description + ~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<< + $description + ~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<< + $description + ~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<< + $description + ~ ^<<<<<<<<<<<<<<<<<<<<<<<... + $description + . + +It is possible to intermix print()s with write()s on the same output +channel, but you'll have to handle C<$-> (C<$FORMAT_LINES_LEFT>) +yourself. + +=head2 Format Variables + +The current format name is stored in the variable C<$~> (C<$FORMAT_NAME>), +and the current top of form format name is in C<$^> (C<$FORMAT_TOP_NAME>). +The current output page number is stored in C<$%> (C<$FORMAT_PAGE_NUMBER>), +and the number of lines on the page is in C<$=> (C<$FORMAT_LINES_PER_PAGE>). +Whether to autoflush output on this handle is stored in C<$|> +(C<$OUTPUT_AUTOFLUSH>). The string output before each top of page (except +the first) is stored in C<$^L> (C<$FORMAT_FORMFEED>). These variables are +set on a per-filehandle basis, so you'll need to select() into a different +one to affect them: + + select((select(OUTF), + $~ = "My_Other_Format", + $^ = "My_Top_Format" + )[0]); + +Pretty ugly, eh? It's a common idiom though, so don't be too surprised +when you see it. You can at least use a temporary variable to hold +the previous filehandle: (this is a much better approach in general, +because not only does legibility improve, you now have intermediary +stage in the expression to single-step the debugger through): + + $ofh = select(OUTF); + $~ = "My_Other_Format"; + $^ = "My_Top_Format"; + select($ofh); + +If you use the English module, you can even read the variable names: + + use English; + $ofh = select(OUTF); + $FORMAT_NAME = "My_Other_Format"; + $FORMAT_TOP_NAME = "My_Top_Format"; + select($ofh); + +But you still have those funny select()s. So just use the FileHandle +module. Now, you can access these special variables using lowercase +method names instead: + + use FileHandle; + format_name OUTF "My_Other_Format"; + format_top_name OUTF "My_Top_Format"; + +Much better! + +=head1 NOTES + +Because the values line may contain arbitrary expressions (for at fields, +not caret fields), you can farm out more sophisticated processing +to other functions, like sprintf() or one of your own. For example: + + format Ident = + @<<<<<<<<<<<<<<< + &commify($n) + . + +To get a real at or caret into the field, do this: + + format Ident = + I have an @ here. + "@" + . + +To center a whole line of text, do something like this: + + format Ident = + @||||||||||||||||||||||||||||||||||||||||||||||| + "Some text line" + . + +There is no builtin way to say "float this to the right hand side +of the page, however wide it is." You have to specify where it goes. +The truly desperate can generate their own format on the fly, based +on the current number of columns, and then eval() it: + + $format = "format STDOUT = \n" + . '^' . '<' x $cols . "\n" + . '$entry' . "\n" + . "\t^" . "<" x ($cols-8) . "~~\n" + . '$entry' . "\n" + . ".\n"; + print $format if $Debugging; + eval $format; + die $@ if $@; + +Which would generate a format looking something like this: + + format STDOUT = + ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< + $entry + ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~ + $entry + . + +Here's a little program that's somewhat like fmt(1): + + format = + ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~~ + $_ + + . + + $/ = ''; + while (<>) { + s/\s*\n\s*/ /g; + write; + } + +=head2 Footers + +While $FORMAT_TOP_NAME contains the name of the current header format, +there is no corresponding mechanism to automatically do the same thing +for a footer. Not knowing how big a format is going to be until you +evaluate it is one of the major problems. It's on the TODO list. + +Here's one strategy: If you have a fixed-size footer, you can get footers +by checking $FORMAT_LINES_LEFT before each write() and print the footer +yourself if necessary. + +Here's another strategy: Open a pipe to yourself, using C<open(MYSELF, "|-")> +(see L<perlfunc/open()>) and always write() to MYSELF instead of STDOUT. +Have your child process massage its STDIN to rearrange headers and footers +however you like. Not very convenient, but doable. + +=head2 Accessing Formatting Internals + +For low-level access to the formatting mechanism. you may use formline() +and access C<$^A> (the $ACCUMULATOR variable) directly. + +For example: + + $str = formline <<'END', 1,2,3; + @<<< @||| @>>> + END + + print "Wow, I just stored `$^A' in the accumulator!\n"; + +Or to make an swrite() subroutine, which is to write() what sprintf() +is to printf(), do this: + + use Carp; + sub swrite { + croak "usage: swrite PICTURE ARGS" unless @_; + my $format = shift; + $^A = ""; + formline($format,@_); + return $^A; + } + + $string = swrite(<<'END', 1, 2, 3); + Check me out + @<<< @||| @>>> + END + print $string; + +=head1 WARNINGS + +The lone dot that ends a format can also prematurely end a mail +message passing through a misconfigured Internet mailer (and based on +experience, such misconfiguration is the rule, not the exception). So +when sending format code through mail, you should indent it so that +the format-ending dot is not on the left margin; this will prevent +SMTP cutoff. + +Lexical variables (declared with "my") are not visible within a +format unless the format is declared within the scope of the lexical +variable. (They weren't visible at all before version 5.001.) + +Formats are the only part of Perl that unconditionally use information +from a program's locale; if a program's environment specifies an +LC_NUMERIC locale, it is always used to specify the decimal point +character in formatted output. Perl ignores all other aspects of locale +handling unless the C<use locale> pragma is in effect. Formatted output +cannot be controlled by C<use locale> because the pragma is tied to the +block structure of the program, and, for historical reasons, formats +exist outside that block structure. See L<perllocale> for further +discussion of locale handling. |