Diffstat (limited to 'contrib/perl5/pod/perlsyn.pod')
-rw-r--r-- | contrib/perl5/pod/perlsyn.pod | 617 |
1 files changed, 617 insertions, 0 deletions
diff --git a/contrib/perl5/pod/perlsyn.pod b/contrib/perl5/pod/perlsyn.pod new file mode 100644 index 0000000..8321235 --- /dev/null +++ b/contrib/perl5/pod/perlsyn.pod @@ -0,0 +1,617 @@ +=head1 NAME + +perlsyn - Perl syntax + +=head1 DESCRIPTION + +A Perl script consists of a sequence of declarations and statements. +The only things that need to be declared in Perl are report formats +and subroutines. See the sections below for more information on those +declarations. All uninitialized user-created objects are assumed to +start with a C<null> or C<0> value until they are defined by some explicit +operation such as assignment. (Though you can get warnings about the +use of undefined values if you like.) The sequence of statements is +executed just once, unlike in B<sed> and B<awk> scripts, where the +sequence of statements is executed for each input line. While this means +that you must explicitly loop over the lines of your input file (or +files), it also means you have much more control over which files and +which lines you look at. (Actually, I'm lying--it is possible to do an +implicit loop with either the B<-n> or B<-p> switch. It's just not the +mandatory default like it is in B<sed> and B<awk>.) + +=head2 Declarations + +Perl is, for the most part, a free-form language. (The only +exception to this is format declarations, for obvious reasons.) Comments +are indicated by the C<"#"> character, and extend to the end of the line. If +you attempt to use C</* */> C-style comments, it will be interpreted +either as division or pattern matching, depending on the context, and C++ +C<//> comments just look like a null regular expression, so don't do +that. + +A declaration can be put anywhere a statement can, but has no effect on +the execution of the primary sequence of statements--declarations all +take effect at compile time. Typically all the declarations are put at +the beginning or the end of the script. 
However, if you're using +lexically-scoped private variables created with C<my()>, you'll have to make sure +your format or subroutine definition is within the same block scope +as the my if you expect to be able to access those private variables. + +Declaring a subroutine allows a subroutine name to be used as if it were a +list operator from that point forward in the program. You can declare a +subroutine without defining it by saying C<sub name>, thus: + + sub myname; + $me = myname $0 or die "can't get myname"; + +Note that it functions as a list operator, not as a unary operator; so +be careful to use C<or> instead of C<||> in this case. However, if +you were to declare the subroutine as C<sub myname ($)>, then +C<myname> would function as a unary operator, so either C<or> or +C<||> would work. + +Subroutine declarations can also be loaded up with the C<require> statement +or both loaded and imported into your namespace with a C<use> statement. +See L<perlmod> for details on this. + +A statement sequence may contain declarations of lexically-scoped +variables, but apart from declaring a variable name, the declaration acts +like an ordinary statement, and is elaborated within the sequence of +statements as if it were an ordinary statement. That means it actually +has both compile-time and run-time effects. + +=head2 Simple statements + +The only kind of simple statement is an expression evaluated for its +side effects. Every simple statement must be terminated with a +semicolon, unless it is the final statement in a block, in which case +the semicolon is optional. (A semicolon is still encouraged there if the +block takes up more than one line, because you may eventually add another line.) +Note that there are some operators like C<eval {}> and C<do {}> that look +like compound statements, but aren't (they're just TERMs in an expression), +and thus need an explicit termination if used as the last item in a statement. 
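The termination rule for C<do {}> and C<eval {}> can be sketched as follows. This is a minimal illustration, assuming a Perl 5 with C<strict> and C<warnings> available; the variable names C<$sum>, C<$ok>, and C<$err> are made up for the example and do not come from the original text.

```perl
use strict;
use warnings;

# do BLOCK is only a term in an expression, so the assignment
# statement that contains it needs its own terminating semicolon.
my $sum = do { 1 + 2 };

# eval BLOCK is likewise a term; the semicolon ends the assignment.
# The final statement inside the block ("1") needs no semicolon.
my $ok  = eval { die "oops\n"; 1 };
my $err = $@;

print "sum=$sum\n";
if (!$ok) { print "eval failed: $err" }   # last statement in a block: semicolon optional
```

Leaving out the semicolon after the closing brace of either C<do {}> or C<eval {}> here would be a syntax error, because another statement follows it.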
+ +Any simple statement may optionally be followed by a I<SINGLE> modifier, +just before the terminating semicolon (or block ending). The possible +modifiers are: + + if EXPR + unless EXPR + while EXPR + until EXPR + foreach EXPR + +The C<if> and C<unless> modifiers have the expected semantics, +presuming you're a speaker of English. The C<foreach> modifier is an +iterator: For each value in EXPR, it aliases C<$_> to the value and +executes the statement. The C<while> and C<until> modifiers have the +usual "C<while> loop" semantics (conditional evaluated first), except +when applied to a C<do>-BLOCK (or to the now-deprecated C<do>-SUBROUTINE +statement), in which case the block executes once before the +conditional is evaluated. This is so that you can write loops like: + + do { + $line = <STDIN>; + ... + } until $line eq ".\n"; + +See L<perlfunc/do>. Note also that the loop control statements described +later will I<NOT> work in this construct, because modifiers don't take +loop labels. Sorry. You can always put another block inside of it +(for C<next>) or around it (for C<last>) to do that sort of thing. +For C<next>, just double the braces: + + do {{ + next if $x == $y; + # do something here + }} until $x++ > $z; + +For C<last>, you have to be more elaborate: + + LOOP: { + do { + last if $x = $y**2; + # do something here + } while $x++ <= $z; + } + +=head2 Compound statements + +In Perl, a sequence of statements that defines a scope is called a block. +Sometimes a block is delimited by the file containing it (in the case +of a required file, or the program as a whole), and sometimes a block +is delimited by the extent of a string (in the case of an eval). + +But generally, a block is delimited by curly brackets, also known as braces. +We will call this syntactic construct a BLOCK. + +The following compound statements may be used to control flow: + + if (EXPR) BLOCK + if (EXPR) BLOCK else BLOCK + if (EXPR) BLOCK elsif (EXPR) BLOCK ... 
else BLOCK + LABEL while (EXPR) BLOCK + LABEL while (EXPR) BLOCK continue BLOCK + LABEL for (EXPR; EXPR; EXPR) BLOCK + LABEL foreach VAR (LIST) BLOCK + LABEL BLOCK continue BLOCK + +Note that, unlike C and Pascal, these are defined in terms of BLOCKs, +not statements. This means that the curly brackets are I<required>--no +dangling statements allowed. If you want to write conditionals without +curly brackets there are several other ways to do it. The following +all do the same thing: + + if (!open(FOO)) { die "Can't open $FOO: $!"; } + die "Can't open $FOO: $!" unless open(FOO); + open(FOO) or die "Can't open $FOO: $!"; # FOO or bust! + open(FOO) ? 'hi mom' : die "Can't open $FOO: $!"; + # a bit exotic, that last one + +The C<if> statement is straightforward. Because BLOCKs are always +bounded by curly brackets, there is never any ambiguity about which +C<if> an C<else> goes with. If you use C<unless> in place of C<if>, +the sense of the test is reversed. + +The C<while> statement executes the block as long as the expression is +true (does not evaluate to the null string (C<"">) or C<0> or C<"0")>. The LABEL is +optional, and if present, consists of an identifier followed by a colon. +The LABEL identifies the loop for the loop control statements C<next>, +C<last>, and C<redo>. If the LABEL is omitted, the loop control statement +refers to the innermost enclosing loop. This may include dynamically +looking back your call-stack at run time to find the LABEL. Such +desperate behavior triggers a warning if you use the B<-w> flag. + +If there is a C<continue> BLOCK, it is always executed just before the +conditional is about to be evaluated again, just like the third part of a +C<for> loop in C. Thus it can be used to increment a loop variable, even +when the loop has been continued via the C<next> statement (which is +similar to the C C<continue> statement). 
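As a small sketch of how a C<continue> BLOCK interacts with C<next>, the following loop skips odd numbers while still incrementing its counter. The names C<$i> and C<@seen> are illustrative assumptions, not part of the original text.

```perl
use strict;
use warnings;

my $i = 0;
my @seen;

while ($i < 5) {
    next if $i % 2;     # skip odd values; the continue block still runs
    push @seen, $i;
} continue {
    $i++;               # executed just before the condition is tested again
}

print "@seen\n";        # prints "0 2 4"
```

If the increment lived inside the main block after the C<next>, the loop would never advance past the first odd value.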
+ +=head2 Loop Control + +The C<next> command is like the C<continue> statement in C; it starts +the next iteration of the loop: + + LINE: while (<STDIN>) { + next LINE if /^#/; # discard comments + ... + } + +The C<last> command is like the C<break> statement in C (as used in +loops); it immediately exits the loop in question. The +C<continue> block, if any, is not executed: + + LINE: while (<STDIN>) { + last LINE if /^$/; # exit when done with header + ... + } + +The C<redo> command restarts the loop block without evaluating the +conditional again. The C<continue> block, if any, is I<not> executed. +This command is normally used by programs that want to lie to themselves +about what was just input. + +For example, when processing a file like F</etc/termcap>. +If your input lines might end in backslashes to indicate continuation, you +want to skip ahead and get the next record. + + while (<>) { + chomp; + if (s/\\$//) { + $_ .= <>; + redo unless eof(); + } + # now process $_ + } + +which is Perl short-hand for the more explicitly written version: + + LINE: while (defined($line = <ARGV>)) { + chomp($line); + if ($line =~ s/\\$//) { + $line .= <ARGV>; + redo LINE unless eof(); # not eof(ARGV)! + } + # now process $line + } + +Note that if there were a C<continue> block on the above code, it would get +executed even on discarded lines. This is often used to reset line counters +or C<?pat?> one-time matches. + + # inspired by :1,$g/fred/s//WILMA/ + while (<>) { + ?(fred)? && s//WILMA $1 WILMA/; + ?(barney)? && s//BETTY $1 BETTY/; + ?(homer)? && s//MARGE $1 MARGE/; + } continue { + print "$ARGV $.: $_"; + close ARGV if eof(); # reset $. + reset if eof(); # reset ?pat? + } + +If the word C<while> is replaced by the word C<until>, the sense of the +test is reversed, but the conditional is still tested before the first +iteration. + +The loop control statements don't work in an C<if> or C<unless>, since +they aren't loops. 
You can double the braces to make them such, though. + + if (/pattern/) {{ + next if /fred/; + next if /barney/; + # do something here + }} + +The form C<while/if BLOCK BLOCK>, available in Perl 4, is no longer +available. Replace any occurrence of C<if BLOCK> by C<if (do BLOCK)>. + +=head2 For Loops + +Perl's C-style C<for> loop works exactly like the corresponding C<while> loop; +that means that this: + + for ($i = 1; $i < 10; $i++) { + ... + } + +is the same as this: + + $i = 1; + while ($i < 10) { + ... + } continue { + $i++; + } + +(There is one minor difference: The first form implies a lexical scope +for variables declared with C<my> in the initialization expression.) + +Besides the normal array index looping, C<for> can lend itself +to many other interesting applications. Here's one that avoids the +problem you get into if you explicitly test for end-of-file on +an interactive file descriptor causing your program to appear to +hang. + + $on_a_tty = -t STDIN && -t STDOUT; + sub prompt { print "yes? " if $on_a_tty } + for ( prompt(); <STDIN>; prompt() ) { + # do something + } + +=head2 Foreach Loops + +The C<foreach> loop iterates over a normal list value and sets the +variable VAR to be each element of the list in turn. If the variable +is preceded with the keyword C<my>, then it is lexically scoped, and +is therefore visible only within the loop. Otherwise, the variable is +implicitly local to the loop and regains its former value upon exiting +the loop. If the variable was previously declared with C<my>, it uses +that variable instead of the global one, but it's still localized to +the loop. (Note that a lexically scoped variable can cause problems +if you have subroutine or format declarations within the loop which +refer to it.) + +The C<foreach> keyword is actually a synonym for the C<for> keyword, so +you can use C<foreach> for readability or C<for> for brevity. 
(Or because +the Bourne shell is more familiar to you than I<csh>, so writing C<for> +comes more naturally.) If VAR is omitted, C<$_> is set to each value. +If any element of LIST is an lvalue, you can modify it by modifying VAR +inside the loop. That's because the C<foreach> loop index variable is +an implicit alias for each item in the list that you're looping over. + +If any part of LIST is an array, C<foreach> will get very confused if +you add or remove elements within the loop body, for example with +C<splice>. So don't do that. + +C<foreach> probably won't do what you expect if VAR is a tied or other +special variable. Don't do that either. + +Examples: + + for (@ary) { s/foo/bar/ } + + foreach my $elem (@elements) { + $elem *= 2; + } + + for $count (10,9,8,7,6,5,4,3,2,1,'BOOM') { + print $count, "\n"; sleep(1); + } + + for (1..15) { print "Merry Christmas\n"; } + + foreach $item (split(/:[\\\n:]*/, $ENV{TERMCAP})) { + print "Item: $item\n"; + } + +Here's how a C programmer might code up a particular algorithm in Perl: + + for (my $i = 0; $i < @ary1; $i++) { + for (my $j = 0; $j < @ary2; $j++) { + if ($ary1[$i] > $ary2[$j]) { + last; # can't go to outer :-( + } + $ary1[$i] += $ary2[$j]; + } + # this is where that last takes me + } + +Whereas here's how a Perl programmer more comfortable with the idiom might +do it: + + OUTER: foreach my $wid (@ary1) { + INNER: foreach my $jet (@ary2) { + next OUTER if $wid > $jet; + $wid += $jet; + } + } + +See how much easier this is? It's cleaner, safer, and faster. It's +cleaner because it's less noisy. It's safer because if code gets added +between the inner and outer loops later on, the new code won't be +accidentally executed. The C<next> explicitly iterates the other loop +rather than merely terminating the inner one. And it's faster because +Perl executes a C<foreach> statement more rapidly than it would the +equivalent C<for> loop. 
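To make the labeled-loop idiom concrete, here is a runnable sketch with sample data. The contents of C<@ary1> and C<@ary2> are assumptions chosen only for illustration. It also shows that the loop variable is an alias: changing C<$wid> changes the corresponding element of C<@ary1>.

```perl
use strict;
use warnings;

my @ary1 = (1, 10);
my @ary2 = (2, 3, 20);

OUTER: foreach my $wid (@ary1) {
    foreach my $jet (@ary2) {
        next OUTER if $wid > $jet;   # give up on this $wid, move to the next one
        $wid += $jet;                # $wid aliases the current element of @ary1
    }
}

print "@ary1\n";                     # prints "26 10"
```

The first element accumulates 2, 3, and 20 because it never exceeds the value it is compared against. The second element is left untouched because C<next OUTER> fires on its first comparison.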
+ +=head2 Basic BLOCKs and Switch Statements + +A BLOCK by itself (labeled or not) is semantically equivalent to a +loop that executes once. Thus you can use any of the loop control +statements in it to leave or restart the block. (Note that this is +I<NOT> true in C<eval{}>, C<sub{}>, or contrary to popular belief +C<do{}> blocks, which do I<NOT> count as loops.) The C<continue> +block is optional. + +The BLOCK construct is particularly nice for doing case +structures. + + SWITCH: { + if (/^abc/) { $abc = 1; last SWITCH; } + if (/^def/) { $def = 1; last SWITCH; } + if (/^xyz/) { $xyz = 1; last SWITCH; } + $nothing = 1; + } + +There is no official C<switch> statement in Perl, because there are +already several ways to write the equivalent. In addition to the +above, you could write + + SWITCH: { + $abc = 1, last SWITCH if /^abc/; + $def = 1, last SWITCH if /^def/; + $xyz = 1, last SWITCH if /^xyz/; + $nothing = 1; + } + +(That's actually not as strange as it looks once you realize that you can +use loop control "operators" within an expression. That's just the normal +C comma operator.) 
+ +or + + SWITCH: { + /^abc/ && do { $abc = 1; last SWITCH; }; + /^def/ && do { $def = 1; last SWITCH; }; + /^xyz/ && do { $xyz = 1; last SWITCH; }; + $nothing = 1; + } + +or formatted so it stands out more as a "proper" C<switch> statement: + + SWITCH: { + /^abc/ && do { + $abc = 1; + last SWITCH; + }; + + /^def/ && do { + $def = 1; + last SWITCH; + }; + + /^xyz/ && do { + $xyz = 1; + last SWITCH; + }; + $nothing = 1; + } + +or + + SWITCH: { + /^abc/ and $abc = 1, last SWITCH; + /^def/ and $def = 1, last SWITCH; + /^xyz/ and $xyz = 1, last SWITCH; + $nothing = 1; + } + +or even, horrors, + + if (/^abc/) + { $abc = 1 } + elsif (/^def/) + { $def = 1 } + elsif (/^xyz/) + { $xyz = 1 } + else + { $nothing = 1 } + +A common idiom for a C<switch> statement is to use C<foreach>'s aliasing to make +a temporary assignment to C<$_> for convenient matching: + + SWITCH: for ($where) { + /In Card Names/ && do { push @flags, '-e'; last; }; + /Anywhere/ && do { push @flags, '-h'; last; }; + /In Rulings/ && do { last; }; + die "unknown value for form variable where: `$where'"; + } + +Another interesting approach to a switch statement is to arrange +for a C<do> block to return the proper value: + + $amode = do { + if ($flag & O_RDONLY) { "r" } # XXX: isn't this 0? + elsif ($flag & O_WRONLY) { ($flag & O_APPEND) ? "a" : "w" } + elsif ($flag & O_RDWR) { + if ($flag & O_CREAT) { "w+" } + else { ($flag & O_APPEND) ? "a+" : "r+" } + } + }; + +Or + + print do { + ($flags & O_WRONLY) ? "write-only" : + ($flags & O_RDWR) ? "read-write" : + "read-only"; + }; + +Or if you are certain that all the C<&&> clauses are true, you can use +something like this, which "switches" on the value of the +C<HTTP_USER_AGENT> envariable. 
+ + #!/usr/bin/perl + # pick out jargon file page based on browser + $dir = 'http://www.wins.uva.nl/~mes/jargon'; + for ($ENV{HTTP_USER_AGENT}) { + $page = /Mac/ && 'm/Macintrash.html' + || /Win(dows )?NT/ && 'e/evilandrude.html' + || /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html' + || /Linux/ && 'l/Linux.html' + || /HP-UX/ && 'h/HP-SUX.html' + || /SunOS/ && 's/ScumOS.html' + || 'a/AppendixB.html'; + } + print "Location: $dir/$page\015\012\015\012"; + +That kind of switch statement only works when you know the C<&&> clauses +will be true. If you don't, the previous C<?:> example should be used. + +You might also consider writing a hash instead of synthesizing a C<switch> +statement. + +=head2 Goto + +Although not for the faint of heart, Perl does support a C<goto> statement. +A loop's LABEL is not actually a valid target for a C<goto>; +it's just the name of the loop. There are three forms: C<goto>-LABEL, +C<goto>-EXPR, and C<goto>-&NAME. + +The C<goto>-LABEL form finds the statement labeled with LABEL and resumes +execution there. It may not be used to go into any construct that +requires initialization, such as a subroutine or a C<foreach> loop. It +also can't be used to go into a construct that is optimized away. It +can be used to go almost anywhere else within the dynamic scope, +including out of subroutines, but it's usually better to use some other +construct such as C<last> or C<die>. The author of Perl has never felt the +need to use this form of C<goto> (in Perl, that is--C is another matter). + +The C<goto>-EXPR form expects a label name, whose scope will be resolved +dynamically. This allows for computed C<goto>s per FORTRAN, but isn't +necessarily recommended if you're optimizing for maintainability: + + goto ("FOO", "BAR", "GLARCH")[$i]; + +The C<goto>-&NAME form is highly magical, and substitutes a call to the +named subroutine for the currently running subroutine. 
This is used by +C<AUTOLOAD()> subroutines that wish to load another subroutine and then +pretend that the other subroutine had been called in the first place +(except that any modifications to C<@_> in the current subroutine are +propagated to the other subroutine.) After the C<goto>, not even C<caller()> +will be able to tell that this routine was called first. + +In almost all cases like this, it's usually a far, far better idea to use the +structured control flow mechanisms of C<next>, C<last>, or C<redo> instead of +resorting to a C<goto>. For certain applications, the catch and throw pair of +C<eval{}> and die() for exception processing can also be a prudent approach. + +=head2 PODs: Embedded Documentation + +Perl has a mechanism for intermixing documentation with source code. +While it's expecting the beginning of a new statement, if the compiler +encounters a line that begins with an equal sign and a word, like this + + =head1 Here There Be Pods! + +Then that text and all remaining text up through and including a line +beginning with C<=cut> will be ignored. The format of the intervening +text is described in L<perlpod>. + +This allows you to intermix your source code +and your documentation text freely, as in + + =item snazzle($) + + The snazzle() function will behave in the most spectacular + form that you can possibly imagine, not even excepting + cybernetic pyrotechnics. + + =cut back to the compiler, nuff of this pod stuff! + + sub snazzle($) { + my $thingie = shift; + ......... + } + +Note that pod translators should look at only paragraphs beginning +with a pod directive (it makes parsing easier), whereas the compiler +actually knows to look for pod escapes even in the middle of a +paragraph. This means that the following secret stuff will be +ignored by both the compiler and the translators. + + $a=3; + =secret stuff + warn "Neither POD nor CODE!?" 
+ =cut back + print "got $a\n"; + +You probably shouldn't rely upon the C<warn()> being podded out forever. +Not all pod translators are well-behaved in this regard, and perhaps +the compiler will become pickier. + +One may also use pod directives to quickly comment out a section +of code. + +=head2 Plain Old Comments (Not!) + +Much like the C preprocessor, Perl can process line directives. Using +this, one can control Perl's idea of filenames and line numbers in +error or warning messages (especially for strings that are processed +with C<eval()>). The syntax for this mechanism is the same as for most +C preprocessors: it matches the regular expression +C</^#\s*line\s+(\d+)\s*(?:\s"([^"]*)")?/> with C<$1> being the line +number for the next line, and C<$2> being the optional filename +(specified within quotes). + +Here are some examples that you should be able to type into your command +shell: + + % perl + # line 200 "bzzzt" + # the `#' on the previous line must be the first char on line + die 'foo'; + __END__ + foo at bzzzt line 201. + + % perl + # line 200 "bzzzt" + eval qq[\n#line 2001 ""\ndie 'foo']; print $@; + __END__ + foo at - line 2001. + + % perl + eval qq[\n#line 200 "foo bar"\ndie 'foo']; print $@; + __END__ + foo at foo bar line 200. + + % perl + # line 345 "goop" + eval "\n#line " . __LINE__ . ' "' . __FILE__ ."\"\ndie 'foo'"; + print $@; + __END__ + foo at goop line 345. + +=cut |
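The line-directive mechanism can also be tried from a script file instead of typing into the shell. The following is a minimal sketch. The file name C<virtual.pl>, the line number 42, and the variable names are all made up for the example.

```perl
use strict;
use warnings;

# The leading "\n" puts the directive at the start of its own line,
# which is where the "#" has to be for the directive to be recognized.
my $code = qq[\n# line 42 "virtual.pl"\ndie "boom"];

eval $code;             # string eval: the directive applies to the evaluated text
my $message = $@;

print $message;         # the error should be reported against virtual.pl, line 42
```

The directive sets the number of the line that follows it, so the C<die> on the next line is reported as line 42 of the pretend file.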