Diffstat (limited to 'contrib/perl5/pod/perlrequick.pod')
-rw-r--r-- contrib/perl5/pod/perlrequick.pod 503
1 files changed, 0 insertions, 503 deletions
diff --git a/contrib/perl5/pod/perlrequick.pod b/contrib/perl5/pod/perlrequick.pod
deleted file mode 100644
index 5b72a35..0000000
--- a/contrib/perl5/pod/perlrequick.pod
+++ /dev/null
@@ -1,503 +0,0 @@
-=head1 NAME
-
-perlrequick - Perl regular expressions quick start
-
-=head1 DESCRIPTION
-
-This page covers the very basics of understanding, creating and
-using regular expressions ('regexes') in Perl.
-
-
-=head1 The Guide
-
-=head2 Simple word matching
-
-The simplest regex is just a word, or more generally, a string of
-characters. A regex consisting of a word matches any string that
-contains that word:
-
- "Hello World" =~ /World/; # matches
-
-In this statement, C<World> is a regex and the C<//> enclosing
-C</World/> tells perl to search a string for a match. The operator
-C<=~> associates the string with the regex match and produces a true
-value if the regex matched, or false if the regex did not match. In
-our case, C<World> matches the second word in C<"Hello World">, so the
-expression is true. This idea has several variations.
-
-Expressions like this are useful in conditionals:
-
- print "It matches\n" if "Hello World" =~ /World/;
-
-The sense of the match can be reversed by using the C<!~> operator:
-
- print "It doesn't match\n" if "Hello World" !~ /World/;
-
-The literal string in the regex can be replaced by a variable:
-
- $greeting = "World";
- print "It matches\n" if "Hello World" =~ /$greeting/;
-
-If you're matching against C<$_>, the C<$_ =~> part can be omitted:
-
- $_ = "Hello World";
- print "It matches\n" if /World/;
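-
-Because a bare C</World/> matches against C<$_>, it pairs naturally
-with Perl's built-in C<grep>, which aliases C<$_> to each list element
-in turn. Here is a minimal sketch; the C<@lines> data is made up for
-illustration:
-
- use strict;
- use warnings;
- my @lines = ("Hello World", "Goodbye World", "Hello there");
- # grep aliases $_ to each element, so /World/ needs no '=~'
- my @matching = grep { /World/ } @lines;
- print "$_\n" for @matching;    # prints "Hello World", "Goodbye World"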
-
-Finally, the C<//> default delimiters for a match can be changed to
-arbitrary delimiters by putting an C<'m'> out front:
-
- "Hello World" =~ m!World!; # matches, delimited by '!'
- "Hello World" =~ m{World}; # matches, note the matching '{}'
- "/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
- # '/' becomes an ordinary char
-
-Regexes must match a part of the string I<exactly> in order for the
-statement to be true:
-
- "Hello World" =~ /world/; # doesn't match, case sensitive
- "Hello World" =~ /o W/; # matches, ' ' is an ordinary char
- "Hello World" =~ /World /; # doesn't match, no ' ' at end
-
-perl will always match at the earliest possible point in the string:
-
- "Hello World" =~ /o/; # matches 'o' in 'Hello'
- "That hat is red" =~ /hat/; # matches 'hat' in 'That'
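-
-One way to see where a match begins is the special array C<@->: after
-a successful match, C<$-[0]> holds the offset at which the whole match
-started. A small sketch, reusing the two example strings above:
-
- use strict;
- use warnings;
- my @cases = (["Hello World", 'o'], ["That hat is red", 'hat']);
- my @offsets;
- for my $case (@cases) {
-     my ($string, $word) = @$case;
-     if ($string =~ /$word/) {
-         # $-[0] is the offset where this match started
-         push @offsets, $-[0];
-         print "'$word' first matches at offset ", $-[0], " in '$string'\n";
-     }
- }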
-
-Not all characters can be used 'as is' in a match. Some characters,
-called B<metacharacters>, are reserved for use in regex notation.
-The metacharacters are
-
- {}[]()^$.|*+?\
-
-A metacharacter can be matched by putting a backslash before it:
-
- "2+2=4" =~ /2+2/; # doesn't match, + is a metacharacter
- "2+2=4" =~ /2\+2/; # matches, \+ is treated like an ordinary +
- 'C:\WIN32' =~ /C:\\WIN/; # matches
- "/usr/bin/perl" =~ /\/usr\/bin\/perl/; # matches
-
-In the last regex, the forward slash C<'/'> is also backslashed,
-because it is used to delimit the regex.
-
-Non-printable ASCII characters are represented by B<escape sequences>.
-Common examples are C<\t> for a tab, C<\n> for a newline, and C<\r>
-for a carriage return. Arbitrary bytes are represented by octal
-escape sequences, e.g., C<\033>, or hexadecimal escape sequences,
-e.g., C<\x1B>:
-
- "1000\t2000" =~ m(0\t2); # matches
- "cat" =~ /\143\x61\x74/; # matches, but a weird way to spell cat
-
-Regexes are treated mostly as double-quoted strings, so variable
-substitution works:
-
- $foo = 'house';
- 'cathouse' =~ /cat$foo/; # matches
- 'housecat' =~ /${foo}cat/; # matches
-
-With all of the regexes above, if the regex matched anywhere in the
-string, it was considered a match. To specify I<where> it should
-match, we would use the B<anchor> metacharacters C<^> and C<$>. The
-anchor C<^> means match at the beginning of the string and the anchor
-C<$> means match at the end of the string, or before a newline at the
-end of the string. Some examples:
-
- "housekeeper" =~ /keeper/; # matches
- "housekeeper" =~ /^keeper/; # doesn't match
- "housekeeper" =~ /keeper$/; # matches
- "housekeeper\n" =~ /keeper$/; # matches
- "housekeeper" =~ /^housekeeper$/; # matches
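-
-Anchors are handy for filtering a list by prefix or suffix. A minimal
-sketch with made-up strings:
-
- use strict;
- use warnings;
- my @strings = ("housekeeper", "cathouse", "houseboat", "lighthouse keeper");
- my @starts = grep { /^house/ } @strings;    # begins with 'house'
- my @ends   = grep { /house$/ } @strings;    # ends with 'house'
- print "starts: @starts\n";    # prints "starts: housekeeper houseboat"
- print "ends: @ends\n";        # prints "ends: cathouse"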
-
-=head2 Using character classes
-
-A B<character class> allows a set of possible characters, rather than
-just a single character, to match at a particular point in a regex.
-Character classes are denoted by brackets C<[...]>, with the set of
-characters to be possibly matched inside. Here are some examples:
-
- /cat/; # matches 'cat'
- /[bcr]at/; # matches 'bat', 'cat', or 'rat'
- "abc" =~ /[cab]/; # matches 'a'
-
-In the last statement, even though C<'c'> is the first character in
-the class, the earliest point at which the regex can match is C<'a'>.
-
- /[yY][eE][sS]/; # match 'yes' in a case-insensitive way
- # 'yes', 'Yes', 'YES', etc.
- /yes/i; # also match 'yes' in a case-insensitive way
-
-The last example shows a match with an C<'i'> B<modifier>, which makes
-the match case-insensitive.
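-
-Combining the C<i> modifier with the C<^> and C<$> anchors gives a
-simple check for a "yes" answer in any letter case. The C<is_yes> name
-is hypothetical:
-
- use strict;
- use warnings;
- # Hypothetical helper: true for 'yes' in any case; the anchors
- # reject longer strings such as 'yesterday'
- sub is_yes {
-     my ($answer) = @_;
-     return $answer =~ /^yes$/i ? 1 : 0;
- }
- print "$_: ", is_yes($_), "\n" for ("yes", "YES", "Yes", "yesterday", "no");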
-
-Character classes also have ordinary and special characters, but the
-sets of ordinary and special characters inside a character class are
-different from those outside a character class. The special
-characters for a character class are C<-]\^$> and are matched using an
-escape:
-
- /[\]c]def/; # matches ']def' or 'cdef'
- $x = 'bcr';
- /[$x]at/; # matches 'bat', 'cat', or 'rat'
- /[\$x]at/; # matches '$at' or 'xat'
- /[\\$x]at/; # matches '\at', 'bat', 'cat', or 'rat'
-
-The special character C<'-'> acts as a range operator within character
-classes, so that the unwieldy C<[0123456789]> and C<[abc...xyz]>
-become the svelte C<[0-9]> and C<[a-z]>:
-
- /item[0-9]/; # matches 'item0' or ... or 'item9'
- /[0-9a-fA-F]/; # matches a hexadecimal digit
-
-If C<'-'> is the first or last character in a character class, it is
-treated as an ordinary character.
-
-The special character C<^> in the first position of a character class
-denotes a B<negated character class>, which matches any character but
-those in the brackets. Both C<[...]> and C<[^...]> must match a
-character, or the match fails. For example:
-
- /[^a]at/; # doesn't match 'aat' or 'at', but matches
- # all other 'bat', 'cat', '0at', '%at', etc.
- /[^0-9]/; # matches a non-numeric character
- /[a^]at/; # matches 'aat' or '^at'; here '^' is ordinary
-
-Perl has several abbreviations for common character classes:
-
-=over 4
-
-=item *
-
-C<\d> is a digit and represents C<[0-9]>
-
-=item *
-
-C<\s> is a whitespace character and represents C<[\ \t\r\n\f]>
-
-=item *
-
-C<\w> is a word character (alphanumeric or C<_>) and represents C<[0-9a-zA-Z_]>
-
-=item *
-
-C<\D> is a negated C<\d>; it represents any character but a digit, C<[^0-9]>
-
-=item *
-
-C<\S> is a negated C<\s>; it represents any non-whitespace character, C<[^\s]>
-
-=item *
-
-C<\W> is a negated C<\w>; it represents any non-word character, C<[^\w]>
-
-=item *
-
-The period C<'.'> matches any character but C<"\n">
-
-=back
-
-The C<\d\s\w\D\S\W> abbreviations can be used both inside and outside
-of character classes. Here are some in use:
-
- /\d\d:\d\d:\d\d/; # matches a hh:mm:ss time format
- /[\d\s]/; # matches any digit or whitespace character
- /\w\W\w/; # matches a word char, followed by a
- # non-word char, followed by a word char
- /..rt/; # matches any two chars, followed by 'rt'
- /end\./; # matches 'end.'
- /end[.]/; # same thing, matches 'end.'
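-
-Because every C<\d> character is also a C<\w> character, the order of
-tests matters when classifying characters. A sketch with a
-hypothetical C<classify> helper:
-
- use strict;
- use warnings;
- # Hypothetical helper: test \d before \w, since every digit is also
- # a word character
- sub classify {
-     my ($char) = @_;
-     return 'digit'      if $char =~ /\d/;
-     return 'whitespace' if $char =~ /\s/;
-     return 'word'       if $char =~ /\w/;
-     return 'other';
- }
- print "'$_' is ", classify($_), "\n" for ('a', '7', ' ', '_', '!');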
-
-The S<B<word anchor> > C<\b> matches a boundary between a word
-character and a non-word character C<\w\W> or C<\W\w>:
-
- $x = "Housecat catenates house and cat";
- $x =~ /\bcat/; # matches cat in 'catenates'
- $x =~ /cat\b/; # matches cat in 'housecat'
- $x =~ /\bcat\b/; # matches 'cat' at end of string
-
-In the last example, the end of the string is considered a word
-boundary.
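-
-Putting C<\b> on both sides is a common way to find a word only where
-it stands on its own. A sketch with made-up phrases:
-
- use strict;
- use warnings;
- my @phrases = ("cat", "housecat", "catenates", "the cat sat");
- # \b on both sides: 'cat' must not be part of a longer word
- my @whole = grep { /\bcat\b/ } @phrases;
- print "$_\n" for @whole;    # prints "cat" and "the cat sat"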
-
-=head2 Matching this or that
-
-We can match different character strings with the B<alternation>
-metacharacter C<'|'>. To match C<dog> or C<cat>, we form the regex
-C<dog|cat>. As before, perl will try to match the regex at the
-earliest possible point in the string. At each character position,
-perl will first try to match the first alternative, C<dog>. If
-C<dog> doesn't match, perl will then try the next alternative, C<cat>.
-If C<cat> doesn't match either, then the match fails and perl moves to
-the next position in the string. Some examples:
-
- "cats and dogs" =~ /cat|dog|bird/; # matches "cat"
- "cats and dogs" =~ /dog|cat|bird/; # matches "cat"
-
-Even though C<dog> is the first alternative in the second regex,
-C<cat> is able to match earlier in the string.
-
- "cats" =~ /c|ca|cat|cats/; # matches "c"
- "cats" =~ /cats|cat|ca|c/; # matches "cats"
-
-At a given character position, the first alternative that allows the
-regex match to succeed will be the one that matches. Here, all the
-alternatives match at the first string position, so the first matches.
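-
-To see which alternative won, this sketch wraps each alternation in
-parentheses, which capture the matched text (capturing is described in
-the section on extracting matches); in list context the match returns
-that text:
-
- use strict;
- use warnings;
- my ($short) = "cats" =~ /(c|ca|cat|cats)/;
- my ($long)  = "cats" =~ /(cats|cat|ca|c)/;
- my ($pet)   = "cats and dogs" =~ /(dog|cat|bird)/;
- print "$short $long $pet\n";    # prints "c cats cat"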
-
-=head2 Grouping things and hierarchical matching
-
-The B<grouping> metacharacters C<()> allow a part of a regex to be
-treated as a single unit. Parts of a regex are grouped by enclosing
-them in parentheses. The regex C<house(cat|keeper)> means match
-C<house> followed by either C<cat> or C<keeper>. Some more examples
-are
-
- /(a|b)b/; # matches 'ab' or 'bb'
- /(^a|b)c/; # matches 'ac' at start of string or 'bc' anywhere
-
- /house(cat|)/; # matches either 'housecat' or 'house'
- /house(cat(s|)|)/; # matches either 'housecats' or 'housecat' or
- # 'house'. Note groups can be nested.
-
- "20" =~ /(19|20|)\d\d/; # matches the null alternative '()\d\d',
- # because '20\d\d' can't match
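-
-The null alternative is useful for optional parts. A sketch that
-splits made-up year strings into an optional century prefix, anchored
-so the whole string must match; C<$1> (described in the next section)
-holds what the group matched:
-
- use strict;
- use warnings;
- my %prefix;
- for my $year ("1999", "2024", "85") {
-     if ($year =~ /^(19|20|)\d\d$/) {
-         $prefix{$year} = $1;    # '' when the null alternative matched
-     }
- }
- print "$_ -> '$prefix{$_}'\n" for sort keys %prefix;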
-
-=head2 Extracting matches
-
-The grouping metacharacters C<()> also allow the extraction of the
-parts of a string that matched. For each grouping, the part that
-matched inside goes into the special variables C<$1>, C<$2>, etc.
-They can be used just like ordinary variables:
-
- # extract hours, minutes, seconds
- $time =~ /(\d\d):(\d\d):(\d\d)/; # match hh:mm:ss format
- $hours = $1;
- $minutes = $2;
- $seconds = $3;
-
-In list context, a match C</regex/> with groupings will return the
-list of matched values C<($1,$2,...)>. So we could rewrite it as
-
- ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
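-
-One caution: a failed match does not reset C<$1>, C<$2>, ..., so they
-may still hold values from an earlier successful match. Checking the
-result of the match avoids that trap. A sketch with a hypothetical
-C<parse_time> helper:
-
- use strict;
- use warnings;
- # Hypothetical helper: returns (hours, minutes, seconds), or an
- # empty list if $time is not in hh:mm:ss format
- sub parse_time {
-     my ($time) = @_;
-     my ($hours, $minutes, $seconds) = $time =~ /^(\d\d):(\d\d):(\d\d)$/
-         or return;
-     return ($hours, $minutes, $seconds);
- }
- my @parts = parse_time("12:34:56");    # ('12', '34', '56')
- my @bad   = parse_time("noon");        # ()
- print "@parts\n";                      # prints "12 34 56"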
-
-If the groupings in a regex are nested, C<$1> gets the group with the
-leftmost opening parenthesis, C<$2> the next opening parenthesis,
-etc. For example, here is a complex regex and the matching variables
-indicated below it:
-
- /(ab(cd|ef)((gi)|j))/;
-  1  2      34
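-
-A sketch that prints all four variables for two made-up strings;
-C<$4> is C<undef> when the C<(gi)> group did not take part in the
-match:
-
- use strict;
- use warnings;
- my %groups;
- for my $string ("abcdgi", "abefj") {
-     if ($string =~ /(ab(cd|ef)((gi)|j))/) {
-         my @vars = map { defined $_ ? $_ : 'undef' } ($1, $2, $3, $4);
-         $groups{$string} = "@vars";
-         print "$string: @vars\n";
-     }
- }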
-
-Associated with the matching variables C<$1>, C<$2>, ... are
-the B<backreferences> C<\1>, C<\2>, ... Backreferences are
-matching variables that can be used I<inside> a regex:
-
- /(\w\w\w)\s\1/; # find sequences like 'the the' in string
-
-C<$1>, C<$2>, ... should only be used outside of a regex, and C<\1>,
-C<\2>, ... only inside a regex.
-
-=head2 Matching repetitions
-
-The B<quantifier> metacharacters C<?>, C<*>, C<+>, and C<{}> allow us
-to determine the number of repeats of a portion of a regex we
-consider to be a match. Quantifiers are put immediately after the
-character, character class, or grouping that we want to specify. They
-have the following meanings:
-
-=over 4
-
-=item *
-
-C<a?> = match 'a' 1 or 0 times
-
-=item *
-
-C<a*> = match 'a' 0 or more times, i.e., any number of times
-
-=item *
-
-C<a+> = match 'a' 1 or more times, i.e., at least once
-
-=item *
-
-C<a{n,m}> = match at least C<n> times, but not more than C<m>
-times.
-
-=item *
-
-C<a{n,}> = match at least C<n> times
-
-=item *
-
-C<a{n}> = match exactly C<n> times
-
-=back
-
-Here are some examples:
-
- /[a-z]+\s+\d*/; # match a lowercase word, at least one whitespace
- # character, and any number of digits
- /(\w+)\s+\1/; # match doubled words of arbitrary length
- $year =~ /^\d{2,4}$/; # make sure year is at least 2 but not more
- # than 4 digits
- $year =~ /^\d{4}$|^\d{2}$/; # better match; throw out 3-digit dates
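-
-Anchoring matters for validation: without C<^> and C<$>, C</\d{2,4}/>
-accepts any string that merely contains two digits. A sketch with a
-hypothetical C<valid_year> helper:
-
- use strict;
- use warnings;
- # Hypothetical helper: accept exactly 4 or exactly 2 digits
- sub valid_year {
-     my ($year) = @_;
-     return $year =~ /^(\d{4}|\d{2})$/ ? 1 : 0;
- }
- print "$_: ", valid_year($_), "\n" for ("1999", "99", "123", "12345");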
-
-These quantifiers will try to match as much of the string as possible,
-while still allowing the regex to match. So we have
-
- $x = 'the cat in the hat';
- $x =~ /^(.*)(at)(.*)$/; # matches,
- # $1 = 'the cat in the h'
- # $2 = 'at'
- # $3 = '' (0 matches)
-
-The first quantifier C<.*> grabs as much of the string as possible
-while still having the regex match. The second quantifier C<.*> has
-no string left to it, so it matches 0 times.
-
-=head2 More matching
-
-There are a few more things you might want to know about matching
-operators. In the code
-
- $pattern = 'Seuss';
- while (<>) {
- print if /$pattern/;
- }
-
-perl has to re-evaluate C<$pattern> each time through the loop. If
-C<$pattern> won't be changing, use the C<//o> modifier to perform
-variable substitution only once. If you don't want any
-substitutions at all, use the special delimiter C<m''>:
-
- $pattern = 'Seuss';
- m'$pattern'; # matches '$pattern', not 'Seuss'
-
-The global modifier C<//g> allows the matching operator to match
-within a string as many times as possible. In scalar context,
-successive matches against a string will have C<//g> jump from match
-to match, keeping track of position in the string as it goes along.
-You can get or set the position with the C<pos()> function.
-For example,
-
- $x = "cat dog house"; # 3 words
- while ($x =~ /(\w+)/g) {
- print "Word is $1, ends at position ", pos $x, "\n";
- }
-
-prints
-
- Word is cat, ends at position 3
- Word is dog, ends at position 7
- Word is house, ends at position 13
-
-A failed match or changing the target string resets the position. If
-you don't want the position reset after failure to match, add the
-C<//c> modifier, as in C</regex/gc>.
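-
-A sketch of the difference, using a made-up string; the comments give
-the value of C<pos> after each step:
-
- use strict;
- use warnings;
- my $x = "cat dog";
- my $found = $x =~ /(\w+)/g;    # scalar //g matches 'cat'
- my $after_match = pos $x;      # 3
- $found = $x =~ /\d/gc;         # fails, but //c keeps the position
- my $after_gc = pos $x;         # still 3
- $found = $x =~ /\d/g;          # fails again, this time without //c
- my $after_g = pos $x;          # undef: the position was reset
- print "after //gc: $after_gc, after //g: ",
-     defined $after_g ? $after_g : 'undef', "\n";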
-
-In list context, C<//g> returns a list of matched groupings, or if
-there are no groupings, a list of matches to the whole regex. So
-
- @words = ($x =~ /(\w+)/g); # matches,
- # $words[0] = 'cat'
- # $words[1] = 'dog'
- # $words[2] = 'house'
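-
-List-context C<//g> makes short work of counting words. A sketch with
-a made-up sentence:
-
- use strict;
- use warnings;
- my $text = "the cat and the hat";
- my %count;
- # the match returns every captured word; each one bumps its counter
- $count{$_}++ for $text =~ /(\w+)/g;
- print "$_: $count{$_}\n" for sort keys %count;    # 'the' is counted twice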
-
-=head2 Search and replace
-
-Search and replace is performed using C<s/regex/replacement/modifiers>.
-The C<replacement> is a Perl double-quoted string that replaces in the
-string whatever is matched with the C<regex>. The operator C<=~> is
-also used here to associate a string with C<s///>. If matching
-against C<$_>, the S<C<$_ =~> > can be dropped. If there is a match,
-C<s///> returns the number of substitutions made, otherwise it returns
-false. Here are a few examples:
-
- $x = "Time to feed the cat!";
- $x =~ s/cat/hacker/; # $x contains "Time to feed the hacker!"
- $y = "'quoted words'";
- $y =~ s/^'(.*)'$/$1/; # strip single quotes,
- # $y contains "quoted words"
-
-With the C<s///> operator, the matched variables C<$1>, C<$2>, etc.
-are immediately available for use in the replacement expression. With
-the global modifier, C<s///g> will search and replace all occurrences
-of the regex in the string:
-
- $x = "I batted 4 for 4";
- $x =~ s/4/four/; # $x contains "I batted four for 4"
- $x = "I batted 4 for 4";
- $x =~ s/4/four/g; # $x contains "I batted four for four"
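-
-Since C<s///> returns the number of substitutions, C<s///g> can also
-count. The second half of this sketch uses C<s///g> with C<\s+> to tidy
-whitespace; the strings are made up:
-
- use strict;
- use warnings;
- my $x = "I batted 4 for 4";
- my $count = ($x =~ s/4/four/g);    # 2
- print "$count: $x\n";              # prints "2: I batted four for four"
- my $line = "  too   many    spaces  ";
- $line =~ s/\s+/ /g;                # collapse each run of whitespace
- $line =~ s/^ //;                   # drop the leading space
- $line =~ s/ $//;                   # drop the trailing space
- print "[$line]\n";                 # prints "[too many spaces]"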
-
-The evaluation modifier C<s///e> wraps an C<eval{...}> around the
-replacement string and the evaluated result is substituted for the
-matched substring. Some examples:
-
- # reverse all the words in a string
- $x = "the cat in the hat";
- $x =~ s/(\w+)/reverse $1/ge; # $x contains "eht tac ni eht tah"
-
- # convert percentage to decimal
- $x = "A 39% hit rate";
- $x =~ s!(\d+)%!$1/100!e; # $x contains "A 0.39 hit rate"
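-
-Because the replacement is run as code, it can compute a new value from
-each match. A sketch with a made-up string, using the common idiom of
-copying into a new variable before substituting:
-
- use strict;
- use warnings;
- my $order = "3 apples and 10 pears";
- # copy $order into $doubled, then substitute in the copy only;
- # /e evaluates $1 * 2 and /g repeats it for every number
- (my $doubled = $order) =~ s/(\d+)/$1 * 2/ge;
- print "$doubled\n";    # prints "6 apples and 20 pears"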
-
-The last example shows that C<s///> can use other delimiters, such as
-C<s!!!> and C<s{}{}>, and even C<s{}//>. If single quotes are used,
-as in C<s'''>, then the regex and replacement are treated as
-single-quoted strings.
-
-=head2 The split operator
-
-C<split /regex/, string> splits C<string> into a list of substrings
-and returns that list. The regex determines the character sequence
-that C<string> is split with respect to. For example, to split a
-string into words, use
-
- $x = "Calvin and Hobbes";
- @word = split /\s+/, $x; # $word[0] = 'Calvin'
- # $word[1] = 'and'
- # $word[2] = 'Hobbes'
-
-To extract a comma-delimited list of numbers, use
-
- $x = "1.618,2.718, 3.142";
- @const = split /,\s*/, $x; # $const[0] = '1.618'
- # $const[1] = '2.718'
- # $const[2] = '3.142'
-
-If the empty regex C<//> is used, the string is split into individual
-characters. If the regex has groupings, then the list produced contains
-the matched substrings from the groupings as well:
-
- $x = "/usr/bin";
- @parts = split m!(/)!, $x; # $parts[0] = ''
- # $parts[1] = '/'
- # $parts[2] = 'usr'
- # $parts[3] = '/'
- # $parts[4] = 'bin'
-
-Since the first character of C<$x> matched the regex, C<split> prepended
-an empty initial element to the list.
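-
-Assigning the result of C<split> to a list of scalars is a common way
-to pull apart a simple record. A sketch with a made-up header line:
-
- use strict;
- use warnings;
- # split at the colon and any whitespace after it
- my ($name, $value) = split /:\s*/, "Content-Type: text/plain";
- # the empty regex splits into single characters
- my @letters = split //, "cat";
- print "$name => $value\n";    # prints "Content-Type => text/plain"
- print "@letters\n";           # prints "c a t"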
-
-=head1 BUGS
-
-None.
-
-=head1 SEE ALSO
-
-This is just a quick start guide. For a more in-depth tutorial on
-regexes, see L<perlretut> and for the reference page, see L<perlre>.
-
-=head1 AUTHOR AND COPYRIGHT
-
-Copyright (c) 2000 Mark Kvale
-All rights reserved.
-
-This document may be distributed under the same terms as Perl itself.
-
-=head2 Acknowledgments
-
-The author would like to thank Mark-Jason Dominus, Tom Christiansen,
-Ilya Zakharevich, Brad Hughes, and Mike Giroux for all their helpful
-comments.
-
-=cut
-