diff options
Diffstat (limited to 'contrib/perl5/pod/perldsc.pod')
-rw-r--r-- | contrib/perl5/pod/perldsc.pod | 832 |
1 files changed, 832 insertions, 0 deletions
diff --git a/contrib/perl5/pod/perldsc.pod b/contrib/perl5/pod/perldsc.pod new file mode 100644 index 0000000..d0cc335 --- /dev/null +++ b/contrib/perl5/pod/perldsc.pod @@ -0,0 +1,832 @@ +=head1 NAME + +perldsc - Perl Data Structures Cookbook + +=head1 DESCRIPTION + +The single feature most sorely lacking in the Perl programming language +prior to its 5.0 release was complex data structures. Even without direct +language support, some valiant programmers did manage to emulate them, but +it was hard work and not for the faint of heart. You could occasionally +get away with the C<$m{$LoL,$b}> notation borrowed from I<awk> in which the +keys are actually more like a single concatenated string C<"$LoL$b">, but +traversal and sorting were difficult. More desperate programmers even +hacked Perl's internal symbol table directly, a strategy that proved hard +to develop and maintain--to put it mildly. + +The 5.0 release of Perl let us have complex data structures. You +may now write something like this and all of a sudden, you'd have a array +with three dimensions! + + for $x (1 .. 10) { + for $y (1 .. 10) { + for $z (1 .. 10) { + $LoL[$x][$y][$z] = + $x ** $y + $z; + } + } + } + +Alas, however simple this may appear, underneath it's a much more +elaborate construct than meets the eye! + +How do you print it out? Why can't you say just C<print @LoL>? How do +you sort it? How can you pass it to a function or get one of these back +from a function? Is is an object? Can you save it to disk to read +back later? How do you access whole rows or columns of that matrix? Do +all the values have to be numeric? + +As you see, it's quite easy to become confused. While some small portion +of the blame for this can be attributed to the reference-based +implementation, it's really more due to a lack of existing documentation with +examples designed for the beginner. + +This document is meant to be a detailed but understandable treatment of the +many different sorts of data structures you might want to develop. It +should also serve as a cookbook of examples. That way, when you need to +create one of these complex data structures, you can just pinch, pilfer, or +purloin a drop-in example from here. + +Let's look at each of these possible constructs in detail. There are separate +sections on each of the following: + +=over 5 + +=item * arrays of arrays + +=item * hashes of arrays + +=item * arrays of hashes + +=item * hashes of hashes + +=item * more elaborate constructs + +=back + +But for now, let's look at general issues common to all +these types of data structures. + +=head1 REFERENCES + +The most important thing to understand about all data structures in Perl +-- including multidimensional arrays--is that even though they might +appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally +one-dimensional. They can hold only scalar values (meaning a string, +number, or a reference). They cannot directly contain other arrays or +hashes, but instead contain I<references> to other arrays or hashes. + +You can't use a reference to a array or hash in quite the same way that you +would a real array or hash. For C or C++ programmers unused to +distinguishing between arrays and pointers to the same, this can be +confusing. If so, just think of it as the difference between a structure +and a pointer to a structure. + +You can (and should) read more about references in the perlref(1) man +page. Briefly, references are rather like pointers that know what they +point to. (Objects are also a kind of reference, but we won't be needing +them right away--if ever.) This means that when you have something which +looks to you like an access to a two-or-more-dimensional array and/or hash, +what's really going on is that the base type is +merely a one-dimensional entity that contains references to the next +level. It's just that you can I<use> it as though it were a +two-dimensional one. This is actually the way almost all C +multidimensional arrays work as well. + + $list[7][12] # array of arrays + $list[7]{string} # array of hashes + $hash{string}[7] # hash of arrays + $hash{string}{'another string'} # hash of hashes + +Now, because the top level contains only references, if you try to print +out your array in with a simple print() function, you'll get something +that doesn't look very nice, like this: + + @LoL = ( [2, 3], [4, 5, 7], [0] ); + print $LoL[1][2]; + 7 + print @LoL; + ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0) + + +That's because Perl doesn't (ever) implicitly dereference your variables. +If you want to get at the thing a reference is referring to, then you have +to do this yourself using either prefix typing indicators, like +C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows, +like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>. + +=head1 COMMON MISTAKES + +The two most common mistakes made in constructing something like +an array of arrays is either accidentally counting the number of +elements or else taking a reference to the same memory location +repeatedly. Here's the case where you just get the count instead +of a nested array: + + for $i (1..10) { + @list = somefunc($i); + $LoL[$i] = @list; # WRONG! + } + +That's just the simple case of assigning a list to a scalar and getting +its element count. If that's what you really and truly want, then you +might do well to consider being a tad more explicit about it, like this: + + for $i (1..10) { + @list = somefunc($i); + $counts[$i] = scalar @list; + } + +Here's the case of taking a reference to the same memory location +again and again: + + for $i (1..10) { + @list = somefunc($i); + $LoL[$i] = \@list; # WRONG! + } + +So, what's the big problem with that? It looks right, doesn't it? +After all, I just told you that you need an array of references, so by +golly, you've made me one! + +Unfortunately, while this is true, it's still broken. All the references +in @LoL refer to the I<very same place>, and they will therefore all hold +whatever was last in @list! It's similar to the problem demonstrated in +the following C program: + + #include <pwd.h> + main() { + struct passwd *getpwnam(), *rp, *dp; + rp = getpwnam("root"); + dp = getpwnam("daemon"); + + printf("daemon name is %s\nroot name is %s\n", + dp->pw_name, rp->pw_name); + } + +Which will print + + daemon name is daemon + root name is daemon + +The problem is that both C<rp> and C<dp> are pointers to the same location +in memory! In C, you'd have to remember to malloc() yourself some new +memory. In Perl, you'll want to use the array constructor C<[]> or the +hash constructor C<{}> instead. Here's the right way to do the preceding +broken code fragments: + + for $i (1..10) { + @list = somefunc($i); + $LoL[$i] = [ @list ]; + } + +The square brackets make a reference to a new array with a I<copy> +of what's in @list at the time of the assignment. This is what +you want. + +Note that this will produce something similar, but it's +much harder to read: + + for $i (1..10) { + @list = 0 .. $i; + @{$LoL[$i]} = @list; + } + +Is it the same? Well, maybe so--and maybe not. The subtle difference +is that when you assign something in square brackets, you know for sure +it's always a brand new reference with a new I<copy> of the data. +Something else could be going on in this new case with the C<@{$LoL[$i]}}> +dereference on the left-hand-side of the assignment. It all depends on +whether C<$LoL[$i]> had been undefined to start with, or whether it +already contained a reference. If you had already populated @LoL with +references, as in + + $LoL[3] = \@another_list; + +Then the assignment with the indirection on the left-hand-side would +use the existing reference that was already there: + + @{$LoL[3]} = @list; + +Of course, this I<would> have the "interesting" effect of clobbering +@another_list. (Have you ever noticed how when a programmer says +something is "interesting", that rather than meaning "intriguing", +they're disturbingly more apt to mean that it's "annoying", +"difficult", or both? :-) + +So just remember always to use the array or hash constructors with C<[]> +or C<{}>, and you'll be fine, although it's not always optimally +efficient. + +Surprisingly, the following dangerous-looking construct will +actually work out fine: + + for $i (1..10) { + my @list = somefunc($i); + $LoL[$i] = \@list; + } + +That's because my() is more of a run-time statement than it is a +compile-time declaration I<per se>. This means that the my() variable is +remade afresh each time through the loop. So even though it I<looks> as +though you stored the same variable reference each time, you actually did +not! This is a subtle distinction that can produce more efficient code at +the risk of misleading all but the most experienced of programmers. So I +usually advise against teaching it to beginners. In fact, except for +passing arguments to functions, I seldom like to see the gimme-a-reference +operator (backslash) used much at all in code. Instead, I advise +beginners that they (and most of the rest of us) should try to use the +much more easily understood constructors C<[]> and C<{}> instead of +relying upon lexical (or dynamic) scoping and hidden reference-counting to +do the right thing behind the scenes. + +In summary: + + $LoL[$i] = [ @list ]; # usually best + $LoL[$i] = \@list; # perilous; just how my() was that list? + @{ $LoL[$i] } = @list; # way too tricky for most programmers + + +=head1 CAVEAT ON PRECEDENCE + +Speaking of things like C<@{$LoL[$i]}>, the following are actually the +same thing: + + $listref->[2][2] # clear + $$listref[2][2] # confusing + +That's because Perl's precedence rules on its five prefix dereferencers +(which look like someone swearing: C<$ @ * % &>) make them bind more +tightly than the postfix subscripting brackets or braces! This will no +doubt come as a great shock to the C or C++ programmer, who is quite +accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th> +element of C<a>. That is, they first take the subscript, and only then +dereference the thing at that subscript. That's fine in C, but this isn't C. + +The seemingly equivalent construct in Perl, C<$$listref[$i]> first does +the deref of C<$listref>, making it take $listref as a reference to an +array, and then dereference that, and finally tell you the I<i'th> value +of the array pointed to by $LoL. If you wanted the C notion, you'd have to +write C<${$LoL[$i]}> to force the C<$LoL[$i]> to get evaluated first +before the leading C<$> dereferencer. + +=head1 WHY YOU SHOULD ALWAYS C<use strict> + +If this is starting to sound scarier than it's worth, relax. Perl has +some features to help you avoid its most common pitfalls. The best +way to avoid getting confused is to start every program like this: + + #!/usr/bin/perl -w + use strict; + +This way, you'll be forced to declare all your variables with my() and +also disallow accidental "symbolic dereferencing". Therefore if you'd done +this: + + my $listref = [ + [ "fred", "barney", "pebbles", "bambam", "dino", ], + [ "homer", "bart", "marge", "maggie", ], + [ "george", "jane", "elroy", "judy", ], + ]; + + print $listref[2][2]; + +The compiler would immediately flag that as an error I<at compile time>, +because you were accidentally accessing C<@listref>, an undeclared +variable, and it would thereby remind you to write instead: + + print $listref->[2][2] + +=head1 DEBUGGING + +Before version 5.002, the standard Perl debugger didn't do a very nice job of +printing out complex data structures. With 5.002 or above, the +debugger includes several new features, including command line editing as +well as the C<x> command to dump out complex data structures. For +example, given the assignment to $LoL above, here's the debugger output: + + DB<1> x $LoL + $LoL = ARRAY(0x13b5a0) + 0 ARRAY(0x1f0a24) + 0 'fred' + 1 'barney' + 2 'pebbles' + 3 'bambam' + 4 'dino' + 1 ARRAY(0x13b558) + 0 'homer' + 1 'bart' + 2 'marge' + 3 'maggie' + 2 ARRAY(0x13b540) + 0 'george' + 1 'jane' + 2 'elroy' + 3 'judy' + +=head1 CODE EXAMPLES + +Presented with little comment (these will get their own manpages someday) +here are short code examples illustrating access of various +types of data structures. + +=head1 LISTS OF LISTS + +=head2 Declaration of a LIST OF LISTS + + @LoL = ( + [ "fred", "barney" ], + [ "george", "jane", "elroy" ], + [ "homer", "marge", "bart" ], + ); + +=head2 Generation of a LIST OF LISTS + + # reading from file + while ( <> ) { + push @LoL, [ split ]; + } + + # calling a function + for $i ( 1 .. 10 ) { + $LoL[$i] = [ somefunc($i) ]; + } + + # using temp vars + for $i ( 1 .. 10 ) { + @tmp = somefunc($i); + $LoL[$i] = [ @tmp ]; + } + + # add to an existing row + push @{ $LoL[0] }, "wilma", "betty"; + +=head2 Access and Printing of a LIST OF LISTS + + # one element + $LoL[0][0] = "Fred"; + + # another element + $LoL[1][1] =~ s/(\w)/\u$1/; + + # print the whole thing with refs + for $aref ( @LoL ) { + print "\t [ @$aref ],\n"; + } + + # print the whole thing with indices + for $i ( 0 .. $#LoL ) { + print "\t [ @{$LoL[$i]} ],\n"; + } + + # print the whole thing one at a time + for $i ( 0 .. $#LoL ) { + for $j ( 0 .. $#{ $LoL[$i] } ) { + print "elt $i $j is $LoL[$i][$j]\n"; + } + } + +=head1 HASHES OF LISTS + +=head2 Declaration of a HASH OF LISTS + + %HoL = ( + flintstones => [ "fred", "barney" ], + jetsons => [ "george", "jane", "elroy" ], + simpsons => [ "homer", "marge", "bart" ], + ); + +=head2 Generation of a HASH OF LISTS + + # reading from file + # flintstones: fred barney wilma dino + while ( <> ) { + next unless s/^(.*?):\s*//; + $HoL{$1} = [ split ]; + } + + # reading from file; more temps + # flintstones: fred barney wilma dino + while ( $line = <> ) { + ($who, $rest) = split /:\s*/, $line, 2; + @fields = split ' ', $rest; + $HoL{$who} = [ @fields ]; + } + + # calling a function that returns a list + for $group ( "simpsons", "jetsons", "flintstones" ) { + $HoL{$group} = [ get_family($group) ]; + } + + # likewise, but using temps + for $group ( "simpsons", "jetsons", "flintstones" ) { + @members = get_family($group); + $HoL{$group} = [ @members ]; + } + + # append new members to an existing family + push @{ $HoL{"flintstones"} }, "wilma", "betty"; + +=head2 Access and Printing of a HASH OF LISTS + + # one element + $HoL{flintstones}[0] = "Fred"; + + # another element + $HoL{simpsons}[1] =~ s/(\w)/\u$1/; + + # print the whole thing + foreach $family ( keys %HoL ) { + print "$family: @{ $HoL{$family} }\n" + } + + # print the whole thing with indices + foreach $family ( keys %HoL ) { + print "family: "; + foreach $i ( 0 .. $#{ $HoL{$family} } ) { + print " $i = $HoL{$family}[$i]"; + } + print "\n"; + } + + # print the whole thing sorted by number of members + foreach $family ( sort { @{$HoL{$b}} <=> @{$HoL{$a}} } keys %HoL ) { + print "$family: @{ $HoL{$family} }\n" + } + + # print the whole thing sorted by number of members and name + foreach $family ( sort { + @{$HoL{$b}} <=> @{$HoL{$a}} + || + $a cmp $b + } keys %HoL ) + { + print "$family: ", join(", ", sort @{ $HoL{$family} }), "\n"; + } + +=head1 LISTS OF HASHES + +=head2 Declaration of a LIST OF HASHES + + @LoH = ( + { + Lead => "fred", + Friend => "barney", + }, + { + Lead => "george", + Wife => "jane", + Son => "elroy", + }, + { + Lead => "homer", + Wife => "marge", + Son => "bart", + } + ); + +=head2 Generation of a LIST OF HASHES + + # reading from file + # format: LEAD=fred FRIEND=barney + while ( <> ) { + $rec = {}; + for $field ( split ) { + ($key, $value) = split /=/, $field; + $rec->{$key} = $value; + } + push @LoH, $rec; + } + + + # reading from file + # format: LEAD=fred FRIEND=barney + # no temp + while ( <> ) { + push @LoH, { split /[\s+=]/ }; + } + + # calling a function that returns a key,value list, like + # "lead","fred","daughter","pebbles" + while ( %fields = getnextpairset() ) { + push @LoH, { %fields }; + } + + # likewise, but using no temp vars + while (<>) { + push @LoH, { parsepairs($_) }; + } + + # add key/value to an element + $LoH[0]{pet} = "dino"; + $LoH[2]{pet} = "santa's little helper"; + +=head2 Access and Printing of a LIST OF HASHES + + # one element + $LoH[0]{lead} = "fred"; + + # another element + $LoH[1]{lead} =~ s/(\w)/\u$1/; + + # print the whole thing with refs + for $href ( @LoH ) { + print "{ "; + for $role ( keys %$href ) { + print "$role=$href->{$role} "; + } + print "}\n"; + } + + # print the whole thing with indices + for $i ( 0 .. $#LoH ) { + print "$i is { "; + for $role ( keys %{ $LoH[$i] } ) { + print "$role=$LoH[$i]{$role} "; + } + print "}\n"; + } + + # print the whole thing one at a time + for $i ( 0 .. $#LoH ) { + for $role ( keys %{ $LoH[$i] } ) { + print "elt $i $role is $LoH[$i]{$role}\n"; + } + } + +=head1 HASHES OF HASHES + +=head2 Declaration of a HASH OF HASHES + + %HoH = ( + flintstones => { + lead => "fred", + pal => "barney", + }, + jetsons => { + lead => "george", + wife => "jane", + "his boy" => "elroy", + }, + simpsons => { + lead => "homer", + wife => "marge", + kid => "bart", + }, + ); + +=head2 Generation of a HASH OF HASHES + + # reading from file + # flintstones: lead=fred pal=barney wife=wilma pet=dino + while ( <> ) { + next unless s/^(.*?):\s*//; + $who = $1; + for $field ( split ) { + ($key, $value) = split /=/, $field; + $HoH{$who}{$key} = $value; + } + + + # reading from file; more temps + while ( <> ) { + next unless s/^(.*?):\s*//; + $who = $1; + $rec = {}; + $HoH{$who} = $rec; + for $field ( split ) { + ($key, $value) = split /=/, $field; + $rec->{$key} = $value; + } + } + + # calling a function that returns a key,value hash + for $group ( "simpsons", "jetsons", "flintstones" ) { + $HoH{$group} = { get_family($group) }; + } + + # likewise, but using temps + for $group ( "simpsons", "jetsons", "flintstones" ) { + %members = get_family($group); + $HoH{$group} = { %members }; + } + + # append new members to an existing family + %new_folks = ( + wife => "wilma", + pet => "dino", + ); + + for $what (keys %new_folks) { + $HoH{flintstones}{$what} = $new_folks{$what}; + } + +=head2 Access and Printing of a HASH OF HASHES + + # one element + $HoH{flintstones}{wife} = "wilma"; + + # another element + $HoH{simpsons}{lead} =~ s/(\w)/\u$1/; + + # print the whole thing + foreach $family ( keys %HoH ) { + print "$family: { "; + for $role ( keys %{ $HoH{$family} } ) { + print "$role=$HoH{$family}{$role} "; + } + print "}\n"; + } + + # print the whole thing somewhat sorted + foreach $family ( sort keys %HoH ) { + print "$family: { "; + for $role ( sort keys %{ $HoH{$family} } ) { + print "$role=$HoH{$family}{$role} "; + } + print "}\n"; + } + + + # print the whole thing sorted by number of members + foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) { + print "$family: { "; + for $role ( sort keys %{ $HoH{$family} } ) { + print "$role=$HoH{$family}{$role} "; + } + print "}\n"; + } + + # establish a sort order (rank) for each role + $i = 0; + for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i } + + # now print the whole thing sorted by number of members + foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) { + print "$family: { "; + # and print these according to rank order + for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) { + print "$role=$HoH{$family}{$role} "; + } + print "}\n"; + } + + +=head1 MORE ELABORATE RECORDS + +=head2 Declaration of MORE ELABORATE RECORDS + +Here's a sample showing how to create and use a record whose fields are of +many different sorts: + + $rec = { + TEXT => $string, + SEQUENCE => [ @old_values ], + LOOKUP => { %some_table }, + THATCODE => \&some_function, + THISCODE => sub { $_[0] ** $_[1] }, + HANDLE => \*STDOUT, + }; + + print $rec->{TEXT}; + + print $rec->{LIST}[0]; + $last = pop @ { $rec->{SEQUENCE} }; + + print $rec->{LOOKUP}{"key"}; + ($first_k, $first_v) = each %{ $rec->{LOOKUP} }; + + $answer = $rec->{THATCODE}->($arg); + $answer = $rec->{THISCODE}->($arg1, $arg2); + + # careful of extra block braces on fh ref + print { $rec->{HANDLE} } "a string\n"; + + use FileHandle; + $rec->{HANDLE}->autoflush(1); + $rec->{HANDLE}->print(" a string\n"); + +=head2 Declaration of a HASH OF COMPLEX RECORDS + + %TV = ( + flintstones => { + series => "flintstones", + nights => [ qw(monday thursday friday) ], + members => [ + { name => "fred", role => "lead", age => 36, }, + { name => "wilma", role => "wife", age => 31, }, + { name => "pebbles", role => "kid", age => 4, }, + ], + }, + + jetsons => { + series => "jetsons", + nights => [ qw(wednesday saturday) ], + members => [ + { name => "george", role => "lead", age => 41, }, + { name => "jane", role => "wife", age => 39, }, + { name => "elroy", role => "kid", age => 9, }, + ], + }, + + simpsons => { + series => "simpsons", + nights => [ qw(monday) ], + members => [ + { name => "homer", role => "lead", age => 34, }, + { name => "marge", role => "wife", age => 37, }, + { name => "bart", role => "kid", age => 11, }, + ], + }, + ); + +=head2 Generation of a HASH OF COMPLEX RECORDS + + # reading from file + # this is most easily done by having the file itself be + # in the raw data format as shown above. perl is happy + # to parse complex data structures if declared as data, so + # sometimes it's easiest to do that + + # here's a piece by piece build up + $rec = {}; + $rec->{series} = "flintstones"; + $rec->{nights} = [ find_days() ]; + + @members = (); + # assume this file in field=value syntax + while (<>) { + %fields = split /[\s=]+/; + push @members, { %fields }; + } + $rec->{members} = [ @members ]; + + # now remember the whole thing + $TV{ $rec->{series} } = $rec; + + ########################################################### + # now, you might want to make interesting extra fields that + # include pointers back into the same data structure so if + # change one piece, it changes everywhere, like for examples + # if you wanted a {kids} field that was an array reference + # to a list of the kids' records without having duplicate + # records and thus update problems. + ########################################################### + foreach $family (keys %TV) { + $rec = $TV{$family}; # temp pointer + @kids = (); + for $person ( @{ $rec->{members} } ) { + if ($person->{role} =~ /kid|son|daughter/) { + push @kids, $person; + } + } + # REMEMBER: $rec and $TV{$family} point to same data!! + $rec->{kids} = [ @kids ]; + } + + # you copied the list, but the list itself contains pointers + # to uncopied objects. this means that if you make bart get + # older via + + $TV{simpsons}{kids}[0]{age}++; + + # then this would also change in + print $TV{simpsons}{members}[2]{age}; + + # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2] + # both point to the same underlying anonymous hash table + + # print the whole thing + foreach $family ( keys %TV ) { + print "the $family"; + print " is on during @{ $TV{$family}{nights} }\n"; + print "its members are:\n"; + for $who ( @{ $TV{$family}{members} } ) { + print " $who->{name} ($who->{role}), age $who->{age}\n"; + } + print "it turns out that $TV{$family}{lead} has "; + print scalar ( @{ $TV{$family}{kids} } ), " kids named "; + print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } ); + print "\n"; + } + +=head1 Database Ties + +You cannot easily tie a multilevel data structure (such as a hash of +hashes) to a dbm file. The first problem is that all but GDBM and +Berkeley DB have size limitations, but beyond that, you also have problems +with how references are to be represented on disk. One experimental +module that does partially attempt to address this need is the MLDBM +module. Check your nearest CPAN site as described in L<perlmodlib> for +source code to MLDBM. + +=head1 SEE ALSO + +perlref(1), perllol(1), perldata(1), perlobj(1) + +=head1 AUTHOR + +Tom Christiansen <F<tchrist@perl.com>> + +Last update: +Wed Oct 23 04:57:50 MET DST 1996 |