diff options
Diffstat (limited to 'contrib/gperf/gperf.texinfo')
-rw-r--r-- | contrib/gperf/gperf.texinfo | 1129 |
1 files changed, 0 insertions, 1129 deletions
diff --git a/contrib/gperf/gperf.texinfo b/contrib/gperf/gperf.texinfo deleted file mode 100644 index c957269..0000000 --- a/contrib/gperf/gperf.texinfo +++ /dev/null @@ -1,1129 +0,0 @@ -\input texinfo @c -*-texinfo-*- -@settitle User's Guide to @code{gperf} -@setfilename gperf.info - -@ifinfo -This file documents the features of the GNU Perfect Hash Function Generator - -Copyright (C) 1989 Free Software Foundation, Inc. - -Permission is granted to make and distribute verbatim copies of -this manual provided the copyright notice and this permission notice -are preserved on all copies. - -@ignore -Permission is granted to process this file through @TeX{} and print the -results, provided the printed document carries copying permission -notice identical to this one except for the removal of this paragraph -(this paragraph not being relevant to the printed manual). - -@end ignore - -Permission is granted to copy and distribute modified versions of this -manual under the conditions for verbatim copying, provided also that the -section entitled ``GNU General Public License'' is included exactly as -in the original, and provided that the entire resulting derived work is -distributed under the terms of a permission notice identical to this one. - -Permission is granted to copy and distribute translations of this manual -into another language, under the above conditions for modified versions, -except that the section entitled ``GNU @code{gperf} General Public License'' and -this permission notice may be included in translations approved by the -Free Software Foundation instead of in the original English. -@end ifinfo - -@setchapternewpage odd - -@titlepage -@center @titlefont{User's Guide} -@sp 2 -@center @titlefont{for the} -@sp 2 -@center @titlefont{GNU @code{gperf} Utility} -@sp 4 -@center Douglas C. Schmidt -@sp 3 -@center last updated 1 November 1989 -@sp 1 -@center for version 2.0 -@page -@vskip 0pt plus 1filll -Copyright @copyright{} 1989 Free Software Foundation, Inc. - - -Permission is granted to make and distribute verbatim copies of -this manual provided the copyright notice and this permission notice -are preserved on all copies. - -Permission is granted to copy and distribute modified versions of this -manual under the conditions for verbatim copying, provided also that the -section entitled ``GNU @code{gperf} General Public License'' is included exactly as -in the original, and provided that the entire resulting derived work is -distributed under the terms of a permission notice identical to this one. - -Permission is granted to copy and distribute translations of this manual -into another language, under the above conditions for modified versions, -except that the section entitled ``GNU @code{gperf} General Public License'' may be -included in a translation approved by the author instead of in the original -English. -@end titlepage - -@ifinfo -@node Top, Copying, , (DIR) -@ichapter Introduction - -This manual documents the GNU @code{gperf} perfect hash function generator -utility, focusing on its features and how to use them, and how to report -bugs. - -@end ifinfo -@menu -* Copying:: GNU @code{gperf} General Public License says - how you can copy and share @code{gperf}. -* Contributors:: People who have contributed to @code{gperf}. -* Motivation:: Introduction to @code{gperf}. -* Search Structures:: Static search structures and GNU GPERF. -* Description:: High-level discussion of how GPERF functions. -* Options:: A description of options to the program. -* Bugs:: Known bugs and limitations with GPERF. -* Projects:: Things still left to do. -* Implementation:: Implementation Details for GNU GPERF. -* Bibliography:: Material Referenced in this Report. -@end menu - -@node Copying, Contributors, Top, Top -@unnumbered GNU GENERAL PUBLIC LICENSE -@center Version 1, February 1989 - -@display -Copyright @copyright{} 1989 Free Software Foundation, Inc. -675 Mass Ave, Cambridge, MA 02139, USA - -Everyone is permitted to copy and distribute verbatim copies -of this license document, but changing it is not allowed. -@end display - -@unnumberedsec Preamble - - The license agreements of most software companies try to keep users -at the mercy of those companies. By contrast, our General Public -License is intended to guarantee your freedom to share and change free -software---to make sure the software is free for all its users. The -General Public License applies to the Free Software Foundation's -software and to any other program whose authors commit to using it. -You can use it for your programs, too. - - When we speak of free software, we are referring to freedom, not -price. Specifically, the General Public License is designed to make -sure that you have the freedom to give away or sell copies of free -software, that you receive source code or can get it if you want it, -that you can change the software or use pieces of it in new free -programs; and that you know you can do these things. - - To protect your rights, we need to make restrictions that forbid -anyone to deny you these rights or to ask you to surrender the rights. -These restrictions translate to certain responsibilities for you if you -distribute copies of the software, or if you modify it. - - For example, if you distribute copies of a such a program, whether -gratis or for a fee, you must give the recipients all the rights that -you have. You must make sure that they, too, receive or can get the -source code. And you must tell them their rights. - - We protect your rights with two steps: (1) copyright the software, and -(2) offer you this license which gives you legal permission to copy, -distribute and/or modify the software. - - Also, for each author's protection and ours, we want to make certain -that everyone understands that there is no warranty for this free -software. If the software is modified by someone else and passed on, we -want its recipients to know that what they have is not the original, so -that any problems introduced by others will not reflect on the original -authors' reputations. - - The precise terms and conditions for copying, distribution and -modification follow. - -@iftex -@unnumberedsec TERMS AND CONDITIONS -@end iftex -@ifinfo -@center TERMS AND CONDITIONS -@end ifinfo - -@enumerate -@item -This License Agreement applies to any program or other work which -contains a notice placed by the copyright holder saying it may be -distributed under the terms of this General Public License. The -``Program'', below, refers to any such program or work, and a ``work based -on the Program'' means either the Program or any work containing the -Program or a portion of it, either verbatim or with modifications. Each -licensee is addressed as ``you''. - -@item -You may copy and distribute verbatim copies of the Program's source -code as you receive it, in any medium, provided that you conspicuously and -appropriately publish on each copy an appropriate copyright notice and -disclaimer of warranty; keep intact all the notices that refer to this -General Public License and to the absence of any warranty; and give any -other recipients of the Program a copy of this General Public License -along with the Program. You may charge a fee for the physical act of -transferring a copy. - -@item -You may modify your copy or copies of the Program or any portion of -it, and copy and distribute such modifications under the terms of Paragraph -1 above, provided that you also do the following: - -@itemize @bullet -@item -cause the modified files to carry prominent notices stating that -you changed the files and the date of any change; and - -@item -cause the whole of any work that you distribute or publish, that -in whole or in part contains the Program or any part thereof, either -with or without modifications, to be licensed at no charge to all -third parties under the terms of this General Public License (except -that you may choose to grant warranty protection to some or all -third parties, at your option). - -@item -If the modified program normally reads commands interactively when -run, you must cause it, when started running for such interactive use -in the simplest and most usual way, to print or display an -announcement including an appropriate copyright notice and a notice -that there is no warranty (or else, saying that you provide a -warranty) and that users may redistribute the program under these -conditions, and telling the user how to view a copy of this General -Public License. - -@item -You may charge a fee for the physical act of transferring a -copy, and you may at your option offer warranty protection in -exchange for a fee. -@end itemize - -Mere aggregation of another independent work with the Program (or its -derivative) on a volume of a storage or distribution medium does not bring -the other work under the scope of these terms. - -@item -You may copy and distribute the Program (or a portion or derivative of -it, under Paragraph 2) in object code or executable form under the terms of -Paragraphs 1 and 2 above provided that you also do one of the following: - -@itemize @bullet -@item -accompany it with the complete corresponding machine-readable -source code, which must be distributed under the terms of -Paragraphs 1 and 2 above; or, - -@item -accompany it with a written offer, valid for at least three -years, to give any third party free (except for a nominal charge -for the cost of distribution) a complete machine-readable copy of the -corresponding source code, to be distributed under the terms of -Paragraphs 1 and 2 above; or, - -@item -accompany it with the information you received as to where the -corresponding source code may be obtained. (This alternative is -allowed only for noncommercial distribution and only if you -received the program in object code or executable form alone.) -@end itemize - -Source code for a work means the preferred form of the work for making -modifications to it. For an executable file, complete source code means -all the source code for all modules it contains; but, as a special -exception, it need not include source code for modules which are standard -libraries that accompany the operating system on which the executable -file runs, or for standard header files or definitions files that -accompany that operating system. - -@item -You may not copy, modify, sublicense, distribute or transfer the -Program except as expressly provided under this General Public License. -Any attempt otherwise to copy, modify, sublicense, distribute or transfer -the Program is void, and will automatically terminate your rights to use -the Program under this License. However, parties who have received -copies, or rights to use copies, from you under this General Public -License will not have their licenses terminated so long as such parties -remain in full compliance. - -@item -By copying, distributing or modifying the Program (or any work based -on the Program) you indicate your acceptance of this license to do so, -and all its terms and conditions. - -@item -Each time you redistribute the Program (or any work based on the -Program), the recipient automatically receives a license from the original -licensor to copy, distribute or modify the Program subject to these -terms and conditions. You may not impose any further restrictions on the -recipients' exercise of the rights granted herein. - -@item -The Free Software Foundation may publish revised and/or new versions -of the General Public License from time to time. Such new versions will -be similar in spirit to the present version, but may differ in detail to -address new problems or concerns. - -Each version is given a distinguishing version number. If the Program -specifies a version number of the license which applies to it and ``any -later version'', you have the option of following the terms and conditions -either of that version or of any later version published by the Free -Software Foundation. If the Program does not specify a version number of -the license, you may choose any version ever published by the Free Software -Foundation. - -@item -If you wish to incorporate parts of the Program into other free -programs whose distribution conditions are different, write to the author -to ask for permission. For software which is copyrighted by the Free -Software Foundation, write to the Free Software Foundation; we sometimes -make exceptions for this. Our decision will be guided by the two goals -of preserving the free status of all derivatives of our free software and -of promoting the sharing and reuse of software generally. - -@iftex -@heading NO WARRANTY -@end iftex -@ifinfo -@center NO WARRANTY -@end ifinfo - -@item -BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY -FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN -OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES -PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED -OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS -TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE -PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, -REPAIR OR CORRECTION. - -@item -IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL -ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR -REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, -INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES -ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT -LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES -SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE -WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN -ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. -@end enumerate - -@iftex -@heading END OF TERMS AND CONDITIONS -@end iftex -@ifinfo -@center END OF TERMS AND CONDITIONS -@end ifinfo - -@page -@unnumberedsec Appendix: How to Apply These Terms to Your New Programs - - If you develop a new program, and you want it to be of the greatest -possible use to humanity, the best way to achieve this is to make it -free software which everyone can redistribute and change under these -terms. - - To do so, attach the following notices to the program. It is safest to -attach them to the start of each source file to most effectively convey -the exclusion of warranty; and each file should have at least the -``copyright'' line and a pointer to where the full notice is found. - -@smallexample -@var{one line to give the program's name and a brief idea of what it does.} -Copyright (C) 19@var{yy} @var{name of author} - -This program is free software; you can redistribute it and/or modify -it under the terms of the GNU General Public License as published by -the Free Software Foundation; either version 1, or (at your option) -any later version. - -This program is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -GNU General Public License for more details. - -You should have received a copy of the GNU General Public License -along with this program; if not, write to the Free Software -Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. -@end smallexample - -Also add information on how to contact you by electronic and paper mail. - -If the program is interactive, make it output a short notice like this -when it starts in an interactive mode: - -@smallexample -Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author} -Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. -This is free software, and you are welcome to redistribute it -under certain conditions; type `show c' for details. -@end smallexample - -The hypothetical commands `show w' and `show c' should show the -appropriate parts of the General Public License. Of course, the -commands you use may be called something other than `show w' and `show -c'; they could even be mouse-clicks or menu items---whatever suits your -program. - -You should also get your employer (if you work as a programmer) or your -school, if any, to sign a ``copyright disclaimer'' for the program, if -necessary. Here a sample; alter the names: - -@example -Yoyodyne, Inc., hereby disclaims all copyright interest in the -program `Gnomovision' (a program to direct compilers to make passes -at assemblers) written by James Hacker. - -@var{signature of Ty Coon}, 1 April 1989 -Ty Coon, President of Vice -@end example - -That's all there is to it! - -@node Contributors, Motivation, Copying, Top -@unnumbered Contributors to GNU @code{gperf} Utility - -@itemize @bullet -@item -The GNU @code{gperf} perfect hash function generator utility was -originally written in GNU C++ by Douglas C. Schmidt. It is now also -available in a highly-portable ``old-style'' C version. The general -idea for the perfect hash function generator was inspired by Keith -Bostic's algorithm written in C, and distributed to @code{net.sources} around -1984. The current program is a heavily modified, enhanced, and extended -implementation of Keith's basic idea, created at the University of -California, Irvine. Bugs, patches, and suggestions should be reported -to @code{schmidt@@ics.uci.edu}. - -@item -Special thanks is extended to Michael Tiemann and Doug Lea, for -providing a useful compiler, and for giving me a forum to exhibit my -creation. - -In addition, Adam de Boor and Nels Olson provided many tips and insights -that greatly helped improve the quality and functionality of @code{gperf}. -@end itemize - -@node Motivation, Search Structures, Contributors, Top -@chapter Introduction - -@code{gperf} is a perfect hash function generator written in C++. It -transforms an @emph{n} element user-specified keyword set @emph{W} into -a perfect hash function @emph{F}. @emph{F} uniquely maps keywords in -@emph{W} onto the range 0..@emph{k}, where @emph{k} >= @emph{n}. If -@emph{k = n} then @emph{F} is a @emph{minimal} perfect hash function. -@code{gperf} generates a 0..@emph{k} element static lookup table and a -pair of C functions. These functions determine whether a given -character string @emph{s} occurs in @emph{W}, using at most one probe -into the lookup table. - -@code{gperf} currently generates the reserved keyword recognizer for -lexical analyzers in several production and research compilers and -language processing tools, including GNU C, GNU C++, GNU Pascal, GNU -Modula 3, and GNU indent. Complete C++ source code for @code{gperf} is -available via anonymous ftp from @code{ics.uci.edu}. @code{gperf} also is -distributed along with the GNU libg++ library. Finally, a highly -portable, functionally equivalent K&R C version of @code{gperf} is -archived in @code{comp.sources.unix}, volume 20. - -@node Search Structures, Description, Motivation, Top -@chapter Static search structures and GNU @code{gperf} - -A @dfn{static search structure} is an Abstract Data Type with certain -fundamental operations, @emph{e.g.}, @emph{initialize}, @emph{insert}, -and @emph{retrieve}. Conceptually, all insertions occur before any -retrievals.@footnote{In practice, @code{gperf} generates a @code{static} -array containing search set keywords and any associated attributes -specified by the user. Thus, there is essentially no execution-time -cost for the insertions.} It is a useful data structure for -representing @emph{static search sets}. Static search sets occur -frequently in software system applications. Typical static search -sets include compiler reserved words, assembler instruction opcodes, -and built-in shell interpreter commands. Search set members, called -@dfn{keywords}, are inserted into the structure only once, usually -during program initialization, and are not generally modified at -run-time. - -Numerous static search structure implementations exist, @emph{e.g.}, -arrays, linked lists, binary search trees, digital search tries, and -hash tables. Different approaches offer trade-offs between space -utilization and search time efficiency. For example, an $n$ element -sorted array is space efficient, though the average-case time -complexity for retrieval operations using binary search is -proportional to $\log n$. Conversely, hash table implementations -often locate a table entry in constant time, but typically impose -additional memory overhead and exhibit poor worst case performance -@cite{aho, etc.}. - -@emph{Minimal perfect hash functions} provide an optimal solution for a -particular class of static search sets. A minimal perfect hash -function is defined by two properties: - -@itemize @bullet -@item It allows keyword recognition in a static search set using -at most @emph{one} probe into the hash table. This represents the -``perfect'' property. -@item The actual memory allocated to store the keywords is precisely -large enough for the keyword set, and @emph{no larger}. This is the -``minimal'' property. -@end itemize - -For most applications it is far easier to generate @emph{perfect} hash -functions than @emph{minimal perfect} hash functions @cite{many bozos}. -Moreover, non-minimal perfect hash functions frequently execute faster -than minimal ones in practice @cite{cichelli}. This phenomena occurs -since searching a sparse keyword table increases the probability of -locating a ``null'' entry, thereby reducing string comparisons. -@code{gperf}'s default behavior generates @emph{near-minimal} perfect hash -functions for keyword sets. However, @code{gperf} provides many -options that permit user control over the degree of minimality and -perfection. - -Static search sets often exhibit relative stability over time. For -example, Ada's 63 reserved words have remained constant for nearly a -decade. It is therefore frequently worthwhile to expend concerted -effort building an optimal search structure @emph{once}, if it -subsequently receives heavy use multiple times. @code{gperf} removes -the drudgery associated with constructing time- and space-efficient -search structures by hand. It has proven a useful and practical tool -for serious programming projects. Output from @code{gperf} is -currently used in several production and research compilers, including -GNU C, GNU C++, GNU Pascal, and GNU Modula 3. @footnote{The latter two -compilers are not yet part of the official GNU distribution.} Each -compiler utilizes @code{gperf} to automatically generate static search -structures that efficiently identify their respective reserved -keywords. - -@node Description, Options, Search Structures, Top -@chapter High-Level Description of GNU @code{gperf} - -@menu -* Input Format:: Input Format to @code{gperf} -* Output Format:: Output Format for Generated C Code with @code{gperf} -@end menu - -The perfect hash function generator @code{gperf} reads a set of -``keywords'' from a @dfn{keyfile} (or from the standard input by -default). It attempts to derive a perfect hashing function that -recognizes a member of the @dfn{static keyword set} with at most a -single probe into the lookup table. If @code{gperf} succeeds in -generating such a function it produces a pair of C source code routines -that perform hashing and table lookup recognition. All generated C code -is directed to the standard output. Command-line options described -below allow you to modify the input and output format to @code{gperf}. - -By default, @code{gperf} attempts to produce time-efficient code, with -less emphasis on efficient space utilization. However, several options -exist that permit trading-off execution time for storage space and vice -versa. In particular, expanding the generated table size produces a -sparse search structure, generally yielding faster searches. -Conversely, you can direct @code{gperf} to utilize a C @code{switch} -statement scheme that minimizes data space storage size. Furthermore, -using a C @code{switch} may actually speed up the keyword retrieval time -somewhat. Actual results depend on your C compiler, of course. - -In general, @code{gperf} assigns values to the characters it is using -for hashing until some set of values gives each keyword a unique value. -A helpful heuristic is that the larger the hash value range, the easier -it is for @code{gperf} to find and generate a perfect hash function. -Experimentation is the key to getting the most from @code{gperf}. - -@node Input Format, Declarations, Description, Description -@section Input Format to @code{gperf} - -You can control the input keyfile format by varying certain command-line -arguments, in particular the @samp{-t} option. The input's appearance -is similar to GNU utilities @code{flex} and @code{bison} (or UNIX -utilities @code{lex} and @code{yacc}). Here's an outline of the general -format: - -@group -@example -declarations -%% -keywords -%% -functions -@end example -@end group - -@emph{Unlike} @code{flex} or @code{bison}, all sections of @code{gperf}'s input -are optional. The following sections describe the input format for each -section. - -@menu -* Declarations:: @code{struct} Declarations and C Code Inclusion. -* Keywords:: Format for Keyword Entries. -* Functions:: Including Additional C Functions. -@end menu - -@node Declarations, Keywords, Input Format, Input Format -@subsection @code{struct} Declarations and C Code Inclusion - -The keyword input file optionally contains a section for including -arbitrary C declarations and definitions, as well as provisions for -providing a user-supplied @code{struct}. If the @samp{-t} option -@emph{is} enabled, you @emph{must} provide a C @code{struct} as the last -component in the declaration section from the keyfile file. The first -field in this struct must be a @code{char *} identifier called ``name,'' -although it is possible to modify this field's name with the @samp{-K} -option described below. - -Here is simple example, using months of the year and their attributes as -input: - -@group -@example -struct months @{ char *name; int number; int days; int leap_days; @}; -%% -january, 1, 31, 31 -february, 2, 28, 29 -march, 3, 31, 31 -april, 4, 30, 30 -may, 5, 31, 31 -june, 6, 30, 30 -july, 7, 31, 31 -august, 8, 31, 31 -september, 9, 30, 30 -october, 10, 31, 31 -november, 11, 30, 30 -december, 12, 31, 31 -@end example -@end group - -Separating the @code{struct} declaration from the list of key words and -other fields are a pair of consecutive percent signs, @code{%%}, -appearing left justified in the first column, as in the UNIX utility -@code{lex}. - -Using a syntax similar to GNU utilities @code{flex} and @code{bison}, it -is possible to directly include C source text and comments verbatim into -the generated output file. This is accomplished by enclosing the region -inside left-justified surrounding @code{%@{}, @code{%@}} pairs. Here is -an input fragment based on the previous example that illustrates this -feature: - -@group -@example -%@{ -#include <assert.h> -/* This section of code is inserted directly into the output. */ -int return_month_days (struct months *months, int is_leap_year); -%@} -struct months @{ char *name; int number; int days; int leap_days; @}; -%% -january, 1, 31, 31 -february, 2, 28, 29 -march, 3, 31, 31 -... -@end example -@end group - -It is possible to omit the declaration section entirely. In this case -the keyfile begins directly with the first keyword line, @emph{e.g.}: - -@group -@example -january, 1, 31, 31 -february, 2, 28, 29 -march, 3, 31, 31 -april, 4, 30, 30 -... -@end example -@end group - -@node Keywords, Functions, Declarations, Input Format -@subsection Format for Keyword Entries - -The second keyfile format section contains lines of keywords and any -associated attributes you might supply. A line beginning with @samp{#} -in the first column is considered a comment. Everything following the -@samp{#} is ignored, up to and including the following newline. - -The first field of each non-comment line is always the key itself. It -should be given as a simple name, @emph{i.e.}, without surrounding -string quotation marks, and be left-justified flush against the first -column. In this context, a ``field'' is considered to extend up to, but -not include, the first blank, comma, or newline. Here is a simple -example taken from a partial list of C reserved words: - -@group -@example -# These are a few C reserved words, see the c.@code{gperf} file -# for a complete list of ANSI C reserved words. -unsigned -sizeof -switch -signed -if -default -for -while -return -@end example -@end group - -Note that unlike @code{flex} or @code{bison} the first @code{%%} marker -may be elided if the declaration section is empty. - -Additional fields may optionally follow the leading keyword. Fields -should be separated by commas, and terminate at the end of line. What -these fields mean is entirely up to you; they are used to initialize the -elements of the user-defined @code{struct} provided by you in the -declaration section. If the @samp{-t} option is @emph{not} enabled -these fields are simply ignored. All previous examples except the last -one contain keyword attributes. - -@node Functions, Output Format, Keywords, Input Format -@subsection Including Additional C Functions - -The optional third section also corresponds closely with conventions -found in @code{flex} and @code{bison}. All text in this section, -starting at the final @code{%%} and extending to the end of the input -file, is included verbatim into the generated output file. Naturally, -it is your responsibility to ensure that the code contained in this -section is valid C. - -@node Output Format, , Functions, Description -@section Output Format for Generated C Code with @code{gperf} - -Several options control how the generated C code appears on the standard -output. Two C function are generated. They are called @code{hash} and -@code{in_word_set}, although you may modify the name for -@code{in_word_set} with a command-line option. Both functions require -two arguments, a string, @code{char *} @var{str}, and a length -parameter, @code{int} @var{len}. Their default function prototypes are -as follows: - -@group -@example -static int hash (char *str, int len); -int in_word_set (char *str, int len); -@end example -@end group - -By default, the generated @code{hash} function returns an integer value -created by adding @var{len} to several user-specified @var{str} key -positions indexed into an @dfn{associated values} table stored in a -local static array. The associated values table is constructed -internally by @code{gperf} and later output as a static local C array called -@var{hash_table}; its meaning and properties are described below. -@xref{Implementation}. The relevant key positions are specified via the -@samp{-k} option when running @code{gperf}, as detailed in the @emph{Options} -section below. @xref{Options}. - -Two options, @samp{-g} (assume you are compiling with GNU C and its -@code{inline} feature) and @samp{-a} (assume ANSI C-style function -prototypes), alter the content of both the generated @code{hash} and -@code{in_word_set} routines. However, function @code{in_word_set} may -be modified more extensively, in response to your option settings. The -options that affect the @code{in_word_set} structure are: - -@itemize @bullet -@table @samp -@item -p -Have function @code{in_word_set} return a pointer rather than a boolean. - -@item -t -Make use of the user-defined @code{struct}. - -@item -S @var{total switch statements} -Generate 1 or more C @code{switch} statement rather than use a large, -(and potentially sparse) static array. Although the exact time and -space savings of this approach vary according to your C compiler's -degree of optimization, this method often results in smaller and faster -code. -@end table -@end itemize - -If the @samp{-t}, @samp{-S}, and @samp{-p} options are omitted the -default action is to generate a @code{char *} array containing the keys, -together with additional null strings used for padding the array. By -experimenting with the various input and output options, and timing the -resulting C code, you can determine the best option choices for -different keyword set characteristics. - -@node Options, Bugs, Description, Top -@chapter Options to the @code{gperf} Utility - -There are @emph{many} options to @code{gperf}. They were added to make -the program more convenient for use with real applications. ``On-line'' -help is readily available via the @samp{-h} option. Other options -include: - -@itemize @bullet -@table @samp -@item -a -Generate ANSI Standard C code using function prototypes. The default is -to use ``classic'' K&R C function declaration syntax. - -@item -c -Generates C code that uses the @code{strncmp} function to perform -string comparisons. The default action is to use @code{strcmp}. - -@item -C -Makes the contents of all generated lookup tables constant, @emph{i.e.}, -``readonly.'' Many compilers can generate more efficient code for this -by putting the tables in readonly memory. - -@item -d -Enables the debugging option. This produces verbose diagnostics to -``standard error'' when @code{gperf} is executing. It is useful both for -maintaining the program and for determining whether a given set of -options is actually speeding up the search for a solution. Some useful -information is dumped at the end of the program when the @samp{-d} -option is enabled. - -@item -D -Handle keywords whose key position sets hash to duplicate values. -Duplicate hash values occur for two reasons: - -@itemize @bullet -@item -Since @code{gperf} does not backtrack it is possible for it to process -all your input keywords without finding a unique mapping for each word. -However, frequently only a very small number of duplicates occur, and -the majority of keys still require one probe into the table. -@item -Sometimes a set of keys may have the same names, but possess different -attributes. With the -D option @code{gperf} treats all these keys as part of -an equivalence class and generates a perfect hash function with multiple -comparisons for duplicate keys. It is up to you to completely -disambiguate the keywords by modifying the generated C code. However, -@code{gperf} helps you out by organizing the output. -@end itemize - -Option @samp{-D} is extremely useful for certain large or highly -redundant keyword sets, @emph{i.e.}, assembler instruction opcodes. -Using this option usually means that the generated hash function is no -longer perfect. On the other hand, it permits @code{gperf} to work on keyword -sets that it otherwise could not handle. - -@item -e @var{keyword delimiter list} -Allows the user to provide a string containing delimiters used to -separate keywords from their attributes. The default is ",\n". This -option is essential if you want to use keywords that have embedded -commas or newlines. One useful trick is to use -e'TAB', where TAB is -the literal tab character. - -@item -f @var{iteration amount} -Generate the perfect hash function ``fast.'' This decreases @code{gperf}'s -running time at the cost of minimizing generated table-size. The -iteration amount represents the number of times to iterate when -resolving a collision. `0' means `iterate by the number of keywords. -This option is probably most useful when used in conjunction with options -@samp{-D} and/or @samp{-S} for @emph{large} keyword sets. - -@item -g -Assume a GNU compiler, @emph{e.g.}, @code{g++} or @code{gcc}. This -makes all generated routines use the ``inline'' keyword to remove the -cost of function calls. Note that @samp{-g} does @emph{not} imply -@samp{-a}, since other non-ANSI C compilers may have provisions for a -function @code{inline} feature. - -@item -G -Generate the static table of keywords as a static global variable, -rather than hiding it inside of the lookup function (which is the -default behavior). - -@item -h -Prints a short summary on the meaning of each program option. Aborts -further program execution. - -@item -H @var{hash function name} -Allows you to specify the name for the generated hash function. Default -name is `hash.' This option permits the use of two hash tables in the -same file. - -@item -i @var{initial value} -Provides an initial @var{value} for the associate values array. Default -is 0. Increasing the initial value helps inflate the final table size, -possibly leading to more time efficient keyword lookups. Note that this -option is not particularly useful when @samp{-S} is used. Also, -@samp{-i} is overriden when the @samp{-r} option is used. - -@item -j @var{jump value} -Affects the ``jump value,'' @emph{i.e.}, how far to advance the -associated character value upon collisions. @var{Jump value} is rounded -up to an odd number, the default is 5. If the @var{jump value} is 0 @code{gperf} -jumps by random amounts. - -@item -k @var{keys} -Allows selection of the character key positions used in the keywords' -hash function. The allowable choices range between 1-126, inclusive. -The positions are separated by commas, @emph{e.g.}, @samp{-k 9,4,13,14}; -ranges may be used, @emph{e.g.}, @samp{-k 2-7}; and positions may occur -in any order. Furthermore, the meta-character '*' causes the generated -hash function to consider @strong{all} character positions in each key, -whereas '$' instructs the hash function to use the ``final character'' -of a key (this is the only way to use a character position greater than -126, incidentally). - -For instance, the option @samp{-k 1,2,4,6-10,'$'} generates a hash -function that considers positions 1,2,4,6,7,8,9,10, plus the last -character in each key (which may differ for each key, obviously). Keys -with length less than the indicated key positions work properly, since -selected key positions exceeding the key length are simply not -referenced in the hash function. - -@item -K @var{key name} -By default, the program assumes the structure component identifier for -the keyword is ``name.'' This option allows an arbitrary choice of -identifier for this component, although it still must occur as the first -field in your supplied @code{struct}. - -@item -l -Compare key lengths before trying a string comparison. This might cut -down on the number of string comparisons made during the lookup, since -keys with different lengths are never compared via @code{strcmp}. -However, using @samp{-l} might greatly increase the size of the -generated C code if the lookup table range is large (which implies that -the switch option @samp{-S} is not enabled), since the length table -contains as many elements as there are entries in the lookup table. - -@item -n -Instructs the generator not to include the length of a keyword when -computing its hash value. This may save a few assembly instructions in -the generated lookup table. - -@item -N @var{lookup function name} -Allows you to specify the name for the generated lookup function. -Default name is `in_word_set.' This option permits completely automatic -generation of perfect hash functions, especially when multiple generated -hash functions are used in the same application. - -@item -o -Reorders the keywords by sorting the keywords so that frequently -occuring key position set components appear first. A second reordering -pass follows so that keys with ``already determined values'' are placed -towards the front of the keylist. This may decrease the time required -to generate a perfect hash function for many keyword sets, and also -produce more minimal perfect hash functions. The reason for this is -that the reordering helps prune the search time by handling inevitable -collisions early in the search process. On the other hand, if the -number of keywords is @emph{very} large using @samp{-o} may -@emph{increase} @code{gperf}'s execution time, since collisions will begin -earlier and continue throughout the remainder of keyword processing. -See Cichelli's paper from the January 1980 Communications of the ACM for -details. - -@item -p -Changes the return value of the generated function @code{in_word_set} -from boolean (@emph{i.e.}, 0 or 1), to either type ``pointer to -user-defined struct,'' (if the @samp{-t} option is enabled), or simply -to @code{char *}, if @samp{-t} is not enabled. This option is most -useful when the @samp{-t} option (allowing user-defined structs) is -used. For example, it is possible to automatically generate the GNU C -reserved word lookup routine with the options @samp{-p} and @samp{-t}. - -@item -r -Utilizes randomness to initialize the associated values table. This -frequently generates solutions faster than using deterministic -initialization (which starts all associated values at 0). Furthermore, -using the randomization option generally increases the size of the -table. If @code{gperf} has difficultly with a certain keyword set try using -@samp{-r} or @samp{-D}. - -@item -s @var{size-multiple} -Affects the size of the generated hash table. The numeric argument for -this option indicates ``how many times larger'' the maximum associated -value range should be, in relationship to the number of keys. For -example, a value of 3 means ``allow the maximum associated value to be -about 3 times larger than the number of input keys.'' If option -@samp{-S} is @emph{not} enabled, the maximum associated value influences -the static array table size, and a larger table should decrease the time -required for an unsuccessful search, at the expense of extra table -space. - -The default value is 1, thus the default maximum associated value about -the same size as the number of keys ( for efficiency, the maximum -associated value is always rounded up to a power of 2). The actual -table size may vary somewhat, since this technique is essentially a -heuristic. In particular, setting this value too high slows down -@code{gperf}'s runtime, since it must search through a much larger range of -values. Judicious use of the @samp{-f} option helps alleviate this -overhead, however. - -@item -S @var{total switch statements} -Causes the generated C code to use a @code{switch} statement scheme, -rather than an array lookup table. This can lead to a reduction in both -time and space requirements for some keyfiles. The argument to this -option determines how many @code{switch} statements are generated. A -value of 1 generates 1 @code{switch} containing all the elements, a -value of 2 generates 2 tables with 1/2 the elements in each -@code{switch}, etc. This is useful since many C compilers cannot -correctly generate code for large @code{switch} statements. This option -was inspired in part by Keith Bostic's original C program. - -@item -t -Allows you to include a @code{struct} type declaration for generated -code. Any text before a pair of consecutive %% is consider part of the -type declaration. Key words and additional fields may follow this, one -group of fields per line. A set of examples for generating perfect hash -tables and functions for Ada, C, and G++, Pascal, and Modula 2 and 3 -reserved words are distributed with this release. - -@item -T -Prevents the transfer of the type declaration to the output file. Use -this option if the type is already defined elsewhere. - -@item -v -Prints out the current version number. -@end table -@end itemize - -@node Bugs, Projects, Options, Top -@chapter Known Bugs and Limitations with @code{gperf} - -The following are some limitations with the current release of -@code{gperf}: - -@itemize @bullet -@item -The @code{gperf} utility is tuned to execute quickly, and works quickly -for small to medium size data sets (around 1000 keywords). It is -extremely useful for maintaining perfect hash functions for compiler -keyword sets. Several recent enhancements now enable @code{gperf} to -work efficiently on much larger keyword sets (over 15,000 keywords). -When processing large keyword sets it helps greatly to have over 8 megs -of RAM. - -However, since @code{gperf} does not backtrack no guaranteed solution -occurs on every run. On the other hand, it is usually easy to obtain a -solution by varying the option parameters. In particular, try the -@samp{-r} option, and also try changing the default arguments to the -@samp{-s} and @samp{-j} options. To @emph{guarantee} a solution, use -the @samp{-D} and @samp{-S} options, although the final results are not -likely to be a @emph{perfect} hash function anymore! Finally, use the -@samp{-f} option if you want @code{gperf} to generate the perfect hash -function @emph{fast}, with less emphasis on making it minimal. - -@item -The size of the generate static keyword array can get @emph{extremely} -large if the input keyword file is large or if the keywords are quite -similar. This tends to slow down the compilation of the generated C -code, and @emph{greatly} inflates the object code size. If this -situation occurs, consider using the @samp{-S} option to reduce data -size, potentially increasing keyword recognition time a negligible -amount. Since many C compilers cannot correctly generated code for -large switch statements it is important to qualify the @var{-S} option -with an appropriate numerical argument that controls the number of -switch statements generated. - -@item -The maximum number of key positions selected for a given key has an -arbitrary limit of 126. This restriction should be removed, and if -anyone considers this a problem write me and let me know so I can remove -the constraint. - -@item -The C++ source code only compiles correctly with GNU G++, version 1.36 -(and hopefully later versions). Porting to AT&T cfront would be -tedious, but possible (and desirable). There is also a K&R C version -available now. This should compile without change on most BSD systems, -but may require a bit of work to run on SYSV, since @code{gperf} uses -@var{alloca} in several places. Send mail to @code{schmidt@@ics.uci.edu} for -information. -@end itemize - -@node Projects, Implementation, Bugs, Top -@chapter Things Still Left to Do - -It should be ``relatively'' easy to replace the current perfect hash -function algorithm with a more exhaustive approach; the perfect hash -module is essential independent from other program modules. Additional -worthwhile improvements include: - -@itemize @bullet -@item -Make the algorithm more robust. At present, the program halts with an -error diagnostic if it can't find a direct solution and the @samp{-D} -option is not enabled. A more comprehensive, albeit computationally -expensive, approach would employ backtracking or enable alternative -options and retry. It's not clear how helpful this would be, in -general, since most search sets are rather small in practice. - -@item -Another useful extension involves modifying the program to generate -``minimal'' perfect hash functions (under certain circumstances, the -current version can be rather extravagant in the generated table size). -Again, this is mostly of theoretical interest, since a sparse table -often produces faster lookups, and use of the @samp{-S} @code{switch} -option can minimize the data size, at the expense of slightly longer -lookups (note that the gcc compiler generally produces good code for -@code{switch} statements, reducing the need for more complex schemes). - -@item -In addition to improving the algorithm, it would also be useful to -generate a C++ class or Ada package as the code output, in addition to -the current C routines. -@end itemize - -@node Implementation, Bibliography, Projects, Top -@chapter Implementation Details of GNU @code{gperf} - -A paper describing the high-level description of the data structures and -algorithms used to implement @code{gperf} will soon be available. This -paper is useful not only from a maintenance and enhancement perspective, -but also because they demonstrate several clever and useful programming -techniques, @emph{e.g.}, `Iteration Number' boolean arrays, double -hashing, a ``safe'' and efficient method for reading arbitrarily long -input from a file, and a provably optimal algorithm for simultaneously -determining both the minimum and maximum elements in a list. - -@page - -@node Bibliography, , Implementation, Top -@chapter Bibliography - -[1] Chang, C.C.: @i{A Scheme for Constructing Ordered Minimal Perfect -Hashing Functions} Information Sciences 39(1986), 187-195. - -[2] Cichelli, Richard J. @i{Author's Response to ``On Cichelli's Minimal Perfect Hash -Functions Method''} Communications of the ACM, 23, 12(December 1980), 729. - -[3] Cichelli, Richard J. @i{Minimal Perfect Hash Functions Made Simple} -Communications of the ACM, 23, 1(January 1980), 17-19. - -[4] Cook, C. R. and Oldehoeft, R.R. @i{A Letter Oriented Minimal -Perfect Hashing Function} SIGPLAN Notices, 17, 9(September 1982), 18-27. - -[5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M. -@i{Practical Perfect Hashing} Computer Journal, 28, 1(January 1985), 54-58. - -[6] Jaeschke, G. @i{Reciprocal Hashing: A Method for Generating Minimal -Perfect Hashing Functions} Communications of the ACM, 24, 12(December -1981), 829-833. - -[7] Jaeschke, G. and Osterburg, G. @i{On Cichelli's Minimal Perfect -Hash Functions Method} Communications of the ACM, 23, 12(December 1980), -728-729. - -[8] Sager, Thomas J. @i{A Polynomial Time Generator for Minimal Perfect -Hash Functions} Communications of the ACM, 28, 5(December 1985), 523-532 - -[9] Sebesta, R.W. and Taylor, M.A. @i{Minimal Perfect Hash Functions -for Reserved Word Lists} SIGPLAN Notices, 20, 12(September 1985), 47-53. -@contents - -[10] Sprugnoli, R. @i{Perfect Hashing Functions: A Single Probe -Retrieving Method for Static Sets} Communications of the ACM, 20 -11(November 1977), 841-850. - -[11] Stallman, Richard M. @i{Using and Porting GNU CC} Free Software Foundation, -1988. - -[12] Stroustrup, Bjarne @i{The C++ Programming Language.} Addison-Wesley, 1986. - -[13] Tiemann, Michael D. @i{User's Guide to GNU C++} Free Software -Foundation, 1989. -@bye |