diff options
author | conklin <conklin@FreeBSD.org> | 1993-07-30 20:16:53 +0000 |
---|---|---|
committer | conklin <conklin@FreeBSD.org> | 1993-07-30 20:16:53 +0000 |
commit | 5e0c8d9ee2600d5f7d5c710e34249ae51db60ba7 (patch) | |
tree | e01cf2a5cc6062467dbb628a7beef06eaa39845d /gnu/lib/libregex/doc/regex.info | |
parent | 605993f7be3eb2154f219e29b739c6d76f373405 (diff) | |
download | FreeBSD-src-5e0c8d9ee2600d5f7d5c710e34249ae51db60ba7.zip FreeBSD-src-5e0c8d9ee2600d5f7d5c710e34249ae51db60ba7.tar.gz |
GNU Regex 0.12
Diffstat (limited to 'gnu/lib/libregex/doc/regex.info')
-rw-r--r-- | gnu/lib/libregex/doc/regex.info | 2836 |
1 files changed, 2836 insertions, 0 deletions
diff --git a/gnu/lib/libregex/doc/regex.info b/gnu/lib/libregex/doc/regex.info new file mode 100644 index 0000000..90deede --- /dev/null +++ b/gnu/lib/libregex/doc/regex.info @@ -0,0 +1,2836 @@ +This is Info file regex.info, produced by Makeinfo-1.52 from the input +file .././doc/regex.texi. + + This file documents the GNU regular expression library. + + Copyright (C) 1992, 1993 Free Software Foundation, Inc. + + Permission is granted to make and distribute verbatim copies of this +manual provided the copyright notice and this permission notice are +preserved on all copies. + + Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided also that the +section entitled "GNU General Public License" is included exactly as in +the original, and provided that the entire resulting derived work is +distributed under the terms of a permission notice identical to this +one. + + Permission is granted to copy and distribute translations of this +manual into another language, under the above conditions for modified +versions, except that the section entitled "GNU General Public License" +may be included in a translation approved by the Free Software +Foundation instead of in the original English. + + +File: regex.info, Node: Top, Next: Overview, Prev: (dir), Up: (dir) + +Regular Expression Library +************************** + + This manual documents how to program with the GNU regular expression +library. This is edition 0.12a of the manual, 19 September 1992. + + The first part of this master menu lists the major nodes in this Info +document, including the index. The rest of the menu lists all the +lower level nodes in the document. + +* Menu: + +* Overview:: +* Regular Expression Syntax:: +* Common Operators:: +* GNU Operators:: +* GNU Emacs Operators:: +* What Gets Matched?:: +* Programming with Regex:: +* Copying:: Copying and sharing Regex. +* Index:: General index. + -- The Detailed Node Listing -- + +Regular Expression Syntax + +* Syntax Bits:: +* Predefined Syntaxes:: +* Collating Elements vs. Characters:: +* The Backslash Character:: + +Common Operators + +* Match-self Operator:: Ordinary characters. +* Match-any-character Operator:: . +* Concatenation Operator:: Juxtaposition. +* Repetition Operators:: * + ? {} +* Alternation Operator:: | +* List Operators:: [...] [^...] +* Grouping Operators:: (...) +* Back-reference Operator:: \digit +* Anchoring Operators:: ^ $ + +Repetition Operators + +* Match-zero-or-more Operator:: * +* Match-one-or-more Operator:: + +* Match-zero-or-one Operator:: ? +* Interval Operators:: {} + +List Operators (`[' ... `]' and `[^' ... `]') + +* Character Class Operators:: [:class:] +* Range Operator:: start-end + +Anchoring Operators + +* Match-beginning-of-line Operator:: ^ +* Match-end-of-line Operator:: $ + +GNU Operators + +* Word Operators:: +* Buffer Operators:: + +Word Operators + +* Non-Emacs Syntax Tables:: +* Match-word-boundary Operator:: \b +* Match-within-word Operator:: \B +* Match-beginning-of-word Operator:: \< +* Match-end-of-word Operator:: \> +* Match-word-constituent Operator:: \w +* Match-non-word-constituent Operator:: \W + +Buffer Operators + +* Match-beginning-of-buffer Operator:: \` +* Match-end-of-buffer Operator:: \' + +GNU Emacs Operators + +* Syntactic Class Operators:: + +Syntactic Class Operators + +* Emacs Syntax Tables:: +* Match-syntactic-class Operator:: \sCLASS +* Match-not-syntactic-class Operator:: \SCLASS + +Programming with Regex + +* GNU Regex Functions:: +* POSIX Regex Functions:: +* BSD Regex Functions:: + +GNU Regex Functions + +* GNU Pattern Buffers:: The re_pattern_buffer type. +* GNU Regular Expression Compiling:: re_compile_pattern () +* GNU Matching:: re_match () +* GNU Searching:: re_search () +* Matching/Searching with Split Data:: re_match_2 (), re_search_2 () +* Searching with Fastmaps:: re_compile_fastmap () +* GNU Translate Tables:: The `translate' field. +* Using Registers:: The re_registers type and related fns. +* Freeing GNU Pattern Buffers:: regfree () + +POSIX Regex Functions + +* POSIX Pattern Buffers:: The regex_t type. +* POSIX Regular Expression Compiling:: regcomp () +* POSIX Matching:: regexec () +* Reporting Errors:: regerror () +* Using Byte Offsets:: The regmatch_t type. +* Freeing POSIX Pattern Buffers:: regfree () + +BSD Regex Functions + +* BSD Regular Expression Compiling:: re_comp () +* BSD Searching:: re_exec () + + +File: regex.info, Node: Overview, Next: Regular Expression Syntax, Prev: Top, Up: Top + +Overview +******** + + A "regular expression" (or "regexp", or "pattern") is a text string +that describes some (mathematical) set of strings. A regexp R +"matches" a string S if S is in the set of strings described by R. + + Using the Regex library, you can: + + * see if a string matches a specified pattern as a whole, and + + * search within a string for a substring matching a specified + pattern. + + Some regular expressions match only one string, i.e., the set they +describe has only one member. For example, the regular expression +`foo' matches the string `foo' and no others. Other regular +expressions match more than one string, i.e., the set they describe has +more than one member. For example, the regular expression `f*' matches +the set of strings made up of any number (including zero) of `f's. As +you can see, some characters in regular expressions match themselves +(such as `f') and some don't (such as `*'); the ones that don't match +themselves instead let you specify patterns that describe many +different strings. + + To either match or search for a regular expression with the Regex +library functions, you must first compile it with a Regex pattern +compiling function. A "compiled pattern" is a regular expression +converted to the internal format used by the library functions. Once +you've compiled a pattern, you can use it for matching or searching any +number of times. + + The Regex library consists of two source files: `regex.h' and +`regex.c'. Regex provides three groups of functions with which you can +operate on regular expressions. One group--the GNU group--is more +powerful but not completely compatible with the other two, namely the +POSIX and Berkeley UNIX groups; its interface was designed specifically +for GNU. The other groups have the same interfaces as do the regular +expression functions in POSIX and Berkeley UNIX. + + We wrote this chapter with programmers in mind, not users of +programs--such as Emacs--that use Regex. We describe the Regex library +in its entirety, not how to write regular expressions that a particular +program understands. + + +File: regex.info, Node: Regular Expression Syntax, Next: Common Operators, Prev: Overview, Up: Top + +Regular Expression Syntax +************************* + + "Characters" are things you can type. "Operators" are things in a +regular expression that match one or more characters. You compose +regular expressions from operators, which in turn you specify using one +or more characters. + + Most characters represent what we call the match-self operator, i.e., +they match themselves; we call these characters "ordinary". Other +characters represent either all or parts of fancier operators; e.g., +`.' represents what we call the match-any-character operator (which, no +surprise, matches (almost) any character); we call these characters +"special". Two different things determine what characters represent +what operators: + + 1. the regular expression syntax your program has told the Regex + library to recognize, and + + 2. the context of the character in the regular expression. + + In the following sections, we describe these things in more detail. + +* Menu: + +* Syntax Bits:: +* Predefined Syntaxes:: +* Collating Elements vs. Characters:: +* The Backslash Character:: + + +File: regex.info, Node: Syntax Bits, Next: Predefined Syntaxes, Up: Regular Expression Syntax + +Syntax Bits +=========== + + In any particular syntax for regular expressions, some characters are +always special, others are sometimes special, and others are never +special. The particular syntax that Regex recognizes for a given +regular expression depends on the value in the `syntax' field of the +pattern buffer of that regular expression. + + You get a pattern buffer by compiling a regular expression. *Note +GNU Pattern Buffers::, and *Note POSIX Pattern Buffers::, for more +information on pattern buffers. *Note GNU Regular Expression +Compiling::, *Note POSIX Regular Expression Compiling::, and *Note BSD +Regular Expression Compiling::, for more information on compiling. + + Regex considers the value of the `syntax' field to be a collection of +bits; we refer to these bits as "syntax bits". In most cases, they +affect what characters represent what operators. We describe the +meanings of the operators to which we refer in *Note Common Operators::, +*Note GNU Operators::, and *Note GNU Emacs Operators::. + + For reference, here is the complete list of syntax bits, in +alphabetical order: + +`RE_BACKSLASH_ESCAPE_IN_LISTS' + If this bit is set, then `\' inside a list (*note List Operators::. + quotes (makes ordinary, if it's special) the following character; + if this bit isn't set, then `\' is an ordinary character inside + lists. (*Note The Backslash Character::, for what `\' does + outside of lists.) + +`RE_BK_PLUS_QM' + If this bit is set, then `\+' represents the match-one-or-more + operator and `\?' represents the match-zero-or-more operator; if + this bit isn't set, then `+' represents the match-one-or-more + operator and `?' represents the match-zero-or-one operator. This + bit is irrelevant if `RE_LIMITED_OPS' is set. + +`RE_CHAR_CLASSES' + If this bit is set, then you can use character classes in lists; + if this bit isn't set, then you can't. + +`RE_CONTEXT_INDEP_ANCHORS' + If this bit is set, then `^' and `$' are special anywhere outside + a list; if this bit isn't set, then these characters are special + only in certain contexts. *Note Match-beginning-of-line + Operator::, and *Note Match-end-of-line Operator::. + +`RE_CONTEXT_INDEP_OPS' + If this bit is set, then certain characters are special anywhere + outside a list; if this bit isn't set, then those characters are + special only in some contexts and are ordinary elsewhere. + Specifically, if this bit isn't set then `*', and (if the syntax + bit `RE_LIMITED_OPS' isn't set) `+' and `?' (or `\+' and `\?', + depending on the syntax bit `RE_BK_PLUS_QM') represent repetition + operators only if they're not first in a regular expression or + just after an open-group or alternation operator. The same holds + for `{' (or `\{', depending on the syntax bit `RE_NO_BK_BRACES') if + it is the beginning of a valid interval and the syntax bit + `RE_INTERVALS' is set. + +`RE_CONTEXT_INVALID_OPS' + If this bit is set, then repetition and alternation operators + can't be in certain positions within a regular expression. + Specifically, the regular expression is invalid if it has: + + * a repetition operator first in the regular expression or just + after a match-beginning-of-line, open-group, or alternation + operator; or + + * an alternation operator first or last in the regular + expression, just before a match-end-of-line operator, or just + after an alternation or open-group operator. + + If this bit isn't set, then you can put the characters + representing the repetition and alternation characters anywhere in + a regular expression. Whether or not they will in fact be + operators in certain positions depends on other syntax bits. + +`RE_DOT_NEWLINE' + If this bit is set, then the match-any-character operator matches + a newline; if this bit isn't set, then it doesn't. + +`RE_DOT_NOT_NULL' + If this bit is set, then the match-any-character operator doesn't + match a null character; if this bit isn't set, then it does. + +`RE_INTERVALS' + If this bit is set, then Regex recognizes interval operators; if + this bit isn't set, then it doesn't. + +`RE_LIMITED_OPS' + If this bit is set, then Regex doesn't recognize the + match-one-or-more, match-zero-or-one or alternation operators; if + this bit isn't set, then it does. + +`RE_NEWLINE_ALT' + If this bit is set, then newline represents the alternation + operator; if this bit isn't set, then newline is ordinary. + +`RE_NO_BK_BRACES' + If this bit is set, then `{' represents the open-interval operator + and `}' represents the close-interval operator; if this bit isn't + set, then `\{' represents the open-interval operator and `\}' + represents the close-interval operator. This bit is relevant only + if `RE_INTERVALS' is set. + +`RE_NO_BK_PARENS' + If this bit is set, then `(' represents the open-group operator and + `)' represents the close-group operator; if this bit isn't set, + then `\(' represents the open-group operator and `\)' represents + the close-group operator. + +`RE_NO_BK_REFS' + If this bit is set, then Regex doesn't recognize `\'DIGIT as the + back reference operator; if this bit isn't set, then it does. + +`RE_NO_BK_VBAR' + If this bit is set, then `|' represents the alternation operator; + if this bit isn't set, then `\|' represents the alternation + operator. This bit is irrelevant if `RE_LIMITED_OPS' is set. + +`RE_NO_EMPTY_RANGES' + If this bit is set, then a regular expression with a range whose + ending point collates lower than its starting point is invalid; if + this bit isn't set, then Regex considers such a range to be empty. + +`RE_UNMATCHED_RIGHT_PAREN_ORD' + If this bit is set and the regular expression has no matching + open-group operator, then Regex considers what would otherwise be + a close-group operator (based on how `RE_NO_BK_PARENS' is set) to + match `)'. + + +File: regex.info, Node: Predefined Syntaxes, Next: Collating Elements vs. Characters, Prev: Syntax Bits, Up: Regular Expression Syntax + +Predefined Syntaxes +=================== + + If you're programming with Regex, you can set a pattern buffer's +(*note GNU Pattern Buffers::., and *Note POSIX Pattern Buffers::) +`syntax' field either to an arbitrary combination of syntax bits (*note +Syntax Bits::.) or else to the configurations defined by Regex. These +configurations define the syntaxes used by certain programs--GNU Emacs, +POSIX Awk, traditional Awk, Grep, Egrep--in addition to syntaxes for +POSIX basic and extended regular expressions. + + The predefined syntaxes-taken directly from `regex.h'--are: + + #define RE_SYNTAX_EMACS 0 + + #define RE_SYNTAX_AWK \ + (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL \ + | RE_NO_BK_PARENS | RE_NO_BK_REFS \ + | RE_NO_BK_VBAR | RE_NO_EMPTY_RANGES \ + | RE_UNMATCHED_RIGHT_PAREN_ORD) + + #define RE_SYNTAX_POSIX_AWK \ + (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS) + + #define RE_SYNTAX_GREP \ + (RE_BK_PLUS_QM | RE_CHAR_CLASSES \ + | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \ + | RE_NEWLINE_ALT) + + #define RE_SYNTAX_EGREP \ + (RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \ + | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \ + | RE_NEWLINE_ALT | RE_NO_BK_PARENS \ + | RE_NO_BK_VBAR) + + #define RE_SYNTAX_POSIX_EGREP \ + (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES) + + /* P1003.2/D11.2, section 4.20.7.1, lines 5078ff. */ + #define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC + + #define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC + + /* Syntax bits common to both basic and extended POSIX regex syntax. */ + #define _RE_SYNTAX_POSIX_COMMON \ + (RE_CHAR_CLASSES | RE_DOT_NEWLINE | RE_DOT_NOT_NULL \ + | RE_INTERVALS | RE_NO_EMPTY_RANGES) + + #define RE_SYNTAX_POSIX_BASIC \ + (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM) + + /* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes + RE_LIMITED_OPS, i.e., \? \+ \| are not recognized. Actually, this + isn't minimal, since other operators, such as \`, aren't disabled. */ + #define RE_SYNTAX_POSIX_MINIMAL_BASIC \ + (_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS) + + #define RE_SYNTAX_POSIX_EXTENDED \ + (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \ + | RE_CONTEXT_INDEP_OPS | RE_NO_BK_BRACES \ + | RE_NO_BK_PARENS | RE_NO_BK_VBAR \ + | RE_UNMATCHED_RIGHT_PAREN_ORD) + + /* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INVALID_OPS + replaces RE_CONTEXT_INDEP_OPS and RE_NO_BK_REFS is added. */ + #define RE_SYNTAX_POSIX_MINIMAL_EXTENDED \ + (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \ + | RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES \ + | RE_NO_BK_PARENS | RE_NO_BK_REFS \ + | RE_NO_BK_VBAR | RE_UNMATCHED_RIGHT_PAREN_ORD) + + +File: regex.info, Node: Collating Elements vs. Characters, Next: The Backslash Character, Prev: Predefined Syntaxes, Up: Regular Expression Syntax + +Collating Elements vs. Characters +================================= + + POSIX generalizes the notion of a character to that of a collating +element. It defines a "collating element" to be "a sequence of one or +more bytes defined in the current collating sequence as a unit of +collation." + + This generalizes the notion of a character in two ways. First, a +single character can map into two or more collating elements. For +example, the German "es-zet" collates as the collating element `s' +followed by another collating element `s'. Second, two or more +characters can map into one collating element. For example, the +Spanish `ll' collates after `l' and before `m'. + + Since POSIX's "collating element" preserves the essential idea of a +"character," we use the latter, more familiar, term in this document. + + +File: regex.info, Node: The Backslash Character, Prev: Collating Elements vs. Characters, Up: Regular Expression Syntax + +The Backslash Character +======================= + + The `\' character has one of four different meanings, depending on +the context in which you use it and what syntax bits are set (*note +Syntax Bits::.). It can: 1) stand for itself, 2) quote the next +character, 3) introduce an operator, or 4) do nothing. + + 1. It stands for itself inside a list (*note List Operators::.) if + the syntax bit `RE_BACKSLASH_ESCAPE_IN_LISTS' is not set. For + example, `[\]' would match `\'. + + 2. It quotes (makes ordinary, if it's special) the next character + when you use it either: + + * outside a list,(1) or + + * inside a list and the syntax bit + `RE_BACKSLASH_ESCAPE_IN_LISTS' is set. + + 3. It introduces an operator when followed by certain ordinary + characters--sometimes only when certain syntax bits are set. See + the cases `RE_BK_PLUS_QM', `RE_NO_BK_BRACES', `RE_NO_BK_VAR', + `RE_NO_BK_PARENS', `RE_NO_BK_REF' in *Note Syntax Bits::. Also: + + * `\b' represents the match-word-boundary operator (*note + Match-word-boundary Operator::.). + + * `\B' represents the match-within-word operator (*note + Match-within-word Operator::.). + + * `\<' represents the match-beginning-of-word operator + (*note Match-beginning-of-word Operator::.). + + * `\>' represents the match-end-of-word operator (*note + Match-end-of-word Operator::.). + + * `\w' represents the match-word-constituent operator (*note + Match-word-constituent Operator::.). + + * `\W' represents the match-non-word-constituent operator + (*note Match-non-word-constituent Operator::.). + + * `\`' represents the match-beginning-of-buffer operator and + `\'' represents the match-end-of-buffer operator (*note + Buffer Operators::.). + + * If Regex was compiled with the C preprocessor symbol `emacs' + defined, then `\sCLASS' represents the match-syntactic-class + operator and `\SCLASS' represents the + match-not-syntactic-class operator (*note Syntactic Class + Operators::.). + + 4. In all other cases, Regex ignores `\'. For example, `\n' matches + `n'. + + + ---------- Footnotes ---------- + + (1) Sometimes you don't have to explicitly quote special characters +to make them ordinary. For instance, most characters lose any special +meaning inside a list (*note List Operators::.). In addition, if the +syntax bits `RE_CONTEXT_INVALID_OPS' and `RE_CONTEXT_INDEP_OPS' aren't +set, then (for historical reasons) the matcher considers special +characters ordinary if they are in contexts where the operations they +represent make no sense; for example, then the match-zero-or-more +operator (represented by `*') matches itself in the regular expression +`*foo' because there is no preceding expression on which it can +operate. It is poor practice, however, to depend on this behavior; if +you want a special character to be ordinary outside a list, it's better +to always quote it, regardless. + + +File: regex.info, Node: Common Operators, Next: GNU Operators, Prev: Regular Expression Syntax, Up: Top + +Common Operators +**************** + + You compose regular expressions from operators. In the following +sections, we describe the regular expression operators specified by +POSIX; GNU also uses these. Most operators have more than one +representation as characters. *Note Regular Expression Syntax::, for +what characters represent what operators under what circumstances. + + For most operators that can be represented in two ways, one +representation is a single character and the other is that character +preceded by `\'. For example, either `(' or `\(' represents the +open-group operator. Which one does depends on the setting of a syntax +bit, in this case `RE_NO_BK_PARENS'. Why is this so? Historical +reasons dictate some of the varying representations, while POSIX +dictates others. + + Finally, almost all characters lose any special meaning inside a list +(*note List Operators::.). + +* Menu: + +* Match-self Operator:: Ordinary characters. +* Match-any-character Operator:: . +* Concatenation Operator:: Juxtaposition. +* Repetition Operators:: * + ? {} +* Alternation Operator:: | +* List Operators:: [...] [^...] +* Grouping Operators:: (...) +* Back-reference Operator:: \digit +* Anchoring Operators:: ^ $ + + +File: regex.info, Node: Match-self Operator, Next: Match-any-character Operator, Up: Common Operators + +The Match-self Operator (ORDINARY CHARACTER) +============================================ + + This operator matches the character itself. All ordinary characters +(*note Regular Expression Syntax::.) represent this operator. For +example, `f' is always an ordinary character, so the regular expression +`f' matches only the string `f'. In particular, it does *not* match +the string `ff'. + + +File: regex.info, Node: Match-any-character Operator, Next: Concatenation Operator, Prev: Match-self Operator, Up: Common Operators + +The Match-any-character Operator (`.') +====================================== + + This operator matches any single printing or nonprinting character +except it won't match a: + +newline + if the syntax bit `RE_DOT_NEWLINE' isn't set. + +null + if the syntax bit `RE_DOT_NOT_NULL' is set. + + The `.' (period) character represents this operator. For example, +`a.b' matches any three-character string beginning with `a' and ending +with `b'. + + +File: regex.info, Node: Concatenation Operator, Next: Repetition Operators, Prev: Match-any-character Operator, Up: Common Operators + +The Concatenation Operator +========================== + + This operator concatenates two regular expressions A and B. No +character represents this operator; you simply put B after A. The +result is a regular expression that will match a string if A matches +its first part and B matches the rest. For example, `xy' (two +match-self operators) matches `xy'. + + +File: regex.info, Node: Repetition Operators, Next: Alternation Operator, Prev: Concatenation Operator, Up: Common Operators + +Repetition Operators +==================== + + Repetition operators repeat the preceding regular expression a +specified number of times. + +* Menu: + +* Match-zero-or-more Operator:: * +* Match-one-or-more Operator:: + +* Match-zero-or-one Operator:: ? +* Interval Operators:: {} + + +File: regex.info, Node: Match-zero-or-more Operator, Next: Match-one-or-more Operator, Up: Repetition Operators + +The Match-zero-or-more Operator (`*') +------------------------------------- + + This operator repeats the smallest possible preceding regular +expression as many times as necessary (including zero) to match the +pattern. `*' represents this operator. For example, `o*' matches any +string made up of zero or more `o's. Since this operator operates on +the smallest preceding regular expression, `fo*' has a repeating `o', +not a repeating `fo'. So, `fo*' matches `f', `fo', `foo', and so on. + + Since the match-zero-or-more operator is a suffix operator, it may be +useless as such when no regular expression precedes it. This is the +case when it: + + * is first in a regular expression, or + + * follows a match-beginning-of-line, open-group, or alternation + operator. + +Three different things can happen in these cases: + + 1. If the syntax bit `RE_CONTEXT_INVALID_OPS' is set, then the + regular expression is invalid. + + 2. If `RE_CONTEXT_INVALID_OPS' isn't set, but `RE_CONTEXT_INDEP_OPS' + is, then `*' represents the match-zero-or-more operator (which + then operates on the empty string). + + 3. Otherwise, `*' is ordinary. + + + The matcher processes a match-zero-or-more operator by first matching +as many repetitions of the smallest preceding regular expression as it +can. Then it continues to match the rest of the pattern. + + If it can't match the rest of the pattern, it backtracks (as many +times as necessary), each time discarding one of the matches until it +can either match the entire pattern or be certain that it cannot get a +match. For example, when matching `ca*ar' against `caaar', the matcher +first matches all three `a's of the string with the `a*' of the regular +expression. However, it cannot then match the final `ar' of the +regular expression against the final `r' of the string. So it +backtracks, discarding the match of the last `a' in the string. It can +then match the remaining `ar'. + + +File: regex.info, Node: Match-one-or-more Operator, Next: Match-zero-or-one Operator, Prev: Match-zero-or-more Operator, Up: Repetition Operators + +The Match-one-or-more Operator (`+' or `\+') +-------------------------------------------- + + If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't +recognize this operator. Otherwise, if the syntax bit `RE_BK_PLUS_QM' +isn't set, then `+' represents this operator; if it is, then `\+' does. + + This operator is similar to the match-zero-or-more operator except +that it repeats the preceding regular expression at least once; *note +Match-zero-or-more Operator::., for what it operates on, how some +syntax bits affect it, and how Regex backtracks to match it. + + For example, supposing that `+' represents the match-one-or-more +operator; then `ca+r' matches, e.g., `car' and `caaaar', but not `cr'. + + +File: regex.info, Node: Match-zero-or-one Operator, Next: Interval Operators, Prev: Match-one-or-more Operator, Up: Repetition Operators + +The Match-zero-or-one Operator (`?' or `\?') +-------------------------------------------- + + If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't +recognize this operator. Otherwise, if the syntax bit `RE_BK_PLUS_QM' +isn't set, then `?' represents this operator; if it is, then `\?' does. + + This operator is similar to the match-zero-or-more operator except +that it repeats the preceding regular expression once or not at all; +*note Match-zero-or-more Operator::., to see what it operates on, how +some syntax bits affect it, and how Regex backtracks to match it. + + For example, supposing that `?' represents the match-zero-or-one +operator; then `ca?r' matches both `car' and `cr', but nothing else. + + +File: regex.info, Node: Interval Operators, Prev: Match-zero-or-one Operator, Up: Repetition Operators + +Interval Operators (`{' ... `}' or `\{' ... `\}') +------------------------------------------------- + + If the syntax bit `RE_INTERVALS' is set, then Regex recognizes +"interval expressions". They repeat the smallest possible preceding +regular expression a specified number of times. + + If the syntax bit `RE_NO_BK_BRACES' is set, `{' represents the +"open-interval operator" and `}' represents the "close-interval +operator" ; otherwise, `\{' and `\}' do. + + Specifically, supposing that `{' and `}' represent the open-interval +and close-interval operators; then: + +`{COUNT}' + matches exactly COUNT occurrences of the preceding regular + expression. + +`{MIN,}' + matches MIN or more occurrences of the preceding regular + expression. + +`{MIN, MAX}' + matches at least MIN but no more than MAX occurrences of the + preceding regular expression. + + The interval expression (but not necessarily the regular expression +that contains it) is invalid if: + + * MIN is greater than MAX, or + + * any of COUNT, MIN, or MAX are outside the range zero to + `RE_DUP_MAX' (which symbol `regex.h' defines). + + If the interval expression is invalid and the syntax bit +`RE_NO_BK_BRACES' is set, then Regex considers all the characters in +the would-be interval to be ordinary. If that bit isn't set, then the +regular expression is invalid. + + If the interval expression is valid but there is no preceding regular +expression on which to operate, then if the syntax bit +`RE_CONTEXT_INVALID_OPS' is set, the regular expression is invalid. If +that bit isn't set, then Regex considers all the characters--other than +backslashes, which it ignores--in the would-be interval to be ordinary. + + +File: regex.info, Node: Alternation Operator, Next: List Operators, Prev: Repetition Operators, Up: Common Operators + +The Alternation Operator (`|' or `\|') +====================================== + + If the syntax bit `RE_LIMITED_OPS' is set, then Regex doesn't +recognize this operator. Otherwise, if the syntax bit `RE_NO_BK_VBAR' +is set, then `|' represents this operator; otherwise, `\|' does. + + Alternatives match one of a choice of regular expressions: if you put +the character(s) representing the alternation operator between any two +regular expressions A and B, the result matches the union of the +strings that A and B match. For example, supposing that `|' is the +alternation operator, then `foo|bar|quux' would match any of `foo', +`bar' or `quux'. + + The alternation operator operates on the *largest* possible +surrounding regular expressions. (Put another way, it has the lowest +precedence of any regular expression operator.) Thus, the only way you +can delimit its arguments is to use grouping. For example, if `(' and +`)' are the open and close-group operators, then `fo(o|b)ar' would +match either `fooar' or `fobar'. (`foo|bar' would match `foo' or +`bar'.) + + The matcher usually tries all combinations of alternatives so as to +match the longest possible string. For example, when matching +`(fooq|foo)*(qbarquux|bar)' against `fooqbarquux', it cannot take, say, +the first ("depth-first") combination it could match, since then it +would be content to match just `fooqbar'. + + +File: regex.info, Node: List Operators, Next: Grouping Operators, Prev: Alternation Operator, Up: Common Operators + +List Operators (`[' ... `]' and `[^' ... `]') +============================================= + + "Lists", also called "bracket expressions", are a set of one or more +items. An "item" is a character, a character class expression, or a +range expression. The syntax bits affect which kinds of items you can +put in a list. We explain the last two items in subsections below. +Empty lists are invalid. + + A "matching list" matches a single character represented by one of +the list items. You form a matching list by enclosing one or more items +within an "open-matching-list operator" (represented by `[') and a +"close-list operator" (represented by `]'). + + For example, `[ab]' matches either `a' or `b'. `[ad]*' matches the +empty string and any string composed of just `a's and `d's in any +order. Regex considers invalid a regular expression with a `[' but no +matching `]'. + + "Nonmatching lists" are similar to matching lists except that they +match a single character *not* represented by one of the list items. +You use an "open-nonmatching-list operator" (represented by `[^'(1)) +instead of an open-matching-list operator to start a nonmatching list. + + For example, `[^ab]' matches any character except `a' or `b'. + + If the `posix_newline' field in the pattern buffer (*note GNU Pattern +Buffers::. is set, then nonmatching lists do not match a newline. + + Most characters lose any special meaning inside a list. The special +characters inside a list follow. + +`]' + ends the list if it's not the first list item. So, if you want to + make the `]' character a list item, you must put it first. + +`\' + quotes the next character if the syntax bit + `RE_BACKSLASH_ESCAPE_IN_LISTS' is set. + +`[:' + represents the open-character-class operator (*note Character + Class Operators::.) if the syntax bit `RE_CHAR_CLASSES' is set and + what follows is a valid character class expression. + +`:]' + represents the close-character-class operator if the syntax bit + `RE_CHAR_CLASSES' is set and what precedes it is an + open-character-class operator followed by a valid character class + name. + +`-' + represents the range operator (*note Range Operator::.) if it's + not first or last in a list or the ending point of a range. + +All other characters are ordinary. For example, `[.*]' matches `.' and +`*'. + +* Menu: + +* Character Class Operators:: [:class:] +* Range Operator:: start-end + + ---------- Footnotes ---------- + + (1) Regex therefore doesn't consider the `^' to be the first +character in the list. If you put a `^' character first in (what you +think is) a matching list, you'll turn it into a nonmatching list. + + +File: regex.info, Node: Character Class Operators, Next: Range Operator, Up: List Operators + +Character Class Operators (`[:' ... `:]') +----------------------------------------- + + If the syntax bit `RE_CHARACTER_CLASSES' is set, then Regex +recognizes character class expressions inside lists. A "character +class expression" matches one character from a given class. You form a +character class expression by putting a character class name between an +"open-character-class operator" (represented by `[:') and a +"close-character-class operator" (represented by `:]'). The character +class names and their meanings are: + +`alnum' + letters and digits + +`alpha' + letters + +`blank' + system-dependent; for GNU, a space or tab + +`cntrl' + control characters (in the ASCII encoding, code 0177 and codes + less than 040) + +`digit' + digits + +`graph' + same as `print' except omits space + +`lower' + lowercase letters + +`print' + printable characters (in the ASCII encoding, space tilde--codes + 040 through 0176) + +`punct' + neither control nor alphanumeric characters + +`space' + space, carriage return, newline, vertical tab, and form feed + +`upper' + uppercase letters + +`xdigit' + hexadecimal digits: `0'-`9', `a'-`f', `A'-`F' + +These correspond to the definitions in the C library's `<ctype.h>' +facility. For example, `[:alpha:]' corresponds to the standard +facility `isalpha'. Regex recognizes character class expressions only +inside of lists; so `[[:alpha:]]' matches any letter, but `[:alpha:]' +outside of a bracket expression and not followed by a repetition +operator matches just itself. + + +File: regex.info, Node: Range Operator, Prev: Character Class Operators, Up: List Operators + +The Range Operator (`-') +------------------------ + + Regex recognizes "range expressions" inside a list. They represent +those characters that fall between two elements in the current +collating sequence. You form a range expression by putting a "range +operator" between two characters.(1) `-' represents the range operator. +For example, `a-f' within a list represents all the characters from `a' +through `f' inclusively. + + If the syntax bit `RE_NO_EMPTY_RANGES' is set, then if the range's +ending point collates less than its starting point, the range (and the +regular expression containing it) is invalid. For example, the regular +expression `[z-a]' would be invalid. If this bit isn't set, then Regex +considers such a range to be empty. + + Since `-' represents the range operator, if you want to make a `-' +character itself a list item, you must do one of the following: + + * Put the `-' either first or last in the list. + + * Include a range whose starting point collates strictly lower than + `-' and whose ending point collates equal or higher. Unless a + range is the first item in a list, a `-' can't be its starting + point, but *can* be its ending point. That is because Regex + considers `-' to be the range operator unless it is preceded by + another `-'. For example, in the ASCII encoding, `)', `*', `+', + `,', `-', `.', and `/' are contiguous characters in the collating + sequence. You might think that `[)-+--/]' has two ranges: `)-+' + and `--/'. Rather, it has the ranges `)-+' and `+--', plus the + character `/', so it matches, e.g., `,', not `.'. + + * Put a range whose starting point is `-' first in the list. + + For example, `[-a-z]' matches a lowercase letter or a hyphen (in +English, in ASCII). + + ---------- Footnotes ---------- + + (1) You can't use a character class for the starting or ending point +of a range, since a character class is not a single character. + + +File: regex.info, Node: Grouping Operators, Next: Back-reference Operator, Prev: List Operators, Up: Common Operators + +Grouping Operators (`(' ... `)' or `\(' ... `\)') +================================================= + + A "group", also known as a "subexpression", consists of an +"open-group operator", any number of other operators, and a +"close-group operator". Regex treats this sequence as a unit, just as +mathematics and programming languages treat a parenthesized expression +as a unit. + + Therefore, using "groups", you can: + + * delimit the argument(s) to an alternation operator (*note + Alternation Operator::.) or a repetition operator (*note + Repetition Operators::.). + + * keep track of the indices of the substring that matched a given + group. *Note Using Registers::, for a precise explanation. This + lets you: + + * use the back-reference operator (*note Back-reference + Operator::.). + + * use registers (*note Using Registers::.). + + If the syntax bit `RE_NO_BK_PARENS' is set, then `(' represents the +open-group operator and `)' represents the close-group operator; +otherwise, `\(' and `\)' do. + + If the syntax bit `RE_UNMATCHED_RIGHT_PAREN_ORD' is set and a +close-group operator has no matching open-group operator, then Regex +considers it to match `)'. + + +File: regex.info, Node: Back-reference Operator, Next: Anchoring Operators, Prev: Grouping Operators, Up: Common Operators + +The Back-reference Operator ("\"DIGIT) +====================================== + + If the syntax bit `RE_NO_BK_REF' isn't set, then Regex recognizes +back references. A back reference matches a specified preceding group. +The back reference operator is represented by `\DIGIT' anywhere after +the end of a regular expression's DIGIT-th group (*note Grouping +Operators::.). + + DIGIT must be between `1' and `9'. The matcher assigns numbers 1 +through 9 to the first nine groups it encounters. By using one of `\1' +through `\9' after the corresponding group's close-group operator, you +can match a substring identical to the one that the group does. + + Back references match according to the following (in all examples +below, `(' represents the open-group, `)' the close-group, `{' the +open-interval and `}' the close-interval operator): + + * If the group matches a substring, the back reference matches an + identical substring. For example, `(a)\1' matches `aa' and + `(bana)na\1bo\1' matches `bananabanabobana'. Likewise, `(.*)\1' + matches any (newline-free if the syntax bit `RE_DOT_NEWLINE' isn't + set) string that is composed of two identical halves; the `(.*)' + matches the first half and the `\1' matches the second half. + + * If the group matches more than once (as it might if followed by, + e.g., a repetition operator), then the back reference matches the + substring the group *last* matched. For example, `((a*)b)*\1\2' + matches `aabababa'; first group 1 (the outer one) matches `aab' + and group 2 (the inner one) matches `aa'. Then group 1 matches + `ab' and group 2 matches `a'. So, `\1' matches `ab' and `\2' + matches `a'. + + * If the group doesn't participate in a match, i.e., it is part of an + alternative not taken or a repetition operator allows zero + repetitions of it, then the back reference makes the whole match + fail. For example, `(one()|two())-and-(three\2|four\3)' matches + `one-and-three' and `two-and-four', but not `one-and-four' or + `two-and-three'. For example, if the pattern matches `one-and-', + then its group 2 matches the empty string and its group 3 doesn't + participate in the match. So, if it then matches `four', then + when it tries to back reference group 3--which it will attempt to + do because `\3' follows the `four'--the match will fail because + group 3 didn't participate in the match. + + You can use a back reference as an argument to a repetition operator. +For example, `(a(b))\2*' matches `a' followed by two or more `b's. +Similarly, `(a(b))\2{3}' matches `abbbb'. + + If there is no preceding DIGIT-th subexpression, the regular +expression is invalid. + + +File: regex.info, Node: Anchoring Operators, Prev: Back-reference Operator, Up: Common Operators + +Anchoring Operators +=================== + + These operators can constrain a pattern to match only at the +beginning or end of the entire string or at the beginning or end of a +line. + +* Menu: + +* Match-beginning-of-line Operator:: ^ +* Match-end-of-line Operator:: $ + + +File: regex.info, Node: Match-beginning-of-line Operator, Next: Match-end-of-line Operator, Up: Anchoring Operators + +The Match-beginning-of-line Operator (`^') +------------------------------------------ + + This operator can match the empty string either at the beginning of +the string or after a newline character. Thus, it is said to "anchor" +the pattern to the beginning of a line. + + In the cases following, `^' represents this operator. (Otherwise, +`^' is ordinary.) + + * It (the `^') is first in the pattern, as in `^foo'. + + * The syntax bit `RE_CONTEXT_INDEP_ANCHORS' is set, and it is outside + a bracket expression. + + * It follows an open-group or alternation operator, as in `a\(^b\)' + and `a\|^b'. *Note Grouping Operators::, and *Note Alternation + Operator::. + + These rules imply that some valid patterns containing `^' cannot be +matched; for example, `foo^bar' if `RE_CONTEXT_INDEP_ANCHORS' is set. + + If the `not_bol' field is set in the pattern buffer (*note GNU +Pattern Buffers::.), then `^' fails to match at the beginning of the +string. *Note POSIX Matching::, for when you might find this useful. + + If the `newline_anchor' field is set in the pattern buffer, then `^' +fails to match after a newline. This is useful when you do not regard +the string to be matched as broken into lines. + + +File: regex.info, Node: Match-end-of-line Operator, Prev: Match-beginning-of-line Operator, Up: Anchoring Operators + +The Match-end-of-line Operator (`$') +------------------------------------ + + This operator can match the empty string either at the end of the +string or before a newline character in the string. Thus, it is said +to "anchor" the pattern to the end of a line. + + It is always represented by `$'. For example, `foo$' usually +matches, e.g., `foo' and, e.g., the first three characters of +`foo\nbar'. + + Its interaction with the syntax bits and pattern buffer fields is +exactly the dual of `^''s; see the previous section. (That is, +"beginning" becomes "end", "next" becomes "previous", and "after" +becomes "before".) + + +File: regex.info, Node: GNU Operators, Next: GNU Emacs Operators, Prev: Common Operators, Up: Top + +GNU Operators +************* + + Following are operators that GNU defines (and POSIX doesn't). + +* Menu: + +* Word Operators:: +* Buffer Operators:: + + +File: regex.info, Node: Word Operators, Next: Buffer Operators, Up: GNU Operators + +Word Operators +============== + + The operators in this section require Regex to recognize parts of +words. Regex uses a syntax table to determine whether or not a +character is part of a word, i.e., whether or not it is +"word-constituent". + +* Menu: + +* Non-Emacs Syntax Tables:: +* Match-word-boundary Operator:: \b +* Match-within-word Operator:: \B +* Match-beginning-of-word Operator:: \< +* Match-end-of-word Operator:: \> +* Match-word-constituent Operator:: \w +* Match-non-word-constituent Operator:: \W + + +File: regex.info, Node: Non-Emacs Syntax Tables, Next: Match-word-boundary Operator, Up: Word Operators + +Non-Emacs Syntax Tables +----------------------- + + A "syntax table" is an array indexed by the characters in your +character set. In the ASCII encoding, therefore, a syntax table has +256 elements. Regex always uses a `char *' variable `re_syntax_table' +as its syntax table. In some cases, it initializes this variable and +in others it expects you to initialize it. + + * If Regex is compiled with the preprocessor symbols `emacs' and + `SYNTAX_TABLE' both undefined, then Regex allocates + `re_syntax_table' and initializes an element I either to `Sword' + (which it defines) if I is a letter, number, or `_', or to zero if + it's not. + + * If Regex is compiled with `emacs' undefined but `SYNTAX_TABLE' + defined, then Regex expects you to define a `char *' variable + `re_syntax_table' to be a valid syntax table. + + * *Note Emacs Syntax Tables::, for what happens when Regex is + compiled with the preprocessor symbol `emacs' defined. + + +File: regex.info, Node: Match-word-boundary Operator, Next: Match-within-word Operator, Prev: Non-Emacs Syntax Tables, Up: Word Operators + +The Match-word-boundary Operator (`\b') +--------------------------------------- + + This operator (represented by `\b') matches the empty string at +either the beginning or the end of a word. For example, `\brat\b' +matches the separate word `rat'. + + +File: regex.info, Node: Match-within-word Operator, Next: Match-beginning-of-word Operator, Prev: Match-word-boundary Operator, Up: Word Operators + +The Match-within-word Operator (`\B') +------------------------------------- + + This operator (represented by `\B') matches the empty string within a +word. For example, `c\Brat\Be' matches `crate', but `dirty \Brat' +doesn't match `dirty rat'. + + +File: regex.info, Node: Match-beginning-of-word Operator, Next: Match-end-of-word Operator, Prev: Match-within-word Operator, Up: Word Operators + +The Match-beginning-of-word Operator (`\<') +------------------------------------------- + + This operator (represented by `\<') matches the empty string at the +beginning of a word. + + +File: regex.info, Node: Match-end-of-word Operator, Next: Match-word-constituent Operator, Prev: Match-beginning-of-word Operator, Up: Word Operators + +The Match-end-of-word Operator (`\>') +------------------------------------- + + This operator (represented by `\>') matches the empty string at the +end of a word. + + +File: regex.info, Node: Match-word-constituent Operator, Next: Match-non-word-constituent Operator, Prev: Match-end-of-word Operator, Up: Word Operators + +The Match-word-constituent Operator (`\w') +------------------------------------------ + + This operator (represented by `\w') matches any word-constituent +character. + + +File: regex.info, Node: Match-non-word-constituent Operator, Prev: Match-word-constituent Operator, Up: Word Operators + +The Match-non-word-constituent Operator (`\W') +---------------------------------------------- + + This operator (represented by `\W') matches any character that is not +word-constituent. + + +File: regex.info, Node: Buffer Operators, Prev: Word Operators, Up: GNU Operators + +Buffer Operators +================ + + Following are operators which work on buffers. In Emacs, a "buffer" +is, naturally, an Emacs buffer. For other programs, Regex considers the +entire string to be matched as the buffer. + +* Menu: + +* Match-beginning-of-buffer Operator:: \` +* Match-end-of-buffer Operator:: \' + + +File: regex.info, Node: Match-beginning-of-buffer Operator, Next: Match-end-of-buffer Operator, Up: Buffer Operators + +The Match-beginning-of-buffer Operator (`\`') +--------------------------------------------- + + This operator (represented by `\`') matches the empty string at the +beginning of the buffer. + + +File: regex.info, Node: Match-end-of-buffer Operator, Prev: Match-beginning-of-buffer Operator, Up: Buffer Operators + +The Match-end-of-buffer Operator (`\'') +--------------------------------------- + + This operator (represented by `\'') matches the empty string at the +end of the buffer. + + +File: regex.info, Node: GNU Emacs Operators, Next: What Gets Matched?, Prev: GNU Operators, Up: Top + +GNU Emacs Operators +******************* + + Following are operators that GNU defines (and POSIX doesn't) that you +can use only when Regex is compiled with the preprocessor symbol +`emacs' defined. + +* Menu: + +* Syntactic Class Operators:: + + +File: regex.info, Node: Syntactic Class Operators, Up: GNU Emacs Operators + +Syntactic Class Operators +========================= + + The operators in this section require Regex to recognize the syntactic +classes of characters. Regex uses a syntax table to determine this. + +* Menu: + +* Emacs Syntax Tables:: +* Match-syntactic-class Operator:: \sCLASS +* Match-not-syntactic-class Operator:: \SCLASS + + +File: regex.info, Node: Emacs Syntax Tables, Next: Match-syntactic-class Operator, Up: Syntactic Class Operators + +Emacs Syntax Tables +------------------- + + A "syntax table" is an array indexed by the characters in your +character set. In the ASCII encoding, therefore, a syntax table has +256 elements. + + If Regex is compiled with the preprocessor symbol `emacs' defined, +then Regex expects you to define and initialize the variable +`re_syntax_table' to be an Emacs syntax table. Emacs' syntax tables +are more complicated than Regex's own (*note Non-Emacs Syntax +Tables::.). *Note Syntax: (emacs)Syntax, for a description of Emacs' +syntax tables. + + +File: regex.info, Node: Match-syntactic-class Operator, Next: Match-not-syntactic-class Operator, Prev: Emacs Syntax Tables, Up: Syntactic Class Operators + +The Match-syntactic-class Operator (`\s'CLASS) +---------------------------------------------- + + This operator matches any character whose syntactic class is +represented by a specified character. `\sCLASS' represents this +operator where CLASS is the character representing the syntactic class +you want. For example, `w' represents the syntactic class of +word-constituent characters, so `\sw' matches any word-constituent +character. + + +File: regex.info, Node: Match-not-syntactic-class Operator, Prev: Match-syntactic-class Operator, Up: Syntactic Class Operators + +The Match-not-syntactic-class Operator (`\S'CLASS) +-------------------------------------------------- + + This operator is similar to the match-syntactic-class operator except +that it matches any character whose syntactic class is *not* +represented by the specified character. `\SCLASS' represents this +operator. For example, `w' represents the syntactic class of +word-constituent characters, so `\Sw' matches any character that is not +word-constituent. + + +File: regex.info, Node: What Gets Matched?, Next: Programming with Regex, Prev: GNU Emacs Operators, Up: Top + +What Gets Matched? +****************** + + Regex usually matches strings according to the "leftmost longest" +rule; that is, it chooses the longest of the leftmost matches. This +does not mean that for a regular expression containing subexpressions +that it simply chooses the longest match for each subexpression, left to +right; the overall match must also be the longest possible one. + + For example, `(ac*)(c*d[ac]*)\1' matches `acdacaaa', not `acdac', as +it would if it were to choose the longest match for the first +subexpression. + + +File: regex.info, Node: Programming with Regex, Next: Copying, Prev: What Gets Matched?, Up: Top + +Programming with Regex +********************** + + Here we describe how you use the Regex data structures and functions +in C programs. Regex has three interfaces: one designed for GNU, one +compatible with POSIX and one compatible with Berkeley UNIX. + +* Menu: + +* GNU Regex Functions:: +* POSIX Regex Functions:: +* BSD Regex Functions:: + + +File: regex.info, Node: GNU Regex Functions, Next: POSIX Regex Functions, Up: Programming with Regex + +GNU Regex Functions +=================== + + If you're writing code that doesn't need to be compatible with either +POSIX or Berkeley UNIX, you can use these functions. They provide more +options than the other interfaces. + +* Menu: + +* GNU Pattern Buffers:: The re_pattern_buffer type. +* GNU Regular Expression Compiling:: re_compile_pattern () +* GNU Matching:: re_match () +* GNU Searching:: re_search () +* Matching/Searching with Split Data:: re_match_2 (), re_search_2 () +* Searching with Fastmaps:: re_compile_fastmap () +* GNU Translate Tables:: The `translate' field. +* Using Registers:: The re_registers type and related fns. +* Freeing GNU Pattern Buffers:: regfree () + + +File: regex.info, Node: GNU Pattern Buffers, Next: GNU Regular Expression Compiling, Up: GNU Regex Functions + +GNU Pattern Buffers +------------------- + + To compile, match, or search for a given regular expression, you must +supply a pattern buffer. A "pattern buffer" holds one compiled regular +expression.(1) + + You can have several different pattern buffers simultaneously, each +holding a compiled pattern for a different regular expression. + + `regex.h' defines the pattern buffer `struct' as follows: + + /* Space that holds the compiled pattern. It is declared as + `unsigned char *' because its elements are + sometimes used as array indexes. */ + unsigned char *buffer; + + /* Number of bytes to which `buffer' points. */ + unsigned long allocated; + + /* Number of bytes actually used in `buffer'. */ + unsigned long used; + + /* Syntax setting with which the pattern was compiled. */ + reg_syntax_t syntax; + + /* Pointer to a fastmap, if any, otherwise zero. re_search uses + the fastmap, if there is one, to skip over impossible + starting points for matches. */ + char *fastmap; + + /* Either a translate table to apply to all characters before + comparing them, or zero for no translation. The translation + is applied to a pattern when it is compiled and to a string + when it is matched. */ + char *translate; + + /* Number of subexpressions found by the compiler. */ + size_t re_nsub; + + /* Zero if this pattern cannot match the empty string, one else. + Well, in truth it's used only in `re_search_2', to see + whether or not we should use the fastmap, so we don't set + this absolutely perfectly; see `re_compile_fastmap' (the + `duplicate' case). */ + unsigned can_be_null : 1; + + /* If REGS_UNALLOCATED, allocate space in the `regs' structure + for `max (RE_NREGS, re_nsub + 1)' groups. + If REGS_REALLOCATE, reallocate space if necessary. + If REGS_FIXED, use what's there. */ + #define REGS_UNALLOCATED 0 + #define REGS_REALLOCATE 1 + #define REGS_FIXED 2 + unsigned regs_allocated : 2; + + /* Set to zero when `regex_compile' compiles a pattern; set to one + by `re_compile_fastmap' if it updates the fastmap. */ + unsigned fastmap_accurate : 1; + + /* If set, `re_match_2' does not return information about + subexpressions. */ + unsigned no_sub : 1; + + /* If set, a beginning-of-line anchor doesn't match at the + beginning of the string. */ + unsigned not_bol : 1; + + /* Similarly for an end-of-line anchor. */ + unsigned not_eol : 1; + + /* If true, an anchor at a newline matches. */ + unsigned newline_anchor : 1; + + ---------- Footnotes ---------- + + (1) Regular expressions are also referred to as "patterns," hence +the name "pattern buffer." + + +File: regex.info, Node: GNU Regular Expression Compiling, Next: GNU Matching, Prev: GNU Pattern Buffers, Up: GNU Regex Functions + +GNU Regular Expression Compiling +-------------------------------- + + In GNU, you can both match and search for a given regular expression. +To do either, you must first compile it in a pattern buffer (*note GNU +Pattern Buffers::.). + + Regular expressions match according to the syntax with which they were +compiled; with GNU, you indicate what syntax you want by setting the +variable `re_syntax_options' (declared in `regex.h' and defined in +`regex.c') before calling the compiling function, `re_compile_pattern' +(see below). *Note Syntax Bits::, and *Note Predefined Syntaxes::. + + You can change the value of `re_syntax_options' at any time. +Usually, however, you set its value once and then never change it. + + `re_compile_pattern' takes a pattern buffer as an argument. You must +initialize the following fields: + +`translate initialization' +`translate' + Initialize this to point to a translate table if you want one, or + to zero if you don't. We explain translate tables in *Note GNU + Translate Tables::. + +`fastmap' + Initialize this to nonzero if you want a fastmap, or to zero if you + don't. + +`buffer' +`allocated' + If you want `re_compile_pattern' to allocate memory for the + compiled pattern, set both of these to zero. If you have an + existing block of memory (allocated with `malloc') you want Regex + to use, set `buffer' to its address and `allocated' to its size (in + bytes). + + `re_compile_pattern' uses `realloc' to extend the space for the + compiled pattern as necessary. + + To compile a pattern buffer, use: + + char * + re_compile_pattern (const char *REGEX, const int REGEX_SIZE, + struct re_pattern_buffer *PATTERN_BUFFER) + +REGEX is the regular expression's address, REGEX_SIZE is its length, +and PATTERN_BUFFER is the pattern buffer's address. + + If `re_compile_pattern' successfully compiles the regular expression, +it returns zero and sets `*PATTERN_BUFFER' to the compiled pattern. It +sets the pattern buffer's fields as follows: + +`buffer' + to the compiled pattern. + +`used' + to the number of bytes the compiled pattern in `buffer' occupies. + +`syntax' + to the current value of `re_syntax_options'. + +`re_nsub' + to the number of subexpressions in REGEX. + +`fastmap_accurate' + to zero on the theory that the pattern you're compiling is + different than the one previously compiled into `buffer'; in that + case (since you can't make a fastmap without a compiled pattern), + `fastmap' would either contain an incompatible fastmap, or nothing + at all. + + If `re_compile_pattern' can't compile REGEX, it returns an error +string corresponding to one of the errors listed in *Note POSIX Regular +Expression Compiling::. + + +File: regex.info, Node: GNU Matching, Next: GNU Searching, Prev: GNU Regular Expression Compiling, Up: GNU Regex Functions + +GNU Matching +------------ + + Matching the GNU way means trying to match as much of a string as +possible starting at a position within it you specify. Once you've +compiled a pattern into a pattern buffer (*note GNU Regular Expression +Compiling::.), you can ask the matcher to match that pattern against a +string using: + + int + re_match (struct re_pattern_buffer *PATTERN_BUFFER, + const char *STRING, const int SIZE, + const int START, struct re_registers *REGS) + +PATTERN_BUFFER is the address of a pattern buffer containing a compiled +pattern. STRING is the string you want to match; it can contain +newline and null characters. SIZE is the length of that string. START +is the string index at which you want to begin matching; the first +character of STRING is at index zero. *Note Using Registers::, for a +explanation of REGS; you can safely pass zero. + + `re_match' matches the regular expression in PATTERN_BUFFER against +the string STRING according to the syntax in PATTERN_BUFFERS's `syntax' +field. (*Note GNU Regular Expression Compiling::, for how to set it.) +The function returns -1 if the compiled pattern does not match any part +of STRING and -2 if an internal error happens; otherwise, it returns +how many (possibly zero) characters of STRING the pattern matched. + + An example: suppose PATTERN_BUFFER points to a pattern buffer +containing the compiled pattern for `a*', and STRING points to `aaaaab' +(whereupon SIZE should be 6). Then if START is 2, `re_match' returns 3, +i.e., `a*' would have matched the last three `a's in STRING. If START +is 0, `re_match' returns 5, i.e., `a*' would have matched all the `a's +in STRING. If START is either 5 or 6, it returns zero. + + If START is not between zero and SIZE, then `re_match' returns -1. + + +File: regex.info, Node: GNU Searching, Next: Matching/Searching with Split Data, Prev: GNU Matching, Up: GNU Regex Functions + +GNU Searching +------------- + + "Searching" means trying to match starting at successive positions +within a string. The function `re_search' does this. + + Before calling `re_search', you must compile your regular expression. +*Note GNU Regular Expression Compiling::. + + Here is the function declaration: + + int + re_search (struct re_pattern_buffer *PATTERN_BUFFER, + const char *STRING, const int SIZE, + const int START, const int RANGE, + struct re_registers *REGS) + +whose arguments are the same as those to `re_match' (*note GNU +Matching::.) except that the two arguments START and RANGE replace +`re_match''s argument START. + + If RANGE is positive, then `re_search' attempts a match starting +first at index START, then at START + 1 if that fails, and so on, up to +START + RANGE; if RANGE is negative, then it attempts a match starting +first at index START, then at START -1 if that fails, and so on. + + If START is not between zero and SIZE, then `re_search' returns -1. +When RANGE is positive, `re_search' adjusts RANGE so that START + RANGE +- 1 is between zero and SIZE, if necessary; that way it won't search +outside of STRING. Similarly, when RANGE is negative, `re_search' +adjusts RANGE so that START + RANGE + 1 is between zero and SIZE, if +necessary. + + If the `fastmap' field of PATTERN_BUFFER is zero, `re_search' matches +starting at consecutive positions; otherwise, it uses `fastmap' to make +the search more efficient. *Note Searching with Fastmaps::. + + If no match is found, `re_search' returns -1. If a match is found, +it returns the index where the match began. If an internal error +happens, it returns -2. + + +File: regex.info, Node: Matching/Searching with Split Data, Next: Searching with Fastmaps, Prev: GNU Searching, Up: GNU Regex Functions + +Matching and Searching with Split Data +-------------------------------------- + + Using the functions `re_match_2' and `re_search_2', you can match or +search in data that is divided into two strings. + + The function: + + int + re_match_2 (struct re_pattern_buffer *BUFFER, + const char *STRING1, const int SIZE1, + const char *STRING2, const int SIZE2, + const int START, + struct re_registers *REGS, + const int STOP) + +is similar to `re_match' (*note GNU Matching::.) except that you pass +*two* data strings and sizes, and an index STOP beyond which you don't +want the matcher to try matching. As with `re_match', if it succeeds, +`re_match_2' returns how many characters of STRING it matched. Regard +STRING1 and STRING2 as concatenated when you set the arguments START and +STOP and use the contents of REGS; `re_match_2' never returns a value +larger than SIZE1 + SIZE2. + + The function: + + int + re_search_2 (struct re_pattern_buffer *BUFFER, + const char *STRING1, const int SIZE1, + const char *STRING2, const int SIZE2, + const int START, const int RANGE, + struct re_registers *REGS, + const int STOP) + +is similarly related to `re_search'. + + +File: regex.info, Node: Searching with Fastmaps, Next: GNU Translate Tables, Prev: Matching/Searching with Split Data, Up: GNU Regex Functions + +Searching with Fastmaps +----------------------- + + If you're searching through a long string, you should use a fastmap. +Without one, the searcher tries to match at consecutive positions in the +string. Generally, most of the characters in the string could not start +a match. It takes much longer to try matching at a given position in +the string than it does to check in a table whether or not the +character at that position could start a match. A "fastmap" is such a +table. + + More specifically, a fastmap is an array indexed by the characters in +your character set. Under the ASCII encoding, therefore, a fastmap has +256 elements. If you want the searcher to use a fastmap with a given +pattern buffer, you must allocate the array and assign the array's +address to the pattern buffer's `fastmap' field. You either can +compile the fastmap yourself or have `re_search' do it for you; when +`fastmap' is nonzero, it automatically compiles a fastmap the first +time you search using a particular compiled pattern. + + To compile a fastmap yourself, use: + + int + re_compile_fastmap (struct re_pattern_buffer *PATTERN_BUFFER) + +PATTERN_BUFFER is the address of a pattern buffer. If the character C +could start a match for the pattern, `re_compile_fastmap' makes +`PATTERN_BUFFER->fastmap[C]' nonzero. It returns 0 if it can compile a +fastmap and -2 if there is an internal error. For example, if `|' is +the alternation operator and PATTERN_BUFFER holds the compiled pattern +for `a|b', then `re_compile_fastmap' sets `fastmap['a']' and +`fastmap['b']' (and no others). + + `re_search' uses a fastmap as it moves along in the string: it checks +the string's characters until it finds one that's in the fastmap. Then +it tries matching at that character. If the match fails, it repeats +the process. So, by using a fastmap, `re_search' doesn't waste time +trying to match at positions in the string that couldn't start a match. + + If you don't want `re_search' to use a fastmap, store zero in the +`fastmap' field of the pattern buffer before calling `re_search'. + + Once you've initialized a pattern buffer's `fastmap' field, you need +never do so again--even if you compile a new pattern in it--provided +the way the field is set still reflects whether or not you want a +fastmap. `re_search' will still either do nothing if `fastmap' is null +or, if it isn't, compile a new fastmap for the new pattern. + + +File: regex.info, Node: GNU Translate Tables, Next: Using Registers, Prev: Searching with Fastmaps, Up: GNU Regex Functions + +GNU Translate Tables +-------------------- + + If you set the `translate' field of a pattern buffer to a translate +table, then the GNU Regex functions to which you've passed that pattern +buffer use it to apply a simple transformation to all the regular +expression and string characters at which they look. + + A "translate table" is an array indexed by the characters in your +character set. Under the ASCII encoding, therefore, a translate table +has 256 elements. The array's elements are also characters in your +character set. When the Regex functions see a character C, they use +`translate[C]' in its place, with one exception: the character after a +`\' is not translated. (This ensures that, the operators, e.g., `\B' +and `\b', are always distinguishable.) + + For example, a table that maps all lowercase letters to the +corresponding uppercase ones would cause the matcher to ignore +differences in case.(1) Such a table would map all characters except +lowercase letters to themselves, and lowercase letters to the +corresponding uppercase ones. Under the ASCII encoding, here's how you +could initialize such a table (we'll call it `case_fold'): + + for (i = 0; i < 256; i++) + case_fold[i] = i; + for (i = 'a'; i <= 'z'; i++) + case_fold[i] = i - ('a' - 'A'); + + You tell Regex to use a translate table on a given pattern buffer by +assigning that table's address to the `translate' field of that buffer. +If you don't want Regex to do any translation, put zero into this +field. You'll get weird results if you change the table's contents +anytime between compiling the pattern buffer, compiling its fastmap, and +matching or searching with the pattern buffer. + + ---------- Footnotes ---------- + + (1) A table that maps all uppercase letters to the corresponding +lowercase ones would work just as well for this purpose. + + +File: regex.info, Node: Using Registers, Next: Freeing GNU Pattern Buffers, Prev: GNU Translate Tables, Up: GNU Regex Functions + +Using Registers +--------------- + + A group in a regular expression can match a (posssibly empty) +substring of the string that regular expression as a whole matched. +The matcher remembers the beginning and end of the substring matched by +each group. + + To find out what they matched, pass a nonzero REGS argument to a GNU +matching or searching function (*note GNU Matching::. and *Note GNU +Searching::), i.e., the address of a structure of this type, as defined +in `regex.h': + + struct re_registers + { + unsigned num_regs; + regoff_t *start; + regoff_t *end; + }; + + Except for (possibly) the NUM_REGS'th element (see below), the Ith +element of the `start' and `end' arrays records information about the +Ith group in the pattern. (They're declared as C pointers, but this is +only because not all C compilers accept zero-length arrays; +conceptually, it is simplest to think of them as arrays.) + + The `start' and `end' arrays are allocated in various ways, depending +on the value of the `regs_allocated' field in the pattern buffer passed +to the matcher. + + The simplest and perhaps most useful is to let the matcher +(re)allocate enough space to record information for all the groups in +the regular expression. If `regs_allocated' is `REGS_UNALLOCATED', the +matcher allocates 1 + RE_NSUB (another field in the pattern buffer; +*note GNU Pattern Buffers::.). The extra element is set to -1, and +sets `regs_allocated' to `REGS_REALLOCATE'. Then on subsequent calls +with the same pattern buffer and REGS arguments, the matcher +reallocates more space if necessary. + + It would perhaps be more logical to make the `regs_allocated' field +part of the `re_registers' structure, instead of part of the pattern +buffer. But in that case the caller would be forced to initialize the +structure before passing it. Much existing code doesn't do this +initialization, and it's arguably better to avoid it anyway. + + `re_compile_pattern' sets `regs_allocated' to `REGS_UNALLOCATED', so +if you use the GNU regular expression functions, you get this behavior +by default. + + xx document re_set_registers + + POSIX, on the other hand, requires a different interface: the caller +is supposed to pass in a fixed-length array which the matcher fills. +Therefore, if `regs_allocated' is `REGS_FIXED' the matcher simply fills +that array. + + The following examples illustrate the information recorded in the +`re_registers' structure. (In all of them, `(' represents the +open-group and `)' the close-group operator. The first character in +the string STRING is at index 0.) + + * If the regular expression has an I-th group not contained within + another group that matches a substring of STRING, then the + function sets `REGS->start[I]' to the index in STRING where the + substring matched by the I-th group begins, and `REGS->end[I]' to + the index just beyond that substring's end. The function sets + `REGS->start[0]' and `REGS->end[0]' to analogous information about + the entire pattern. + + For example, when you match `((a)(b))' against `ab', you get: + + * 0 in `REGS->start[0]' and 2 in `REGS->end[0]' + + * 0 in `REGS->start[1]' and 2 in `REGS->end[1]' + + * 0 in `REGS->start[2]' and 1 in `REGS->end[2]' + + * 1 in `REGS->start[3]' and 2 in `REGS->end[3]' + + * If a group matches more than once (as it might if followed by, + e.g., a repetition operator), then the function reports the + information about what the group *last* matched. + + For example, when you match the pattern `(a)*' against the string + `aa', you get: + + * 0 in `REGS->start[0]' and 2 in `REGS->end[0]' + + * 1 in `REGS->start[1]' and 2 in `REGS->end[1]' + + * If the I-th group does not participate in a successful match, + e.g., it is an alternative not taken or a repetition operator + allows zero repetitions of it, then the function sets + `REGS->start[I]' and `REGS->end[I]' to -1. + + For example, when you match the pattern `(a)*b' against the string + `b', you get: + + * 0 in `REGS->start[0]' and 1 in `REGS->end[0]' + + * -1 in `REGS->start[1]' and -1 in `REGS->end[1]' + + * If the I-th group matches a zero-length string, then the function + sets `REGS->start[I]' and `REGS->end[I]' to the index just beyond + that zero-length string. + + For example, when you match the pattern `(a*)b' against the string + `b', you get: + + * 0 in `REGS->start[0]' and 1 in `REGS->end[0]' + + * 0 in `REGS->start[1]' and 0 in `REGS->end[1]' + + * If an I-th group contains a J-th group in turn not contained + within any other group within group I and the function reports a + match of the I-th group, then it records in `REGS->start[J]' and + `REGS->end[J]' the last match (if it matched) of the J-th group. + + For example, when you match the pattern `((a*)b)*' against the + string `abb', group 2 last matches the empty string, so you get + what it previously matched: + + * 0 in `REGS->start[0]' and 3 in `REGS->end[0]' + + * 2 in `REGS->start[1]' and 3 in `REGS->end[1]' + + * 2 in `REGS->start[2]' and 2 in `REGS->end[2]' + + When you match the pattern `((a)*b)*' against the string `abb', + group 2 doesn't participate in the last match, so you get: + + * 0 in `REGS->start[0]' and 3 in `REGS->end[0]' + + * 2 in `REGS->start[1]' and 3 in `REGS->end[1]' + + * 0 in `REGS->start[2]' and 1 in `REGS->end[2]' + + * If an I-th group contains a J-th group in turn not contained + within any other group within group I and the function sets + `REGS->start[I]' and `REGS->end[I]' to -1, then it also sets + `REGS->start[J]' and `REGS->end[J]' to -1. + + For example, when you match the pattern `((a)*b)*c' against the + string `c', you get: + + * 0 in `REGS->start[0]' and 1 in `REGS->end[0]' + + * -1 in `REGS->start[1]' and -1 in `REGS->end[1]' + + * -1 in `REGS->start[2]' and -1 in `REGS->end[2]' + + +File: regex.info, Node: Freeing GNU Pattern Buffers, Prev: Using Registers, Up: GNU Regex Functions + +Freeing GNU Pattern Buffers +--------------------------- + + To free any allocated fields of a pattern buffer, you can use the +POSIX function described in *Note Freeing POSIX Pattern Buffers::, +since the type `regex_t'--the type for POSIX pattern buffers--is +equivalent to the type `re_pattern_buffer'. After freeing a pattern +buffer, you need to again compile a regular expression in it (*note GNU +Regular Expression Compiling::.) before passing it to a matching or +searching function. + + +File: regex.info, Node: POSIX Regex Functions, Next: BSD Regex Functions, Prev: GNU Regex Functions, Up: Programming with Regex + +POSIX Regex Functions +===================== + + If you're writing code that has to be POSIX compatible, you'll need +to use these functions. Their interfaces are as specified by POSIX, +draft 1003.2/D11.2. + +* Menu: + +* POSIX Pattern Buffers:: The regex_t type. +* POSIX Regular Expression Compiling:: regcomp () +* POSIX Matching:: regexec () +* Reporting Errors:: regerror () +* Using Byte Offsets:: The regmatch_t type. +* Freeing POSIX Pattern Buffers:: regfree () + + +File: regex.info, Node: POSIX Pattern Buffers, Next: POSIX Regular Expression Compiling, Up: POSIX Regex Functions + +POSIX Pattern Buffers +--------------------- + + To compile or match a given regular expression the POSIX way, you +must supply a pattern buffer exactly the way you do for GNU (*note GNU +Pattern Buffers::.). POSIX pattern buffers have type `regex_t', which +is equivalent to the GNU pattern buffer type `re_pattern_buffer'. + + +File: regex.info, Node: POSIX Regular Expression Compiling, Next: POSIX Matching, Prev: POSIX Pattern Buffers, Up: POSIX Regex Functions + +POSIX Regular Expression Compiling +---------------------------------- + + With POSIX, you can only search for a given regular expression; you +can't match it. To do this, you must first compile it in a pattern +buffer, using `regcomp'. + + To compile a pattern buffer, use: + + int + regcomp (regex_t *PREG, const char *REGEX, int CFLAGS) + +PREG is the initialized pattern buffer's address, REGEX is the regular +expression's address, and CFLAGS is the compilation flags, which Regex +considers as a collection of bits. Here are the valid bits, as defined +in `regex.h': + +`REG_EXTENDED' + says to use POSIX Extended Regular Expression syntax; if this isn't + set, then says to use POSIX Basic Regular Expression syntax. + `regcomp' sets PREG's `syntax' field accordingly. + +`REG_ICASE' + says to ignore case; `regcomp' sets PREG's `translate' field to a + translate table which ignores case, replacing anything you've put + there before. + +`REG_NOSUB' + says to set PREG's `no_sub' field; *note POSIX Matching::., for + what this means. + +`REG_NEWLINE' + says that a: + + * match-any-character operator (*note Match-any-character + Operator::.) doesn't match a newline. + + * nonmatching list not containing a newline (*note List + Operators::.) matches a newline. + + * match-beginning-of-line operator (*note + Match-beginning-of-line Operator::.) matches the empty string + immediately after a newline, regardless of how `REG_NOTBOL' + is set (*note POSIX Matching::., for an explanation of + `REG_NOTBOL'). + + * match-end-of-line operator (*note Match-beginning-of-line + Operator::.) matches the empty string immediately before a + newline, regardless of how `REG_NOTEOL' is set (*note POSIX + Matching::., for an explanation of `REG_NOTEOL'). + + If `regcomp' successfully compiles the regular expression, it returns +zero and sets `*PATTERN_BUFFER' to the compiled pattern. Except for +`syntax' (which it sets as explained above), it also sets the same +fields the same way as does the GNU compiling function (*note GNU +Regular Expression Compiling::.). + + If `regcomp' can't compile the regular expression, it returns one of +the error codes listed here. (Except when noted differently, the +syntax of in all examples below is basic regular expression syntax.) + +`REG_BADRPT' + For example, the consecutive repetition operators `**' in `a**' + are invalid. As another example, if the syntax is extended + regular expression syntax, then the repetition operator `*' with + nothing on which to operate in `*' is invalid. + +`REG_BADBR' + For example, the COUNT `-1' in `a\{-1' is invalid. + +`REG_EBRACE' + For example, `a\{1' is missing a close-interval operator. + +`REG_EBRACK' + For example, `[a' is missing a close-list operator. + +`REG_ERANGE' + For example, the range ending point `z' that collates lower than + does its starting point `a' in `[z-a]' is invalid. Also, the + range with the character class `[:alpha:]' as its starting point in + `[[:alpha:]-|]'. + +`REG_ECTYPE' + For example, the character class name `foo' in `[[:foo:]' is + invalid. + +`REG_EPAREN' + For example, `a\)' is missing an open-group operator and `\(a' is + missing a close-group operator. + +`REG_ESUBREG' + For example, the back reference `\2' that refers to a nonexistent + subexpression in `\(a\)\2' is invalid. + +`REG_EEND' + Returned when a regular expression causes no other more specific + error. + +`REG_EESCAPE' + For example, the trailing backslash `\' in `a\' is invalid, as is + the one in `\'. + +`REG_BADPAT' + For example, in the extended regular expression syntax, the empty + group `()' in `a()b' is invalid. + +`REG_ESIZE' + Returned when a regular expression needs a pattern buffer larger + than 65536 bytes. + +`REG_ESPACE' + Returned when a regular expression makes Regex to run out of + memory. + + +File: regex.info, Node: POSIX Matching, Next: Reporting Errors, Prev: POSIX Regular Expression Compiling, Up: POSIX Regex Functions + +POSIX Matching +-------------- + + Matching the POSIX way means trying to match a null-terminated string +starting at its first character. Once you've compiled a pattern into a +pattern buffer (*note POSIX Regular Expression Compiling::.), you can +ask the matcher to match that pattern against a string using: + + int + regexec (const regex_t *PREG, const char *STRING, + size_t NMATCH, regmatch_t PMATCH[], int EFLAGS) + +PREG is the address of a pattern buffer for a compiled pattern. STRING +is the string you want to match. + + *Note Using Byte Offsets::, for an explanation of PMATCH. If you +pass zero for NMATCH or you compiled PREG with the compilation flag +`REG_NOSUB' set, then `regexec' will ignore PMATCH; otherwise, you must +allocate it to have at least NMATCH elements. `regexec' will record +NMATCH byte offsets in PMATCH, and set to -1 any unused elements up to +PMATCH`[NMATCH]' - 1. + + EFLAGS specifies "execution flags"--namely, the two bits `REG_NOTBOL' +and `REG_NOTEOL' (defined in `regex.h'). If you set `REG_NOTBOL', then +the match-beginning-of-line operator (*note Match-beginning-of-line +Operator::.) always fails to match. This lets you match against pieces +of a line, as you would need to if, say, searching for repeated +instances of a given pattern in a line; it would work correctly for +patterns both with and without match-beginning-of-line operators. +`REG_NOTEOL' works analogously for the match-end-of-line operator +(*note Match-end-of-line Operator::.); it exists for symmetry. + + `regexec' tries to find a match for PREG in STRING according to the +syntax in PREG's `syntax' field. (*Note POSIX Regular Expression +Compiling::, for how to set it.) The function returns zero if the +compiled pattern matches STRING and `REG_NOMATCH' (defined in +`regex.h') if it doesn't. + + +File: regex.info, Node: Reporting Errors, Next: Using Byte Offsets, Prev: POSIX Matching, Up: POSIX Regex Functions + +Reporting Errors +---------------- + + If either `regcomp' or `regexec' fail, they return a nonzero error +code, the possibilities for which are defined in `regex.h'. *Note +POSIX Regular Expression Compiling::, and *Note POSIX Matching::, for +what these codes mean. To get an error string corresponding to these +codes, you can use: + + size_t + regerror (int ERRCODE, + const regex_t *PREG, + char *ERRBUF, + size_t ERRBUF_SIZE) + +ERRCODE is an error code, PREG is the address of the pattern buffer +which provoked the error, ERRBUF is the error buffer, and ERRBUF_SIZE +is ERRBUF's size. + + `regerror' returns the size in bytes of the error string +corresponding to ERRCODE (including its terminating null). If ERRBUF +and ERRBUF_SIZE are nonzero, it also returns in ERRBUF the first +ERRBUF_SIZE - 1 characters of the error string, followed by a null. +eRRBUF_SIZE must be a nonnegative number less than or equal to the size +in bytes of ERRBUF. + + You can call `regerror' with a null ERRBUF and a zero ERRBUF_SIZE to +determine how large ERRBUF need be to accommodate `regerror''s error +string. + + +File: regex.info, Node: Using Byte Offsets, Next: Freeing POSIX Pattern Buffers, Prev: Reporting Errors, Up: POSIX Regex Functions + +Using Byte Offsets +------------------ + + In POSIX, variables of type `regmatch_t' hold analogous information, +but are not identical to, GNU's registers (*note Using Registers::.). +To get information about registers in POSIX, pass to `regexec' a +nonzero PMATCH of type `regmatch_t', i.e., the address of a structure +of this type, defined in `regex.h': + + typedef struct + { + regoff_t rm_so; + regoff_t rm_eo; + } regmatch_t; + + When reading in *Note Using Registers::, about how the matching +function stores the information into the registers, substitute PMATCH +for REGS, `PMATCH[I]->rm_so' for `REGS->start[I]' and +`PMATCH[I]->rm_eo' for `REGS->end[I]'. + + +File: regex.info, Node: Freeing POSIX Pattern Buffers, Prev: Using Byte Offsets, Up: POSIX Regex Functions + +Freeing POSIX Pattern Buffers +----------------------------- + + To free any allocated fields of a pattern buffer, use: + + void + regfree (regex_t *PREG) + +PREG is the pattern buffer whose allocated fields you want freed. +`regfree' also sets PREG's `allocated' and `used' fields to zero. +After freeing a pattern buffer, you need to again compile a regular +expression in it (*note POSIX Regular Expression Compiling::.) before +passing it to the matching function (*note POSIX Matching::.). + + +File: regex.info, Node: BSD Regex Functions, Prev: POSIX Regex Functions, Up: Programming with Regex + +BSD Regex Functions +=================== + + If you're writing code that has to be Berkeley UNIX compatible, +you'll need to use these functions whose interfaces are the same as +those in Berkeley UNIX. + +* Menu: + +* BSD Regular Expression Compiling:: re_comp () +* BSD Searching:: re_exec () + + +File: regex.info, Node: BSD Regular Expression Compiling, Next: BSD Searching, Up: BSD Regex Functions + +BSD Regular Expression Compiling +-------------------------------- + + With Berkeley UNIX, you can only search for a given regular +expression; you can't match one. To search for it, you must first +compile it. Before you compile it, you must indicate the regular +expression syntax you want it compiled according to by setting the +variable `re_syntax_options' (declared in `regex.h' to some syntax +(*note Regular Expression Syntax::.). + + To compile a regular expression use: + + char * + re_comp (char *REGEX) + +REGEX is the address of a null-terminated regular expression. +`re_comp' uses an internal pattern buffer, so you can use only the most +recently compiled pattern buffer. This means that if you want to use a +given regular expression that you've already compiled--but it isn't the +latest one you've compiled--you'll have to recompile it. If you call +`re_comp' with the null string (*not* the empty string) as the +argument, it doesn't change the contents of the pattern buffer. + + If `re_comp' successfully compiles the regular expression, it returns +zero. If it can't compile the regular expression, it returns an error +string. `re_comp''s error messages are identical to those of +`re_compile_pattern' (*note GNU Regular Expression Compiling::.). + + +File: regex.info, Node: BSD Searching, Prev: BSD Regular Expression Compiling, Up: BSD Regex Functions + +BSD Searching +------------- + + Searching the Berkeley UNIX way means searching in a string starting +at its first character and trying successive positions within it to +find a match. Once you've compiled a pattern using `re_comp' (*note +BSD Regular Expression Compiling::.), you can ask Regex to search for +that pattern in a string using: + + int + re_exec (char *STRING) + +STRING is the address of the null-terminated string in which you want +to search. + + `re_exec' returns either 1 for success or 0 for failure. It +automatically uses a GNU fastmap (*note Searching with Fastmaps::.). + + +File: regex.info, Node: Copying, Next: Index, Prev: Programming with Regex, Up: Top + +GNU GENERAL PUBLIC LICENSE +************************** + + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc. + 675 Mass Ave, Cambridge, MA 02139, USA + + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + +Preamble +======== + + The licenses for most software are designed to take away your freedom +to share and change it. By contrast, the GNU General Public License is +intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Library General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it in +new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 1. This License applies to any program or other work which contains a + notice placed by the copyright holder saying it may be distributed + under the terms of this General Public License. The "Program", + below, refers to any such program or work, and a "work based on + the Program" means either the Program or any derivative work under + copyright law: that is to say, a work containing the Program or a + portion of it, either verbatim or with modifications and/or + translated into another language. (Hereinafter, translation is + included without limitation in the term "modification".) Each + licensee is addressed as "you". + + Activities other than copying, distribution and modification are + not covered by this License; they are outside its scope. The act + of running the Program is not restricted, and the output from the + Program is covered only if its contents constitute a work based on + the Program (independent of having been made by running the + Program). Whether that is true depends on what the Program does. + + 2. You may copy and distribute verbatim copies of the Program's + source code as you receive it, in any medium, provided that you + conspicuously and appropriately publish on each copy an appropriate + copyright notice and disclaimer of warranty; keep intact all the + notices that refer to this License and to the absence of any + warranty; and give any other recipients of the Program a copy of + this License along with the Program. + + You may charge a fee for the physical act of transferring a copy, + and you may at your option offer warranty protection in exchange + for a fee. + + 3. You may modify your copy or copies of the Program or any portion + of it, thus forming a work based on the Program, and copy and + distribute such modifications or work under the terms of Section 1 + above, provided that you also meet all of these conditions: + + a. You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b. You must cause any work that you distribute or publish, that + in whole or in part contains or is derived from the Program + or any part thereof, to be licensed as a whole at no charge + to all third parties under the terms of this License. + + c. If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display + an announcement including an appropriate copyright notice and + a notice that there is no warranty (or else, saying that you + provide a warranty) and that users may redistribute the + program under these conditions, and telling the user how to + view a copy of this License. (Exception: if the Program + itself is interactive but does not normally print such an + announcement, your work based on the Program is not required + to print an announcement.) + + These requirements apply to the modified work as a whole. If + identifiable sections of that work are not derived from the + Program, and can be reasonably considered independent and separate + works in themselves, then this License, and its terms, do not + apply to those sections when you distribute them as separate + works. But when you distribute the same sections as part of a + whole which is a work based on the Program, the distribution of + the whole must be on the terms of this License, whose permissions + for other licensees extend to the entire whole, and thus to each + and every part regardless of who wrote it. + + Thus, it is not the intent of this section to claim rights or + contest your rights to work written entirely by you; rather, the + intent is to exercise the right to control the distribution of + derivative or collective works based on the Program. + + In addition, mere aggregation of another work not based on the + Program with the Program (or with a work based on the Program) on + a volume of a storage or distribution medium does not bring the + other work under the scope of this License. + + 4. You may copy and distribute the Program (or a work based on it, + under Section 2) in object code or executable form under the terms + of Sections 1 and 2 above provided that you also do one of the + following: + + a. Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of + Sections 1 and 2 above on a medium customarily used for + software interchange; or, + + b. Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a + medium customarily used for software interchange; or, + + c. Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with + such an offer, in accord with Subsection b above.) + + The source code for a work means the preferred form of the work for + making modifications to it. For an executable work, complete + source code means all the source code for all modules it contains, + plus any associated interface definition files, plus the scripts + used to control compilation and installation of the executable. + However, as a special exception, the source code distributed need + not include anything that is normally distributed (in either + source or binary form) with the major components (compiler, + kernel, and so on) of the operating system on which the executable + runs, unless that component itself accompanies the executable. + + If distribution of executable or object code is made by offering + access to copy from a designated place, then offering equivalent + access to copy the source code from the same place counts as + distribution of the source code, even though third parties are not + compelled to copy the source along with the object code. + + 5. You may not copy, modify, sublicense, or distribute the Program + except as expressly provided under this License. Any attempt + otherwise to copy, modify, sublicense or distribute the Program is + void, and will automatically terminate your rights under this + License. However, parties who have received copies, or rights, + from you under this License will not have their licenses + terminated so long as such parties remain in full compliance. + + 6. You are not required to accept this License, since you have not + signed it. However, nothing else grants you permission to modify + or distribute the Program or its derivative works. These actions + are prohibited by law if you do not accept this License. + Therefore, by modifying or distributing the Program (or any work + based on the Program), you indicate your acceptance of this + License to do so, and all its terms and conditions for copying, + distributing or modifying the Program or works based on it. + + 7. Each time you redistribute the Program (or any work based on the + Program), the recipient automatically receives a license from the + original licensor to copy, distribute or modify the Program + subject to these terms and conditions. You may not impose any + further restrictions on the recipients' exercise of the rights + granted herein. You are not responsible for enforcing compliance + by third parties to this License. + + 8. If, as a consequence of a court judgment or allegation of patent + infringement or for any other reason (not limited to patent + issues), conditions are imposed on you (whether by court order, + agreement or otherwise) that contradict the conditions of this + License, they do not excuse you from the conditions of this + License. If you cannot distribute so as to satisfy simultaneously + your obligations under this License and any other pertinent + obligations, then as a consequence you may not distribute the + Program at all. For example, if a patent license would not permit + royalty-free redistribution of the Program by all those who + receive copies directly or indirectly through you, then the only + way you could satisfy both it and this License would be to refrain + entirely from distribution of the Program. + + If any portion of this section is held invalid or unenforceable + under any particular circumstance, the balance of the section is + intended to apply and the section as a whole is intended to apply + in other circumstances. + + It is not the purpose of this section to induce you to infringe any + patents or other property right claims or to contest validity of + any such claims; this section has the sole purpose of protecting + the integrity of the free software distribution system, which is + implemented by public license practices. Many people have made + generous contributions to the wide range of software distributed + through that system in reliance on consistent application of that + system; it is up to the author/donor to decide if he or she is + willing to distribute software through any other system and a + licensee cannot impose that choice. + + This section is intended to make thoroughly clear what is believed + to be a consequence of the rest of this License. + + 9. If the distribution and/or use of the Program is restricted in + certain countries either by patents or by copyrighted interfaces, + the original copyright holder who places the Program under this + License may add an explicit geographical distribution limitation + excluding those countries, so that distribution is permitted only + in or among countries not thus excluded. In such case, this + License incorporates the limitation as if written in the body of + this License. + + 10. The Free Software Foundation may publish revised and/or new + versions of the General Public License from time to time. Such + new versions will be similar in spirit to the present version, but + may differ in detail to address new problems or concerns. + + Each version is given a distinguishing version number. If the + Program specifies a version number of this License which applies + to it and "any later version", you have the option of following + the terms and conditions either of that version or of any later + version published by the Free Software Foundation. If the Program + does not specify a version number of this License, you may choose + any version ever published by the Free Software Foundation. + + 11. If you wish to incorporate parts of the Program into other free + programs whose distribution conditions are different, write to the + author to ask for permission. For software which is copyrighted + by the Free Software Foundation, write to the Free Software + Foundation; we sometimes make exceptions for this. Our decision + will be guided by the two goals of preserving the free status of + all derivatives of our free software and of promoting the sharing + and reuse of software generally. + + NO WARRANTY + + 12. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO + WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE + LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT + HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT + WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT + NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND + FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE + QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE + PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY + SERVICING, REPAIR OR CORRECTION. + + 13. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN + WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY + MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE + LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, + INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR + INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF + DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU + OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY + OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN + ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + +Appendix: How to Apply These Terms to Your New Programs +======================================================= + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these +terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES. + Copyright (C) 19YY NAME OF AUTHOR + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + + Also add information on how to contact you by electronic and paper +mail. + + If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + + The hypothetical commands `show w' and `show c' should show the +appropriate parts of the General Public License. Of course, the +commands you use may be called something other than `show w' and `show +c'; they could even be mouse-clicks or menu items--whatever suits your +program. + + You should also get your employer (if you work as a programmer) or +your school, if any, to sign a "copyright disclaimer" for the program, +if necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + SIGNATURE OF TY COON, 1 April 1989 + Ty Coon, President of Vice + + This General Public License does not permit incorporating your +program into proprietary programs. If your program is a subroutine +library, you may consider it more useful to permit linking proprietary +applications with the library. If this is what you want to do, use the +GNU Library General Public License instead of this License. + + +File: regex.info, Node: Index, Prev: Copying, Up: Top + +Index +***** + +* Menu: + +* $: Match-end-of-line Operator. +* (: Grouping Operators. +* ): Grouping Operators. +* *: Match-zero-or-more Operator. +* +: Match-one-or-more Operator. +* -: List Operators. +* .: Match-any-character Operator. +* :] in regex: Character Class Operators. +* ?: Match-zero-or-one Operator. +* {: Interval Operators. +* }: Interval Operators. +* [: in regex: Character Class Operators. +* [^: List Operators. +* [: List Operators. +* \': Match-end-of-buffer Operator. +* \<: Match-beginning-of-word Operator. +* \>: Match-end-of-word Operator. +* \{: Interval Operators. +* \}: Interval Operators. +* \b: Match-word-boundary Operator. +* \B: Match-within-word Operator. +* \s: Match-syntactic-class Operator. +* \S: Match-not-syntactic-class Operator. +* \w: Match-word-constituent Operator. +* \W: Match-non-word-constituent Operator. +* \`: Match-beginning-of-buffer Operator. +* \: List Operators. +* ]: List Operators. +* ^: List Operators. +* allocated initialization: GNU Regular Expression Compiling. +* alternation operator: Alternation Operator. +* alternation operator and ^: Match-beginning-of-line Operator. +* anchoring: Anchoring Operators. +* anchors: Match-end-of-line Operator. +* anchors: Match-beginning-of-line Operator. +* Awk: Predefined Syntaxes. +* back references: Back-reference Operator. +* backtracking: Match-zero-or-more Operator. +* backtracking: Alternation Operator. +* beginning-of-line operator: Match-beginning-of-line Operator. +* bracket expression: List Operators. +* buffer field, set by re_compile_pattern: GNU Regular Expression Compiling. +* buffer initialization: GNU Regular Expression Compiling. +* character classes: Character Class Operators. +* Egrep: Predefined Syntaxes. +* Emacs: Predefined Syntaxes. +* end in struct re_registers: Using Registers. +* end-of-line operator: Match-end-of-line Operator. +* fastmap initialization: GNU Regular Expression Compiling. +* fastmaps: Searching with Fastmaps. +* fastmap_accurate field, set by re_compile_pattern: GNU Regular Expression Compiling. +* Grep: Predefined Syntaxes. +* grouping: Grouping Operators. +* ignoring case: POSIX Regular Expression Compiling. +* interval expression: Interval Operators. +* matching list: List Operators. +* matching newline: List Operators. +* matching with GNU functions: GNU Matching. +* newline_anchor field in pattern buffer: Match-beginning-of-line Operator. +* nonmatching list: List Operators. +* not_bol field in pattern buffer: Match-beginning-of-line Operator. +* num_regs in struct re_registers: Using Registers. +* open-group operator and ^: Match-beginning-of-line Operator. +* or operator: Alternation Operator. +* parenthesizing: Grouping Operators. +* pattern buffer initialization: GNU Regular Expression Compiling. +* pattern buffer, definition of: GNU Pattern Buffers. +* POSIX Awk: Predefined Syntaxes. +* range argument to re_search: GNU Searching. +* regex.c: Overview. +* regex.h: Overview. +* regexp anchoring: Anchoring Operators. +* regmatch_t: Using Byte Offsets. +* regs_allocated: Using Registers. +* REGS_FIXED: Using Registers. +* REGS_REALLOCATE: Using Registers. +* REGS_UNALLOCATED: Using Registers. +* regular expressions, syntax of: Regular Expression Syntax. +* REG_EXTENDED: POSIX Regular Expression Compiling. +* REG_ICASE: POSIX Regular Expression Compiling. +* REG_NEWLINE: POSIX Regular Expression Compiling. +* REG_NOSUB: POSIX Regular Expression Compiling. +* RE_BACKSLASH_ESCAPE_IN_LIST: Syntax Bits. +* RE_BK_PLUS_QM: Syntax Bits. +* RE_CHAR_CLASSES: Syntax Bits. +* RE_CONTEXT_INDEP_ANCHORS: Syntax Bits. +* RE_CONTEXT_INDEP_ANCHORS (and ^): Match-beginning-of-line Operator. +* RE_CONTEXT_INDEP_OPS: Syntax Bits. +* RE_CONTEXT_INVALID_OPS: Syntax Bits. +* RE_DOT_NEWLINE: Syntax Bits. +* RE_DOT_NOT_NULL: Syntax Bits. +* RE_INTERVALS: Syntax Bits. +* RE_LIMITED_OPS: Syntax Bits. +* RE_NEWLINE_ALT: Syntax Bits. +* RE_NO_BK_BRACES: Syntax Bits. +* RE_NO_BK_PARENS: Syntax Bits. +* RE_NO_BK_REFS: Syntax Bits. +* RE_NO_BK_VBAR: Syntax Bits. +* RE_NO_EMPTY_RANGES: Syntax Bits. +* re_nsub field, set by re_compile_pattern: GNU Regular Expression Compiling. +* re_pattern_buffer definition: GNU Pattern Buffers. +* re_registers: Using Registers. +* re_syntax_options initialization: GNU Regular Expression Compiling. +* RE_UNMATCHED_RIGHT_PAREN_ORD: Syntax Bits. +* searching with GNU functions: GNU Searching. +* start argument to re_search: GNU Searching. +* start in struct re_registers: Using Registers. +* struct re_pattern_buffer definition: GNU Pattern Buffers. +* subexpressions: Grouping Operators. +* syntax field, set by re_compile_pattern: GNU Regular Expression Compiling. +* syntax bits: Syntax Bits. +* syntax initialization: GNU Regular Expression Compiling. +* syntax of regular expressions: Regular Expression Syntax. +* translate initialization: GNU Regular Expression Compiling. +* used field, set by re_compile_pattern: GNU Regular Expression Compiling. +* word boundaries, matching: Match-word-boundary Operator. +* \: The Backslash Character. +* \(: Grouping Operators. +* \): Grouping Operators. +* \|: Alternation Operator. +* ^: Match-beginning-of-line Operator. +* |: Alternation Operator. + + + +Tag Table: +Node: Top1064 +Node: Overview4562 +Node: Regular Expression Syntax6746 +Node: Syntax Bits7916 +Node: Predefined Syntaxes14018 +Node: Collating Elements vs. Characters17872 +Node: The Backslash Character18835 +Node: Common Operators21992 +Node: Match-self Operator23445 +Node: Match-any-character Operator23941 +Node: Concatenation Operator24520 +Node: Repetition Operators25017 +Node: Match-zero-or-more Operator25436 +Node: Match-one-or-more Operator27483 +Node: Match-zero-or-one Operator28341 +Node: Interval Operators29196 +Node: Alternation Operator30991 +Node: List Operators32489 +Node: Character Class Operators35272 +Node: Range Operator36901 +Node: Grouping Operators38930 +Node: Back-reference Operator40251 +Node: Anchoring Operators43073 +Node: Match-beginning-of-line Operator43447 +Node: Match-end-of-line Operator44779 +Node: GNU Operators45518 +Node: Word Operators45767 +Node: Non-Emacs Syntax Tables46391 +Node: Match-word-boundary Operator47465 +Node: Match-within-word Operator47858 +Node: Match-beginning-of-word Operator48255 +Node: Match-end-of-word Operator48588 +Node: Match-word-constituent Operator48908 +Node: Match-non-word-constituent Operator49234 +Node: Buffer Operators49545 +Node: Match-beginning-of-buffer Operator49952 +Node: Match-end-of-buffer Operator50264 +Node: GNU Emacs Operators50558 +Node: Syntactic Class Operators50901 +Node: Emacs Syntax Tables51307 +Node: Match-syntactic-class Operator51963 +Node: Match-not-syntactic-class Operator52560 +Node: What Gets Matched?53150 +Node: Programming with Regex53799 +Node: GNU Regex Functions54237 +Node: GNU Pattern Buffers55078 +Node: GNU Regular Expression Compiling58303 +Node: GNU Matching61181 +Node: GNU Searching63101 +Node: Matching/Searching with Split Data64913 +Node: Searching with Fastmaps66369 +Node: GNU Translate Tables68921 +Node: Using Registers70892 +Node: Freeing GNU Pattern Buffers77000 +Node: POSIX Regex Functions77593 +Node: POSIX Pattern Buffers78266 +Node: POSIX Regular Expression Compiling78709 +Node: POSIX Matching82836 +Node: Reporting Errors84791 +Node: Using Byte Offsets86048 +Node: Freeing POSIX Pattern Buffers86861 +Node: BSD Regex Functions87467 +Node: BSD Regular Expression Compiling87886 +Node: BSD Searching89258 +Node: Copying89960 +Node: Index109122 + +End Tag Table |