summaryrefslogtreecommitdiffstats
path: root/contrib/libg++/librx/DOC
diff options
context:
space:
mode:
Diffstat (limited to 'contrib/libg++/librx/DOC')
-rw-r--r--contrib/libg++/librx/DOC179
1 files changed, 0 insertions, 179 deletions
diff --git a/contrib/libg++/librx/DOC b/contrib/libg++/librx/DOC
deleted file mode 100644
index dce6e2e..0000000
--- a/contrib/libg++/librx/DOC
+++ /dev/null
@@ -1,179 +0,0 @@
-Most of the interfaces are covered by the documentation for GNU regex.
-
-This file will accumulate factoids about other interfaces until
-somebody writes a manual.
-
-
-* Don't Pass Registers Gratuitously
-
-Search and match functions take an optional parameter which is a
-pointer to "registers" or "match positions". This parameter points
-to a structure which during the match is filled in with the offset locations
-of parenthesized subexpressions.
-
-Unless you specificly need the values that would be stored in that
-structure, you should pass NULL for this parameter. Usually Rx will
-do less backtracking (and so run much faster) if subexpression
-positions are not being measured.
-
-
-* Use syntax_parens
-
-Sometimes you need to know the positions of *some* parenthesized
-subexpressions, but not others. You can still help Rx to avoid
-backtracking by telling it specificly which subexpressions you are interested
-in. You do this by filling in the rxb.syntax_parens field of
-a pattern buffer.
-
- /* If this is a valid pointer, it tells rx not to store the extents of
- * certain subexpressions (those corresponding to non-zero entries).
- * Passing 0x1 is the same as passing an array of all ones. Passing 0x0
- * is the same as passing an array of all zeros.
- * The array should contain as many entries as their are subexps in the
- * regexp.
- */
- char * syntax_parens;
-
-
-* RX_SEARCH
-
-For an example of how to use rx_search, you can look at how
-re_search_2 is defined (in rx.c). Basicly you need to define three
-functions. These are GET_BURST, FETCH_CHAR, and BACK_REF. They each
-operate on `struct rx_string_position' and a closure of your design
-passed as void *.
-
- struct rx_string_position
- {
- const unsigned char * pos; /* The current pos. */
- const unsigned char * string; /* The current string burst. */
- const unsigned char * end; /* First invalid position >= POS. */
- int offset; /* Integer address of current burst */
- int size; /* Current string's size. */
- int search_direction; /* 1 or -1 */
- int search_end; /* First position to not try. */
- };
-
-On entry to GET_BURST, all these fields are set, but POS may be >=
-END. In fact, STRING and END might both be 0.
-
-The function of GET_BURST is to make all the fields valid without
-changing the logical position in the string. SEARCH_DIRECTION is a
-hint about which way the matcher will move next. It is usually 1, and
-is -1 only when fastmapping during a reverse search. SEARCH_END
-terminates the burst.
-
- typedef enum rx_get_burst_return
- (*rx_get_burst_fn) (struct rx_string_position * pos,
- void * app_closure,
- int stop);
-
-
-The closure is whatever you pass to rx_search. STOP is an argument to
-rx_search that bounds the search. You should never return a string
-position from with SEARCH_END set beyond the position indicated by
-STOP.
-
-
- enum rx_get_burst_return
- {
- rx_get_burst_continuation,
- rx_get_burst_error,
- rx_get_burst_ok,
- rx_get_burst_no_more
- };
-
-Those are the possible return values of get_burst. Normally, you only
-ever care about the last two. An error return indicates something
-like trouble reading a file. A continuation return means suspend the
-search and resume by retrying GET_BURST if the search is restarted.
-
-GET_BURST is not quite as trivial as you might hope. If you have a
-fragmented string, you really have to keep two adjacent fragments at
-all times, even though the GET_BURST interface looks like you only
-need one. This is because of operators like `word-boundary' that try
-to look at two adjacent characters. Such operators are implemented
-with FETCH_CHAR.
-
- typedef int (*rx_fetch_char_fn) (struct rx_string_position * pos,
- int offset,
- void * app_closure,
- int stop);
-
-That takes the same closure passed to GET_BURST. It returns the
-character at POS or at one past POS according to whether OFFSET is 0
-or 1.
-
-It is guaranteed that POS + OFFSET is within the string being searched.
-
-
-
-The last function compares characters at one position with characters
-previously matched by a parenthesized subexpression.
-
- enum rx_back_check_return
- {
- rx_back_check_continuation,
- rx_back_check_error,
- rx_back_check_pass,
- rx_back_check_fail
- };
-
- typedef enum rx_back_check_return
- (*rx_back_check_fn) (struct rx_string_position * pos,
- int lparen,
- int rparen,
- unsigned char * translate,
- void * app_closure,
- int stop);
-
-LPAREN and RPAREN are the integer indexes of the previously matched
-characters. The comparison should translate both characters being
-compared by mapping them through TRANSLATE. POS is the point at which
-to begin comparing. It should be advanced to the last character
-matched during backreferencing.
-
-* Compilation Stages
-
-In rx_compile, a string is compiled into a pattern buffer.
-Compilation proceeds in these stages:
-
- 1. Make a syntax tree for the regexp.
- 2. Duplicate the syntax tree and make both trees nodes
- in a single unifying tree.
- 3. In one of the two trees, remove all side effects that
- aren't needed to test for the possibility of a match.
- Such side effects include the filling in of output registers
- for subexpressions that are not backreferenced.
- 4. Optimize the unifying tree.
- 5. Translate the tree to an NFA.
- 6. Analyze and optimize the NFA.
- 7. Copy the NFA into a contiguous region of memory.
-
-
-* Cache Size
-
-During a search or match, the NFA is translated into a "super NFA".
-A super NFA can match the patterns of the corresponding NFA in no more
-and often fewer steps.
-
-The catch is that the super NFA may be costly to construct in its entirety;
-it may not even fit in memory. So, states of the NFA are constructed
-on demand and discarded after a period of non-use. They are kept in a cache
-so that time is not wasted constructing existing nodes twice.
-
-The size of the super state NFA cache is a contributing factor the performance
-of Rx. The larger the cache (to a point) the faster Rx can run. The
-variable rx_cache_bound is an upper limit on the number of superstates
-that can exist in the cache.
-
-The defaulting setting is 128. GNU sed uses 4096. Neither setting
-has much justification although sed's is after a small number of quick
-and dirty experiments. The memory consumed by one superstate is
-between 4k and 8k. The cache only grows to its bounded size if there
-is actual demand for that many states. Sed's setting, for example,
-may appear quite high but in practice that much memory is hardly ever
-used. The default setting was chosen based on the heuristic that a
-megabyte is the upper limit on what a good citizen library can
-allocate without special arrangement.
-
OpenPOWER on IntegriCloud