FreeBSD-src - Raptor Engineering's fork of pfsense FreeBSD src with pfSense changes


author	tjr <tjr@FreeBSD.org>	2004-07-12 07:35:59 +0000
committer	tjr <tjr@FreeBSD.org>	2004-07-12 07:35:59 +0000
commit	ba689b40433c4dbba7960454bbb126e6da5bdf59 (patch)
tree	0f9f638917b5d0fb257cbfab36551506ddb3b43f /lib/libc/string/wcscat.c
parent	031e087d2c3a8f23b40ec3cd62b741f935e1e2e6 (diff)

Make regular expression matching aware of multibyte characters. The general idea is that we perform multibyte->wide character conversion while parsing and compiling, then convert byte sequences to wide characters when they're needed for comparison and stepping through the string during execution.

As with tr(1), the main complication is to efficiently represent sets of characters in bracket expressions. The old bitmap representation is replaced by a bitmap for the first 256 characters combined with a vector of individual wide characters, a vector of character ranges (for [A-Z] etc.), and a vector of character classes (for [[:alpha:]] etc.).

One other point of interest is that although the Boyer-Moore algorithm had to be disabled in the general multibyte case, it is still enabled for UTF-8 because of its self-synchronizing nature. This greatly speeds up matching by reducing the number of multibyte conversions that need to be done.
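
The bracket-expression representation described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from this commit: the names `struct cset`, `struct wrange`, `cset_add_char`, `cset_add_range`, `cset_add_class` and `cset_contains` are hypothetical, and the fixed capacity `CSET_MAX` is an assumption made to keep the example free of memory allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

#define CSET_MAX 8	/* assumed fixed capacity; a real set would grow dynamically */

struct wrange { wint_t min, max; };

struct cset {
	unsigned char bmp[256 / 8];		/* one bit per character below 256 */
	wint_t        wides[CSET_MAX];		/* individual wide characters */
	struct wrange ranges[CSET_MAX];		/* inclusive ranges such as [A-Z] */
	wctype_t      classes[CSET_MAX];	/* classes such as [[:alpha:]] */
	size_t        nwides, nranges, nclasses;
	int           invert;			/* nonzero for a negated set, [^...] */
};

/* Add one character: bitmap if it is below 256, otherwise the wide vector. */
int cset_add_char(struct cset *cs, wint_t ch)
{
	if ((unsigned long)ch < 256) {
		cs->bmp[ch >> 3] |= (unsigned char)(1u << (ch & 7));
		return 0;
	}
	if (cs->nwides == CSET_MAX)
		return -1;
	cs->wides[cs->nwides++] = ch;
	return 0;
}

/* Add an inclusive range; the part below 256 is folded into the bitmap. */
int cset_add_range(struct cset *cs, wint_t lo, wint_t hi)
{
	assert(lo <= hi);
	for (; (unsigned long)lo < 256 && lo <= hi; lo++)
		(void)cset_add_char(cs, lo);
	if (lo > hi)
		return 0;	/* the whole range fitted in the bitmap */
	if (cs->nranges == CSET_MAX)
		return -1;
	cs->ranges[cs->nranges].min = lo;
	cs->ranges[cs->nranges].max = hi;
	cs->nranges++;
	return 0;
}

/* Add a character class obtained from wctype(), e.g. wctype("alpha"). */
int cset_add_class(struct cset *cs, wctype_t cls)
{
	if (cs->nclasses == CSET_MAX)
		return -1;
	cs->classes[cs->nclasses++] = cls;
	return 0;
}

/* Membership test: bitmap first, then the three vectors in turn. */
int cset_contains(const struct cset *cs, wint_t ch)
{
	int found = 0;
	size_t i;

	if ((unsigned long)ch < 256)
		found = (cs->bmp[ch >> 3] >> (ch & 7)) & 1;
	for (i = 0; !found && i < cs->nwides; i++)
		found = (cs->wides[i] == ch);
	for (i = 0; !found && i < cs->nranges; i++)
		found = (ch >= cs->ranges[i].min && ch <= cs->ranges[i].max);
	for (i = 0; !found && i < cs->nclasses; i++)
		found = (iswctype(ch, cs->classes[i]) != 0);
	return cs->invert ? !found : found;
}
```

In this sketch a zero-initialised `struct cset` is an empty set. Characters below 256 cost a single bit test, so the common single-byte case stays about as cheap as the old pure-bitmap lookup, while the vectors are scanned only when the bitmap does not already answer the question.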

Diffstat (limited to 'lib/libc/string/wcscat.c')

0 files changed, 0 insertions, 0 deletions

