author | pfg <pfg@FreeBSD.org> | 2016-06-24 02:24:34 +0000 |
---|---|---|
committer | pfg <pfg@FreeBSD.org> | 2016-06-24 02:24:34 +0000 |
commit | 33e55d7e3fd8a63e84f2d3cc5a0e3834a98dd86f (patch) | |
tree | 39088ff1700d95a9713154b77b4299f1cd2feafa /lib/libc/regex/regex.3 | |
parent | c45c824e6b4172efa131a7166214ec7a14ff5168 (diff) | |
MFC r300683:
libc: regexec(3) adjustment.
Change the behavior when REG_STARTEND is combined with REG_NOTBOL.
From the original posting[1]:
"Enable the assumption that pmatch[0].rm_so is a continuation offset
to a string and allows us to do a proper assessment of the character
in regards to it's word position ('^' or '\<'), without risking going
into unallocated memory."
This change makes us similar to how glibc handles REG_STARTEND |
REG_NOTBOL, and is closely related to a soon-to-land fix to sed.
Special thanks to Martijn van Duren and Ingo Schwarze for working
out some consistent behaviour.
Differential Revision: https://reviews.freebsd.org/D6257
Taken from: openbsd-tech 2016-05-24 [1] (Martijn van Duren)
Diffstat (limited to 'lib/libc/regex/regex.3')
-rw-r--r-- | lib/libc/regex/regex.3 | 70 |
1 files changed, 48 insertions, 22 deletions
diff --git a/lib/libc/regex/regex.3 b/lib/libc/regex/regex.3
index ea1ba25..70be400 100644
--- a/lib/libc/regex/regex.3
+++ b/lib/libc/regex/regex.3
@@ -32,7 +32,7 @@
 .\" @(#)regex.3 8.4 (Berkeley) 3/20/94
 .\" $FreeBSD$
 .\"
-.Dd August 17, 2005
+.Dd May 25, 2016
 .Dt REGEX 3
 .Os
 .Sh NAME
@@ -235,11 +235,16 @@ The
 argument is the bitwise OR of zero or more of the following flags:
 .Bl -tag -width REG_STARTEND
 .It Dv REG_NOTBOL
-The first character of
-the string
-is not the beginning of a line, so the
-.Ql ^\&
-anchor should not match before it.
+The first character of the string is treated as the continuation
+of a line.
+This means that the anchors
+.Ql ^\& ,
+.Ql [[:<:]] ,
+and
+.Ql \e<
+do not match before it; but see
+.Dv REG_STARTEND
+below.
 This does not affect the behavior of newlines under
 .Dv REG_NEWLINE .
 .It Dv REG_NOTEOL
@@ -247,19 +252,16 @@ The NUL terminating
 the string
 does not end a line, so the
 .Ql $\&
-anchor should not match before it.
+anchor does not match before it.
 This does not affect the behavior of newlines under
 .Dv REG_NEWLINE .
 .It Dv REG_STARTEND
 The string is considered to start at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_so
-and to have a terminating NUL located at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_eo
-(there need not actually be a NUL at that location),
+.Fa string No +
+.Fa pmatch Ns [0]. Ns Fa rm_so
+and to end before the byte located at
+.Fa string No +
+.Fa pmatch Ns [0]. Ns Fa rm_eo ,
 regardless of the value of
 .Fa nmatch .
 See below for the definition of
@@ -271,13 +273,37 @@ compatible with but not specified by
 .St -p1003.2 ,
 and should be used with caution in software intended to be portable to
 other systems.
-Note that a non-zero
-.Va rm_so
-does not imply
-.Dv REG_NOTBOL ;
-.Dv REG_STARTEND
-affects only the location of the string,
-not how it is matched.
+.Pp
+Without
+.Dv REG_NOTBOL ,
+the position
+.Fa rm_so
+is considered the beginning of a line, such that
+.Ql ^
+matches before it, and the beginning of a word if there is a word
+character at this position, such that
+.Ql [[:<:]]
+and
+.Ql \e<
+match before it.
+.Pp
+With
+.Dv REG_NOTBOL ,
+the character at position
+.Fa rm_so
+is treated as the continuation of a line, and if
+.Fa rm_so
+is greater than 0, the preceding character is taken into consideration.
+If the preceding character is a newline and the regular expression was compiled
+with
+.Dv REG_NEWLINE ,
+.Ql ^
+matches before the string; if the preceding character is not a word character
+but the string starts with a word character,
+.Ql [[:<:]]
+and
+.Ql \e<
+match before the string.
 .El
 .Pp
 See
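The REG_STARTEND | REG_NOTBOL behaviour documented in the diff above can be exercised with a small C sketch. It assumes a libc that provides the non-POSIX REG_STARTEND extension (FreeBSD with this change applied, or glibc). The helper name match_from and its return convention are hypothetical, introduced only for illustration; they are not part of libc or of this commit.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libc): search text starting at byte
 * offset `so`, telling regexec() that this offset continues an earlier
 * line (REG_STARTEND | REG_NOTBOL).
 * Returns the match start relative to `text`, or -1 if nothing matched
 * or the pattern failed to compile.
 */
static long
match_from(const char *pattern, const char *text, size_t so, int cflags)
{
	regex_t re;
	regmatch_t pm[1];
	long result = -1;

	assert(so <= strlen(text));
	if (regcomp(&re, pattern, cflags) != 0)
		return (-1);

	/* With REG_STARTEND, pm[0] carries the input range on entry. */
	pm[0].rm_so = (regoff_t)so;
	pm[0].rm_eo = (regoff_t)strlen(text);

	if (regexec(&re, text, 1, pm, REG_STARTEND | REG_NOTBOL) == 0)
		result = (long)pm[0].rm_so;	/* offset is relative to text */

	regfree(&re);
	return (result);
}
```

Under the semantics described in the diff, match_from("^bar", "foo\nbar", 4, REG_NEWLINE) should return 4, because the byte before offset 4 is a newline and the pattern was compiled with REG_NEWLINE. match_from("^bar", "foobar", 3, REG_NEWLINE) should return -1, because the preceding byte is an ordinary character and '^' therefore cannot match at the continuation offset.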