author: pfg <pfg@FreeBSD.org> 2016-06-24 02:24:34 +0000
committer: pfg <pfg@FreeBSD.org> 2016-06-24 02:24:34 +0000
commit: 33e55d7e3fd8a63e84f2d3cc5a0e3834a98dd86f
tree: 39088ff1700d95a9713154b77b4299f1cd2feafa /lib
parent: c45c824e6b4172efa131a7166214ec7a14ff5168
MFC r300683:
libc: regexec(3) adjustment.

Change the behavior of when REG_STARTEND is combined with REG_NOTBOL.

From the original posting[1]:
"Enable the assumption that pmatch[0].rm_so is a continuation offset
to a string and allows us to do a proper assessment of the character
in regards to it's word position ('^' or '\<'), without risking going
into unallocated memory."

This change makes us similar to how glibc handles
REG_STARTEND | REG_NOTBOL, and is closely related to a soon-to-land
fix to sed.

Special thanks to Martijn van Duren and Ingo Schwarze for working
out some consistent behaviour.

Differential Revision: https://reviews.freebsd.org/D6257
Taken from: openbsd-tech 2016-05-24 [1] (Martijn van Duren)
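
The behaviour described above can be exercised with a small caller-side sketch. This is an illustration only, not code from the commit: match_span() is a hypothetical helper, and the results noted afterwards assume a libc that implements REG_STARTEND (a non-POSIX extension) with the semantics described in this change.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): compile `pattern` with `cflags`
 * and match it against bytes [so, eo) of `string` using REG_STARTEND.
 * Returns the regexec() result (0 on match, REG_NOMATCH on no match),
 * or -1 if the pattern fails to compile.
 */
static int
match_span(const char *pattern, int cflags, const char *string,
    size_t so, size_t eo, int eflags)
{
	regex_t re;
	regmatch_t pm[1];
	int rc;

	assert(so <= eo);
	if (regcomp(&re, pattern, cflags) != 0)
		return (-1);

	/* With REG_STARTEND, pmatch[0] delimits the input on entry. */
	pm[0].rm_so = (regoff_t)so;
	pm[0].rm_eo = (regoff_t)eo;
	rc = regexec(&re, string, 1, pm, eflags | REG_STARTEND);

	regfree(&re);
	return (rc);
}
```

Under those assumptions, match_span("^bar", REG_NEWLINE, "foo\nbar", 4, 7, REG_NOTBOL) should return 0, because the byte before offset 4 is a newline and the pattern was compiled with REG_NEWLINE. By contrast, match_span("^bar", 0, "foobar", 3, 6, REG_NOTBOL) should return REG_NOMATCH, because offset 3 continues a line. Without REG_NOTBOL, results at a non-zero rm_so may differ between libc implementations.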
Diffstat (limited to 'lib')
-rw-r--r--  lib/libc/regex/engine.c   4
-rw-r--r--  lib/libc/regex/regex.3   70
2 files changed, 50 insertions, 24 deletions
diff --git a/lib/libc/regex/engine.c b/lib/libc/regex/engine.c
index 2ca971b..a756bba 100644
--- a/lib/libc/regex/engine.c
+++ b/lib/libc/regex/engine.c
@@ -786,7 +786,7 @@ fast( struct match *m,
ASSIGN(fresh, st);
SP("start", st, *p);
coldp = NULL;
- if (start == m->beginp)
+ if (start == m->offp || (start == m->beginp && !(m->eflags&REG_NOTBOL)))
c = OUT;
else {
/*
@@ -891,7 +891,7 @@ slow( struct match *m,
SP("sstart", st, *p);
st = step(m->g, startst, stopst, st, NOTHING, st);
matchp = NULL;
- if (start == m->beginp)
+ if (start == m->offp || (start == m->beginp && !(m->eflags&REG_NOTBOL)))
c = OUT;
else {
/*
diff --git a/lib/libc/regex/regex.3 b/lib/libc/regex/regex.3
index ea1ba25..70be400 100644
--- a/lib/libc/regex/regex.3
+++ b/lib/libc/regex/regex.3
@@ -32,7 +32,7 @@
.\" @(#)regex.3 8.4 (Berkeley) 3/20/94
.\" $FreeBSD$
.\"
-.Dd August 17, 2005
+.Dd May 25, 2016
.Dt REGEX 3
.Os
.Sh NAME
@@ -235,11 +235,16 @@ The
argument is the bitwise OR of zero or more of the following flags:
.Bl -tag -width REG_STARTEND
.It Dv REG_NOTBOL
-The first character of
-the string
-is not the beginning of a line, so the
-.Ql ^\&
-anchor should not match before it.
+The first character of the string is treated as the continuation
+of a line.
+This means that the anchors
+.Ql ^\& ,
+.Ql [[:<:]] ,
+and
+.Ql \e<
+do not match before it; but see
+.Dv REG_STARTEND
+below.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_NOTEOL
@@ -247,19 +252,16 @@ The NUL terminating
the string
does not end a line, so the
.Ql $\&
-anchor should not match before it.
+anchor does not match before it.
This does not affect the behavior of newlines under
.Dv REG_NEWLINE .
.It Dv REG_STARTEND
The string is considered to start at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_so
-and to have a terminating NUL located at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_eo
-(there need not actually be a NUL at that location),
+.Fa string No +
+.Fa pmatch Ns [0]. Ns Fa rm_so
+and to end before the byte located at
+.Fa string No +
+.Fa pmatch Ns [0]. Ns Fa rm_eo ,
regardless of the value of
.Fa nmatch .
See below for the definition of
@@ -271,13 +273,37 @@ compatible with but not specified by
.St -p1003.2 ,
and should be used with
caution in software intended to be portable to other systems.
-Note that a non-zero
-.Va rm_so
-does not imply
-.Dv REG_NOTBOL ;
-.Dv REG_STARTEND
-affects only the location of the string,
-not how it is matched.
+.Pp
+Without
+.Dv REG_NOTBOL ,
+the position
+.Fa rm_so
+is considered the beginning of a line, such that
+.Ql ^
+matches before it, and the beginning of a word if there is a word
+character at this position, such that
+.Ql [[:<:]]
+and
+.Ql \e<
+match before it.
+.Pp
+With
+.Dv REG_NOTBOL ,
+the character at position
+.Fa rm_so
+is treated as the continuation of a line, and if
+.Fa rm_so
+is greater than 0, the preceding character is taken into consideration.
+If the preceding character is a newline and the regular expression was compiled
+with
+.Dv REG_NEWLINE ,
+.Ql ^
+matches before the string; if the preceding character is not a word character
+but the string starts with a word character,
+.Ql [[:<:]]
+and
+.Ql \e<
+match before the string.
.El
.Pp
See