author | pfg <pfg@FreeBSD.org> | 2014-05-05 14:50:53 +0000 |
---|---|---|
committer | pfg <pfg@FreeBSD.org> | 2014-05-05 14:50:53 +0000 |
commit | 9fe5eca952a101650cad59dd5bb5256c130c962d (patch) | |
tree | e459c6f9d7671fac72a4c257ffe0a77eb7375a11 /lib/libc/locale | |
parent | 0bbe9c267ffe4ce0f7ad36dc92d99ea5083be996 (diff) | |
MFC r265095, r265167;
citrus: Avoid invalid code points.
The UTF-8 decoder should not accept byte sequences that decode to
Unicode code points U+D800 to U+DFFF (UTF-16 surrogates).[1]
Contrary to the original OpenBSD patch, we do pass U+FFFE and U+FFFF;
both values are valid "non-characters" [2] and must be mapped through
UTFs.
[1] http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
[2] http://www.unicode.org/faq/private_use.html
Reported by: Stefan Sperling [1]
Thanks to: jilles [2]
Obtained from: OpenBSD
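
The rule stated in the commit message can be sketched as a standalone predicate. This is only an illustration, not the libc code: the names `cp_is_surrogate` and `cp_is_acceptable` are hypothetical, and the 0x10FFFF upper bound is the general Unicode code space limit, which this commit does not itself touch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true for UTF-16 surrogates, U+D800 to U+DFFF. */
static bool
cp_is_surrogate(uint32_t cp)
{
	return (cp >= 0xd800 && cp <= 0xdfff);
}

/*
 * Hypothetical helper: true if a decoded value may be passed through.
 * Non-characters such as U+FFFE and U+FFFF are deliberately accepted.
 * Assumption: values above U+10FFFF are rejected here for completeness;
 * that limit is not part of this commit.
 */
static bool
cp_is_acceptable(uint32_t cp)
{
	return (cp <= 0x10ffff && !cp_is_surrogate(cp));
}
```

Under these assumptions, `cp_is_acceptable(0xfffe)` and `cp_is_acceptable(0xffff)` hold, while any value in the surrogate range is refused.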
Diffstat (limited to 'lib/libc/locale')
-rw-r--r-- | lib/libc/locale/utf8.c | 7 |
1 files changed, 7 insertions, 0 deletions
diff --git a/lib/libc/locale/utf8.c b/lib/libc/locale/utf8.c
index 40f0e17..cffa241 100644
--- a/lib/libc/locale/utf8.c
+++ b/lib/libc/locale/utf8.c
@@ -203,6 +203,13 @@ _UTF8_mbrtowc(wchar_t * __restrict pwc, const char * __restrict s, size_t n,
 		errno = EILSEQ;
 		return ((size_t)-1);
 	}
+	if (wch >= 0xd800 && wch <= 0xdfff) {
+		/*
+		 * Malformed input; invalid code points.
+		 */
+		errno = EILSEQ;
+		return ((size_t)-1);
+	}
 	if (pwc != NULL)
 		*pwc = wch;
 	us->want = 0;
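
To see where the added check sits relative to the existing redundant-encoding check, here is a minimal sketch of a decoder restricted to three-byte sequences, the only length that can produce a surrogate. The function `decode_utf8_3byte` is hypothetical and much simpler than `_UTF8_mbrtowc`: it keeps no conversion state and does not set `errno`.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: decode one three-byte UTF-8 sequence.
 * Returns 0 and stores the code point in *out on success,
 * or -1 if the input is malformed.
 */
static int
decode_utf8_3byte(const unsigned char *s, size_t n, uint32_t *out)
{
	uint32_t wch;

	/* Need a 1110xxxx lead byte and two 10xxxxxx continuation bytes. */
	if (n < 3 || (s[0] & 0xf0) != 0xe0 ||
	    (s[1] & 0xc0) != 0x80 || (s[2] & 0xc0) != 0x80)
		return (-1);
	wch = ((uint32_t)(s[0] & 0x0f) << 12) |
	    ((uint32_t)(s[1] & 0x3f) << 6) | (uint32_t)(s[2] & 0x3f);
	if (wch < 0x800)
		return (-1);	/* redundant (overlong) encoding */
	if (wch >= 0xd800 && wch <= 0xdfff)
		return (-1);	/* UTF-16 surrogate, as in this commit */
	*out = wch;
	return (0);
}
```

With this sketch, the bytes `ED A0 80` (which would decode to U+D800) are rejected, while `EF BF BE` and `EF BF BF` (U+FFFE and U+FFFF) are still accepted.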