author: tjr <tjr@FreeBSD.org> 2004-07-23 05:44:04 +0000
committer: tjr <tjr@FreeBSD.org> 2004-07-23 05:44:04 +0000
commit: 2322892e0bbf7bdfdc334c74442cd2fd8b435e9c
tree: 8e513a35fca945014d750a9a4eb629826fb1c5b8 /usr.bin/tr
parent: aa7eec1b492ac8e2f8e78f641829f125fb5b4531
Add a lengthy discussion of why "tr a-z A-Z" and "tr A-Z a-z" are not the
right way to perform case-conversion.
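
As a rough illustration of the point made in this commit message, the sketch below contrasts the character-class form with the traditional range idiom. The function names to_upper and to_upper_c_range are hypothetical and appear nowhere in this commit; pinning LC_ALL=C for the range form is an assumption made here so that the range follows byte order instead of the locale's collation order.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of tr(1) or this commit.

# Preferred form: the character classes follow the current locale's
# notion of lower and upper case, so no collation order is involved.
to_upper() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Traditional idiom: only dependable when the range is interpreted in
# byte order, so this sketch forces the C locale for the tr invocation.
to_upper_c_range() {
    printf '%s\n' "$1" | LC_ALL=C tr 'a-z' 'A-Z'
}

to_upper 'hello, world'
to_upper_c_range 'hello, world'
```

The class-based helper needs no locale override, which is why the man page change recommends it; the range-based helper is shown only for text known to contain unaccented ASCII letters.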
Diffstat (limited to 'usr.bin/tr')
-rw-r--r--  usr.bin/tr/tr.1  42
1 files changed, 41 insertions, 1 deletions
diff --git a/usr.bin/tr/tr.1 b/usr.bin/tr/tr.1
index 80f4516..ef39711 100644
--- a/usr.bin/tr/tr.1
+++ b/usr.bin/tr/tr.1
@@ -35,7 +35,7 @@
.\" @(#)tr.1 8.1 (Berkeley) 6/6/93
.\" $FreeBSD$
.\"
-.Dd July 9, 2004
+.Dd July 23, 2004
.Dt TR 1
.Os
.Sh NAME
@@ -169,6 +169,13 @@ as defined by the collation sequence.
If either or both of the range endpoints are octal sequences, it
represents the range of specific coded values between the
range endpoints, inclusive.
+.Pp
+.Bf Em
+See the COMPATIBILITY section below for an important note regarding
+differences in the way the current
+implementation interprets range expressions differently from
+previous implementations.
+.Ef
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
@@ -274,6 +281,12 @@ Translate the contents of file1 to upper-case.
.Pp
.D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.Pp
+(This should be preferred over the traditional
+.Ux
+idiom of
+.Ql "tr a-z A-Z" ,
+since it works correctly in all locales.)
+.Pp
Strip out non-printable characters from file1.
.Pp
.D1 Li "tr -cd \*q[:print:]\*q < file1"
@@ -285,6 +298,33 @@ Remove diacritical marks from all accented variants of the letter
.Sh DIAGNOSTICS
.Ex -std
.Sh COMPATIBILITY
+Previous
+.Fx
+implementations of
+.Nm
+did not order characters in range expressions according to the current
+locale's collation order, making it possible to convert unaccented Latin
+characters (esp. as found in English text) from upper to lower case using
+the traditional
+.Ux
+idiom of
+.Ql "tr A-Z a-z" .
+Since
+.Nm
+now obeys the locale's collation order, this idiom may not produce
+correct results when there is not a 1:1 mapping between lower and
+upper case, or when the order of characters within the two cases differs.
+As noted in the
+.Sx EXAMPLES
+section above, the character class expressions
+.Ql "[:lower:]"
+and
+.Ql "[:upper:]"
+should be used instead of explicit character ranges like
+.Ql "a-z"
+and
+.Ql "A-Z" .
+.Pp
System V has historically implemented character ranges using the syntax
``[c-c]'' instead of the ``c-c'' used by historic
.Bx