summaryrefslogtreecommitdiffstats
path: root/magic.man
diff options
context:
space:
mode:
Diffstat (limited to 'magic.man')
-rw-r--r--magic.man199
1 files changed, 119 insertions, 80 deletions
diff --git a/magic.man b/magic.man
index 3842b64..314a014 100644
--- a/magic.man
+++ b/magic.man
@@ -1,11 +1,11 @@
-.\" $File: magic.man,v 1.39 2007/11/08 00:31:37 christos Exp $
-.Dd January 10, 2007
+.\" $File: magic.man,v 1.57 2008/08/30 09:50:20 christos Exp $
+.Dd August 30, 2008
.Dt MAGIC __FSECTION__
.Os
-.\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems.
+.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
-.Nd file command's magic number file
+.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of the magic file as
used by the
@@ -15,18 +15,17 @@ The
.Xr file __CSECTION__
command identifies the type of a file using,
among other tests,
-a test for whether the file begins with a certain
-.Dq "magic number" .
+a test for whether the file contains certain
+.Dq "magic patterns" .
The file
.Pa __MAGIC__
-specifies what magic numbers are to be tested for,
-what message to print if a particular magic number is found,
+specifies what patterns are to be tested for, what message or
+MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
Each line of the file specifies a test to be performed.
A test compares the data starting at a particular offset
-in the file with a 1-byte, 2-byte, or 4-byte numeric value or
-a string.
+in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
@@ -40,15 +39,15 @@ The possible values are:
.It Dv byte
A one-byte value.
.It Dv short
-A two-byte value (on most systems) in this machine's native byte order.
+A two-byte value in this machine's native byte order.
.It Dv long
-A four-byte value (on most systems) in this machine's native byte order.
+A four-byte value in this machine's native byte order.
.It Dv quad
-An eight-byte value (on most systems) in this machine's native byte order.
+An eight-byte value in this machine's native byte order.
.It Dv float
-A 32-bit (on most systems) single precision IEEE floating point number in this machine's native byte order.
+A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
-A 64-bit (on most systems) double precision IEEE floating point number in this machine's native byte order.
+A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
@@ -69,10 +68,10 @@ Finally the
.Dq c
flag, specifies case insensitive matching: lowercase
characters in the magic match both lower and upper case characters in the
-targer, whereas upper case characters in the magic, only much uppercase
+target, whereas upper case characters in the magic only match uppercase
characters in the target.
.It Dv pstring
-A pascal style string where the first byte is interpreted as the an
+A Pascal-style string where the first byte is interpreted as the an
unsigned length.
The string is not NUL terminated.
.It Dv date
@@ -86,106 +85,119 @@ local time rather than UTC.
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv beshort
-A two-byte value (on most systems) in big-endian byte order.
+A two-byte value in big-endian byte order.
.It Dv belong
-A four-byte value (on most systems) in big-endian byte order.
+A four-byte value in big-endian byte order.
.It Dv bequad
-An eight-byte value (on most systems) in big-endian byte order.
+An eight-byte value in big-endian byte order.
.It Dv befloat
-A 32-bit (on most systems) single precision IEEE floating point number in big-endian byte order.
+A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
-A 64-bit (on most systems) double precision IEEE floating point number in big-endian byte order.
+A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
-A four-byte value (on most systems) in big-endian byte order,
+A four-byte value in big-endian byte order,
interpreted as a Unix date.
.It Dv beqdate
-An eight-byte value (on most systems) in big-endian byte order,
+An eight-byte value in big-endian byte order,
interpreted as a Unix date.
.It Dv beldate
-A four-byte value (on most systems) in big-endian byte order,
+A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
-An eight-byte value (on most systems) in big-endian byte order,
+An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv bestring16
A two-byte unicode (UCS16) string in big-endian byte order.
.It Dv leshort
-A two-byte value (on most systems) in little-endian byte order.
+A two-byte value in little-endian byte order.
.It Dv lelong
-A four-byte value (on most systems) in little-endian byte order.
+A four-byte value in little-endian byte order.
.It Dv lequad
-An eight-byte value (on most systems) in little-endian byte order.
+An eight-byte value in little-endian byte order.
.It Dv lefloat
-A 32-bit (on most systems) single precision IEEE floating point number in little-endian byte order.
+A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
-A 64-bit (on most systems) double precision IEEE floating point number in little-endian byte order.
+A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
-A four-byte value (on most systems) in little-endian byte order,
+A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
-An eight-byte value (on most systems) in little-endian byte order,
+An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
-A four-byte value (on most systems) in little-endian byte order,
+A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
-An eight-byte value (on most systems) in little-endian byte order,
+An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv lestring16
A two-byte unicode (UCS16) string in little-endian byte order.
.It Dv melong
-A four-byte value (on most systems) in middle-endian (PDP-11) byte order.
+A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
-A four-byte value (on most systems) in middle-endian (PDP-11) byte order,
+A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
-A four-byte value (on most systems) in middle-endian (PDP-11) byte order,
+A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
-(much like egrep).
-The type specification can be optionally followed by /[cse]*.
+(like egrep). Regular expressions can take exponential time to
+process, and their performance is hard to predict, so their use is
+discouraged. When used in production environments, their performance
+should be carefully checked. The type specification can be optionally
+followed by
+.Dv /[c][s] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
-or
-.Dq e
-flags update the offset to the starting or ending offsets of the
-match (only one should be used).
-By default, regex does not update the offset.
-The regular expression is always tested against the first
-.Dv N
-lines, where
+flag update the offset to the start offset of the match, rather than the end.
+The regular expression is tested against line
+.Dv N + 1
+onwards, where
.Dv N
-is the given offset, thus it
-is only useful for (single-byte encoded) text.
+is the given offset.
+Line endings are assumed to be in the machine's native format.
.Dv ^
and
.Dv $
-will match the beginning and end of individual lines, respectively,
+match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
-A literal string search starting at the given offset.
-It must be followed by
-.Dv \*[Lt]number\*[Gt]
-which specifies how many matches shall be attempted (the range).
-This is suitable for searching larger binary expressions with variable
-offsets, using
+A literal string search starting at the given offset. The same
+modifier flags can be used as for string patterns. The modifier flags
+(if any) must be followed by
+.Dv /number
+the range, that is, the number of positions at which the match will be
+attempted, starting from the start offset. This is suitable for
+searching larger binary expressions with variable offsets, using
.Dv \e
-escapes for special characters.
+escapes for special characters. The offset works as for regex.
.It Dv default
-This is intended to be used with the text
-.Dv x
+This is intended to be used with the test
+.Em x
(which is always true) and a message that is to be used if there are
no other matches.
.El
-.El
+.Pp
+Each top-level magic pattern (see below for an explanation of levels)
+is classified as text or binary according to the types used. Types
+.Dq regex
+and
+.Dq search
+are classified as text tests, unless non-printable characters are used
+in the pattern. All other tests are classified as binary. A top-level
+pattern is considered to be a test text when all its patterns are text
+patterns; otherwise, it is considered to be a binary pattern. When
+matching a file, binary patterns are tried first; if no match is
+found, and the file looks like text, then its encoding is determined
+and the text patterns are tried.
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
@@ -195,7 +207,6 @@ numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
-.Bl -tag -width ".Dv message"
.It Dv test
The value to be compared with the value from the file.
If the type is
@@ -232,12 +243,8 @@ Operators
and
.Dv ~
don't work with floats and doubles.
-For all tests except
-.Em string
-and
-.Em regex ,
-operation
-.Dv !
+The operator
+.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
@@ -250,8 +257,8 @@ is octal, and
.Dv 0x13
is hexadecimal.
.Pp
-For string values, the byte string from the
-file must match the specified byte string.
+For string values, the string from the
+file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
@@ -262,10 +269,10 @@ and
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
-This means that a line can match any string, and
-then presumably print that string, by doing
+This means that a line can match any non-empty string (usually used to
+then print the string), with
.Em \*[Gt]\e0
-(because all strings are greater than the null string).
+(because all non-empty strings are greater than the empty string).
.Pp
The special test
.Em x
@@ -276,11 +283,44 @@ If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
-If the string begins with ``\\b'', the message printed is the
-remainder of the string with no whitespace added before it: multiple
-matches are normally separated by a single space.
+If the string begins with
+.Dq \eb ,
+the message printed is the remainder of the string with no whitespace
+added before it: multiple matches are normally separated by a single
+space.
.El
.Pp
+A MIME type is given on a separate line, which must be the next
+non-blank or comment line after the magic line that identifies the
+file type, and has the following format:
+.Bd -literal -offset indent
+!:mime MIMETYPE
+.Ed
+.Pp
+i.e. the literal string
+.Dq !:mime
+followed by the MIME type.
+.Pp
+An optional strength can be supplied on a separate line which refers to
+the current magic description using the following format:
+.Bd -literal -offset indent
+!:strength OP VALUE
+.Ed
+.Pp
+The operand
+.Dv OP
+can be:
+.Dv + ,
+.Dv - ,
+.Dv * ,
+or
+.Dv /
+and
+.Dv VALUE
+is a constant between 0 and 255.
+This constant is applied using the specified operand
+to the currently computed default magic strength.
+.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
@@ -350,13 +390,13 @@ That way variable length structures can be examined:
\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 LX executable (OS/2)
.Ed
.Pp
-This strategy of examining has one drawback: You must make sure that
+This strategy of examining has a drawback: You must make sure that
you eventually print something, or users may get empty output (like, when
there is neither PE\e0\e0 nor LE\e0\e0 in the above example)
.Pp
-If this indirect offset cannot be used as-is, there are simple calculations
+If this indirect offset cannot be used directly, simple calculations are
possible: appending
-.Em [+-*/%\*[Am]|^]\*[Lt]number\*[Gt]
+.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
@@ -468,4 +508,3 @@ a system on which the lengths are invariant.
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.
-.\" @(#)$Id: magic.man,v 1.39 2007/11/08 00:31:37 christos Exp $
OpenPOWER on IntegriCloud