diff options
Diffstat (limited to 'magic.man')
-rw-r--r-- | magic.man | 471 |
1 files changed, 471 insertions, 0 deletions
diff --git a/magic.man b/magic.man new file mode 100644 index 0000000..3842b64 --- /dev/null +++ b/magic.man @@ -0,0 +1,471 @@ +.\" $File: magic.man,v 1.39 2007/11/08 00:31:37 christos Exp $ +.Dd January 10, 2007 +.Dt MAGIC __FSECTION__ +.Os +.\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems. +.Sh NAME +.Nm magic +.Nd file command's magic number file +.Sh DESCRIPTION +This manual page documents the format of the magic file as +used by the +.Xr file __CSECTION__ +command, version __VERSION__. +The +.Xr file __CSECTION__ +command identifies the type of a file using, +among other tests, +a test for whether the file begins with a certain +.Dq "magic number" . +The file +.Pa __MAGIC__ +specifies what magic numbers are to be tested for, +what message to print if a particular magic number is found, +and additional information to extract from the file. +.Pp +Each line of the file specifies a test to be performed. +A test compares the data starting at a particular offset +in the file with a 1-byte, 2-byte, or 4-byte numeric value or +a string. +If the test succeeds, a message is printed. +The line consists of the following fields: +.Bl -tag -width ".Dv message" +.It Dv offset +A number specifying the offset, in bytes, into the file of the data +which is to be tested. +.It Dv type +The type of the data to be tested. +The possible values are: +.Bl -tag -width ".Dv lestring16" +.It Dv byte +A one-byte value. +.It Dv short +A two-byte value (on most systems) in this machine's native byte order. +.It Dv long +A four-byte value (on most systems) in this machine's native byte order. +.It Dv quad +An eight-byte value (on most systems) in this machine's native byte order. +.It Dv float +A 32-bit (on most systems) single precision IEEE floating point number in this machine's native byte order. +.It Dv double +A 64-bit (on most systems) double precision IEEE floating point number in this machine's native byte order. +.It Dv string +A string of bytes. +The string type specification can be optionally followed +by /[Bbc]*. +The +.Dq B +flag compacts whitespace in the target, which must +contain at least one whitespace character. +If the magic has +.Dv n +consecutive blanks, the target needs at least +.Dv n +consecutive blanks to match. +The +.Dq b +flag treats every blank in the target as an optional blank. +Finally the +.Dq c +flag, specifies case insensitive matching: lowercase +characters in the magic match both lower and upper case characters in the +targer, whereas upper case characters in the magic, only much uppercase +characters in the target. +.It Dv pstring +A pascal style string where the first byte is interpreted as the an +unsigned length. +The string is not NUL terminated. +.It Dv date +A four-byte value interpreted as a UNIX date. +.It Dv qdate +A eight-byte value interpreted as a UNIX date. +.It Dv ldate +A four-byte value interpreted as a UNIX-style date, but interpreted as +local time rather than UTC. +.It Dv qldate +An eight-byte value interpreted as a UNIX-style date, but interpreted as +local time rather than UTC. +.It Dv beshort +A two-byte value (on most systems) in big-endian byte order. +.It Dv belong +A four-byte value (on most systems) in big-endian byte order. +.It Dv bequad +An eight-byte value (on most systems) in big-endian byte order. +.It Dv befloat +A 32-bit (on most systems) single precision IEEE floating point number in big-endian byte order. +.It Dv bedouble +A 64-bit (on most systems) double precision IEEE floating point number in big-endian byte order. +.It Dv bedate +A four-byte value (on most systems) in big-endian byte order, +interpreted as a Unix date. +.It Dv beqdate +An eight-byte value (on most systems) in big-endian byte order, +interpreted as a Unix date. +.It Dv beldate +A four-byte value (on most systems) in big-endian byte order, +interpreted as a UNIX-style date, but interpreted as local time rather +than UTC. +.It Dv beqldate +An eight-byte value (on most systems) in big-endian byte order, +interpreted as a UNIX-style date, but interpreted as local time rather +than UTC. +.It Dv bestring16 +A two-byte unicode (UCS16) string in big-endian byte order. +.It Dv leshort +A two-byte value (on most systems) in little-endian byte order. +.It Dv lelong +A four-byte value (on most systems) in little-endian byte order. +.It Dv lequad +An eight-byte value (on most systems) in little-endian byte order. +.It Dv lefloat +A 32-bit (on most systems) single precision IEEE floating point number in little-endian byte order. +.It Dv ledouble +A 64-bit (on most systems) double precision IEEE floating point number in little-endian byte order. +.It Dv ledate +A four-byte value (on most systems) in little-endian byte order, +interpreted as a UNIX date. +.It Dv leqdate +An eight-byte value (on most systems) in little-endian byte order, +interpreted as a UNIX date. +.It Dv leldate +A four-byte value (on most systems) in little-endian byte order, +interpreted as a UNIX-style date, but interpreted as local time rather +than UTC. +.It Dv leqldate +An eight-byte value (on most systems) in little-endian byte order, +interpreted as a UNIX-style date, but interpreted as local time rather +than UTC. +.It Dv lestring16 +A two-byte unicode (UCS16) string in little-endian byte order. +.It Dv melong +A four-byte value (on most systems) in middle-endian (PDP-11) byte order. +.It Dv medate +A four-byte value (on most systems) in middle-endian (PDP-11) byte order, +interpreted as a UNIX date. +.It Dv meldate +A four-byte value (on most systems) in middle-endian (PDP-11) byte order, +interpreted as a UNIX-style date, but interpreted as local time rather +than UTC. +.It Dv regex +A regular expression match in extended POSIX regular expression syntax +(much like egrep). +The type specification can be optionally followed by /[cse]*. +The +.Dq c +flag makes the match case insensitive, while the +.Dq s +or +.Dq e +flags update the offset to the starting or ending offsets of the +match (only one should be used). +By default, regex does not update the offset. +The regular expression is always tested against the first +.Dv N +lines, where +.Dv N +is the given offset, thus it +is only useful for (single-byte encoded) text. +.Dv ^ +and +.Dv $ +will match the beginning and end of individual lines, respectively, +not beginning and end of file. +.It Dv search +A literal string search starting at the given offset. +It must be followed by +.Dv \*[Lt]number\*[Gt] +which specifies how many matches shall be attempted (the range). +This is suitable for searching larger binary expressions with variable +offsets, using +.Dv \e +escapes for special characters. +.It Dv default +This is intended to be used with the text +.Dv x +(which is always true) and a message that is to be used if there are +no other matches. +.El +.El +.Pp +The numeric types may optionally be followed by +.Dv \*[Am] +and a numeric value, +to specify that the value is to be AND'ed with the +numeric value before any comparisons are done. +Prepending a +.Dv u +to the type indicates that ordered comparisons should be unsigned. +.Bl -tag -width ".Dv message" +.It Dv test +The value to be compared with the value from the file. +If the type is +numeric, this value +is specified in C form; if it is a string, it is specified as a C string +with the usual escapes permitted (e.g. \en for new-line). +.Pp +Numeric values +may be preceded by a character indicating the operation to be performed. +It may be +.Dv = , +to specify that the value from the file must equal the specified value, +.Dv \*[Lt] , +to specify that the value from the file must be less than the specified +value, +.Dv \*[Gt] , +to specify that the value from the file must be greater than the specified +value, +.Dv \*[Am] , +to specify that the value from the file must have set all of the bits +that are set in the specified value, +.Dv ^ , +to specify that the value from the file must have clear any of the bits +that are set in the specified value, or +.Dv ~ , +the value specified after is negated before tested. +.Dv x , +to specify that any value will match. +If the character is omitted, it is assumed to be +.Dv = . +Operators +.Dv \*[Am] , +.Dv ^ , +and +.Dv ~ +don't work with floats and doubles. +For all tests except +.Em string +and +.Em regex , +operation +.Dv ! +specifies that the line matches if the test does +.Em not +succeed. +.Pp +Numeric values are specified in C form; e.g. +.Dv 13 +is decimal, +.Dv 013 +is octal, and +.Dv 0x13 +is hexadecimal. +.Pp +For string values, the byte string from the +file must match the specified byte string. +The operators +.Dv = , +.Dv \*[Lt] +and +.Dv \*[Gt] +(but not +.Dv \*[Am] ) +can be applied to strings. +The length used for matching is that of the string argument +in the magic file. +This means that a line can match any string, and +then presumably print that string, by doing +.Em \*[Gt]\e0 +(because all strings are greater than the null string). +.Pp +The special test +.Em x +always evaluates to true. +.Dv message +The message to be printed if the comparison succeeds. +If the string contains a +.Xr printf 3 +format specification, the value from the file (with any specified masking +performed) is printed using the message as the format string. +If the string begins with ``\\b'', the message printed is the +remainder of the string with no whitespace added before it: multiple +matches are normally separated by a single space. +.El +.Pp +Some file formats contain additional information which is to be printed +along with the file type or need additional tests to determine the true +file type. +These additional tests are introduced by one or more +.Em \*[Gt] +characters preceding the offset. +The number of +.Em \*[Gt] +on the line indicates the level of the test; a line with no +.Em \*[Gt] +at the beginning is considered to be at level 0. +Tests are arranged in a tree-like hierarchy: +If a the test on a line at level +.Em n +succeeds, all following tests at level +.Em n+1 +are performed, and the messages printed if the tests succeed, untile a line +with level +.Em n +(or less) appears. +For more complex files, one can use empty messages to get just the +"if/then" effect, in the following way: +.Bd -literal -offset indent +0 string MZ +\*[Gt]0x18 leshort \*[Lt]0x40 MS-DOS executable +\*[Gt]0x18 leshort \*[Gt]0x3f extended PC executable (e.g., MS Windows) +.Ed +.Pp +Offsets do not need to be constant, but can also be read from the file +being examined. +If the first character following the last +.Em \*[Gt] +is a +.Em ( +then the string after the parenthesis is interpreted as an indirect offset. +That means that the number after the parenthesis is used as an offset in +the file. +The value at that offset is read, and is used again as an offset +in the file. +Indirect offsets are of the form: +.Em (( x [.[bslBSL]][+\-][ y ]) . +The value of +.Em x +is used as an offset in the file. +A byte, short or long is read at that offset depending on the +.Em [bslBSLm] +type specifier. +The capitalized types interpret the number as a big endian +value, whereas the small letter versions interpret the number as a little +endian value; +the +.Em m +type interprets the number as a middle endian (PDP-11) value. +To that number the value of +.Em y +is added and the result is used as an offset in the file. +The default type if one is not specified is long. +.Pp +That way variable length structures can be examined: +.Bd -literal -offset indent +# MS Windows executables are also valid MS-DOS executables +0 string MZ +\*[Gt]0x18 leshort \*[Lt]0x40 MZ executable (MS-DOS) +# skip the whole block below if it is not an extended executable +\*[Gt]0x18 leshort \*[Gt]0x3f +\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows) +\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 LX executable (OS/2) +.Ed +.Pp +This strategy of examining has one drawback: You must make sure that +you eventually print something, or users may get empty output (like, when +there is neither PE\e0\e0 nor LE\e0\e0 in the above example) +.Pp +If this indirect offset cannot be used as-is, there are simple calculations +possible: appending +.Em [+-*/%\*[Am]|^]\*[Lt]number\*[Gt] +inside parentheses allows one to modify +the value read from the file before it is used as an offset: +.Bd -literal -offset indent +# MS Windows executables are also valid MS-DOS executables +0 string MZ +# sometimes, the value at 0x18 is less that 0x40 but there's still an +# extended executable, simply appended to the file +\*[Gt]0x18 leshort \*[Lt]0x40 +\*[Gt]\*[Gt](4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP) +\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS) +.Ed +.Pp +Sometimes you do not know the exact offset as this depends on the length or +position (when indirection was used before) of preceding fields. +You can specify an offset relative to the end of the last up-level +field using +.Sq \*[Am] +as a prefix to the offset: +.Bd -literal -offset indent +0 string MZ +\*[Gt]0x18 leshort \*[Gt]0x3f +\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows) +# immediately following the PE signature is the CPU type +\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x14c for Intel 80386 +\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x184 for DEC Alpha +.Ed +.Pp +Indirect and relative offsets can be combined: +.Bd -literal -offset indent +0 string MZ +\*[Gt]0x18 leshort \*[Lt]0x40 +\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS) +# if it's not COFF, go back 512 bytes and add the offset taken +# from byte 2/3, which is yet another way of finding the start +# of the extended executable +\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string LE LE executable (MS Windows VxD driver) +.Ed +.Pp +Or the other way around: +.Bd -literal -offset indent +0 string MZ +\*[Gt]0x18 leshort \*[Gt]0x3f +\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows) +# at offset 0x80 (-4, since relative offsets start at the end +# of the up-level match) inside the LE header, we find the absolute +# offset to the code area, where we look for a specific signature +\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string UPX \eb, UPX compressed +.Ed +.Pp +Or even both! +.Bd -literal -offset indent +0 string MZ +\*[Gt]0x18 leshort \*[Gt]0x3f +\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows) +# at offset 0x58 inside the LE header, we find the relative offset +# to a data area where we look for a specific signature +\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3) string UNACE \eb, ACE self-extracting archive +.Ed +.Pp +Finally, if you have to deal with offset/length pairs in your file, even the +second value in a parenthesized expression can be taken from the file itself, +using another set of parentheses. +Note that this additional indirect offset is always relative to the +start of the main indirect offset. +.Bd -literal -offset indent +0 string MZ +\*[Gt]0x18 leshort \*[Gt]0x3f +\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows) +# search for the PE section called ".idata"... +\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4 search/0x140 .idata +# ...and go to the end of it, calculated from start+length; +# these are located 14 and 10 bytes after the section name +\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive +.Ed +.Sh SEE ALSO +.Xr file __CSECTION__ +\- the command that reads this file. +.Sh BUGS +The formats +.Dv long , +.Dv belong , +.Dv lelong , +.Dv melong , +.Dv short , +.Dv beshort , +.Dv leshort , +.Dv date , +.Dv bedate , +.Dv medate , +.Dv ledate , +.Dv beldate , +.Dv leldate , +and +.Dv meldate +are system-dependent; perhaps they should be specified as a number +of bytes (2B, 4B, etc), +since the files being recognized typically come from +a system on which the lengths are invariant. +.\" +.\" From: guy@sun.uucp (Guy Harris) +.\" Newsgroups: net.bugs.usg +.\" Subject: /etc/magic's format isn't well documented +.\" Message-ID: <2752@sun.uucp> +.\" Date: 3 Sep 85 08:19:07 GMT +.\" Organization: Sun Microsystems, Inc. +.\" Lines: 136 +.\" +.\" Here's a manual page for the format accepted by the "file" made by adding +.\" the changes I posted to the S5R2 version. +.\" +.\" Modified for Ian Darwin's version of the file command. +.\" @(#)$Id: magic.man,v 1.39 2007/11/08 00:31:37 christos Exp $ |