Diffstat (limited to 'usr.bin/file')
-rw-r--r-- | usr.bin/file/file.1 | 47
-rw-r--r-- | usr.bin/file/magic.5 | 227
2 files changed, 235 insertions, 39 deletions
diff --git a/usr.bin/file/file.1 b/usr.bin/file/file.1 index 7891226..2e8dd60 100644 --- a/usr.bin/file/file.1 +++ b/usr.bin/file/file.1 @@ -1,6 +1,6 @@ .\" $FreeBSD$ -.\" $Id: file.man,v 1.54 2003/10/27 18:09:08 christos Exp $ -.Dd October 27, 2003 +.\" $Id: file.man,v 1.57 2005/08/18 15:18:22 christos Exp $ +.Dd August 18, 2005 .Dt FILE 1 "Copyright but distributable" .Os .Sh NAME @@ -8,7 +8,7 @@ .Nd determine file type .Sh SYNOPSIS .Nm -.Op Fl bcikLnNprsvz +.Op Fl bchikLnNprsvz .Op Fl f Ar namefile .Op Fl F Ar separator .Op Fl m Ar magicfiles @@ -17,7 +17,7 @@ .Fl C .Op Fl m Ar magicfile .Sh DESCRIPTION -This manual page documents version 4.12 of the +This manual page documents version 4.17 of the .Nm utility which tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: @@ -103,6 +103,13 @@ magic file or .Pa /usr/share/misc/magic if the compile file does not exist. +In addition +.Nm +will look in +.Pa $HOME/.magic.mgc , +or +.Pa $HOME/.magic +for magic entries. .Pp If a file does not match any of the entries in the magic file, it is examined to see if it seems to be a text file. @@ -187,6 +194,13 @@ Use the specified string as the separator between the filename and the file result returned. Defaults to .Ql \&: . +.It Fl h , -no-dereference +Causes symlinks not to be followed +(on systems that support symbolic links). +This is the default if the +environment variable +.Ev POSIXLY_CORRECT +is not defined. .It Fl i , -mime Causes the file command to output mime type strings rather than the more traditional human readable ones. @@ -206,8 +220,11 @@ section, below). Do not stop at the first match, keep going. .It Fl L , -dereference option causes symlinks to be followed, as the like-named option in -.Xr ls 1 . +.Xr ls 1 (on systems that support symbolic links). +This is the default if the environment variable +.Ev POSIXLY_CORRECT +is defined. 
.It Fl m , -magic-file Ar list Specify an alternate list of files containing magic numbers. This can be a single file, or a colon-separated list of files. @@ -281,19 +298,35 @@ option is specified. Default list of magic numbers, used to output mime types when the .Fl i option is specified. -.It Pa /etc/magic -Local additions to magic wisdom. .El .Sh ENVIRONMENT The environment variable .Ev MAGIC can be used to set the default magic number file name. +If that variable is set, then +.Nm +will not attempt to open +.Pa $HOME/.magic . .Nm adds .Pa .mime and/or .Pa .mgc to the value of this variable as appropriate. +The environment variable +.Ev POSIXLY_CORRECT +controls (on systems that support symbolic links), if +.Nm +will attempt to follow symlinks or not. +If set, then +.Nm +follows symlink, otherwise it does not. +This is also controlled +by the +.Fl L +and +.Fl h +options. .Sh SEE ALSO .Xr hexdump 1 , .Xr od 1 , diff --git a/usr.bin/file/magic.5 b/usr.bin/file/magic.5 index 75f52d6..27e85a1 100644 --- a/usr.bin/file/magic.5 +++ b/usr.bin/file/magic.5 @@ -3,7 +3,7 @@ .\" .\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems. .\" -.Dd September 12, 2003 +.Dd February 19, 2006 .Dt MAGIC 5 "Public Domain" .Os .Sh NAME @@ -13,7 +13,7 @@ This manual page documents the format of the magic file as used by the .Nm -command, version 4.12. +command, version 4.17. The .Nm file command identifies the type of a file using, @@ -68,6 +68,12 @@ flag, specifies case insensitive matching: lowercase characters in the magic match both lower and upper case characters in the targer, whereas upper case characters in the magic, only much uppercase characters in the target. +.It pstring +A pascal style string where the first byte is interpreted as the an +unsigned length. +The string is not +.Dv NUL +terminated. .It date A four-byte value interpreted as a .Ux @@ -86,6 +92,14 @@ A four-byte value (on most systems) in big-endian byte order, interpreted as a .Ux date. 
+.It beldate +A four-byte value (on most systems) in big-endian byte order, +interpreted as a +.Ux Ns -style +date, but interpreted as local time rather +than UTC. +.It bestring16 +A two-byte unicode (UCS16) string in big-endian byte order. .It leshort A two-byte value (on most systems) in little-endian byte order. .It lelong @@ -101,6 +115,50 @@ interpreted as a .Ux Ns -style date, but interpreted as local time rather than UTC. +.It lestring16 +A two-byte unicode (UCS16) string in little-endian byte order. +.It melong +A four-byte value (on most systems) in middle-endian (PDP-11) byte order. +.It medate +A four-byte value (on most systems) in middle-endian (PDP-11) byte order, +interpreted as a +.Ux +date. +.It meldate +A four-byte value (on most systems) in middle-endian (PDP-11) byte order, +interpreted as a +.Ux Ns -style +date, but interpreted as local time rather +than UTC. +.It regex +A regular expression match in extended +.Tn POSIX +regular expression syntax +(much like egrep). +The type specification can be optionally followed by +.Ql /c +for case-insensitive matches. +The regular expression is always +tested against the first +.Ar N +lines, where +.Ar N +is the given offset, thus it +is only useful for (single-byte encoded) text. +.Ql ^ +and +.Ql $ +will match the beginning and end of individual lines, respectively, +not beginning and end of file. +.It search +A literal string search starting at the given offset. +It must be followed by +.Li / Ns Aq Ar number +which specifies how many matches shall be attempted (the range). +This is suitable for searching larger binary expressions with variable +offsets, using +.Ql \e +escapes for special characters. .El .El .Pp @@ -137,11 +195,22 @@ that are set in the specified value, .Em ^ , to specify that the value from the file must have clear any of the bits that are set in the specified value, or +.Em ~ , +the value specified after is negated before tested, or .Em x , to specify that any value will match. 
If the character is omitted, it is assumed to be .Em = . +For all tests except +.Dq string +and +.Dq regex , +operation +.Em !\& +specifies that the line matches if the test does +.Em not +succeed. .It "" Numeric values are specified in C form; e.g.\& .Em 13 @@ -177,29 +246,35 @@ performed) is printed using the message as the format string. .El .Pp Some file formats contain additional information which is to be printed -along with the file type. -A line which begins with the character +along with the file type or need additional tests to determine the true +file type. +These additional tests are introduced by one or more .Em > -indicates additional tests and messages to be printed. +characters preceding the offset. The number of .Em > on the line indicates the level of the test; a line with no .Em > at the beginning is considered to be at level 0. -Each line at level -.Em n+1 -is under the control of the line at level -.Em n -most closely preceding it in the magic file. -If the test on a line at level +Tests are arranged in a tree-like hierarchy: +If a the test on a line at level .Em n -succeeds, the tests specified in all the subsequent lines at level +succeeds, all following tests at level .Em n+1 -are performed, and the messages printed if the tests succeed. -The next -line at level +are performed, and the messages printed if the tests succeed, until a line +with level .Em n -terminates this. +(or less) appears. +For more complex files, one can use empty messages to get just the +"if/then" effect, in the following way: +.Bd -literal -offset indent +0 string MZ +>0x18 leshort <0x40 MS-DOS executable +>0x18 leshort >0x3f extended PC executable (e.g., MS Windows) +.Ed +.Pp +Offsets do not need to be constant, but can also be read from the file +being examined. If the first character following the last .Em > is a @@ -216,45 +291,133 @@ The value of is used as an offset in the file. 
A byte, short or long is read at that offset depending on the -.Em [bslBSL] +.Em [bslBSLm] type specifier. The capitalized types interpret the number as a big endian value, whereas -a small letter versions interpret the number as a little endian value. +a small letter versions interpret the number as a little endian value; +the +.Em m +type interprets the number as a middle endian (PDP-11) value. To that number the value of .Em y is added and the result is used as an offset in the file. The default type if one is not specified is long. .Pp -Sometimes you do not know the exact offset as this depends on the length of -preceding fields. -You can specify an offset relative to the end of the -last uplevel field (of course this may only be done for sublevel tests, i.e.\& -test beginning with -.Em > Ns ) . -Such a relative offset is specified using +That way variable length structures can be examined: +.Bd -literal -offset indent +# MS Windows executables are also valid MS-DOS executables +0 string MZ +>0x18 leshort <0x40 MZ executable (MS-DOS) +# skip the whole block below if it is not an extended executable +>0x18 leshort >0x3f +>>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows) +>>(0x3c.l) string LX\e0\e0 LX executable (OS/2) +.Ed +.Pp +This strategy of examining has one drawback: You must make sure that +you eventually print something, or users may get empty output (like, when +there is neither PE\e0\e0 nor LE\e0\e0 in the above example). 
+.Pp +If this indirect offset cannot be used as-is, there are simple calculations +possible: appending +.Em [+-*/%&|^]<number> +inside parentheses allows one to modify +the value read from the file before it is used as an offset: +.Bd -literal -offset indent +# MS Windows executables are also valid MS-DOS executables +0 string MZ +# sometimes, the value at 0x18 is less that 0x40 but there's still an +# extended executable, simply appended to the file +>0x18 leshort <0x40 +>>(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP) +>>(4.s*512) leshort !0x014c MZ executable (MS-DOS) +.Ed +.Pp +Sometimes you do not know the exact offset as this depends on the length or +position (when indirection was used before) of preceding fields. +You can +specify an offset relative to the end of the last uplevel field using .Em & -as a prefix to the offset. +as a prefix to the offset: +.Bd -literal -offset indent +0 string MZ +>0x18 leshort >0x3f +>>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows) +# immediately following the PE signature is the CPU type +>>>&0 leshort 0x14c for Intel 80386 +>>>&0 leshort 0x184 for DEC Alpha +.Ed +.Pp +Indirect and relative offsets can be combined: +.Bd -literal -offset indent +0 string MZ +>0x18 leshort <0x40 +>>(4.s*512) leshort !0x014c MZ executable (MS-DOS) +# if it's not COFF, go back 512 bytes and add the offset taken +# from byte 2/3, which is yet another way of finding the start +# of the extended executable +>>>&(2.s-514) string LE LE executable (MS Windows VxD driver) +.Ed +.Pp +Or the other way around: +.Bd -literal -offset indent +0 string MZ +>0x18 leshort >0x3f +>>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows) +# at offset 0x80 (-4, since relative offsets start at the end +# of the uplevel match) inside the LE header, we find the absolute +# offset to the code area, where we look for a specific signature +>>>(&0x7c.l+0x26) string UPX \eb, UPX compressed +.Ed +.Pp +Or even both! 
+.Bd -literal -offset indent +0 string MZ +>0x18 leshort >0x3f +>>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows) +# at offset 0x58 inside the LE header, we find the relative offset +# to a data area where we look for a specific signature +>>>&(&0x54.l-3) string UNACE \eb, ACE self-extracting archive +.Ed +.Pp +Finally, if you have to deal with offset/length pairs in your file, even the +second value in a parenthesed expression can be taken from the file itself, +using another set of parentheses. +Note that this additional indirect offset +is always relative to the start of the main indirect offset. +.Bd -literal -offset indent +0 string MZ +>0x18 leshort >0x3f +>>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows) +# search for the PE section called ".idata"... +>>>&0xf4 search/0x140 .idata +# ...and go to the end of it, calculated from start+length; +# these are located 14 and 10 bytes after the section name +>>>>(&0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive +.Ed .Sh BUGS The formats .Em long , .Em belong , .Em lelong , +.Em melong , .Em short , .Em beshort , .Em leshort , .Em date , .Em bedate , +.Em medate , +.Em ledate , +.Em beldate , +.Em leldate , and -.Em ledate +.Em meldate are system-dependent; perhaps they should be specified as a number of bytes (2B, 4B, etc), since the files being recognized typically come from a system on which the lengths are invariant. .Pp -There is (currently) no support for specified-endian data to be used in -indirect offsets. -.Pp If .Pa /usr/share/misc/magic is newer than @@ -264,7 +427,7 @@ Use the command: .Po cd /usr/share/misc && .Nm file -.Fl C +.Fl C .Fl m Ar magic .Pc to rebuild. @@ -283,4 +446,4 @@ to rebuild. .\" the changes I posted to the S5R2 version. .\" .\" Modified for Ian Darwin's version of the file command. -.\" @(#)$Id: magic.man,v 1.27 2003/09/12 19:43:30 christos Exp $ +.\" @(#)$Id: magic.man,v 1.30 2006/02/19 18:16:03 christos Exp $ |
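The indirect and relative offsets that this change documents in magic.5 can be easier to follow when traced by hand. The following is a minimal Python sketch, not the code of file(1) itself: it mirrors the `0 string MZ`, `>0x18 leshort`, `>>(0x3c.l) string PE\0\0` and `>>>&0 leshort` entries from the added examples. The names `read_le`, `classify_mz` and `build_sample` are hypothetical helpers introduced only for illustration, and the sample image is synthetic.

```python
import struct

PE_SIG = b"PE\0\0"
CPU_NAMES = {0x14C: "for Intel 80386", 0x184: "for DEC Alpha"}


def read_le(data, offset, size):
    """Return an unsigned little-endian integer of `size` bytes (2 or 4),
    or None when the read would fall outside the buffer."""
    if offset < 0 or offset + size > len(data):
        return None
    return struct.unpack_from({2: "<H", 4: "<I"}[size], data, offset)[0]


def classify_mz(data):
    """Hypothetical walk through the MZ/PE magic entries (illustration only)."""
    # 0 string MZ
    if data[:2] != b"MZ":
        return None
    # >0x18 leshort <0x40   MZ executable (MS-DOS)
    reloc = read_le(data, 0x18, 2)
    if reloc is None:
        return ""
    if reloc < 0x40:
        return "MZ executable (MS-DOS)"
    # >>(0x3c.l) string PE\0\0 : indirect offset, the little-endian long
    # stored at 0x3c is itself used as the offset of the next test.
    pe_offset = read_le(data, 0x3C, 4)
    if pe_offset is None or data[pe_offset:pe_offset + 4] != PE_SIG:
        # Nothing matched below the empty-message line: empty output,
        # the drawback the manual page warns about.
        return ""
    description = "PE executable (MS-Windows)"
    # >>>&0 leshort ... : relative offset, counted from the end of the
    # uplevel match (the four signature bytes).
    cpu = read_le(data, pe_offset + len(PE_SIG) + 0, 2)
    if cpu in CPU_NAMES:
        description += " " + CPU_NAMES[cpu]
    return description


def build_sample(cpu):
    """Build a tiny synthetic image with a PE header at offset 0x80."""
    image = bytearray(0x90)
    image[0:2] = b"MZ"
    struct.pack_into("<H", image, 0x18, 0x40)   # marks an extended executable
    struct.pack_into("<I", image, 0x3C, 0x80)   # pointer to the PE header
    image[0x80:0x84] = PE_SIG
    struct.pack_into("<H", image, 0x84, cpu)    # CPU type follows the signature
    return bytes(image)


print(classify_mz(build_sample(0x14C)))
```

The sketch returns an empty string when an extended executable carries no recognized signature, which corresponds to the "empty output" case the new text cautions against.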