path: root/usr.bin/file
Diffstat (limited to 'usr.bin/file')
-rw-r--r-- usr.bin/file/file.1 47
-rw-r--r-- usr.bin/file/magic.5 227
2 files changed, 235 insertions, 39 deletions
diff --git a/usr.bin/file/file.1 b/usr.bin/file/file.1
index 7891226..2e8dd60 100644
--- a/usr.bin/file/file.1
+++ b/usr.bin/file/file.1
@@ -1,6 +1,6 @@
.\" $FreeBSD$
-.\" $Id: file.man,v 1.54 2003/10/27 18:09:08 christos Exp $
-.Dd October 27, 2003
+.\" $Id: file.man,v 1.57 2005/08/18 15:18:22 christos Exp $
+.Dd August 18, 2005
.Dt FILE 1 "Copyright but distributable"
.Os
.Sh NAME
@@ -8,7 +8,7 @@
.Nd determine file type
.Sh SYNOPSIS
.Nm
-.Op Fl bcikLnNprsvz
+.Op Fl bchikLnNprsvz
.Op Fl f Ar namefile
.Op Fl F Ar separator
.Op Fl m Ar magicfiles
@@ -17,7 +17,7 @@
.Fl C
.Op Fl m Ar magicfile
.Sh DESCRIPTION
-This manual page documents version 4.12 of the
+This manual page documents version 4.17 of the
.Nm
utility which tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
@@ -103,6 +103,13 @@ magic file
or
.Pa /usr/share/misc/magic
if the compile file does not exist.
+In addition
+.Nm
+will look in
+.Pa $HOME/.magic.mgc ,
+or
+.Pa $HOME/.magic
+for magic entries.
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
@@ -187,6 +194,13 @@ Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Ql \&: .
+.It Fl h , -no-dereference
+Causes symlinks not to be followed
+(on systems that support symbolic links).
+This is the default if the
+environment variable
+.Ev POSIXLY_CORRECT
+is not defined.
.It Fl i , -mime
Causes the file command to output mime type strings rather than the more
traditional human readable ones.
@@ -206,8 +220,11 @@ section, below).
Do not stop at the first match, keep going.
.It Fl L , -dereference
option causes symlinks to be followed, as the like-named option in
-.Xr ls 1 .
+.Xr ls 1
(on systems that support symbolic links).
+This is the default if the environment variable
+.Ev POSIXLY_CORRECT
+is defined.
.It Fl m , -magic-file Ar list
Specify an alternate list of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
@@ -281,19 +298,35 @@ option is specified.
Default list of magic numbers, used to output mime types when the
.Fl i
option is specified.
-.It Pa /etc/magic
-Local additions to magic wisdom.
.El
.Sh ENVIRONMENT
The environment variable
.Ev MAGIC
can be used to set the default magic number file name.
+If that variable is set, then
+.Nm
+will not attempt to open
+.Pa $HOME/.magic .
.Nm
adds
.Pa .mime
and/or
.Pa .mgc
to the value of this variable as appropriate.
+The environment variable
+.Ev POSIXLY_CORRECT
+controls (on systems that support symbolic links) whether
+.Nm
+will attempt to follow symlinks or not.
+If set, then
+.Nm
+follows symlinks; otherwise it does not.
+This is also controlled
+by the
+.Fl L
+and
+.Fl h
+options.
.Sh SEE ALSO
.Xr hexdump 1 ,
.Xr od 1 ,
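
The symlink rules added to file.1 above can be summarised in a few lines of Python. This is only a sketch of the documented behaviour, not the code used by file(1): the function name `follows_symlinks` is hypothetical, and the rule that the last of `-L`/`-h` on the command line wins is an assumption that the manual page does not state.

```python
import os


def follows_symlinks(flags, environ=None):
    """Return True if symlinks would be followed under the documented rules.

    flags:   option letters in command-line order, e.g. ["h"] or ["L"].
    environ: mapping used instead of os.environ (handy for testing).
    """
    env = os.environ if environ is None else environ
    # Documented default: follow symlinks only if POSIXLY_CORRECT is defined.
    follow = "POSIXLY_CORRECT" in env
    for flag in flags:
        if flag == "L":      # -L, --dereference
            follow = True
        elif flag == "h":    # -h, --no-dereference
            follow = False
        # Assumption: when both options are given, the later one wins.
    return follow
```

For example, `follows_symlinks([], {})` models a plain invocation with POSIXLY_CORRECT unset, and `follows_symlinks(["L"], {})` models the same invocation with `-L`.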
diff --git a/usr.bin/file/magic.5 b/usr.bin/file/magic.5
index 75f52d6..27e85a1 100644
--- a/usr.bin/file/magic.5
+++ b/usr.bin/file/magic.5
@@ -3,7 +3,7 @@
.\"
.\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems.
.\"
-.Dd September 12, 2003
+.Dd February 19, 2006
.Dt MAGIC 5 "Public Domain"
.Os
.Sh NAME
@@ -13,7 +13,7 @@
This manual page documents the format of the magic file as
used by the
.Nm
-command, version 4.12.
+command, version 4.17.
The
.Nm file
command identifies the type of a file using,
@@ -68,6 +68,12 @@ flag, specifies case insensitive matching: lowercase characters
in the magic match both lower and upper case characters in the
targer, whereas upper case characters in the magic, only much
uppercase characters in the target.
+.It pstring
+A Pascal-style string where the first byte is interpreted as an
+unsigned length.
+The string is not
+.Dv NUL
+terminated.
.It date
A four-byte value interpreted as a
.Ux
@@ -86,6 +92,14 @@ A four-byte value (on most systems) in big-endian byte order,
interpreted as a
.Ux
date.
+.It beldate
+A four-byte value (on most systems) in big-endian byte order,
+interpreted as a
+.Ux Ns -style
+date, but interpreted as local time rather
+than UTC.
+.It bestring16
+A two-byte unicode (UCS16) string in big-endian byte order.
.It leshort
A two-byte value (on most systems) in little-endian byte order.
.It lelong
@@ -101,6 +115,50 @@ interpreted as a
.Ux Ns -style
date, but interpreted as local time rather
than UTC.
+.It lestring16
+A two-byte unicode (UCS16) string in little-endian byte order.
+.It melong
+A four-byte value (on most systems) in middle-endian (PDP-11) byte order.
+.It medate
+A four-byte value (on most systems) in middle-endian (PDP-11) byte order,
+interpreted as a
+.Ux
+date.
+.It meldate
+A four-byte value (on most systems) in middle-endian (PDP-11) byte order,
+interpreted as a
+.Ux Ns -style
+date, but interpreted as local time rather
+than UTC.
+.It regex
+A regular expression match in extended
+.Tn POSIX
+regular expression syntax
+(much like egrep).
+The type specification can be optionally followed by
+.Ql /c
+for case-insensitive matches.
+The regular expression is always
+tested against the first
+.Ar N
+lines, where
+.Ar N
+is the given offset, thus it
+is only useful for (single-byte encoded) text.
+.Ql ^
+and
+.Ql $
+will match the beginning and end of individual lines, respectively,
+not beginning and end of file.
+.It search
+A literal string search starting at the given offset.
+It must be followed by
+.Li / Ns Aq Ar number
+which specifies how many matches shall be attempted (the range).
+This is suitable for searching larger binary expressions with variable
+offsets, using
+.Ql \e
+escapes for special characters.
.El
.El
.Pp
@@ -137,11 +195,22 @@ that are set in the specified value,
.Em ^ ,
to specify that the value from the file must have clear any of the bits
that are set in the specified value, or
+.Em ~ ,
+to specify that the value given after it is negated before being tested, or
.Em x ,
to specify that any value will match.
If the character is omitted,
it is assumed to be
.Em = .
+For all tests except
+.Dq string
+and
+.Dq regex ,
+operation
+.Em !\&
+specifies that the line matches if the test does
+.Em not
+succeed.
.It ""
Numeric values are specified in C form; e.g.\&
.Em 13
@@ -177,29 +246,35 @@ performed) is printed using the message as the format string.
.El
.Pp
Some file formats contain additional information which is to be printed
-along with the file type.
-A line which begins with the character
+along with the file type or need additional tests to determine the true
+file type.
+These additional tests are introduced by one or more
.Em >
-indicates additional tests and messages to be printed.
+characters preceding the offset.
The number of
.Em >
on the line indicates the level of the test; a line with no
.Em >
at the beginning is considered to be at level 0.
-Each line at level
-.Em n+1
-is under the control of the line at level
-.Em n
-most closely preceding it in the magic file.
-If the test on a line at level
+Tests are arranged in a tree-like hierarchy:
+If the test on a line at level
.Em n
-succeeds, the tests specified in all the subsequent lines at level
+succeeds, all following tests at level
.Em n+1
-are performed, and the messages printed if the tests succeed.
-The next
-line at level
+are performed, and the messages printed if the tests succeed, until a line
+with level
.Em n
-terminates this.
+(or less) appears.
+For more complex files, one can use empty messages to get just the
+"if/then" effect, in the following way:
+.Bd -literal -offset indent
+0 string MZ
+>0x18 leshort <0x40 MS-DOS executable
+>0x18 leshort >0x3f extended PC executable (e.g., MS Windows)
+.Ed
+.Pp
+Offsets do not need to be constant, but can also be read from the file
+being examined.
If the first character following the last
.Em >
is a
@@ -216,45 +291,133 @@ The value of
is used as an offset in the file.
A byte, short or long is read at that offset
depending on the
-.Em [bslBSL]
+.Em [bslBSLm]
type specifier.
The capitalized types interpret the number as a big endian value, whereas
-a small letter versions interpret the number as a little endian value.
+the lowercase versions interpret the number as a little endian value;
+the
+.Em m
+type interprets the number as a middle endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type
if one is not specified is long.
.Pp
-Sometimes you do not know the exact offset as this depends on the length of
-preceding fields.
-You can specify an offset relative to the end of the
-last uplevel field (of course this may only be done for sublevel tests, i.e.\&
-test beginning with
-.Em > Ns ) .
-Such a relative offset is specified using
+That way variable length structures can be examined:
+.Bd -literal -offset indent
+# MS Windows executables are also valid MS-DOS executables
+0 string MZ
+>0x18 leshort <0x40 MZ executable (MS-DOS)
+# skip the whole block below if it is not an extended executable
+>0x18 leshort >0x3f
+>>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+>>(0x3c.l) string LX\e0\e0 LX executable (OS/2)
+.Ed
+.Pp
+This strategy of examining has one drawback: you must make sure that
+you eventually print something, or users may get empty output (for example,
+when there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
+.Pp
+If this indirect offset cannot be used as-is, there are simple calculations
+possible: appending
+.Em [+-*/%&|^]<number>
+inside parentheses allows one to modify
+the value read from the file before it is used as an offset:
+.Bd -literal -offset indent
+# MS Windows executables are also valid MS-DOS executables
+0 string MZ
+# sometimes, the value at 0x18 is less than 0x40 but there's still an
+# extended executable, simply appended to the file
+>0x18 leshort <0x40
+>>(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
+>>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
+.Ed
+.Pp
+Sometimes you do not know the exact offset as this depends on the length or
+position (when indirection was used before) of preceding fields.
+You can
+specify an offset relative to the end of the last uplevel field using
.Em &
-as a prefix to the offset.
+as a prefix to the offset:
+.Bd -literal -offset indent
+0 string MZ
+>0x18 leshort >0x3f
+>>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+# immediately following the PE signature is the CPU type
+>>>&0 leshort 0x14c for Intel 80386
+>>>&0 leshort 0x184 for DEC Alpha
+.Ed
+.Pp
+Indirect and relative offsets can be combined:
+.Bd -literal -offset indent
+0 string MZ
+>0x18 leshort <0x40
+>>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
+# if it's not COFF, go back 512 bytes and add the offset taken
+# from byte 2/3, which is yet another way of finding the start
+# of the extended executable
+>>>&(2.s-514) string LE LE executable (MS Windows VxD driver)
+.Ed
+.Pp
+Or the other way around:
+.Bd -literal -offset indent
+0 string MZ
+>0x18 leshort >0x3f
+>>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
+# at offset 0x80 (-4, since relative offsets start at the end
+# of the uplevel match) inside the LE header, we find the absolute
+# offset to the code area, where we look for a specific signature
+>>>(&0x7c.l+0x26) string UPX \eb, UPX compressed
+.Ed
+.Pp
+Or even both!
+.Bd -literal -offset indent
+0 string MZ
+>0x18 leshort >0x3f
+>>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
+# at offset 0x58 inside the LE header, we find the relative offset
+# to a data area where we look for a specific signature
+>>>&(&0x54.l-3) string UNACE \eb, ACE self-extracting archive
+.Ed
+.Pp
+Finally, if you have to deal with offset/length pairs in your file, even the
+second value in a parenthesized expression can be taken from the file itself,
+using another set of parentheses.
+Note that this additional indirect offset
+is always relative to the start of the main indirect offset.
+.Bd -literal -offset indent
+0 string MZ
+>0x18 leshort >0x3f
+>>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+# search for the PE section called ".idata"...
+>>>&0xf4 search/0x140 .idata
+# ...and go to the end of it, calculated from start+length;
+# these are located 14 and 10 bytes after the section name
+>>>>(&0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive
+.Ed
.Sh BUGS
The formats
.Em long ,
.Em belong ,
.Em lelong ,
+.Em melong ,
.Em short ,
.Em beshort ,
.Em leshort ,
.Em date ,
.Em bedate ,
+.Em medate ,
+.Em ledate ,
+.Em beldate ,
+.Em leldate ,
and
-.Em ledate
+.Em meldate
are system-dependent; perhaps they should be specified as a number
of bytes (2B, 4B, etc),
since the files being recognized typically come from
a system on which the lengths are invariant.
.Pp
-There is (currently) no support for specified-endian data to be used in
-indirect offsets.
-.Pp
If
.Pa /usr/share/misc/magic
is newer than
@@ -264,7 +427,7 @@ Use the command:
.Po
cd /usr/share/misc &&
.Nm file
-.Fl C
+.Fl C
.Fl m Ar magic
.Pc
to rebuild.
@@ -283,4 +446,4 @@ to rebuild.
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.
-.\" @(#)$Id: magic.man,v 1.27 2003/09/12 19:43:30 christos Exp $
+.\" @(#)$Id: magic.man,v 1.30 2006/02/19 18:16:03 christos Exp $
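
To make two of the magic.5 additions above concrete, here is a minimal Python sketch of the middle-endian (PDP-11) long and of the `(0x3c.l)` indirect offset used in the MZ/PE/LX example. It is an illustration under stated assumptions, not how file(1) is implemented: the helper names `read_melong`, `indirect_offset` and `classify_mz` are hypothetical, and the value at 0x18 is treated as unsigned, which is a simplification.

```python
import struct


def read_melong(data: bytes, offset: int) -> int:
    """Decode a 4-byte middle-endian (PDP-11) value.

    The value is stored as two little-endian 16-bit words,
    most significant word first.
    """
    high, low = struct.unpack_from("<HH", data, offset)
    return (high << 16) | low


def indirect_offset(data: bytes, base: int) -> int:
    """Mimic the magic offset (base.l): read a little-endian long at base."""
    (value,) = struct.unpack_from("<I", data, base)
    return value


def classify_mz(data: bytes) -> str:
    """Walk the MZ example from the manual page, level by level."""
    if data[:2] != b"MZ":                      # 0        string  MZ
        return "data"
    (reloc,) = struct.unpack_from("<H", data, 0x18)
    if reloc < 0x40:                           # >0x18    leshort <0x40
        return "MZ executable (MS-DOS)"
    target = indirect_offset(data, 0x3C)       # >>(0x3c.l)
    signature = data[target:target + 4]
    if signature == b"PE\0\0":
        return "PE executable (MS-Windows)"
    if signature == b"LX\0\0":
        return "LX executable (OS/2)"
    return ""  # the "empty output" drawback the manual page warns about
```

The final `return ""` corresponds to the case where the level-1 test succeeds with an empty message but no deeper test prints anything.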