Diffstat (limited to 'usr.bin/file/file.1')
-rw-r--r-- usr.bin/file/file.1 416
1 files changed, 223 insertions, 193 deletions
diff --git a/usr.bin/file/file.1 b/usr.bin/file/file.1
index 8f9451f..263a8a2 100644
--- a/usr.bin/file/file.1
+++ b/usr.bin/file/file.1
@@ -1,5 +1,6 @@
.\" $FreeBSD$
-.Dd December 8, 2000
+.\" $Id: file.man,v 1.39 2001/04/27 22:48:33 christos Exp $
+.Dd April 4, 2001
.Dt FILE 1 "Copyright but distributable"
.Os
.Sh NAME
@@ -11,10 +12,14 @@
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Ar
+.Nm
+.Fl C
+.Op Fl m Ar magicfile
.Sh DESCRIPTION
-This manual page documents version 3.33 of the
+This manual page documents version 3.36 of the
.Nm
command.
+.Pp
.Nm File
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
@@ -24,131 +29,136 @@ The
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
-.Em text
-(the file contains only printing characters and a few
-common control characters and is probably safe to read on
-an
+.Dq Li text
+(the file contains only
+printing characters and a few common control
+characters and is probably safe to read on an
.Tn ASCII
terminal),
-.Em executable
+.Dq Li executable
(the file contains the result of compiling a program
in a form understandable to some
.Ux
-kernel or another), or
-.Em data
+kernel or another),
+or
+.Dq Li data
meaning anything else (data is usually
.Sq binary
or non-printable).
-Exceptions are well-known file formats (core files, tar
-archives) that are known to contain binary data.
+Exceptions are well-known file formats (core files, tar archives)
+that are known to contain binary data.
When modifying the file
.Pa /usr/share/misc/magic
or the program itself,
.Em "preserve these keywords" .
-People depend on knowing that all the readable files in a
-directory have the word
-.Dq text
+People depend on knowing that all the readable files in a directory
+have the word
+.Dq Li text
printed.
Don't do as Berkeley did and change
-.Dq shell commands text
+.Dq Li "shell commands text"
to
-.Dq shell script .
+.Dq Li "shell script" .
Note that the file
.Pa /usr/share/misc/magic
-is built mechanically from a large number of
-small files in the subdirectory
+is built mechanically from a large number of small files in
+the subdirectory
.Pa Magdir
in the source distribution of this program.
.Pp
-The filesystem tests are based on examining the return
-from a
+The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
-The program checks to see if the file is empty, or if it's
-some sort of special file.
-Any known file types appropriate to the system you are
-running on (sockets, symbolic links, or named pipes
-(FIFOs) on those systems that implement them) are intuited
-if they are defined in the system header file
+The program checks to see if the file is empty,
+or if it's some sort of special file.
+Any known file types appropriate to the system you are running on
+(sockets, symbolic links, or named pipes (FIFOs) on those systems that
+implement them)
+are intuited if they are defined in
+the system header file
.Aq Pa sys/stat.h .
.Pp
-The magic number tests are used to check for files with
-data in particular fixed formats.
+The magic number tests are used to check for files with data in
+particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Pa a.out
file, whose format is defined in
-.Pa a.out.h
+.Aq Pa a.out.h
and possibly
-.Pa exec.h
+.Aq Pa exec.h
in the standard include directory.
These files have a
-.Sq magic number
-stored in a particular place near the beginning of the file
-that tells the
+.Sq "magic number"
+stored in a particular place
+near the beginning of the file that tells the
.Ux
-operating system that the file is a binary executable,
-and which of several types thereof.
+operating system
+that the file is a binary executable, and which of several types thereof.
The concept of
-.Sq magic number
+.Sq "magic number"
has been applied by extension to data files.
-Any file with some invariant identifier at a small fixed offset
-into the file can usually be described in this way.
-The information identifying these files is read from the magic file
-.Pa /usr/share/misc/magic .
-.Pp
-If a file does not match any of the entries in the magic
-file, it is examined to see if it seems to be a text file.
-.Tn ASCII ,
-.Tn ISO-8859-x ,
-non-ISO 8-bit extended-ASCII character
-sets (such as those used on Macintosh and IBM PC systems),
-.Tn UTF-8-encoded Unicode ,
-.Tn UTF-16-encoded Unicode ,
-and
-.Tn EBCDIC
-character sets can be distinguished by the different ranges
-and sequences of bytes that constitute printable text in each set.
+Any file with some invariant identifier at a small fixed
+offset into the file can usually be described in this way.
+The information identifying these files is read from the compiled
+magic file
+.Pa /usr/share/misc/magic.mgc ,
+or
+.Pa /usr/share/misc/magic
+if the compile file does not exist.
+.Pp
+If a file does not match any of the entries in the magic file,
+it is examined to see if it seems to be a text file.
+ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
+(such as those used on Macintosh and IBM PC systems),
+UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
+character sets can be distinguished by the different
+ranges and sequences of bytes that constitute printable text
+in each set.
If a file passes any of these tests, its character set is reported.
-.Tn ASCII ,
-.Tn ISO-8859-x ,
-.Tn UTF-8 ,
-and extended-ASCII files are identified as
-.Dq text
+ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
+as
+.Dq Li text
because they will be mostly readable on nearly any terminal;
-.Tn UTF-16
-and
-.Tn EBCDIC
-are only
-.Dq character data
-because, while they contain text, it is text that will
-require translation before it can be read.
-In addition, file will attempt to determine other characteristics of
-text-type files.
-If the lines of a file are terminated by CR, CRLF, or NEL,
-instead of the Unix-standard LF, this will be reported.
-Files that contain embedded escape sequences or overstriking will
-also be identified.
-.Pp
-Once file has determined the character set used in a text-type file,
-it will attempt to determine in what language the file is written.
+UTF-16 and EBCDIC are only
+.Dq Li "character data"
+because, while
+they contain text, it is text that will require translation
+before it can be read.
+In addition,
+.Nm
+will attempt to determine other characteristics of text-type files.
+If the lines of a file are terminated by CR, CRLF, or NEL, instead
+of the
+.Ux Ns -standard
+LF, this will be reported.
+Files that contain embedded escape sequences or overstriking
+will also be identified.
+.Pp
+Once
+.Nm
+has determined the character set used in a text-type file,
+it will
+attempt to determine in what language the file is written.
The language tests look for particular strings (cf
.Pa names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
-.Em \&.br
+.Ic .br
indicates that the file is most likely a
.Xr troff 1
-input file, just as the keyword struct indicates a C program.
-These tests are less reliable than the previous two
-groups, so they are performed last.
-The language test routines also test for some miscellany (such as
+input file, just as the keyword
+.Ic struct
+indicates a C program.
+These tests are less reliable than the previous
+two groups, so they are performed last.
+The language test routines also test for some miscellany
+(such as
.Xr tar 1
archives).
.Pp
Any file that cannot be identified as having been written
-in any of the character sets listed above is simply said
-to be
-.Dq data .
+in any of the character sets listed above is simply said to be
+.Dq Li data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl b
@@ -158,6 +168,11 @@ Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with
.Fl m
to debug a new magic file before installing it.
+.It Fl C
+Write a
+.Pa magic.mgc
+output file that contains a pre-parsed version of
+file.
.It Fl f Ar namefile
Read the names of the files to be examined from
.Ar namefile
@@ -167,19 +182,19 @@ Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
-.Dq Ar -
+.Dq Fl
as a filename argument.
.It Fl i
-Causes the file command to output mime type strings rather than the
-more traditional human readable ones.
+Causes the file command to output mime type strings rather than the more
+traditional human readable ones.
Thus it may say
-.Dq text/plain; charset=us-ascii
+.Dq Li "text/plain; charset=us-ascii"
rather than
-.Dq ASCII text .
-In order for this option to work, file changes the way it handles
-files recognised by the command itself (such as many of the text
-file types, directories etc), and makes use of an alternative
-.Dq Pa magic
+.Dq Li "ASCII text" .
+In order for this option to work, file changes the way
+it handles files recognised by the command itself (such as many of the
+text file types, directories etc), and makes use of an alternative
+.Pa magic
file.
(See
.Sx FILES
@@ -187,44 +202,46 @@ section, below).
.It Fl k
Don't stop at the first match, keep going.
.It Fl m Ar list
-Specify an alternate
-.Ar list
-of files containing magic numbers.
+Specify an alternate list of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
.It Fl n
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
-It is intended to be used by programs that
-want filetype output from a pipe.
-.It Fl s
-Normally, file only attempts to read and determine
-the type of argument files which
-.Xr stat 2
-reports are ordinary files.
-This prevents problems, because reading special files
-may have peculiar consequences.
-Specifying the
-.Fl s
-option causes file to also read argument files which
-are block or character special files.
-This is useful for determining the filesystem types of
-the data in raw disk partitions, which are block special files.
-This option also causes file to disregard the file size as
-reported by
-.Xr stat 2
-since on some systems it reports a zero size for raw
-disk partitions.
+It is intended to be used by programs that want
+filetype output from a pipe.
.It Fl v
Print the version of the program and exit.
.It Fl z
Try to look inside compressed files.
.It Fl L
-Cause symlinks to be followed, as the like-named option in
+option causes symlinks to be followed, as the like-named option in
.Xr ls 1 .
(on systems that support symbolic links).
+.It Fl s
+Normally,
+.Nm
+only attempts to read and determine the type of argument files which
+.Xr stat 2
+reports are ordinary files.
+This prevents problems, because reading special files may have peculiar
+consequences.
+Specifying the
+.Fl s
+option causes
+.Nm
+to also read argument files which are block or character special files.
+This is useful for determining the filesystem types of the data in raw
+disk partitions, which are block special files.
+This option also causes
+.Nm
+to disregard the file size as reported by
+.Xr stat 2
+since on some systems it reports a zero size for raw disk partitions.
.El
.Sh FILES
-.Bl -tag -width /usr/share/misc/magic.mime -compact
+.Bl -tag -width ".Pa /usr/share/misc/magic.mime" -compact
+.It Pa /usr/share/misc/magic.mgc
+default compiled list of magic numbers
.It Pa /usr/share/misc/magic
default list of magic numbers
.It Pa /usr/share/misc/magic.mime
@@ -237,11 +254,13 @@ The environment variable
.Ev MAGIC
can be used to set the default magic number files.
.Sh SEE ALSO
+.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1 ,
.Xr magic 5
.Sh STANDARDS CONFORMANCE
-This program is believed to exceed the System V Interface Definition
+This program is believed to exceed the
+.St -svid4
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behaviour is mostly compatible with the System V program of the same name.
@@ -253,33 +272,33 @@ between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
-.Bd -literal -compact
->10 string language impress (imPRESS data)
-.Ed
+.Pp
+.Dl ">10 string language impress\ (imPRESS data)"
+.Pp
in an existing magic file would have to be changed to
-.Bd -literal -compact
->10 string language\e impress (imPRESS data)
-.Ed
+.Pp
+.Dl ">10 string language\e impress (imPRESS data)"
.Pp
In addition, in this version, if a pattern string contains a backslash,
-it must be escaped. For example
-.Bd -literal -compact
-0 string \ebegindata Andrew Toolkit document
-.Ed
+it must be escaped.
+For example
+.Pp
+.Dl "0 string \ebegindata Andrew Toolkit document"
+.Pp
in an existing magic file would have to be changed to
-.Bd -literal -compact
-0 string \e\ebegindata Andrew Toolkit document
-.Ed
+.Pp
+.Dl "0 string \e\ebegindata Andrew Toolkit document"
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Xr file 1
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
-It includes the extension of the `&' operator, used as,
+It includes the extension of the
+.Sq Ic &
+operator, used as,
for example,
-.Bd -literal -compact
->16 long&0x7fffffff >0 not stripped
-.Ed
+.Pp
+.Dl ">16 long&0x7fffffff >0 not stripped"
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
@@ -330,7 +349,7 @@ There has been a
command in every
.Ux
since at least Research Version 6
-(man page dated January, 1975).
+(man page dated January 16, 1975).
The System V version introduced one significant major change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
@@ -347,7 +366,7 @@ the first version.
found several inadequacies
and provided some magic file entries.
Contributions by the
-.Sq \&&
+.Sq Ic &
operator by
.An Rob McMahon Aq cudcv@warwick.ac.uk ,
1989.
@@ -355,21 +374,24 @@ operator by
.An Guy Harris Aq guy@netapp.com ,
made many changes from 1993 to the present.
.Pp
-Primary development and maintenance from 1990 to the
-present by
+Primary development and maintenance from 1990 to the present by
.An Christos Zoulas Aq christos@astron.com .
.Pp
Altered by
.An Chris Lowth Aq chris@lowth.com ,
-2000: Handle the
+2000:
+Handle the
.Fl i
-option to output mime type strings and using an
-alternative magic file and internal logic.
+option to output mime type strings and using an alternative
+magic file and internal logic.
.Pp
Altered by
.An Eric Fischer Aq enf@pobox.com ,
-July, 2000, to identify character codes and attempt to identify
-the languages of non-ASCII files.
+July, 2000,
+to identify character codes and attempt to identify the languages
+of
+.No non- Ns Tn ASCII
+files.
.Pp
The list of contributors to the
.Pa Magdir
@@ -377,10 +399,11 @@ directory (source for the
.Pa /usr/share/misc/magic
file) is too long to include here.
You know who you are; thank you.
-.Sh "LEGAL NOTICE"
+.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
-Covered by the standard Berkeley Software Distribution
-copyright; see the file LEGAL.NOTICE in the source distribution.
+Covered by the standard Berkeley Software Distribution copyright; see the file
+.Pa LEGAL.NOTICE
+in the source distribution.
.Pp
The files
.Pa tar.h
@@ -388,89 +411,96 @@ and
.Pa is_tar.c
were written by
.An John Gilmore
-from his public-domain tar program, and are not covered by
-the above license.
+from his public-domain
+.Nm tar
+program, and are not covered by the above license.
.Sh BUGS
-There must be a better way to automate the construction of
-the
+There must be a better way to automate the construction of the
.Pa Magic
file from all the glop in
.Pa Magdir .
What is it?
-Better yet, the magic file should be compiled into binary
-(say,
+Better yet, the magic file should be compiled into binary (say,
.Xr ndbm 3
or, better yet, fixed-length
.Tn ASCII
-strings for use in heterogenous network environments) for
-faster startup.
-Then the program would run as fast as the Version 7 program
-of the same name, with the flexibility of
-the System V version.
-.Pp
-File uses several algorithms that favor speed over accuracy,
-thus it can be misled about the contents of text
+strings for use in heterogenous network environments) for faster startup.
+Then the program would run as fast as the Version 7 program of the same name,
+with the flexibility of the System V version.
+.Pp
+.Nm File
+uses several algorithms that favor speed over accuracy,
+thus it can be misled about the contents of
+text
files.
.Pp
-The support for text files (primarily for programming languages)
+The support for
+text
+files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.Pp
There should be an
-.Dq else
+.Ic else
clause to follow a series of continuation lines.
.Pp
-The magic file and keywords should have regular expression
-support.
-Their use of ASCII TAB as a field delimiter is
-ugly and makes it hard to edit the files, but is
-entrenched.
+The magic file and keywords should have regular expression support.
+Their use of
+.Tn "ASCII TAB"
+as a field delimiter is ugly and makes
+it hard to edit the files, but is entrenched.
.Pp
-It might be advisable to allow upper-case letters in
-keywords for e.g.,
+It might be advisable to allow upper-case letters in keywords
+for e.g.,
.Xr troff 1
commands vs man page macros.
Regular expression support would make this easy.
.Pp
-The program doesn't grok FORTRAN.
-It should be able to figure FORTRAN by seeing some keywords
-which appear indented at the start of line.
+The program doesn't grok
+.Tn FORTRAN .
+It should be able to figure
+.Tn FORTRAN
+by seeing some keywords which
+appear indented at the start of line.
Regular expression support would make this easy.
.Pp
-The list of keywords in ascmagic probably belongs in the
+The list of keywords in
+.Pa ascmagic
+probably belongs in the
.Pa Magic
file.
This could be done by using some keyword like
-`*' for the offset value.
+.Sq Ic *
+for the offset value.
.Pp
-Another optimisation would be to sort the magic file so
-that we can just run down all the tests for the first
-byte, first word, first long, etc, once we have fetched
-it.
+Another optimisation would be to sort
+the magic file so that we can just run down all the
+tests for the first byte, first word, first long, etc, once we
+have fetched it.
Complain about conflicts in the magic file entries.
-Make a rule that the magic entries sort based on file offset
-rather than position within the magic file?
+Make a rule that the magic entries sort based on file offset rather
+than position within the magic file?
.Pp
-The program should provide a way to give an estimate of
+The program should provide a way to give an estimate
+of
.Dq how good
a guess is.
-We end up removing guesses (e.g.
-.Dq From
-as first 5 chars of file) because they are not
-as good as other guesses (e.g.
-.Dq Newsgroups:
+We end up removing guesses (e.g.\&
+.Dq Li "From "
+as first 5 chars of file) because
+they are not as good as other guesses (e.g.\&
+.Dq Li "Newsgroups:"
versus
-.Dq Return-Path: ) .
-Still, if the others don't pan out, it
-should be possible to use the first guess.
+.Dq Li "Return-Path:" ) .
+Still, if the others don't pan out, it should be
+possible to use the first guess.
.Pp
This program is slower than some vendors' file commands.
-The new support for multiple character codes makes it even
-slower.
+The new support for multiple character codes makes it even slower.
.Pp
This manual page, and particularly this section, is too long.
.Sh AVAILABILITY
-You can obtain the original author's latest version by
-anonymous FTP on
+You can obtain the original author's latest version by anonymous FTP
+on
.Pa ftp.astron.com
in the directory
.Pa /pub/file/file-X.YY.tar.gz