author: phantom <phantom@FreeBSD.org>, 2000-12-11 15:50:04 +0000
committer: phantom <phantom@FreeBSD.org>, 2000-12-11 15:50:04 +0000
commit: 697c4f2cd847ad9e1aaee911f44dbb387ab2bb20
tree: 47a6ddb8572f812f0021bca7c10952b0b108fd46 /usr.bin/file
parent: ee790970190f15c72cd6467762df260062bc6bf7
Re-add home born file(1) and magic(5) manual pages. Update them to
current file(1) version (3.33).

Approved by: obrien
Diffstat (limited to 'usr.bin/file')
-rw-r--r--  usr.bin/file/Makefile   14
-rw-r--r--  usr.bin/file/file.1    473
-rw-r--r--  usr.bin/file/magic.5   242
3 files changed, 716 insertions, 13 deletions
diff --git a/usr.bin/file/Makefile b/usr.bin/file/Makefile
index 9c00d58..9471b7f 100644
--- a/usr.bin/file/Makefile
+++ b/usr.bin/file/Makefile
@@ -41,7 +41,7 @@ SRCS= file.c apprentice.c fsmagic.c softmagic.c ascmagic.c \
MAN1= file.1
MAN5= magic.5
-CLEANFILES+= magic file.1 magic.5 version
+CLEANFILES+= magic
MAGFILES= ${SRCDIR}/Header\
${SRCDIR}/Localstuff\
@@ -52,18 +52,6 @@ all: file magic
magic: $(MAGFILES)
cat $(MAGFILES) > $(.TARGET)
-version: Makefile.std
- @sed '/.*VERSION.*=[ ]*/s///w ${.TARGET}' ${.ALLSRC} > /dev/null
-
-.for MP in file.1 magic.5
-${MP}: ${SRCDIR}/${MP:C/[0-9]$/man/} version
- sed -e 's|__CSECTION__|1|g'\
- -e 's|__FSECTION__|5|g'\
- -e 's|__MAGIC__|${MAGICFILE}|g'\
- -e "s|__VERSION__|`cat version`|g"\
- ${SRCDIR}/${MP:C/[0-9]$/man/} > ${.TARGET}
-.endfor
-
beforeinstall:
$(INSTALL) $(COPY) -o $(BINOWN) -g $(BINGRP) -m $(MAGICMODE) \
magic $(DESTDIR)$(MAGICFILE)
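
For readers unfamiliar with the rule deleted above: it generated file.1 and magic.5 from the vendor templates by substituting four placeholders with sed. The following is a rough Python sketch of that substitution, not part of the commit. The function names `template_name` and `render_page` are hypothetical, and the sketch assumes plain textual replacement is a fair model of what the sed expressions did.

```python
import re


def template_name(page):
    """Mirror the make modifier ${MP:C/[0-9]$/man/}, e.g. file.1 -> file.man."""
    return re.sub(r"[0-9]$", "man", page)


def render_page(template, magic_path, version):
    """Apply the four placeholder substitutions made by the removed sed command."""
    replacements = {
        "__CSECTION__": "1",        # section of the command page
        "__FSECTION__": "5",        # section of the file-format page
        "__MAGIC__": magic_path,    # value of ${MAGICFILE} in the Makefile
        "__VERSION__": version,     # contents of the generated "version" file
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    return template
```

With the pages now checked in as static files, none of this runs at build time, which is why `file.1`, `magic.5` and `version` drop out of CLEANFILES.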
diff --git a/usr.bin/file/file.1 b/usr.bin/file/file.1
new file mode 100644
index 0000000..16154fe
--- /dev/null
+++ b/usr.bin/file/file.1
@@ -0,0 +1,473 @@
+.\" $FreeBSD$
+.Dd December 08, 2000
+.Dt FILE 1 "Copyright but distributable"
+.Os
+.Sh NAME
+.Nm file
+.Nd determine file type
+.Sh SYNOPSIS
+.Nm
+.Op Fl bciknsvzL
+.Op Fl f Ar namefile
+.Op Fl m Ar magicfiles
+.Ar
+.Sh DESCRIPTION
+This manual page documents version 3.33 of the
+.Nm
+command.
+.Nm File
+tests each argument in an attempt to classify it.
+There are three sets of tests, performed in this order:
+filesystem tests, magic number tests, and language tests.
+The
+.Em first
+test that succeeds causes the file type to be printed.
+.Pp
+The type printed will usually contain one of the words
+.Em text
+(the file contains only printing characters and a few
+common control characters and is probably safe to read on
+an
+.Tn ASCII
+terminal),
+.Em executable
+(the file contains the result of compiling a program
+in a form understandable to some
+.Ux
+kernel or another), or
+.Em data
+meaning anything else (data is usually
+.Sq binary
+or non-printable).
+Exceptions are well-known file formats (core files, tar
+archives) that are known to contain binary data.
+When modifying the file
+.Pa /usr/share/misc/magic
+or the program itself,
+.Em "preserve these keywords" .
+People depend on knowing that all the readable files in a
+directory have the word
+.Dq text
+printed.
+Don't do as Berkeley did and change
+.Dq shell commands text
+to
+.Dq shell script .
+Note that the file
+.Pa /usr/share/misc/magic
+is built mechanically from a large number of
+small files in the subdirectory
+.Pa Magdir
+in the source distribution of this program.
+.Pp
+The filesystem tests are based on examining the return
+from a
+.Xr stat 2
+system call.
+The program checks to see if the file is empty, or if it's
+some sort of special file.
+Any known file types appropriate to the system you are
+running on (sockets, symbolic links, or named pipes
+(FIFOs) on those systems that implement them) are intuited
+if they are defined in the system header file
+.Aq Pa sys/stat.h .
+.Pp
+The magic number tests are used to check for files with
+data in particular fixed formats.
+The canonical example of this is a binary executable (compiled program)
+.Pa a.out
+file, whose format is defined in
+.Pa a.out.h
+and possibly
+.Pa exec.h
+in the standard include directory.
+These files have a
+.Sq magic number
+stored in a particular place near the beginning of the file
+that tells the
+.Ux
+operating system that the file is a binary executable,
+and which of several types thereof.
+The concept of
+.Sq magic number
+has been applied by extension to data files.
+Any file with some invariant identifier at a small fixed offset
+into the file can usually be described in this way.
+The information identifying these files is read from the magic file
+.Pa /usr/share/misc/magic .
+.Pp
+If a file does not match any of the entries in the magic
+file, it is examined to see if it seems to be a text file.
+.Tn ASCII ,
+.Tn ISO-8859-x ,
+non-ISO 8-bit extended-ASCII character
+sets (such as those used on Macintosh and IBM PC systems),
+.Tn UTF-8-encoded Unicode ,
+.Tn UTF-16-encoded Unicode ,
+and
+.Tn EBCDIC
+character sets can be distinguished by the different ranges
+and sequences of bytes that constitute printable text in each set.
+If a file passes any of these tests, its character set is reported.
+.Tn ASCII ,
+.Tn ISO-8859-x ,
+.Tn UTF-8 ,
+and extended-ASCII files are identified as
+.Dq text
+because they will be mostly readable on nearly any terminal;
+.Tn UTF-16
+and
+.Tn EBCDIC
+are only
+.Dq character data
+because, while they contain text, it is text that will
+require translation before it can be read.
+In addition, file will attempt to determine other characteristics of
+text-type files.
+If the lines of a file are terminated by CR, CRLF, or NEL,
+instead of the Unix-standard LF, this will be reported.
+Files that contain embedded escape sequences or overstriking will
+also be identified.
+.Pp
+Once file has determined the character set used in a text-type file,
+it will attempt to determine in what language the file is written.
+The language tests look for particular strings (cf
+.Pa names.h )
+that can appear anywhere in the first few blocks of a file.
+For example, the keyword
+.Em \&.br
+indicates that the file is most likely a
+.Xr troff 1
+input file, just as the keyword struct indicates a C program.
+These tests are less reliable than the previous two
+groups, so they are performed last.
+The language test routines also test for some miscellany (such as
+.Xr tar 1
+archives).
+.Pp
+Any file that cannot be identified as having been written
+in any of the character sets listed above is simply said
+to be
+.Dq data .
+.Sh OPTIONS
+.Bl -tag -width indent
+.It Fl b
+Do not prepend filenames to output lines (brief mode).
+.It Fl c
+Cause a checking printout of the parsed form of the magic file.
+This is usually used in conjunction with
+.Fl m
+to debug a new magic file before installing it.
+.It Fl f Ar namefile
+Read the names of the files to be examined from
+.Ar namefile
+(one per line)
+before the argument list.
+Either
+.Ar namefile
+or at least one filename argument must be present;
+to test the standard input, use
+.Dq -
+as a filename argument.
+.It Fl i
+Causes the file command to output mime type strings rather than the
+more traditional human readable ones.
+Thus it may say
+.Dq text/plain; charset=us-ascii
+rather than
+.Dq ASCII text .
+In order for this option to work, file changes the way it handles
+files recognised by the command itself (such as many of the text
+file types, directories etc), and makes use of an alternative
+.Dq Pa magic
+file. (See
+.Sx FILES
+section, below).
+.It Fl k
+Don't stop at the first match, keep going.
+.It Fl m Ar list
+Specify an alternate
+.Ar list
+of files containing magic numbers.
+This can be a single file, or a colon-separated list of files.
+.It Fl n
+Force stdout to be flushed after checking each file.
+This is only useful if checking a list of files.
+It is intended to be used by programs that
+want filetype output from a pipe.
+.It Fl s
+Normally, file only attempts to read and determine
+the type of argument files which
+.Xr stat 2
+reports are ordinary files.
+This prevents problems, because reading special files
+may have peculiar consequences. Specifying the
+.Fl s
+option causes file to also read argument files which
+are block or character special files.
+This is useful for determining the filesystem types of
+the data in raw disk partitions, which are block special files.
+This option also causes file to disregard the file size as
+reported by
+.Xr stat 2
+since on some systems it reports a zero size for raw
+disk partitions.
+.It Fl v
+Print the version of the program and exit.
+.It Fl z
+Try to look inside compressed files.
+.It Fl L
+Cause symlinks to be followed, as the like-named option in
+.Xr ls 1
+(on systems that support symbolic links).
+.El
+.Sh FILES
+.Bl -tag -width /usr/share/misc/magic.mime -compact
+.It Pa /usr/share/misc/magic
+default list of magic numbers
+.It Pa /usr/share/misc/magic.mime
+default list of magic numbers, used to output mime types when the
+.Fl i
+option is specified.
+.El
+.Sh ENVIRONMENT
+The environment variable
+.Ev MAGIC
+can be used to set the default magic number files.
+.Sh SEE ALSO
+.Xr od 1 ,
+.Xr strings 1 ,
+.Xr magic 5
+
+.Sh STANDARDS CONFORMANCE
+This program is believed to exceed the System V Interface Definition
+of FILE(CMD), as near as one can determine from the vague language
+contained therein.
+Its behaviour is mostly compatible with the System V program of the same name.
+This version knows more magic, however, so it will produce
+different (albeit more accurate) output in many cases.
+.Pp
+The one significant difference
+between this version and System V
+is that this version treats any white space
+as a delimiter, so that spaces in pattern strings must be escaped.
+For example,
+.Bd -literal -compact
+>10 string language impress (imPRESS data)
+.Ed
+in an existing magic file would have to be changed to
+.Bd -literal -compact
+>10 string language\e impress (imPRESS data)
+.Ed
+.Pp
+In addition, in this version, if a pattern string contains a backslash,
+it must be escaped. For example
+.Bd -literal -compact
+0 string \ebegindata Andrew Toolkit document
+.Ed
+in an existing magic file would have to be changed to
+.Bd -literal -compact
+0 string \e\ebegindata Andrew Toolkit document
+.Ed
+.Pp
+SunOS releases 3.2 and later from Sun Microsystems include a
+.Xr file 1
+command derived from the System V one, but with some extensions.
+My version differs from Sun's only in minor ways.
+It includes the extension of the `&' operator, used as,
+for example,
+.Bd -literal -compact
+>16 long&0x7fffffff >0 not stripped
+.Ed
+.Sh MAGIC DIRECTORY
+The magic file entries have been collected from various sources,
+mainly USENET, and contributed by various authors.
+.An Christos Zoulas
+(address below) will collect additional
+or corrected magic file entries.
+A consolidation of magic file entries
+will be distributed periodically.
+.Pp
+The order of entries in the magic file is significant.
+Depending on what system you are using, the order that
+they are put together may be incorrect.
+If your old
+.Nm
+command uses a magic file,
+keep the old magic file around for comparison purposes
+(rename it to
+.Pa /usr/share/misc/magic.orig Ns ).
+.Sh EXAMPLES
+.Bd -literal
+$ file file.c file /dev/hda
+file.c: C program text
+file: ELF 32-bit LSB executable, Intel 80386, version 1,
+ dynamically linked, not stripped
+/dev/hda: block special
+
+$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
+/dev/hda: x86 boot sector
+/dev/hda1: Linux/i386 ext2 filesystem
+/dev/hda2: x86 boot sector
+/dev/hda3: x86 boot sector, extended partition table
+/dev/hda4: Linux/i386 ext2 filesystem
+/dev/hda5: Linux/i386 swap file
+/dev/hda6: Linux/i386 swap file
+/dev/hda7: Linux/i386 swap file
+/dev/hda8: Linux/i386 swap file
+/dev/hda9: empty
+/dev/hda10: empty
+
+$ file -i file.c file /dev/hda
+file.c: text/x-c
+file: application/x-executable, dynamically linked (uses shared libs), not stripped
+/dev/hda: application/x-not-regular-file
+.Ed
+.Sh HISTORY
+There has been a
+.Nm
+command in every
+.Ux
+since at least Research Version 6
+(man page dated January, 1975).
+The System V version introduced one significant change:
+the external list of magic number types.
+This slowed the program down slightly but made it a lot more flexible.
+.Pp
+This program, based on the System V version,
+was written by
+.An Ian Darwin Aq ian@darwinsys.com
+without looking at anybody else's source code.
+.Pp
+.An John Gilmore
+revised the code extensively, making it better than
+the first version.
+.An Geoff Collyer
+found several inadequacies
+and provided some magic file entries.
+The
+.Sq \&&
+operator was contributed by
+.An Rob McMahon Aq cudcv@warwick.ac.uk ,
+1989.
+.Pp
+.An Guy Harris Aq guy@netapp.com
+made many changes from 1993 to the present.
+.Pp
+Primary development and maintenance from 1990 to the
+present by
+.An Christos Zoulas Aq christos@astron.com .
+.Pp
+Altered by
+.An Chris Lowth Aq chris@lowth.com ,
+2000: Handle the
+.Fl i
+option to output mime type strings and using an
+alternative magic file and internal logic.
+.Pp
+Altered by
+.An Eric Fischer Aq enf@pobox.com ,
+July, 2000, to identify character codes and attempt to identify
+the languages of non-ASCII files.
+.Pp
+The list of contributors to the
+.Pa Magdir
+directory (source for the
+.Pa /usr/share/misc/magic
+file) is too long to include here.
+You know who you are; thank you.
+.Sh "LEGAL NOTICE"
+Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
+Covered by the standard Berkeley Software Distribution
+copyright; see the file LEGAL.NOTICE in the source distribution.
+.Pp
+The files
+.Pa tar.h
+and
+.Pa is_tar.c
+were written by
+.An John Gilmore
+from his public-domain tar program, and are not covered by
+the above license.
+.Sh BUGS
+There must be a better way to automate the construction of
+the
+.Pa Magic
+file from all the glop in
+.Pa Magdir .
+What is it?
+Better yet, the magic file should be compiled into binary
+(say,
+.Xr ndbm 3
+or, better yet, fixed-length
+.Tn ASCII
+strings for use in heterogeneous network environments) for
+faster startup.
+Then the program would run as fast as the Version 7 program
+of the same name, with the flexibility of
+the System V version.
+.Pp
+File uses several algorithms that favor speed over accuracy,
+thus it can be misled about the contents of text
+files.
+.Pp
+The support for text files (primarily for programming languages)
+is simplistic, inefficient and requires recompilation to update.
+.Pp
+There should be an
+.Dq else
+clause to follow a series of continuation lines.
+.Pp
+The magic file and keywords should have regular expression
+support. Their use of ASCII TAB as a field delimiter is
+ugly and makes it hard to edit the files, but is
+entrenched.
+.Pp
+It might be advisable to allow upper-case letters in
+keywords for e.g.,
+.Xr troff 1
+commands vs man page macros.
+Regular expression support would make this easy.
+.Pp
+The program doesn't grok FORTRAN.
+It should be able to figure FORTRAN by seeing some keywords
+which appear indented at the start of line.
+Regular expression support would make this easy.
+.Pp
+The list of keywords in ascmagic probably belongs in the
+.Pa Magic
+file.
+This could be done by using some keyword like
+`*' for the offset value.
+.Pp
+Another optimisation would be to sort the magic file so
+that we can just run down all the tests for the first
+byte, first word, first long, etc, once we have fetched
+it. Complain about conflicts in the magic file entries.
+Make a rule that the magic entries sort based on file offset
+rather than position within the magic file?
+.Pp
+The program should provide a way to give an estimate of
+.Dq how good
+a guess is.
+We end up removing guesses (e.g.
+.Dq From
+as first 5 chars of file) because they are not
+as good as other guesses (e.g.
+.Dq Newsgroups:
+versus
+.Dq Return-Path: Ns ).
+Still, if the others don't pan out, it
+should be possible to use the first guess.
+.Pp
+This program is slower than some vendors' file commands.
+The new support for multiple character codes makes it even
+slower.
+.Pp
+This manual page, and particularly this section, is too long.
+.Sh AVAILABILITY
+You can obtain the original author's latest version by
+anonymous FTP on
+.Pa ftp.astron.com
+in the directory
+.Pa /pub/file/file-X.YY.tar.gz
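
As an aside to the file.1 page above, the first stage it describes (the filesystem tests) can be approximated with the `stat` family of calls alone. The following Python sketch is an illustration only, not the code of file(1). The function name `filesystem_test` and the exact strings it returns are assumptions, loosely modelled on the outputs shown in the EXAMPLES section.

```python
import os
import stat


def filesystem_test(path):
    """Classify a path from lstat() alone; return None if its contents must be read."""
    st = os.lstat(path)  # lstat, so a symbolic link is reported rather than followed
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    # An ordinary, non-empty file: fall through to the magic and language tests.
    return None
```

A `None` result corresponds to the case where the manual page says the magic number tests, and then the language tests, are tried. The `-s` option described above would additionally read block and character special files instead of stopping at this stage.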
diff --git a/usr.bin/file/magic.5 b/usr.bin/file/magic.5
new file mode 100644
index 0000000..3b88eac
--- /dev/null
+++ b/usr.bin/file/magic.5
@@ -0,0 +1,242 @@
+.\"
+.\" $FreeBSD$
+.\"
+.\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems.
+.\"
+.Dd December 08, 2000
+.Dt MAGIC 5 "Public Domain"
+.Os
+.Sh NAME
+.Nm magic
+.Nd file command's magic number file
+.Sh DESCRIPTION
+This manual page documents the format of the magic file as
+used by the
+.Nm
+command, version 3.33. The
+.Nm file
+command identifies the type of a file using,
+among other tests,
+a test for whether the file begins with a certain
+.Em "magic number" .
+The file
+.Pa /usr/share/misc/magic
+specifies what magic numbers are to be tested for,
+what message to print if a particular magic number is found,
+and additional information to extract from the file.
+.Pp
+Each line of the file specifies a test to be performed.
+A test compares the data starting at a particular offset
+in the file with a 1-byte, 2-byte, or 4-byte numeric value or
+a string.
+If the test succeeds, a message is printed.
+The line consists of the following fields:
+.Bl -tag -width indent
+.It offset
+A number specifying the offset, in bytes, into the file of the data
+which is to be tested.
+.It type
+The type of the data to be tested.
+The possible values are:
+.Bl -tag -width indent
+.It byte
+A one-byte value.
+.It short
+A two-byte value (on most systems) in this machine's native byte order.
+.It long
+A four-byte value (on most systems) in this machine's native byte order.
+.It string
+A string of bytes.
+The string type specification can be optionally followed
+by /[Bbc]*. The
+.Dq B
+flag compacts whitespace in the target, which must contain
+at least one whitespace character.
+If the magic has "n" consecutive blanks, the target needs
+at least "n" consecutive blanks to match.
+The
+.Dq b
+flag treats every blank in the target as an optional blank.
+Finally the
+.Dq c
+flag specifies case insensitive matching: lowercase characters
+in the magic match both lower and upper case characters in the
+target, whereas upper case characters in the magic only match
+uppercase characters in the target.
+.It date
+A four-byte value interpreted as a unix date.
+.It beshort
+A two-byte value (on most systems) in big-endian byte order.
+.It belong
+A four-byte value (on most systems) in big-endian byte order.
+.It bedate
+A four-byte value (on most systems) in big-endian byte order,
+interpreted as a unix date.
+.It leshort
+A two-byte value (on most systems) in little-endian byte order.
+.It lelong
+A four-byte value (on most systems) in little-endian byte order.
+.It ledate
+A four-byte value (on most systems) in little-endian byte order,
+interpreted as a unix date.
+.El
+.El
+.Pp
+The numeric types may optionally be followed by
+.Em &
+and a numeric value,
+to specify that the value is to be AND'ed with the
+numeric value before any comparisons are done. Prepending a
+.Em u
+to the type indicates that ordered comparisons should be unsigned.
+.Bl -tag -width indent
+.It test
+The value to be compared with the value from the file. If the type is
+numeric, this value
+is specified in C form; if it is a string, it is specified as a C string
+with the usual escapes permitted (e.g. \en for new-line).
+.It ""
+Numeric values
+may be preceded by a character indicating the operation to be performed.
+It may be
+.Em = ,
+to specify that the value from the file must equal the specified value,
+.Em < ,
+to specify that the value from the file must be less than the specified
+value,
+.Em > ,
+to specify that the value from the file must be greater than the specified
+value,
+.Em & ,
+to specify that the value from the file must have set all of the bits
+that are set in the specified value,
+.Em ^ ,
+to specify that the value from the file must have at least one of the
+bits that are set in the specified value clear, or
+.Em x ,
+to specify that any value will match.
+If the character is omitted,
+it is assumed to be
+.Em = .
+.It ""
+Numeric values are specified in C form; e.g.
+.Em 13
+is decimal,
+.Em 013
+is octal, and
+.Em 0x13
+is hexadecimal.
+.It ""
+For string values, the byte string from the
+file must match the specified byte string.
+The operators
+.Em = ,
+.Em <
+and
+.Em >
+(but not
+.Em & )
+can be applied to strings.
+The length used for matching is that of the string argument
+in the magic file. This means that a line can match any string, and
+then presumably print that string, by doing
+.Em >\e0
+(because all strings are greater than the null string).
+.It message
+The message to be printed if the comparison succeeds. If the string
+contains a
+.Xr printf 3
+format specification, the value from the file (with any specified masking
+performed) is printed using the message as the format string.
+.El
+.Pp
+Some file formats contain additional information which is to be printed
+along with the file type. A line which begins with the character
+.Em >
+indicates additional tests and messages to be printed. The number of
+.Em >
+on the line indicates the level of the test; a line with no
+.Em >
+at the beginning is considered to be at level 0.
+Each line at level
+.Em n+1
+is under the control of the line at level
+.Em n
+most closely preceding it in the magic file.
+If the test on a line at level
+.Em n
+succeeds, the tests specified in all the subsequent lines at level
+.Em n+1
+are performed, and the messages printed if the tests succeed. The next
+line at level
+.Em n
+terminates this.
+If the first character following the last
+.Em >
+is a
+.Em \&(
+then the string after the parenthesis is interpreted as an indirect offset.
+That means that the number after the parenthesis is used as an offset in
+the file.
+The value at that offset is read, and is used again as an offset
+in the file.
+Indirect offsets are of the form:
+.Em (x[.[bslBSL]][+-][y]) .
+The value of
+.Em x
+is used as an offset in the file.
+A byte, short or long is read at that offset
+depending on the
+.Em [bslBSL]
+type specifier.
+The capitalized types interpret the number as a big endian value, whereas
+the lowercase versions interpret the number as a little endian value.
+To that number the value of
+.Em y
+is added and the result is used as an offset in the file.
+The default type
+if one is not specified is long.
+.Pp
+Sometimes you do not know the exact offset as this depends on the length of
+preceding fields.
+You can specify an offset relative to the end of the
+last uplevel field (of course this may only be done for sublevel tests, i.e.
+tests beginning with
+.Em > Ns ).
+Such a relative offset is specified using
+.Em &
+as a prefix to the offset.
+.Sh BUGS
+The formats
+.Em long ,
+.Em belong ,
+.Em lelong ,
+.Em short ,
+.Em beshort ,
+.Em leshort ,
+.Em date ,
+.Em bedate ,
+and
+.Em ledate
+are system-dependent; perhaps they should be specified as a number
+of bytes (2B, 4B, etc),
+since the files being recognized typically come from
+a system on which the lengths are invariant.
+.Pp
+There is (currently) no support for specified-endian data to be used in
+indirect offsets.
+.Sh SEE ALSO
+.Xr file 1
+.\"
+.\" From: guy@sun.uucp (Guy Harris)
+.\" Newsgroups: net.bugs.usg
+.\" Subject: /etc/magic's format isn't well documented
+.\" Message-ID: <2752@sun.uucp>
+.\" Date: 3 Sep 85 08:19:07 GMT
+.\" Organization: Sun Microsystems, Inc.
+.\" Lines: 136
+.\"
+.\" Here's a manual page for the format accepted by the "file" made by adding
+.\" the changes I posted to the S5R2 version.
+.\"
+.\" Modified for Ian Darwin's version of the file command.
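
To make the magic.5 rules above concrete, here is a small Python sketch of how a numeric test and an indirect offset might be evaluated. It is not the implementation in file(1). All function names are hypothetical. The sketch assumes 2-byte shorts and 4-byte longs, reads indirect pointers as unsigned, and uses native byte order when an indirect offset has no type specifier.

```python
import re
import struct

# struct codes for the signed numeric types; "short" and "long" are native order.
SIGNED_FORMATS = {
    "byte": "=b",
    "short": "=h", "beshort": ">h", "leshort": "<h",
    "long": "=i", "belong": ">i", "lelong": "<i",
}


def parse_c_int(text):
    """Parse a number in C form: 13 is decimal, 013 octal, 0x13 hexadecimal."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered, 16)
    if len(lowered) > 1 and lowered.startswith("0"):
        return int(lowered, 8)
    return int(lowered, 10)


def read_value(data, offset, type_name, mask=None):
    """Read one numeric value; a leading 'u' on the type selects unsigned."""
    unsigned = type_name.startswith("u")
    fmt = SIGNED_FORMATS[type_name[1:] if unsigned else type_name]
    if unsigned:
        fmt = fmt[0] + fmt[1].upper()
    if offset < 0 or offset + struct.calcsize(fmt) > len(data):
        return None  # not enough data, so the test cannot match
    (value,) = struct.unpack_from(fmt, data, offset)
    return value if mask is None else value & mask


def numeric_test(value, op, expected):
    """Apply one of the operators =, <, >, &, ^ or x to a value from the file."""
    if op == "x":
        return True
    if op == "=":
        return value == expected
    if op == "<":
        return value < expected
    if op == ">":
        return value > expected
    if op == "&":
        return (value & expected) == expected  # all named bits are set
    if op == "^":
        return (value & expected) != expected  # at least one named bit is clear
    raise ValueError("unknown operator: " + op)


INDIRECT = re.compile(r"\((\w+)(?:\.([bslBSL]))?(?:([+-])(\w+))?\)")


def resolve_indirect(data, spec):
    """Resolve (x[.[bslBSL]][+-][y]) to a final offset, or None if out of range."""
    match = INDIRECT.fullmatch(spec)
    if match is None:
        raise ValueError("not an indirect offset: " + spec)
    x, kind, sign, y = match.groups()
    if kind is None:
        fmt = "=I"  # assumption: default long in native byte order
    else:
        order = ">" if kind.isupper() else "<"
        fmt = order + {"b": "B", "s": "H", "l": "I"}[kind.lower()]
    start = parse_c_int(x)
    if start + struct.calcsize(fmt) > len(data):
        return None
    (pointer,) = struct.unpack_from(fmt, data, start)
    delta = parse_c_int(y) if y else 0
    return pointer - delta if sign == "-" else pointer + delta
```

Under these assumptions, the file.1 example `>16 long&0x7fffffff >0 not stripped` corresponds to `numeric_test(read_value(data, 16, "long", 0x7fffffff), ">", 0)`. String tests, relative (`&`) offsets and the nesting of continuation levels are left out to keep the sketch short.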