author | phantom <phantom@FreeBSD.org> | 2000-12-11 15:50:04 +0000 |
---|---|---|
committer | phantom <phantom@FreeBSD.org> | 2000-12-11 15:50:04 +0000 |
commit | 697c4f2cd847ad9e1aaee911f44dbb387ab2bb20 | |
tree | 47a6ddb8572f812f0021bca7c10952b0b108fd46 /usr.bin/file | |
parent | ee790970190f15c72cd6467762df260062bc6bf7 | |
Re-add home born file(1) and magic(5) manual pages. Update them to
current file(1) version (3.33)
Approved by: obrien
Diffstat (limited to 'usr.bin/file')
-rw-r--r-- | usr.bin/file/Makefile | 14
-rw-r--r-- | usr.bin/file/file.1 | 473
-rw-r--r-- | usr.bin/file/magic.5 | 242
3 files changed, 716 insertions, 13 deletions
diff --git a/usr.bin/file/Makefile b/usr.bin/file/Makefile
index 9c00d58..9471b7f 100644
--- a/usr.bin/file/Makefile
+++ b/usr.bin/file/Makefile
@@ -41,7 +41,7 @@ SRCS= file.c apprentice.c fsmagic.c softmagic.c ascmagic.c \
 MAN1= file.1
 MAN5= magic.5
 
-CLEANFILES+= magic file.1 magic.5 version
+CLEANFILES+= magic
 
 MAGFILES= ${SRCDIR}/Header\
 	${SRCDIR}/Localstuff\
@@ -52,18 +52,6 @@ all: file magic
 magic: $(MAGFILES)
 	cat $(MAGFILES) > $(.TARGET)
 
-version: Makefile.std
-	@sed '/.*VERSION.*=[ ]*/s///w ${.TARGET}' ${.ALLSRC} > /dev/null
-
-.for MP in file.1 magic.5
-${MP}: ${SRCDIR}/${MP:C/[0-9]$/man/} version
-	sed -e 's|__CSECTION__|1|g'\
-	-e 's|__FSECTION__|5|g'\
-	-e 's|__MAGIC__|${MAGICFILE}|g'\
-	-e "s|__VERSION__|`cat version`|g"\
-	${SRCDIR}/${MP:C/[0-9]$/man/} > ${.TARGET}
-.endfor
-
 beforeinstall:
 	$(INSTALL) $(COPY) -o $(BINOWN) -g $(BINGRP) -m $(MAGICMODE) \
 	magic $(DESTDIR)$(MAGICFILE)
diff --git a/usr.bin/file/file.1 b/usr.bin/file/file.1
new file mode 100644
index 0000000..16154fe
--- /dev/null
+++ b/usr.bin/file/file.1
@@ -0,0 +1,473 @@
+.\" $FreeBSD$
+.Dd December 08, 2000
+.Dt FILE 1 "Copyright but distributable"
+.Os
+.Sh NAME
+.Nm file
+.Nd determine file type
+.Sh SYNOPSIS
+.Nm
+.Op Fl bciknsvzL
+.Op Fl f Ar namefile
+.Op Fl m Ar magicfiles
+.Ar
+.Sh DESCRIPTION
+This manual page documents version 3.33 of the
+.Nm
+command.
+.Nm File
+tests each argument in an attempt to classify it.
+There are three sets of tests, performed in this order:
+filesystem tests, magic number tests, and language tests.
+The
+.Em first
+test that succeeds causes the file type to be printed.
+.Pp
+The type printed will usually contain one of the words
+.Em text
+(the file contains only printing characters and a few
+common control characters and is probably safe to read on
+an
+.Tn ASCII
+terminal),
+.Em executable
+(the file contains the result of compiling a program
+in a form understandable to some
+.Ux
+kernel or another), or
+.Em data
+meaning anything else (data is usually
+.Sq binary
+or non-printable).
+Exceptions are well-known file formats (core files, tar
+archives) that are known to contain binary data.
+When modifying the file
+.Pa /usr/share/misc/magic
+or the program itself,
+.Em "preserve these keywords" .
+People depend on knowing that all the readable files in a
+directory have the word
+.Dq text
+printed.
+Don't do as Berkeley did and change
+.Dq shell commands text
+to
+.Dq shell script .
+Note that the file
+.Pa /usr/share/misc/magic
+is built mechanically from a large number of
+small files in the subdirectory
+.Pa Magdir
+in the source distribution of this program.
+.Pp
+The filesystem tests are based on examining the return
+from a
+.Xr stat 2
+system call.
+The program checks to see if the file is empty, or if it's
+some sort of special file.
+Any known file types appropriate to the system you are
+running on (sockets, symbolic links, or named pipes
+(FIFOs) on those systems that implement them) are intuited
+if they are defined in the system header file
+.Aq Pa sys/stat.h .
+.Pp
+The magic number tests are used to check for files with
+data in particular fixed formats.
+The canonical example of this is a binary executable (compiled program)
+.Pa a.out
+file, whose format is defined in
+.Pa a.out.h
+and possibly
+.Pa exec.h
+in the standard include directory.
+These files have a
+.Sq magic number
+stored in a particular place near the beginning of the file
+that tells the
+.Ux
+operating system that the file is a binary executable,
+and which of several types thereof.
+The concept of
+.Sq magic number
+has been applied by extension to data files.
+Any file with some invariant identifier at a small fixed offset
+into the file can usually be described in this way.
+The information identifying these files is read from the magic file
+.Pa /usr/share/misc/magic .
+.Pp
+If a file does not match any of the entries in the magic
+file, it is examined to see if it seems to be a text file.
+.Tn ASCII ,
+.Tn ISO-8859-x ,
+non-ISO 8-bit extended-ASCII character
+sets (such as those used on Macintosh and IBM PC systems),
+.Tn UTF-8-encoded Unicode ,
+.Tn UTF-16-encoded Unicode ,
+and
+.Tn EBCDIC
+character sets can be distinguished by the different ranges
+and sequences of bytes that constitute printable text in each set.
+If a file passes any of these tests, its character set is reported.
+.Tn ASCII ,
+.Tn ISO-8859-x ,
+.Tn UTF-8 ,
+and extended-ASCII files are identified as
+.Dq text
+because they will be mostly readable on nearly any terminal;
+.Tn UTF-16
+and
+.Tn EBCDIC
+are only
+.Dq character data
+because, while they contain text, it is text that will
+require translation before it can be read.
+In addition, file will attempt to determine other characteristics of
+text-type files.
+If the lines of a file are terminated by CR, CRLF, or NEL,
+instead of the Unix-standard LF, this will be reported.
+Files that contain embedded escape sequences or overstriking will
+also be identified.
+.Pp
+Once file has determined the character set used in a text-type file,
+it will attempt to determine in what language the file is written.
+The language tests look for particular strings (see
+.Pa names.h )
+that can appear anywhere in the first few blocks of a file.
+For example, the keyword
+.Em \&.br
+indicates that the file is most likely a
+.Xr troff 1
+input file, just as the keyword struct indicates a C program.
+These tests are less reliable than the previous two
+groups, so they are performed last.
+The language test routines also test for some miscellany (such as
+.Xr tar 1
+archives).
+.Pp
+Any file that cannot be identified as having been written
+in any of the character sets listed above is simply said
+to be
+.Dq data .
+.Sh OPTIONS
+.Bl -tag -width indent
+.It Fl b
+Do not prepend filenames to output lines (brief mode).
+.It Fl c
+Cause a checking printout of the parsed form of the magic file.
+This is usually used in conjunction with
+.Fl m
+to debug a new magic file before installing it.
+.It Fl f Ar namefile
+Read the names of the files to be examined from
+.Ar namefile
+(one per line)
+before the argument list.
+Either
+.Ar namefile
+or at least one filename argument must be present;
+to test the standard input, use
+.Dq -
+as a filename argument.
+.It Fl i
+Cause the file command to output MIME type strings rather than the
+more traditional human-readable ones.
+Thus it may say
+.Dq text/plain; charset=us-ascii
+rather than
+.Dq ASCII text .
+In order for this option to work, file changes the way it handles
+files recognised by the command itself (such as many of the text
+file types, directories, etc.), and makes use of an alternative
+.Dq Pa magic
+file (see the
+.Sx FILES
+section below).
+.It Fl k
+Do not stop at the first match; keep going.
+.It Fl m Ar list
+Specify an alternate
+.Ar list
+of files containing magic numbers.
+This can be a single file, or a colon-separated list of files.
+.It Fl n
+Force stdout to be flushed after checking each file.
+This is only useful if checking a list of files.
+It is intended to be used by programs that
+want filetype output from a pipe.
+.It Fl s
+Normally, file only attempts to read and determine
+the type of argument files which
+.Xr stat 2
+reports are ordinary files.
+This prevents problems because reading special files
+may have peculiar consequences. Specifying the
+.Fl s
+option causes file to also read argument files which
+are block or character special files.
+This is useful for determining the filesystem types of
+the data in raw disk partitions, which are block special files.
+This option also causes file to disregard the file size as
+reported by
+.Xr stat 2
+since on some systems it reports a zero size for raw
+disk partitions.
+.It Fl v
+Print the version of the program and exit.
+.It Fl z
+Try to look inside compressed files.
+.It Fl L
+Cause symlinks to be followed, as the like-named option in
+.Xr ls 1
+does (on systems that support symbolic links).
+.El
+.Sh FILES
+.Bl -tag -width /usr/share/misc/magic.mime -compact
+.It Pa /usr/share/misc/magic
+default list of magic numbers
+.It Pa /usr/share/misc/magic.mime
+default list of magic numbers, used to output MIME types when the
+.Fl i
+option is specified.
+.El
+.Sh ENVIRONMENT
+The environment variable
+.Ev MAGIC
+can be used to set the default magic number files.
+.Sh SEE ALSO
+.Xr od 1 ,
+.Xr strings 1 ,
+.Xr magic 5
+
+.Sh STANDARDS CONFORMANCE
+This program is believed to exceed the System V Interface Definition
+of FILE(CMD), as near as one can determine from the vague language
+contained therein.
+Its behaviour is mostly compatible with the System V program of the same name.
+This version knows more magic, however, so it will produce
+different (albeit more accurate) output in many cases.
+.Pp
+The one significant difference
+between this version and System V
+is that this version treats any white space
+as a delimiter, so that spaces in pattern strings must be escaped.
+For example,
+.Bd -literal -compact
+>10 string language impress (imPRESS data)
+.Ed
+in an existing magic file would have to be changed to
+.Bd -literal -compact
+>10 string language\e impress (imPRESS data)
+.Ed
+.Pp
+In addition, in this version, if a pattern string contains a backslash,
+it must be escaped. For example,
+.Bd -literal -compact
+0 string \ebegindata Andrew Toolkit document
+.Ed
+in an existing magic file would have to be changed to
+.Bd -literal -compact
+0 string \e\ebegindata Andrew Toolkit document
+.Ed
+.Pp
+SunOS releases 3.2 and later from Sun Microsystems include a
+.Xr file 1
+command derived from the System V one, but with some extensions.
+My version differs from Sun's only in minor ways.
+It includes the extension of the `&' operator, used as,
+for example,
+.Bd -literal -compact
+>16 long&0x7fffffff >0 not stripped
+.Ed
+.Sh MAGIC DIRECTORY
+The magic file entries have been collected from various sources,
+mainly USENET, and contributed by various authors.
+.An Christos Zoulas
+(address below) will collect additional
+or corrected magic file entries.
+A consolidation of magic file entries
+will be distributed periodically.
+.Pp
+The order of entries in the magic file is significant.
+Depending on what system you are using, the order that
+they are put together may be incorrect.
+If your old
+.Nm
+command uses a magic file,
+keep the old magic file around for comparison purposes
+(rename it to
+.Pa /usr/share/misc/magic.orig Ns ).
+.Sh EXAMPLES
+.Bd -literal
+$ file file.c file /dev/hda
+file.c: C program text
+file: ELF 32-bit LSB executable, Intel 80386, version 1,
+ dynamically linked, not stripped
+/dev/hda: block special
+
+$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
+/dev/hda: x86 boot sector
+/dev/hda1: Linux/i386 ext2 filesystem
+/dev/hda2: x86 boot sector
+/dev/hda3: x86 boot sector, extended partition table
+/dev/hda4: Linux/i386 ext2 filesystem
+/dev/hda5: Linux/i386 swap file
+/dev/hda6: Linux/i386 swap file
+/dev/hda7: Linux/i386 swap file
+/dev/hda8: Linux/i386 swap file
+/dev/hda9: empty
+/dev/hda10: empty
+
+$ file -i file.c file /dev/hda
+file.c: text/x-c
+file: application/x-executable, dynamically linked (uses shared libs), not stripped
+/dev/hda: application/x-not-regular-file
+.Ed
+.Sh HISTORY
+There has been a
+.Nm
+command in every
+.Ux
+since at least Research Version 6
+(man page dated January 1975).
+The System V version introduced one significant change:
+the external list of magic number types.
+This slowed the program down slightly but made it a lot more flexible.
+.Pp
+This program, based on the System V version,
+was written by
+.An Ian Darwin Aq ian@darwinsys.com
+without looking at anybody else's source code.
+.Pp
+.An John Gilmore
+revised the code extensively, making it better than
+the first version.
+.An Geoff Collyer
+found several inadequacies
+and provided some magic file entries.
+Contributions of the
+.Sq \&&
+operator by
+.An Rob McMahon Aq cudcv@warwick.ac.uk ,
+1989.
+.Pp
+.An Guy Harris Aq guy@netapp.com
+made many changes from 1993 to the present.
+.Pp
+Primary development and maintenance from 1990 to the
+present by
+.An Christos Zoulas Aq christos@astron.com .
+.Pp
+Altered by
+.An Chris Lowth Aq chris@lowth.com ,
+2000: added the
+.Fl i
+option to output MIME type strings, using an
+alternative magic file and internal logic.
+.Pp
+Altered by
+.An Eric Fischer Aq enf@pobox.com ,
+July 2000, to identify character codes and attempt to identify
+the languages of non-ASCII files.
+.Pp
+The list of contributors to the
+.Pa Magdir
+directory (source for the
+.Pa /usr/share/misc/magic
+file) is too long to include here.
+You know who you are; thank you.
+.Sh "LEGAL NOTICE"
+Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
+Covered by the standard Berkeley Software Distribution
+copyright; see the file LEGAL.NOTICE in the source distribution.
+.Pp
+The files
+.Pa tar.h
+and
+.Pa is_tar.c
+were written by
+.An John Gilmore
+from his public-domain tar program, and are not covered by
+the above license.
+.Sh BUGS
+There must be a better way to automate the construction of
+the
+.Pa magic
+file from all the glop in
+.Pa Magdir .
+What is it?
+Better yet, the magic file should be compiled into binary
+(say,
+.Xr ndbm 3
+or, better yet, fixed-length
+.Tn ASCII
+strings for use in heterogeneous network environments) for
+faster startup.
+Then the program would run as fast as the Version 7 program
+of the same name, with the flexibility of
+the System V version.
+.Pp
+File uses several algorithms that favor speed over accuracy,
+so it can be misled about the contents of text
+files.
+.Pp
+The support for text files (primarily for programming languages)
+is simplistic, inefficient and requires recompilation to update.
+.Pp
+There should be an
+.Dq else
+clause to follow a series of continuation lines.
+.Pp
+The magic file and keywords should have regular expression
+support. Their use of ASCII TAB as a field delimiter is
+ugly and makes it hard to edit the files, but is
+entrenched.
+.Pp
+It might be advisable to allow upper-case letters in
+keywords, e.g., for
+.Xr troff 1
+commands vs. man page macros.
+Regular expression support would make this easy.
+.Pp
+The program doesn't grok FORTRAN.
+It should be able to recognize FORTRAN by seeing some keywords
+which appear indented at the start of a line.
+Regular expression support would make this easy.
+.Pp
+The list of keywords in ascmagic probably belongs in the
+.Pa magic
+file.
+This could be done by using some keyword like
+`*' for the offset value.
+.Pp
+Another optimisation would be to sort the magic file so
+that we can just run down all the tests for the first
+byte, first word, first long, etc., once we have fetched
+it. Complain about conflicts in the magic file entries.
+Make a rule that the magic entries sort based on file offset
+rather than position within the magic file?
+.Pp
+The program should provide a way to give an estimate of
+.Dq how good
+a guess is.
+We end up removing guesses (e.g.
+.Dq From
+as first 5 chars of file) because they are not
+as good as other guesses (e.g.
+.Dq Newsgroups:
+versus
+.Dq Return-Path: Ns ).
+Still, if the others don't pan out, it
+should be possible to use the first guess.
+.Pp
+This program is slower than some vendors' file commands.
+The new support for multiple character codes makes it even
+slower.
+.Pp
+This manual page, and particularly this section, is too long.
+.Sh AVAILABILITY
+You can obtain the original author's latest version by
+anonymous FTP on
+.Pa ftp.astron.com
+as
+.Pa /pub/file/file-X.YY.tar.gz .
diff --git a/usr.bin/file/magic.5 b/usr.bin/file/magic.5
new file mode 100644
index 0000000..3b88eac
--- /dev/null
+++ b/usr.bin/file/magic.5
@@ -0,0 +1,242 @@
+.\"
+.\" $FreeBSD$
+.\"
+.\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems.
+.\"
+.Dd December 08, 2000
+.Dt MAGIC 5 "Public Domain"
+.Os
+.Sh NAME
+.Nm magic
+.Nd file command's magic number file
+.Sh DESCRIPTION
+This manual page documents the format of the magic file as
+used by the
+.Xr file 1
+command, version 3.33. The
+.Xr file 1
+command identifies the type of a file using,
+among other tests,
+a test for whether the file begins with a certain
+.Em "magic number" .
+The file
+.Pa /usr/share/misc/magic
+specifies what magic numbers are to be tested for,
+what message to print if a particular magic number is found,
+and additional information to extract from the file.
+.Pp
+Each line of the file specifies a test to be performed.
+A test compares the data starting at a particular offset
+in the file with a 1-byte, 2-byte, or 4-byte numeric value or
+a string.
+If the test succeeds, a message is printed.
+The line consists of the following fields:
+.Bl -tag -width indent
+.It offset
+A number specifying the offset, in bytes, into the file of the data
+which is to be tested.
+.It type
+The type of the data to be tested.
+The possible values are:
+.Bl -tag -width indent
+.It byte
+A one-byte value.
+.It short
+A two-byte value (on most systems) in this machine's native byte order.
+.It long
+A four-byte value (on most systems) in this machine's native byte order.
+.It string
+A string of bytes.
+The string type specification can be optionally followed
+by /[Bbc]*. The
+.Dq B
+flag compacts whitespace in the target, which must contain
+at least one whitespace character.
+If the magic has "n" consecutive blanks, the target needs
+at least "n" consecutive blanks to match.
+The
+.Dq b
+flag treats every blank in the target as an optional blank.
+Finally, the
+.Dq c
+flag specifies case-insensitive matching: lowercase characters
+in the magic match both lowercase and uppercase characters in the
+target, whereas uppercase characters in the magic only match
+uppercase characters in the target.
+.It date
+A four-byte value interpreted as a unix date.
+.It beshort
+A two-byte value (on most systems) in big-endian byte order.
+.It belong
+A four-byte value (on most systems) in big-endian byte order.
+.It bedate
+A four-byte value (on most systems) in big-endian byte order,
+interpreted as a unix date.
+.It leshort
+A two-byte value (on most systems) in little-endian byte order.
+.It lelong
+A four-byte value (on most systems) in little-endian byte order.
+.It ledate
+A four-byte value (on most systems) in little-endian byte order,
+interpreted as a unix date.
+.El
+.El
+.Pp
+The numeric types may optionally be followed by
+.Em &
+and a numeric value,
+to specify that the value is to be AND'ed with the
+numeric value before any comparisons are done. Prepending a
+.Em u
+to the type indicates that ordered comparisons should be unsigned.
+.Bl -tag -width indent
+.It test
+The value to be compared with the value from the file. If the type is
+numeric, this value
+is specified in C form; if it is a string, it is specified as a C string
+with the usual escapes permitted (e.g., \en for newline).
+.It ""
+Numeric values
+may be preceded by a character indicating the operation to be performed.
+It may be
+.Em = ,
+to specify that the value from the file must equal the specified value,
+.Em < ,
+to specify that the value from the file must be less than the specified
+value,
+.Em > ,
+to specify that the value from the file must be greater than the specified
+value,
+.Em & ,
+to specify that the value from the file must have set all of the bits
+that are set in the specified value,
+.Em ^ ,
+to specify that at least one of the bits that are set in the specified
+value must be clear in the value from the file, or
+.Em x ,
+to specify that any value will match.
+If the character is omitted,
+it is assumed to be
+.Em = .
+.It ""
+Numeric values are specified in C form; e.g.
+.Em 13
+is decimal,
+.Em 013
+is octal, and
+.Em 0x13
+is hexadecimal.
+.It ""
+For string values, the byte string from the
+file must match the specified byte string.
+The operators
+.Em = ,
+.Em <
+and
+.Em >
+(but not
+.Em & )
+can be applied to strings.
+The length used for matching is that of the string argument
+in the magic file. This means that a line can match any string, and
+then presumably print that string, by using
+.Em >\e0
+(because all strings are greater than the null string).
+.It message
+The message to be printed if the comparison succeeds. If the string
+contains a
+.Xr printf 3
+format specification, the value from the file (with any specified masking
+performed) is printed using the message as the format string.
+.El
+.Pp
+Some file formats contain additional information which is to be printed
+along with the file type. A line which begins with the character
+.Em >
+indicates additional tests and messages to be printed. The number of
+.Em >
+on the line indicates the level of the test; a line with no
+.Em >
+at the beginning is considered to be at level 0.
+Each line at level
+.Em n+1
+is under the control of the line at level
+.Em n
+most closely preceding it in the magic file.
+If the test on a line at level
+.Em n
+succeeds, the tests specified in all the subsequent lines at level
+.Em n+1
+are performed, and the messages printed if the tests succeed. The next
+line at level
+.Em n
+terminates this.
+If the first character following the last
+.Em >
+is a
+.Em \&(
+then the string after the parenthesis is interpreted as an indirect offset.
+That means that the number after the parenthesis is used as an offset in
+the file.
+The value at that offset is read, and is used again as an offset
+in the file.
+Indirect offsets are of the form:
+.Em (x[.[bslBSL]][+-][y]) .
+The value of
+.Em x
+is used as an offset in the file.
+A byte, short or long is read at that offset
+depending on the
+.Em [bslBSL]
+type specifier.
+The capitalized types interpret the number as a big-endian value, whereas
+the lowercase versions interpret the number as a little-endian value.
+To that number the value of
+.Em y
+is added and the result is used as an offset in the file.
+If no type is specified,
+the default is long.
+.Pp
+Sometimes you do not know the exact offset as this depends on the length of
+preceding fields.
+You can specify an offset relative to the end of the
+last uplevel field (of course this may only be done for sublevel tests, i.e.
+tests beginning with
+.Em > Ns ).
+Such a relative offset is specified using
+.Em &
+as a prefix to the offset.
+.Sh BUGS
+The formats
+.Em long ,
+.Em belong ,
+.Em lelong ,
+.Em short ,
+.Em beshort ,
+.Em leshort ,
+.Em date ,
+.Em bedate ,
+and
+.Em ledate
+are system-dependent; perhaps they should be specified as a number
+of bytes (2B, 4B, etc.),
+since the files being recognized typically come from
+a system on which the lengths are invariant.
+.Pp
+There is (currently) no support for specified-endian data to be used in
+indirect offsets.
+.Sh SEE ALSO
+.Xr file 1
+.\"
+.\" From: guy@sun.uucp (Guy Harris)
+.\" Newsgroups: net.bugs.usg
+.\" Subject: /etc/magic's format isn't well documented
+.\" Message-ID: <2752@sun.uucp>
+.\" Date: 3 Sep 85 08:19:07 GMT
+.\" Organization: Sun Microsystems, Inc.
+.\" Lines: 136
+.\"
+.\" Here's a manual page for the format accepted by the "file" made by adding
+.\" the changes I posted to the S5R2 version.
+.\"
+.\" Modified for Ian Darwin's version of the file command.
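The magic(5) page in this commit separates the native-order `short` and `long` types from the explicitly ordered `beshort`, `belong`, `leshort` and `lelong`. A minimal sketch in C of decoding the explicit-order types from a file buffer follows. It assumes the 2- and 4-byte widths the page calls typical ("on most systems"); the function names are hypothetical, the code is not taken from the file(1) 3.33 sources, and the native-order types are left out on purpose.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not file(1)'s own code. Each returns 0 and stores
 * the decoded value in *out, or returns -1 if the field would extend past
 * the end of the buffer (so a short file simply fails the test). */

static int read_byte(const unsigned char *buf, size_t len, size_t off,
                     uint32_t *out)
{
    if (off >= len)
        return -1;
    *out = buf[off];
    return 0;
}

/* True if n bytes starting at off lie inside a buffer of len bytes. */
static int in_bounds(size_t len, size_t off, size_t n)
{
    return off <= len && len - off >= n;
}

/* "beshort": two bytes, most significant byte first. */
static int read_beshort(const unsigned char *buf, size_t len, size_t off,
                        uint32_t *out)
{
    if (!in_bounds(len, off, 2))
        return -1;
    *out = ((uint32_t)buf[off] << 8) | buf[off + 1];
    return 0;
}

/* "leshort": two bytes, least significant byte first. */
static int read_leshort(const unsigned char *buf, size_t len, size_t off,
                        uint32_t *out)
{
    if (!in_bounds(len, off, 2))
        return -1;
    *out = ((uint32_t)buf[off + 1] << 8) | buf[off];
    return 0;
}

/* "belong": four bytes, most significant byte first. */
static int read_belong(const unsigned char *buf, size_t len, size_t off,
                       uint32_t *out)
{
    if (!in_bounds(len, off, 4))
        return -1;
    *out = ((uint32_t)buf[off] << 24) | ((uint32_t)buf[off + 1] << 16) |
           ((uint32_t)buf[off + 2] << 8) | buf[off + 3];
    return 0;
}

/* "lelong": four bytes, least significant byte first. */
static int read_lelong(const unsigned char *buf, size_t len, size_t off,
                       uint32_t *out)
{
    if (!in_bounds(len, off, 4))
        return -1;
    *out = ((uint32_t)buf[off + 3] << 24) | ((uint32_t)buf[off + 2] << 16) |
           ((uint32_t)buf[off + 1] << 8) | buf[off];
    return 0;
}
```

For example, the bytes `7f 45 4c 46` at the start of an ELF file decode to 0x7f454c46 as a `belong` and to 0x464c457f as a `lelong`. Assembling the value byte by byte, rather than casting the buffer to an integer pointer, gives the same result on any host byte order and avoids unaligned reads.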
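The magic(5) page in this commit lets a numeric type carry an `&mask`, a `u` prefix for unsigned ordered comparisons, and a relation character (`=`, `<`, `>`, `&`, `^` or `x`) in front of the test value. The C sketch below evaluates one such test against a value already read from the file. `struct num_test` and `num_test_match()` are hypothetical names, not file(1)'s internals, and every value is treated as 32-bit; a signed `byte` or `short` would first need sign extension from its own width, which the sketch omits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation of one numeric magic(5) test. */
struct num_test {
    char op;          /* relation: '=', '<', '>', '&', '^' or 'x' */
    bool is_unsigned; /* type was prefixed with 'u' */
    uint32_t mask;    /* from "type&mask"; 0xffffffff when absent */
    uint32_t value;   /* the test value from the magic line */
};

/* Apply the mask, then the relation, to a raw value read from the file. */
static bool num_test_match(const struct num_test *t, uint32_t raw)
{
    uint32_t v = raw & t->mask;

    switch (t->op) {
    case 'x':   /* any value matches */
        return true;
    case '=':
        return v == t->value;
    case '&':   /* every bit set in value is also set in v */
        return (v & t->value) == t->value;
    case '^':   /* at least one bit set in value is clear in v */
        return (v & t->value) != t->value;
    case '<':   /* ordered comparisons are signed unless 'u' was given */
        return t->is_unsigned ? v < t->value
                              : (int32_t)v < (int32_t)t->value;
    case '>':
        return t->is_unsigned ? v > t->value
                              : (int32_t)v > (int32_t)t->value;
    default:
        return false;
    }
}
```

Using the page's own example, `>16 long&0x7fffffff >0 not stripped` corresponds to `{'>', false, 0x7fffffff, 0}`: a raw value of 0x80000001 masks to 1 and matches, while 0x80000000 masks to 0 and does not.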
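The indirect-offset syntax `(x[.[bslBSL]][+-][y])` in the magic(5) page reads a byte, short or long at offset x and adds y to obtain the offset that is actually tested. The C sketch below resolves an already-parsed indirect offset; `struct indirect` and `resolve_indirect()` are hypothetical names. It follows the DESCRIPTION's wording that capital letters mean big-endian and lowercase letters little-endian. Because the BUGS section of the same page says specified-endian indirect data is unsupported, treat that endian handling, and the requirement that the type letter be given explicitly rather than defaulting to long, as assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed form of "(x.T+y)" or "(x.T-y)". */
struct indirect {
    uint32_t x;  /* offset of the pointer field in the file */
    char type;   /* 'b','s','l' little-endian; 'B','S','L' big-endian */
    int32_t y;   /* addend; negative for the "-" form */
};

/* Returns 0 and stores the final offset in *result, or -1 if the type
 * letter is unknown or the pointer field lies outside the buffer.
 * The addition wraps modulo 2^32; a wrapped offset will simply fail
 * the bounds check of whatever test is applied at it next. */
static int resolve_indirect(const struct indirect *in,
                            const unsigned char *buf, size_t len,
                            uint32_t *result)
{
    size_t n;
    int big;
    uint32_t v = 0;

    switch (in->type) {
    case 'b': case 'B': n = 1; break;
    case 's': case 'S': n = 2; break;
    case 'l': case 'L': n = 4; break;
    default: return -1;
    }
    big = (in->type >= 'A' && in->type <= 'Z');

    if (in->x > len || len - in->x < n)
        return -1;

    /* Accumulate from the most significant byte down. */
    for (size_t i = 0; i < n; i++) {
        size_t idx = big ? in->x + i : in->x + (n - 1 - i);
        v = (v << 8) | buf[idx];
    }
    *result = v + (uint32_t)in->y;
    return 0;
}
```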
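For the `/c` string flag, the magic(5) page says a lowercase letter in the magic matches either case in the file, an uppercase letter matches only uppercase, and the comparison length is that of the magic string. A C sketch of just that rule follows; the `/B` and `/b` whitespace flags are not modelled, `magic_string_match()` is a hypothetical name, and because it uses `strlen()` it cannot handle magic strings with embedded NUL bytes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compare the first strlen(magic) bytes of target with magic.
 * With caseless set, a lowercase magic letter matches either case in the
 * target; every other magic byte (including uppercase) must match exactly. */
static bool magic_string_match(const char *magic,
                               const unsigned char *target, size_t target_len,
                               bool caseless)
{
    size_t n = strlen(magic);

    if (target_len < n)
        return false;
    for (size_t i = 0; i < n; i++) {
        unsigned char m = (unsigned char)magic[i];
        unsigned char t = target[i];

        if (caseless && islower(m)) {
            if (tolower(t) != m)
                return false;
        } else if (t != m) {
            return false;
        }
    }
    return true;
}
```

The asymmetry is what makes `/c` useful: a magic string written in lowercase, such as `<html>`, matches `<HTML>` and `<Html>` alike, while a deliberately uppercase magic string stays strict.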
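The continuation rule in the magic(5) page, where a line at level n+1 is tried only while the nearest preceding level-n line has matched, can be expressed as a single pass that tracks the deepest level currently enabled. The C sketch below assumes each line's own test has already been evaluated; `struct magic_line`, `run_entry()` and `append()` are hypothetical, and joining messages with single spaces is a simplification of how file(1) actually formats its output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pre-evaluated magic line. */
struct magic_line {
    unsigned level;      /* number of leading '>' characters */
    bool matched;        /* result of this line's own test */
    const char *message; /* text printed when the line applies */
};

/* Append msg to out, separated from earlier text by one space. */
static void append(char *out, size_t outsz, const char *msg)
{
    size_t used = strlen(out);

    if (used > 0 && used + 1 < outsz) {
        out[used++] = ' ';
        out[used] = '\0';
    }
    strncat(out, msg, outsz - used - 1);
}

/* Evaluate one entry: lines[0] at level 0 plus its continuation lines.
 * Stops at the next level-0 line. Returns true if lines[0] matched. */
static bool run_entry(const struct magic_line *lines, size_t n,
                      char *out, size_t outsz)
{
    unsigned cont_level;

    out[0] = '\0';
    if (n == 0 || lines[0].level != 0 || !lines[0].matched)
        return false;
    append(out, outsz, lines[0].message);

    cont_level = 1;   /* level-1 lines are now enabled */
    for (size_t i = 1; i < n && lines[i].level > 0; i++) {
        unsigned lvl = lines[i].level;

        if (lvl > cont_level)
            continue;         /* its level-(lvl-1) parent did not match */
        cont_level = lvl;     /* a shallower line closes deeper runs */
        if (lines[i].matched) {
            append(out, outsz, lines[i].message);
            cont_level = lvl + 1;   /* enable this line's children */
        }
    }
    return true;
}
```

In this scheme a failed level-1 line leaves `cont_level` at 1, so its level-2 children are skipped, while the next level-1 sibling is still tried, matching the page's statement that the next line at level n terminates the previous line's block.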
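The file(1) page in this commit says UTF-8-encoded text is distinguished by the byte sequences it permits. A minimal C sketch of the structural part of that check follows: every lead byte must be followed by the right number of `10xxxxxx` continuation bytes. `looks_utf8()` is a hypothetical name; the sketch does not reject overlong encodings or surrogates, and it omits the separate printable-character test, so plain ASCII input passes too.

```c
#include <stdbool.h>
#include <stddef.h>

/* Structural UTF-8 check only (simplified; see the note above). */
static bool looks_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t follow;

        if (c < 0x80)
            follow = 0;             /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            follow = 1;             /* 110xxxxx: 2-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            follow = 2;             /* 1110xxxx: 3-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            follow = 3;             /* 11110xxx: 4-byte sequence */
        else
            return false;           /* stray continuation or invalid lead */

        if (len - i - 1 < follow)
            return false;           /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= follow; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;       /* expected a 10xxxxxx byte */
        i += follow + 1;
    }
    return true;
}
```

For example, `caf` followed by `C3 A9` (é in UTF-8) passes, while the same word ending in the single ISO-8859-1 byte `E9` fails, because `E9` announces a three-byte sequence that never arrives.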