From b9953222d0e325c0e436388067fa77720bc3d686 Mon Sep 17 00:00:00 2001
From: ru
Date: Thu, 9 Aug 2001 14:38:10 +0000
Subject: Update for 3.36. Reduce diffs to distributed, man(7) format, version. Markup nits.

---
 usr.bin/file/file.1 | 416 ++++++++++++++++++++++++++++------------------------
 1 file changed, 223 insertions(+), 193 deletions(-)

diff --git a/usr.bin/file/file.1 b/usr.bin/file/file.1
index 8f9451f..263a8a2 100644
--- a/usr.bin/file/file.1
+++ b/usr.bin/file/file.1
@@ -1,5 +1,6 @@
 .\" $FreeBSD$
-.Dd December 8, 2000
+.\" $Id: file.man,v 1.39 2001/04/27 22:48:33 christos Exp $
+.Dd April 4, 2001
 .Dt FILE 1 "Copyright but distributable"
 .Os
 .Sh NAME
@@ -11,10 +12,14 @@
 .Op Fl f Ar namefile
 .Op Fl m Ar magicfiles
 .Ar
+.Nm
+.Fl C
+.Op Fl m Ar magicfile
 .Sh DESCRIPTION
-This manual page documents version 3.33 of the
+This manual page documents version 3.36 of the
 .Nm
 command.
+.Pp
 .Nm File
 tests each argument in an attempt to classify it.
 There are three sets of tests, performed in this order:
@@ -24,131 +29,136 @@ The
 test that succeeds causes the file type to be printed.
 .Pp
 The type printed will usually contain one of the words
-.Em text
-(the file contains only printing characters and a few
-common control characters and is probably safe to read on
-an
+.Dq Li text
+(the file contains only
+printing characters and a few common control
+characters and is probably safe to read on an
 .Tn ASCII
 terminal),
-.Em executable
+.Dq Li executable
 (the file contains the result of compiling a program
 in a form understandable to some
 .Ux
-kernel or another), or
-.Em data
+kernel or another),
+or
+.Dq Li data
 meaning anything else (data is usually
 .Sq binary
 or non-printable).
-Exceptions are well-known file formats (core files, tar
-archives) that are known to contain binary data.
+Exceptions are well-known file formats (core files, tar archives)
+that are known to contain binary data.
 When modifying the file
 .Pa /usr/share/misc/magic
 or the program itself,
 .Em "preserve these keywords" .
-People depend on knowing that all the readable files in a
-directory have the word
-.Dq text
+People depend on knowing that all the readable files in a directory
+have the word
+.Dq Li text
 printed.
 Don't do as Berkeley did and change
-.Dq shell commands text
+.Dq Li "shell commands text"
 to
-.Dq shell script .
+.Dq Li "shell script" .
 Note that the file
 .Pa /usr/share/misc/magic
-is built mechanically from a large number of
-small files in the subdirectory
+is built mechanically from a large number of small files in
+the subdirectory
 .Pa Magdir
 in the source distribution of this program.
 .Pp
-The filesystem tests are based on examining the return
-from a
+The filesystem tests are based on examining the return from a
 .Xr stat 2
 system call.
-The program checks to see if the file is empty, or if it's
-some sort of special file.
-Any known file types appropriate to the system you are
-running on (sockets, symbolic links, or named pipes
-(FIFOs) on those systems that implement them) are intuited
-if they are defined in the system header file
+The program checks to see if the file is empty,
+or if it's some sort of special file.
+Any known file types appropriate to the system you are running on
+(sockets, symbolic links, or named pipes (FIFOs) on those systems that
+implement them)
+are intuited if they are defined in
+the system header file
 .Aq Pa sys/stat.h .
 .Pp
-The magic number tests are used to check for files with
-data in particular fixed formats.
+The magic number tests are used to check for files with data in
+particular fixed formats.
 The canonical example of this is a binary executable (compiled program)
 .Pa a.out
 file, whose format is defined in
-.Pa a.out.h
+.Aq Pa a.out.h
 and possibly
-.Pa exec.h
+.Aq Pa exec.h
 in the standard include directory.
 These files have a
-.Sq magic number
-stored in a particular place near the beginning of the file
-that tells the
+.Sq "magic number"
+stored in a particular place
+near the beginning of the file that tells the
 .Ux
-operating system that the file is a binary executable,
-and which of several types thereof.
+operating system
+that the file is a binary executable, and which of several types thereof.
 The concept of
-.Sq magic number
+.Sq "magic number"
 has been applied by extension to data files.
-Any file with some invariant identifier at a small fixed offset
-into the file can usually be described in this way.
-The information identifying these files is read from the magic file
-.Pa /usr/share/misc/magic .
-.Pp
-If a file does not match any of the entries in the magic
-file, it is examined to see if it seems to be a text file.
-.Tn ASCII ,
-.Tn ISO-8859-x ,
-non-ISO 8-bit extended-ASCII character
-sets (such as those used on Macintosh and IBM PC systems),
-.Tn UTF-8-encoded Unicode ,
-.Tn UTF-16-encoded Unicode ,
-and
-.Tn EBCDIC
-character sets can be distinguished by the different ranges
-and sequences of bytes that constitute printable text in each set.
+Any file with some invariant identifier at a small fixed
+offset into the file can usually be described in this way.
+The information identifying these files is read from the compiled
+magic file
+.Pa /usr/share/misc/magic.mgc ,
+or
+.Pa /usr/share/misc/magic
+if the compile file does not exist.
+.Pp
+If a file does not match any of the entries in the magic file,
+it is examined to see if it seems to be a text file.
+ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
+(such as those used on Macintosh and IBM PC systems),
+UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
+character sets can be distinguished by the different
+ranges and sequences of bytes that constitute printable text
+in each set.
 If a file passes any of these tests, its character set is reported.
-.Tn ASCII ,
-.Tn ISO-8859-x ,
-.Tn UTF-8 ,
-and extended-ASCII files are identified as
-.Dq text
+ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
+as
+.Dq Li text
 because they will be mostly readable on nearly any terminal;
-.Tn UTF-16
-and
-.Tn EBCDIC
-are only
-.Dq character data
-because, while they contain text, it is text that will
-require translation before it can be read.
-In addition, file will attempt to determine other characteristics of
-text-type files.
-If the lines of a file are terminated by CR, CRLF, or NEL,
-instead of the Unix-standard LF, this will be reported.
-Files that contain embedded escape sequences or overstriking will
-also be identified.
-.Pp
-Once file has determined the character set used in a text-type file,
-it will attempt to determine in what language the file is written.
+UTF-16 and EBCDIC are only
+.Dq Li "character data"
+because, while
+they contain text, it is text that will require translation
+before it can be read.
+In addition,
+.Nm
+will attempt to determine other characteristics of text-type files.
+If the lines of a file are terminated by CR, CRLF, or NEL, instead
+of the
+.Ux Ns -standard
+LF, this will be reported.
+Files that contain embedded escape sequences or overstriking
+will also be identified.
+.Pp
+Once
+.Nm
+has determined the character set used in a text-type file,
+it will
+attempt to determine in what language the file is written.
 The language tests look for particular strings (cf
 .Pa names.h )
 that can appear anywhere in the first few blocks of a file.
 For example, the keyword
-.Em \&.br
+.Ic .br
 indicates that the file is most likely a
 .Xr troff 1
-input file, just as the keyword struct indicates a C program.
-These tests are less reliable than the previous two
-groups, so they are performed last.
-The language test routines also test for some miscellany (such as
+input file, just as the keyword
+.Ic struct
+indicates a C program.
+These tests are less reliable than the previous
+two groups, so they are performed last.
+The language test routines also test for some miscellany
+(such as
 .Xr tar 1
 archives).
 .Pp
 Any file that cannot be identified as having been written
-in any of the character sets listed above is simply said
-to be
-.Dq data .
+in any of the character sets listed above is simply said to be
+.Dq Li data .
 .Sh OPTIONS
 .Bl -tag -width indent
 .It Fl b
@@ -158,6 +168,11 @@ Cause a checking printout of the parsed form of the magic file.
 This is usually used in conjunction with
 .Fl m
 to debug a new magic file before installing it.
+.It Fl C
+Write a
+.Pa magic.mgc
+output file that contains a pre-parsed version of
+file.
 .It Fl f Ar namefile
 Read the names of the files to be examined from
 .Ar namefile
@@ -167,19 +182,19 @@ Either
 .Ar namefile
 or at least one filename argument must be present;
 to test the standard input, use
-.Dq Ar -
+.Dq Fl
 as a filename argument.
 .It Fl i
-Causes the file command to output mime type strings rather than the
-more traditional human readable ones.
+Causes the file command to output mime type strings rather than the more
+traditional human readable ones.
 Thus it may say
-.Dq text/plain; charset=us-ascii
+.Dq Li "text/plain; charset=us-ascii"
 rather than
-.Dq ASCII text .
+.Dq Li "ASCII text" .
-In order for this option to work, file changes the way it handles
-files recognised by the command itself (such as many of the text
-file types, directories etc), and makes use of an alternative
-.Dq Pa magic
+In order for this option to work, file changes the way
+it handles files recognised by the command itself (such as many of the
+text file types, directories etc), and makes use of an alternative
+.Pa magic
 file.
 (See
 .Sx FILES
@@ -187,44 +202,46 @@ section, below).
 .It Fl k
 Don't stop at the first match, keep going.
 .It Fl m Ar list
-Specify an alternate
-.Ar list
-of files containing magic numbers.
+Specify an alternate list of files containing magic numbers.
 This can be a single file, or a colon-separated list of files.
 .It Fl n
 Force stdout to be flushed after checking each file.
 This is only useful if checking a list of files.
-It is intended to be used by programs that
-want filetype output from a pipe.
-.It Fl s
-Normally, file only attempts to read and determine
-the type of argument files which
-.Xr stat 2
-reports are ordinary files.
-This prevents problems, because reading special files
-may have peculiar consequences.
-Specifying the
-.Fl s
-option causes file to also read argument files which
-are block or character special files.
-This is useful for determining the filesystem types of
-the data in raw disk partitions, which are block special files.
-This option also causes file to disregard the file size as
-reported by
-.Xr stat 2
-since on some systems it reports a zero size for raw
-disk partitions.
+It is intended to be used by programs that want
+filetype output from a pipe.
 .It Fl v
 Print the version of the program and exit.
 .It Fl z
 Try to look inside compressed files.
 .It Fl L
-Cause symlinks to be followed, as the like-named option in
+option causes symlinks to be followed, as the like-named option in
 .Xr ls 1 .
 (on systems that support symbolic links).
+.It Fl s
+Normally,
+.Nm
+only attempts to read and determine the type of argument files which
+.Xr stat 2
+reports are ordinary files.
+This prevents problems, because reading special files may have peculiar
+consequences.
+Specifying the
+.Fl s
+option causes
+.Nm
+to also read argument files which are block or character special files.
+This is useful for determining the filesystem types of the data in raw
+disk partitions, which are block special files.
+This option also causes
+.Nm
+to disregard the file size as reported by
+.Xr stat 2
+since on some systems it reports a zero size for raw disk partitions.
 .El
 .Sh FILES
-.Bl -tag -width /usr/share/misc/magic.mime -compact
+.Bl -tag -width ".Pa /usr/share/misc/magic.mime" -compact
+.It Pa /usr/share/misc/magic.mgc
+default compiled list of magic numbers
 .It Pa /usr/share/misc/magic
 default list of magic numbers
 .It Pa /usr/share/misc/magic.mime
@@ -237,11 +254,13 @@ The environment variable
 .Ev MAGIC
 can be used to set the default magic number files.
 .Sh SEE ALSO
+.Xr hexdump 1 ,
 .Xr od 1 ,
 .Xr strings 1 ,
 .Xr magic 5
 .Sh STANDARDS CONFORMANCE
-This program is believed to exceed the System V Interface Definition
+This program is believed to exceed the
+.St -svid4
 of FILE(CMD), as near as one can determine from the vague language
 contained therein.
 Its behaviour is mostly compatible with the System V program of the same name.
@@ -253,33 +272,33 @@ between this version and System V
 is that this version treats any white space
 as a delimiter, so that spaces in pattern strings must be escaped.
 For example,
-.Bd -literal -compact
->10 string language impress (imPRESS data)
-.Ed
+.Pp
+.Dl ">10 string language impress\ (imPRESS data)"
+.Pp
 in an existing magic file would have to be changed to
-.Bd -literal -compact
->10 string language\e impress (imPRESS data)
-.Ed
+.Pp
+.Dl ">10 string language\e impress (imPRESS data)"
 .Pp
 In addition, in this version, if a pattern string contains a backslash,
-it must be escaped. For example
-.Bd -literal -compact
-0 string \ebegindata Andrew Toolkit document
-.Ed
+it must be escaped.
+For example
+.Pp
+.Dl "0 string \ebegindata Andrew Toolkit document"
+.Pp
 in an existing magic file would have to be changed to
-.Bd -literal -compact
-0 string \e\ebegindata Andrew Toolkit document
-.Ed
+.Pp
+.Dl "0 string \e\ebegindata Andrew Toolkit document"
 .Pp
 SunOS releases 3.2 and later from Sun Microsystems include a
 .Xr file 1
 command derived from the System V one, but with some extensions.
 My version differs from Sun's only in minor ways.
-It includes the extension of the `&' operator, used as,
+It includes the extension of the
+.Sq Ic &
+operator, used as,
 for example,
-.Bd -literal -compact
->16 long&0x7fffffff >0 not stripped
-.Ed
+.Pp
+.Dl ">16 long&0x7fffffff >0 not stripped"
 .Sh MAGIC DIRECTORY
 The magic file entries have been collected from various sources,
 mainly USENET, and contributed by various authors.
@@ -330,7 +349,7 @@ There has been a
 command in every
 .Ux
 since at least Research Version 6
-(man page dated January, 1975).
+(man page dated January 16, 1975).
 The System V version introduced one significant major change:
 the external list of magic number types.
 This slowed the program down slightly but made it a lot more flexible.
@@ -347,7 +366,7 @@ the first version.
 found several inadequacies
 and provided some magic file entries.
 Contributions by the
-.Sq \&&
+.Sq Ic &
 operator by
 .An Rob McMahon Aq cudcv@warwick.ac.uk ,
 1989.
@@ -355,21 +374,24 @@ operator by
 .An Guy Harris Aq guy@netapp.com ,
 made many changes from 1993 to the present.
 .Pp
-Primary development and maintenance from 1990 to the
-present by
+Primary development and maintenance from 1990 to the present by
 .An Christos Zoulas Aq christos@astron.com .
 .Pp
 Altered by
 .An Chris Lowth Aq chris@lowth.com ,
-2000: Handle the
+2000:
+Handle the
 .Fl i
-option to output mime type strings and using an
-alternative magic file and internal logic.
+option to output mime type strings and using an alternative
+magic file and internal logic.
 .Pp
 Altered by
 .An Eric Fischer Aq enf@pobox.com ,
-July, 2000, to identify character codes and attempt to identify
-the languages of non-ASCII files.
+July, 2000,
+to identify character codes and attempt to identify the languages
+of
+.No non- Ns Tn ASCII
+files.
 .Pp
 The list of contributors to the
 .Pa Magdir
@@ -377,10 +399,11 @@ directory (source for the
 .Pa /usr/share/misc/magic
 file) is too long to include here.
 You know who you are; thank you.
-.Sh "LEGAL NOTICE"
+.Sh LEGAL NOTICE
 Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
-Covered by the standard Berkeley Software Distribution
-copyright; see the file LEGAL.NOTICE in the source distribution.
+Covered by the standard Berkeley Software Distribution copyright; see the file
+.Pa LEGAL.NOTICE
+in the source distribution.
 .Pp
 The files
 .Pa tar.h
@@ -388,89 +411,96 @@ and
 .Pa is_tar.c
 were written by
 .An John Gilmore
-from his public-domain tar program, and are not covered by
-the above license.
+from his public-domain
+.Nm tar
+program, and are not covered by the above license.
 .Sh BUGS
-There must be a better way to automate the construction of
-the
+There must be a better way to automate the construction of the
 .Pa Magic
 file from all the glop in
 .Pa Magdir .
 What is it?
-Better yet, the magic file should be compiled into binary
-(say,
+Better yet, the magic file should be compiled into binary (say,
 .Xr ndbm 3
 or, better yet, fixed-length
 .Tn ASCII
-strings for use in heterogenous network environments) for
-faster startup.
-Then the program would run as fast as the Version 7 program
-of the same name, with the flexibility of
-the System V version.
-.Pp
-File uses several algorithms that favor speed over accuracy,
-thus it can be misled about the contents of text
+strings for use in heterogenous network environments) for faster startup.
+Then the program would run as fast as the Version 7 program of the same name,
+with the flexibility of the System V version.
+.Pp
+.Nm File
+uses several algorithms that favor speed over accuracy,
+thus it can be misled about the contents of
+text
 files.
 .Pp
-The support for text files (primarily for programming languages)
+The support for
+text
+files (primarily for programming languages)
 is simplistic, inefficient and requires recompilation to update.
 .Pp
 There should be an
-.Dq else
+.Ic else
 clause to follow a series of continuation lines.
 .Pp
-The magic file and keywords should have regular expression
-support.
-Their use of ASCII TAB as a field delimiter is
-ugly and makes it hard to edit the files, but is
-entrenched.
+The magic file and keywords should have regular expression support.
+Their use of
+.Tn "ASCII TAB"
+as a field delimiter is ugly and makes
+it hard to edit the files, but is entrenched.
 .Pp
-It might be advisable to allow upper-case letters in
-keywords for e.g.,
+It might be advisable to allow upper-case letters in keywords
+for e.g.,
 .Xr troff 1
 commands vs man page macros.
 Regular expression support would make this easy.
 .Pp
-The program doesn't grok FORTRAN.
-It should be able to figure FORTRAN by seeing some keywords
-which appear indented at the start of line.
+The program doesn't grok
+.Tn FORTRAN .
+It should be able to figure
+.Tn FORTRAN
+by seeing some keywords which
+appear indented at the start of line.
 Regular expression support would make this easy.
 .Pp
-The list of keywords in ascmagic probably belongs in the
+The list of keywords in
+.Pa ascmagic
+probably belongs in the
 .Pa Magic
 file.
 This could be done by using some keyword like
-`*' for the offset value.
+.Sq Ic *
+for the offset value.
 .Pp
-Another optimisation would be to sort the magic file so
-that we can just run down all the tests for the first
-byte, first word, first long, etc, once we have fetched
-it.
+Another optimisation would be to sort
+the magic file so that we can just run down all the
+tests for the first byte, first word, first long, etc, once we
+have fetched it.
 Complain about conflicts in the magic file entries.
-Make a rule that the magic entries sort based on file offset
-rather than position within the magic file?
+Make a rule that the magic entries sort based on file offset rather
+than position within the magic file?
 .Pp
-The program should provide a way to give an estimate of
+The program should provide a way to give an estimate
+of
 .Dq how good a guess
 is.
-We end up removing guesses (e.g.
-.Dq From
-as first 5 chars of file) because they are not
-as good as other guesses (e.g.
-.Dq Newsgroups:
+We end up removing guesses (e.g.\&
+.Dq Li "From "
+as first 5 chars of file) because
+they are not as good as other guesses (e.g.\&
+.Dq Li "Newsgroups:"
 versus
-.Dq Return-Path: ) .
-Still, if the others don't pan out, it
-should be possible to use the first guess.
+.Dq Li "Return-Path:" ) .
+Still, if the others don't pan out, it should be
+possible to use the first guess.
 .Pp
 This program is slower than some vendors' file commands.
-The new support for multiple character codes makes it even
-slower.
+The new support for multiple character codes makes it even slower.
 .Pp
 This manual page, and particularly this section, is too long.
 .Sh AVAILABILITY
-You can obtain the original author's latest version by
-anonymous FTP on
+You can obtain the original author's latest version by anonymous FTP
+on
 .Pa ftp.astron.com
 in the directory
 .Pa /pub/file/file-X.YY.tar.gz
-- 
cgit v1.1