summaryrefslogtreecommitdiffstats
path: root/lib/libarchive/tar.5
diff options
context:
space:
mode:
authorkientzle <kientzle@FreeBSD.org>2004-02-09 23:22:54 +0000
committerkientzle <kientzle@FreeBSD.org>2004-02-09 23:22:54 +0000
commitaf9413b539628b0758b4af66ffbb879fba49cc6d (patch)
tree20e1d80fd0a1d288a08af1696ee258bcd08f41d0 /lib/libarchive/tar.5
parent6b6e533f7e91602d4ccb3353351eb9f816c2daf2 (diff)
downloadFreeBSD-src-af9413b539628b0758b4af66ffbb879fba49cc6d.zip
FreeBSD-src-af9413b539628b0758b4af66ffbb879fba49cc6d.tar.gz
Initial import of libarchive.
What it is: A library for reading and writing various streaming archive formats, especially tar and cpio. Being a library, it should be easy to incorporate into pkg_* tools, sysinstall, and any other place that needs to read or write such archives. Features: * Full automatic detection of both compression and archive format. * Extensible internal architecture to make it easy to add new formats. * Support for "pax interchange format," a new POSIX-standard tar format that eliminates essentially all of the restrictions of historic formats. * BSD license Thanks to: jkh for pushing me to start this work, gordon for encouraging me to commit it, bde for answering endless style questions, and many others for feedback and encouragement. Status: Pretty good overall, though there are still a few rough edges and the library could always use more testing. Feedback eagerly solicited.
Diffstat (limited to 'lib/libarchive/tar.5')
-rw-r--r--lib/libarchive/tar.5626
1 files changed, 626 insertions, 0 deletions
diff --git a/lib/libarchive/tar.5 b/lib/libarchive/tar.5
new file mode 100644
index 0000000..5149454
--- /dev/null
+++ b/lib/libarchive/tar.5
@@ -0,0 +1,626 @@
+.\" Copyright (c) 2003-2004 Tim Kientzle
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd October 1, 2003
+.Dt TAR 5
+.Os
+.Sh NAME
+.Nm tar
+.Nd format of tape archive files
+.Sh DESCRIPTION
+The
+.Nm
+archive format collects any number of files, directories, and other
+filesystem objects (symbolic links, device nodes, etc.) into a single
+stream of bytes.
+The format was originally designed to be used with
+tape drives that operate with fixed-size blocks, but is widely used as
+a general packaging mechanism.
+.Ss General Format
+A
+.Nm
+archive consists of a series of 512-byte records.
+Each filesystem object requires a header record which stores basic metadata
+(pathname, owner, permissions, etc.) and zero or more records containing any
+file data.
+The end of the archive is indicated by two records consisting
+entirely of zero bytes.
+.Pp
+For compatibility with tape drives that use fixed block sizes,
+programs that read or write tar files always read or write a fixed
+number of records with each I/O operation.
+These
+.Dq blocks
+are always a multiple of the record size.
+The most common block size---and the maximum supported by historical
+implementations---is 10240 bytes or 20 records.
+(Note: the terms
+.Dq block
+and
+.Dq record
+here are not entirely standard; this document follows the
+convention established by John Gilmore in documenting
+.Nm pdtar . )
+.Ss Old-Style Archive Format
+The original tar archive format has been extended many times to
+include additional information that various implementors found
+necessary.
+This section describes a variant that is compatible with
+most historic
+.Nm
+implementations.
+.Pp
+The header record for an old-style
+.Nm
+archive consists of the following:
+.Bd -literal -offset indent
+struct tarfile_header_old {
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+};
+.Ed
+The remaining bytes in the header record are filled with nulls.
+.Bl -tag -width indent
+.It Va name
+Pathname, stored as a null-terminated string.
+Some very early implementations only supported regular files.
+However, a common early convention added
+a trailing "/" character to indicate a directory name, allowing
+directory permissions and owner information to be archived and restored.
+.It Va mode
+File mode, stored as an octal number in ASCII.
+.It Va uid , Va gid
+User id and group id of owner, as octal number in ASCII.
+.It Va size
+Size of file, as octal number in ASCII.
+.It Va mtime
+Modification time of file, as an octal number in ASCII.
+This indicates the number of seconds since the start of the epoch,
+00:00:00 UTC January 1, 1970.
+Note that negative values should be avoided
+here, as they are handled inconsistently.
+.It Va checksum
+Header checksum, stored as an octal number in ASCII.
+To compute the checksum, set the checksum field to all spaces,
+then sum all bytes in the header using unsigned arithmetic.
+This field should be stored as six octal digits followed by a null and a space
+character.
+Note that for many years, Sun tar used signed arithmetic
+for the checksum field, which can cause interoperability problems
+when transferring archives between systems.
+This error was propagated to other implementations that used Sun
+tar as a reference.
+Modern robust readers compute the checksum both ways and accept the
+header if either computation matches.
+.El
+.Pp
+Early implementations of
+.Nm
+varied in how they terminated these fields.
+Early BSD documentation specified the following: the pathname must
+be null-terminated; the mode, uid, and gid fields must end in a space and a
+null byte; the size and mtime fields must end in a space; the checksum is
+terminated by a null and a space.
+For best portability, writers of
+.Nm
+archives should fill the numeric fields with leading zeros.
+.Ss Early Extensions
+Very early
+.Nm
+implementations only supported regular files.
+Two early extensions added support for directories, hard links, and
+symbolic links.
+.Pp
+Early
+.Nm
+archives indicated directories by adding a trailing
+.Pa /
+to the name.
+The size field was often used to indicate the total size of all files
+in the directory.
+This was intended to facilitate extraction on systems that pre-allocated
+directory storage; most modern readers should simply ignore the
+size field for directories.
+.Pp
+To support hard links and symbolic links,
+.Va linktype
+and
+.Va linkname
+fields were added:
+.Bd -literal -offset indent
+struct tarfile_entry_common {
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char linktype[1];
+ char linkname[100];
+};
+.Ed
+.Pp
+The
+.Va linktype
+field indicates the type of entry.
+For backwards compatibility, a NULL
+character here indicates a regular file or directory.
+An ASCII "1" here indicates a hard link entry, ASCII "2" indicates
+a symbolic link.
+The
+.Va linkname
+field holds the name of the file linked to.
+.Ss POSIX Standard Archives
+POSIX 1003.1 defines a standard
+.Nm
+file format that is read and written
+by POSIX-compliant implementations
+of
+.Xr pax 1 .
+This format is often called the
+.Dq ustar
+format, after the magic value used
+in the header.
+(The name is an acronym for
+.Dq Unix Standard TAR . )
+It extends the format above
+with new fields:
+.Bd -literal -offset indent
+struct tarfile_entry_posix {
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char typeflag[1];
+ char linkname[100];
+ char magic[6];
+ char version[2];
+ char uname[32];
+ char gname[32];
+ char devmajor[8];
+ char devminor[8];
+ char prefix[155];
+};
+.Ed
+.Bl -tag -width indent
+.It Va typeflag
+Type of entry. POSIX adopted the BSD
+.Va linktype
+field and extended it with several new type values:
+.Bl -tag -width indent -compact
+.It Dq 0
+Regular file. NULL should be treated as a synonym, for compatibility purposes.
+.It Dq 1
+Hard link.
+.It Dq 2
+Symbolic link.
+.It Dq 3
+Character device node.
+.It Dq 4
+Block device node.
+.It Dq 5
+Directory.
+.It Dq 6
+FIFO node.
+.It Dq 7
+Reserved.
+.It Other
+A POSIX-compliant implementation must treat any unrecognized typeflag value
+as a regular file.
+In particular, writers should ensure that all entries
+have a valid filename so that they can be restored by readers that do not
+support the corresponding extension.
+Uppercase letters "A" through "Z" are reserved for custom extensions.
+Note that sockets and whiteout entries are not archivable.
+.El
+.It Va magic
+Contains the magic value
+.Dq ustar
+followed by a NULL byte to indicate that this is a POSIX standard archive.
+Full compliance requires the uname and gname fields be properly set.
+(Note that GNU tar archives uses a trailing space rather than a trailing
+NULL here and are therefore not POSIX standard archives.)
+.It Va version
+Version. This should be
+.Dq 00
+(two copies of the ASCII digit zero) for POSIX standard archives.
+(Note that GNU tar archives fill this with a space and a null.)
+.It Va uname , Va gname
+User and group names, as null-terminated ASCII strings.
+These should be used in preference to the uid/gid values
+when they are set and the corresponding names exist on
+the system.
+.It Va devmajor , Va devminor
+Major and minor numbers for character device or block device entry.
+.It Va prefix
+First part of pathname.
+If the pathname is too long to fit in the 100 bytes provided by the standard
+format, it can be split at any
+.Pa /
+character with the first portion going here.
+If the prefix field is not empty, the reader will prepend
+the prefix value and a
+.Pa /
+character to the regular name field to obtain the full pathname.
+.El
+.Pp
+Note that all unused bytes must be set to
+.Dv NULL .
+.Pp
+Field termination is specified slightly differently by POSIX
+than by previous implementations.
+The
+.Va magic ,
+.Va uname ,
+and
+.Va gname
+fields must have a trailing
+.Dv NULL .
+The
+.Va pathname ,
+.Va linkname ,
+and
+.Va prefix
+fields must have a trailing
+.Dv NULL
+unless they fill the entire field.
+(In particular, it is possible to store a 256-character pathname if it
+happens to have a
+.Pa /
+as the 156th character.)
+POSIX requires numeric fields to be zero-padded in the front, and allows
+them to be terminated with either space or
+.Dv NULL
+characters.
+.Ss Pax Interchange Format
+There are many attributes that cannot be portably stored in a
+POSIX ustar archive.
+POSIX defined a
+.Dq pax interchange format
+that uses two new types of entries to hold text-formatted
+metadata that applies to following entries.
+Note that a pax interchange format archive is a ustar archive in every
+respect.
+The new data is stored in ustar-compatible archive entries that use the
+.Dq x
+or
+.Dq g
+typeflag.
+In particular, older implementations that do not fully support these
+extensions will extract the metadata into regular files, where the
+metadata can be examined as necessary.
+.Pp
+An entry in a pax interchange format archive consists of one or
+two standard entries, each with its own header and data.
+The first optional entry stores the extended attributes
+for the second entry.
+This optional first entry has an "x" typeflag and a size field that
+indicates the total size of the extended attributes.
+The extended attributes themselves are stored as a series of text-format
+lines encoded in the portable UTF-8 encoding.
+Each line consists of a decimal number, a space, a key string, an equals
+sign, a value string, and a new line.
+The decimal number indicates the length of the entire line, including the
+initial length field and the trailing newline.
+Keys are always encoded in portable 7-bit ASCII.
+Keys in all lowercase are reserved for future standardization.
+Vendors can add their own keys by prefixing them with an all uppercase
+vendor name and a period.
+Note that, unlike the historic header, numeric values are stored using
+decimal, not octal.
+.Bl -tag -width indent
+.It Cm atime , Cm ctime , Cm mtime
+File access, inode change, and modification times.
+These fields can be negative or include a decimal point and a fractional value.
+.It Cm uname , Cm uid , Cm gname , Cm gid
+User name, group name, and numeric UID and GID values. The user name
+and group name stored here are encoded in UTF8 and can thus include
+non-ASCII characters. The UID and GID fields can be of arbitrary length.
+.It Cm linkpath
+The full path of the linked-to file. Note that this is encoded in UTF8
+and can thus include non-ASCII characters.
+.It Cm path
+The full pathname of the entry. Note that this is encoded in UTF8
+and can thus include non-ASCII characters.
+.It Cm realtime.* , Cm security.*
+These keys are reserved by SUSv3 and may be used for future standardization.
+.It Cm size
+The size of the file. Note that there is no length limit on this field,
+allowing
+.Nm
+archives to store files much larger than the historic 8GB limit.
+.It Cm SCHILY.*
+Vendor-specific attributes used by Joerg Schilling's
+.Nm star
+implementation.
+.It Cm SCHILY.acl.access , Cm SCHILY.acl.default
+Stores the access and default ACLs as textual strings in a format
+that's an extension of the format specified by POSIX XXXX draft 17.
+In particular, each user or group access specification can include a fourth
+field with the integer UID or GID.
+This allows ACLs to be restored on systems that may not have complete
+user or group information available (such as when NIS/YP or LDAP services
+are temporarily unavailable).
+.It Cm SCHILY.devminor , Cm SCHILY.devmajor
+The full minor and major numbers for device nodes.
+.It Cm SCHILY.ino
+The inode number for the entry.
+.It Cm VENDOR.*
+XXX document other vendor-specific extensions XXX
+.El
+.Pp
+Any values stored in an extended attribute override the corresponding
+values in the regular tar header.
+Note that compliant readers should ignore the regular fields when they
+are overridden.
+This is important, as existing archivers are known to store non-compliant
+values in the standard header fields in this situation.
+There are no limits on length for any of these fields.
+In particular, numeric fields can be arbitrarily large.
+All text fields are encoded in UTF8.
+Compliant writers should store only portable 7-bit ASCII characters in
+the standard ustar header and use extended
+attributes whenever a text value contains non-ASCII characters.
+.Pp
+In addition to the
+.Cm x
+entry described above, the pax interchange format
+also supports a
+.Cm g
+entry.
+The
+.Cm g
+entry is identical in format, but specifies attributes that serve as
+defaults for all subsequent archive entries.
+The
+.Cm g
+entry is not widely used.
+.Ss GNU Tar Archives
+The GNU tar program added new features by starting with an early draft
+of POSIX and using three different extension mechanisms: They added
+new fields to the empty space in the header (some of which was later
+used by POSIX for conflicting purposes);
+they allowed the header to
+be continued over multiple records;
+and they defined new entries
+that modify following entries (similar in principle to the
+.Cm x
+entry described above, but each GNU special entry is single-purpose,
+unlike the general-purpose
+.Cm x
+entry).
+As a result, GNU tar archives are not POSIX compatible, although
+more lenient POSIX-compliant readers can successfully extract most
+GNU tar archives.
+.Bd -literal -offset indent
+struct tarfile_entry_gnu {
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char typeflag[1];
+ char linkname[100];
+ char magic[6];
+ char version[2];
+ char uname[32];
+ char gname[32];
+ char devmajor[8];
+ char devminor[8];
+ char atime[12];
+ char ctime[12];
+ char offset[12];
+ char longnames[4];
+ char unused[1];
+ struct {
+ char offset[12];
+ char numbytes[12];
+ } sparse[4];
+ char isextended[1];
+ char realsize[12];
+};
+.Ed
+.Bl -tag -width indent
+.It Va typeflag
+GNU tar uses the following special entry types.
+.Bl -tag -width indent
+.It "7"
+GNU tar treats type "7" records identically to type "0" records,
+except on one obscure RTOS where they are used to indicate the
+pre-allocation of a contiguous file on disk.
+.It "D"
+This indicates a directory entry. Unlike the POSIX-standard "5"
+typeflag, the header is followed by data records listing the names
+of files in this directory. Each name is preceded by an ASCII "Y"
+if the file is stored in this archive or "N" if the file is not
+stored in this archive. Each name is terminated with a null, and
+an extra null marks the end of the name list. The purpose of this
+entry is to support incremental backups; a program restoring from
+such an archive may wish to delete files on disk that did not exist
+in the directory when the archive was made.
+.Pp
+Note that the "D" typeflag specifically violates POSIX, which requires
+that unrecognized typeflags be restored as normal files.
+In this case, restoring the "D" entry as a file could interfere
+with subsequent creation of the like-named directory.
+.It "K"
+The data for this entry is a long linkname for the following regular entry.
+.It "L"
+The data for this entry is a long pathname for the following regular entry.
+.It "M"
+This is a continuation of the last file on the previous volume.
+GNU multi-volume archives gaurantee that each volume begins with a valid
+entry header.
+To ensure this, a file may be split, with part stored at the end of one volume,
+and part stored at the beginning of the next volume.
+The "M" typeflag indicates that this entry continues
+an existing file.
+Such entries can only occur as the first or second entry
+in an archive (the latter only if the first entry is a volume label).
+The
+.Va size
+field specifies the size of this entry.
+The
+.Va offset
+field at bytes 369-380 specifies the offset where this file fragment
+begins.
+The
+.Va realsize
+field specifies the total size of the file (which must equal
+.Va size
+plus
+.Va offset ) .
+When extracting, GNU tar checks that the header file name is the one it is
+expecting, that the header offset is in the correct sequence, and that
+the sum of offset and size is equal to realsize.
+FreeBSD's version of GNU tar does not handle the corner case of an
+archive being continued in the middle of a long name or other
+extension header.
+.It "N"
+Type "N" records are no longer generated by GNU tar. They contained a
+list of files to be renamed or symlinked after extraction; this was
+originally used to support long names. The contents of this record
+are a text description of the operations to be done, in the form
+.Dq Rename %s to %s\en
+or
+.Dq Symlink %s to %s\en ;
+in either case, both
+filenames are escaped using K&R C syntax.
+.It "S"
+This is a
+.Dq sparse
+regular file.
+Sparse files are stored as a series of fragments.
+The header contains a list of fragment offset/length pairs.
+If more than four such entries are required, the header is
+extended as necessary with
+.Dq extra
+header extensions (an older format that's no longer used), or
+.Dq sparse
+extensions.
+.It "V"
+The
+.Va name
+field should be interpreted as a tape/volume header name.
+This entry should generally be ignored on extraction.
+.El
+.It Va magic
+The magic field holds the five characters
+.Dq ustar
+followed by a space.
+Note that POSIX ustar archives have a trailing null.
+.It Va version
+The version field holds a space character followed by a null.
+Note that POSIX ustar archive use two copies of the ASCII digit
+.Dq 0 .
+.It Va atime , Va ctime
+The time the file was last accessed and the time of
+last change of file information, stored in octal as with
+.Va mtime.
+.It Va longnames
+This field is apparently no longer used.
+.It Sparse Va offset / Va numbytes
+Each such structure specifies a single fragment of a sparse
+file.
+The two fields store values as octal numbers.
+The fragments are each padded to a multiple of 512 bytes
+in the archive.
+On extraction, the list of fragments is collected from the
+header (including any extension headers), and the data
+is then read and written to the file at appropriate offsets.
+.It Va isextended
+If this is set to non-zero, the header will be followed by
+additional
+.Dq sparse header
+records.
+Each such record contains XXX more details needed XXX
+.It Va realsize
+A binary representation of the size, with a much larger range
+than the POSIX file size.
+.El
+.Ss Other Extensions
+One common extension, utilized by GNU tar, star, and other newer
+.Nm
+implementations, permits binary numbers in the standard numeric
+fields.
+This is flagged by setting the high bit of the first character.
+This permits 95-bit values for the length and time fields
+and 63-bit values for the uid, gid, and device numbers.
+GNU tar supports this extension for the
+length, mtime, ctime, and atime fields.
+Joerg Schilling's star program supports this extension for
+all numeric fields.
+Note that this extension is largely obsoleted by the extended attribute
+record provided by the pax interchange format.
+.Pp
+Another early GNU extension allowed base-64 values rather
+than octal.
+This extension was short-lived and such archives are almost never seen.
+However, there is still code in GNU tar to support them; this code is
+responsible for a very cryptic warning message that is sometimes seen when
+GNU tar encounters a damaged archive.
+.Sh SEE ALSO
+.Xr ar 1 ,
+.Xr pax 1 ,
+.Xr tar 1 ,
+.Sh STANDARDS
+The
+.Nm tar
+utility is no longer a part of any official standard.
+It last appeared in SUSv2.
+It has been supplanted in subsequent standards by
+.Xr pax 1 .
+The ustar format is defined in
+.St -p1003.1
+as part of the specification for the
+.Xr pax 1
+utility.
+The pax interchange file format is new with
+.St -p1003.1-2001 .
+.Sh HISTORY
+A
+.Nm tar
+command appeared in Sixth Edition Unix.
+John Gilmore's
+.Nm pdtar
+public-domain implementation (circa 1987) was highly influential
+and formed the basis of GNU tar.
+Joerg Shilling's
+.Nm star
+archiver is another open-source (GPL) archiver (originally developed
+circa 1985) which features complete support for pax interchange
+format.
OpenPOWER on IntegriCloud