From 8fb65ce818b3e3c6f165b583b910af24000768a5 Mon Sep 17 00:00:00 2001 From: rgrimes Date: Tue, 24 May 1994 10:09:53 +0000 Subject: BSD 4.4 Lite Kernel Sources --- sys/ufs/lfs/README | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 139 insertions(+) create mode 100644 sys/ufs/lfs/README (limited to 'sys/ufs/lfs/README') diff --git a/sys/ufs/lfs/README b/sys/ufs/lfs/README new file mode 100644 index 0000000..724b18f --- /dev/null +++ b/sys/ufs/lfs/README @@ -0,0 +1,139 @@ +# @(#)README 8.1 (Berkeley) 6/11/93 + +The file system is reasonably stable, but incomplete. There are +places where cleaning performance can be improved dramatically (see +comments in lfs_syscalls.c). For details on the implementation, +performance and why garbage collection always wins, see Dr. Margo +Seltzer's thesis available for anonymous ftp from toe.cs.berkeley.edu, +in the directory pub/personal/margo/thesis.ps.Z, or the January 1993 +USENIX paper. + +Missing Functionality: + Multiple block sizes and/or fragments are not yet implemented. + +---------- +The disk is laid out in segments. The first segment starts 8K into the +disk (the first 8K is used for boot information). Each segment is composed +of the following: + + An optional super block + One or more groups of: + segment summary + 0 or more data blocks + 0 or more inode blocks + +The segment summary and inode/data blocks start after the super block (if +present), and grow toward the end of the segment. + + _______________________________________________ + | | | | | + | summary | data/inode | summary | data/inode | + | block | blocks | block | blocks | ... + |_________|____________|_________|____________| + +The data/inode blocks following a summary block are described by the +summary block. In order to permit the segment to be written in any order +and in a forward direction only, a checksum is calculated across the +blocks described by the summary. Additionally, the summary is checksummed +and timestamped. Both of these are intended for recovery; the former is +to make it easy to determine that it *is* a summary block and the latter +is to make it easy to determine when recovery is finished for partially +written segments. These checksums are also used by the cleaner. + + Summary block (detail) + ________________ + | sum cksum | + | data cksum | + | next segment | + | timestamp | + | FINFO count | + | inode count | + | flags | + |______________| + | FINFO-1 | 0 or more file info structures, identifying the + | . | blocks in the segment. + | . | + | . | + | FINFO-N | + | inode-N | + | . | + | . | + | . | 0 or more inode daddr_t's, identifying the inode + | inode-1 | blocks in the segment. + |______________| + +Inode blocks are blocks of on-disk inodes in the same format as those in +the FFS. However, spare[0] contains the inode number of the inode so we +can find a particular inode on a page. They are packed page_size / +sizeof(inode) to a block. Data blocks are exactly as in the FFS. Both +inodes and data blocks move around the file system at will. + +The file system is described by a super-block which is replicated and +occurs as the first block of the first and other segments. (The maximum +number of super-blocks is MAXNUMSB). Each super-block maintains a list +of the disk addresses of all the super-blocks. The super-block maintains +a small amount of checkpoint information, essentially just enough to find +the inode for the IFILE (fs->lfs_idaddr). + +The IFILE is visible in the file system, as inode number IFILE_INUM. It +contains information shared between the kernel and various user processes. + + Ifile (detail) + ________________ + | cleaner info | Cleaner information per file system. (Page + | | granularity.) + |______________| + | segment | Space available and last modified times per + | usage table | segment. (Page granularity.) + |______________| + | IFILE-1 | Per inode status information: current version #, + | . | if currently allocated, last access time and + | . | current disk address of containing inode block. + | . | If current disk address is LFS_UNUSED_DADDR, the + | IFILE-N | inode is not in use, and it's on the free list. + |______________| + + +First Segment at Creation Time: +_____________________________________________________________ +| | | | | | | | +| 8K pad | Super | summary | inode | ifile | root | l + f | +| | block | | block | | dir | dir | +|________|_______|_________|_______|_______|_______|_______| + ^ + Segment starts here. + +Some differences from the Sprite LFS implementation. + +1. The LFS implementation placed the ifile metadata and the super block + at fixed locations. This implementation replicates the super block + and puts each at a fixed location. The checkpoint data is divided into + two parts -- just enough information to find the IFILE is stored in + two of the super blocks, although it is not toggled between them as in + the Sprite implementation. (This was deliberate, to avoid a single + point of failure.) The remaining checkpoint information is treated as + a regular file, which means that the cleaner info, the segment usage + table and the ifile meta-data are stored in normal log segments. + (Tastes great, less filling...) + +2. The segment layout is radically different in Sprite; this implementation + uses something a lot like network framing, where data/inode blocks are + written asynchronously, and a checksum is used to validate any set of + summary and data/inode blocks. Sprite writes summary blocks synchronously + after the data/inode blocks have been written and the existence of the + summary block validates the data/inode blocks. This permits us to write + everything contiguously, even partial segments and their summaries, whereas + Sprite is forced to seek (from the end of the data inode to the summary + which lives at the end of the segment). Additionally, writing the summary + synchronously should cost about 1/2 a rotation per summary. + +3. Sprite LFS distinguishes between different types of blocks in the segment. + Other than inode blocks and data blocks, we don't. + +4. Sprite LFS traverses the IFILE looking for free blocks. We maintain a + free list threaded through the IFILE entries. + +5. The cleaner runs in user space, as opposed to kernel space. It shares + information with the kernel by reading/writing the IFILE and through + cleaner specific system calls. + -- cgit v1.1