Diffstat (limited to 'share/doc/papers/memfs/1.t')
-rw-r--r--  share/doc/papers/memfs/1.t  392
1 file changed, 0 insertions, 392 deletions
diff --git a/share/doc/papers/memfs/1.t b/share/doc/papers/memfs/1.t
deleted file mode 100644
index e079be9..0000000
--- a/share/doc/papers/memfs/1.t
+++ /dev/null
@@ -1,392 +0,0 @@
-.\" Copyright (c) 1990 The Regents of the University of California.
-.\" All rights reserved.
-.\"
-.\" Redistribution and use in source and binary forms, with or without
-.\" modification, are permitted provided that the following conditions
-.\" are met:
-.\" 1. Redistributions of source code must retain the above copyright
-.\" notice, this list of conditions and the following disclaimer.
-.\" 2. Redistributions in binary form must reproduce the above copyright
-.\" notice, this list of conditions and the following disclaimer in the
-.\" documentation and/or other materials provided with the distribution.
-.\" 3. All advertising materials mentioning features or use of this software
-.\" must display the following acknowledgement:
-.\" This product includes software developed by the University of
-.\" California, Berkeley and its contributors.
-.\" 4. Neither the name of the University nor the names of its contributors
-.\" may be used to endorse or promote products derived from this software
-.\" without specific prior written permission.
-.\"
-.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
-.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
-.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
-.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
-.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
-.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
-.\" SUCH DAMAGE.
-.\"
-.\" @(#)1.t 5.1 (Berkeley) 4/16/91
-.\"
-.nr PS 11
-.nr VS 13
-.NH
-Introduction
-.PP
-This paper describes the motivation for and implementation of
-a memory-based filesystem.
-Memory-based filesystems have existed for a long time;
-they have generally been marketed as RAM disks or sometimes
-as software packages that use the machine's general purpose memory.
-.[
-white
-.]
-.PP
-A RAM disk is designed to appear like any other disk peripheral
-connected to a machine.
-It is normally interfaced to the processor through the I/O bus
-and is accessed through a device driver similar or sometimes identical
-to the device driver used for a normal magnetic disk.
-The device driver sends requests for blocks of data to the device
-and the requested data is then DMA'ed to or from the requested block.
-Instead of storing its data on a rotating magnetic disk,
-the RAM disk stores its data in a large array of random access memory
-or bubble memory.
-Thus, the latency of accessing the RAM disk is nearly zero
-compared to the 15-50 milliseconds of latency incurred when
-accessing rotating magnetic media.
-RAM disks also have the benefit of being able to transfer data at
-the maximum DMA rate of the system,
-while disks are typically limited by the rate that the data passes
-under the disk head.
-.PP
-Software packages simulating RAM disks operate by allocating
-a fixed partition of the system memory.
-The software then provides a device driver interface similar
-to the one described for hardware RAM disks,
-except that it uses memory-to-memory copy instead of DMA to move
-the data between the RAM disk and the system buffers,
-or it maps the contents of the RAM disk into the system buffers.
-Because the memory used by the RAM disk is not available for
-other purposes, software RAM-disk solutions are used primarily
-for machines with limited addressing capabilities such as PC's
-that do not have an effective way of using the extra memory anyway.
-.PP
-Most software RAM disks lose their contents when the system is powered
-down or rebooted.
-The contents can be saved by using battery backed-up memory,
-by storing critical filesystem data structures in the filesystem,
-and by running a consistency check program after each reboot.
-These conditions increase the hardware cost
-and potentially slow down the speed of the disk.
-Thus, RAM-disk filesystems are not typically
-designed to survive power failures;
-because of their volatility, their usefulness is limited to transient
-or easily recreated information such as might be found in
-.PN /tmp .
-Their primary benefit is that they have higher throughput
-than disk based filesystems.
-.[
-smith
-.]
-This improved throughput is particularly useful for utilities that
-make heavy use of temporary files, such as compilers.
-On fast processors, nearly half of the elapsed time for a compilation
-is spent waiting for synchronous operations required for file
-creation and deletion.
-The use of the memory-based filesystem nearly eliminates this waiting time.
-.PP
-Using dedicated memory to exclusively support a RAM disk
-is a poor use of resources.
-The overall throughput of the system can be improved
-by using the memory where it is getting the highest access rate.
-This demand may shift between supporting process virtual address spaces
-and caching frequently used disk blocks.
-If the memory is dedicated to the filesystem,
-it is better used in a buffer cache.
-The buffer cache permits faster access to the data
-because it requires only a single memory-to-memory copy
-from the kernel to the user process.
-Memory used in a RAM-disk configuration may require two
-memory-to-memory copies, one from the RAM disk
-to the buffer cache,
-then another copy from the buffer cache to the user process.
-.PP
-The new work being presented in this paper is building a prototype
-RAM-disk filesystem in pageable memory instead of dedicated memory.
-The goal is to provide the speed benefits of a RAM disk
-without paying the performance penalty inherent in dedicating
-part of the physical memory on the machine to the RAM disk.
-By building the filesystem in pageable memory,
-it competes with other processes for the available memory.
-When memory runs short, the paging system pushes its
-least-recently-used pages to backing store.
-Being pageable also allows the filesystem to be much larger than
-would be practical if it were limited by the amount of physical
-memory that could be dedicated to that purpose.
-We typically operate our
-.PN /tmp
-with 30 to 60 megabytes of space,
-which is larger than the amount of memory on the machine.
-This configuration allows small files to be accessed quickly,
-while still allowing
-.PN /tmp
-to be used for big files,
-although at a speed more typical of normal, disk-based filesystems.
-.PP
-An alternative to building a memory-based filesystem would be to have
-a filesystem that never did operations synchronously and never
-flushed its dirty buffers to disk.
-However, we believe that such a filesystem would either
-use a disproportionately large percentage of the buffer
-cache space, to the detriment of other filesystems,
-or would require the paging system to flush its dirty pages.
-Waiting for other filesystems to push dirty pages
-subjects them to delays while waiting for the pages to be written.
-We await the results of others trying this approach.
-.[
-Ohta
-.]
-.NH
-Implementation
-.PP
-The current implementation took less time to write than did this paper.
-It consists of 560 lines of kernel code (1.7K text + data)
-and some minor modifications to the program that builds
-disk based filesystems, \fInewfs\fP.
-A condensed version of the kernel code for the
-memory-based filesystem is reproduced in Appendix 1.
-.PP
-A filesystem is created by invoking the modified \fInewfs\fP, with
-an option telling it to create a memory-based filesystem.
-It allocates a section of virtual address space of the requested
-size and builds a filesystem in the memory
-instead of on a disk partition.
-When built, it does a \fImount\fP system call specifying a filesystem type of
-.SM MFS
-(Memory File System).
-The auxiliary data parameter to the mount call specifies a pointer
-to the base of the memory in which it has built the filesystem.
-(The auxiliary data parameter used by the local filesystem, \fIufs\fP,
-specifies the block device containing the filesystem.)
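-.PP
-The fragment below sketches the \fInewfs\fP side of this sequence.
-It is a simplified illustration rather than the actual source;
-the layout of the \fImfs_args\fP structure and the
-\fIbuild_filesystem\fP helper are shown only to make the
-flow of control concrete.
-.DS
-.ft CW
-/* Illustrative sketch of the newfs side of an MFS mount;
- * names and structure layout are simplified, not verbatim. */
-struct mfs_args {
-	caddr_t	base;		/* base of filesystem address space */
-	u_long	size;		/* size of the filesystem in bytes */
-} args;
-
-args.base = malloc(fssize);		/* allocate address space */
-args.size = fssize;
-build_filesystem(args.base, fssize);	/* superblock, inodes, ... */
-mount(MOUNT_MFS, "/tmp", mntflags, (caddr_t)&args);
-/* mount() does not return until the filesystem is unmounted;
- * newfs now sleeps in the kernel servicing I/O requests. */
-.ft P
-.DE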
-.PP
-The mount system call allocates and initializes a mount table
-entry and then calls the filesystem-specific mount routine.
-The filesystem-specific routine is responsible for doing
-the mount and initializing the filesystem-specific
-portion of the mount table entry.
-The memory-based filesystem-specific mount routine,
-.RN mfs_mount ,
-is shown in Appendix 1.
-It allocates a block-device vnode to represent the memory disk device.
-In the private area of this vnode it stores the base address of
-the filesystem and the process identifier of the \fInewfs\fP process
-for later reference when doing I/O.
-It also initializes an I/O list that it
-uses to record outstanding I/O requests.
-It can then call the \fIufs\fP filesystem mount routine,
-passing the special block-device vnode that it has created
-instead of the usual disk block-device vnode.
-The mount proceeds just as any other local mount, except that
-requests to read from the block device are vectored through
-.RN mfs_strategy
-(described below) instead of the usual
-.RN spec_strategy
-block device I/O function.
-When the mount is completed,
-.RN mfs_mount
-does not return as most other filesystem mount functions do;
-instead it sleeps in the kernel awaiting I/O requests.
-Each time an I/O request is posted for the filesystem,
-a wakeup is issued for the corresponding \fInewfs\fP process.
-When awakened, the process checks for requests on its buffer list.
-A read request is serviced by copying data from the section of the
-\fInewfs\fP address space corresponding to the requested disk block
-to the kernel buffer.
-Similarly a write request is serviced by copying data to the section of the
-\fInewfs\fP address space corresponding to the requested disk block
-from the kernel buffer.
-When all the requests have been serviced, the \fInewfs\fP
-process returns to sleep to await more requests.
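-.PP
-The heart of this arrangement is the I/O servicing loop at the end of
-.RN mfs_mount .
-The condensed sketch below shows its structure with declarations
-omitted; field names such as \fImfs_buflist\fP follow the style of the
-code in Appendix 1 but are illustrative here.
-.DS
-.ft CW
-/* Sketch of the servicing loop in mfs_mount.  The sentinel
- * value (struct buf *)(-1) marks the filesystem as unmounted. */
-while (mfsp->mfs_buflist != (struct buf *)(-1)) {
-	while ((bp = mfsp->mfs_buflist) != NULL) {
-		mfsp->mfs_buflist = bp->av_forw;
-		/* copy between the newfs address space and the
-		 * kernel buffer, in the direction the buffer asks */
-		mfs_doio(bp, mfsp->mfs_baseoff);
-		wakeup((caddr_t)bp);	/* release waiting process */
-	}
-	sleep((caddr_t)vp, PWAIT);	/* await more requests */
-}
-.ft P
-.DE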
-.PP
-Once mounted,
-all operations on files in the memory-based filesystem are handled
-by the \fIufs\fP filesystem code until they get to the point where the
-filesystem needs to do I/O on the device.
-Here, the filesystem encounters the second piece of the
-memory-based filesystem.
-Instead of calling the special-device strategy routine,
-it calls the memory-based strategy routine,
-.RN mfs_strategy .
-Usually,
-the request is serviced by linking the buffer onto the
-I/O list for the memory-based filesystem
-vnode and sending a wakeup to the \fInewfs\fP process.
-This wakeup results in a context-switch to the \fInewfs\fP
-process, which does a copyin or copyout as described above.
-The strategy routine must be careful to check whether
-the I/O request is coming from the \fInewfs\fP process itself, however.
-Such requests happen during mount and unmount operations,
-when the kernel is reading and writing the superblock.
-Here,
-.RN mfs_strategy
-must do the I/O itself to avoid deadlock.
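-.PP
-A condensed sketch of
-.RN mfs_strategy
-follows; as with the other fragments,
-the details are illustrative rather than quoted from the sources.
-.DS
-.ft CW
-/* Sketch of mfs_strategy: queue the request for the newfs
- * process, unless the request comes from that process itself. */
-mfs_strategy(bp)
-	struct buf *bp;
-{
-	struct vnode *vp = bp->b_vp;	/* MFS block-device vnode */
-	struct mfsnode *mfsp = VTOMFS(vp);
-
-	if (mfsp->mfs_pid == u.u_procp->p_pid) {
-		/* request from the newfs process itself (superblock
-		 * I/O during mount or unmount): do the copy directly
-		 * to avoid deadlock */
-		mfs_doio(bp, mfsp->mfs_baseoff);
-	} else {
-		bp->av_forw = mfsp->mfs_buflist;	/* queue request */
-		mfsp->mfs_buflist = bp;
-		wakeup((caddr_t)vp);	/* rouse the newfs process */
-	}
-	return (0);
-}
-.ft P
-.DE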
-.PP
-The final piece of kernel code to support the
-memory-based filesystem is the close routine.
-After the filesystem has been successfully unmounted,
-the device close routine is called.
-For a memory-based filesystem, the device close routine is
-.RN mfs_close .
-This routine flushes any pending I/O requests,
-then sets the I/O list head to a special value
-that is recognized by the I/O servicing loop in
-.RN mfs_mount
-as an indication that the filesystem is unmounted.
-The
-.RN mfs_mount
-routine exits, in turn causing the \fInewfs\fP process
-to exit, resulting in the filesystem vanishing in a cloud of dirty pages.
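-.PP
-In condensed, illustrative form, the close-time sequence just
-described looks like this:
-.DS
-.ft CW
-/* Sketch of mfs_close: finish pending I/O, then plant the
- * sentinel that makes the loop in mfs_mount exit. */
-while ((bp = mfsp->mfs_buflist) != NULL) {
-	mfsp->mfs_buflist = bp->av_forw;
-	mfs_doio(bp, mfsp->mfs_baseoff);
-	wakeup((caddr_t)bp);
-}
-mfsp->mfs_buflist = (struct buf *)(-1);	/* unmounted */
-wakeup((caddr_t)vp);	/* mfs_mount returns; newfs exits */
-.ft P
-.DE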
-.PP
-The paging of the filesystem does not require any additional
-code beyond that already in the kernel to support virtual memory.
-The \fInewfs\fP process competes with other processes on an equal basis
-for the machine's available memory.
-Data pages of the filesystem that have not yet been used
-are zero-fill-on-demand pages that do not occupy memory,
-although they currently allocate space in backing store.
-As long as memory is plentiful, the entire contents of the filesystem
-remain memory resident.
-When memory runs short, the oldest pages of \fInewfs\fP will be
-pushed to backing store as part of the normal paging activity.
-The pages that are pushed usually hold the contents of
-files that have been created in the memory-based filesystem
-but have not been recently accessed (or have been deleted).
-.[
-leffler
-.]
-.NH
-Performance
-.PP
-The performance of the current memory-based filesystem is determined by
-the memory-to-memory copy speed of the processor.
-Empirically we find that the throughput is about 45% of this
-memory-to-memory copy speed.
-The basic set of steps for each block written is:
-.IP 1)
-memory-to-memory copy from the user process doing the write to a kernel buffer
-.IP 2)
-context-switch to the \fInewfs\fP process
-.IP 3)
-memory-to-memory copy from the kernel buffer to the \fInewfs\fP address space
-.IP 4)
-context switch back to the writing process
-.LP
-Thus, each write requires at least two memory-to-memory copies,
-accounting for about 90% of the
-.SM CPU
-time.
-The remaining 10% is consumed in the context switches and
-the filesystem allocation and block location code.
-The actual context switch count is really only about half
-of the worst case outlined above because
-read-ahead and write-behind allow multiple blocks
-to be handled with each context switch.
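-.PP
-These two figures are consistent with each other.
-If \fIc\fP is the memory-to-memory copy speed, two copies per block
-cost 2/\fIc\fP per byte; if that is 90% of the total cost, the total
-cost is (2/\fIc\fP)/0.9 per byte, for a throughput of 0.45\fIc\fP,
-the 45% figure observed above.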
-.PP
-On the six-\c
-.SM "MIPS CCI"
-Power 6/32 machine,
-the raw reading and writing speed is only about twice that of
-a regular disk-based filesystem.
-However, for processes that create and delete many files,
-the speedup is considerably greater.
-The reason for the speedup is that the filesystem
-must do two synchronous operations to create a file,
-first writing the allocated inode to disk, then creating the
-directory entry.
-Deleting a file similarly requires at least two synchronous
-operations.
-Here, the low latency of the memory-based filesystem is
-noticeable compared to the disk-based filesystem,
-as a synchronous operation can be done with
-just two context switches instead of incurring the disk latency.
-.NH
-Future Work
-.PP
-The most obvious shortcoming of the current implementation
-is that filesystem blocks are copied twice, once between the \fInewfs\fP
-process' address space and the kernel buffer cache,
-and once between the kernel buffer and the requesting process.
-These copies are done in different process contexts, necessitating
-two context switches per group of I/O requests.
-These problems arise because of the current inability of the kernel
-to do page-in operations
-for an address space other than that of the currently-running process,
-and the current inconvenience of mapping process-owned pages into the kernel
-buffer cache.
-Both of these problems are expected to be solved in the next version
-of the virtual memory system,
-and thus we chose not to address them in the current implementation.
-With the new version of the virtual memory system, we expect to use
-any part of physical memory as part of the buffer cache,
-even though it will not be entirely addressable at once within the kernel.
-In that system, the implementation of a memory-based filesystem
-that avoids the double copy and context switches will be much easier.
-.PP
-Ideally part of the kernel's address space would reside in pageable memory.
-Once such a facility is available it would be most efficient to
-build a memory-based filesystem within the kernel.
-One potential problem with such a scheme is that many kernels
-are limited to a small address space (usually a few megabytes).
-This restriction limits the size of memory-based
-filesystem that such a machine can support.
-On such a machine, the kernel can describe a memory-based filesystem
-that is larger than its address space and use a ``window''
-to map the larger filesystem address space into its limited address space.
-The window would maintain a cache of recently accessed pages.
-The problem with this scheme is that if the working set of
-active pages is greater than the size of the window, then
-much time is spent remapping pages and invalidating
-translation buffers.
-Alternatively, a separate address space could be constructed for each
-memory-based filesystem as in the current implementation,
-and the memory-resident pages of that address space could be mapped
-exactly as other cached pages are accessed.
-.PP
-The current system uses the existing local filesystem structures
-and code to implement the memory-based filesystem.
-The major advantages of this approach are the sharing of code
-and the simplicity of the approach.
-There are several disadvantages, however.
-One is that the size of the filesystem is fixed at mount time.
-This means that a fixed number of inodes (files) and data blocks
-can be supported.
-Currently, this approach requires enough swap space for the entire
-filesystem, and prevents expansion and contraction of the filesystem on demand.
-The current design also prevents the filesystem from taking advantage
-of its memory-resident character.
-It would be interesting to explore other filesystem implementations
-that would be less expensive to execute and that would make better
-use of the space.
-For example, the current filesystem structure is optimized for magnetic
-disks.
-It includes replicated control structures, ``cylinder groups''
-with separate allocation maps and control structures,
-and data structures that optimize rotational layout of files.
-None of this is useful in a memory-based filesystem (at least when the
-backing store for the filesystem is dynamically allocated and not
-contiguous on a single disk type).
-On the other hand,
-directories could be implemented using dynamically-allocated
-memory organized as linked lists or trees rather than as files stored
-in ``disk'' blocks.
-Allocation and location of pages for file data might use virtual memory
-primitives and data structures rather than direct and indirect blocks.
-A reimplementation along these lines will be considered when the virtual
-memory system in the current system has been replaced.
-.[
-$LIST$
-.]