Diffstat (limited to 'share/doc/papers/nqnfs/nqnfs.me')
-rw-r--r--  share/doc/papers/nqnfs/nqnfs.me  2004
1 files changed, 0 insertions, 2004 deletions
diff --git a/share/doc/papers/nqnfs/nqnfs.me b/share/doc/papers/nqnfs/nqnfs.me
deleted file mode 100644
index 9502ae1..0000000
--- a/share/doc/papers/nqnfs/nqnfs.me
+++ /dev/null
@@ -1,2004 +0,0 @@
-.\" Copyright (c) 1993 The Usenix Association. All rights reserved.
-.\"
-.\" This document is derived from software contributed to Berkeley by
-.\" Rick Macklem at The University of Guelph with the permission of
-.\" the Usenix Association.
-.\"
-.\" Redistribution and use in source and binary forms, with or without
-.\" modification, are permitted provided that the following conditions
-.\" are met:
-.\" 1. Redistributions of source code must retain the above copyright
-.\" notice, this list of conditions and the following disclaimer.
-.\" 2. Redistributions in binary form must reproduce the above copyright
-.\" notice, this list of conditions and the following disclaimer in the
-.\" documentation and/or other materials provided with the distribution.
-.\" 3. All advertising materials mentioning features or use of this software
-.\" must display the following acknowledgement:
-.\" This product includes software developed by the University of
-.\" California, Berkeley and its contributors.
-.\" 4. Neither the name of the University nor the names of its contributors
-.\" may be used to endorse or promote products derived from this software
-.\" without specific prior written permission.
-.\"
-.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
-.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
-.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
-.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
-.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
-.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
-.\" SUCH DAMAGE.
-.\"
-.\" @(#)nqnfs.me 8.1 (Berkeley) 4/20/94
-.\" $FreeBSD$
-.\"
-.lp
-.nr PS 12
-.ps 12
-Reprinted with permission from the "Proceedings of the Winter 1994 Usenix
-Conference", January 1994, San Francisco, CA, Copyright The Usenix
-Association.
-.nr PS 14
-.ps 14
-.sp
-.ce
-\fBNot Quite NFS, Soft Cache Consistency for NFS\fR
-.nr PS 12
-.ps 12
-.sp
-.ce
-\fIRick Macklem\fR
-.ce
-\fIUniversity of Guelph\fR
-.sp
-.nr PS 12
-.ps 12
-.ce
-\fBAbstract\fR
-.nr PS 10
-.ps 10
-.pp
-There are some constraints inherent in the NFS\(tm\(mo protocol
-that result in performance limitations
-for high performance
-workstation environments.
-This paper discusses an NFS-like protocol named Not Quite NFS (NQNFS),
-designed to address some of these limitations.
-This protocol provides full cache consistency during normal
-operation, while permitting more effective client-side caching in an
-effort to improve performance.
-There are also a variety of minor protocol changes, in order to resolve
-various NFS issues.
-The emphasis is on observed performance of a
-preliminary implementation of the protocol, in order to show
-how well this design works
-and to suggest possible areas for further improvement.
-.sh 1 "Introduction"
-.pp
-It has been observed that
-overall workstation performance has not been scaling with
-processor speed and that file system I/O is a limiting factor [Ousterhout90].
-Ousterhout
-notes
-that a principal challenge for operating system developers is the
-decoupling of system calls from their underlying I/O operations, in order
-to improve average system call response times.
-For distributed file systems, every synchronous Remote Procedure Call (RPC)
-takes a minimum of a few milliseconds and, as such, is analogous to an
-underlying I/O operation.
-This suggests that client caching with a very good
-hit ratio for read type operations, along with asynchronous writing, is required in order to avoid delays waiting for RPC replies.
-However, the NFS protocol requires that the server be stateless\**
-.(f
-\**The server must not require, in order to function correctly, any state
-that may be lost due to a crash.
-.)f
-and does not provide any explicit mechanism for client cache
-consistency, putting
-constraints on how the client may cache data.
-This paper describes an NFS-like protocol that includes a cache consistency
-component designed to enhance client caching performance. It does provide
-full consistency under normal operation, but without requiring that hard
-state information be maintained on the server.
-Design tradeoffs were made towards simplicity and
-high performance over cache consistency under abnormal conditions.
-The protocol design uses a variation of Leases [Gray89]
-to provide state on the server that does not need to be recovered after a
-crash.
-.pp
-The protocol also includes changes designed to address other limitations
-of NFS in a modern workstation environment.
-TCP transport is optionally available to avoid the pitfalls of
-Sun RPC over UDP transport when running across an internetwork [Nowicki89].
-Kerberos [Steiner88] support is available
-to do proper user authentication, in order to provide improved security and
-arbitrary client to server user ID mappings.
-There are also a variety of other changes to accommodate large file systems,
-such as 64-bit file sizes and offsets, as well as lifting the 8 Kbyte I/O size
-limit.
-The remainder of this paper gives an overview of the protocol, highlighting
-performance related components, followed by an evaluation of resultant performance
-for the 4.4BSD implementation.
-.sh 1 "Distributed File Systems and Caching"
-.pp
-Clients using distributed file systems cache recently-used data in order
-to reduce the number of synchronous server operations, and therefore improve
-average response times for system calls.
-Unfortunately, maintaining consistency between these caches is a problem
-whenever write sharing occurs; that is, when a process on a client writes
-to a file and one or more processes on other client(s) read the file.
-If the writer closes the file before any reader(s) open the file for reading,
-this is called sequential write sharing. Both the Andrew ITC file system
-[Howard88] and NFS [Sandberg85] maintain consistency for sequential write
-sharing by requiring the writer to push all the writes through to the
-server on close and having readers check to see if the file has been
-modified upon open. If the file has been modified, the client throws away
-all cached data for that file, as it is now stale.
-NFS implementations typically detect file modification by checking a cached
-copy of the file's modification time; since this cached value is often
-several seconds out of date and only has a resolution of one second, an NFS
-client often uses stale cached data for some time after the file has
-been updated on the server.
-.pp
-A more difficult case is concurrent write sharing, where write operations are intermixed
-with read operations.
-Consistency for this case, often referred to as "full cache consistency,"
-requires that a reader always receives the most recently written data.
-Neither NFS nor the Andrew ITC file system maintain consistency for this
-case.
-The simplest mechanism for maintaining full cache consistency is the one
-used by Sprite [Nelson88], which disables all client caching of the
-file whenever concurrent write sharing might occur.
-There are other mechanisms described in the literature [Kent87a,
-Burrows88], but they appeared to be too elaborate for incorporation
-into NQNFS (for example, Kent's requires specialized hardware).
-NQNFS differs from Sprite in the way it
-detects write sharing. The Sprite server maintains a list of files currently open
-by the various clients and detects write sharing when a file open request
-for writing is received and the file is already open for reading
-(or vice versa).
-This list of open files is hard state information that must be recovered
-after a server crash, which is a significant problem in its own
-right [Mogul93, Welch90].
-.pp
-The approach used by NQNFS is a variant of the Leases mechanism [Gray89].
-In this model, the server issues to a client a promise, referred to as a
-"lease," that the client may cache a specific object without fear of
-conflict.
-A lease has a limited duration and must be renewed by the client if it
-wishes to continue to cache the object.
-In NQNFS, clients hold short-term (up to one minute) leases on files
-for reading or writing.
-The leases are analogous to entries in the open file list, except that
-they expire after the lease term unless renewed by the client.
-As such, one minute after issuing the last lease there are no current
-leases and therefore no lease records to be recovered after a crash, hence
-the term "soft server state."
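The soft-state idea above can be sketched in C. This is a toy model, not the 4.4BSD implementation; the structure, field names, and functions are illustrative assumptions. The key property it demonstrates is that every lease record carries an expiry time bounded by the one-minute maximum, so the whole lease table is moot at most a minute after the last lease is issued.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NQ_MAXLEASE 60   /* maximum lease term: one minute */
#define NQ_DEFLEASE 30   /* default lease term: thirty seconds */

enum lease_type { LEASE_READ, LEASE_WRITE, LEASE_NONCACHING };

/* One soft-state lease record; all names are hypothetical. */
struct lease {
    uint64_t fileid;      /* identifies the leased file */
    enum lease_type type;
    time_t expiry;        /* absolute expiry time */
    uint64_t modrev;      /* changes on every modification of the file */
};

/* Issue or renew a lease.  The term is clamped to the one-minute
 * maximum, so no record outlives NQ_MAXLEASE seconds without renewal. */
void lease_issue(struct lease *l, time_t now, int term)
{
    if (term > NQ_MAXLEASE)
        term = NQ_MAXLEASE;
    l->expiry = now + term;
}

/* A client may use cached data only while its lease is current. */
int lease_valid(const struct lease *l, time_t now)
{
    return now < l->expiry;
}
```

After a crash, a server built this way need only wait out `NQ_MAXLEASE` seconds: every record that existed before the crash is guaranteed expired, which is exactly the "soft server state" property.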
-.pp
-A related design consideration is the way client writing is done.
-Synchronous writing requires that all writes be pushed through to the server
-during the write system call.
-This is the simplest variant, from a consistency point of view, since the
-server always has the most recently written data. It also permits any write
-errors, such as "file system out of space" to be propagated back to the
-client's process via the write system call return.
-Unfortunately this approach limits the client write rate, based on server write
-performance and client/server RPC round trip time (RTT).
-.pp
-An alternative to this is delayed writing, where the write system call returns
-as soon as the data is cached on the client and the data is written to the
-server sometime later.
-This permits client writing to occur at the rate of local storage access
-up to the size of the local cache.
-Also, for cases where file truncation/deletion occurs shortly after writing,
-the write to the server may be avoided since the data has already been
-deleted, reducing server write load.
-There are some obvious drawbacks to this approach.
-For any Sprite-like system to maintain
-full consistency, the server must "callback" to the client to cause the
-delayed writes to be written back to the server when write sharing is about to
-occur.
-There are also problems with the propagation of errors
-back to the client process that issued the write system call.
-The reason for this is that
-the system call has already returned without reporting an error and the
-process may also have already terminated.
-As well, there is a risk of the loss of recently written data if the client
-crashes before the data is written back to the server.
-.pp
-A compromise between these two alternatives is asynchronous writing, where
-the write to the server is initiated during the write system call but the write system
-call returns before the write completes.
-This approach minimizes the risk of data loss due to a client crash, but negates
-the possibility of reducing server write load by throwing writes away when
-a file is truncated or deleted.
-.pp
-NFS implementations usually do a mix of asynchronous and delayed writing
-but push all writes to the server upon close, in order to maintain open/close
-consistency.
-Pushing the delayed writes on close
-negates much of the performance advantage of delayed writing, since the
-delays that were avoided in the write system calls are observed in the close
-system call.
-Akin to Sprite, the NQNFS protocol does delayed writing in an effort to achieve
-good client performance and uses a callback mechanism to maintain full cache
-consistency.
-.sh 1 "Related Work"
-.pp
-There has been a great deal of effort put into improving the performance and
-consistency of the NFS protocol. This work falls into two categories:
-the first consists of implementation enhancements for the NFS protocol and
-the second involves modifications to the protocol.
-.pp
-The work done on implementation enhancements has attacked two problem areas:
-NFS server write performance and RPC transport problems.
-Server write performance is a major problem for NFS, in part due to the
-requirement to push all writes to the server upon close and in part due
-to the fact that, for writes, all data and meta-data must be committed to
-non-volatile storage before the server replies to the write RPC.
-The Prestoserve\(tm\(dg
-[Moran90]
-system uses non-volatile RAM as a buffer for recently written data on the server,
-so that the write RPC replies can be returned to the client before the data is written to the
-disk surface.
-Write gathering [Juszczak94] is a software technique used on the server where a write
-RPC request is delayed for a short time in the hope that another contiguous
-write request will arrive, so that they can be merged into one write operation.
-Since the replies to all of the merged writes are not returned to the client until the write
-operation is completed, this delay does not violate the protocol.
-When write operations are merged, the number of disk writes can be reduced,
-improving server write performance.
-Although either of the above reduces write RPC response time for the server,
-it cannot be reduced to zero, and so, any client side caching mechanism
-that reduces write RPC load or client dependence on server RPC response time
-should still improve overall performance.
-Good client side caching should be complementary to these server techniques,
-although client performance improvements as a result of caching may be less
-dramatic when these techniques are used.
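The write-gathering technique can be sketched with a toy merge of byte-range requests. This is a hypothetical model, not Juszczak's implementation; the real server-side bookkeeping of delayed write RPCs is more involved, but the core operation is combining contiguous writes so one disk operation satisfies several RPC requests.

```c
#include <assert.h>

/* A pending byte-range write to a single file; names are illustrative. */
struct wreq {
    long off;   /* byte offset of the pending write */
    long len;   /* length in bytes */
};

/* Merge b into a when the two ranges touch or overlap, so that a single
 * disk write covers both RPC requests.  Returns 1 on success, 0 when
 * there is a gap between the ranges. */
int merge_writes(struct wreq *a, const struct wreq *b)
{
    if (b->off > a->off + a->len || a->off > b->off + b->len)
        return 0;                       /* a gap: cannot merge */
    long end = a->off + a->len;
    if (b->off + b->len > end)
        end = b->off + b->len;
    if (b->off < a->off)
        a->off = b->off;
    a->len = end - a->off;
    return 1;
}
```

Since the reply to each gathered write is withheld until the merged operation completes, the client still gets the stable-storage guarantee the protocol requires.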
-.pp
-In NFS, each Sun RPC request is packaged in a UDP datagram for transmission
-to the server. A timer is started, and if a timeout occurs before the corresponding
-RPC reply is received, the RPC request is retransmitted.
-There are two problems with this model.
-First, when a retransmit timeout occurs, the RPC may be redone, instead of
-simply retransmitting the RPC request message to the server. A recent-request
-cache can be used on the server to minimize the negative impact of redoing
-RPCs [Juszczak89].
-The second problem is that a large UDP datagram, such as a read request or
-write reply, must be fragmented by IP and if any one IP fragment is lost in
-transit, the entire UDP datagram is lost [Kent87]. Since entire requests and replies
-are packaged in a single UDP datagram, this puts an upper bound on the read/write
-data size (8 kbytes).
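The fragility of large UDP datagrams can be made concrete with a back-of-envelope calculation (illustrative arithmetic, not from the paper): an 8 Kbyte write carried in one UDP datagram over a 1500-byte-MTU Ethernet becomes six IP fragments, and the datagram survives only if every fragment does.

```c
#include <assert.h>

/* Number of IP fragments needed for a UDP payload of the given size.
 * Simplifying assumptions: a 20-byte IP header with no options, and the
 * 8-byte UDP header counted with the payload; each fragment carries a
 * multiple of 8 data bytes. */
int ip_fragments(int udp_payload, int mtu)
{
    int per_frag = ((mtu - 20) / 8) * 8;           /* data bytes per fragment */
    return (udp_payload + 8 + per_frag - 1) / per_frag;
}

/* Probability the whole datagram is lost when each fragment is lost
 * independently with probability frag_loss. */
double datagram_loss(double frag_loss, int nfrags)
{
    double survive = 1.0;
    for (int i = 0; i < nfrags; i++)
        survive *= 1.0 - frag_loss;
    return 1.0 - survive;
}
```

Under these assumptions a 1% fragment-loss rate becomes a nearly 6% loss rate for 8 Kbyte datagrams, each loss costing a full RPC retransmission.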
-.pp
-Adjusting the retransmit timeout interval dynamically and applying a
-congestion window to outstanding requests has been shown to be of some help
-[Nowicki89] with the retransmission problem.
-An alternative is to use TCP transport to deliver the RPC messages
-reliably [Macklem90]; one of the performance results in this paper
-shows the effect of doing so.
-.pp
-Srinivasan and Mogul [Srinivasan89] enhanced the NFS protocol to use the Sprite cache
-consistency algorithm in an effort to improve performance and to provide
-full client cache consistency.
-This experimental implementation demonstrated significantly better
-performance than NFS, but suffered from a lack of crash recovery support.
-The NQNFS protocol design borrowed heavily from this work, but differed
-from the Sprite algorithm by using Leases instead of file open state
-to detect write sharing.
-The decision to use Leases was made primarily to avoid the crash recovery
-problem.
-More recent work by the Sprite group [Baker91] and by Mogul [Mogul93] has
-addressed the crash recovery problem, making this design tradeoff more
-questionable now.
-.pp
-Sun has recently updated the NFS protocol to Version 3 [SUN93], using some
-changes similar to NQNFS to address various issues. The Version 3 protocol
-uses 64-bit file sizes and offsets, provides a Readdir_and_Lookup RPC and
-an access RPC.
-It also provides cache hints, to permit a client to be able to determine
-whether a file modification is the result of that client's write or some
-other client's write.
-It would be possible to add either Spritely NFS or NQNFS support for cache
-consistency to the NFS Version 3 protocol.
-.sh 1 "NQNFS Consistency Protocol and Recovery"
-.pp
-The NQNFS cache consistency protocol uses a somewhat Sprite-like [Nelson88]
-mechanism, but is based on Leases [Gray89] instead of hard server state information
-about open files.
-The basic principle is that the server disables client caching of files whenever
-concurrent write sharing could occur, by performing a server-to-client
-callback,
-forcing the client to flush its caches and to do all subsequent I/O on the file with
-synchronous RPCs.
-A Sprite server maintains a record of the open state of files for
-all clients and uses this to determine when concurrent write sharing might
-occur.
-This \fIopen state\fR information might also be referred to as an infinite-term
-lease for the file, with explicit lease cancellation.
-NQNFS, on the other hand, uses a short-term lease that expires due to timeout
-after a maximum of one minute, unless explicitly renewed by the client.
-The fundamental difference is that an NQNFS client must keep renewing
-a lease to use cached data whereas a Sprite client assumes the data is valid until canceled
-by the server
-or the file is closed.
-Using leases permits the server to remain "stateless," since the soft
-state information, which consists of the set of current leases, is
-moot after one minute, when all the leases expire.
-.pp
-Whenever a client wishes to access a file's data it must hold one of
-three types of lease: read-caching, write-caching or non-caching.
-The last type requires that all file operations be done synchronously with
-the server via the appropriate RPCs.
-.pp
-A read-caching lease allows for client data caching but no modifications
-may be done.
-It may, however, be shared between multiple clients. Diagram 1 shows a typical
-read-caching scenario. The vertical solid black lines depict the lease records.
-Note that the time lines are nowhere near to scale, since a client/server
-interaction will normally take less than one hundred milliseconds, whereas the
-normal lease duration is thirty seconds.
-Every lease includes a \fImodrev\fR value, which changes upon every modification
-of the file. It may be used to check to see if data cached on the client is
-still current.
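The modrev check can be sketched as follows; this is a hypothetical client-side routine whose structure and names are not taken from the implementation. When a new lease arrives, cached blocks stay valid only if the file's modrev is unchanged since the data was cached.

```c
#include <assert.h>
#include <stdint.h>

/* Client-side cache state for one file; names are illustrative. */
struct client_cache {
    uint64_t modrev;     /* modrev of the data currently cached */
    int      nblocks;    /* number of cached blocks for the file */
};

/* Revalidate the cache against the modrev returned with a new lease.
 * Returns the number of blocks kept: all of them when the modrev
 * matches, none when the file was modified on the server. */
int cache_revalidate(struct client_cache *c, uint64_t server_modrev)
{
    if (c->modrev == server_modrev)
        return c->nblocks;           /* data still current: keep the cache */
    c->nblocks = 0;                  /* stale: throw away cached data */
    c->modrev = server_modrev;
    return 0;
}
```

This is what lets Client A in Diagram 1 keep serving read syscalls from its cache across a lease timeout and renewal, so long as no other client has written the file.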
-.pp
-A write-caching lease permits delayed write caching,
-but requires that all data be pushed to the server when the lease expires
-or is terminated by an eviction callback.
-When a write-caching lease has almost expired, the client will attempt to
-extend the lease if the file is still open, but is required to push the delayed writes to the server
-if renewal fails (as depicted by diagram 2).
-The writes may not arrive at the server until after the write lease has
-expired on the client, but this does not result in a consistency problem,
-so long as the write lease is still valid on the server.
-Note that, in diagram 2, the lease record on the server remains current after
-the expiry time, due to the conditions mentioned in section 5.
-If a write RPC is done on the server after the write lease has expired on
-the server, this could be considered an error since consistency could be
-lost, but it is not handled as such by NQNFS.
-.pp
-Diagram 3 depicts how read and write leases are replaced by a non-caching
-lease when there is the potential for write sharing.
-.(z
-.sp
-.PS
-.ps
-.ps 50
-line from 0.738,5.388 to 1.238,5.388
-.ps
-.ps 10
-dashwid = 0.050i
-line dashed from 1.488,10.075 to 1.488,5.450
-line dashed from 2.987,10.075 to 2.987,5.450
-line dashed from 4.487,10.075 to 4.487,5.450
-.ps
-.ps 50
-line from 4.487,7.013 to 4.487,5.950
-line from 2.987,7.700 to 2.987,5.950 to 2.987,6.075
-line from 1.488,7.513 to 1.488,5.950
-line from 2.987,9.700 to 2.987,8.325
-line from 1.488,9.450 to 1.488,8.325
-.ps
-.ps 10
-line from 2.987,6.450 to 4.487,6.200
-line from 4.385,6.192 to 4.487,6.200 to 4.393,6.241
-line from 4.487,6.888 to 2.987,6.575
-line from 3.080,6.620 to 2.987,6.575 to 3.090,6.571
-line from 2.987,7.263 to 4.487,7.013
-line from 4.385,7.004 to 4.487,7.013 to 4.393,7.054
-line from 4.487,7.638 to 2.987,7.388
-line from 3.082,7.429 to 2.987,7.388 to 3.090,7.379
-line from 2.987,6.888 to 1.488,6.575
-line from 1.580,6.620 to 1.488,6.575 to 1.590,6.571
-line from 1.488,7.200 to 2.987,6.950
-line from 2.885,6.942 to 2.987,6.950 to 2.893,6.991
-line from 2.987,7.700 to 1.488,7.513
-line from 1.584,7.550 to 1.488,7.513 to 1.590,7.500
-line from 1.488,8.012 to 2.987,7.763
-line from 2.885,7.754 to 2.987,7.763 to 2.893,7.804
-line from 2.987,9.012 to 1.488,8.825
-line from 1.584,8.862 to 1.488,8.825 to 1.590,8.813
-line from 1.488,9.325 to 2.987,9.137
-line from 2.885,9.125 to 2.987,9.137 to 2.891,9.175
-line from 2.987,9.637 to 1.488,9.450
-line from 1.584,9.487 to 1.488,9.450 to 1.590,9.438
-line from 1.488,9.887 to 2.987,9.700
-line from 2.885,9.688 to 2.987,9.700 to 2.891,9.737
-.ps
-.ps 12
-.ft
-.ft R
-"Lease valid on machine" at 1.363,5.296 ljust
-"with same modrev" at 1.675,7.421 ljust
-"miss)" at 2.612,9.233 ljust
-"(cache" at 2.300,9.358 ljust
-.ps
-.ps 14
-"Diagram #1: Read Caching Leases" at 0.738,5.114 ljust
-"Client B" at 4.112,10.176 ljust
-"Server" at 2.612,10.176 ljust
-"Client A" at 0.925,10.176 ljust
-.ps
-.ps 12
-"from cache" at 4.675,6.546 ljust
-"Read syscalls" at 4.675,6.796 ljust
-"Reply" at 3.737,6.108 ljust
-"(cache miss)" at 3.675,6.421 ljust
-"Read req" at 3.737,6.608 ljust
-"to lease" at 3.112,6.796 ljust
-"Client B added" at 3.112,6.983 ljust
-"Reply" at 3.237,7.296 ljust
-"Read + lease req" at 3.175,7.671 ljust
-"Read syscall" at 4.675,7.608 ljust
-"Reply" at 1.675,6.796 ljust
-"miss)" at 2.487,7.108 ljust
-"Read req (cache" at 1.675,7.233 ljust
-"from cache" at 0.425,6.296 ljust
-"Read syscalls" at 0.425,6.546 ljust
-"cache" at 0.425,6.858 ljust
-"so can still" at 0.425,7.108 ljust
-"Modrev same" at 0.425,7.358 ljust
-"Reply" at 1.675,7.671 ljust
-"Get lease req" at 1.675,8.108 ljust
-"Read syscall" at 0.425,7.983 ljust
-"Lease times out" at 0.425,8.296 ljust
-"from cache" at 0.425,9.046 ljust
-"Read syscalls" at 0.425,9.296 ljust
-"for Client A" at 3.112,9.296 ljust
-"Read caching lease" at 3.112,9.483 ljust
-"Reply" at 1.675,8.983 ljust
-"Read req" at 1.675,9.358 ljust
-"Reply" at 1.675,9.608 ljust
-"Read + lease req" at 1.675,9.921 ljust
-"Read syscall" at 0.425,9.921 ljust
-.ps
-.ft
-.PE
-.sp
-.)z
-.(z
-.sp
-.PS
-.ps
-.ps 50
-line from 1.175,5.700 to 1.300,5.700
-line from 0.738,5.700 to 1.175,5.700
-line from 2.987,6.638 to 2.987,6.075
-.ps
-.ps 10
-dashwid = 0.050i
-line dashed from 2.987,6.575 to 2.987,5.950
-line dashed from 1.488,6.575 to 1.488,5.888
-.ps
-.ps 50
-line from 2.987,9.762 to 2.987,6.638
-line from 1.488,9.450 to 1.488,7.700
-.ps
-.ps 10
-line from 2.987,6.763 to 1.488,6.575
-line from 1.584,6.612 to 1.488,6.575 to 1.590,6.563
-line from 1.488,7.013 to 2.987,6.825
-line from 2.885,6.813 to 2.987,6.825 to 2.891,6.862
-line from 2.987,7.325 to 1.488,7.075
-line from 1.582,7.116 to 1.488,7.075 to 1.590,7.067
-line from 1.488,7.700 to 2.987,7.388
-line from 2.885,7.383 to 2.987,7.388 to 2.895,7.432
-line from 2.987,8.575 to 1.488,8.325
-line from 1.582,8.366 to 1.488,8.325 to 1.590,8.317
-line from 1.488,8.887 to 2.987,8.637
-line from 2.885,8.629 to 2.987,8.637 to 2.893,8.679
-line from 2.987,9.637 to 1.488,9.450
-line from 1.584,9.487 to 1.488,9.450 to 1.590,9.438
-line from 1.488,9.887 to 2.987,9.762
-line from 2.886,9.746 to 2.987,9.762 to 2.890,9.796
-line dashed from 2.987,10.012 to 2.987,6.513
-line dashed from 1.488,10.012 to 1.488,6.513
-.ps
-.ps 12
-.ft
-.ft R
-"write" at 4.237,5.921 ljust
-"Lease valid on machine" at 1.425,5.733 ljust
-.ps
-.ps 14
-"Diagram #2: Write Caching Lease" at 0.738,5.551 ljust
-"Server" at 2.675,10.114 ljust
-"Client A" at 1.113,10.114 ljust
-.ps
-.ps 12
-"seconds after last" at 3.112,5.921 ljust
-"Expires write_slack" at 3.112,6.108 ljust
-"due to write activity" at 3.112,6.608 ljust
-"Expiry delayed" at 3.112,6.796 ljust
-"Lease times out" at 3.112,7.233 ljust
-"Lease renewed" at 3.175,8.546 ljust
-"Lease for client A" at 3.175,9.358 ljust
-"Write caching" at 3.175,9.608 ljust
-"Reply" at 1.675,6.733 ljust
-"Write req" at 1.988,7.046 ljust
-"Reply" at 1.675,7.233 ljust
-"Write req" at 1.675,7.796 ljust
-"Lease expires" at 0.487,7.733 ljust
-"Close syscall" at 0.487,8.108 ljust
-"lease granted" at 1.675,8.546 ljust
-"Get write lease" at 1.675,8.921 ljust
-"before expiry" at 0.487,8.608 ljust
-"Lease renewal" at 0.487,8.796 ljust
-"syscalls" at 0.487,9.046 ljust
-"Delayed write" at 0.487,9.233 ljust
-"lease granted" at 1.675,9.608 ljust
-"Get write lease req" at 1.675,9.921 ljust
-"Write syscall" at 0.487,9.858 ljust
-.ps
-.ft
-.PE
-.sp
-.)z
-.(z
-.sp
-.PS
-.ps
-.ps 50
-line from 0.613,2.638 to 1.238,2.638
-line from 1.488,4.075 to 1.488,3.638
-line from 2.987,4.013 to 2.987,3.575
-line from 4.487,4.013 to 4.487,3.575
-.ps
-.ps 10
-line from 2.987,3.888 to 4.487,3.700
-line from 4.385,3.688 to 4.487,3.700 to 4.391,3.737
-line from 4.487,4.138 to 2.987,3.950
-line from 3.084,3.987 to 2.987,3.950 to 3.090,3.938
-line from 2.987,4.763 to 4.487,4.450
-line from 4.385,4.446 to 4.487,4.450 to 4.395,4.495
-.ps
-.ps 50
-line from 4.487,4.438 to 4.487,4.013
-.ps
-.ps 10
-line from 4.487,5.138 to 2.987,4.888
-line from 3.082,4.929 to 2.987,4.888 to 3.090,4.879
-.ps
-.ps 50
-line from 4.487,6.513 to 4.487,5.513
-line from 4.487,6.513 to 4.487,6.513 to 4.487,5.513
-line from 2.987,5.450 to 2.987,5.200
-line from 1.488,5.075 to 1.488,4.075
-line from 2.987,5.263 to 2.987,4.013
-line from 2.987,7.700 to 2.987,5.325
-line from 4.487,7.575 to 4.487,6.513
-line from 1.488,8.512 to 1.488,8.075
-line from 2.987,8.637 to 2.987,8.075
-line from 2.987,9.637 to 2.987,8.825
-line from 1.488,9.450 to 1.488,8.950
-.ps
-.ps 10
-line from 2.987,4.450 to 1.488,4.263
-line from 1.584,4.300 to 1.488,4.263 to 1.590,4.250
-line from 1.488,4.888 to 2.987,4.575
-line from 2.885,4.571 to 2.987,4.575 to 2.895,4.620
-line from 2.987,5.263 to 1.488,5.075
-line from 1.584,5.112 to 1.488,5.075 to 1.590,5.063
-line from 4.487,5.513 to 2.987,5.325
-line from 3.084,5.362 to 2.987,5.325 to 3.090,5.313
-line from 2.987,5.700 to 4.487,5.575
-line from 4.386,5.558 to 4.487,5.575 to 4.390,5.608
-line from 4.487,6.013 to 2.987,5.825
-line from 3.084,5.862 to 2.987,5.825 to 3.090,5.813
-line from 2.987,6.200 to 4.487,6.075
-line from 4.386,6.058 to 4.487,6.075 to 4.390,6.108
-line from 4.487,6.450 to 2.987,6.263
-line from 3.084,6.300 to 2.987,6.263 to 3.090,6.250
-line from 2.987,6.700 to 4.487,6.513
-line from 4.385,6.500 to 4.487,6.513 to 4.391,6.550
-line from 1.488,6.950 to 2.987,6.763
-line from 2.885,6.750 to 2.987,6.763 to 2.891,6.800
-line from 2.987,7.700 to 4.487,7.575
-line from 4.386,7.558 to 4.487,7.575 to 4.390,7.608
-line from 4.487,7.950 to 2.987,7.763
-line from 3.084,7.800 to 2.987,7.763 to 3.090,7.750
-line from 2.987,8.637 to 1.488,8.512
-line from 1.585,8.546 to 1.488,8.512 to 1.589,8.496
-line from 1.488,8.887 to 2.987,8.700
-line from 2.885,8.688 to 2.987,8.700 to 2.891,8.737
-line from 2.987,9.637 to 1.488,9.450
-line from 1.584,9.487 to 1.488,9.450 to 1.590,9.438
-line from 1.488,9.950 to 2.987,9.762
-line from 2.885,9.750 to 2.987,9.762 to 2.891,9.800
-dashwid = 0.050i
-line dashed from 4.487,10.137 to 4.487,2.825
-line dashed from 2.987,10.137 to 2.987,2.825
-line dashed from 1.488,10.137 to 1.488,2.825
-.ps
-.ps 12
-.ft
-.ft R
-"(not cached)" at 4.612,3.858 ljust
-.ps
-.ps 14
-"Diagram #3: Write sharing case" at 0.613,2.239 ljust
-.ps
-.ps 12
-"Write syscall" at 4.675,7.546 ljust
-"Read syscall" at 0.550,9.921 ljust
-.ps
-.ps 14
-"Lease valid on machine" at 1.363,2.551 ljust
-.ps
-.ps 12
-"(can still cache)" at 1.675,8.171 ljust
-"Reply" at 3.800,3.858 ljust
-"Write" at 3.175,4.046 ljust
-"writes" at 4.612,4.046 ljust
-"synchronous" at 4.612,4.233 ljust
-"write syscall" at 4.675,5.108 ljust
-"non-caching lease" at 3.175,4.296 ljust
-"Reply " at 3.175,4.483 ljust
-"req" at 3.175,4.983 ljust
-"Get write lease" at 3.175,5.108 ljust
-"Vacated msg" at 3.175,5.483 ljust
-"to the server" at 4.675,5.858 ljust
-"being flushed to" at 4.675,6.046 ljust
-"Delayed writes" at 4.675,6.233 ljust
-.ps
-.ps 16
-"Server" at 2.675,10.182 ljust
-"Client B" at 3.925,10.182 ljust
-"Client A" at 0.863,10.182 ljust
-.ps
-.ps 12
-"(not cached)" at 0.550,4.733 ljust
-"Read data" at 0.550,4.921 ljust
-"Reply data" at 1.675,4.421 ljust
-"Read request" at 1.675,4.921 ljust
-"lease" at 1.675,5.233 ljust
-"Reply non-caching" at 1.675,5.421 ljust
-"Reply" at 3.737,5.733 ljust
-"Write" at 3.175,5.983 ljust
-"Reply" at 3.737,6.171 ljust
-"Write" at 3.175,6.421 ljust
-"Eviction Notice" at 3.175,6.796 ljust
-"Get read lease" at 1.675,7.046 ljust
-"Read syscall" at 0.550,6.983 ljust
-"being cached" at 4.675,7.171 ljust
-"Delayed writes" at 4.675,7.358 ljust
-"lease" at 3.175,7.233 ljust
-"Reply write caching" at 3.175,7.421 ljust
-"Get write lease" at 3.175,7.983 ljust
-"Write syscall" at 4.675,7.983 ljust
-"with same modrev" at 1.675,8.358 ljust
-"Lease" at 0.550,8.171 ljust
-"Renewed" at 0.550,8.358 ljust
-"Reply" at 1.675,8.608 ljust
-"Get Lease Request" at 1.675,8.983 ljust
-"Read syscall" at 0.550,8.733 ljust
-"from cache" at 0.550,9.108 ljust
-"Read syscall" at 0.550,9.296 ljust
-"Reply " at 1.675,9.671 ljust
-"plus lease" at 2.050,9.983 ljust
-"Read Request" at 1.675,10.108 ljust
-.ps
-.ft
-.PE
-.sp
-.)z
-A write-caching lease is not used in the Stanford V Distributed System [Gray89],
-since synchronous writing is always used there. A side effect of adding this
-lease type is that the five to ten second lease duration recommended by Gray
-was found to be insufficient to achieve good performance for the write-caching lease.
-Experimentation showed that thirty seconds was about optimal for cases where
-the client and server are connected to the same local area network, so
-thirty seconds is the default lease duration for NQNFS.
-A maximum of twice that value is permitted, since Gray showed that for some
-network topologies, a larger lease duration functions better.
-Although there is an explicit get_lease RPC defined for the protocol,
-most lease requests are piggybacked onto the other RPCs to minimize the
-additional overhead introduced by leasing.
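Piggybacking can be sketched as a small decision made while building each RPC (the structure and names below are assumptions; the actual NQNFS wire format differs): the lease request rides along only when the client holds no current lease for the file.

```c
#include <assert.h>
#include <time.h>

enum lease_type { LEASE_NONE, LEASE_READ, LEASE_WRITE };

/* An outgoing RPC with an optional piggybacked lease request. */
struct rpc_req {
    int             op;     /* e.g. a read or write operation */
    enum lease_type want;   /* piggybacked lease request, LEASE_NONE if none */
    int             term;   /* requested lease duration, in seconds */
};

/* Attach a lease request to the RPC only when the client's lease on the
 * file has expired (or was never held), avoiding a separate get_lease RPC. */
void piggyback_lease(struct rpc_req *r, time_t lease_expiry, time_t now,
                     enum lease_type need)
{
    if (now < lease_expiry) {
        r->want = LEASE_NONE;   /* current lease held: nothing to request */
    } else {
        r->want = need;         /* expired or absent: ask with this RPC */
        r->term = 30;           /* default thirty-second term */
    }
}
```

Because leases are needed exactly when I/O RPCs are being sent anyway, the explicit get_lease RPC is the exception rather than the rule.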
-.sh 2 "Rationale"
-.pp
-Leasing was chosen over hard server state information for the following
-reasons:
-.ip 1.
-The server must maintain state information about all current
-client leases.
-Since at most one lease is allocated for each RPC and the leases expire
-after their lease term,
-the upper bound on the number of current leases is the product of the
-lease term and the server RPC rate.
-In practice, it has been observed that less than 10% of RPCs request new leases
-and since most leases have a term of thirty seconds, the following rule of
-thumb should estimate the number of server lease records:
-.sp
-.nf
- Number of Server Lease Records \(eq 0.1 * 30 * RPC rate
-.fi
-.sp
-Since each lease record occupies 64 bytes of server memory, storing the lease
-records should not be a serious problem.
-If a server has exhausted lease storage, it can simply wait a few seconds
-for a lease to expire and free up a record.
-On the other hand, a Sprite-like server must store records for all files
-currently open by all clients, which can require significant storage for
-a large, heavily loaded server.
-In [Mogul93], it is proposed that a mechanism vaguely similar to paging could be
-used to deal with this for Spritely NFS, but this
-appears to introduce a fair amount of complexity and may limit the
-usefulness of open records for storing other state information, such
-as file locks.
-.ip 2.
-After a server crashes, it would normally have to recover the lease records for
-the outstanding leases; however, if the server simply waits until all of those
-leases have expired, there is no state to recover.
-The server must wait for the maximum lease duration of one minute, and it must serve
-all outstanding write requests resulting from terminated write-caching
-leases before issuing new leases. The one minute delay can be overlapped with
-file system consistency checking (e.g., fsck).
-Because no state must be recovered, a lease-based server, like an NFS server,
-avoids the problem of state recovery after a crash.
-.sp
-There can, however, be problems during crash recovery
-because of a potentially large number of write backs due to terminated
-write-caching leases.
-One of these problems is a "recovery storm" [Baker91], which could occur when
-the server is overloaded by the number of write RPC requests.
-The NQNFS protocol deals with this by replying
-with a return status code called
-try_again_later to all
-RPC requests (except write) until the write requests subside.
-At this time, there has not been sufficient testing of server crash
-recovery while under heavy server load to determine if the try_again_later
-reply is a sufficient solution to the problem.
-The other problem is that consistency will be lost if other RPCs are performed
-before all of the write backs for terminated write-caching leases have completed.
-This is handled by only performing write RPCs until
-no write RPC requests arrive
-for write_slack seconds, where write_slack is set to several times
-the client timeout retransmit interval,
-at which time it is assumed all clients have had an opportunity to send their writes
-to the server.
-.ip 3.
-Another advantage of leasing is that leases are required at times when other I/O operations occur,
-so lease requests can almost always be piggybacked on other RPCs. This avoids some of the
-overhead associated with the explicit open and close RPCs required by a Sprite-like system.
-Compared with Sprite cache consistency,
-this can result in a significantly lower RPC load (see table #1).
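-For concreteness, the sizing rule of thumb from item 1 can be sketched as
-follows. This is a hypothetical helper, not part of the NQNFS implementation;
-the 0.1 lease fraction, thirty second term, and 64 byte record size are the
-values stated in the text above.

```python
def lease_storage_estimate(rpc_rate, lease_fraction=0.1, lease_term=30,
                           record_size=64):
    """Rule-of-thumb sizing for server lease records.

    records = (fraction of RPCs taking new leases) * lease term (s) * RPC rate
    Returns (record count, bytes of server memory for the records).
    """
    records = lease_fraction * lease_term * rpc_rate
    return records, records * record_size

# A server handling 100 RPCs/sec needs roughly 300 records (about 19 Kbytes).
count, mem = lease_storage_estimate(100)
```

-Even for a heavily loaded server, the memory bound grows only linearly with
-the RPC rate, which is the point of the argument in item 1.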
-.sh 1 "Limitations of the NQNFS Protocol"
-.pp
-There is a serious risk when leasing is used for delayed write
-caching.
-If the server is simply too busy to service a lease renewal before a write-caching
-lease terminates, the client will not be able to push the write
-data to the server before the lease has terminated, resulting in
-inconsistency.
-Note that the danger of inconsistency occurs when the server assumes that
-a write-caching lease has terminated before the client has
-had the opportunity to write the data back to the server.
-In an effort to avoid this problem, the NQNFS server does not assume that
-a write-caching lease has terminated until three conditions are met:
-.sp
-.(l
-1 - clock time > (expiry time + clock skew)
-2 - there is at least one server daemon (nfsd) waiting for an RPC request
-3 - no write RPCs received for leased file within write_slack after the corrected expiry time
-.)l
-.lp
-The first condition ensures that the lease has expired on the client.
-The clock_skew, by default three seconds, must be
-set to a value larger than the maximum time-of-day clock error that is likely to occur
-during the maximum lease duration.
-The second condition attempts to ensure that the client
-is not waiting for replies to any writes that are still queued for service by
-an nfsd. The third condition tries to guarantee that the client has
-transmitted all write requests to the server, since write_slack is set to
-several times the client's timeout retransmit interval.
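-The three conditions can be sketched as a single predicate. This is an
-illustrative sketch, not the 4.4BSD server code: the parameter names are
-invented, the three second clock_skew default comes from the text, and the
-write_slack default is an arbitrary stand-in for "several times the client's
-timeout retransmit interval".

```python
def write_lease_terminated(now, expiry, last_write_time, nfsd_waiting,
                           clock_skew=3.0, write_slack=5.0):
    """Sketch of the three conditions the NQNFS server checks before
    assuming a write-caching lease has terminated (all times in seconds)."""
    corrected_expiry = expiry + clock_skew
    cond1 = now > corrected_expiry          # 1: lease has expired on the client
    cond2 = nfsd_waiting                    # 2: an nfsd is idle, awaiting RPCs
    cond3 = (now > corrected_expiry + write_slack and
             last_write_time <= corrected_expiry)  # 3: no writes in slack window
    return cond1 and cond2 and cond3
```

-Only when all three hold does the server treat the client's cached writes as
-lost; a write arriving inside the slack window keeps the lease alive.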
-.pp
-There are also certain file system semantics that are problematic for both NFS and NQNFS,
-due to the
-lack of state information maintained by the
-server. If a file is unlinked on one client while open on another, it will
-be removed from the file server, resulting in failed file accesses on the
-client that has the file open.
-If the file system on the server is out of space or the client user's disk
-quota has been exceeded, a delayed write can fail long after the write system
-call was successfully completed.
-With NFS this error will be detected by the close system call, since
-the delayed writes are pushed upon close. With NQNFS however, the delayed write
-RPC may not occur until after the close system call, possibly even after the process
-has exited.
-Therefore,
-if a process must check for write errors,
-a system call such as \fIfsync\fR must be used.
-.pp
-Another problem occurs when a process on one client is
-running an executable file
-and a process on another client starts to write to the file. The read lease on
-the first client is terminated by the server, but the client has no recourse but
-to terminate the process, since the process is already in progress on the old
-executable.
-.pp
-The NQNFS protocol does not support file locking, since a file lock would
-require hard state information that must be recovered after a crash.
-.sh 1 "Other NQNFS Protocol Features"
-.pp
-NQNFS also includes a variety of minor modifications to the NFS protocol, in an
-attempt to address various limitations.
-The protocol uses 64-bit file sizes and offsets in order to handle large files.
-TCP transport may be used as an alternative to UDP
-for cases where UDP does not perform well.
-Transport mechanisms
-such as TCP also permit the use of much larger read/write data sizes,
-which might improve performance in certain environments.
-.pp
-The NQNFS protocol replaces the Readdir RPC with a Readdir_and_Lookup
-RPC that returns the file handle and attributes for each file in the
-directory as well as name and file id number.
-This additional information may then be loaded into the lookup and file-attribute
-caches on the client.
-Thus, for cases such as "ls -l", the \fIstat\fR system calls can be performed
-locally without doing any lookup or getattr RPCs.
-Another additional RPC is the Access RPC that checks for file
-accessibility against the server. This is necessary since in some cases the
-client user ID is mapped to a different user on the server and doing the
-access check locally on the client using file attributes and client credentials is
-not correct.
-One case where this becomes necessary is when the NQNFS mount point is using
-Kerberos authentication, where the Kerberos authentication ticket is translated
-to credentials on the server that are mapped to the client side user id.
-For further details on the protocol, see [Macklem93].
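-The way a Readdir_and_Lookup reply primes the client caches can be sketched
-as follows. This is a hypothetical illustration, not the 4.4BSD client code;
-the class and method names are invented.

```python
class ClientCaches:
    """Sketch of client lookup and file-attribute caches primed by
    Readdir_and_Lookup replies, so "ls -l" stats are satisfied locally."""

    def __init__(self):
        self.name_cache = {}    # (directory, name) -> file handle
        self.attr_cache = {}    # file handle -> attributes

    def load_readdir_and_lookup(self, directory, entries):
        # Each reply entry carries name, file id, file handle, and attributes.
        for name, fileid, fhandle, attrs in entries:
            self.name_cache[(directory, name)] = fhandle
            self.attr_cache[fhandle] = attrs

    def stat(self, directory, name):
        # Cached attributes, or None to signal that lookup/getattr RPCs
        # would still be needed.
        fh = self.name_cache.get((directory, name))
        return self.attr_cache.get(fh) if fh is not None else None
```

-With both caches loaded by a single RPC per directory block, the per-file
-lookup and getattr RPCs of a plain NFS "ls -l" are avoided.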
-.sh 1 "Performance"
-.pp
-In order to evaluate the effectiveness of the NQNFS protocol,
-a benchmark was used that was
-designed to typify
-real work on the client workstation.
-Benchmarks, such as Laddis [Wittle93], that perform server load characterization
-are not appropriate for this work, since it is primarily client caching
-efficiency that needs to be evaluated.
-Since these tests are measuring overall client system performance and
-not just the performance of the file system,
-each sequence of runs was performed on identical hardware and operating system in order to factor out the system
-components affecting performance other than the file system protocol.
-.pp
-The machines used for all of the benchmarks were members of the DECstation\(tm\(dg
-family of workstations using the MIPS\(tm\(sc RISC architecture.
-The operating system running on these systems was a pre-release version of
-4.4BSD Unix\(tm\(dd.
-For all benchmarks, the file server was a DECstation 2100 (10 MIPS) with 8Mbytes of
-memory and a local RZ23 SCSI disk (27msec average access time).
-The clients range in speed from DECstation 2100s
-to a DECstation 5000/25, and always run with six block I/O daemons
-and a 4Mbyte buffer cache, except for the test runs where the
-buffer cache size was the independent variable.
-In all cases /tmp was mounted on the local SCSI disk\**, and all machines were
-attached to the same uncongested Ethernet and ran in single user mode during the benchmarks.
-.(f
-\**Testing using the 4.4BSD MFS [McKusick90] resulted in slightly degraded performance,
-probably since the machines only had 16Mbytes of memory, and so paging
-increased.
-.)f
-Unless noted otherwise, test runs used UDP RPC transport
-and the results given are the average values of four runs.
-.pp
-The benchmark used is the Modified Andrew Benchmark (MAB)
-[Ousterhout90],
-which is a slightly modified version of the benchmark used to characterize
-performance of the Andrew ITC file system [Howard88].
-The MAB was set up with the executable binaries in the remote mounted file
-system and the final load step was commented out, due to a linkage problem
-during testing under 4.4BSD.
-Therefore, these results are not directly comparable to other reported MAB
-results.
-The MAB is made up of five distinct phases:
-.sp
-.ip "1." 10
-Make five directories (no significant cost)
-.ip "2." 10
-Copy a file system subtree to a working directory
-.ip "3." 10
-Get file attributes (stat) of all the working files
-.ip "4." 10
-Search for strings (grep) in the files
-.ip "5." 10
-Compile a library of C sources and archive them
-.lp
-Of the five phases, the fifth is by far the largest and is the one affected most
-by client caching mechanisms.
-The results for phase #1 are invariant over all
-the caching mechanisms.
-.sh 2 "Buffer Cache Size Tests"
-.pp
-The first experiment was done to see what effect changing the size of the
-buffer cache would have on client performance. A single DECstation 5000/25
-was used to do a series of runs of MAB with different buffer cache sizes
-for four variations of the file system protocol. The four variations are
-as follows:
-.ip "Case 1:" 10
-NFS - The NFS protocol as implemented in 4.4BSD
-.ip "Case 2:" 10
-Leases - The NQNFS protocol using leases for cache consistency
-.ip "Case 3:" 10
-Leases, Rdirlookup - The NQNFS protocol using leases for cache consistency
-and with the readdir RPC replaced by Readdir_and_Lookup
-.ip "Case 4:" 10
-Leases, Attrib leases, Rdirlookup - The NQNFS protocol using leases for
-cache consistency, with the readdir
-RPC replaced by the Readdir_and_Lookup,
-and requiring a valid lease not only for file-data access, but also for file-attribute access.
-.lp
-As can be seen in figure 1, the buffer cache achieves near-optimal
-performance for sizes in the range of two to ten megabytes. At eleven
-megabytes, the system paged heavily and the runs did not
-complete in a reasonable time. Even at 64Kbytes, the buffer cache improves
-performance over no buffer cache by a significant margin of 136-148 seconds
-versus 239 seconds.
-This may be due, in part, to the fact that the Compile Phase of the MAB
-uses a rather small working set of file data.
-All variants of NQNFS achieve about
-the same performance, running around 30% faster than NFS, with a slightly
-larger difference for large buffer cache sizes.
-Based on these results, all remaining tests were run with the buffer cache
-size set to 4Mbytes.
-Although I do not know what causes the local peak in the curves between 0.5 and 2 megabytes,
-there is some indication that contention for buffer cache blocks, between the update process
-(which pushes delayed writes to the server every thirty seconds) and the I/O
-system calls, may be involved.
-.(z
-.PS
-.ps
-.ps 10
-dashwid = 0.050i
-line dashed from 0.900,7.888 to 4.787,7.888
-line dashed from 0.900,7.888 to 0.900,10.262
-line from 0.900,7.888 to 0.963,7.888
-line from 4.787,7.888 to 4.725,7.888
-line from 0.900,8.188 to 0.963,8.188
-line from 4.787,8.188 to 4.725,8.188
-line from 0.900,8.488 to 0.963,8.488
-line from 4.787,8.488 to 4.725,8.488
-line from 0.900,8.775 to 0.963,8.775
-line from 4.787,8.775 to 4.725,8.775
-line from 0.900,9.075 to 0.963,9.075
-line from 4.787,9.075 to 4.725,9.075
-line from 0.900,9.375 to 0.963,9.375
-line from 4.787,9.375 to 4.725,9.375
-line from 0.900,9.675 to 0.963,9.675
-line from 4.787,9.675 to 4.725,9.675
-line from 0.900,9.963 to 0.963,9.963
-line from 4.787,9.963 to 4.725,9.963
-line from 0.900,10.262 to 0.963,10.262
-line from 4.787,10.262 to 4.725,10.262
-line from 0.900,7.888 to 0.900,7.950
-line from 0.900,10.262 to 0.900,10.200
-line from 1.613,7.888 to 1.613,7.950
-line from 1.613,10.262 to 1.613,10.200
-line from 2.312,7.888 to 2.312,7.950
-line from 2.312,10.262 to 2.312,10.200
-line from 3.025,7.888 to 3.025,7.950
-line from 3.025,10.262 to 3.025,10.200
-line from 3.725,7.888 to 3.725,7.950
-line from 3.725,10.262 to 3.725,10.200
-line from 4.438,7.888 to 4.438,7.950
-line from 4.438,10.262 to 4.438,10.200
-line from 0.900,7.888 to 4.787,7.888
-line from 4.787,7.888 to 4.787,10.262
-line from 4.787,10.262 to 0.900,10.262
-line from 0.900,10.262 to 0.900,7.888
-line from 3.800,8.775 to 4.025,8.775
-line from 0.925,10.088 to 0.925,10.088
-line from 0.925,10.088 to 0.938,9.812
-line from 0.938,9.812 to 0.988,9.825
-line from 0.988,9.825 to 1.075,9.838
-line from 1.075,9.838 to 1.163,9.938
-line from 1.163,9.938 to 1.250,9.838
-line from 1.250,9.838 to 1.613,9.825
-line from 1.613,9.825 to 2.312,9.750
-line from 2.312,9.750 to 3.025,9.713
-line from 3.025,9.713 to 3.725,9.850
-line from 3.725,9.850 to 4.438,9.875
-dashwid = 0.037i
-line dotted from 3.800,8.625 to 4.025,8.625
-line dotted from 0.925,9.912 to 0.925,9.912
-line dotted from 0.925,9.912 to 0.938,9.887
-line dotted from 0.938,9.887 to 0.988,9.713
-line dotted from 0.988,9.713 to 1.075,9.562
-line dotted from 1.075,9.562 to 1.163,9.562
-line dotted from 1.163,9.562 to 1.250,9.562
-line dotted from 1.250,9.562 to 1.613,9.675
-line dotted from 1.613,9.675 to 2.312,9.363
-line dotted from 2.312,9.363 to 3.025,9.375
-line dotted from 3.025,9.375 to 3.725,9.387
-line dotted from 3.725,9.387 to 4.438,9.450
-line dashed from 3.800,8.475 to 4.025,8.475
-line dashed from 0.925,10.000 to 0.925,10.000
-line dashed from 0.925,10.000 to 0.938,9.787
-line dashed from 0.938,9.787 to 0.988,9.650
-line dashed from 0.988,9.650 to 1.075,9.537
-line dashed from 1.075,9.537 to 1.163,9.613
-line dashed from 1.163,9.613 to 1.250,9.800
-line dashed from 1.250,9.800 to 1.613,9.488
-line dashed from 1.613,9.488 to 2.312,9.375
-line dashed from 2.312,9.375 to 3.025,9.363
-line dashed from 3.025,9.363 to 3.725,9.325
-line dashed from 3.725,9.325 to 4.438,9.438
-dashwid = 0.075i
-line dotted from 3.800,8.325 to 4.025,8.325
-line dotted from 0.925,9.963 to 0.925,9.963
-line dotted from 0.925,9.963 to 0.938,9.750
-line dotted from 0.938,9.750 to 0.988,9.662
-line dotted from 0.988,9.662 to 1.075,9.613
-line dotted from 1.075,9.613 to 1.163,9.613
-line dotted from 1.163,9.613 to 1.250,9.700
-line dotted from 1.250,9.700 to 1.613,9.438
-line dotted from 1.613,9.438 to 2.312,9.463
-line dotted from 2.312,9.463 to 3.025,9.312
-line dotted from 3.025,9.312 to 3.725,9.387
-line dotted from 3.725,9.387 to 4.438,9.425
-.ps
-.ps -1
-.ft
-.ft I
-"0" at 0.825,7.810 rjust
-"20" at 0.825,8.110 rjust
-"40" at 0.825,8.410 rjust
-"60" at 0.825,8.697 rjust
-"80" at 0.825,8.997 rjust
-"100" at 0.825,9.297 rjust
-"120" at 0.825,9.597 rjust
-"140" at 0.825,9.885 rjust
-"160" at 0.825,10.185 rjust
-"0" at 0.900,7.660
-"2" at 1.613,7.660
-"4" at 2.312,7.660
-"6" at 3.025,7.660
-"8" at 3.725,7.660
-"10" at 4.438,7.660
-"Time (sec)" at 0.150,8.997
-"Buffer Cache Size (MBytes)" at 2.837,7.510
-"Figure #1: MAB Phase 5 (compile)" at 2.837,10.335
-"NFS" at 3.725,8.697 rjust
-"Leases" at 3.725,8.547 rjust
-"Leases, Rdirlookup" at 3.725,8.397 rjust
-"Leases, Attrib leases, Rdirlookup" at 3.725,8.247 rjust
-.ps
-.ft
-.PE
-.)z
-.sh 2 "Multiple Client Load Tests"
-.pp
-During preliminary runs of the MAB, it was observed that the server RPC
-counts were reduced significantly by NQNFS as compared to NFS (table 1).
-(Spritely NFS and Ultrix\(tm4.3/NFS numbers were taken from [Mogul93]
-and are not directly comparable, due to numerous differences in the
-experimental setup including deletion of the load step from phase 5.)
-This suggests
-that the NQNFS protocol might scale better with
-respect to the number of clients accessing the server.
-The experiment described in this section
-ran the MAB on from one to ten clients concurrently, to observe the
-effects of heavier server load.
-The clients were started at roughly the same time by pressing all the
-<return> keys together and, although not synchronized beyond that point,
-all clients would finish the test run within about two seconds of each
-other.
-This was not a realistic load of N active clients, but it did
-result in a reproducible increasing client load on the server.
-The results for the four variants
-are plotted in figures 2-5.
-.(z
-.ps -1
-.TS
-box, center;
-c s s s s s s s
-c c c c c c c c
-l | n n n n n n n.
-Table #1: MAB RPC Counts
-RPC Getattr Read Write Lookup Other GetLease/Open-Close Total
-_
-BSD/NQNFS 277 139 306 575 294 127 1718
-BSD/NFS 1210 506 451 489 238 0 2894
-Spritely NFS 259 836 192 535 306 1467 3595
-Ultrix4.3/NFS 1225 1186 476 810 305 0 4002
-.TE
-.ps
-.)z
-.pp
-For the MAB benchmark, the NQNFS protocol reduces the RPC counts significantly,
-but with a minimum of extra overhead (the GetLease/Open-Close count).
-.(z
-.PS
-.ps
-.ps 10
-dashwid = 0.050i
-line dashed from 0.900,7.888 to 4.787,7.888
-line dashed from 0.900,7.888 to 0.900,10.262
-line from 0.900,7.888 to 0.963,7.888
-line from 4.787,7.888 to 4.725,7.888
-line from 0.900,8.225 to 0.963,8.225
-line from 4.787,8.225 to 4.725,8.225
-line from 0.900,8.562 to 0.963,8.562
-line from 4.787,8.562 to 4.725,8.562
-line from 0.900,8.900 to 0.963,8.900
-line from 4.787,8.900 to 4.725,8.900
-line from 0.900,9.250 to 0.963,9.250
-line from 4.787,9.250 to 4.725,9.250
-line from 0.900,9.588 to 0.963,9.588
-line from 4.787,9.588 to 4.725,9.588
-line from 0.900,9.925 to 0.963,9.925
-line from 4.787,9.925 to 4.725,9.925
-line from 0.900,10.262 to 0.963,10.262
-line from 4.787,10.262 to 4.725,10.262
-line from 0.900,7.888 to 0.900,7.950
-line from 0.900,10.262 to 0.900,10.200
-line from 1.613,7.888 to 1.613,7.950
-line from 1.613,10.262 to 1.613,10.200
-line from 2.312,7.888 to 2.312,7.950
-line from 2.312,10.262 to 2.312,10.200
-line from 3.025,7.888 to 3.025,7.950
-line from 3.025,10.262 to 3.025,10.200
-line from 3.725,7.888 to 3.725,7.950
-line from 3.725,10.262 to 3.725,10.200
-line from 4.438,7.888 to 4.438,7.950
-line from 4.438,10.262 to 4.438,10.200
-line from 0.900,7.888 to 4.787,7.888
-line from 4.787,7.888 to 4.787,10.262
-line from 4.787,10.262 to 0.900,10.262
-line from 0.900,10.262 to 0.900,7.888
-line from 3.800,8.900 to 4.025,8.900
-line from 1.250,8.325 to 1.250,8.325
-line from 1.250,8.325 to 1.613,8.500
-line from 1.613,8.500 to 2.312,8.825
-line from 2.312,8.825 to 3.025,9.175
-line from 3.025,9.175 to 3.725,9.613
-line from 3.725,9.613 to 4.438,10.012
-dashwid = 0.037i
-line dotted from 3.800,8.750 to 4.025,8.750
-line dotted from 1.250,8.275 to 1.250,8.275
-line dotted from 1.250,8.275 to 1.613,8.412
-line dotted from 1.613,8.412 to 2.312,8.562
-line dotted from 2.312,8.562 to 3.025,9.088
-line dotted from 3.025,9.088 to 3.725,9.375
-line dotted from 3.725,9.375 to 4.438,10.000
-line dashed from 3.800,8.600 to 4.025,8.600
-line dashed from 1.250,8.250 to 1.250,8.250
-line dashed from 1.250,8.250 to 1.613,8.438
-line dashed from 1.613,8.438 to 2.312,8.637
-line dashed from 2.312,8.637 to 3.025,9.088
-line dashed from 3.025,9.088 to 3.725,9.525
-line dashed from 3.725,9.525 to 4.438,10.075
-dashwid = 0.075i
-line dotted from 3.800,8.450 to 4.025,8.450
-line dotted from 1.250,8.262 to 1.250,8.262
-line dotted from 1.250,8.262 to 1.613,8.425
-line dotted from 1.613,8.425 to 2.312,8.613
-line dotted from 2.312,8.613 to 3.025,9.137
-line dotted from 3.025,9.137 to 3.725,9.512
-line dotted from 3.725,9.512 to 4.438,9.988
-.ps
-.ps -1
-.ft
-.ft I
-"0" at 0.825,7.810 rjust
-"20" at 0.825,8.147 rjust
-"40" at 0.825,8.485 rjust
-"60" at 0.825,8.822 rjust
-"80" at 0.825,9.172 rjust
-"100" at 0.825,9.510 rjust
-"120" at 0.825,9.847 rjust
-"140" at 0.825,10.185 rjust
-"0" at 0.900,7.660
-"2" at 1.613,7.660
-"4" at 2.312,7.660
-"6" at 3.025,7.660
-"8" at 3.725,7.660
-"10" at 4.438,7.660
-"Time (sec)" at 0.150,8.997
-"Number of Clients" at 2.837,7.510
-"Figure #2: MAB Phase 2 (copying)" at 2.837,10.335
-"NFS" at 3.725,8.822 rjust
-"Leases" at 3.725,8.672 rjust
-"Leases, Rdirlookup" at 3.725,8.522 rjust
-"Leases, Attrib leases, Rdirlookup" at 3.725,8.372 rjust
-.ps
-.ft
-.PE
-.)z
-.(z
-.PS
-.ps
-.ps 10
-dashwid = 0.050i
-line dashed from 0.900,7.888 to 4.787,7.888
-line dashed from 0.900,7.888 to 0.900,10.262
-line from 0.900,7.888 to 0.963,7.888
-line from 4.787,7.888 to 4.725,7.888
-line from 0.900,8.188 to 0.963,8.188
-line from 4.787,8.188 to 4.725,8.188
-line from 0.900,8.488 to 0.963,8.488
-line from 4.787,8.488 to 4.725,8.488
-line from 0.900,8.775 to 0.963,8.775
-line from 4.787,8.775 to 4.725,8.775
-line from 0.900,9.075 to 0.963,9.075
-line from 4.787,9.075 to 4.725,9.075
-line from 0.900,9.375 to 0.963,9.375
-line from 4.787,9.375 to 4.725,9.375
-line from 0.900,9.675 to 0.963,9.675
-line from 4.787,9.675 to 4.725,9.675
-line from 0.900,9.963 to 0.963,9.963
-line from 4.787,9.963 to 4.725,9.963
-line from 0.900,10.262 to 0.963,10.262
-line from 4.787,10.262 to 4.725,10.262
-line from 0.900,7.888 to 0.900,7.950
-line from 0.900,10.262 to 0.900,10.200
-line from 1.613,7.888 to 1.613,7.950
-line from 1.613,10.262 to 1.613,10.200
-line from 2.312,7.888 to 2.312,7.950
-line from 2.312,10.262 to 2.312,10.200
-line from 3.025,7.888 to 3.025,7.950
-line from 3.025,10.262 to 3.025,10.200
-line from 3.725,7.888 to 3.725,7.950
-line from 3.725,10.262 to 3.725,10.200
-line from 4.438,7.888 to 4.438,7.950
-line from 4.438,10.262 to 4.438,10.200
-line from 0.900,7.888 to 4.787,7.888
-line from 4.787,7.888 to 4.787,10.262
-line from 4.787,10.262 to 0.900,10.262
-line from 0.900,10.262 to 0.900,7.888
-line from 3.800,8.775 to 4.025,8.775
-line from 1.250,8.975 to 1.250,8.975
-line from 1.250,8.975 to 1.613,8.963
-line from 1.613,8.963 to 2.312,8.988
-line from 2.312,8.988 to 3.025,9.037
-line from 3.025,9.037 to 3.725,9.062
-line from 3.725,9.062 to 4.438,9.100
-dashwid = 0.037i
-line dotted from 3.800,8.625 to 4.025,8.625
-line dotted from 1.250,9.312 to 1.250,9.312
-line dotted from 1.250,9.312 to 1.613,9.287
-line dotted from 1.613,9.287 to 2.312,9.675
-line dotted from 2.312,9.675 to 3.025,9.262
-line dotted from 3.025,9.262 to 3.725,9.738
-line dotted from 3.725,9.738 to 4.438,9.512
-line dashed from 3.800,8.475 to 4.025,8.475
-line dashed from 1.250,9.400 to 1.250,9.400
-line dashed from 1.250,9.400 to 1.613,9.287
-line dashed from 1.613,9.287 to 2.312,9.575
-line dashed from 2.312,9.575 to 3.025,9.300
-line dashed from 3.025,9.300 to 3.725,9.613
-line dashed from 3.725,9.613 to 4.438,9.512
-dashwid = 0.075i
-line dotted from 3.800,8.325 to 4.025,8.325
-line dotted from 1.250,9.400 to 1.250,9.400
-line dotted from 1.250,9.400 to 1.613,9.412
-line dotted from 1.613,9.412 to 2.312,9.700
-line dotted from 2.312,9.700 to 3.025,9.537
-line dotted from 3.025,9.537 to 3.725,9.938
-line dotted from 3.725,9.938 to 4.438,9.812
-.ps
-.ps -1
-.ft
-.ft I
-"0" at 0.825,7.810 rjust
-"5" at 0.825,8.110 rjust
-"10" at 0.825,8.410 rjust
-"15" at 0.825,8.697 rjust
-"20" at 0.825,8.997 rjust
-"25" at 0.825,9.297 rjust
-"30" at 0.825,9.597 rjust
-"35" at 0.825,9.885 rjust
-"40" at 0.825,10.185 rjust
-"0" at 0.900,7.660
-"2" at 1.613,7.660
-"4" at 2.312,7.660
-"6" at 3.025,7.660
-"8" at 3.725,7.660
-"10" at 4.438,7.660
-"Time (sec)" at 0.150,8.997
-"Number of Clients" at 2.837,7.510
-"Figure #3: MAB Phase 3 (stat/find)" at 2.837,10.335
-"NFS" at 3.725,8.697 rjust
-"Leases" at 3.725,8.547 rjust
-"Leases, Rdirlookup" at 3.725,8.397 rjust
-"Leases, Attrib leases, Rdirlookup" at 3.725,8.247 rjust
-.ps
-.ft
-.PE
-.)z
-.(z
-.PS
-.ps
-.ps 10
-dashwid = 0.050i
-line dashed from 0.900,7.888 to 4.787,7.888
-line dashed from 0.900,7.888 to 0.900,10.262
-line from 0.900,7.888 to 0.963,7.888
-line from 4.787,7.888 to 4.725,7.888
-line from 0.900,8.188 to 0.963,8.188
-line from 4.787,8.188 to 4.725,8.188
-line from 0.900,8.488 to 0.963,8.488
-line from 4.787,8.488 to 4.725,8.488
-line from 0.900,8.775 to 0.963,8.775
-line from 4.787,8.775 to 4.725,8.775
-line from 0.900,9.075 to 0.963,9.075
-line from 4.787,9.075 to 4.725,9.075
-line from 0.900,9.375 to 0.963,9.375
-line from 4.787,9.375 to 4.725,9.375
-line from 0.900,9.675 to 0.963,9.675
-line from 4.787,9.675 to 4.725,9.675
-line from 0.900,9.963 to 0.963,9.963
-line from 4.787,9.963 to 4.725,9.963
-line from 0.900,10.262 to 0.963,10.262
-line from 4.787,10.262 to 4.725,10.262
-line from 0.900,7.888 to 0.900,7.950
-line from 0.900,10.262 to 0.900,10.200
-line from 1.613,7.888 to 1.613,7.950
-line from 1.613,10.262 to 1.613,10.200
-line from 2.312,7.888 to 2.312,7.950
-line from 2.312,10.262 to 2.312,10.200
-line from 3.025,7.888 to 3.025,7.950
-line from 3.025,10.262 to 3.025,10.200
-line from 3.725,7.888 to 3.725,7.950
-line from 3.725,10.262 to 3.725,10.200
-line from 4.438,7.888 to 4.438,7.950
-line from 4.438,10.262 to 4.438,10.200
-line from 0.900,7.888 to 4.787,7.888
-line from 4.787,7.888 to 4.787,10.262
-line from 4.787,10.262 to 0.900,10.262
-line from 0.900,10.262 to 0.900,7.888
-line from 3.800,8.775 to 4.025,8.775
-line from 1.250,9.412 to 1.250,9.412
-line from 1.250,9.412 to 1.613,9.425
-line from 1.613,9.425 to 2.312,9.463
-line from 2.312,9.463 to 3.025,9.600
-line from 3.025,9.600 to 3.725,9.875
-line from 3.725,9.875 to 4.438,10.075
-dashwid = 0.037i
-line dotted from 3.800,8.625 to 4.025,8.625
-line dotted from 1.250,9.450 to 1.250,9.450
-line dotted from 1.250,9.450 to 1.613,9.438
-line dotted from 1.613,9.438 to 2.312,9.438
-line dotted from 2.312,9.438 to 3.025,9.525
-line dotted from 3.025,9.525 to 3.725,9.550
-line dotted from 3.725,9.550 to 4.438,9.662
-line dashed from 3.800,8.475 to 4.025,8.475
-line dashed from 1.250,9.438 to 1.250,9.438
-line dashed from 1.250,9.438 to 1.613,9.412
-line dashed from 1.613,9.412 to 2.312,9.450
-line dashed from 2.312,9.450 to 3.025,9.500
-line dashed from 3.025,9.500 to 3.725,9.613
-line dashed from 3.725,9.613 to 4.438,9.675
-dashwid = 0.075i
-line dotted from 3.800,8.325 to 4.025,8.325
-line dotted from 1.250,9.387 to 1.250,9.387
-line dotted from 1.250,9.387 to 1.613,9.600
-line dotted from 1.613,9.600 to 2.312,9.625
-line dotted from 2.312,9.625 to 3.025,9.738
-line dotted from 3.025,9.738 to 3.725,9.850
-line dotted from 3.725,9.850 to 4.438,9.800
-.ps
-.ps -1
-.ft
-.ft I
-"0" at 0.825,7.810 rjust
-"5" at 0.825,8.110 rjust
-"10" at 0.825,8.410 rjust
-"15" at 0.825,8.697 rjust
-"20" at 0.825,8.997 rjust
-"25" at 0.825,9.297 rjust
-"30" at 0.825,9.597 rjust
-"35" at 0.825,9.885 rjust
-"40" at 0.825,10.185 rjust
-"0" at 0.900,7.660
-"2" at 1.613,7.660
-"4" at 2.312,7.660
-"6" at 3.025,7.660
-"8" at 3.725,7.660
-"10" at 4.438,7.660
-"Time (sec)" at 0.150,8.997
-"Number of Clients" at 2.837,7.510
-"Figure #4: MAB Phase 4 (grep/wc/find)" at 2.837,10.335
-"NFS" at 3.725,8.697 rjust
-"Leases" at 3.725,8.547 rjust
-"Leases, Rdirlookup" at 3.725,8.397 rjust
-"Leases, Attrib leases, Rdirlookup" at 3.725,8.247 rjust
-.ps
-.ft
-.PE
-.)z
-.(z
-.PS
-.ps
-.ps 10
-dashwid = 0.050i
-line dashed from 0.900,7.888 to 4.787,7.888
-line dashed from 0.900,7.888 to 0.900,10.262
-line from 0.900,7.888 to 0.963,7.888
-line from 4.787,7.888 to 4.725,7.888
-line from 0.900,8.150 to 0.963,8.150
-line from 4.787,8.150 to 4.725,8.150
-line from 0.900,8.412 to 0.963,8.412
-line from 4.787,8.412 to 4.725,8.412
-line from 0.900,8.675 to 0.963,8.675
-line from 4.787,8.675 to 4.725,8.675
-line from 0.900,8.938 to 0.963,8.938
-line from 4.787,8.938 to 4.725,8.938
-line from 0.900,9.213 to 0.963,9.213
-line from 4.787,9.213 to 4.725,9.213
-line from 0.900,9.475 to 0.963,9.475
-line from 4.787,9.475 to 4.725,9.475
-line from 0.900,9.738 to 0.963,9.738
-line from 4.787,9.738 to 4.725,9.738
-line from 0.900,10.000 to 0.963,10.000
-line from 4.787,10.000 to 4.725,10.000
-line from 0.900,10.262 to 0.963,10.262
-line from 4.787,10.262 to 4.725,10.262
-line from 0.900,7.888 to 0.900,7.950
-line from 0.900,10.262 to 0.900,10.200
-line from 1.613,7.888 to 1.613,7.950
-line from 1.613,10.262 to 1.613,10.200
-line from 2.312,7.888 to 2.312,7.950
-line from 2.312,10.262 to 2.312,10.200
-line from 3.025,7.888 to 3.025,7.950
-line from 3.025,10.262 to 3.025,10.200
-line from 3.725,7.888 to 3.725,7.950
-line from 3.725,10.262 to 3.725,10.200
-line from 4.438,7.888 to 4.438,7.950
-line from 4.438,10.262 to 4.438,10.200
-line from 0.900,7.888 to 4.787,7.888
-line from 4.787,7.888 to 4.787,10.262
-line from 4.787,10.262 to 0.900,10.262
-line from 0.900,10.262 to 0.900,7.888
-line from 3.800,8.675 to 4.025,8.675
-line from 1.250,8.800 to 1.250,8.800
-line from 1.250,8.800 to 1.613,8.912
-line from 1.613,8.912 to 2.312,9.113
-line from 2.312,9.113 to 3.025,9.438
-line from 3.025,9.438 to 3.725,9.750
-line from 3.725,9.750 to 4.438,10.088
-dashwid = 0.037i
-line dotted from 3.800,8.525 to 4.025,8.525
-line dotted from 1.250,8.637 to 1.250,8.637
-line dotted from 1.250,8.637 to 1.613,8.700
-line dotted from 1.613,8.700 to 2.312,8.713
-line dotted from 2.312,8.713 to 3.025,8.775
-line dotted from 3.025,8.775 to 3.725,8.887
-line dotted from 3.725,8.887 to 4.438,9.037
-line dashed from 3.800,8.375 to 4.025,8.375
-line dashed from 1.250,8.675 to 1.250,8.675
-line dashed from 1.250,8.675 to 1.613,8.688
-line dashed from 1.613,8.688 to 2.312,8.713
-line dashed from 2.312,8.713 to 3.025,8.825
-line dashed from 3.025,8.825 to 3.725,8.887
-line dashed from 3.725,8.887 to 4.438,9.062
-dashwid = 0.075i
-line dotted from 3.800,8.225 to 4.025,8.225
-line dotted from 1.250,8.700 to 1.250,8.700
-line dotted from 1.250,8.700 to 1.613,8.688
-line dotted from 1.613,8.688 to 2.312,8.762
-line dotted from 2.312,8.762 to 3.025,8.812
-line dotted from 3.025,8.812 to 3.725,8.925
-line dotted from 3.725,8.925 to 4.438,9.025
-.ps
-.ps -1
-.ft
-.ft I
-"0" at 0.825,7.810 rjust
-"50" at 0.825,8.072 rjust
-"100" at 0.825,8.335 rjust
-"150" at 0.825,8.597 rjust
-"200" at 0.825,8.860 rjust
-"250" at 0.825,9.135 rjust
-"300" at 0.825,9.397 rjust
-"350" at 0.825,9.660 rjust
-"400" at 0.825,9.922 rjust
-"450" at 0.825,10.185 rjust
-"0" at 0.900,7.660
-"2" at 1.613,7.660
-"4" at 2.312,7.660
-"6" at 3.025,7.660
-"8" at 3.725,7.660
-"10" at 4.438,7.660
-"Time (sec)" at 0.150,8.997
-"Number of Clients" at 2.837,7.510
-"Figure #5: MAB Phase 5 (compile)" at 2.837,10.335
-"NFS" at 3.725,8.597 rjust
-"Leases" at 3.725,8.447 rjust
-"Leases, Rdirlookup" at 3.725,8.297 rjust
-"Leases, Attrib leases, Rdirlookup" at 3.725,8.147 rjust
-.ps
-.ft
-.PE
-.)z
-.pp
-In figure 2, where a subtree of seventy small files is copied, the difference between the protocol variants is minimal,
-with the NQNFS variants performing slightly better.
-For this case, the Readdir_and_Lookup RPC is a slight hindrance under heavy
-load, possibly because it results in larger directory blocks in the buffer
-cache.
-.pp
-In figure 3, for the phase that gets file attributes for a large number
-of files, the leasing variants take about 50% longer, indicating that
-there are performance problems in this area. For the case where valid
-current leases are required for every file when attributes are returned,
-the performance is significantly worse than when the attributes are allowed
-to be stale by a few seconds on the client.
-I have not been able to explain the oscillation in the curves for the
-Lease cases.
-.pp
-For the string searching phase depicted in figure 4, the leasing variants
-that do not require valid leases for files when attributes are returned
-appear to scale better with server load than NFS.
-However, the effect appears to be
-negligible until the server load is fairly heavy.
-.pp
-Most of the time in the MAB benchmark is spent in the compilation phase
-and this is where the differences between caching methods are most
-pronounced.
-In figure 5 it can be seen that any protocol variant using Leases performs
-about a factor of two better than NFS
-at a load of ten clients. This indicates that the use of NQNFS may
-allow servers to handle significantly more clients for this type of
-workload.
-.pp
-Table 2 summarizes the MAB run times for all phases for the single client
-DECstation 5000/25. The \fILeases\fR case refers to using leases, whereas
-the \fILeases, Rdirl\fR case uses the Readdir_and_Lookup RPC as well and
-the \fIBCache Only\fR case uses leases, but only the buffer cache and not
-the attribute or name caches.
-The \fINo Caching\fR case does not do any client-side caching, performing
-all system calls via synchronous RPCs to the server.
-.(z
-.ps -1
-.TS
-box, center;
-c s s s s s s
-c c c c c c c c
-l | n n n n n n n.
-Table #2: Single DECstation 5000/25 Client Elapsed Times (sec)
-Phase 1 2 3 4 5 Total % Improvement
-_
-No Caching 6 35 41 40 258 380 -93
-NFS 5 24 15 20 133 197 0
-BCache Only 5 20 24 23 116 188 5
-Leases, Rdirl 5 20 21 20 105 171 13
-Leases 5 19 21 21 99 165 16
-.TE
-.ps
-.)z
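The % Improvement column in Table 2 is computed relative to the NFS baseline total; a minimal sketch (function name is hypothetical, values from the table):

```python
def improvement(nfs_total, variant_total):
    """Percent improvement over the NFS baseline elapsed time;
    negative values mean the variant was slower than NFS."""
    return round(100.0 * (nfs_total - variant_total) / nfs_total)

# Totals from Table 2, with NFS (197 sec) as the baseline.
print(improvement(197, 165))   # Leases -> 16
print(improvement(197, 380))   # No Caching -> -93
```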
-.sh 2 "Processor Speed Tests"
-.pp
-An important goal of client-side file system caching is to decouple the
-I/O system calls from the underlying distributed file system, so that the
-client's system performance might scale with processor speed. In order
-to test this, a series of MAB runs were performed on three
-DECstations that are similar except for processor speed.
-In addition to the four protocol variants used for the above tests, runs
-were done with the client caches turned off, to give
-worst-case numbers for a caching mechanism with a 100% miss rate. The CPU utilization
-was measured, as an indicator of how much the processor was blocking for
-I/O system calls. Note that since the systems were running in single user mode
-and otherwise quiescent, almost all CPU activity was directly related
-to the MAB run.
-The results are presented in
-table 3.
-The CPU time is simply the product of the CPU utilization and
-elapsed running time and, as such, is the optimistic bound on performance
-achievable with an ideal client caching scheme that never blocks for I/O.
-.(z
-.ps -1
-.TS
-box, center;
-c s s s s s s s s s
-c c s s c s s c s s
-c c c c c c c c c c
-c c c c c c c c c c
-l | n n n n n n n n n.
-Table #3: MAB Phase 5 (compile)
- DS2100 (10.5 MIPS) DS3100 (14.0 MIPS) DS5000/25 (26.7 MIPS)
- Elapsed CPU CPU Elapsed CPU CPU Elapsed CPU CPU
- time Util(%) time time Util(%) time time Util(%) time
-_
-Leases 143 89 127 113 87 98 99 89 88
-Leases, Rdirl 150 89 134 110 91 100 105 88 92
-BCache Only 169 85 144 129 78 101 116 75 87
-NFS 172 77 132 135 74 100 133 71 94
-No Caching 330 47 155 256 41 105 258 39 101
-.TE
-.ps
-.)z
-As can be seen in the table, any caching mechanism achieves significantly
-better performance than when caching is disabled, roughly doubling the CPU
-utilization with a corresponding reduction in run time. For NFS, the CPU
-utilization drops as CPU speed increases, which suggests that
-it does not scale with CPU speed. For the NQNFS variants, the CPU utilization
-remains at just below 90%, which suggests that the caching mechanism is working
-well and scaling within this CPU range.
-Note that for this benchmark, the ratio of CPU times for
-the DECstation 3100 and DECstation 5000/25 is quite different from what the
-Dhrystone MIPS ratings would suggest.
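The CPU-time bound described above is just the product of utilization and elapsed time; a small sketch reproducing Table 3 entries (rounded to whole seconds; function name is hypothetical):

```python
def cpu_time(elapsed_sec, util_percent):
    # Optimistic lower bound on run time for an ideal client cache
    # that never blocks for I/O: CPU time = utilization x elapsed time.
    return round(elapsed_sec * util_percent / 100.0)

# Spot checks against the DS5000/25 column of Table 3.
print(cpu_time(99, 89))    # Leases -> 88
print(cpu_time(133, 71))   # NFS -> 94
```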
-.pp
-Overall, the results seem encouraging, although it remains to be seen whether
-or not the caching provided by NQNFS can continue to scale with CPU
-performance.
-There is a good indication that NQNFS permits a server to scale
-to more clients than does NFS, at least for workloads akin to the MAB compile phase.
-A more difficult question is "What if the server is much faster doing
-write RPCs?" as a result of some technology such as Prestoserve
-or write gathering.
-Since a significant part of the difference between NFS and NQNFS is
-the synchronous writing, it is difficult to predict to what extent a server
-capable of fast write RPCs would negate the performance improvements of NQNFS.
-At the very least, table 1 indicates that the write RPC load on the server
-has decreased by approximately 30%, and this reduced write load should still
-result in some improvement.
-.pp
-Indications are that the Readdir_and_Lookup RPC has not improved performance
-for these tests and may in fact be degrading performance slightly.
-The results in figure 3 indicate some problems, possibly with handling
-of the attribute cache. It seems logical that the Readdir_and_Lookup RPC
-should permit priming of the attribute cache, improving its hit rate, but the
-results run counter to that.
-.sh 2 "Internetwork Delay Tests"
-.pp
-This experimental setup was used to explore how the different protocol
-variants might perform over internetworks with larger RPC RTTs. The
-server was moved to a separate Ethernet, using a MicroVAXII\(tm as an
-IP router to the other Ethernet. The 4.3BSD Reno Unix system running on the
-MicroVAXII was modified to delay forwarded IP packets by a tunable
-N milliseconds. The implementation was rather crude and did not try to
-simulate a distribution of delay times nor was it programmed to drop packets
-at a given rate, but it served as a simple emulation of a long,
-fat network\** [Jacobson88].
-.(f
-\**Long fat networks refer to network interconnections with
-a Bandwidth X RTT product > 10\u5\d bits.
-.)f
-The MAB was run using both UDP and TCP RPC transports
-for a variety of RTT delays from five to two hundred milliseconds,
-to observe the effects of RTT delay on RPC transport.
-It was found that, due to high variability between runs, four runs were not
-sufficient, so eight runs were done at each value.
-The results in figure 6 and table 4 are the average for the eight runs.
-.(z
-.PS
-.ps
-.ps 10
-dashwid = 0.050i
-line dashed from 0.900,7.888 to 4.787,7.888
-line dashed from 0.900,7.888 to 0.900,10.262
-line from 0.900,7.888 to 0.963,7.888
-line from 4.787,7.888 to 4.725,7.888
-line from 0.900,8.350 to 0.963,8.350
-line from 4.787,8.350 to 4.725,8.350
-line from 0.900,8.800 to 0.963,8.800
-line from 4.787,8.800 to 4.725,8.800
-line from 0.900,9.262 to 0.963,9.262
-line from 4.787,9.262 to 4.725,9.262
-line from 0.900,9.713 to 0.963,9.713
-line from 4.787,9.713 to 4.725,9.713
-line from 0.900,10.175 to 0.963,10.175
-line from 4.787,10.175 to 4.725,10.175
-line from 0.900,7.888 to 0.900,7.950
-line from 0.900,10.262 to 0.900,10.200
-line from 1.825,7.888 to 1.825,7.950
-line from 1.825,10.262 to 1.825,10.200
-line from 2.750,7.888 to 2.750,7.950
-line from 2.750,10.262 to 2.750,10.200
-line from 3.675,7.888 to 3.675,7.950
-line from 3.675,10.262 to 3.675,10.200
-line from 4.600,7.888 to 4.600,7.950
-line from 4.600,10.262 to 4.600,10.200
-line from 0.900,7.888 to 4.787,7.888
-line from 4.787,7.888 to 4.787,10.262
-line from 4.787,10.262 to 0.900,10.262
-line from 0.900,10.262 to 0.900,7.888
-line from 4.125,8.613 to 4.350,8.613
-line from 0.988,8.400 to 0.988,8.400
-line from 0.988,8.400 to 1.637,8.575
-line from 1.637,8.575 to 2.375,8.713
-line from 2.375,8.713 to 3.125,8.900
-line from 3.125,8.900 to 3.862,9.137
-line from 3.862,9.137 to 4.600,9.425
-dashwid = 0.037i
-line dotted from 4.125,8.463 to 4.350,8.463
-line dotted from 0.988,8.375 to 0.988,8.375
-line dotted from 0.988,8.375 to 1.637,8.525
-line dotted from 1.637,8.525 to 2.375,8.850
-line dotted from 2.375,8.850 to 3.125,8.975
-line dotted from 3.125,8.975 to 3.862,9.137
-line dotted from 3.862,9.137 to 4.600,9.625
-line dashed from 4.125,8.312 to 4.350,8.312
-line dashed from 0.988,8.525 to 0.988,8.525
-line dashed from 0.988,8.525 to 1.637,8.688
-line dashed from 1.637,8.688 to 2.375,8.838
-line dashed from 2.375,8.838 to 3.125,9.150
-line dashed from 3.125,9.150 to 3.862,9.275
-line dashed from 3.862,9.275 to 4.600,9.588
-dashwid = 0.075i
-line dotted from 4.125,8.162 to 4.350,8.162
-line dotted from 0.988,8.525 to 0.988,8.525
-line dotted from 0.988,8.525 to 1.637,8.838
-line dotted from 1.637,8.838 to 2.375,8.863
-line dotted from 2.375,8.863 to 3.125,9.137
-line dotted from 3.125,9.137 to 3.862,9.387
-line dotted from 3.862,9.387 to 4.600,10.200
-.ps
-.ps -1
-.ft
-.ft I
-"0" at 0.825,7.810 rjust
-"100" at 0.825,8.272 rjust
-"200" at 0.825,8.722 rjust
-"300" at 0.825,9.185 rjust
-"400" at 0.825,9.635 rjust
-"500" at 0.825,10.097 rjust
-"0" at 0.900,7.660
-"50" at 1.825,7.660
-"100" at 2.750,7.660
-"150" at 3.675,7.660
-"200" at 4.600,7.660
-"Time (sec)" at 0.150,8.997
-"Round Trip Delay (msec)" at 2.837,7.510
-"Figure #6: MAB Phase 5 (compile)" at 2.837,10.335
-"Leases,UDP" at 4.050,8.535 rjust
-"Leases,TCP" at 4.050,8.385 rjust
-"NFS,UDP" at 4.050,8.235 rjust
-"NFS,TCP" at 4.050,8.085 rjust
-.ps
-.ft
-.PE
-.)z
-.(z
-.ps -1
-.TS
-box, center;
-c s s s s s s s s
-c c s c s c s c s
-c c c c c c c c c
-c c c c c c c c c
-l | n n n n n n n n.
-Table #4: MAB Phase 5 (compile) for Internetwork Delays
- NFS,UDP NFS,TCP Leases,UDP Leases,TCP
-Delay Elapsed Standard Elapsed Standard Elapsed Standard Elapsed Standard
-(msec) time (sec) Deviation time (sec) Deviation time (sec) Deviation time (sec) Deviation
-_
-5 139 2.9 139 2.4 112 7.0 108 6.0
-40 175 5.1 208 44.5 150 23.8 139 4.3
-80 207 3.9 213 4.7 180 7.7 210 52.9
-120 276 29.3 273 17.1 221 7.7 238 5.8
-160 304 7.2 328 77.1 275 21.5 274 10.1
-200 372 35.0 506 235.1 338 25.2 379 69.2
-.TE
-.ps
-.)z
-.pp
-I found these results somewhat surprising, since I had assumed that stability
-across an internetwork connection would be a function of RPC transport
-protocol.
-Looking at the standard deviations observed between the eight runs, there is an indication
-that the NQNFS protocol plays a larger role in
-maintaining stability than the underlying RPC transport protocol.
-It appears that NFS over TCP transport
-is the least stable variant tested.
-It should be noted that the TCP implementation used was at roughly the 4.3BSD Tahoe
-release and that the 4.4BSD TCP implementation was far less stable and would
-fail intermittently, due to a bug I was not able to isolate.
-It would appear that some of the recent enhancements to the 4.4BSD TCP
-implementation have a detrimental effect on the performance of
-RPC-type traffic loads, which intermix small and large
-data transfers in both directions.
-It is obvious that more exploration of this area is needed before any
-conclusions can be made
-beyond the fact that over a local area network, TCP transport provides
-performance comparable to UDP.
-.sh 1 "Lessons Learned"
-.pp
-Evaluating the performance of a distributed file system is fraught with
-difficulties, due to the many software and hardware factors involved.
-The limited benchmarking presented here took a considerable amount of time
-and the results gained by the exercise only give indications of what the
-performance might be for a few scenarios.
-.pp
-The IP router with delay introduction proved to be a valuable tool for protocol debugging\**,
-.(f
-\**It exposed two bugs in the 4.4BSD networking code, one a problem in the Lance chip
-driver for the DECstation and the other a TCP window sizing problem that I was
-not able to isolate.
-.)f
-and may be useful for a more extensive study of performance over internetworks
-if enhanced to do a better job of simulating internetwork delay and packet loss.
-.pp
-The Leases mechanism provided a simple model for the provision of cache
-consistency and did seem to improve performance for various scenarios.
-Unfortunately, it does not provide the server state information that is required
-for file system semantics, such as locking, that many software systems demand.
-In production environments on my campus, the need for file locking and the correct
-generation of the ETXTBSY error code
-are far more important than full cache consistency, and leasing
-does not satisfy these needs.
-Another file system semantic that requires hard server state is the delay
-of file removal until the last close system call. Although Spritely NFS
-did not support this semantic either, it is logical that the open file
-state maintained by that system would facilitate the implementation of
-this semantic more easily than would the Leases mechanism.
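For reference, the lease model being weighed here reduces to a single client-side check: cached data is usable only until the lease term expires, after which the client must revalidate with the server (a minimal sketch with hypothetical names, not the 4.4BSD implementation):

```python
class CachedObject:
    """A client-side cache entry guarded by a lease."""

    def __init__(self, data, lease_term_sec, now):
        self.data = data
        self.lease_expiry = now + lease_term_sec

    def read(self, now, refetch):
        if now < self.lease_expiry:
            return self.data   # lease valid: server guarantees consistency
        return refetch()       # lease expired: must revalidate with server
```

Hard server state, by contrast, would let the server track opens and locks explicitly rather than relying on timeouts.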
-.sh 1 "Further Work"
-.pp
-The current implementation uses a fixed, moderate sized buffer cache designed
-for the local UFS [McKusick84] file system.
-The results in figure 1 suggest that this is adequate so long as the cache
-is of an appropriate size.
-However, a mechanism permitting the cache to vary in size
-has been shown to outperform fixed sized buffer caches [Nelson90], and could
-be beneficial. It could also be useful to allow the buffer cache to grow very
-large by making use of local backing store for cases where server performance
-is limited.
-A very large buffer cache size would in turn permit experimentation with
-much larger read/write data sizes, facilitating bulk data transfers
-across long fat networks, such as will characterize the Internet of the
-near future.
-A careful redesign of the buffer cache mechanism to provide
-support for these features would probably be the next implementation step.
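As a sense of scale for such long fat networks, the bandwidth-delay product from the earlier footnote is easily computed (link speed and RTT here are illustrative assumptions):

```python
def bandwidth_delay_product(bandwidth_bps, rtt_sec):
    # Bits "in flight" on the path: bandwidth x round-trip time.
    return bandwidth_bps * rtt_sec

# A 10 Mbit/s Ethernet at the largest emulated RTT (200 msec) is
# already well past the 10^5-bit "long fat network" threshold.
print(bandwidth_delay_product(10_000_000, 0.200))  # -> 2000000.0
```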
-.pp
-The results in figure 3 indicate that the mechanics of caching file
-attributes and maintaining the attribute cache's consistency need to
-be looked at further.
-There also needs to be more work done on the interaction between a
-Readdir_and_Lookup RPC and the name and attribute caches, in an effort
-to reduce Getattr and Lookup RPC loads.
-.pp
-The NQNFS protocol has never been used in a production environment and doing
-so would provide needed insight into how well the protocol satisfies the
-needs of real workstation environments.
-It is hoped that the distribution of the implementation in 4.4BSD will
-facilitate use of the protocol in production environments elsewhere.
-.pp
-The big question that needs to be resolved is whether Leases are an adequate
-mechanism for cache consistency or whether hard server state is required.
-Given the work presented here and in the papers related to Sprite and Spritely
-NFS, there are clear indications that a cache consistency algorithm can
-improve both performance and file system semantics.
-As yet, however, it is unclear what the best approach to maintaining consistency is.
-It would appear that hard state information is required for file locking and
-other mechanisms and, if so, it seems appropriate to use it for cache
-consistency as well.
-.sh 1 "Acknowledgements"
-.pp
-I would like to thank the members of the CSRG at the University of California,
-Berkeley for their continued support over the years. Without their encouragement and assistance this
-software would never have been implemented.
-Prof. Jim Linders and Prof. Tom Wilson here at the University of Guelph helped
-proofread this paper and Jeffrey Mogul provided a great deal of
-assistance, helping to turn my gibberish into something at least moderately
-readable.
-.sh 1 "References"
-.ip [Baker91] 15
-Mary Baker and John Ousterhout, Availability in the Sprite Distributed
-File System, In \fIOperating System Review\fR, (25)2, pg. 95-98,
-April 1991.
-.ip [Baker91a] 15
-Mary Baker, private communication, May 1991.
-.ip [Burrows88] 15
-Michael Burrows, Efficient Data Sharing, Technical Report #153,
-Computer Laboratory, University of Cambridge, Dec. 1988.
-.ip [Gray89] 15
-Cary G. Gray and David R. Cheriton, Leases: An Efficient Fault-Tolerant
-Mechanism for Distributed File Cache Consistency, In \fIProc. of the
-Twelfth ACM Symposium on Operating Systems Principles\fR, Litchfield Park,
-AZ, Dec. 1989.
-.ip [Howard88] 15
-John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols,
-M. Satyanarayanan, Robert N. Sidebotham and Michael J. West,
-Scale and Performance in a Distributed File System, \fIACM Trans. on
-Computer Systems\fR, (6)1, pg 51-81, Feb. 1988.
-.ip [Jacobson88] 15
-Van Jacobson and R. Braden, \fITCP Extensions for Long-Delay Paths\fR,
-ARPANET Working Group Requests for Comment, DDN Network Information Center,
-SRI International, Menlo Park, CA, October 1988, RFC-1072.
-.ip [Jacobson89] 15
-Van Jacobson, Sun NFS Performance Problems, \fIPrivate Communication,\fR
-November, 1989.
-.ip [Juszczak89] 15
-Chet Juszczak, Improving the Performance and Correctness of an NFS Server,
-In \fIProc. Winter 1989 USENIX Conference,\fR pg. 53-63, San Diego, CA, January 1989.
-.ip [Juszczak94] 15
-Chet Juszczak, Improving the Write Performance of an NFS Server,
-to appear in \fIProc. Winter 1994 USENIX Conference,\fR San Francisco, CA, January 1994.
-.ip [Kazar88] 15
-Michael L. Kazar, Synchronization and Caching Issues in the Andrew File System,
-In \fIProc. Winter 1988 USENIX Conference,\fR pg. 27-36, Dallas, TX, February
-1988.
-.ip [Kent87] 15
-Christopher A. Kent and Jeffrey C. Mogul, \fIFragmentation Considered Harmful\fR, Research Report 87/3,
-Digital Equipment Corporation Western Research Laboratory, Dec. 1987.
-.ip [Kent87a] 15
-Christopher A. Kent, \fICache Coherence in Distributed Systems\fR, Research Report 87/4,
-Digital Equipment Corporation Western Research Laboratory, April 1987.
-.ip [Macklem90] 15
-Rick Macklem, Lessons Learned Tuning the 4.3BSD Reno Implementation of the
-NFS Protocol,
-In \fIProc. Winter 1991 USENIX Conference,\fR pg. 53-64, Dallas, TX,
-January 1991.
-.ip [Macklem93] 15
-Rick Macklem, The 4.4BSD NFS Implementation,
-In \fIThe System Manager's Manual\fR, 4.4 Berkeley Software Distribution,
-University of California, Berkeley, June 1993.
-.ip [McKusick84] 15
-Marshall K. McKusick, William N. Joy, Samuel J. Leffler and Robert S. Fabry,
-A Fast File System for UNIX, \fIACM Transactions on Computer Systems\fR,
-Vol. 2, Number 3, pg. 181-197, August 1984.
-.ip [McKusick90] 15
-Marshall K. McKusick, Michael J. Karels and Keith Bostic, A Pageable Memory
-Based Filesystem,
-In \fIProc. Summer 1990 USENIX Conference,\fR pg. 137-143, Anaheim, CA, June
-1990.
-.ip [Mogul93] 15
-Jeffrey C. Mogul, Recovery in Spritely NFS,
-Research Report 93/2, Digital Equipment Corporation Western Research
-Laboratory, June 1993.
-.ip [Moran90] 15
-Joseph Moran, Russel Sandberg, Don Coleman, Jonathan Kepecs and Bob Lyon,
-Breaking Through the NFS Performance Barrier,
-In \fIProc. Spring 1990 EUUG Conference,\fR pg. 199-206, Munich, FRG,
-April 1990.
-.ip [Nelson88] 15
-Michael N. Nelson, Brent B. Welch, and John K. Ousterhout, Caching in the
-Sprite Network File System, \fIACM Transactions on Computer Systems\fR (6)1
-pg. 134-154, February 1988.
-.ip [Nelson90] 15
-Michael N. Nelson, \fIVirtual Memory vs. The File System\fR, Research Report
-90/4, Digital Equipment Corporation Western Research Laboratory, March 1990.
-.ip [Nowicki89] 15
-Bill Nowicki, Transport Issues in the Network File System, In \fIComputer
-Communication Review\fR, pg. 16-20, March 1989.
-.ip [Ousterhout90] 15
-John K. Ousterhout, Why Aren't Operating Systems Getting Faster As Fast as
-Hardware? In \fIProc. Summer 1990 USENIX Conference\fR, pg. 247-256, Anaheim,
-CA, June 1990.
-.ip [Sandberg85] 15
-Russel Sandberg, David Goldberg, Steve Kleiman, Dan Walsh, and Bob Lyon,
-Design and Implementation of the Sun Network filesystem, In \fIProc. Summer
-1985 USENIX Conference\fR, pages 119-130, Portland, OR, June 1985.
-.ip [Srinivasan89] 15
-V. Srinivasan and Jeffrey C. Mogul, Spritely NFS: Experiments with
-Cache-Consistency Protocols,
-In \fIProc. of the
-Twelfth ACM Symposium on Operating Systems Principles\fR, Litchfield Park,
-AZ, Dec. 1989.
-.ip [Steiner88] 15
-J. G. Steiner, B. C. Neuman and J. I. Schiller, Kerberos: An Authentication
-Service for Open Network Systems,
-In \fIProc. Winter 1988 USENIX Conference,\fR pg. 191-202, Dallas, TX, February
-1988.
-.ip [SUN89] 15
-Sun Microsystems Inc., \fINFS: Network File System Protocol Specification\fR,
-ARPANET Working Group Requests for Comment, DDN Network Information Center,
-SRI International, Menlo Park, CA, March 1989, RFC-1094.
-.ip [SUN93] 15
-Sun Microsystems Inc., \fINFS: Network File System Version 3 Protocol Specification\fR,
-Sun Microsystems Inc., Mountain View, CA, June 1993.
-.ip [Wittle93] 15
-Mark Wittle and Bruce E. Keith, LADDIS: The Next Generation in NFS File
-Server Benchmarking,
-In \fIProc. Summer 1993 USENIX Conference,\fR pg. 111-128, Cincinnati, OH, June
-1993.
-.(f
-\(mo
-NFS is believed to be a trademark of Sun Microsystems, Inc.
-.)f
-.(f
-\(dg
-Prestoserve is a trademark of Legato Systems, Inc.
-.)f
-.(f
-\(sc
-MIPS is a trademark of Silicon Graphics, Inc.
-.)f
-.(f
-\(dg
-DECstation, MicroVAXII and Ultrix are trademarks of Digital Equipment Corp.
-.)f
-.(f
-\(dd
-Unix is a trademark of Novell, Inc.
-.)f