Synopsis of problem being fixed: Dan Nelson originally reported that

blocks of zeros could wind up in a file written to over NFS by a client. The problem only occurs a few times per several gigabytes of data. This problem turned out to be bug #3 below. bug #1: B_CLUSTEROK must be cleared when an NFS buffer is reverted from stage 2 (ready for commit rpc) to stage 1 (ready for write). Reversions can occur when a dirty NFS buffer is redirtied with new data. Otherwise the VFS/BIO system may end up thinking that a stage 1 NFS buffer is clusterable. Stage 1 NFS buffers are not clusterable. bug #2: B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short buffers only occur near the EOF of the file). Change to only set when the buffer is a full biosize (usually 8K). This bug has no effect but should be fixed in -current anyway. It need not be backported. bug #3: B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is typically only called by the update daemon). nfs_flush() does a multi-pass loop but due to the lack of vnode locking it is possible for new buffers to be added to the dirtyblkhd list while a flush operation is going on. This may result in nfs_flush() setting B_NEEDCOMMIT on a buffer which has *NOT* yet gone through its stage 1 write, causing only the commit rpc to be made and thus causing the contents of the buffer to be thrown away (never sent to the server). The patch also contains some cleanup, which only applies to the commit into -current. Reviewed by: dg, julian Originally Reported by: Dan Nelson <dnelson@emsphone.com>
author: dillon <dillon@FreeBSD.org> 1999-12-12 06:09:57 +0000
committer: dillon <dillon@FreeBSD.org> 1999-12-12 06:09:57 +0000
commit: 1da218306142327914ba163494fe5569fc587d6d (patch)
tree: ee9e75232133b77df56df727656abe5a80829194 /sys/nfsclient/nfs_bio.c
parent: af2fc77e5c1d88a74bf1b5f6def6f7c8209bdcad (diff)
download: FreeBSD-src-1da218306142327914ba163494fe5569fc587d6d.zip
FreeBSD-src-1da218306142327914ba163494fe5569fc587d6d.tar.gz
1 files changed, 26 insertions, 10 deletions
diff --git a/sys/nfsclient/nfs_bio.c b/sys/nfsclient/nfs_bio.c
index 63cc67e..10be58d 100644
--- a/sys/nfsclient/nfs_bio.c
+++ b/sys/nfsclient/nfs_bio.c
@@ -928,7 +928,15 @@ again:
 		}
 
 		error = uiomove((char *)bp->b_data + on, n, uio);
-		bp->b_flags &= ~B_NEEDCOMMIT;
+
+		/*
+		 * Since this block is being modified, it must be written
+		 * again and not just committed.  Since write clustering does
+		 * not work for the stage 1 data write, only the stage 2
+		 * commit rpc, we have to clear B_CLUSTEROK as well.
+		 */
+		bp->b_flags &= ~(B_NEEDCOMMIT | B_CLUSTEROK);
+
 		if (error) {
 			bp->b_flags |= B_ERROR;
 			brelse(bp);
@@ -951,12 +959,6 @@ again:
 		}
 
 		/*
-		 * Since this block is being modified, it must be written
-		 * again and not just committed.
-		 */
-		bp->b_flags &= ~B_NEEDCOMMIT;
-
-		/*
 		 * If the lease is non-cachable or IO_SYNC do bwrite().
 		 *
 		 * IO_INVAL appears to be unused.  The idea appears to be
@@ -986,10 +988,18 @@ again:
 
 /*
  * Get an nfs cache block.
+ *
  * Allocate a new one if the block isn't currently in the cache
  * and return the block marked busy. If the calling process is
  * interrupted by a signal for an interruptible mount point, return
  * NULL.
+ *
+ * The caller must carefully deal with the possible B_INVAL state of
+ * the buffer.  nfs_doio() clears B_INVAL (and nfs_asyncio() clears it
+ * indirectly), so synchronous reads can be issued without worrying about
+ * the B_INVAL state.  We have to be a little more careful when dealing
+ * with writes (see comments in nfs_write()) when extending a file past
+ * its EOF.
  */
 static struct buf *
 nfs_getcacheblk(vp, bn, size, p)
@@ -1359,7 +1369,7 @@ nfs_doio(bp, cr, p)
 		    bp->b_flags &= ~B_WRITEINPROG;
 		    if (retv == 0) {
 			    bp->b_dirtyoff = bp->b_dirtyend = 0;
-			    bp->b_flags &= ~B_NEEDCOMMIT;
+			    bp->b_flags &= ~(B_NEEDCOMMIT | B_CLUSTEROK);
 			    bp->b_resid = 0;
 			    biodone(bp);
 			    return (0);
@@ -1397,7 +1407,13 @@ nfs_doio(bp, cr, p)
 		 * When setting B_NEEDCOMMIT also set B_CLUSTEROK to try
 		 * to cluster the buffers needing commit.  This will allow
 		 * the system to submit a single commit rpc for the whole
-		 * cluster.
+		 * cluster.  We can do this even if the buffer is not 100% 
+		 * dirty (relative to the NFS blocksize), so we optimize the
+		 * append-to-file-case.
+		 *
+		 * (when clearing B_NEEDCOMMIT, B_CLUSTEROK must also be
+		 * cleared because write clustering only works for commit
+		 * rpc's, not for the data portion of the write).
 		 */
 
 		if (!error && iomode == NFSV3WRITE_UNSTABLE) {
@@ -1406,7 +1422,7 @@ nfs_doio(bp, cr, p)
 			&& bp->b_dirtyend == bp->b_bcount)
 			bp->b_flags |= B_CLUSTEROK;
 		} else {
-		    bp->b_flags &= ~B_NEEDCOMMIT;
+		    bp->b_flags &= ~(B_NEEDCOMMIT | B_CLUSTEROK);
 		}
 		bp->b_flags &= ~B_WRITEINPROG;
author	dillon <dillon@FreeBSD.org>	1999-12-12 06:09:57 +0000
committer	dillon <dillon@FreeBSD.org>	1999-12-12 06:09:57 +0000
commit	1da218306142327914ba163494fe5569fc587d6d (patch)
tree	ee9e75232133b77df56df727656abe5a80829194 /sys/nfsclient/nfs_bio.c
parent	af2fc77e5c1d88a74bf1b5f6def6f7c8209bdcad (diff)
download	FreeBSD-src-1da218306142327914ba163494fe5569fc587d6d.zip FreeBSD-src-1da218306142327914ba163494fe5569fc587d6d.tar.gz