author dillon <dillon@FreeBSD.org> 2002-06-19 09:39:41 +0000
committer dillon <dillon@FreeBSD.org> 2002-06-19 09:39:41 +0000
commit 7aa1bb9d056544a059b4be0616b86179bc88e108
tree 82e8afa76d01b565de9d16bba1f66d0f11988324 /sys
parent 80af7bc382685a69865e5f3cf600e8f8efb39dd1

In rev 1.72 a write/mmap problem was fixed that could let a user process
see the 'old' contents of a filesystem block. There were two cases:

(1) uiomove() fails (the user process issues an illegal write), and
(2) uiomove() overlaps a mmap() of the same file at the same offset
    (fault -> recursive buffer I/O reads the contents of the old block).

Unfortunately, 1.72 also had the unintended effect of forcing the
filesystem to do a read-before-write on a full-block write (non-append
case), e.g. 'dd if=/dev/zero of=test.dat bs=1m count=256 conv=notrunc'.
This destroys performance: not only is a read forced for every write,
but clustering breaks as well.

The solution is to clear the buffer manually in the full-block case
rather than asking BALLOC to do it (BALLOC issues the read-before-write).
In the partial-block case we want BALLOC to do it because the
read-before-write is necessary.

This patch should greatly improve database and news-feed server
performance.

Found by: MKI <mki@mozone.net>
MFC after: 3 days

Diffstat (limited to 'sys')
-rw-r--r-- sys/ufs/ufs/ufs_readwrite.c 18
1 files changed, 11 insertions, 7 deletions
diff --git a/sys/ufs/ufs/ufs_readwrite.c b/sys/ufs/ufs/ufs_readwrite.c
index af7298c..c76081f 100644
--- a/sys/ufs/ufs/ufs_readwrite.c
+++ b/sys/ufs/ufs/ufs_readwrite.c
@@ -491,23 +491,27 @@ WRITE(ap)
vnode_pager_setsize(vp, uio->uio_offset + xfersize);
/*
- * Avoid a data-consistency race between write() and mmap()
- * by ensuring that newly allocated blocks are zerod. The
- * race can occur even in the case where the write covers
- * the entire block.
+ * We must perform a read-before-write if the transfer size
+ * does not cover the entire buffer.
*/
- flags |= B_CLRBUF;
-#if 0
if (fs->fs_bsize > xfersize)
flags |= B_CLRBUF;
else
flags &= ~B_CLRBUF;
-#endif
/* XXX is uio->uio_offset the right thing here? */
error = UFS_BALLOC(vp, uio->uio_offset, xfersize,
ap->a_cred, flags, &bp);
if (error != 0)
break;
+ /*
+ * If the buffer is not valid we have to clear out any
+ * garbage data from the pages instantiated for the buffer.
+ * If we do not, a failed uiomove() during a write can leave
+ * the prior contents of the pages exposed to a userland
+ * mmap(). XXX deal with uiomove() errors a better way.
+ */
+ if ((bp->b_flags & B_CACHE) == 0 && fs->fs_bsize <= xfersize)
+ vfs_bio_clrbuf(bp);
if (ioflag & IO_DIRECT)
bp->b_flags |= B_DIRECT;
if (ioflag & IO_NOWDRAIN)
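
The decision this hunk makes can be sketched outside the kernel as a small pure function. This is only an illustrative model, not the kernel code: `SK_B_CLRBUF`, `struct sk_plan`, and `sk_plan_write()` are hypothetical names invented here, and the flag value is a stand-in rather than the real `B_CLRBUF` constant. It mirrors the two conditions in the patch: ask the allocator to clear the buffer (read-before-write) only for partial-block writes, and clear the buffer manually for full-block writes when the buffer is not already valid.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's B_CLRBUF; the real value differs. */
#define SK_B_CLRBUF 0x01

/* Hypothetical summary of what the write path decides for one block. */
struct sk_plan {
	int  balloc_flags;    /* flags that would be handed to the allocator */
	bool clear_manually;  /* zero the buffer ourselves after allocation */
};

/*
 * Model of the patched logic:
 *  - partial-block write (bsize > xfersize): keep the clear flag so the
 *    allocator performs the read-before-write;
 *  - full-block write: drop the clear flag, and zero the buffer manually
 *    if it is not already valid (the B_CACHE == 0 case in the patch).
 */
static struct sk_plan
sk_plan_write(long bsize, long xfersize, bool buf_cached, int flags)
{
	struct sk_plan p;

	if (bsize > xfersize)
		flags |= SK_B_CLRBUF;
	else
		flags &= ~SK_B_CLRBUF;

	p.balloc_flags = flags;
	p.clear_manually = !buf_cached && bsize <= xfersize;
	return p;
}
```

For example, under these assumptions `sk_plan_write(16384, 16384, false, 0)` yields a plan with no clear flag and `clear_manually` set, while `sk_plan_write(16384, 4096, false, 0)` keeps the clear flag and leaves the zeroing to the allocator.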