author | dillon <dillon@FreeBSD.org> | 2002-06-19 09:39:41 +0000 |
---|---|---|
committer | dillon <dillon@FreeBSD.org> | 2002-06-19 09:39:41 +0000 |
commit | 7aa1bb9d056544a059b4be0616b86179bc88e108 | |
tree | 82e8afa76d01b565de9d16bba1f66d0f11988324 | |
parent | 80af7bc382685a69865e5f3cf600e8f8efb39dd1 | |
In rev 1.72 a situation related to write/mmap was fixed which could result
in a user process gaining visibility into the 'old' contents of a filesystem
block. There were two cases: (1) when uiomove() fails (user process issues
illegal write), and (2) when uiomove() overlaps a mmap() of the same file at
the same offset (fault -> recursive buffer I/O reads contents of old block).
Unfortunately, 1.72 also had the unintended effect of forcing the filesystem
to do a read-before-write in the case of a full-block write (non-append case),
e.g. 'dd if=/dev/zero of=test.dat bs=1m count=256 conv=notrunc'. This
destroys performance: not only is a read forced for every write, but
clustering breaks as well.
The solution is to clear the buffer manually in the full-block case rather
than asking BALLOC to do it (BALLOC issues the read-before-write). In the
partial-block case we want BALLOC to do it because the read-before-write
is necessary. This patch should greatly improve database and news-feed
server performance.
Found by: MKI <mki@mozone.net>
MFC after: 3 days
-rw-r--r-- | sys/ufs/ufs/ufs_readwrite.c | 18 |
1 file changed, 11 insertions, 7 deletions
```diff
diff --git a/sys/ufs/ufs/ufs_readwrite.c b/sys/ufs/ufs/ufs_readwrite.c
index af7298c..c76081f 100644
--- a/sys/ufs/ufs/ufs_readwrite.c
+++ b/sys/ufs/ufs/ufs_readwrite.c
@@ -491,23 +491,27 @@ WRITE(ap)
 			vnode_pager_setsize(vp, uio->uio_offset + xfersize);
 
 		/*
-		 * Avoid a data-consistency race between write() and mmap()
-		 * by ensuring that newly allocated blocks are zerod. The
-		 * race can occur even in the case where the write covers
-		 * the entire block.
+		 * We must perform a read-before-write if the transfer size
+		 * does not cover the entire buffer.
 		 */
-		flags |= B_CLRBUF;
-#if 0
 		if (fs->fs_bsize > xfersize)
 			flags |= B_CLRBUF;
 		else
 			flags &= ~B_CLRBUF;
-#endif
 /* XXX is uio->uio_offset the right thing here? */
 		error = UFS_BALLOC(vp, uio->uio_offset, xfersize,
 		    ap->a_cred, flags, &bp);
 		if (error != 0)
 			break;
+		/*
+		 * If the buffer is not valid we have to clear out any
+		 * garbage data from the pages instantiated for the buffer.
+		 * If we do not, a failed uiomove() during a write can leave
+		 * the prior contents of the pages exposed to a userland
+		 * mmap(). XXX deal with uiomove() errors a better way.
+		 */
+		if ((bp->b_flags & B_CACHE) == 0 && fs->fs_bsize <= xfersize)
+			vfs_bio_clrbuf(bp);
 		if (ioflag & IO_DIRECT)
 			bp->b_flags |= B_DIRECT;
 		if (ioflag & IO_NOWDRAIN)
```