path: root/sys/kern/vfs_bio.c
Commit log (each entry ends with [author, date, files changed, lines -deleted/+added]):
* Eliminate the undocumented, experimental, non-delivering and highly
  dangerous MAX_PERF option.
  [phk, 2000-03-16, 1 file, -26/+0]
* Need to reset the buffer pointer to avoid reconsidering the same buffer
  again (without this the rollback analysis was being lost). Should reduce
  the write count for most workloads.
  Submitted by: Craig A Soules <soules+@andrew.cmu.edu>
  [mckusick, 2000-01-18, 1 file, -0/+1]
* Give vn_isdisk() a second argument where it can return a suitable errno.
  Suggested by: bde
  [phk, 2000-01-10, 1 file, -3/+3]
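  A minimal caller-side sketch of the two-argument form described above,
  assuming the post-change signature is int vn_isdisk(struct vnode *, int *)
  (the helper is real; the exact signature and the errno it picks are
  assumptions, not re-checked against the tree):

      #include <sys/param.h>
      #include <sys/systm.h>
      #include <sys/vnode.h>

      /*
       * Hypothetical caller: rather than open-coding vp->v_type == VBLK and
       * choosing an errno by hand, let vn_isdisk() both test the vnode and
       * suggest the error to return.
       */
      static int
      example_open_disk(struct vnode *vp)
      {
              int error;

              if (!vn_isdisk(vp, &error))
                      return (error);         /* errno suggested by vn_isdisk() */
              return (0);
      }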
* Several performance improvements for soft updates have been added:
  1) Fastpath deletions. When a file is being deleted, check to see if it
     was so recently created that its inode has not yet been written to
     disk. If so, the delete can proceed to immediately free the inode.
  2) Background writes: No file or block allocations can be done while the
     bitmap is being written to disk. To avoid these stalls, the bitmap is
     copied to another buffer which is written, thus leaving the original
     available for further allocations.
  3) Link count tracking. Constantly track the difference in i_effnlink and
     i_nlink so that inodes that have had no change other than i_effnlink
     need not be written.
  4) Identify buffers with rollback dependencies so that the buffer flushing
     daemon can choose to skip over them.
  [mckusick, 2000-01-10, 1 file, -5/+144]
* Introduce a mechanism to suspend/resume system processes. Suspend syncer
  and bufdaemon prior to disk sync during system shutdown.
  [luoqi, 2000-01-07, 1 file, -10/+21]
* Reimplement buf_daemon / getnewbuf() interaction for dealing with
  stressful situations. buf_daemon now makes a distinction between being
  woken up and its sleep timing out, and as a consequence is now much
  better able to dynamically tune itself to its environment.
  Reviewed by: Alfred Perlstein <bright@wintelcom.net>
  [dillon, 1999-12-20, 1 file, -46/+43]
* Collect read and write counts for filesystems. This new code drops the
  counting in bwrite and puts it all in spec_strategy. I did some tests and
  verified that the counts collected for writes in spec_strategy are
  identical to the counts that we previously collected in bwrite. We now
  also get read counts (async reads come from requests for read-ahead
  blocks). Note that you need to compile a new version of mount to get the
  read counts printed out. The old mount binary is completely compatible;
  the only reason to install a new mount is to get the read counts printed.
  Submitted by: Craig A Soules <soules+@andrew.cmu.edu>
  Reviewed by: Kirk McKusick <mckusick@mckusick.com>
  [mckusick, 1999-12-01, 1 file, -20/+0]
* Convert various pieces of code to use vn_isdisk() rather than checking
  for vp->v_type == VBLK. In ccd: we don't need to call VOP_GETATTR to find
  the type of a vnode.
  Reviewed by: sos
  [phk, 1999-11-22, 1 file, -4/+4]
* This is a partial commit of the patch from PR 14914: A lot of the code in
  sys/kern directly accesses the *Q_HEAD and *Q_ENTRY structures for list
  operations. This patch makes all list operations in sys/kern use the
  queue(3) macros rather than directly accessing the *Q_{HEAD,ENTRY}
  structures. This batch of changes compiles to the same object files.
  Reviewed by: phk
  Submitted by: Jake Burkholder <jake@checker.org>
  PR: 14914
  [phk, 1999-11-16, 1 file, -3/+1]
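  For readers unfamiliar with the conversion described above, a small
  self-contained illustration of the style the patch moves toward - list
  manipulation through the queue(3) TAILQ macros rather than direct access
  to the tqh_first/tqe_next fields. The struct and field names below are
  invented for the example; it builds as an ordinary userland program on a
  BSD-derived <sys/queue.h>:

      #include <sys/queue.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct work {
              int w_id;
              TAILQ_ENTRY(work) w_entry;      /* linkage managed by the macros */
      };
      TAILQ_HEAD(worklist, work);

      int
      main(void)
      {
              struct worklist head;
              struct work *wp;
              int i;

              TAILQ_INIT(&head);

              /* Insert via the macros instead of poking tqh_first/tqe_next. */
              for (i = 0; i < 3; i++) {
                      if ((wp = malloc(sizeof(*wp))) == NULL)
                              abort();
                      wp->w_id = i;
                      TAILQ_INSERT_TAIL(&head, wp, w_entry);
              }

              /* Drain the list without any hand-rolled pointer surgery. */
              while ((wp = TAILQ_FIRST(&head)) != NULL) {
                      TAILQ_REMOVE(&head, wp, w_entry);
                      printf("work item %d\n", wp->w_id);
                      free(wp);
              }
              return (0);
      }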
* Remove a #define which doesn't do miracles anymore.
  [phk, 1999-10-30, 1 file, -1/+0]
* useracc() the prequel: Merge the contents (less some trivial bordering
  the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into
  <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t
  types next to their typedefs. This paves the road for the commit to
  follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than
  B_{READ|WRITE} as argument.
  [phk, 1999-10-29, 1 file, -1/+0]
* Adjust the buffer cache to better handle small-memory machines. A
  slightly older version of this code was tested by BDE and me. Also fixes
  a lockup situation when kva gets too fragmented. Remove the
  maxvmiobufspace variable and sysctl; they are no longer used. Also clean
  up (remove) #if 0 sections from prior commits. This code is more of a
  hack, but presumably the whole buffer cache implementation is going to be
  rewritten in the next year, so it's no big deal.
  [dillon, 1999-10-24, 1 file, -56/+57]
* Trim unused options (or #ifdef for undoc options).
  Submitted by: phk
  [peter, 1999-10-11, 1 file, -1/+0]
* Count bogus_page as wired.
  [dt, 1999-09-30, 1 file, -0/+1]
* Fix bug in brelse() regarding redirtying buffers on B_ERROR. brelse()
  improperly ignored the B_INVAL flag when acting on B_ERROR. If both
  B_INVAL and B_ERROR are set, the buffer is typically out of the
  underlying device's block range and must be destroyed. If only B_ERROR is
  set (for a write), a write error occurred and operation remains as it was
  before: the buffer must be redirtied to avoid corrupting the filesystem
  state.
  Reviewed by: David Greenman <dg@root.com>
  Submitted by: Tor.Egge@fast.no
  [dillon, 1999-09-20, 1 file, -3/+5]
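  An illustrative sketch of the flag decision described above (not the
  actual brelse() change; bdirty() is the existing helper that re-marks a
  buffer dirty, and error handling elsewhere is omitted):

      #include <sys/param.h>
      #include <sys/systm.h>
      #include <sys/buf.h>

      static void
      write_error_policy(struct buf *bp)
      {
              if ((bp->b_flags & B_ERROR) == 0)
                      return;
              if (bp->b_flags & B_INVAL) {
                      /*
                       * Invalid + error: typically an I/O beyond the device's
                       * block range.  Leave B_INVAL set so the buffer is
                       * destroyed rather than redirtied.
                       */
                      return;
              }
              if ((bp->b_flags & B_READ) == 0) {
                      /*
                       * Plain write error: keep the contents and mark the
                       * buffer dirty again so the data is not silently lost.
                       */
                      bp->b_flags &= ~B_ERROR;
                      bdirty(bp);
              }
      }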
* Allow getblk() to be called from an idle context (by panic() inside an
  interrupt handler).
  Reviewed by: dillon
  [luoqi, 1999-09-03, 1 file, -4/+10]
* $Id$ -> $FreeBSD$
  [peter, 1999-08-28, 1 file, -1/+1]
* Cast pointers to uintptr_t instead of casting them to u_long, and/or vice
  versa. Cosmetic.
  [bde, 1999-08-24, 1 file, -4/+5]
* Decommission miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
  a few lines into <sys/vnode.h>. Add a few fields to struct specinfo,
  paving the way for the fun part.
  [phk, 1999-08-08, 1 file, -2/+2]
* Add sysctl and support code to allow directories to be VMIO'd. The
  default setting for the sysctl is OFF, which is the historical operation.
  Submitted by: dillon
  [alc, 1999-07-26, 1 file, -1/+4]
* bufhashinit() is called with a caddr_t and is expected to return the same
  in both the alpha and i386 ports.
  [peter, 1999-07-09, 1 file, -3/+3]
* Condition in KASSERT was reversed.
  [mckusick, 1999-07-08, 1 file, -2/+2]
* These changes appear to give us benefits with both small (32MB) and large
  (1G) memory machine configurations. I was able to run 'dbench 32' on a
  32MB system without bringing the machine to a grinding halt.
  - Buffer cache hash table now dynamically allocated. This will have no
    effect on memory consumption for smaller systems and will help scale
    the buffer cache for larger systems.
  - Minor enhancement to pmap_clearbit(). I noticed that all the calls to
    it used constant arguments. Making it an inline allows the constants to
    propagate to deeper inlines and should produce better code.
  - Removal of inherent vfs_ioopt support through the emplacement of
    appropriate #ifdef's, with John's permission. If we do not find a use
    for it by the end of the year we will remove it entirely.
  - Removal of getnewbufloops* counters & sysctls - no longer necessary for
    debugging; getnewbuf() is now optimal.
  - Buffer hash table functions removed from sys/buf.h and localized to
    vfs_bio.c.
  - VFS_BIO_NEED_DIRTYFLUSH flag and support code added (bwillwrite()),
    allowing processes to block when too many dirty buffers are present in
    the system.
  - Removal of a softdep test in bdwrite() that is no longer necessary now
    that bdwrite() no longer attempts to flush dirty buffers.
  - Slight optimization added to bqrelse() - there is no reason to test for
    available buffer space on B_DELWRI buffers.
  - Addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite()
    will attempt to locate clusterable areas in both the forward and
    reverse direction relative to the offset of the buffer passed to it.
    This will probably not make much of a difference now, but I believe we
    will start to rely on it heavily in the future if we decide to shift
    some of the burden of the clustering closer to the actual I/O
    initiation.
  - Removal of the newbufcnt and lastnewbuf counters that Kirk added. They
    do not fix any race conditions that haven't already been fixed by the
    gbincore() test done after the only call to getnewbuf(). getnewbuf() is
    a static, so there is no chance of it being misused by other modules.
    (Unless Kirk can think of a specific thing that this code fixes. I went
    through it very carefully and didn't see anything.)
  - Removal of the VOP_ISLOCKED() check in flushbufqueues(). I do not think
    this check is necessary; the buffer should flush properly whether the
    vnode is locked or not. (yes?)
  - Removal of extra arguments passed to getnewbuf() that are not
    necessary.
  - Missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
    vfs_cluster.c.
  - vn_write() now calls bwillwrite() *PRIOR* to locking the vnode, which
    should greatly aid flushing operations in heavy load situations - both
    the pageout and update daemons will be able to operate more
    efficiently.
  - Removal of b_usecount. We may add it back in later, but for now it is
    useless. Prior implementations of the buffer cache never had enough
    buffers for it to be useful, and current implementations which make
    more buffers available might not benefit relative to the amount of
    sophistication required to implement a b_usecount. Straight LRU should
    work just as well, especially when most things are VMIO backed. I
    expect that (even though John will not like this assumption)
    directories will become VMIO backed some point soon.
  Submitted by: Matthew Dillon <dillon@backplane.com>
  Reviewed by: Kirk McKusick <mckusick@mckusick.com>
  [mckusick, 1999-07-08, 1 file, -167/+214]
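  One item above - vn_write() calling bwillwrite() before locking the vnode
  - is easy to picture. A heavily simplified ordering sketch (the function
  body is illustrative, not the real vn_write(); signatures follow the
  vnode API of this era as best recalled):

      /*
       * Throttle on dirty buffers *before* taking the vnode lock, so a
       * process that has to wait for the flushing daemon does not do so
       * while holding a lock the flushing path may need.
       */
      static int
      example_write(struct vnode *vp, struct uio *uio, struct proc *p)
      {
              int error;

              bwillwrite();                   /* may sleep; no locks held yet */
              vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, p);
              error = VOP_WRITE(vp, uio, 0, p->p_ucred);
              VOP_UNLOCK(vp, 0, p);
              return (error);
      }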
* The buffer queue mechanism has been reformulated. Instead of having
  QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
  QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean and
  dirty buffers have been separated. Empty buffers with KVM assignments
  have been separated from truly empty buffers. getnewbuf() has been
  rewritten and now operates in a 100% optimal fashion. That is, it is able
  to find precisely the right kind of buffer it needs to allocate a new
  buffer, defragment KVM, or free up an existing buffer when the buffer
  cache is full (which is a steady-state situation for the buffer cache).
  Buffer flushing has been reorganized. Previously buffers were flushed in
  the context of whatever process hit the conditions forcing buffer
  flushing to occur. This resulted in processes blocking on conditions
  unrelated to what they were doing. It also resulted in inappropriate VFS
  stacking chains due to multiple processes getting stuck trying to flush
  dirty buffers, or due to a single process getting into a situation where
  it might attempt to flush buffers recursively - a situation that was only
  partially fixed in prior commits. We have added a new daemon called
  buf_daemon which is responsible for flushing dirty buffers when the
  number of dirty buffers exceeds the vfs.hidirtybuffers limit. This daemon
  attempts to dynamically adjust the rate at which dirty buffers are
  flushed such that getnewbuf() calls (almost) never block.
  The number of nbufs and amount of buffer space is now scaled past the 8MB
  limit that was previously imposed for systems with over 64MB of memory,
  and the vfs.{lo,hi}dirtybuffers limits have been relaxed somewhat. The
  number of physical buffers has been increased with the intention that we
  will manage physical I/O differently in the future.
  reassignbuf previously attempted to keep the dirtyblkhd list sorted,
  which could result in non-deterministic operation under certain
  conditions, such as when a large number of dirty buffers are being
  managed. This algorithm has been changed. reassignbuf now keeps buffers
  locally sorted if it can do so cheaply, and otherwise gives up and adds
  buffers to the head of the dirtyblkhd list. The new algorithm is
  deterministic but not perfect, and greatly reduces problems that
  previously occurred when write_behind was turned off in the system.
  The P_FLSINPROG proc->p_flag bit has been replaced by the more
  descriptive P_BUFEXHAUST bit. This bit allows processes working with
  filesystem buffers to use available emergency reserves. Normal processes
  do not set this bit and are not allowed to dig into emergency reserves.
  The purpose of this bit is to avoid low-memory deadlocks. A small race
  condition was fixed in getpbuf() in vm/vm_pager.c.
  Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
  Reviewed by: Kirk McKusick <mckusick@mckusick.com>
  [mckusick, 1999-07-04, 1 file, -198/+281]
* Hopefully fix the remaining glitches with the BUF_*() changes. This
  should (really this time) fix pageout to swap and a couple of clustering
  cases. This simplifies BUF_KERNPROC() so that it unconditionally
  reassigns the lock owner rather than testing B_ASYNC and having the
  caller decide when to do the reassign. At present this is required
  because some places use B_CALL/b_iodone to free the buffers without
  B_ASYNC being set. Also, vfs_cluster.c explicitly calls BUF_KERNPROC()
  when attaching the buffers rather than the parent walking the
  cluster_head tailq.
  Reviewed by: Kirk McKusick <mckusick@mckusick.com>
  [peter, 1999-06-29, 1 file, -2/+3]
* Fix a bug that was almost certainly making breadn() fail. BUF_KERNPROC()
  was being called on the wrong bp - it should be called on the one that's
  just about to be fed to VOP_STRATEGY().
  [peter, 1999-06-28, 1 file, -2/+2]
* GC the remnants of the old pre-softupdates update daemon. It's been
  #if 0'd for a fair while now.
  [peter, 1999-06-26, 1 file, -48/+1]
* Convert buffer locking from using the B_BUSY and B_WANTED flags to using
  lockmgr locks. This commit should be functionally equivalent to the old
  semantics. That is, all buffer locking is done with LK_EXCLUSIVE
  requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be
  done in future commits.
  [mckusick, 1999-06-26, 1 file, -50/+59]
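  A rough before/after sketch of taking exclusive ownership of a buffer
  under the old flag-based scheme and under the new lockmgr-based scheme
  (simplified from memory; the wrapper macro and field names may differ
  from the committed code):

      /* Before: a hand-rolled sleep lock built out of buffer flags. */
      static void
      lock_buffer_old(struct buf *bp)
      {
              int s;

              s = splbio();
              while (bp->b_flags & B_BUSY) {
                      bp->b_flags |= B_WANTED;
                      tsleep(bp, PRIBIO + 1, "bufwait", 0);
              }
              bp->b_flags |= B_BUSY;
              splx(s);
      }

      /*
       * After: the same exclusion via lockmgr(9).  BUF_LOCK()/BUF_UNLOCK()
       * are assumed here to be thin wrappers around lockmgr() operating on
       * a lock embedded in struct buf.
       */
      static void
      lock_buffer_new(struct buf *bp)
      {
              BUF_LOCK(bp, LK_EXCLUSIVE);
              /* ... use the buffer exclusively ... */
              BUF_UNLOCK(bp);
      }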
* When allocating new buffers in getnewbuf, there are several points at
  which we may sleep. So, after completing our buffer allocation we must
  ensure that another process has not come along and allocated a different
  buffer with the same identity. We do this by keeping a global counter of
  the number of buffers that getnewbuf has allocated. We save this count
  when we enter getnewbuf and check it when we are about to return. If it
  has changed, then other buffers were allocated while we were in
  getnewbuf, so we must return NULL to let our parent know that it must
  recheck to see if it still needs the new buffer. Hopefully this fix will
  eliminate the creation of duplicate buffers with the same identity and
  the obscure corruptions that they cause.
  [mckusick, 1999-06-22, 1 file, -11/+40]
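  The entry above describes a classic generation-counter check around a
  sleep point. A minimal sketch of the pattern (the variable name and the
  allocate_possibly_sleeping() helper are invented for illustration, not
  the committed code):

      static int newbuf_generation;   /* bumped on every successful allocation */

      static struct buf *allocate_possibly_sleeping(void);    /* hypothetical */

      static struct buf *
      getnewbuf_sketch(void)
      {
              int start_gen = newbuf_generation;
              struct buf *bp;

              bp = allocate_possibly_sleeping();      /* may sleep */
              if (bp == NULL)
                      return (NULL);
              newbuf_generation++;

              if (newbuf_generation != start_gen + 1) {
                      /*
                       * Someone else allocated buffers while we slept and may
                       * have created one with the identity our caller wanted.
                       * Give ours back and make the caller recheck gbincore().
                       */
                      brelse(bp);
                      return (NULL);
              }
              return (bp);
      }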
* Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator
  special case. Delete special case code from vnode_if.sh, vnode_if.src,
  umap_vnops.c, and null_vnops.c.
  [mckusick, 1999-06-16, 1 file, -7/+7]
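  At call sites the change above is mechanical; a sketch of the before and
  after, assuming the new first argument is simply the buffer's own vnode
  (implied by the summary, not re-checked against the diff):

      /* Before: the buffer alone selected the bwrite implementation. */
      error = VOP_BWRITE(bp);

      /* After: the vnode is passed explicitly, like any other vnode op. */
      error = VOP_BWRITE(bp->b_vp, bp);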
* If we still haven't got a sufficient number of free buffers after the
  call to flushdirtybuffers(), then sleep in waitfreebuffers().
  PR: 11697
  Reviewed by: David Greenman, Matt Dillon
  [tegge, 1999-06-16, 1 file, -2/+2]
* Get rid of the global variable rushjob and replace it with a function in
  kern/vfs_subr.c named speedup_syncer() which handles the speedup request.
  Change the various clients of rushjob to use the new function.
  [mckusick, 1999-06-15, 1 file, -3/+2]
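  A sketch of what the client-side change above amounts to (the old
  "rushjob += 1" form is assumed from the description, not re-checked):

      /* Before: clients poked the syncer's global state directly. */
      rushjob += 1;

      /* After: the request goes through the new helper in kern/vfs_subr.c. */
      speedup_syncer();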
* Try and fix a couple of dev_t/major/minor etc. nits.
  [peter, 1999-05-12, 1 file, -3/+4]
* Remove b_proc from struct buf, it's (now) unused.
  Reviewed by: dillon, bde
  [phk, 1999-05-06, 1 file, -3/+2]
* Remove unused fields from struct buf: b_savekva, b_validoff, b_validend.
  Reviewed by: dillon, bde
  [phk, 1999-05-06, 1 file, -6/+1]
* The VFS/BIO subsystem contained a number of hacks in order to optimize
  piecemeal, middle-of-file writes for NFS. These hacks have caused no end
  of trouble, especially when combined with mmap(). I've removed them.
  Instead, NFS will issue a read-before-write to fully instantiate the
  struct buf containing the write. NFS does, however, optimize piecemeal
  appends to files. For most common file operations, you will not notice
  the difference. The sole remaining fragment in the VFS/BIO system is
  b_dirtyoff/end, which NFS uses to avoid cache coherency issues with
  read-merge-write style operations. NFS also optimizes the
  write-covers-entire-buffer case by avoiding the read-before-write. There
  is quite a bit of room for further optimization in these areas.
  The VM system marks pages fully valid (AKA vm_page_t->valid =
  VM_PAGE_BITS_ALL) in several places, most notably in vm_fault. This is
  not correct operation. The vm_pager_get_pages() code is now responsible
  for marking VM pages all-valid. A number of VM helper routines have been
  added to aid in zeroing out the invalid portions of a VM page prior to
  the page being marked all-valid. This operation is necessary to properly
  support mmap(). The zeroing occurs most often when dealing with file-EOF
  situations.
  Several bugs have been fixed in the NFS subsystem, including bits
  handling file and directory EOF situations and buf->b_flags consistency
  issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE.
  getblk() and allocbuf() have been rewritten. B_CACHE operation is now
  formally defined in comments and more straightforward in implementation.
  B_CACHE for VMIO buffers is based on the validity of the backing store.
  B_CACHE for non-VMIO buffers is based simply on whether the buffer is
  B_INVAL or not (B_CACHE set if B_INVAL clear, and vice versa). biodone()
  is now responsible for setting B_CACHE when a successful read completes.
  B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is
  initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and
  bwrite()) are now expected to set B_CACHE. This means that bowrite() and
  bawrite() also set B_CACHE indirectly.
  There are a number of places in the code which were previously using
  buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been
  using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on
  return because the rest of the system is so bad about dealing with
  B_DONE.
  Major fixes to NFS/TCP have been made. A server-side bug could cause
  requests to be lost by the server due to nfs_realign() overwriting other
  RPCs in the same TCP mbuf chain. The server's kernel must be recompiled
  to get the benefit of the fixes.
  Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
  [alc, 1999-05-02, 1 file, -314/+516]
* Address a performance problem in getnewbuf: In heavy-writing situations,
  QUEUE_LRU can contain a large number of DELWRI buffers at its head. These
  buffers must be moved to the tail if they cannot be written async, in
  order to reduce the scanning time required to skip past these buffers in
  later getnewbuf() calls.
  Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
  [alc, 1999-04-29, 1 file, -5/+33]
* getnewbuf(): check return value from tsleep(). Interruptible NFS may pass
  PCATCH to slpflag.
  [dt, 1999-04-14, 1 file, -7/+4]
* Fix a performance problem with the new getnewbuf() code: in an
  out-of-space condition (bufspace > hibufspace), an inappropriate scan of
  the empty queue was performed looking for buffer space to free up.
  Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
  [alc, 1999-04-07, 1 file, -26/+30]
* Catch a case spotted by Tor where files mmapped could leave garbage in
  the unallocated parts of the last page when the file ended on a frag but
  not a page boundary. Delimited by tags PRE_MATT_MMAP_EOF and
  POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c
  nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c
  miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c
  Submitted by: Matt Dillon <dillon@freebsd.org>
  Reviewed by: Alan Cox <alc@freebsd.org>
  [julian, 1999-04-05, 1 file, -1/+17]
* Fixed a serious bug in rev.1.202. getnewbuf() sometimes didn't
  initialise bp->b_data. This tended to cause panics for file systems
  whose block size is smaller than one page.
  [bde, 1999-03-19, 1 file, -2/+2]
* Reviewed by: Many at different times in different parts, including alan,
  john, me, luoqi, and kirk
  Submitted by: Matt Dillon <dillon@frebsd.org>
  This change implements a relatively sophisticated fix to getnewbuf().
  There were two problems with getnewbuf(). First, the write recursion can
  lead to a system stack overflow when you have NFS and/or VN devices in
  the system. Second, the free/dirty buffer accounting was completely
  broken. Not only did the nfs routines blow it trying to manually account
  for the buffer state, but the accounting that was done did not work well
  with the purpose of its existence: figuring out when getnewbuf() needs to
  sleep. The meat of the change is to kern/vfs_bio.c. The remaining diffs
  are all minor except for NFS, which includes both the fixes for bp
  interaction AND fixes for a 'biodone(): buffer already done' lockup.
  Sys/buf.h also contains a chaining structure which is not used by this
  patchset but is used by other patches that are coming soon. This patch is
  delineated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF. (sorry for the
  missing T matt)
  [julian, 1999-03-12, 1 file, -376/+660]
* Make comment match code.
  [julian, 1999-03-02, 1 file, -3/+2]
* Remove inappropriate use of VOP_ISLOCKED(). This produced races resulting
  in panics and filesystem corruptions under some circumstances.
  Reviewed by: luoqi chen <luoqi@freebsd.org>
  Reviewed by: Kirk McKusick <mckusick@mckusick.com>
  Submitted by: Matt Dillon <dillon@freebsd.org>
  [julian, 1999-03-02, 1 file, -2/+2]
* Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel
  compile.
  [dillon, 1999-01-27, 1 file, -2/+2]
* Don't try to calculate B_CACHE for an NFS-related bp that has a
  b_validend > 0. This will screw up small writes, causing lots of little
  writes out the network. We will assume that NFS handles B_CACHE properly.
  [dillon, 1999-01-24, 1 file, -2/+5]
* Fix an expression parenthesization typo in a conditional. It should not
  have any operational effects other than to make the code in question a
  little faster. Also added a more involved comment.
  [dillon, 1999-01-23, 1 file, -2/+10]
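  For readers unfamiliar with this class of bug, a generic illustration of
  how a missing pair of parentheses changes a flag test (this is not the
  actual expression fixed by the commit, and do_something() is a
  placeholder):

      /*
       * & binds tighter than |, so without the inner parentheses the test
       * parses as ((bp->b_flags & B_INVAL) | B_ERROR), which is always
       * nonzero because B_ERROR is a nonzero constant.
       */
      if (bp->b_flags & B_INVAL | B_ERROR)            /* buggy: always taken */
              do_something(bp);

      if (bp->b_flags & (B_INVAL | B_ERROR))          /* intended: either flag */
              do_something(bp);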
* Don't throw away the buffer contents on a fatal write error; just mark
  the buffer as still being dirty. This isn't a perfect solution, but
  throwing away the buffer contents will often result in filesystem
  corruption and this solution will at least correctly deal with transient
  errors.
  Submitted by: Kirk McKusick <mckusick@mckusick.com>
  [dg, 1999-01-22, 1 file, -2/+5]
* The main operational changes are in getblk()'s handling of the B_DELWRI
  and B_CACHE flags, fixing a bug that showed up with NFS. Also, in a
  number of cases manually inserted code has been removed and replaced with
  an inline function call, giving us better functional isolation in the
  source.
  [dillon, 1999-01-21, 1 file, -1/+1]
* This is a rather large commit that encompasses the new swapper, changes
  to the VM system to support the new swapper, VM bug fixes, several VM
  optimizations, and some additional revamping of the VM code. The specific
  bug fixes will be documented with additional forced commits. This commit
  is somewhat rough in regards to code cleanup issues.
  Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
  [dillon, 1999-01-21, 1 file, -44/+75]