path: root/sys/kern/vfs_subr.c
Commit message | Author | Age | Files | Lines
* - In reassignbuf() don't unlock vp and lock newvp if they are the same. Doing so creates a race where the buf is on neither list.
| - Only vfree() in an error case in vclean() if VSHOULDFREE() thinks we should.
| - Convert the error case in vclean() to INVARIANTS from DIAGNOSTIC as this really should not happen and is fast to check.
| Author: jeff | Age: 2003-09-20 | Files: 1 | Lines: -6/+11
* - Remove spls(). The locking that has replaced them is in place and they no longer serve as guidelines for future work.
| Author: jeff | Age: 2003-09-19 | Files: 1 | Lines: -62/+6
* Eliminate one case of VI_UNLOCK followed by an immediate VI_LOCK.
| Author: kan | Age: 2003-09-19 | Files: 1 | Lines: -3/+2
* Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent.
| Requested by: bde (kern_ktrace.c)
| Author: jhb | Age: 2003-08-07 | Files: 1 | Lines: -1/+1
* Revert stuff which accidentally ended up in the previous commit.
| Author: phk | Age: 2003-07-22 | Files: 1 | Lines: -7/+8
* Don't attempt to inline the large functions mb_alloc() and mb_free(); doing so more than doubles the text size of this file. GCC has wisely ignored us on this previously.
| Author: phk | Age: 2003-07-22 | Files: 1 | Lines: -8/+7
* Use __FBSDID().
| Author: obrien | Age: 2003-06-11 | Files: 1 | Lines: -1/+4
* Remove an unused variable and a now unbalanced call to splbio().
| Found by: FlexeLint
| Author: phk | Age: 2003-05-31 | Files: 1 | Lines: -2/+0
* Make the maximum number of vnodes a function of both the physical memory size and the kernel's heap size, specifically, vm_kmem_size. This function allows a maximum of 40% of the vm_kmem_size to be used for vnodes and vm objects. This is a conservative bound based upon recent problem reports. (In other words, a slight increase in this percentage may be safe.)
| Finally, machines with less than ~3GB of RAM should be unaffected by this change, i.e., the maximum number of vnodes should remain the same. If necessary, machines with 3GB or more of RAM can increase the maximum number of vnodes by increasing vm_kmem_size.
| Desired by: scottl
| Tested by: jake
| Approved by: re (rwatson,scottl)
| Author: alc | Age: 2003-05-23 | Files: 1 | Lines: -1/+10
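The bound described in this commit can be sketched in a few lines. This is a minimal model, not the kernel's code: the function and parameter names (vnode_limit, phys_limit, per_vnode) are hypothetical, and taking the smaller of the two limits is an assumption about how "a function of both" is combined. Only the 40% share of vm_kmem_size comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Helper: smaller of two sizes. */
static size_t
min_size(size_t a, size_t b)
{
	return (a < b ? a : b);
}

/*
 * Hypothetical model of the vnode limit.
 *   phys_limit: limit derived from physical memory (computed elsewhere).
 *   kmem_size:  kernel heap size in bytes (vm_kmem_size in the log).
 *   per_vnode:  assumed bytes used by one vnode plus its vm object.
 */
static size_t
vnode_limit(size_t phys_limit, size_t kmem_size, size_t per_vnode)
{
	size_t heap_limit;

	assert(per_vnode > 0);
	/* 40% of the heap is 2/5; divide before multiplying to avoid overflow. */
	heap_limit = kmem_size / 5 * 2 / per_vnode;
	/* Assumption: the effective limit is the smaller of the two bounds. */
	return (min_size(phys_limit, heap_limit));
}
```

With a 200MB heap and 512 bytes per vnode/object pair, the heap bound is 163840, so a physical-memory limit of 100000 is left unchanged while a limit of 200000 is capped.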
* Detect that a vnode has been reclaimed while vflush() was waiting to lock the vnode and restart the loop. Vflush() is vulnerable since it does not hold a reference to the vnode and it holds no other locks while waiting for the vnode lock. The vnode will no longer be on the list when the loop is restarted.
| Approved by: re (rwatson)
| Author: truckman | Age: 2003-05-16 | Files: 1 | Lines: -0/+11
* Optimize the use of splay in gbincore(). During a "make buildworld" the desired buffer is found at one of the roots more than 60% of the time. Thus, checking both roots before performing either splay eliminates unnecessary splays on the first tree splayed.
| Approved by: re (jhb)
| Author: alc | Age: 2003-05-13 | Files: 1 | Lines: -7/+22
* Remove bogus locking from DDB's "show lockedvnods" command: using synchronization primitives from inside DDB is generally a bad idea, and in this case it frequently results in panics due to DDB commands being executed from the sio fast interrupt context on a serial console. Replace the locking with a note that a lack of locking means that DDB may see inconsistent views of the mount and vnode lists, which could also result in a panic. More frequently, though, this avoids a panic than causes it.
| Discussed with ages ago: bde
| Approved by: re (scottl)
| Author: rwatson | Age: 2003-05-12 | Files: 1 | Lines: -11/+7
* - Revert kern/vfs_subr.c revision 1.444. The vm_object's size isn't trustworthy for vnode-backed objects.
| - Restore the old behavior of vm_object_page_remove() when the end of the given range is zero. Add a comment to vm_object_page_remove() regarding this behavior.
| Reported by: iedowse
| Author: alc | Age: 2003-05-03 | Files: 1 | Lines: -1/+1
* Lock accesses to the vm_object's ref_count and resident_page_count.
| Author: alc | Age: 2003-05-01 | Files: 1 | Lines: -5/+9
* Various changes to vm_object_page_remove():
| - Eliminate an odd, special-case feature: if start == end == 0 then all pages are removed. Only one caller used this feature and that caller can trivially pass the object's size.
| - Assert that the vm_object is locked on entry; don't bother testing for a NULL vm_object.
| - Style: Fix lines that are longer than 80 characters.
| Author: alc | Age: 2003-04-26 | Files: 1 | Lines: -1/+1
* - Convert vm_object_pip_wait() from using tsleep() to msleep().
| - Make vm_object_pip_sleep() static.
| - Lock the vm_object when performing vm_object_pip_wait().
| Author: alc | Age: 2003-04-26 | Files: 1 | Lines: -0/+2
* - Acquire the vm_object's lock when performing vm_object_page_clean().
| - Add a parameter to vm_pageout_flush() that tells vm_pageout_flush() whether its caller has locked the vm_object. (This is a temporary measure to bootstrap vm_object locking.)
| Author: alc | Age: 2003-04-24 | Files: 1 | Lines: -0/+2
* Update locking around vm_object_page_remove() to use the new macros.
| Author: alc | Age: 2003-04-18 | Files: 1 | Lines: -2/+2
* Use vm_object_pip_wait() rather than reimplementing it.
| Author: alc | Age: 2003-04-13 | Files: 1 | Lines: -2/+1
* Adjust the number of vnodes scanned by vlrureclaim() according to the size of the vnode list.
| Author: tegge | Age: 2003-03-26 | Files: 1 | Lines: -8/+11
* We shouldn't assert that a vnode is locked in vop_lock_post() if VOP_LOCK() has failed.
| Reviewed by: jeff
| Author: yar | Age: 2003-03-22 | Files: 1 | Lines: -1/+2
* - Remove a dead check for bp->b_vp == vp in vtruncbuf(). This has not been possible for some time.
| - Lock the buf before accessing fields. This should very rarely be locked.
| - Assert that B_DELWRI is set after we acquire the buf. This should always be the case now.
| Author: jeff | Age: 2003-03-13 | Files: 1 | Lines: -14/+17
* - Remove a race between fsync-like functions and flushbufqueues() by requiring locked bufs in vfs_bio_awrite(). Previously the buf could have been written out by fsync before we acquired the buf lock if it weren't for Giant. cluster_wbuild() handles this race properly but the single write at the end of vfs_bio_awrite() would not.
| - Modify flushbufqueues() so there is only one copy of the loop. Pass a parameter in that says whether or not we should sync bufs with deps.
| - Call flushbufqueues() a second time and then break if we couldn't find any bufs without deps.
| Author: jeff | Age: 2003-03-13 | Files: 1 | Lines: -1/+0
* Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress.
| Discussed on: arch@
| Author: alc | Age: 2003-03-06 | Files: 1 | Lines: -6/+0
* Finish cleanup of vprint() which was begun with changing v_tag to a string. Remove extraneous uses of vop_null, instead deferring to the default op. Rename vnode type "vfs" to the more descriptive "syncer". Fix formatting for various filesystems that use vop_print.
| Author: njl | Age: 2003-03-03 | Files: 1 | Lines: -25/+3
* - Hold the vnode interlock across calls to bgetvp instead of acquiring it internally. This is required to stop multiple bufs from being associated with a single lblkno.
| Author: jeff | Age: 2003-03-02 | Files: 1 | Lines: -2/+1
* - gc USE_BUFHASH. The smp locking of the buf cache renders this useless.
| Author: jeff | Age: 2003-03-01 | Files: 1 | Lines: -4/+0
* Prevent large files from monopolizing the system buffers. Keep track of the number of dirty buffers held by a vnode. When a bdwrite is done on a buffer, check the existing number of dirty buffers associated with its vnode. If the number rises above vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one of the other (hopefully older) dirty buffers associated with the vnode is written (using bawrite). In the event that this approach fails to curb the growth in the vnode's number of dirty buffers (due to soft updates rollback dependencies), the more drastic approach of doing a VOP_FSYNC on the vnode is used. This code primarily affects very large and actively written files such as snapshots. This change should eliminate hanging when taking snapshots or doing background fsck on very large filesystems.
| Hopefully, one day it will be possible to cache filesystem metadata in the VM cache as is done with file data. As it stands, only the buffer cache can be used, which limits total metadata storage to about 20MB no matter how much memory is available on the system. This rather small memory gets badly thrashed, causing a lot of extra I/O. For example, taking a snapshot of a 1TB filesystem minimally requires about 35,000 write operations, but because of the cache thrashing (we only have about 350 buffers at our disposal) ends up doing about 237,540 I/Os, thus taking twenty-five minutes instead of four if it could run entirely in the cache.
| Reported by: Attila Nagy <bra@fsn.hu>
| Sponsored by: DARPA & NAI Labs.
| Author: mckusick | Age: 2003-02-25 | Files: 1 | Lines: -0/+6
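The two-stage policy in this commit (write one dirty buffer first, fall back to a full fsync if growth continues) can be modeled as a small decision function. This is a sketch under stated assumptions: the names dirty_thresh, dirty_policy and the enum are hypothetical, and the slack parameter that triggers the fallback is an invented stand-in, since the commit message does not say exactly when the escalation happens. Only the 90% ratio and the two actions come from the log.

```c
#include <assert.h>

/* Possible reactions when a buffer is marked dirty (names are illustrative). */
enum dirty_action {
	DIRTY_NONE,		/* below the threshold: nothing to do */
	DIRTY_WRITE_ONE,	/* write out one other dirty buffer (bawrite) */
	DIRTY_FSYNC		/* fall back to syncing the whole vnode */
};

/* The log states the threshold is currently 90% of vfs.hidirtybuffers. */
static int
dirty_thresh(int hidirtybuffers)
{
	return (hidirtybuffers * 9 / 10);
}

/*
 * Decide what to do given the vnode's dirty buffer count.
 * 'slack' is an assumption: how far past the threshold the count may
 * grow before the drastic fallback is used.
 */
static enum dirty_action
dirty_policy(int vnode_dirty, int thresh, int slack)
{
	assert(slack >= 0);
	if (vnode_dirty > thresh + slack)
		return (DIRTY_FSYNC);
	if (vnode_dirty > thresh)
		return (DIRTY_WRITE_ONE);
	return (DIRTY_NONE);
}
```

Checking the larger bound first keeps the fallback reachable; with the order reversed, every count above the threshold would stop at the single write.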
* - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
| - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead.
| - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock.
| Reviewed by: arch, mckusick
| Author: jeff | Age: 2003-02-25 | Files: 1 | Lines: -41/+34
* Bracket the kern.vnode sysctl in #ifdef notyet because it results in massive locking issues on diskless systems. It is also not clear that this sysctl is non-dangerous in its requirements for locked down memory on large RAM systems.
| Author: phk | Age: 2003-02-23 | Files: 1 | Lines: -0/+2
* Back out M_* changes, per decision of the TRB.
| Approved by: trb
| Author: imp | Age: 2003-02-19 | Files: 1 | Lines: -4/+4
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
| Author: alfred | Age: 2003-01-21 | Files: 1 | Lines: -4/+4
* Add a new vnode flag VI_DOINGINACT to indicate that a VOP_INACTIVE call is in progress on the vnode. When vput() or vrele() sees a 1->0 reference count transition, it now returns without any further action if this flag is set. This flag is necessary to avoid recursion into VOP_INACTIVE if the filesystem inactive routine causes the reference count to increase and then drop back to zero. It is also used to guarantee that an unlocked vnode will not be recycled while blocked in VOP_INACTIVE().
| There are at least two cases where the recursion can occur: one is that the softupdates code called by ufs_inactive() via ffs_truncate() can call vput() on the vnode. This has been reported by many people as "lockmgr: draining against myself" panics. The other case is that nfs_inactive() can call vget() and then vrele() on the vnode to clean up a sillyrename file.
| Reviewed by: mckusick (an older version of the patch)
| Author: iedowse | Age: 2002-12-29 | Files: 1 | Lines: -15/+38
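The recursion guard described in this commit can be illustrated with a toy, single-threaded model. Everything here apart from the flag name VI_DOINGINACT is hypothetical: the struct, the flag's numeric value, and the toy_ functions are stand-ins, and real vnodes also need the interlock that this sketch omits.

```c
#include <assert.h>

#define VI_DOINGINACT	0x1	/* numeric value is illustrative only */

/* Toy vnode: just enough state to show the guard. */
struct toy_vnode {
	int usecount;		/* reference count */
	int iflag;		/* holds VI_DOINGINACT while inactive runs */
	int inactive_calls;	/* how many times the inactive routine ran */
	int reenter;		/* nonzero: inactive takes and drops a ref */
};

static void toy_vrele(struct toy_vnode *vp);

/* Stand-in for a filesystem inactive routine that re-references the vnode. */
static void
toy_inactive(struct toy_vnode *vp)
{
	vp->inactive_calls++;
	if (vp->reenter) {
		vp->usecount++;		/* e.g. a vget() during cleanup */
		toy_vrele(vp);		/* second 1->0 transition */
	}
}

/* Drop a reference; run the inactive routine once on the 1->0 transition. */
static void
toy_vrele(struct toy_vnode *vp)
{
	assert(vp->usecount > 0);
	if (--vp->usecount > 0)
		return;
	if (vp->iflag & VI_DOINGINACT)
		return;			/* already inside inactive: do nothing */
	vp->iflag |= VI_DOINGINACT;
	toy_inactive(vp);
	vp->iflag &= ~VI_DOINGINACT;
}
```

Without the flag check, the nested toy_vrele() would call toy_inactive() again and recurse without bound whenever reenter is set.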
* Use a timeout of one second while we wait for the vnode washer, this prevents a potential race and makes the system a little bit less jerky under extreme loads.
| Author: phk | Age: 2002-12-29 | Files: 1 | Lines: -1/+1
* Vnodes pull in 800-900 bytes these days, all things counted, so we need to treat desiredvnodes much more like a limit than as a vague concept. On a 2GB RAM machine where desiredvnodes is 130k, we run out of kmem_map space when we hit about 190k vnodes.
| If we wake up the vnode washer in getnewvnode(), sleep until it is done, so that it has a chance to offer us a washed vnode. If we don't sleep here we'll just race ahead and allocate yet another vnode which will never get freed.
| In the vnode washer, instead of doing 10 vnodes per mountpoint per rotation, do 10% of the vnodes distributed evenly across the mountpoints.
| Author: phk | Age: 2002-12-29 | Files: 1 | Lines: -5/+15
* KASSERT that vop_revoke() gets a VCHR.
| Author: phk | Age: 2002-12-28 | Files: 1 | Lines: -1/+2
* Perform vm_object_lock() and vm_object_unlock() around vm_object_page_remove().
| Author: alc | Age: 2002-12-15 | Files: 1 | Lines: -0/+2
* To avoid lock order reversals in getnewvnode(), the call to uma_zfree() must be delayed until the vnode interlock is released.
| Reported by: kris@
| Approved by: re (jhb)
| Author: alc | Age: 2002-12-08 | Files: 1 | Lines: -3/+11
* Do not set a variable (vp->v_pollinfo) to NULL if we know it already has that value.
| Approved by: re
| Author: robert | Age: 2002-11-27 | Files: 1 | Lines: -1/+1
* Slightly change the semantics of vnode labels for MAC: rather than "refreshing" the label on the vnode before use, just get the label right from inception. For single-label file systems, set the label in the generic VFS getnewvnode() code; for multi-label file systems, leave the labeling up to the file system. With UFS1/2, this means reading the extended attribute during vfs_vget() as the inode is pulled off disk, rather than hitting the extended attributes frequently during operations later, improving performance. This also corrects semantics for shared vnode locks, which were not previously present in the system. This changes the cache coherency properties WRT out-of-band access to label data, but in an acceptable form. With UFS1, there is a small race condition during automatic extended attribute start -- this is not present with UFS2, and occurs because EAs aren't available at vnode inception. We'll introduce a workaround for this shortly.
| Approved by: re
| Obtained from: TrustedBSD Project
| Sponsored by: DARPA, Network Associates Laboratories
| Author: rwatson | Age: 2002-10-26 | Files: 1 | Lines: -0/+2
* In vrele() we can actually have a VCHR with v_rdev == NULL if we came from the bottom of addaliasu(). Don't panic.
| Author: phk | Age: 2002-10-25 | Files: 1 | Lines: -1/+1
* Within ufs, the ffs_sync and ffs_fsync functions did not always check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned.
| Sponsored by: DARPA & NAI Labs.
| Author: mckusick | Age: 2002-10-25 | Files: 1 | Lines: -3/+3
* Fix the spechash lock order reversal by keeping an updated sum of v_usecount in the dev_t which vcount() can return without locking any vnodes.
| Seen by: jhb
| Author: phk | Age: 2002-10-24 | Files: 1 | Lines: -19/+28
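The idea of caching a running sum so that a count query never walks (or locks) the individual vnodes can be sketched as follows. The structures and the dv_ functions are hypothetical stand-ins, the field name usecount_sum is invented for illustration, and the locking that would protect the sum in a real kernel is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device: carries the running sum of references over all its vnodes. */
struct model_dev {
	int usecount_sum;
};

/* Toy vnode: v_rdev is NULL for vnodes that are not device aliases. */
struct dev_vnode {
	int v_usecount;
	struct model_dev *v_rdev;
};

/* Take a reference and keep the device-wide sum in step. */
static void
dv_ref(struct dev_vnode *vp)
{
	vp->v_usecount++;
	if (vp->v_rdev != NULL)
		vp->v_rdev->usecount_sum++;
}

/* Drop a reference and keep the device-wide sum in step. */
static void
dv_rele(struct dev_vnode *vp)
{
	assert(vp->v_usecount > 0);
	vp->v_usecount--;
	if (vp->v_rdev != NULL)
		vp->v_rdev->usecount_sum--;
}

/* Total references across all aliases, read without visiting other vnodes. */
static int
dv_count(const struct dev_vnode *vp)
{
	if (vp->v_rdev != NULL)
		return (vp->v_rdev->usecount_sum);
	return (vp->v_usecount);
}
```

The trade-off is a little extra work on every reference change in exchange for a constant-time count that needs no per-vnode locks.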
* When scanning the freelist looking for candidate vnodes to recycle, be sure to exit the loop with vp == NULL if no candidates are found. Formerly, this bug would cause the last vnode inspected to be used, even if it was not available. The result was a panic "vn_finished_write: neg cnt".
| Sponsored by: DARPA & NAI Labs.
| Author: mckusick | Age: 2002-10-14 | Files: 1 | Lines: -4/+3
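The bug class fixed here is a bounded scan that leaves its cursor pointing at the last rejected element. A minimal sketch, with a hypothetical fl_vnode type and an array standing in for the freelist:

```c
#include <assert.h>
#include <stddef.h>

/* Toy free-list entry; 'busy' means it cannot be recycled right now. */
struct fl_vnode {
	int busy;
};

/*
 * Scan up to n entries and return the first usable one, or NULL.
 * Resetting vp on every rejection is what guarantees the NULL result
 * when the loop ends without finding a candidate.
 */
static struct fl_vnode *
pick_candidate(struct fl_vnode **list, int n)
{
	struct fl_vnode *vp = NULL;
	int i;

	assert(list != NULL || n == 0);
	for (i = 0; i < n; i++) {
		vp = list[i];
		if (!vp->busy)
			break;		/* found a usable vnode */
		vp = NULL;		/* rejected: do not leak it past the loop */
	}
	return (vp);
}
```

If the `vp = NULL` line is dropped, a list of only busy entries returns the last busy one, which matches the failure the commit message describes.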
* Unconditionally reset vp->v_vnlock back to the default in the vclean() function (e.g., vp->v_vnlock = &vp->v_lock) rather than requiring filesystems that use alternate locks to do so in their vop_reclaim functions. This change is a further cleanup of the vop_stdlock interface.
| Submitted by: Poul-Henning Kamp <phk@critter.freebsd.dk>
| Sponsored by: DARPA & NAI Labs.
| Author: mckusick | Age: 2002-10-14 | Files: 1 | Lines: -1/+3
* Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock).
| In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme.
| Sponsored by: DARPA & NAI Labs.
| Author: mckusick | Age: 2002-10-14 | Files: 1 | Lines: -9/+8
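The pointer indirection described in this commit can be shown with a toy model. Only the field names v_lock and v_vnlock come from the log; toy_lock, lk_vnode and the lk_ functions are hypothetical, and the "lock" is a plain flag rather than a real lockmgr lock.

```c
#include <assert.h>

/* Toy lock: a flag is enough to show which lock object gets used. */
struct toy_lock {
	int held;
};

/* Toy vnode with an embedded default lock and a pointer to the active one. */
struct lk_vnode {
	struct toy_lock v_lock;		/* default lock */
	struct toy_lock *v_vnlock;	/* lock the standard routines use */
};

/* What getnewvnode is described as doing: point at the embedded lock. */
static void
lk_init(struct lk_vnode *vp)
{
	vp->v_lock.held = 0;
	vp->v_vnlock = &vp->v_lock;
}

/* Standard lock/unlock always go through the pointer. */
static void
lk_stdlock(struct lk_vnode *vp)
{
	assert(!vp->v_vnlock->held);
	vp->v_vnlock->held = 1;
}

static void
lk_stdunlock(struct lk_vnode *vp)
{
	assert(vp->v_vnlock->held);
	vp->v_vnlock->held = 0;
}

/* A stacking filesystem may share the lower vnode's lock. */
static void
lk_share(struct lk_vnode *upper, struct lk_vnode *lower)
{
	upper->v_vnlock = lower->v_vnlock;
}

/* On reclaim, restore the default so the vnode is self-contained again. */
static void
lk_reclaim(struct lk_vnode *vp)
{
	vp->v_vnlock = &vp->v_lock;
}
```

Because every caller goes through v_vnlock, swapping the pointer changes which lock is taken without touching the lock and unlock routines themselves.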
* When considering a vnode for reuse in getnewvnode, we call vcanrecycle to check a free vnode's availability. If it is available, vcanrecycle returns an error code of zero and the vnode in question locked. The getnewvnode routine then used to call vn_start_write with the V_NOWAIT flag. If the filesystem was suspended while taking a snapshot, the vn_start_write would fail but getnewvnode would fail to unlock the vnode, instead leaving it locked on the freelist. The result would be that the vnode would be locked forever and would eventually hang the system with a race to the root when an attempt was made to recycle it. This fix moves the vn_start_write check into vcanrecycle where it will properly unlock the vnode if it is unavailable for recycling due to filesystem suspension.
| Sponsored by: DARPA & NAI Labs.
| Author: mckusick | Age: 2002-10-11 | Files: 1 | Lines: -13/+18
* Fix a problem introduced in rev. 1.406, which can cause an already unlocked mutex to be unlocked again, causing a system panic.
| Author: sobomax | Age: 2002-10-05 | Files: 1 | Lines: -0/+1
* Fix some harmless mis-indents.
| Spotted by: FlexeLint
| Author: phk | Age: 2002-10-01 | Files: 1 | Lines: -1/+1
* Move vnode MAC label initialization to after the release of the vnode interlock in getnewvnode() to avoid possible sleeps while holding the mutex. Note that the warning from Witness is a slight false positive since we know there will be no contention on the interlock since we haven't made the vnode available for use yet, but the theory is not a bad one.
| Obtained from: TrustedBSD Project
| Sponsored by: DARPA, Network Associates Laboratories
| Author: rwatson | Age: 2002-09-30 | Files: 1 | Lines: -3/+3