path: root/sys/kern/vfs_subr.c
Commit message | Author | Age | Files | Lines
...
* Add a bandaid to avoid a deadlock in a situation, when we are trying to suspend | pjd | 2006-08-09 | 1 | -0/+10
  a file system, but need to obtain a vnode. We may not be able to do it, because all vnodes could be already in use and other processes cannot release them, because they are waiting in "suspfs" state. In such situation, we allow to allocate a vnode anyway. This is a temporary fix - there is no backpressure to free vnodes allocated in those circumstances.
  MFC after: 1 week
  Reviewed by: tegge
* Improve commenting of vaccess(), making sure to be clear that the ifdef | rwatson | 2006-08-06 | 1 | -4/+10
  capabilities code is there for reference and never actually used. Slight style tweak.
* Enable debug.mpsafevfs by default on arm. Since every architecture except | alc | 2006-07-15 | 1 | -2/+1
  powerpc has debug.mpsafevfs enabled by default, it is shorter to enumerate the architectures on which debug.mpsafevfs is off.
  Tested by: cognet@
* Back out my rev. 1.674. The better fix (rev. 1.637) is already in tree. | kib | 2006-07-05 | 1 | -3/+3
  Approved by: kan (mentor)
* Backed out the change by request from rwatson. | babkin | 2006-06-26 | 1 | -72/+0
  PR: kern/14584
* The common UID/GID space implementation. It has been discussed on -arch | babkin | 2006-06-25 | 1 | -0/+72
  in 1999, and there are changes to the sysctl names compared to PR, according to that discussion. The description is in sys/conf/NOTES. Lines in the GENERIC files are added in commented-out form. I'll attach the test script I've used to PR.
  PR: kern/14584
  Submitted by: babkin
* Fix the LOR that occurs when the MAC compiled into the kernel | kib | 2006-06-08 | 1 | -3/+3
  and vnode is destroyed.
  Reviewed by: rwatson
  LOR: 189
  MFC after: 2 weeks
  Approved by: kan (mentor)
* Do not set B_NOCACHE on buffers when releasing them in flushbuflist(). | ups | 2006-05-25 | 1 | -1/+1
  If B_NOCACHE is set the pages of vm backed buffers will be invalidated. However clean buffers can be backed by dirty VM pages so invalidating them can lead to data loss. Add support for flush dirty page in the data invalidation function of some network file systems. This fixes data losses during vnode recycling (and other code paths using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.
  Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
  Reviewed by: tegge@
  MFC after: 7 days
* Remove various bits of conditional Alpha code and fixup a few comments. | jhb | 2006-05-12 | 1 | -1/+1
* vn_start_write()/vn_finished_write() is not needed here, because | pjd | 2006-04-29 | 1 | -2/+0
  vn_start_write() is always called earlier in the code path and calling the function recursively may lead to a deadlock.
  Confirmed by: tegge
  MFC after: 2 weeks
* - Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child | jeff | 2006-04-28 | 1 | -1/+2
  buffers to go on the buf daemon's DIRTYGIANT queue.
  - Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler runs in the context of the buf daemon and may require Giant.
* - VFS_LOCK_GIANT when recycling a vnode via getnewvnode. We may be | jeff | 2006-04-04 | 1 | -0/+3
  recycling for an unrelated filesystem. I really don't like potentially acquiring giant in the context of a giantless filesystem but there are reasonable objections to removing the recycling from this path.
  Sponsored by: Isilon Systems, Inc.
* - Add an assert to vgone. It is illegal to call vgone without a reference | jeff | 2006-03-31 | 1 | -3/+0
  to the vnode. Without a reference the vnode will never be vdestroy'd and the memory will never be reclaimed.
  Sponsored by: Isilon Systems, Inc.
* - Hold a reference from the time vfs_busy starts until vfs_unbusy is | jeff | 2006-03-31 | 1 | -3/+9
  called.
  - vfs_getvfs has to return a reference to prevent the returned mountpoint from changing identities.
  - Release references acquired via vfs_getvfs.
  Discussed with: tegge
  Tested by: kris
  Sponsored by: Isilon Systems, Inc.
* - Add the B_NEEDSGIANT flag which is only set if the vnode that owns a buf | jeff | 2006-03-31 | 1 | -0/+3
  requires Giant. It is set in bgetvp and cleared in brelvp.
  - Create QUEUE_DIRTY_GIANT for dirty buffers that require giant.
  - In the buf daemon, only grab giant when processing QUEUE_DIRTY_GIANT and only if we think there are buffers in that queue.
  Sponsored by: Isilon Systems, Inc.
* - Correct an assert in vop_rename_pre. fdvp may be locked if it is either | jeff | 2006-03-19 | 1 | -1/+1
  the target directory or file. This case should fail in the filesystem anyway and perhaps kern_rename() should catch it.
  Sponsored by: Isilon Systems, Inc.
* Use vn_start_secondary_write() and vn_finished_secondary_write() as a | tegge | 2006-03-08 | 1 | -4/+20
  replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.
* Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must | tegge | 2006-03-02 | 1 | -0/+2
  be called without any vnode locks held. Remove calls to vn_start_write() and vn_finished_write() in vnode_pager_putpages() and add these calls before the vnode lock is obtained to most of the callers that don't already have them.
* Don't try to show marker nodes. | tegge | 2006-03-02 | 1 | -1/+1
* - Move softdep from using a global worklist to per-mount worklists. This | jeff | 2006-03-02 | 1 | -10/+0
  has many positive effects including improved smp locking, reducing interdependencies between mounts that can lead to deadlocks, etc.
  - Add the softdep worklist and various counters to the ufsmnt structure.
  - Add a mount pointer to the workitem and remove mount pointers from the various structures derived from the workitem as they are now redundant.
  - Remove the poor-man's semaphore protecting softdep_process_worklist and softdep_flushworklist. Several threads may now process the list simultaneously.
  - Add softdep_waitidle() to block the thread until all pending dependencies being operated on by other threads have been flushed.
  - Use softdep_waitidle() in unmount and snapshots to block either operation until the fs is stable.
  - Remove softdep worklist processing from the syncer and move it into the softdep_flush() thread. This thread processes all softdep mounts once each second and when it is called via the new softdep_speedup() when there is a resource shortage. This removes the softdep hook from the kernel and various hacks in header files to support it.
  Reviewed by/Discussed with: tegge, truckman, mckusick
  Tested by: kris
* - Release the mount ref once the vnode has been recycled rather than once | jeff | 2006-02-23 | 1 | -3/+2
  the last reference is dropped. I forgot that vnodes can stick around for a very long time until processes discover that they are dead. This means that a vnode reference is not sufficient to keep the mount referenced and even more code will be required to ref mount points.
  Discovered by: kris
* - Grab a mnt ref in vfs_busy() before dropping the interlock. This will | jeff | 2006-02-22 | 1 | -1/+6
  prevent the mount point from going away while we're waiting on the lock. The ref does not need to persist once we have the lock because the lock prevents the mount point from being unmounted.
  MFC After: 1 week
* - Add a ref count to the mount structure. Sleep for up to 3 seconds in | jeff | 2006-02-06 | 1 | -6/+8
  vfs_mount_destroy waiting for this ref to hit 0. We don't print an error if we are rebooting as the root mount always retains some references by init proc.
  - Acquire a mnt ref for every vnode allocated to a mount point. Drop this ref only once vdestroy() has been called and the mount has been freed.
  - No longer NULL the v_mount pointer in delmntque() so that we may release the ref after vgone() has been called. This allows us to guarantee that the mount point structure will be valid until the last vnode has lost its last ref.
  - Fix a few places that rely on checking v_mount to detect recycling.
  Sponsored by: Isilon Systems, Inc.
  MFC After: 1 week
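
The reference-counting rule in the entry above (one mount reference per allocated vnode, released only when the vnode is destroyed) can be illustrated with a small user-space sketch. The names `mount_sketch`, `vnode_sketch`, `vnode_attach`, `vnode_destroy` and `mount_can_destroy` are hypothetical and are not the kernel's; locking and the 3-second sleep in vfs_mount_destroy are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's struct mount/vnode. */
struct mount_sketch {
	int mnt_ref;			/* outstanding references */
};

struct vnode_sketch {
	struct mount_sketch *v_mount;	/* stays set until the vnode is destroyed */
};

/* Every vnode allocated to a mount point takes one mount reference. */
static void
vnode_attach(struct vnode_sketch *vp, struct mount_sketch *mp)
{
	vp->v_mount = mp;
	mp->mnt_ref++;
}

/* The reference is dropped only when the vnode itself is destroyed. */
static void
vnode_destroy(struct vnode_sketch *vp)
{
	assert(vp->v_mount != NULL && vp->v_mount->mnt_ref > 0);
	vp->v_mount->mnt_ref--;
	vp->v_mount = NULL;
}

/* In this sketch the mount may be torn down only at zero references. */
static bool
mount_can_destroy(const struct mount_sketch *mp)
{
	return (mp->mnt_ref == 0);
}
```

Because `v_mount` is not cleared early, the release in `vnode_destroy()` can still find the mount, which is the point of the delmntque() change described in the entry.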
* - Solve a race where we could lose a call to VOP_INACTIVE. If vget() waiting | jeff | 2006-02-01 | 1 | -12/+30
  on a lock held the last usecount ref on a vnode and the lock failed we would not call INACTIVE. Solve this by only holding a holdcnt to prevent the vnode from disappearing while we wait on vn_lock. Other callers may now VOP_INACTIVE while we are waiting on the lock, however this race is acceptable, while losing INACTIVE is not.
  Discussed with: kan, pjd
  Tested by: kkenn
  Sponsored by: Isilon Systems, Inc.
  MFC After: 1 week
* Back out r1.653; it turns out that the race (or at least the printf) is | kris | 2006-01-28 | 1 | -20/+0
  actually not hard to trigger, and it can cause a lot of console spam.
  Approved by: kan
* Convert remaining functions in vfs_subr.c from K&R prototypes to ANSI C | rwatson | 2006-01-21 | 1 | -82/+34
  prototypes, as the majority of new functions added have been in this style. Changing prototype style now results in gcc noticing that the implementation of vn_pollrecord() has a 'short' argument instead of 'int' as prototyped in vnode.h, so correct that definition. In practice this didn't matter as only poll flags in the lower 16 bits are used.
  MFC after: 1 week
* Add marker vnodes to ensure that all vnodes associated with the mount point are | tegge | 2006-01-09 | 1 | -22/+17
  iterated over when using MNT_VNODE_FOREACH.
  Reviewed by: truckman
* Print a warning when we miss vinactive() call, because of race in vget(). | pjd | 2005-12-29 | 1 | -0/+20
  The race is very real, but conditions needed for triggering it are rather hard to meet now. When gjournal will be committed (where it is quite easy to trigger) we need to fix it. For now, verify if it is really hard to trigger.
  Discussed with: kan
* This is a workaround for a complicated issue involving VFS cookies and devfs. | dwhite | 2005-11-09 | 1 | -0/+4
  The PR and patch have the details. The ultimate fix requires architectural changes and clarifications to the VFS API, but this will prevent the system from panicking when someone does "ls /dev" while running in a shell under the linuxulator. This issue affects HEAD and RELENG_6 only.
  PR: 88249
  Submitted by: "Devon H. O'Dell" <dodell@ixsystems.com>
  MFC after: 3 days
* Normalize a significant number of kernel malloc type names: | rwatson | 2005-10-31 | 1 | -1/+1
  - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat.
  - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters.
  - Disambiguate some collisions by adding subsystem prefixes to some memory types.
  - Generally prefer lower case to upper case.
  - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases.
  Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.
* mpsafevm has been stable and defaulted to 1 on sparc64 for over 6 months, | kris | 2005-10-14 | 1 | -1/+1
  so we are ready for mpsafevfs=1 by default on sparc64 too. I have been running this on all my sparc64 machines for over 6 months, and have not encountered MD problems.
  MFC after: 1 week
* Move execve's access time update functionality into a new | dds | 2005-10-12 | 1 | -0/+17
  vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap().
  Reviewed by: bde
  MFC after: 2 weeks
* Un-staticize runningbufwakeup() and staticize updateproc. | truckman | 2005-09-30 | 1 | -1/+2
  Add a new private thread flag to indicate that the thread should not sleep if runningbufspace is too large.
  Set this flag on the bufdaemon and syncer threads so that they skip the waitrunningbufspace() call in bufwrite() rather than checking the proc pointer vs. the known proc pointers for these two threads. A way of preventing these threads from being starved for I/O but still placing limits on their outstanding I/O would be desirable.
  Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from blocking on the runningbufspace check while holding snaplk. This prevents snaplk from being held for an arbitrarily long period of time if runningbufspace is high and greatly reduces the contention for snaplk. The disadvantage is that ffs_copyonwrite() can start a large amount of I/O if there are a large number of snapshots, which could cause a deadlock in other parts of the code.
  Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace before attempting to grab snaplk so that I/O requests waiting on snaplk are not counted in runningbufspace as being in-progress. Increment runningbufspace again before actually launching the original I/O request.
  Prior to the above two changes, the system could deadlock if enough I/O requests were blocked by snaplk to prevent runningbufspace from falling below lorunningspace and one of the bawrite() calls in ffs_copyonwrite() blocked in waitrunningbufspace() while holding snaplk.
  See <http://www.holm.cc/stress/log/cons143.html>
* Break out of loop if next buffer pointer has become invalid while flushing | tegge | 2005-09-16 | 1 | -0/+15
  current buffer.
  Reviewed by: kan
* In vfs_kqfilter(), return EINVAL instead of 1 (EPERM) when an unsupported | rwatson | 2005-09-12 | 1 | -1/+1
  kqueue filter type is requested on a vnode.
  MFC after: 3 days
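
To show why a bare `1` is a misleading return value here, the following sketch returns a named errno constant for the unsupported case. The enum and `kqfilter_sketch()` are hypothetical illustrations, not the actual vfs_kqfilter() code, and the set of supported filters is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical filter identifiers; the kernel uses its own constants. */
enum filter_sketch { FILT_READ, FILT_WRITE, FILT_VNODE, FILT_UNSUPPORTED };

/* Returns 0 on success or an errno value, in the usual kernel style. */
static int
kqfilter_sketch(enum filter_sketch filter)
{
	switch (filter) {
	case FILT_READ:
	case FILT_WRITE:
	case FILT_VNODE:
		return (0);
	default:
		/* A literal 1 would be read by callers as EPERM. */
		return (EINVAL);
	}
}
```

A caller that reports the error then sees "invalid argument" rather than "operation not permitted" for an unknown filter type.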
* use monotonic `time_uptime' instead of `time_second' | jkim | 2005-09-12 | 1 | -4/+4
  Approved by: anholt (mentor)
  Discussed on: arch
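
As a rough user-space analogy for the change above, interval checks are safer against a monotonic clock than against wall-clock time, which can be stepped by an administrator or by time synchronization. The sketch below uses POSIX `clock_gettime(CLOCK_MONOTONIC, ...)`; the helper names are hypothetical, and this is not the kernel code, which reads the `time_uptime` variable directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Seconds since an arbitrary fixed starting point; never steps backwards. */
static time_t
uptime_seconds(void)
{
	struct timespec ts;
	int error;

	error = clock_gettime(CLOCK_MONOTONIC, &ts);
	assert(error == 0);
	(void)error;	/* keep NDEBUG builds free of unused warnings */
	return (ts.tv_sec);
}

/* True once at least `timeout` seconds have passed since `start`. */
static int
interval_expired(time_t start, time_t timeout)
{
	return (uptime_seconds() - start >= timeout);
}
```

With a wall-clock source, the same subtraction could go negative or jump forward when the system time is adjusted.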
* Introduce vfs_read_dirent() which can help VOP_READDIR() implementations | phk | 2005-09-12 | 1 | -0/+27
  by handling all the cookie stuff.
* Fix a typo in vop_rename_pre() where we ended up using vholdl() | ssouhlal | 2005-08-28 | 1 | -1/+1
  instead of vhold(), even though the vnode interlock is unlocked.
  MFC after: 3 days
* Back out the removal of LK_NOWAIT from the VOP_LOCK() call in | truckman | 2005-08-23 | 1 | -7/+37
  vlrureclaim() in vfs_subr.c 1.636 because waiting for the vnode lock aggravates an existing race condition. It is also undesirable according to the commit log for 1.631.
  Fix the tiny race condition that remains by rechecking the vnode state after grabbing the vnode lock and grabbing the vnode interlock.
  Fix the problem of other threads being starved (which 1.636 attempted to fix by removing LK_NOWAIT) by calling uio_yield() periodically in vlrureclaim(). This should be more deterministic than hoping that VOP_LOCK() without LK_NOWAIT will block, which may not happen in this loop.
  Reviewed by: kan
  MFC after: 5 days
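
The pattern described above (never wait for a lock, recheck state once the lock is held, and yield periodically so other threads are not starved) might look roughly like this in user space. Everything here is a hypothetical simplification: `item_sketch`, `try_lock()` and `reclaim_pass()` are illustrative names, the "lock" is a plain flag, and `sched_yield()` stands in for the kernel's uio_yield().

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode: a lock flag plus a state bit. */
struct item_sketch {
	bool locked;
	bool reclaimable;
};

/* Non-blocking lock attempt, in the spirit of LK_NOWAIT. */
static bool
try_lock(struct item_sketch *it)
{
	if (it->locked)
		return (false);
	it->locked = true;
	return (true);
}

/* One pass over the items; returns how many were reclaimed. */
static size_t
reclaim_pass(struct item_sketch *items, size_t n, size_t yield_every)
{
	size_t reclaimed = 0;

	assert(yield_every > 0);
	for (size_t i = 0; i < n; i++) {
		/* Give other threads a chance at a fixed cadence. */
		if ((i + 1) % yield_every == 0)
			sched_yield();
		/* Cheap unlocked check, then refuse to wait for the lock. */
		if (!items[i].reclaimable || !try_lock(&items[i]))
			continue;
		/* Recheck under the lock: the state may have changed. */
		if (items[i].reclaimable) {
			items[i].reclaimable = false;
			reclaimed++;
		}
		items[i].locked = false;
	}
	return (reclaimed);
}
```

Yielding on a counter gives a predictable scheduling point, whereas a blocking lock acquisition only yields if the lock happens to be contended.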
* Silence "busy" warnings when unmounting devfs at system shutdown. This | rwatson | 2005-08-20 | 1 | -6/+16
  is a workaround for non-symmetric teardown of the file systems at shutdown with respect to the mount order at boot. The proper long term fix is to properly detach devfs from the root mount before unmounting each, and should be implemented, but since the problem is non-harmful, this temporary band-aid will prevent false positive bug reports and unnecessary error output for 6.0-RELEASE.
  MFC after: 3 days
  Tested by: pav, pjd
* Make mpsafe_vfs=1 the default on ia64. | marcel | 2005-08-13 | 1 | -1/+2
* Do not drop the vnode interlock if vdropl is called on already doomed vnode. | kan | 2005-08-10 | 1 | -3/+1
  vdropl callers expect it to return with interlock still being held.
  MFC after: 2 days
* Holding a vnode doesn't prevent v_mount from disappearing (when the | ssouhlal | 2005-08-06 | 1 | -0/+2
  vnode is inactivated), possibly leading to a NULL dereference when checking if the mount wants knotes to be activated in the VOP hooks. So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(), if necessary, and check it when activating knotes. Since the flags are not erased when a vnode is being held, we can safely read them.
  Reviewed by: kris@
  MFC after: 3 days
* - Unlock before we call mac_destroy_vnode to prevent a lock order reversal. | jeff | 2005-08-03 | 1 | -0/+1
  Found by: trhodes
* - Allow vnlru to drop giant if the filesystem does not require it. The | jeff | 2005-07-20 | 1 | -2/+11
  vnlru proc is extremely inefficient, potentially iterating over tens of thousands of vnodes without blocking. Dropping Giant allows other threads to preempt us although we should revisit the algorithm to fix the runtime problems especially since this may hold up all vnode allocations.
  - Remove the LK_NOWAIT from the VOP_LOCK in vlrureclaim. This provides a natural blocking point to help alleviate the situation described above although it may not technically be desirable.
  - yield after we make a pass on all mount points to prevent us from blocking other threads which require Giant.
  MFC after: 2 weeks
* Fix one "wrong b_bufobj" panic in reassignbuf() by moving VI_UNLOCK(vp) | pjd | 2005-07-05 | 1 | -1/+1
  below KASSERT()s, which means there was no real problem here, we just needed better locking for assertions.
  OK'ed by: jeff
  Approved by: re (scottl)
* Fix the recent panics/LORs/hangs created by my kqueue commit by: | ssouhlal | 2005-07-01 | 1 | -23/+49
  - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively.
  - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread.
  Reviewed by: jmg
  Approved by: re (scottl), scottl (mentor override)
  Pointyhat to: ssouhlal
  Will be happy: everyone
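
The idea of passing lock, unlock and ownership-check functions to an init routine, with NULL selecting mutex-style defaults, can be sketched in user space as follows. This is not the real knlist_init() signature: `knlist_sketch`, `knlist_init_sketch()` and the `default_*` helpers are hypothetical, and the default "mutex" is just an int flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list head carrying its own locking callbacks. */
struct knlist_sketch {
	void	(*kl_lock)(void *);
	void	(*kl_unlock)(void *);
	int	(*kl_locked)(void *);
	void	*kl_lockarg;		/* passed to each callback */
};

/* Defaults standing in for mtx_lock/mtx_unlock/mtx_owned: an int flag. */
static void
default_lock(void *arg)
{
	*(int *)arg = 1;
}

static void
default_unlock(void *arg)
{
	*(int *)arg = 0;
}

static int
default_locked(void *arg)
{
	return (*(int *)arg);
}

/* NULL callbacks fall back to the defaults, as the entry describes. */
static void
knlist_init_sketch(struct knlist_sketch *knl, void *lockarg,
    void (*lock)(void *), void (*unlock)(void *), int (*locked)(void *))
{
	knl->kl_lockarg = lockarg;
	knl->kl_lock = (lock != NULL) ? lock : default_lock;
	knl->kl_unlock = (unlock != NULL) ? unlock : default_unlock;
	knl->kl_locked = (locked != NULL) ? locked : default_locked;
}
```

A vnode-backed list would pass callbacks that take and release the vnode lock, so the list code never needs to know which kind of lock protects it.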
* - Try to catch the wrong bufobj panics a little earlier. I believe they | jeff | 2005-06-18 | 1 | -0/+5
  are actually caused by a buf with both VNCLEAN and VNDIRTY set. In the traces it is clear that the buf is removed from the dirty queue while it is actually on the clean queue which leaves the tail pointer set. Assert that both flags are not set in buf_vlist_add and buf_vlist_remove.
  Sponsored by: Isilon Systems, Inc.
  Approved by: re (blanket vfs)
* - Change holdcnt use around vnode recycling. We now always keep a holdcnt | jeff | 2005-06-16 | 1 | -202/+198
  ref while we're calling vgone(). This prevents transient refs from re-adding us to the free list. Previously, a vfree() triggered via vinvalbuf() getting rid of all of a vnode's pages could place a partially destructed vnode on the free list where vtryrecycle() could find it. The first call to vtryrecycle would hang up on the vnode lock, but when it failed it would place a now dead vnode onto the free list, and another call to vtryrecycle() would free an already free vnode. There were many complications of having a zero ref count while freeing which can now go away.
  - Change vdropl() to release the interlock before returning. All callers now respect this, so vdropl() directly frees VI_DOOMED vnodes once the last ref is dropped. This means that we'll never have VI_DOOMED vnodes on the free list.
  - Separate v_incr_usecount() into v_incr_usecount(), v_decr_usecount() and v_decr_useonly(). The incr/decr split is so that incr usecount can return with the interlock still held while decr drops the interlock so it can call vdropl() which will potentially free the vnode. The calling function can't drop the lock of an already free'd node. v_decr_useonly() drops a usecount without dropping the hold count. This is done so the usecount reaches zero in vput() before we recycle, however the holdcount is still 1 which prevents any new references from placing the vnode back on the free list.
  - Fix vnlrureclaim() to vhold the vnode since it doesn't do a vget(). We wouldn't want vnlrureclaim() to bump the usecount since this has different semantics. Also change vnlrureclaim() to do a NOWAIT on the vn_lock. When this function runs we're usually in a desperate situation and we wouldn't want to wait for any specific vnode to be released.
  - Fix a bunch of misc comments to reflect the new behavior.
  - Add vhold() and vdrop() to vflush() for the same reasons that we do in vlrureclaim(). Previously we held no reference and a vnode could have been freed while we were waiting on the lock.
  - Get rid of vlruvp() and vfreehead(). Neither are used. vlruvp() should really be rethought before it's reintroduced.
  - vgonel() always returns with the vnode locked now and never puts the vnode back on a free list. The vnode will be freed as soon as the last reference is released.
  Sponsored by: Isilon Systems, Inc.
  Debugging help from: Kris Kennaway, Peter Holm
  Approved by: re (blanket vfs)
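
The interplay of the two counters in the entry above can be modelled with a toy structure. This is a sketch under stated assumptions, not the kernel implementation: it assumes that taking a use reference also takes a hold reference, it ignores the interlock entirely, and `vn_counts` and the `*_sketch` functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a vnode's two reference counts. */
struct vn_counts {
	int	v_usecount;	/* active users */
	int	v_holdcnt;	/* keeps the structure off the free list */
	bool	doomed;		/* models VI_DOOMED */
	bool	on_freelist;
	bool	freed;
};

static void
vhold_sketch(struct vn_counts *vp)
{
	vp->v_holdcnt++;
	vp->on_freelist = false;
}

/* Dropping the last hold frees a doomed vnode, otherwise free-lists it. */
static void
vdrop_sketch(struct vn_counts *vp)
{
	assert(vp->v_holdcnt > 0);
	if (--vp->v_holdcnt > 0)
		return;
	if (vp->doomed)
		vp->freed = true;
	else
		vp->on_freelist = true;
}

/* Assumption: a use reference implies a hold reference. */
static void
incr_usecount_sketch(struct vn_counts *vp)
{
	vp->v_usecount++;
	vhold_sketch(vp);
}

static void
decr_usecount_sketch(struct vn_counts *vp)
{
	assert(vp->v_usecount > 0);
	vp->v_usecount--;
	vdrop_sketch(vp);
}

/* Drops only the use reference; the hold keeps the vnode pinned. */
static void
decr_useonly_sketch(struct vn_counts *vp)
{
	assert(vp->v_usecount > 0);
	vp->v_usecount--;
}
```

In this model, a recycle path would call `decr_useonly_sketch()`, mark the vnode doomed, and only then drop the hold, so a doomed vnode goes straight to being freed and never reaches the free list.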
* - In reassignbuf() add many asserts to validate the head and tail pointers | jeff | 2005-06-14 | 1 | -18/+29
  of the clean and dirty lists. This is in an attempt to catch the wrong bufobj problem sooner.
  - In vgonel() don't acquire an extra reference in the active case, the vnode lock and VI_DOOMED protect us from recursively cleaning.
  - Also in vgonel() clean up some stale comments.
  Sponsored by: Isilon Systems, Inc.
  Approved by: re (blanket vfs)