path: root/sys/kern/vfs_subr.c
Commit message | Author | Age | Files | Lines
* Slightly change the semantics of vnode labels for MAC: rather than | rwatson | 2002-10-26 | 1 | -0/+2
  "refreshing" the label on the vnode before use, just get the label right from inception. For single-label file systems, set the label in the generic VFS getnewvnode() code; for multi-label file systems, leave the labeling up to the file system. With UFS1/2, this means reading the extended attribute during vfs_vget() as the inode is pulled off disk, rather than hitting the extended attributes frequently during operations later, improving performance. This also corrects semantics for shared vnode locks, which were not previously present in the system. This changes the cache coherency properties WRT out-of-band access to label data, but in an acceptable form. With UFS1, there is a small race condition during automatic extended attribute start -- this is not present with UFS2, and occurs because EAs aren't available at vnode inception. We'll introduce a workaround for this shortly.
  Approved by: re
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
* In vrele() we can actually have a VCHR with v_rdev == NULL if we | phk | 2002-10-25 | 1 | -1/+1
  came from the bottom of addaliasu(). Don't panic.
* Within ufs, the ffs_sync and ffs_fsync functions did not always | mckusick | 2002-10-25 | 1 | -3/+3
  check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned.
  Sponsored by: DARPA & NAI Labs.
* Fix the spechash lock order reversal by keeping an updated sum | phk | 2002-10-24 | 1 | -19/+28
  of v_usecount in the dev_t which vcount() can return without locking any vnodes.
  Seen by: jhb
* When scanning the freelist looking for candidate vnodes to recycle, | mckusick | 2002-10-14 | 1 | -4/+3
  be sure to exit the loop with vp == NULL if no candidates are found. Formerly, this bug would cause the last vnode inspected to be used, even if it was not available. The result was a panic "vn_finished_write: neg cnt".
  Sponsored by: DARPA & NAI Labs.
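The failure mode in the entry above can be illustrated with a small userland model. This is a sketch only, not the kernel code: `struct toy_node`, its `busy` field, `pick_buggy()` and `pick_fixed()` are all hypothetical names, and `busy` merely stands in for whatever makes a vnode unavailable for recycling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode sitting on the free list. */
struct toy_node {
    int busy;                   /* nonzero: not available for recycling */
    struct toy_node *next;
};

/*
 * Buggy pattern: the result pointer is assigned before the test, so
 * when nothing qualifies it still names the last node inspected.
 */
struct toy_node *pick_buggy(struct toy_node *head)
{
    struct toy_node *vp = NULL, *cur;

    for (cur = head; cur != NULL; cur = cur->next) {
        vp = cur;
        if (!vp->busy)
            break;
    }
    return vp;                  /* may be a busy node */
}

/*
 * Fixed pattern: the loop variable itself is the result, so falling
 * off the end of the list leaves vp == NULL.
 */
struct toy_node *pick_fixed(struct toy_node *head)
{
    struct toy_node *vp;

    for (vp = head; vp != NULL; vp = vp->next)
        if (!vp->busy)
            break;
    return vp;                  /* NULL when no candidate was found */
}
```

With every node busy, `pick_buggy()` hands back the tail of the list while `pick_fixed()` returns NULL, which is the property the commit message asks for.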
* Unconditionally reset vp->v_vnlock back to the default in the | mckusick | 2002-10-14 | 1 | -1/+3
  vclean() function (e.g., vp->v_vnlock = &vp->v_lock) rather than requiring filesystems that use alternate locks to do so in their vop_reclaim functions. This change is a further cleanup of the vop_stdlock interface.
  Submitted by: Poul-Henning Kamp <phk@critter.freebsd.dk>
  Sponsored by: DARPA & NAI Labs.
* Regularize the vop_stdlock'ing protocol across all the filesystems | mckusick | 2002-10-14 | 1 | -9/+8
  that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock). In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme.
  Sponsored by: DARPA & NAI Labs.
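The v_vnlock protocol described in the entry above can be modelled in a few lines of userland C. This is a toy sketch under stated assumptions: only the field names `v_lock` and `v_vnlock` come from the commit message; `struct toy_lock`, `struct toy_vnode` and every function name below are hypothetical, and the "lock" is just a counter rather than a real lockmgr lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock: a hold counter instead of a real kernel lock. */
struct toy_lock {
    int held;
};

/* Toy vnode carrying only the two fields the protocol talks about. */
struct toy_vnode {
    struct toy_lock v_lock;     /* default, embedded lock */
    struct toy_lock *v_vnlock;  /* lock the standard ops actually use */
};

/* Models the default set-up: v_vnlock references the embedded v_lock. */
void toy_vnode_init(struct toy_vnode *vp)
{
    vp->v_lock.held = 0;
    vp->v_vnlock = &vp->v_lock;
}

/* Models the standard lock ops: always go through v_vnlock. */
void toy_stdlock(struct toy_vnode *vp)
{
    vp->v_vnlock->held++;
}

void toy_stdunlock(struct toy_vnode *vp)
{
    vp->v_vnlock->held--;
}

/* A layered filesystem substitutes another lock (here, the lower vnode's). */
void toy_layer_attach(struct toy_vnode *upper, struct toy_vnode *lower)
{
    upper->v_vnlock = lower->v_vnlock;
}

/* On reclaim the pointer goes back to the default. */
void toy_reclaim(struct toy_vnode *vp)
{
    vp->v_vnlock = &vp->v_lock;
}
```

Because the standard operations only ever dereference `v_vnlock`, swapping the pointer is enough to make an upper vnode share a lower vnode's lock, and resetting it on reclaim keeps a recycled vnode from pointing at a lock it no longer owns.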
* When considering a vnode for reuse in getnewvnode, we call | mckusick | 2002-10-11 | 1 | -13/+18
  vcanrecycle to check a free vnode's availability. If it is available, vcanrecycle returns an error code of zero and the vnode in question locked. The getnewvnode routine then used to call vn_start_write with the V_NOWAIT flag. If the filesystem was suspended while taking a snapshot, the vn_start_write would fail but getnewvnode would fail to unlock the vnode, instead leaving it locked on the freelist. The result would be that the vnode would be locked forever and would eventually hang the system with a race to the root when an attempt was made to recycle it. This fix moves the vn_start_write check into vcanrecycle where it will properly unlock the vnode if it is unavailable for recycling due to filesystem suspension.
  Sponsored by: DARPA & NAI Labs.
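The contract described in the entry above (return zero with the vnode locked, or an error with the vnode unlocked) can be sketched as follows. This is a userland illustration, not the kernel routine: `struct toy_rvnode`, its fields and `toy_canrecycle()` are hypothetical, and the `fs_suspended` flag stands in for a non-blocking write-start check failing.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a free-list vnode. */
struct toy_rvnode {
    int locked;         /* nonzero while the vnode lock is held */
    int fs_suspended;   /* nonzero while the filesystem is suspended */
};

/*
 * Returns 0 with the vnode locked when it can be recycled.
 * On any failure the vnode is left unlocked, so the caller never
 * has to clean up after an error return.
 */
int toy_canrecycle(struct toy_rvnode *vp)
{
    if (vp->locked)
        return EBUSY;           /* somebody else holds it */
    vp->locked = 1;

    if (vp->fs_suspended) {
        /* The check lives here, next to the unlock it requires. */
        vp->locked = 0;
        return EWOULDBLOCK;
    }
    return 0;
}
```

Keeping the suspension check inside the same function that takes the lock means every error path can release it locally, which is the design point the commit message makes.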
* Fix problem introduced in rev.1.406, which can cause an already unlocked | sobomax | 2002-10-05 | 1 | -0/+1
  mutex to be unlocked again, causing a system panic.
* Fix some harmless mis-indents. | phk | 2002-10-01 | 1 | -1/+1
  Spotted by: FlexeLint
* Move vnode MAC label initialization to after the release of the vnode | rwatson | 2002-09-30 | 1 | -3/+3
  interlock in getnewvnode() to avoid possible sleeps while holding the mutex. Note that the warning from Witness is a slight false positive since we know there will be no contention on the interlock since we haven't made the vnode available for use yet, but the theory is not a bad one.
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
* Be consistent about "static" functions: if the function is marked | phk | 2002-09-28 | 1 | -1/+1
  static in its prototype, mark it static at the definition too.
  Inspired by: FlexeLint warning #512
* - Move ASSERT_VOP_*LOCK* functionality into functions in vfs_subr.c | jeff | 2002-09-26 | 1 | -29/+92
  - Make the VI asserts more orthogonal to the rest of the asserts by using a new, common vfs_badlock() function and adding a 'str' arg.
  - Adjust generated ASSERTS to match the new prototype.
  - Adjust explicit ASSERTS to match the new prototype.
* - Lock down the syncer with sync_mtx. | jeff | 2002-09-25 | 1 | -74/+188
  - Enable vfs_badlock_mutex by default.
  - Assert that the vp is locked in VOP_UNLOCK.
  - Use standard interlock macros in remaining code.
  - Correct a race in getnewvnode().
  - Lock access to v_numoutput with interlock.
  - Lock access to buf lists and splay tree with interlock.
  - Add VOP and VI asserts.
  - Lock b_vnbufs with the vnode interlock.
  - Add vrefcnt() for callers who want to retrieve the vnode ref without holding a lock. Add a comment that describes when this is safe.
  - Add vholdl() and vdropl() so that callers who already own the interlock can avoid race conditions and unnecessary unlocking.
  - Move the VOP_GETATTR() in vflush() into the WRITECLOSE conditional case.
  - Hold the interlock before dropping the mntlist_mtx in vflush() to avoid a race.
  - Fix locking in vfs_msync().
* Remove any VOP_PRINT that redundantly prints the tag. | njl | 2002-09-18 | 1 | -8/+8
  Move lockmgr_printinfo() into vprint() for everyone's benefit.
  Suggested by: bde
* Remove all use of vnode->v_tag, replacing with appropriate substitutes. | njl | 2002-09-14 | 1 | -7/+9
  v_tag is now const char * and should only be used for debugging.
  Additionally:
  1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
  2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.
  Suggested by: phk
  Reviewed by: bde, rwatson (earlier version)
* Indentation does not make a block.. need curly braces too. | julian | 2002-09-11 | 1 | -1/+2
  Submitted by: Eagle-eyes evans <bde@freebsd.org>
* Completely redo thread states. | julian | 2002-09-11 | 1 | -2/+6
  Reviewed by: davidxu@freebsd.org
* Fix an inherited style bug: compare with NOCRED instead of NULL. | phk | 2002-09-05 | 1 | -1/+1
  Sponsored by: DARPA & NAI Labs.
* Introduce new extattr_check_cred() function which implements the canonical | phk | 2002-09-05 | 1 | -0/+34
  credential washing for extended attributes.
  Sponsored by: DARPA & NAI Labs.
* Replace various spellings with FALLTHROUGH which is lint()able | charnier | 2002-08-25 | 1 | -1/+1
* - Fix a mistake in my last few commits. The PDROP flag stops msleep from | jeff | 2002-08-23 | 1 | -4/+1
  re-acquiring the mutex.
  Pointy hat to: me
  Noticed by: tegge
* - Make vn_lock() vget() and VOP_LOCK() all behave the same way WRT | jeff | 2002-08-22 | 1 | -15/+7
  LK_INTERLOCK. The interlock will never be held on return from these functions even when there is an error. Errors typically only occur when the XLOCK is held which means this isn't the vnode we want anyway. Almost all users of these interfaces expected this behavior even though it was not provided before.
* - Fix interlock handling in vn_lock(). Previously, vn_lock() could return | jeff | 2002-08-22 | 1 | -17/+9
  with interlock held in error conditions when the caller did not specify LK_INTERLOCK.
  - Add several comments to vn_lock() describing the rationale behind the code flow since it was not immediately obvious.
* - Document two cases, one in vget and the other in vn_lock, where the state | jeff | 2002-08-21 | 1 | -0/+1
  of interlock on exit is not consistent. There are probably several bugs relating to this.
* - If vn_lock fails with the LK_INTERLOCK flag set, interlock will not be | jeff | 2002-08-21 | 1 | -2/+3
  released. vcanrecycle() failed to unlock interlock under this condition.
  - Remove an extra VOP_UNLOCK from a failure case in vcanrecycle().
  Pointed out by: rwatson
* - Add two new debugging macros: ASSERT_VI_LOCKED and ASSERT_VI_UNLOCKED | jeff | 2002-08-21 | 1 | -6/+59
  - Use the new VI asserts in place of the old mtx_assert checks.
  - Add the VI asserts to the automated lock checking in the VOP calls. The interlock should not be held across vops with a few exceptions.
  - Add the vop_(un)lock_{pre,post} functions to assert that interlock is held when LK_INTERLOCK is set.
* - Extend the vnode_free_list_mtx to cover numvnodes and freevnodes. This | jeff | 2002-08-13 | 1 | -3/+15
  was done only some of the time before, and now it is uniformly applied.
* - Introduce a new struct xvfsconf, the userland version of struct vfsconf. | mux | 2002-08-10 | 1 | -17/+57
  - Make getvfsbyname() take a struct xvfsconf *.
  - Convert several consumers of getvfsbyname() to use struct xvfsconf.
  - Correct the getvfsbyname.3 manpage.
  - Create a new vfs.conflist sysctl to dump all the struct xvfsconf in the kernel, and rewrite getvfsbyname() to use this instead of the weird existing API.
  - Convert some {set,get,end}vfsent() consumers to use the new vfs.conflist sysctl.
  - Convert a vfsload() call in nfsiod.c to kldload() and remove the useless vfsisloadable() and endvfsent() calls.
  - Add a warning printf() in vfs_sysctl() to tell people they are using an old userland.
  After these changes, it's possible to modify struct vfsconf without breaking the binary compatibility. Please note that these changes don't break this compatibility either.
  Once bp has updated mount_smbfs(8) with the patch I sent him, there will be no more consumers of the {set,get,end}vfsent(), vfsisloadable() and vfsload() API, and I will promptly delete it.
* - Move some logic from getnewvnode() to a new function vcanrecycle() | jeff | 2002-08-05 | 1 | -69/+95
  - Unlock the free list mutex around vcanrecycle to prevent a lock order reversal.
* - Replace v_flag with v_iflag and v_vflag | jeff | 2002-08-04 | 1 | -101/+154
  - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed.
  - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
  - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's.
  - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear.
  - Many functions in vfs_subr.c were restructured to provide for stronger locking.
  Idea stolen from: BSD/OS
* Include file cleanup; mac.h and malloc.h at one point had ordering | rwatson | 2002-08-01 | 1 | -1/+1
  relationship requirements, and no longer do.
  Reminded by: bde
* Nit in previous commit: the correct sysctl type is "S,xvnode" | des | 2002-07-31 | 1 | -1/+1
* Initialize v_cachedid to -1 in getnewvnode(). | des | 2002-07-31 | 1 | -42/+66
  Reintroduce the kern.vnode sysctl and make it export xvnodes rather than vnodes.
  Sponsored by: DARPA, NAI Labs
* Note that the privilege indicating flag to vaccess() originally used | rwatson | 2002-07-31 | 1 | -1/+1
  by the process accounting system is now deprecated.
* Introduce support for Mandatory Access Control and extensible | rwatson | 2002-07-31 | 1 | -0/+8
  kernel access control. Invoke the necessary MAC entry points to maintain labels on vnodes. In particular, initialize the label when the vnode is allocated or reused, and destroy the label when the vnode is going to be released, or reused. Wow, an object where there really is exactly one place where it's allocated, and one other where it's freed. Amazing.
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, NAI Labs
* - Backout the patch made in revision 1.75 of vfs_mount.c. The vputs here | jeff | 2002-07-29 | 1 | -0/+1
  were hiding the real problem of the missing unlock in sync_inactive.
  - Add the missing unlock in sync_inactive.
  Submitted by: iedowse
* Wire the sysctl output buffer before grabbing any locks to prevent | truckman | 2002-07-28 | 1 | -0/+1
  SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.
* Teach discretionary access control methods for files about VAPPEND | rwatson | 2002-07-22 | 1 | -4/+4
  and VALLPERM.
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, NAI Labs
* Add support to UFS2 to provide storage for extended attributes. | mckusick | 2002-07-19 | 1 | -51/+85
  As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words).
  The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed.
  Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word are reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form:
      if (ioflags & IO_SYNC)
          flags |= BA_SYNC;
  For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE).
  I am also contemplating adding a pathconf parameter (for concreteness, let's call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended attribute storage.
  Sponsored by: DARPA & NAI Labs.
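The flag layout described in the entry above (IO_ flags in the low sixteen bits, BA_ flags in the high sixteen bits of one word) can be sketched in plain C. Everything here is an assumption made for illustration: the `EX_` names and their bit values are hypothetical and are not the kernel's definitions; only the low/high split and the "neither flag set means IO_NORMAL" rule come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values; only the low/high split is taken from the text. */
#define EX_IO_MASK    0x0000ffffu   /* low sixteen bits: IO_-style flags */
#define EX_BA_MASK    0xffff0000u   /* high sixteen bits: BA_-style flags */
#define EX_IO_SYNC    0x00000080u
#define EX_IO_EXT     0x00000400u
#define EX_IO_NORMAL  0x00000800u
#define EX_BA_CLRBUF  0x00010000u

/* Extract each family from a single combined flags word. */
uint32_t ex_io_part(uint32_t flags)
{
    return flags & EX_IO_MASK;
}

uint32_t ex_ba_part(uint32_t flags)
{
    return flags & EX_BA_MASK;
}

/*
 * Backward-compatibility rule from the text: when neither the
 * "normal" nor the "ext" flag is given, "normal" is assumed.
 */
uint32_t ex_default_target(uint32_t ioflags)
{
    if ((ioflags & (EX_IO_NORMAL | EX_IO_EXT)) == 0)
        ioflags |= EX_IO_NORMAL;
    return ioflags;
}
```

Since the two masks do not overlap, one word can be passed through both layers unchanged, and no per-flag translation step is needed.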
* Change utimes to set the file creation time (for filesystems that | mckusick | 2002-07-17 | 1 | -0/+2
  support creation times such as UFS2) to the value of the modification time if the value of the modification time is older than the current creation time. See utimes(2) for further details.
  Sponsored by: DARPA & NAI Labs.
* Replace the global buffer hash table with per-vnode splay trees using a | dillon | 2002-07-10 | 1 | -85/+205
  methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing appears to have shown no increase in overhead.
  Disadvantages
      Dirties more cache lines during lookups.
      Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference).
  Advantages
      vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently.
      I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted.
      The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier.
      This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remember its place, and skip to the next vnode).
  Note that the buffer cache splay is somewhat more complex than other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list.
  Suggested by: alc
* - Use standard locking functions in syncer's opv | jeff | 2002-07-09 | 1 | -6/+47
  - vput instead of vrele syncer vnodes in vfs_mount
  - Add vop_lookup_{pre,post} to verify locking in VOP_LOOKUP
* - Don't hold the vn lock while calling VOP_CLOSE in vclean(). | jeff | 2002-07-07 | 1 | -6/+10
* - BUF_REFCNT() seems to be the preferred method for verifying a locked buf. | jeff | 2002-07-07 | 1 | -3/+9
  Tell vop_strategy_pre() to use this instead.
  - Ignore B_CLUSTER bufs. Their components are locked but they don't really exist so they don't have to be. This isn't ideal but it is safe.
* Fix a mistake in my last commit. Don't grab an extra reference to the object | jeff | 2002-07-06 | 1 | -3/+1
  in bp->b_object.
* Fixup uses of GETVOBJECT. | jeff | 2002-07-06 | 1 | -0/+4
  - Cache a pointer to the vnode's object in the buf.
  - Hold a reference to that object in addition to the vnode's reference just to be consistent.
  - Cleanup code that got the object indirectly through the vp and VOP calls.
  This fixes at least one case where we were calling GETVOBJECT without a lock. It also avoids an expensive layered call at the cost of another pointer in struct buf.
* - Add vop_strategy_pre to validate VOP_STRATEGY locking. | jeff | 2002-07-06 | 1 | -0/+16
  - Disable original vop_strategy lock specification.
  - Switch to the new vop_strategy_pre for lock validation.
  VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need to be translated. There may be other reasons, but as long as the underlying layer uses a VOP to perform the operations they will be caught later.
* Add "vop_rename_pre" to do pre rename lock verification. This is enabled only | jeff | 2002-07-06 | 1 | -1/+20
  with DEBUG_VFS_LOCKS.
* Move vfs_rootmountalloc() in vfs_mount.c and remove lite2_vfs_mountroot() | mux | 2002-07-03 | 1 | -71/+0
  which was #if 0'd and is not likely to be used now.